CN107515674B - Multi-interaction implementation method for mining operations based on virtual reality and augmented reality - Google Patents
Multi-interaction implementation method for mining operations based on virtual reality and augmented reality
- Publication number
- CN107515674B (application CN201710668415.XA)
- Authority
- CN
- China
- Prior art keywords
- model
- function
- probability
- mining
- virtual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/0346—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor with detection of the device orientation or free movement in a 3D space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/003—Navigation within 3D models or images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/20—Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/01—Indexing scheme relating to G06F3/01
- G06F2203/012—Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Architecture (AREA)
- User Interface Of Digital Computer (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a multi-interaction implementation method for mining operations based on virtual reality and augmented reality, belonging to the technical fields of virtual reality and augmented reality. The method comprises two modes: virtual reality and augmented reality. In the virtual reality mode, the user can select and replace models and materials in the virtual scene, roam the scene, move and place models freely, embed video, generate QR codes, trigger natural interactions, and interact by voice. In the augmented reality mode, the user can select models, play voice explanations, demonstrate model operation dynamics, control model rotation and stop, take screenshots, and extend functionality. In both modes, multiple interaction channels are realized: voice control, gesture control, and keyboard-and-mouse control. The invention is applied to virtual simulation of mining operations and can be used to train mine workers and mining engineering students, reducing training costs, improving workers' skills, and providing an advanced and fast means for guiding production, construction, and scientific research.
Description
Technical Field
The invention belongs to the technical fields of virtual reality and augmented reality, and in particular relates to a multi-interaction implementation method for mining operations based on virtual reality and augmented reality.
Background Art
The industry has called 2016 "the first year of virtual reality", which may lead some to assume the technology emerged only in recent years. In fact, virtual reality (VR) technology arose in the 1990s. After 2000, VR development integrated technologies such as XML and Java and applied powerful 3D computing and interaction techniques, improving rendering quality and transmission speed and entering a new stage of development. VR technology is a product of economic and social development and has broad application prospects. Chinese research on VR started in the early 1990s; with the rapid development of computer graphics and computer systems engineering, the technology has received considerable attention. According to the "Research Report on VR User Behavior in China in the First Half of 2016", jointly released by the National Advertising Research Institute and other institutions, potential VR users in China reached 450 million in the first half of 2016, with about 27 million light users and about 2.37 million heavy users; the domestic VR market is expected to grow explosively. Augmented reality (AR) is a newer technology developed on the basis of VR. Its applications are also very broad, with promising prospects in industry, medicine, the military, municipal administration, television, games, exhibitions, and other fields.
At present, VR and AR technologies continue to develop and their scope of application keeps widening, but they are mostly applied in military, entertainment, and similar fields. Applications in education, industry, and engineering involve many disciplines such as physics and geography and still require further research and development. In the mining industry, the geological conditions of Chinese mines are relatively complex, and most mining is underground. Because the working environment is underground and the processes are complicated, accidents such as gas and water disasters occur from time to time. Mining is also an industry with long construction periods, large investment, and high safety risks, so safety training for mining staff has always been a top priority. However, existing traditional training systems are basically theoretical introductions supplemented by physical models or two-dimensional images, relying mainly on classroom explanation with simple animation, audio, and video; they lack practice and realistic scenes. Even watching a physical model does not convey the actual operating procedure of a tool. With continued technological development, various training systems for coal mining have been built, but they still suffer from unrealistic scenes, weak immersion, and few interactive functions, offering only simple demonstrations.
Summary of the Invention
In view of the above technical problems in the prior art, the invention proposes a multi-interaction implementation method for mining operations based on virtual reality and augmented reality. The design is reasonable, overcomes the deficiencies of the prior art, and achieves good results.
To achieve the above purpose, the invention adopts the following technical solution:
A multi-interaction implementation method for mining operations based on virtual reality and augmented reality uses a multi-interaction simulation system for underground mining operations. The system comprises two modes: a virtual reality mode and an augmented reality mode. The virtual reality mode includes modeling of specific scenes, roaming, replacement of models and their materials, embedding video into the virtual scene, model movement, intention-based scene interaction, QR code generation, and voice interaction. The augmented reality mode includes model selection, model explanation, dynamic model demonstration, gesture-controlled model interaction, screenshot-to-icon generation, 360-degree rotation and stop, function mode switching, and function extension. The system provides two kinds of hidden menus: the model/material selection menu in the virtual reality mode, and the model selection menu in the augmented reality mode. The first is displayed only when the user enters a specific area and hidden when the user leaves; the second displays a secondary menu on click and hides it on a second click.
The multi-interaction implementation method for mining operations specifically includes the following steps:
Step 1: Construction of the entire environment scene for mining operations
According to the real environment of underground mining operations, the modeling tool 3DMax is used for 1:1 scale modeling to simulate the entire underground mining environment. The UE4 engine is used to edit the models, including creating and editing textures and materials, adding physical collision, adding lights, lighting effects, and special effects to the overall environment, and baking and rendering.
Step 2: Roaming in the virtual reality application scene
In the UE4 engine, the keyboard up, down, left, and right keys are bound to the Up, Down, Right, and Left direction control functions, and the mouse is bound to a Turnaround control function, enabling roaming through the entire virtual reality scene of the underground mining operation.
Step 3: Replacing tool models for underground mining operations and simulated materials for mine geology
A hidden menu is added to the virtual underground mining scene. When the user roams to the mining area, a model or material selection menu appears automatically, and the user can select a model or material from the menu to replace as needed.
Step 4: Embedding video material into the 3D application scene and controlling playback and stop
Video material is embedded into the virtual reality scene and played in three-dimensional space, simulating the monitoring displays of the mining environment. The keyboard X key is bound to the MediaPlayer media class of the UE4 platform, and video playback and stop are controlled through the OpenSource and Close functions.
Step 5: Selecting a model and moving it to any position
The user selects a model with the mouse and moves it to any position where a simulated operation is needed, reproducing machine movement in a real scene.
Step 6: Realizing intention-based interaction in the application scene
When the user roams to a specific position in the virtual reality scene and the system detects that the user intends to enter, the ambient light is turned on automatically, realizing natural interaction in the virtual scene.
Step 7: QR code generation
The keyboard F key is bound to a QR code generation function, and keyboard control of QR code generation is configured. When the user presses F, the system generates a QR code of the virtual scene panorama captured at preset sampling points.
Step 8: Implementing voice interaction
The user controls the shearer in the virtual reality scene through keywords including forward rotation, reverse rotation, raise arm, lower arm, and stop, simulating its operation.
Step 9: Switching to the AR dynamic demonstration mode
The user clicks the AR mode button in the upper right corner of the system to switch to the AR demonstration mode.
Preferably, in step 3, the model is instantiated as a concrete Actor; SetMesh and SetMaterial functions are added to replace the model and its material; and a Widget Blueprint user interface with Box collision detection is set up to realize the hidden menu in three-dimensional space.
Preferably, in step 5, a mouse event is added to the model to be operated. The model is selected through the GetHitResult function, and the coordinate values of the model's SetActorLocation function are changed according to the mouse coordinates in three-dimensional space. When the mouse is clicked again, the current x, y, and z coordinates of the mouse are assigned to the model, and the GetHitResult function sets the model to the deselected state.
Preferably, in step 6, a TriggerBox trigger is set. When the first-person character triggers the TriggerBox and the system detects that the user intends to enter a certain area, a corresponding device in that area is enabled automatically.
Preferably, in step 7, when the user presses the keyboard F key, the system generates a QR code of the virtual scene panorama captured at preset sampling points. The user scans the QR code with a mobile phone and jumps to the virtual scene display page on the phone. On the phone, the user enables the gyroscope, switches to VR split-screen mode, and sets the phone parameters; the virtual underground mining environment can then be experienced with VR glasses, achieving a 720-degree view as well as multi-scene, multi-angle roaming on the mobile phone.
Preferably, in step 8, speech recognition is implemented with the Pocketsphinx library. By extending the Chinese keyword dictionary, recognition is realized through preprocessing, feature extraction, acoustic model training, language model training, and speech decoding and search; finally, control functions written in the UE4 engine let speech control the model in three-dimensional space. The specific steps of speech recognition are as follows:
Step 8.1: Preprocessing
The input raw speech signal is processed to filter out unimportant information and background noise, and endpoint detection, framing, and pre-emphasis are applied.
Pre-emphasis is realized with a first-order FIR high-pass digital filter whose transfer function is

$$H(z) = 1 - a z^{-1}$$

where $a$ is the coefficient of the pre-emphasis filter, in the range 0.9 to 1.0. If the speech sample at time $n$ is $x(n)$, the pre-emphasized signal is

$$y(n) = x(n) - a\, x(n-1)$$
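As an illustrative sketch (not the patented implementation), the pre-emphasis step can be written in a few lines of Python with NumPy; the coefficient value 0.97 follows the preferred value given later in this document:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.97) -> np.ndarray:
    """Apply the first-order FIR high-pass filter y(n) = x(n) - a*x(n-1).

    x : 1-D array of speech samples; a : pre-emphasis coefficient in [0.9, 1.0].
    The first sample is kept unchanged since x(-1) is undefined.
    """
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - a * x[:-1]
    return y

# Example: emphasize the high-frequency component of a synthetic signal
if __name__ == "__main__":
    t = np.linspace(0, 1, 16000, endpoint=False)
    x = np.sin(2 * np.pi * 200 * t) + 0.1 * np.sin(2 * np.pi * 4000 * t)
    print(pre_emphasis(x)[:5])
```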
Step 8.2: Feature extraction
Feature extraction uses the Mel-frequency cepstral coefficient (MFCC) method, in the following steps:
Step 8.2.1: Using the critical-band effect of human hearing, MEL cepstrum analysis is applied to the speech signal to obtain a sequence of MEL cepstral coefficient vectors.
Step 8.2.2: The spectrum of the input speech is represented by the MEL cepstral coefficient vector sequence, and several band-pass filters with triangular or sinusoidal filtering characteristics are set within the speech spectrum range.
Step 8.2.3: The speech spectrum is passed through the band-pass filter bank, and the output data of each filter are computed.
Step 8.2.4: The logarithm of each filter's output data is taken, followed by a discrete cosine transform (DCT).
Step 8.2.5: The MFCC coefficients are obtained, with the solution formula:

$$C_i = \sum_{k=1}^{P} \log F(k)\, \cos\!\left(\frac{\pi i\,(k - 0.5)}{P}\right)$$

where $C_i$ is the feature parameter, $k$ is a variable with $1 \le k \le P$, $P$ is the number of triangular filters, $F(k)$ is the output data of each filter, and $i$ is the data length.
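A minimal sketch of this pipeline in Python (a NumPy-only illustration under our own assumptions, not the Pocketsphinx front end; the mel scale used here is the common HTK-style formula):

```python
import numpy as np

def hz_to_mel(f):  # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular band-pass filters spaced evenly on the mel scale (step 8.2.2)."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, c, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, c):
            fb[m - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[m - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc_frame(frame, fb, n_ceps=13):
    """Steps 8.2.3-8.2.5: filter-bank outputs -> log -> DCT -> C_i."""
    spec = np.abs(np.fft.rfft(frame)) ** 2          # power spectrum
    F = fb @ spec + 1e-10                           # filter outputs F(k)
    P = fb.shape[0]
    k = np.arange(P)
    # DCT of the log filter-bank energies (the formula above)
    return np.array([np.sum(np.log(F) * np.cos(np.pi * i * (k + 0.5) / P))
                     for i in range(n_ceps)])

frame = np.hamming(400) * np.random.randn(400)      # one 25 ms frame at 16 kHz
fb = mel_filterbank(n_filters=26, n_fft=400, sr=16000)
print(mfcc_frame(frame, fb))
```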
Step 8.3: Acoustic model training
Acoustic model parameters are trained from the feature parameters of the training speech corpus.
During recognition, the feature parameters of the speech to be recognized are matched against the acoustic model to obtain the recognition result. The acoustic model is a Gaussian mixture model-hidden Markov model (GMM-HMM), built in the following steps:
Step 8.3.1: Find the joint probability density function of the Gaussian mixture model:

$$p(x) = \sum_{m=1}^{M} C_m\, \mathcal{N}(x;\, u_m, \Sigma_m) = \sum_{m=1}^{M} \frac{C_m}{(2\pi)^{D/2}\,|\Sigma_m|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - u_m)^{\top} \Sigma_m^{-1} (x - u_m)\right)$$

where $M$ is the number of Gaussians in the mixture model, $C_m$ the weights, $u_m$ the means, $\Sigma_m$ the covariance matrices, and $D$ the dimension of the observation vector. The parameter variables $\Theta = \{C_m, u_m, \Sigma_m\}$ are estimated with the expectation-maximization (EM) algorithm, using the following update formulas:

$$C_m^{(j+1)} = \frac{1}{N}\sum_{t=1}^{N} h_m(t), \quad u_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\, x^{(t)}}{\sum_{t=1}^{N} h_m(t)}, \quad \Sigma_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\left(x^{(t)} - u_m^{(j+1)}\right)\left(x^{(t)} - u_m^{(j+1)}\right)^{\top}}{\sum_{t=1}^{N} h_m(t)}$$

where $j$ is the current iteration round, $N$ the number of elements in the training data set, $x^{(t)}$ the feature vector at time $t$, and $h_m(t)$ the posterior probability of $C_m$ at time $t$. The GMM parameters are estimated by the EM algorithm so as to maximize the probability of generating the observed speech features on the training data.
Step 8.3.2: Solve the three main components of the HMM
Let the state sequence be $q_1, q_2, \ldots, q_N$, and let the transition probability matrix be $A = [a_{ij}],\ i, j \in [1, N]$. Then the transition probability between Markov chain states is $a_{ij} = P(q_t = j \mid q_{t-1} = i)$; the initial probabilities of the Markov chain are $\pi = [\pi_i],\ i \in [1, N]$, with $\pi_i = P(q_1 = i)$. The observation probability distribution of each state, $b_i(o_t) = P(o_t \mid q_t = i)$, is described by a GMM; following step 8.3.1, the solution formula is:

$$b_i(o_t) = \sum_{m=1}^{M} C_{i,m}\, \mathcal{N}(o_t;\, u_{i,m}, \Sigma_{i,m})$$

where $N$ is the number of states, $i, j$ denote states, $a_{ij}$ is the probability of jumping from state $i$ at time $t-1$ to state $j$ at time $t$, $o_t$ is the observation at time $t$, $C_{i,m}$ are the mixture coefficients (weights of the different Gaussians), $u_{i,m}$ the means, and $\Sigma_{i,m}$ the covariance matrices of the different Gaussians. The HMM parameters are estimated with the Baum-Welch algorithm, and finally the acoustic model file is generated.
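A compact sketch of the EM updates of step 8.3.1 for a diagonal-covariance GMM in Python (an illustration under our own assumptions, not the Pocketsphinx/SphinxTrain code; names such as `em_step` are ours):

```python
import numpy as np

def log_gauss_diag(X, mu, var):
    """Log density of N(x; mu, diag(var)) for each row of X."""
    return -0.5 * (np.sum(np.log(2 * np.pi * var))
                   + np.sum((X - mu) ** 2 / var, axis=1))

def em_step(X, C, mu, var):
    """One EM iteration. X: (N, D) features; C: (M,) weights;
    mu, var: (M, D) means and diagonal covariances."""
    N, M = X.shape[0], C.shape[0]
    log_p = np.stack([np.log(C[m]) + log_gauss_diag(X, mu[m], var[m])
                      for m in range(M)], axis=1)          # (N, M)
    log_p -= log_p.max(axis=1, keepdims=True)
    h = np.exp(log_p)
    h /= h.sum(axis=1, keepdims=True)                      # posteriors h_m(t)
    Nm = h.sum(axis=0)                                     # soft counts
    C_new = Nm / N
    mu_new = (h.T @ X) / Nm[:, None]
    var_new = (h.T @ (X ** 2)) / Nm[:, None] - mu_new ** 2 + 1e-6
    return C_new, mu_new, var_new

# Toy usage: fit 2 Gaussians to 1-D data drawn from two clusters
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, (200, 1)), rng.normal(3, 1, (200, 1))])
C, mu, var = np.ones(2) / 2, np.array([[-1.0], [1.0]]), np.ones((2, 1))
for _ in range(20):
    C, mu, var = em_step(X, C, mu, var)
print(C, mu.ravel())
```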
Step 8.4: Language model training
The N-Gram model is used to train the language model. The probability of the $i$-th word in a sentence is conditioned on the $N-1$ words before it; that is, the context of a word is defined as the $N-1$ words appearing before it:

$$P(w_i \mid w_1, w_2, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})$$

Using the chain rule of conditional probability, the sentence probability expands to:

$$P(\text{sentence}) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, w_2, \ldots, w_{n-1})$$

where $P(w_1)$ is the probability that $w_1$ appears in the text, $P(w_1, w_2)$ is the probability that $w_1, w_2$ appear consecutively, and $P(w_2 \mid w_1)$ is the probability of $w_2$ given that $w_1$ has appeared. Writing the probability of recognizing the sentence as $P(s)$, $P(s) = P(w_1, w_2, \ldots, w_n)$ is the probability that the word sequence $w_1, w_2, \ldots, w_n$ appears consecutively and generates the sentence.
The Markov assumption simplifies this to:

$$P(\text{sentence}) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2) \cdots P(w_n \mid w_{n-1})$$

where $P(w_i \mid w_{i-1}) = P(w_{i-1}, w_i)/P(w_{i-1})$. Both $P(w_{i-1}, w_i)$ and $P(w_{i-1})$ are counted from the corpus, so $P(\text{sentence})$ can finally be obtained. The language model stores the probability statistics of $P(w_{i-1}, w_i)$, and the whole recognition process is realized by maximizing $P(\text{sentence})$.
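A bigram model of this kind can be estimated from a toy corpus in a few lines of Python (an illustrative sketch only; the command phrases are placeholders, and real systems such as Pocketsphinx use more careful smoothing for unseen word pairs):

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+
import math

corpus = [
    "raise arm", "lower arm", "forward rotation",
    "reverse rotation", "stop", "raise arm", "stop",
]
tokens = [w for line in corpus for w in ("<s>", *line.split(), "</s>")]

unigrams = Counter(tokens)
bigrams = Counter(pairwise(tokens))

def log_p_sentence(words):
    """log P(sentence) = sum of log P(w_i | w_{i-1}) under the bigram model."""
    seq = ["<s>", *words, "</s>"]
    logp = 0.0
    for a, b in pairwise(seq):
        # P(b | a) = count(a, b) / count(a), with add-one smoothing
        logp += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams)))
    return logp

print(log_p_sentence(["raise", "arm"]))   # likely command
print(log_p_sentence(["arm", "raise"]))   # unlikely word order
```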
Step 8.5: Speech decoding and search algorithm
For the input speech signal, a recognition network is built from the trained acoustic model, the language model, and the dictionary mapping file created with the g2p tool. A search algorithm finds the best path through this network: the path that can output the word string of the speech signal with maximum probability, thereby determining the text contained in the speech sample. Decoding is implemented with the Viterbi algorithm, as follows:
Step 8.5.1: Given the parameters of the HMM and the observation sequence $O = \{o_1, o_2, \ldots, o_T\}$, all state probabilities at $t = 1$ are:

$$\delta_1(i) = \pi_i\, b_i(o_1), \qquad \psi_1(i) = 0$$

Step 8.5.2: Recurse for $t = 2, 3, \ldots, T$:

$$\delta_t(i) = \max_{1 \le j \le N}\left[\delta_{t-1}(j)\, a_{ji}\right] b_i(o_t), \qquad \psi_t(i) = \arg\max_{1 \le j \le N}\left[\delta_{t-1}(j)\, a_{ji}\right]$$

Step 8.5.3: Terminate the traversal:

$$P^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$$

Step 8.5.4: Backtrack the optimal path for $t = T-1, T-2, \ldots, 1$:

$$q_t^* = \psi_{t+1}(q_{t+1}^*)$$

Step 8.5.5: Output the optimal hidden-state path $Q^* = \{q_1^*, q_2^*, \ldots, q_T^*\}$,
where $\delta_t(i)$ is the joint probability of all nodes passed by the optimal path up to time $t$, $\psi_t(i)$ is the hidden state at time $t$, $T$ is the time length, $P^*$ is the probability of the optimal path, and $q_T^*$ is the endpoint of the optimal path.
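The recursion above translates directly into Python; the sketch below works in log space for numerical stability (our choice, not stated in the patent):

```python
import numpy as np

def viterbi(pi, A, B):
    """pi: (N,) initial probs; A: (N, N) transitions a_ij;
    B: (T, N) observation likelihoods b_i(o_t). Returns best path, log prob."""
    T, N = B.shape
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.empty((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = log_pi + log_B[0]                       # step 8.5.1
    for t in range(1, T):                              # step 8.5.2
        scores = delta[t - 1][:, None] + log_A         # delta_{t-1}(j) + log a_ji
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[t]
    q = np.empty(T, dtype=int)
    q[-1] = delta[-1].argmax()                         # step 8.5.3
    for t in range(T - 2, -1, -1):                     # step 8.5.4
        q[t] = psi[t + 1][q[t + 1]]
    return q, delta[-1].max()                          # step 8.5.5

# Toy 2-state example with T = 3 observations
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.2], [0.8, 0.3], [0.1, 0.7]])
print(viterbi(pi, A, B))
```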
Preferably, a is set to 0.97.
Preferably, step 9 specifically includes the following steps:
Step 9.1: Model selection
The shearer model, roadheader model, pneumatic coal drill model, and fully mechanized mining support model are available for selection; each type is a 1:1 model of the corresponding real coal mining tool.
Step 9.2: Model explanation
After selecting a model, the user picks the tool model option to study from the model selection menu in the augmented reality mode; the system plays the corresponding voice explanation, and clicking the button again stops the voice.
Step 9.3: Model demonstration
The simulated operation animations of the tools created during 3DMax modeling are imported into the Unreal Engine, and a corresponding selection menu is set up; a click demonstrates the running state of the selected coal mining tool in AR mode.
Step 9.4: Screenshot-to-icon generation
A button in the main AR menu is bound to the camera's screenshot function, and a scrolling menu bar is added to the right of the menu. When the screenshot function fires, the screenshot is displayed in the right scrolling menu bar through a preset dynamic material conversion function; during a demonstration, the user clicks the screenshot button and the system generates an icon at the side of the interface.
Step 9.5: Rotation
The configured model is instantiated as an Actor and a Rotation function is added to rotate the model clockwise.
Step 9.6: Function extension
A secondary UI is added to control Map switching, providing demonstration functions for the Earth, Saturn, Mercury, planets with atmospheres, and galaxies; Widget Blueprint code shows or hides a knowledge introduction panel; a back button returns to the main AR editing module.
Step 9.7: Dynamic gesture control of the model: the real environment and the virtual model are superimposed, and gestures interact with the model (a sketch of the image-processing part of this loop follows the list of steps below). The specific steps are as follows:
Step 9.7.1: Initialize video capture and read the marker file and camera parameters.
Step 9.7.2: Grab a video frame.
Step 9.7.3: Detect markers and identify marker templates in the video frame, and use OpenCV library functions to perform motion detection on the captured frame, judging whether a motion trajectory is detected.
If a motion trajectory is detected, go to step 9.7.4;
if no motion trajectory is detected, continue detecting markers and identifying marker templates in the video frame, then go to step 9.7.12.
Motion detection is based on the color histogram and background differencing. For each captured frame, after motion detection the background is updated for the pixels outside the detected gesture region:

$$u_{t+1} = \begin{cases} a\, I_t + (1-a)\, u_t, & I_f = 1 \\ u_t, & I_f = 0 \end{cases}$$

where $u_t$ is the corresponding pixel of the background image, $u_{t+1}$ the updated background pixel, $I_t$ the pixel of the current frame, $I_f$ the mask value of the current-frame pixel (whether the background is updated there), and $a \in [0, 1]$ the update speed of the background image model.
Step 9.7.4: Preprocess the image, including denoising.
If the motion detection step finds motion information, the video frames containing the motion gesture are preprocessed: median filtering with OpenCV's medianBlur function removes salt-and-pepper noise.
Step 9.7.5: Convert to HSV space.
The image is converted to HSV color space with the cvtColor function, and the brightness value $v$ in HSV space is re-set according to $r$ and $g$, the red and green pixels of the skin-color region, with $r > g$.
Step 9.7.6: Segment the hand region.
Step 9.7.7: Apply morphological processing to remove noise.
The motion binary image is ANDed with the binary image obtained by back-projection, and a morphological closing operation yields a fairly complete binary image of the moving skin-color gesture; stray noise points in the image are removed.
Step 9.7.8: Obtain the hand contour.
After preliminary morphological operations remove noise and sharpen the hand boundary, the gesture contour is obtained with OpenCV's findContours function, and false contours are removed.
Step 9.7.9: Draw the hand contour and calibrate the information.
Step 9.7.10: Compare contour information and set the direction vector.
The contours obtained from consecutive frames are compared under preset conditions, and the comparison assigns a value to the direction flag variable.
Step 9.7.11: Simulate a force on the model according to the vector coordinates, realizing interaction between dynamic gestures and the virtual model.
After the dynamic gesture has been judged from the contours, a force simulation is applied to the virtual model according to the judgment result: based on the value of the direction flag from the contour judgment, the model's coordinates in three-dimensional space are scaled along the x, y, and z axes, and the change of coordinates moves the model, simulating the applied force.
Step 9.7.12: Compute the transformation matrix of the camera relative to the detected marker.
Step 9.7.13: Superimpose the virtual object on the detected marker and return to step 9.7.2, realizing the superimposed display of the real environment and the virtual model.
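A minimal sketch of the motion-detection and hand-contour part of this loop (steps 9.7.3 through 9.7.8) using OpenCV in Python; the HSV skin-color thresholds and the update speed are illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

ALPHA = 0.05                      # background update speed a in [0, 1]
SKIN_LO = np.array([0, 40, 60])   # assumed HSV skin-color bounds
SKIN_HI = np.array([25, 180, 255])

cap = cv2.VideoCapture(0)         # step 9.7.1: initialize video capture
ok, frame = cap.read()
background = frame.astype(np.float32)

while ok:
    # Step 9.7.3: background difference -> motion mask
    diff = cv2.absdiff(frame, background.astype(np.uint8))
    motion = cv2.threshold(cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY),
                           25, 255, cv2.THRESH_BINARY)[1]
    # Step 9.7.4: median filter removes salt-and-pepper noise
    blurred = cv2.medianBlur(frame, 5)
    # Steps 9.7.5-9.7.6: HSV conversion and skin-color segmentation
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, SKIN_LO, SKIN_HI)
    # Step 9.7.7: AND the two masks, then morphological closing
    hand = cv2.bitwise_and(motion, skin)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    hand = cv2.morphologyEx(hand, cv2.MORPH_CLOSE, kernel)
    # Step 9.7.8: contours; keep the largest as the hand, drop false ones
    contours, _ = cv2.findContours(hand, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand_contour = max(contours, key=cv2.contourArea)
        cv2.drawContours(frame, [hand_contour], -1, (0, 255, 0), 2)
    # Background update only outside the gesture region (the formula above)
    keep = (hand == 0)[:, :, None]
    background = np.where(keep,
                          ALPHA * frame + (1 - ALPHA) * background,
                          background)
    cv2.imshow("gesture", frame)
    if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
        break
    ok, frame = cap.read()
cap.release()
```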
Beneficial technical effects of the invention:
(1) The 3D models of the invention are built at true scale, the material textures are edited on the UE4 engine platform to approach reality, and the ambient lighting of the application scene is baked and rendered with realistic light simulation. The entire virtual reality scene is more realistic and highly immersive.
(2) The invention realizes many interactive functions through its technical solution, for example replacing tool models through the hidden menu while roaming the virtual underground mining scene, changing the mine material to simulate different mining geology, freely moving the mining tools, embedding video information into machine displays to present real scenes, and using the voice function to control the shearer's forward rotation, reverse rotation, arm raising, arm lowering, and stop.
(3) Through the QR code generation function, the invention connects the PC display to the mobile phone display. On the phone, the built-in gyroscope provides motion sensing, and in VR glasses mode the user can experience real-time scene immersion with simple VR glasses.
(4) The invention also uses the AR development SDK ARToolKit to realize the AR dynamic demonstration function. Through AR editing and demonstration, the user can select mining tool models in real time for 360-degree rotating display, voice explanation, dynamic operation display, and screenshot saving. More importantly, the tool models are displayed in AR mode, combining the virtual model with the real environment; this shows not only the intuitive three-dimensionality of the model but also its realism, yielding better learning and teaching results.
(5) In addition to its dynamic demonstration function, the AR module of the invention processes the video stream. When a dynamic gesture enters the camera's view, it interacts with the model: moving the hand from far to near imparts a simulated forward force in three-dimensional space, a top-to-bottom motion imparts an upward simulated force, flipping the hand forward imparts a downward one, and likewise twisting or tilting the hand left or right imparts a simulated force with a vector direction.
(6) Besides the functions for the coal mining application scene, the AR module extends AR display to the field of astronomy: AR displays of the Earth, Saturn, Mercury, planets with dynamic atmospheres, and galaxies are added, together with the knowledge introduction panel display function, enriching AR applications in educational display.
Description of the Drawings
Fig. 1 is the overall functional structure diagram of the invention.
Fig. 2 is a schematic diagram of the QR code generation function of the invention.
Fig. 3 is a schematic diagram of the interaction function realized by speech recognition in the invention.
Fig. 4 is a schematic diagram of the AR mode implementation of the invention.
Fig. 5 is a flowchart of the dynamic gesture interaction function of the invention.
Detailed Description of Embodiments
The invention is described in further detail below with reference to the accompanying drawings and specific embodiments:
The invention provides a multi-interaction implementation method for mining operations based on virtual reality and augmented reality. The overall technical functions of the invention can be understood with reference to Fig. 1. The specific implementation steps are as follows:
Step 1: Construction of the entire environment scene for underground mining operations. The 3DMax modeling tool is used to create the relevant models from the real mining operation environment. The models are imported into the UE4 engine by category; on the UE4 platform, materials are authored for the models, natural and ambient lighting is simulated, physical collision detection is added, system parameters are tuned, and the scene is baked and rendered.
Step 2: A first-person character is added to the virtual application scene, with mouse and keyboard control events. The keyboard's up, down, left, and right keys are bound to the Up, Down, Right, and Left functions, controlling the coordinate changes of the first-person character in the virtual three-dimensional space to realize roaming. A Turnaround function is added to the mouse to control 720-degree rotation of the first-person view in the virtual three-dimensional space.
Step 3: An interactive menu is set up to realize functional interactions such as replacing tool models for underground mining operations and mine geological materials. First, a Widget Blueprint user interface is created, menu options are set, and click events are added to the options. A Box collision detection area is then added to the model: when the character enters the area, the created Widget Blueprint user interface is displayed; when the character leaves it, the interface is hidden. The shearer model is instantiated as an Actor and a SetMesh function is added to swap in other tool models. Likewise, a SetMaterial function is added to the mine geological model in the three-dimensional space to change materials. The invention provides four types of mining tool models for the user to choose from, makes the mine geology material selectable, and replaces models and materials through the displayed style menu. After replacement, when the user leaves the detection area, the menu hides automatically, preserving the overall roaming visuals while providing real-time interaction.
Step 4: Video embedding, played in three-dimensional space, simulating the monitoring displays of the mining environment. The keyboard X key is bound to the MediaPlayer media class of the UE4 platform, and the OpenSource and Close functions control playback and stop of the video stream. This operation simulates the screen display of underground mining control equipment and of real-time environment monitoring, highlighting the realism and dynamics of the 3D scene and bringing the simulated virtual scene closer to reality.
Step 5: A selected model can be dragged to any position the user wants, and the intention-based interaction of automatically switching on devices is realized. A mouse event is added to the model to be operated; the model is selected through the GetHitResult function, and the coordinate values of the model's SetActorLocation function are changed according to the mouse coordinates in three-dimensional space. When the mouse is clicked again, the current x, y, and z coordinates of the mouse are assigned to the model, and the GetHitResult function sets the model to the deselected state. In this embodiment, the user can click the shearer model in the scene and place it at another mining position in the mining operation scene.
The system adds a TriggerBox trigger in a specific area. When the first-person character enters this area and triggers the TriggerBox, the ambient light control function SetVisible of the next area fires and the light is turned on, realizing the automatic sensor light function of the invention. This is the invention's mechanism for detecting user intention, enabling more natural system interaction.
Step 6: QR code generation function. A single PC display cannot serve multiple users. The invention adds QR code generation: scanning the code brings the scene to multiple users' phones, and through the QR code link the phone jumps to the panoramic display page of the coal mining operation. On the phone, the user can enable the gyroscope, switch to VR split-screen mode, and set the phone parameters to experience the virtual underground coal mining environment with VR glasses, achieving a 720-degree view, as well as a multi-scene, multi-angle roaming experience on the phone. This function binds the keyboard F and V keys to QR code generation and hiding functions. Six Point collection points are added to the scene in the UE4 engine; a panorama is generated from the collection point positions, and the information and related mobile settings are turned into a network link in QR code form, achieving the end-to-end handover. The flow of this function is shown in Fig. 2.
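The last step of this flow, turning the panorama link into a scannable image, can be sketched with the third-party Python `qrcode` package (the URL below is a placeholder, not one from the patent):

```python
import qrcode

# Hypothetical URL of the generated panorama page served to phones
panorama_url = "https://example.com/mine-panorama/scene1"

# Encode the link as a QR image that the system can show on screen
img = qrcode.make(panorama_url)
img.save("panorama_qr.png")
print("QR code written to panorama_qr.png")
```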
Step 7: Realizing the voice control function. The invention uses Pocketsphinx for Chinese keyword recognition. The implementation flow of voice control is shown in Fig. 3. The invention adds a speech recognition function to the Actor created from the shearer model, enabling the speech recognition class after system initialization and saving a reference to it. A method is then created and bound to the speech recognition callback OnWordSpoken; whenever the user speaks a configured control word, this method fires, and keyword matching drives the shearer's forward rotation, reverse rotation, arm raising, arm lowering, stop, and related controls. The speech recognition used here is based on improvements to Sphinx, the English speech recognition system developed at Carnegie Mellon University. The invention's speech recognition is a large-vocabulary, speaker-independent, isolated-word recognition method for continuous Chinese syllables, and reliably recognizes the configured vocabulary spoken by different people. Finally, through UE4 coding, recognition of a spoken word triggers the action control function matched to it, realizing the corresponding action control of the model. The recognition system comprises five parts: speech preprocessing, feature extraction, acoustic model training, language model training, and speech decoding. The specific flow of speech recognition is as follows:
Step 7.1: Preprocessing.
The input raw speech signal is processed to filter out unimportant information and background noise, with endpoint detection, framing, and pre-emphasis. The purpose of pre-emphasis is to boost the high-frequency part of the speech, remove the effect of lip radiation, and increase the high-frequency resolution. Pre-emphasis is generally realized with a first-order FIR high-pass digital filter with transfer function $H(z) = 1 - a z^{-1}$, where $a$ is the pre-emphasis coefficient, generally in the range 0.9 to 1.0; here 0.97 is used. If the speech sample at time $n$ is $x(n)$, the pre-emphasized signal is

$$y(n) = x(n) - a\, x(n-1)$$
Step 7.2: Feature extraction.
Mel-frequency cepstral coefficients (MFCC) are used for extraction. MFCC parameters are based on human auditory characteristics: using the critical-band effect of human hearing, MEL cepstrum analysis is applied to the speech signal to obtain a sequence of MEL cepstral coefficient vectors, and the MEL cepstral coefficients represent the spectrum of the input speech. Several band-pass filters with triangular or sinusoidal filtering characteristics are set within the speech spectrum range; the speech energy spectrum is passed through this filter bank, the output of each filter is computed, its logarithm is taken, and a discrete cosine transform (DCT) yields the MFCC coefficients. The solution formula is:

$$C_i = \sum_{k=1}^{P} \log F(k)\, \cos\!\left(\frac{\pi i\,(k - 0.5)}{P}\right)$$

where $C_i$ is the feature parameter, $k$ is a variable with $1 \le k \le P$, $P$ is the number of triangular filters, $F(k)$ is the output data of each filter, and $i$ is the data length.
Step 7.3: Acoustic model training.
Acoustic model parameters are trained from the feature parameters of the training speech corpus. During recognition, the feature parameters of the speech to be recognized can be matched against the acoustic model to obtain the recognition result. A Gaussian mixture model-hidden Markov model (GMM-HMM) is adopted as the acoustic model.
Step 7.3.1: Find the joint probability density function of the Gaussian mixture model:

$$p(x) = \sum_{m=1}^{M} C_m\, \mathcal{N}(x;\, u_m, \Sigma_m) = \sum_{m=1}^{M} \frac{C_m}{(2\pi)^{D/2}\,|\Sigma_m|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - u_m)^{\top} \Sigma_m^{-1} (x - u_m)\right)$$

where $M$ is the number of Gaussians in the mixture model, $C_m$ the weights, $u_m$ the means, $\Sigma_m$ the covariance matrices, and $D$ the dimension of the observation vector. The parameter variables $\Theta = \{C_m, u_m, \Sigma_m\}$ are estimated with the expectation-maximization (EM) algorithm, using the following formulas:

$$C_m^{(j+1)} = \frac{1}{N}\sum_{t=1}^{N} h_m(t), \quad u_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\, x^{(t)}}{\sum_{t=1}^{N} h_m(t)}, \quad \Sigma_m^{(j+1)} = \frac{\sum_{t=1}^{N} h_m(t)\left(x^{(t)} - u_m^{(j+1)}\right)\left(x^{(t)} - u_m^{(j+1)}\right)^{\top}}{\sum_{t=1}^{N} h_m(t)}$$

where $j$ is the current iteration round, $N$ the number of elements in the training data set, $x^{(t)}$ the feature vector at time $t$, and $h_m(t)$ the posterior probability of $C_m$ at time $t$. The GMM parameters are estimated by the EM algorithm, maximizing the probability of generating the observed speech features on the training data.
Step 7.3.2: Solve the three main components of the HMM.
Let the state sequence be $q_1, q_2, \ldots, q_N$, and let the transition probability matrix be $A = [a_{ij}],\ i, j \in [1, N]$. Then the transition probability between Markov chain states is $a_{ij} = P(q_t = j \mid q_{t-1} = i)$; the initial probabilities of the Markov chain are $\pi = [\pi_i],\ i \in [1, N]$, with $\pi_i = P(q_1 = i)$. The observation probability distribution of each state, $b_i(o_t) = P(o_t \mid q_t = i)$, is described by a GMM; following step 7.3.1, the solution formula is:

$$b_i(o_t) = \sum_{m=1}^{M} C_{i,m}\, \mathcal{N}(o_t;\, u_{i,m}, \Sigma_{i,m})$$

where $N$ is the number of states, $i, j$ denote states, $a_{ij}$ is the probability of jumping from state $i$ at time $t-1$ to state $j$ at time $t$, $o_t$ is the observation at time $t$, $C_{i,m}$ are the mixture coefficients (weights of the different Gaussians), $u_{i,m}$ the means, and $\Sigma_{i,m}$ the covariance matrices of the different Gaussians. The HMM parameters are estimated with the Baum-Welch algorithm, and finally the acoustic model file is generated.
步骤7.4:语言模型训练。Step 7.4: Language model training.
语言模型是用来约束单词搜索,语言建模能够有效的结合汉语语法和语义的知识,描述词之间的内在关系,从而提高识别率,减少搜索范围。本文采用N-Gram模型实现语言模型的训练。在一个语句中第i个词出现的概率,条件依赖于它前面的N-1个词,即将一个词的上下文定义为该词前面出现的N-1个词,其表达公式为:The language model is used to constrain the word search. Language modeling can effectively combine the knowledge of Chinese grammar and semantics to describe the internal relationship between words, thereby improving the recognition rate and reducing the search scope. In this paper, the N-Gram model is used to train the language model. The probability that the i-th word appears in a sentence depends on the N-1 words before it, that is, the context of a word is defined as the N-1 words that appear before the word, and its expression formula is:
Here N = 2 and N = 3 are used; that is, the probability of the current word is conditioned on the previous one or two words, P(w_2 | w_1) and P(w_3 | w_1, w_2).
Simply put, the language model is obtained from corpus statistics: the corpus is the text collection used for training, and the dictionary file stores the training corpus together with the corresponding pronunciations. The language model expresses the joint probability of word sequences. Let P(w_1) be the probability of w_1 appearing in a text, P(w_1, w_2) the probability of w_1 and w_2 appearing consecutively, and P(w_2 | w_1) the probability of w_2 appearing given that w_1 has already appeared. Let the probability of recognizing a sentence be P(s), where P(s) = P(w_1, w_2, ..., w_n) is the probability that the words w_1, w_2, ..., w_n appear consecutively and generate s. By the chain rule of conditional probability, this expands to:
$$P(\text{sentence}) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, w_2, \ldots, w_{n-1})$$
Using the Markov assumption, this simplifies to:
$$P(\text{sentence}) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2) \cdots P(w_n \mid w_{n-1})$$
Since P(w_i | w_{i-1}) = P(w_{i-1}, w_i) / P(w_{i-1}), and both P(w_{i-1}, w_i) and P(w_{i-1}) can be counted from the corpus, P(sentence) can ultimately be computed. The language model stores the probability statistics P(w_{i-1}, w_i), and the recognition process is realized by finding the maximum of P(sentence).
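A minimal sketch of how such statistics can be counted and combined into P(sentence), on a toy corpus with unsmoothed maximum-likelihood estimates (a real system would add smoothing):

```python
from collections import Counter

corpus = [["raise", "arm"], ["lower", "arm"], ["raise", "arm"]]  # toy corpus

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    unigrams.update(sent)
    bigrams.update(zip(sent, sent[1:]))

def p_bigram(w_prev, w):
    # Maximum-likelihood estimate P(w | w_prev) = count(w_prev, w) / count(w_prev)
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

def p_sentence(sent):
    # P(s) = P(w1) * prod P(wi | wi-1), per the Markov simplification above
    p = unigrams[sent[0]] / sum(unigrams.values())
    for w_prev, w in zip(sent, sent[1:]):
        p *= p_bigram(w_prev, w)
    return p

print(p_sentence(["raise", "arm"]))
```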
Step 7.5: Speech decoding and search algorithm.
For the input speech signal, a recognition network is built from the trained acoustic model, language model, and dictionary, and a search algorithm finds the best path through this network: the word string that outputs the speech signal with maximum probability, which determines the text contained in the speech sample. The Viterbi algorithm is used here to decode the speech, as follows:
(1) Given the parameters of the HMM and the observation sequence O = {o_1, o_2, ..., o_T}, all state probabilities at t = 1 are:
$$\delta_1(i) = \pi_i\, b_i(o_1), \qquad \psi_1(i) = 0$$
(2) Recurse for t = 2, 3, ..., T:

$$\delta_t(j) = \max_{1 \le i \le N}\big[\delta_{t-1}(i)\, a_{ij}\big]\, b_j(o_t), \qquad \psi_t(j) = \arg\max_{1 \le i \le N}\big[\delta_{t-1}(i)\, a_{ij}\big]$$
(3) Terminate the traversal:

$$P^{*} = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^{*} = \arg\max_{1 \le i \le N} \delta_T(i)$$
(4) Backtrack the optimal path for t = T-1, T-2, ..., 1:

$$q_t^{*} = \psi_{t+1}\big(q_{t+1}^{*}\big)$$
and output the optimal hidden-state path Q* = (q_1*, q_2*, ..., q_T*). Here δ_t(i) is the joint probability of all nodes traversed by the optimal path up to time t, ψ_t(i) is the hidden state (backpointer) at time t, T is the sequence length, P* is the probability of the optimal path, and q_T* is its end point. Speech recognition is finally realized through this optimal path.
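This recursion translates directly into code. The following NumPy implementation is illustrative rather than the patent's source, and keeps probabilities in the linear domain for readability (a practical decoder works in log space to avoid underflow):

```python
import numpy as np

def viterbi(pi, A, B):
    """pi: (N,) initial probabilities; A: (N, N) transition probabilities a_ij;
    B: (N, T) observation likelihoods b_i(o_t). Returns best path and P*."""
    N, T = B.shape
    delta = np.zeros((T, N))            # delta[t, i]: best-path prob ending in i
    psi = np.zeros((T, N), dtype=int)   # backpointers psi_t(i)
    delta[0] = pi * B[:, 0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # (N, N): score of i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, t]
    # Termination and backtracking
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```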
After the user speaks raise arm, lower arm, forward rotation, reverse rotation, or stop, the simulation system performs the corresponding shearer operation; once the system recognizes the spoken keyword, it is displayed in the upper-left corner of the interface.
Step 8: Switching to the AR dynamic demonstration mode.
A widget blueprint is set up in the interface and an openLevel function is added to switch to a new Map, i.e. the AR mode. The AR demonstration mode implements the demonstration and explanation of the tool models used in the coal mining process, realizing the learning and educational application of AR technology.
Step 9: Model selection, model explanation, and dynamic demonstration in AR mode.
In the AR dynamic demonstration module of the present invention, the user interface uses a two-level hidden menu to keep the AR display uncluttered. In this embodiment, the sub-functions for model selection, model explanation, model demonstration, and extended functions are placed in hidden second-level menus. Model selection covers the shearer, roadheader, pneumatic coal drill, fully mechanized mining support, and other models; after the user makes a selection the submenu is hidden, and the model explanation, dynamic model demonstration, and function extension menus behave the same way (see Figure 1 for the full contents). NFT (Natural Feature Tracking) is taken as the example here for implementing the AR technology; its principle is shown in Figure 4, and the procedure is as follows:
Step 9.1: Calibrate the camera to obtain the distortion parameters caused by manufacturing tolerances, i.e. the camera's intrinsic matrix, which restores the correspondence between the 3D space of the camera model and the 2D image plane.
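One standard way to obtain these intrinsics is OpenCV's chessboard calibration, sketched below; the 9x6 board and the image file names are assumptions, since the patent does not specify the calibration target:

```python
import glob
import cv2
import numpy as np

# 3D corner coordinates of an assumed 9x6 chessboard in its own plane (Z = 0)
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob("calib_*.png"):            # hypothetical image set
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6))
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

# K is the intrinsic matrix; dist holds the distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)
```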
Step 9.2: From the camera's own hardware parameters, compute the corresponding projection matrix.
Step 9.3: Extract features from the natural image to be recognized, obtaining a set of feature points {p}.
Step 9.4: Extract features in real time from the image captured by the camera, obtaining another set of feature points {q}.
Step 9.5: Use the ICP (Iterative Closest Point) algorithm to iteratively solve for the rotation and translation matrices R and T between the two sets of feature points, i.e. the pose matrix, commonly called the model-view matrix in computer graphics. For two points in three-dimensional space, p = (x_p, y_p, z_p) and q = (x_q, y_q, z_q), their Euclidean distance is:

$$d(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2 + (z_p - z_q)^2}$$
To find the matrices R and T relating p and q, a least-squares optimum over the correspondences i = 1, 2, ..., N is sought, minimizing:

$$E(R, T) = \sum_{i=1}^{N} \left\| q_i - (R\, p_i + T) \right\|^2$$

The R and T at the minimum give the pose, which together with the projection forms the MVP matrix. Here E is the sum of distances between corresponding points of the two point sets after transformation, and N is the number of points in the point sets.
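Within each ICP iteration, for a fixed set of correspondences the least-squares R and T have a closed-form solution via SVD (the Kabsch method); a minimal sketch, assuming matched point arrays P and Q:

```python
import numpy as np

def best_rigid_transform(P, Q):
    """Least-squares R, T with Q ~ R @ P + T for corresponding
    point sets P, Q of shape (N, 3). One inner step of ICP."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)          # 3x3 cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = cq - R @ cp
    return R, T
```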
Step 9.6: Obtain the MVP (Model View Projection) matrix and render the three-dimensional graphics.
Step 10: Screenshot-generated icons.
In the AR-mode main menu, a button is added and bound to the camera's screenshot function, and a scrolling menu bar is added on the right side of the menu. When the screenshot function fires, the screenshot is rendered into the scrolling menu bar through a preset dynamic-material conversion function. During a demonstration, the user clicks the screenshot button and the system generates an icon at the left side of the interface, making it easy to record and closely examine difficult or questionable points encountered while learning, thereby reinforcing the learning effect.
Step 11: Rotating model display.
In AR mode, the user sees the virtual model superimposed on the real scene. The configured model is instantiated as an Actor and a Rotation function is added to rotate it clockwise. With the model rotating, the user can observe and study the tool model through a full 360 degrees, achieving a better visual effect; this demonstration-based learning mode is more realistic and immersive.
Step 12: AR function extension module.
The present invention adds an extended AR educational display function: a second-level UI controls Map switching to demonstrate different objects, including the Earth, Saturn, Mercury, planets with atmospheres, and galaxies. The planets rotate on their axes, and the AR mode presents the moving planets before the user's eyes; a knowledge-introduction function is added as well, completing the system's extended educational display function.
Step 13: Dynamic gesture interaction with the model.
OpenCV video processing is added in AR mode. After the video stream is initialized, motion detection runs first; if dynamic hand motion is detected, image processing follows: the gesture image is denoised, converted to HSV, morphologically processed, and its contour drawn; the calibration information and contour information are compared; and finally a force simulation is applied to the model, realizing interaction between dynamic gestures and the virtual model (the implementation flow is shown in Figure 5). Notably, this dynamic gesture interaction achieves recognition and control of simulated three-dimensional gestures: the moving hand captured from the video stream is two-dimensional information, so matrix operations compare it with the computed transformation matrix of the camera relative to the detected marker, yielding three-dimensional gesture motion information and enabling force simulation on the model along different directions in 3D space. The steps are as follows:
Step 13.1: Motion detection.
The method performs motion detection based on a color histogram and background differencing. Starting the camera takes a short time, during which roughly 20 frames can be captured; a cyclic background update is applied over these 20 frames as below, and after motion detection in each frame the pixels outside the moving-gesture region are likewise used to update the background:

$$u_{t+1} = a\, u_t + (1 - a)\, I_t$$

with the update applied at the pixels whose mask value I_f marks them for background update.
where u_t is the corresponding pixel of the background image and u_{t+1} the updated background pixel; I_t is the pixel of the current frame and I_f its mask value, i.e. whether the background is updated at that pixel; and a ∈ [0,1] is the update speed of the background model, generally 0.8 to 1, taken as 0.8 in this method.
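OpenCV's accumulateWeighted function implements a running-average background update of this form; a sketch of the 20-frame warm-up (camera index and loop structure are illustrative):

```python
import cv2
import numpy as np

a = 0.8                                  # background retention, as above
cap = cv2.VideoCapture(0)

ret, frame = cap.read()
background = np.float32(frame)           # u_0: first frame as initial model

for _ in range(19):                      # remaining frames of the 20-frame warm-up
    ret, frame = cap.read()
    # u_{t+1} = a*u_t + (1-a)*I_t; accumulateWeighted's alpha is (1 - a).
    # A mask of non-gesture pixels can be passed as a fourth argument.
    cv2.accumulateWeighted(frame, background, 1 - a)

diff = cv2.absdiff(frame, cv2.convertScaleAbs(background))  # motion cue
```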
Step 13.2: Image preprocessing.
If the simple motion detection of step 13.1 detects motion information, preprocessing of the video frames containing the moving gesture begins: the image is median-filtered with OpenCV's medianBlur function to remove salt-and-pepper noise.
Step 13.3: Conversion to HSV space.
The cvtColor function converts the image's color space to obtain its HSV data, and the brightness value v is then reset to a relatively small value, computed from the red and green components of the skin-colored region of interest, to reduce interference from static skin-toned areas; here r and g denote the red and green pixel values of that region, with r > g.
Step 13.4: Segment the hand region and apply morphological processing.
The motion binary image is ANDed with the binary image obtained by back-projection, and image-morphological closing operations then yield a fairly complete binary image of the moving skin-colored gesture, with stray specks removed from the image.
Step 13.5: Obtain the gesture contour.
After the preliminary morphological operations remove noise and sharpen the hand boundary, OpenCV's findContours function extracts the gesture contour, and pseudo-contours are then removed.
Step 13.6: Draw the contour and calibration information.
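Steps 13.2 through 13.6 map onto a short OpenCV pipeline built from the functions named above (medianBlur, cvtColor, findContours). In the sketch below, the HSV bounds, kernel size, and area threshold are illustrative assumptions; the patent does not give numeric values:

```python
import cv2
import numpy as np

def gesture_contours(frame, motion_mask):
    """frame: BGR image; motion_mask: single-channel uint8 motion binary image."""
    # Step 13.2: median filter removes salt-and-pepper noise
    blurred = cv2.medianBlur(frame, 5)
    # Step 13.3: convert to HSV; V channel capped to suppress static skin tones
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    hsv[:, :, 2] = np.minimum(hsv[:, :, 2], 128)          # illustrative cap
    # Step 13.4: skin-color mask ANDed with the motion mask, then closed
    skin = cv2.inRange(hsv, (0, 40, 30), (25, 255, 128))  # assumed bounds
    hand = cv2.bitwise_and(skin, motion_mask)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    hand = cv2.morphologyEx(hand, cv2.MORPH_CLOSE, kernel)
    # Step 13.5: extract contours, discard small pseudo-contours
    contours, _ = cv2.findContours(hand, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contours = [c for c in contours if cv2.contourArea(c) > 1000]
    # Step 13.6: draw the surviving contours onto the frame for display
    cv2.drawContours(frame, contours, -1, (0, 255, 0), 2)
    return contours
```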
Step 13.7: Compare contour information and set the direction vector.
Because the hand moves continuously, the extracted contour changes continuously as well. The contour obtained in each frame is compared against the previous one under preset comparison conditions, and the comparison assigns a value to the direction flag variable. The state comparison and analysis are shown in Table 1:
Table 1: State analysis
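Since the concrete conditions of Table 1 are not reproduced above, the sketch below shows one plausible comparison rule consistent with the described behavior: centroid displacement sets the x and y flags, and contour-area change stands in for motion toward or away from the camera:

```python
import cv2
import numpy as np

def direction_flags(prev_contour, cur_contour, min_shift=10.0):
    """Compare successive gesture contours and set per-axis direction flags
    in {-1, 0, +1}. The thresholds are illustrative, not the patent's."""
    def centroid(c):
        m = cv2.moments(c)
        return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])

    dx, dy = centroid(cur_contour) - centroid(prev_contour)
    # Area growth stands in for the palm moving closer to the camera (z axis)
    dz = cv2.contourArea(cur_contour) - cv2.contourArea(prev_contour)

    flag = np.zeros(3, dtype=int)
    if abs(dx) > min_shift: flag[0] = int(np.sign(dx))
    if abs(dy) > min_shift: flag[1] = int(np.sign(-dy))  # image y grows downward
    if abs(dz) > 500:       flag[2] = int(np.sign(dz))   # illustrative threshold
    return flag
```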
Step 13.8: Apply the direction vector to the virtual model to produce the force simulation.
After the dynamic gesture has been classified from its contour, a force-simulation operation is applied to the virtual model according to the judgment result. Based on the value of the direction flag set during contour comparison, the model's coordinates in three-dimensional space are multiplied along the x, y, and z axes; the change in coordinate values moves the model and thereby simulates the applied force.
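A minimal sketch of this axis-wise coordinate multiplication, with illustrative scale factors (the patent gives no numeric values):

```python
import numpy as np

def apply_gesture_force(position, direction):
    """Multiply the model's coordinates axis-by-axis according to the
    direction flags from step 13.7 (each flag in {-1, 0, +1})."""
    direction = np.asarray(direction, dtype=float)
    scale = np.where(direction > 0, 1.05,           # push along positive axis
            np.where(direction < 0, 0.95, 1.0))     # pull along negative axis
    return position * scale

pos = np.array([10.0, 20.0, 100.0])
pos = apply_gesture_force(pos, (0, 0, -1))  # palm moving closer: z shrinks
```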
In this embodiment, a set of gestures is demonstrated: the palm moving from far to near, moving from bottom to top, and twisting in various directions, each producing a different simulated force on the model; the gesture-driven model correspondingly moves forward, moves upward, or responds to forces in the various directions of the hand's twist. This function demonstrates the interaction between dynamic gestures and the virtual model; the interaction helps the user observe the model from multiple angles and realizes interplay between the teaching content and the user, adding interest.
Of course, the above description does not limit the present invention, nor is the present invention limited to the above examples; changes, modifications, additions, or substitutions made by those skilled in the art within the essential scope of the present invention also fall within the protection scope of the present invention.