KR20230054522A

KR20230054522A - Augmented reality rehabilitation training system applied with hand gesture recognition improvement technology

Info

Publication number: KR20230054522A
Application number: KR1020210137209A
Authority: KR
Inventors: 태기식; 송근산; 이현주
Original assignee: 건양대학교산학협력단
Priority date: 2021-10-15
Filing date: 2021-10-15
Publication date: 2023-04-25
Anticipated expiration: 2041-10-15
Also published as: KR102665856B1

Abstract

The present invention relates to an augmented reality rehabilitation training system with improved hand gesture recognition. It uses infrared-image-based hand gesture recognition with a Leap Motion controller, together with voice recognition, to engage patients' interest in augmented reality rehabilitation training while greatly improving hand gesture recognition accuracy and training efficiency.

Description

Augmented reality rehabilitation training system applied with hand gesture recognition improvement technology

The present invention relates to a rehabilitation training system and, more particularly, to an augmented reality rehabilitation training system with improved hand gesture recognition. The system uses infrared-image-based hand gesture recognition with a Leap Motion controller, together with voice recognition, to engage patients' interest in augmented reality rehabilitation training while greatly improving hand gesture recognition accuracy and training efficiency.

Various studies in the field of cognitive rehabilitation have shown that the brain can recover through repetitive, intensive, task-oriented training. However, the exercises traditionally required for rehabilitation have patients repeat specific actions over and over, so patients with brain injuries commonly find them tedious.

Patient motivation is therefore an important factor in the success of rehabilitation and often affects its outcome, so it is important for therapists to find ways to encourage patients to participate actively in rehabilitation programs.

However, the greatest difficulty these treatment methods face in the market is that they necessarily require the guidance of a rehabilitation specialist or entail high treatment costs.

Recently, with active interest in and research on promising technologies of the Fourth Industrial Revolution, metaverse technology has drawn attention. Its application is being steadily studied not only in daily life but also in media, culture, medicine, and industry, and its effectiveness is being demonstrated. Cognitive rehabilitation interventions that use metaverse-based digital content are expected to offer a solution to the difficulties caused by cost and by the lack of specialists mentioned above.

In rehabilitation medicine in particular, efforts to incorporate metaverse elements into computerized content for rehabilitation training have been reported to help patients stay interested and to increase the training effect. Going forward, attention is focused on strengthening the input mechanism with artificial intelligence, and on the results of system research and development, in order to maximize this training effect.

Virtual reality has recently sought to maximize its functionality by designing, developing, and building internal content suited to virtual reality, without relying on external auxiliary hardware devices in the output mechanism or the input mechanism. These efforts reflect a growing need to move away from metaverse implementations that depend solely on external auxiliary devices, to strengthen the substance and development standards of suitable dedicated content, and to pursue related research such as software processing of the input mechanism.

However, the input mechanisms of metaverse-based rehabilitation content currently under development use off-the-shelf controllers such as touch screens (a) and joysticks (b), or their usage is far removed from actual daily-life movements and is not intuitive. To address this, the input mechanism needs to be improved so that patients are motivated to engage with the content.

In addition, having to learn the controller of each product every time a different piece of computerized rehabilitation content is used inevitably burdens both patients and therapists. Elderly patients, who are the main users of cognitive rehabilitation, are unfamiliar with how input mechanisms such as touch screens and joysticks operate and may quickly lose interest.

The requirements for applying a content-based intervention in clinical practice are accessibility and simplicity. Improving the content so that it interests subjects and lets them participate actively in rehabilitation can greatly help motivate patients undergoing content-based rehabilitation therapy.

Korean Registered Patent No. 10-1519808 (2015.05.06)

The present invention was conceived in view of the above background and needs. Its object is to provide an augmented reality rehabilitation training system with improved hand gesture recognition that uses infrared-image-based hand gesture recognition with a Leap Motion controller, together with voice recognition, to engage patients' interest in augmented reality rehabilitation training while greatly improving hand gesture recognition accuracy and training efficiency.

To achieve the above object, the present invention comprises: a body part consisting of a lower frame with a fixing part formed inside it, an upper frame located above the lower frame and provided with a cradle, and vertical frames connecting the lower frame and the upper frame so that a motion area for hand gestures is formed between them; an image sensing module installed on the fixing part to capture hand gestures in the motion area; an output module coupled to the cradle and having a display unit that outputs images and a sound output unit that outputs sound; and a control module having a storage unit that stores guidance voices for rehabilitation training, rehabilitation training content, and the content images, user interface, and content sound sources corresponding to the content, a motion recognition unit that recognizes the hand gestures captured by the image sensing module, an image processing unit that applies the recognized hand gestures to the content image and outputs the result, an evaluation unit that evaluates the recognized hand gestures against the content, and a feedback unit that outputs the evaluation result as video or audio.

Preferably, the motion recognition unit tracks hand movement through the Leap Motion controller, reflects the 3D position and rotation values of the hand model in a 3D virtual environment, and obtains the relative angle of each finger segment.

The system may further include a voice input module having a voice input unit that receives speech from the user and a voice recognition unit that analyzes the input speech and converts it into text information. In that case, the control module may further include a voice interface unit that allows content to be selected and advanced by means of the text information.

The rehabilitation training content may also be configured to play a sound source while displaying a score, subtitles, and an interaction target object, and, when the user claps in time with the interaction target object, to recognize the hand position and the clapping motion and award points along with a sound effect.

The rehabilitation training content may also be configured so that a plurality of animal characters are presented as objects, the selected object is changed along with a guidance voice, and, after the user activates voice recognition with a set hand gesture, points are awarded along with a sound effect if the recognition result of the user's spoken input matches the correct answer.

The rehabilitation training content may also be configured so that a plurality of positions are presented with name tags as recycling categories, objects to be sorted are randomly generated on a virtual desk, and a hand model is displayed that follows the hand motion of grasping an object and moving it to a designated position in accordance with the guidance voice. If the name tag matches the attribute of the waste object, the answer is treated as correct and points are awarded along with a sound effect.

The rehabilitation training content may also be configured to present a set number of items and boxes and to output a voice instruction to find the item associated with a given subject and put it in a box. If the user picks up the selected item and places it in the correct box, points are awarded along with voice feedback; if the action is wrong, voice feedback is given and the content moves on to the next question. This process is repeated.

By software means, the present invention can maintain the patient's interest while resolving the sense of disconnect felt in learning or using the system, and it can also recognize hand gestures accurately.

FIG. 1 is a block diagram showing the configuration and connections according to an embodiment of the present invention;
FIG. 2 is a perspective view showing the external appearance according to an embodiment of the present invention;
FIG. 3 is a flowchart of the detection of clapping and grasping motions according to an embodiment of the present invention;
FIG. 4 is a hierarchical structure for hand gesture recognition according to an embodiment of the present invention;
FIG. 5 is a progress scenario of content 1 according to an embodiment of the present invention;
FIG. 6 is the design and goals of content 1 according to an embodiment of the present invention;
FIG. 7 is a system design layout of content 1 according to an embodiment of the present invention;
FIG. 8 is an in-system screenshot of content 1 according to an embodiment of the present invention;
FIG. 9 is a progress scenario of content 2 according to an embodiment of the present invention;
FIG. 10 is the design and goals of content 2 according to an embodiment of the present invention;
FIG. 11 is a system design layout of content 2 according to an embodiment of the present invention;
FIG. 12 is an in-system screenshot of content 2 according to an embodiment of the present invention;
FIG. 13 is a progress scenario of content 3 according to an embodiment of the present invention;
FIG. 14 is the design and goals of content 3 according to an embodiment of the present invention;
FIG. 15 is a system design layout of content 3 according to an embodiment of the present invention;
FIG. 16 is an in-system screenshot of content 3 according to an embodiment of the present invention;
FIG. 17 is a progress scenario of content 4 according to an embodiment of the present invention;
FIG. 18 is the design and goals of content 4 according to an embodiment of the present invention;
FIG. 19 is a system design layout of content 4 according to an embodiment of the present invention;
FIG. 20 is an in-system screenshot of content 4 according to an embodiment of the present invention;
FIG. 21 is the predefinition of the hand gestures to be measured according to an experimental example of the present invention;
FIG. 22 is a comparison of methods for obtaining segment angles as a function of object angle according to an experimental example of the present invention;
FIG. 23 is an overview of the CNN model blocks used in the experiment of the present invention;
FIG. 24 is the model layer definition and structure according to an experimental example of the present invention;
FIG. 25 is a measurement table showing recognition accuracy per hand gesture using Local transform euler;
FIG. 26 is a measurement table showing recognition accuracy per hand gesture using Quaternion to euler;
FIG. 27 is a measurement table showing recognition accuracy per angle using Local transform euler;
FIG. 28 is a measurement table showing recognition accuracy per angle using Quaternion to euler;
FIG. 29 is the result of the model re-recognition response time experiment according to an experimental example of the present invention; and
FIG. 30 is a confusion matrix from the hand gesture recognition accuracy experiment according to the present invention.

Hereinafter, the configuration of the augmented reality rehabilitation training system with improved hand gesture recognition according to the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration and connections according to an embodiment of the present invention, and FIG. 2 is a perspective view showing its external appearance. The main components of the present invention are the body part 100, which forms the frame of the system, and, for augmented reality rehabilitation training, the image sensing module 140, the voice input module 150, the output module 160, and the control module 170.

First, the body part 100 consists of a lower frame 110 with a fixing part formed inside it, an upper frame 120 located above the lower frame 110 and provided with a cradle, and vertical frames 130 that connect the lower frame 110 and the upper frame 120 so that a motion area 131 for hand gestures is formed between them.

The lower frame 110 is a box with an open top. A fixing part 111 for mounting the image sensing module 140, described later, is formed at its inner center, and the vertical frames 130 are coupled to both sides so that a workstation corresponding to the motion area 131 is formed above it. The upper frame 120, on which the cradle 121 is formed, is connected to the upper ends of the vertical frames 130. The frame connections must be durable, so they are supported with L-shaped brackets and additionally fastened with bolts from the side.

The lower frame 110 is also provided with a separate cable hole for passing the USB cable that connects the image sensing module 140 to the PC, that is, to the control module.

The image sensing module 140 is installed on the fixing part and captures hand gestures in the motion area. In the embodiment of the present invention, the image sensing module 140 is implemented with a Leap Motion controller.

The Leap Motion controller is an infrared camera module for hand gesture recognition. As a single device, it is specialized for tracking the angle and movement of the hand and the degree to which the fingers are bent. The embodiment of the present invention was built with the Leap Motion 4.2 Orion SDK, and the data was acquired and mapped in Unity with the device connected in its desktop configuration. The controller is a small USB device placed on the desktop, and its tracking precision has been shown to be remarkably good, with a standard deviation of less than 1 mm. Although the camera cannot see occluded parts of the hand, its algorithm detects several typical hand gestures and can provide useful tracking on its own. Its size, weight, and mode of use are well suited to laboratory work, which makes it appropriate for the present invention.

The voice input module 150 consists of a microphone-based voice input unit 151 that receives speech from the user and a voice recognition unit 152 that analyzes the input speech and converts it into text information.

In the embodiment of the present invention, the Google speech to text API was used for Korean speech recognition. Through speech recognition customization, the Google speech to text API accepts hints that help it transcribe domain-specific terms and rarely used words and that improve transcription accuracy for specific words or phrases. It can use classes to convert spoken numbers automatically into addresses, years, currencies, and the like. It applies Google's automatic speech recognition (ASR) deep learning neural network algorithms, which allows a product to offer a better user experience through voice commands, and it can transcribe content either in real time or from stored files.

The output module 160 is coupled to the cradle 121 and consists of a display unit 161 that outputs images and a sound output unit 162 that outputs sound. In other words, the display unit 161, corresponding to a monitor, and the sound output unit 162, corresponding to a speaker, are fixed to the cradle 121. The image sensing module 140 sits on the fixing part 111 inside the box-shaped lower frame 110, below the upper frame 120, and captures hand movements within the motion area (workstation) bounded by the left and right vertical frames.

The control module 170 is connected to the image sensing module 140, the voice input module 150, and the output module 160 and runs the rehabilitation training through the rehabilitation training content. It is implemented as a PC or a terminal with equivalent functions, and it comprises a storage unit 171, a motion recognition unit 172, a voice interface unit 173, an image processing unit 174, an evaluation unit 175, and a feedback unit 176.

The storage unit 171 is a memory or storage medium that stores the guidance voices for rehabilitation training, the rehabilitation training content, and the content images, user interface, and content sound sources corresponding to the content. It may also store various other content and related data for conducting rehabilitation training through images, sound, and augmented reality.

The motion recognition unit 172 recognizes the hand gestures captured by the image sensing module. Because the image sensing module 140 is implemented with a Leap Motion controller, as noted above, the motion recognition unit 172 tracks hand movement through the Leap Motion controller, reflects the 3D position and rotation values of the hand model in a 3D virtual environment, and obtains the relative angle of each finger segment.

FIG. 3 shows a flowchart for detecting clapping and grasping motions according to an embodiment of the present invention.

In the present invention, grasp detection and clap detection are sub-methods of the interaction algorithm for picking up objects. Grasp detection feeds a binary variable, and clap detection raises a trigger.

The Leap Motion device tracks hand movement and returns the 3D position and rotation values of the hand model to the 3D virtual environment. This 3D hand model is a 3D object containing position and rotation information for a total of 20 individual objects, corresponding to four joints per finger, and it can interact flexibly with other objects in the same scene.
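The hand model described above can be pictured as a small data structure. The following is only an illustrative Python sketch, not the Leap Motion SDK's or Unity's actual types: the class names `Joint` and `HandModel`, the finger names, and the use of Euler angles for rotation are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Assumed finger naming; the source only states "four joints per finger".
FINGERS = ("thumb", "index", "middle", "ring", "pinky")
JOINTS_PER_FINGER = 4


@dataclass
class Joint:
    """One tracked joint: a position and a rotation (hypothetical layout)."""
    position: tuple = (0.0, 0.0, 0.0)   # x, y, z in the virtual scene
    rotation: tuple = (0.0, 0.0, 0.0)   # Euler angles in degrees (assumption)


@dataclass
class HandModel:
    """A hand as 5 fingers x 4 joints = 20 individually addressable joints."""
    joints: dict = field(default_factory=lambda: {
        (finger, index): Joint()
        for finger in FINGERS
        for index in range(JOINTS_PER_FINGER)
    })

    def update(self, finger, index, position, rotation):
        # Overwrite one joint with the latest tracked values.
        self.joints[(finger, index)] = Joint(position, rotation)


hand = HandModel()
hand.update("index", 3, (0.1, 0.2, 0.3), (0.0, 45.0, 0.0))
```

Keying the joints by `(finger, index)` keeps each of the 20 objects separately addressable, which is what later collision and angle checks need.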

However, the device does not separately provide labeling that recognizes and classifies hand shapes, nor activation functions that give feedback for gesture-driven interaction. It provides only a limited set of gesture labels such as Pinch, Grab, and Palm direction. Actions that rely on collision parameters, rotation speed detection, left/right snaps, and the like must therefore be implemented by adding functions before virtual objects can be picked up or functions such as left/right snaps and pinch-to-zoom can be performed.

In addition, Leap Motion itself provides only the model; hand gesture recognition requires additional labeling and definition functions. For example, to render a 3D model of a hand closing or opening to pick up an object, it is enough to find the joint positions through Leap Motion and map them onto the model. To interact with objects inside the content, however, collision checks and the actions that accompany collision processing must be implemented and specified.

For this reason, a collider with physical properties, which serves as the medium of collision, is added to the palm and to each fingertip joint. This is referred to below as a collision box. Adding collision boxes allows collision checks against the collision boxes of interactive objects or of the objects that make up the virtual environment.

Adding a fixed joint component allows the contact point between a collision box in the hand model and another interactive object to be set as the origin of that interactive object. The hand then acts as the parent object for the interactive object's position, so that the object appears to be picked up and carried. The sequence of operations under this principle is as follows: first, collision boxes are added to the 3D hand model; then, when a collision box of the 3D hand model collides with the collision box of another interactive object, the tag variable of that interactive object is checked.

A tag variable is a label-like variable that returns the identification information of an object as a string. If the tag variable identifies the object as graspable, the angle of the thumb and index finger is checked. If it is at or above a certain angle, a fixed joint component, which is a component that fixes the internal skeleton of a 3D object's model, is created to form the contact point. While a fixed joint component exists, it is released when the angle of the thumb and index finger falls below the set angle.

As a result, grasp detection and clap detection can each be expressed as a binary variable, with clap detection handled through a trigger variable that returns to a standby state after the measurement. The difference between the two is that a trigger variable passes its value to the event handler and is then cleared, so that once the function has run it returns to standby and can fire again, whereas a binary variable keeps its state. Because of this, clap detection can use a trigger variable to handle the event exactly once, at the moment the hands clap.
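The distinction between a trigger variable and a binary variable can be illustrated with a minimal Python sketch. The `Trigger` class and its `fire` and `consume` methods are hypothetical names for this example; the source does not describe the actual implementation, which runs inside Unity.

```python
class Trigger:
    """A one-shot flag: reading it clears it, so an event is handled once."""

    def __init__(self):
        self._fired = False

    def fire(self):
        # Called by the detector at the moment the event occurs.
        self._fired = True

    def consume(self):
        # Returns True once per fire(), then falls back to standby (False).
        fired = self._fired
        self._fired = False
        return fired


# A binary variable, by contrast, is a plain state that persists until changed.
is_grasping = False

clap = Trigger()
clap.fire()          # the hands clap
is_grasping = True   # the hand closes around an object

first_read = clap.consume()    # the clap event is delivered
second_read = clap.consume()   # already consumed: back in standby
still_grasping = is_grasping   # the state is unchanged by reading it
```

Reading the trigger resets it, so a handler polled every frame plays the clap sound once, while the grasp state can be read any number of times without changing.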

The existing method of detecting a grasp relies on the pinch gesture, so it favors grasping with the thumb and index finger. When a real hand grasps something, a point of action and a fulcrum necessarily exist. Accordingly, even for a small object, the system was adjusted so that a grasp is registered when the object is in contact with two or more hand objects and the interior angle between them is 90 degrees or less. Hand objects are identified by tag. A handler function on each graspable object increments a count by 1 when an object with the hand tag makes contact, and decrements the count by 1 when a hand object that previously incremented it leaves contact.

When the count is 2 or more, the angle of the hand object that first incremented the count is taken as the reference; if the interior angle to a hand object that incremented the count afterwards is 90 degrees or less, the motion is judged to be a grasp. If collision calculations kept running at this point, the resulting forces could drive the animation in the wrong direction and make the hand object jitter. The rigid-body physics component of the grabbable object is therefore disabled, and a fixed joint component is created to attach the object to the hand object. Collision calculations can still run without the rigid body, so decrementing the count when contact ends is unaffected.
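The contact-count rule described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the class name Grabbable, the callback names, and the choice to represent each hand object by a 3D direction vector are assumptions made for the example, and the physics steps (disabling the rigid body, creating the fixed joint) are left out.

```python
import math


def interior_angle_deg(u, v):
    """Angle in degrees between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


class Grabbable:
    """Hypothetical grabbable object that tracks touching hand objects."""

    def __init__(self, max_angle_deg=90.0):
        self.max_angle_deg = max_angle_deg
        self.contacts = []  # (hand part name, direction), in order of contact

    def on_contact_enter(self, part, direction):
        # A hand-tagged object touched: the count goes up by one.
        self.contacts.append((part, direction))

    def on_contact_exit(self, part):
        # The hand object left: the count goes down by one.
        self.contacts = [c for c in self.contacts if c[0] != part]

    def is_grabbed(self):
        # Needs at least two contacts; the first contact is the reference.
        if len(self.contacts) < 2:
            return False
        first_dir = self.contacts[0][1]
        return any(
            interior_angle_deg(first_dir, d) <= self.max_angle_deg
            for _, d in self.contacts[1:]
        )


cup = Grabbable()
cup.on_contact_enter("thumb", (1.0, 0.0, 0.0))
cup.on_contact_enter("index", (0.5, 0.5, 0.0))  # 45 degrees from the thumb
print(cup.is_grabbed())
```

Keeping the contacts in arrival order makes the first contact the reference, as in the description; a count alone would not record which hand object arrived first.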

Clap detection can be implemented far more simply: a collider is added to the palm model, and a sound effect is played once only when a collision begins. Where needed, resetting the flag when the collision ends improves stability.
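The difference between a trigger variable and a binary variable, and the once-per-clap handling, can be illustrated with a short sketch. The class and method names (ClapTrigger, GrabState, on_collision_enter, on_collision_exit) are hypothetical and do not refer to any particular engine API.

```python
class ClapTrigger:
    """Trigger-style variable: fires once per collision start, then re-arms."""

    def __init__(self, on_clap):
        self.on_clap = on_clap
        self._armed = True

    def on_collision_enter(self):
        # Deliver the event once, then stay silent until re-armed.
        if self._armed:
            self._armed = False
            self.on_clap()

    def on_collision_exit(self):
        # Resetting the flag when the palms separate returns to standby.
        self._armed = True


class GrabState:
    """Binary variable: keeps its value until it is explicitly changed."""

    def __init__(self):
        self.grabbing = False

    def update(self, is_grabbing):
        self.grabbing = is_grabbing
        return self.grabbing


sounds = []
clap = ClapTrigger(on_clap=lambda: sounds.append("clap"))
clap.on_collision_enter()
clap.on_collision_enter()  # repeated contact callback, ignored
clap.on_collision_exit()
clap.on_collision_enter()
print(sounds)
```

The grasp state is polled as a level, while the clap is consumed as an edge. This is why only the clap needs the reset on collision exit.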

In addition, for hand gesture labeling, the present invention implements a method that collects hand angles so that grasp detection and clap detection can receive and process the angle of each finger, and the system is designed to recognize every conceivable hand gesture. This is made possible by obtaining relative angles with respect to the palm model, and the predefined hand gestures are obtained through a label check.

The angle of a finger segment can be read as an Euler angle of the local object; it is used to obtain, in an intuitive form, the rotation and interpolation of the segment about the (x, y, z) axes after the hand model has been recognized and rendered. For example, if the Euler angles in (x, y, z) axis order are (γ, β, α), the segment is first rotated by γ about the x-axis, then by β about the y′-axis of the rotated frame (x′, y′, z′), and finally by α about the z″-axis of the twice-rotated frame (x″, y″, z″). The Euler angles thus express the composition of all three rotations as a single rotation, which can be written in matrix form as [Equation 1].

[Equation 1]
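The equation itself is not reproduced in this text. Under the rotation order stated above (γ about x, then β about y′, then α about z″), and assuming column vectors and right-handed elementary rotations, the composite rotation would take the following standard form. This is a reconstruction from the surrounding definitions, not necessarily the exact notation of the original equation. Because each rotation is taken about an axis of the already rotated frame, the elementary matrices multiply on the right.

```latex
R = R_x(\gamma)\,R_y(\beta)\,R_z(\alpha)
  = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & -\sin\gamma \\ 0 & \sin\gamma & \cos\gamma \end{pmatrix}
    \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix}
    \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}
  = \begin{pmatrix}
      c_\beta c_\alpha & -c_\beta s_\alpha & s_\beta \\
      c_\gamma s_\alpha + s_\gamma s_\beta c_\alpha & c_\gamma c_\alpha - s_\gamma s_\beta s_\alpha & -s_\gamma c_\beta \\
      s_\gamma s_\alpha - c_\gamma s_\beta c_\alpha & s_\gamma c_\alpha + c_\gamma s_\beta s_\alpha & c_\gamma c_\beta
    \end{pmatrix}
```

Here c and s abbreviate cosine and sine of the subscripted angle. When β is ±90 degrees the first and third rotations act about the same axis, which is the loss of an axis known as gimbal lock.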

However, as is well known, Euler angles can suffer from gimbal lock arising from the hierarchy of rotations: an axis is lost and the correct rotation angle may not be reproduced. By designing a hierarchical structure and reading quaternion rotation angles, complete segment angles can be obtained without losing an axis.

Hand gestures can be distinguished by correctly specifying the rotation origin for each finger when measuring. In other words, the angle is measured relative to the immediate parent object rather than relative to the world origin.

To this end, each fingertip segment must be set to inherit the 3D information of its immediate parent segment, and measurement must use a standardized object structure that ultimately connects to the palm model. If the model was not built according to this object structure, the inheritance order of the 3D information of the model's skeleton must be set using a program such as 3ds Max or Maya; alternatively, the relative angle can be obtained within the 3D content by subtracting the global angles measured with respect to the origin.
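Measuring a segment relative to its parent instead of the world origin can be sketched with quaternions. In this sketch the "subtraction" of global rotations is carried out by multiplying with the conjugate of the parent's rotation. The helper names are hypothetical, quaternions are (w, x, y, z) tuples of unit length, and the example pose is made up for illustration.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def axis_angle(axis, angle_deg):
    """Unit quaternion for a rotation of angle_deg about a unit axis."""
    half = math.radians(angle_deg) / 2.0
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


def local_rotation(parent_global, child_global):
    """Rotation of the child expressed in the parent's frame."""
    return q_mul(q_conj(parent_global), child_global)


def rotation_angle_deg(q):
    """Total rotation angle of a unit quaternion, in [0, 180] degrees."""
    w = max(-1.0, min(1.0, abs(q[0])))
    return math.degrees(2.0 * math.acos(w))


# Made-up pose: the finger joint is flexed 30 degrees about its own z-axis,
# while the wrist (parent) is turned by different amounts about the y-axis.
flexion = axis_angle((0.0, 0.0, 1.0), 30.0)
for wrist_deg in (0.0, 70.0, 200.0):
    parent = axis_angle((0.0, 1.0, 0.0), wrist_deg)
    child = q_mul(parent, flexion)  # child's global rotation
    print(wrist_deg, round(rotation_angle_deg(local_rotation(parent, child)), 6))
```

The recovered joint angle does not change with the wrist rotation, which is the property the hierarchical structure is meant to provide.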

FIG. 4 shows a hierarchical structure for hand gesture recognition according to an embodiment of the present invention. Obtaining the relative angle of each finger segment in this way yields intact object angle values regardless of wrist rotation or upper-arm angle. Because of the gimbal lock inherent in Euler angles, unless the hierarchy is set up as above, a finger segment cannot rotate more than 90 degrees about the z-axis relative to its parent segment.

The hierarchical structure provides stable axes and angles, so the angle of each segment can be obtained reliably and a variety of hand gestures can be recognized. More precisely, because an angle can be obtained for every finger segment, every hand gesture defined through labeling can be distinguished. Reading angles directly from the quaternion rotation has the drawback that, as one axis rotates, the rotation values of the other axes are also adjusted by the rotation calculation, which makes accurate tracking difficult and prevents an intuitive angle from being inferred. It is therefore judged less efficient for obtaining segment angles than the hierarchical design adopted here for converting the rotation and calculation order.

The voice interface unit 173 allows content to be selected and advanced through the text information. Because input means such as a keyboard, mouse, or touch screen can reduce accessibility for rehabilitation patients, motion input and content progression are driven by voice input.

The image processing unit 174 creates the augmented reality environment: it outputs an augmented reality image by compositing the hand gesture recognized by the motion recognition unit onto the content image.

The evaluation unit 175 evaluates the recognized hand gesture against the content, in particular whether a given hand gesture was performed correctly and with the correct timing.

The feedback unit 176 outputs the evaluation result of the evaluation unit as video or audio so that the user can tell whether the correct motion was performed or whether the motion or its timing needs correction.

The present invention is described using four rehabilitation contents as embodiments.

In a first embodiment, the rehabilitation training content plays a sound source while displaying a score, subtitles, and an interaction target object; when the user claps in line with the interaction target object, the hand position and the clap are recognized and a score is awarded together with a sound effect.

FIG. 5 shows the progress scenario of Content 1 according to an embodiment of the present invention, FIG. 6 its design and goals, FIG. 7 its system design layout, and FIG. 8 an in-system screenshot.

The first augmented reality-based hand gesture recognition rehabilitation content using Leap Motion developed in the present invention targets attention training and executive function training. It trains the process of performing an appropriate, previously learned motion according to the situation presented.

To give the content a plausible setting, motivation is provided through a scenario in which the user must clap along while a song and video are playing.

The content is characterized by implementing rehabilitation content that uses hand gestures common in everyday life. Among the cognitive rehabilitation activity items surveyed beforehand, it targets attention training and executive function training: the user rapidly classifies visually presented information and deliberately produces a previously learned motor response.

The content proceeds as follows. On an initial screen offering three songs, the user selects a song and presses the start button, and the UI shown in FIGS. 7 and 8 is displayed. The UI includes text views for the score and the subtitles; a random video plays in the background, and interaction target objects appear on the left or right in time with the music.

When the user moves a hand to the interaction target object shown on the screen and claps, the hand position and the clap are recognized and the user earns points together with a sound effect.

In a second embodiment, the rehabilitation training content presents a plurality of animal characters as objects and, together with a guidance voice, changes a selected object; the user activates voice recognition with a set hand gesture, and if the voice recognition result matches the correct answer, a score is awarded together with a sound effect.

FIG. 9 shows the progress scenario of Content 2 according to an embodiment of the present invention, FIG. 10 its design and goals, FIG. 11 its system design layout, and FIG. 12 an in-system screenshot.

Content 2 is short-term concept memory training content: virtual reality-based voice recognition rehabilitation content that uses the Google Speech-to-Text API and targets attention training and memory training.

Virtual objects are placed in the scene and, after a set time, one randomly chosen object moves. The scenario asks the user to pick the name of the object that moved from the options presented.

Content 2 is characterized by implementing rehabilitation content that can be controlled by voice recognition. Among the cognitive rehabilitation activity items surveyed beforehand, it targets verbal recall training and attention training: the user memorizes information previously presented visually on the display, attends to the screen to detect what has changed, and selectively recalls it.

The UI shows a score display and four moving objects, rendered as 3D animal characters, that demand a degree of attention and memory activity. When the content starts, a guidance voice announces that one animal is running away and asks the user to remember it, and one of the four animals moves. Shortly afterwards, the user makes a fist to activate voice recognition, says which animal moved, and opens the fist again to output the recognized result. The UI then presents three options including the correct animal. If the voice recognition result matches the correct answer, the user earns points together with a sound effect.
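The fist-gated voice input described above can be sketched as a small state machine. The sketch makes no call to any real speech service: the recognize callable is a hypothetical stand-in for the speech-to-text step, and the class and method names are assumptions made for the example.

```python
class VoiceAnswerGate:
    """Closing the fist starts listening; opening it submits the answer."""

    def __init__(self, recognize, correct_answer):
        self.recognize = recognize  # hypothetical: list of chunks -> text
        self.correct_answer = correct_answer
        self.recording = False
        self.buffer = []
        self.score = 0

    def on_audio(self, chunk):
        # Audio is kept only while the fist is closed.
        if self.recording:
            self.buffer.append(chunk)

    def on_fist(self, closed):
        if closed and not self.recording:
            self.recording = True
            self.buffer = []
            return None
        if not closed and self.recording:
            self.recording = False
            text = self.recognize(self.buffer)
            if text == self.correct_answer:
                self.score += 1  # a sound effect would accompany the score
            return text
        return None


def fake_recognizer(chunks):
    """Stand-in recognizer: joins text chunks instead of decoding audio."""
    return "".join(chunks)


gate = VoiceAnswerGate(fake_recognizer, correct_answer="rabbit")
gate.on_audio("noise")   # ignored, fist is open
gate.on_fist(True)
gate.on_audio("rab")
gate.on_audio("bit")
print(gate.on_fist(False), gate.score)
```

Gating the microphone on a hand gesture keeps speech spoken outside the answer window from being scored.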

In a third embodiment, the rehabilitation training content presents a plurality of positions with name tags as recycling categories, randomly generates objects to be sorted on a virtual desk, and renders a hand model that follows the user's hand as the user, following a guidance voice, grasps an object and moves it to the designated position; if the name tag matches the attribute of the waste object, the answer is treated as correct and a score is awarded together with a sound effect.

FIG. 13 shows the progress scenario of Content 3 according to an embodiment of the present invention, FIG. 14 its design and goals, FIG. 15 its system design layout, and FIG. 16 an in-system screenshot.

Content 3 is the third virtual reality-based hand gesture recognition rehabilitation content using Leap Motion. It targets organization and conceptualization training and trains the process of sorting the listed objects according to the categories presented in front of the user.

The scenario is sorting waste for recycling into the appropriate categories. Content 3 is a program in which the hand is modeled with the Leap Motion Controller and rendered in the content, and objects can be picked up using the grasp detection described in the algorithm above. The UI consists of a score display and three goal positions into which objects can be placed, labeled plastic, can, and glass; objects to be sorted are generated at random on the virtual desk in front.

When the content starts, a guidance voice asks the user to sort the waste into the appropriate bins and objects are generated on the desk. When the user moves the hand model and makes a fist to grasp, the object is picked up inside the VR content and can be moved.

The user then carries the object to the appropriate bin and opens the fist to release it, which drops it into the bin. If the name tag on the bin matches the attribute of the waste object, the answer is treated as correct and the user earns points together with a sound effect.

In a fourth embodiment, the rehabilitation training content presents a set number of objects and a box and outputs a voice instruction to find the object related to a given target and put it in the box. If the selected object is picked up and placed in the box correctly, a score is awarded together with voice feedback; if the action is incorrect, the content moves on to the next problem with voice feedback. This process is repeated.

FIG. 17 shows the progress scenario of Content 4 according to an embodiment of the present invention, FIG. 18 its design and goals, FIG. 19 its system design layout, and FIG. 20 an in-system screenshot.

Content 4, the organization and conceptualization training content, is the fourth virtual reality-based hand gesture recognition rehabilitation content using Leap Motion. The user must pick up the object that belongs to the same category as the target guide picture presented and place it in the box.

Among the cognitive rehabilitation activity items surveyed beforehand, Content 4 targets organization training and conceptualization training. After receiving information visually from the display, the user must work out how it relates to the categories of the objects presented in front.

When training starts, a picture is presented in front of the user and a voice instruction asks the user to find the related object among those in view and put it in the box. If the user picks up an object and places it in the box correctly, the score increases with voice feedback; if the placement is wrong, the content moves on to the next problem with voice feedback. This process is repeated 10 times in total.

<Experimental Example>

To examine the practical effectiveness of the input mechanisms described above for the rehabilitation training system, namely the algorithm based on the segment hierarchy and the improved grasp detection, hand gesture recognition accuracy was measured in two cases and compared: when the quaternion calculation is used without conversion, and when the algorithm applying the Euler angle conversion and the hierarchical structure is used.

In addition, to test the effectiveness of the rehabilitation program contents, the post-processing speed after clap detection and the response speed of voice recognition following grasp detection were measured.

In addition, an experiment was conducted to measure how high a hand gesture recognition accuracy is achieved when CNN-based spectrogram analysis is used in recognizing and labeling hand gestures.

Experiment 1 evaluated hand gesture recognition accuracy by repeating each pre-labeled hand gesture 30 times and checking how accurately the recognized gesture matched the gesture the user intended.

Experiment 2 measured, 50 times, the time taken for a deactivated hand model to return after a clap was detected, and determined the average.

Experiment 3 was divided into Experiments 3-1 and 3-2. Just as existing inventions using an SVM selected specific gestures, five gestures with high accuracy were selected by the same criterion.

In Experiment 3-1, data were collected 10 times for the five gestures; in each session one row was recorded every 0.1 seconds for 10 seconds, giving data with 1,000 rows.

In Experiment 3-2, data were collected 10 times for the five gestures; in each session one row was recorded every 0.1 seconds for 1 second, giving data with 10 rows. The CNN model was then trained and its accuracy was checked in a preliminary test.

- Experiment 1: Hand gesture recognition accuracy

Eight hand gestures were predefined and distinguished from the input finger segment angles, with the target gesture shown in the upper-right UI. The currently recognized gesture is shown in the upper-left UI; if the user makes the indicated gesture within 5 seconds, the score count in the upper-right UI increases, so the number of correct gestures can be checked. When the count increases because the correct gesture was made within 5 seconds, or when the indicated gesture is not made within 5 seconds, a notification sound plays and the trial counter in the upper right increases.

If the correct gesture is not made within the 5-second limit, only the trial count increases and the score does not. FIG. 21 shows the predefined hand gestures measured in the experimental example of the present invention. Each gesture was tested 30 times: the user intended one of the predefined gestures shown in FIG. 21, and it was checked whether that gesture was recognized correctly.
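The scoring rule of Experiment 1 can be summarized in a short sketch. The function name, the data layout, and the sample values below are hypothetical and serve only to illustrate the counting; they are not measurements from the experiment.

```python
def score_trials(trials, time_limit_s=5.0):
    """Count correct gestures and trials under the 5-second rule.

    Each trial is (target_label, samples), where samples is a list of
    (elapsed_seconds, recognized_label) pairs observed during the trial.
    """
    score = 0
    attempts = 0
    for target, samples in trials:
        attempts += 1  # the trial counter increases either way
        hit = any(t <= time_limit_s and label == target for t, label in samples)
        if hit:
            score += 1  # the score increases only on a timely match
    return score, attempts


# Illustrative values only.
trials = [
    ("A", [(0.8, "B"), (1.9, "A")]),   # matched within the limit
    ("C", [(2.0, "D"), (5.4, "C")]),   # matched too late
    ("E", [(1.0, "F")]),               # never matched
]
score, attempts = score_trials(trials)
print(score, attempts, score / attempts)
```

The recognition accuracy for a gesture is then the score divided by the number of trials, which is 30 per gesture in the experiment.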

Because recognition from camera images tends to vary in accuracy with the relative angle of the wrist, the change in recognition rate with the angle of the palm relative to the camera was analyzed for each gesture and the differences were tabulated.

To show more clearly how the angle acquisition method of the present invention, the local Euler angle, differs from other methods, the rotation of the second segment of the index finger of an open hand is measured in four ways.

The four methods are defined as follows. The global Euler angle (global euler) follows the matrix calculation but takes the world origin as the reference of the hierarchy, so it is the Euler angle measured from the origin. The raw rotation data (rotation) applies no hierarchy; it is the raw data in Unity, converted to work around the problems caused by the quaternion matrix structure. The local Euler angle (local euler) is the result of the technique of the present invention, obtained by applying the hierarchy and performing the matrix calculation. The quaternion-to-Euler conversion function result (quaternion to euler) also involves the matrix calculation; however, Unity converts its internal quaternion representation of a rotation to Euler angles, so the values read back for a given rotation can differ considerably from the values that were assigned.

In other words, the matrix calculation is performed internally, but the value returned for use is output as though the matrix calculation had not been applied.
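The effect that Euler angles read back from a quaternion can differ from the angles that were assigned can be reproduced outside any engine. The sketch below uses a common roll (x), pitch (y), yaw (z) conversion; this convention is an assumption chosen for illustration and is not necessarily the one Unity uses internally.

```python
import math


def euler_to_quat(roll_deg, pitch_deg, yaw_deg):
    """Quaternion (w, x, y, z) from roll about x, pitch about y, yaw about z."""
    r, p, y = (math.radians(a) / 2.0 for a in (roll_deg, pitch_deg, yaw_deg))
    cr, sr = math.cos(r), math.sin(r)
    cp, sp = math.cos(p), math.sin(p)
    cy, sy = math.cos(y), math.sin(y)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )


def quat_to_euler(q):
    """Roll, pitch, yaw in degrees; pitch is confined to [-90, 90]."""
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (roll, pitch, yaw))


def same_rotation(q1, q2, tol=1e-9):
    """q and -q describe the same rotation, so compare |dot product| to 1."""
    return abs(abs(sum(a * b for a, b in zip(q1, q2))) - 1.0) < tol


assigned = (0.0, 100.0, 0.0)          # pitch beyond 90 degrees
q = euler_to_quat(*assigned)
read_back = quat_to_euler(q)
print(assigned, [round(a, 3) for a in read_back])
print(same_rotation(q, euler_to_quat(*read_back)))
```

The triple that is read back describes the same orientation but has a pitch of 80 degrees and the other two angles flipped by 180 degrees, so comparing raw Euler values across poses can be misleading even though the underlying rotation is unchanged.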

FIG. 22 compares the segment angle acquisition methods as a function of object angle according to the experimental example of the present invention: while the hand is rotated about the vertical axis, the rotation of the second segment of the index finger of the open hand is measured by each of the four methods and compared.

The values on the left were all measured at 0 degrees and those on the right at 360 degrees. The global Euler angle, local Euler angle, and rotation variable show no difference between 0 and 360 degrees of rotation, whereas the quaternion-to-Euler conversion function gives inconsistent values between 0 and 360 degrees.

This relates to the gimbal lock described above: no correction is made when a rotation axis is lost during rotation. Given that its values are not consistent, the quaternion-to-Euler conversion function result is judged unsuitable as an angle acquisition method for hand gesture recognition.

For the rotation variable, the rotation angle conversion that compensates for gimbal lock works well, and the difference between 360-degree and 0-degree rotation is very small.

Studies of hand model recognition algorithms using existing SVM models have also reported that hardware improvements are essential to address this problem. For this reason, prior studies that recognized hand gestures with an SVM, being intended to test the software algorithm, restricted their experiments to specific angles and specific hand gestures.

The present invention likewise acknowledges this limitation. To apply the same experimental conditions, accuracy is evaluated per hand gesture and per angle to check whether there is a difference, and the experiment also serves to select which of the four conversion methods to use as the raw data that is preprocessed into spectrograms.

- Experiment 2: Model re-recognition response time

The second experiment verifies effectiveness by measuring how long it takes to recognize the hand model again after a clap, in cases where, once the clap is detected, the two hands are recognized as a single mass, that is, as one hand, and one hand model is lost.

Using clap detection as the trigger, the in-scene active state (the activation variable) was then checked, and the rendering state of the hand model was returned 50 times to determine the maximum, minimum, and average time required. Clap detection was induced 50 times in total, and whenever a hand model was lost afterwards, the time until it was recognized again was measured in milliseconds (ms). Because such short times cannot be judged by eye, the time until the activation function returned was displayed at the top of the left UI.
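The summary of the 50 measurements amounts to a minimum, a maximum, and a mean over the recorded times. A minimal sketch follows; the function name is hypothetical and the sample latencies are placeholders, not results of the experiment.

```python
from statistics import mean


def summarize_latencies(latencies_ms):
    """Return the minimum, maximum, and mean of re-recognition times in ms."""
    if not latencies_ms:
        raise ValueError("at least one measurement is required")
    return {
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "mean_ms": mean(latencies_ms),
        "count": len(latencies_ms),
    }


# Placeholder values for illustration only.
sample = [120, 95, 143, 101, 88]
print(summarize_latencies(sample))
```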

- Experiment 3: Hand gesture classification using CNN-based spectrograms

To apply CNN-based hand gesture classification and confirm its accuracy improvement over the SVM, the model shown in the figures below was designed and trained.

FIG. 23 is an overview of the CNN model blocks used in the experiment of the present invention, and FIG. 24 shows the model layer definitions and structure according to an experimental example of the present invention. The model was designed with reference to mnist, the most widely used CNN model, and uses three convolution layers, three max pooling layers, and two dense layers. The detailed structure of each layer is as shown in FIG. 24.
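The layer sequence named above (three convolution layers, three max pooling layers, two dense layers) can be traced with a small shape calculator. This is only a sketch under stated assumptions: the 64x64 input size, 3x3 kernels without padding, 2x2 pooling, filter counts, and the width of the first dense layer are all hypothetical, since the details of FIG. 24 are not reproduced in the text. Only the layer counts and the five output classes (gestures A, C, D, E, F) come from the description.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // window


def trace_shapes(input_size=64, filters=(32, 64, 64), dense_units=(64, 5)):
    """List (layer name, output shape) for 3 conv + 3 max-pool + 2 dense layers.

    input_size, filters, the kernel size, and dense_units[0] are assumptions;
    the final 5 units correspond to the five gestures A, C, D, E, F.
    """
    shapes = []
    size = input_size
    for i, channels in enumerate(filters, start=1):
        size = conv_out(size)
        shapes.append((f"conv{i}", (size, size, channels)))
        size = pool_out(size)
        shapes.append((f"maxpool{i}", (size, size, channels)))
    shapes.append(("flatten", (size * size * filters[-1],)))
    for i, units in enumerate(dense_units, start=1):
        shapes.append((f"dense{i}", (units,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:9s} {shape}")
```

Tracing shapes this way is a quick check that a spectrogram of a given size survives three pooling stages before the dense layers.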

The training data covered the five hand gestures A, C, D, E, and F, which had been measured with relatively high accuracy using the local Euler angles. The data were divided into Experiment 3-1, in which 10 data points were measured per gesture sample, and Experiment 3-2, in which 1,000 data points were measured. A simulation program was built and run to collect these data.

The minimum time unit inside VR content depends on the performance of the computer, and 0.04 seconds is taken as the default minimum time unit. Higher performance is possible if necessary, but it can reduce system stability. Taking the 0.04-second default time unit into account, data were therefore collected every 0.1 second once the user held the designated gesture inside the Unity VR content and pressed the data collection button. Accordingly, the simulation that gathers 1,000 data points instructed the user through the display to perform one hand gesture and, starting 1 second later, collected data every 0.1 second for a total of 100 seconds and recorded them in a csv file.

Once the 100 seconds had elapsed, the file was saved, the display announced that data collection had finished, and the user waited until the next button was pressed.

The 10-data-point collection was carried out to check how much the accuracy differs when little data is available. In the same way, the target hand gesture was indicated on the display in advance, and 1 second after the data collection button was pressed, data were collected for a total of 1 second. After that second had elapsed, the display announced that data collection had finished.
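The sampling arithmetic behind the two collection modes can be sketched as follows: at a 0.1-second interval, 100 seconds yields 1,000 rows and 1 second yields 10 rows. The function names, the CSV column layout, and the choice to place the first sample exactly 1 second after the button press are assumptions made for illustration; the source only states the interval, the delay, the durations, and that a csv file is written.

```python
import csv
import io


def sample_times(duration_s, interval_s=0.1, start_delay_s=1.0):
    """Timestamps (s, relative to the button press) at which rows are recorded."""
    n_rows = round(duration_s / interval_s)
    return [round(start_delay_s + i * interval_s, 3) for i in range(n_rows)]


def write_rows(times, gesture, out):
    """Write one CSV row per sample; the column layout is hypothetical."""
    writer = csv.writer(out)
    writer.writerow(["time_s", "gesture"])
    for t in times:
        writer.writerow([t, gesture])


long_run = sample_times(100)  # Experiment 3-2 style: 100 s of sampling
short_run = sample_times(1)   # Experiment 3-1 style: 1 s of sampling
print(len(long_run), len(short_run))

buffer = io.StringIO()
write_rows(short_run, "A", buffer)
print(buffer.getvalue().splitlines()[:3])
```

A real recording would add the per-frame joint angle values as further columns in each row.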

By repeating this procedure, 100 data samples were collected per hand gesture for all five gestures in both Experiments 3-1 and 3-2, giving 500 samples for each experiment and 1,000 in total. The two experiments were run separately. In both Experiments 3-1 and 3-2, the 500 individual samples were split into a training set of 90% (450 samples) and a test set of 10% (50 samples) for training and testing.
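The 90%/10% split of the 500 samples can be sketched as below. The shuffling strategy, the fixed seed, and the `split_dataset` helper are assumptions; the source states only the split proportions and the resulting counts (450 and 50).

```python
import random


def split_dataset(samples, test_fraction=0.1, seed=0):
    """Shuffle and split samples into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


gestures = ["A", "C", "D", "E", "F"]
# 100 samples per gesture, identified here by (gesture, sample index).
dataset = [(g, i) for g in gestures for i in range(100)]
train, test = split_dataset(dataset)
print(len(dataset), len(train), len(test))
```

A plain random split like this does not guarantee that each gesture contributes exactly 10 test samples; a per-gesture (stratified) split would be needed for that.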

<Experimental results>

The recognition rate was highest when the palm faced the camera. Taking this orientation as 0 degrees, accuracy was lowest at 90 degrees (84.58%) and 135 degrees (86.25%), and highest at 0 degrees (98.75%) and 315 degrees (98.75%). For every gesture, accuracy was highest when the palm directly faced the camera.

FIG. 25 is a table of hand gesture recognition accuracy for each hand gesture using the local transform Euler angles.

Using the local transform Euler angles, recognition results were measured 30 times for each of the 8 hand gestures while the angle was varied. Gestures D, E, and H showed small standard deviations, at 30±0, 29.37±0.91, and 29.37±0.91 respectively, and the overall accuracy was above 80%. For the gestures other than D, E, and H, the standard deviation was relatively high and the accuracy was measured as relatively high.

FIG. 26 is a table of recognition accuracy for each hand gesture using the quaternion-to-Euler conversion.

Using the output of the quaternion-to-Euler angle conversion function, recognition results were measured 330 times for the 8 hand gestures while the angle was varied. The overall accuracy was above 70%, but at every angle the average accuracy was lower than with the local Euler angles.

FIG. 27 is a table of recognition accuracy for each angle using the local transform Euler angles.

Table 10 shows the 30-trial recognition results for the 8 angle settings, taken across the hand gestures and based on the local transform Euler angles. The angles 0 degrees and 315 degrees showed small standard deviations, at 29.62±1.06 and 29.62±0.74 respectively, and the overall accuracy was above 85%. For the angles other than 0 degrees and 315 degrees, the standard deviation was relatively high and the accuracy was measured as relatively high.
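The relation between the "mean ± standard deviation" counts and the percentage accuracies can be illustrated with a short calculation: a mean of about 29.62 correct recognitions out of 30 corresponds to roughly 98.75%, which agrees with the accuracy reported at 0 degrees. The per-gesture counts below are hypothetical values chosen only to illustrate the arithmetic, and the use of the sample standard deviation is an assumption, since the source does not say which estimator was used.

```python
from statistics import mean, stdev

TRIALS_PER_GESTURE = 30  # each gesture was tested 30 times per angle


def summarize_counts(correct_counts, trials=TRIALS_PER_GESTURE):
    """Return (mean count, sample standard deviation, accuracy in percent)."""
    total_trials = trials * len(correct_counts)
    accuracy = 100.0 * sum(correct_counts) / total_trials
    return mean(correct_counts), stdev(correct_counts), accuracy


# Hypothetical counts of correct recognitions for 8 gestures at one angle.
counts_at_angle = [30, 30, 30, 30, 30, 30, 30, 27]
avg, sd, acc = summarize_counts(counts_at_angle)
print(f"{avg:.2f} +/- {sd:.2f} correct of {TRIALS_PER_GESTURE} -> {acc:.2f}%")
```

The same function applies to the per-gesture tables, where each list would hold the counts for one gesture across the 8 angles.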

FIG. 28 is a table of accuracy for each angle using the quaternion-to-Euler conversion. It shows the 30-trial recognition results for the 8 angle settings, taken across the hand gestures.

The overall accuracy was above 50%, but at every angle the standard deviation and average accuracy were measured at a lower level than with the local transform Euler angles.

In addition, comparing 45 degrees and 315 degrees, the two orientations in which the palm is at 45 degrees to the front, accuracy at 45 degrees was 95%, somewhat lower than the 98.75% at 315 degrees. The little finger was recognized well even when occluded, whereas the degree of bending of the thumb tended to be recognized poorly when the thumb was occluded.

Regarding accuracy by gesture, gestures such as D and H, which can be recognized from the degree of finger bending alone, were measured with high accuracy. For gestures such as B, where the finger movement cannot be determined when viewed from the back of the hand, accuracy decreased because it was difficult to infer the shape of the fingers. Recognition was worse when the hand was viewed from the side than at exactly 180 degrees, where the back of the hand faces the camera. Likewise, C and G were recognized correctly 28 to 30 times at 45 degrees and 315 degrees, which are 45 degrees away from the 0-degree orientation in which the palm faces the camera, and 26 and 27 times at 180 degrees. When viewed from the side, however, recognition fell to as few as 21 times for C and 19 times for G.

When accuracy was measured as a function of angle, accuracy of 90% or more was achieved at 0 degrees and 315 degrees. Because the method uses a single depth camera, this demonstrates that it is vulnerable to changes in viewing angle. The physical hardware limitation reported for hand gesture recognition algorithms using existing SVM models was thus confirmed. For this reason, and because the present invention set out to test the software algorithm, the angle was restricted to 0 degrees in the subsequent Experiments 3-1 and 3-2, so that the experimental conditions match those of prior studies that recognized hand gestures with an SVM.

The maximum time required was 0.9 seconds and the minimum was 0.512 seconds, and the hand model was recognized again after being lost within 0.727 seconds on average. When the SDK function for activating hand model rendering was disabled to prevent the loss, the recognition rate instead tended to decrease. In that case the maximum time required was 1.395 seconds and the minimum was 0.704 seconds, so the time required for re-recognition increased substantially overall. When clap detection was performed without using the hand model rendering activation function, the hand model was recovered within 0.727 seconds on average.

FIG. 29 shows the results of the model re-recognition response speed experiment according to an experimental example of the present invention. It plots the response time for re-recognizing the hand model after clap detection (x axis: trial number, y axis: time in seconds).

- Experiment 3: Hand gesture recognition accuracy test

In Experiment 3-1, for the five hand gestures A, C, D, E, and F, 10 rows of data collected over 1 second were recorded 100 times per gesture, for a total of 500 samples. With 90% (450 samples) used for training and 10% (50 samples) used for testing, the test accuracy was measured as 100%.

In Experiment 3-2, for the five hand gestures A, C, D, E, and F, 1,000 rows of data collected over 100 seconds were recorded 100 times per gesture, for a total of 500 samples. With 90% (450 samples) used for training and 10% (50 samples) used for testing, the test accuracy was measured as 100%.

FIG. 30 is the confusion matrix (X axis: actual, Y axis: predicted) from the hand gesture recognition accuracy experiment according to the present invention. It presents the results of Experiment 3-2 as a confusion matrix of the items for which the test prediction matched the correct answer.
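How such a confusion matrix is assembled can be sketched as follows, with rows indexed by predicted label and columns by actual label to match the axes stated for FIG. 30. The helper names and the test labels are hypothetical; in particular, the even distribution of 10 test samples per gesture is an assumption made for illustration, not a reported detail.

```python
def confusion_matrix(actual, predicted, labels):
    """Return matrix[p][a]: rows are predicted labels, columns are actual labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


labels = ["A", "C", "D", "E", "F"]
# Hypothetical test set: 10 samples per gesture, all predicted correctly.
actual = [g for g in labels for _ in range(10)]
predicted = list(actual)
cm = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, cm):
    print(label, row)
print("accuracy:", accuracy(cm))
```

A test accuracy of 100% corresponds to a matrix whose off-diagonal entries are all zero.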

Based on these experimental results, the CNN technique presented in the present invention is more accurate than the SVM-based hand gesture recognition method and is more robust to changes in the surrounding background, so it can be expected to be usefully applied to an augmented reality-based cognitive rehabilitation training system.

The scope of the present invention is not limited to the embodiments described above but is defined by the claims, and it is self-evident that a person of ordinary skill in the field of the present invention can make various modifications and adaptations within the scope of the claims.

100: body part 110: lower frame
111: fixing part 120: upper frame
121: cradle 130: vertical frame
131: operation area 140: image detection module
150: voice input module 151: voice input unit
152: voice recognition unit 160: output module
161: display unit 162: sound output unit
170: control module 171: storage unit
172: motion recognition unit 173: voice interface unit
174: image processing unit 175: evaluation unit
176: feedback unit

Claims

An augmented reality rehabilitation training system comprising: a body part 100 composed of a lower frame 110 on which a fixing part 111 is formed, an upper frame 120 located above the lower frame 110 and on which a cradle 121 is formed, and vertical frames 130 interconnecting the lower frame 110 and the upper frame 120 so that an operation area 131 for hand motion is formed between them;
an image detection module 140 installed on the fixing part 111 and capturing hand motions in the operation area 131;
an output module 160 coupled to the cradle 121 and having a display unit 161 that outputs images and a sound output unit 162 that outputs sound sources; and
a control module 170 having a storage unit 171 that stores guide voices for rehabilitation training, rehabilitation training content, content images corresponding to the content, a user interface, and content sound sources, a motion recognition unit 172 that recognizes hand gestures captured by the image detection module 140, an image processing unit 174 that applies the recognized hand gesture to a content image and outputs it, an evaluation unit 175 that evaluates the recognized hand gesture against the content, and a feedback unit 176 that outputs the evaluation result as video or audio.

The augmented reality rehabilitation training system according to claim 1,
wherein the motion recognition unit 172
tracks the movement of the hand through a Leap Motion controller, reflects the three-dimensional position value and rotation value of the hand model in a three-dimensional virtual environment, and obtains the relative angle value for each finger segment.

The augmented reality rehabilitation training system according to claim 1,
further comprising a voice input module 150 having a voice input unit 151 that receives voice input from a user and a voice recognition unit 152 that analyzes the input voice and converts it into text information,
wherein the control module 170
further comprises a voice interface unit 173 for selecting and proceeding with content through the text information.

The augmented reality rehabilitation training system according to claim 3,
wherein the rehabilitation training content
outputs scores, subtitles, and interactive target objects while playing the sound source, and, when the user claps in time with the interactive target objects, recognizes the hand position and the clap motion and awards a score together with a sound effect.

The augmented reality rehabilitation training system according to claim 3,
wherein the rehabilitation training content
presents a plurality of animal characters as objects, changes the selected object together with a guide voice, and, after voice recognition is activated by a hand gesture set by the user, awards a score together with a sound effect if the input voice recognition result matches the correct answer.

The augmented reality rehabilitation training system according to claim 3,
wherein the rehabilitation training content
presents multiple positions with name tags as categories for separate waste collection, randomly generates objects to be sorted on a virtual desk, outputs a hand model corresponding to the hand motion of grasping an object and moving it to the designated position according to the guide voice, and, if the attribute of the name tag matches that of the waste object, treats the result as a correct answer and awards a score together with a sound effect.

The augmented reality rehabilitation training system according to claim 3,
wherein the rehabilitation training content
presents a number of set objects and boxes, outputs a voice message instructing the user to find objects related to a given object and put them in a box, and repeats the process of moving to the next problem with voice feedback when the action is performed.