KR20200081925A

KR20200081925A - System for voice recognition of interactive robot and the method thereof

Info

Publication number: KR20200081925A
Application number: KR1020180171954A
Authority: KR
Inventors: 이성종
Original assignee: 수상에스티(주)
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2020-07-08
Anticipated expiration: 2038-12-28
Also published as: KR102181583B1

Abstract

The present invention discloses a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof. The voice recognition system of the interactive robot according to an embodiment of the present invention may include a voice data receiving unit that receives voice data transmitted from an external terminal, a text conversion unit that converts the voice data into text, a keyword extraction unit that extracts keywords from the converted text, a response text generation unit that extracts response text corresponding to the extracted keywords from pre-stored metadata, a voice conversion unit that converts the response text into voice data, and a transmitter that transmits the converted voice data to the external terminal.
The interactive robot according to an embodiment of the present invention includes a voice recognition button unit that receives, through a button operation, a command to start voice input; a voice input unit that receives the voice uttered by a user; a voice transmission unit that transmits the recorded data of the input voice to an external system as PCM data; and a voice output unit that receives and outputs response data from the external system.

Description

Voice-recognition interactive robot, interactive robot voice recognition system, and method thereof {SYSTEM FOR VOICE RECOGNITION OF INTERACTIVE ROBOT AND THE METHOD THEREOF}

The present invention relates to a voice-recognition interactive robot, a voice recognition system for an interactive robot, and a method thereof, and more particularly to a system and method for recognizing a user's voice through an interactive robot and generating a corresponding event.

Robots and dolls that can interact with a user play an educationally important role: infants and children who play with them develop and master motor skills and functions, and develop intelligence through imagination and creativity. For this reason, the development of interactive robot and doll technology is receiving considerable attention.

However, existing robots and dolls output only a limited set of sounds or do not move, so they have difficulty continually drawing fresh interest from the user.

Therefore, research is needed on a robot and a voice recognition system that recognize and respond to the user's speech, identify the user's intention from the voice input, and express a corresponding response.

Prior art document: Korean Registered Patent No. 10-1791942

The present invention provides a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the robot receives the user's voice and sends it to a server, and the server analyzes the voice and returns a corresponding response voice for output, thereby minimizing the processing capability required of the robot and reducing cost.

The present invention provides a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the voice input to the robot is sent to a server for processing and the data is split for transmission by adjusting the MTU (Maximum Transmission Unit), thereby enabling high-speed voice recognition while using relatively low-specification hardware.

The present invention provides a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the robot connects to a server through wireless communication and the server analyzes speech in a manner specialized for each user and outputs a corresponding voice, thereby enabling more accurate voice recognition that matches individual characteristics such as each user's language habits.

The present invention provides a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the input voice is converted into text, keywords are extracted from the text, and synonyms and category attributes of the extracted keywords are also extracted, so that response text corresponding to the synonyms and category attributes can be generated more effectively.

The voice recognition system of the interactive robot according to an embodiment of the present invention may include a voice data receiving unit that receives voice data transmitted from an external terminal, a text conversion unit that converts the voice data into text, a keyword extraction unit that extracts keywords from the converted text, a response text generation unit that extracts response text corresponding to the extracted keywords from pre-stored metadata, a voice conversion unit that converts the response text into voice data, and a transmitter that transmits the converted voice data to the external terminal.

According to one aspect of the present invention, the system may further include a user management unit that receives a primary key identifying the user of the external terminal and reads the settings corresponding to that primary key.

According to one aspect of the present invention, the keyword extraction unit may extract the nouns present in the converted text, generate a synonym set for each noun, and match each extracted noun to one of the preset categories, thereby assigning a synonym set and a category attribute to each extracted keyword.

According to one aspect of the present invention, the response text generation unit may extract, for each extracted keyword, a set of associated question lists corresponding to that keyword's synonym set and category attribute, and generate the response text by extracting a question common to the question lists.

The interactive robot according to an embodiment of the present invention includes a voice recognition button unit that receives, through a button operation, a command to start voice input; a voice input unit that receives the voice uttered by a user; a voice transmission unit that transmits the recorded data of the input voice to an external system as PCM data; and a voice output unit that receives and outputs response data from the external system.

According to one aspect of the present invention, the voice output unit may check whether the delay value of each register of the voice codec is 0 and, only when it is not 0, call a delay function that waits for the codec's setting operation, giving a waiting time corresponding to that register's delay value.

The voice recognition method of the interactive robot according to an embodiment of the present invention includes receiving voice data transmitted from an external terminal, converting the voice data into text, extracting keywords from the converted text, extracting response text corresponding to the extracted keywords from pre-stored metadata, converting the response text into voice data, and transmitting the converted voice data to the external terminal.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the robot receives the user's voice and sends it to a server, and the server analyzes the voice and returns a corresponding response voice for output, thereby minimizing the processing capability required of the robot and reducing cost.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the voice input to the robot is sent to a server for processing and the data is split for transmission by adjusting the MTU (Maximum Transmission Unit), thereby enabling high-speed voice recognition while using relatively low-specification hardware.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the robot connects to a server through wireless communication and the server analyzes speech in a manner specialized for each user and outputs a corresponding voice, thereby enabling more accurate voice recognition that matches individual characteristics such as each user's language habits.

According to an embodiment of the present invention, there are provided a voice-recognition interactive robot, an interactive robot voice recognition system, and a method thereof, in which the input voice is converted into text, keywords are extracted from the text, and synonyms and category attributes of the extracted keywords are also extracted, so that response text corresponding to the synonyms and category attributes can be generated more effectively.

FIG. 1 is a diagram showing the overall configuration of the system and robot, in which a voice is received through the voice-recognition interactive robot according to an embodiment of the present invention and passed to the interactive robot voice recognition system to generate an event in response to the recognized voice.
FIG. 2 is a block diagram showing the detailed configuration of the interactive robot voice recognition system according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the detailed configuration of the voice-recognition interactive robot according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the flow of the interactive robot voice recognition method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. The same reference numerals in each drawing denote the same members.

Conventional technologies that recognize a user's voice through a robot, doll, or the like and output a response message could not accurately identify the intention of the user's question from the voice, and as a result the response messages consisted only of simple messages.

The present invention was devised to solve these problems of the prior art, and its configuration is described in detail below.

FIG. 1 is a diagram showing the overall configuration of the system and robot, in which a voice is received through the voice-recognition interactive robot according to an embodiment of the present invention and passed to the interactive robot voice recognition system to generate an event in response to the recognized voice.

Referring to FIG. 1, a connected smart device 300 is first used to set up the procedure by which the voice-recognition interactive robot 200 communicates with the agent server. When the user then speaks a greeting, a question, an expression of emotion, or a similar message to the robot 200, the input voice data may be transmitted to the robot voice recognition system 100.

The robot voice recognition system 100 then converts the voice data into text, extracts keywords, generates response text corresponding to the extracted keywords, and sends it to the voice-recognition interactive robot 200, which outputs it through a speaker or the like to interact with the user.

Here, the user can input a voice through the voice-recognition interactive robot 200 and hear the response voice through the same robot 200.

The detailed procedure and configuration for inputting a voice and generating the corresponding response text are described below.

FIG. 2 is a block diagram showing the detailed configuration of the interactive robot voice recognition system according to an embodiment of the present invention.

Referring to FIG. 2, the interactive robot voice recognition system 100 includes a voice data receiving unit 110, a text conversion unit 120, a keyword extraction unit 130, a response text generation unit 140, a voice conversion unit 150, and a transmitter 160.
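The way these units hand data to one another can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in the specification: the class name `VoicePipeline` and its field names are hypothetical, and each stage is passed in as a plain callable so that any STT, keyword, response, or TTS backend could be plugged in. Receiving and transmitting (units 110 and 160) are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoicePipeline:
    """Hypothetical wiring of units 120 to 150; every stage is an injected callable."""
    speech_to_text: Callable[[bytes], str]          # text conversion unit 120
    extract_keywords: Callable[[str], List[str]]    # keyword extraction unit 130
    generate_response: Callable[[List[str]], str]   # response text generation unit 140
    text_to_speech: Callable[[str], bytes]          # voice conversion unit 150

    def handle(self, voice_data: bytes) -> bytes:
        """Turn received voice data into response voice data."""
        text = self.speech_to_text(voice_data)
        keywords = self.extract_keywords(text)
        response_text = self.generate_response(keywords)
        return self.text_to_speech(response_text)
```

Keeping the stages as injected callables means the same skeleton can be exercised with simple stand-in functions before any cloud service is connected.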

The voice data receiving unit 110 may receive voice data transmitted from an external terminal. That is, when the user's voice is captured through the interactive robot 200 or the smart device 300 connected to it, the voice data receiving unit 110 receives the transmitted data.

The received voice data may take various forms, including PCM data.

As an example, the interactive robot 200 may use an ARTIK053. When the ARTIK053 captures the user's voice, the voice is sent to the voice data receiving unit 110; if the voice data being sent reaches or exceeds the configured MTU value, it is split into multiple packets for transmission. In this way, high-speed voice recognition can be supported even through an interactive robot 200 that uses relatively low-specification hardware such as the ARTIK053.

In addition, low-specification hardware such as the ARTIK053 cannot host the SDK needed to use the API service. Instead, the voice data is delivered in small packets as described above, and the part of the existing STT API that handled microphone input is changed to take the voice data received over the network. This makes it possible to provide a streaming service that quickly converts voice into text even on low-specification hardware such as that of this embodiment.

After the voice data is received, the text conversion unit 120 may convert it into text. Because language habits differ from user to user, the interactive robot voice recognition system 100 may further include a user management unit that receives a primary key identifying the user of the external terminal and reads the settings corresponding to that key, so that processing matches the characteristics of the individual user.

That is, by using different voice recognition and text conversion settings for each user, the recognition and conversion process is optimized for the individual, achieving user-customized voice recognition.
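A minimal sketch of such a per-user lookup is shown below. The specification does not say which settings are stored, so the fields `language_code` and `voice_name`, the key `"user-001"`, and the in-memory dictionary are all assumptions made for illustration; a real system would presumably read from a database.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class UserSettings:
    """Hypothetical per-user settings; the field names are illustrative only."""
    language_code: str = "ko-KR"
    voice_name: str = "default"


DEFAULT_SETTINGS = UserSettings()

# Stand-in for the stored settings, keyed by each user's primary key.
_SETTINGS_BY_KEY: Dict[str, UserSettings] = {
    "user-001": UserSettings(language_code="ko-KR", voice_name="child"),
}


def load_settings(primary_key: str) -> UserSettings:
    """Return the settings for a primary key, or defaults for an unknown user."""
    return _SETTINGS_BY_KEY.get(primary_key, DEFAULT_SETTINGS)
```

Falling back to defaults for an unknown key is a design choice of this sketch; rejecting the connection would be an equally plausible policy.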

Meanwhile, the STT (Speech to Text) step that converts voice into text may use an API offered through the cloud. Such an API can recognize more than 120 languages and dialects and can process real-time streaming or pre-recorded audio using machine learning.

Once the voice data has been converted into text, the keyword extraction unit 130 may extract the key keywords from the converted text.

To this end, the nouns present in the converted text are extracted, a synonym set is generated for each noun, and each extracted noun is matched to one of the preset categories, so that every extracted keyword is given a synonym set and a category attribute.

As an example, if the sentence entered by the user is "내일 소풍 갈거야" ("I'm going on a picnic tomorrow"), the nouns '내일' (tomorrow) and '소풍' (picnic) are extracted. A synonym set for '내일' is extracted, such as 'tomorrow', '다음날' (the next day), and '이튿날' (the following day), and because '내일' is a word that denotes time, it may be given the category attribute 'time word'.

Likewise, for '소풍' a synonym set such as 'picnic', '나들이' (outing), and '야유회' (group picnic) is extracted, and because '소풍' denotes an outdoor activity, it may be given the category attribute 'outdoor activity word'.

Using the synonym set and category attribute of each keyword, the response text generation unit 140 described below can identify the user's intention more accurately and derive a corresponding response text.
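The picnic example can be sketched as follows. This is a simplified illustration rather than the method of the specification: real Korean noun extraction would need a morphological analyzer, which is replaced here by whitespace splitting and a small hand-written lexicon. The names `Keyword`, `LEXICON`, and `extract_keywords` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Keyword:
    word: str
    synonyms: FrozenSet[str]   # synonym set for the noun
    category: str              # category attribute matched from preset categories


# Assumed lexicon holding only the two nouns from the example sentence.
LEXICON: Dict[str, Keyword] = {
    "내일": Keyword("내일", frozenset({"tomorrow", "다음날", "이튿날"}), "time word"),
    "소풍": Keyword("소풍", frozenset({"picnic", "나들이", "야유회"}), "outdoor activity word"),
}


def extract_keywords(text: str) -> List[Keyword]:
    """Return lexicon entries for known nouns, in the order they appear."""
    return [LEXICON[token] for token in text.split() if token in LEXICON]
```

For the sentence "내일 소풍 갈거야", this returns the entries for '내일' and '소풍' and skips '갈거야', which is not in the lexicon.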

In connection with this, the response text generation unit 140 may extract the response text corresponding to the extracted keywords from pre-stored metadata.

To this end, the response text generation unit 140 may extract, for each extracted keyword, a set of associated question lists corresponding to that keyword's synonym set and category attribute, and generate the response text by extracting a question common to the question lists.

As an example, if the question list set for the extracted keyword '내일' contains five questions and the question list set for '소풍' contains seven, the one question whose content overlaps most between the two sets is extracted and chosen as the text with which to respond to the user.
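One way to read "the question that overlaps most" is to pick the question that appears in the largest number of keyword question lists. The sketch below makes that assumption explicit; the specification does not define the overlap measure, and the function name `pick_response` and the sample questions are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def pick_response(question_lists: Dict[str, List[str]]) -> Optional[str]:
    """Pick the question shared by the most keyword question lists.

    Ties are broken by first appearance; returns None when there are no questions.
    """
    counts: Counter = Counter()
    for questions in question_lists.values():
        counts.update(set(questions))  # count each question once per keyword
    ordered = [q for questions in question_lists.values() for q in questions]
    if not ordered:
        return None
    return max(ordered, key=lambda q: counts[q])


example = {
    "내일": ["Where are you going tomorrow?", "What time do you leave?"],
    "소풍": ["What will you pack?", "Where are you going tomorrow?"],
}
chosen = pick_response(example)
```

Exact string matching is the simplest possible overlap test; a fuller implementation might compare questions by shared words or by the synonym sets described earlier.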

The voice conversion unit 150 may then convert the chosen response text into voice data.

Here, the TTS (Text to Speech) step that converts text into voice uses deep learning to synthesize sound resembling a real human voice and allows various languages and voices to be configured. The voice can be generated in a form similar to the user's language habits according to the user's settings, and the pronunciation of specific words can be reflected based on a custom vocabulary or stored terms (company names, acronyms, loanwords, neologisms, and so on).

The transmitter 160 may transmit the converted voice data to the external terminal. The data may be sent as voice data in various formats, including MP3.

As described above, by recognizing speech with the interactive robot voice recognition system and generating the corresponding response text, voice recognition and response generation can be customized for each user, and the user's intention can be identified more accurately so that a fitting response is provided.

The following describes in more detail the configuration in which the voice-recognition interactive robot receives a voice, sends it to the interactive robot voice recognition system, and receives and outputs the response text (converted into voice data) from that system.

FIG. 3 is a block diagram showing the detailed configuration of the voice-recognition interactive robot according to an embodiment of the present invention. As an example, the voice-recognition interactive robot 200 may incorporate an ARTIK053 board.

Referring to FIG. 3, the voice-recognition interactive robot 200 may include a voice recognition button unit 210, a voice input unit 220, a voice transmission unit 230, and a voice output unit 240.

The voice recognition button unit 210 may receive, through a button operation, a command to start voice input. Conventional devices such as smart speakers start voice input by detecting sound. In this embodiment, voice input starts only when the user operates the button, so the users' conversation is not monitored until the user deliberately starts voice input. This helps prevent conversations from being recorded regardless of the user's intention and leaked to third parties.

The button may be located on the hand of the interactive robot, so that operating the button feels like holding the robot's hand and the user feels a closer bond with the robot.

The voice input unit 220 receives the voice uttered by the user; once voice input has been started with the voice recognition button, it may capture the user's voice through a microphone or similar input.

The voice transmission unit 230 may transmit the recorded data of the input voice to an external system as PCM data. Sending the data as PCM data allows it to be transmitted more effectively and without loss.

Meanwhile, for voice transmission, the MTU size, which is the maximum datagram size that a network interface can send without segmentation (that is, the maximum size of a packet sent at one time), may be set to 590. When the data to be sent reaches or exceeds the MTU value, it is split into multiple packets for transmission, which makes data transfer more effective.
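The splitting step can be illustrated with a short Python sketch. It treats the value 590 from the text as the maximum payload per packet and ignores protocol header overhead, which is a simplifying assumption; the name `split_into_packets` is hypothetical, and on the actual board this work would be done by the network stack or firmware rather than by Python.

```python
from typing import List

MTU_SIZE = 590  # value given in the description


def split_into_packets(pcm: bytes, mtu: int = MTU_SIZE) -> List[bytes]:
    """Split PCM data into consecutive chunks of at most `mtu` bytes."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    return [pcm[i:i + mtu] for i in range(0, len(pcm), mtu)]


# 1500 bytes of silence splits into 590 + 590 + 320 bytes.
packets = split_into_packets(bytes(1500))
sizes = [len(p) for p in packets]
```

Concatenating the packets in order reproduces the original byte string, so the receiver only needs to append chunks as they arrive.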

When the voice output unit 240 receives response data in the form of voice data from the external system, it may output the response data through an output device such as a speaker so that the user can hear it.

Meanwhile, the voice output unit 240 may use the following method to minimize the delay that occurs when configuring the codec used for voice output.

During codec register setup before the codec is used, a delay function is called to wait for the codec's setting operation, waiting for script[i].delay each time; in practice, however, script[i].delay is often 0. Therefore, to avoid the overhead of calling the delay function itself, the script[i].delay value of each register is checked, and the delay function that waits for the voice codec's setting operation is called only when the value is not 0, giving a waiting time corresponding to that register's delay value. Here, a member of script[i] is the register address, and script[i].delay is the delay value for each register.
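The check can be sketched as below. The firmware on the board would be written in C; Python is used here only to show the control flow. The `RegisterSetting` fields other than the address and delay, the `write_register` callback, and the millisecond unit are assumptions of this sketch, not details from the specification.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RegisterSetting:
    address: int    # register address (a member of script[i] in the text)
    value: int      # assumed: value written to the register
    delay_ms: int   # script[i].delay; the unit is assumed to be milliseconds


def apply_codec_script(
    script: Iterable[RegisterSetting],
    write_register: Callable[[int, int], None],
    sleep_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000),
) -> int:
    """Write every register, calling the delay function only for non-zero delays.

    Returns how many times the delay function was actually called.
    """
    delay_calls = 0
    for entry in script:
        write_register(entry.address, entry.value)
        if entry.delay_ms != 0:      # skip the call entirely when the delay is 0
            sleep_ms(entry.delay_ms)
            delay_calls += 1
    return delay_calls
```

With a script in which most entries have a delay of 0, the delay function is invoked only for the few registers that need a settling time.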

As described above, the interactive robot according to an embodiment of the present invention provides a device that minimizes the processing capability required of the voice-recognition interactive robot and reduces cost.

FIG. 4 is a flowchart showing the flow of the interactive robot voice recognition method according to an embodiment of the present invention.

In the following, as an example, the voice-recognition interactive robot 200 incorporates an ARTIK053 board and communicates with the voice recognition system 100 of the interactive robot through socket communication.

To this end, in step 410, voice data transmitted from an external terminal may be received.

That is, when the client (ARTIK053) connects to the server of the interactive robot voice recognition system 100 through socket communication with the ARTIK053 board, User_info_check() is executed to identify the user, and users are distinguished by a primary key with a unique value.

When the client information is passed along with the action_thread() call, the user's voice data (PCM data) can be brought to the server over the corresponding client socket by the google_cloud_streaming() operation.
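The connection handling described above might look roughly like the following sketch. The function names `user_info_check` and `action_thread` echo the names in the text, but their bodies are assumptions; the wire format (a primary key on the first line, followed by raw PCM bytes until the client closes the connection), the `USERS` table, and the `on_audio` callback are hypothetical and are not specified in the source.

```python
import socket
import threading

# Hypothetical user table keyed by primary key (not from the specification).
USERS = {"robot-001": {"voice": "default"}}


def user_info_check(primary_key):
    """Return the stored settings for a known primary key, or None."""
    return USERS.get(primary_key)


def action_thread(client_sock, on_audio, chunk_size=1024):
    """Identify the client, then pass its PCM stream to on_audio chunk by chunk."""
    with client_sock, client_sock.makefile("rb") as reader:
        # Assumed wire format: first line is the primary key.
        primary_key = reader.readline().decode("utf-8").strip()
        if user_info_check(primary_key) is None:
            return  # unknown user: drop the connection
        # Remaining bytes are treated as raw PCM until the client closes.
        for chunk in iter(lambda: reader.read(chunk_size), b""):
            on_audio(primary_key, chunk)


def serve(host, port, on_audio):
    """Accept clients and handle each one on its own thread."""
    with socket.create_server((host, port)) as server:
        while True:
            client_sock, _addr = server.accept()
            threading.Thread(target=action_thread,
                             args=(client_sock, on_audio),
                             daemon=True).start()
```

In a fuller implementation, `on_audio` would be the place where the chunks are forwarded to the streaming speech recognizer.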

Next, in step 420, the voice data may be converted into text. Here, the conversion to text may be performed through the Google Cloud streaming Speech To Text API.

In step 430, a keyword may be extracted from the converted text, and in step 440, a response text corresponding to the extracted keyword may be extracted from pre-stored metadata.
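Steps 430 and 440 can be illustrated with a deliberately small sketch. The similar-word table `SIMILAR_WORDS`, the `METADATA` responses, and the whitespace tokenizer are all hypothetical stand-ins; a real system would need a proper noun extractor for the input language, and the specification does not describe how the metadata is stored.

```python
# Hypothetical similar-word sets: canonical keyword -> words treated as equivalent.
SIMILAR_WORDS = {
    "weather": {"weather", "forecast"},
    "time": {"time", "clock"},
}

# Hypothetical pre-stored metadata: canonical keyword -> response text.
METADATA = {
    "weather": "Let me check today's weather.",
    "time": "Let me check the time.",
}


def extract_keywords(text):
    """Return canonical keywords whose similar-word set matches a word in text."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    found = []
    for word in words:
        for keyword, similar in SIMILAR_WORDS.items():
            if word in similar and keyword not in found:
                found.append(keyword)
    return found


def response_text(keywords, default="Sorry, I did not understand."):
    """Return the stored response for the first keyword that has one."""
    for keyword in keywords:
        if keyword in METADATA:
            return METADATA[keyword]
    return default
```

For example, `response_text(extract_keywords("What is the forecast today?"))` looks up the response stored under the keyword `weather`.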

In step 450, the response text may be converted into voice data, and in step 460, the converted voice data may be transmitted to the external terminal.

To this end, the response text is generated as a 1-channel, mono, 22050 Hz mp3 file using the AWS Polly Text To Speech API, and the mp3 file is then converted to 2-channel stereo at 44000 Hz using the FFmpeg module and delivered to the ARTIK053 board.
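The conversion step could be driven from the server with the FFmpeg command-line tool, as in the sketch below. The helper names and file paths are hypothetical; `-i`, `-ac`, `-ar`, and `-y` are standard FFmpeg options. The default sample rate of 44000 Hz is the figure given in the text as written; since MP3 defines 44100 Hz rather than 44000 Hz as a standard rate, the value may need adjusting in practice, and the output format is left to the destination file extension because the text does not state it.

```python
import subprocess


def build_ffmpeg_command(src_mp3, dst_path, channels=2, sample_rate_hz=44000):
    """Build an FFmpeg argument list that changes channel count and sample rate."""
    return [
        "ffmpeg",
        "-y",                        # overwrite the destination if it exists
        "-i", src_mp3,               # input: the mono 22050 Hz mp3 from the TTS step
        "-ac", str(channels),        # output audio channels (2 = stereo)
        "-ar", str(sample_rate_hz),  # output sample rate in Hz
        dst_path,
    ]


def convert_response_audio(src_mp3, dst_path):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_command(src_mp3, dst_path), check=True)
```

Keeping the command construction separate from the call to `subprocess.run` makes the parameters easy to inspect without FFmpeg installed.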

As described above, according to an embodiment of the present invention, the user's voice is received through the voice recognition sympathetic robot and transmitted to a server, and the server analyzes the user's voice and has the corresponding response voice output. This provides a voice recognition sympathetic robot, a sympathetic robot voice recognition system, and a method thereof that minimize the processing capability required of the voice recognition sympathetic robot and reduce cost.

In addition, according to an embodiment of the present invention, the use of low-specification hardware results in low power consumption and light weight, which makes the robot easy to carry, significantly lowers the initial cost, and allows the user to receive a high-speed voice recognition service even while on the move.

In addition, the sympathetic robot voice recognition method according to an embodiment of the present invention may be recorded on a computer-readable medium containing program instructions for performing various computer-implemented operations. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

As described above, although an embodiment of the present invention has been described with reference to limited embodiments and drawings, the embodiment of the present invention is not limited to the embodiments described above, and those of ordinary skill in the field to which the present invention pertains can make various modifications and variations from this description. Accordingly, an embodiment of the present invention should be understood only by the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

100: sympathetic robot voice recognition system
110: voice data receiving unit
120: text conversion unit
130: keyword extraction unit
140: response text generation unit
150: voice conversion unit
160: transmission unit
200: sympathetic robot
210: voice recognition button unit
220: voice input unit
230: voice transmission unit
240: voice output unit

Claims

A voice recognition system of a sympathetic robot, comprising:
a voice data receiving unit that receives voice data transmitted from an external terminal;
a text conversion unit that converts the voice data into text;
a keyword extraction unit that extracts a keyword from the converted text;
a response text generation unit that extracts a response text corresponding to the extracted keyword from pre-stored metadata;
a voice conversion unit that converts the response text into voice data; and
a transmission unit that transmits the converted voice data to the external terminal.

The voice recognition system of claim 1,
further comprising a user management unit that receives a primary key identifying a user of the external terminal and reads a setting value corresponding to the primary key.

The voice recognition system of claim 1,
wherein the keyword extraction unit
extracts a plurality of nouns present in the converted text,
generates a set of similar words for each noun and matches the category of each extracted noun to a preset category, and
assigns a similar-word set and a category attribute to each extracted keyword.

The voice recognition system of claim 3,
wherein the response text generation unit
extracts, for each extracted keyword, a set of associated question lists corresponding to the similar-word set and the category attribute, and extracts the text common to the question lists to generate the response text.

A sympathetic robot, comprising:
a voice recognition button unit that receives a command for starting voice input through a button operation;
a voice input unit that receives a voice uttered by a user;
a voice transmission unit that transmits recorded data of the input voice to an external system in the form of PCM data, dividing the data into units of a preset MTU for transmission; and
a voice output unit that receives and outputs response data from the external system.

The sympathetic robot of claim 5,
wherein the voice output unit
checks whether the delay value of each register of the voice codec is 0 and, only when it is not 0, calls a delay function for waiting for the setting operation of the voice codec, giving a waiting time corresponding to the delay value of that register.

A voice recognition method of a sympathetic robot, comprising:
receiving voice data transmitted from an external terminal;
converting the voice data into text;
extracting a keyword from the converted text;
extracting a response text corresponding to the extracted keyword from pre-stored metadata;
converting the response text into voice data; and
transmitting the converted voice data to the external terminal.