
KR102836597B1 - Voice recognition system for vehicle and method of controlling the same

Info

Publication number: KR102836597B1
Application number: KR1020180136625A
Authority: KR
Inventors: 신용진
Original assignee: Hyundai Motor Company (현대자동차주식회사); Kia Corporation (기아 주식회사)
Priority date: 2018-11-08
Filing date: 2018-11-08
Publication date: 2025-07-21
Anticipated expiration: 2038-11-08
Also published as: KR20200053242A

Abstract

The present invention relates to a voice recognition system for a vehicle, and a method of controlling the same, that effectively combines a plurality of voice recognition results to achieve more accurate voice recognition. A voice recognition method according to one embodiment of the present invention may include: acquiring voice data of a speaker for message transmission; performing, based on the voice data, first voice recognition in a first voice recognition engine that runs inside the vehicle and can access a phonebook; performing, based on the voice data, second voice recognition in a second voice recognition engine that runs outside the vehicle; and, once a first recognition result of the first voice recognition and a second recognition result of the second voice recognition have been obtained, determining a final recognition result using at least name information included in the first recognition result and body information (SMS body) included in the second recognition result.

Description

VOICE RECOGNITION SYSTEM FOR VEHICLE AND METHOD OF CONTROLLING THE SAME

The present invention relates to a voice recognition system for a vehicle, and a method of controlling the same, that can effectively combine multiple voice recognition results to achieve more accurate voice recognition.

Recent advances in voice recognition technology have raised recognition rates, and the range of applications for the technology has expanded accordingly. A representative example is the vehicle. Voice recognition in a vehicle lets the driver operate functions without using their hands, and so concentrate on driving; it is especially useful for tasks that require a lot of manipulation, such as composing a text message.

Voice recognition in a vehicle can be broadly divided into two approaches: using a voice recognition engine installed in the vehicle, for example one running on an AVN (Audio/Video/Navigation) system or head unit (H/U), and using a server-based voice recognition engine, in which the vehicle transmits voice data to a designated server and the server returns the voice recognition result.

However, a voice recognition engine running on a typical AVN system tends to have a lower recognition rate for free speech than a server-based voice recognition engine. A server-based engine, by contrast, recognizes free speech relatively well, but when recognizing an utterance intended to send a text message it cannot reliably separate the part of the utterance that requests sending the message from the content to be included in the message (that is, the body).

For example, in the utterance 'Send message to Morrow "I am on the way".', 'Send message to Morrow' is the message transmission request and "I am on the way" is the message body. If the server recognizes 'to' and 'Morrow' as the single word 'tomorrow' (that is, when a frequently used word sounds similar to the name), the server-based engine returns 'Send message "Tomorrow I am on the way".' If the vehicle used this server result as is, the message "Tomorrow I am on the way" would be sent to Morrow.

The present invention aims to provide a voice recognition system for a vehicle, and a method of controlling the same, that achieves a higher recognition rate when performing voice recognition in a vehicle environment.

In particular, the present invention aims to provide a voice recognition system for a vehicle, and a method of controlling the same, that increases the recognition rate of voice commands requesting execution of a specific service.

The technical problems to be solved by the present invention are not limited to those mentioned above; other technical problems not mentioned here will be clearly understood from the following description by a person having ordinary skill in the art to which the present invention pertains.

To solve the above technical problems, a voice recognition method according to an embodiment of the present invention may include: acquiring voice data of a speaker for message transmission; performing, based on the voice data, first voice recognition in a first voice recognition engine that runs inside a vehicle and can access a phonebook; performing, based on the voice data, second voice recognition in a second voice recognition engine that runs outside the vehicle; and, once a first recognition result of the first voice recognition and a second recognition result of the second voice recognition have been obtained, determining a final recognition result using at least name information included in the first recognition result and body information (SMS body) included in the second recognition result.

In addition, a voice recognition system for a vehicle according to an embodiment of the present invention includes: a microphone that receives a speaker's voice command for message transmission; and a voice recognition device that acquires voice data corresponding to the voice command and determines a final recognition result based on the voice data. The voice recognition device includes: a control unit that performs, based on the voice data, first voice recognition through a first voice recognition engine capable of accessing a phonebook, to obtain a first recognition result; and a communication unit that transmits the voice data to an external voice recognition device running a second voice recognition engine and obtains a second recognition result, which is the result of second voice recognition performed by the second voice recognition engine. The control unit may determine the final recognition result using at least name information included in the first recognition result and body information (SMS body) included in the second recognition result.

According to at least one embodiment of the present invention configured as described above, a voice recognition service with a higher recognition rate can be provided in a vehicle environment.

In particular, when performing voice recognition for a message transmission request, the present invention runs both a voice recognition engine that can access the phonebook and a voice recognition engine with a high free-speech recognition rate, and compares their recognition results, so a higher recognition rate can be expected.

The effects obtainable from the present invention are not limited to those mentioned above; other effects not mentioned here will be clearly understood from the following description by those skilled in the art to which the present invention pertains.

FIG. 1 is a block diagram showing an example of the configuration and operation of a voice recognition system for a vehicle according to an embodiment of the present invention.
FIG. 2 illustrates an example of a process of outputting a voice recognition result through the voice recognition system according to an embodiment of the present invention.
FIG. 3 illustrates an example of how voice recognition results output from different voice recognition engines are compared according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those skilled in the art can easily carry them out. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Parts designated by the same reference numerals throughout the specification refer to the same components.

According to an embodiment of the present invention, a voice recognition system for a vehicle, and a method of controlling the same, are provided that, when a voice command for executing a specific service is input, obtain voice recognition results from different voice recognition engines and compare them with each other to increase the recognition rate.

According to one aspect of the present embodiment, the specific service may be a text message (SMS) transmission service. In this case, a voice command can be divided into four parts: a service domain, an intention, a target name, and a body. The service domain is the type of service (here, the SMS service); the intention is the form in which the service is executed (here, composing a message); the target name is the recipient; and the body is the content of the message itself. The domain and the intention are relatively standardized, so their recognition rates do not differ much between voice recognition engines. The target name and the body, however, are relatively unstandardized, so their recognition results vary from engine to engine.
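The four-part structure of an SMS voice command can be held in a simple record. The following is a minimal Python sketch; the class name, field names, and the `is_complete` helper are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical container for the four parts of an SMS voice command.
@dataclass
class SmsCommand:
    domain: str                 # type of service, e.g. "SMS" (relatively standardized)
    intention: str              # execution form, e.g. "compose" (relatively standardized)
    name: Optional[str] = None  # recipient as spoken (varies between engines)
    body: Optional[str] = None  # message content, usually free speech (varies between engines)

    def is_complete(self) -> bool:
        """True when both non-standardized slots have been filled."""
        return bool(self.name) and bool(self.body)


cmd = SmsCommand(domain="SMS", intention="compose", name="Morrow", body="I am on the way")
```

Keeping the two non-standardized slots optional reflects the point above: they are the parts that may be missing or unreliable depending on which engine produced them.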

Specifically, unless the speaker says the recipient's unique identification information, such as a phone number or e-mail address, the target name is resolved by referring to the phonebook. It is therefore preferable for at least the target name to be recognized by a voice recognition engine running on an entity that can access the phonebook; in this embodiment, that is the voice recognition engine embedded in the AVN system (that is, the head unit). The body, on the other hand, is often free speech, for which the server-based voice recognition engine achieves a higher recognition rate.

Accordingly, for a speaker's voice command, this embodiment proposes extracting at least the target name from the recognition result of the voice recognition engine that can access the phonebook (hereinafter the 'embedded engine' for convenience), extracting at least the body from the recognition result of the server-based voice recognition engine (hereinafter the 'server engine' for convenience), and determining the final recognition result from the two. The configuration of a voice recognition system that does this is described with reference to FIG. 1.

FIG. 1 is a block diagram showing an example of the configuration and operation of a voice recognition system for a vehicle according to an embodiment of the present invention.

Referring to FIG. 1, the voice recognition system can be broadly divided into in-vehicle components and off-vehicle components. The in-vehicle components may include a microphone (110) and a head unit (120, or AVN system), and the off-vehicle components may include a relay entity (130) and a voice recognition server (140).

The head unit (120) may in turn include a storage unit that stores phonebook information; a control unit that runs the embedded engine and determines the final recognition result; and a communication unit (not shown) that transmits the speaker's voice data input through the microphone to the outside and receives the result recognized by the voice recognition server (140) (that is, the server recognition result).

The relay entity (130) may be a telematics server provided by the vehicle manufacturer or the like. This is only an example, however; any entity capable of relaying data between the head unit (120) and the voice recognition server (140) may serve as the relay entity.

The voice recognition server (140) may perform voice recognition on the voice data transmitted by the head unit (120) via the relay entity (130), and return the result to the head unit (120) via the relay entity (130). In doing so, the voice recognition server (140) may extract at least the body from the voice data.

The processing of a voice command, based on the connections among the components described above, is as follows.

First, when the speaker utters the voice command (210) 'Send Message to Anna "I am on the way".' into the in-vehicle microphone (110), the head unit (120) transmits the voice command to the relay entity (130) while also performing voice recognition with the embedded engine. Specifically, the embedded engine may determine the domain of the voice command (210) and extract the name (Anna) from the stored phonebook. The body, however, is ignored (that is, treated as garbage) even if it is recognized, so the recognition result of the embedded engine, that is, the embedded recognition result (220), is 'Send Message to Anna <Garbage>'.
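The embedded side of this example can be approximated in a few lines. This is a sketch under a stated substitution: a real embedded engine would constrain recognition to phonebook entries acoustically, whereas here a text hypothesis is simply snapped to the closest phonebook entry with Python's `difflib.get_close_matches`. The function names, the `GARBAGE` marker constant, and the phonebook numbers are hypothetical placeholders.

```python
import difflib
from typing import Dict, Optional

GARBAGE = "<Garbage>"  # marker for the ignored body part


def snap_to_phonebook(hypothesis: str, phonebook: Dict[str, str],
                      cutoff: float = 0.6) -> Optional[str]:
    """Return the phonebook entry most similar to the spoken name, or None."""
    matches = difflib.get_close_matches(hypothesis, list(phonebook), n=1, cutoff=cutoff)
    return matches[0] if matches else None


def embedded_result(name_hypothesis: str, phonebook: Dict[str, str]) -> dict:
    """Build an embedded-style result: domain plus phonebook name, body discarded."""
    return {
        "domain": "SMS",
        "name": snap_to_phonebook(name_hypothesis, phonebook),
        "body": GARBAGE,  # the embedded engine does not keep the body
    }


# Placeholder phonebook (hypothetical numbers).
phonebook = {"Anna": "+1-555-0101", "Morrow": "+1-555-0102"}
result = embedded_result("Ana", phonebook)  # name snaps to "Anna"
```

Snapping to a closed list of contacts is what lets the embedded side recover 'Anna' even when an open-vocabulary recognizer would produce 'Ana'.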

Meanwhile, the voice recognition server (140) performs voice recognition with the server engine. Specifically, the server engine determines the domain, extracts the name (Ana), and extracts the body (I am on the way) from the transmitted voice data. The resulting recognition result of the server engine, that is, the server recognition result (230), is 'Send Message to Ana "I am on the way".'
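Because the head unit sends the voice data out while running its own engine, the two recognitions can proceed concurrently. Below is a minimal sketch using Python's `concurrent.futures.ThreadPoolExecutor`; both engine functions are hypothetical stubs that return the results of the Anna example, standing in for on-board recognition and the round trip through the relay entity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def run_embedded_engine(audio: bytes) -> dict:
    # Hypothetical stub for on-board recognition (phonebook-backed name, body ignored).
    return {"domain": "SMS", "name": "Anna", "body": "<Garbage>"}


def request_server_engine(audio: bytes) -> dict:
    # Hypothetical stub for the server round trip via the relay entity.
    return {"domain": "SMS", "name": "Ana", "body": "I am on the way"}


def recognize_both(audio: bytes) -> Tuple[dict, dict]:
    """Start both engines at once and return when both results are available."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        embedded_future = pool.submit(run_embedded_engine, audio)
        server_future = pool.submit(request_server_engine, audio)
        return embedded_future.result(), server_future.result()


embedded, server = recognize_both(b"")
```

Waiting on both futures mirrors the condition in the method that the final result is determined only once both recognition results have been obtained.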

When the server recognition result (230) is delivered back to the head unit (120), the head unit (120) determines the final recognition result (240) using at least the name from the embedded recognition result (220) and at least the body from the server recognition result (230). The final recognition result (240) is therefore 'send the message "I am on the way" to Anna', with the recipient being Anna rather than 'Ana' as recognized by the server.
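The selection rule in this paragraph (name from the embedded result, body from the server result) can be sketched as a small merge function. The dictionary keys and function name are hypothetical; this covers only the basic rule, not the time-stamp refinement described later.

```python
def merge_results(embedded: dict, server: dict) -> dict:
    """Combine two recognition results: embedded name, server body."""
    return {
        # Domain recognition is similar across engines; prefer the embedded one if present.
        "domain": embedded.get("domain") or server.get("domain"),
        "name": embedded["name"],  # phonebook-backed, so trusted for the recipient
        "body": server["body"],    # free-speech dictation, so trusted for the content
    }


embedded = {"domain": "SMS", "name": "Anna", "body": "<Garbage>"}
server = {"domain": "SMS", "name": "Ana", "body": "I am on the way"}
final = merge_results(embedded, server)
```

For the example above this yields the recipient "Anna" with the body "I am on the way", discarding both the server's misrecognized name and the embedded engine's garbage body.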

With the voice recognition system described above, even if the server, which has a relatively high recognition rate for the body, misrecognizes the name, the name recognized by the head unit, which can look up the phonebook, is applied to the final result. The strengths of each voice recognition engine are thus combined selectively, improving the final recognition rate.

However, even with the above method, for an utterance such as 'Send message to Morrow "I am on the way".' described earlier, the body of the server recognition result may come out as "Tomorrow I am on the way", so the body contains an error even though the name is recognized correctly. Another aspect of the present embodiment therefore proposes using time stamp information when combining the embedded recognition result and the server recognition result to determine the final recognition result.

This is described with reference to FIGS. 2 and 3. Apart from the time-stamp-related aspects, the basic voice recognition process in the following description is the same as that described with reference to FIG. 1, so redundant description is omitted.

FIG. 2 illustrates an example of a process of outputting a voice recognition result through the voice recognition system according to an embodiment of the present invention, and FIG. 3 illustrates an example of how voice recognition results output from different voice recognition engines are compared according to an embodiment of the present invention.

Referring to FIG. 2, the speaker first utters a voice command (S310).

The uttered voice command is converted into an electrical signal by the microphone (110), and the head unit generates voice data (for example, a wave file) from the signal and transmits it to the voice recognition server (140) (S320). As described above, the voice data may pass through the relay entity (130).
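Packaging captured samples as a wave file before transmission can be done with Python's standard `wave` module. This sketch assumes mono, 16-bit PCM at 16 kHz; the patent does not specify the audio format, and the function name is hypothetical.

```python
import io
import struct
import wave
from typing import List


def pcm16_to_wav(samples: List[int], sample_rate: int = 16000) -> bytes:
    """Wrap mono 16-bit PCM samples in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # single microphone channel (assumed)
        wf.setsampwidth(2)            # 16-bit samples (assumed)
        wf.setframerate(sample_rate)  # 16 kHz (assumed)
        wf.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


wav_bytes = pcm16_to_wav([0, 1, -1, 100])
```

The resulting bytes are what the head unit would forward, possibly via the relay entity, to the voice recognition server.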

The head unit (120) performs voice recognition on the voice command through dictation by the embedded engine and may, at the same time, extract a time stamp for each predetermined recognition unit (S330A). The predetermined recognition unit may be a word, but is not necessarily limited thereto. For example, because a recognition string (or corpus) for "Send Message to" will have been stored in the embedded engine in advance, a time stamp can be extracted for that recognized phrase.

The embedded engine may also extract the name by searching and recognizing against the pre-stored (downloaded) phonebook (S340A).

Meanwhile, the voice recognition server (140) may perform voice recognition on the received voice data through dictation by the server engine and extract a time stamp for each predetermined recognition unit (S330B). As with the embedded engine, the predetermined recognition unit may be a word, but is not necessarily limited thereto.

The voice recognition server (140) may extract the body (SMS body) from the dictated text (S340B).

The voice recognition server (140) may then transmit the server recognition result to the vehicle (S350). The server recognition result may include the dictated text and time stamp information, and may further include information for identifying the body.
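The patent does not define a wire format for the server recognition result. As an illustration only, the sketch below assumes a hypothetical JSON message carrying the dictated text, per-word time stamps, and an optional `body_start` index identifying the body; the timings are made-up values for the Morrow example, in which the server has merged "to Morrow" into "tomorrow".

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds from the start of the voice data
    end: float


def parse_server_result(payload: str) -> Tuple[str, List[TimedWord], Optional[int]]:
    """Decode a hypothetical JSON server result into text, timed words and body index."""
    data = json.loads(payload)
    words = [TimedWord(w["word"], float(w["start"]), float(w["end"])) for w in data["words"]]
    # "body_start" is optional: index of the first word the server assigns to the body.
    return data["text"], words, data.get("body_start")


# Illustrative payload with made-up timings (hypothetical schema).
example = json.dumps({
    "text": "Send message tomorrow I am on the way",
    "words": [
        {"word": "Send", "start": 0.0, "end": 0.2},
        {"word": "message", "start": 0.2, "end": 0.5},
        {"word": "tomorrow", "start": 0.5, "end": 0.9},
        {"word": "I", "start": 0.95, "end": 1.0},
        {"word": "am", "start": 1.0, "end": 1.15},
        {"word": "on", "start": 1.15, "end": 1.3},
        {"word": "the", "start": 1.3, "end": 1.4},
        {"word": "way", "start": 1.4, "end": 1.7},
    ],
    "body_start": 2,
})
text, words, body_start = parse_server_result(example)
```

Here the server's own body boundary starts at "tomorrow", which is exactly the error that the time-stamp comparison in the next steps is meant to correct.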

Once the head unit (120) obtains the server recognition result, it may determine the body to be included in the final recognition result based on the time stamps (S360). This step may be performed by a voice recognition (VR) application run by the control unit of the head unit (120). As described above, the name included in the final recognition result is taken from at least the name in the embedded recognition result. How the body is determined is described in detail with reference to FIG. 3.

Referring to FIG. 3, the part of the embedded recognition result corresponding to the name (Morrow) lies between 0.6 and 0.9 seconds from the start of the voice data. The head unit (120) therefore determines the timing of the name in the embedded recognition result and applies only the portion of the server recognition result's body after 0.9 seconds as the body of the final recognition result. The final recognition result thus becomes 'Send message to Morrow "I am on the way".'
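The cut at 0.9 seconds can be sketched as a filter over the server's timed words. The only figure taken from the text is the name interval (0.6 to 0.9 s); the server word timings and the small `tolerance` for boundary mismatch between the two engines are assumptions, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds from the start of the voice data
    end: float


def body_after_name(server_words: List[TimedWord], name_end: float,
                    tolerance: float = 0.05) -> str:
    """Keep only server words that start at or after the end of the embedded name span."""
    threshold = name_end - tolerance  # slack for engine boundary mismatch (assumed)
    return " ".join(w.text for w in server_words if w.start >= threshold)


# Server dictation with made-up timings; "tomorrow" covers the spoken "to Morrow".
server_words = [
    TimedWord("Send", 0.0, 0.2), TimedWord("message", 0.2, 0.5),
    TimedWord("tomorrow", 0.5, 0.9), TimedWord("I", 0.95, 1.0),
    TimedWord("am", 1.0, 1.15), TimedWord("on", 1.15, 1.3),
    TimedWord("the", 1.3, 1.4), TimedWord("way", 1.4, 1.7),
]
body = body_after_name(server_words, name_end=0.9)  # name "Morrow" ends at 0.9 s
final = f'Send message to Morrow "{body}"'
```

Because "tomorrow" starts before the end of the embedded name span, it is dropped, and only the words spoken after the name survive as the body.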

Based on this final recognition result, the head unit (120) can transmit an SMS with the body "I am on the way" to the unique identification address registered for 'Morrow' in the phonebook, and the recognition result may be output through an output unit of the head unit (120) (S370). The output unit may include at least one of a speaker and a display. For example, a voice prompt such as 'Sending the text message "I am on the way" to Morrow' may be output through the speaker, and the corresponding text may be shown on the display.
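Step S370 can be sketched as resolving the recipient's address from the phonebook and building the confirmation text shared by the speaker and display. The function name, return shape, and phonebook entry are hypothetical placeholders; actual SMS transmission and text-to-speech are outside the sketch.

```python
from typing import Dict, Tuple


def prepare_outgoing_sms(name: str, body: str,
                         phonebook: Dict[str, str]) -> Tuple[str, str, str]:
    """Resolve the recipient address and build the confirmation text."""
    address = phonebook[name]  # raises KeyError if the name is not in the phonebook
    confirmation = f'Sending the text message "{body}" to {name}.'
    return address, body, confirmation


phonebook = {"Morrow": "+1-555-0102"}  # placeholder entry
address, body, confirmation = prepare_outgoing_sms("Morrow", "I am on the way", phonebook)
```

The same confirmation string could be passed to a TTS engine for the speaker and rendered on the display.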

The embodiment above assumes that the body follows the name in the voice command, but embodiments of the present invention are not limited to any particular relative position of the body, the name, or other elements within the voice command. For example, the embodiment also applies to a voice command such as 'Send message "I am on the way" to Morrow.' instead of 'Send message to Morrow "I am on the way".' This is because, as described above, a corpus for natural-language voice recognition is collected in advance and stored on the embedded engine side. Since the embedded engine can also perform corpus-based dictation, time stamps can be tracked for each recognized word or sentence in both the embedded engine and the server. In other words, once the embedded engine has determined the time stamps of the service domain and intention through corpus-based recognition, and the time stamp of the target name through phonebook-based recognition, it can treat the remaining portion as the body regardless of its position, and replace the time interval corresponding to the body with the server recognition result.
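The position-independent variant can be sketched as interval arithmetic: the spans the embedded engine has labeled (domain/intention and name) are removed from the utterance, and whatever gaps remain are filled with server words. Assigning a server word to a gap by its midpoint is an assumption made for this sketch, as are all timings and function names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[float, float]


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float
    end: float


def complement(spans: List[Span], total: float) -> List[Span]:
    """Return the parts of [0, total] not covered by the given (start, end) spans."""
    gaps, cursor = [], 0.0
    for start, end in sorted(spans):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total:
        gaps.append((cursor, total))
    return gaps


def body_from_server(known_spans: List[Span], total: float,
                     server_words: List[TimedWord]) -> str:
    """Treat unlabeled time as the body and fill it with server words (midpoint rule)."""
    gaps = complement(known_spans, total)

    def in_gap(w: TimedWord) -> bool:
        mid = (w.start + w.end) / 2
        return any(a <= mid < b for a, b in gaps)

    return " ".join(w.text for w in server_words if in_gap(w))


# 'Send message "I am on the way" to Morrow.' with made-up timings.
known = [(0.0, 0.5), (1.6, 2.1)]  # "Send message" and "to Morrow" from the embedded engine
server_words = [
    TimedWord("Send", 0.0, 0.2), TimedWord("message", 0.2, 0.5),
    TimedWord("I", 0.55, 0.65), TimedWord("am", 0.65, 0.8),
    TimedWord("on", 0.8, 0.95), TimedWord("the", 0.95, 1.1),
    TimedWord("way", 1.1, 1.5), TimedWord("tomorrow", 1.6, 2.1),
]
body = body_from_server(known, 2.1, server_words)
```

Even though the server again hears "to Morrow" as "tomorrow", that word falls inside an interval the embedded engine already labeled, so it is excluded from the body regardless of where the body sits in the command.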

In addition, the embodiments so far assume that the voice command is in English, but this is only for convenience of explanation. It is obvious to those skilled in the art that, because the positions of the name and the body within a voice command differ from language to language, the reference time used to extract the body from the server recognition result via the time stamps may be applied differently for each language.

The present invention described above can be implemented as computer-readable code on a medium on which a program is recorded. The computer-readable medium includes all kinds of recording devices that store data readable by a computer system. Examples include a hard disk drive (HDD), a solid state disk (SSD), a silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, a floppy disk, and an optical data storage device.

Accordingly, the above detailed description should not be construed as restrictive in any respect but should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the scope of equivalents of the present invention are included in the scope of the present invention.

Claims

A method for controlling a voice recognition system for a vehicle that uses a first voice recognition engine driven inside the vehicle and a second voice recognition engine driven outside the vehicle, the method comprising:
acquiring voice data of a speaker for message transmission;
recognizing the voice data through the first voice recognition engine;
a first extraction step of extracting a domain corresponding to the message transmission based on the recognized voice data;
a second extraction step of accessing a phone book and extracting name information based on the recognized voice data;
requesting voice recognition of the voice data through the second voice recognition engine; and
upon receiving a voice recognition result from the second voice recognition engine, determining a final recognition result by replacing at least the name information with the extraction result of the second extraction step.
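The replacement step of claim 1 can be pictured with the sketch below. It assumes that the two results are aligned by word-level time stamps, as in the time-stamp claims (claims 4 to 7). The `Word` type, the `overlaps` and `replace_name` helpers, and the sample utterance are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds (assumed unit)
    end: float


def overlaps(word: Word, span: Tuple[float, float]) -> bool:
    """True if the word's time interval intersects the span (open-interval test)."""
    return word.start < span[1] and word.end > span[0]


def replace_name(server_words: List[Word], name_span: Tuple[float, float], phonebook_name: str) -> str:
    """Swap the server's words inside name_span for the phone-book name."""
    merged: List[str] = []
    name_inserted = False
    for word in server_words:
        if overlaps(word, name_span):
            # Drop the server's possibly misrecognized name tokens,
            # emitting the phone-book name once in their place.
            if not name_inserted:
                merged.append(phonebook_name)
                name_inserted = True
        else:
            merged.append(word.text)
    return " ".join(merged)


server_words = [
    Word("send", 0.0, 0.3), Word("a", 0.3, 0.4), Word("message", 0.4, 0.8),
    Word("to", 0.8, 0.9), Word("jane", 0.9, 1.2), Word("come", 1.2, 1.6),
    Word("i'm", 1.7, 1.9), Word("late", 1.9, 2.2),
]
print(replace_name(server_words, (0.9, 1.6), "Jane Kim"))
# send a message to Jane Kim i'm late
```

In this example the server transcribed the recipient as "jane come". The phone-book name from the in-vehicle engine replaces those tokens, and the rest of the server transcript is kept unchanged.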

delete

The method of claim 1, wherein each of the first voice recognition engine and the second voice recognition engine performs a step of extracting a time stamp for each predetermined recognition unit during the voice recognition process.

The method of claim 4, wherein the predetermined recognition unit includes a word unit.

The method of claim 4, wherein the determining step comprises determining, based on the extracted time stamps, the position of the name information extracted in the second extraction step.

The method of claim 6, wherein the determining step further comprises determining a body of the final recognition result in consideration of the position of the determined name information within the voice recognition result.

The method of claim 1, further comprising outputting the final recognition result through an output unit.

The method of claim 1, wherein the first voice recognition engine is driven by a head unit or an audio/video/navigation (AVN) system, and the second voice recognition engine is driven by an external voice recognition server.

A computer-readable recording medium on which is recorded a program for executing the method for controlling a vehicle voice recognition system according to any one of claims 1 and 4 to 9.

A voice recognition system for a vehicle that uses a first voice recognition engine driven inside the vehicle and a second voice recognition engine driven outside the vehicle, the system comprising:
a microphone that receives a voice command from a speaker for message transmission; and
a voice recognition device that obtains voice data corresponding to the voice command and determines a final recognition result based on the voice data,
wherein the voice recognition device comprises:
a control unit that obtains a first recognition result by performing a first extraction process, which recognizes the voice data through the first voice recognition engine and extracts a domain corresponding to the message transmission based on the recognized voice data, and a second extraction process, which accesses a phone book and extracts name information based on the recognized voice data; and
a communication unit that transmits the voice data to an external voice recognition device driving the second voice recognition engine and obtains a second recognition result, which is the result of the second voice recognition performed by the second voice recognition engine,
wherein the control unit determines the final recognition result by using at least the name information included in the first recognition result and the body information (SMS body) included in the second recognition result.
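The claim does not specify how the two results are gathered. The sketch below assumes that the in-vehicle engine and the server request can simply run concurrently. Both engine functions are hypothetical stubs. Error handling, such as a server timeout or a lost connection, is not described in the text and is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class FirstResult:
    """Output of the in-vehicle engine (hypothetical shape)."""
    domain: str
    name: str


@dataclass
class SecondResult:
    """Output of the external server engine (hypothetical shape)."""
    body: str


def run_embedded_engine(voice_data: bytes) -> FirstResult:
    # Hypothetical stub for the first engine (head unit / AVN) with phone-book access.
    return FirstResult(domain="send_message", name="Jane Kim")


def request_server_engine(voice_data: bytes) -> SecondResult:
    # Hypothetical stub for the communication unit's round trip to the server.
    return SecondResult(body="I'm running late")


def determine_final_result(voice_data: bytes) -> dict:
    """Run both engines concurrently and combine the results once both are available."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(run_embedded_engine, voice_data)
        second_future = pool.submit(request_server_engine, voice_data)
        first = first_future.result()    # blocks until the in-vehicle result arrives
        second = second_future.result()  # blocks until the server result arrives
    # The name comes from the first result, the SMS body from the second.
    return {"domain": first.domain, "recipient": first.name, "body": second.body}


print(determine_final_result(b"\x00\x01"))
```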

The system of claim 11, wherein the control unit extracts the name information by searching the phone book.
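For illustration only, the phone-book search could be approximated by fuzzy string matching with Python's standard `difflib` module. This is a swapped-in technique. An embedded engine might instead restrict recognition directly to phone-book entries, and the source does not give details either way. The `search_phonebook` function and its 0.6 cutoff are assumptions.

```python
import difflib
from typing import List, Optional


def search_phonebook(candidate: str, phonebook: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the phone-book entry closest to the recognized name, or None."""
    # Map case-folded keys back to the stored spelling of each entry.
    by_key = {entry.casefold(): entry for entry in phonebook}
    matches = difflib.get_close_matches(candidate.casefold(), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else None


contacts = ["Jane Kim", "John Park", "Min-jun Lee"]
print(search_phonebook("jane kin", contacts))  # Jane Kim
print(search_phonebook("xyz", contacts))       # None
```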

The system of claim 12, wherein the control unit determines the domain corresponding to the message transmission.

The system of claim 11, wherein each of the first voice recognition engine and the second voice recognition engine extracts a time stamp for each predetermined recognition unit during the voice recognition process.

The system of claim 14, wherein the predetermined recognition unit includes a word unit.

The system of claim 14, wherein the control unit determines the position of the name information in the first recognition result based on the extracted time stamps.

The system of claim 16, wherein the control unit determines the body of the final recognition result in consideration of the position of the determined name information within the second recognition result.

The system of claim 11, further comprising an output unit that outputs the final recognition result.

The system of claim 11, wherein the voice recognition device includes a head unit or an AVN system, and the external voice recognition device includes an external voice recognition server.