KR20130052800A

KR20130052800A - Apparatus for providing speech recognition service, and speech recognition method for improving pronunciation error detection thereof

Info

Publication number: KR20130052800A
Application number: KR1020110118048A
Authority: KR
Inventors: 김영준
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2011-11-14
Filing date: 2011-11-14
Publication date: 2013-05-23
Anticipated expiration: 2031-11-14
Also published as: KR101854369B1

Abstract

The present invention provides a speech recognition system for improving the ability to detect erroneous pronunciation, comprising a service device and a terminal device. The service device separately stores a pronunciation dictionary for speech recognition, in which a reference pronunciation is defined for each word, and a pronunciation dictionary for error detection, in which the types of erroneous pronunciation that may occur are defined for each word. When a terminal device requests error detection for a user voice input, the service device recognizes the word corresponding to the user voice input based on the pronunciation dictionary for speech recognition, extracts the erroneous pronunciations for the recognized word from the pronunciation dictionary for error detection, compares them with the user voice input to detect errors in the user's pronunciation, and provides the result to the terminal device. The terminal device receives the user voice from the user, transmits it to the service device to request error detection, receives the error detection result for the pronunciation of the user voice input from the service device, and outputs it to the user. Because errors are detected with the error types known in advance, speech recognition can be performed quickly and accurately, and errors present in the spoken word can be detected.

Description

Apparatus for providing speech recognition service, and speech recognition method for improving pronunciation error detection

The present invention relates to speech recognition performed on a user's voice and, more particularly, to an apparatus for providing a speech recognition service that improves the ability to detect pronunciation errors in a foreign language caused by native-language interference, and to a speech recognition method thereof for improving the ability to detect erroneous pronunciation.

In general, speech recognition uses an acoustic model to measure similarity to human voice patterns. The phoneme is often used as the minimum unit of the acoustic model. A record of the phoneme sequences with which each word can be spoken is called a lexicon. Because a single word may be spoken with several phoneme sequences, a word pronounced in several ways is given multiple entries, one for each phoneme sequence.
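The lexicon described above can be pictured as a mapping from each word to a list of phoneme sequences. The following is a minimal sketch, assuming ARPAbet-like phoneme symbols and example words that do not come from the patent itself.

```python
from typing import Dict, List, Tuple

# Hypothetical phoneme notation (ARPAbet-like); the patent does not fix a phoneme set.
Phonemes = Tuple[str, ...]

# A lexicon maps each word to every phoneme sequence it may be spoken with.
# A word with several accepted pronunciations simply has several entries.
lexicon: Dict[str, List[Phonemes]] = {
    "either": [("IY", "DH", "ER"), ("AY", "DH", "ER")],
    "cat": [("K", "AE", "T")],
}


def pronunciations(word: str) -> List[Phonemes]:
    """Return all phoneme sequences registered for a word (empty if unknown)."""
    return lexicon.get(word, [])
```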

Conventional speech recognition recognizes the word corresponding to a user's utterance using only a pronunciation dictionary of correct (standard) pronunciations.

With such conventional speech recognition technology, pronunciation errors caused by a foreign-language learner's native-language interference are difficult to recognize accurately. In particular, it has been impossible to distinguish a standard pronunciation from an erroneous pronunciation caused by native-language interference.

To recognize words accurately even when they contain pronunciation errors caused by native-language interference, more phoneme sequences must be added to the pronunciation dictionary. However, when too many phoneme sequences are used for a single word, speech recognition performance degrades, because recognition depends on probabilistic matching.
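One way to picture this degradation is that an added error variant of one word can coincide with the standard pronunciation of another word, so the recognizer can no longer tell them apart. The sketch below is only an illustration of that effect under assumed example words and phoneme symbols; the patent states the degradation in general terms and does not describe this specific check.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Phonemes = Tuple[str, ...]


def ambiguous_sequences(lexicon: Dict[str, List[Phonemes]]) -> Dict[Phonemes, Set[str]]:
    """Return every phoneme sequence that is registered for more than one word."""
    owners: Dict[Phonemes, Set[str]] = defaultdict(set)
    for word, sequences in lexicon.items():
        for seq in sequences:
            owners[seq].add(word)
    return {seq: words for seq, words in owners.items() if len(words) > 1}


# Hypothetical lexicon with standard pronunciations only.
standard_only = {"rice": [("R", "AY", "S")], "lice": [("L", "AY", "S")]}

# Same lexicon after adding an assumed interference variant of "rice".
with_error_variants = {
    "rice": [("R", "AY", "S"), ("L", "AY", "S")],
    "lice": [("L", "AY", "S")],
}
```

With standard pronunciations only, no sequence is shared between words. After the variant is added, the sequence for "lice" also belongs to "rice", which is the kind of confusion a larger recognition lexicon invites.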

The present invention is intended to provide an apparatus for providing a speech recognition service, and a speech recognition method thereof for improving the ability to detect erroneous pronunciation, in which the pronunciation dictionary used for speech recognition is divided into one for speech recognition and one for error detection. Speech recognition is performed using the pronunciation dictionary for speech recognition to find the correct word, and the pronunciation dictionary for error detection is then applied to detect pronunciation errors present in the word spoken by the speaker, thereby improving both speech recognition performance and error detection performance.

As a means for solving the above problem, the present invention provides a service device for providing a speech recognition service, comprising: a communication unit that transmits and receives data through a communication network; a storage unit that separately stores a pronunciation dictionary for speech recognition, which defines the reference phoneme sequences that may occur for each word for speech recognition, and a pronunciation dictionary for error detection, which defines the erroneous phoneme sequences that may occur for each word in order to detect errors in a user's utterance; and a service providing unit that, when a user voice is received from a terminal device through the communication unit, recognizes the word corresponding to the user voice based on the pronunciation dictionary for speech recognition, detects pronunciation errors for the recognized word using the pronunciation dictionary for error detection, and provides the speech recognition result and the error detection result to the terminal device.

In the service device according to the present invention, the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include erroneous phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.
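The relationship between the two dictionaries can be sketched as two separate mappings, with the recognition dictionary holding fewer phoneme sequences. The entries, the phoneme symbols, and the helper names below are assumptions for illustration, not data from the patent.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]
PronDict = Dict[str, List[Phonemes]]

# Hypothetical recognition dictionary: the reference sequence plus a small
# number of interference variants, kept compact for recognition.
recognition_dict: PronDict = {
    "think": [("TH", "IH", "NG", "K"), ("S", "IH", "NG", "K")],
}

# Hypothetical error-detection dictionary: a broader set of erroneous
# sequences that may occur for the same word.
error_dict: PronDict = {
    "think": [
        ("S", "IH", "NG", "K"),
        ("T", "IH", "NG", "K"),
        ("F", "IH", "NG", "K"),
    ],
}


def sequence_count(pron_dict: PronDict) -> int:
    """Total number of phoneme sequences stored across all words."""
    return sum(len(sequences) for sequences in pron_dict.values())


def recognition_is_smaller(recognition: PronDict, error: PronDict) -> bool:
    """Check the stated property: fewer sequences for recognition than for error detection."""
    return sequence_count(recognition) < sequence_count(error)
```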

In the service device according to the present invention, the service providing unit may include an error detection module that detects, for the user voice, an error type including at least one of pronunciation, duration (long or short sounds), intonation, and stress.
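The four error types can be sketched as an enumeration, with a comparison that reports which aspects of an utterance differ from a reference. The `Utterance` fields below (per-syllable length and stress flags, a single contour label) are hypothetical simplifications; the patent does not specify how these features are represented or extracted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set, Tuple


class ErrorType(Enum):
    PRONUNCIATION = "pronunciation"
    DURATION = "duration"
    INTONATION = "intonation"
    STRESS = "stress"


@dataclass(frozen=True)
class Utterance:
    phonemes: Tuple[str, ...]
    long_vowels: Tuple[bool, ...]  # hypothetical per-syllable length flags
    stressed: Tuple[bool, ...]     # hypothetical per-syllable stress flags
    contour: str                   # hypothetical label such as "rise" or "fall"


def error_types(user: Utterance, reference: Utterance) -> Set[ErrorType]:
    """Return the set of error types in which the user differs from the reference."""
    found: Set[ErrorType] = set()
    if user.phonemes != reference.phonemes:
        found.add(ErrorType.PRONUNCIATION)
    if user.long_vowels != reference.long_vowels:
        found.add(ErrorType.DURATION)
    if user.contour != reference.contour:
        found.add(ErrorType.INTONATION)
    if user.stressed != reference.stressed:
        found.add(ErrorType.STRESS)
    return found
```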

In the service device according to the present invention, the pronunciation dictionary for error detection further includes an error occurrence frequency for each erroneous phoneme sequence, and the service providing unit may detect errors by comparing the erroneous phoneme sequences with the user voice in descending order of error frequency.
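The frequency-ordered comparison can be sketched as follows. This is a simplified illustration that assumes the user voice has already been decoded into a phoneme sequence and that uses exact sequence equality in place of acoustic scoring; the frequency values and names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ErrorEntry:
    phonemes: Tuple[str, ...]
    frequency: float  # hypothetical occurrence rate of this erroneous sequence


def detect_error(
    user_phonemes: Sequence[str], entries: List[ErrorEntry]
) -> Tuple[Optional[ErrorEntry], int]:
    """Compare entries in descending frequency order.

    Returns the first matching entry (or None) and the number of comparisons made.
    """
    target = tuple(user_phonemes)
    comparisons = 0
    for entry in sorted(entries, key=lambda e: e.frequency, reverse=True):
        comparisons += 1
        if entry.phonemes == target:
            return entry, comparisons
    return None, comparisons


# Hypothetical erroneous sequences for the word "think".
think_errors = [
    ErrorEntry(("T", "IH", "NG", "K"), 0.1),
    ErrorEntry(("S", "IH", "NG", "K"), 0.6),
    ErrorEntry(("F", "IH", "NG", "K"), 0.3),
]
```

Because the most frequent error is tried first, a common mistake is found after a single comparison, while a rare one costs a full pass over the list.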

In the service device according to the present invention, when providing the error detection result to the terminal device, the service providing unit may further provide information on a correction method or the cause of the error for each error type.
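Attaching a cause and a correction method to each detected error type can be sketched as a lookup table consulted when the result is assembled. The table contents, key names, and result layout below are hypothetical; the patent does not define the feedback text or the message format.

```python
from typing import Dict, List, NamedTuple


class Feedback(NamedTuple):
    cause: str
    correction: str


# Hypothetical feedback table keyed by error type; the wording is illustrative only.
FEEDBACK: Dict[str, Feedback] = {
    "pronunciation": Feedback(
        cause="A sound was replaced by a similar native-language sound.",
        correction="Practice the target sound in isolation, then in the word.",
    ),
    "stress": Feedback(
        cause="Stress was placed on a different syllable.",
        correction="Listen to the reference and emphasize the marked syllable.",
    ),
}


def build_result(word: str, detected_types: List[str]) -> dict:
    """Assemble an error detection result with per-type cause and correction."""
    errors = []
    for error_type in detected_types:
        feedback = FEEDBACK.get(error_type)
        errors.append(
            {
                "type": error_type,
                "cause": feedback.cause if feedback else None,
                "correction": feedback.correction if feedback else None,
            }
        )
    return {"word": word, "errors": errors}
```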

The service providing unit of the service device according to the present invention may further include an error management module that adds the phoneme sequence corresponding to the user voice to the pronunciation dictionary for error detection when the user voice does not match the standard phoneme sequence but its similarity falls within a preset range, and no corresponding phoneme sequence exists in the pronunciation dictionary for error detection.

As another means for solving the above problem, the present invention provides a terminal device for providing a speech recognition service, comprising: a storage unit that separately stores a pronunciation dictionary for speech recognition, which defines the reference phoneme sequences that may occur for each word for speech recognition, and a pronunciation dictionary for error detection, which defines the erroneous phoneme sequences that may occur for each word in order to detect errors in a user's utterance; an input unit that receives a request from a user; an audio processing unit that receives the user's voice; a control unit that, in response to the user's request received through the input unit, recognizes the word corresponding to the user voice received through the audio processing unit using the pronunciation dictionary for speech recognition, extracts pronunciation errors for the recognized word using the pronunciation dictionary for error detection, and controls output of the speech recognition result and the error detection result; and an output unit that outputs the speech recognition result and the error detection result.

In the terminal device according to the present invention, the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include erroneous phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.

In the terminal device according to the present invention, the control unit may further detect, for the user voice, an error type including at least one of pronunciation, duration (long or short sounds), intonation, and stress.

In the terminal device according to the present invention, the pronunciation dictionary for error detection further includes an error occurrence frequency for each erroneous phoneme sequence, and the control unit may detect errors by comparing the erroneous phoneme sequences with the user voice in descending order of error frequency.

In the terminal device according to the present invention, when providing the error detection result, the control unit may further provide information on a correction method or the cause of the error for each error type.

In the terminal device according to the present invention, the control unit may receive the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection from the service device through a communication unit and store them.

As yet another means for solving the above problem, the present invention provides a speech recognition method for improving the ability to detect erroneous pronunciation, comprising: recognizing the word corresponding to a user voice using a pronunciation dictionary for speech recognition, which defines the reference phoneme sequences that may occur for each word for speech recognition; and detecting pronunciation errors included in the user voice using a pronunciation dictionary for error detection, which defines the erroneous phoneme sequences that may occur for each word in order to detect errors in a user's utterance.

In the speech recognition method according to the present invention, the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include erroneous phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.

The speech recognition method according to the present invention may further include detecting, for the detected error, an error type including at least one of pronunciation, duration (long or short sounds), intonation, and stress.

According to the present invention, in building the pronunciation dictionaries that define the phoneme sequences that may occur for each word for speech recognition, a dictionary for speech recognition and a dictionary for error detection are constructed separately. The word corresponding to the user voice is first recognized through the pronunciation dictionary for speech recognition, and the pronunciation errors included in the user voice are then detected through the pronunciation dictionary for error detection, so that both speech recognition performance and error detection performance can be improved.

In particular, both the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection are constructed to include erroneous phoneme sequences that may occur due to the user's native-language interference, but the pronunciation dictionary for speech recognition holds fewer phoneme sequences than the pronunciation dictionary for error detection. Speech recognition is thus trained on a minimal, compact set of phoneme sequences, which improves recognition speed and accuracy, while error detection applies all erroneous phoneme sequences that may occur, which improves error detection performance.

FIG. 1 is a block diagram illustrating a speech recognition system for improving the ability to detect erroneous pronunciation according to the present invention.
FIG. 2 is a block diagram illustrating the configuration of the service device in the speech recognition system for improving the ability to detect erroneous pronunciation according to the present invention.
FIG. 3 is a flowchart illustrating a speech recognition method for improving the ability to detect erroneous pronunciation, performed by the service device of the present invention.
FIG. 4 is a block diagram illustrating the configuration of a terminal device for providing a speech recognition service for improving the ability to detect erroneous pronunciation according to another embodiment of the present invention.
FIG. 5 is a flowchart illustrating a speech recognition method for improving the ability to detect erroneous pronunciation according to another embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that may obscure the subject matter of the present invention are omitted. Note also that the same components are denoted by the same reference numerals wherever possible throughout the drawings.

The terms and words used in this specification and the claims described below should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that an inventor may appropriately define terms in order to describe his or her own invention in the best way, they should be construed with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing of this application.

FIG. 1 is a block diagram illustrating a speech recognition system for improving the ability to detect erroneous pronunciation according to the present invention. Referring to FIG. 1, a speech recognition system according to an embodiment of the present invention may include a terminal device 100, a service device 200, and a network 300.

In an embodiment of the present invention, the speech recognition service may be provided in a server-based computing manner. Here, server-based computing means that the method of providing a speech recognition service for improving the ability to detect erroneous pronunciation according to the present invention is processed on an arbitrary device connected through a network, while the terminal device performs only input and output. In the following, for convenience of description, the device that provides the speech recognition service for improving the ability to detect erroneous pronunciation according to the present invention is referred to as the service device 200.

The service device 200 provides the speech recognition service for improving the ability to detect erroneous pronunciation according to the present invention to the terminal device 100 through the network 300. More specifically, the service device 200 separately stores a pronunciation dictionary for speech recognition, in which a reference phoneme sequence is defined for each word, and a pronunciation dictionary for error detection, in which the erroneous phoneme sequences that may occur are defined for each word. When the terminal device 100 requests error detection for a user voice input, the service device 200 recognizes the word corresponding to the user voice input based on the pronunciation dictionary for speech recognition, extracts the erroneous pronunciation for the recognized word from the pronunciation dictionary for error detection, and provides it to the terminal device 100.

Both the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection may include erroneous phoneme sequences that may occur due to native-language interference, but the pronunciation dictionary for speech recognition preferably includes fewer phoneme sequences than the pronunciation dictionary for error detection. In this way, the present invention can improve not only speech recognition performance but also error detection performance.

The service device 200 may operate in a server-client computing manner or on a cloud computing basis. That is, one or more of the computing resources needed to run the speech recognition service for improving the ability to detect erroneous pronunciation, for example hardware and software, may be provided at the service device 200.

The terminal device 100 may be any of the various types of devices used by a user, for example a personal computer (PC), a notebook computer, a mobile phone, a tablet PC, a navigation terminal, a smartphone, a personal digital assistant (PDA), a smart TV, a portable multimedia player (PMP), or a digital broadcast receiver. These are only examples, and the term should be construed as covering any communication-capable device that has been developed and commercialized or that will be developed in the future.

The terminal device 100 may be used by a user who requests the speech recognition service for improving the ability to detect erroneous pronunciation. In the speech recognition system according to the present invention, the terminal device 100 receives a user voice from the user, transmits it to the service device 200 to request error detection, receives the speech recognition and error detection results for the user voice from the service device 200, and outputs them to the user.

The network 300 provides a path for transmitting and receiving data between the service device 200 and a plurality of terminal devices 100. The network 300 is an IP network that provides large-capacity data transmission and reception and uninterrupted data service over the Internet Protocol (IP), and may be an All-IP network, an IP network structure that integrates different networks on the basis of IP. In addition, the network 300 may include one or more of a wired network, a WiBro (Wireless Broadband) network, a third-generation mobile network including WCDMA, a 3.5-generation mobile network including an HSDPA (High Speed Downlink Packet Access) network and an LTE network, a fourth-generation mobile network including LTE Advanced, a satellite network, and a wireless LAN including a Wi-Fi network.

In the present invention, the terminal device 100 performs only the output function for providing the speech recognition service for improving the ability to detect erroneous pronunciation, so the following description focuses on the service device 200.

FIG. 2 is a block diagram illustrating the configuration of the service device 200 for providing a speech recognition service for improving the ability to detect erroneous pronunciation according to an embodiment of the present invention. FIG. 2 presents the configuration of the service device 200 in functional units, but in an actual implementation it may be distributed across multiple server devices or implemented on a single server device.

Referring to FIG. 2, in the speech recognition system for improving the ability to detect erroneous pronunciation according to the present invention, the service device 200 may include a communication unit 210, a storage unit 220, and a service providing unit 230.

The communication unit 210 exchanges data with the terminal device 100 through the network 300.

The storage unit 220 is a means for storing data and programs for the operation of the service device 200. In particular, to provide the speech recognition service for improving the ability to detect erroneous pronunciation according to the present invention, it separately stores a pronunciation dictionary for speech recognition 221, in which a reference phoneme sequence is defined for each word, and a pronunciation dictionary for error detection 222, in which the erroneous phoneme sequences that may occur are defined for each word. In addition, the pronunciation dictionary for error detection 222 may further include an occurrence frequency for each erroneous phoneme sequence, so that during error detection the error types can be compared with the input voice in descending order of occurrence frequency.

When the terminal device 100 requests error detection for a user voice input, the service providing unit 230 recognizes the word corresponding to the user voice input based on the pronunciation dictionary for speech recognition 221, extracts the erroneous phoneme sequences that may occur for the recognized word from the pronunciation dictionary for error detection 222, compares them with the user voice to detect errors in the user's pronunciation, and provides the result to the terminal device 100.

In addition, when the pronunciation of the voice input is not identical to the standard pronunciation but falls within a preset similarity range, and is a pronunciation not present in the pronunciation dictionary for error detection 222, the service providing unit 230 may add the phoneme sequence corresponding to the user voice to the pronunciation dictionary for error detection 222 and store it.

To provide the speech recognition service for improving the ability to detect erroneous pronunciation according to the present invention, the service providing unit 230 may include a speech recognition module 231, an error detection module 232, and an error management module 233.

The speech recognition module 231 recognizes the word corresponding to the user voice input based on the pronunciation dictionary for speech recognition 221.

The error detection module 232 extracts from the pronunciation dictionary for error detection the multiple erroneous phoneme sequences that may occur for the recognized word, and compares them with the user voice to detect errors included in the user voice. During error detection, the erroneous phoneme sequences may be compared with the user voice in descending order of error frequency.

The error management module 233 adds a pronunciation to the pronunciation dictionary for error detection 222 when the pronunciation corresponding to the user voice falls within a preset range of similarity to the standard pronunciation and corresponds to a new erroneous phoneme sequence not present in the pronunciation dictionary for error detection 222. The new erroneous phoneme sequence may be added at the user's request.
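The update rule of the error management module can be sketched as follows. The patent does not specify the similarity measure or the preset range, so this sketch assumes a sequence-matching ratio from Python's `difflib` and a hypothetical range of 0.6 up to (but excluding) 1.0; the function and parameter names are also illustrative.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple

Phonemes = Tuple[str, ...]
PronDict = Dict[str, List[Phonemes]]


def maybe_add_error(
    word: str,
    user_phonemes: Sequence[str],
    standard_dict: PronDict,
    error_dict: PronDict,
    user_requested: bool = True,
    low: float = 0.6,   # assumed lower bound of the preset similarity range
    high: float = 1.0,  # assumed upper bound (exclusive)
) -> bool:
    """Add a new erroneous sequence for `word` if it qualifies; return True if added."""
    user_seq = tuple(user_phonemes)
    references = standard_dict.get(word, [])
    if not user_requested or user_seq in references:
        return False  # nothing to add: no request, or it is a standard pronunciation
    if user_seq in error_dict.get(word, []):
        return False  # already a known erroneous sequence
    # Similarity to the closest standard pronunciation (0.0 if none is registered).
    best = max(
        (SequenceMatcher(None, user_seq, ref).ratio() for ref in references),
        default=0.0,
    )
    if low <= best < high:
        error_dict.setdefault(word, []).append(user_seq)
        return True
    return False
```

A sequence that is close to the standard pronunciation but not yet listed is stored as a new error, while an unrelated sequence falls outside the range and is ignored.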

The speech recognition module 231, the error detection module 232, and the error management module 233 may be implemented in software, in hardware, or in a combination of software and hardware. For example, they may be stored in the storage unit 220 in the form of programs and implemented by being executed by the service providing unit 230.

FIG. 3 is a flowchart illustrating a method of providing a speech recognition service with improved pronunciation error detection according to the present invention.

Referring to FIG. 3, the service device 200 separately stores a pronunciation dictionary for speech recognition 221, in which a reference phoneme sequence is defined for each word, and a pronunciation dictionary for error detection 222, in which the error phoneme sequences that may occur for each word are defined (S105).
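The separation of the two dictionaries in S105 could be represented roughly as below. The layout is an assumption: the text only states that the dictionaries are stored separately, so the type alias, the constant names, the helper `error_candidates`, and the sample entries are hypothetical.

```python
from typing import Dict, List, Tuple

Phonemes = Tuple[str, ...]

# Counterpart of dictionary 221: one reference phoneme sequence per word.
RECOGNITION_DICT: Dict[str, Phonemes] = {
    "right": ("r", "ay", "t"),
    "sheep": ("sh", "iy", "p"),
}

# Counterpart of dictionary 222: possible error phoneme sequences per word,
# each paired with a hypothetical occurrence frequency.
ERROR_DICT: Dict[str, List[Tuple[Phonemes, float]]] = {
    "right": [(("l", "ay", "t"), 0.6), (("w", "ay", "t"), 0.1)],
    "sheep": [(("sh", "ih", "p"), 0.5)],
}


def error_candidates(word: str) -> List[Tuple[Phonemes, float]]:
    """Look up error sequences only after a word has been recognized."""
    return ERROR_DICT.get(word, [])


# Recognition searches the reference sequences only; error sequences are
# consulted afterwards, and only for the single recognized word.
print(len(RECOGNITION_DICT), len(error_candidates("right")))
```

Keeping the error sequences out of the recognition search space is what lets the recognizer stay small while the error dictionary grows.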

When the user inputs a voice to the terminal device 100 (S110), the terminal device 100 may transmit the user's voice to the service device 200 and request speech recognition and error detection (S115). The service device 200 then recognizes the word corresponding to the received user's voice on the basis of the pronunciation dictionary for speech recognition (S120).

Next, the service device 200 compares the error phoneme sequences that correspond to the recognized word in the pronunciation dictionary for error detection with the user's voice, and detects the errors present in the user's pronunciation (S125). Here, the service device 200 may further detect, for each error found in the user's voice, an error type comprising one or more of a pronunciation error, a length error, an intonation error, and a stress error. In addition, when the plurality of possible error phoneme sequences for the recognized word are compared with the user's voice, an occurrence frequency may be stored for each error phoneme sequence and the sequences compared in descending order of occurrence frequency, so that errors are detected more quickly.

The service device 200 then provides the speech recognition and error detection results to the terminal device 100 (S130), and the terminal device 100 outputs the results received from the service device 200 to the user (S135).

When providing the error detection result, the service device 200 may additionally provide a correction method or the cause of the error for the detected error type.
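A result that carries the error type together with a correction method might look like the sketch below. The four error types come from the description; the `DetectionResult` structure, the `build_result` helper, and the wording of the hints are hypothetical and only illustrate how a hint could be attached per error type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    PRONUNCIATION = "pronunciation"
    LENGTH = "length"
    INTONATION = "intonation"
    STRESS = "stress"


# Hypothetical correction advice, keyed by error type.
CORRECTION_HINTS = {
    ErrorType.PRONUNCIATION: "Compare your sound with the reference phoneme.",
    ErrorType.LENGTH: "Hold the long sound for its full duration.",
    ErrorType.INTONATION: "Follow the pitch contour of the reference.",
    ErrorType.STRESS: "Put the emphasis on the marked syllable.",
}


@dataclass
class DetectionResult:
    """What the service device could send back to the terminal device."""
    word: str
    error_type: Optional[ErrorType] = None
    correction: Optional[str] = None


def build_result(word: str,
                 error_type: Optional[ErrorType] = None) -> DetectionResult:
    """Attach the matching correction hint when an error was detected."""
    hint = CORRECTION_HINTS.get(error_type) if error_type else None
    return DetectionResult(word, error_type, hint)


result = build_result("right", ErrorType.PRONUNCIATION)
print(result.word, result.error_type.value, result.correction)
```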

Meanwhile, when the user's voice falls within the preset range of similarity to the standard pronunciation and corresponds to a new error type that is not present in the pronunciation dictionary for error detection (S140), the service device 200 may update the pronunciation dictionary for error detection 222 by adding the phoneme sequence corresponding to the user's voice as a new error phoneme sequence (S145). The final selection of the error phoneme sequence to be added may be made by the user.

In another embodiment of the present invention, the speech recognition service with improved pronunciation error detection may be provided on the terminal device 100 itself.

FIG. 4 is a block diagram illustrating the configuration of a terminal device 100 for providing a speech recognition service with improved pronunciation error detection according to another embodiment of the present invention.

Referring to FIG. 4, the terminal device 100 according to the present invention may include an input unit 110, an output unit 120, an audio processing unit 130, a storage unit 140, and a control unit 150.

The input unit 110 generates user input signals for controlling or operating the terminal device 100 in response to the user's manipulation, and may be implemented with various types of input means. For example, the input unit 110 may include one or more of key input means, touch input means, gesture input means, and voice input means. The key input means generates a signal corresponding to a key when that key is operated; examples are a keypad and a keyboard. The touch input means recognizes an input by sensing the user touching a particular area; examples are a touch pad, a touch screen, and a touch sensor. The gesture input means recognizes a designated motion of the user, such as shaking or moving the terminal device, approaching the terminal device, or blinking, as a specific input signal, and may include one or more of a geomagnetic sensor, an acceleration sensor, a camera, an altimeter, a gyro sensor, and a proximity sensor.

The output unit 120 is an output means that displays the interface screen between the terminal device 100 and the user, and displays the speech recognition and error detection results. The output unit 120 may be, for example, any one of an LCD (Liquid Crystal Display), a TFT-LCD (Thin Film Transistor-Liquid Crystal Display), an LED (Light Emitting Diode) display, an OLED (Organic Light Emitting Diode) display, an AMOLED (Active Matrix Organic Light Emitting Diode) display, a flexible display, and a three-dimensional display.

The audio processing unit 130 processes the input and output of voice and sound, and receives the user's voice for speech recognition. The audio processing unit 130 may include a microphone for voice input and a speaker for voice output.

The storage unit 140 stores the data or programs needed for the operation of the terminal device 100, and may basically store the operating system (OS) of the terminal device 100 and one or more application programs. In addition, in the present invention, the storage unit 140 separately stores, for providing the speech recognition service with improved pronunciation error detection, a pronunciation dictionary for speech recognition 141, in which a reference phoneme sequence is defined for each word, and a pronunciation dictionary for error detection 142, in which the error phoneme sequences that may occur for each word are defined. The storage unit 140 may include any kind of storage medium, such as RAM (Random Access Memory), ROM (Read Only Memory), a hard disk drive (HDD), flash memory, a CD-ROM, or a DVD.

The control unit 150 controls the overall operation of the terminal device 100. It basically runs on the operating system stored in the storage unit 140 to establish the basic platform environment of the terminal device 100, and executes application programs selected by the user to provide the corresponding functions. In another embodiment of the present invention, when speech recognition and error detection are requested by the user through the input unit 110, the control unit 150 recognizes the word corresponding to the user's voice input through the audio processing unit 130 on the basis of the pronunciation dictionary for speech recognition 141, extracts the error phoneme sequences for the recognized word from the pronunciation dictionary for error detection 142, compares them with the user's voice to detect errors in the user's pronunciation, and outputs the error detection result. The control unit 150 may include a speech recognition module 151, an error detection module 152, and an error management module 153.

The speech recognition module 151 recognizes the word corresponding to the user's voice on the basis of the pronunciation dictionary for speech recognition 141.

The error detection module 152 extracts, from the pronunciation dictionary for error detection, error phoneme sequences for the recognized word that contain errors in one or more of pronunciation, length, intonation, and stress, and compares the extracted error phoneme sequences with the user's voice to detect the pronunciation errors contained in the user's voice. An occurrence frequency may be stored for each error phoneme sequence so that the sequences are compared with the user's voice in descending order of occurrence frequency.

When the pronunciation of the input user's voice falls within a preset range of similarity to the standard pronunciation and corresponds to a new error type that is not present in the pronunciation dictionary for error detection 142, the error management module 153 adds the phoneme sequence of the user's voice to the pronunciation dictionary for error detection 142 as a new error type.

The speech recognition module 151, the error detection module 152, and the error management module 153 may be implemented in software, in hardware, or in a combination of the two. For example, they may be stored as programs in the storage unit 140 and implemented by being executed by the control unit 150.

FIG. 5 is a flowchart illustrating a method of providing a speech recognition service with improved pronunciation error detection according to another embodiment of the present invention.

Referring to FIG. 5, the terminal device 100 separately stores a pronunciation dictionary for speech recognition 141, in which a reference phoneme sequence is defined for each word, and a pronunciation dictionary for error detection 142, in which the error phoneme sequences that may occur for each word are defined (S205). The pronunciation dictionary for speech recognition 141 and the pronunciation dictionary for error detection 142 may be obtained by receiving them from the service device 200.

Subsequently, when the user's voice is input (S210), the word corresponding to the user's voice is recognized on the basis of the pronunciation dictionary for speech recognition 141 (S215). Next, a plurality of error phoneme sequences that may occur for the recognized word, covering pronunciation errors, length errors, intonation errors, and stress errors, are extracted from the pronunciation dictionary for error detection 142 and compared with the user's voice to detect the mispronunciations contained in the user's voice (S220). Here, an error type comprising one or more of a pronunciation error, a length error, an intonation error, and a stress error may further be detected for each detected mispronunciation. In addition, by comparing the error phoneme sequences with the pronunciation of the user's voice in descending order of occurrence frequency, errors in the spoken word can be detected more quickly.
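Steps S215 and S220 can be chained as in the sketch below. It assumes that the user's voice has already been decoded into phoneme symbols, that recognition picks the word whose reference sequence is most similar (measured here with `difflib.SequenceMatcher`, which is not prescribed by the text), and that each error phoneme sequence is stored with an error type label. The function names and sample entries are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Sequence, Tuple

Phonemes = Tuple[str, ...]


def recognize(observed: Sequence[str],
              recognition_dict: Dict[str, Phonemes]) -> str:
    """S215: choose the word whose reference sequence is most similar."""
    def score(word: str) -> float:
        return SequenceMatcher(None, list(recognition_dict[word]),
                               list(observed)).ratio()
    return max(recognition_dict, key=score)


def detect_error_type(observed: Sequence[str], word: str,
                      error_dict: Dict[str, Dict[Phonemes, str]]
                      ) -> Optional[str]:
    """S220: return the error type if the utterance is a known error."""
    return error_dict.get(word, {}).get(tuple(observed))


# Hypothetical dictionary contents.
RECOGNITION_DICT = {"sheep": ("sh", "iy", "p"), "book": ("b", "uh", "k")}
ERROR_DICT = {"sheep": {("s", "iy", "p"): "pronunciation",
                        ("sh", "ih", "p"): "length"}}

utterance = ["s", "iy", "p"]
word = recognize(utterance, RECOGNITION_DICT)
print(word, detect_error_type(utterance, word, ERROR_DICT))
```

Because the error lookup is restricted to the recognized word, only that word's error sequences are ever compared with the utterance.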

The terminal device 100 outputs the speech recognition and error detection results through the output unit 120 and provides them to the user (S225). The error detection result may further include the error type of each word, a pronunciation correction method for the detected error type, or the cause of the pronunciation error, which can help the user when learning a foreign language.

Meanwhile, when the input pronunciation falls within the preset range of similarity to the standard pronunciation and corresponds to a new error type that is not present in the pronunciation dictionary for error detection (S230), the terminal device 100 may update the pronunciation dictionary for error detection 142 by adding the new error type (S235).

The speech recognition method for improving pronunciation error detection according to the present invention may be implemented as software readable by various computer means and recorded on a computer-readable recording medium. The recording medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM (Random Access Memory), and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

As described above, this specification and the drawings disclose preferred embodiments of the present invention, but it is obvious to those of ordinary skill in the art to which the invention pertains that other modifications based on the technical idea of the invention may be practiced in addition to the embodiments disclosed here. Furthermore, although specific terms are used in this specification and the drawings, they are used only in a general sense, to explain the technical content of the invention easily and to aid understanding of it, and are not intended to limit the scope of the invention.

By providing a dictionary for speech recognition and a dictionary for error detection separately, the invention performs speech recognition with the speech recognition dictionary to find the correct word, and then detects the errors present in the spoken word using the error detection dictionary, which contains the error types for that word. Because errors are detected with the possible error types already known, speech recognition can be performed quickly and accurately and the errors present in the spoken word can be detected.

100: terminal device 110: input unit 120: output unit
130: audio processing unit 140: storage unit
141: pronunciation dictionary for speech recognition 142: pronunciation dictionary for error detection
150: control unit 151: speech recognition module
152: error detection module 153: error management module
200: service device 210: communication unit 220: storage unit
221: pronunciation dictionary for speech recognition 222: pronunciation dictionary for error detection
230: service providing unit 231: speech recognition module
232: error detection module 233: error management module 300: network

Claims

A service device for providing a speech recognition service, comprising:
a communication unit that transmits and receives data through a communication network;
a storage unit that separately stores a pronunciation dictionary for speech recognition, which defines the reference phoneme sequences that may occur for each word for use in speech recognition, and a pronunciation dictionary for error detection, which defines the error phoneme sequences that may occur for each word for use in detecting errors in a user's utterance; and
a service providing unit that, when a user's voice is received from a terminal device through the communication unit, recognizes the word corresponding to the user's voice on the basis of the pronunciation dictionary for speech recognition, detects pronunciation errors in the recognized word using the pronunciation dictionary for error detection, and provides a speech recognition result and an error detection result to the terminal device.

The service device of claim 1, wherein the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include error phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.

The service device of claim 1, wherein the service providing unit comprises an error detection module that detects, for the user's voice, an error type comprising one or more of pronunciation, length, intonation, and stress.

The service device of claim 1, wherein the pronunciation dictionary for error detection further includes an error occurrence frequency for each error phoneme sequence, and the service providing unit detects errors by comparing the error phoneme sequences with the user's voice in descending order of error occurrence frequency.

The service device of claim 1, wherein the service providing unit, when providing the error detection result to the terminal device, further provides information on a correction method or the cause of the error for each error type.

The service device of claim 1, wherein the service providing unit comprises an error management module that, when the user's voice does not match the standard phoneme sequence but its similarity falls within a preset range, and no corresponding phoneme sequence exists in the pronunciation dictionary for error detection, adds the phoneme sequence corresponding to the user's voice to the pronunciation dictionary for error detection.

A terminal device for providing a speech recognition service, comprising:
a storage unit that separately stores a pronunciation dictionary for speech recognition, which defines the reference phoneme sequences that may occur for each word for use in speech recognition, and a pronunciation dictionary for error detection, which defines the error phoneme sequences that may occur for each word for use in detecting errors in a user's utterance;
an input unit that receives a request from a user;
an audio processing unit that receives the user's voice;
a control unit that, in response to the user's request input through the input unit, recognizes the word corresponding to the user's voice input through the audio processing unit using the pronunciation dictionary for speech recognition, detects pronunciation errors in the recognized word using the pronunciation dictionary for error detection, and outputs a speech recognition result and an error detection result; and
an output unit that outputs the speech recognition result and the error detection result.

The terminal device of claim 7, wherein the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include error phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.

The terminal device of claim 7, wherein the control unit detects, for the user's voice, an error type comprising one or more of pronunciation, length, intonation, and stress.

The terminal device of claim 7, wherein the pronunciation dictionary for error detection further includes an error occurrence frequency for each error phoneme sequence, and the control unit detects errors by comparing the error phoneme sequences with the user's voice in descending order of error occurrence frequency.

The terminal device of claim 7, wherein the control unit, when providing the error detection result, further provides information on a correction method or the cause of the error for each error type.

The terminal device of claim 7, wherein the control unit receives the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection from a service device through a communication unit, and stores the received dictionaries.

A speech recognition method for improving pronunciation error detection, comprising:
recognizing a word corresponding to a user's voice using a pronunciation dictionary for speech recognition that defines the reference phoneme sequences that may occur for each word for use in speech recognition; and
detecting a pronunciation error contained in the user's voice using a pronunciation dictionary for error detection that defines the error phoneme sequences that may occur for each word for use in detecting errors in the user's utterance.

The method of claim 13, wherein the pronunciation dictionary for speech recognition and the pronunciation dictionary for error detection both include error phoneme sequences that may occur due to native-language interference, and the pronunciation dictionary for speech recognition includes fewer phoneme sequences than the pronunciation dictionary for error detection.

The method of claim 13, further comprising detecting, for the detected error, an error type comprising one or more of pronunciation, length, intonation, and stress.