KR102405547B1

KR102405547B1 - Pronunciation evaluation system based on deep learning

Info

Publication number: KR102405547B1
Application number: KR1020200118571A
Authority: KR
Inventors: 장현석
Original assignee: 주식회사 퀄슨
Priority date: 2020-09-15
Filing date: 2020-09-15
Publication date: 2022-06-07
Anticipated expiration: 2040-09-15
Also published as: KR20220036239A

Abstract

The present invention relates to a pronunciation evaluation system.
A pronunciation evaluation system according to the present invention includes: a text providing unit that provides sentences for learning; a spoken voice collecting unit that collects the voice uttered by a learner in response to a provided sentence; a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance; an evaluation unit that compares and analyzes the obtained utterance and the provided sentence to calculate a matching rate for the sentence; a misrecognized word collection unit that uses the evaluation results to collect information on words the learner persistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and a control unit that, when the utterance matches the provided sentence, quantifies the evaluation result and presents it color-coded according to pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a re-utterance request signal for the provided sentence.

Description

Pronunciation evaluation system based on deep learning

The present invention relates to a pronunciation evaluation system and, more particularly, to a pronunciation evaluation system that uses deep learning and natural language processing (NLP) technology to evaluate a user's pronunciation by matching the voice information obtained from the user's utterance against the provided sentence.

Recently, the number of users who study foreign languages such as English on a computer has been increasing. In particular, programs for learning English pronunciation are on the rise: when a user utters a specific word or sentence into a microphone, the program analyzes the utterance and provides an evaluation of the user's pronunciation. Speech recognition technology is applied to determine what the user said, and the result is returned to the user as a score or as feedback matched to the evaluation level.

In many cases, the pronunciation learning result provided to the user shows, for sentence pronunciation practice, only an overall pronunciation accuracy (overall score) for the entire utterance and, for word pronunciation practice, only whether the word was pronounced correctly or not. Consequently, when several words are uttered together, as in a sentence, the individual words that were pronounced poorly are not pointed out, and the user does not receive an accurate pronunciation learning result.

Some systems make further use of the speech recognition result to point out the parts the user mispronounced. These systems, however, build in advance a knowledge base of the parts of English pronunciation that Koreans tend to get wrong and notify the user when such a mispronunciation is recognized, so pronunciation can be corrected only if information on the correct pronunciation is additionally constructed.

Beyond the accuracy of the pronunciation itself, some systems also include suprasegmental evaluation factors to measure the naturalness of speech, particularly for sentence utterances. Because the suprasegmental factors are also evaluated at the sentence level, however, it is difficult to point out the faulty part within a sentence and to provide details on how it is wrong. Here, suprasegmental evaluation factors are items that cannot be segmented, such as sentence intonation, stress, and speaking rate, whereas segmental factors are separable items such as sentences, phrases, syllables, words, and phonemes.

Accordingly, there is a need for a foreign-language pronunciation evaluation technology that can evaluate the pronunciation learning result for a sentence, including its suprasegmental factors, for each predetermined segment unit.

The background art of the present invention is disclosed in Korean Patent Application Publication No. 10-2019-0068841 (published June 19, 2019).

The technical problem to be solved by the present invention is to provide a pronunciation evaluation system that uses deep learning and natural language processing (NLP) technology to evaluate a user's pronunciation by matching the voice information obtained from the user's utterance against the provided sentence.

To achieve this technical object, a pronunciation evaluation system according to an embodiment of the present invention includes: a text providing unit that provides sentences for learning; a spoken voice collecting unit that collects the voice uttered by a learner in response to a provided sentence; a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance; an evaluation unit that compares and analyzes the obtained utterance and the provided sentence to calculate a matching rate for the sentence; a misrecognized word collection unit that uses the evaluation results to collect information on words the learner persistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and a control unit that, when the utterance matches the provided sentence, quantifies the evaluation result and presents it color-coded according to pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a re-utterance request signal for the provided sentence.

The text conversion unit may classify the uttered voice word by word, generate the utterance obtained from the classified words together with a plurality of candidate sentences similar to the utterance, and provide time information corresponding to each classified word.

The evaluation unit may compare the provided sentence and the utterance word by word and calculate the matching rate according to the number of matching words.

The evaluation unit may divide the provided sentence and the utterance into words, compare them word by word, and calculate the matching rate by assigning a weight to either the provided sentence or the utterance.

The evaluation unit may assign the weight to the utterance when a word is pronounced correctly or is pronounced an extra time, and assign the weight to the provided sentence when a word is mispronounced or omitted.

The evaluation unit may divide the provided sentence and the utterance into words, set up a plurality of possible cases from the classified words, and calculate the matching rate using the set cases.

The misrecognized word collection unit may extract numbers, infrequently used words, and words that the learner repeatedly mispronounces, classify them as misrecognized words, and store the classified misrecognized words in table form.

The control unit may obtain a waveform from the learner's uttered voice, obtain the stress of each word by matching the waveform against the time information of the words in the utterance, compare the obtained stress with the stress of the original word, and output whether it is correct using color.

As described above, according to the present invention, the user's utterance of a sentence is evaluated on the basis of deep learning and natural language processing (NLP) technology and the evaluation result is presented as an index, which gives the user a sense of achievement. When the user speaks English, the system points out the location of inaccurate pronunciation at the phoneme or syllable level and, at the same time, enables detailed analysis of pronunciation accuracy, stress, intonation, speed, and the like, thereby improving learning efficiency.

Further, according to the present invention, the words that the user repeatedly mispronounces are monitored and the conversion table is continuously updated, which raises the recognition rate for the user's pronunciation. In addition, by classifying words into those that carry stress and those that do not, the matching rate between the uttered sentence and the provided sentence can be increased.

FIG. 1 is a block diagram illustrating a pronunciation evaluation system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating step S210 shown in FIG. 2.
FIG. 4 is a diagram for explaining step S220 shown in FIG. 2.
FIG. 5 is an exemplary diagram showing the utterance converted and output in step S240 shown in FIG. 2.
FIG. 6 is a diagram for explaining how the matching rate is calculated by the third method in step S250 shown in FIG. 2.
FIG. 7 is an exemplary diagram showing the output of the result derived in step S270 shown in FIG. 2.
FIG. 8 is a diagram for explaining how the stress of a word is determined in step S270 shown in FIG. 2.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The thickness of the lines and the size of the components shown in the drawings may be exaggerated for clarity and convenience of explanation.

The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators. These terms should therefore be defined on the basis of the content of this specification as a whole.

A pronunciation evaluation system according to an embodiment of the present invention is described in more detail below.

FIG. 1 is a block diagram illustrating a deep-learning-based pronunciation evaluation system according to an embodiment of the present invention.

As shown in FIG. 1, the pronunciation evaluation system 100 according to an embodiment of the present invention includes a text providing unit 110, a spoken voice collecting unit 120, a text conversion unit 130, an evaluation unit 140, a misrecognized word collection unit 150, and a control unit 160.

First, the text providing unit 110 provides sentences for learning. A provided sentence may be a line from a movie or drama, a conversation needed in daily life, or a song lyric.

The spoken voice collecting unit 120 collects the voice uttered by the learner in response to the provided sentence. The learner turns on the microphone of the terminal and reads the provided text aloud into the microphone, whereby the learner terminal collects the voice uttered by the learner.

The text conversion unit 130 converts the collected voice into text.

The evaluation unit 140 compares the utterance with the sentence provided for learning to obtain a matching rate. The evaluation unit 140 builds a deep learning model and inputs the utterance and the provided sentence into it. The deep learning model then divides the utterance and the provided sentence into words and matches the words against each other to obtain the matching rate.

The misrecognized word collection unit 150 uses the evaluation results to collect information on words the learner persistently mispronounces and words that cause recognition errors, and stores the collected words in table form.

Finally, when the utterance matches the provided sentence, the control unit 160 quantifies the evaluation result and presents it color-coded according to pronunciation accuracy and stress agreement; when the utterance does not match the provided sentence, it transmits a re-utterance request signal for the provided sentence.

A method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention is described in more detail below with reference to FIGS. 2 to 8.

FIG. 2 is a flowchart illustrating a method of evaluating pronunciation using the pronunciation evaluation system according to an embodiment of the present invention; FIG. 3 is a diagram illustrating step S210 shown in FIG. 2; FIG. 4 is a diagram for explaining step S220 shown in FIG. 2; FIG. 5 is an exemplary diagram showing the utterance converted and output in step S240 shown in FIG. 2; FIG. 6 is a diagram for explaining how the matching rate is calculated by the third method in step S250 shown in FIG. 2; FIG. 7 is an exemplary diagram showing the output of the result derived in step S270 shown in FIG. 2; and FIG. 8 is a diagram for explaining how the stress of a word is determined in step S270 shown in FIG. 2.

First, the learner downloads and launches an application capable of evaluating pronunciation. Then, as shown in FIG. 3, the text providing unit 110 provides sentences available for learning (S210).

The learner may select any one of the provided sentences or may proceed sequentially from the first sentence.

When sentence selection is complete, the control unit 160 receives the information selected by the learner: the learner's gender and the accent of the specific country against which the learner wishes to be evaluated (S220).

As shown in FIG. 4, the present invention allows the evaluation to be performed in accordance with the language variety selected by the learner, that is, British or American English.

Next, when the learner records the provided sentence, the spoken voice collecting unit 120 receives the learner's uttered voice produced during the recording (S230).

The text conversion unit 130 converts the received voice into text (S240).

As shown in FIG. 5, the converted utterance is output immediately through the application.

More specifically, the text conversion unit 130 classifies the uttered voice word by word and obtains the utterance from the classified words. In doing so, the text conversion unit 130 generates a plurality of candidate sentences in addition to the utterance.

For example, when the provided sentence is "I am hungry", the text conversion unit 130 converts the voice uttered for the provided sentence into "I am hungry" and additionally generates candidate sentences such as "I was hungry" and "I am angry".
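
The output of the text conversion unit described above (an utterance, per-word time information, and several candidate sentences) could be held in a structure like the following. This is only an illustrative sketch: the class names, field names, and the timing values are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class TimedWord:
    """One recognized word with its (assumed) start and end time in seconds."""
    text: str
    start_sec: float
    end_sec: float


@dataclass
class RecognitionResult:
    """Utterance plus alternative candidate sentences (hypothetical layout)."""
    utterance: list[TimedWord]
    candidates: list[str] = field(default_factory=list)

    def text(self) -> str:
        # Join the recognized words back into the utterance sentence.
        return " ".join(word.text for word in self.utterance)


# Example values mirror the "I am hungry" case; the times are made up.
result = RecognitionResult(
    utterance=[
        TimedWord("I", 0.00, 0.20),
        TimedWord("am", 0.20, 0.45),
        TimedWord("hungry", 0.45, 1.10),
    ],
    candidates=["I was hungry", "I am angry"],
)
print(result.text())
```

Keeping the time span next to each word is what later allows a waveform segment to be associated with that word.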

When step S240 is completed, the evaluation unit 140 compares the utterance with the provided sentence to calculate the matching rate (S250).

The evaluation unit 140 calculates the matching rate by any one of three methods.

In the first method, the evaluation unit 140 compares the provided sentence and the utterance word by word and calculates the matching rate according to the number of matching words.

For example, if the provided sentence is "I want to go home" and the utterance is "I want to go home", the evaluation unit 140 calculates a matching rate of 100% for that utterance.

If, on the other hand, the utterance is "I go home", the evaluation unit 140 calculates a matching rate of 20% for that utterance.
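
The 100% and 20% figures above are consistent with comparing the words position by position and dividing the number of matches by the number of words in the provided sentence. The specification does not state the alignment rule, so the sketch below is one reading under that assumption; the function name and the lower-casing step are also assumptions.

```python
def positional_matching_rate(provided: str, utterance: str) -> float:
    """Percentage of provided-sentence words matched at the same position."""
    ref = provided.lower().split()
    hyp = utterance.lower().split()
    if not ref:
        return 0.0
    # zip() stops at the shorter sentence, so missing words count as misses.
    matches = sum(1 for r, h in zip(ref, hyp) if r == h)
    return 100.0 * matches / len(ref)


print(positional_matching_rate("I want to go home", "I want to go home"))  # 100.0
print(positional_matching_rate("I want to go home", "I go home"))          # 20.0
```

Under this reading only "I" lines up in the second example (1 of 5 words), which is why a single omitted word early in the sentence lowers the rate so sharply.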

In the second method, the evaluation unit 140 divides the provided sentence and the utterance into words, compares them word by word, and calculates the matching rate by assigning a weight to either the provided sentence or the utterance.

For example, suppose the provided sentence "I want to go home" is divided into the words "I", "want", "to", "go", and "home", and the corresponding utterance "I go home" is divided into the words "I", "go", and "home".

The evaluation unit 140 then compares the words one at a time. It assigns the weight to the utterance when a word is pronounced correctly or is pronounced an extra time, and assigns the weight to the provided sentence when a word is mispronounced or omitted, and calculates the matching rate from these weights.
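
The specification does not give the weight values or the formula that combines them, so the following is only one possible reading of the second method. It assumes a unit weight per word, aligns the two word lists with Python's `difflib.SequenceMatcher` (a stand-in, not the deep learning model the specification refers to), and takes the matching rate as the utterance's share of the total weight. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def weighted_matching_rate(provided: str, utterance: str) -> float:
    """Share of total weight credited to the utterance, as a percentage."""
    ref = provided.lower().split()
    hyp = utterance.lower().split()
    utterance_weight = 0  # correct words and extra (repeated) words
    provided_weight = 0   # mispronounced or omitted words
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            utterance_weight += i2 - i1
        elif tag == "insert":
            # Assumption: every extra word is treated like a repeated word.
            utterance_weight += j2 - j1
        else:  # "replace" (mispronounced) or "delete" (omitted)
            provided_weight += i2 - i1
    total = utterance_weight + provided_weight
    return 100.0 * utterance_weight / total if total else 0.0


print(weighted_matching_rate("I want to go home", "I go home"))
```

With the example sentences, "I", "go", and "home" are credited to the utterance and "want" and "to" to the provided sentence, so this sketch yields 60%, which shows how the second method can be more forgiving of omissions than a strict position-by-position comparison.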

Finally, in the third method, the evaluation unit 140 divides the provided sentence and the utterance into words. Then, as shown in FIG. 6, the evaluation unit 140 sets up a plurality of possible cases from the classified words and calculates the matching rate using the set cases.

Next, the misrecognized word collection unit 150 extracts, from the evaluated utterance, words that were not recognized or that the learner failed to pronounce, and stores the extracted words in table form (S260).

Numbers, infrequently used words, words expressing human emotion, and the like are difficult to recognize and are therefore likely to be recognized incorrectly and converted wrongly. The misrecognized word collection unit 150 therefore extracts the misrecognized words and stores them in table form.

The misrecognized words are stored in table form so that an accurate recognition rate can be achieved when the utterance is compared and judged correct or incorrect.
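
A table of misrecognized words as described above could be kept as simple counts of (recognized word, expected word) pairs. The specification does not define the table's columns or how entries are selected, so the class name, the pair-count layout, and the restriction to equal-length sentences are all assumptions made for this sketch.

```python
from collections import Counter


class MisrecognitionTable:
    """Counts how often an expected word was recognized as something else."""

    def __init__(self) -> None:
        # Key: (recognized word, expected word); value: number of occurrences.
        self.counts: Counter = Counter()

    def record(self, provided: str, utterance: str) -> None:
        ref = provided.lower().split()
        hyp = utterance.lower().split()
        if len(ref) != len(hyp):
            return  # this sketch only handles one-to-one word substitutions
        for expected, recognized in zip(ref, hyp):
            if expected != recognized:
                self.counts[(recognized, expected)] += 1

    def rows(self) -> list:
        """Table rows as (recognized, expected, count), most frequent first."""
        return [(rec, exp, n) for (rec, exp), n in self.counts.most_common()]


table = MisrecognitionTable()
table.record("I have 15 apples", "I have 50 apples")
table.record("I have 15 apples", "I have 50 apples")
print(table.rows())  # [('50', '15', 2)]
```

A number pair is used in the example because numbers are named in the text as a typical source of recognition errors; rows that recur often are the natural candidates for the continuously updated conversion table.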

Finally, the control unit 160 outputs the analysis result for an utterance with a high matching rate (S270).

As shown in FIG. 7, the control unit 160 analyzes and outputs the pronunciation accuracy, the stress, and the degree of similarity to the accent of the selected country.

As shown in FIG. 8, the control unit 160 obtains a waveform from the learner's uttered voice and obtains the stress of each word by matching the waveform against the time information of the words in the utterance.

The control unit 160 may also determine the stress of a word according to the loudness of the learner's voice.
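
The idea of matching the waveform against per-word time information and judging stress from loudness can be sketched as follows. This is a simplified illustration, not the method of FIG. 8: it assumes the waveform is a list of amplitude samples, uses root-mean-square (RMS) energy per word as the loudness measure, treats the loudest word as the stressed one, and uses hypothetical color names for the feedback.

```python
import math


def word_energies(samples, sample_rate, word_times):
    """RMS energy of the waveform segment belonging to each timed word."""
    energies = []
    for word, start_sec, end_sec in word_times:
        segment = samples[int(start_sec * sample_rate):int(end_sec * sample_rate)]
        rms = math.sqrt(sum(s * s for s in segment) / len(segment)) if segment else 0.0
        energies.append((word, rms))
    return energies


def loudest_word(samples, sample_rate, word_times):
    """Word whose segment has the highest RMS energy (assumed to be stressed)."""
    return max(word_energies(samples, sample_rate, word_times), key=lambda p: p[1])[0]


def stress_color(expected_word, detected_word):
    """Hypothetical color coding: green if the stress matches, red otherwise."""
    return "green" if expected_word == detected_word else "red"


def tone(amplitude, seconds, sample_rate, freq_hz=100.0):
    """Synthetic sine tone standing in for a recorded word."""
    count = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(count)]


SAMPLE_RATE = 1000
samples = tone(0.2, 0.5, SAMPLE_RATE) + tone(0.8, 0.5, SAMPLE_RATE)
word_times = [("hello", 0.0, 0.5), ("world", 0.5, 1.0)]  # made-up timings

detected = loudest_word(samples, SAMPLE_RATE, word_times)
print(detected, stress_color("world", detected))  # world green
```

A real recording would need the word boundaries supplied by the text conversion unit, and stress within a word would require syllable-level rather than word-level segments.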

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. The true technical scope of protection of the present invention should therefore be determined by the technical spirit of the claims below.

100: pronunciation evaluation system
110: text providing unit
120: spoken voice collecting unit
130: text conversion unit
140: evaluation unit
150: misrecognized word collection unit
160: control unit

Claims

A pronunciation evaluation system based on deep learning and natural language processing (NLP) technology, comprising:
a text providing unit that provides sentences for learning;
a spoken voice collecting unit that collects the voice uttered by a learner in response to a provided sentence;
a text conversion unit that converts the uttered voice into text and obtains the converted text as an utterance;
an evaluation unit that compares and analyzes the obtained utterance and the provided sentence using a pre-built deep learning model to calculate a matching rate for the sentence;
a misrecognized word collection unit that uses the evaluation results to collect information on words the learner persistently mispronounces and words that cause recognition errors, and stores the collected words in table form; and
a control unit that, when the utterance matches the provided sentence, quantifies the evaluation result and presents it color-coded according to pronunciation accuracy and stress agreement, and, when the utterance does not match the provided sentence, transmits a re-utterance request signal for the provided sentence,
wherein the evaluation unit
divides the provided sentence and the utterance into words, compares them word by word, and calculates the matching rate by assigning a weight to the provided sentence or the utterance.

The pronunciation evaluation system according to claim 1,
wherein the text conversion unit
classifies the uttered voice word by word, generates the utterance obtained from the classified words together with a plurality of candidate sentences similar to the utterance, and provides time information corresponding to each classified word.

The pronunciation evaluation system according to claim 1,
wherein the evaluation unit
compares the provided sentence and the utterance word by word and calculates the matching rate according to the number of matching words.

(Deleted)

The pronunciation evaluation system according to claim 1,
wherein the evaluation unit
assigns the weight to the utterance when a word is pronounced correctly or is pronounced an extra time, and
assigns the weight to the provided sentence when a word is mispronounced or omitted.

The pronunciation evaluation system according to claim 1,
wherein the evaluation unit
divides the provided sentence and the utterance into words, sets up a plurality of possible cases from the classified words, and calculates the matching rate using the set cases.

The pronunciation evaluation system according to claim 1,
wherein the misrecognized word collection unit
extracts numbers, infrequently used words, and words that the learner repeatedly mispronounces, classifies them as misrecognized words, and stores the classified misrecognized words in table form.

The pronunciation evaluation system according to claim 1,
wherein the control unit
obtains a waveform from the learner's uttered voice,
obtains the stress of each word by matching the obtained waveform against the time information of the words in the utterance, compares the obtained stress with the stress of the original word, and outputs whether it is correct using color.