KR102755648B1

KR102755648B1 - Artificial intelligence-based multilingual interpretation system and service method

Info

Publication number: KR102755648B1
Application number: KR1020230140856A
Authority: KR
Inventors: 김운
Original assignee: 주식회사 디엠티랩스; (주)넥타르소프트
Priority date: 2023-10-20
Filing date: 2023-10-20
Publication date: 2025-01-22
Anticipated expiration: 2043-10-20

Abstract

The present invention relates to an artificial intelligence-based multilingual interpretation system and service method, and more specifically, to an artificial intelligence-based multilingual interpretation system and service method that can output translated text and translated speech in a customized manner according to an output mode for each user. To this end, the artificial intelligence-based multilingual interpretation system includes: a speech recognition unit that converts target speech in a first language into a first output text using a deep learning-based, language-specific speech recognition model; an automatic translation unit that translates the first output text into a second output text in a second language through an automatic translation algorithm; a speech synthesis unit that synthesizes translated speech in the second language for the second output text, based on speech features analyzed from the target speech through HIFI-GAN technology; and an integrated service unit that provides a service in which the second output text and the translated speech are output in a customized manner according to a pre-registered output mode for each user through a web-based interpretation service, wherein the web-based interpretation service is a web page accessible from each customer terminal through a network.

Description

{ARTIFICIAL INTELLIGENCE-BASED MULTILINGUAL INTERPRETATION SYSTEM AND SERVICE METHOD}

The present invention relates to an artificial intelligence-based multilingual interpretation system and service method, and more specifically, to an artificial intelligence-based multilingual interpretation system and service method that can output translated text and translated speech in a customized manner according to an output mode for each user.

With the development of transportation and communication, exchanges of people and goods between countries have become more active. Despite this expansion, the different languages of different countries still act as a barrier to communication.

Translation is the conversion between written text in different languages, performed to alleviate the inconvenience that language differences cause. A spoken language translation system, by contrast, converts between speech in different languages; interpretation of broadcast news is one example.

In particular, at international conferences the inconvenience between speakers of different languages is resolved through simultaneous interpretation. Simultaneous interpretation, once the exclusive domain of human simultaneous interpreters, is now being performed automatically by machines thanks to advances in speech recognition, automatic translation, and speech synthesis.

Here, automatic interpretation converts speech in a first language into speech in a second language, in both directions. Specifically, automatic interpretation refers to the process and technology of converting an utterance in the first language into the second language through steps such as speech recognition and automatic translation, and then either outputting the result as subtitles or playing it through a speaker after speech synthesis.

The present invention aims to provide an artificial intelligence-based multilingual interpretation system and service method that can output the translated text and translated speech obtained through automatic interpretation in a customized manner according to an output mode for each user.

The present invention is intended to solve the above problems, and an object of the present invention is to provide an artificial intelligence-based multilingual interpretation system and service method that can output translated text and translated speech in a customized manner according to an output mode for each user.

The above and other objects and advantages of the present invention will become apparent from the following description of preferred embodiments.

To achieve the above object, an artificial intelligence-based multilingual interpretation system according to an embodiment of the present invention includes: a speech recognition unit that converts target speech in a first language into a first output text using a deep learning-based, language-specific speech recognition model; an automatic translation unit that translates the first output text into a second output text in a second language through an automatic translation algorithm; a speech synthesis unit that synthesizes translated speech in the second language for the second output text, based on speech features analyzed from the target speech through an end-to-end speech synthesis technology such as HIFI-GAN; and an integrated service unit that provides a service in which the second output text and the translated speech are output in a customized manner according to a pre-registered output mode for each user through a web-based interpretation service, wherein the web-based interpretation service is a web page accessible from each customer terminal through a network.

To achieve the above object, a method for providing an artificial intelligence-based multilingual interpretation service according to an embodiment of the present invention includes the steps of: a speech recognition unit receiving, from a customer terminal, target speech in a first language recorded through a web-based interpretation service; the speech recognition unit converting the target speech into a first output text using a deep learning-based, language-specific speech recognition model; an automatic translation unit translating the first output text into a second output text in a second language through an automatic translation algorithm; a speech synthesis unit synthesizing translated speech in the second language for the second output text, based on speech features analyzed from the target speech through an end-to-end speech synthesis technology such as HIFI-GAN; and an integrated service unit providing a service in which the second output text and the translated speech are output to the customer terminal in a customized manner according to a user-specific output mode pre-registered through the interpretation service.

According to an embodiment of the present invention, providing a service in which translated text and translated speech are output in a customized manner according to an output mode for each user makes simultaneous interpretation easier to understand accurately and more convenient to use.

FIG. 1 is a diagram schematically illustrating an artificial intelligence-based multilingual interpretation system (1000) according to an embodiment of the present invention.
FIGS. 2(A) to 2(D) are examples showing the web-based interpretation service.
FIG. 3 is a block diagram showing an embodiment of the automatic translation unit (200) of FIG. 1.
FIG. 4 is a block diagram showing one embodiment of the integrated service unit (400) of FIG. 1.
FIG. 5 is a block diagram specifically explaining the decision unit (420) of FIG. 4.
FIG. 6 is a flowchart schematically illustrating the artificial intelligence-based multilingual interpretation service method of the multilingual interpretation system (1000) of FIG. 1.
FIG. 7 is an operation process specifically showing the operation of the automatic translation unit (200) of FIG. 3.
FIG. 8 is an operation process specifically showing the operation of the integrated service unit (400) of FIG. 4 according to one embodiment.

Hereinafter, the present invention will be described in detail with reference to embodiments and drawings. It will be apparent to those skilled in the art that these embodiments are presented only as examples to explain the present invention more specifically, and that the scope of the present invention is not limited by them.

In addition, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, and in case of conflict, the description in this specification, including definitions, shall prevail.

To describe the proposed invention clearly, parts of the drawings that are unrelated to the description have been omitted, and similar parts are given similar reference numerals throughout the specification. When a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, a "unit" as used in the specification means a single unit or block that performs a specific function.

The identifiers used for the steps (first, second, and so on) are for convenience of explanation and do not describe the order of the steps. Unless the context clearly specifies a particular order, the steps may be performed in an order different from the one stated. That is, the steps may be performed in the stated order, substantially simultaneously, or in the reverse order.

FIG. 1 is a diagram schematically illustrating an artificial intelligence-based multilingual interpretation system (1000) according to an embodiment of the present invention, and FIGS. 2(A) to 2(D) are examples showing the web-based interpretation service.

Referring to FIGS. 1 to 2(D), the artificial intelligence-based multilingual interpretation system (1000) may include a speech recognition unit (100), an automatic translation unit (200), a speech synthesis unit (300), and an integrated service unit (400).
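The data flow between the four units can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `interpret`, the `InterpretationResult` type, and the callable signatures are assumptions introduced here, and the three callables stand in for the speech recognition unit (100), the automatic translation unit (200), and the speech synthesis unit (300).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InterpretationResult:
    """What the integrated service unit (400) would receive for output."""
    first_output_text: str    # recognized text in the first language
    second_output_text: str   # translated text in the second language
    translated_speech: bytes  # synthesized speech in the second language


def interpret(
    target_speech: bytes,
    recognize: Callable[[bytes], str],          # stand-in for unit 100
    translate: Callable[[str], str],            # stand-in for unit 200
    synthesize: Callable[[str, bytes], bytes],  # stand-in for unit 300
) -> InterpretationResult:
    """Chain recognition, translation, and synthesis in order."""
    first_text = recognize(target_speech)
    second_text = translate(first_text)
    # The synthesizer also receives the original speech so that it can
    # reuse speech features (pitch, volume, timbre) from the speaker.
    speech = synthesize(second_text, target_speech)
    return InterpretationResult(first_text, second_text, speech)
```

Passing the units in as callables keeps the sketch independent of any particular recognition, translation, or synthesis model.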

First, the speech recognition unit (100) may recognize target speech in the first language and convert it into the first output text using a deep learning-based, language-specific speech recognition model.

Here, the deep learning-based, language-specific speech recognition model may be an end-to-end artificial neural network based on continuous speech, designed to recognize voice, pronunciation, and content.

For example, because a continuous speech-based end-to-end artificial neural network recognizes a continuous utterance as a single sequence and converts it into text, it is more accurate than conventional speech recognition technology and can be used in a variety of environments.

According to one embodiment, the speech recognition unit (100) may remove preset noise and distortion from the target speech in the first language while amplifying its volume, and then normalize the speech frames and speech spectrum according to a preset preprocessing range.

Here, the preset preprocessing range may include a preset speech volume, frequency, and time.
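As a rough illustration of the volume and time aspects of this preprocessing, the sketch below scales a waveform to a target peak amplitude and splits it into fixed-length frames. The function names, the peak-normalization choice, and the default values are assumptions made for illustration; the specification does not state which normalization is used, and noise removal and spectrum normalization are not shown.

```python
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale the waveform so its largest absolute sample equals target_peak."""
    max_abs = max((abs(s) for s in samples), default=0.0)
    if max_abs == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = target_peak / max_abs
    return [s * scale for s in samples]


def split_frames(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the waveform into overlapping frames of frame_len samples."""
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len]) for i in range(0, last_start + 1, hop)]
```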

According to another embodiment, the speech recognition unit (100) may process pre-collected language-specific speech data, field-specific speech data, and data related to symbol and number expressions into refined training data for the deep learning-based, language-specific speech recognition model, using a preset refinement technique that removes noise, improves sound quality, and classifies the data. Here, the refined training data may be labeled with language-specific tags.

According to another embodiment, the speech recognition unit (100) may check whether the target speech in the first language and the first output text match, and, based on the result of that check, register non-matching speech-text data in advance.

In this case, the speech recognition unit (100) may check in advance whether the first output text converted from the target speech in the first language corresponds to registered non-matching speech-text data.

According to another embodiment, the speech recognition unit (100) may correct semantic errors, grammar and spelling errors, and unnecessary words in the first output text converted from the target speech in the first language, using a named entity recognition technology that recognizes place or situation text, a technology that recognizes text tagged with a preset crisis-situation grade, and a natural language processing technology based on situation-specific keywords.

Next, the automatic translation unit (200) may translate the first output text into the second output text in the second language through an automatic translation algorithm.

Here, the automatic translation algorithm used to translate the first output text into the second output text may be at least one of a preset self-attention Transformer-based algorithm and a pre-trained, end-to-end neural machine translation (NMT) algorithm.

In this case, the at least one translation algorithm may be trained in advance on a multilingual domain-specific translation corpus, a civil administration translation corpus, a tourism-related translation corpus, a transportation-related translation corpus, a route guidance-related translation corpus, a shopping-related translation corpus, a food-related translation corpus, and a terminology-related translation corpus.

According to one embodiment, when collecting the corpus data used to train the automatic translation algorithm, the automatic translation unit (200) may filter out and remove in advance sentences outside a preset length range, sentences containing a preset number of symbols, asymmetric sentences in which the source and the translation do not correspond, sentences containing translation equivalents, sentences with inconsistent expressions, and sentences with data bias (suffering).
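Three of these filters (length range, symbol count, and source/translation asymmetry) lend themselves to a simple rule-based sketch. The function name `keep_pair`, the thresholds, and the use of a character-length ratio as the asymmetry test are all assumptions for illustration; the specification does not give concrete values, and the remaining filters (translation equivalents, inconsistent expressions, data bias) are not covered here.

```python
import re

# Any character that is neither a word character nor whitespace counts as a symbol.
SYMBOL_RE = re.compile(r"[^\w\s]")


def keep_pair(
    source: str,
    target: str,
    min_len: int = 3,          # hypothetical lower length bound (characters)
    max_len: int = 200,        # hypothetical upper length bound (characters)
    max_symbols: int = 10,     # reject when this many symbols are present
    max_len_ratio: float = 3.0,  # hypothetical asymmetry limit
) -> bool:
    """Return True if a source/translation sentence pair passes all three filters."""
    for sentence in (source, target):
        if not (min_len <= len(sentence) <= max_len):
            return False  # outside the preset length range
        if len(SYMBOL_RE.findall(sentence)) >= max_symbols:
            return False  # contains the preset number of symbols
    longer = max(len(source), len(target))
    shorter = max(min(len(source), len(target)), 1)  # guard against division by zero
    return longer / shorter <= max_len_ratio  # reject asymmetric pairs
```

A corpus would then be cleaned with something like `[p for p in pairs if keep_pair(*p)]`.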

According to another embodiment, the automatic translation unit (200) may tag, sentence by sentence, each meaning of the polysemous words extracted from the translation corpus data, adjust the distribution ratio of the field-specific corpora according to a preset ratio, and individually set to preset values the token that defines the word representation unit of the automatic translation algorithm, the vocab size that defines the number of words used for training, and the number of model training iterations.

Next, the speech synthesis unit (300) may synthesize and output translated speech in the second language for the second output text, based on speech features analyzed from the target speech in the first language through an end-to-end speech synthesis technology such as HIFI-GAN.

Here, the speech features may include the pitch, volume, and timbre of the voice.

In this case, the end-to-end speech synthesis technology such as HIFI-GAN may be a technology that takes as input a mel spectrogram extracted from the target speech and generates realistic speech.

Next, the integrated service unit (400) may provide a service in which the second output text and the translated speech are output to each customer terminal (10_1 to 10_N) in a customized manner, according to a pre-registered output mode for each user, through the web-based interpretation service.

Here, as shown in FIGS. 2(A) to 2(D), the web-based interpretation service is a web page accessible from each customer terminal (10_1 to 10_N) through a network, and may include various program menu tools for providing multilingual interpretation.

The web-based interpretation service may be implemented so that it can be linked with a call center server, or implemented as a two-way tablet kiosk device for public consultation, a smart device application, or a PC program. For example, the web-based interpretation service may be implemented as an application and used for interpretation at places frequently visited by foreigners, including community service centers (township offices), administrative welfare centers, 119 safety centers, tourist information centers, health and medical centers, public health centers, pharmacies, tertiary general hospitals, general hospitals, hospitals, oriental medicine hospitals, and dental hospitals.

In this case, the user-specific output mode may include first and second individual output modes, a simultaneous output mode, and a sequential repetition learning mode. Specifically, the first individual output mode outputs the translated speech and then the second output text; the second individual output mode outputs the second output text and then the translated speech; the simultaneous output mode outputs the translated speech and the second output text at the same time; and the sequential repetition learning mode outputs the translated speech and the second output text sequentially and repeatedly.
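The four modes can be represented as an enumeration mapped to an output schedule, as in the sketch below. The names `OutputMode` and `output_sequence` are hypothetical. The speech-then-text order inside the sequential repetition learning mode and the default repeat count are assumptions, since the specification only says that the two outputs are repeated sequentially.

```python
from enum import Enum
from typing import List, Tuple


class OutputMode(Enum):
    FIRST_INDIVIDUAL = "first_individual"      # speech, then text
    SECOND_INDIVIDUAL = "second_individual"    # text, then speech
    SIMULTANEOUS = "simultaneous"              # speech and text together
    SEQUENTIAL_REPEAT = "sequential_repeat"    # speech and text, repeated


def output_sequence(mode: OutputMode, repeats: int = 2) -> List[Tuple[str, ...]]:
    """Return the output steps; items inside one tuple are emitted together."""
    if mode is OutputMode.FIRST_INDIVIDUAL:
        return [("speech",), ("text",)]
    if mode is OutputMode.SECOND_INDIVIDUAL:
        return [("text",), ("speech",)]
    if mode is OutputMode.SIMULTANEOUS:
        return [("speech", "text")]
    # Sequential repetition learning mode: repeat the speech/text pair.
    return [("speech",), ("text",)] * repeats
```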

According to one embodiment, the integrated service unit (400) may perform cross-translation verification of the second output text through a preset multilingual translation web service, based on the dialogue importance identified from the first output text.

For example, for everyday conversation, the integrated service unit (400) may identify the dialogue importance of the first output text as lower and therefore skip cross-translation verification of the second output text through the preset multilingual translation web service. For hospital or emergency conversation, the integrated service unit (400) may identify the dialogue importance of the first output text as higher and therefore perform cross-translation verification of the second output text through the preset multilingual translation web service.
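The gating logic described here can be sketched as follows. How dialogue importance is actually identified is not specified, so the keyword list and the functions `dialogue_importance` and `maybe_cross_verify` are hypothetical stand-ins; the `verify` callable represents a call to the preset multilingual translation web service.

```python
from typing import Callable, Optional

# Hypothetical keywords that mark a hospital or emergency conversation.
HIGH_IMPORTANCE_KEYWORDS = ("hospital", "emergency", "ambulance")


def dialogue_importance(first_output_text: str) -> str:
    """Classify the recognized text as 'high' or 'low' importance."""
    lowered = first_output_text.lower()
    if any(keyword in lowered for keyword in HIGH_IMPORTANCE_KEYWORDS):
        return "high"
    return "low"


def maybe_cross_verify(
    first_output_text: str,
    second_output_text: str,
    verify: Callable[[str, str], bool],
) -> Optional[bool]:
    """Run cross-translation verification only for high-importance dialogue.

    Returns None when verification is skipped, else the verifier's result.
    """
    if dialogue_importance(first_output_text) == "low":
        return None  # everyday conversation: skip the external check
    return verify(first_output_text, second_output_text)
```

Skipping the external call for everyday conversation avoids an extra network round trip where a translation error is less costly.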

According to another embodiment, based on the hearing impairment grade identified from the customer information received from the customer terminal (10) through the web-based interpretation service, the integrated service unit (400) may skip the operation of the speech synthesis unit (300) and provide a service in which the corresponding second output text is output through the interpretation service web page.

According to another embodiment, the integrated service unit (400) may adjust the weights of the automatic translation algorithm based on translation grade scores that experts assign to the translation between the first and second output texts.

According to another embodiment, the integrated service unit (400) may adjust the preset control variable values applied to the end-to-end speech synthesis technology such as HIFI-GAN, based on a similarity score calculated between the target speech in the first language and the translated speech in the second language using a preset similarity formula.

Hereinafter, the configuration of the present invention and its effects will be described in more detail through specific examples and comparative examples. However, these examples are intended to explain the present invention more specifically, and the scope of the present invention is not limited to them.

FIG. 3 is a block diagram showing an embodiment of the automatic translation unit (200) of FIG. 1.

Referring to FIGS. 1 to 3, the automatic translation unit (200) may include first and second search units (210, 220) and a translation evaluation unit (230).

First, the first search unit (210) may search, through a web search, for a first word image corresponding to at least one feature word in the first language detected from the first output text.

Next, the second search unit (220) may search, through a web search, for a second word image corresponding to at least one feature word in the second language detected from the second output text.

Next, the translation evaluation unit (230) may evaluate the translation accuracy of the automatic translation algorithm based on whether the first and second word images correspond to each other.

According to an embodiment, when the translation accuracy of the automatic translation algorithm is evaluated to be equal to or higher than a preset value, the integrated service unit (400) may tag the second output text with the first and second word images as associated images when providing the service.

FIG. 4 is a block diagram showing one embodiment of the integrated service unit (400) of FIG. 1.

Referring to FIGS. 1 to 4, the integrated service unit (400) may include an interpretation test unit (410), a decision unit (420), a gap time determination unit (430), and an output control unit (440).

First, the interpretation test unit (410) may provide each customer terminal (10_1 to 10_N) with an interpretation grade test for analyzing the interpretation ability of each pre-registered user and, based on the response information received, determine a subtitle comprehension grade and a speech comprehension grade for each user.

Next, based on the subtitle comprehension grade and the speech comprehension grade for each user, the decision unit (420) may set the user-specific output mode for the second output text and the translated speech to one of the first and second individual output modes, the simultaneous output mode, and the sequential repetition learning mode.

Next, the gap time determination unit (430) may determine the output gap time between the second output text and the translated speech, based on the speech rate and the amount of speech identified from the target speech in the first language.

For example, when the speech rate is equal to or greater than a preset rate and the amount of speech is equal to or greater than a reference value, the gap time determination unit (430) may set the output gap time to be longer than a reference time by a certain interval.
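This rule amounts to a threshold check on two measurements. In the sketch below, the function name `output_gap_time`, the units (syllables per second, syllable count, seconds), and every default value are assumptions chosen only to make the example concrete; the specification leaves them as preset values.

```python
def output_gap_time(
    speech_rate: float,            # e.g. syllables per second (assumed unit)
    speech_amount: int,            # e.g. syllables in the utterance (assumed unit)
    rate_threshold: float = 5.0,   # hypothetical preset rate
    amount_threshold: int = 50,    # hypothetical reference value
    base_gap: float = 1.0,         # hypothetical reference time in seconds
    extension: float = 0.5,        # hypothetical extra interval in seconds
) -> float:
    """Return the gap between text output and speech output, in seconds."""
    if speech_rate >= rate_threshold and speech_amount >= amount_threshold:
        # Fast, long utterances get a longer gap than the reference time.
        return base_gap + extension
    return base_gap
```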

Next, based on the output gap time and the user-specific output mode, the output control unit (440) may determine the output order of the second output text and the translated speech and output them to each customer terminal (10_1 to 10_N) through the web-based interpretation service.

According to an embodiment, the output control unit (440) may forcibly switch whichever output mode is in use to the first individual output mode, based on a preset psychological state analyzed from the target speech in the first language.

For example, when the psychological state analyzed from the target speech in the first language is a state of tension, the output control unit (440) may forcibly switch the current output mode to the first individual output mode, so that the user can grasp the mood of the translated speech in advance.

FIG. 5 is a block diagram specifically explaining the decision unit (420) of FIG. 4.

Referring to FIGS. 4 and 5, the decision unit (420) may include first to fourth mode registration units (421 to 424).

First, the first mode registration unit (421) may register the user-specific output mode for the second output text and the translated speech as the first individual output mode when the subtitle comprehension level is lower than a preset threshold level and the voice comprehension level is equal to or higher than the preset threshold level.

Next, the second mode registration unit (422) may register the user-specific output mode for the second output text and the translated speech as the second individual output mode when the voice comprehension level is lower than the preset threshold level and the subtitle comprehension level is equal to or higher than the preset threshold level.

Next, the third mode registration unit (423) may register the user-specific output mode for the second output text and the translated speech as the simultaneous output mode when both the voice comprehension level and the subtitle comprehension level are equal to or higher than the preset threshold level.

Next, the fourth mode registration unit (424) may register the user-specific output mode for the second output text and the translated speech as the sequential repetition learning mode when both the voice comprehension level and the subtitle comprehension level are lower than the preset threshold level.
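The four registration rules above amount to a two-by-two decision table, which might be sketched as below. The sketch assumes a single shared threshold for both levels and assumes that a numerically larger level means better comprehension; the specification states neither. The names `OutputMode` and `register_output_mode` are hypothetical.

```python
from enum import Enum


class OutputMode(Enum):
    FIRST_INDIVIDUAL = "first_individual"            # registered by unit 421
    SECOND_INDIVIDUAL = "second_individual"          # registered by unit 422
    SIMULTANEOUS = "simultaneous"                    # registered by unit 423
    SEQUENTIAL_REPETITION = "sequential_repetition"  # registered by unit 424


def register_output_mode(subtitle_level: int,
                         voice_level: int,
                         threshold: int) -> OutputMode:
    """Map the two comprehension levels to a user-specific output mode.

    Assumption: larger level = better comprehension, one shared threshold.
    """
    subtitle_ok = subtitle_level >= threshold
    voice_ok = voice_level >= threshold
    if voice_ok and not subtitle_ok:
        return OutputMode.FIRST_INDIVIDUAL
    if subtitle_ok and not voice_ok:
        return OutputMode.SECOND_INDIVIDUAL
    if subtitle_ok and voice_ok:
        return OutputMode.SIMULTANEOUS
    return OutputMode.SEQUENTIAL_REPETITION
```

Because each level is either below the threshold or at/above it, the four branches are mutually exclusive and cover every input.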

FIG. 6 is a flowchart schematically illustrating the artificial intelligence-based multilingual interpretation service method of the multilingual interpretation system (1000) of FIG. 1.

Referring to FIGS. 1 and 6, first, in step S110, the speech recognition unit (100) may receive a target speech in a first language, recorded through the web-based interpretation service, from any one customer terminal (e.g., 10_1).

Then, in step S120, the speech recognition unit (100) may convert the target speech in the first language into a first output text using a deep learning-based language-specific speech recognition model.

At this time, in step S130, the automatic translation unit (200) may translate the first output text converted by the speech recognition unit (100) into a second output text in a second language through an automatic translation algorithm.

According to an embodiment, in step S130, the integrated service unit (400) may perform cross-translation verification of the second output text through a preset multilingual translation web service based on the dialogue importance identified from the first output text.

For example, when the dialogue importance is identified as higher than a certain level, the integrated service unit (400) may perform cross-translation verification of the second output text through a multilingual service. Conversely, when the dialogue importance is identified as lower than the certain level, the integrated service unit (400) may skip the cross-translation verification of the second output text through the multilingual services of OpenAI's Whisper, Amazon TTS, and Google ASR.
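The importance gate described above could look roughly like the following. The specification does not say how dialogue importance is scored or how the external service reports its result, so the numeric score, the threshold, and the `verify` callable are assumptions; no real service API is modeled here.

```python
from typing import Callable, Optional


def cross_verify_if_important(second_output_text: str,
                              importance: float,
                              importance_threshold: float,
                              verify: Callable[[str], bool]) -> Optional[bool]:
    """Run cross-translation verification only for important dialogue.

    `verify` is a hypothetical stand-in for a call to a multilingual
    translation web service. Returns its result, or None when skipped.
    """
    if importance > importance_threshold:
        return verify(second_output_text)
    return None  # verification skipped for low-importance dialogue
```

Returning `None` for the skipped case keeps "not verified" distinct from "verified and failed".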

Then, in step S140, the speech synthesis unit (300) may synthesize a translated speech in the second language for the second output text based on speech features analyzed from the target speech through an end-to-end speech synthesis technology such as HIFI-GAN.

Thereafter, in step S150, the integrated service unit (400) may provide a service in which the second output text and the translated speech are output to any one customer terminal (e.g., 10_1) in a customized manner, through the web-based interpretation service, according to the pre-registered user-specific output mode.
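Steps S120 to S140 form a simple chain whose result is handed to the integrated service unit in S150. A minimal sketch of that data flow is given below, with each unit injected as a callable. The class names and callable signatures are assumptions made for illustration, and the stubs stand in for the actual recognition, translation, and synthesis models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InterpretationResult:
    first_output_text: str
    second_output_text: str
    translated_speech: bytes


@dataclass
class InterpretationPipeline:
    recognize: Callable[[bytes], str]          # S120: speech recognition unit (100)
    translate: Callable[[str], str]            # S130: automatic translation unit (200)
    synthesize: Callable[[str, bytes], bytes]  # S140: speech synthesis unit (300)

    def run(self, target_speech: bytes) -> InterpretationResult:
        first_text = self.recognize(target_speech)
        second_text = self.translate(first_text)
        # The original speech is passed along so that the synthesizer can
        # use the speaker's speech features.
        speech = self.synthesize(second_text, target_speech)
        return InterpretationResult(first_text, second_text, speech)
```

Injecting the three stages keeps the sketch independent of any particular model and lets each stage be replaced by a stub when testing the flow.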

FIG. 7 illustrates the operation process of the automatic translation unit (200) of FIG. 3 in detail.

Referring to FIGS. 3, 6, and 7, first, in step S210, the automatic translation unit (200) may search, through a web search, for a first word image corresponding to at least one first feature word detected from the first output text.

Then, in step S220, the automatic translation unit (200) may search, through a web search, for a second word image corresponding to at least one first feature word detected from the second output text.

Thereafter, the automatic translation unit (200) may evaluate the translation accuracy of the automatic translation algorithm based on whether the output probability values obtained by applying the first and second word images to an image classification model correspond to each other.
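One way to read "whether the output probability values correspond" is sketched below. The specification does not define the correspondence test, so the criterion used here (same top class, with top probabilities within a tolerance) and the aggregate score (fraction of corresponding image pairs) are assumptions, as are the function names. The classifier outputs are taken as plain label-to-probability mappings.

```python
from typing import Mapping, Sequence, Tuple

Probs = Mapping[str, float]  # class label -> probability from an image classifier


def images_correspond(probs_a: Probs, probs_b: Probs,
                      tolerance: float = 0.2) -> bool:
    """Assumed criterion: same top class and similar top probability."""
    top_a = max(probs_a, key=probs_a.get)
    top_b = max(probs_b, key=probs_b.get)
    return top_a == top_b and abs(probs_a[top_a] - probs_b[top_b]) <= tolerance


def translation_accuracy(pairs: Sequence[Tuple[Probs, Probs]]) -> float:
    """Fraction of (first word image, second word image) pairs that correspond."""
    if not pairs:
        return 0.0
    matches = sum(images_correspond(a, b) for a, b in pairs)
    return matches / len(pairs)
```

For instance, if the image found for a source word is classified as "apple" and the image found for its translation is also classified as "apple" with similar confidence, the pair counts as corresponding.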

FIG. 8 illustrates in detail the operation process of the integrated service unit (400) of FIG. 4 according to an embodiment.

Referring to FIGS. 4 and 8, first, in step S310, the integrated service unit (400) may provide an interpretation level test for analyzing the user's interpretation ability to any one customer terminal (e.g., 10_1) and, based on the response information received, determine the subtitle comprehension level and the voice comprehension level of that user.

Then, in step S320, the integrated service unit (400) may determine the user-specific output mode for the second output text and the translated speech based on the subtitle comprehension level and the voice comprehension level.

Specifically, when the subtitle comprehension level is lower than a preset threshold level and the voice comprehension level is equal to or higher than the preset threshold level, the integrated service unit (400) may register the user-specific output mode as the first individual output mode.

In addition, when the voice comprehension level is lower than the preset threshold level and the subtitle comprehension level is equal to or higher than the preset threshold level, the integrated service unit (400) may register the user-specific output mode as the second individual output mode.

In addition, when both the voice comprehension level and the subtitle comprehension level are equal to or higher than the preset threshold level, the integrated service unit (400) may register the user-specific output mode as the simultaneous output mode.

In addition, when both the voice comprehension level and the subtitle comprehension level are lower than the preset threshold level, the integrated service unit (400) may register the user-specific output mode as the sequential repetition learning mode.

At this time, in step S330, the integrated service unit (400) may determine the output gap time between the second output text and the translated speech based on the speech rate and the speech amount identified from the target speech.

Then, in step S340, the integrated service unit (400) may determine the output order of the second output text and the translated speech based on the output gap time and the user-specific output mode, and may output them through the web-based interpretation service.
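How the output gap time and the output mode might combine into an output order in step S340 is sketched below as a list of (start offset in seconds, channel) events. The per-mode ordering follows the mode definitions in the claims (first individual: speech then text; second individual: text then speech). Spacing events by exactly one gap time, ignoring playback duration, and repeating twice in the sequential repetition learning mode are assumptions, and `plan_output` is a hypothetical name.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (start offset in seconds, "speech" or "text")


def plan_output(mode: str, gap_time_s: float, repetitions: int = 2) -> List[Event]:
    """Build an output schedule for the translated speech and translated text."""
    if mode == "first_individual":
        return [(0.0, "speech"), (gap_time_s, "text")]
    if mode == "second_individual":
        return [(0.0, "text"), (gap_time_s, "speech")]
    if mode == "simultaneous":
        return [(0.0, "speech"), (0.0, "text")]
    if mode == "sequential_repetition":
        events: List[Event] = []
        for i in range(repetitions):
            start = 2 * i * gap_time_s  # each round occupies two gap intervals
            events.append((start, "speech"))
            events.append((start + gap_time_s, "text"))
        return events
    raise ValueError(f"unknown output mode: {mode}")
```

A real scheduler would also need to account for how long the translated speech takes to play, which this sketch leaves out.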

Furthermore, a computer-readable recording medium including a program for executing the artificial intelligence-based multilingual interpretation service according to an embodiment of the present invention may be implemented.

In this case, the artificial intelligence-based multilingual interpretation service may be implemented to include: a step in which the speech recognition unit receives, from a customer terminal, a target speech in a first language recorded through the web-based interpretation service; a step in which the speech recognition unit converts the target speech in the first language into a first output text using a deep learning-based language-specific speech recognition model; a step in which the automatic translation unit translates the first output text into a second output text in a second language through an automatic translation algorithm; a step in which the speech synthesis unit synthesizes and outputs a translated speech in the second language for the second output text based on speech features analyzed from the target speech through an end-to-end speech synthesis technology such as HIFI-GAN; and a step in which the integrated service unit provides a service in which the second output text and the translated speech are output in a customized manner, through the web-based interpretation service, according to the pre-registered user-specific output mode.

Although this specification describes only a few of the various embodiments carried out by the present inventors, the technical idea of the present invention is not limited thereto and may of course be modified and implemented in various ways by those skilled in the art.

10_1~10_N: Customer terminals
100: Speech recognition unit
200: Automatic translation unit
300: Speech synthesis unit
400: Integrated service unit
1000: Artificial intelligence-based multilingual interpretation system

Claims

An artificial intelligence-based multilingual interpretation system comprising:
a speech recognition unit that converts a target speech in a first language into a first output text using a deep learning-based language-specific speech recognition model;
an automatic translation unit that translates the first output text into a second output text in a second language through an automatic translation algorithm;
a speech synthesis unit that synthesizes a translated speech in the second language for the second output text based on speech features analyzed from the target speech in the first language through an end-to-end speech synthesis technology such as HIFI-GAN; and
an integrated service unit that provides a service in which the second output text and the translated speech are output in a customized manner, through a web-based interpretation service, according to a pre-registered output mode for each user,
wherein the web-based interpretation service is a web page accessible from each customer terminal through a network,
wherein the user-specific output modes include first and second individual output modes, a simultaneous output mode, and a sequential repetition learning mode,
wherein the first individual output mode is a mode that outputs the translated speech and then outputs the second output text,
wherein the second individual output mode is a mode that outputs the second output text and then outputs the translated speech,
wherein the simultaneous output mode is a mode that outputs the translated speech and the second output text simultaneously,
wherein the sequential repetition learning mode is a mode that sequentially and repeatedly outputs the translated speech and the second output text,
wherein the automatic translation unit includes:
a first search unit that searches, through a web search, for a first word image corresponding to at least one first feature word detected from the first output text;
a second search unit that searches, through a web search, for a second word image corresponding to at least one first feature word detected from the second output text; and
a translation evaluation unit that evaluates the translation accuracy of the automatic translation algorithm based on whether the first and second word images correspond to each other,
wherein the integrated service unit tags the first and second word images to the second output text as related images and provides them through the service when the translation accuracy of the automatic translation algorithm is evaluated to be higher than a preset value,
wherein the integrated service unit includes:
an interpretation test unit that provides an interpretation level test to each customer terminal and, based on each piece of response information received, determines a subtitle comprehension level and a voice comprehension level for each user;
a decision unit that determines the user-specific output mode based on the subtitle comprehension level and the voice comprehension level;
a gap time determination unit that determines an output gap time between the second output text and the translated speech based on the speech rate and the speech amount identified from the target speech; and
an output control unit that determines the output order of the second output text and the translated speech based on the output gap time and the user-specific output mode and outputs them through the interpretation service,
wherein the decision unit includes:
a first mode registration unit that registers the user-specific output mode as the first individual output mode when the subtitle comprehension level is lower than a preset threshold level and the voice comprehension level is equal to or higher than the preset threshold level;
a second mode registration unit that registers the user-specific output mode as the second individual output mode when the voice comprehension level is lower than the preset threshold level and the subtitle comprehension level is equal to or higher than the preset threshold level;
a third mode registration unit that registers the user-specific output mode as the simultaneous output mode when the voice comprehension level and the subtitle comprehension level are equal to or higher than the preset threshold level; and
a fourth mode registration unit that registers the user-specific output mode as the sequential repetition learning mode when the voice comprehension level and the subtitle comprehension level are lower than the preset threshold level,
wherein the output control unit forcibly switches the user-specific output mode to the first individual output mode based on a preset psychological state analyzed from the target speech,
wherein the integrated service unit selectively performs cross-translation verification of the second output text through a preset multilingual translation web service based on the dialogue importance identified from the first output text, and
wherein the speech recognition unit performs a consistency check between the target speech in the first language and the first output text, pre-registers non-matching speech-to-text data based on the result of the check, and performs a check for the non-matching speech-to-text data when converting the target speech in the first language into the first output text.

delete