KR20020026228A

KR20020026228A - Real Time Speech Translation

Info

Publication number: KR20020026228A
Application number: KR1020020011222A
Authority: KR
Inventors: 백수곤
Original assignee: 백수곤
Priority date: 2002-03-02
Filing date: 2002-03-02
Publication date: 2002-04-06

Abstract

The present invention relates to a method for converting voices or sounds, input through a variety of means, into a specified sound, language, or text in real time.

Description

Real Time Speech Translation

The present invention seeks to devise a method for converting voices or sounds, input through a variety of means, into a specified sound, language, or text in real time.

This technology relates to the real-time conversion of sound.

The present invention aims to find a method for converting voices or sounds, input through a variety of means, into a specified sound, language, or text in real time.

FIG. 1 shows an embodiment of real-time speech conversion according to the present invention.

FIG. 2 shows an embodiment of real-time speech conversion according to the present invention.

FIG. 3 shows an embodiment of real-time speech conversion according to the present invention.

* Description of the main parts of the drawings

1: voice; 2: sound; 11: interpretation; 12: translation

Sound can be characterized by three properties: pitch, which depends on the frequency of vibration; intensity, which depends on the vibration amplitude and is proportional to the energy; and timbre, which depends on the structure of the vibration spectrum. In addition, each speaker has distinctive features such as rhythm, accent, intonation, formants, and weakening, which allow a listener to recognize who is speaking.
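The first two properties above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names, the synthetic 220 Hz test tone, and the use of zero crossings as a rough pitch estimate are all assumptions chosen for illustration.

```python
import math

def sine_wave(freq_hz, amplitude, sample_rate, duration_s):
    """Generate a pure test tone as a list of float samples."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

def rms_intensity(samples):
    """Root-mean-square level: grows with the vibration amplitude."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_pitch(samples, sample_rate):
    """Rough pitch estimate in Hz: upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)

# Hypothetical input: one second of a 220 Hz tone sampled at 8 kHz.
tone = sine_wave(freq_hz=220.0, amplitude=0.5, sample_rate=8000, duration_s=1.0)
pitch = zero_crossing_pitch(tone, 8000)   # close to 220 Hz
level = rms_intensity(tone)               # close to 0.5 / sqrt(2)
```

Zero-crossing counting only works for clean, nearly periodic signals; real speech would need a more robust estimator such as autocorrelation.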

The present invention digitizes these various sound characteristics for each speaker or predetermined entity, then classifies and stores them. Sound being input in real time is compared with the stored data to identify the speaker or entity and, at the same time, is converted into the voice of a person, a language, or text designated in advance by the user and output in real time. As shown in [FIG. 1], the input may be a human voice, a natural sound, or text, and the voices of several people or the sounds of several animals may be input simultaneously. As shown in [FIG. 3], such sound or text may be input from the Internet, a telephone, a mobile phone, a recorder, or any type of equipment that reproduces voice or sound, and it is processed through the stages of input, encoding, conversion, and output shown in [FIG. 2].
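The comparison of incoming sound against stored data could be sketched as a nearest-profile lookup. This is an illustrative assumption, not the patent's stated algorithm: the profile table, the two-element feature vector (mean pitch, mean level), and the scaling factors are all hypothetical.

```python
import math

# Hypothetical stored profiles: entity -> (mean pitch in Hz, mean RMS level).
PROFILES = {
    "speaker_a": (120.0, 0.30),
    "speaker_b": (210.0, 0.25),
    "dog": (450.0, 0.60),
}

def identify(features, profiles, scales=(100.0, 1.0)):
    """Return the stored entity whose profile is nearest to the features.

    Each feature is divided by a scale so that pitch (hundreds of Hz)
    does not completely dominate level (below 1.0) in the distance.
    """
    def distance(profile):
        return math.sqrt(sum(((f - p) / s) ** 2
                             for f, p, s in zip(features, profile, scales)))
    return min(profiles, key=lambda name: distance(profiles[name]))

# Features measured from a hypothetical incoming frame.
match = identify((205.0, 0.28), PROFILES)
```

Once the entity is identified, the user's preconfigured target voice, language, or text format would be looked up and applied in the conversion stage.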

Input speech is analyzed for its isolated-word, connected-word, and continuous-speech characteristics, which are classified and stored as speaker-specific characteristics. The conversion unit interprets the language being input in real time into a specified language, transcribes it into text, and converts it into the voice of a specified person.

In this way, words that were never recorded can be spoken in the voice of a deceased person, or, when a foreign president visits the country and speaks, simultaneous interpretation can be delivered in that person's own voice. Even if person A speaks in English, the sentences are translated and spoken in Korean in A's own voice, and they are recorded in both English and Korean text. Conversion into the voices, languages, and text of various people is possible.

In addition, when the speech of several people is input, one person's speech can be tracked, and one person's speech can also be mixed with the speech of various other people and output.

Recognition and synthesis can operate in a speaker-dependent mode, tracking and recording a specific person's speech among that of several people, or in a speaker-independent mode, extracting the common opinion of several people rather than of one specific person. Because the method is speaker adaptive, it can also track a new speaker's speech and estimate its characteristics.
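One simple way to picture speaker adaptation is a profile that is refined with every new observation of a speaker. The sketch below uses an incremental mean; the class name and the two-element feature vector are hypothetical, and the patent does not specify this update rule.

```python
class AdaptiveProfile:
    """Running mean of feature vectors observed for one speaker."""

    def __init__(self, dims):
        self.mean = [0.0] * dims
        self.count = 0

    def update(self, features):
        """Fold one new observation into the mean without storing history."""
        self.count += 1
        self.mean = [m + (x - m) / self.count
                     for m, x in zip(self.mean, features)]
        return self.mean

# Two hypothetical observations of (pitch in Hz, RMS level) for a new speaker.
profile = AdaptiveProfile(dims=2)
profile.update([100.0, 0.2])
profile.update([120.0, 0.4])
```

After the two updates the profile holds the average of both observations, so the estimate improves as more of the new speaker's speech arrives.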

When a new person's speech is heard, its characteristics are remembered and stored in the database. For dialogue characteristics, a provisional optimal value is first estimated; when actual input corresponding to that provisional value arrives, the provisional value is discarded and replaced with the actual value. In this way provisional values are removed and the database is managed in real time.
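The replacement of provisional values by actual values can be sketched as a small store that flags each entry as inferred or measured. The class, method names, and example trait are hypothetical; this is one possible reading of the behavior described, not the patent's implementation.

```python
class TraitStore:
    """Keeps one value per (speaker, trait), flagged as inferred or actual."""

    def __init__(self):
        self._values = {}  # (speaker, trait) -> (value, is_actual)

    def infer(self, speaker, trait, estimate):
        """Store a provisional estimate unless an actual value already exists."""
        key = (speaker, trait)
        if key in self._values and self._values[key][1]:
            return
        self._values[key] = (estimate, False)

    def observe(self, speaker, trait, value):
        """Record a measured value, discarding any provisional one."""
        self._values[(speaker, trait)] = (value, True)

    def get(self, speaker, trait):
        """Return (value, is_actual), or None if nothing is stored."""
        return self._values.get((speaker, trait))

store = TraitStore()
store.infer("new_speaker", "mean_pitch_hz", 150.0)    # provisional guess
before = store.get("new_speaker", "mean_pitch_hz")
store.observe("new_speaker", "mean_pitch_hz", 163.5)  # real measurement arrives
store.infer("new_speaker", "mean_pitch_hz", 140.0)    # ignored: actual value exists
after = store.get("new_speaker", "mean_pitch_hz")
```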

Through this invention, the following conversions are possible.

From multiple voices:

to multiple modulated voices

to the modulated voice of a single selected person

to multiple interpreted voices

to the interpreted voice of a single selected person

to the text of multiple people

to the text of a specific person

to the translated text of multiple people

to the translated text of a specific person

From one person's voice:

to one person's modulated voice

to an interpreted voice

to the text of a specific person

Therefore, in the present invention, when one or more voices or texts are input in real time, they are sampled for digital encoding, and the resulting code is cut into unit-time segments.
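As a rough illustration of digital encoding followed by cutting into unit-time segments, the sketch below quantizes float samples to integer codes and splits them into fixed-length frames. The bit depth, sample rate, frame length, and function names are assumptions; the patent gives no concrete values.

```python
def quantize(samples, bits=8):
    """Map floats in [-1.0, 1.0] to signed integer codes of the given bit depth."""
    max_code = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_code) for s in samples]

def split_into_frames(codes, sample_rate, frame_ms):
    """Cut the code stream into consecutive unit-time frames.

    A trailing partial frame is dropped so every frame has equal length.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too short for this sample rate")
    return [codes[i:i + frame_len]
            for i in range(0, len(codes) - frame_len + 1, frame_len)]

# Hypothetical input: ten samples at 1 kHz, cut into 4 ms frames.
signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.25]
codes = quantize(signal, bits=8)
frames = split_into_frames(codes, sample_rate=1000, frame_ms=4)
```

Each frame can then be analyzed independently, which is what makes frame-by-frame, real-time processing possible.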

In the step of distinguishing, inputting, storing, and comparing the characteristics of unit timbres, the features of objects and speakers can be extracted, compared, and recognized through Fourier transform and spectral analysis.

The invention is thus constituted as a real-time speech processing method comprising the following steps:

a step of inputting, storing, and comparing the characteristics of sentences consisting of a series of consecutive speech sounds;

a step of extracting a speaker's dialogue characteristics from unit timbres and a series of consecutive speech sounds, and inputting, storing, and comparing them;

a step of interpreting the input language into a specified language;

a step of translating the input text into a specified text;

a step of converting the input language into a specified text;

a step of converting the input sound into a specified sound;

a step of converting the input voice into a specified voice;

a step of distinguishing a specified sound from one or more input sounds;

a step in which, when it is difficult to obtain output with the required conversion characteristics from the data already stored and the data being input, the optimal result for the corresponding step above is inferred and output, and a database records for each characteristic value whether the output was based on an inferred value or an actual value, so that when an actual value is input the inferred value is immediately replaced by the actual value; and

a step of outputting the result of the aforementioned interpretation, translation, voice conversion, and selection of one from many.

With the present invention, a sound or voice input in real time can be output in the voice, language, or text that the user desires.

Claims

A step of digitally encoding one or more voices or texts input in real time and cutting the encoded input into unit-time segments;

a step of inputting, storing, and comparing the characteristics of unit timbres;

a step of inputting, storing, and comparing the characteristics of sentences consisting of a series of consecutive speech sounds;

a step of extracting a speaker's dialogue characteristics from unit timbres and a series of consecutive speech sounds, and inputting, storing, and comparing them;

a step of interpreting the input language into a specified language;

a step of translating the input text into a specified text;

a step of converting the input language into a specified text;

a step of converting the input sound into a specified sound;

a step of converting the input voice into a specified voice;

a step of distinguishing a specified sound from one or more input sounds;

a step in which, when it is difficult to obtain output with the required conversion characteristics from the data already stored and the data being input, the optimal result for the corresponding step above is inferred and output, and a database records for each characteristic value whether the output was based on an inferred value or an actual value, so that when an actual value is input the inferred value is immediately replaced by the actual value; and

a real-time speech processing method comprising a step of outputting the result of the above interpretation, translation, voice conversion, and selection of one from many.

A device for inputting, storing, and comparing the characteristics of unit timbres;

a device for inputting, storing, and comparing the characteristics of sentences consisting of a series of consecutive speech sounds;

a device for extracting a speaker's dialogue characteristics from unit timbres and a series of consecutive speech sounds, and inputting, storing, and comparing them;

a device for interpreting the input language into a specified language;

a device for translating the input text into a specified text;

a device for converting the input language into a specified text;

a device for converting the input sound into a specified sound;

a device for converting the input voice into a specified voice;

a device for distinguishing a specified sound from one or more input sounds;

a device which, when it is difficult to obtain output with the required conversion characteristics from the data already stored and the data being input, infers and outputs the optimal result for the corresponding device above, and which maintains a database recording for each characteristic value whether the output was based on an inferred value or an actual value, so that when an actual value is input the inferred value is immediately replaced by the actual value; and

a real-time speech processing apparatus comprising a device for outputting the result of the above interpretation, translation, voice conversion, and selection of one from many.