WO2019208859A1 - Method for generating a pronunciation dictionary and apparatus therefor - Google Patents
Method for generating a pronunciation dictionary and apparatus therefor
- Publication number
- WO2019208859A1 (PCT/KR2018/004971)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vocabulary
- user
- voice
- phonetic
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- the present invention relates to a method for generating and applying a user dictionary (i.e., a user pronunciation dictionary) in an automatic speech recognition device.
- Speech recognition is a technology that converts speech into text using a computer. This technology has made a rapid improvement in recognition rate in recent years.
- Unregistered words are words that the language processing system cannot process. When a vocabulary item that could not be reflected in the system at development time occurs in service, the system needs a way to cope with it. Without proper handling of unregistered words, the system's output cannot be guaranteed, and the system may even fail to operate.
- the basic way of dealing with unregistered words is to set a default processing rule and, in the worst case, give a fixed answer. For example:
- in linguistic analysis, an unknown vocabulary item may be analyzed as a noun by default
- in translation, an unknown vocabulary item may be passed through unchanged (or converted to its phonetic form)
- an unknown word may simply be output as an 'unknown word' token.
- User dictionaries are a more active way for a service system to respond to unregistered words: the user directly enters the information for a vocabulary item and the system refers to it. For example, in linguistic analysis, users can register a vocabulary item together with its analysis information; in translation, the original vocabulary item together with its translation. The system refers to this information during operation to process unregistered words.
- phonetic symbols have different notation or classification criteria for each system, and there is no unified standard.
- users who are new to the system must first learn the notation, or else rely on a program that converts vocabulary to phonetic symbols; however, most such conversion programs are not 100% accurate.
- An object of the present invention is to propose a method for easily and accurately registering a vocabulary and a corresponding phonetic symbol in speech recognition.
- An aspect of the present invention provides a method of generating a pronunciation dictionary by a speech recognition apparatus, the method comprising: receiving a voice from a user or a mechanical device; recognizing the input voice using an acoustic model; generating a phonetic symbol for the voice based on the recognized result; receiving a user vocabulary from the user; and associating the input user vocabulary with the generated phonetic symbol and storing the pair in the pronunciation dictionary.
- the method may further comprise: evaluating the reliability of the recognized result; and, when the reliability is less than a preset threshold, requesting the user vocabulary corresponding to the phonetic symbol.
- the method may further comprise: selecting a misrecognized and/or unrecognized portion from the recognized result; and requesting the user vocabulary corresponding to the misrecognized and/or unrecognized portion.
- preferably, the method may further comprise: evaluating the reliability of the recognized result; when the reliability is less than a preset threshold, generating a group of vocabulary candidates that can be inferred from the phonetic symbols and searching for them in a text corpus; and, after registering the found vocabulary candidates in the pronunciation dictionary, recognizing the voice again using the acoustic model.
- the method may further include asking the user to confirm whether to register the found vocabulary before registering it in the pronunciation dictionary.
- the method may further comprise: evaluating the reliability of the recognized result; when the reliability is less than a preset threshold, searching for a vocabulary that can be inferred from the phonetic symbols, by storing in a dictionary all inferred phonetic symbols corresponding to each word of a text corpus; and, after registering the found vocabulary in the pronunciation dictionary, recognizing the voice again using the acoustic model.
- the method may further include asking the user to confirm whether to register the found word or words before registering them in the pronunciation dictionary.
- the weight of the acoustic model may be set higher than the weight of the language model to generate the phonetic symbols.
- candidates of all possible letters or phoneme units may be added to the pronunciation dictionary and the language model used to generate the phonetic symbols.
- the pronunciation dictionary and the language model used to generate the phonetic symbols may consist of all possible letter or phoneme units.
- Another aspect of the invention provides a method of generating a pronunciation dictionary in a speech recognition apparatus, comprising: receiving a speech-text parallel corpus in which speech data and text correspond to each other; recognizing the speech data using an acoustic model; generating a phonetic symbol and a text result for the speech data based on the recognized result; and updating the pronunciation dictionary for differently expressed parts by matching the phonetic symbols and the text results against the speech-text parallel corpus.
- the method may further include learning the acoustic model using the updated pronunciation dictionary and the speech-text parallel corpus.
- Another aspect of the invention provides a method comprising: receiving text from the user or a machine; for a vocabulary item of the input text not included in the pronunciation dictionary, requesting a voice of the corresponding vocabulary from the user; recognizing the input voice using an acoustic model; generating a phonetic symbol for the voice based on the recognized result; and associating the input user vocabulary with the generated phonetic symbol and storing the pair in the pronunciation dictionary.
- Another aspect of the invention provides a voice recognition apparatus comprising: a voice input unit configured to receive a voice from a user; a vocabulary input unit for receiving a user vocabulary from the user; a memory for storing a pronunciation dictionary comprising vocabulary items and the phonetic symbols corresponding to them; and a processor for controlling the voice input unit, the vocabulary input unit, and the memory, wherein the processor receives a voice from the user through the voice input unit, recognizes the input voice using an acoustic model, generates a phonetic symbol for the voice based on the recognized result using a language model, receives a user vocabulary from the user through the vocabulary input unit, and associates the input user vocabulary with the generated phonetic symbol and stores the pair in the pronunciation dictionary.
- a user may easily and accurately register a vocabulary and a corresponding phonetic symbol in speech recognition.
- the user needs neither separate training nor a separate conversion program.
- FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention.
- FIG. 2 is a diagram illustrating a pronunciation dictionary of a speech recognition apparatus according to an embodiment of the present invention.
- FIG. 3 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- FIG. 4 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- FIG. 5 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- FIG. 6 illustrates a configuration of a voice recognizer according to an embodiment of the present invention.
- FIG. 7 is a diagram illustrating a method of generating a pronunciation dictionary of a speech recognition apparatus according to an embodiment of the present invention.
- FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention.
- the speech recognition apparatus 100 may include a voice input unit 110 that receives a voice from a user, a vocabulary input unit 120 that receives a vocabulary from a user, a memory 130 that stores various data related to voice (in particular, a pronunciation dictionary, an acoustic model, and a language model), and a processor 140 that processes the input user's voice and stores (updates) the pronunciation dictionary.
- the voice input unit 110 may include a microphone, and when a user's uttered voice is input, the voice input unit 110 converts the voice signal into an electrical signal and outputs it to the processor 140.
- the vocabulary input unit 120 generates input data for a vocabulary input from a user or a mechanical device and outputs the input data to the processor 140.
- the vocabulary input unit 120 may be implemented as a keyboard, a key pad, a touch pad (static pressure / capacitance), or the like.
- the processor 140 may acquire the user's voice data by applying a speech recognition algorithm or a speech recognition engine to the signal received from the voice input unit 110.
- the signal input to the processor 140 may be converted into a form more useful for speech recognition: the processor 140 converts the input signal from analog to digital form and detects the start and end points of the voice to identify the actual speech section/data included in the voice data. This is called end point detection (EPD).
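The end point detection described above can be sketched as a simple per-frame energy threshold; this is a minimal illustration, and the frame length and threshold values are assumptions, not taken from the patent.

```python
def detect_endpoints(samples, frame_len=4, energy_threshold=1.0):
    """Minimal energy-based end point detection (EPD) sketch.

    Splits the samples into frames and treats frames whose mean squared
    amplitude exceeds the threshold as speech; returns the (start, end)
    frame indices of the speech section, or None if no speech is found.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(s * s for s in f) / len(f) for f in frames]
    speech = [i for i, e in enumerate(energies) if e > energy_threshold]
    if not speech:
        return None
    return speech[0], speech[-1]
```

For example, silence-speech-silence input yields the middle frames as the detected speech section; real systems use more robust features than raw frame energy.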
- the processor 140 may apply Cepstrum analysis, Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC), or filter bank energies within the detected interval to extract a feature vector of the signal.
- the processor 140 may store information about the end point of the voice data and the feature vector by using the memory 130 that stores the data.
- the memory 130 may include at least one storage medium among a flash memory, a hard disk, a memory card, a read-only memory (ROM), a random access memory (RAM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.
- the processor 140 may obtain a recognition result by comparing the extracted feature vector with the trained reference pattern.
- an acoustic model for modeling and comparing signal characteristics of speech and a language model for modeling linguistic order relations such as words or syllables corresponding to a recognized vocabulary may be used.
- the acoustic model may be further divided into a direct comparison method of setting a recognition object as a feature vector model and comparing it with a feature vector of speech data and a statistical method of statistically processing the feature vector of the recognition object.
- the direct comparison method is a method of setting a unit of a word, a phoneme, or the like to be recognized as a feature vector model and comparing how similar the input speech is.
- a vector quantization method is used. According to the vector quantization method, a feature vector of input speech data is mapped with a codebook, which is a reference model, and encoded into a representative value to compare the code values.
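The vector quantization step can be sketched as nearest-codebook encoding: each feature vector is mapped to the index of its closest codebook entry. The distance measure (squared Euclidean) and the codebook values in the test are illustrative assumptions.

```python
def encode_with_codebook(feature_vectors, codebook):
    """Vector quantization sketch: map each feature vector to the index
    of the nearest codebook entry (reference model) so that sequences of
    code values can be compared."""
    def sqdist(a, b):
        # squared Euclidean distance between two vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))
            for v in feature_vectors]
```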
- the statistical model method constructs a unit of the recognition object as a state sequence and uses the relationships between state sequences.
- the state sequence may consist of a plurality of nodes.
- methods using the relationships between state sequences include Dynamic Time Warping (DTW), Hidden Markov Models (HMM), and neural networks.
- Dynamic time warping is a method of compensating for differences along the time axis, considering that the dynamic characteristics of speech vary in length over time even when the same person pronounces the same word. The hidden Markov model is a recognition technique that assumes a Markov process with state transition probabilities and observation probabilities of nodes (output symbols) in each state, estimates the state transition probabilities and the node observation probabilities from training data, and calculates the probability that the input voice was generated by the estimated model.
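The time-axis compensation of dynamic time warping can be sketched with the standard dynamic-programming recurrence over two scalar feature sequences; the absolute-difference local cost is an assumption, since the text does not fix a cost function.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping sketch: align two feature sequences of
    possibly different lengths, compensating for differences along the
    time axis, and return the minimal accumulated alignment cost."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local cost (assumed metric)
            # extend the cheapest of: insertion, deletion, or match step
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Two utterances of the same content at different speeds align with zero cost, which is exactly the length-variation problem DTW addresses.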
- a language model that models linguistic order relations such as words or syllables may apply these order relations between the units constituting language to the units obtained in speech recognition, thereby reducing acoustic ambiguity and recognition errors.
- the language model includes statistical language models and models based on finite state automata (FSA); a statistical language model uses chain probabilities of words, such as unigram, bigram, and trigram probabilities.
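The chain-probability idea behind the statistical language model can be sketched with maximum-likelihood bigram estimates; the toy counts in the test are hypothetical, and real systems add smoothing so unseen pairs do not get probability zero.

```python
def bigram_prob(sentence, bigram_counts, unigram_counts):
    """Bigram chain-probability sketch:
    P(w1..wn) ~ product over i of P(wi | wi-1) = count(wi-1, wi) / count(wi-1).
    Returns 0.0 for any unseen bigram (no smoothing in this sketch)."""
    words = sentence.split()
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        if unigram_counts.get(prev, 0) == 0 or (prev, cur) not in bigram_counts:
            return 0.0  # unseen history or bigram: zero without smoothing
        p *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return p
```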
- the processor 140 may use any of the methods described above in recognizing the voice. For example, an acoustic model with a hidden Markov model may be used, or an N-best search method that integrates the acoustic model and the language model.
- the N-best search method can improve recognition performance by selecting up to N recognition result candidates using acoustic models and language models, and then re-evaluating the ranks of these candidates.
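The N-best re-evaluation can be sketched as re-ranking candidates by a weighted sum of acoustic-model and language-model scores. The dictionary keys, weights, and log-domain score values are illustrative assumptions; raising `am_weight` above `lm_weight` also previews the acoustic-weight emphasis the text later describes for phonetic symbol generation.

```python
def nbest_rescore(candidates, am_weight=1.0, lm_weight=1.0, n=3):
    """N-best search sketch: each candidate carries an acoustic-model
    score ('am') and a language-model score ('lm'), both in the log
    domain; keep the top-N candidates by weighted combined score.
    The first element of the returned list is the recognition result."""
    scored = sorted(
        candidates,
        key=lambda c: am_weight * c["am"] + lm_weight * c["lm"],
        reverse=True,  # higher (less negative) log score is better
    )
    return scored[:n]
```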
- the processor 140 may calculate a confidence score (or may be abbreviated as 'confidence') in order to secure the reliability of the recognition result.
- the confidence score is a measure of how reliable a speech recognition result is. For a phoneme or word that is a recognition result, it can be defined as a relative value with respect to the probability that the utterance was in fact another phoneme or word. The confidence score may be expressed as a value between 0 and 1, or between 0 and 100. If the confidence score is larger than a predetermined threshold, the recognition result may be accepted; if it is smaller, the recognition result may be rejected.
- the reliability score may be obtained according to various conventional reliability score obtaining algorithms.
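One simple way to realize such a relative confidence measure is to normalize the best hypothesis probability against its competitors and compare it with a threshold. This is only an assumed illustration; the text defers to conventional confidence-scoring algorithms, and the threshold value here is arbitrary.

```python
def confidence_score(best_prob, competitor_probs):
    """Confidence sketch: relative value of the best hypothesis against
    competing phoneme/word hypotheses, normalized to [0, 1]."""
    total = best_prob + sum(competitor_probs)
    return best_prob / total if total > 0 else 0.0

def accept(best_prob, competitor_probs, threshold=0.5):
    """Accept the recognition result if its confidence exceeds the
    (illustrative) threshold, otherwise reject it."""
    return confidence_score(best_prob, competitor_probs) > threshold
```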
- the processor 140 may perform the voice recognition operation using the acoustic model.
- the processor 140 may be implemented in a computer-readable recording medium using software, hardware, or a combination thereof.
- Hardware implementations may use at least one electrical unit among Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, and microprocessors.
- In a software implementation, the processor may be implemented as a separate software module that performs at least one function or operation, and the software code may be implemented by a software application written in a suitable programming language.
- the processor 140 implements the functions, processes, and / or methods proposed in FIGS. 2 to 7 to be described later.
- FIG. 2 is a diagram illustrating a pronunciation dictionary of a speech recognition apparatus according to an embodiment of the present invention.
- the pronunciation dictionary includes a vocabulary (surface form) 210 and a phonetic symbol 220 corresponding to each vocabulary.
- the phonetic symbol of the vocabulary of 'computer' may be represented as [K AH M P Y UW T ER].
- although FIG. 2 illustrates English, the present invention is not limited thereto and may be applied to all languages.
- the vocabulary and the phonetic symbols representing its pronunciation may be expressed in different languages.
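The pronunciation dictionary of FIG. 2 can be sketched as a plain mapping from surface-form vocabulary to a phonetic-symbol sequence. The entries mirror the examples in the text ('computer', 'SYSTRAN'); the ARPAbet-style symbol set follows those examples and is otherwise an assumption.

```python
# Pronunciation dictionary sketch: surface form -> phonetic symbols.
pronunciation_dict = {
    "computer": ["K", "AH", "M", "P", "Y", "UW", "T", "ER"],
}

def register(vocabulary, phonetic_symbols, dictionary):
    """Associate an input user vocabulary with a generated phonetic
    symbol sequence and store the pair in the pronunciation dictionary."""
    dictionary[vocabulary] = list(phonetic_symbols)
    return dictionary

# Registering a new vocabulary-pronunciation pair, as in FIG. 3.
register("SYSTRAN", ["S", "IH", "S", "T", "R", "AA", "N"], pronunciation_dict)
```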
- FIG. 3 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- Conventionally, when a new vocabulary or pronunciation was added to a speech recognizer, a user had to input a vocabulary-phonetic symbol pair directly or by using a conversion program (which may generate a phonetic symbol different from the actual pronunciation). According to the embodiment of the present invention, the user only needs to input a voice or a vocabulary.
- the user vocabulary for the voice recognizer (i.e., the user vocabulary used in the voice recognizer) consists of a surface form and a phonetic symbol of the vocabulary, as described above.
- the voice recognizer may receive a voice from a user or a mechanical device to generate a phonetic symbol, and the voice recognizer may receive a user vocabulary from a user through a keyboard.
- FIG. 3 illustrates the process in which the voice recognizer receives a voice from a user or a machine to generate the phonetic symbol [S IH S T R AA N], receives the user vocabulary 'SYSTRAN' through a keyboard, and then matches the vocabulary 'SYSTRAN' with [S IH S T R AA N] as its phonetic symbol and registers the pair in the pronunciation dictionary (i.e., memory).
- the speech recognizer minimizes the influence of the language model and maximizes the influence of the acoustic model among the myriad phonetic symbol candidates, and the candidate with the highest score can be selected.
- the weight for the acoustic model may be set higher than the weight for the language model.
- a phonetic symbol may be generated using only an acoustic model among phonetic symbol candidates.
- all possible letters or phoneme unit candidates may be added to the phonetic dictionary and the language model. That is, phonetic symbols may be generated using a phonetic dictionary and a language model to which all possible letters or phoneme unit candidates are added.
- the phonetic dictionary and the language model may consist of all possible letter or phoneme unit candidates. That is, phonetic symbols may be generated using a phonetic dictionary and language model composed of all possible letters or phoneme unit candidates.
- the voice recognizer can generate phonetic symbols quickly, easily and accurately.
- compared with a surface-form-to-phonetic-symbol conversion program, the speech recognizer can generate the phonetic symbol as actually pronounced, without error.
- the speech recognizer may be applied to a third language to generate phonetic symbols.
- FIG. 4 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- the phonetic symbol for the target voice is generated by the voice recognizer and the user only needs to input the corresponding vocabulary.
- the voice recognizer may receive a voice from a user or a machine to generate a phonetic symbol.
- the voice recognizer may receive a voice from a user or a machine, and recognize the input voice using an acoustic model.
- the speech recognizer may generate a phonetic symbol for the voice based on the recognized result.
- the voice recognizer recognizes the input voice and may select, automatically or manually (i.e., by the user), a misrecognized or unrecognized portion (that is, a portion for which the surface-form vocabulary corresponding to the input voice cannot be determined). Then, the surface-form vocabulary corresponding to the misrecognized or unrecognized portion can be input directly from the user or a mechanical device.
- the voice recognizer may store, in the pronunciation dictionary, the surface-form vocabulary input from the user or a mechanical device together with the phonetic symbols obtained for the misrecognized or unrecognized portion.
- FIG. 4 illustrates a case in which the voice recognizer receives a voice from a user or a machine and generates the phonetic symbol [S IH S T R AA N], but cannot determine the surface-form vocabulary.
- in this case, the voice recognizer requests from the user the surface-form vocabulary corresponding to the phonetic symbol [S IH S T R AA N], and the user can input the surface-form vocabulary corresponding to that phonetic symbol (i.e., 'SYSTRAN').
- the speech recognizer can then store, in the pronunciation dictionary, the surface-form vocabulary 'SYSTRAN' input from the user together with the phonetic symbol [S IH S T R AA N] obtained from the misrecognized or unrecognized portion.
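The FIG. 4 flow can be sketched as follows. `recognizer` and `ask_user` are hypothetical callables standing in for the acoustic-model recognition step and the user prompt, and the confidence threshold value is illustrative.

```python
def recognize_and_register(voice, recognizer, dictionary,
                           threshold=0.5, ask_user=None):
    """FIG. 4 flow sketch: recognize the voice; if the confidence of the
    result is below the threshold (a misrecognized/unrecognized portion),
    request the surface-form vocabulary from the user and store it with
    the generated phonetic symbols in the pronunciation dictionary."""
    text, phonetic_symbols, confidence = recognizer(voice)
    if confidence < threshold:
        vocabulary = ask_user(phonetic_symbols)  # e.g. user types 'SYSTRAN'
        dictionary[vocabulary] = phonetic_symbols
        return vocabulary
    return text
```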
- FIG. 5 is a diagram for describing a method of generating a pronunciation dictionary, according to an exemplary embodiment.
- FIG. 5 is the opposite of the method of FIG. 4: the speech recognizer receives only the vocabulary from the user, then receives the voice corresponding to that vocabulary, and generates the phonetic symbol itself.
- the voice recognizer may receive a user vocabulary from a user through a keyboard.
- the voice recognizer compares the input user vocabulary with the existing pronunciation dictionary to determine whether it is already present. In other words, the voice recognizer may receive text from a user or a machine and determine, for each vocabulary item of the input text, whether it is present in the pronunciation dictionary.
- when the input user vocabulary is not present in the pronunciation dictionary, the voice recognizer may request a voice for it.
- when the voice recognizer receives a voice pronouncing the corresponding vocabulary from a user or a machine, it recognizes the voice using an acoustic model.
- the voice recognizer may generate a phonetic symbol corresponding to the input voice, and store the input surface-form vocabulary in correspondence with the generated phonetic symbol in the pronunciation dictionary.
- the voice recognizer may request a voice for the 'SYSTRAN' from the user or the apparatus.
- when a voice for 'SYSTRAN' is received from a user or a mechanical device, the phonetic symbol [S IH S T R AA N] may be generated from the voice.
- the input vocabulary 'SYSTRAN' and the generated phonetic symbol [S IH S T R AA N] may be associated with each other and stored in a pronunciation dictionary (ie, a memory).
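The FIG. 5 flow can be sketched in the opposite direction of FIG. 4; here `request_voice` and `recognizer` are hypothetical callables standing in for the voice prompt and the acoustic-model phonetic-symbol generation.

```python
def register_from_text(text, dictionary, request_voice, recognizer):
    """FIG. 5 flow sketch: for each vocabulary item of the input text
    that is not in the pronunciation dictionary, request a voice from
    the user, generate its phonetic symbols with the recognizer
    (acoustic model), and store the pair."""
    for vocabulary in text.split():
        if vocabulary not in dictionary:
            voice = request_voice(vocabulary)       # e.g. user pronounces it
            dictionary[vocabulary] = recognizer(voice)
    return dictionary
```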
- FIG. 6 illustrates a configuration of a voice recognizer according to an embodiment of the present invention.
- a new vocabulary may be automatically added to a pronunciation dictionary by automatically extracting an unregistered vocabulary from a voice / text pair. Based on this, a new acoustic model can be learned from the updated pronunciation dictionary and the speech / text pair.
- when the speech recognizer receives a speech-text parallel corpus in which speech data and text are stored in correspondence, it recognizes the speech data of the input corpus using an acoustic model and generates phonetic symbols from the speech data. That is, the speech recognizer may generate a phonetic symbol and a text result for the speech data from the result recognized using the existing pronunciation dictionary and language model. The phonetic symbols generated for the speech-text parallel corpus and the text results may then be aligned with each other to generate phonetic symbols for each text. In other words, the pronunciation dictionary can be updated for the differently expressed parts by matching the phonetic symbols and the text results against the speech-text parallel corpus. That is, a new pronunciation dictionary is generated.
- the speech recognizer learns the acoustic model using the input speech-text parallel corpus and the newly generated pronunciation dictionary.
- the speech recognizer can obtain an improved pronunciation dictionary and acoustic model.
- the new acoustic model obtained as described above may reduce the problems caused by errors of the existing phonetic symbol generator, and may reduce the learning ambiguity caused by inconsistencies between text and voice due to users' pronunciation errors.
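The FIG. 6 dictionary update can be sketched as follows. `recognize` is a hypothetical callable standing in for recognition with the existing acoustic model, pronunciation dictionary, and language model, and the simple mismatch test stands in for the alignment of phonetic symbols against the reference text.

```python
def update_dictionary(parallel_corpus, recognize, dictionary):
    """FIG. 6 flow sketch: for each (voice, text) pair of the
    speech-text parallel corpus, recognize the voice to obtain a text
    result and phonetic symbols; where the recognized text differs from
    the reference text (a differently expressed part), register the
    reference text with the generated phonetic symbols."""
    for voice, reference_text in parallel_corpus:
        recognized_text, phonetic_symbols = recognize(voice)
        if recognized_text != reference_text:
            dictionary[reference_text] = phonetic_symbols
    return dictionary
```

The updated dictionary and the parallel corpus could then serve as training input for a new acoustic model, as the text describes.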
- FIG. 7 is a diagram illustrating a method of generating a pronunciation dictionary of a speech recognition apparatus according to an embodiment of the present invention.
- the voice recognition apparatus receives a voice from a user or a machine (S701).
- the speech recognition apparatus recognizes the input voice using an acoustic model (S702).
- the speech recognition apparatus may evaluate the reliability of the recognized result.
- the speech recognition apparatus may search for a lexical candidate group inferred by the phonetic symbols from the text corpus.
- the speech recognition apparatus may register the searched vocabulary candidate group in the pronunciation dictionary and then re-recognize the speech using the acoustic model.
- the speech recognition apparatus may evaluate the reliability of the recognized result.
- the speech recognition apparatus may search for a vocabulary that can be inferred from the dictionary by storing all inferred pronunciation symbols corresponding to each word of the text corpus.
- the speech recognition apparatus may register the searched vocabulary in the pronunciation dictionary, and then re-recognize the speech using the acoustic model.
- the speech recognition apparatus generates a phonetic symbol for the speech based on the recognized result (S703).
- a phonetic dictionary and a language model may be used to generate phonetic symbols.
- the weight of the acoustic model may be set higher than the weight for the language model to generate a phonetic symbol.
- candidates of all possible letters or phoneme units may be added to the pronunciation dictionary and the language model used to generate the phonetic symbols.
- the pronunciation dictionary and the language model used to generate the phonetic symbols may be composed of all possible letters or phoneme units.
- the speech recognition apparatus receives a user vocabulary from the user (S704).
- the voice recognition apparatus may request the voice corresponding to the user vocabulary.
- steps S701 to S703 may be performed after step S704.
- the speech recognition apparatus may evaluate the reliability of the result of the recognition in step S702. In this case, when the reliability is smaller than a preset threshold, the speech recognition apparatus may request the user vocabulary corresponding to the phonetic symbol and receive a user vocabulary from the user.
- the speech recognition apparatus may select a misrecognition and / or unrecognition portion from the recognition result in step S702.
- the voice recognition apparatus may request the user vocabulary corresponding to the misrecognition and / or unrecognized portion and receive a user vocabulary from a user.
- the speech recognition apparatus associates the input user vocabulary with the generated phonetic symbols and stores them in the pronunciation dictionary (S705).
- the speech recognition apparatus may receive a speech-text parallel corpus stored in correspondence with speech data and text.
- the speech recognition apparatus may recognize the speech data using an acoustic model and generate a phonetic symbol for the speech data based on the recognized result using a pronunciation dictionary and a language model.
- the speech recognition apparatus may update the pronunciation dictionary by associating a phonetic symbol with respect to the voice data and the text.
- the speech recognition apparatus may learn the acoustic model using the updated pronunciation dictionary and the speech-text parallel corpus.
- Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof.
- an embodiment of the present invention may include one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), FPGAs ( field programmable gate arrays), processors, controllers, microcontrollers, microprocessors, and the like.
- an embodiment of the present invention may be implemented in the form of a module, procedure, function, etc. that performs the functions or operations described above.
- the software code may be stored in memory and driven by the processor.
- the memory may be located inside or outside the processor, and may exchange data with the processor by various known means.
- the present invention can be applied to various speech recognition technologies.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to a method for generating a pronunciation dictionary, and an associated apparatus. According to the invention, the method of generating a pronunciation dictionary by a speech recognition device may comprise the steps of: receiving a voice input from a user or a machine; recognizing the voice input using an acoustic model; generating a pronunciation symbol for the voice based on the recognition result; receiving an input of a user vocabulary from the user; and matching the input user vocabulary with the generated pronunciation symbol and storing it in the pronunciation dictionary.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2018/004971 WO2019208859A1 (fr) | 2018-04-27 | 2018-04-27 | Procédé de génération de dictionnaire de prononciation et appareil associé |
| JP2020560362A JP2021529338A (ja) | 2018-04-27 | 2018-04-27 | Pronunciation dictionary generation method and apparatus therefor |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/KR2018/004971 WO2019208859A1 (fr) | 2018-04-27 | 2018-04-27 | Pronunciation dictionary generation method and apparatus therefor |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019208859A1 true WO2019208859A1 (fr) | 2019-10-31 |
Family
ID=68295544
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2018/004971 Ceased WO2019208859A1 (fr) | Pronunciation dictionary generation method and apparatus therefor | 2018-04-27 | 2018-04-27 |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP2021529338A (fr) |
| WO (1) | WO2019208859A1 (fr) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR100277694B1 (ko) * | 1998-11-11 | 2001-01-15 | 정선종 | Method for automatically generating a pronunciation dictionary in a speech recognition system |
| US20020013707A1 (en) * | 1998-12-18 | 2002-01-31 | Rhonda Shaw | System for developing word-pronunciation pairs |
| KR20100130263A (ko) * | 2009-06-03 | 2010-12-13 | Samsung Electronics Co., Ltd. | Apparatus and method for expanding a pronunciation dictionary for speech recognition |
| KR20130011323A (ko) * | 2011-07-21 | 2013-01-30 | Electronics and Telecommunications Research Institute | Statistics-based apparatus and method for generating a multiple pronunciation dictionary |
| KR20160098910A (ko) * | 2015-02-11 | 2016-08-19 | Electronics and Telecommunications Research Institute | Method and apparatus for expanding a speech recognition database |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH03217900A (ja) * | 1990-01-24 | 1991-09-25 | Oki Electric Ind Co Ltd | Text-to-speech synthesis device |
| JP5366169B2 (ja) * | 2006-11-30 | 2013-12-11 | National Institute of Advanced Industrial Science and Technology | Speech recognition system and program for a speech recognition system |
| WO2009078256A1 (fr) * | 2007-12-18 | 2009-06-25 | Nec Corporation | Pronunciation variation rule extraction device, pronunciation variation rule extraction method, and pronunciation variation rule extraction program |
| JP2012063526A (ja) * | 2010-09-15 | 2012-03-29 | Ntt Docomo Inc | Terminal device, speech recognition method, and speech recognition program |
| JP6410491B2 (ja) * | 2014-06-27 | 2018-10-24 | International Business Machines Corporation | Pronunciation dictionary expansion system, expansion program, and expansion method; acoustic model training method, training program, and training system using the expanded pronunciation dictionary obtained by the expansion method |
| JP6475517B2 (ja) * | 2015-03-02 | 2019-02-27 | Japan Broadcasting Corporation (NHK) | Pronunciation sequence expansion device and program therefor |
- 2018
- 2018-04-27 JP JP2020560362A patent/JP2021529338A/ja active Pending
- 2018-04-27 WO PCT/KR2018/004971 patent/WO2019208859A1/fr not_active Ceased
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112530414A (zh) * | 2021-02-08 | 2021-03-19 | Datatang (Beijing) Technology Co., Ltd. | Iterative large-scale pronunciation dictionary construction method and device |
| CN112530414B (zh) * | 2021-02-08 | 2021-05-25 | Datatang (Beijing) Technology Co., Ltd. | Iterative large-scale pronunciation dictionary construction method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| JP2021529338A (ja) | 2021-10-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Zissman et al. | Automatic language identification | |
| CN100559462C (zh) | Speech processing device, speech processing method, program, and recording medium | |
| Hazen | Automatic language identification using a segment-based approach | |
| US6208964B1 (en) | Method and apparatus for providing unsupervised adaptation of transcriptions | |
| KR100612839B1 (ko) | Domain-based dialogue speech recognition method and apparatus | |
| WO2023163383A1 (fr) | Multimodal-based method and apparatus for recognizing emotion in real time | |
| US5787230A (en) | System and method of intelligent Mandarin speech input for Chinese computers | |
| US5937383A (en) | Apparatus and methods for speech recognition including individual or speaker class dependent decoding history caches for fast word acceptance or rejection | |
| US11450320B2 (en) | Dialogue system, dialogue processing method and electronic apparatus | |
| US12087291B2 (en) | Dialogue system, dialogue processing method, translating apparatus, and method of translation | |
| WO2019208860A1 (fr) | Method for recording and outputting a multi-party conversation using speech recognition technology, and device therefor | |
| CN107886968B (zh) | Speech evaluation method and system | |
| WO2015163684A1 (fr) | Method and device for improving a set of at least one semantic unit, and computer-readable recording medium | |
| WO2014200187A1 (fr) | Apparatus for learning apophony and associated method | |
| WO2019172734A2 (fr) | Data mining device, and speech recognition method and system using the same | |
| WO2020096078A1 (fr) | Method and device for providing a speech recognition service | |
| Zhang et al. | Phonetic RNN-transducer for mispronunciation diagnosis | |
| WO2019208858A1 (fr) | Speech recognition method and device therefor | |
| WO2020091123A1 (fr) | Method and device for providing a context-based speech recognition service | |
| Ballard et al. | A multimodal learning interface for word acquisition | |
| JP2014164261A (ja) | Information processing device and method therefor | |
| WO2020096073A1 (fr) | Method and device for generating an optimal language model using big data | |
| Kou et al. | Fix it where it fails: Pronunciation learning by mining error corrections from speech logs | |
| WO2019208859A1 (fr) | Pronunciation dictionary generation method and apparatus therefor | |
| Wang et al. | L2 mispronunciation verification based on acoustic phone embedding and Siamese networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18916031; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020560362; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18916031; Country of ref document: EP; Kind code of ref document: A1 |