
WO2019208858A1 - Speech recognition method and device therefor - Google Patents


Info

Publication number
WO2019208858A1
WO2019208858A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
speech recognition
speech
reliability
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2018/004970
Other languages
English (en)
Korean (ko)
Inventor
황명진
지창진
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Systran International
Original Assignee
Systran International
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Systran International filed Critical Systran International
Priority to PCT/KR2018/004970 priority Critical patent/WO2019208858A1/fr
Publication of WO2019208858A1 publication Critical patent/WO2019208858A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/01 Assessment or evaluation of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to a speech recognition method having improved speech recognition accuracy and performance, and an apparatus supporting the same.
  • Speech recognition is a technology that converts speech into text using a computer. This technology has made a drastic improvement in recognition rate in recent years.
  • An object of the present invention is to provide a method and algorithm for selecting and updating a speech recognition result using a separate model in addition to the basic speech recognition model.
  • An aspect of the present invention provides a speech recognition method comprising: receiving a voice signal and converting it into voice data; recognizing the voice data with a first speech recognition model; evaluating a first reliability of the result recognized through the first speech recognition model; re-recognizing the voice data with a second speech recognition model when the first reliability is less than a preset threshold; evaluating a second reliability of the result re-recognized through the second speech recognition model; and determining the re-recognized result as the final speech recognition result when the second reliability is greater than the preset threshold.
  • The speech recognition method may further include determining the recognized result as the final speech recognition result when the first reliability is greater than the preset threshold.
  • The second speech recognition model may be a speech recognition model distinct from the first speech recognition model, or may be the first speech recognition model with only its parameters adjusted differently.
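The fallback flow described in the bullets above can be sketched as follows; `first_model` and `second_model` are hypothetical stand-ins for the two recognizers, each assumed to return a (text, reliability) pair, and the threshold value is illustrative, not from the patent:

```python
def recognize_with_fallback(voice_data, first_model, second_model, threshold=0.8):
    """Return (text, reliability), using the second model only when needed."""
    text, reliability = first_model(voice_data)
    if reliability >= threshold:       # first result is reliable enough
        return text, reliability
    # reliability too low: re-recognize with the second model
    text2, reliability2 = second_model(voice_data)
    if reliability2 > threshold:
        return text2, reliability2
    # neither model is confident; keep the better-scoring result
    return (text, reliability) if reliability >= reliability2 else (text2, reliability2)
```

Because the second model runs only on low-confidence inputs, the average processing cost stays close to that of the first model alone.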
  • Another aspect provides a speech recognition method comprising: receiving a voice signal and converting it into voice data; recognizing the voice data with a plurality of speech recognition models; evaluating the reliability of the results recognized through the plurality of speech recognition models; and selecting the first recognition result, having the highest reliability among the recognized results, as the final speech recognition result.
  • At least one of the plurality of speech recognition models may be a speech recognition model that supports a plurality of language models.
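A minimal sketch of this selection-by-reliability step, assuming each model is a callable returning a (text, reliability) pair (the interface is an assumption for illustration):

```python
def recognize_best(voice_data, models):
    """Run each model on the voice data and keep the most reliable result."""
    results = [model(voice_data) for model in models]  # (text, reliability) pairs
    return max(results, key=lambda result: result[1])
```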
  • Another aspect provides a method for improving the real-time performance of a speech recognition apparatus, comprising: receiving a voice signal and converting it into voice data; recognizing the voice data with a first speech recognition model; evaluating a first reliability of the first recognition result recognized through the first speech recognition model; re-recognizing the voice data with a second speech recognition model when the first reliability is less than a preset threshold; evaluating a second reliability of the second recognition result re-recognized through the second speech recognition model; identifying, when the second reliability is greater than the preset threshold, the portion in which the re-recognized second recognition result differs from the first recognition result; and training the first speech recognition model to recognize the input voice for that differing portion as the second recognition result rather than the first recognition result.
  • Another aspect provides a speech recognition apparatus comprising: a voice input unit that receives a voice signal and converts it into voice data; and a processor that recognizes the voice data with a first speech recognition model, evaluates a first reliability of the result recognized through the first speech recognition model, re-recognizes the voice data with a second speech recognition model when the first reliability is less than a preset threshold, evaluates a second reliability of the result re-recognized through the second speech recognition model, and determines the re-recognized result as the final speech recognition result when the second reliability is greater than the preset threshold.
  • Another aspect provides a speech recognition apparatus comprising: a voice input unit that receives a voice signal and converts it into voice data; and a processor that recognizes the voice data with a plurality of speech recognition models, evaluates the reliability of the results recognized through the plurality of speech recognition models, and selects the first recognition result, having the highest reliability among the recognized results, as the final speech recognition result.
  • Another aspect provides a speech recognition apparatus comprising: a voice input unit that receives a voice signal and converts it into voice data; and a processor that recognizes the voice data with a first speech recognition model, evaluates a first reliability of the first recognition result recognized through the first speech recognition model, re-recognizes the voice data with a second speech recognition model when the first reliability is less than a preset threshold, evaluates a second reliability of the second recognition result re-recognized through the second speech recognition model, identifies, when the second reliability is greater than the preset threshold, the portion in which the re-recognized second recognition result differs from the first recognition result, and trains the first speech recognition model to recognize the input voice for that differing portion as the second recognition result rather than the first recognition result.
  • The speech recognition models in the above three embodiments are not limited to speech recognition models proper; language models, acoustic models, neural network models, probability models, rule models, and the like may also be applied.
  • Because relatively small models are used, processing is fast, and because a plurality of models are used, speech recognition can be performed with high accuracy.
  • FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention.
  • FIG. 2 illustrates an example of a speech recognition apparatus that processes a speech recognition result when the reliability of the speech recognition result is low.
  • FIG. 3 illustrates a speech recognition apparatus that recognizes / processes speech using a plurality of speech recognition models.
  • FIG. 4 is a diagram illustrating a method for improving the accuracy / performance of the recognition result of a specific speech recognition model using recognition results obtained from another speech recognition model according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating a voice recognition method according to an embodiment of the present invention.
  • FIG. 6 is a flowchart illustrating a voice recognition method according to an embodiment of the present invention.
  • FIG. 7 is a flowchart illustrating a method of improving real-time performance of a voice recognizer according to an embodiment of the present invention.
  • FIG. 1 is a block diagram of a speech recognition apparatus according to an embodiment of the present invention.
  • The voice recognition apparatus 100 may include at least one of a voice input unit 110 that receives a user's voice, a memory 120 that stores various types of data related to the recognized voice, and a processor 130 that processes the input user's voice.
  • the voice input unit 110 may include a microphone, and when a user's uttered voice is input, the voice input unit 110 converts the voice signal into an electrical signal and outputs it to the processor 130.
  • the processor 130 may acquire voice data of the user by applying a speech recognition algorithm or a speech recognition engine to a signal received from the voice input unit 110.
  • The signal input to the processor 130 may first be converted into a form more useful for speech recognition: the processor 130 converts the input signal from analog to digital form and detects the start and end points of the voice to identify the actual speech section / data included in the voice data. This is called end point detection (EPD).
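End point detection as described here can be illustrated with a simple frame-energy heuristic; real systems use more robust voice-activity detection, so the frame length and energy threshold below are illustrative assumptions, not values from the patent:

```python
def detect_endpoints(samples, frame_len=160, energy_threshold=0.01):
    """Return (start, end) sample indices of the speech region, or None if silent."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # mean energy per frame
    energies = [sum(s * s for s in f) / max(len(f), 1) for f in frames]
    voiced = [i for i, e in enumerate(energies) if e > energy_threshold]
    if not voiced:
        return None
    # span from the first to the last voiced frame
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
```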
  • Within the detected interval, the processor 130 may apply Cepstrum analysis, Linear Predictive Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC), or filter bank energies to extract a feature vector of the signal.
  • the processor 130 may store information about the end point of the voice data and the feature vector using the memory 120 storing the data.
  • The memory 120 may include at least one storage medium among flash memory, a hard disk, a memory card, read-only memory (ROM), random-access memory (RAM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.
  • the processor 130 may obtain a recognition result by comparing the extracted feature vector with the trained reference pattern.
  • a speech recognition model for modeling and comparing signal characteristics of speech and a language model for modeling linguistic order relations such as words or syllables corresponding to the recognized vocabulary may be used.
  • the speech recognition model may be divided into a direct comparison method of setting a recognition object as a feature vector model and comparing it with a feature vector of speech data and a statistical method of statistically processing the feature vector of the recognition object.
  • the direct comparison method is a method of setting a unit of a word, a phoneme, or the like to be recognized as a feature vector model and comparing how similar the input speech is.
  • a vector quantization method is used. According to the vector quantization method, a feature vector of input speech data is mapped with a codebook, which is a reference model, and encoded into a representative value to compare the code values.
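The vector quantization step can be sketched as a nearest-neighbor lookup into the codebook (squared Euclidean distance; the codebook values in the usage example are illustrative):

```python
def quantize(vector, codebook):
    """Map a feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
```

Two utterances can then be compared by the sequences of code indices they produce rather than by the raw feature vectors.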
  • The statistical model method constructs the unit to be recognized as a state sequence and uses the relationships between state sequences.
  • A state sequence may consist of a plurality of nodes.
  • Methods that use the relationships between state sequences include Dynamic Time Warping (DTW), Hidden Markov Models (HMM), and neural networks.
  • Dynamic time warping compensates for differences along the time axis, taking into account the dynamic nature of speech: its length varies over time even when the same person pronounces the same word. The hidden Markov model assumes that speech is a Markov process with state transition probabilities and observation probabilities of nodes (output symbols) in each state; it estimates the state transition probabilities and node observation probabilities from training data, and recognizes speech by computing the probability that the input voice was generated by the estimated model.
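The time-axis compensation performed by dynamic time warping can be sketched with the classic quadratic dynamic-programming recurrence (a textbook formulation, not the patent's implementation):

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum alignment cost between sequences a and b (classic DTW)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch a
                                    cost[i][j - 1],      # stretch b
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A repeated element in one sequence adds no cost, which is exactly the time-axis compensation DTW provides.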
  • A language model that models linguistic order relations such as words or syllables can apply the order relations between the units constituting language to the units obtained in speech recognition, thereby reducing acoustic ambiguity and recognition errors.
  • Language models include statistical language models and models based on finite state automata (FSA); a statistical language model uses chain probabilities of words, such as unigram, bigram, and trigram probabilities.
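A bigram chain-probability estimate of the kind used in statistical language models can be sketched as follows (maximum-likelihood counts, no smoothing; the sentence-start token `<s>` is a common convention assumed here):

```python
from collections import Counter

def bigram_probs(corpus):
    """Estimate P(w2 | w1) from a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        unigrams.update(tokens[:-1])                 # count contexts
        bigrams.update(zip(tokens[:-1], tokens[1:])) # count word pairs
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}
```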
  • the processor 130 may use any of the methods described above in recognizing the voice. For example, a speech recognition model with a hidden Markov model may be used, or an N-best search method that integrates a speech recognition model and a language model may be used.
  • the N-best search method can improve recognition performance by selecting up to N recognition result candidates using a speech recognition model and a language model, and then re-evaluating the ranking of these candidates.
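An N-best re-ranking of the sort described can be sketched as a weighted combination of acoustic and language-model scores; the weight and the scoring interfaces here are illustrative assumptions:

```python
def rescore_nbest(candidates, lm_score, am_weight=0.7):
    """Re-rank N-best (text, acoustic_score) hypotheses and return the best text."""
    def combined(candidate):
        text, acoustic = candidate
        return am_weight * acoustic + (1.0 - am_weight) * lm_score(text)
    return max(candidates, key=combined)[0]
```

A hypothesis with a slightly lower acoustic score can win if the language model considers it far more plausible.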
  • the processor 130 may calculate a confidence score (or may be abbreviated as 'confidence') in order to secure the reliability of the recognition result.
  • The confidence score is a measure of how reliable a speech recognition result is. It can be defined as a relative value of the probability that the recognized phoneme or word was spoken versus the probability that other phonemes or words were spoken. Accordingly, the confidence score may be expressed as a value between 0 and 1, or as a value between 0 and 100. If the confidence score is greater than a predetermined threshold, the recognition result may be accepted; if it is smaller, the recognition result may be rejected.
  • the reliability score may be obtained according to various conventional reliability score obtaining algorithms.
  • the processor 130 may perform a speech recognition operation using a separate speech recognition model. Embodiments will be described later in detail with reference to FIGS. 3 to 5.
  • the processor 130 may be implemented in a computer-readable recording medium using software, hardware, or a combination thereof.
  • Hardware implementations may use at least one of Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), processors, microcontrollers, microprocessors, and other electrical units.
  • In a software implementation, it may be implemented as a separate software module that performs at least one function or operation, and the software code may be implemented by a software application written in a suitable programming language.
  • The processor 130 implements the functions, processes, and / or methods proposed in FIGS. 2 to 6, described later. Hereinafter, the processor 130 is identified with the voice recognition apparatus 100 for purposes of description.
  • FIG. 2 illustrates an example of a speech recognition apparatus that processes a speech recognition result when the reliability of the speech recognition result is low.
  • The speech recognition apparatus may by default recognize speech data with X, the default (or first) speech recognition model.
  • The speech recognition apparatus may evaluate / acquire the reliability of the result recognized through speech recognition model X. If the reliability of that result is determined to be low (for example, lower than a threshold value), the speech recognition apparatus may re-recognize the speech data with Y, the second speech recognition model. When the reliability of the result re-recognized through speech recognition model Y is determined to be higher (for example, higher than the threshold value, or higher than that of the first speech recognition model X), the speech recognition apparatus may select the result recognized through speech recognition model Y as the final result.
  • the present invention is not limited thereto, and a plurality of second voice recognition models may exist.
  • Priority may be assigned between the two speech recognition models. Therefore, when it is determined that the reliability of the first speech recognition model X is low (for example, lower than a threshold value), the speech recognition apparatus may sequentially perform speech recognition through the second speech recognition models according to the priority.
  • The result recognized by the second speech recognition model determined to have high reliability (for example, higher than a threshold value, or higher than that of the first speech recognition model X) may be selected as the final result.
  • a speech recognition model having higher performance than the first speech recognition model may be used as the second speech recognition model.
  • The reason the higher-performance speech recognition model is not used as the first speech recognition model is that, although the accuracy and performance of the second speech recognition model may be better, using it for every input would slow down overall speech processing.
  • the second speech recognition model may be a separate / different model from the first speech recognition model X, or may be the same speech recognition model that is the same as the first speech recognition model X, but with different decoding parameters.
  • the speech recognition apparatus may improve in real time the result of misrecognition of the first speech recognition model X due to unregistered vocabulary, low sound quality, or the like.
  • FIG. 3 illustrates a speech recognition apparatus that recognizes / processes speech using a plurality of speech recognition models.
  • the apparatus for recognizing speech may recognize speech data using a plurality of speech recognition models, generate a plurality of recognition results, and evaluate / acquire reliability of the plurality of recognition results.
  • the speech recognition apparatus may select the recognition result having the highest reliability as the final result based on the evaluated / obtained reliability.
  • the present embodiment corresponds to an embodiment in which a plurality of first speech recognition models are set (for example, speech recognition models X, Y, and Z, etc.).
  • The speech data processing speed of the speech recognition apparatus may be faster when several small speech recognition models (of modest individual performance) are used than when one large, high-performance speech recognition model is used. The reason is that processing one large speech recognition model requires large-scale computation on a single processor, whereas several small speech recognition models each require only a small amount of computation and can be run on multiple processors.
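The several-small-models argument can be illustrated with concurrent execution; the stub models in the usage example are hypothetical, and threads stand in for the multiple processors mentioned above:

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_parallel(voice_data, models):
    """Run several small models concurrently and keep the most reliable result."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        results = list(pool.map(lambda model: model(voice_data), models))
    return max(results, key=lambda result: result[1])
```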
  • the plurality of speech recognition models may be configured with the same speech recognition model in which only decoding parameters are different from each other.
  • The plurality of speech recognition models may cover a plurality of languages, in which case it is not a significant problem even if the speaker's language is not explicitly designated. The reason is that the reliability of a recognition result from a speech recognition model whose language does not match the speaker's language is already low, so that result is naturally excluded.
  • FIG. 4 is a diagram illustrating a method for improving the accuracy / performance of a recognition result of a specific speech recognition model using recognition results obtained from another speech recognition model according to an embodiment of the present invention.
  • the speech recognition apparatus may improve speech recognition performance by comparing the reliability of recognition results obtained from two or more models and applying the recognition result with high reliability to the recognition result with low reliability.
  • The speech recognition apparatus may compare the recognition result of speech recognition model X, determined to have low reliability (e.g., reliability lower than a threshold value), with the recognition result of speech recognition model Y, determined to have high reliability (e.g., reliability higher than the threshold value).
  • The speech recognition apparatus may improve speech recognition performance by replacing, within the low-reliability recognition result, at least one of a word, sentence head, and phonetic symbol that differs between the two recognition results with the corresponding portion of the high-reliability recognition result.
  • The speech recognition apparatus may train speech recognition model X, which produced the low-reliability recognition result, based on the recognition result of speech recognition model Y and the comparison result. To this end, various learning methods such as machine learning and deep learning may be applied.
  • The speech recognition apparatus may thereby register misrecognized vocabulary automatically and in real time in the domain-specific model used as the first speech recognition model.
  • A fast domain-specific speech recognition model, smaller than a large general speech recognition model, can thus be supplemented in real time only for the vocabulary it handles poorly / less accurately, achieving both faster processing speed and higher accuracy.
  • Not only the words, sentence heads, and phonetic symbols of the portions that differ between the high- and low-reliability recognition results but also the corresponding voice data may be extracted, and the speech recognition model that produced the low-reliability recognition result can be improved using this information / data.
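The correct-the-differences step can be sketched with Python's standard `difflib`: spans where the low-reliability word sequence differs from the high-reliability one are replaced by the high-reliability spans. This is an illustrative reading of the mechanism, not the patent's algorithm:

```python
import difflib

def patch_with_reliable(low, high):
    """Replace differing spans of the low-reliability word sequence
    with the corresponding spans from the high-reliability one."""
    matcher = difflib.SequenceMatcher(a=low, b=high)
    patched = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # keep matching spans from `low`; take everything else from `high`
        patched.extend(low[i1:i2] if op == "equal" else high[j1:j2])
    return patched
```

The same opcode spans also identify which words (and which stretches of voice data) to feed back when retraining the weaker model.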
  • FIG. 5 is a flowchart illustrating a voice recognition method according to an embodiment of the present invention.
  • the above-described embodiments described above with reference to FIGS. 1 to 4 may be applied in the same or similar manner, and a redundant description will be omitted below.
  • the voice recognition apparatus may receive a voice signal and convert the voice signal into voice data (S501).
  • The speech recognition apparatus may recognize the speech data with the first speech recognition model (S502).
  • the speech recognition apparatus may evaluate a first reliability of a result recognized through the first speech recognition model (S503).
  • If the first reliability is greater than the preset threshold, the speech recognition apparatus may determine the recognized result as the final speech recognition result.
  • If the first reliability is less than the preset threshold, the speech recognition apparatus may re-recognize the speech data with the second speech recognition model (S504).
  • The second speech recognition model used here may be a speech recognition model distinct from the first speech recognition model, or may be the first speech recognition model with only its parameters adjusted differently.
  • the speech recognition apparatus may evaluate a second reliability of the result of re-recognition through the second speech recognition model (S505).
  • If the second reliability is greater than the preset threshold, the speech recognition apparatus may determine the re-recognized result as the final speech recognition result (S506).
  • FIG. 6 is a flowchart illustrating a voice recognition method according to an embodiment of the present invention.
  • the above-described embodiments described above with reference to FIGS. 1 to 4 may be applied in the same or similar manner, and a redundant description will be omitted below.
  • the voice recognition apparatus may receive a voice signal and convert the voice signal into voice data (S601).
  • The speech recognition apparatus may recognize the speech data with a plurality of first speech recognition models (S602).
  • the speech recognition apparatus may evaluate reliability of the results recognized through the plurality of first speech recognition models (S603).
  • the speech recognition apparatus may select the first recognition result having the highest reliability among the recognized results as the final speech recognition result (S604).
  • the speech recognition apparatus may compare the first recognition result with the second recognition result having the lowest reliability among the recognized results.
  • the apparatus for recognizing a speech may modify the second recognition result by applying the first recognition result to a portion where a difference from the first recognition result exists in the second recognition result. That is, the second recognition result may be corrected / updated based on the first recognition result.
  • the speech recognition apparatus may learn the first speech recognition model from which the second recognition result is derived, based on the first recognition result with respect to the portion in which the first and second recognition result differences exist.
  • At least one voice recognition model of the plurality of voice recognition models may be a voice recognition model that supports a plurality of language models.
  • The speech recognition apparatus may recognize the speech data with a plurality of language models through the at least one basic speech recognition model, and the language of the speech data can be identified automatically via the language model that produced the highest-reliability recognition result.
  • FIG. 7 is a flowchart illustrating a method of improving real-time performance of a voice recognizer according to an embodiment of the present invention.
  • the above-described embodiments described above with reference to FIGS. 1 to 4 may be applied in the same or similar manner, and a redundant description will be omitted below.
  • the voice recognition apparatus may receive a voice signal and convert the voice signal into voice data (S701).
  • The speech recognition apparatus may recognize the speech data with the first speech recognition model (S702).
  • the speech recognition apparatus may evaluate a first reliability of a result recognized through the first speech recognition model (S703).
  • If the first reliability is greater than the preset threshold, the speech recognition apparatus may determine the recognized result as the final speech recognition result.
  • If the first reliability is less than the preset threshold, the speech recognition apparatus may re-recognize the speech data with the second speech recognition model (S704).
  • The second speech recognition model used here may be a speech recognition model distinct from the first speech recognition model, or may be the first speech recognition model with only its parameters adjusted differently.
  • the speech recognition apparatus may evaluate a second reliability of the result of re-recognition through the second speech recognition model (S705).
  • If the second reliability is greater than the preset threshold, the speech recognition apparatus may determine the re-recognized result as the final speech recognition result (S706).
  • the speech recognition apparatus may compare the first recognition result with the second recognition result having the lowest reliability among the recognized results (S706).
  • the apparatus for recognizing a speech may modify the second recognition result by applying the first recognition result to a portion where a difference from the first recognition result exists in the second recognition result (S707). That is, the second recognition result may be corrected / updated based on the first recognition result.
  • the speech recognition apparatus may learn the first speech recognition model from which the second recognition result is derived, based on the first recognition result with respect to the portion in which the first and second recognition result differences exist.
  • Embodiments according to the present invention may be implemented by various means, for example, hardware, firmware, software, or a combination thereof.
  • An embodiment of the present invention may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, and the like.
  • an embodiment of the present invention may be implemented in the form of a module, procedure, function, etc. that performs the functions or operations described above.
  • the software code may be stored in memory and driven by the processor.
  • the memory may be located inside or outside the processor, and may exchange data with the processor by various known means.
  • the present invention can be applied to various speech recognition technologies.
  • the present invention can be applied not only to speech recognition models but also to technical fields using neural network models, probability models, rule models, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

One aspect of the present invention provides a speech recognition method that may comprise the steps of: receiving a voice signal and converting the voice signal into voice data; recognizing the voice data with a first speech recognition model; evaluating a first reliability of a result recognized through the first speech recognition model; re-recognizing the voice data with a second speech recognition model when the first reliability is less than a preset threshold; evaluating a second reliability of the result re-recognized through the second speech recognition model; and determining the re-recognized result as the final speech recognition result when the second reliability is greater than the preset threshold.
PCT/KR2018/004970 2018-04-27 2018-04-27 Speech recognition method and device therefor Ceased WO2019208858A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/KR2018/004970 WO2019208858A1 (fr) 2018-04-27 2018-04-27 Speech recognition method and device therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/KR2018/004970 WO2019208858A1 (fr) 2018-04-27 2018-04-27 Speech recognition method and device therefor

Publications (1)

Publication Number Publication Date
WO2019208858A1 true WO2019208858A1 (fr) 2019-10-31

Family

ID=68295548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/004970 Ceased WO2019208858A1 (fr) Speech recognition method and device therefor

Country Status (1)

Country Link
WO (1) WO2019208858A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040072102A (ko) * 2003-02-08 2004-08-18 엘지전자 주식회사 음성인식 방법 및 장치
KR20050082249A (ko) * 2004-02-18 2005-08-23 삼성전자주식회사 도메인 기반 대화 음성인식방법 및 장치
KR20100030483A (ko) * 2008-09-10 2010-03-18 엘지전자 주식회사 다중 스레드를 이용한 음성 인식 장치 및 그 방법
KR20110133739A (ko) * 2010-06-07 2011-12-14 주식회사 서비전자 다중 모델 적응화와 음성인식장치 및 방법
US20150051909A1 (en) * 2013-08-13 2015-02-19 Mitsubishi Electric Research Laboratories, Inc. Pattern recognition apparatus and pattern recognition method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583934A (zh) * 2020-04-30 2020-08-25 联想(北京)有限公司 Data processing method and device
CN114495931A (zh) * 2022-01-28 2022-05-13 达闼机器人股份有限公司 Voice interaction method, system, apparatus, device, and storage medium
CN114974226A (zh) * 2022-05-19 2022-08-30 京东科技信息技术有限公司 Audio data recognition method and apparatus
CN114974226B (zh) * 2022-05-19 2025-11-18 京东科技信息技术有限公司 Audio data recognition method and apparatus

Similar Documents

Publication Publication Date Title
CN100559462C (zh) Speech processing apparatus, speech processing method, program, and recording medium
Zissman et al. Automatic language identification
US20250201267A1 (en) Method and apparatus for emotion recognition in real-time based on multimodal
US10249294B2 (en) Speech recognition system and method
US5865626A (en) Multi-dialect speech recognition method and apparatus
US6208964B1 (en) Method and apparatus for providing unsupervised adaptation of transcriptions
US5621857A (en) Method and system for identifying and recognizing speech
WO2009145508A2 (fr) System for detecting a voice interval and recognizing continuous speech in a noisy environment through real-time recognition of call commands
JPH0422276B2 (fr)
Lamel et al. Cross-lingual experiments with phone recognition
US11450320B2 (en) Dialogue system, dialogue processing method and electronic apparatus
CN107886968B (zh) Speech evaluation method and system
WO2020096078A1 (fr) Method and device for providing a speech recognition service
WO2019208858A1 (fr) Speech recognition method and device therefor
Zhang et al. Phonetic RNN-transducer for mispronunciation diagnosis
Raškinis et al. Building medium‐vocabulary isolated‐word Lithuanian HMM speech recognition system
WO2020091123A1 (fr) Method and device for providing a context-based speech recognition service
WO2020096073A1 (fr) Method and device for generating an optimal language model using big data
Rasipuram et al. Grapheme and multilingual posterior features for under-resourced speech recognition: a study on Scottish Gaelic
Wang et al. L2 mispronunciation verification based on acoustic phone embedding and Siamese networks
Wana et al. A multi-view approach for Mandarin non-native mispronunciation verification
WO2019208859A1 (fr) Pronunciation dictionary generation method and apparatus therefor
KR20030010979A (ko) 의미어단위 모델을 이용한 연속음성인식방법 및 장치
Qian et al. Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora
Benítez et al. Word verification using confidence measures in speech recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18916464

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18916464

Country of ref document: EP

Kind code of ref document: A1