CN109243430A - Speech recognition method and device - Google Patents
Speech recognition method and device
- Publication number
- CN109243430A (application number CN201710537548.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- speech recognition
- language model
- recognition result
- language
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
An embodiment of the present invention provides a speech recognition method and device. The method comprises: receiving a user's voice input and recognizing it to obtain candidate speech recognition results; ranking the candidate speech recognition results using the user's language identification model, wherein the personal language model corresponding to the user is a language model built from the user's historical text input data; and obtaining the final speech recognition result from the ranked candidate results. The embodiments of the present invention can effectively improve the accuracy of speech recognition results.
Description
Technical field
The embodiments of the present invention relate to the technical field of speech recognition, and in particular to a speech recognition method and device.
Background art
Speech recognition technology converts human speech into computer-readable input. It is widely used in fields such as voice dialing, voice navigation, and automatic device control. How to improve the accuracy of speech recognition has therefore become an important problem.
In the prior art, a user's voice input is generally recognized with a speech model that converts the input phonetic feature sequence into a character string. A speech model generally comprises an acoustic model and a language model, which respectively compute speech-to-syllable probabilities and syllable-to-character probabilities.
While studying the prior art, the applicant found that it applies the same speech recognition model to the voices of different users. However, different users have different pronunciation characteristics and language usage habits, so the prior art cannot provide accurate, personalized speech recognition results. Although the prior art includes a method that applies a user-specific acoustic model to recognize the user's speech, that method considers only the user's pronunciation characteristics, such as the dialect group the user belongs to, and still cannot provide more accurate, personalized speech recognition results.
Summary of the invention
The embodiments of the present invention are intended to provide a speech recognition method and device that rank candidate speech recognition results using both a general language model and a personal language model corresponding to the user, thereby obtaining more accurate, personalized speech recognition results.
To this end, the embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a speech recognition method, comprising: receiving a user's voice input and recognizing it to obtain candidate speech recognition results; ranking the candidate speech recognition results using the user's language identification model, wherein the user's language identification model is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model built from the user's historical text input data; and obtaining the final speech recognition result from the ranked candidate speech recognition results.
In a second aspect, an embodiment of the present invention provides a speech recognition device, comprising: a recognition unit for receiving a user's voice input and recognizing it to obtain candidate speech recognition results; a ranking unit for ranking the candidate speech recognition results using the user's language identification model, wherein the user's language identification model is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model built from the user's historical text input data; and a result obtaining unit for obtaining the final speech recognition result from the ranked candidate speech recognition results.
In a third aspect, an embodiment of the present invention provides a device for speech recognition that includes a memory and one or more programs stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: receiving a user's voice input and recognizing it to obtain candidate speech recognition results; ranking the candidate speech recognition results using the user's language identification model, wherein the user's language identification model is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model built from the user's historical text input data; and obtaining the final speech recognition result from the ranked candidate speech recognition results.
In a fourth aspect, an embodiment of the present invention provides a machine-readable medium storing instructions that, when executed by one or more processors, cause a device to perform the speech recognition method of the first aspect.
The speech recognition method and device provided by the embodiments of the present invention can receive a user's voice input, recognize it to obtain candidate speech recognition results, rank the candidate speech recognition results using the user's language identification model, and obtain the final speech recognition result from the ranked candidates. Because the embodiments of the present invention derive the user's language identification model from both a general language model and the user's personal language model, the ranking considers not only general language usage habits but also the influence of the user's personalized language usage habits on the candidate speech recognition results. Results that better match the user's personalized language usage habits are therefore ranked first, which effectively improves the accuracy of the speech recognition result.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a speech recognition method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of a speech recognition method provided by another embodiment of the present invention;
Fig. 3 is a schematic diagram of a speech recognition device provided by an embodiment of the present invention;
Fig. 4 is a block diagram of a speech recognition device according to an exemplary embodiment;
Fig. 5 is a block diagram of a server according to an exemplary embodiment.
Detailed description of the embodiments
The embodiments of the present invention are intended to provide a speech recognition method and device that rank candidate speech recognition results using both a general language model and a personal language model corresponding to the user, obtaining more accurate, personalized speech recognition results.
To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention are described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.
The speech recognition method shown in the exemplary embodiments of the present invention is introduced below with reference to Fig. 1 and Fig. 2.
Fig. 1 is a flow chart of the speech recognition method provided by one embodiment of the present invention. As shown in Fig. 1, the method may include:
S101: receive the user's voice input and recognize it to obtain candidate speech recognition results.
It should be noted that when recognizing the user's voice input, a prior-art acoustic model may be used to recognize the input and obtain candidate speech recognition results. The candidate speech recognition results are generally the top N best recognition results, where N is a positive integer whose value can be set empirically or as needed.
S102: rank the candidate speech recognition results using the user's language identification model.
It should be noted that the prior art generally recognizes the user's voice input with a general speech model, without considering the pronunciation characteristics and language usage habits of different users. Each user has different language usage habits, such as different pet phrases, different common words, different industry terms, and different regional vocabulary. The personalized acoustic-model recognition method provided by the prior art considers the different pronunciation characteristics of different users, collecting the user's acoustic features to build a personal acoustic model and improve the accuracy of speech recognition; however, it does not consider the user's language usage habits and still cannot provide more accurate, personalized speech recognition results.
In the embodiments of the present invention, a personal language model corresponding to the user can be established in advance. The personal language model corresponding to the user is a language model built from the user's historical text input data, and it can effectively measure how probable a sentence is.
In a specific implementation, the personal language model corresponding to the user can be established as follows:
A. Obtain the user's historical text input data.
In a specific implementation, the user's text input data can be collected through various channels as the training corpus for the speech model.
B. Obtain the user's word features and/or word-combination features from the user's historical text input data. The word features include words and their occurrence frequencies; the word-combination features include word combinations and their occurrence frequencies.
The frequency of a word is the number of times it occurs in the entire corpus, and the frequency of a word combination is the number of times that combination occurs in the entire corpus; a word combination is a combination of two or more words. For example, when typing, user Zhang San often mistypes and inputs the phrase rendered here as "I strangle go" and seldom inputs "I a go" (two near-homophonous phrases in the original Chinese). By collecting Zhang San's word combinations and counting their occurrences in the corpus, when Zhang San later says the phrase aloud, the candidate speech recognition result "I strangle go" will be ranked before the candidate "I a go". As another example, user Li Si often inputs "storehouse" and seldom inputs "battle" (likewise near-homophones in the original). By collecting word features, in subsequent speech recognition the candidate "storehouse" will be ranked before the candidate "battle".
C. Train with the user's word features and/or word-combination features to obtain the personal language model corresponding to the user.
When training the personal language model, an N-gram language model training method, a recurrent neural network (RNN) language model training method, a long short-term memory (LSTM) network language model training method, or the like may be used. The trigram N-gram training method is introduced below as an example.
(1) Segment each sentence in the corpus into words. For example, the sentence "ABC" is segmented into (A, B, C).
(2) Compute the probability of each of the words A, B and C occurring in the corpus, where:
P(A) = (number of occurrences of A in the corpus) / (total number of words in the corpus)
(3) Compute the probability of the word combination AB occurring in the corpus:
P(B|A) = (number of occurrences of AB in the corpus) / (number of occurrences of A in the corpus)
(4) Compute the conditional probability of word C occurring after the word combination AB:
P(C|AB) = (number of occurrences of ABC in the corpus) / (number of occurrences of AB in the corpus)
(5) Compute the probability of the sentence ABC occurring in the corpus:
P(ABC) = P(A) · P(B|A) · P(C|AB)
Training on all of the user's corpus in this way yields the user's personal language model, which can effectively measure how probable a sentence is.
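Steps (1) to (5) above amount to maximum-likelihood trigram estimation. A minimal sketch follows; the function names and toy corpus are assumptions for illustration, and a production model would add smoothing for unseen n-grams.

```python
from collections import Counter

def train_trigram_lm(sentences):
    """Count unigrams, bigrams and trigrams over pre-segmented sentences
    (lists of tokens), the raw counts behind steps (2)-(4)."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    total_words = 0
    for tokens in sentences:
        total_words += len(tokens)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        trigrams.update(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams, total_words

def sentence_probability(tokens, unigrams, bigrams, trigrams, total_words):
    """P(ABC) = P(A) * P(B|A) * P(C|AB) for a three-word sentence, step (5)."""
    a, b, c = tokens
    p_a = unigrams[a] / total_words                 # P(A), step (2)
    p_b_a = bigrams[(a, b)] / unigrams[a]           # P(B|A), step (3)
    p_c_ab = trigrams[(a, b, c)] / bigrams[(a, b)]  # P(C|AB), step (4)
    return p_a * p_b_a * p_c_ab
```

On a toy corpus of the two sentences (A, B, C) and (A, B, D), this gives P(ABC) = (2/6) × 1 × (1/2) = 1/6.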
In a specific implementation of the invention, after the N candidate speech recognition results are obtained, the user's language identification model can be used to rank them. The user's language identification model is obtained from the general language model and the personal language model corresponding to the user; the general language model is trained on the text input corpora of all users.
In some embodiments, ranking the candidate speech recognition results using the user's language identification model includes: linearly interpolating the general language model and the personal language model corresponding to the user to obtain the user's language identification model; computing the probability of each candidate speech recognition result with the user's language identification model; and ranking the candidates by the computed probabilities.
For example, weights can be preset and the user's language identification model obtained with the following formula:
user's language identification model = a × personal language model + b × general language model
where 0 < a < 1, 0 < b < 1, and a + b = 1. For example, a = 0.7 and b = 0.3.
The probability of each candidate speech recognition result is then computed with the resulting language identification model, and the candidates are sorted in descending order of probability.
In other embodiments, ranking the candidate speech recognition results using the user's language identification model includes: computing the probability of each candidate with the general language model and with the personal language model corresponding to the user; linearly interpolating the probability computed with the general language model and the probability computed with the personal language model corresponding to the user; and ranking the candidates by the interpolation result.
For example, weights can be preset and the interpolation result obtained with the following formula:
final probability = a × personal-language-model probability + b × general-language-model probability
where 0 < a < 1, 0 < b < 1, and a + b = 1.
For example, suppose a = 0.7 and b = 0.3. User Zhang San's personal language model gives the candidate speech recognition result "I strangle go" a probability of 0.00038683, and the general language model gives it 0.00023453. Linear interpolation yields the final probability: 0.7 × 0.00038683 + 0.3 × 0.00023453 = 0.00034114.
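The interpolate-then-rerank step can be sketched as follows. The candidate strings and the second candidate's probabilities are illustrative stand-ins; only the "I strangle go" numbers come from the example above.

```python
def rerank(candidates, p_personal, p_general, a=0.7, b=0.3):
    """Score each candidate as a * P_personal + b * P_general (a + b = 1)
    and return the candidates sorted by descending final probability."""
    assert abs(a + b - 1.0) < 1e-9
    scores = {c: a * p_personal[c] + b * p_general[c] for c in candidates}
    return sorted(candidates, key=scores.get, reverse=True), scores

order, scores = rerank(
    ["I strangle go", "I a go"],
    p_personal={"I strangle go": 0.00038683, "I a go": 0.00005},
    p_general={"I strangle go": 0.00023453, "I a go": 0.0002},
)
```

With these numbers, "I strangle go" scores 0.7 × 0.00038683 + 0.3 × 0.00023453 = 0.00034114 and is ranked first, matching the worked example in the text.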
In some embodiments, the method further includes obtaining the group language model corresponding to the user, which describes the language features of the group the user belongs to. In that case, ranking the candidate speech recognition results using the user's language identification model includes ranking them using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user. For a specific implementation, see the embodiment shown in Fig. 2.
S103: obtain the final speech recognition result from the ranked candidate speech recognition results.
In a specific implementation, the top-ranked candidate speech recognition result can be taken as the final speech recognition result. For example, suppose the user says "call Wang Li for me", where two candidate names are homophones (both rendered here as "Wang Li"). Using the general language model alone, the recognition result may contain the wrong name; but when typing, this user usually inputs the other "Wang Li". With the method of the invention, when the N-best candidate results are re-scored and re-ranked, the sentence containing the name the user actually types is ranked before the other. The personalized recognition result obtained in this way is more accurate than the general recognition result and better matches the user's language usage.
In this embodiment of the invention, the user's voice input is received and recognized to obtain candidate speech recognition results; the candidates are ranked with the user's language identification model; and the final speech recognition result is obtained from the ranked candidates. Because the user's language identification model is obtained from the general language model and the user's personal language model, the ranking comprehensively considers the influence of both general language usage habits and the user's personalized language usage habits on the candidate speech recognition results, so that results better matching the user's personalized habits are ranked first, which effectively improves the accuracy of the speech recognition result.
Fig. 2 is a flow chart of the speech recognition method provided by another embodiment of the present invention. Unlike the embodiment shown in Fig. 1, this embodiment also considers the influence of the group language model corresponding to the user on the recognition result. The group language model corresponding to the user can recognize personalized expressions that the user rarely uses but that are common among similar users, making up for the scarcity of the user's own corpus and improving the accuracy of speech recognition.
S201: establish the personal language model corresponding to the user.
In a specific implementation, the personal language model corresponding to the user can be established as follows: obtain the user's historical text input data; obtain the user's word features and/or word-combination features from that data, where the word features include words and their occurrence frequencies and the word-combination features include word combinations and their occurrence frequencies; and train with the user's word features and/or word-combination features to obtain the personal language model corresponding to the user. For a specific implementation, see the embodiment shown in Fig. 1.
S202: establish each group language model.
A group language model describes the language features of the group a user belongs to. Each user has a corresponding group language model; the correspondence between users and group language models can be saved in advance, so that in S205 the group language model corresponding to the current user can be obtained from the saved correspondence.
Establishing each group language model includes the following steps:
S202A: compute the similarity between different users and obtain similar-user sets from the computed similarities. A similar-user set contains the users whose similarity exceeds a set threshold.
In a specific implementation, S202A may include: obtaining the word feature vector of each user; taking each user in turn as the current user and computing the cosine distance between the current user's word feature vector and each other user's word feature vector, using that cosine distance as their similarity; and adding every user whose similarity with the current user exceeds the set threshold to the similar-user set corresponding to the current user.
For example, a user's word feature vector can be obtained from the user's 1-gram personal language model: the vector holds the probability of each word, representing how often the user uses it. Because similar users use words with similar frequencies, the similarity of users can be measured by the similarity of their word feature vectors.
For instance, the common words of doctor A and doctor B are similar, while the common words of doctor A differ from those of ice hockey coach C and truck driver D; each user's word features can be reflected in a vector of vocabulary size. The similarity between the word feature vectors of doctor A and doctor B is naturally greater than the similarity between the word feature vectors of doctor A and coach C. When computing similarity, the cosine distance between vectors can be used.
The cosine distance of two vectors a and b can be computed with the following formula:
cos θ = (a · b) / (‖a‖ ‖b‖)
It should be noted that if the similarity between two users' word feature vectors exceeds the set threshold, the two users are considered similar. The set of all users whose similarity with the current user exceeds the threshold serves as the similar-user group corresponding to the current user. If the current user's text input corpus is small, the similar-user group can make up for that scarcity by supplementing personalized expressions that the user rarely uses but that similar users commonly use, making the resulting speech recognition more accurate.
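Step S202A can be sketched as below. The 0.8 threshold and the toy two-dimensional vectors are assumptions for illustration; real word feature vectors would have vocabulary-sized dimensions.

```python
import math

def cosine(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||) for two word-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similar_users(current, others, threshold=0.8):
    """Return the users whose word feature vectors exceed `threshold`
    cosine similarity with the current user's vector (step S202A)."""
    return [uid for uid, vec in others.items()
            if cosine(current, vec) > threshold]
```

In the doctor/coach example, a doctor's vector is far closer to another doctor's than to an ice hockey coach's, so only the fellow doctor lands in the similar-user set.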
S202B: obtain the word features and/or word-combination features corresponding to the similar-user group from the text input of each user in the similar-user set; the word features include words and their occurrence frequencies, and the word-combination features include word combinations and their occurrence frequencies.
S202C: train with the word features and/or word-combination features corresponding to the similar-user group to obtain the group language model.
It should be noted that the method for training a group language model is the same as that for obtaining a personal language model; only the input corpus differs. The input corpus of the group language model can be the corpora of some or all users in the similar-user group.
S203: establish the general language model.
S201, S202 and S203 have no fixed execution order; they can be executed in reverse order or in parallel.
S204: receive the user's voice input and recognize it to obtain candidate speech recognition results.
S205: rank the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user.
In some embodiments, this ranking includes: linearly interpolating the general language model, the personal language model corresponding to the user, and the group language model to obtain the user's language identification model; computing the probability of each candidate speech recognition result with the user's language identification model; and ranking the candidates by the computed probabilities.
For example, weights can be preset and the user's language identification model obtained with the following formula:
user's language identification model = x × personal language model + y × general language model + z × group language model
where 0 < x < 1, 0 < y < 1, 0 < z < 1, and x + y + z = 1. For example, x = 0.5, y = 0.3 and z = 0.2.
In a specific implementation, the resulting language identification model is used to compute the probability of each candidate speech recognition result, and the candidates are sorted in descending order of probability.
In some embodiments, the ranking the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user includes: calculating the probability of each candidate speech recognition result using the general language model, calculating the probability of each candidate speech recognition result using the personal language model corresponding to the user, and calculating the probability of each candidate speech recognition result using the group language model corresponding to the user; performing linear interpolation on the probability calculated using the general language model, the probability calculated using the personal language model corresponding to the user, and the probability calculated using the group language model corresponding to the user; and ranking the candidate speech recognition results according to the result of the linear interpolation.
For example, weights can be preset, and the result of the linear interpolation is obtained by the following formula:
final probability value = x × probability value of the personal language model + y × probability value of the general language model + z × probability value of the group language model
where 0 < x < 1, 0 < y < 1, 0 < z < 1, and x + y + z = 1.
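The probability-level interpolation above, where each component model first scores every candidate and the scores are then combined, can be illustrated numerically. The per-model probabilities for the three candidates below are made-up values and the weights are the illustrative x = 0.5, y = 0.3, z = 0.2 from the earlier example.

```python
# Sketch of the probability-level linear interpolation: each model has
# already scored every candidate; the final score is a weighted sum.

x, y, z = 0.5, 0.3, 0.2
assert abs(x + y + z - 1.0) < 1e-12  # weights must sum to 1

# Per-candidate probabilities from each component model (hypothetical).
p_personal = [0.50, 0.20, 0.30]
p_general  = [0.25, 0.45, 0.30]
p_group    = [0.40, 0.30, 0.30]

final = [x * pp + y * pg + z * pc
         for pp, pg, pc in zip(p_personal, p_general, p_group)]
# Rank candidate indices by final probability, descending.
order = sorted(range(len(final)), key=final.__getitem__, reverse=True)
print(order)  # [0, 2, 1]: candidate 0 ranks first
```

For n-gram models scored on whole sentences this produces the same ordering as interpolating the models themselves; the patent simply presents both as alternative embodiments.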
S206: obtain the final speech recognition result using the ranked candidate speech recognition results.
In this embodiment of the invention, the influence of the group language model corresponding to the user on the recognition result is taken into account. Through the group language model corresponding to the user, personalized corpus that the user rarely uses but that is common in the similar group corresponding to the user can be recognized, making up for the deficiency of the user's own corpus and improving the accuracy of speech recognition.
Referring to Fig. 3, a schematic diagram of a speech recognition device provided by an embodiment of the invention is shown.
A speech recognition device 300 includes:
a recognition unit 301, configured to receive a voice input of a user and recognize the voice input to obtain candidate speech recognition results, where the specific implementation of the recognition unit 301 may refer to step 101 of the embodiment shown in Fig. 1;
a ranking unit 302, configured to rank the candidate speech recognition results using a language identification model of the user, where the language identification model of the user is obtained from a general language model and a personal language model corresponding to the user, the personal language model corresponding to the user is a language model established using the history text input data of the user, and the specific implementation of the ranking unit 302 may refer to step 102 of the embodiment shown in Fig. 1; and
a result obtaining unit 303, configured to obtain a final speech recognition result using the ranked candidate speech recognition results, where the specific implementation of the result obtaining unit 303 may refer to step 103 of the embodiment shown in Fig. 1.
In some embodiments, the ranking unit 302 is specifically configured to: perform linear interpolation on the general language model and the personal language model corresponding to the user to obtain the language identification model of the user; calculate the probability of each candidate speech recognition result using the language identification model of the user; and rank the candidate speech recognition results according to the calculated probabilities.
In some embodiments, the ranking unit 302 is specifically configured to: calculate the probability of each candidate speech recognition result using the general language model, and calculate the probability of each candidate speech recognition result using the personal language model corresponding to the user; perform linear interpolation on the probability calculated using the general language model and the probability calculated using the personal language model corresponding to the user; and rank the candidate speech recognition results according to the result of the linear interpolation.
In some embodiments, the device further includes a personal language model establishing unit, configured to establish the personal language model corresponding to the user, where the personal language model establishing unit is specifically configured to: obtain the history text input data of the user; obtain word features and/or word combination features of the user according to the history text input data of the user, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and train with the word features and/or word combination features of the user to obtain the personal language model corresponding to the user.
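The feature-extraction step above can be sketched with standard-library counters. This is a simplified illustration under assumed inputs: the history lines are invented, tokenization is naive whitespace splitting, and "word combination" is taken here to mean adjacent word pairs; the actual training of the personal language model from these counts is not shown.

```python
# Sketch: extract word features and word-combination (adjacent-pair)
# features, with statistical frequencies, from history text input.

from collections import Counter

def extract_features(history_lines):
    """Return (word frequencies, adjacent-word-pair frequencies)."""
    words, pairs = Counter(), Counter()
    for line in history_lines:
        tokens = line.split()                  # naive tokenization
        words.update(tokens)                   # word features
        pairs.update(zip(tokens, tokens[1:]))  # word-combination features
    return words, pairs

# Hypothetical history text input of one user.
history = ["open the map", "open the music player", "play jazz music"]
word_freq, pair_freq = extract_features(history)
print(word_freq["open"])           # 2
print(pair_freq[("open", "the")])  # 2
```

These frequency tables are exactly the kind of statistics from which an n-gram personal language model could then be estimated.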
In some embodiments, the device further includes a group language model obtaining unit, configured to obtain the group language model corresponding to the user, where the group language model is used to describe the language features of the group to which the user belongs.
The ranking unit is further configured to: rank the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user.
In some embodiments, the device further includes a group language model establishing unit, which is specifically configured to: calculate the similarity between different users, and obtain similar user group sets according to the calculated similarities, where a similar user group set includes the users whose similarity is greater than a set threshold; obtain word features and/or word combination features corresponding to a similar user group using the text input of the users in the similar user group set, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and train with the word features and/or word combination features corresponding to the similar user group to obtain the group language model.
In some embodiments, the group language model establishing unit is specifically configured to: obtain the word feature vectors of different users; take each of the different users in turn as the current user, calculate the cosine distance between the word feature vector of the current user and the word feature vectors of the other users, and use the cosine distance as the similarity between the current user and the other users; and add each user whose similarity with the current user is greater than the set threshold to the similar user group set corresponding to the current user.
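The similarity computation above can be sketched as follows. The user names, three-dimensional feature vectors, and the 0.9 threshold are illustrative assumptions; a real system would use high-dimensional word-frequency vectors built from each user's text input.

```python
# Sketch: group users whose cosine similarity of word feature vectors
# exceeds a set threshold, as in the group language model establishing unit.

import math

def cosine(u, v):
    """Cosine of the angle between two vectors (used as similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_group(current, users, threshold):
    """Users whose similarity with `current` exceeds `threshold`."""
    return {name for name, vec in users.items()
            if name != current and cosine(users[current], vec) > threshold}

users = {
    "alice": [3.0, 1.0, 0.0],   # word-frequency style feature vectors
    "bob":   [2.0, 1.0, 0.0],   # close to alice
    "carol": [0.0, 0.0, 5.0],   # very different usage
}
print(similar_group("alice", users, threshold=0.9))  # {'bob'}
```

Taking each user in turn as the current user yields one similar user group set per user, from which the group language model for that user can then be trained.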
For the configuration of the units or modules of the device of the present invention, reference may be made to the implementation of the methods shown in Fig. 1 and Fig. 2, which is not repeated here.
Referring to Fig. 4, a block diagram of a speech recognition device according to an exemplary embodiment is shown. For example, the device 400 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 4, the device 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.
The processing component 402 typically controls the overall operation of the device 400, such as operations associated with display, telephone calls, data communication, camera operation, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 402 may include one or more modules to facilitate interaction between the processing component 402 and other components. For example, the processing component 402 may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support the operation of the device 400. Examples of such data include instructions of any application or method operated on the device 400, contact data, phonebook data, messages, pictures, videos, and the like. The memory 404 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 406 provides power to the various components of the device 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 400.
The multimedia component 408 includes a screen providing an output interface between the device 400 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. When the device 400 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focusing and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a microphone (MIC), which is configured to receive external audio signals when the device 400 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 404 or sent via the communication component 416. In some embodiments, the audio component 410 further includes a speaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing status assessments of various aspects of the device 400. For example, the sensor component 414 may detect the open/closed state of the device 400 and the relative positioning of components, such as the display and keypad of the device 400; the sensor component 414 may also detect a change in position of the device 400 or a component of the device 400, the presence or absence of user contact with the device 400, the orientation or acceleration/deceleration of the device 400, and a change in the temperature of the device 400. The sensor component 414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an accelerometer, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the device 400 and other devices. The device 400 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
Specifically, an embodiment of the invention provides a speech recognition device 400, including a memory 404 and one or more programs, where the one or more programs are stored in the memory 404 and are configured to be executed by the one or more processors 420, and the one or more programs include instructions for performing the following operations: receiving a voice input of a user, and recognizing the voice input to obtain candidate speech recognition results; ranking the candidate speech recognition results using a language identification model of the user, where the language identification model of the user is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model established using the history text input data of the user; and obtaining a final speech recognition result using the ranked candidate speech recognition results.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: performing linear interpolation on the general language model and the personal language model corresponding to the user to obtain the language identification model of the user; calculating the probability of each candidate speech recognition result using the language identification model of the user; and ranking the candidate speech recognition results according to the calculated probabilities.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: calculating the probability of each candidate speech recognition result using the general language model, and calculating the probability of each candidate speech recognition result using the personal language model corresponding to the user; performing linear interpolation on the probability calculated using the general language model and the probability calculated using the personal language model corresponding to the user; and ranking the candidate speech recognition results according to the result of the linear interpolation.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: obtaining the history text input data of the user; obtaining word features and/or word combination features of the user according to the history text input data of the user, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and training with the word features and/or word combination features of the user to obtain the personal language model corresponding to the user.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: obtaining the group language model corresponding to the user, where the group language model is used to describe the language features of the group to which the user belongs; and ranking the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: calculating the similarity between different users, and obtaining similar user group sets according to the calculated similarities, where a similar user group set includes the users whose similarity is greater than a set threshold; obtaining word features and/or word combination features corresponding to a similar user group using the text input of the users in the similar user group set, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and training with the word features and/or word combination features corresponding to the similar user group to obtain the group language model.
Further, the processor 420 is also specifically configured to execute the one or more programs including instructions for performing the following operations: obtaining the word feature vectors of different users; taking each of the different users in turn as the current user, calculating the cosine distance between the word feature vector of the current user and the word feature vectors of the other users, and using the cosine distance as the similarity between the current user and the other users; and adding the users whose similarity with the current user is greater than the set threshold to the similar user group set corresponding to the current user.
An embodiment further provides a machine-readable medium, for example a non-transitory computer-readable storage medium. When the instructions in the medium are executed by the processor of a device (a terminal or a server), the device is enabled to perform a speech recognition method, the method including: receiving a voice input of a user, and recognizing the voice input to obtain candidate speech recognition results; ranking the candidate speech recognition results using a language identification model of the user, where the personal language model corresponding to the user is a language model established using the history text input data of the user; and obtaining a final speech recognition result using the ranked candidate speech recognition results.
Optionally, the ranking the candidate speech recognition results using the language identification model of the user includes: performing linear interpolation on the general language model and the personal language model corresponding to the user to obtain the language identification model of the user; calculating the probability of each candidate speech recognition result using the language identification model of the user; and ranking the candidate speech recognition results according to the calculated probabilities.
Optionally, the ranking the candidate speech recognition results using the language identification model of the user includes: calculating the probability of each candidate speech recognition result using the general language model, and calculating the probability of each candidate speech recognition result using the personal language model corresponding to the user; performing linear interpolation on the probability calculated using the general language model and the probability calculated using the personal language model corresponding to the user; and ranking the candidate speech recognition results according to the result of the linear interpolation.
Optionally, the method further includes: obtaining the history text input data of the user; obtaining word features and/or word combination features of the user according to the history text input data of the user, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and training with the word features and/or word combination features of the user to obtain the personal language model corresponding to the user.
Optionally, the method further includes: obtaining the group language model corresponding to the user, where the group language model is used to describe the language features of the group to which the user belongs. The ranking the candidate speech recognition results using the language identification model of the user includes: ranking the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user.
Optionally, the obtaining the group language model corresponding to the user includes:
pre-establishing each group language model; and
obtaining the group language model corresponding to the user according to the correspondence between users and group language models.
Optionally, the pre-establishing each group language model includes: calculating the similarity between different users, and obtaining similar user group sets according to the calculated similarities, where a similar user group set includes the users whose similarity is greater than a set threshold; obtaining word features and/or word combination features corresponding to a similar user group using the text input of the users in the similar user group set, where the word features include words and the statistical frequencies of the words, and the word combination features include word combinations and the statistical frequencies of the word combinations; and training with the word features and/or word combination features corresponding to the similar user group to obtain the group language model.
Optionally, the calculating the similarity between different users and obtaining similar user group sets according to the calculated similarities includes: obtaining the word feature vectors of different users; taking each of the different users in turn as the current user, calculating the cosine distance between the word feature vector of the current user and the word feature vectors of the other users, and using the cosine distance as the similarity between the current user and the other users; and adding the users whose similarity with the current user is greater than the set threshold to the similar user group set corresponding to the current user.
Fig. 5 is a structural schematic diagram of a server in an embodiment of the invention. The server 500 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 522 (for example, one or more processors), a memory 532, and a storage medium 530 (for example, one or more mass storage devices) storing application programs 542 or data 544. The memory 532 and the storage medium 530 may be transient storage or persistent storage. The programs stored in the storage medium 530 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or conventional techniques in the art not disclosed herein. The description and examples are to be considered as illustrative only, and the true scope and spirit of the invention are indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims. The foregoing is merely preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element. The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
All the embodiments in this specification are described in a progressive manner; the same or similar parts between the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively simple, and for related parts reference may be made to the description of the method embodiments. The device embodiments described above are merely illustrative; the units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment, which those of ordinary skill in the art can understand and implement without creative work. The above is only a specific embodiment of the present invention; it should be pointed out that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications should also be considered within the protection scope of the present invention.
Claims (10)
1. A speech recognition method, characterized by comprising:
receiving a voice input of a user, and recognizing the voice input to obtain candidate speech recognition results;
ranking the candidate speech recognition results using a language identification model of the user, wherein the language identification model of the user is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model established using history text input data of the user; and
obtaining a final speech recognition result using the ranked candidate speech recognition results.
2. The method according to claim 1, wherein the ranking the candidate speech recognition results using the language identification model of the user comprises:
performing linear interpolation on the general language model and the personal language model corresponding to the user to obtain the language identification model of the user; and
calculating the probability of each candidate speech recognition result using the language identification model of the user, and ranking the candidate speech recognition results according to the calculated probabilities.
3. The method according to claim 1, wherein the ranking the candidate speech recognition results using the language identification model of the user comprises:
calculating the probability of each candidate speech recognition result using the general language model, and calculating the probability of each candidate speech recognition result using the personal language model corresponding to the user; and
performing linear interpolation on the probability calculated using the general language model and the probability calculated using the personal language model corresponding to the user, and ranking the candidate speech recognition results according to the result of the linear interpolation.
4. The method according to any one of claims 1 to 3, wherein the method further comprises:
obtaining the history text input data of the user;
obtaining word features and/or word combination features of the user according to the history text input data of the user, wherein the word features comprise words and the statistical frequencies of the words, and the word combination features comprise word combinations and the statistical frequencies of the word combinations; and
training with the word features and/or word combination features of the user to obtain the personal language model corresponding to the user.
5. The method according to claim 1, wherein the method further comprises:
obtaining a group language model corresponding to the user, wherein the group language model is used to describe the language features of the group to which the user belongs; and
the ranking the candidate speech recognition results using the language identification model of the user comprises:
ranking the candidate speech recognition results using the general language model, the personal language model corresponding to the user, and the group language model corresponding to the user.
6. The method according to claim 5, wherein the obtaining the group language model corresponding to the user comprises:
pre-establishing each group language model; and
obtaining the group language model corresponding to the user according to the correspondence between users and group language models;
wherein the pre-establishing each group language model comprises:
calculating the similarity between different users, and obtaining similar user group sets according to the calculated similarities, wherein a similar user group set comprises the users whose similarity is greater than a set threshold;
obtaining word features and/or word combination features corresponding to a similar user group using the text input of the users in the similar user group set, wherein the word features comprise words and the statistical frequencies of the words, and the word combination features comprise word combinations and the statistical frequencies of the word combinations; and
training with the word features and/or word combination features corresponding to the similar user group to obtain the group language model.
7. The method according to claim 6, wherein calculating the similarities between different users and obtaining the similar-user group sets according to the calculated similarities comprises:
obtaining a word feature vector of each of the different users;
taking each of the different users in turn as a current user, calculating the cosine distance between the word feature vector of the current user and the word feature vector of each other user, and using the cosine distance as the similarity between the current user and the other user;
adding each user whose similarity with the current user is greater than the set threshold to the similar-user group set corresponding to the current user.
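The grouping step of claim 7 can be sketched directly: compute cosine similarity between sparse word-frequency vectors and keep the users above the threshold. The vector representation and the threshold value are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse word-frequency vectors
    (dicts mapping word -> count)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def similar_user_group(current, others, threshold=0.5):
    """Collect the users whose word feature vectors are more similar to the
    current user's vector than the set threshold (claim 7 sketch)."""
    return [uid for uid, vec in others.items()
            if cosine(current, vec) > threshold]

me = {"football": 3, "match": 2}
others = {
    "u1": {"football": 1, "goal": 1},   # shares vocabulary with me
    "u2": {"opera": 4, "aria": 2},      # no overlap
}
group = similar_user_group(me, others)
```

Note the claim's wording uses "cosine distance" as the similarity measure; cosine similarity as computed here (larger means more similar) matches the "greater than a set threshold" condition.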
8. A speech recognition apparatus, comprising:
a recognition unit, configured to receive a voice input of a user, recognize the voice input, and obtain candidate speech recognition results;
a ranking unit, configured to rank the candidate speech recognition results by using a language identification model of the user, wherein the language identification model of the user is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model established by using historical text input data of the user;
a result obtaining unit, configured to obtain a final speech recognition result by using the ranked candidate speech recognition results.
9. An apparatus for speech recognition, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations:
receiving a voice input of a user, recognizing the voice input, and obtaining candidate speech recognition results;
ranking the candidate speech recognition results by using a language identification model of the user, wherein the language identification model of the user is obtained from a general language model and a personal language model corresponding to the user, and the personal language model corresponding to the user is a language model established by using historical text input data of the user;
obtaining a final speech recognition result by using the ranked candidate speech recognition results.
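The three operations recited in claim 9 form a simple pipeline: decode candidates, rerank them with the user's language identification model, and return the top result. The sketch below assumes both interfaces (`acoustic_decoder`, `user_lm_score`) as stand-ins; neither name comes from the patent.

```python
def recognize(audio, acoustic_decoder, user_lm_score):
    """End-to-end flow of claim 9: decode candidate results, rank them
    with the user's language identification model, return the top result.
    Both callables are assumed interfaces, not the patent's API."""
    candidates = acoustic_decoder(audio)                          # operation 1
    ranked = sorted(candidates, key=user_lm_score, reverse=True)  # operation 2
    return ranked[0]                                              # operation 3

best = recognize(
    b"\x00fake-audio",
    acoustic_decoder=lambda a: ["write a vest", "right away"],
    user_lm_score=lambda s: len(s.split()),  # toy scoring function
)
```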
10. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the speech recognition method according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710537548.3A CN109243430B (en) | 2017-07-04 | 2017-07-04 | Voice recognition method and device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710537548.3A CN109243430B (en) | 2017-07-04 | 2017-07-04 | Voice recognition method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109243430A true CN109243430A (en) | 2019-01-18 |
| CN109243430B CN109243430B (en) | 2022-03-01 |
Family
ID=65083290
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710537548.3A Active CN109243430B (en) | 2017-07-04 | 2017-07-04 | Voice recognition method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109243430B (en) |
Cited By (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110120221A (en) * | 2019-06-06 | 2019-08-13 | Shanghai NIO Automobile Co., Ltd. | User-personalized offline speech recognition method and system for a vehicle system |
| CN110502126A (en) * | 2019-05-28 | 2019-11-26 | Huawei Technologies Co., Ltd. | Input method and electronic device |
| CN110992939A (en) * | 2019-12-18 | 2020-04-10 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Language model training method, decoding method, device, storage medium and equipment |
| CN111145756A (en) * | 2019-12-26 | 2020-05-12 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition method and device for speech recognition |
| CN111554276A (en) * | 2020-05-15 | 2020-08-18 | Shenzhen Qianhai WeBank Co., Ltd. | Speech recognition method, apparatus, device, and computer-readable storage medium |
| CN111627452A (en) * | 2019-02-28 | 2020-09-04 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice decoding method and device and terminal equipment |
| CN111651599A (en) * | 2020-05-29 | 2020-09-11 | Beijing Sogou Technology Development Co., Ltd. | Method and device for ranking candidate speech recognition results |
| CN111816165A (en) * | 2020-07-07 | 2020-10-23 | Beijing SoundAI Technology Co., Ltd. | Speech recognition method, device and electronic device |
| CN112242142A (en) * | 2019-07-17 | 2021-01-19 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition input method and related device |
| CN112363631A (en) * | 2019-07-24 | 2021-02-12 | Beijing Sogou Technology Development Co., Ltd. | Input method, input device and device for input |
| CN112490516A (en) * | 2019-08-23 | 2021-03-12 | SAIC Motor Corporation Limited | Power battery maintenance mode generation system and method |
| CN114327355A (en) * | 2021-12-30 | 2022-04-12 | iFLYTEK Co., Ltd. | Voice input method, electronic device and computer storage medium |
| CN115394289A (en) * | 2022-07-27 | 2022-11-25 | JD Technology Information Technology Co., Ltd. | Identification information generation method and device, electronic equipment and computer-readable medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120173237A1 (en) * | 2003-12-23 | 2012-07-05 | Nuance Communications, Inc. | Interactive speech recognition model |
| CN103577386A (en) * | 2012-08-06 | 2014-02-12 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for dynamically loading a language model based on the user's input scene |
| CN104508739A (en) * | 2012-06-21 | 2015-04-08 | Google Inc. | Dynamic language model |
| US9190055B1 (en) * | 2013-03-14 | 2015-11-17 | Amazon Technologies, Inc. | Named entity recognition with personalized models |
| CN105096940A (en) * | 2015-06-30 | 2015-11-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for voice recognition |
| CN105122354A (en) * | 2012-12-12 | 2015-12-02 | Amazon Technologies, Inc. | Speech model retrieval in distributed speech recognition systems |
| CN105448292A (en) * | 2014-08-19 | 2016-03-30 | Beijing Yushanzhi Information Technology Co., Ltd. | Scene-based real-time voice recognition system and method |
- 2017
- 2017-07-04 CN CN201710537548.3A patent/CN109243430B/en status: Active
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120173237A1 (en) * | 2003-12-23 | 2012-07-05 | Nuance Communications, Inc. | Interactive speech recognition model |
| CN104508739A (en) * | 2012-06-21 | 2015-04-08 | Google Inc. | Dynamic language model |
| CN103577386A (en) * | 2012-08-06 | 2014-02-12 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for dynamically loading a language model based on the user's input scene |
| CN105122354A (en) * | 2012-12-12 | 2015-12-02 | Amazon Technologies, Inc. | Speech model retrieval in distributed speech recognition systems |
| US9190055B1 (en) * | 2013-03-14 | 2015-11-17 | Amazon Technologies, Inc. | Named entity recognition with personalized models |
| CN105448292A (en) * | 2014-08-19 | 2016-03-30 | Beijing Yushanzhi Information Technology Co., Ltd. | Scene-based real-time voice recognition system and method |
| CN105096940A (en) * | 2015-06-30 | 2015-11-25 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for voice recognition |
Cited By (19)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111627452A (en) * | 2019-02-28 | 2020-09-04 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice decoding method and device and terminal equipment |
| CN111627452B (en) * | 2019-02-28 | 2023-05-23 | Baidu Online Network Technology (Beijing) Co., Ltd. | Voice decoding method and device and terminal equipment |
| CN110502126A (en) * | 2019-05-28 | 2019-11-26 | Huawei Technologies Co., Ltd. | Input method and electronic device |
| CN110502126B (en) * | 2019-05-28 | 2023-12-29 | Huawei Technologies Co., Ltd. | Input method and electronic device |
| CN110120221A (en) * | 2019-06-06 | 2019-08-13 | Shanghai NIO Automobile Co., Ltd. | User-personalized offline speech recognition method and system for a vehicle system |
| CN112242142B (en) * | 2019-07-17 | 2024-01-30 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition input method and related device |
| CN112242142A (en) * | 2019-07-17 | 2021-01-19 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition input method and related device |
| CN112363631A (en) * | 2019-07-24 | 2021-02-12 | Beijing Sogou Technology Development Co., Ltd. | Input method, input device and device for input |
| CN112490516A (en) * | 2019-08-23 | 2021-03-12 | SAIC Motor Corporation Limited | Power battery maintenance mode generation system and method |
| CN110992939A (en) * | 2019-12-18 | 2020-04-10 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | Language model training method, decoding method, device, storage medium and equipment |
| CN111145756B (en) * | 2019-12-26 | 2022-06-14 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition method and device for speech recognition |
| CN111145756A (en) * | 2019-12-26 | 2020-05-12 | Beijing Sogou Technology Development Co., Ltd. | Speech recognition method and device for speech recognition |
| CN111554276B (en) * | 2020-05-15 | 2023-11-03 | Shenzhen Qianhai WeBank Co., Ltd. | Speech recognition method, device, equipment and computer-readable storage medium |
| CN111554276A (en) * | 2020-05-15 | 2020-08-18 | Shenzhen Qianhai WeBank Co., Ltd. | Speech recognition method, apparatus, device, and computer-readable storage medium |
| CN111651599A (en) * | 2020-05-29 | 2020-09-11 | Beijing Sogou Technology Development Co., Ltd. | Method and device for ranking candidate speech recognition results |
| CN111651599B (en) * | 2020-05-29 | 2023-05-26 | Beijing Sogou Technology Development Co., Ltd. | Method and device for ranking speech recognition candidate results |
| CN111816165A (en) * | 2020-07-07 | 2020-10-23 | Beijing SoundAI Technology Co., Ltd. | Speech recognition method, device and electronic device |
| CN114327355A (en) * | 2021-12-30 | 2022-04-12 | iFLYTEK Co., Ltd. | Voice input method, electronic device and computer storage medium |
| CN115394289A (en) * | 2022-07-27 | 2022-11-25 | JD Technology Information Technology Co., Ltd. | Identification information generation method and device, electronic equipment and computer-readable medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109243430B (en) | 2022-03-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109243430A (en) | Speech recognition method and device | |
| CN109117862B (en) | Image tag recognition method, device and server |
| CN108874967B (en) | Method and device for determining dialog state, dialog system, terminal, storage medium |
| CN107291690B (en) | Punctuation adding method and device, and device for punctuation adding |
| US11138422B2 | Posture detection method, apparatus and device, and storage medium |
| CN110674801B (en) | Method and device for identifying user motion mode based on accelerometer, and electronic equipment |
| CN107992812A | Lip reading recognition method and device |
| CN109389162B (en) | Sample image screening method and device, electronic equipment and storage medium |
| CN107221330A | Punctuation adding method and device, and device for punctuation adding |
| WO2021128880A1 | Speech recognition method, device, and device for speech recognition |
| CN108345581A | Information identification method, device and terminal device |
| CN109819288A | Method, apparatus, electronic equipment and storage medium for determining advertisement delivery video |
| CN107992813A | Lip state detection method and device |
| CN108831508A | Voice activity detection method, device and equipment |
| CN109360197A | Image processing method and device, electronic equipment and storage medium |
| CN109961791A | Voice information processing method, device and electronic equipment |
| CN109961094A | Sample acquisition method, device, electronic equipment and readable storage medium |
| CN108803890A | Input method, input device and device for input |
| CN108628813A | Processing method and apparatus, and device for processing |
| CN111739535A | Voice recognition method and device and electronic equipment |
| CN110968246A | Intelligent Chinese handwriting input recognition method and device |
| CN108628819 | Processing method and apparatus, and device for processing |
| CN109725736 | Candidate ranking method, device and electronic equipment |
| CN109102813B | Voiceprint recognition method and device, electronic equipment and storage medium |
| CN110858099A | Candidate word generation method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |