Summary of the invention
Embodiments of the invention provide a method and a terminal for converting speech into text, which can improve text proofreading efficiency and reduce the text error rate.
An embodiment of the invention provides a method for converting speech into text, comprising:
extracting a first feature parameter of a target voice signal;
matching a second feature parameter with third feature parameters in a speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter;
determining the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and outputting the text;
determining an accuracy rate of the text using the matching degrees of the N target feature parameters;
if the accuracy rate is lower than a preset threshold, applying a highlight mark to the text.
Further, determining the accuracy rate of the text using the matching degrees of the N target feature parameters comprises:
determining the sum of the matching degrees of the N target feature parameters; and
determining the proportion that the matching degree of the text accounts for in that sum, the proportion being the accuracy rate of the text.
Further, applying a highlight mark to the text if the accuracy rate is lower than the preset threshold comprises:
if the accuracy rate is lower than the preset threshold, marking the text with a color.
Further, the method also comprises:
obtaining a voice signal;
if the duration of a sentence pause in the voice signal exceeds a preset time, segmenting the voice signal at the sentence pause to form speech signal segments; and
marking each speech signal segment with a timestamp, the speech signal segment being the target voice signal.
Further, the method also comprises:
marking the text section corresponding to a speech signal segment with the timestamp of that speech signal segment.
Further, the method also comprises:
obtaining the text to be played when a play instruction is detected;
determining the timestamp of the text section containing the text to be played; and
playing the speech signal segment corresponding to that timestamp.
An embodiment of the invention also provides a device for converting speech into text, comprising:
an extraction unit, configured to extract a first feature parameter of a target voice signal;
a matching unit, configured to match a second feature parameter with third feature parameters in a speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter;
a first determination unit, configured to determine the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and output the text;
a second determination unit, configured to determine the accuracy rate of the text using the matching degrees of the N target feature parameters; and
a first marking unit, configured to apply a highlight mark to the text if the accuracy rate is lower than a preset threshold.
Further, the device also comprises:
an obtaining unit, configured to obtain a voice signal;
a segmentation unit, configured to segment the voice signal at a sentence pause when the duration of the sentence pause in the voice signal exceeds a preset time, forming speech signal segments; and
a second marking unit, configured to mark each speech signal segment with a timestamp, the speech signal segment being the target voice signal, and to mark the text section corresponding to the speech signal segment with the timestamp of that segment.
Further, the device also comprises:
a second obtaining unit, configured to obtain the text to be played when a play instruction is received;
a third determination unit, configured to determine the timestamp of the text section containing the text to be played; and
a playback unit, configured to play the speech signal segment corresponding to that timestamp.
An embodiment of the invention also provides a system for converting speech into text, comprising a terminal and a cloud server connected to the terminal.
The terminal is configured to collect a voice signal and send the collected voice signal to the cloud server.
The cloud server comprises:
a receiving unit, configured to receive the voice signal sent by the terminal;
an extraction unit, configured to extract a first feature parameter of the voice signal;
a matching unit, configured to match a second feature parameter with third feature parameters in a speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter;
a first determination unit, configured to determine the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and output the text;
a second determination unit, configured to determine the accuracy rate of the text using the matching degrees of the N target feature parameters; and
a marking unit, configured to apply a highlight mark to the text when the accuracy rate is lower than a preset threshold.
The method and device provided by the embodiments of the invention allow proofreaders to easily find text with a lower accuracy rate and judge whether it is correct. Besides making proofreading more convenient, this also improves proofreading efficiency and helps guarantee the accuracy of the converted text.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the disclosure.
Specific embodiment
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The embodiments of the present invention can be applied to a terminal, such as a mobile phone, a computer, or a tablet computer. One implementation is real-time text conversion: while the voice signal output by a speaker is being collected on site, the voice signal is converted into text and saved. In this implementation, a terminal with a voice-collection function, such as a terminal with a microphone, can be used. Another implementation is non-real-time conversion: the voice signal output by the speaker is first recorded with a device that has a recording function, the complete recorded voice signal is then sent to the terminal, and the terminal converts the received voice signal into text.
The embodiments of the present invention can also be applied to a terminal and a cloud server connected to the terminal. The terminal can send the voice signal to be converted to the cloud server; the cloud server converts the received voice signal into text and sends the converted text back to the terminal. The voice signal to be converted can be recorded in real time by the terminal itself, or can be sent to the terminal after being recorded by another recording device.
Referring to Fig. 1, a method for converting speech into text provided by an embodiment of the present invention can be applied to a terminal. As shown in Fig. 1, the method may comprise the following steps.
Step 11: obtain a voice signal.
The voice signal is the voice signal to be converted into text. The conversion can be performed in real time or not in real time, and the voice signal can be recorded by the terminal itself or by another device.
Step 12: if the duration of a sentence pause in the voice signal exceeds a preset time, segment the voice signal at the sentence pause to form speech signal segments.
Step 13: mark each speech signal segment with a timestamp; the speech signal segment is the target voice signal.
While or after the voice signal is obtained, if the duration of a sentence pause in the voice signal exceeds the preset time, the signal is segmented at the sentence pause to form a speech signal segment, and the segment is then marked with a timestamp. After segmentation, the timestamp at each break point can be calculated to determine the start time and end time of each speech signal segment. In a concrete implementation, the start time of the first speech signal segment can be set to 0 seconds. A speech signal segment can be marked with one timestamp, which can be its start time or its end time, or with two timestamps, its start time and its end time respectively.
For example, in the real-time conversion mode, the terminal monitors the voice signal while collecting it; if the duration of a sentence pause in the voice signal exceeds the preset time, the terminal segments the signal at the sentence pause, marks the resulting speech signal segment with a timestamp, and performs text conversion. In the non-real-time conversion mode, the whole voice signal can first be segmented according to the durations of the sentence pauses in it, each speech signal segment can be marked with its timestamp, and the segments can then be converted into text.
The duration of a sentence pause can be determined from the waveform of the voice signal; the details are not described here.
This embodiment does not limit the way the voice signal is segmented; for example, the voice signal can also be segmented at fixed time intervals.
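The pause-based segmentation described above can be sketched as follows. The frame energies, the 10 ms frame length, the silence threshold, and the 0.5 s preset pause time are all illustrative assumptions, not values fixed by this embodiment:

```python
# Hypothetical sketch of pause-based segmentation: a run of frame energies
# below a silence threshold lasting longer than `min_pause_s` closes the
# current segment and records its (start, end) timestamps.

def segment_by_pauses(energies, frame_s=0.01, silence_thresh=0.1, min_pause_s=0.5):
    """Return (start_time, end_time) pairs for non-silent segments."""
    segments = []
    start = None          # start time of the current segment, if any
    silent_run = 0        # consecutive silent frames seen so far
    for i, e in enumerate(energies):
        t = i * frame_s
        if e >= silence_thresh:
            if start is None:
                start = t
            silent_run = 0
        elif start is not None:
            silent_run += 1
            # A pause longer than the preset time ends the segment at the
            # last loud frame.
            if silent_run * frame_s > min_pause_s:
                end = t - silent_run * frame_s + frame_s
                segments.append((round(start, 2), round(end, 2)))
                start, silent_run = None, 0
    if start is not None:
        segments.append((round(start, 2), round(len(energies) * frame_s, 2)))
    return segments
```

A one-second burst, a one-second pause, and a half-second burst would yield the two timestamped segments (0.0, 1.0) and (2.0, 2.5).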
Step 14: extract the first feature parameter of the target voice signal.
In a concrete implementation, the signal can be divided into individual monosyllabic elements according to the energy values of the voice signal, and the feature parameter of each monosyllabic element can then be extracted to obtain the first feature parameter of the target voice signal.
If the voice signal is an analog signal, it can first be converted into a digital signal before feature extraction, yielding the first feature parameter corresponding to the voice signal.
In this embodiment, feature extraction can be performed on each speech signal segment obtained by segmentation.
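A minimal sketch of the per-frame feature extraction under stated assumptions: the embodiment does not fix the feature type, so short-time energy and zero-crossing rate are used here purely as stand-ins for the feature parameters, and the frame length and sample format are likewise assumptions:

```python
# Illustrative feature extraction over a digitized signal: each fixed-length
# frame yields an (energy, zero-crossing-rate) pair as its "feature parameter".

def extract_features(samples, frame_len=160):
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        # Mean squared amplitude of the frame.
        energy = sum(s * s for s in frame) / frame_len
        # Fraction of adjacent sample pairs that change sign.
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / frame_len
        feats.append((energy, zcr))
    return feats
```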
Step 15: match a second feature parameter with the third feature parameters in the speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter.
It should be noted that, unless otherwise specified, "matching degree" below refers to the matching degree between a feature parameter among the third feature parameters and the second feature parameter.
When matching the first feature parameter of a speech signal segment against the feature parameters in the speech database, parts of the first feature parameter, i.e. second feature parameters, can be selected and matched in turn, so that the text corresponding to each part is matched one by one. A second feature parameter can be the feature parameter of a single monosyllabic element, or the feature parameter corresponding to a combination of two or four monosyllabic elements.
Step 16: determine the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and output the text.
The speech database stores the feature parameters of the voice signals corresponding to single characters, two-character words, and four-character idioms, i.e. the third feature parameters. In general, the text corresponding to the third feature parameter with the highest matching degree with the second feature parameter is determined as the text corresponding to the voice signal to be converted, that is, the text to be output.
In this embodiment, not only the third feature parameter with the highest matching degree with the second feature parameter needs to be determined, but also at least one further target feature parameter whose matching degree with the second feature parameter is the highest among the remaining third feature parameters.
For example, the second feature parameter corresponding to the voice signal for "hello" is matched against the third feature parameters in the speech database, and the three third feature parameters with the highest matching degrees are found: those corresponding to "hello", "Ni Hao", and "Ning Hao", with matching degrees of 60%, 20%, and 5% respectively against the second feature parameter. Since the matching degree of the third feature parameter corresponding to "hello" is the highest, "hello" is determined to be the text, i.e. the text corresponding to the voice signal. The determined text can be output and saved in text format.
It should be noted that this embodiment does not limit the value of N, i.e. the number of target feature parameters; it can be two, three, four, or more.
The matching degree between the first feature parameter and a third feature parameter can be determined using methods such as the dynamic time warping (DTW) algorithm, the hidden Markov model (HMM) algorithm, the vector quantization (VQ) algorithm, or a neural network; the specific process is not described here.
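Of the matching methods named above, dynamic time warping is the simplest to illustrate. This is a textbook DTW distance between two one-dimensional feature sequences, not the embodiment's specific implementation; a matching degree could then be derived from the distance (e.g. smaller distance, higher degree):

```python
# Classic dynamic programming DTW: d[i][j] holds the minimal accumulated cost
# of aligning the first i elements of a with the first j elements of b.

def dtw_distance(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Sequences that differ only in tempo, such as [1, 2, 3] and [1, 2, 2, 3], align at zero cost, which is exactly why DTW suits speech of varying speaking rates.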
Step 17: determine the accuracy rate of the text using the matching degrees of the N target feature parameters.
In this step, the sum of the matching degrees of the N target feature parameters can first be determined, and then the proportion that the matching degree of the text accounts for in that sum; this proportion is the accuracy rate of the text.
For example, three target feature parameters are obtained in step 15, with matching degrees of 60%, 20%, and 5% with the second feature parameter. The sum of these three matching degrees is 85% (60% + 20% + 5% = 85%). The text is the one corresponding to the target feature parameter with the 60% matching degree, and the proportion its matching degree accounts for in the sum is 60%/85%, i.e. 70.59%. So the accuracy rate of the text is 70.59%.
In a concrete implementation, the accuracy rate of the text can also be determined by calculating only the proportion of the highest matching degree in the sum of the highest and second-highest matching degrees. In the example above, the accuracy rate of the text would then be 60%/80%, i.e. 75%.
The difference between the highest matching degree and the matching degree of each other target feature parameter can also characterize the accuracy rate of the text corresponding to the highest matching degree: the larger the difference, the higher the accuracy rate. So, in a concrete implementation, the accuracy rate of the text can also be determined by calculating the differences between the highest matching degree and the matching degrees of the other target feature parameters. For example, the accuracy rate can be determined from the difference between the highest and second-highest matching degrees: if the matching degrees of the target feature parameters are 60% and 20%, the difference is 40% (60% - 20% = 40%), and the accuracy rate can be calculated from this 40% difference using a mapping between differences and accuracy rates, or the 40% can itself be taken as the accuracy rate. It should be noted that this embodiment does not limit the specific way the accuracy rate of the text is calculated.
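The accuracy-rate calculations described above can be sketched as follows, assuming matching degrees are given as fractions; the 70.59% and 75% figures in the worked example correspond to `accuracy_by_proportion` over three and two degrees respectively:

```python
# Two of the accuracy-rate calculations from step 17. Degrees are fractions
# in [0, 1]; which calculation to use is left open by the embodiment.

def accuracy_by_proportion(degrees):
    """Highest matching degree as a share of the sum of the given degrees."""
    return max(degrees) / sum(degrees)

def accuracy_by_gap(degrees):
    """Gap between the highest and second-highest matching degree."""
    best, second = sorted(degrees, reverse=True)[:2]
    return best - second
```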
Step 18: if the accuracy rate is lower than the preset threshold, apply a highlight mark to the text.
In this embodiment, text whose accuracy rate is lower than the preset threshold can be marked with a color, so that when the text is displayed, text with an accuracy rate below the preset threshold is highlighted, making it easier for proofreaders to check. Different colors can also be used for different accuracy-rate grades, with each grade corresponding to a different preset range. For example, accuracy rates can be divided into three grades: text with an accuracy rate of 80% or above is not color-marked, text with an accuracy rate between 60% and 80% is marked yellow, and text with an accuracy rate below 60% is marked red.
When applying a highlight mark to text, the preset range that the accuracy rate of the text falls into can first be determined, and the color identifier can then be determined from that range. When the text is displayed, the color identifier can be rendered as the background color of the text or as the color of the text itself.
For example, a speech signal segment may be "Dear leaders, dear guests, good morning, everyone!", and the converted text format may be "Dear leaders, dear [B: yellow]draft beer[E: yellow], good morning, everyone!", where [B: yellow] marks where the yellow mark begins and [E: yellow] marks where it ends. When this text is displayed, the reader sees the words "draft beer" marked in yellow.
It should be noted that this embodiment does not limit the way the text is highlighted; for example, underlining, italics, or bold can also be used as highlight marks.
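A sketch of the tiered color marking, using the three illustrative bands from the example above (80% and 60% thresholds) and the [B: color]...[E: color] markers from the text-format example; both the bands and the marker syntax are taken from those examples rather than mandated by the embodiment:

```python
# Map an accuracy rate to a color band and wrap the text in begin/end markers.

def mark_text(text, accuracy):
    if accuracy >= 0.80:
        return text                     # high enough: no mark
    color = "yellow" if accuracy >= 0.60 else "red"
    return "[B:%s]%s[E:%s]" % (color, text, color)
```

A display layer would then render the marked span with the named background or font color.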
All the text corresponding to the target voice signal can be matched through the above steps; when the target voice signal is a speech signal segment, the text corresponding to each speech signal segment forms a text section.
Step 19: mark the text section corresponding to a speech signal segment with the timestamp of that speech signal segment.
For example, the speech signal segments and their timestamps are as follows: [00:00] At the three major basic telecom companies, XXX learned in detail how the enterprises are strengthening the construction of information infrastructure, carrying out technological innovation and application, and implementing speed-up and fee-reduction measures to serve enterprises and consumers. [00:28] He encouraged the enterprises to aim at the trends of the scientific and technological revolution and industrial transformation, strive to break through more core technologies, seize the commanding heights of international competition, and promote wider and deeper integrated applications.
After the speech signal segments are converted into text, the first text section ("At the three major basic telecom companies, XXX learned in detail how the enterprises are strengthening the construction of information infrastructure, carrying out technological innovation and application, and implementing speed-up and fee-reduction measures to serve enterprises and consumers") can be marked with the timestamp [00:00], and the second text section ("He encouraged the enterprises to aim at the trends of the scientific and technological revolution and industrial transformation, strive to break through more core technologies, seize the commanding heights of international competition, and promote wider and deeper integrated applications") can be marked with the timestamp [00:28].
A text section can have one timestamp, marked at the head or tail of the section; the time it indicates can be the start time or the end time of the speech signal segment corresponding to the text section. A text section can also have two timestamps, marked at the head and the tail of the section respectively, indicating the start time and end time of the corresponding speech signal segment.
It should be noted that when the voice signal is segmented at fixed time intervals and the speech signal segments are marked with timestamps, if those timestamps are also used as the timestamps of the corresponding text sections, the text section corresponding to a timestamp may not be semantically complete. So, after a speech signal segment is converted into text, the text can be re-segmented according to its semantics, and each re-segmented text section can be marked with a timestamp. The timestamp of a text section can be determined from the start time and end time of the speech signal segment corresponding to that section. In addition, the voice signal can be re-segmented according to the re-segmented text sections, and the speech signal segments can be marked with the corresponding timestamps.
In the non-real-time conversion mode, the target voice signal may be the complete voice signal; after the voice signal is converted into text, both the text and the voice signal can be segmented according to the semantics of the text, and each text section and speech signal segment after segmentation can be marked with a timestamp.
Steps 11 to 19 can be executed not only by the terminal but also by a cloud server. In the real-time conversion mode, the terminal collects the voice signal in real time and sends it to the cloud server; the cloud server converts the voice signal into text and sends the text back to the terminal, realizing real-time conversion.
In another embodiment, while collecting the voice signal, the terminal can segment it according to the durations of the sentence pauses, mark the speech signal segments with timestamps, and send the speech signal segments to the cloud server. The cloud server converts each speech signal segment into a text section, applies highlight marks to the relevant text, and sends the converted text sections to the terminal, which marks them with timestamps according to the timestamps of the speech signal segments. In this embodiment, the timestamps of the speech signal segments can also be marked by the cloud server.
The converted text can form a readable and editable document. When proofreading the text, proofreaders can easily find text with a lower accuracy rate and judge whether it is correct; besides making proofreading more convenient, this also improves proofreading efficiency and helps guarantee the accuracy of the converted text.
When judging whether converted text is correct, the speech signal segment corresponding to the text can be played so that the judgment can be made from context. A specific implementation can be: when a play instruction is detected, obtain the text to be played, determine the timestamp of the text section containing the text to be played, and then play the speech signal segment corresponding to that timestamp. The text to be played can be the selected text.
In a specific implementation, the proofreader can select the text to be checked and click a play button; after the play button is clicked, the system detects the play instruction and plays the speech signal segment corresponding to the timestamp of the text section containing the selected text, so that the correctness of the text can be judged from context.
It should be noted that the embodiments of the present invention do not limit how the play button is displayed: it can be displayed after the text is opened, after text is selected, or after text is selected and the mouse is right-clicked.
The timestamp marked on a text section can be identical to the timestamp of the speech signal segment corresponding to that section, so that after the text to be played is determined, the corresponding speech signal segment can be found and played via the timestamp of the text to be played. This makes the method more convenient to use, and using the method provided in this embodiment for text proofreading can improve proofreading efficiency.
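The playback lookup can be sketched as follows; the data shapes (a list of timestamped text sections and a dictionary from timestamp to audio segment) are illustrative assumptions:

```python
# Map selected text back to audio: find the text section containing it, then
# look up the speech signal segment stored under that section's timestamp.

def find_segment(selected_text, text_sections, segments_by_ts):
    """text_sections: list of (timestamp, section_text).
    segments_by_ts: dict mapping timestamp -> audio segment."""
    for ts, section in text_sections:
        if selected_text in section:
            return segments_by_ts.get(ts)
    return None
```

Because the text section carries the same timestamp as its speech signal segment, no extra alignment work is needed at playback time.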
Referring to Fig. 2, a structural block diagram of a device for converting speech into text provided by an embodiment of the present invention is shown; the device can be arranged in a terminal or be the terminal itself. As shown in Fig. 2, the device can comprise an obtaining unit 21, a segmentation unit 22, a second marking unit 23, an extraction unit 24, a matching unit 25, a first determination unit 26, a second determination unit 27, and a first marking unit 28.
The obtaining unit 21 is configured to obtain a voice signal.
The segmentation unit 22 is configured to segment the voice signal at a sentence pause when the duration of the sentence pause in the voice signal exceeds a preset time, forming speech signal segments.
The second marking unit 23 is configured to mark each speech signal segment with a timestamp; the speech signal segment is the target voice signal.
The extraction unit 24 is configured to extract a first feature parameter of the target voice signal.
The matching unit 25 is configured to match a second feature parameter with the third feature parameters in the speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter.
The first determination unit 26 is configured to determine the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and output the text.
The second determination unit 27 is configured to determine the accuracy rate of the text using the matching degrees of the N target feature parameters.
Specifically, the second determination unit 27 is configured to determine the sum of the matching degrees of the N target feature parameters, and to determine the proportion that the matching degree of the text accounts for in that sum, the proportion being the accuracy rate of the text.
The first marking unit 28 is configured to apply a highlight mark to the text if the accuracy rate is lower than a preset threshold.
The second marking unit 23 is also configured to mark the text section corresponding to a speech signal segment with the timestamp of that speech signal segment.
The device can also comprise a second obtaining unit, a third determination unit, and a playback unit.
The second obtaining unit is configured to obtain the text to be played when a play instruction is received.
The third determination unit is configured to determine the timestamp of the text section containing the text to be played.
The playback unit is configured to play the speech signal segment corresponding to that timestamp.
The device provided by the embodiments of the present invention allows proofreaders to easily find text with a lower accuracy rate and judge whether it is correct; besides making proofreading more convenient, this also improves proofreading efficiency and helps guarantee the accuracy of the converted text.
Referring to Fig. 3, a structural block diagram of a system for converting speech into text provided by an embodiment of the present invention is shown. The system can comprise a terminal 31 and a cloud server 32 connected to the terminal 31.
The terminal 31 is configured to collect a voice signal and send the collected voice signal to the cloud server 32.
The cloud server 32 is configured to convert the voice signal into text and send the text to the terminal 31.
Specifically, the cloud server 32 can comprise the following units.
The receiving unit 321 is configured to receive the voice signal sent by the terminal.
The segmentation unit 322 is configured to segment the voice signal at a sentence pause when the duration of the sentence pause in the voice signal exceeds a preset time, forming speech signal segments.
The second marking unit 323 is configured to mark each speech signal segment with a timestamp.
The extraction unit 324 is configured to extract a first feature parameter of the voice signal; specifically, it extracts the first feature parameter of each speech signal segment.
The matching unit 326 is configured to match a second feature parameter with the third feature parameters in the speech database to determine N target feature parameters, the N target feature parameters being the N third feature parameters with the highest matching degree with the second feature parameter, N >= 2, the second feature parameter being a part of the first feature parameter.
The first determination unit 327 is configured to determine the text corresponding to the target feature parameter with the highest matching degree with the second feature parameter, and output the text.
The second determination unit 328 is configured to determine the accuracy rate of the text using the matching degrees of the N target feature parameters.
The marking unit 329 is configured to apply a highlight mark to the text when the accuracy rate is lower than a preset threshold.
The second marking unit 323 is also configured to mark the text section corresponding to a speech signal segment with the timestamp of that speech signal segment.
The transmission unit 330 is configured to send the matched text to the terminal 31, the text comprising the text corresponding to the voice signal.
The system provided by the embodiments of the present invention allows proofreaders to easily find text with a lower accuracy rate and judge whether it is correct; besides making proofreading more convenient, this also improves proofreading efficiency and helps guarantee the accuracy of the converted text.
Since the device embodiments are basically similar to the method embodiments, they are described relatively simply; for relevant details, refer to the description of the method embodiments.
All the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operational steps are performed on the computer or the other programmable terminal device to produce computer-implemented processing; the instructions executed on the computer or the other programmable terminal device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes that element.
The method, apparatus, and system for converting speech into text provided by the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is merely intended to help understand the method of the present invention and its core concept. Meanwhile, those skilled in the art may, according to the concept of the present invention, make changes to the specific implementation and the scope of application. In conclusion, the content of this specification should not be construed as limiting the present invention.