CN103165126A - Method for voice playing of mobile phone text short messages - Google Patents
Method for voice playing of mobile phone text short messages
- Publication number
- CN103165126A (application CN2011104243757A)
- Authority
- CN
- China
- Prior art keywords
- word
- text
- rhythm
- mobile phone
- voice
- Prior art date
- 2011-12-15
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method for voice playback of mobile phone text messages. After the mobile phone receives a message in text form, the text string of the message is passed through text analysis to obtain the corresponding speech waveform, from which speech is synthesized and played. The method achieves real-time speech synthesis and instant text-to-speech conversion; it saves time, safeguards users who are driving, and is convenient for elderly users with poor eyesight.
Description
Technical field
The present invention relates to the field of mobile communications, and in particular to a method for playing short messages aloud in real time.
Background technology
With the spread of PDA-class mobile phones, people enjoy many conveniences in life and work and can communicate with family and friends promptly. However, many users receive text messages while driving and cannot read them in time, and reading them anyway easily causes traffic accidents. Moreover, elderly users often have difficulty because the screen font size does not match their eyesight. A speech synthesis technique that promptly reads received text messages aloud is therefore a very useful mobile phone feature.
Summary of the invention
To solve the above technical problems, the present invention provides a method for playing short messages aloud in real time: after the mobile phone receives a message in text form, the text string of the message is passed through text analysis to obtain the corresponding speech waveform, from which synthetic speech is formed and played.
The method comprises a text normalization and symbol conversion step, which converts the special symbols, abbreviations, English words and measurement units in the text string of the received message into recognizable pronunciation-unit identifiers.
It further comprises a word segmentation step, which divides the input text into words according to preset segmentation rules, thereby determining the prosodic structure of the sentence and the pronunciation of polyphonic characters.
It also comprises a prosody prediction step, a coarticulation step and a unit selection step, wherein the prosody prediction step determines the pronunciation of each word, the coarticulation step determines how adjacent words are joined, and the unit selection step selects the optimal pronunciation from the dictionary according to the prosodic requirements and the word's pronunciation.
When selecting acoustic units to construct the sound bank, a loss function is used to describe the synthesis capability of sound banks of the same size. The loss function can be expressed as:

ζ(f, d, c) = c·f / d

where f is the word frequency of the current acoustic unit, d is the predicted duration of the unit, and c is the degree of coarticulation between the phonemes contained in the unit. Without considering prosodic conditions, when constructing the sound bank composed of acoustic units, the goal is to minimize the value of the loss function over the bank.
A fundamental frequency parameter model is adopted to control prosody generation.
In the method for voice playback of mobile phone text messages provided by the invention, a text-to-speech conversion system is installed on the mobile phone, and speech synthesis and playback through the loudspeaker are completed automatically. The invention offers real-time speech synthesis and instant text-to-speech conversion; it saves time, safeguards users who are driving, and is convenient for elderly users with poor eyesight. Through user settings and sound-bank customization, the voice can be switched between male, female or cartoon-character voices, and the phone can also automatically invoke the sound bank of the corresponding gender according to the sender's preset gender.
Description of drawings
Fig. 1 is a flowchart of the method in an embodiment of the present invention;
Fig. 2 is a flowchart of a specific embodiment of the present invention.
Embodiment
With reference to Fig. 1, the method of the present invention for playing short messages aloud in real time comprises the following steps:
Step 1: message reception
The mobile phone receives one or more short messages in text form from the base station through its radio-frequency module and temporarily stores them in its memory.
Step 2: text-to-speech conversion
A text-to-speech module (Text-To-Speech model, TTS Model) is adopted, i.e. a speech synthesis module whose input is a text string. Its input is an ordinary text string: the text analyzer in the module first decomposes the input string, according to a pronouncing dictionary, into words with attribute tags and their pronunciation symbols; then, according to semantic and phonetic rules, it determines the stress level, syntactic structure, intonation and pauses for each word and each syllable, so that the text string is converted into a symbol code string. Based on the results of this analysis, the prosodic features of the target speech are generated, and the speech units are then combined to synthesize the output speech.
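The dictionary-driven decomposition of the input string into words can be illustrated with a minimal sketch of forward maximum matching, one common form of preset segmentation rule; the toy pronouncing dictionary below is an illustrative assumption, not the patent's actual lexicon.

```python
# Sketch of dictionary-based word division: forward maximum matching.
# The tiny pronouncing dictionary is an illustrative stand-in.
PRONOUNCING_DICT = {
    "今天": "jin1 tian1",   # "today"
    "天气": "tian1 qi4",    # "weather"
    "很": "hen3",           # "very"
    "好": "hao3",           # "good"
}

def segment(text, lexicon, max_word_len=4):
    """At each position take the longest dictionary word;
    fall back to a single character."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon or length == 1:
                words.append(candidate)
                i += length
                break
    return words

print(segment("今天天气很好", PRONOUNCING_DICT))
# ['今天', '天气', '很', '好'] -> pronunciations then looked up per word
```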
The present invention reads the message data on the phone aloud immediately in the form of speech, so the user only needs to listen passively. This places the following requirements on the speech synthesis system: fast response, low computational and storage complexity, good extensibility, and clear, highly intelligible synthetic speech suitable for everyday communication and certain professional domains.
When selecting acoustic units to construct the sound bank, the present invention uses a loss function to describe the synthesis capability of sound banks of the same size. The loss function can be expressed as:

ζ(f, d, c) = c·f / d

where f is the word frequency of the current acoustic unit, d is the predicted duration of the unit, and c is the degree of coarticulation between the phonemes contained in the unit. Without considering prosodic conditions, when constructing the sound bank composed of acoustic units, the goal is to minimize the value of the loss function over the bank.
The present invention adopts the Fujisaki model, a widely used fundamental frequency parameter model, to control prosody generation; by simulating the human speech production mechanism, it predicts the variation of the fundamental frequency and thereby controls the rhythm, intonation and emotion of the synthetic speech.
The present invention suits any situation in which the user cannot conveniently read a message with his or her own eyes.
Through user settings and sound-bank customization, the voice can be switched between male, female or cartoon-character voices.
The mobile phone can also automatically invoke the sound bank of the corresponding gender according to the sender's preset gender.
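A minimal sketch of this voice-bank dispatch, assuming hypothetical bank file names; the real mapping would come from the user's settings and the customized sound banks.

```python
# Hypothetical voice-bank dispatch by the sender's preset gender,
# falling back to the user's configured default voice.
VOICE_BANKS = {
    "male": "bank_male.dat",
    "female": "bank_female.dat",
    "cartoon": "bank_cartoon.dat",
}

def pick_voice_bank(sender_gender, user_default="female"):
    return VOICE_BANKS.get(sender_gender, VOICE_BANKS[user_default])

print(pick_voice_bank("male"))  # bank_male.dat
print(pick_voice_bank(None))    # falls back to bank_female.dat
```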
With reference to Fig. 2, the input text first passes through normalization and symbol conversion, in which special symbols, abbreviations, English words and measurement units are converted into recognizable pronunciation-unit identifiers. The word segmentation model then divides the input text into words according to preset segmentation rules; this segmentation essentially determines the prosodic structure of the sentence and the pronunciation of polyphonic characters. Prosody prediction determines the pronunciation of each word, and coarticulation determines how adjacent words are joined. The unit selection module chooses the optimal pronunciation from the dictionary according to the prosodic requirements and the word's pronunciation, and the waveform is recovered through speech reconstruction. Finally, the concatenation module joins the speech waveforms of the individual words, under the control of splicing parameters, into the final utterance.
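The normalization and symbol-conversion step can be sketched as a small rule table applied before segmentation; the rules below (units, symbols, abbreviations) are illustrative assumptions rather than the patent's actual conversion table.

```python
import re

# Sketch of text normalization: expand symbols, measurement units and
# abbreviations into readable words before word segmentation.
SYMBOL_RULES = [
    (re.compile(r"(\d+)\s*km"), r"\1 kilometers"),
    (re.compile(r"(\d+)\s*%"), r"\1 percent"),
    (re.compile(r"\s*&\s*"), " and "),
    (re.compile(r"\bDr\."), "Doctor"),
]

def normalize(text):
    for pattern, replacement in SYMBOL_RULES:
        text = pattern.sub(replacement, text)
    return text

print(normalize("Dr. Lee ran 5km & lost 3%"))
# Doctor Lee ran 5 kilometers and lost 3 percent
```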
1. Acoustic unit selection and generation
To give the synthetic speech high clarity, intelligibility and naturalness, waveform-based speech synthesis is usually adopted. The synthesis units in waveform-concatenation synthesis are cut from original natural speech and therefore retain some prosodic features of natural speech. According to the phonetic and prosodic rules of the natural language, suitable speech primitives are stored so that the units achieve maximum phonetic and prosodic coverage within a given storage capacity. At synthesis time the output speech is produced through acoustic unit selection, waveform concatenation, smoothing and related steps. With a well-designed corpus, and by choosing optimal acoustic units from the sound bank according to phonetic and prosodic rules, the system outputs high-quality speech.
Common candidate speech units include phrases, syllables, phonemes and diphones. When constructing the corpus needed for waveform concatenation, the strengths and weaknesses of different unit types can be combined. For example, for phoneme and syllable combinations with strong coarticulation that occur frequently in natural speech, splicing between heavily coarticulated combinations should be avoided as far as possible when forming the target speech; even slightly improper unit selection makes the result hard to accept acoustically. The type and length of the acoustic units adopted in a practical synthesis system are therefore not fixed.
When selecting acoustic units to construct a sound bank, a loss function is usually used to describe the synthesis capability of sound banks of the same size. A typical loss function can be expressed as:

ζ(f, d, c) = c·f / d    (1)

where f is the word frequency of the current acoustic unit, d is the predicted duration of the unit, and c is the degree of coarticulation between the phonemes contained in the unit. Without considering prosodic conditions, when constructing the sound bank composed of acoustic units, the goal is to minimize the value of the loss function (1) over the bank.
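One plausible reading of this criterion, sketched below under assumed data: with a fixed storage budget, the bank keeps the units whose omission would cost the most, so that the loss summed over the omitted units is as small as possible. The candidate units and their figures are invented for illustration.

```python
def zeta(f, d, c):
    """Loss of an acoustic unit with word frequency f, predicted
    duration d (seconds) and coarticulation degree c."""
    return c * f / d

def build_sound_bank(candidates, duration_budget):
    """candidates: list of (name, f, d, c) tuples. Greedily keep the
    units with the highest loss per second of storage until the
    duration budget is spent."""
    ranked = sorted(candidates,
                    key=lambda u: zeta(u[1], u[2], u[3]) / u[2],
                    reverse=True)
    bank, used = [], 0.0
    for name, f, d, c in ranked:
        if used + d <= duration_budget:
            bank.append(name)
            used += d
    return bank

units = [("ma1", 900, 0.20, 0.8), ("shi4", 1200, 0.25, 0.9),
         ("de0", 2500, 0.15, 0.6), ("zhuang4", 60, 0.30, 0.7)]
print(build_sound_bank(units, duration_budget=0.6))
# ['de0', 'ma1', 'shi4'] -- the rare unit 'zhuang4' is left out
```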
The acoustic units used for splicing are usually obtained by cutting continuous speech. Word frequency information for common everyday expressions can be obtained by statistics, and sentences are selected under the guidance of this information so that the selected sentences give good coverage of high-frequency words; these selected sentences become the script to be recorded later.
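The frequency-guided sentence selection just described can be sketched as greedy coverage: repeatedly pick the sentence whose not-yet-covered words carry the largest total frequency. The sentences and frequency table are illustrative assumptions.

```python
def select_script(sentences, word_freq, n_sentences):
    """Greedy coverage: sentences are pre-segmented word lists."""
    covered, script = set(), []
    remaining = list(sentences)
    for _ in range(min(n_sentences, len(remaining))):
        gain = lambda s: sum(word_freq.get(w, 0) for w in set(s) - covered)
        best = max(remaining, key=gain)
        if gain(best) == 0:       # nothing new left to cover
            break
        script.append(best)
        covered |= set(best)
        remaining.remove(best)
    return script

word_freq = {"的": 100, "了": 80, "天气": 40, "火车": 5}
sentences = [["今天", "的", "天气", "好"],
             ["他", "走", "了"],
             ["火车", "晚点", "了"]]
print(select_script(sentences, word_freq, 2))
# picks the two sentences that best cover high-frequency words
```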
A suitable announcer is selected to read the script aloud in a controlled manner while being recorded. The recorded speech waveform data are cut according to the script and the acoustic unit division: for Chinese the units are usually words and characters (CV structure), while English usually requires cutting down to words plus a small number of phonemes or diphones, thus forming the pronunciation unit bank. Each cut unit is annotated with its position in the original sentence (initial, medial or final) and the words adjacent to it before and after; these annotations provide the basis for the decisions of the unit selection module.
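The annotations attached to each cut unit might be captured in a record like the following sketch; the field names and file path are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AcousticUnit:
    """One cut unit from the recordings, with the labels the unit
    selection module consults."""
    text: str           # the word/character this waveform realizes
    wave_file: str      # path to the cut waveform segment
    position: str       # "initial", "medial" or "final" in the sentence
    left_context: str   # word preceding it in the recording
    right_context: str  # word following it in the recording

unit = AcousticUnit("天气", "units/tianqi_0042.wav",
                    "medial", "今天", "很")
```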
2. Prosody generation
Prosodic parameters are important for controlling the rhythm, intonation and emotion of synthetic speech; for Mandarin Chinese, the fundamental frequency is the physical parameter most directly related to tone. The structure of Chinese can be summarized as follows: phonemes form initials and finals; a final carries a tone to become a toned final; a syllable is formed either by a toned final alone or by an initial spliced with a toned final. Mandarin has five tones (high-level, rising, dipping, falling and neutral) and more than 1,200 toned syllables. A syllable is the sound of one character, i.e. a syllable-character. Characters form words, and words finally form sentences.
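The initial/final/tone structure can be shown with a small sketch that splits a toned pinyin syllable; the initial inventory is abbreviated and the function is an illustration, not a full pinyin parser.

```python
# Two-character initials are listed first so the longest match wins.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_syllable(pinyin):
    """'zhong1' -> ('zh', 'ong', 1); tone 0 marks the neutral tone."""
    tone = int(pinyin[-1]) if pinyin[-1].isdigit() else 0
    body = pinyin.rstrip("012345")
    for ini in INITIALS:
        if body.startswith(ini):
            return ini, body[len(ini):], tone
    return "", body, tone         # zero-initial syllable, e.g. 'ai4'

print(split_syllable("zhong1"))   # ('zh', 'ong', 1)
print(split_syllable("ai4"))      # ('', 'ai', 4)
```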
Prosody generation based on machine learning. Although many rules about prosody have been obtained, they still fall far short of forming prosody that approaches natural speech. To capture the hidden, hard-to-describe prosodic regularities, machine learning methods are usually used to generate prosody. Commonly used models include the hidden Markov model (HMM), artificial neural networks (ANN), the support vector machine (SVM) and decision trees, as sketched below.
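As one concrete instance of this learning-based approach, here is a toy sketch using a decision tree (one of the models named above) to predict prosodic break levels; it relies on scikit-learn, and the features and training rows are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Features per word: [word length, relative position in sentence (0-1),
# words remaining until the next punctuation mark]
X = [[1, 0.0, 5], [2, 0.2, 4], [2, 0.5, 2], [1, 0.8, 1],
     [3, 0.3, 3], [2, 0.9, 0], [1, 0.5, 3], [2, 0.7, 1]]
# Label: prosodic break after the word (0 none, 1 minor, 2 major)
y = [0, 0, 1, 1, 0, 2, 0, 1]

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(clf.predict([[2, 0.95, 0]]))  # a sentence-final word: major break
```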
Prosody generation based on parameterized models. Prosody models based on machine learning extract fine-grained rules that cannot be analyzed manually and greatly reduce the workload of manual analysis, but the approach also has the following problems. First, general learning algorithms require large amounts of data, especially when there are many attribute features. Second, if the available data are unevenly distributed, training will be biased overall, which affects the analysis results. Third, expert knowledge is not well incorporated, which wastes information. Fourth, the trained model is not linked to linguistic features or human perception and therefore cannot be transferred or adjusted. Fundamental frequency and duration are the acoustic parameters that directly affect the perceived prosody, and both vary over time and with the environment. A parameterized model makes use of prior knowledge: it first analyzes the relation between fundamental frequency, duration, linguistic features and human hearing, builds a model of that relation, and extracts the parameters directly related to perception. Such a model makes effective use of expert knowledge, can be trained with little data to relate textual linguistic features to model parameters, and allows the perceived prosodic features to be changed simply by adjusting those parameters.
The Fujisaki model is a widely used fundamental frequency parameter model; it predicts the variation of the fundamental frequency mainly by simulating the human speech production mechanism. Fujisaki holds that changes in fundamental frequency have two main causes: the influence of prosodic phrase boundaries (Phrase) and the influence of syllable tones (Accent). The fundamental frequency contour is generated according to the mechanics of vocal fold vibration, with the Phrase and Accent commands as the input of the prediction system and the F0 contour as its output; the Phrase component is produced from a pulse signal and the Accent component from a step function. Under this model the fundamental frequency contour can be expressed as:

ln F0(t) = ln Fmin + Σ_{i=1..I} Api · Gpi(t − T0i) + Σ_{j=1..J} Aaj · [Gaj(t − T1j) − Gaj(t − T2j)]

where the phrase and accent control functions are

Gpi(t) = αi² · t · e^(−αi·t) for t ≥ 0,  Gaj(t) = min[1 − (1 + βj·t) · e^(−βj·t), θ] for t ≥ 0,

and both are 0 for t < 0. The parameters in the formula are as follows: Fmin, the minimum fundamental frequency; αi, the control coefficient of the i-th Phrase command; I, the number of Phrase commands; βj, the control coefficient of the j-th Accent command; J, the number of Accent commands; θ, the ceiling parameter of the Accent commands; T0i, the time of the i-th Phrase command; Api, the amplitude of the i-th Phrase command; T1j, the start time of the j-th Accent command; Aaj, the amplitude of the j-th Accent command; T2j, the end time of the j-th Accent command.
The mechanism of the Fujisaki model is very simple. Each phrase command passes a pulse signal through the phrase filter, so the corresponding fundamental frequency rises to a maximum point and then decays gradually; under successive phrase commands, the fundamental frequency contour fluctuates continuously. Each accent command is initiated by a step function; because the accent filter constant β is much larger than the phrase filter constant α, the accent component reaches its maximum very quickly and then decays rapidly.
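A minimal NumPy sketch of the mechanism just described: impulse-driven phrase components that rise and decay slowly, and step-driven accent components that saturate and decay quickly. The command times, amplitudes and filter constants below are illustrative assumptions, not fitted values.

```python
import numpy as np

def Gp(t, alpha=2.0):
    """Phrase control: alpha^2 * t * exp(-alpha*t) for t >= 0, else 0."""
    tc = np.clip(t, 0.0, None)
    return alpha ** 2 * tc * np.exp(-alpha * tc)

def Ga(t, beta=20.0, theta=0.9):
    """Accent control: min[1 - (1 + beta*t)*exp(-beta*t), theta], t >= 0."""
    tc = np.clip(t, 0.0, None)
    return np.minimum(1.0 - (1.0 + beta * tc) * np.exp(-beta * tc), theta)

def f0_contour(t, fmin, phrases, accents):
    """ln F0(t) = ln Fmin + sum Ap*Gp(t-T0) + sum Aa*[Ga(t-T1)-Ga(t-T2)]."""
    ln_f0 = np.full_like(t, np.log(fmin))
    for T0, Ap in phrases:
        ln_f0 += Ap * Gp(t - T0)
    for T1, T2, Aa in accents:
        ln_f0 += Aa * (Ga(t - T1) - Ga(t - T2))
    return np.exp(ln_f0)

t = np.linspace(0.0, 2.0, 200)
f0 = f0_contour(t, fmin=80.0,
                phrases=[(0.0, 0.5)],       # (T0, Ap)
                accents=[(0.3, 0.6, 0.4),   # (T1, T2, Aa)
                         (1.0, 1.4, 0.3)])
print(round(f0.max(), 1), round(f0.min(), 1))  # rises after commands, decays
```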
Beneficial effects of the present invention:
The invention provides real-time speech synthesis and instant text-to-speech conversion, with high system efficiency and stable operation;
The proposed text-to-speech conversion module has a clear structure, a well-defined division of labor among its parts, and strong independence;
The invention is convenient for elderly users, freeing them entirely from reading glasses;
When used while driving, the invention protects the user's traffic safety.
Those skilled in the art will understand that embodiments of the invention may be provided as a method, a system or a computer program product. The invention may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in this computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.
Claims (6)
1. A method for voice playback of mobile phone text messages, wherein after the mobile phone receives a message in text form, the text string of the message is passed through text analysis to obtain the corresponding speech waveform, from which synthetic speech is formed and played.
2. The method of claim 1, characterized by comprising a text normalization and symbol conversion step for converting the special symbols, abbreviations, English words and measurement units in the text string of the received message into recognizable pronunciation-unit identifiers.
3. The method of claim 2, characterized by comprising a word segmentation step for dividing the input text into words according to preset segmentation rules, thereby determining the prosodic structure of the sentence and the pronunciation of polyphonic characters.
4. The method of claim 3, characterized by further comprising a prosody prediction step, a coarticulation step and a unit selection step, wherein the prosody prediction step determines the pronunciation of each word, the coarticulation step determines how adjacent words are joined, and the unit selection step selects the optimal pronunciation from the dictionary according to the prosodic requirements and the word's pronunciation.
5. The method of claim 1, characterized in that when acoustic units are selected to construct the sound bank, a loss function is used to describe the synthesis capability of sound banks of the same size, expressed as:
ζ(f, d, c) = c·f / d
where f is the word frequency of the current acoustic unit, d is the predicted duration of the unit, and c is the degree of coarticulation between the phonemes contained in the unit; without considering prosodic conditions, the sound bank composed of acoustic units is constructed so that the value of the loss function over the bank is minimized.
6. The method of claim 1, characterized in that a fundamental frequency parameter model is adopted to control prosody generation.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2011104243757A CN103165126A (en) | 2011-12-15 | 2011-12-15 | Method for voice playing of mobile phone text short messages |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN103165126A true CN103165126A (en) | 2013-06-19 |
Family
ID=48588150
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2011104243757A (published as CN103165126A, pending) | Method for voice playing of mobile phone text short messages | 2011-12-15 | 2011-12-15 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103165126A (en) |
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1271216A (en) * | 1999-04-16 | 2000-10-25 | 松下电器产业株式会社 | Speech voice communication system |
| DE20102259U1 (en) * | 2001-02-09 | 2002-02-21 | Materna Gmbh Information & Com | SMS short message system |
| GB2378875A (en) * | 2001-05-04 | 2003-02-19 | Andrew James Marsh | Annunciator for converting text messages to speech |
| CN1731509A (en) * | 2005-09-02 | 2006-02-08 | 清华大学 | Mobile speech synthesis method |
| CN1972478A (en) * | 2005-11-24 | 2007-05-30 | 展讯通信(上海)有限公司 | A novel method for mobile phone reading short message |
| CN101605307A (en) * | 2008-06-12 | 2009-12-16 | 深圳富泰宏精密工业有限公司 | Test short message service (SMS) voice play system and method |
Cited By (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105357397A (en) * | 2014-03-20 | 2016-02-24 | 联想(北京)有限公司 | Output method and communication devices |
| CN103888611A (en) * | 2014-03-20 | 2014-06-25 | 联想(北京)有限公司 | A kind of output method and communication equipment |
| CN103888611B (en) * | 2014-03-20 | 2016-01-27 | 联想(北京)有限公司 | A kind of output method and communication equipment |
| CN105095180A (en) * | 2014-05-14 | 2015-11-25 | 中兴通讯股份有限公司 | Chinese name broadcasting method and device |
| CN104200803A (en) * | 2014-09-16 | 2014-12-10 | 北京开元智信通软件有限公司 | Voice broadcasting method, device and system |
| CN104485100B (en) * | 2014-12-18 | 2018-06-15 | 天津讯飞信息科技有限公司 | Phonetic synthesis speaker adaptive approach and system |
| CN104485100A (en) * | 2014-12-18 | 2015-04-01 | 天津讯飞信息科技有限公司 | Text-to-speech pronunciation person self-adaptive method and system |
| CN106294310B (en) * | 2015-06-12 | 2019-05-03 | 讯飞智元信息科技有限公司 | A kind of Tibetan language tone prediction technique and system |
| CN106294310A (en) * | 2015-06-12 | 2017-01-04 | 讯飞智元信息科技有限公司 | A kind of Tibetan language tone Forecasting Methodology and system |
| CN106507321A (en) * | 2016-11-22 | 2017-03-15 | 新疆农业大学 | A Uygur and Chinese Bilingual GSM Short Message Speech Conversion Broadcasting System |
| CN106652995A (en) * | 2016-12-31 | 2017-05-10 | 深圳市优必选科技有限公司 | Text voice broadcast method and system |
| WO2018121757A1 (en) * | 2016-12-31 | 2018-07-05 | 深圳市优必选科技有限公司 | Method and system for speech broadcast of text |
| CN107909993A (en) * | 2017-11-27 | 2018-04-13 | 安徽经邦软件技术有限公司 | A kind of intelligent sound report preparing system |
| CN109031474A (en) * | 2018-08-31 | 2018-12-18 | 成都润联科技开发有限公司 | A kind of weather information hiding Chinese phonetic broadcasting terminals and its working method based on Beidou satellite communication |
| CN111261139B (en) * | 2018-11-30 | 2023-12-26 | 上海擎感智能科技有限公司 | Literal personification broadcasting method and system |
| CN111261139A (en) * | 2018-11-30 | 2020-06-09 | 上海擎感智能科技有限公司 | Character personification broadcasting method and system |
| CN111128116A (en) * | 2019-12-20 | 2020-05-08 | 珠海格力电器股份有限公司 | Voice processing method and device, computing equipment and storage medium |
| CN113382123A (en) * | 2020-03-10 | 2021-09-10 | 精工爱普生株式会社 | Scanning system, storage medium, and scanning data generation method for scanning system |
| CN113903324A (en) * | 2020-06-18 | 2022-01-07 | 新加坡依图有限责任公司(私有) | Method, device, equipment and machine readable medium for text-to-speech |
| CN113936638A (en) * | 2020-06-29 | 2022-01-14 | 华为技术有限公司 | Text audio playing method and device and terminal equipment |
| CN112966476B (en) * | 2021-04-19 | 2022-03-25 | 马上消费金融股份有限公司 | Text processing method and device, electronic equipment and storage medium |
| CN112966476A (en) * | 2021-04-19 | 2021-06-15 | 马上消费金融股份有限公司 | Text processing method and device, electronic equipment and storage medium |
| CN114360494A (en) * | 2021-12-29 | 2022-04-15 | 广州酷狗计算机科技有限公司 | Rhythm labeling method and device, computer equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C05 | Deemed withdrawal (patent law before 1993) | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20130619 |