WO2003017251A1 - Prosodic boundary markup mechanism - Google Patents
Prosodic boundary markup mechanism Download PDFInfo
- Publication number
- WO2003017251A1 WO2003017251A1 PCT/GB2002/003738 GB0203738W WO03017251A1 WO 2003017251 A1 WO2003017251 A1 WO 2003017251A1 GB 0203738 W GB0203738 W GB 0203738W WO 03017251 A1 WO03017251 A1 WO 03017251A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- prosodic
- prosodic boundary
- text portion
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
Definitions
- the present invention relates to an automated prosodic boundary markup mechanism, a method for automated prosodic boundary markup, a program element for prosodic boundary markup and a computer system configured to implement prosodic boundary markup.
- the present invention relates to an automated prosodic boundary markup mechanism for an automated text-to-speech (TTS) converter.
- TTS text-to-speech
- text-to-speech converters sometimes referred to as text-to-speech synthesisers
- a text-to-speech converter is a computer-based system that is able to read out aloud text that is input to it.
- Although TTS converters with a very high level of intelligibility and sound quality exist, the naturalness or humanness of the speech still remains a major problem.
- TTS applications are implemented in multi-task scenarios where the human-machine communication takes place via audio channels, for example via a telephone call.
- an initial syntactic estimation of the appropriate sites for prosodic breaks/boundaries in a text portion can be redefined, using accent-based and other constraints, to give an improved estimation of where these breaks/boundaries can be marked. This yields improved synthesised speech generated from such marked text.
- the present invention provides an automated prosodic boundary markup mechanism for an automated text-to-speech converter, the markup mechanism operable to perform a syntactic analysis on a text portion input thereto and to assign one or more prosodic boundaries to said text portion, the markup mechanism further operable to apply a constraint to said one or more prosodic boundaries and to remove a prosodic boundary satisfying said constraint.
- the present invention provides a method for configuring a computer system, including memory and a processor, to mark up a text portion input thereto with one or more prosodic boundaries, the method comprising configuring said computer system to: store said portion in said memory; process said text portion in said processor to perform a syntactic analysis thereof and to assign one or more prosodic boundaries to said text portion; store in said memory prosodic boundary marks associated with said prosodic boundaries; process said text portion having prosodic boundaries assigned thereto to apply a constraint; and delete prosodic boundary marks satisfying said constraint.
- an advantage of an embodiment in accordance with said first or second aspect of the present invention is that overmarking with prosodic boundaries, which typically occurs following a syntactic analysis of a text portion, is reduced.
- speech synthesised from a text portion processed in accordance with an embodiment of the present invention may be perceived to sound more like natural speech than known TTS converters, and thereby the intelligibility of the synthesised speech is improved.
- the prosodic boundary markup mechanism is operable to assign a prosodic boundary mark to each of said one or more prosodic boundaries, and to remove a prosodic boundary mark satisfying said constraint. The assigning of such prosodic boundary marks to identified prosodic boundaries within the text portion aids the automated analysis of the prosodic boundary content of the text portion, since such prosodic boundary marks may be identified by a machine and processed accordingly.
- the prosodic boundary markup mechanism is operable to identify a deaccented word in said text portion and to apply a constraint comprising removing a prosodic boundary immediately subsequent to said deaccented word.
- inappropriate prosodic boundaries identified in the text portion on the basis of solely a grammatical syntactic analysis, and incorrectly associated with a deaccented word, may be removed.
- the prosodic boundary markup mechanism is operable to identify punctuation in said text portion, and to apply a constraint comprising removing a prosodic boundary conditional on there being only one word between a prosodic boundary and preceding or subsequent punctuation, thereby further removing inappropriate prosodic boundaries assigned by the syntactic analysis.
- the prosodic boundary markup mechanism is operable to assign a major syntactic class to one or more words in said text portion, and to apply a constraint comprising removing a prosodic boundary bounded by two words having the same major syntactic class assigned to them.
- the major syntactic class is preferably selected from the set comprising the following major syntactic classes: noun; verb; adjective and adverb. Thereby, further inappropriately assigned prosodic boundaries may be removed.
- the syntactic classes may be identified by POS tags assigned to the words.
- the prosodic boundary markup mechanism may be operable to identify one or more function words in said text portion, and to apply a constraint comprising removing a prosodic boundary immediately subsequent to a function word, which further removes inappropriately assigned prosodic boundaries.
- the function words are identified in accordance with one or more function word definitions, drawn from the Penn TreeBank tag set {dt, ex, in, ls, md, pdt, pos, prp, pps, rp, sym, to, uh, vbp, wdt, wp, wpz}.
- the prosodic boundary markup mechanism is operable to determine a first number being the number of words between a first prosodic boundary and a second prosodic boundary being the next subsequent prosodic boundary from said first prosodic boundary; to determine a second number being the number of words between said second prosodic boundary and a third prosodic boundary being the next subsequent prosodic boundary from said second prosodic boundary; and to apply a constraint comprising removing said second prosodic boundary conditional on the ratio of said first number to said second number being greater than a predetermined threshold, suitably the threshold being two.
- the prosodic boundary markup mechanism is operable to assign a part of speech to one or more words, preferably each word, in said text portion, and to parse said text portion including said part of speech assigned words, thereby to perform said syntactic analysis on said text portion.
- Parsing a text portion on the basis of the parts of speech assigned to individual words in the text portion is a well-known technique, and for which automated processes are known.
- the parts of speech assigned to each word may be used to determine a condition for applying a constraint, thereby utilising the results of the parts of speech analysis for two separate functions of the markup mechanism.
- the prosodic boundary markup mechanism is operable to implement an automated Brill Tagger mechanism to assign a part of speech to said one or more words in said text portion, which is a well known and available Tagger mechanism.
- the prosodic boundary markup mechanism comprises a database including a Penn Treebank set of part of speech tags, and is operable to implement said automated Brill Tagger mechanism to interrogate said database to obtain a part of speech tag to assign to one or more words in said text portion.
- the use of the Penn TreeBank set of part of speech tags is advantageous in that it is a well known nomenclature for marking parts of speech, and is well understood by persons practised in the relevant art.
- the prosodic boundary markup mechanism is operable to parse said text portion including said part of speech assigned words to return a partial parse of said text portion, and yet more preferably to return the longest possible parse. Even more preferably a parsed sentence is returned.
- An advantage of a prosodic boundary markup mechanism in which a longest possible parse is sought to be returned, preferably a parsed sentence, is that there is a graceful degradation in the parsed quality should it not be possible to provide a complete parse of a sentence. That is to say, a failure to provide a completely parsed sentence results in the next longest possible parse being returned.
- a parse corresponding to the longest spanning edge from a vertex is returned.
- the present invention provides a computer system comprising memory, a processor and a prosodic boundary markup mechanism as described above.
- the present invention provides an automated text- to-speech converter system including a computer system as described above, a text input mechanism for the computer system and an audio output mechanism for outputting speech from the text-to-speech converter system.
- the automated text-to-speech converter system includes a text source, and the computer system is operable to communicate with said text source to provide text to said text input mechanism.
- the text source may comprise a news report database, a sports report database, a children's story database or an e-mail database for example.
- the text input mechanism comprises a keyboard for the computer system
- the automated text-to-speech converter system may be operable to provide editing of text originating from a text source with or via the keyboard.
- a human operator may process text prior to it being input to the TTS converter. This is an important process since the better formed a text portion is, the better the text-to-speech conversion.
- the text portion should be correctly marked in accordance with grammatical and punctuation rules, and have appropriate capitalisation.
- the text of a news report or sports report provided by a professional journalist, for example, is likely to be well formed. However, e-mails are unlikely to be so well formed and in accordance with grammatical rules, in particular where the author has adopted common e-mail shorthand conventions.
- the automated text-to-speech converter system is configured with an audio output mechanism which outputs u-law, A-law, or MPEG formatted audio output.
- the audio output may be put onto the regular Public Switched Telephone Network (PSTN), or onto a suitable digital communications conduit.
- PSTN Public Switched Telephone Network
- the audio output mechanism may comprise a speaker.
- the present invention provides an automated text-to- speech converter including a prosodic boundary markup mechanism as described above.
- the present invention provides a user device which includes a prosodic boundary markup mechanism such as described above, or a text-to- speech converter such as referred to above.
- the user device may comprise a Personal Digital Assistant (PDA), a hand held computer, a mobile telephone or a laptop computer, for example.
- PDA Personal Digital Assistant
- Such user devices, which are preferably portable, are capable of providing text-to-speech conversion.
- the present invention provides a communications network which comprises an automated text-to-speech converter system, a network interface connecting the automated text-to-speech converter system to the network and a user device connected to the network.
- the user device may be such as described above, but in this aspect need not include the prosodic boundary markup mechanism, since it is connected to a text-to-speech converter via the network, and will thereby receive synthesised speech converted from text.
- Figure 1 is a schematic representation of a computer system
- Figure 2 is a schematic and simplified representation of an implementation of the computer system of Figure 1
- Figure 3 is a schematic illustration of the main processes in a TTS converter system
- FIG 4 is a schematic illustration of a TTS converter system coupled by a network to user devices
- Figure 5 is a schematic illustration of a TTS converter system in accordance with an embodiment of the present invention
- Figure 6 is a flow diagram for an embodiment of the present invention
- Figure 7 is a flow diagram illustrating in more detail a part of the flow diagram of Figure 6
- Figure 8 is a flow diagram illustrating in more detail a part of the flow diagram of Figure 7;
- Figure 9 is a schematic illustration of the main processes of an embodiment of the present invention.
- processing platforms other than a computer system such as described below may be suitable for providing a platform for an embodiment of the present invention.
- FIG. 1 there is a schematic representation of an illustrative example of a computer system 11.
- the computer system 11 comprises a system unit 12, a display device 18 with a display screen 20, and user input devices, including a keyboard 22 and a mouse 24.
- a printer 21 is also connected to the system.
- The system unit 12 comprises media drives, including an optical disk drive 14, a floppy disk drive 16 and an internal hard disk drive not explicitly shown in Figure 1.
- a CD-ROM 15 and a floppy disk 17 are also illustrated.
- the basic operations of the computer system 11 are controlled by an operating system which is a computer program typically supplied already loaded into the computer system.
- the computer system may be configured to perform other functions by loading it with a computer program known as an application program, for example.
- a computer program for implementing various functions or conveying various information may be supplied on media such as one or more CD-ROMs and/or floppy disks and then stored on a hard disk, for example.
- the computer system shown in Figure 1 may also be connected to a network, which may be the Internet or a local or wide area dedicated or private network, for example.
- a program or program element implementable by a computer system may also be supplied on a telecommunications medium, for example over a telecommunications network and/or the Internet, and embodied as an electronic signal.
- the telecommunications medium may be a radio frequency carrier wave carrying suitably encoded signals representing the computer program and data or information.
- the carrier wave may be an optical carrier wave for an optical fibre link or any other suitable carrier medium for a land line link telecommunication system.
- FIG. 2 there is shown a schematic and simplified representation of an illustrative implementation of a computer system such as that referred to with reference to Figure 1.
- the computer system comprises various data processing resources such as a processor (CPU) 30 coupled to a bus structure 38. Also connected to the bus structure 38 are further data processing resources such as read only memory 32 and random access memory 34.
- a display adaptor 36 connects a display device 18 to the bus structure 38.
- One or more user-input device adapters 40 connect the user-input devices, including the keyboard 22 and mouse 24, to the bus structure 38.
- An adapter 41 for the connection of the printer 21 may also be provided.
- One or more media drive adapters 42 can be provided for connecting the media drives, for example the optical disk drive 14, the floppy disk drive 16 and hard disk drive 19, to the bus structure 38.
- One or more telecommunications adapters 44 can be provided thereby providing processing resource interface means for connecting the computer system to one or more networks or to other computer systems.
- the communications adapters 44 could include a local area network adapter, a modem and/or ISDN terminal adapter, or serial or parallel port adapter etc, as required.
- Figure 2 is a schematic representation of one possible implementation of a computer system, and that from the following description of embodiments of the present invention, the computer system in which the invention could be implemented may take many forms.
- the computer system may be a non-PC type of computer which is Internet- or network-compatible, for example a Web TV, or set-top box for a domestic TV capable of providing access to a computer network such as the Internet.
- the computer system may be in the form of a wireless PDA or a multimedia terminal.
- the main processes and functions for a TTS converter such as may be implemented on a computer system platform as described above with reference to Figures 1 and 2, will now be described with reference to Figure 3 of the drawings.
- the TTS converter 49 illustrated in Figure 3 comprises two main stages.
- the first stage is a text analysis module 50, comprising a text normalisation process 52 and a linguistic analysis process 54.
- a text portion 56 comprising a single sentence or multiple sentences, is input to the text analysis module 50 and undergoes a text normalisation process 52.
- Text normalisation, sometimes referred to as text preprocessing, involves breaking up the text portion string into individual words. For a language such as English this is relatively easy, as words are separated by spaces. However, in an ideographic language such as Japanese or Chinese a more complicated set of rules could be involved. Text normalisation also involves converting abbreviations, non-alpha characters, numbers and acronyms into a fully spelt out form.
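- By way of illustration of this normalisation step, a minimal sketch in Python is given below; the abbreviation table, the punctuation handling and the function names are assumptions chosen for illustration only, not the normaliser actually used in the embodiment.

```python
import re

# Hypothetical expansion table; a real normaliser would be far more extensive
# and would also spell out numbers, acronyms and other non-alpha characters.
EXPANSIONS = {"Dr.": "Doctor", "St.": "Saint", "&": "and", "%": "per cent"}

def normalise(text):
    """Expand a few abbreviations, then separate words from punctuation."""
    for short, full in EXPANSIONS.items():
        text = text.replace(short, full)
    # Put a space around each punctuation mark so it becomes its own token.
    text = re.sub(r'([.,!?;:"()\-])', r' \1 ', text)
    return text.split()

print(normalise("A report into the Ladbroke Grove train crash, by Dr. Smith."))
# ['A', 'report', 'into', 'the', 'Ladbroke', 'Grove', 'train', 'crash', ',',
#  'by', 'Doctor', 'Smith', '.']
```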
- control sequences are used to separate different modes. For example, a particular control sequence may indicate a "mathematical mode" for numbers and mathematical expressions, another a "date mode" for dates, and a further control sequence an "e-mail mode" for e-mail-specific characters.
- the normalised text is fed to the linguistic analysis process 54, where it is analysed to derive phonetic information 58 and prosodic information 60 of input text portion 56.
- the phonetic information 58 may be derived by linguistic analysis process 54 to provide a phonetic representation of the input text, for example by way of grapheme to phoneme conversion, which is important for deriving the correct pronunciation of words.
- the grapheme to phoneme conversion is done by way of a combination of grammatical rules and a look up dictionary for exceptions to the grammar rules.
- the grammar typically consists of simple and complex rules to deal with all phonological variations in the language of the text to be converted.
- a morphological and syntactic parser may also be utilised in order to derive a phonetic representation for certain pronunciations that are conditioned by morphosyntactic contexts.
- such a parser may be used to resolve the ambiguity between the words "read” (present tense) and "read” (past tense) by way of analysis of the context in which the words are used.
- Idiosyncratic pronunciations for example proper names, are stored in a dictionary.
- the linguistic analysis process 54 applies the grammar rules to the input text, and also checks the dictionary in order to identify any exceptions and to derive their phonetic representation.
- the linguistic analysis process 54 also provides prosodic information 60 regarding the input text portion 56.
- Prosodic information relates to duration, intonation and rhythm of a language, and the linguistic analysis process 54 seeks to provide prosodic information 60 regarding such aspects of the input text 56.
- determining prosody from raw text is extremely difficult because indications of prosodic features are not always marked on text.
- prosody is linked with the fundamental frequency contour at different levels of the hierarchy in speech, i.e. words, phrases and boundaries.
- Fundamental frequency is the frequency of vibration of the vocal cords, usually represented as a number of cycles of variation in air pressure per second.
- fundamental frequency is the acoustic correlate of pitch. An increase in pitch corresponds to an increase in the fundamental frequency.
- Prosody is typically manifested in terms of pitch accents on words, phrase accents on phrases and boundary tones at the end of major breaks in the prosodic structure. Each of these prosodic "events" or “boundaries” is associated with some change in the movement of the fundamental frequency. Prosody markup in TTS conversion involves the marking up of such events or boundaries on the text, which are later mapped onto acoustic parameters related to pause durations and change in the fundamental frequency contours. A significant problem in predicting prosodic events or boundaries lies in the fact that many of them are governed by extra-linguistic factors such as a speaker's attitude or emotive state.
- the phonetic information 58 and prosodic information 60 derived by the linguistic analysis process 54 is forwarded to speech generation module 62.
- the term prosody refers to those features of speech that deal with pitch (fundamental frequency), emphasis (amplitude), and length (duration).
- Prosodic features have specific functions in speech communication. The most common function is to group words together in order to facilitate the interpretation of the meaning of a sentence. For example, the different groupings of the words in the two sentences below cause a change in the meaning of the sentence:
- John is identifying James as a liar.
- it is James who is identifying John as a liar.
- a difference between an ordinary statement and a question can only be understood by using different tones at the end of the sentences:
- Text may be labelled in order to identify its prosodic structure.
- An example of such prosodic labelling is the Tones and Break Indices (ToBI) annotation protocol for prosody, based on Pierrehumbert's theory of English intonation; see Pierrehumbert, J.B. (1980) The Phonology and Phonetics of English Intonation, PhD Thesis, The Massachusetts Institute of Technology. Distributed by the Indiana University Linguistics Club.
- ToBI Tones and Break Indices
- Pierrehumbert's intonation description is used as a standard to label prosodic boundaries at different levels in English, and in other languages also.
- An example of the ToBI nomenclature is laid out below:
- Pitch accents are local maxima or minima in the fundamental frequency contour associated with intonationally prominent words in a speech utterance. There are six pitch accents marked on a word: a) L*: Low tone (f0 valley) aligned with the stressed syllable b) H*: High tone (f0 peak) aligned with the stressed syllable c) LH*: Low tone followed by a high tone aligned with the stressed syllable d) L*H: Low tone aligned with the stressed syllable followed by a high tone e) HL*: High tone followed by a low tone aligned with the stressed syllable f) H*L: High tone aligned with the stressed syllable followed by a low tone.
- Phrase accents: A simple rise or fall in the fundamental frequency contour over a minor prosodic phrase. There are two phrase accents: a) L-: a valley extending from the preceding pitch accent to the end of the prosodic phrase. b) H-: a plateau extending from the preceding pitch accent to the end of the prosodic phrase.
- Boundary tones: A rise or fall in the fundamental frequency contour at the end of a major prosodic phrase. The value of a boundary tone is higher than the corresponding phrase accent. There are two boundary tones: a) L%: a valley extending from the preceding pitch accent to the end of the prosodic phrase. b) H%: a rise at the end of a major phrase.
- Major break / Intonational phrase end: These consist of one or more intermediate/minor phrases plus a boundary tone. Thus, the end of a major phrase always corresponds with the end of a minor phrase, but not vice versa. These are marked by L- or H-, followed by either L% or H%. This gives a total of four possible major break markers: L-L%, L-H%, H-L%, H-H%.
- prosodic phrases such as intonational phrases and/or intermediate phrases.
- the simplest and typically the most common way of doing this is to insert prosodic boundaries corresponding to all punctuation marks.
- Such a method is arranged to predict major breaks at the following punctuation ["?", "!", ".", ":", ";"], and to predict minor breaks at [",", "(", ")", "-"].
- although the error rate for this method is low, as there are always prosodic breaks at locations of punctuation, it tends to predict too few boundaries and leaves out many prosodic boundaries that might occur at places not marked by punctuation.
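- For illustration only, such a punctuation-driven scheme might be sketched as below; the <MAJOR> and <MINOR> marker strings are hypothetical, chosen purely for readability.

```python
MAJOR_PUNCT = {"?", "!", ".", ":", ";"}
MINOR_PUNCT = {",", "(", ")", "-"}

def punctuation_breaks(tokens):
    """Insert a break marker after every punctuation token.  This is the
    simple baseline described above: it never places a boundary anywhere
    that is not marked by punctuation."""
    out = []
    for tok in tokens:
        out.append(tok)
        if tok in MAJOR_PUNCT:
            out.append("<MAJOR>")
        elif tok in MINOR_PUNCT:
            out.append("<MINOR>")
    return out
```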
- Another method for marking prosodic breaks for TTS conversion is to mark minor breaks at punctuation and between a content word and a function word, and major breaks at the end of the sentence.
- this causes the opposite problems to the foregoing method dependent on merely inserting breaks at punctuation.
- Always inserting a break between a content word and a function word causes too many breaks.
- this method does not take into account the fact that certain function words, like particles, may be accented, or that certain content words like phrasal verbs are de-accented.
- the method does not take into account clitics, that is phonologically weakened (reduced, de-accented) words that phonologically form a part of the adjacent content word.
- Another method involves the manual labelling of data sets by experienced human labellers, and then training prosodic models on such data sets.
- a typical method involves training prosodic models on a simplified ToBI annotated data set.
- Embodiments of the present invention seek to address, and preferably overcome, at least one of the drawbacks associated with the prior art, by providing an automated prosody boundary markup mechanism which seeks to constrain prosody overmarking resulting from syntactic analysis of a text portion.
- a text portion 56 is input to the markup mechanism 78 by way of a suitable text input mechanism, for example an interface to a text source database comprising text files, or by way of text typed in via a keyboard.
- the text portion should be unformatted, that is to say it should not include special text such as underlining, emboldening, italicisation and different sized fonts, for example, and should merely comprise plain text such as ASCII, or as supported by Unicode tables, for example. However, more complex text sources, such as (Microsoft) WORD documents, may be used provided they are arranged in plain text form. Additionally, the text portion 56 should also be, or have been, normalised by a text normalisation process such as described above with reference to Figure 3. The text portion may comprise a single sentence, or multiple sentences of a corpus of text.
- the normalised text portion is input to a Part Of Speech (POS) tagger, where the text portion is tagged with POS tags.
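- The embodiment uses a Brill tagger for this step; as a stand-in sketch only, NLTK's default tagger (an averaged perceptron tagger that also emits Penn TreeBank tags) illustrates what the tagging stage produces, assuming the relevant NLTK data packages have been downloaded.

```python
import nltk

# One-off downloads of the tokeniser and tagger models are assumed, e.g.
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

sentence = "A report into the Ladbroke Grove train crash has blamed Railtrack."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)   # e.g. [('A', 'DT'), ('report', 'NN'), ...]
print(tagged)
```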
- the tagged text portion is input to a parser 82 which performs a syntactic analysis on the tagged text in order to obtain "chunks" or groups of words that form a syntactic phrase.
- the chunks correspond to the way words are grouped together in spoken utterances.
- although syntactic factors such as parts of speech do influence the grouping of words into prosodic phrases, there is not always a one-to-one correspondence between a syntactic phrase and a prosodic phrase.
- a prosodic boundary marked at the end of each syntactic phrase would over-predict the number of prosodic phrases.
- the number of phrase boundaries predicted on the basis of syntactic phrases generated by the chunking parser 82 is greater than the actual number of prosodic phrase boundaries for a given text portion.
- a markup mechanism in accordance with the present invention applies one or more constraints to the output of chunking parser 82 in order to reduce the over- prediction of prosodic phrases.
- a constraints module 84 provides information regarding the prosodic quality of the text portion that influences the occurrence of a potential prosodic boundary, and can therefore be used to modify the output of parser 82.
- the constraints module 84 receives the text portion tagged up with POS tags, and determines which words are accented or de-accented on the basis of their associated POS tag. A word which is deaccented is one which is not accented.
- the text portion marked up with prosodic boundaries based on the syntactic analysis performed in parser 82 is then forwarded to a prosodic boundary elimination module 86, together with the identification of deaccented words from constraints module 84.
- Prosodic boundary elimination module 86 is configured to remove or eliminate prosodic boundaries assigned by the chunking parser 82 which follow deaccented words as identified by constraints module 84. A prosodic boundary following a deaccented word is considered illegal. Other constraints influencing the potential prosodic boundaries of the text portion may also be determined and applied in respective modules 84 and 86. Having eliminated prosodic boundaries identified as being illegal in module 86, the remaining prosodic boundaries or breaks are classified as major or minor breaks, 88.
- the ToBI marked up text portion is output, 92, to a speech synthesiser unit for producing synthesised speech.
- Synthesised speech derived from a text portion marked up by a markup mechanism in accordance with an embodiment of the invention as illustrated in Figure 9 better simulates natural speech, and improves the intelligibility of the synthesised speech.
- a TTS converter system may be used for a number of applications.
- a particularly useful application is to provide speech output to users who are in environments or situations where reading text is inappropriate or not possible, or where a user may wish to listen to speech whilst performing some other task.
- Although a suitable TTS conversion system may be implemented on a user device, such as a personal computer, personal digital assistant, mobile phone, hand-held or laptop computer, the processing overhead for TTS conversion is generally high, and therefore a particularly suitable application for a TTS converter system is one in which it is connected to user devices by way of a communications network.
- the use of a TTS converter within user devices themselves is not precluded from falling within the scope of the present invention.
- TTS converter system 90 is configured to operate as a server for user devices wishing to receive synthesised speech output.
- the TTS conversion system 90 is connected to a text source 92 including databases of various types of text material, such as e-mail, news reports, sports reports and children's stories.
- Each text database may be coupled to the TTS converter system 90 by way of a suitable server.
- An e-mail database may be connected to TTS converter system 90 by way of a mail server 92(1) which forwards e-mail text to the TTS converter system.
- Suitable servers such as a news server 92(2) and a story server 92(n) are also connected to the TTS converter system 90.
- the output of the TTS converter system is forwarded to the communications network 96 via a network interface 94.
- the communications network may be any suitable communications network, or combination of suitable communications networks, for example Internet backbone services, the Public Switched Telephone Network (PSTN) or cellular radio telephone networks.
- PSTN Public Switched Telephone Network
- Various user devices may be connected to the communications network 96, for example a personal computer 98, a regular landline telephone 100 or a mobile telephone 102. Other sorts of user devices may also be connected to the communications network 96.
- the user devices 98, 100, 102 are connected to the TTS converter system 90 via communications network 96 and network interface 94.
- network interface 94 is configured to receive requests from user devices 98, 100 and 102 for speech corresponding to a particular text source 92.
- a user of mobile telephone 102 may request, via network interface 94, their e-mails.
- network interface 94 accesses mail server 92(1) to cause the requested e-mail(s) to be forwarded to the TTS converter system 90.
- the e-mails are converted into speech and forwarded to network interface 94 where they are communicated back to the user via mobile telephone 102.
- the network interface may be connected to the text source 92 by way of the TTS converter system 90 which controls access to the various text source servers, retrieves requested text sources and converts them into speech for output to communications network 96 via network interface 94.
- other configurations and arrangements may be utilised and embodiments of the invention are not limited to the arrangement described with reference to Figure 4.
- a text source 92 supplies a portion of text to tokenise module 112, either directly or via editing work station 110.
- the text portion should be unformatted, and preferably well-structured.
- a human operator may edit a text portion from text source 92 in order to ensure that it is well formed. For example, proper capitalisation may be inserted, and the text portion edited to ensure that it conforms with grammatical and punctuation rules.
- Special formatting such as underlining, emboldening, italicisation, etc. may also be removed at this stage. Optionally, such formatting may be automatically removed if the appropriate control characters are recognised by the system.
- the tokenised text is input to POS tagger 114, which in the described example is a Brill Tagger and therefore requires the tokenised text prepared by tokenise module 112.
- POS Brill Tagger 114 assigns tags to each word in the tokenised text portion in accordance with a Penn TreeBank POS tag set stored in database 136.
- the Penn TreeBank POS set of tags will be described in detail hereinafter, but is a well known set of tags, an example of which is available from the url "http://www.ccl.umist.ac.uk/teaching/material/1019/Lect6/tsld006.htm", accessed on 18 July 2001.
- Parser 116 is connected to a memory module 138 in which parser 116 can store parse trees and other parsing and syntactic information for use in the parsing operation.
- Memory module 138 may be a dedicated unit, or a logical part of a memory resource shared by other parts of the TTS converter system.
- Parsed text is forwarded to a prosodic break insertion module 118, which inserts prosodic boundary marks into the parsed text in accordance with the syntactic analysis carried out by parser 116.
- Prosodic break insertion module 118 also receives punctuation 120 from tokenise module 112. The punctuation information is used by prosodic break insertion module 118 to assign further prosodic boundaries to the text portion.
- the prosodic boundary markup text configured in module 118 is forwarded to constraints module 122.
- POS tagger 114 is connected 124 to the constraints module 122 to provide information regarding POS tagged text.
- Constraints module 122 uses the POS tagged text in order to apply constraints to the prosodic boundary markup text output from module 118, in order to reduce overmarking and to delete "illegal" boundary marks.
- the prosodic boundary constrained text is then output from constraints module 122 to TTS synthesiser unit 126, wherein synthesised speech is generated from the constrained text.
- An audio output mechanism 128 receives the synthesised speech from the TTS synthesiser 126, and outputs the speech by way of a speaker 130, mu- or A-law 132 audio output for a PSTN, or a digital MPEG audio output 134, for example, but any suitable encoder may be used.
- a prosodic boundary markup mechanism in accordance with an embodiment of the present invention may be implemented as a computer program or a computer program element. The operation of such a computer program or computer program element will now be described with reference to the flow diagram of Figure 6.
- Plain text 150, i.e. unformatted text, which in this example is a news report, is tokenised at step 152 in order to ensure spaces are inserted between words and punctuation marks.
- Tokenised text is then tagged with parts of speech at step 154 by way of a Brill POS tagger.
- a Brill POS tagger is a computer program written by Eric Brill and available from the Massachusetts Institute of Technology (M.I.T.) for use without fee or royalty.
- the Brill POS tagger applies POS tags using the notation of the Penn TreeBank tag set derived by Pierrehumbert.
- the POS tagged text is then post-processed at step 156, which will be described in more detail later with reference to Figures 7 and 8.
- At step 158 the prosodic boundaries are placed in the text based on the post-processing at step 156, and prosodic boundaries corresponding to punctuation are inserted at step 160.
- constraints are applied to the prosodic boundary marked up text in order to reduce overmarking, and the constrained marked up text is then output to a TTS speech synthesiser 164.
- Figure 7 illustrates the operation of the parser, referred to herein as a "chunking" parser since the parser identifies syntactic fragments of a sentence based on a sentence syntax, the fragments being referred to as chunks.
- the Applicant has recognised that there is some correspondence between the chunks and the sites of prosodic boundaries.
- the chunk boundaries are identified by using a modified chart parser and a phrase structure grammar.
- Chart parsing is a well-known and efficient parsing technique. It uses a particular kind of data structure called a chart, which contains a number of so-called "edges". Parsing is in essence a search problem, and chart parsing is efficient in performing the necessary search since the edges contain information about all the partial solutions previously found for a particular parse.
- the principal advantage of this technique is that it is not necessary, for example, to attempt to construct an entirely new parse tree in order to investigate every possible parse. Thus, repeatedly encountering the same dead-ends, a problem which arises in other approaches, is avoided.
- the parser used in the present embodiment is a modification of a chart parser, known as Gazdar & Mellish's bottom-up chart parser, downloadable from the url "http://www.coli.uni-sb.de/~brawer/prolog/botupchart", modified to:
- the parser is loaded with a phrase-structure grammar (PSG) capable of identifying chunk boundaries, such as may be implemented by reference to suitable text books, and the parser memory is initialised by clearing it of any information relating to a previous parsing activity.
- PSG phrase-structure grammar
- the tagged sentence 170 is tagged by a Brill Tagger using the Penn TreeBank set of tags. Each word just receives a tag indicating the word class (part of speech) played by the word. In the current text example, each word receives the following POS tags:
- A/DT report/NN into/IN the/DT Ladbroke/NNP Grove/NNP train/NN crash/NN has/VBZ blamed/VBN a/DT "/" lamentable/JJ failure/NN "/" by/IN Railtrack/NNP to/TO respond/VB to/TO safety/NN warnings/NNS before/IN the/DT accident/NN ./.
- the notation used in the foregoing example comprises the Penn TreeBank set of tags paired with each word or punctuation mark by way of a forward slash following the relevant word or punctuation mark, and then the POS tag.
- each word-tagged pair is read into the parser until a full stop is encountered.
- Each word-tag pair is stored as a term in the program's run-time database at step 174. Information about the location and type of punctuation marks is also retained for later use although the punctuation itself is discarded for the purpose of parsing.
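- A small sketch of this read-in step is given below, assuming the slash-separated word/TAG notation shown above; the data structures are illustrative only.

```python
PUNCTUATION = {",", ".", "?", "!", ":", ";", '"', "(", ")", "-"}

def read_tagged_sentence(tagged_text):
    """Split a 'word/TAG' string into (word, tag) terms, recording the
    position and type of punctuation marks separately; the punctuation
    itself is not passed on to the parser."""
    terms, punctuation = [], []
    for token in tagged_text.split():
        word, _, tag = token.rpartition("/")
        if word in PUNCTUATION:
            punctuation.append((len(terms), word))  # index of the next word
        else:
            terms.append((word, tag))
    return terms, punctuation
```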
- the word-tag terms are used in the parsing itself, and the subsequent evaluation of prosodic boundary constraints.
- the chart parser routine is called, and the word-tag terms are used to initialise it.
- the parsing proceeds by processing the sentence in the direction of reading, i.e. for English type languages processing would proceed left to right.
- edges are gradually added, until a parse is found or there are no further alternatives left to explore.
- the parser regards the sentence as having numbered vertices in the gaps between each word in the sentence, as well as before the first word, and after the last word of the sentence.
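- For illustration, this vertex numbering can be pictured with the trivial sketch below; the real chart parser of course carries much more state.

```python
def word_spans(words):
    """Word i occupies the span between vertex i and vertex i + 1; vertex 0
    precedes the first word and vertex len(words) follows the last word."""
    return [(i, word, i + 1) for i, word in enumerate(words)]

print(word_spans(["the", "train", "crash"]))
# [(0, 'the', 1), (1, 'train', 2), (2, 'crash', 3)]
```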
- an active or inactive edge may be added to the chart for the text portion undergoing parsing.
- An inactive edge represents the unification of a text chunk spanning two vertices, with a grammar rule. The unification is complete, i.e. there is nothing left over. This may be described by way of the following example.
- a grammar rule: rule(s, [np, vp]) may be interpreted as: 's' rewrites as an 'np' followed by a 'vp'. (Other grammar rules may expand these components in full text, i.e. np - noun phrase, vp - verb phrase. A noun phrase and a verb phrase, amongst other grammatical constructs, correspond to "chunks".)
- In edge 152, an active edge, the entire [np, vp] sequence of constituents has not been found; there remains a vp 'left over' to be found. In inactive edge 150, however, the np has been found completely, hence there is nothing 'left over'.
- Active edges requiring components just identified in the inactive edge may also be added to the chart at this point. Active edges are added during the process of attempting to unify a chunk of a sentence with a grammar rule. When the leftmost constituent of a grammar rule is found in the chunk, an active edge is added which records what has been found so far, and the vertices it spans, along with what remains to be found.
- An edge has the syntax: /* edge(ID, FromVertex, ToVertex, SyntacticCategory, Found, Still_to_Find) */
- Edge 152 is an active edge. It says that (part of) a sentence, s, has been found between vertex 11 and vertex 16, in edge 150. A vp remains to be found, from vertex 17 onwards in order to complete the sentence.
- a second example is:
- Edge 150 is an inactive edge. It says that a noun phrase (np) has been found between vertices 11 and 16. Subcomponents of the phrase are found in edges 149, 147, 145, 143, 138. None remains to be found in order to complete the noun phrase (np) - this is what qualifies 150 as an inactive edge.
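- The edge records described above might be represented as in the sketch below; the field names simply mirror the edge syntax given earlier, and the two example edges are re-expressed in this hypothetical structure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Edge:
    id: int
    from_vertex: int
    to_vertex: int
    category: str                                   # e.g. 's', 'np', 'vp'
    found: List = field(default_factory=list)       # constituents already found
    to_find: List = field(default_factory=list)     # constituents still needed

    @property
    def inactive(self) -> bool:
        # An edge with nothing left to find is inactive (complete).
        return not self.to_find

# Edge 150: an np found completely between vertices 11 and 16 -> inactive.
edge_150 = Edge(150, 11, 16, "np", found=[149, 147, 145, 143, 138])
# Edge 152: part of an s found; a vp remains to be found -> active.
edge_152 = Edge(152, 11, 16, "s", found=["np"], to_find=["vp"])
```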
- When no further edges can be added to the chart, this part of the parser terminates.
- the parser then explores the entire set of inactive edges at step 178, seeking paths through the set of edges which span the entire sentence from left to right.
- the parser operates in a top-down fashion, and considers all inactive edges in which the currently sought category (e.g. s, np, vp, etc.) is found. If a sequence of edges can be found which spans all vertices, from left to right, with no gaps or overlaps, then there is a complete parse.
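- A much-simplified sketch of that final search is given below, reusing the Edge structure from the previous sketch; the real parser works top-down through the grammar categories, whereas this version only illustrates the "no gaps, no overlaps" requirement.

```python
def spanning_sequence(inactive_edges, start, end):
    """Return a list of inactive edges covering every vertex from `start`
    to `end` with no gaps or overlaps, or None if no such sequence exists."""
    if start == end:
        return []
    for edge in inactive_edges:
        if edge.from_vertex == start and start < edge.to_vertex <= end:
            rest = spanning_sequence(inactive_edges, edge.to_vertex, end)
            if rest is not None:
                return [edge] + rest
    return None
```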
- this structure is re-written using the symbol '
- the re-written text is then exported at step 184 to the main procedure illustrated in Figure 8 to import boundaries corresponding to punctuation.
- the main procedure for the described embodiment of the prosodic boundary markup mechanism is illustrated in Figure 8.
- POS text tagged by a Brill Tagger is input to the get parse routine 192, described above with reference to the flow diagram of Figure 7.
- the output of get parse routine 192 is the re-written text labelled (9) above.
- prosodic boundary marks corresponding to the punctuation in the original text portion are imported into the re-written text portion (9), and are marked with '
- embodiments of the present invention apply prosodic boundary constraints to text incorporating prosodic boundary marks based on a purely syntactic analysis, depending upon whether particular words are accented or deaccented. Other constraints are applied that take account of the relative positions of prosodic boundary marks already posited in the text portion or sentence. These constraints result in the removal of at least some of the over-predicted prosodic boundary marks.
- The application of the constraints is relatively straightforward, and is by way of passing the marked-up text (10) to a number of filters in sequence. Each filter implements one of the constraints. Each time a prosodic boundary is identified as illegal according to a constraint, it is removed.
- An overriding rule is that all boundaries at punctuation are legal and cannot be deleted.
- Major and minor boundaries are defined as follows: a) all boundaries at punctuation are major except commas in lists, and designated "
- the first constraint is applied at step 196, and comprises applying filter (1) to prosodic boundary markup text (10), as defined below:
- Filter (1): If the preceding word is deaccented, then the boundary is illegal. The determination of whether or not a word is deaccented may be based on identifying deaccented and accented words; any word that is not accented is deaccented. The criteria for identifying words as deaccented or accented are set out below, and a small code sketch of this filter follows the criteria. These criteria are applied to each word as part of Filter (1) to remove prosodic boundaries following deaccented words.
- Phrasal verb: any verb preceding RP.
- Second nominal of a nominal compound (unless both are NNP, in which case both are accented).
- JJS, NN, NNS, NNP, NNPS, RB, RBR, RBS, VB, VBD, VBG, VBN, VBZ, WRB.
- Nominal pronouns: everybody, anybody, everything, anything (something/body, nobody/nothing are not accented)
- Post-quantifiers: same as PDT except that they don't occur before the determiner (e.g. 'quite')
- Negative do: do, does, did + 'not'
- Wh-words: all wh-words except WPZ and WRB
- NPs and AdvPs can also function as preposed adverbials, but this is not relevant for the present embodiment because their heads would be accented in any case, since they are content words.
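- A minimal sketch of Filter (1) follows. The set of deaccented tags is illustrative and much cruder than the criteria listed above, and the <BOUNDARY> and <PUNCT> marker strings are hypothetical; boundaries at punctuation are never removed, in line with the overriding rule.

```python
# Illustrative only: a few Penn TreeBank tags treated here as deaccented words.
DEACCENTED_TAGS = {"DT", "IN", "TO", "MD", "CC", "POS", "PRP"}

def filter_deaccented(items):
    """Filter (1): a boundary immediately following a deaccented word is
    illegal.  `items` mixes (word, tag) pairs with marker strings;
    '<PUNCT>' boundaries (at punctuation) are exempt and always kept."""
    out, prev_tag = [], None
    for item in items:
        if item == "<BOUNDARY>" and prev_tag in DEACCENTED_TAGS:
            continue                    # drop the illegal boundary
        out.append(item)
        if isinstance(item, tuple):     # only words update the preceding tag
            prev_tag = item[1]
    return out
```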
- the constraints analysis then proceeds to step 198, where additional constraints are applied in a sequence laid out below:
- Filter (3): Two words of the same major class cannot surround a boundary.
- a major class is defined as one of noun, verb, adjective, adverb. The identification of the major class is by way of the POS-tag information, i.e. POS-tagged words.
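- Filter (3) might be sketched as below, reusing the item representation from the Filter (1) sketch; the tag-to-class groupings follow common Penn TreeBank conventions, but the exact sets used by the embodiment are an assumption.

```python
MAJOR_CLASS = {tag: cls
               for cls, tags in {"noun": ["NN", "NNS", "NNP", "NNPS"],
                                 "verb": ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ"],
                                 "adjective": ["JJ", "JJR", "JJS"],
                                 "adverb": ["RB", "RBR", "RBS"]}.items()
               for tag in tags}

def filter_same_class(items):
    """Filter (3): drop a '<BOUNDARY>' whose neighbouring words belong to
    the same major syntactic class (noun, verb, adjective or adverb)."""
    keep = []
    for i, item in enumerate(items):
        if (item == "<BOUNDARY>" and 0 < i < len(items) - 1
                and isinstance(items[i - 1], tuple) and isinstance(items[i + 1], tuple)
                and MAJOR_CLASS.get(items[i - 1][1]) is not None
                and MAJOR_CLASS.get(items[i - 1][1]) == MAJOR_CLASS.get(items[i + 1][1])):
            continue                    # boundary surrounded by the same class
        keep.append(item)
    return keep
```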
- Filter (5): Starting from the beginning of a sentence, or from a boundary, the next two boundaries are found. If the ratio of the lengths, measured as a number of words, from the start point to the first boundary and from the first boundary to the second boundary is greater than 2, then the first boundary is illegal. Preferably, this only applies where the number of words between the first and second boundaries is 3 or less.
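- The ratio constraint of Filter (5) might be sketched as below; boundary word positions are assumed as input, punctuation boundaries are assumed to have been excluded beforehand (they cannot be deleted), and the threshold of 2 and the three-word limit follow the description above.

```python
def filter_ratio(boundaries, threshold=2.0, short_span=3):
    """Filter (5): walking from the sentence start (word position 0) or a
    boundary, compare the stretch up to the next boundary with the stretch
    from that boundary to the one after it.  If the first stretch is more
    than `threshold` times the second, and the second is `short_span` words
    or fewer, the middle boundary is illegal.  `boundaries` holds the word
    positions of non-punctuation boundaries in ascending order."""
    points = [0] + list(boundaries)
    illegal = set()
    for start, first, second in zip(points, points[1:], points[2:]):
        length1, length2 = first - start, second - first
        if 0 < length2 <= short_span and length1 / length2 > threshold:
            illegal.add(first)
    return [b for b in boundaries if b not in illegal]
```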
- Text portion (5) is a structure in which reasonable prosodic boundaries have been identified.
- the text portion (5) is then marked up with ToBI prosodic annotation in order to achieve realistic, (or more realistic), prosodic speech quality in synthesised speech, by adjusting one or more of the following quantities: pitch accent, amplitude, and duration.
- Each of the prosodic boundaries identifies a point at which some treatment would be applied.
- the main routine illustrated in Figure 8 then returns to the process flow illustrated in Figure 6 in which the prosody markup (ToBI) text is input to a TTS synthesiser at step 164.
- ToBI prosody markup
- the parser is configured to return partial parse results, and therefore the inability to render a completely accurate parse is not catastrophic.
- the parser seeks a parse which spans an entire sentence. However, if no parse can be found which spans the entire sentence, the longest available partial parse is sought. Thus, it is still possible to recover useful, although incomplete, information about where prosodic boundaries may be posited. Thus, the invention exhibits a graceful degradation in its performance, rather than abrupt or catastrophic failure.
- the longest spanning edge is chosen, whilst other complete parses may be valid and represent another interpretation of the meaning of the sentence. Since semantic information is not available to the automated system to resolve the ambiguity, no decision can be made based on the meaning of one or other sentence independently of its parse. Therefore, the preferred embodiment of the invention always chooses the longest spanning edge, since this tends to minimise the number of prosodic boundaries, and thereby reduces the risk of an inappropriate boundary misleading a listener with regard to the semantic content.
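- Using the Edge structure sketched earlier, the preference for the longest spanning edge reduces to a simple selection; this is an illustrative sketch, not the parser's actual control flow.

```python
def longest_spanning_edge(inactive_edges, start_vertex):
    """Among inactive edges starting at `start_vertex`, choose the one that
    spans the most words; this tends to minimise the number of prosodic
    boundaries that will later be posited."""
    candidates = [e for e in inactive_edges if e.from_vertex == start_vertex]
    return max(candidates, key=lambda e: e.to_vertex - e.from_vertex, default=None)
```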
- a software-controlled programmable processing device such as a Digital Signal Processor, microprocessor, other processing devices, data processing apparatus or computer system
- a computer program or program element for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present invention.
- the computer program or program element may be embodied as source code and undergo compilation for implementation on a processing device, apparatus or system, or may be embodied as object code, for example.
- the term computer in its most general sense encompasses programmable devices such as referred to above, and data processing apparatus and computer systems.
- the computer program or program element is stored on a carrier medium in machine or device readable form, for example in solid-state memory or magnetic memory such as disc or tape and the processing device utilises the program, program element or a part thereof to configure it for operation.
- the computer program or program element may be supplied from a remote source embodied in a communications medium such as an electronic signal, including radio frequency carrier wave or optical carrier wave.
- carrier media are also envisaged as aspects of the present invention.
- a POS tagger other than the Brill Tagger may be used.
- a text portion input to a prosodic boundary markup mechanism may comprise one or more sentences.
- the text sources/servers in Figure 4 may provide any suitable textual content beyond that illustrated, other examples being horoscopes.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0119842A GB2378877B (en) | 2001-08-14 | 2001-08-14 | Prosodic boundary markup mechanism |
| GB0119842.3 | 2001-08-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2003017251A1 true WO2003017251A1 (en) | 2003-02-27 |
Family
ID=9920400
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/GB2002/003738 Ceased WO2003017251A1 (en) | 2001-08-14 | 2002-08-14 | Prosodic boundary markup mechanism |
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2378877B (en) |
| WO (1) | WO2003017251A1 (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103035241A (en) * | 2012-12-07 | 2013-04-10 | 中国科学院自动化研究所 | Model complementary Chinese rhythm interruption recognition system and method |
| CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
| US10127901B2 (en) | 2014-06-13 | 2018-11-13 | Microsoft Technology Licensing, Llc | Hyper-structure recurrent neural networks for text-to-speech |
| US10169305B2 (en) | 2017-05-30 | 2019-01-01 | Abbyy Development Llc | Marking comparison for similar documents |
| BE1025287B1 (en) * | 2017-10-09 | 2019-01-08 | Mind The Tea Sas | Method of transforming an electronic file into a digital audio file |
| CN110782880A (en) * | 2019-10-22 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Training method and device of rhythm generation model |
| CN110782918A (en) * | 2019-10-12 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Voice rhythm evaluation method and device based on artificial intelligence |
| CN112528014A (en) * | 2019-08-30 | 2021-03-19 | 成都启英泰伦科技有限公司 | Word segmentation, part of speech and rhythm prediction method and training model of language text |
| CN112786023A (en) * | 2020-12-23 | 2021-05-11 | 竹间智能科技(上海)有限公司 | Mark model construction method and voice broadcasting system |
| CN113392645A (en) * | 2021-06-22 | 2021-09-14 | 云知声智能科技股份有限公司 | Prosodic phrase boundary prediction method and device, electronic equipment and storage medium |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2119397C (en) * | 1993-03-19 | 2007-10-02 | Kim E.A. Silverman | Improved automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation |
-
2001
- 2001-08-14 GB GB0119842A patent/GB2378877B/en not_active Expired - Lifetime
-
2002
- 2002-08-14 WO PCT/GB2002/003738 patent/WO2003017251A1/en not_active Ceased
Non-Patent Citations (4)
| Title |
|---|
| ABEILLÉ ANNE, CLÉMENT LIONEL AND KINYON ALEXANDRA: "Building a Treebank for French", LREC2000, 2ND INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES & EVALUATION, 31 May 2000 (2000-05-31) - 2 June 2000 (2000-06-02), XP002217607 * |
| ATTERER MICHAELA: "Assigning Prosodic Structure for Speech Synthesis via Syntax-Prosody Mapping", 2000, DIVISION OF INFORMATICS, UNIVERSITY OF EDINBURGH, EDINBURGH, UK, XP002217610 * |
| WALKER MARK R. AND HUNT ANDREW: "Speech Synthesis Markup Language Specification for the Speech Interface Framework", HTTP://WWW.W3.ORG/TR/2001/WD-SPEECH-SYNTHESIS-20010103, 3 January 2001 (2001-01-03), XP002217609 * |
| WOLTERS MARIA: "Linguistic Annotation of Two Prosodic Databases", PROCEEDINGS OF THE WORKSHOP ON RECENT ADVANCES IN CORPUS ANNOTATION, ESSLI'98, 1998, Saarbrücken, DE, XP002217608 * |
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103035241A (en) * | 2012-12-07 | 2013-04-10 | 中国科学院自动化研究所 | Model complementary Chinese rhythm interruption recognition system and method |
| US10127901B2 (en) | 2014-06-13 | 2018-11-13 | Microsoft Technology Licensing, Llc | Hyper-structure recurrent neural networks for text-to-speech |
| CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
| CN105185374B (en) * | 2015-09-11 | 2017-03-29 | 百度在线网络技术(北京)有限公司 | Prosody hierarchy mask method and device |
| US10169305B2 (en) | 2017-05-30 | 2019-01-01 | Abbyy Development Llc | Marking comparison for similar documents |
| BE1025287B1 (en) * | 2017-10-09 | 2019-01-08 | Mind The Tea Sas | Method of transforming an electronic file into a digital audio file |
| CN112528014B (en) * | 2019-08-30 | 2023-04-18 | 成都启英泰伦科技有限公司 | Method and device for predicting word segmentation, part of speech and rhythm of language text |
| CN112528014A (en) * | 2019-08-30 | 2021-03-19 | 成都启英泰伦科技有限公司 | Word segmentation, part of speech and rhythm prediction method and training model of language text |
| CN110782918A (en) * | 2019-10-12 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Voice rhythm evaluation method and device based on artificial intelligence |
| CN110782918B (en) * | 2019-10-12 | 2024-02-20 | 腾讯科技(深圳)有限公司 | Speech prosody assessment method and device based on artificial intelligence |
| CN110782880A (en) * | 2019-10-22 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Training method and device of rhythm generation model |
| CN110782880B (en) * | 2019-10-22 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Training method and device for prosody generation model |
| CN112786023A (en) * | 2020-12-23 | 2021-05-11 | 竹间智能科技(上海)有限公司 | Mark model construction method and voice broadcasting system |
| CN113392645A (en) * | 2021-06-22 | 2021-09-14 | 云知声智能科技股份有限公司 | Prosodic phrase boundary prediction method and device, electronic equipment and storage medium |
| CN113392645B (en) * | 2021-06-22 | 2023-12-15 | 云知声智能科技股份有限公司 | Prosodic phrase boundary prediction method and device, electronic equipment and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| GB2378877B (en) | 2005-04-13 |
| GB0119842D0 (en) | 2001-10-10 |
| GB2378877A (en) | 2003-02-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6535849B1 (en) | Method and system for generating semi-literal transcripts for speech recognition systems | |
| Black et al. | Building synthetic voices | |
| US6952665B1 (en) | Translating apparatus and method, and recording medium used therewith | |
| Hirschberg | Pitch accent in context predicting intonational prominence from text | |
| Ostendorf et al. | The Boston University radio news corpus | |
| Klatt | The Klattalk text-to-speech conversion system | |
| US8594995B2 (en) | Multilingual asynchronous communications of speech messages recorded in digital media files | |
| CN108470024B (en) | Chinese prosodic structure prediction method fusing syntactic and semantic information | |
| US20050154580A1 (en) | Automated grammar generator (AGG) | |
| JP2000353161A (en) | Method and device for controlling style in generation of natural language | |
| Gibbon et al. | Representation and annotation of dialogue | |
| Heldner et al. | Exploring the prosody-syntax interface in conversations | |
| WO2003017251A1 (en) | Prosodic boundary markup mechanism | |
| Carlson et al. | Linguistic processing in the KTH multi-lingual text-to-speech system | |
| JP3706758B2 (en) | Natural language processing method, natural language processing recording medium, and speech synthesizer | |
| US20030216921A1 (en) | Method and system for limited domain text to speech (TTS) processing | |
| US6772116B2 (en) | Method of decoding telegraphic speech | |
| Veilleux | Computational models of the prosody/syntax mapping for spoken language systems | |
| Wang | Porting the galaxy system to Mandarin Chinese | |
| JP2005208483A (en) | Device and program for speech recognition, and method and device for language model generation | |
| JP2001117922A (en) | Translation apparatus, translation method, and recording medium | |
| JP2008257116A (en) | Speech synthesis system | |
| JP2001117583A (en) | Speech recognition device, speech recognition method, and recording medium | |
| Allen | Speech synthesis from text | |
| JP3638000B2 (en) | Audio output device, audio output method, and recording medium therefor |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VC VN YU ZA ZM Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG US Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
| 122 | Ep: pct application non-entry in european phase | ||
| NENP | Non-entry into the national phase |
Ref country code: JP |
|
| WWW | Wipo information: withdrawn in national office |
Country of ref document: JP |