
HK1060418B - Method and system for multi-modal entry of ideogrammatic languages - Google Patents


Info

Publication number
HK1060418B
Authority
HK
Hong Kong
Prior art keywords
ideogram
list
stroke information
candidate
stroke
Prior art date
Application number
HK04103357.7A
Other languages
Chinese (zh)
Other versions
HK1060418A1 (en)
Inventor
朱允诚
洪小文
Original Assignee
微软公司 (Microsoft Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/142,572 external-priority patent/US7174288B2/en
Application filed by 微软公司 (Microsoft Corporation)
Publication of HK1060418A1 publication Critical patent/HK1060418A1/en
Publication of HK1060418B publication Critical patent/HK1060418B/en

Description

Method and apparatus for multimodal input of ideographic languages
Technical Field
The present invention relates to data processing systems. More particularly, the present invention relates to the entry of written languages with ideograms (e.g., Chinese and Japanese) into computer systems.
Background
Entering non-phonetic or non-alphabetic languages having ideographic characters into a computer system is time consuming and cumbersome. (An "ideogram", also known as a "logogram", is used herein to mean a symbol that represents a word in a written language, as opposed to the use of phonemes or syllables to construct words from their component sounds.) One commonly used system is often referred to as an IME (input method editor), which is sold by Microsoft Corporation of Redmond, Washington. In such systems, phonetic symbols are provided to a computer using a standard keyboard. The computer includes a converter module that converts the phonetic symbols into the selected language. For example, it is common to create Japanese text in a computer system by entering phonetic characters using an English or Latin keyboard. The use of letters of the Latin alphabet to enter Japanese phonetic characters is referred to as "romaji". The computer system compares each romaji character to a stored dictionary and generates a sequence of "kana" characters. Kana are Japanese syllabic symbols representing the sounds of Japanese. The IME converter then converts the kana, through a complicated linguistic analysis, into "kanji", the formal written Japanese language. (The formal Japanese writing system actually includes a mixture of kanji and kana, where the kanji carry most of the content information without conveying information about pronunciation.)
However, in conventional text processing systems used in Japanese word processors (e.g., the IME system discussed above), it is often necessary to use a so-called candidate display and selection method to select or correct the appropriate kanji equivalent for a kana sequence. In particular, a number of kanji candidate words are displayed for a kana sequence so that the user can select the appropriate one. This display and selection method is necessary because Japanese contains many homonyms and has no definite word boundaries, which inevitably leads to kana-to-kanji conversion errors. By displaying the kanji candidate words, the user can view the possible candidates and select the appropriate kanji representation.
Similarly, text editing modules used in Chinese word processors or other Chinese processing systems also require IME conversion from phonetic symbols ("pinyin") to the written Chinese representation. The pinyin IME is the most popular phonetic Chinese IME and operates similarly to the Japanese kana IME discussed above. The character string information of spoken "pinyin" is generally converted into Chinese characters using a pinyin dictionary and a language model. If no tonal marks are used in the pinyin IME, more homonyms are produced than in the Japanese kana IME. Some homonym lists for a pinyin sequence are too long to fit on the entire screen of the visual display.
Recently, speech recognition has been used in these systems, naturally providing the phonetic information that was previously entered through a keyboard. However, the homonym problem discussed above remains. In addition, speech recognition errors may occur during conversion, which can require even more use of the candidate display and selection method to obtain accurate ideograms.
Accordingly, there is an ongoing need for a system that more efficiently and effectively obtains written symbols in ideographic languages, such as Chinese and Japanese.
Disclosure of Invention
A method for entering ideograms into a computer system includes receiving speech information relating to a desired ideogram to be input and creating a list of possible ideograms as a function of the received speech information. Stroke information is then received to obtain the desired ideogram from the candidate word list. The stroke information includes one or more strokes of the desired ideogram. This method of obtaining the desired ideogram is "multimodal" in that two different, essentially unrelated types of information (speech and strokes) are used to locate the desired ideogram or symbol.
This is particularly useful when one must correct an ideogram that was automatically selected by a text editing or word processing system in which the speech information is provided by a speech recognizer. Typically, the ideogram automatically selected by the system is the most probable ideogram in the candidate word list. Using the stroke information, ideograms are removed from the candidate word list when they do not contain the strokes of the desired ideogram or symbol as indicated by the user. By repeatedly entering strokes of the desired ideogram, the user reduces the candidate word list. In this way, the user need not enter all the strokes of the desired ideogram; entering only a few strokes is sufficient to identify it in the candidate word list.
If the user has not found the desired ideogram or symbol by the time the initial candidate word list is reduced to zero, then, in another aspect of the invention, additional ideograms or symbols may be added to the candidate word list as a function of the stroke information received so far. In this way, the user does not need to re-enter stroke information to find the desired ideogram, and the search transitions smoothly from ideograms or symbols based on phonetic information to ideograms or symbols based on stroke information alone.
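The fallback just described (refilling an exhausted candidate word list from a stroke-indexed dictionary using the strokes entered so far) might be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the dictionary structure, stroke encoding, and names are assumptions.

```python
# Hypothetical sketch: when stroke filtering empties the phonetically
# derived candidate word list, refill it with ideograms from a larger,
# stroke-indexed dictionary that contain every stroke entered so far.

def refill_candidates(entered_strokes, stroke_dictionary, limit=10):
    """Return new candidate ideograms as a function of the stroke information.

    entered_strokes:   set of stroke codes the user has entered so far.
    stroke_dictionary: maps ideogram -> set of strokes it contains,
                       covering the full character set rather than only
                       the homonyms of one utterance (assumed available).
    limit:             cap the list so it fits on the visual display.
    """
    matches = [ideogram for ideogram, strokes in stroke_dictionary.items()
               if entered_strokes <= strokes]   # must contain every entered stroke
    return matches[:limit]
```

With this in place, the transition from speech-based to stroke-based lookup needs no re-entry: the same accumulated stroke set drives both the filtering and the refill.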
According to one aspect of the present invention, there is provided a method for multimodal input of ideographic languages, comprising: receiving voice information about a desired ideogram to be input from input speech; establishing a candidate word list of possible ideograms as a function of the received voice information; receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, and wherein said receiving stroke information includes receiving stroke information from a handwriting input; using the stroke information to obtain the desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information; presenting the ideograms in the candidate word list to the user; and receiving input relating to an ideogram selected from the candidate word list as a function of the presented ideograms; wherein the sequence of steps of receiving stroke information associated with the desired ideogram, removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information, and presenting the ideograms in the candidate word list to the user is repeated until input associated with the selected ideogram is received; and further comprising adding at least one new candidate ideogram to the candidate word list if the number of candidates in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein the at least one new candidate ideogram is obtained as a function of the stroke information.
According to another aspect of the present invention, there is provided a method for multimodal input of ideographic languages, comprising: receiving voice information about a desired ideogram to be input from input speech; establishing a candidate word list of possible ideograms as a function of the received voice information; receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, and wherein said receiving stroke information includes receiving stroke information from a handwriting input; using the stroke information to obtain the desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information; presenting the ideograms in the candidate word list to the user; and receiving input relating to an ideogram selected from the candidate word list as a function of the presented ideograms; wherein the sequence of steps of receiving stroke information associated with the desired ideogram, removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information, and presenting the ideograms in the candidate word list to the user is repeated until input associated with the selected ideogram is received; and further comprising adding a plurality of new candidate ideograms to the candidate word list if the number of candidates in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein each of the new candidate ideograms is obtained as a function of the stroke information.
According to still another aspect of the present invention, there is provided an apparatus for multimodal input of ideographic languages, comprising: means for receiving voice information about a desired ideogram to be input from input speech; means for establishing a candidate word list of possible ideograms as a function of the received voice information; means for receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, and wherein said receiving stroke information includes receiving stroke information from a handwriting input; means for using the stroke information to obtain the desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information; means for presenting the ideograms in the candidate word list to the user; and means for receiving input relating to an ideogram selected from the candidate word list as a function of the presented ideograms; wherein the sequence of steps of receiving stroke information associated with the desired ideogram, removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information, and presenting the ideograms in the candidate word list to the user is repeated until input associated with the selected ideogram is received; and further comprising means for adding at least one new candidate ideogram to the candidate word list if the number of candidates in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein the at least one new candidate ideogram is obtained as a function of the stroke information.
According to yet another aspect of the present invention, there is provided an apparatus for multimodal input of ideographic languages, comprising: means for receiving voice information about a desired ideogram to be input from input speech; means for establishing a candidate word list of possible ideograms as a function of the received voice information; means for receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, and wherein said receiving stroke information includes receiving stroke information from a handwriting input; means for using the stroke information to obtain the desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information; means for presenting the ideograms in the candidate word list to the user; and means for receiving input relating to an ideogram selected from the candidate word list as a function of the presented ideograms; wherein the sequence of steps of receiving stroke information associated with the desired ideogram, removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information, and presenting the ideograms in the candidate word list to the user is repeated until input associated with the selected ideogram is received; and further comprising means for adding a plurality of new candidate ideograms to the candidate word list if the number of candidates in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein each of the new candidate ideograms is obtained as a function of the stroke information.
Drawings
FIG. 1 is a flow chart illustrating one aspect of the present invention.
Fig. 2 is a flow chart illustrating a method of operation in accordance with the present invention.
FIG. 3 is a block diagram of an exemplary environment for implementing the present invention.
FIG. 4 is a block diagram of a speech recognition system.
FIG. 5 is a block diagram of a handwriting recognition system.
FIG. 6 is a block diagram of a module for reducing and rendering a list of candidate words as a function of stroke information.
Fig. 7 is a flow chart illustrating a method of operation in accordance with an alternative embodiment of the present invention.
FIG. 8 is a block diagram of an exemplary processing system.
FIG. 9 is an exemplary list of candidate words.
Detailed Description
Referring to FIG. 1, one aspect of the present invention is a method 10 of entering ideograms into a computer system. The method 10 includes step 12, in which phonetic information for the ideogram is received, typically from the user through a speech recognition system, and step 14, in which a list of candidate ideograms that may correspond to the received phonetic information is created. FIG. 9 illustrates, at 17, an example of a list of candidate words presented to a user on a display. Typically, the ideogram with the highest priority is automatically selected and saved. However, if the automatic selection is in error, then in step 16 the user may provide "stroke" information for at least one stroke contained in the correct ideogram. In step 18, the computer system uses the stroke information to obtain the desired ideogram from the candidate word list.
Referring to FIG. 2, providing stroke information and obtaining the desired ideogram may include repeating steps 19-22. Step 19 comprises obtaining stroke information (i.e., one or more strokes contained in the desired ideogram) from the user. Using the stroke information obtained in step 19 (and any stroke information obtained in previous executions of step 19), the candidate word list is narrowed in step 20 to include only those ideograms having the stroke information obtained from the user. In step 21, the narrowed candidate word list is presented to the user. If the user identifies the desired ideogram in step 22, the selected ideogram is saved; otherwise, the user may provide additional stroke information in step 19 and the process repeats.
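The candidate-narrowing loop of steps 19-22 can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the stroke encoding, the `stroke_db` mapping, and the callback names are hypothetical.

```python
# Hypothetical sketch of steps 19-22: repeatedly present the candidate
# word list, collect one more stroke from the user, and keep only the
# candidates that contain every stroke entered so far.

def narrow_candidates(candidates, stroke_db, get_stroke, confirm):
    """Obtain the desired ideogram from a candidate word list via strokes.

    candidates: list of candidate ideograms, ordered by likelihood.
    stroke_db:  maps each ideogram to the set of strokes it contains.
    get_stroke: callable returning the next stroke entered (step 19).
    confirm:    callable(list) -> the ideogram the user selected, or
                None if the user instead enters another stroke (step 22).
    """
    entered = set()
    while candidates:
        choice = confirm(candidates)       # steps 21-22: present list, let user pick
        if choice is not None:
            return choice                  # desired ideogram identified
        entered.add(get_stroke())          # step 19: one more stroke
        # step 20: keep only candidates containing every entered stroke
        candidates = [c for c in candidates
                      if entered <= stroke_db.get(c, set())]
    return None                            # list reduced to zero
```

Note that returning `None` when the list empties is exactly the point at which the refill-from-stroke-dictionary fallback of FIG. 7 would take over.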
It should be noted that stroke information is generally independent of phonetic information, so the list of candidate words can be quickly revised (e.g., reduced) to obtain the desired ideogram. In a system where the letters of a desired word are entered to obtain the word, there is a strong association between the letters and the sounds they represent in the word. Thus, many, if not all, of the letters must be entered before the list of word candidates is reduced enough to identify the desired word. By contrast, because stroke information is generally not closely related to an ideogram's pronunciation, a desired ideogram can be quickly identified from a list of similarly pronounced candidate ideograms.
The method 10 described above may be performed in any text editing module, which may take many forms. For example, the text editing module can be the IME system described in the background section above, which receives phonetic information through speech and converts it into a written language (e.g., Japanese, Chinese, etc.). Alternatively, the text editing module may be a word processing application, or may form part of a dictation system that receives input speech from a user through a microphone and converts the input speech into text.
Before proceeding with a further detailed discussion of the present invention, it may be helpful to overview an operating environment. FIG. 3 illustrates an example of a suitable computing system environment 50 on which the invention may be implemented. The computing system environment 50 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computing environment 50 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 50.
The invention is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or portable devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Program modules generally include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices. Tasks performed by the programs and modules are described below and with reference to the figures. Those skilled in the art can implement the description and figures in processor-executable instructions, which can be written on any form of a computer-readable medium.
With reference to FIG. 3, an exemplary system for implementing the invention includes a general purpose computing device in the form of a computer 60. The components of computer 60 may include, but are not limited to, a processing component 70, a system memory 80, and a system bus 71 that couples various system components including the system memory to the processing component 70. The system bus 71 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
Computer 60 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 60 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 60.
Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 80 includes computer storage media in the form of volatile and/or nonvolatile memory such as Read Only Memory (ROM) 81 and Random Access Memory (RAM) 82. A basic input/output system 83 (BIOS), containing the basic routines that help to transfer information between elements within computer 60, such as during start-up, is typically stored in ROM 81. RAM 82 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing component 70. By way of example, and not limitation, FIG. 3 illustrates operating system 84, application programs 85, other program modules 86, and program data 87.
The computer 60 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 3 illustrates a hard disk drive 91 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 101 that reads from or writes to a removable, nonvolatile magnetic disk 102, and an optical disk drive 105 that reads from or writes to a removable, nonvolatile optical disk 106 (e.g., a CD-ROM or other optical media). Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 91 is typically connected to the system bus 71 through a non-removable memory interface such as interface 90, and magnetic disk drive 101 and optical disk drive 105 are typically connected to the system bus 71 by a removable memory interface, such as interface 100.
The various drives and their associated computer storage media discussed above and illustrated in FIG. 3 provide storage of computer readable instructions, data structures, program modules and other data for the computer 60. In FIG. 3, for example, hard disk drive 91 is illustrated as storing operating system 94, application programs 95, other program modules 96, and program data 97. Note that these components can either be the same as or different from operating system 84, application programs 85, other program modules 86, and program data 87. Operating system 94, application programs 95, other program modules 96, and program data 97 are given different reference numbers here to illustrate that, at a minimum, they are different copies.
A user may enter commands and information into the computer 60 through input devices such as a keyboard 112, a microphone 113, a writing tablet 114, and a pointing device 111, such as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 70 through a user input interface 110 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a Universal Serial Bus (USB). A monitor 141 or other type of display device is also connected to the system bus 71 via an interface, such as a video interface 140. In addition to the monitor, computers may also include other peripheral output devices such as speakers 147 and printer 146, which may be connected through an output peripheral interface 145.
The computer 60 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 130. The remote computer 130 may be a personal computer, a hand-held device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 60. The logical connections depicted in FIG. 3 include a Local Area Network (LAN)121 and a Wide Area Network (WAN)123, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.
When used in a LAN networking environment, the computer 60 is connected to the LAN 121 through a network interface or adapter 120. When used in a WAN networking environment, the computer 60 typically includes a modem 122 or other means for establishing communications over the WAN 123 (e.g., the Internet). The modem 122, which may be internal or external, may be connected to the system bus 71 via the user input interface 110, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 60, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 3 illustrates remote application programs 135 as residing on remote computer 130. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
The speech information obtained in step 12 is typically provided by a speech recognition system, an exemplary embodiment of which is shown at 160 in FIG. 4. The speech recognition system 160 generally receives input speech from a user and converts it to text; a speech recognition system used in this way is commonly called a "dictation system". While the speech recognition system 160 may be built as part of a word processing application or text editing module, it should be understood that the present invention also encompasses a dictation system that provides only a text file as output. In other words, one form of dictation system may not include the ability to edit the text file (other than correcting the ideograms as discussed above).
In the exemplary embodiment, speech recognition system 160 includes the microphone 92, an analog-to-digital (A/D) converter 164, a training module 165, a feature extraction module 166, a lexicon storage module 170, an acoustic model 172 with senone trees, a tree walk search engine 174, and a language model 175. It should be noted that the entire speech recognition system 160, or a portion of it, may be implemented in the environment illustrated in FIG. 3. For example, the computer 60 may be provided with the microphone 92 as an input device through an appropriate interface and the A/D converter 164. The training module 165 and the feature extraction module 166 may be hardware modules in the computer 60, or may be software modules stored in any of the information storage devices shown in FIG. 3 and accessible by the processing component 70 or another suitable processor. In addition, the lexicon storage module 170, the acoustic model 172, and the language model 175 are also preferably stored in any of the storage devices shown in FIG. 3. The tree walk search engine 174 is implemented in the processing component 70 (which may include one or more processors) or may be implemented by a dedicated speech recognition processor used by the computer 60.
In the illustrated embodiment, during speech recognition, a user provides speech (as input into the system 160 in the form of audible speech signals) to the microphone 92. Microphone 92 converts audible speech signals into analog electronic signals that are provided to a/D converter 164. A/D converter 164 converts the analog speech signal to a sequence of digital signals, which are provided to feature extraction module 166. In one embodiment, feature extraction module 166 is a conventional array processor that performs spectral analysis on the digital signal and calculates a quantitative value for each frequency band of a spectrum. In one illustrative embodiment, these signals are provided by A/D converter 164 to feature extraction module 166 at a sampling rate of approximately 16 kHz.
Feature extraction module 166 separates the digital signal received from A/D converter 164 into frames comprising a plurality of digital samples. The duration of each frame is approximately 10 milliseconds. The feature extraction module 166 then encodes each frame into a feature vector that reflects the spectral features of a plurality of frequency bands. In the case of discrete and semi-continuous hidden Markov modeling, the feature extraction module 166 also encodes the feature vectors into one or more code words using vector quantization techniques and a codebook derived from training data. In this way, feature extraction module 166 provides a feature vector (or code word) at its output for each spoken utterance, at a rate of approximately one feature vector (or code word) every 10 milliseconds.
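As a rough illustration of the framing arithmetic above (16 kHz sampling, approximately 10 ms frames, one feature vector per frame), consider the following minimal sketch. It ignores frame overlap and windowing, which real feature extractors typically apply; the names are illustrative.

```python
# Back-of-envelope sketch of the framing step: a 16 kHz sample stream
# cut into consecutive 10 ms frames, each frame later encoded as one
# feature vector (or code word).

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 10
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 160 samples per frame

def frame_signal(samples):
    """Split a list of digital samples into non-overlapping 10 ms frames."""
    return [samples[i:i + SAMPLES_PER_FRAME]
            for i in range(0, len(samples) - SAMPLES_PER_FRAME + 1,
                           SAMPLES_PER_FRAME)]
```

So one second of speech yields about 100 frames, and hence about 100 feature vectors, matching the stated rate of one every 10 milliseconds.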
The feature vectors (or code words) of the particular frame being analyzed are then used to compute an output probability distribution according to the hidden Markov model. These probability distributions are later used in performing a Viterbi or similar type of processing technique.
Upon receiving the code words from the feature extraction module 166, the tree walk search engine 174 accesses information stored in the acoustic model 172. The acoustic model 172 stores acoustic models (e.g., hidden Markov models) that represent the speech units to be detected by the speech recognition system 160. In one embodiment, the acoustic model 172 includes a senone tree associated with each Markov state in a hidden Markov model. In one illustrative embodiment, the hidden Markov models represent phonemes. Based on the senones in the acoustic model 172, the tree walk search engine 174 determines the most likely phonemes represented by the feature vectors (or code words) received from the feature extraction module 166, and hence represented by the utterance received from the user of the system.
The tree walk search engine 174 also accesses the lexicon stored in module 170. In searching the lexicon storage module 170, the tree walk search engine 174 uses the information obtained from its access to the acoustic model 172 to determine the symbol or ideogram that most likely represents the code words or feature vectors received from the feature extraction module 166. In addition, the search engine 174 accesses the language model 175, which is also used to identify the most likely symbol or ideogram represented by the input speech. The possible symbols or ideograms may be organized in a candidate word list, and the most likely symbol or ideogram from the candidate word list is provided as output text. The training module 165 and the keyboard 70 are used to train the speech recognition system 160.
Although the speech recognition system 160 is described herein as using HMM modeling and senone trees, it should be understood that the speech recognition system 160 can take many forms of hardware and software modules; all that is required is that it provide text as output, preferably by using a candidate word list.
The stroke information obtained in step 16 is typically provided by a handwriting recognition module or system, an exemplary embodiment of which is illustrated in FIG. 5 at 181. Handwriting recognition module 181 receives input from a user through tablet 114.
In general, handwriting recognition systems are well known. FIG. 5 illustrates an exemplary embodiment that may be adapted for use in the present invention, which is disclosed in U.S. Patent No. 5,729,629, assigned to the same assignee as the present invention. Briefly, handwriting recognition system 185 includes a handwriting recognition module 181 coupled to tablet 114; tablet 114 receives handwritten input symbols from a user and displays the reference symbols that handwriting recognition module 181 determines correspond to the handwritten symbols. The handwriting recognition module 181 is coupled to a storage component 189 that temporarily stores coordinate information representing the features of input strokes received from the tablet 114. Handwriting recognition module 181 includes a stroke analyzer 191 that retrieves the coordinate information from storage component 189 and translates the coordinate information for each written stroke into a feature code representing one of a predetermined number of feature models stored in storage component 189. For purposes of the present invention, the handwriting recognition module 181 need not recognize an entire ideogram or symbol, but need only recognize one or more individual strokes contained within the ideogram or symbol, the stroke information being used to separate ideograms or symbols having that stroke from those that do not.
The individual stroke feature evaluation is performed by a tag comparator 193, which compares the feature code of the input stroke with the feature codes of reference strokes stored in the storage component 189 and identifies one or more reference strokes having feature codes that most closely match the feature code of the input stroke. The reference stroke that most closely matches the handwritten input stroke, as determined by the tag comparator, is used to select the desired ideogram as a function of the stroke information at step 18 in FIG. 1, or to reduce the candidate word list at step 20 of FIG. 2.
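The tag comparator's matching step can be sketched as a nearest-match search over feature codes. The representation below (fixed-length code tuples compared by a Hamming-style distance) and the stroke names are illustrative assumptions; the patent's actual feature models are not specified here.

```python
def closest_reference_strokes(input_code, reference_codes, top_n=1):
    """Sketch of tag comparator 193: rank reference strokes by how closely
    their feature codes match the input stroke's feature code, here using
    a Hamming-style distance over equal-length code tuples."""
    def distance(a, b):
        # Count the positions at which the two feature codes differ.
        return sum(1 for x, y in zip(a, b) if x != y)
    ranked = sorted(reference_codes.items(),
                    key=lambda kv: distance(input_code, kv[1]))
    return [name for name, _ in ranked[:top_n]]
```

The best-matching reference stroke (or the few closest, when top_n > 1) then serves as the recognized stroke information used in steps 18 and 20.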
As discussed above, the handwriting recognition system 185 may be executed on the computer 50. The storage component 189 may include any of the storage devices discussed above (e.g., RAM 55, hard drive 57, removable magnetic disk 59, or an optical disk in optical disk drive 60), or may include any storage device accessed by the remote computer 130. The stroke analyzer 191 and the tag comparator 193 may be hard-wired circuits or modules, but typically are software programs or modules. The tablet 114 includes an input device (e.g., a conventional digitizing tablet and stylus, or an electronic scanner). Typically, the input device provides a series of X-Y coordinate points that define stroke segments corresponding to the continuous movement of the pen across the digitizing tablet, or to the shape of a symbol as detected by an electronic scanner. The tablet 114 sends these coordinate points to the storage component 189, where they are stored while the strokes are being recognized. It should also be noted that the form of the handwriting recognition system 185 may be altered by using other techniques to recognize the entered strokes without departing from aspects of the present invention. Another suitable system or module for obtaining stroke information and reducing the range of potential ideograms can be found in IME Pad by Microsoft Corporation.
Stroke information may be used in various ways to reduce the candidate word list in step 20 of FIG. 2. For example, referring to FIG. 6, a central or primary database 170 may be maintained in a computer-readable medium, having data representing all of the ideograms or symbols used in a language and, in particular, data representing the strokes in each ideogram or symbol. The candidate word list 171 is provided to a processing module 173, which uses the database 170 to identify each corresponding ideogram or symbol, or at least the strokes of each ideogram or symbol, in the candidate list 171. As stroke information is received from the user, the processing module 173 accesses the stroke information corresponding to the ideograms in the candidate word list 171 and excludes those ideograms or symbols in the candidate word list 171 that do not include the stroke identified by the user. The ideograms or symbols in the candidate word list 171 are typically presented to the user through a suitable presentation module 177 (e.g., as shown in FIG. 9); because the stroke information reduces the number of symbols, once the user sees the desired ideogram or symbol, he or she can quickly select it. In this way, the user typically need not enter all of the strokes of the desired symbol in order to identify it.
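The exclusion step performed by processing module 173 reduces to a simple subset test: keep an ideogram only if its stroke set contains every stroke entered so far. The sketch below uses hypothetical ideogram and stroke names; the real database 170 would hold per-ideogram stroke data for the entire language.

```python
def reduce_candidates(candidates, stroke_db, entered_strokes):
    """Filter a candidate word list as in step 20: keep only ideograms
    whose stroke set (from a database like 170) contains every stroke
    the user has entered so far."""
    return [ideo for ideo in candidates
            if set(entered_strokes) <= set(stroke_db.get(ideo, ()))]
```

Each additional stroke the user provides can only shrink (or leave unchanged) the list, which is why a few strokes usually suffice to isolate the desired symbol.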
In some cases, the stroke information provided by the user will not correspond to any ideogram or symbol in the candidate word list 171, with the result that no ideogram or symbol can be presented to the user for selection using the techniques described above. In another aspect of the invention, rather than requiring the user to discard the desired ideogram or symbol and start over by re-entering the stroke information, the processing module 173 retains all of the stroke information the user has provided and compares it to the stroke information contained in the database 170 to identify at least one, and typically a plurality of, ideograms or symbols having the strokes entered so far. In effect, the identified ideograms or symbols form a new candidate word list 171, which is further reduced using additional stroke information provided by the user until the desired ideogram or symbol is selected.
This aspect may be achieved using the method 240 shown in FIG. 7. The operation of method 240 is similar to that described with respect to FIG. 2, with similar steps numbered similarly. In this method, step 242 is added after step 20 to check whether the candidate word list is empty. If there is at least one item in the candidate word list, the method proceeds to step 21 as previously described. If the candidate word list is empty, the method 240 proceeds to step 244, in which the stroke information previously entered by the user is applied to the complete list of ideograms. Ideograms having matching stroke information form a new candidate word list 171. The list is then presented to the user at step 21, and the method proceeds as described with respect to FIG. 2. If desired, further iterations of the method, requesting additional stroke information from the user, may be used to further reduce the new candidate word list 171.
From the user's perspective, the transition from the candidate word list initially identified from the phonetic information to a candidate word list based solely on stroke information is smooth, although generally apparent, since the candidate word list 171 will appear to shrink as stroke information is entered and then suddenly expand when all candidates in the initial list have been excluded. Another benefit of this technique is that recognition errors (where the ideograms or symbols initially placed in the candidate word list are incorrect) can be easily corrected, because all of the ideograms or symbols in the database 170 can be accessed through stroke information if desired.
FIG. 8 is a block diagram illustrating an exemplary processing system or text editing system 220 for use in the Japanese IME system. The system 220 includes a speech recognition system (e.g., the speech recognition system 160 described above) that inputs speech information, and a system for inputting stroke information (e.g., the handwriting recognition system 185 discussed above).
Speech information provided by the speech recognition system 160 is stored in input memory 222 and transferred from input memory 222 to the conversion controller 224. If Roman phonetic symbols are provided by the speech recognition system 160, the symbols are first processed by a conversion processor 226, which converts the Roman phonetic symbols to kana characters using dictionary data stored in memory 228.
Then, under the control of the conversion processor 226, the kana data is divided into predetermined processing units (for example, word units or clause units). The divided data is then subjected to kana-kanji conversion, for which the conversion processor 226 uses dictionary data also stored in memory 228. If multiple kanji forms correspond to a sequence of kana symbols, the conversion processor 226 selects the most likely one as the result of the conversion, as determined by the language model stored in memory 230 (typically an N-gram language model, like the word trigram language model 175 in the exemplary speech recognition system of FIG. 4). If the user determines that the selected symbol is incorrect, stroke information may be entered using the handwriting recognition system 185 as discussed above to ultimately select the correct ideogram or symbol, with the IME controller 224 serving as the processing module 173, and the output memory 232 and output device 77 used to render the candidate word list 171.
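The kana-kanji selection step can be sketched as an enumeration of candidate kanji sequences scored by a language model. This is a drastically simplified stand-in: the dictionary, the example kana unit, and the `lm_score` callback are hypothetical, and a real converter would use a word trigram model with pruned search rather than exhaustive expansion.

```python
def kana_to_kanji(kana_units, conversion_dict, lm_score):
    """Toy kana-kanji conversion: for each kana unit, look up candidate
    kanji forms (dictionary data, as in memory 228) and return the full
    sequence that the language model scores highest."""
    best_seq, best_score = None, float('-inf')

    def expand(prefix, remaining):
        nonlocal best_seq, best_score
        if not remaining:
            score = lm_score(prefix)
            if score > best_score:
                best_seq, best_score = prefix, score
            return
        # Units with no dictionary entry pass through unchanged.
        for kanji in conversion_dict.get(remaining[0], [remaining[0]]):
            expand(prefix + [kanji], remaining[1:])

    expand([], list(kana_units))
    return best_seq
```

For an ambiguous unit such as はし (which could be 橋 "bridge" or 箸 "chopsticks"), the language model score decides which kanji form is emitted, exactly the kind of choice that the user may later need to correct via stroke input.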
It should also be noted that stroke information may be entered by devices other than the handwriting recognition system 185. For example, a keyboard with keys representing all of the strokes in the ideographic language may also be used. This type of system can be beneficial because, by pressing the key representing a particular stroke, it is no longer necessary to recognize the stroke from the user's handwriting. Input of this type is used in Chinese IME systems, where the stroke information consists of "five stroke" (wubi) symbols and the phonetic information includes "pinyin" symbols.
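Key-based stroke entry amounts to a direct lookup from keys to stroke categories, with no recognition step at all. The key assignments below are purely hypothetical, chosen only to illustrate the idea of a five-category stroke keyboard.

```python
# Hypothetical key-to-stroke mapping for a five-stroke-style keyboard.
STROKE_KEYS = {
    'h': 'horizontal',    # 一
    'v': 'vertical',      # 丨
    'p': 'left-falling',  # 丿
    'd': 'dot',           # 丶
    'z': 'turning',       # 乛
}

def strokes_from_keys(keys):
    """Translate key presses directly into stroke information, bypassing
    handwriting recognition; unmapped keys are ignored."""
    return [STROKE_KEYS[k] for k in keys if k in STROKE_KEYS]
```

The resulting stroke list can feed the same candidate-reduction step as handwritten stroke input.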
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that: changes may be made in form and detail without departing from the spirit and scope of the invention.

Claims (12)

1. A method for multimodal input of ideographic languages, comprising:
receiving voice information about a desired ideogram to be input from an input voice;
establishing a candidate word list of possible ideograms as a function of the received voice information;
receiving stroke information associated with a desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, wherein said receiving stroke information includes receiving stroke information from a handwriting input;
using stroke information to obtain a desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information;
presenting the ideographs in the candidate word list to the user;
receiving input relating to an ideogram selected from the list of candidate words as a function of the rendered ideogram; wherein the sequence of steps of receiving stroke information associated with a desired ideogram, removing an ideogram that does not have a stroke corresponding to the stroke information from the list of candidate words, and presenting the ideograms in the list of candidate words to the user is repeated until input associated with the selected ideogram is received; and further comprising adding at least one new ideogram candidate word to the candidate word list if the number of candidate words in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein the at least one new ideogram candidate word is obtained as a function of the stroke information.
2. The method of claim 1, wherein adding at least one new ideographic candidate word to the candidate word list comprises: at least one new ideographic candidate word is added to the list of candidate words as a function of stroke information only.
3. A method for multimodal input of ideographic languages, comprising:
receiving voice information about a desired ideogram to be input from an input voice;
establishing a candidate word list of possible ideograms as a function of the received voice information;
receiving stroke information associated with a desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, wherein said receiving stroke information includes receiving stroke information from a handwriting input;
using stroke information to obtain a desired ideogram from a list of candidate words, wherein said using stroke information comprises removing ideograms from the list of candidate words that do not have a stroke corresponding to the stroke information;
presenting the ideographs in the candidate word list to the user;
receiving input relating to an ideogram selected from the list of candidate words as a function of the rendered ideogram; wherein the sequence of steps of receiving stroke information associated with a desired ideogram, removing an ideogram that does not have a stroke corresponding to the stroke information from the list of candidate words, and presenting the ideograms in the list of candidate words to the user is repeated until input associated with the selected ideogram is received; and further comprising adding a plurality of new ideogram candidate words to the candidate word list if the number of candidate words in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein each of the new ideogram candidate words is obtained as a function of the stroke information.
4. The method of claim 3, wherein adding the plurality of ideographic candidate words to the list of candidate words comprises: each ideogram candidate is added to the candidate list as a function of stroke information only.
5. The method of claim 3, wherein receiving the voice message comprises: an audible voice of the user is recognized.
6. The method of claim 3 or 5, wherein receiving stroke information comprises: individual strokes written by the user are identified.
7. A device for multimodal input of ideographic languages, comprising:
means for receiving voice information on a desired ideogram to be input from an input voice;
means for establishing a candidate word list of possible ideograms as a function of the received voice information;
means for receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, wherein said receiving stroke information includes receiving stroke information from the handwriting input;
means for using stroke information to obtain a desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information;
means for presenting the ideograms in the list of candidate words to the user;
means for receiving input relating to an ideogram selected from the list of candidate words as a function of the rendered ideogram; wherein the sequence of steps of receiving stroke information associated with a desired ideogram, removing an ideogram that does not have a stroke corresponding to the stroke information from the list of candidate words, and presenting the ideograms in the list of candidate words to the user is repeated until input associated with the selected ideogram is received; and further comprising means for adding at least one new ideogram candidate word to the candidate word list if the number of candidate words in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein the at least one new ideogram candidate word is obtained as a function of the stroke information.
8. The apparatus of claim 7, wherein the means for adding at least one new ideogram candidate to the candidate list comprises: means for adding at least one new ideographic candidate word to the list of candidate words as a function of stroke information only.
9. A device for multimodal input of ideographic languages, comprising:
means for receiving voice information on a desired ideogram to be input from an input voice;
means for establishing a candidate word list of possible ideograms as a function of the received voice information;
means for receiving stroke information associated with the desired ideogram, wherein the stroke information includes at least one stroke in the desired ideogram, wherein said receiving stroke information includes receiving stroke information from the handwriting input;
means for using stroke information to obtain a desired ideogram from the candidate word list, wherein said using stroke information comprises removing from the candidate word list ideograms that do not have a stroke corresponding to the stroke information;
means for presenting the ideograms in the list of candidate words to the user;
means for receiving input relating to an ideogram selected from the list of candidate words as a function of the rendered ideogram; wherein the sequence of steps of receiving stroke information associated with a desired ideogram, removing an ideogram that does not have a stroke corresponding to the stroke information from the list of candidate words, and presenting the ideograms in the list of candidate words to the user is repeated until input associated with the selected ideogram is received; and further comprising means for adding a plurality of new ideogram candidate words to the candidate word list if the number of candidate words in the candidate word list is reduced to zero by repeatedly performing the sequence of steps, wherein each of the new ideogram candidate words is obtained as a function of the stroke information.
10. The apparatus of claim 9, wherein the means for adding the plurality of ideographic candidate words to the list of candidate words comprises: means for adding each ideographic candidate word to the list of candidate words as a function of stroke information only.
11. The apparatus of claim 9, wherein the means for receiving voice information comprises: means for recognizing audible speech of a user.
12. The apparatus of claim 9, wherein the means for receiving stroke information comprises: means for recognizing individual strokes written by a user.
HK04103357.7A 2002-05-08 2004-05-13 Method and system for multi-modal entry of ideogrammatic languages HK1060418B (en)

Publications (2)

Publication Number Publication Date
HK1060418A1 HK1060418A1 (en) 2004-08-06
HK1060418B true HK1060418B (en) 2010-09-03
