
HK1050578B - Language input architecture for converting one text form to another text form with modeless entry - Google Patents

Language input architecture for converting one text form to another text form with modeless entry

Info

Publication number
HK1050578B
HK1050578B (application HK03102615.8A)
Authority
HK
Hong Kong
Prior art keywords
language
string
input
text
typing
Prior art date
Application number
HK03102615.8A
Other languages
Chinese (zh)
Other versions
HK1050578A1 (en)
Inventor
李凯夫
陈征
韩建
Original Assignee
Microsoft Corporation
Priority date
Filing date
Publication date
Priority claimed from US09/606,807 (US7165019B1)
Application filed by Microsoft Corporation
Publication of HK1050578A1
Publication of HK1050578B

Description

Language input architecture for converting one text form to another text form with modeless input
Technical Field
The invention relates to a language input method and system. More particularly, the present invention provides a language input method and system that is fault tolerant both to typographical errors that occur during text entry and to conversion errors that occur during conversion from one language form to another.
Background
Language-specific word processors have existed for many years. More sophisticated word processors provide advanced tools, such as spelling and grammar correction, to assist users in drafting documents. For example, many word processors can recognize misspelled words or grammatically incorrect sentence structures and, in some cases, automatically correct the recognized errors.
Generally, there are two reasons why errors are introduced into text. One reason is that the user does not know the correct spelling or sentence structure at all. Word processors can make suggestions to assist users in selecting the correct spelling or word. A second and more typical cause of error is that the user incorrectly enters a word or sentence into the computer, even if he/she knows the correct spelling or grammatical structure. Word processors are often quite useful in this context in recognizing improperly entered strings and correcting them into intended words or phrases.
Entry errors tend to be more prevalent in word processors designed for languages that do not employ Roman characters. For many languages, no language-specific keyboard comparable to the English QWERTY keyboard exists, because these languages have far more characters than can conveniently be arranged as keys on a keyboard. For example, many Asian languages contain thousands of characters. It is virtually impossible to build a keyboard with individual keys for so many different characters.
Rather than designing expensive language- and dialect-specific keyboards, language-specific word processing systems have been developed that allow users to enter phonetic text from a small-character-set keyboard (e.g., a QWERTY keyboard) and convert that phonetic text to language text. "Phonetic text" represents the sounds made when a given language is spoken, and "language text" represents the written characters as they actually appear in the text. For example, in Chinese, Pinyin is an example of phonetic text, and Hanzi is an example of language text. By converting phonetic text to language text, many different languages can be processed by a language-specific word processor using a conventional computer and a standard QWERTY keyboard.
Thus, word processors that rely on phonetic input are subject to two types of input error. One type is the common typographical error. The other arises when the word processing engine incorrectly converts the phonetic text into unintended language text, even though the phonetic text contains no typographical error. When both problems occur in the same phonetic input string, a series of errors may result. In some instances, typing-induced errors may not be easy to detect without lengthy study of the entire context of the phrase or sentence.
The invention described herein is primarily directed to the former class of input errors, made by the user when typing phonetic text, but it also provides tolerance to conversion errors made by the word processing engine. To better demonstrate the problems that accompany such typographical errors, consider a Chinese-based word processor that converts the phonetic text Pinyin to the language text Hanzi.
There are several reasons why entering phonetic text tends to produce more typographical errors. One reason is that average typing accuracy on an English keyboard is lower in China than in English-speaking countries. A second reason is that phonetic text is used less frequently: in early education, users are not trained in phonetic spelling to the degree that English-speaking users are taught to spell English words.
A third reason for the increased typographical errors in phonetic text input is that many people naturally speak a local dialect rather than the standard dialect on which the phonetic text is based; for them, the standard dialect is effectively a second language. In some dialects and accents, the spoken words may not match the corresponding proper phonetic text, making it more difficult for a user to type phonetic text. For example, many Chinese people speak a regional Chinese dialect as their first language and are taught Mandarin Chinese as a second language, Mandarin being the basis of Pinyin. In some Chinese dialects, there is no distinction between the pronunciations of "h" and "w" in certain contexts; in others, "ng" and "n" are pronounced the same; in still others, "r" is not clearly pronounced. As a result, Chinese users who speak Mandarin as a second language may be prone to typographical errors when attempting to input Pinyin.
Another possible reason for many typographical errors is that it is difficult to check for errors while inputting phonetic text. This is due in part to the fact that phonetic text is often a long string of characters that is hard to read. In contrast to English-based text input, where what is viewed is what was typed, input phonetic text is often not "what you see is what you get," but is instead converted by the word processor into language text. As a result, the user typically does not check the phonetic text for errors, but waits until the phonetic text has been converted to language text.
For this last reason, typographical errors can be particularly annoying in the case of Pinyin input. Pinyin strings are difficult to check and correct because there are no spaces between the characters; the Pinyin characters run together regardless of how many words they form. Furthermore, the conversion of Pinyin to Hanzi often does not occur immediately, but continues as more Pinyin text is entered to establish the correct interpretation. Thus, if the user enters a wrong Pinyin symbol, that single error may be absorbed by the conversion process and propagated downstream, causing several additional errors. Correction then takes longer, because by the time the system has committed to the Hanzi conversion and the user realizes an error has occurred, the user is forced to backtrack through many characters just to make one correction. In some systems, the original error cannot even be recovered.
Since errors are expected to occur frequently during phonetic input, a system that is tolerant of errors in phonetic input is needed. It would be desirable to have a system that returns the correct answer even when the phonetic string contains a few incorrect characters.
In addition to the input problem, language-specific word processors face another problem: switching modes between two languages in order to enter words from different languages into the same document. For example, it is common for a drafted Chinese document to include English words, such as technical terms (e.g., "Internet") or terms that are difficult to translate (e.g., abbreviations, symbols, surnames, company names, etc.). Conventional word processors require the user to switch from one language mode to another when entering such words. Thus, when a user wants to enter a word from another language, the user must stop thinking about the text being entered, switch the mode from one language to the other, enter the word, and then switch the mode back to the first language. This significantly reduces typing speed and forces the user to divide attention between the text entry task and the additional control task of changing language modes.
Thus, there is a need for a "modeless" system that does not require mode switching. To avoid mode switching, the system should be able to detect the language being entered and then dynamically convert the letter sequence to one language or the other on a word-by-word basis.
However, this is not as easy as it might appear, since many strings are valid in both contexts. For example, many valid English words are also valid Pinyin strings. Furthermore, Pinyin input admits even more confusion because there are no spaces between Chinese characters or between Chinese and English words.
As an example, when a user enters the Pinyin input text string "woshiyigezhongguoren", the system converts the string into the Chinese character sentence meaning "I am a Chinese person".
Sometimes, instead of entering "woshiyigezhongguoren", the user enters the following character string:
wosiyigezhongguoren (the error is "sh" mistyped as "s");
woshiyigezongguoren (the error is "zh" mistyped as "z");
woshiygezhongguoren (the error is a missing "i" after "y");
woshiyigezhonggouren (the error is "uo" transposed to "ou");
woshiyigezhongguiren (the error is "o" mistyped as "i").
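Confusion-pair errors of the kind listed above lend themselves to a simple candidate-generation step. The following is a minimal sketch, not the patent's implementation: the confusion pairs and function name are invented for illustration, and a real system would weight candidates by trained probabilities rather than enumerate them.

```python
# Illustrative sketch: propose plausible "intended" Pinyin strings for a
# mistyped input by applying common confusion pairs in reverse.
CONFUSIONS = [("s", "sh"), ("z", "zh"), ("c", "ch")]  # (typed, intended)

def typing_candidates(typed):
    """Return candidate intended strings for a typed Pinyin string.
    The typed string itself is kept, since it may already be correct."""
    results = {typed}
    for wrong, right in CONFUSIONS:
        i = typed.find(wrong)
        while i != -1:
            # Skip positions where the user already typed the long form.
            if typed[i:i + len(right)] != right:
                results.add(typed[:i] + right + typed[i + len(wrong):])
            i = typed.find(wrong, i + 1)
    return sorted(results)

print(typing_candidates("wosiyigezhongguoren"))
```

Applied to the first error example above, this recovers "woshiyigezhongguoren" as a candidate alongside the raw input.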
The inventors have developed a word processing system and method that enables spelling correction for languages that are difficult to type, such as Chinese, and that also allows modeless input of multiple languages through automatic language recognition.
Disclosure of Invention
A language input architecture converts a phonetic text input string (e.g., Chinese Pinyin) to a language text output string (e.g., Chinese Hanzi) in a manner that minimizes typing errors and conversion errors during conversion from phonetic text to language text. The language input architecture may be implemented in a variety of fields including word processing programs, email programs, spreadsheets, browsers, and the like.
In one implementation, the language input architecture has a user interface for receiving input strings of characters, symbols, or other textual elements, which may include both phonetic and non-phonetic text, and one or more languages. The user interface allows a user to enter an input text string into a single edit line without switching modes between inputs in different text forms or different languages. In this manner, the language input architecture provides modeless input in multiple languages for the convenience of the user.
The language input architecture also has a search engine, one or more typing models, a language model, and one or more dictionaries (lexicons) for different languages. The search engine receives an input string from the user interface and distributes it to the one or more typing models. Each typing model is configured to generate a list of possible typing candidates that may replace the input string, based on the probability that each candidate string was mistyped as the input string. These typing candidates may be stored in a database.
The typing model is trained on data collected from many trainees entering training text. For example, in the case of Chinese, the trainees enter training text written in Pinyin. The errors observed during entry of the training text are used to calculate the probabilities of the typing candidates that may be used to correct typing errors. Where multiple typing models are used, each typing model may be trained for a different language.
In one implementation, the typing model may be trained by reading input text strings and mapping the syllables of each string to the letters actually typed. A frequency count is maintained of the number of times each typed sequence is mapped to each syllable, and the typing probability for each syllable is calculated from these frequency counts.
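This frequency-count training can be sketched as follows. The training pairs here are invented for illustration (a real system would use data from hundreds or thousands of trainees, as described below), and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training pairs: (intended syllable, letters actually typed),
# as might be collected from trainees entering known Pinyin text.
training_pairs = [
    ("shi", "shi"), ("shi", "shi"), ("shi", "si"),
    ("zhong", "zhong"), ("zhong", "zong"),
]

# counts[syllable][typed] = number of times `typed` was entered for `syllable`
counts = defaultdict(Counter)
for syllable, typed in training_pairs:
    counts[syllable][typed] += 1

def p_typed_given_syllable(typed, syllable):
    """Relative-frequency estimate of P(typed letters | intended syllable)."""
    total = sum(counts[syllable].values())
    return counts[syllable][typed] / total if total else 0.0

print(p_typed_given_syllable("si", "shi"))  # 1/3 on this toy data
```

With real data, these estimates become the probabilities P(Pinyin text | syllable) discussed later in the disclosure.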
The typing model returns a set of possible typing candidates that constitute possible typing errors in the input string. The typing candidate string is written in the same language or text form as the input string.
The search engine passes the typing candidates to a language model, which provides possible conversion strings for each typing candidate. More specifically, the language model is a trigram language model that determines, based on the two previous text elements, the probability that a possible conversion output string represents the language text intended by the candidate string. The conversion string is written in a different language or text form than the input string. For example, the input string may contain Chinese Pinyin or other phonetic text, while the output string may contain Chinese Hanzi or other language text.
Based on the probabilities derived in the typing model and the language model, the search engine selects the associated typing candidate string and the conversion candidate string that exhibits the highest probability. The search engine converts an input string (e.g., written in phonetic text) into an output string containing conversion candidate strings returned from the language model to replace the input text form (e.g., phonetic text) with another text form (e.g., language text). In this way, any input errors made by the user during the input of the phonetic text can be eliminated.
When multiple languages are used, the output string may have a combination of the conversion candidate string and a portion of the input string (without conversion). An example of the latter case is a chinese-based language input architecture that outputs both pinyin-to-chinese converted text and non-converted english text.
The user interface displays the output text string in the same edit line used for inputting the input string. In this manner, the conversion occurs automatically and continues as the user enters additional text.
Drawings
The same numbers are used throughout the drawings to reference like components or features.
FIG. 1 is a block diagram of a computer system having a language specific word processor implementing a language input architecture.
FIG. 2 is a block diagram of one embodiment of the language input architecture.
FIG. 3 is a graphical representation of a text string that is segmented or segmented into different syllable groups and candidate strings that may be used to replace those syllables, assuming the text string contains errors.
FIG. 4 is a flow diagram illustrating a general conversion operation by the language input architecture.
FIG. 5 is a block diagram of a training computer for training a probability-based model employed in the language input architecture.
FIG. 6 is a flow chart illustrating a training technique.
FIG. 7 is a block diagram of another embodiment of the language input architecture in which multiple typing models are employed.
FIG. 8 is a flow chart showing a multilingual conversion process.
Detailed Description
The present invention relates to a language input system and method that converts one language form (e.g., a phonetic version) to another language form (e.g., a written version). The system and method have fault tolerance for spelling and typing errors during text entry and conversion errors that occur during conversion from one language form to another. For ease of discussion, the invention is described herein in the general context of a word processing program that is executed by a general purpose computer. However, the present invention may be implemented in many different environments other than word processing and may be used on many different types of devices. Other situations may include email programs, spreadsheets, browsers, etc.
The language input system utilizes a statistical language model to achieve high accuracy. In one embodiment, a language input system architecture uses a statistical language model plus an automatic, maximum likelihood-based approach to segmenting words, selecting a dictionary, filtering training data, and deriving most likely translation candidate strings.
However, a statistical, sentence-based language model assumes that the user's input is perfect. In practice, user input contains many typing and spelling errors. Thus, the language input architecture includes one or more typing models that use probabilistic spelling models to accept correct typing while tolerating common typing and spelling errors. The typing models may be trained for multiple languages (e.g., English and Chinese) to discern how likely an input sequence is to be a word in one language rather than another. The models can run in parallel, guided by a language model (e.g., a Chinese language model), to output the most likely sequence of characters (i.e., English and Chinese).
Example computer System
FIG. 1 shows an example computer system 100 having a central processing unit (CPU) 102, a memory 104, and an input/output (I/O) interface 106. The CPU 102 communicates with the memory 104 and the I/O interface 106. Memory 104 represents both volatile memory (e.g., RAM) and non-volatile memory (e.g., ROM, hard disk, etc.).
Computer system 100 has one or more peripheral devices connected via I/O interface 106. Example peripheral devices include a mouse 110, a keyboard 112 (e.g., an alphanumeric QWERTY keyboard, a shorthand keyboard, etc.), a display monitor 114, a printer 116, a peripheral storage device 118, and a microphone 120. The computer system may be implemented as a general purpose computer, for example. Thus, computer system 100 implements a computer operating system (not shown) that is stored in memory 104 and executed on CPU 102. The operating system is preferably a multitasking operating system supporting a windowing environment. One example of a suitable operating system is the Windows operating system from Microsoft Corporation.
It should be noted that other computer system configurations may be used, such as hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. In addition, although a stand-alone computer is shown in FIG. 1, the language input system may also be used in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network (e.g., a LAN, the Internet, etc.). In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
A data or word processing program 130 is stored in memory 104 and executed on CPU 102. Other programs, data, files, etc. may also be stored in memory 104, but are not shown for ease of discussion. Word processor 130 is configured to receive the phonetic text and automatically convert it to language text. More specifically, word processor 130 implements a language input architecture 131 that, for discussion purposes, is implemented as computer software stored in memory and executable on a processor. In addition to the architecture 131, the word processor 130 may include other components, but these components are considered standard components of word processors and are not shown or discussed in detail.
Language input architecture 131 of word processing program 130 has a User Interface (UI)132, a search engine 134, one or more typing models 135, a language model 136, and one or more dictionaries 137 for various languages. Architecture 131 is language independent. UI 132 and search engine 134 are generic and can be used in any language. Architecture 131 may be adapted to a particular language by changing language model 136, typing model 135, and dictionary 137.
Together, the search engine 134 and the language model 136 form a phonetic text-to-language text converter 138. With the assistance of typing model 135, converter 138 becomes tolerant of typing errors and spelling errors by the user. For purposes of this disclosure, "text" refers to one or more characters and/or non-character symbols. "Phonetic text" generally refers to alphanumeric text representing the sounds made when a given language is spoken. "Language text" is the characters and non-character symbols that represent a written language. "Non-phonetic text" is alphanumeric text that does not represent the sounds made when speaking a given language. Non-phonetic text may include punctuation, special symbols, and alphanumeric text representing a written language other than the language text.
Perhaps more generally, phonetic text may be any alphanumeric text in a Roman-based character set (e.g., English letters) that represents the sounds made when speaking a given language that does not itself employ the Roman-based character set when written. Language text is the written symbols corresponding to that given language.
For ease of discussion, word processor 130 is described in the context of a Chinese-based word processor, with language input architecture 131 configured to convert Pinyin to Hanzi. That is, the phonetic text is Pinyin and the language text is Hanzi. However, the language input architecture is language independent and can be used for other languages. For example, the phonetic text may represent spoken Japanese, and the language text may represent a Japanese written form such as Kanji. Many other examples exist, including but not limited to Arabic, Korean, Indian, and other Asian languages.
The input of the phonetic text is made via one or more peripheral input devices, such as a mouse 110, a keyboard 112, or a microphone 120. In this manner, the user is allowed to enter phonetic text using a typing or spoken approach. In the case of spoken input, the computer system may further implement a speech recognition module (not shown) to receive the spoken words and convert them into phonetic text. The following discussion assumes that text entry via the keyboard 112 is performed on a full-size, standard alphanumeric QWERTY keyboard.
The UI 132 displays the phonetic text as it is being input. The UI is preferably a graphical user interface. A more detailed discussion of UI 132 may be found in co-pending application Ser. No. __, entitled "Language Input User Interface," which is incorporated herein by reference.
User interface 132 passes the phonetic text (P) to search engine 134, which in turn passes the phonetic text to typing model 135. Typing model 135 generates various typing candidate strings (TC1, …, TCN) that may correspond to the phonetic text the user intended, given that the entered phonetic text may include errors. Typing model 135 returns the typing candidates with reasonable probabilities to search engine 134, and search engine 134 passes them to language model 136. The language model 136 evaluates these typing candidates in the context of the ongoing sentence and generates various conversion candidates (CC1, …, CCN), written in language text, that may represent the converted form of the phonetic text the user intended. Each conversion candidate string is associated with a typing candidate string.
The conversion from phonetic text to language text is not a one-to-one conversion. The same or similar phonetic text may represent any of several characters or symbols in the language text. Thus, the context of the phonetic text is interpreted before conversion to language text. Non-phonetic text, on the other hand, is typically converted directly one-to-one, with the displayed alphanumeric text being the same as the alphanumeric input.
The conversion candidate strings (CC1, …, CCN) are passed back to search engine 134, where a statistical analysis is performed to determine which of the typing and conversion candidate strings exhibits the highest probability of being the one intended by the user. Once the probabilities are calculated, the search engine 134 selects the candidate string with the highest probability and returns the language text of that conversion candidate to the UI 132. The UI 132 then replaces the phonetic text with the language text of the conversion candidate in the same line of the display. Meanwhile, newly entered phonetic text continues to be displayed in the line following the newly inserted language text.
If the user wishes to change the language text selected by search engine 134, user interface 132 presents a first list of the other high-probability candidate strings, ordered by the likelihood of each being the intended answer. If the user is still not satisfied with these candidates, the UI 132 presents a second list providing all possible choices. This second list may be ordered by probability or by another metric, such as stroke count or complexity of the Chinese characters.
Language input architecture
Fig. 2 shows the language input architecture 131 in more detail. Architecture 131 supports fault tolerance for language input, including typographical errors and conversion errors. In addition to UI 132, search engine 134, language model 136, and typing model 135, architecture 131 further includes editor 204 and sentence context model 216. The sentence context model 216 is coupled with the search engine 134.
The user interface 132 receives input text, such as phonetic text (e.g., Chinese Pinyin) and non-phonetic text (e.g., English), from one or more peripheral devices (e.g., keyboard, mouse, microphone) and passes the input text to the editor 204. Editor 204 requests that search engine 134, in conjunction with typing model 135 and language model 136, convert the input text to output text, such as language text (e.g., Chinese Hanzi). The editor 204 passes the output text back to the UI 132 for display.
Upon receiving an input text string from user interface 132, search engine 134 sends the input text string to one or more of typing model 135 and sentence context model 216. Typing model 135 measures a priori probabilities of typographical errors in the input text. Typing model 135 generates and outputs likely typing candidates for the input text entered by the user, effectively seeking to correct input errors (e.g., typographical errors). In one implementation, the typing model looks for possible candidate strings in the candidate string database 210. In another implementation, typing model 135 uses a statistical-based model to generate possible candidate strings for the input text.
Sentence context model 216 may optionally send any previously input text in the sentence to search engine 134 for use by typing model 135. In this manner, the typing model may generate possible typing candidates based on a combination of the new text string and previously entered text in the sentence.
It should be understood that the terms "input error," "typographical error," and "spelling error" may be used interchangeably to refer to errors made in entering text with a keyboard. In the case of spoken input, such errors may be caused by improper recognition of the speech input.
Typing model 135 may return all possible typing candidates, or it may prune out low-probability candidates so that only the more probable typing candidates are returned to search engine 134. Alternatively, this pruning function may be performed by search engine 134 rather than typing model 135.
According to one aspect of the present invention, typing model 135 is trained using real data collected from hundreds or thousands of trainees who are required to enter sentences to observe common typographical errors. The typing model and its training will be described in more detail below under the "train typing model" heading.
Search engine 134 sends the list of possible typing candidates returned by typing model 135 to language model 136. Briefly stated, a language model measures the likelihood of these words or text strings within a given context (e.g., a phrase or sentence). That is, the language model may take any sequence of items (words, characters, letters, etc.) and estimate the probability of that sequence. Language model 136 combines the possible typing candidates from search engine 134 with previous text to generate one or more language text candidates corresponding to the typing candidates.
Corpus data or other types of data 214 are used to train the trigram language model 136. The training corpus 214 may be any type of general data, such as everyday text from news articles, or domain-specific data, such as text from a particular field (e.g., medicine). Training of language model 136 is known in the word processing art and is not described in detail herein.
Language input architecture 131 tolerates errors made during entry of an input text string and attempts to return the most likely words and sentences for that string. The language model 136 helps the typing model 135 determine which sentence is most reasonable for the input string entered by the user. The two models can be described statistically as the probability that the entered text string s corresponds to an intended word w that is recognizable and valid in the dictionary, or P(w|s). Using Bayes' rule, the probability P(w|s) can be written as:

P(w|s) = P(s|w) · P(w) / P(s)
The denominator P(s) remains the same when comparing possible intended words for a given input string, so the analysis focuses only on the numerator product P(s|w) · P(w), where the probability P(s|w) represents the spelling or typing model and the probability P(w) represents the language model. More specifically, the typing model P(s|w) describes how likely it is that a user intending to enter w instead entered s, and the language model P(w) describes how likely a particular word is in a given sentence context.
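As a concrete toy illustration of scoring with the numerator P(s|w) · P(w): the dictionaries and probability values below are invented for the sketch, not taken from the patent, but they show how a low-frequency correct spelling can lose to a high-frequency correction.

```python
# Toy noisy-channel scoring under the decomposition P(s|w) * P(w):
# the typing model P(s|w) times the language model P(w); pick the argmax.
typing_model = {("si", "shi"): 0.2, ("si", "si"): 0.8}  # P(typed | intended)
language_model = {"shi": 0.05, "si": 0.001}             # P(intended word)

def best_word(typed, vocabulary):
    """Return the vocabulary word maximizing P(typed | word) * P(word)."""
    return max(
        vocabulary,
        key=lambda w: typing_model.get((typed, w), 0.0)
        * language_model.get(w, 0.0),
    )

# "shi" scores 0.2 * 0.05 = 0.01, beating "si" at 0.8 * 0.001 = 0.0008.
print(best_word("si", ["shi", "si"]))  # "shi"
```

The made-up numbers illustrate why both models are needed: the typing model alone would keep the literal input "si", while the language model tilts the decision toward the more plausible word.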
In the specific case of converting Pinyin to Hanzi, the probability P(w|s) can be restated as P(H|P), where H represents a string of Chinese characters and P represents a string of Pinyin. The goal is to find the most likely Chinese string H', that is, the string maximizing P(H|P). The probability P(H|P) is the likelihood that the Chinese string H was intended given the input Pinyin string P. Since P(P) is constant for a given Pinyin string, Bayes' rule reduces the maximization of P(H|P) as follows:
H' = arg maxH P(H|P) = arg maxH P(P|H) · P(H)
The probability P(P|H) represents the spelling or typing model. In general, a Chinese character string H can be further decomposed into multiple words W1, W2, …, WM, and the probability P(P|H) can be estimated as:

P(P|H) ≈ ∏i P(Pf(i) | Wi)

where Pf(i) is the Pinyin character sequence corresponding to the word Wi.
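A toy numeric illustration of this factorization follows; the segmentation and per-word probabilities are invented for the sketch, not trained values.

```python
import math

# P(P|H) ≈ product over words of P(Pf(i) | Wi): the sentence-level typing
# probability factors over the words of the candidate Hanzi string.
per_word = {           # made-up P(pinyin segment | word) values
    ("wo", "我"): 0.9,
    ("shi", "是"): 0.8,
    ("zhongguoren", "中国人"): 0.7,
}
segments = [("wo", "我"), ("shi", "是"), ("zhongguoren", "中国人")]

p_sentence = math.prod(per_word[s] for s in segments)
print(round(p_sentence, 3))  # 0.504
```

In the real system each factor P(Pf(i) | Wi) would come from the trained typing model rather than a hand-written table.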
In prior-art statistics-based Pinyin-to-Chinese conversion systems, the probability P(Pf(i)|Wi) is set to 1 if Pf(i) is an acceptable spelling of the word Wi, and set to 0 otherwise. As a result, conventional systems provide no fault tolerance for incorrectly entered characters. Some systems have a "southern confused pronunciation" feature intended to address this issue, but these still use preset probability values of 1 and 0. Moreover, such systems address only a small fraction of typing errors, because they are not data-driven (learned from real typing errors).
In contrast, the language architecture described herein utilizes both typing and language models for conversion. The typing model enables fault tolerance for erroneously input characters by training the probabilities P(Pf(i)|Wi) from a real corpus. There are many ways to build typing models. In theory, all possible probabilities P(Pf(i)|Wi) could be trained; in practice, however, there are too many parameters. To reduce the number of parameters that need to be trained, one approach is to consider only single-character words and to map all characters with equivalent pronunciations to a single syllable. There are approximately 406 syllables in Chinese, so this essentially amounts to training P(Pinyin text | syllable) and then mapping each character to its corresponding syllable. This is described in more detail below under the heading "Training typing model".
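To make the contrast with the 0/1 scheme concrete, the following minimal Python sketch shows what a trained, fault-tolerant typing model lookup might look like; the syllables and probability values are purely hypothetical, and a smoothing floor stands in for the unseen-typing probabilities a real trained model would assign.

```python
# Sketch of a fault-tolerant typing model lookup (hypothetical probabilities).
# A conventional system would return only 1.0 or 0.0; a trained model returns
# a learned probability for common mistypings of each syllable.
TYPING_MODEL = {
    ("zhang", "zhang"): 0.94,   # correct spelling
    ("zhang", "zang"):  0.04,   # dropped 'h', a frequent error
    ("zhang", "zhagn"): 0.01,   # transposed letters
}
FLOOR = 1e-6  # smoothing floor so unseen typings are never impossible

def p_typing(typed, syllable):
    """P(typed string | intended syllable), trained from real typing data."""
    return TYPING_MODEL.get((syllable, typed), FLOOR)

print(p_typing("zang", "zhang"))   # 0.04
print(p_typing("xyz", "zhang"))    # 1e-06
```

The point of the floor value is that even an unseen typing remains a (very unlikely) candidate, rather than being ruled out entirely as in the 0/1 scheme.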
Using the language architecture 131, a wide range of probabilities is computed. One goal of Pinyin-to-Hanzi conversion is to find the Hanzi string H that maximizes the probability P(H|P); this is done by selecting the word sequence Wi that produces the highest probability as the best Hanzi sequence. In practice, an efficient search may be used, such as the well-known Viterbi beam search. For more information on Viterbi beam searches, the reader is referred to "Automatic Speech Recognition" by Kai-Fu Lee, Kluwer Academic Publishers, 1989, and "Automatic Speech and Speaker Recognition: Advanced Topics" by Chin-Hui Lee, Frank K. Soong, and Kuldip K. Paliwal, Kluwer Academic Publishers, 1996.
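As a rough illustration of the search step, the following sketch runs a simplified beam search over segmentations of an input string, scoring each path with a hypothetical per-segment typing probability and a unigram word probability. It is not the Viterbi beam search of the cited references, which would additionally use the trigram language model; the segment dictionary and probabilities below are illustrative assumptions.

```python
import math

# Minimal beam-search sketch over segmentations of an input Pinyin string.
SEGMENTS = {  # typed segment -> [(intended word, P(typed | word))]
    "ma": [("ma", 0.9)], "fan": [("fan", 0.8)],
    "fang": [("fang", 0.8)], "ni": [("ni", 0.9)],
}
P_WORD = {"ma": 0.02, "fan": 0.01, "fang": 0.01, "ni": 0.03}

def beam_search(s, beam_width=3):
    beams = {0: [(0.0, [])]}   # beams[i]: (log prob, words) hypotheses covering s[:i]
    for i in range(len(s) + 1):
        if i not in beams:
            continue
        beams[i] = sorted(beams[i], reverse=True)[:beam_width]  # prune to the beam
        for lp, words in beams[i]:
            for j in range(i + 1, len(s) + 1):
                for word, p_typ in SEGMENTS.get(s[i:j], []):
                    score = lp + math.log(p_typ) + math.log(P_WORD[word])
                    beams.setdefault(j, []).append((score, words + [word]))
    end = beams.get(len(s))
    return max(end)[1] if end else None

print(beam_search("mafangni"))  # ['ma', 'fang', 'ni']
```

Pruning each column to the top few hypotheses is what keeps the search efficient even when the number of possible segmentations grows combinatorially.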
The probability P(H) represents the language model, which measures the a priori probability of any given string of words. A common approach to constructing a statistical language model is to build an N-gram language model from a known set of training text, using a prefix tree data structure. One example of a widely used statistical language model is the N-gram Markov model, which is described in "Statistical Methods for Speech Recognition" by Frederick Jelinek, MIT Press, Cambridge, Massachusetts, 1997. The use of a prefix tree data structure (a.k.a. a suffix tree, or PAT tree) enables a higher-level application to traverse the language model quickly, providing the substantially real-time performance characteristics described above. The N-gram language model counts the number of times each particular sequence of N items (words, characters, etc.) occurs in the training text, and these counts are used to calculate the probability of a string of items.
The language model 136 is preferably a trigram language model (i.e., an N-gram with N = 3), although a bigram may be suitable in some contexts. The trigram language model works well for English and, given a sufficiently large training corpus, is also well suited to Chinese.
The trigram model considers the two preceding words in a text string to predict the next word, as follows:
(a) segmenting the characters (C) into individual language texts or words (W) using a predetermined dictionary, where each W is mapped to one or more C's in a tree;
(b) predicting the probability of a word sequence (W1, W2, …, WM) from the two preceding words:
P(W1, W2, …, WM) ≈ ∏ P(Wn | Wn-1, Wn-2)    (1)
where P() represents the probability of the word sequence;
Wn is the current word;
Wn-1 is the previous word; and
Wn-2 is the word before Wn-1.
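Equation (1) can be sketched in Python as straightforward trigram counting with maximum-likelihood estimation. The sentence-start padding and tiny training set below are illustrative only; a production model would add smoothing and a prefix-tree store as described above.

```python
from collections import defaultdict

# Trigram counting and probability estimation per equation (1).
# "<s>" padding supplies the two words of history at the sentence start.
def train_trigrams(sentences):
    tri, bi = defaultdict(int), defaultdict(int)
    for words in sentences:
        padded = ["<s>", "<s>"] + words
        for n in range(2, len(padded)):
            tri[tuple(padded[n-2:n+1])] += 1   # count (Wn-2, Wn-1, Wn)
            bi[tuple(padded[n-2:n])] += 1      # count (Wn-2, Wn-1)
    return tri, bi

def p_sentence(words, tri, bi, floor=1e-6):
    padded = ["<s>", "<s>"] + words
    p = 1.0
    for n in range(2, len(padded)):
        h, w = tuple(padded[n-2:n]), tuple(padded[n-2:n+1])
        p *= tri[w] / bi[h] if bi[h] else floor  # P(Wn | Wn-1, Wn-2)
    return p

tri, bi = train_trigrams([["we", "love", "reading"], ["we", "love", "music"]])
print(p_sentence(["we", "love", "reading"], tri, bi))  # 1 * 1 * 0.5 = 0.5
```

Here "reading" follows the history ("we", "love") in one of the two training sentences, giving the conditional probability 0.5.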
FIG. 3 shows an example of input text 300 that is entered by the user and passed to typing model 135 and language model 136. Upon receipt of input text 300, typing model 135 segments input text 300 in various ways to produce a list of possible typing candidates 302 that take into account the typographical errors made during keyboard entry. Each typing candidate string 302 has a different segmentation per time frame, such that the end time of the previous word is the start time of the current word. For example, the top row of candidate strings 302 segments the input string 300 "mafangnitryyis…" into "ma", "fan", "ni", "try", "yi", and so on. The second row of typing candidate strings 302 segments the same input string differently, into "ma", "fang", "nit", "yu", "xia", etc.
These candidate strings may be stored in a database or some other accessible memory. It should be understood that FIG. 3 is merely an example, and that there may be different numbers of possible typing candidates for the input text.
The language model 136 evaluates each segment of the possible typing candidate strings 302 in the context of the sentence and generates corresponding language text. For ease of illustration, each segment of possible typed text 302 and the corresponding possible language text are combined into a box.
The search engine 134 performs a statistical analysis on these candidate strings to determine which candidate string exhibits the highest probability of being the user's intended string. The typing candidates in each row are independent of one another, so the search engine is free to select segments from any row when assembling an acceptable conversion candidate. In the example of FIG. 3, the search engine has determined that the highlighted typing candidate strings 304, 306, 308, 310, 312, and 314 exhibit the highest probabilities. The candidate strings are concatenated from left to right, with candidate string 304 followed by candidate string 306, and so on, to form an acceptable interpretation of the input text 300.
Once the probabilities are calculated, the search engine 134 selects the candidate string with the highest probability. The search engine then converts the input phonetic text into the language text associated with the selected string. For example, the search engine converts the input text 300 into the language text displayed in boxes 304, 306, 308, 310, 312, and 314, and returns the language text to the user interface 132 via the editor 204. Once punctuation is received at the user interface (i.e., the next input text string begins a new sentence), typing model 135 begins operating on the new text string.
Generic conversion
FIG. 4 shows a general process 400 for converting phonetic text (e.g., Pinyin) into language text (e.g., Chinese characters). The process is implemented by the language input architecture 131 and is described with additional reference to FIG. 2.
At step 402, the user interface 132 receives a phonetic text string, such as Pinyin, entered by a user. The input text string may contain one or more typographical errors. UI 132 transfers the input text to search engine 134 via editor 204, and search engine 134 distributes the input text to typing model 135 and sentence context model 216.
At step 404, typing model 135 generates possible typing candidates from the input text. One way to derive candidate strings is to segment the input text string into different parts and find the candidate strings in a database that most closely resemble each input string segment. For example, in FIG. 3, candidate strings 302 include possible segments such as "ma", "fan", and so on.
The possible typing candidates are returned to search engine 134, and search engine 134 passes the candidates to language model 136. Language model 136 combines the possible typing candidates with previous text and generates one or more language text candidates corresponding to the typing candidates. For example, referring to candidate string 302 in FIG. 3, the language model returns the language text in box 302 as possible output text.
At step 406, the search engine 134 performs a statistical analysis to determine which candidate string exhibits the highest probability of being the user's intended string. Once the most likely typing candidate string is selected for the phonetic text, the search engine converts the input phonetic text into language text corresponding to the typing candidate string. In this way, any input errors by the user during the input of the phonetic text can be eliminated. Search engine 134 returns error-free language text to UI 132 via editor 204. At step 408, the converted language text is displayed on the screen at the same line where the user continues to enter the phonetic text.
Training typing model
As previously noted, typing model 135 is based on the probability P(s|w). The typing model computes probabilities for the different typing candidates, and these probabilities are used to convert the input text to output text by selecting the most likely candidates. In this manner, the typing model returns possible typing candidates for the input text even in the presence of typing errors, making the typing model fault tolerant.
One aspect of the present invention involves training the typing model P(s|w) from actual data. The typing model is developed or trained on text entered by as many trainers as possible, e.g., hundreds and preferably thousands. The trainers enter the same training text, and any differences between the entered text and the training text are captured as typographical errors; the probabilities are then determined from the frequency of those errors among the typing candidates. In this manner, the typing model learns the probability with which trainers make particular typographical errors.
FIG. 5 shows a training computer 500 having a processor 502, volatile memory 504, and non-volatile memory 506. Training computer 500 runs a training program 508 to generate probabilities 512 (i.e., P(s|w)) from user-entered data 510. Training program 508 is shown executing on processor 502, but it is stored in non-volatile memory 506 and loaded into the processor for execution. Training computer 500 may be configured to train on data 510 as the data is entered, or after the data has been collected and stored in memory.
For ease of discussion, consider a typing model that is customized for the Chinese language, where Chinese Pinyin text is converted to Chinese Hanzi text. In this case, thousands of people are invited to enter Pinyin text, preferably with hundreds of sentences collected from each person, so that the training data reflects the types and numbers of errors that occur in real typing. The typing model is configured to receive Pinyin text from the search engine and provide possible candidate strings for replacing characters in the input string.
Various techniques can be used to train typing model 135. In one approach, the typing model is trained directly by considering only single-character words and mapping all characters with the same pronunciation to a single syllable. For example, there are over four hundred syllables in Chinese Pinyin. For a given syllable, the probability of its phonetic text (e.g., P(Pinyin text | syllable)) is trained, and then each character is mapped to its corresponding syllable.
FIG. 6 shows a syllable mapping training technique 600. At step 602, the training program 508 reads a text string entered by a trainer. The text string may be a sentence or some other combination of words and/or characters. Program 508 aligns or maps the syllables to the corresponding letters in the text string (step 604). For each text string, the frequency of each letter sequence mapped to each syllable is updated (step 606). This is repeated for each text string contained in the training data entered by the trainers, as represented by the "YES" branch from step 608. The end result is that the input text strings will represent many or all of the syllables in Chinese Pinyin. Once all the strings have been read, as represented by the "NO" branch from step 608, the training program determines the probability P(Pinyin text | syllable) for each syllable (step 610). In one implementation, the typing probabilities are determined by normalizing the accumulated frequencies for each syllable.
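Steps 602-610 amount to counting, for each syllable, how often each typed letter sequence was produced for it, and then normalizing. A minimal sketch, assuming the alignment step has already produced (syllable, typed text) pairs; the training pairs below are hypothetical:

```python
from collections import defaultdict

# Accumulate per-syllable typing frequencies, then normalize to obtain
# P(typed Pinyin text | syllable), per steps 602-610.
def train_syllable_model(aligned_pairs):
    counts = defaultdict(lambda: defaultdict(int))
    for syllable, typed in aligned_pairs:            # steps 604-606: update frequencies
        counts[syllable][typed] += 1
    model = {}
    for syllable, typed_counts in counts.items():    # step 610: normalize per syllable
        total = sum(typed_counts.values())
        model[syllable] = {t: c / total for t, c in typed_counts.items()}
    return model

pairs = [("zhang", "zhang")] * 9 + [("zhang", "zang")]
model = train_syllable_model(pairs)
print(model["zhang"])  # {'zhang': 0.9, 'zang': 0.1}
```

In this toy run, nine out of ten trainers typed "zhang" correctly and one dropped the "h", so the model assigns that error a probability of 0.1.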
Each syllable may be represented as a Hidden Markov Model (HMM), with each input key viewed as a state in the HMM. The correct input and the actual input are aligned to determine the transition probabilities between states. Different HMMs can be used to model typists of different skill levels.
Training all 406 syllables in the training text would require a large amount of data. To reduce this data requirement, the same letters in different syllables are tied together as a single state. This reduces the number of states to 27 (i.e., the 26 letters "a" through "z", plus one state representing an unknown letter). This model can be integrated with a trigram language model into a Viterbi beam search.
In another training technique, training is based on the probabilities of single-letter edits, such as the insertion of a letter (∅ → x), the deletion of a letter (x → ∅), and the substitution of one letter for another (x → y). The probabilities of such single-letter edits can be expressed statistically as:
Substitution: P(x replaced by y)
Insertion: P(x inserted before/after y)
Deletion: P(x deleted before/after y)
Each probability P is essentially a two-letter typing model, but can be extended to an N-letter typing model that takes into account a broader text context beyond adjacent characters. Thus, for any possible input text string, the typing model can compute the probability of generating each possible letter sequence by taking the correct letter sequence and using dynamic programming to determine the lowest-cost path that converts the correct letter sequence into the given letter sequence. The cost may be determined as the minimum number of erroneous characters, or by some other metric. In practice, this error model can be implemented as part of the Viterbi beam search method.
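The lowest-cost path computation described above can be sketched as a standard dynamic program. Here the cost is simply the minimum number of single-letter edits; a probabilistic variant would substitute -log of the trained substitution, insertion, and deletion probabilities for the unit costs.

```python
# Dynamic-programming "lowest cost path" from the correct letter sequence to
# the typed one, with unit costs for substitution, insertion, and deletion.
def edit_cost(correct, typed):
    m, n = len(correct), len(typed)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete the remaining correct letters
    for j in range(n + 1):
        d[0][j] = j                      # insert the typed letters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if correct[i-1] == typed[j-1] else 1
            d[i][j] = min(d[i-1][j-1] + sub,   # substitution / match
                          d[i][j-1] + 1,       # insertion
                          d[i-1][j] + 1)       # deletion
    return d[m][n]

print(edit_cost("fang", "fagn"))  # 2 (a transposition counts as two unit edits here)
```

With trained probabilities as costs, the same table yields the most probable edit path rather than merely the shortest one.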
It should be understood that any other type of error besides typographical errors or spelling errors can be trained within the scope of the present invention. Also, it should be understood that different training techniques can be used to train the typing model without departing from the scope of the present invention.
Multi-language training without modal input
Another annoying problem that plagues language input systems is the need to switch between modes when inputting two or more languages. For example, a user typing in Chinese may wish to enter an English word. Conventional input systems require the user to switch modes between inputting english words and chinese words. Unfortunately, users tend to forget to make the switch.
Language input architecture 131 (FIG. 1) can be trained to accept mixed language inputs, thus eliminating mode conversion between two or more languages in a multilingual word processing system. This is called "no mode input".
The language input architecture implements a spelling/typing model that automatically distinguishes words in different languages, such as which words are Chinese and which are English. This is not easy, because many legitimate English words are also legitimate Pinyin strings. In addition, because there are no spaces between Pinyin, English, and Chinese characters, more confusion can arise during input. Using Bayes' rule:
H' = arg max_H P(H|P) = arg max_H P(P|H)·P(H)
the objective function can be factored into two parts: a spelling model P(P|H) for English and a language model P(H) for Chinese.
One way to process mixed-language input is to train a language model for a first language by treating words from a second language (e.g., English) as a special class of the first language. For example, a word from the second language is treated as a single word in the first language.
By way of example, assume a Chinese-based word processing system uses an English keyboard as an input device. The language model utilized in this Chinese-based word processing system is trained on text that mixes English and Chinese words.
A second way to handle mixed-language input is to implement two typing models in the language input architecture, one Chinese and one English, and to train each model separately. That is, the Chinese typing model is trained by trainers entering a keyboard input stream (e.g., a phonetic string) in the manner described above, while the English typing model is trained on English text entered by English-speaking trainers.
The english typing model can be implemented as a combination of:
1. A unigram language model of real English words, trained on English text inserted in Chinese-language text. This model can handle many frequently used English words, but cannot predict unseen English words.
2. An English spelling model based on letter-trigram probabilities. This model has a non-zero probability for every three-letter sequence, but yields higher probabilities for English-like words. It can likewise be trained on real English words, and it can handle unseen English words.
Together, these English models return a high probability for genuine English text, a relatively high probability for strings that look like English text, and a low probability for non-English text.
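A letter-trigram spelling model of this kind can be sketched as follows. The word-boundary marker, add-one smoothing, and tiny training list are illustrative assumptions, but they reproduce the key property that every three-letter sequence receives a non-zero probability while English-like strings score higher; the comparison below uses equal-length strings so that length does not skew the log probabilities.

```python
import math
from collections import defaultdict

# Letter-trigram ("tri-letter") English spelling model with add-one smoothing.
def train_letter_trigrams(words, alphabet_size=27, smooth=1.0):
    tri, bi = defaultdict(int), defaultdict(int)
    for w in words:
        padded = "##" + w.lower() + "#"       # '#' marks word boundaries
        for i in range(2, len(padded)):
            tri[padded[i-2:i+1]] += 1
            bi[padded[i-2:i]] += 1
    def log_prob(s):
        padded = "##" + s.lower() + "#"
        lp = 0.0
        for i in range(2, len(padded)):       # smoothing keeps every P > 0
            lp += math.log((tri[padded[i-2:i+1]] + smooth) /
                           (bi[padded[i-2:i]] + smooth * alphabet_size))
        return lp
    return log_prob

score = train_letter_trigrams(["internet", "interest", "intern"])
print(score("internet") > score("zzzzzzzz"))  # English-like strings score higher
```

Because no trigram probability is ever zero, even an unseen string such as a new technical term remains representable, just with a lower score.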
FIG. 7 shows a language input architecture 700, modified from architecture 131 in FIG. 2, which utilizes multiple typing models 135(1)-135(N). Each typing model is configured for a particular language and is trained individually using words and errors common to that language. Thus, separate training data 212(1)-212(N) are provided to the corresponding typing models 135(1)-135(N). In this example, only two typing models are used: one for English and one for Chinese. However, it should be understood that the language input architecture may be modified to include more than two typing models to accommodate input in more than two languages. It should also be noted that the language input architecture can be used with many other types of multilingual word processing systems, such as Japanese, Korean, French, German, and the like.
During operation of the language input architecture, the English typing model and the Chinese typing model operate in parallel. These two typing models compete with each other to distinguish whether the input text is in english or chinese by calculating the probability that the input text string may be a chinese string (including errors) or may be an english string (also potentially including errors).
When the input text string or sequence is clearly Chinese Pinyin text, the Chinese typing model returns a much higher probability than the English typing model, and the language input architecture converts the input Pinyin text to Chinese character text. When an input text string or sequence is clearly English (e.g., a surname, an abbreviation ("IEEE"), a company name ("Microsoft"), a technical term ("INTERNET"), etc.), the English typing model exhibits a much higher probability than the Chinese typing model. Thus, the architecture converts this input text to English text based on the English typing model.
When an input text string or sequence is ambiguous, the Chinese and English typing models continue to compute probabilities until further context provides more information to distinguish between Chinese and English. When the input text string or sequence does not resemble either Chinese or English, the Chinese typing model is less tolerant of faults than the English typing model. As a result, the English typing model has a higher probability than the Chinese typing model.
To illustrate the multilingual conversion, assume that the user enters the text string "woaidu internetzazhi", which means "I love reading INTERNET magazine". When the beginning string "woaidu" is received, the Chinese typing model generates a higher probability than the English typing model, and that part of the input text is converted to the Chinese for "I love reading". The architecture continues to find the next entered portion "interne" ambiguous until the letter "t" is entered. At that point, the English typing model returns a higher probability for "INTERNET" than the Chinese typing model, and the language input architecture converts this portion of the input text to "INTERNET". Next, for "zazhi", the Chinese typing model exhibits a higher probability than the English typing model, and the language input architecture converts that portion of the input text into the Chinese for "magazine".
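The competition between the two typing models can be caricatured as follows. The syllable inventory, English word list, and probability values are hypothetical stand-ins for the trained models, but the sketch exhibits both behaviors described above: each segment goes to the model with the higher probability, and unrecognized strings default to English because the English model is the more fault tolerant of the two.

```python
# Two competing per-segment scorers: a toy Pinyin model and a toy English model.
PINYIN = {"wo", "ai", "du", "za", "zhi", "ma", "fan", "ni"}   # tiny syllable set
ENGLISH = {"internet", "microsoft", "ieee"}                    # tiny word list

def can_segment_pinyin(s, i=0):
    """True if s[i:] splits entirely into known Pinyin syllables."""
    if i == len(s):
        return True
    return any(s.startswith(syl, i) and can_segment_pinyin(s, i + len(syl))
               for syl in PINYIN)

def classify(segment):
    """Route the segment to whichever model assigns the higher probability."""
    p_cn = 0.9 if can_segment_pinyin(segment) else 0.001  # Chinese: strict
    p_en = 0.9 if segment.lower() in ENGLISH else 0.01    # English: more tolerant
    return "chinese" if p_cn > p_en else "english"

print(classify("woaidu"))    # chinese
print(classify("internet"))  # english
print(classify("zazhi"))     # chinese
```

Note that a nonsense segment like "qqqq" falls to the English model (0.01 > 0.001), mirroring the observation that the Chinese typing model is less fault tolerant than the English one.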
Multi-language input conversion
FIG. 8 shows a process 800 for converting an input multilingual input text string having typographical errors into an error-free multilingual output text string. The process is implemented by language input architecture 700 and is described herein with additional reference to FIG. 7.
In step 802, the user interface 132 receives a multilingual input text string containing phonetic words (e.g., Pinyin) and words in at least one other language (e.g., English). The input text may also include typographical errors made by the user while entering the phonetic words and the second-language words. UI 132 communicates the multilingual input text string to search engine 134 via editor 204, and search engine 134 distributes the input text to typing models 135(1)-135(N) and sentence context model 216.
Each typing model generates possible typing candidates from the input text, as represented by steps 804(1)-804(N). At step 806, possible typing candidates with reasonable probabilities are returned to search engine 134. At step 808, search engine 134 sends the typing candidate strings, with their typing probabilities, to language model 136. At step 810, the language model combines the possible typing candidates with the previous text to provide sentence-based context, and generates one or more conversion candidates of language text corresponding to the typing candidates by selecting a path through the typing candidates, as described above with respect to FIG. 3. At step 812, the search engine 134 performs a statistical analysis to select the conversion candidate string having the highest probability of being the user's intended string.
At step 814, the conversion candidate string most likely to be the intended string is converted into an output text string. The output text string includes language text (e.g., Hanzi) and text in the second language (e.g., English), with the typographical errors removed. Search engine 134 returns the error-free output text to UI 132 via editor 204. At step 816, the converted language text is displayed on the UI 132 in the same line in which the user continues to enter phonetic text.
In the above example, the Chinese language is the primary language and English is the secondary language. It should be understood that both languages can be designated as primary languages. Furthermore, the mixed input text string may be composed of more than two languages.
Conclusion
Although the description above uses language that is specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the invention.

Claims (29)

1. A method of converting text from one language to another language, comprising:
receiving an input string comprising at least a first and a second language;
segmenting the input string into possible typing candidates having different partitions;
generating one or more candidate strings of language text for each of the languages using one or more of the possible typing candidates;
at least one of the candidate strings that is available to replace the input string is determined based on a probability of how likely at least one of the candidate strings is to be incorrectly input as the input string.
2. The method of claim 1, further comprising performing one of the following based on a probability of the input string being a candidate string in a first language or a candidate string in a second language: (1) converting the input string into a candidate string in a first language, or (2) leaving the input string in a second language.
3. A method as recited in claim 1, wherein the first language is a primary language and the second language is a secondary language that is used less frequently than the primary language.
4. The method of claim 1, wherein the input string in the first language comprises phonetic text and the input string in the second language comprises non-phonetic text.
5. The method as recited in claim 1, wherein the first language is chinese and the second language is english.
6. A method of converting text from one language to another language, comprising:
allowing input of an input string containing at least first and second languages without switching input modes of the first and second languages;
segmenting the input string into possible typing candidates having different partitions; and
determining possible candidate strings in at least one of the first and second languages that may be used to replace the input string from the possible typing candidates based on a probability of how likely each candidate string is to be incorrectly input as the input string;
selectively performing one of the following based on the probability: (1) converting the input string into an output string in a first language and outputting the output string, or (2) outputting an input string in a second language.
7. The method as recited in claim 6, further comprising:
displaying an input string comprising a first and a second language in a single edit line; and
the output string or the input string is selectively displayed in a single edit line.
8. The method as recited in claim 6, wherein the first language is chinese and the second language is a particular language other than chinese.
9. A language input system, comprising:
a user interface for receiving an input string written from a combination of phonetic text and non-phonetic text;
a first typing model for generating likely first typing candidates written by phonetic text that can replace the input string according to a probability of a typographical error of how likely each first candidate string is incorrectly entered as the input string, wherein each said first candidate string is generated at least in part by segmenting the input string;
a second typing model for generating possible second typing candidates written by the non-phonetic text that may replace the input string based on a probability of a typographical error for how likely each second candidate string is incorrectly entered as the input string, wherein each said second candidate string is generated at least in part by segmenting the input string;
a language model for providing possible conversion strings written in language text for a first typing candidate written in language text; and
a search engine configured to (1) convert an input string to one of the converted strings to replace phonetic text with linguistic text based on a typing error probability of the first typing model and the second typing model; or (2) outputting one of the second candidates such that the non-speech text remains unconverted.
10. A language input system as recited in claim 9, wherein the search engine converts the input string to one of the converted strings when the first probability is higher than the second probability.
11. A language input system as recited in claim 9, wherein the search engine outputs one of the second candidates based on a comparison of a typographical error probability of the second candidate string with a typographical error probability of the first candidate string.
12. A language input system as recited in claim 9, wherein the phonetic text is in a first language and the non-phonetic text is in a second language.
13. A language input system as recited in claim 9, wherein the phonetic text is pinyin and the non-phonetic text is english.
14. A method of converting text from one language to another language, comprising:
receiving an input string comprising at least a first and a second language;
segmenting the input string into possible typing candidates having different partitions;
determining at least one first candidate string using at least one of the possible typing candidates available for replacement of the input string based on a first probability of how likely the first candidate string is to be incorrectly input as an input string in the first language;
determining at least one second candidate string using one or more of the possible typing candidates available for replacement of the input string based on a second probability of how likely the second candidate string is to be incorrectly input as an input string in the second language;
if the first probability is higher than the second probability, using the first candidate string to obtain at least one output string containing the first language; and
if the first probability is lower than the second probability, at least one output string containing the second language is derived using the second candidate string.
15. A method as recited in claim 14, wherein the first language is a primary language and the second language is a secondary language that is used less frequently than the primary language.
16. The method of claim 14, wherein the input string in the first language comprises phonetic text and the input string in the second language comprises non-phonetic text.
17. The method as recited in claim 14, wherein the first language is chinese and the second language is english.
18. The method as recited in claim 14, wherein the input string is a combination of chinese pinyin and english and the output string is a combination of chinese hanzi and english.
19. The method of claim 14, further comprising obtaining the first and second candidate strings from a database.
20. The method as recited in claim 14, further comprising:
obtaining a first probability that a first candidate string was incorrectly entered from data collected from a plurality of users entering a first language training text;
a second probability that the second candidate string was incorrectly entered is obtained from data collected from a plurality of users entering training text in a second language.
21. The method of claim 14, further comprising displaying the output string in the same row that the user entered the input string.
22. A language input system, comprising:
a first typing model for receiving an input string, segmenting the input string into likely typing candidates having different partitions, and determining how likely the first candidate string is to be incorrectly input as a first typing error probability for the input string;
a second typing model for receiving the input string, segmenting the input string into probable typing candidates having different partitions, and determining a second probability of typographical error of how likely the second candidate string is incorrectly entered as the input string; and
a search engine to select one of the first and second candidate strings based on the respective first and second typographical error probabilities.
23. The language input system of claim 22, wherein the first typing model is trained using a first language and the second typing model is trained using a second language.
24. A language input system as recited in claim 22, wherein the input string contains phonetic text and non-phonetic text, and the first typing model is trained on phonetic text and the second typing model is trained on non-phonetic text.
25. A language input system as recited in claim 22, wherein the first typing model is trained using chinese and the second typing model is trained using english.
26. The language input system of claim 22, wherein the input string contains pinyin and english, and the first typing model is trained on pinyin and the second typing model is trained on english.
27. The language input system of claim 22, further comprising a language model for providing an output string for the selected typing candidate.
28. A language input system as recited in claim 27, wherein said search engine converts an input string to an output string.
29. A language input system as recited in claim 27, further comprising a user interface for receiving an input string and displaying an output string in the same edit line.
HK03102615.8A 1999-11-05 2000-10-13 Language input architecture for converting one text form to another text form with modeless entry HK1050578B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16390299P 1999-11-05 1999-11-05
US60/163,902 1999-11-05
US09/606,807 2000-06-28
US09/606,807 US7165019B1 (en) 1999-11-05 2000-06-28 Language input architecture for converting one text form to another text form with modeless entry
PCT/US2000/028418 WO2001035249A2 (en) 1999-11-05 2000-10-13 Language input architecture for converting one text form to another text form with modeless entry

Publications (2)

Publication Number Publication Date
HK1050578A1 HK1050578A1 (en) 2003-06-27
HK1050578B true HK1050578B (en) 2010-02-05


Similar Documents

Publication Publication Date Title
US7165019B1 (en) Language input architecture for converting one text form to another text form with modeless entry
US6848080B1 (en) Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors
Chen et al. A new statistical approach to Chinese Pinyin input
KR100259407B1 (en) Keyboard for a system and method for processing Chinese language text
CN100593167C (en) Language input user interface
JP2003514304A5 (en)
JP2013117978A (en) Generating method for typing candidate for improvement in typing efficiency
JP2004516527A (en) System and method for computer assisted writing with cross language writing wizard
Pennell et al. Normalization of informal text
KR102794379B1 (en) Learning data correction method and apparatus thereof using ensemble score
Lee et al. Automatic word spacing using probabilistic models based on character n-grams
KR100509917B1 (en) Apparatus and method for checking word by using word n-gram model
Roy et al. Unsupervised context-sensitive Bangla spelling correction with character n-gram
Kaur et al. Hybrid approach for spell checker and grammar checker for Punjabi
UzZaman et al. A comprehensive Bangla spelling checker
Lane et al. Interactive word completion for morphologically complex languages
Asahiah et al. A survey of diacritic restoration in abjad and alphabet writing systems
Wasala et al. A data-driven approach to checking and correcting spelling errors in Sinhala
HK1050578B (en) Language input architecture for converting one text form to another text form with modeless entry
Nocon et al. Building a Filipino Colloquialism Translator Using Sequence-to-Sequence Model
Ekbal et al. Named entity transliteration
KR100268297B1 (en) System and method for processing Chinese language text
Wasala et al. An open-source data driven spell checker for Sinhala
Hatori et al. Predicting word pronunciation in Japanese
Son et al. Vietnamese-Thai machine translation using rule-based