CN1006333B - Chinese input method

Info

Publication number: CN1006333B
Application number: CN 87104535
Authority: CN
Inventors: 伊藤英俊; 楠井健
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-06-12
Filing date: 1987-06-12
Publication date: 1990-01-03
Also published as: JPS63106070A; CN87104535A; JPH0640330B2

Abstract

The invention discloses a method for handling homophones when a keyed-in Chinese phonetic letter sequence is converted into a Chinese character sequence. The method uses a homophone table. For n Chinese homophones, the one character with the highest usage frequency is assigned to class A, the m characters of moderate usage frequency to class B, and the characters of lower usage frequency to class C. When a character is selected from the n homophones, the class A character is displayed first; if it is not the desired character, the m characters of class B are searched according to a specific method; and if neither the class A character nor any class B character is the desired one, the class C characters are searched in a predetermined order, thereby achieving high-speed Chinese input.

Description

The present invention relates to a method for inputting the Chinese language and its characters (hereinafter, the Chinese input method), and in particular to a method for handling homophones when Chinese is input using a phonetic alphabet.

An information processing system that handles Chinese requires a Chinese keyboard. Keyboard input methods generally encode a Chinese character by its shape, by its reading, or by a combination of the two.

Two phonetic alphabets represent Chinese readings: Pinyin, established by the Chinese government, and Zhuyin, which was in use before Pinyin was established. Pinyin now predominates in China; Zhuyin is used mainly by older people and in some regions.

Apart from a few characters, Chinese has the advantage that, in principle, one character corresponds to one syllable and has no other reading. However, many characters share the same reading, that is, the same syllable. In particular, among the very frequently used characters that correspond in part to what Japanese writes in kana, there are many single-character words, and most of them have many homophones. Words of two or more characters also have homophones, but these are few.

Consequently, when such homophones are input by keying in phonetic letters (for example, Pinyin), the operator generally selects the desired character from several homophones displayed one after another.

This approach has the drawback that the selection must be repeated every time the same character is input again. Moreover, when homophone learning is applied in the usual way, treating homophones of widely differing usage frequencies by the same method instead becomes a cause of more complicated input.

According to a recent survey of the usage frequency of Chinese characters, general books and newspapers use about 6,000 characters, but the 60 most frequently used characters account for more than 30% of occurrences, and the single most used character, "的", accounts for more than 4%. Conventional Chinese input methods have not adopted a homophone learning method that takes these usage frequencies into account.

The object of the present invention is to provide a Chinese input method that remedies the above drawbacks by classifying homophones according to their usage frequency and actual conditions of use.

The Chinese input method of the present invention converts a keyed-in sequence of phonetic letters into a sequence of Chinese characters. It has a homophone table in which, for n Chinese homophones, the one character with the highest usage frequency is assigned to class A, m characters of moderate usage frequency (m ≤ n-1) to class B, and l characters of low usage frequency (l ≤ n-m-1) to class C. It has a first means which, when one character is selected from the m characters of class B, displays them in sequence beginning with the character most recently selected, and a second means which, when one character is selected from the l characters of class C, displays them in sequence according to a predetermined order. When one character is selected from the n homophones, the class A character is displayed first; if it is not the desired character, the m characters of class B are searched by the first means; and if neither the class A character nor any class B character is the desired one, the l characters of class C are searched by the second means.
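As a rough illustration of the classification described above, the following Python sketch splits a group of homophones into classes A, B, and C by usage count. The names `HomophoneEntry`, `classify`, and `b_threshold`, and the idea of a single numeric cut-off between moderate and low frequency, are assumptions made for this example; the source does not state how the class boundaries are chosen. The usage counts are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HomophoneEntry:
    reading: str      # phonetic index, e.g. a Pinyin syllable
    a: str            # class A: the single most frequently used character
    b: List[str]      # class B: m characters of moderate usage frequency
    c: List[str]      # class C: l characters of low usage frequency


def classify(reading: str, freq: Dict[str, int], b_threshold: int) -> HomophoneEntry:
    """Split the homophones in freq into classes A, B and C.

    freq maps each character to a usage count. b_threshold is a
    hypothetical cut-off between "moderate" and "low" frequency.
    """
    ranked = sorted(freq, key=freq.get, reverse=True)  # most frequent first
    a, rest = ranked[0], ranked[1:]
    b = [ch for ch in rest if freq[ch] >= b_threshold]
    c = [ch for ch in rest if freq[ch] < b_threshold]
    return HomophoneEntry(reading, a, b, c)


# Invented counts, for illustration only.
entry = classify("shi", {"是": 900, "时": 120, "事": 100, "市": 5, "士": 3}, 50)
```

By construction every homophone lands in exactly one class, so 1 + m + l = n, which satisfies the bounds m ≤ n-1 and l ≤ n-m-1 given in the text.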

With the Chinese input method of the present invention, the character with the highest usage frequency is displayed first; if it is not the desired character, the characters of moderate usage frequency are displayed in sequence, beginning with the one most recently selected. Because the operator's judgment is applied in this way, it is no longer necessary, as it was in the past, to repeat the same selection operation every time the same character is input. In other words, even when the first character displayed is not the desired one, the character displayed second or later is very likely to be. Thanks to this learning function, high-speed Chinese input with far fewer cumbersome operations can be expected.

Fig. 1 is a block diagram of an embodiment of the Chinese input method according to the present invention; Fig. 2 is a flowchart of its operating steps; Fig. 3 is an explanatory diagram of an example homophone table; Fig. 4 is an explanatory diagram of an example classification of homophones.

Best mode for carrying out the invention

The Chinese input method of the present invention is described below with reference to the drawings.

Fig. 1 is a block diagram of an embodiment of the present invention. The Chinese input method shown in the figure comprises: an input unit 1, which has Pinyin keys 1a and can input a sequence of Chinese phonetic letters (a Pinyin letter sequence); an input buffer unit 2, which temporarily stores the input signal 100, including various control signals; a conversion unit 3, which converts the Pinyin letter sequence into a Chinese character sequence; a dictionary unit 4, which holds a table of correspondences between Pinyin letter sequences and Chinese character sequences; a display control unit 5, which controls the display of the Pinyin letter sequence and the Chinese character sequence; and a display unit 6, which displays them.

The input unit 1 includes the Pinyin keys 1a and outputs the input signal 100, which carries the Pinyin letters together with various control signals.

The buffer unit 2 receives the input signal 100, interprets it, and outputs input data 101, to be displayed as Pinyin, and a conversion control signal 106, which instructs conversion from Pinyin to Chinese characters.

The conversion unit 3 comprises dictionary access means 3a, a read data buffer 3b, and conversion control means 3c. The dictionary access means 3a receives the input data 101 and outputs a search signal 102 in accordance with the conversion control signal 106. The conversion control means 3c outputs a conversion instruction signal 107 and a display instruction signal 108 in accordance with the conversion control signal 106. The read data buffer 3b receives and temporarily holds the read data 103 and, in accordance with the conversion instruction signal 107, selects one character from it and outputs that character as converted data 104 (the selection method is described later).

The dictionary unit 4 includes a homophone table 4a, which holds the correspondences between Pinyin letters and Chinese characters, and outputs the Chinese characters corresponding to the search signal 102 as read data 103.

The display control unit 5 comprises an input data display buffer 5a, a converted data display buffer 5b, and a display buffer 5c. The input data display buffer 5a and the converted data display buffer 5b receive and temporarily store the input data 101 and the converted data 104, respectively. The display buffer 5c takes in the stored input data 101 and converted data 104 in accordance with the display instruction signal 108 and outputs them as a display signal 105.

The display unit 6 receives the display signal 105 and shows it on the display, so that the progress of the operation can be followed.

Fig. 2 is a flowchart of the operating steps of the above Chinese input method. In the figure, process 21 is the keying in of a phonetic letter sequence. Process 22 then decides whether the sequence keyed in at process 21 is to be converted into Chinese characters: if not, control returns to process 21 and keying continues; if so, control passes to process 23.

Process 23 displays the class A character, and process 24 decides whether the displayed class A character is the desired one. If it is, control passes to process 25; if not, to process 27.

Process 25 writes the displayed desired character to the file; by this process the phonetic letters are replaced with the Chinese character.

Process 27 displays the class B characters in the order described above, and process 28 decides whether the displayed class B character is the desired one. If it is, control passes to process 29; if not, to process 30.

Process 29 rewrites the reference mark so that the desired character confirmed at process 28 will be the first one displayed in the next search. Control then passes to process 25.

Process 30 decides whether all class B characters have been displayed. If any remain undisplayed, control returns to process 27; if all have been displayed, control passes to process 31.

Process 31 displays the class C characters in their predetermined order, and process 32 decides whether the displayed class C character is the desired one. If it is, control passes to process 25; if not, to process 33.

Process 33 decides whether all class C characters have been displayed. If any remain undisplayed, control returns to process 31; if all have been displayed, control passes to process 34.

Process 34 indicates that the desired character is not in the homophone table 4a (for example, by displaying [?] and writing it to the file at the same time).

Process 26 decides whether Chinese input is to continue; if so, control returns to process 21 and the above operations are repeated.
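The candidate search of Fig. 2 (processes 23 to 34) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the dictionary keys `a`, `b`, `b_start`, and `c`, the function name `select_character`, and the `is_desired` callback standing in for the operator's yes/no judgment are all hypothetical. The sketch also assumes that the class B characters after the most recently selected one follow in cyclic table order, which the source does not spell out.

```python
def select_character(entry, is_desired):
    """Offer candidates in the order of Fig. 2 and return the chosen one.

    entry: dict with "a" (str), "b" (list of str), "b_start" (index of the
           class B character to show first) and "c" (list of str).
    is_desired: callable standing in for the operator's judgment.
    """
    # Processes 23-24: the class A character is shown first.
    if is_desired(entry["a"]):
        return entry["a"]

    # Processes 27, 28, 30: class B, beginning with the last selection.
    m = len(entry["b"])
    for step in range(m):
        i = (entry["b_start"] + step) % m   # assumed cyclic order
        if is_desired(entry["b"][i]):
            entry["b_start"] = i            # process 29: rewrite the mark
            return entry["b"][i]

    # Processes 31-33: class C in its fixed, predetermined order.
    for ch in entry["c"]:
        if is_desired(ch):
            return ch

    # Process 34: the desired character is not in the table.
    return "?"
```

After a class B character has been chosen once, the same call offers it immediately after the class A character, which is the learning effect the description attributes to process 29.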

In the Chinese input method described above, the ordered homophones are displayed one after another as candidates, and Chinese input proceeds while the operator judges whether each is the desired character.

The read data 103, read out in response to the search signal 102 output by the dictionary access means 3a, contains the homophones. The read data buffer 3b thus temporarily holds the required contents of the homophone table 4a.

Fig. 3 is an explanatory diagram of part of the homophone table 4a. In the figure, the homophone table 4a consists of a phonetic letter field 10, which holds the index used in searching; a field 11, which holds the class A character; a field 12, which holds the class B characters; and a field 13, which holds the class C characters. Each homophone in field 12 has a reference mark field 12a (learning field) that gives its position in the display order for class B searches.

Class A holds the single most frequently used character among the homophones; when the index in field 10 is keyed in, this character is displayed first. Class B holds the characters of moderate usage frequency; when class B is searched, its characters are displayed in the order indicated by the reference marks in field 12a, that is, in sequence beginning with the character most recently selected. The reference marks are rewritten each time the operator selects a desired character. Class C holds the characters of low usage frequency; when these are searched, they are displayed in a predetermined order, as in an ordinary dictionary.
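One way to picture the per-character reference marks of field 12a is as display ranks, with rank 0 shown first. The Python sketch below is an assumption-laden illustration, not the patented implementation: the function names `rewrite_marks` and `display_order` are hypothetical, and the choice to let the remaining characters follow the selected one in cyclic table order is only one reading of "in sequence beginning with the character most recently selected".

```python
def rewrite_marks(m, selected):
    """Return new display ranks for m class B characters.

    selected is the table position of the character just chosen; it
    receives rank 0. The others follow in cyclic order (an assumption).
    """
    return [(i - selected) % m for i in range(m)]


def display_order(chars, marks):
    """List the class B characters in the order given by their marks."""
    return [ch for _, ch in sorted(zip(marks, chars))]
```

For example, with four class B characters and the third one just selected, `rewrite_marks(4, 2)` gives the marks `[2, 3, 0, 1]`, so the third character is shown first in the next search. A move-to-front scheme would also put the selected character first and would be an equally plausible reading of the text.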

Next, the read data buffer 3b outputs the temporarily held contents of the homophone table, in sequence, as converted data 104 in accordance with the conversion instruction signal 107 output by the conversion control means 3c. The conversion control means 3c also outputs the display instruction signal 108, so that the converted data 104 is displayed in sequence on the display unit 6 via the display control unit 5. That is, with reference to Fig. 2, processing module 41 displays the class A homophone, processing module 42 the class B homophones, and processing module 43 the class C homophones, each in sequence, and operation proceeds according to the operator's judgment.

Fig. 4 shows an example of classifying Chinese homophones (in particular, single-character words). In the figure, the characters shown in class A correspond in part to what Japanese writes in kana, and all of them have a very high usage frequency.

This embodiment has been described mainly for Chinese single-character words that correspond in part to what Japanese writes in kana, but Chinese input of words of two or more characters can be realized by the same method.

Claims

A Chinese character input method for converting a keyed-in Chinese phonetic letter string into a Chinese character string, characterized by:

a homophone table in which, for n Chinese homophones, the one character with the highest usage frequency is assigned to class A, m characters of moderate usage frequency (m ≤ n-1) to class B, and l characters of low usage frequency (l ≤ n-m-1) to class C;

a first means which, when one character is selected from the m characters of class B, displays them in sequence beginning with the character most recently selected, and a second means which, when one character is selected from the l characters of class C, displays them in sequence according to a predetermined order;

wherein, when one character is selected from the n homophones, the class A character is displayed first; when the class A character is not the desired one, the m characters of class B are searched by the first means; and when neither the class A character nor any class B character is the desired one, the l characters of class C are searched by the second means.