CN1094607C

CN1094607C - Intelligent phoneme-shape code input method and application thereof

Info

Publication number: CN1094607C
Application number: CN97101951A
Authority: CN
Inventors: 罗仁; 郭彦
Original assignee: Individual
Current assignee: Individual
Priority date: 1997-03-27
Filing date: 1997-03-27
Publication date: 2002-11-20
Anticipated expiration: 2017-03-27
Also published as: CN1182906A

Abstract

The present invention belongs to the field of computer character input. Characters can be input on a standard keyboard in several modes, or on a keyboard with additional initial-consonant and vowel keys. The invention also provides a telephone code keyboard designed for Chinese input, which can be applied in various computer telephone service fields. In addition, it provides an intelligent input mode that can be applied widely in computer character input, including intelligent input, character proofreading, Chinese character recognition, Chinese speech input, letter postal codes replacing numeric postal codes in mailing addresses, and realizing the function of a Chinese-display BP machine (pager) on an English/numeric BP machine.

Description

Intelligent sound-shape code input method and application thereof

The present invention belongs to the field of computer character input. Many new Chinese input methods have been invented in this field, including phonetic-shape code input methods and intelligent input methods. Among single-character input methods (as opposed to whole-sentence intelligent input), some achieve high input speed. Examples are the patented "simple pronunciation-shape code Chinese input method" invented by Mr. Wang Luo and Mr. Xuhui (patent publication No. 1081772; legal status: granted; average code length about 2.0 per character) and the patented "pronunciation-shape-stroke comprehensive coding Chinese character high-speed input method and keyboard used therewith" invented by Mr. [name garbled in the source] (patent publication No. 1039132; no legal status; average code length about 1.3-1.8 per character). The single-character input method of the present invention is a phonetic-shape code input method that is fast, convenient, practical, and easy to learn and memorize.

In Britain and the United States, a common telephone code keyboard for English input exists (see FIG. 2), and telephone code English input based on this keyboard is widely used in computer telephone services. In China, however, no one has yet proposed using a similar telephone code technique to solve the problem of Chinese input from a telephone keypad. On the basis of the intelligent phonetic-shape code Chinese input method, the invention designs a telephone code keyboard suitable for Chinese input (see FIG. 3 and FIG. 4) and a series of technical methods for Chinese input by telephone code. The invention further proposes that this technology can be applied widely in computer telephone services, and points out a number of specific applications.

At present, techniques for whole-sentence intelligent input (see "Chinese information newspaper", 1996.2, "an input method based on language understanding: intelligent pinyin input method") and for assisted character proofreading have been proposed at home and abroad, and products applying them have been released. Examples are the "self-communication input" software of the new technology company of Beijing Longguan Willd, the "intelligent input" software of black horse, and the "proofreading" software of black horse company. However, no one in China has yet proposed applying the essence of whole-sentence intelligent input and assisted proofreading directly to the many other fields of computer character input, including character proofreading, Chinese character recognition, Chinese voice input, letter zip codes replacing numeric zip codes in mailing addresses, letter telephone numbers replacing numeric telephone numbers, and realizing the function of a Chinese-display BP machine (pager) on an English/numeric BP machine. The present invention proposes many ways of applying the intelligent phonetic-shape code input method, whole-sentence intelligent input, and assisted proofreading.

The invention aims to provide an intelligent phonetic-shape code Chinese input method that is convenient, practical, and easy to learn and remember, together with a telephone code keyboard technology and a keyboard with additional initial-consonant and vowel keys for Chinese input. On the basis of this input method, various advanced Chinese input technologies can be applied in a wide range of fields. Specifically, the telephone code technology can be applied in computer telephone services, and the intelligent phonetic-shape code input method can be applied in many fields of computer character input, such as intelligent input, character proofreading, Chinese character recognition, Chinese voice input, letter zip codes replacing numeric zip codes in mailing addresses, letter telephone numbers replacing numeric telephone numbers, and realizing the function of a Chinese-display BP machine on an English/numeric BP machine.

The technical scheme of the invention is divided into a single-character input mode and a whole-sentence intelligent input mode. The single-character input mode is a phonetic-shape code input mode: each code consists of a phonetic (sound) code and a shape code. The coding rules are simple, easy to learn and remember, and yield short Chinese character codes. The input method offers several modes: five-code, four-code, three-code, reverse-order, shape-sound, shape-code, new five-stroke, and mixed. The five-code Chinese character code consists of three phonetic codes and two shape codes. The first phonetic code is the first letter of the initial consonant in Chinese pinyin, and the last two phonetic codes are the first and last letters of the final. Syllables with a zero initial are treated as finals only. (Note: some people classify syllables beginning with y [i-] and w [u-] as zero-initial, but this input method treats y and w as initials.) The two shape codes are obtained by splitting the Chinese character into parts according to the specified principles and taking the first pinyin letter of the names of the first part and the last part.

The whole-sentence intelligent input mode is based on intelligent understanding technology and assisted character proofreading technology. For input given in any of the coding schemes of the single-character mode, or in abbreviated schemes such as entering only the first code or first few codes of each character's basic phonetic code, it uses the Markov chain method to compute the Chinese sentence that satisfies the input and has the maximum probability, and outputs it. At the same time it offers several alternative sentences of highest probability and marks the places where the probability of error is high. Combined with character-and-word integration technology, the whole-sentence mode improves the accuracy of input, proofreading, and recognition, speeds up recognition and computation, and improves real-time performance and practical intelligent functions. It may also provide intelligent functions such as real-time Chinese character display, automatic memory, self-learning, adaptive modification of the word library to suit the user, adaptive selection of error categories for correction, adaptive selection and correction of the character-model library used in Chinese character recognition, and adaptation to the user's professional field.
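As a rough illustration of the Markov chain step in the whole-sentence mode, the following Python sketch picks the most probable character sequence for a series of input codes using a first-order (bigram) model and dynamic programming (Viterbi search). The candidate table, the probabilities, and the names `best_sentence`, `CANDIDATES`, `START`, and `TRANS` are all invented for the example; the text does not specify the model order, the probability estimates, or the search procedure.

```python
import math


def best_sentence(codes, candidates, start, trans, floor=1e-6):
    """Return the most probable character string for a list of input codes.

    candidates: code -> list of characters that the code may stand for
    start:      character -> probability of starting a sentence with it
    trans:      (previous char, char) -> transition probability
    floor:      small probability used for unseen events (an assumption)
    """
    # best[ch] = (log-probability, path) of the best sequence ending in ch
    best = {ch: (math.log(start.get(ch, floor)), [ch])
            for ch in candidates[codes[0]]}
    for code in codes[1:]:
        step = {}
        for ch in candidates[code]:
            score, path = max(
                (prev_score + math.log(trans.get((prev, ch), floor)), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[ch] = (score, path + [ch])
        best = step
    return "".join(max(best.values())[1])


# Toy data (hypothetical): each code is only the first phonetic letter.
CANDIDATES = {"z": ["中", "钟"], "g": ["国", "果"]}
START = {"中": 0.6, "钟": 0.4}
TRANS = {("中", "国"): 0.9, ("中", "果"): 0.1,
         ("钟", "国"): 0.2, ("钟", "果"): 0.8}

print(best_sentence(["z", "g"], CANDIDATES, START, TRANS))
```

Keeping the runner-up paths in `best` rather than only the maximum would be one way to offer the alternative candidate sentences that the text mentions.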

The invention also provides a telephone code keyboard technology suitable for Chinese input and a corresponding input method, and points out many applications of this keyboard and input technique in computer telephone services. The whole-sentence intelligent input mode can likewise be applied in many fields of computer character input, such as intelligent input, character proofreading, Chinese character recognition, Chinese voice input, letter zip codes replacing numeric zip codes in mailing addresses, letter telephone numbers replacing numeric telephone numbers, and realizing the function of a Chinese-display BP machine on an English/numeric BP machine.

In general, many input methods seek only to increase input speed (that is, to reduce the average code length per character) and to reduce the duplicate-code rate, while neglecting convenience, practicality, and ease of learning and memorization. Although on these two performance indexes such methods may approach or exceed the commonly used five-stroke (Wubi) and natural code methods, they offer no obvious improvement in convenience, practicality, or ease of learning and memorization. As a result, and because of users' established habits, the new methods are not accepted and are not applied in practice. The present invention overcomes this disadvantage. While keeping input speed and duplicate-code rate close to those of the five-stroke and natural code methods, the single-character input mode of the invention has obvious advantages over all of these methods in convenience, practicality, and ease of learning and memorization. The natural code method, invented a few years ago by Mr. Zhou Shinong, is slightly superior to the five-stroke method, mainly because it combines the advantages of phonetic codes and shape codes. The phonetic code of the natural code compresses Chinese pinyin into two letters according to a memory rule. Its shape code is formed from the first pinyin letters of the Chinese names of the first and second components of the character. Its average code length per character is therefore typically less than 4.0. The single-character input mode of the present invention is likewise a phonetic-shape code method that combines the advantages of phonetic codes and shape codes, but its rule for compressing pinyin is more convenient and practical than that of the natural code and other methods, and easier to learn and remember. Overall, the present invention therefore has many advantages over other methods.

Compared with the existing telephone code English input technology of Britain and the United States, the telephone code Chinese input technology of the invention is better suited to Chinese speakers, in particular those who use Chinese pinyin in mainland China and Singapore. It can also be combined with the whole-sentence intelligent input mode and with Chinese voice input, and can be applied widely in computer telephone services.

Compared with other intelligent input modes, the whole-sentence intelligent input mode of the intelligent phonetic-shape code input method is faster under the same conditions. (For example, under full spelling, the whole-sentence mode of the invention is much faster than the intelligent pinyin method: the average code length per character can be 1-2 codes shorter.) Combined with character-and-word integration technology, the whole-sentence intelligent input technology of the invention improves the accuracy of input, proofreading, and recognition, speeds up computation, and improves real-time performance and practical intelligent functions. The whole-sentence mode may also provide Chinese character display, automatic memory, self-learning, adaptive modification of the word library to suit the user, adaptive selection of error categories for correction, adaptive selection and correction of the character-model library used in Chinese character recognition, adaptation to the user's professional field, and other intelligent functions. It can also be applied widely in many fields of computer character input, such as intelligent input, character proofreading, Chinese character recognition, Chinese voice input, letter postcodes replacing numeric postcodes in mailing addresses, letter telephone numbers replacing numeric telephone numbers, and realizing the function of a Chinese-display BP machine on an English/numeric BP machine.

The drawings that accompany the present invention can be briefly described as follows.

FIG. 1 is the keyboard of the intelligent phonetic and shape code input method with additional initial consonants and vowels.

FIG. 2 is the common English-input telephone code keyboard used in Britain and the United States.

FIG. 3 is the (I) type Chinese input telephone code keyboard in the intelligent phonetic and shape code input method of the present invention.

FIG. 4 is the (II) type Chinese input telephone code keyboard in the intelligent phonetic-shape code input method of the present invention.

The single-character input mode of the intelligent phonetic-shape code input method is a phonetic-shape code Chinese character input method, called the phonetic-shape code input method for short. Each Chinese character code consists of a phonetic code and a shape code. The way the phonetic and shape codes are chosen is distinctive: compared with the Chinese character codes currently used in China, the rules are plain, simple, and easy to memorize, and the codes are short. A user with only basic knowledge of pinyin and of Chinese character structure can master the method in a short time. The method has five-code, four-code, three-code, reverse-order, shape-sound, four-shape-code, three-shape-code, new five-stroke, and mixed types, and the user may choose among them during input as the situation requires. For example, for a character in the secondary character library whose pronunciation is unknown, the four-shape-code, three-shape-code, or new five-stroke type can be used. The five-code type is the most important, convenient, and practical. Its Chinese character code consists of three phonetic codes and two shape codes. The first phonetic code is the first letter of the initial consonant in Chinese pinyin, and the last two phonetic codes are the first and last letters of the final. Syllables with a zero initial are treated as finals only. (Note: some people classify syllables beginning with y [i-] and w [u-] as zero-initial, but this input method treats y and w as initials.) The two shape codes of the five-code type are obtained by splitting the Chinese character into parts according to the specified principles and taking the first pinyin letter of the names of the first part and the last part. The sections below describe, in turn, the basic phonetic code, the basic shape code, the Chinese character codes used by each type, some extended and intelligent functions, and finally a performance analysis of the types.

The basic phonetic code of the phonetic-shape code input method is derived from the pinyin of Mandarin Chinese. The pinyin letter ü is represented by v in all input types of the method. Most initials in pinyin are written with a single letter; only ch, sh, and zh consist of two letters. The initial part of the basic phonetic code uses the single-letter initials as they are, and for the three two-letter initials takes only the first letter: ch = c, sh = s, zh = z. The final part of a pinyin syllable generally consists of 1-4 letters. Finals of 1-2 letters are used unchanged, such as a, i, ao, in, ai, an; for finals of 3-4 letters only the first and last letters are taken, for example iao = io, uai = ui, ang = ag, uang = ug, uan = un, ian = in. If the pinyin of a character has only a final and no initial, the basic phonetic code has only the final part, for example an = an, ai = ai; that is, zero-initial syllables are treated as finals only. (Note: some people classify syllables beginning with y [i-] and w [u-] as zero-initial, but this input method treats y and w as initials.) The basic phonetic code is therefore at least one letter and at most three letters long, for example: "double" shuang = sug, piao = pio, "original" yuan = yun, "shop" dian = din. (Note: the two groups of initials c, s, z and ch, sh, zh are merged and represented by the letters c, s, z. This also helps speakers from dialect regions who cannot distinguish the two groups.)
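The compression rule for the basic phonetic code can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `basic_phonetic_code` and the `INITIALS` table are assumptions, and the input is assumed to be a single toneless pinyin syllable with ü typed either as ü or as v.

```python
# Pinyin initials; two-letter initials come first so they are matched first.
# y and w are treated as initials, as the text specifies.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l", "g",
            "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def basic_phonetic_code(pinyin: str) -> str:
    """Compress one toneless pinyin syllable to a code of 1-3 letters."""
    syllable = pinyin.lower().replace("ü", "v")
    initial = ""
    for candidate in INITIALS:
        if syllable.startswith(candidate):
            initial = candidate
            break
    final = syllable[len(initial):]
    # Finals of 3-4 letters keep only their first and last letters.
    if len(final) > 2:
        final = final[0] + final[-1]
    # Two-letter initials keep only their first letter (zh -> z, etc.).
    return initial[:1] + final


for syllable in ("shuang", "piao", "yuan", "dian", "an", "ai"):
    print(syllable, "=", basic_phonetic_code(syllable))
```

The loop reproduces the examples given in the text (sug, pio, yun, din, an, ai).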

The Chinese character library in the computer is divided into a first-level library and a second-level library. The first-level library contains common characters; the second-level library contains uncommon characters, most of which are complex characters with many strokes. In view of this, the basic shape code of the phonetic-shape code input method consists of two shape codes determined by the structure of the character. Structurally, Chinese characters are either single-body characters or compound characters. A single-body character is an indivisible whole and cannot be analyzed further. A compound character is composed of two or more components, some of which are themselves single-body characters and some of which are radicals or strokes that do not stand alone as characters. The phonetic-shape code input method splits every character, whether single-body or compound, into parts. The two basic shape codes are determined from the first stroke and the last stroke, following the stroke order of Chinese writing (left to right, top to bottom, outside to inside, and middle before the two sides).

Character splitting follows these principles:

1. Single-character principle: when the first or last stroke and its related strokes form a single-body character, the first pinyin letter of that character is taken as the shape code. For example, the first stroke of "sorrow" (悲) and its related strokes form the single-body character "not" (非), giving the shape code f; the last stroke and its related strokes form the single-body character "heart" (心), giving x. The basic shape code of "sorrow" is therefore fx.

2. Radical principle: when the first or last stroke and its related strokes form only a radical (component), the first pinyin letter of the radical's name (see appendix two) is taken as the shape code. For example, the first stroke of "river" (江) and its related strokes form a radical whose Chinese name is "three-point water", so the first pinyin letter s of "three" is taken as the shape code; the last stroke and its related strokes form the single-body character 工, coded g. The shape code of "river" is sg. As another example, the first stroke of "typing" (打) and its related strokes form the radical called "beside the handle", coded t; the last stroke and its related strokes form the single-body character 丁, coded d. The shape code of "typing" is td.

3. Stroke principle: when the first or last stroke and its related strokes form neither a single-body character nor a radical, the first pinyin letter of the stroke's name (dot, horizontal, vertical, left-falling, right-falling, turning, and so on) is taken as the shape code (see appendix two). For example, the first stroke of "small" (小) is a vertical hook and the last is a dot, giving sd; the first stroke of "grief" is a dot and the last is a right-falling stroke, giving dn; for "Chuan" the vertical stroke is used.

4. Take-the-larger principle: when splitting a character and choosing parts, take the larger part rather than the smaller one, and the familiar, common part rather than an archaic or rare one. For example, the first stroke of "forest" (森) and its related strokes form the single-body character "wood" (木), coded m. The last stroke belongs both to "wood" (木) and to "forest" (林) (see principle 1). By the take-the-larger principle, the last part is taken as 林 rather than 木, and the shape code of 森 is ml.

5. Homophone avoidance principle: if the shape code of a part is the same as the first letter of the character's phonetic code, the part is split further to avoid a repeated code. For example, by the take-the-larger principle "move" would split into "a slice" + "like", but "like" has the same sound as "move", so the second part is split further and the shape code of "move" is ty.

6. Same-shape avoidance principle: when the second shape code obtained by the above principles is the same as the first, and the character can be split into three or more parts, the shape codes of the first and second parts are taken. If it can only be split into two parts, the shape code is left unchanged. For example, under the first five principles the basic shape code of one character would be ww, from two "king" (wang) parts; under the sixth principle the first two parts, "wang" + "dian", are taken instead, giving wd. By contrast, a character that can only be split into two parts, "wood" and "wood", keeps the shape code mm.
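Once a character has been split into named parts, selecting the two basic shape codes is mechanical. The Python sketch below covers only that selection step and the same-shape avoidance rule. The function name `shape_code` and the pinyin part names passed to it are assumptions made for illustration; the splitting itself (and the homophone avoidance rule, which depends on how a part can be split further) would need the component-name table of the appendix, which is not reproduced here.

```python
def shape_code(part_names: list[str]) -> str:
    """Return the two-letter basic shape code for a character.

    part_names holds the pinyin names of the character's parts in
    writing order, for example ["fei", "xin"]. These names are supplied
    by the caller; this sketch does not split characters itself.
    """
    if len(part_names) < 2:
        raise ValueError("expected at least a first and a last part")
    first = part_names[0][0]
    last = part_names[-1][0]
    # Same-shape avoidance: with three or more parts, use the first and
    # second parts when the first and last would give the same letter.
    if first == last and len(part_names) >= 3:
        last = part_names[1][0]
    return first + last


print(shape_code(["fei", "xin"]))            # "sorrow" in the text
print(shape_code(["sandianshui", "gong"]))   # "river" in the text
print(shape_code(["wang", "dian", "wang"]))  # same-shape avoidance case
```

The third call uses a made-up three-part list to trigger the avoidance rule; the text names only the first two parts, "wang" + "dian".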

The Chinese character code of the five-code input method consists of at most three phonetic codes and two shape codes: the three phonetic codes are the basic phonetic code, and the two shape codes are the basic shape code.

The Chinese character library in the computer is divided into a first-level library and a second-level library. The first-level library contains common characters and the second-level library contains uncommon characters. Most characters in the second-level library are complex, with many strokes, and ordinary users find their pronunciation hard to determine. For this reason, the five-code input method additionally provides a four-shape-code input method for characters in the second-level library, and either of the two methods can be used to input those characters.

The four-shape-code input method uses only four shape codes and no phonetic codes, so it can be used by people who do not know a character's pronunciation. Its character-splitting method is as follows:

Each character is broken down into four parts. First, two parts and their codes are determined by the basic shape-code splitting principles. Then the larger part, that is, the part with more strokes, is split into two parts by the same principles and its codes are determined. For example, for "Gal" the second step splits "add" into "force" + "opening" (lk), and the four-shape code of the character is djlk. If the two parts from the first split are of equal size, the latter part is split in the second step.

The Chinese character code of the four-code input method is at most two phonetic codes plus two shape codes. If the basic phonetic code has two letters or fewer, the phonetic code of the four-code method is the basic phonetic code itself; if it has more than two letters, the first two are used. The two shape codes are the basic shape code. The four-shape-code input method described above also applies to this section.

The Chinese character code of the three-code input method is generally one phonetic code plus two shape codes. The phonetic code is the first letter of the basic phonetic code, and the shape code is the basic shape code. The three-shape-code input method is the four-shape-code input method described above with the third shape code omitted, leaving three shape codes.

The reverse-order input method consists of two shape codes followed by three phonetic codes. It is mainly suited to UCDOS and WINDOWS95, which provide step-by-step prompting. The basic shape code (two shape codes) is entered first, followed by the basic phonetic code (up to three phonetic codes), that is, in the opposite order to the five-code type. Compared with the five-code type it is faster, but less natural and convenient. The four-shape-code input method of section 2.4 applies equally to this section.
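The five-code, four-code, three-code, and reverse-order types differ only in how the basic phonetic code and the basic shape code are truncated and ordered, which the following Python sketch makes explicit. The function names are hypothetical, and the shape code "yy" paired with the phonetic code "sug" is an illustrative assumption rather than a value taken from the text.

```python
def five_code(phonetic: str, shape: str) -> str:
    """Up to three phonetic letters followed by the two shape letters."""
    return phonetic[:3] + shape[:2]


def four_code(phonetic: str, shape: str) -> str:
    """At most the first two phonetic letters, then the two shape letters."""
    return phonetic[:2] + shape[:2]


def three_code(phonetic: str, shape: str) -> str:
    """Only the first phonetic letter, then the two shape letters."""
    return phonetic[:1] + shape[:2]


def reverse_order_code(phonetic: str, shape: str) -> str:
    """Shape letters first, then the basic phonetic code."""
    return shape[:2] + phonetic[:3]


phonetic, shape = "sug", "yy"  # "sug" is from the text; "yy" is assumed
for build in (five_code, four_code, three_code, reverse_order_code):
    print(build.__name__, build(phonetic, shape))
```

Shorter types trade fewer keystrokes for more characters sharing a code, which is presumably why the text offers several types and lets the user choose.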

The shape-sound input method generally consists of one phonetic code and three shape codes. The phonetic code is the first letter of the basic phonetic code. The three shape codes are obtained as follows: first, two parts and their codes are determined by the basic shape-code splitting principles; then the part with more strokes is split into two parts by the same principles, and only the code of its last part is taken, so that the three codes form the three shape codes. (If the two parts from the first split are of equal size, the latter part is split.) For example, "example" (例) can be split into 亻 and 列; 列 is then split again and only its last part, 刂, is taken, so the shape codes of "example" are dll. Similarly, "guo" can be split into "you" and a second part; "you" is then split into "dian" and "son", of which only "son" is taken, giving the code xez. (Note: the shape-code part of the shape-sound input method may be similar to the three-shape-code input method described above.)

The shape-sound input method uses a new five-stroke input method for characters in the secondary character library. The character is split into components as in the five-stroke (Wubi) method, the first three components and the last component are taken, and the new five-stroke code is generated from them by the method used for basic shape codes. For example, for "carve" (剞, pronounced ji), the Wubi code is dskj, corresponding to the components 大, 丁, 口, 刂, and the new five-stroke code is ddkl.

The phonetic-shape code input method also has various mixed types, in which some or all of the types above are used together. In a mixed type, the user may enter a character by any of the included methods, and each will produce the correct character. This can suit users with a variety of needs. However, because the mixed types are complex and require too many files, they are not currently recommended.

The phonetic-shape code input method provides several fault-tolerance functions. Besides the universal-key or fault-tolerant-key functions available under WINDOWS95, WINDOWS3.1 and UCDOS, it has its own multi-coding fault tolerance: when a Chinese character has several natural splitting methods, any one of them can be used for input. In addition, the method provides special fault-tolerant keys. Because the letters i, u and v never appear as the first letter of a pinyin syllable, they are never used as the first sound code or as any shape code of a character code in the phonetic-shape code input method. When the user cannot determine the first sound code or a shape code of a Chinese character, that code can be replaced by a fault-tolerant key (also called a fuzzy key or neutral key), namely one of the letters i, u and v. There are three such special fault-tolerant keys. The first is the letter "i", which stands for a shape code whose type cannot be determined. The second and third are the letters "u" and "v", which stand for shape codes of two determined types, respectively.

In addition, hundreds of characters can be entered in a simplified mode in the phonetic-shape code input method. Each such simplified-code character requires only two letter codes: the first sound code of the character's basic sound codes and the first shape code of its basic shape codes.

First, the fault-tolerance function: when a Chinese character has two or more natural splitting methods, any one of them can be used.

Second, the full-spelling function: when inputting the sound code, the sound code can be entered as the full pinyin spelling.

Third, the word-formation function: simplified input of phrases.

Fourth, the function for adding a new word stock: (omitted.)

Fifth, the function for selecting an installation: (omitted.)

The phonetic-shape code input method has most of the advanced functions of the Chinese input methods under WINDOWS95, WINDOWS3.1 and UCDOS.

The basic word-formation rules of the single-character input mode of the intelligent phonetic-shape code input method are as follows:

(I) Two-character words: the code is P11 + P21 + P12 + P22, where Pij denotes the j-th code of the i-th character. For example, "hello" (ni hao) is coded as nhia.

(II) Three-character words: the first letters of the three characters are joined. For example, "Communist Party" (gong chan dang) is coded as gcd.

(III) Words of four or more characters: the first letters of the first three characters plus the first letter of the last character. For example, "a thousand li in one day" (yi ri qian li) is coded as yrql, and "Marxism-Leninism" is coded as mksy.

The word-formation rule of the five-code input method is an extended word-formation rule:

(I) Two-character words: the code is P11 + P12 + P21 + P22 (either this rule or the two-character rule of the basic word-formation method may be used). For example, "hello" (ni hao) is coded as niha.

(II) Words of three or more characters: the same as in the basic word-formation method.

The word-formation rule of the three-code input method takes the first, second and third codes produced by the basic word-formation method. All other input methods adopt the basic word-formation rules.
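The basic and extended word-formation rules above can be sketched in a few lines of Python. This is only an illustration: the function name `phrase_code` and the `rule` parameter are hypothetical, and plain pinyin spellings stand in for the real per-character codes, which is an assumption that happens to reproduce the examples in the text.

```python
def phrase_code(char_codes, rule="basic"):
    """Build a phrase code from one code string per character.

    Assumption: each entry of `char_codes` is the code of one character,
    with plain pinyin used here as a stand-in for the real code.
    """
    n = len(char_codes)
    if n < 2:
        raise ValueError("a phrase needs at least two characters")
    if n == 2:
        first, second = char_codes
        if len(first) < 2 or len(second) < 2:
            raise ValueError("two-character rule needs two codes per character")
        if rule == "basic":
            # P11 + P21 + P12 + P22
            return first[0] + second[0] + first[1] + second[1]
        if rule == "extended":
            # P11 + P12 + P21 + P22 (five-code type)
            return first[:2] + second[:2]
        raise ValueError("unknown rule: " + rule)
    if n == 3:
        # First letter of each of the three characters
        return "".join(code[0] for code in char_codes)
    # Four or more characters: first three first-letters plus the last one
    return "".join(code[0] for code in char_codes[:3]) + char_codes[-1][0]


print(phrase_code(["ni", "hao"]))                    # basic rule
print(phrase_code(["ni", "hao"], rule="extended"))   # five-code rule
print(phrase_code(["gong", "chan", "dang"]))
```

For words of three or more characters the `rule` argument has no effect, matching the statement that the extended rule differs from the basic one only for two-character words.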

The five-code type of the phonetic-shape code input method, also called the convenient type, is the most convenient, practical and easily learned mode, and is very easy to popularize. However, its code length may be 2, 3, 4 or 5, with a maximum of 5, which is slightly longer than the three-code type (code length 3) and the four-code type (code length 4). It is slightly less efficient than the reverse-sequence type, but more natural and convenient. Its duplicate-code rate over the character set is about 8%, which is higher than that of the shape-sound type (whose duplicate-code rate can be as low as about 1%) but lower than that of the three-code type (whose duplicate-code rate is as high as 20%-50%). It is a method of moderate performance suited to the majority of users, especially beginners.

The three-code type has the shortest code length, which in practice almost reaches the limit, so an improved input method with shorter codes can hardly be provided. Its duplicate-code rate is also not too high: about 50% of characters require no selection. Most of the remaining characters have 10 or fewer duplicates, in particular the duplicate-code characters of the first-level national standard character set, and also those of the first and second levels combined. Therefore, most common characters can be selected directly in one step without turning pages. The method is particularly suitable for computer use, where Chinese characters can be entered quickly and interactively with the help of the screen. Each character generally needs at most four keystrokes, and a large number of characters need three or fewer. It requires the fewest keystrokes and gives the fastest input.

The four-code type generally has a code length of 4, and its maximum code length is the same as that of the five-stroke (Wubi) method. It is simple and easy to learn: its sound code is the first two codes of the five-code type's sound code, and its shape code is the same as that of the five-code type. Its duplicate-code rate is only slightly higher than that of the five-code type, and its convenience is similar, but its maximum code length is shorter, so performance is slightly improved. The method is convenient, and important indicators such as its maximum code length and duplicate-code rate lie between those of the five-code and three-code types. Its performance is moderate and it is suitable for popularization; it could be tried out in the market.

The reverse-sequence type reverses the input order of the sound code and shape code of the five-code type: the two basic shape codes are entered first, followed by the three basic sound codes. Because the five-code type enters the sound code first, candidates are prompted progressively as the sound code is typed, and the character can be chosen with a selection key without entering the shape code, which avoids the case where a character is difficult to split. With the reverse-sequence type the shape code is entered first, so the character must always be split; this is harder and less natural, and in this respect the reverse-sequence type is less convenient than the five-code type. However, since shape codes discriminate between characters more efficiently than the last two sound codes, usually only one character with that code remains in the character set after the first four codes of the reverse-sequence type have been entered. The space key or enter key can then be pressed to input the character. The average code length per character is therefore greatly shortened, to roughly four rather than five, so the reverse-sequence type also has its own advantages.

The shape-sound type of the phonetic-shape code input method has code lengths of 2, 3 and 4, with a general maximum of 4. Its code length lies in the middle, between the three-code type (code length 3) and the five-code type (code length 5). Its performance is moderate, and its code length is the same as that of the well-known five-stroke (Wubi) method commonly used by some users. Its duplicate-code rate is very low, about 1%, and it is more convenient and easier to learn than Wubi, since only a few simple rules need to be learned; it is therefore an input method with excellent performance. However, three shape codes must be entered, and splitting characters is harder and takes longer than in the three-code, four-code, five-code and reverse-sequence types, so this is a phonetic-shape code input method that places more weight on the shape codes. It is therefore harder to learn and harder to popularize than the three-code, four-code, five-code and reverse-sequence types.

First, the pinyin vowel ü, whether used alone or in combination with other letters, can be represented by either u or v. Schemes can also be designed in which it is represented only by u or only by v. Thus there are several possible representations of the pinyin ü. Among them, the best scheme is the one in which either u or v may be used.

In addition, since the letters i, u and v never appear as the first letter of the pinyin of a Chinese character, they are never used as the first sound code or as any shape code of a character code. The letter keys i, u and v can therefore serve as special fault-tolerant keys for the first sound code and for each shape code (see § 2.11), where i represents a code of indeterminate type, u represents a code of the first type, and v represents a code of the second type.

The keyboard with additional initials and finals is shown in figure 1. Among the initials of Chinese pinyin, all can be typed directly with letter keys on the standard keyboard except the three initials ch, sh and zh. Among the finals and zero-initial syllables of Chinese pinyin, besides a, e, i, o, u and ü, which can be represented on a standard keyboard (where ü is represented by the letter key u or v), there are many that cannot be typed directly with a single letter key: ai, an, ang, ao, ei, en, eng, er, ia, ian, iang, iao, ie, in, ing, iong, iu, ong, ou, ua, uai, uan, ün, üan, uang, üe, ui, un, etc. (Note: some people treat syllables beginning with y [i-] and w [u-] as zero-initial syllables; the present input method treats y and w as initials rather than as zero initials.)

For all the initials, finals and zero-initial syllables that cannot be typed directly on the standard keyboard, and for the pinyin vowel ü, an additional initial-and-final keyboard consisting of the corresponding keys is made. Once the standard keyboard and the additional initial-and-final keyboard are connected to the computer, any initial, final or zero-initial syllable of Chinese pinyin can be entered directly with a single key, and the pinyin of any Chinese character can be entered with at most two keystrokes. The keyboard is simple and convenient to use, intuitive and practical, easy to learn and remember, and gives a high input speed. It is therefore a highly practical Chinese input device for computers.
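The "at most two keystrokes per syllable" property can be illustrated with a small sketch that splits a pinyin syllable into an initial key and a final key. The function names are hypothetical, and the initial list follows the text in treating y and w as initials; the actual key assignment of the keyboard in figure 1 is not reproduced here.

```python
# Two-letter initials are listed first so that "zh" is tried before "z".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h",
    "j", "q", "x", "r", "z", "c", "s", "y", "w",
)


def split_syllable(syllable):
    """Return (initial, final); the initial is "" for zero-initial syllables."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def key_presses(syllable):
    """List the keys pressed on a keyboard with one key per initial and per final."""
    initial, final = split_syllable(syllable)
    return [key for key in (initial, final) if key]


for s in ("zhuang", "an", "yi", "shi"):
    print(s, key_presses(s))
```

Each syllable yields one key for the initial (if any) and one for the final, so no syllable needs more than two key presses under this assumption.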

In Britain, the United States and similar countries, a telephone-code keypad for English input is in common use (see fig. 2 for details), and the technology of telephone-code English input based on such a keypad has been widely used in various fields of computer telephony services. In China, however, no one has so far proposed solving the problem of telephone-code Chinese input with a similar technology. On the basis of the intelligent phonetic-shape code Chinese input method, the invention designs a telephone-code keypad suitable for Chinese input and a series of methods for telephone-code Chinese input. The invention also proposes that the technology can be widely applied in various computer telephone service fields, and points out a number of specific application methods.

Compared with the existing telephone-code English input technology used in Britain and the United States, the telephone-code Chinese input technology of the invention is better suited to Chinese speakers, in particular to people in mainland China and Singapore who use Chinese pinyin. The technology can also be combined with the whole-sentence intelligent input mode and with voice-input applications, and can be widely applied in various computer telephone service fields.

As shown in figs. 3 and 4, on a conventional 12-key telephone keypad, zero, three or four letters or symbols are printed in sequence on each of the nine numeric keys 1-9. In total, the 26 English letters and some symbols are printed, for example the pinyin vowel "ü" and related symbols. The letter v may be replaced by the pinyin vowel ü or by the corresponding symbol. The Chinese-input telephone-code keypad can take many forms; figs. 3 and 4 show two forms that are useful for Chinese input. Fig. 3 is the type (I) Chinese-input telephone-code keypad, and fig. 4 is the type (II) Chinese-input telephone-code keypad.

When inputting Chinese characters, the sound codes are first determined from the Chinese pinyin, the shape codes are then determined according to the defined principles, and the characters are entered using these codes. It is also possible to enter only the first code or the first few codes. With the input method of the invention, the code length is greatly reduced and the duplicate-code rate is lowered. The method is also simple, convenient, intuitive and practical, and easy to learn and remember. When Chinese is input with the telephone-code technology, the letter code composed of the 26 letters is first determined by the specially defined method, the corresponding telephone digit code is then entered by pressing the corresponding number keys, and the matching Chinese options are finally looked up from these codes. When duplicate codes exist, the system plays the relevant recordings and asks the user to choose interactively, so that the user can select the intended personal name, phrase or other Chinese option. To reduce the amount of interaction needed for playing recordings, an input method with a low duplicate-code rate should be used. Because there are only tens of thousands of commonly used Chinese phrases, only some of them are typically needed when Chinese is input with the telephone-code technology. Therefore, the phrase letter code formed by joining the first few codes of each character can be determined according to the word-formation rules in § 2.12, and the corresponding number keys are then pressed.
When the items to be queried in the database use only some of the Chinese phrases, for example fewer than 30000 phrases, the four-code phrase code is used; it has about 10000 possible values, so the number of duplicates per code is generally less than 10. The total amount of interactive selection by playing recordings is then acceptable. If a small database is used (as in many practical applications), with only a few thousand Chinese phrases or fewer, the four-code phrase code produces almost no duplicates, so that recordings hardly ever need to be played for interactive selection. In the corresponding software, a database can be established in which each record has a Chinese phrase field to be queried, a four-code telephone phrase-code field composed of the nine digits 1-9, and some corresponding information fields. Each time a telephone code is received through a telephone voice card and the corresponding software, database query techniques are used to find all records whose four-code telephone phrase code matches the input. After duplicates are resolved by playing recordings, the record being sought and its information are identified unambiguously, and the information is finally played to the user over the telephone. In this way the Chinese-input telephone-code technology supports Chinese input, database query, interactive selection and similar applications, and can therefore be widely applied in various computer telephone service fields.
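The lookup flow just described (letter code, then digit code, then database query, then interactive choice among duplicates) can be sketched as follows. This is a minimal illustration under stated assumptions: the standard English keypad layout stands in for the type (I) and type (II) layouts of figs. 3 and 4, which are not reproduced in the text, and the sample records and function names are hypothetical.

```python
from collections import defaultdict

# Assumption: stand-in layout (standard English telephone keypad),
# not the Chinese-input layout of fig. 3 or fig. 4.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def to_digits(letter_code):
    """Convert a letter code such as 'nhia' into the keys pressed on the keypad."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in letter_code.lower())


def build_index(records):
    """Index (phrase, four-letter phrase code, info) records by digit code."""
    index = defaultdict(list)
    for phrase, letter_code, info in records:
        index[to_digits(letter_code)].append((phrase, info))
    return index


def lookup(index, digits):
    """Return all candidates; more than one means interactive selection is needed."""
    return index.get(digits, [])


# Hypothetical records: two different letter codes that collide on the keypad.
records = [
    ("phrase A", "nhia", "info for A"),
    ("phrase B", "mgia", "info for B"),
    ("phrase C", "gcda", "info for C"),
]
index = build_index(records)
print(lookup(index, to_digits("nhia")))   # two candidates: a duplicate code
print(lookup(index, to_digits("gcda")))   # one candidate: no selection needed
```

When `lookup` returns several candidates, a real system would play a recording for each and let the caller choose with a key press, as the text describes.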

First, the Chinese-input telephone-code technology described above can be applied to an automatic message service for pagers (BP machines). At present, the 127 automatic paging service has been opened in some cities in China. However, the 127 service can only page automatically and cannot take messages automatically. With the Chinese-input telephone-code technology, after the automatic-message function number and the pager number have been entered, a short message can be entered with the corresponding telephone codes (for example, the caller's surname, the caller's name, a brief message, a time, a place, etc.; for the detailed codes, refer to the coding scheme given at the end). The paging station automatically recognizes the incoming message and sends a corresponding signal, which is displayed on the user's pager. Numeric pagers, alphanumeric English pagers and Chinese-display pagers can all use the technology. Its application can greatly raise the level of automated service in the paging industry.

Second, the Chinese-input telephone-code technology described above can be applied to an unattended automatic management system for the 114 directory-enquiry service. The name of the person (or organization) whose telephone number is sought can be entered on the telephone using the Chinese-input telephone-code technology; after interactive selection, the system automatically finds the number and plays a recording to report it to the user. This saves 114 directory-enquiry staff, reduces labor and expense, and raises the level of automation. Furthermore, the telephone-code technology can be used to implement an unattended switchboard system that transfers calls automatically by person name or organization name. The technology can first be applied to the large number of extension switchboards in China, establishing extension switchboard systems in which a call is connected automatically according to the name of the person or organization entered by the caller, in place of a telephone operator. For example, to call the second workshop of Shougang (Capital Steel), the caller first dials the Shougang switchboard and then enters the telephone code of the three-character phrase "second workshop"; the system automatically connects the call to that workshop.

Similar to the use of letter postcodes instead of numeric postcodes, the above technology can be used to create a switchboard system that uses letter telephone numbers instead of numeric telephone numbers and transfers calls automatically by person name (or organization name). This makes possible the advanced function that, when a user moves or changes telephone number, other people can still reach the user without being notified. It also makes possible functions such as keeping a telephone number confidential, connecting only at limited times, changing a hotline number, and leaving a hotline number or voice mailbox while away on business. A telephone can also be produced that accepts Chinese input, either through the above Chinese-input telephone-code technology or directly through a Chinese pinyin keyboard: the user enters the letter telephone number directly, and the electronics in the telephone automatically dial the numeric telephone number corresponding to it. A small display can be installed on the telephone so that the user can watch it while entering the number and correct errors at any time. After confirmation, the corresponding numeric telephone number is finally dialed. The letter telephone number corresponding to a numeric telephone number can be entered and defined by the user. Such a telephone would be very practical and should have a market. The function can be implemented on a telephone connected to a computer using computer technologies such as telephone voice cards, or a cheaper dedicated telephone can be produced.

Long-distance area codes based on letter telephone codes, using the Chinese-input telephone-code keypad technology, may be adopted first. The new letter-code long-distance area codes are of two kinds. The first is a short letter-code area code consisting of three digits: the first two digits are the Chinese-input telephone-code digits corresponding to the first pinyin letters of the first two characters of the Chinese name of the area being called, and the third digit is a duplicate-code serial number. Short letter-code area codes can be applied to dozens or hundreds of major cities in China. For example, according to the telephone-code rule of the type (I) Chinese-input telephone-code keypad shown in fig. 3, the short letter-code area code of Beijing may be 251 or 250, and that of Hohhot in the Inner Mongolia Autonomous Region may be 33 plus a third digit giving the duplicate-code serial number. Such codes are very simple to use and remember. The second is a long letter-code area code, which is generally applied to smaller areas within each large region. Areas for which the first kind is not available will typically be able to use the second kind.
In the second kind, the first two digits are the Chinese-input telephone-code digits corresponding to the first pinyin letters of the first two characters of the Chinese name of the large region (province, municipality, autonomous region, etc.) in which the small area is located; the third and fourth digits are the telephone-code digits corresponding to the first pinyin letters of the first two characters of the Chinese name of the small area itself; and the fifth digit, or the fifth and sixth digits, give the duplicate-code serial number. With fewer than 10 duplicates, a five-digit number can be used; with 10 or more but fewer than 100 duplicates, a six-digit number can be used. Typically there are no more than 10 duplicates, and in any case no more than 100. Generally, the serial numbers may be assigned according to the total number of installed telephones in each area, with areas having more installed telephones placed first. For example, the long letter-code area code of Erenhot (Erlianhaote) in the Inner Mongolia Autonomous Region may be 6635 plus a duplicate-code serial number. Long-distance area codes of this kind, at most six digits long, are fully acceptable. As long as telephones everywhere use the type (I) Chinese-input telephone-code keypad shown in fig. 3 and the corresponding telephone-code technology, letter-code long-distance area codes can be used and remembered simply and conveniently, saving much effort in memorization.
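The construction of the short and long letter-code area codes can be sketched as below. The sketch is illustrative only: the function names are hypothetical, the standard English keypad is used as a stand-in for the type (I) layout of fig. 3 (which is not given in the text), and the pinyin syllables passed in are assumptions about the place names. Under this stand-in layout the initials b, j give 25 and the initials n, m, e, l give 6635, which agrees with two of the digit strings quoted above.

```python
# Assumption: stand-in layout, not the actual type (I) keypad of fig. 3.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


def initials_to_digits(syllables, count=2):
    """Digits for the first pinyin letters of the first `count` characters."""
    return "".join(LETTER_TO_DIGIT[s[0]] for s in syllables[:count])


def short_area_code(place, serial):
    """Two digits from the place name plus a one-digit duplicate serial number."""
    return initials_to_digits(place) + str(serial)


def long_area_code(region, area, serial):
    """Two digits for the large region, two for the small area, then the serial."""
    return initials_to_digits(region) + initials_to_digits(area) + str(serial)


print(short_area_code(["bei", "jing"], 1))
print(long_area_code(["nei", "meng", "gu"], ["er", "lian", "hao", "te"], 0))
```

Passing a two-digit serial such as `"07"` to `long_area_code` yields the six-digit form described for areas with 10 or more duplicates.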

It is suggested that the letter code corresponding to the long-distance area code be defined, by convention, to be the same as the letter postcode of the area. This can be done as long as the specified principles are consistent, and it makes the codes still more convenient and practical.

Third, the telephone-code technology described above can help establish an LR coding system for valuable and easily lost items, together with a telephone loss-report inquiry system based on this coding and on the telephone-code technology. One or more registration centers may be established, where an LR number for loss reporting can be applied for and registered for any valuable and easily lost item. The registration center maintains a corresponding data archive, assigns LR numbers sequentially to all applicants, and records each approved LR number together with information about the applicant, the owner of the item and so on, ensuring that each item uses only one LR number and that no two different items share a number. The registration center or the owner then commissions a dedicated organization to print, write or engrave the LR number on the item. When the item is lost, the owner can telephone the loss-report inquiry system established in each locality and report the LR number and other relevant information. In the loss-report inquiry system, the telephone-code technology can be used to build an unattended automatic management system in which Chinese input, digit input and interactive selection are handled automatically. A data archive is also kept of all LR numbers currently in the loss-reported state: each time someone reports a loss, a record with the corresponding LR number is added to the database, and after the item is found or the report is withdrawn, the record is removed. When someone purchases an item, or suspects that an item may be stolen goods, and the item carries a printed, written or engraved LR number, a telephone call can be made to the loss-report inquiry system to ask whether the item has been reported lost.
If the item has been reported lost, appropriate measures can be taken to protect the owner from loss and to help the owner find clues about the item and identify the offender as soon as possible. If not, the purchase can proceed with confidence, or no further investigation is needed. This protects the interests of the general public, particularly the owner of the lost item and the merchant who purchases it. The system can make the public feel safer, stabilize society, deter crime and serve the public, and is of great benefit to the nation and the people.
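The bookkeeping behind the LR registration center and the loss-report inquiry system can be sketched as a small in-memory registry. Everything here is a hypothetical illustration: the class name, the method names and the `LR000001` number format are assumptions, since the text does not specify how LR numbers are formatted or stored.

```python
class LossRegistry:
    """Minimal sketch of LR registration plus loss-report bookkeeping."""

    def __init__(self):
        self._items = {}        # LR number -> (owner, item description)
        self._reported = set()  # LR numbers currently reported lost

    def register(self, owner, item):
        """Assign the next sequential LR number; each item gets exactly one."""
        lr_number = f"LR{len(self._items) + 1:06d}"  # format is an assumption
        self._items[lr_number] = (owner, item)
        return lr_number

    def report_loss(self, lr_number):
        """Add a loss record; only registered numbers can be reported."""
        if lr_number not in self._items:
            raise KeyError(lr_number)
        self._reported.add(lr_number)

    def cancel_report(self, lr_number):
        """Remove the record once the item is found or the report is withdrawn."""
        self._reported.discard(lr_number)

    def is_reported_lost(self, lr_number):
        """Answer the query a buyer would make by telephone."""
        return lr_number in self._reported


registry = LossRegistry()
watch = registry.register("owner A", "watch")
bike = registry.register("owner B", "bicycle")
registry.report_loss(watch)
print(registry.is_reported_lost(watch), registry.is_reported_lost(bike))
```

In a telephone deployment, the LR number would be entered with the digit keys and the yes/no answer played back as a recording.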

Fourth, for database query systems of all kinds, the telephone-code technology can be used to query the database by telephone. Combined with internet technology, it makes possible a convenient and inexpensive network system, suited to conditions in China, that can be accessed by telephone.

Finally, the telephone-code technology described above allows remote control and remote operation of a computer by telephone. The operation is convenient and practical, the interface is friendly, and the method is well suited to Chinese users. Furthermore, combining the telephone-code technology with internet technology makes it possible to query information on the internet by telephone. Further advances in telephony, such as videophone technology, will make the telephone a convenient, inexpensive and practical terminal device that can be connected to various networks.

A fast Chinese character input method based on the understanding of whole sentences is introduced below. The method uses context correlation to convert short codes of Chinese characters into Chinese characters automatically. The user only needs to enter the short codes, without selecting characters manually; the system automatically produces the corresponding characters according to the context and dynamically adjusts the result over the whole sentence as the input changes, thereby keeping the sentence correct at all times. The method reduces the number of keystrokes, essentially achieves touch typing, greatly increases the speed of Chinese input, and can be mastered with almost no learning or training. (See reference [7].)

Inputting Chinese characters into a computer is a long-standing hard problem. Although nearly a thousand Chinese input methods (coding schemes) have been proposed at home and abroad, none yet allows Chinese characters to be entered quickly while also being easy to master. Existing methods fall broadly into two categories: codes built around the strokes and shapes of characters, and codes based on pronunciation (pinyin). Each has its own characteristics. The former is based on splitting characters; a typical example is the five-stroke (Wubi) code. Such methods have few duplicate codes, give high input speed and allow touch typing, and are therefore suitable for professional typists. However, they require a great deal of memorization (for example, for Wubi, 227 character components and many input rules) and long training, which makes them hard for ordinary people to master. Even after learning such a method, a user who does not use it for some time easily forgets it. Moreover, the user must think about how to split each character while typing, which does not match the way people use language, so that composing directly at the keyboard without a draft is not really possible. The latter category can be mastered by anyone who can spell pinyin, as many people can. Furthermore, because speech is the most natural, convenient and effective way for people to convey information, pronunciation-based input matches people's language habits and makes draft-free composition easy. Therefore, despite the slow speed of the current pronunciation-based methods, many people still use them; but because they are too slow, they are not suitable for entering long texts.

It is commonly believed that existing pinyin input methods are slow mainly because they require many keystrokes. This is not true. Statistics over a large corpus show that a full-pinyin syllable coded under the national Chinese pinyin scheme averages only 3.06 letters, so that, once word frequency is taken into account in ordering duplicate codes, entering one Chinese character takes fewer than five keystrokes on average. With a common simplified-pinyin method, the average length per syllable is 2.11 letters (based on WPS), and entering one character takes fewer than 3.5 keystrokes. Even if this does not reach the speed of methods such as Wubi, it is basically comparable and should not be too slow. In practice, however, the speed of current pinyin input methods is not even of the same order of magnitude as that of the shape-based methods. The reason is that, after the pronunciation of a character has been entered, the existing pinyin input methods require the desired character to be selected manually from many homophones; the user must scan the whole prompt line and even turn pages, which greatly reduces input speed. Because attention must be paid to the prompt line, touch typing is impossible. The key to improving the efficiency of pinyin input is therefore to replace manual character selection with automatic character selection.

Chinese is characterized by many characters sharing one pronunciation; in other words, by many duplicate codes. Taking the pronunciations recorded in the Xinhua Dictionary as the standard, there are 412 distinct pronunciations in total (ignoring tone). Once characters with several pronunciations are counted once per pronunciation, the second-level national standard character set is equivalent to 7536 characters, so each pronunciation corresponds to 18.29 national standard characters on average. The problem of one pronunciation mapping to many characters is therefore not easy to solve.

At present, word-based pinyin input methods, such as associative coding and double-pinyin input methods, accept or prompt input word by word. This is a step forward and solves the character selection problem to some extent, but it is still far from the goal of touch typing, mainly for the following reasons:

1. Actual Chinese text contains a large number of single-character words, such as "is", "and", and "of", which still require manual character selection.

2. Words of more than three characters basically cannot be handled.

3. Even where two-character words are handled well, homophonous words remain and require manual selection.

4. The user does not actually know which words are included in the lexicon and can be entered directly as words, and which are not. The user therefore often types the pinyin for a word only to find that the word is not in the lexicon, and many keystrokes are wasted.

Although both categories of methods are continuously being improved for speed and ease of use, from the point of view of coding research they share a common deficiency: they neglect the effect of natural-language context dependency on coding. The research approach behind the whole-sentence intelligent input method differs from traditional coding research. It applies information theory to Chinese character coding, treats a sentence as a whole, and uses the contextual correlation within the sentence to shorten the code. In particular, it uses sentence understanding to convert a pinyin string, or a string of short Chinese character codes, into a Chinese character string automatically, thereby eliminating manual character selection. The whole-sentence intelligent input method proposed here overcomes the defects of the methods above. It is a high-speed input method that supports touch typing, matches people's language habits, and is easy to master. The user only needs to enter short codes for the Chinese characters, without manual character selection, to input an article, and the input speed can approach or even exceed that of fast methods such as the five-stroke code.

From the perspective of information theory, inputting Chinese characters is a process in which a person provides information to a computer. A Chinese character can be uniquely determined as long as sufficient information is provided. Therefore, to study coding, one must first know how much information is needed to determine a single character, which in information theory is equivalent to the average amount of information that each character carries.

If the Chinese characters in an article are regarded as independent and equally probable, then, since the second-level national standard character set contains 6724 characters in total, 12.7 bits are required on average to represent one character; this is the average information content of one Chinese character. If the 26 letters are used for coding, each letter carries 4.7 bits, so the shortest average code is 2.7 letters. However, if the differing frequencies of characters in text and the contextual correlation of natural language are considered, the number of bits required to distinguish each character falls considerably. When the frequency differences between characters are considered, only 9.6 bits are needed on average to distinguish one character (see [1]). This value falls further when contextual correlation is taken into account. Although it is not known by how much it falls for Chinese, research on English and on speech understanding suggests that the number of bits can be reduced by at least 1/3 (see [2, 3, 4]), to around 6 bits. The shortest code then averages around 1.3 letters.
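
The arithmetic in the preceding paragraph can be checked with a short script. This is only an illustrative sketch: the constants are the figures quoted in the text, and the names `shortest_code_length` and `entropy` are hypothetical helpers introduced here, not part of the invention.

```python
import math

ALPHABET_SIZE = 26    # letters available for coding
CHARSET_SIZE = 6724   # character count quoted in the text

# Information carried by one letter and by one equiprobable character, in bits.
bits_per_letter = math.log2(ALPHABET_SIZE)
bits_equiprobable = math.log2(CHARSET_SIZE)


def shortest_code_length(bits_per_character: float) -> float:
    """Minimum average number of letters needed to carry the given information."""
    return bits_per_character / bits_per_letter


def entropy(probabilities) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# The 9.6-bit and 6-bit figures are taken from the text, not computed here.
for label, bits in [
    ("equiprobable characters", bits_equiprobable),
    ("character frequency considered", 9.6),
    ("context considered", 6.0),
]:
    print(f"{label}: {bits:.1f} bits, {shortest_code_length(bits):.1f} letters")
```

With a real frequency table, `entropy` would give the frequency-weighted figure directly; the table itself is not reproduced in the text, so the 9.6-bit value is used as quoted.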

The value given above is theoretical; in actual coding the average code length is larger because ease of use must also be considered. The aim of Chinese character coding research is to provide a code that is easy to use and needs few keystrokes on average. The analysis above shows that the code length can be reduced in two ways: first, by studying the structure or pronunciation of each character to shorten the code of each character; second, by using context dependency to reduce the redundancy of the code. Almost all past research has taken the first route, considering characters and words in isolation. Current methods based on character splitting have approached the limit of 2.7 keys and can now only pursue savings of a few tenths, or even hundredths, of a keystroke. If context dependency is not used, at most about 2 keys per character can be achieved even when character frequency is considered, and the resulting codes are difficult to remember.

The input method proposed here takes the second route, reducing the number of keystrokes by using contextual correlation. To input one Chinese character, current pinyin input methods must provide the computer with two kinds of information. The first is syllable information: the syllable is entered in full pinyin or simplified pinyin, requiring an average of 3.5 and 2.1 keystrokes respectively (for full pinyin, since one pinyin string can be a substring of another, a space must be added to separate syllables, which adds 0.4 keystrokes on average). The second is selection information identifying which of the many homophonous characters for that syllable is intended; it is currently entered with the numeric keys and requires 1.4 keystrokes on average (taking frequency into account). Thus the pinyin input method averages between 3.5 and 4.9 keystrokes per character, which is highly redundant compared with the theoretical limit of 1.3 and indicates that some of this information can be omitted. It has been found that the second kind of information can be omitted. A person who hears a sentence pronounced can understand its content without ambiguity, which indicates that if the pronunciation of a sentence is known, the Chinese characters it contains can be uniquely determined. Although a pronunciation in isolation corresponds to many Chinese characters, in a specific context there is only one choice; that is, if enough context is considered, the pronunciation sequence (pinyin string sequence) and the Chinese characters can be put in one-to-one correspondence. The method therefore treats a sentence as a whole, uniquely determines the Chinese character corresponding to each pinyin syllable from the context, and omits the step of manual character selection. 
On the one hand, the number of keystrokes is reduced; more importantly, the input speed is much higher because the Chinese characters are entered by touch-typing pinyin. The core technology of the method is automatic conversion from pinyin to Chinese characters. This conversion can be done using a sentence understanding method based on corpus statistics. The conversion from pinyin to Chinese characters can be regarded as a correspondence problem: given the pronunciation $S = (S_1, S_2, \ldots, S_N)$ of a sentence, find the Chinese word string $W = (W_1, W_2, \ldots, W_M)$ to which it should correspond (a sentence can always be divided into several words, including single-character words). According to the maximum a posteriori probability criterion:

$$W = \arg\max_{j} P(W^{(j)} \mid S) \tag{1}$$

where the $W^{(j)}$ are the candidate sentences (word sequences) for the input sentence. By Bayes' formula, and because $P(S)$ is independent of $j$:

$$W = \arg\max_{j} \{ P(S \mid W^{(j)})\, P(W^{(j)}) \} = \arg\max \{ P(S_1, S_2, \ldots, S_N \mid W_1, W_2, \ldots, W_M)\, P(W_1, W_2, \ldots, W_M) \} \tag{2}$$

According to the Markov assumption and the independent output assumption:

$$P(S_1, S_2, \ldots, S_N \mid W_1, W_2, \ldots, W_M) = P(S_1, \ldots, S_{n_1} \mid W_1)\, P(S_{n_1+1}, \ldots, S_{n_2} \mid W_2) \cdots P(S_{n_{M-1}+1}, \ldots, S_N \mid W_M) \tag{3}$$

where $S_{n_k+1}, \ldots, S_{n_{k+1}}$ corresponds to $W_{k+1}$. When no confusion is caused, the candidate index $j$ is omitted for convenience of writing. After treating polyphonic Chinese characters as several different characters:

$$P(S_{n_k+1}, S_{n_k+2}, \ldots, S_{n_{k+1}} \mid W_{k+1})$$

The calculation of equation (1) therefore reduces to equations (2), (3), and (4), whose values are obtained from statistics over a large body of existing articles (a corpus). In an implementation, the values in formulas (3) and (4) are counted in advance (see [5, 6]); the most probable Chinese character sentence is then calculated for the input pinyin string, and the computer completes this calculation automatically. Because the method selects Chinese characters on the basis of probability, some errors are unavoidable, but the error rate does not exceed 5%. If tone selection is added after the pinyin is input, the error rate does not exceed 2%. Manual intervention is also possible for words that are not understood automatically. The method also allows the user to define words, and the error rate falls further once user-defined words are used. The error rates above are based on the input of a variety of newspaper articles, with a test corpus of more than 1.5 million characters. The intelligent pinyin input method uses the contextual correlation of natural language to convert pinyin to Chinese characters automatically, and thereby achieves Chinese character input by touch-typed pinyin. With the intelligent full-pinyin input method, inputting one Chinese character takes 3.5 keystrokes on average, which is close to the current fast input methods; with the intelligent simplified pinyin method it takes only 2.11, which is faster than the current input methods. The method matches people's habits in using language and writing, requires no memorization of complicated code tables, and is easy to master. The user does not need to think about character splitting, so draft-free input can be achieved. 
The whole-sentence intelligent input mode of the intelligent sound-shape code input method of the invention goes one step further. Its average number of keystrokes per character is 1.0 to 2.0 fewer than that of the intelligent full-pinyin input method, and its ease of use is nearly the same as that of the intelligent full-pinyin input method. It is therefore a better input method overall. Because the whole-sentence intelligent input method is based on natural language understanding, it can to some extent prevent wrongly written characters from being input, and a good deal of work has been done in this respect. The whole-sentence intelligent input modes of intelligent pinyin and of the intelligent sound-shape code currently assume that the input codes are completely correct; in the future the method can be improved so that correct Chinese characters are still output when certain input errors are present.
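
The maximum a posteriori conversion from pinyin to Chinese characters described above can be sketched as a Viterbi search over a bigram (first-order Markov) model. This is a minimal sketch under stated assumptions, not the implementation of the invention: every character is treated as a one-character word, the probability of a syllable given a matching character is taken to be 1, and the lexicon `CANDIDATES`, the table `BIGRAM`, the smoothing constant `FLOOR`, and all probability values are hypothetical toy data rather than corpus statistics.

```python
import math

# Hypothetical toy lexicon: pinyin syllable (tone ignored) -> candidate characters.
CANDIDATES = {
    "ren": ["人", "仁"],
    "min": ["民", "敏"],
}

# Hypothetical bigram probabilities P(current | previous); "<s>" marks sentence start.
BIGRAM = {
    ("<s>", "人"): 0.6, ("<s>", "仁"): 0.1,
    ("人", "民"): 0.5, ("人", "敏"): 0.01,
    ("仁", "民"): 0.05, ("仁", "敏"): 0.01,
}
FLOOR = 1e-6  # assumed smoothing value for character pairs missing from the table


def log_p(prev: str, cur: str) -> float:
    """Log of the (smoothed) bigram probability of cur following prev."""
    return math.log(BIGRAM.get((prev, cur), FLOOR))


def decode(syllables):
    """Return the most probable character string for a pinyin syllable sequence."""
    # best[c] = (log-probability, path) of the best sequence ending in character c
    best = {"<s>": (0.0, [])}
    for syllable in syllables:
        nxt = {}
        for cur in CANDIDATES[syllable]:
            score, path = max(
                ((s + log_p(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cur] = (score, path + [cur])
        best = nxt
    return "".join(max(best.values(), key=lambda item: item[0])[1])


print(decode(["ren", "min"]))
```

Working in log-probabilities avoids numerical underflow on long sentences, and the search cost grows linearly with sentence length instead of enumerating every combination of homophones.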

The whole-sentence intelligent input mode of the intelligent sound-shape code input method of the invention is based on whole-sentence intelligent understanding technology and auxiliary proofreading technology. It applies to the various coding schemes of the single-character input mode, and to schemes in which only the first code or the first few codes of each character's basic phonetic code are entered. Using the Markov chain model and the whole-sentence intelligent input principle introduced above, it outputs the Chinese sentence that satisfies the given conditions and has the highest probability of occurrence, offers several of the next most probable candidate sentences, and marks the places where an error is most likely, to facilitate further proofreading and modification. The whole-sentence intelligent input mode of the intelligent sound-shape code can display Chinese characters in real time as the user enters each character of the sentence, so the user sees the Chinese characters without waiting until the pinyin of the whole sentence has been entered. A wrong key or wrong pinyin can also be noticed promptly and corrected easily. The user does not need to build a personal lexicon; the system builds and maintains one automatically while Chinese characters are being input, and thereby adapts to the user's professional field. Combining the whole-sentence intelligent input technology with word integration technology improves the accuracy of input, proofreading, and recognition, speeds up recognition and calculation, and improves real-time performance and practical intelligent functions. 
The intelligent input method can also have various intelligent functions, such as automatic memory, self-learning, adaptive modification of the lexicon to suit the user, adaptive selection of error categories for proofreading, adaptive selection and correction of the character-shape library used in recognizing Chinese characters, and adaptation to the user's professional field, so that the system's degree of intelligence becomes higher and higher. The whole-sentence intelligent input method has wide application in many fields of computer character input, such as intelligent input, text proofreading, Chinese character recognition, Chinese speech input, replacing numeric postcodes with letter postcodes derived from the mailing address, replacing numeric telephone numbers with letter telephone numbers, and providing Chinese display on an English/numeric pager (BP machine) (see reference [7]).

Obviously, the principle of the whole-sentence intelligent input method can be used to assist in proofreading homophones and pronunciations that are easily confused or mistaken. An important feature of Chinese is that it has very many homophonous characters and very many characters that are easily confused or miswritten. An auxiliary proofreading function of this kind is therefore particularly important.

In addition, the errors found in text proofreading include word confusion errors, grammatical errors, and other errors. Grammatical errors can be checked and corrected with grammar rules: the places most likely to be wrong are marked, and a correct sentence is offered for reference. No effective way of applying the whole-sentence intelligent input principle to grammatical errors has yet been found, so only grammar rules are used for them. For word confusion errors, a lexicon can be built that lists, for each Chinese character or word, all the characters or words it is easily miswritten as. When an article is proofread, three probabilities are obtained using the whole-sentence intelligent input principle described above. The first is the probability of occurrence of the original sentence being proofread. The second is the probability of the most probable sentence among all those obtained by replacing a few Chinese characters of the sentence with characters from the lexicon (for example, with the replacement limited to one character). The third is the probability of the most probable sentence among all those obtained by replacing a larger number of characters with characters from the lexicon (for example, more than one character but no more than half of the sentence). If the first probability is greater than both the second and the third, the sentence is deemed free of word confusion errors. If the first probability is much smaller than both the second and the third (for example, less than 0.8 times each), the sentence is deemed to contain a word confusion error. 
In that case, the replaced character or portion of the sentence can be marked, and the most probable sentence after replacement can be offered as the correct sentence for reference. If the situation lies between these two cases, no judgment is made. If the first probability is greater than the second but smaller than the third, the ratio of the first probability to the third is considered. If this ratio is small (for example, below an evaluation limit of 0.5), the sentence is deemed to contain a word confusion error and is handled in the same way as above; otherwise, the sentence is deemed to contain no word confusion error. The evaluation limits, and the limits on how many characters of the original sentence are replaced when the second and third probabilities are calculated, can be adjusted continually so that the proofreading method becomes more accurate and takes better account of other factors. For example, when a miswritten form of the sentence "Long live the people!" is proofread by this method, the first probability is found to be smaller than the second. The system therefore regards the sentence as containing a word confusion error, marks the miswritten character "ren" in it, and offers the correct sentence "Long live the people!" for reference. The method can be combined with word integration technology to speed up calculation and improve real-time performance and real-time intelligent functions. It can also have various intelligent functions, such as automatic memory, self-learning, adaptation to user requirements, adaptive selection of error categories for proofreading so that each situation is handled separately, and adaptation to the user's professional field.
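
The three-probability decision rule described above can be written out as a small function. This is a sketch only: the function name `judge_confusion` and its parameter names are hypothetical, the default limits 0.8 and 0.5 are the example values given in the text, and the three probabilities are assumed to have been computed beforehand by the sentence model.

```python
def judge_confusion(p_original, p_one_sub, p_multi_sub,
                    strong_factor=0.8, ratio_limit=0.5):
    """Classify a sentence as 'no_error', 'error', or 'undecided'.

    p_original  -- probability of the sentence as written (first probability)
    p_one_sub   -- best probability after replacing one character (second)
    p_multi_sub -- best probability after replacing several characters (third)
    """
    # The original beats every substituted sentence: nothing to correct.
    if p_original > p_one_sub and p_original > p_multi_sub:
        return "no_error"
    # The original is much less probable than both alternatives.
    if (p_original < strong_factor * p_one_sub
            and p_original < strong_factor * p_multi_sub):
        return "error"
    # The original beats the single substitution but not the multiple one:
    # decide on the ratio of the first probability to the third.
    if p_one_sub < p_original < p_multi_sub:
        return "error" if p_original / p_multi_sub < ratio_limit else "no_error"
    # Anything in between is left without a judgment.
    return "undecided"


print(judge_confusion(0.10, 0.50, 0.40))
```

Keeping the two limits as parameters mirrors the statement that they can be adjusted as the system learns.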

The error-category lexicon in this method can be a homophone error lexicon, a dialect error lexicon, a similar-shape error lexicon, a non-standard Chinese character lexicon, an error lexicon for the full-pinyin input method, the double-pinyin input method, the five-stroke input method, the natural code input method, the Row code input method, or the intelligent sound-shape code input method, an error lexicon for various Chinese character recognition software, or a lexicon integrating several of these. Lexicons for different error categories can be applied to proofreading in various situations: different typing modes (such as Chinese typing, typing while reading, and typing while thinking); different input methods (such as the full-pinyin input method, the double-pinyin input method, the five-stroke input method, the natural code input method, the compass code input method, and the intelligent sound-shape code input method of the invention); output files generated by different Chinese character recognition software; and cases where the input method or recognition software used is unknown. For each situation, a basic lexicon and corresponding proofreading parameters can be determined by research. The system then learns continuously from what happens during actual proofreading. If the system correctly finds an error, a positive sample is obtained. If the system reports an error wrongly, a negative sample is obtained. If the system misses an error, this is treated as equivalent to a wrong finding, and a negative sample is likewise obtained. 
The system can then automatically add to, reduce, or modify the corresponding error-category lexicon and proofreading parameters, so that the proofreading software becomes steadily more reliable and more intelligent as it learns. In short, the whole-sentence intelligent input method has important applications in many fields of computer character input, such as text proofreading.

Chinese character recognition, especially handwritten Chinese character recognition, is a major topic in the field of computer character input. The most difficult problem at present is how to raise the success rate and reliability of handwritten Chinese character recognition far enough to be practical. The recognition rate for standard handwritten Chinese characters can currently reach 60%-95%, but the rate for irregular handwriting is still very low, and the general level has not yet reached full practicality. The invention therefore provides a recognition method, the whole-sentence intelligent proofreading recognition method, which builds on the whole-sentence intelligent input technology and the auxiliary proofreading technology to raise the recognition rate of handwritten Chinese characters greatly and reach a practical level quickly. If every Chinese character in a sentence can be recognized accurately by the current general recognition method, the auxiliary method below is not used. 
Otherwise, for each Chinese character that cannot be recognized accurately, the current general recognition method is used to determine a group of Chinese characters whose glyph images resemble the character in the original text. These candidates are then combined with the accurately recognized characters, in the original text order, to form every possible candidate sentence. Two factors are considered: (1) the probability of occurrence of each sentence, and (2) the sum, over the candidate characters, of their degree of similarity to the corresponding glyph images in the original text. A comprehensive index calculated from these two factors is used to select the best sentence as output and to offer some next-best candidate sentences, and the places most likely to be wrong are marked for further proofreading and modification. The recognition rate for handwritten Chinese characters can thereby be raised considerably. The parameters, similar-character libraries, and algorithms of the method can be adjusted continually so that its recognition rate rises and other factors are taken into account. For example, with an existing recognition method, a computer may misrecognize the first character of the sentence "Long live the people" as a similar-looking character. With the recognition method of the invention, the computer automatically finds that the first character may be recognized as either of two characters, "person" and "enter", and then, weighing factors (1) and (2) together, finds that the sentence is far more likely to occur when the character is recognized as "person", while the glyph similarities differ little. 
It therefore takes "person" as the best choice, outputs "Long live the people", and offers next-best candidate sentences such as the one beginning with "enter", with appropriate marks on the characters "person" and "enter". This improves both the recognition rate and the recognition method. The technique can be combined with word integration technology to speed up calculation and improve real-time performance and real-time intelligent functions. The method can also have various intelligent functions, such as automatic memory, self-learning, adaptation to user requirements, adaptive selection of error categories for proofreading, adaptive selection and correction of the character-shape library used in recognizing Chinese characters, and adaptation to the user's professional field. In online handwritten Chinese character recognition, the method can additionally display Chinese characters in real time, character by character, and keep using context information to revise and correct them in real time.
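
The combination of factor (1), sentence probability, and factor (2), summed glyph similarity, can be sketched as follows. The text does not specify how the comprehensive index is calculated, so the form used here (sentence log-probability plus a weighted similarity sum) is an assumption, as are the names `SLOTS`, `BIGRAM`, `FLOOR`, `SIMILARITY_WEIGHT`, and `rank` and all numeric values, which are hypothetical toy data.

```python
import itertools
import math

# Hypothetical recognizer output: for each character position, the candidate
# characters with a glyph-similarity score in (0, 1].
SLOTS = [
    {"入": 0.90, "人": 0.85},  # first glyph is ambiguous
    {"民": 1.00},              # second glyph was recognized with certainty
]

# Hypothetical bigram probabilities standing in for corpus statistics.
BIGRAM = {
    ("<s>", "人"): 0.6, ("<s>", "入"): 0.2,
    ("人", "民"): 0.5, ("入", "民"): 0.001,
}
FLOOR = 1e-6             # assumed smoothing value for unseen pairs
SIMILARITY_WEIGHT = 1.0  # assumed weight of factor (2) relative to factor (1)


def sentence_log_prob(chars):
    """Factor (1): log-probability of the character sequence under the bigram model."""
    prev, total = "<s>", 0.0
    for c in chars:
        total += math.log(BIGRAM.get((prev, c), FLOOR))
        prev = c
    return total


def rank(slots, weight=SIMILARITY_WEIGHT):
    """Return candidate sentences ordered from best to worst comprehensive index."""
    scored = []
    for combo in itertools.product(*(slot.items() for slot in slots)):
        chars = [c for c, _ in combo]
        similarity_sum = sum(s for _, s in combo)  # factor (2)
        index = sentence_log_prob(chars) + weight * similarity_sum
        scored.append((index, "".join(chars)))
    return [sentence for _, sentence in sorted(scored, reverse=True)]


print(rank(SLOTS))
```

On glyph similarity alone the recognizer would choose the first candidate of the ambiguous position, but the sentence probability outweighs the small similarity difference. Exhaustive enumeration is used only for clarity; for long sentences a dynamic-programming search would be the natural replacement.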

This method is in fact simply a post-processing technique for Chinese character recognition software that applies whole-sentence intelligent understanding and auxiliary proofreading. Wherever the character after correction by this method differs from the character originally recognized, both can be recorded. After automatic proofreading is finished, manual proofreading and editing determine the correct text file. From the record, characters that were originally recognized correctly but were changed wrongly by post-processing can be picked out and learned as negative samples, and characters that were originally recognized wrongly but were corrected by post-processing can be learned as positive samples. During self-learning, entries in the similar-character library used for post-processing can be added, removed, or modified, and the post-processing parameters can be corrected continually, which steadily improves the reliability and intelligence of such software. It is also possible to develop post-processing proofreading software that does not depend on any particular Chinese character recognition software and works with all or some of them. It is likewise possible to develop post-processing proofreading software that uses no internal information from the recognition software and proofreads its output files directly. This is in practice a special case of the proofreading software described above, and such software is relatively easy to develop.

Obviously, the whole-sentence intelligent input technology of the invention can also be applied to other areas of Chinese character recognition. In short, it has important applications in many fields of computer character input, such as Chinese character recognition and especially handwritten Chinese character recognition.

Speech input is a major topic in the field of computer text input. A main problem in Chinese speech input is that Chinese contains a large number of homophones and easily confused near-homophones, so that the Chinese character to be input cannot be determined from speech alone. For this reason, Chinese speech input technology has long failed to become practical.

Now, using the whole-sentence intelligent input technology and the auxiliary proofreading technology described in the preceding paragraphs, all homophones and easily confused near-homophones can be organized into homophone and near-homophone character libraries, indexed by the sound to be recognized, for use in speech input. During speech input, the homophone and near-homophone library corresponding to each syllable in the spoken sentence is found first. Then, for the whole sentence, the whole-sentence intelligent input technology and the auxiliary proofreading technology are used to find, among all combinations of characters drawn from these libraries, the Chinese sentence with the highest probability of occurrence, which is output. Several next-best candidate sentences are also offered, and the parts most likely to be wrong are marked, so that they can be proofread and corrected by playing back the recording and letting the user select interactively. The technique can be combined with word integration technology to speed up calculation and improve real-time performance and real-time intelligent functions. It can also have intelligent functions such as automatic memory, self-learning, and adaptation to user requirements.

This speech input technology can be combined with the telephone code technology described in Chapter IV, so that text, in particular Chinese text, can be input to a computer directly by telephone. When proofreading and modification are required, the recording can be played back and the user asked to select and correct interactively. The technology can thus be used to build a convenient and inexpensive Internet system that suits conditions in China and can be accessed by telephone.

In conclusion, the intelligent sentence input method has important application in many computer character input fields such as Chinese speech input and the like. These technologies will bring a series of revolutionary advances in many areas of computer text entry and computer telephony services.

Recognizing handwritten numeric codes means distinguishing among the ten digits 0-9, and recognizing handwritten letter codes means distinguishing among 26 letters. The difficulty, recognition rate, and reliability of the two tasks therefore do not differ greatly and are at least of the same order of magnitude. At present, post offices in China use computers to recognize handwritten numeric postal codes automatically and to sort letters automatically, and the recognition rate and reliability achieved for handwritten digits meet requirements. With the rapid development of computer character recognition, the recognition rate and reliability for handwritten letter codes can now basically meet requirements as well, so post offices can also use computers to recognize handwritten letter codes and sort letters automatically. It is thus possible to write a letter code by hand in place of a numeric postal code. With today's postal codes, the sender often does not know or cannot remember the addressee's postal code. Looking up a postal code is inconvenient, and the numeric postal codes for all addresses are unlikely to be available at the post office or at any one place. Writing the mailing address as a handwritten letter code in place of the postal code solves this problem and greatly helps the user. If the Chinese character input method is used to write the address, the letter code can be made much shorter, and it is convenient, practical, and easy to learn and remember. 
The technology can be combined with the whole-sentence intelligent input technology and with a library of common address phrases and words, which increases recognition accuracy, speeds up recognition, improves real-time performance and adds practical intelligent functions such as automatic memory, self-learning and adaptation to user characteristics. When the sender knows the addressee's digital zip code or is willing to look it up, the original digital zip code can still be used, and all existing practices regarding digital zip codes remain unchanged. The post office simply adds some services for the user. For example, a sender who does not know the addressee's digital zip code can still send the letter by using a special envelope that carries a higher charge. Two types of special envelope may be sold. The first is cheaper but takes more effort from the user, who must print the letter zip code with a rotary letter-and-digit type stamp, similar to the rotary digit type used to print show times on tickets sold in movie theaters. The user may use such a printer at the post office or purchase one. For someone who often sends such letters, purchasing the printer is economically acceptable. With the cheaper envelope, the user prints the desired letter zip code on the envelope with this printer. The printed characters are then standard print letters and digits in a single standard font, so recognition is easy and the recognition rate and reliability are high, much higher than for handwritten digit codes. This solves the recognition problem and the user's difficulty at the same time. 
The second type of envelope carries a higher charge. The user buys it and writes the letter zip code on the envelope by pen, which is convenient for users and also brings economic benefit. These envelopes are priced higher to ensure more revenue for the added services, and part of the extra revenue can be used to add equipment and improve the technology, which can be extended step by step from one or two pilot cities to nationwide use. The service can thus finance itself without much state expenditure. To send a letter, the user need not spend time or effort looking up the digital zip code, and can instead go to the post office to use the rotary letter type stamp, or spend a little more money to buy one, or spend a little more to send the letter in the second type of envelope. Different printed marks can be placed at a special position on the two kinds of special envelope, for example Y for print and S for handwriting. During automatic recognition the computer checks the mark first. If there is no mark, the digital zip code in the zip code boxes is recognized first; if there is no digital zip code or it is incorrect, the letter zip code is then recognized as handwriting or as print, and if recognition succeeds the letter may be delivered. Alternatively, for the sake of revenue, an unmarked envelope can be treated as a non-special envelope: if it carries no digital zip code or the digital zip code does not match, the system can refuse to recognize it and leave recognition and judgment to manual handling, and such a letter may either be delivered or be returned to the sender. Users are thereby encouraged to buy special envelopes, because delivery of a non-special envelope without a digital zip code is not guaranteed and users are generally reluctant to risk having a letter returned. 
If the mark is Y, the code is recognized as print. If the user does not print the letter zip code on a special envelope marked Y, the responsibility lies with the user, who risks misrecognition and misdelivery of the letter. If the mark is S, the code is recognized as handwriting. The rejection threshold can be raised appropriately to improve reliability, and rejected letters can be identified manually, which improves reliability further. Because such special envelopes carry a higher charge, this manual service is both necessary and worthwhile. Such envelopes can be bought and taken home; the user writes the letter zip code by pen and posts the letter in a mailbox near home. In this case a letter zip code can be written by hand instead of a digital zip code, and the letter can still be sent. If the special envelope marked S is not used, the letter may be returned; if it is used, delivery of the letter is guaranteed. The letter zip code can be coded in a simplified way based on the simplest of the present input methods. It consists of the first pinyin letters of the first two characters of the Chinese name of the large area in the address (province, municipality, autonomous region, etc.) plus a duplicate-code serial number, followed by the first pinyin letter of each character of the Chinese name of the small area (or address and unit) plus a final duplicate-code serial number. Duplicate codes among large-area names are few, so they simply have to be remembered, and they are reused for every other address in the same area. For example, at the first level of provinces, municipalities and autonomous regions there are basically no duplicate codes; only Shanxi province and Shaanxi province are represented by SX1 and SX2, Hebei province and Hubei province by HB1 and HB2 or by HEB and HUB, and Henan province and Hunan province by HN1 and HN2 or by HEN and HUN. 
There may be some duplicate codes among small areas or third-level smaller address units, but generally no more than 10 and at most no more than 100, so at most two digits are needed for the duplicate-code serial number. The letter zip code can be used by simply remembering the three pairs of large-area duplicate codes above and some small-area duplicate codes. For the large majority of names, which have no duplicate code, the code can be written directly without any lookup or memorization. Where there is a duplicate code, it must be looked up and remembered, but generally only the serial number, a number of 1 to 3 digits, has to be remembered, which reduces the memory load. For example, the address "Waterworks, Taiyuan City, Shanxi Province" can be represented by "SX 1 TYSZLSC", and the address "Yong'anli, Chaoyang District, Beijing" by "BJ 1 CYQYAL". Duplicate-code serial numbers are assigned from 0 or 1 upward in order of frequency of use. The difficulty is that the user cannot easily know in advance whether a duplicate code exists and what its serial number is. Generally, advertisements would print both the digital zip code and the letter zip code with its duplicate-code serial number. If the letter zip code is not given, the user may assume that there is no duplicate code, and if a letter is delayed because of a duplicate code, the advertiser bears the responsibility. A user who is still unsure can query an automatic telephone zip code inquiry system using the Chinese input telephone code technique described in chapter four. After selecting the corresponding function number, the user enters the Chinese input telephone code corresponding to the letter zip code written on the envelope, and the telephone automatically reports the addresses, digital zip codes and letter zip codes that share that code. 
The user can thus check whether the letter zip code he or she has written is correct. A letter of the telephone code may be represented by a single digit or by two digits, or the two forms may be mixed. The telephone can report the digital zip code and the letter zip code together. A user will generally use the digital zip code, but can use the letter zip code when the digital zip code cannot be remembered or when using only one code does not feel safe. This solves the difficulty of finding the zip code of an arbitrary address. If the user is unwilling to look up zip codes by telephone, or before such a system is in service, the user can buy a booklet that lists all duplicate-code cases for the large-area and medium-area names used in addresses. There are at most a few hundred such cases, so the booklet need not be thick and can certainly hold them all. The user can also go to the post office to consult the booklet or to query the relevant computer software, or use the telephone zip code inquiry system mentioned above. In this way the problem of duplicate codes among large-area and medium-area names is substantially solved. The recognition system typically rejects a letter when it finds the large-area or medium-area name ambiguous; the letter is then identified manually or returned to the sender. For the other duplicate-code cases a booklet would be too thick because there are too many of them; most users could not own one and lookup would be inconvenient. The system therefore rejects these duplicate-code cases and they are handled manually. Because letter zip codes are used much less often than digital zip codes, the total manual workload does not increase much. 
The average workload for processing a letter with a letter zip code is at most a few times greater than that for a letter with a digital zip code. This is acceptable because the former carries a higher charge.
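The composition of the letter zip code described above (large-area initials, duplicate-code serial number, small-area initials, optional final serial number) can be sketched as follows. This is only an illustration: the function names, the space-separated output format for a trailing serial number, and the pinyin spellings of the two example addresses are assumptions, chosen so that the output matches the two codes given in the text.

```python
def initials(syllables):
    """Return the upper-case first letter of each pinyin syllable."""
    return "".join(s[0].upper() for s in syllables)


def letter_zip_code(large_area, large_dup, small_area, small_dup=None):
    """Compose a letter zip code from pinyin syllables (hypothetical helper).

    large_area: syllables of the large-area name (only the first two are used)
    large_dup:  duplicate-code serial number of the large area
    small_area: syllables of the small-area name (every syllable is used)
    small_dup:  optional duplicate-code serial number of the small area
    """
    parts = [initials(large_area[:2]), str(large_dup), initials(small_area)]
    if small_dup is not None:
        parts.append(str(small_dup))
    return " ".join(parts)


# Assumed pinyin for the two addresses used as examples in the text.
taiyuan = letter_zip_code(["shan", "xi"], 1,
                          ["tai", "yuan", "shi", "zi", "lai", "shui", "chang"])
beijing = letter_zip_code(["bei", "jing"], 1,
                          ["chao", "yang", "qu", "yong", "an", "li"])
print(taiyuan)  # SX 1 TYSZLSC
print(beijing)  # BJ 1 CYQYAL
```

A sorting system would still need the duplicate-code tables themselves; the sketch only shows how a sender derives the letters.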

In short, the technology can greatly raise the automation level and the service level of the postal industry in China, and makes it feasible to open a system for querying the postal code of any address by telephone. It improves service and is convenient for users. The technology can also be widely applied to questionnaires, product surveys, warranty lists, the filling of various forms, computer data entry and the like.

Similarly, by replacing a numeric telephone number with an alphabetic telephone number, an exchange system can be realized in which a user can change telephone numbers without notifying anyone and others can still reach the user. Telephone numbers then need not be memorized again and again, and nobody has to be informed when a number changes, so the system is very convenient and practical.

The telephone code technology described above can be applied to an automated, unattended management system for the 114 directory inquiry service. The name of the person (or unit) whose number is wanted is entered on the telephone using the telephone code technology, and after interactive selection the system automatically finds the number and plays a corresponding recording to report it to the user. This saves 114 directory operators, labor and expense, and raises the level of automation. The telephone code technology can further be used to realize an unattended switchboard system that transfers calls automatically by person name or unit name. It can first be applied to the large number of extension telephone exchange systems in China: an extension exchange system that uses the telephone code technology and transfers calls automatically by person or unit name replaces the telephone operator and connects each call according to the name of the person or unit entered by the caller. For example, to call the No. 2 workshop of the First Steel works, the caller first dials the First Steel switchboard and then enters the telephone code of the three characters for "No. 2 workshop", and the system automatically connects the call to that workshop. On the basis of this technology, a switching system that connects calls automatically by person name (or unit name) can be established by replacing numeric telephone numbers with alphabetic telephone numbers. 
This realizes the advanced function that other people can still reach a user who has moved or changed telephone numbers without being informed of the change. It can also realize functions such as keeping the telephone number secret, connecting only during limited hours, changing a hot-line number, and leaving a hot-line number or voice mailbox when away on business. A telephone can also be produced on which the user enters an alphabetic telephone number directly, either with the telephone code technology described above or with a Chinese pinyin keyboard, after which an electronic device in the telephone automatically dials the digital telephone number corresponding to the alphabetic number. A small display can be installed on the telephone so that the user can watch the input and correct errors at any time; after confirmation, the corresponding digital number is dialed. The alphabetic number corresponding to each digital number can be entered and defined by the user. Such telephones would be very practical and should find a market. The function can be realized on a telephone connected to a computer by means of computer technologies such as telephone voice cards, and a cheaper dedicated telephone can also be produced.
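The name-based directory and switchboard lookup described above can be sketched as a table keyed by the digits the caller presses. This is a minimal illustration under stated assumptions: the directory entries, extension numbers and letter codes (for example "ecj" for the three characters of "No. 2 workshop") are hypothetical, and an ordinary telephone keypad lettering is assumed.

```python
from collections import defaultdict

# Letter groups of an ordinary telephone keypad (digit -> letters).
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def telephone_code(letter_code):
    """Translate a letter code into the digit keys a caller would press."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in letter_code.lower())


def build_index(directory):
    """Map each telephone code to every (name, number) entry that shares it."""
    index = defaultdict(list)
    for name, letter_code, number in directory:
        index[telephone_code(letter_code)].append((name, number))
    return index


# Hypothetical entries: (display name, letter code, extension number).
directory = [
    ("No. 2 workshop", "ecj", "2041"),
    ("Archive room", "dak", "2300"),
    ("Finance office", "cwk", "2105"),
]
index = build_index(directory)

# Several names can share one digit sequence, so the system would offer
# the candidates to the caller for interactive selection.
candidates = index[telephone_code("ecj")]
print(candidates)
```

Here "ecj" and "dak" both map to the key sequence 325, which is why the text calls for an interactive selection step before the call is connected.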

The present invention is further characterized in that it uses some coding schemes of the intelligent sound-shape code input method, taking only the first code or first few codes of each Chinese character and coding only some of the characters of a sentence according to defined rules, so that short messages can be coded and the coding scheme can be applied on the English digital BP machine. (The detailed coding scheme is given in the description that follows.) The original digit coding scheme of the digital BP machine is thereby replaced by an alphanumeric coding scheme consisting of the 26 letters and the ten digits 0-9. An English digital BP machine, which is many times cheaper than a Chinese display BP machine, can then be used with the new coding scheme in place of the old one; it is convenient, practical, simple to operate and cheap, and it provides the functions of a Chinese display BP machine. In particular, the coding scheme can be used when an English digital BP machine transmits the caller's surname, a short message, a time, a place and the like. Each BP station only needs to change the digit-code signal it originally sent into the corresponding signal of mixed pinyin letters and digits. Users of English digital BP machines can use the technique without adding devices or making any changes. The corresponding pinyin codes are displayed on the user's English digital BP machine, and after a little learning the user can read the corresponding Chinese information from the pinyin codes without looking them up in a code book. The English digital BP machine thus gains many functions similar to those of a Chinese display BP machine and is almost as convenient to use. 
The technology is convenient, cheap, simple, practical, and easy to learn and remember, and can greatly help to improve the service level of English digital BP machines. The coding scheme can also be applied to services such as automatic message leaving on BP machines, in which case all kinds of BP machine (Chinese display, digital and English digital) can use these coding schemes. (See the coding scheme description for details.)

The English digital BP machine coding scheme of the intelligent sound-shape code input method is a coding scheme for an English digital BP machine system that realizes the functions of a Chinese display BP machine. It has several coding modes, and each BP station and the wide body of BP users can decide which mode to use according to preference. For surname codes, two schemes are provided, a 2-code scheme and a 3-code scheme; the 3-code scheme is recommended. For other short phrases, main place names, main units and entertainment places, restaurants, shopping malls, hotels, related public services, institutions and offices of other provinces and cities stationed locally, stadiums and others, the following are provided: a multi-code classification scheme, a multi-code scheme, a 4-code classification scheme, a 4-code scheme, a 3-code classification scheme, a 3-code scheme and other schemes. The classification schemes divide the occasions of use into ten types: 0, notice phrases sent when the BP station notifies the user, and related system notices; 1, congratulatory phrases, polite phrases and related information; 2, request phrases and related information; 3, situation notices, times, places, other nouns and other information; 4, main place names and related information; 5, main units, entertainment places and related information; 6, restaurants, shopping malls, hotels and related information; 7, related public services and other related information; 8, institutions, offices of other provinces and cities stationed locally, and related information; 9, stadiums and other related information. (The letters i, u, v, or all 26 letters, can also be used here as classifiers.) 
I recommend you use a multi-code classification scheme. (the detailed description of the use can be found in detail in the specific description of the various encoding schemes described below.) such as: ai: ai, an: an, ao: and (3) a chelate. ba: ba, baibai, ban, baobao, be: shell, bi: after all, clip, edge benzyl, guest, bo: bobo, bu: bu step. ca: cang, chang, chai, cai, cao, chaulmoogra, ce: car, cheng, 35852ci: pond lag, cog: chong, from the clumps, cu: in the recipe, there are three main types. da: da, dang, dong, dan, de: deng, di: d, do: dongdong, sinus dou, du: block, pause and many. e: and (4) a jaw. fa: square house, fan, fe: feng seals fengfengfeng, fee, fu: fu Zi Fu V. ga: lid, sweet stem, high tea Gao, ge: ge, gunn, go: gong, Gong and gu: guo. ha: ha, Hangzhou, sea, Han, Hao, he: he and, black, ho: red hong, thick waiting, hu: hu, Hua, Huang and Huo Hu. ji: jijie nationality Ji, Jia, Jie, Jiang, Jingjing, Jiang, jin Jian, jiao, jv (or ju): ju. ka: kongkang Perry, Kai, leather, ke: kok, kok: void, kou, ku: kuang carry the square tube, Kuai. la: lang, lai, lan, lao, le: cold, thunder, li: li jie, liang, ling, lian, lin rush, Liao, liu, lo: long, long building, lu: luolulululululululululu, Luan, luo, lv (or lu): lu Ling. ma: horse linen, shikimic, mai bui, manyflower, Mao, me: monton, plum, mi: minced rice, Min, Miao, mo: mo, mu: and (4) performing nomadic. And na: that, south, ne: energy, ni: ni, nie, ning, year, niu kou, nog: and (5) farming. ou: the European region. pa: pompe, pan, pe: pennleau, Pei, pi: and (2) performing Pisi (Pisi): pupu. qi: qiqichiqiqi, qiang, qin, qinqing qiqiqiqiqiqiqiqin, qiao, qiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqiqie, qu: full weight, qv (or qu): buckling Qu. ra: ran, re: and ro: rong and Rou: rana, Rui. sa: sa, sha, sang, shang, sai dan, shao, se: leftover, prosperous rope, Sen, Shen cautious, si: shismishi Shishi Shi, si, so: songsong, shou, su: su hosta, Shu, Shuai, Su. 
ta: tang, tan tang, pottery peach, te: teng, ti: iron, field, to: \20319child, tu: and (6) coating the meat on the poultry. And wa: wang, wanwang over ten thousand, we: weak, Weiweiwei Hazar, Wenzhong, wu: wuwuwuwuwuwu Wuwuwu. xi: xi, summer, xie xi, xiang chen, xu xiao, xu u: xue, Xuan, Xun, xv (or xu): xu. ya: yang, Yanyan, Yanyanyan, Yao, ye: leaf, yi: yi, Yiyin Ying Yin, Yi Yiyin Ying, yo: yongyong and Yong, especially you, yu: yueyuan Yue Yuan, Yuan volatile, yv (or yu): yushu Yu. za: hide, open a chapter, slaughter year, the Zhai Zhan accounts for the splendid, Zhao zhao, ze: zheng, coconut, zi: z, zo: zong, zhong, zhou, zu: zu Zhu, Zuo. Compound name: cy: singly, dm: end wood, df: east, dg: southeast guo, gs: grandchild, gl: fair, beams (unique repeated name), hf (or hp): huangpu, ng: south uterus, nm: south door, oy: europe, sg: upper officer, sk: and (3) emptying, sm: and (2) horse, st: apprentice, xh: summer, xm: siemens, zl: bell-jar, zs: zhongsun, zg: radix Brassicae Rapae. And others: qt: other surnames, wz: alien, wg: foreigners. For another example: ai: ai, an: an, ao: and (3) a chelate. ba: bar, bai: white cypress, ban: class, bao: baobao, bei: shell, bi: after all, bie: in other words, bin: side benzyl, guest, bo: bobo, bu: bu step. cag: cang, chang, cai: firewood, cai, cao: cao, chaulmoogra, ce: vehicle, ceg: the equation is as follows: \ 35852j, ci: pond lag, cog: chong, from the plexus, cui: to the side, cu: and (4) storing clearly. da: to, dag: party, dai: wear, dan: red lead, deg: deng, di: dira, dig: d, dio: cun, dog: dongdong, dou: sinus dou, du: plug, dun: section, dun, duo: much more. e: and (4) a jaw. fag: square house, fan: fanfan, feg: von fengfengfeng, fei: fee, fu: fu Zi Fu V. gai: cover, gao: ganjang, gao: high tea 37084, ge: ge, geg: gunn, gog: gong, gou: gou, gu: gu, gui: cinnamon, gun: closing tube, guo: guo is too old. 
ha: ha, hag: hangzhou, hai: sea, han: korean, hao: hao, he: what sum, hei: black, hog: red flood spreading, hou: wait for thick, hu: hu, hua: flower, hug: yellow, huo: and (6) carrying out Huo fire. ji: jiji book season, jia: jia, jie: section, jig: jiangjiang, Jingjing, jin: brief, jin Jian jin, jio: coke, jv (orju): ju. kag: kakang Gao, kai: kai, kan: leather, ke: koehne, kog: pore space, kou: kou, kug: kuang carry out the principle, kui: kuai are provided. And lag: lang, lai: lysine, lan: blue, lao: old man, leg: cold, lei: thunder, li: plum liqour, lig: beam, slush, lin: lian, forest iris, lio: liao, liu: liu, log: long, lou: building, Lo, lu: lulu luu, lun: luan, luo: luo, lv (or lu): lu Ling. ma: horse linen, mag: shikimic, mai: buying wheat, man: full, mao: fescue, meg: monton, mei: plum, mi: rice gruel, mig: min, Min: min, mio: miao, mo: mo, mou: mu, mu: and (4) performing nomadic. And na: that, nan: south, neg: energy, ni: ni, nie: ne, nig: ning, nin: year, niu: cow button, nog: and (5) farming. ou: the European region. pag: pont, pan: pan, peg: pengli, pei: pei, pi: and (2) performing Pisi (Pisi): pupu. qi: qichiqian, qig: strong, paniculate swallowwort root, qin: money, qin qinzhong, qio: bridge, qiu: qiu fur venge, qun: full weight, qv (or qu): buckling Qu. ran: ran, ren: any, rog: rong and Rong: raney, rui: and (3) poisonous buttering. sa: sa, sand, sag: mulberry, trade dress, sai: race, san: shan singly, sao: shaoshao, se: residual, seg: rope flourishing, sen: sen, Shen cautious, si: stewart scholar, ss, sog: song song, sou: longevity, su: sudor millet, shu, sug: bis, sui: general, inert, water tax, sun: sun, suo: and (4) cable. tag: tang soup, tan: tan chatting, tao: pottery peach, teg: teng, tie: iron, tin: field, tog: \20319child, tu: and (6) coating the meat on the poultry. wag: wang, wan: ten thousand times over, weg: wei, the Chinese-character Wei: weiweiweiwei jeopard, wen: wen wen, wu: wuwuwuwuwuwu Wuwuwu. 
xi: xi, xia: summer, xie: thank metabolism, xig: paradox, chenchencheng, xin: camp, octyl, xio: xiaoxiao, xue: chef, xun: xuan, Xun, xv (or xu): xu. And yag: yang of poplar and sheep, yan: yanyanyan, yao: yao, ye: leaf, yi: yi, yig: to be, yin: yin Yi Yin Yiyin, yog: yongyong, you: particularly, yue: yue le, yun: yuanyuan Yuan, Yuan volatile, yv (or yu): yushu Yu. And (4) zag: zang, zhang chapter, zai: slaughtering, Zhai, zan: zhanzhao Zhan, zao Zhao zhao, zeg: zheng, zen: screening, zi: resource, branch, zog: zong, zhong, zou: zhou, zu: ancestor, Zhuzhu, zug: zhuang, zuo: left, tall. Compound name: cy: singly, dm: end wood, df: east, dg: southeast guo, gs: grandchild, gl: fair, beams (unique repeated name), hf (or hp): huangpu, ng: south uterus, nm: south door, oy: europe, sg: upper officer, sk: and (3) emptying, sm: and (2) horse, st: apprentice, xh: summer, xm: siemens, zl: bell-jar, zs: zhongsun, zg: radix Brassicae Rapae. And others: qtx: other last names, wzr: alien, wgr: foreigners. Several short term coding schemes: all-while-all 1wsry please return to office 2qhbgs
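The classified short-message codes such as "1wsry" and "2qhbgs" can be read as one category digit followed by a letter code. The sketch below only illustrates that reading: the category labels are abbreviated from the ten types listed in the description, and the two-entry phrase book and the function name are assumptions for the example, not the full code table.

```python
# Category digit -> abbreviated label of the ten types in the description.
CATEGORIES = {
    "0": "paging notices and system notices",
    "1": "congratulatory and polite phrases",
    "2": "request phrases",
    "3": "situation notices, times, places and other nouns",
    "4": "main place names",
    "5": "main units and entertainment places",
    "6": "restaurants, shopping malls and hotels",
    "7": "public services",
    "8": "institutions and out-of-province offices",
    "9": "stadiums and others",
}

# Tiny hypothetical phrase book built from the two examples in the text.
PHRASES = {
    "wsry": "everything as one wishes",
    "qhbgs": "please return to office",
}


def decode(message):
    """Split a classified message into (category label, letter code, meaning)."""
    category, code = message[:1], message[1:]
    if category not in CATEGORIES:
        raise ValueError("message does not start with a category digit")
    return CATEGORIES[category], code, PHRASES.get(code, "(not in phrase book)")


print(decode("1wsry"))
print(decode("2qhbgs"))
```

The category digit narrows the phrase book that has to be searched, which is presumably why the classification schemes are offered alongside the unclassified ones.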

… …

(Omitted here.)

… … Also included: Happy New Year 1xnh; please call back the station 0qft; everything as one wishes wsry; please return to the office qhbgs

… …

(Omitted here.)

… … Happy New Year xnh; please call back the station qft

References:

[1] Feng Zhiwei, Entropy of Chinese characters, in Quantitative Analysis of Modern Chinese, Shanghai Education Press, pp. 267-278.

[2] P. F. Brown et al., An Estimate of an Upper Bound for the Entropy of English, Computational Linguistics, Vol. XX, pp. 31-40.

[3] C. Shannon, Prediction and Entropy of Printed English, Bell System Technical Journal, vol. 30, pp. 50-64.

[4] Wu Jun, Research and implementation of a pinyin-based Chinese speech understanding method, master's thesis, Tsinghua University, advisor: Wang Zuoying, 1993.6.

[5] Wu Jun et al., Chinese speech understanding and pinyin-to-character conversion by statistical methods, Third National Conference on Man-Machine Speech Communication, October 1994.

[6] F. Jelinek, Self-Organized Language Modeling for Speech Recognition, ICASSP '91, pp. 450-506, 1992.

[7] … et al., A language-understanding-based input method: the intelligent pinyin input method, Journal of Chinese Information Processing, vol. 10, No. 2, 1992.

Claims

1. An intelligent sound-shape code Chinese character input method is characterized in that: the Chinese character code corresponding to each Chinese character is generally composed of three phonetic codes and two shape codes; the first phonetic code in the three phonetic codes is the initial part of the phonetic code and consists of the first letter of the initial consonant of the Chinese phonetic alphabet corresponding to the Chinese character, the last two phonetic codes are the final parts of the phonetic codes and consist of the first letter and the last letter of the final sound of the Chinese phonetic alphabet corresponding to the Chinese character, if the final has only one letter, the final part of the phonetic code of the Chinese character is only one letter, if the Chinese phonetic alphabet corresponding to the Chinese character is zero initial, the first letter and the last letter of the zero initial constitute the phonetic code part corresponding to the Chinese character, if the zero initial has only one letter, the phonetic code part corresponding to the Chinese character consists of the letter; the two shape codes of Chinese character coding are the combination of Chinese character separated into single character components, side components or stroke components, and the first Chinese phonetic alphabet of the name of the first component and the last component is taken to form; when the first or last stroke and the related stroke form a single-body character, the first letter of the Chinese pinyin of the single-body character is taken as a shape code, when the first or last stroke and the related stroke only form a component, the first letter of the Chinese pinyin with the name of the component is taken as the shape code, and when the first or last stroke and the related stroke do not form the single-body character or the component, the first letter of the Chinese pinyin with the name of the stroke is taken as the shape code; in addition, the whole 
phrase is quickly input by using phrase codes, or the Chinese pinyin of Chinese characters is quickly input by using combinations of the initials and finals of the Chinese pinyin; the code elements obtained from the above phonetic codes, shape codes and phrase codes thus consist of the 26 English letters and the symbol "ü", in which "ü" is replaced by the letter V, and Chinese characters are input by pressing the corresponding letter keys on the computer keyboard in turn.
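The phonetic-code rule of claim 1 (first letter of the initial, then the first and last letters of the final, with "ü" written as V) can be sketched as below. This is an illustration only: the function names are hypothetical, and treating y and w as initials is an assumption, made because it reproduces codes that appear in the description's surname tables, such as "zog", "hug", "jio" and "wag".

```python
TWO_LETTER_INITIALS = ("zh", "ch", "sh")
ONE_LETTER_INITIALS = "bpmfdtnlgkhjqxrzcsyw"  # y and w treated as initials (assumption)


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final); 'ü' becomes 'v'."""
    s = syllable.lower().replace("ü", "v")
    if not s:
        raise ValueError("empty syllable")
    if s[:2] in TWO_LETTER_INITIALS:
        return s[:2], s[2:]
    if s[0] in ONE_LETTER_INITIALS:
        return s[0], s[1:]
    return "", s  # zero-initial syllable


def phonetic_code(syllable):
    """Return the phonetic code of at most three letters for one syllable."""
    initial, final = split_syllable(syllable)
    # First letter of the final, plus its last letter if it has more than one.
    body = final[:1] + (final[-1] if len(final) > 1 else "")
    # For a zero initial, 'final' is the whole syllable, so the same rule applies.
    return initial[:1] + body


for s in ["zhong", "huang", "jiao", "wang", "ma", "lü", "ang", "e"]:
    print(s, phonetic_code(s))
```

The sketch does not handle syllables such as "xu" where the ü is implicit in ordinary spelling; the description itself lists both "xv" and "xu" for that case.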

2. The intelligent phonetic and configurational code Chinese character input method according to claim 1, wherein: when the Chinese character is disassembled into a plurality of components, more strokes are preferentially taken, and then less strokes are taken; if the shape code letter of one component is the same as the first letter of the character sound code, the component corresponding to the shape code needs to be slightly disassembled again, and finally the first Chinese phonetic alphabet of the Chinese character name of the first component and the last component is taken as two shape codes of the character according to the assembled components; if the second shape code is the same as the first shape code, two shape codes corresponding to the first part and the second part are taken in turn.

3. The intelligent phonetic and configurational code Chinese character input method according to claim 1, wherein: when a whole phrase is input by using phrase coding, for a two-character phrase the phrase code is formed in one of two orders, either the first code of the first character, the second code of the first character, the first code of the second character and the second code of the second character, or the first code of the first character, the first code of the second character, the second code of the first character and the second code of the second character; for a three-character phrase, the first Chinese pinyin letters of the three characters are connected in sequence to form the phrase code; for a phrase of four or more characters, the phrase code is composed of the first Chinese pinyin letters of the first three characters and the first Chinese pinyin letter of the last character.
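The phrase-coding rules of claim 3 can be sketched as follows. The function name and the `interleave` flag are hypothetical, and reading "first code" and "second code" as the first and second letters of a character's code is an assumption; the three-character and four-character cases reproduce the codes "xnh" and "wsry" quoted in the description.

```python
def phrase_code(char_codes, interleave=False):
    """Build a phrase code from per-character codes (one string per character).

    Two characters:  first and second code of each character, either
                     character by character or interleaved.
    Three characters: first letter of each character.
    Four or more:     first letters of the first three characters plus
                     the first letter of the last character.
    """
    n = len(char_codes)
    if n < 2:
        raise ValueError("a phrase needs at least two characters")
    if n == 2:
        a, b = char_codes
        if interleave:
            return a[:1] + b[:1] + a[1:2] + b[1:2]
        return a[:2] + b[:2]
    if n == 3:
        return "".join(c[:1] for c in char_codes)
    return "".join(c[:1] for c in char_codes[:3]) + char_codes[-1][:1]


print(phrase_code(["zog", "guo"]))                   # zogu
print(phrase_code(["zog", "guo"], interleave=True))  # zgou
print(phrase_code(["xin", "nin", "hao"]))            # xnh
print(phrase_code(["wan", "si", "ru", "yi"]))        # wsry
```

The two-character inputs "zog" and "guo" are illustrative character codes, not examples taken from the patent.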

4. The intelligent phonetic and configurational code Chinese character input method according to claim 1, wherein: four shape codes are used for rare Chinese characters which are difficult to distinguish character pronunciation in a secondary character library, each Chinese character is firstly disassembled to obtain two parts of determined codes, so that the first two corresponding shape codes are obtained, then the parts with more strokes or the parts with more stroke orders are disassembled to obtain the second two corresponding shape codes, and finally the first two shape codes and the second two shape codes are connected to form the Chinese character codes of the four shape codes.

5. The intelligent phonetic and configurational code Chinese character input method according to claim 1, wherein: when inputting Chinese pinyin of Chinese characters by using the initial consonant and vowel combination of the Chinese pinyin, an additional keyboard is additionally arranged on a standard keyboard of a computer; wherein the additional keyboard has a 1 st row of the 1 st key on which ch is printed, a 1 st row of the 2 nd key on which sh is printed, a 1 st row of the 3 rd key on which zh is printed, a 1 st row of the 4 th key on which ai is printed, a 1 st row of the 5 th key on which an is printed, a 1 st row of the 6 th key on which ang is printed, a 1 st row of the 7 th key on which ao is printed, a 1 st row of the 8 th key on which ei is printed, a 1 st row of the 9 th key on which en is printed, a 2 nd row of the 1 st key on which eng is printed, a 2 nd row of the 2 nd key on which er is printed, a 2 nd row of the 3 rd key on which ia is printed, a 2 nd row of the 4 th key on which ian is printed, a 2 nd row of the 5 th key on which ang is printed, a 2 nd row of the 6 th key on which iao is printed, a 2 nd row of the 7 th key on which ie is printed, a 2 nd row of the 8 th key on which in is printed, a 2 nd row of the 9 th row of the 2 nd key on which ing is printed, a 3 rd row of the 3 nd key on which ig is printed, a 3 nd row of the 3 nd key on which iu is printed, the 6 th key on the 3 rd line is printed with uai, the 7 th key on the 3 rd line is printed with uan, the 8 th key on the 3 rd line is printed with nuan, the 9 th key on the 3 rd line is printed with uang, the 1 st key on the 4 th line is printed with ue, the 2 nd key on the 4 th line is printed with ui, the 3 rd key on the 4 th line is printed with un, the 4 th key on the 4 th line is printed with un, the 5 th key on the 4 th line is printed with uo, and the 6 th key on the 4 th line is printed with lu.

6. The intelligent phonetic and configurational code Chinese character input method according to claim 1, wherein: for the code elements obtained from the phonetic codes, the shape codes and the phrase codes, Chinese characters can be input by sequentially pressing the number keys corresponding to the codes on a telephone keyboard; ABC is printed on numeric key 2 of the telephone keyboard, DEF on numeric key 3, GHI on numeric key 4, JKL on numeric key 5, MNO on numeric key 6, PQRS on numeric key 7, TUV on numeric key 8, and WXYZ on numeric key 9; or UV is printed on numeric key 1 of the telephone keyboard, ABC on numeric key 2, DEF on numeric key 3, GHI on numeric key 4, JKL on numeric key 5, MNO on numeric key 6, PQR on numeric key 7, STW on numeric key 8, and XYZ on numeric key 9.
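The two key assignments recited in claim 6 can be written down as lookup tables and compared. The sketch below is illustrative; the names `STANDARD`, `ALTERNATIVE`, `invert` and `to_digits` are hypothetical, and the sample codes "wsry" and "xnh" are taken from the short-phrase examples in the description.

```python
import string

# First assignment in claim 6: letters on keys 2-9.
STANDARD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
            "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}

# Second assignment in claim 6: U and V move to key 1, three letters per other key.
ALTERNATIVE = {"1": "UV", "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
               "6": "MNO", "7": "PQR", "8": "STW", "9": "XYZ"}


def invert(layout):
    """Turn a digit -> letters table into a letter -> digit table."""
    return {letter: digit for digit, letters in layout.items() for letter in letters}


def to_digits(code, layout):
    """Return the digit keys pressed to enter a letter code on the given layout."""
    table = invert(layout)
    return "".join(table[ch] for ch in code.upper())


# Both layouts cover all 26 letters exactly once.
for layout in (STANDARD, ALTERNATIVE):
    assert sorted(invert(layout)) == list(string.ascii_uppercase)

print(to_digits("wsry", STANDARD))     # 9779
print(to_digits("wsry", ALTERNATIVE))  # 8879
print(to_digits("xnh", STANDARD))      # 964
```

In the second assignment no key carries more than three letters, which reduces the number of letter codes that collide on the same digit sequence.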