JPH11259477A

JPH11259477A - Document processing system and storage medium

Info

Publication number: JPH11259477A
Application number: JP10062683A
Authority: JP
Inventors: Koichi Nomura; 浩一野村; Etsuo Ito; 悦雄伊藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1998-03-13
Filing date: 1998-03-13
Publication date: 1999-09-24

Abstract

(57) [Abstract] [Problem] To determine the font for document information quickly and accurately. [Solution] A character code frequency pattern for each language is stored in advance. For document information input from input means 11 and 12, the code language determination means 31 of the font data processing unit 17 measures the character code frequency data of the document information and compares it with the stored character code frequency patterns. When the tendencies match or nearly match, the language of the document information is judged to be the language of that character code frequency pattern. The font determination means 32 then selects the font of that language from the fonts of the various languages stored in advance, and the document information is displayed using the selected font.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a document processing system and a recording medium for use in various computer systems that handle multiple languages, including personal computers, and in translation processing systems and the like.

[0002]

[Prior Art] In recent years, with the spread of the Internet, document processing systems such as personal computers and other computer systems have increasingly had to handle document information written in languages other than the user's native language. Document processing systems connected to the Internet deal with this in the following ways.

(1) When multilingual information retrieved from the Internet is displayed, the system identifies the language in which the information is written and displays it using the font (character form) of the identified language.

(2) In a translation processing system, which is one kind of document processing system, when the language of the retrieved information cannot be understood, translation software translates it into the native language and displays it so that the document information can be read.

[0003]

[Problems to Be Solved by the Invention] The system described in (1) above identifies the language in which the information is written and displays the information on the screen using the font of the identified language. If the language cannot be identified correctly, however, the information is displayed with the font of a different language, the characters are garbled, and the information becomes completely meaningless.

[0004] The system described in (2) above translates the retrieved information into the native language with translation software when its language cannot be understood. As a precondition, however, the language of the original information must be identified and the software that matches it must be selected. This system therefore faces the same problem of correctly identifying which of many languages the information is written in.

[0005] A machine translation system translates a language other than the native language into the native language or the like using a dictionary and grammar prepared in advance for the field of the document to be translated. Preparing field-specific dictionaries and the grammar of each language in this way improves translation accuracy. Such a system analyzes the document to determine its field and then translates it again on the basis of the determined field. When information retrieved from the Internet is translated, however, the field of a page cannot be determined until the entire page containing the information has been acquired.

[0006] The present invention has been made in view of the above circumstances, and its object is to provide a document processing system that quickly and accurately determines the language, and then decides the font, from the frequency of appearance of character codes in the document information.

[0007] Another object of the present invention is to provide a document processing system that quickly and accurately determines the language and decides the font even for information from the Internet.

[0008] A further object is to provide a document processing system that quickly and accurately determines the language of information from its own input means or from the Internet, thereby improving the efficiency and accuracy of translation processing.

[0009] The object of another invention is to provide a recording medium recording a program that determines the language from the character code appearance frequency of each language or from address information, and thereby quickly and accurately carries out font determination and translation processing for information input from the system's own input means or from the Internet.

[0010]

[Means for Solving the Problems] To solve the above problems, the inventions corresponding to claims 1 and 2 store in advance, in storage means, character code frequency patterns for a plurality of languages and font data for each language, and measure the frequency of appearance of the character codes in the character information input from the system's own input means or from a network. The measured character code appearance frequency is compared with the stored character code frequency patterns to identify the language of the document information. Once the language is known, the font of that language is selected from the stored font data of each language, and the document information is displayed.

[0011] With these means, the font that matches the language of the document information can be determined quickly and accurately using the character code frequency patterns of a plurality of languages, and garbled characters can be prevented.

[0012] The invention corresponding to claim 3 stores in advance, in storage means, the character code frequency pattern of a first language (one or more languages other than the native language) and a dictionary and grammar for translating the first language into a second language. In this state, the frequency of appearance of the character codes in the document information input from the system's own input means or from a network is measured. When the first language of the document information has been identified from the measured frequency and the character code frequency pattern of the first language, the document information is translated into the second language using the dictionary and grammar.

[0013] With these means, the language of the character codes in the document information can be grasped quickly and accurately from the character code frequency pattern of the first language, and once the language of the document information is accurately known, translation accuracy can be improved.

[0014] The inventions corresponding to claims 4 and 5 provide a recording medium recording a program that compares the character code appearance frequency of input document information with the character code appearance frequency patterns of each language stored in advance, determines the language of the document information, and then either decides the font on the basis of the determined language or identifies the language of the input document information and translates it.

[0015] The character code frequency pattern includes both connected (consecutive) character code patterns and non-connected character code patterns.

The inventions corresponding to claims 7 and 8 store reference address information for each country in advance. When address information is transmitted to a partner device connected to the network, comparing the transmitted address information with the per-country reference address information makes it easy to determine the language of the information from the partner device. The font can thus be decided accurately, and the first language can be translated accurately into the second language.
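
As an illustration of the address comparison described for claims 7 and 8, the following minimal sketch looks up the country suffix of a host name in a reference table. The table contents, the function name `language_from_address`, and the use of the top-level domain as the "address information" are assumptions made for this example; the specification does not state the form of the reference address information.

```python
from urllib.parse import urlsplit

# Hypothetical per-country reference address table (illustrative entries only).
REFERENCE_ADDRESSES = {
    "jp": "Japanese",
    "de": "German",
    "fr": "French",
    "kr": "Korean",
}


def language_from_address(url, table=REFERENCE_ADDRESSES):
    """Return the language associated with the URL's country suffix, or None."""
    host = urlsplit(url).hostname or ""      # None when the URL has no host part
    suffix = host.rsplit(".", 1)[-1].lower()  # text after the last dot
    return table.get(suffix)
```

A result of `None` (for example, for a suffix that is not country-specific) could be treated as "language unknown", leaving the decision to the frequency-based comparison.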

[0016] The inventions corresponding to claims 9 and 10 provide a recording medium recording a program that compares the address information used to access a partner device connected to the network with the per-country reference address information stored in advance, determines the language from the address information, and then either decides the font on the basis of the determined language or identifies the language of the input document information and translates it.

[0017] The invention corresponding to claim 11 stores reference address code information for each country and field in advance. When address code information that includes a field is transmitted to a partner device connected to the network, comparing the transmitted address code information with the per-country, per-field reference address code information makes it possible to determine both the language and the field of the information from the partner device. The first language can then be translated into the second language with high accuracy using the dictionary for the correct field.

[0018]

[Embodiments of the Invention] First, the basic principle used to decide the font is described with reference to FIG. 1 and FIG. 2. Statistical linguistics provides statistical data on how frequently each character code appears in each language. If the frequency of appearance is measured for each character code in the document information, the measured frequencies show a definite tendency. The language can therefore be determined from this tendency and the statistical data.

[0019] For example, suppose the code expressed as 65 in hexadecimal occurs frequently in the data. In the code system used for English this code represents "e". Checking this against the statistical fact that "e" is the most frequent character in English, the system judges that the document information is English, and either displays it using an English font or translates it with English as the first language. Similarly, the frequency of three-code sequences in the document can be measured. If the sequence "74", "68", "65" occurs often, it corresponds to the English word "the", and because "the" appears very frequently in English text, the document can be judged to be English.
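
The two measurements in paragraph [0019] can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and restricting the single-code count to the letters a to z (so that spaces and punctuation do not dominate) is an assumption the text does not state.

```python
from collections import Counter


def most_common_letter_code(data: bytes) -> int:
    """Return the code of the most frequent ASCII letter, ignoring case.

    Raises IndexError if the data contains no ASCII letters.
    """
    letters = [b for b in data.lower() if 0x61 <= b <= 0x7A]  # 'a'..'z'
    return Counter(letters).most_common(1)[0][0]


def sequence_count(data: bytes, codes: bytes) -> int:
    """Count occurrences (overlaps included) of a code sequence in the data."""
    n = len(codes)
    return sum(1 for i in range(len(data) - n + 1) if data[i:i + n] == codes)


THE = bytes([0x74, 0x68, 0x65])  # the three-code sequence for "the"
```

For an English sample, `most_common_letter_code` would be expected to return 0x65 and `sequence_count(sample, THE)` to be comparatively high; what counts as "frequent" is left open by the text and would need a threshold or a comparison against other languages.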

[0020] Referring to FIG. 1, each character in document information 1 is classified by code and the result is expressed as a 256-dimensional vector 2. The distance between vector 2 and each of the vectors 3A, 3B, 3C, ... recorded in advance for the individual languages is compared, and the language whose vector is closest is judged to be the language of the document information.
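
The vector comparison of FIG. 1 might be sketched as below. The specification does not name a distance measure or say whether the counts are normalised, so the Euclidean distance over relative frequencies used here is an assumption, as are the function names and the dictionary layout of the stored patterns.

```python
import math


def byte_histogram(data: bytes) -> list:
    """Build a 256-dimensional vector of relative code frequencies."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = len(data) or 1          # avoid division by zero for empty input
    return [c / total for c in counts]


def distance(u, v) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_language(data: bytes, patterns: dict) -> str:
    """Return the key of the stored pattern closest to the data's vector.

    `patterns` maps a language name to its 256-dimensional reference vector
    (the vectors 3A, 3B, 3C, ... of FIG. 1).
    """
    vec = byte_histogram(data)
    return min(patterns, key=lambda lang: distance(vec, patterns[lang]))
```

In practice the reference vectors would be built from large text samples of each language, for instance by applying `byte_histogram` to a corpus.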

[0021] In Unicode, which has been proposed in recent years, characters common to several languages are represented by the same code, while characters that differ are assigned separate codes, as shown in FIG. 2. The English "A" and the German "A" share the same code and are therefore used in both languages, whereas German umlauted characters, French accented characters, Hangul, hiragana, simplified Chinese characters and so on each have their own code and are used only in that language. It is therefore easy to determine whether the document information uses the codes of umlauted characters, and if it does, the document information can be judged to be German. The same applies to other languages: the language can easily be determined from the codes used. In FIG. 2, the character codes in the document information are mapped onto the Unicode table, and because umlaut character codes appear, the document is judged to be German.
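
A minimal sketch of the code-based judgement in paragraph [0021] follows. It takes the paragraph's premise at face value (umlauted letters indicate German, hiragana Japanese, Hangul Korean) and returns the first language-specific character it finds. The function name is hypothetical; the code point ranges are the Unicode Hiragana block (U+3040 to U+309F) and the Hangul Syllables range (U+AC00 to U+D7A3).

```python
UMLAUTS = "äöüÄÖÜ"


def language_from_code_points(text: str):
    """Return a language name for the first language-specific character, or None."""
    for ch in text:
        cp = ord(ch)
        if ch in UMLAUTS:
            return "German"
        if 0x3040 <= cp <= 0x309F:      # Hiragana block
            return "Japanese"
        if 0xAC00 <= cp <= 0xD7A3:      # Hangul Syllables
            return "Korean"
    return None                          # only shared characters were found
```

Text made up solely of characters shared between languages yields `None`, in which case the frequency-based comparison would still be needed.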

[0022] Using the character code appearance frequency as described above, the language, and hence the font, can be determined relatively quickly and accurately from the tendency of the code frequencies in the document information, and the first language of the input document information can also be determined.

(1) FIG. 3 is a block diagram showing an embodiment of the document processing system according to the present invention.

[0023] In this system, a document processing system main unit 4 and one or more partner computers 5, ... that hold document information in languages other than the native language are connected by a network 6. The main unit 4 is either a computer having hypertext documents created with a WWW browser, or a system that simply specifies a destination address to a partner computer and retrieves the required document from it.

[0024] The document processing system main unit 4 comprises: an input unit 11 for entering the destination (URL) address of a hypertext document screen on the partner computer 5, or a partner destination address for something other than a hypertext document; a communication control unit 12 that, in response to instructions from the input unit 11, receives and takes in document information in a language other than the native language sent from the partner computer 5; a font determination program storage unit (storage medium) 13 that records a program for deciding which language's font applies to the character codes in the document information received by the communication control unit 12; a code frequency pattern storage unit 14 that stores in advance the code frequency pattern of document information in each language; a code frequency data storage unit 15 that stores the measured frequency of appearance of each character code in the document information taken in from the communication control unit 12; a font data storage unit 16 that stores font data for a plurality of languages, with or without the native language; a font data processing unit 17, implemented by a CPU, that decides which language's font to use from the character code frequencies in the retrieved document information according to the program in the font determination program storage unit 13; a data storage unit 18 that stores intermediate and final processing data; and a display unit 19 that displays the document information using the font decided by the font data processing unit 17.

[0025] The input unit 11 is used to enter click instructions on a hypertext document screen, address data of the partner computer 5, and the like. It is typically a keyboard, OCR device, tablet, mouse, light pen, or FDD, and provides the function of taking in document information in a language other than the native language by clicking a destination link (URL) on a hypertext document screen or by entering the partner's address data. Document information may also be taken in by connecting the input unit 11 itself to a computer or by inserting an FD into the input unit 11.

[0026] The font determination program storage unit 13 is, for example, a CD-ROM, but a magnetic tape, DVD-ROM, floppy disk, MO, MD, CD-R, memory card, or the like may be used instead.

[0027] As shown in FIG. 4, the font data processing unit 17 comprises: code language determination means 31, which measures the frequency of appearance of the character codes in the document information taken in from the input unit 11 or the network 6 and compares the measured frequencies with the character code frequency pattern of each language to determine the language of the document information; font determination means 32, which selects the font for the document information from the per-language fonts in the font data storage unit 16 on the basis of the language determined by the code language determination means 31; and display control means 33, which displays the document information using the selected font data.

[0028] The storage units 13 to 16 and 18 may each be a separate storage medium, or any number of storage media may be partitioned into areas and shared. Next, the operation of the system and storage medium described above is explained with reference to FIG. 5.

[0029] First, the code frequency pattern storage unit 14 stores in advance code frequency patterns in which the character codes of document information in each language are expressed as 256-dimensional vectors, and the font data storage unit 16 stores the font data of each language.

[0030] When operation starts in this state, the font data processing unit 17 reads the font determination program from the storage medium 13 and executes the processing described below. After performing initialization (S1), the font data processing unit 17 executes the code language determination means 31 when document information is input from the input unit 11 or the communication control unit 12 (S2).

[0031] The code language determination means 31 measures the character code appearance frequency by classifying each character code in the document information into the corresponding element of the 256-dimensional vector and storing the counts sequentially in the code frequency data storage unit 15 (S3). It then compares the code frequency data stored in the code frequency data storage unit 15 with the reference character code frequency pattern of each language stored in the code frequency pattern storage unit 14 and judges whether the tendencies match (S4). If they do not match, processing moves to step S5, which checks whether all of the document information has been read. If reading is not complete, processing returns to step S3 and repeats; if it is complete, the document information that was read is displayed on the display unit 19 (S6) and the operator is prompted to decide the font (S7). If, on the other hand, the tendency matches or nearly matches a code frequency pattern in step S4, the language of that pattern is judged to be the language of the document information.
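
The loop of steps S3 to S7 can be sketched as follows: counts are accumulated portion by portion, and the comparison is repeated after each portion so that a decision can be reached before the whole document has been read. The chunked input, the Euclidean distance, and the `threshold` that stands in for "matches or nearly matches" are assumptions for illustration; the specification does not define the matching criterion.

```python
import math


def detect_incrementally(chunks, patterns, threshold=0.1):
    """Return the matched language, or None if no pattern matched.

    chunks   -- iterable of bytes objects (successive portions of the document)
    patterns -- dict mapping language name to a 256-element reference vector
    """
    counts = [0] * 256                           # code frequency data (unit 15)
    for chunk in chunks:
        for b in chunk:                          # S3: count each character code
            counts[b] += 1
        total = sum(counts) or 1
        vec = [c / total for c in counts]
        for lang, pattern in patterns.items():   # S4: compare with each pattern
            d = math.sqrt(sum((a - p) ** 2 for a, p in zip(vec, pattern)))
            if d <= threshold:
                return lang                      # tendency matches: language found
        # S5: not matched yet; continue with the next portion if any remains
    return None                                  # S6/S7: leave the choice to the operator
```

A return value of `None` corresponds to the branch in which the document is displayed and the operator is prompted to decide.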

[0032] Once the language has been determined, the font determination means 32 is executed. On the basis of the language determined by the code language determination means 31, it selects the font of that language from the per-language font data stored in the font data storage unit 16 (S8). The display control means 33 then displays the document information on the display unit 19 using the selected font (S9, S10).

[0033] According to this embodiment, the character code appearance frequency of the characters in the document information is compared with reference code frequency patterns obtained in advance by the methods of statistical linguistics, the language is determined from the reference pattern with the closest tendency, and each character in the document information is rendered and displayed accordingly. Garbling caused by using the wrong font is thereby eliminated, and meaningful information can be displayed reliably.

[0034] Furthermore, by recording the above series of processing steps as a program on the recording medium 13 in advance and having the font data processing unit 17 read it, the processing shown in FIG. 5 can be executed, the font of the document information can be decided quickly and reliably, and the document information can be displayed accurately without garbling.

(2) Embodiment of a translation processing system, which is one kind of document processing system.

[0035] FIG. 6 is a block diagram showing an embodiment of the translation processing system. Because this system performs translation using the configuration of FIG. 3, parts identical to those in FIG. 3 are given the same reference numerals and their detailed description is omitted.

[0036] This system differs in that it is provided with: a translation processing program storage unit (recording medium) 36, which stores a program for performing the processing shown in FIG. 7, including the series of steps shown in FIG. 5; a dictionary and grammar storage unit 37, which stores dictionary data for translating languages other than the native language into the native language together with grammar data for each language; and a translation processing unit 38, implemented by a CPU, which reads the program from the recording medium 36 and executes the series of steps shown in FIG. 7.

[0037] Next, the operation of the translation processing system is described. When operation starts, the translation processing unit 38 reads the translation processing program from the storage medium 36 and executes the processing described below. After performing initialization (S11), the translation processing unit 38 executes the code language determination means 31 shown in FIG. 4 when document information is input from the input unit 11 or the communication control unit 12 (S12).

[0038] The code language determination means 31 measures the character code appearance frequency by classifying each character code in the document information into the corresponding element of the 256-dimensional vector and storing the counts sequentially in the code frequency data storage unit 15 (S13). It then compares the character code frequency data stored in the code frequency data storage unit 15 with the reference character code frequency pattern of each language stored in the code frequency pattern storage unit 14 and judges whether the tendencies match (S14). If they do not match, processing moves to step S15, which checks whether all of the document information has been read. If reading is not complete, processing returns to step S13 and repeats; if it is complete, the document information that was read is displayed on the display unit 19 (S16) and the operator is prompted to decide the language of the document information (S17). If, on the other hand, the tendency matches or nearly matches a character code frequency pattern in step S14, the language is determined from that code frequency pattern.

[0039] Once the language has been determined, translation processing is executed (S18, S19). For the document information, this processing selects translated words from dictionary data for translating the determined language into the native language, or into a language other than the native language, and translates using grammar data that describes the target language.

[0040] According to this embodiment, the language is determined by comparing the character code appearance frequency of the document information with the character code frequency patterns of each language. The language of the document information can therefore be determined quickly and accurately without receiving the entire page, and the document can be translated with high accuracy into the native language or any desired language.

[0041] Furthermore, if the above series of processing steps is recorded as a program on the recording medium 36 and read by the translation processing unit 38, the language can be determined from the frequency of appearance of the character codes in the document information, and the document information can be translated with high accuracy into a desired language, including the native language.

[0042] Note that, as the character code frequency pattern storage unit 14, a concatenated code frequency pattern storage unit 41 may be provided, for example as shown in FIG. 8, to store concatenated character code frequency patterns as described in the basic principle. For example, multi-code concatenation patterns that appear relatively often in documents of each language may be stored, such as the three-code sequence "74", "68", "65" corresponding to "the", which appears relatively often in English documents. The code language determining means 31 then measures the appearance frequency of each character code in the input English document information and, when the frequency of the concatenated codes "74", "68", "65" is high, determines that the document information is English. The language can therefore be determined quickly, the font can be decided from that language, and translation processing can proceed. As a result, even when document information is translated, it can be translated into the desired language relatively quickly and accurately.
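
A minimal Python sketch of the concatenated-code check follows. It reads the codes "74", "68", "65" as hexadecimal ASCII values (0x74, 0x68, 0x65 are the ASCII codes of "t", "h", "e"). The function names and the `min_ratio` cutoff are hypothetical; the text only says the frequency should be "high" and gives no numeric value.

```python
from collections import Counter

# The three-code sequence from the text, read as hexadecimal ASCII: "the".
THE = bytes([0x74, 0x68, 0x65])


def trigram_counts(data: bytes) -> Counter:
    """Count every run of three consecutive character codes."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def looks_english(data: bytes, min_ratio: float = 0.02) -> bool:
    """True when the 'the' trigram makes up at least min_ratio of all
    trigrams. The cutoff is an assumed value for illustration only."""
    counts = trigram_counts(data)
    total = sum(counts.values())
    return total > 0 and counts[THE] / total >= min_ratio


if __name__ == "__main__":
    print(looks_english(b"the cat and the dog saw the bird"))
```

A practical detector would store several such sequences per language rather than a single one, as the phrase "and the like" in the paragraph suggests.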

[0043] Needless to say, the code frequency pattern storage unit 14 or 41 may store both a code frequency pattern in which the character codes of the document information of each language are expressed as a 256-dimensional vector and a concatenated character code frequency pattern.

(3) Another embodiment of the document processing system and recording medium.

[0044] In general, when Internet information is displayed or translated, address information is often used to identify the field of that information. For example, when accessing the address xxxx.yyy.uk, "uk" is the country code of the United Kingdom, so the address itself shows that English information is being requested. Similarly, when accessing the address aaaa.bbbb.it, "it" is the country code of Italy, so it can be seen that Italian information is being requested.
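
The country-code lookup described above can be sketched as follows. The table holds only the two codes mentioned in the text; the names `COUNTRY_CODE_LANGUAGE` and `language_from_address` are hypothetical.

```python
from typing import Optional

# Only the two country codes given in the text; a real table would be larger.
COUNTRY_CODE_LANGUAGE = {
    "uk": "English",
    "it": "Italian",
}


def language_from_address(address: str) -> Optional[str]:
    """Return the language implied by the last label of a host name,
    or None when the label is not in the reference table."""
    host = address.strip().rstrip(".").lower()
    last_label = host.rsplit(".", 1)[-1]
    return COUNTRY_CODE_LANGUAGE.get(last_label)


if __name__ == "__main__":
    for addr in ("xxxx.yyy.uk", "aaaa.bbbb.it"):
        print(addr, language_from_address(addr))
```

A country code is only a hint: a site under a given country code can publish in any language, so a practical system might fall back to content-based detection when the hint and the received text disagree.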

[0045] This embodiment determines the language of the requested information from an address of the kind described above and uses it to decide the font or to perform translation processing. FIG. 9 is a functional configuration diagram showing an embodiment of the main part of such a document processing system. The overall configuration of the document processing system is substantially the same as in FIG. 3, so a detailed description is omitted.

[0046] In this document processing system, the code frequency pattern storage unit 14 is replaced by a reference address information storage unit 50, which stores address information related to each country that is used relatively often in address information. The font data processing unit 17 is provided with address language determining means 51, font determining means 52, and display control means 53. The rest of the configuration is substantially the same as in FIG. 3.

[0047] Next, the operation of this system will be described with reference to FIG. 10. First, address information related to each country is stored in advance in the reference address information storage unit 50, and font data for each language is stored in the font data storage unit 16.

[0048] When operation starts in this state, the font data processing unit 17 reads the font determination program from the storage medium 13 and executes the predetermined processing described below. That is, after executing initialization processing (S21), the font data processing unit 17 executes the address language determining means 51.

[0049] The address language determining means 51 determines whether the operator has input address information for accessing the partner computer 15 (S22). When it determines that address information has been input, it determines whether any part of this address information matches the reference address information (S23). If there is no match, it displays an error (S24) or enters a standby state to receive the document information sent from the partner computer 15.

[0050] When the address information matches the reference address information, the language is determined to be that of the country of the matching reference address information, and the font determining means 52 is executed. Based on the language determined by the address language determining means 51, the font determining means 52 selects the font of the determined language from the per-language font data stored in the font data storage unit 16 (S25) and stores it in the data storage unit 18 or the like.

[0051] In this state, when document information is sent from the partner computer 5 in response to the address information and is received by the communication control unit 12, the display control means 53 displays the received document information on the display unit 19 using the selected font (S26).
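
Steps S22 to S26 above can be sketched as a small Python flow. Everything here is illustrative: the table contents, the font names, the "jp" entry, and the names `decide_font`, `display`, and `AddressNotRecognized` are hypothetical stand-ins for the reference address information storage unit 50, the font data storage unit 16, and the display control means 53.

```python
from typing import Optional

# Stand-in for reference address information (unit 50); entries are assumed.
REFERENCE_ADDRESSES = {"uk": "English", "it": "Italian", "jp": "Japanese"}

# Stand-in for per-language font data (unit 16); font names are made up.
FONTS = {
    "English": "latin-font",
    "Italian": "latin-font",
    "Japanese": "japanese-font",
}


class AddressNotRecognized(Exception):
    """Raised where the flow chart shows the error display (S24)."""


def decide_font(address: Optional[str]) -> Optional[str]:
    """Pick a font from the address alone, before any document arrives."""
    if not address:                              # S22: no address entered yet
        return None
    suffix = address.lower().rstrip(".").rsplit(".", 1)[-1]
    language = REFERENCE_ADDRESSES.get(suffix)   # S23: compare with reference
    if language is None:
        raise AddressNotRecognized(address)      # S24: error display
    return FONTS[language]                       # S25: select the font


def display(document: str, font: str) -> str:
    """S26: render the received document with the chosen font (simulated)."""
    return "[{}] {}".format(font, document)


if __name__ == "__main__":
    font = decide_font("aaaa.bbbb.it")
    print(display("Benvenuti", font))
```

The point of the sketch is the ordering: the font is fixed at S25, so the document can be rendered as soon as it is received at S26.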

[0052] Therefore, according to the embodiment described above, the language of the requested document information can be determined easily from the address information used to access the partner computer 5, and the font can be decided quickly. Moreover, displaying each character in the document information with the decided font eliminates the garbling caused by a mismatched font, so meaningful information can be displayed reliably.

(4) Another embodiment of the translation processing system.

[0053] FIG. 11 is a configuration diagram showing an embodiment of the translation processing system. The overall configuration of the document processing system is substantially the same as in FIG. 3, so a detailed description is omitted.

[0054] As in FIG. 9, this translation processing system is provided with a reference address information storage unit 50 that stores address information related to each country. It compares this reference address information with the address information input from the input unit 11 to call the partner computer 15, determines which country's language is indicated, treats that language as the language of the document information received from the partner computer 15, and performs translation processing accordingly.

[0055] The translation processing unit 38 of this translation processing system reads the translation processing program from the recording medium 36 and realizes the following functions. That is, the translation processing unit 38 comprises: address language determining means 51, which compares the address information of the partner computer 15 input from the input unit 11 with the reference address information related to each country stored in the reference address information storage unit 50 and determines the language of the address information of the partner computer 15; translation processing means 54, which takes the document information that is sent from the partner computer 15 in response to this address information and stored in the temporary network information storage unit 18a, and translates it using the dictionary and grammar in the dictionary/grammar storage unit 37 based on the language determined by the address language determining means 51; and display control means 53, which stores the translation result in the temporary translation result data storage unit 18b and then displays it on the display unit 19. The rest of the configuration is the same as in FIGS. 6 and 9.

[0056] Next, the operation of this system will be described with reference to FIG. 12. Address information related to each country is stored in advance in the reference address information storage unit 50, and font data for each country's language is stored in the font data storage unit 16.

[0057] When operation starts in this state, the translation processing unit 38 reads the translation processing program from the storage medium 36 and executes the predetermined processing described below. That is, after executing initialization processing (S31), the translation processing unit 38 executes the address language determining means 51. When the operator sends address information for accessing the partner computer 15 from the input unit 11 to the network 6 via the communication control unit 12, the determining means 51 determines that address information is present (S32), compares this address information with the per-country reference address information in the reference address information storage unit 50, and, if it matches the reference address information for a certain country's language, determines that the language is that country's language (S33).

[0058] The document information sent from the partner computer 15 is stored in the temporary network information storage unit 18a. If there is a translation instruction (S34), the translation processing means 54 retrieves the document information from the network information storage unit 18a, executes translation processing using the dictionary data and grammar data based on the language determined by the address language determining means 51 (S35), and stores the translation results sequentially in the translation result data storage unit 18b.

[0059] After that, it is determined whether translation is complete, and if so, the display control means 53 is executed. That is, the translation result data stored in the translation result data storage unit 18b is read out and displayed on the display unit 19 (S37). Alternatively, the translation result may be displayed on the display unit 19 each time a result is produced.

[0060] Therefore, according to the embodiment described above, reference address information for each country is stored in advance, and when address information for accessing the partner computer 15 is received from the input unit 11, the per-country reference address information is consulted to decide the language of the document information sent from the partner computer 15. The language can therefore be determined promptly before the document arrives from the partner computer 15, and translation can be performed quickly and with high accuracy on the basis of the correct language.

(5) Still another embodiment of the translation system.

[0061] In general, machine translation accuracy improves when translation uses dictionaries, grammars, and the like suited to the field of the document information being translated. Conventional techniques analyze the document to be translated, determine its field, and then translate it again. However, when translating Internet information, for example, the field cannot be known or determined until the information has been acquired.

[0062] Therefore, in this embodiment, address information such as xxxx.yyy.go.jp or aaaa.bbb.gov indicates a government agency, so the information can be judged to belong to the political and economic field. For the word "president" on such a site, for example, it is effective to give priority to the translation 大統領 (head of state). As a concrete example, for the site www.whitehouse.gov a political-field dictionary is used and "president" is translated as 大統領, whereas for www.toshiba.co.jp a business-field dictionary is used and it is translated as 社長 (company president).
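
The field-dependent word choice above can be sketched in Python as follows. The suffix table and the one-word dictionaries contain only what the text states; the names `FIELD_SUFFIXES`, `FIELD_DICTIONARIES`, `field_from_address`, and `translate_word` are hypothetical.

```python
from typing import Optional

# Address suffix -> field, using only the examples given in the text.
FIELD_SUFFIXES = [
    (".go.jp", "politics"),
    (".gov", "politics"),
    (".co.jp", "business"),
]

# One-entry field dictionaries; real field dictionaries would be far larger.
FIELD_DICTIONARIES = {
    "politics": {"president": "大統領"},
    "business": {"president": "社長"},
}


def field_from_address(address: str) -> Optional[str]:
    """Return the field implied by the address suffix, or None."""
    host = address.lower().rstrip(".")
    for suffix, field in FIELD_SUFFIXES:
        if host.endswith(suffix):
            return field
    return None


def translate_word(word: str, address: str,
                   default: Optional[str] = None) -> Optional[str]:
    """Look the word up in the dictionary of the field implied by the address."""
    field = field_from_address(address)
    return FIELD_DICTIONARIES.get(field, {}).get(word.lower(), default)


if __name__ == "__main__":
    for site in ("www.whitehouse.gov", "www.toshiba.co.jp"):
        print(site, translate_word("president", site))
```

As with the language hint, the field is known before any content is received, so the field dictionary can be selected in advance rather than after a first analysis pass.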

[0063] FIG. 13 is a configuration diagram showing another embodiment of a translation processing system, one that determines the field from the address information described above. This translation processing system is provided with a field reference data storage unit 61 that stores field reference data representing the fields of each country. The translation processing unit 38 is provided with: field determining means 62, which compares the address information input from the input unit 11 with the field reference data to determine the field; translation processing means 63, which, when translating the document information sent from the partner computer 5 in response to the address information input from the input unit 11, selects a dictionary based on the field determination data and translates the document information; and display control means 64, which displays the translation result data on the display unit 19.

[0064] The other components are the same as in FIG. 11, so their description is omitted here. Next, the operation of this system will be described with reference to FIG. 14. Field reference data for identifying the fields of each country is stored in advance in the field reference data storage unit 61, and field-specific dictionaries for each country and grammar data for each country's language are stored in the dictionary/grammar storage unit 37.

[0065] When operation starts in this state, the translation processing unit 38 reads the translation processing program from the storage medium 36, executes initialization processing (S41), and then executes the field determining means 62. When the operator sends address information for accessing the partner computer 15 from the input unit 11 to the network 6 via the communication control unit 12, the field determining means 62 determines that address information is present (S42), compares this address information with the field reference data of each country in the field reference data storage unit 61, and, when it matches certain field reference data (S43), determines that the address information belongs to the field of that country's field reference data.

[0066] The document information sent from the partner computer 15 is stored in the temporary network information storage unit 18a. If there is a translation instruction (S44), the translation processing means 63 retrieves the document information from the network information storage unit 18a, selects a specific field dictionary based on the field determined by the field determining means 62, retrieves the grammar data for the country of that field, executes translation processing (S45), and stores the translation results sequentially in the translation result data storage unit 18b.

[0067] After that, it is determined whether translation is complete (S46), and if so, the display control means 53 reads out the translation result data stored in the translation result data storage unit 18b and displays it on the display unit 19 (S47). Alternatively, the translation result may be displayed on the display unit 19 each time a result is produced.

[0068] Therefore, according to the embodiment described above, field reference data for each country is stored in advance, and when address information for accessing the partner computer 15 is received from the input unit 11, the field reference data is consulted to determine the field of the document information sent from the partner computer 15. The country's language and the field can therefore be determined promptly before the document arrives from the partner computer 15, and translation can be performed quickly and with high accuracy on the basis of the correct language and field.

【００６９】[0069]

[Effects of the Invention] As described above, the present invention provides the following effects. According to the inventions of claims 1 and 2, by referring to the character code appearance frequency of the input document information and the character code frequency patterns of predetermined languages, the language of the document information can be determined quickly and accurately, and the font can be decided quickly and accurately from that language. Garbled characters in the document information can thus be eliminated, and meaningful document information can be displayed.

[0070] According to the invention of claim 3, the language is determined from the character code appearance frequency of the input document information before translation processing is performed, so the language can be determined accurately without receiving the entire page of the document information, and the efficiency and accuracy of translation processing can be improved.

[0071] According to the invention of claim 4, a recording medium can be provided on which is recorded a program that finds the language from the character code appearance frequency of the input document information and decides the font.

[0072] According to the invention of claim 5, a recording medium can be provided on which is recorded a program that finds a language from the character code appearance frequency of the input document information, treats it as the language of the document information, and performs translation processing.

[0073] According to the invention of claim 6, using a concatenated code frequency pattern as the character code frequency pattern allows the font to be decided, and translation processing to be performed, even more quickly and accurately.

[0074] According to the invention of claim 7, the language of the document information sent from the partner device is determined from the address information used to call that device, so the font can be decided quickly and accurately. Garbled characters in the document information can thus be eliminated, and meaningful document information can be displayed.

[0075] According to the invention of claim 8, the language of the document information sent from the partner device is determined from the address information used to call that device, so the language can be determined accurately before the document information is received, and the efficiency and accuracy of translation processing can be improved.

[0076] According to the invention of claim 9, a language is found from the address information used to call the partner device and is identified as the language of the document information sent from that device, so the font can be decided quickly and accurately. A storage medium can thus be provided that eliminates garbled characters in the document information and allows meaningful document information to be displayed.

[0077] According to the invention of claim 10, a language is found from the address information used to call the partner device, is identified as the language of the information sent from that device, and translation processing is performed accordingly, so the efficiency and accuracy of translation processing can be improved.

[0078] According to the invention of claim 11, translation accuracy can be improved by identifying the field from the address information used to call the partner device and selecting and using a field-specific dictionary. According to the invention of claim 12, the language is determined from the character code appearance frequency of the document information or from the address information, and the document can be translated quickly and accurately into a required language, including the native language.

[Brief description of the drawings]

FIG. 1 is a diagram explaining the relationship between the character code frequency pattern of each language and the appearance frequency of character codes in document information, illustrating the basic principle of the system of the present invention.

FIG. 2 is a diagram explaining the relationship between each language and its codes, illustrating the basic principle of the system of the present invention.

FIG. 3 is a configuration diagram showing an embodiment of a document processing system according to the present invention.

FIG. 4 is a functional configuration diagram of the font data processing unit shown in FIG. 3.

FIG. 5 is a flowchart explaining the operation of the system shown in FIG. 3.

FIG. 6 is a configuration diagram showing another embodiment of the system of the present invention, applied to a translation processing system.

FIG. 7 is a flowchart explaining the operation of the system shown in FIG. 6.

FIG. 8 is a diagram showing a configuration example in which a concatenated character code frequency pattern is used as the character code frequency pattern.

FIG. 9 is a functional configuration diagram showing another embodiment of the part of the document processing system according to the present invention that relates to the font data processing unit.

FIG. 10 is a flowchart explaining the operation of the font data processing unit shown in FIG. 9.

FIG. 11 is a functional configuration diagram showing another embodiment of the part of the system of the present invention, applied to a translation processing system, that relates to the translation processing unit.

FIG. 12 is a flowchart explaining the operation of the translation processing unit shown in FIG. 11.

FIG. 13 is a functional configuration diagram showing another embodiment of the part of the system of the present invention, applied to a translation processing system, that relates to the translation processing unit.

FIG. 14 is a flowchart explaining the operation of the translation processing unit shown in FIG. 13.

[Explanation of symbols]

1 ... Document information
2 ... Character code frequency pattern
5 ... Partner computer
6 ... Network
13 ... Storage medium
14 ... Code frequency pattern storage unit
15 ... Code frequency data storage unit
16 ... Font data storage unit
17 ... Font data processing unit
31 ... Code language determining means
32, 52 ... Font determining means
33, 53, 64 ... Display control means
36 ... Dictionary/grammar storage unit
38 ... Translation processing unit
41 ... Concatenated code frequency pattern storage unit
50 ... Reference address information storage unit
51 ... Address language determining means
54, 63 ... Translation processing means
61 ... Field reference data storage unit
62 ... Field determining means

Claims

[Claims]

1. A document processing system, comprising: means for measuring the frequency of appearance of character codes in input document information, and determining a font of the document information or a language of any country.

2. A document processing system comprising: code frequency pattern storage means for storing character code frequency patterns of a plurality of languages; font data storage means for storing font data of a plurality of languages; code language determining means for measuring the character code appearance frequency of document information input from the system's own input means, or from another device connected to a network via communication control means, comparing the measured character code appearance frequency with the character code frequency patterns, and determining the language of the document information; and font determining means for selecting and determining a font from the font data storage means based on the language determined by the code language determining means, the document information being displayed using the determined font.

3. A document processing system comprising: code frequency pattern storage means for storing a character code frequency pattern of a first language; code language determining means for measuring the character code appearance frequency of document information input from the system's own input means, or from another device connected to a network via communication control means, comparing the measured character code appearance frequency with the character code frequency pattern, and determining the language of the document information; dictionary/grammar storage means for storing dictionary data and grammar data for translating the first language into a second language; and translation processing means for translating the document information into the second language using the dictionary data and the grammar data when the code language determining means determines that the document information is in the first language.

4. A computer-readable recording medium on which is recorded a program for causing a font data processing computer, which determines the font of input document information, to realize: a code language determining function of measuring the character code appearance frequency of the input document information and comparing the measured character code appearance frequency with prestored character code frequency patterns of each language to determine the language of the document information; a font determining function of selecting and determining, based on the language determined by the code language determining function, desired font data from prestored font data for each language; and a display control function of displaying the input document information using the font data determined by the font determining function.

5. A computer-readable recording medium on which is recorded a program for causing a translation processing computer, which translates input document information, to realize: a code language determining function of measuring the character code appearance frequency of the input document information, comparing the measured character code appearance frequency with character code frequency patterns of the respective languages stored in advance, and determining the language of the document information; a translation processing function of translating the document information using a dictionary and grammar for translating a first language into a second language when the language determined by the code language determining function is the first language; and a display control function of displaying the translation result produced by the translation processing function.

6. The document processing system, including the storage medium, according to claim 1 or any one of the above claims, wherein the code frequency pattern is either a concatenated character code frequency pattern alone or a combination of a concatenated character code frequency pattern and a non-concatenated character code frequency pattern.
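As a rough illustration of the two pattern types, the sketch below treats a "concatenated" pattern as the frequency of adjacent character pairs and a "non-concatenated" pattern as the frequency of single characters. This reading, the histogram-intersection score, the equal weighting, and all names are assumptions, not details taken from the claim.

```python
from collections import Counter
from typing import Dict


def single_code_frequency(text: str) -> Dict[str, float]:
    """Non-concatenated pattern: relative frequency of single characters."""
    counts = Counter(text)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()} if total else {}


def concatenated_code_frequency(text: str, n: int = 2) -> Dict[str, float]:
    """Concatenated pattern: relative frequency of runs of n adjacent characters."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def overlap(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Histogram intersection: 1.0 for identical tables, 0.0 for disjoint ones."""
    return sum(min(v, b.get(k, 0.0)) for k, v in a.items())


def combined_score(text: str,
                   single_pattern: Dict[str, float],
                   concat_pattern: Dict[str, float],
                   weight: float = 0.5) -> float:
    """Blend the concatenated and non-concatenated comparisons.

    weight=1.0 uses the concatenated pattern alone.
    """
    concat = overlap(concatenated_code_frequency(text), concat_pattern)
    single = overlap(single_code_frequency(text), single_pattern)
    return weight * concat + (1.0 - weight) * single
```

Setting `weight=1.0` corresponds to using the concatenated pattern alone, and intermediate values correspond to the combined case.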

7. A document processing system comprising: input means for inputting address information for calling a counterpart device connected to a network; communication control means for transmitting the address information input from the input means to the network and receiving information transmitted from the counterpart device on the network; reference address information storage means for storing reference address information of each language in advance; address language determining means for comparing the address information input from the input means with the reference address information and determining the language of the information sent from the counterpart device; font data storage means for storing font data of a plurality of languages; and font determining means for selecting and determining desired font data from the font data storage means based on the language determined by the address language determining means, and displaying the information received by the communication control means using the determined font data.
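The address-based determination in this claim can be sketched as a lookup of the address against a stored reference table. The claim does not say what form the reference address information takes, so the sketch assumes URL-style addresses and a hypothetical table keyed by country-code domain suffix. The table contents and font names are illustrative only.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit

# Hypothetical reference address information: domain suffix -> language.
REFERENCE_ADDRESSES = {"jp": "japanese", "fr": "french", "de": "german"}

# Hypothetical font data for each language.
ADDRESS_FONTS = {"japanese": "japanese-font", "french": "latin-font",
                 "german": "latin-font"}


def determine_language_from_address(
        address: str,
        reference: Dict[str, str] = REFERENCE_ADDRESSES) -> Optional[str]:
    """Compare the last label of the host name with the reference table."""
    # urlsplit only recognizes a host when the address contains "//".
    parts = urlsplit(address if "//" in address else "//" + address)
    host = parts.hostname or ""
    suffix = host.rsplit(".", 1)[-1].lower()
    return reference.get(suffix)


def font_for_address(address: str,
                     fonts: Dict[str, str] = ADDRESS_FONTS,
                     default: str = "fallback-font") -> str:
    """Select the font for information received from the given address."""
    return fonts.get(determine_language_from_address(address), default)
```

Because the language is determined from the address alone, the font can be chosen before any of the received information has been examined.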

8. A document processing system comprising: input means for inputting address information for calling a counterpart device connected to a network; communication control means for transmitting the address information input from the input means to the network and receiving information transmitted from the counterpart device on the network; reference address information storage means for storing reference address information of each language in advance; address language determining means for comparing the address information input from the input means with the reference address information of each language and determining the language of the information sent from the counterpart device; dictionary/grammar storage means for storing dictionary data and grammar data for translating a first language into a second language; and translation processing means for translating the information sent from the counterpart device into the second language using the dictionary data and the grammar data when the address language determining means determines that the language is the first language.

9. A computer-readable recording medium on which is recorded a program for causing a font data processing computer, which determines a font of input document information, to realize: an address language determining function of comparing address information for calling a counterpart device connected to a network with reference address information stored in advance for each language, and determining the language of the document information; a font determining function of selecting and determining desired font data from font data for each language stored in advance, based on the language determined by the address language determining function; and a display control function of displaying the input document information using the font data determined by the font determining function.

10. A computer-readable recording medium on which is recorded a program for causing a translation processing computer, which translates input document information, to realize: an address language determining function of comparing address information for calling a counterpart device connected to a network with reference address information stored in advance for each language, and determining the language of the document information; a translation processing function of translating the document information using a dictionary and grammar for translating a first language into a second language when the language determined by the address language determining function is the first language; and a display control function of displaying the translation result produced by the translation processing function.

11. A document processing system comprising: input means for inputting address code information for calling a counterpart device connected to a network; communication control means for transmitting the address code information input from the input means to the network and receiving information transmitted from the counterpart device on the network; reference address code storage means for storing field-specific reference address code information in advance; address code language determining means for comparing the address code information input from the input means with the field-specific reference address code information and determining the first language of the information sent from the counterpart device; dictionary/grammar storage means for storing dictionary data and grammar data for translating the first language into a second language; and translation processing means for translating the information into the second language using the dictionary data and the grammar data when the address code language determining means determines that the information is in the first language.

12. The document processing system, including the storage medium, according to claim 3 or any one of claims 1 to 3, wherein the first language and the second language are each at least one language and differ from each other.