JP2000338870A

JP2000338870A - Text electronic authentication device, method, and recording medium recording text electronic authentication program

Info

Publication number: JP2000338870A
Application number: JP11145676A
Authority: JP
Inventors: Hiroto Inagaki; 博人稲垣; Kazuhiro Hayakawa; 和宏早川; Daijiro Mori; 大二郎森; Kazuo Tanaka; 一男田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1999-05-25
Filing date: 1999-05-25
Publication date: 2000-12-08
Anticipated expiration: 2019-05-25
Also published as: JP3676120B2

Abstract

(57) [Abstract]
[Problem] To provide a text electronic authentication device that can give appropriate authentication to electronic documents containing text, typified by HTML documents, and that can extract the authentication information embedded in an electronic document in order to eliminate illegal copies made in disregard of copyright.
[Solution] The text electronic authentication device of the present invention has two functions. The first reads text from an electronic document in which text is described, extracts features of the read text, uses the information representing the text features, the publisher information, and the information for authenticating the text to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable, and embeds that encrypted electronic authentication information in the text. The second extracts the publisher information, the information for authenticating the text, and the original text from an electronic document in which the encrypted authentication information has been embedded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an electronic authentication device that inserts watermark information into text for the purpose of electronically authenticating an electronic document (file) in which text is described.

[0002]

[Description of the Related Art] Conventionally, HTML (Hyper Text Markup Language) document files are used as the standard document format on the Internet. Text described in an HTML document file is authenticated appropriately by using a secure method that allows authentication from an appropriate site, for example SSL (Secure Sockets Layer).

[0003]

[Problems to Be Solved by the Invention] However, these documents can be retrieved from the server side to the client side by a browser, or by software called a crawler that mechanically retrieves HTML documents from web servers, so a document retrieved to the client side could be used illegitimately. In addition, methods such as SSL are appropriate when exchanging text limited to specific information, but it is difficult to authenticate arbitrary documents appropriately with them, and it has been difficult to give appropriate authentication to general HTML documents.

[0004] The present invention has been made in view of the above points. It provides a text electronic authentication device, a method, and a recording medium recording a text electronic authentication program that can give appropriate authentication to electronic documents, typified by the HTML documents used as standard on the Internet and elsewhere, and that can extract the authentication information (watermark information) embedded in an electronic document such as an HTML document, in order to eliminate illegal copies made in disregard of copyright.

[0005]

[Means for Solving the Problems] The text electronic authentication device of the present invention is a text electronic authentication device that makes an electronic document in which text is described authenticatable by embedding authentication information in the electronic document, and is characterized by comprising: a text reading unit that reads text from the electronic document in which the text is described; a text feature extraction unit that extracts features of the text read by the text reading unit; a text publisher information input unit that inputs information on the publisher of the text described in the electronic document; a text authentication information input unit that inputs information for authenticating the text described in the electronic document; a text authentication information generation unit that uses the information representing the text features extracted by the text feature extraction unit, the publisher information input to the text publisher information input unit, and the information for authenticating the text input to the text authentication information input unit to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication information embedding unit that embeds the encrypted electronic authentication information generated by the text authentication information generation unit in the text read by the text reading unit, and embeds it in such a way that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited.

[0006] With the above configuration, authentication information consisting of the information representing the text features, the information on the publisher of the text, and the information for authenticating the text is encrypted and embedded in the electronic document. Tampering can therefore be detected on the basis of the text features embedded in the electronic document, and the electronic document can be authenticated with respect to its publisher from the publisher information and the information for authenticating the text. Here, the information on the publisher of the text is information showing who issued the text (for example, the company that holds the copyright of the text, or the address and name of the individual who wrote it). The information for authenticating the text is an ID issued by a public body or by a certification company of some kind. The electronic document display device is either a dedicated display device or browsing application software, such as a web browser, that runs on a computer such as a personal computer.

[0007] In the text electronic authentication device of the present invention, the text feature extraction unit can execute a plurality of extraction methods corresponding to the size of the text features to be extracted. According to the specified size of the text features to be extracted, it extracts and outputs the text features and additionally outputs an identifier indicating the extraction method used. According to the present invention, appropriate authentication can be performed by selecting and executing the extraction method that corresponds to the feature size suited to the situation, such as authenticating the text as a whole or also authenticating the content of the text.

[0008] In the text electronic authentication device of the present invention, the text authentication information generation unit is characterized by comprising: a feature input unit that receives the information representing the text features and outputs it; a feature identifier input unit that receives the identifier indicating the extraction method used to extract the text features and outputs it; a publisher information input unit that receives the information on the publisher of the text and outputs it; a publisher authentication information input unit that receives the information for authenticating the text and outputs it; an encryptor that encrypts authentication information consisting of the information representing the text features, the identifier, the information on the publisher of the text, and the information for authenticating the text, and outputs encrypted authentication information; a code conversion unit that converts the encrypted authentication information into an invisible byte string made of codes that the display device does not use for displaying text, and outputs the invisible byte string; and an encrypted authentication information output unit that receives the invisible byte string and outputs it.

[0009] In the text electronic authentication device of the present invention, the text authentication information embedding unit is characterized by comprising: a feature identifier input unit that receives the identifier and outputs it; an encrypted authentication information input unit that receives the invisible byte string and outputs it; a text input unit that receives the text described in the electronic document; a determination unit that determines whether all of the invisible byte string has been inserted into the electronic document; an embedded encrypted authentication information output unit that takes in and outputs the encrypted authentication information according to the identifier; an embedded text output unit that takes in and outputs the text described in the electronic document according to the identifier; and a text output unit that outputs all of the remaining text described in the electronic document when the determination unit determines that all of the invisible byte string has been embedded in the electronic document; wherein, while the determination unit determines that not all of the invisible byte string has been inserted into the electronic document, the embedded encrypted authentication information output unit and the embedded text output unit output alternately.

[0010] The present invention is also a device that extracts authentication information from an electronic document in which the authentication information has been embedded by the text electronic authentication device according to claim 1, characterized by comprising: a text authentication information retrieval unit that reads the text in which the encrypted electronic authentication information was embedded by the text authentication information embedding unit, and separates and retrieves the text described in the electronic document and the encrypted electronic authentication information; a text feature retrieval unit that extracts text features from the text described in the electronic document as retrieved by the text authentication information retrieval unit; a text authentication information reading unit that decrypts the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and retrieved by the text authentication information retrieval unit and the text features extracted by the text feature retrieval unit; a text publisher information extraction unit that reads the information on the text publisher from the electronic authentication information decrypted by the text authentication information reading unit; and a text authentication information extraction unit that reads the information for authenticating the text from the electronic authentication information decrypted by the text authentication information reading unit.

[0011] The text electronic authentication method of the present invention is a text electronic authentication method that makes an electronic document in which text is described authenticatable by embedding authentication information in the electronic document, and is characterized by including: a text reading procedure that reads text from the electronic document in which the text is described; a text feature extraction procedure that extracts features of the text read by the text reading procedure; a text publisher information input procedure that inputs information on the publisher of the text described in the electronic document; a text authentication information input procedure that inputs information for authenticating the text described in the electronic document; a text authentication generation procedure that uses the information representing the text features extracted by the text feature extraction procedure, the publisher information input by the text publisher information input procedure, and the information for authenticating the text input by the text authentication information input procedure to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication embedding procedure that embeds the encrypted electronic authentication information generated by the text authentication generation procedure in the text read by the text reading procedure, and embeds it in such a way that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited.

[0012] The present invention is also a method of extracting authentication information from an electronic document in which the authentication information has been embedded by the text electronic authentication method according to claim 6, characterized by including: a text authentication information retrieval procedure that reads the text in which the encrypted electronic authentication information was embedded in the text authentication embedding procedure, and separates and retrieves the text described in the electronic document and the encrypted electronic authentication information; a text feature retrieval procedure that extracts text features from the text described in the electronic document as retrieved by the text authentication information retrieval procedure; a text authentication reading procedure that decrypts the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and retrieved by the text authentication information retrieval procedure and the text features extracted by the text feature retrieval procedure; a text publisher information extraction procedure that reads the information on the text publisher from the electronic authentication information decrypted by the text authentication reading procedure; and a text authentication information extraction procedure that reads the information for authenticating the text from the electronic authentication information decrypted by the text authentication reading procedure.

[0013] The present invention also provides a computer-readable recording medium recording a text electronic authentication program that makes an electronic document in which text is described authenticatable by embedding authentication information in the electronic document. The program causes a computer to execute: a text reading procedure that reads text from the electronic document in which the text is described; a text feature extraction procedure that extracts features of the text read by the text reading procedure; a text publisher information input procedure that inputs information on the publisher of the text described in the electronic document; a text authentication information input procedure that inputs information for authenticating the text described in the electronic document; a text authentication generation procedure that uses the information representing the text features extracted by the text feature extraction procedure, the publisher information input by the text publisher information input procedure, and the information for authenticating the text input by the text authentication information input procedure to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication embedding procedure that embeds the encrypted electronic authentication information generated by the text authentication generation procedure in the text read by the text reading procedure, and embeds it in such a way that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited. By providing this recording medium, the text electronic authentication device can easily be realized using a computer.

[0014] The present invention also provides a computer-readable recording medium recording a program that extracts authentication information from an electronic document in which the authentication information has been embedded by a computer running the text electronic authentication program according to claim 8. The program causes a computer to execute: a text authentication information retrieval procedure that reads the text in which the encrypted electronic authentication information was embedded in the text authentication embedding procedure, and separates and retrieves the text described in the electronic document and the encrypted electronic authentication information; a text feature retrieval procedure that extracts text features from the text described in the electronic document as retrieved by the text authentication information retrieval procedure; a text authentication reading procedure that decrypts the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and retrieved by the text authentication information retrieval procedure and the text features extracted by the text feature retrieval procedure; a text publisher information extraction procedure that reads the information on the text publisher from the electronic authentication information decrypted by the text authentication reading procedure; and a text authentication information extraction procedure that reads the information for authenticating the text from the electronic authentication information decrypted by the text authentication reading procedure. By providing this recording medium, the text electronic authentication device can easily be realized using a computer.

[0015]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a text electronic authentication device according to one embodiment of the present invention.

[0016] T-1 is the text reading unit, which opens a file in which text is described and reads it into the device or onto a storage medium. T-2 is the text feature extraction unit, which extracts data representing the features of the text read by the text reading unit T-1 (text feature information). Various kinds of text feature information are conceivable. For example, all of the text data may be treated as the text feature information. In this case, if the amount of data used to authenticate the text is fixed, the larger the text feature information becomes, the smaller the proportion of it that is recorded in the text authentication information. In other words, when all of the text data is used as the text feature information, the text data as a whole can be authenticated, but it is quite likely that the individual pieces of content described in the text can no longer be authenticated.

[0017] As for how to use the device: to authenticate the text as a whole, the whole text should be extracted as the text feature information. To authenticate the content of the text as well, for example to establish whether a passage is part of a text one wrote oneself, the independent words or important words contained in the text must be extracted and used as the text feature information, rather than the whole text. Taking these characteristics into account, the user sets in the text feature extraction unit T-2 the most suitable text feature amount (the size of the text feature information), or a flag corresponding to that text feature amount (identifier, text feature parameter), as in the following example.

Text feature amount   Text portion extracted                    Text extraction method
Large                 Whole text                                -
Medium                Independent words contained in the text   Morphological analysis
Small                 Important words contained in the text     Summary analysis

Based on the text feature amount set by the user, the text feature extraction unit T-2 notifies the text authentication information generation unit T-5 of a flag specifying the text extraction method used, as the text feature parameter.
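The selection among the three extraction methods in the table can be sketched as follows. This is only an illustration under stated assumptions: the flag values `"L"`, `"M"`, `"S"`, the function name `extract_features`, the stop-word list, and the word-frequency heuristic are all hypothetical. In particular, the regular-expression split stands in for morphological analysis and the frequency count stands in for summary analysis; the source names those techniques but does not specify them.

```python
import re
from collections import Counter

# Hypothetical flag values; the source only says a flag identifies the method used.
LARGE, MEDIUM, SMALL = "L", "M", "S"

# Crude stop-word list standing in for "not an independent word" (assumption).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def extract_features(text: str, size: str, top_n: int = 5) -> tuple[str, str]:
    """Return (text feature information, flag naming the extraction method)."""
    if size == LARGE:
        # Large: the whole text is the feature information.
        return text, LARGE
    # Stand-in for morphological analysis: lowercase word tokens.
    words = re.findall(r"\w+", text.lower())
    if size == MEDIUM:
        # Medium: keep the content words in their original order.
        return " ".join(w for w in words if w not in STOP_WORDS), MEDIUM
    if size == SMALL:
        # Stand-in for summary analysis: the most frequent longer words.
        counts = Counter(w for w in words if len(w) > 3)
        return " ".join(w for w, _ in counts.most_common(top_n)), SMALL
    raise ValueError(f"unknown feature size: {size!r}")
```

The returned flag plays the role of the text feature parameter that T-2 passes on to T-5, so that the same method can be chosen again when the features are recomputed during verification.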

[0018] T-3 is the text publisher information input unit. It is used to enter information showing who issued the text (text publisher information), for example information identifying the publisher such as the company that holds the copyright of the text, or the address, name, or URL of the individual who wrote it. T-4 is the text authentication information input unit. It is the part where the publisher's ID (text publisher ID), issued by a public body or by a certification company of some kind, is entered. The publisher can be identified from this text publisher ID. Because it must identify the publisher uniquely, the text publisher ID needs to be globally unique.

[0019] T-5 is the text authentication information generation unit. It takes as input the text feature information and the text feature parameter produced by the text feature extraction unit T-2, the text publisher information entered at the text publisher information input unit T-3, and the text publisher ID entered at the text authentication information input unit T-4, and generates the text authentication information for the text in question.

[0020] In detail, as shown in FIG. 2, the unit consists of: a text feature information input unit T-5a that receives the text feature information extracted by the text feature extraction unit T-2; a feature parameter input unit T-5b that receives the text feature parameter notified by the text feature extraction unit T-2; a publisher information input unit T-5c that receives the text publisher information entered at the text publisher information input unit T-3; a publisher ID input unit T-5d that receives the text publisher ID entered at the text authentication information input unit T-4; an encryptor T-5e that encrypts the text authentication information consisting of the text feature information, the text publisher information, the text publisher ID, and the text feature parameter; a code conversion unit T-5f that further converts the encrypted text authentication information into codes invisible to the browser (an invisible byte string); and an encrypted authentication information output unit T-5g that outputs the invisible byte string produced by the code conversion unit T-5f. The details of the encryption and code conversion are described later.
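The pipeline of T-5 (collect the four inputs, encrypt, convert to invisible codes) might be sketched as below. Everything concrete here is an assumption, since this part of the source defers the encryption and code conversion details: the JSON record layout, the function names, and the use of the zero-width characters U+200B and U+200C as "codes the browser does not display" are hypothetical, and `toy_encrypt` is a deliberately simple XOR keystream that is not secure and only marks where a real cipher would go.

```python
import hashlib
import json

# Assumed invisible codes: zero-width space and zero-width non-joiner.
ZERO, ONE = "\u200b", "\u200c"


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (NOT secure): XOR with a SHA-256-derived keystream.
    Applying it twice with the same key restores the input."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def to_invisible(data: bytes) -> str:
    """Code conversion (role of T-5f): one invisible character per bit, MSB first."""
    return "".join(ONE if (byte >> bit) & 1 else ZERO
                   for byte in data for bit in range(7, -1, -1))


def from_invisible(codes: str) -> bytes:
    """Inverse of to_invisible; ignores any visible characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in codes if ch in (ZERO, ONE))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) - len(bits) % 8, 8))


def generate_auth_info(features: str, flag: str, publisher: str,
                       publisher_id: str, key: bytes) -> str:
    """Pack the four inputs (T-5a to T-5d), encrypt (T-5e), convert (T-5f)."""
    record = json.dumps({"features": features, "flag": flag,
                         "publisher": publisher, "publisher_id": publisher_id},
                        ensure_ascii=False).encode("utf-8")
    return to_invisible(toy_encrypt(record, key))
```

A bit-per-character encoding makes the mark eight times longer than the encrypted record, which is one reason a fixed mark budget limits how much feature information can be carried.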

[0021] T-6 is the text authentication information embedding unit. It embeds the text authentication information generated by the text authentication information generation unit T-5 (the invisible byte string) in the input text. Because the target is text, however, a naive embedding could reveal to others that something is embedded in the text, and the text authentication information could then be deleted. Embedding in the text could also interfere with ordinary browsing of the text. The device therefore scatters the text authentication information generated by T-5 through the text so that it is hard for others to detect, and embeds it in a form that does not interfere with browsing. That is, the text authentication information is embedded so that the text looks no different from normal when browsed, and it is distributed through the text so that it withstands changes such as cutting or deleting parts of the text.

【００２２】詳しくは、図３に示すように、Ｔ−５のテ
キスト認証情報発生部からテキスト特徴量パラメータ
（Ｆ値）を読み取る特徴量パラメータ入力部Ｔ−６ａ
と、Ｔ−５のテキスト認証情報発生部から不可視バイト
列を入力する暗号化認証情報入力部Ｔ−６ｂと、Ｔ−１
のテキスト読み取り部から入力されたテキストを入力す
るテキスト入力部Ｔ−６ｃと、すべての不可視バイト列
を出力したか判定し、各出力部を制御する判定部Ｔ−６
ｄと、テキスト特徴量パラメータ（Ｆ値）に基づいて、
入力テキストを読み込み、読み込んだテキストのバイト
列を出力する埋込テキスト出力部Ｔ−６ｅと、テキスト
特徴量パラメータ（Ｆ値）に基づいて、不可視バイト列
を読み込み、読み込んだ不可視バイト列を出力する埋込
暗号化認証情報出力部Ｔ−６ｆと、すべての不可視バイ
ト列が出力された場合、残りの入力テキストを出力する
テキスト出力部Ｔ−６ｇとから構成される。
More specifically, as shown in FIG. 3, the embedding unit comprises: a feature parameter input unit T-6a that reads the text feature parameter (F value) from the T-5 text authentication information generation unit; an encrypted authentication information input unit T-6b that receives the invisible byte sequence from T-5; a text input unit T-6c that receives the text read by the T-1 text reading unit; a determination unit T-6d that determines whether the entire invisible byte sequence has been output and controls each output unit; an embedded text output unit T-6e that, based on the text feature parameter (F value), reads the input text and outputs the bytes it has read; an embedded encrypted authentication information output unit T-6f that, based on the text feature parameter (F value), reads the invisible byte sequence and outputs the bytes it has read; and a text output unit T-6g that outputs the remaining input text once the entire invisible byte sequence has been output.

【００２３】Ｔ−７は、テキスト認証情報取り出し部で
ある。Ｔ−６のテキスト認証情報埋め込み部により埋
め込まれたテキスト認証情報を、認証等のためにテキス
ト認証情報が埋め込まれたテキスト中から取り出す処理
部である。本処理部では、テキスト認証情報が埋め込ま
れたテキストから、テキスト認証情報と元のテキストと
を分離する処理も行う。Ｔ−８は、テキスト特徴取り出
し部である。これは、テキスト認証情報を発生する際に
使用したテキスト特徴情報を、テキスト認証情報を分離
したテキスト（元のテキスト）から抽出する部位であ
る。Ｔ−７のテキスト認証情報取り出し部により、テキ
スト認証情報と元のテキストが分離されるので、その分
離されたテキストからテキストの特徴（テキスト特徴情
報）を再計算する。
T-7 is the text authentication information extracting unit. It takes the text authentication information embedded by the T-6 embedding unit out of the text in which it is embedded, for authentication and similar purposes. This unit also separates the embedded text into the text authentication information and the original text. T-8 is the text feature extracting unit. It extracts, from the text left after the text authentication information has been separated (the original text), the same text feature information that was used when the text authentication information was generated. Because T-7 separates the text authentication information from the original text, the text features (text feature information) are recomputed from the separated text.

【００２４】Ｔ−９は、テキスト認証読み出し部であ
る。Ｔ−７のテキスト認証情報取り出し部により分離
されたテキスト認証情報に対して、Ｔ−８が抽出したテ
キスト特徴情報とを用いて、テキスト発行元ＩＤとテキ
スト発行元情報を分離・抽出する。Ｔ−１０は、テキス
ト発行元情報抽出部で、Ｔ−９で抽出したテキスト発行
元情報を取り出し、出力する。Ｔ−１１は、テキスト認
証情報抽出部で、Ｔ−９で抽出したテキスト発行元ＩＤ
を取り出し、出力機器に出力する。なお、本テキスト電
子認証装置は、専用の装置として構成されてもよく、ま
た、上記各部の機能を実現するプログラムを、コンピュ
ータに読み込ませ実行させることにより実現されてもよ
い。また、本テキスト電子認証装置は、周辺機器として
入力装置、表示装置等（いずれも図示せず）が接続され
るものとする。ここで、入力装置とはキーボード、マウ
ス等の入力デバイスのことをいう。表示装置とはＣＲＴ
（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）や液晶表示装置等
のことをいう。
T-9 is the text authentication reading unit. Using the text feature information extracted by T-8, it separates and extracts the text publisher ID and the text publisher information from the text authentication information separated by the T-7 extracting unit. T-10 is the text publisher information extraction unit; it takes the text publisher information extracted by T-9 and outputs it. T-11 is the text authentication information extraction unit; it takes the text publisher ID extracted by T-9 and outputs it to an output device. The text electronic authentication device may be configured as a dedicated device, or may be realized by having a computer read and execute a program that implements the functions of the units described above. An input device, a display device, and the like (none of them shown) are assumed to be connected to the device as peripherals. Here, the input device is a device such as a keyboard or a mouse, and the display device is a CRT (Cathode Ray Tube), a liquid crystal display, or the like.

【００２５】次に、このように構成された本実施の形態
のテキスト電子認証装置の動作について、図を参照して
説明する。以下の説明は、ＨＴＭＬ文書（ＨＴＭＬテキ
スト）に対する動作例である。Next, the operation of the thus configured text electronic authentication apparatus of the present embodiment will be described with reference to the drawings. The following description is an operation example for an HTML document (HTML text).

【００２６】インターネットで主流のＨＴＭＬテキスト
とは、“＜”と“＞”のタグで記述されたテキスト属性
に基づき、テキストを構造化した文書である。このタグ
は、一般に、Ｗ３Ｃ（World Wide Web Consortium）等
で認証され、規定されている。このＨＴＭＬテキストを
記述するために、オーサリングソフト等がある。一方、
ＨＴＭＬテキスト形式で記述された文書を見る（ブラウ
ジング）するための装置として、一般に、ブラウザとい
うものがある。これは、上記タグの情報をもとに、テキ
スト情報を構造化して表示機器上に再配置して情報を提
示する装置である。これらのブラウザでは、通常のＨＴ
ＭＬテキストを解釈し、表示する機構をもっている。
HTML text, the mainstream format on the Internet, is a document whose text is structured according to text attributes written in tags delimited by "<" and ">". These tags are generally approved and specified by bodies such as the W3C (World Wide Web Consortium). Authoring software and similar tools exist for writing HTML text. Documents written in HTML are generally viewed (browsed) with a browser, which uses the tag information to structure the text and lay it out on a display device for presentation. Such browsers have a mechanism for interpreting and displaying ordinary HTML text.

【００２７】例えば、ＨＴＭＬテキストの例としては以
下のような例を考える。 <ＨＴＭＬ> <ＴＩＴＬＥ>これはテストです</ＴＩＴＬＥ> <ＢＯＤＹ> <Ｈ１>今日は天気がよい。 </ＢＯＤＹ> </ＨＴＭＬ> このようにＨＴＭＬテキストは、プレーンなテキスト構
造となっている。これをＨＴＭＬテキストを見ることが
可能なブラウザで表示すると、例えば図４のようにな
る。<ＴＩＴＬＥ>部は、表示器（ブラウザ）の一番上部
に表示される。一方、<Ｈ１>タグで示された部分は、表
示器の中に表示される。
For example, consider the following HTML text:
<HTML>
<TITLE>これはテストです</TITLE>
<BODY>
<H1>今日は天気がよい。
</BODY>
</HTML>
Here the title reads "This is a test" and the heading reads "The weather is fine today." As this shows, HTML text has a plain-text structure. Displayed in a browser capable of viewing HTML text, it appears, for example, as shown in FIG. 4. The <TITLE> part is displayed at the very top of the display (browser), while the part marked with the <H1> tag is displayed inside the display area.

【００２８】Ｔ−１のテキスト読み取り部では、このよ
うなＨＴＭＬテキストを読み込む（図８（ａ）：ステッ
プＳ１）。８ｂｉｔ単位でデータを読み込むとすれば、
以下のようなバイト群が装置に読み込まれる。
The T-1 text reading unit reads such HTML text (FIG. 8(a): step S1). If the data is read in units of 8 bits, bytes such as the following are read into the device.

【００２９】次に、Ｔ−２のテキスト特徴抽出部は、Ｔ
−１のテキスト読み取り部で読み取ったテキストの特徴
を表すデータであるテキスト特徴情報を抽出する（図８
（ａ）：ステップＳ２）。
Next, the T-2 text feature extraction unit extracts text feature information, that is, data representing the features of the text read by the T-1 text reading unit (FIG. 8(a): step S2).

【００３０】例えば、テキスト特徴量と、テキスト抽出
部位と、テキスト抽出法と、テキスト特徴量パラメータ
（Ｆ値）の関係を以下のように定義する。
テキスト特徴量 | テキスト抽出部位 | テキスト抽出法 | Ｆ | 単位
大きい | テキスト全体 | ― | １ | 全体
　 | タグぬきテキスト | タグ解析 | ２ | タグ
　 | テキストに含まれる自立語 | 形態素解析 | ３ | 単語
　 | テキストに含まれる重要な文 | 要約解析 | ４ | 重要文
小さい | テキストに含まれる重要な語 | 要約解析＋形態素解析 | ５ | 重要語
For example, the relationship among the text feature amount, the text extraction target, the text extraction method, and the text feature parameter (F value) is defined as follows.
Text feature amount | Text extraction target | Text extraction method | F | Unit
Large | Entire text | (none) | 1 | Whole text
  | Text with tags removed | Tag analysis | 2 | Tag
  | Independent words in the text | Morphological analysis | 3 | Word
  | Important sentences in the text | Summary analysis | 4 | Important sentence
Small | Important words in the text | Summary analysis + morphological analysis | 5 | Important word

【００３１】テキスト全体をテキスト抽出する場合は、
Ｔ−１のテキスト読み取り部で読み込まれたテキストを
すべてそのまま、テキスト特徴情報として、Ｔ−５のテ
キスト認証情報発生部に渡す。テキスト抽出の手法を示
すフラグ（Ｆ）は、初期に設定を行う。
When the entire text is the extraction target, all of the text read by the T-1 text reading unit is passed unchanged, as the text feature information, to the T-5 text authentication information generation unit. The flag (F) indicating the text extraction method is set at the start.

【００３２】テキスト全体からなるテキスト特徴情報を
渡すと、以下のバイト列をＴ−５のテキスト認証情報発
生部に渡すことになる。この場合、１６７バイト必要と
なる。 When the text feature information composed of the entire text is passed, the following byte string is passed to the text authentication information generating section of T-5. In this case, 167 bytes are required.

【００３３】ＨＴＭＬテキストは、タグの情報により文
書を構造化しているが、タグ自体には、文書の意味が含
まれているわけではない。そこで、これらのＨＴＭＬテ
キストのタグ情報を削除したテキストをテキスト特徴情
報とすることも考えられる。例えば、これはテストです今日は天気がよい。をバイト例で表すと、となり、３８バイトで表すことができる。
HTML text structures a document through its tag information, but the tags themselves do not carry the meaning of the document. The text with the HTML tag information removed can therefore also be used as the text feature information. For example, the tag-stripped text これはテストです今日は天気がよい。 ("This is a test. The weather is fine today."), expressed as a byte sequence, can be represented in 38 bytes.
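
The tag-removal step for F = 2 can be sketched as follows. This is a minimal illustration, not the tag analysis the apparatus actually performs: it assumes every tag is a well-formed `<...>` span and that whitespace between elements carries no meaning. The function name `strip_tags` is hypothetical.

```python
import re


def strip_tags(html: str) -> str:
    """Drop every <...> tag, then drop the whitespace left between elements."""
    without_tags = re.sub(r"<[^>]*>", "", html)
    # Assumption: the sample text contains no meaningful spaces, so all
    # whitespace can be removed. Real text would need gentler handling.
    return "".join(without_tags.split())


sample = (
    "<HTML> <TITLE>これはテストです</TITLE> <BODY> "
    "<H1>今日は天気がよい。 </BODY> </HTML>"
)
print(strip_tags(sample))
```

The result is the tag-free string that would then be handed to the T-5 unit as text feature information.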

【００３４】さらに、テキストに含まれる自立語は、テ
キストのキーワードとして利用されることが多く、該当
テキストの内容を表すために用いられることが多い。例
えば、これらのテキストに含まれる自立語を抽出する方
法としては、形態素解析が通常用いられる。形態素解析
とは、入力された文字列を単語辞書に対して、検索を行
い、品詞情報(品詞)、文頭可否情報(文頭可)、前方接続
情報(前接)、後方接続情報(後接)などの情報を取得す
る。通常の単語辞書では、ＴＲＥΙ辞書構造という特別
な辞書構造を行うことにより高速な検索を行えるように
なっている。
Furthermore, the independent words (content words) in a text are often used as its keywords and often represent its content. Morphological analysis is the usual method for extracting them. Morphological analysis looks up the input character string in a word dictionary and obtains information such as part-of-speech information, sentence-initial permission (whether the word may begin a sentence), forward connection information, and backward connection information. Ordinary word dictionaries use a special dictionary structure called a TRIE so that this lookup is fast.

【００３５】辞書項目として、“ああ”、“あいさ
つ”、“あい”、などがある場合、それぞれの第一文字
（ここでは、日本語であるので、アルファベットと異な
り、日本語文字２バイトを指し示す）が同じもの、第二
文字目が同じものなど、それぞれ順次に、木構造的に構
成される。そして、最後の文字まで、一致した場合に
は、その単語辞書項目に対する品詞情報(品詞)、文頭可
否情報(文頭可)、前方接続情報(前接)、後方接続情報
(後接)などの情報が記述される。
When the dictionary contains entries such as "ああ", "あいさつ", and "あい", the entries are arranged successively into a tree: entries that share a first character (here one Japanese character, which unlike an alphabetic letter occupies 2 bytes) share a branch, then entries that share a second character, and so on. When the input matches an entry through its last character, that position holds the entry's part-of-speech information, sentence-initial permission, forward connection information, backward connection information, and so on.
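
The tree-structured dictionary described above can be sketched as a small trie. This is an illustration only: the class names and the `pos` field are hypothetical, the part-of-speech labels are made up for the example, and the sketch keys the tree on characters rather than on raw 2-byte codes.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.entry = None    # dictionary information if a word ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, entry):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entry = entry

    def prefixes(self, text, start=0):
        """Return (word, entry) for every dictionary word beginning at text[start]."""
        node = self.root
        found = []
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.entry is not None:
                found.append((text[start:i + 1], node.entry))
        return found


dictionary = Trie()
# Hypothetical entries; a real dictionary would also hold the sentence-initial
# flag and the forward/backward connection information.
dictionary.insert("ああ", {"pos": "interjection"})
dictionary.insert("あいさつ", {"pos": "noun"})
dictionary.insert("あい", {"pos": "noun"})

print([word for word, _ in dictionary.prefixes("あいさつする")])
```

A single left-to-right walk returns every dictionary word that starts at a given position, which is what a morphological analyzer needs when it builds its candidate list.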

【００３６】文頭可否情報とは、文頭にあってよいかど
うかを示すフラグである。文頭可であれば、文頭に存在
してもよいが、文頭否であれば、文頭にあることが許可
されない単語ということになる。前方接続情報とは、前
の単語の品詞または属性が適正な場合だけ接続が許可さ
れ、前接で接続が許可されない単語の場合、候補として
削除される。同様に後方接続情報も、後の単語の品詞ま
たは属性が適正な場合だけ接続が許可され、後接で接続
が許可されない単語の場合、候補として削除される。
The sentence-initial permission is a flag indicating whether a word may appear at the beginning of a sentence. A word flagged as permitted may appear there; a word flagged as not permitted may not. Forward connection information allows a connection only when the part of speech or attribute of the preceding word is appropriate; a candidate whose forward connection is not allowed is eliminated. Likewise, backward connection information allows a connection only when the part of speech or attribute of the following word is appropriate; a candidate whose backward connection is not allowed is eliminated.

【００３７】このような、品詞接続により、候補を選択
する。最尤候補は、最小コスト法とよぶ方法により選択
する。最小コスト法とは、最もコストが最小となる形態
素候補を最尤候補とする処理方式である。形態素解析に
おいて利用されるコストは、以下の２種類のコストがあ
る。１．接続コスト２．単語コスト接続コストは、ある単語と単語を接続する場合に必要な
コストである。単語と単語であるため、単語＋当該単語
の活用、に対する接続コストは０となる。また、単語コ
ストとは、その単語に関するコストであり、例えば、使
用頻度が高い単語は、コストが低くなる。また、活用は
単語ではないので、コストは０となる。形態素解析によ
り、テキスト部が単語単位に分解されると同時に、各単
語に尤も正しいと考えられる品詞が付与される。本実施
の形態では、接続コストと単語コストの総和が最小とな
るものを最尤候補とする。なお、接続コストおよび単語
コストの数値定義は別途なされるものである。
Candidates are selected through these part-of-speech connections. The maximum-likelihood candidate is chosen by what is called the minimum cost method, which takes the morpheme candidate with the lowest cost as the maximum-likelihood candidate. Morphological analysis uses two kinds of cost:
1. Connection cost
2. Word cost
The connection cost is the cost of connecting one word to another. Because it applies between words, the connection cost between a word and its own conjugated ending is 0. The word cost is the cost of the word itself; a frequently used word, for example, has a low cost. A conjugated ending is not a word, so its cost is 0. Morphological analysis breaks the text into words and at the same time assigns each word the part of speech considered most likely. In this embodiment, the candidate with the smallest sum of connection costs and word costs is the maximum-likelihood candidate. The numerical values of the connection costs and word costs are defined separately.
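
The minimum cost method can be sketched as a dynamic program over word boundaries. The sketch below is an assumption-laden illustration: the lexicon, the part-of-speech labels, and every cost value are invented for the example (the text says the real values are defined separately), and the function name `segment` is hypothetical.

```python
def segment(text, lexicon, conn_cost, default_conn=10):
    """Return (total cost, [(word, pos), ...]) for the cheapest analysis, or None."""
    n = len(text)
    # best[i][pos] = (cost of the cheapest analysis of text[:i] whose last word
    #                 has part of speech pos, back pointer)
    best = [dict() for _ in range(n + 1)]
    best[0]["BOS"] = (0, None)  # BOS: beginning of sentence
    for i in range(n):
        for prev_pos, (cost, _) in list(best[i].items()):
            for j in range(i + 1, n + 1):
                for pos, word_cost in lexicon.get(text[i:j], []):
                    total = cost + conn_cost.get((prev_pos, pos), default_conn) + word_cost
                    if pos not in best[j] or total < best[j][pos][0]:
                        best[j][pos] = (total, (i, prev_pos, text[i:j]))
    if not best[n]:
        return None
    pos = min(best[n], key=lambda p: best[n][p][0])
    total = best[n][pos][0]
    words, i = [], n
    while i > 0:  # follow the back pointers to recover the word sequence
        _, (i, prev_pos, word) = best[i][pos]
        words.append((word, pos))
        pos = prev_pos
    return total, words[::-1]


# Hypothetical word costs (lower means more frequent).
LEXICON = {
    "今日": [("noun", 5)], "今": [("noun", 8)], "日": [("noun", 8)],
    "は": [("particle", 1)], "天気": [("noun", 5)], "が": [("particle", 1)],
    "よい": [("adjective", 4)], "。": [("period", 0)],
}
# Hypothetical connection costs between parts of speech.
CONN_COST = {
    ("BOS", "noun"): 0, ("noun", "particle"): 1, ("particle", "noun"): 1,
    ("particle", "adjective"): 1, ("adjective", "period"): 0, ("noun", "noun"): 6,
}

total, words = segment("今日は天気がよい。", LEXICON, CONN_COST)
print(total, [w for w, _ in words])
```

With these made-up costs the analysis 今日/は/天気/が/よい/。 is cheaper than the competing split 今/日/…, so it is chosen as the maximum-likelihood candidate.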

【００３８】ここで、先の例文が入力されたとする。こ
れはテストです今日は天気がよい。この例における形態
素解析は以下のようになる。
Suppose the earlier example sentences are input: これはテストです ("This is a test") and 今日は天気がよい。 ("The weather is fine today."). Morphological analysis of this example gives the following.

【００３９】これはテストです。 ("This is a test.")
表記 (surface form) | 品詞 (part of speech) | 自立語 (independent word)
これ | 名詞 (noun) | ○
は | 助詞 (particle) |
テスト | 動詞 (verb) | ○
です | 助動詞 (auxiliary verb) |
。 | 句点 (period) |
cost = 33

【００４０】今日は天気がよい。 ("The weather is fine today.")
表記 (surface form) | 品詞 (part of speech) | 自立語 (independent word)
今日 | 名詞 (noun) | ○
は | 助詞 (particle) |
天気 | 名詞 (noun) | ○
が | 助詞 (particle) |
よい | 形容詞 (adjective) |
。 | 句点 (period) |
cost = 49

【００４１】このような例文で、自立語だけを抽出する
と、これ、テスト、今日、天気が抽出される。これをバイト列で表すと、以下のように
なる。この例では、合計１９バイトでテキストの特徴を表現す
ることができる。
When only the independent words are extracted from these example sentences, これ, テスト, 今日, and 天気 ("this", "test", "today", "weather") are obtained. Expressed as a byte sequence, this is as follows. In this example, the features of the text can be represented in 19 bytes in total.

【００４２】さらに、テキストの要約を使う。例えば、
稲垣らが発明した、テキストの要約手法（特願平１０−
１８０１８１公開文書要約装置およびそのためのプログ
ラムを記録した記録媒体）を利用すれば、テキストから
その中で重要な要旨を抽出することができる。例えば、
上記の例の要旨として、今日は天気がよい。が選ばれた
とする。そうすると、これに対するバイト列は、以下の
ようになる。
In addition, a summary of the text can be used. For example, with the text summarization method invented by Inagaki et al. (Japanese Patent Application No. H10-180181, "Published document summarizing apparatus and recording medium recording a program therefor"), the important gist can be extracted from a text. Suppose that 今日は天気がよい。 ("The weather is fine today.") is selected as the gist of the example above. The byte sequence for it is then as follows.

【００４３】さらに、上記文を形態素解析し、この中の
自立語を抽出すると、今日、天気が抽出される。これをバイト列で表すと、以下のように
なる。 0000000 baa3 c6fc c5b7
Furthermore, when this sentence is morphologically analyzed and its independent words are extracted, 今日 and 天気 ("today", "weather") are obtained. Expressed as a byte sequence, this is as follows.
0000000 baa3 c6fc c5b7
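
The byte values in the dump above are consistent with the EUC-JP encoding of 今, 日, and 天. This is an inference from the values themselves; the text does not name the encoding. Under that assumption, the conversion from extracted words to a byte sequence can be sketched as:

```python
words = ["今日", "天気"]  # independent words extracted for F = 5

# Assumption: the apparatus stores text as EUC-JP, two bytes per character.
data = "".join(words).encode("euc_jp")

# Print the bytes in 2-byte groups, similar to the dump format used in the text.
print(data.hex(" ", 2))
print(len(data), "bytes")
```

The first three 2-byte groups reproduce the values shown in the dump, which supports the EUC-JP reading.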

【００４４】このようにして、用途に応じて、適切なテ
キスト特徴情報を抽出する。例えば、テキスト全体を認
証する場合には、テキスト全体をテキスト抽出すべきで
あるが、テキストの内容も認証したい場合、例えば、自
らが書いたテキストの一部であるかどうかを認証するた
めには、テキスト全体を抽出するのではなく、テキスト
に含まれる自立語または、重要な語をテキスト抽出し、
テキスト特徴情報としなければならない。利用者は、こ
れらの特性を考慮して、最適なテキスト特徴量を設定す
る。Ｔ−２のテキスト特徴抽出部は、利用者の設定した
テキスト特徴量に基づき、テキスト特徴情報の抽出に使
用したテキスト抽出法（Ｆ値）をＴ−５のテキスト認証
情報発生部に対して通知する。
In this way, appropriate text feature information is extracted according to the intended use. For example, to authenticate a text as a whole, the entire text should be extracted. To authenticate the content of a text as well, for example to verify whether a passage is part of a text one wrote oneself, the independent words or the important words in the text must be extracted as the text feature information instead of the entire text. The user sets the most suitable text feature amount with these properties in mind. Based on the text feature amount set by the user, the T-2 text feature extraction unit notifies the T-5 text authentication information generation unit of the text extraction method (F value) used to extract the text feature information.

【００４５】次に、Ｔ−３のテキスト発行元情報入力部
は、テキストを発行した本人であることを示す情報（テ
キスト発行元情報）の入力を受け所定の形式に記述する
（図８（ａ）：ステップＳ３）。例えば、テキストの著
作権を持つ会社組織や、テキストを著作した個人の住
所、氏名、ＵＲＬなどの発行元であることを示す情報が
入力され所定の形式に記述する。
Next, the T-3 text publisher information input unit receives information indicating who issued the text (text publisher information) and writes it in a predetermined format (FIG. 8(a): step S3). For example, information identifying the publisher, such as the company that holds the copyright of the text or the address, name, and URL of the individual who wrote it, is input and written in a predetermined format.

【００４６】一例として、以下のようなテキスト発行元
情報を記述する。
<氏名>あいうえおたろう</氏名>
<所属>たろう株式会社</所属>
<住所>京都太郎区１</住所>
<URL>http://aaaaaa.ne.jp/aaa.htm1</URL>
<作成日>９９年３月１日</作成日>
<発行日>９９年３月２日</発行日>
<権利保有日>２０００年３月２日</権利保有日>
As an example, text publisher information such as the following is written.
<Name>Aiueo Taro</Name>
<Affiliation>Taro Co., Ltd.</Affiliation>
<Address>1 Taro-ku, Kyoto</Address>
<URL>http://aaaaaa.ne.jp/aaa.htm1</URL>
<Creation date>March 1, 1999</Creation date>
<Issue date>March 2, 1999</Issue date>
<Rights holding date>March 2, 2000</Rights holding date>

【００４７】Ｔ−３のテキスト発行元情報入力部は、次
に、テキスト発行元情報をＴ−５のテキスト認証情報発
生部に送る。これらのテキスト発行元情報をＴ−５のテ
キスト認証情報発生部に送る場合、テキスト発行元情報
の属性（つまり、住所であるのか、氏名であるのかな
ど）を明確にするために、ＳＧＭＬ（Standard general
ized Markup Language）と同様にタグでその属性で囲ん
でいる。例えば、<氏名></氏名>のタグの間に属性の
値、つまりここでは氏名を記述する。氏名などの属性の
終了は、“/”で記述されたタグ（ここでは、</氏名>の
部分）がそれを示すマーカとなる。
Next, the T-3 text publisher information input unit sends the text publisher information to the T-5 text authentication information generation unit. When this information is sent, each value is enclosed in tags naming its attribute, in the same manner as SGML (Standard Generalized Markup Language), so that the attribute of each item (whether it is an address, a name, and so on) is unambiguous. For example, the attribute value, here the name, is written between the tags <氏名> and </氏名> ("name"). The end of an attribute such as the name is marked by the tag written with "/" (here, </氏名>).
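
The tagged format described above can be sketched with two small helpers, one that writes attribute and value pairs and one that reads them back. The helper names are hypothetical, and the sketch assumes that values never contain their own closing tag; the text does not say how such a case would be handled.

```python
import re


def to_tagged(fields):
    """Serialize (attribute, value) pairs as <attribute>value</attribute>."""
    return "".join(f"<{name}>{value}</{name}>" for name, value in fields)


def from_tagged(record):
    """Recover the (attribute, value) pairs; a closing tag must match its opener."""
    pattern = re.compile(r"<([^/<>]+)>(.*?)</\1>", re.S)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(record)]


publisher = [
    ("氏名", "あいうえおたろう"),
    ("所属", "たろう株式会社"),
    ("URL", "http://aaaaaa.ne.jp/aaa.htm1"),
]
record = to_tagged(publisher)
print(record)
print(from_tagged(record))
```

Because the closing tag repeats the attribute name, the reader can tell where each value ends even when the value itself contains "/" characters, as the URL does.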

【００４８】これらをバイト列で表すと、以下のように
なる。 These are represented by byte strings as follows.

【００４９】Ｔ−４のテキスト認証情報入力部は、公的
機関や、ある種の認証会社が発行する、発行元のＩＤの
入力を受け所定の形式に記述する（図８（ｂ）：ステッ
プＳ４）。このテキスト発行元ＩＤを元に、発行者を特
定することができる。ただし、発行元のＩＤを一意に示
すため、テキスト発行元ＩＤは、世の中で一意である必
要がある。例えば、以下に示す発行元組織ＩＤとこの発
行元組織が一意に発行した、テキスト発行元ＩＤをテキ
スト発行元ＩＤとして記述する。
<発行元組織ＩＤ>ＡＡＡ</発行元組織ＩＤ>
<発行元ＩＤ>123456789</発行元ＩＤ>
The T-4 text authentication information input unit receives the publisher's ID, which is issued by a public body or some kind of certification company, and writes it in a predetermined format (FIG. 8(b): step S4). The publisher can be identified from this text publisher ID. To identify the publisher uniquely, however, the text publisher ID must be globally unique. For example, the publishing organization ID shown below, together with the publisher ID that this organization has uniquely issued, is written as the text publisher ID.
<Publishing organization ID>AAA</Publishing organization ID>
<Publisher ID>123456789</Publisher ID>

【００５０】次に、Ｔ−５のテキスト認証情報発生部
は、テキスト認証情報を発生する（図８（ａ）：ステッ
プＳ５）。詳しくは、Ｔ−２のテキスト特徴抽出部で抽
出したテキスト特徴情報をテキスト特徴情報入力部Ｔ−
５ａで受け（図９：ステップＳ５１）、また、Ｔ−２の
テキスト特徴抽出部から通知されるテキスト特徴量パラ
メータを特徴量パラメータ入力部Ｔ−５ｂで受け（図
９：ステップＳ５２）、Ｔ−３のテキスト発行元情報入
力部で入力された発行元の情報を示すテキスト発行元情
報を発行元情報入力部Ｔ−５ｃで受け（図９：ステップ
Ｓ５３）、Ｔ−４のテキスト認証情報入力部で入力され
たテキスト発行元ＩＤを発行元ＩＤ入力部Ｔ−５ｄで受
け（図９：ステップＳ５４）、各情報が揃うと、以下の
ようにして、テキスト認証情報を発生する。テキスト認
証情報は、それ自体がテキスト中に埋め込まれるため、
単純にテキスト認証情報をテキスト中に埋め込んでしま
うと、テキスト認証情報がブラウザに表示されるととも
に、ある特殊な編集器により改ざんされる可能性が生じ
る。そこで、本装置では、テキスト認証情報が、ブラウ
ザ等で不可視となり、かつ、どのようなテキスト認証情
報が記述されているかがわからないように、暗号化する
ことを行う。
Next, the T-5 text authentication information generation unit generates the text authentication information (FIG. 8(a): step S5). Specifically, the text feature information input unit T-5a receives the text feature information extracted by the T-2 text feature extraction unit (FIG. 9: step S51); the feature parameter input unit T-5b receives the text feature parameter notified by T-2 (FIG. 9: step S52); the publisher information input unit T-5c receives the text publisher information entered at the T-3 text publisher information input unit (FIG. 9: step S53); and the publisher ID input unit T-5d receives the text publisher ID entered at the T-4 text authentication information input unit (FIG. 9: step S54). Once all of this information is available, the text authentication information is generated as follows. Because the text authentication information is itself embedded in the text, simply embedding it as-is would cause it to be displayed by the browser and would leave it open to tampering with some special editor. This apparatus therefore encrypts the text authentication information so that it is invisible in browsers and the like and so that what it describes cannot be determined.

【００５１】つまり、まず、暗号化器Ｔ−５ｅがテキス
ト認証情報を暗号化し、通常のユーザからは解読不能と
する（図９：ステップＳ５５）。暗号化には、例えば、
清水らが発明した、ＦＥＡＬ−８、ＮＸ（特願昭６０-
２５２６５０、「データ拡散装置」）などの暗号化装置
を利用する。暗号化装置は、基本的には、あるバイト列
と暗号鍵を与えると、それに基づき、バイト列を暗号化
して、暗号化されたバイト列を返す装置である。通常、
これらの暗号化装置は、バイト列を暗号化して適当なバ
イト列に変換する。しかし、これらの暗号化装置では、
ブラウザに不可視であるような暗号化を行うわけではな
く、ブラウザに対しては、可視であったり、制御コード
となってしまう場合がある。
That is, the encryptor T-5e first encrypts the text authentication information so that ordinary users cannot decipher it (FIG. 9: step S55). An encryption device such as FEAL-8 or FEAL-NX, invented by Shimizu et al. (Japanese Patent Application No. S60-252650, "Data spreading device"), is used for the encryption, for example. An encryption device is basically a device that, given a byte sequence and an encryption key, encrypts the byte sequence with the key and returns the encrypted byte sequence. Such devices normally turn the input into an arbitrary byte sequence. They do not, however, encrypt in a way that is invisible to a browser: the output may be visible in the browser or may be interpreted as control codes.

【００５２】例えば、0x20は半角スペースであったり、
0x0aは、改行コードであると通常のブラウザでは認識し
てしまう。そのため、ただ単純に暗号化器Ｔ−５ｅで生
成したテキスト認証情報であると、ブラウザに可視とな
ってしまう。そのため、ブラウザにおいて不可視とする
ために、暗号化器Ｔ−５ｅが生成したバイト列を、ある
特殊なコード列に変換することによりブラウザにおいて
不可視とするバイト列を生成する。例えば、ＳＪＩＳ漢
字コード体系では、１バイト目のバイトが以下のように
定められている。制御コード群が、0x00から0x1fまでＡＳＣＩＩコード群が、0x20から0x7FまでＳＪＩＳコード群が、0x81から0x9Fまで半角カタカナコード群が0xa1から0xCFまでＳＪＩＳコードが、0xE0から0xf9まで以上が漢字コードとして利用される。それ以外のコード
は、逆にブラウザに非表示なコード群となる。つまり、
漢字コードが属しているものを１バイト目と２バイト目
の関係で表せば、図５のようになる。
For example, an ordinary browser interprets 0x20 as a half-width space and 0x0a as a line feed. Text authentication information that has merely been produced by the encryptor T-5e would therefore be visible in the browser. To make it invisible, the byte sequence produced by T-5e is converted into a special code sequence, yielding a byte sequence that the browser does not display. In the SJIS kanji code system, for example, the first byte is assigned as follows:
Control codes: 0x00 to 0x1f
ASCII codes: 0x20 to 0x7F
SJIS codes: 0x81 to 0x9F
Half-width katakana codes: 0xa1 to 0xCF
SJIS codes: 0xE0 to 0xf9
These are the values used as kanji codes. The remaining codes, conversely, are codes that the browser does not display. Plotting the codes that belong to the kanji code system by first byte and second byte gives FIG. 5.
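
The first-byte assignment above can be sketched as a lookup that reports which group a byte falls in. The ranges are copied from the text as given and are not checked against the Shift_JIS standard; the group labels and the function name `classify` are hypothetical.

```python
# First-byte ranges exactly as listed in the text.
RANGES = [
    (0x00, 0x1F, "control"),
    (0x20, 0x7F, "ascii"),
    (0x81, 0x9F, "sjis"),
    (0xA1, 0xCF, "half-width katakana"),
    (0xE0, 0xF9, "sjis"),
]


def classify(first_byte: int) -> str:
    """Return the group a first byte belongs to, or 'unused' if none lists it."""
    for low, high, name in RANGES:
        if low <= first_byte <= high:
            return name
    return "unused"


for value in (0x0A, 0x20, 0x80, 0x8D, 0xA0, 0xFF):
    print(f"0x{value:02x}: {classify(value)}")
```

Bytes reported as unused are the candidates for the invisible code area; by these ranges, 0x80 and 0xA0 are among them.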

【００５３】すなわち、斜線で示した部分が漢字コード
で利用されるコード群であり、白の部分は、漢字コード
で利用されていないコード群であり、かつ、ブラウザで
不可視となりうるコード群である。この２バイトで表さ
れる、未使用領域に対してそれぞれ暗号化されたバイト
列を写像することにより、暗号化された任意の８バイト
を表す。例えば、以下のように暗号化バイト表現バイト 0x00 0f8080 0x01 0x8081 Ox02 0x8082 0x03 0x8083 … 0xff 0xffff というような写像テーブルを記述することにより、暗号
化バイト列をブラウザに不可視な不可視バイト列に変換
することができる。
That is, the hatched area is the set of codes used by the kanji code system, and the white area is the set of codes that the kanji code system does not use and that can be invisible in a browser. Mapping the encrypted bytes onto this unused area, which is addressed with 2 bytes, makes it possible to represent any encrypted 8 bytes. For example, writing a mapping table such as
Encrypted byte | Representation bytes
0x00 | 0x8080
0x01 | 0x8081
0x02 | 0x8082
0x03 | 0x8083
… | …
0xff | 0xffff
allows the encrypted byte sequence to be converted into an invisible byte sequence that the browser does not display.
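
The table-driven conversion can be sketched as below. Only the first four table rows come from the text; the rest of the table built here is hypothetical, chosen so that every 2-byte code starts with 0x80 or 0xA0, two first bytes that fall outside the ranges listed for the kanji code system. Whether a given browser really leaves such codes undisplayed is not verified by this sketch.

```python
def build_table():
    """Map each encrypted byte to a 2-byte code (hypothetical beyond 0x00-0x03)."""
    table = {}
    for b in range(256):
        if b < 0x80:
            table[b] = bytes([0x80, 0x80 + b])  # 0x00 -> 0x8080, 0x01 -> 0x8081, ...
        else:
            table[b] = bytes([0xA0, b])         # assumed continuation of the table
    return table


TABLE = build_table()
INVERSE = {code: b for b, code in TABLE.items()}


def to_invisible(encrypted: bytes) -> bytes:
    """Replace every encrypted byte with its 2-byte representation."""
    return b"".join(TABLE[b] for b in encrypted)


def from_invisible(invisible: bytes) -> bytes:
    """Undo to_invisible by looking up each 2-byte code."""
    pairs = (invisible[i:i + 2] for i in range(0, len(invisible), 2))
    return bytes(INVERSE[pair] for pair in pairs)


print(to_invisible(bytes([0x00, 0x01, 0x02, 0x03])).hex(" ", 2))
```

The conversion doubles the length of the data, which matters when the number of embeddable bytes is limited.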

【００５４】Ｔ−５のテキスト認証情報発生部では、コ
ード変換部Ｔ−５ｆが最終的に暗号化バイト列を不可視
バイト列に変換して（図９：ステップＳ５６）、暗号化
認証情報出力部Ｔ−５ｇがこの不可視バイト列を出力す
る（図９：ステップＳ５７）。暗号化するデータ列とし
ては、Ｔ−３のテキスト発行元情報入力部からの出力で
あるテキスト発行元情報と、Ｔ−４のテキスト認証情報
入力部から出力されるテキスト発行元ＩＤと、Ｔ―２の
テキスト特徴抽出部から出力されるテキスト特徴情報お
よびテキストパラメータ特徴パラメータ（Ｆ値）であ
る。テキスト特徴情報については、入力されたテキスト
の特徴を明確にするために使用されるので、復元された
際に、復元されたデータと現状データの相違が発見され
た場合には、現状データが元データの改ざんを受けたこ
とを表す。
In the T-5 text authentication information generation unit, the code conversion unit T-5f finally converts the encrypted byte sequence into an invisible byte sequence (FIG. 9: step S56), and the encrypted authentication information output unit T-5g outputs this invisible byte sequence (FIG. 9: step S57). The data to be encrypted consists of the text publisher information output by the T-3 text publisher information input unit, the text publisher ID output by the T-4 text authentication information input unit, and the text feature information and text feature parameter (F value) output by the T-2 text feature extraction unit. The text feature information serves to pin down the features of the input text: if, on restoration, a difference is found between the restored data and the current data, the current data has been altered from the original.

【００５５】次に、Ｔ−５のテキスト認証情報発生部に
おける暗号化の詳細を説明する。テキスト発行元情報お
よびテキスト発行元ＩＤおよびテキスト特徴量パラメー
タを暗号化するとともに、テキスト特徴情報については
埋め込むバイト数に応じて、暗号化する。テキスト発行
元情報とテキスト発行元ＩＤは、必ず埋め込むため、こ
れらの情報のバイト数の総和よりも埋め込みバイト数は
大きくなければならない。すなわち、（テキスト発行元情報バイト数）＋（テキスト発行元Ｉ
Ｄバイト数）＋ｎ＜埋め込みバイト数ただし、ｎはテキスト特徴量パラメータ（Ｆ値）のバイ
ト数である。
Next, the encryption performed by the T-5 text authentication information generation unit is described in detail. The text publisher information, the text publisher ID, and the text feature parameter are encrypted, and the text feature information is encrypted according to the number of bytes to be embedded. Because the text publisher information and the text publisher ID are always embedded, the number of embedded bytes must be larger than the total size of that information. That is,
(number of bytes of text publisher information) + (number of bytes of text publisher ID) + n < (number of embedded bytes)
where n is the number of bytes of the text feature parameter (F value).

【００５６】Ｔ−５のテキスト認証情報発生部では、埋
め込みバイト数単位で、暗号化するデータを分割する。
（一番最後の余りについては、0x00をpaddingする。）
図６に示すように、分割されたそれぞれのバイト列に対
して暗号化を行い、暗号化された結果に対して、さらに
次の分割されたテキスト特徴情報の排他的論理和を演算
し、その結果を秘密鍵で暗号化する。これを埋め込み可
能バイト数で分割されたテキスト特徴情報量分だけこな
し、最終的に暗号化された結果を埋め込む。
The T-5 text authentication information generation unit divides the data to be encrypted into blocks of the embedding size. (The remainder in the last block is padded with 0x00.) As shown in FIG. 6, each block is encrypted in turn: the encrypted result is combined by exclusive OR with the next block of text feature information, and that result is encrypted with the secret key. This is repeated for every block into which the text feature information was divided by the embeddable size, and the final encrypted result is what gets embedded.

【００５７】これは、ＦＥＡＬなどのようなブロック暗
号型暗号化装置では、不得意なある偏った入カデータ
（例えば、テキスト）に対する暗号の強度を強くする方
法として利用されている操作モードに近い。本操作モー
ドを利用することにより、テキスト特徴は、単純にテキ
スト特徴を暗号化しただけでなく、暗号化されたテキス
ト特徴との排他論理和による暗号をかけているため、暗
号化器に対する入力が分散し、暗号強度が著しく高くな
る。
This is close to the operation modes used with block ciphers such as FEAL to strengthen the encryption of biased input data (text, for example), which such ciphers handle poorly. With this operation mode, the text features are not merely encrypted; each block is also combined by exclusive OR with the previously encrypted text features before encryption, so the inputs to the encryptor are spread out and the cipher strength becomes markedly higher.

【００５８】例えば、埋め込み可能なバイト数が８バイ
トであれば、８バイト以内でテキストを暗号化しなけれ
ばならない。暗号化する情報のバイト列は、以下であ
る。 For example, if the number of bytes that can be embedded is 8 bytes, the text must be encrypted within 8 bytes. The byte sequence of the information to be encrypted is as follows.

【００５９】これらを埋め込み可能なバイト数（８バイ
ト）で分割する。一番最後に分割された（２４）番目の分割データに対し
ては、最後に0x00をパッディング（padding）する。そ
して、（１）の暗号化器の出力を（２）と排他的論理和
を行い、その結果を暗号化器にかけ、さらに（３）と排
他論理和を行い。最後の（２４）まで暗号器にかけた結
果を、最終的な暗号化結果とする。以上、Ｔ−５のテキ
スト認証情報発生部における暗号化の詳細を説明した。
These are divided by the embeddable size (8 bytes). The last block, the 24th, is padded with 0x00 at the end. The output of the encryptor for block (1) is then XORed with block (2) and the result is passed through the encryptor; that output is in turn XORed with block (3), and so on. The output of the encryptor after the last block (24) is the final encryption result. This concludes the detailed description of the encryption in the T-5 text authentication information generation unit.
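
The chaining described above can be sketched as follows. The block cipher here is a stand-in: FEAL is not available in the Python standard library, so `toy_encrypt` uses a truncated keyed SHA-256 hash purely to make the chaining runnable. It is not FEAL, it is not invertible, and it should not be used as a cipher. All function names are hypothetical.

```python
import hashlib

BLOCK = 8  # embeddable size in bytes, as in the example above


def split_blocks(data: bytes, size: int = BLOCK):
    """Split data into size-byte blocks, padding the last one with 0x00."""
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] = blocks[-1].ljust(size, b"\x00")
    return blocks


def toy_encrypt(block: bytes, key: bytes) -> bytes:
    """Stand-in for the block cipher (NOT FEAL): keyed hash cut to block length."""
    return hashlib.sha256(key + block).digest()[:len(block)]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def chained_result(data: bytes, key: bytes) -> bytes:
    """Encrypt block (1), XOR with block (2), encrypt, and so on to the last block."""
    if not data:
        raise ValueError("nothing to encrypt")
    blocks = split_blocks(data)
    state = toy_encrypt(blocks[0], key)
    for block in blocks[1:]:
        state = toy_encrypt(xor(state, block), key)
    return state


message = bytes(190)  # 190 bytes split into 24 blocks, the last one padded
print(len(split_blocks(message)), chained_result(message, b"secret").hex())
```

Because every block feeds into the next encryption, changing any byte of the input changes the final 8-byte result, which is the property the apparatus relies on for tamper detection.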

【００６０】また、テキストの一部の編集などの可能性
により、全体のテキストが分断される可能性があるが、
本装置では、前記Ｆ値に基づき、解析される単位を元
に、テキスト特徴情報を抽出し、それぞれの解析単位毎
に暗号化されたテキスト特徴情報とテキスト全体のテキ
スト発行元情報、テキスト発行元ＩＤ、テキスト特徴パ
ラメータ(Ｆ値)とを、Ｔ−６のテキスト認証情報埋め込
み部で埋め込む（図８（ｂ）：ステップＳ７）。
Editing part of a text may also break the text as a whole into pieces. In this apparatus, text feature information is therefore extracted per unit of analysis as determined by the F value, and the T-6 text authentication information embedding unit embeds, for each unit of analysis, the encrypted text feature information together with the text publisher information, text publisher ID, and text feature parameter (F value) for the whole text (FIG. 8(b): step S7).

【００６１】ここで、Ｔ−６のテキスト認証情報埋め込
み部の動作の詳細を説明する。まず、Ｔ−５のテキスト
認証情報発生部からテキスト特徴パラメータ（Ｆ値）
が、テキスト特徴パラメータ入力部Ｔ−６ａに入力され
る（図１０：ステップＳ６１）。そして、Ｔ−５のテキ
スト認証情報発生部が発生した不可視バイト列が暗号化
認証情報入力部Ｔ−６ｂに入力される（図１０：ステッ
プＳ６２）。さらにＴ−１のテキスト読み取り部からテ
キスト認証情報を埋め込むテキストが、テキスト入力部
Ｔ−６ｃに入力される（図１０：ステップＳ６３）。そ
して、各入力部に入力された情報が揃うと、判定部Ｔ−
６ｄは、すべての不可視バイト列を出力したか判断する
（図１０：ステップＳ６４）。
The operation of the T-6 text authentication information embedding unit is now described in detail. First, the text feature parameter (F value) from the T-5 text authentication information generation unit is input to the text feature parameter input unit T-6a (FIG. 10: step S61). The invisible byte sequence generated by T-5 is then input to the encrypted authentication information input unit T-6b (FIG. 10: step S62). The text in which the text authentication information is to be embedded is further input from the T-1 text reading unit to the text input unit T-6c (FIG. 10: step S63). Once all of these inputs are available, the determination unit T-6d determines whether the entire invisible byte sequence has been output (FIG. 10: step S64).

【００６２】If the determination unit T-6d determines that the invisible byte sequence has not yet been output in full, the embedded text output unit T-6f reads the input text in chunks whose byte size is determined separately on the basis of the F value, and outputs each chunk (FIG. 10: step S65). The embedded encrypted authentication information output unit T-6e then reads the invisible byte sequence in chunks whose byte size is likewise determined separately on the basis of the F value, and outputs each chunk (FIG. 10: step S66). Steps S64 to S66 of FIG. 10 are repeated until the entire invisible byte sequence has been output. When the determination unit T-6d determines that the entire invisible byte sequence has been output, all of the remaining input text is output (FIG. 10: step S67). This concludes the detailed description of the operation of the text authentication information embedding unit T-6.
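The loop of steps S64 to S67 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `embed` and the parameters `text_chunk` and `mark_chunk` are hypothetical stand-ins for the chunk sizes that the text says are "determined separately on the basis of the F value", and the sketch does not check that a chunk boundary avoids splitting a 2-byte character.

```python
def embed(text: bytes, invisible: bytes, text_chunk: int, mark_chunk: int) -> bytes:
    """Interleave chunks of text with chunks of the invisible byte sequence.

    text_chunk and mark_chunk are assumed inputs; the source only says
    that the sizes are derived from the F value.
    """
    if text_chunk <= 0 or mark_chunk <= 0:
        raise ValueError("chunk sizes must be positive")
    out = bytearray()
    t = 0  # read position in the input text
    m = 0  # read position in the invisible byte sequence
    # S64: keep going until every invisible byte has been output.
    while m < len(invisible):
        # S65: output the next chunk of input text (may be empty near the end).
        out += text[t:t + text_chunk]
        t += text_chunk
        # S66: output the next chunk of the invisible byte sequence.
        out += invisible[m:m + mark_chunk]
        m += mark_chunk
    # S67: output whatever input text remains.
    out += text[t:]
    return bytes(out)
```

With 2-byte chunks on both sides, `embed(b"abcdefgh", bytes([0x80, 0x81, 0x82, 0x83]), 2, 2)` places two authentication bytes after every two text bytes and then appends the rest of the text unchanged.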

【００６３】Next, the operation of the text authentication information embedding unit T-6 is described using a concrete example. When the F value shown in the earlier example is 1, that is, when the entire text is used as the text feature, the data is inserted at arbitrary positions in the whole text (provided that no 2-byte character boundary, such as that of SJIS, is broken). For example, suppose that the following text is input

【００６４】and that the final encrypted data is 8081 8283 8485 8687. The encrypted data is then embedded in the underlined portions.

【００６５】When the F value is 2, processing is performed tag by tag, so the data is embedded in each tag. If the final encrypted data for the first tag is 8081 8283 8485 8687 and the final encrypted data for the second tag is 8889 9091 9293 9495, the data is embedded as follows. This completes the operation up to the point where the text authentication information is embedded in the HTML text.

【００６６】Next, the operation for authenticating HTML text in which text authentication information has been embedded is described.

【００６７】First, the text authentication information extracting unit T-7 extracts from the text, for the purpose of authentication, the text authentication information that was embedded by the text authentication information embedding unit T-6. The unit T-7 also separates the text authentication information from the original text. Specifically, among the code regions shown in FIG. 5, a byte sequence that falls in the region used by, for example, SJIS is judged to be text (a text byte sequence), and any other byte sequence is judged to be text authentication information. The text byte sequence is passed to the text feature extracting unit T-8 so that the text feature information can be extracted. The encrypted text authentication information is passed to the text authentication information reading unit T-9 so that it can be restored.
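The separation by code region can be sketched as follows. The actual regions are defined in FIG. 5, which is not reproduced here, so this sketch assumes that the authentication bytes are drawn only from 0xFD to 0xFF, values that standard Shift_JIS uses neither as single bytes nor as lead or trail bytes. The names `RESERVED` and `separate` are hypothetical.

```python
from typing import Tuple

# Assumption (not from the source): authentication bytes use only values
# that never occur anywhere in standard Shift_JIS text.
RESERVED = frozenset({0xFD, 0xFE, 0xFF})


def separate(embedded: bytes) -> Tuple[bytes, bytes]:
    """Split embedded data into (text bytes, authentication bytes)."""
    text = bytearray()
    auth = bytearray()
    for b in embedded:
        # A byte outside the text code region is authentication data;
        # everything else belongs to the original text.
        if b in RESERVED:
            auth.append(b)
        else:
            text.append(b)
    return bytes(text), bytes(auth)
```

Because the assumed authentication bytes can never be part of a Shift_JIS character, a byte-by-byte filter is enough. A scheme that reuses byte values valid in SJIS would instead need to walk the data character by character.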

【００６８】Next, the text feature extracting unit T-8 extracts the text feature information that was used when the text authentication information was generated, taking it from the text from which the text authentication information has been separated (that is, the original text). Because the text authentication information extracting unit T-7 separates the text authentication information from the original text, the text features are recomputed from the separated original text using the same processing as the text feature extraction unit T-2. The text feature parameter (F value) is set in advance, so the processing corresponding to that value is performed to obtain the text feature information and the analysis unit from which the text feature information is extracted.

【００６９】Next, the text authentication reading unit T-9 uses the text feature information and the analysis unit extracted by the text feature extracting unit T-8 to separate and extract the text publisher ID and the text publisher information from the text authentication information separated by the text authentication information extracting unit T-7. The number of embedded bytes is determined from the analysis unit extracted by the text feature extracting unit T-8, and on that basis the text feature information extracted by the unit T-8 is divided into n sequences. These n sequences are decrypted by the method shown in FIG. 7 to extract the text publisher information and the text publisher ID.

【００７０】Continuing the earlier example, consider separating the encrypted information from the original text information in the following text, in which information such as the text authentication information has been embedded as watermark information:

【００７１】The text is separated into the underlined portions and the remaining portions. The separation uses the SJIS code system shown in FIG. 5: the data is divided into SJIS codes and all other data. The result of the separation is as follows.

【００７２】The encrypted part is then restored, and the text publisher information and the text publisher ID are computed. For example, <氏名>たろう</氏名> <発行元ＩＤ>123</発行元ＩＤ> (a name element containing "Taro" and a publisher ID element containing 123) is extracted from the above data.

【００７３】Next, the text publisher information extracting unit T-10 takes the text publisher information extracted by the text authentication reading unit T-9 and outputs it. In the above example, the text publisher information extracted by the unit T-9 is “<氏名>たろう</氏名>”, so this information is output to the output device. The text authentication information extracting unit T-11 then takes the text publisher ID extracted by the text authentication reading unit T-9 and outputs it to the output device. In the above example, the text publisher ID extracted by the unit T-9 is “<発行元ＩＤ>123</発行元ＩＤ>”, so this information is output to the output device. This concludes the description of the operation for authenticating HTML text in which text authentication information has been embedded.
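As an illustration of how the decrypted record could be split into the publisher information and the publisher ID handled by T-10 and T-11, the following Python sketch parses the tagged string from the example. The function name `split_fields` and the regular expression are assumptions made for illustration; the source does not specify how the fields are parsed.

```python
import re
from typing import Dict


def split_fields(decrypted: str) -> Dict[str, str]:
    """Map each tag name in the decrypted record to its content.

    The back-reference \\1 requires the closing tag to match the
    opening tag, so mismatched pairs are skipped.
    """
    pattern = re.compile(r"<([^<>/]+)>(.*?)</\1>")
    return {m.group(1): m.group(2) for m in pattern.finditer(decrypted)}


record = "<氏名>たろう</氏名> <発行元ＩＤ>123</発行元ＩＤ>"
fields = split_fields(record)
publisher_info = fields["氏名"]      # value output by T-10 in the example
publisher_id = fields["発行元ＩＤ"]  # value output by T-11 in the example
```

In this sketch the two values are looked up by tag name, which mirrors the example where T-10 outputs the name element and T-11 outputs the publisher ID element.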

【００７４】The present invention may be used not only over the Internet but also over a LAN or a dial-up network. It may also be implemented as a stand-alone device. The function of embedding authentication information and the function of extracting it may be implemented as separate devices or as a single device. In addition, a program for realizing the text electronic authentication device of the present invention (a text electronic authentication program) may be recorded on a computer-readable recording medium, and electronic authentication of text may be performed by having a computer system read and execute the program recorded on that medium. That is, one part of the text electronic authentication program causes a computer to realize the functions of the text reading unit, the text feature extraction unit, the text publisher information input unit, the text authentication information input unit, the text authentication information generation unit, and the text authentication information embedding unit. The other part of the program causes a computer to realize the functions of the text authentication information extracting unit, the text feature extracting unit, the text authentication information reading unit, the text publisher information extracting unit, and the text authentication information extracting unit.

【００７５】The term "computer system" as used here includes an OS and hardware such as peripheral devices. When a WWW system is used, "computer system" also includes the homepage providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as floppy disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. It further includes media that hold the program dynamically for a short time (transmission media or transmission waves), such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain period, such as the volatile memory inside the computer system acting as the server or client in that case. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program), which realizes those functions in combination with a program already recorded in the computer system.

【００７６】Although an embodiment of the present invention has been described in detail with reference to the drawings, the specific configuration is not limited to this embodiment and also includes designs and the like that do not depart from the gist of the invention.

【００７７】[0077]

[Effects of the Invention]

As described in detail above, according to the present invention, the features of a text are extracted from an electronic document in which the text is described, information on the publisher of the text and information for authentication are input, and this information is turned into encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable, and is embedded in the electronic document. Appropriate authentication can therefore be given to commonly used electronic documents in which text is described.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of a text electronic authentication device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of the text authentication information generation unit according to the embodiment.

FIG. 3 is a diagram showing the configuration of the text authentication information embedding unit according to the embodiment.

FIG. 4 is a display example of an HTML text.

FIG. 5 is a diagram showing kanji codes and other codes in the SJIS kanji code system.

FIG. 6 is a diagram explaining the encryption procedure in the text authentication information generation unit.

FIG. 7 is a diagram explaining the decryption procedure in the text authentication reading unit.

FIG. 8 is a diagram showing the operation procedure of the text electronic authentication device according to the embodiment.

FIG. 9 is a diagram showing the operation procedure of the text authentication information generation unit according to the embodiment.

FIG. 10 is a diagram showing the operation procedure of the text authentication information embedding unit according to the embodiment.

[Explanation of symbols]

T-1: text reading unit
T-2: text feature extraction unit
T-3: text publisher information input unit
T-4: text authentication information input unit
T-5: text authentication information generation unit
T-6: text authentication information embedding unit
T-7: text authentication information extracting unit
T-8: text feature extracting unit
T-9: text authentication information reading unit
T-10: text publisher information extracting unit
T-11: text authentication information extracting unit
T-5a: text feature information input unit
T-5b: feature parameter input unit (feature identifier input unit)
T-5c: publisher information input unit
T-5d: publisher ID input unit (publisher authentication information input unit)
T-5e: encryptor
T-5f: code conversion unit
T-5g: encrypted authentication information output unit
T-6a: feature parameter input unit (feature identifier input unit)
T-6b: encrypted authentication information input unit
T-6c: text input unit
T-6d: determination unit
T-6e: embedded encrypted authentication information output unit
T-6f: embedded text output unit
T-6g: text output unit

Continued from the front page
(72) Inventor: Daijiro Mori, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Kazuo Tanaka, c/o Nippon Telegraph and Telephone Corporation, 3-19-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
F-term (reference): 5J104 AA08 AA14 NA27 NA36 NA38 PA07 PA09

Claims

1. A text electronic authentication device that enables authentication of an electronic document in which text is described by embedding authentication information in the electronic document, the device comprising: a text reading unit that reads text from the electronic document in which the text is described; a text feature extraction unit that extracts features of the text read by the text reading unit; a text publisher information input unit that inputs information on the publisher of the text described in the electronic document; a text authentication information input unit that inputs information for authenticating the text described in the electronic document; a text authentication information generation unit that uses the information representing the features of the text extracted by the text feature extraction unit, the publisher information input to the text publisher information input unit, and the information for text authentication input to the text authentication information input unit to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication information embedding unit that embeds the encrypted electronic authentication information generated by the text authentication information generation unit in the text read by the text reading unit, in such a manner that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited.

2. The text electronic authentication device according to claim 1, wherein the text feature extraction unit is capable of executing a plurality of extraction methods corresponding to the size of the text feature to be extracted, extracts and outputs the feature in accordance with a designation of the size of the text feature to be extracted, and further outputs an identifier indicating the extraction method used.

3. The text electronic authentication device according to claim 2, wherein the text authentication information generation unit comprises: a feature amount input unit that receives the information representing the features of the text and outputs that information; a feature identifier input unit that receives the identifier indicating the extraction method used and outputs the identifier; a publisher information input unit that receives the information on the publisher of the text and outputs that information; an authentication information input unit that receives the information for authenticating the text and outputs that information; an encryptor that encrypts authentication information comprising the information representing the features of the text, the identifier, the information on the publisher of the text, and the information for text authentication, and outputs encrypted authentication information; a code conversion unit that converts the encrypted authentication information into an invisible byte sequence that the display device does not use for displaying text, and outputs the invisible byte sequence; and an encrypted authentication information output unit that receives the invisible byte sequence and outputs the invisible byte sequence.

4. The text electronic authentication device according to claim 3, wherein the text authentication information embedding unit comprises: a feature identifier input unit that receives the identifier and outputs the identifier; an encrypted authentication information input unit that receives the invisible byte sequence and outputs the invisible byte sequence; a text input unit that receives the text described in the electronic document; a determination unit that determines whether all of the invisible byte sequence has been inserted into the electronic document; an embedded encrypted authentication information output unit that reads in and outputs the encrypted authentication information in accordance with the identifier; an embedded text output unit that reads in and outputs the text described in the electronic document in accordance with the identifier; and a text output unit that outputs all of the remaining text described in the electronic document when the determination unit determines that all of the invisible byte sequence has been embedded in the electronic document; and wherein, when the determination unit determines that not all of the invisible byte sequence has been inserted into the electronic document, the embedded encrypted authentication information output unit and the embedded text output unit produce output alternately.

5. A device for extracting authentication information from an electronic document in which authentication information has been embedded by the text electronic authentication device according to claim 1, the device comprising: a text authentication information extracting unit that reads the text in which the encrypted electronic authentication information has been embedded by the text authentication information embedding unit, and separates and extracts the text described in the electronic document and the encrypted electronic authentication information; a text feature extracting unit that extracts text features on the basis of the text described in the electronic document extracted by the text authentication information extracting unit; a text authentication reading unit that decrypts the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and extracted by the text authentication information extracting unit and the text features extracted by the text feature extracting unit; a text publisher information extracting unit that reads the information on the publisher of the text from the electronic authentication information decrypted by the text authentication information reading unit; and a text authentication information extracting unit that reads the information for text authentication from the electronic authentication information decrypted by the text authentication information reading unit.

6. A text electronic authentication method that enables authentication of an electronic document in which text is described by embedding authentication information in the electronic document, the method comprising: a text reading procedure of reading text from the electronic document in which the text is described; a text feature extraction procedure of extracting features of the text read by the text reading procedure; a text publisher information input procedure of inputting information on the publisher of the text described in the electronic document; a text authentication information input procedure of inputting information for authenticating the text described in the electronic document; a text authentication generation procedure of using the information representing the text features extracted by the text feature extraction procedure, the publisher information input by the text publisher information input procedure, and the information for text authentication input by the text authentication information input procedure to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication embedding procedure of embedding the encrypted electronic authentication information generated by the text authentication generation procedure in the text read by the text reading procedure, in such a manner that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited.

7. A method for extracting authentication information from an electronic document in which authentication information has been embedded by the text electronic authentication method according to claim 6, the method comprising: a text authentication information extracting procedure of reading the text in which the encrypted electronic authentication information has been embedded by the text authentication embedding procedure, and separating and extracting the text described in the electronic document and the encrypted electronic authentication information; a text feature extracting procedure of extracting text features on the basis of the text described in the electronic document extracted by the text authentication information extracting procedure; a text authentication reading procedure of decrypting the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and extracted by the text authentication information extracting procedure and the text features extracted by the text feature extracting procedure; a text publisher information extracting procedure of reading the information on the publisher of the text from the electronic authentication information decrypted by the text authentication reading procedure; and a text authentication information extracting procedure of reading the information for text authentication from the electronic authentication information decrypted by the text authentication reading procedure.

8. A computer-readable recording medium on which is recorded a text electronic authentication program that enables authentication of an electronic document in which text is described by embedding authentication information in the electronic document, the program causing a computer to execute: a text reading procedure of reading text from the electronic document in which the text is described; a text feature extraction procedure of extracting features of the text read by the text reading procedure; a text publisher information input procedure of inputting information on the publisher of the text described in the electronic document; a text authentication information input procedure of inputting information for authenticating the text described in the electronic document; a text authentication generation procedure of using the information representing the text features extracted by the text feature extraction procedure, the publisher information input by the text publisher information input procedure, and the information for text authentication input by the text authentication information input procedure to produce encrypted electronic authentication information that is invisible on existing electronic document display devices and indecipherable; and a text authentication embedding procedure of embedding the encrypted electronic authentication information generated by the text authentication generation procedure in the text read by the text reading procedure, in such a manner that the encrypted electronic authentication information is not lost even if the text in which it is embedded is edited.

9. A computer-readable recording medium on which is recorded a text electronic authentication program for extracting authentication information from an electronic document in which authentication information has been embedded by a computer having the text electronic authentication program according to claim 8, the program causing a computer to execute: a text authentication information extracting procedure of reading the text in which the encrypted electronic authentication information has been embedded by the text authentication embedding procedure, and separating and extracting the text described in the electronic document and the encrypted electronic authentication information; a text feature extracting procedure of extracting text features on the basis of the text described in the electronic document extracted by the text authentication information extracting procedure; a text authentication reading procedure of decrypting the encrypted authentication information embedded in the text, on the basis of the encrypted electronic authentication information separated and extracted by the text authentication information extracting procedure and the text features extracted by the text feature extracting procedure; a text publisher information extracting procedure of reading the information on the publisher of the text from the electronic authentication information decrypted by the text authentication reading procedure; and a text authentication information extracting procedure of reading the information for text authentication from the electronic authentication information decrypted by the text authentication reading procedure.