JP2000057315A - Document filing apparatus and document filing method

Info

Publication number: JP2000057315A
Application number: JP10222543A
Authority: JP
Inventors: Taizou Kameshiro; 泰三亀代; Yasuhiro Okada; 康裕岡田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1998-08-06
Filing date: 1998-08-06
Publication date: 2000-02-25
Anticipated expiration: 2018-08-06
Also published as: JP3620299B2

Abstract

(57) [Abstract] [Problem] To reduce search omissions and erroneous extractions in document search in a filing system that uses character recognition, and thereby achieve high-precision search. [Solution] At document registration, the character codes created by the character recognition means and the character features created by the feature creation means are stored. At search time, portions where the character codes do not match are collated by feature. Search thus becomes possible even for portions where, because of a segmentation error, the number of characters in the keyword differs from that in the character recognition result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a system for electronically filing images of, for example, documents and drawings. In particular, it relates to a document filing apparatus and a document filing method that recognize the characters written in documents and drawings and perform full-text search, with an arbitrary input keyword, of the documents and drawings stored together with their images.

[0002]

[Prior Art] Conventionally, to electronically store, search, and display document images, a method has been used in which keyword information is manually added to each document image before it is stored. To save the labor of manual keyword entry, another method uses a system with a character recognition function to recognize the characters in a document image and stores the relevant keywords or the full text together with the document image. In the latter method, misrecognition occurs because character recognition performance is imperfect. As a result, "search noise" occurs, in which a character string different from the input keyword is displayed as a search result. "Search omission" also occurs, in which characters in the document image are identical to the input keyword but are not displayed as a search result because of misrecognition.

[0003] To improve search accuracy, search noise and search omission must be kept to a minimum. Methods for reducing them at search time include "holding multiple candidates for each character recognition result" and "making the search method fuzzy to compensate for imperfect character recognition performance". As an example of the former, there is a method that holds the character recognition results in a lattice structure and finds the correct characters by traversing the lattice and searching the character codes ("Development of a Full-Text Searchable Document Image Database System", 8th Digital Library Workshop, University of Library and Information Science, October 23, 1996). This is referred to below as prior art 1.

[0004] Prior art 1 is described with reference to FIGS. 27 to 29. In FIG. 27, 5 is input means for inputting a document image, 52 is character recognition means, 55 is search means, 56 is a recognition dictionary, 51 is control means that controls the execution of character recognition, the input of document images, and searches using an input keyword, 4 is display means for displaying the results of a search with the input keyword, and 57 is a search data storage unit. FIG. 28 is an example of a document image, FIG. 29(a) is the character image of FIG. 28, and 60 to 65 indicate character segmentation candidate points. FIG. 29(b) shows the result of performing character recognition on combinations of the character segmentation candidate points in FIG. 29(a).

[0005] In prior art 1, during character recognition the character recognition means 52 takes the positions sandwiched between black-pixel connected components as character segmentation candidate points (candidate points 60 to 65 in FIG. 29(a)). It performs character recognition on the image between each pair of candidate points, decides from the similarity of the recognition result candidate characters whether to store each character, and retains or rejects the character candidates accordingly. FIG. 29(b) is an example of the search data created by the character recognition means 52. For example, the character recognition result for the image between candidate points 61 and 62 in FIG. 29(a) is "す" in FIG. 29(b). The search means 55 searches for the character string while traversing the lattice-structured search data shown in FIG. 29(b) and determines whether it matches the input keyword. Prior art 1 seeks to improve search performance by giving the search data a lattice structure, so that more correct characters are included despite character segmentation errors and the effective recognition rate improves.

[0006] As a method of "making the search method fuzzy to compensate for imperfect character recognition performance", there is, for example, the method described in JP-A-8-272813. It calculates the degree of match between the input keyword and the recognition result as

m = (number of matched characters / number of characters in the input keyword) * 100 (%) ... Formula (1)

and outputs an entry as a search result even when the recognition result candidate characters do not contain every search character.

[0007] JP-A-8-272813 is described below as prior art 2. FIG. 30 shows the configuration of JP-A-8-272813. In FIG. 30, 5 is input means for inputting a document image, 72 is character recognition means for recognizing characters, 76 is a recognition dictionary storing the standard pattern of each character used in character recognition, 75 is search means, 71 is control means that controls the execution of character recognition, the input of document images, and searches using an input keyword, 77 is a search data storage unit that stores images and recognition result data, and 4 is display means for displaying search results.

[0008] First, the data storage method is described. The character recognition means 72 segments and recognizes the characters in the document image input from the input means 5, and outputs up to four recognition result candidate characters per character image to the control means 71. For each character, the control means 71 stores the character image and the recognition result candidate characters, for example four of them, in the search data storage unit 77.

[0009] Next, the search method is described. FIG. 31 shows part of the search data storage unit 77. The arrows indicate where the character recognition results match the input keyword when the input keyword for the search is "内部処理統合型" ("internal processing integrated type"). The search means 75 collates the keyword against all candidate characters down to the fourth rank. Suppose that an entry is treated as a search result candidate when m in Formula (1) is at or above some threshold, for example 60 (%). In FIG. 31, six of the seven characters of the input keyword are matched, so m = 6/7 * 100 = 85.7 (%) and the entry becomes a search result candidate.
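The calculation of Formula (1) against ranked candidate lists can be illustrated with a minimal Python sketch. The function name `match_rate` and the stored candidate lists are assumptions made for illustration only; the actual contents of FIG. 31 are not reproduced here, and the data below is invented so that six of the seven keyword characters are found.

```python
def match_rate(keyword, candidates):
    """Formula (1): matched characters / keyword length * 100 (%).

    candidates[i] is the list of candidate characters (up to four)
    stored for the i-th character image.
    """
    hits = sum(1 for ch, cands in zip(keyword, candidates) if ch in cands)
    return hits / len(keyword) * 100


# Hypothetical stored data: the 4th character image has no correct candidate.
keyword = "内部処理統合型"
stored = [["内", "肉"], ["部"], ["処", "拠"], ["埋", "狸"], ["統"], ["合", "台"], ["型"]]

m = match_rate(keyword, stored)
is_candidate = m >= 60  # threshold of 60 (%) taken from the example in the text
print(f"m = {m:.1f}%, candidate: {is_candidate}")
```

With this data the entry is accepted even though one character image has no correct candidate, which is the behavior the paragraph above describes.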

[0010]

[Problems to Be Solved by the Invention] In prior art 1, character segmentation candidate points are determined at the breaks between connected components of black pixels. Therefore, when adjacent characters touch and run together, as with "/W" in FIG. 20, each character cannot be segmented individually. The characters are consequently misrecognized, which causes search omission. Furthermore, even when character segmentation succeeds, search omission occurs in the same way if character recognition fails to output the correct result.

[0011] In prior art 2, there is the problem that, whatever characters appear in the portion that fails to match the input keyword, the same degree of match is calculated as long as the matching portion is the same. For example, for the input keyword "日本人", the character strings "日本入", "日本語", "日本国", "日本の", and "日本は" all yield m = 2/3 * 100 = 67% (from Formula (1)). They therefore have the same degree of match, and all are output and displayed as search results.
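The tie described above can be reproduced with a short Python sketch. The helper `positional_match_rate` is a hypothetical name, and comparing characters position by position is an assumption about how the match is counted.

```python
def positional_match_rate(keyword: str, text: str) -> float:
    """Formula (1) with characters compared position by position."""
    hits = sum(1 for k, t in zip(keyword, text) if k == t)
    return hits / len(keyword) * 100


keyword = "日本人"
strings = ["日本入", "日本語", "日本国", "日本の", "日本は"]

# Every string shares only the first two characters with the keyword.
rates = {s: round(positional_match_rate(keyword, s)) for s in strings}
for s, r in rates.items():
    print(s, f"{r}%")
```

All five strings receive the same score, so a ranking based on Formula (1) alone cannot place the misrecognized "日本入" ahead of the unrelated strings.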

[0012] Here, suppose that "日本入" is a case in which "入" was misrecognized and the text is actually "日本人". Because its degree of match equals that of "日本語", "日本国", "日本の", "日本は", and so on, "日本入" is buried among them when results are displayed in descending order of degree of match. The user must then look for the desired result among the search noise shown by the display means 4. The smaller the threshold that permits mismatches, the more search noise is output, so the document the user actually wants is buried in search noise and the system is hard to use. Conversely, raising the threshold increases search omission. The present invention has been made to solve these problems.

[0013]

[Means for Solving the Problems] The document filing apparatus of claim 1 comprises: input means to which a document image is input; a recognition dictionary in which the standard pattern of each character is stored in advance; character recognition means that segments characters from the document image input by the input means, recognizes the segmented characters with reference to the recognition dictionary, and creates character codes; feature creation means that creates a feature for each character recognized by the character recognition means; a search data storage unit that stores the character codes created by the character recognition means and the features created by the feature creation means; a feature dictionary that holds the features of the standard patterns in advance; search feature creation means that obtains, from the feature dictionary, the feature of each character of a search input keyword entered at search time; search means that searches by collating the input keyword against the data in the search data storage unit and, under a predetermined condition, processes the result with reference to the output of the search feature creation means and outputs it; and display means that displays the search result of the search means.

[0014] In the document filing apparatus of claim 2, the feature creation means is configured to create, for each character rectangle segmented when the character recognition means performs character recognition, four directional component features (vertical, horizontal, rising to the right, and falling to the right) of the character's contour.

[0015] In the document filing apparatus of claim 3, the search means is configured as follows. In collating the input keyword against the data in the search data storage unit, it calculates the distance between character codes for portions where the input keyword and the character codes in the search data storage unit match. For portions where the character codes do not match, it collates the feature of the character in the search data storage unit against the feature in the feature dictionary and calculates their distance. It then determines the search result on the basis of the character code distance and the feature distance.

[0016] In the document filing apparatus of claim 4, the search means is configured so that, in collating the input keyword against the search data storage unit, when the proportion of characters matching the input keyword is at or above a predetermined value, it collates, for the portions where the character codes do not match, the feature of the character in the search data storage unit against the feature in the feature dictionary and calculates their degree of match.

[0017] In the document filing apparatus of claim 5, the feature creation means is configured to test the character codes recognized by the character recognition means against a predetermined criterion. For a character whose recognition result is judged to be correct, no feature is created and only the character code output by the character recognition means is stored. For a character that cannot be judged correct, both the character code output by the character recognition means and the feature created by the feature creation means are stored.

[0018] In the document filing apparatus of claim 6, the search means is configured so that, for portions of the search data in the search data storage unit that hold only a character code, it determines a match from the distance based on the character code alone, and for portions that hold both a character code and a feature, it calculates the degree of match from the character code and the feature.

[0019] In the document filing apparatus of claim 7, the character recognition means determines whether the document is written vertically or horizontally and stores the result in the search data storage unit. The search feature creation means is configured so that, when the input keyword whose features are to be collated and the character string in the search data storage unit differ in number of characters, it re-creates the features according to a predetermined criterion on the basis of the information on whether the data in the search data storage unit is written vertically or horizontally.

[0020] In the document filing apparatus of claim 8, the search means is configured so that, when the input keyword and the character string in the search data storage unit differ in number of characters, it collates the features of the input keyword against those of the corresponding characters in the search data storage unit by dynamic programming.

[0021] In the document filing apparatus of claim 9, the character recognition means determines whether the document is written vertically or horizontally and stores the result in the search data storage unit. The feature creation means has a feature creation method for each of vertical and horizontal writing and is configured to create features using the method that corresponds to the result of the determination by the character recognition means.

[0022] In the document filing apparatus of claim 10, the feature creation means has a plurality of different feature creation methods and is configured to select the corresponding method according to the type of characters in the input keyword.

[0023] In the document filing apparatus of claim 11, the feature creation means is configured so that, when the input keyword consists of alphabetic characters or symbols, it creates an integrated feature by partially overlapping the features of adjacent characters of the input keyword.

[0024] In the document filing apparatus of claim 12, the character recognition means performs character segmentation during character recognition and stores rectangle information for each character in the search data storage unit. The apparatus further comprises feature collation determination means that determines whether to collate features from the rectangle shape of each character of the input keyword output by the search feature creation means, the character rectangle information obtained from the search data storage unit, and the number of characters of the input keyword that are subject to collation. When it determines not to collate features, it regards the character string in the search data storage unit as not matching the input keyword.

[0025] The document filing method of claim 13 comprises: an input step of inputting a document image; a character recognition step of segmenting characters from the document image input in the input step, recognizing the segmented characters with reference to a recognition dictionary in which the standard pattern of each character is stored in advance, and creating character codes; a feature creation step of creating a feature for each character recognized in the character recognition step; a search data step of storing, in a search data storage unit, the character codes created in the character recognition step and the features created in the feature creation step; a search feature creation step of obtaining the feature of each character of a search input keyword entered at search time from a feature dictionary in which the features of the standard patterns are held in advance; a search step of searching by collating the input keyword against the data in the search data storage unit and, under a predetermined condition, processing the result with reference to the result of the search feature creation step and outputting it; and a display step of displaying the search result of the search step.

[0026]

[Embodiments of the Invention] Embodiment 1. Embodiment 1 of the present invention is described below. First, the document registration method is described with reference to FIGS. 1 to 7. FIG. 1 is a block diagram of Embodiment 1 of the present invention. In FIG. 1, 5 is input means that, at document registration, digitizes the image of a paper document by photoelectric conversion using a scanner, or inputs an already photoelectrically converted image via a network or the like; 9 is a recognition dictionary used for character recognition; 2 is character recognition means that extracts characters from the image supplied by the input means 5, determines whether the writing is vertical or horizontal, recognizes the extracted characters with reference to the recognition dictionary 9, and outputs character codes; and 3 is feature creation means that creates a feature for each character rectangle on which the character recognition means 2 has performed character recognition.

[0027] 10 is a search data storage unit that stores the character codes, features, and vertical/horizontal writing type created by the character recognition means 2 and the feature creation means 3; 4 is display means for displaying search results and document images; 6 is search means that, at search time, searches the search data storage unit 10 for the character portion of an image that corresponds to the input keyword entered by the user; 7 is feature collation determination means that determines whether to collate the features of the input keyword against the character data in the search data storage unit 10; 11 is a feature dictionary that holds the features of the standard patterns in advance; 8 is search feature creation means that reads the feature of each character of the input keyword from the feature dictionary 11 and processes the features read; 12 is document image storage means that stores the digitized document image from the input means 5; and 1 is control means that controls each of the above means and manages document registration and search processing.

[0028] FIG. 2(a) is an example of a document image, and FIG. 2(b) is the result of character recognition on the document image of FIG. 2(a). FIG. 3 illustrates the region division method used when creating features from the per-character segmentation results of character recognition; FIG. 4 shows examples of the masks used to create the directional component features; FIG. 5 shows an example of the data stored in the search data storage unit 10; FIG. 6 is a flowchart of document registration; and FIG. 7 is a flowchart of search data creation (the processing of step S102 in FIG. 6). First, the document registration method is described on the basis of the flowcharts of FIGS. 6 and 7.

[0029] In step S101 of FIG. 6, a document image is input through the input means 5. The input means 5 may be realized by digitizing a paper document by photoelectric conversion with a scanner, or by inputting an already photoelectrically converted image via a network or the like. FIG. 2(a) shows an example of an input document image. Here, the image input by the input means 5 is a binary image in which each pixel takes the value 1 (black) or 0 (white). Next, in step S102, the data to be stored in the search data storage unit 10 is created: the control means 1 passes the input image to the character recognition means 2 and starts character recognition. Processing then proceeds to step S103, where the character codes, features, and vertical/horizontal writing type created in step S102 are stored in the search data storage unit 10.

[0030] Next, the details of search data creation in step S102 by the character recognition means 2 and the feature creation means 3 are described following the processing flow of FIG. 7. First, in step S201 of FIG. 7, the character recognition means 2 extracts character regions from the input image. One extraction method, for example, links areas of contiguous black pixels in the document image, decides from the width and height of each black-pixel connected component whether it belongs to a character string, and merges adjacent character strings into a single region.

[0031] Next, in step S202 of FIG. 7, the character recognition means 2 determines for each region whether the writing is vertical or horizontal. A known method is used: for example, the width and height of each character string are obtained from the arrangement of character strings in the region, a region containing many vertically long character strings is judged to be vertical writing, and a region containing many horizontally long character strings is judged to be horizontal writing. Next, in step S203, the character recognition means 2 executes character recognition. Here, known techniques are used for character segmentation and character recognition, producing one character or multiple candidate characters per character image.
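The vertical/horizontal decision described for step S202 can be sketched in Python as a simple vote over the character strings of a region. The function name `writing_direction`, the (width, height) input format, and the tie-breaking rule (ties count as horizontal) are assumptions; the text only says that a known method is used.

```python
def writing_direction(line_boxes):
    """Classify a region from the bounding boxes of its character strings.

    line_boxes: list of (width, height) tuples, one per character string.
    Returns "vertical" when vertically long strings outnumber horizontally
    long ones, otherwise "horizontal" (assumed tie-breaking rule).
    """
    tall = sum(1 for w, h in line_boxes if h > w)
    wide = sum(1 for w, h in line_boxes if w > h)
    return "vertical" if tall > wide else "horizontal"


print(writing_direction([(200, 20), (180, 22), (15, 90)]))
print(writing_direction([(20, 300), (22, 280)]))
```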

[0032] Character segmentation is performed, for example, by scanning each character string image in the regions determined in S201 in the vertical and horizontal directions, obtaining the marginal distribution of black pixel counts, and dividing the image into single-character images using positions with few black pixels as segmentation candidate points. Character recognition is then performed on each single-character image, for example, by counting the black pixels in each small region of an 8×8 grid, computing for each standard pattern the sum over all dimensions of the absolute differences, and extracting the character whose standard pattern gives a small sum, which is output as the recognition result.
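The segmentation and recognition steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it cuts a horizontally written line using a projection in one direction only, treats columns with at most `max_black` black pixels as cut points, and returns only the single best character. The names `split_characters`, `grid_counts`, and `recognize`, and the dictionary format, are hypothetical.

```python
def split_characters(line_img, max_black=0):
    """Return (start, end) column spans of characters in a text-line image.

    line_img is a list of rows of 0/1 pixels. Columns whose black-pixel
    count is at most max_black are treated as gaps between characters.
    """
    width = len(line_img[0])
    profile = [sum(row[x] for row in line_img) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(profile):
        if count > max_black and start is None:
            start = x                      # a character begins
        elif count <= max_black and start is not None:
            spans.append((start, x))       # a character ends
            start = None
    if start is not None:
        spans.append((start, width))
    return spans


def grid_counts(char_img, n=8):
    """Count black pixels in each cell of an n x n grid over the image."""
    h, w = len(char_img), len(char_img[0])
    counts = [0] * (n * n)
    for y in range(h):
        for x in range(w):
            counts[(y * n // h) * n + (x * n // w)] += char_img[y][x]
    return counts


def recognize(char_img, dictionary, n=8):
    """Pick the character whose standard pattern has the smallest
    sum of absolute differences from the image's grid counts."""
    feat = grid_counts(char_img, n)
    return min(dictionary,
               key=lambda ch: sum(abs(a - b) for a, b in zip(feat, dictionary[ch])))
```

A recognizer that keeps several candidates per character image, as the text allows, could sort the dictionary by the same distance and keep the first few entries instead of taking the minimum.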

[0033] Next, in step S204 of FIG. 7, the feature creation means 3 creates a feature from each character. Here, as shown in FIG. 3, the rectangle of each segmented character is virtually divided into eight regions, and the four directional component features of the character image's edges (horizontal, vertical, rising to the right, and falling to the right) are extracted for each region. To create the four directional component features, the masks shown in FIG. 4 are scanned over each of the eight regions shown in FIG. 3, and the bitwise AND of the image and the mask is taken. When the result is identical to the mask, the directional component of that mask is incremented by 1. FIG. 5 shows an example of features created in this way. FIG. 5 shows the character codes created through the vertical/horizontal determination and recognition, together with the features of each character. In FIG. 5, "horizontal" is the horizontal component feature, "vertical" is the vertical component feature, "upper right" is the rising-to-the-right component feature, and "lower right" is the falling-to-the-right component feature.
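The mask-scanning procedure can be sketched in Python as below. The masks of FIG. 4 and the region layout of FIG. 3 are not reproduced in this text, so the 2×2 masks in `MASKS` and the 4-row by 2-column division are assumptions chosen only to make the sketch concrete; the sketch also scans the binary image directly rather than a separately extracted edge image.

```python
# Hypothetical 2x2 direction masks; the real ones are those of FIG. 4.
MASKS = {
    "horizontal": ((1, 1), (0, 0)),
    "vertical":   ((1, 0), (1, 0)),
    "right_up":   ((0, 1), (1, 0)),
    "right_down": ((1, 0), (0, 1)),
}


def direction_features(img, rows=4, cols=2):
    """Count mask matches per region of a binary character image.

    img is a list of rows of 0/1 pixels. The rectangle is divided into
    rows x cols regions (8 by default, an assumed layout). Returns
    {(region_row, region_col): {direction: count}}.
    """
    h, w = len(img), len(img[0])
    feats = {(r, c): dict.fromkeys(MASKS, 0)
             for r in range(rows) for c in range(cols)}
    for y in range(h - 1):
        for x in range(w - 1):
            region = (y * rows // h, x * cols // w)
            for name, mask in MASKS.items():
                # Increment when (window AND mask) is identical to the mask.
                if all((img[y + dy][x + dx] & mask[dy][dx]) == mask[dy][dx]
                       for dy in (0, 1) for dx in (0, 1)):
                    feats[region][name] += 1
    return feats


stroke = [[1, 1, 1, 1],
          [0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
features = direction_features(stroke)
print(features[(0, 0)], features[(0, 1)])
```

For the single horizontal stroke in `stroke`, only the horizontal component is incremented; the other three directions stay at zero in every region.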

【００３４】[0034]

Next, as described above, the process proceeds to step S103 in FIG. 6, and the control means 1 stores in the search data storage unit 10 the character codes, the features, and the writing direction (vertical or horizontal) created by the character recognizing means 2 and the feature creating means 3, as shown in FIG. 5.

【００３５】[0035]

Next, the operation at search time will be described with reference to FIG. 5 and FIGS. 8 to 10. FIG. 8 is a flowchart of the search, and FIGS. 9 and 10 are diagrams explaining the search operation. First, the search operation is described on the basis of the flowchart of FIG. 8. In step S301 of FIG. 8, the search means 6 compares the keyword entered by the user with the character code data in the search data storage unit 10. The search means 6 searches the search data storage unit 10 and, when a character matching the input keyword is found, holds in a buffer a number indicating the storage position of that character.

【００３６】[0036]

In FIG. 9, reference numeral 21 denotes the numbers indicating the positions of the characters in the search data. The keyword is first compared with data number 1 of FIG. 5. In FIG. 9, reference numeral 20 denotes the character numbers of the characters at which the input keyword and the search data match. In comparing the search data of data number 1 with the input keyword, the character number 20 of the character whose character code matches is stored in a buffer (not shown) for each character of the input keyword. The keyword character 「文」 is associated with character number 1, 「認」 with 3, and 「識」 with 4. FIG. 10 shows the comparison with data number 2 of FIG. 5; reference numeral 24 in FIG. 10 denotes the character numbers of the characters at which the search data of data number 2 and the input keyword match.

【００３７】[0037]

Next, in step S302, the search means 6 calculates search candidate areas. Here, the character numbers of the characters that match the input keyword are tested to decide whether the area is to be a candidate area. An area is taken as a candidate area when the characters matching the search data account for 30% or more of all characters of the input keyword, the character numbers are arranged in the order in which the characters appear in the input keyword, and the matched character numbers are close to one another. In the example of FIG. 9, 20 is a candidate area, and in the example of FIG. 10, 24 is a candidate area.
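
The three conditions of step S302 can be sketched as a single predicate. The text does not say how close the matched character numbers must be, so the `max_gap` parameter is a hypothetical choice; the function name `is_candidate_area` and the use of `None` for unmatched keyword characters are likewise assumptions.

```python
def is_candidate_area(matched_positions, min_ratio=0.30, max_gap=2):
    """Decide whether a run of code matches forms a candidate area.

    matched_positions[k] is the character number in the search data that
    matched keyword character k, or None when no code matched.
    max_gap is a hypothetical proximity limit (not given in the text).
    """
    hits = [p for p in matched_positions if p is not None]
    if not hits:
        return False
    # Condition 1: matched characters are at least 30% of the keyword.
    if len(hits) / len(matched_positions) < min_ratio:
        return False
    pairs = list(zip(hits, hits[1:]))
    # Condition 2: character numbers follow the keyword's order.
    in_order = all(a < b for a, b in pairs)
    # Condition 3: matched character numbers are close to one another.
    close = all(b - a <= max_gap for a, b in pairs)
    return in_order and close


# Usage: the FIG. 9 example, where keyword characters 1, 3 and 4 match.
fig9 = is_candidate_area([1, None, 3, 4])
```

With these settings a keyword in which only one of three characters matches (1/3 = 33.3%) still qualifies, while one of four (25%) does not.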

【００３８】[0038]

Next, in step S303, the search means 6 determines whether to compare the features of the search data with those of the input keyword. The result 20 in FIG. 9 satisfies the above conditions, so the features of the unmatched keyword character 「字」 and the search data character 「宇」 are compared. For the keyword character 「字」, the search feature creating means 8 reads the features from the feature dictionary 11 and loads them into a buffer (not shown). For 「宇」, the search means 6 likewise loads the stored search features into a buffer (not shown). Examples of the loaded buffers are shown at 22 and 23 in FIG. 9.

【００３９】[0039]

When the determination in step S303 is that the features of the search data and the input keyword are to be compared, the process proceeds to step S304, and the search means 6 calculates the distance between the features of the search data and those of the input keyword. The distance between the features is calculated by the following equation.

【００４０】[0040]

【数１】 (Equation 1)

【００４１】[0041]

Here, Fdic is a feature value from the dictionary, Fimg is a feature value of the search data, I is the number of direction components, and J is the number of features for each direction component; here I = 4 and J = 8, with 1 ≤ i ≤ I and 1 ≤ j ≤ J. Calculating D for the example of FIG. 9 gives D1[dic,img] = 12. For the example shown in FIG. 10, D2[dic,img] = 49.
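
The equation itself is an image that is not reproduced in this text. Based on the definitions above and on the later remark that Embodiment 1 uses a method of taking differences, one plausible form is D[dic,img] = Σ_i Σ_j |Fdic(i,j) − Fimg(i,j)|. The sketch below implements that assumed form; the function name `feature_distance` and the nested-list layout (I rows of J values) are hypothetical.

```python
def feature_distance(f_dic, f_img):
    """Sum of absolute differences over I direction components and
    J regions (assumed form of the unreproduced equation)."""
    return sum(
        abs(d - m)
        for row_d, row_m in zip(f_dic, f_img)
        for d, m in zip(row_d, row_m)
    )


# Usage with I = 4 direction components and J = 8 regions.
f_dic = [[0] * 8 for _ in range(4)]
f_img = [[0] * 8 for _ in range(4)]
f_img[0][0] = 5
f_img[3][7] = 7
d = feature_distance(f_dic, f_img)
```

If the original equation uses a different norm (for example squared differences), only the inner expression would change.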

【００４２】[0042]

Next, in step S305, the search means 6 decides whether the area is to be a keyword candidate according to the overall distance between the input keyword and the search data. The distance for characters whose character code in the search data matches the keyword character is set to 0, and the overall distance is calculated as

Dist = ΣD / (number of characters in the input keyword) ... Equation (3)

An area is output as a candidate when this distance is at or below a predetermined value A. In the example of FIG. 9, Dist1 = 12/4 = 3, and in the example of FIG. 10, Dist2 = 49/4 ≈ 12. If, for example, areas are rejected with A = 10, the example of FIG. 10 is rejected from the candidates. Even without threshold rejection, 「文の認識」, whose shape differs from the input keyword, has a larger distance than 「文宇認識」, which is close to the input keyword. When the candidates are sorted and displayed in ascending order of distance, 「文宇認識」 is therefore displayed nearer the correct candidates than 「文の認識」, which reduces the user's effort in finding the correct result among the candidates and improves usability. When the determination in step S303 is that the features of the search data and the input keyword are not to be compared, steps S304 and S305 are skipped and the processing ends.
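
The arithmetic of Equation (3), the threshold A, and the ascending sort can be checked with a short sketch. The function names `overall_distance` and `rank_candidates` and the dictionary layout are assumptions made for the example; note that 49/4 is 12.25, which the text reports as 12.

```python
def overall_distance(char_distances, keyword_length):
    """Equation (3): sum of per-character feature distances divided by
    the keyword length. Code-matched characters contribute 0."""
    return sum(char_distances) / keyword_length


def rank_candidates(candidates, keyword_length, threshold=None):
    """Return candidate strings in ascending order of overall distance,
    dropping those above `threshold` (the value A) when it is given.

    `candidates` maps a candidate string to the list of feature
    distances of its unmatched characters.
    """
    scored = [
        (overall_distance(dists, keyword_length), text)
        for text, dists in candidates.items()
    ]
    if threshold is not None:
        scored = [item for item in scored if item[0] <= threshold]
    return [text for _, text in sorted(scored)]


# Usage: the two examples of FIG. 9 and FIG. 10 for a 4-character keyword.
candidates = {"文の認識": [49], "文宇認識": [12]}
ranked = rank_candidates(candidates, 4)
kept = rank_candidates(candidates, 4, threshold=10)
```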

【００４３】[0043]

In Embodiment 1, the features created are four-direction component features, but other features, such as mesh features or histogram features, may be used instead. The features created by the feature creating means 3 may also be the same features that the character recognizing means 2 uses for character recognition. The method of calculating the distance between features and the method of calculating the distance for the input keyword as a whole are not limited to those described above. Likewise, the value of A in Embodiment 1 is not limited to the value given.

【００４４】[0044]

In Embodiment 1, features are created and stored for all recognition results, but this is not a limitation. For example, when the evaluation value of character recognition is very high and the recognition result can be judged to be certainly correct, storing the feature values for that character code can be omitted, which reduces the capacity needed to store the search data. For example, document registration is performed according to the flowchart shown in FIG. 11, and in step S205 the feature creating means 3 creates features for characters whose recognition similarity is at or below a fixed value. FIG. 12 shows an example of the result; in FIG. 12, features are created for the characters 「宇」, 「の」, and 「識」. At search time, for character codes in the search data storage unit 10 that have no features, the search means 6 compares character codes only, using Equation (4); for characters that have features, it calculates using Equation (4) and Equation (2).

【００４５】[0045]

【数２】 (Equation 2)

【００４６】[0046]

The distance over the entire candidate area is calculated as

Dist = (ΣD + ΣC) / (number of characters in the input keyword) ... Equation (3)′

and whether the area matches the input keyword is determined from this distance.

【００４７】[0047]

As described above, in Embodiment 1, even when the character recognition result does not match because of a recognition error, the features of the unmatched characters are compared and used in the search, so that correct candidates and incorrect candidates can be separated according to their similarity.

【００４８】[0048]

Embodiment 2

Next, a search method for the case where the number of characters differs between the input keyword and the search data, for example because of a character segmentation error, will be described with reference to FIGS. 13 to 17. Consider an example in which, as shown in FIG. 13(a), a segmentation error in the character recognizing means 2 causes "J" and "E" to be cut out as a single character, the search data shown in FIG. 13(b) is output, and a search is performed with the input keyword "REJECT".

【００４９】[0049]

First, in step S301 of FIG. 8, the input keyword is compared with the search data by character code. Here "R", "E", "C", and "T" match between the input keyword and the search data. In step S302, candidate areas are calculated. Since "R", "E", "C", and "T" appear in the correct arrangement and order, the area is taken as a search candidate area. Next, step S303 determines whether to compare features. Here the number of matched characters is 4/6 = 66.7% of the input keyword, which is 30% or more, so the features are compared.

【００５０】[0050]

An example of comparing 「作」 in the search data with "JE" in the input keyword will be described. The search feature creating means 8 reads the features of "J" and "E" from the feature dictionary 11 into a buffer (not shown). FIG. 14 shows the features of "J" and "E". In FIG. 14, each part enclosed in [ ] is the feature for one direction component, arranged in the order of the region numbers shown in FIG. 17. Next, the features of 「作」 are compared with the features of "JE"; because the numbers of characters differ, the numbers of features also differ, so the difference-based method of Embodiment 1 cannot be used. In this case, the comparison is performed by the well-known dynamic programming method (DP matching). Since the search data here is written horizontally, the upper and lower regions are merged as shown at 15 and 16 in FIG. 18, and the comparison proceeds in the direction of the arrows by dynamic programming. In FIG. 18, 15 is whichever of the input keyword features and the search data features is more numerous (in this example, the features of the keyword part "JE"), and 16 is, in this example, the features of the search data 「作」.

【００５１】[0051]

The search feature creating means 8 creates the features by concatenating the features of "J" and "E" horizontally for each component. Here, each of the horizontal, vertical, upper-right, and lower-right direction components is split into regions "1", "2", "3", "4" and regions "5", "6", "7", "8", and the corresponding direction components of "J" and "E" are concatenated. In FIG. 15, horizontal (upper) is the horizontal components of regions "1", "2", "3", and "4" of FIG. 17, taken from the feature dictionary and concatenated in the order "J", "E"; horizontal (lower) is the horizontal components of regions "5", "6", "7", and "8" of FIG. 17, concatenated in the same order. FIG. 16 shows the search data 「作」 of FIG. 13 rearranged by the search feature creating means 8 in the same manner as FIG. 15. The distance between component features is defined by the following equation.

【００５２】[0052]

【数３】 (Equation 3)

【００５３】[0053]

Here, FD is a character feature in the input keyword and FI is a character feature in the search data. In FIGS. 15 and 16, n = 1 denotes the horizontal (upper) component, n = 2 horizontal (lower), n = 3 vertical (upper), n = 4 vertical (lower), n = 5 upper right (upper), n = 6 upper right (lower), n = 7 lower right (upper), and n = 8 lower right (lower). Here 1 ≤ i ≤ I with I = 8, and 1 ≤ j ≤ J with J = 4. For example, when i = 2 and j = 1, FDni denotes the character features contained in the part 13 enclosed by the dotted line in FIG. 15, that is, regions "2" and "6" of FIG. 17, and FInj denotes the character features of the part 14 enclosed by the dotted line in FIG. 16, that is, regions "1" and "5" of FIG. 17. The distance between them is calculated using Equation (5). The following is then calculated.

【００５４】[0054]

【数４】 (Equation 4)

【００５５】[0055]

Using the result, the distance dist between the features is calculated as

dist[dic,img] = Ddp(I,J) / I ... Equation (7)

Calculating this for the example of FIGS. 15 and 16 gives dist[dic,img] = 25/8 ≈ 3. Next, in step S305, the distance between the input keyword and the entire candidate area in the search data is calculated. From Equation (3), Dist = 3/6 = 0.5. In this way, when the numbers of features differ, the features are re-created and DP matching is used, which makes the search possible.
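
Equations (5) and (6) are images that are not reproduced in this text, so the following is only a sketch of DP matching under stated assumptions: the local distance is taken to be the sum of absolute differences over the components n, and the recurrence is the common three-way form Ddp(i,j) = d(i,j) + min(Ddp(i-1,j), Ddp(i,j-1), Ddp(i-1,j-1)). The function name `dp_match` and the column-list data layout are hypothetical.

```python
def dp_match(long_cols, short_cols):
    """DP matching between two feature sequences of different length.

    long_cols has I columns and short_cols has J columns; each column is
    a list of component values (one per n). Returns Ddp(I, J) / I as in
    Equation (7). The recurrence is an assumed standard form.
    """
    n_i, n_j = len(long_cols), len(short_cols)
    inf = float("inf")

    def local(i, j):
        return sum(abs(a - b) for a, b in zip(long_cols[i], short_cols[j]))

    table = [[inf] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            cost = local(i, j)
            if i == 0 and j == 0:
                table[i][j] = cost
                continue
            best = min(
                table[i - 1][j] if i > 0 else inf,
                table[i][j - 1] if j > 0 else inf,
                table[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            table[i][j] = cost + best
    return table[n_i - 1][n_j - 1] / n_i


# Usage: a 4-column sequence that is a stretched copy of a 2-column one.
stretched = dp_match([[1], [1], [2], [2]], [[1], [2]])
shifted = dp_match([[0], [0]], [[3]])
```

For the "JE" example the longer sequence would have I = 8 columns and the search data 「作」 J = 4 columns, each column holding the eight components n = 1 to 8.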

【００５６】[0056]

In this example, because the search data is written horizontally, the two vertically adjacent regions are merged and the features of multiple characters are concatenated horizontally. When the search data is written vertically, the four horizontally adjacent regions are merged as shown at 17 and 18 in FIG. 19, and the comparison proceeds in the direction of the arrow (vertically) by dynamic programming. The search feature creating means 8 concatenates the features vertically. The distance between the features is calculated using the following equation

【００５７】[0057]

【数５】 (Equation 5)

【００５８】[0058]

together with Equation (6), Equation (7), and Equation (3). In Equation (5)′, n runs to 16 because the four direction components are compared for each of four regions.

【００５９】[0059]

In Embodiment 2, when the numbers of features of the input keyword and the search data differ, the comparison is performed by dynamic programming, but this is not a limitation. For example, the larger number of features may be reduced to match the smaller number, and the comparison may then be performed as in Embodiment 1.

【００６０】[0060]

Embodiment 3

Next, an example in which a search remains possible even when a character in the search data storage unit 10 has been split partway, by re-creating the character features, will be described with reference to FIG. 8 and FIGS. 20 to 24. For the character image of FIG. 20, the character recognizing means 2 recognizes the characters as shown in FIG. 21, and the feature creating means 3 likewise creates features as shown in FIG. 21. The case of searching the search data shown in FIG. 21 for the input keyword "S/W" is described here.

【００６１】[0061]

In step S301 of FIG. 8, the character codes of the input keyword and the search data are compared. Of the character codes in FIG. 21, only "S" matches a character code of the input keyword "S/W". Next, candidate areas are calculated in step S302 of FIG. 8, and in step S303 the search means 6 decides whether to compare features. The matched characters make up 1/3 = 33.3% of the input keyword, so the area is taken as a comparison candidate. Next, in step S304, the features of the unmatched character codes are compared. The features of the search data "ノN" in FIG. 21 and the features of "/W" loaded into the buffer from the feature dictionary 11 are compared as in Embodiment 1, giving D(dic,img) = 23. When the input keyword consists of alphabetic characters, adjacent characters often touch, so the features are re-created to account for this.

【００６２】[0062]

The re-creation method is described with reference to FIGS. 22 and 23. When the input keyword consists of alphabetic characters, the search feature creating means 8 re-creates the features to account for touching characters. Specifically, when the character following one of the characters shown in FIG. 22 is an alphabetic character, the features are re-created to account for contact. Here the input keyword contains "/", so the features of "/" and "W" are re-created. Standard patterns are held in the feature dictionary 11, and the features are created by virtually dividing the standard pattern as follows. The leftmost quarter of the "W" drawn with a solid line in FIG. 23 (region 36, between arrow 30 and arrow 31) is overlaid on the right quarter of the feature shown for "/" in FIG. 23 (37 in FIG. 23). The remaining image of "W" (the region between arrow 31 and arrow 32) is then divided into four equal parts by the dotted lines 33 to 35, and each direction component feature is re-created. FIG. 24 shows the re-created features. The re-created features are then compared again with the features of the search data; comparing the features of FIG. 21 with those of FIG. 24 gives D = 14.

【００６３】[0063]

The value is smaller than the distance before the features were re-created, which shows that the features are now closer to the shape in the image. As a result, the distance between an alphabetic input keyword and the search data becomes smaller, and search omissions are less likely to occur.

【００６４】[0064]

This embodiment has described an example in which standard patterns are held and the features are re-created by recalculation from them, but this is not a limitation. When the feature dictionary 11 is created, the regions used to create the features of each character's standard pattern may be subdivided and stored, and when the search feature creating means 8 loads the features of a character to be compared from the feature dictionary 11 into the buffer, the features may be re-created by merging adjacent components. For example, to support horizontal writing, the character is divided horizontally into 16 equal parts instead of 4, direction component features are created in each region, and the resulting standard-pattern features are stored in the feature dictionary 11. When the search feature creating means 8 creates features, for characters other than alphabetic characters and symbols it re-creates the features by merging each group of four adjacent regions. For alphabetic characters and symbols, it merges the left quarter of the regions into the character on the left and divides the remaining three quarters into four; that is, the 12/16 of the area is divided into four equal parts, so the features are merged 3/16 at a time (three adjacent features). This makes the feature creation described in Embodiment 3 possible.
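
The merging of 16 fine regions can be sketched for one direction component of one row. The text says only that the left quarter is "integrated" with the neighbouring character, so adding its counts to the neighbour's rightmost region is an assumption, as are the function names `merge_plain` and `merge_touching`.

```python
def merge_plain(fine):
    """Non-alphabetic characters: 16 fine regions -> 4 regions,
    each the sum of 4 adjacent fine regions."""
    return [sum(fine[k:k + 4]) for k in range(0, 16, 4)]


def merge_touching(left_fine, right_fine):
    """Alphabetic characters or symbols that may touch.

    The left quarter (4 of 16) of the right character is added to the
    rightmost region of the left character (assumed way of overlaying),
    and the remaining 12 fine regions are merged 3 at a time.
    """
    left = merge_plain(left_fine)
    left[3] += sum(right_fine[:4])
    right = [sum(right_fine[k:k + 3]) for k in range(4, 16, 3)]
    return left, right


# Usage with uniform toy counts.
left, right = merge_touching([1] * 16, [1] * 16)
```

Storing the finer 16-way features costs more dictionary space per character but lets both layouts be derived at load time without re-scanning the standard pattern.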

【００６５】[0065]

In addition, alphabetic characters are rarely used in vertically written text, so when the feature dictionary 11 is created the feature resolution may be varied between vertical and horizontal writing: features for horizontal writing are created with a finer division, and features for vertical writing are created with coarser features. This likewise reduces the capacity of the feature dictionary 11.

【００６６】[0066]

Embodiment 4

A method of shortening the processing time and suppressing search noise by determining, before the features are compared, whether an area can be a correct candidate will be described with reference to FIGS. 25 and 26. The comparison of the search data "REACT" shown in FIG. 25 with the input keyword "RESPECT" is described here. In FIG. 25, sx and sy are the x and y coordinates of the upper-left point of each character rectangle in the search data, w is the width of the rectangle, and h is the height of the rectangle.

【００６７】[0067]

First, in S301 and S302 of FIG. 8, the character codes are compared and the matching parts are recorded in a buffer (not shown). Here "R", "E", "C", and "T" match. Next, in S303 of FIG. 8, the search means 6 determines whether to compare features. The match with the input keyword is 4/7 = 57%, and the order and adjacency conditions are satisfied, so the area is taken as a candidate area. Because "A" in the search data and "SPE" in the input keyword do not match, the feature matching determination means 7 determines whether to compare the features of these characters. The feature matching determination means 7 reads w and h shown in FIG. 25 and obtains the shape of the character rectangle of "A". Here h/w = 1.0.

【００６８】[0068]

The search feature creating means 8 estimates rectangle information for each of the keyword characters "S", "P", and "E" from the table of FIG. 26. Here "S", "P", and "E" all belong to the "others" category 43. The feature matching determination means 7 then calculates the rectangle shape obtained when "S", "P", and "E" are concatenated. Since the character height in the search data is 60, the width of the concatenated rectangle lies between 60 × 0.7 × 3 = 126 and 60 × 1.2 × 3 = 216. The character width of the search data "A" is 60, whereas the concatenated width of "SPE" can range from 126 to 216. If, for example, a condition is added that an area is dropped from the candidates without feature comparison when the widths of the characters to be compared differ by a factor of two or more, then "A" and "SPE" are dropped from the candidates without being compared.

【００６９】[0069]

In this way, when the widths of the features to be compared differ by a fixed margin, the features are not compared and the area is regarded as not matching, which avoids comparisons that clearly cannot match. In this case the search means 6 decides in S303 not to compare features, S304 is likewise not executed, and in S305 the area is not made a candidate.
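
The width pre-check of Embodiment 4 can be sketched as follows. The 0.7 to 1.2 width-to-height range is taken from the worked numbers for the "others" category; the reading of the condition as "one width is at least twice the other" is an interpretation, and the function names `keyword_width_range` and `should_compare` are hypothetical.

```python
def keyword_width_range(n_chars, height, ratio_range=(0.7, 1.2)):
    """Estimated min and max width of n_chars concatenated keyword
    characters, given the character height in the search data."""
    low, high = ratio_range
    return height * low * n_chars, height * high * n_chars


def should_compare(data_width, n_chars, height, factor=2.0):
    """Return False when the widths clearly cannot match, i.e. when even
    the narrowest keyword estimate is at least `factor` times the data
    width, or the data width is at least `factor` times the widest one."""
    low, high = keyword_width_range(n_chars, height)
    if low >= factor * data_width:
        return False
    if data_width >= factor * high:
        return False
    return True


# Usage: "A" (width 60, height 60) against the three characters "SPE".
low, high = keyword_width_range(3, 60)
compare_a_spe = should_compare(60, 3, 60)
```

Because the check needs only rectangle sizes, it runs before any feature is loaded from the dictionary, which is where the saving in processing time comes from.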

【００７０】[0070]

In Embodiment 4, whether to compare the input keyword with the search data is decided from the rectangle width of the features, but this is not a limitation. For example, the comparison may be skipped when the numbers of characters whose features are to be compared in the input keyword and in the search data differ by two or more.

【００７１】[0071]

In Embodiment 4, when features are to be compared, the character rectangles of the input keyword and of the character recognition result are used to determine whether the comparison should be performed. This eliminates wasted comparisons and, as a result, shortens the processing time and improves search accuracy.

【００７２】[0072]

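[Effects of the Invention]

As described above, according to the present invention, in claims 1 to 3 and claim 13, character codes and features are stored, and at search time the search is performed from both the character codes and the features, so that an appropriate distance can be assigned when searching parts in which a character recognition error has occurred.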
In claims 1 to 3 and claim 13, character codes and features are stored, and a search is performed based on the character codes and features at the time of search. Distance can be given.

【００７３】[0073]

In claim 4, features are compared only in areas where the proportion of matching character codes is at or above a fixed value, which suppresses an increase in search noise and also shortens the processing time.

【００７４】[0074]

In claim 5, when the character recognition result recognized by the character recognition means is judged to be the correct character, no features are created and only the character code is stored; when it cannot be judged correct, the character code and the features created by the feature creating means are stored. This reduces the capacity required in the search data storage unit.

【００７５】[0075]

In claim 6, for parts of the search data that hold only character codes, a match is determined from the distance based on the character codes alone, and for parts that hold both character codes and features, the degree of match is calculated from the character codes and the features. This reduces search noise.

【００７６】[0076]

In claims 7 and 8, when the number of characters differs between the input keyword and the character string in the search data storage unit, the features are re-created according to a predetermined criterion, so that data misrecognized because of a character segmentation error can also be searched with an appropriate distance assigned.

【００７７】[0077]

In claim 9, features are created with the feature creation method that corresponds to the result of determining whether the text is written vertically or horizontally, so that a search that copes with character segmentation errors is possible for both vertical and horizontal writing.

【００７８】[0078]

In claim 10, the feature creation method corresponding to the type of characters in the input keyword is selected. In claim 11, when the input keyword consists of alphabetic characters or symbols, integrated features are created by partially overlapping the features of adjacent characters in the input keyword. A search is therefore possible even for misrecognition caused by contact between adjacent characters, which is common with alphabetic characters.

[0079] According to claim 12, rectangle information and character count information are used to avoid collating character strings that are clearly different, so search noise is reduced and processing time is shortened.
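A pre-check of this kind might look like the sketch below, written for horizontal text. The specific tests are assumptions, not the criteria of the embodiments: the keyword is described by an expected width-to-height ratio per character, stored characters by `(width, height)` rectangles, and `tolerance` and the character-count bound are arbitrary illustrative values.

```python
def should_compare(keyword_aspects, stored_rects, tolerance=0.5):
    """Decide whether feature collation is worth attempting (horizontal text).

    keyword_aspects: expected width/height ratio of each keyword character.
    stored_rects: (width, height) of each stored character rectangle.
    """
    if not keyword_aspects or not stored_rects:
        return False
    # Cut-out errors change the character count, but only within limits.
    if abs(len(keyword_aspects) - len(stored_rects)) > len(keyword_aspects):
        return False
    line_height = max(h for _, h in stored_rects)
    expected_width = sum(a * line_height for a in keyword_aspects)
    actual_width = sum(w for w, _ in stored_rects)
    # Skip collation when the overall widths are clearly incompatible.
    return abs(actual_width - expected_width) <= tolerance * expected_width
```

A stored string in which one character was split into two narrow rectangles still passes the check because the total width is unchanged, whereas a string several times wider than the keyword is rejected before any feature distance is computed.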

【００８０】[0080]

[Brief description of the drawings]

FIG. 1 is a block diagram showing Embodiment 1 of the present invention.

FIG. 2 is an explanatory diagram of a registration image in Embodiment 1.

FIG. 3 is an explanatory diagram of a character cut-out result and a feature creation area.

FIG. 4 is an explanatory diagram of a mask for creating a four-direction component feature.

FIG. 5 is an explanatory diagram of the contents of search data.

FIG. 6 is a flowchart of the registration process.

FIG. 7 is a flowchart of character recognition and feature creation.

FIG. 8 is a flowchart of the search.

FIG. 9 is a diagram explaining the search operation in Embodiment 1.

FIG. 10 is a diagram explaining the search operation in Embodiment 1.

FIG. 11 is a flowchart of a modification of the registration process.

FIG. 12 is an explanatory diagram of search data produced by the modified registration process.

FIG. 13 is an explanatory diagram of search data used in Embodiment 2.

FIG. 14 is an explanatory diagram of the feature dictionary for "J" and "E".

FIG. 15 is an explanatory diagram showing an example in which the features of "J" and "E" are re-created.

FIG. 16 is an explanatory diagram showing an example in which the feature of 「作」 is re-created.

FIG. 17 is an explanatory diagram showing area numbers.

FIG. 18 is an explanatory diagram showing the collation method for horizontal writing in Embodiment 2.

FIG. 19 is an explanatory diagram showing the collation method for vertical writing in Embodiment 2.

FIG. 20 is an explanatory diagram showing an example of a registered document in Embodiment 3.

FIG. 21 is an explanatory diagram showing the contents of search data in Embodiment 3.

FIG. 22 is an explanatory diagram showing characters whose features are re-created.

FIG. 23 is an explanatory diagram showing the feature re-creation method.

FIG. 24 is an explanatory diagram of the re-created feature dictionary.

FIG. 25 is an explanatory diagram showing the contents of search data in Embodiment 4.

FIG. 26 is an explanatory diagram showing the character code-shape determination table.

FIG. 27 is a block diagram of Prior Art 1.

FIG. 28 is an explanatory diagram showing a document image used in Prior Art 1.

FIG. 29 is an explanatory diagram showing an example of character cut-out candidate points and character storage in Prior Art 1.

FIG. 30 is a block diagram of Prior Art 2.

FIG. 31 is an explanatory diagram showing the image and recognition result database of Prior Art 2.

[Explanation of symbols]

1 control means, 2 character recognition means, 3 feature creation means, 4 display means, 5 input means, 6 search means, 7 feature collation determination means, 8 search feature creation means, 9 recognition dictionary, 10 search data storage unit, 11 feature dictionary, 12 document image storage means.

Continued from the front page. F-term (reference): 5B050 BA10 BA16 EA03 EA07 GA08 5B075 ND07 NK02 NK07 NK13 NK24 NK31 PP02 PP12 PP22 PR06 PR10 QM02 QM08 UU06

Claims

[Claims]

1. A document filing apparatus comprising: input means for inputting a document image; a recognition dictionary in which standard patterns of respective characters are stored in advance; character recognition means for cutting out characters from the document image input by the input means, recognizing the cut-out characters with reference to the recognition dictionary, and creating character codes; feature creation means for creating a feature for each character recognized by the character recognition means; a search data storage unit for storing the character codes and the features created by the feature creation means; a feature dictionary holding features of the standard patterns in advance; search feature creation means for obtaining, from the feature dictionary, the feature of each character of an input keyword entered at search time; search means for searching by collating the input keyword with the data in the search data storage unit and, when a predetermined condition is satisfied, processing with reference to the result of the search feature creation means; and display means for displaying a search result of the search means.

2. The document filing apparatus according to claim 1, wherein the feature is a four-direction component feature (the vertical, horizontal, upward-sloping, and downward-sloping components of the character outline) created within each character rectangle from which a character is cut out when the character recognition means performs character recognition.

3. The document filing apparatus according to claim 1, wherein, in collating the input keyword with the data in the search data storage unit, the search means calculates a distance between character codes for portions where the character codes of the input keyword and of the search data storage unit match, compares, for portions where the character codes do not match, the feature of the character in the search data storage unit with the feature in the feature dictionary to calculate a feature distance, and determines the search result from the calculated distances.

4. The document filing apparatus according to claim 3, wherein, in collating the input keyword with the search data storage unit, the search means, when the ratio of the number of characters matching the input keyword is equal to or greater than a predetermined value, compares the features of the characters in the search data storage unit whose character codes do not match with the features in the feature dictionary and calculates a degree of matching.

5. The document filing apparatus according to claim 1, wherein the feature creation means checks the character code recognized by the character recognition means against a predetermined criterion; for a character whose individual recognition result is judged to be correct, no feature is created and the character code output by the character recognition means is stored; and for a character that cannot be judged to be correct, the character code output by the character recognition means and the feature created by the feature creation means are stored.

6. The document filing apparatus according to claim 5, wherein the search means determines a match, for portions where the search data in the search data storage unit holds only a character code, from a distance based on the character code alone, and calculates the degree of matching, for portions that hold a character code and a feature, from the character code and the feature.

7. The document filing apparatus according to claim 1, wherein the character recognition means determines whether the document is written vertically or horizontally and stores the result in the search data storage unit, and the search feature creation means, when the number of characters of the input keyword differs from that of the character string in the search data storage unit during feature collation, re-creates the feature according to a predetermined criterion, based on the information on whether the data in the search data storage unit is written vertically or horizontally.

8. The document filing apparatus according to any one of claims 1 to 3, wherein, when the number of characters of the character string in the search data storage unit differs from that of the input keyword, the search means collates the features of the input keyword with the features of the characters in the search data storage unit by dynamic programming.

9. The document filing apparatus according to claim 1, wherein the character recognition means determines whether the document is written vertically or horizontally and stores the result in the search data storage unit, and the feature creation means has feature creation methods corresponding respectively to vertical writing and horizontal writing and creates the feature using the feature creation method that corresponds to the vertical or horizontal determination result of the character recognition means.

10. The document filing apparatus according to claim 1, wherein the feature creation means has a plurality of different feature creation methods and selects the corresponding feature creation method according to the type of characters of the input keyword.

11. The document filing apparatus according to claim 10, wherein, when the input keyword consists of alphabetic characters or symbols, the feature creation means creates an integrated feature by partially overlapping the features of adjacent characters constituting the input keyword.

12. The document filing apparatus according to claim 1, wherein the character recognition means performs character segmentation at the time of character recognition and stores rectangle information for each character in the search data storage unit, the apparatus further comprising feature collation determination means that determines whether or not to collate features from the rectangular shape of each character of the input keyword output by the search feature creation means, the character rectangle information obtained from the search data storage unit, and the number of characters to be collated among the characters of the input keyword, and that judges, when it determines not to collate, that the character string in the search data storage unit does not match the input keyword.

13. A document filing method comprising: an input step of inputting a document image; a character recognition step of cutting out characters from the document image input in the input step and recognizing the cut-out characters with reference to a recognition dictionary in which standard patterns of respective characters are stored in advance, to create character codes; a feature creation step of creating a feature for each character recognized in the character recognition step; a search data storing step of storing the character codes created in the character recognition step and the features created in the feature creation step in a search data storage unit; a search feature creation step of obtaining the feature of each character of an input keyword from a feature dictionary in which features are held; a search step of searching by collating the input keyword with the data in the search data storage unit, processing with reference to the result of the search feature creation step, and outputting a result; and a display step of displaying the search result of the search step on a display unit.