JP2011028503A

JP2011028503A - Image processor, image processing method, and program

Info

Publication number: JP2011028503A
Application number: JP2009173376A
Authority: JP
Inventors: Ryo Kosaka; 亮小坂
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2009-07-24
Filing date: 2009-07-24
Publication date: 2011-02-10

Abstract

[Problem] When metadata is assigned to objects in an input image document, managing every object individually increases the amount of metadata to be added, which enlarges the file size. It also increases the number of candidates that match a search, which lowers search efficiency.
[Solution] The apparatus comprises: a region dividing unit that divides an input document into regions; an attribute information adding unit that adds attribute information; a character recognition unit that obtains character information; a hierarchization level calculation unit that calculates the hierarchization level used when grouping objects; an object identifier generation unit that integrates the objects into groups having the same caption and generates an individual identifier for each group; and a metadata extraction unit that detects the caption accompanying an object in the input document, extracts metadata related to the object, and stores the object identifier and the metadata in a storage area in association with each other.
[Selected drawing] Figure 5

Description

The present invention relates to an information processing apparatus, an information processing system, and an information output control method that control how metadata is assigned so that objects can be searched for efficiently in a document image, as well as to a computer-readable storage medium storing a program for implementing the method, and to the program itself.

To make effective use of an input document image, when a character string adjacent to a non-text object in the document (for example, a photograph, drawing, line drawing, or table) is a caption (a character string describing the object), the caption is associated with the object as metadata. (Hereinafter, unless otherwise noted, "object" means a non-text object such as a photograph, drawing, line drawing, or table.) As a result, when an application uses the digitized document image, objects can be searched for using the metadata as search keywords.

When the caption adjacent to an object is a figure number (for example, "Fig. 1" or "Figure 1"), the same expression usually also appears in the body text of the document image. In this case, a link is generated automatically between the figure number and the identical expression in the body text, turning the document into hypertext. For example, when the caption adjacent to an object is "Fig. 1" and the body text contains the statement "Fig. 1 is AAA", the caption "Fig. 1" and the "Fig. 1" in the body text are the same expression, so a link is generated (Patent Document 1).
Separately, an input document image is analyzed to create a structured document that has a layout structure describing the geometric information of document elements (for example, text, drawings, photographs, and tables) and a logical structure describing the logical and semantic information of the document (for example, chapters, sections, and paragraphs). This makes it possible to group the text and figure/table regions contained in the document image and to tag them with the caption information described above so that it is specified in the document structure (Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. H10-228473
Patent Document 2: Japanese Patent Laid-Open No. 2003-288334

However, the prior art above does not detect the relationships between objects, even when the input image document contains many closely related objects that are classified by sub-captions (for example, "Fig. 1(a)" and "Fig. 1(b)"). Every object is therefore treated as independent, and each one is grouped and held with its own caption and sub-caption. As a result, keywords detected in the explanatory text of the image document are assigned to every object, which increases the file size. In addition, more candidates match a keyword search, so the desired result becomes harder to find and search efficiency drops.

To solve the above problems, the image processing apparatus comprises: a region dividing unit that divides an input document into regions; an attribute information adding unit that assigns attribute information to the divided regions; a character recognition unit that obtains character information from character regions; a hierarchization level calculation unit that calculates the hierarchization level used when grouping the objects; an object identifier generation unit that integrates the objects into groups having the same caption at that hierarchization level and generates an individual identifier for each group; and a metadata extraction unit that detects the caption accompanying an object in the input document, extracts metadata related to the object, and stores the object identifier and the metadata in a storage area in association with each other.

According to the present invention, objects are grouped with their relevance taken into account, in accordance with the file size of the input document image, user settings, and the like, so that the file size can be reduced and search efficiency improved.

FIG. 1: Example of an image processing system configuration
FIG. 2: Configuration diagram of the MFP
FIG. 3: Example of the data processing unit configuration
FIG. 4: Explanatory diagram of hierarchization levels and grouping
FIG. 5: Processing flow in the first embodiment
FIG. 6: Explanatory diagram of region division and attribute information assignment
FIG. 7: Example of hierarchization level calculation processing in the first embodiment
FIG. 8: Explanatory diagram of object hierarchical structure analysis
FIG. 9: Explanatory diagram of metadata extraction processing
FIG. 10: Explanatory diagram of image search
FIG. 11: Comparison of image search results and metadata for different hierarchization levels N: (a) N = 1, (b) N = 2, (c) N = 3
FIG. 12: Example of an operation screen
FIG. 13: Hierarchization level calculation processing in the second embodiment
FIG. 14: Example of the format: (a) N = 1, (b) N = 3

(Example 1)
The best mode for carrying out the present invention is described below with reference to the drawings.

Example 1 describes a method that, to enable efficient searching, groups related objects and associates each group of objects with metadata used to search for them.

FIG. 1 is a block diagram showing the configuration of the image processing system according to Example 1 of the present invention.

In FIG. 1, an MFP (Multi Function Peripheral) 100, a multifunction device that provides a plurality of functions (for example, copy, print, and transmission functions), is connected to a LAN 102 built in office A. A client PC 101, which receives data transmitted from the MFP 100 and uses the functions provided by the MFP 100, and a proxy server 103 are connected to the same LAN 102. The LAN 102 is connected to a network 104 via the proxy server 103. The client PC 101 can, for example, transmit print data to the MFP 100 so that the MFP 100 prints output based on that print data.

The configuration in FIG. 1 is only an example; a plurality of offices having the same components as office A may be connected to the network 104. The network 104 is the Internet, a LAN, a WAN, a telephone line, a dedicated digital line, an ATM or frame relay line, a communication satellite line, a cable television line, a wireless line for data broadcasting, or the like, or a so-called communication network realized by a combination of these; any network that can transmit and receive data will do. The client PC 101 and the proxy server 103 each have the standard components of a general-purpose computer (for example, a CPU, RAM, ROM, hard disk, external storage device, network I/F, display, keyboard, and mouse).

Next, the detailed configuration of the MFP 100 is described with reference to FIG. 2. FIG. 2 shows the detailed configuration of the MFP according to Example 1 of the present invention.

As shown in FIG. 2, the MFP 100 comprises a network I/F 204, a scanner unit 200, a printer unit 201, an operation unit 207, a display unit 208, and a controller unit 209. The controller unit 209 includes a data processing unit 202, a storage unit 203, a PDL processing unit 205, and a control unit 206. The flow of processing inside the MFP is described below.

The scanner unit 200, which includes an auto document feeder (ADF), illuminates the input document with a light source, forms the reflected image of the document on a solid-state image sensor through a lens, and obtains from the sensor a raster image read signal as image data at a predetermined density (for example, 600 DPI). The control unit 206 sends the image data obtained by the scanner unit 200 to the data processing unit 202.

Meanwhile, PDL data output from the client PC 101 is received by the PDL processing unit 205 via the network I/F 204. The PDL processing unit 205 renders the PDL data, and the control unit 206 sends the rendered PDL data to the data processing unit 202.

The data processing unit 202 then applies image processing to the received input signal so that it is suitable for output by the printer, and sends it to the printer unit 201 via the storage unit 203.

The transmission function via the network I/F 204 converts the image signal obtained from the scanner unit 200 into an image file in a compressed image file format such as TIFF or JPEG, or in a vector data file format such as PDF, and outputs it from the network I/F 204. The output image file is transmitted to the client PC 101 via the LAN 102, or transferred further via the network 104 to an external terminal on the network (for example, another MFP or client PC).

User instructions to the MFP 100 are given through the operation unit 207, which consists of the key operation unit and touch panel provided on the MFP 100, and the display unit 208; this series of operations is controlled by the control unit 206. The display unit 208 also displays the state of operation input and the image data being processed.

The storage unit 203 is realized by, for example, a large-capacity hard disk and constitutes a database that stores and manages image data read by the scanner unit 200 and PDL data processed by the PDL processing unit 205. In particular, in the present invention, image data can be managed in association with the region information obtained by dividing that image data into regions.

Next, the configuration of the data processing unit 202 in FIG. 2 and an outline of the processing of each component are described with reference to FIG. 3.

The data processing unit 202 comprises a region dividing unit 300, an attribute information adding unit 301, a character recognition unit 302, a hierarchization level calculation unit 303, an object identifier generation unit 304, and a metadata extraction unit 305. It performs region division, object grouping, metadata extraction, and related processing on the data received from the scanner unit 200 or the PDL processing unit 205, creates data with metadata attached, and sends it to the storage unit 203.

The region dividing unit 300 takes the image document received from the scanner unit 200 or the PDL processing unit 205 as input and divides it into regions.

The attribute information adding unit 301 assigns attribute information to each region produced by the region dividing unit 300. It first classifies each region as a character region or an object region according to whether the region contains characters. Character regions are given one of the attributes "chapter", "section", "body", "caption", "sub-caption", "header/footer", or "character part"; object regions are given one of the attributes "table", "figure", or "noise".

The character recognition unit 302 performs character recognition on character regions that have been given attributes such as "body", "caption", "sub-caption", "chapter", "section", "header/footer", or "character part", and associates the result with the corresponding region.

The hierarchization level calculation unit 303 determines the hierarchization level (N) according to the size of the data received from the scanner unit 200 or the PDL processing unit 205. The hierarchization level is used as the criterion for identifying a plurality of objects as one group when object identifiers are generated, as described later. A low hierarchization level classifies objects at a finer level of detail, so little grouping takes place. A high hierarchization level classifies objects into more abstract units, so more objects are grouped together.

The object identifier generation unit 304 groups related objects and generates and assigns an object identifier that identifies each group. For example, when objects are subdivided by sub-captions as in FIG. 4 and the hierarchization level is at its minimum (N = 1), the comparison is made at the sub-caption level (Fig1(a) and Fig1(b)), the smallest unit of an object. As a result, no grouping takes place, and the two objects 400 and 401 are treated as independent objects, as shown in FIG. 4(a). When the hierarchization level is raised by one (N = 2), the level of abstraction rises and the comparison is made at the caption level (Fig1). As a result, the two objects 400 and 401 are grouped into one, as shown in FIG. 4(b), and can be handled as if they were a single object 402.

The metadata extraction unit 305 searches the body region for expressions identical or synonymous to the caption assigned to an object. If one is found, metadata for searching for the object is extracted from the body text and stored in association with the object identifier.

Next, the processing flow of Example 1 and the detailed processing of each component are described with reference to the flowchart in FIG. 5.

First, in step S500, the region dividing unit 300 divides the image document received from the scanner unit 200 or the PDL processing unit 205 into regions. One example of a region extraction method is as follows. The input image is first divided into blocks of M × N pixels, and a thinned image is created in which a block is treated as containing image content if even one of its pixels does. Next, connected portions of the thinned image are merged to form small rectangles. Rectangles that have a large aspect ratio and whose short sides are close to each other are likely to be lines of text, so such rectangles are joined. A set of rectangles whose short sides are almost the same length and that are arranged at almost equal intervals is likely to be body text, so these are joined as well. As a result, regions 410 to 419 shown in FIG. 6, for example, are extracted.
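The first two steps of the extraction example above (thinning by pixel blocks, then merging connected blocks into rectangles) can be sketched as follows. This is only an illustrative sketch: the function names, the `block_h` and `block_w` parameters, the binary list-of-lists image representation, and the use of 4-connectivity are all assumptions that the patent text does not specify, and the later heuristics for joining text lines and body text are not implemented.

```python
from collections import deque


def thin_image(pixels, block_h, block_w):
    """Build a thinned image: one cell per block, 1 if any pixel in the block is set."""
    rows, cols = len(pixels), len(pixels[0])
    thinned = []
    for top in range(0, rows, block_h):
        row = []
        for left in range(0, cols, block_w):
            occupied = any(
                pixels[y][x]
                for y in range(top, min(top + block_h, rows))
                for x in range(left, min(left + block_w, cols))
            )
            row.append(1 if occupied else 0)
        thinned.append(row)
    return thinned


def connected_rectangles(thinned):
    """Return bounding boxes (top, left, bottom, right) of 4-connected occupied cells."""
    rows, cols = len(thinned), len(thinned[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not thinned[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected group of occupied cells.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and thinned[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Hypothetical 4 x 6 binary page with two separate marks.
page = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
small_rectangles = connected_rectangles(thin_image(page, 2, 2))
```

The rectangle coordinates are in block units of the thinned image, so a real implementation would scale them back to pixel coordinates before the joining steps.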

In step S501, the attribute information adding unit 301 assigns an attribute such as figure, table, body, or caption to each divided region. It first determines whether each region contains characters and classifies it as a character region (a region containing characters) or an object region (any other region). It then classifies the character regions and object regions in more detail.

For object regions, a small region is given the attribute "noise", a region with low pixel density is given the attribute "table", and any other region is regarded as a figure or photograph and given the attribute "figure".
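The object-region rule above can be expressed as a short decision function. The sketch below is hypothetical: the patent gives no numeric thresholds, so `min_area` and `table_density` and their default values are assumptions chosen only to make the example concrete, as is the idea of measuring density as dark pixels divided by region area.

```python
def classify_object_region(width, height, dark_pixels,
                           min_area=100, table_density=0.2):
    """Assign "noise", "table", or "figure" to an object region.

    min_area and table_density are illustrative thresholds, not values
    taken from the patent.
    """
    area = width * height
    if area < min_area:
        return "noise"          # small region
    density = dark_pixels / area
    if density < table_density:
        return "table"          # low pixel density (mostly ruled lines)
    return "figure"             # everything else: drawing or photograph
```

The size check comes first, so a region with zero area is classified as noise before any division takes place.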

For character regions, a region created by joining a plurality of small rectangles is judged likely to be body text and is given the attribute "body". A character region located near an object that has the attribute "table" or "photo" is given the attribute "caption", as text describing that object. If a still smaller character region exists near a caption, it is regarded as a caption that further explains the caption, and the two are reclassified into a "sub-caption" and "caption" relationship. The remaining regions are given attributes such as "chapter", "section", "header/footer", or "character part" according to their position relative to the body text, their character size, and their line spacing.

Classifying FIG. 6 in this way gives the following result: region 410 is a section; regions 411 to 413 are figures; regions 414 and 415 are captions; regions 416 and 417 are sub-captions; region 418 is body text; and region 419 is a page.

In step S502, the character recognition unit 302 performs character recognition on the character regions that have been given attributes such as body or caption, and holds the result in association with each character region. This makes the text in the image document searchable.

In step S503, the hierarchization level calculation unit 303 determines the hierarchization level N required by the object identifier generation processing (S507) described later. A low hierarchization level keeps the relationships between objects and metadata in fine-grained units, so the file size inevitably grows. The hierarchization level N is therefore determined in steps according to the size of the data received from the scanner unit 200 or the PDL processing unit 205. As one criterion, shown in FIG. 7, N = 1 if the document has 5 pages or fewer, N = 2 if it has 10 pages or fewer, and N = 3 otherwise. Alternatively, the file size or the number of objects contained in the image document may be used as the criterion, and these criteria may of course be combined. The maximum hierarchization level is 3 here, but it can be set freely. However, if the maximum level is too high, objects that are in fact unrelated may be grouped because the level of abstraction has become too high, which can lower search efficiency; a maximum of about 3 to 5 is therefore considered appropriate.
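The page-count criterion described for FIG. 7 amounts to a small step function. A minimal sketch follows; the function name and the `max_level` parameter are illustrative assumptions, and only the page-count rule is covered (the alternative criteria based on file size or object count are left out).

```python
def hierarchization_level(page_count, max_level=3):
    """Return N from the page count: <=5 pages -> 1, <=10 pages -> 2, otherwise 3."""
    if page_count <= 5:
        level = 1
    elif page_count <= 10:
        level = 2
    else:
        level = 3
    # max_level is a hypothetical cap reflecting the configurable maximum level.
    return min(level, max_level)
```

With the thresholds taken from the text, a 6-page document falls in the second band and gets N = 2, while an 11-page document gets N = 3.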

Next, the object identifier generation unit 304 groups related objects and generates a unique object identifier for each group. First, in step S504, it detects object regions with a caption attribute. If none is detected, processing proceeds to step S508. If any are detected, it analyzes the hierarchical structure of all the objects (step S505), groups the objects according to the hierarchization level obtained in step S503 (step S506), and calculates and assigns a unique object identifier for each group (step S507).

FIG. 8 shows the result of analyzing the hierarchical structure of the objects in FIG. 6.

When object regions with a caption attribute are detected in step S504, object regions 411 to 413 qualify. The layout structure and logical structure are therefore analyzed for these three regions, and the character strings that serve as captions at each level of the hierarchy are detected (step S505). The structuring is explained using object region 412 as an example. First, object region 412 has region 416, "Fig2(a)", as a sub-caption that gives it a detailed description. This is the first layer. In the second layer, region 415, which holds the caption "Fig2", covers the objects that have sub-captions. In the third layer, object region 412 is regarded as a figure contained in the section "1.1 XX" (region 410), and in the fourth layer it belongs to the unit "Page1" (region 419). FIG. 8 shows the result of applying the same analysis to objects 411 and 413.

In step S506, this analysis result is compared against the hierarchization level (N) calculated by the hierarchization level calculation unit 303, and objects whose captions at that level are identical or synonymous are judged to be related and are grouped. A unique object identifier is then generated and assigned for each group (step S507). For example, when the hierarchization level is N = 1, the comparison uses the captions assigned in the first layer, so each object is judged to be independent. As a result, object identifiers ID = 1, 2, and 3 are generated and assigned to the respective objects. When the hierarchization level is N = 3, the comparison is made in the third layer, where every object has the caption "1.1 XX". As a result, the three objects are integrated into one group, and a common object identifier ID = 1 is generated and assigned to each of them.
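Steps S506 and S507 can be sketched as a lookup of each object's caption at layer N followed by identifier assignment. The sketch below makes several assumptions: the function name and the dictionary layout are illustrative, each caption list is assumed to be non-empty, only exact caption matches are grouped (synonym handling is omitted), and the layer captions shown for objects 411 and 413 are guesses, since the text spells out the hierarchy only for object 412.

```python
def assign_object_ids(hierarchies, level):
    """Group objects whose caption at the given layer matches; return object -> ID.

    hierarchies maps an object to its captions, index 0 being the first layer.
    """
    caption_to_id = {}
    object_ids = {}
    for obj, captions in hierarchies.items():
        # Objects with fewer layers than `level` fall back to their top layer.
        key = captions[min(level, len(captions)) - 1]
        if key not in caption_to_id:
            caption_to_id[key] = len(caption_to_id) + 1
        object_ids[obj] = caption_to_id[key]
    return object_ids


# Hierarchy for 412 follows the text; 411 and 413 are assumed for illustration.
hierarchies = {
    411: ["Fig1", "Fig1", "1.1 XX", "Page1"],
    412: ["Fig2(a)", "Fig2", "1.1 XX", "Page1"],
    413: ["Fig2(b)", "Fig2", "1.1 XX", "Page1"],
}
ids_level_1 = assign_object_ids(hierarchies, 1)
ids_level_3 = assign_object_ids(hierarchies, 3)
```

Under these assumed hierarchies, level 1 keeps the three objects separate and level 3 places them in one group, matching the two cases described in the text.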

After that, the metadata extraction unit 305 searches the body region for expressions identical or synonymous to the captions assigned to the grouped objects (step S508). If none is found, processing proceeds to step S511. If one is found, processing proceeds to step S509, where metadata for searching for the object is extracted from the body text, and the metadata is then stored in association with the object identifier (step S510). For example, when the hierarchization level is N = 3, all the captions assigned to the three object regions are searched for in the body text, the matching keywords are extracted, and the keywords that remain after duplicates are removed are assigned as metadata (see FIG. 9).
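A rough sketch of steps S508 to S510 for one group is shown below. The patent does not say how keywords are picked out of the matching body text, so the approach here is an assumption: every word in a sentence that mentions a caption is taken as a keyword, apart from the caption itself and a small hypothetical stop-word list. Matching is by plain substring, so synonymous expressions are not handled and a caption such as "Fig1" would also match "Fig10".

```python
import re

# Hypothetical stop-word list; not part of the patent.
STOP_WORDS = frozenset({"is", "a", "an", "the", "of", "and", "shows"})


def extract_metadata(captions, body_text, stop_words=STOP_WORDS):
    """Collect de-duplicated keywords from body sentences that mention a caption."""
    sentences = [s.strip() for s in re.split(r"[.。]", body_text) if s.strip()]
    keywords = []
    for caption in captions:
        for sentence in sentences:
            if caption not in sentence:
                continue
            # Drop the caption itself, then keep the remaining words.
            rest = sentence.replace(caption, " ")
            for word in re.findall(r"\w+", rest):
                if word.lower() not in stop_words and word not in keywords:
                    keywords.append(word)  # duplicates across captions are skipped
    return keywords


body = "Fig1 is AAA. Fig2 shows AAA and BBB. Unrelated text."
group_keywords = extract_metadata(["Fig1", "Fig2"], body)
```

Because the group shares one keyword list, a keyword that appears for both captions is stored once, which is the source of the file-size saving described above.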

In step S511, an image document is generated in which the extracted metadata is attached to the input image document. The format is one that can associate metadata with objects (for example, PDF, XPS, or OOXML). ("PDF" and "XPS" are registered trademarks.)
FIG. 14 shows an example of a format with metadata attached.

The object regions 411 to 413, which have the figure attribute, correspond to format elements 800 to 802, respectively. Each element holds information such as the position (Position) and size (Size) of the region produced by the region dividing unit 300, together with the object identifier (ID) generated by the object identifier generation unit 304. A character region, for example the body region 418 with the body attribute, is described as format element 803 with information such as the position (Position) and size (Size) of the region produced by the region dividing unit 300 and the character string information (Data) recognized by the character recognition unit 302. The metadata that accompanies the objects is described separately from the objects (804 to 807). Each metadata entry describes a caption in association with the keywords extracted from the body text. The number of the metadata entry corresponds to the object identifier (ID) written in the object region.

FIG. 14(a) shows the case of hierarchization level N = 1. When N = 1, objects 411 to 413 have different object identifiers (ID = 1, 2, 3), and each refers to its own metadata 804 to 806. FIG. 14(b) shows the case of hierarchization level N = 3, in which the three objects 411 to 413 share a common object identifier (ID = 1) and refer to a single metadata entry 807.

When an object is searched for by keyword, the search is run against the information 804 to 807 given as metadata. When the hierarchization level is N = 1, the keyword is detected in each piece of metadata, and the corresponding images are output as separate results. When the hierarchization level is N = 3, only one piece of metadata is searched, so the corresponding images are obtained together as a single search result. In this way, a plurality of objects can be grouped and handled as if they were a single object.
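A minimal Python sketch of this lookup follows. The function `search_objects` and the tuple and dictionary layout are hypothetical; the sketch only assumes that the search scans the metadata entries and then resolves each matching identifier back to the object areas that carry it.

```python
def search_objects(keyword, elements, metadata):
    """elements: list of (region, object_id); metadata: {object_id: [keywords]}.

    Returns one hit per matching metadata entry; each hit lists the regions
    whose object identifier points at that entry.
    """
    hits = []
    for object_id, keywords in metadata.items():
        if keyword in keywords:
            hits.append([region for region, oid in elements if oid == object_id])
    return hits

# N = 1: three identifiers, three metadata entries.
hits_n1 = search_objects("XX", [(411, 1), (412, 2), (413, 3)],
                         {1: ["XX"], 2: ["XX"], 3: ["XX"]})
# N = 3: one shared identifier, one metadata entry.
hits_n3 = search_objects("XX", [(411, 1), (412, 1), (413, 1)], {1: ["XX"]})
```

With separate identifiers the search yields three hits of one region each; with a shared identifier it yields one hit covering all three regions.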

Finally, an example of searching for an object using the above-described metadata as a search keyword will be described. FIG. 10 shows an example of a Viewer for searching for objects. One example of such a Viewer is Adobe's Acrobat Reader.

When data to which metadata has been added is opened in the Viewer 600, it is displayed in the document display window 601. When the search term “XX” is entered in the search keyword input field 602, a list of image search results is displayed in the search result display field 603. Here, three objects are hit as search results. The figure shows that when search result 1 is selected, the object area 411 is displayed as the corresponding location. The object area can be found with the search term “XX” because, as described above, the metadata “XX” is associated with the object identifier that points to the object area 411.

FIG. 11 is a diagram for explaining the search results at each hierarchization level (N) and the metadata assigned to each object identifier. FIGS. 11A to 11C show the search results and metadata when the keyword “XX” is searched for at hierarchization levels N = 1, N = 2, and N = 3, respectively.

When the hierarchization level is N = 1 (FIG. 11A), the three objects are each assigned a different object identifier, and each has the keyword XX. Therefore, three search results are obtained. When the hierarchization level is N = 3 (FIG. 11C), the three objects are integrated into one group, and when an image is searched for, the three are displayed in the document display window 601 in an integrated state.

Setting the hierarchization level low makes it possible to search for images at a detailed level. However, as the number of pages grows, the number of hits also grows, and finding the desired result takes time. Setting the hierarchization level high makes detailed searches impossible, but because the search is performed over larger units, the number of hits can be greatly reduced even when the number of pages increases. As a result, the area containing the desired image can be found easily.

In addition, increasing the hierarchization level N allows duplicated keywords to be removed, so the file size can also be kept small.
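The size effect can be illustrated with a small Python sketch. The JSON serialization, the helper names `merge_keywords` and `serialized_size`, and the keywords "lens" and "body" are assumptions made only for this example; the actual file format is not specified here.

```python
import json

def merge_keywords(keyword_lists):
    """Merge keyword lists, dropping duplicates while keeping first-seen order."""
    merged = []
    for keywords in keyword_lists:
        merged.extend(keywords)
    return list(dict.fromkeys(merged))

def serialized_size(metadata):
    """Byte length of the metadata serialized as UTF-8 JSON (illustrative only)."""
    return len(json.dumps(metadata).encode("utf-8"))

# Three individually managed entries, each repeating the keyword "XX".
individual = {1: ["XX", "lens"], 2: ["XX", "body"], 3: ["XX"]}
# One grouped entry holding each keyword once.
grouped = {1: merge_keywords(individual.values())}
```

The grouped entry stores "XX" once instead of three times, so `serialized_size(grouped)` is smaller than `serialized_size(individual)`.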

(Example 2)
In Example 1, a method was described in which the hierarchization level for grouping is calculated automatically according to the input data, and object grouping and metadata extraction and assignment are then performed. However, if object grouping and the like are performed automatically, the result may run contrary to what the user wants. Example 2 therefore provides a mechanism that lets the user freely set the file size and the hierarchization level for grouping.

FIG. 12 shows an example of the operation screen 700 on the operation unit 207. The operation screen 700 is provided with a slider bar 701 for setting the file size and a slider bar 702 for setting the hierarchization level. In the initial state, the hierarchization level is set to the minimum (N = 1) so that objects are not grouped. The user operates the two slider bars 701 and 702 to make the desired data output settings and presses the OK button 704, whereupon the control unit 206 in the MFP 100 transfers the settings to the hierarchization level calculation unit 303. The operation can be canceled by pressing the Cancel button 703. The hierarchization level calculation unit 303 calculates the hierarchization level N according to the received settings (see FIG. 13).

Processing other than the hierarchization level calculation is the same as in Example 1, and its description is omitted.

(Example 3)
The present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a storage medium. Specifically, it may be applied to a system composed of a plurality of devices, or to an apparatus consisting of a single device.

The present invention also includes the case where a software program that realizes the functions of the above-described embodiments (in the embodiments, a program corresponding to the flowcharts shown in the drawings) is supplied directly or remotely to a system or apparatus, and the invention is achieved by a computer of that system or apparatus reading and executing the supplied program code.

Accordingly, the program code itself that is installed in a computer in order to realize the functional processing of the present invention on that computer also realizes the present invention. In other words, the present invention also includes the computer program itself for realizing the functional processing of the present invention.

In that case, as long as it has the function of a program, it may take the form of object code, a program executed by an interpreter, script data supplied to an OS, or the like.

Recording media for supplying the program include, for example, a floppy (registered trademark) disk, a hard disk, and an optical disk. Further examples of recording media include a magneto-optical disk, MO, CD-ROM, CD-R, CD-RW, magnetic tape, a nonvolatile memory card, ROM, and DVD (DVD-ROM, DVD-R).

As another method of supplying the program, a browser on a client computer is used to connect to a homepage on the Internet. The program can then be supplied by downloading the computer program of the present invention itself, or a compressed file including an automatic installation function, from that homepage to a recording medium such as a hard disk. It can also be supplied by dividing the program code constituting the program of the present invention into a plurality of files and downloading each file from a different homepage. That is, a WWW server that allows a plurality of users to download the program files for realizing the functional processing of the present invention on a computer is also included in the present invention.

In addition, the program of the present invention may be encrypted, stored in a storage medium such as a CD-ROM, and distributed to users, and users who satisfy predetermined conditions may be allowed to download key information for decryption from a homepage via the Internet. The invention can then be realized by using the key information to execute the encrypted program and install it on a computer.

The functions of the above-described embodiments are realized when the computer executes the read program. In addition, an OS or the like running on the computer may perform part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments can also be realized by that processing.

Furthermore, the program read from the recording medium may be written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer. A CPU or the like provided on the function expansion board or function expansion unit then performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are also realized by that processing.

Claims

An image processing apparatus comprising:
an area dividing unit 300 that divides an input document into areas;
an attribute information giving unit 301 that gives attribute information to each divided area;
a character recognition unit 302 that obtains character information for a character area;
a hierarchization level calculation unit 303 that calculates a hierarchization level used when grouping objects;
an object identifier generating unit 304 that integrates the objects into groups having the same caption at the hierarchization level and generates an individual identifier for each group; and
a metadata extraction unit 305 that extracts metadata related to an object by detecting the caption accompanying the object from the input document, associates the object identifier with the metadata, and stores them in a storage area.

The image processing apparatus according to claim 1, wherein the hierarchization level calculation unit 303 can automatically calculate the level at which objects are grouped according to the data size of the input document.

The image processing apparatus according to claim 1, wherein the hierarchization level calculation unit 303 can change the level at which objects are grouped in a stepwise manner in accordance with a user setting.

The image processing apparatus according to any one of claims 1 to 3, wherein the object identifier generating unit 304, based on the caption information attached to the objects or on the logical structure of chapters and sections, recognizes objects that include the same or a similar expression as belonging to the same group at the hierarchization level determined by the hierarchization level calculation unit 303, and assigns a unique identifier to each group.

The image processing apparatus according to claim 1, wherein the attribute information giving unit 301 assigns one of the attributes chapter, section, body text, and caption to a character area, and assigns one of the attributes table, photograph, drawing, and line drawing to an object area other than characters.

The image processing apparatus, wherein the metadata extraction unit 305 searches the body text for a sentence or keyword that includes an expression identical or similar to the caption attached to the object, and assigns it to the object as metadata.