JP2016163141A

JP2016163141A - Image processing apparatus, image processing method and program

Info

Publication number: JP2016163141A
Application number: JP2015038751A
Authority: JP
Inventors: 大地小玉; Daichi Kodama
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2015-02-27
Filing date: 2015-02-27
Publication date: 2016-09-05

Abstract

Problem: To appropriately restore reduced document images to their original size without generating and printing any information other than the reduced document images themselves. Solution: An image processing apparatus according to the present invention includes a character information acquisition unit, an aggregation number determination unit, and a page order determination unit. The character information acquisition unit acquires character information indicating the characters contained in image data read from a document. The aggregation number determination unit determines, based on the character information acquired by the character information acquisition unit, an aggregation number indicating how many document images are aggregated in the image data. The page order determination unit determines the page order of the document images based on the character information contained in each of them. Selected drawing: FIG. 2

Description

The present invention relates to an image processing apparatus, an image processing method, and a program.

Image forming apparatuses that have a function for reducing a plurality of document pages and forming them as images on a single sheet of recording paper (aggregate printing) are known. With such an apparatus, when each document image that was reduced in aggregate printing is to be returned to its original image size before aggregation and then scanned or printed (original-size restoration), the user has to trim and enlarge the images manually, which places a heavy burden on the user.

To address this problem, a technique is known in which, at the time of aggregate printing, a mark representing the information needed for original-size restoration, such as the aggregation rate, is printed on the recording paper together with the reduced document images, and the non-aggregated state is restored (original-size restoration is performed) based on that mark.

However, with the technique disclosed in Patent Document 1, original-size restoration cannot be performed if the mark needed for restoration cannot be recognized for any reason. In other words, the technique disclosed in Patent Document 1 cannot always restore the original size appropriately. In addition, because that technique requires generating and printing information other than the reduced document images (the mark needed for original-size restoration), the processing becomes complicated.

To solve the problem described above and achieve the object, the present invention is an image processing apparatus including: a character information acquisition unit that acquires character information indicating characters contained in image data read from a document; an aggregation number determination unit that determines, based on the character information acquired by the character information acquisition unit, an aggregation number indicating the number of document images aggregated in the image data; and a page order determination unit that determines the page order of the document images based on the character information contained in each of the document images.

According to the present invention, original-size restoration can be performed appropriately without generating and printing any information other than the reduced document images.

FIG. 1 is a diagram illustrating an example of the hardware configuration of an MFP.
FIG. 2 is a diagram illustrating an example of the functions of the image processing apparatus.
FIG. 3 is a flowchart illustrating an example of processing performed by the image processing apparatus.
FIG. 4 is a diagram illustrating an example of input image data.
FIG. 5 is a diagram for explaining the first process of the aggregation number determination process.
FIG. 6 is a diagram for explaining the second process of the aggregation number determination process.
FIG. 7 is a diagram for explaining the third process of the aggregation number determination process.
FIG. 8 is a diagram for explaining the fourth process of the aggregation number determination process.
FIG. 9 is a flowchart illustrating the aggregation number determination process.
FIG. 10 is a flowchart illustrating the aggregation number determination process.
FIG. 11 is a flowchart illustrating the aggregation number determination process.
FIG. 12 is a conceptual diagram of input image data.
FIG. 13 is a flowchart illustrating an example of the page number determination process.
FIG. 14 is a conceptual diagram of input image data.
FIG. 15 is a conceptual diagram of the table of contents determination process.
FIG. 16 is a flowchart illustrating an example of the table of contents determination process.
FIG. 17 is a conceptual diagram of input image data.
FIG. 18 is a flowchart illustrating an example of the table of contents reuse determination process.
FIG. 19 is a conceptual diagram of image data read from the last page of a document.
FIG. 20 is a conceptual diagram of the page-crossing character determination process.

Embodiments of an image processing apparatus, an image processing method, and a program according to the present invention are described in detail below with reference to the accompanying drawings. A multifunction peripheral (MFP) is used as an example of an image forming apparatus to which the present invention is applied, but the invention is not limited to this. A multifunction peripheral is an apparatus that has a plurality of different functions, such as a copy function, a scanner function, a print function, and a fax function.

FIG. 1 is a diagram illustrating an example of the hardware configuration of the MFP 1 according to the present embodiment. As illustrated in FIG. 1, the MFP 1 includes a document reading device 10, an image processing apparatus 20, an operation unit 60, and a printing device 40.

The document reading device 10 reads image data from a document placed on a document table and can be realized with various known configurations. The image data read by the document reading device 10 is input to the image processing apparatus 20.

The image processing apparatus 20 performs image processing on the image data input from the document reading device 10 (the image data read from the document). As illustrated in FIG. 1, the image processing apparatus 20 includes a CPU 201, a ROM 202, a RAM 203, a blank sheet detection processing circuit 204, a rotation processing circuit 205, a scaling processing circuit 206, a trimming processing circuit 207, a network I/F circuit 208, and an output image processing circuit 209.

The CPU 201 has overall control of the image processing apparatus 20. It controls the operation of the entire image processing apparatus 20 by executing programs stored in the ROM 202 or elsewhere, using the RAM 203 as a work area.

The blank sheet detection processing circuit 204 detects blank areas of a document. The rotation processing circuit 205 rotates image data. The scaling processing circuit 206 performs processing such as reducing and enlarging image data. The trimming processing circuit 207 cuts out (extracts) a partial region of image data. The network I/F circuit 208 is an interface circuit for exchanging data with an external device 50. The output image processing circuit 209 processes the image data (output image) to be output to the printing device 40.

The operation unit 60 is a device that accepts user operations and may be configured with, for example, a touch panel. The printing device 40 prints based on the image data output from the image processing apparatus 20 and can be realized with various known configurations.

Next, the functions of the image processing apparatus 20 are described. FIG. 2 is a block diagram illustrating an example of the functions of the image processing apparatus 20. As illustrated in FIG. 2, the image processing apparatus 20 includes a character information acquisition unit 21, a rotation processing unit 22, an aggregation number determination unit 23, a page order determination unit 24, a trimming processing unit 25, a storage unit 26, an enlargement unit 27, a page order change unit 28, a network I/F unit 29, an output image processing unit 30, and an output unit 31. In the present embodiment, the functions of these units (the character information acquisition unit 21, rotation processing unit 22, aggregation number determination unit 23, page order determination unit 24, trimming processing unit 25, enlargement unit 27, page order change unit 28, network I/F unit 29, output image processing unit 30, and output unit 31) are realized by the CPU 201 executing programs stored in the ROM 202 or elsewhere. However, this is not a limitation; for example, at least some of these functions may be realized by dedicated hardware circuits.

The character information acquisition unit 21 acquires character information indicating the characters contained in image data read from a document (hereinafter sometimes referred to as "input image data"). For example, the character information acquisition unit 21 performs processing that identifies the characters contained in the input image data (OCR processing or the like), thereby acquiring the character information and also determining the printing direction of the characters (the direction in which the text runs). The rotation processing unit 22 rotates the input image data (for example, by 90 degrees) according to the printing direction determined by the character information acquisition unit 21.

Based on the character information acquired by the character information acquisition unit 21, the aggregation number determination unit 23 determines the aggregation number, which indicates how many document images are aggregated in the input image data. As described in detail later, in the present embodiment the aggregation number determination unit 23 determines, for each of a plurality of unit areas obtained by dividing the input image data, whether the unit area is a non-blank area that contains character information or a blank area that does not. It then applies the following symmetry rules. The first reference line is the line that extends in the main scanning direction (the paper feed direction) at the midpoint of the input image data in the sub-scanning direction (the direction orthogonal to the paper feed direction), and the first symmetric unit area is the unit area that corresponds to a given unit area when the first reference line is taken as the axis of symmetry. If a unit area is a non-blank area and its first symmetric unit area is a blank area, the first symmetric unit area is determined to be a non-blank area. If a unit area is a blank area and its first symmetric unit area is a non-blank area, the unit area is determined to be a non-blank area. Similarly, the second reference line is the line that extends in the sub-scanning direction at the midpoint of the input image data in the main scanning direction, and the second symmetric unit area is the unit area that corresponds to a given unit area when the second reference line is taken as the axis of symmetry. If a unit area is a non-blank area and its second symmetric unit area is a blank area, the second symmetric unit area is determined to be a non-blank area. If a unit area is a blank area and its second symmetric unit area is a non-blank area, the unit area is determined to be a non-blank area. The aggregation number determination unit 23 determines that a region formed by a cluster of unit areas determined to be non-blank is the region of one document image. The aggregation number determination unit 23 can also set the size of the unit area variably in response to a user operation (an operation accepted by the operation unit 60).

The page order determination unit 24 determines the page order of the document images aggregated in the input image data, based on the character information contained in each of those document images. In the present embodiment, the page order determination unit 24 includes a page number determination unit 101, a table of contents determination unit 102, a table of contents reuse determination unit 103, a missing page determination unit 104, and a page-crossing character string determination unit 105.

The page number determination unit 101 determines the page order of the document images based on order character information, that is, the character information that represents a number, among the character information contained in each of the document images aggregated in the input image data. As described in detail later, the page number determination unit 101 identifies target order character information, which is the order character information contained in a target document image (any one of the document images). It then determines the page order of the target document image and a next-page candidate document image (a document image that is a candidate for the page following the target document image) based on the relationship between the target order character information and the order character information located at the corresponding position in the next-page candidate document image.
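The relationship check described above can be illustrated with a minimal sketch. It assumes that OCR output has already been reduced to a mapping from a coarse position label to the number found there, and that "the relationship" means the candidate holds a number one greater at the same position. The position labels, the `Page` type, and both function names are hypothetical and are not taken from the patent.

```python
from typing import Optional

# Hypothetical page model: position label -> number that OCR found at that position.
Page = dict[str, int]


def is_next_page(target: Page, candidate: Page) -> bool:
    """True if some position holds n on the target and n + 1 on the candidate."""
    return any(
        pos in candidate and candidate[pos] == number + 1
        for pos, number in target.items()
    )


def order_by_page_number(pages: list[Page]) -> Optional[list[int]]:
    """Chain pages by the n -> n + 1 relation; return None if no single chain exists."""
    successors: dict[int, int] = {}
    for i, target in enumerate(pages):
        nexts = [
            j for j, cand in enumerate(pages)
            if j != i and is_next_page(target, cand)
        ]
        if len(nexts) > 1:
            return None  # ambiguous: more than one plausible next page
        if nexts:
            successors[i] = nexts[0]
    # The first page is the only one that is nobody's successor.
    starts = [i for i in range(len(pages)) if i not in successors.values()]
    if len(starts) != 1:
        return None
    order = [starts[0]]
    while order[-1] in successors and len(order) < len(pages):
        order.append(successors[order[-1]])
    return order if len(order) == len(pages) else None
```

Returning `None` when the chain is ambiguous or incomplete leaves room for the other determination units to be tried, in line with the fallback flow of FIG. 3.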

The table of contents determination unit 102 determines the first document image from the shape of each of the document images aggregated in the input image data and from the printing direction of the characters. From among the first document image and the next-page candidate document images (document images that are candidates for the page following the first document image), it then determines the table of contents page based on the character information contained in each of them or on their respective layouts. As described in detail later, the table of contents determination unit 102 generates table of contents information, which associates each of the character strings on the table of contents page with its order of appearance, based on the character information contained in the determined table of contents page. It then determines the page order of the document images based on the table of contents information and the character information contained in the document images other than the table of contents page.
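As a rough illustration of table of contents information, the sketch below maps each heading string to its order of appearance and sorts the remaining pages by the earliest heading each one contains. The patent defers the details of this step, so the substring-matching rule and the names `build_toc_info` and `order_by_toc` are assumptions made only for this example.

```python
from typing import Optional


def build_toc_info(toc_lines: list[str]) -> dict[str, int]:
    """Associate each non-empty string on the contents page with its appearance order."""
    headings = [line.strip() for line in toc_lines if line.strip()]
    return {heading: rank for rank, heading in enumerate(headings)}


def order_by_toc(toc_info: dict[str, int], page_texts: list[str]) -> Optional[list[int]]:
    """Sort non-contents pages by the earliest contents heading found in their text.

    Returns None when a page matches no heading or two pages share the same key,
    because the order cannot then be decided from the contents page alone.
    """
    keys = []
    for text in page_texts:
        ranks = [rank for heading, rank in toc_info.items() if heading in text]
        if not ranks:
            return None
        keys.append(min(ranks))
    if len(set(keys)) != len(keys):
        return None
    return sorted(range(len(page_texts)), key=lambda i: keys[i])
```

For example, with headings "Overview", "Setup", "Usage", a page whose text contains "Setup" would be placed after the page containing "Overview".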

The table of contents reuse determination unit 103 determines the table of contents page in the same way as the table of contents determination unit 102 and, based on the character information contained in that page, generates table of contents information that associates each of the character strings on the table of contents page with its order of appearance. It then determines whether, among the document images other than the table of contents page, there is a document image whose character strings match the table of contents information in both type and order of appearance. As described in detail later, if there is such a document image and, in it, the size or color of any one of the character strings contained in the table of contents information has changed (or, alternatively, has not changed), the table of contents reuse determination unit 103 determines the page order of that document image based on the order of appearance of that one character string.

The missing page determination unit 104 examines the image data read from the last page of the document. That image data contains document image regions in one-to-one correspondence with the number of document images indicated by the aggregation number determined by the aggregation number determination unit 23. If at least one of those document image regions is blank, the missing page determination unit 104 determines the page order of the document images according to the position of the blank region. The details are described later.

A page-crossing candidate character string is a character string made up of at least a predetermined number of characters. If, among the document images aggregated in the input image data, there is a document image whose trailing end contains part of a page-crossing candidate character string, and the leading end of a next-page candidate document image (a document image that is a candidate for the page following that document image) contains the remainder of that character string, the page-crossing character string determination unit 105 determines that the next-page candidate document image is the page that follows that document image. The details are described later.
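The split-string test can be sketched as follows. The sketch assumes that the candidate string is already known and that each page has been reduced to plain OCR text; how the apparatus actually obtains candidate strings is not stated in this section. The function name `continues_across` and the default threshold `min_len=6` are hypothetical.

```python
def continues_across(candidate: str, page_text: str, next_text: str, min_len: int = 6) -> bool:
    """True if `candidate` is split so that `page_text` ends with its first part
    and `next_text` begins with the remainder.

    `min_len` stands in for the "predetermined number of characters"; the value
    used here is an arbitrary assumption.
    """
    if len(candidate) < min_len:
        return False
    # Try every split point that leaves a non-empty part on both pages.
    for cut in range(1, len(candidate)):
        if page_text.endswith(candidate[:cut]) and next_text.startswith(candidate[cut:]):
            return True
    return False
```

Requiring a minimum length reduces accidental matches on short fragments that happen to appear at page boundaries.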

The trimming processing unit 25 extracts (cuts out) each of the document images from the input image data. The storage unit 26 (first storage unit) stores the document images extracted by the trimming processing unit 25.

The enlargement unit 27 enlarges each of the document images extracted by the trimming processing unit 25 to its original image size before aggregation. In the present embodiment, the enlargement unit 27 enlarges the document images stored in the storage unit 26. Because the enlargement unit 27 is placed downstream of the storage unit 26 in this example, the storage unit 26 holds the images in their reduced state, which reduces the storage capacity that the storage unit 26 requires.

The page order change unit 28 rearranges the document images enlarged by the enlargement unit 27 according to the page order determined by the page order determination unit 24.

The network I/F unit 29 has a function of transferring the document images obtained by the image processing of the present embodiment to the external device 50.

The output image processing unit 30 has a function of processing the document images obtained by the image processing of the present embodiment. The output unit 31 has a function of outputting the images processed by the output image processing unit 30 to the printing device 40 as print data.

FIG. 3 is a flowchart illustrating an example of processing performed by the image processing apparatus 20 of the present embodiment. First, the character information acquisition unit 21 performs OCR processing or the like on the input image data (the image data read from the document) and acquires the character information contained in it (step S1). Next, the aggregation number determination unit 23 performs the aggregation number determination process, which determines the aggregation number based on the character information acquired in step S1 (step S2). The details of the aggregation number determination process are described later.

Next, the page number determination unit 101 performs the page number determination process (step S3). The details of this process are described later. If the page number determination process establishes the page order (step S4: Yes), the processing proceeds to step S12. If it does not (step S4: No), the table of contents determination unit 102 performs the table of contents determination process (step S5). The details of this process are described later.

If the table of contents determination process establishes the page order (step S6: Yes), the processing proceeds to step S12. If it does not (step S6: No), the table of contents reuse determination unit 103 performs the table of contents reuse determination process (step S7). The details of this process are described later.

If the table of contents reuse determination process establishes the page order (step S8: Yes), the processing proceeds to step S12. If it does not (step S8: No), the missing page determination unit 104 performs the missing page determination process (step S9). The details of this process are described later.

If the missing page determination process establishes the page order (step S10: Yes), the processing proceeds to step S12. If it does not (step S10: No), the page-crossing character string determination unit 105 performs the page-crossing character determination process (step S11). The details of this process are described later. After the page-crossing character determination process, the processing proceeds to step S12. Note that the table of contents determination process, the table of contents reuse determination process, the missing page determination process, and the page-crossing character determination process may be performed in any order, and may also be performed in parallel.
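The fallback flow of steps S3 to S11 amounts to trying each determination in turn until one of them establishes a page order. A minimal sketch follows, assuming each determination is modelled as a function that returns a list of page indices on success and `None` otherwise. The `Determiner` alias and the function name `determine_page_order` are hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical signature: takes the extracted pages, returns an order or None.
Determiner = Callable[[list], Optional[list[int]]]


def determine_page_order(pages: list, determiners: Sequence[Determiner]) -> Optional[list[int]]:
    """Run the determiners in sequence and return the first order that is found.

    Mirrors steps S3 to S11: a later determiner runs only when every earlier
    one failed to establish the order.
    """
    for determine in determiners:
        order = determine(pages)
        if order is not None:
            return order  # corresponds to a "Yes" branch leading to step S12
    return None  # no determiner succeeded; the caller decides what to do next
```

Because the text allows the later determinations to run in any order, the sequence is passed in as an argument rather than fixed inside the function.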

In step S12, the trimming processing unit 25 performs the trimming process, which extracts each of the document images from the input image data. Next, the enlargement unit 27 performs the enlargement process, which enlarges each of the document images extracted by the trimming process to its original image size before aggregation (step S13).

Next, the details of the aggregation number determination process are described. FIG. 4 is a diagram illustrating an example of the input image data to be processed. In FIG. 4, the regions enclosed by dotted lines represent the regions in which the aggregated document images are placed (aggregated regions), and the regions shown in gray represent regions that contain character information (non-blank regions). In the example of FIG. 4, the aggregation number that should be determined is 6.

FIG. 5 is a diagram for explaining the first process of the aggregation number determination process. In the first process, the aggregation number determination unit 23 divides the main scanning direction at a fixed interval Δy (which can be set to any value). Using the character information acquired by the character information acquisition unit 21, it then classifies each of the resulting regions along the main scanning direction as either a region that contains character information (non-blank region) or a region that does not (blank region).

FIG. 6 is a diagram for explaining the second process, which follows the first process. In the second process, the aggregation number determination unit 23 divides the sub-scanning direction at a fixed interval Δx (which can be set to any value). Using the character information acquired by the character information acquisition unit 21, it then classifies each of the resulting regions along the sub-scanning direction as either a region that contains character information (non-blank region) or a region that does not (blank region).
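The classification performed in the first and second processes can be sketched for a single axis as below. The sketch assumes the OCR result is available as the extents of character bounding boxes along that axis; the same function would be called once with Δy and once with Δx. The function name `classify_strips` and the `(start, end)` span format are assumptions for illustration.

```python
def classify_strips(length: int, delta: int, spans: list[tuple[int, int]]) -> list[bool]:
    """Divide an axis of `length` pixels into strips of width `delta` and flag
    each strip as non-blank (True) if any character span overlaps it.

    `spans` holds (start, end) extents of OCR character boxes, end exclusive.
    """
    count = -(-length // delta)  # ceiling division: the last strip may be narrower
    flags = [False] * count
    for start, end in spans:
        if end <= start:
            continue  # ignore empty or inverted spans
        first = max(start, 0) // delta
        last = min(end - 1, length - 1) // delta
        for k in range(first, last + 1):
            flags[k] = True
    return flags
```

A smaller interval gives finer region boundaries at the cost of more strips, and sparse text is more likely to leave isolated blank strips inside a page.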

FIG. 7 is a diagram for explaining the third process of the aggregation number determination process. In general, the document images on an aggregated sheet are laid out with line symmetry both vertically and horizontally. The third process exploits this: the result of the second process is folded over at the center of the sub-scanning direction, and the union of the non-blank regions is taken.

In other words, the aggregation number determination unit 23 determines, for each of the unit areas obtained by dividing the input image data, whether it is a non-blank area that contains character information or a blank area that does not. The first reference line is the line that extends in the main scanning direction at the midpoint of the input image data in the sub-scanning direction, and the first symmetric unit area is the unit area that corresponds to a given unit area when the first reference line is taken as the axis of symmetry. If a unit area is a non-blank area and its first symmetric unit area is a blank area, the first symmetric unit area is determined to be a non-blank area. If a unit area is a blank area and its first symmetric unit area is a non-blank area, the unit area is determined to be a non-blank area. As a result, even a unit area inside an aggregated region that happens to be blank is determined to be non-blank when its first symmetric unit area is non-blank, and a region formed by a cluster of unit areas determined to be non-blank can be determined to be an aggregated region (a region in which a document image is placed).
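The two symmetry rules collapse into one operation: a unit area ends up non-blank if either it or its mirror image is non-blank. A minimal sketch on a grid of flags is shown below. Which grid axis corresponds to the main or sub-scanning direction is an assumption left to the caller, and the function name `mirror_union` is hypothetical.

```python
def mirror_union(grid: list[list[bool]], flip_rows: bool) -> list[list[bool]]:
    """Return a grid in which each cell is non-blank (True) if the cell or its
    mirror image across the center line is non-blank.

    flip_rows=True mirrors across the horizontal center line (row r <-> last - r);
    flip_rows=False mirrors across the vertical center line (col c <-> last - c).
    """
    rows, cols = len(grid), len(grid[0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            mr, mc = (rows - 1 - r, c) if flip_rows else (r, cols - 1 - c)
            out_row.append(grid[r][c] or grid[mr][mc])
        result.append(out_row)
    return result
```

Applying the function once per axis corresponds to folding at the sub-scanning center and then at the main scanning center.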

図８は、集約数判定処理の第４の過程について説明するための図である。第４の過程では、第２の過程の結果を主走査中心で折り返し、非空白領域の和をとる。つまり、集約数判定部２３は、入力画像データを分割して得られる複数の単位領域ごとに、文字情報を含む非空白領域であるか文字情報を含まない空白領域であるかを判定し、該単位領域が非空白領域であり、かつ、入力画像データの主走査方向の半分の位置において副走査方向に延びる第２の基準線を対称軸として該単位領域に対応する単位領域を示す第２の対称単位領域が、空白領域である場合は、第２の対称単位領域は非空白領域であると判定する。また、該単位領域が空白領域であり、かつ、第２の対称単位領域が非空白領域である場合は、該単位領域は非空白領域であると判定する。ここまでの処理で、被集約領域の形状が長方形になっていない場合（凹みが有るような場合）は、その形状を長方形に補正する。また、通常は、各被集約領域の形状は同じであることに着目し、全領域の修復最大の長さを用いて領域形状を補正する。細すぎる領域については、周辺の領域とのマージや一般的な配置パターンとの比較による補正などの更なる処理を行っても良い。 FIG. 8 is a diagram for explaining a fourth process of the aggregation number determination process. In the fourth process, the result of the second process is folded about the main scanning center, and the union (logical OR) of the non-blank areas is taken. That is, the aggregation number determination unit 23 determines, for each of a plurality of unit areas obtained by dividing the input image data, whether the unit area is a non-blank area that includes character information or a blank area that does not. When the unit area is a non-blank area and its second symmetric unit area (the unit area that mirrors it about a second reference line extending in the sub-scanning direction at half the main scanning length of the input image data) is a blank area, the second symmetric unit area is determined to be a non-blank area. Conversely, when the unit area is a blank area and the second symmetric unit area is a non-blank area, the unit area is determined to be a non-blank area. When the shape of an aggregated area is still not rectangular after the processing so far (for example, when it has a dent), the shape is corrected to a rectangle. In addition, because the aggregated areas normally all have the same shape, the area shapes are corrected using the largest repaired length among all areas. Areas that are too thin may be processed further, for example by merging them with neighboring areas or by correcting them against common layout patterns.

図９〜図１１は、集約判定処理のフローチャートである。図９に示すように、まず集約数判定部２３は、判定行間隔Δｙを設定する（ステップＳ２０）。次に、集約数判定部２３は、入力画像データのうち副走査方向の先頭からΔｙ分の領域（主走査長分の領域）に着目し（ステップＳ２１）、文字情報取得部２１による文字情報の取得結果を利用して、その着目した領域（着目領域）が文字情報を含むか否かを判定する（ステップＳ２２）。着目領域が文字情報を含む場合は（ステップＳ２２：Ｙｅｓ）、着目領域は非空白領域であると判定する（ステップＳ２３）。着目領域が文字情報を含まない場合は（ステップＳ２２：Ｎｏ）、着目領域は空白領域であると判定する（ステップＳ２４）。 9 to 11 are flowcharts of the aggregation determination process. As shown in FIG. 9, first, the aggregation number determination unit 23 sets a determination line interval Δy (step S20). Next, the aggregation number determination unit 23 pays attention to a region Δy from the top in the sub-scanning direction (region for the main scanning length) in the input image data (step S21), and the character information acquisition unit 21 performs character information Using the obtained result, it is determined whether or not the focused area (focused area) includes character information (step S22). When the attention area includes character information (step S22: Yes), it is determined that the attention area is a non-blank area (step S23). If the region of interest does not include character information (step S22: No), it is determined that the region of interest is a blank region (step S24).

そして、集約数判定部２３は、原稿（入力画像データ）全体に着目済みであるか否かを判定する（ステップＳ２５）。原稿全体に着目済みでない場合は（ステップＳ２５：Ｎｏ）、着目領域を下にΔｙシフトして（ステップＳ２６）、ステップＳ２２以降の処理を繰り返す。 Then, the aggregation number determination unit 23 determines whether or not attention has been paid to the entire document (input image data) (step S25). If the entire document has not been focused (step S25: No), the focused area is shifted downward by Δy (step S26), and the processing from step S22 onward is repeated.

原稿全体に着目済みである場合は（ステップＳ２５：Ｙｅｓ）、集約数判定部２３は、判定列間隔Δｘを設定する（ステップＳ２７）。次に、集約数判定部２３は、入力画像データのうち主走査方向の先頭からΔｘ分の領域（副走査長分の領域）に着目し（ステップＳ２８）、文字情報取得部２１による文字情報の取得結果を利用して、着目領域が文字情報を含むか否かを判定する（ステップＳ２９）。着目領域が文字情報を含む場合は（ステップＳ２９：Ｙｅｓ）、着目領域は非空白領域であると判定する（ステップＳ３０）。着目領域が文字情報を含まない場合は（ステップＳ２９：Ｎｏ）、着目領域は空白領域であると判定する（ステップＳ３１）。 When attention has been paid to the entire document (step S25: Yes), the aggregation number determination unit 23 sets the determination column interval Δx (step S27). Next, the aggregation number determination unit 23 pays attention to a region corresponding to Δx from the head in the main scanning direction (region corresponding to the sub-scanning length) in the input image data (step S28), and character information obtained by the character information acquisition unit 21 is displayed. Using the obtained result, it is determined whether or not the region of interest includes character information (step S29). When the attention area includes character information (step S29: Yes), it is determined that the attention area is a non-blank area (step S30). If the region of interest does not include character information (step S29: No), it is determined that the region of interest is a blank region (step S31).

そして、集約数判定部２３は、原稿全体に着目済みであるか否かを判定する（ステップＳ３２）。原稿全体に着目済みでない場合は（ステップＳ３２：Ｎｏ）、着目領域を右にΔｘシフトして（ステップＳ３３）、ステップＳ２９以降の処理を繰り返す。原稿全体に着目済みである場合は（ステップＳ３２：Ｙｅｓ）、図１０のフローに移行する。 Then, the aggregation number determination unit 23 determines whether or not attention has been paid to the entire document (step S32). If the entire document has not been focused (step S32: No), the focused area is shifted to the right by Δx (step S33), and the processing from step S29 is repeated. If attention has been paid to the entire document (step S32: Yes), the flow proceeds to the flow of FIG.
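The strip scan of steps S20 to S33 can be sketched as follows. This is a minimal illustration, not the implementation in the patent: it assumes the character information is available as a list of bounding boxes, and the names `Box` and `classify_strips` are hypothetical. A strip is marked non-blank when at least one character box overlaps it.

```python
from typing import List, Tuple

# Hypothetical representation of one recognized character:
# (x0, y0, x1, y1), half-open, x = main scanning, y = sub-scanning.
Box = Tuple[int, int, int, int]


def classify_strips(boxes: List[Box], length: int, step: int, axis: int) -> List[bool]:
    """Return one flag per strip of width `step` along `axis`.

    axis=1 scans downward in steps of dy (S21 to S26);
    axis=0 scans rightward in steps of dx (S28 to S33).
    True means non-blank: some character box overlaps the strip.
    """
    flags = []
    start = 0
    while start < length:
        end = min(start + step, length)
        # A box overlaps the strip if it begins before the strip ends
        # and ends after the strip begins.
        hit = any(b[axis] < end and b[axis + 2] > start for b in boxes)
        flags.append(hit)
        start += step  # shift the region of interest by one interval
    return flags


boxes = [(10, 10, 40, 20), (60, 70, 90, 80)]
rows = classify_strips(boxes, length=100, step=25, axis=1)
cols = classify_strips(boxes, length=100, step=25, axis=0)
print(rows)  # [True, False, True, True]
print(cols)  # [True, True, True, True]
```

The interval (`step`) trades resolution for robustness: a smaller value finds narrower gutters between document images but is more easily fooled by ordinary line spacing.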

図１０に示すように、集約数判定部２３は、入力画像データの主走査方向の先頭からΔｘ分、副走査方向の先頭からΔｙ分の領域（単位領域）を着目領域とする（ステップＳ４０）。次に、集約数判定部２３は、着目領域が非空白領域であるか否かを判定する（ステップＳ４１）。着目領域が非空白領域である場合（ステップＳ４１：Ｙｅｓ）、集約数判定部２３は、第１の基準線（副走査方向半分の位置において主走査方向に延びる基準線）を対称軸として着目領域に対応する第１の対称単位領域は空白領域であるか否かを判定する（ステップＳ４２）。着目領域に対応する第１の対称単位領域が空白領域である場合（ステップＳ４２：Ｙｅｓ）、第１の対称単位領域を非空白領域に設定する（ステップＳ４３）。 As shown in FIG. 10, the aggregation number determination unit 23 sets a region (unit region) of Δx from the top in the main scanning direction and Δy from the top in the sub-scanning direction of the input image data as a region of interest (Step S40). . Next, the aggregation number determination unit 23 determines whether or not the region of interest is a non-blank region (step S41). When the attention area is a non-blank area (step S41: Yes), the aggregation number determination unit 23 uses the first reference line (a reference line extending in the main scanning direction at a half position in the sub-scanning direction) as the symmetry area. It is determined whether or not the first symmetrical unit area corresponding to is a blank area (step S42). When the first symmetric unit region corresponding to the region of interest is a blank region (step S42: Yes), the first symmetric unit region is set as a non-blank region (step S43).

一方、上述のステップＳ４１において、着目領域が空白領域である場合（ステップＳ４１：Ｎｏ）、集約数判定部２３は、着目領域に対応する第１の対称単位領域は非空白領域であるか否かを判定する（ステップＳ４４）。着目領域に対応する第１の対称単位領域が非空白領域である場合（ステップＳ４４：Ｙｅｓ）、着目領域を非空白領域に設定する（ステップＳ４５）。 On the other hand, in the above-described step S41, when the attention area is a blank area (step S41: No), the aggregation number determination unit 23 determines whether or not the first symmetric unit area corresponding to the attention area is a non-blank area. Is determined (step S44). When the first symmetrical unit area corresponding to the attention area is a non-blank area (step S44: Yes), the attention area is set as a non-blank area (step S45).

そして、集約数判定部２３は、着目領域が主走査方向の後端（行の後端）であるか否かを判定する（ステップＳ４６）。着目領域が主走査方向の後端ではない場合（ステップＳ４６：Ｎｏ）、集約数判定部２３は着目領域を右にΔｘシフトし（ステップＳ４７）、ステップＳ４１以降の処理を繰り返す。着目領域が主走査方向の後端である場合（ステップＳ４６：Ｙｅｓ）、集約数判定部２３は、着目領域が副走査方向の後端（列の後端）であるか否かを判定する（ステップＳ４８）。着目領域が副走査方向の後端ではない場合（ステップＳ４８：Ｎｏ）、集約数判定部２３は着目領域を左に主走査長分、かつ、下にΔｙシフトし（ステップＳ４９）、ステップＳ４１以降の処理を繰り返す。 Then, the aggregation number determining unit 23 determines whether or not the region of interest is the rear end (the rear end of the row) in the main scanning direction (step S46). If the region of interest is not the rear end in the main scanning direction (step S46: No), the aggregation number determination unit 23 shifts the region of interest by Δx to the right (step S47), and repeats the processing after step S41. When the region of interest is the rear end in the main scanning direction (step S46: Yes), the aggregation number determination unit 23 determines whether or not the region of interest is the rear end in the sub-scanning direction (the rear end of the column) ( Step S48). If the region of interest is not the rear end in the sub-scanning direction (step S48: No), the aggregation number determination unit 23 shifts the region of interest to the left by the main scanning length and downward by Δy (step S49), and after step S41 Repeat the process.

着目領域が副走査方向の後端である場合（ステップＳ４８：Ｙｅｓ）、集約数判定部２３は、再び、入力画像データの主走査方向の先頭からΔｘ分、副走査方向の先頭からΔｙ分の領域（単位領域）を着目領域とする（ステップＳ５０）。次に、集約数判定部２３は、着目領域が非空白領域であるか否かを判定する（ステップＳ５１）。着目領域が非空白領域である場合（ステップＳ５１：Ｙｅｓ）、集約数判定部２３は、第２の基準線（主走査方向半分の位置において副走査方向に延びる基準線）を対称軸として着目領域に対応する第２の対称単位領域は空白領域であるか否かを判定する（ステップＳ５２）。着目領域に対応する第２の対称単位領域が空白領域である場合（ステップＳ５２：Ｙｅｓ）、第２の対称単位領域を非空白領域に設定する（ステップＳ５３）。 When the region of interest is the rear end in the sub-scanning direction (step S48: Yes), the aggregation number determination unit 23 again performs Δx from the head of the input image data in the main scanning direction and Δy from the head in the sub-scanning direction. A region (unit region) is set as a region of interest (step S50). Next, the aggregation number determination unit 23 determines whether or not the region of interest is a non-blank region (step S51). When the attention area is a non-blank area (step S51: Yes), the aggregation number determination unit 23 uses the second reference line (a reference line extending in the sub-scanning direction at a half position in the main scanning direction) as the symmetry area. It is determined whether or not the second symmetric unit area corresponding to is a blank area (step S52). If the second symmetric unit region corresponding to the region of interest is a blank region (step S52: Yes), the second symmetric unit region is set as a non-blank region (step S53).

一方、上述のステップＳ５１において、着目領域が空白領域である場合（ステップＳ５１：Ｎｏ）、集約数判定部２３は、着目領域に対応する第２の対称単位領域は非空白領域であるか否かを判定する（ステップＳ５４）。着目領域に対応する第２の対称単位領域が非空白領域である場合（ステップＳ５４：Ｙｅｓ）、着目領域を非空白領域に設定する（ステップＳ５５）。 On the other hand, in the above-described step S51, when the target area is a blank area (step S51: No), the aggregation number determination unit 23 determines whether the second symmetric unit area corresponding to the target area is a non-blank area. Is determined (step S54). When the second symmetrical unit region corresponding to the region of interest is a non-blank region (step S54: Yes), the region of interest is set as a non-blank region (step S55).

そして、集約数判定部２３は、着目領域が主走査方向の後端であるか否かを判定する（ステップＳ５６）。着目領域が主走査方向の後端ではない場合（ステップＳ５６：Ｎｏ）、集約数判定部２３は着目領域を右にΔｘシフトし（ステップＳ５７）、ステップＳ５１以降の処理を繰り返す。着目領域が主走査方向の後端である場合（ステップＳ５６：Ｙｅｓ）、集約数判定部２３は、着目領域が副走査方向の後端であるか否かを判定する（ステップＳ５８）。着目領域が副走査方向の後端ではない場合（ステップＳ５８：Ｎｏ）、集約数判定部２３は着目領域を左に主走査長分、かつ、下にΔｙシフトし（ステップＳ５９）、ステップＳ５１以降の処理を繰り返す。着目領域が副走査方向の後端である場合（ステップＳ５８：Ｙｅｓ）、図１１のフローに移行する。集約数判定部２３は、非空白領域であると判定した単位領域の集合を、被集約領域（原稿画像の領域）であると判定することができる。なお、ステップＳ４０〜ステップＳ４９までの一連の処理と、ステップＳ５０〜ステップＳ５９までの一連の処理の順番は反対であってもよい。 Then, the aggregation number determination unit 23 determines whether or not the region of interest is the rear end in the main scanning direction (step S56). If the region of interest is not the rear end in the main scanning direction (step S56: No), the aggregation number determination unit 23 shifts the region of interest to the right by Δx (step S57), and repeats the processing after step S51. When the region of interest is the rear end in the main scanning direction (step S56: Yes), the aggregation number determination unit 23 determines whether the region of interest is the rear end in the sub-scanning direction (step S58). If the region of interest is not the rear end in the sub-scanning direction (step S58: No), the aggregation number determination unit 23 shifts the region of interest to the left by the main scanning length and downward by Δy (step S59), and after step S51 Repeat the process. When the region of interest is the rear end in the sub-scanning direction (step S58: Yes), the flow proceeds to the flow in FIG. The aggregation number determination unit 23 can determine that a set of unit areas determined to be a non-blank area is an area to be aggregated (original image area). Note that the order of the series of processes from step S40 to step S49 and the series of processes from step S50 to step S59 may be reversed.
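The two mirror passes (S40 to S49 and S50 to S59) amount to taking a logical OR of each unit region with its mirror image. The sketch below assumes the blank/non-blank result is held as a grid of booleans; `Grid`, `fold_sub_scan`, and `fold_main_scan` are illustrative names, not terms from the patent.

```python
from typing import List

Grid = List[List[bool]]  # grid[row][col]; True = non-blank unit region


def fold_sub_scan(grid: Grid) -> Grid:
    """Mirror about the first reference line (horizontal, at half the
    sub-scanning length) and OR each cell with its mirror (S40 to S49)."""
    n = len(grid)
    return [
        [grid[r][c] or grid[n - 1 - r][c] for c in range(len(grid[r]))]
        for r in range(n)
    ]


def fold_main_scan(grid: Grid) -> Grid:
    """Mirror about the second reference line (vertical, at half the
    main scanning length) and OR each cell with its mirror (S50 to S59)."""
    m = len(grid[0])
    return [[row[c] or row[m - 1 - c] for c in range(m)] for row in grid]


T, F = True, False
grid = [
    [T, T, F, F, F, F],
    [F, F, F, F, F, F],
    [F, F, F, F, F, F],
]
folded = fold_main_scan(fold_sub_scan(grid))
for row in folded:
    print("".join("#" if cell else "." for cell in row))
# ##..##
# ......
# ##..##
```

Because OR is commutative, applying the two folds in either order gives the same grid, which is consistent with the note that the two series of steps may be swapped.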

図１１に示すように、集約数判定部２３は、被集約領域として判定した領域の間隔が規定値より狭いか否かを判定する（ステップＳ６１）。領域間の間隔が規定値よりも狭い場合（ステップＳ６１：Ｙｅｓ）、領域間を非空白領域として領域同士を合成する（ステップＳ６２）。 As illustrated in FIG. 11, the aggregation number determination unit 23 determines whether or not the interval between the areas determined as the aggregated areas is narrower than a specified value (step S61). If the interval between the regions is narrower than the specified value (step S61: Yes), the regions are combined with each other as a non-blank region (step S62).

次に、集約数判定部２３は、各領域（被集約領域）に凹みがあるか否かを判定する（ステップＳ６３）。各領域に凹みがある場合（ステップＳ６３：Ｙｅｓ）、凹み部を非空白領域にして領域形状を長方形に補正する（ステップＳ６４）。次に、集約数判定部２３は、各領域に凸部があるか否かを判定する（ステップＳ６５）。各領域に凸部がある場合（ステップＳ６５：Ｙｅｓ）、集約数判定部２３は、凸部に合わせて領域を長方形に補正する（ステップＳ６６）。 Next, the aggregation number determination unit 23 determines whether or not each area (aggregated area) has a dent (step S63). When there is a dent in each area (step S63: Yes), the dent is made a non-blank area and the area shape is corrected to a rectangle (step S64). Next, the aggregation number determination unit 23 determines whether or not each region has a convex portion (step S65). When there is a convex portion in each region (step S65: Yes), the aggregation number determination unit 23 corrects the region to a rectangle according to the convex portion (step S66).

次に、集約数判定部２３は、各領域の主走査幅を最大の領域に合わせる補正を行う（ステップＳ６７）。次に、集約数判定部２３は、各領域の副走査幅を最大の領域に合わせる補正を行う（ステップＳ６８）。 Next, the aggregation number determination unit 23 performs correction to match the main scanning width of each region with the maximum region (step S67). Next, the aggregation number determination unit 23 performs correction to match the sub-scanning width of each area with the maximum area (step S68).
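The shape corrections of steps S63 to S68 can be illustrated with a small sketch. It assumes each aggregated area is a set of unit-region coordinates, and that growing a rectangle keeps its top-left corner fixed; the patent does not say which edge moves, so that choice, like the names `bounding_rect` and `equalize`, is an assumption. The gap-merging of S61 and S62 is not covered here.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]            # (row, col) of a non-blank unit region
Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def bounding_rect(cells: Set[Cell]) -> Rect:
    """Fill dents and square off protrusions by taking the bounding
    rectangle of the area (S63 to S66)."""
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def equalize(rects: List[Rect]) -> List[Rect]:
    """Match every rectangle to the largest main scanning width and
    sub-scanning height found (S67, S68). Assumption: top-left is kept."""
    height = max(b - t for t, _, b, _ in rects)
    width = max(r - l for _, l, _, r in rects)
    return [(t, l, t + height, l + width) for t, l, _, _ in rects]


area_a = {(0, 0), (0, 1), (1, 0)}          # L-shaped: dent at (1, 1)
area_b = {(0, 3), (0, 4), (0, 5), (1, 3)}  # wider area with a dent
rects = [bounding_rect(area_a), bounding_rect(area_b)]
print(rects)            # [(0, 0, 1, 1), (0, 3, 1, 5)]
print(equalize(rects))  # [(0, 0, 1, 2), (0, 3, 1, 5)]
```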

次に、ページ番号判定処理の具体的な内容について説明する。図１２は、横並びの６枚集約の原稿から読み取った画像データ（入力画像データ）のイメージ図である。集約された各原稿画像には、タイトルの章番号、箇条書きの番号、ページ番号などの、数を表す文字情報である順序文字情報が含まれている。 Next, specific contents of the page number determination process will be described. FIG. 12 is an image diagram of image data (input image data) read from six originals arranged side by side. Each collected document image includes order character information, which is character information representing a number, such as a chapter number of a title, an itemized number, and a page number.

ページ番号判定部１０１は、集約判定処理で判定された複数の原稿画像の各々の形状（各被集約領域の形状）と、文字の印字方向とから最初の（先頭の）原稿画像を決定する。例えば各被集約領域の形状が長方形で、文字の印字方向が横方向（つまり横書き）の場合は、最初の原稿画像は左上で、そこから右方向に次のページの原稿画像が配置されるか、そこから下方向に次のページの原稿画像が配置されるのが一般的であり、最初の原稿画像が右上で、そこから下方向や左方向に次のページの原稿画像が配置されるという並びにはならない。これを利用して、先頭の原稿画像の次のページの候補となる原稿画像を示す次ページ候補原稿画像として、対象原稿画像の右隣に配置された原稿画像と、対象原稿画像の直下に配置された原稿画像を選択することができる。 The page number determination unit 101 determines the first document image from the shape of each of the document images determined by the aggregation determination process (the shape of each aggregated area) and the character printing direction. For example, when each aggregated area is rectangular and the characters are printed horizontally (horizontal writing), the first document image is generally at the upper left, and the document image of the next page is arranged either to its right or below it; an arrangement in which the first document image is at the upper right and the following pages run downward or to the left does not occur. Using this, the document image to the right of the target document image and the document image directly below it can be selected as next page candidate document images, that is, candidates for the page that follows the first document image.

次に、ページ番号判定部１０１は、先頭の原稿画像を対象原稿画像とし、対象原稿画像に含まれる順序文字情報である対象順序文字情報を特定する。そして、次ページ候補原稿画像のうち対象順序文字情報に対応する位置に、対象順序文字情報が示す数よりも大きく、かつ、同じ種類の数を示す順序文字情報が存在する場合は、対象順序文字情報と、次ページ候補原稿画像のうち対象順序文字情報に対応する位置に存在する順序文字情報とを組み合わせとする。図１２の例では、先頭ページである左上の原稿画像（対象原稿画像）の左上に配置された「１．１」という章番号や、右下に配置された「３」というページ番号については、次ページ候補原稿画像である右隣の原稿画像や直下の原稿画像の各々の対応する位置にも、それらを表す順序文字情報が存在するので、次ページ候補原稿画像のうち、対象原稿画像に含まれる順序文字情報（対象順序文字情報）に対応する位置に、対象順序文字情報が示す数よりも大きく、かつ、同じ種類の数を示す順序文字情報が存在する場合は、対象順序文字情報と、次ページ候補原稿画像のうち対象順序文字情報に対応する位置に存在する順序文字情報と、を組み合わせとする。 Next, the page number determination unit 101 takes the first document image as the target document image and identifies the order character information contained in it (the target order character information). When a next page candidate document image contains, at the position corresponding to the target order character information, order character information that indicates a larger number of the same kind, the target order character information and that order character information are treated as a combination. In the example of FIG. 12, the chapter number "1.1" at the upper left and the page number "3" at the lower right of the upper-left document image (the first page, that is, the target document image) each have corresponding order character information at the same positions in the next page candidate document images to the right and directly below. Each such pair is therefore treated as a combination, provided the candidate's number is larger than, and of the same kind as, the number indicated by the target order character information.

ページ番号判定部１０１は、対象原稿画像と次ページ候補原稿画像との組ごとに、複数の組み合わせを見つけていき、対象原稿画像の次のページとなる次ページ候補原稿画像を判定する。判定基準は、単純に組み合わせの数が多い方の次ページ候補原稿画像を対象原稿画像の次のページとして判定してもよいし、組み合わせの方向（横方向、縦方向）に応じた重み付けを行って、組み合わせごとに評価値を算出し、評価値の総和が大きい方の次ページ候補原稿画像を対象原稿画像の次のページとして判定してもよい。例えば横長の用紙には横方向の配列を使用する可能性が高い場合は、横方向の組み合わせの評価値が縦方向の組み合わせの評価値よりも高くなるように重み付けを行ってもよい。ただし、これに限らず、例えばユーザの操作に応じて重み付けを行ってもよい。また、次ページ候補原稿画像のうち対象順序文字情報に対応する位置に、対象順序文字情報が示す数よりも１だけ大きい数を示す順序文字情報が存在する場合は、その組み合わせの評価値が高くなるように重み付けを行うこともできる。以上のようにして、ページ番号判定部１０１は、対象原稿画像を切り替えながら、対象原稿画像の次のページの原稿画像を判定することで、複数の原稿画像のページ順序を判定する。 The page number determination unit 101 looks for combinations for each pair of the target document image and a next page candidate document image, and determines which next page candidate document image is the page that follows the target document image. As the criterion, the candidate with the larger number of combinations may simply be chosen; alternatively, each combination may be given an evaluation value weighted by its direction (horizontal or vertical), and the candidate with the larger total may be chosen. For example, when a horizontal arrangement is more likely on landscape paper, horizontal combinations may be weighted above vertical ones. The weighting is not limited to this and may, for example, follow a user operation. A combination may also be weighted higher when the candidate's order character information, at the position corresponding to the target order character information, indicates a number exactly one larger than the target's. In this way, the page number determination unit 101 determines the page order of the document images by determining the next page for each target document image while switching the target document image.
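One way to read the weighted criterion is sketched below. The weight values, the candidate labels "right" and "below", and the function names `score` and `pick_next` are all assumptions made for illustration; the patent only states that direction and an increment of exactly one may raise the evaluation value.

```python
from typing import Dict, List


def score(increments: List[int], direction_weight: float = 1.0,
          step_one_bonus: float = 2.0) -> float:
    """Sum the evaluation values of one candidate's combinations.

    `increments` holds, for each matched position, the candidate's number
    minus the target's number. Non-positive values are not combinations.
    """
    total = 0.0
    for inc in increments:
        if inc <= 0:
            continue  # not larger than the target: no combination
        base = step_one_bonus if inc == 1 else 1.0
        total += direction_weight * base
    return total


def pick_next(candidates: Dict[str, List[int]], weights: Dict[str, float]) -> str:
    """Return the candidate whose combinations have the largest total."""
    return max(candidates, key=lambda name: score(candidates[name], weights.get(name, 1.0)))


# Landscape sheet: favor the right-hand neighbor slightly (assumed weights).
candidates = {"right": [1, 1], "below": [3, 2]}
weights = {"right": 1.2, "below": 1.0}
print(pick_next(candidates, weights))  # right
```

With all weights set to 1.0 and `step_one_bonus` set to 1.0, the function reduces to the simpler criterion of counting combinations.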

図１３は、ページ番号判定処理のフローチャートである。図１３に示すように、まずページ番号判定部１０１は、最初の（先頭の）原稿画像に着目する（ステップＳ７０）。以下の説明では、着目した原稿画像を対象原稿画像と称する。次に、ページ番号判定部１０１は、対象原稿画像内の何れかの順序文字情報に着目する（ステップＳ７１）。以下の説明では、対象原稿画像内の着目している順序文字情報を、対象順序文字情報と称する。ページ番号判定部１０１は、対象原稿画像の次のページの候補となる原稿画像を示す次ページ候補原稿画像のうち対象順序文字情報と同じ位置に、順序文字情報が存在するか否かを判定する（ステップＳ７２）。順序文字情報が存在しない場合（ステップＳ７２：Ｎｏ）、関連性は無い（組み合わせにはならない）と判定する（ステップＳ７８）。 FIG. 13 is a flowchart of the page number determination process. As shown in FIG. 13, the page number determination unit 101 first focuses on the first (first) document image (step S70). In the following description, the focused document image is referred to as a target document image. Next, the page number determination unit 101 pays attention to any order character information in the target document image (step S71). In the following description, the target order character information in the target document image is referred to as target order character information. The page number determination unit 101 determines whether or not the order character information exists at the same position as the target order character information in the next page candidate document image indicating the document image that is a candidate for the next page of the target document image. (Step S72). When the order character information does not exist (step S72: No), it is determined that there is no relevance (cannot be combined) (step S78).

ステップＳ７２において、順序文字情報が存在する場合は（ステップＳ７２：Ｙｅｓ）、その順序文字情報が示す数の種類と、対象順序文字情報が示す数の種類は一致しているか否かを判定する（ステップＳ７３）。種類が一致しない場合は（ステップＳ７３：Ｎｏ）、関連性は無いと判定する（ステップＳ７８）。種類が一致する場合は（ステップＳ７３：Ｙｅｓ）、対象順序文字情報と同じ位置に存在する順序文字情報が示す数は、対象順序文字情報が示す数よりも増加しているか否かを判定する（ステップＳ７４）。増加していない場合は（ステップＳ７４：Ｎｏ）、関連性は無いと判定する（ステップＳ７８）。増加している場合は（ステップＳ７４：Ｙｅｓ）、その増加量は１であるか否かを判定する（ステップＳ７５）。増加量が１ではない場合（ステップＳ７５：Ｎｏ）、対象順序文字情報と、次ページ候補原稿画像のうち対象順序文字情報と同じ位置に存在する順序文字情報とは組み合わせになるものの、関連性は弱いと判定する（ステップＳ７６）。増加量が１である場合（ステップＳ７５：Ｙｅｓ）、対象順序文字情報と、次ページ候補原稿画像のうち対象順序文字情報と同じ位置に存在する順序文字情報とは組み合わせになり、関連性は強いと判定する（ステップＳ７７）。 In step S72, when the order character information exists (step S72: Yes), it is determined whether the number type indicated by the order character information matches the number type indicated by the target order character information ( Step S73). If the types do not match (step S73: No), it is determined that there is no relationship (step S78). If the types match (step S73: Yes), it is determined whether or not the number indicated by the order character information existing at the same position as the target order character information is greater than the number indicated by the target order character information ( Step S74). If not increased (step S74: No), it is determined that there is no relevance (step S78). If it has increased (step S74: Yes), it is determined whether or not the increase amount is 1 (step S75). When the increase amount is not 1 (step S75: No), the target order character information and the order character information existing at the same position as the target order character information in the next page candidate document image are combined, but the relevance is It is determined that it is weak (step S76). When the increase amount is 1 (step S75: Yes), the target order character information and the order character information existing at the same position as the target order character information in the next page candidate document image are combined, and the relationship is strong. (Step S77).

そして、ページ番号判定部１０１は、対象原稿画像内の全ての順序文字情報に着目したか否かを判定する（ステップＳ７９）。全ての順序文字情報に着目していない場合（ステップＳ７９：Ｎｏ）、ページ番号判定部１０１は、対象順序文字情報を更新して（ステップＳ８０）、ステップＳ７２以降の処理を繰り返す。全ての順序文字情報に着目した場合（ステップＳ７９：Ｙｅｓ）、ページ番号判定部１０１は、全ての原稿画像に着目したか否かを判定する（ステップＳ８１）。全ての原稿画像に着目していない場合（ステップＳ８１：Ｎｏ）、ページ番号判定部１０１は、対象原稿画像を更新して（ステップＳ８２）、ステップＳ７１以降の処理を繰り返す。全ての原稿画像に着目した場合（ステップＳ８１：Ｙｅｓ）、ページ番号判定部１０１は、組み合わせの数、関連性の強弱から、複数の原稿画像のページ順序を判定する（ステップＳ８３）。 Then, the page number determination unit 101 determines whether or not attention is paid to all the order character information in the target document image (step S79). When not paying attention to all the order character information (step S79: No), the page number determination unit 101 updates the target order character information (step S80), and repeats the processing after step S72. When attention is paid to all the order character information (step S79: Yes), the page number determination unit 101 determines whether or not attention is paid to all document images (step S81). When not paying attention to all the original images (step S81: No), the page number determination unit 101 updates the target original image (step S82) and repeats the processing after step S71. When all document images are focused (step S81: Yes), the page number determination unit 101 determines the page order of a plurality of document images based on the number of combinations and the strength of relevance (step S83).
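The decision chain of steps S72 to S78 maps directly onto a small function. In this sketch an order marker is modeled as a `(kind, value)` pair, for example `("page", 3)`; that representation and the names `Relation`, `Marker`, and `classify` are hypothetical.

```python
from enum import Enum
from typing import Optional, Tuple


class Relation(Enum):
    NONE = 0    # S78: no relevance, not a combination
    WEAK = 1    # S76: combination, increase other than 1
    STRONG = 2  # S77: combination, increase of exactly 1


Marker = Tuple[str, int]  # (kind of number, value), e.g. ("page", 3)


def classify(target: Marker, candidate: Optional[Marker]) -> Relation:
    """Compare the target's marker with the marker found at the same
    position in a next page candidate (None if nothing is there)."""
    if candidate is None:             # S72: No
        return Relation.NONE
    if candidate[0] != target[0]:     # S73: kinds differ
        return Relation.NONE
    increase = candidate[1] - target[1]
    if increase <= 0:                 # S74: not increased
        return Relation.NONE
    return Relation.STRONG if increase == 1 else Relation.WEAK  # S75


print(classify(("page", 3), ("page", 4)))     # Relation.STRONG
print(classify(("page", 3), ("page", 6)))     # Relation.WEAK
print(classify(("page", 3), ("chapter", 4)))  # Relation.NONE
```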

次に、目次内容判定処理の具体的な内容について説明する。図１４は、横並びの６枚集約の原稿から読み取った画像データ（入力画像データ）のイメージ図であり、先頭のページ（この例では左上の原稿画像）が目次のページになっている。 Next, specific contents of the table of contents determination process will be described. FIG. 14 is an image diagram of image data (input image data) read from six documents arranged side by side. The first page (the upper left document image in this example) is the table of contents page.

図１５は、目次内容判定処理のイメージ図である。この例では、目次内容判定部１０２は、先頭のページ、または、先頭のページの次のページの候補となる原稿画像を示す次ページ候補原稿画像の中に、「目次」や「アジェンダ」などの、予め定められた目次用の文字列を表す１以上の文字情報が存在するか否かを判定し、目次用の文字列を表す１以上の文字情報を含む原稿画像を、目次のページであると判定する。また、必要であれば、画面内のレイアウトの比較を他の原稿画像と行って目次のページを判定してもよい。図１５の例では、先頭のページが目次のページであると判定される。そして、目次内容判定部１０２は、目次のページに含まれる文字情報に基づいて、目次のページに含まれる複数の文字列と登場順序とを対応付けた目次情報を生成し、目次情報と、目次のページ以外の原稿画像に含まれる文字情報とに基づいて、複数のページ順序を判定する。 FIG. 15 is an image diagram of the table of contents determination process. In this example, the table of contents determination unit 102 includes “table of contents”, “agenda”, and the like in the next page candidate document image indicating a document image that is a candidate for the first page or the next page of the first page. Then, it is determined whether or not one or more character information representing a predetermined character string for the table of contents exists, and a document image including one or more character information representing the character string for the table of contents is a table of contents page. Is determined. If necessary, the table of contents page may be determined by comparing the layouts in the screen with other document images. In the example of FIG. 15, it is determined that the first page is the table of contents page. Then, the table of contents determination unit 102 generates table of contents information in which a plurality of character strings included in the page of the table of contents and the order of appearance are associated with each other based on the character information included in the page of the table of contents. The order of a plurality of pages is determined based on the character information included in the document image other than the current page.

図１６は、目次判定処理のフローチャートである。まず目次内容判定部１０２は、先頭の原稿画像に着目する（ステップＳ９０）。説明の便宜上、この例では、図１５の左上に配置された原稿画像が先頭の原稿画像であり、その右隣および直下に配置された原稿画像は次のページの候補となる原稿画像を示す次ページ候補原稿画像であることを前提とする。以下の説明では、この２つの次ページ候補原稿画像のうちの一方を「第１の次ページ候補原稿画像」、他方を「第２の次ページ候補原稿画像」と称する。また、以下の説明では、着目した原稿画像を「対象原稿画像」と称する。 FIG. 16 is a flowchart of the table of contents determination process. First, the table of contents determination unit 102 pays attention to the leading document image (step S90). For the sake of convenience of explanation, in this example, the document image arranged at the upper left in FIG. 15 is the first document image, and the document image arranged immediately to the right and directly below is the next document image that is a candidate for the next page. It is assumed that it is a page candidate document image. In the following description, one of the two next page candidate document images is referred to as a “first next page candidate document image”, and the other is referred to as a “second next page candidate document image”. In the following description, the focused document image is referred to as “target document image”.

次に、目次内容判定部１０２は、対象原稿画像の中に、目次用の文字列を表す１以上の文字情報があるか否かを判定する（ステップＳ９１）。目次用の文字列を表す１以上の文字情報がない場合（ステップＳ９１：Ｎｏ）、処理は後述のステップＳ９５に移行する。目次用の文字列を表す１以上の文字情報がある場合（ステップＳ９１：Ｙｅｓ）、対象原稿画像の文字の印字方向が横方向（横書き）であるか否かを判定する（ステップＳ９２）。対象原稿画像の文字の印字方向が横方向である場合（ステップＳ９２：Ｙｅｓ）、目次内容判定部１０２は、対象原稿画像の副走査方向の上から順に見ていき、登場する文字列と、その登場順序とを対応付けた目次情報を生成して保存する（ステップＳ９３）。また、対象原稿画像の文字の印字方向が横方向ではない場合（ステップＳ９２：Ｎｏ）、つまり、縦書きの場合、目次内容判定部１０２は、対象原稿画像の主走査方向の右から順に見ていき、登場する文字列と、その登場順序とを対応付けた目次情報を生成して保存する（ステップＳ９４）。 Next, the table of contents determination unit 102 determines whether or not there is one or more character information representing a character string for the table of contents in the target document image (step S91). If there is no one or more pieces of character information representing a table of contents character string (step S91: No), the process proceeds to step S95 described later. If there is one or more character information representing a character string for the table of contents (step S91: Yes), it is determined whether or not the character printing direction of the target document image is horizontal (horizontal writing) (step S92). When the character printing direction of the target document image is the horizontal direction (step S92: Yes), the table of contents determination unit 102 sequentially looks from the top of the target document image in the sub-scanning direction, The table of contents information associated with the appearance order is generated and stored (step S93). If the character printing direction of the target document image is not the horizontal direction (step S92: No), that is, in the case of vertical writing, the table of contents determination unit 102 sequentially looks from the right in the main scanning direction of the target document image. Then, the table of contents information in which the character string that appears and the order of appearance is associated is generated and stored (step S94).

次に、目次内容判定部１０２は、第１の次ページ候補原稿画像に着目済みであるか否かを判定する（ステップＳ９５）。第１の次ページ候補原稿画像に着目済みではない場合（ステップＳ９５：Ｎｏ）、目次内容判定部１０２は、第１の次ページ候補原稿画像に着目して（ステップＳ９６）、ステップＳ９１以降の処理を繰り返す。第１の次ページ候補原稿画像に着目済みである場合（ステップＳ９５：Ｙｅｓ）、目次内容判定部１０２は、第２の次ページ候補原稿画像に着目済みであるか否かを判定する（ステップＳ９７）。第２の次ページ候補原稿画像に着目済みではない場合（ステップＳ９７：Ｎｏ）、第２の次ページ候補原稿画像に着目して（ステップＳ９８）、ステップＳ９１以降の処理を繰り返す。第２の次ページ候補領域に着目済みである場合（ステップＳ９７：Ｙｅｓ）、目次内容判定部１０２は、目次のページ以外の原稿画像の中に、目次情報に含まれる文字列が存在するか否かを判定し（ステップＳ９９）、目次情報に含まれる文字列が存在する場合は（ステップＳ９９：Ｙｅｓ）、その文字列の登場順序に従って、原稿画像のページの順序を判定する（ステップＳ１００）。 Next, the table of contents determination unit 102 determines whether or not attention has been paid to the first next page candidate document image (step S95). When attention has not been paid to the first next page candidate document image (step S95: No), the table of contents determination unit 102 pays attention to the first next page candidate document image (step S96), and the processing after step S91. repeat. When attention has been paid to the first next page candidate document image (step S95: Yes), the table of contents determination unit 102 determines whether attention has been paid to the second next page candidate document image (step S97). ). When attention has not been paid to the second next page candidate document image (Step S97: No), attention is paid to the second next page candidate document image (Step S98), and the processes after Step S91 are repeated. When attention has been paid to the second next page candidate area (step S97: Yes), the table of contents determination unit 102 determines whether or not a character string included in the table of contents information exists in the document image other than the page of the table of contents. (Step S99), and if there is a character string included in the table of contents information (step S99: Yes), the page order of the original image is determined according to the appearance order of the character string (step S100).
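Steps S93 and S99 to S100 can be sketched as building a lookup from table-of-contents strings to their order of appearance and then sorting the other pages by the earliest entry they contain. The page labels, the exact-match comparison, and the names `build_toc` and `order_pages` are assumptions; a real system would likely need tolerant matching of OCR output.

```python
from typing import Dict, List


def build_toc(lines: List[str]) -> Dict[str, int]:
    """Map each table-of-contents string to its order of appearance (S93/S94)."""
    return {text: i for i, text in enumerate(lines)}


def order_pages(toc: Dict[str, int], pages: Dict[str, List[str]]) -> List[str]:
    """Order pages by the earliest table-of-contents string found in each (S99, S100).

    Pages that contain no table-of-contents string are left out of the result.
    """
    ranked = []
    for name, strings in pages.items():
        hits = [toc[s] for s in strings if s in toc]
        if hits:
            ranked.append((min(hits), name))
    return [name for _, name in sorted(ranked)]


toc = build_toc(["Overview", "Design", "Results"])
pages = {
    "B": ["Results", "Figure 2"],
    "A": ["Overview"],
    "C": ["Design", "Notes"],
    "D": ["Appendix"],
}
print(order_pages(toc, pages))  # ['A', 'C', 'B']
```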

Next, the specific content of the table of contents reuse determination process will be described. FIG. 17 is a schematic diagram of image data (input image data) read from a sheet on which six pages are aggregated side by side. The first page (in this example, the upper-left document image) is the table of contents page, and among the subsequent pages there is a document image in which the character strings it contains and their order of appearance match the table of contents information, and in which the color of any one of the character strings included in the table of contents information has been changed. It is common practice to reuse table of contents information and change the color of some character strings (for example, making only one character string red), or to leave the color of some character strings unchanged (for example, lightening the color of every character string other than those), and likewise to change the size. When, among the document images other than the table of contents page, there is a document image whose character strings and order of appearance match the table of contents information, and in which the size or color of any one of the character strings included in the table of contents information has been changed or has been left unchanged, the table of contents reuse determination unit 103 determines the page order of that document image based on the order of appearance of that one character string.

FIG. 18 is a flowchart of the table of contents reuse determination process. First, the table of contents reuse determination unit 103 focuses on the first document image (step S110). For convenience of explanation, this example assumes that the document image arranged at the upper left in FIG. 6 is the first document image, and that the document images arranged immediately to its right and immediately below it are next page candidate document images, that is, document images that are candidates for the next page. In the following description, one of these two next page candidate document images is referred to as the "first next page candidate document image" and the other as the "second next page candidate document image". The document image currently being focused on is referred to as the "target document image".

Next, the table of contents reuse determination unit 103 determines whether the target document image contains one or more pieces of character information representing a character string for a table of contents (step S111). If it does not (step S111: No), the process proceeds to step S113 described later. If it does (step S111: Yes), the table of contents reuse determination unit 103 determines that the target document image is the table of contents page, and generates and stores table of contents information in which each character string appearing in the target document image is associated with its order of appearance (step S112).

Next, the table of contents reuse determination unit 103 determines whether the table of contents page has been identified (step S113). First, the case where the table of contents page has been identified (step S113: Yes) will be described. In this case, the table of contents reuse determination unit 103 determines whether the first next page candidate document image has already been focused on (step S114). If it has not (step S114: No), the table of contents reuse determination unit 103 focuses on the first next page candidate document image (step S115) and repeats the processing from step S111 onward. If it has (step S114: Yes), the table of contents reuse determination unit 103 determines whether the second next page candidate document image has already been focused on (step S116). If it has not (step S116: No), the unit focuses on the second next page candidate document image (step S117) and repeats the processing from step S111 onward. If the second next page candidate document image has already been focused on (step S116: Yes), the table of contents reuse determination unit 103 determines that there is no table of contents page (step S118) and ends the process.

Next, the case where the table of contents page has not been identified (step S113: No) will be described. In this case, the table of contents reuse determination unit 103 determines whether, among the document images other than the table of contents page, there is a document image whose character strings and order of appearance match the table of contents information (step S119). If there is such a document image (step S119: Yes), the table of contents reuse determination unit 103 determines that the table of contents information has been reused, and focuses on that document image (step S120).

Next, the table of contents reuse determination unit 103 determines whether, in the document image focused on in step S120 (the target document image), the size or color of any one of the character strings included in the table of contents information has been changed (step S121). If the size or color of any one character string has been changed (step S121: Yes), the table of contents reuse determination unit 103 determines the page order of the target document image based on the order of appearance of that one character string (step S123). If the determination in step S121 is negative (step S121: No), the table of contents reuse determination unit 103 determines whether, in the target document image, the size or color of any one of the character strings included in the table of contents information has been left unchanged (step S122). If the size or color of any one character string has been left unchanged (step S122: Yes), the table of contents reuse determination unit 103 determines the page order of the target document image based on the order of appearance of that one character string (step S123).
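The reuse check in steps S119 to S123 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the embodiment's implementation: the `StyledString` type and both function names are hypothetical, and the two cases of steps S121 and S122 (one string changed, or one string left unchanged while the others are lightened) are both reduced to the single test "exactly one string has a style that no other string shares".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StyledString:
    """Hypothetical OCR result: recognised text plus its color and size."""
    text: str
    color: str
    size: int


def highlighted_index(strings: List[StyledString]) -> Optional[int]:
    """Return the 1-based position of the one string whose style is unique.

    Covers both S121 (one string changed) and S122 (one string left unchanged
    while the rest were altered). Returns None when no single string stands out.
    """
    styles = [(s.color, s.size) for s in strings]
    unique = [i for i, style in enumerate(styles) if styles.count(style) == 1]
    if len(strings) >= 2 and len(unique) == 1:
        return unique[0] + 1
    return None


def page_order_from_reuse(toc_texts: List[str], page: List[StyledString]) -> Optional[int]:
    """S119: require the same strings in the same order; S121-S123: use the highlight."""
    if [s.text for s in page] != toc_texts:
        return None
    return highlighted_index(page)


toc_texts = ["Overview", "Design", "Results"]
one_red = [
    StyledString("Overview", "black", 10),
    StyledString("Design", "red", 10),
    StyledString("Results", "black", 10),
]
others_lightened = [
    StyledString("Overview", "gray", 10),
    StyledString("Design", "gray", 10),
    StyledString("Results", "black", 10),
]
```

Here `page_order_from_reuse(toc_texts, one_red)` yields 2 and `page_order_from_reuse(toc_texts, others_lightened)` yields 3, because the second and third entries respectively are the ones that stand out.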

Next, the specific content of the missing page determination process will be described. FIG. 19 is a schematic diagram of image data read from the last-page sheet among the plurality of sheets obtained by aggregate printing, in a case where the number of original documents before aggregation is a multiple of 6 plus 4. When, in the image data read from the last-page sheet, at least one of the plurality of document image areas (aggregated areas) corresponding one-to-one to the number of document images indicated by the aggregation number determined by the aggregation number determination unit 23 is a blank area, the missing page determination unit 104 performs a missing page determination process that determines the page order of the plurality of document images according to the position of that blank area.
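One way to read the missing page determination is that the blank areas on the last sheet must be the final positions of the layout order, which rules out layout orders that would leave a blank in the middle. The Python sketch below illustrates that reading only. The text above does not spell out the rule, so the two candidate layouts (row-major and column-major, both starting at the upper left) and all names are assumptions.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # (row, column), with (0, 0) at the upper left


def candidate_orders(rows: int, cols: int) -> Dict[str, List[Cell]]:
    """Assumed candidate layouts: left-to-right rows, or top-to-bottom columns."""
    return {
        "row_major": [(r, c) for r in range(rows) for c in range(cols)],
        "column_major": [(r, c) for c in range(cols) for r in range(rows)],
    }


def orders_consistent_with_blanks(rows: int, cols: int, blank: Set[Cell]) -> List[str]:
    """Keep the layouts whose final len(blank) cells are exactly the blank cells."""
    n = len(blank)
    result = []
    for name, order in candidate_orders(rows, cols).items():
        if n and set(order[-n:]) == blank:
            result.append(name)
    return result


# Last sheet of a 6-up job (2 rows x 3 columns) holding 4 pages and 2 blank areas.
bottom_right_blank = orders_consistent_with_blanks(2, 3, {(1, 1), (1, 2)})
right_column_blank = orders_consistent_with_blanks(2, 3, {(0, 2), (1, 2)})
```

With two blanks at the end of the bottom row, only the row-major layout remains. With the whole right column blank, only the column-major layout remains. A single blank in the lower-right corner is consistent with both, so in that case other cues would still be needed.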

Next, the specific content of the cross-page character determination process will be described. When running text, rather than presentation material or the like, is aggregated, character strings often span a page boundary. Focusing on this feature makes it possible to increase the accuracy of determining the page arrangement order. In the example of FIG. 20, the compound term "image forming apparatus" appears in the document. In this process, character strings on a page that consist of at least a certain number of characters are stored in advance as cross-page candidate character strings. If a character string corresponding to part of a stored cross-page candidate character string is found at the trailing end of the document image being focused on (the target document image), the character string at the beginning of a next page candidate document image, that is, a document image that is a candidate for the next page of the target document image, is checked. If it matches the remaining part, it can be determined that the target document image and the next page candidate document image are continuous. The arrangement order can be determined using this result.
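The continuity check described above can be sketched in Python as follows. The function names and the `min_len` threshold are hypothetical, and plain exact string matching stands in for whatever comparison the character recognition results would actually require.

```python
from typing import Iterable, List


def select_candidates(strings: Iterable[str], min_len: int) -> List[str]:
    """Store strings with at least min_len characters as cross-page candidates."""
    return [s for s in strings if len(s) >= min_len]


def spans_pages(candidate: str, tail_text: str, head_text: str) -> bool:
    """True if some split of candidate ends tail_text and begins head_text."""
    for cut in range(1, len(candidate)):
        if tail_text.endswith(candidate[:cut]) and head_text.startswith(candidate[cut:]):
            return True
    return False


def is_continuous(candidates: List[str], tail_text: str, head_text: str) -> bool:
    """The two pages are treated as consecutive if any candidate spans the boundary."""
    return any(spans_pages(c, tail_text, head_text) for c in candidates)


candidates = select_candidates(["image forming apparatus", "the", "paper"], min_len=10)
continuous = is_continuous(candidates, "output by the image form", "ing apparatus is")
not_continuous = is_continuous(candidates, "output by the image form", "paper tray")
```

In this usage example only "image forming apparatus" is long enough to be stored as a candidate, and the first pair of page texts is judged continuous because the term is split as "image form" and "ing apparatus" across the boundary.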

As described above, in the present embodiment, the aggregation number indicating the number of document images aggregated in the input image data is determined based on the character information included in the input image data, and the page order of the plurality of document images is determined based on the character information included in each of the aggregated document images. This makes it possible to appropriately restore the images to their original size without generating and printing any information other than the plurality of reduced document images.

Although an embodiment according to the present invention has been described above, the present invention is not limited to the embodiment exactly as described, and at the implementation stage the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of constituent elements disclosed in the embodiment. For example, some constituent elements may be removed from the full set of constituent elements shown in the embodiment.

Note that the page order determination unit 24 only needs to include at least the page number determination unit 101. The other constituent elements (the table of contents determination unit 102, the table of contents reuse determination unit 103, the missing page determination unit 104, and the cross-page character string determination unit 105) need not be provided, or any one or more of them may be selectively provided.

Further, for example, the page order changing unit 28 may be omitted and a second storage unit that stores the plurality of document images enlarged by the enlargement unit 27 may be additionally provided, so that the document images stored in the second storage unit are retrieved according to the page order determined by the page order determination unit 24.

Further, for example, an aggregation number setting unit that changes the set value of the aggregation number in response to an operation by the user (an operation received by the operation unit 60) may be additionally provided.

Further, for example, a page deletion unit that deletes any of the plurality of document images subject to original-size restoration in response to an operation by the user may be additionally provided.

The program executed by the image processing apparatus 20 (CPU 201) of the above-described embodiment may be provided as a file in an installable or executable format recorded on a computer-readable recording medium such as a CD-ROM, a flexible disk (FD), a CD-R, a DVD (Digital Versatile Disk), or a USB (Universal Serial Bus) device, or may be provided or distributed via a network such as the Internet. The various programs may also be provided by being incorporated in advance in a ROM or the like.

1 MFP
10 Document reading device
20 Image processing device
21 Character information acquisition unit
22 Rotation processing unit
23 Aggregation number determination unit
24 Page order determination unit
25 Trimming processing unit
26 Storage unit
27 Enlargement unit
28 Page order changing unit
29 Network I/F unit
30 Output image processing unit
31 Output unit
40 Printing device
60 Operation unit
101 Page number determination unit
102 Table of contents determination unit
103 Table of contents reuse determination unit
104 Missing page determination unit
105 Cross-page character string determination unit
201 CPU
202 ROM
203 RAM
204 Blank paper detection processing circuit
205 Rotation processing circuit
206 Scaling processing circuit
207 Trimming processing circuit
208 Network I/F circuit
209 Output image processing circuit

JP 2004-282668 A

Claims

A character information acquisition unit that acquires character information indicating characters included in image data read from a document;
An aggregation number determination unit that determines an aggregation number indicating the number of a plurality of document images aggregated in the image data based on the character information acquired by the character information acquisition unit;
A page order determination unit that determines a page order of the plurality of document images based on the character information included in each of the plurality of document images.
Image processing device.

The aggregation number determination unit
For each of a plurality of unit areas obtained by dividing the image data,
Determining whether it is a non-blank area including the character information or a blank area not including the character information;
When the unit area is the non-blank area and a first symmetric unit area, which is the unit area corresponding to that unit area with a first reference line extending in the main scanning direction at the halfway position of the image data in the sub-scanning direction as the axis of symmetry, is the blank area, determining that the first symmetric unit area is the non-blank area;
When the unit area is the blank area and the first symmetric unit area is the non-blank area, determining that the unit area is the non-blank area;
When the unit area is the non-blank area and a second symmetric unit area, which is the unit area corresponding to that unit area with a second reference line extending in the sub-scanning direction at the halfway position of the image data in the main scanning direction as the axis of symmetry, is the blank area, determining that the second symmetric unit area is the non-blank area;
When the unit area is the blank area and the second symmetric unit area is the non-blank area, determining that the unit area is the non-blank area;
Determining an area where the plurality of unit areas determined to be the non-blank area are an area of the document image;
The image processing apparatus according to claim 1.

The aggregation number determination unit variably sets the size of the unit area in accordance with an operation by a user.
The image processing apparatus according to claim 2.

The page order determination unit
A page number determination unit that determines the page order of the plurality of document images based on order character information, which is the character information representing a number among the character information included in each of the plurality of document images;
The image processing apparatus according to claim 1.

The page number determination unit identifies target order character information, which is the order character information included in a target document image that is any one of the plurality of document images, and determines the page order of the target document image and of a next page candidate document image, which is a document image that is a candidate for the next page of the target document image, based on the relationship between the target order character information and the order character information located, in the next page candidate document image, at a position corresponding to the target order character information,
The image processing apparatus according to claim 4.

The page order determination unit
The leading document image is determined from the shape of each of the plurality of document images and the character printing direction, and the leading document image and the document image that is a candidate for the next page of the leading document image are indicated. A table of contents content determination unit that determines a page of a table of contents based on the character information included in each of the next page candidate document images or each layout;
The image processing apparatus according to claim 4 or 5.

The table of contents determination unit generates table of contents information in which a plurality of character strings included in the table of contents page are associated with the appearance order based on the character information included in the table of contents page, and the table of contents information Determining the page order of the plurality of document images based on the character information included in the document images other than the table of contents pages;
The image processing apparatus according to claim 6.

The page order determination unit
The first document image is determined from the shape of each of the plurality of document images aggregated in the image data and the printing direction of characters, and the first document image and the next page of the first document image are determined. Based on the character information contained in each of the next page candidate manuscript images indicating candidate manuscript images or the respective layouts, a table of contents page is determined and the character information contained in the table of contents page To generate a table of contents information in which a plurality of character strings included in the table of contents page are associated with an appearance order, and among a plurality of document images other than the page of the table of contents, a plurality of character strings included therein A table of contents reuse determination unit for determining whether or not there is a document image whose appearance order matches the table of contents information;
The image processing apparatus according to claim 4.

A character information acquisition step for acquiring character information indicating characters included in the image data read from the document;
An aggregation number determination step of determining an aggregation number indicating the number of document images aggregated in the image data based on the character information acquired in the character information acquisition step;
A page order determination step for determining a page order of the plurality of document images based on the character information included in each of the plurality of document images included in the image data.
Image processing method.

On the computer,
A character information acquisition step for acquiring character information indicating characters included in the image data read from the document;
An aggregation number determination step of determining an aggregation number indicating the number of document images aggregated in the image data based on the character information acquired in the character information acquisition step;
A program for executing a page order determination step for determining a page order of the plurality of document images based on the character information included in each of the plurality of document images.