JP2010130500A - Image reading apparatus, image reading method and image reading program

Info

Publication number: JP2010130500A
Application number: JP2008304720A
Authority: JP
Inventors: Ryosuke Okajima; 良介岡島
Original assignee: Konica Minolta Business Technologies Inc
Current assignee: Konica Minolta Business Technologies Inc
Priority date: 2008-11-28
Filing date: 2008-11-28
Publication date: 2010-06-10

Abstract

PROBLEM TO BE SOLVED: To provide an image reading apparatus that, when reading a plurality of documents including tab sheets to create an image file, can create an image file containing a table-of-contents (TOC) page generated on the basis of the pages at which tab sheets are detected.

SOLUTION: The present invention relates to an image reading apparatus that reads a plurality of documents including tab sheets to create an image file. The image reading apparatus includes: a reading means for sequentially reading the plurality of documents, including the tab sheets, to generate image data of the documents; a storage means for storing the image data while assigning page numbers in reading order and, when a tab sheet is detected as a read document, storing the page number of the tab sheet without recognizing the image data of the tab sheet as a page; a TOC generating means for generating, once reading of the documents is complete, a TOC of the image file on the basis of the tab sheet page numbers stored in the storage means; and an image file creating means for creating the image file with the generated TOC as its leading page and the read image data on the subsequent pages.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an image reading apparatus, an image reading method, and an image reading program that read a plurality of documents including tab sheets and create an image file.

In recent years, programs have been proposed that combine various applications into a single integrated application, so that users need not switch applications according to the type of data. With such a program, data generated by the individual applications can be combined into one document: the user gathers the data created in each application into a single document by means of a specific application included in the integrated application.

Furthermore, an information processing apparatus has been provided that gathers data created by the applications a user chooses in order to create and edit a document, inserts index sheets between predetermined pages of the document information, and sets and prints a character string on the tab of each index sheet (see, for example, Patent Document 1).

The technique described in Patent Document 1 inserts tab sheets at the pages indicated by the table of contents when printing a document in which table-of-contents information has been embedded in advance.
[Patent Document 1] 特開２００３−２９６３１２号公報
JP 2003-296212 A

However, in the case opposite to the technique of Patent Document 1, namely when a document with index sheets inserted is read by an image reading apparatus such as a scanner to generate document data, the tab sheet pages are removed, so creating a table-of-contents page required recalculating page numbers or restructuring the data.

Moreover, even when a tab sheet page is read by the image reading apparatus, the tab sheet is read and turned into document data without being distinguished from the other originals. The tab sheet therefore cannot serve its original purpose of separating chapters, and a user who wants a table-of-contents page has to specify the pages at which tab sheets were inserted in order to create it.

The present invention has been made in view of the above problems. Its object is to provide an image reading apparatus, an image reading method, and an image reading program that, when reading a plurality of documents including tab sheets to create an image file, can create an image file having a table-of-contents page generated on the basis of the pages at which tab sheets were detected.

The above object of the present invention is achieved by the following means.

(1) An image reading apparatus that reads a plurality of documents including tab sheets and creates an image file, comprising: reading means for sequentially reading the plurality of documents, including the tab sheets, and generating image data of the documents; storage means for storing, when a tab sheet is detected as a read document, the page number of that tab sheet according to the reading order; table-of-contents generating means for generating a table of contents of the image file on the basis of the tab sheet page numbers recorded in the storage means; and image file creating means for creating the image file using the generated table of contents and the read image data.

(2) The image reading apparatus according to (1) above, further comprising designating means for accepting a designation of whether or not the image data of a tab sheet is to be recognized as a page when a tab sheet is detected as a read document, wherein the image file creating means creates the image file using the image data including the image data of the main body portion of the tab sheet when recognition as a page is designated, and creates the image file using the image data excluding the image data of the tab sheet when it is not designated.

(3) The image reading apparatus according to (1) or (2) above, further comprising means for performing character recognition on the tab portion of a tab sheet and detecting a character string when a tab sheet is detected as a read document, wherein the storage means further stores the character string together with the page number of the tab sheet, and the table-of-contents generating means generates the table of contents of the image file on the basis of the page number of the tab sheet and the character string.

(4) An image reading method for reading a plurality of documents including tab sheets and creating an image file, comprising: a step (a) of sequentially reading the plurality of documents, including the tab sheets, and generating image data of the documents; a step (b) of storing, when a tab sheet is detected as a read document, the page number of that tab sheet according to the reading order; a step (c) of generating a table of contents of the image file on the basis of the stored tab sheet page numbers; and a step (d) of creating the image file using the generated table of contents and the read image data.

(5) The image reading method according to (4) above, further comprising a step (e) of accepting a designation of whether or not the image data of a tab sheet is to be recognized as a page when a tab sheet is detected as a read document, wherein, in step (d), the image file is created using the image data including the image data of the main body portion of the tab sheet when recognition as a page is designated, and using the image data excluding the image data of the tab sheet when it is not designated.

(6) The image reading method according to (4) or (5) above, further comprising a step (f) of performing character recognition on the tab portion of a tab sheet and detecting a character string when a tab sheet is detected as a read document, wherein, in step (b), the character string is further stored together with the page number of the tab sheet, and, in step (c), the table of contents of the image file is generated on the basis of the page number of the tab sheet and the character string.

(7) An image reading program executed by an image reading apparatus that reads a plurality of documents including tab sheets and creates an image file, the program causing the image reading apparatus to execute: a procedure (a) of sequentially reading the plurality of documents, including the tab sheets, and generating image data of the documents; a procedure (b) of storing in a storage unit, when a tab sheet is detected as a read document, the page number of that tab sheet according to the reading order; a procedure (c) of generating a table of contents of the image file on the basis of the stored tab sheet page numbers; and a procedure (d) of creating the image file using the generated table of contents and the read image data.

(8) The image reading program according to (7) above, further comprising a procedure (e) of accepting a designation of whether or not the image data of a tab sheet is to be recognized as a page when a tab sheet is detected as a read document, wherein, in procedure (d), the image file is created using the image data including the image data of the main body portion of the tab sheet when recognition as a page is designated, and using the image data excluding the image data of the tab sheet when it is not designated.

(9) The image reading program according to (7) or (8) above, further causing the image reading apparatus to execute a procedure (f) of performing character recognition on the tab portion of a tab sheet and detecting a character string when a tab sheet is detected as a read document, wherein, in procedure (b), the character string is further stored in the storage unit together with the page number of the tab sheet, and, in procedure (c), the table of contents of the image file is generated on the basis of the page number of the tab sheet and the character string.

(10) A computer-readable recording medium on which the image reading program according to any one of (7) to (9) above is recorded.

According to the present invention, when a plurality of documents including tab sheets are read to create an image file, an image file having a table-of-contents page generated on the basis of the pages at which tab sheets were detected can be created.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram schematically showing the configuration of an MFP incorporating the image reading apparatus of the present invention.

The image reading apparatus of the present invention is implemented by being incorporated in an MFP 100 (Multi-Function Peripheral) serving as an image forming apparatus, and constitutes part of the functions of the MFP 100.

The MFP 100 includes a control unit 101, a storage unit 102, an operation panel unit 103, an ADF (Auto Document Feeder) 104, an image reading unit 105, a paper feeding unit 106, an image forming unit 107, and an image processing unit 108, which are interconnected via a bus for exchanging signals.

The control unit 101 is a CPU that controls the above units and performs various arithmetic operations according to programs; it also controls the image reading apparatus 200 described below. The storage unit 102 comprises a ROM that stores various programs and data for controlling the basic operation of the MFP 100, a RAM that temporarily stores programs and data as a work area, a hard disk drive (HDD) that stores various programs, including the operating system, and various data, and the like. Parts of these ROM, RAM, and HDD serve as the ROM 22, RAM 23, and HDD 24 described below.

The operation panel unit 103 includes a touch panel, a numeric keypad, a start button, a stop button, and the like, and is used to display various information and to input various instructions.

The ADF 104 conveys the set originals, which include tab sheets, one by one to a predetermined reading position of the image reading unit 105, and sequentially discharges each original after its image has been read.

The image reading unit 105 illuminates an original set at the predetermined reading position, or conveyed there by the ADF 104, with a light source such as a fluorescent lamp, photoelectrically converts the reflected light with an imaging device such as a CCD image sensor, and generates image data from the resulting electrical signal.

The paper feeding unit 106 holds paper as the recording material used for printing and feeds the stored sheets one at a time to the image forming unit 107. The image forming unit 107 prints various data on the paper using a well-known image forming process such as an electrophotographic process comprising charging, exposure, development, transfer, and fixing steps.

The image processing unit 108 separates character image regions, graphic image regions, and photographic image regions from the image data read by the image reading unit 105 and applies image processing appropriate to each region. It also recombines the regions on the basis of their position information, creates a document file in an internal file format, and converts the document file into a PDF file.

The MFP 100 also has a communication interface, providing a transmission/reception function for exchanging data such as created PDF files with external devices such as a personal computer (PC) or a portable terminal, as well as a copy function for copying originals.

FIG. 2 is a block diagram functionally showing the configuration of the image reading apparatus.

As shown in FIG. 2, the image reading apparatus 200 includes a CPU 21, a ROM 22, a RAM 23, a hard disk drive (HDD) 24, an operation panel unit 25, an image processing unit 26, and a document reading unit 27, which are interconnected via a bus for exchanging signals.

The CPU 21 represents part of the functions of the control unit 101; it controls each part of the image reading apparatus 200 and performs various arithmetic operations according to programs. The ROM 22 stores various programs and parameters for controlling the basic operation of the image reading apparatus 200.

The RAM 23 temporarily stores programs and data as a work area. The RAM 23 reserves an area for storing the setting, made on the operation panel unit 103, of whether the image data of the tab sheet main body is to be included in the PDF file; an area for temporarily storing the image data read by the document reading unit 27; and an area for storing the table-of-contents table generated on the basis of the pages at which tab sheets were detected. The hard disk drive (HDD) 24 reserves an area for storing the standard document size data prepared in advance and an area for storing the generated PDF files.

The operation panel unit 25 represents part of the functions of the operation panel unit 103 and provides the function of setting whether the image data of the tab sheet main body is to be included in the PDF file. The document reading unit 27 consists of the ADF 104 and the image reading unit 105; it continuously reads the originals set on the ADF 104 with the image reading unit 105 to generate image data, and stops operating when no originals remain on the ADF 104.

FIG. 3 is a diagram showing an example of read originals including tab sheets.

The read originals consist of ten standard-size originals and three tab sheets, 13 pages in all. They are arranged in the following order: a tab sheet at the head, three standard-size originals, a tab sheet, four standard-size originals, a tab sheet, and three standard-size originals.

The tab sheets mark the chapters of the document. The first tab sheet represents chapter 1, which consists of four pages; the next represents chapter 2, which consists of five pages; and the last represents chapter 3, which consists of four pages. Each count includes the tab sheet itself.

Each tab sheet has a rectangular main body and a tab protruding from a predetermined position on one side of the main body. In this example, an Index number is printed on each tab as the character string representing the chapter.

These tab sheets form one set of three sheets, referred to as a 3-tab set. Tab sheets are also called index sheets. In the example of FIG. 3 the tab sheets are arranged in what is called normal order: with the tabs on the right side of the main body, the tab of an upper sheet is positioned above the tab of a lower sheet in FIG. 3. In a 3-tab set, the tabs are provided at positions that step down in three stages from the top of one side of the main body. Tab sheets are usually stacked several sets deep and loaded in a designated paper feed tray.

FIG. 4 is a diagram for explaining a method of detecting a tab sheet.

The document reading unit 27 reads an original line by line in the main scanning direction (left to right in the figure). At the trailing edge of the original, it detects whether a protrusion exists and determines whether the detected protrusion is continuous. If a protrusion exists, the read original is judged to be a tab sheet; if not, it is judged to be a standard-size original.
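The detection rule above can be sketched in Python under some loudly stated assumptions that are not in the source: each scan line is modelled as a list of booleans (paper present or not), lines are ordered from leading edge to trailing edge, and a "continuous protrusion" is taken to mean at least `min_run` consecutive trailing lines whose paper is narrower than the sheet body. The names `paper_span`, `is_tab_sheet`, and `min_run` are hypothetical.

```python
from typing import List, Tuple


def paper_span(line: List[bool]) -> Tuple[int, int]:
    """Return (first, last + 1) columns where paper is present, or (0, 0)."""
    cols = [i for i, present in enumerate(line) if present]
    return (cols[0], cols[-1] + 1) if cols else (0, 0)


def is_tab_sheet(lines: List[List[bool]], min_run: int = 3) -> bool:
    """Assumed rule: the sheet is a tab sheet if its trailing edge ends in
    at least `min_run` consecutive lines narrower than the sheet body."""
    if not lines:
        return False
    body_width = max(end - start for start, end in map(paper_span, lines))
    run = 0
    for line in reversed(lines):  # walk back from the trailing edge
        start, end = paper_span(line)
        width = end - start
        if width == 0 and run == 0:
            continue  # blank scan lines past the end of the sheet
        if 0 < width < body_width:
            run += 1  # still inside the protrusion
        else:
            break  # reached the full-width body (or a gap)
    return run >= min_run


# A plain sheet: 20 full-width lines. A tab sheet: the same body followed
# by 4 lines on which only a narrow tab is present.
plain = [[True] * 10 for _ in range(20)]
tab = plain + [[False] * 2 + [True] * 3 + [False] * 5 for _ in range(4)]
print(is_tab_sheet(plain), is_tab_sheet(tab))
```

The `min_run` threshold stands in for the continuity check; a real scanner would presumably tune it to the scan resolution.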

Next, when a tab sheet is detected, the main body of the tab sheet and its tab are separated, and a table-of-contents table that associates each table-of-contents page with its heading character string is generated. The table is stored in the RAM 23 of the image reading apparatus 200 and is referenced when the table-of-contents page of the PDF file is generated.

In the example of FIG. 3, when the tab sheet main bodies are included as document pages, page 1 of the document is associated with the table-of-contents string "Index1", page 5 with "Index2", and page 10 with "Index3".

When the tab sheet main bodies are not included as document pages, page 1 of the document is associated with "Index1", page 4 with "Index2", and page 8 with "Index3".
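The page bookkeeping behind the table-of-contents table can be sketched as follows. This is only an illustration under assumptions: the sheet list encodes FIG. 3 as tab labels and `None` for ordinary originals, and the names `build_toc_table`, `count_tabs_as_pages`, and `FIG3` are hypothetical rather than taken from the source.

```python
from typing import List, Optional, Tuple

# One entry per scanned sheet, in reading order: the tab label for a tab
# sheet, or None for an ordinary standard-size original.
Sheet = Optional[str]


def build_toc_table(sheets: List[Sheet],
                    count_tabs_as_pages: bool) -> List[Tuple[int, str]]:
    """Return (page number, heading) pairs for every detected tab sheet."""
    toc: List[Tuple[int, str]] = []
    page = 0  # number of pages emitted so far
    for label in sheets:
        if label is None:
            page += 1  # ordinary original: always a page
        elif count_tabs_as_pages:
            page += 1  # the tab sheet body itself becomes the chapter page
            toc.append((page, label))
        else:
            toc.append((page + 1, label))  # chapter starts on the next page
    return toc


# The 13 sheets of FIG. 3: tab, 3 originals, tab, 4 originals, tab, 3 originals.
FIG3: List[Sheet] = (["Index1"] + [None] * 3 +
                     ["Index2"] + [None] * 4 +
                     ["Index3"] + [None] * 3)

print(build_toc_table(FIG3, count_tabs_as_pages=True))
# → [(1, 'Index1'), (5, 'Index2'), (10, 'Index3')]
```

Calling the same function with `count_tabs_as_pages=False` gives the numbering for the case in which tab sheet bodies are dropped from the output.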

One way to separate the main body of a tab sheet from its tab is to store the sizes of standard originals (A4, A3, Letter, Tabloid, etc.) in advance and look up the standard size closest to the size of the tab sheet image data: the part that fits within the standard size is judged to be the main body, and the remaining image data is judged to be the tab. Alternatively, the tab sheet size and the reading method can be specified in advance when the document reading unit reads the originals.
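A minimal sketch of the first method, assuming sizes in millimetres, portrait orientation, a tab that protrudes along the width, and a simple sum-of-differences measure for "closest". None of these choices is specified in the source, and `STANDARD_SIZES`, `closest_standard`, and `split_body_and_tab` are hypothetical names.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in millimetres

# Standard original sizes (width, height) in millimetres, portrait.
STANDARD_SIZES: Dict[str, Tuple[float, float]] = {
    "A4": (210.0, 297.0),
    "A3": (297.0, 420.0),
    "Letter": (215.9, 279.4),
    "Tabloid": (279.4, 431.8),
}


def closest_standard(width: float, height: float) -> str:
    """Pick the stored size closest to the scanned size (assumed metric)."""
    return min(
        STANDARD_SIZES,
        key=lambda name: abs(STANDARD_SIZES[name][0] - width)
        + abs(STANDARD_SIZES[name][1] - height),
    )


def split_body_and_tab(width: float, height: float) -> Tuple[str, Box, Box]:
    """Return the matched size name, the body box, and the leftover tab box."""
    name = closest_standard(width, height)
    std_w, std_h = STANDARD_SIZES[name]
    body: Box = (0.0, 0.0, min(width, std_w), min(height, std_h))
    tab: Box = (body[2], 0.0, width, body[3])  # whatever exceeds the body width
    return name, body, tab


# A scan 223 mm wide and 297 mm tall: an A4 body with a 13 mm tab strip.
print(split_body_and_tab(223.0, 297.0))
```

The tab box returned here is the full-height strip beyond the body; locating the tab within that strip (for character recognition) would be a further step.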

FIGS. 5 and 6 are diagrams showing an example description of the data structure of a PDF file.

A PDF file consists of four elements: a header, a body, a cross-reference table, and a trailer.

The header (%PDF-1.6) identifies the version of the PDF specification to which the file conforms. The body describes the objects that make up the document stored in the file. The cross-reference table holds information about the indirect objects in the file; its description is omitted here. The trailer contains the end-of-file marker (%%EOF) and gives the location of the cross-reference table and of certain special objects in the body of the file.
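To make the four-part layout concrete, here is a minimal sketch of a serializer that writes a header, a body, a cross-reference table, and a trailer. It is an illustration only: it numbers objects contiguously from 1 (unlike the object numbers in FIGS. 5 and 6), assumes object 1 is the document catalog, and the function name `build_pdf` is hypothetical.

```python
from typing import List


def build_pdf(objects: List[bytes]) -> bytes:
    """Serialize objects numbered 1..N; object 1 is assumed to be the catalog."""
    out = bytearray(b"%PDF-1.6\n")  # header: PDF version
    offsets = []
    for num, body in enumerate(objects, start=1):  # body: indirect objects
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)  # cross-reference table
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # byte offset of each object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos  # trailer ends with %%EOF
    return bytes(out)


# A one-page document with an empty page, just to exercise the layout.
pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << >> "
    b"/MediaBox [0 0 595.22 842] >>",
])
print(pdf.decode("ascii"))
```

The `startxref` value is the byte offset of the `xref` keyword, which is how a reader locates the cross-reference table from the end of the file.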

The body, which expresses each page of the PDF file, is now described in detail. In the example of FIGS. 5 and 6, the body describes the display contents of pages 1 to 3 of the read originals and of the table-of-contents page.

Object 1 (the object introduced by "1 0 obj"; hereinafter, "** 0 obj" is referred to as object **) is a page object (Type /Page) that specifies the attributes of a single page of the document. Object 1 defines the dictionary containing the resources the page requires (Resources 2 0 R), the page tree node that is the parent of this node (Parent 19 0 R), the region to which the page contents are clipped when displayed or printed (CropBox [0 0 595.22 842]), the maximum output area of the physical medium on which the page is printed (MediaBox [0 0 595.22 842]), and so on.

Object 2 defines the resource dictionary of page 1. In this example it defines the color space in which color values are expressed (ColorSpace << /Cs6 32 0 R >>), the font dictionaries referenced (Font << /TT1 30 0 R /TT4 13 0 R /TT6 11 0 R >>), the predefined procedure set names interpreted when printing to a PostScript output device (ProcSet [/PDF /Text]), and the parameter dictionary of the graphics state GS1 (ExtGState << /GS1 35 0 R >>).

Object 3 is a stream object, consisting of a dictionary that describes the stream data (a sequence of bytes) followed by the stream data itself. In this example it defines the number of bytes of stream data (Length 169), the name of the filter applied in processing the stream data (Filter /FlateDecode), and the stream data (the data between stream and endstream). The stream data itself is omitted here.
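As an illustration of how such a stream object could be produced, the sketch below compresses page content with Python's standard `zlib` module, which emits the zlib/deflate format that FlateDecode expects, and records the compressed byte count in Length. The function name `flate_stream` and the sample content are assumptions, not part of the source.

```python
import zlib


def flate_stream(data: bytes) -> bytes:
    """Build the dictionary and stream portion of a FlateDecode stream object."""
    compressed = zlib.compress(data)
    return (
        b"<< /Length %d /Filter /FlateDecode >>\n" % len(compressed)
        + b"stream\n"
        + compressed
        + b"\nendstream"
    )


# Hypothetical page content: draw the text "Index1" near the top of the page.
content = b"BT /TT1 12 Tf 56 760 Td (Index1) Tj ET"
obj = flate_stream(content)
print(obj[:48])
```

Length counts the compressed bytes between `stream` and `endstream`, not the size of the original content.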

Thus page 1 of the document is expressed by the page object, page resource dictionary, and stream object described in objects 1 to 3. Likewise, page 2 is expressed by the page object, page resource dictionary, and stream object described in objects 4 to 6, and page 3 by those described in objects 7 to 9.

Object 24 is a page object (Type /Page) that expresses the table-of-contents page. Like objects 1, 4, and 7, it defines the dictionary containing the resources the page requires, the page tree node, the clipping region, the maximum output area of the physical medium on which the page is printed, and so on. In addition, it designates, by object number 25, the annotation object expressing the annotations associated with the page (Annots 25 0 R). An annotation associates an object with a location on a page of the PDF document and allows interaction with the user through the mouse and keyboard, whereby the page object is updated.

Object 25 is an array object ([26 0 R 27 0 R 28 0 R]) that designates the object numbers of the annotations. That is, the annotation dictionaries referenced as annotation objects are designated by object numbers 26, 27, and 28.

Object 26 is an annotation dictionary. Its annotation type is a link annotation (Subtype /Link), which expresses a hypertext link (jump) to a destination elsewhere in the document. The location of the annotation on the page (Rect [56.4159 753.062 542.92 767.0]) is defined in default user space units. The A entry (A 42 0 R /H /I) expresses the action performed when the annotation is activated. In this example, when the mouse button is pressed or held down within the active area of the annotation (BS << /S /S /W 0 /Type /Border >>), a jump is made to object number 42. Likewise, object 27 describes a jump to object number 43, and object 28 a jump to object number 44.

Object 37 is a stream object, consisting of a dictionary describing the stream data followed by the stream data.

Object 42 is a go-to action (S /GoTo) and indicates the destination of the jump. In this example, the jump is made to the first page of the document, described by object number 1. That is, when the mouse button is pressed inside the active area of the annotation, a jump is made to object 42 (object 26), and the first page of the document described by object number 1 is displayed (object 42). Similarly, object 43 jumps to the second page of the document, described by object number 4, and object 44 jumps to the third page of the document, described by object number 7.
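The chain from a table of contents entry to a document page (annotation 26 to action 42 to page 1, and so on) can be illustrated with a short Python sketch that emits the corresponding PDF dictionaries as text. This is only an illustration under stated assumptions: the helper names `link_annotation`, `goto_action`, and `indirect` are hypothetical, and the `/D [... /Fit]` destination form is taken from the PDF specification rather than from the description above, which does not show how object 42 encodes its destination.

```python
def link_annotation(rect, action_obj):
    """Body of a link-annotation dictionary (like object 26) as PDF syntax."""
    x0, y0, x1, y1 = rect
    return (
        "<< /Type /Annot /Subtype /Link "
        f"/Rect [{x0} {y0} {x1} {y1}] "
        f"/A {action_obj} 0 R /H /I "
        "/BS << /S /S /W 0 /Type /Border >> >>"
    )


def goto_action(page_obj):
    """Body of a go-to action dictionary (like object 42).

    Assumption: the destination is written as /D [page /Fit]; the
    description only states that the action jumps to the page object.
    """
    return f"<< /S /GoTo /D [{page_obj} 0 R /Fit] >>"


def indirect(obj_num, body):
    """Wrap a body as an indirect object with generation number 0."""
    return f"{obj_num} 0 obj\n{body}\nendobj\n"


# Annotation object -> action object -> page object, as in the description.
chain = {26: (42, 1), 27: (43, 4), 28: (44, 7)}

# Rectangle quoted in the description for object 26.
rect_26 = (56.4159, 753.062, 542.92, 767.0)

action_obj, page_obj = chain[26]
print(indirect(26, link_annotation(rect_26, action_obj)))
print(indirect(action_obj, goto_action(page_obj)))
```

Objects 27 and 28 would be produced the same way with their own rectangles, which the description does not give.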

Object 19 is the page tree (Type /Pages). It defines the order of the pages in the document and links the individual pages to form the pages of the PDF file. This example describes a page tree for a four-page document (Count 4). The pages are positioned in the document in the order of the pages described by object numbers 24, 1, 4, and 7. That is, the pages of the PDF file are arranged in the order: table of contents page, first document page, second document page, third document page.
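The page ordering can be sketched in Python as follows. The function name `page_tree` is hypothetical, and the `/Kids` key is the entry the PDF specification uses to list child pages; the description above gives only the type, the count, and the order of the object numbers.

```python
def page_tree(toc_page_obj, document_page_objs):
    """Body of a page tree node that lists the TOC page first.

    The TOC page object may be written to the file after the document
    pages; its position in the list decides where it is displayed.
    """
    kids = [toc_page_obj, *document_page_objs]
    refs = " ".join(f"{n} 0 R" for n in kids)
    return f"<< /Type /Pages /Count {len(kids)} /Kids [{refs}] >>"


tree = page_tree(24, [1, 4, 7])
print(tree)
```

Because display order follows the list rather than the byte position of each object in the file, the table of contents page can be generated last and still appear first.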

In this way, the table of contents page is displayed as the first page of the PDF file. On the table of contents page, when a table of contents item (the active area of an annotation) is clicked with the mouse button, the document page linked to that item is displayed.

FIG. 7 is a flowchart showing the processing procedure in the image reading apparatus.

The documents set in the document reading unit 27 are read (step S101). In the document reading unit 27, the ADF 104 conveys the set documents one sheet at a time to the image reading unit 105, which reads them in succession and generates image data.

It is determined whether the generated image data is the first page (step S102). If it is (S102: Yes), a PDF file is created and a header is written (step S103). When the image reading apparatus 200 reads the first of the plurality of documents, it creates a PDF file on the HDD 24, identifies the version of the PDF specification, and writes the header.

A table of contents table is generated (step S104), and the process proceeds to step S105. The image reading apparatus 200 generates the table of contents table described above in the RAM 23.

If, on the other hand, it is determined in step S102 that the image data is not the first page (S102: No), it is determined whether there is a protrusion at the trailing edge of the document (step S105). As described above, the image reading apparatus 200 determines whether the document is a tab sheet or a standard-size document according to whether a protruding portion exists at the trailing edge of the document.

In other words, when the image reading apparatus 200 reads the first document, it generates the PDF file and the table of contents table and then determines whether the document is a tab sheet. When it reads the second or a later document, it proceeds directly to determining whether the document is a tab sheet.

If it is determined that there is a protrusion at the trailing edge of the document (S105: Yes), the read image data is compared with the standard document size (step S106). The image reading apparatus 200 reads the standard document size data from the HDD 24 and compares it with the read image data held in the RAM 23.

The read image data is separated into a body and a tab portion (step S107). Using the method described above, the image reading apparatus 200 compares the standard document size data with the read image data, separates the image data of the tab sheet into the body and the tab, and stores them in the RAM 23.
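Steps S105 to S107 can be pictured with the following Python sketch. It rests on simplifying assumptions that are not stated in this description: the scanned image is modeled as a list of pixel rows in feed order with the trailing edge last, the standard document size is reduced to a row count, and the names `has_trailing_protrusion` and `split_tab_sheet` are hypothetical. The actual comparison method is the one referred to above.

```python
def has_trailing_protrusion(image_rows, standard_rows):
    """S105 (simplified): the scan is longer than a standard sheet."""
    return len(image_rows) > standard_rows


def split_tab_sheet(image_rows, standard_rows):
    """S106-S107 (simplified): split the scan at the standard length.

    Rows up to the standard length form the body; any rows beyond it
    form the tab portion at the trailing edge.
    """
    body = image_rows[:standard_rows]
    tab = image_rows[standard_rows:]
    return body, tab


# A 5-row "standard" sheet followed by 2 extra rows holding the tab.
scan = [[0, 0, 0, 0] for _ in range(5)] + [[0, 1, 1, 0] for _ in range(2)]

if has_trailing_protrusion(scan, standard_rows=5):
    body, tab = split_tab_sheet(scan, standard_rows=5)
    print(len(body), len(tab))
```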

It is determined whether the setting is to include the tab sheet body (step S108). The image reading apparatus 200 reads from the RAM 23 the setting, made on the operation panel unit 103, of whether the image data of the tab sheet body is to be included in the PDF file, and determines whether the body is to be included.

If it is determined that the body is not to be included (S108: No), the process proceeds to step S110. If it is determined that the body is to be included (S108: Yes), the image data of the body is appended to the PDF file as the next image page (step S109).

The image data of the tab portion is subjected to character recognition (step S110). The image reading apparatus 200 performs character recognition on the image data of the tab portion of the tab sheet separated in step S107. As the recognition method, for example, a method that discriminates characters based on the degree of matching between the feature amounts of each character image and dictionary patterns stored in advance can be used.

The page number and the character string of the tab portion recognized in step S110 are stored in the table of contents table (step S111), and the process proceeds to step S113. The image reading apparatus 200 writes the page number and the character string of the tab portion into the table of contents table generated in step S104.

If, on the other hand, it is determined in step S105 that there is no protrusion at the trailing edge of the document (S105: No), the image data is appended to the PDF file (step S112), and the process proceeds to step S113.

In other words, when the document is a tab sheet, the image reading apparatus 200 appends the image data of the tab sheet body to the PDF file according to the setting and stores the page number and the character string of the tab portion in the table of contents table. When the document is not a tab sheet, it appends the image data to the PDF file as the next document page.

It is determined whether the page is the last page (step S113). The image reading apparatus 200 makes this determination according to whether the document reading unit 27 finds that no documents remain set on the ADF 104. If it is determined that the page is not the last page (S113: No), the process returns to step S101 and the next document is read.

If, on the other hand, it is determined that the page is the last page (S113: Yes), a table of contents object for the PDF file is generated based on the table of contents table (step S114). The image reading apparatus 200 finishes writing the document pages of the PDF file and starts generating the table of contents page.

The table of contents object generated in step S114 is appended to the PDF file (step S115). The image reading apparatus 200 appends the table of contents object as the table of contents page of the PDF file.

A footer is written to the PDF file (step S116), and the process ends. The image reading apparatus 200 appends the trailer as the footer of the PDF file.

In other words, after reading the documents through to the last page, the image reading apparatus 200 generates the table of contents page of the PDF file based on the table of contents table and appends it. As explained above, the appended table of contents page is displayed as the first page by means of the PDF page tree structure.
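The overall flow of steps S101 to S116 can be summarized in a Python sketch. It is a simplified model rather than the apparatus's implementation: each sheet is a small record whose `tab_text` field stands in for both tab detection (S105) and character recognition (S110), pages are plain strings instead of PDF objects, and the names `Sheet` and `scan_to_file` are hypothetical. It also assumes that the page number stored for a tab sheet is the number of the document page at which that section begins, which this description does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sheet:
    image: str
    tab_text: Optional[str] = None  # None: plain sheet; otherwise the tab string


def scan_to_file(sheets, include_tab_body):
    pages = []      # document pages appended to the file (S109, S112)
    toc_table = []  # (page number, tab string) rows (S104, S111)

    for sheet in sheets:                       # S101; the loop ends at S113
        if sheet.tab_text is not None:         # S105: Yes
            section_start = len(pages) + 1     # assumed meaning of "page number"
            if include_tab_body:               # S108: Yes
                pages.append(sheet.image)      # S109
            toc_table.append((section_start, sheet.tab_text))  # S110, S111
        else:                                  # S105: No
            pages.append(sheet.image)          # S112

    # S114: build the TOC page from the table once all sheets are read.
    toc_page = [f"{text} ... p.{num}" for num, text in toc_table]

    # S115: the TOC page is generated last but placed first in page order.
    return [toc_page, *pages], toc_table


sheets = [
    Sheet("tab-1", "Intro"),
    Sheet("page-a"),
    Sheet("page-b"),
    Sheet("tab-2", "Results"),
    Sheet("page-c"),
]
file_pages, table = scan_to_file(sheets, include_tab_body=False)
print(table)
```

Keeping the table of contents table in memory until the last sheet is read is what allows the table of contents page to be built in a single pass over the documents.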

As described above, according to this embodiment, when a plurality of documents including tab sheets are read to create an image file, a table of contents page can be generated in which the character string of each tab portion serves as a table of contents item and the page at which the tab sheet was detected is listed against that item. An image file having this table of contents page as its first page can then be created. As a result, a user who reads a plurality of documents including tab sheets obtains an image file with a table of contents page whose chapters are delimited by the tab sheets.

The present invention is not limited to the embodiment described above and can be modified in various ways within the scope of the claims.

In the embodiment above, the operation panel unit 103 is used to set whether the image data of the tab sheet body is included in the PDF file, and when inclusion is set, the image data of the tab sheet body is appended as the next image page (step S109). However, the present invention is not limited to this. When a tab sheet is detected as a read document, the image data of the tab sheet body may instead never be recognized as a page, with the next image data appended to the PDF file as the next image page.

Doing so makes the setting on the operation panel unit unnecessary, so no user input is required. The processing in the image reading apparatus is also simplified, because the step of determining whether the tab sheet body is to be included can be omitted.

In the embodiment above, the image reading apparatus is incorporated in the MFP 100 and forms part of the functions of the MFP 100, but the present invention is not limited to this. The image reading apparatus of the present invention can be realized as a stand-alone apparatus, or may be incorporated in an apparatus other than an MFP, provided that it includes a document reading unit and an image processing unit and can read a plurality of documents including tab sheets to create an image file.

The processing in the image reading apparatus of this embodiment can be realized either by a dedicated hardware circuit or by a programmed computer. The program may be provided on a computer-readable recording medium such as a flexible disk or a CD-ROM, or may be provided online via a network such as the Internet. In that case, the program recorded on the computer-readable recording medium is normally transferred to and stored in a storage unit such as a hard disk. The program may be provided as stand-alone application software, or may be incorporated into the software of an apparatus as one of its functions.

FIG. 1 is a configuration diagram schematically showing the configuration of an MFP incorporating the image reading apparatus of the present invention.
FIG. 2 is a block diagram functionally showing the configuration of the image reading apparatus.
FIG. 3 is a diagram showing an example of read documents including tab sheets.
FIG. 4 is a diagram for explaining a method of detecting a tab sheet.
FIG. 5 is a diagram showing a description example of the data structure of PDF data.
FIG. 6 is a diagram showing a description example of the data structure of PDF data.
FIG. 7 is a flowchart showing the processing procedure in the image reading apparatus.

Explanation of symbols

21 CPU,
22 ROM,
23 RAM,
24 HDD,
25, 103 operation panel unit,
26 image processing unit,
27 document reading unit,
100 MFP,
101 control unit,
102 storage unit,
104 ADF,
105 image reading unit,
106 paper feed unit,
107 image forming unit,
108 image processing unit,
200 image reading apparatus.

Claims

An image reading apparatus that reads a plurality of documents including tab sheets and creates an image file, comprising:
reading means for sequentially reading a plurality of documents including tab sheets and generating image data of the documents;
storage means for storing, when a tab sheet is detected as a read document, the page number of the tab sheet according to the reading order;
table of contents generating means for generating a table of contents of the image file based on the page number of the tab sheet stored in the storage means; and
image file creating means for creating an image file using the generated table of contents and the read image data.

The image reading apparatus according to claim 1, further comprising designation means for accepting, when a tab sheet is detected as a read document, a designation of whether to recognize the image data of the tab sheet as a page,
wherein the image file creating means creates the image file using image data that includes the image data of the main body portion of the tab sheet when recognition of the image data of the tab sheet as a page is designated, and creates the image file using image data that excludes the image data of the tab sheet when such recognition is not designated.

The image reading apparatus according to claim 1, further comprising means for recognizing the tab portion of the tab sheet and detecting a character string when a tab sheet is detected as a read document,
wherein the storage means further stores the character string together with the page number of the tab sheet, and the table of contents generating means generates the table of contents of the image file based on the page number of the tab sheet and the character string.

An image reading method for reading a plurality of documents including tab sheets and creating an image file, comprising:
a step (a) of sequentially reading a plurality of documents including tab sheets and generating image data of the documents;
a step (b) of storing, when a tab sheet is detected as a read document, the page number of the tab sheet according to the reading order;
a step (c) of generating a table of contents of the image file based on the stored page number of the tab sheet; and
a step (d) of creating an image file using the generated table of contents and the read image data.

The image reading method according to claim 4, further comprising a step (e) of accepting, when a tab sheet is detected as a read document, a designation of whether to recognize the image data of the tab sheet as a page,
wherein, in step (d), the image file is created using image data that includes the image data of the main body portion of the tab sheet when recognition of the image data of the tab sheet as a page is designated, and is created using image data that excludes the image data of the tab sheet when such recognition is not designated.

The image reading method according to claim 4, further comprising a step (f) of recognizing the tab portion of the tab sheet and detecting a character string when a tab sheet is detected as a read document,
wherein, in step (b), the character string is further stored together with the page number of the tab sheet, and in step (c), the table of contents of the image file is generated based on the page number of the tab sheet and the character string.

An image reading program executed by an image reading apparatus that reads a plurality of documents including tab sheets and creates an image file, the program causing the image reading apparatus to execute:
a procedure (a) of sequentially reading a plurality of documents including tab sheets and generating image data of the documents;
a procedure (b) of storing in a storage unit, when a tab sheet is detected as a read document, the page number of the tab sheet according to the reading order;
a procedure (c) of generating a table of contents of the image file based on the stored page number of the tab sheet; and
a procedure (d) of creating an image file using the generated table of contents and the read image data.

The image reading program according to claim 7, further causing the image reading apparatus to execute a procedure (e) of accepting, when a tab sheet is detected as a read document, a designation of whether to recognize the image data of the tab sheet as a page,
wherein, in procedure (d), the image file is created using image data that includes the image data of the main body portion of the tab sheet when recognition of the image data of the tab sheet as a page is designated, and is created using image data that excludes the image data of the tab sheet when such recognition is not designated.

The image reading program according to claim 7, further causing the image reading apparatus to execute a procedure (f) of recognizing the tab portion of the tab sheet and detecting a character string when a tab sheet is detected as a read document,
wherein, in procedure (b), the character string is further stored in the storage unit together with the page number of the tab sheet, and in procedure (c), the table of contents of the image file is generated based on the page number of the tab sheet and the character string.

A computer-readable recording medium on which the image reading program according to any one of claims 7 to 9 is recorded.