JP2005149269A - System for processing structured document

Info

Publication number: JP2005149269A
Application number: JP2003387738A
Authority: JP
Inventors: Kazuyoshi Tanaka; 一義田中
Original assignee: Hitachi Systems and Services Ltd
Current assignee: Hitachi Systems and Services Ltd
Priority date: 2003-11-18
Filing date: 2003-11-18
Publication date: 2005-06-09

Abstract

PROBLEM TO BE SOLVED: To provide a structured document processing system that reduces programming effort for software programs handling structured documents, by eliminating both the data conversion processing needed to process document content and the data extraction processing needed to select the necessary data.

SOLUTION: A structured document holds a document structure definition, expressed by declarations of the document elements that form the document content and by a set of semantic relations defined between those document elements, together with a set of instances of the respective document elements conforming to the document structure definition. The structured document processing system comprises data reading means for reading the instances of document elements from the structured document and editing them into data that a software program can process. It further comprises basic data structure selection means for selecting and specifying the data structure of the data provided by the data reading means from among known basic data structures or object structures, such as array, set, list, tree, graph, and table structures.

COPYRIGHT: (C)2005, JPO&NCIPI

Description

The present invention relates to a structured document processing system, and more particularly to a middleware system that takes an electronic document as input and provides its content information to other software programs as data to be processed.

XML (Extensible Markup Language) is a well-known method for realizing structured documents. A structured document realized in this way follows a document structure definition based on a tree-type logical structure.
For such structured documents, middleware systems are in wide general use that organize the content information of a structured document they have read into a tree-type data structure and provide it to other software programs as data to be processed.
The W3C DOM (Document Object Model) is a standard that defines such a tree-type data structure and the specifications of such middleware systems.

Software programs that take structured documents as data to be processed are created for various purposes.
These purposes include, for example, display, in which the content information of a structured document is shown on a display screen and presented to the user; verification, in which the content information is checked for validity against some criterion; and aggregation, in which the content information of multiple structured documents is totaled.
A software program for each purpose is written with its own data structures and algorithms in order to implement the functions that achieve that purpose.
Because the data structures are designed to implement individual functions, they generally differ from function to function.
By contrast, when the data provided by the middleware system is fixed to a single general-purpose data structure such as a tree, the software program has to perform data conversion processing that converts the data into the data structure suited to each function.
In addition, the software program for each purpose extracts from the structured document the content it needs to achieve that purpose and processes only that content.
A single structured document can hold document content that has been given multiple meanings according to various purposes.
For example, document content made up of document elements related by the formatting and layout order needed for display, and document content made up of document elements related so as to express the arithmetic expressions needed for quantitative verification, can coexist or overlap within a single document.
For a structured document that contains multiple document contents based on multiple semantic relations in this way, the software program has to perform data extraction processing that extracts the data needed for each function from the structured document as a whole.
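To make the burden described above concrete, the following minimal sketch shows the kind of hand-written extraction and conversion code a program needs when the middleware hands over only a DOM tree. It uses Python's standard `xml.dom.minidom`; the sample document, the element names (`order`, `order_number`, `amount`, `note`), and the helper names are hypothetical and do not come from the patent.

```python
from xml.dom import minidom

# Hypothetical structured document; element names are illustrative only.
XML = "<order><order_number>0123</order_number><amount>300</amount><note>rush</note></order>"

def text_of(element):
    """Concatenate the text nodes directly under a DOM element."""
    return "".join(
        child.data for child in element.childNodes
        if child.nodeType == child.TEXT_NODE
    ).strip()

def dom_to_rows(xml_text, wanted):
    """Data extraction (keep only `wanted` elements) followed by data
    conversion (DOM tree -> flat list of rows), both written by hand."""
    dom = minidom.parseString(xml_text)
    rows = []
    for child in dom.documentElement.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue  # skip text nodes and comments between elements
        if child.tagName in wanted:  # extraction step
            rows.append((child.tagName, text_of(child)))  # conversion step
    return rows

rows = dom_to_rows(XML, wanted={"order_number", "amount"})
print(rows)
```

Every program with a different purpose would need its own variant of `dom_to_rows`, which is the repeated effort the invention aims to remove.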

The present invention has been made to solve the above problems of the prior art. Its object is to reduce the effort of creating a software program by making it possible to omit the data conversion processing and the data extraction processing from the software program.
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

To achieve the above object, the present invention addresses a structured document that holds a document structure definition, expressed by declarations of the document elements constituting the document content (a schema) and by a set of semantic relations defined between those document elements (hyperlinks), together with a set of actual values of the respective document elements conforming to the document structure definition. In a structured document processing system comprising data reading means that reads the actual values of document elements from the structured document and edits them into data that a software program can process, the invention provides means for referring to document content extracted according to a purpose through a data structure suited to that purpose.
In the middleware system based on the structured document processing system according to claim 1, the data structure of the data to be processed that the data reading means provides can be changed using the basic data structure selection means.
The basic data structure selection means specifies how data is edited at read time by selecting from known basic data structures or object structures prepared in advance, so the data editing procedure does not have to be specified step by step.
Therefore, when the function that the software program is to implement is a common one and can be written appropriately with one of the basic data structures or object structures, the data conversion processing no longer has to be performed in the software program.

In the middleware system based on the structured document processing system according to claim 2, the semantic relation extraction means can be used to limit the content of the data provided by the data reading means to the part associated with a particular type of semantic relation, where the types are classified by a specific attribute of the semantic relations.
Therefore, when the set of semantic relations in the document structure definition is classified by processing purpose using the value of a specific attribute of each semantic relation, the type of semantic relation corresponds to the processing purpose, and the data extraction processing no longer has to be performed in the software program.
In the middleware system based on the structured document processing system according to claim 3, when a data structure or object structure suited to the function that the software program is to implement is not among the options of the basic data structure selection means of the system according to claim 1, it can be newly created as a runtime library and added to the options so that it is selectable from then on. In this way the range of application of the structured document processing system can be extended.
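The idea of selecting from prepared data structures and later adding new ones can be sketched as a small registry. This is only an illustration under assumptions: the patent does not define an API, so the names `BUILDERS`, `register`, and `read`, the option names, and the sample values are all hypothetical.

```python
# Registry of selectable data structures. The function and option names are
# hypothetical; the patent fixes no programming interface.
BUILDERS = {}

def register(name, builder):
    """Add a data structure to the options (the extension point of claim 3)."""
    BUILDERS[name] = builder

def read(values, structure):
    """Return the document content edited into the selected data structure."""
    if structure not in BUILDERS:
        raise KeyError(f"no such data structure option: {structure}")
    return BUILDERS[structure](values)

# Options prepared in advance.
register("array", lambda values: list(values.items()))
register("set", lambda values: set(values.values()))

# Hypothetical actual values read from a structured document.
values = {"order_number": "0123", "amount": "300"}
as_array = read(values, "array")

# A structure missing from the options is added later, like a runtime library.
register("table", lambda values: {"columns": list(values),
                                  "rows": [list(values.values())]})
as_table = read(values, "table")
print(as_array, as_table)
```

The calling program only names the structure it wants; the editing procedure itself lives behind the registry.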

The effect obtained by a representative one of the inventions disclosed in this application is briefly described below.
According to the present invention, in a software program that handles structured documents, the effort of creating the program can be reduced by omitting the data conversion processing for processing document content in the program and the data extraction processing for selecting the necessary data.

Hereinafter, an embodiment of the present invention is described in detail with reference to the drawings.
In all the drawings used to describe the embodiment, parts having the same function are given the same reference numerals, and repeated description of them is omitted.
[Functional block diagram]
FIG. 1 is a functional block diagram showing the schematic configuration of a structured document processing system according to an embodiment of the present invention.
The structured document processing system 1 provides the document content of a structured document as data to be processed in response to a request from the business processing program 2. The structured document processing system 1 is implemented on, for example, a general-purpose personal computer.
To read the document structure definition and the structured document requested by the business processing program 2, the structured document processing system 1 is connected to the document structure definition recording medium 3 and the structured document recording medium 4, respectively.
The programming terminal 5, which consists of, for example, a keyboard and a mouse, is connected so that the user can specify execution parameters before the structured document processing system 1 is run.
The data reading means 12 reads the actual values of document elements from the structured document, edits them into data that the business processing program 2 can process, and provides that data.

The semantic relation extraction means 11 determines the type of each semantic relation in the set of semantic relations of the document structure definition according to the attribute value of the relation, and extracts the subset belonging to the determined type.
When reading the actual values of document elements from the structured document, the data reading means 12 reads only the document elements involved in the extracted subset.
The basic data structure selection means 13 specifies the data structure of the data that the data reading means 12 is to provide.
The user specifies the data structure through the programming terminal 5 before the structured document processing system 1 is run.
Through the programming terminal 5, the basic data structure selection means 13 presents options of known basic data structures or object structures, such as array, set, list, tree, graph, and table structures, and the user selects from them the data structure suited to the business processing program 2.
A data structure or object structure to be selected through the basic data structure selection means 13 can be created as a runtime library that implements a predetermined connection specification; linking that runtime library adds the structure to the options of the basic data structure selection means 13.
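The cooperation between the semantic relation extraction means 11 and the data reading means 12 can be sketched as two small functions. This is a minimal illustration, not the patented implementation: the triples, element names, values, and function names are all assumed for the example.

```python
# Semantic relations as (type, from, to) triples; all names are hypothetical.
EDGES = [
    ("part", "order", "order_number"),
    ("part", "order", "amount"),
    ("arithmetic", "amount", "unit_price"),
    ("arithmetic", "amount", "quantity"),
]
# Actual values as they might be read from the structured document.
VALUES = {"order_number": "0123", "unit_price": "100",
          "quantity": "3", "amount": "300"}

def extract_relations(edges, types):
    """Means 11: keep the subset of relations whose type is of interest."""
    return [edge for edge in edges if edge[0] in types]

def read_elements(values, subset):
    """Means 12: read only elements that take part in the extracted subset."""
    involved = {name for _, src, dst in subset for name in (src, dst)}
    return {name: value for name, value in values.items() if name in involved}

# The same document yields different data depending on the relation type.
display_data = read_elements(VALUES, extract_relations(EDGES, {"part"}))
check_data = read_elements(VALUES, extract_relations(EDGES, {"arithmetic"}))
print(display_data)
print(check_data)
```

Here the relation type plays the role of the processing purpose, so the calling program never filters the document itself.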

[Polymorphic structured document reference by basic data structure selection]
A method of editing the same structured document into different data structures is described with reference to FIGS. 2, 3, 4, and 5.
FIG. 2 is an example of a document structure definition. Here the document structure definition is written with tags enclosed in brackets.
The document element declarations 21 enumerate and declare the document elements that can be written in the structured document. Each document element is declared with an item tag, and its name is given by the name attribute.
The set of semantic relations 22 enumerates and declares the semantic relations between pairs of document elements.
Each semantic relation is declared with an edge tag, and its type is given by the type attribute. The from and to tags nested in the edge tag give the names of the document elements at the two ends of the relation.
For example, a semantic relation whose type attribute is "part", whose from tag specifies the order form, and whose to tag specifies the order number declares that a semantic relation of the type called "part" exists from the document element "order form" to the document element "order number".
FIG. 3 is an example of a structured document created according to the document structure definition shown in FIG. 2.
In the structured document 31 of FIG. 3, the names of the document elements declared in the document element declarations 21 are used as tag names, and an actual value is given to each document element. For example, the document element "order number" is given the actual value 0123.
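Since the figures are not reproduced here, the following sketch uses a hypothetical definition and document modeled only on the description above. The `item`, `edge`, `from`, and `to` tags, the `name` and `type` attributes, and the value 0123 follow the text; the `definition` wrapper, the English element names standing in for the order form and order number, and the `amount` element are assumptions. Parsing uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document structure definition of FIG. 2.
DEFINITION = """
<definition>
  <item name="order"/>
  <item name="order_number"/>
  <item name="amount"/>
  <edge type="part"><from>order</from><to>order_number</to></edge>
  <edge type="part"><from>order</from><to>amount</to></edge>
</definition>
"""

# Hypothetical stand-in for the structured document of FIG. 3.
DOCUMENT = """
<order>
  <order_number>0123</order_number>
  <amount>300</amount>
</order>
"""

def load_definition(xml_text):
    """Return the declared element names and the (type, from, to) relations."""
    root = ET.fromstring(xml_text)
    items = [item.get("name") for item in root.iter("item")]
    edges = [(edge.get("type"), edge.findtext("from"), edge.findtext("to"))
             for edge in root.iter("edge")]
    return items, edges

def load_values(xml_text, declared):
    """Return the actual value of every declared element that has text."""
    values = {}
    for element in ET.fromstring(xml_text).iter():
        text = (element.text or "").strip()
        if element.tag in declared and text:
            values[element.tag] = text
    return values

items, edges = load_definition(DEFINITION)
values = load_values(DOCUMENT, set(items))
print(edges[0], values["order_number"])
```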

FIG. 4 illustrates an example in which the values of the structured document shown in FIG. 3 are extracted, in accordance with the document structure definition shown in FIG. 2, by focusing on the semantic relation types "part" and "aggregation", and are edited into a tree structure, one of the basic data structures.
In the example of FIG. 4, the tree structure has been specified through the basic data structure selection means 13 as the data structure of the data to be provided by the data reading means 12.
In the tree structure of FIG. 4, each pair of a document element and its corresponding value forms a leaf 41, and each semantic relation or nesting relation between document elements forms a branch 42.
FIG. 5 illustrates an example in which the values of the structured document shown in FIG. 3 are extracted, in accordance with the document structure definition shown in FIG. 2, by focusing on the semantic relation types "part" and "aggregation", and are edited into an array structure, another of the basic data structures.
In the example of FIG. 5, the array structure has been specified through the basic data structure selection means 13 as the data structure of the data to be provided by the data reading means 12.
In the array structure of FIG. 5, each row, which corresponds to one array element, consists of a row number field, a heading field 51, and a value field 52.
The name of a document element is written in the heading field 51 and the corresponding value in the value field 52; the individual semantic relations or nesting relations between document elements are expressed by the column position within the three-level columns of the heading field 51.
The examples of FIG. 4 and FIG. 5 express the same document content in different data structures.
In this way, different data structures are selected to suit the algorithm implemented by the business processing program 2.
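The contrast between the tree of FIG. 4 and the array of FIG. 5 can be sketched by building both from the same relations. The relations, element names, and values below are hypothetical, the sketch assumes the relations contain no cycles, and the row layout (row number, nesting level, heading, value) is only an approximation of the fields described above.

```python
# Hypothetical relations and values; three nesting levels, as in the text.
EDGES = [
    ("part", "order", "order_number"),
    ("part", "order", "detail"),
    ("aggregation", "detail", "amount"),
]
VALUES = {"order_number": "0123", "amount": "300"}

def children_of(name):
    """Names reachable from `name` through one semantic relation."""
    return [dst for _, src, dst in EDGES if src == name]

def build_tree(name):
    """Tree structure: each node holds a name, a value, and its branches."""
    return {
        "name": name,
        "value": VALUES.get(name),
        "children": [build_tree(child) for child in children_of(name)],
    }

def build_array(name, depth=0, rows=None):
    """Array structure: one row per element; nesting is encoded by the
    depth column instead of by links between nodes."""
    if rows is None:
        rows = []
    rows.append((len(rows) + 1, depth, name, VALUES.get(name)))
    for child in children_of(name):
        build_array(child, depth + 1, rows)
    return rows

tree = build_tree("order")
array = build_array("order")
for row in array:
    print(row)
```

A program that walks a hierarchy would pick the tree, while one that scans rows in order would pick the array; the document content is the same in both.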

[Multifaceted structured document reference by semantic relation extraction]
FIG. 6 illustrates an example in which the values of the structured document shown in FIG. 3 are extracted, in accordance with the document structure definition shown in FIG. 2, by focusing on the semantic relation type "arithmetic", and are edited into the same kind of tree structure as that shown in FIG. 4.
The invention made by the present inventor has been described above specifically on the basis of the embodiment. The present invention is, however, not limited to the embodiment and can of course be modified in various ways without departing from its gist.

FIG. 1 is a functional block diagram showing the schematic configuration of a structured document processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a document structure definition.
FIG. 3 is a diagram showing an example of a structured document.
FIG. 4 is an example of a basic data structure showing one example of data editing by the data reading means shown in FIG. 1.
FIG. 5 is an example of a basic data structure showing another example of data editing by the data reading means shown in FIG. 1.
FIG. 6 is an example of a basic data structure showing the effect of the semantic relation extraction means shown in FIG. 1.

Explanation of symbols

1 Structured document processing system
2 Business processing program
3 Document structure definition recording medium
4 Structured document recording medium
11 Semantic relation extraction means
12 Data reading means
13 Basic data structure selection means
21 Document element declarations
22 Set of semantic relations
31 Structured document

Claims

1. A structured document processing system for a structured document that holds a document structure definition, expressed by declarations of the document elements constituting the document content and by a set of semantic relations defined between those document elements, and a set of actual values of the respective document elements conforming to the document structure definition, the system comprising data reading means for reading the actual values of the document elements from the structured document and editing them into data that can be processed by a software program,
wherein the system further comprises basic data structure selection means for selecting and specifying the data structure of the data provided by the data reading means from among known basic data structures or object structures such as array, set, list, tree, graph, and table structures.

2. The structured document processing system according to claim 1, further comprising semantic relation extraction means for determining the type of each semantic relation in the set of semantic relations of the document structure definition according to the attribute value of the semantic relation and extracting a subset based on the determined type,
wherein, when reading the actual values of the document elements from the structured document, the data reading means reads only the document elements involved in the extracted subset.

3. The structured document processing system according to claim 1, wherein a data structure or object structure to be selected through the basic data structure selection means is created as a runtime library that implements a predetermined connection specification, and linking the runtime library allows the structure to be added to the options of the basic data structure selection means.