JP2004348593A

JP2004348593A - Storage search device, storage search method, storage search program, and storage search program recording medium

Info

Publication number: JP2004348593A
Application number: JP2003146784A
Authority: JP
Inventors: Shiro Kasuga; 史朗春日; Kiyoutaro Horiguchi; 恭太郎堀口; Mitsuaki Tsunakawa; 光明綱川
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2003-05-23
Filing date: 2003-05-23
Publication date: 2004-12-09
Anticipated expiration: 2023-05-23
Also published as: JP4242701B2

Abstract

【課題】オブジェクト指向構造化文書データベースがメタ情報を格納する機能を有しなくても、アプリケーションプログラムに依存せずに、メタ情報の追加・変更・削除に柔軟に対応できる格納検索装置、格納検索方法、格納検索プログラム、および格納検索プログラム記録媒体を提供する。
【解決手段】格納・検索システム１は、アプリケーション１００、格納・検索装置２００、オブジェクト指向構造化文書データベース（以下、データベースと呼ぶ）３００を備えている。格納・検索装置２００は、アプリケーション１００から渡された構造化文書およびメタ情報をデータベース３００に格納する機能と、アプリケーション１００から渡されたパス検索式をもってデータベース３００に格納された構造化文書およびメタ情報の検索を行い、その検索結果をアプリケーション１００に返却する機能と、を有するミドルウェアプログラムが記録され、実行される装置である。
【選択図】図１

Kind Code: A1

[Problem] To provide a storage search device, a storage search method, a storage search program, and a storage search program recording medium that can flexibly handle the addition, change, and deletion of meta information without depending on an application program, even if the object-oriented structured document database has no function for storing meta information.
[Solution] A storage/retrieval system 1 includes an application 100, a storage/retrieval device 200, and an object-oriented structured document database (hereinafter referred to as the database) 300. The storage/retrieval device 200 is a device on which a middleware program is recorded and executed. The middleware program has a function of storing the structured document and the meta information passed from the application 100 in the database 300, and a function of searching the structured documents and meta information stored in the database 300 with the path search formula passed from the application 100 and returning the search result to the application 100.
[Selected drawing] FIG. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、構造化文書を格納し、検索するコンピュータシステムに用いられるミドルウェアに関し、特に、メタ情報を格納する機能を持たないオブジェクト指向データベースに対して構造化文書およびメタ情報を格納し、検索するミドルウェアに関する。
【０００２】
【従来の技術】
近年、ＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）をはじめとする構造化文書が、インターネット上の様々な情報共有のためのデータフォーマットとして、利用されるようになっている。ＸＭＬは、１９９７年１２月に、標準化団体Ｗ３Ｃ（ＷｏｒｌｄＷｉｄｅＷｅｂＣｏｎｓｏｒｉｕｍ）により標準化された構造化文書の規格の一種である。このＸＭＬ規格に沿って書かれたデータをＸＭＬ文書と呼ぶ。
【０００３】
ＸＭＬ文書は、人が解読・編集可能な文書である。しかし、同時に、ＸＭＬ文書は、タグを用いて構造化されており、コンピュータプログラムが、容易に処理することが可能なデータでもある。ＸＭＬ文書のタグは、見かけは文書中に埋めこまれた「＜」と「＞」で囲まれた文字列である。タグには、開始タグと終了タグがあり、開始タグと終了タグで囲まれた領域を要素と呼ぶ。要素は、複数の子要素を持ち、それぞれの子要素が複数の孫要素を持つというように、入れ子状に記述できる。そのため、ＸＭＬ文書は、多段階の木構造を表現することができる。
【０００４】
現在、ＸＭＬ文書によって表現される情報は、多岐に渡り、ＸＭＬ規格にタグの付け方の規則を規定することで、特定の用途への応用が行われている。例えば、企業間連携のためのＲｏｓｅｔｔａＮｅｔ（ｈｔｔｐ：／／ｗｗｗ．ｒｏｓｅｔｔａｎｅｔ．ｇｒ．ｊｐ／）やｅｂＸＭＬ（ｈｔｔｐ：／／ｗｗｗ．ｅｂｘｍｌ．ｏｒｇ／）、リソース情報記述のためのＲＤＦ（ＲｅｓｏｕｒｃｅＤｅｆｉｎｉｔｉｏｎＦｒａｍｅｗｏｒｋ，ｈｔｔｐ：／／ｗｗｗ．ｗ３．ｏｒｇ／ＲＤＦ／）、マルチメデイア情報記述のためのＳＶＧ（ＳｃａｌａｂｌｅＶｅｃｔｏｒＧｒａｐｈｉｃｓ）やＳＭＩＬ（ＳｙｎｃｈｒｏｎｉｚｅｄＭｕｌｔｉｍｅｄｉａＩｎｔｅｇｒａｔｉｏｎＬａｎｇｕａｇｅ）などがある。上記の特定用途のＸＭＬ文書を利用するシステムは、それぞれのシステムが処理すべきＸＭＬ文書であることを確認するために、ＸＭＬ用のスキーマ言語（ＸＭＬＳｃｈｅｍａ，ｈｔｔｐ：／／ｗｗｗ．ｗ３．ｏｒｇ／ＸＭＬ／Ｓｃｈｅｍａ）を用いて検証を行い、規定外のＸＭＬ文書を排除することで、処理対象のＸＭＬ文書のみに処理を注力することができる。
【０００５】
ＸＭＬ文書をコンピュータプログラムが処理する際には、ＸＭＬ文書が表現する木構造をコンピュータメモリ上の木構造に変換した方が便利である。このように、ＸＭＬ文書をコンピュータメモリ上の木構造として表現したものを、ＤＯＭ（ＤｏｃｕｍｅｎｔＯｂｊｅｃｔＭｏｄｅｌ）と呼ぶ。ＤＯＭは、同じくＷ３Ｃにより標準化されている。ＤＯＭは、ノードとリンクよりなるノード・リンクモデルでＸＭＬ文書を表現する。ＸＭＬ文書の要素は、ＤＯＭのノードに相当する。
【０００６】
コンピュータメモリ上のＤＯＭデータを処理するシステムを作成する際に、ＤＯＭデータ中のノードを指し示す検索式が利用できれば便利である。そのために、同じくＷ３Ｃにより、ＸＰａｔｈ（ＸＭＬＰａｔｈＬａｎｇｕａｇｅ）という表記方法が標準化されている。ＸＰａｔｈのようなパス検索式を用いることで、ＤＯＭデータ中の条件に合うノードを指し示すことができる。
【０００７】
上記のようなＸＭＬに関する様々な技術の規格化が行われ、様々なコンピュータシステムがＸＭＬをベースとして開発されるようになったため、近年、ＸＭＬ文書を格納するためのデータベースの必要性も増している。ＸＭＬを格納するデータベースには、大きく分けて、リレーショナルデータベース、オブジェクト指向データベース、文書データベースの３種類がある。
【０００８】
リレーショナルデータベースにＸＭＬ文書を格納するには、ＸＭＬ文書をリレーショナルデータベースの格納モデルである二次元の表に変換する必要がある。現在、リレーショナルモデルに基づくリレーショナルデータベース管理システム（ＲＤＢＭＳ）は、データベース管理システム（ＤＢＭＳ）の主流として、顧客管理データベースや物品管理データベースなどに広く利用されている。従って、信頼性の高いリレーショナルデータベース管理システム（ＲＤＢＭＳ）を利用することは容易であるが、ＸＭＬ文書を二次元の表形式に変換するには、元となるＸＭＬ文書の形式や利用目的を分析し、最適な変換方法を検討し、リレーショナルスキーマを設計する必要がある。そのため、設計・構築コストが高く、大規模なシステム開発には向くが、中小規模のシステム開発には不向きである。
【０００９】
オブジェクト指向データベースにＸＭＬ文書を格納するには、ＸＭＬ文書をそのままデータベースに格納すればよい。これは、オブジェクト指向データベースは、ＸＭＬ文書の基本構造である木構造をオブジェクトの親子関係として、そのままの形で格納することができるからである。そのため、システム開発のコスト低減や、構築期間の短縮が重要な中小規模のシステム開発においては、複雑なスキーマ設計が必要ないという理由から、ＸＭＬ文書を木構造データとしてデータベースに格納し、パス検索式を用いて検索を行うことが可能なオブジェクト指向データベースが盛んに利用されている。なお、以後の説明において、構造化文書を格納するオブジェクト指向データベースをオブジェクト指向構造化文書データベースと呼ぶ。
【００１０】
文書データベースにＸＭＬ文書を格納する際には、構造化文書を文章として格納する。文書データベースは、構造化文書を文章として扱い、自然言語解析を施し、索引付けを行い、データベースに格納するので、文章の類似検索が可能なデータベースである。そのため、文書データベースは、ＸＭＬ文書のうち、文章データを格納する場合に特化して利用されるが、文章を扱うシステム開発以外には、用いられない。
【００１１】
オブジェクト指向構造化文書データベースの構造化文書の格納は、図１９に示す構造化文書を図２０（ａ）のようなノードとリンクの木構造として表現し、ノードオブジェクトとその間のリンクという形式で保存することで実現されている。尚、図２０（ｂ）は図２０（ａ）の表記方法を説明している凡例であるが、これによれば、木構造には、必ずルートノードがあり、構造化文書の要素は要素ノード、属性は属性ノード、文字列はテキストノードとして格納される。
【００１２】
オブジェクト指向構造化文書データベースは、このノードとリンクの木構造に対して、木構造取得機能、木構造操作機能、およびパス検索機能の３つの機能を有する。
【００１３】
木構造取得機能は、データベースに格納された構造化文書を木構造としてアクセスし、ノード情報を取得する機能である。これにより、データベースクライアントは、木構造を辿り、ノードの情報を取得することができる。また、木構造を辿ることで、元の構造化文書を再構成することができる。例えば、図２０（ａ）に示すノードｎ_００２を基点と指定すると、図２１に示す部分構造化文書を取り出すことができる。
【００１４】
木構造操作機能は、データベースに格納された構造化文書を木構造としてアクセスし、ノード情報を操作する機能である。これにより、データベースクライアントは、基点となるノードを指定し、そのノードへ新しい子ノードの追加を行うことができる。本機能を用いると、構造化文書中に別の構造化文書を、部分構造化文書として埋め込むことができる。例えば、図２２に示す部分構造化文書をノードｎ_００２の子ノードに追加すると、図２３に示す木構造になる。この機能を部分構造化文書挿入と呼ぶ。尚、部分構造化文書挿入については、ルートノード（図２３におけるｎ_０００）を基点として構造化文書自身の挿入を指定することで、構造化文書の全文書挿入を実現することができる。
【００１５】
パス検索機能は、パス検索式により該当するノード群をノード集合として取得する機能である。パスは、複数の要素名や属性名を“／”で区切った文字列で、ＵＮＩＸ（登録商標）ＯＳなどで用いられているディレクトリパスと似た概念であり、構造化文書の木構造を辿る順序を表している。また、パス検索式には、条件式を付加することができる。条件式は、木構造を辿る際に、ノードの絞込みを行うことを指示する。図２４は、パス検索式の一例である。この例では、ｏｒｄｅｒノードの子の、ｂｏｏｋノードの子の、ａｕｔｈｏｒノードを返却することと、ｐｒｉｃｅノードの値が２００以上であるｂｏｏｋノードに限ることを表している。図２４に示すパス検索式は、ルートノードを基点とし、図２３に示すノード集合Ｎ＝｛ｎ_００５｝が返却される。
【００１６】
以上のように、ＸＭＬ文書のような構造化された文書を格納する必要がある中小規模のデータベースシステムには、オブジェクト指向データベースが適している。
【００１７】
尚、この出願に関連する先行技術文献情報としては、次のものがある。
【００１８】
【特許文献１】
特開２００１−３３１４７９
【００１９】
【発明が解決しようとする課題】
ところで、従来、オブジェクト指向データベースとアプリケーションプログラムを用いてコンピュータシステムを開発する際には、ＸＭＬ文書のような構造化文書に文書外追加情報（例えば、ＸＭＬ文書の作成者、日付、更新履歴など。以下、これをメタ情報と呼ぶ）を付加して、オブジェクト指向データベースに格納する場合が多い。このような場合には、元となるＸＭＬ文書のスキーマに任意の構造を付加する機能があるため、ＸＭＬ文書のスキーマにメタ情報を追加するという方法で対応していた。
【００２０】
しかしながら、上記の方法においては、オブジェクト指向データベースのスキーマ上では元のＸＭＬ文書とメタ情報部分は区別されないため、
（１）ＸＭＬ文書をオブジェクト指向データベースに格納する際には、ＸＭＬ文書とメタ情報を結合してから格納する
（２）オブジェクト指向データベースからＸＭＬ文書を検索する際には、元のＸＭＬ文書にメタ情報部分が含まれたまま取り出される
という処理が行われることになる。特に（２）については、扱うＸＭＬ文書がＲＤＦやＳＭＩＬなど規格化されたＸＭＬ文書である場合には、メタ情報部分が不正スキーマとしてエラーになるため、規格に合わないＸＭＬ文書をそのままＲＤＦやＳＭＩＬなどの処理に使えないという問題が発生する。
【００２１】
そのため、通常、アプリケーションが不要な部分であるメタ情報部分を削除し、ＲＤＦやＳＭＩＬの処理で扱えるようにする手段が必要となり、このことから、上記コンピュータシステムの開発においては以下の問題が発生していた。
【００２２】
（１）メタ情報の付加および削除処理にアプリケーションが対応する必要があり、開発コストが余計にかかる。
【００２３】
（２）メタ情報が増えるとその度にアプリケーションを修正する必要がある。
【００２４】
本発明は、上記の事情を鑑みたものであり、オブジェクト指向構造化文書データベースがメタ情報を格納する機能を有しなくても、アプリケーションプログラムに依存せずに、メタ情報の追加・変更・削除に柔軟に対応できる格納検索装置、格納検索方法、格納検索プログラム、および格納検索プログラム記録媒体を提供することを目的とする。
【００２５】
【課題を解決するための手段】
上記目的を達成するため、請求項１記載の本発明は、アプリケーションプログラムからの指示に基づいてオブジェクト指向構造化文書データベースにアクセスし、情報処理を行う格納検索装置であって、構造化文書とともに前記オブジェクト指向構造化文書データベースに格納される前記構造化文書のメタ情報のパスに関する情報を記憶する設定情報記憶手段と、前記アプリケーションプログラムからの格納指示のもと受け取った前記構造化文書を前記オブジェクト指向構造化文書データベースに格納するとともに、前記アプリケーションプログラムから受け取ったメタ情報を、前記設定情報記憶手段に記憶されている前記メタ情報のパスに関する情報に従って、格納された前記構造化文書に挿入し、拡張構造化文書として格納する格納手段と、前記アプリケーションプログラムからの検索指示および前記設定情報記憶手段に記憶された前記メタ情報のパスに関する情報に従って、前記オブジェクト指向構造化文書データベースに格納された前記拡張構造化文書から該当する前記構造化文書又は前記メタ情報を分離して取得する検索手段と、を有することを要旨とする。
【００２６】
請求項２記載の本発明は、請求項１記載の発明において、前記検索手段は、前記検索指示であるパス検索式と前記メタ情報のパスに関する情報の比較に基づいて、取得する文書が前記構造化文書か前記メタ情報かを判定し、判定した結果が構造化文書である場合には、取得した文書から前記格納手段で挿入した前記メタ情報を取り除くことを要旨とする。
【００２７】
請求項３記載の本発明は、アプリケーションプログラムからの指示に基づいてオブジェクト指向構造化文書データベースにアクセスし、情報処理を行う格納検索装置の格納検索方法であって、構造化文書とともに前記オブジェクト指向構造化文書データベースに格納される前記構造化文書のメタ情報のパスに関する情報を記憶する設定情報記憶ステップと、前記アプリケーションプログラムからの格納指示のもと受け取った前記構造化文書を前記オブジェクト指向構造化文書データベースに格納するとともに、前記アプリケーションプログラムから受け取ったメタ情報を、前記設定情報記憶ステップで記憶した前記メタ情報のパスに関する情報に従って、格納された前記構造化文書に挿入し、拡張構造化文書として格納する格納ステップと、前記アプリケーションプログラムからの検索指示および前記設定情報記憶ステップで記憶した前記メタ情報のパスに関する情報に従って、前記オブジェクト指向構造化文書データベースに格納された前記拡張構造化文書から該当する前記構造化文書又は前記メタ情報を分離して取得する検索ステップと、を有することを要旨とする。
【００２８】
請求項４記載の本発明は、請求項３記載の発明において、前記検索ステップは、前記検索指示であるパス検索式と前記メタ情報のパスに関する情報の比較に基づいて、取得する文書が前記構造化文書か前記メタ情報かを判定し、判定した結果が構造化文書である場合には、取得した文書から前記格納ステップで挿入した前記メタ情報を取り除くことを要旨とする。
【００２９】
請求項５記載の本発明は、請求項３又は４に記載の格納検索装置に前記各ステップを実行させる格納検索プログラムであることを要旨とする。
【００３０】
請求項６記載の本発明は、請求項５に記載された格納検索プログラムをコンピュータ読み取り可能な記録媒体に記録している格納検索プログラム記録媒体であることを要旨とする。
【００３１】
【発明の実施の形態】
以下、図面を用いて本発明の実施の形態について説明する。
【００３２】
図１は本発明の実施形態に係る格納・検索システム１の概略構成図である。図１に示す格納・検索システム１は、アプリケーション１００、格納・検索装置２００、オブジェクト指向構造化文書データベース（以下、データベースと呼ぶ）３００を備えている。尚、格納・検索システム１は、構成としては、一つからなる装置、各構成要素が分散されて複数の装置がネットワーク接続されたシステムなどのいずれの構成であっても良い。
【００３３】
アプリケーション１００は、格納・検索装置２００を利用するアプリケーションプログラムであり、その処理の中で構造化文書をデータベースに格納し、構造化文書を検索することを必要とするアプリケーションプログラムである。ここで、構造化文書を検索するために用いる検索式は上述したパス検索式である。
【００３４】
格納・検索装置２００は、アプリケーション１００から渡された構造化文書およびメタ情報をデータベース３００に格納する機能と、アプリケーション１００から渡されたパス検索式をもってデータベース３００に格納された構造化文書およびメタ情報の検索を行い、その検索結果をアプリケーション１００に返却する機能と、を有するミドルウェアプログラムが記録され、実行される装置である。
【００３５】
データベース３００は、構造化文書を格納するオブジェクト指向構造化文書データベースであり、上述した木構造取得機能、木構造操作機能、およびパス検索機能の３つの機能を有する。
【００３６】
さらに詳しくは、格納・検索装置２００は、制御装置２０１、設定情報辞書２０２、格納装置２０３および、検索装置２０４を備えている。
【００３７】
制御装置２０１は、アプリケーション１００からパス検索式を受け取ると、他の装置２０２乃至２０４を制御し、アプリケーション１００に検索結果を返却するようになっている。
【００３８】
設定情報辞書２０２は、格納・検索装置２００の動作を決定する設定情報を格納する辞書である。
【００３９】
格納装置２０３は、制御装置２０１から構造化文書およびメタ情報を受け取り、データベース３００に格納するようになっている。
【００４０】
検索装置２０４は、制御装置２０１からパス検索式を受け取り、データベース３００に対し検索を実行し、返却結果を制御装置２０１に返却するようになっている。
【００４１】
尚、設定情報辞書２０２に格納される設定情報には、データベース３００に格納する構造化文書に対して、どの位置にメタ情報を付加するかを示すメタ情報パスＰが含まれている。
【００４２】
次に、本実施の形態に係る格納・検索システム１の動作を図２乃至５を用いて説明する。ここで、図２は、格納・検索システム１の処理手順を示すフローチャート図である。図３乃至５は、図２の各ステップＳ１００、Ｓ２００、およびＳ３００を詳細に説明するフローチャート図である。
【００４３】
図２に示すように、格納・検索システム１は、まず、辞書の設定を行い、次に、アプリケーション１００からの指示により、構造化文書の格納もしくは、構造化文書の検索を行う（ステップＳ１００〜Ｓ４００）。尚、複数の構造化文書を処理する場合においては、ステップＳ１００をはじめに一度だけ行い、以降は個々の構造化文書について任意の順序でステップＳ２００およびステップＳ３００を繰り返し行う。
【００４４】
ここで、上述の各ステップについて説明する。まず、図３を用いて辞書の設定ステップＳ１００について説明する。
【００４５】
ユーザは、上述したメタ情報パスＰの一覧であるメタ情報パス集合Ｐ_１−ｎを生成し（ステップＳ１０１）、生成したメタ情報パス集合Ｐ_１−ｎを設定情報辞書２０２に対し設定入力する（ステップＳ１０２）。
【００４６】
次に、図４を用いて構造化文書の格納ステップＳ２００について説明する。
【００４７】
アプリケーション１００は、格納する構造化文書Ｄとメタ情報集合Ｍ_１−ｎを生成し、制御装置２０１に入力し、構造化文書格納を指示する（ステップＳ２０１）。なお、メタ情報Ｍ_ｉは、パスＰ_ｉに対応するメタ情報である。
【００４８】
制御装置２０１は、ステップＳ２０１で入力された構造化文書Ｄを、格納装置２０３に入力し、構造化文書の格納を指示する（ステップＳ２０２）。
【００４９】
格納装置２０３は、ステップＳ２０２で入力された構造化文書Ｄをデータベース３００に入力し、全文書挿入を指示する（ステップＳ２０３）。
【００５０】
データベース３００は、ステップＳ２０３で入力され指示された構造化文書Ｄを用いて、全文書挿入を実行する（ステップＳ２０４）。
【００５１】
次に、制御装置２０１は、ステップＳ２０１で入力されたメタ情報集合Ｍ_１−ｎ、および設定情報辞書２０２より取り出したメタ情報パス集合Ｐ_１−ｎを格納装置２０３に入力し、メタ情報格納を指示する（ステップＳ２０５）。
【００５２】
格納装置２０３は、ステップＳ２０５で入力されたメタ情報Ｍ_ｉをメタ情報パスＰ_ｉに従ってデータベース３００に入力し、部分構造化文書挿入を指示する（ステップＳ２０６）。
【００５３】
データベース３００は、ステップＳ２０６で入力され指示されたメタ情報Ｍ_ｉ、およびメタ情報パスＰ_ｉに基づいて部分構造化文書挿入を実行する（ステップＳ２０７）。この際、メタ情報パスＰ_ｉの指し示す位置にメタ情報Ｍ_ｉを挿入する。
【００５４】
次に、図５を用いて構造化文書の検索ステップＳ３００について説明する。
【００５５】
アプリケーション１００は、データベース３００より構造化文書またはメタ情報を取得するためのパス検索式Ｑを生成する（ステップＳ３０１）。この際、パス検索式Ｑの条件に、構造化文書Ｄおよびメタ情報集合Ｍ_１−ｎを指し示すパスを指定することができる。
【００５６】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑを制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００５７】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑを検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００５８】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑをデータベース３００に入力し、検索実行を指示すると、データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ_１−ｍを返却する（ステップＳ３０４）。
【００５９】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ_１−ｍを、制御装置２０１に返却する（ステップＳ３０５）。
【００６０】
制御装置２０１は、設定情報辞書２０２よりメタ情報パス集合Ｐ_１−ｎを取得し、該メタ情報パス集合Ｐ_１−ｎと、パス検索式Ｑから条件を除いたパスＰ_Ｑと、を比較する（ステップＳ３０６）。これは、具体的には、メタ情報パス集合Ｐ_１−ｎ中の全てのメタ情報パスＰ_ｋについて、パスＰ_Ｑがメタ情報パスＰ_ｋ自身またはその子孫ノードを指し示すかどうかで判定するものである。パスＰ_Ｑがメタ情報パスＰ_ｋ自身またはその子孫ノードを指し示さない場合には、パス検索式Ｑは構造化文書Ｄの部分構造化文書集合を指し示すものとみなし、ステップＳ３０８を実行する。これに対して、パスＰ_Ｑがメタ情報パスＰ_ｋ自身またはその子孫ノードを指し示す場合には、パス検索式Ｑはメタ情報集合Ｍ_１−ｎの部分構造化文書集合を指し示すものとみなし、ステップＳ３０９を実行する（ステップＳ３０７）。
【００６１】
パスＰ_Ｑがメタ情報パスＰ_ｋ自身またはその子孫ノードを指し示さない場合には、制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ_１−ｍの個々のノードＮ_ｊについて、部分構造化文書取得を実行する（ステップＳ３０８）。部分構造化文書取得は、ノードＮ_ｊ以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。ただし、この際、設定情報辞書２０２より、メタ情報パス集合Ｐ_１−ｎを取得し、これらのパスに該当するノードに関しては取得しない。これにより生成されるノードＮ_ｊを頂点とする部分構造化文書を部分構造化文書Ｅ_ｊとする。最終的に、ノード集合Ｎ_１−ｍの全てのノードについて部分構造化文書を生成し、部分構造化文書集合Ｅ_１−ｍを得る。
【００６２】
これに対して、パスＰ_Ｑがメタ情報パスＰ_ｋ自身またはその子孫ノードを指し示す場合には、制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ_１−ｍの個々のノードＮ_ｊについて、部分構造化文書取得を実行する（ステップＳ３０９）。部分構造化文書取得は、ノードＮ_ｊ以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。これにより生成されるノードＮ_ｊを頂点とする部分構造化文書を部分構造化文書Ｅ_ｊとする。最終的に、ノード集合Ｎ_１−ｍの全てのノードについて、部分構造化文書を生成し、部分構造化文書集合Ｅ_１−ｍを得る。
【００６３】
制御装置２０１は、ステップＳ３０８又はＳ３０９で生成した部分構造化文書集合Ｅ_１−ｍをアプリケーション１００に返却する（ステップＳ３１０）。
【００６４】
次に、具体的に、構造化文書としてＸＭＬ（ｅＸｔｅｎｓｉｂｌｅＭａｒｋｕｐＬａｎｇｕａｇｅ）、データベース３００は、パス検索式としてＸＰａｔｈ（ＸＭＬＰａｔｈＬａｎｇｕａｇｅ）をサポートするデータベース（以下、ＸＭＬＤＢと呼ぶ）を用いた場合の格納・検索システム１について説明する。
【００６５】
この格納・検索システム１は、上述した図２のフローチャートに示す動作を行う。ここで、実際に処理においては、アプリケーション１００の利用目的とユーザの操作に応じて、任意の順序でステップＳ２００およびＳ３００を必要回数繰り返すが、説明上、ステップＳ１００乃至Ｓ３００を１度のみ行うものとする。
【００６６】
まず、図３を用いて辞書の設定ステップＳ１００について説明する。
【００６７】
ユーザは、設定情報辞書２０２に対して、図６に示すメタ情報パスＰ_１およびＰ_２の一覧であるメタ情報パス集合Ｐ_１−２を生成する（ステップＳ１０１）。
【００６８】
そして、ユーザは、ステップＳ１０１で生成したメタ情報パス集合Ｐ_１−２を、設定情報辞書２０２に対し設定する（ステップＳ１０２）。
【００６９】
次に、図４を用いて構造化文書の格納ステップＳ２００について説明する。
【００７０】
アプリケーション１００は、図７に示す構造化文書Ｄと、図８に示すメタ情報集合Ｍ_１−２を生成し、制御装置２０１に入力し、構造化文書格納を指示する（ステップＳ２０１）。なお、メタ情報Ｍ_ｉ（ｉ＝１，２）は、パスＰ_ｉ（ｉ＝１，２）に対応するメタ情報である。
【００７１】
制御装置２０１は、ステップＳ２０１で入力された構造化文書Ｄを、格納装置２０３に入力し、構造化文書の格納を指示する（ステップＳ２０２）。
【００７２】
格納装置２０３は、ステップＳ２０２で入力された構造化文書Ｄをデータベース３００に入力し、全文書挿入を指示する（ステップＳ２０３）。
【００７３】
データベース３００は、ステップＳ２０３で入力され指示された構造化文書Ｄを用いて、全文書挿入を実行する（ステップＳ２０４）。格納された構造化文書Ｄのデータベース内での構造を図９に示す。
【００７４】
制御装置２０１は、ステップＳ２０１で入力されたメタ情報集合Ｍ_１−２、および設定情報辞書２０２より取り出されたメタ情報パス集合Ｐ_１−２を格納装置２０３に入力し、メタ情報格納を指示する（ステップＳ２０５）。
【００７５】
格納装置２０３は、ステップＳ２０５で入力されたメタ情報Ｍ_ｉをメタ情報パスＰ_ｉに従って、データベース３００に入力し、部分構造化文書挿入を指示する（ステップＳ２０６）。
【００７６】
データベース３００は、ステップＳ２０６で入力され指示されたメタ情報Ｍ_ｉ、およびメタ情報パスＰ_ｉに基づいて部分構造化文書挿入を実行する（ステップＳ２０７）。この際、メタ情報パスＰ_ｉの指し示す位置にメタ情報Ｍ_ｉを挿入する。挿入された構造化文書Ｄとメタ情報集合Ｍ_１−２のデータベース内での構造を図１０に示す。図１０においては、メタ情報パスＰ_１の示すｎ_００６の位置にメタ情報Ｍ_１が、メタ情報パスＰ_２の示すｎ_００７の位置にメタ情報Ｍ_２が挿入されている。
【００７７】
次に、図５を用いて構造化文書の検索ステップＳ３００について説明する。
【００７８】
アプリケーション１００は、データベース３００より構造化文書を取得するための図１１に示すパス検索式Ｑを生成する（ステップＳ３０１）。図１１に示すパス検索式Ｑは、条件としてメタ情報を指定し（メタ情報であるｆｉｌｅｎａｍｅ属性が‘ｈｏｍｅｐａｇｅ１．ｘｍｌ’であるもの）、構造化文書Ｄの部分構造化文書取得を表している（パス“／ＲＤＦ”配下の部分構造化文書を取得）。
【００７９】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑを制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００８０】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑを検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００８１】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑをデータベース３００に入力し、検索実行を指示すると、データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ_１−ｍを返却する（ステップＳ３０４）。返却されるノード集合Ｎ_１−ｍ（ｍ＝１であり、Ｎ_１）を図１２に示す。
【００８２】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ_１を、制御装置２０１に返却する（ステップＳ３０５）。
【００８３】
制御装置２０１は、設定情報辞書２０２より、メタ情報パス集合Ｐ_１−２を取得し、パス検索式Ｑから条件を除いたパスＰ_Ｑと比較する（ステップＳ３０６）。この例におけるパスＰ_Ｑを図１３に示す。パスＰ_Ｑが指し示すパスはルートノードの子ノードの“ＲＤＦ”ノードである。メタ情報パス集合Ｐ_１−２中の全てのメタ情報パスＰ_ｋについて、パスＰ_Ｑが指し示すノードが、メタ情報パスＰ_ｋ自身かその子孫ノードであるようなメタ情報パスＰ_ｋが存在しないので、パス検索式Ｑは構造化文書Ｄの部分構造化文書集合を返却するものとみなしステップＳ３０８を実行する（ステップＳ３０７）。
【００８４】
制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ_１のノードＮ_１について、部分構造化文書取得を実行する（ステップＳ３０８）。部分構造化文書取得は、ノードＮ_１以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。ただし、この際、設定情報辞書２０２より、図６に示すメタ情報パス集合Ｐ_１−２を取得し、これらのパスに該当するノードは取得しない。これにより生成されるノードＮ_１を頂点とする部分構造化文書を部分構造化文書Ｅ_１とする。この具体例においては、ノード集合Ｎ_１のノードはノードＮ_１だけであるので、ノードＮ_１より部分構造化文書集合Ｅ_１を得る。図１４に生成される部分構造化文書集合Ｅ_１を示す。この時、データベース３００内の木構造に付加されていたメタ情報は、部分構造化文書集合Ｅ_１には付加されず、元の構造化文書Ｄに含まれている要素だけが出力される。
【００８５】
制御装置２０１は、ステップＳ３０８で生成した部分構造化文書集合Ｅ_１をアプリケーション１００に返却する（ステップＳ３１０）。
【００８６】
次に、ステップＳ３０７において、パス検索式Ｑがメタ情報を取得する場合の処理を以下に示す。
【００８７】
アプリケーション１００は、データベース３００より構造化文書を取得するための図１５に示すパス検索式Ｑ’を生成する（ステップＳ３０１）。パス検索式Ｑ’は、条件として構造化文書を指定し（構造化文書の“／ＲＤＦ／Ｄｅｓｃｒｉｐｔｉｏｎ／ｄｃ：ｃｒｅａｔｏｒ”要素が‘春日’であるもの）、メタ情報の部分構造化文書取得を表している（パス“／ＲＤＦ／ｃｈａｎｇｅ＿ｌｏｇ／ｌｏｇ”配下の部分構造化文書を取得）。
【００８８】
アプリケーション１００は、ステップＳ３０１で生成したパス検索式Ｑ’を制御装置２０１に入力し、検索実行を指示する（ステップＳ３０２）。
【００８９】
制御装置２０１は、ステップＳ３０２で入力されたパス検索式Ｑ’を検索装置２０４に入力し、検索実行を指示する（ステップＳ３０３）。
【００９０】
検索装置２０４は、ステップＳ３０３で入力されたパス検索式Ｑ’をデータベース３００に入力し、検索実行を指示する（ステップＳ３０４）。データベース３００は、検索を実行し、検索装置２０４にノード集合Ｎ’_１−ｍを返却する。返却されるノード集合Ｎ’_１−ｍ（ｍ＝１であり、Ｎ’_１）を図１６に示す。
【００９１】
検索装置２０４は、ステップＳ３０４で返却されたノード集合Ｎ’_１を制御装置２０１に返却する（ステップＳ３０５）。
【００９２】
制御装置２０１は、設定情報辞書２０２よりメタ情報パス集合Ｐ_１−２を取得し、パス検索式Ｑから条件を除いたパスＰ_Ｑ _’と比較する（ステップＳ３０６）。この例におけるパスＰ_Ｑ _’を図１７に示す。パスＰ_Ｑが指し示すパスはメタ情報集合Ｍ_２の子ノードの“ｌｏｇ”ノードである。メタ情報パス集合Ｐ_１−２中の全てのメタ情報パスＰ_ｋについて、パスＰ_Ｑが指し示すノードが、メタ情報パスＰ_ｋ自身かその子孫ノードであるようなメタ情報パスＰ_ｋが存在する（「メタ情報パスＰ_２」が該当）ので、パス検索式Ｑ’はメタ情報集合Ｍ_１−２の部分構造化文書集合を返却するものとみなしステップＳ３０９を実行する（ステップＳ３０７）。
【００９３】
制御装置２０１は、ステップＳ３０５で返却されたノード集合Ｎ’_１のノードＮ’_１について、部分構造化文書取得を実行する（ステップＳ３０９）。部分構造化文書取得は、ノードＮ’_１以下の子孫ノードを全て取得し、構造化文書に組み立てることで行う。これにより生成されるノードＮ’_１を頂点とする部分構造化文書を部分構造化文書Ｅ’_１とする。この具体例においては、ノード集合Ｎ’_１−のノードはノードＮ’_１だけであるので、ノードＮ’_１より部分構造化文書集合Ｅ’_１を得る。図１８に生成される部分構造化文書集合Ｅ’_１を示す。
【００９４】
制御装置２０１は、ステップＳ３０９で生成した部分構造化文書集合Ｅ’_１をアプリケーション１００に返却する（ステップＳ３１０）。
【００９５】
従って、本実施の形態の格納・検索システム１によれば、メタ情報を格納する機能のないデータベース３００とアプリケーション１００の間に、ミドルウェアとしての格納・検索装置２００を用いることで、アプリケーション１００に依存せずに、メタ情報の追加・変更・削除に柔軟に対応することができるので、システム設計・開発の利便性の向上を図ることができる。
【００９６】
具体的には、オブジェクト指向構造化文書データベースに構造化文書を格納する際には、メタ情報を合わせて格納することができ、構造化文書を検索する際には、構造化文書とメタ情報を別々に取得することができる。また、構造化文書を取得する際には、メタ情報を条件として検索することが可能となり、メタ情報を取得する際には、構造化文書を条件として検索することが可能となる。
尚、上記実施の形態の格納・検索装置２００に格納されたミドルウェアプログラムは、ハードディスク、フレキシブルディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ−ＲＯＭなどのコンピュータ読み取り可能な記録媒体に記録することも、通信ネットワークを介して配信することも可能である。
【００９７】
【発明の効果】
以上説明したように、本発明によれば、オブジェクト指向構造化文書データベースがメタ情報を格納する機能を有しなくても、アプリケーションプログラムに依存せずに、メタ情報の追加・変更・削除に柔軟に対応できる格納検索装置、格納検索方法、格納検索プログラム、および格納検索プログラム記録媒体を提供することができる。
【００９８】
これにより、メタ情報を格納する機能を有しないオブジェクト指向構造化文書データベースを利用して、構造化文書およびメタ情報を格納・検索するコンピュータシステムのシステム開発コストを低減させることができる。
【図面の簡単な説明】
【図１】本発明の実施の形態に係る格納・検索システムの概略構成図である。
【図２】本発明の実施の形態に係る格納・検索システムの動作を示すフローチャートである
【図３】本発明の実施の形態に係る格納・検索システムの辞書の設定動作を示すフローチャートである。
【図４】本発明の実施の形態に係る格納・検索システムの構造化文書の格納動作を示すフローチャートである。
【図５】本発明の実施の形態に係る格納・検索システムの構造化文書の検索動作を示すフローチャートである。
【図６】メタ情報パスの一例である。
【図７】構造化文書の一例である。
【図８】メタ情報の一例である。
【図９】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図１０】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図１１】パス検索式の一例である。
【図１２】ノード集合の一例である。
【図１３】パスの一例である。
【図１４】部分構造化文書の一例である。
【図１５】パス検索式の一例である。
【図１６】ノード集合の一例である。
【図１７】パスの一例である。
【図１８】部分構造化文書の一例である。
【図１９】構造化文書の一例である。
【図２０】オブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図２１】取り出された部分構造化文書の一例である。
【図２２】挿入する部分構造化文書の一例である。
【図２３】挿入後のオブジェクト指向構造化文書データベースに格納された構造化文書の一例である。
【図２４】パス検索式の一例である。
【符号の説明】
１格納・検索システム
１００アプリケーション
２００格納・検索装置
２０１制御装置
２０２設定情報辞書
２０３格納装置
２０４検索装置
３００オブジェクト指向構造化データベース

[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to middleware used in a computer system that stores and retrieves structured documents, and more particularly to middleware that stores and retrieves structured documents and meta information in an object-oriented database that has no function for storing meta information.
[0002]
[Prior art]
In recent years, structured documents such as XML (eXtensible Markup Language) have come to be used as data formats for sharing various kinds of information on the Internet. XML is a structured document standard that was standardized in December 1997 by the standards body W3C (World Wide Web Consortium). Data written in accordance with the XML standard is called an XML document.
[0003]
An XML document can be read and edited by a person. At the same time, it is structured with tags, so it is also data that a computer program can process easily. A tag in an XML document appears as a character string enclosed in “<” and “>” and embedded in the document. Tags come in start tags and end tags, and the region enclosed by a start tag and its end tag is called an element. Elements can be nested: an element can have multiple child elements, each of which can have multiple grandchild elements. An XML document can therefore represent a multi-level tree structure.
[0004]
Today, XML documents represent a wide variety of information, and XML is applied to specific purposes by defining tagging rules on top of the XML standard. Examples include RosettaNet (http://www.rosettanet.gr.jp/) and ebXML (http://www.ebxml.org/) for cooperation between companies, RDF (Resource Definition Framework, http://www.w3.org/RDF/) for describing resource information, and SVG (Scalable Vector Graphics) and SMIL (Synchronized Multimedia Integration Language) for describing multimedia information. To confirm that an XML document is one it should process, a system that uses such purpose-specific XML documents validates the document with a schema language for XML (XML Schema, http://www.w3.org/XML/Schema) and rejects non-conforming XML documents, so that it can concentrate its processing on only the XML documents it targets.
[0005]
When a computer program processes an XML document, it is convenient to convert the tree structure that the XML document represents into a tree structure in computer memory. Such an in-memory tree representation of an XML document is called a DOM (Document Object Model). DOM is likewise standardized by the W3C. DOM represents an XML document with a node-link model consisting of nodes and links. The elements of an XML document correspond to DOM nodes.
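The node-link view can be illustrated with Python's standard `xml.dom.minidom` module. This is only a sketch: the sample document and the helper name `kind` are assumptions made for illustration and do not come from the patent or its figures.

```python
from xml.dom import Node, minidom

# Hypothetical sample document; element and attribute names are illustrative.
doc = minidom.parseString('<order id="7"><book>XML Basics</book></order>')

order = doc.documentElement          # element node for <order>
book = order.firstChild              # element node for <book>
text = book.firstChild               # text node holding "XML Basics"
attribute = order.getAttributeNode("id")


def kind(node):
    """Map a DOM node type constant to a readable label."""
    names = {
        Node.DOCUMENT_NODE: "document",
        Node.ELEMENT_NODE: "element",
        Node.ATTRIBUTE_NODE: "attribute",
        Node.TEXT_NODE: "text",
    }
    return names.get(node.nodeType, "other")


for label, node in [("doc", doc), ("order", order), ("id", attribute), ("text", text)]:
    print(label, kind(node))
```

Links between nodes are exposed as properties such as `parentNode` and `firstChild`, which is what lets a program walk the tree in either direction.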
[0006]
When building a system that processes DOM data in computer memory, it is convenient to have a search expression that can point to nodes in the DOM data. For this purpose, a notation called XPath (XML Path Language) has likewise been standardized by the W3C. With a path search formula such as XPath, the nodes in the DOM data that satisfy a condition can be indicated.
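As a rough illustration of pointing at nodes with a path, the sketch below uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. The sample document and the names in it are assumptions for illustration, and `ElementTree` implements only part of XPath, so this is not a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
order = ET.fromstring(
    "<order>"
    "<book><title>XML Basics</title><author>Sato</author></book>"
    "<book><title>DOM Guide</title><author>Suzuki</author></book>"
    "</order>"
)

# A plain path: every author under every book under the order element.
all_authors = [a.text for a in order.findall("./book/author")]

# The same path with a condition that narrows the book nodes by title.
selected = [a.text for a in order.findall("./book[title='DOM Guide']/author")]

print(all_authors)
print(selected)
```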
[0007]
Because the various XML-related technologies described above have been standardized and a wide range of computer systems are now developed on the basis of XML, the need for databases that store XML documents has also grown in recent years. Databases that store XML fall broadly into three types: relational databases, object-oriented databases, and document databases.
[0008]
To store an XML document in a relational database, the XML document must be converted into two-dimensional tables, which are the storage model of a relational database. At present, relational database management systems (RDBMS), which are based on the relational model, are the mainstream database management systems (DBMS) and are widely used for customer management databases, article management databases, and the like. A highly reliable RDBMS is therefore easy to obtain and use. However, converting an XML document into two-dimensional table form requires analyzing the format and intended use of the original XML document, working out the best conversion method, and designing a relational schema. As a result, design and construction costs are high, so this approach suits large-scale system development but not small- and medium-scale system development.
[0009]
To store an XML document in an object-oriented database, the XML document can simply be stored in the database as it is. This is because an object-oriented database can store the tree structure, which is the basic structure of an XML document, directly as parent-child relationships between objects. Therefore, in small- and medium-scale system development, where reducing development cost and shortening the construction period are important, object-oriented databases that can store XML documents as tree-structured data and search them with path search formulas are widely used, because no complex schema design is required. In the following description, an object-oriented database that stores structured documents is called an object-oriented structured document database.
[0010]
When an XML document is stored in a document database, the structured document is stored as text. A document database treats a structured document as text, applies natural language analysis and indexing to it, and then stores it, which makes similarity search over text possible. For this reason, document databases are used specifically for storing the textual data in XML documents and are not used outside the development of systems that handle text.
[0011]
In an object-oriented structured document database, a structured document such as the one shown in FIG. 19 is stored by representing it as a tree of nodes and links, as shown in FIG. 20(a), and saving it in the form of node objects and the links between them. FIG. 20(b) is a legend explaining the notation of FIG. 20(a). According to it, the tree structure always has a root node, and the elements of the structured document are stored as element nodes, attributes as attribute nodes, and character strings as text nodes.
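A minimal sketch of such a node-and-link storage model follows. The `Node` class, the `load` function, and the sample document are hypothetical; FIG. 19 and FIG. 20 are not reproduced here, and a real database would persist these objects rather than hold them in memory. For brevity the sketch ignores text that follows a child element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "root", "element", "attribute", or "text"
    name: str = ""
    value: str = ""
    children: list = field(default_factory=list)   # links to child nodes


def to_node(element):
    """Convert one parsed element and its descendants into Node objects."""
    node = Node("element", element.tag)
    for attr_name, attr_value in element.attrib.items():
        node.children.append(Node("attribute", attr_name, attr_value))
    if element.text and element.text.strip():
        node.children.append(Node("text", value=element.text.strip()))
    for child in element:
        node.children.append(to_node(child))
    return node


def load(xml_text):
    """Build a tree that always starts with a root node."""
    return Node("root", children=[to_node(ET.fromstring(xml_text))])


tree = load("<order id='7'><book>XML Basics</book></order>")
order = tree.children[0]
print([child.kind for child in order.children])
```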
[0012]
The object-oriented structured document database has three functions of a tree structure acquisition function, a tree structure operation function, and a path search function for the tree structure of the nodes and links.
[0013]
The tree structure acquisition function accesses a structured document stored in the database as a tree structure and acquires node information. This allows a database client to traverse the tree structure and acquire node information. By traversing the tree structure, the client can also reconstruct the original structured document. For example, when node n_002 shown in FIG. 20(a) is designated as the base point, the partially structured document shown in FIG. 21 can be extracted.
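The idea of extracting a partially structured document from a base node can be sketched with `xml.etree.ElementTree`. The sample tree below is hypothetical and is not the content of FIG. 20 or FIG. 21.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document.
order = ET.fromstring(
    "<order>"
    "<book><title>XML Basics</title><author>Sato</author></book>"
    "<note>rush</note>"
    "</order>"
)

# Designate a base node, then rebuild the document fragment below it.
base = order.find("book")
partial = ET.tostring(base, encoding="unicode")

# Walking the subtree visits the base node and all of its descendants.
visited = [node.tag for node in base.iter()]

print(partial)
print(visited)
```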
[0014]
The tree structure operation function accesses a structured document stored in the database as a tree structure and manipulates node information. With it, a database client can designate a base node and add a new child node to that node. Using this function, another structured document can be embedded in a structured document as a partially structured document. For example, when the partially structured document shown in FIG. 22 is added as a child of node n_002, the tree structure shown in FIG. 23 is obtained. This function is called partially structured document insertion. Whole-document insertion of a structured document can be realized through partially structured document insertion by designating the root node (n_000 in FIG. 23) as the base point and inserting the structured document itself.
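Partially structured document insertion can be sketched in the same way. The documents below are hypothetical stand-ins for FIG. 22 and FIG. 23, and the in-memory tree is only an assumption standing in for the database.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document and fragment to insert.
order = ET.fromstring("<order><book><title>XML Basics</title></book></order>")
fragment = ET.fromstring("<author><name>Sato</name></author>")

# Designate the base node and add the fragment as its new child.
base = order.find("book")
base.append(fragment)

result = ET.tostring(order, encoding="unicode")
print(result)
```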
[0015]
The path search function acquires the group of nodes that match a path search formula as a node set. A path is a character string in which multiple element names and attribute names are separated by “/”. It is a concept similar to the directory paths used in UNIX (registered trademark) and other operating systems, and it represents the order in which the tree structure of the structured document is traversed. A conditional expression can also be added to a path search formula. The conditional expression instructs that nodes be narrowed down while the tree structure is traversed. FIG. 24 is an example of a path search formula. This example specifies that author nodes that are children of book nodes, which are in turn children of the order node, are to be returned, and that only book nodes whose price node has a value of 200 or more are considered. The path search formula shown in FIG. 24 takes the root node as the base point and returns the node set N = {n_005} shown in FIG. 23.
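The search described for FIG. 24 can be approximated as follows. FIG. 24 itself is not reproduced in this text, so the sample data and the function name `authors_of_books_priced_at_least` are assumptions. Because `xml.etree.ElementTree` does not support numeric comparisons in predicates, the price condition is applied in ordinary Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document with two books.
order = ET.fromstring(
    "<order>"
    "<book><author>Sato</author><price>150</price></book>"
    "<book><author>Suzuki</author><price>250</price></book>"
    "</order>"
)


def authors_of_books_priced_at_least(order_element, minimum):
    """Follow order -> book -> author, keeping only books with price >= minimum."""
    matches = []
    for book in order_element.findall("book"):
        price = book.findtext("price")
        if price is not None and float(price) >= minimum:
            matches.extend(book.findall("author"))
    return matches


node_set = authors_of_books_priced_at_least(order, 200)
names = [author.text for author in node_set]
print(names)
```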
[0016]
As described above, an object-oriented database is suitable for a small- to medium-sized database system that needs to store a structured document such as an XML document.
[0017]
Prior art document information related to this application includes the following.
[0018]
[Patent Document 1]
JP-A-2001-331479
[0019]
[Problems to be solved by the invention]
Conventionally, when a computer system is developed using an object-oriented database and an application program, information external to the document (for example, the creator, date, and update history of an XML document; hereinafter referred to as meta information) is often added to a structured document such as an XML document before it is stored in the object-oriented database. In such cases, because arbitrary structure can be added to the schema of the original XML document, the meta information has been accommodated by adding it to the schema of the XML document.
[0020]
However, in the above method, the original XML document and the meta information portion are not distinguished in the schema of the object-oriented database, so processing proceeds as follows:
(1) When an XML document is stored in the object-oriented database, the XML document and the meta information are combined before being stored.
(2) When an XML document is retrieved from the object-oriented database, the original XML document is extracted with the meta information portion still included.
Regarding (2) in particular, when the XML document being handled is a standardized XML document such as RDF or SMIL, the meta information portion causes an error as an invalid schema, so the non-conforming XML document cannot be used as-is for RDF or SMIL processing.
[0021]
Therefore, a means is normally needed by which the application deletes the unneeded meta information portion so that the document can be handled by RDF or SMIL processing, and this has caused the following problems in the development of such computer systems.
[0022]
(1) The application needs to cope with the addition and deletion of the meta information, and the development cost is increased.
[0023]
(2) It is necessary to modify the application each time the meta information increases.
[0024]
The present invention has been made in view of the above circumstances, and its object is to provide a storage search device, a storage search method, a storage search program, and a storage search program recording medium that can flexibly handle the addition, change, and deletion of meta information without depending on an application program, even if the object-oriented structured document database has no function for storing meta information.
[0025]
[Means for Solving the Problems]
In order to achieve the above object, the present invention according to claim 1 is a storage and retrieval device that accesses an object-oriented structured document database based on instructions from an application program and performs information processing, the device comprising: setting information storage means for storing information on the path of meta information of a structured document, the meta information being stored in the object-oriented structured document database together with the structured document; storage means for storing the structured document, received under a storage instruction from the application program, in the object-oriented structured document database, and for inserting the meta information received from the application program into the stored structured document in accordance with the information on the path of the meta information stored in the setting information storage means, so that it is stored as an extended structured document; and search means for separating and acquiring the relevant structured document or meta information from the extended structured document stored in the object-oriented structured document database, in accordance with a search instruction from the application program and the information on the path of the meta information stored in the setting information storage means.
[0026]
According to a second aspect of the present invention, in the first aspect, the search means determines, based on a comparison between the path search formula serving as the search instruction and the information on the path of the meta information, whether the document to be acquired is the structured document or the meta information, and, when the result of the determination is the structured document, removes the meta information inserted by the storage means from the acquired document.
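The store, compare, and strip behavior described in these two aspects can be sketched as below. This is a simplified illustration under several assumptions, not the patented implementation: the class `StorageSearchMiddleware`, the helper `steps_of`, and the sample RDF-like document are hypothetical; an in-memory `xml.etree.ElementTree` tree stands in for the object-oriented structured document database; only one document is stored; meta information is limited to element nodes; and conditions are removed from the query with a simple regular expression.

```python
import copy
import re
import xml.etree.ElementTree as ET


def steps_of(path):
    """Split an absolute path such as '/RDF/change_log' into its steps."""
    return [step for step in path.split("/") if step]


class StorageSearchMiddleware:
    """Hypothetical stand-in for the storage and retrieval device plus database."""

    def __init__(self, meta_paths):
        self.meta_paths = list(meta_paths)   # setting information (meta paths)
        self.root = ET.Element("root")       # plays the role of the root node

    def store(self, document_xml, meta_fragments):
        # Whole-document insertion, then one partial insertion per meta path.
        self.root.append(ET.fromstring(document_xml))
        for path, fragment in zip(self.meta_paths, meta_fragments):
            parent_steps = steps_of(path)[:-1]
            parent = self.root.find("/".join(parent_steps)) if parent_steps else self.root
            parent.append(ET.fromstring(fragment))

    def search(self, query):
        nodes = self.root.findall(query.lstrip("/"))
        # Path with conditions removed, compared against every meta path.
        query_steps = steps_of(re.sub(r"\[[^\]]*\]", "", query))
        is_meta = any(
            query_steps[: len(steps_of(p))] == steps_of(p) for p in self.meta_paths
        )
        results = []
        for node in nodes:
            subtree = copy.deepcopy(node)
            if not is_meta:
                self._strip_meta(subtree, query_steps)
            results.append(ET.tostring(subtree, encoding="unicode"))
        return results

    def _strip_meta(self, subtree, query_steps):
        for path in self.meta_paths:
            meta_steps = steps_of(path)
            below = (meta_steps[: len(query_steps)] == query_steps
                     and len(meta_steps) > len(query_steps))
            if not below:
                continue  # this meta node is not under the result node
            rest = meta_steps[len(query_steps):]
            parents = subtree.findall("/".join(rest[:-1])) if rest[:-1] else [subtree]
            for parent in parents:
                for child in parent.findall(rest[-1]):
                    parent.remove(child)


middleware = StorageSearchMiddleware(["/RDF/change_log"])
middleware.store(
    "<RDF><creator>Kasuga</creator><title>Home</title></RDF>",
    ["<change_log><log>created</log></change_log>"],
)
documents = middleware.search("/RDF[change_log]")
logs = middleware.search("/RDF[creator='Kasuga']/change_log/log")
print(documents)
print(logs)
```

In this sketch the first query uses the meta information only as a condition and gets back the original document without it, while the second query uses document content as a condition and gets back only the meta information.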
[0027]
The present invention according to claim 3 is a storage and retrieval method of a storage and retrieval device that accesses an object-oriented structured document database based on instructions from an application program and performs information processing, the method comprising: a setting information storing step of storing information on the path of meta information of a structured document, the meta information being stored in the object-oriented structured document database together with the structured document; a storing step of storing the structured document, received under a storage instruction from the application program, in the object-oriented structured document database, and of inserting the meta information received from the application program into the stored structured document in accordance with the information on the path of the meta information stored in the setting information storing step, so that it is stored as an extended structured document; and a search step of separating and acquiring the relevant structured document or meta information from the extended structured document stored in the object-oriented structured document database, in accordance with a search instruction from the application program and the information on the path of the meta information stored in the setting information storing step.
[0028]
According to a fourth aspect of the present invention, in the third aspect, the search step determines, based on a comparison between the path search expression given as the search instruction and the information on the path of the meta information, whether the document to be acquired is the structured document or the meta information, and, if the result of the determination is the structured document, removes the meta information inserted in the storing step from the acquired document.
[0029]
According to a fifth aspect of the present invention, there is provided a storage and retrieval program for causing the storage and retrieval device according to the third or fourth aspect to execute the above-described steps.
[0030]
According to a sixth aspect of the present invention, there is provided a storage and retrieval program recording medium which records the storage and retrieval program according to the fifth aspect on a computer-readable recording medium.
[0031]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0032]
FIG. 1 is a schematic configuration diagram of a storage and retrieval system 1 according to an embodiment of the present invention. The storage / retrieval system 1 shown in FIG. 1 includes an application 100, a storage / retrieval device 200, and an object-oriented structured document database (hereinafter, referred to as a database) 300. The storage / retrieval system 1 may have any configuration such as a single device or a system in which each component is distributed and a plurality of devices are connected to a network.
[0033]
The application 100 is an application program that uses the storage / retrieval apparatus 200, and is an application program that needs to store a structured document in a database and search for the structured document during the processing. Here, the search formula used for searching the structured document is the above-described path search formula.
[0034]
The storage / retrieval apparatus 200 has a function of storing the structured document and the meta information passed from the application 100 in the database 300, and a function of searching the structured documents and meta information stored in the database 300 using the path search formula passed from the application 100 and returning the search result to the application 100.
[0035]
The database 300 is an object-oriented structured document database that stores structured documents, and has the three functions of the above-described tree structure acquisition function, tree structure operation function, and path search function.
[0036]
More specifically, the storage / search device 200 includes a control device 201, a setting information dictionary 202, a storage device 203, and a search device 204.
[0037]
When receiving the path search formula from the application 100, the control device 201 controls the other devices 202 to 204 and returns the search result to the application 100.
[0038]
The setting information dictionary 202 is a dictionary that stores setting information that determines the operation of the storage / retrieval apparatus 200.
[0039]
The storage device 203 receives the structured document and the meta information from the control device 201 and stores them in the database 300.
[0040]
The search device 204 receives the path search formula from the control device 201, executes a search on the database 300, and returns the search result to the control device 201.
[0041]
Note that the setting information stored in the setting information dictionary 202 includes a meta information path P indicating where to add the meta information to the structured document stored in the database 300.
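By way of illustration only, the arrangement of the components 201 to 204 described above might be sketched in Python as follows. The class, attribute, and function names are hypothetical and do not appear in the embodiment, the storage device 203 and the search device 204 are represented by injected callables, and the two storage instructions of the embodiment are collapsed into one call for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SettingInfoDictionary:
    """Stand-in for the setting information dictionary 202 (hypothetical name)."""
    meta_paths: List[str] = field(default_factory=list)  # meta information paths P


@dataclass
class ControlDevice:
    """Stand-in for the control device 201; collaborators are injected callables."""
    dictionary: SettingInfoDictionary
    store_fn: Callable[[str, List[str], List[str]], None]  # role of storage device 203
    search_fn: Callable[[str], list]                        # role of search device 204

    def store(self, document: str, meta_set: List[str]) -> None:
        # Hand the document, its meta information, and the registered paths
        # to the storage device (one call here; two instructions in the text).
        self.store_fn(document, meta_set, self.dictionary.meta_paths)

    def search(self, query: str) -> list:
        # Forward the path search expression and pass the result back.
        return self.search_fn(query)


# Usage with recording stubs in place of a real database (illustrative values).
calls = []
control = ControlDevice(
    dictionary=SettingInfoDictionary(["/RDF/change_log"]),
    store_fn=lambda d, m, p: calls.append((d, m, p)),
    search_fn=lambda q: [q],
)
control.store("<RDF/>", ["<change_log/>"])
result = control.search("/RDF")
```

The point of the sketch is only that the application talks to the control device alone, and that the meta information paths come from the dictionary rather than from the application.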
[0042]
Next, the operation of the storage and retrieval system 1 according to the present embodiment will be described with reference to FIGS. 2 to 5. Here, FIG. 2 is a flowchart illustrating the processing procedure of the storage / retrieval system 1, and FIGS. 3 to 5 are flowcharts illustrating steps S100, S200, and S300 of FIG. 2 in detail.
[0043]
As shown in FIG. 2, the storage / search system 1 first sets the dictionary, and then stores or searches for structured documents according to instructions from the application 100 (steps S100 to S400). When processing a plurality of structured documents, step S100 is performed only once at the beginning, and thereafter steps S200 and S300 are repeated for each structured document in an arbitrary order.
[0044]
Here, each of the above steps will be described. First, the dictionary setting step S100 will be described with reference to FIG.
[0045]
The user generates a meta information path set P_1-n, which is a list of the above-described meta information paths P (step S101), and inputs the generated meta information path set P_1-n to the setting information dictionary 202 (step S102).
[0046]
Next, the structured document storage step S200 will be described with reference to FIG.
[0047]
The application 100 generates the structured document D and the meta information set M_1-n and inputs them to the control device 201 to instruct storage of the structured document (step S201). The meta information M_i is the meta information corresponding to the path P_i.
[0048]
The control device 201 inputs the structured document D input in step S201 to the storage device 203, and instructs storage of the structured document (step S202).
[0049]
The storage device 203 inputs the structured document D input in step S202 to the database 300, and instructs entire-document insertion (step S203).
[0050]
The database 300 executes the entire document insertion using the structured document D input and specified in step S203 (step S204).
[0051]
Next, the control device 201 inputs the meta information set M_1-n input in step S201 and the meta information path set P_1-n extracted from the setting information dictionary 202 to the storage device 203, and instructs storage of the meta information (step S205).
[0052]
The storage device 203 inputs the meta information M_i input in step S205 to the database 300 in accordance with the meta information path P_i, and instructs partial structured document insertion (step S206).
[0053]
The database 300 executes partial structured document insertion using the meta information M_i and the meta information path P_i input and specified in step S206 (step S207). At this time, the meta information M_i is inserted at the position indicated by the meta information path P_i.
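The storage steps S202 to S207 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the database 300 is replaced by a Python dict, XML is handled with the standard library's xml.etree.ElementTree, the meta information path P_i is assumed to be a simple absolute path whose last step names the inserted meta element, and the function names, the "file_info" element, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_by_path(root, path):
    """Resolve a simple absolute path such as '/RDF/Description' against root."""
    steps = [s for s in path.split("/") if s]
    if not steps or steps[0] != root.tag:
        return None
    return root if len(steps) == 1 else root.find("/".join(steps[1:]))


def store(database, doc_id, document_xml, meta_set, meta_paths):
    # Steps S202 to S204: entire-document insertion of structured document D.
    root = ET.fromstring(document_xml)
    database[doc_id] = root
    # Steps S205 to S207: insert each M_i at the position given by P_i.
    for meta_xml, path in zip(meta_set, meta_paths):
        parent_path, _, _name = path.rpartition("/")
        parent = find_by_path(root, parent_path)
        if parent is None:
            raise ValueError(f"no insertion point for {path}")
        parent.append(ET.fromstring(meta_xml))
    return root  # the extended structured document


# Illustrative data only; namespaces are omitted for simplicity.
database = {}
D = "<RDF><Description><creator>Kasuga</creator></Description></RDF>"
M = ['<file_info name="homepage1.xml"/>',
     "<change_log><log>created</log></change_log>"]
P = ["/RDF/file_info", "/RDF/change_log"]
extended = store(database, "doc1", D, M, P)
```

After the call, the stored tree holds the original elements of D followed by the two meta elements, which is the "extended structured document" referred to in the claims.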
[0054]
Next, the structured document search step S300 will be described with reference to FIG.
[0055]
The application 100 generates a path search formula Q for acquiring a structured document or meta information from the database 300 (step S301). At this time, both the structured document D and the meta information set M_1-n can be specified as conditions of the path search formula Q.
[0056]
The application 100 inputs the path search formula Q generated in step S301 to the control device 201, and instructs execution of the search (step S302).
[0057]
The control device 201 inputs the path search formula Q input in step S302 to the search device 204, and instructs execution of the search (step S303).
[0058]
The search device 204 inputs the path search expression Q input in step S303 to the database 300 and instructs execution of the search; the database 300 executes the search and returns a node set N_1-m to the search device 204 (step S304).
[0059]
The search device 204 returns the node set N_1-m returned in step S304 to the control device 201 (step S305).
[0060]
The control device 201 acquires the meta information path set P_1-n from the setting information dictionary 202 and compares it with the path P_Q obtained by removing the conditions from the path search expression Q (step S306). Specifically, for every meta information path P_k in the meta information path set P_1-n, it determines whether the path P_Q points to the meta information path P_k itself or to one of its descendant nodes. If the path P_Q does not point to any meta information path P_k or to any of its descendant nodes, the path search expression Q is regarded as indicating a partial structured document set of the structured document D, and step S308 is executed. If, on the other hand, the path P_Q points to a meta information path P_k itself or to one of its descendant nodes, the path search expression Q is regarded as indicating a partial structured document set of the meta information set M_1-n, and step S309 is executed (step S307).
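The comparison of steps S306 and S307 might be sketched as follows, assuming that the path search expression is an XPath-like string whose conditions are written in square brackets without nesting. The function names, the "file_info" meta information path, and the two sample expressions are assumptions made for illustration; only the paths "/RDF" and "/RDF/change_log/log" are taken from the examples of this description.

```python
import re


def strip_conditions(query):
    """Return P_Q: the path search expression with its [...] conditions removed."""
    return re.sub(r"\[[^\[\]]*\]", "", query)


def targets_meta(query, meta_paths):
    """Step S307: True if P_Q is some meta information path P_k or lies below it."""
    p_q = strip_conditions(query)
    # Appending "/" before the prefix test avoids matching sibling names
    # such as "/RDF/change_logs" against "/RDF/change_log".
    return any(p_q == p_k or p_q.startswith(p_k + "/") for p_k in meta_paths)


meta_paths = ["/RDF/file_info", "/RDF/change_log"]          # hypothetical P_1-2
q_doc = "/RDF[file_info/@name='homepage1.xml']"             # hypothetical Q
q_meta = "/RDF[Description/creator='Kasuga']/change_log/log"  # hypothetical Q'

doc_branch = targets_meta(q_doc, meta_paths)    # False: go to step S308
meta_branch = targets_meta(q_meta, meta_paths)  # True: go to step S309
```

Note that the condition inside the brackets plays no part in the decision: a query may filter on meta information and still return the structured document, and vice versa.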
[0061]
If the path P_Q does not point to any meta information path P_k or to any of its descendant nodes, the control device 201 executes partial structured document acquisition for each node N_j of the node set N_1-m returned in step S305 (step S308). Partial structured document acquisition is performed by acquiring all descendant nodes under the node N_j and assembling them into a structured document. At this time, however, the meta information path set P_1-n is acquired, and the nodes corresponding to these paths are not acquired. The partial structured document generated in this way, with the node N_j at its top, is denoted E_j. Finally, partial structured documents are generated for all nodes of the node set N_1-m to obtain a partial structured document set E_1-m.
[0062]
On the other hand, if the path P_Q points to a meta information path P_k itself or to one of its descendant nodes, the control device 201 executes partial structured document acquisition for each node N_j of the node set N_1-m returned in step S305 (step S309). Partial structured document acquisition is performed by acquiring all descendant nodes under the node N_j and assembling them into a structured document. The partial structured document generated in this way, with the node N_j at its top, is denoted E_j. Finally, partial structured documents are generated for all nodes of the node set N_1-m to obtain a partial structured document set E_1-m.
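The two acquisition variants of steps S308 and S309 differ only in whether the nodes at the meta information paths are skipped. A minimal sketch using xml.etree.ElementTree is given below; the function names, the "file_info" element, and the sample extended document are hypothetical, the paths are assumed to be simple absolute paths, and text following a child element (tail text) is dropped for brevity.

```python
import xml.etree.ElementTree as ET


def meta_nodes(root, meta_paths):
    """Collect the elements of the extended document located at the meta paths."""
    found = []
    for path in meta_paths:
        steps = [s for s in path.split("/") if s]
        if steps and steps[0] == root.tag:
            found.extend([root] if len(steps) == 1
                         else root.findall("/".join(steps[1:])))
    return found


def acquire(node, excluded=()):
    """Assemble node N_j and its descendants into E_j, skipping excluded elements."""
    copy = ET.Element(node.tag, dict(node.attrib))
    copy.text = node.text
    for child in node:
        if any(child is e for e in excluded):
            continue  # step S308 only: a node at a meta information path
        copy.append(acquire(child, excluded))
    return copy


extended = ET.fromstring(
    "<RDF><Description><creator>Kasuga</creator></Description>"
    '<file_info name="homepage1.xml"/>'
    "<change_log><log>created</log></change_log></RDF>")
P = ["/RDF/file_info", "/RDF/change_log"]

# Step S308: the result contains only elements of the original document D.
doc_only = acquire(extended, meta_nodes(extended, P))
# Step S309: the meta information subtree is returned as it is.
log_only = acquire(extended.find("change_log/log"))
```

Because the exclusion is applied at acquisition time, the application receives the original document unchanged even though the database holds the extended one.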
[0063]
The control device 201 returns the partial structured document set E_1-m generated in step S308 or S309 to the application 100 (step S310).
[0064]
Next, the storage / retrieval system 1 will be described specifically for the case of using XML (Extensible Markup Language) documents as the structured documents and a database that supports XPath (XML Path Language) as the path search expression (hereinafter referred to as an XML DB).
[0065]
This storage / retrieval system 1 performs the operation shown in the flowchart of FIG. 2 described above. In actual processing, steps S200 and S300 are repeated the required number of times in an arbitrary order according to the purpose of use of the application 100 and the operation of the user. For the sake of explanation, however, steps S100 to S300 are each performed only once here.
[0066]
First, the dictionary setting step S100 will be described with reference to FIG.
[0067]
The user generates a meta information path set P_1-2, which is a list of the meta information paths P₁ and P₂ shown in FIG. 6 (step S101).
[0068]
Then, the user sets the meta information path set P_1-2 generated in step S101 in the setting information dictionary 202 (step S102).
[0069]
Next, the structured document storage step S200 will be described with reference to FIG.
[0070]
The application 100 generates the structured document D shown in FIG. 7 and the meta information set M_1-2 shown in FIG. 8, and inputs them to the control device 201 to instruct storage of the structured document (step S201). The meta information M_i (i = 1, 2) is the meta information corresponding to the path P_i (i = 1, 2).
[0071]
The control device 201 inputs the structured document D input in step S201 to the storage device 203, and instructs storage of the structured document (step S202).
[0072]
The storage device 203 inputs the structured document D input in step S202 to the database 300, and instructs entire-document insertion (step S203).
[0073]
The database 300 executes the entire document insertion using the structured document D input and specified in step S203 (step S204). FIG. 9 shows the structure of the stored structured document D in the database.
[0074]
The control device 201 inputs the meta information set M_1-2 input in step S201 and the meta information path set P_1-2 extracted from the setting information dictionary 202 to the storage device 203, and instructs storage of the meta information (step S205).
[0075]
The storage device 203 inputs the meta information M_i input in step S205 to the database 300 in accordance with the meta information path P_i, and instructs partial structured document insertion (step S206).
[0076]
The database 300 executes partial structured document insertion using the meta information M_i and the meta information path P_i input and specified in step S206 (step S207). At this time, the meta information M_i is inserted at the position indicated by the meta information path P_i. FIG. 10 shows the structure in the database of the structured document D and the meta information set M_1-2 after insertion. In FIG. 10, the meta information M₁ is inserted at the position of N₀₀₆ in accordance with the meta information path P₁, and the meta information M₂ is inserted at the position of N₀₀₇ in accordance with the meta information path P₂.
[0077]
Next, the structured document search step S300 will be described with reference to FIG.
[0078]
The application 100 generates the path search formula Q shown in FIG. 11 for acquiring a structured document from the database 300 (step S301). The path search formula Q shown in FIG. 11 specifies meta information as a condition (the file name attribute, which is meta information, is "homepage1.xml") and represents the acquisition of a partial structured document of the structured document D (the partial structured document under the path "/RDF" is acquired).
[0079]
The application 100 inputs the path search formula Q generated in step S301 to the control device 201, and instructs execution of the search (step S302).
[0080]
The control device 201 inputs the path search formula Q input in step S302 to the search device 204, and instructs execution of the search (step S303).
[0081]
The search device 204 inputs the path search expression Q input in step S303 to the database 300 and instructs execution of the search; the database 300 executes the search and returns a node set N_1-m to the search device 204 (step S304). The returned node set N_1-m (m = 1, N₁) is shown in FIG. 12.
[0082]
The search device 204 returns the node set N₁ returned in step S304 to the control device 201 (step S305).
[0083]
The control device 201 acquires the meta information path set P_1-2 from the setting information dictionary 202 and compares it with the path P_Q obtained by removing the condition from the path search expression Q (step S306). The path P_Q in this example is shown in FIG. 13. The path P_Q is the "RDF" node, which is a child node of the root node. Among all the meta information paths P_k in the meta information path set P_1-2, there is no meta information path P_k for which the path P_Q is the path itself or one of its descendant nodes, so the path search expression Q is regarded as returning a partial structured document set of the structured document D, and step S308 is executed (step S307).
[0084]
The control device 201 executes partial structured document acquisition for the node N₁ of the node set N₁ returned in step S305 (step S308). Partial structured document acquisition is performed by acquiring all descendant nodes under the node N₁ and assembling them into a structured document. At this time, however, the meta information path set P_1-2 shown in FIG. 6 is acquired, and the nodes corresponding to these paths are not acquired. The partial structured document generated in this way, with the node N₁ at its top, is denoted E₁. In this specific example, the node set N₁ contains only the node N₁, so the partial structured document set E₁ is obtained from the node N₁. FIG. 14 shows the generated partial structured document set E₁. At this time, the meta information added to the tree structure in the database 300 is not included in the partial structured document set E₁; only the elements contained in the original structured document D are output.
[0085]
The control device 201 returns the partial structured document set E₁ generated in step S308 to the application 100 (step S310).
[0086]
Next, the processing when the path search formula Q acquires meta information in step S307 will be described below.
[0087]
The application 100 generates the path search formula Q' shown in FIG. 15 for acquiring a structured document from the database 300 (step S301). The path search formula Q' specifies the structured document as a condition (the "/RDF/Description/dc:creator" element of the structured document is "Kasuga") and represents the acquisition of a partial structured document of the meta information (the partial structured document under the path "/RDF/change_log/log" is acquired).
[0088]
The application 100 inputs the path search expression Q' generated in step S301 to the control device 201, and instructs execution of the search (step S302).
[0089]
The control device 201 inputs the path search expression Q' input in step S302 to the search device 204, and instructs execution of the search (step S303).
[0090]
The search device 204 inputs the path search expression Q' input in step S303 to the database 300, and instructs execution of the search (step S304). The database 300 executes the search and returns a node set N'_1-m to the search device 204. The returned node set N'_1-m (m = 1, N'₁) is shown in FIG. 16.
[0091]
The search device 204 returns the node set N'₁ returned in step S304 to the control device 201 (step S305).
[0092]
The control device 201 acquires the meta information path set P_1-2 from the setting information dictionary 202 and compares it with the path P_Q' obtained by removing the condition from the path search expression Q' (step S306). The path P_Q' in this example is shown in FIG. 17. The path P_Q' is the "log" node, which is a child node of the meta information M₂. Among all the meta information paths P_k in the meta information path set P_1-2, there is a meta information path P_k for which the path P_Q' is the path itself or one of its descendant nodes (the meta information path P₂ applies), so the path search expression Q' is regarded as returning a partial structured document set of the meta information set M_1-2, and step S309 is executed (step S307).
[0093]
The control device 201 executes partial structured document acquisition for the node N'₁ of the node set N'₁ returned in step S305 (step S309). Partial structured document acquisition is performed by acquiring all descendant nodes under the node N'₁ and assembling them into a structured document. The partial structured document generated in this way, with the node N'₁ at its top, is denoted E'₁. In this specific example, the node set N'₁ contains only the node N'₁, so the partial structured document set E'₁ is obtained from the node N'₁. FIG. 18 shows the generated partial structured document set E'₁.
[0094]
The control device 201 returns the partial structured document set E'₁ generated in step S309 to the application 100 (step S310).
[0095]
Therefore, according to the storage / retrieval system 1 of the present embodiment, the storage / retrieval device 200 is used as middleware between the application 100 and the database 300, which has no function of storing meta information. Addition, change, and deletion of meta information can thus be handled flexibly without depending on the application 100, which improves the convenience of system design and development.
[0096]
Specifically, when a structured document is stored in the object-oriented structured document database, the meta information can be stored together with it, and when the structured document is searched, the structured document and the meta information can be acquired separately. In addition, a structured document can be acquired by a search that uses meta information as a condition, and meta information can be acquired by a search that uses the structured document as a condition.
Note that the middleware program stored in the storage / retrieval apparatus 200 according to the above-described embodiment can be recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD-ROM. It can also be distributed via a network.
[0097]
[Effects of the Invention]
As described above, according to the present invention, it is possible to provide a storage and retrieval device, a storage and retrieval method, a storage and retrieval program, and a storage and retrieval program recording medium that can flexibly cope with addition, change, and deletion of meta information without depending on an application program, even if the object-oriented structured document database does not have a function of storing meta information.
[0098]
This makes it possible to reduce the system development cost of a computer system that stores and retrieves structured documents and meta information using an object-oriented structured document database that does not have a function of storing meta information.
[Brief description of the drawings]
FIG. 1 is a schematic configuration diagram of a storage / retrieval system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an operation of the storage / retrieval system according to the embodiment of the present invention.
FIG. 3 is a flowchart showing a dictionary setting operation of the storage / search system according to the embodiment of the present invention.
FIG. 4 is a flowchart showing an operation of storing a structured document in the storage and retrieval system according to the embodiment of the present invention.
FIG. 5 is a flowchart illustrating a structured document search operation of the storage and search system according to the embodiment of the present invention.
FIG. 6 is an example of a meta information path.
FIG. 7 is an example of a structured document.
FIG. 8 is an example of meta information.
FIG. 9 is an example of a structured document stored in an object-oriented structured document database.
FIG. 10 is an example of a structured document stored in an object-oriented structured document database.
FIG. 11 is an example of a path search formula.
FIG. 12 is an example of a node set.
FIG. 13 is an example of a path.
FIG. 14 is an example of a partially structured document.
FIG. 15 is an example of a path search expression.
FIG. 16 is an example of a node set.
FIG. 17 is an example of a path.
FIG. 18 is an example of a partially structured document.
FIG. 19 is an example of a structured document.
FIG. 20 is an example of a structured document stored in an object-oriented structured document database.
FIG. 21 is an example of a retrieved partial structured document.
FIG. 22 is an example of a partially structured document to be inserted.
FIG. 23 is an example of a structured document stored in an object-oriented structured document database after insertion.
FIG. 24 is an example of a path search formula.
[Explanation of symbols]
1 Storage and retrieval system
100 Application
200 Storage and retrieval device
201 Control device
202 Setting information dictionary
203 Storage device
204 Search device
300 Object-oriented structured document database

Claims

A storage and retrieval device that accesses an object-oriented structured document database based on an instruction from an application program and performs information processing, the device comprising:
setting information storage means for storing information on the path of meta information that is stored in the object-oriented structured document database together with a structured document;
storage means for storing the structured document received with a storage instruction from the application program in the object-oriented structured document database, inserting the meta information received from the application program into the stored structured document in accordance with the information on the path of the meta information stored in the setting information storage means, and storing the result as an extended structured document; and
search means for separately acquiring the structured document or the meta information corresponding to the extended structured document stored in the object-oriented structured document database, in accordance with a search instruction from the application program and the information on the path of the meta information stored in the setting information storage means.

The storage and retrieval device according to claim 1, wherein the search means
determines, based on a comparison between the path search expression given as the search instruction and the information on the path of the meta information, whether the document to be acquired is the structured document or the meta information, and, when the result of the determination is the structured document, removes the meta information inserted by the storage means from the acquired document.

A storage and retrieval method of a storage and retrieval device that accesses an object-oriented structured document database based on an instruction from an application program and performs information processing, the method comprising:
a setting information storing step of storing information on the path of meta information that is stored in the object-oriented structured document database together with a structured document;
a storing step of storing the structured document received with a storage instruction from the application program in the object-oriented structured document database, inserting the meta information received from the application program into the stored structured document in accordance with the information on the path of the meta information stored in the setting information storing step, and storing the result as an extended structured document; and
a search step of separately acquiring the structured document or the meta information corresponding to the extended structured document stored in the object-oriented structured document database, in accordance with a search instruction from the application program and the information on the path of the meta information stored in the setting information storing step.

The storage and retrieval method according to claim 3, wherein the search step
determines, based on a comparison between the path search expression given as the search instruction and the information on the path of the meta information, whether the document to be acquired is the structured document or the meta information, and, when the result of the determination is the structured document, removes the meta information inserted in the storing step from the acquired document.

A storage and retrieval program that causes a storage and retrieval device to execute each of the steps of the storage and retrieval method according to claim 3 or 4.

A storage and retrieval program recording medium in which the storage and retrieval program according to claim 5 is recorded on a computer-readable recording medium.