JP2004280405A

JP2004280405A - Information providing system, information providing method, and computer program

Info

Publication number: JP2004280405A
Application number: JP2003070157A
Authority: JP
Inventors: Takashi Nozaki (野崎 隆志); Shinichiro Sega (瀬賀 信一郎); Yasushi Fukuda (福田 安志)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2003-03-14
Filing date: 2003-03-14
Publication date: 2004-10-07

Abstract

【課題】ＷＷＷ情報をキャッシュしておくことにより特定のユーザの要求から比較的短い応答時間で情報を提供する。
【解決手段】ユーザのリクエスト・ログを用いて、ユーザ毎にプリフェッチを行なうことで、特定ユーザのリクエストしたＷＷＷ情報を効率的にキャッシュする。また、先読みする機構に、ベイジアン・ネットワーク・モデルを用いることで、刻一刻と変わるユーザの嗜好性の変化に応じてアクセス・パターンが変化したとしても、これに対応して先読み対象を確率的に求め、先読み対象を限定し、ユーザの変化に動的に対応する。
【選択図】図１
The present invention provides information with a relatively short response time after a request from a specific user by caching WWW information.
WWW information requested by a specific user is cached efficiently by prefetching for each user based on that user's request log. In addition, by using a Bayesian network model in the prefetch mechanism, even if the access pattern changes as the user's preferences change from moment to moment, prefetch targets are determined probabilistically and narrowed down accordingly, so that the system responds dynamically to changes in the user.
[Selected drawing] Fig. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、ユーザが要求した情報を提供する情報提供システム及び情報提供方法、並びにコンピュータ・プログラムに係り、特に、ＷＷＷ情報空間で不特定多数のユーザに提供される情報の中から、特定のユーザの要求に応じて効率的に情報を提供する情報提供システム及び情報提供方法、並びにコンピュータ・プログラムに関する。
【０００２】
さらに詳しくは、本発明は、ＷＷＷ情報をキャッシュしておくことにより特定のユーザの要求から比較的短い応答時間で情報を提供する情報提供システム及び情報提供方法、並びにコンピュータ・プログラムに係り、特に、ユーザが要求すると思われる情報をあらかじめ先読みしてキャッシュしておく（すなわち先読みする）ことにより、ユーザの要求から比較的短い応答時間で情報を提供する情報提供システム及び情報提供方法、並びにコンピュータ・プログラムに関する。
【０００３】
【従来の技術】
昨今、ネットワーク・コンピューティング技術が急速に展開している。ネットワーク接続環境下では、コンピュータ資源の共有や、情報の共有・流通・配布・交換などの協働的作業を円滑に行なうことができる。
【０００４】
コンピュータ同士を相互接続するネットワークの形態は様々である。例えば、イーサネット（登録商標）のような局所に敷設されたＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）や、さらには、ネットワーク間の相互接続を繰り返し行った結果として文字通り世界規模のネットワークへ成長を遂げた「インターネット」（ＴｈｅＩｎｔｅｒｎｅｔ）などさまざまである。特に、ブロードバンド通信や常時接続の普及と相俟って、インターネットが広汎に普及してきている。
【０００５】
インターネット上では、サーバ同士がＴＣＰ／ＩＰ（ＴｒａｎｓｍｉｓｓｉｏｎＣｏｎｔｒｏｌＰｒｏｔｏｃｏｌ・ｉｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）ベースで相互接続され、ＷＷＷ（ｗｏｒｌｄＷｉｄｅＷｅｂ）、Ｎｅｗｓ、ＴＥＬＮＥＴ（ＴＥＬｅｔｙｐｅｗｒｉｔｅｒＮＥＴｗｏｒｋ）、ＦＴＰ（ＦｉｌｅＴｒａｎｓｆｅｒＰｒｏｔｏｃｏｌ）、Ｇｏｐｈｅｒなど、多数のサービスが公開されている。
【０００６】
このうちＷＷＷは、ハイパーリンク構造の情報空間を提供する広域情報検索システムであり、インターネットの爆発的な成長や急速な普及を遂げる最大の要因ともなっている。ＷＷＷシステム上では、テキスト、画像、音声などの各種メディアの情報コンテンツが公開されている。ＷＷＷサービスの利用者は、クライアント装置からインターネットを通じて、ＷＷＷ情報を提供するサーバに接続し、ＷＷＷ情報を取得することができる。
【０００７】
ＷＷＷ情報は、ＨＴＭＬ（ＨｙｐｅｒＴｅｘｔＭａｒｋｕｐＬａｎｇｕａｇｅ）と呼ばれるハイパーテキスト形式の記述言語で記述されている。ＴＣＰ／ＩＰに従えば、情報資源は、ＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）という形式の識別子によって特定され、ＨＴＴＰ（ＨｙｐｅｒＴｅｘｔＴｒａｎｓｆｅｒＰｒｏｔｏｃｏｌ）プロトコルに従ってＨＴＭＬドキュメントを転送することができる（周知）。
【０００８】
ＷＷＷ情報は、通常、ページと呼ばれる単位でサーバから提供されている。クライアント側では、ＷＷＷブラウザを用いてページ単位でＷＷＷ情報をダウンロードして、画面上にＷＷＷページとして表示させることができる。また、ハイパーテキスト形式で記述されるＷＷＷページは、ハイパーリンクによって、同じサーバ上で提供されているページ、あるいは他のサーバ上で提供されているページとの間で相互に参照関係を持っている。
【０００９】
ここで、ＷＷＷ情報を利用するに際し、サーバの処理時間やインターネットの帯域幅などの理由から、サーバから直接情報を入手する場合の応答時間が遅い問題がある。
【００１０】
この問題点を解決するために、ユーザのクライアント装置とＷＷＷの情報を提供する装置からなるインターネット上に、キャッシュ・サーバと呼ばれるＷＷＷ情報提供装置を設置するという手法が採られている（例えば、特許文献１を参照のこと）。キャッシュ・サーバは、不特定多数のユーザから要求されたＷＷＷ情報のうちアクセス頻度の高いＷＷＷ情報を装置内に蓄えておく。そして、ユーザから要求された情報が装置内にあれば（キャッシュ・ヒット）、ユーザに提供するが、要求された情報が装置内になければ（キャッシュ・ミス）、ＷＷＷ情報を提供するサーバに情報を取りに行き、要求元ユーザに提供する。
【００１１】
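However, a conventional cache server is generally shared by an unspecified number of users and has finite storage capacity, so cached data is managed (deleted) according to a policy such as FIFO (First In First Out) or LRU (Least Recently Used). In this case the cache server stores frequently accessed WWW information based on the access information of an unspecified number of users, so the information requested by a particular user is not necessarily held in the cache server. Cache misses then force the cache server to fetch information from the server providing the WWW information each time, and in the end the response-time problem remains unsolved.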
【００１２】
また、ユーザが要求したＷＷＷ情報の参照先ページ内から、他のページへのリンク情報を取得し、すべての参照先ページを先読みしておき、応答時間の低減を図ることが行なわれている（例えば、非特許文献１を参照のこと）。
【００１３】
しかしながら、この場合、すべてのページにおいて、ページ内のすべてのリンク先を先読みするので、アクセスされない可能性の高いものまで先読みしてしまうことになり、非効率的であり、帯域を無駄に使ってしまうという問題がある。
【００１４】
また、ユーザやＷＷＷ情報の参照先ページに重要度や優先度を設定して、先読みを行なうという方法もある（例えば、特許文献２を参照のこと）。例えば、プロキシー・サーバのＷＷＷデータ・アクセスの履歴を集計・解析する仕組みを設け、その解析結果からキャッシュ・サーバにキャッシュされているデータに優先度を与える。そして、優先度に従ってリクエストされたデータ毎に先読みを行なうかどうかの条件を設ける。これによって、先読みによるネットワークへの負荷の増大を抑えながら、先読みによるキャッシュの再現率の向上を実現することができる。
【００１５】
しかしながら、この場合、重要度や優先度が画一的になってしまうことから、先読みする対象も同様に画一的になる、という問題がある。言い換えれば、ユーザの嗜好性が変わり、動的にアクセス・パターンが変わってしまうような場合、キャッシュ・サーバは対応することができない。
【００１６】
【特許文献１】
特開平１０−２１１７４号公報
【特許文献２】
特開平１１−１４９４０５号公報
【非特許文献１】
Ｗｅｂサーバー完全技術解説、日経ＢＰ社
【００１７】
【発明が解決しようとする課題】
本発明の目的は、ＷＷＷ情報空間で不特定多数のユーザに提供される情報の中から、特定のユーザの要求に応じて効率的に情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することにある。
【００１８】
本発明のさらなる目的は、ＷＷＷ情報をキャッシュしておくことにより特定のユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することにある。
【００１９】
本発明のさらなる目的は、ユーザが要求するであろう情報をあらかじめ先読みすることにより、ユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することにある。
【００２０】
本発明のさらなる目的は、刻一刻と変わるユーザの嗜好性の変化に対応してユーザが要求する情報を先読みしておくことにより、ユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することにある。
【００２１】
【課題を解決するための手段及び作用】
本発明は、上記課題を参酌してなされたものであり、その第１の側面は、複数のサイトに跨って相互に参照関係を持つページからなる情報を提供する情報提供システムであって、
ユーザ毎の情報を要求するリクエスト・ログを抽出するログ抽出手段と、
リクエスト・ログに含まれるページのアクセス系列に基づいて、次にアクセスされるページを予測し、先読みすべきページをユーザ毎に求めるプリフェッチ対象ページ選定手段と、
リクエスト・ログに含まれるサイトのアクセス系列に基づいて、次にアクセスされるサイトを予測し、先読みすべきサイトをユーザ毎に求めるプリフェッチ対象サイト選定手段と、
前記プリフェッチ対象ページ選定手段及び前記プリフェッチ対象サイト選定手段により選定されたページ及びサイトに基づいて、情報の先読みを行なうプリフェッチ手段と、
を具備することを特徴とする情報提供システムである。
【００２２】
但し、ここで言う「システム」とは、複数の装置（又は特定の機能を実現する機能モジュール）が論理的に集合した物のことを言い、各装置や機能モジュールが単一の筐体内にあるか否かは特に問わない。
【００２３】
本発明に係る情報提供システムは、ＷＷＷサーバにおいて公開されているＷＷＷ情報をキャッシュしておくことにより、ユーザのリクエストから比較的短い応答時間で情報を提供する。
【００２４】
この際、ユーザのリクエスト・ログを用いてユーザ毎にプリフェッチを行なうことで、不特定多数ではなく、特定のユーザのリクエストしたＷＷＷ情報を効率的にキャッシュすることができる。
【００２５】
ここで、前記プリフェッチ対象ページ選定手段及び／又は前記プリフェッチ対象サイト選定手段は、ユーザのアクセス系列を記述したユーザ毎の確率ネットワーク・モデルを用いて、ユーザが次にアクセスするページ及び／又はサイトを予測する。
【００２６】
すなわち、ＷＷＷ情報を先読みする機構に、確率ネットワーク・モデルの１つであるベイジアン・ネットワーク・モデルを用いることで、刻一刻と変わるユーザの嗜好性の変化に応じてアクセス・パターンが変化したとしても、これに対応して先読み対象を確率的に求め、先読み対象を限定し、ユーザの変化に動的に対応することができる。
【００２７】
また、前記プリフェッチ対象ページ選定手段及び／又は前記プリフェッチ対象サイト選定手段は、次のユーザのアクセス結果に基づいて、確率ネットワーク・モデルを更新することができる。
【００２８】
また、前記プリフェッチ手段は、ネットワークの負荷などを考慮してプリフェッチ対象のプリフェッチ作業を行なうようにしてもよい。例えば、予測に基づくプリフェッチ対象のページ及び／又はサイトの優先順位を、ネットワーク負荷を考慮して並べ替えるようにしてもよい。
【００２９】
また、プリフェッチした情報などのキャッシュ情報を、ユーザ毎に割り当てられた記憶領域に蓄積するようにしてもよい。あるいは、ユーザを区別しないで同じ記憶空間上でキャッシュ情報を管理するようにしてもよい。
【００３０】
また、本発明の第２の側面は、複数のサイトに跨って相互に参照関係を持つページからなる情報を提供するための処理をコンピュータ・システム上で実行するようにコンピュータ可読形式で記述されたコンピュータ・プログラムであって、
ユーザ毎の情報を要求するリクエスト・ログを抽出するログ抽出ステップと、
リクエスト・ログに含まれるページのアクセス系列に基づいて、次にアクセスされるページを予測し、先読みすべきページをユーザ毎に求めるプリフェッチ対象ページ選定ステップと、
リクエスト・ログに含まれるサイトのアクセス系列に基づいて、次にアクセスされるサイトを予測し、先読みすべきサイトをユーザ毎に求めるプリフェッチ対象サイト選定ステップと、
前記プリフェッチ対象ページ選定ステップ及び前記プリフェッチ対象サイト選定ステップにより選定されたページ及びサイトに基づいて、情報の先読みを行なうプリフェッチ・ステップと、
を具備することを特徴とするコンピュータ・プログラムである。
【００３１】
本発明の第２の側面に係るコンピュータ・プログラムは、コンピュータ・システム上で所定の処理を実現するようにコンピュータ可読形式で記述されたコンピュータ・プログラムを定義したものである。換言すれば、本発明の第２の側面に係るコンピュータ・プログラムをコンピュータ・システムにインストールすることによって、コンピュータ・システム上では協働的作用が発揮され、本発明の第１の側面に係る情報提供システムと同様の作用効果を得ることができる。
【００３２】
本発明のさらに他の目的、特徴や利点は、後述する本発明の実施形態や添付する図面に基づくより詳細な説明によって明らかになるであろう。
【００３３】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施形態について詳解する。
【００３４】
本発明は、ユーザのクライアント装置とＷＷＷの情報を提供する装置からなるインターネット上にキャッシュ・サーバを配置し、キャッシュ・サーバが次にアクセスされる情報の先読み（プリフェッチ）作業をユーザ毎に行なうことにより、不特定多数ではなく、特定ユーザのリクエストしたＷＷＷ情報を効率的に提供するものである。
【００３５】
図１には、本発明を適用したＷＷＷ情報提供システムの構成例を模式的に示している。同図に示す例では、ユーザが情報を要求するクライアントとしての端末１００と、ＷＷＷ情報をページ単位で提供するＷｅｂサーバ１０３と、Ｗｅｂサーバ１０３から提供されるＷＷＷ情報を蓄積しておくキャッシュ・サーバ装置１０４が、ネットワーク１０１を通じてインターネット１０２に繋がっている。キャッシュ・サーバ装置１０４内に、本発明の一実施形態に係るキャッシュ・システム１０５（後述）が構築されている。
【００３６】
端末１００は、ユーザがインターネット上のＷＷＷ情報を閲覧する際に用いるクライアント端末のことであり、例えば、ＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）やＰＤＡ（ＰｅｒｓｏｎａｌＤａｔａＡｓｓｉｓｔａｎｔｓ）、あるいは携帯電話などに相当する。
【００３７】
Ｗｅｂサーバ１０３は、インターネット上でＷＷＷ情報の提供サービスを実施する者が、ユーザへＷＷＷ情報を提供するために設置した装置で構成される。
【００３８】
ＷＷＷ情報は、ＨＴＭＬと呼ばれるハイパーテキスト形式の記述言語を用いて記述されている。ＷＷＷ情報は、ＵＲＬによって特定され、ＨＴＴＰプロトコルに従って転送することができる。ＷＷＷ情報は、通常、ページ（以下、「Ｗｅｂページ」とする）と呼ばれる単位でサーバから提供されている。クライアント側では、ＷＷＷブラウザを用いてページ単位でＷＷＷ情報をダウンロードして、画面上にＷｅｂページとして表示させることができる。また、ハイパーテキスト形式で記述されるＷｅｂページは、ハイパーリンクによって、同じサーバ上で提供されているページ、あるいは他のサーバ上で提供されているページとの間で相互に参照関係を持っている。
【００３９】
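The cache server device 104 is provided to reduce the delay incurred when a user obtains WWW information such as Web pages directly from the Web server 103. For each WWW request from each user, the cache system 105 caches the requested Web page and related information, and also predicts and caches (prefetches) the pages that the user may access after the requested Web page. The illustrated cache server device 104 is a configuration called a "backside cache".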
【００４０】
図示の形態では、ユーザのリクエストが端末１００からネットワーク１０１を通じてインターネット１０２に流れ、キャッシュ・サーバ装置１０４に届く。キャッシュ・サーバ装置１０４では、キャッシュ・システム１０５にユーザのリクエストを渡す。
【００４１】
キャッシュ・システム１０５では、ユーザのリクエストしたページをキャッシュしていれば（キャッシュ・ヒット）、キャッシュ・サーバ装置１０４に、ユーザのリクエストしたページを渡すが、キャッシュしていなければ（キャッシュ・ミス）、ユーザのリクエストしたページをＷｅｂサーバ１０３から取得し、キャッシュ・サーバ装置１０４に渡す。
【００４２】
また、キャッシュ・システム１０５では、ユーザのリクエストを受け取ったときに、次にユーザがアクセスする可能性のあるページをあらかじめキャッシュしておく処理（先読み）を行なうが、この点については後に詳解する。
【００４３】
また、図２には、本発明を適用したＷＷＷ情報提供システムについての他の構成例を模式的に示している。同図に示す例では、ユーザが情報を要求するクライアントとしての端末２００や、社内ＬＡＮ２０３などが、ネットワーク２０１を通じてインターネット２０２に繋がっている。ここで言う社内ＬＡＮ２０３は、インターネット上でＷｅｂページを提供するサービスを行なう会社内のローカル・エリア・ネットワークなどのことを指す。
【００４４】
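A Web server 206 that provides WWW information page by page and a cache server device 204 that stores the WWW information provided by the Web server 206 are connected to the in-house LAN 203. A cache system 205 according to an embodiment of the present invention is built inside the cache server device 204. The illustrated cache server device 204 is a configuration called a "frontside cache".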
【００４５】
図示の形態では、ユーザのリクエストが端末２００からネットワーク２０１を通じてインターネット２０２に流れ、社内ＬＡＮ２０３経由で、キャッシュ・サーバ装置２０４に届く。キャッシュ・サーバ装置２０４では、キャッシュ・システム２０５にユーザのリクエストを渡す。
【００４６】
If the cache system 205 has cached the page requested by the user (cache hit), it passes that page to the cache server device 204; if not (cache miss), it obtains the requested page from the Web server 206 and passes it to the cache server device 204.
【００４７】
また、キャッシュ・システム２０５では、ユーザのリクエストを受け取ったときに、次にユーザがアクセスする可能性のあるページを予測してあらかじめキャッシュしておく処理（先読み）を行なうが、この点については後に詳解する。
【００４８】
また、図３には、本発明を適用したＷＷＷ情報提供システムについてのさらに他の構成例を模式的に示している。同図に示す例では、ユーザが情報を要求するクライアントとしての端末３００や、ＷＷＷ情報をページ単位で提供するＷｅｂサーバ３０４が、ネットワーク３０２を通じてインターネット３０３に繋がっている。また、端末３００内には、本発明の一実施形態に係るキャッシュ・システム３０１が構築されている。
【００４９】
図示の形態では、ユーザのリクエストが、まず端末３００からキャッシュ・システム３０１に渡される。そして、キャッシュ・システム３０１では、ユーザのリクエストしたページをキャッシュしていれば（キャッシュ・ヒット）、端末３００にユーザのリクエストしたページを渡すが、キャッシュしていなければ（キャッシュ・ミス）、ネットワーク３０２を通じインターネット３０３を経由して、Ｗｅｂサーバ３０４からユーザのリクエストしたページを取得し、これを端末３００に渡す。
【００５０】
また、キャッシュ・システム３０１では、ユーザのリクエストを受け取ったときに、次にユーザがアクセスする可能性のあるページを予測してあらかじめキャッシュしておく処理（先読み）を行なうが、この点については後に詳解する。
【００５１】
図４には、本発明の一実施形態に係るキャッシュ・システムの機能構成を模式的に示している。
【００５２】
リクエスト処理部１００１では、ユーザのリクエストをアクセス・ログ１００２に記録し、ログ抽出部１００３にログ処理を指示するとともに、ユーザのリクエストしたページをキャッシュ管理部１００４から取得して、ユーザに取得したページを返す。また、キャッシュ管理部１００４から該当するページを取得することができないときは、外部のＷｅｂサーバからユーザのリクエストしたページを取得し、キャッシュ管理部（１００４）に取得したページを書き込むように指示した後、ユーザに取得したページを返すことができる。
【００５３】
図５には、アクセス・ログ１００２において記録されているログのデータ構造の一例を示している。同図に示す例では、ユーザからのリクエスト毎にログのレコードが生成され、各レコードは、アクセスの日付や時間、要求元ユーザのＩＰアドレス、リクエスト種別、リクエストしたＷｅｂページのＵＲＬ、リクエストされた文書タイプなどを記録するフィールドを持っている。
【００５４】
ログ抽出部１００３は、リクエスト処理部１００１からの指示に応答して、プリフェッチ対象ページ選定部１００７とプリフェッチ対象サイト選定部１００９がそれぞれ先読みすべきページ又はサイトを予測するために必要な情報（例えば、クライアントＩＰアドレス、ＵＲＬ、アクセス時刻など）をアクセス・ログ１００２から抽出し、プリフェッチ対象ページ選定部１００７とプリフェッチ対象サイト選定部１００９のそれぞれに抽出した情報を受け渡し、選定処理を行なうように指示する。
【００５５】
キャッシュ管理部１００４は、リクエスト処理部１００１やプリフェッチ部１０１３の要求に応じて、プリフェッチ要求されているファイルがキャッシュ・ディスク１００６上に存在するかの検索、キャッシュ・ディスク１００６からのファイルの読み出し、キャッシュ・ディスク１００６へのファイルの書き込みなどのディスク・アクセス処理を行なうとともに、キャッシュ・ディスク１００６のファイル・リストとアクセスした日時などのキャッシュ管理情報をキャッシュ管理テーブル１００５に記録する。
【００５６】
図６には、キャッシュ管理テーブル１００５のデータ構造の一例を示している。同図に示す例では、要求元クライアント・ユーザ毎にキャッシュ管理情報が管理されている。各クライアントは、ＩＰアドレスによって識別され、それぞれのための専用のキャッシュ記憶容量が割り当てられている。キャッシュ管理テーブル１００５では、各クライアントについての割り当てられている記憶容量と現在使用中の記憶容量が管理されるとともに、各クライアント毎に、リクエストされたＷｅｂページについての最終アクセス日時、ＵＲＬ、キャッシュ・ディスク（１００６）内でのファイル名からなるリストが記録されている。リクエストされたＷｅｂページのリストは、例えばクライアント毎に最終アクセス日時の新しい順に配列されている。
【００５７】
キャッシュ管理部１００４でファイルの検索を行なう場合には、キャッシュ管理テーブル１００５を参照する。したがって、端末からの要求のあったファイル、Ｗｅｂサーバから取得したファイルなどは、キャッシュ管理テーブル１００５上に存在することになる。図６に示したキャッシュ管理テーブルの構成例では、ユーザの端末を区別してキャッシュしている。すなわち、ユーザの端末毎にキャッシュ・ディスク１００６の割り当て容量を決め、端末毎のキャッシュを行なうことができる。その反面、ユーザの端末毎のキャッシュ領域間で重複したファイルがキャッシュされる可能性があり、キャッシュ・ディスク１００６を効率的に利用できない場合がある。
【００５８】
これに対し、すべてのユーザの端末を区別することなくファイルを管理する方法も考えられる。一般的なキャッシュ・サーバではこのような管理方法が採られている。図２４には、すべてのユーザの端末を区別しない場合におけるキャッシュ管理テーブル１００５のデータ構造の一例を示している。同図に示す例では、キャッシュ管理テーブル１００５上では、各クライアントは識別されず、クライアント毎のキャッシュ領域の割り当てもない。キャッシュ管理テーブル１００５は、キャッシュ・ディスク１００６全体の割当容量と現在の容量を把握するとともに、リクエストされたＷｅｂページについての最終アクセス日時、ＵＲＬ、キャッシュ・ディスク（１００６）内でのファイル名からなるリストが、アクセス要求元クライアントを識別することなく記録されている。この場合、ユーザの端末毎にキャッシュされないという短所があるが、アクセスの多いものほどキャッシュ上に残ることになるので、キャッシュ・ディスク１００６の利用効率がよくなる。
【００５９】
本発明では、ユーザの端末からリクエストがある度に、リクエストされたページなどの次にリクエストされる可能性があるページ又はサイトを予測し、予測されたページ又はサイトをＷｅｂサーバから取得（先読み）し、キャッシュ・ディスク１００６上にあらかじめキャッシュしておく（後述）。キャッシュ管理テーブル１００５を図６又は図２４のいずれの構成を採用しても、リクエストされる可能性のあるページなどが常にキャッシュ管理テーブル１００５並びにキャッシュ・ディスク１００６上に存在する形となる。
【００６０】
プリフェッチ対象ページ選定部１００７は、ログ抽出部１００３がアクセス・ログ１００２から抽出した情報を受け取り、この情報に基づいて次にユーザがアクセスする可能性があるページを予測して、プリフェッチ対象ページとして選定する。より具体的には、ユーザがリクエストしたページの次にアクセスされる可能性のあるページを０〜１までの範囲の確率値で計算し、プリフェッチするまでの取得期限付きで選定し、選定した結果をプリフェッチ対象ページ・リスト１００８に記録する。また、プリフェッチ・スケジューラ部１０１１に対して、プリフェッチ対象ページ・リスト１００８が更新されたことを通知する。
【００６１】
図７には、プリフェッチ対象ページ・リスト１００８のデータ構造の一例を示している。同図に示す例では、プリフェッチ対象ページ・リスト１００８上には、アクセスされる可能性があると予測されたページが例えば確率値の順でリストされ、各ページをリクエストするクライアントのＩＰアドレス、当該ページのＵＲＬ、プリフェッチの期限などが記録されている。
【００６２】
プリフェッチ対象サイト選定部１００９は、ログ抽出部１００３がアクセス・ログ１００２から抽出した情報を受け取り、この情報すなわちユーザのアクセス系列に基づいてユーザが次にアクセスする可能性があるサイトを予測して、プリフェッチ対象サイトとして選定する。より具体的には、ユーザがリクエストしたページが所在するサイトの次にアクセスされる可能性のあるサイトを０〜１までの範囲の確率値で計算し、プリフェッチするまでの取得期限付きで選定し、選定した結果をプリフェッチ対象サイト・リスト１０１０に記録する。また、プリフェッチ・スケジューラ部１０１１に対して、プリフェッチ対象サイト・リスト１０１０が更新されたことを通知する。
【００６３】
図８には、プリフェッチ対象サイト・リスト１０１０のデータ構造の一例を示している。同図に示す例では、プリフェッチ対象サイト・リスト１０１０上には、アクセスされる可能性があると予測されたサイトが例えば確率値の順でリストされ、各サイトをリクエストするクライアントのＩＰアドレス、当該サイトのＵＲＬ、プリフェッチの期限などが記録されている。
【００６４】
プリフェッチ・スケジューラ部１０１１は、プリフェッチ対象ページ選定部１００７からの通知に応答して、プリフェッチ対象ページ・リスト１００８を読み込み、プリフェッチするページを確率値順にプリフェッチ・スケジュール表１０１２に記録する。また、プリフェッチ対象サイト選定部１００９からの通知に応答して、プリフェッチ対象サイト・リスト１０１０を読み込み、プリフェッチするサイトを確率値順にプリフェッチ・スケジュール表１０１２に記録する。そして、プリフェッチ部１０１３に対して、プリフェッチ・スケジュール表１０１２が更新されたことを通知する。
【００６５】
図９には、プリフェッチ・スケジュール表１０１２のデータ構造を模式的に示している。同図に示す例では、プリフェッチ対象として選定されたページ並びにサイトが確率値の順でリストされ、各プリフェッチ対象ページ又はサイトについてのプリフェッチ開始時刻、プリフェッチ期限、当該ページ又はサイトのＵＲＬ、プリフェッチするデータのサイズ、プリフェッチ時に使用する帯域、確率値、プリフェッチの種別、プリフェッチの状況などが記録されている。
【００６６】
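The prefetch unit 1013 obtains the prefetch target pages listed in the prefetch schedule table 1012 from external Web servers according to that table, and tidies up the prefetch schedule table 1012 when notified by the prefetch scheduler unit 1011.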
【００６７】
続いて、本実施形態に係るキャッシュ・システム１０００内の各部において実行される処理動作について説明する。
【００６８】
図１０には、リクエスト処理部１００１において実行される処理手順をフローチャートの形式で示している。
【００６９】
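The request processing unit 1001 waits for a request from a user (step S10). When it receives a request (step S11), it first writes the request to the access log 1002 (step S12).
【００７０】
After writing the request to the access log 1002, it instructs the log extraction unit 1003 to process the access log 1002 (step S13).
【００７１】
Next, it asks the cache management unit 1004 whether the Web page or other resource requested by the user is cached on the cache disk 1006 (step S14).
【００７２】
If it is cached on the cache disk 1006 (cache hit), the unit asks the cache management unit 1004 to read it (step S15) and then returns the requested Web page to the requesting user (step S16).
【００７３】
If it is not cached on the cache disk 1006 (cache miss), the unit obtains the requested Web page from the external server (step S17), instructs the cache management unit 1004 to store it in the cache (step S18), and then returns it to the requesting user (step S16).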
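The flow of steps S12 to S18 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the class name `RequestProcessor`, the `fetch_from_origin` callable, and the `notify_log_extractor` hook are hypothetical names introduced here, and a plain dictionary stands in for the cache management unit 1004 and cache disk 1006.

```python
from typing import Callable, Dict, List, Optional


class RequestProcessor:
    """Illustrative stand-in for the request processing unit 1001."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        notify_log_extractor: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.fetch_from_origin = fetch_from_origin        # hypothetical origin fetcher
        self.notify_log_extractor = notify_log_extractor  # hypothetical hook for unit 1003
        self.access_log: List[str] = []                   # stands in for access log 1002
        self.cache: Dict[str, bytes] = {}                 # stands in for units 1004/1006

    def handle(self, url: str) -> bytes:
        self.access_log.append(url)             # S12: write the request to the log
        if self.notify_log_extractor is not None:
            self.notify_log_extractor(url)      # S13: ask for log processing
        if url in self.cache:                   # S14: is the page cached?
            return self.cache[url]              # S15, S16: cache hit
        body = self.fetch_from_origin(url)      # S17: cache miss, fetch from origin
        self.cache[url] = body                  # S18: store in the cache
        return body                             # S16: return to the user
```

Note that the log is written before the cache lookup, so both hits and misses feed the prediction step.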
【００７４】
図１１には、ログ抽出部１００３において実行される処理手順をフローチャートの形式で示している。
【００７５】
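The log extraction unit 1003 waits for an access-log processing request from the request processing unit 1001 (step S20). When such a request arrives (step S21), it first reads the latest user request from the access log 1002 (step S22).
【００７６】
Next, it extracts observed values from the user request it has read (step S23). For an access log such as the one shown in FIG. 5, the observed values are the access date and time, the client IP address, the URL, and the document type.
【００７７】
The log extraction unit 1003 first sends the extracted information to the prefetch target page selection unit 1007 and instructs it to select the pages to prefetch (step S24).
【００７８】
Next, focusing on the URL in the extracted information, it determines whether the site has changed from that of the previous URL (step S25). If the site differs from that of the previous URL, it sends the extracted information to the prefetch target site selection unit 1009 and instructs it to perform selection (step S26).
【００７９】
For example, if the URL previously requested by the same user was "http://www.aaa.co.jp/bbb.htm" and the URL requested this time is "http://www.bbb.ne.jp/index.htm", the log extraction unit 1003 detects that access has moved from the site "www.aaa.co.jp" to the site "www.bbb.ne.jp". To select the sites likely to be accessed after "www.bbb.ne.jp", it sends the extracted information to the prefetch target site selection unit 1009 and instructs it to select the sites to prefetch.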
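The site-change check of step S25 can be illustrated with a short Python sketch. It assumes that a "site" is identified by the host name of the URL, which matches the example above; the function names `site_of` and `site_changed` are hypothetical.

```python
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the host name of a URL in lower case (empty string if absent)."""
    return (urlsplit(url).hostname or "").lower()


def site_changed(previous_url: str, current_url: str) -> bool:
    """True when the two URLs belong to different sites (step S25)."""
    return site_of(previous_url) != site_of(current_url)


# The example from the text: access moves from www.aaa.co.jp to www.bbb.ne.jp.
moved = site_changed("http://www.aaa.co.jp/bbb.htm", "http://www.bbb.ne.jp/index.htm")
```

Only when `site_changed` returns true would the site selection unit be invoked; page selection runs on every request.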
【００８０】
図１２には、キャッシュ管理部１００４において実行される処理手順をフローチャートの形式で示している。
【００８１】
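The cache management unit 1004 waits for commands from the request processing unit 1001 and the prefetch unit 1013 (step S30). When asked whether a file corresponding to a requested URL exists on the cache disk 1006 (step S31), it consults the cache management table 1005, such as the one shown in FIG. 6, to check whether the file exists on the cache disk 1006 (step S32).
【００８２】
If the cache management table 1005 shows that a file corresponding to the requested URL exists on the cache disk 1006, the unit returns a message indicating that a file for that URL exists (step S33). If no such file exists on the cache disk 1006, it returns a message indicating that there is no file for that URL (step S34).
【００８３】
If the command is a read request (step S35), the unit consults the cache management table 1005, such as the one shown in FIG. 6, to check whether a file corresponding to the requested URL exists on the cache disk 1006 (step S36).
【００８４】
If the file exists on the cache disk 1006, the unit updates the last-access date and time in the cache management table 1005 to the current date and time (step S37) and returns the file for that URL (step S38). If the file does not exist on the cache disk 1006, it returns a message indicating that there is no file for that URL (step S34).
【００８５】
If the command is a write request (step S39), the unit first checks whether the current usage recorded in the cache management table 1005 plus the size of the file for the requested URL would exceed the allocated capacity defined in the same table 1005 (step S40).
【００８６】
If the allocated capacity would not be exceeded, the unit writes the file for that URL to the cache area (step S42) and updates the cache management table 1005 (step S43).
【００８７】
If the current usage plus the size of the file would exceed the allocated capacity, the unit frees cache space by deleting files, starting with the one whose last-access date and time is oldest, until enough space for the file to be written is available (step S41). It then writes the file to the cache area (step S42) and updates the cache management table 1005 (step S43).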
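The existence check, read, and write commands (steps S31 to S43) can be sketched for a single cache area as below. This is a sketch under assumptions, not the patented implementation: the class `QuotaCache` is hypothetical, file contents are held in memory instead of on the cache disk 1006, access order stands in for the last-access timestamps, and refusing a file larger than the whole allocation is a choice made here that the text does not specify.

```python
from collections import OrderedDict
from typing import Optional


class QuotaCache:
    """One cache area with an allocated capacity in bytes (hypothetical helper)."""

    def __init__(self, allocated_bytes: int) -> None:
        self.allocated = allocated_bytes
        self.used = 0
        # Ordered from least recently accessed to most recently accessed.
        self._files: "OrderedDict[str, bytes]" = OrderedDict()

    def has(self, url: str) -> bool:
        return url in self._files                 # S32: consult the table

    def read(self, url: str) -> Optional[bytes]:
        if url not in self._files:
            return None                           # S34: no file for this URL
        self._files.move_to_end(url)              # S37: refresh the last access
        return self._files[url]                   # S38: return the file

    def write(self, url: str, body: bytes) -> bool:
        if len(body) > self.allocated:
            return False                          # assumption: oversized files are not cached
        if url in self._files:
            self.used -= len(self._files.pop(url))
        while self.used + len(body) > self.allocated:   # S40: capacity check
            _, evicted = self._files.popitem(last=False)  # S41: drop the oldest
            self.used -= len(evicted)
        self._files[url] = body                   # S42: write the file
        self.used += len(body)                    # S43: update the table
        return True
```

For the per-client layout of FIG. 6, one such object per client IP address would be kept; for the shared layout of FIG. 24, a single object would serve all clients.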
【００８８】
図１３には、プリフェッチ対象ページ選定部１００７において実行される処理手順をフローチャートの形式で示している。
【００８９】
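The prefetch target page selection unit 1007 waits for a command (step S50). When it receives observed values from the log extraction unit 1003 and is instructed to select prefetch targets (step S51), it sets the observed values in a Bayesian network model such as the one shown in FIG. 14 or FIG. 15 (step S52).
【００９０】
For a general description of Bayesian network models, see, for example, the paper by Yoichi Motomura (本村陽一), 「不確実性モデリングのための情報表現：ベイジアンネット」 (BN2001 ベイジアンネットチュートリアル講演論文集, pp. 5-13 (2001), 人工知能学会人工知能基礎論研究会).
【００９１】
Once the observed values have been set, the conditional probabilities are learned (step S53). This learning uses the EM algorithm with the observation counts recorded while setting the observed values. For the EM algorithm, see, for example, Michiko Watanabe and Kazunori Yamaguchi (渡辺美智子、山口和範), eds., 「EMアルゴリズムと不完全データの諸問題」 (多賀出版).
【００９２】
Following the learning of the conditional probabilities, the posterior probabilities are calculated (step S54). The calculation is performed only for nodes on which the "do not calculate" flag was not raised while setting the observed values. For the calculation of posterior probabilities, see, for example, the paper by Yoichi Motomura cited above.
【００９３】
Once the posterior probabilities have been calculated, the probability value of each URL is obtained from the "next URL" node 4002 shown in FIG. 14 or FIG. 15, and the URLs are sorted in descending order of probability (described later). Those whose probability exceeds a certain threshold are provisionally recorded, with their URLs and probability values, in the prefetch target page list as prefetch target pages. The threshold is, for example, the mean value (1 divided by the number of URLs in the "next URL" node) or α times that mean.
【００９４】
Next, for each URL in the provisional prefetch target page list, the prefetch deadline (seconds) is obtained from the "access interval (seconds)" node. To do so, each URL in the list is treated in turn as if it had been observed: a provisional observed value is set in the "next URL" node (1 for the entry matching the provisional observation, 0 for all others), and the posterior probability of the "access interval (seconds)" node is calculated. Among the calculated posterior probabilities, the state with the largest probability value is found.
【００９５】
Once the prefetch deadline (seconds) has been obtained for each URL in the prefetch target page list, the deadlines are recorded in the list, completing the prefetch target page list (step S55).
【００９６】
When the prefetch target page list has been created, the prefetch scheduler unit 1011 is notified that the list has been created (step S56).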
【００９７】
ここで、図１４並びに図１５に示した例を参照しながら、ベイジアン・ネットワーク・モデル、並びにこれを用いたユーザ嗜好型のプリフェッチ・アルゴリズムについて、以下に説明する。
【００９８】
図１４と図１５はともに、ユーザの嗜好性が、ユーザが時系列にアクセスしたＵＲＬ（以後、「アクセス系列」という）間の関連性に現れることを利用したベイジアン・ネットワーク・モデルである。アクセス系列としては、前回アクセスしたものと今回アクセスしたものの２つを利用している。例えば、あるユーザが毎回スポーツに関するサイトＡにアクセスした後、サイトＡ内のジャンルＢのページにアクセスするなどの傾向を確率的に計測したりすることができる。
【００９９】
また、図１４に示したベイジアン・ネットワーク・モデルと図１５に示したベイジアン・ネットワーク・モデルの相違は、季節や時間によるユーザの嗜好性を、リクエスト内容の季節的、時間的な変化を加味して、プリフェッチ対象のページの選定処理を行なうかどうかという点である。例えば、あるユーザが、夏にはスポーツ新聞のサイトＡで高校野球に関するページを見る傾向があるが、それ以外の季節ではサッカーに関するページを見る傾向があるなどを確率的に測ることができる。そこで、ユーザ・アクセスの指向性に季節や時間の変化が現れるような場合には、図１５を用いる。
【０１００】
プリフェッチ対象ページ選定部１００７では、ユーザがリクエストしてきたＵＲＬの観測値を図１４又は図１５に示すベイジアン・ネットワーク・モデルに代入し、条件付確率の学習演算を行なって条件付確率の値を算出し、モデル中の個々のノードにおける事後確率を求め、最終的に、ユーザがリクエストしてきたＵＲＬの次にアクセスするＵＲＬを図１４又は図１５に示したモデル中の“次のＵＲＬ”ノード４００２の確率値を参考に取得する。
【０１０１】
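Here, a conditional dependency between nodes is represented by a directed link, written X_i → X_j, where X_i is called the parent node and X_j the child node. In FIG. 14 and FIG. 15, the arrows between nodes represent these parent-child relationships. Letting π(X_j) = {X_1, ..., X_i} denote the set of all parent nodes of a child node X_j, the conditional probability P for the child node X_j is defined by the following equation.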
【０１０２】
【数１】
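The image for Equation 1 is not reproduced in this text. Based only on the definitions in the preceding paragraph (child node X_j with parent set π(X_j) = {X_1, ..., X_i}), a form consistent with them would be the following; it is a reconstruction under that assumption and not a copy of the original figure.

```latex
P\bigl(X_j \mid \pi(X_j)\bigr) = P\bigl(X_j \mid X_1, \ldots, X_i\bigr)
```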

【０１０３】
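When each node has multiple state variables, this conditional probability is expressed as a conditional probability table. For example, the conditional probability table for the "next URL" node 4002 in FIG. 14 is as shown in FIG. 16.
【０１０４】
Learning here means deriving the conditional probability values from the number of times an observation X_i → X_j has been obtained. At each node, the counts of X_i → X_j observations are kept in a conditional observation count table. For example, for the "next URL" node 4002 in FIG. 14, the conditional observation count table is as shown in FIG. 17. Learning is performed every time an observed value is obtained, so the conditional probability values are also recalculated and change with each observation.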
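The relationship between a conditional observation count table and a conditional probability table can be sketched as follows. The sketch uses plain relative frequencies, which is what maximum-likelihood estimation reduces to when every variable is observed; it does not implement the EM algorithm named in the text. The class `CountTable` and the choice of (client IP, current URL) as the parent state are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable


class CountTable:
    """Conditional observation counts for one child node (hypothetical helper)."""

    def __init__(self) -> None:
        # counts[parent_state][child_state] = number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, parent_state: Hashable, child_state: str) -> None:
        """Record one observation of parent_state -> child_state."""
        self.counts[parent_state][child_state] += 1

    def probabilities(self, parent_state: Hashable) -> Dict[str, float]:
        """Return P(child | parent_state) as relative frequencies."""
        row = self.counts.get(parent_state, {})
        total = sum(row.values())
        if total == 0:
            return {}
        return {child: n / total for child, n in row.items()}


table = CountTable()
parent = ("192.0.2.1", "http://www.aaa.co.jp/index.htm")   # illustrative values
for nxt in ["/sports.htm", "/sports.htm", "/sports.htm", "/news.htm"]:
    table.observe(parent, nxt)
cpt_row = table.probabilities(parent)
```

Because each new observation only increments one cell, the probability row can be recomputed cheaply after every request, which is how the table follows a drifting access pattern.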
【０１０５】
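The posterior probability is the probability of each value that a random variable of interest can take, given the determined values of the observed variables (evidence: e). Let e^+ denote the evidence given to the parent-side nodes beyond the node whose posterior is being calculated, and e^- the evidence given to the child-side nodes beyond it. Then, by Bayes' theorem, the probability of node X_j is obtained, for example, by the following equation.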
【０１０６】
【数２】
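The image for Equation 2 is not reproduced in this text. A standard form from the Bayesian network literature that matches the roles of the parent-side evidence e^+ and the child-side evidence e^- is sketched below. It assumes that, given X_j, e^- is independent of e^+; the symbol α for the normalizing constant is introduced here for illustration and may differ from the original figure.

```latex
P(X_j \mid e) = P(X_j \mid e^{+}, e^{-})
             = \alpha \, P(e^{-} \mid X_j) \, P(X_j \mid e^{+}),
\qquad
\alpha = \frac{1}{\sum_{x} P(e^{-} \mid X_j = x)\, P(X_j = x \mid e^{+})}
```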

【０１０７】
なお、ベイジアン・ネットワーク・モデルそのものについての一般的な説明、モデルの表現方法、数学的な演算方法などは、例えば、本村陽一著の論文「不確実性モデリングのための情報表現：ベイジアンネット」（ＢＮ２００１ベイジアンネットチュートリアル講演論文集、ｐｐ．５−１３（２００１）、人工知能学会人工知能基礎論研究会）を参照されたい。
【０１０８】
続いて、ベイジアン・ネットワーク・モデルを用いてユーザの嗜好性を反映したプリフェッチ対象ページを選定する動作について説明する。まず、観測値の設定に関して、図１４を例にとって以下に説明する。
【０１０９】
“クライアントＩＰアドレスノード”４０００へ観測値を設定する場合は、“クライアントＩＰアドレスノード”４０００で、観測値のクライアントＩＰアドレスの確率値を１に、それ以外を０にして、後の事後確率の計算で“クライアントＩＰアドレスノード”４０００の事後確率を計算しないように、フラグを立てておく。なお、“クライアントＩＰアドレスノード”４０００に該当するものがなければ、新規に観測されたクライアントＩＰアドレスを“クライアントＩＰアドレスノード”４０００へ追加する。追加する際の確率値は、追加したものを１とし、それ以外は０とし、後の事後確率の計算を行なわないように、フラグを立てておく。
【０１１０】
“次のＵＲＬノード”４００２へ観測値を設定する場合は、“次のＵＲＬノード”４００２で、後の条件付確率の計算（ＥＭアルゴリズム）で使えるように、“次のＵＲＬノード”４００２の観測されたＵＲＬの観測回数を記録しておく。なお、“次のＵＲＬノード”４００２に該当するＵＲＬがなければ、新規に観測されたＵＲＬを“次のＵＲＬノード”４００２に追加する。追加する際の観測回数は１回としておく。
【０１１１】
“今のＵＲＬノード”４００１へ観測値を設定する場合は、“今のＵＲＬノード”４００１で、観測値のＵＲＬの確率値を１に、それ以外を０にして、後の事後確率の計算で、 “今のＵＲＬノード”４００１の事後確率の計算をしないようにフラグを立てておく。なお、“今のＵＲＬノード”４００１に該当するものがなければ、新規に観測されたＵＲＬを、“今のＵＲＬノード”４００１へ追加する。追加する際の確率値は、追加したものを１とし、それ以外は０とし、後の事後確率の計算を行なわないようにフラグを立てておく。
【０１１２】
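When setting an observed value in the "access interval (seconds)" node 4003, the observation count for the observed access interval (seconds) is recorded at that node so that it can be used later in the conditional probability calculation (EM algorithm). The access interval (seconds) is calculated from the difference between the date and time of the previous observed value and those of the current one, and is fitted to the corresponding state of the "access interval (seconds)" node 4003. For example, a difference of 38 seconds is treated as 30 seconds.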
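Fitting a measured interval to a node state can be sketched as rounding down to the nearest state boundary. The text gives only the single example of 38 seconds becoming 30 seconds, so the list of states in `INTERVAL_STATES` below is purely hypothetical, as is the function name.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Sequence

# Hypothetical state boundaries in seconds; the actual states are not given in the text.
INTERVAL_STATES = (0, 10, 30, 60, 120, 300)


def interval_state(
    previous: datetime,
    current: datetime,
    states: Sequence[int] = INTERVAL_STATES,
) -> int:
    """Round the elapsed seconds down to the largest state that does not exceed it."""
    diff = max(0.0, (current - previous).total_seconds())
    return states[bisect_right(states, diff) - 1]


state = interval_state(
    datetime(2003, 3, 14, 12, 0, 0),
    datetime(2003, 3, 14, 12, 0, 38),
)
```

With these assumed boundaries a 38-second gap falls into the 30-second state, in line with the example in the text.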
【０１１３】
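In the case of the Bayesian network model shown in FIG. 15, the "season" node 4101 is set from the date of the observed value: for example, March to May is spring, June to August is summer, September to November is autumn, and December to February is winter. The matching state is set to 1 and all others to 0, and a flag is raised so that the posterior probability of the "season" node 4101 is not calculated. Likewise, the "time" node 4102 is set from the time of the observed value: for example, 6:00 to 9:00 is early morning, 9:00 to 12:00 is morning, 12:00 to 13:00 is midday, 13:00 to 16:00 is afternoon, 16:00 to 18:00 is evening, 18:00 to 24:00 is night, and 24:00 to 6:00 is late night. The matching state is set to 1 and all others to 0, and a flag is raised so that the posterior probability of the "time" node 4102 is not calculated.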
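The mapping from a timestamp to the "season" and "time" states, and the 1/0 assignment of observed values, can be sketched as below. The state names are English labels chosen here, and treating each range as including its start hour and excluding its end hour is an assumption, since the text does not say which side a boundary such as 9:00 belongs to.

```python
from datetime import datetime
from typing import Dict, Sequence

SEASONS = ("spring", "summer", "autumn", "winter")
TIME_SLOTS = ("early_morning", "morning", "midday", "afternoon",
              "evening", "night", "late_night")

# (start hour, state), checked from the latest start hour downwards.
_TIME_STARTS = ((18, "night"), (16, "evening"), (13, "afternoon"),
                (12, "midday"), (9, "morning"), (6, "early_morning"))


def season_state(ts: datetime) -> str:
    """March-May spring, June-August summer, September-November autumn, else winter."""
    if 3 <= ts.month <= 5:
        return "spring"
    if 6 <= ts.month <= 8:
        return "summer"
    if 9 <= ts.month <= 11:
        return "autumn"
    return "winter"


def time_state(ts: datetime) -> str:
    """Map the hour of day to one of the seven time slots."""
    for start, name in _TIME_STARTS:
        if ts.hour >= start:
            return name
    return "late_night"          # 0:00 up to 6:00


def one_hot(state: str, states: Sequence[str]) -> Dict[str, float]:
    """Observed-value assignment: 1 for the matching state, 0 for the others."""
    return {s: (1.0 if s == state else 0.0) for s in states}


ts = datetime(2003, 8, 10, 7, 30)
season_evidence = one_hot(season_state(ts), SEASONS)
time_evidence = one_hot(time_state(ts), TIME_SLOTS)
```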
【０１１４】
観測値の設定処理が終わると、次いで、条件付確率の学習を行なう（前述）。ここで、条件付確率の学習には、観測値の設定処理で、記録された観測回数を用いて、ＥＭアルゴリズムで条件付確率の学習を行なう。
【０１１５】
条件付確率の学習の次に、事後確率の計算を行なう（前述）。ここで、事後確率の計算には、観測値の設定処理で、計算しないように立てたフラグが立っていないノードに対して行なう。
【０１１６】
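Once the posterior probabilities have been calculated, the probability value of each URL is obtained from the "next URL" node 4002 in FIG. 14 or FIG. 15, and the URLs are sorted in descending order of probability. Those whose probability exceeds a certain threshold are provisionally recorded, with their URLs and probability values, in the prefetch target page list 1008 as prefetch target pages. The threshold is, for example, the mean value (1 divided by the number of URLs in the "next URL" node) or α times that mean.
【０１１７】
Next, for each URL in the provisional prefetch target page list 1008, the prefetch deadline (seconds) is obtained from the "access interval (seconds)" node. To do so, each URL in the list 1008 is treated in turn as if it had been observed: a provisional observed value is set in the "next URL" node (1 for the entry matching the provisional observation, 0 for all others), the posterior probability of the "access interval (seconds)" node is calculated, and the state with the largest posterior probability is found. Once the prefetch deadline (seconds) has been obtained for each URL in the list 1008, the deadlines are recorded in the list, completing the prefetch target page list 1008 (as described above).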
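The threshold rule and the choice of a deadline can be sketched as follows. The posterior distributions are taken as given inputs here; computing them from the network is outside this sketch. The function names and the example probabilities are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def select_targets(next_url_probs: Dict[str, float], alpha: float = 1.0) -> List[Tuple[str, float]]:
    """Keep URLs whose probability exceeds alpha times the uniform mean 1/N,
    sorted in descending order of probability."""
    if not next_url_probs:
        return []
    threshold = alpha / len(next_url_probs)
    ranked = sorted(next_url_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(url, p) for url, p in ranked if p > threshold]


def deadline_seconds(interval_posterior: Dict[int, float]) -> int:
    """Pick the access-interval state with the largest posterior probability."""
    return max(interval_posterior, key=interval_posterior.get)


probs = {"/a.htm": 0.6, "/b.htm": 0.3, "/c.htm": 0.1}
targets = select_targets(probs)                      # threshold is 1/3
wider = select_targets(probs, alpha=0.5)             # threshold is 1/6
deadline = deadline_seconds({10: 0.2, 30: 0.5, 60: 0.3})
```

A value of α below 1 widens the prefetch set and a value above 1 narrows it, which gives a single knob for trading cache hit rate against bandwidth.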
【０１１８】
図１８には、プリフェッチ対象サイト選定部１００９において実行される処理手順をフローチャートの形式で示している。なお、プリフェッチ対象サイト選定部１００９の動作自体は、プリフェッチ対象ページ選定部１００７と同じで、与えられる観測値が異なっている。
【０１１９】
プリフェッチ対象サイト選定部１００９では、コマンドの受信を待ち（ステップＳ６０）、ログ抽出部１００３から観測値を受け取り、選定処理を指示される（ステップＳ６１）と、図１４や図１５に示すようなベイジアン・ネットワーク・モデルへ観測値を設定する（ステップＳ６２）。観測値の設定の仕方は、プリフェッチ対象ページ選定部と同様である。
【０１２０】
観測値の設定が終わると、条件付確率の学習（ステップＳ６３）を行なった後、事後確率の計算を行ない（ステップＳ６４）、プリフェッチ対象サイト・リスト１０１０を作成する（ステップＳ６５）。なお、条件付確率の計算、事後確率の計算、プリフェッチ対象サイト・リスト１０１０の作成方法は、プリフェッチ対象ページ選定部１００７と同様である。
【０１２１】
プリフェッチ対象サイト・リスト１０１０を作成したら、プリフェッチ・スケジューラ部１０１１へリストが作成されたことを通知する（ステップＳ６６）。
【０１２２】
以上に述べたプリフェッチ対象ページ・リスト１００８では、ユーザが時系列にアクセスしたページ間に相関関係があり、相関関係の度合い（確率値）が、ページ間のユーザの嗜好性を反映しているため、相関関係の度合い（確率値）の高いものほど、ユーザの嗜好性が強いことになるから、同リスト１００８上には嗜好性が強いと推定されるものが書き込まれる。
【０１２３】
同様に、プリフェッチ対象サイト・リスト１０１０では、ユーザが時系列にアクセスしたサイト間に相関関係があり、相関関係の度合い（確率値）が、サイト間のユーザの嗜好性を反映しているため、相関関係の度合い（確率値）の高いものほど、ユーザの嗜好性が強いことになるから、同リスト１０１０上には嗜好性が強いと推定されるものが書き込まれる。
【０１２４】
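The conditional observation count table is updated every time a request (observed value) is obtained from the user, so the conditional probability table is updated as well. When the user's requests change as the user's preferences change, the conditional observation count table and the conditional probability table change accordingly, and so do the posterior probabilities. The pages and sites written in the prefetch target page list 1008 or the prefetch target site list 1010 therefore change in line with the user's preferences. By successively prefetching, on the basis of this degree of correlation (probability value), the pages and sites expected to be accessed after each user request and storing them on the cache disk 1006, caching that follows the user's shifting preferences becomes possible.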
【０１２５】
このようなキャッシュ動作を行なう場合、キャッシュ管理テーブル１００５がユーザを区別せずにキャッシュ領域を管理し、あるユーザがアクセスするファイルがキャッシュ・ディスク１００６上から消えていたとしても、プリフェッチ対象ページ選定部１００７やプリフェッチ対象サイト選定部１００９からのユーザ毎に推定された要求に基づいて、次にアクセスされる可能性のある嗜好性の強いページ又はサイトがＷｅｂサーバから取得され、常にキャッシュ・ディスク１００６上に保存されていくことになる。したがって、ユーザ毎にアクセスされる可能性のある嗜好性の強い（確率値の高い）ファイルが常にキャッシュ・ディスク１００６上に存在することになる。
【０１２６】
図１９には、プリフェッチ・スケジューラ部１０１１において実行される処理手順をフローチャートの形式で示している。
【０１２７】
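The prefetch scheduler unit 1011 waits for a list-creation notification from the prefetch target page selection unit 1007 or the prefetch target site selection unit 1009 (step S70). When a notification that a list has been completed arrives (step S71), it adds each entry's URL, prefetch deadline (seconds), and probability value to the existing prefetch schedule table 1012, with the current time as the prefetch start time (step S72). After registering the entries in the prefetch schedule table 1012, it notifies the prefetch unit 1013 (step S73).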
【０１２８】
図２０には、プリフェッチ部１０１３において実行される処理手順をフローチャートの形式で示している。
【０１２９】
プリフェッチ部１０１３では、プリフェッチ・スケジューラ部１０１１からの通知を待っており（ステップＳ８０）、通知が到来したら（ステップＳ８１）、プリフェッチ・スケジュール表１０１２の中から、プリフェッチ開始時刻が現在時刻よりも古いもののうちプリフェッチが未だに完了していないエントリを探して、該当するものは、プリフェッチを中止する（ステップＳ８２）。次に、プリフェッチ・スケジュール表１０１２から、プリフェッチ開始時刻が現在時刻よりも古いエントリを削除（ステップＳ８３）する。
【０１３０】
プリフェッチ・スケジューラ部１０１１から通知が来ない場合には、スケジュール表の該当ＵＲＬでプリフェッチが行なわれていないものについて、キャッシュ管理部１００４に対して、キャッシュ・ディスク１００６上に該当ＵＲＬがあるかどうかを問い合わせる（ステップＳ８４）。
【０１３１】
該当するＵＲＬがキャッシュ・ディスク１００６上に存在する場合には（ステップＳ８５）、該当ＵＲＬが更新されているかどうかを該当ＵＲＬがあるＷｅｂサーバへ問い合わせる（ステップＳ８６）。そして、該当ＵＲＬが更新されていなかった場合には（ステップＳ８７）、登録通知待ち（ステップＳ８０）へ戻る。
【０１３２】
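If the URL is not in the cache (step S85), or if the URL has been updated (step S87), the size of the target URL in the list is first checked with the Web server hosting it, using the HTTP protocol.
【０１３３】
While checking the size, the approximate bandwidth is obtained from the response of the Web server hosting the target URL, and the size and bandwidth are recorded in the prefetch schedule table 1012 (step S88). The time required for the prefetch can be derived from the size and the bandwidth, and it is compared with the prefetch deadline.
【０１３４】
If the prefetch deadline would be exceeded, prefetches currently in progress are suspended one at a time, starting with the one with the lowest probability value, the response is measured again, and the unit tests whether the prefetch now fits within the deadline. If it cannot fit within the deadline no matter what, the entry is marked so that it is not prefetched.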
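The comparison between the estimated transfer time and the prefetch deadline can be sketched as below. The sketch covers only the arithmetic; how the size and bandwidth are measured, and how running prefetches are suspended and resumed, are left out. The function names and the use of bytes per second as the bandwidth unit are assumptions.

```python
def estimated_seconds(size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Estimated transfer time: size divided by measured bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        return float("inf")          # no usable bandwidth measurement
    return size_bytes / bandwidth_bytes_per_s


def fits_deadline(size_bytes: int, bandwidth_bytes_per_s: float, deadline_s: float) -> bool:
    """True when the prefetch is expected to finish within the deadline."""
    return estimated_seconds(size_bytes, bandwidth_bytes_per_s) <= deadline_s


# A 1,000,000-byte page at 250,000 bytes/s takes about 4 s, within a 30 s deadline.
ok = fits_deadline(1_000_000, 250_000, 30)
# At 20,000 bytes/s the same page needs about 50 s and misses the deadline.
too_slow = fits_deadline(1_000_000, 20_000, 30)
```

When the check fails, the procedure in the text frees bandwidth by pausing lower-probability transfers and then repeats the measurement, so this function would be called again with the new bandwidth figure.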
【０１３５】
次に、プリフェッチを一旦中止していた確率値の低いものについて、再度、プリフェッチを続けるようにする（ステップＳ８９）。
【０１３６】
次に、スケジュール表の該当ＵＲＬでプリフェッチが行なわれていないものについて、スケジュール表の確率値順に該当ＵＲＬに対応するページ、ページ内の画像等のファイル一式を取得する（ステップＳ９０）。ファイル取得していく過程で、取得順に、キャッシュ管理部１００４へ、取得したファイルを書き込むように依頼する（ステップＳ９１）。
【０１３７】
図２１及び図２２には、本実施形態に係るキャッシュ・システム１０００の介在によりユーザの端末からＷｅｂサーバ上のページを取得するための動作シーケンスを示している。
【０１３８】
図２１には、キャッシュ・システム１０００上にリクエストされたページが存在する場合の動作シーケンスが描かれている。
【０１３９】
この場合、まず、端末からのページ取得リクエストをＷＷＷキャッシュ・システムが受け取る。
【０１４０】
次に、ＷＷＷキャッシュ・システム内のキャッシュ上に、端末からリクエストされたページが存在しているかどうか検索する。そして、キャッシュ上に存在していれば、端末からリクエストされたページを端末に返す。
【０１４１】
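Next, the WWW cache system predicts, according to the procedure described above, the page likely to be accessed after the page requested by the terminal, and requests it from the WWW server. The WWW server returns the page requested by the WWW cache system to the WWW cache system.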
【０１４２】
ＷＷＷキャッシュ・システムは、リクエストしたページをＷＷＷサーバから受け取ると、ＷＷＷキャッシュ・システム内のキャッシュ上に、リクエストしたページを書き込む。
【０１４３】
また、図２２には、キャッシュ・システム１０００上にリクエストされたページが存在しない場合の動作シーケンスが描かれている。
【０１４４】
この場合、まず、端末からのページ取得リクエストをＷＷＷキャッシュ・システムが受け取る。次に、ＷＷＷキャッシュ・システム内のキャッシュ上に、端末からリクエストされたページが存在しているかどうかを検索する。
【０１４５】
そして、キャッシュ上に存在していなければ、端末からリクエストされたページをＷＷＷサーバへリクエストする。ＷＷＷサーバは、ＷＷＷキャッシュ・システムからリクエストされたページを、ＷＷＷキャッシュ・システムへ返す。
【０１４６】
ＷＷＷキャッシュ・システムは、リクエストしたページをＷＷＷサーバから受け取ると、端末にＷＷＷサーバから受け取ったページを送信するとともに、ＷＷＷキャッシュ・システム内のキャッシュ領域上に、リクエストしたページを書き込む。
【０１４７】
次に、ＷＷＷキャッシュ・システムは、端末からリクエストされたページの次にアクセスされる可能性のあるページを上述した手順に従って予測し、これをＷＷＷサーバにリクエストする。
【０１４８】
ＷＷＷサーバは、ＷＷＷキャッシュ・システムからリクエストされたページを、ＷＷＷキャッシュ・システムへ返す。ＷＷＷキャッシュ・システムは、リクエストしたページをＷＷＷサーバから受け取ると、ＷＷＷキャッシュ・システム内のキャッシュ上に、リクエストしたページを書き込む。
【０１４９】
最後に、本実施形態に係るキャッシュ・システムを動作させるハードウェア環境について、図２３を参照しながら説明する。
【０１５０】
ＣＰＵ５０００は、例えば米インテル社のプロセッサ・チップ“Ｐｅｎｔｉｕｍ（登録商標）”などを用いて構成され、本キャッシュ・サーバ装置の全体の動作を統括的に制御するとともにさまざまな演算処理を行なう機能を備えている。ＣＰＵ５０００上では、オペレーティング・システム（ＯＳ）が提供する実行環境下で、本実施形態に係るキャッシュ・サーバ・アプリケーションを始めとしてさまざまなアプリケーションプログラムが動作する。
【０１５１】
キャッシュ・メモリ５００１は、例えばＳＲＡＭ（ＳｔａｔｉｃＲＡＭ）などで構成される小容量の高速メモリであり、ＣＰＵ５０００が頻繁にアクセスするコマンドやデータなどの情報を一時記憶し、ＣＰＵ５０００と直接的な情報授受を行なうことにより、システムの高速化を図っている。
【０１５２】
システム・コントローラ５００２は、ＣＰＵ５０００のローカル・ピンに直結するホストバス５００８と、周辺バスとしてのＰＣＩ（ＰｅｒｉｐｈｅｒａｌＣｏｍｐｏｎｅｎｔＩｎｔｅｒｃｏｎｎｅｃｔ）バス５００９とのインターフェース・プロトコルを実現し、ＣＰＵ５０００と、主メモリ５００３、キャッシュ・メモリ５００１、その他の各種資源（ハードディスク・フレキシブル・ディスクなど）のシステム全体のタイミング調整などを行なう。システム・コントローラ５００２は、例えば、米インテル社のＴＲＩＴＯＮ（４３０ＦＸ）などを用いて構成される。
【０１５３】
主メモリ５００３は、例えばＤＲＡＭ（ＤｙｎａｍｉｃＲＡＭ）などを用いて構成される半導体メモリ装置であり、ＣＰＵ５０００において実行されるプログラム・コードをロードしたり、実行プログラムの作業データの保存領域として用いられたりする。例えば、本発明に係るキャッシュ・サーバ・アプリケーションが主メモリ５００３にロードされ、ＣＰＵ５０００によって実行される。また、アクセス・ログ１００２や、キャッシュ管理テーブル１００５、プリフェッチ対象ページ・リスト１００８、プリフェッチ対象サイト・リスト１０１０、プリフェッチ・スケジュール表１０１２などが、主メモリ５００３に一時的に保管される。
【０１５４】
ＣＰＵ５０００やシステム・コントローラ５００２の指示により、主メモリ５００３への情報の書き込み・読み出し動作が行なわれる。また、主メモリ５００３は、システム・コントローラ５００２を通じてＣＰＵや各種資源に接続され、これらの要求に従って情報の記憶が行なわれる。
【０１５５】
ホストバス５００８は、ＣＰＵ５０００に直接接続された情報の伝達手段であり、キャッシュ・メモリ５００１やシステム・コントローラ５００２などと情報授受ができるようになっている。
【０１５６】
ＰＣＩバス５００９は、ホストバス５００８と分離された情報の伝達手段であって、システム・コントローラ５００２によって相互接続されている。そして、システム・コントローラ５００２を介してＣＰＵ５０００はＰＣＩバス５００９上に接続された各種のハードウェア資源にアクセスすることができる。
【０１５７】
ＨＤコントローラ５００４は、ハード・ディスク・ドライブ（ＨＤＤ）５００５とＰＣＩバス５００９に接続され、ＰＣＩバス５００９を介したディスク・アクセス要求に応答して、ディスク５００５内の特定の領域への情報の書き込み・読み出し動作を制御するようになっている。ハード・ディスク上には、ＣＰＵ５０００において実行されるプログラムをインストールしたり、プログラムやデータなどのコンピュータ・ファイルが蓄積したりする。例えば、本発明に係るキャッシュ・サーバ・アプリケーションがハード・ディスク上にインストールされ、あるいは、アクセス・ログ１００２や、キャッシュ管理テーブル１００５、プリフェッチ対象ページ・リスト１００８、プリフェッチ対象サイト・リスト１０１０、プリフェッチ・スケジュール表１０１２などがハード・ディスク上に蓄積される。
【０１５８】
ＦＤコントローラ５００６は、フレキシブル・ディスクを取り出し可能に装填するフレキシブル・ディスク・ドライブ５００７とＰＣＩバス５００９に接続され、ＰＣＩバス５００９を介したディスク・アクセス要求に応答して、フレキシブル・ディスク内の特定の領域に情報の書き込み・読み出し動作を制御するようになっている。
【０１５９】
あるいは、システム内には、フレキシブル・ディスク以外の可搬型メディアを装填してアクセス動作を行なうメディア・ドライブがＰＣＩバス５００９に接続されていてもよい。この種の可搬型メディアは、システム間でのプログラムやデータの移動に利用される。例えば、本発明に係るキャッシュ・サーバ・アプリケーションや、キャッシュ・サーバ動作において利用されるアクセス・ログ１００２や、キャッシュ管理テーブル１００５、プリフェッチ対象ページ・リスト１００８、プリフェッチ対象サイト・リスト１０１０、プリフェッチ・スケジュール表１０１２などを、可搬型メディアを介して複数のシステム間で移動することができる。
【０１６０】
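The mouse controller 5010 connects the mouse 5011 (and a keyboard, not shown) serving as user input devices to the PCI bus 5009, and conveys mouse movements and other user input operations made by the operator to the CPU 5000 according to a predetermined sequence. The CPU 5000 provides, for example, a GUI (Graphical User Interface) environment and can generate the basic information needed to move the mouse pointer (an arrow icon) relative to the image displayed on the CRT display 5013. The CRT display 5013 is connected to the CRTC 5012 and displays the status and other information produced by the CPU as images.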
【０１６１】
ＣＲＴＣ（ＣＲＴコントローラ）５０１２は、ＰＣＩバス５００９に接続され、ＣＰＵ５０００などの指示に基づいて、図形などの描画情報をＣＲＴディスプレイ５０１３上に描画する。
【０１６２】
ネットワーク・インターフェース５０１４は、当該システムを、ＬＡＮやインターネットなどの外部ネットワークに接続する。システムは、ネットワーク上では、上述したキャッシュ・サーバとして動作し、端末からのリクエストをネットワーク・インターフェース５０１４を介して受け取り、ファイルの配信やＷＷＷサーバからのファイル取得、各ユーザ毎のアクセス系列の解析、ファイルの先読みなどの動作を実行する。
【０１６３】
また、ネットワーク上では、システム間でのプログラムやデータの移動（ダウンロード）が行なわれる。例えば、本発明に係るキャッシュ・サーバ・アプリケーションや、キャッシュ・サーバ動作において利用されるアクセス・ログ１００２や、キャッシュ管理テーブル１００５、プリフェッチ対象ページ・リスト１００８、プリフェッチ対象サイト・リスト１０１０、プリフェッチ・スケジュール表１０１２などを、ネットワークを介してシステム間で転送することができる。
【０１６４】
［追補］
以上、特定の実施形態を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が該実施形態の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、本明細書の記載内容を限定的に解釈するべきではない。本発明の要旨を判断するためには、冒頭に記載した特許請求の範囲の欄を参酌すべきである。
【０１６５】
【発明の効果】
以上詳記したように、本発明によれば、ＷＷＷ情報空間で不特定多数のユーザに提供される情報の中から、特定のユーザの要求に応じて効率的に情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することができる。
【０１６６】
また、本発明によれば、ＷＷＷ情報をキャッシュしておくことにより特定のユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することができる。
【０１６７】
また、本発明によれば、ユーザが要求する情報を先読みすることにより、ユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することができる。
【０１６８】
また、本発明によれば、時々刻々と変わるユーザの嗜好性の変化に対応してユーザが要求する情報を先読みしておくことにより、ユーザの要求から比較的短い応答時間で情報を提供することができる、優れた情報提供システム及び情報提供方法、並びにコンピュータ・プログラムを提供することができる。
【０１６９】
本発明によれば、ユーザのリクエスト・ログを用いてユーザ毎にプリフェッチを行なうことで、不特定多数ではなく、特定のユーザのリクエストしたＷＷＷ情報を効率的にキャッシュすることができる。また、先読みする機構に、確率ネットワーク・モデルの１つであるベイジアン・ネットワーク・モデルを用いることで、刻一刻と変わるユーザの嗜好性の変化に応じてアクセス・パターンが変化したとしても、これに対応して先読み対象を確率的に求め、先読み対象を限定し、ユーザの変化に動的に対応することができる。
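[Brief description of the drawings]
[FIG. 1] A diagram schematically showing a configuration example of a WWW information providing system to which the present invention is applied.
[FIG. 2] A diagram schematically showing another configuration example of a WWW information providing system to which the present invention is applied.
[FIG. 3] A diagram schematically showing yet another configuration example of a WWW information providing system to which the present invention is applied.
[FIG. 4] A diagram schematically showing the functional configuration of the cache system 1000 according to an embodiment of the present invention.
[FIG. 5] A diagram showing an example of the data structure of the log recorded in the access log 1002.
[FIG. 6] A diagram showing an example of the data structure of the cache management table 1005.
[FIG. 7] A diagram showing an example of the data structure of the prefetch target page list 1008.
[FIG. 8] A diagram showing an example of the data structure of the prefetch target site list 1010.
[FIG. 9] A diagram schematically showing the data structure of the prefetch schedule table 1012.
[FIG. 10] A flowchart showing the processing procedure executed in the request processing unit 1001.
[FIG. 11] A flowchart showing the processing procedure executed in the log extraction unit 1003.
[FIG. 12] A flowchart showing the processing procedure executed in the cache management unit 1004.
[FIG. 13] A flowchart showing the processing procedure executed in the prefetch target page selection unit 1007.
[FIG. 14] A diagram showing a configuration example of a Bayesian network model.
[FIG. 15] A diagram showing a configuration example of a Bayesian network model.
[FIG. 16] A diagram showing a conditional probability table.
[FIG. 17] A diagram showing a conditional observation count table.
[FIG. 18] A flowchart showing the processing procedure executed in the prefetch target site selection unit 1009.
[FIG. 19] A flowchart showing the processing procedure executed in the prefetch scheduler unit 1011.
[FIG. 20] A flowchart showing the processing procedure executed in the prefetch unit 1013.
[FIG. 21] A diagram showing the operation sequence for obtaining a page on a Web server from a user's terminal through the cache system 1000 (for the case where the requested page exists on the cache system 1000).
[FIG. 22] A diagram showing the operation sequence for obtaining a page on a Web server from a user's terminal through the cache system 1000 (for the case where the requested page does not exist on the cache system 1000).
[FIG. 23] A diagram showing an example of a hardware configuration for operating the cache system according to the present invention.
[FIG. 24] A diagram showing an example of the data structure of the cache management table 1005 when the terminals of all users are not distinguished.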
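[Explanation of reference numerals]
1000: cache system
1001: request processing unit
1002: access log
1003: log extraction unit
1004: cache management unit
1005: cache management table
1006: cache disk
1007: prefetch target page selection unit
1008: prefetch target page list
1009: prefetch target site selection unit
1010: prefetch target site list
1011: prefetch scheduler unit
1012: prefetch schedule table
1013: prefetch unit
5000: CPU; 5001: cache memory
5002: system controller; 5003: main memory
5004: HD controller; 5005: HDD
5006: FD controller; 5007: FDD
5008: host bus; 5009: PCI bus
5010: mouse controller; 5011: mouse
5012: CRT controller; 5013: CRT display
5014: network interface
[0001]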
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an information providing system, an information providing method, and a computer program for providing information requested by a user, and more particularly to an information providing system, an information providing method, and a computer program for efficiently providing information in response to the requests of a specific user from among the information provided to an unspecified number of users in a WWW information space.
[0002]
More specifically, the present invention relates to an information providing system, an information providing method, and a computer program for providing information with a relatively short response time to a request from a specific user by caching WWW information, and in particular to an information providing system, an information providing method, and a computer program for providing information with a relatively short response time to a user request by acquiring and caching in advance (that is, prefetching) information that the user is expected to request.
[0003]
[Prior art]
Recently, network computing technology has been developing rapidly. In a network-connected environment, collaborative work such as sharing computer resources and sharing, circulating, distributing, and exchanging information can be carried out smoothly.
[0004]
Computers are interconnected by networks of various forms. Examples include a LAN (Local Area Network) laid locally, such as Ethernet (registered trademark), and the "Internet" (The Internet), which has literally grown into a worldwide network as a result of repeated interconnection of networks. In particular, the Internet has become widespread along with the spread of broadband communication and always-on connections.
[0005]
On the Internet, servers are interconnected on the basis of TCP/IP (Transmission Control Protocol/Internet Protocol), and many services such as WWW (World Wide Web), News, TELNET (TELetypewriter NETwork), and FTP (File Transfer Protocol) are open to the public.
[0006]
Of these, WWW is a wide-area information retrieval system that provides an information space having a hyperlink structure, and is the largest factor for the explosive growth and rapid spread of the Internet. On the WWW system, information contents of various media such as texts, images, and sounds are disclosed. A user of the WWW service can connect to a server that provides WWW information from the client device via the Internet and acquire WWW information.
[0007]
WWW information is described in a hypertext description language called HTML (Hyper Text Markup Language). An information resource is identified by an identifier in the form of a URL (Uniform Resource Locator), and an HTML document can be transferred over TCP/IP according to HTTP (Hyper Text Transfer Protocol), as is well known.
[0008]
WWW information is usually provided from a server in units called pages. On the client side, WWW information can be downloaded in page units using a WWW browser and displayed on the screen as a WWW page. A WWW page described in a hypertext format has mutual reference relationships, through hyperlinks, with pages provided on the same server or on other servers.
[0009]
Here, when WWW information is used, there is a problem that the response time is long when information is obtained directly from the server, owing to the processing time of the server and the bandwidth of the Internet.
[0010]
To solve this problem, a technique has been adopted in which a WWW information providing device called a cache server is installed on the Internet, which connects the user's client device and the device that provides WWW information (see, for example, Patent Document 1). The cache server stores, within the device, frequently accessed WWW information from among the WWW information requested by an unspecified number of users. If the information requested by a user is present in the device (cache hit), that information is provided to the user. If it is not present (cache miss), the cache server obtains the information from the server that provides the WWW information and provides it to the requesting user.
[0011]
However, a conventional cache server is generally shared by an unspecified number of users and has a finite storage capacity, so it manages (deletes) data according to logic such as FIFO (First In First Out) or LRU (Least Recently Used). In such a case, the cache server stores frequently accessed WWW information based on the access information of an unspecified number of users, so the information requested by a specific user is not necessarily stored in the cache server. When a cache miss occurs, the cache server has to obtain the information from the server that provides the WWW information each time, and as a result the response time problem is not solved.
[0012]
A method is also conceivable in which link information to other pages is obtained from the page of WWW information requested by the user, and all of the linked pages are prefetched to shorten the response time (see, for example, Non-Patent Document 1).
[0013]
However, in this case, all link destinations in every page are prefetched, so even pages that are unlikely to be accessed are prefetched, which is inefficient and wastes bandwidth.
[0014]
There is also a method in which importance or priority is assigned to users or to pages of WWW information and prefetching is performed accordingly (see, for example, Patent Document 2). For example, a mechanism is provided for aggregating and analyzing the WWW data access history of a proxy server, and priorities are given to the data cached in the cache server based on the analysis result. Whether to prefetch each requested item of data is then determined according to its priority. As a result, the cache hit rate can be improved by prefetching while the increase in network load caused by prefetching is suppressed.
[0015]
However, in this case, the importance and priority are uniform, so the prefetch targets also become uniform. In other words, when the user's preferences change and the access pattern changes dynamically, the cache server cannot cope.
[0016]
[Patent Document 1]
JP-A-10-21174
[Patent Document 2]
JP-A-11-149405
[Non-patent document 1]
Full technical explanation of Web server, Nikkei BP
[0017]
[Problems to be solved by the invention]
An object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of efficiently providing information in response to a request from a specific user, from among the information provided to an unspecified number of users in the WWW information space.
[0018]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a request from a specific user by caching WWW information.
[0019]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user request by prefetching information that the user is expected to request.
[0020]
A further object of the present invention is to provide an excellent information providing system, information providing method, and computer program capable of providing information with a relatively short response time to a user request by prefetching information that the user is expected to request while following the user's ever-changing preferences.
[0021]
[Means and action for solving the problems]
The present invention has been made in consideration of the above problems, and a first aspect thereof is an information providing system for providing information composed of pages having mutual reference relationships across a plurality of sites, the system comprising:
log extraction means for extracting, for each user, a request log of information requests;
prefetch target page selection means for predicting the page to be accessed next based on the access sequence of pages included in the request log, and obtaining a page to be prefetched for each user;
prefetch target site selection means for predicting the site to be accessed next based on the access sequence of sites included in the request log, and obtaining a site to be prefetched for each user; and
prefetch means for prefetching information based on the page and the site selected by the prefetch target page selection means and the prefetch target site selection means.
[0022]
Note that the term "system" as used here refers to a logical collection of a plurality of devices (or functional modules that realize specific functions), regardless of whether the devices or functional modules are housed in a single enclosure.
[0023]
The information providing system according to the present invention provides information with a relatively short response time from a user request by caching WWW information published in a WWW server.
[0024]
At this time, by performing prefetching for each user using that user's request log, the WWW information requested by a specific user, rather than by an unspecified number of users, can be cached efficiently.
[0025]
Here, the prefetch target page selection means and/or the prefetch target site selection means predict the page and/or site that the user will access next, using a per-user probability network model that describes the user's access sequence.
[0026]
That is, by using a Bayesian network model, which is one type of probability network model, in the mechanism for prefetching WWW information, prefetch targets can be obtained probabilistically and narrowed down even when the access pattern changes with the user's ever-changing preferences, so that the system responds dynamically to changes in the user.
[0027]
In addition, the prefetch target page selection means and/or the prefetch target site selection means can update the probability network model based on the user's next access result.
[0028]
Further, the prefetch means may perform a prefetch operation for a prefetch target in consideration of a load on a network or the like. For example, the priorities of the pages and / or sites to be prefetched based on the prediction may be rearranged in consideration of the network load.
[0029]
Further, cache information such as prefetched information may be stored in a storage area allocated to each user. Alternatively, cache information may be managed on the same storage space without distinguishing users.
[0030]
A second aspect of the present invention is a computer program described in a computer-readable format so as to execute, on a computer system, processing for providing information composed of pages having mutual reference relationships across a plurality of sites, the computer program comprising:
a log extraction step of extracting, for each user, a request log of information requests;
a prefetch target page selection step of predicting the page to be accessed next based on the access sequence of pages included in the request log, and obtaining a page to be prefetched for each user;
a prefetch target site selection step of predicting the site to be accessed next based on the access sequence of sites included in the request log, and obtaining a site to be prefetched for each user; and
a prefetch step of prefetching information based on the page and the site selected in the prefetch target page selection step and the prefetch target site selection step.
[0031]
The computer program according to the second aspect of the present invention defines a computer program described in a computer-readable format so as to realize predetermined processing on a computer system. In other words, by installing the computer program according to the second aspect of the present invention in a computer system, cooperative action is exerted on the computer system, and the same operation and effect as those of the information providing system according to the first aspect of the present invention can be obtained.
[0032]
Further objects, features, and advantages of the present invention will become apparent from more detailed descriptions based on embodiments of the present invention described below and the accompanying drawings.
[0033]
[Best mode for carrying out the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0034]
According to the present invention, a cache server is arranged on the Internet, which connects users' client devices and devices that provide WWW information, and the cache server prefetches, for each user, the information that is likely to be accessed next. WWW information requested by a specific user, rather than by an unspecified number of users, is thereby provided efficiently.
[0035]
FIG. 1 schematically shows a configuration example of a WWW information providing system to which the present invention is applied. In the example shown in the figure, a terminal 100 serving as a client from which a user requests information, a Web server 103 that provides WWW information in page units, and a cache server device 104 that stores WWW information provided from the Web server 103 are connected to the Internet 102 via a network 101. A cache system 105 (described later) according to an embodiment of the present invention is constructed in the cache server device 104.
[0036]
The terminal 100 is a client terminal used when a user browses WWW information on the Internet, and corresponds to, for example, a PC (Personal Computer), a PDA (Personal Digital Assistant), or a mobile phone.
[0037]
The Web server 103 is configured by a device installed by a person who provides a service for providing WWW information on the Internet to provide WWW information to users.
[0038]
WWW information is described using a hypertext description language called HTML. The WWW information is specified by a URL and can be transferred according to the HTTP protocol. The WWW information is usually provided from the server in units called pages (hereinafter referred to as "Web pages"). On the client side, WWW information can be downloaded in page units using a WWW browser and displayed on the screen as a Web page. In addition, a Web page described in a hypertext format has mutual reference relationships, through hyperlinks, with pages provided on the same server or on other servers.
[0039]
The cache server device 104 is provided to reduce the delay that occurs when a user obtains WWW information such as a Web page directly from the Web server 103. For each user request for WWW information, the cache system 105 caches the information, such as a Web page, requested by the user, and also predicts and caches in advance (prefetches) pages that may be accessed next after the requested Web page. The illustrated cache server device 104 has a configuration called a "backside cache".
[0040]
In the illustrated form, a user request flows from the terminal 100 to the Internet 102 through the network 101 and reaches the cache server device 104. In the cache server device 104, a user request is passed to the cache system 105.
[0041]
In the cache system 105, if the page requested by the user is cached (cache hit), the page requested by the user is passed to the cache server device 104, but if the page is not cached (cache miss), The page requested by the user is acquired from the Web server 103 and passed to the cache server device 104.
[0042]
In addition, when the cache system 105 receives a user request, it predicts the pages that the user is likely to access next and caches them in advance (prefetching). This point will be described in detail later.
[0043]
FIG. 2 schematically shows another configuration example of the WWW information providing system to which the present invention is applied. In the example shown in the figure, a terminal 200 serving as a client from which a user requests information, an in-house LAN 203, and the like are connected to the Internet 202 through a network 201. The in-house LAN 203 here refers to a local area network within a company that provides a service offering Web pages on the Internet.
[0044]
A Web server 206 that provides WWW information in page units and a cache server device 204 that stores WWW information provided from the Web server 206 are connected to the in-house LAN 203. A cache system 205 according to an embodiment of the present invention is constructed in the cache server device 204. The illustrated cache server device 204 has a configuration called a "front side cache".
[0045]
In the illustrated form, a user request flows from the terminal 200 to the Internet 202 via the network 201 and reaches the cache server device 204 via the in-house LAN 203. In the cache server device 204, a user request is passed to the cache system 205.
[0046]
In the cache system 205, if the page requested by the user is cached (cache hit), the page is passed to the cache server device 204. If the page is not cached (cache miss), the page requested by the user is acquired from the Web server 206 and passed to the cache server device 204.
[0047]
In addition, when the cache system 205 receives a user request, it predicts the pages that the user is likely to access next and caches them in advance (prefetching). This point will be described in detail later.
[0048]
FIG. 3 schematically shows still another configuration example of the WWW information providing system to which the present invention is applied. In the example shown in the figure, a terminal 300 as a client from which a user requests information and a Web server 304 for providing WWW information in page units are connected to the Internet 303 via a network 302. In the terminal 300, a cache system 301 according to an embodiment of the present invention is constructed.
[0049]
In the illustrated form, a user request is first passed from the terminal 300 to the cache system 301. In the cache system 301, if the page requested by the user is cached (cache hit), the page is passed to the terminal 300. If it is not cached (cache miss), the cache system 301 obtains the page requested by the user from the Web server 304 via the network 302 and the Internet 303 and passes it to the terminal 300.
[0050]
Also, when the cache system 301 receives a user request, it predicts the pages that the user is likely to access next and caches them in advance (prefetching). This point will be described in detail later.
[0051]
FIG. 4 schematically shows the functional configuration of a cache system 1000 according to an embodiment of the present invention.
[0052]
The request processing unit 1001 records a user request in the access log 1002, instructs the log extraction unit 1003 to process the log, obtains the page requested by the user from the cache management unit 1004, and returns the obtained page to the user. If the page cannot be obtained from the cache management unit 1004, the request processing unit 1001 obtains the requested page from an external Web server, instructs the cache management unit 1004 to write the obtained page, and returns the obtained page to the user.
[0053]
FIG. 5 shows an example of the data structure of a log recorded in the access log 1002. In the example shown in the figure, a log record is generated for each request from a user, and each record has fields for recording the date and time of access, the IP address of the requesting user, the request type, the URL of the requested Web page, the type of the requested document, and so on.
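As a minimal illustration of the log record described above, the following Python sketch models one record with the fields named in the text. The class name `AccessLogRecord`, the field names, the tab-separated line format, and the sample values are assumptions made for illustration only; the source specifies which fields exist, not how they are stored.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessLogRecord:
    """One access-log record per user request (field names are hypothetical)."""
    accessed_at: datetime   # date and time of access
    client_ip: str          # IP address of the requesting user
    request_type: str       # e.g. "GET"
    url: str                # URL of the requested Web page
    document_type: str      # e.g. "text/html"


def parse_log_line(line: str) -> AccessLogRecord:
    """Parse one tab-separated line (an assumed on-disk format)."""
    ts, ip, req, url, doc = line.rstrip("\n").split("\t")
    return AccessLogRecord(datetime.fromisoformat(ts), ip, req, url, doc)


record = parse_log_line(
    "2003-03-14T10:15:00\t192.0.2.10\tGET\thttp://www.aaa.co.jp/bbb.htm\ttext/html"
)
```

Keeping the record immutable reflects the fact that a log entry, once written, is only read by the later extraction step.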
[0054]
In response to an instruction from the request processing unit 1001, the log extracting unit 1003 extracts from the access log 1002 the information (for example, client IP address, URL, and access time) that the prefetch target page selecting unit 1007 and the prefetch target site selecting unit 1009 each need in order to predict the pages or sites to be prefetched. It passes the extracted information to the prefetch target page selection unit 1007 and the prefetch target site selection unit 1009 and instructs them to perform selection processing.
[0055]
In response to a request from the request processing unit 1001 or the prefetch unit 1013, the cache management unit 1004 performs disk access processing such as checking whether the requested file exists on the cache disk 1006, reading a file from the cache disk 1006, and writing a file to the cache disk 1006. It also records cache management information, such as the list of files on the cache disk 1006 and their access dates and times, in the cache management table 1005.
[0056]
FIG. 6 shows an example of the data structure of the cache management table 1005. In the example shown in the figure, cache management information is managed for each requesting client user. Each client is identified by an IP address and has a dedicated cache storage capacity assigned to it. The cache management table 1005 manages the storage capacity allocated to each client and the storage capacity currently in use, and records, for each client, a list of the requested Web pages consisting of the last access date and time, the URL, the file name on the cache disk 1006, and so on. The list of requested Web pages is arranged, for example, in descending order of last access date and time for each client.
[0057]
When the cache management unit 1004 searches for a file, it refers to the cache management table 1005. Files requested from terminals, files acquired from Web servers, and the like are therefore registered in the cache management table 1005. In the configuration example of the cache management table illustrated in FIG. 6, caching is performed separately for each user terminal. That is, the capacity of the cache disk 1006 allocated to each user terminal can be determined, and caching can be performed per terminal. On the other hand, the same file may be cached in the cache areas of several user terminals, so the cache disk 1006 may not be used efficiently.
[0058]
On the other hand, a method of managing files without distinguishing the terminals of individual users is also conceivable. A general cache server employs such a management method. FIG. 24 illustrates an example of the data structure of the cache management table 1005 when the terminals of all users are not distinguished. In the example shown in the figure, clients are not identified in the cache management table 1005, and no cache area is allocated per client. The cache management table 1005 keeps track of the allocated capacity and the current capacity of the entire cache disk 1006, and records a list of the last access date and time, the URL, and the file name on the cache disk 1006 for each requested Web page, without identifying the requesting client. In this case, there is the disadvantage that caching is not performed per user terminal, but the files that are accessed more frequently remain in the cache, so the cache disk 1006 is used more efficiently.
[0059]
According to the present invention, every time a request is received from a user terminal, the pages or sites that may be requested next after the requested page are predicted, and the predicted pages or sites are obtained from a Web server and cached in advance on the cache disk 1006 (prefetching; described later). Regardless of whether the cache management table 1005 has the configuration shown in FIG. 6 or FIG. 24, pages and the like that may be requested are always present in the cache management table 1005 and on the cache disk 1006.
[0060]
The prefetch target page selection unit 1007 receives the information extracted from the access log 1002 by the log extraction unit 1003, predicts from this information the pages that the user may access next, and selects them as prefetch target pages. More specifically, for each page that may be accessed next after the page requested by the user, it calculates a probability value in the range of 0 to 1, selects pages together with a deadline by which they should be prefetched, and records the selection result in the prefetch target page list 1008. It also notifies the prefetch scheduler unit 1011 that the prefetch target page list 1008 has been updated.
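The selection step described above can be sketched as follows, assuming the probability values have already been computed. The function name `select_prefetch_pages`, the cut-off `min_probability`, and the fixed deadline offset `deadline_after` are hypothetical; the source states only that each candidate has a probability between 0 and 1 and is selected together with a prefetch deadline.

```python
from typing import Dict, List, Tuple


def select_prefetch_pages(
    probabilities: Dict[str, float],
    now: float,
    min_probability: float = 0.2,   # hypothetical cut-off, not from the source
    deadline_after: float = 30.0,   # hypothetical deadline offset in seconds
) -> List[Tuple[str, float, float]]:
    """Return (url, probability, deadline) tuples, most probable first."""
    chosen = [
        (url, p, now + deadline_after)
        for url, p in probabilities.items()
        if p >= min_probability
    ]
    return sorted(chosen, key=lambda item: item[1], reverse=True)


targets = select_prefetch_pages(
    {
        "http://www.aaa.co.jp/b.htm": 0.75,
        "http://www.aaa.co.jp/c.htm": 0.25,
        "http://www.aaa.co.jp/d.htm": 0.05,
    },
    now=1000.0,
)
```

Dropping low-probability candidates is one simple way to limit the prefetch targets; other limiting rules, such as keeping a fixed number of candidates, would fit the description equally well.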
[0061]
FIG. 7 shows an example of the data structure of the prefetch target page list 1008. In the example shown in the figure, pages predicted to be likely to be accessed are listed in the prefetch target page list 1008, for example in order of probability value, and the IP address of the requesting client, the page URL, the prefetch deadline, and so on are recorded for each page.
[0062]
The prefetch target site selection unit 1009 receives the information extracted from the access log 1002 by the log extraction unit 1003, predicts from this information, that is, from the user's access sequence, the sites that the user may access next, and selects them as prefetch target sites. More specifically, for each site that may be accessed next after the site containing the page requested by the user, it calculates a probability value in the range of 0 to 1, selects sites together with a deadline by which they should be prefetched, and records the selection result in the prefetch target site list 1010. It also notifies the prefetch scheduler unit 1011 that the prefetch target site list 1010 has been updated.
[0063]
FIG. 8 shows an example of the data structure of the prefetch target site list 1010. In the example shown in the figure, sites predicted to be likely to be accessed are listed in the prefetch target site list 1010, for example in order of probability value, and the IP address of the requesting client, the site URL, the prefetch deadline, and so on are recorded for each site.
[0064]
In response to the notification from the prefetch target page selection unit 1007, the prefetch scheduler unit 1011 reads the prefetch target page list 1008, and records the pages to be prefetched in the prefetch schedule table 1012 in order of probability value. Further, in response to the notification from the prefetch target site selection unit 1009, the prefetch target site list 1010 is read, and sites to be prefetched are recorded in the prefetch schedule table 1012 in order of probability value. Then, it notifies the prefetch section 1013 that the prefetch schedule table 1012 has been updated.
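A minimal sketch of how the scheduler might merge the two lists into one schedule is shown below. The class `PrefetchTarget`, its field names, and the function `build_schedule` are hypothetical, and descending order of probability is an assumption; the source says only that entries are recorded in order of probability value.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PrefetchTarget:
    client_ip: str
    url: str
    probability: float   # predicted access probability in [0, 1]
    deadline: float      # prefetch deadline, e.g. seconds since some epoch
    kind: str            # "page" or "site"


def build_schedule(
    pages: List[PrefetchTarget], sites: List[PrefetchTarget]
) -> List[PrefetchTarget]:
    """Merge both target lists and order them by descending probability."""
    return sorted(pages + sites, key=lambda t: t.probability, reverse=True)


pages = [
    PrefetchTarget("192.0.2.10", "http://www.aaa.co.jp/b.htm", 0.6, 1030.0, "page"),
    PrefetchTarget("192.0.2.10", "http://www.aaa.co.jp/c.htm", 0.2, 1030.0, "page"),
]
sites = [PrefetchTarget("192.0.2.10", "http://www.bbb.ne.jp/", 0.4, 1060.0, "site")]
schedule = build_schedule(pages, sites)
```

Because `sorted` is stable, targets with equal probability keep their original relative order, with pages ahead of sites.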
[0065]
FIG. 9 schematically shows the data structure of the prefetch schedule table 1012. In the example shown in the figure, the pages and sites selected as prefetch targets are listed in order of probability value, and the prefetch start time, the prefetch deadline, the URL of the page or site, the data to be prefetched, the bandwidth used for the prefetch, the probability value, the prefetch type, the prefetch status, and so on are recorded for each prefetch target page or site.
[0066]
The prefetch unit 1013 acquires a prefetch target page in the schedule table 1012 from an external Web server based on the prefetch schedule table 1012, or sorts the prefetch schedule table 1012 according to a notification from the prefetch scheduler unit 1011.
[0067]
Subsequently, a processing operation executed in each unit in the cache system 1000 according to the present embodiment will be described.
[0068]
FIG. 10 shows a processing procedure executed by the request processing unit 1001 in the form of a flowchart.
[0069]
The request processing unit 1001 waits for a request from the user (step S10), and upon receiving the request (step S11), first writes the request to the access log 1002 (step S12).
[0070]
After writing the request in the access log 1002, an instruction is issued to the log extracting unit 1003 to process the access log 1002 (step S13).
[0071]
Next, it inquires of the cache management unit 1004 whether a Web page or the like requested by the user is cached in the cache disk 1006 (step S14).
[0072]
If the page is cached on the cache disk 1006 (cache hit), the request processing unit 1001 asks the cache management unit 1004 to read it (step S15), and then returns the requested Web page or the like to the requesting user (step S16).
[0073]
On the other hand, if the page is not cached on the cache disk 1006 (cache miss), the requested Web page or the like is acquired from the external server (step S17), the cache management unit 1004 is instructed to store the acquired Web page or the like in the cache (step S18), and then the requested Web page or the like is returned to the requesting user (step S16).
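The request handling flow of steps S12 to S18 can be sketched as below. This is a simplified illustration, not the implementation in the source: the cache is an in-memory dictionary, the access log is a list, the function names `handle_request` and `fake_origin` are hypothetical, and the notification to the log extraction unit (step S13) is omitted.

```python
from typing import Callable, Dict, List


def handle_request(
    url: str,
    cache: Dict[str, bytes],
    access_log: List[str],
    fetch_from_origin: Callable[[str], bytes],
) -> bytes:
    """Serve one request following steps S12 to S18 (simplified)."""
    access_log.append(url)             # S12: record the request in the access log
    # S13 (instructing the log extraction unit) is omitted in this sketch.
    if url in cache:                   # S14: is the page cached?
        return cache[url]              # S15, S16: read from the cache and return
    page = fetch_from_origin(url)      # S17: cache miss, fetch from the Web server
    cache[url] = page                  # S18: store the fetched page in the cache
    return page                        # S16: return the page to the user


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    """Stand-in for an external Web server; records each fetch."""
    origin_calls.append(url)
    return b"<html>" + url.encode() + b"</html>"


cache: Dict[str, bytes] = {}
log: List[str] = []
first = handle_request("http://www.aaa.co.jp/bbb.htm", cache, log, fake_origin)
second = handle_request("http://www.aaa.co.jp/bbb.htm", cache, log, fake_origin)
```

In this usage the first call is a cache miss and the second is a cache hit, so the stand-in origin server is contacted only once while both requests are logged.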
[0074]
FIG. 11 shows a processing procedure executed in the log extracting unit 1003 in the form of a flowchart.
[0075]
The log extracting unit 1003 waits for an access log processing request from the request processing unit 1001 (step S20). When an access log processing request arrives (step S21), it first reads the latest user request from the access log 1002 (step S22).
[0076]
Next, an observation value is extracted from the read user request (step S23). As an observation value, for example, in the case of an access log as shown in FIG. 5, an access date and time, a client IP address, a URL, and a document type are extracted.
[0077]
The log extracting unit 1003 first sends the extracted information to the prefetch target page selecting unit 1007, and instructs to perform a work of selecting a page to be prefetched (step S24).
[0078]
Next, focusing on the URL in the extracted information, it is determined whether the site has changed from that of the previous URL (step S25). If the site differs from that of the previous URL, the extracted information is sent to the prefetch target site selecting unit 1009, which is instructed to select prefetch target sites (step S26).
[0079]
For example, if the URL previously requested by the same user is "http://www.aaa.co.jp/bbb.htm" and the URL requested this time is "http://www.bbb.ne.jp/index.htm", the log extracting unit 1003 detects that access has moved from the site "www.aaa.co.jp" to the site "www.bbb.ne.jp". In order to select sites that may be accessed next after "www.bbb.ne.jp", it sends the extracted information to the prefetch target site selection unit 1009 and instructs it to select prefetch target sites.
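The site-change check of step S25 can be illustrated with the standard library URL parser. Treating the host name as the "site" matches the example above; the function names `site_of` and `site_changed`, and the choice to treat a user's first request as no change, are assumptions of this sketch.

```python
from typing import Optional
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Return the host part of a URL, used here as the 'site'."""
    return urlsplit(url).hostname or ""


def site_changed(previous_url: Optional[str], current_url: str) -> bool:
    """True when the current request is on a different site than the previous one."""
    if previous_url is None:
        return False  # assumption: a user's first request is not a site change
    return site_of(previous_url) != site_of(current_url)


changed = site_changed(
    "http://www.aaa.co.jp/bbb.htm", "http://www.bbb.ne.jp/index.htm"
)
```

Only when `site_changed` returns true would the extracted information be forwarded for site selection; page selection runs on every request regardless.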
[0080]
FIG. 12 shows, in the form of a flowchart, a processing procedure executed in the cache management unit 1004.
[0081]
The cache management unit 1004 waits for a command from the request processing unit 1001 or the prefetch unit 1013 (step S30). When it is asked whether a file corresponding to the requested URL exists on the cache disk 1006 (step S31), it checks the cache management table 1005 as shown in FIG. 6 to determine whether the file exists on the cache disk 1006 (step S32).
[0082]
If the cache management table 1005 shows that the file corresponding to the requested URL exists on the cache disk 1006, a message indicating that there is a file corresponding to the URL is returned (step S33). If the corresponding file does not exist on the cache disk 1006, a message indicating that there is no file corresponding to the URL is returned (step S34).
[0083]
If the command is a read request (step S35), the cache management table 1005 as shown in FIG. 6 is checked to determine whether a file corresponding to the requested URL exists on the cache disk 1006 (step S36).
[0084]
If the file corresponding to the URL exists on the cache disk 1006, the last access date and time of the cache management table 1005 as shown in FIG. 6 are updated to the current date and time (step S37). The file corresponding to the URL is returned (step S38). If the file corresponding to the URL does not exist on the cache disk 1006, a message indicating that there is no file corresponding to the URL is returned (step S34).
[0085]
If the command is a write request (step S39), it is first checked whether the sum of the current capacity recorded in the cache management table 1005 as shown in FIG. 6 and the size of the file for the requested URL exceeds the allocated capacity specified in the table 1005 (step S40).
[0086]
If the allocated capacity is not exceeded, a file corresponding to the URL is written in the cache area (step S42), and the cache management table 1005 is updated (step S43).
[0087]
On the other hand, if the sum of the current cache size and the size of the file for the requested URL exceeds the allocated area, files are erased starting from the one with the oldest last access date and time, until an area equal to the size of the file to be written has been freed, to secure a cache area (step S41). The file corresponding to the URL is then written in the cache area (step S42), and the cache management table 1005 is updated (step S43).
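The existence check, read, and write commands of steps S31 to S43 can be sketched as follows. This is a minimal in-memory illustration, not the implementation of the cache management unit 1004: the class `CacheManager`, its field names, and the use of byte lengths as file sizes are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class CacheManager:
    capacity: int                                    # allocated capacity in bytes (assumed unit)
    files: dict = field(default_factory=dict)        # url -> bytes, stands in for the cache disk
    last_access: dict = field(default_factory=dict)  # url -> last access time, stands in for the table

    def exists(self, url):
        """Existence inquiry (steps S31 to S34)."""
        return url in self.files

    def read(self, url, now):
        """Read request (steps S35 to S38); returns None when the file is absent (step S34)."""
        if url not in self.files:
            return None
        self.last_access[url] = now                  # step S37: update last access date and time
        return self.files[url]                       # step S38

    def write(self, url, data, now):
        """Write request (steps S39 to S43) with oldest-first eviction."""
        # If the URL is already cached, drop the old copy so its size is not counted twice.
        self.files.pop(url, None)
        self.last_access.pop(url, None)
        used = sum(len(d) for d in self.files.values())
        # Step S40/S41: while the new file does not fit, erase the least recently accessed file.
        # A file larger than the whole capacity empties the cache and is still written.
        while self.files and used + len(data) > self.capacity:
            oldest = min(self.last_access, key=self.last_access.get)
            used -= len(self.files.pop(oldest))
            del self.last_access[oldest]
        self.files[url] = data                       # step S42
        self.last_access[url] = now                  # step S43
```

Because a read refreshes the last access time, a file that is read often survives eviction longer than one that was merely written earlier.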
[0088]
FIG. 13 is a flowchart illustrating a processing procedure performed by the prefetch target page selection unit 1007.
[0089]
The prefetch target page selection unit 1007 waits for a command (step S50). When it receives an observation value from the log extraction unit 1003 and is instructed to perform prefetch target selection processing (step S51), it sets the observation value in a Bayesian network model such as the one shown in FIG. 15 (step S52).
[0090]
For a general description of the Bayesian network model, see, for example, the paper by Yoichi Motomura, "Information Representation for Uncertainty Modeling: Bayesian Net" (BN2001 Bayesian Net Tutorial Lecture Book, pp. 5-13 (2001), The Japanese Society for Artificial Intelligence, Artificial Intelligence Fundamental Research Group).
[0091]
After the observation value setting process is completed, the conditional probability is learned (step S53). Here, the conditional probability is learned by the EM algorithm using the number of observations recorded during the observation value setting process. For more information on the EM algorithm, see, for example, Michiko Watanabe and Kazunori Yamaguchi (eds.), "EM Algorithm and Problems of Incomplete Data" (Taga Publishing).
[0092]
Following the learning of the conditional probability, the posterior probability is calculated (step S54). The posterior probability is calculated only for nodes on which the do-not-calculate flag was not set during the observation value setting process. For the calculation of posterior probabilities, see, for example, the paper by Yoichi Motomura, "Information Representation for Uncertainty Modeling: Bayesian Net" (BN2001 Bayesian Net Tutorial Lecture Book, pp. 5-13 (2001), The Japanese Society for Artificial Intelligence, Artificial Intelligence Fundamental Research Group).
[0093]
After the posterior probabilities are calculated, the probability value of each URL is acquired from the “next URL node” 4002 shown in FIG. 14 or FIG. 15, and the URLs are arranged in order of probability value (described later). Among these, a URL whose probability exceeds a certain threshold is set as a prefetch target page, and the URL and its probability value are provisionally recorded in the prefetch target page list. The threshold used here is, for example, the average value (1 / the number of all URLs in the “next URL node”) or α times that average value.
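The threshold rule can be illustrated with a small sketch. The function name `select_prefetch_targets`, the example URLs and probabilities, and the choice of descending order are assumptions for illustration; the specification states only that URLs above the threshold become prefetch targets.

```python
def select_prefetch_targets(next_url_probs, alpha=1.0):
    """Return (url, probability) pairs whose probability exceeds alpha * (1 / number of URLs)."""
    if not next_url_probs:
        return []
    threshold = alpha / len(next_url_probs)   # alpha times the average probability
    chosen = [(url, p) for url, p in next_url_probs.items() if p > threshold]
    # Assumed ordering: highest probability first.
    return sorted(chosen, key=lambda item: item[1], reverse=True)

# Hypothetical posterior probabilities at the "next URL" node.
probs = {
    "/sports/soccer.htm": 0.5,
    "/sports/baseball.htm": 0.3,
    "/news.htm": 0.15,
    "/weather.htm": 0.05,
}
targets = select_prefetch_targets(probs)   # threshold is 1/4 = 0.25
```

With four candidate URLs the average value is 0.25, so only the first two URLs are selected; raising α narrows the list further.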
[0094]
Next, a prefetch time limit (seconds) is obtained from the “access interval (second) node” for each URL in the provisionally created prefetch target page list. To do so, each URL in the prefetch target page list is treated as if it had been observed: a provisional observation value is set in the “next URL node” (the entry corresponding to the provisional observation value is set to 1 and all others to 0), and the posterior probability of the “access interval (second) node” is calculated. Among the calculated posterior probabilities, the access interval with the maximum probability value is taken.
[0095]
After obtaining the prefetch time limit (seconds) for each URL in the prefetch target page list, the prefetch time limit (seconds) is recorded in the prefetch target page list to complete the prefetch target page list (step S55).
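As a sketch of this completion step, the following assumes that the posterior distribution of the access interval for each provisionally observed URL has already been computed and is available as a dictionary. The function `complete_prefetch_list` and the field names are hypothetical.

```python
def complete_prefetch_list(targets, interval_posteriors):
    """Attach to each target URL the access interval (seconds) with the highest posterior probability.

    targets: list of (url, probability) pairs from the provisional list.
    interval_posteriors: url -> {interval_seconds: posterior probability} (assumed precomputed).
    """
    completed = []
    for url, probability in targets:
        posterior = interval_posteriors[url]
        time_limit = max(posterior, key=posterior.get)   # most probable access interval
        completed.append({"url": url, "probability": probability, "time_limit_s": time_limit})
    return completed
```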
[0096]
After creating the prefetch target page list, the prefetch scheduler unit 1011 is notified that the list has been created (step S56).
[0097]
Here, a Bayesian network model and a user preference type prefetch algorithm using the same will be described below with reference to the examples shown in FIGS. 14 and 15.
[0098]
FIGS. 14 and 15 both show Bayesian network models that exploit the fact that a user's preference appears in the relationship between the URLs the user accesses in time series (hereinafter referred to as the “access series”). Two elements of the access series are used: the URL accessed last time and the URL accessed this time. For example, it is possible to measure stochastically the tendency of a certain user to access a sports-related site A every time and then to access a page of genre B within the site A.
[0099]
The difference between the Bayesian network model shown in FIG. 14 and that shown in FIG. 15 is whether the selection of prefetch target pages takes into account the user's preference according to season and time of day, that is, seasonal and temporal changes in the request contents. For example, it is possible to measure stochastically that a certain user tends to view pages related to high school baseball on the sports newspaper site A in summer, but tends to view pages related to soccer in other seasons. Therefore, when seasonality or time-of-day variation appears in the tendency of user access, the model of FIG. 15 is used.
[0100]
The prefetch target page selection unit 1007 substitutes the observed value of the URL requested by the user into the Bayesian network model shown in FIG. 14 or FIG. 15 and performs the conditional probability learning operation to calculate the values of the conditional probabilities. It then obtains the posterior probabilities at the individual nodes in the model, and finally determines the URL to be accessed after the URL requested by the user by referring to the probability values of the “next URL” node 4002 in the model shown in FIG. 14 or FIG. 15.
[0101]
Here, a conditional link between nodes is written X_i → X_j, where X_i is called the parent node and X_j the child node. In FIGS. 14 and 15, the arrows between nodes represent these parent-child relationships. If the set of all parent nodes of a child node X_j is π(X_j) = {X_1, ..., X_i}, the conditional probability P is defined by the following equation.
[0102]
(Equation 1)
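Equation 1 is not reproduced in this text. Under the definitions above, it would take the following form, given here as a reconstruction rather than the original: the conditional probability of the child node X_j given its parent set π(X_j). The second line adds the standard factorization of the joint distribution of a Bayesian network with n nodes, which is a general property of such models and is not taken from the specification.

```latex
P\bigl(X_j \mid \pi(X_j)\bigr) = P\bigl(X_j \mid X_1, \ldots, X_i\bigr)
\qquad
P(X_1, \ldots, X_n) = \prod_{j=1}^{n} P\bigl(X_j \mid \pi(X_j)\bigr)
```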

[0103]
If there are a plurality of state variables at each node, the conditional probability is represented by a conditional probability table. For example, the conditional probability table at the “next URL” node 4002 in FIG. 14 is as shown in FIG. 16.
[0104]
Learning here means obtaining the value of the conditional probability from the number of times the observed values corresponding to X_i → X_j were obtained. The number of observations of X_i → X_j is stored in the conditional observation count table. For example, for the “next URL” node 4002 in FIG. 14, the conditional observation count table is as shown in FIG. 17. Since learning is performed each time an observation value is obtained, the value of the conditional probability is recalculated each time and changes sequentially.
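The relationship between the conditional observation count table and the conditional probability table can be sketched as follows. The sketch assumes that every variable is observed, in which case the maximum-likelihood estimate is simply the relative frequency of each child state under a given parent state; it does not implement the EM algorithm named in the text, which is needed when data are incomplete. The class `ConditionalCountTable` and the example values are hypothetical.

```python
from collections import defaultdict

class ConditionalCountTable:
    """Counts of child states per parent state, from which conditional probabilities are derived."""

    def __init__(self):
        # parent_state -> child_state -> number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, parent_state, child_state):
        """Record one observation of parent_state -> child_state."""
        self.counts[parent_state][child_state] += 1

    def probabilities(self, parent_state):
        """Return P(child | parent_state) as relative frequencies, or {} if never observed."""
        row = self.counts.get(parent_state, {})
        total = sum(row.values())
        return {child: n / total for child, n in row.items()} if total else {}

table = ConditionalCountTable()
parent = ("192.0.2.1", "/index.htm")      # (client IP address, current URL), example values
for next_url in ["/soccer.htm", "/soccer.htm", "/soccer.htm", "/baseball.htm"]:
    table.observe(parent, next_url)
```

Each new observation changes the counts, so the derived probabilities shift with the user's recent behavior, which is the sequential updating described above.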
[0105]
The posterior probability is the probability of each value that a random variable of interest may actually take, given the definite values of the observed variables (evidence: e). Let e+ denote the evidence given to the node whose posterior probability is to be calculated from the parent-node side, and e- the evidence given to that node from the child-node side. Then, from Bayes' theorem, the posterior probability of node X_j is determined, for example, by the following equation.
[0106]
(Equation 2)
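Equation 2 is likewise not reproduced in this text. A form consistent with the description above is the standard belief-propagation expression for a singly connected network, given here as a reconstruction under that assumption; α is a normalizing constant chosen so that the probabilities over the states of X_j sum to 1.

```latex
P(X_j \mid e) = \alpha \, P(e^{-} \mid X_j) \, P(X_j \mid e^{+}),
\qquad e = e^{+} \cup e^{-}
```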

[0107]
For a general description of the Bayesian network model itself, the model representation method, and the mathematical calculation method, see, for example, the paper by Yoichi Motomura, "Information Representation for Uncertainty Modeling: Bayesian Net" (BN2001 Bayesian Net Tutorial Lecture Book, pp. 5-13 (2001), The Japanese Society for Artificial Intelligence, Artificial Intelligence Fundamental Research Group).
[0108]
Next, an operation of selecting a prefetch target page that reflects the user's preference using the Bayesian network model will be described. First, setting of observation values will be described below with reference to FIG. 14 as an example.
[0109]
When the observation value is set in the “client IP address node” 4000, the probability value of the observed client IP address is set to 1 and the others to 0, and a flag is set so that the posterior probability of the “client IP address node” 4000 is not calculated in the subsequent posterior probability calculation. If the observed client IP address does not yet exist in the “client IP address node” 4000, it is added to the node. At the time of addition, the probability value is set to 1 for the added value and 0 for the others, and a flag is set so that the posterior probability is not calculated subsequently.
[0110]
When an observation value is set in the “next URL node” 4002, the number of times the observed URL has been observed is recorded in the “next URL node” 4002 so that it can be used later in the calculation of the conditional probability (EM algorithm). If the observed URL does not yet exist in the “next URL node” 4002, it is added to the node, with its number of observations set to one.
[0111]
When the observation value is set in the “current URL node” 4001, the probability value of the observed URL is set to 1 and the others to 0, and a flag is set so that the posterior probability of the “current URL node” 4001 is not calculated in the subsequent posterior probability calculation. If the observed URL does not yet exist in the “current URL node” 4001, it is added to the node. At the time of addition, the probability value is set to 1 for the added value and 0 for the others, and a flag is set so that the posterior probability is not calculated.
[0112]
When an observation value is set in the “access interval (second) node” 4003, the number of times the observed access interval (seconds) has been observed is recorded in the node so that it can be used later in the calculation of the conditional probability (EM algorithm). The access interval (seconds) is calculated from the difference between the date and time of the previous observation value and the date and time of the current observation value, and is assigned to the corresponding range of the “access interval (second) node” 4003. For example, if the difference is 38 seconds, it is treated as 30 seconds.
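The mapping of an elapsed time onto an access interval range can be sketched as follows. The specification gives only the example of 38 seconds becoming 30 seconds, so the 10-second bin width used here is an assumption, as are the names `BIN_WIDTH_S` and `access_interval_bin`.

```python
from datetime import datetime

BIN_WIDTH_S = 10  # assumed bin width; the text gives only the 38 -> 30 example

def access_interval_bin(previous: datetime, current: datetime, width: int = BIN_WIDTH_S) -> int:
    """Round the elapsed seconds between two observations down to the start of its bin."""
    elapsed = int((current - previous).total_seconds())
    return (elapsed // width) * width
```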
[0113]
In the case of the Bayesian network model shown in FIG. 15, the observation value of the “seasonal node” 4101 is set from the date of the observation: for example, March to May is spring, June to August is summer, September to November is fall, and December to February is winter. The corresponding part is set to 1 and the other parts to 0, and a flag is set so that the posterior probability of the “seasonal node” 4101 is not calculated. Similarly, for the “time node” 4102, for example, 6:00 to 9:00 is early morning, 9:00 to 12:00 is the forenoon, 12:00 to 13:00 is midday, 13:00 to 16:00 is the afternoon, 16:00 to 18:00 is the evening, 18:00 to 24:00 is night, and 24:00 to 6:00 is late night. The corresponding part is set to 1 and the others to 0 to set the observation value, and a flag is set so that the posterior probability of the “time node” 4102 is not calculated.
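The season and time-of-day observation values can be sketched as one-hot assignments. The month and hour boundaries follow the example ranges in the text; the label strings and the helper names `season_of`, `time_slot_of`, and `one_hot` are hypothetical.

```python
SEASONS = ["spring", "summer", "fall", "winter"]

def season_of(month: int) -> str:
    """Map a month (1-12) to a season using the example ranges in the text."""
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "fall"
    return "winter"          # December to February

def time_slot_of(hour: int) -> str:
    """Map an hour (0-23) to a time-of-day slot; the label names are assumptions."""
    slots = [(6, 9, "early_morning"), (9, 12, "forenoon"), (12, 13, "midday"),
             (13, 16, "afternoon"), (16, 18, "evening"), (18, 24, "night")]
    for start, end, label in slots:
        if start <= hour < end:
            return label
    return "late_night"      # 24:00 (0:00) to 6:00

def one_hot(states, observed):
    """Set the observed state to 1 and every other state to 0."""
    return {state: 1 if state == observed else 0 for state in states}
```

For a request dated in April, `one_hot(SEASONS, season_of(4))` sets spring to 1 and the other three seasons to 0.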
[0114]
After the observation value setting process is completed, the conditional probability is learned (described above). Here, the conditional probability is learned by the EM algorithm using the number of observations recorded during the observation value setting process.
[0115]
After the learning of the conditional probabilities, the posterior probabilities are calculated (described above). The posterior probability is calculated only for nodes on which the do-not-calculate flag was not set during the observation value setting process.
[0116]
After the posterior probabilities are calculated, the probability values of the respective URLs are obtained from the “next URL node” 4002 in FIG. 14 or FIG. 15, and the URLs are arranged in order of probability value. Among them, a URL whose probability exceeds a certain threshold is set as a prefetch target page, and its URL and probability value are provisionally recorded in the prefetch target page list 1008. The threshold used here is, for example, the average value (1 / the number of all URLs in the “next URL node”) or α times that average value.
[0117]
Next, for each URL in the provisionally created prefetch target page list 1008, the prefetch time limit (seconds) is obtained from the “access interval (second) node”. To do so, each URL in the prefetch target page list 1008 is treated as if it had been observed: a provisional observation value is set in the “next URL node” (the entry corresponding to the provisional observation value is set to 1 and all others to 0), and the posterior probability of the “access interval (second) node” is calculated. Among the calculated posterior probabilities, the access interval with the maximum probability value is taken. Once the prefetch time limit (seconds) has been obtained for each URL in the prefetch target page list 1008, it is recorded in the list to complete the prefetch target page list 1008 (described above).
[0118]
FIG. 18 shows, in the form of a flowchart, a processing procedure executed by the prefetch target site selection unit 1009. The operation of the prefetch target site selection unit 1009 is itself the same as that of the prefetch target page selection unit 1007; only the observation values given to it differ.
[0119]
The prefetch target site selection unit 1009 waits for a command (step S60). When it receives an observation value from the log extraction unit 1003 and is instructed to perform selection processing (step S61), it sets the observation value in the Bayesian network model (step S62). The method of setting the observation value is the same as that of the prefetch target page selection unit 1007.
[0120]
When the setting of the observation value is completed, learning of the conditional probability is performed (step S63), and then the posterior probability is calculated (step S64), and a prefetch target site list 1010 is created (step S65). The method of calculating the conditional probability, the calculation of the posterior probability, and the method of creating the prefetch target site list 1010 are the same as those of the prefetch target page selection unit 1007.
[0121]
After creating the prefetch target site list 1010, the prefetch scheduler unit 1011 is notified that the list has been created (step S66).
[0122]
In the prefetch target page list 1008 described above, there is a correlation between the pages accessed by the user in time series, and the degree of the correlation (probability value) reflects the user's preference between the pages. The higher the degree of correlation (probability value), the higher the user's preference, so pages with a high estimated preference are written in the list 1008.
[0123]
Similarly, in the prefetch target site list 1010, there is a correlation between the sites accessed by the user in time series, and the degree of the correlation (probability value) reflects the user's preference between the sites. The higher the degree of correlation (probability value), the higher the user's preference, so sites with a high estimated preference are written in the list 1010.
[0124]
Further, the conditional observation count table is updated each time a request (observation value) from the user is obtained, so the conditional probability table is also updated. When the requests from the user change in accordance with a change in the user's preference, the conditional observation count table and the conditional probability table change accordingly, and so does the posterior probability. Therefore, the pages and sites written in the prefetch target page list 1008 or the prefetch target site list 1010 change according to the user's preference. Based on the degree of the correlation (probability value), the pages and sites expected to be accessed after the user's request are prefetched one after another and stored on the cache disk 1006, so that caching follows the user's changing preference.
[0125]
When such a cache operation is performed, the cache management table 1005 manages the cache area without distinguishing between users. Even if a file accessed by a certain user has disappeared from the cache disk 1006, pages or sites with a high preference that are likely to be accessed next are obtained from the Web server, based on the requests estimated for each user by the prefetch target page selection unit 1007 or the prefetch target site selection unit 1009, and are stored on the cache disk 1006. Therefore, files with a strong preference (high probability value) that each user may access are always present on the cache disk 1006.
[0126]
FIG. 19 shows, in the form of a flowchart, a processing procedure executed in the prefetch scheduler unit 1011.
[0127]
The prefetch scheduler unit 1011 waits for a list creation notification from the prefetch target page selection unit 1007 and the prefetch target site selection unit 1009 (step S70). When a list creation completion notification arrives (step S71), the URL, the prefetch time limit (seconds), the probability value, and the current time as the prefetch start time are additionally registered in the existing prefetch schedule table 1012 (step S72). After registration in the prefetch schedule table 1012, the prefetch unit 1013 is notified (step S73).
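The registration of step S72 can be sketched as follows. The record type `ScheduleEntry`, its field names, and the function `register` are hypothetical; the size and bandwidth fields anticipate values that the text says are recorded in the schedule table later (step S88).

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScheduleEntry:
    url: str
    time_limit_s: int                      # prefetch time limit (seconds)
    probability: float
    start_time: float                      # prefetch start time = time of registration
    size_bytes: Optional[int] = None       # filled in later (step S88)
    bandwidth_bps: Optional[float] = None  # filled in later (step S88)
    done: bool = False

def register(schedule_table, target_list, now=None):
    """Append every entry of a completed target list to the existing schedule table (step S72)."""
    now = time.time() if now is None else now
    for item in target_list:
        schedule_table.append(
            ScheduleEntry(item["url"], item["time_limit_s"], item["probability"], now)
        )
    return schedule_table
```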
[0128]
FIG. 20 is a flowchart illustrating a processing procedure executed in the prefetch unit 1013.
[0129]
The prefetch unit 1013 waits for a notification from the prefetch scheduler unit 1011 (step S80). When the notification arrives (step S81), the prefetch schedule table 1012 is searched for entries whose prefetch start time is older than the current time and whose prefetch has not been completed; if such an entry is found, its prefetch is stopped (step S82). Next, the entries whose prefetch start time is older than the current time are deleted from the prefetch schedule table 1012 (step S83).
[0130]
If no notification has been received from the prefetch scheduler unit 1011, then for each URL in the schedule table that has not yet been prefetched, the cache management unit 1004 is asked whether the URL exists on the cache disk 1006 (step S84).
[0131]
If the corresponding URL exists on the cache disk 1006 (step S85), the Web server holding that URL is asked whether the URL has been updated (step S86). If the URL has not been updated (step S87), the process returns to waiting for a registration notification (step S80).
[0132]
When the corresponding URL is not in the cache (step S85), or when the corresponding URL has been updated (step S87), the size of the target URL in the list is first checked by querying the Web server holding the target URL using the HTTP protocol.
[0133]
When checking the size, the approximate bandwidth is obtained from the response of the Web server having the target URL, and the size and the bandwidth are recorded in the prefetch schedule table 1012 (step S88). Since the time required for prefetching is obtained from the size and the bandwidth, it is compared with the prefetch time limit.
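The comparison between the estimated transfer time and the prefetch time limit can be sketched as follows. The function name `fits_time_limit` and the units (bytes and bytes per second) are assumptions; the specification states only that the required time is derived from the size and the bandwidth.

```python
def fits_time_limit(size_bytes: int, bandwidth_bytes_per_s: float, time_limit_s: float) -> bool:
    """Return True when the estimated transfer time is within the prefetch time limit."""
    if bandwidth_bytes_per_s <= 0:
        return False                      # no usable bandwidth estimate: treat as not fitting
    required_s = size_bytes / bandwidth_bytes_per_s
    return required_s <= time_limit_s
```

For example, a 300,000-byte page at an estimated 20,000 bytes per second needs 15 seconds, which fits a 30-second limit, whereas a 900,000-byte page at the same rate needs 45 seconds and does not.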
[0134]
If the required time exceeds the prefetch time limit, the prefetches currently in progress are temporarily stopped in ascending order of probability value, and the response is measured again to determine whether the prefetch now fits within the prefetch time limit. If it still does not fit within the prefetch time limit, the URL is set not to be prefetched.
[0135]
Next, the prefetches with low probability values that were temporarily stopped are resumed (step S89).
[0136]
Next, for the URLs in the schedule table that have not been prefetched, a set of files such as pages corresponding to the URLs and images in the pages is acquired in the order of the probability values in the schedule table (step S90). In the process of acquiring files, the cache management unit 1004 is requested to write the acquired files in the order of acquisition (step S91).
[0137]
FIGS. 21 and 22 show an operation sequence for acquiring a page on the Web server from the user's terminal through the intervention of the cache system 1000 according to the present embodiment.
[0138]
FIG. 21 illustrates an operation sequence when the requested page exists on the cache system 1000.
[0139]
In this case, first, the WWW cache system receives a page acquisition request from the terminal.
[0140]
Next, a search is made as to whether the page requested by the terminal exists on the cache in the WWW cache system. If it exists in the cache, the page requested by the terminal is returned to the terminal.
[0141]
Next, the WWW cache server predicts a page that is likely to be accessed next to the page requested from the terminal according to the above-described procedure, and requests the WWW server for this. The WWW server returns the page requested from the WWW cache system to the WWW cache system.
[0142]
When receiving the requested page from the WWW server, the WWW cache system writes the requested page on a cache in the WWW cache system.
[0143]
FIG. 22 illustrates an operation sequence when the requested page does not exist on the cache system 1000.
[0144]
In this case, first, the WWW cache system receives a page acquisition request from the terminal. Next, it is searched whether or not the page requested by the terminal exists on the cache in the WWW cache system.
[0145]
If the page does not exist in the cache, the page requested from the terminal is requested to the WWW server. The WWW server returns the page requested from the WWW cache system to the WWW cache system.
[0146]
Upon receiving the requested page from the WWW server, the WWW cache system transmits the page received from the WWW server to the terminal and writes the requested page on a cache area in the WWW cache system.
[0147]
Next, the WWW cache system predicts a page which is likely to be accessed next to the page requested by the terminal according to the above-described procedure, and requests the WWW server for this.
[0148]
The WWW server returns the page requested from the WWW cache system to the WWW cache system. When receiving the requested page from the WWW server, the WWW cache system writes the requested page on a cache in the WWW cache system.
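The two sequences of FIGS. 21 and 22 can be summarized in one sketch. It is a simplified, sequential illustration: the cache is a plain dictionary, and `fetch` and `predict_next` are hypothetical callables standing in for the WWW server access and the prediction procedure described above. In the described system the response is sent to the terminal before prefetching begins, whereas this sketch only returns it at the end.

```python
def handle_request(url, cache, fetch, predict_next):
    """Serve one request through the cache, then prefetch the predicted next pages."""
    page = cache.get(url)
    if page is None:                      # FIG. 22: the requested page is not in the cache
        page = fetch(url)                 # obtain it from the WWW server
        cache[url] = page                 # and write it to the cache
    response = page                       # FIG. 21 / FIG. 22: page to return to the terminal

    for candidate in predict_next(url):   # pages likely to be accessed next
        if candidate not in cache:
            cache[candidate] = fetch(candidate)   # prefetch and write to the cache
    return response
```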
[0149]
Finally, a hardware environment for operating the cache system according to the present embodiment will be described with reference to FIG.
[0150]
The CPU 5000 is configured using, for example, a “Pentium (registered trademark)” processor chip manufactured by Intel Corporation of the United States, and has the function of comprehensively controlling the overall operation of the cache server device and performing various arithmetic processes. Various application programs, including the cache server application according to the present embodiment, operate on the CPU 5000 in an execution environment provided by an operating system (OS).
[0151]
The cache memory 5001 is a small-capacity, high-speed memory constituted by, for example, an SRAM (Static RAM). It temporarily stores information such as commands and data frequently accessed by the CPU 5000 and exchanges information directly with the CPU 5000, thereby speeding up the system.
[0152]
A system controller 5002 implements an interface protocol between a host bus 5008, which is directly connected to the local pins of the CPU 5000, and a PCI (Peripheral Component Interconnect) bus 5009 serving as a peripheral bus, and coordinates the timing of the entire system among the CPU 5000, the main memory 5003, the cache memory 5001, and various other resources (such as a hard disk and a flexible disk). The system controller 5002 is configured using, for example, TRITON (430FX) manufactured by Intel Corporation of the United States.
[0153]
The main memory 5003 is a semiconductor memory device configured using, for example, a DRAM (Dynamic RAM), and is used to load program code executed by the CPU 5000 and as a storage area for the work data of running programs. For example, the cache server application according to the present invention is loaded into the main memory 5003 and executed by the CPU 5000. Further, the access log 1002, the cache management table 1005, the prefetch target page list 1008, the prefetch target site list 1010, the prefetch schedule table 1012, and the like are temporarily stored in the main memory 5003.
[0154]
The operation of writing / reading information to / from the main memory 5003 is performed according to an instruction from the CPU 5000 or the system controller 5002. The main memory 5003 is connected to the CPU and various resources through the system controller 5002, and stores information according to these requests.
[0155]
The host bus 5008 is a means for transmitting information directly connected to the CPU 5000, and can exchange information with the cache memory 5001, the system controller 5002, and the like.
[0156]
The PCI bus 5009 is a means for transmitting information separated from the host bus 5008, and is interconnected by the system controller 5002. Then, the CPU 5000 can access various hardware resources connected on the PCI bus 5009 via the system controller 5002.
[0157]
The HD controller 5004 is connected to a hard disk drive (HDD) 5005 and the PCI bus 5009, and controls the writing and reading of information to and from a specific area of the disk 5005 in response to disk access requests via the PCI bus 5009. Programs executed by the CPU 5000 are installed on the hard disk, and computer files such as programs and data are stored there. For example, the cache server application according to the present invention is installed on the hard disk, and the access log 1002, the cache management table 1005, the prefetch target page list 1008, the prefetch target site list 1010, the prefetch schedule table 1012, and the like are stored on the hard disk.
[0158]
The FD controller 5006 is connected to a flexible disk drive 5007, into which a flexible disk is removably loaded, and to the PCI bus 5009, and controls the writing and reading of information to and from a specific area of the flexible disk in response to disk access requests via the PCI bus 5009.
[0159]
Alternatively, in the system, a media drive that loads a portable medium other than a flexible disk and performs an access operation may be connected to the PCI bus 5009. This type of portable media is used for transferring programs and data between systems. For example, a cache server application according to the present invention, an access log 1002 used in a cache server operation, a cache management table 1005, a prefetch target page list 1008, a prefetch target site list 1010, a prefetch schedule table 1012 and the like can be moved between multiple systems via portable media.
[0160]
A mouse controller 5010 connects a mouse 5011 (and a keyboard, not shown) serving as user input devices to the PCI bus 5009, and conveys mouse movements made by the operator and other user input operations to the CPU 5000 according to a predetermined sequence. On the CPU 5000 side, for example, a GUI (Graphical User Interface) environment is provided, in which this input serves as the basis for generating the information that moves the mouse pointer (arrow pictogram) displayed together with the image on the CRT display 5013. The CRT display 5013 is connected to the CRTC 5012 and displays, as images, the state created by the CPU and other information.
[0161]
A CRTC (CRT controller) 5012 is connected to the PCI bus 5009 and draws drawing information such as a figure on the CRT display 5013 based on an instruction from the CPU 5000 or the like.
[0162]
The network interface 5014 connects the system to an external network such as a LAN or the Internet. The system operates on the network as the cache server described above: it receives requests from terminals via the network interface 5014 and performs operations such as distributing files, obtaining files from WWW servers, analyzing the access sequence for each user, and prefetching files.
[0163]
On the network, programs (data) are transferred (downloaded) between systems. For example, a cache server application according to the present invention, an access log 1002 used in a cache server operation, a cache management table 1005, a prefetch target page list 1008, a prefetch target site list 1010, a prefetch schedule table 1012 and the like can be transferred between systems via a network.
[0164]
[Supplement]
The present invention has been described in detail with reference to the specific embodiments. However, it is obvious that those skilled in the art can modify or substitute the embodiment without departing from the scope of the present invention. That is, the present invention has been disclosed by way of example, and the contents described in this specification should not be interpreted in a limited manner. In order to determine the gist of the present invention, the claims described at the beginning should be considered.
[0165]
[Effects of the Invention]
As described above in detail, according to the present invention, it is possible to provide an excellent information providing system, information providing method, and computer program that can efficiently provide information in response to a request of a specific user from among the information provided to an unspecified number of users in the WWW information space.
[0166]
Further, according to the present invention, it is possible to provide an excellent information providing system, information providing method, and computer program that can provide information with a relatively short response time to a request of a specific user by caching WWW information.
[0167]
Further, according to the present invention, it is possible to provide an excellent information providing system, information providing method, and computer program that can provide information with a relatively short response time to a user's request by prefetching the information the user is expected to request.
[0168]
Further, according to the present invention, it is possible to provide an excellent information providing system, information providing method, and computer program that can provide information with a relatively short response time to a user's request by prefetching the information the user is expected to request in accordance with the user's preference, which changes from moment to moment.
[0169]
According to the present invention, by performing prefetching for each user using the user's request log, WWW information requested by a specific user, rather than by an unspecified number of users, can be cached efficiently. In addition, by using a Bayesian network model, which is one of the probabilistic network models, for the prefetch mechanism, even if the access pattern changes as the user's preference changes from moment to moment, the prefetch targets can be determined stochastically in accordance with that change, limiting the prefetch targets and responding dynamically to changes in the user.
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a configuration example of a WWW information providing system to which the present invention is applied.
FIG. 2 is a diagram schematically illustrating another configuration example of a WWW information providing system to which the present invention is applied.
FIG. 3 is a diagram schematically showing still another configuration example of the WWW information providing system to which the present invention is applied.
FIG. 4 is a diagram schematically showing a functional configuration of a cache system 1000 according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of a data structure of a log recorded in an access log 1002.
FIG. 6 is a diagram showing an example of a data structure of a cache management table 1005.
FIG. 7 is a diagram showing an example of a data structure of a prefetch target page list 1008.
FIG. 8 is a diagram showing an example of a data structure of a prefetch target site list 1010.
FIG. 9 is a diagram schematically showing a data structure of a prefetch schedule table 1012.
FIG. 10 is a flowchart showing a processing procedure executed in the request processing unit 1001.
FIG. 11 is a flowchart showing a processing procedure executed in the log extracting unit 1003.
FIG. 12 is a flowchart showing a processing procedure executed in the cache management unit 1004.
FIG. 13 is a flowchart showing a processing procedure executed in a prefetch target page selection unit 1007.
FIG. 14 is a diagram illustrating a configuration example of a Bayesian network model.
FIG. 15 is a diagram illustrating a configuration example of a Bayesian network model.
FIG. 16 is a diagram showing a conditional probability table.
FIG. 17 is a diagram showing a conditional probability observation count table.
FIG. 18 is a flowchart showing a processing procedure executed in a prefetch target site selection unit 1009.
FIG. 19 is a flowchart showing a processing procedure executed in a prefetch scheduler unit 1011.
FIG. 20 is a flowchart showing a processing procedure executed in a prefetch unit 1013.
FIG. 21 is a diagram showing an operation sequence in which a user terminal obtains a page on a Web server via the cache system 1000 (in the case where the requested page exists on the cache system 1000).
FIG. 22 is a diagram showing an operation sequence in which a user terminal obtains a page on a Web server via the cache system 1000 (in the case where the requested page does not exist on the cache system 1000).
FIG. 23 is a diagram showing an example of a hardware configuration for operating the cache system according to the present invention.
FIG. 24 is a diagram showing an example of a data structure of a cache management table 1005 when terminals of all users are not distinguished.
[Explanation of symbols]
1000 ... Cache system
1001 ... Request processing unit
1002 ... Access log
1003 ... Log extraction unit
1004 ... Cache management unit
1005 ... Cache management table
1006 ... Cache disk
1007 ... Prefetch target page selection unit
1008 ... Prefetch target page list
1009 ... Prefetch target site selection unit
1010 ... Prefetch target site list
1011 ... Prefetch scheduler unit
1012 ... Prefetch schedule table
1013 ... Prefetch unit
5000 ... CPU, 5001 ... Cache memory
5002 ... System controller, 5003 ... Main memory
5004 ... HD controller, 5005 ... HDD
5006 ... FD controller, 5007 ... FD
5008 ... Host bus, 5009 ... PCI bus
5010 ... Mouse controller, 5011 ... Mouse
5012 ... CRT display, 5013 ... CRT controller
5014 ... Network interface

Claims

An information providing system for providing information consisting of pages that reference one another across a plurality of sites, the system comprising:
Log extraction means for extracting, for each user, a request log of information requests;
Prefetch target page selection means for predicting the next page to be accessed based on the access sequence of pages included in the request log, and determining a page to be prefetched for each user;
Prefetch target site selection means for predicting the next site to be accessed based on the access sequence of sites included in the request log, and determining a site to be prefetched for each user; and
Prefetch means for prefetching information based on the page and the site selected by the prefetch target page selection means and the prefetch target site selection means.
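The log extraction element above can be pictured with the following minimal sketch, which groups request records by user and derives both a page access sequence and a site access sequence. The record layout `(user_id, timestamp, url)` is an assumption made for illustration (the actual log structure is the one shown in FIG. 5), and the function name `extract_user_sequences` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlsplit

# Assumed record layout for illustration: (user_id, timestamp, url).
LogRecord = Tuple[str, float, str]


def extract_user_sequences(
    log: Iterable[LogRecord],
) -> Dict[str, Dict[str, List[str]]]:
    """Group request records per user and build access sequences."""
    per_user = defaultdict(list)
    for user_id, timestamp, url in log:
        per_user[user_id].append((timestamp, url))

    result: Dict[str, Dict[str, List[str]]] = {}
    for user_id, entries in per_user.items():
        entries.sort(key=lambda e: e[0])  # order by time of request
        pages = [url for _, url in entries]
        sites: List[str] = []
        for url in pages:
            host = urlsplit(url).netloc
            # Record a site only when the user moves to a different host.
            if not sites or sites[-1] != host:
                sites.append(host)
        result[user_id] = {"pages": pages, "sites": sites}
    return result


log = [
    ("u1", 2.0, "http://a.example/p2"),
    ("u2", 1.0, "http://b.example/x"),
    ("u1", 1.0, "http://a.example/p1"),
    ("u1", 3.0, "http://c.example/q"),
]
sequences = extract_user_sequences(log)
```

Keeping the page sequence and the site sequence separate mirrors the split between the page selection means and the site selection means, each of which predicts from its own sequence.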

The information providing system according to claim 1, wherein the prefetch target page selection means and/or the prefetch target site selection means predicts the page and/or site that the user will access next, using a per-user probabilistic network model that describes the user's access sequence.

The information providing system according to claim 2, wherein the prefetch target page selection means and/or the prefetch target site selection means updates the probabilistic network model based on the result of the user's next access.
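One simple way to picture updating the model from the next access result is to keep observation counts and derive conditional probabilities from them, in the spirit of the conditional probability table and the observation count table of FIGS. 16 and 17. The sketch below is an assumption-laden simplification: it uses a first-order transition model rather than a general Bayesian network, and the class name `TransitionModel` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class TransitionModel:
    """Per-user model: counts of observed accesses prev_page -> next_page."""

    def __init__(self) -> None:
        # counts[prev_page][next_page] = number of times observed
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev_page: str, next_page: str) -> None:
        """Record one observed access result."""
        self.counts[prev_page][next_page] += 1

    def probabilities(self, prev_page: str) -> Dict[str, float]:
        """Estimate P(next_page | prev_page) as relative frequencies."""
        row = self.counts.get(prev_page)
        if not row:
            return {}
        total = sum(row.values())
        return {page: n / total for page, n in row.items()}


model = TransitionModel()
for nxt in ["B", "B", "B", "C"]:
    model.update("A", nxt)
estimate = model.probabilities("A")
```

Because every new access changes the counts, the estimated probabilities drift as the user's behavior drifts, without any separate retraining step.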

The information providing system according to claim 1, wherein the prefetch means performs the prefetch operation on a prefetch target taking into account network load and similar conditions.

The information providing system according to claim 4, wherein the prefetch means reorders the priorities of the pages and/or sites selected for prefetching by the prediction, taking network load into account.
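The claim does not state how the priorities are reordered, so the following is only a sketch under an assumed scoring rule: each candidate's predicted probability is discounted by its estimated transfer size, and the discount grows with the current network load. The names `Candidate`, `reorder_for_load`, `size_kb`, and `load`, and the scoring formula itself, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    url: str
    probability: float  # predicted probability of being accessed next
    size_kb: float      # estimated transfer size (assumed to be known)


def reorder_for_load(candidates: List[Candidate], load: float) -> List[Candidate]:
    """Reorder prefetch candidates for a network load in [0, 1].

    At load 0 the order is by probability alone. As load rises,
    large transfers are penalized more heavily (assumed rule).
    """
    def score(c: Candidate) -> float:
        return c.probability / (1.0 + load * c.size_kb)

    return sorted(candidates, key=score, reverse=True)


cands = [
    Candidate("http://a.example/big", 0.6, 100.0),
    Candidate("http://a.example/small", 0.5, 1.0),
]
idle_order = [c.url for c in reorder_for_load(cands, load=0.0)]
busy_order = [c.url for c in reorder_for_load(cands, load=1.0)]
```

In this sketch the large page is fetched first on an idle network, while on a busy network the small page moves ahead of it even though its predicted probability is lower.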

The information providing system according to claim 1, wherein the prefetch means stores the prefetched information in a storage area assigned to each user.

An information providing method for providing information consisting of pages that reference one another across a plurality of sites, the method comprising:
A log extraction step of extracting, for each user, a request log of information requests;
A prefetch target page selection step of predicting the next page to be accessed based on the access sequence of pages included in the request log, and determining a page to be prefetched for each user;
A prefetch target site selection step of predicting the next site to be accessed based on the access sequence of sites included in the request log, and determining a site to be prefetched for each user; and
A prefetch step of prefetching information based on the page and the site selected in the prefetch target page selection step and the prefetch target site selection step.

The information providing method according to claim 7, wherein, in the prefetch target page selection step and/or the prefetch target site selection step, the page and/or site that the user will access next is predicted using a per-user probabilistic network model that describes the user's access sequence.

The information providing method according to claim 7, wherein, in the prefetch target page selection step and/or the prefetch target site selection step, a probabilistic network model is updated based on the result of the user's next access.

The information providing method according to claim 7, wherein, in the prefetch step, the prefetch operation on a prefetch target is performed taking into account network load and similar conditions.

The information providing method according to claim 7, wherein, in the prefetch step, the priorities of the pages and/or sites selected for prefetching by the prediction are reordered taking network load into account.

The information providing method according to claim 7, wherein, in the prefetch step, the prefetched information is stored in a storage area assigned to each user.

A computer program written in a computer-readable format so as to execute, on a computer system, processing for providing information consisting of pages that reference one another across a plurality of sites, the computer program comprising:
A log extraction step of extracting, for each user, a request log of information requests;
A prefetch target page selection step of predicting the next page to be accessed based on the access sequence of pages included in the request log, and determining a page to be prefetched for each user;
A prefetch target site selection step of predicting the next site to be accessed based on the access sequence of sites included in the request log, and determining a site to be prefetched for each user; and
A prefetch step of prefetching information based on the page and the site selected in the prefetch target page selection step and the prefetch target site selection step.