JP2008181186A - Method for determining the relevance between keywords and sites using query logs

Info

Publication number: JP2008181186A
Application number: JP2007012402A
Authority: JP
Inventors: Sumio Fujita; 澄男藤田
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2007-01-23
Filing date: 2007-01-23
Publication date: 2008-08-07

Abstract

[Problem] To provide a new method for determining the relevance between search keywords and sites using query logs.
[Solution] A server located between user terminals and a search server generates its own search result page when a user performs a keyword search from a user terminal, and displays that page on the user's terminal. The server includes query log storage means for storing query logs for the user's search results. The method includes: a step of accumulating, in the query log storage means, the ID of the session in which the user performed the search, the URL of the Web page the user clicked from the search result page, the date and time at which the link to that Web page was clicked, and the keyword; a step of counting, using the query log storage means, the number of clicks on URLs for each keyword; and a step of extracting, from the query log storage means, query logs containing keywords in a predetermined keyword list entered in advance.
[Selected drawing] Figure 1

Description

The present invention relates to a method for determining the degree of relevance between keywords and sites using query logs. More specifically, it relates to a method, a server, and a program for determining the degree of relevance between keywords and sites using query logs that include click logs from keyword search results.

Today, anyone can use the Internet to search a vast body of information for what they want at any time. A user seeking particular information typically issues a query by entering keywords that characterize that information at one of the various search sites on the Internet. In response, the search site's search engine searches Web sites on the Internet, and a search result page is displayed to the user. The search result page lists links to Web sites containing various information, ordered by a priority determined by the search engine.

However, the Web sites linked from the search result page do not necessarily contain the information the user wants, and some have little relevance to the entered keywords. When the first link clicked on the search result page does not lead to the desired information, the user usually works through the remaining linked pages in display order. It is therefore desirable that Web sites more closely related to the keywords be displayed preferentially on the search result page. On the other hand, there are also many sites containing information (adult content, violence, grotesque material, discriminatory language, and so on) that is inappropriate or "harmful" for particular users such as children, and in some cases such sites need to be kept off those users' search result pages.

For this reason, many methods exist for filtering such "harmful" sites, that is, inspecting data content to decide whether it may pass. For example, Patent Document 1 discloses a management server that reads a pre-registered keyword file, periodically and automatically runs searches based on those keywords, registers the "bad URLs (Uniform Resource Locators)" extracted from the search result information in a bad-URL database, and decides whether to forward transmission request information from a client device to a Web server.
Patent Document 1: JP 2004-46739 A

However, Patent Document 1 does not specifically describe how its management server extracts "bad URLs" based on the keywords. Today there is an enormous number of so-called "harmful" sites, yet there are also many "wholesome" sites that contain "inappropriate" keywords (for example, sites carrying current news, reviews, or commentary). Automatic extraction of "harmful" sites therefore has its limits, and at present such sites are mostly identified manually, for example by specialist contractors. Methods that filter automatically using machine learning also exist, but they suffer from problems such as heavy computation and the effort needed to create training data. Moreover, such methods cannot be applied when the entry page of a "harmful" site contains too few words and consists almost entirely of images.

In view of the above problems, an object of the present invention is to provide a new method, and the like, that uses logs of search queries actually issued by many users to empirically determine the degree of relevance between the keywords used in a search and Web sites.

The present invention provides the following solutions.

(1) A method by which a server connectable to terminals of a plurality of users via a communication network determines the degree of relevance between a keyword and a Web site, wherein
the server comprises means for generating a search result page when a user of one of the terminals performs a keyword search and displaying the search result page on that user's terminal, and query log storage means for storing click logs from the user's search result page,
the method comprising, in the server:
a step of accumulating, in the query log storage means, the ID of the session in which the user performed the search, the URL of the Web page the user clicked from the search result page, the access date and time at which the link to that Web page was clicked, and the keyword;
a step of counting, using the query log storage means, the number of clicks on each URL for each keyword; and
a step of extracting from the query log storage means, for each keyword in a predetermined keyword list entered in advance, the query logs containing that keyword.

With this configuration, the server has means for generating its own search result page (from the results returned by the search engine) when a user performs a keyword search and displaying it on the user's terminal (specifically, by sending the URL of the search result page). When the user clicks a link shown on this search result page, the server stores the session ID of the search, the URL of the clicked link, the access date and time of the click, and the keyword used for the search (there may be more than one keyword) in query log storage means connected to the server (for example, the query log database described later). Using this query log storage means, the server counts, for each search keyword, the number of clicks on each URL that users actually clicked from the search result page. It then extracts from the query log storage means the query logs containing each keyword (also called a seed query) in a separately predetermined keyword list (for example, a keyword list for extracting harmful sites).

As a result, large numbers of URLs that users actually clicked on search result pages, produced by searches with a wide variety of keywords, can be accumulated in storage means such as a query log database. From the accumulated keywords and clicked URLs, those matching the predetermined keyword list can be extracted, and the extracted keywords and URL set can then be used to collect and verify sites.
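The three steps of (1) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ClickRecord` type, the function names, the in-memory list standing in for the query log storage means, and the substring-matching rule are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ClickRecord:
    """One query-log entry with the four fields named in the text."""
    session_id: str
    url: str
    accessed_at: datetime
    keyword: str


def count_clicks(log):
    """Count clicks per (keyword, URL) pair."""
    return Counter((r.keyword, r.url) for r in log)


def extract_by_keyword_list(log, keyword_list):
    """Keep records whose search keyword contains any seed keyword.

    Substring matching is an assumption; the text only says that the
    query log "contains" a keyword from the list.
    """
    return [r for r in log if any(seed in r.keyword for seed in keyword_list)]


# Hypothetical log standing in for the query log storage means.
log = [
    ClickRecord("s1", "http://a.example/", datetime(2007, 1, 23, 10, 0), "cheap pills"),
    ClickRecord("s2", "http://a.example/", datetime(2007, 1, 23, 11, 0), "cheap pills"),
    ClickRecord("s2", "http://b.example/", datetime(2007, 1, 23, 11, 5), "cheap pills"),
    ClickRecord("s3", "http://c.example/", datetime(2007, 1, 23, 12, 0), "weather"),
]
clicks = count_clicks(log)
matched = extract_by_keyword_list(log, ["pills"])
```

Here `clicks` holds the per-keyword click counts and `matched` holds only the records whose query contains the seed keyword "pills".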

(2) The method according to (1), further comprising a step of displaying the URLs contained in the extracted query logs on a terminal of a site verifier connected to the server, in descending order of click count.

With this configuration, when a site verifier (a person who collects Web sites of a particular type and verifies whether they actually belong to that type) is, for example, compiling a list of harmful sites for parental control, sites with many user clicks (which are therefore considered to have a greater harmful impact in practice) are displayed first, improving the verifier's efficiency.

(3) The method according to (2), further comprising a step of calculating, from the access date and time for each URL contained in the extracted query logs, the stay time on the Web page the user clicked, and, for URLs with the same click count, displaying the URLs on the terminal of the site verifier connected to the server in descending order of stay time.

With this configuration, the stay time on a URL's Web page is obtained in some manner from the access date and time at which the user clicked that URL. Even among URLs with the same click count, a Web page with a longer stay time is taken to have contained more of the information the user wanted; that is, the search keyword entered by the user is taken to be more relevant to the site with that URL. Displaying such URLs first improves verification efficiency.
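The ordering described in (2) and (3) amounts to a two-key sort: click count first, stay time as the tie-breaker. The sketch below assumes the per-URL statistics have already been computed; the `stats` layout and the use of total stay time in seconds are assumptions, since the text does not say how stay time is aggregated per URL.

```python
def rank_urls(stats):
    """Order URLs by click count, then by stay time, both descending.

    stats maps URL -> (click_count, stay_seconds).
    """
    return sorted(stats, key=lambda url: (-stats[url][0], -stats[url][1]))


# Hypothetical aggregated statistics.
stats = {
    "http://a.example/": (5, 120.0),
    "http://b.example/": (5, 900.0),
    "http://c.example/": (9, 30.0),
}
ranking = rank_urls(stats)
```

With these numbers, `c.example` comes first on click count, and `b.example` precedes `a.example` because the two tie on clicks and users stayed longer on `b.example`.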

(4) The method according to (1), further comprising a step of calculating, from the access date and time for each URL contained in the extracted query logs, the stay time on the Web page the user clicked, calculating the degree of relevance between the keyword and the site by multiplying a weight derived from the stay time by the click count, and displaying the URLs on a terminal of a site verifier connected to the server in descending order of the degree of relevance.

With this configuration, the click count is multiplied by a weight derived from the stay time, and the product is treated as the site's degree of relevance. Unlike (3), where stay time is consulted only for URLs with the same click count, the stay time is obtained for every clicked URL and used to weight its click count. The degree of relevance between keyword and site can thus be assessed from more than one angle.
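A sketch of the weighted score in (4) follows. The text does not define how the weight is derived from the stay time, so the form used here (mean stay time divided by a 30-minute cap) is purely an assumption for illustration, as are the function names.

```python
MAX_STAY_SECONDS = 30 * 60  # assumed cap; any monotone weighting would serve


def stay_weight(stay_times):
    """Map a URL's stay times (seconds) to a weight in [0, 1] (assumed form)."""
    if not stay_times:
        return 0.0
    capped = [min(t, MAX_STAY_SECONDS) for t in stay_times]
    return (sum(capped) / len(capped)) / MAX_STAY_SECONDS


def relevance(stay_times):
    """Relevance = weight derived from stay time x number of clicks."""
    return stay_weight(stay_times) * len(stay_times)


# Hypothetical stay times (one entry per click) for three URLs.
stays = {
    "http://a.example/": [900.0, 900.0, 900.0],
    "http://b.example/": [1800.0],
    "http://c.example/": [60.0] * 5,
}
scores = {url: relevance(times) for url, times in stays.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this example `c.example` has the most clicks but ranks last, because its visitors left after a minute; that is the effect the weighting is meant to capture.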

(5) The method according to (3) or (4), wherein the stay time is obtained, for the Web pages listed on the search result page, as the difference between the time at which the user clicked a link to one Web page and the time at which the user next clicked a link to another Web page.

With this configuration, the user's stay time (viewing time) at the URL clicked first is obtained as the difference between the time the link to one Web page was clicked and the time the link to the next Web page was clicked. The stay time can thus be derived easily from the access dates and times stored in the query log. Because the URL clicked last in a session has no following URL, it is assigned, for convenience, a sufficiently large stay time, for example 30 minutes.
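The stay-time rule in (5) can be sketched as below: within each session, clicks are ordered by access time, each click's stay time is the gap to the next click, and the last click receives the 30-minute default mentioned above. The tuple layout and function name are assumptions for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

LAST_CLICK_STAY = timedelta(minutes=30)  # default for the final click of a session


def stay_times(clicks):
    """Compute stay times from (session_id, url, accessed_at) tuples.

    Returns a list of (session_id, url, stay) with stay as a timedelta.
    """
    by_session = defaultdict(list)
    for session_id, url, accessed_at in clicks:
        by_session[session_id].append((accessed_at, url))

    result = []
    for session_id, events in by_session.items():
        events.sort()  # chronological order within the session
        for (clicked_at, url), following in zip(events, events[1:] + [None]):
            if following is None:
                stay = LAST_CLICK_STAY
            else:
                stay = following[0] - clicked_at
            result.append((session_id, url, stay))
    return result


# Hypothetical click log.
clicks = [
    ("s1", "http://a.example/", datetime(2007, 1, 23, 10, 0, 0)),
    ("s1", "http://b.example/", datetime(2007, 1, 23, 10, 2, 30)),
    ("s2", "http://a.example/", datetime(2007, 1, 23, 11, 0, 0)),
]
stays = stay_times(clicks)
```

In session `s1`, the stay on `a.example` is the 2 minutes 30 seconds until the next click, while the final clicks of both sessions get the 30-minute default.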

(6) The method according to (1), wherein each link to a Web page listed on the search result page is the URL of a redirector, and when the user clicks the link, the redirector redirects the user to the original link destination.

With this configuration, a link to a Web page listed on the search result page is not the URL of that Web page itself but the URL of a preconfigured redirector page. When the user clicks a link to a Web page on the search result page, the browser first goes to the redirector page; the redirector records the time of the click and counts the click, saves them to the query log, and then redirects to the original Web page. In this way, users' query logs can be accumulated easily without the users being aware of it.
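The redirector mechanism can be sketched with the standard library alone. The endpoint `http://redirector.example/r`, the query parameter names `u`, `sid`, and `q`, and the list used as a query log are all hypothetical; the text does not say how the redirector URL is encoded.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse

REDIRECTOR = "http://redirector.example/r"  # hypothetical redirector endpoint


def to_redirector_link(target_url, session_id, keyword):
    """Wrap a result link so that the click passes through the redirector."""
    query = urlencode({"u": target_url, "sid": session_id, "q": keyword})
    return f"{REDIRECTOR}?{query}"


def handle_click(redirector_link, query_log, now=None):
    """Record the click, then return (status, location) for an HTTP redirect."""
    params = parse_qs(urlparse(redirector_link).query)
    target = params["u"][0]
    query_log.append({
        "session_id": params["sid"][0],
        "url": target,
        "accessed_at": now or datetime.now(timezone.utc),
        "keyword": params["q"][0],
    })
    return 302, target


query_log = []
link = to_redirector_link("http://a.example/page?x=1", "s1", "query log")
status, location = handle_click(link, query_log, now=datetime(2007, 1, 23, 10, 0))
```

A production redirector would also need to validate the target, for example by signing the link, so that it cannot be abused as an open redirect; that concern is outside the scope of the text.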

(7) The method according to (4) or (5), wherein URL_Score(u), a score relating to the stay time at a particular URL for use by the site verifier, is obtained by the following formula.

where
t(s): stay time at URL u in session s
q: a seed query
S: the set of sessions
Q: a set of seed queries for finding sites of the same type
URL_Score(q, u): the score of URL u for seed query q

Here, a seed query means a keyword that the site verifier uses to collect sites. With this configuration, even when the keyword list contains several seed queries targeting the same type of site, the sum of the stay times at a particular URL is taken as that URL's score for one seed query, and these scores are added over every seed query in the set to give the URL's overall score for that set of seed queries. Displaying URLs with high overall scores first improves verification efficiency. The set of seed queries is defined in advance by the site verifier, for example as a keyword list.
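The formula for (7) is not reproduced in this text. Going only by the symbol definitions and the verbal description above (a sum of stay times per seed query, then a sum over the seed-query set), one consistent reading is the following. It is a reconstruction rather than the published formula, and it assumes that S denotes the sessions in which seed query q was issued and URL u was clicked.

```latex
\mathrm{URL\_Score}(q, u) = \sum_{s \in S} t(s),
\qquad
\mathrm{URL\_Score}(u) = \sum_{q \in Q} \mathrm{URL\_Score}(q, u)
```

Under this reading, a URL's score grows both with the number of sessions that clicked it and with how long users stayed there, which matches the click-count-times-weight idea of (4).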

(8) The method according to any one of (1) to (7), wherein the counting step is performed periodically at predetermined time intervals.

With this configuration, the server of the present invention automatically and periodically (for example, every 24 hours, every week, or every month) searches for sites using the given keyword list, so that a newly appearing site can immediately be added to the sites to be verified.

(9) The method according to any one of (1) to (8), wherein the predetermined keyword list contains, as keywords, predetermined obscene words and discriminatory words for parental control that identifies harmful sites.

The method of the present invention can be used to identify harmful sites for parental control. The predetermined obscene and discriminatory words are entered by the site verifier as a keyword list.

(10) The method according to any one of (1) to (8), wherein the predetermined keyword list contains, as keywords, the names of predetermined articles whose trade is prohibited in online auctions.

The method of the present invention can be used to search for listing pages containing the names of predetermined articles whose trade is prohibited in an online auction (for example, names of articles classified as "pharmaceuticals", "narcotics", "weapons", "animals and plants whose trade is prohibited", or "obscene items"). The names of the prohibited articles are entered by the site verifier as a keyword list in accordance with the rules of the auction concerned.

(11) A server connectable to terminals of a plurality of users via a communication network, for determining the degree of relevance between a keyword and a Web site, the server comprising:
a search result page generation unit that generates a search result page when a user of one of the terminals performs a keyword search;
a query log database that stores query logs for the user's search results;
a query log saving unit that stores the ID of the session in which the user performed the search, the URL of the Web page the user clicked from the search result page, the access date and time at which the link to that Web page was clicked, and the keyword;
a click count aggregation unit that counts, using the query log database, the number of clicks on each URL for each keyword; and
a query log extraction unit that extracts, from the query log database, query logs containing keywords in a predetermined keyword list entered in advance.

This configuration provides a server apparatus having the same operation and effects as the method described in (1).

(12) The server according to (11), further comprising means for displaying the URLs contained in the extracted query logs on a terminal of a site verifier connected to the server, in descending order of click count.

This configuration provides a server apparatus having the same operation and effects as the method described in (2).

(13) The server according to (12), further comprising a stay time calculation unit that calculates the stay time on the Web page the user clicked, and means for displaying, for URLs with the same click count, the URLs on the terminal of the site verifier connected to the server in descending order of stay time.

This configuration provides a server apparatus having the same operation and effects as the method described in (3).

(14) The server according to (11), further comprising a stay time calculation unit that calculates, from the access date and time for each URL contained in the extracted query logs, the stay time on the Web page the user clicked, and means for calculating the degree of relevance between the keyword and the site by multiplying a weight derived from the stay time by the click count and displaying the URLs on a terminal of a site verifier connected to the server in descending order of the degree of relevance.

This configuration provides a server apparatus having the same operation and effects as the method described in (4).

(15) A computer program for determining the degree of relevance between a keyword and a Web site in a server connectable to terminals of a plurality of users via a communication network, wherein
the server comprises means for generating a search result page when a user of one of the terminals performs a keyword search and displaying the search result page on that user's terminal, and query log storage means for storing click logs for the user's search results,
the computer program causing the server to execute:
a step of accumulating, in the query log storage means, the ID of the session in which the user performed the search, the URL of the Web page the user clicked from the search result page, the access date and time at which the link to that Web page was clicked, and the keyword;
a step of counting, using the query log storage means, the number of clicks on each URL for each keyword; and
a step of extracting, from the query log storage means, query logs containing keywords in a predetermined keyword list entered in advance.

With this configuration, the present invention can be provided in the form of a computer program as a means of realizing a server apparatus having the same operation and effects as the method described in (1).

According to the present invention, by using query logs that include click logs and access dates and times from keyword searches, a site verifier can efficiently and continuously collect, for example, pages of harmful sites, pages of sites subject to access restrictions, auction listing pages for articles whose trade is prohibited, and pages of popular sites on a particular subject.

Preferred embodiments of the present invention are described below with reference to the drawings.

[Configuration of the query log aggregation server]
FIG. 1 shows the overall configuration of a system according to an example of a preferred embodiment of the present invention, together with the functional blocks of the query log aggregation server at its core. In the overall system, the query log aggregation server 10 is connected to a plurality of user terminals 20 via the Internet 21 and to a search server 30 via a network 31. The network 31 may be any communication network: a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. From a user terminal 20 such as a PC (Personal Computer), a PDA (Personal Digital Assistant), or a mobile phone, a user uses search keywords to find Web sites on the Internet that contain the desired information. In this specification, a search that a general user performs with keywords at a search site or the like is called a query, and a keyword that a site verifier uses to collect sites is called a seed query. Ordinarily, a query request is sent to the search server 30, where the query reception unit 32 receives the keyword and a search is carried out using any of various known search engines 33. The search server 30 can communicate with the user terminals 20 directly or indirectly via the Internet, but in the present invention the query log aggregation server 10 is connected between the user terminals 20 and the search server 30. In other words, the query log aggregation server 10 acts as an intermediary between the user terminals 20 and the search server 30.

On receiving a query request from a user terminal 20, the query log aggregation server 10 forwards it to the search server 30, storing the ID of the query's session as it does so. When it receives from the search server 30 the search results for the user's query, together with the corresponding session ID, the search result page generation unit 11 generates its own search result page for the user terminal 20. A search result page normally contains links to Web pages that match (or are judged to match) the keyword. Using the redirector technique described later, the server 10 has the query log saving unit 12 save the user's query log each time the user clicks a link listed on this page. In addition to the session ID, the query log contains the URL of the Web page the user clicked from the search result page, the access date and time at which the link was clicked, and the keyword used for the search.

Query log data is stored in a query log database (query log DB 17; "database" is abbreviated to "DB" below where convenient). The query logs of many users accumulated in the query log DB 17 are aggregated as needed by an aggregation unit (not shown) of the query log aggregation server 10. The aggregation unit consists of, for example, the click count aggregation unit 13 and the stay time calculation unit 14, and automatically aggregates the query log data by query keyword and clicked URL at fixed intervals (for example, every 24 hours, every week, or every month). The resulting aggregate information may be stored in the query log DB 17 or in a separate database. These databases may be configured as the storage unit 10b of the query log aggregation server 10, or may be connected externally to the server.

The query log aggregation server 10 is also connected to a site verifier terminal 40, either via the network 31 or, as shown in the figure, directly via a display/operation unit 16. A site verifier is a person who collects Web sites of a particular type and verifies whether those sites actually belong to that type (category). Using the data accumulated in the query log DB 17, the site verifier collects sites of the desired type (for example, "harmful" sites) and verifies their content. Specifically, the site verifier creates a keyword list for collecting sites and enters it into the query log aggregation server 10. When there are many keyword lists, they may be registered and stored in a keyword list DB 18.

For each keyword in this predetermined keyword list (an entry may be a single keyword or a combination of several keywords), the query log extraction unit 15 extracts the URLs of highly relevant sites by the methods described later. The extracted results are made available to the site verifier in some form (for example, displayed in ranked order).
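Since a keyword-list entry may combine several keywords, the extraction can be sketched as follows. Treating an entry as a whitespace-separated set of words that must all occur in the user's query is an assumption made for this example, as are the function names and the `(query, url)` log layout.

```python
from collections import Counter


def matches(query, seed):
    """True if every word of the seed entry occurs in the query.

    Whitespace tokenisation is an assumption made for this sketch.
    """
    query_words = set(query.split())
    return all(word in query_words for word in seed.split())


def clicked_urls_for_seeds(log, keyword_list):
    """Count clicked URLs over log entries whose query matches any seed entry.

    log is an iterable of (query, url) pairs.
    """
    counts = Counter()
    for query, url in log:
        if any(matches(query, seed) for seed in keyword_list):
            counts[url] += 1
    return counts


# Hypothetical log and keyword list.
log = [
    ("replica sword buy", "http://a.example/"),
    ("buy sword", "http://a.example/"),
    ("sword history", "http://b.example/"),
    ("buy shoes", "http://c.example/"),
]
hits = clicked_urls_for_seeds(log, ["buy sword", "knife"])
ranked = [url for url, _ in hits.most_common()]
```

Only the queries containing both "buy" and "sword" match, so "sword history" and "buy shoes" contribute nothing to the list shown to the site verifier.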

The functional units 11 to 15 of the query log totaling server 10 constitute the control unit 10a (typically, functions executed by the CPU of a computer). In the above embodiment, the query log totaling server 10 and the search server 30 were described as separate servers for ease of understanding. The configuration is not limited to this, however, and the functions of the query log totaling server 10 may be incorporated into the search server 30.

FIG. 2 summarizes the processing flow of the system described so far. Details are omitted because they duplicate the description above. The figure does not show the query log totaling process of the query log totaling server 10, which runs independently of the processing shown here (described later).

FIG. 3 illustrates the concept of the redirector 42 as one method for aggregating query logs. When the search result page generation unit 11 of the query log totaling server 10 receives the search results for a user's query from the search server 30, it replaces the link destination of each Web page listed on the search result page 41 with a URL of the redirector 42. It then returns the search result page 41, now containing redirector URLs, to the user terminal 20. When the user clicks a link to a desired Web page (for example, the page A link) on the received search result page 41, the browser actually jumps to the redirector 42.

When the user clicks a link to a Web page listed on the search result page, the redirector 42 saves various data about the query to the query log DB 17 as query log data 44. Specifically, it saves the IP address of the user terminal 20 (which need not be a fixed IP address), the access date and time at which the link was clicked, the session ID of the query session, the one or more keywords the user used in the query, and the original URL of the Web page the user clicked from the search result page. It then sends the user on to the originally linked Web page (link destination page 43). By using the redirector 42 in this way, the query log totaling server 10 can easily collect query logs per user, identified by IP address, and per session, identified by session ID.
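The redirector flow described above can be sketched as follows. This is a minimal illustration, not the implementation in the specification: the endpoint `REDIRECTOR_BASE`, the query parameter names `sid`, `q`, and `u`, and the helper names are all hypothetical, and keywords are assumed not to contain spaces.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode, urlparse, parse_qs

REDIRECTOR_BASE = "https://redirector.example/r"  # hypothetical redirector endpoint


@dataclass
class QueryLogRecord:
    """One row of query log data 44 (field names are assumptions)."""
    ip_address: str
    accessed_at: datetime
    session_id: str
    keywords: tuple
    url: str


def to_redirector_url(target_url, session_id, keywords):
    """Wrap an original result link in a redirector URL."""
    params = {"sid": session_id, "q": " ".join(keywords), "u": target_url}
    return f"{REDIRECTOR_BASE}?{urlencode(params)}"


def handle_click(redirector_url, ip_address, now, log):
    """Record the click, then return the original URL the user is sent to."""
    params = parse_qs(urlparse(redirector_url).query)
    record = QueryLogRecord(
        ip_address=ip_address,
        accessed_at=now,
        session_id=params["sid"][0],
        keywords=tuple(params["q"][0].split(" ")),
        url=params["u"][0],
    )
    log.append(record)  # stands in for a write to the query log DB 17
    return record.url
```

Carrying the session ID and keywords inside the redirector URL is only one option; a server that keeps session state could look them up instead.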

FIG. 4 shows an example of the query log data 44 stored in the query log DB 17. As illustrated, the table holds the IP address identifying the user terminal 20, the access date and time of each click from the search result page, the session ID of the query, the keywords used in the query, and the URL actually clicked. In this example, the user terminal 20 (IP address 110.149.145.1) searched with the query keywords "uncensored" AND "image" in query session ID Bp4ed6917. The table records the access date and time of each of the five URLs shown in the rightmost column, which the user clicked one after another on the resulting search result page, starting at 21:45:53 on October 6, 2006.

Although not shown in the figure, the display rank of each URL on the search result page may also be recorded. This makes it possible to take into account the fact that links nearer the top of the search result page are more likely to be clicked. For example, links displayed on the first page of the search results (Yahoo! Search shows ranks 1 through 10 on one page) are known to have a high click rate. Accordingly, when a link displayed on the second or a later page is clicked, it may be given a higher weight than a link on the first page.

FIG. 5 shows an example of the totaling procedure of the query log totaling server 10. First, in step S1, a set of query logs is acquired. That is, the query logs of all users for a given totaling period are collected by session ID.

Next, in step S2, this set is sorted by access date and time. Then, in step S3, the difference between the access date and time of one URL in a session and the access date and time of the next URL clicked is computed and taken as the stay time on the first of the two URLs. This is repeated for every URL in the session. The details are shown in FIG. 6.

FIG. 6 outlines the method for obtaining the stay time on a Web page. It shows an example in which the search result page 50 is displayed after a user searches on Yahoo! Search with the keywords "AAA" AND "BBB". In the actual search, "AAA" was "uncensored" and "BBB" was "image". The search returned more than about 6 million sites, and it would be very difficult for a site verifier to verify all of them.

Suppose, for example, that the user clicks the first-ranked entry "AAABBB Forefront" on the search result page 50 at time t1s, views the linked Web page 51, and then presses the browser's Back button at time t1e to return to the search result page 50. Similarly, suppose the user clicks the second-ranked entry "AAABBB Treasure Trove", whose link destination is the Web page 52, at time t2s and returns at time t2e. The user's stay time on page 51 is then t1e - t1s, and the stay time on page 52 is t2e - t2s. The interval from t1e to the following t2s, however, is generally short enough to ignore, so the method of the present invention treats t1e as approximately equal to t2s. That is, the stay time on page 51 is obtained approximately as t2s - t1s. The stay times for page 52 and subsequent pages are obtained in the same way.

However, if page 52 is the last page viewed in this query session, there is no subsequently clicked page, so the above method cannot be used. In that case (when no next URL exists), a sufficiently long time, for example 30 minutes, is set as the stay time. Alternatively, the weight of the last URL may be adjusted to exceed the normal weight. For example, twice the average stay time of the URLs other than the last may be used as the stay time of the last URL. This is because the last page clicked is likely to have contained the information the user was looking for, which suggests a strong association between the keyword and the site. The embodiment thus computes the stay time approximately, but the redirector 42 may of course be made to detect, by some means, presses of the browser's Back button and the like so that the stay time is determined more accurately.
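The stay time approximation of steps S2 and S3, together with the two fallbacks for the last URL, might look like the sketch below. The function names and the tuple layout of a click are assumptions made for illustration; the 30-minute default is the example value given in the text.

```python
from datetime import datetime, timedelta

LAST_PAGE_STAY = timedelta(minutes=30)  # example fallback from the text


def stay_times(clicks, last_page_stay=LAST_PAGE_STAY):
    """clicks: list of (accessed_at, url) for one session.

    Returns a list of (url, stay) in click order. Each stay is the gap to
    the next click (t2s - t1s); the last URL gets the fallback value.
    """
    ordered = sorted(clicks, key=lambda c: c[0])  # step S2: sort by access time
    result = []
    for (t_start, url), (t_next, _) in zip(ordered, ordered[1:]):
        result.append((url, t_next - t_start))    # step S3: difference
    if ordered:
        result.append((ordered[-1][1], last_page_stay))
    return result


def doubled_average_stay(stays):
    """Alternative for the last URL: twice the mean stay of the other URLs."""
    others = [stay for _, stay in stays[:-1]]
    if not others:
        return LAST_PAGE_STAY
    return 2 * sum(others, timedelta()) / len(others)
```

A session with a single click has no other URLs to average, so the sketch falls back to the fixed value in that case; the text does not say how that case is handled.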

When the stay time calculation is finished, processing returns to step S4 of FIG. 5, where the clicked URLs are sorted in descending order of click count, that is, most frequently clicked first. Next, in step S5, URLs with the same click count are sorted by stay time, and the totaling process ends. Instead of applying step S5 only to URLs with the same click count, the stay time may be obtained for every URL, a weight derived from it, and the URLs sorted by the product of that weight and the click count. For example, the weight is 1 when the stay time is 30 seconds or less, and 1 is added for every further 30 seconds of stay time. A Web page that is clicked often but has a short stay time probably has low relevance to the keyword, or a title that does not match its content. Under this scheme such pages rank relatively lower, which helps improve site verification efficiency.
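The two orderings of steps S4 and S5 can be illustrated as follows. This is a sketch under stated assumptions: each row carries one representative `stay_seconds` value per URL (how stay times from several sessions are combined is not specified here), and the weight adds 1 only once a full further 30 seconds has elapsed, which is one reading of the rule.

```python
def stay_weight(stay_seconds):
    """Weight 1 up to 30 s, plus 1 for each further full 30 s (boundary assumed)."""
    extra = max(0.0, stay_seconds - 30.0)
    return 1 + int(extra // 30)


def rank_by_clicks_then_stay(rows):
    """Steps S4 and S5: most clicks first, ties broken by longer stay."""
    return sorted(rows, key=lambda r: (-r["clicks"], -r["stay_seconds"]))


def rank_by_weighted_clicks(rows):
    """Variant of step S5: order by click count multiplied by the stay weight."""
    return sorted(
        rows,
        key=lambda r: r["clicks"] * stay_weight(r["stay_seconds"]),
        reverse=True,
    )
```

With the weighted variant, a page with few clicks but long stays can overtake a page that is clicked often and abandoned quickly, which is the effect the paragraph describes.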

As mentioned above, the display rank on the search result page may also be taken into account as a separate weight. For example, since it is only natural that highly ranked Web pages receive many clicks, the weight may be doubled for a URL whose click count exceeds a predetermined number, or whose click rate is high, despite its low rank.
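As a small illustration of this rank adjustment, the sketch below doubles the weight of a URL that is clicked often despite being displayed beyond the first page. The definition of "low rank" as a rank past the first ten results and the click threshold of 100 are assumptions chosen for the example, not values from the specification.

```python
def rank_adjusted_weight(base_weight, display_rank, clicks,
                         first_page_size=10, click_threshold=100):
    """Double the weight for low-ranked URLs that still attract many clicks."""
    is_low_rank = display_rank > first_page_size   # assumed meaning of "low rank"
    is_popular = clicks > click_threshold          # assumed "predetermined number"
    if is_low_rank and is_popular:
        return base_weight * 2
    return base_weight
```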

In addition, URL_Score(u), a score based on the stay time for a specific URL, is obtained for the site verifier by the following formula.

where
t(s): stay time on URL u in session s
q: seed query
S: set of sessions
Q: set of seed queries for finding sites of the same type
URL_Score(q, u): score of URL u for seed query q
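The formula itself does not appear in this text. Reading the symbol definitions together with the statement that stay times are summed per URL and then added over all seed queries, one reconstruction (an assumption, not the published formula) is the following, where S is taken to be the set of sessions in which seed query q was issued:

```latex
\mathrm{URL\_Score}(q, u) = \sum_{s \in S} t(s),
\qquad
\mathrm{URL\_Score}(u) = \sum_{q \in Q} \mathrm{URL\_Score}(q, u)
```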

In this way, even when there are multiple seed queries for finding sites of the same type for the site verifier, the sum of the stay times on a specific URL is obtained as a score for each seed query, and these scores are added over the entire set of seed queries to give a score for each URL.
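The two-level summation described here can be sketched as below. The input layout (one `(query, {url: stay_seconds})` pair per session) and the exact-match test against the seed query set are assumptions made for the example.

```python
from collections import defaultdict


def url_scores(sessions, seed_queries):
    """Return {url: URL_Score(u)} summed over all seed queries.

    sessions: iterable of (query, {url: stay_seconds}), one entry per session.
    seed_queries: set of seed queries Q.
    """
    per_query = defaultdict(float)  # (q, u) -> URL_Score(q, u)
    for query, stays in sessions:
        if query in seed_queries:
            for url, seconds in stays.items():
                per_query[(query, url)] += seconds
    total = defaultdict(float)      # u -> URL_Score(u)
    for (_query, url), score in per_query.items():
        total[url] += score
    return dict(total)
```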

FIG. 7 shows an example of the query log totaling result data stored in the query log DB 17. It shows the results aggregated per URL for the search keyword (the keyword used in the query) "AAA". For example, when aggregated at 12:00 on December 1, 2006, the URL http://xxx.aaa.bbb had 143 clicks and a weight of 43 derived from the stay time. At 12:00 on December 2, 2006, 24 hours later, it had 189 clicks and a weight of 89. In this example data is collected over 24-hour totaling periods, but these may be combined further to produce, for example, weekly or monthly totals. The example shows only the single keyword "AAA", but combinations of several words, such as "AAA" AND "BBB" or "AAA" OR "BBB", are also aggregated as one keyword.
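Per-period click counting of the kind shown in FIG. 7 could be sketched as follows. The bucketing by calendar date is an assumption (the figure's periods run from 12:00 to 12:00), and a multi-word query such as "AAA and BBB" is simply kept as one keyword string.

```python
from collections import Counter
from datetime import datetime


def daily_click_counts(clicks):
    """clicks: iterable of (accessed_at, keyword, url), one entry per click.

    Returns a Counter keyed by (date, keyword, url).
    """
    return Counter((ts.date(), keyword, url) for ts, keyword, url in clicks)
```

Weekly or monthly totals can then be produced by summing the daily counters over the desired range.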

[Example]
FIG. 8 shows use in parental control as one example of the present invention. In this example, the query log DB 17 is assumed to hold the aggregated result data table indicated by reference numeral 72. This aggregated result data 72 is basically the same as the table described with FIG. 7. The verifier of harmful sites is assumed to have input the keyword list 75 to the query log totaling server 10 by storing it in the NG keyword DB 74. The keyword list 75 contains keywords classified as obscene, violent, discriminatory, grotesque, and so on.

The query log totaling server 10 compares this keyword list 75 with the aggregated result data table 72 and extracts from the table the records (aggregate data records) that contain an NG keyword (here, "ddd", "bbb", and "fff"); this is the process indicated by reference numeral 73. For each extracted record, it calculates the relevance by multiplying the click count by the weight. In this example, the relevance between the NG keyword "ddd" and the URL http://xxx is 16821. Likewise, the relevance between "bbb" and the URL http://yyy is 13400, and the relevance between "fff" and the URL http://zzz is 4224. The query log totaling server 10 creates a determination list DB 76 storing this relevance data 77, sorts the URLs in descending order of relevance, and displays them on the site verifier's terminal. The site verifier views the Web pages in order from the top of the sorted list, makes an access restriction determination (the process indicated by reference numeral 78), and registers the URLs of the applicable Web pages in the blacklist DB 79. Using the query log totaling server 10 of the present invention to collect and verify harmful sites in this way improves the site verifier's verification efficiency.
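The extraction and relevance calculation of this example can be sketched as below. The record layout and the exact-match test against the NG keyword set are assumptions. The click counts and weights are hypothetical values chosen so that their products reproduce the relevance figures quoted above; the text gives only the products.

```python
def relevance_ranking(aggregated, ng_keywords):
    """Return [(keyword, url, relevance)] for NG keywords, highest relevance first.

    aggregated: list of (keyword, url, clicks, weight) records.
    ng_keywords: set of NG keywords from the keyword list.
    """
    matched = [
        (keyword, url, clicks * weight)   # relevance = click count x weight
        for keyword, url, clicks, weight in aggregated
        if keyword in ng_keywords
    ]
    return sorted(matched, key=lambda row: row[2], reverse=True)


# Hypothetical aggregated records (clicks and weights are illustrative only).
records = [
    ("fff", "http://zzz", 64, 66),
    ("aaa", "http://www", 500, 50),   # not an NG keyword, so it is skipped
    ("ddd", "http://xxx", 189, 89),
    ("bbb", "http://yyy", 134, 100),
]
ranking = relevance_ranking(records, {"ddd", "bbb", "fff"})
```

The resulting list corresponds to what would be stored in the determination list DB 76 and shown to the site verifier.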

In the above example, the method of the present invention was used to create a blacklist for parental control. As other applications, it can be used to search for listing pages that contain prohibited items in online auctions, or in an information service that efficiently pins down trending sites on a particular subject and provides information promptly.

[Hardware configuration of the query log totaling server]
FIG. 9 shows an example of the hardware configuration of the query log totaling server 10 according to an example of a preferred embodiment of the invention. The query log totaling server 10 includes a CPU (Central Processing Unit) 1010 constituting the control unit 10a (in a multiprocessor configuration, additional CPUs such as a CPU 1012 may be added), a bus line 1005, a communication I/F 1040, a main memory 1050, a BIOS (Basic Input Output System) 1060, a USB port 1090, an I/O controller 1070, input means such as a keyboard and mouse 1100, and a display device 1022.

Storage means such as a tape drive 1072, a hard disk 1074, an optical disk drive 1076, and a semiconductor memory 1078 can be connected to the I/O controller 1070.

The BIOS 1060 stores a boot program executed by the CPU 1010 when the query log totaling server 10 starts up, programs that depend on the hardware of the query log totaling server 10, and the like.

The hard disk 1074, which constitutes the storage unit 10b, stores the programs with which the query log totaling server 10 executes the functions of the present invention, and can also hold various databases as needed.

As the optical disk drive 1076, for example, a DVD-ROM drive, a CD-ROM drive, a DVD-RAM drive, or a CD-RAM drive can be used. In that case, an optical disk 1077 compatible with the drive is used. Programs or data can also be read from the optical disk 1077 by the optical disk drive 1076 and provided to the main memory 1050 or the hard disk 1074 via the I/O controller 1070. Similarly, a tape medium 1071 compatible with the tape drive 1072 can be used, mainly for backup.

The program provided to the query log totaling server 10 is supplied stored on a recording medium such as the hard disk 1074, the optical disk 1077, or a memory card. The program may be installed on the query log totaling server 10 and executed after being read from the recording medium via the I/O controller 1070 or downloaded via the communication I/F 1040.

The program may be stored on an internal or external storage medium. As the storage medium constituting the storage unit 10b, a magneto-optical recording medium such as an MD or a tape medium can be used in addition to the hard disk 1074, the optical disk 1077, or a memory card. A storage device such as a hard disk 1074 or an optical disk library provided in a server system connected to a dedicated communication line or the Internet may also be used as the recording medium, with the program provided to the query log totaling server 10 over the communication line.

The display device 1022 displays screens that accept data input from the user and screens showing the results of processing by the query log totaling server 10. It includes display devices such as a cathode ray tube (CRT) display or a liquid crystal display (LCD).

The input means accepts input from the user and may consist of the keyboard and mouse 1100 or the like.

The communication I/F 1040 is a network adapter that allows the query log totaling server 10 to connect to terminals via a dedicated network or a public network. The communication I/F 1040 may include a modem, a cable modem, and an Ethernet (registered trademark) adapter.

The above description has focused on the query log totaling server 10, but the functions described above can also be realized by installing a program on a computer and operating that computer as a server device. Accordingly, the functions realized by the server described as one embodiment of the present invention can also be realized by having the computer execute the method described above, or by installing the program described above on the computer and executing it.

Embodiments of the present invention have been described above, but the present invention is not limited to these embodiments. The effects described in the embodiments merely list the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiments.

FIG. 1 is a diagram showing the overall configuration of a system according to an example of a preferred embodiment of the present invention, and the functional blocks of the query log totaling server at its core.
FIG. 2 is a diagram showing the processing flow in the system according to an example of a preferred embodiment of the present invention.
FIG. 3 is a diagram showing the concept of the redirector 42 as one method for aggregating query logs.
FIG. 4 is a diagram showing an example of the query log data 44 stored in the query log DB 17.
FIG. 5 is a diagram showing an example of the totaling procedure of the query log totaling server 10.
FIG. 6 is a diagram outlining the method for obtaining the stay time on a Web page.
FIG. 7 is a diagram showing an example of the query log totaling result data stored in the query log DB 17.
FIG. 8 is a diagram showing use in parental control as one example of the present invention.
FIG. 9 is a diagram showing an example of the hardware configuration of the query log totaling server 10 according to an example of a preferred embodiment of the invention.

Explanation of symbols

10 Query log totaling server
10a Control unit
10b Storage unit
11 Search result page generation unit
12 Query log storage unit
13 Click number totaling unit
14 Stay time calculation unit
15 Query log extraction unit
16 Display/operation unit
17 Query log DB
18 Keyword list DB
20 User terminal
21 Internet
30 Search server
31 Network
32 Query reception unit
33 Search engine
40 Site verifier terminal
41 Search result page
42 Redirector
43 Link destination page
44 Query log data
50 Search result page
51 Link destination page
52 Link destination page
72 Aggregated result data
73 NG keyword match information extraction process
74 NG keyword DB
75 NG keyword table
76 Determination list DB
77 Relevance data
78 Access restriction determination process
79 Blacklist DB

Claims

A method by which a server connectable to a plurality of user terminals via a communication network determines the degree of association between a keyword and a website,
the server comprising means for generating a search result page when a user of a terminal performs a search by keyword and displaying the search result page on the user's terminal, and query log storage means for storing a click log from the user's search result page,
the method comprising, in the server:
accumulating in the query log storage means the session ID of the search performed by the user, the URL of a Web page the user clicked from the search result page, the access date and time at which the link to the Web page was clicked, and the keyword;
counting, using the query log storage means, the number of clicks on each URL for each keyword; and
extracting, from the query log storage means, query logs that include a keyword in a predetermined keyword list input in advance.

The method according to claim 1, further comprising: displaying the URLs on a site verifier terminal connected to the server in descending order of the number of clicks of the URLs included in the extracted query log.

The method of claim 2, further comprising calculating, from the access date and time of each URL included in the extracted query logs, the stay time on the Web page clicked by the user, and displaying URLs that have the same number of clicks on the site verifier terminal connected to the server in descending order of stay time.

Based on the access date and time for each URL included in the extracted query log, the stay time on the Web page clicked by the user is calculated, and the relationship between the keyword and the site is calculated by multiplying the weight obtained from the stay time and the number of clicks. The method according to claim 1, further comprising calculating a degree and displaying the URL on a site verifier terminal connected to the server in descending order of the relevance.

The method according to claim 3 or 4, wherein the stay time is obtained as the difference between the time at which the user clicks a link to one Web page listed on the search result page and the time at which the user clicks a link to another Web page.

The method according to claim 1, wherein each link to a Web page listed on the search result page is a redirector URL, and when the user clicks the link, the redirector redirects to the original link destination.

The method according to claim 4 or 5, wherein URL_Score (u), which is a score related to a staying time for a specific URL for the site verifier, is obtained by the following formula.

The method according to any one of claims 1 to 7, wherein the counting step is periodically performed at predetermined time intervals.

The method according to claim 1, wherein the predetermined keyword list includes, as keywords, predetermined words and discriminatory words for parental control for identifying harmful sites.

The method according to any one of claims 1 to 8, wherein the predetermined keyword list includes a predetermined trade prohibited article name in an online auction as a keyword.

A server that is connectable to a plurality of user terminals via a communication network and determines the degree of association between a keyword and a website, the server comprising:
a search result page generation unit that generates a search result page when a user of a terminal performs a search by keyword;
a query log database that stores query logs of the user's search results;
a query log storage unit that stores the session ID of the search performed by the user, the URL of a Web page the user clicked from the search result page, the access date and time at which the link to the Web page was clicked, and the keyword;
a click number totaling unit that counts, using the query log database, the number of clicks on each URL for each keyword; and
a query log extraction unit that extracts, from the query log database, query logs that include a keyword in a predetermined keyword list input in advance.

The server according to claim 11, further comprising means for displaying the URLs on a site verifier terminal connected to the server in descending order of the number of clicks of the URLs included in the extracted query log.

The server according to claim 12, further comprising a stay time calculation unit that calculates the stay time on the Web page clicked by the user, and means for displaying URLs that have the same number of clicks on the site verifier terminal connected to the server in descending order of stay time.

The server according to claim 11, further comprising a stay time calculation unit that calculates, from the access date and time of each URL included in the extracted query logs, the stay time on the Web page clicked by the user, and means for calculating the degree of association between the keyword and the site by multiplying a weight obtained from the stay time by the number of clicks, and displaying the URLs on a site verifier terminal connected to the server in descending order of the degree of association.

A computer program for determining the degree of association between a keyword and a website in a server connectable to a plurality of user terminals via a communication network,
the server comprising means for generating a search result page when a user of a terminal performs a search by keyword and displaying the search result page on the user's terminal, and query log storage means for storing a click log from the user's search results,
the computer program causing the server to execute:
accumulating in the query log storage means the session ID of the search performed by the user, the URL of a Web page the user clicked from the search result page, the access date and time at which the link to the Web page was clicked, and the keyword;
counting, using the query log storage means, the number of clicks on each URL for each keyword; and
extracting, from the query log storage means, query logs that include a keyword in a predetermined keyword list input in advance.