JP2004355550A

JP2004355550A - Natural sentence retrieval apparatus, method and program

Info

Publication number: JP2004355550A
Application number: JP2003155561A
Authority: JP
Inventors: Masaaki Nagata; 昌明永田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2003-05-30
Filing date: 2003-05-30
Publication date: 2004-12-16
Anticipated expiration: 2023-05-30
Also published as: JP4162223B2

Abstract

【課題】自然言語により表現された質問文を情報検索要求として入力し、当該質問文に対する回答を含み、かつ適合度の高い文書を出力することができる自然文検索装置を提供することにある。
【解決手段】自然言語で表現された質問文から適合度の高い文書を検索する自然文検索装置１０が開示されている。本装置１０は、質問文から検索キーワード集合を作成する質問解析部１０１と、検索された文書及びその文書におけるＫＷＩＣを抽出する文書検索部１０２と、質問への回答を文書が含む期待値を文書の適合度の尺度として文書を順位付けする文書再ランキング部１０３とを有する。
【選択図】図１
[Problem] To provide a natural sentence search device that accepts a question sentence expressed in natural language as an information search request and outputs highly relevant documents that contain an answer to the question.
[Solution] A natural sentence search device 10 that retrieves highly relevant documents from a question sentence expressed in natural language is disclosed. The device 10 includes a question analysis unit 101 that creates a search keyword set from the question sentence, a document search unit 102 that extracts the retrieved documents and the KWIC in each document, and a document re-ranking unit 103 that ranks the documents using, as the measure of document relevance, the expected value that a document contains an answer to the question.
[Selection diagram] Fig. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、一般的には自然言語により表現された情報検索要求に応じて情報を検索する自然文検索装置に関し、特に、キーワード検索方式を利用して適合度の高い文書情報を獲得できる自然文検索装置に関する。
【０００２】
【従来の技術】
従来の文書検索システムは、基本的にキーワード検索システムであり、キーワード集合で表現された情報検索要求を入力とし、それに適合する文書集合を検索結果として出力する。この際、適合度の尺度としては、ＴＦ−ＩＤＦ法のようなキーワード集合と文書の類似度を使用し、入力されたキーワード集合との類似度が高い順番に文書を出力することが多い（例えば、非特許文献１を参照）。
【０００３】
さらに、ＷＷＷ（ＷｏｒｌｄＷｉｄｅＷｅｂ）上の文書を検索対象とするインターネット検索エンジンの場合には、多くのサイトからリンクを張られているサイトの情報は信頼できるというような、ＷＷＷのトポロジー（相互接続性）に基づくヒューリスティクスを利用することにより適合度の判定の精度を高めている。この方法はＰａｇｅＲａｎｋと呼ばれている（例えば、非特許文献２を参照）。
【０００４】
しかし、例えば、「歴史上、一番背が高いアメリカの大統領は誰か？」というような自然言語により表現された質問文に対する回答を与える文書を検索したい場合、キーワード検索システムに入力すべきキーワード集合をこの質問文から作成するのは必ずしも容易ではない。
【０００５】
そこで、キーワード集合ではなく自然言語で情報検索要求を文書検索システムに入力する方法が従来より研究されており、これはキーワード検索に対して自然文検索と呼ばれている。インターネット検索エンジンには、キーワード検索に加えて自然文検索が可能なものが存在する。
【０００６】
自然文検索は、ユーザが知りたい情報を話し言葉で（しゃべるように）検索できるので、キーワード検索に比べてＡＮＤ−ＯＲなどの論理演算に関する専門知識を必要としないので、ユーザにとっては情報検索要求を自然に表現できるという利点がある。また、情報検索サービスを提供する側からみると、検索キーワードよりも自然文の方が、ユーザが欲しい情報をより正確に把握することができるという利点がある。
【０００７】
従来の自然文検索の研究開発では、例えば以下の特許文献のように、自然言語で表現された情報検索要求、すなわち質問文から検索キーワードや検索式を作成する方法、および、シソーラス（同義語・関連語辞書）を利用してユーザが使用する語彙と検索対象となる文書で使用されている語彙の違いを吸収する方法（いわいる「概念検索」）などが考案されている（例えば、特許文献１）。
【０００８】
また英語の自然文検索では、ユーザが入力した質問文に対して、システムがその意味を解釈して複数の言い換えの可能性を提示し、ユーザにその中から一つを選ばせることによって、システムが回答可能な質問へユーザを誘導する手法もある。
【０００９】
しかし、従来の自然文検索では、質問文からユーザが何をどういう情報が知りたいかを判定し、その質問文に対する回答が文書中に含まれているかどうかを質問文と文書の適合度の尺度とするような方法は存在しない。
【００１０】
近年、ユーザの質問文に対する回答をシステムが直接提示する質問応答システムが盛んに研究されている（例えば、特許文献２を参照）。
【００１１】
質問応答システムでは、例えば、ユーザが「一番背が高いアメリカの大統領は誰ですか？」という質問文を入力すると、システムは、「一番背が高いアメリカの大統領」に関する文書を検索するのではなく、「リンカーン」という回答を出力する点に特徴がある。
【００１２】
一般に、質問応答システムでは、質問文に対する回答を表示するだけではなく、以下の表示例のように、回答を抽出した文書もユーザに提示する。これは、例えば「一番背が高いアメリカの大統領は誰ですか？」という質問文に対して、「リンカーン」という回答だけが出力されても、ユーザは本当に「リンカーン」が正しい回答かどうかを確認できないからである。
（表示例）
「２月１２日
…流血のカンザス事件」などが相次いで起った。リンカーン＝ダグラス論争１８５８年のアメリカ中間選挙でイリノイ州…リンカーンは身長が１９３．０ｃｍもあり、歴代大統領で一番背が高かく、顔もかなり面長で端から見ると…」
従って、「回答および回答を抽出した文書の組」を出力する質問応答システムは、質問文を入力として文書を出力するところから、自然文検索システムの一種と見なせる。
【００１３】
しかし、質問応答システムは、回答の尤もらしさが大きい順に、回答および回答を抽出した文書の組を出力するものであり、文書は、必ずしも質問文に対する適合度の順に出力されない。
【００１４】
例えば、「一番背が高いアメリカの大統領は誰ですか？」という質問文に対して、「アメリカ」と「大統領」という２つのキーワードしか含まない（質問文に対する適合度が低い）文書が大量に存在し、その中に「ブッシュ」という人名が高頻度で出現した場合、質問応答システムでは、回答候補の第１位として「ブッシュ」が選択され、「ブッシュ大統領」に関する文書が回答の根拠として出力されてしまう可能性がある。
【００１５】
すなわち、従来の質問応答システムでは、質問文解析、固有表現抽出、回答候補選択など、文書検索以外の様々な処理が原因となって回答を誤る場合が相当数あり、このような誤りが発生した場合には、非常に「的はずれ」な回答とともに、質問文に対する適合度が低い文書が表示されるという問題点がある。従って、質問応答システムを、そのまま自然文検索システムとして使用するには問題が多い。
【００１６】
【非特許文献１】
北研二，津田和彦，獅々堀正幹著「情報検索アルゴリズム」共立出版、２００２年。
【００１７】
【非特許文献２】
ＳｅｒｇｅｙＢｒｉｎａｎｄＬａｗｒｅｎｃｅＰａｇｅ，ＴｈｅＡｎａｔｏｍｙｏｆａＬａｒｇｅ−ＳｃａｌｅＨｙｐｅｒｔｅｘｔｕａｌＷｅｂＳｅａｒｃｈＥｎｇｉｎｅ，ＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅＳｅｖｅｎｔｈＩｎｔｅｒｎａｔｉｏｎａｌＷｏｒｌｄＷｉｄｅＷｅｂＣｏｎｆｅｒｅｎｃｅ（ＷＷＷ７），１９９８。
【００１８】
【特許文献１】
特開２００２−６３２０３号公報。
【００１９】
【特許文献２】
特開２００２−１３２８１１号公報。
【００２０】
【発明が解決しようとする課題】
従来の自然文検索システムでは、ユーザの質問文に対する回答を文書が含んでいるかどうかを質問文と文書の適合度の尺度とするものは存在しなかった。一方、従来の質問応答システムは、質問文に対する回答を出力することができるので、回答を抽出した文書を回答と同時に出力すれば、ユーザの質問文に対する回答を与えることができる文書を出力する自然文検索とみなすことができる。しかし、質問応答システムでは、質問文解析、固有表現抽出、回答候補選択など、文書検索以外の様々な処理が原因となって回答を誤る可能性を無視できず、もし誤った回答を質問応答システムが選択した場合には、質問文に対する適合度が低い文書が出力されるという問題があった。
【００２１】
本発明は、このような事情に鑑みてなされたものであり、質問文から検索キーワード集合を作成してキーワード検索により文書集合を検索し、検索された文書における検索キーワードの周囲のテキストが質問文に対する回答を含むという事象の期待値が大きい順に、検索された文書と検索キーワードの周囲のテキストの組を表示することにより、検索キーワードの周囲のテキストが質問に対する回答および回答の根拠を含むと期待される文書を上位に順位付けて出力する自然文検索装置を提供することを目的とする。
【００２２】
【課題を解決するための手段】
本発明の観点は、自然言語により表現された情報検索要求を入力とする自然文検索装置であって、特に、自然言語による質問文から検索キーワード集合を作成して、当該検索キーワード集合を用いて文書集合を検索する装置である。
【００２３】
本発明の観点に従った自然文検索装置は、自然文で表現された情報検索要求として質問文を入力し、当該質問文に適合する文書集合を適合度の順に出力する自然文検索装置であって、入力された質問文から検索キーワード集合を作成する質問解析手段と、前記作成された検索キーワード集合に基づいて、指定の文書検索エンジンから検索された文書集合、及び当該各文書における検索キーワードの周囲のテキストを獲得する文書検索インターフェース手段と、前記検索キーワードの周囲のテキストが質問文に対する回答を含むという事象の期待値に基づいて、前記検索された文書と検索キーワードの周囲のテキストの組を前記質問文に対する文書の適合度の尺度として順位付けする文書再ランキング手段とを備えたものである。
【００２４】
【発明の実施の形態】
以下図面を参照して、本発明の実施の形態を説明する。
（システム構成）
図１は、本実施形態に関する自然文検索装置の原理的システム構成を示すブロック図である。
【００２５】
本装置１０は、文書データベース１００と、質問解析部１０１と、文書検索部１０２と、文書再ランキング部１０３とを有する。
【００２６】
文書データベース１００は、検索対象となる文書情報を蓄積している情報記憶装置を主要素とする。質問解析部１０１は、ユーザが自然言語で表現した情報検索要求、すなわち質問文から検索キーワード集合を作成する。文書検索部１０２は、質問解析部１０１により作成された検索キーワード集合に基づいて、文書データベース１００から文書を検索し、さらに、検索された文書から検索キーワードの周囲のテキスト（ＫＷＩＣ）を抽出する。文書再ランキング部１０３は、当該ＫＷＩＣが質問文に対する回答を含むという事象の期待値に基づいて、文書検索部１０２により検索された文書を順位付けて、当該文書とＫＷＩＣとの組み合わせ情報を出力する。
【００２７】
なお、本システムは、ソフトウェア及び当該ソフトウェアを実行するＣＰＵとメモリからなるハードウエアを含むコンピュータシステムにより実現される。
（本実施形態の原理的動作）
以下図１に示すシステムの原理的動作を、図２に示すフローチャートを参照して説明する。
【００２８】
まず、システム１０に対して、ユーザが自然言語で質問文を入力する（ステップＳ２０１）。質問解析部１０１は、入力された質問文を形態素解析し、検索キーワード集合を作成する（ステップＳ２０２）。文書検索部１０２は、質問解析部１０１により作成された検索キーワード集合に基づいて、文書データベース１００から文書を検索し、さらに、各文書から検索キーワードの周囲のテキスト（ＫＷＩＣ）を抽出する（ステップＳ２０３）。
【００２９】
次に、文書再ランキング部１０３は、各文書のＫＷＩＣを形態素解析し、質問文に対する回答が当該ＫＷＩＣの中に含まれている期待値を計算する。そして、文書再ランキング部１０３は、算出した期待値の大きさに基づいて、検索された文書の順位付け処理（再ランキング）を実行する（ステップＳ２０４）。最後に、文書再ランキング部１０３は、文書とＫＷＩＣの組をステップ２０４で求めた期待値の大きい順に出力する（ステップ２０５）。
【００３０】
以上要するに本実施形態のシステムによれば、ユーザが自然言語で入力した質問文に対して、文書データベース１００から、期待値の大きい順に文書とＫＷＩＣとの組み合わせを取得する事ができる。当該期待値は、当該ＫＷＩＣが質問文に対する回答を含むという事象の期待値であり、質問文に対する文書の適合度の尺度としてみることができる。
【００３１】
従って、本システムであれば、ユーザからの質問文に対して、回答を含む期待値の大きい順に、即ち適合度の大きい順に、検索された文書と検索キーワードの周囲のテキスト（ＫＷＩＣ）の組を、例えばディスプレイ上に表示できる。この場合、ＫＷＩＣは、ユーザの質問に対する回答を含む期待値が大きい文書に関して、その回答の根拠を示す役割を果たす。
【００３２】
また、従来の質問応答システムが回答の尤もらしさの順に文書を順位付ける方式に対して、本実施形態のシステムは、回答を含む可能性の大きさの順に文書を順位付けるので、回答選択などの処理における誤りの影響を受けることがなく、より質問文に対する適合度の高い文書を検索結果とすることができる。
（本実施形態を適用する具体例）
図３は、本実施形態のシステムを適用した具体的な自然文検索装置３０のシステム構成を示すブロック図である。
【００３３】
本システムは、質問解析部３０１と、文書検索インターフェース部３０２、文書再ランキング部３０３と、形態素解析器３０５と、固有表現抽出器３０６と、意味カテゴリ辞書３０７と、統計的分類器３０８とを有する。
【００３４】
質問解析部３０１は、形態素解析器３０５を用いて、自然言語からなる質問文の単語分割および品詞付与などの処理を実行して、検索キーワード集合を抽出する。具体的には、名詞・形容詞・副詞などの内容語、及びカタカナ文字列、英文字列、数字列などのキーワードになりやすい未知語を検索キーワードとして抽出する。例えば、「Ｍ（選手名）とＹ（球団名）との契約金は？」という質問文に対しては、「Ｍ」、「Ｙ」、「契約」、「金」が検索キーワード集合として抽出される。
【００３５】
また、質問解析部３０１は、意味カテゴリ辞書３０７および統計的分類器３０８を用いて質問タイプを判定する。質問タイプは、質問文が要求している回答の種類に基づいて質問文を分類するもので、例えば「組織名、人名、地名、固有物名、日付、時間、金額、割合」の８種類を使用する。質問タイプの分類は、固有表現抽出器３０６が抽出する固有表現の分類と同じである。
【００３６】
質問文の質問タイプを判定する問題は、基本的にはテキスト分類問題である。従って、質問文を大量に収集し、各質問文に対して人手により質問タイプを付与したデータを大量に用意すれば、これを学習データとして統計的分類器３０８を学習させることにより、任意の質問文に対して質問タイプを付与することができる。
【００３７】
本実施形態の具体例としては、様々な語彙を含む質問文に対して高精度に質問タイプの分類を行うために、統計的分類器３０８としてサポートベクトルマシン（ＳＶＭ）を使用する。ＳＶＭについては、例えば、文献「ＶｌａｄｉｍｉｒＮ．Ｖａｐｎｉｋ，“ＴｈｅＮａｔｕｒｅｏｆＳｔａｔｉｓｔｉｃａｌＬｅａｒｎｉｎｇＴｈｅｏｒｙ”，Ｓｐｒｉｎｇｅｒ，１９９５」に開示されている。また、統計的分類器３０８としては、サポートベクトルマシン以外に、最近隣法、ブースティング、最大エントロピー法、決定木などを使用した方法でもよい。
【００３８】
また、サポートベクトルマシンの入力となる特徴ベクトルを質問文から作成する際には、名詞の意味カテゴリを特徴として利用するために意味カテゴリ辞書３０７を使用する。意味カテゴリ辞書３０７としては、例えば文献（ＮＴＴコミュニケーション科学研究所監修，“日本語語彙体系”，岩波書店，１９９７）に開示されている。この日本語語彙体系では、名詞を１２段、２７１５カテゴリに分類し、１単語につき、最大５個のカテゴリが割り当てられている。
【００３９】
意味カテゴリ辞書３０７と統計的分類器３０８（サポートベクトルマシン）を用いて、質問文の質問タイプを判定する方法については、例えば文献「鈴木潤，佐々木裕，前田英作，“統計的機械学習による質問タイプ同定”，情報科学技術フォーラム（ＦＩＴ２００２），情報技術レターズ，ｐｐ．８９−９０，２００２」に開示されている。
【００４０】
この開示されている方法では、各意味カテゴリに対応する２７１５次元の特徴ベクトルを作成し、あるカテゴリに所属する名詞が質問文中に出現したら、そのカテゴリおよびその上位のすべてのカテゴリに対応する特徴ベクトルの位置のビットに１を立てる。質問タイプの判定に使用する特徴ベクトルには、意味カテゴリ辞書３０７のカテゴリ以外に、必要に応じて、質問文の学習データに出現した高頻度の単語や、固有表現抽出器３０６を用いて抽出した固有表現の種類別での出現の有無などを使用してもよい。
【００４１】
形態素解析器３０５および固有表現抽出器３０６としては、形態素解析（単語分割と品詞付与）および固有表現抽出（固有名詞および数値表現の認識と分類）ができるものならば何を使用してもよい。固有表現抽出器３０６としては、例えば文献「齋藤邦子，永田昌明，“ＨＭＭに基づく多言語固有表現抽出システムの開発”，言語処理学会第９回年次大会発表論文集，ｐｐ．５−８，２００２」に開示されている隠れマルコフモデル（ＨＭＭ）を用いた固有表現抽出器３０６が使用される。
【００４２】
文書検索インターフェース部３０２は、質問解析部３０１が作成した検索キーワード集合を用いて、文書検索エンジン３０４を介して検索された文書及びＫＷＩＣ（即ち、検索キーワードの周囲のテキスト）を獲得する。
【００４３】
ここで、文書検索エンジン３０４は、例えばインターネット（Ｗｅｂ）からＷｅｂ文書を検索するインターネット検索エンジンとして、本システム３０の外部に設けられた要素である。また、文書検索エンジン３０４は、本システム３０の内部に設けられて、内部または外部の文書データベースからキーワード検索を実行するテキスト検索システムに相当するものでもよい。要するに、文書検索エンジン３０４としては、文書データベースからキーワード検索が可能で、かつＫＷＩＣを取得できるものならば何でもよい。
【００４４】
ここでは、文書データベースとしてインターネット（Ｗｅｂ）を使用し、文書検索エンジン３０４は、インターネット検索エンジンとして本システム３０の外部要素の場合を想定する。
【００４５】
ここで、ＫＷＩＣを抽出する方法は、一般的には「パッセージ検索」と呼ばれる方法であり、長い文書の中の関連する一部分を抜き出す技術を利用する。パッセージ検索の実現法については、例えば文献「ＭａｒｃｉｎＫａｓｚｋｉｅｌａｎｄＪｕｓｔｉｎＺｏｂｅｌ，“ＰａｓｓａｇｅＲｅｔｒｉｅｖａｌＲｅｖｉｓｉｔｅｄ”，ＳＩＧＩＲ−９７，ｐｐ．１７８−１８５」に開示されている。
【００４６】
文書再ランキング部３０３は、文書検索インターフェース部３０２により獲得された検索文書とＫＷＩＣの組を入力として、当該ＫＷＩＣの中に正しい回答が含まれる期待値を算出し、この期待値が大きい順に文書を順位付けする。この処理は、文書検索エンジン３０４が出力する文書の順位とは別の順位を計算するため、「再ランキング」処理と呼ぶ。
【００４７】
ここで、実際にはＫＷＩＣの中に正しい回答が含まれる期待値を厳密に求めることは難しいので、様々なヒューリスティクスを用いてこれを近似する。最も単純なヒューリスティクスは、ＫＷＩＣが質問文により近い表現（同じ単語列）を含むほど、回答を含む可能性が高いというものである。
【００４８】
本具体例では、まず質問文を形態素解析し、質問文中に含まれる単語のｕｎｉｇｒａｍ，ｂｉｇｒａｍ，ｔｒｉｇｒａｍを作成する。次に、以下の計算式（１）により各ＫＷＩＣに回答が含まれる期待値に相当するスコアＳを算出する。
【００４９】
【数１】

【００５０】
ここでＮ_ｎ（ｎ＝１，２，３）は、あるＫＷＩＣに出現する質問文中のｕｎｉｇｒａｍ，ｂｉｇｒａｍ，ｔｒｉｇｒａｍの異なり数である。ｔｆ_ｎはｎ−ｇｒａｍの出現頻度であり、ｉｄｆは逆文書頻度である。ｗ_ｎはｎ−ｇｒａｍへの重みであり、より長いｎ−ｇｒａｍに対する重みを大きくするように実験的に設定する。Ｎｏｒｍａｌｉｚｅｄ＿ＦａｃｔｏｒはＫＷＩＣの長さの違いを正規化する重みであり、より長いＫＷＩＣほど大きくなるように実験的に設定する。
【００５１】
逆文書頻度を計算する際に分母として必要な総文書数は、文書検索エンジン３０４から取得する文書数とする。本具体例では、当該文書検索エンジン３０４から取得する文書数を事前に設定できることを想定し、デフォルトでは例えば１０件に設定することができる。
【００５２】
また、本具体例では、質問タイプと一致する固有表現タイプを持つ語句がＫＷＩＣ中に存在するかどうかを、期待値（スコア）の計算に反映させても良い。その場合には、あらかじめ質問解析部３０１において質問文の質問タイプを判定し、文書検索エンジン３０４が検索した各文書のＫＷＩＣから固有表現抽出器３０６を用いて固有表現を抽出した上で、次式（２）をスコアの計算に用いる。
【００５３】
【数２】

【００５４】
ここで、Ｎ_ｑｔは質問タイプと同じ固有表現タイプを持つＫＷＩＣ中の語句の異なり数を表す。ｗ_ｑｔは質問タイプに対する重みであり、この重みの最適な値は実験的に決定される。
（検索結果の具体例）
図４は、本具体例のシステムにおける検索結果の例を示す機能ブロック図である。ここでは、「Ｍ（選手名）とＹ（球団名）の契約金は？」という質問文が入力された場合を例として示している。
【００５５】
まずユーザは、質問文を入力し、インターネット検索エンジンとそこから検索する文書数を選択する（処理４０１）。この例ではインターネット検索エンジンとして「ＸＸＸＸＸ」を選択し、検索件数として１０件を指定している。
【００５６】
質問解析部３０１は、入力された質問文から「Ｍ、Ｙ、契約、金」というキーワード集合を抽出し、また質問タイプを「金額」と判定する（処理４０２）。
【００５７】
文書検索インターフェース部３０２は、検索キーワードをインターネット検索エンジン３０４に送り、当該検索エンジン３０４から文書のＵＲＬおよびＫＷＩＣを得る（処理４０３）。
【００５８】
文書再ランキング部３０３は、文書検索インターフェース部３０２により獲得された検索文書とＫＷＩＣの組（ＵＲＬタイトル概要文に相当）を入力として、当該ＫＷＩＣの中に正しい回答が含まれる期待値を算出し、この期待値が大きい順に文書を順位付けを実行する。具体的には、質問文とＫＷＩＣの類似度、および、質問タイプと同じタイプを持つ固有表現の有無に基づいて、文書を再ランキングし（処理４０４）、当該結果を例えばディスプレイ上に表示する（表示結果４０５）。
【００５９】
この例では、インターネット検索エンジン３０４の検索結果では、例えば第９位にあった文書が、再ランキングの結果、「Ｍ、Ｙ、契約」というキーワードを含み、かつ、例えば「約２１００万ドル（約２５億２０００万円）」という金額の表現をＫＷＩＣに含むことから第１位に順位付けられる。
【００６０】
従って、ユーザからの例えば「Ｍ（選手名）とＹ（球団名）の契約金は？」という質問文に対して、「約２１００万ドル（約２５億２０００万円）」という回答を含む文書を上位にランキングし、かつ、回答の根拠として当該文書と組となるＫＷＩＣを表示することができる。
【００６１】
なお、本発明は上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態にわたる構成要素を適宜組み合わせてもよい。
【００６２】
【発明の効果】
以上詳述したように本発明によれば、自然言語により表現された質問文を情報検索要求として入力し、当該質問文に対する回答を含むという事象の期待値に基づいて文書を順位付けする方式を実現することにより、質問文に対して適合度の高い文書を出力することができる自然文検索装置を提供できる。
【図面の簡単な説明】
【図１】本発明の実施形態に関する自然文検索装置の原理的システム構成を示すブロック図。
【図２】本実施形態の原理的動作を説明するためのフローチャート。
【図３】本実施形態のシステムを適用した自然文検索装置の具体例のシステム構成を示すブロック図。
【図４】同具体例のシステムに関する検索結果の表示例を示す図。
【符号の説明】
１０…自然文検索装置、１００…文書データベース、１０１…質問解析部、
１０２…文書検索部、１０３…文書再ランキング部。
３０１…質問解析部、３０２…文書検索インターフェース部、
３０３…文書再ランキング部、３０４…文書検索エンジン、
３０５…形態素解析器、３０６…固有表現抽出器、３０７…意味カテゴリ辞書
３０８…統計的分類器。
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to a natural sentence search device that retrieves information in response to an information search request expressed in natural language, and more particularly to a natural sentence search device that can obtain highly relevant document information by using a keyword search method.
[0002]
[Prior art]
A conventional document search system is basically a keyword search system: it takes as input an information search request expressed as a set of keywords and outputs the set of matching documents as the search result. The measure of relevance is typically a similarity between the keyword set and a document, as in the TF-IDF method, and documents are usually output in descending order of similarity to the input keyword set (see, for example, Non-Patent Document 1).
[0003]
Furthermore, Internet search engines that search documents on the WWW (World Wide Web) improve the accuracy of relevance judgments by using heuristics based on the topology (interconnectivity) of the WWW, such as the assumption that information on a site linked from many other sites is reliable. This method is called PageRank (see, for example, Non-Patent Document 2).
[0004]
However, when one wants to retrieve a document that answers a question expressed in natural language, such as "Who is the tallest US president in history?", it is not always easy to create from that question the keyword set that should be entered into a keyword search system.
[0005]
For this reason, methods of entering an information search request into a document search system in natural language rather than as a keyword set have long been studied; this is called natural sentence search, in contrast to keyword search. Some Internet search engines support natural sentence search in addition to keyword search.
[0006]
Natural sentence search lets users search for the information they want in spoken language (as if talking), and unlike keyword search it requires no specialized knowledge of logical operations such as AND and OR, so users can express an information search request naturally. From the viewpoint of the information search service provider, a natural sentence conveys the information the user wants more accurately than search keywords do.
[0007]
Conventional research and development on natural sentence search has devised methods of creating search keywords or search expressions from an information search request expressed in natural language, that is, a question sentence, and methods of using a thesaurus (a dictionary of synonyms and related words) to absorb differences between the vocabulary used by the user and the vocabulary used in the documents to be searched (so-called "concept search") (see, for example, Patent Document 1).
[0008]
In English natural sentence search, there is also a technique in which the system interprets the meaning of the question sentence entered by the user, presents several possible paraphrases, and has the user select one of them, thereby guiding the user toward a question the system can answer.
[0009]
However, no conventional natural sentence search method determines from the question sentence what kind of information the user wants to know and then uses whether a document contains the answer to that question as the measure of relevance between the question sentence and the document.
[0010]
In recent years, a question answering system in which a system directly presents an answer to a user's question sentence has been actively researched (for example, see Patent Document 2).
[0011]
A question answering system is characterized in that, when a user enters a question such as "Who is the tallest US president?", the system outputs the answer "Lincoln" instead of searching for documents about "the tallest US president".
[0012]
In general, a question answering system not only displays the answer to a question sentence but also presents the user with the document from which the answer was extracted, as in the display example below. This is because, if only the answer "Lincoln" were output for the question "Who is the tallest US president?", the user could not verify whether "Lincoln" is really the correct answer.
(Display example)
"February 12 ... Bleeding Kansas" and other incidents occurred one after another. Lincoln-Douglas debates: in the 1858 U.S. midterm elections, Illinois ... Lincoln was 193.0 cm tall, the tallest of all the presidents, and his face was quite long, so that seen from the side ..."
Therefore, a question answering system that outputs a "set of answers and documents from which answers have been extracted" can be regarded as a type of natural sentence search system because a question sentence is input and a document is output.
[0013]
However, a question answering system outputs pairs of an answer and the document from which the answer was extracted in descending order of the likelihood of the answer, so the documents are not necessarily output in order of relevance to the question sentence.
[0014]
For example, suppose that, for the question "Who is the tallest US president?", there are a large number of documents that contain only the two keywords "US" and "president" (and thus have low relevance to the question), and that the personal name "Bush" appears frequently in them. A question answering system may then select "Bush" as the top answer candidate and output documents about "President Bush" as the basis for the answer.
[0015]
In other words, a conventional question answering system gives wrong answers in a considerable number of cases because of processing other than document search, such as question analysis, named entity extraction, and answer candidate selection. When such an error occurs, a document with low relevance to the question sentence is displayed together with an answer that is badly off the mark. Using a question answering system as-is as a natural sentence search system is therefore problematic.
[0016]
[Non-patent document 1]
Kenji Kita, Kazuhiko Tsuda, Masamiki Shishibori, "Information Retrieval Algorithm," Kyoritsu Publishing, 2002.
[0017]
[Non-patent document 2]
Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proceedings of the Seventh International World Wide Web Conference (WWW7), 1998.
[0018]
[Patent Document 1]
JP-A-2002-63203.
[0019]
[Patent Document 2]
JP-A-2002-132811.
[0020]
[Problems to be solved by the invention]
No conventional natural sentence search system uses whether a document contains an answer to the user's question sentence as the measure of relevance between the question sentence and the document. A conventional question answering system, on the other hand, can output an answer to a question sentence, so if it outputs the document from which the answer was extracted together with the answer, it can be regarded as a natural sentence search that outputs documents able to answer the user's question. However, a question answering system cannot rule out wrong answers caused by processing other than document search, such as question analysis, named entity extraction, and answer candidate selection, and when it selects a wrong answer, it outputs a document with low relevance to the question sentence.
[0021]
The present invention has been made in view of these circumstances. Its object is to provide a natural sentence search device that creates a search keyword set from a question sentence, retrieves a document set by keyword search, and displays pairs of a retrieved document and the text surrounding the search keywords in that document in descending order of the expected value of the event that the surrounding text contains an answer to the question sentence, thereby ranking and outputting at the top those documents whose surrounding text is expected to contain the answer to the question and the basis for the answer.
[0022]
[Means for Solving the Problems]
An aspect of the present invention is a natural sentence search device that takes as input an information search request expressed in natural language, and in particular a device that creates a search keyword set from a question sentence in natural language and searches a document set using that search keyword set.
[0023]
A natural sentence search device according to an aspect of the present invention receives a question sentence as an information search request expressed in natural language and outputs the set of documents matching the question sentence in order of relevance. The device comprises: question analysis means for creating a search keyword set from the input question sentence; document search interface means for acquiring, based on the created search keyword set, a document set retrieved from a designated document search engine and the text surrounding the search keywords in each document; and document re-ranking means for ranking the pairs of a retrieved document and the text surrounding the search keywords based on the expected value of the event that the surrounding text contains an answer to the question sentence, this expected value serving as the measure of relevance of the document to the question sentence.
[0024]
BEST MODE FOR CARRYING OUT THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
(System configuration)
FIG. 1 is a block diagram showing a basic system configuration of a natural sentence search device according to the present embodiment.
[0025]
The device 10 includes a document database 100, a question analysis unit 101, a document search unit 102, and a document re-ranking unit 103.
[0026]
The main component of the document database 100 is an information storage device that stores the document information to be searched. The question analysis unit 101 creates a search keyword set from an information search request expressed by the user in natural language, that is, a question sentence. The document search unit 102 retrieves documents from the document database 100 based on the search keyword set created by the question analysis unit 101, and then extracts the text surrounding the search keywords (KWIC) from each retrieved document. The document re-ranking unit 103 ranks the documents retrieved by the document search unit 102 based on the expected value of the event that the KWIC contains an answer to the question sentence, and outputs each document together with its KWIC.
[0027]
The present system is realized by a computer system including software and hardware including a CPU for executing the software and a memory.
(Principle operation of this embodiment)
Hereinafter, the principle operation of the system shown in FIG. 1 will be described with reference to the flowchart shown in FIG.
[0028]
First, the user enters a question sentence in natural language into the system 10 (step S201). The question analysis unit 101 performs morphological analysis on the input question sentence and creates a search keyword set (step S202). The document search unit 102 retrieves documents from the document database 100 based on the search keyword set created by the question analysis unit 101, and then extracts the text surrounding the search keywords (KWIC) from each document (step S203).
[0029]
Next, the document re-ranking unit 103 performs morphological analysis on the KWIC of each document and calculates the expected value that an answer to the question sentence is contained in that KWIC. The document re-ranking unit 103 then ranks (re-ranks) the retrieved documents according to the magnitude of the calculated expected value (step S204). Finally, the document re-ranking unit 103 outputs the pairs of document and KWIC in descending order of the expected value obtained in step S204 (step S205).
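The flow of steps S201 to S205 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described in this document: the function names are hypothetical, a whitespace tokenizer with a small stop-word list stands in for morphological analysis, an in-memory list stands in for the document database 100, and a count of distinct matching keywords stands in for the expected value.

```python
def _normalize(token):
    # Assumption: crude normalization in place of morphological analysis.
    return token.strip("?.,!\"'").lower()

def analyze_question(question):
    """S202 stand-in: turn a question sentence into a keyword list."""
    stop_words = {"the", "is", "who", "what", "of", "a", "an", "in"}
    words = [_normalize(w) for w in question.split()]
    return [w for w in words if w and w not in stop_words]

def search_with_kwic(keywords, documents, window=5):
    """S203 stand-in: return (doc_id, KWIC) for documents containing a keyword.

    The KWIC here spans from `window` tokens before the first hit to
    `window` tokens after the last hit, which is a simplification.
    """
    results = []
    for doc_id, text in enumerate(documents):
        tokens = text.split()
        hits = [i for i, t in enumerate(tokens) if _normalize(t) in keywords]
        if hits:
            start = max(0, hits[0] - window)
            end = min(len(tokens), hits[-1] + window + 1)
            results.append((doc_id, " ".join(tokens[start:end])))
    return results

def rerank(keywords, results):
    """S204/S205 stand-in: sort pairs by a proxy score, highest first."""
    def score(kwic):
        kwic_tokens = {_normalize(t) for t in kwic.split()}
        return len(kwic_tokens & set(keywords))
    return sorted(results, key=lambda pair: score(pair[1]), reverse=True)

def natural_sentence_search(question, documents):
    """S201 to S205: question in, ranked (doc_id, KWIC) pairs out."""
    keywords = analyze_question(question)
    return rerank(keywords, search_with_kwic(keywords, documents))
```

Calling `natural_sentence_search("Who is the tallest US president?", docs)` returns the (document index, KWIC) pairs with the snippet sharing the most keywords first.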
[0030]
In short, according to the system of the present embodiment, for a question sentence input by a user in a natural language, a combination of a document and KWIC can be acquired from the document database 100 in descending order of expected value. The expected value is an expected value of an event that the KWIC includes an answer to the question sentence, and can be viewed as a measure of the degree of relevance of the document to the question sentence.
[0031]
Therefore, in response to a question sentence from the user, the present system can display, for example on a display, the pairs of a retrieved document and the text surrounding the search keywords (KWIC) in descending order of the expected value of containing an answer, that is, in descending order of relevance. In this case, for a document with a high expected value of containing an answer to the user's question, the KWIC serves to show the basis for that answer.
[0032]
In addition, whereas a conventional question answering system ranks documents in order of the likelihood of the answer, the system of the present embodiment ranks documents in order of how likely they are to contain an answer. It is therefore unaffected by errors in processing such as answer selection, and can return documents with higher relevance to the question sentence as the search result.
(Specific example to which this embodiment is applied)
FIG. 3 is a block diagram showing a system configuration of a specific natural sentence search device 30 to which the system of the present embodiment is applied.
[0033]
The system includes a question analysis unit 301, a document search interface unit 302, a document re-ranking unit 303, a morphological analyzer 305, a named entity extractor 306, a semantic category dictionary 307, and a statistical classifier 308.
[0034]
The question analysis unit 301 uses the morphological analyzer 305 to perform processing such as word segmentation and part-of-speech tagging on the natural-language question sentence, and extracts a search keyword set. Specifically, it extracts as search keywords content words such as nouns, adjectives, and adverbs, together with unknown words that are likely to be keywords, such as katakana strings, alphabetic strings, and digit strings. For example, for the question "What is the contract money between M (player name) and Y (team name)?", the keywords "M", "Y", "contract", and "money" are extracted as the search keyword set (the Japanese compound 契約金 is segmented into 契約 "contract" and 金 "money").
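The keyword selection rule just described can be sketched as a filter over the output of a morphological analyzer. This sketch assumes the analyzer has already produced (surface form, part of speech) pairs; the part-of-speech labels ("noun", "unknown", and so on) and the function name are hypothetical and do not correspond to any particular analyzer.

```python
import re

# Assumption: label names are illustrative, not those of a real analyzer.
CONTENT_POS = {"noun", "adjective", "adverb"}

# Unknown words count as keyword-like if they consist only of
# katakana, ASCII letters, or ASCII digits.
KEYWORD_LIKE = re.compile(r"^(?:[\u30A0-\u30FF]+|[A-Za-z]+|[0-9]+)$")

def extract_keywords(tagged_tokens):
    """Select search keywords from (surface, pos) pairs, keeping first-seen order."""
    keywords = []
    for surface, pos in tagged_tokens:
        is_content_word = pos in CONTENT_POS
        is_keyword_like_unknown = (
            pos == "unknown" and KEYWORD_LIKE.match(surface) is not None
        )
        if (is_content_word or is_keyword_like_unknown) and surface not in keywords:
            keywords.append(surface)
    return keywords
```

With a hand-tagged version of the example question, in which "M" and "Y" are tagged as unknown words and 契約 and 金 as nouns, the function returns the four keywords of the example and drops the particles.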
[0035]
The question analysis unit 301 also determines the question type using the semantic category dictionary 307 and the statistical classifier 308. The question type classifies a question sentence according to the kind of answer it requests; for example, the eight types "organization name, person name, place name, artifact name, date, time, monetary amount, and percentage" are used. The classification of question types is the same as the classification of named entities extracted by the named entity extractor 306.
[0036]
Determining the question type of a question sentence is basically a text classification problem. Therefore, if a large number of question sentences are collected and each is manually labeled with a question type, the statistical classifier 308 can be trained on this labeled data and can then assign a question type to an arbitrary question sentence.
[0037]
In this specific example, a support vector machine (SVM) is used as the statistical classifier 308 so that question sentences containing diverse vocabulary can be classified into question types with high accuracy. SVMs are described, for example, in Vladimir N. Vapnik, "The Nature of Statistical Learning Theory", Springer, 1995. Instead of a support vector machine, the statistical classifier 308 may use a method such as the nearest neighbor method, boosting, the maximum entropy method, or decision trees.
[0038]
When the feature vector to be input to the support vector machine is created from a question sentence, the semantic category dictionary 307 is used so that the semantic categories of nouns can serve as features. An example of the semantic category dictionary 307 is the one disclosed in NTT Communication Science Laboratories (ed.), "Nihongo Goi-Taikei" (Japanese Vocabulary System), Iwanami Shoten, 1997. This resource classifies nouns into 2715 categories arranged in a 12-level hierarchy, and assigns up to five categories to each word.
[0039]
A method of determining the question type of a question sentence using the semantic category dictionary 307 and the statistical classifier 308 (support vector machine) is disclosed, for example, in Jun Suzuki, Yutaka Sasaki, Eisaku Maeda, "Question Type Identification by Statistical Machine Learning", Forum on Information Technology (FIT2002), Information Technology Letters, pp. 89-90, 2002.
[0040]
In the disclosed method, a 2715-dimensional feature vector with one dimension per semantic category is created, and when a noun belonging to some category appears in the question sentence, the bits at the positions corresponding to that category and all of its ancestor categories are set to 1. In addition to the categories of the semantic category dictionary 307, the feature vector used to determine the question type may, as needed, include high-frequency words that appeared in the question-sentence training data, and the presence or absence of each type of named entity extracted with the named entity extractor 306.
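The bit-setting rule for the category features can be sketched as below. The five-category hierarchy in the usage note is a toy stand-in for the 2715-category dictionary, and the function name, the `parent` mapping, and the `noun_categories` mapping are hypothetical structures introduced only for illustration.

```python
def build_feature_vector(nouns, noun_categories, parent, num_categories):
    """Return a 0/1 list with one slot per semantic category.

    noun_categories: noun -> list of category ids assigned to it
    parent:          category id -> parent category id (None at the root)
    """
    vector = [0] * num_categories
    for noun in nouns:
        for category in noun_categories.get(noun, []):
            # Set the bit for the category itself and for every ancestor.
            while category is not None:
                vector[category] = 1
                category = parent.get(category)
    return vector
```

For instance, with `parent = {0: None, 1: 0, 2: 1, 3: 0, 4: 3}` and a noun assigned to category 2, the bits for categories 2, 1, and 0 are set and the others stay 0. The resulting vector is what would be passed to the classifier.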
[0041]
Any morphological analyzer 305 and named entity extractor 306 may be used as long as they can perform morphological analysis (word segmentation and part-of-speech tagging) and named entity extraction (recognition and classification of proper nouns and numerical expressions), respectively. As the named entity extractor 306, for example, the extractor based on a hidden Markov model (HMM) disclosed in Kuniko Saito, Masaaki Nagata, "Development of a Multilingual Named Entity Extraction System Based on HMM", Proceedings of the 9th Annual Meeting of the Association for Natural Language Processing, pp. 5-8, 2002, is used.
[0042]
The document search interface unit 302 uses the search keyword set created by the question analysis unit 301 to obtain a document and a KWIC (ie, text surrounding the search keyword) searched through the document search engine 304.
[0043]
Here, the document search engine 304 is an element provided outside the present system 30 as an Internet search engine for searching for a Web document from the Internet (Web), for example. The document search engine 304 may be provided inside the system 30 and may correspond to a text search system that executes a keyword search from an internal or external document database. In short, any document search engine 304 can be used as long as a keyword search can be performed from a document database and a KWIC can be acquired.
[0044]
Here, it is assumed that the Internet is used as the document database, and the document search engine 304 is an external element of the system 30 as the Internet search engine.
[0045]
The method of extracting the KWIC is generally called "passage retrieval" and uses a technique for extracting the relevant part of a long document. An implementation of passage retrieval is disclosed, for example, in Marcin Kaszkiel and Justin Zobel, "Passage Retrieval Revisited", SIGIR-97, pp. 178-185.
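As a rough illustration of what a KWIC snippet is, the sketch below cuts a fixed-width character window around each keyword occurrence and merges overlapping windows. This is a deliberately simple stand-in and is not the passage retrieval method of the cited paper; the function name, the `width` parameter, and the `sep` separator are assumptions.

```python
def extract_kwic(text, keywords, width=20, sep=" ... "):
    """Return the text around every keyword occurrence, joined by `sep`."""
    spans = []
    for keyword in keywords:
        if not keyword:
            continue  # ignore empty keywords
        start = text.find(keyword)
        while start != -1:
            lo = max(0, start - width)
            hi = min(len(text), start + len(keyword) + width)
            spans.append((lo, hi))
            start = text.find(keyword, start + 1)
    spans.sort()
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:
            # Overlapping or touching windows become one snippet.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return sep.join(text[lo:hi] for lo, hi in merged)
```

A character window is used rather than a token window so that the same sketch applies to unsegmented Japanese text.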
[0046]
The document re-ranking unit 303 takes as input the pairs of retrieved document and KWIC acquired by the document search interface unit 302, calculates the expected value that a correct answer is contained in each KWIC, and ranks the documents in descending order of this expected value. This processing is called "re-ranking" because it computes a ranking different from the ranking of the documents output by the document search engine 304.
[0047]
Here, since it is difficult in practice to determine exactly the expected value that the KWIC contains a correct answer, it is approximated using various heuristics. The simplest heuristic is that the closer an expression (word sequence) in the KWIC is to the question sentence, the more likely the KWIC is to include an answer.
[0048]
In this specific example, the question sentence is first subjected to morphological analysis, and the unigrams, bigrams, and trigrams of the words in the question sentence are created. Next, a score S corresponding to the expected value that each KWIC contains an answer is calculated by the following formula (1).
[0049]
(Equation 1)

[0050]
Here, N_n (n = 1, 2, 3) is the number of distinct unigrams, bigrams, and trigrams of the question sentence that appear in a given KWIC. tf_n is the frequency of occurrence of the n-gram, and idf is the inverse document frequency. w_n is the weight for the n-gram and is set experimentally so that longer n-grams receive larger weights. Normalized_Factor is a weight for normalizing differences in KWIC length and is set experimentally so that the longer the KWIC, the larger its value.
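The quantities defined above can be combined as in the following minimal sketch. Formula (1) itself is not reproduced in this text, so the combination used here (a weighted sum of tf times idf over the matching n-grams, divided by the normalization factor) is an assumption, as are the names `ngrams` and `score`, the default weights, and the use of the KWIC length as Normalized_Factor. Tokens are assumed to come from the morphological analyzer.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def score(question, kwic, idf, weights=(1.0, 2.0, 3.0), norm=None):
    """Assumed form: S = (1 / norm) * sum_n w_n * sum_g tf(g) * idf(g).

    g ranges over the distinct question n-grams found in the KWIC
    (there are N_n of them for each n). Unknown n-grams get idf 1.0.
    """
    if norm is None:
        norm = max(len(kwic), 1)  # assumption: grows with KWIC length
    total = 0.0
    for n, w in zip((1, 2, 3), weights):
        tf = Counter(ngrams(kwic, n))
        for gram in set(ngrams(question, n)):
            if gram in tf:
                total += w * tf[gram] * idf.get(gram, 1.0)
    return total / norm


q = ["M", "contract", "money"]
k = ["M", "contract", "money", "is", "big"]
print(score(q, k, idf={}))
```

With the default weights, a KWIC that repeats a trigram of the question scores higher than one that merely contains the same three words in scattered positions, which matches the stated intent of weighting longer n-grams more heavily.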
[0051]
The total number of documents, which is required as the denominator when calculating the inverse document frequency, is the number of documents acquired from the document search engine 304. In this specific example, the number of documents acquired from the document search engine 304 can be set in advance and defaults to, for example, 10.
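A minimal sketch of computing the inverse document frequency over only the retrieved set follows. The text does not give the idf formula, so the common form log(N / df) is assumed here, where N is the number of documents acquired from the search engine (the denominator of the document frequency ratio df / N). The name `idf_table` is hypothetical.

```python
import math
from collections import Counter


def idf_table(kwics):
    """Map each term to log(N / df), with N = number of retrieved KWICs.

    `kwics` is a list of term lists (tokens or n-gram tuples), one per
    retrieved document. A term found in every KWIC gets idf 0.
    """
    total = len(kwics)
    df = Counter()
    for terms in kwics:
        df.update(set(terms))  # count each term once per document
    return {term: math.log(total / count) for term, count in df.items()}


table = idf_table([["M", "contract"], ["M", "money"]])
print(table["contract"] > table["M"])
```

Because N is small (10 by default), a keyword that appears in every retrieved KWIC contributes nothing under this assumed form, so a smoothed variant may be preferable in practice.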
[0052]
Further, in this specific example, whether or not the KWIC contains a word whose named entity type matches the question type may be reflected in the calculation of the expected value (score). In that case, the question analysis unit 301 determines the question type of the question sentence in advance, and the document search engine 304 extracts named entities from the KWIC of each retrieved document using the named entity extractor 306. Formula (2) below is then used to calculate the score.
[0053]
(Equation 2)

[0054]
Here, N_qt is the number of distinct words in the KWIC whose named entity type is the same as the question type. w_qt is the weight for the question type, and its optimal value is determined experimentally.
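The question-type term can be sketched as follows. Formula (2) itself is not reproduced in this text, so adding w_qt times N_qt to the n-gram score is an assumption, and the function name, the entity tuple layout, and the type labels are hypothetical. The entities are assumed to come from a named entity extractor run on the KWIC.

```python
def score_with_question_type(base_score, entities, question_type, w_qt=1.0):
    """Assumed form: S' = S + w_qt * N_qt.

    `entities` is a list of (surface_text, entity_type) pairs found in
    the KWIC. N_qt counts distinct surface texts whose type equals the
    question type.
    """
    n_qt = len({text for text, etype in entities if etype == question_type})
    return base_score + w_qt * n_qt


ents = [
    ("21 million dollars", "MONEY"),
    ("2.52 billion yen", "MONEY"),
    ("Y", "ORGANIZATION"),
]
print(score_with_question_type(2.0, ents, "MONEY", w_qt=0.5))
```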
(Specific examples of search results)
FIG. 4 is a diagram illustrating an example of a search result in the system according to this specific example. Here, an example is shown in which the question sentence "What is the contract money between M (player name) and Y (team name)?" is input.
[0055]
First, a user inputs a question sentence and selects an Internet search engine and the number of documents to be retrieved from it (process 401). In this example, "XXXXXX" is selected as the Internet search engine, and 10 is specified as the number of documents to retrieve.
[0056]
The question analysis unit 301 extracts a keyword set of “M, Y, contract, money” from the input question text, and determines that the question type is “money” (process 402).
[0057]
The document search interface unit 302 sends the search keyword to the Internet search engine 304, and obtains the URL and KWIC of the document from the search engine 304 (process 403).
[0058]
The document re-ranking unit 303 receives as input the set of retrieved documents and KWICs (corresponding to the URL, title, and summary sentence) acquired by the document search interface unit 302, calculates the expected value that each KWIC contains the correct answer, and ranks the documents in descending order of the expected value. Specifically, the documents are re-ranked based on the similarity between the question sentence and the KWIC and on the presence or absence of a named entity of the same type as the question type (process 404), and the result is shown on, for example, a display (display result 405).
[0059]
In this example, a document ranked, for example, ninth in the search results of the Internet search engine 304 is ranked first as a result of the re-ranking, because its KWIC contains the keywords "M, Y, contract" and also, for example, "about 21 million dollars (about 2.52 billion yen)".
[0060]
Therefore, in response to a question sentence from a user such as "What is the contract amount between M (player name) and Y (team name)?", a document likely to contain the answer can be presented at a high rank, and the KWIC paired with the document can be displayed as the basis for the answer.
[0061]
Note that the present invention is not limited to the above-described embodiment as it is, and can be embodied by modifying constituent elements in an implementation stage without departing from the scope of the invention. Various inventions can be formed by appropriately combining a plurality of constituent elements disclosed in the above embodiments. For example, some components may be deleted from all the components shown in the embodiment. Further, components of different embodiments may be appropriately combined.
[0062]
[Effects of the Invention]
As described above in detail, the present invention realizes a method that receives a question sentence expressed in natural language as an information search request and ranks documents based on the expected value of the event that a document includes an answer to the question sentence. This makes it possible to provide a natural sentence search device capable of outputting documents with a high degree of relevance to the question sentence.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a basic system configuration of a natural sentence search device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the principle operation of the embodiment.
FIG. 3 is a block diagram showing a system configuration of a specific example of a natural sentence search device to which the system of the embodiment is applied.
FIG. 4 is an exemplary view showing a display example of a search result regarding the system of the specific example.
[Explanation of symbols]
10: natural sentence search device, 100: document database, 101: question analysis unit,
102: document search unit, 103: document re-ranking unit,
301: question analysis unit, 302: document search interface unit,
303: document re-ranking unit, 304: document search engine,
305: morphological analyzer, 306: named entity extractor, 307: semantic category dictionary, 308: statistical classifier.

Claims

A natural sentence search device that receives, as an information search request, a question sentence expressed in natural language and outputs a set of documents matching the question sentence in order of relevance, the device comprising:
question analysis means for creating a set of search keywords from the input question sentence;
document search interface means for acquiring, based on the created set of search keywords, a set of documents retrieved by a designated document search engine and the text surrounding the search keywords in each document; and
document re-ranking means for ranking the pairs of a retrieved document and the text surrounding the search keywords, using, as a measure of the relevance of the document to the question sentence, the expected value of the event that the text surrounding the search keywords includes an answer to the question sentence.

The natural sentence search device according to claim 1, wherein the question analysis means includes means for determining, while creating the search keyword set from the input question sentence, a question type that classifies the question sentence by the type of answer it requests, and
wherein the document re-ranking means calculates the expected value based on the question type and the appearance distribution of the search keywords.

The natural sentence search device according to claim 1, wherein the question analysis means executes recognition processing of proper nouns or numerical expressions when determining the question type, and the document re-ranking means executes recognition processing of proper nouns or numerical expressions when calculating the expected value.

The natural sentence search device according to any one of claims 1 to 3, wherein the question analysis means uses the semantic category of each word when determining the question type, and the document re-ranking means uses the semantic category of each word when calculating the expected value.

The natural sentence search device according to claim 1 or any one of the above claims depending from it, further comprising means for displaying the retrieved documents ranked by the document re-ranking means, together with the text surrounding the search keywords, in descending order of relevance.

A search method applied to a natural sentence search device that receives, as an information search request, a question sentence expressed in natural language and outputs a set of documents matching the question sentence in order of relevance, the method comprising:
a question analysis step of creating a set of search keywords from the input question sentence;
a document search step of acquiring, based on the created set of search keywords, a set of documents retrieved by a designated document search engine and the text surrounding the search keywords in each document; and
a document re-ranking step of ranking the pairs of a retrieved document and the text surrounding the search keywords, using, as a measure of the relevance of the document to the question sentence, the expected value of the event that the text surrounding the search keywords includes an answer to the question sentence.

The search method according to claim 6, wherein the question analysis step creates the search keyword set from the input question sentence and, at the same time, determines a question type that classifies the question sentence by the type of answer it requests, and the document re-ranking step calculates the expected value based on the question type and the appearance distribution of the search keywords.

The search method according to claim 6, wherein the question analysis step executes recognition processing of proper nouns or numerical expressions when determining the question type, and the document re-ranking step executes recognition processing of proper nouns or numerical expressions when calculating the expected value.

The search method according to any one of claims 6 to 8, wherein the question analysis step uses the semantic category of each word when determining the question type, and the document re-ranking step uses the semantic category of each word when calculating the expected value.

The search method according to claim 6 or any one of the above claims depending from it, further comprising a step of displaying the retrieved documents ranked in the document re-ranking step, together with the text surrounding the search keywords, in descending order of relevance.

A program for causing a computer to realize a natural sentence search method that receives, as an information search request, a question sentence expressed in natural language and outputs a set of documents matching the question sentence in order of relevance, the program causing the computer to execute:
a question analysis procedure of creating a set of search keywords from the input question sentence;
a document search procedure of acquiring, based on the created set of search keywords, a set of documents retrieved by a designated document search engine and the text surrounding the search keywords in each document; and
a document re-ranking procedure of ranking the pairs of a retrieved document and the text surrounding the search keywords, using, as a measure of the relevance of the document to the question sentence, the expected value of the event that the text surrounding the search keywords includes an answer to the question sentence.

The program according to claim 11, wherein the question analysis procedure creates the search keyword set from the input question sentence and determines a question type that classifies the question sentence by the type of answer it requests, and the document re-ranking procedure calculates the expected value based on the question type and the appearance distribution of the search keywords.

The program according to claim 11, wherein the question analysis procedure executes recognition processing of proper nouns or numerical expressions when determining the question type, and the document re-ranking procedure executes recognition processing of proper nouns or numerical expressions when calculating the expected value.

The program according to any one of claims 11 to 13, wherein the question analysis procedure uses the semantic category of each word when determining the question type, and the document re-ranking procedure uses the semantic category of each word when calculating the expected value.

The program according to claim 11 or any one of the above claims depending from it, further causing the computer to execute a procedure of displaying the retrieved documents ranked in the document re-ranking procedure, together with the text surrounding the search keywords, in descending order of relevance.