JP2004362314A - Search information registration device, information search device, search information registration method

Info

Publication number: JP2004362314A
Application number: JP2003160624A
Authority: JP
Inventors: Hiromasa Tanaka (田中 宏征); Masaomi Nakajima (中嶋 正臣); Kutics Andrea (クティチ アンドレア); Akihiko Nakagawa (中川 明彦); Kiyotaka Tanaka (田中 清隆); Minoru Yamada (山田 実)
Original assignee: JAPAN SYSTEMS CO Ltd; NTT Data Corp
Current assignee: JAPAN SYSTEMS CO Ltd; NTT Data Group Corp
Priority date: 2003-06-05
Filing date: 2003-06-05
Publication date: 2004-12-24

Abstract

【課題】ユーザが意図する画像を簡単に検索ができる仕組みを提供することを課題とする。
【解決手段】オブジェクト抽出部１０６により登録する画像中のオブジェクトを抽出し、特徴量抽出部１０７が抽出されたオブジェクトの視覚的特徴を抽出し、単語推定部１０８が画像の視覚的特徴を表す単語を記憶する視覚分類辞書部１０１を参照して抽出したオブジェクトの視覚的特徴に対応する単語を抽出し、語句推定部１０９が語句と単語を関連付けて記憶する語彙辞書部１０２を参照して抽出した単語に関連付けられている語句を抽出して、画像データベース１０４に当該語句とオブジェクトを含む画像と関連付けて記憶するようにした。
【選択図】図１

[Problem] To provide a mechanism that allows a user to easily search for the image the user intends.
[Solution] An object extraction unit 106 extracts an object from an image to be registered. A feature amount extraction unit 107 extracts the visual features of the extracted object. A word estimation unit 108 refers to a visual classification dictionary unit 101, which stores words representing the visual features of images, and extracts the words corresponding to the visual features of the extracted object. A phrase estimation unit 109 refers to a vocabulary dictionary unit 102, which stores phrases in association with words, and extracts the phrases associated with the extracted words. The phrases are then stored in an image database 104 in association with the image containing the object.
[Selected drawing] FIG. 1

Description

【０００１】
【発明の属する技術分野】
本発明は、情報検索及び情報検索のためのデータベースを作成するための技術であって、特に画像を検索するのに好適な技術に関する。
【０００２】
【従来の技術】
従来、複数の画像を記憶した画像データベースから所望の画像を抽出する場合、何らかの方法で検索する画像の特徴を特定し、この特徴に基づいて検索を行っていた。その一例として、人が所望の画像を表現するために言葉を用いることが多いことから、画像を言葉により表現してテキスト情報としてデータベース化して検索することが行われていた。またテキストでは画像の特徴の表現が不十分であるため、画像の特徴量を抽出してそれをデータベース化して画像の検索が行われていた。
さらに検索するユーザの意図にあった画像検索を可能とするため、画像の潜在的な意味をインデックスとする手法や、２次元ＭＨＭＭによる画像全体の統計的モデル化を行ったり、ＥＭによる統計的学習と識別の応用、Ｈｏｐｆｉｅｌｄネットワークに概念の連想など様々な手法が提案されている。
【０００３】
この一例として、検索すべき画像の意味および認識による記述をユーザから受け取り、キーワード等の意味的検索を行うと共に、形態的もしくは構成要素の空間的関係に基づく類似画像検索を行い、これらの検索結果を統合し、検索結果を類似度基準に基づくランキングを行ってユーザに提供する方法などが提案されている（例えば、特許文献１）。
また、別の方法としては、画像データの所定領域を解析して、領域の色、テクスチャ、大きさなど特徴量を属性として付与するとともに、各領域間の位相関係を表したデータを作成しておき、各領域間の位相関係からオブジェクトで置換して、このオブジェクト、オブジェクトの相対位置及びオブジェクトの位相関係をキーワードとして抽出し、このキーワードを画像データと共に登録することが提案されている（例えば、特許文献２）。
【０００４】
【特許文献１】
特開平１０−２４０７７１号公報
【特許文献２】
特開平１０−４９５４２号公報
【０００５】
【発明が解決しようとする課題】
しかし、従来の意味ベースの画像検索システムでは、類似度を評価するモデルがオブジェクトなどの視覚的な属性の中で客観的に捉えられるものだけに依存しているものがほとんどであった。
そのため、ユーザがある状況の中で感じる画像の主観的な類似度と食い違うことが多く、いわゆるセマンティックギャップが発生してしまうという問題があった。そして、このセマンティックギャップのため、例えば、同じ動物の画像を検索する場合であっても、従来であれば座っているライオンの画像、右を向いているライオンの画像、走っているライオンの画像などすべて同じ「ライオン」という言葉に関連付けられていたため、ユーザが寝ているライオンの画像を検索しようとすると、全てのライオンの画像が抽出されてしまったり、又は走っているライオンの画像などのユーザの意図しない画像が優先的に選択されるなどの問題があった。そのため、膨大な画像の中からユーザの主観的な意図に基づいて画像を検索しようとしても、意図する画像を検索することは大変困難であった。
【０００６】
また、テキストに基づく画像検索だけでなく、ある画像に関連する画像を検索したい場合、その画像の捉え方、感じ方はユーザにより様々である。そのため、同じ画像でもユーザにより関連付ける言葉が異なり、ある画像に基づいてこれに関連する画像を検索することは非常に困難であった。
【０００７】
本発明は上述の問題点を解決するためになされたものであって、ユーザが意図する画像を簡単に検索できる仕組みを提供することを課題とする。
【０００８】
【課題を解決するための手段】
上述の課題を解決するため、本発明は画像中のオブジェクトの視覚的特徴を表す単語の検索と、当該単語に関連する意味を持った語句の検索の２段階の検索を取り入れることにより、より高い精度で検索が行える仕組みを提供する。
このための仕組みの一例として、本発明にかかる検索情報登録装置は、登録対象の画像のデータの入力を受け付ける登録受付手段と、受け付けた画像中のオブジェクトを抽出するオブジェクト抽出手段と、当該オブジェクトの視覚的特徴を抽出する特徴抽出手段と、画像中のオブジェクトの視覚的特徴と当該視覚的特徴を表す単語を関連付けて記憶する第１の記憶手段と、上記第１の記憶手段を参照して、抽出したオブジェクトの視覚的特徴に対応する単語を抽出する単語抽出手段と、意味概念をあらわす語句と上記視覚的特徴をあらわす単語とを関連付けて記憶する第２の記憶手段と、上記第２の記憶手段を参照して、上記オブジェクトの視覚的特徴を表す単語に関連付けられている語句を抽出する語句抽出手段と、抽出された語句を上記オブジェクトを含む画像と関連付けて検索情報として記憶する第３の記憶手段とを有することを特徴とする。
【０００９】
また、語句の類義語及び語句が属するカテゴリの類義カテゴリを記憶する類義語・類義カテゴリ記憶手段と、類義語・類義カテゴリ記憶手段を参照して、上記抽出された語句の類義語及び当該語句が属する類義カテゴリを検索する類義語検索手段を更に有し、上記第３の記憶手段は、上記オブジェクト画像に関連付けて検索された類義語及び類義カテゴリを更に記憶するようにしてもよい。
【００１０】
また、上記登録受付手段は、登録対象の画像に付加されている当該画像に関連した内容を有するテキストデータを更に受け付け、上記語句抽出手段は上記受け付けたテキストデータに含まれる語句を更に抽出するようにしてもよい。
【００１１】
また、上記第２の記憶手段には、さらに語句に関連付けて当該語句のカテゴリが記憶されており、上記語句抽出手段は、抽出した語句が属するカテゴリを更に抽出して上記第３の記憶手段に記憶するようにしてもよい。
また、上記オブジェクトの視覚的特徴は、オブジェクトの色、テクスチャ、形状により特徴を特定するようにしてもよい。
また、上記単語抽出手段は、複数の単語を抽出し、上記語句抽出手段は、一つのオブジェクトに対して複数の語句を抽出するようにしてもよい。
【００１２】
本発明の一の観点にかかる情報検索装置は、検索対象となる検索画像の入力を受け付ける検索受付手段と、受け付けた検索画像からオブジェクトを抽出するオブジェクト抽出手段と、抽出されたオブジェクトの視覚的特徴を抽出する視覚的特徴抽出手段と、画像の視覚的特徴を表す単語を記憶する第１の記憶手段と、上記第１の記憶手段を参照して、抽出した視覚的特徴に対応する単語を抽出する単語抽出手段と、意味概念をあらわす語句と上記視覚的特徴をあらわす単語を関連付けて記憶する第２の記憶手段と、画像と当該画像に関連する語句及び当該語句が属するカテゴリとを関連付けて記憶する第３の記憶手段と、上記第２の記憶手段を参照して、上記検索画像の視覚的特徴を表す単語に関連付けられている１又は複数の語句を抽出する語句抽出手段と、上記第３の記憶手段を参照して、上記抽出された１又は複数の語句に関連付けられている画像を検索して出力する検索出力手段とを有することを特徴とする。
【００１３】
上記受付手段は、検索対象となる画像に関するテキストデータをさらに受け付け、上記検索出力手段は、上記第３の記憶手段を参照して、上記語句抽出手段により抽出された語句及び上記テキストデータに含まれる語句に関連付けられている画像を検索して出力するようにしてもよい。
【００１４】
また、上記出力された画像のうち、ユーザにより選択された割合又は頻度に基づいて画像に関連付けられている語句及びカテゴリの重み付けを行う評価手段を更に有し、上記第３の記憶手段には、評価手段により評価された画像に対する各語句及び各カテゴリの重み付けを記憶し、上記画像検索出力手段は、上記重み付けを参照して抽出された語句及びカテゴリの重み付けが高い画像を優先して検索出力するようにしてもよい。
【００１５】
本発明の一の観点にかかる検索情報登録方法は、画像中のオブジェクトの視覚的特徴を表す単語を記憶する第１の記憶手段と、意味概念をあらわす語句と上記視覚的特徴をあらわす単語とを関連付けて記憶する第２の記憶手段とを有するコンピュータにより実行される方法であって、上記コンピュータが、登録対象画像中のオブジェクトを抽出するステップと、当該オブジェクトの視覚的特徴を抽出するステップと、上記第１の記憶手段を参照して、抽出したオブジェクトの視覚的特徴に対応する単語を抽出するステップと、上記第２の記憶手段を参照して、上記オブジェクトの視覚的特徴を表す単語に関連付けられている語句を抽出し、抽出された語句を上記オブジェクトを含む画像と関連付けて第３の記憶手段に記憶するステップとを行うことを特徴とする。
【００１６】
また情報検索方法としては、画像の視覚的特徴を表す単語を記憶する第１の記憶手段と、意味概念をあらわす語句と上記視覚的特徴をあらわす単語を関連付けて記憶する第２の記憶手段と、上記語句と当該語句に関連する画像とを関連付けて記憶する第３の記憶手段とを有するコンピュータにより実行される方法であって、コンピュータが、検索対象となる検索画像の入力を受け付けるステップと、受け付けた検索画像からオブジェクトを抽出するステップと、抽出されたオブジェクトの視覚的特徴を抽出するステップと、上記第１の記憶手段を参照して、抽出した視覚的特徴に対応する単語を抽出するステップと、上記第２の記憶手段を参照して、上記検索画像の視覚的特徴を表す単語に関連付けられている１又は複数の語句を抽出するステップと、上記第３の記憶手段を参照して、上記抽出された１又は複数の語句に関連付けられている画像を検索して出力するステップとを行うようにしてもよい。
【００１７】
【発明の実施の形態】
以下、図面を参照して本発明にかかる検索情報登録装置及び情報検索装置を適用した装置の一実施形態について説明する。図１に本実施形態にかかる情報処理装置１の全体構成の一例を示す。
本実施形態にかかる情報処理装置１はコンピュータにより構成されており、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、ＣＰＵが実行するコンピュータプログラム、コンピュータプログラムや所定のデータなどを記憶することができるＲＡＭ、ＲＯＭなどの内部メモリ及びハードディスクドライブなどの外部記憶装置により図１に示した機能ブロックを構成することができる。
図１に示した機能ブロックは、視覚分類辞書部１０１、語彙辞書部１０２、類義語・類義カテゴリー辞書部１０３、画像データベース１０４、登録受付部１０５、オブジェクト抽出部１０６、特徴量抽出部１０７、単語推定部１０８、語句推定部１０９、類義語推定部１１０、検索受付部１１１、出力部１１２、評価処理部１１３から構成されている。
【００１８】
視覚分類辞書部１０１は、画像の視覚的特徴を表す単語を記憶する記憶部である。視覚分類とはある画像特徴量を持つ画像を人が認識している名詞や形容詞などの言葉に置き換えるための分類である。本実施形態では、色、テクスチャ、形状の視覚的特徴から視覚分類を行っており、視覚分類辞書部１０１にはそれぞれに対応して、色テーブル、テクスチャテーブル、形状テーブルが記憶されている。
【００１９】
色テーブルは、図２に示すように、各色をＨＳＶ（ＨｕｅＳａｔｕｒａｔｉｏｎＶａｌｕｅ）色空間により表したテーブルである。この図２に示したもののうち、Ｈは色相（Ｈｕｅ）、Ｓは彩度（Ｓａｔｕｒａｔｉｏｎ）、Ｖは明度（Ｖａｌｕｅ）をそれぞれ表している。そして、例えば、黒であればＨは「ｎｕｌｌ」でＳ及びＶはそれぞれ０のように、色ごとにＨＳＶのそれぞれの値が定義されている。これらの値はＨＳＶ色空間に基づいた定義となっている。
テクスチャテーブルは、図３に示すように、テクスチャーごとにスポットやストライプなどの模様があるか否かが分析され定義されている。なお、このテクスチャーの分析は例えば、ガボールフィルタにより各テクスチャーを解析した結果に基づいてテーブルを作成することができる。
形状テーブルは、図４に示すように、円、矩形、三角形などの形状ごとに、その特徴を表すｃｕｒｖａｔｕｒｅ、ｃｉｒｃｕｌａｔｉｏｎ、ｉｎｖａｒｉａｎｔｍｏｍｅｎｔ、ｄｉｆｆｅｒｅｎｔｉａｌｉｎｖａｒｉａｎｔｓのそれぞれ特徴量が定義されている。
【００２０】
語彙辞書部１０２は、意味概念をあらわす語句と視覚的特徴をあらわす単語を関連付けて記憶する記憶部である。この語彙辞書部１０２には、図５に示すように意味概念をあらわす語句と、その語句が属するカテゴリーと、その語句の類義語と、その語句の説明とが関連付けられている。
カテゴリはその語句を所定の基準に基づいて分類したものであり、例えば、「ｃａｔ」に対しては、「ｂｅａｓｔ」、「ａｎｉｍａｌ」などのカテゴリーが相当する。また、説明は、語句の意味概念を説明するためのテキストデータであり、この説明文中に視覚的特徴を表す色、テクスチャー、形状などの単語が１又は複数含まれるようになっている。
【００２１】
類義語・類義カテゴリ辞書部１０３は、ある語句に対する類義語や、分類を表すカテゴリに類似するカテゴリを記憶したものである。
【００２２】
画像データベース１０４は、語句と画像とを関連付けてデータベース化したものである。この画像データベース１０４には、画像を識別するための画像ＩＤ、画像中のオブジェクトの領域を表す領域ＩＤ、オブジェクトの特徴量、特徴量重み、画像中のオブジェクトを表す語句、語句の類義語、各類義語の重み付け値、オブジェクトが属するカテゴリ、各カテゴリの重み付け値が記憶できるようになっている。
オブジェクトの特徴量は、特徴量抽出部１０７により算出されたオブジェクトの特徴量である。
特徴量重み付け値は、オブジェクトの色、テクスチャ、形状ごとの重み付けを表しており、図６中ｃは色、ｔはテクスチャ、ｓは形状のそれぞれ重み付け値を示している。
また、類義語重み付け値は、各類義語が当該画像のオブジェクトとどの程度の関連性を有しているかを表す値であり、図６の例では、左から順にｊａｇｕａｒが０．８、ｃｈｅｅｔａｈが０．７というように類義語ごとに対応して記憶されており、値が大きいほど関連性が高いことを表している。同様に、カテゴリーについても、各カテゴリーに対してカテゴリー重み付け値が対応するようになっており、図示の例ではｃｒｅａｔｕｒｅが０．９、ａｎｉｍａｌが０．７のように重み付けがされている。
【００２３】
オブジェクト抽出部１０６は、画像中からオブジェクトを抽出する処理を行う。
このオブジェクトの抽出は、例えば、オブジェクト抽出部１０６が、マルチスケール非線形色拡散（Ｍｕｌｔｉ−ｓｃａｌｅｉｎｈｏｍｏｇｅｎｅｏｕｓｃｏｌｏｒｄｉｆｆｕｓｉｏｎ）アルゴリズムを適用して、画像領域内を徐々に細分化していくことにより特徴的なオブジェクト領域を抽出するようにしてもよい。
【００２４】
特徴量抽出部１０７は、抽出したオブジェクトの視覚的特徴を抽出する処理を行う。視覚的特徴抽出としては、例えば、色はＨＳＶ色空間上でベクトル量子化を施したり、ガボールフィルタによる解析により抽出できる。
【００２５】
単語推定部１０８は、視覚分類辞書部１０１を参照して、抽出したオブジェクトの視覚的特徴に対応する単語を抽出する処理を行う。この処理は、例えば、単語推定部１０８が、視覚分類辞書部１０１を参照して、特徴量抽出部１０７により抽出されたデータに基づいて色、テクスチャ、形状のそれぞれに合致した単語を抽出するようになっている。
【００２６】
語句推定部１０９は、抽出された画像中のオブジェクトの視覚的特徴を表す単語に関連付けられている語句を推定して抽出する処理を行う。この処理は、語句推定部１０９が、語彙辞書部１０２の各語句の説明を参照して、視覚的特徴から抽出された単語が説明中に存在する場合に、当該語句を当該オブジェクトに関連する語句として抽出するようになっている。
また、語句推定部１０９は、画像に関連して当該画像の説明文などのテキストデータが付加されている場合には、当該テキストデータに含まれている語句を抽出する。
また、語句推定部１０９は、語彙辞書部１０２から抽出した語句のカテゴリを抽出する。
【００２７】
類義語推定部１１０は、抽出された語句の類義語及びカテゴリの類義カテゴリを抽出する処理を行う。
類義語、類義カテゴリの検索は、類義語・類義カテゴリー辞書部１０３を参照してある語句の類義語や、あるカテゴリの類義カテゴリーを検索するごとに類義語、類義カテゴリを抽出できるようになっている。
【００２８】
検索受付部１１１は、ユーザからの画像検索要求を受け付ける処理を行う。検索要求の受付は、所定のディスプレイ上に所定の検索要求入力画面を表示してそこから検索対象の画像等を入力できるようにすることができる。
なお、画像検索要求は、例えば、言葉を用いたテキストデータによる画像検索要求、所定の画像データに基づいた関連画像の検索要求、画像データとテキストデータとによる関連画像の検索要求のいずれでもよい。
【００２９】
出力部１１２は、画像データベース１０４を参照して、抽出された語句に関連付けられている画像を検索して所定のディスプレイなどに出力する処理を行う。
また、出力部１１２は、検索要求としてテキストデータを含む検索要求を受け付けた場合には、当該テキストデータに含まれる語句に関連付けられている画像を検索して出力することもできるように構成されている。
また、出力部１１２は、画像に関連付けて記憶されている語句に重み付けがされている場合には、当該重み付け情報を参照して、検索対象となった画像又はテキストデータから抽出された語句に対して高い重み付けがされている画像を優先して出力するようになっている。
【００３０】
評価処理部１１３は、出力部１１２により出力された画像のうち、ある語句や単語に基づいて抽出した画像を、ユーザが所望の画像として選択した頻度に基づいて、各画像に対する各語句の重み付け、特徴量に対する重み付けを計算し、これを画像データベース１０４に記憶するようになっている。この頻度は、例えば、語句の出現頻度や、いくつかの語句が組になって出現する頻度を表す共起頻度に基づいて重み付けを行うことができる。
【００３１】
次に、本発明にかかる検索画像登録方法の一実施形態について図７を参照して説明する。
図７において、まずユーザが情報処理装置１を使用して登録する画像データを入力すると、登録受付部１０５が画像の入力を受け付ける（Ｓ１０１）。
この登録される画像の一例を図８に示す。図８に示すように、入力されるデータは、画像データ１００１と、この画像の内容を説明するテキストデータの説明文１００２から構成されている。なお、入力データは、説明文１００２が付加されている必要はなく、画像データ１００１のみであってもよく任意である。また、画像に対する説明はビデオフレームに埋め込まれたテロップや音声も利用でき、これらをテキストデータに変換して同様な処理をすることができる。
なお、画像は前処理として量子化処理を行い、量子化されたデータを取り込むようにしてもよい。また、登録処理は、登録受付部１０５が所定の記憶媒体に記憶されている画像を読み取ることにより、またインターネット等の所定のネットワークを介して受け付けるようにしてもよく任意である。
【００３２】
画像を受け付けると、オブジェクト抽出部１０６が入力画像中の特徴オブジェクトを抽出する（Ｓ１０２）。
この抽出処理は、オブジェクト抽出部１０６が、以下の式に基づいて計算を行い、オブジェクトの抽出をすることができる。
【００３３】
【数１】

【００３４】
ここで、Ｉは色特徴ベクトルのことであり、たとえば、ＨＳＶ色データのベクトルを表す。また、ｘ、ｙは画像中の画素の位置、ｔは拡散回数（計算の繰り返し数）、ｄｉｖは発散（ｄｉｖｅｒｇｅｎｃｅ）、ｇｒａｄはガウス関数との畳み込みによる平滑化演算子を表す。この式に基づき、オブジェクト抽出部１０６が所定回数（ｔ）拡散処理を行うことにより特徴的なオブジェクトを抽出することができる。
【００３５】
なお、上述のｃ（ｘ，ｙ，ｔ）は拡散率（伝導率）関数を表し、このアルゴリズムでの多変量色拡散の計算ではｃ（ｘ，ｙ，ｔ）＝１／｛１＋（‖ｇｒａｄ｛Ｉ｝‖／Ｋ）^２｝を用いることができる。そして、ここに示すＫはテクスチャに依存する適応型伝導パラメータであり、これに基づいて拡散処理を制御（調整）するものであり、このパラメータ値は実験的に求めることができる。
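Equation (1) is not reproduced in this text. Judging only from the symbol definitions in paragraph 【００３４】 (I, x, y, t, div, grad) and the diffusivity function c(x, y, t) given above, it plausibly has the standard anisotropic diffusion form. The following is a reconstruction under that assumption, not the published formula:

```latex
\frac{\partial I(x,y,t)}{\partial t}
  = \operatorname{div}\bigl( c(x,y,t)\,\operatorname{grad} I(x,y,t) \bigr),
\qquad
c(x,y,t) = \frac{1}{1 + \left( \lVert \operatorname{grad} I \rVert / K \right)^{2}}
```

With this form, c approaches 1 where the gradient is small relative to K (strong smoothing) and approaches 0 where it is large (boundaries are preserved), which matches the role the text assigns to the adaptive conductance parameter K.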
【００３６】
また、上記ｇｒａｄを計算するために、２つのピクセルＰ_ｉ，ｊ，Ｐ_{ｉ＋ｍ，ｊ＋ｎ}の色差を、以下の式（２）に示すように定義することができる。ここで、添え字のｍおよびｎは、４方向の隣接ピクセルを表している。なお、ｗ_１、ｗ_２、ｗ_３は重み定数である。
【００３７】
【数２】

【００３８】
そして、具体的なパラメータを求めるためには、まず、スケール（σ）パラメータに応じて決まるピクセル近傍でのきめの粗さ（ｔｅｘｔｕｒｅｄｎｅｓｓ）を求め、求めたきめの粗さを評価し、所定のきめ粗さの条件（例えば、きめ粗さの閾値２０％）に応じて、テクスチャあるいは色の勾配を求める処理を行う。このきめの粗さは、輪郭抽出した画像データ上の処理対象範囲内における輪郭となる画素の比率から求められる。なお、輪郭の抽出は、一次微分法、ｚｅｒｏ−ｃｒｏｓｓ法、スーベル法、キャニー法などの周知手段を用いることにより行う。
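The texturedness measure described in this paragraph (the ratio of contour pixels inside the processing window, compared against a threshold such as 20%) can be sketched as follows. This is a minimal illustration, assuming a binary edge map has already been produced by some edge detector. The function names, the window parameter `half`, and the string labels are hypothetical and do not come from the patent.

```python
def texturedness(edge_map, cy, cx, half):
    """Ratio of edge pixels in the (2*half+1)-square window centred on (cy, cx).

    edge_map is a list of rows of 0/1 values; the window is clipped at the borders.
    """
    rows, cols = len(edge_map), len(edge_map[0])
    edge_count = 0
    total = 0
    for y in range(max(0, cy - half), min(rows, cy + half + 1)):
        for x in range(max(0, cx - half), min(cols, cx + half + 1)):
            total += 1
            if edge_map[y][x]:
                edge_count += 1
    return edge_count / total


def gradient_kind(edge_map, cy, cx, half=2, threshold=0.20):
    """Choose which gradient drives the conductance parameter K.

    Above the threshold the texture gradient is used, otherwise the colour gradient.
    The 20% default follows the example threshold quoted in the text.
    """
    if texturedness(edge_map, cy, cx, half) > threshold:
        return "texture"
    return "color"
```

In this sketch the window size stands in for the scale (σ) dependent neighbourhood mentioned in the text. How σ maps to a window size is not specified in the source.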
【００３９】
そして、きめの粗さが所定の閾値より大きい場合は、ピクセル近傍におけるアングル差及び色差（すなわち角および色の度数分布の差分）を、次の式（３）及び（４）により求め、テクスチャ勾配Ｇ_{Ｔｅｘｔｕｒｅ}＝Ｗ_１＊Ｄ＋Ｗ_２＊ｄ_１を計算し、適応型伝導パラメータＫ＝ｆ_１（Ｇ_{Ｔｅｘｔｕｒｅ}）を求める。この際、アングル差及び色差は、対象画素の四近傍周辺領域について算出して平滑化処理する。この四近傍周辺領域は、５×５、７×７というようなｎ×ｎサイズの画素領域を基本とし、対象画素領域において上下左右の四方向に隣接する周辺画素領域をいう。なお、Ｗ_１、Ｗ_２は重み定数である。
【００４０】
【数３】

【００４１】
【数４】

【００４２】
ここで、式（３）中のＩ_ｉ、Ｊ_ｊは、アングルヒストグラムの値であり、角の度数分布について、各区画に含まれる度数の行列を表す。また、Ａは行列で、その要素は２つの方向（角）の類似度であり、アングルヒストグラムの角度数に応じた類似度をテーブルとして定義している。また、Ｎはアングルヒストグラムの分割総数であり、ここでは１０°刻みに分割した３６を用いる。なお、アングルヒストグラムはテクスチャ特徴から画素においてエッジの角度に関するものであり、テクスチャ特性は、ウェーブレット・フィルタやガボール・フィルタなどで求めることができる。
【００４３】
また、式（４）中のｈ及びｇは、色を量子化して区画に分けた度数分布ヒストグラムであり、ｄ_１は、その共通部分を表す。また、Ｍは色ヒストグラムの分割総数であり、ここではＨＳＶ色空間を７２に分割したものを用いる。
【００４４】
一方、このきめの粗さが所定の閾値より小さい場合は、ピクセル近傍における色差を、前述の式（２）より色勾配Ｇ_{Ｃｏｌｏｒ}＝ｄを計算し、適応型伝導パラメータＫ＝ｆ_２（Ｇ_{Ｃｏｌｏｒ}）を求める。この際、色差は、対象画素の四近傍連接領域について算出して平滑化処理する。この四近傍連接領域とは、対象画素において上下左右の四方向に連接する連接画素領域をいう。
【００４５】
このようにして、拡散パラメータである適応型伝導パラメータにより拡散処理率を制御し、きめの粗い部分を「不要な（ｎｏｉｓｙ）」領域とみなし、境界を保存したまま拡散させることによりそのテクスチャを除去する。そして、最後に領域の拡大と併合を適用して特徴的なオブジェクトを抽出することができる。
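The edge-preserving behaviour described here can be illustrated with a small discrete diffusion loop that uses the conductance function c = 1 / (1 + (‖grad I‖ / K)^2) quoted in paragraph 【００３５】. This is a simplified sketch under several assumptions that are not in the patent: the image is a scalar grid rather than an HSV colour vector, K is a fixed constant rather than the adaptive, texture-dependent parameter, and the step size `lam` and the 4-neighbour update scheme are illustrative choices.

```python
def conductance(grad_norm, k):
    """Diffusivity from the text: close to 1 for small gradients, close to 0 for large ones."""
    return 1.0 / (1.0 + (grad_norm / k) ** 2)


def diffuse(image, k, steps, lam=0.2):
    """Run `steps` iterations of 4-neighbour diffusion on a grid of floats.

    lam is a hypothetical step size; values up to 0.25 keep this update stable.
    """
    rows, cols = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(steps):
        updated = [row[:] for row in current]
        for y in range(rows):
            for x in range(cols):
                flow = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        diff = current[ny][nx] - current[y][x]
                        # Large differences (boundaries) get a small conductance.
                        flow += conductance(abs(diff), k) * diff
                updated[y][x] = current[y][x] + lam * flow
        current = updated
    return current
```

Small fluctuations inside a region are smoothed away, while a step much larger than K is left almost untouched. That is the property the text relies on before applying region growing and merging.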
【００４６】
オブジェクトの抽出がされると、特徴量抽出部１０７が抽出したオブジェクトの特徴量を抽出する（Ｓ１０３）。
本実施形態では、特徴量抽出部１０７が、抽出したオブジェクトの色をＨＳＶ色空間に変換したり、ガボールフィルタによりテクスチャーを解析したり、オブジェクトの形状を抽出したりする。
【００４７】
特徴量が抽出されると、単語推定部１０８が視覚分類辞書部１０１を参照して、抽出した特徴量から視覚的特徴を表す単語を推定して抽出する（Ｓ１０４）。
この処理は、例えば、単語推定部１０８が、特徴量としてＨＳＶの値がそれぞれＨ（ｎｕｌｌ）、Ｓ（０）、Ｖ（０）、であれば色テーブルを参照して対応する単語「黒」を抽出する。同様に、テクスチャのスポット、ストライプなどの特徴から対応するテクスチャ名を抽出し、また円、矩形などの形状の特徴量から対応する形状を抽出する。
【００４８】
オブジェクトの特徴量に対応する単語が抽出されると、語句推定部１０９が、語彙辞書部１０２を参照してこれらの単語をその説明中に有している語句及びその語句のカテゴリを抽出する（Ｓ１０５）。
この処理は、例えば、語句推定部１０９が「ブラウン」、「スポット」、「サーキュラー」などの抽出された単語から、これら単語が語句の説明に使用されている語句として「ｃａｔ」を抽出する。
なお、この際、登録対象画像の説明文などのテキストデータが付加された形で入力された場合には、語句推定部１０９は当該テキストデータから語句を抽出する。この語句の抽出は例えば、文法的に意味のある分節でテキストを区切ることにより抽出するようにしてもよい。
【００４９】
そして、類義語推定部１１０が、類義語・類義カテゴリ辞書部１０３を参照して、抽出された語句の類義語及び類義カテゴリを推定する（Ｓ１０６）。
これにより、例えば、抽出された語句が「ｃａｔ」であれば、それの類義語として「ジャガー」、「チーター」、「ライオン」が抽出され、カテゴリーとして「けもの」に類似するカテゴリとして「動物」のカテゴリが抽出される。
【００５０】
この画像と単語、語句及びカテゴリー、類義語・類義カテゴリーの関係を図９に示す。図９に示すように、所定の画像１００１に対して、まず推定された単語１００３として色の特徴を表すブラウン、テクスチャの特徴を表すスポット、形状の特徴を表すサーキュラーが関連付けられる。そして、これら単語を含む語句として１００４の「ｃａｔ」が抽出される。さらに、その類義語として１００５のジャガー、チーター、ライオンが関連付けられる。またカテゴリとして１００６のけものが関連付けられ、その類義カテゴリーとして１００７の動物が関連付けられるようになる。なお、図９中のカッコは画像中のオブジェクトに対する重み付けを表している。
【００５１】
そして、類義語推定部１１０が、抽出された語句及びカテゴリ、類義語と類義カテゴリとを画像に関連付けて画像データベース１０４に登録して（Ｓ１０７）、処理を終了する。なお、この際、登録受付部１０５により付与した画像を識別する画像ＩＤ、オブジェクト抽出部１０６により特定したオブジェクトの領域ＩＤ、特徴量抽出部１０７により抽出した画像特徴量も画像データベース１０４に登録する。
【００５２】
次に、登録された画像を検索する場合の処理の一例について図１０を参照して説明する。
図１０において、まず検索画面からユーザが検索画像を入力して検索要求を行うと、検索受付部１１１が画像検索要求を受付ける（Ｓ２０１）。
検索要求入力画面の一例を図１１に示す。図１１に示すように、検索対象の画像を入力する検索画像指定欄２００１、また必要なキーワードなど画像を検索するためのテキストデータを入力するキーワード入力欄２００２が設けられている。そして、これらに検索対象の画像又はキーワードを入力して検索ボタン２００３がユーザにより指示されることにより検索要求がされるようになっている。
なお、検索の際のキーワードなどのテキストデータを入力するか否かは任意である。また、検索対象画像は所定のユーザが外部から量子化された画像データを取り込んで入力してもよいし、また予め用意された所定の量子化された画像の中から選択して入力してもよく任意である。
【００５３】
画像検索要求を受け付けると、オブジェクト抽出部１０６が、上述のＳ１０２の処理と同様に検索対象画像中の特徴オブジェクトを抽出する（Ｓ２０２）。
【００５４】
オブジェクトの抽出がされると、特徴量抽出部１０７が抽出したオブジェクトの特徴量を抽出する（Ｓ２０３）。本実施形態では、特徴量抽出部１０７が、抽出したオブジェクトの色をＨＳＶ色空間に変換したり、ガボールフィルタによりテクスチャーを解析したり、オブジェクトの形状を抽出してオブジェクトの視覚的特徴量を抽出する。
【００５５】
特徴量が抽出されると、単語推定部１０８が視覚分類辞書部１０１を参照して、抽出した特徴量に対応する単語を推定して抽出する（Ｓ２０４）。この処理は、前述のＳ１０４の処理と同様に、例えば、単語推定部１０８は、特徴量としてＨＳＶの値がそれぞれＨ（ｎｕｌｌ）、Ｓ（０）、Ｖ（０）、であれば色テーブルを参照して対応する単語「黒」を抽出する。同様に、テクスチャのスポット、ストライプなどの特徴から対応するテクスチャ名を抽出し、また円、矩形などの特徴から対応する形状を抽出する。
【００５６】
視覚特徴量に関連する単語が抽出されると、語句推定部１０９が、語彙辞書部１０２を参照してこれらの単語をその説明中に有している語句及びその語句のカテゴリを抽出する（Ｓ２０５）。
【００５７】
語句が抽出されると、類義語推定部１１０が、前述のＳ１０６の処理と同様に、類義語・類義カテゴリ辞書部１０３を参照して、抽出された語句の類義語及び類義カテゴリを抽出する（Ｓ２０６）。
【００５８】
そして出力部１１２が、画像データベース１０４を参照して、抽出した語句及びカテゴリ、類義語及び類義カテゴリに関連付けられている画像を抽出する（Ｓ２０７）。
この抽出処理は、まず出力部１１２が、画像データベース１０４の画像の説明を参照して、説明の中に当該語句を含みかつ同じカテゴリに属する画像を抽出する。同様に、出力部１１２が、説明の中に類義語を含みかつ類義カテゴリに属する画像を抽出する。また、説明中に当該語句を含みかつ類義カテゴリに属する画像を抽出したり、説明中に類義語を含みかつ同じカテゴリーに属する画像を抽出したりしてもよい。これらの処理を行うことにより出力部１１２が１又は複数の選択候補画像を抽出する。
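The candidate selection of step S207 can be sketched as a filter over database records. The two base conditions (phrase with the same category, synonym with a synonym category) and the two optional cross conditions follow the paragraph above. The record layout (`image_id`, `description`, `category`), the function name, and the `cross` flag are hypothetical, and whitespace tokenisation of the description is a simplification.

```python
def select_candidates(records, phrase, category, synonyms, similar_categories, cross=False):
    """Return the IDs of images matching the phrase/category conditions of S207."""
    synonym_set = set(synonyms)
    similar_set = set(similar_categories)
    selected = []
    for record in records:
        words = set(record["description"].lower().split())
        has_phrase = phrase in words
        has_synonym = bool(words & synonym_set)
        same_category = record["category"] == category
        similar_category = record["category"] in similar_set
        # Base conditions: phrase + same category, or synonym + synonym category.
        match = (has_phrase and same_category) or (has_synonym and similar_category)
        if cross:
            # Optional conditions: phrase + synonym category, or synonym + same category.
            match = match or (has_phrase and similar_category) or (has_synonym and same_category)
        if match:
            selected.append(record["image_id"])
    return selected
```

For example, with the phrase "cat", category "beast", synonyms such as "lion", and the synonym category "animal", a record described as "a lion at rest" in category "beast" is returned only when `cross=True`.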
【００５９】
出力部１１２は、抽出した１又は複数の画像を所定のディスプレイなどに認識可能な形態で出力し、ユーザに対して所望の画像を選択するよう要求する（Ｓ２０８）。画像を出力した際の一例を図１２に示す。図１２に示すように、検索結果画面上には、数枚の画像が表示され、そのうちからユーザが意図する画像を選択できるようになっている。出力を行う際、出力部１１２は、画像に対する語句の重み付けに基づいて、検索要求された画像に近いと判断された画像、即ち抽出された語句、カテゴリに対する重み付けが高い順に画像を優先的に表示する。
そして、ユーザは所望の画像がない場合には、次の画像の候補を要求することで、出力部１１２は次に重み付けが高く優先度が高い画像を表示し、ユーザからの選択を待つようにしてもよい。
【００６０】
ユーザが意図する画像をポインティングデバイスなどで指示するなどして選択すると、出力部１１２は当該画像のみをディスプレイ等に表示する（Ｓ２０９）。
これにより、ユーザは選択した画像を参照したり、ダウンロードしたりして利用することができるようになる。
【００６１】
また、評価処理部１１３は、ユーザにより画像が選択されたことにより、画像データベース１０４に記憶されている当該画像に対する語句及びカテゴリの重み付けを更新し、今回の選択結果を重み付けにフィードバックして（Ｓ２１０）、処理を終了する。
このフィードバックは、例えば、当該画像を検索するに当たって「ｃａｔ」という語句を抽出した場合には、当該画像に対する「ｃａｔ」の重み付けを加算することにより行う。なお、この重み付けの計算は、当該画像が選択された全回数のうち当該語句が抽出された割合、当該語句に基づいて画像が抽出され、ユーザに選択された頻度などに応じて計算しても良く任意である。
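One of the weighting options mentioned above, the proportion of an image's selections in which a given phrase had been extracted, can be sketched as follows. This is only an illustration of that one option. The class name, its methods, and the in-memory counters are hypothetical stand-ins for the weights stored in the image database 104.

```python
from collections import defaultdict


class FeedbackWeights:
    """Tracks, per image, how often it was selected and with which extracted phrases."""

    def __init__(self):
        self.selected = defaultdict(int)  # image_id -> number of times the user chose it
        self.hits = defaultdict(int)      # (image_id, phrase) -> selections where phrase was extracted

    def record_selection(self, image_id, extracted_phrases):
        """Feed one user selection back into the counters (step S210)."""
        self.selected[image_id] += 1
        for phrase in set(extracted_phrases):
            self.hits[(image_id, phrase)] += 1

    def weight(self, image_id, phrase):
        """Share of this image's selections in which the phrase had been extracted."""
        count = self.selected.get(image_id, 0)
        if count == 0:
            return 0.0
        return self.hits.get((image_id, phrase), 0) / count

    def rank(self, image_ids, phrase):
        """Order candidate images so that higher-weighted ones come first."""
        return sorted(image_ids, key=lambda image_id: self.weight(image_id, phrase), reverse=True)
```

The simpler option in the text, adding a fixed increment to the weight of "cat" each time the image is chosen, would replace the ratio in `weight` with a running sum.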
【００６２】
以上のように、本実施形態によれば、検索用画像を登録する際、オブジェクト抽出部１０６により登録する画像中の特徴的なオブジェクトを抽出し、特徴量抽出部１０７が抽出されたオブジェクトの視覚的特徴を抽出し、単語推定部１０８が視覚分類辞書部１０１を参照して抽出したオブジェクトの視覚的特徴に対応する単語を抽出し、語句推定部１０９が語彙辞書部１０２を参照して抽出した単語に関連付けられている語句を抽出して、画像データベース１０４に当該語句とオブジェクトを含む画像とを関連付けて記憶するようにしたことから、画像が含んでいるオブジェクトにより関連性が深い語句を画像と対応付けてデータベース化することができる。
これにより、この画像データベース１０４を参照することにより、ユーザが特定した語句に基づいた画像の検索が容易にできるようになる。特に、オブジェクトの視覚的特徴を表す単語の検索と、当該単語に関連した意味を有する語句の検索の２段階を行って語句を抽出したことにより、オブジェクトの視覚的特徴を反映させ、かつ、オブジェクトの意味に関連する語句を画像に関連付けることができ、これによりユーザの意図に合った画像の検索が可能となる。
【００６３】
また、類義語推定部１１０により抽出された語句の類義語及び／又は当該語句が属する類義カテゴリを検索し、画像データベース１０４に画像に関連付けて検索された類義語及び／又は類義カテゴリを更に記憶するようにしたことから、画像を一つの語句やカテゴリーだけでなく、類義語や類義カテゴリとも関連付けることができ、検索を行う際の幅を広げることでユーザが意図する画像を容易に検索するための画像データベースの作成が可能となる。
【００６４】
また、登録受付部１０５が、登録対象の画像に付加されている当該画像に関連した内容を有する説明文などのテキストデータを受け付け、語句推定部１０９が受け付けたテキストデータに含まれる語句を抽出するようにしたことから、もともと画像に関連の深い画像の説明文などのテキストデータの内容を反映することができ、よりユーザの意図に適した検索が可能となるデータベースを作成できる。
【００６５】
また、画像を検索する際、検索受付部１１１により検索対象となる検索対象画像の入力を受付け、オブジェクト抽出部１０６が当該画像中の特徴的なオブジェクトを抽出し、特徴量抽出部１０７が抽出されたオブジェクトの視覚的特徴を抽出し、単語推定部１０８が視覚分類辞書部１０１を参照して抽出されたオブジェクトの視覚的特徴に対応する単語を抽出し、語句推定部１０９が語彙辞書部１０２を参照して抽出した単語に関連付けられている語句を抽出し、出力部１１２が画像データベース１０４を参照して、抽出された語句に関連付けられている画像を検索して出力するようにしたことから、検索対象画像に基づいてユーザが意図する関連画像を簡単に検索することができる。
これにより、ユーザが意図した関連する画像の検索が容易にできるようになる。特に、オブジェクトの視覚的特徴を表す単語の検索と、当該単語に関連する語句の検索の２段階を画像検索に取り入れたことで、オブジェクトの視覚的特徴を反映させ、かつ、オブジェクトの意味に関連する語句を関連付けることができユーザの意図に合った画像の検索が可能となる。
【００６６】
また、検索受付部１１１が検索対象となるテキストデータを受け付け、出力部１１２が抽出された語句及びテキストデータに含まれる語句に関連付けられている画像を検索して出力するようにすれば、予め検索対象画像に関連付けられている説明文などのテキストデータを利用してよりユーザの意図に適合した精度の高い画像検索をすることができる。
【００６７】
また、評価処理部１１３が出力された画像のうちユーザにより所望の画像として選択された割合又は頻度に基づいて、当該画像に対する語句の検索重み付けを行い、画像に関連付けてこの語句ごとの重み付けを画像データベース１０４に記憶するようにし、出力部１１２がこの重み付けを参照して、入力された語句の重み付けが高い画像を優先して出力するようにすれば、ユーザによるフィードバックを反映して、ユーザの意図する画像をより優先的に検索出力することができる。
【００６８】
なお、上述の実施形態では、類義語や類義カテゴリまで検索する例について説明したが、類義語や類義カテゴリまで検索する必要がない場合にはこれを行わなくともよい。また、語句又はカテゴリのいずれか一方のみ用いるようにしてもよい。
【００６９】
また、上述の実施形態では、画像に基づく検索の例について説明したが、画像を用いずにテキストデータのみに基づいて画像検索を行うようにしてもよい。
【００７０】
本実施形態の情報処理装置１用のコンピュータプログラムを、コンピュータ読み取り可能な媒体（ＦＤ、ＣＤ−ＲＯＭ等）に格納して配布してもよいし、搬送波に重畳し、通信ネットワークを介して配信することも可能である。
【００７１】
【発明の効果】
本発明によれば、ユーザが意図する画像を簡単に検索ができる。
【図面の簡単な説明】
【図１】本発明にかかる検索情報登録装置及び情報検索装置を適用した情報処理装置の一実施形態を示した機能ブロック図。
【図２】本実施形態にかかる視覚分類辞書部の色テーブルに記憶されるデータの一例を示した図。
【図３】本実施形態にかかる視覚分類辞書部のテクスチャーテーブルに記憶されるデータの一例を示した図。
【図４】本実施形態にかかる視覚分類辞書部の形状テーブルに記憶されるデータの一例を示した図。
【図５】本実施形態にかかる語彙辞書部に記憶されるデータの一例を示した図。
【図６】本実施形態にかかる画像データベースに記憶されるデータの一例を示した図。
【図７】本実施形態にかかる画像登録処理の流れを示した処理フロー。
【図８】本実施形態にかかる登録画像の一例を示した図。
【図９】本実施形態にかかる画像と単語、語句、カテゴリー、類義語、類義カテゴリーの関連を示した模式図。
【図１０】本実施形態にかかる画像検索処理の流れを示した処理フロー。
【図１１】本実施形態にかかる検索画像の入力画面の一例を示した図。
【図１２】本実施形態にかかる検索結果出力画面の一例を示した図。
【符号の説明】
１情報処理装置
１０１視覚分類辞書部
１０２語彙辞書部
１０３類義語・類義カテゴリー記憶部
１０４画像データベース
１０５登録受付部
１０６オブジェクト抽出部
１０７特徴量抽出部
１０８単語推定部
１０９語句推定部
１１０類義語推定部
１１１検索受付部
１１２出力部
１１３評価処理部

[0001]
[Technical field of the invention]
The present invention relates to techniques for information retrieval and for creating a database used in information retrieval, and in particular to a technique suitable for retrieving images.
[0002]
[Prior art]
Conventionally, when extracting a desired image from an image database storing a plurality of images, the features of the image to be retrieved have been specified by some method, and the search has been performed based on those features. As one example, because people often use words to express a desired image, images have been described in words, stored in a database as text information, and searched. In addition, because text alone expresses the features of an image insufficiently, feature quantities have been extracted from images and stored in a database for image retrieval.
Furthermore, to enable image retrieval that matches the searching user's intention, various methods have been proposed, such as indexing the latent semantics of images, statistically modeling the entire image with a two-dimensional MHMM, applying statistical learning and classification by EM, and associating concepts using a Hopfield network.
[0003]
As one example, a method has been proposed that receives from the user a semantic and cognitive description of the image to be retrieved, performs a semantic search using keywords or the like together with a similar-image search based on the morphology or the spatial relationships of the constituent elements, integrates these search results, ranks them according to a similarity criterion, and provides them to the user (for example, Patent Document 1).
As another method, it has been proposed to analyze predetermined regions of image data, assign feature quantities such as the color, texture, and size of each region as attributes, create data representing the topological relationships between the regions, replace the regions with objects based on those topological relationships, extract the objects, their relative positions, and their topological relationships as keywords, and register these keywords together with the image data (for example, Patent Document 2).
[0004]
[Patent Document 1]
JP-A-10-240771
[Patent Document 2]
JP-A-10-49542
[0005]
[Problems to be solved by the invention]
However, in most conventional semantics-based image retrieval systems, the model for evaluating similarity depends only on those visual attributes, such as objects, that can be captured objectively.
As a result, the evaluated similarity often disagrees with the subjective similarity that a user perceives in a given situation, producing the so-called semantic gap. Because of this semantic gap, even when searching for images of the same animal, an image of a sitting lion, an image of a lion facing right, and an image of a running lion have conventionally all been associated with the same word “lion”. Consequently, when a user tries to find an image of a sleeping lion, either every lion image is extracted or images the user did not intend, such as a running lion, are selected preferentially. It has therefore been very difficult to retrieve the intended image from a huge number of images based on the user's subjective intention.
[0006]
Moreover, beyond text-based image retrieval, when a user wishes to find images related to a given image, the way that image is perceived and felt varies from user to user. The words associated with the same image therefore differ between users, and it has been very difficult to retrieve related images on the basis of a given image.
[0007]
The present invention has been made to solve the problems described above, and its object is to provide a mechanism that allows a user to easily search for the intended image.
[0008]
[Means for Solving the Problems]
To solve the problem described above, the present invention provides a mechanism that achieves more accurate retrieval by introducing a two-stage search: a search for words representing the visual features of an object in an image, and a search for phrases whose meaning is related to those words.
As one example of such a mechanism, a search information registration device according to the present invention comprises: registration receiving means for receiving input of the data of an image to be registered; object extraction means for extracting an object in the received image; feature extraction means for extracting the visual features of the object; first storage means for storing the visual features of objects in images in association with words representing those visual features; word extraction means for extracting, with reference to the first storage means, the words corresponding to the visual features of the extracted object; second storage means for storing phrases representing semantic concepts in association with the words representing the visual features; phrase extraction means for extracting, with reference to the second storage means, the phrases associated with the words representing the visual features of the object; and third storage means for storing the extracted phrases as search information in association with the image containing the object.
[0009]
The device may further comprise synonym / synonym category storage means for storing synonyms of phrases and synonym categories of the categories to which phrases belong, and synonym search means for searching, with reference to the synonym / synonym category storage means, for synonyms of the extracted phrase and for synonym categories of the category to which the phrase belongs. The third storage means may then further store the retrieved synonyms and synonym categories in association with the object image.
[0010]
The registration receiving means may further receive text data that is attached to the image to be registered and whose content relates to that image, and the phrase extraction means may further extract phrases contained in the received text data.
[0011]
The second storage means may further store, in association with each phrase, the category of that phrase, and the phrase extraction means may further extract the category to which the extracted phrase belongs and store it in the third storage means.
The visual characteristics of the object may be specified by the color, texture, and shape of the object.
Further, the word extracting unit may extract a plurality of words, and the phrase extracting unit may extract a plurality of phrases for one object.
[0012]
An information search device according to one aspect of the present invention comprises: search receiving means for receiving input of a search image to be searched; object extraction means for extracting an object from the received search image; visual feature extraction means for extracting the visual features of the extracted object; first storage means for storing words representing visual features of images; word extraction means for extracting, with reference to the first storage means, the words corresponding to the extracted visual features; second storage means for storing phrases representing semantic concepts in association with the words representing the visual features; third storage means for storing images in association with the phrases related to those images and the categories to which the phrases belong; phrase extraction means for extracting, with reference to the second storage means, one or more phrases associated with the words representing the visual features of the search image; and search output means for searching for and outputting, with reference to the third storage means, the images associated with the extracted one or more phrases.
[0013]
The receiving means may further receive text data relating to the image to be searched, and the search output means may, with reference to the third storage means, search for and output images associated with the phrases extracted by the phrase extraction means and with the phrases contained in the text data.
[0014]
The device may further comprise evaluation means for weighting the phrases and categories associated with an image based on the proportion or frequency with which the user selected that image from among the output images. The third storage means may store the weights of each phrase and each category for the image as evaluated by the evaluation means, and the image search output means may refer to these weights and preferentially retrieve and output images for which the extracted phrases and categories have high weights.
[0015]
A search information registration method according to one aspect of the present invention is executed by a computer having first storage means for storing words representing the visual features of objects in images and second storage means for storing phrases representing semantic concepts in association with the words representing the visual features. In the method, the computer performs the steps of: extracting an object in an image to be registered; extracting the visual features of the object; extracting, with reference to the first storage means, the words corresponding to the visual features of the extracted object; and extracting, with reference to the second storage means, the phrases associated with the words representing the visual features of the object, and storing the extracted phrases in third storage means in association with the image containing the object.
[0016]
An information search method may be executed by a computer having first storage means for storing words representing visual features of images, second storage means for storing phrases representing semantic concepts in association with the words representing the visual features, and third storage means for storing the phrases in association with the images related to them. In the method, the computer performs the steps of: receiving input of a search image to be searched; extracting an object from the received search image; extracting the visual features of the extracted object; extracting, with reference to the first storage means, the words corresponding to the extracted visual features; extracting, with reference to the second storage means, one or more phrases associated with the words representing the visual features of the search image; and searching for and outputting, with reference to the third storage means, the images associated with the extracted one or more phrases.
[0017]
[Embodiments of the invention]
Hereinafter, an embodiment of an apparatus to which the search information registration device and the information search device according to the present invention are applied will be described with reference to the drawings. FIG. 1 shows an example of the overall configuration of an information processing apparatus 1 according to the present embodiment.
The information processing apparatus 1 according to the present embodiment is implemented by a computer. The functional blocks shown in FIG. 1 can be realized by a CPU (Central Processing Unit), a computer program executed by the CPU, internal memory such as RAM and ROM capable of storing the computer program and predetermined data, and an external storage device such as a hard disk drive.
The functional blocks shown in FIG. 1 comprise a visual classification dictionary unit 101, a vocabulary dictionary unit 102, a synonym / synonym category dictionary unit 103, an image database 104, a registration receiving unit 105, an object extraction unit 106, a feature amount extraction unit 107, a word estimation unit 108, a phrase estimation unit 109, a synonym estimation unit 110, a search receiving unit 111, an output unit 112, and an evaluation processing unit 113.
[0018]
The visual classification dictionary unit 101 is a storage unit that stores words representing visual characteristics of an image. The visual classification is a classification for replacing an image having a certain image feature amount with words such as nouns and adjectives recognized by humans. In the present embodiment, visual classification is performed based on the visual characteristics of colors, textures, and shapes, and a color table, a texture table, and a shape table are stored in the visual classification dictionary unit 101, respectively.
[0019]
As shown in FIG. 2, the color table is a table in which each color is represented by an HSV (Hue Saturation Value) color space. In FIG. 2, H indicates hue (Hue), S indicates saturation (Saturation), and V indicates lightness (Value). For example, in the case of black, H is “null”, S and V are each 0, and each value of HSV is defined for each color. These values are defined based on the HSV color space.
As shown in FIG. 3, the texture table is defined by analyzing whether there is a pattern such as a spot or a stripe for each texture. In this texture analysis, for example, a table can be created based on the result of analyzing each texture by a Gabor filter.
As shown in FIG. 4, the shape table defines, for each shape such as a circle, rectangle, or triangle, the feature quantities that characterize it: curvature, circulation, invariant moment, and differential invariants.
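As an illustration of how a color table like the one in FIG. 2 could map a measured HSV value to a word, the sketch below picks the nearest table entry. Only the black entry (H = null, S = 0, V = 0) comes from the text. The other entries, the function names, and the nearest-neighbour rule in HSV cone coordinates are assumptions made for the example.

```python
import math

# Hypothetical color table: word -> (hue in degrees or None, saturation, value).
# Only "black" (H = null, S = 0, V = 0) is taken from the text; the rest are assumed.
COLOR_TABLE = {
    "black": (None, 0.0, 0.0),
    "white": (None, 0.0, 1.0),
    "red": (0.0, 1.0, 1.0),
    "brown": (30.0, 0.75, 0.6),
    "blue": (240.0, 1.0, 1.0),
}


def _cone(hsv):
    """Embed an HSV triple in 3-D so that hue wraps around and vanishes when S or V is 0."""
    hue, saturation, value = hsv
    angle = math.radians(hue) if hue is not None else 0.0
    radius = saturation * value
    return (radius * math.cos(angle), radius * math.sin(angle), value)


def color_word(hsv, table=COLOR_TABLE):
    """Return the table word whose HSV entry is closest to the measured value."""
    query = _cone(hsv)
    return min(table, key=lambda word: math.dist(query, _cone(table[word])))
```

The cone embedding is one way to handle the undefined ("null") hue of achromatic colors. The texture and shape tables could be matched in the same nearest-entry fashion over their own feature quantities.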
[0020]
The vocabulary dictionary unit 102 is a storage unit that stores phrases representing semantic concepts in association with words representing visual features. As shown in FIG. 5, the vocabulary dictionary unit 102 associates each phrase representing a semantic concept with the category to which the phrase belongs, the synonyms of the phrase, and a description of the phrase.
A category classifies the phrase according to a predetermined criterion; for example, categories such as “beast” and “animal” correspond to “cat”. The description is text data explaining the semantic concept of the phrase, and it contains one or more words representing visual features such as color, texture, and shape.
[0021]
The synonym / synonym category dictionary unit 103 stores the synonyms of each phrase and, for each category representing a classification, the categories similar to it (synonym categories).
[0022]
The image database 104 is a database in which phrases and images are associated with each other. For each entry, the image database 104 holds an image ID for identifying the image, an area ID representing the region of an object in the image, the feature amount of the object, the feature amount weighting values, a phrase representing the object in the image, synonyms of the phrase, the weighting value of each synonym, the category to which the object belongs, and the weighting value of each category.
The feature amount of the object is the feature amount of the object calculated by the feature amount extraction unit 107.
The feature amount weighting values indicate the weights given to the color, texture, and shape of the object. In FIG. 6, c, t, and s indicate the weighting values of the color, the texture, and the shape, respectively.
Further, the synonym weighting value indicates the degree of relevance of each synonym to the object in the image. In the example of FIG. 6, values such as 0.8 for jaguar and 0.7 for cheetah, in order from the left, are stored in association with the respective synonyms; the larger the value, the higher the relevance. Similarly, a category weighting value is stored for each category. In the illustrated example, the weights are set such that “creation” is 0.9 and “animal” is 0.7.
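As an illustration of how one row of the image database might be held in memory, the following is a minimal sketch. The class name, field names, and the numeric feature and feature-weight values are assumptions made for this example; only the jaguar weight and the two category weights are taken from the description of FIG. 6.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectRecord:
    """One image-database row for one object (field names are illustrative)."""
    image_id: str
    area_id: str
    features: dict          # feature amounts computed for the object
    feature_weights: dict   # keys "c", "t", "s": color, texture, shape
    phrase: str
    synonym_weights: dict = field(default_factory=dict)
    category_weights: dict = field(default_factory=dict)


def most_relevant(weights):
    """Return the key with the largest weight (larger means more relevant)."""
    return max(weights, key=weights.get)


record = ObjectRecord(
    image_id="img-0001",                       # hypothetical identifier
    area_id="area-01",                         # hypothetical identifier
    features={"hsv": (30.0, 0.6, 0.5)},        # made-up feature amount
    feature_weights={"c": 0.5, "t": 0.3, "s": 0.2},  # made-up weights
    phrase="cat",
    synonym_weights={"jaguar": 0.8},           # value quoted for FIG. 6
    category_weights={"creation": 0.9, "animal": 0.7},  # values quoted for FIG. 6
)
```

Keeping the weights in per-record dictionaries lets the evaluation processing described later update a single synonym or category weight without touching the rest of the row.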
[0023]
The object extracting unit 106 performs a process of extracting an object from an image.
For example, the object extraction unit 106 may extract an object by applying a multi-scale inhomogeneous color diffusion algorithm that gradually subdivides the image area into characteristic object areas.
[0024]
The feature amount extraction unit 107 performs a process of extracting the visual features of the extracted object. For example, a color can be extracted by performing vector quantization in the HSV color space, and a texture can be analyzed with a Gabor filter.
[0025]
The word estimating unit 108 performs a process of extracting a word corresponding to the visual feature of the extracted object with reference to the visual classification dictionary unit 101. For example, the word estimating unit 108 refers to the visual classification dictionary unit 101 and extracts the words that match each of the color, texture, and shape, based on the data extracted by the feature amount extracting unit 107.
[0026]
The phrase estimating unit 109 performs a process of estimating and extracting a phrase associated with the words that represent the visual features of the extracted object. In this processing, the phrase estimation unit 109 refers to the description of each phrase in the vocabulary dictionary unit 102 and, if a word extracted from the visual features appears in the description, extracts that phrase as a phrase related to the object.
When text data such as a description of the image is attached to the image, the phrase estimation unit 109 also extracts the phrases contained in that text data.
Further, the phrase estimation unit 109 extracts the category of each extracted phrase from the vocabulary dictionary unit 102.
[0027]
The synonym estimation unit 110 performs a process of extracting a synonym of the extracted phrase and a synonym category of the category.
Each time a synonym of a phrase or a synonym category of a category is searched for, the synonym estimation unit 110 refers to the synonym / synonym category dictionary unit 103 and extracts the corresponding synonyms and synonym categories.
[0028]
The search receiving unit 111 performs a process of receiving an image search request from a user. To receive the search request, a predetermined search request input screen can be shown on a predetermined display so that an image or the like to be searched for can be entered from that screen.
The image search request may be, for example, any of an image search request using text data using words, a request for searching for a related image based on predetermined image data, and a request for searching for a related image based on image data and text data.
[0029]
The output unit 112 performs a process of referring to the image database 104 to search for an image associated with the extracted phrase and output the image to a predetermined display or the like.
Further, when a search request including text data is received, the output unit 112 can search for and output the images associated with the phrases contained in the text data.
In addition, when the phrases stored in association with an image are weighted, the output unit 112 refers to the weighting information and preferentially outputs the images for which the phrases extracted from the search target image or the text data have higher weights.
[0030]
The evaluation processing unit 113 calculates, for each image, a weight for each phrase and a weight for each feature amount, based on the frequency with which the user selects, as the desired image, an image that was extracted on the basis of a given phrase or word from among the images output by the output unit 112, and stores the calculated weights in the image database 104. The weighting can be based on, for example, the appearance frequency of a phrase or the co-occurrence frequency, which represents how often several phrases appear together as a set.
[0031]
Next, an embodiment of a search image registration method according to the present invention will be described with reference to FIG.
In FIG. 7, first, when the user inputs image data to be registered using the information processing apparatus 1, the registration receiving unit 105 receives an input of an image (S101).
FIG. 8 shows an example of the registered image. As shown in FIG. 8, the input data is composed of image data 1001 and a description 1002, which is text data describing the contents of the image. The description 1002 is optional; the input data may consist of the image data 1001 alone. A telop (on-screen caption) or audio embedded in a video frame can also serve as the description of an image; these can be converted into text data and processed in the same way.
It should be noted that the image may be subjected to a quantization process as pre-processing, and the quantized data may be imported. The registration process may be performed by the registration receiving unit 105 by reading an image stored in a predetermined storage medium or by receiving the image via a predetermined network such as the Internet.
[0032]
When the image is received, the object extracting unit 106 extracts a characteristic object in the input image (S102).
In this extraction process, the object extraction unit 106 can perform calculation based on the following equation to extract an object.
[0033]
(Equation 1)

[0034]
Here, I is a color feature vector, and represents, for example, a vector of HSV color data. Also, x and y are pixel positions in the image, t is the number of times of diffusion (the number of repetitions of calculation), div is divergence, and grad is a smoothing operator by convolution with a Gaussian function. Based on this equation, the object extraction unit 106 can perform a predetermined number of (t) diffusion processes to extract a characteristic object.
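Equation (1) itself is not reproduced in this text. A form consistent with the symbols defined above, under the assumption that the algorithm follows the standard anisotropic (Perona-Malik type) diffusion equation, would be:

```latex
\frac{\partial I(x, y, t)}{\partial t}
  = \operatorname{div}\bigl( c(x, y, t)\, \operatorname{grad} I(x, y, t) \bigr)
```

In this reading, c(x, y, t) is the diffusion rate function, and repeating the update t times corresponds to the number of diffusion passes.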
[0035]
Note that c(x, y, t) described above represents the diffusion rate (conductivity) function. In the multivariate color diffusion calculation of this algorithm, $c(x, y, t) = 1 / \{1 + (\lVert \operatorname{grad} I \rVert / K)^2\}$ can be used. Here K is an adaptive conduction parameter that depends on the texture, and the diffusion process is controlled (adjusted) through this parameter. The parameter value can be obtained experimentally.
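The effect of this conductivity function can be seen in a small sketch. The code below is a simplified illustration, not the patented procedure: it diffuses a grid of scalar values rather than color feature vectors, uses a fixed K instead of the adaptive parameter, and the function names and the step size `dt` are assumptions.

```python
def conductivity(grad_norm, k):
    """Diffusion rate c = 1 / (1 + (|grad I| / K)^2)."""
    return 1.0 / (1.0 + (grad_norm / k) ** 2)


def diffuse(image, k, steps, dt=0.2):
    """Run `steps` explicit 4-neighbour diffusion passes on a 2D list of floats."""
    height, width = len(image), len(image[0])
    current = [row[:] for row in image]
    for _ in range(steps):
        updated = [row[:] for row in current]
        for y in range(height):
            for x in range(width):
                flow = 0.0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        diff = current[ny][nx] - current[y][x]
                        # Large differences (edges) get a small conductivity,
                        # so they are smoothed far less than small ones.
                        flow += conductivity(abs(diff), k) * diff
                updated[y][x] = current[y][x] + dt * flow
        current = updated
    return current


# A row with small ripples on each side of one strong edge.
row = [[0.0, 2.0, 0.0, 100.0, 102.0, 100.0]]
smoothed = diffuse(row, k=5.0, steps=10)
```

With K = 5, the ripples of size 2 are flattened while the step of size 100 is almost untouched, which is the boundary-preserving behaviour the text describes.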
[0036]
To calculate grad, it can be defined for two pixels $P_{i,j}$ and $P_{i+m,j+n}$ as shown in the following equation (2). Here, the subscripts m and n designate the adjacent pixels in the four directions. Note that $w_1$, $w_2$, and $w_3$ are weight constants.
[0037]
(Equation 2)

[0038]
Then, in order to obtain the specific parameters, the texture roughness (textureness) in the neighborhood of the pixel, whose extent is determined by the scale parameter (σ), is first computed and evaluated. Depending on a predetermined roughness condition (for example, a texture roughness threshold of 20%), either a texture gradient or a color gradient is then obtained. The texture roughness is obtained as the ratio of pixels that form contours within the processing target range of the contour-extracted image data. The contours are extracted by well-known means such as the first derivative method, the zero-cross method, the Sobel method, or the Canny method.
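The roughness test can be sketched as follows, assuming a binary contour map has already been produced by one of the edge detectors named above. The function names, the square window, and the way the window radius would be derived from σ are assumptions of this example.

```python
def texture_roughness(edge_map, cy, cx, radius):
    """Ratio of contour pixels in the window centred at (cy, cx).

    edge_map is a 2D list of 0/1 values; the window is clipped at the borders.
    """
    height, width = len(edge_map), len(edge_map[0])
    total = 0
    edges = 0
    for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
        for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
            total += 1
            if edge_map[y][x]:
                edges += 1
    return edges / total


def choose_gradient(roughness, threshold=0.20):
    """Pick the gradient type: texture above the threshold, color otherwise."""
    return "texture" if roughness > threshold else "color"


edge_map = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
roughness = texture_roughness(edge_map, cy=1, cx=1, radius=1)  # 3 of 9 pixels
```

Here the 20% default mirrors the example threshold in the text; the behaviour exactly at the threshold is not specified there and is treated as "color" in this sketch.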
[0039]
If the roughness is larger than the predetermined threshold, the angle difference and the color difference in the vicinity of the pixel (that is, the differences between the frequency distributions of angle and of color) are obtained by the following equations (3) and (4), the texture gradient $G_{Texture} = W_1 D + W_2 d_1$ is computed, and the adaptive conduction parameter is set to $K = f_1(G_{Texture})$. At this time, the angle difference and the color difference are calculated with respect to the four neighboring peripheral regions of the target pixel and smoothed. The four neighboring peripheral regions are basically pixel regions of n × n size, such as 5 × 5 or 7 × 7, adjacent to the target pixel region in the four directions up, down, left, and right. Note that $W_1$ and $W_2$ are weight constants.
[0040]
(Equation 3)

[0041]
(Equation 4)

[0042]
Here, $I_i$ and $J_j$ in equation (3) are values of the angle histograms, that is, the frequencies falling in each bin of the frequency distribution of angles. A is a matrix whose elements are the similarities between two directions (angles); it is defined as a table sized according to the number of bins of the angle histogram. N is the total number of bins of the angle histogram; here, 36 bins of 10° each are used. Note that the angle histogram concerns the angles of edges at pixels, obtained as a texture feature, and the texture property can be obtained by a wavelet filter, a Gabor filter, or the like.
[0043]
In equation (4), h and g are frequency distribution histograms obtained by quantizing the colors and dividing them into bins, and $d_1$ represents their common part. M is the total number of bins of the color histogram; here, the HSV color space is divided into 72 bins.
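Equations (3) and (4) are not reproduced in this text, so the following is only a sketch of two well-known histogram measures that match the descriptions: a quadratic-form distance weighted by a bin similarity matrix A for the angle histograms, and a histogram intersection for the "common part" of the color histograms. Both choices, the linear circular similarity used to fill A, and all function names are assumptions.

```python
def angle_similarity(n_bins):
    """Similarity table A: 1 for equal bins, falling linearly to 0 for opposite bins."""
    half = n_bins / 2
    return [
        [1.0 - min(abs(p - q), n_bins - abs(p - q)) / half for q in range(n_bins)]
        for p in range(n_bins)
    ]


def quadratic_form_distance(i_hist, j_hist, similarity):
    """Sum over p, q of (I_p - J_p) * A[p][q] * (I_q - J_q)."""
    diff = [a - b for a, b in zip(i_hist, j_hist)]
    n = len(diff)
    return sum(
        diff[p] * similarity[p][q] * diff[q] for p in range(n) for q in range(n)
    )


def histogram_intersection(h, g):
    """Common part of two histograms: the sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h, g))


A4 = angle_similarity(4)  # tiny table for illustration; the text uses 36 bins
adjacent = quadratic_form_distance([1, 0, 0, 0], [0, 1, 0, 0], A4)
opposite = quadratic_form_distance([1, 0, 0, 0], [0, 0, 1, 0], A4)
common = histogram_intersection([0.5, 0.5, 0.0], [0.25, 0.25, 0.5])
```

The similarity table makes histograms that differ only by a neighbouring direction count as closer than histograms concentrated in opposite directions, which is the role the text assigns to the matrix A.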
[0044]
On the other hand, if the roughness is smaller than the predetermined threshold, the color difference in the vicinity of the pixel is calculated according to equation (2) above to give the color gradient $G_{Color} = D$, and the adaptive conduction parameter is set to $K = f_2(G_{Color})$. At this time, the color difference is calculated for the four-neighbor connection region of the target pixel and smoothed. The four-neighbor connection region refers to the pixel region connected to the target pixel in the four directions up, down, left, and right.
[0045]
In this manner, the diffusion rate is controlled by the adaptive conduction parameter, which is a diffusion parameter: rough portions are regarded as “noisy” regions, and the texture is removed by diffusion while the boundaries are preserved. Finally, a characteristic object can be extracted by applying region growing and merging.
[0046]
When the object is extracted, the feature amount extraction unit 107 extracts the feature amount of the extracted object (S103).
In the present embodiment, the feature amount extraction unit 107 converts the color of the extracted object into the HSV color space, analyzes the texture using a Gabor filter, and extracts the shape of the object.
[0047]
When the feature amount is extracted, the word estimation unit 108 refers to the visual classification dictionary unit 101 to estimate and extract a word representing a visual feature from the extracted feature amount (S104).
In this process, for example, if the HSV values obtained as the feature amounts are H (null), S (0), and V (0), the word estimation unit 108 refers to the color table and extracts the corresponding word “black”. Similarly, a corresponding texture name is extracted from texture features such as spots and stripes, and a corresponding shape name is extracted from shape features such as circles and rectangles.
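The color lookup can be sketched as a nearest-entry search over the color table. Only the entry for black comes from the text; the other table entries, the distance measure, and the function names are assumptions for illustration.

```python
# Color table: word -> (H in degrees or None for "null", S, V).
COLOR_TABLE = {
    "black": (None, 0.0, 0.0),   # from the text: H is null, S and V are 0
    "red": (0.0, 1.0, 1.0),      # illustrative entry
    "brown": (30.0, 0.6, 0.5),   # illustrative entry
}


def hue_distance(h1, h2):
    """Circular hue distance scaled to [0, 1]; a null hue only matches null."""
    if h1 is None and h2 is None:
        return 0.0
    if h1 is None or h2 is None:
        return 1.0
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d) / 180.0


def estimate_color_word(hsv, table=COLOR_TABLE):
    """Return the color word whose table entry is closest to the given HSV value."""
    h, s, v = hsv

    def distance(entry):
        eh, es, ev = entry
        return hue_distance(h, eh) + abs(s - es) + abs(v - ev)

    return min(table, key=lambda word: distance(table[word]))
```

For example, `estimate_color_word((None, 0.0, 0.0))` selects "black", matching the case described in the text, and a value such as `(28.0, 0.55, 0.5)` falls closest to the illustrative "brown" entry.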
[0048]
When the words corresponding to the feature amounts of the object are extracted, the phrase estimating unit 109 refers to the vocabulary dictionary unit 102 and extracts the phrases that have those words in their descriptions, together with the categories of those phrases (S105).
In this process, for example, from extracted words such as “brown”, “spot”, and “circular”, the phrase estimating unit 109 extracts “cat” as a phrase whose description uses these words.
At this time, when text data such as a description of the registration target image has been input, the phrase estimation unit 109 also extracts phrases from the text data. For example, the phrases may be extracted by dividing the text into grammatically meaningful segments.
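The dictionary lookup in S105 can be sketched as a search for visual words inside each phrase's description. The dictionary contents, the tokenization, and the ordering by number of matching words are assumptions made for this example; the text only requires that an extracted word appear in the description.

```python
import re

# Hypothetical vocabulary dictionary: phrase -> category and description text.
VOCABULARY = {
    "cat": {
        "category": "beast",
        "description": "a small animal with brown fur, a spot pattern and a circular face",
    },
    "zebra": {
        "category": "beast",
        "description": "an animal with a black and white stripe pattern",
    },
}


def estimate_phrases(words, vocabulary=VOCABULARY, min_hits=1):
    """Return (phrase, category) pairs whose description contains the given words.

    Phrases matching more of the visual words are listed first.
    """
    scored = []
    for phrase, entry in vocabulary.items():
        tokens = set(re.findall(r"[a-z]+", entry["description"].lower()))
        hits = sum(1 for word in words if word in tokens)
        if hits >= min_hits:
            scored.append((hits, phrase, entry["category"]))
    scored.sort(key=lambda item: -item[0])
    return [(phrase, category) for _, phrase, category in scored]
```

Calling `estimate_phrases(["brown", "spot", "circular"])` yields the pair for "cat" with its category, which mirrors the example in the text.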
[0049]
Then, the synonym estimating unit 110 refers to the synonym / synonym category dictionary unit 103 to estimate the synonyms and synonym categories of the extracted phrases (S106).
Thus, for example, if the extracted phrase is “cat”, then “jaguar”, “cheetah”, and “lion” are extracted as its synonyms, and the category “animal” is extracted as a synonym category of the category “beast” (kemono).
[0050]
FIG. 9 shows the relationship among the image, the words, the phrases and categories, and the synonyms and synonym categories. As shown in FIG. 9, the image 1001 is associated with the estimated words 1003: “brown” representing a color feature, “spot” representing a texture feature, and “circular” representing a shape feature. Then, “cat” (1004) is extracted as a phrase whose description includes these words. Further, “jaguar”, “cheetah”, and “lion” (1005) are associated as synonyms, “beast” (1006) is associated as the category, and “animal” (1007) is associated as its synonym category. Note that the values in parentheses in FIG. 9 represent the weights with respect to the object in the image.
[0051]
Then, the synonym estimation unit 110 registers the extracted phrases and categories, and the synonyms and synonym categories, in the image database 104 in association with the image (S107), and ends the process. At this time, the image ID assigned by the registration receiving unit 105 to identify the image, the area ID of the object specified by the object extracting unit 106, and the image feature amounts extracted by the feature amount extracting unit 107 are also registered in the image database 104.
[0052]
Next, an example of a process for searching for a registered image will be described with reference to FIG.
In FIG. 10, first, when a user inputs a search image from the search screen and makes a search request, the search receiving unit 111 receives the image search request (S201).
FIG. 11 shows an example of the search request input screen. As shown in FIG. 11, the screen provides a search image designation field 2001 for specifying the image to be searched for and a keyword input field 2002 for entering text data, such as keywords, to be used in the image search. The user makes a search request by entering an image or a keyword and pressing the search button 2003.
Whether text data such as a keyword is entered at the time of a search is optional. Likewise, the search target image may be supplied by the user either by importing quantized image data from outside or by selecting one of the quantized images prepared in advance.
[0053]
Upon receiving the image search request, the object extraction unit 106 extracts a characteristic object in the search target image in the same manner as in the processing of S102 described above (S202).
[0054]
When the object is extracted, the feature amount extraction unit 107 extracts the feature amount of the extracted object (S203). In the present embodiment, the feature amount extraction unit 107 extracts the visual feature amounts of the object by converting the color of the extracted object into the HSV color space, analyzing the texture with a Gabor filter, and extracting the shape of the object.
[0055]
When the feature amount is extracted, the word estimation unit 108 estimates and extracts the words corresponding to the extracted feature amount with reference to the visual classification dictionary unit 101 (S204). In this process, as in S104 described above, if the HSV values obtained as the feature amounts are, for example, H (null), S (0), and V (0), the word estimating unit 108 refers to the color table and extracts the corresponding word “black”. Similarly, a corresponding texture name is extracted from texture features such as spots and stripes, and a corresponding shape name is extracted from shape features such as circles and rectangles.
[0056]
When the words related to the visual features are extracted, the phrase estimating unit 109 refers to the vocabulary dictionary unit 102 and extracts the phrases that have these words in their descriptions, together with the categories of those phrases (S205).
[0057]
When the phrase is extracted, the synonym estimating unit 110 refers to the synonym / synonym category dictionary unit 103 to extract synonyms and synonym categories of the extracted phrase, as in the processing of S106 described above (S206).
[0058]
Then, the output unit 112 refers to the image database 104 and extracts the images associated with the extracted phrases and categories and with the synonyms and synonym categories (S207).
In this extraction processing, first, the output unit 112 refers to the description of the image in the image database 104 and extracts an image that includes the phrase in the description and belongs to the same category. Similarly, the output unit 112 extracts an image that includes a synonym in the description and belongs to a synonym category. Further, an image that includes the phrase in the description and belongs to a synonymous category may be extracted, or an image that includes a synonym in the description and belongs to the same category may be extracted. By performing these processes, the output unit 112 extracts one or a plurality of selection candidate images.
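The matching rules of S207 can be sketched as follows. The record layout (a set of phrases and a set of categories per image), the function name, and the `allow_mixed` switch for the two optional combinations are assumptions of this example.

```python
def find_candidates(records, phrase, category, synonyms, synonym_categories,
                    allow_mixed=False):
    """Return the IDs of images matching the phrase/category rules described above."""
    hits = []
    for rec in records:
        has_phrase = phrase in rec["phrases"]
        has_synonym = any(s in rec["phrases"] for s in synonyms)
        in_category = category in rec["categories"]
        in_syn_category = any(c in rec["categories"] for c in synonym_categories)

        exact = has_phrase and in_category            # phrase + same category
        related = has_synonym and in_syn_category     # synonym + synonym category
        mixed = (has_phrase and in_syn_category) or (has_synonym and in_category)

        if exact or related or (allow_mixed and mixed):
            hits.append(rec["image_id"])
    return hits


# Hypothetical database contents.
records = [
    {"image_id": "img1", "phrases": {"cat"}, "categories": {"beast"}},
    {"image_id": "img2", "phrases": {"lion"}, "categories": {"animal"}},
    {"image_id": "img3", "phrases": {"cat"}, "categories": {"animal"}},
    {"image_id": "img4", "phrases": {"car"}, "categories": {"vehicle"}},
]
strict = find_candidates(records, "cat", "beast",
                         ["jaguar", "cheetah", "lion"], ["animal"])
wide = find_candidates(records, "cat", "beast",
                       ["jaguar", "cheetah", "lion"], ["animal"], allow_mixed=True)
```

In this sample, `strict` contains img1 and img2, and enabling the mixed combinations additionally admits img3, whose phrase matches but whose category is only a synonym category.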
[0059]
The output unit 112 outputs the extracted image or images to a predetermined display or the like in a recognizable form, and asks the user to select the desired image (S208). FIG. 12 shows an example of the image output. As shown in FIG. 12, several images are displayed on the search result screen, from which the user can select the intended image. At the time of output, the output unit 112 uses the weights of the phrases for each image to display preferentially the images judged to be close to the requested image, that is, the images are displayed in descending order of the weights of the extracted phrases and categories.
If the desired image is not among those displayed, the user may request the next candidates, in which case the output unit 112 displays the images with the next highest weights and waits for the user's selection.
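The weight-ordered output and the "next candidates" request can be sketched as below. Summing the weights of the query terms to obtain a score, the page size, and the function names are assumptions; the text only states that images with higher weights are shown first.

```python
def rank_images(candidates, query_terms):
    """Order image IDs by the total stored weight of the query phrases and categories.

    candidates maps image_id -> {term: weight}; missing terms count as 0.
    """
    def score(image_id):
        weights = candidates[image_id]
        return sum(weights.get(term, 0.0) for term in query_terms)

    return sorted(candidates, key=score, reverse=True)


def pages(ranked, page_size):
    """Yield successive groups of results; each request for more shows the next group."""
    for start in range(0, len(ranked), page_size):
        yield ranked[start:start + page_size]


# Hypothetical weights stored in the image database.
candidates = {
    "img2": {"cat": 0.4, "beast": 0.5},
    "img1": {"cat": 0.9, "beast": 0.9},
    "img3": {"cat": 0.7},
}
ranked = rank_images(candidates, ["cat", "beast"])
result_pages = list(pages(ranked, page_size=2))
```

The first page holds the two highest-scoring images, and the remaining image is only shown when the user asks for further candidates.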
[0060]
When the user selects an intended image by instructing with a pointing device or the like, the output unit 112 displays only the image on a display or the like (S209).
As a result, the user can refer to or download the selected image and use it.
[0061]
In addition, when the user selects an image, the evaluation processing unit 113 updates the weights of the phrases and categories for that image stored in the image database 104, thereby feeding the current selection result back into the weighting (S210), and ends the process.
This feedback is performed, for example, by increasing the weight of “cat” for the image when the phrase “cat” was extracted in the search for that image. The method of calculating the weight is optional; for example, it may be based on the proportion of the image's selections in which the phrase was extracted, or on the frequency with which the image was extracted on the basis of the phrase and then selected by the user.
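One way to realise such feedback is to count, per image and phrase, how often the image was output for the phrase and how often the user then selected it. The sketch below uses the selection ratio as the weight; the class name, method names, and the choice of ratio are assumptions, since the text leaves the calculation method open.

```python
from collections import Counter


class SelectionFeedback:
    """Tracks outputs and user selections per (image, phrase) pair."""

    def __init__(self):
        self.shown = Counter()    # times the image was output for the phrase
        self.chosen = Counter()   # times the user then selected the image

    def record_output(self, image_id, phrase):
        self.shown[(image_id, phrase)] += 1

    def record_selection(self, image_id, phrase):
        self.chosen[(image_id, phrase)] += 1

    def weight(self, image_id, phrase):
        """Selection ratio in [0, 1]; 0.0 if the pair was never output."""
        key = (image_id, phrase)
        shown = self.shown[key]
        return self.chosen[key] / shown if shown else 0.0


feedback = SelectionFeedback()
for _ in range(4):
    feedback.record_output("img1", "cat")
for _ in range(3):
    feedback.record_selection("img1", "cat")
cat_weight = feedback.weight("img1", "cat")
```

After four outputs and three selections, the weight of "cat" for the image is 0.75, and this value would be written back to the image database for use in later rankings.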
[0062]
As described above, according to the present embodiment, when a search image is registered, the object extraction unit 106 extracts a characteristic object in the image to be registered, the feature amount extraction unit 107 extracts the visual features of the extracted object, the word estimating unit 108 extracts the words corresponding to those visual features with reference to the visual classification dictionary unit 101, and the phrase estimating unit 109 extracts the phrases associated with the extracted words with reference to the vocabulary dictionary unit 102. Because each phrase is stored in the image database 104 in association with the image containing the object, a database can be created in which images are associated with the phrases most closely related to the objects they contain.
Thus, by referring to the image database 104, an image can easily be searched for based on a phrase specified by the user. In particular, because phrases are extracted by a two-step process of finding words that represent the visual characteristics of an object and then finding phrases whose meaning relates to those words, phrases that reflect the visual characteristics of the object can be associated with the image, which makes it possible to search for an image that meets the user's intention.
[0063]
Further, the synonym estimation unit 110 searches for synonyms of the extracted phrase and/or synonym categories of the category to which the phrase belongs, and the synonyms and/or synonym categories found are also stored in the image database 104 in association with the image. As a result, an image can be associated not only with a single phrase or category but also with synonyms and synonym categories, which widens the range of a search and makes it possible to create an image database in which the user can easily find the intended image.
[0064]
Further, the registration accepting unit 105 accepts text data, such as a description, that is attached to the image to be registered and whose content relates to the image, and the phrase estimating unit 109 extracts the phrases contained in the accepted text data. This reflects the content of text data that is inherently closely related to the image, and makes it possible to create a database that enables searches better suited to the user's intention.
[0065]
When searching for an image, the search receiving unit 111 receives the input of a search target image, the object extracting unit 106 extracts a characteristic object in the image, the feature amount extracting unit 107 extracts the visual features of the extracted object, the word estimating unit 108 extracts the words corresponding to those visual features with reference to the visual classification dictionary unit 101, and the phrase estimating unit 109 extracts the phrases associated with the extracted words with reference to the vocabulary dictionary unit 102. The output unit 112 then refers to the image database 104 to search for and output the images associated with the extracted phrases, so that related images intended by the user can easily be found from the search target image.
In particular, because the image search incorporates the two steps of finding words that represent the visual characteristics of the object and finding phrases related to those words, the search reflects both the visual characteristics of the object and its related meaning, which makes it possible to find an image that matches the user's intention.
[0066]
In addition, if the search receiving unit 111 also receives text data for the search, and the output unit 112 searches for and outputs the images associated with both the extracted phrases and the phrases contained in the text data, then text data such as a description associated in advance with the search target image can be used to perform a highly accurate image search that is better suited to the user's intention.
[0067]
In addition, if the evaluation processing unit 113 weights the phrases for each image based on the ratio or frequency with which the image is selected by the user as the desired image from among the output images, stores the weight of each phrase in the image database 104 in association with the image, and the output unit 112 refers to these weights and preferentially outputs the images in which the weight of the input phrase is high, then feedback from the user is reflected and the images the user intends can be found and output with higher priority.
[0068]
In the above-described embodiment, an example in which synonyms and synonym categories are searched for has been described. However, this search may be omitted when synonyms and synonym categories are not needed. Alternatively, only one of the phrase and the category may be used.
[0069]
Further, in the above-described embodiment, an example of a search based on an image has been described. However, an image search may be performed based on only text data without using an image.
[0070]
The computer program for the information processing apparatus 1 of the present embodiment may be stored and distributed on a computer-readable medium (FD, CD-ROM, or the like), or may be superimposed on a carrier wave and distributed via a communication network.
[0071]
[Effect of the Invention]
According to the present invention, an image intended by a user can be easily searched.
[Brief description of the drawings]
FIG. 1 is a functional block diagram showing an embodiment of an information processing device to which the search information registration device and the information search device according to the present invention are applied.
FIG. 2 is a view showing an example of data stored in the color table of the visual classification dictionary unit according to the embodiment.
FIG. 3 is a view showing an example of data stored in the texture table of the visual classification dictionary unit according to the embodiment.
FIG. 4 is a view showing an example of data stored in the shape table of the visual classification dictionary unit according to the embodiment.
FIG. 5 is a view showing an example of data stored in the vocabulary dictionary unit according to the embodiment.
FIG. 6 is a view showing an example of data stored in the image database according to the embodiment.
FIG. 7 is a flowchart showing the flow of the image registration processing according to the embodiment.
FIG. 8 is a view showing an example of a registered image according to the embodiment.
FIG. 9 is a schematic diagram showing the relationship among images, words, phrases, categories, synonyms, and synonym categories according to the embodiment.
FIG. 10 is a flowchart showing the flow of the image search processing according to the embodiment.
FIG. 11 is a view showing an example of the search image input screen according to the embodiment.
FIG. 12 is a view showing an example of the search result output screen according to the embodiment.
[Explanation of symbols]
1 Information processing device
101 Visual classification dictionary unit
102 Vocabulary dictionary unit
103 Synonym / synonym category dictionary unit
104 Image database
105 Registration accepting unit
106 Object extracting unit
107 Feature amount extracting unit
108 Word estimating unit
109 Phrase estimating unit
110 Synonym estimating unit
111 Search receiving unit
112 Output unit
113 Evaluation processing unit

Claims

A registration accepting unit for accepting input of image data to be registered;
Object extraction means for extracting an object in the received image;
Feature extracting means for extracting visual features of the object;
First storage means for storing a visual feature of an object in an image and a word representing the visual feature in association with each other;
Word extracting means for extracting a word corresponding to the visual feature of the extracted object with reference to the first storage means;
Second storage means for storing a phrase representing a semantic concept and a word representing the visual feature in association with each other;
A phrase extraction unit for extracting a phrase associated with a word representing a visual feature of the object with reference to the second storage unit;
Third storage means for storing the extracted phrase as search information in association with the image including the object,
A search information registration device, comprising:

Synonym and synonym category storage means for storing synonyms of the phrase and synonyms of the category to which the phrase belongs;
Referring to the synonym / synonym category storage means, further comprising synonym search means for searching for synonyms of the extracted phrase and the synonym category to which the phrase belongs,
The third storage unit further stores a synonym and a synonym category searched for in association with the object image.
The search information registration device according to claim 1.

The registration accepting unit further accepts text data having a content related to the image added to the image to be registered,
The phrase extracting means further extracts a phrase included in the received text data;
The search information registration device according to claim 1.

Search accepting means for accepting an input of a search image to be searched;
Object extracting means for extracting an object from the received search image;
Visual feature extracting means for extracting visual features of the extracted object;
First storage means for storing words representing visual features of the image;
Word extraction means for extracting a word corresponding to the extracted visual feature with reference to the first storage means;
Second storage means for storing a phrase representing a semantic concept and a word representing the visual feature in association with each other;
Third storage means for storing an image including the object, a phrase related to the object, and a category to which the phrase belongs in association with each other;
A phrase extracting unit that extracts one or a plurality of phrases associated with a word representing a visual feature of the search image with reference to the second storage unit;
A search output unit that searches for and outputs an image associated with the extracted one or more phrases with reference to the third storage unit;
An information retrieval device, comprising:

The receiving means further receives text data on an image to be searched,
The search output unit refers to the third storage unit, searches and outputs an image associated with the phrase extracted by the phrase extraction unit and the phrase included in the text data,
The information retrieval device according to claim 4.

Further comprising evaluation means for weighting the phrase and the category associated with an image based on the ratio or frequency with which that image is selected by the user from among the output images,
The third storage means stores, for the image, the weight of each phrase and each category evaluated by the evaluation means,
The search output unit refers to the weights and preferentially searches for and outputs an image for which the weights of the extracted phrase and category are high,
The information retrieval device according to claim 4.

A search information registration method performed by a computer having first storage means for storing a word representing a visual feature of an object in an image and second storage means for storing a phrase representing a semantic concept and a word representing the visual feature in association with each other,
wherein the computer performs the steps of:
Receiving data of the image to be registered;
Extracting an object in the registration target image;
Extracting visual features of the extracted object;
Extracting a word corresponding to the visual feature of the extracted object with reference to the first storage means; and
Extracting, with reference to the second storage means, a phrase associated with the word representing the visual feature of the object, and storing the extracted phrase in third storage means in association with the image including the object.