JP2004192555A

JP2004192555A - Information management method, device and program

Info

Publication number: JP2004192555A
Application number: JP2002362729A
Authority: JP
Inventors: Hitoshi Okamoto (岡本 仁)
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-12-13
Filing date: 2002-12-13
Publication date: 2004-07-08

Abstract

PROBLEM TO BE SOLVED: To provide an information management method, device and program that manage information by selecting appropriate feature quantities according to the information to be managed.

SOLUTION: The information management method includes a feature quantity acquisition step S14 of acquiring a plurality of feature quantities from each piece of information in an information group to be managed; an information selection step S18 of selecting information having a specific attribute from that information group; and a reference feature quantity vector selection step S20 of forming feature quantity vectors for each piece of information by recombining the plurality of feature quantities, and finding a reference feature quantity vector effective for separating the selected information from the unselected information based on the vector distance between the feature quantity vector of the selected information and the feature quantity vector of the unselected information.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、目的に応じて逐次基準を変更して特定の属性を有する情報を管理する情報管理方法、情報管理装置及び情報管理プログラムに関する。
【０００２】
【従来の技術】
従来、画像データ等の情報を管理するためには、情報をデータベースに登録する際にキーワード等の情報を特定するための付加情報を付与する方法が主流であった。例えば、データベースから特定の情報に類似する情報を検索する際には、検索する情報に関連するキーワードを用いて、そのキーワードが付加情報として付加されている情報を選出する。
【０００３】
しかしながら、このような付加情報を利用する方法では、情報を登録する際に付加情報を付与する必要があり、登録するユーザに煩雑な登録作業を強いる欠点がある。
【０００４】
そこで、情報に含まれる特徴的な要素を特徴量として抽出して特徴量ベクトルを構成し、複数の情報間における特徴量ベクトルの距離に基づいて情報を管理する方法が用いられるようになっている。
【０００５】
例えば、図７のように、管理の基準となる基準画像データをｎ×ｍ個のメッシュ区画に分割し、１つのメッシュ区画（ｉ，ｊ）から特徴量として彩度Ｉｒ_ｉｊ及び明度Ｂｒ_ｉｊを抽出した場合、１つのメッシュ区画（ｉ，ｊ）から２次元の特徴量ベクトルが得られる。画像データ全体ではｎ×ｍ×２個のメッシュ区画が存在するため、画像データ全体ではｎ×ｍ×２次元の特徴量ベクトルが取得される。同様に、管理の対象となる対象画像も基準画像と等しいメッシュに分割し、基準画像のメッシュ区画と対応するメッシュ区画（ｉ，ｊ）から特徴量として彩度Ｉｏ_ｉｊ及び明度Ｂｏ_ｉｊを抽出した場合、ｎ×ｍ×２次元の特徴量ベクトルを得ることができる。
【０００６】
続いて、図８に示すように、このようにして得られた基準画像の特徴量ベクトルと管理対象画像の特徴量ベクトルとの距離ｄを求める。２つの特徴量ベクトルの距離ｄは数式（１）によって求めることができる。
【０００７】
【数１】

【０００８】
この距離ｄは、基準画像と対象画像との非類似性を示す。すなわち、距離ｄが大きいほど画像データ間の類似性は低く、距離ｄが小さいほど画像データ間の類似性は高いといえる。従って、基準画像データと対象画像データとの特徴量ベクトルの距離ｄを基準とした類似性に基づいて画像を比較・分類・検索等し、情報管理を行うことができる。
【０００９】
このような特徴量ベクトルを利用した情報管理方法においては、より適確な検索結果を得るために、１つの情報から多くの種類の特徴量を抽出する必要がある。しかし、必要以上に多くの特徴量を抽出して特徴量ベクトルが多数の特徴量空間軸を持つことは、特徴量ベクトルを照合する際の処理時間を増大させるのみならず、冗長な特徴量の影響により検索精度を低下させる原因にもなる。
【００１０】
そこで、抽出された特徴量で表現される特徴量ベクトルに対して主成分分析を行い、特徴量ベクトルの直交性を保つと共に分散が小さい特徴量を除外することによって、特徴量ベクトルを構成する特徴量の数を少なく抑える方法が用いられることが多い。
【００１１】
また、特開２０００−１１２９４３号公報には、データベースのデータを特徴量毎にその値の順に並べたデータのリストを作成しておき、基底インデクスから順次選択された１つの特徴量につき、リストからテストデータと成分値の差の小さい順にデータを指すポインタを更新し、ポインタの指すデータとテストデータとの１つの特徴量の値の差に基づいて、終了条件を満足するかを判定し、満足しなければポインタの指すデータとテストデータとの部分空間における距離に基づいて、棄却条件を満足するかを判定し、満足しなければポインタの指すデータとテストデータとの全空間における距離を計算し、計算された距離の小さい順に所定数のデータを抽出するデータの検出方法が開示されている。
【００１２】
また、特開２００１−１３４５７３号公報には、特徴量ベクトルが示す点を含む多次元空間を分割するセル空間を階層的に構築し、階層化された各セル空間を表現するビット列を用いて特徴量ベクトルを一意に管理する類似データ検索方法が開示されている。当該方法における検索では、分割時のセル幅をもとにセル空間を復元し、検索キーの特徴ベクトルが示す多次元空間内の点とセル空間の距離を計算し、算出した距離により候補セル空間を絞り込み、候補セル空間に含まれる特徴量ベクトルのセル空間内の点について、検索キーの特徴量ベクトルのセル空間内の点との距離をもとに検索を行う。
【００１３】
【特許文献１】
特開２０００−１１２９４３号公報
【特許文献２】
特開２００１−１３４５７３号公報
【００１４】
【発明が解決しようとする課題】
しかしながら、特徴量ベクトルを構成する特徴量の数が非常に多い場合や分散が大きい特徴量しか含まれない場合には、主成分分析を行っても特徴量ベクトルを構成する特徴量の数を十分に低減できない問題があった。
【００１５】
一方、特開２０００−１１２９４３号公報や特開２００１−１３４５７３号公報に開示されている方法を用いた場合には、所定の特徴量の空間軸上において距離が大きいという理由で、本来情報の特徴を良く表した特徴量が排除されてしまう問題があった。
【００１６】
さらに、これらの従来技術では、特徴量ベクトルを構成する特徴量の数の削減が検索結果の妥当性とは無関係に行われており、特徴量の数の削減によってユーザにとって望ましくない検索結果となる致命的な問題を含んでいた。
【００１７】
加えて、上記従来技術のいずれの方法を用いた場合でも、検索対象となる情報から多種多数の特徴量を抽出し、主成分分析等の負荷の高い処理を行う必要があった。また、いわゆる「ピアツーピア」によるネットワーク検索のように検索対象が随時発生する場合には、事前に特徴量の抽出や主成分分析等の処理を行うことができないため、上記従来技術を用いることが困難であった。
【００１８】
本発明は、上記従来技術の問題を鑑み、少なくとも上記課題の１つを解決すべく、管理する情報に応じて適切な特徴量を選択して情報を管理する情報管理方法、情報管理装置及び情報管理プログラムを提供することを目的とする。
【００１９】
【課題を解決するための手段】
上記課題を解決できる本発明は、複数の情報を互いの類似性に基づいて管理する情報管理方法であって、逐次若しくは所定の時点で、管理対象となる情報群の中の情報から各々複数の特徴量を取得し、前記特徴量を取得した情報から特定の属性を有する情報を選択し、前記取得した複数の特徴量の中から、選択された情報の特徴量の値と選択されなかった情報の特徴量の値との差が大きい特徴量を基準特徴量として求め、前記基準特徴量の値に基づいて情報を管理することを特徴とする。
【００２０】
より具体的には、管理対象となる情報群の中の情報から各々複数の特徴量を取得する特徴量取得工程と、前記特徴量を取得した情報の中から特定の属性を有する情報を選択する情報選択工程と、前記複数の特徴量を組み替えて情報毎に複数の特徴量ベクトルを構成し、前記選択された情報の特徴量ベクトルと前記選択されなかった情報の特徴量ベクトルとのベクトル間距離に基づいて、前記選択された情報と前記選択されなかった情報とを区別するために有効な基準特徴量ベクトルを求める基準特徴量ベクトル選択工程とを含み、逐次若しくは所定の時点で前記基準特徴量ベクトルを構成し、その値に基づいて情報を管理することを特徴とする。
【００２１】
ここで、前記情報選択工程は、ユーザに情報を選択させることが好適である。
【００２２】
また、別の具体的態様は、互いに共通する属性を有する複数の情報群と当該共通する属性を有さない情報群とを分別する情報分別工程と、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とに含まれる情報の各々から複数の特徴量を取得する特徴量取得工程と、前記複数の特徴量を組み替えて複数の特徴量ベクトルを構成し、前記共通する属性を有する情報群の特徴量ベクトルと前記共通する属性を有さない情報群の特徴量ベクトルとのベクトル間距離に基づいて、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とを区別する基準となる基準特徴量ベクトルを抽出する基準特徴量ベクトル選択工程とを含み、逐次若しくは所定の時点で前記基準特徴量ベクトルを構成し、その値に基づいて情報を管理することを特徴とする。
【００２３】
ここで、前記情報分別工程は、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とをユーザに分別させることが好適である。
【００２４】
さらに、前記基準特徴量ベクトル選択工程は、前記共通する属性を有する情報群における特徴量の平均値から構成される平均特徴量ベクトルと、前記共通する属性を有さない情報群における特徴量の平均値から構成される平均特徴量ベクトルと、のベクトル間距離が最も大きくなる平均特徴量ベクトルを基準特徴量ベクトルとして選択することが好ましい。
【００２５】
さらに、上記情報管理方法において、予備的検索のキーとなるキー情報を取得するキー情報取得工程をさらに含み、前記情報選択工程は、前記キー情報に基づいて予備的に検索された情報群の中から特定の特徴を有する情報を選択することものとしても良い。
【００２６】
また、上記情報管理方法において、キー情報を取得するキー情報取得工程をさらに含み、前記特徴量取得工程は、前記キー情報から複数の特徴量を取得し、前記基準特徴量ベクトル選択工程は、前記キー情報から取得された複数の特徴量を組み替えて前記キー情報に対する複数の特徴量ベクトルを構成し、さらに前記共通する属性を有する情報群の特徴量ベクトルと前記キー情報の特徴量ベクトルとのベクトル間距離及び前記共通する属性を有さない情報群の特徴量ベクトルと前記キー情報の特徴量ベクトルとのベクトル間距離に基づいて前記基準特徴量ベクトルを選択するものとしても良い。
【００２７】
このとき、前記基準特徴量ベクトル選択工程において前記基準特徴量ベクトルが選択できない場合には、情報から所定の特徴量の値を取得し、当該所定の特徴量の値に基づいて情報を管理することが好適である。
【００２８】
上記課題を解決するための本発明は、複数の情報を互いの類似性に基づいて管理する情報管理装置であって、逐次若しくは所定の時点で、管理対象となる情報群の中の各情報から複数の特徴量を取得し、前記特徴量を取得した情報から特定の属性を有する情報を選択し、前記取得した複数の特徴量の中から、選択された情報の特徴量の値と選択されなかった情報の特徴量の値との差が大きい特徴量を基準特徴量として求め、前記基準特徴量の値に基づいて情報を管理することを特徴とする。
【００２９】
具体的には、管理対象となる情報群の中の各情報から複数の特徴量を取得する特徴量取得手段と、前記管理対象となる情報群の中から特定の属性を有する情報を選択する情報選択手段と、前記特徴量取得手段において取得された複数の特徴量を組み替えて複数の特徴量ベクトルを構成し、前記選択された情報の特徴量ベクトルと前記選択されなかった情報の特徴量ベクトルとのベクトル間距離に基づいて、前記選択された情報と前記選択されなかった情報とを区別するために有効な基準特徴量ベクトルを求める基準特徴量ベクトル選択手段とを含み、逐次若しくは所定の時点で前記基準特徴量ベクトルを構成し、その値に基づいて情報を管理することを特徴とする。
【００３０】
ここで、前記情報選択手段は、ユーザに情報を選択させることが好適である。
【００３１】
また、別の具体的態様は、互いに共通する属性を有する複数の情報群と当該共通する属性を有さない情報群とを分別する情報分別手段と、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とに含まれる情報の各々から複数の特徴量を取得する特徴量取得手段と、前記特徴量取得手段において取得された複数の特徴量を組み替えて複数の特徴量ベクトルを構成し、前記共通する属性を有する情報群の特徴量ベクトルと前記共通する属性を有さない情報群の特徴量ベクトルとのベクトル間距離に基づいて、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とを区別する基準となる基準特徴量ベクトルを抽出する基準特徴量ベクトル選択手段とを含み、逐次若しくは所定の時点で前記基準特徴量ベクトルを構成し、その値に基づいて情報を管理することを特徴とする。
【００３２】
ここで、前記情報分別手段は、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とをユーザに分別させることが好適である。
【００３３】
さらに、前記基準特徴量ベクトル選択手段は、前記共通する属性を有する情報群における特徴量の平均値から構成される平均特徴量ベクトルと、前記共通する属性を有さない情報群における特徴量の平均値から構成される平均特徴量ベクトルと、のベクトル間距離が最も大きくなる平均特徴量ベクトルを基準特徴量ベクトルとして選択することが好ましい。
【００３４】
さらに、上記情報管理装置において、予備的検索のキーとなるキー情報を取得するキー情報取得手段をさらに含み、前記情報選択手段は、前記キー情報に基づいて予備的に検索された情報群の中から特定の特徴を有する情報を選択するものとしても良い。
【００３５】
また、上記情報管理装置において、キー情報を取得するキー情報取得手段をさらに含み、前記特徴量取得手段は、前記キー情報から複数の特徴量を取得し、前記基準特徴量ベクトル選択手段は、前記特徴量取得手段においてキー情報から取得された複数の特徴量を組み替えて前記キー情報に対する複数の特徴量ベクトルを構成し、さらに前記共通する属性を有する情報群の特徴量ベクトルと前記キー情報の特徴量ベクトルとのベクトル間距離及び前記共通する属性を有さない情報群の特徴量ベクトルと前記キー情報の特徴量ベクトルとのベクトル間距離に基づいて前記基準特徴量ベクトルを選択するものとしても良い。
【００３６】
このとき、前記基準特徴量ベクトル選択手段において前記基準特徴量ベクトルが選択できない場合には、情報から所定の特徴量の値を取得し、当該所定の特徴量の値に基づいて情報を管理することが好適である。
【００３７】
上記課題を解決できる本発明は、複数の情報を互いの類似性に基づいて管理する情報管理プログラムであって、コンピュータに、管理対象となる情報群の中の情報から各々複数の特徴量を取得する特徴量取得工程と、前記特徴量を取得した情報の中から特定の属性を有する情報を選択する情報選択工程と、前記複数の特徴量を組み替えて複数の特徴量ベクトルを構成し、前記選択された情報の特徴量ベクトルと前記選択されなかった情報の特徴量ベクトルとのベクトル間距離に基づいて、前記選択された情報と前記選択されなかった情報とを区別するために有効な基準特徴量ベクトルを求める基準特徴量ベクトル選択工程とを含む処理を逐次若しくは所定の時点で実行させることを特徴とする。
【００３８】
上記課題を解決できる本発明は、複数の情報を互いの類似性に基づいて管理する情報管理プログラムであって、コンピュータに、互いに共通する属性を有する複数の情報群と当該共通する属性を有さない情報群とを分別する情報分別工程と、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とに含まれる情報の各々から複数の特徴量を取得する特徴量取得工程と、前記複数の特徴量を組み替えて複数の特徴量ベクトルを構成し、前記共通する属性を有する情報群の特徴量ベクトルと前記共通する属性を有さない情報群の特徴量ベクトルとのベクトル間距離に基づいて、前記共通する属性を有する情報群と前記共通する属性を有さない情報群とを区別する基準となる基準特徴量ベクトルを抽出する基準特徴量ベクトル選択工程とを含む処理を逐次若しくは所定の時点で実行させることを特徴とする。
【００３９】
【発明の実施の形態】
＜情報処理装置の構成＞
本発明の実施の形態における情報管理装置１００は、図１に示すように、処理部１０、記憶部１２、入力部１４、出力部１６及び外部インターフェース１８から基本的に構成される。これら各部は、バス２０を介して、互いに情報伝達可能に接続される。
【００４０】
本実施の形態における情報管理装置１００は、図２に示すように、装置外部のネットワーク２００を介して、他の情報処理装置１０２（コンピュータ等）、画像読取装置１０４（スキャナ等）、画像形成装置１０６（プリンタ、コピー機又はこれらの複合機等）、携帯端末１０８（携帯電話）等と情報交換可能に接続される。情報管理装置１００は、ネットワーク２００を介して、ユーザから情報管理の実行命令を受けて遠隔処理を行ったり、他の情報機器に蓄積されている情報を検索対象として利用したりすることができる。
【００４１】
処理部１０は、記憶部１２に記憶されている情報管理プログラムを実行し、検索のキーとなるキー情報を入力部１４や外部インターフェース１８から受け、情報を管理する。処理については後に詳細に説明する。
【００４２】
記憶部１２は、情報管理プログラムや検索対象となる情報のデータベース等を記憶する。また、キー情報や検索処理の中間処理結果等を一時的に格納及び保持する。これらの情報は、バス２０を介して、処理部１０から適宜参照されて処理に供される。記憶部１２としては、半導体メモリ、ハードディスク装置、光ディスク装置等を用いることができる。
【００４３】
入力部１４は、ユーザからのプログラム実行の指令や処理に必要なパラメータを受け付ける。また、情報が画像情報である場合には、データベースを構築したり、キー情報となるキー画像データを入力したりするために、画像データの読み込みを行う。入力部１４としては、マウス等のポインティングデバイス、キーボード、スキャナ等を用いることができる。また、出力部１６は、ユーザに対して処理結果や処理に必要な情報を提示する。出力部１６としては、ディスプレイ装置、プリンタ等を用いることができる。また、外部インターフェース１８は、情報管理装置とネットワークとを情報伝達可能に接続する。例えば、ＴＣＰ／ＩＰ等の標準化された通信プロトコルを用いて情報管理装置と外部の情報処理装置の情報のやり取りを可能にする。
【００４４】
以上の説明のように、本実施の形態の情報管理装置は、一般的に広く用いられているコンピュータを用いて実現することができる。
【００４５】
＜第１の情報管理方法＞
本実施の形態における第１の情報管理方法を、図３に示すフローチャートに沿って詳細に説明する。ここでは、一例として、キー情報となる画像データ（以下、キー画像データという）に類似する情報を検索対象となる検索対象画像データ群から検索する方法について説明する。
【００４６】
本実施の形態における第１の情報管理方法をコンピュータで処理可能なプログラムとしてコーディングし、当該プログラムを記憶部１２に格納及び保持し、処理部１０において実行することによって本実施の形態における情報管理装置で処理することができる。また、以下の説明では、記憶部１２に検索対象となる画像データ群がデータベースとして予め格納及び保持されているものとする。
【００４７】
ステップＳ１０のキー情報取得工程では、検索のキーとなるキー情報を取得する。ユーザに対してキー情報の入力を促す画面を出力部１６に表示させ、ユーザにキー情報を入力部１４から入力させる。ユーザは、例えば、スキャナ等の画像読取装置を用いてキーとなる画像データ（以下、本実施の形態においてキー画像データという）を読み取らせ、情報管理装置に入力することができる。取得されたキー画像データは記憶部１２に格納され、以下の処理において処理部１０等から適宜参照される。
【００４８】
ステップＳ１２のキー情報特徴量取得工程では、キー画像データから特徴量の値を取得する。このとき、キー画像データから抽出可能な全種類の特徴量の値を抽出し、全特徴量から特徴量ベクトルを構成することができる。一方、予めキー画像データの特徴を良く表す特徴量が特定できる場合には、それらの特徴量の値のみを抽出して特徴量ベクトルを構成しても良い。特徴量は、例えば、ＲＧＢ色空間における各成分、Ｌ＊ａ＊ｂ＊色空間における各成分、明度、彩度、輝度、コントラスト等の互いに独立した変量から選択して用いることができる。
【００４９】
また、画像データを構成する全画素の各々から特徴量の値を抽出しても良いし、画像データを所定サイズのメッシュ状に分割し、各メッシュ区画に含まれる画素値の平均値を特徴量の値として取得しても良い。
【００５０】
例えば、図４に示すように、キー画像データＡをｎ×ｍ個のメッシュに分割し、各メッシュ区画（ｉ，ｊ）（ｉ＝１〜ｎ，ｊ＝１〜ｍ）からα個の特徴量の値Ａｋ_ｉｊ（ｋ＝１〜α）を取得することによって、キー画像データＡに対してｎ×ｍ×α次元の特徴量ベクトルＶ_Ａ（Ａ１_１１，Ａ２_１１，・・・・，Ａｋ_ｉｊ，・・・・，Ａα_ｎｍ）を得ることができる。
【００５１】
ステップＳ１４の特徴量取得工程では、データベースに含まれる検索対象となる画像データ（以下、検索対象画像データという）からキー情報から取得した特徴量と同一の特徴量を取得する。特徴量の取得は、ステップＳ１２と同様に行うことができる。
【００５２】
例えば、データベース内に検索対象画像データＢｇ（ｇ＝１，２，３・・・）が含まれている場合には、各検索対象画像データＢｇをキー画像データＡと同一のｎ×ｍ個のメッシュ区画に分割し、各メッシュ区画（ｉ，ｊ）（ｉ＝１〜ｎ，ｊ＝１〜ｍ）からキー画像データＡから抽出したものと等しいα個の特徴量の値Ｂｇｋ_ｉｊ（ｋ＝１〜α）を抽出する。これらの特徴量を用いて、各検索対象画像データＢｇについてｎ×ｍ×α次元の特徴量ベクトルＶ_Ｂｇ（Ｂｇ１_１１，Ｂｇ２_１１，・・・・，Ｂｇｋ_ｉｊ，・・・・，Ｂｇα_ｎｍ）を構成することもできる。
【００５３】
ステップＳ１６の予備検索工程では、キー画像データから取得された特徴量から構成される特徴量ベクトルを用いて、検索対象画像データ群からキー画像データに類似する画像データを予備的に検索する。
【００５４】
この工程では、上記従来技術と同様に、検索対象画像データの各々から抽出された特徴量から構成される特徴量ベクトルとキー画像データの特徴量ベクトルとのベクトル間距離を算出する。このベクトル間距離が小さいほど検索対象画像データとキー画像データとの類似性が高いものと判断できるため、ベクトル間距離が小さい順に所定数の検索対象画像データを選択する。
【００５５】
次に、キー画像データの特徴量ベクトルと検索対象画像データの特徴量ベクトルとのベクトル間距離ｄを数式（２）を用いて算出する。以下の説明において、他のベクトル間距離も同様に算出することができる。
【００５６】
【数２】

【００５７】
データベースに含まれる検索対象画像データの全てに対してキー画像データとのベクトル間距離ｄを算出し、ベクトル間距離ｄが小さい順に所定数の画像データを抽出する。
【００５８】
なお、この予備検索工程は、データベースのなかからキー画像データに類似する検索対象画像データを比較的広い範囲で絞り込み、後にユーザに選択させることを目的としている。従って、検索された画像データをユーザが確認し、その中から選択可能な程度の数（例えば、数個から数十個）の画像データが抽出されるように特徴量の数を定めておくことが好ましい。
【００５９】
ステップＳ１８の情報選択工程では、ステップＳ１６の予備検索工程で検索された画像データをユーザに提示し、その中から最終的な検索結果として相応しいと考えられる基準となる複数の画像データ（以下、基準画像データという）を選択させる。
【００６０】
例えば、予備検索工程において検索された画像データのサムネイルを出力部１６に表示させ、その中からユーザがキー画像データに類似していると判断するものをマウス等のポインティングデバイスを用いて選択させることが好適である。
【００６１】
ステップＳ２０の基準特徴量ベクトル選択工程では、検索対象画像データ毎に既に取得された複数の特徴量の中から少なくとも１つの特徴量を順次選択し、選択した特徴量を組み合わせた特徴量ベクトルを構成する。
【００６２】
例えば、画像データＢ１に対して特徴量ベクトルＶ_Ｂ１を構成するｎ×ｍ×α個の特徴量（Ｂ１１_１１，Ｂ１２_１１，・・・・，Ｂ１ｋ_ｉｊ，・・・・，Ｂ１α_ｎｍ）（ｋ＝１〜α，ｉ＝１〜ｎ，ｊ＝１〜ｍ）が取得されているとする。この場合、図５に示すように、１つの特徴量Ｂ１ｋ_ｉｊのみから構成される特徴量ベクトル、２つの特徴量Ｂ１ｋ_ｉｊ及びＢ１ｇ_ｑｒ（ｇ＝１〜α（ｋを除く），ｑ＝１〜ｎ（ｉを除く），ｒ＝１〜ｍ（ｊを除く））との組合せから構成される特徴量ベクトル・・・、と各特徴量を順次組み合わせて得られる特徴量ベクトルを構成する。
【００６３】
次に、検索対象画像データの各々に対して構成された複数の特徴量ベクトルのうち、基準画像データとして選ばれた画像データ群と選ばれなかった検索対象画像データ群とを区別するために有効な特徴量ベクトルを選び出す。すなわち、ユーザが主観的に選択した基準画像データ間においてベクトル間距離が小さく、基準画像データ群と他の検索対象画像データ群とのベクトル間距離が大きくなる特徴量ベクトルを基準特徴量ベクトルとして選択する。
【００６４】
まず、基準画像データの中から１つの画像データを選択し、その他の画像データとの間において、同一の特徴量を構成要素とする特徴量ベクトル同士のベクトル間距離を算出する。全画像データに対して特徴量ベクトルのベクトル間距離が算出されると、ベクトル間距離が小さい画像データが上位となるように画像データをソートする。その結果、基準画像データが基準画像データでない検索対象画像データよりも上位となっていた場合にその特徴量ベクトルを基準特徴量ベクトルとして選択する。
【００６５】
以下、具体例を用いて説明する。ステップＳ１６の情報選択工程において基準画像データＢ１及びＢ２が選択されたとすると、基準画像データＢ１及びＢ２のうちいずれか１つ、例えば、基準画像データＢ１を１つ選択し、その基準画像データＢ１と他の画像データＢｇ（ｇ＝２，３・・・）との間で特徴量ベクトルのベクトル間距離を算出する。このとき、同一の特徴量を構成要素とする特徴量ベクトル同士のベクトル間距離を算出する。
【００６６】
例えば、基準画像データＢ１と画像データＢ２との間でＢ１１_１１からなる特徴量ベクトルとＢ２１_１１からなる特徴量ベクトルとのベクトル間距離を算出する。次に、基準画像データＢ１と画像データＢ３との間でＢ１１_１１からなる特徴量ベクトルとＢ３１_１１からなる特徴量ベクトルとのベクトル間距離を算出する。同様に、基準画像データＢ１と他の画像データとのベクトル間距離も算出する。
【００６７】
このように算出されたベクトル間距離を比較し、ベクトル間距離が小さい順に画像データを並べる。同様に、他の特徴量の組合せからなる特徴量ベクトル同士についてもベクトル間距離を算出し、ベクトル間距離が小さい順に画像データをソートする。
【００６８】
その結果、基準画像データＢ２に対するベクトル間距離が他の画像データに対するベクトル間距離よりも小さくなる特徴量ベクトルを基準特徴量ベクトルとして選定する。
【００６９】
基準画像データが基準画像データでない画像データよりも上位となる特徴量ベクトルが多数ある場合、いずれの基準特徴量や基準特徴量ベクトルも基準画像データと他の画像データとを区別するために有効であるといえるが、例えば、今後の処理負担を軽減するためにはできるだけ少数の基準特徴量から構成される基準特徴量ベクトルを採択することが好ましい。一方、厳密に検索処理を行うことを目的とする場合には、より多数の基準特徴量から構成される基準特徴量ベクトルを採択することが好ましい。また、基準画像データに対するベクトル間距離と他の画像データに対するベクトル間距離との差が最も大きい特徴量ベクトルを採択することも好適である。
【００７０】
このようにして得られた特徴量ベクトルを構成する特徴量が基準特徴量となる。
【００７１】
なお、本実施の形態では全検索対象画像データに対して処理を行ったが、これらの処理をステップＳ１６で予備検索された画像データ群のみに限定して行うことも好適である。すなわち、予備検索された画像データのうちユーザがキー画像データに類似するものとして選択した基準画像データと選択されなかった画像データとの特徴量ベクトルのベクトル間距離を算出し、そのベクトル間距離に基づいて基準画像データとその他の画像データを区別する基準特徴量ベクトルを求めることもできる。このように、処理する画像データ数を限定することによって、処理負担を低減することができる。
【００７２】
また、本実施の形態では、全ての処理を１つの情報管理装置によって行うものとして説明したが、これに限られるものではない。例えば、ネットワーク等による情報伝達を用いて、キー情報取得工程と情報選択工程を情報管理装置以外の外部端末を用いてユーザに行わせ、キー情報特徴量取得工程、特徴量取得工程、予備検索工程及び基準特徴量ベクトル選択工程を情報管理装置に行わせる形態としても良い。
【００７３】
また、検索処理を確実に行うために、基準特徴量ベクトルが選択できない場合には、情報から所定の特徴量の値を取得し、当該所定の特徴量の値に基づいて検索を行うことが好適である。
【００７４】
以上のように、本実施の形態によれば、基準画像データと他の画像データとを区別するために有効な基準特徴量ベクトル及び基準特徴量を求めることができる。以降、検索を続ける場合に、キー画像データから基準特徴量ベクトルを構成する基準特徴量の値を抽出して検索を行うことによって、ユーザの検索目的に合致した検索を行うことができる。
【００７５】
例えば、ステップＳ１２のキー情報特徴量取得工程及びステップＳ１４の特徴量取得工程において選択された基準特徴量の値のみを抽出するようにフィードバックすることによって、本実施の形態の検索処理を重ねる度にユーザの趣向により合致した検索を実現することもできる。
【００７６】
また、特徴量ベクトルを構成する特徴量の数が非常に多い場合や分散が大きい特徴量しか含まれない場合にも、特徴量ベクトルを構成する特徴量の数を十分に低減できる。さらに、従来技術のように所定の特徴量の空間軸上において距離が大きいという理由のみで本来画像データの特徴を良く表した特徴量ベクトルが排除される問題を回避することができ、ユーザにとって望ましい検索結果を得ることができる。
【００７７】
さらに、実際の検索作業を行うまえに事前に基準特徴量や基準特徴量ベクトルを抽出しておくこともできるので、「ピアツーピア」によるネットワーク検索にも対応することができる。
【００７８】
なお、本実施の形態では、画像データを対象として説明を行ったが、本発明の対象はこれに限られるものではない。例えば、文字を含む文書情報であれば、形態素解析等によって特徴量を抽出し、それらの特徴量に対して同様の処理を適用することができる。
【００７９】
＜第２の情報管理方法＞
本実施の形態における第２の情報管理方法を、図に示すフローチャートに沿って詳細に説明する。ここでは、管理対象となる対象画像データを予め分別されたグループに振り分ける処理について説明する。
【００８０】
本実施の形態における第２の情報管理方法をコンピュータで処理可能なプログラムとしてコーディングし、当該プログラムを記憶部１２に格納及び保持し、処理部１０において実行することによって上記本実施の形態における情報管理装置で処理することができる。また、以下の説明では、記憶部１２に画像データ群がデータベースとして予め格納及び保持されているものとする。
【００８１】
ステップＳ３０の情報分別工程では、データベースに格納されている画像データ群を、その画像データの属性に基づいて複数のグループに分別する。例えば、データベースに格納されている画像データのサムネイルをディスプレイ等の出力装置に表示させ、マウス等のポインティングデバイスを用いていずれのグループに属する画像データであるかをユーザに選択させることが好ましい。
【００８２】
画像データを分別する基準となる属性としては、例えば、「人物」を撮影した写真画像、「風景」を撮影した写真画像等のようにユーザが分別して管理したいと考える属性を採用すれば良い。
【００８３】
ステップＳ３２のキー情報取得工程では、ユーザが特定のグループの特徴を良く表していると考える情報、例えば画像データ（以下、本実施の形態においてキー画像データという）、及びその属性を取得する。上記ステップＳ１０と同様に、出力装置に情報の入力を促す画面を表示させ、ユーザに分別の対象となるキー画像データ及びどのグループに属するかを示す属性を入力させる。
【００８４】
ステップＳ３４の特徴量取得工程では、データベースに含まれる各画像データ及びキー画像データから特徴量を抽出する。特徴量の抽出は、上記ステップＳ１２と同様に行うことができる。特徴量は、例えば、ＲＧＢ色空間における各成分、Ｌ＊ａ＊ｂ＊色空間における各成分、明度、彩度、輝度、コントラスト等の互いに独立した変量から選択して用いることができる。
【００８５】
また、画像データを構成する全画素の各々から特徴量を抽出しても良いし、画像データをメッシュ状に区画し、各メッシュ区画に含まれる画素値を平均化して特徴量として抽出しても良い。
【００８６】
ステップＳ３６の特徴量寄与度抽出工程では、各画像データから特徴量を抽出し、その特徴量が画像データをグループとして区別するためにどれだけ寄与しているかを示す寄与度を算出する。
【００８７】
まず、１つのグループに含まれる画像データから抽出された各特徴量の値を平均し、その特徴量の平均値からなる全平均特徴量ベクトルＶ_ｇを構成する。例えば、１つの画像データから明度及び輝度の特徴量が抽出されている場合には、１つのグループに含まれる全画像データの明度について平均値を算出する。同様に、輝度についても平均値を算出する。そして、明度及び輝度の平均値によって全平均特徴量ベクトルＶ_ｇを構成する。
【００８８】
データベースに含まれる全グループについて平均特徴量ベクトルが算出されると、各グループ間において全平均特徴量ベクトルのベクトル間距離ｄ_ｇを算出する。このベクトル間距離ｄ_ｇは、各グループに含まれる画像データの平均的な類似性を表す。
【００８９】
次に、１つのグループに含まれる画像データから抽出された特徴量のいずれか１つを除いて平均特徴量ベクトルＶ_ｃを構成する。上記の例の場合、例えば輝度の特徴量を除いて、明度の平均値のみから平均特徴量ベクトルＶ_ｃを構成する。各グループ間において平均特徴量ベクトルのベクトル間距離ｄ_ｃを算出する。
【００９０】
先に算出した全平均特徴量ベクトルのベクトル間距離ｄ_ｇと平均特徴量ベクトルのベクトル間距離ｄ_ｃとの差は、取り除いた特徴量が各グループを区別するためにどれだけ寄与しているかを示している。したがって、このベクトル間距離の差｜ｄ_ｇ−ｄ_ｃ｜を算出し、それを取り除いた特徴量の寄与度Ｅとする。
【００９１】
各画像データから多数の特徴量を抽出した場合にも同様に寄与度を計算することができる。例えば、各画像データからα個の特徴量Ｂｋ（ｋ＝１〜α）が抽出された場合には、α次元の全平均特徴量ベクトルＶ_ｇを求め、各グループ間において全平均特徴量ベクトルのベクトル間距離ｄ_ｇを算出する。次に、特徴量Ｂｋのいずれか１つを取り除いてα−１次元の平均特徴量ベクトルＶ_ｃｋを求め、各グループ間において平均特徴量ベクトルＶ_ｃｋのベクトル間距離ｄ_ｃｋを算出する。これらのベクトル間距離の差｜ｄ_ｇ−ｄ_ｃｋ｜が取り除いた特徴量Ｂｋの寄与度Ｅｋとなる。
【００９２】
ここまでに得られた寄与度Ｅｋに基づいて、各グループを区別するために有効な基準特徴量を選択しても良い。例えば、寄与度Ｅｋが所定の閾値以下である特徴量Ｂｋを除き、残った特徴量のみから構成される特徴量ベクトルを基準特徴量ベクトルとして選択しても良い。
【００９３】
しかしながら、本実施の形態では、以下の処理においてさらにグループを区別するために有効な特徴量の選択を行う。
【００９４】
ステップＳ３８の基準特徴量ベクトル選択工程では、キー画像データの特徴量ベクトルを用いて、さらに有効な特徴量に絞り込む。
【００９５】
データベースの各グループに含まれる各画像データ及びキー画像データに対して、抽出された特徴量を順次組み合わせて複数の特徴量ベクトルを構成する。このとき、図５に示すように、ステップＳ２０と同様に特徴量を組み合わせて様々な特徴量ベクトルを構成することができる。
【００９６】
次に、同一の特徴量から構成される特徴量ベクトルについてキー画像データと各画像データとのベクトル間距離を算出し、ベクトル間距離が小さい順に画像データをソートする。画像データがソートされると、キー画像データの属性と一致するグループに含まれる画像データの順位の平均値と、キー画像データの属性と一致しないグループに含まれる画像データの順位の平均値と、をグループ毎に算出する。このキー画像データの属性と一致するグループに対する順位の平均値と、対象画像の属性と一致しないグループに対する順位の平均値との差が、ユーザがグループを区別するためにキー画像データを選択した主観的な評価を客観的に示すものとなる。
【００９７】
すなわち、ソート結果の上位にキー画像データの属性と一致する画像データが多く存在し、下位にキー画像データの属性と一致しない画像データが存在する場合には、グループ間の順位の平均値の差は大きくなる。従って、その特徴量ベクトルはキー画像データを適切なグループに振り分けるために有効であると考えられる。一方、ソート結果の上位にキー画像データの属性と一致しない画像データが存在し、下位にキー画像データの属性と一致する画像データが多く存在する場合には、グループ間の順位の平均値の差は小さくなり、その特徴量ベクトルは有効ではないと考えられる。
【００９８】
このように取得された各特徴量ベクトルに対するグループ間の順位の平均値の差と先に取得した寄与率とによって、最もグループ同士を区別するために有効である基準特徴量ベクトルを選択する。例えば、特徴量ベクトル毎に求められたグループ間の順位の平均値の差に、その特徴量ベクトルに含まれる特徴量の寄与率を乗算した値が最も大きい特徴量ベクトルを基準特徴量ベクトルとして定めることができる。
【００９９】
また、本実施の形態では、全ての処理を１つの情報管理装置によって行うものとして説明したが、これに限られるものではない。例えば、ネットワーク等による情報伝達を用いて、情報分別工程とキー情報取得工程とを情報管理装置以外の外部端末を用いてユーザに行わせ、特徴量取得工程、特徴量寄与度抽出工程及び基準特徴量ベクトル選択工程を情報管理装置に行わせる形態としても良い。
【０１００】
また、処理を確実に行うために、基準特徴量ベクトルが選択できない場合には、情報から所定の特徴量の値を取得し、当該所定の特徴量の値に基づいて検索を行うことが好適である。
【０１０１】
以上のように、本実施の形態によれば、画像データをグループに分割するために有効な寄与率と、キーとなるキー画像データと各グループとの類似性を示すソート順位の平均値との両方に基づいて、基準特徴量ベクトルを選び出すことができる。以降、画像データの検索、照合、分類を行う際には、この基準特徴量ベクトルを構成する基準特徴量の値を用いて情報管理を行うことによって、画像データから不要な特徴量を抽出したり、不適切な特徴量を照合に利用したりすることがなくなるため、情報管理の処理速度を向上することができる。また、ユーザの主観を踏まえて基準特徴量が選択できるため、情報管理の精度を高めることができる。
【０１０２】
また、上記第１の情報管理方法と同様に、本来画像データの特徴を良く表した特徴量ベクトルが排除される問題を回避したり、「ピアツーピア」によるネットワーク検索にも対応することができる。
【０１０３】
なお、本実施の形態では、画像データを対象として説明を行ったが、本発明の対象はこれに限られるものではない。例えば、文字情報を含む文書データであれば、形態素解析等によって特徴量を抽出し、それらの特徴量に対して同様の処理を行うことができる。
【０１０４】
【発明の効果】
本発明によれば、情報の比較、分類、検索等を正確かつ高速に行う情報管理方法、情報管理装置及び情報管理プログラムを提供することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態における情報管理装置の構成を示すブロック図である。
【図２】本発明の実施の形態におけるダウンロード方法のフローチャートである。
【図３】本発明の実施の形態におけるダウンロード処理中の各記憶部の内容を示す図である。
【図４】本発明の実施の形態におけるダウンロード処理中の各記憶部の内容を示す図である。
【図５】本発明の実施の形態におけるダウンロード処理中の各記憶部の内容を示す図である。
【図６】本発明の実施の形態におけるダウンロードプログラム及びその記録媒体を用いた実施態様を示す図である。
【図７】第１の従来技術における情報処理装置の構成のブロック図である。
【図８】第１の従来技術におけるダウンロード処理中の各記憶部の内容を示す図である。
【符号の説明】
１０処理部、１２記憶部、１４入力部、１６出力部、１８外部インターフェース、２０バス、１００情報管理装置、１０２情報処理装置、１０４画像読取装置、１０６画像形成装置、１０８携帯端末、２００ネットワーク。
[0001]
[Technical field of the invention]
The present invention relates to an information management method, an information management device, and an information management program that manage information having a specific attribute while successively changing the selection criterion according to the purpose.
[0002]
[Prior art]
Conventionally, the mainstream way to manage information such as image data has been to attach additional information that identifies it, such as keywords, when the information is registered in a database. For example, to retrieve information similar to a given item from the database, keywords related to that item are used to select the information to which those keywords have been attached as additional information.
[0003]
However, such a method using the additional information requires that the additional information be added when registering the information, and has a disadvantage that the registering user is forced to perform a complicated registration operation.
[0004]
Therefore, methods have come into use that extract characteristic elements of the information as feature amounts, form a feature amount vector from them, and manage information based on the distance between the feature amount vectors of different pieces of information.
[0005]
For example, as shown in FIG. 7, suppose that reference image data serving as the management reference is divided into n × m mesh sections, and that saturation Ir_ij and lightness Br_ij are extracted as feature amounts from each mesh section (i, j). A two-dimensional feature amount vector is then obtained from each mesh section (i, j). Since the entire image data contains n × m mesh sections, an (n × m × 2)-dimensional feature amount vector is obtained for the image data as a whole. Similarly, when the target image to be managed is divided into the same mesh as the reference image, and saturation Io_ij and lightness Bo_ij are extracted as feature amounts from the mesh section (i, j) corresponding to each mesh section of the reference image, an (n × m × 2)-dimensional feature amount vector is likewise obtained.
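The mesh-based extraction described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the patent: the image is assumed to be a list of rows of (saturation, lightness) pairs, its height and width are assumed to be divisible by n and m, each section is summarized by the mean of its pixels, and the function name `mesh_feature_vector` is hypothetical.

```python
def mesh_feature_vector(pixels, n, m):
    """Return an (n*m*2)-element feature vector: the mean saturation and
    mean lightness of each mesh section, section by section.

    pixels: list of rows, each row a list of (saturation, lightness) pairs.
    Assumes len(pixels) is divisible by n and the row length by m.
    """
    height, width = len(pixels), len(pixels[0])
    cell_h, cell_w = height // n, width // m
    vector = []
    for i in range(n):
        for j in range(m):
            # Collect every pixel that falls inside mesh section (i, j).
            cell = [
                pixels[y][x]
                for y in range(i * cell_h, (i + 1) * cell_h)
                for x in range(j * cell_w, (j + 1) * cell_w)
            ]
            count = len(cell)
            vector.append(sum(s for s, _ in cell) / count)  # saturation
            vector.append(sum(b for _, b in cell) / count)  # lightness
    return vector
```

Applying the same n and m to the reference image and to every target image keeps the resulting vectors component-wise comparable.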
[0006]
Subsequently, as shown in FIG. 8, a distance d between the feature amount vector of the reference image obtained in this way and the feature amount vector of the management target image is obtained. The distance d between the two feature amount vectors can be obtained by Expression (1).
[0007]
(Equation 1)
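Expression (1) is not reproduced in this text. One plausible reading, assuming the ordinary Euclidean distance over the mesh features defined in paragraph [0005] (the choice of metric is an assumption and is not confirmed by the source), would be:

```latex
d = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}
      \left[\left(\mathit{Ir}_{ij}-\mathit{Io}_{ij}\right)^{2}
          + \left(\mathit{Br}_{ij}-\mathit{Bo}_{ij}\right)^{2}\right]}
```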

[0008]
This distance d indicates the dissimilarity between the reference image and the target image: the larger d is, the less similar the two images are, and the smaller d is, the more similar they are. Information can therefore be managed by comparing, classifying, or searching images according to the similarity given by the distance d between the feature amount vectors of the reference image data and the target image data.
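As a rough illustration of distance-based search, the sketch below ranks target vectors by their distance to a reference vector. The Euclidean metric and the names `distance` and `most_similar` are assumptions made for the example; the source does not prescribe them.

```python
import math

def distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar(reference, targets, count):
    """Return the names of the `count` targets closest to `reference`.

    targets: dict mapping a name to its feature vector.
    """
    # A smaller distance means a higher similarity, so sort ascending.
    ranked = sorted(targets, key=lambda name: distance(reference, targets[name]))
    return ranked[:count]
```

For instance, `most_similar(key_vector, database, 20)` would return a shortlist of twenty candidates, where `key_vector` and `database` are placeholders for data prepared elsewhere.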
[0009]
In an information management method using such feature amount vectors, many types of feature amounts must be extracted from each piece of information in order to obtain more accurate search results. However, extracting more feature amounts than necessary, so that the feature amount vector has a large number of feature-space axes, not only increases the processing time for matching feature amount vectors but can also reduce search accuracy through the influence of redundant feature amounts.
[0010]
Therefore, a method is often used in which principal component analysis is applied to the feature amount vectors formed from the extracted feature amounts, and the number of feature amounts constituting the feature amount vector is kept small by preserving the orthogonality of the feature amount vectors while excluding feature amounts with small variance.
[0011]
Japanese Patent Application Laid-Open No. 2000-112943 discloses a data detection method in which a list is prepared for each feature amount with the database data sorted by the value of that feature amount. For each feature amount selected in turn from a base index, a pointer into the list is advanced to data in ascending order of the difference between its component value and that of the test data. Whether an end condition is satisfied is determined from the difference in that one feature amount between the data indicated by the pointer and the test data. If it is not satisfied, whether a rejection condition is satisfied is determined from the distance between the data indicated by the pointer and the test data in a subspace. If that is not satisfied either, the distance between the two in the full space is calculated, and a predetermined number of data items are extracted in ascending order of the calculated distance.
[0012]
Japanese Patent Application Laid-Open No. 2001-134573 discloses a similar-data search method in which cell spaces that partition the multidimensional space containing the points indicated by feature amount vectors are constructed hierarchically, and the feature amount vectors are managed uniquely using bit strings that represent each cell space in the hierarchy. In a search by this method, the cell spaces are restored from the cell width used at partitioning, the distance between the point in the multidimensional space indicated by the feature amount vector of the search key and each cell space is calculated, and the candidate cell spaces are narrowed down by the calculated distance. The points of the feature amount vectors contained in the candidate cell spaces are then searched based on their distance from the point of the feature amount vector of the search key.
[0013]
[Patent Document 1]
JP-A-2000-112943
[Patent Document 2]
JP-A-2001-134573
[0014]
[Problems to be solved by the invention]
However, when the number of feature amounts constituting the feature amount vector is very large, or when only feature amounts with large variance are included, there was a problem that principal component analysis could not sufficiently reduce the number of feature amounts constituting the feature amount vector.
[0015]
On the other hand, with the methods disclosed in JP-A-2000-112943 and JP-A-2001-134573, there was a problem that a feature amount that actually represents the characteristics of the information well is excluded simply because the distance along the spatial axis of that feature amount is large.
[0016]
Furthermore, in these prior arts the number of feature amounts constituting the feature amount vector is reduced without regard to the validity of the search results, which entails the fatal problem that the reduction can yield search results that are undesirable for the user.
[0017]
In addition, with any of the above prior art methods, it was necessary to extract a large number of feature amounts of many kinds from the information to be searched and to perform high-load processing such as principal component analysis. Moreover, when search targets arise on demand, as in a so-called "peer-to-peer" network search, feature amount extraction and principal component analysis cannot be performed in advance, so the above prior art was difficult to use.
[0018]
In view of the above problems of the prior art, and in order to solve at least one of them, an object of the present invention is to provide an information management method, an information management device, and an information management program that manage information by selecting appropriate feature amounts according to the information to be managed.
[0019]
[Means for Solving the Problems]
The present invention, which can solve the above problems, is an information management method for managing a plurality of pieces of information based on their mutual similarity. It is characterized by, sequentially or at a predetermined time: acquiring a plurality of feature amounts from each piece of information in the information group to be managed; selecting information having a specific attribute from the information for which the feature amounts were acquired; obtaining as a reference feature amount, from among the acquired feature amounts, a feature amount for which the difference between its value for the selected information and its value for the unselected information is large; and managing information based on the value of the reference feature amount.
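One simple way to picture the selection of reference feature amounts is sketched below. It is a hedged illustration only: comparing group means, keeping a fixed number of features, and the name `reference_features` are all assumptions introduced for the example, since the text states only that the difference between selected and unselected information should be large.

```python
def reference_features(selected, unselected, count):
    """Return the indices of the `count` features whose mean value differs
    most between the selected and the unselected items.

    selected, unselected: non-empty lists of equal-length feature lists.
    """
    def mean(items, k):
        return sum(item[k] for item in items) / len(items)

    num_features = len(selected[0])
    gaps = [
        abs(mean(selected, k) - mean(unselected, k))
        for k in range(num_features)
    ]
    # Sort feature indices by descending gap; ties keep index order.
    order = sorted(range(num_features), key=lambda k: -gaps[k])
    return order[:count]
```

Subsequent comparisons would then use only the returned feature indices, which is what keeps the later matching cost low.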
[0020]
More specifically, the method includes: a feature amount acquisition step of acquiring a plurality of feature amounts from each piece of information in the information group to be managed; an information selection step of selecting information having a specific attribute from the information for which the feature amounts were acquired; and a reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors for each piece of information and obtaining, based on the inter-vector distance between the feature amount vectors of the selected information and those of the unselected information, a reference feature amount vector effective for distinguishing the selected information from the unselected information. The method is characterized in that the reference feature amount vector is constructed sequentially or at a predetermined time and information is managed based on its values.
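The reference feature amount vector selection step can be sketched as an exhaustive search over feature combinations, loosely following the ranking test in paragraphs [0061] to [0064] of the description. The sketch makes several assumptions that are not in the source: Euclidean distance, the first selected item as the comparison anchor, smaller subsets tried first, and the hypothetical names `sub_distance` and `find_reference_subset`. The search is exponential in the number of features, so it only suits small feature sets.

```python
import math
from itertools import combinations

def sub_distance(u, v, subset):
    """Euclidean distance restricted to the feature indices in `subset`."""
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in subset))

def find_reference_subset(selected, unselected):
    """Return the smallest feature-index tuple under which every other
    selected item is closer to the first selected item than any
    unselected item is, or None if no such subset exists.

    selected and unselected must both be non-empty.
    """
    anchor, others = selected[0], selected[1:]
    num_features = len(anchor)
    for size in range(1, num_features + 1):
        for subset in combinations(range(num_features), size):
            farthest_selected = max(
                (sub_distance(anchor, item, subset) for item in others),
                default=0.0,
            )
            nearest_unselected = min(
                sub_distance(anchor, item, subset) for item in unselected
            )
            # Selected items all rank above unselected ones for this subset.
            if farthest_selected < nearest_unselected:
                return subset
    return None
```

A `None` result corresponds to the case in which no reference feature amount vector can be selected, where the text suggests falling back to predetermined feature amounts.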
[0021]
Here, in the information selecting step, it is preferable that the user selects information.
[0022]
Another specific aspect includes: an information sorting step of separating a plurality of information groups having a mutually common attribute from an information group not having that common attribute; a feature amount acquisition step of acquiring a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having it; and a reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors and extracting, based on the inter-vector distance between the feature amount vectors of the information group having the common attribute and those of the information group not having it, a reference feature amount vector that serves as the criterion for distinguishing the two groups. The reference feature amount vector is constructed sequentially or at a predetermined time, and information is managed based on its values.
[0023]
Here, it is preferable that in the information sorting step, the user sorts the information group having the common attribute and the information group having no common attribute.
[0024]
Further, the reference feature amount vector selection step preferably selects, as the reference feature amount vector, the average feature amount vector for which the inter-vector distance is largest between the average feature amount vector composed of the average values of the feature amounts in the information group having the common attribute and the average feature amount vector composed of the average values of the feature amounts in the information group not having the common attribute.
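The average-vector criterion can be illustrated with the sketch below. It assumes Euclidean distance and compares only candidate vectors of one fixed size, because the Euclidean distance never decreases when a feature is added and an unrestricted comparison would always favor the full feature set. The fixed size and the names `mean_vector` and `best_separating_subset` are assumptions made for the example.

```python
import math
from itertools import combinations

def mean_vector(group):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(group) for column in zip(*group)]

def best_separating_subset(with_attr, without_attr, size):
    """Return the `size`-element tuple of feature indices that maximizes
    the Euclidean distance between the two groups' average vectors."""
    mean_a, mean_b = mean_vector(with_attr), mean_vector(without_attr)

    def gap(subset):
        return math.sqrt(sum((mean_a[k] - mean_b[k]) ** 2 for k in subset))

    return max(combinations(range(len(mean_a)), size), key=gap)
```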
[0025]
Furthermore, the information management method may further include a key information acquisition step of acquiring key information serving as the key of a preliminary search, and the information selection step may select information having a specific feature from the information group preliminarily retrieved based on the key information.
[0026]
The information management method may also further include a key information acquisition step of acquiring key information, wherein the feature amount acquisition step acquires a plurality of feature amounts from the key information, and the reference feature amount vector selection step recombines the feature amounts acquired from the key information to form a plurality of feature amount vectors for the key information. The reference feature amount vector is then selected based on the inter-vector distance between the feature amount vectors of the information group having the common attribute and the feature amount vector of the key information, and on the inter-vector distance between the feature amount vectors of the information group not having the common attribute and the feature amount vector of the key information.
[0027]
At this time, if the reference feature vector cannot be selected in the reference feature vector selection step, it is preferable to acquire a value of a predetermined feature from the information and to manage the information based on the value of the predetermined feature.
[0028]
The present invention for solving the above problems is also an information management device that manages a plurality of pieces of information based on their mutual similarity. Sequentially or at a predetermined time, the device acquires a plurality of feature amounts from each piece of information in an information group to be managed, selects information having a specific attribute from the information from which the feature amounts have been acquired, obtains as a reference feature amount, from among the acquired feature amounts, a feature amount whose value differs greatly between the selected information and the unselected information, and manages information based on the value of the reference feature amount.
[0029]
Specifically, the device includes: a feature amount obtaining unit that obtains a plurality of feature amounts from each piece of information in the information group to be managed; an information selecting unit that selects information having a specific attribute from the information group to be managed; and a reference feature vector selection means that forms a plurality of feature amount vectors by recombining the plurality of feature amounts obtained by the feature amount obtaining unit and, based on the inter-vector distance between the feature amount vector of the selected information and the feature amount vector of the unselected information, obtains a reference feature vector effective for distinguishing the selected information from the unselected information. The reference feature amount vector is formed sequentially or at a predetermined time, and information is managed based on its value.
[0030]
Here, it is preferable that the information selecting means allows a user to select information.
[0031]
Further, another specific mode includes: an information classification unit that classifies a plurality of pieces of information into an information group having a common attribute and an information group not having the common attribute; a feature amount obtaining unit that obtains a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having the common attribute; and a reference feature vector selection means that forms a plurality of feature amount vectors by recombining the plurality of feature amounts obtained by the feature amount obtaining unit and, based on the inter-vector distance between the feature vector of the information group having the common attribute and the feature vector of the information group not having the common attribute, extracts a reference feature vector serving as a reference for distinguishing the two groups. The reference feature vector is formed sequentially or at a predetermined time, and information is managed based on its value.
[0032]
Here, it is preferable that the information classification means allows a user to classify the information group having the common attribute and the information group not having the common attribute.
[0033]
Further, it is preferable that the reference feature vector selection means selects, as the reference feature vector, the average feature vector for which the distance between the average feature vector composed of the average values of the features in the information group having the common attribute and the average feature vector composed of the average values of the features in the information group not having the common attribute is largest.
[0034]
Further, the information management device may further include key information acquisition means for acquiring key information serving as a key of a preliminary search, and the information selection means may select information having a specific attribute from an information group preliminarily searched based on the key information.
[0035]
Further, the information management device may further include key information acquisition means for acquiring key information. In this case, the feature quantity acquisition means acquires a plurality of feature quantities from the key information, and the reference feature quantity vector selection means recombines the plurality of feature amounts obtained from the key information to form a plurality of feature amount vectors for the key information. The reference feature vector may then be selected based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature vector of the key information, and on the inter-vector distance between the feature vector of the information group not having the common attribute and the feature vector of the key information.
[0036]
At this time, when the reference feature amount vector cannot be selected by the reference feature amount vector selection means, it is preferable to obtain a value of a predetermined feature amount from the information and to manage the information based on the value of the predetermined feature amount.
[0037]
The present invention that can solve the above-described problems is also an information management program that manages a plurality of pieces of information based on their mutual similarity. The program causes a computer to execute, sequentially or at a predetermined time, processing including: a feature value acquiring step of acquiring a plurality of feature values from each piece of information in an information group to be managed; an information selecting step of selecting information having a specific attribute from the information from which the feature values have been acquired; and a reference feature vector selection step of forming a plurality of feature value vectors by recombining the plurality of feature values and, based on the inter-vector distance between the feature quantity vector of the selected information and the feature quantity vector of the unselected information, obtaining a reference feature quantity vector effective for distinguishing the selected information from the unselected information.
[0038]
The present invention that can solve the above-described problems is also an information management program that manages a plurality of pieces of information based on their mutual similarity. The program causes a computer to execute, sequentially or at a predetermined time, processing including: an information classification step of classifying a plurality of pieces of information into an information group having a common attribute and an information group not having the common attribute; a feature amount obtaining step of acquiring a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having the common attribute; and a reference feature vector selection step of forming a plurality of feature amount vectors by recombining the plurality of feature amounts and, based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the information group not having the common attribute, extracting a reference feature vector serving as a reference for distinguishing the two groups.
[0039]
BEST MODE FOR CARRYING OUT THE INVENTION
<Configuration of information processing device>
The information management device 100 according to the embodiment of the present invention basically includes a processing unit 10, a storage unit 12, an input unit 14, an output unit 16, and an external interface 18, as shown in FIG. These units are connected to each other via a bus 20 so that information can be transmitted.
[0040]
As shown in FIG. 2, the information management apparatus 100 according to the present embodiment is connected, via a network 200 outside the apparatus, to another information processing apparatus 102 (a computer or the like), an image reading apparatus 104 (a scanner or the like), an image forming apparatus 106 (a printer, a copier, or a multifunction peripheral thereof), a portable terminal 108 (a cellular phone or the like), and the like. The information management device 100 can perform remote processing by receiving an information management execution command from a user via the network 200, and can use information stored in another information device as a search target.
[0041]
The processing unit 10 executes the information management program stored in the storage unit 12, receives key information serving as a search key from the input unit 14 and the external interface 18, and manages the information. The processing will be described later in detail.
[0042]
The storage unit 12 stores an information management program, a database of information to be searched, and the like. In addition, it temporarily stores and holds key information, intermediate processing results of search processing, and the like. These pieces of information are appropriately referred to by the processing unit 10 via the bus 20 and provided for processing. As the storage unit 12, a semiconductor memory, a hard disk device, an optical disk device, or the like can be used.
[0043]
The input unit 14 receives a program execution command from a user and parameters required for processing. When the information is image information, image data is read in order to construct a database or to input key image data serving as key information. As the input unit 14, a pointing device such as a mouse, a keyboard, a scanner, or the like can be used. Further, the output unit 16 presents a processing result and information necessary for the processing to the user. As the output unit 16, a display device, a printer, or the like can be used. The external interface 18 connects the information management device and the network so that information can be transmitted. For example, it is possible to exchange information between the information management apparatus and an external information processing apparatus using a standardized communication protocol such as TCP/IP.
[0044]
As described above, the information management device according to the present embodiment can be realized using a computer that is generally widely used.
[0045]
<First information management method>
The first information management method according to the present embodiment will be described in detail with reference to the flowchart shown in FIG. Here, as an example, a method of searching for information similar to image data serving as key information (hereinafter referred to as key image data) from a search target image data group to be searched will be described.
[0046]
The first information management method according to the present embodiment is coded as a program that can be processed by a computer. The program is stored and held in the storage unit 12 and executed by the processing unit 10, so that the method can be processed by the information management apparatus according to the present embodiment. In the following description, it is assumed that a group of image data to be searched is stored and held in the storage unit 12 in advance as a database.
[0047]
In the key information acquisition step of step S10, key information serving as a search key is acquired. A screen prompting the user to input key information is displayed on the output unit 16, and the user inputs the key information from the input unit 14. For example, the user can read image data serving as the key (hereinafter referred to as key image data in the present embodiment) using an image reading device such as a scanner and input it to the information management device. The acquired key image data is stored in the storage unit 12, and is appropriately referred to by the processing unit 10 and the like in the following processing.
[0048]
In the key information feature amount obtaining step of step S12, the value of the feature amount is obtained from the key image data. At this time, values of all types of feature amounts that can be extracted from the key image data can be extracted, and a feature amount vector can be formed from all the feature amounts. On the other hand, if feature amounts that well represent the features of the key image data can be specified in advance, a feature amount vector may be configured by extracting only the values of those feature amounts. The feature amount can be selected and used from, for example, each component in the RGB color space, each component in the L*a*b* color space, and independent variables such as brightness, saturation, luminance, and contrast.
[0049]
Further, the value of the feature amount may be extracted from each of all the pixels constituting the image data, or the image data may be divided into meshes of a predetermined size and the average value of the pixel values included in each mesh section may be obtained as the feature amount.
[0050]
For example, as shown in FIG. 4, when the key image data A is divided into n × m mesh sections and the values $Ak_{ij}$ (k = 1 to α) of α feature amounts are obtained from each mesh section (i, j) (i = 1 to n, j = 1 to m), an n × m × α-dimensional feature amount vector $V_A = (A1_{11}, A2_{11}, \ldots, Ak_{ij}, \ldots, A\alpha_{nm})$ can be obtained for the key image data A.
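The mesh-based construction described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `mesh_feature_vector` and the representation of each feature type as a separate 2-D grid of pixel values are assumptions made here, and the image is assumed to have at least n × m pixels.

```python
from statistics import mean


def mesh_feature_vector(channels, n, m):
    """Flatten per-mesh averages into one n*m*alpha-dimensional vector.

    channels: list of alpha 2-D grids (lists of rows), one grid per feature
              type such as brightness or saturation (hypothetical layout).
    n, m:     number of mesh sections vertically and horizontally.
    """
    height, width = len(channels[0]), len(channels[0][0])
    vector = []
    for i in range(n):
        rows = range(i * height // n, (i + 1) * height // n)
        for j in range(m):
            cols = range(j * width // m, (j + 1) * width // m)
            # Within one mesh section (i, j), feature index k varies fastest,
            # matching the ordering (A1_11, A2_11, ..., A-alpha_nm) in the text.
            for grid in channels:
                vector.append(mean(grid[r][c] for r in rows for c in cols))
    return vector
```

With two feature grids and a 1 × 2 mesh, the result has 1 × 2 × 2 = 4 components, one average per feature per mesh section.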
[0051]
In the feature amount acquiring step of step S14, the same feature amount as the feature amount acquired from the key information is acquired from image data to be searched (hereinafter, referred to as search target image data) included in the database. The acquisition of the feature amount can be performed in the same manner as in step S12.
[0052]
For example, when the database includes search target image data Bg (g = 1, 2, 3, ...), each search target image data Bg is divided into the same n × m mesh sections as the key image data A, and the values $Bgk_{ij}$ (k = 1 to α) of the same α feature amounts as those extracted from the key image data A are extracted from each mesh section (i, j) (i = 1 to n, j = 1 to m). Using these feature amounts, an n × m × α-dimensional feature amount vector $V_{Bg} = (Bg1_{11}, Bg2_{11}, \ldots, Bgk_{ij}, \ldots, Bg\alpha_{nm})$ can also be configured for each search target image data Bg.
[0053]
In the preliminary search step of step S16, image data similar to the key image data is preliminarily searched from the search target image data group using a feature amount vector configured from the feature amounts acquired from the key image data.
[0054]
In this step, as in the above-described related art, the inter-vector distance between the feature vector composed of the feature extracted from each of the search target image data and the feature vector of the key image data is calculated. Since it can be determined that the similarity between the search target image data and the key image data is higher as the inter-vector distance is smaller, a predetermined number of the search target image data is selected in ascending order of the inter-vector distance.
[0055]
Next, an inter-vector distance d between the feature amount vector of the key image data and the feature amount vector of the search target image data is calculated using Expression (2). In the following description, other inter-vector distances can be similarly calculated.
[0056]
(Equation 2)

[0057]
The inter-vector distance d with the key image data is calculated for all the search target image data included in the database, and a predetermined number of image data is extracted in ascending order of the inter-vector distance d.
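The preliminary search can be sketched as below. The body of Expression (2) is not reproduced in this text, so the sketch assumes the inter-vector distance is the Euclidean distance; the function names and the dictionary layout mapping image identifiers to feature vectors are likewise hypothetical.

```python
import math


def euclidean(u, v):
    """Assumed inter-vector distance d (Euclidean); the source formula is not shown."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def preliminary_search(key_vector, target_vectors, count):
    """Return the ids of the `count` targets nearest to the key, nearest first.

    target_vectors: dict mapping an image id to its feature vector.
    """
    ranked = sorted(
        target_vectors,
        key=lambda image_id: euclidean(key_vector, target_vectors[image_id]),
    )
    return ranked[:count]
```

Choosing `count` in the range of several to several tens keeps the candidate list short enough for the user to review in the subsequent selection step.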
[0058]
This preliminary search step is intended to narrow down, over a relatively wide range, the search target image data in the database that is similar to the key image data, so that the user can make a selection later. Therefore, it is preferable that the user confirms the searched image data and that the number of feature values is determined so that a number of image data from which the user can select (for example, several to several tens) is extracted.
[0059]
In the information selection step of step S18, the image data retrieved in the preliminary search step of step S16 is presented to the user, and the user is made to select from it a plurality of image data (hereinafter referred to as reference image data) considered appropriate as the final search result.
[0060]
For example, it is preferable that thumbnails of the image data retrieved in the preliminary search step are displayed on the output unit 16 and that the user selects those determined to be similar to the key image data using a pointing device such as a mouse.
[0061]
In the reference feature vector selection step of step S20, at least one feature is sequentially selected from the plurality of features already acquired for each search target image data, and feature vectors combining the selected features are formed.
[0062]
For example, for the image data B1, the n × m × α feature amounts $(B11_{11}, B12_{11}, \ldots, B1k_{ij}, \ldots, B1\alpha_{nm})$ (k = 1 to α, i = 1 to n, j = 1 to m) constituting the feature amount vector $V_{B1}$ have been acquired. In this case, as shown in FIG. 5, feature amount vectors are configured by sequentially combining the feature amounts: a feature amount vector composed of only one feature amount $B1k_{ij}$, a feature amount vector composed of two feature amounts $B1k_{ij}$ and $B1g_{qr}$ (g = 1 to α (excluding k), q = 1 to n (excluding i), r = 1 to m (excluding j)), and so on.
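The sequential combination of feature amounts can be sketched as an enumeration of index subsets. This is an illustrative assumption: it enumerates every combination up to a chosen size with `itertools.combinations`, without applying the index exclusions listed above, and the names `candidate_subsets` and `project` are hypothetical.

```python
from itertools import combinations


def candidate_subsets(num_features, max_size):
    """Yield tuples of feature indices: all singles, then all pairs, and so on."""
    for size in range(1, max_size + 1):
        yield from combinations(range(num_features), size)


def project(vector, subset):
    """Build the reduced feature vector that keeps only the chosen components."""
    return [vector[index] for index in subset]
```

Because the number of subsets grows combinatorially with `max_size`, a practical run would cap the subset size or restrict the candidate features beforehand.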
[0063]
Next, from among the plurality of feature amount vectors configured for each of the search target image data, a feature amount vector effective for distinguishing the image data group selected as the reference image data from the unselected search target image data group is selected. That is, a feature amount vector for which the inter-vector distance is small among the reference image data subjectively selected by the user, and large between the reference image data group and the other search target image data group, is selected as the reference feature amount vector.
[0064]
First, one image data is selected from the reference image data, and an inter-vector distance between feature amount vectors having the same feature amount as a component is calculated with other image data. When the inter-vector distance of the feature amount vectors is calculated for all the image data, the image data is sorted so that the image data having the smaller inter-vector distance is ranked higher. As a result, when the reference image data is higher than the search target image data that is not the reference image data, the feature vector is selected as the reference feature vector.
[0065]
Hereinafter, a specific example will be described. Assuming that the reference image data B1 and B2 are selected in the information selection step of step S18, one of them, for example the reference image data B1, is selected, and the inter-vector distance of the feature amount vectors is calculated between the reference image data B1 and the other image data Bg (g = 2, 3, ...). At this time, the inter-vector distance is calculated between feature amount vectors having the same feature amounts as components.
[0066]
For example, between the reference image data B1 and the image data B2, the inter-vector distance between the feature amount vector consisting of $B11_{11}$ and the feature amount vector consisting of $B21_{11}$ is calculated. Next, between the reference image data B1 and the image data B3, the inter-vector distance between the feature amount vector consisting of $B11_{11}$ and the feature amount vector consisting of $B31_{11}$ is calculated. Similarly, the inter-vector distance between the reference image data B1 and other image data is calculated.
[0067]
The calculated inter-vector distances are compared, and the image data is arranged in ascending order of the inter-vector distance. Similarly, the inter-vector distance is calculated for the feature amount vectors composed of other combinations of the feature amounts, and the image data is sorted in ascending order of the inter-vector distance.
[0068]
As a result, a feature amount vector in which the inter-vector distance to the reference image data B2 is smaller than the inter-vector distance to other image data is selected as the reference feature amount vector.
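The ranking-based selection can be sketched as follows. It is a simplified reading under stated assumptions: Euclidean distance is assumed, the first reference image is used as the anchor, and a subset is accepted only when every other reference image is closer to the anchor than every non-reference image. The function names are hypothetical, and the sketch requires at least two reference images and one non-reference image.

```python
import math
from itertools import combinations


def distance(u, v, subset):
    """Euclidean distance restricted to the feature indices in `subset` (assumed metric)."""
    return math.sqrt(sum((u[i] - v[i]) ** 2 for i in subset))


def select_reference_subsets(vectors, reference_ids, max_size=2):
    """Return the feature-index subsets under which reference images rank first.

    vectors:       dict mapping an image id to its full feature vector.
    reference_ids: ids the user selected; the first one serves as the anchor.
    """
    anchor = reference_ids[0]
    other_refs = [g for g in reference_ids if g != anchor]
    non_refs = [g for g in vectors if g not in reference_ids]
    selected = []
    for size in range(1, max_size + 1):
        for subset in combinations(range(len(vectors[anchor])), size):
            farthest_ref = max(
                distance(vectors[anchor], vectors[g], subset) for g in other_refs
            )
            nearest_non_ref = min(
                distance(vectors[anchor], vectors[g], subset) for g in non_refs
            )
            # Accept when all reference images sort above all other images.
            if farthest_ref < nearest_non_ref:
                selected.append(subset)
    return selected
```

When several subsets are accepted, the gap `nearest_non_ref - farthest_ref` is one possible tie-breaker, in line with preferring the vector with the largest difference in inter-vector distance.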
[0069]
When there are many feature amount vectors in which the reference image data is higher than the image data that is not the reference image data, any of the reference feature amounts and the reference feature amount vectors are effective for distinguishing the reference image data from other image data. However, for example, in order to reduce the processing load in the future, it is preferable to adopt a reference feature vector composed of as few reference feature as possible. On the other hand, when the purpose is to perform the search processing strictly, it is preferable to adopt a reference feature amount vector composed of a larger number of reference feature amounts. It is also preferable to adopt a feature amount vector having the largest difference between the inter-vector distance for the reference image data and the inter-vector distance for other image data.
[0070]
The feature amount forming the feature amount vector obtained in this manner is the reference feature amount.
[0071]
In the present embodiment, the processing is performed on all the search target image data. However, it is also preferable to perform these processing only on the image data group preliminarily searched in step S16. That is, the inter-vector distance of the feature vector between the reference image data selected by the user as being similar to the key image data and the unselected image data among the pre-searched image data is calculated. Based on this, it is also possible to obtain a reference feature amount vector for distinguishing the reference image data from other image data. As described above, by limiting the number of image data to be processed, the processing load can be reduced.
[0072]
Further, in the present embodiment, all processes are performed by one information management device, but the present invention is not limited to this. For example, using information transmission over a network or the like, the user may perform the key information acquisition step and the information selection step using an external terminal other than the information management apparatus, while the information management apparatus performs the key information feature quantity acquisition step, the feature quantity acquisition step, the preliminary search step, and the reference feature vector selection step.
[0073]
Further, in order to reliably perform the search process, when the reference feature amount vector cannot be selected, it is preferable to obtain a value of a predetermined feature amount from the information and perform a search based on the value of the predetermined feature amount.
[0074]
As described above, according to the present embodiment, a reference feature amount vector and a reference feature amount effective for distinguishing reference image data from other image data can be obtained. Thereafter, when the search is continued, the value of the reference feature value constituting the reference feature value vector is extracted from the key image data and the search is performed, whereby the search that matches the search purpose of the user can be performed.
[0075]
For example, by feeding back the result so that only the values of the selected reference feature amounts are extracted in the key information feature amount obtaining step of step S12 and the feature amount obtaining step of step S14, a search that better matches the user's preference can be realized each time the search processing of this embodiment is repeated.
[0076]
Further, even when the number of feature values constituting the feature value vector is very large or only feature values having a large variance are included, the number of feature values constituting the feature value vector can be sufficiently reduced. It is also possible to avoid the problem of the related art in which a feature vector that genuinely expresses a feature of the image data is excluded merely because the distance along the axis of a predetermined feature is large, so that search results desirable for the user can be obtained.
[0077]
Furthermore, since the reference feature amount and the reference feature amount vector can be extracted in advance before the actual search operation is performed, it is possible to cope with a "peer-to-peer" network search.
[0078]
In the present embodiment, the description has been given with respect to image data, but the subject of the present invention is not limited to this. For example, in the case of document information including characters, feature amounts can be extracted by morphological analysis or the like, and similar processing can be applied to those feature amounts.
[0079]
<Second information management method>
The second information management method according to the present embodiment will be described in detail with reference to the flowchart shown in FIG. Here, a description will be given of a process of allocating target image data to be managed to a pre-sorted group.
[0080]
The second information management method according to the present embodiment is coded as a program that can be processed by a computer. The program is stored and held in the storage unit 12 and executed by the processing unit 10, so that the method can be processed by the information management device. In the following description, it is assumed that the image data group is stored and held in the storage unit 12 in advance as a database.
[0081]
In the information classification process of step S30, the image data groups stored in the database are classified into a plurality of groups based on the attributes of the image data. For example, it is preferable that thumbnails of the image data stored in the database are displayed on an output device such as a display, and the user selects a group to which the image data belongs by using a pointing device such as a mouse.
[0082]
As an attribute serving as a reference for sorting image data, for example, an attribute that the user wants to sort and manage, such as a photographic image of “person” or a photographic image of “landscape”, may be used.
[0083]
In the key information acquisition step of step S32, information that the user considers to well represent the characteristics of a specific group, for example, image data (hereinafter referred to as key image data in the present embodiment), and its attribute are acquired. As in step S10, a screen prompting input of information is displayed on the output device, and the user is prompted to input the key image data to be sorted and an attribute indicating to which group it belongs.
[0084]
In the feature amount obtaining step of step S34, a feature amount is extracted from each image data and key image data included in the database. The feature amount can be extracted in the same manner as in step S12. The feature amount can be selected and used from, for example, each component in the RGB color space, each component in the L*a*b* color space, and independent variables such as brightness, saturation, luminance, and contrast.
[0085]
Further, a feature amount may be extracted from each of all the pixels constituting the image data, or the image data may be divided into meshes and the pixel values included in each mesh section may be averaged and extracted as a feature amount.
[0086]
In the feature value contribution extraction step of step S36, a feature value is extracted from each image data, and a contribution value indicating how much the feature value contributes to distinguish the image data as a group is calculated.
[0087]
First, the values of the respective feature amounts extracted from the image data included in one group are averaged, and the total average feature amount vector $V_g$ consisting of the average values of the feature amounts is formed. For example, when brightness and luminance feature amounts are extracted from each image data, an average value is calculated for the brightness of all the image data included in one group. Similarly, an average value is calculated for the luminance. The total average feature amount vector $V_g$ is then formed from these average values.
[0088]
When the total average feature vector has been calculated for all the groups included in the database, the inter-vector distance $d_g$ of the total average feature vectors between the groups is calculated. This inter-vector distance $d_g$ represents the average similarity of the image data included in each group.
[0089]
Next, an average feature vector $V_c$ is formed with one of the features extracted from the image data included in one group removed. In the above example, the average feature vector $V_c$ is formed excluding, for example, the luminance feature. The inter-vector distance $d_c$ of these average feature vectors between the groups is then calculated.
[0090]
The difference between the previously calculated inter-vector distance $d_g$ of the total average feature amount vectors and the inter-vector distance $d_c$ of the average feature vectors indicates how much the removed feature contributes to distinguishing the groups. Therefore, the difference $|d_g - d_c|$ is calculated and taken as the degree of contribution E of the removed feature amount.
[0091]
Even when a large number of features are extracted from each image data, the contribution can be calculated in the same manner. For example, when α feature values Bk (k = 1 to α) are extracted from each image data, the α-dimensional total average feature vector $V_g$ is obtained for each group, and the inter-vector distance $d_g$ of the total average feature amount vectors between the groups is calculated. Next, any one of the feature amounts Bk is removed to obtain an (α − 1)-dimensional average feature amount vector $V_{ck}$, and the inter-vector distance $d_{ck}$ of the average feature amount vectors $V_{ck}$ is calculated. The difference $|d_g - d_{ck}|$ between these inter-vector distances is the contribution Ek of the removed feature amount Bk.
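The leave-one-out calculation of the contribution can be sketched as follows. The sketch assumes exactly two groups and Euclidean distance, neither of which is fixed by the text, and the function names are hypothetical.

```python
import math


def average_vector(group):
    """Average each feature over all feature vectors in one group."""
    return [sum(column) / len(group) for column in zip(*group)]


def euclidean(u, v):
    """Assumed inter-vector distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contributions(group_a, group_b):
    """Return E_k = |d_g - d_ck| for each feature index k."""
    v_a, v_b = average_vector(group_a), average_vector(group_b)
    d_g = euclidean(v_a, v_b)  # distance using all features
    result = []
    for k in range(len(v_a)):
        # Remove feature k from both group averages and measure again.
        d_ck = euclidean(v_a[:k] + v_a[k + 1:], v_b[:k] + v_b[k + 1:])
        result.append(abs(d_g - d_ck))
    return result
```

For group averages (0, 0) and (3, 4), the full distance is 5; removing the first feature leaves 4 and removing the second leaves 3, so the contributions are 1 and 2 respectively.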
[0092]
Based on the contribution Ek obtained so far, a reference feature amount effective for distinguishing each group may be selected. For example, a feature amount vector composed of only the remaining feature amounts may be selected as the reference feature amount vector except for the feature amount Bk whose contribution degree Ek is equal to or less than a predetermined threshold.
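The threshold-based selection amounts to a simple filter, sketched here with a hypothetical function name; the threshold value itself is left to the implementer, as the text does not specify it.

```python
def select_by_contribution(contribution_values, threshold):
    """Keep the indices k whose contribution E_k exceeds the threshold.

    Features with E_k equal to or less than the threshold are excluded,
    following the rule stated in the text.
    """
    return [k for k, e_k in enumerate(contribution_values) if e_k > threshold]
```

The returned indices identify the components that would make up the reference feature amount vector.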
[0093]
However, in the present embodiment, a feature amount effective for further distinguishing a group is selected in the following processing.
[0094]
In the reference feature vector selection step of step S38, the feature is narrowed down to more effective features using the feature vectors of the key image data.
[0095]
A plurality of feature amount vectors are formed for each of the image data included in each group of the database and for the key image data by sequentially combining the extracted feature amounts. At this time, as shown in FIG. 5, various feature amount vectors can be formed by combining the feature amounts in the same manner as in step S20.
[0096]
Next, the inter-vector distance between the key image data and each of the image data is calculated for feature vectors composed of the same features, and the image data is sorted in ascending order of the inter-vector distance. Once the image data is sorted, the average rank of the image data included in the group that matches the attribute of the key image data and the average rank of the image data included in the groups that do not match the attribute of the key image data are calculated for each group. The difference between the average rank for the group that matches the attribute of the key image data and the average rank for a group that does not match it gives an objective measure of the subjective judgment by which the user selected the key image data to distinguish the groups.
[0097]
In other words, if many image data items that match the attribute of the key image data appear near the top of the sorting result and image data items that do not match appear near the bottom, the difference between the average ranks of the groups becomes large, and the feature amount vector is considered effective for assigning the key image data to the appropriate group. Conversely, if image data items that do not match the attribute of the key image data appear near the top of the sorting result and many matching items appear near the bottom, the difference between the average ranks of the groups is small, and the feature amount vector is not considered effective.
[0098]
Based on the difference between the average ranks of the groups obtained for each feature amount vector and on the contribution obtained earlier, the reference feature amount vector that is most effective for distinguishing the groups is selected. For example, the feature amount vector for which the product of the difference between the average ranks of the groups and the contribution of the feature amounts included in that vector is largest can be determined as the reference feature amount vector.
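One way to combine the two measures is sketched below. The text does not state how the contributions of several feature amounts in one vector are combined, so this sketch assumes their sum; the function name and the values are hypothetical.

```python
def pick_reference_vector(rank_diff, contribution):
    """rank_diff maps a tuple of feature indices (one candidate vector)
    to its rank difference; contribution[k] is E_k for feature k.
    Score = rank difference x summed contribution (summing is an assumption)."""
    def score(combo):
        return rank_diff[combo] * sum(contribution[k] for k in combo)
    return max(rank_diff, key=score)

# Hypothetical rank differences for four candidate feature amount vectors.
rank_diff = {(0,): 1.0, (2,): 1.5, (0, 2): 2.0, (1,): 3.0}
contribution = [1.0, 0.0, 2.0]
best = pick_reference_vector(rank_diff, contribution)
print(best)
```

The candidate (1,) has the largest rank difference but zero contribution, so it scores 0 and the vector combining features 0 and 2 is chosen instead.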
[0099]
Further, in the present embodiment all processes are performed by a single information management device, but the present invention is not limited to this. For example, by transmitting information over a network or the like, the user may perform the information classification step and the key information acquisition step on an external terminal other than the information management device, while the information management device performs the feature amount acquisition step, the feature amount contribution extraction step, and the reference feature amount vector selection step.
[0100]
Further, to ensure that processing can always be carried out, when the reference feature amount vector cannot be selected, it is preferable to obtain the value of a predetermined feature amount from the information and perform the search based on that value.
[0101]
As described above, according to the present embodiment, the reference feature amount vector can be selected based on both the contribution, which indicates how effective a feature amount is for dividing the image data into groups, and the average sort rank, which indicates the similarity between the key image data and each group. When image data is subsequently searched, collated, or classified, information management is performed using the values of the reference feature amounts constituting the reference feature amount vector, so unnecessary feature amounts need not be extracted from the image data. In addition, since inappropriate feature amounts are not used for matching, the processing speed of information management can be improved. Moreover, since the reference feature amounts can be selected based on the user's subjective judgment, the accuracy of information management can be improved.
[0102]
Further, as with the first information management method, this avoids the problem of excluding a feature amount vector that actually expresses the features of the image data well, and it can also support "peer-to-peer" network searches.
[0103]
In the present embodiment, the description has been given with respect to image data, but the subject of the present invention is not limited to this. For example, in the case of document data including character information, feature amounts can be extracted by morphological analysis or the like, and similar processing can be performed on those feature amounts.
[0104]
[Effect of the invention]
According to the present invention, it is possible to provide an information management method, an information management device, and an information management program for accurately and quickly comparing, classifying, and searching information.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an information management device according to an embodiment of the present invention.
FIG. 2 is a flowchart of a download method according to the embodiment of the present invention.
FIG. 3 is a diagram showing contents of each storage unit during a download process according to the embodiment of the present invention.
FIG. 4 is a diagram showing contents of each storage unit during a download process according to the embodiment of the present invention.
FIG. 5 is a diagram showing contents of each storage unit during a download process according to the embodiment of the present invention.
FIG. 6 is a diagram showing an embodiment using a download program and its recording medium in the embodiment of the present invention.
FIG. 7 is a block diagram of a configuration of an information processing device according to a first conventional technique.
FIG. 8 is a diagram showing the contents of each storage unit during a download process according to the first conventional technique.
[Explanation of symbols]
10 processing unit, 12 storage unit, 14 input unit, 16 output unit, 18 external interface, 20 bus, 100 information management device, 102 information processing device, 104 image reading device, 106 image forming device, 108 portable terminal, 200 network.

Claims

An information management method for managing a plurality of pieces of information based on a similarity between each other,
wherein, sequentially or at a predetermined point in time, a plurality of feature amounts are obtained from information in the information group to be managed, information having a specific attribute is selected from the information from which the feature amounts were obtained, a feature amount having a large difference between its value for the selected information and its value for the unselected information is determined from the obtained plurality of feature amounts as a reference feature amount, and the information is managed based on the value of the reference feature amount.

An information management method for managing a plurality of pieces of information based on a similarity between each other,
A feature amount obtaining step of obtaining a plurality of feature amounts from information in the information group to be managed,
An information selection step of selecting information having a specific attribute from the information on which the feature amount has been acquired,
A reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors and determining, based on the inter-vector distance between the feature amount vector of the selected information and the feature amount vector of the unselected information, a reference feature amount vector effective for distinguishing the selected information from the unselected information,
wherein the method comprises the above steps, the reference feature amount vector is constructed sequentially or at a predetermined time, and the information is managed based on its value.

The information management method according to claim 2,
wherein the information selection step causes a user to select the information.

An information management method for managing a plurality of pieces of information based on a similarity between each other,
An information separation step of separating a plurality of pieces of information into an information group having a mutually common attribute and an information group not having the common attribute,
A feature amount obtaining step of obtaining a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having the common attribute,
A reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors and extracting, based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the information group not having the common attribute, a reference feature amount vector serving as a reference for distinguishing the information group having the common attribute from the information group not having the common attribute,
wherein the method comprises the above steps, the reference feature amount vector is constructed sequentially or at a predetermined time, and the information is managed based on its value.

The information management method according to claim 4,
wherein the information separation step causes a user to separate the information group having the common attribute from the information group not having the common attribute.

The information management method according to claim 4 or 5,
wherein the reference feature amount vector selection step selects, as the reference feature amount vector, the average feature amount vector for which the inter-vector distance between the average feature amount vector composed of the average values of the feature amounts in the information group having the common attribute and the average feature amount vector composed of the average values of the feature amounts in the information group not having the common attribute is largest.

The information management method according to claim 2 or 3,
further comprising a key information acquisition step of acquiring key information serving as the key for a preliminary search,
wherein the information selection step selects information having a specific characteristic from the information group obtained by the preliminary search based on the key information.

The information management method according to claim 4, wherein
Further including a key information acquisition step of acquiring key information,
The feature amount obtaining step obtains a plurality of feature amounts from the key information,
The reference feature amount vector selection step recombines the plurality of feature amounts obtained from the key information to form a plurality of feature amount vectors for the key information, and selects the reference feature amount vector based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the key information and on the inter-vector distance between the feature amount vector of the information group not having the common attribute and the feature amount vector of the key information.

The information management method according to any one of claims 2 to 8,
wherein, when the reference feature amount vector cannot be selected in the reference feature amount vector selection step, a value of a predetermined feature amount is acquired from the information, and the information is managed based on the value of the predetermined feature amount.

An information management device that manages a plurality of pieces of information based on mutual similarity,
wherein, sequentially or at a predetermined point in time, a plurality of feature amounts are obtained from each piece of information in the information group to be managed, information having a specific attribute is selected from the information to be managed, a feature amount having a large difference between its value for the selected information and its value for the unselected information is determined from the obtained plurality of feature amounts as a reference feature amount, and the information is managed based on the value of the reference feature amount.

An information management device that manages a plurality of pieces of information based on mutual similarity,
A feature amount obtaining unit that obtains a plurality of feature amounts from each information in the information group to be managed;
Information selection means for selecting information having a specific attribute from the information group to be managed,
Reference feature amount vector selection means for recombining the plurality of feature amounts obtained by the feature amount obtaining means to form a plurality of feature amount vectors and determining, based on the inter-vector distance between the feature amount vector of the selected information and the feature amount vector of the unselected information, a reference feature amount vector effective for distinguishing the selected information from the unselected information,
wherein the device comprises the above means, the reference feature amount vector is constructed sequentially or at a predetermined time, and the information is managed based on its value.

The information management device according to claim 11,
wherein the information selection means causes a user to select the information.

An information management device that manages a plurality of pieces of information based on mutual similarity,
Information separation means for separating a plurality of pieces of information into an information group having a mutually common attribute and an information group not having the common attribute,
Feature amount obtaining means for obtaining a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having the common attribute,
Reference feature amount vector selection means for recombining the plurality of feature amounts obtained by the feature amount obtaining means to form a plurality of feature amount vectors and extracting, based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the information group not having the common attribute, a reference feature amount vector serving as a reference for distinguishing the information group having the common attribute from the information group not having the common attribute,
wherein the device comprises the above means, the reference feature amount vector is constructed sequentially or at a predetermined time, and the information is managed based on its value.

The information management device according to claim 13,
wherein the information separation means causes a user to separate the information group having the common attribute from the information group not having the common attribute.

The information management device according to claim 13 or 14,
wherein the reference feature amount vector selection means selects, as the reference feature amount vector, the average feature amount vector for which the inter-vector distance between the average feature amount vector composed of the average values of the feature amounts in the information group having the common attribute and the average feature amount vector composed of the average values of the feature amounts in the information group not having the common attribute is largest.

The information management device according to claim 11 or 12,
Further including key information acquisition means for acquiring key information serving as a key for the preliminary search,
The information management device, wherein the information selection means selects information having a specific characteristic from a group of information preliminarily searched based on the key information.

The information management device according to claim 13, wherein
Further including key information acquisition means for acquiring key information,
The feature amount obtaining means obtains a plurality of feature amounts from the key information,
The reference feature amount vector selection means recombines the plurality of feature amounts obtained from the key information by the feature amount obtaining means to form a plurality of feature amount vectors for the key information, and selects the reference feature amount vector based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the key information and on the inter-vector distance between the feature amount vector of the information group not having the common attribute and the feature amount vector of the key information.

The information management device according to any one of claims 11 to 17,
wherein, when the reference feature amount vector cannot be selected by the reference feature amount vector selection means, a value of a predetermined feature amount is obtained from the information, and the information is managed based on the value of the predetermined feature amount.

An information management program that manages a plurality of pieces of information based on similarities to each other,
On the computer,
A feature amount obtaining step of obtaining a plurality of feature amounts from information in the information group to be managed,
An information selection step of selecting information having a specific attribute from the information on which the feature amount has been acquired,
A reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors and determining, based on the inter-vector distance between the feature amount vector of the selected information and the feature amount vector of the unselected information, a reference feature amount vector effective for distinguishing the selected information from the unselected information,
the information management program causing the computer to execute, sequentially or at a predetermined time, processing that includes at least one of the above steps.

An information management program that manages a plurality of pieces of information based on similarities to each other,
On the computer,
An information separation step of separating a plurality of pieces of information into an information group having a mutually common attribute and an information group not having the common attribute,
A feature amount obtaining step of obtaining a plurality of feature amounts from each piece of information included in the information group having the common attribute and in the information group not having the common attribute,
A reference feature amount vector selection step of recombining the plurality of feature amounts to form a plurality of feature amount vectors and extracting, based on the inter-vector distance between the feature amount vector of the information group having the common attribute and the feature amount vector of the information group not having the common attribute, a reference feature amount vector serving as a reference for distinguishing the information group having the common attribute from the information group not having the common attribute,
the information management program causing the computer to execute, sequentially or at a predetermined time, processing that includes at least one of the above steps.