JP2011070408A

JP2011070408A - Method of acquiring sample image, sample image acquisition device, and image classification device

Info

Publication number: JP2011070408A
Application number: JP2009220993A
Authority: JP
Inventors: Kenji Matsuo; 賢治松尾
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-09-25
Filing date: 2009-09-25
Publication date: 2011-04-07
Anticipated expiration: 2029-09-25
Also published as: JP5346756B2

Abstract

[Problem] To obtain an image classification device that makes sample image collection more efficient and can adapt to the diversity of classification categories and image search keywords.
[Solution] The device comprises: Web image search means 2, which takes a specified keyword and collects images from a plurality of HTML pages on which the keyword appears; feature quantity conversion means 3, which measures low-level signal values such as color, shape, and pattern from a collected Web search image or an input image and outputs them as an image feature quantity; sample image specifying means 4, which identifies, from the image feature quantities generated from the plurality of Web search images, the sample images related to the keyword; and classification category determining means 5, which extracts the image feature quantities corresponding to the sample images and uses them as the image feature quantities representing the classification category named by the specified keyword, calculates the similarity between those representative feature quantities and the image feature quantity generated from an input image whose classification category is unknown, and classifies the input image into a predetermined classification destination based on the similarity.
[Selected drawing] Figure 1

Description

The present invention relates to an image classification device that classifies a large number of images into predetermined classification destinations. More specifically, it relates to an image classification device that adaptively generates the image feature quantity corresponding to a specified keyword and uses the generated image feature quantity to classify images from the viewpoint the user desires, and to an image classification device having a function of assigning image search keywords used to find the image a user wants among a large number of images.

The present invention also relates to a sample image acquisition method and device used within the image classification device, and in particular to a sample image acquisition method and sample image acquisition device that automatically acquire sample images from Web images when a keyword is specified.

With the recent spread of digital cameras, anyone can easily take a large number of photographs. A method for efficiently browsing the enormous number of images (photographs) accumulated in storage is therefore needed.
For photographs printed on paper, browsing efficiency has generally been improved by sorting and storing the photographs by date, person, or scene. For digital images recorded on a recording medium, by contrast, a keyword that accurately describes the content of each image can be assigned to the image and saved with it, and at browsing time the desired image can be retrieved directly by specifying the keyword, which makes efficient browsing possible.

Here, a keyword can be anything associated with the image that can be described in language: for example, a person's name, an object name, a scene, a date and time, a place, scenery, a situation, an occasion, an event, an impression, a meaning, or an intention.

However, because keywords are assigned manually, each image must be visually inspected one at a time and an appropriate keyword must be thought of and assigned to it. As the number of images grows, this becomes a burden too large to ignore.
To reduce the workload of assigning keywords to images, image classification devices and image search keyword assignment devices such as those shown in Patent Documents 1 to 3 have been proposed.

The devices described in Patent Documents 1 and 2 are image classification devices that automatically determine which of a set of preset classification categories an input image belongs to, based on the similarity between image feature quantities such as color, shape, and pattern extracted from the input image and image feature quantities extracted in advance from sample images representing each classification category.

In particular, the image classification device described in Patent Document 2 uses, among the feature quantities extracted from the sample images representing a classification category, the salient feature quantities that differ greatly from the sample images of other categories as the image feature quantities for discriminating between categories, which greatly improves the accuracy of category classification.

Patent Document 3 proposes an image search keyword assignment device that uses the result of category classification by the image classification device described in Patent Document 1 or Patent Document 2 and assigns the name of the resulting category to the image as a keyword.

Patent Document 1: JP-A-11-328422
Patent Document 2: JP 2008-33705 A
Patent Document 3: JP 2002-32751 A

With the image classification device and image search keyword assignment device described above, an input image can be classified automatically and assigned a keyword. To perform image classification, however, the image feature quantities of each classification category must be generated in advance, and this requires collecting a large number of sample images representing each classification category beforehand.

Patent Documents 1 to 3 give no particular description of how the sample images are collected. To realize automatic classification and keyword assignment, the sample images therefore had to be collected manually in advance, and this collection work took an enormous amount of effort.
To eliminate this effort, a sample image acquisition method has been desired that efficiently collects, for example automatically or semi-automatically, the sample images representing each category that serve as the basis for classification.

In addition, because viewpoints on classification and search differ from person to person, it is difficult to set classification categories and image search keywords that suit everyone. Even if image feature quantities were generated for as many kinds of classification categories as possible, the candidates for classification categories and image search keywords are nearly infinite, so preparing sample images for all of them is not realistic.
To cope with the diversity of classification categories and image search keywords, it has therefore been desired that classification categories can be set adaptively in response to user requests and that the corresponding image feature quantities can be generated.

The present invention has been proposed in view of the above circumstances. Its object is to obtain a sample image acquisition method and device that make sample image collection more efficient, and to provide an image classification device that can adapt to the diversity of classification categories and image search keywords.

To achieve the above object, the main feature of the present invention is that it uses the text information and image information accumulated on the Web to collect sample images related to a specified keyword, either automatically without human intervention or adaptively in response to a user request.

That is, the sample image acquisition method of claim 1, in a system that can connect to HTML pages via a network, specifies a keyword and collects from the Web the images in a plurality of HTML pages on which the keyword appears; measures low-level signal values such as color, shape, and pattern from each of the plurality of Web search images collected from the Web and outputs them as image feature quantities; clusters the feature space by grouping image feature quantities that are close to each other according to their distribution in the feature space; and identifies the images belonging to the cluster with the largest number of members, or to a cluster with small variance, as the sample images related to the specified keyword.
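The final selection step of claim 1 can be illustrated with a minimal sketch. It assumes the clustering has already been done by some other step and that each image has a feature vector and a cluster label; the function name `select_sample_cluster` and the `criterion` argument are hypothetical and do not come from the patent, and the variance is taken here as the mean squared distance to the cluster centroid, which is one possible reading of "small variance".

```python
from collections import defaultdict


def select_sample_cluster(features, labels, criterion="largest"):
    """Return the indices of the images in the cluster chosen as sample images.

    features  -- list of equal-length feature vectors, one per collected image
    labels    -- cluster id of each image, produced by any clustering step
    criterion -- "largest" (most members) or "min_variance" (tightest cluster);
                 both names are assumptions made for this sketch
    """
    members = defaultdict(list)
    for index, label in enumerate(labels):
        members[label].append(index)

    def variance(indices):
        # Mean squared distance of the cluster members to their centroid.
        dim = len(features[indices[0]])
        centroid = [sum(features[i][d] for i in indices) / len(indices)
                    for d in range(dim)]
        return sum(
            sum((features[i][d] - centroid[d]) ** 2 for d in range(dim))
            for i in indices
        ) / len(indices)

    if criterion == "largest":
        best = max(members, key=lambda c: len(members[c]))
    elif criterion == "min_variance":
        best = min(members, key=lambda c: variance(members[c]))
    else:
        raise ValueError("unknown criterion: " + criterion)
    return members[best]
```

Note that with the `min_variance` criterion a cluster containing a single image has zero variance and would always win, so a practical version would probably also require a minimum cluster size.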

The sample image acquisition device of claim 2, in a system that can connect to HTML pages via a network, includes the following components.
Web image search means that specifies a keyword and collects from the Web the images in a plurality of HTML pages on which the keyword appears.
Feature quantity conversion means that measures low-level signal values such as color, shape, and pattern from the Web search images collected by the Web image search means and outputs them as image feature quantities.
Sample image specifying means that, for the image feature quantities generated by the feature quantity conversion means from each of the plurality of Web search images, clusters the feature space by grouping image feature quantities that are close to each other according to their distribution in the feature space, and identifies the images belonging to the cluster with the largest number of members, or to a cluster with small variance, as the sample images related to the specified keyword.

The image classification device of claim 3 comprises, in addition to the configuration of the sample image acquisition device of claim 2, classification category determining means that extracts the image feature quantities corresponding to the sample images identified by the sample image specifying means and uses them as the image feature quantities representing the classification category named by the specified keyword, calculates the similarity between the image feature quantity generated by the feature quantity conversion means from an input image whose classification category is unknown and the image feature quantities representing the classification category, and classifies the input image into a predetermined classification destination based on the similarity.

Claim 4 is the image classification device of claim 3, further comprising image search keyword assignment means: the classification category determining means determines, based on the similarity, whether the input image belongs to the classification category, and when the input image is determined to belong to it, the image search keyword assignment means assigns the name of the classification category to the input image as a keyword.

Claim 5 is the image classification device of claim 3 or claim 4, in which the feature quantity conversion means comprises:
feature vector extraction means that extracts from an image, as keypoints, a plurality of points where the signal changes sharply, such as edges and bumps, and outputs feature vectors calculated from the color, shape, pattern, and so on near each keypoint;
CodeBook generation means that measures the distribution in the feature space of the group of feature vectors extracted by the feature vector extraction means from a wide variety of different images, clusters the feature space by grouping feature vectors that are close to each other, and outputs the representative vector of each cluster as a CodeBook;
a storage unit that stores the CodeBook generated by the CodeBook generation means; and
vector quantization means that vector-quantizes each feature vector extracted by the feature vector extraction means to the nearest representative vector in the CodeBook stored in the storage unit, at the same time counts the frequency of occurrence of each representative vector, and outputs the resulting histogram as the image feature quantity of the input image.

Claim 6 is the image classification device of claim 3 or claim 4, in which the classification category determining means comprises:
a storage unit that stores the feature quantities of the sample images identified by the sample image specifying means as the image feature quantities of the classification category named by the specified keyword;
category classifier generation means that uses machine learning to extract, as the criterion for classification, the points on which the image feature quantities of the classification category named by the specified keyword differ greatly from the image feature quantities of the various other classification categories already stored in the storage unit, and outputs that criterion as a classifier; and
category classification means that uses the classifier generated by the category classifier generation means to determine whether the image feature quantity generated by the feature quantity conversion means from an input image whose classification category is unknown belongs to the classification category named by the specified keyword, and, when it is determined to belong, classifies the input image into the predetermined classification destination.

According to the inventions of claims 1 and 2, images related to a specified keyword can be collected automatically from the Web without manual visual inspection. This eliminates the effort of manually collecting the sample images needed to generate image feature quantities, and the image feature quantities related to the specified keyword can be generated automatically.

According to the invention of claim 3, images can be classified adaptively not only into preset classification categories but also into a new classification category that the user sets by specifying a keyword. Because there is no restriction on the keyword that can be specified, the user can classify images by any desired keyword. For example, the images related to a keyword the user wants can be picked out from among the images the user owns.

According to the invention of claim 4, an image search keyword can be assigned to each image determined to be related to the keyword specified by the user. For example, among the images the user owns, those determined to be related to the specified keyword can be given an image search keyword.

According to the inventions of claims 5 and 6, more stable feature quantities can be extracted from images, further improving the accuracy of image classification and image search keyword assignment.
Furthermore, when image search keywords are assigned manually, the keyword assigned to the same image can differ with each person's subjective judgment, so the assignment of keywords that accurately describe the content of an image could vary from person to person. With image search keyword assignment by the image classification device of the present invention, which keyword is appropriate for an image is determined by an objective criterion based on the similarity of image feature quantities, and the name of the classification category judged to be similar is assigned as the keyword. Image search keywords can therefore always be assigned by a uniform criterion that does not depend on the person.

A block diagram showing the configuration of the image classification device of the present invention.
(a) to (c) are graphs for explaining the process of calculating the image feature quantity in the feature quantity conversion means of the image classification device of the present invention.
A graph for explaining "tomato cluster creation" as an example in the sample image specifying means of the image classification device of the present invention.

An example of an embodiment of the image classification device of the present invention is described below with reference to the drawings.
The image classification device of the present invention can connect to HTML pages via a network and, using the content of HTML files as a clue, collects sample images from the Web and classifies them.
A large number of HTML files are stored on servers on the Web, and they are connected via a network. An HTML page consists of text and images, and the text in the same HTML page often contains keywords that accurately describe the content of the images. Focusing on this point, the present invention specifies a keyword and uses an ordinary Web image search method to collect a plurality of sample images from the Web and classify them.

The image classification device of the present invention comprises a transmitting/receiving unit for accessing text information and images on the Web, an input unit through which the user enters keywords, a main storage unit that temporarily holds image data being processed, a storage unit that stores images and various text information, a central processing unit that controls computation and the various processes, and a display unit that displays processing results.

That is, as shown in FIG. 1, the image classification device of the present invention has Web image search means 2 that connects to network 1 and collects images in HTML pages for a keyword entered from the input unit, feature quantity conversion means 3 that calculates an image feature quantity from a collected image or an input image, sample image specifying means 4 that identifies the sample images related to the keyword, classification category determining means 5 that classifies an input image into a predetermined classification destination, image search keyword assignment means 6 that assigns the name of the classification category to the input image as a search keyword, and storage unit 7 that records various kinds of information. The sample image acquisition device of the present invention, which is used in the image classification device, consists of Web image search means 2, feature quantity conversion means 3, and sample image specifying means 4.

The image classification device has (1) a function that collects images for a keyword specified by the user (the specified keyword) and generates image feature quantities (feature quantity generation), and (2) a function that assigns an image search keyword to an input image whose classification category is unknown, that is, an image to which no keyword has been assigned (classification and search).

Web image search means 2 collects from the Web a large number of images in HTML pages on which the keyword specified by the user (the specified keyword) appears, and outputs them to feature quantity conversion means 3. Because images in HTML pages are often accompanied by character information as text information, images having text information that matches the specified keyword are collected and imported.
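As a rough illustration of this collection step, the sketch below scans HTML that has already been downloaded and returns the image references found on pages whose text contains the keyword. It is only an assumption about how such a step might look: the patent relies on an existing Web image search service, and the names `PageScanner` and `collect_candidate_images` are hypothetical. Only Python's standard `html.parser` module is used, and no network access is performed.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collects the visible text and the <img> sources of one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if attr_map.get("src"):
                self.image_srcs.append(attr_map["src"])
            # The alt text is treated as part of the text attached to the image.
            if attr_map.get("alt"):
                self.text_parts.append(attr_map["alt"])

    def handle_data(self, data):
        self.text_parts.append(data)


def collect_candidate_images(pages, keyword):
    """Return image sources from every page whose text mentions the keyword.

    pages   -- iterable of HTML strings (assumed to be fetched elsewhere)
    keyword -- the specified keyword; matching is a case-insensitive substring test
    """
    results = []
    for html in pages:
        scanner = PageScanner()
        scanner.feed(html)
        scanner.close()
        page_text = " ".join(scanner.text_parts).lower()
        if keyword.lower() in page_text:
            results.extend(scanner.image_srcs)
    return results
```

A page-level substring match like this one returns every image on a matching page, which is exactly why the collected set contains unrelated images and needs the later sample image specifying step.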

Feature quantity conversion means 3 measures low-level signal values such as color, shape, and pattern from an image collected by Web image search means 2 (a collected image) or from a directly supplied input image, and outputs them as an image feature quantity.
Specifically, feature quantity conversion means 3 comprises feature vector extraction means 31 that calculates feature vectors from a collected image, CodeBook generation means 32 that generates a CodeBook from feature vectors, and vector quantization means 33 that quantizes feature vectors to calculate the image feature quantity.

Feature vector extraction means 31 extracts from an image, as keypoints, a plurality of points where the signal changes sharply, such as edges and bumps, and outputs feature vectors calculated from the color, shape, pattern, and so on near each keypoint. The feature vector calculation performed by feature vector extraction means 31 is realized with known techniques. One concrete option is SIFT (Scale-Invariant Feature Transform), proposed by Lowe (Lowe, D., "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110, 2004). In this case, each feature vector is output as a 128-dimensional vector.

With SIFT, images of the same subject and the same content yield the same keypoints and the same feature vectors even when they differ in appearance, for example in rotation or size. SIFT performs (A) keypoint detection and (B) feature vector extraction in the following sequence.
(A) Keypoint detection
(a) Detection of keypoint candidates
(b) Keypoint localization
(B) Feature vector extraction
(c) Orientation calculation
(d) Feature quantity extraction

In the detection of keypoint candidates in (a), DoG (Difference-of-Gaussian) processing detects a plurality of points in the image where the signal changes sharply, such as edges and bumps, as keypoint candidates. The scale of the Gaussian function is varied over several levels, a set of smoothed images is created by convolving the Gaussian function with the input image, and the points that are extrema in the difference images of those smoothed images (DoG images) are detected as keypoint candidates.
In the keypoint localization in (b), the keypoint candidates detected in (a) are narrowed down to keypoints that can be extracted stably. That is, points with low contrast are removed from the candidates as points affected by noise, and points with large principal curvature are removed as points unsuitable for stable extraction.
In the orientation calculation in (c), the direction that characterizes each keypoint is calculated from the gradient at each point in the smoothed image, so that the same feature vector can be extracted for the same keypoint even when the image is rotated. Specifically, a histogram of gradient direction and gradient magnitude is measured over a rectangular region around the keypoint.
First, each gradient direction is assigned to one of 36 quantized bins.
Next, the gradient magnitude is added to the assigned bin, and the direction of the bin with the highest value in the histogram is taken as the orientation.
In the feature quantity description in (d), the region from which the feature vector is extracted at each keypoint is normalized based on the orientation obtained in (c), and the feature vector is calculated from the normalized region cut out around the keypoint.
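The orientation calculation in (c) can be sketched as follows. This is a simplified illustration, not Lowe's full procedure: it assumes the patch around the keypoint is given as a 2-D list of grayscale values, uses central differences for the gradient, and omits the Gaussian weighting and peak interpolation of the published SIFT method. The function name `dominant_orientation_bin` is hypothetical.

```python
import math


def dominant_orientation_bin(patch, num_bins=36):
    """Return the index of the gradient-direction bin with the largest total magnitude.

    patch    -- 2-D list of grayscale intensities around a keypoint (assumed format)
    num_bins -- number of direction bins; 36 bins means 10 degrees per bin
    """
    histogram = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    # Border pixels are skipped because central differences need both neighbours.
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            # Quantize the gradient direction, then add the magnitude to that bin.
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            bin_index = int(angle // bin_width) % num_bins
            histogram[bin_index] += magnitude
    # The bin with the highest accumulated magnitude gives the orientation.
    return max(range(num_bins), key=lambda b: histogram[b])
```

Multiplying the returned index by 10 gives the lower edge of the dominant direction in degrees. For a completely flat patch every bin is zero and the function returns bin 0, so a caller would need to treat that case separately.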

Next, the generation of the CodeBook by CodeBook generation means 32 and the calculation of the image feature quantity by vector quantization means 33 are described with reference to FIG. 2.
CodeBook generation means 32 plots in the feature space the feature vectors extracted by feature vector extraction means 31 from a wide variety of different images (the × points in FIG. 2(a)), measures the distribution of the group of feature vectors in the feature space, clusters the feature space by grouping feature vectors that are close to each other, and outputs the representative vector of each cluster (the points marked ○× in FIG. 2(a)) as CodeBook 71 for use by vector quantization means 33 in the following stage.

As a specific means for realizing the clustering, k-means, which is a known technique, can be used.
Clustering by k-means is performed by the following procedure (1) to (4).
(1) Divide the data into k clusters, where k is an arbitrary specified number.
(2) Calculate the centroid of each cluster.
(3) For every data point, find the cluster whose centroid is at the minimum distance, and assign the data point to that cluster.
(4) End if there is no change from the previous clusters. If there is a change, return to (2).
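Steps (1) to (4) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the initial split in step (1) is a shuffled round-robin assignment, the distance is squared Euclidean distance, and the names `k_means`, `seed`, and `max_iter` are hypothetical. The text does not specify any of these details.

```python
import random

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(points):
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def k_means(data, k, seed=0, max_iter=100):
    """Cluster data (a list of equal-length vectors) into k clusters."""
    rng = random.Random(seed)
    # (1) Divide the data into k clusters (assumed: shuffled round-robin).
    order = list(range(len(data)))
    rng.shuffle(order)
    labels = [0] * len(data)
    for rank, index in enumerate(order):
        labels[index] = rank % k
    centroids = []
    for _ in range(max_iter):
        # (2) Calculate the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            # An empty cluster is re-seeded from a random point (assumption).
            centroids.append(centroid(members) if members else list(rng.choice(data)))
        # (3) Assign every point to the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        # (4) End if nothing changed; otherwise return to (2).
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels
```

When used to build a CodeBook, `data` would be the feature vectors extracted from many images, and the returned centroids would play the role of the representative vectors.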

In this case, the number of clusters k can be set arbitrarily, and the generated image feature quantity is k-dimensional. The CodeBook 71 is generated by the CodeBook generation means 32 by inputting a wide variety of different images to the feature vector extraction means 31 in advance, and is stored in the storage unit 7.

The vector quantization means 33 vector-quantizes each feature vector extracted from the input image by the feature vector extraction means 31 to the nearest representative vector in the CodeBook 71 stored in the storage unit 7 (FIG. 2(b)), at the same time obtains the appearance frequency of each representative vector, and outputs the resulting histogram as the feature quantity of the input image (FIG. 2(c)). The image feature quantity calculated by the vector quantization means 33 is therefore a k-dimensional histogram.
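The quantization into a k-dimensional histogram can be illustrated with a short Python sketch. It assumes that "nearest" means smallest squared Euclidean distance and that the CodeBook is a list of k representative vectors; the function name `quantize_to_histogram` is hypothetical.

```python
def quantize_to_histogram(feature_vectors, codebook):
    """Count how often each representative vector is the nearest one.

    feature_vectors: feature vectors extracted from one image.
    codebook:        list of k representative vectors.
    Returns a list of k counts, i.e. the image feature quantity.
    """
    histogram = [0] * len(codebook)
    for vector in feature_vectors:
        # Index of the representative vector closest to this feature vector.
        nearest = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
        )
        histogram[nearest] += 1
    return histogram
```

Because the histogram has one entry per representative vector, images with different numbers of key points are still mapped to vectors of the same length k.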

When images are collected for a keyword specified by the user (the designated keyword), the Web image search means 2 searches for Web images for the keyword using a search method currently in practical use. For example, the search method of the Google (trade name) image search service is an extension of the conventional method of searching text information by specifying a keyword. That is, it merely presents, as search results, the images in HTML pages that contain the specified keyword. Because the text in an HTML file contains many noise components other than keywords that accurately describe the image content, the content of an image found by Web image search often does not match the specified keyword.
For example, even if the keyword “tomato” is specified, the search may return not only images showing the appearance of a tomato but also images of “tomato juice” and “tomato ketchup”, and even images unrelated to tomatoes.

The image classification device of the present invention is provided with sample image specifying means 4 for removing noise components from the image feature quantities output by the vector quantization means 33.
The sample image specifying means 4 has each of the plurality of images collected by the Web image search means 2 (collected images) converted into a k-dimensional image feature quantity by the feature quantity conversion means 3, clusters the feature space by grouping image feature quantities that are close to each other according to their distribution in the feature space, and specifies the images belonging to the cluster with the largest number of members, or the images belonging to a cluster with small variance, as sample images related to the designated keyword.

For example, when the keyword “tomato” is entered in order to obtain images of a tomato, the images retrieved from the Web include, in addition to images showing the appearance of a tomato, images of “tomato juice” and “tomato ketchup” and images unrelated to tomatoes. When these images are each converted into image feature quantities by the feature quantity conversion means 3, the result is as shown in FIG. 3.
The feature space is then clustered by grouping image feature quantities that are close to each other according to their distribution in the feature space, and the images belonging to the cluster with the largest number of members (region X in FIG. 3) are specified as sample images related to the designated keyword “tomato”. As another embodiment, the images belonging to a cluster with small variance may be specified as sample images related to the designated keyword “tomato”.
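Once the collected images have been clustered, choosing the sample images reduces to picking one cluster. The Python sketch below shows both selection rules mentioned above. It assumes that cluster labels have already been computed by some clustering step, and it measures variance as the mean squared distance to the cluster mean. The names `select_sample_images`, `criterion`, and `min_size` are hypothetical; `min_size` is an added assumption that keeps single-image clusters, whose variance is trivially zero, from winning the small-variance rule.

```python
from collections import defaultdict

def cluster_variance(features, members):
    """Mean squared distance of the member features to their mean."""
    n = len(members)
    dims = len(features[0])
    mean = [sum(features[i][d] for i in members) / n for d in range(dims)]
    total = sum(
        (features[i][d] - mean[d]) ** 2 for i in members for d in range(dims)
    )
    return total / n

def select_sample_images(features, labels, criterion="largest", min_size=2):
    """Return the indices of the images chosen as sample images."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        groups[label].append(index)
    if criterion == "largest":
        # Cluster with the largest number of members.
        best = max(groups, key=lambda c: len(groups[c]))
    elif criterion == "tightest":
        # Cluster with the smallest variance among sufficiently large ones.
        candidates = [c for c in groups if len(groups[c]) >= min_size] or list(groups)
        best = min(candidates, key=lambda c: cluster_variance(features, groups[c]))
    else:
        raise ValueError("criterion must be 'largest' or 'tightest'")
    return groups[best]
```

The two rules can disagree: a large but loose cluster wins under `"largest"`, while a small, compact cluster wins under `"tightest"`.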

As a specific means for realizing this clustering, x-means, which is a known technique, can be used. Unlike the k-means described above, x-means does not require the number of clusters k to be specified; the number of clusters k is also determined automatically when the clustering is performed.

The classification category determination means 5 extracts, from the image feature quantities generated by the feature quantity conversion means 3, the image feature quantities corresponding to the sample images specified by the sample image specifying means 4, and uses them as the image feature quantities representing the classification category of the designated keyword. When an input image whose classification category is unknown is input, it calculates the similarity between the image feature quantity generated from that input image by the feature quantity conversion means 3 and the image feature quantities representing the classification category, and classifies the input image into a predetermined classification destination based on the similarity.
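A similarity-based decision of this kind can be sketched as follows. The text at this point does not fix a particular similarity measure, so the use of cosine similarity between histograms, the threshold value, and the names `cosine_similarity` and `belongs_to_category` are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def belongs_to_category(input_feature, category_features, threshold=0.8):
    """Compare an input image with the features representing a category.

    Returns (decision, best similarity). The threshold is a hypothetical
    tuning parameter, not a value taken from the text.
    """
    best = max(cosine_similarity(input_feature, f) for f in category_features)
    return best >= threshold, best
```

Cosine similarity ignores the overall scale of the histograms, which is convenient when images contain different numbers of key points.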

That is, the classification category determination means 5 has two parts. The category classifier generation means 51 realizes the function, used at feature generation time, of performing image classification with the image feature quantities that represent the classification category named by the designated keyword. The category classification means 52 classifies input images whose classification category is unknown (to which no keyword has been assigned), in order to realize the function, used at classification and search time, of assigning keywords for image search. The storage unit 7 stores in advance the image feature quantities 72 of various other classification categories, and also stores the feature quantities of the sample images specified by the sample image specifying means 4 as the classifier 73, which serves as the image feature quantity of the classification category named by the designated keyword.

The category classifier generation means 51 extracts by machine learning, as the criterion for classification, the points on which the image feature quantities of the classification category named by the designated keyword differ greatly from the image feature quantities 72 of the various other classification categories stored in advance in the storage unit 7, and outputs this criterion as the classifier 73.
As a specific means for realizing the machine learning, SVM (Support Vector Machine), which is a known technique, can be used.

SVM is a classification method that determines which of two given classes a piece of data belongs to; it was proposed in 1995 by Corinna Cortes and V. Vapnik of AT&T. In SVM, the criterion for classification is extracted by learning from sample data collected in advance.
For example, to classify whether or not a given image is a face image, many non-face images are collected in addition to face images and used as sample images for learning. Prior knowledge such as “this image is a face” or “this image is not a face” is supplied, and the criterion for classification is generated based on this prior knowledge.

Specifically, each sample image is converted into an image feature quantity, and a boundary (hyperplane) that separates faces from non-faces is obtained from the distribution in the feature space and used as the classification criterion. When a newly given image is actually classified, whether or not it is a face image is determined by which side of the boundary (hyperplane) the image feature quantity extracted from it falls on in the feature space.
Therefore, actually performing classification by SVM requires the following steps in advance: (1) collecting a large number of sample images, and (2) generating the criterion for classification (the classifier) by learning. Each sample image must carry information that makes clear which of the two classes it belongs to. In this example, the sample image specifying means 4 satisfies this condition (sample images to which a keyword is assigned are available).
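The two steps, learning a hyperplane from labeled samples and then checking which side a new feature falls on, can be illustrated with a small linear soft-margin SVM trained by subgradient descent on the hinge loss. This is only a sketch under assumptions: the text names SVM but no training algorithm, and the function names, learning rate, regularization weight, and epoch count below are hypothetical choices. Labels are +1 for the designated category and -1 for other categories.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Learn a hyperplane w.x + b = 0 from labeled feature vectors."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Sample is inside the margin: move the hyperplane toward
                # classifying it correctly, with a small shrink on w.
                w = [wi + learning_rate * (y * xi - reg * wi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Sample is safely classified: only apply regularization.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b

def classify(x, w, b):
    """Return +1 or -1 depending on the side of the hyperplane x lies on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

In the setting described here, the +1 samples would be the feature quantities of the specified sample images and the -1 samples the stored feature quantities of other categories.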

The category classification means 52 uses the classifier 73 generated by the category classifier generation means 51 to determine whether or not the image feature quantity generated by the feature quantity conversion means 3 from an input image whose classification category is unknown belongs to the classification category named by the designated keyword. When the category classification means 52 determines that it belongs to that category, the input image is either classified as an image 74 in the storage unit 7, which is the predetermined classification destination, or output to the image search keyword assigning means 6.

When the classification category determination means 5 determines that the input image belongs to the classification category named by the designated keyword, the image search keyword assigning means 6 assigns the name of the classification category to the input image as a keyword and classifies it as an image 74 in the predetermined classification destination in the storage unit 7.

Next, the procedure for creating image feature quantities and assigning image search keywords using the image classification device described above will be described.
“Creating image feature quantities for various images”
First, the image feature quantities 72, which serve as the reference when the category classifier generation means 51 creates the classifier 73 for the image feature quantities of each image, are recorded in the storage unit 7 in advance. This is done by having the feature quantity conversion means 3 calculate image feature quantities from arbitrary images input via the network 1 or directly, and recording them in the storage unit 7 as the image feature quantities 72.

That is, the feature vector groups extracted from a wide variety of different images by the feature vector extraction means 31 of the feature quantity conversion means 3 are input to the CodeBook generation means 32, which measures their distribution characteristics in the feature space and clusters the feature space by grouping feature vectors that are close to each other. The representative vector of each cluster is recorded in the storage unit 7 as the CodeBook 71. The vector quantization means 33 then performs quantization with the CodeBook, creates the image feature quantities 72, each consisting of a histogram over 100 to 200 representative vectors, and records them in the storage unit 7.
These image feature quantities 72 (output by the vector quantization means 33) are needed for the category classifier generation means 51 to create the classifier 73. The classifier 73 output by the category classifier generation means 51 is stored in the storage unit 7 and used as the criterion by which the category classification means 52 classifies input images whose classification category is unknown.

“Creating image feature quantities for images retrieved by keyword”
The user enters a keyword from the input unit, and image feature quantities are created for the images that match the keyword. That is, the Web image search means 2 performs a Web image search based on the keyword entered from the input unit, and the images obtained through the network 1 are output to the feature quantity conversion means 3.
The feature vector extraction means 31 converts each input image into feature vectors and outputs them, using a method that detects key points and extracts feature vectors from the area around each detected key point, as is done in SIFT, for example.

The vector quantization means 33 vector-quantizes the input feature vectors using the CodeBook 71 stored in the storage unit 7: it quantizes each input feature vector to the nearest representative vector in the CodeBook 71, at the same time obtains the frequency with which each representative vector is selected, and outputs the resulting histogram as the image feature quantity.

The sample image specifying means 4 clusters the input image feature quantities by grouping those that are close to each other according to their distribution in the feature space, and specifies the images belonging to a cluster with many members or a cluster with small variance as sample images related to the designated keyword.

The category classifier generation means 51 takes the input image feature quantities of the classification category named by the designated keyword and the image feature quantities 72 of the various other classification categories stored in advance in the storage unit 7, extracts by machine learning the points on which the two differ greatly as the criterion for classification, and outputs it as the classifier 73. The output classifier 73 is stored in the storage unit 7 so that the category classification means 52 in the subsequent stage can use it as the criterion for classification.

“Category classification and image search keyword assignment for input images whose classification category is unknown”
The feature vector extraction means 31 converts an input image whose classification category is unknown into feature vectors and outputs them, using a method that detects key points and extracts feature vectors from the area around each detected key point, as is done in SIFT, for example.
The vector quantization means 33 vector-quantizes the input feature vectors using the CodeBook 71 stored in the storage unit 7: it quantizes each input feature vector to the nearest representative vector in the CodeBook 71, at the same time obtains the frequency with which each representative vector is selected, and outputs the resulting histogram as the image feature quantity.

The category classification means 52 uses the classifier 73 stored in the storage unit 7 to determine whether or not the newly input image belongs to the category of the classifier 73, and if it does, classifies the input image as an image 74, the predetermined classification destination.
When the category classification means 52 determines that the input image belongs to the category of the designated classifier 73, the image search keyword assigning means 6 assigns the name indicated by the classifier 73 to the input image as a keyword for image search. Instead of being assigned to the image directly, the keyword may be recorded in a database or the like, linked to the file name of the image.
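The database variant, in which the keyword is linked to the image file name rather than written into the image, could look like the following Python sketch. The use of SQLite, the table name `image_keywords`, and the function names are assumptions made for this example; the text only says "a database or the like".

```python
import sqlite3

def open_keyword_db(path=":memory:"):
    """Open (or create) a small database linking file names to keywords."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_keywords ("
        " file_name TEXT NOT NULL,"
        " keyword TEXT NOT NULL,"
        " UNIQUE (file_name, keyword))"
    )
    return conn

def assign_keyword(conn, file_name, keyword):
    """Record that an image was classified into the category named by keyword."""
    conn.execute(
        "INSERT OR IGNORE INTO image_keywords (file_name, keyword) VALUES (?, ?)",
        (file_name, keyword),
    )
    conn.commit()

def find_images(conn, keyword):
    """Return the file names of all images that carry the given keyword."""
    rows = conn.execute(
        "SELECT file_name FROM image_keywords WHERE keyword = ? ORDER BY file_name",
        (keyword,),
    )
    return [row[0] for row in rows]
```

Keeping the link in a separate table leaves the image files untouched and allows one image to carry several keywords.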

A specific application example of the image classification device described above will now be described.
The device can be used for a service that classifies and searches a large number of images held by a user.
For example, when the user has taken and accumulated many images of a child in the storage unit 7 (that is, the images are merely recorded in the storage unit 7), these images are input from the storage unit 7 to the feature quantity conversion means 3 (or input directly as input images). To find the images showing the child “sitting and eating a meal” among them, the user enters “sitting and eating a meal” as the keyword at the input unit. The Web image search means 2 collects images from HTML pages based on the keyword, the processes described above are performed, and the image feature quantities of “sitting and eating a meal” images are recorded in the storage unit 7 as the classifier 73.
Next, the category classification means 52 compares the image feature quantity of each image held by the user (stored in the storage unit 7 or input directly) with the image feature quantities of “sitting and eating a meal” images recorded in the classifier 73, recognizes an image as a “sitting and eating a meal” image when the similarity is high, and displays the result on the display unit. The required images can thus be extracted efficiently from a large number of images.

Likewise, when the user wants to classify the many images of the child that the user holds, the user specifies and enters a keyword (for example, “sitting and eating a meal”). The Web image search means 2 collects images from HTML pages based on the keyword, and through the processes described above the image feature quantities of “sitting and eating a meal” images are recorded in the classifier 73.
Next, the category classification means 52 compares the image feature quantity of each image held by the user with the image feature quantities of “sitting and eating a meal” images recorded in the classifier 73, and recognizes an image as a “sitting and eating a meal” image when the similarity is high. The image search keyword assigning means 6 then assigns the keyword “sitting and eating a meal” to that image, and the image is recorded together with the keyword in the storage unit 7 as an image 74.

According to the image classification device described above, assigning search keywords to images makes it possible to retrieve the images directly by entering a keyword the next time a search is performed.
Furthermore, whereas classification categories and image search keywords were conventionally limited to those set in advance, in the example above the user can set keywords freely, so images can be classified by keywords that suit the user's preferences.

DESCRIPTION OF SYMBOLS: 1 ... network, 2 ... Web image search means, 3 ... feature quantity conversion means, 4 ... sample image specifying means, 5 ... classification category determination means, 6 ... image search keyword assigning means, 7 ... storage unit, 31 ... feature vector extraction means, 32 ... CodeBook generation means, 33 ... vector quantization means, 51 ... category classifier generation means, 52 ... category classification means, 71 ... CodeBook, 72 ... image feature quantities, 73 ... classifier, 74 ... images.

Claims

A sample image acquisition method for a system that can connect to HTML pages via a network, characterized by:
specifying a keyword and collecting, from the Web, images in a plurality of HTML pages on which the keyword appears;
measuring low-dimensional signal values such as color, shape, and pattern from the plurality of Web search images collected from the Web, and outputting them as image feature quantities; and
for the plurality of image feature quantities, clustering the feature space by grouping image feature quantities that are close to each other according to their distribution in the feature space, and specifying the images belonging to the cluster with the largest number of members or to a cluster with small variance as sample images related to the specified keyword.

A sample image acquisition device in a system that can connect to HTML pages via a network, comprising:
Web image search means for specifying a keyword and collecting, from the Web, images in a plurality of HTML pages on which the keyword appears;
feature quantity conversion means for measuring low-dimensional signal values such as color, shape, and pattern from the Web search images collected by the Web image search means and outputting them as image feature quantities; and
sample image specifying means for clustering the feature space, for the image feature quantities respectively generated from the plurality of Web search images by the feature quantity conversion means, by grouping image feature quantities that are close to each other according to their distribution in the feature space, and for specifying the images belonging to the cluster with the largest number of members or to a cluster with small variance as sample images related to the specified keyword.

An image classification device in a system that can connect to HTML pages via a network, comprising:
Web image search means for specifying a keyword and collecting, from the Web, images in a plurality of HTML pages on which the keyword appears;
feature quantity conversion means for measuring low-dimensional signal values such as color, shape, and pattern from the Web search images collected by the Web image search means or from an input image, and outputting them as image feature quantities;
sample image specifying means for clustering the feature space, for the image feature quantities respectively generated from the plurality of Web search images by the feature quantity conversion means, by grouping image feature quantities that are close to each other according to their distribution in the feature space, and for specifying the images belonging to the cluster with the largest number of members or to a cluster with small variance as sample images related to the specified keyword; and
classification category determination means for extracting the image feature quantities corresponding to the sample images specified by the sample image specifying means and using them as image feature quantities representing a classification category whose name is the designated keyword, for calculating the similarity between the image feature quantity generated by the feature quantity conversion means from an input image whose classification category is unknown and the image feature quantities representing the classification category, and for classifying the input image into a predetermined classification destination based on the similarity.

The image classification device according to claim 3, wherein the classification category determination means determines, based on the similarity, whether the input image belongs to the classification category, the device further comprising image search keyword assigning means for assigning the name of the classification category to the input image as a keyword when the input image is determined to belong to the classification category.

The image classification device according to claim 3, wherein the feature quantity conversion means includes:
feature vector extraction means for extracting from an image, as key points, a plurality of points with large signal changes such as edges and irregularities, and outputting feature vectors calculated from the color, shape, pattern, and the like near each key point;
CodeBook generation means for measuring the distribution characteristics in the feature space of the feature vector group extracted by the feature vector extraction means from a wide variety of different images, clustering the feature space by grouping feature vectors that are close to each other, and outputting the representative vector of each cluster as a CodeBook;
a storage unit for storing the CodeBook generated by the CodeBook generation means; and
vector quantization means for vector-quantizing each feature vector extracted by the feature vector extraction means to the nearest representative vector in the CodeBook stored in the storage unit, at the same time obtaining the appearance frequency of each representative vector, and outputting the resulting histogram as the image feature quantity of the input image.

The image classification device according to claim 3 or 4, wherein the classification category determination means includes:
a storage unit for storing the feature quantities of the sample images specified by the sample image specifying means as image feature quantities of a classification category whose name is the specified keyword;
category classifier generation means for extracting by machine learning, as the criterion for classification, the points on which the image feature quantities of the classification category whose name is the designated keyword differ from the image feature quantities of various other classification categories already stored in the storage unit, and outputting the criterion for classification as a classifier; and
category classification means for determining, using the classifier generated by the category classifier generation means, whether the image feature quantity generated by the feature quantity conversion means from an input image whose classification category is unknown belongs to the classification category whose name is the designated keyword, and for classifying the input image into a predetermined classification destination when it is determined to belong.