JP2006099565A

JP2006099565A - Content identification device

Info

Publication number: JP2006099565A
Application number: JP2004286620A
Authority: JP
Inventors: Haruhisa Kato; 晴久加藤; Yasuhiro Takishima; 康弘滝嶋
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2004-09-30
Filing date: 2004-09-30
Publication date: 2006-04-13
Anticipated expiration: 2024-09-30
Also published as: JP4553300B2

Abstract

[Problem] To provide a content identification device capable of identifying any given content among a plurality of contents at high speed and with high accuracy.
[Solution] Learning processing means 10 constructs a learning model in advance by learning from the feature amounts of content to be identified (positive example content) and the feature amounts of content not to be identified (negative example content). Identification processing means 20 identifies whether unknown content is positive example content on the basis of the feature amounts of the unknown content and the learning model constructed by the learning processing means 10. Negative example content, which covers many different kinds of content, is divided into clusters by classification means 13; learning means 14 to 16 construct a learning model for each cluster and each feature amount, and the optimum feature amount and learning model are selected for each cluster.
[Selected drawing] Figure 1

Description

The present invention relates to a content identification device, and more particularly to a content identification device that identifies content at high speed and with high accuracy using feature amounts of the content.

Conventionally, to search for desired content among a plurality of contents, metadata representing the feature amounts of each content item has been assigned to it in advance. The search is then performed via the metadata rather than on the content itself.

In its most primitive form, metadata is assigned by manually writing a description for each content item. A search is then generally performed by presenting the content items whose descriptions match the text entered at search time.

Methods for automating the extraction of metadata to be assigned to content have also been proposed. For example, Patent Document 1 describes classifying colors into a plurality of color groups, calculating for each color group its occupancy ratio (the proportion of the image's pixels that belong to that group), and using the calculated occupancy ratios together with the name or representative color of each color group as metadata. At search time, the similarity between images is calculated from the Euclidean distance between their metadata, and images with a high similarity are presented as the search result.
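The prior-art scheme of Patent Document 1 can be made concrete with a minimal Python sketch. It assumes that every pixel has already been assigned a color-group index (how colors are grouped is not specified here), computes the occupancy ratio of each group, and ranks stored images by the Euclidean distance between their ratio vectors. The function names and toy data are hypothetical; this illustrates the idea, not the exact procedure of that document.

```python
import math
from typing import Dict, List, Sequence


def occupancy_ratios(group_labels: Sequence[int], num_groups: int) -> List[float]:
    """Fraction of the image's pixels that fall into each color group."""
    counts = [0] * num_groups
    for g in group_labels:
        counts[g] += 1
    total = len(group_labels)
    return [c / total for c in counts]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(query: Sequence[float], database: Dict[str, Sequence[float]]) -> List[str]:
    """Image IDs ordered from most to least similar (smallest distance first)."""
    return sorted(database, key=lambda name: euclidean(query, database[name]))


# Toy images given as per-pixel color-group indices (3 hypothetical groups).
db = {
    "sunset": occupancy_ratios([0, 0, 0, 1], 3),  # mostly group 0
    "forest": occupancy_ratios([2, 2, 2, 1], 3),  # mostly group 2
}
query = occupancy_ratios([0, 0, 1, 1], 3)
print(rank_by_similarity(query, db))  # ['sunset', 'forest']
```

Because `rank_by_similarity` compares the query against every stored vector, its cost grows linearly with the number of stored content items, which is the processing-time concern raised below.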

Patent Document 2 describes converting a grayscale image containing a face region into a mosaic and extracting the mosaic of the face region as metadata, exploiting the fact that the mosaic pattern differs inside and outside the face region. Patent Document 3 describes using as metadata a quantified ruled-line pattern extracted by removing everything other than ruled lines from an image. In these methods, images can be searched or classified using the inner product of normalized metadata as the distance measure.
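The inner product of normalized metadata used in Patent Documents 2 and 3 is, in effect, cosine similarity. A minimal sketch, with hypothetical vectors standing in for the mosaic or ruled-line metadata; a larger value means more similar metadata.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def normalized_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of the unit-normalized vectors (cosine similarity), in [-1, 1]."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


print(normalized_inner_product([1, 0], [2, 0]))  # same direction: 1.0
print(normalized_inner_product([1, 0], [0, 3]))  # orthogonal: 0.0
```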

Patent Document 4 describes using a color histogram, edge pixel information, and edge pixel change information between two frames as metadata, and searching for similar images with a self-organizing map. Patent Document 5 describes using as metadata color histograms extracted for images of different resolutions, the average luminance of each block, and the edge amount, and searching for similar images by cluster analysis.

Further, Patent Document 6 describes searching for similar images by discriminant analysis, using as metadata the representative colors obtained with an annealing M-estimator and their arrangement.
Patent Document 1: JP-A-11-96364
Patent Document 2: JP-A-8-221547
Patent Document 3: JP-A-7-160844
Patent Document 4: JP-A-11-39325
Patent Document 5: JP-A-2003-256427
Patent Document 6: JP-A-2003-67764

However, the primitive method above requires metadata to be assigned manually to each content item, and when the amount of content becomes huge, the workload of assigning metadata grows without bound, so the method is impractical. Moreover, although the description given to each content item should be objective, subjective judgment inevitably creeps in, with the risk that similar descriptions are given to different content or different descriptions are given to similar content.

The methods described in Patent Documents 1 to 6 extract objective metadata automatically, which solves the above problems. However, the simple histogram extracted by the method of Patent Document 1, namely the pixel occupancy ratio of each color group, cannot be said to capture the characteristics of the content accurately, which lowers identification accuracy. Furthermore, because identification requires measuring the Euclidean distance to each individual metadata item, processing time becomes a problem as the number of content items grows.

The methods of Patent Documents 2 and 3 use specifically the mosaic pattern of a face region or a ruled-line pattern as the image feature amount, so the processing target is further restricted to face images or document images, and versatility is poor.

Furthermore, the methods of Patent Documents 4 to 6 each require a large amount of metadata to be computed and extracted, so processing takes a long time. In addition, because different kinds of metadata, such as color histograms and edges, are used together for identification, it is difficult to set parameters that account for the interplay between these kinds of metadata.

An object of the present invention is to solve the above problems and to provide a content identification device that can identify, at high speed and with high accuracy, whether any given unknown content is content to be identified (positive example content) or content not to be identified (negative example content).

To solve the above problems, the basic feature of the present invention is a content identification device that identifies whether unknown content is positive example content, comprising: learning processing means that constructs a learning model in advance by learning from the feature amounts of positive example content and the feature amounts of negative example content; and identification processing means that identifies whether the unknown content is positive example content on the basis of the feature amounts of the unknown content and the learning model constructed by the learning processing means.

The present invention is also characterized in that positive example teacher content belonging to the positive example content and negative example teacher content belonging to the negative example content are prepared in advance, the feature amounts extracted from the positive example teacher content are used as the feature amounts of the positive example content, and the feature amounts extracted from the negative example teacher content are used as the feature amounts of the negative example content.

The present invention is further characterized in that the negative example teacher content is classified according to its feature amounts, a learning model is constructed for each class and each feature amount, and the learning model of the feature amount best suited to identifying unknown content is selected adaptively.

Furthermore, the present invention is characterized in that a plurality of identification means corresponding to the classes of negative example teacher content are provided and chained together to perform identification, and in that the order in which the identification means are applied can be changed according to how often content of each class of negative example teacher content appears among the unknown content, or according to the identification accuracy of the identification means.

According to the present invention, whether unknown content is positive example content or negative example content can be identified with high accuracy. In addition, by adaptively selecting the feature amount used for identification for each class of negative example teacher content, the feature amounts extracted from unknown content at identification time can be kept to the necessary minimum. Furthermore, by arranging the identification stages hierarchically so that the distances between feature amounts are maximized, and by extracting only the feature amounts each stage needs and identifying step by step, processing can be sped up without sacrificing identification accuracy.

The present invention is described below with reference to the drawings. FIG. 1 is a functional block diagram showing an embodiment of the content identification device according to the present invention. The following description assumes that the content is a still image and that the feature amounts serving as its metadata are mainly color information and shape information of the still image, such as hue, color distribution, composition, and pattern. However, the present invention is not limited to these: any content, such as audio or moving images, can be processed, and any feature amount can be used as metadata.

As its basic configuration, the present invention comprises learning processing means 10, which constructs a learning model by learning from teacher content, and identification processing means 20, which processes unknown content. These processing means can be implemented in software, but may also be implemented in hardware.
(1) Learning processing means 10

First, the learning processing means 10 is described. For the learning process, images known in advance to be identification targets and images known in advance not to be identification targets are prepared as teacher content. In this specification, images to be identified are called positive example teacher content, and images not to be identified are called negative example teacher content. For example, to identify whether an unknown image is an image of a person, images of persons are positive example teacher content and all other images are negative example teacher content.

Next, the positive example teacher content and the negative example teacher content are input to extraction means 11 and 12, respectively, which extract feature amounts from each teacher content item as metadata. Any feature amount may be used as metadata, but using the descriptors defined in the international standard MPEG-7, namely Color Layout, Scalable Color, Dominant Color, Color Structure, and Edge Histogram, is convenient because it makes the metadata interoperable and yields a highly versatile device.

When, for example, images of persons are the positive example content, the negative example teacher content includes many different kinds of images other than persons. The classification means 13 therefore clusters the metadata of the negative example teacher content according to the distances between metadata. The k-means method, a self-organizing map, or the like can be used for this clustering. Alternatively, since the kinds of negative example teacher content prepared for learning are known in advance, the clustering can be done manually, for example by matching each prepared negative example teacher content item with the metadata extracted from it.
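One way the classification means 13 might cluster negative-example metadata is k-means (Lloyd's algorithm), sketched minimally below. Assumptions for illustration: metadata vectors are plain lists of floats, the number of clusters k is chosen by hand, and the initial centroids are simply the first k vectors; a practical implementation would use a better initialization such as k-means++.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's k-means. Returns (cluster label for each point, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D negative-example metadata: two sea-like and two mountain-like vectors.
negatives = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(negatives, k=2)
print(labels)  # [0, 1, 0, 1]
```

Each resulting label plays the role of a cluster such as A or B; the manual alternative mentioned in the text would simply supply these labels directly.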

Through clustering by the classification means 13, for example, the metadata of sea images among the negative example teacher content is classified as cluster A, and the metadata of mountain images as cluster B.

Learning means (1) to (3) 14 to 16 use the metadata of the positive example teacher content extracted by the extraction means 11 and the metadata of each cluster of negative example teacher content produced by the classification means 13 to calculate a separating hyperplane that optimally separates the positive example teacher content from the negative example teacher content. For example, learning means (1) uses the metadata of the person images extracted by the extraction means 11 and the metadata of the sea images extracted by the extraction means 12 and classified by the classification means 13 to calculate a separating hyperplane that separates the person images (positive example content) from the sea images (negative example content). When there are several kinds of metadata, a separating hyperplane is calculated for each kind. A support vector machine (SVM) or discriminant analysis, for example, can be used to calculate the separating hyperplane.
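As a hedged illustration of one learning means computing a separating hyperplane for a single (cluster, descriptor) pair, the sketch below trains a linear soft-margin SVM by stochastic subgradient descent on the regularized hinge loss (the Pegasos method). This is a technique chosen for illustration; the source only says that an SVM or discriminant analysis may be used, and a real system would more likely call an established SVM library. Positive example content is labelled +1 and the cluster's negative example content -1; the bias is handled by appending a constant 1 to each vector, which also regularizes the bias slightly. The data and parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (w, b) of the hyperplane w.x + b = 0


def train_linear_svm(pos: List[Sequence[float]], neg: List[Sequence[float]],
                     lam: float = 0.01, iterations: int = 2000, seed: int = 0) -> Model:
    """Linear soft-margin SVM trained with Pegasos-style stochastic subgradient steps."""
    # Label positives +1 and negatives -1; append a constant 1 so the last weight acts as the bias.
    data = [(list(x) + [1.0], 1.0) for x in pos] + [(list(x) + [1.0], -1.0) for x in neg]
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)  # one randomly drawn training sample per step
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: subgradient of (lam/2)*||w||^2
        if margin < 1.0:  # hinge loss is active for this sample
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w[:-1], w[-1]


def decide(model: Model, x: Sequence[float]) -> int:
    """+1 = positive example side of the hyperplane, -1 = negative example side."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


persons = [[2, 2], [3, 3], [2, 3]]    # hypothetical positive-example metadata
sea = [[-2, -2], [-3, -3], [-2, -3]]  # hypothetical cluster-A (sea) metadata
model = train_linear_svm(persons, sea)
print([decide(model, x) for x in persons + sea])  # [1, 1, 1, -1, -1, -1]
```

One such model would be trained per cluster and per kind of metadata; with the five MPEG-7 descriptors named above and three clusters, that amounts to fifteen small models.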

Support vector machines (SVMs) are described, for example, in V. Vapnik, Statistical Learning Theory, Wiley-Interscience, 1998, and in C.-C. Chang, C.-W. Hsu, and C.-J. Lin, "The analysis of decomposition methods for support vector machines," IEEE Transactions on Neural Networks, 11(4), pp. 1003-1008.

FIG. 2 is an explanatory diagram showing the concept of the SVM. As shown in the figure, the amounts of two different elements a and b of a given kind of metadata are taken on the vertical and horizontal axes, respectively, and the metadata extracted from each image is plotted as a point. For example, if the metadata is Scalable Color (a scalable representation of an HSV color histogram) and red and blue are taken on the vertical and horizontal axes, respectively, the metadata of each person image is plotted at a "○" position and the metadata of each sea image at a "×" position. FIG. 2 shows the case where the metadata has two elements a and b; metadata with more elements is plotted in a space of correspondingly higher dimension.

As shown in FIG. 2, the SVM has a separating hyperplane h that serves as the separation threshold. As described above, the separating hyperplane h is set by giving the metadata of the positive example teacher content and the negative example teacher content to the SVM, having it learn from each kind of metadata, and choosing h on the basis of the learning result so that the positive example teacher content and the negative example teacher content are optimally separated. The identification accuracy of the learning model depends on the distance between the separating hyperplane h, obtained when the metadata of various positive and negative example teacher content is given to the SVM, and the plotted points closest to it.
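For a linear hyperplane h given by w·x + b = 0, the distance from a point x to h is |w·x + b| / ||w||, and the quantity the text ties to identification accuracy is the smallest such distance over the plotted points (the margin). A minimal sketch with a hypothetical hyperplane and points:

```python
import math
from typing import List, Sequence


def distance_to_hyperplane(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w: Sequence[float], b: float, points: List[Sequence[float]]) -> float:
    """Distance from the hyperplane to the closest point (larger is better)."""
    return min(distance_to_hyperplane(w, b, p) for p in points)


# Hypothetical hyperplane 3a + 4b = 0 and three plotted metadata points.
print(margin([3, 4], 0.0, [[3, 4], [1, 0], [-1, -1]]))  # 0.6, set by the point [1, 0]
```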

In this way, a learning model that classifies content as positive or negative example content is constructed for each cluster and each kind of metadata. For example, for the negative example teacher content classified by the classification means 13 as cluster A (sea images), learning models based on Color Layout, Scalable Color, Dominant Color, Color Structure, Edge Histogram, and so on are constructed, each of which identifies whether an image is a person image (positive example content) or a sea image (negative example content). Likewise, for the negative example teacher content classified as cluster B (mountain images), learning models based on Color Layout, Scalable Color, Dominant Color, Color Structure, Edge Histogram, and so on are constructed that identify whether an image is a person image (positive example content) or a mountain image (negative example content).

Selection means (1) to (3) 17 to 19 select, for each cluster, the metadata whose learning model best discriminates that cluster from the positive example teacher content, together with that learning model, so that the several kinds of metadata extracted from the positive and negative example teacher content can be used adaptively according to the kind of negative example teacher content. For example, Color Layout metadata and its learning model are selected for cluster A (sea images), and Edge Histogram metadata and its learning model are selected for cluster B (mountain images).
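The selection step can be sketched as choosing, for each cluster, the descriptor whose model scores best on held-out labelled samples. Assumptions for illustration: the models are linear hyperplanes that have already been trained (written down by hand here), each held-out sample carries one feature vector per descriptor, and plain accuracy is the selection criterion; the source does not say how "optimum" is measured. Cluster names, descriptor keys, and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Hyperplane = Tuple[Sequence[float], float]       # (w, b)
Sample = Tuple[Dict[str, Sequence[float]], int]  # ({descriptor: vector}, label +1/-1)


def predict(model: Hyperplane, x: Sequence[float]) -> int:
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def accuracy(model: Hyperplane, descriptor: str, samples: List[Sample]) -> float:
    hits = sum(predict(model, feats[descriptor]) == label for feats, label in samples)
    return hits / len(samples)


def select_per_cluster(models: Dict[str, Dict[str, Hyperplane]],
                       validation: Dict[str, List[Sample]]) -> Dict[str, Tuple[str, Hyperplane]]:
    """For each cluster, keep the (descriptor, model) pair with the best validation accuracy."""
    chosen = {}
    for cluster, by_descriptor in models.items():
        best = max(by_descriptor,
                   key=lambda d: accuracy(by_descriptor[d], d, validation[cluster]))
        chosen[cluster] = (best, by_descriptor[best])
    return chosen


models = {
    "A_sea": {"ColorLayout": ([1.0, 0.0], 0.0), "EdgeHistogram": ([0.0, 1.0], 0.0)},
    "B_mountain": {"ColorLayout": ([1.0, 0.0], 0.0), "EdgeHistogram": ([0.0, 1.0], 0.0)},
}
validation = {
    "A_sea": [({"ColorLayout": [1.0, 5.0], "EdgeHistogram": [1.0, -5.0]}, 1),
              ({"ColorLayout": [-1.0, 5.0], "EdgeHistogram": [-1.0, 5.0]}, -1)],
    "B_mountain": [({"ColorLayout": [-1.0, 1.0], "EdgeHistogram": [-1.0, 1.0]}, 1),
                   ({"ColorLayout": [1.0, -1.0], "EdgeHistogram": [1.0, -1.0]}, -1)],
}
chosen = select_per_cluster(models, validation)
print({cluster: descriptor for cluster, (descriptor, _) in chosen.items()})
# {'A_sea': 'ColorLayout', 'B_mountain': 'EdgeHistogram'}
```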

The metadata and learning model selected for each cluster by the selection means (1) to (3) 17 to 19 are supplied to the corresponding identification means (1) to (3) 21 to 23 of the identification processing means 20.

When the optimum metadata for a cluster is known in advance, for example when it is known beforehand, or from earlier learning on sea images, that Scalable Color is the optimum metadata for identifying sea images as negative example content, only Scalable Color needs to be extracted as metadata from sea images supplied for subsequent learning, and the learning model for sea images as a whole can be generated or updated from it.
(2) Identification processing means 20

Next, the identification processing means 20 is described. The identification processing means 20 has a plurality of identification means (1) to (3) 21 to 23, as many as there are selection means (1) to (3) 17 to 19, that is, as many as there are clusters. Unknown content is supplied to the identification processing means 20, which identifies whether it is positive or negative example content.

The identification means (1) to (3) 21 to 23 could perform their identification independently, but it is preferable to chain them together so that unknown content is identified step by step, as described below. Unknown content is judged to be positive example content only when all the identification means (1) to (3) 21 to 23 identify it as positive example content. Conversely, if even one of the identification means (1) to (3) 21 to 23 identifies it as negative example content, it is judged to be negative example content, the identification process is terminated at that stage, and no further processing is performed.
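The decision rule just described (accept only if every stage says positive, stop at the first stage that says negative) can be sketched as a chain of stage functions. Each stage here is a hypothetical callable that returns True when it considers the content positive with respect to its cluster; the feature names and thresholds are illustrative only.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[dict], bool]]  # (cluster name, "positive rather than this cluster?")


def cascade_identify(content: dict, stages: List[Stage]) -> Tuple[bool, str]:
    """Return (is_positive, deciding stage). Stops at the first stage that rejects."""
    for cluster, is_positive in stages:
        if not is_positive(content):
            return False, cluster  # negative example content: later stages never run
    return True, "all stages"  # every stage accepted: positive example content


stages = [
    ("A", lambda c: c["blue"] < 0.5),   # hypothetical stage rejecting sea-like content
    ("B", lambda c: c["edges"] < 0.7),  # hypothetical stage rejecting mountain-like content
]
print(cascade_identify({"blue": 0.8, "edges": 0.1}, stages))  # (False, 'A')
print(cascade_identify({"blue": 0.2, "edges": 0.9}, stages))  # (False, 'B')
print(cascade_identify({"blue": 0.2, "edges": 0.1}, stages))  # (True, 'all stages')
```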

FIG. 3 is a flowchart showing an example of the identification procedure in the identification processing means 20. Each identification means (1) to (3) 21 to 23 in the identification processing means 20 extracts from the unknown content only the optimum metadata for its cluster, as selected by the selection means (1) to (3) 17 to 19. If that metadata has already been extracted by another identification means, it can be reused, and no new extraction is needed.
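Extracting each descriptor only when a stage first needs it, and reusing it afterwards, amounts to a small per-content cache. A minimal sketch, assuming hypothetical extractor functions keyed by descriptor name; the toy extractors below are stand-ins, not real MPEG-7 descriptor computations.

```python
from typing import Any, Callable, Dict, List


class LazyFeatures:
    """Extracts each descriptor from one content item on first request, then reuses it."""

    def __init__(self, content: Any, extractors: Dict[str, Callable[[Any], Any]]):
        self.content = content
        self.extractors = extractors
        self.cache: Dict[str, Any] = {}
        self.extracted: List[str] = []  # order of actual extractions, for inspection

    def get(self, descriptor: str) -> Any:
        if descriptor not in self.cache:  # first stage to need it pays the extraction cost
            self.cache[descriptor] = self.extractors[descriptor](self.content)
            self.extracted.append(descriptor)
        return self.cache[descriptor]


extractors = {
    "ColorLayout": lambda img: sum(img),                # toy stand-in
    "EdgeHistogram": lambda img: max(img) - min(img),   # toy stand-in
}
features = LazyFeatures([1, 2, 3], extractors)
features.get("ColorLayout")  # extracted by the first stage
features.get("ColorLayout")  # reused by a later stage, not recomputed
print(features.extracted)    # ['ColorLayout']
```

If the content is rejected before any stage asks for "EdgeHistogram", that descriptor is never computed at all.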

Each identification means (1) to (3) 21 to 23 identifies whether the unknown content belongs to the positive or the negative example content by SVM, discriminant analysis, or the like, based on the learning model derived for its cluster by the learning means (1) to (3) 14 to 16. Because this identification uses a learning model that separates positive from negative example content, it is fast and does not depend on the number of teacher content items.

FIG. 3 shows an example in which unknown content is identified as belonging either to cluster X (positive example content) or to cluster A, B, or C (negative example content). The unknown content is first supplied to identification means (1) 21, which identifies it as either cluster A or not cluster A. Content identified as cluster A is judged to be negative example content. Content identified as not cluster A is supplied to identification means (2) 22, which identifies it as either cluster B or not cluster B. Content identified as cluster B is judged to be negative example content. Next, content identified as not cluster B is supplied to identification means (3) 23, which identifies it as either cluster X or cluster C. Content identified as cluster X is judged to be positive example content, and content identified as cluster C is judged to be negative example content.

Setting the order in which the identification means are applied according to how often each cluster appears in the unknown content, or according to the identification accuracy of each learning model, reduces the identification workload and improves both speed and accuracy.

For example, by giving priority to the identification stage that identifies, as negative example content, a cluster known to appear frequently in the unknown content, much of the unknown content can be identified as negative example content at an early stage and excluded from later processing. Content excluded at an early stage is not processed by later stages, so no further metadata needs to be extracted from it; overall, identification then requires extracting only a minimum of metadata from the unknown content, which makes high-speed identification possible. Also, for example, by giving priority to identification with learning models of high identification accuracy, content included in the negative example content can be identified cluster by cluster with high accuracy.
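Why frequency-first ordering saves work can be checked with a small expected-cost calculation. Assumptions for illustration: each stage rejects exactly the content of its own cluster (ideal stages), every stage costs one feature extraction, and the cluster shares are hypothetical. Under these assumptions, sorting stages by descending frequency minimizes the expected number of stages run per item (by the rearrangement inequality).

```python
from typing import Dict, List


def expected_stages(order: List[str], freq: Dict[str, float], positive_share: float) -> float:
    """Expected stages run per item if stage k rejects exactly the content of cluster order[k]."""
    rejected_cost = sum(freq[c] * (k + 1) for k, c in enumerate(order))
    return rejected_cost + positive_share * len(order)  # positives pass through every stage


def frequency_first(freq: Dict[str, float]) -> List[str]:
    """Order stages so that the most frequent negative cluster is tested first."""
    return sorted(freq, key=freq.get, reverse=True)


freq = {"A": 0.1, "B": 0.6, "C": 0.2}  # hypothetical cluster shares of the unknown content
positive_share = 0.1                    # hypothetical share of positive example content
print(frequency_first(freq))                                                    # ['B', 'C', 'A']
print(round(expected_stages(["A", "B", "C"], freq, positive_share), 2))         # 2.2
print(round(expected_stages(frequency_first(freq), freq, positive_share), 2))   # 1.6
```

With real, imperfect stages the accuracy of each learning model also matters, which is the second ordering criterion mentioned in the text.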

The identification results for unknown content can also be used for relearning. For example, when unknown content that is actually negative example content has been identified as positive example content, that content is supplied to the learning means or extraction means for its cluster, and relearning is performed so that it will be identified as negative example content. In this case the cluster of the unknown content and its optimum metadata are already known from the identification process: if the metadata has been stored, it can be reused for relearning, and even if it has not, only the optimum metadata needs to be extracted, so the metadata to be extracted is kept to the necessary minimum.
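The bookkeeping side of this relearning can be sketched as a per-cluster queue of extra negative examples. Assumptions for illustration: the caller already knows the true cluster of the misidentified content, the cluster's selected descriptor name is available, `extract` is a hypothetical function that computes one descriptor, and the actual retraining (for example with an SVM) happens elsewhere for clusters flagged as needing it.

```python
from typing import Any, Callable, Dict, List, Sequence, Set


class RelearningQueue:
    """Collects false positives per cluster so that cluster's model can be retrained.

    Only the cluster's selected descriptor is stored: a vector cached during
    identification is reused, and otherwise just that one descriptor is extracted.
    """

    def __init__(self, selected_descriptor: Dict[str, str]):
        self.selected_descriptor = selected_descriptor  # cluster -> descriptor name
        self.extra_negatives: Dict[str, List[Sequence[float]]] = {c: [] for c in selected_descriptor}
        self.needs_retraining: Set[str] = set()

    def report_false_positive(self, cluster: str, content: Any,
                              cached: Dict[str, Sequence[float]],
                              extract: Callable[[Any, str], Sequence[float]]) -> None:
        descriptor = self.selected_descriptor[cluster]
        vector = cached.get(descriptor)
        if vector is None:  # not kept from identification: extract only this descriptor
            vector = extract(content, descriptor)
        self.extra_negatives[cluster].append(vector)
        self.needs_retraining.add(cluster)


extraction_log = []


def toy_extract(content: Any, descriptor: str) -> List[float]:  # hypothetical extractor
    extraction_log.append(descriptor)
    return [0.0, 1.0]


queue = RelearningQueue({"A": "ColorLayout", "B": "EdgeHistogram"})
queue.report_false_positive("A", "img1", {"ColorLayout": [0.3, 0.7]}, toy_extract)  # reuses cache
queue.report_false_positive("B", "img2", {}, toy_extract)  # extracts EdgeHistogram only
print(sorted(queue.needs_retraining), extraction_log)  # ['A', 'B'] ['EdgeHistogram']
```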

FIG. 1 is a functional block diagram showing an embodiment of the content identification device according to the present invention.
FIG. 2 is an explanatory diagram showing the concept of a support vector machine (SVM).
FIG. 3 is a flowchart showing an example of the identification procedure in the identification processing means.

Explanation of symbols

10 ... Learning processing means
11, 12 ... Extraction means
13 ... Classification means
14, 15, 16 ... Learning means
17, 18, 19 ... Selection means
20 ... Identification processing means
21, 22, 23 ... Identification means

Claims

1. A content identification device for identifying whether unknown content is content to be identified, comprising:
learning processing means that constructs a learning model by performing learning based on feature amounts of content to be identified (hereinafter referred to as positive example content) and feature amounts of content not to be identified (hereinafter referred to as negative example content); and
identification processing means that identifies whether the unknown content is positive example content based on feature amounts of the unknown content and the learning model constructed by the learning processing means.

2. The content identification device according to claim 1, wherein positive example teacher content belonging to the positive example content and negative example teacher content belonging to the negative example content are prepared in advance, and the learning processing means uses feature amounts extracted from the positive example teacher content as the feature amounts of the positive example content and feature amounts extracted from the negative example teacher content as the feature amounts of the negative example content.

3. The content identification device according to claim 2, wherein the learning processing means uses descriptors defined in MPEG-7 as the feature amounts.

4. The content identification device according to claim 1, wherein the learning processing means includes classification means that classifies the negative example teacher content according to its feature amounts.

5. The content identification device according to claim 4, wherein the classification means is configured to classify the negative example teacher content according to its feature amounts using the k-means method or a self-organizing map, or to have it classified manually.

6. The content identification device according to claim 4, wherein the learning processing means is configured to perform learning for each class of negative example teacher content classified by the classification means.

7. The content identification device according to claim 6, wherein the learning processing means is configured to perform learning using each of a plurality of feature amounts individually.

8. The content identification device according to claim 7, wherein the learning processing means is configured to set, for each class of negative example teacher content classified by the classification means, a separating hyperplane that optimally separates the positive example teacher content from that negative example teacher content.

9. The content identification device according to claim 8, wherein the separating hyperplane is set using a support vector machine or discriminant analysis.

10. The content identification device according to claim 7, wherein the learning processing means includes selection means that selects the optimum feature amount and learning model according to the class of negative example teacher content classified by the classification means.

11. The content identification device according to claim 10, wherein the identification processing means includes a plurality of identification means corresponding to the classes produced by the classification means, and the plurality of identification means are chained together.

12. The content identification device according to claim 4, wherein each identification means identifies unknown content as belonging to one of two groups: negative example content according to the classification by the classification means, and other content.

13. The content identification device according to claim 11, wherein the identification processing means is configured so that the order in which the plurality of identification means are applied can be changed according to how often content of each class produced by the classification means appears in the unknown content, or according to the identification accuracy of the plurality of identification means.

14. The content identification device according to claim 11, wherein the identification processing means identifies the unknown content as positive example content when all of the plurality of identification means identify it as positive example content.

15. The content identification device according to claim 11, wherein, when the unknown content is identified as negative example content by at least one of the plurality of identification means, the identification processing means identifies the unknown content as negative example content and terminates the processing.

16. The content identification device according to claim 11, wherein the identification processing means extracts the feature amount used by the learning model of each identification means at the stage at which that identification means requires it.

17. The content identification device according to claim 11, wherein the identification processing means reuses a feature amount required by an identification means if it has already been extracted by another identification means.