JP2009181306A

JP2009181306A - VIDEO INDEXING DEVICE, VIDEO INDEXING METHOD, VIDEO INDEXING PROGRAM, AND ITS RECORDING MEDIUM

Info

Publication number: JP2009181306A
Application number: JP2008019319A
Authority: JP
Inventors: Satoshi Shimada; 聡嶌田; Yongqing Sun; 泳青孫; Yukinobu Taniguchi; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-01-30
Filing date: 2008-01-30
Publication date: 2009-08-13
Anticipated expiration: 2028-01-30
Also published as: JP4838272B2

Abstract

[Problem] To achieve highly accurate video indexing, even when the relationship between a definition label and image content changes, without spending a great deal of cost and time on collecting dictionary data.
[Solution] A dictionary storage means 6 stores a definition label together with two functions: a one-class discriminant function, calculated from positive case images by a one-class discriminant function calculation means 4, which determines whether an image to be processed represents the definition label; and a two-class discriminant function, calculated from positive case images and negative case images by a two-class discriminant function calculation means 5, which distinguishes positive case images from negative case images. A label candidate detection means 7 receives the video to be indexed and uses the one-class discriminant function stored in the dictionary storage means 6 to find frame images that are candidates for the definition label. An indexing means 8 then uses the two-class discriminant function stored in the dictionary storage means 6 to exclude, from the candidate frame images, those that correspond to negative case images, and decides which images should be given the definition label.
[Selected drawing] Figure 1

Description

The present invention relates to a video indexing technique that automatically assigns the indexes needed for search and browsing, so that the scenes a viewer wants to see in a video can be accessed efficiently.

A conventional video indexing device assigns a predefined label to a scene whenever a scene matching that label appears in a video, based on the relationship between the label and image content. Some such devices achieve high accuracy by collecting a large amount of dictionary data in advance and learning from it. For a universal definition label such as "human face", a large number of dictionary images can be collected beforehand. A dictionary of face images learned from the dictionary data is prepared, and when the similarity between a frame image of the indexing target video and the dictionary indicates that the frame contains a face, the "human face" definition label is assigned (see Non-Patent Document 1).

When a user defines a semantic label by specifying a sample image, scenes similar to the sample image can be detected by a technique such as template matching (see Non-Patent Document 2), and the semantic label can be assigned to the detected scenes.
Non-Patent Document 1: Takatsuka (高塚皓正), Masayuki Tanaka, Masatoshi Okutomi, "Proposal of Face Detection Using Evaluation Value Distribution of Facialness", Transactions of the Information Processing Society of Japan, Vol. 48, No. SIG16, pp. 51-54, 2007.
Non-Patent Document 2: Mikio Takagi and Haruhisa Shimoda (supervising editors), "New Image Analysis Handbook" (新編画像解析ハンドブック), pp. 1669-1675, University of Tokyo Press, 2004.

The conventional technique that relies on a large amount of dictionary data has two problems: collecting the dictionary data takes a great deal of cost and time, and the technique cannot be applied when the relationship between the definition label and image content changes. Each time a user issues a search request, a large number of images representing the definition label must be collected. Moreover, for scenes showing objects or events that have only recently attracted attention, or when the shooting style of broadcast video changes, the relationship between the definition label and image content shifts, so a large amount of dictionary data must be collected and learned again each time. Because learning is costly and slow, the range of application is limited.

With the conventional technique based on template matching, it is difficult to find the optimum similarity threshold for deciding whether to assign a definition label. In addition, when a definition label cannot be expressed well by a small number of templates, indexing accuracy is low.

The present invention aims to solve these problems and to provide a device that indexes with high accuracy without imposing a burden of collecting dictionary data, even when the similarity threshold used to decide whether to assign a label, such as a semantic label, is not set to its optimum value.

FIG. 1 shows the principle configuration of the present invention. The video indexing device of the present invention comprises the means shown in FIG. 1.

The reference image selection means 1 selects a reference image representing the definition label from a given dictionary video. The dictionary data collection means 2 detects images similar to the reference image in the dictionary video. The positive case / negative case selection means 3 selects, from the collected similar images, the images that correctly represent the definition label as positive case images, and selects images that do not, taken in order of similarity, as negative case images.

The one-class discriminant function calculation means 4 calculates, from the positive case images, a one-class discriminant function for determining whether an image represents the definition label. The two-class discriminant function calculation means 5 calculates, from the positive case images and negative case images, a two-class discriminant function for distinguishing between the two. The dictionary storage means 6 accumulates and stores the one-class discriminant function calculated by means 4 and the two-class discriminant function calculated by means 5, together with the definition label.

When an indexing target video is given, the label candidate detection means 7 applies the one-class discriminant function obtained by the one-class discriminant function calculation means 4 to frame images selected from that video, and finds the frame images that are candidates for the definition label. The indexing means 8 applies the two-class discriminant function obtained by the two-class discriminant function calculation means 5 to the candidate frame images found by the label candidate detection means 7, and determines whether each one should be given the definition label.

In the present invention, objects resembling the positive case images are first extracted with the one-class discriminant function, and negative case images are then excluded with the two-class discriminant function, which distinguishes positive case images from negative case images. This makes it possible to determine appropriately whether an image should be given the definition label.
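
The two-stage decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `two_stage_label` and the representation of the two discriminant functions as plain Python callables returning `True`/`False` are assumptions made for the example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]

def two_stage_label(
    frames: Sequence[Vector],
    one_class: Callable[[Vector], bool],   # hypothetical: True if the frame looks like a positive case
    two_class: Callable[[Vector], bool],   # hypothetical: True if positive, False if negative case
) -> List[int]:
    """Return the indices of frames that should receive the definition label."""
    # Stage 1: permissive candidate detection with the one-class function.
    candidates = [i for i, f in enumerate(frames) if one_class(f)]
    # Stage 2: drop the candidates that the two-class function rejects.
    return [i for i in candidates if two_class(frames[i])]
```

Because stage 2 removes false detections, stage 1 can afford to be permissive, which is the reason the text gives for not needing a finely tuned threshold.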

As the features used by the one-class discriminant function calculation means 4 and the two-class discriminant function calculation means 5, the time at which a frame image appears in the video (media time) can be used together with image features.

In addition to the above means, the present invention may also provide an event setting means and an event label assigning means. The event setting means expresses an event as a rule over the order and time intervals in which a plurality of definition labels appear, and registers the rule and the label of each event to be detected. The event label assigning means assigns an event label based on the similarity between the order and time intervals of the definition labels assigned by the indexing means 8 and the rules registered by the event setting means.
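
As a rough illustration of an event rule, the sketch below treats a rule as an ordered list of definition labels plus a maximum allowed gap between consecutive labels. The source speaks of a similarity between the observed labels and the rule; this example simplifies that to an exact match, and the names `find_events`, `rule`, and `max_gap` are hypothetical.

```python
from typing import List, Tuple

def find_events(
    labels: List[Tuple[float, str]],  # (media_time, definition_label), sorted by time
    rule: List[str],                  # hypothetical rule: labels in the required order
    max_gap: float,                   # hypothetical rule: largest allowed interval in seconds
) -> List[Tuple[float, float]]:
    """Return (start_time, end_time) for every run of labels that satisfies the rule."""
    events = []
    n = len(rule)
    for start in range(len(labels) - n + 1):
        window = labels[start:start + n]
        order_ok = [name for _, name in window] == rule
        gaps_ok = all(b[0] - a[0] <= max_gap for a, b in zip(window, window[1:]))
        if order_ok and gaps_ok:
            events.append((window[0][0], window[-1][0]))
    return events
```

A graded similarity, for example one that tolerates a missing label, would be closer to the wording of the text but is not specified there.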

In the present invention, the one-class discriminant function calculation means 4 can calculate the one-class discriminant function based on the similarity criterion used by the dictionary data collection means 2. For example, the features on which the dictionary data collection means 2 bases its image similarity can also be used by means 4 as the features for determining, from the positive case images, whether an image represents the definition label.

According to the present invention, indexing can be performed accurately because the decision is made in two stages: a decision by the one-class discriminant function followed by a decision by the two-class discriminant function. The effort of tuning the threshold of the one-class discriminant function calculation means can also be eliminated.

Furthermore, by providing an event label assigning means that assigns an event label by comparing the order and time intervals in which definition labels appear against preset rules, the desired video segments can be indexed with even higher accuracy.

In addition, by providing the means for selecting a reference image, the dictionary data collection means, and the means for selecting positive and negative case images, dictionary data can be gathered simply and efficiently.

Embodiments of the present invention are described below with reference to the drawings. The video indexing device according to the present invention consists broadly of a dictionary generation section, which generates the dictionary, and an indexing section, which assigns labels to the indexing target video.

FIG. 2 illustrates the configuration of the dictionary generation section of the video indexing device in the first embodiment of the present invention, and FIG. 3 illustrates the configuration of its indexing section.

The dictionary generation section in FIG. 2 consists of a dictionary video storage unit 11, a reference image selection processing unit 12, an image feature extraction unit 13, a dictionary data collection processing unit 14, a positive case / negative case selection processing unit 15, a feature extraction unit 16, a one-class discriminant function calculation unit 17, a two-class discriminant function calculation unit 18, and a definition label dictionary storage unit 19.

The indexing section in FIG. 3 consists of an indexing video acquisition unit 20, a frame image acquisition unit 21, the feature extraction unit 16, the definition label dictionary storage unit 19, a candidate detection unit 22 based on the one-class discriminant function, and an indexing unit 23 based on the two-class discriminant function.

The dictionary video storage unit 11 stores and manages dictionary videos acquired in advance, and outputs a dictionary video to the reference image selection processing unit 12 in response to a request from that unit.

When a definition label is input, the reference image selection processing unit 12 issues a read request to the dictionary video storage unit 11 and selects, from the dictionary video it receives, a typical frame image representing the definition label. It outputs the selected image as the reference image, together with the dictionary video, to the image feature extraction unit 13. One way to select the reference image is to split the dictionary video into shots at points where the scene changes sharply, display the first image of each shot in a list, and provide an interface that lets the user pick the reference image with a mouse or similar device. The selection is not limited to this method and can easily be carried out in various ways.

The image feature extraction unit 13 extracts image features such as color and texture from the reference image and from each frame image of the dictionary video received from the reference image selection processing unit 12, represents each frame image as a feature vector in the resulting feature space, and outputs the feature vectors to the dictionary data collection processing unit 14.

On receiving the feature vector of each image from the image feature extraction unit 13, the dictionary data collection processing unit 14 collects images similar to the reference image from the dictionary video and outputs the collected images to the positive case / negative case selection processing unit 15.

From the images received from the dictionary data collection processing unit 14, the positive case / negative case selection processing unit 15 selects a predetermined number of positive case images, which represent the image content of the definition label, and negative case images, which do not, and outputs both sets to the feature extraction unit 16.

The feature extraction unit 16 extracts the features used to calculate the discriminant functions, based on the color and texture features of the positive and negative case images received from the positive case / negative case selection processing unit 15, and represents each image as a feature vector in the resulting feature space. It outputs the feature vectors of the positive case images to the one-class discriminant function calculation unit 17, and the feature vectors of both the positive and negative case images to the two-class discriminant function calculation unit 18. Two effective choices of features are: adding the time at which the image appears in the video to the image features extracted by the image feature extraction unit 13; and applying principal component analysis to those image features over all positive and negative case images and using the dimension-reduced result as the features.

When the feature extraction unit 16 receives an image from the frame image acquisition unit 21 shown in FIG. 3, it performs the same feature extraction and outputs the extracted feature vector to the candidate detection unit 22 based on the one-class discriminant function.

From the feature vectors of the positive case images received from the feature extraction unit 16, the one-class discriminant function calculation unit 17 calculates a one-class discriminant function for determining, in the feature space, whether an image represents the definition label, and outputs it to the definition label dictionary storage unit 19.

Two examples of a one-class discriminant function follow. In the first, the function is a sphere in feature space, centered on the mean vector of the positive case images and large enough to contain all of them. In the second, the distribution of the positive case images in feature space is modeled as a Gaussian mixture, which gives a probability distribution for being a positive case; an image is judged to be a positive case when this probability is at or above a preset threshold. If the threshold is set low, for example at 60%, the one-class discriminant function misses fewer true cases. False detections naturally increase, but this is not a problem because they can be excluded by the processing based on the two-class discriminant function obtained by the two-class discriminant function calculation unit 18. Since the threshold of the one-class discriminant function does not have to be set to its optimum value, accurate indexing is possible without spending effort on threshold tuning.
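
The first example, a sphere centered on the mean of the positive case images, can be sketched in a few lines. The function names `fit_sphere` and `in_sphere` are hypothetical, and feature vectors are assumed to be plain lists of floats of equal length.

```python
import math
from typing import List, Sequence, Tuple

def fit_sphere(positives: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Return (center, radius) of a sphere that contains every positive case vector."""
    dim = len(positives[0])
    # Center: mean vector of the positive case images.
    center = [sum(p[d] for p in positives) / len(positives) for d in range(dim)]
    # Radius: distance to the farthest positive case, so all of them lie inside.
    radius = max(math.dist(p, center) for p in positives)
    return center, radius

def in_sphere(x: Sequence[float], center: Sequence[float], radius: float) -> bool:
    """One-class decision: True if x falls inside or on the sphere."""
    return math.dist(x, center) <= radius
```

A sphere sized by the farthest positive case is sensitive to outliers; the Gaussian mixture variant mentioned in the text would need a fitting procedure that is not shown here.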

From the feature vectors of the positive and negative case images received from the feature extraction unit 16, the two-class discriminant function calculation unit 18 calculates a two-class discriminant function for distinguishing, in the feature space, images that represent the definition label from images that do not, and outputs it to the definition label dictionary storage unit 19. Fisher's discriminant function or an SVM (Support Vector Machine), for example, can be used as the two-class discriminant function.

The definition label dictionary storage unit 19 stores the one-class discriminant function received from the one-class discriminant function calculation unit 17 and the two-class discriminant function received from the two-class discriminant function calculation unit 18, together with the definition label. It outputs the one-class or two-class discriminant function in response to requests from the candidate detection unit 22 based on the one-class discriminant function and the indexing unit 23 based on the two-class discriminant function, both shown in FIG. 3.

Next, the indexing video acquisition unit 20 in FIG. 3 acquires the video to be indexed and outputs it to the frame image acquisition unit 21.

The frame image acquisition unit 21 obtains frame images by sampling the indexing target video received from the indexing video acquisition unit 20 at a fixed interval, and outputs the frame images to the feature extraction unit 16.
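
Fixed-interval sampling can be illustrated as below. The function name `sample_frames` and its parameters are hypothetical; the source does not state the sampling interval or frame rate, so both are left as arguments. The media time is returned with each index because the text later uses it as a feature.

```python
from typing import List, Tuple

def sample_frames(total_frames: int, step: int, fps: float) -> List[Tuple[int, float]]:
    """Return (frame_index, media_time_in_seconds) for every `step`-th frame."""
    return [(i, i / fps) for i in range(0, total_frames, step)]
```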

On receiving the feature vector of each frame image from the feature extraction unit 16, the candidate detection unit 22 based on the one-class discriminant function sends a request for the definition label and the one-class discriminant function to the definition label dictionary storage unit 19 and receives them from it. Using the one-class discriminant function, it determines whether each frame image represents the definition label, treats the images judged to do so as candidate images, and outputs the feature vectors of the candidate images to the indexing unit 23 based on the two-class discriminant function.

On receiving the feature vectors of the candidate images from the candidate detection unit 22, the indexing unit 23 based on the two-class discriminant function sends a request for the definition label and the two-class discriminant function to the definition label dictionary storage unit 19 and receives them from it. Using the two-class discriminant function, it decides which of the candidate images represent the definition label, and outputs, as the indexing result, information indicating the images in the video at which the definition label appears.

Next, the processing procedure in the above configuration is described using the example dictionary video shown in FIG. 4. Let Vd be the dictionary video and Fd the reference image of the definition label. In FIG. 4, sections 1, 2, and 3 are the sections in which images representing the definition label appear.

FIG. 5 is a flowchart showing the processing of the dictionary generation section in the first embodiment of the present invention.

[Step S501]
The reference image selection processing unit 12 reads the reference image and each frame image of the dictionary video managed by the dictionary video storage unit 11. Assume that Fd shown in FIG. 4 has been selected as the reference image.

Next, the image feature extraction unit 13 extracts image features in steps S502, S503, S504, and S505. The description below uses color moments as the color feature and a density gradient histogram as the texture feature.

[Step S502]
Each frame image is represented in the three primary colors RGB, so it is converted to the Lab color space.
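
The source does not say which RGB encoding or reference white is used for the conversion. The sketch below assumes 8-bit sRGB input and a D65 white point, which is a common choice; the function name `srgb_to_lab` is hypothetical.

```python
from typing import Tuple

def srgb_to_lab(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (assumes D65 white)."""
    def to_linear(c: float) -> float:
        # Undo the sRGB transfer curve.
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(r), to_linear(g), to_linear(b)
    # Linear RGB to CIE XYZ (sRGB primaries, 4-digit matrix).
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    # Reference white consistent with the matrix rows above.
    xn, yn, zn = 0.9505, 1.0, 1.0890

    def f(t: float) -> float:
        d = 6.0 / 29.0
        return t ** (1.0 / 3.0) if t > d ** 3 else t / (3.0 * d * d) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)
```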

[Step S503]
The image is divided into M × N blocks. FIG. 6 shows an example in which the frame image is divided into 4 × 4 = 16 block regions.
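
A minimal sketch of the block division follows. It assumes the image is a list of rows, that M counts blocks vertically and N horizontally (the source does not say which is which), and that block boundaries are placed by integer division; `split_into_blocks` is a hypothetical name.

```python
from typing import Any, List, Sequence

def split_into_blocks(image: Sequence[Sequence[Any]], m: int, n: int) -> List[List[Any]]:
    """Split a 2D image into m x n blocks; each block is returned as a flat pixel list."""
    height, width = len(image), len(image[0])
    blocks = []
    for bi in range(m):
        y0, y1 = bi * height // m, (bi + 1) * height // m
        for bj in range(n):
            x0, x1 = bj * width // n, (bj + 1) * width // n
            blocks.append([image[y][x] for y in range(y0, y1) for x in range(x0, x1)])
    return blocks  # row-major order: blocks[bi * n + bj]
```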

[Step S504]
Color moments are calculated from the color information of the pixels belonging to each block. For each component L, a, and b of the Lab color space, the first-, second-, and third-order moments over all pixels in the block are calculated. Each block yields 9-dimensional color moments, so in this example the color feature of each frame image is represented by an M × N × 9-dimensional feature vector.
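
The text names first-, second-, and third-order moments without giving formulas. The sketch below assumes the mean and the second and third central moments; other conventions (for example standard deviation and the cube root of the third moment) are equally plausible. The function names are hypothetical.

```python
from typing import List, Sequence

def color_moments(values: Sequence[float]) -> List[float]:
    """Mean, second central moment, and third central moment of one channel."""
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, second, third]

def block_color_feature(lab_pixels: Sequence[Sequence[float]]) -> List[float]:
    """9-dimensional color feature of one block: three moments for each of L, a, b."""
    feature: List[float] = []
    for channel in range(3):
        feature.extend(color_moments([p[channel] for p in lab_pixels]))
    return feature
```

Concatenating `block_color_feature` over all M × N blocks gives the M × N × 9-dimensional color vector described in the step.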

[Step S505]
A density gradient histogram, which is a texture feature, is calculated using only the L component of the Lab color space.

First, the edge direction and edge strength are obtained for each pixel of the image representing the L component. Let L(x, y) be the value of the L component at pixel (x, y). The edge strength and edge direction are given by the following expressions.

Edge strength: sqrt(ΔX * ΔX + ΔY * ΔY)
Edge direction: arctan(ΔY / ΔX)
where
ΔX = L(x + 1, y) - L(x, y)
ΔY = L(x, y + 1) - L(x, y)
Next, for the pixels in each block, the frequency distribution of edge directions weighted by edge strength is calculated. If the edge direction range of 0 to 180 degrees is divided into nine bins of 20 degrees each and the frequencies are accumulated, the density gradient histogram of each block has 9 dimensions. In this case, the texture feature of each frame image is represented by an M × N × 9-dimensional feature vector.
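
The histogram for one block can be sketched as follows. Several details are assumptions not stated in the source: `atan2` is used in place of arctan(ΔY / ΔX) to avoid division by zero, directions are folded into the range 0 to 180 degrees, pixels with zero gradient are skipped, and the last row and column of the block are skipped because their forward differences would fall outside it. The name `gradient_histogram` is hypothetical.

```python
import math
from typing import List, Sequence

def gradient_histogram(l_block: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """Edge-strength-weighted histogram of edge directions for one block of L values."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    height, width = len(l_block), len(l_block[0])
    for y in range(height - 1):
        for x in range(width - 1):
            dx = l_block[y][x + 1] - l_block[y][x]   # ΔX
            dy = l_block[y + 1][x] - l_block[y][x]   # ΔY
            strength = math.sqrt(dx * dx + dy * dy)
            if strength == 0.0:
                continue  # flat pixel: no direction to vote for
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            index = min(int(angle // bin_width), bins - 1)
            hist[index] += strength
    return hist
```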

Next, the dictionary data collection processing unit 14 collects dictionary data in steps S506 and S507.

[Step S506]
The distance r(i) between each frame image Fi (i = 1, 2, ..., I) of the dictionary video Vd and the reference image Fd is calculated.

Let r_col be the Euclidean distance between the M × N × 9-dimensional color feature vectors obtained from the reference image Fd and a frame image Fi, and let r_tex be the Euclidean distance between their M × N × 9-dimensional texture feature vectors. The distance r(i) is their weighted sum:

r(i) = w1 · r_col + w2 · r_tex
Here, w1 and w2 are constants set in advance.
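
The weighted distance translates directly into code. The function name `frame_distance` and the default weights of 0.5 are placeholders; the source only says that w1 and w2 are preset constants.

```python
import math
from typing import Sequence

def frame_distance(
    col_ref: Sequence[float], col_frame: Sequence[float],
    tex_ref: Sequence[float], tex_frame: Sequence[float],
    w1: float = 0.5, w2: float = 0.5,   # placeholder weights, not from the source
) -> float:
    """r(i) = w1 * r_col + w2 * r_tex, with Euclidean distances for r_col and r_tex."""
    r_col = math.dist(col_ref, col_frame)
    r_tex = math.dist(tex_ref, tex_frame)
    return w1 * r_col + w2 * r_tex
```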

[Step S507]
Using the distance r(i) obtained in step S506 between each frame image of the dictionary video and the reference image, frame images similar to the reference image are collected. With a preset threshold TH_r, the frame images Fi that satisfy
r(i) < TH_r
are collected. In the case of FIG. 4, there are three sections in which frame images representing the definition label appear and one section that does not represent the definition label but resembles the reference image, so the frame images in a total of four sections are collected as dictionary data.

Next, the positive case / negative case selection processing unit 15 selects positive and negative cases in steps S508, S509, and S510.

[Step S508]
The collected images are sorted in ascending order of their distance r(i) from the reference image.

[Step S509]
A screen for checking positive case images is presented. FIG. 7 shows an example of a GUI (Graphical User Interface) for this purpose. For example, as shown in FIG. 7, the images are displayed in a list and the user clicks with the mouse on the images to be regarded as positive case images.

[Step S510]
The images checked as positive case images in a GUI such as that shown in FIG. 7 are taken as positive case images. From the images that were not checked, the same number of images as there are positive case images is selected as negative case images, in ascending order of the distance r(i) from the reference image.
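Steps S508 to S510 can be sketched as follows. The function name `select_cases` and the representation of the GUI result as a set of checked frame identifiers are assumptions made for this sketch.

```python
def select_cases(collected, checked_ids):
    """Select positive and negative case images from the collected frames.

    collected:   list of (frame_id, r) pairs, r being the distance r(i)
                 from the reference image.
    checked_ids: set of frame ids the user marked as positive in the GUI
                 (hypothetical representation of the S509 result).
    """
    # S508: sort in ascending order of distance from the reference image.
    ranked = sorted(collected, key=lambda item: item[1])
    # S510: checked images become positive cases.
    positives = [fid for fid, _ in ranked if fid in checked_ids]
    # S510: the closest unchecked images, as many as there are positives,
    # become negative cases.
    unchecked = [fid for fid, _ in ranked if fid not in checked_ids]
    negatives = unchecked[:len(positives)]
    return positives, negatives
```

Choosing the unchecked images closest to the reference image means the negative cases are the ones most easily confused with the positive cases, which are the ones the two-class identification function later needs to reject.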

[Step S511]
The feature quantity extraction unit 16 calculates feature vectors from the positive case images and the negative case images. For example, image feature quantities consisting of an M × N × 9-dimensional color feature vector and an M × N × 9-dimensional texture feature vector are extracted in the same manner as in steps S502, S503, S504, and S505. Effective options include using only these image feature quantities as the feature vector, or adding to the image feature quantities the appearance time (media time) that each selected positive case image or negative case image had in the dictionary video Vd to which it belonged.
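The two options in step S511 can be sketched as a single function with an optional media time. The name `build_feature_vector` and the `time_scale` parameter are hypothetical; the specification does not say how the media time is scaled relative to the image feature quantities.

```python
def build_feature_vector(color_vec, texture_vec, media_time=None, time_scale=1.0):
    """Concatenate color and texture features, optionally adding media time.

    color_vec, texture_vec: image feature vectors (M*N*9 values each).
    media_time: appearance time of the frame in the dictionary video Vd,
                or None to use image feature quantities only.
    time_scale: assumed scaling factor for the time component.
    """
    vec = list(color_vec) + list(texture_vec)
    if media_time is not None:
        vec.append(media_time * time_scale)
    return vec
```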

[Step S512]
The one-class identification function calculation unit 17 calculates a one-class identification function for identifying positive case images. As the feature quantity used in calculating the one-class identification function, the feature quantity that served as the image similarity criterion when the dictionary data collection processing unit 14 collected the dictionary data can be used. For example, the one-class identification function may be a function that identifies whether a processing target image represents the definition label based on the probability distribution of the positive case images in the feature space, or on the distance from the average feature vector of the positive case images.

[Step S513]
The two-class identification function calculation unit 18 calculates, from the positive case images and the negative case images, a two-class identification function that discriminates between the two.
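The specification does not name a particular classifier for step S513. Purely as an illustrative stand-in, the sketch below uses a nearest-class-mean rule: an image is judged positive if its feature vector lies closer to the mean of the positive cases than to the mean of the negative cases. The names `mean_vector` and `train_two_class` are hypothetical, and any other two-class learner could be substituted.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def train_two_class(pos_vectors, neg_vectors):
    """Return a function deciding positive (True) vs. negative (False).

    Nearest-class-mean rule, chosen here only as an assumed example of a
    two-class identification function.
    """
    pos_mean = mean_vector(pos_vectors)
    neg_mean = mean_vector(neg_vectors)

    def is_positive(x):
        return math.dist(x, pos_mean) < math.dist(x, neg_mean)

    return is_positive
```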

[Step S514]
The one-class identification function and the two-class identification function for the definition label are stored in the definition label dictionary storage unit 19.

With the above processing, a dictionary for the definition label can be generated.

Next, the processing procedure of the indexing processing unit, which determines whether to assign the definition label to each frame of the indexing target video, will be described with reference to FIG. 8.

[Step S801]
The indexing video acquisition unit 20 reads the indexing target video, and the frame image acquisition unit 21 captures frame images from it. The following description assumes that P frame images have been captured.

[Step S802]
To process the P frame images in order, p = 1 is set as the initial value.

[Step S803]
The feature quantity extraction unit 16 calculates a feature quantity from the p-th frame image (p = 1, 2, ..., P).

[Step S804]
The candidate detection unit 22 using the one-class identification function determines, using the one-class identification function stored in the definition label dictionary storage unit 19, whether the p-th frame image is a candidate for being given the definition label. If it is a candidate, the process proceeds to step S805; otherwise, it proceeds to step S807.

As the determination method, for example, the average vector of the positive case images is calculated from the feature vectors obtained in step S511, and if the Euclidean distance between an unknown image and the average vector is at or below a preset threshold TH_pos1, the unknown image is determined to be given the definition label. Alternatively, the distribution of the positive case images over the feature vectors obtained in step S511 is estimated with a plurality of Gaussian distributions, and if the likelihood of the unknown image under the Gaussian mixture distribution is at or below a preset threshold TH_pos2, the unknown image is determined to be given the definition label.
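The first determination method, based on the distance from the average vector of the positive case images, can be sketched as follows. The name `train_one_class` is hypothetical, and the Gaussian mixture variant is not shown.

```python
import math

def train_one_class(pos_vectors, th_pos1):
    """Return a function testing whether an unknown image is a candidate.

    pos_vectors: feature vectors of the positive case images (step S511).
    th_pos1:     preset threshold TH_pos1 on the Euclidean distance.
    """
    n = len(pos_vectors)
    # Average feature vector of the positive case images.
    mean = [sum(col) / n for col in zip(*pos_vectors)]

    def is_candidate(x):
        # Candidate if the distance to the average vector is at or below TH_pos1.
        return math.dist(x, mean) <= th_pos1

    return is_candidate
```

Because only positive case images are needed to build this function, it can be computed without any negative examples.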

[Step S805]
The indexing unit 23 using the two-class identification function determines whether the p-th frame image represents the definition label. If it is determined to represent the definition label, the process proceeds to step S806; otherwise, it proceeds to step S807.

[Step S806]
The media time of the p-th frame is stored as an indexing result for the definition label. That is, the media time of each frame image to which the definition label is assigned is stored.

[Step S807]
It is determined whether processing has been completed for all P frame images. If not, the process proceeds to step S808; if so, it proceeds to step S809.

[Step S808]
p is set to p + 1, and the processing from step S803 onward is repeated.

[Step S809]
The media times of the frame images, among the P frame images, to which the definition label is assigned are gathered together and output as the indexing result.
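The loop of steps S802 to S809 can be sketched as a two-stage cascade. The function `index_video` and its callable parameters `extract`, `is_candidate`, and `is_positive` are hypothetical stand-ins for the feature quantity extraction unit 16, the one-class identification function, and the two-class identification function, respectively.

```python
def index_video(frames, extract, is_candidate, is_positive):
    """Return the media times of frames to which the definition label is assigned.

    frames:       iterable of (media_time, frame_image) pairs (step S801).
    extract:      callable mapping a frame image to a feature vector.
    is_candidate: one-class identification function (step S804).
    is_positive:  two-class identification function (step S805).
    """
    labeled_times = []
    for media_time, frame in frames:        # S802, S807, S808: iterate p = 1..P
        x = extract(frame)                  # S803: feature quantity
        if not is_candidate(x):             # S804: not a candidate, skip
            continue
        if not is_positive(x):              # S805: matches a negative case, skip
            continue
        labeled_times.append(media_time)    # S806: store the media time
    return labeled_times                    # S809: output the indexing result
```

In this arrangement the two-class function is evaluated only on frames that already passed the one-class test.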

With the above processing, the definition label can be assigned to the indexing target video.

Next, a second embodiment of the present invention will be described. FIG. 9 is a diagram for explaining the configuration of the indexing unit of the video indexing device in the second embodiment of the present invention.

In the second embodiment, the indexing processing unit of FIG. 9 comprises a feature quantity extraction unit 16, a definition label dictionary storage unit 19, an indexing video acquisition unit 20, a frame image acquisition unit 21, a candidate detection unit 22 using the one-class identification function, an indexing unit 23 using the two-class identification function, a frame image indexing result management unit 30, an event rule storage unit 31, and an event label assigning unit 32.

The feature quantity extraction unit 16, the definition label dictionary storage unit 19, the indexing video acquisition unit 20, the frame image acquisition unit 21, the candidate detection unit 22 using the one-class identification function, and the indexing unit 23 using the two-class identification function perform the same processing as in the first embodiment described above.

The frame image indexing result management unit 30 manages the media times, output from the indexing unit 23 using the two-class identification function, at which definition labels were assigned to the indexing target video.

The event rule storage unit 31 holds, as event label rules, the order relationships and time intervals in which a plurality of definition labels appear. FIG. 10 shows examples of event rules. In that figure, event rule 1 is the rule for assigning event label 1: a section given definition label A is followed by a section given definition label C, then by a section given definition label B, and finally by definition label A. Event rule 2 is the rule for assigning event label 2: a section given definition label A is followed within 10 seconds by another section of definition label A, which is followed within a further 10 seconds by yet another section of definition label A.

When the event label assigning unit 32 receives the media times at which definition labels were assigned from the frame image indexing result management unit 30, it issues an acquisition request signal to the event rule storage unit 31 and reads the event rules.

It then calculates the similarity between the definition labels assigned to the indexing target video and each event rule that was read, and assigns the corresponding event label if the similarity is at or above a preset threshold.

Next, the processing procedure in the above configuration will be described. The description takes as an example the case in which the definition label dictionary storage unit 19 manages dictionaries for definition label A, definition label B, and definition label C; the frame image indexing result management unit 30 holds definition labels A, B, and C assigned to the indexing target video according to the procedure of the first embodiment of the present invention; and event label 1 and event label 2 of FIG. 10 are set in the event rule storage unit 31.

The event label assigning unit 32 calculates the similarity between the definition labels assigned to the indexing target video, which are managed by the frame image indexing result management unit 30, and the event rules stored in the event rule storage unit 31.

One example of a similarity calculation method is to take the ratio of the number of sections whose order of appearance matches to the number of sections defined in the event rule (4 for event rule 1 and 5 for event rule 2 in FIG. 10). For example, as shown in FIG. 11, if the definition labels assigned to the indexing target video form the sections definition label A, definition label C, definition label B, and definition label C in order of video media time, the similarity to event rule 1 is 3/4 = 0.75. If the setting is to assign the event label when the similarity is 0.75 or more, event label 1 is assigned to this section.
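The ratio in the FIG. 11 example can be sketched as a position-by-position comparison of label sequences. The names `rule_similarity` and `should_assign` are hypothetical, and reading "order of appearance matches" as a position-wise comparison is an interpretation that reproduces the 3/4 value of the example. Time-interval conditions such as those of event rule 2 are not modeled.

```python
def rule_similarity(rule, observed):
    """Fraction of the rule's sections matched, position by position."""
    if not rule:
        raise ValueError("event rule must define at least one section")
    matches = sum(1 for r, o in zip(rule, observed) if r == o)
    return matches / len(rule)

def should_assign(rule, observed, threshold):
    """Assign the event label when the similarity is at or above the threshold."""
    return rule_similarity(rule, observed) >= threshold

# Event rule 1 (A, C, B, A) against the observed sections A, C, B, C.
similarity = rule_similarity(["A", "C", "B", "A"], ["A", "C", "B", "C"])
```

Here `similarity` is 0.75, so with a threshold of 0.75 the event label would be assigned.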

By performing this processing for event rule 1 and event rule 2, an index can be assigned to the sections in which a preset event appears.

Specific examples of event label assignment will now be described.

[Example 1] For a soccer video, for example, the following scenes are set as definition labels.
・Ground scene: a scene showing the entire pitch
・Goalpost scene: a scene in which a goalpost appears large
・Face scene: a scene in which a person's face appears large
A goal event is defined as a ground scene followed by a goalpost scene, with a face scene appearing within T seconds of it. With such an event rule set, the goal event label can be assigned automatically to the video sections of a soccer video that are likely to show a goal.
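The goal rule of Example 1 can be sketched as a scan over labeled sections. The section representation `(label, start, end)`, the label strings, and the choice to measure "within T seconds" from the end of the goalpost section to the start of the face section are all assumptions made for this sketch.

```python
def find_goal_events(sections, t_max):
    """Return (start, end) spans matching ground -> goalpost -> face.

    sections: list of (label, start, end) tuples sorted by start time.
    t_max:    T seconds allowed between the goalpost and face sections.
    """
    events = []
    # Examine every run of three consecutive sections.
    for a, b, c in zip(sections, sections[1:], sections[2:]):
        labels = (a[0], b[0], c[0])
        if labels == ("ground", "goal_post", "face") and c[1] - b[2] <= t_max:
            events.append((a[1], c[2]))
    return events
```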

[Example 2] For a news video, for example, the following scenes are set as definition labels.
・Announcer scene: a scene showing the announcer
・Telop scene: a scene in which a telop (on-screen caption) is displayed
A topic transition event is defined as a telop scene appearing immediately after an announcer scene, with no announcer scene appearing for T seconds or more after that. With such an event rule set, topic transition events can be detected in a news video and the corresponding event label assigned automatically.
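The topic transition rule of Example 2 combines an order condition with a minimum gap. In the sketch below, the section representation `(label, start, end)`, the label strings, and the choice to measure the T-second gap from the end of the telop section are assumptions made for illustration.

```python
def find_topic_transitions(sections, t_min):
    """Return start times of telop sections that mark a topic transition.

    sections: list of (label, start, end) tuples sorted by start time.
    t_min:    T seconds during which no announcer scene may appear.
    """
    events = []
    for k in range(len(sections) - 1):
        a, b = sections[k], sections[k + 1]
        # Order condition: an announcer scene immediately followed by a telop scene.
        if a[0] != "announcer" or b[0] != "telop":
            continue
        # Gap condition: every later announcer scene starts at least t_min
        # seconds after the telop section ends (vacuously true if none follow).
        later = [s[1] for s in sections[k + 2:] if s[0] == "announcer"]
        if all(start - b[2] >= t_min for start in later):
            events.append(b[1])
    return events
```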

The video indexing processing described above can be implemented by a computer and a software program, and the program can be provided either recorded on a computer-readable recording medium or through a network.

FIG. 1 is a diagram of the principle configuration of the present invention.
FIG. 2 is a configuration diagram of the dictionary generation unit of the video indexing device in the first embodiment of the present invention.
FIG. 3 is a configuration diagram of the indexing unit of the video indexing device in the first embodiment of the present invention.
FIG. 4 is a diagram showing an example of a dictionary video.
FIG. 5 is a flowchart of the dictionary generation processing in the first embodiment of the present invention.
FIG. 6 is a diagram showing an example of block division of a frame image.
FIG. 7 is a diagram showing an example of a GUI for checking positive case images.
FIG. 8 is a flowchart of the indexing processing in the first embodiment of the present invention.
FIG. 9 is a configuration diagram of the indexing unit of the video indexing device in the second embodiment of the present invention.
FIG. 10 is a diagram showing examples of event rules.
FIG. 11 is a diagram showing an example of definition label assignment.

Explanation of symbols

1 Reference image selection means
2 Dictionary data collection means
3 Positive case / negative case selection means
4 One-class identification function calculation means
5 Two-class identification function calculation means
6 Dictionary storage means
7 Labeling candidate detection means
8 Indexing means

Claims

A video indexing device in which a definition label representing image content is defined in advance and which, when a scene corresponding to the definition label appears in a video to be indexed, assigns the definition label to that scene, the device comprising:
dictionary storage means for storing, as dictionary data, the definition label; a one-class identification function that is calculated from feature quantities extracted from positive case images, which are correct images representing the definition label given as learning images, and that determines whether a processing target image is an image representing the definition label; and a two-class identification function that is calculated from feature quantities extracted from the positive case images and from negative case images, which are incorrect images, given as learning images, and that discriminates between positive case images and negative case images;
labeling candidate detection means for, when a video to be indexed is given, determining, for frame images selected from the video and using the one-class identification function stored in the dictionary storage means, whether each frame image represents the definition label stored in the dictionary storage means, thereby obtaining frame images that are candidates for being given the definition label; and
indexing means for determining, for each frame image made a candidate by the labeling candidate detection means and using the two-class identification function stored in the dictionary storage means, whether the frame image corresponds to a positive case image or a negative case image, thereby excluding images corresponding to negative case images from the candidate frame images, determining the images corresponding to positive case images to which the definition label is to be assigned, and assigning the definition label.

The video indexing device according to claim 1,
wherein the one-class identification function and the two-class identification function are functions that perform identification using, as feature quantities, the appearance time of a frame image in the video and its image feature quantity.

The video indexing device according to claim 1 or 2, further comprising:
event rule storage means for storing an event label representing an event in a video, together with an event rule, for detecting the event in the video, defined by the order relationship or time intervals in which a plurality of definition labels appear; and
event label assigning means for collating the order relationship or time intervals of the definition labels assigned by the indexing means with the event rule stored in the event rule storage means, and assigning the event label to a video portion whose similarity is greater than a predetermined threshold.

The video indexing device according to claim 1, claim 2 or claim 3, further comprising:
reference image selection means for selecting, when a dictionary video is given, a reference image representing the definition label from the dictionary video;
dictionary data collection means for detecting images similar to the reference image from the dictionary video;
positive case / negative case selection means for selecting, from among the collected similar images in order of similarity to the reference image, correct images representing the definition label as positive case images and incorrect images as negative case images;
one-class identification function calculation means for calculating, based on feature quantities extracted from the positive case images, a one-class identification function for determining whether an image represents the definition label; and
two-class identification function calculation means for calculating, based on feature quantities extracted from the positive case images and the negative case images, a two-class identification function for discriminating between the two,
wherein the one-class identification function calculated by the one-class identification function calculation means, the two-class identification function calculated by the two-class identification function calculation means, and the definition label are stored as dictionary data in the dictionary storage means.

The video indexing device according to claim 4,
wherein the one-class identification function calculation means calculates the one-class identification function based on the similarity criterion used in the dictionary data collection means.

A video indexing method executed by a video indexing device in which a definition label representing image content is defined in advance and which, when a scene corresponding to the definition label appears in a video to be indexed, assigns the definition label to that scene, the method comprising:
referring to dictionary storage means that stores, as dictionary data, the definition label; a one-class identification function that is calculated from feature quantities extracted from positive case images, which are correct images representing the definition label given as learning images, and that determines whether a processing target image is an image representing the definition label; and a two-class identification function that is calculated from feature quantities extracted from the positive case images and from negative case images, which are incorrect images, given as learning images, and that discriminates between positive case images and negative case images;
executing a labeling candidate detection process of, when a video to be indexed is given, determining, for frame images selected from the video and using the one-class identification function stored in the dictionary storage means, whether each frame image represents the definition label stored in the dictionary storage means, thereby obtaining frame images that are candidates for being given the definition label; and
executing an indexing process of determining, for each frame image made a candidate by the labeling candidate detection process and using the two-class identification function stored in the dictionary storage means, whether the frame image corresponds to a positive case image or a negative case image, thereby excluding images corresponding to negative case images from the candidate frame images, determining the images corresponding to positive case images to which the definition label is to be assigned, and assigning the definition label.

The video indexing method according to claim 6,
wherein the one-class identification function and the two-class identification function are functions that perform identification using, as feature quantities, the appearance time of a frame image in the video and its image feature quantity.

The video indexing method according to claim 6 or 7, further comprising:
referring to event rule storage means that stores an event label representing an event in a video, together with an event rule, for detecting the event in the video, defined by the order relationship or time intervals in which a plurality of definition labels appear; and
executing an event label assigning process of collating the order relationship or time intervals of the definition labels assigned in the indexing process with the event rule stored in the event rule storage means, and assigning the event label to a video portion whose similarity is greater than a predetermined threshold.

The video indexing method according to claim 6, claim 7 or claim 8, further comprising executing:
a reference image selection process of selecting, when a dictionary video is given, a reference image representing the definition label from the dictionary video;
a dictionary data collection process of detecting images similar to the reference image from the dictionary video;
a positive case / negative case selection process of selecting, from among the collected similar images in order of similarity to the reference image, correct images as positive case images and incorrect images as negative case images;
a one-class identification function calculation process of calculating, based on feature quantities extracted from the positive case images, a one-class identification function for determining whether an image represents the definition label; and
a two-class identification function calculation process of calculating, based on feature quantities extracted from the positive case images and the negative case images, a two-class identification function for discriminating between the two,
wherein the one-class identification function calculated by the one-class identification function calculation process, the two-class identification function calculated by the two-class identification function calculation process, and the definition label are stored as dictionary data in the dictionary storage means.

The video indexing method according to claim 9,
wherein, in the one-class identification function calculation process, the one-class identification function is calculated based on the similarity criterion used in the dictionary data collection process.

A video indexing program for causing a computer to execute the video indexing method according to any one of claims 6 to 10.

A computer-readable recording medium on which the video indexing program according to claim 11 is recorded.