JP2009048334A

JP2009048334A - Video identification processing device, image identification processing device, and computer program

Info

Publication number: JP2009048334A
Application number: JP2007212412A
Authority: JP
Inventors: Kikuka Miura; 菊佳三浦; Ichiro Yamada; 一郎山田; Hideki Sumiyoshi; 英樹住吉; Nobuyuki Yagi; 伸行八木
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2007-08-16
Filing date: 2007-08-16
Publication date: 2009-03-05

Abstract

Problem: To obtain, as search results, video data or image data showing a desired subject even when the text data associated with that video or image data contains no word denoting the subject itself.
Solution: A video identification processing device comprises: a content database that stores video data and the text data associated with it, classified by class; a probability parameter value database that stores probability parameter values, each representing the relationship between a word and a class, in association with that word and that class; and a probability calculation processing unit that calculates such probability parameter values from the occurrence frequencies of the words contained in the text data read from the content database, and writes each calculated value to the probability parameter value database in association with the corresponding word and class.
Selected drawing: Figure 1

Description

The present invention relates to a video identification processing device, an image identification processing device, and a computer program.

While text-based content search has advanced and come into wide practical use, there is demand for techniques that efficiently search non-text content such as video and images.
Patent Document 1 describes a keyword-based video search method in which a person assigns keyword text to videos in advance, and videos whose keywords match the search term are output.
Patent Document 2 describes a video indexing device that indexes video using words of specific parts of speech extracted, on the basis of syntax, from the closed captions contained in the video information.
Patent Document 3 describes a video subject extraction device that extracts video containing a desired subject on the basis of text data, including scripts or closed captions, that is used together with the video.
Patent Document 1: JP 2002-007415 A
Patent Document 2: JP 2007-006116 A
Patent Document 3: JP 2007-188169 A

With the technique disclosed in Patent Document 1, described in the background art above, keywords must be assigned by hand in order to index the video, which takes an enormous amount of time and labor.
The techniques disclosed in Patent Documents 2 and 3 require no manual keyword assignment, but when searching for video showing a desired subject, the video cannot be obtained as a search result unless the text contains a word denoting that subject itself. For example, to retrieve video whose subject is a lion, the closed captions or other text associated with the video must contain the word "lion" itself, as in "This is the king of beasts, the lion."

In practice, however, text such as closed captions does not always contain a word that directly denotes the subject. For example, the closed captions for a video scene showing a baseball batter's box may contain words such as "batter", "curve", and "strike" but not the term "batter's box" itself. Likewise, the closed captions for a video scene showing a gas range used for cooking may contain words and phrases such as "frying pan", "high heat", and "one tablespoon" but not the term "gas range" itself.

If such video could be included in the search results even when, as in the examples above, no word directly denoting the subject appears in the closed captions or other text, the search accuracy for video and images would improve.

The present invention has been made in view of these circumstances and problems. Its object is to provide a video identification processing device, an image identification processing device, and a computer program for obtaining, as search results, video data or image data showing a desired subject even when the text data associated with that video or image data contains no word denoting the subject itself.

To solve the above problems, the present invention provides a means of predicting video content to some extent from the surrounding-word information obtained from the text that accompanies video data or image data, even when no word denoting the subject itself appears in the document.
The present invention groups similar video data and image data into classes according to image features, video features, and the like, and calculates probability values from the lexical features of the text in each class (the lexical features of a class are represented as a vector of word occurrence frequencies), using these values as an index of the correlation between language and image. That is, the class occurrence probability given a lexical feature, or the class occurrence probability given a word, serves as the index of this correlation. For example, from the words "frying pan", "high heat", and "one tablespoon", the class of video showing cooking in front of a gas range in a cooking program is predicted (that is, it has a high occurrence probability), and from the words "batter", "strike", and "curve", the class of video showing the batter's box in a baseball program is predicted. Neither "gas range" nor "batter's box" appears in the text, but the video is inferred using the surrounding words (in this example, "frying pan", "high heat", and "one tablespoon", or "batter", "strike", and "curve") as clues. This relationship is modeled from pairs of video and text, providing a system that predicts video when text alone is input.

[1] A video identification processing device according to one aspect of the present invention comprises: a content database that stores video data and text data associated with the video data, classified by class; a probability parameter value database that stores probability parameter values, each representing the relationship between a word and a class, in association with that word and that class; and a probability calculation processing unit that calculates the probability parameter value representing the relationship between a word and a class from the occurrence frequencies of the words contained in the text data read from the content database, and writes the calculated value to the probability parameter value database in association with that word and that class.
In this configuration, the video data and the text data are associated with each other from the outset, and in the content database each pair of video data and text data belongs to some class. The probability calculation processing unit calculates the probability parameter values representing the relationship between words and classes from the occurrence frequencies of the words contained in the text data read from the classified content database. As one of the best modes, the probability parameter value is, for example, the class occurrence probability for a given word. The probability parameter value database thus accumulates data representing the relationship among words, classes, and the calculated probability parameter values.

[2] The video identification processing device according to one aspect of the present invention further comprises an identification processing unit that reads input text data and, from the occurrence frequencies of the words contained in that text data and the probability parameter values read from the probability parameter value database, calculates for each class a class occurrence probability, which is the probability that the text data belongs to that class.
In this configuration, the identification processing unit can calculate, by a predetermined computation, the class occurrence probabilities for the input text data from the probability parameter values read from the probability parameter value database and the word occurrence frequencies of the input text data. In other words, it can estimate the class to which the input text belongs. The class with the highest calculated class occurrence probability can be taken as the maximum-likelihood class.
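The "predetermined computation" is not spelled out at this point in the text. As a rough illustration only, the sketch below assumes a naive-Bayes-style score with a uniform class prior: per-class word probabilities are multiplied (in log space) according to the word counts of the input text, and the scores are normalized over the classes. The function name, the `floor` value used for unseen words, and the sample probabilities are all assumptions made for this example, not values from the patent.

```python
import math
from collections import Counter

def class_occurrence_probabilities(words, model, classes, floor=1e-6):
    """Return {class: probability} for a list of input words.

    model maps (class, word) -> P(word | class). Words missing from the
    model get the small probability `floor` (an assumption of this sketch).
    """
    counts = Counter(words)
    log_scores = {
        c: sum(n * math.log(model.get((c, w), floor)) for w, n in counts.items())
        for c in classes
    }
    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    unnormalized = {c: math.exp(s - peak) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}

# Made-up probability parameter values for two classes.
model = {
    ("baseball", "batter"): 0.4,
    ("baseball", "strike"): 0.4,
    ("baseball", "curve"): 0.2,
    ("cooking", "frying pan"): 0.5,
    ("cooking", "high heat"): 0.5,
}
probs = class_occurrence_probabilities(["batter", "curve"], model, ["baseball", "cooking"])
best = max(probs, key=probs.get)  # maximum-likelihood class
```

Here the input contains neither class name, yet the "baseball" class receives almost all of the probability mass because "batter" and "curve" are frequent in its text.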

[3] The video identification processing device according to one aspect of the present invention further comprises a video classification processing unit that determines the class to which input video data belongs on the basis of the features of its video, and writes that video data and the text data associated with it to the content database.
In this configuration, the video classification processing unit classifies pairs of video data and text data into classes on the basis of the features of the video, independently of the text data and of the meaning the video data conveys, and writes them to the content database.

[4] A video identification processing device according to one aspect of the present invention comprises: a probability parameter value database that stores probability parameter values, each representing the relationship between a word and a class and calculated from the occurrence frequencies of the words contained in text data, in association with that word and that class; and an identification processing unit that reads input text data and, from the occurrence frequencies of the words contained in that text data and the probability parameter values read from the probability parameter value database, calculates for each class a class occurrence probability, which is the probability that the text data belongs to that class.

[5] The video identification processing device according to one aspect of the present invention further comprises: a video database that stores video data classified by class; and a search processing unit that, on the basis of an input search term, obtains a search result class matching the search term and, by referring to the video database, outputs at least one of the video data belonging to the search result class and reference information to that video data. The input text data is associated with video data, and the identification processing unit further determines the maximum-likelihood class of the input text data from the calculated class occurrence probabilities and writes the video data to the video database so that it belongs to the determined maximum-likelihood class.
In this configuration, the class matching the search term can be obtained. Video data whose maximum-likelihood class is that class (or reference information to that video data) can then be read from the video database and output as the video search result.

[6] In the video identification processing device according to one aspect of the present invention, the words are those that remain after particles, auxiliary verbs, and symbols are excluded from the result of morphological analysis of the text data.

[7] In the video identification processing device according to one aspect of the present invention, the words are the nouns, verbs, adjectives, and unknown words extracted from the result of morphological analysis of the text data.
Here, as described later, an unknown word is a word that the morphological analysis treats as having an unknown part of speech; a word treated in this way is in practice likely to be a noun.
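The two word-selection rules in [6] and [7] can be sketched as simple filters over the output of a morphological analyzer. The sketch below does not call any real analyzer; it assumes the tokens are already available as (surface form, part of speech) pairs, and the English tag names are hypothetical labels chosen for this example.

```python
# Hypothetical part-of-speech labels; a real analyzer has its own tag set.
EXCLUDED_POS = {"particle", "auxiliary_verb", "symbol"}   # rule of [6]
INCLUDED_POS = {"noun", "verb", "adjective", "unknown"}   # rule of [7]

def filter_by_exclusion(tokens):
    """Keep every token except particles, auxiliary verbs, and symbols."""
    return [surface for surface, pos in tokens if pos not in EXCLUDED_POS]

def filter_by_inclusion(tokens):
    """Keep only nouns, verbs, adjectives, and unknown words."""
    return [surface for surface, pos in tokens if pos in INCLUDED_POS]

# Pre-tagged sample tokens (assumed analyzer output).
tokens = [
    ("ライオン", "noun"),
    ("が", "particle"),
    ("ゆっくり", "adverb"),
    ("走る", "verb"),
    ("。", "symbol"),
    ("ガオー", "unknown"),
]
kept_by_exclusion = filter_by_exclusion(tokens)
kept_by_inclusion = filter_by_inclusion(tokens)
```

The two rules differ for parts of speech that appear in neither list: the adverb survives the exclusion rule of [6] but is dropped by the inclusion rule of [7].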

[8] In the video identification processing device according to one aspect of the present invention, the video data is video data of a broadcast program, and the text data is any one of: closed caption text data of the broadcast program; script text data of the broadcast program associated with time information; and text data associated with time information that is obtained by speech recognition processing based on the video data of the broadcast program.

[9] An image identification processing device according to one aspect of the present invention comprises: a content database that stores web content data containing image data and text data, classified by class; a probability parameter value database that stores probability parameter values, each representing the relationship between a word and a class, in association with that word and that class; and a probability calculation processing unit that calculates the probability parameter value representing the relationship between a word and a class from the occurrence frequencies of the words contained in the text data read from the content database, and writes the calculated value to the probability parameter value database in association with that word and that class.
In this configuration, web content data classified according to the image features of the image data it contains is held in the content database by class. Then, on the basis of the lexical features of the text data contained in the web content data of each class, the probability calculation processing unit can write the probability parameter values representing the relationship between words and classes (as one of the best modes, for example, the class occurrence probability for a given word) to the probability parameter value database and accumulate them there.

[10] A computer program according to one aspect of the present invention causes a computer that comprises a content database storing video data and text data associated with the video data, classified by class, and a probability parameter value database storing probability parameter values, each representing the relationship between a word and a class, in association with that word and that class, to execute a probability calculation processing step of calculating the probability parameter value representing the relationship between a word and a class from the occurrence frequencies of the words contained in the text data read from the content database, and writing the calculated value to the probability parameter value database in association with that word and that class.

[11] A computer program according to one aspect of the present invention causes a computer that comprises a probability parameter value database storing probability parameter values, each representing the relationship between a word and a class and calculated from the occurrence frequencies of the words contained in text data, in association with that word and that class, to execute an identification processing step of reading input text data and, from the occurrence frequencies of the words contained in that text data and the probability parameter values read from the probability parameter value database, calculating for each class a class occurrence probability, which is the probability that the text data belongs to that class.

[12] A computer program according to one aspect of the present invention causes a computer that comprises a content database storing web content data containing image data and text data, classified by class, and a probability parameter value database storing probability parameter values, each representing the relationship between a word and a class, in association with that word and that class, to execute a probability calculation processing step of calculating the probability parameter value representing the relationship between a word and a class from the occurrence frequencies of the words contained in the text data read from the content database, and writing the calculated value to the probability parameter value database in association with that word and that class.

According to the present invention, video can be searched for using a word that denotes a subject or the like shown in the video, even if that word is not contained in the text data associated with the video data. The invention also makes it possible to create the probability parameter value database (identification model database) on which such video search is based.
Further, according to the present invention, on the basis of web content containing image data (or video data) and text data, images and video can be searched for using a word that denotes a subject or the like shown in them, even if that word is not contained in the text data.
Because the text data used is, for example, closed caption data, script data, or speech recognition result data, there is no need to go to the trouble of creating new text data. The data to be searched can therefore be accumulated efficiently.

[First Embodiment]
A first embodiment of the present invention is described below with reference to the drawings.
<Identification model generation process>
FIG. 1 is a block diagram showing the functional configuration of the identification model generation device (video identification processing device) according to this embodiment. In the figure, reference numeral 100 denotes the identification model generation device (video identification processing device). The identification model generation device 100 comprises storage means that stores program video data 1 (video data), storage means that stores closed caption data 2 (text data), a video classification processing unit 3, a content database 4, a probability calculation processing unit 5, and an identification model database 6 (probability parameter value database).

The program video data 1 is data representing the video of a program such as a television broadcast. It contains a moving image ranging from several minutes to several hours in length, together with one or more audio channels recorded in synchronization with the moving image. The closed caption data 2 is text data corresponding to the program video data, in which one or more sentences are recorded in association with times. In other words, the text data of the closed caption data 2 is synchronized with time. The time used here may be standard time corresponding to the broadcast date and time (a local time such as Japan Standard Time, or Coordinated Universal Time), or a relative time from the start of the video. The closed caption data 2 was also originally created for use in television broadcasting, and is typically text data for displaying speakers' utterances in the program as characters on the television screen. After broadcast, the program video data 1 and the closed caption data 2 are stored as archive data of the broadcasting station for later use as reference material. These data are recorded on media such as magnetic tape, magnetic hard disks, and optical disks, and can be read as needed using the corresponding reading devices.

The video classification processing unit 3 determines the class to which the input program video data 1 belongs on the basis of the features of its video, and writes the program video data 1 and the closed caption data 2 associated with it to the content database 4.

The content database 4 stores the program video data 1 and the closed caption data 2 associated with it, classified by class. As shown in the figure, the content database 4 stores video data and text data in association with each other for each of the classes: class 1, class 2, class 3, and so on. Each class may contain multiple pairs of video data and text data. The content database 4 is implemented using storage means such as a magnetic disk device or an optical disk device. The data structure of the content database 4 is described later with reference to a separate drawing.

The probability calculation processing unit 5 calculates probability parameter values representing the relationship between words and classes from the occurrence frequencies of the words contained in the text data read from the content database 4. It then writes each calculated probability parameter value to the probability parameter value database in association with the corresponding word and class.
Here, a probability parameter value is, for example, the occurrence probability of a given word in a given class. The method of calculating this occurrence probability is described in detail later.
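The exact calculation is deferred to a later part of the description. As a minimal sketch of the idea only, the example below assumes the simplest possible estimate: the occurrence probability of a word in a class is its relative frequency among all words in that class's text. The function name and the sample data are invented for illustration.

```python
from collections import Counter

def estimate_word_probabilities(class_texts):
    """Map {class_id: [word lists]} to {(class_id, word): probability}.

    Assumption of this sketch: P(word | class) is the word's count divided
    by the total number of words seen in that class.
    """
    table = {}
    for class_id, documents in class_texts.items():
        counts = Counter(word for doc in documents for word in doc)
        total = sum(counts.values())
        for word, n in counts.items():
            table[(class_id, word)] = n / total
    return table

# Made-up content: already-segmented words for the text of each class.
content = {
    "class1": [["batter", "curve", "strike"], ["batter", "strike"]],
    "class2": [["frying pan", "high heat"]],
}
model = estimate_word_probabilities(content)
```

In this sample, "batter" accounts for two of the five words in class 1, so its estimated occurrence probability there is 0.4. A practical system would likely add smoothing for words that never appear in a class.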

The identification model database 6 is a database that stores the probability parameter values calculated as described above, each representing the relationship between a word and a class, in association with that word and that class. The identification model database 6 is implemented using storage means such as a magnetic disk device or an optical disk device. The data structure of the identification model database 6 is described later with reference to a separate drawing.

FIG. 2 is a schematic diagram showing the data structure of the content database 4. As shown in the figure, the content database 4 consists of data in table (two-dimensional table) form, and the table contains the data items class, video data, and text data. In each row of the table, the values of these data items associate a class, video data, and text data with one another. For example, the first row of the illustrated data example indicates that "video data 1" corresponds to "text data 1" and that the class to which "video data 1" belongs is "class 1". Here, "class 1", "class 2", "class 3", and so on are data that identify classes. "Video data 1", "video data 2", and so on are either the binary data itself, in a format such as MPEG-4 (Moving Picture Experts Group), or reference data pointing to a data file that stores such binary data. "Text data 1", "text data 2", and so on are either the closed caption text data corresponding to the video, or reference data pointing to a data file that stores such text data.
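The table layout of FIG. 2 can be pictured as a list of rows, each tying a class identifier to a video reference and a text reference. The sketch below is only an illustration: the `ContentRow` type, its field names, and the file names are hypothetical, and it stores references to files rather than the binary data itself.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ContentRow:
    class_id: str   # identifies the class, e.g. "class1"
    video_ref: str  # reference to a file holding the video data
    text_ref: str   # reference to a file holding the caption text

def group_by_class(rows):
    """Collect the (video, text) pairs belonging to each class."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.class_id].append((row.video_ref, row.text_ref))
    return dict(grouped)

# Hypothetical rows; one class may hold several video/text pairs.
rows = [
    ContentRow("class1", "video1.mp4", "text1.txt"),
    ContentRow("class1", "video2.mp4", "text2.txt"),
    ContentRow("class2", "video3.mp4", "text3.txt"),
]
by_class = group_by_class(rows)
```

Grouping by class in this way gives the probability calculation step direct access to all text belonging to one class.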

FIG. 3 is a schematic diagram showing the data structure of the identification model database 6 together with example data. As shown in the figure, the identification model database 6 consists of data in table form, and the table contains the data items class, word, and occurrence probability. In each row of the table, the values of these data items associate a class, a word, and an occurrence probability with one another. For example, in the first row of the illustrated data example, the probability that the word "curve" occurs in the class "class 1" is stored in the occurrence probability column (the numerical values themselves are not shown in the figure).
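One way to picture the table of FIG. 3 is as a relational table keyed by class and word. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table name, column names, and probability values are assumptions made for illustration (the figure does not show the numerical values).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE identification_model ("
    " class_id TEXT, word TEXT, probability REAL,"
    " PRIMARY KEY (class_id, word))"
)

# Made-up occurrence probabilities, one row per (class, word) pair.
sample_rows = [
    ("class1", "curve", 0.2),
    ("class1", "batter", 0.4),
    ("class2", "frying pan", 0.5),
]
conn.executemany("INSERT INTO identification_model VALUES (?, ?, ?)", sample_rows)

def lookup(conn, class_id, word):
    """Return the stored occurrence probability, or None if there is no row."""
    row = conn.execute(
        "SELECT probability FROM identification_model"
        " WHERE class_id = ? AND word = ?",
        (class_id, word),
    ).fetchone()
    return None if row is None else row[0]

curve_in_class1 = lookup(conn, "class1", "curve")
```

Making (class, word) the primary key reflects the fact that each row associates exactly one probability with one word in one class.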

次に、識別モデル生成装置１００の各部による処理の手順について、順に説明する。
まず、映像分類処理部３は、番組映像データ１を解析することにより、番組映像データ１およびそれに対応するクローズドオキャプションデータ２を時間方向に適当な長さに区切る。区切るタイミングは、例えば、番組映像のシーンの切り替え時や、音声レベルが所定値未満から所定値以上に切り替わる時（またはその逆の時）とする。映像シーンの切り替えタイミングは、動画データを時間方向に微分することによって検出する。
また、映像分類処理部３は、区切られた各区間の映像の特徴を基に、番組映像データを複数のクラスに分類する。ここでの分類のための映像の特徴とは、ＲＧＢ（赤・緑・青）それぞれのチャネルの輝度やそれらの画面上の位置における分布など、純粋に画像データそのものから得られる特徴であり、映像の意味的な特徴による分類は必要ない。そして、映像分類処理部３は、分類結果のクラスごとに分類した形で、番組映像データ１とそれに対応付くクローズドキャプションデータ２とをコンテンツデータベースに書き込む。 Next, processing procedures by each unit of the identification model generation device 100 will be described in order.
First, the video classification processing unit 3 analyzes the program video data 1 and divides the program video data 1 and the corresponding closed caption data 2 into segments of suitable length along the time axis. A segment boundary is placed, for example, where the scene of the program video changes, or where the audio level crosses from below a predetermined value to at or above it (or vice versa). Scene changes are detected by differentiating the moving image data with respect to time.
In addition, the video classification processing unit 3 classifies the program video data into a plurality of classes based on the video features of each segment. The features used for this classification are obtained purely from the image data itself, such as the luminance of each of the RGB (red, green, blue) channels and their spatial distribution on the screen; classification by the semantic features of the video is not required. The video classification processing unit 3 then writes the program video data 1 and the associated closed caption data 2 to the content database, organized by the class assigned in the classification.
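Detecting scene changes by differentiating the video in the time direction can be illustrated with simple frame differencing. The sketch below is an assumption-laden illustration, not the method of the specification: frames are modelled as flat lists of pixel intensities, the mean absolute difference is used as the discrete time derivative, and the `threshold` parameter and all function names are hypothetical.

```python
def frame_difference(prev, curr):
    """Mean absolute pixel difference between two frames (discrete time derivative)."""
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)


def detect_cuts(frames, threshold):
    """Return the indices t at which frame t differs strongly from frame t-1."""
    cuts = []
    for t in range(1, len(frames)):
        if frame_difference(frames[t - 1], frames[t]) >= threshold:
            cuts.append(t)
    return cuts


def split_at(frames, cuts):
    """Split the frame sequence into segments at the detected cut positions."""
    bounds = [0, *cuts, len(frames)]
    return [frames[start:end] for start, end in zip(bounds, bounds[1:])]
```

For instance, with frames `[[10, 10], [11, 10], [200, 190], [201, 191]]` and a threshold of 50, the only large change is between the second and third frames, so the sequence is split into two segments of two frames each.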

次に、確率計算処理部５は、分類されている画像（映像データ）のクラスに対応するテキストを解析する。この際、言語情報（テキストに含まれる単語）と分類クラスの関係を数値化した確率パラメタ値を計算する。
図４は、確率計算処理部５による処理の前半部分の手順を示すフローチャートである。以下、このフローチャートに沿って説明する。 Next, the probability calculation processing unit 5 analyzes the text corresponding to the class of the classified image (video data). At this time, a probability parameter value obtained by quantifying the relationship between the linguistic information (words included in the text) and the classification class is calculated.
FIG. 4 is a flowchart showing the first half of the processing performed by the probability calculation processing unit 5. The description below follows this flowchart.

まずステップＳ１１において、確率計算処理部５は、コンテンツデータベース４から文書を読み込む。この文書とは、コンテンツデータベース４に記憶されているテキストデータである。そして、１つのクラスに属するテキストデータをすべてまとめたものが１つの文書である。
次に、ステップＳ１２において、確率計算処理部５は、読み出した文書の中から一文を取り出す。
次に、ステップＳ１３において、確率計算処理部５は、取り出した一文について形態素解析の処理を行なう。そして、この形態素解析の結果として、文書に含まれる単語（形態素）が抽出される。 First, in step S11, the probability calculation processing unit 5 reads a document from the content database 4. This document is text data stored in the content database 4; one document is the collection of all text data belonging to one class.
Next, in step S12, the probability calculation processing unit 5 takes out one sentence from the read document.
Next, in step S13, the probability calculation processing unit 5 performs morphological analysis on the extracted sentence. As a result of this morphological analysis, the words (morphemes) included in the document are extracted.

次に、ステップＳ１４において、確率計算処理部５は、抽出された各単語が、助詞、助動詞、記号以外の単語であるかどうかを判定する。判定結果が肯定的な場合、つまりその単語が助詞、助動詞、記号のいずれでもない場合には、そのままステップＳ１５に進む。判定結果が否定的な場合、つまりその単語が助詞、助動詞、記号のいずれかである場合は、ステップＳ１５をスキップして、ステップＳ１６に進む。
次に、ステップＳ１５において、確率計算処理部５は、単語の頻度を計算し、更新する。なお、このステップＳ１５の処理は、上記の形態素解析の結果得られた単語であって、助詞、助動詞、記号のいずれでもない単語の各々について行なう。なお、頻度の計算の詳細については後述する。 Next, in step S14, the probability calculation processing unit 5 determines whether or not each extracted word is a word other than a particle, an auxiliary verb, and a symbol. If the determination result is affirmative, that is, if the word is not a particle, auxiliary verb, or symbol, the process proceeds directly to step S15. If the determination result is negative, that is, if the word is any of a particle, an auxiliary verb, or a symbol, step S15 is skipped and the process proceeds to step S16.
Next, in step S15, the probability calculation processing unit 5 calculates and updates the word frequency. The process of step S15 is performed for each word obtained as a result of the above morphological analysis and not a particle, auxiliary verb, or symbol. Details of the frequency calculation will be described later.

次に、ステップＳ１６において、確率計算処理部５は、上のステップＳ１２で取り出した一文が、当該文書の最後の文であったかどうかを判定する。最後の文であった場合にはステップＳ１７に移る。最後の文でなかった場合には、次の一文の処理をするために、ステップＳ１２に戻る。
次に、ステップＳ１７において、確率計算処理部５は、すべての文についての各単語の頻度を集計し、頻度ベクトルを求める。頻度ベクトルについては、後で説明する。 Next, in step S16, the probability calculation processing unit 5 determines whether or not the one sentence extracted in step S12 above is the last sentence of the document. If it is the last sentence, the process proceeds to step S17. If it is not the last sentence, the process returns to step S12 to process the next sentence.
Next, in step S17, the probability calculation processing unit 5 totals the frequencies of the words over all sentences to obtain a frequency vector. The frequency vector is described later.
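The loop of steps S11 to S17 can be sketched as follows. This is a minimal illustration under stated assumptions: the morphological analysis of step S13 is assumed to have been done already by some external analyzer, so each sentence arrives as a list of `(surface, part_of_speech)` pairs; the part-of-speech labels, the example words, and the function names are hypothetical.

```python
from collections import Counter

# Parts of speech excluded in step S14 (labels are assumed, not from the text).
EXCLUDED_POS = {"particle", "auxiliary verb", "symbol"}


def count_words(sentences):
    """Count word frequencies over all sentences of one document (steps S12 to S16)."""
    freq = Counter()
    for sentence in sentences:            # S12: take one sentence; S16: repeat until the last
        for surface, pos in sentence:     # S13: morphemes of the sentence (pre-analyzed)
            if pos in EXCLUDED_POS:       # S14: skip particles, auxiliary verbs, symbols
                continue
            freq[surface] += 1            # S15: update the frequency of the word
    return freq


def to_vectors(freq):
    """S17: return the distinct words and their frequencies as parallel lists."""
    words = sorted(freq)
    return words, [freq[w] for w in words]
```

Sorting the words is only a convenience to make the ordering deterministic; the text does not prescribe any particular order.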

次に、上述したステップＳ１１からＳ１７までの処理について、数式を用いてより詳細に説明する。つまり、確率計算処理部５は、コンテンツデータベース４から読み出したテキストデータの中の文を形態素解析し、単語に分ける。そして、それらの単語から助詞、助動詞、記号を除外し、残った単語の出現頻度により、文書ベクトルｘと頻度ベクトルＮ（ｘ）を求める。これら、文書ベクトルｘと頻度ベクトルＮ（ｘ）は、それぞれ下記の式（１）および（２）で表わされる。 Next, the processing from step S11 to step S17 described above is explained in more detail using equations. The probability calculation processing unit 5 performs morphological analysis on the sentences in the text data read from the content database 4 and splits them into words. It then excludes particles, auxiliary verbs, and symbols from those words and obtains the document vector x and the frequency vector N(x) from the appearance frequencies of the remaining words. The document vector x and the frequency vector N(x) are expressed by equations (1) and (2) below, respectively.

但しここで、文書とは、１つのクラスに属するテキストデータの全てである。そして、ｗ_ｉ（ｉ＝１，２，３，・・・，ｎ）は、文書に属する単語である。なお、ｗ_ｉの集合には、重複する単語はない。つまり、当該文書内には、助詞、助動詞、記号を除いて、ｎ種類の単語が出現している。言い換えれば、任意のｉ，ｊ（但しｉ≠ｊ）に対して、ｗ_ｉ≠ｗ_ｊである。また、ｎ_ｉ（ｉ＝１，２，３，・・・，ｎ）は、当該文書内での単語ｗ_ｉの出現頻度（即ち、出現回数）である。 Here, a document is all of the text data belonging to one class, and w_i (i = 1, 2, 3, ..., n) are the words belonging to the document. The set of w_i contains no duplicates; that is, n distinct words appear in the document once particles, auxiliary verbs, and symbols are excluded. In other words, w_i ≠ w_j for any i, j with i ≠ j. n_i (i = 1, 2, 3, ..., n) is the appearance frequency (that is, the number of occurrences) of the word w_i in the document.
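Equations (1) and (2) themselves are not reproduced in this text. Given the definitions of w_i and n_i above, they presumably take the following form; this is a reconstruction from the surrounding description, not a copy of the original equations.

```latex
\begin{align}
x    &= (w_1, w_2, \ldots, w_n) \tag{1} \\
N(x) &= (n_1, n_2, \ldots, n_n) \tag{2}
\end{align}
```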

上述したステップＳ１１からＳ１７までの処理で、文書ベクトルｘと頻度ベクトルＮ（ｘ）が得られているため、これらを用いて、次に、確率計算処理部５は、下記の式（３）を用いて、各々のクラスにおける単語ｗ_ｉの生起確率Ｐ（ｗ_ｉ｜ｃ）を算出する。 Since the document vector x and the frequency vector N (x) are obtained in the processing from step S11 to S17 described above, the probability calculation processing unit 5 next uses the following expression (3) using these. The occurrence probability P (w _i | c) of the word w _i in each class is calculated.

但し、Ｎ（ｉ，ｘ）は、文書ｘ内での単語ｗ_ｉの頻度（出現回数）である。また、この式では、ゼロ頻度問題を考慮して、ラプラス推定（Ｌａｐｌａｃｅｅｓｔｉｍａｔｉｏｎ）によるスムージングを行い、そのパラメタとしてスムージング係数δを用いている。なお、Ｖは、クラスｃ中の単語の種類数を表わしている。 Here, N(i, x) is the frequency (number of occurrences) of the word w_i in the document x. To address the zero-frequency problem, this equation applies smoothing by Laplace estimation, with the smoothing coefficient δ as its parameter. V denotes the number of word types in class c.
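Equation (3) is likewise not reproduced in this text. Assuming the usual form of Laplace-smoothed estimation with the symbols defined here (N(i, x), δ, and V), it would read as follows. The summation over documents x in class c is an assumption; since one document collects all text of a class, that sum may reduce to a single term.

```latex
\begin{equation}
P(w_i \mid c) =
  \frac{\delta + \sum_{x \in c} N(i, x)}
       {\delta\,|V| + \sum_{j=1}^{|V|} \sum_{x \in c} N(j, x)}
\tag{3}
\end{equation}
```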

なお、確率計算処理部５は、すべての単語とすべてのクラスとの組み合わせについて、上記の方法により生起確率Ｐ（ｗ_ｉ｜ｃ）を算出する。そして、単語およびクラスと関連付けて、算出された生起確率Ｐ（ｗ_ｉ｜ｃ）を識別モデルデータベース６のテーブルに書き込む。 The probability calculation processing unit 5 calculates the occurrence probability P (w _i | c) by the above method for all combinations of words and all classes. Then, the calculated occurrence probability P (w _i | c) is written in the table of the identification model database 6 in association with the word and the class.
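Computing the occurrence probability for every combination of word and class, and storing it keyed by class and word, can be sketched as below. This is an illustration under assumptions, not the specified implementation: the function name `build_model`, the dictionary layout, and the default `delta=1.0` are hypothetical, and |V| is taken here as the size of the vocabulary pooled over all classes so that each class's probabilities sum to one, whereas the text defines V per class.

```python
def build_model(class_counts, delta=1.0):
    """Build a {(class_id, word): probability} table with Laplace smoothing.

    class_counts maps each class to a {word: count} dictionary obtained from
    the document (all text data) of that class.
    """
    # Assumption: a single vocabulary shared by all classes.
    vocab = sorted({w for counts in class_counts.values() for w in counts})
    model = {}
    for class_id, counts in class_counts.items():
        total = sum(counts.values())
        denom = delta * len(vocab) + total
        for word in vocab:
            # Smoothed estimate: words unseen in this class still get delta / denom.
            model[(class_id, word)] = (delta + counts.get(word, 0)) / denom
    return model
```

With counts `{"class 1": {"curve": 3, "pitcher": 1}, "class 2": {"goal": 2}}` and δ = 1, the vocabulary has three words, so P(curve | class 1) = (1 + 3) / (3 + 4) = 4/7 and P(curve | class 2) = (1 + 0) / (3 + 2) = 1/5.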

以上の一連の処理により、識別モデル生成装置１００は、番組映像データ１とクローズドキャプションデータ２のペアを基に、クローズドキャプションデータ２に含まれる単語と映像の特徴に基づくクラスとの関係を表わす確率パラメタ値である生起確率Ｐ（ｗ_ｉ｜ｃ）を算出し、これによる識別モデルデータベース６を構築する。 Through the above-described series of processing, the identification model generation device 100 represents the relationship between the words included in the closed caption data 2 and the class based on the video features based on the pair of the program video data 1 and the closed caption data 2. The occurrence probability P (w _i | c), which is a parameter value, is calculated, and the identification model database 6 based on this is constructed.

＜識別処理＞
次に、上述した識別モデル生成処理によって構築された識別モデルを用いて、新たに入力されるテキストデータがマッチする画像クラスを識別する処理について説明する。
図５は、本実施形態による識別処理装置（映像識別処理装置）の機能構成を示すブロック図である。図示するように、符号２００が識別処理装置である。そして、この識別処理装置２００は、識別モデルデータベース６と、文書７を記憶する記憶手段と、識別処理部８と、画像クラス９を記憶する記憶手段とを含んで構成される。 <Identification process>
Next, processing for identifying an image class that matches newly input text data using the identification model constructed by the above-described identification model generation processing will be described.
FIG. 5 is a block diagram showing a functional configuration of the identification processing device (video identification processing device) according to the present embodiment. As shown in the figure, reference numeral 200 is an identification processing device. The identification processing device 200 includes an identification model database 6, a storage unit that stores the document 7, an identification processing unit 8, and a storage unit that stores the image class 9.

上記構成において、識別モデルデータベース６は、識別モデル生成処理において算出された確率パラメタ値（つまり、生起確率Ｐ（ｗ_ｉ｜ｃ））を、単語ごと且つクラスごとに保持している。また、文書７は、入力されるテキストデータである。識別処理装置２００の利用者は、文書７に相応しい映像を取り出すことを目的として、この文書７を入力することが可能となっている。また、識別処理部８は、文書７（テキストデータ）を読み込み、この文書７に含まれる単語の出現頻度を計算し、算出された出現頻度と、識別モデルデータベース６から読み出した確率パラメタ値とに基づき、文書７のテキストがあるクラスに属する確率であるクラス生起確率を各々のクラスごとに算出するものである。また、識別処理部８は、算出されたクラス生起確率を用いて、文書７に対応する最尤クラスを決定する。画像クラス９は、識別処理部８によって決定された最尤クラスが出力されたものである。 In the above configuration, the identification model database 6 holds the probability parameter value (ie, the occurrence probability P (w _i | c)) calculated in the identification model generation process for each word and for each class. Document 7 is input text data. The user of the identification processing device 200 can input the document 7 for the purpose of extracting a video suitable for the document 7. The identification processing unit 8 reads the document 7 (text data), calculates the appearance frequency of words included in the document 7, and calculates the appearance frequency and the probability parameter value read from the identification model database 6. Based on this, the class occurrence probability, which is the probability that the text of the document 7 belongs to a certain class, is calculated for each class. Further, the identification processing unit 8 determines the maximum likelihood class corresponding to the document 7 by using the calculated class occurrence probability. The image class 9 is obtained by outputting the maximum likelihood class determined by the identification processing unit 8.

なお、文書７および画像クラス９を記憶する記憶手段は、例えば、半導体メモリや、磁気ディスク装置や、光ディスク装置などを用いて実現する。 The storage means for storing the document 7 and the image class 9 is realized using, for example, a semiconductor memory, a magnetic disk device, an optical disk device, or the like.

図６は、上記の識別処理部８による識別処理の手順を示すフローチャートである。以下、この図に沿って識別処理部８が実行する識別処理について説明する。
まず、ステップＳ３１において、識別処理部８は、入力される文書７を読み込む。
次に、ステップＳ３２において、識別処理部８は、読み込んだ文書７に含まれる単語を抽出し、それら単語の頻度を計算する。この部分の、単語抽出と頻度計算の方法は、識別モデル生成処理の中での方法（前述した、ステップＳ１１からＳ１７までの処理の方法）と同様である。このステップＳ３２の結果、入力された文書７に対する文書ベクトルと頻度ベクトルとが得られる。 FIG. 6 is a flowchart showing a procedure of identification processing by the identification processing unit 8. Hereinafter, the identification process performed by the identification processing unit 8 will be described with reference to FIG.
First, in step S31, the identification processing unit 8 reads the input document 7.
Next, in step S32, the identification processing unit 8 extracts words included in the read document 7 and calculates the frequency of those words. The method of word extraction and frequency calculation of this part is the same as the method in the identification model generation processing (the processing method from steps S11 to S17 described above). As a result of step S32, a document vector and a frequency vector for the input document 7 are obtained.

次に、ステップＳ３３において、識別処理部８は、下記の方法を用いて、入力された文書７に対する各クラスの生起確率（クラス生起確率）を計算する。
その方法とは、まず、識別処理部８は、適宜識別モデルデータベース６を参照して必要な生起確率Ｐ（ｗ_ｉ｜ｃ）の値を読み出しながら、下記の式（４）を用いて、条件付確率Ｐ（ｃ｜ｘ）を求める。 Next, in step S33, the identification processing unit 8 calculates the occurrence probability (class occurrence probability) of each class for the input document 7 using the following method.
In this method, the identification processing unit 8 first obtains the conditional probability P(c|x) using equation (4) below, reading the required occurrence probabilities P(w_i|c) from the identification model database 6 as needed.

ここで、ｘは、入力された文書７の文書ベクトルである。また、Ｎ（ｉ，ｘ）は、文書ベクトルｘに対応する単語ｗ_ｉの頻度である。また、「！」は階乗演算子である。 Here, x is a document vector of the input document 7. N (i, x) is the frequency of the word w _i corresponding to the document vector x. “!” Is a factorial operator.
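Equation (4) itself is not reproduced in this text. From the symbols explained around it (P(|x|), |x|, the factorial operator, N(i, x), and P(w_i|c)), it presumably has the multinomial form below. The left-hand side is written as P(x|c) on the assumption that equation (4) gives the likelihood of the document under class c, which is what the application of Bayes' theorem in equation (5) requires; the surrounding text labels it P(c|x).

```latex
\begin{equation}
P(x \mid c) = P(|x|)\,|x|!\,\prod_{i=1}^{n} \frac{P(w_i \mid c)^{N(i, x)}}{N(i, x)!}
\tag{4}
\end{equation}
```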

また、式（４）において、Ｐ（ｗ_ｉ｜ｃ）は、識別処理部８が単語ｗ_ｉおよびクラスｃをキーとして識別モデルデータベース６から読み出した確率パラメタ値である。なお、識別モデルデータベース６に単語ｗ_ｉに対応するＰ（ｗ_ｉ｜ｃ）の値が格納されていない場合には、式（３）により、Ｐ（ｗ_ｉ｜ｃ）＝δ／（δ・｜Ｖ｜）とする。つまり、式（３）により、Ｐ（ｗ_ｉ｜ｃ）＝１／｜Ｖ｜とする。 In equation (4), P(w_i|c) is the probability parameter value that the identification processing unit 8 reads from the identification model database 6 using the word w_i and the class c as keys. If the identification model database 6 holds no value of P(w_i|c) for the word w_i, equation (3) gives P(w_i|c) = δ/(δ·|V|), that is, P(w_i|c) = 1/|V|.

また、式（４）において、｜ｘ｜は文書の頻度である。つまり、文書ベクトルｘと同じ文書ベクトルを有する文書の出現数である。また、Ｐ（｜ｘ｜）は、文書頻度が｜ｘ｜である確率である。
つまり、あるクラスにおいてｘ１という文書ベクトルを持つ文書が２つあった場合には、｜ｘ｜は２である。そして、ベクトルｘ１に限らず、｜ｘ｜＝２となる文書ベクトルの割合がｐ（｜ｘ｜）である。
例えば、あるクラスにおいて、文書ベクトルｘ１を有する文書が２個あり（つまり｜ｘ１｜＝２）、文書ベクトルｘ２を有する文書が１個あり（つまり｜ｘ２｜＝１）、文書ベクトルｘ３を有する文書が２個ある（つまり｜ｘ３｜＝２）場合、且つその他の文書がない場合には、｜ｘ｜＝２である頻度が２であり（ｘ１とｘ３）、｜ｘ｜＝１である頻度が１である（ｘ２）。よって、ｐ（｜ｘ１｜）＝２／３であり、ｐ（｜ｘ２｜）＝１／３であり、ｐ（｜ｘ３｜）＝２／３である。 In Expression (4), | x | is a document frequency. That is, the number of appearances of a document having the same document vector as the document vector x. P (| x |) is a probability that the document frequency is | x |.
That is, if a class contains two documents with the document vector x1, then |x| is 2. p(|x|) is the proportion of document vectors, not only x1, for which |x| = 2.
For example, suppose a class contains two documents with document vector x1 (|x1| = 2), one document with document vector x2 (|x2| = 1), two documents with document vector x3 (|x3| = 2), and no other documents. Then two vectors have |x| = 2 (x1 and x3) and one vector has |x| = 1 (x2). Therefore p(|x1|) = 2/3, p(|x2|) = 1/3, and p(|x3|) = 2/3.
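The worked example above can be checked with a few lines of code. This is only a sketch of the arithmetic; the function name and the dictionary representation (vector name mapped to the number of documents sharing that vector) are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def document_frequency_probability(doc_counts):
    """Return p(|x|) for each document vector.

    doc_counts maps a vector name to |x|, the number of documents that share
    that vector. p(|x|) is the share of distinct vectors having that same |x|.
    """
    vectors_per_frequency = Counter(doc_counts.values())
    n_vectors = len(doc_counts)
    return {
        name: Fraction(vectors_per_frequency[k], n_vectors)
        for name, k in doc_counts.items()
    }


# The example from the text: |x1| = 2, |x2| = 1, |x3| = 2.
example = document_frequency_probability({"x1": 2, "x2": 1, "x3": 2})
```

Here `example` holds 2/3 for x1 and x3 and 1/3 for x2, matching the values stated in the text.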

次に、識別処理部８は、式（４）に基づき、ベイズの定理により、式（５）を用いて、Ｐ（ｃ｜ｘ）を求める。 Next, the identification processing unit 8 obtains P (c | x) by using Bayes's theorem and using Equation (5) based on Equation (4).

ここで、Ｐ（ｃ）は、単純に、クラス間の文書の分布に基づくクラスｃの存在確率である。
なお、式（５）における最右辺の分母は、クラスに依らずに一定である。よって、式（６）の通りである。 Here, P (c) is simply the existence probability of class c based on the distribution of documents between classes.
Note that the denominator of the rightmost side of equation (5) is constant regardless of the class. Equation (6) therefore holds.
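Equations (5) and (6) are not reproduced in this text. Assuming the standard statement of Bayes' theorem with the symbols used here, and writing the class-conditional likelihood of equation (4) as P(x|c), they would presumably read as follows. Whether the original equation (6) also drops class-independent factors of P(x|c) cannot be determined from the text, so the proportionality is left in its general form.

```latex
\begin{align}
P(c \mid x) &= \frac{P(c)\,P(x \mid c)}{P(x)}
             = \frac{P(c)\,P(x \mid c)}{\sum_{c'} P(c')\,P(x \mid c')} \tag{5} \\
P(c \mid x) &\propto P(c)\,P(x \mid c) \tag{6}
\end{align}
```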

上では、計算過程として式（４）および式（５）を示したが、当然のことながら、識別処理部８が各クラスについて式（６）の右辺の値のみを直接計算し、その比率によってクラス生起確率Ｐ（ｃ｜ｘ）を計算するようにしても良い。 Equations (4) and (5) were shown above as the calculation steps, but the identification processing unit 8 may of course compute only the right-hand side of equation (6) directly for each class and derive the class occurrence probability P(c|x) from the ratios of those values.

フローチャートに戻って、次に、ステップＳ３４において、識別処理部８は、計算された各クラスのクラス生起確率Ｐ（ｃ｜ｘ）に基づき、与えられた文書に対する最尤クラスを決定する。つまり、最尤クラスは、下の式（７）により与えられる。 Returning to the flowchart, next, in step S34, the identification processing unit 8 determines the maximum likelihood class for the given document based on the calculated class occurrence probability P (c | x) of each class. That is, the maximum likelihood class is given by the following equation (7).

つまり、この処理により、識別処理部８は、与えられた文書７に最も相応しい映像のクラスを決定することができる。
なお、当然のことながら、識別処理部８が各クラスについて式（７）を用いて、文書の最尤クラスを直接計算するようにしても良い。
なお、上述した処理は、ナイーブベイズ（ＮａｉｖｅＢａｙｅｓ）分離器による。また、各単語の生起は独立と仮定している。 That is, through this process, the identification processing unit 8 can determine the video class most suitable for the given document 7.
As a matter of course, the identification processing unit 8 may directly calculate the maximum likelihood class of the document using the equation (7) for each class.
The processing described above uses a naive Bayes classifier, and the occurrences of the individual words are assumed to be independent.
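Selecting the maximum likelihood class with a naive Bayes classifier can be sketched as follows. This is an illustration under assumptions rather than the specified implementation: scores are accumulated as log probabilities for numerical stability (the text works with plain products), factors that do not depend on the class are omitted because they do not change the arg max, and the function name, the `priors` dictionary, and the `vocab_size` parameter are hypothetical. Words missing from the model fall back to 1/|V|, as the text describes.

```python
import math


def classify(word_counts, model, priors, vocab_size):
    """Return the most likely class and the per-class log scores.

    word_counts: {word: N(i, x)} for the input document
    model:       {(class_id, word): P(word | class)}
    priors:      {class_id: P(class)}
    """
    scores = {}
    for class_id, prior in priors.items():
        score = math.log(prior)
        for word, n in word_counts.items():
            # Fallback for words with no stored value: P(w | c) = 1 / |V|.
            p = model.get((class_id, word), 1.0 / vocab_size)
            score += n * math.log(p)
        scores[class_id] = score
    best = max(scores, key=scores.get)
    return best, scores
```

Because the logarithm is monotonic, the class with the largest log score is the same class that maximizes P(c) multiplied by the product of P(w_i|c)^N(i, x).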

なお、識別処理部８への入力となる文書７に、映像データが関連付けられていても良い。その場合には、文書７と、識別モデルデータベース６に記憶されているクラス生起確率に基づいて、映像データのクラス生起確率を計算することができ、従って、その映像データの最尤クラス（画像クラス９）を出力することができる。 Note that video data may be associated with the document 7 that is input to the identification processing unit 8. In that case, the class occurrence probability of the video data can be calculated based on the document 7 and the class occurrence probability stored in the identification model database 6, and therefore the maximum likelihood class (image class) of the video data can be calculated. 9) can be output.

次に、識別モデルを生成する処理手段と、識別処理を行なう処理手段とを兼ね備えた、識別モデル生成及び識別処理装置（映像識別処理装置）について説明する。
図７は、識別モデル生成及び識別処理装置３００の構成を示すブロック図である。図示するように、識別モデル生成及び識別処理装置３００は、識別モデル生成装置１００と識別処理装置２００とを含んで構成される。識別モデル生成装置１００と識別処理装置２００の詳細については、既に、それぞれ、図面を参照しながら説明した。 Next, an identification model generation and identification processing apparatus (video identification processing apparatus) that includes both a processing unit that generates an identification model and a processing unit that performs identification processing will be described.
FIG. 7 is a block diagram illustrating a configuration of the identification model generation and identification processing device 300. As illustrated, the identification model generation and identification processing device 300 includes an identification model generation device 100 and an identification processing device 200. Details of the identification model generation device 100 and the identification processing device 200 have already been described with reference to the drawings.

そして既に説明したように、識別モデル生成装置１００と識別処理装置２００は、識別モデルデータベース６を有している。識別モデル生成装置１００における処理の結果、単語とクラスとの関係を表わす確率パラメタ値（つまり、ある単語を前提とした、クラス生起確率値）が、識別モデル生成装置１００の識別モデルデータベース６に書き込まれる。書き込まれた値は、識別処理装置２００の識別モデルデータベース６に反映される。識別処理装置２００側では、識別モデルデータベース６に書かれている確率パラメタ値を用いて、既に述べた識別処理を行なう。なお、両者間でデータベースの更新を反映する方法としては、例えば、ハードディスク装置のミラーリングを用いたり、データベース管理システムが有するミラーリングのしくみを用いたり、識別モデル生成装置１００側でのデータベース更新をトリガーとして検出して所定の手順が起動されるしくみを用いて識別処理装置２００側のデータベースにも同じ値を書き込むようにして実現できる。また、単純に、単一のデータベースを、識別モデル生成装置１００と識別処理装置２００の双方からアクセスできる形で共有するようにしてもよい。 As already described, the identification model generation device 100 and the identification processing device 200 each have the identification model database 6. As a result of the processing in the identification model generation device 100, the probability parameter values representing the relationship between words and classes (that is, class occurrence probability values premised on a given word) are written to the identification model database 6 of the identification model generation device 100. The written values are reflected in the identification model database 6 of the identification processing device 200. The identification processing device 200 then performs the identification processing described above using the probability parameter values written in the identification model database 6. Database updates can be propagated between the two devices by, for example, mirroring of hard disk devices, a mirroring mechanism provided by the database management system, or a mechanism that detects a database update on the identification model generation device 100 side as a trigger and starts a predetermined procedure that writes the same values to the database on the identification processing device 200 side. Alternatively, a single database may simply be shared so that both the identification model generation device 100 and the identification processing device 200 can access it.

つまり、この識別モデル生成及び識別処理装置３００は、コンテンツデータベース（図示せず）内の番組映像データとクローズドキャプションデータを基に、番組映像データのクラスへの分類と、識別モデル構築の処理を行ない（以上は、識別モデル生成装置１００の処理）、構築された識別モデルを用いて、文書（テキスト）に対するクラス生起確率を計算し、このクラス生起確率に基づいて、その文書に適合する映像のクラスを決定する処理を行なう（以上は、識別処理装置２００の処理）ことができる。 That is, the identification model generation and identification processing device 300 performs classification of program video data into classes and identification model construction processing based on program video data and closed caption data in a content database (not shown). (The above is the processing of the identification model generation apparatus 100). Using the constructed identification model, the class occurrence probability for the document (text) is calculated, and based on this class occurrence probability, the class of the video that matches the document. Can be performed (the above is the processing of the identification processing device 200).

[第２の実施形態]
次に、本発明の第２の実施形態について説明する。なお、以下では、上述した第１の実施形態と同様である部分についての説明を省略し、本実施形態特有の部分についてのみ、説明する。
第１の実施形態においては文書ベクトルを構成する単語として、助詞以外、且つ助動詞以外、且つ記号以外の単語を用いていたが、本実施形態においては、その代わりに、名詞と、動詞と、形容詞と、未知語を用いるようにする。 [Second Embodiment]
Next, a second embodiment of the present invention will be described. In the following, description of parts that are the same as those in the first embodiment described above will be omitted, and only parts that are unique to the present embodiment will be described.
In the first embodiment, the words constituting the document vector were all words other than particles, auxiliary verbs, and symbols. In this embodiment, nouns, verbs, adjectives, and unknown words are used instead.

図８は、本実施形態の確率計算処理部５や識別処理部８における、単語抽出及び頻度ベクトル作成の処理の手順を示すフローチャートである。この図を参照しながら説明すると、ステップＳ１１で文書を読み込み、ステップＳ１２で文書から一文を取り出し、ステップＳ１３で形態素解析をするところまでの処理は、第１の実施形態と同様である。 FIG. 8 is a flowchart illustrating a procedure of word extraction and frequency vector creation processing in the probability calculation processing unit 5 and the identification processing unit 8 of the present embodiment. Describing with reference to this figure, the processing up to reading a document in step S11, taking out one sentence from the document in step S12, and performing morphological analysis in step S13 is the same as in the first embodiment.

次のステップＳ１４Ａにおける処理は本実施形態特有のものであり、ステップＳ１４Ａにおいて、確率計算処理部５または識別処理部８は、抽出された各単語が、名詞、動詞、形容詞、未知語のいずれかの単語であるかどうかを判定する。判定結果が肯定的な場合、つまりその単語が名詞、動詞、形容詞、未知語のいずれかである場合には、そのままステップＳ１５に進む。判定結果が否定的な場合、つまりその単語が名詞、動詞、形容詞、未知語のいずれでもない場合は、ステップＳ１５をスキップして、ステップＳ１６に進む。 The processing in the next step S14A is specific to this embodiment. In step S14A, the probability calculation processing unit 5 or the identification processing unit 8 determines whether each extracted word is a noun, a verb, an adjective, or an unknown word. It is determined whether it is a word. If the determination result is affirmative, that is, if the word is one of a noun, a verb, an adjective, or an unknown word, the process proceeds directly to step S15. If the determination result is negative, that is, if the word is not a noun, verb, adjective, or unknown word, step S15 is skipped and the process proceeds to step S16.

なお、ステップＳ１４Ａにおいて、未知語とは、形態素解析の結果として未知の品詞であると判定される形態素（単語）である。形態素解析処理の手法によっては、このような、品詞を特定できない未知語が、解析結果として出力される。なお、未知語と判定された語は、実際には名詞であることが多く、そのような未知語は、文書ベクトルおよび頻度ベクトルに含めるようにして、文書の特徴の一部とすることが有効である。 In step S14A, an unknown word is a morpheme (word) whose part of speech is judged to be unknown as a result of morphological analysis. Depending on the morphological analysis method, such unknown words, whose part of speech cannot be identified, are output as analysis results. Words judged to be unknown are in practice often nouns, so it is effective to include them in the document vector and the frequency vector and treat them as part of the features of the document.
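The check of step S14A amounts to replacing the exclusion list of the first embodiment with an inclusion list. A minimal sketch, assuming the analyzer output is a list of `(surface, part_of_speech)` pairs and using hypothetical part-of-speech labels and function names:

```python
# Parts of speech kept in step S14A (labels are assumed, not from the text).
INCLUDED_POS = {"noun", "verb", "adjective", "unknown"}


def keep_token(pos):
    """Step S14A: keep only nouns, verbs, adjectives, and unknown words."""
    return pos in INCLUDED_POS


def filter_tokens(tokens):
    """Return the surface forms of the tokens that pass step S14A."""
    return [surface for surface, pos in tokens if keep_token(pos)]
```

Compared with an exclusion list, this also drops parts of speech such as adverbs or conjunctions, which the first embodiment would have counted.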

そして、ステップＳ１５において単語頻度を計算、更新する処理と、ステップＳ１６において最後の文であるかどうかを判定して分岐する処理と、ステップＳ１７において頻度ベクトルを計算する処理は、第１の実施形態と同様である。 The process of calculating and updating the word frequency in step S15, the process of determining whether or not it is the last sentence in step S16, and the process of calculating the frequency vector in step S17 are the first embodiment. It is the same.

上記のようなステップＳ１４Ａの処理により、本実施形態による識別モデル生成装置（映像識別処理装置）や識別処理装置（映像識別処理装置）などでは、名詞、動詞、形容詞、未知語を対象として確率計算処理や識別処理が行なわれる。 Through the processing of step S14A described above, the identification model generation device (video identification processing device), the identification processing device (video identification processing device), and the like according to this embodiment perform the probability calculation processing and the identification processing on nouns, verbs, adjectives, and unknown words.

[第３の実施形態]
次に、本発明の第３の実施形態について説明する。なお、以下では、上述した実施形態と同様である部分についての説明を省略し、本実施形態特有の部分についてのみ、説明する。 [Third embodiment]
Next, a third embodiment of the present invention will be described. In the following, description of parts that are the same as those of the above-described embodiment will be omitted, and only parts that are unique to this embodiment will be described.

図９は、本実施形態による識別モデル生成装置１００Ｂ（映像識別処理装置）の機能構成を示すブロック図である。本実施形態の特徴は、第１の実施形態においてはクローズドキャプションデータ２を用いていたのに代えて、時刻情報が付与された番組台本データ２Ｂ（テキストデータ）を用いている点である。 FIG. 9 is a block diagram illustrating a functional configuration of the identification model generation device 100B (video identification processing device) according to the present embodiment. The feature of this embodiment is that, instead of using closed caption data 2 in the first embodiment, program script data 2B (text data) to which time information is added is used.

番組台本データ２Ｂは、番組映像データ１の番組を制作する際に用意されていた台本のテキストデータである。番組台本の内容と、実際の番組の中での出演者やナレーターの発話とは、完全に一致するわけではないが、非常に近い。よって、番組台本を基に、確率計算処理部５による確率計算の処理を行なっても、良好な結果が得られる。近年では番組台本データがテキストデータとして保存されている場合もあり、そのような場合には、たとえクローズドキャプションデータがない状況であっても、膨大な人手をかけることなく、識別モデルデータベース６を作成することができる。 The program script data 2B is the text data of the script prepared when the program of the program video data 1 was produced. The content of the program script does not match the utterances of the performers and narrators in the actual program exactly, but it is very close to them. Good results can therefore be obtained even when the probability calculation processing unit 5 performs the probability calculation on the basis of the program script. In recent years, program script data is sometimes stored as text data; in such cases, the identification model database 6 can be created without enormous manual effort even when no closed caption data is available.

番組台本データ２Ｂには、時刻情報が付与されているため、この時刻情報を用いて番組映像データ１との同期ポイントを特定することができる。これにより、番組映像データ１を複数の時間区間に分割しても、各時間区間に適合したテキストデータを与えることができる。
なお、番組台本データ２Ｂは、番組制作段階で元々時刻情報が付加されている場合もあるためこの時刻情報をそのまま用いても良いし、時刻情報のみを人手で番組台本のテキストに付加して作成しても良い。 Since time information is given to the program script data 2B, a synchronization point with the program video data 1 can be specified using this time information. Thereby, even if the program video data 1 is divided into a plurality of time intervals, text data suitable for each time interval can be provided.
The program script data 2B may already carry time information added at the program production stage, in which case that time information may be used as is; alternatively, the data may be created by manually adding only the time information to the text of the program script.
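Using the time information to give each time interval of the video its matching text can be sketched as below. This is an illustration under assumptions, not the specified method: times are plain numbers of seconds, each segment is represented only by its start time, and the function name and data layout are hypothetical.

```python
from bisect import bisect_right


def assign_to_segments(segment_starts, timed_lines):
    """Distribute timed script lines over video segments.

    segment_starts: sorted start times of the segments (seconds)
    timed_lines:    list of (time, text) pairs from the script
    Returns one list of texts per segment, in segment order.
    """
    texts = [[] for _ in segment_starts]
    for time, text in timed_lines:
        # Index of the last segment whose start time is <= the line's time.
        idx = bisect_right(segment_starts, time) - 1
        if idx >= 0:  # lines before the first segment are ignored
            texts[idx].append(text)
    return texts
```

For segments starting at 0, 10, and 25 seconds, lines at 1, 10, 12, and 30 seconds land in the first, second, second, and third segment respectively.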

[第４の実施形態]
次に、本発明の第４の実施形態について説明する。なお、以下では、上述した実施形態と同様である部分についての説明を省略し、本実施形態特有の部分についてのみ、説明する。 [Fourth Embodiment]
Next, a fourth embodiment of the present invention will be described. In the following, description of parts that are the same as those of the above-described embodiment will be omitted, and only parts that are unique to this embodiment will be described.

図１０は、本実施形態による識別モデル生成装置１００Ｃ（映像識別処理装置）の機能構成を示すブロック図である。本実施形態の特徴は、第１の実施形態においてはクローズドキャプションデータ２を用いていたのに代えて、時刻情報が付与された音声認識結果データ２Ｃ（テキストデータ）を用いている点である。 FIG. 10 is a block diagram illustrating a functional configuration of the identification model generation device 100C (video identification processing device) according to the present embodiment. A feature of the present embodiment is that, instead of using the closed caption data 2 in the first embodiment, voice recognition result data 2C (text data) to which time information is added is used.

音声認識結果データ２Ｃは、対応する番組映像データ１を基に、予め音声認識処理により作成したものである。また、その番組映像データ１に含まれている映像および音声は元々時刻情報と同期しているため、その時刻情報を用いて、音声認識結果データ２Ｃにも時刻情報を付与するようにしておく。 The voice recognition result data 2C is created in advance by voice recognition processing based on the corresponding program video data 1. Since the video and audio included in the program video data 1 are originally synchronized with the time information, the time information is also given to the voice recognition result data 2C using the time information.

このように音声認識結果データ２Ｃを用いることにより、たとえクローズドキャプションデータがない状況であっても、膨大な人手をかけることなく、識別モデルデータベースを作成することができる。 By using the speech recognition result data 2C as described above, even if there is no closed caption data, the identification model database can be created without enormous manpower.

[第５の実施形態]
次に、本発明の第５の実施形態について説明する。なお、以下では、上述した実施形態と同様である部分についての説明を省略し、本実施形態特有の部分についてのみ、説明する。 [Fifth Embodiment]
Next, a fifth embodiment of the present invention will be described. In the following, description of parts that are the same as those of the above-described embodiment will be omitted, and only parts that are unique to this embodiment will be described.

図１１は、本実施形態による識別モデル生成装置１００Ｄ（画像識別処理装置）の機能構成を示すブロック図である。本実施形態による識別モデル生成装置１００Ｄの特徴的な構成は次の通りである。即ち、第１実施形態における番組映像データ１の代わりに、本実施形態では画像データ１Ｄを用いる。また、第１実施形態におけるクローズドキャプションデータ２の代わりに、本実施形態ではウェブテキストデータ２Ｄ（テキストデータ）を用いる。なお、画像データ１Ｄとウェブテキストデータ２Ｄをまとめてウェブコンテンツデータと称する。また、第１実施形態におけるコンテンツデータベース４はクラスごとに映像データとテキストデータのペアを格納していたのに対して、本実施形態におけるコンテンツデータベース４Ｄは、クラスごとに、画像データとテキストデータのペアを複数格納している。 FIG. 11 is a block diagram illustrating a functional configuration of the identification model generation device 100D (image identification processing device) according to the present embodiment. The characteristic configuration of the identification model generation device 100D according to the present embodiment is as follows. That is, instead of the program video data 1 in the first embodiment, image data 1D is used in this embodiment. Further, instead of the closed caption data 2 in the first embodiment, web text data 2D (text data) is used in the present embodiment. The image data 1D and the web text data 2D are collectively referred to as web content data. The content database 4 in the first embodiment stores a pair of video data and text data for each class, whereas the content database 4D in the present embodiment stores image data and text data for each class. Multiple pairs are stored.

ここで、画像データ１Ｄと、ウェブテキストデータ２Ｄは、同一のウェブコンテンツから抽出されたものである。例えば、単一のウェブページにおいてページ上に表示されている単一又は複数の画像のデータを画像データ１Ｄとする。ここで、画像は、静止画であっても動画であってもよい。そして、ウェブテキストデータ２Ｄは、そのウェブページにおいて表示されているテキストのデータである。同じウェブページに表示されている画像データ１Ｄとウェブテキストデータ２Ｄは、意味的につながりを持つ場合が非常に多い。 Here, the image data 1D and the web text data 2D are extracted from the same web content. For example, image data 1D is data of a single image or a plurality of images displayed on a page in a single web page. Here, the image may be a still image or a moving image. The web text data 2D is text data displayed on the web page. In many cases, image data 1D and web text data 2D displayed on the same web page are semantically connected.

一例としては、１つのＨＴＭＬ（ハイパーテキスト・マークアップ言語）ファイルの中に、「＜ｉｍｇ＞」タグにより参照されている画像ファイルを画像データ１Ｄとして用い、そのＨＴＭＬファイルの中の、タグ以外のテキスト部分をウェブテキストデータ２Ｄとする。 As an example, an image file referred to by the “<img>” tag is used as image data 1D in one HTML (hypertext markup language) file, and the HTML file other than the tag in the HTML file is used. Let the text part be the web text data 2D.
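Extracting the image references and the non-tag text from one HTML file can be sketched with the HTML parser in the Python standard library. This is a minimal illustration under assumptions: the class and function names are hypothetical, only the `src` attribute of `<img>` tags is collected, and the contents of `<script>` and `<style>` elements are skipped, which goes beyond what the text states.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect <img> sources (image data) and visible text (web text data)."""

    def __init__(self):
        super().__init__()
        self.image_refs = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_refs.append(src)
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (list of image references, text outside the tags)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return parser.image_refs, " ".join(parser.text_parts)
```

The returned image references would still have to be resolved against the page address and downloaded before any image features can be computed; that step is left out here.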

In the present embodiment, the video classification processing unit 3 performs class classification based on the image features of the image data 1D (features of the pixel values, features of the distribution of pixel values on the screen, and, in the case of a moving image, features of the distribution of pixel values in the time direction).
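As a rough illustration of classification by pixel-value features, the sketch below builds a coarse color histogram and assigns an image to the class with the nearest reference histogram. Both the histogram feature and the nearest-centroid rule are stand-ins chosen for brevity; the text does not specify which features or which classifier the video classification processing unit 3 uses. The function names and the `centroids` mapping are hypothetical.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with bins**3 cells; pixels are (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def nearest_class(feature, centroids):
    """Return the class whose reference feature is closest in squared Euclidean distance."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda cls: dist2(feature, centroids[cls]))
```

For a moving image, the same feature could be computed per frame and aggregated over time, which is one possible reading of the "distribution in the time direction" mentioned above.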

With the identification model generation device 100D of the present embodiment, images posted on web pages and the like are classified into classes based solely on their visual features, probability calculation processing is performed on the web text of each class to calculate probability parameter values indicating the relationship between classes and words, and an identification model database 6 in which those probability parameter values are written can be created. By performing the identification processing already described using this identification model database 6, class occurrence probabilities that reflect how well each class matches an input document can be obtained, and the class of images matching that document can be determined.
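The flow from per-class text to probability parameter values, and from those values to class occurrence probabilities, can be illustrated as follows. This sketch substitutes a multinomial naive Bayes model with add-one smoothing for the probability model of the embodiments, which is defined elsewhere in the specification and is not reproduced here. The function names and the dictionary layout of `content_db` are assumptions, and every class is assumed to contain at least one document.

```python
import math
from collections import Counter


def train_identification_model(content_db):
    """content_db maps class name -> list of token lists (one list per text).

    Returns (log_prior, log_word_prob), a stand-in for the identification model database 6.
    """
    vocab = {w for docs in content_db.values() for doc in docs for w in doc}
    total_docs = sum(len(docs) for docs in content_db.values())
    log_prior, log_word_prob = {}, {}
    for cls, docs in content_db.items():
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values())
        log_prior[cls] = math.log(len(docs) / total_docs)
        # Add-one (Laplace) smoothing so that unseen words keep a nonzero probability.
        log_word_prob[cls] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_word_prob


def class_occurrence_probabilities(tokens, log_prior, log_word_prob):
    """Return a normalized probability per class for the tokens of an input document."""
    scores = {}
    for cls in log_prior:
        score = log_prior[cls]
        for w in tokens:
            if w in log_word_prob[cls]:   # words outside the vocabulary are ignored
                score += log_word_prob[cls][w]
        scores[cls] = score
    # Normalize in log space (log-sum-exp) to avoid underflow on long documents.
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return {cls: math.exp(s - top) / norm for cls, s in scores.items()}
```

With two toy classes, a document containing "pan" and "water" receives a higher probability for a kitchen class trained on cooking text than for a baseball class, even though neither word names the class itself.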

[Sixth Embodiment]
Next, a sixth embodiment of the present invention will be described. In the following, description of the parts that are the same as in the embodiments described above is omitted, and only the parts specific to the present embodiment are described.

FIG. 12 is a block diagram showing the functional configuration of a video search device according to the present embodiment. As illustrated, the video search device 400 (video identification processing device) includes an identification model database 6 (probability parameter value database), means for storing a document 7 (text data), means for storing video data 1E, an identification processing unit 8E, means for storing an image class 9, a video database 12, a search device 13 (search processing unit), a video search interface unit 14, means for at least temporarily storing an input search term 15, and means for at least temporarily storing a search result 16.
The means for storing the document 7, the video data 1E, the image class 9, the search term 15, and the search result 16 are each implemented using a semiconductor memory, a magnetic hard disk device, an optical disk device, or the like.
The video database 12 holds video data classified by class.

In the video search device 400, the processing in which the identification processing unit 8E determines the image class 9 appropriate for the input document 7 while referring to the identification model database 6 is the same as already described with reference to FIG. 5, so its description is omitted here.
The processing specific to the present embodiment is that the identification processing unit 8E reads the video data 1E associated with the document 7 and writes the pair of the video data 1E and the document 7 to the video database 12 in association with the image class 9 (maximum likelihood class) determined from the document 7. In other words, in addition to the processing of the identification processing unit 8 of the first embodiment, the identification processing unit 8E determines the maximum likelihood class of the input document 7 based on the calculated class occurrence probabilities, and writes the video data 1E associated with the document 7 to the video database 12 so that the video data 1E belongs to the determined maximum likelihood class.

Through this processing, the identification processing unit 8E can classify the video data 1E into a class according to the features of the related document 7, independently of the features of the video or image itself, and write it to the video database 12 under that class. By repeating this processing, video data classified by class accumulates in the video database 12.
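The registration step specific to this embodiment can be sketched as below, assuming the class occurrence probabilities have already been calculated for the document. The function name `register_video` and the use of an in-memory dictionary as the video database 12 are illustrative assumptions only.

```python
from collections import defaultdict


def register_video(video_db, class_probs, video_ref, document):
    """Store (video_ref, document) under the maximum likelihood class and return that class.

    class_probs maps class name -> class occurrence probability and must not be empty.
    """
    best_class = max(class_probs, key=class_probs.get)
    video_db[best_class].append((video_ref, document))
    return best_class


# Hypothetical usage: the video is filed by its text, not by its pixels.
video_db = defaultdict(list)
register_video(video_db, {"kitchen": 0.86, "baseball": 0.14},
               "video_001.mpg", "Boil the water in a pan.")
```

Calling `register_video` repeatedly for each incoming pair of video data and document accumulates the class-sorted contents described above.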

The video database 12 also records text information corresponding to each class (referred to as searched text). As the searched text, for example, a word assigned as appropriate by a person is used (the word may, for example, be the class name of the class), such as "gas range" or "batter's box". Alternatively, text closely related to a class may be generated automatically and used as the searched text for that class. One way to do so is to refer to a thesaurus or a concept dictionary and automatically extract the word "baseball", which is associated as a superordinate concept of words such as "batter", "strike", and "curve", or automatically extract the word "batter's box", which is associated with those words as the same or a similar concept. When extracting words from a thesaurus or concept dictionary in this way, rules governing the automatically generated words or text may be set in advance, for example a restriction to nouns denoting concrete objects or to nouns denoting places.
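Automatic generation of searched text from a concept dictionary might look like the following sketch. The dictionary contents, the function name, and the `min_support` threshold (how many class words must share a superordinate concept before it is adopted) are all assumptions introduced for illustration; the `allowed` set models the kind of predefined rule mentioned above, such as a restriction to place nouns.

```python
from collections import Counter


def generate_searched_text(class_words, hypernyms, allowed=None, min_support=2):
    """Return superordinate concepts shared by at least min_support words of a class.

    hypernyms maps a word to a list of broader concepts (a toy concept dictionary).
    allowed, if given, restricts the result to an approved set of words.
    """
    support = Counter()
    for word in set(class_words):
        for concept in hypernyms.get(word, []):
            support[concept] += 1
    return sorted(
        concept
        for concept, n in support.items()
        if n >= min_support and (allowed is None or concept in allowed)
    )


# Toy concept dictionary, invented for this example.
toy_hypernyms = {
    "batter": ["baseball"],
    "strike": ["baseball", "bowling"],
    "curve": ["baseball"],
}
baseball_text = generate_searched_text(["batter", "strike", "curve"], toy_hypernyms)
```

Here "baseball" is supported by three class words and is kept, while "bowling" is supported by only one and is dropped.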

The video search interface unit 14 accepts the input or selection of a search term (15) by the user and displays the search result (16) to the user. The video search interface unit 14 implements this user interface by controlling, for example, the display device, keyboard, and mouse of a personal computer.

The search device 13 receives the input search term from the video search interface unit 14, finds the classes that match the search term, treats them as search result classes, reads the video data belonging to the search result classes from the video database 12, and passes it to the video search interface unit 14 as the search result 16.
The part of the processing in which the search device 13 finds matching classes based on the input search term uses the technology of text-oriented web search engines. Using this technology, the search device 13 searches the searched text corresponding to each class, described above, and treats each class that is hit as a search result class. There may be one search result class or several.
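The two-stage lookup, from search term to search result classes and from those classes to video data, can be sketched as follows. A case-insensitive substring match stands in for the web search engine technology mentioned above, which would normally involve indexing and ranking. The function name and the dictionary layouts are hypothetical.

```python
def search_videos(search_term, searched_text, video_db):
    """Return (search result classes, video references) for a search term.

    searched_text maps class name -> list of searched-text strings for that class.
    video_db maps class name -> list of video references stored under that class.
    """
    term = search_term.strip().lower()
    if not term:
        return [], []
    hit_classes = [
        cls for cls, texts in searched_text.items()
        if any(term in text.lower() for text in texts)
    ]
    # Collect the videos of every class that was hit, in class order.
    results = [ref for cls in hit_classes for ref in video_db.get(cls, [])]
    return hit_classes, results
```

Because the match is made against the searched text of a class rather than against the text attached to each video, a video can be returned even when its own document never mentions the search term.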

Instead of passing the video data itself to the video search interface unit 14 as the search result 16, the search device 13 may pass reference information for the video data hit by the search. As reference information, information that can identify the video data is used, for example a URL (Uniform Resource Locator) indicating the location of the video data file, or the database index value of the target video data in the video database 12.

With the video search device 400, even if the document 7 (text data) originally associated with the video data 1E does not contain words such as "gas range" or "batter's box", when the user enters such a word as a search term, the relevant video data (that is, video data showing a gas range, a batter's box, or the like) can be retrieved and shown to the user.
The above also describes a form in which a person assigns the searched text for each class. Even when this searched text is assigned manually, far less labor is required than with a method that assigns metadata to each item of video data individually, so efficiency is improved.

All or part of the identification model generation device, the identification processing device, the identification model generation and identification processing device, and the video search device of the embodiments described above, for example the video classification processing, probability calculation processing, identification processing, and search functions, may be implemented by a computer. In that case, a program for implementing these functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold the program for a certain period of time, such as the volatile memory inside a computer system serving as the server or client in that case. The program may implement only some of the functions described above, or may implement them in combination with a program already recorded in the computer system.

Several embodiments have been described above, but the present invention can also be carried out in the following modification.
For example, the same processing as described in the above embodiments may be performed using not only program video data but also general video data, image data, or the like with which some text data is associated.
Embodiments of the present invention have been described in detail above with reference to the drawings, but the specific configuration is not limited to these embodiments and includes designs and the like within a scope that does not depart from the gist of the present invention.

FIG. 1 is a block diagram showing the functional configuration of the identification model generation device (video identification processing device) according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing the data structure of the content database according to the same embodiment.
FIG. 3 is a schematic diagram showing the data structure and example data of the identification model database according to the same embodiment.
FIG. 4 is a flowchart showing the procedure of the first half (the processing that calculates document vectors and frequency vectors) of the processing of the probability calculation processing unit according to the same embodiment.
FIG. 5 is a block diagram showing the functional configuration of the identification processing device (video identification processing device) according to the same embodiment.
FIG. 6 is a flowchart showing the procedure of the identification processing performed by the identification processing unit of the same embodiment.
FIG. 7 is a block diagram showing the configuration of the identification model generation and identification processing device (video identification processing device) according to the same embodiment.
FIG. 8 is a flowchart showing the procedure of word extraction and frequency vector creation in the probability calculation processing unit and the identification processing unit according to the second embodiment of the present invention.
FIG. 9 is a block diagram showing the functional configuration of the identification model generation device (video identification processing device) according to the third embodiment of the present invention.
FIG. 10 is a block diagram showing the functional configuration of the identification model generation device (video identification processing device) according to the fourth embodiment of the present invention.
FIG. 11 is a block diagram showing the functional configuration of the identification model generation device (image identification processing device) according to the fifth embodiment of the present invention.
FIG. 12 is a block diagram showing the functional configuration of the video search device (video identification processing device) according to the sixth embodiment of the present invention.

Explanation of symbols

1 Program video data (video data)
1D Image data
1E Video data
2 Closed caption data (text data)
2B Program script data (text data)
2C Speech recognition result data (text data)
2D Web text data (text data)
3 Video classification processing unit
4, 4D Content database
5 Probability calculation processing unit
6 Identification model database (probability parameter value database)
7 Document (text data)
8, 8E Identification processing unit
9 Image class
12 Video database
13 Search device (search processing unit)
100, 100B, 100C Identification model generation device (video identification processing device)
100D Identification model generation device (image identification processing device)
200 Identification processing device (video identification processing device)
300 Identification model generation and identification processing device (video identification processing device)

Claims

A video identification processing device comprising:
a content database that holds video data and text data associated with the video data, classified by class;
a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class, in association with the word and the class; and
a probability calculation processing unit that calculates, based on the appearance frequency of words contained in the text data read from the content database, the probability parameter values representing the relationship between the words and the classes, and writes the calculated probability parameter values to the probability parameter value database in association with the words and the classes.

The video identification processing device according to claim 1, further comprising:
an identification processing unit that calculates, for each class, a class occurrence probability, which is the probability that input text data belongs to the class, based on the appearance frequency of words contained in the input text data and the probability parameter values read from the probability parameter value database.

The video identification processing device according to claim 1, further comprising:
a video classification processing unit that determines the class to which input video data belongs based on the video features of the input video data, and writes the video data and the text data associated with the video data to the content database.

A video identification processing device comprising:
a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class and calculated based on the appearance frequency of words contained in text data, in association with the word and the class; and
an identification processing unit that calculates, for each class, a class occurrence probability, which is the probability that input text data belongs to the class, based on the appearance frequency of words contained in the input text data and the probability parameter values read from the probability parameter value database.

The video identification processing device according to claim 4, further comprising:
a video database that holds video data classified by class; and
a search processing unit that obtains, based on an input search term, a search result class matching the search term and, by referring to the video database, outputs at least one of video data belonging to the search result class and reference information for video data belonging to the search result class,
wherein video data is associated with the input text data, and
the identification processing unit further determines a maximum likelihood class of the input text data based on the calculated class occurrence probabilities, and writes the video data to the video database so that the video data belongs to the determined maximum likelihood class.

The video identification processing device according to any one of claims 1 to 5, wherein the words are obtained by excluding particles, auxiliary verbs, and symbols from the result of morphological analysis processing of the text data.

The video identification processing device according to claim 1, wherein the words are nouns, verbs, adjectives, and unknown words extracted from the result of morphological analysis processing of the text data.

The video identification processing device according to claim 1, wherein the video data is video data of a broadcast program, and
the text data is closed caption text data of the broadcast program, text data of a script of the broadcast program associated with time information, or text data associated with time information that is obtained as a result of speech recognition processing based on the video data of the broadcast program.

An image identification processing device comprising:
a content database that holds web content data including image data and text data, classified by class;
a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class, in association with the word and the class; and
a probability calculation processing unit that calculates, based on the appearance frequency of words contained in the text data read from the content database, the probability parameter values representing the relationship between the words and the classes, and writes the calculated probability parameter values to the probability parameter value database in association with the words and the classes.

A computer program that causes a computer comprising
a content database that holds video data and text data associated with the video data, classified by class, and
a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class, in association with the word and the class,
to execute a probability calculation processing step of calculating, based on the appearance frequency of words contained in the text data read from the content database, the probability parameter values representing the relationship between the words and the classes, and writing the calculated probability parameter values to the probability parameter value database in association with the words and the classes.

A computer program that causes a computer comprising a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class and calculated based on the appearance frequency of words contained in text data, in association with the word and the class,
to execute an identification processing step of calculating, for each class, a class occurrence probability, which is the probability that input text data belongs to the class, based on the appearance frequency of words contained in the input text data and the probability parameter values read from the probability parameter value database.

A computer program that causes a computer comprising
a content database that holds web content data including image data and text data, classified by class, and
a probability parameter value database that holds probability parameter values, each representing a relationship between a word and a class, in association with the word and the class,
to execute a probability calculation processing step of calculating, based on the appearance frequency of words contained in the text data read from the content database, the probability parameter values representing the relationship between the words and the classes, and writing the calculated probability parameter values to the probability parameter value database in association with the words and the classes.