JP2010045486A

JP2010045486A - Still image extracting device, and still image extracting program

Info

Publication number: JP2010045486A
Application number: JP2008206751A
Authority: JP
Inventors: Naoki Kawai; 直樹河合; Masami Fujita; 昌巳藤田; Akira Nakamura; 章中村
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2008-08-11
Filing date: 2008-08-11
Publication date: 2010-02-25
Anticipated expiration: 2028-08-11
Also published as: JP4937211B2

Abstract

[Problem] To extract the most representative still image from the period during which the same caption is displayed.
[Solution] A still image extraction device that extracts a still image from video data accompanied by caption data in order to create thumbnail data comprises: frame extraction means for extracting, from the frames of the video data, a first frame at the time the caption data is newly displayed, a second frame at the time the displayed caption data is erased, and N frames (N is a natural number of 1 or more) between the first frame and the second frame; difference calculation means for obtaining the image difference between each pair of temporally adjacent frames among the (N + 2) frames extracted by the frame extraction means; and still image extraction means for identifying the two frames with the smallest image difference and extracting and recording, as a still image, the video data of either the earlier or the later of the two identified frames.
[Selected drawing] FIG. 1

Description

The present invention relates to a still image extraction device and a still image extraction program that extract still images from a moving image in order to create thumbnails.

Program data broadcast as television includes video data and audio data. Broadcast program data may also contain captions that are part of the video data itself, so the user cannot choose whether to display them (for example, program titles, cast introductions, or Japanese subtitles on foreign works). In addition, broadcast program data may contain caption data that the user can choose to display or hide. Such selectable caption data is generally called closed captioning. It was developed mainly for hearing-impaired viewers and, rather than carrying only the performers' dialogue as Japanese subtitles on foreign works do, it also includes descriptions of, for example, background music and sound effects.

For example, analog terrestrial broadcasting in the NTSC (National Television System Committee) format uses 525 scanning lines for the video signal. Of these 525 lines, the equivalent of the first 21 lines of each field (two fields make up one frame) is allocated to the interval for starting a scan, called the VBI (Vertical Blanking Interval). Closed captions are transmitted by multiplexing character codes onto line 21 of the VBI of each field. Using each field, two character sets are transmitted at about 60 characters per second.

For the transmission of caption information in digital television broadcasting, the Japanese domestic specifications, namely the operational guidelines and technical documents for digital terrestrial television broadcasting (ARIB TR-B14) and for BS/broadband CS digital broadcasting (ARIB TR-B15), provide that caption information can be transmitted at the same time as video information by using a transport stream for caption information. In a digital television receiver, the codes corresponding to the caption information are decoded, the characters and graphics that make up the caption are generated, and these are superimposed on the video for display. There is a technique that stores inside the receiver the text information transmitted alongside the video information via the caption transport stream and lets the user display and browse the stored text on the television monitor at any time, so that the user can make use of caption information without taking notes on the displayed captions (see, for example, Patent Document 1).

In recent years, television broadcasts have become viewable on terminals that users can carry, such as mobile phones and PDAs (Personal Digital Assistants). Programs recorded by a home recording device are also written to a recording medium that is mounted in a portable device and played back from that medium. Broadcast caption data is thus beginning to be used in a variety of services, and mechanisms are in place for television broadcasts themselves to be received by a variety of devices and for the received programs to be viewed.

However, although television broadcasts are received by various devices and broadcast programs are recorded on a recording medium and viewed on portable devices, the capacity of the recording medium that holds the programs is limited. When the medium must also be one that can be mounted in (or built into) a portable device, the usable capacity is limited even further.

As a result, viewing programs on a portable device has had the following problems: only a short program can be viewed (only part of a program), and if the compression ratio is raised to record a longer program, the video becomes coarse and image quality deteriorates. These problems have also hindered adoption. A function is therefore desired that lets even a portable device view long programs or multiple programs while preventing degradation of image quality.

To solve these problems and allow multiple programs or long programs to be viewed even on a portable device, an information processing device is known that records a television program designated from a mobile phone, converts the recorded program into thumbnail image data and text data associated with each other by time information, and displays them so that the content of the program can be grasped (see, for example, Patent Document 2).

Displaying the caption data (text data) that accompanies a video together with still images (thumbnail images) extracted from the video data makes it possible to browse or search video content such as broadcast programs in a way that is easy to understand. For this purpose, just the number of still images needed to understand the content, neither too many nor too few, must be extracted from the video. Extracting still images from long video data by hand is not practical, so the extraction must be automatic. A general way to extract still images from video data (moving image data) is to extract frames at a fixed time interval. As shown in FIG. 12, a frame extraction unit 51 cuts each frame out of the input video data and, referring to the output of a clock T, records frame data at a fixed time interval (for example, every minute) in a time/still image recording unit 52 together with the time information output by the clock T. In this way a predetermined number of still images can be extracted and recorded automatically even from long video data.
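
The fixed-interval approach of FIG. 12 can be sketched roughly as follows. This is only an illustration: the function name `fixed_interval_times` and the use of seconds as the time unit are assumptions made here and do not appear in the publication.

```python
import math
from typing import List


def fixed_interval_times(duration_s: float, interval_s: float) -> List[float]:
    """Return the times (seconds from the start) at which a frame would be recorded.

    Hypothetical helper: it models only the timing of the fixed-interval
    method, not the frame capture or the recording unit.
    """
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration_s and interval_s must be positive")
    # One still at t = 0, then one every interval_s until the video ends.
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]


# A one-hour program sampled once per minute.
times = fixed_interval_times(3600, 60)
print(len(times), times[:3])
```

The number of stills depends only on the chosen interval and not on what the video shows, which is the weakness the following paragraphs discuss.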

However, this method has a problem: if the interval at which frames are cut out is shortened, more still images are extracted than necessary, and if the interval is lengthened, necessary still images are not extracted. To solve this problem, a television receiver is known that extracts a still image when new caption information is received (see, for example, Patent Document 3). By using the caption information service, it greatly reduces the image data to be recorded and enables digest recording with an extremely small amount of data.
Patent Document 1: JP 2003-078889 A
Patent Document 2: JP 2006-253960 A
Patent Document 3: JP 2007-006308 A

However, with the television receiver of Patent Document 3, a period in which the same caption is displayed may contain several scenes, and the still image of the frame at which new caption information is received is not necessarily the representative still image for the scenes shown while that caption is displayed.

The present invention has been made in view of these circumstances, and its object is to provide a still image extraction device and a still image extraction program that can extract the most representative still image from the period during which the same caption is displayed.

The present invention is a still image extraction device that extracts a still image from video data accompanied by caption data in order to create thumbnail data, comprising: frame extraction means for extracting, from the frames of the video data, a first frame at the time the caption data is newly displayed, a second frame at the time the displayed caption data is erased, and N frames (N is a natural number of 1 or more) between the first frame and the second frame; difference calculation means for obtaining the image difference between each pair of temporally adjacent frames among the (N + 2) frames extracted by the frame extraction means; and still image extraction means for identifying the two frames with the smallest image difference obtained by the difference calculation means and extracting and recording, as a still image, the video data of either the earlier or the later of the two identified frames.
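
The selection step described above can be sketched as follows. This is a minimal illustration, not the implementation in the publication: the text at this point does not say how the image difference is measured, so the sum of absolute pixel differences used here is an assumption, and the names `frame_difference` and `select_representative` are hypothetical. Frames are modeled as flat sequences of pixel values.

```python
from typing import List, Sequence


def frame_difference(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute pixel differences (an assumed metric, not from the source)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b))


def select_representative(frames: List[Sequence[int]], prefer_later: bool = False) -> int:
    """Return the index of the representative frame among the (N + 2) candidates.

    The candidates must be in time order. The adjacent pair with the smallest
    difference is found, and the earlier or later frame of that pair is chosen.
    """
    if len(frames) < 3:  # N >= 1, so there are at least 3 candidates
        raise ValueError("at least three candidate frames are required")
    diffs = [frame_difference(frames[i], frames[i + 1])
             for i in range(len(frames) - 1)]
    best_pair = min(range(len(diffs)), key=diffs.__getitem__)
    return best_pair + 1 if prefer_later else best_pair


# Four candidates (N = 2); the middle two are nearly identical.
candidates = [[0, 0], [10, 10], [11, 10], [50, 50]]
print(select_representative(candidates))                     # earlier frame of the pair
print(select_representative(candidates, prefer_later=True))  # later frame of the pair
```

When two adjacent pairs tie for the smallest difference, this sketch takes the earliest pair; the source does not state a tie-breaking rule.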

The present invention further comprises punctuation extraction means for extracting punctuation marks from the caption data, wherein the frame extraction means extracts the (N + 2) frames only when the punctuation extraction means detects a period (kuten) or a comma (touten) in the caption data.
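
The punctuation condition can be illustrated with a short check. This is a sketch under assumptions: the helper name `has_punctuation` is hypothetical, and only the Japanese period 「。」 and comma 「、」 are treated as punctuation here; whether other marks should count is not stated at this point in the text.

```python
# Marks treated as a period (kuten) and a comma (touten) in this sketch.
PUNCTUATION_MARKS = ("。", "、")


def has_punctuation(caption_text: str) -> bool:
    """Return True if the caption text contains a period or a comma."""
    return any(mark in caption_text for mark in PUNCTUATION_MARKS)


# Frame extraction would run only for captions that pass this check.
for text in ["日本の庭園です。", "日本の庭園"]:
    print(text, has_punctuation(text))
```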

The present invention is also a still image extraction program that causes a computer to extract a still image from video data accompanied by caption data in order to create thumbnail data, the program causing the computer to perform: a frame extraction step of extracting, from the frames of the video data, a first frame at the time the caption data is newly displayed, a second frame at the time the displayed caption data is erased, and N frames (N is a natural number of 1 or more) between the first frame and the second frame; a difference calculation step of obtaining the image difference between each pair of temporally adjacent frames among the (N + 2) frames extracted in the frame extraction step; and a still image extraction step of identifying the two frames with the smallest image difference obtained in the difference calculation step and extracting and recording, as a still image, the video data of either the earlier or the later of the two identified frames.

The present invention further causes the computer to perform a punctuation extraction step of extracting punctuation marks from the caption data, wherein the frame extraction step extracts the (N + 2) frames only when the punctuation extraction step detects a period (kuten) or a comma (touten) in the caption data.

According to the present invention, just the number of still images needed to understand the content of video data can be extracted from the video data, and in particular the most representative still image can be extracted from the period during which the same caption is displayed.

A still image extraction device according to an embodiment of the present invention is described below with reference to the drawings. First, the receiving and distribution system to which the still image extraction device is applied is described. FIG. 1 is a block diagram showing the configuration of this embodiment. In the figure, reference numeral 1 denotes a distribution device that distributes video, audio, text, and the like. Reference numeral 2 denotes a control unit that performs overall control of the processing of the distribution device 1. Reference numeral 3 denotes an input/output unit that inputs content data and inputs and outputs various other data. Reference numeral 4 denotes a storage unit that stores text/thumbnail distribution and display software 9, video/audio/text distribution and display software 10, content data 11, 12, and 13, distribution and display software management data 14, content management data 15, user management data 16, content/user-added data 17, and the like. Reference numeral 5 denotes a text/thumbnail distribution unit that distributes text and thumbnails. Reference numeral 6 denotes a video/audio/text distribution unit that distributes video, audio, and text. Reference numeral 7 denotes a content/user-added data receiving and distribution unit that receives data such as comments and ratings that users attach to content and distributes the aggregated data. Reference numeral 8 denotes a communication unit that broadcasts or communicates information.

The text/thumbnail distribution and display software 9 is software that implements the distribution and display of text and thumbnails. The video/audio/text distribution and display software 10 is software that implements the distribution and display of video, audio, and text. The content data 11 is content data made up of video, audio, text, thumbnails, and other data, each of which is distributed from the distribution device 1 via the communication unit. The content data 11 consists of: video data (moving image data) 18, which also covers a sequence of still images that switch over time; audio data 19, which is synchronized in time with the video; text data 20, which is synchronized in time with the video or audio, such as data obtained by converting television captions into text; thumbnail data 21, which is still image data produced by reducing the video; and other data 22, which is synchronized in time with the video or audio and is broadly divided into overall information data, such as the name and summary of the whole content, and individual information data, such as descriptions of particular scenes, background music, and title names.

The distribution and display software management data 14 is data for managing the text/thumbnail distribution and display software and the video/audio/text distribution and display software. The content management data 15 is data for managing the content data 11 to 13. The user management data 16 is data for managing user information. The content/user-added data 17 is data such as comments and ratings that users attach to content.

Reference numeral 23 denotes a receiving device that receives and displays the video, audio, text, and the like distributed by the distribution device 1; it is, for example, a mobile phone terminal. Reference numeral 24 denotes a control unit that performs overall control of the processing of the receiving device 23. Reference numeral 25 denotes an input unit that consists of dial keys, function keys, and the like and serves as the man-machine interface with the user. Reference numeral 26 denotes a display device, such as a liquid crystal display, that shows content data and the like to the user. Reference numeral 27 denotes a communication unit that receives broadcasts and communicates with the distribution device 1. Reference numeral 28 denotes a storage unit that stores text/thumbnail display software 38, video/audio/text display software 39, content data 40, 41, and 42, and the like.

Reference numeral 29 denotes a text/thumbnail display unit that uses the text/thumbnail display software 38 to read the text data, thumbnail data, and the like in the content data from the storage unit 28 via a data buffer 30 and display them on the display device 26. It consists of a text display unit 31 and a thumbnail display unit 32. The text display unit 31 displays the text data in the content data on the display device 26. It uses text data stored in the storage unit 28 or obtained via the communication unit 27, and performs display control such as line scrolling and page turning in response to operations on the input unit 25. The thumbnail display unit 32 displays the thumbnail data and other data in the content data, using data stored in the storage unit 28 or obtained via the communication unit 27. Of the thumbnail data and other data, the individual information data is displayed in sync with the text data.

Reference numeral 33 denotes a video/audio/text display unit that uses the video/audio/text display software 39 to read the video data, audio data, text data, and the like in the content data from the storage unit 28 via a data buffer 34 and display them on the display device 26. It consists of a video display unit, an audio output unit, and a text display unit. The video display unit 35 displays the video data in the content data. It uses video data stored in the storage unit 28 or obtained via the communication unit 27, and performs display control such as play, stop, fast forward, and rewind in response to operations on the input unit 25. The audio output unit 36 outputs the audio data in the content data, using audio data stored in the storage unit 28 or obtained via the communication unit 27. The audio data is either output in sync with the video data, or output first with the video following it in sync. The text display unit 37 displays the text data and other data in the content data, using data stored in the storage unit 28 or obtained via the communication unit 27. Of the text data and other data, the individual information data is displayed in sync with the video data or the audio data.

The text/thumbnail display software 38 is software that implements the display of the text data, thumbnail data, and the like in the content data on the display device 26. It is obtained from the distribution device 1 via the communication unit 27 and stored in the storage unit 28. When content data is viewed through the text/thumbnail display unit 29 in response to a user request, this software is called and used. The text/thumbnail display software 38 is revised through version upgrades, and the latest version is obtained from the distribution device 1 via the communication unit 27 as it becomes available.

The video/audio/text display software 39 is software that implements the display of the video data, audio data, text data, and the like in the content data on the display device 26. It is obtained from the distribution device 1 via the communication unit 27 and stored in the storage unit 28. When content data is viewed through the video/audio/text display unit 33 in response to a user request, this software is called and used. The video/audio/text display software 39 is revised through version upgrades, and the latest version is obtained from the distribution device 1 via the communication unit 27 as it becomes available.

Next, the content browsing functions of the receiving device 23 shown in FIG. 1 are described with reference to FIG. 11, which illustrates these functions. The receiving device 23 has three functions: a metadata search function, a reading mode function, and a viewing mode function.

(1) Metadata search function
This function lets the user reach the desired point in a program directly by entering a keyword. The metadata is created from the caption data and provided as text. For example, entering "teien" (ていえん, the reading of the word for "garden") as the search keyword searches the metadata of all programs and leads to the scene in a program that matches "Japanese garden" (日本の庭園).

(2) Reading mode function
This function displays text (caption data) and still images (thumbnails) in association with each other so that information can be browsed quickly, with scrolling and page turning. Because the text and still images are downloaded in advance, they can be browsed "anytime, anywhere", even where there is no radio reception.

(3) Viewing mode function
When a piece of text is clicked in reading mode, this function plays the video from that point; it provides fast forward, rewind, and search. Video playback can either play video content stored in advance in the receiving device 23 or play a stream received via communication or broadcasting. With streaming, long videos can be played, and the point at which viewing was interrupted is stored so that viewing can easily be resumed.

The reading mode and the viewing mode can be switched in conjunction with each other, so the reading mode can be used to search for a video the user wants to watch.

Next, with reference to FIG. 2, the still image extraction method that generates the thumbnail data 21 of the content data 11 stored in the storage unit 4 shown in FIG. 1 is described. The thumbnail data may be generated inside the distribution device 1, or it may be generated on a server or other equipment separate from the distribution device 1, input through the input/output unit 3, and stored in the storage unit 4. The description here assumes that the text/thumbnail distribution unit 5 generates the thumbnail data (extracts still images) using the video data 18 and the text data (caption data) 20 of the content data 11.

FIG. 2 is a block diagram showing the function that extracts a representative still image for the period during which one caption, synchronized in time with the video data 18, is displayed. First, a caption page extraction unit 53 reads the text data 20 of a caption page that is synchronized in time with the video data 18. The text data of the caption page includes the display start time of the caption (elapsed time from the start of the video data) and the erase time of the displayed caption (elapsed time from the start of the video data). The caption page extraction unit 53 outputs the display start time and the erase time to the frame extraction unit 51.

The frame extraction unit 51 extracts from the video data 18 the candidate frames from which the representative still image for the period of one caption is chosen. In doing so, it reads the N value stored in advance in an N value storage unit 50. The N value defines how many candidate frames are extracted while the same caption is displayed; an N value of 2 means that two frames are extracted from within that period. The N value does not count the frame at the caption display start time or the frame at the caption erase time, and a value suited to still image extraction is stored in advance.

When extracting candidate frames, the frame extraction unit 51 extracts the frame at the caption display start time (the first frame in the claims), the frame at the erase time of that caption (the second frame in the claims), and N frames spaced at equal time intervals within the period during which the caption is displayed. That is, if N is 2, four frames (N + 2) are extracted for one caption display period.

The processing operation of the frame extraction unit 51 will now be described with reference to FIG. 4, which illustrates that operation. In FIG. 4, the frame at which the display section of the n-th caption (n is a natural number), caption_n, ends is denoted frame F_m (the erase frame of caption_n); the time at which display of the new caption_{n+1} starts is denoted TC_IN; and the time at which caption_{n+1} is erased is denoted TC_OUT. The frames between TC_IN and TC_OUT (the display section of caption_{n+1}) are the frames from which the representative still image is extracted. The description here assumes that the N value stored in the N value storage unit 50 is 2.

First, the frame extraction unit 51 extracts frame F_{m+1} at time TC_IN (the display start frame of caption_{n+1}) and frame F_{m+z} at time TC_OUT (the erase frame of caption_{n+1}). The frame extraction unit 51 then subtracts TC_IN from TC_OUT (TC_OUT - TC_IN) to obtain the time Time during which caption_{n+1} is displayed, and calculates the time interval between extracted frames, Time / (N + 1), from the N value. Since the N value is 2 here, the interval is Time divided by 3.

Next, the frame extraction unit 51 adds the interval Time / (N + 1) to time TC_IN to obtain frame extraction time TC_1, and adds the same interval to TC_1 to obtain frame extraction time TC_2. The frame extraction unit 51 then extracts frame F_{m+x}, whose time information is closest to TC_1, and frame F_{m+y}, whose time information is closest to TC_2. Since the N value is 2, two frames are extracted in this step. Through this operation, (N + 2) frames (F_{m+1}, F_{m+x}, F_{m+y}, F_{m+z}) have been extracted.
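
The sampling rule described above can be sketched in Python as follows. This is only an illustrative sketch: the function names `candidate_times` and `nearest_frame_index`, and the representation of frames as a list of timestamps in seconds, are assumptions made for the example and do not appear in the specification.

```python
def candidate_times(tc_in, tc_out, n):
    """Return the N + 2 sampling times: TC_IN, N interior times, TC_OUT."""
    if n < 1:
        raise ValueError("N must be a natural number of 1 or more")
    if tc_out < tc_in:
        raise ValueError("TC_OUT must not precede TC_IN")
    step = (tc_out - tc_in) / (n + 1)  # Time / (N + 1)
    interior = [tc_in + k * step for k in range(1, n + 1)]  # TC_1 .. TC_N
    return [tc_in] + interior + [tc_out]


def nearest_frame_index(frame_times, target):
    """Index of the frame whose time information is closest to target."""
    return min(range(len(frame_times)), key=lambda i: abs(frame_times[i] - target))


# Hypothetical caption shown from 10 s to 16 s, with N = 2.
times = candidate_times(10.0, 16.0, 2)
```

With TC_IN = 10 s, TC_OUT = 16 s, and N = 2, the interval is 2 s and the four sampling times are 10, 12, 14, and 16 s.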

Returning to FIG. 2, the frame extraction unit 51 outputs the frame data of the extracted (N + 2) frames to the difference calculation unit 54 and the still image extraction unit 59. The output frame data includes the time information attached to each frame. On receiving it, the difference calculation unit 54 obtains the image difference for each pair of temporally adjacent frames among the (N + 2) frames. In the example of FIG. 4, three difference values are obtained: D1 between frames F_{m+1} and F_{m+x}, D2 between frames F_{m+x} and F_{m+y}, and D3 between frames F_{m+y} and F_{m+z}. This is the case for an N value of 2; in general, the number of difference values obtained is N + 1.

For the difference calculation between two frames, the difference calculation unit 54 may simply subtract corresponding pixel values. Alternatively, it may apply a technique that exploits the similarity of successive frames to reduce the amount of data, as in the motion-compensated prediction used when encoding video for data compression (see, for example, Digital Broadcasting Handbook, Ohmsha, Part 2 Compression Technology, Chapter 1 Video Coding Technology): the amount of motion (motion vector) relative to a reference frame (the temporally preceding frame) is detected, and the difference is taken after shifting the image according to the magnitude of that motion.
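
The simple pixel-wise method can be sketched as follows. The specification says only that corresponding pixel values are subtracted; aggregating the result as a sum of absolute differences, representing a frame as a nested list of grayscale values, and the function name `frame_difference` are all assumptions made for this example.

```python
def frame_difference(frame_a, frame_b):
    """Sum of absolute per-pixel differences between two grayscale frames.

    Each frame is a list of rows, each row a list of pixel values.
    The aggregation by summing absolute values is an assumption.
    """
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same dimensions")
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )


# Two tiny hypothetical 2x2 frames that differ in two pixels.
d = frame_difference([[0, 10], [20, 30]], [[0, 12], [25, 30]])
```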

In this way, the difference can be judged to be small when, for example, the scene itself has not changed and only a person in the scene is moving. Obtaining the difference with such a method makes it possible to determine whether the scene has changed. For the difference calculation, the difference calculation unit 54 can use any known method capable of detecting that the correlation between two frames is high (that is, that the scene has not changed).
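
The idea of shifting the image before taking the difference can be illustrated with the sketch below. It is a deliberately simplified stand-in for motion-compensated prediction: it searches a single global translation exhaustively instead of estimating block-based motion vectors as a video encoder would. The function name `shifted_difference` and the parameter `max_shift` are hypothetical.

```python
def shifted_difference(ref, cur, max_shift=1):
    """Smallest mean absolute difference over global shifts of up to max_shift.

    ref and cur are equally sized, non-empty grayscale frames (lists of rows).
    Only the overlapping region is compared for each candidate shift.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            for y in range(height):
                for x in range(width):
                    ry, rx = y + dy, x + dx
                    if 0 <= ry < height and 0 <= rx < width:
                        total += abs(cur[y][x] - ref[ry][rx])
                        count += 1
            if count:
                mean = total / count
                if best is None or mean < best:
                    best = mean
    return best


# A bright pixel moves one position to the right between the two frames.
ref = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
cur = [[0, 0, 0], [0, 0, 9], [0, 0, 0]]
```

Without any shift the two frames differ, but once a one-pixel shift is allowed the difference drops to zero, which matches the behaviour described above for a scene in which only the subject moves.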

Next, the difference calculation unit 54 outputs the three difference values D1, D2, and D3 to the still image extraction unit 59 in order of frame time, earliest first. On receiving them, the still image extraction unit 59 compares the absolute values of D1, D2, and D3, finds the smallest difference value, and identifies the two frames that produced it. The still image extraction unit 59 then extracts the temporally earlier of the two identified frames. In other words, the time Time shown in FIG. 4 is divided into three sections, and a small difference between the first and last frames of a section means that the scene did not change, or that a similar scene continued, for a certain length of time; that frame is therefore regarded as the representative still image for the time Time and is extracted.

Because the two identified frames belong to a period in which the scene did not change or a similar scene continued, the temporally later of the two may instead be extracted as the representative still image.

For example, as shown in FIG. 4, frame F_{m+1} shows scene "B" while frame F_{m+x} shows scene "C", so the difference value between these two frames is large. Likewise, frame F_{m+y} shows scene "C" while frame F_{m+z} shows scene "D", so the difference value between these two frames is also large. Frames F_{m+x} and F_{m+y}, however, both show scene "C", although at different positions in the image, so the difference value obtained with the motion-compensated prediction described above is small. In the example of FIG. 4, frame F_{m+x} is therefore extracted as the representative still image.
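
The selection step can be sketched as follows, using the FIG. 4 scenario with made-up difference values. The function name `pick_representative`, the `prefer` parameter, and the numbers 90, 5, and 80 are assumptions for illustration only.

```python
def pick_representative(frames, diffs, prefer="earlier"):
    """Pick the representative frame from N + 2 candidates.

    frames: candidate frames in time order (length N + 2).
    diffs:  differences between adjacent candidates (length N + 1),
            where diffs[k] compares frames[k] and frames[k + 1].
    prefer: "earlier" or "later" frame of the least-different pair.
    """
    if len(diffs) != len(frames) - 1:
        raise ValueError("expected one difference per adjacent frame pair")
    if prefer not in ("earlier", "later"):
        raise ValueError("prefer must be 'earlier' or 'later'")
    # Index of the pair with the smallest absolute difference.
    k = min(range(len(diffs)), key=lambda i: abs(diffs[i]))
    return frames[k] if prefer == "earlier" else frames[k + 1]


candidates = ["F_m+1", "F_m+x", "F_m+y", "F_m+z"]
example_diffs = [90, 5, 80]  # hypothetical D1, D2, D3
chosen = pick_representative(candidates, example_diffs)
```

Here D2 is the smallest value, so the pair (F_{m+x}, F_{m+y}) is identified and the earlier frame F_{m+x} is chosen, as in the FIG. 4 example; passing `prefer="later"` selects F_{m+y} instead.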

Next, the still image extraction unit 59 records the extracted frame data, together with the time information attached to the frame (elapsed time from the beginning of the video data), in the time/still image recording unit 52. As a result, the representative still image for the display section of caption_{n+1} shown in FIG. 4 (frame F_{m+x} in that example) is extracted and recorded in the time/still image recording unit 52. The frame extraction unit 51 and the caption page extraction unit 53 then repeat the above operation for the display section of caption_{n+2} to extract its representative still image. Repeating this operation to the end of the video data yields, for each caption, the most representative still image among the frames displayed with that caption. The representative still images extracted in this way become the thumbnail data 21 stored in the storage unit 4.

In this way, the fact that text data (captions) accompanies the video data 18 is used to extract the most representative still image for each caption, so that the still images needed to understand the content of the video data can be extracted in just the right quantity, neither too many nor too few. Because the captioned portions are where spoken narration or explanation occurs, the still images needed to understand video content such as a broadcast program can be extracted.

Next, a modification of the functional blocks shown in FIG. 2 will be described with reference to FIG. 3. The functional block diagram of FIG. 3 differs from that of FIG. 2 in that a punctuation extraction unit 60 is added. The punctuation extraction unit 60 looks for the Japanese comma "、" or full stop "。" in the text data 20 of the caption page read by the caption page extraction unit 53 and, if punctuation is found, notifies the frame extraction unit 51. The frame extraction unit 51 performs the (N + 2)-frame extraction described above only when the punctuation extraction unit 60 has detected a comma or full stop in the caption text data; when none is detected, it skips the (N + 2)-frame extraction and moves on to the next caption data. The processing after the (N + 2)-frame extraction is the same as described above, so a detailed description is omitted here.

In this way, a scene displayed with a caption that contains punctuation is regarded as likely to include a representative still image useful for understanding the content, and the representative still image extraction is executed; a scene displayed with a caption that contains no punctuation is regarded as unlikely to include such an image, and the extraction is skipped. Representative still images can therefore be extracted efficiently.
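
The punctuation gate can be sketched as follows. The function names `has_punctuation` and `captions_to_process`, and the sample caption strings, are assumptions made for this example.

```python
PUNCTUATION_MARKS = ("、", "。")  # Japanese comma and full stop


def has_punctuation(caption_text):
    """True if the caption text contains a Japanese comma or full stop."""
    return any(mark in caption_text for mark in PUNCTUATION_MARKS)


def captions_to_process(captions):
    """Keep only the captions for which frame extraction would be run."""
    return [text for text in captions if has_punctuation(text)]


# Hypothetical caption pages: only the second one contains punctuation.
selected = captions_to_process(["こんにちは", "今日は、晴れです。"])
```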

Next, another still image extraction method will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the function of taking video data as input and extracting still images according to the video difference. First, the frame extraction unit 51 reads the video data 18 and cuts out each frame. In parallel, the difference calculation unit 54 reads the video data 18, cuts out each frame, and calculates the difference between successive frames. When the calculated difference is large, the difference calculation unit 54 instructs the frame extraction unit 51 to record the current frame in the time/still image recording unit 52, and instructs the clock T to record the current time (elapsed time from the beginning of the video data) in the time/still image recording unit 52.

On receiving the instruction, the frame extraction unit 51 records the frame at the time of the instruction in the time/still image recording unit 52. The clock T records the time information for the moment of the instruction in association with the frame data (still image) recorded in the time/still image recording unit 52. By repeating this operation to the end of the video data 18, multiple items of thumbnail data, each associating an extracted still image with its time information, are recorded in the time/still image recording unit 52. When processing has reached the end of the video data 18, the time/still image recording unit 52 stores the recorded thumbnail data in the storage unit 4. This thumbnail data 21 is used for the still images of the reading mode function shown in FIG. 11.

In this way, the image difference between video frames is calculated, and a still image is extracted when the image difference from one frame exceeds a certain amount. Still images extracted where the video does not change show the same content and are therefore of little use for understanding the content, whereas portions where the video changes contain different content and are more likely to include the video information needed to understand it.
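
The change-triggered extraction can be sketched as follows. The function name `extract_on_change`, the pluggable `diff` callable, and the use of single numbers as stand-ins for frames are assumptions made to keep the example short; the specification does not state a particular threshold.

```python
def extract_on_change(frames, times, threshold, diff):
    """Record (time, frame) whenever a frame differs strongly from its predecessor.

    frames:    frames in time order.
    times:     elapsed time of each frame from the start of the video.
    threshold: difference above which the current frame is recorded.
    diff:      callable returning the difference between two frames.
    """
    if len(frames) != len(times):
        raise ValueError("frames and times must have the same length")
    recorded = []
    for i in range(1, len(frames)):
        if diff(frames[i - 1], frames[i]) > threshold:
            recorded.append((times[i], frames[i]))
    return recorded


# Scalars stand in for frames; the difference is their absolute gap.
stills = extract_on_change(
    frames=[10, 11, 50, 52, 5],
    times=[0.0, 0.1, 0.2, 0.3, 0.4],
    threshold=20,
    diff=lambda a, b: abs(a - b),
)
```

Each recorded entry pairs the still image with its time, mirroring the association kept in the time/still image recording unit 52.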

Next, another still image extraction method will be described with reference to FIG. 6. FIG. 6 is a block diagram showing the function of taking video data as input and extracting still images by combining a time interval with the magnitude of the video difference. First, the frame extraction unit 55 reads the video data 18 and, referring to the time information output by the clock TA, cuts out a frame at each relatively short fixed time interval and outputs it to the frame extraction unit 51 and the difference calculation unit 54. The difference calculation unit 54 then calculates the difference between successive frames. When the calculated difference is large, the difference calculation unit 54 instructs the frame extraction unit 51 to record the current frame in the time/still image recording unit 52, and instructs the clock T to record the current time (elapsed time from the beginning of the video data) in the time/still image recording unit 52.

On receiving the instruction, the frame extraction unit 51 records the frame at the time of the instruction in the time/still image recording unit 52. The clock T records the time information for the moment of the instruction in association with the frame data (still image) recorded in the time/still image recording unit 52. By repeating this operation to the end of the video data 18, multiple items of thumbnail data, each associating an extracted still image with its time information, are recorded in the time/still image recording unit 52. When processing has reached the end of the video data 18, the time/still image recording unit 52 stores the recorded thumbnail data in the storage unit 4. This thumbnail data 21 is used for the still images of the reading mode function shown in FIG. 11.

In this way, frames are first sampled at a short fixed time interval, and those showing little video change are then excluded from extraction. Still images with little change are redundant and can thus be omitted.

Next, another still image extraction method will be described with reference to FIG. 7. FIG. 7 is a block diagram showing the function of taking video data as input and extracting still images by combining a time interval with the magnitude of the video difference. First, the frame extraction unit 56 reads the video data 18 and, referring to the time information output by the clock TA, cuts out a frame at each relatively long fixed time interval and records it in the time/still image recording unit 52 in association with the time information output by the clock TA. In parallel, the frame extraction unit 51 reads the video data 18 and cuts out each frame, while the difference calculation unit 54 reads the video data 18, cuts out each frame, and calculates the difference between successive frames. When the calculated difference is large, the difference calculation unit 54 instructs the frame extraction unit 51 to record the current frame in the time/still image recording unit 52, and instructs the clock T to record the current time (elapsed time from the beginning of the video data) in the time/still image recording unit 52.

In this way, still images are extracted at a relatively long fixed time interval and, whenever the video difference is judged to exceed a certain level, the still image at that moment is extracted and added. Still images are thus obtained at regular intervals, supplemented by important still images from portions where the video changes.
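
Combining the two triggers can be sketched as follows. The function name `periodic_plus_change`, the precomputed `diffs` list, and the sample values are assumptions; the specification gives no concrete interval or threshold.

```python
def periodic_plus_change(times, diffs, interval, threshold):
    """Indices of frames kept by a fixed interval or by a large change.

    times:     elapsed time of each frame, in ascending order.
    diffs:     diffs[i] is the difference between frame i - 1 and frame i
               (diffs[0] is unused).
    interval:  the relatively long fixed sampling interval.
    threshold: difference above which an extra still is added.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if len(times) != len(diffs):
        raise ValueError("times and diffs must have the same length")
    picked = set()
    next_tick = times[0] if times else 0.0
    for i, t in enumerate(times):
        if t >= next_tick:  # periodic sample
            picked.add(i)
            while next_tick <= t:
                next_tick += interval
        if i > 0 and diffs[i] > threshold:  # extra sample on a large change
            picked.add(i)
    return sorted(picked)


kept = periodic_plus_change(
    times=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    diffs=[0, 1, 30, 2, 1, 3],
    interval=4.0,
    threshold=20,
)
```

In this example frames 0 and 4 come from the fixed interval and frame 2 is added because of the large change.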

Next, another still image extraction method will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the function of taking video data as input and extracting still images by combining a time interval with the number of caption pages. First, the frame extraction unit 56 reads the video data 18 and, referring to the time information output by the clock TA, cuts out a frame at each relatively long fixed time interval and records it in the time/still image recording unit 52 in association with the time information output by the clock TA. In parallel, the frame extraction unit 51 reads the video data 18 and cuts out each frame, while the caption page count determination unit 57 determines the number of caption pages (the number of caption updates) for the text data temporally linked to the video data 18. If the number of caption pages (caption updates) exceeds a predetermined value, the caption page count determination unit 57 instructs the frame extraction unit 51 to record the current frame in the time/still image recording unit 52, and instructs the clock T to record the current time (elapsed time from the beginning of the video data) in the time/still image recording unit 52.

In this way, still images are extracted at a relatively long fixed time interval and, when the number of caption pages (the number of times the caption has been updated) exceeds a predetermined value, the still image at that moment is extracted and added. Still images can thus be added so as to keep a balance with the captions.
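
One possible reading of this rule is sketched below. The specification does not say over which span the caption pages are counted; the sketch assumes they are counted since the most recent extracted still, and that all times are whole seconds. The function name `stills_with_caption_budget` and its parameters are hypothetical.

```python
def stills_with_caption_budget(duration, interval, caption_starts, max_pages):
    """Times at which stills are taken: periodic ticks plus caption-driven extras.

    duration:       length of the video in whole seconds.
    interval:       relatively long fixed sampling interval in whole seconds.
    caption_starts: times at which a new caption page appears.
    max_pages:      caption pages allowed since the last still (assumed rule).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    # Kind 0 = periodic tick, kind 1 = caption update; ticks sort first on ties.
    events = [(t, 0) for t in range(0, duration, interval)]
    events += [(t, 1) for t in caption_starts if 0 <= t < duration]
    stills = []
    pages_since_still = 0
    for t, kind in sorted(events):
        if kind == 1:
            pages_since_still += 1
            if pages_since_still <= max_pages:
                continue
        # Either a periodic tick or the caption budget was exceeded.
        if not stills or stills[-1] != t:
            stills.append(t)
        pages_since_still = 0
    return stills


extra = stills_with_caption_budget(
    duration=60, interval=30, caption_starts=[2, 5, 9, 12, 40], max_pages=3
)
```

Here the fourth caption update at 12 s exceeds the budget of three pages, so a still is added there in addition to the periodic stills at 0 s and 30 s.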

Next, another still image extraction method will be described with reference to FIG. 9. FIG. 9 is a block diagram showing the function of taking video data as input and extracting still images by combining caption pages with the video difference. First, the frame extraction unit 56 reads the video data 18 and cuts out each frame. In parallel, the caption page extraction unit 53 extracts the caption pages whose text data is temporally linked to the video data 18. Each time the caption page is updated, the caption page extraction unit 53 instructs the frame extraction unit 56 to record the current frame in the time/still image recording unit 52, and instructs the clock TB to record the current time (elapsed time from the beginning of the video data) in the time/still image recording unit 52. On receiving the instruction, the frame extraction unit 56 records the frame at the time of the instruction in the time/still image recording unit 52. The clock TB records the time information for the moment of the instruction in association with the frame data (still image) recorded in the time/still image recording unit 52.

Meanwhile, the frame extraction unit 51 reads the video data 18 and cuts out each frame. In parallel, the difference calculation unit 54 reads the video data 18 and, if the time interval determined by the time interval determination unit 58 equals a predetermined time, cuts out each frame and calculates the difference between successive frames. When the calculated difference is large, the difference calculation unit 54 instructs the frame extraction unit 51 to record the current frame in the time/still image recording unit 52, and instructs the clock TC to record the current time in the time/still image recording unit 52. On receiving the instruction, the frame extraction unit 51 records the frame at the time of the instruction in the time/still image recording unit 52. The clock TC records the time information for the moment of the instruction in association with the frame data (still image) recorded in the time/still image recording unit 52.

By repeating the above operation to the end of the video data 18, multiple items of thumbnail data, each associating an extracted still image with its time information, are recorded in the time/still image recording unit 52. When processing has reached the end of the video data 18, the time/still image recording unit 52 stores the recorded thumbnail data in the storage unit 4. This thumbnail data 21 is used for the still images of the reading mode function shown in FIG. 11.

In this way, the fact that text data (captions) accompanies the video data 18 is used to extract a still image each time the caption display is updated; in addition, the image difference between video frames is calculated and a still image is added when the image difference from one frame exceeds a certain amount. Still images of high importance can thus be collected.

The three extraction methods described above, namely extraction at fixed time intervals, extraction for each caption display page, and extraction when the video change is large, have the following characteristics. With extraction at fixed time intervals, a short interval produces unnecessarily many still images, while a long interval may miss necessary ones. With extraction for each caption display page, very sparse captions may yield too few still images, leaving out still images of scenes needed for understanding. With extraction when the video change is large, still images with different content are more likely to be selected, but when a scene of a person talking is visually static, the still images may be extremely few relative to the captions being spoken. Conversely, extracting still images when the video change is small may produce too many useless still images.

With any one of these three methods alone it is difficult to extract a suitable set of still images, but combining them makes it possible to extract the minimum necessary still images automatically.

As described above, the still image extraction method according to the present invention can efficiently extract only still images of high importance. By making the extracted still images viewable in the reading mode function, content that includes moving images, such as television video, can be browsed and understood quickly through these still images. The still images can also be used to browse and search for desired information in the content. After searching with the still images, the user can switch to the viewing mode function to watch the relevant moving image scene, such as television video. In particular, because each still image is stored in association with the time at which it was extracted, the still image time and the moving image time can be linked as shown in FIG. 10, so that the user can move directly from a still image to the viewing mode for the corresponding moving image scene. Conversely, after watching the desired scene, the user can return to the still image nearest the time at which viewing ended and resume the reading mode, in which still images are browsed and searched.

Note that a program for realizing the functions of the processing units in FIGS. 2 to 7 may be recorded on a computer-readable recording medium, and the still image extraction processing may be performed by having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system provided with a homepage providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. It further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the “transmission medium” that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may realize only a part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

A block diagram showing the configuration of an embodiment of the present invention.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
An explanatory diagram showing the processing operation of the frame extraction unit 51 shown in FIGS. 2 and 3.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
A block diagram showing the still image extraction function of the text/thumbnail distribution unit 5 shown in FIG. 1.
An explanatory diagram showing the function of the receiving device 23 shown in FIG. 1.
A block diagram showing a still image extraction function according to the prior art.

Explanation of symbols

1 ... distribution device, 23 ... receiving device, 4 ... storage unit, 5 ... text/thumbnail distribution unit, 50 ... N value storage unit, 51, 55, 56 ... frame extraction unit, 52 ... time/still image recording unit, 53 ... caption page extraction unit, 54 ... difference calculation unit, 57 ... caption page count determination unit, 58 ... time interval determination unit, 59 ... still image extraction unit, 60 ... punctuation mark extraction unit, T, TA, TB, TC ... clock

Claims

1. A still image extraction device for extracting still images from video data accompanied by caption data in order to create thumbnail data, comprising:
frame extraction means for extracting, from the frames of the video data, a first frame at the time the caption data is newly displayed, a second frame at the time the displayed caption data is erased, and N frames (N being a natural number of 1 or more) between the first frame and the second frame;
difference calculation means for obtaining the image difference between each pair of temporally adjacent frames among the (N + 2) frames extracted by the frame extraction means; and
still image extraction means for identifying the two frames for which the image difference obtained by the difference calculation means is smallest, and for extracting and recording, as a still image, the video data of the temporally earlier or later of the two identified frames.

2. The still image extraction device according to claim 1, further comprising punctuation mark extraction means for extracting punctuation marks from the caption data,
wherein the frame extraction means extracts the (N + 2) frames only when a period or a comma is detected in the caption data by the punctuation mark extraction means.

3. A still image extraction program for causing a computer to extract still images from video data accompanied by caption data in order to create thumbnail data, the program causing the computer to perform:
a frame extraction step of extracting, from the frames of the video data, a first frame at the time the caption data is newly displayed, a second frame at the time the displayed caption data is erased, and N frames (N being a natural number of 1 or more) between the first frame and the second frame;
a difference calculation step of obtaining the image difference between each pair of temporally adjacent frames among the (N + 2) frames extracted in the frame extraction step; and
a still image extraction step of identifying the two frames for which the image difference obtained in the difference calculation step is smallest, and of extracting and recording, as a still image, the video data of the temporally earlier or later of the two identified frames.

4. The still image extraction program according to claim 3, further causing the computer to perform a punctuation mark extraction step of extracting punctuation marks from the caption data,
wherein the frame extraction step extracts the (N + 2) frames only when a period or a comma is detected in the caption data in the punctuation mark extraction step.
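
The steps recited in the claims above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: frames are modeled as flat sequences of grayscale pixel values, the N intermediate frames are sampled at even spacing, and the image difference is a sum of absolute pixel differences. None of these choices is specified in the claims, and all function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # hypothetical model: flat grayscale pixel values

PUNCTUATION = "。、"  # Japanese period (kuten) and comma (touten)


def has_punctuation(caption: str) -> bool:
    """Gate used by claims 2 and 4: extract only if the caption has 。 or 、."""
    return any(ch in PUNCTUATION for ch in caption)


def sample_indices(first: int, last: int, n: int) -> List[int]:
    """Return the (N + 2) frame indices: first, N interior frames, last.

    Even spacing of the interior frames is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("N must be a natural number of 1 or more")
    span = last - first
    if span - 1 < n:
        raise ValueError("fewer than N frames between first and last")
    interior = [first + (k * span) // (n + 1) for k in range(1, n + 1)]
    return [first] + interior + [last]


def frame_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel differences (an assumed difference measure)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_still_index(frames: Sequence[Frame], first: int, last: int,
                       n: int, prefer_earlier: bool = True) -> int:
    """Pick the frame index to record as the still image."""
    idx = sample_indices(first, last, n)
    # Differences between temporally adjacent sampled frames.
    diffs = [frame_difference(frames[i], frames[j])
             for i, j in zip(idx, idx[1:])]
    k = diffs.index(min(diffs))  # first minimum wins on ties
    return idx[k] if prefer_earlier else idx[k + 1]
```

In this sketch a caller would check `has_punctuation` on the caption text first, then pass the indices of the caption's display and erase frames as `first` and `last`.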