JP2017058978A

JP2017058978A - Information processing device and program

Info

Publication number: JP2017058978A
Application number: JP2015183453A
Authority: JP
Inventors: 麻衣鈴木; Mai Suzuki
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2015-09-16
Filing date: 2015-09-16
Publication date: 2017-03-23

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device for causing a user to recognize the importance of each part obtained by dividing time series information. SOLUTION: The information processing device includes division means for dividing time series information including voice information and image information and changing with time into a plurality of parts, feature image acquisition means for acquiring a feature image showing a feature of a divided part from image information included in the part, feature word acquisition means for acquiring a feature word showing a feature of the divided part on the basis of voice information included in the part, and summary image generation means for generating a summary image including a plurality of feature images displayed in a mode corresponding to importance of the feature word of the divided part. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing apparatus and a program.

For example, there is a technique that divides time-series information, such as a movie viewed by a user, into a plurality of parts and creates a summary by associating, for each part, a feature word obtained from the audio included in the part with a feature image obtained from the images included in the part (Patent Document 1).

JP 2008-148121 A

The importance of the feature word differs from part to part. Therefore, when a user browses time-series information according to the importance of the feature words, the importance of the feature word of each part needs to be shown to the user. If the sizes of the feature images can be varied according to the importance of the feature words when a summary including a plurality of feature images is created, user convenience improves.

One object of the present invention is to provide an information processing apparatus that allows a user to grasp the importance of the feature word of each part into which time-series information is divided.

The information processing apparatus according to claim 1 of the present invention includes: division means for dividing time-series information, which includes audio information and image information and changes with time, into a plurality of parts; feature image acquisition means for acquiring a feature image indicating a feature of each divided part from the image information included in that part; feature word acquisition means for acquiring a feature word indicating a feature of each divided part based on the audio information included in that part; and summary image generation means for generating a summary image including a plurality of the feature images displayed in a manner corresponding to the importance of the feature words of the divided parts.

The information processing apparatus according to claim 2 of the present invention is the information processing apparatus according to claim 1, wherein the feature word acquisition means acquires the feature word according to its frequency of appearance in the audio information corresponding to the divided part.

The information processing apparatus according to claim 3 of the present invention is the information processing apparatus according to claim 1, which generates the summary image including the plurality of feature images, each displayed at a size corresponding to the importance.

The information processing apparatus according to claim 4 of the present invention is the information processing apparatus according to claim 3, wherein the feature image is displayed larger as the importance is higher and smaller as the importance is lower.

The information processing apparatus according to claim 5 of the present invention is the information processing apparatus according to claim 1, wherein the importance is determined from the audio information of the feature word.

The information processing apparatus according to claim 6 of the present invention is the information processing apparatus according to claim 5, wherein the importance is determined by the volume of the audio information of the feature word or by a period of silence.

The information processing apparatus according to claim 7 of the present invention is the information processing apparatus according to claim 1, wherein the summary image includes the feature image with which the feature word is associated.

The information processing apparatus according to claim 8 of the present invention is the information processing apparatus according to claim 3 or 4, wherein the summary image includes the feature image with which the feature word, rendered in a manner corresponding to the size of the feature image, is associated.

The information processing apparatus according to claim 9 of the present invention is the information processing apparatus according to claim 8, wherein the summary image generation means restricts display of the feature word associated with a feature image according to the size of that feature image.

The information processing apparatus according to claim 10 of the present invention is the information processing apparatus according to claim 9, wherein the summary image generation means does not display the feature word associated with a feature image when the size of that feature image is less than or equal to, or is less than, a predetermined threshold.

The information processing apparatus according to claim 11 of the present invention is the information processing apparatus according to claim 9, wherein the summary image generation means displays only a part of the feature word associated with a feature image when the size of that feature image is less than or equal to, or is less than, a predetermined threshold.

The information processing apparatus according to claim 12 of the present invention is the information processing apparatus according to claim 10 or 11, further including operation instruction accepting means for accepting an operation instruction from a user, wherein, when the operation instruction accepting means accepts an operation designating a feature image whose feature word display is restricted, the summary image generation means generates the summary image with the restricted feature word displayed.

The program according to claim 13 of the present invention is a program for causing a computer to function as: division means for dividing time-series information, which includes audio information and image information and changes with time, into a plurality of parts; feature image acquisition means for acquiring a feature image indicating a feature of each divided part from the image information included in that part; feature word acquisition means for acquiring a feature word indicating a feature of each divided part based on the audio information included in that part; and summary image generation means for generating a summary image including a plurality of the feature images displayed in a manner corresponding to the importance of the feature words of the divided parts.

According to claims 1 and 13 of the present invention, the user can grasp the importance of the feature word of each part into which the time-series information is divided.

According to claim 2 of the present invention, the feature word can be determined according to its frequency of appearance in each part.

According to claim 3 of the present invention, the user can grasp the importance of the feature word of each part from the size of the feature image.

According to claim 4 of the present invention, the user can grasp that the larger the feature image, the higher the importance of the feature word, and the smaller the feature image, the lower the importance of the feature word.

According to claim 5 of the present invention, the importance of the feature word can be determined from the audio information.

According to claim 6 of the present invention, the importance of the feature word can be determined from the volume of the audio information.

According to claim 7 of the present invention, the user can more easily grasp the information of each part.

According to claim 8 of the present invention, the user can grasp the importance of the feature word of each part from the display mode of the feature word.

According to claims 9 to 11 of the present invention, the amount of information shown for a part whose feature word has low importance can be reduced.

According to claim 12 of the present invention, information on a part whose feature word has low importance can be provided only when the user requests it.

A diagram showing an example of the hardware configuration of the information processing system according to the present embodiment.

A functional block diagram showing an example of the main functions realized by the server according to the present embodiment.

A diagram showing an example of the summary image according to the present embodiment.

A flowchart showing an example of the summary image generation process executed by the server according to the present embodiment.

A flowchart showing an example of the feature information acquisition process executed by the server according to the present embodiment.

A flowchart showing an example of the display mode setting process executed by the server according to the present embodiment.

A diagram showing an example of the summary image according to the present embodiment.

A flowchart showing an example of the display mode setting process executed by the server according to the present embodiment.

A diagram showing an example of the summary image according to the present embodiment.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating an example of the hardware configuration of an information processing system 1 according to the present embodiment. As shown in FIG. 1, the information processing system 1 includes a server 10 and a terminal device 20 connected via a network. Although FIG. 1 shows only one terminal device 20, the information processing system 1 may include a plurality of terminal devices 20.

The server 10 is an information processing apparatus such as a server computer, and includes a control unit 11, a storage unit 12, and a communication unit 13.

The control unit 11 is, for example, a CPU, and executes various types of information processing according to programs stored in the storage unit 12.

The storage unit 12 includes, for example, memory elements such as a RAM and a ROM, a hard disk, and the like. The storage unit 12 holds the programs executed by the control unit 11 and various data. The storage unit 12 also operates as a work memory for the control unit 11.

The communication unit 13 is a network interface such as a LAN card, and transmits and receives information to and from the server 10 via communication means such as a LAN or a wireless communication network. The control unit 11, the storage unit 12, and the communication unit 13 are connected to one another via a bus.

The terminal device 20 is an information processing apparatus such as a personal computer, and includes a control unit 21, a storage unit 22, a communication unit 23, a display unit 24, and an operation unit 25. The units 21 to 25 are connected via a bus. The control unit 21, the storage unit 22, and the communication unit 23 may have the same configurations as the control unit 11, the storage unit 12, and the communication unit 13, respectively.

The display unit 24 is, for example, a liquid crystal display, a CRT display, or an organic EL display, and displays information according to instructions from the control unit 21.

The operation unit 25 is, for example, a keyboard, a mouse, buttons, or a touch panel. It accepts an instruction operation from the user and outputs the content of that operation to the control unit 21.

Next, the functions realized by the information processing system 1 according to the present embodiment will be described. FIG. 2 is a functional block diagram illustrating an example of the main functions realized by the server 10 according to the present embodiment. As shown in FIG. 2, the server 10 functionally includes a time-series information acquisition unit 50, a part division unit 51, a feature word acquisition unit 52, a feature image acquisition unit 53, a feature information association unit 54, an importance calculation unit 55, a display mode setting unit 56, and a summary image generation unit 57. These functions are realized by the control unit 11 executing a program stored in the storage unit 12. This program is supplied to the server 10 via a computer-readable recording medium, such as a magnetic disk (HDD, FD (Flexible Disk), etc.), an optical recording medium (an optical disk such as a CD (Compact Disk) or DVD (Digital Versatile Disk), etc.), a magneto-optical recording medium, or a semiconductor memory (flash ROM, etc.), or via communication means such as the Internet.

The time-series information acquisition unit 50 acquires time-series information. Here, time-series information is information that changes with time; in the present embodiment it includes audio information and image information made up of a plurality of frame images. Specific examples include lecture information that records a lecture, and distribution record information that records video distributed by television or over the Internet. The time-series information may be stored in the storage unit 12 of the server 10 or may be provided from the terminal device 20.

The part division unit 51 divides the time-series information into a plurality of parts; that is, it divides the audio information and the image information included in the time-series information into a plurality of parts. For example, the part division unit 51 calculates, for two consecutive frame images, the difference between their color histograms, the pixel difference, or the difference between transform coefficients such as those of a discrete cosine transform, and when the calculated difference is equal to or greater than a predetermined threshold, treats the point between those two frame images as a part boundary.
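The histogram-difference variant of this boundary test can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: frames are modeled as flat sequences of (r, g, b) tuples, and the bin count, the L1 distance, the threshold value, and all function names are hypothetical choices that do not come from the specification.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Pixel]  # assumption: a frame is a flat sequence of RGB pixels


def color_histogram(frame: Frame, bins: int = 4) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = float(len(frame)) or 1.0  # avoid division by zero for empty frames
    return [h / total for h in hist]


def histogram_difference(a: Frame, b: Frame) -> float:
    """L1 distance between the histograms of two frames (0.0 to 2.0)."""
    ha, hb = color_histogram(a), color_histogram(b)
    return sum(abs(x - y) for x, y in zip(ha, hb))


def split_into_parts(frames: Sequence[Frame],
                     threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index ranges, end exclusive, one per part."""
    if not frames:
        return []
    boundaries = [0]
    for i in range(1, len(frames)):
        # A difference at or above the threshold starts a new part at frame i.
        if histogram_difference(frames[i - 1], frames[i]) >= threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

A pixel-difference or transform-coefficient variant would only need a different `histogram_difference` function; the thresholding loop stays the same.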

The feature word acquisition unit 52 acquires, for each part divided by the part division unit 51, a feature word indicating a feature of that part. The feature word acquisition unit 52 according to the present embodiment acquires the feature word from the audio information. First, it performs speech recognition on the audio information of each part and converts it into character information. It then acquires a feature word from the converted character information. For example, the feature word acquisition unit 52 extracts the word that appears most frequently in the converted character information as the feature word.
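The "most frequent word" rule can be illustrated with a short sketch. It assumes the speech recognizer has already produced a plain-text transcript for the part; the tokenizer, the small stop-word list, and the function name are assumptions added for illustration (the specification does not mention stop words).

```python
import re
from collections import Counter
from typing import Optional

# Assumption: a tiny stop-word list so that function words do not win the count.
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "is", "in"})


def extract_feature_word(transcript: str) -> Optional[str]:
    """Return the most frequent non-stop-word in a part's transcript."""
    words = [w for w in re.findall(r"[a-z']+", transcript.lower())
             if w not in STOPWORDS]
    if not words:
        return None  # nothing usable was recognized in this part
    # most_common(1) yields [(word, count)] for the highest count.
    return Counter(words).most_common(1)[0][0]
```

For Japanese transcripts, the regular-expression tokenizer would have to be replaced by a morphological analyzer, which is outside the scope of this sketch.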

The feature image acquisition unit 53 acquires, for each part divided by the part division unit 51, a feature image indicating a feature of that part. The feature image acquisition unit 53 acquires, as the feature image, the frame image corresponding to the feature word acquired by the feature word acquisition unit 52. Specifically, it obtains the position at which the feature word appears in the audio information and acquires the frame image corresponding to that position as the feature image. The feature image acquisition unit 53 may instead acquire the feature image based on the image information. For example, it may acquire the first frame image that appears in the part as the feature image, or it may acquire, from the plurality of frame images included in the part, a frame image that satisfies a predetermined criterion.
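Mapping the appearance position of a feature word to a frame image might look like the sketch below. It assumes a constant frame rate and a word time measured in seconds from the start of the part; both assumptions, and the function names, are illustrative only.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def frame_index_at(time_sec: float, fps: float, frame_count: int) -> int:
    """Index of the frame shown at `time_sec`, clamped to the valid range."""
    if frame_count <= 0:
        raise ValueError("the part contains no frames")
    index = int(time_sec * fps)
    return max(0, min(index, frame_count - 1))


def pick_feature_image(frames: Sequence[T], word_time_sec: float,
                       fps: float) -> T:
    """Frame displayed at the moment the feature word is spoken."""
    return frames[frame_index_at(word_time_sec, fps, len(frames))]
```

The fallback mentioned in the text (use the first frame of the part) corresponds to calling `pick_feature_image` with a word time of 0.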

The feature information association unit 54 stores, in an association information storage unit, association information that associates the feature word acquired by the feature word acquisition unit 52 with the feature image acquired by the feature image acquisition unit 53. In the association information, a feature word and a feature image are associated with each other for each part. Further information related to the feature word or the feature image may also be associated for each part, for example the position at which the feature word appears in the time-series information (such as the playback time) or the position at which the feature image appears in the time-series information.
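One possible shape for the per-part association information is sketched below. The record fields mirror the items named in the text (feature word, feature image, and optional appearance positions), but the class names, field names, and the in-memory dictionary standing in for the association information storage unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PartAssociation:
    part_number: int                        # 1..K in time-series order
    feature_word: str
    feature_image: str                      # assumption: an identifier or path
    word_time_sec: Optional[float] = None   # where the word appears
    image_time_sec: Optional[float] = None  # where the image appears


class AssociationStore:
    """In-memory stand-in for the association information storage unit."""

    def __init__(self) -> None:
        self._by_part: Dict[int, PartAssociation] = {}

    def save(self, record: PartAssociation) -> None:
        self._by_part[record.part_number] = record

    def get(self, part_number: int) -> PartAssociation:
        return self._by_part[part_number]
```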

The importance calculation unit 55 calculates the importance of each part divided by the part division unit 51, based on the feature word acquired by the feature word acquisition unit 52. For example, the importance calculation unit 55 calculates the importance based on the volume at the time the feature word appears in the audio information. Specifically, it calculates a higher importance the louder the volume at that time, and a lower importance the quieter the volume. This is because louder speech is considered more likely to contain content the speaker wants to emphasize. The importance calculation unit 55 may also calculate the importance according to whether there is a period of silence immediately before the time the feature word appears in the audio information. Specifically, it calculates a high importance when such a period of silence exists. This is because, for example when the time-series information is a recording of a lecture, the speaker is judged to have paused deliberately to attract the audience's attention, so the content the speaker wants to emphasize is considered to follow the silence. The importance calculation unit 55 may also calculate the importance according to the number of times the feature word appears in each part, because the more often a feature word appears within a part, the more important the feature word of that part is considered to be. The importance calculation unit 55 may also calculate the importance based on the feature image, for example from the color histogram or image distribution of the feature image. Furthermore, the importance calculation unit 55 may calculate the importance of each part by combining two or more of the methods described above; for example, it may calculate the importance based on both the feature word and the feature image.
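A combination of the audio-based cues (volume, preceding silence, and occurrence count) could be sketched as a weighted sum. The specification does not give a formula, so the normalization, the weights, and every name below are assumptions chosen only to show the direction of each cue.

```python
from dataclasses import dataclass


@dataclass
class WordEvidence:
    volume: float              # assumption: loudness normalized to 0.0..1.0
    preceded_by_silence: bool  # silence immediately before the feature word
    occurrences: int           # times the feature word appears in the part


def importance(e: WordEvidence, max_occurrences: int,
               w_volume: float = 0.5, w_silence: float = 0.2,
               w_count: float = 0.3) -> float:
    """Weighted score in 0.0..1.0; louder, post-silence, frequent words score higher."""
    volume = min(max(e.volume, 0.0), 1.0)
    silence = 1.0 if e.preceded_by_silence else 0.0
    count = e.occurrences / max_occurrences if max_occurrences > 0 else 0.0
    count = min(count, 1.0)
    return w_volume * volume + w_silence * silence + w_count * count
```

Here `max_occurrences` is meant to be the largest occurrence count over all parts, so that the count term is comparable between parts.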

The display mode setting unit 56 sets the display mode of each feature image for when the summary image generation unit 57 uses the plurality of feature images to generate a summary image representing a summary of the time-series information. The summary image generation unit 57 then generates the summary image according to the display mode set by the display mode setting unit 56.

The display mode setting unit 56 sets the display mode of each feature image based on the importance of each part calculated by the importance calculation unit 55. Specifically, it varies the sizes of the plurality of feature images according to the importance: the feature image of a part with high importance is set to be displayed large, and the feature image of a part with low importance is set to be displayed small.
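One simple way to turn importance values into display sizes is a linear mapping between a minimum and a maximum size. The sketch below assumes that approach; the pixel bounds and the function name are hypothetical, and the specification only requires that higher importance yield a larger image.

```python
from typing import List, Sequence


def sizes_from_importance(importances: Sequence[float],
                          min_px: int = 80, max_px: int = 320) -> List[int]:
    """Map each importance linearly onto a side length in [min_px, max_px]."""
    if not importances:
        return []
    lo, hi = min(importances), max(importances)
    if hi == lo:
        # All parts are equally important: give every image the same size.
        return [(min_px + max_px) // 2] * len(importances)
    span = max_px - min_px
    return [round(min_px + (v - lo) / (hi - lo) * span) for v in importances]
```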

FIG. 3 is a diagram illustrating an example of a summary image 100 according to the present embodiment. The summary image 100 shown in FIG. 3 includes a plurality of panels 110, and a feature image is displayed in each panel 110. The panels 110 differ in size; for example, the panel 110a is larger than the panel 110b. The feature images are displayed at different sizes according to the size of each panel 110. The sizes of the panels and of the feature images are set according to the importance, as described above. The display mode setting unit 56 may set the size of each feature image, or it may set the size of each panel 110 of the summary image 100. When the display mode setting unit 56 sets the size of each panel 110, the summary image generation unit 57 converts each feature image to a size matching the corresponding panel 110 and displays it.

A summary image 100 such as that shown in FIG. 3 is transmitted to the terminal device 20 via the network and displayed on the display unit 24 of the terminal device 20, where the user of the terminal device 20 can view it. When the user selects one of the feature images, the time-series information is played back from the position at which that feature image appears, so that the user can view it. In this way, the user can judge the importance from the sizes of the feature images laid out in panels, and can select a playback position according to the importance.

The display mode setting unit 56 may also set the display position of each of the plurality of feature images in the summary image 100. For example, it may set the display positions so that the feature images are arranged in time-series order. It may also set the display position of each feature image according to the size of the feature image. For example, it may set the display positions so that the feature images are arranged in descending order of size, or so that the largest feature image is located at the center of the summary image 100.
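The three placement policies mentioned here can be illustrated as orderings over a one-dimensional row of panels. Reducing the layout to a single row is a simplifying assumption (a real summary image is two-dimensional), and the `Panel` type and function names are hypothetical.

```python
from collections import deque
from typing import List, NamedTuple, Sequence


class Panel(NamedTuple):
    part_number: int  # position of the part in time-series order
    size_px: int      # display size chosen from the importance


def order_chronologically(panels: Sequence[Panel]) -> List[Panel]:
    return sorted(panels, key=lambda p: p.part_number)


def order_by_size(panels: Sequence[Panel]) -> List[Panel]:
    return sorted(panels, key=lambda p: p.size_px, reverse=True)


def order_largest_in_center(panels: Sequence[Panel]) -> List[Panel]:
    """Put the largest panel in the middle and alternate the rest around it."""
    row: deque = deque()
    for i, panel in enumerate(order_by_size(panels)):
        if i % 2 == 0:
            row.append(panel)      # even ranks go to the right
        else:
            row.appendleft(panel)  # odd ranks go to the left
    return list(row)
```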

In the summary image 100 shown in FIG. 3, a character image showing the feature word associated with each feature image may be superimposed on that feature image. In that case, when the size of the feature image is less than or equal to, or is less than, a predetermined threshold, it may not be possible to display all the characters of the feature word. For example, if the feature image is small relative to the number of characters in the feature word, the characters may not all fit. The display mode setting unit 56 therefore adjusts the character image according to the size of each feature image. For example, it reduces the size of the character image according to the size of the feature image so that all the characters of the feature word are displayed. Alternatively, when the characters of the feature word cannot all be displayed within the feature image, the display mode setting unit 56 may restrict display of the character image. In that case, the character image may be displayed when a user viewing the summary image 100 moves the mouse pointer over a feature image whose character image display is restricted. The specific processing is described later.
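The threshold-based restriction of the feature word label, covering both the "do not display" and "display a part" variants, might be sketched as follows. The threshold value, the number of characters kept, the mode names, and the function name are illustrative assumptions; the mouse-over behavior would live in the user interface and is not modeled here.

```python
def restricted_label(word: str, panel_width_px: int,
                     threshold_px: int = 64, mode: str = "hide",
                     keep_chars: int = 3) -> str:
    """Text to draw over a feature image, restricted for small images."""
    if panel_width_px > threshold_px:
        return word               # large enough: show the whole feature word
    if mode == "hide":
        return ""                 # at or below the threshold: show nothing
    if mode == "partial":
        return word[:keep_chars]  # at or below the threshold: show a prefix
    raise ValueError("unknown mode: " + mode)
```

This sketch uses "less than or equal to" for the comparison; the "less than" variant only changes `>` to `>=`.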

An example of the summary image generation process executed by the server 10 according to the present embodiment will now be described with reference to the flowchart shown in FIG. 4.

まず、時系列情報取得部５０が、時系列情報を取得する（Ｓ１０１）。ここで時系列情報は、音声情報と、複数のフレーム画像を含む画像情報と、を含む。 First, the time series information acquisition unit 50 acquires time series information (S101). Here, the time series information includes audio information and image information including a plurality of frame images.

パート区分部５１は、時系列情報取得部５０が取得した時系列情報を複数のパートに区分する（Ｓ１０２）。ここでパート区分部５１は、時系列情報をＫ個のパートに区分し、区分した各パートに時系列順に１〜Ｋの番号を付す。 The part classification unit 51 classifies the time series information acquired by the time series information acquisition unit 50 into a plurality of parts (S102). Here, the part classification unit 51 classifies the time series information into K parts, and assigns numbers 1 to K to the divided parts in time series order.

そして、変数ｎにｎ＝１の初期値が設定され（Ｓ１０３）、特徴情報取得処理が実行される（Ｓ１０４）。ここで変数ｎは、１以上の整数値をとるカウンタ変数である。 Then, an initial value of n = 1 is set to the variable n (S103), and feature information acquisition processing is executed (S104). Here, the variable n is a counter variable that takes an integer value of 1 or more.

An example of the feature information acquisition process executed by the server 10 according to the present embodiment will be described with reference to the flowchart of FIG. 5. The feature information acquired by this process is information indicating the features of each part; in the present embodiment, a feature word and a feature image are acquired.

First, the feature word acquisition unit 52 acquires the audio information of the n-th part (S201).

Next, the feature word acquisition unit 52 converts the audio information of the n-th part into character information by speech recognition (S202).

Then, the feature word acquisition unit 52 extracts, from the character information obtained in step S202, a feature word indicating the feature of the n-th part (S203).

Next, the feature image acquisition unit 53 acquires the position at which the feature word obtained in step S203 appears in the audio information, and acquires the frame image corresponding to that position as the feature image indicating the feature of the n-th part (S204).

そして、特徴情報関連付け部５４が、処理Ｓ２０３において取得された特徴語と、処理Ｓ２０４において取得された特徴画像と、を関連付けて（Ｓ２０５）、リターンする。ここで特徴情報関連付け部５４は、パートを識別する番号と、特徴語と、特徴画像と、を関連付けた関連付け情報を記憶部に記憶することとする。 Then, the feature information associating unit 54 associates the feature word acquired in the process S203 with the feature image acquired in the process S204 (S205), and returns. Here, the feature information associating unit 54 stores the association information in which the number for identifying the part, the feature word, and the feature image are associated with each other in the storage unit.
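Steps S203 to S205 can be sketched as follows, assuming that speech recognition (S202) has already produced a non-empty list of recognized words with timestamps. The choice of the most frequent word as the feature word follows the appearance-frequency criterion of claim 2; the function name, the `(word, seconds)` input format, the `fps` parameter, and the returned dictionary are hypothetical and are used here only for illustration.

```python
from collections import Counter


def acquire_feature_info(part_no, timed_words, frames, fps):
    """Return the association record for one part.

    timed_words: list of (word, seconds_from_part_start) pairs (assumed
                 output of a speech recognizer; must not be empty).
    frames:      list of frame images of the part, in display order.
    fps:         frames per second of the part (assumed constant).
    """
    # S203: take the most frequent recognized word as the feature word.
    counts = Counter(word for word, _ in timed_words)
    feature_word, _ = counts.most_common(1)[0]

    # S204: find where the feature word first appears in the audio and
    # pick the frame image shown at that time.
    first_time = next(t for word, t in timed_words if word == feature_word)
    index = min(int(first_time * fps), len(frames) - 1)

    # S205: associate part number, feature word and feature image.
    return {
        "part": part_no,
        "feature_word": feature_word,
        "feature_image": frames[index],
    }
```

Using the first appearance of the word is one possible reading of "appearance position"; any other occurrence could be chosen in the same way.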

When the feature information acquisition process in step S104 ends, it is determined whether n = K holds (S105). That is, it is determined whether the feature information acquisition process has been performed for all K parts.

処理Ｓ１０５の判断の結果、ｎ＝Ｋが成立しないと判断された場合は（Ｓ１０５：Ｎ）、変数ｎに１が加算され（Ｓ１０６）、処理Ｓ１０４以降の処理が繰り返し実行される。 As a result of the determination in step S105, when it is determined that n = K is not satisfied (S105: N), 1 is added to the variable n (S106), and the processing after step S104 is repeatedly executed.

処理Ｓ１０５の判断の結果、ｎ＝Ｋが成立すると判断された場合は（Ｓ１０５：Ｙ）、重要度算出部５５がＫ個のパートそれぞれについての重要度を算出する（Ｓ１０７）。ここで重要度算出部５５により算出された重要度は、関連付け情報に追加して関連付けられて記憶されることとする。 As a result of the determination in step S105, when it is determined that n = K is established (S105: Y), the importance calculation unit 55 calculates the importance for each of the K parts (S107). Here, the importance calculated by the importance calculation unit 55 is stored in association with the association information.

Then, the display mode setting process is executed (S108). An example of the display mode setting process executed by the server 10 according to the present embodiment will be described with reference to the flowchart of FIG. 6.

As shown in FIG. 6, the display mode setting unit 56 first sets the size of the frame 110 in which the feature image of each part is displayed, based on the importance of each part calculated by the importance calculation unit 55 in step S107 (S301). Here, the display mode setting unit 56 sets the sizes of the K frames 110 corresponding to the K parts. For example, it sets a large frame 110 for a part with high importance and a small frame 110 for a part with low importance. The information set by the display mode setting unit 56 is added to and associated with the association information of each part.
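One way to realize step S301 is to scale the frame size linearly with the importance. The description does not specify a mapping, so the linear interpolation, the function name, and the default side lengths below are assumptions made for this sketch.

```python
def frame_sizes(importances, min_side=80, max_side=240):
    """Map each part's importance to a frame side length in pixels.

    The least important part gets min_side and the most important part
    gets max_side; values in between are interpolated linearly.
    """
    lowest, highest = min(importances), max(importances)
    if highest == lowest:
        # All parts are equally important: use one mid-sized frame.
        return [(min_side + max_side) // 2] * len(importances)
    span = max_side - min_side
    return [
        round(min_side + (value - lowest) / (highest - lowest) * span)
        for value in importances
    ]
```

For example, `frame_sizes([0.0, 0.5, 1.0])` gives side lengths of 80, 160 and 240 pixels for the three parts.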

次に、変数ｍにｍ＝１の初期値が設定される（Ｓ３０２）。ここで変数ｍは、１以上の整数値をとるカウンタ変数である。 Next, an initial value of m = 1 is set to the variable m (S302). Here, the variable m is a counter variable that takes an integer value of 1 or more.

Then, the display mode setting unit 56 calculates the number of characters in the feature word of the m-th part (S303). The display mode setting unit 56 also calculates the number of characters that can be placed in the frame 110 corresponding to the m-th part (S304). The number of characters that can be placed in a frame 110 is determined in advance according to the size of the frame 110 and the character size to be used.

そして配置可能な文字数が、特徴語の文字数以上となるかが判断される（Ｓ３０５）。つまり処理Ｓ３０１において設定されたコマ１１０に特徴語の文字をすべて配置することができるかが判断される。 Then, it is determined whether the number of characters that can be arranged is equal to or greater than the number of characters in the feature word (S305). That is, it is determined whether all the characters of the feature word can be arranged on the frame 110 set in the process S301.

When the determination in step S305 shows that the number of characters that can be placed is equal to or greater than the number of characters in the feature word (S305: Y), the display mode setting unit 56 associates feature word character data, which indicates the characters of the feature word, with the feature image (S306). Here, the display mode setting unit 56 adds the feature word character data to the association information corresponding to the m-th part and stores it in association with that information. The feature word character data may include information such as the character font and the placement position. The display mode setting unit 56 may also newly generate and store a feature image with which the feature word character data is associated.

そして、ｍ＝Ｋが成立するか否かが判断される（Ｓ３０７）。つまりＫ個のパートすべてにおいて特徴語文字データが関連付けられたか判断される。 Then, it is determined whether m = K is satisfied (S307). That is, it is determined whether the feature word character data is associated with all K parts.

処理Ｓ３０７の判断の結果、ｍ＝Ｋが成立しないと判断された場合は（Ｓ３０７：Ｎ）、変数ｍに１が加算され（Ｓ３０８）、処理Ｓ３０３以降の処理が繰り返し実行される。 As a result of the determination in step S307, if it is determined that m = K is not satisfied (S307: N), 1 is added to the variable m (S308), and the processing after step S303 is repeatedly executed.

処理Ｓ３０７の判断の結果、ｍ＝Ｋが成立すると判断された場合は（Ｓ３０７：Ｙ）、リターンする。 If it is determined that m = K is satisfied as a result of the determination in step S307 (S307: Y), the process returns.

When the determination in step S305 shows that the number of characters that can be placed is smaller than the number of characters in the feature word (S305: N), the display mode setting unit 56 adjusts the feature word character data indicating the characters of the feature word to be placed in the frame 110 (S309). For example, the display mode setting unit 56 changes the character size of the feature word character data to a size that fits in the frame 110. The processing from step S307 onward is then executed. The display mode setting unit 56 may instead associate only part of the feature word with the feature image as feature word character data. It may also set the characters of the feature word indicated by the feature word character data to be displayed as a telop (superimposed caption).
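The check in steps S303 to S305 and the adjustment in step S309 can be sketched as follows. The description only states that the number of placeable characters is predetermined from the frame size and character size; the grid-based `capacity` model, the function names, and the font size limits are assumptions introduced for this example.

```python
def capacity(frame_w, frame_h, font_px):
    """Number of characters that fit in the frame, assuming each
    character occupies a font_px x font_px cell (hypothetical model)."""
    return (frame_w // font_px) * (frame_h // font_px)


def fit_feature_word(word, frame_w, frame_h, font_px=24, min_font_px=8):
    """Return the text and character size to place in one frame."""
    size = font_px
    # S305: N -> S309: shrink the characters until the whole word fits
    # or the minimum readable size is reached.
    while size > min_font_px and capacity(frame_w, frame_h, size) < len(word):
        size -= 1
    room = capacity(frame_w, frame_h, size)
    # If the word still does not fit, keep only the part that does.
    text = word if room >= len(word) else word[:room]
    return {"text": text, "font_px": size}
```

A word that fits at the default size is returned unchanged (S305: Y). Truncation is the fallback corresponding to associating only part of the feature word with the feature image.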

表示態様設定処理が終了すると、要約画像生成部５７が表示態様設定処理において設定された表示態様に従って要約画像１００を生成し（Ｓ１０９）、要約画像生成処理が終了する。ここでは要約画像生成部５７は、関連付け情報のパート毎に、関連付けられている各種情報に従って要約画像１００を生成すればよい。 When the display mode setting process ends, the summary image generation unit 57 generates the summary image 100 according to the display mode set in the display mode setting process (S109), and the summary image generation process ends. Here, the summary image generation unit 57 may generate the summary image 100 according to the associated information for each part of the association information.

図７は、本実施形態に係る要約画像１００の一例を示す図である。図７は、処理Ｓ１０９において要約画像生成部５７が生成する要約画像１００の一例を示している。図７に示す要約画像１００は、各コマ１１０の特徴画像に特徴語文字データに対応する特徴語文字画像２００が重畳されている。そして、小さいコマ１１０ｂの特徴画像に重畳されている特徴語文字画像２００ｂは、大きいコマ１１０ａの特徴画像に重畳されている特徴語文字画像２００ａより文字サイズが小さくなっている。このように小さいコマ１１０ｂの特徴画像にも特徴語の文字をすべて表示させることで、利用者に各パートの情報把握をさせやすくする。 FIG. 7 is a diagram illustrating an example of the summary image 100 according to the present embodiment. FIG. 7 shows an example of the summary image 100 generated by the summary image generation unit 57 in step S109. In the summary image 100 shown in FIG. 7, the feature word character image 200 corresponding to the feature word character data is superimposed on the feature image of each frame 110. The feature word character image 200b superimposed on the feature image of the small frame 110b has a smaller character size than the feature word character image 200a superimposed on the feature image of the large frame 110a. In this way, by displaying all the characters of the feature words in the feature image of the small frame 110b, it is easy for the user to grasp the information of each part.

Next, another example of the display mode setting process executed by the server 10 according to the present embodiment will be described with reference to the flowchart of FIG. 8. This example covers the case in which the feature word character image is not superimposed on the feature image.

まず表示態様設定部５６は、処理Ｓ１０７において重要度算出部５５が算出した各パートの重要度に基づいて、各パートの特徴画像を表示させるコマ１１０の大きさを設定する（Ｓ４０１）。 First, the display mode setting unit 56 sets the size of the frame 110 for displaying the feature image of each part based on the importance of each part calculated by the importance calculation unit 55 in step S107 (S401).

次に表示態様設定部５６は、特徴語文字画像を重畳させないパートを選択する（Ｓ４０２）。ここでは表示態様設定部５６は、配置可能文字数が特徴語文字数より少ないパートを選択する。なお、表示態様設定部５６は、すべてのパートを選択してもよい。 Next, the display mode setting unit 56 selects a part on which the feature word character image is not superimposed (S402). Here, the display mode setting unit 56 selects a part whose number of characters that can be arranged is less than the number of feature word characters. The display mode setting unit 56 may select all the parts.

Then, for each part on which the feature word character image is not superimposed, the display mode setting unit 56 generates substitute data to be superimposed in place of the feature word character image (S403). The substitute data informs a user viewing the summary image 100 that character information exists even though the feature word character image is not superimposed. The substitute data is, for example, an image icon or character data such as "...".

そして表示態様設定部５６は、表示文字データを生成する（Ｓ４０４）。表示文字データは、利用者が特徴語文字画像を重畳させていない特徴画像を選択する操作指示（例えばマウスオーバー操作など）を行った場合に、表示させるものである。表示文字データは、特徴語の文字を示すものであってもよいし、パートに含まれる音声情報を音声認識により変換した文字情報を示すものであってもよい。 Then, the display mode setting unit 56 generates display character data (S404). The display character data is displayed when the user performs an operation instruction (for example, a mouse over operation) for selecting a feature image on which no feature word character image is superimposed. The display character data may indicate the character of the feature word, or may indicate character information obtained by converting speech information included in the part by speech recognition.

表示態様設定部５６は、処理Ｓ４０３において生成した代替データと、表示文字データとを特徴画像に関連付けて（Ｓ４０４）、リターンする。ここで表示態様設定部５６は、各パートに対応する関連付け情報に代替データ及び表示文字データを追加して関連付けて記憶する。 The display mode setting unit 56 associates the substitute data generated in step S403 and the display character data with the feature image (S404), and returns. Here, the display mode setting unit 56 adds the substitute data and the display character data to the association information corresponding to each part and stores them in association with each other.
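The selection of parts, the substitute data, and the display character data described for FIG. 8 can be sketched as follows. The dictionary keys, the function names, and the use of the string "..." as substitute data are illustrative assumptions; the record layout of the association information is not specified in the description.

```python
def set_substitute_data(parts):
    """Attach substitute data and display character data to every part
    whose frame cannot hold the whole feature word.

    Each part is a dict with 'feature_word' and 'capacity' (the number
    of characters that can be placed in its frame). The dicts are
    updated in place and the same list is returned.
    """
    for part in parts:
        if part["capacity"] < len(part["feature_word"]):
            part["substitute"] = "..."                    # shown in the frame
            part["display_text"] = part["feature_word"]   # shown on request
    return parts


def on_mouse_over(part):
    """Return the text to display when the user points at a frame, or
    None when the frame has no restricted text."""
    return part.get("display_text")
```

Here the display character data is the feature word itself; the recognized text of the whole part could be stored instead.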

FIG. 9 is a diagram showing an example of the summary image 100 according to the present embodiment. FIG. 9 shows an example of the summary image 100 generated by the summary image generation unit 57 according to the display mode set by the display mode setting process shown in FIG. 8. In the summary image 100 shown in FIG. 9, the substitute data image 300 corresponding to the substitute data is superimposed on the feature image of the small frame 110b. When the user selects the substitute data image 300 by a mouse-over operation or the like, the operation instruction is transmitted from the terminal device 20 to the server 10. When the server 10 receives the operation instruction, the summary image generation unit 57 generates a summary image 100 on which the display character image 400 corresponding to the display character data is superimposed. The generated summary image 100 is transmitted to the terminal device 20, and the summary image 100 with the display character image 400 superimposed is displayed on the display unit 24 of the terminal device 20. The display character image 400 may be configured so that the characters indicated by the display character data are displayed as a telop (superimposed caption).

Alternatively, when the server 10 receives an operation instruction selecting the substitute data image 300, such as a mouse-over operation by the user, the summary image generation unit 57 may generate the display character image 400 corresponding to the display character data and transmit it to the terminal device 20.

本発明は、上記の実施形態に限定されるものではない。 The present invention is not limited to the above embodiment.

For example, when the number of characters that can be placed is smaller than the number of characters in the feature word in step S305, the feature word may be changed. In this case, the feature information acquisition process may be executed so that the feature word acquisition unit 52 acquires a feature word that fits in the frame of the size set by the display mode setting unit 56.

In the above embodiment, the server 10 and the terminal device 20 are separate devices, but they may be a single integrated device.

DESCRIPTION OF SYMBOLS
1 Information processing system
10 Server
11, 21 Control unit
12, 22 Storage unit
13, 23 Communication unit
20 Terminal device
24 Display unit
25 Operation unit
50 Time series information acquisition unit
51 Part classification unit
52 Feature word acquisition unit
53 Feature image acquisition unit
54 Feature information association unit
55 Importance calculation unit
56 Display mode setting unit
57 Summary image generation unit
100 Summary image
110, 110a, 110b Frame
200, 200a, 200b Feature word character image
300 Substitute data image
400 Display character image

Claims

An information processing apparatus comprising:
classifying means for dividing time-series information, which includes audio information and image information and changes with time, into a plurality of parts;
feature image acquisition means for acquiring, from the image information included in each divided part, a feature image indicating a feature of the part;
feature word acquisition means for acquiring, based on the audio information included in each divided part, a feature word indicating a feature of the part; and
summary image generation means for generating a summary image including a plurality of the feature images displayed in a manner according to the importance of the feature words of the divided parts.

The information processing apparatus according to claim 1,
wherein the feature word acquisition means acquires the feature word according to its appearance frequency in the audio information corresponding to the divided part.

The information processing apparatus according to claim 1,
wherein the summary image generation means generates the summary image including the plurality of feature images,
each of which is displayed in a size corresponding to the importance.

The information processing apparatus according to claim 3,
wherein the feature image is displayed larger as the importance is higher
and smaller as the importance is lower.

The information processing apparatus according to claim 1,
wherein the importance is determined from the audio information of the feature word.

The information processing apparatus according to claim 5,
wherein the importance is determined by the volume or the silent time of the audio information of the feature word.

The information processing apparatus according to claim 1,
wherein the summary image includes
the feature image with which the feature word is associated.

The information processing apparatus according to claim 3,
wherein the summary image includes the feature image with which the feature word is associated
in a manner corresponding to the size of the feature image.

The information processing apparatus according to claim 8,
wherein the summary image generation means restricts display of the feature word associated with the feature image
according to the size of the feature image.

The information processing apparatus according to claim 9,
wherein, when the size of the feature image is equal to or less than (or less than) a predetermined threshold value,
the summary image generation means does not display the feature word associated with the feature image.

The information processing apparatus according to claim 9,
wherein, when the size of the feature image is equal to or less than a predetermined threshold value,
the summary image generation means displays a part of the feature word associated with the feature image.

The information processing apparatus according to claim 10,
further comprising operation instruction receiving means for receiving an operation instruction from a user,
wherein, when the operation instruction receiving means receives an operation designating a feature image for which display of the feature word is restricted, the summary image generation means generates a summary image in which the restricted feature word is displayed.

A program for causing a computer to function as:
classifying means for dividing time-series information, which includes audio information and image information and changes with time, into a plurality of parts;
feature image acquisition means for acquiring, from the image information included in each divided part, a feature image indicating a feature of the part;
feature word acquisition means for acquiring, based on the audio information included in each divided part, a feature word indicating a feature of the part; and
summary image generation means for generating a summary image including a plurality of the feature images displayed in a manner according to the importance of the feature words of the divided parts.