JP2020035261A

JP2020035261A - Topic splitter

Info

Publication number: JP2020035261A
Application number: JP2018162452A
Authority: JP
Inventors: 悠介福島; Yusuke Fukushima
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2020-03-05

Abstract

PROBLEM TO BE SOLVED: To provide a topic division device capable of dividing, topic by topic, moving image data in which the speaker rarely changes. SOLUTION: A topic division device 100 includes a content reading unit 102 that extracts an audio part from input moving image data; a content feature amount calculation unit 103 that calculates, from the audio part and at every predetermined time, a feature amount representing semantic information; a content change degree analysis unit 104 that specifies one or more moving image division times based on changes in the feature amount; and a division processing unit 105 that divides the moving image data based on the moving image division times. SELECTED DRAWING: FIG. 1

Description

The present invention relates to a topic division device that divides video using its audio.

Video processing devices are known that divide a video into several sections using image features, audio features, or both (see Patent Document 1 below).

Patent Document 1: JP 2013-126233 A

The technique described in Patent Document 1 has the following problem. Because it uses volume, frequency components, and speaker changes as audio features, it does not anticipate moving images in which a single speaker keeps talking and the speaker rarely changes.

Accordingly, an object of the present invention is to solve this problem and to provide a topic division device capable of dividing, topic by topic, moving image data in which the speaker rarely changes.

To solve the above problem, the topic division device of the present invention includes: an audio extraction unit that extracts an audio part from input moving image data; a calculation unit that calculates, from the audio part and at every predetermined time, a feature amount representing semantic information; a specifying unit that specifies one or more moving image division times based on changes in the feature amount; and a division unit that divides the moving image data based on the moving image division times.

With this configuration, the moving image data can be divided according to the semantic content of its audio part, and can therefore be divided topic by topic.

According to the present invention, moving image data can be divided by topic.

FIG. 1 is a block diagram showing the functional configuration of the topic division device according to the present embodiment.
FIG. 2 is a conceptual diagram showing utterance content being processed by a filter.
FIG. 3 is a graph showing the degree of change in utterance content.
FIG. 4 is a flowchart showing the operation of the topic division device 100.
FIG. 5 is a block diagram showing the functional configuration of a topic division device 100a according to another embodiment.
FIG. 6 is a diagram showing a specific example of moving image data that includes stable regions.
FIG. 7 is a graph showing the correspondence between the degree of change in utterance content and the presence or absence of a change in the telop.
FIG. 8 is a flowchart showing the operation of the topic division device 100a according to the other embodiment.
FIG. 9 is a conceptual diagram of correcting division times according to utterances.
FIG. 10 is a diagram showing the hardware configuration of the topic division device.

The topic division device according to the present embodiment will be described with reference to the drawings. First, its configuration is described. FIG. 1 is a block diagram showing the functional configuration of the topic division device according to the present embodiment. The topic division device 100 of the present embodiment includes a storage device 101, a content reading unit 102 (audio extraction unit), a content feature amount calculation unit 103 (calculation unit), a content change degree analysis unit 104 (specifying unit), and a division processing unit 105 (division unit). In the present embodiment, a topic means a single subject of the utterance content. The topic division device 100 divides moving image data subject by subject, that is, it performs so-called topic division. Details are given below.

The storage device 101 stores the moving image data before division and the moving image data after division.

The content reading unit 102 reads moving image data from the storage device 101 and extracts its audio part (audio data) and video part (video data).

The content feature amount calculation unit 103 receives the audio part from the content reading unit 102, converts the audio in each predetermined time unit into a feature amount representing semantic information, and calculates pairs, each consisting of the time that marks the start of that time unit (the utterance time) and the corresponding feature amount.

The content feature amount calculation unit 103 has a speech recognition function. Using it, the unit converts the audio waveform into a character string of the utterance content, divides the string into meaningful recognition units (morphemes, words, or sentences), and vectorizes each recognition unit. Known methods such as Word2Vec, GloVe, and FastText are used for the vectorization. When a sentence is the recognition unit, Sentence2Vec may also be used. With this configuration, the content feature amount calculation unit 103 can convert the audio into pairs of a feature amount (vector) representing the semantic information of each time unit and the utterance time. Further, as shown in FIGS. 2(a) and 2(b), the content feature amount calculation unit 103 calculates the sum of the feature amounts of the recognition units inside a filter of arbitrary length, thereby obtaining pairs of a feature amount representing the semantic information over the filter length N and the utterance time. For example, it calculates pair information in the form of an array such as [0 seconds, (0.2, 0.4, ...)]. Here, "0 seconds" is the time information giving the starting point of the filter; among such time information, the entries that satisfy a predetermined condition become moving image division times. (0.2, 0.4, ...) is vector information indicating the feature amount of the character string captured by the filter.

This is described in more detail. FIG. 2(a) is a conceptual diagram showing utterance content being processed by a filter of filter length N. As shown in the figure, the utterance content captured by the filter between time t_0 and time t_1 (t_1 = t_0 + N) is 「ニュースはまずこちら」 (roughly, "First, the news is here"). The recognition units are 「ニュース」 (news), 「は」 (topic particle), 「まず」 (first), and 「こちら」 (here), and a feature amount is calculated for each. The content feature amount calculation unit 103 sums the feature amounts of the recognition units inside this filter. The sum may be normalized.

FIG. 2(b) is a conceptual diagram showing utterance content being processed by a filter spanning time t_2 (= t_0 + d) to time t_3 (= t_0 + d + N). Because the filter has shifted by time d, it now captures 「まずこちらの映像」 (roughly, "first, this video") as the utterance content. Likewise, the recognition units are 「まず」 (first), 「こちら」 (here), 「の」 (possessive particle), and 「映像」 (video), and a feature amount is calculated for each. The content feature amount calculation unit 103 sums the feature amounts of the recognition units inside this filter. In this way, the content feature amount calculation unit 103 sums the feature amounts of the recognition units inside each filter and outputs each sum together with its time (the starting time of the filter). This processing is repeated, shifting the filter by time d each time, until the end of the moving image data.
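The sliding-filter summation of FIGS. 2(a) and 2(b) might be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the embodiment: the function name `window_feature_sums`, the two-dimensional toy vectors, and the timestamps are hypothetical, and in practice each vector would come from a method such as Word2Vec applied to speech recognition output.

```python
from typing import List, Tuple

Vector = List[float]


def window_feature_sums(
    units: List[Tuple[float, Vector]],  # (utterance time in seconds, feature vector)
    filter_len: float,                  # filter length N
    shift: float,                       # shift d
    end_time: float,                    # end of the moving image data
) -> List[Tuple[float, Vector]]:
    """Return (filter start time, summed feature vector) pairs."""
    dim = len(units[0][1])
    pairs = []
    k = 0
    while k * shift < end_time:
        start = k * shift
        total = [0.0] * dim
        # Add up the vectors of every recognition unit that falls inside the filter.
        for t, vec in units:
            if start <= t < start + filter_len:
                total = [a + b for a, b in zip(total, vec)]
        pairs.append((start, total))
        k += 1
    return pairs


# Hypothetical recognition units with made-up 2-D vectors (values are assumptions).
units = [
    (0.0, [1.0, 0.0]),  # "news"
    (1.0, [0.0, 1.0]),  # topic particle
    (2.0, [1.0, 1.0]),  # "first"
    (3.0, [0.0, 2.0]),  # "here"
]
pairs = window_feature_sums(units, filter_len=2.0, shift=1.0, end_time=4.0)
```

Each element of `pairs` corresponds to one [time, vector] pair of the kind described above; normalizing each sum would be a small extension.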

The content change degree analysis unit 104 receives the pairs of utterance time (the starting time of each filter) and corresponding feature amount from the content feature amount calculation unit 103, and detects the times at which the topic changed.

Using a method for calculating the change in a feature amount (vector), the content change degree analysis unit 104 calculates the degree of change of the feature amount at each time from the pairs of feature amount and time. Specifically, for pairs arranged in chronological order, it calculates the difference between adjacent pairs to obtain the degree of change. For example, it performs the following calculation.

p_{t_n} = W_{t_n} - W_{t_{n+1}}
Here, t_n is a time, p_{t_n} is the degree of change at time t_n, and W_{t_n} is the sum of the feature amounts at time t_n.
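As a rough sketch of this calculation, the difference between adjacent summed vectors can be computed as below. The description defines the degree of change only as a difference of sums; reducing that vector difference to a single number with the Euclidean norm is an assumption made here so that the result can be compared with a threshold. The name `change_degrees` and the sample values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def change_degrees(pairs: List[Tuple[float, Vector]]) -> List[Tuple[float, float]]:
    """For time-ordered (t_n, W_tn) pairs, return (t_n, ||W_tn - W_tn+1||)."""
    result = []
    for (t_n, w_n), (_, w_next) in zip(pairs, pairs[1:]):
        diff = [a - b for a, b in zip(w_n, w_next)]
        # Assumption: the Euclidean norm turns the vector difference into a scalar.
        result.append((t_n, math.sqrt(sum(x * x for x in diff))))
    return result


pairs = [(0.0, [1.0, 1.0]), (1.0, [1.0, 1.0]), (2.0, [4.0, 5.0])]
degrees = change_degrees(pairs)
```

Other distance measures, such as cosine distance, would fit the same structure.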

The content change degree analysis unit 104 then selects, as a moving image division time, each time t_n whose degree of change p satisfies a predetermined condition. Time t_{n+1} may be used as the moving image division time instead. The following conditions, and the processing for them, are conceivable.

For example, when the desired number of topics is given in advance, the content change degree analysis unit 104 takes times in descending order of degree of change until it has enough of them to yield that number of topics. When the desired number of topics is not given in advance, it sets a threshold and takes the times whose degree of change exceeds the threshold.
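The two selection modes could be sketched as follows. The function name `select_division_times` and the sample values are hypothetical, and the reading that K topics require K - 1 division times is an interpretation of the text rather than something it states.

```python
from typing import List, Optional, Tuple


def select_division_times(
    degrees: List[Tuple[float, float]],  # (time, degree of change)
    num_topics: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[float]:
    """Pick division times either by topic count or by threshold."""
    if num_topics is not None:
        # Assumption: splitting into K topics needs the K - 1 largest changes.
        ranked = sorted(degrees, key=lambda td: td[1], reverse=True)
        chosen = ranked[: max(num_topics - 1, 0)]
    elif threshold is not None:
        chosen = [td for td in degrees if td[1] > threshold]
    else:
        raise ValueError("give either num_topics or threshold")
    return sorted(t for t, _ in chosen)


degrees = [(0.0, 0.1), (10.0, 0.9), (20.0, 0.3), (30.0, 0.7)]
by_count = select_division_times(degrees, num_topics=3)
by_threshold = select_division_times(degrees, threshold=0.25)
```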

In addition, when the shortest possible duration of a single topic's moving image is given in advance, the content change degree analysis unit 104 can determine whether the difference between any two times is shorter than that duration and, if so, delete the time with the smaller degree of change.

For example, let the shortest possible duration be L, and consider the degrees of change p1, p2, and p3 that exceed the threshold in FIG. 3. The time difference between p1 and p2 is less than L, so the moving image division time corresponding to p1, the smaller of the two, is deleted. Likewise, the time difference between p2 and p3 is less than L, so the moving image division time of p3, the smaller of the two, is deleted. In other words, for an arbitrarily chosen degree of change p, if another degree of change p lies within the predetermined time L, the content change degree analysis unit 104 determines which of the two is higher and deletes the moving image division time corresponding to the lower one. It then checks again whether another degree of change p lies within the predetermined time L of a remaining degree of change p, and deletes the corresponding moving image division time in the same way. Repeating this processing ensures that no two degrees of change p remain within the predetermined time L of each other.
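The repeated deletion described above might look like the following sketch. The description leaves the order of comparison open ("an arbitrarily chosen degree of change"), so scanning adjacent candidates in time order is an assumption, as are the name `enforce_min_gap` and the sample values.

```python
from typing import List, Tuple


def enforce_min_gap(
    candidates: List[Tuple[float, float]],  # (time, degree of change)
    min_len: float,                         # shortest possible topic length L
) -> List[Tuple[float, float]]:
    """Delete lower-change candidates until no two are closer than min_len."""
    kept = sorted(candidates)  # time order
    changed = True
    while changed:
        changed = False
        for i in range(len(kept) - 1):
            (t_a, p_a), (t_b, p_b) = kept[i], kept[i + 1]
            if t_b - t_a < min_len:
                # Drop whichever of the two has the smaller degree of change.
                del kept[i if p_a < p_b else i + 1]
                changed = True
                break
    return kept


# p1, p2, p3 lie close together, as in the FIG. 3 example; values are made up.
candidates = [(10.0, 0.5), (14.0, 0.9), (18.0, 0.6), (60.0, 0.7)]
survivors = enforce_min_gap(candidates, min_len=5.0)
```

With these values the candidates at 10.0 and 18.0 are removed in favor of the larger change at 14.0, mirroring the way p1 and p3 are deleted and p2 is kept.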

The division processing unit 105 divides the input moving image data at the moving image division times obtained by the content change degree analysis unit 104 and records the divided moving image data in the storage device 101.

When the content feature amount calculation unit 103 of the above embodiment divides the utterance content into meaningful units, the division processing unit 105 may receive the times of the breaks between those units (hereinafter, utterance unit times). Voice Activity Detection can be used to identify these break times. The division processing unit 105 may then convert each moving image division time obtained by the content change degree analysis unit 104 to the nearest utterance unit time. With this configuration, a division that would fall in the middle of a meaningful unit under the configuration of FIG. 1 is moved to before the start or after the end of that unit.
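Converting a division time to the nearest utterance unit time can be sketched in a few lines. The function name `snap_to_utterance_times` and the numeric values are hypothetical; in practice the utterance unit times would come from a voice activity detector.

```python
from typing import List


def snap_to_utterance_times(
    division_times: List[float], utterance_times: List[float]
) -> List[float]:
    """Replace each division time with the closest utterance unit time."""
    return [
        min(utterance_times, key=lambda u: abs(u - t))
        for t in division_times
    ]


# Hypothetical values: two division times and four utterance breaks (seconds).
snapped = snap_to_utterance_times([12.4, 47.9], [0.0, 11.8, 30.2, 48.5])
```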

Next, the operation of the topic division device 100 according to the present embodiment will be described with reference to the flowchart of FIG. 4. FIG. 4 is a flowchart showing the operation of the topic division device 100.

The content reading unit 102 reads moving image data from the storage device 101 (S101). The content reading unit 102 extracts the audio part from the moving image data, and the content feature amount calculation unit 103 receives it (S102). The content feature amount calculation unit 103 converts the received audio part (audio waveform) into a character string by speech recognition (S103). The content feature amount calculation unit 103 divides the character string into meaningful units (recognition units) such as morphemes, words, or sentences (S104). The content feature amount calculation unit 103 converts the divided character string, recognition unit by recognition unit, into feature amounts (vector information) representing semantic information (S105).

The content feature amount calculation unit 103 sets time T to 0. Time T denotes a time position in the moving image data. Starting from time T, the content feature amount calculation unit 103 calculates the sum of the feature amounts over a predetermined time length and outputs the pair of time T and that sum (S107). Next, the content feature amount calculation unit 103 adds time d to time T to obtain a new time T (S108), calculates the sum of the feature amounts over the predetermined time length starting from the new time T, and outputs the pair of time T and that sum. This is repeated until the end of the moving image data.

The processing in step S107 is now described in detail. As shown in FIGS. 2(a) and 2(b), the content feature amount calculation unit 103 slides a filter of filter length N along the time axis, calculates the sum of the feature amounts of the recognition units inside the filter, and passes each pair of calculated sum and time to the content change degree analysis unit 104.

As shown in FIG. 2(a), the character string between time t_0 and time t_0 + N consists of the recognition units 「ニュース」 (news), 「は」 (topic particle), 「まず」 (first), and 「こちら」 (here), and the content feature amount calculation unit 103 calculates the sum W1 of their feature amounts. In FIG. 2(b), the content feature amount calculation unit 103 shifts the filter by time d and calculates the sum W2 of the feature amounts for the filter spanning time t_0 + d to time t_0 + N + d. This processing is repeated until the end of the moving image data (S107 to S108).

The content change degree analysis unit 104 calculates the degree of change, at each time, of the feature amount representing semantic information (S109). Taking FIG. 2 as an example, the content change degree analysis unit 104 calculates the degree of change as the difference between the sum W1 and the sum W2. That is, it calculates the degree of change between successive times.

Based on the degree of change between successive times, the content change degree analysis unit 104 determines the moving image division times suitable for topic division (S109). That is, as described above, it takes the times with a large degree of change as the moving image division times.

The division processing unit 105 then divides the input moving image data at the moving image division times suitable for topic division and stores the divided moving image data in the storage device 101 (S111).
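Turning a list of division times into the segments to be cut can be illustrated as below. The function name `segment_ranges` and the 120-second duration are hypothetical, and the actual cutting of the media (by whatever video tool is used) is outside this sketch.

```python
from typing import List, Tuple


def segment_ranges(
    duration: float, division_times: List[float]
) -> List[Tuple[float, float]]:
    """Return (start, end) ranges obtained by cutting at the division times."""
    # Ignore division times at or outside the ends of the moving image data.
    cuts = sorted(t for t in division_times if 0.0 < t < duration)
    bounds = [0.0] + cuts + [duration]
    return list(zip(bounds, bounds[1:]))


ranges = segment_ranges(120.0, [75.0, 30.0])
```

Each returned range corresponds to one topic-level piece of moving image data to be stored.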

Through this processing, the moving image data can be divided according to the content of its audio part, that is, topic by topic.

Next, a topic division device 100a according to another embodiment will be described. FIG. 5 is a block diagram showing the functional configuration of the topic division device 100a. The topic division device 100a adds components for dividing by means of video features. Specifically, in addition to the storage device 101, the content reading unit 102 (video extraction unit), the content feature amount calculation unit 103, the content change degree analysis unit 104, and a division processing unit 105a, the topic division device 100a includes a stable region detection unit 106 (region detection unit) and a video change degree analysis unit 107 (analysis unit). The division processing unit 105a is configured to divide based on the moving image division times (and utterance unit times) from the audio part and the dividable time zones from the video part. The processing configuration for dividing by means of the video part is described below.

The stable region detection unit 106 receives the video part from the content reading unit 102 and detects regions that change little throughout the video (hereinafter, stable regions). As shown in FIG. 6, in a moving image that mixes captured footage with a CG part 1 and a CG part 2 inserted during editing, the captured footage always shows small changes, whereas the CG parts added during editing are stable regions that alternate between periods without change and periods with change. The CG part 1 and the CG part 2 are, for example, telops (on-screen text) or captions of a news program.

To detect such stable regions, the stable region detection unit 106 calculates, for each pixel, the variance along the time axis, and then classifies all pixels into a region whose variance is at or above a threshold and a region whose variance is below the threshold, either by applying a technique for binarizing discrete values or by extracting the pixels that fall below the threshold. The stable region detection unit 106 detects the region whose variance is below the threshold as the stable region. In this way it detects the CG parts described above.
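The per-pixel temporal variance test might be sketched as follows, using tiny grayscale frames held in nested lists. The function name `stable_mask`, the pixel values, and the threshold of 5.0 are all hypothetical; a real implementation would more likely operate on arrays of full-size color frames.

```python
from statistics import pvariance
from typing import List

Frame = List[List[float]]  # rows of grayscale pixel values


def stable_mask(frames: List[Frame], threshold: float) -> List[List[bool]]:
    """Mark pixels whose variance over time is below the threshold."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            # Population variance of this pixel's values across all frames.
            pvariance([frame[y][x] for frame in frames]) < threshold
            for x in range(width)
        ]
        for y in range(height)
    ]


# Three 2x2 frames: only the top-left pixel fluctuates strongly.
frames = [
    [[10, 200], [50, 90]],
    [[80, 200], [55, 91]],
    [[30, 200], [52, 90]],
]
mask = stable_mask(frames, threshold=5.0)
```

Pixels marked `True` form the candidate stable region; restricting it further, for example to rectangular shapes, would be an additional step.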

The video change degree analysis unit 107 detects the portions in which the stable region obtained by the stable region detection unit 106 changes greatly as dividable time zones, that is, candidate time zones for topic division. Like the content change degree analysis unit 104, the video change degree analysis unit 107 detects the portions whose degree of change exceeds a threshold as dividable time zones. The video change degree analysis unit 107 calculates the degree of change along the time axis, pixel by pixel, within the stable region. The degree of change within the stable region is obtained by observing the change in the proportions of R, G, and B.
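One possible reading of this step is sketched below. The description says only that the change in the RGB proportions is observed; measuring it as the sum of absolute differences between the ratios of consecutive frames is an assumption, and the names `rgb_ratio` and `dividable_zones`, the pixel values, and the threshold are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b)


def rgb_ratio(pixels: List[Pixel]) -> List[float]:
    """Share of R, G and B in the summed intensity of the given pixels."""
    totals = [sum(p[c] for p in pixels) for c in range(3)]
    grand = sum(totals) or 1  # avoid division by zero for an all-black region
    return [t / grand for t in totals]


def dividable_zones(
    stable_pixels_per_frame: List[List[Pixel]],
    frame_times: List[float],
    threshold: float,
) -> List[Tuple[float, float]]:
    """Return (start, end) zones where the RGB ratio changes by more than threshold."""
    ratios = [rgb_ratio(px) for px in stable_pixels_per_frame]
    zones = []
    for i in range(len(ratios) - 1):
        # Assumption: change = sum of absolute differences of the three ratios.
        change = sum(abs(a - b) for a, b in zip(ratios[i], ratios[i + 1]))
        if change > threshold:
            zones.append((frame_times[i], frame_times[i + 1]))
    return zones


frames = [
    [(255, 0, 0), (255, 0, 0)],  # red caption
    [(255, 0, 0), (255, 0, 0)],  # unchanged
    [(0, 0, 255), (0, 0, 255)],  # caption replaced by a blue one
]
zones = dividable_zones(frames, frame_times=[0.0, 1.0, 2.0], threshold=0.5)
```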

The division processing unit 105a divides the input moving image data at those moving image division times obtained by the content change degree analysis unit 104 that fall within the dividable time zones obtained by the video change degree analysis unit 107, and stores the result in the storage device 101. The division processing unit 105a also adjusts the moving image division times to the times of utterance breaks, based on the speech recognition result from the content feature amount calculation unit 103.

The selection of moving image division times is described below with reference to the drawings. As shown in FIG. 7, suppose that five of the degrees of change calculated by the content change degree analysis unit 104, p1 to p5, exceed the threshold, and that there are dividable time zones s1 to s3 in which the change in the stable region exceeded its threshold. In this case, among the moving image division times for p1 to p5, the times corresponding to p2, p3, and p5, which fall within the dividable time zones s1 to s3, are selected as the moving image division times at which topic division should occur.
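The selection in the FIG. 7 example amounts to keeping only the division times that fall inside some dividable time zone. A minimal sketch, with the hypothetical name `filter_by_zones` and made-up times chosen so that p2, p3, and p5 survive:

```python
from typing import List, Tuple


def filter_by_zones(
    division_times: List[float], zones: List[Tuple[float, float]]
) -> List[float]:
    """Keep division times that lie within at least one dividable time zone."""
    return [
        t for t in division_times
        if any(start <= t <= end for start, end in zones)
    ]


times = [5.0, 20.0, 41.0, 63.0, 90.0]               # times of p1 .. p5
zones = [(18.0, 23.0), (40.0, 45.0), (88.0, 92.0)]  # s1 .. s3
selected = filter_by_zones(times, zones)
```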

With this configuration, topic division can be based on both the content of the video part and the content of the audio part.

Next, the operation of the topic division device 100a according to the other embodiment will be described. FIG. 8 is a flowchart showing the operation of the topic division device 100a, which performs topic division using both the video part and the audio part.

Steps S101 to S109 are the same as in the embodiment described above: the content is analyzed based on the feature amounts representing the meaning of the audio part, and the moving image division times are specified.

In step S101, the content reading unit 102 reads moving image data from the storage device 101, and it further extracts the video part from the moving image data (S102a). The stable region detection unit 106 calculates the variance of each pixel along the time axis from the extracted video part (S103a). The stable region detection unit 106 classifies each pixel as having a variance at or above a fixed value or a variance below it, and takes the set of pixels classified as below the fixed value as the stable region (S104a). At this point, prior information such as the stable region having a rectangular outline may be supplied, and the stable region may be limited to the regions that match it.

Next, the video change degree analysis unit 107 calculates the degree of change along the time axis within the stable region (S104a) and determines the time zones in which the degree of change exceeded the threshold as dividable time zones (S105a).

Then, as shown in FIG. 7, the division processing unit 105a divides the moving image data at those moving image division times passed from the content change degree analysis unit 104 that fall within the dividable time zones passed from the video change degree analysis unit 107, and stores the divided moving image data in the storage device 101. To avoid dividing a topic in the middle of an utterance, each topic division time may be converted, as shown in FIG. 9, to the nearest of the utterance unit times passed from the content change degree analysis unit 104. In FIG. 9, times X1 and X2 are the moving image division times passed from the content change degree analysis unit 104, and times Y1 and Y2 are the utterance unit times closest to times X1 and X2. Changing time X1 to time Y1 and time X2 to time Y2 allows the moving image data to be divided at breaks between utterances.

Next, the operation and effects of the topic division device 100 of the present embodiment will be described. The topic division device 100 includes the content reading unit 102, which extracts an audio part from input moving image data; the content feature amount calculation unit 103, which calculates, from the audio part and at every predetermined time, a feature amount representing semantic information; the content change degree analysis unit 104, which specifies one or more moving image division times based on changes in the feature amount; and the division processing unit 105, which divides the moving image data based on the moving image division times.

With this configuration, the moving image data can be divided according to the semantic content of its audio part, and can therefore be divided topic by topic. For example, dividing moving image data that is made up of several topics, such as a news program, into those topics makes editing and similar work on the moving image data easier.

特に、発話内容の話題の変化を用いることで、音響特徴や画像特徴における変化が乏しくとも動画データを分割できる。 In particular, by using the change of the topic of the utterance content, the moving image data can be divided even if the change in the acoustic feature or the image feature is small.

また、本実施形態のトピック分割装置１００において、内容特徴量計算部１０３は、音声部分における音声波形を入力し、各時刻間における波形を、意味情報を表す特徴量に変換する変換部を有する。内容変化度解析部１０４は、特徴量と時刻とのペアに基づいて動画分割時刻を特定し、分割処理部１０５は、動画データを分割する。 Further, in the topic dividing apparatus 100 of the present embodiment, the content feature amount calculation unit 103 has a conversion unit that inputs a speech waveform in a speech part and converts a waveform between respective times into a feature amount representing semantic information. The content change degree analysis unit 104 specifies the moving image division time based on the pair of the feature amount and the time, and the division processing unit 105 divides the moving image data.

また、本実施形態のトピック分割装置１００において、内容変化度解析部１０４は、特徴量と時刻とのペアに基づいて、時系列上での特徴量の変化から話題の変化した時刻を検出して、動画分割時刻として特定する時刻検出部を含む。 Further, in the topic division device 100 of the present embodiment, the content change degree analysis unit 104 includes a time detection unit that, based on the pairs of feature amount and time, detects the time at which the topic changed from the change in the feature amount over the time series and specifies that time as a moving image division time.
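As a rough illustration of the time detection unit, the following Python sketch scans (time, feature) pairs and reports the times at which consecutive feature vectors differ strongly. Treating the feature amount as a numeric vector, measuring change by cosine distance, and using a fixed threshold are all assumptions; this passage does not fix a particular distance measure.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 0.0 is returned if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def detect_change_times(pairs, threshold):
    """Return (time, change) for each time whose feature differs from the previous one.

    pairs: list of (time, feature_vector) sorted by time.
    threshold: hypothetical minimum cosine distance counted as a topic change.
    """
    changes = []
    for (_, f_prev), (t_cur, f_cur) in zip(pairs, pairs[1:]):
        change = cosine_distance(f_prev, f_cur)
        if change >= threshold:
            changes.append((t_cur, change))
    return changes


if __name__ == "__main__":
    pairs = [(0, [1, 0]), (10, [1, 0]), (20, [0, 1]), (30, [0, 1])]
    print(detect_change_times(pairs, threshold=0.5))
```

Each returned entry carries the amount of change as well as the time, so that a later step can choose between candidates that lie close together.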

また、本実施形態のトピック分割装置１００において、内容変化度解析部１０４は、特徴量の変化量が所定条件を満たすときの動画分割時刻が複数ある場合に、動画分割時刻における時間差が所定値以下である場合には、その変化量に基づいていずれかの動画分割時刻を選択する。例えば、所定条件とは、変化量が所定値以上である、または所定の順位内の変化量とする。 Further, in the topic division device 100 of the present embodiment, when there are a plurality of moving image division times at which the amount of change in the feature amount satisfies a predetermined condition and the time difference between those moving image division times is equal to or less than a predetermined value, the content change degree analysis unit 104 selects one of those moving image division times based on the amount of change. For example, the predetermined condition is that the amount of change is equal to or greater than a predetermined value, or that the amount of change ranks within a predetermined rank.
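One possible reading of this selection rule is sketched below in Python. The greedy strategy (keep the candidate with the largest change and drop any other candidate within `min_gap` of an already kept time) is an assumption, as are the function and parameter names; the description only states that one of the close candidates is selected based on the amount of change.

```python
def top_ranked(candidates, rank):
    """Keep the `rank` candidates with the largest change (the rank-based condition)."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:rank]


def select_split_times(candidates, min_gap):
    """Choose one division time from each group of candidates that lie close together.

    candidates: list of (time, change) that already satisfy the predetermined condition.
    min_gap: hypothetical time difference at or below which two candidates conflict.
    """
    kept = []
    # Visit candidates from the largest change to the smallest.
    for t, _ in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(t - k) > min_gap for k in kept):
            kept.append(t)
    return sorted(kept)


if __name__ == "__main__":
    candidates = [(20, 0.9), (23, 0.6), (60, 0.7)]
    print(select_split_times(candidates, min_gap=5))
```

Here the candidate at time 23 is dropped because it lies within 5 seconds of the stronger candidate at time 20.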

また、本実施形態のトピック分割装置１００および他の実施形態のトピック分割装置１００ａにおいて、分割処理部１０５（１０５ａ）は、音声部分における発話の切れ目に基づいて、動画分割時刻を調整して分割する。この構成により、発話の途中で分割することを防止することができる。 Further, in the topic division device 100 of the present embodiment and the topic division device 100a of the other embodiment, the division processing unit 105 (105a) adjusts the moving image division time based on breaks between utterances in the audio part and then performs the division. This configuration prevents the moving image data from being divided in the middle of an utterance.

また、他の本実施形態におけるトピック分割装置１００ａは、入力した動画データから映像部分を抽出するコンテンツ読込部１０２と、映像部分から安定領域を検出する安定領域検出部１０６と、安定領域の映像部分の変化を解析する映像変化度解析部１０７と、を備え、分割処理部１０５ａは、動画分割時刻に加えて、映像部分の変化に基づいて、動画データを分割する。 The topic division device 100a according to the other embodiment includes a content reading unit 102 that extracts a video portion from input moving image data, a stable area detection unit 106 that detects a stable area from the video portion, and a video change degree analysis unit 107 that analyzes a change in the video portion of the stable area. The division processing unit 105a divides the moving image data based on the change in the video portion in addition to the moving image division time.

この構成により、音声部分と映像部分とから動画データを分割することができる。特に映像部分には、安定領域を含むことが多く、この安定領域は、内容に連動して変化する場合が多い。従って、内容に則した動画分割時刻を特定し、それに基づいた分割を行うことができる。 With this configuration, the moving image data can be divided from the audio part and the video part. In particular, the video portion often includes a stable region, and the stable region often changes in accordance with the content. Therefore, it is possible to specify the moving image division time according to the content and perform the division based on the time.

また、他の実施形態におけるトピック分割装置１００ａにおいて、安定領域検出部１０６は、安定領域における各画素の時系列における変化を参照し、話題が変化していない箇所では画素値の変化の少ない領域を安定領域として検出する。 Further, in the topic division device 100a according to the other embodiment, the stable area detection unit 106 refers to the time-series change of each pixel in the stable area and detects, as a stable area, an area in which pixel values change little at places where the topic has not changed.

この構成により、ニュース番組などのテロップなどの安定領域を正確に検出することができる。 With this configuration, a stable area such as a telop of a news program can be accurately detected.
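The per-pixel check used for stable area detection can be sketched as follows. This is a simplified illustration that assumes grayscale frames given as nested lists, sampled from a span in which the topic does not change, and that judges stability by the range of each pixel's values; the `tolerance` parameter and the range criterion are assumptions, not details from the description.

```python
def stable_mask(frames, tolerance):
    """Mark pixels whose values stay within `tolerance` across all given frames.

    frames: non-empty list of equally sized 2D lists of grayscale values.
    Returns a 2D list of booleans; True means the pixel is treated as stable.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [frame[y][x] for frame in frames]
            # A small range over time suggests static content such as a telop.
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask


if __name__ == "__main__":
    frames = [
        [[10, 200], [50, 0]],
        [[11, 90], [50, 255]],
        [[10, 30], [51, 128]],
    ]
    print(stable_mask(frames, tolerance=2))
```

In this toy input the left column barely changes over the three frames and is marked stable, while the right column varies widely and is not.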

また、他の実施形態におけるトピック分割装置１００ａにおいて、映像変化度解析部１０７は、安定領域として検出された箇所が変化した一または複数の時間帯を分割可能時間帯として検出し、分割処理部１０５ａは、検出した分割可能時間帯および動画分割時刻に基づいて、動画データを分割する。 Further, in the topic division device 100a according to the other embodiment, the video change degree analysis unit 107 detects, as dividable time zones, one or more time zones in which the location detected as the stable area has changed, and the division processing unit 105a divides the moving image data based on the detected dividable time zones and the moving image division times.

この構成により、内容に則した動画分割時刻を正確に特定し、それに基づいた分割を行うことができる。 With this configuration, it is possible to accurately specify a moving image division time in accordance with the content, and perform division based on the time.
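The detection of dividable time zones can be illustrated with a short Python sketch. It assumes the stable area has already been cropped to a flat, non-empty list of grayscale values per sampled time, measures change by the mean absolute difference between consecutive samples, and merges adjacent changing intervals into one zone. The threshold and all names are hypothetical.

```python
def dividable_windows(samples, threshold):
    """Return (start, end) time zones in which the stable area content changed.

    samples: list of (time, pixels) sorted by time, where pixels is a non-empty
             flat list of grayscale values taken from the stable area.
    threshold: hypothetical mean absolute difference above which the area has changed.
    """
    windows = []
    for (t_prev, p_prev), (t_cur, p_cur) in zip(samples, samples[1:]):
        diff = sum(abs(a - b) for a, b in zip(p_prev, p_cur)) / len(p_cur)
        if diff > threshold:
            if windows and windows[-1][1] == t_prev:
                # Extend the previous zone when the change continues without a gap.
                windows[-1] = (windows[-1][0], t_cur)
            else:
                windows.append((t_prev, t_cur))
    return windows


if __name__ == "__main__":
    samples = [
        (0, [10, 10]), (1, [10, 11]), (2, [200, 180]),
        (3, [90, 20]), (4, [90, 20]), (5, [0, 0]),
    ]
    print(dividable_windows(samples, threshold=20))
```

The resulting zones have the same (start, end) form that a later filtering step could use to decide which moving image division times are acceptable.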

なお、上記実施形態の説明に用いたブロック図は、機能単位のブロックを示している。これらの機能ブロック（構成部）は、ハードウェア及びソフトウェアの少なくとも一方の任意の組み合わせによって実現される。また、各機能ブロックの実現方法は特に限定されない。すなわち、各機能ブロックは、物理的又は論理的に結合した１つの装置を用いて実現されてもよいし、物理的又は論理的に分離した２つ以上の装置を直接的又は間接的に（例えば、有線、無線などを用いて）接続し、これら複数の装置を用いて実現されてもよい。機能ブロックは、上記１つの装置又は上記複数の装置にソフトウェアを組み合わせて実現されてもよい。 Note that the block diagram used in the description of the above-described embodiment shows blocks in functional units. These functional blocks (components) are realized by an arbitrary combination of at least one of hardware and software. In addition, a method of implementing each functional block is not particularly limited. That is, each functional block may be realized using one device physically or logically coupled, or directly or indirectly (for example, two or more devices physically or logically separated from each other). , Wired, wireless, etc.), and may be implemented using these multiple devices. The functional block may be realized by combining one device or the plurality of devices with software.

機能には、判断、決定、判定、計算、算出、処理、導出、調査、探索、確認、受信、送信、出力、アクセス、解決、選択、選定、確立、比較、想定、期待、見做し、報知（broadcasting）、通知（notifying）、通信（communicating）、転送（forwarding）、構成（configuring）、再構成（reconfiguring）、割り当て（allocating、mapping）、割り振り（assigning）などがあるが、これらに限られない。たとえば、送信を機能させる機能ブロック（構成部）は、送信部（transmitting unit）や送信機（transmitter）と呼称される。いずれも、上述したとおり、実現方法は特に限定されない。 Functions include, but are not limited to, judging, deciding, determining, calculating, computing, processing, deriving, investigating, searching, confirming, receiving, transmitting, outputting, accessing, resolving, selecting, choosing, establishing, comparing, assuming, expecting, deeming, broadcasting, notifying, communicating, forwarding, configuring, reconfiguring, allocating (mapping), and assigning. For example, a functional block (component) that performs transmission is called a transmitting unit or a transmitter. In any case, as described above, the method of implementation is not particularly limited.

例えば、本開示の一実施の形態におけるトピック分割装置１００（１００ａ）などは、本開示の無線通信方法の処理を行うコンピュータとして機能してもよい。図１０は、本開示の一実施の形態に係るトピック分割装置１００（１００ａ）のハードウェア構成の一例を示す図である。上述のトピック分割装置１００（１００ａ）は、物理的には、プロセッサ１００１、メモリ１００２、ストレージ１００３、通信装置１００４、入力装置１００５、出力装置１００６、バス１００７などを含むコンピュータ装置として構成されてもよい。 For example, the topic division device 100 (100a) or the like according to an embodiment of the present disclosure may function as a computer that performs processing of the wireless communication method according to the present disclosure. FIG. 10 is a diagram illustrating an example of a hardware configuration of the topic division device 100 (100a) according to an embodiment of the present disclosure. The above-described topic division device 100 (100a) may be physically configured as a computer device including a processor 1001, a memory 1002, a storage 1003, a communication device 1004, an input device 1005, an output device 1006, a bus 1007, and the like. .

なお、以下の説明では、「装置」という文言は、回路、デバイス、ユニットなどに読み替えることができる。トピック分割装置１００（１００ａ）のハードウェア構成は、図に示した各装置を１つ又は複数含むように構成されてもよいし、一部の装置を含まずに構成されてもよい。 In the following description, the term “apparatus” can be read as a circuit, a device, a unit, or the like. The hardware configuration of the topic division device 100 (100a) may be configured to include one or more devices shown in the drawing, or may be configured without including some devices.

トピック分割装置１００（１００ａ）における各機能は、プロセッサ１００１、メモリ１００２などのハードウェア上に所定のソフトウェア（プログラム）を読み込ませることによって、プロセッサ１００１が演算を行い、通信装置１００４による通信を制御したり、メモリ１００２及びストレージ１００３におけるデータの読み出し及び書き込みの少なくとも一方を制御したりすることによって実現される。 Each function of the topic division device 100 (100a) is realized by loading predetermined software (a program) onto hardware such as the processor 1001 and the memory 1002, so that the processor 1001 performs computation and controls communication by the communication device 1004 or controls at least one of reading and writing of data in the memory 1002 and the storage 1003.

プロセッサ１００１は、例えば、オペレーティングシステムを動作させてコンピュータ全体を制御する。プロセッサ１００１は、周辺装置とのインターフェース、制御装置、演算装置、レジスタなどを含む中央処理装置（ＣＰＵ：Central Processing Unit）によって構成されてもよい。例えば、上述の内容特徴量計算部１０３、内容変化度解析部１０４などは、プロセッサ１００１によって実現されてもよい。 The processor 1001 controls the entire computer by operating an operating system, for example. The processor 1001 may be configured by a central processing unit (CPU) including an interface with a peripheral device, a control device, an arithmetic device, a register, and the like. For example, the content feature amount calculation unit 103 and the content change degree analysis unit 104 described above may be realized by the processor 1001.

また、プロセッサ１００１は、プログラム（プログラムコード）、ソフトウェアモジュール、データなどを、ストレージ１００３及び通信装置１００４の少なくとも一方からメモリ１００２に読み出し、これらに従って各種の処理を実行する。プログラムとしては、上述の実施の形態において説明した動作の少なくとも一部をコンピュータに実行させるプログラムが用いられる。例えば、トピック分割装置１００（１００ａ）の内容特徴量計算部１０３等は、メモリ１００２に格納され、プロセッサ１００１において動作する制御プログラムによって実現されてもよく、他の機能ブロックについても同様に実現されてもよい。上述の各種処理は、１つのプロセッサ１００１によって実行される旨を説明してきたが、２以上のプロセッサ１００１により同時又は逐次に実行されてもよい。プロセッサ１００１は、１以上のチップによって実装されてもよい。なお、プログラムは、電気通信回線を介してネットワークから送信されても良い。 In addition, the processor 1001 reads a program (program code), a software module, data, and the like from at least one of the storage 1003 and the communication device 1004 into the memory 1002, and executes various processes in accordance with them. As the program, a program that causes a computer to execute at least part of the operations described in the above embodiment is used. For example, the content feature amount calculation unit 103 and the like of the topic division device 100 (100a) may be realized by a control program that is stored in the memory 1002 and runs on the processor 1001, and the other functional blocks may be realized in the same way. Although the various processes described above have been described as being executed by one processor 1001, they may be executed simultaneously or sequentially by two or more processors 1001. The processor 1001 may be implemented by one or more chips. Note that the program may be transmitted from a network via a telecommunication line.

メモリ１００２は、コンピュータ読み取り可能な記録媒体であり、例えば、ＲＯＭ（Read Only Memory）、ＥＰＲＯＭ（Erasable Programmable ＲＯＭ）、ＥＥＰＲＯＭ（Electrically Erasable Programmable ＲＯＭ）、ＲＡＭ（Random Access Memory）などの少なくとも１つによって構成されてもよい。メモリ１００２は、レジスタ、キャッシュ、メインメモリ（主記憶装置）などと呼ばれてもよい。メモリ１００２は、本開示の一実施の形態に係る無線通信方法を実施するために実行可能なプログラム（プログラムコード）、ソフトウェアモジュールなどを保存することができる。 The memory 1002 is a computer-readable recording medium and may be configured by, for example, at least one of a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a RAM (Random Access Memory), and the like. The memory 1002 may be called a register, a cache, a main memory (main storage device), or the like. The memory 1002 can store a program (program code), a software module, and the like that can be executed to carry out the wireless communication method according to an embodiment of the present disclosure.

ストレージ１００３は、コンピュータ読み取り可能な記録媒体であり、例えば、ＣＤ−ＲＯＭ（Compact Disc ＲＯＭ）などの光ディスク、ハードディスクドライブ、フレキシブルディスク、光磁気ディスク(例えば、コンパクトディスク、デジタル多用途ディスク、Ｂｌｕ−ｒａｙ（登録商標）ディスク)、スマートカード、フラッシュメモリ(例えば、カード、スティック、キードライブ)、フロッピー（登録商標）ディスク、磁気ストリップなどの少なくとも１つによって構成されてもよい。ストレージ１００３は、補助記憶装置と呼ばれてもよい。上述の記憶媒体は、例えば、メモリ１００２及びストレージ１００３の少なくとも一方を含むデータベース、サーバその他の適切な媒体であってもよい。 The storage 1003 is a computer-readable recording medium such as an optical disk such as a CD-ROM (Compact Disc ROM), a hard disk drive, a flexible disk, and a magneto-optical disk (eg, a compact disk, a digital versatile disk, a Blu-ray). (Registered trademark) disk, a smart card, a flash memory (eg, card, stick, key drive), a floppy (registered trademark) disk, a magnetic strip, or the like. The storage 1003 may be called an auxiliary storage device. The above-described storage medium may be, for example, a database including at least one of the memory 1002 and the storage 1003, a server, or another appropriate medium.

通信装置１００４は、有線ネットワーク及び無線ネットワークの少なくとも一方を介してコンピュータ間の通信を行うためのハードウェア（送受信デバイス）であり、例えばネットワークデバイス、ネットワークコントローラ、ネットワークカード、通信モジュールなどともいう。通信装置１００４は、例えば周波数分割複信（ＦＤＤ：Frequency Division Duplex）及び時分割複信（ＴＤＤ：Time Division Duplex）の少なくとも一方を実現するために、高周波スイッチ、デュプレクサ、フィルタ、周波数シンセサイザなどを含んで構成されてもよい。 The communication device 1004 is hardware (a transmission/reception device) for performing communication between computers via at least one of a wired network and a wireless network, and is also referred to as, for example, a network device, a network controller, a network card, or a communication module. The communication device 1004 may include a high-frequency switch, a duplexer, a filter, a frequency synthesizer, and the like in order to realize, for example, at least one of frequency division duplex (FDD) and time division duplex (TDD).

入力装置１００５は、外部からの入力を受け付ける入力デバイス（例えば、キーボード、マウス、マイクロフォン、スイッチ、ボタン、センサなど）である。出力装置１００６は、外部への出力を実施する出力デバイス（例えば、ディスプレイ、スピーカー、LEDランプなど）である。なお、入力装置１００５及び出力装置１００６は、一体となった構成（例えば、タッチパネル）であってもよい。 The input device 1005 is an input device that receives an external input (for example, a keyboard, a mouse, a microphone, a switch, a button, a sensor, and the like). The output device 1006 is an output device that performs output to the outside (for example, a display, a speaker, an LED lamp, and the like). Note that the input device 1005 and the output device 1006 may have an integrated configuration (for example, a touch panel).

また、プロセッサ１００１、メモリ１００２などの各装置は、情報を通信するためのバス１００７によって接続される。バス１００７は、単一のバスを用いて構成されてもよいし、装置間ごとに異なるバスを用いて構成されてもよい。 Each device such as the processor 1001 and the memory 1002 is connected by a bus 1007 for communicating information. The bus 1007 may be configured using a single bus, or may be configured using a different bus for each device.

また、トピック分割装置１００（１００ａ）は、マイクロプロセッサ、デジタル信号プロセッサ（ＤＳＰ：Digital Signal Processor）、ＡＳＩＣ（Application Specific Integrated Circuit）、ＰＬＤ（Programmable Logic Device）、ＦＰＧＡ（Field Programmable Gate Array）などのハードウェアを含んで構成されてもよく、当該ハードウェアにより、各機能ブロックの一部又は全てが実現されてもよい。例えば、プロセッサ１００１は、これらのハードウェアの少なくとも１つを用いて実装されてもよい。 The topic dividing device 100 (100a) includes hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), and an FPGA (Field Programmable Gate Array). It may be configured to include hardware, and some or all of the functional blocks may be realized by the hardware. For example, the processor 1001 may be implemented using at least one of these pieces of hardware.

情報の通知は、本開示において説明した態様／実施形態に限られず、他の方法を用いて行われてもよい。例えば、情報の通知は、物理レイヤシグナリング（例えば、ＤＣＩ（Downlink Control Information）、ＵＣＩ（Uplink Control Information））、上位レイヤシグナリング（例えば、ＲＲＣ（Radio Resource Control）シグナリング、ＭＡＣ（Medium Access Control）シグナリング、報知情報（ＭＩＢ（Master Information Block）、ＳＩＢ（System Information Block）））、その他の信号又はこれらの組み合わせによって実施されてもよい。また、ＲＲＣシグナリングは、ＲＲＣメッセージと呼ばれてもよく、例えば、ＲＲＣ接続セットアップ（RRC Connection Setup）メッセージ、ＲＲＣ接続再構成（RRC Connection Reconfiguration）メッセージなどであってもよい。 The notification of information is not limited to the aspects / embodiments described in the present disclosure, and may be performed using another method. For example, the information is notified by physical layer signaling (for example, DCI (Downlink Control Information), UCI (Uplink Control Information)), higher layer signaling (for example, RRC (Radio Resource Control) signaling, MAC (Medium Access Control) signaling, Broadcast information (MIB (Master Information Block), SIB (System Information Block))), other signals, or a combination thereof may be used. Further, the RRC signaling may be called an RRC message, and may be, for example, an RRC connection setup message, an RRC connection reconfiguration message, or the like.

本開示において説明した各態様／実施形態は、ＬＴＥ（Long Term Evolution）、ＬＴＥ−Ａ（LTE-Advanced）、ＳＵＰＥＲ３Ｇ、ＩＭＴ−Ａｄｖａｎｃｅｄ、４Ｇ（4th generation mobile communication system）、５Ｇ（5th generation mobile communication system）、ＦＲＡ（Future Radio Access）、ＮＲ（new Radio）、Ｗ−ＣＤＭＡ（登録商標）、ＧＳＭ（登録商標）、ＣＤＭＡ２０００、ＵＭＢ（Ultra Mobile Broadband）、ＩＥＥＥ８０２．１１（Ｗｉ−Ｆｉ（登録商標））、ＩＥＥＥ８０２．１６（ＷｉＭＡＸ（登録商標））、ＩＥＥＥ８０２．２０、ＵＷＢ（Ultra-WideBand）、Ｂｌｕｅｔｏｏｔｈ（登録商標）、その他の適切なシステムを利用するシステム及びこれらに基づいて拡張された次世代システムの少なくとも一つに適用されてもよい。また、複数のシステムが組み合わされて（例えば、ＬＴＥ及びＬＴＥ−Ａの少なくとも一方と５Ｇとの組み合わせ等）適用されてもよい。 Each aspect / embodiment described in the present disclosure is applicable to LTE (Long Term Evolution), LTE-A (LTE-Advanced), SUPER 3G, IMT-Advanced, 4G (4th generation mobile communication system), 5G (5th generation mobile communication system). system), FRA (Future Radio Access), NR (new Radio), W-CDMA (registered trademark), GSM (registered trademark), CDMA2000, UMB (Ultra Mobile Broadband), IEEE 802.11 (Wi-Fi (registered trademark) )), Systems utilizing IEEE 802.16 (WiMAX®), IEEE 802.20, UWB (Ultra-WideBand), Bluetooth (registered trademark), and other suitable systems, and extensions based thereon. It may be applied to at least one of the next generation systems. Further, a plurality of systems may be combined (for example, a combination of at least one of LTE and LTE-A with 5G) and applied.

本開示において説明した各態様／実施形態の処理手順、シーケンス、フローチャートなどは、矛盾の無い限り、順序を入れ替えてもよい。例えば、本開示において説明した方法については、例示的な順序を用いて様々なステップの要素を提示しており、提示した特定の順序に限定されない。 The processing procedure, sequence, flowchart, and the like of each aspect / embodiment described in the present disclosure may be rearranged as long as there is no inconsistency. For example, for the methods described in this disclosure, elements of various steps are presented in an exemplary order, and are not limited to the specific order presented.

本開示で使用する「判断(determining)」、「決定(determining)」という用語は、多種多様な動作を包含する場合がある。「判断」、「決定」は、例えば、判定(judging)、計算(calculating)、算出(computing)、処理(processing)、導出(deriving)、調査(investigating)、探索(looking up、search、inquiry)（例えば、テーブル、データベース又は別のデータ構造での探索）、確認(ascertaining)した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、受信(receiving)（例えば、情報を受信すること）、送信(transmitting)(例えば、情報を送信すること)、入力(input)、出力(output)、アクセス(accessing)（例えば、メモリ中のデータにアクセスすること）した事を「判断」「決定」したとみなす事などを含み得る。また、「判断」、「決定」は、解決(resolving)、選択(selecting)、選定(choosing)、確立(establishing)、比較(comparing)などした事を「判断」「決定」したとみなす事を含み得る。つまり、「判断」「決定」は、何らかの動作を「判断」「決定」したとみなす事を含み得る。また、「判断（決定）」は、「想定する（assuming）」、「期待する（expecting）」、「みなす（considering）」などで読み替えられてもよい。 The terms "judging (determining)" and "deciding (determining)" as used in the present disclosure may encompass a wide variety of operations. "Judging" and "deciding" may include, for example, regarding judging, calculating, computing, processing, deriving, investigating, looking up (searching, inquiring; for example, searching in a table, a database, or another data structure), or ascertaining as having "judged" or "decided". "Judging" and "deciding" may also include regarding receiving (for example, receiving information), transmitting (for example, transmitting information), input, output, or accessing (for example, accessing data in a memory) as having "judged" or "decided". Further, "judging" and "deciding" may include regarding resolving, selecting, choosing, establishing, comparing, and the like as having "judged" or "decided". In other words, "judging" and "deciding" may include regarding some operation as having "judged" or "decided". In addition, "judging (deciding)" may be read as "assuming", "expecting", "considering", or the like.

本開示において使用する「に基づいて」という記載は、別段に明記されていない限り、「のみに基づいて」を意味しない。言い換えれば、「に基づいて」という記載は、「のみに基づいて」と「に少なくとも基づいて」の両方を意味する。 The phrase "based on" as used in the present disclosure does not mean "based solely on" unless stated otherwise. In other words, the phrase "based on" means both "based only on" and "based at least on."

本開示において、「含む（include）」、「含んでいる（including）」及びそれらの変形が使用されている場合、これらの用語は、用語「備える（comprising）」と同様に、包括的であることが意図される。さらに、本開示において使用されている用語「又は（or）」は、排他的論理和ではないことが意図される。 Where the terms "include", "including", and variations thereof are used in the present disclosure, these terms are intended to be inclusive in the same way as the term "comprising". Further, the term "or" as used in the present disclosure is intended not to be an exclusive or.

本開示において、例えば、英語でのa, an及びtheのように、翻訳により冠詞が追加された場合、本開示は、これらの冠詞の後に続く名詞が複数形であることを含んでもよい。 In the present disclosure, where articles are added by translation, for example, a, an and the in English, the present disclosure may include that the nouns following these articles are plural.

１００ …トピック分割装置、１００ａ…トピック分割装置、１０１…蓄積装置、１０２…コンテンツ読込部、１０３…内容特徴量計算部、１０４…内容変化度解析部、１０５…分割処理部、１０５ａ…分割処理部、１０６…安定領域検出部、１０７…映像変化度解析部。

100: Topic division device, 100a: Topic division device, 101: Storage device, 102: Content reading unit, 103: Content feature amount calculation unit, 104: Content change degree analysis unit, 105: Division processing unit, 105a: Division processing unit, 106: Stable area detection unit, 107: Video change degree analysis unit.

Claims

A topic division device comprising:
an audio extraction unit that extracts an audio part from input moving image data;
a calculation unit that calculates, from the audio part, a feature amount representing semantic information at each predetermined time;
a specifying unit that specifies one or more moving image division times based on a change in the feature amount; and
a division unit that divides the moving image data based on the moving image division times.

The topic division device according to claim 1, wherein
the calculation unit includes a conversion unit that receives a speech waveform of the audio part and converts the waveform between respective times into a feature amount representing semantic information,
the specifying unit specifies the moving image division time based on pairs of feature amount and time, and
the division unit divides the moving image data.

The topic division device according to claim 1, wherein, when there are a plurality of moving image division times at which the amount of change in the feature amount satisfies a predetermined condition and the time difference between those moving image division times is equal to or less than a predetermined value, the specifying unit selects one of the moving image division times based on the amount of change.

The topic division device according to any one of claims 1 to 3, wherein the division unit adjusts a moving image division time based on a break between utterances in an audio part and performs division.

The topic division device according to claim 1, further comprising:
a video extraction unit that extracts a video portion from the input moving image data;
an area detection unit that detects a stable area from the video portion; and
an analysis unit that analyzes a change in the video portion of the stable area,
wherein the division unit divides the moving image data based on the change in the video portion in addition to the moving image division time.

The topic division device according to claim 5, wherein the area detection unit refers to the time-series change of each pixel in the stable area and detects, as a stable area, an area in which pixel values change little at places where the topic has not changed.

The topic division device according to claim 5, wherein
the analysis unit detects, as dividable time zones, one or more time zones in which the location detected as the stable area has changed, and
the division unit divides the moving image data based on the dividable time zones and the moving image division time.