JP2021164094A

JP2021164094A - Recording device and recording system using it

Info

Publication number: JP2021164094A
Application number: JP2020065858A
Authority: JP
Inventors: Yuki Yamazaki; Futoshi Yamazaki; Minoru Kato; Shintaro Suzumura; Noriyuki Kijima
Original assignee: Maxell Ltd
Current assignee: Maxell Ltd
Priority date: 2020-04-01
Filing date: 2020-04-01
Publication date: 2021-10-11

Abstract

[Problem] To provide a recording device, and a recording system using it, that links indexes derived from video signals and audio signals so that the indexes are easier to use. [Solution] A recording device records an input video and indexes in the input video. The indexes are an annotation index, which uses the video signal of the input video, and an audio index, which uses the audio signal accompanying the input video. When the generation periods of an audio index and an annotation index overlap, the two are associated with each other, and when either of the associated indexes is selected, a time corresponding to the start time of the generation period of the selected index is recorded. [Selected drawing] FIG. 11

Description

The present invention relates to a recording device that records a video and indexes in that video.

Indexes used to search for or play back a specific scene in a recorded video are conventionally known.

Such an index is generated from the video signal, based on the color, shape, and movement of subjects in the video as well as on the people appearing in it, motion events, and so on, and is used in a form attached to the video.

Patent Document 1 describes background art for generating an index from an audio signal. It discloses a device that uses an audio signal to automatically generate indexes corresponding to various characteristic occurrences during an event. The device comprises analysis units that analyze the power level and the frequency of the audio data, feature extraction units that extract features from each analysis result, and a section that generates and outputs an index from the extracted features.

Patent Document 1: Japanese Unexamined Patent Publication No. 2001-143451

The indexes described above, generated from video signals or audio signals, are each assumed to be used on their own, and linking them has not been considered. Usability, such as operability when using the indexes, has not been considered either.

The present invention has been made in view of these problems, and its object is to link indexes derived from video signals and audio signals so that the indexes are easier to use.

To give one example, the present invention is a recording device that records an input video and indexes in the input video. The indexes are an annotation index, which uses the video signal of the input video, and an audio index, which uses the audio signal accompanying the input video. When the generation periods of an audio index and an annotation index overlap, the two are associated with each other, and when either of the associated indexes is selected, a time corresponding to the start time of the generation period of the selected index is recorded.

According to the present invention, it is possible to provide a recording device, and a recording system using it, that links indexes derived from video signals and audio signals so that the indexes are easier to use.

FIG. 1 is a configuration diagram of a recording system using the recording device of Example 1.
FIG. 2 is a functional block diagram of the recording device of Example 1.
FIG. 3 is a diagram explaining how annotation indexes are determined in Example 1.
FIG. 4 is a diagram explaining the display format of the annotation index in Example 1.
FIG. 5 is a diagram showing a display layout of the annotation index and the audio index in Example 1.
FIG. 6 is a diagram showing another display layout of the annotation index and the audio index in Example 1.
FIG. 7 is a diagram explaining the display format of the displayed audio index in Example 1.
FIG. 8 is a flowchart of the audio index creation process in Example 1.
FIG. 9 is an explanatory diagram of the link between audio indexes and annotation indexes in Example 1.
FIG. 10 is a diagram explaining the display format of the annotation index in Example 1.
FIG. 11 is a diagram explaining the display format of the link between audio indexes and annotation indexes in Example 1.
FIG. 12 is a diagram explaining the display format of the link in Example 1 when the difference between the generation times of the audio index and the annotation index is very small.
FIG. 13 is a diagram explaining the display format of the link between audio indexes and annotation indexes in Example 1.
FIG. 14 is a conceptual diagram explaining the recording of the input video and the storage structure of the annotation index and the audio index in Example 1.
FIG. 15 is a data structure diagram of the audio index in Example 1.
FIG. 16 is a conceptual diagram explaining playback of the input video on the recording device using the audio index in Example 2.
FIG. 17 is a conceptual diagram explaining the integrated index in Example 3.

Examples are described below with reference to the drawings.

FIG. 1 is a configuration diagram of a recording system using the recording device of this embodiment. The recording system in FIG. 1 is assumed to be, for example, a lecture recording system that records video of lectures at a school so that students can review the recorded lecture content later.

In FIG. 1, reference numeral 10 denotes the recording device of this embodiment. Connected to it are video acquisition devices such as a projector 20 and a camera 30, which supply the video to be recorded, and an audio acquisition device such as a microphone 40, which supplies the audio accompanying that video. The projector 20 has a function for adding supplementary information, such as handwritten characters, lines, and circles, to the displayed image with a touch pen or the like. Because this supplementary information can mark important moments in a lecture, it is hereinafter called annotation information, in the sense of notes or supplementary information. A so-called electronic blackboard with the same function may be used instead of the projector. Reference numeral 50 denotes a data input PC (Personal Computer) for inputting teaching material data into the recording device 10. A document camera or another device may be used instead. Reference numeral 60 denotes a server, and 70 denotes a viewing PC, a viewing device with which students watch the video recorded by the recording device 10 via the server 60.

FIG. 2 is a functional block diagram of the recording device of this embodiment. In FIG. 2, the recording device 10 comprises a video input unit 101, a channel allocation unit 102, an information input unit 103, an annotation processing unit 104, an audio input unit 105, an audio processing unit 106, a video processing unit 107, a recording unit 108, and a video output unit 109. The video processing unit 107 comprises a video compositing unit 110 and an index generation unit 111. Each of these units is controlled by a control unit (not shown).

The video input unit 101 is the input interface for a projector, PC, camera, and the like. It captures video via HDMI (High-Definition Multimedia Interface), VGA (Video Graphics Array), LAN (Local Area Network), SDI (Serial Digital Interface), and so on.

The channel allocation unit 102 assigns the video captured by the video input unit 101 to channels, one channel per video input.

The information input unit 103 receives annotation information for the video. It is connected to a projector, electronic blackboard, or the like via USB (Universal Serial Bus) and receives, for example, input made with a touch pen.

The annotation processing unit 104 renders the captured annotation information. For example, it converts input made with a touch pen into image information.

The audio input unit 105 is the interface that receives audio data from a microphone or the like, to which it is connected via Audio I/O, Bluetooth (registered trademark), or similar.

The audio processing unit 106 processes the captured audio as digital data and converts the speech into text by speech recognition. When the speech recognition load is heavy, the text conversion may instead be performed by an external server connected over a LAN or the like via an external connection terminal.

The video processing unit 107 has the video compositing unit 110, which composites the video received from the channel allocation unit 102 with the rendered annotation information produced by the annotation processing unit 104, and the index generation unit 111. It also has a playback function that reads video recorded in the recording unit 108, described later, and plays it back.

The index generation unit 111 generates the annotation index and the audio index, both described later.

The recording unit 108 is an HDD or the like and records the video composited by the video processing unit 107 together with the indexes generated by the index generation unit 111.

The video output unit 109 is the interface that outputs the playback video signal of video recorded in the recording unit 108 and the video signal from the video processing unit 107. It outputs over a LAN for connection to a server or the like, and over HDMI or VGA to an external display device such as a projector or electronic blackboard.

In hardware, the recording device shown in FIG. 2 is implemented by a device having a processing unit (CPU), storage (memory, HDD, etc.), and an input/output interface (I/F). That is, the processing of each unit in FIG. 2 is carried out by the CPU executing, in software, the corresponding programs stored in memory. Heavy processing such as video processing may be performed by a dedicated signal processor.

Next, the annotation index generated by the index generation unit 111 is described.

An annotation index is a snapshot capturing one moment of the input video. Multiple snapshots containing annotation information are stored together with time information. Selecting a given annotation index allows the recorded video to be played back or searched from the time corresponding to the time of the selected snapshot. Hereinafter, video playback that uses an index is called index playback.

The annotation index is used as follows. First, the recording device 10 starts recording the input video. Recording is started with a button on the body of the recording device 10 or by selecting an icon on the screen of the output video of the recording device 10. Next, multiple snapshots containing annotation information are captured. Capturing here means acquiring an image containing annotation information from the video being input to the recording device 10. For example, when the teacher giving the lecture draws annotation information such as handwritten characters, lines, circles, or other figures on the displayed image with a touch pen or the like, the teacher issues a snapshot instruction by selecting an icon on the screen of the output video of the recording device 10. On receiving the snapshot instruction, the recording device 10 acquires the image at that moment from the input video and records it, together with the time, in the recording unit 108. An annotation index is thereby generated.

To play back using an annotation index, the captured snapshots are displayed on the recording device 10 and the snapshot containing the desired annotation information is selected. The recorded video is then played back from the time corresponding to the creation time of the selected snapshot.

The audio index is an index created for the audio that accompanies the input video from the start to the end of recording on the recording device 10. It is used to play back or search the recorded video from a specific scene.

The annotation index has drawbacks: multiple snapshots containing annotation information must be captured, and only the times at which those snapshots were captured can serve as index points. Moreover, because index playback with an annotation index is referenced to the annotation index generation time, playback may begin partway through the spoken commentary or explanation, so the index playback or search that is actually needed cannot be performed. A created annotation index also cannot be corrected.

This embodiment solves these problems by introducing the audio index and associating annotation indexes with audio indexes. It also makes indexes correctable and improves their visibility and operability. The details of this embodiment are described below.

FIG. 3 explains how annotation indexes are determined in this embodiment. Calling the time at which a snapshot containing annotation information is generated the annotation generation time, FIG. 3 shows annotation generation times A, B, and C for snapshots containing annotation information A, B, and C, respectively.

In FIG. 3, to keep the number of annotation indexes from growing too large, annotations whose generation times are separated by less than a predetermined threshold are treated as the same annotation index. For example, with a threshold of 5 seconds, if the interval between annotation generation times A and B is less than 5 seconds and the interval between B and C is 5 seconds or more, annotation information A and B are combined into a single annotation index 200, and annotation information C becomes a separate annotation index 201.

The annotation index 200, which combines annotation information A and B, has a time span and therefore a start point and an end point. In FIG. 3, an annotation generated after a period of at least the threshold with no annotation generation marks the start point (1) of the first annotation index 200. Annotation generation then continues at intervals within the threshold. Once a period of at least the threshold passes with no annotation generation, the next annotation to be generated marks the start point (3) of the next annotation index 201. The last annotation generation time in the run of annotations spaced within the threshold is the end point (2) of the first annotation index 200. Index playback using the annotation index 200, which combines multiple pieces of annotation information, is referenced to the time of its start point.
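The grouping rule described for FIG. 3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `AnnotationIndex` class, the `group_annotations` function, and the sample times are hypothetical, and only the 5-second threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class AnnotationIndex:
    start: float   # generation time of the first annotation in the group (seconds)
    end: float     # generation time of the last annotation in the group (seconds)
    labels: list   # annotation information combined into this index


def group_annotations(events, threshold=5.0):
    """Group (time, label) events into annotation indexes.

    An event joins the current index when it follows the previous event
    by less than `threshold` seconds; otherwise it starts a new index.
    """
    indexes = []
    for time, label in sorted(events):
        if indexes and time - indexes[-1].end < threshold:
            indexes[-1].end = time
            indexes[-1].labels.append(label)
        else:
            indexes.append(AnnotationIndex(time, time, [label]))
    return indexes


# Hypothetical times: A and B are 3 s apart, B and C are 7 s apart.
groups = group_annotations([(10.0, "A"), (13.0, "B"), (20.0, "C")])
```

With these sample times, A and B fall into one index spanning 10.0 to 13.0 seconds and C forms its own index, matching annotation indexes 200 and 201 in the example above.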

FIG. 4 illustrates the display format of the annotation indexes under the conditions of FIG. 3. As shown in FIG. 4, an annotation index is displayed on the snapshot as a dotted frame around the annotation information. Annotation information A and B are shown together inside one dotted frame as annotation index 200, and annotation information C is shown inside its own dotted frame as annotation index 201.

FIG. 5 shows a display layout of the annotation index and the audio index in this embodiment. When an index search is run on a snapshot containing annotation information, the dotted frames surrounding the annotation information on the snapshot, shown in FIG. 4, are displayed in the video display area 301 of the display screen 300, as shown in FIG. 5.

The audio index is displayed in the audio index display area 302. The text obtained by converting the speech of the selected audio index is displayed in the text display area 303.

FIG. 6 shows another display layout of the annotation index and the audio index in this embodiment. To accommodate a mixed screen that shows both the main video and camera video, the layout in FIG. 6 has a video display area 301 and camera video display areas 304 and 305. As in FIG. 5, the audio index display area 302 and the text display area 303 may be placed above and below the video display area 301 and the camera video display areas 304 and 305. The audio index display area 302 and the text display area 303 may be made customizable. For example, because text fixed at the bottom may be hard to see, the audio index display area 302 may be placed at the bottom and the text display area 303 at the top. When there is no camera video, the camera video display area 304 or 305 may be used as the text display area 303.

FIG. 7 illustrates the display format of the audio index shown in the audio index display area 302 of FIG. 5 and FIG. 6. As shown in FIG. 7, the audio index is displayed as audio blocks 400, separately for each microphone channel 405. For example, with channel CH1 assigned to the teacher and channel CH2 to students or audience members, an audio index is automatically created and displayed for each channel according to the number of connected microphones.

For audio blocks 400, utterances separated by less than a predetermined threshold belong to the same audio index. For example, with a threshold of 5 seconds, utterances less than 5 seconds apart are regarded as one continuous statement and belong to the same audio index. Utterances 5 seconds or more apart become separate audio indexes and are displayed as separate audio blocks.

Within an audio block 400, a point where the speech interval is, for example, less than 2 seconds is highlighted as a period point (shown as a dotted line, by changing the color at each period point within the same block, or by alternating colors in a mottled pattern) and is managed and used as a position at which the editing function can split the block. This is useful, for example, when editing sentence by sentence.

Because the audio index covers the audio accompanying the input video from the start to the end of recording on the recording device 10, the audio index display defaults to showing the whole recording from start to end, with the recording time shown at a fixed display interval 401 such as 5 or 10 minutes. The display interval can be changed, and changing it changes the displayed length of the audio blocks.

When the display interval has been adjusted and the audio index no longer fits in the audio index display area, a scroll display 402 is used.

Selecting an audio index displays its text, and selecting it again starts index playback from that audio index. For index playback from an audio index, a margin is subtracted from the start time of the audio index generation period so that playback begins slightly earlier, and the recorded input video is played back from the time corresponding to that earlier time.
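The margin rule for audio index playback amounts to a small calculation. The sketch below is illustrative only: the function name `playback_start`, the 2-second default margin, and the clamp at the beginning of the recording are assumptions, since the text does not specify a margin value or boundary behavior.

```python
def playback_start(index_start, margin=2.0):
    """Return the playback start time for an audio index.

    The margin (an assumed value here) moves the start earlier than the
    index start so the opening words are not cut off. The result is
    clamped so it never precedes the start of the recording (time 0).
    """
    return max(0.0, index_start - margin)


start_mid_recording = playback_start(30.0)   # index starts 30 s into the recording
start_near_beginning = playback_start(1.0)   # index starts 1 s into the recording
```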

Reference numeral 403 indicates the playback position, which scrolls automatically during playback. Manual scrolling is also possible during automatic scrolling. If no other audio index is selected within a certain period of time, the display returns to the position of the audio index being played and automatic scrolling resumes.

The audio blocks 400 may be color-coded by channel, and their display height or color may change in response to the volume.

A volume level threshold may also be set for each channel so that sound at or below the threshold is not converted into text.

Important parts of an audio block 400, such as frequently occurring words and keywords, may be highlighted by changing the color, blinking, adding a frame, and so on. The same highlighting may be applied when an audio index is linked to an annotation index, as described later, or when a frequently occurring word matches text in the teaching material.

The audio index shown in the audio index display area 302 may be displayed as a list instead of in the audio block format described above. In that case, the list shows the main elements of the audio index data structure, described later.

FIG. 8 is a flowchart of the audio index creation process in this embodiment. In step S81, the recording device 10 starts recording the input video. In step S82, the presence of speech is detected by audio level detection or similar. If there is speech, the process proceeds to step S83. If not, it proceeds to step S87. In step S83, the speech is converted into text by speech recognition. In step S84, period detection is performed, for example by detecting whether the speech interval is less than 2 seconds. In step S85, the continuity of the speech is judged. For example, if the speech interval is less than 5 seconds, the process returns to step S83. If it is 5 seconds or more, the statement is judged to have ended, and in step S86 the time information and text data are saved as one audio index. In step S87, it is determined whether recording of the input video has been stopped on the recording device 10. If not, the process returns to step S82 and creates the next audio index. If recording has been stopped, the process proceeds to step S88 and recording ends.
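The core of the FIG. 8 flow (steps S83 to S86) can be sketched offline over already-recognized speech segments. Everything here is a simplification under stated assumptions: the `AudioIndex` class, `build_audio_indexes`, and the sample segments are hypothetical, the segments are assumed to come from some speech recognizer as time-ordered `(start, end, text)` tuples, and only the 2-second and 5-second example values come from the text. The period-point rule follows the text literally (a pause shorter than 2 seconds).

```python
from dataclasses import dataclass, field

PERIOD_GAP = 2.0  # example value from the text: pauses shorter than this are period points
BLOCK_GAP = 5.0   # example value from the text: pauses this long or longer end the index


@dataclass
class AudioIndex:
    start: float
    end: float
    text: str
    period_points: list = field(default_factory=list)  # candidate split positions


def build_audio_indexes(segments):
    """Build audio indexes from time-ordered (start, end, text) segments."""
    indexes = []
    for start, end, text in segments:
        current = indexes[-1] if indexes else None
        if current is not None and start - current.end < BLOCK_GAP:
            # S85: the statement continues, so extend the current index.
            if start - current.end < PERIOD_GAP:
                # S84: record the pause position as a period point.
                current.period_points.append(current.end)
            current.text += " " + text
            current.end = end
        else:
            # S86 for the previous index is implicit: it stays in the list
            # with its time information and text, and a new index begins.
            indexes.append(AudioIndex(start, end, text))
    return indexes


segments = [
    (0.0, 3.0, "Today we cover sorting."),
    (4.0, 8.0, "First, bubble sort."),
    (11.0, 12.0, "Any questions?"),
    (20.0, 22.0, "Next topic."),
]
audio_indexes = build_audio_indexes(segments)
```

With these sample segments, the first three utterances form one audio index from 0.0 to 12.0 seconds with a period point at 3.0 seconds, and the fourth utterance, which follows an 8-second pause, starts a second audio index.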

FIG. 9 is an explanatory diagram of the link between audio indexes and annotation indexes in this embodiment. In FIG. 9, the horizontal axis is time, and annotation indexes 1 and 2 each combine multiple pieces of annotation information and therefore have a time span. An audio index and an annotation index whose generation periods overlap are defined as linked, and the two are associated with each other. In FIG. 9, annotation index 1 is linked to audio indexes 1 to 5, and annotation index 2 is linked to audio index 5. Annotation indexes 1 and 2 are therefore also linked indirectly through audio index 5. Multiple annotation indexes that are linked to a common audio index in this way are grouped into a unit index.
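The linking rule and the unit index grouping can be sketched as an interval-overlap test followed by a union-find pass. The function names, the 0-based numbering, the sample time spans, and the choice to count touching endpoints as overlapping are all assumptions made for illustration. The sample data is laid out to mirror the FIG. 9 description.

```python
def overlaps(a, b):
    """True when two (start, end) periods overlap (touching endpoints count)."""
    return a[0] <= b[1] and b[0] <= a[1]


def link_indexes(annotations, audios):
    """Map each annotation index number to the audio index numbers it overlaps."""
    return {
        i: [j for j, audio in enumerate(audios) if overlaps(ann, audio)]
        for i, ann in enumerate(annotations)
    }


def unit_indexes(links, n_annotations):
    """Group annotation indexes that share an audio index, directly or transitively."""
    parent = list(range(n_annotations))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_owner = {}  # audio index -> first annotation index linked to it
    for ann, linked_audio in links.items():
        for audio in linked_audio:
            if audio in first_owner:
                parent[find(ann)] = find(first_owner[audio])
            else:
                first_owner[audio] = ann

    groups = {}
    for ann in range(n_annotations):
        groups.setdefault(find(ann), []).append(ann)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical time spans in seconds, numbered from 0.
annotations = [(10, 60), (70, 90)]
audios = [(5, 12), (15, 20), (25, 30), (35, 40), (55, 75)]
links = link_indexes(annotations, audios)
units = unit_indexes(links, len(annotations))
```

Here the first annotation index links to all five audio indexes and the second links only to the last one, so the two annotation indexes end up in a single unit index.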

FIG. 10 illustrates the display format of the annotation indexes under the conditions of FIG. 9. In FIG. 10, the dotted frames 200 and 201 surrounding the annotation information on the snapshot displayed in the video display area 301 are annotation indexes, and the dotted frame 250, drawn offset from them, is displayed as the unit index.

The dotted frames of annotation indexes 200 and 201 and the dotted frame of unit index 250 may be drawn in different colors.

When a unit index is defined, the dotted frames of its annotation indexes are hidden. When the unit index is selected, the dotted frames of the annotation indexes inside it are displayed. The displayed annotation indexes behave as described above.

FIG. 11 illustrates how links between audio indexes and annotation indexes are displayed in this embodiment.

In FIG. 11, when the audio indexes and annotation indexes are related as shown, annotation indexes 1 through 3 form unit index 250. For an audio index linked to an annotation index, an icon 410 representing the audio index, such as a microphone mark, is displayed near the annotation index.

The icon 410 of a linked audio index is positioned as follows. For audio index 1, the pattern that comes earlier in time, the icon is always placed to the left of the left edge of the dotted frame of the annotation index (or unit index). For audio index 2, the pattern that comes later in time, the icon is placed to the right of the left edge of the dotted frame of the annotation index (or unit index).

In addition, the generation time of the annotation index (or unit index) is treated as the width of the dotted frame, and the point at which the link line 420 joining the icon 410 to the annotation index (or unit index) meets the frame is set according to the difference between the generation start times of the audio index and the annotation index. The margin is not included in the audio index generation start time for this purpose.
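One way to picture this mapping is a linear interpolation along the frame width. The sketch below is an assumption made for illustration: the description only says the connection position depends on the difference between the generation start times, so the linear scaling, the clamping to the frame edges, and the function and parameter names are all hypothetical.

```python
def link_connection_x(frame_left, frame_width, ann_start, ann_end, audio_start):
    """Horizontal position where the link line meets the dotted frame.

    The frame width stands for the annotation (or unit) index generation time,
    and audio_start is the audio index start time without any margin.
    """
    duration = ann_end - ann_start
    if duration <= 0:
        return float(frame_left)
    fraction = (audio_start - ann_start) / duration
    # Assumed: audio indexes starting outside the frame's time span
    # connect at the nearest frame edge.
    fraction = min(1.0, max(0.0, fraction))
    return frame_left + frame_width * fraction
```

Under these assumptions, a frame at x = 100 with width 200 that represents 10 s to 20 s would connect an audio index starting at 15 s at x = 200, the middle of the frame.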

In FIG. 11, the video display area 301 of the display screen 300 shows the annotation index or unit index together with the icon 410 representing the audio index. Selecting the icon 410 or the unit index in the video display area 301 displays the text corresponding to the audio index. The dotted frames of the annotation indexes inside the unit index are also displayed.

When the recording device plays back the input video by index, selecting the unit index starts playback from the time corresponding to whichever of the annotation index and the audio index was generated earlier. If that is the audio index, playback includes the margin ((1) in FIG. 11). Selecting the icon 410 representing the audio index starts playback from the time corresponding to the audio index ((2) in FIG. 11). Selecting the annotation index starts playback from the time corresponding to the annotation index ((3) in FIG. 11).
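The three playback cases can be summarized in a short sketch. It makes several assumptions that are not stated in this passage: the margin length (1 second here) is a placeholder, the margin is also applied when the audio icon is selected (in line with claim 4), the start time is clamped at zero, and the function name and selection labels are hypothetical.

```python
def playback_start(selection, annotation_start, audio_start, margin_s=1.0):
    """Return the playback start time in seconds for a selected index.

    selection: "unit", "audio", or "annotation" (cases (1) to (3) of FIG. 11).
    margin_s:  assumed lead-in added before the audio index start.
    """
    audio_with_margin = max(0.0, audio_start - margin_s)
    if selection == "unit":
        # Whichever index was generated earlier; the audio side includes the margin.
        return min(annotation_start, audio_with_margin)
    if selection == "audio":
        return audio_with_margin
    if selection == "annotation":
        return annotation_start
    raise ValueError(f"unknown selection: {selection!r}")
```

For instance, with an annotation index starting at 10 s and an audio index starting at 8 s, selecting the unit index would start playback at 7 s under the assumed 1-second margin.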

FIG. 12 illustrates how links are displayed in this embodiment when the difference between the generation times of an audio index and an annotation index is very small.

In FIG. 12, as in FIG. 9, annotation indexes 1 and 2 each have a time width that groups multiple pieces of annotation information, and an audio index and an annotation index are defined as linked when their generation times overlap. When the difference between their generation times is very small, for example less than 2 seconds, the link is defined as a sublink. A link with an annotation index that is 5 seconds or more apart is called a main link.

In FIG. 12, annotation index 1 and audio index 7 are main-linked, and annotation index 2 and audio index 10 are main-linked. Annotation index 1 is sublinked with audio indexes 6, 8, 9, and 10, and annotation index 2 is sublinked with audio indexes 7 and 9.

FIG. 13 illustrates how links between audio indexes and annotation indexes are displayed under the conditions of FIG. 12. It shows the annotation index or unit index and the icon 410 representing the audio index as displayed in the video display area 301.

When the icon 410 represents audio index 6, audio index 6 is sublinked only with annotation index 1, so the icon 410 is joined by the link line 420 to the dotted frame surrounding annotation index 1.

When the icon 410 represents audio index 7, audio index 7 is main-linked with annotation index 1 and sublinked with annotation index 2, so annotation indexes 1 and 2 form unit index 250, and the icon 410 is joined by the link line 420 to the dotted frame surrounding unit index 250.

When the icon 410 represents audio index 8, audio index 8 is sublinked only with annotation index 1, so the icon 410 is joined by the link line 420 to the dotted frame surrounding annotation index 1.

When the icon 410 represents audio index 9, audio index 9 is sublinked with both annotation indexes 1 and 2, so annotation indexes 1 and 2 form unit index 250, and the icon 410 is joined by the link line 420 to the dotted frame surrounding unit index 250. When there are only sublinks, the slope of the link line 420 is determined by treating the link as one to the index with the earlier generation time.

When the icon 410 represents audio index 10, audio index 10 is sublinked with annotation index 1 and main-linked with annotation index 2, so annotation indexes 1 and 2 form unit index 250, and the icon 410 is joined by the link line 420 to the dotted frame surrounding unit index 250. When main links and sublinks are mixed, the main link is used as the reference. When there are multiple main links, the slope of the link line 420 is determined by treating the link as one to the index with the earlier generation time.
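The rule for choosing which link sets the slope of the link line can be written compactly. The sketch below assumes each link is given as a pair of the annotation index start time and a kind label ("main" or "sub"); the function name and this representation are hypothetical, and how a link is classified as main or sub is left outside the sketch.

```python
def reference_link(links):
    """Pick the link that sets the slope of the link line.

    links: non-empty list of (annotation_start_s, kind), kind "main" or "sub".
    Main links take priority; among candidates, the earliest start wins.
    """
    mains = [link for link in links if link[1] == "main"]
    pool = mains if mains else links
    return min(pool, key=lambda link: link[0])
```

So a sublink at 10 s mixed with a main link at 20 s yields the main link, while two sublinks at 10 s and 20 s yield the one at 10 s.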

The sublink and main link may be distinguished as different degrees of association, for example by changing the color or shape of the link line 420.

FIG. 14 is a conceptual diagram of how the recorded input video and the annotation and audio indexes are stored in this embodiment. The upper part of FIG. 14 shows the conventional storage structure for the main layer, which is the input video, and the annotation layer, which holds the annotation information. Conventionally, the video and the annotation information are composited into a single video when saved, so the annotation information cannot be changed in the recorded video. In addition, if the user forgets to take a snapshot, no annotation index can be defined.

By contrast, the lower part shows the storage structure for the main layer and the annotation layer in this embodiment. For audio, the audio index data and the text data are each stored separately from the audio itself, so an index can be created from the audio index data even when no snapshot was taken during recording. In addition, corrections made with the editing functions for the audio index data and the text data can be reflected in and saved to the respective data, so the index can be corrected.

FIG. 15 shows the data structure of an audio index in this embodiment. In FIG. 15, 450 is the file name, which indicates the date and the recording start time. 451 is the index, which consists of a number giving the order in which the audio index was generated and a branch number indicating division at period points. 452 is the attribute, which indicates the microphone channel, the identity of the speaker, the type of index group, and so on. 453 is the start time, which is the audio start time or a period point start time. 454 is the end time, which is the audio end time or a period point end time. The start time 453 and end time 454 are elapsed times from the recording start time; during continuous playback of an index group, playback jumps from the end time to the start time of the next index. 455 is the coordinates, which give the display position (center coordinates) of the icon 410 representing the audio index and the coordinates at which it connects to the annotation index. 456 is the name of the linked annotation index.
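The fields 450 to 456 map naturally onto a record type. The following is a sketch under stated assumptions: the field names, the types, and the example file name format are hypothetical, since FIG. 15 names the fields but this passage does not specify their encoding. The helper shows the jump from one index end time to the next index start time during continuous playback.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AudioIndexRecord:
    file_name: str                      # 450: date and recording start time
    number: int                         # 451: generation order
    branch: int                         # 451: branch number from period point division
    attribute: str                      # 452: microphone channel, speaker, group type
    start_s: float                      # 453: elapsed seconds from recording start
    end_s: float                        # 454: elapsed seconds from recording start
    icon_xy: Tuple[int, int]            # 455: icon 410 center coordinates
    link_xy: Tuple[int, int]            # 455: connection point on the annotation index
    linked_annotation: Optional[str]    # 456: linked annotation index name, if any


def continuous_playback_jumps(records: List[AudioIndexRecord]):
    """Return (jump_from_s, jump_to_s) pairs for continuous index group playback."""
    ordered = sorted(records, key=lambda r: r.start_s)
    return [(a.end_s, b.start_s) for a, b in zip(ordered, ordered[1:])]
```

Given two records spanning 5 to 12 s and 30 to 41 s, continuous playback would jump from 12 s to 30 s.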

As described above, according to this embodiment, associating annotation indexes with audio indexes makes it possible to search for and play back specific scenes in the recorded video using the audio index, even when no snapshot was taken for an annotation index. In addition, indexes can be corrected, and their visibility and operability can be improved.

This embodiment describes playback of the input video on the recording device using audio indexes.

FIG. 16 is a conceptual diagram of playback of the input video on the recording device using audio indexes in this embodiment.

In FIG. 16, as in FIG. 7, the audio indexes are displayed per microphone channel, each shown as an audio block 400.

Here, frequently occurring words are extracted from the text transcribed for the audio indexes and treated as important words, and an index group is built for these important words. Then, as indicated by 430, the input video is played back on the recording device using period-point-unit indexes that contain an important word. If there is no period point, playback starts from the beginning of the audio block.
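A frequency-based selection of important words might look like the sketch below. It is only an illustration: whitespace tokenization is an assumption that suits space-delimited text (Japanese transcripts would need a morphological analyzer), the cut-off top_n and the stop word set are hypothetical parameters, and entries are assumed to be (start time, text) pairs for period point units.

```python
from collections import Counter


def important_words(texts, top_n=3, stop_words=frozenset()):
    """Return the top_n most frequent words across the transcribed texts."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stop_words
    )
    return [word for word, _ in counts.most_common(top_n)]


def important_word_index(entries, words):
    """Map each important word to the start times of units that contain it.

    entries: list of (start_s, text) pairs, one per period point unit.
    """
    return {
        word: [start for start, text in entries if word in text.lower().split()]
        for word in words
    }
```

Each list of start times in the result corresponds to one index group, from which playback of the matching units could begin.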

Although frequently occurring words are treated as important words above, the important words may instead be identified using a dedicated system.

In addition, a summary index group is built by judging important words, volume, context, and other factors together. Then, as indicated by 440, playback starts from the first index of the audio block, or, when the interval between the beginning of the audio block and the summarized portion is long, from a period point index.

With such index groups, an index group may be selected individually for index playback, or, as continuous playback of an index group, playback may jump automatically to the next index once playback of a period point unit or an audio block unit has finished.

Important words and summaries are determined and created automatically by the recording device, but an index group may also be built from keywords chosen by the user.

As described above, this embodiment improves the operability of index playback using audio indexes.

This embodiment describes an index that integrates audio indexes and annotation indexes.

FIG. 17 is a conceptual diagram of the integrated index in this embodiment. In FIG. 17, the annotation indexes, the audio indexes, and index information from a presentation tool separate from the recording device are displayed with the horizontal axis as the time axis. The audio indexes are displayed per microphone channel, each shown as an audio block 400.

In this embodiment, an integrated index, as indicated by 450, is built by integrating these various kinds of index information. Among the multiple pieces of index information, indexes that occur at almost the same time or only a few seconds apart are treated as approximate indexes 451; these are adjusted by automatically deleting them or turning them OFF, and the integrated index group is built using only the useful indexes.
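The merging and pruning of near-simultaneous indexes can be sketched as follows. The sketch rests on assumptions not given in the text: "a few seconds" is represented by a 3-second tolerance, the earliest index of a cluster is the one kept, indexes are reduced to their start times, and the function and source names are hypothetical.

```python
def integrate_indexes(sources, tolerance_s=3.0):
    """Merge index start times from several sources into one integrated list.

    sources: dict mapping a source name to a list of start times in seconds.
    Returns (kept, disabled); disabled holds the approximate indexes that
    would be deleted or turned OFF.
    """
    merged = sorted((t, name) for name, times in sources.items() for t in times)
    kept, disabled = [], []
    for t, name in merged:
        if kept and t - kept[-1][0] <= tolerance_s:
            # Too close to the last kept index: treat as an approximate index.
            disabled.append((t, name))
        else:
            kept.append((t, name))
    return kept, disabled
```

Returning the disabled entries alongside the kept ones reflects the option of turning approximate indexes OFF rather than deleting them, so they could be re-enabled later.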

By building a digest-like index group made up of all the indexes, including the important word, keyword, and summary index groups described in Embodiment 2, a pseudo automatic video editing function becomes possible. Building this index group requires understanding the content, in particular what was drawn in each annotation index, so AI may be used.

A stereo sensing camera, which calculates distance information from parallax and can capture the imaging range in three dimensions, may also be used. The camera outputs motion information on human behavior, the recording device or a dedicated AI-equipped device analyzes that information, and analysis results such as the characteristic gestures of the subject (for example, a teacher) may be reflected in index generation.

As described above, this embodiment improves the functionality of index playback by using the integrated index.

In FIG. 5, the video display area 301 of the display screen 300 was described as showing, on the snapshot, annotation indexes drawn as dotted frames surrounding annotation information, but what is displayed on the snapshot may be made selectable. For example, a display selection function may be provided with options such as showing all annotation indexes and all audio indexes, showing only those annotation and audio indexes that are linked, showing all annotation indexes only, or showing all audio indexes only.

The audio index display area 302 may also be given a display selection function that, in addition to the above, displays the integrated index and the important word, keyword, summary, and digest index groups.

Embodiments have been described above, but the present invention is not limited to them and includes various modifications. For example, the embodiments above are described in detail to explain the invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to that of a given embodiment. For part of the configuration of each embodiment, other configurations may be added, deleted, or substituted.

10: recording device, 20: projector, 30: camera, 40: microphone, 50: data input PC, 60: server, 70: viewing PC, 101: video input unit, 102: channel allocation unit, 103: information input unit, 104: annotation processing unit, 105: audio input unit, 106: audio processing unit, 107: video processing unit, 108: recording unit, 109: video output unit, 110: video synthesis unit, 111: index generation unit, 200, 201: annotation index, 250: unit index, 300: display screen, 301: video display area, 302: audio index display area, 303: text display area, 304, 305: camera video display area, 400: audio block, 410: icon representing an audio index, 420: link line

Claims

A recording device that records an input video and an index in the input video.
The index is an annotation index using the video signal of the input video and an audio index using the audio signal accompanying the input video.
When the generation times of the audio index and the annotation index overlap, the two are associated with each other.
A recording device characterized in that, when either the associated audio index or annotation index is selected, a time corresponding to the start point of the generation time of the selected index is recorded.

The recording device according to claim 1.
The annotation index is obtained by storing, together with time information, a plurality of snapshots, each a captured moment of the input video, that include annotation information, which is information handwritten on the displayed image.
A recording device characterized in that the audio index is created for the audio accompanying the input video from the start of recording of the input video to the end of recording.

The recording device according to claim 2.
Speech is assigned to the same audio index when the interval between utterances is less than a predetermined threshold, and to a separate audio index when the interval is equal to or greater than the predetermined threshold.
A recording device characterized in that, within the audio index, a position where the interval between utterances is less than a second threshold smaller than the predetermined threshold is managed as a divisible position.

The recording device according to claim 1.
A recording device characterized in that, when the recorded input video is played back by selecting the audio index, a margin in the earlier (negative) direction is added to the start point of the generation time of the audio index, and the recorded input video is played back from the time corresponding to the resulting time.

The recording device according to claim 1.
A recording device characterized in that the speech of the audio index is converted from speech into text.

The recording device according to claim 2.
A recording device characterized in that, when an index search is performed on a snapshot containing the annotation information, a video signal is output so that a dotted frame surrounding the annotation information on the snapshot is displayed in the video display area of the display screen of an external display device, and the audio index is displayed in an area above or below the video display area on the display screen.

The recording device according to claim 3.
A recording device characterized in that a video signal is output so that the audio index is displayed as an audio block for each audio input channel in the area above or below the video display area of the display screen of the external display device, and the divisible position is highlighted.

The recording device according to claim 1.
A plurality of annotation indexes linked to the audio index are grouped into a unit index.
The annotation index is obtained by storing, together with time information, a plurality of snapshots, each a captured moment of the input video, that include annotation information, which is information handwritten on the displayed image.
A recording device characterized in that, when an index search is performed on a snapshot containing the annotation information, a video signal is output so that a dotted frame surrounding the annotation information on the snapshot is displayed in the video display area of the display screen of an external display device, and, when the annotation index belongs to the unit index, a second dotted frame offset from the dotted frame surrounding the annotation information is also displayed in the video display area.

The recording device according to claim 8.
A recording device characterized in that, for an audio index linked to the annotation index, a video signal is output so that an icon representing the audio index is displayed near the annotation index in the video display area of the display screen of the external display device.

The recording device according to claim 9.
A recording device characterized in that a video signal is output so that the icon of a linked audio index is positioned, for a first audio index that comes earlier in time, to the left of the left edge of the dotted frame of the annotation index or of the second dotted frame of the unit index, and, for a second audio index that comes later in time, to the right of the left edge of the dotted frame of the annotation index or of the second dotted frame of the unit index.

The recording device according to claim 10.
A recording device characterized in that the generation time of the annotation index or the unit index is treated as the width of the dotted frame, and a video signal is output so that the position at which a link line joining the icon to the annotation index or the unit index connects to the annotation index or the unit index is set according to the difference between the generation start times of the audio index and the annotation index.

The recording device according to claim 10.
When the unit index is selected, the input video is played back from the time corresponding to whichever of the annotation index and the audio index has the earlier generation time.
When the icon indicating the audio index is selected, the input video is played from the time corresponding to the audio index.
A recording device characterized in that when the annotation index is selected, the input video is reproduced from the time corresponding to the annotation index.

A recording system comprising a video acquisition device, an audio acquisition device, a recording device that captures and records video and audio information from the video acquisition device and the audio acquisition device, and a viewing device for viewing playback information from the recording device via a server.
The recording device has:
A video input unit that receives video from the video acquisition device and captures it as the input video,
An information input unit that receives annotation information for the input video,
An annotation processing unit that draws the captured annotation information,
An audio input unit that captures audio information from the audio acquisition device,
An audio processing unit that processes the captured audio information as digital data,
A video processing unit having a video synthesis unit that composites the input video with the drawing information of the annotation information processed by the annotation processing unit, and an index generation unit that generates indexes in the input video,
A recording unit that records the video composited by the video synthesis unit and the indexes generated by the index generation unit,
A video output unit that outputs the video recorded in the recording unit to an external display device, and
A control unit.
The indexes are an annotation index that uses the video signal of the input video and an audio index that uses the audio signal accompanying the input video.
A recording system characterized in that the control unit associates the audio index and the annotation index with each other when their generation times overlap, and, when either the associated audio index or annotation index is selected, plays back the input video recorded in the recording unit from the time corresponding to the start point of the generation time of the selected index.