JP2008145976A

JP2008145976A - Content reproducing device

Info

Publication number: JP2008145976A
Application number: JP2006336247A
Authority: JP
Inventors: Takuro Sone; 卓朗曽根; Takahiro Tanaka; 孝浩田中
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2006-12-13
Filing date: 2006-12-13
Publication date: 2008-06-26
Anticipated expiration: 2026-12-13
Also published as: JP4506750B2

Abstract

PROBLEM TO BE SOLVED: To provide a content reproduction apparatus capable of reproducing video, such as a lyrics telop, in synchronization with an audio signal generated in real time, such as that of a live musical performance.

SOLUTION: Song data including audio data 8, which contains guide melody data, and a lyrics data track 9 is stored. A live performance sound input from an audio input unit 1 is compared and synchronized with the audio data, and a clock signal is generated on the basis of that synchronization (performance clock prediction generation unit 3). The lyrics data track is sequenced by the clock signal (video sequencer 10), so that the lyrics are displayed in synchronization with the externally input live performance sound.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a content reproduction apparatus that displays video synchronized with an audio signal input from outside.

When a karaoke apparatus plays a song, a lyrics telop (on-screen caption) is displayed in synchronization with the performance (see, for example, Patent Document 1). This is possible because a performance track for playing the song and a lyrics track for displaying the lyrics telop are stored in parallel in the song data in advance.

Patent Document 1: JP 2000-99044 A

In recent years, lyrics are often displayed in time with a singer's singing not only on karaoke apparatuses but also in television broadcasts and the like. In television broadcasting this is not limited to lyrics: in news programs a person's remarks are transcribed into text and superimposed on the screen as a telop, and in language-learning programs the instructor's remarks are widely superimposed as telops.

In all of these cases, video into which the telop has been composited in advance is played back and displayed. None of them superimposes, as a telop, a performance or remark that takes place in real time and arrives by streaming, such as a live broadcast of a performance.

An object of the present invention is to provide a content reproduction apparatus capable of reproducing video, such as a lyrics telop, in synchronization with an audio signal generated in real time, such as that of a live performance.

The invention of claim 1 comprises: a content data storage unit that stores content data in which audio time-series data and video time-series data are recorded in association with each other in time series; an audio signal input unit that receives an audio signal from outside; an auxiliary information input unit that receives auxiliary information from outside; a clock prediction generation unit that executes a calculation process of comparing the audio signal with the audio time-series data to calculate the corresponding position on the time axis and the progress speed of the audio signal, corrects the calculation process based on the auxiliary information, and generates a reproduction clock that is synchronized with the audio signal and advanced by the time required for the calculation process; and a video reproduction unit that reproduces the video time-series data based on the reproduction clock generated by the clock prediction generation unit.

The invention of claim 2 is the invention of claim 1, further comprising: a video signal input unit that receives a video signal from outside; and a video synthesis unit that combines the video signal received by the video signal input unit with the video signal reproduced by the video reproduction unit and outputs the result.

The invention of claim 3 is the invention of claim 2, further comprising an auxiliary information extraction unit that extracts auxiliary information from the video signal received by the video signal input unit and supplies it to the clock prediction generation unit.

The invention of claim 4 is the invention of claim 1, further comprising: a video signal input unit that receives from outside a video signal synchronized with the audio signal; and an auxiliary information extraction unit that extracts auxiliary information from the video signal received by the video signal input unit and supplies it to the clock prediction generation unit.

The invention of claim 5 is the invention of any one of claims 1 to 4, wherein the clock prediction generation unit further instructs the video reproduction unit, based on the auxiliary information, as to the reproduction position of the video time-series data.

The invention of claim 6 is the invention of any one of claims 1 to 5, wherein the auxiliary information input unit receives progress position information of the audio signal as the auxiliary information, and the clock prediction generation unit corrects, based on the auxiliary information, the position in the audio time-series data that is used for comparison in the calculation process.

The invention of claim 7 is the invention of any one of claims 1 to 5, wherein the auxiliary information input unit receives progress speed information of the audio signal as the auxiliary information, and the clock prediction generation unit corrects, based on the auxiliary information, the progress speed used in the calculation process.

The invention of claim 8 is the invention of any one of claims 1 to 7, wherein the video time-series data stored in the content data storage unit is sequence data including video data to be displayed and timing data indicating its display timing, and the video reproduction unit includes a sequence processing unit that reproduces the sequence data.

The invention of claim 9 is the invention of any one of claims 1 to 7, wherein the video time-series data stored in the content data storage unit is video data including moving images, and the video reproduction unit includes a video playback unit that reproduces the video data.

The invention of claim 10 is the invention of any one of claims 1 to 9, wherein the content data storage unit further stores device control time-series data for controlling an external device in time series, and the apparatus further comprises an external device control unit that reads out the device control time-series data based on the reproduction clock generated by the clock prediction generation unit and outputs a control signal.

The invention of claim 11 is the invention of any one of claims 1 to 10, further comprising an audio reproduction unit that reproduces the audio time-series data based on the reproduction clock generated by the clock prediction generation unit.

The invention of claim 12 is the invention of claim 11, wherein the audio time-series data stored in the content data storage unit consists of a plurality of tracks formed in parallel, the clock prediction generation unit generates the reproduction clock using the one track among the plurality of tracks that corresponds to the audio signal, and the audio reproduction unit reproduces some or all of the tracks that the clock prediction generation unit does not use.

[Operation]
In the present invention, a reproduction clock synchronized with the audio signal is generated based on the audio signal and various kinds of auxiliary information, and the video time-series data is reproduced based on this reproduction clock, so that video (and audio) synchronized with the audio signal can be reproduced. The audio signal may be output for reproduction via this apparatus, or it may be reproduced through a separate path that bypasses this apparatus.
The calculation of the corresponding position on the time axis and of the progress speed, which is needed to generate the reproduction clock, is a complex and computationally heavy process, so a certain amount of time elapses after the audio signal (or video signal) is input. For this reason, the progress of the audio signal is predicted based on the auxiliary information, and the reproduction clock is advanced by the time required for the calculation process. This makes it possible to synchronize the reproduced video (and audio) with the input audio signal without delay.

According to the present invention, video can be reproduced in synchronization with the input audio signal without delay. Therefore, even in a situation where a lyrics telop cannot be prepared in advance, such as a live performance, a lyrics telop synchronized with the performance can be displayed by using the lyrics telop of the same song that was prepared separately for karaoke.

Embodiments of the present invention will be described with reference to the drawings.
<< First Embodiment >>
FIG. 1 is a block diagram of a content reproduction apparatus according to the first embodiment of the present invention. This content reproduction apparatus reproduces and outputs video sequence data (Display Sequence Data) 9 in synchronization with an audio signal input from an audio input unit (Audio In) 1. The following description takes as an example the case where the lyrics of a live performance are displayed using song data intended for playing a karaoke song.

The content reproduction apparatus includes an audio input unit 1 to which an audio signal is input from outside, an auxiliary information input unit (Aux Info) 2 to which auxiliary information is input, an audio output unit (Audio Out) 11 from which the audio signal is output, a video output unit (Video Out) 12 from which video is output, a storage unit (Song Data) 7 that stores the song data of a karaoke song including audio data (Audio Data) 8 and the video sequence data 9, a signal processing unit 6 for synchronizing the input audio signal with the video, and a video sequencer (Graphic Sequencer) 10 that reproduces the video sequence data 9.

When the lyrics of a live performance are displayed, the audio signal input from the audio input unit 1 is the sound of the live performance. This sound is input from an electronic musical instrument played by the user of the content reproduction apparatus. Alternatively, it may be input via a microphone that picks up the user's live performance, or it may be delivered via a public broadcast network, the Internet, or the like. In the case of karaoke song data, the audio data 8 is sequence data (a performance track) implemented in a format such as MIDI. The sequence data consists of event data, which is information for starting and stopping the sounding of musical tones, and timing data, which specifies when each piece of event data is to be read out. Also in the case of karaoke song data, the video sequence data 9 is a lyrics track for displaying lyrics. The performance track and the lyrics track are reproduced with the same clock and are therefore presented in synchronization with each other.

The signal processing unit 6 has a performance clock prediction generation unit (Time Alignment & Tempo Tracker) 3. The performance clock prediction generation unit 3 compares the audio signal input from the audio input unit 1 with the audio data in the song data and, while correcting the comparison position and speed based on the auxiliary information input from the auxiliary information input unit 2, generates reproduction position information (timing information) and tempo information synchronized with the audio signal. It then generates a reproduction clock based on the generated reproduction position information (timing information) and tempo information.

The performance clock prediction generation unit 3 divides the audio signal input from the audio input unit 1 into frames of several tens of milliseconds each and analyzes the spectrum, volume, and the like of each frame. Meanwhile, it reads out the audio data stored in the storage unit 7. When the audio data is the MIDI sequence data described above, it analyzes, based on the event data, the spectrum and volume of the musical tones that should be sounding at each timing. When the stored audio data is audio waveform data (ADPCM, MP3, etc.), it divides the data into frames in the same way as the audio signal input from the audio input unit 1 and analyzes the spectrum, volume, and the like of each frame.

The spectrum, volume, and other information of the audio signal input from the audio input unit 1 is compared with the corresponding information of the audio data read from the storage unit 7. The position on the time axis at which the spectrum and its change curve match most closely is detected, and the time information of the song data at that position is taken as the performance position information of the song. In other words, the unit detects which position in the song data the live performance is currently playing.
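The frame-by-frame comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the patent: it compares only per-frame RMS volume (the text also uses the spectrum), it uses a plain sum of squared differences as the similarity measure, and the function names `frame_volumes` and `best_match_position` are hypothetical.

```python
import math

def frame_volumes(samples, frame_len):
    """Split samples into consecutive frames and return the RMS volume of each frame."""
    volumes = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        volumes.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return volumes

def best_match_position(live_feats, ref_feats):
    """Return the frame index in ref_feats where live_feats fits with the smallest squared error."""
    best_index, best_error = 0, float("inf")
    for offset in range(len(ref_feats) - len(live_feats) + 1):
        window = ref_feats[offset:offset + len(live_feats)]
        error = sum((l - r) ** 2 for l, r in zip(live_feats, window))
        if error < best_error:
            best_index, best_error = offset, error
    return best_index

# Reference features come from the stored audio data, live features from the input signal.
reference = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
live = [0.9, 0.3, 0.7]
position = best_match_position(live, reference)  # frame index in the reference
```

Multiplying the returned frame index by the frame duration gives the position on the song's time axis. Restricting the search range to the neighborhood indicated by the auxiliary information would reduce both the cost and the risk of a false match.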

Through this detection of the performance position, that is, synchronization detection, the unit determines the position of the currently input audio signal on the time axis of the song data, and also determines the progress speed, that is, the tempo, of that audio signal. It then outputs this position on the time axis, namely the performance position information and timing information (information on when each beat timing is passed), together with the tempo information, to the video sequencer 10. The tempo and beat timing may be determined using, for example, the technique described in Japanese Patent Laid-Open No. 9-16171.

The video sequencer 10 reads out the video sequence data 9, which is the lyrics telop, in accordance with the clock signal and reproduces the video of the lyrics telop. Because the clock signal is synchronized with the externally input audio signal, that is, the live performance sound, as described above, the lyrics telop reproduced by the video sequencer 10 and output from the video output unit 12 is synchronized with the live performance.
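A clock-driven readout of sequence data of this kind can be sketched as follows. This is a minimal illustration and not the actual sequencer of the apparatus: the class name `Sequencer`, the method `advance`, and the `(time, event)` tuples are hypothetical, and the position passed to `advance` stands in for the reproduction clock.

```python
class Sequencer:
    """Emits stored events as the externally supplied clock position passes them."""

    def __init__(self, events):
        # events: iterable of (time, event) pairs; sorted so they are read in time order
        self.events = sorted(events)
        self.cursor = 0

    def advance(self, position):
        """Return every event whose time is at or before the given clock position."""
        due = []
        while self.cursor < len(self.events) and self.events[self.cursor][0] <= position:
            due.append(self.events[self.cursor][1])
            self.cursor += 1
        return due

seq = Sequencer([(4.0, "line 2 on"), (0.0, "line 1 on"), (3.5, "line 1 off")])
first = seq.advance(1.0)   # events due by position 1.0
second = seq.advance(4.0)  # events that became due between 1.0 and 4.0
```

Because the sequencer only reacts to the position it is given, slowing down or speeding up the clock automatically slows down or speeds up the telop. Handling a backward jump would additionally require resetting `cursor`, which this sketch omits.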

Synchronization detection and clock signal generation by the performance clock prediction generation unit 3 take some processing time, so the result may lag behind the actual current performance position by that amount. The performance clock prediction generation unit 3 therefore needs to predict the current performance position, that is, the position that would be obtained if there were no processing delay, and output the reproduction clock accordingly. Since a performance normally keeps a constant tempo, the performance position (the position of the audio signal on the time axis) can be predicted from the tempo information and the processing time required. The performance position information and the clock signal are generated based on this predicted performance position. As a result, the lyrics telop output from the video output unit 12 is synchronized more accurately with the audio signal output from the audio output unit 11. Synchronization may be lost when the performance stops, resumes, or jumps to a different position; in this embodiment, the auxiliary information input from the auxiliary information input unit 2, described later, is used to determine the performance position and establish synchronization reliably.
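Under the constant-tempo assumption stated above, the prediction reduces to a linear extrapolation. The sketch below is illustrative only; the function name `predict_position` and the choice of beats and beats per minute as units are assumptions, not taken from the source.

```python
def predict_position(detected_beats, tempo_bpm, processing_seconds):
    """Advance the detected position by the beats that elapse while processing runs."""
    beats_per_second = tempo_bpm / 60.0
    return detected_beats + beats_per_second * processing_seconds

# Detected at beat 16, tempo 120 BPM, detection took 0.5 s: 1 beat has passed meanwhile.
predicted = predict_position(16.0, 120.0, 0.5)
```

The extrapolation is only as good as the tempo estimate, which is why stops, restarts, and jumps need the separate auxiliary information.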

The song data of a karaoke song and the display method of the lyrics telop will now be described with reference to FIGS. 2 and 3.
As shown in FIG. 2(A), the song data consists of a header, musical tone tracks for playing the karaoke song, a guide melody track for generating the guide melody, a lyrics track for displaying the lyrics telop, a mark track in which jump marks indicating section boundaries of the song are written, and so on. Jump marks written in the mark track indicate, for example, the first verse, the second verse, the chorus, the climax, the intro, the interlude, and the ending.

Each track is written in accordance with the MIDI format. For example, as shown in FIG. 2(B), a musical tone track or the guide melody track consists of event data, such as note-on event data and note-off event data, and timing data indicating when each piece of event data is to be read out. Note-on event data includes pitch data and specifies the pitch and volume of the musical tone generated by the note-on. The tone continues until the corresponding note-off event data is read out.
The timing data can be composed of duration data indicating the time interval between successive pieces of event data, absolute time data indicating the absolute time from the start of the song, or the like.
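The two forms of timing data mentioned above are interchangeable. The following sketch converts duration (delta) timing into absolute timing; the function name `to_absolute`, the tick values, and the event names are hypothetical examples rather than data from the source.

```python
def to_absolute(events):
    """Convert (delta_ticks, event) pairs into (absolute_ticks, event) pairs."""
    now = 0
    result = []
    for delta, name in events:
        now += delta  # each delta is measured from the previous event
        result.append((now, name))
    return result

track = [(0, "note_on"), (480, "note_off"), (0, "note_on"), (240, "note_off")]
absolute = to_absolute(track)
```

Absolute times are convenient when the sequencer must jump to an arbitrary position, whereas durations are compact and are the form used in Standard MIDI Files.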

The event data of the musical tone tracks and the guide melody track consists of note event data indicating the pitch, volume, on/off state, and so on of the musical tones as described above, and a tone is sounded or silenced when this note event data is input to a tone generator. The musical tone tracks consist of a plurality of tracks (parts) so that the tones of many instruments can be generated, and the guide melody track consists of monophonic MIDI data for guiding the vocal melody.

The lyrics track is sequence data implementing the various data needed to display the title and lyrics of the karaoke song. As shown in FIG. 3(A), it consists of lyrics display data that is read out based on timing data.

Each piece of lyrics display data contains all the data needed to display one line of lyrics: the display-on timing, the display-off timing, character string data (including the point size, display coordinates, character spacing data, and so on), color 1 data, color 2 data, and color change data for the lyrics.

The way a lyrics telop is displayed from the lyrics display data will be described with reference to FIG. 3(B). In the graph of this figure, the vertical axis represents time and the horizontal axis represents the x coordinate of the lyrics telop (monitor screen). One line of lyrics is displayed in color 1 a time ton before the performance of the song reaches those lyrics. Because the singer reads and interprets the lyrics telop before singing, the lyrics must be displayed in advance. The line remains displayed in color 1 until the performance reaches those lyrics. Once the performance reaches them, the display color is changed (wiped) from color 1 to color 2, proceeding from the left in step with the progress of the song. After the wipe is complete, the line remains displayed in color 2 for a short time (te) and is then erased.
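The display phases of one lyrics line can be summarized as a small state function. This is a sketch under assumed names: `telop_state`, its parameters, and the state labels are hypothetical, with `t_on` and `t_e` standing for the pre-display time ton and the residual display time te described above.

```python
def telop_state(t, wipe_start, wipe_end, t_on, t_e):
    """Return the display phase of one lyrics line at time t (all times in seconds)."""
    if t < wipe_start - t_on or t >= wipe_end + t_e:
        return "hidden"   # not yet shown, or already erased
    if t < wipe_start:
        return "color1"   # pre-displayed so the singer can read ahead
    if t < wipe_end:
        return "wiping"   # color changes from the left as the song progresses
    return "color2"       # fully wiped, kept on screen for t_e

# A line sung from 10 s to 14 s, shown 2 s early and kept 1 s after the wipe.
phase = telop_state(9.0, 10.0, 14.0, 2.0, 1.0)
```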

As shown in FIG. 3(B), the color change data for the lyrics consists of a plurality of plotting data items indicating the principal points of the wipe curve (time versus x coordinate). The video sequencer 10 performs the color change (wipe) by interpolating this plotting data with a quadratic curve. Using the reproduction clock and the performance position information input from the performance clock prediction generation unit 3, the video sequencer 10 generates a lyrics telop that is synchronized with the video and audio of the live performance without delay.
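One way to interpolate plotting data with a quadratic curve is to fit a parabola through three neighboring points (Lagrange form). The source does not specify how the curve is fitted, so the following is an assumed scheme for illustration; the names `quadratic_through` and `wipe_x` are hypothetical, and the points are assumed to be sorted by time with distinct time values and at least three entries.

```python
def quadratic_through(p0, p1, p2, t):
    """Evaluate at t the parabola passing through three (time, x) points."""
    (t0, x0), (t1, x1), (t2, x2) = p0, p1, p2
    return (x0 * (t - t1) * (t - t2) / ((t0 - t1) * (t0 - t2))
            + x1 * (t - t0) * (t - t2) / ((t1 - t0) * (t1 - t2))
            + x2 * (t - t0) * (t - t1) / ((t2 - t0) * (t2 - t1)))

def wipe_x(points, t):
    """Return the wipe boundary x coordinate at time t from sorted (time, x) plotting points."""
    if t <= points[0][0]:
        return points[0][1]
    if t >= points[-1][0]:
        return points[-1][1]
    # Index of the first plotting point at or after t, then a 3-point window around it.
    k = next(i for i, (pt, _) in enumerate(points) if pt >= t)
    start = min(max(k - 1, 0), len(points) - 3)
    return quadratic_through(*points[start:start + 3], t)

plot_points = [(0.0, 0.0), (1.0, 100.0), (2.0, 400.0)]
x_mid = wipe_x(plot_points, 1.5)
```

A parabola through arbitrary points is not guaranteed to be monotonic, so a production implementation would likely clamp the result so that the wipe never moves backward.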

Returning to FIG. 1, it is difficult for the performance clock prediction generation unit 3 to determine the performance position and establish synchronization accurately from the audio signal input from the audio input unit 1 alone. It therefore uses the auxiliary information input from the auxiliary information input unit 2 to determine the performance position and establish synchronization reliably.

The auxiliary information is, for example, information input by the user, information input by a staff member at the live venue, or information detected from live video or the like. The user inputs auxiliary information with a control (for example, a foot pedal) while performing. At a live venue, a number of staff members, including camera operators and audio engineers, are engaged in the live broadcast, and one of them inputs the auxiliary information for synchronized reproduction on the karaoke apparatus. The auxiliary information includes start information indicating that the song has started, stop information indicating that the song has paused, end information indicating that the song has ended, jump information indicating that the performance position jumps away from the normal progression, and so on. When the user performs, the performance may proceed differently from the normal recorded performance, for example by repeating a particular passage. In a live performance as well, the progression may differ from the normal recorded performance, for example when the third chorus is omitted because of time constraints or the chorus is repeated in response to the audience.
In such cases, the user or a venue staff member inputs, as jump information, where the performance has jumped to (or a prediction of where it is about to jump). The jump position may be specified by a jump mark in the song data, or it may be specified by a value on the time axis of the song data. As described above, the jump marks include marks indicating the first verse, the second verse, the chorus, the climax, the intro, the interlude, and the ending, and each is associated with a value on the time axis of the song data according to the song. The jump marks are preferably included in the song data from the outset, as shown in FIG. 2; for song data that has no jump marks, however, the user or a venue staff member may input the data of the jump mark track as auxiliary information at the start of the song. When the user inputs jump information with a foot pedal during the performance, the apparatus may be configured to accept a simple operation, for example one press for the first verse, two presses for the second verse, and three presses for the chorus.
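The pedal-based jump input can be sketched as a two-step lookup: press count to jump mark, then jump mark to a time in the song data. The mapping of one, two, and three presses follows the example in the text; the names `PEDAL_TO_MARK` and `jump_target`, the mark labels, and the time values are hypothetical.

```python
# Press count -> jump mark, following the example of 1 = first verse, 2 = second verse, 3 = chorus.
PEDAL_TO_MARK = {1: "verse1", 2: "verse2", 3: "chorus"}

def jump_target(press_count, mark_track):
    """Return the song-data time (seconds) for a pedal press count, or None if unknown."""
    mark = PEDAL_TO_MARK.get(press_count)
    if mark is None:
        return None
    return mark_track.get(mark)

# Hypothetical mark track: jump mark -> position on the song's time axis in seconds.
marks = {"verse1": 12.0, "chorus": 48.0, "verse2": 75.5}
target = jump_target(3, marks)
```

The returned time would then be used as the new comparison position for synchronization detection, so that matching resumes near the jump destination.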

As further auxiliary information input by the user or a venue staff member, a beat signal entered manually (by tapping) at the start of the song or when the tempo changes is also sent. This beat signal is not highly accurate and is not sent continuously from the beginning to the end of the song, so it cannot be used directly as the tempo clock signal; it can, however, be used as information for determining the tempo and working out the beat timing.

The performance clock prediction generation unit 3 refers to the auxiliary information described above, input from the auxiliary information input unit 2, to work out roughly where the current performance position is, reads out the audio data around that position, and compares it with the input audio signal. It also refers to the intermittently input beat signal to grasp the approximate beat timing and tempo, and compares the input audio signal with the audio data at that approximate beat timing and tempo (time-axis scaling), which makes synchronization easier.
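An approximate tempo can be derived from a handful of tapped beats as sketched below. Using the median interval is an assumed choice that tolerates one imprecise tap; the source does not say how the beat signal is processed, and the function name `tempo_from_taps` is hypothetical.

```python
from statistics import median

def tempo_from_taps(tap_times):
    """Estimate tempo in BPM from tap timestamps (seconds); None if fewer than two taps."""
    intervals = [b - a for a, b in zip(tap_times, tap_times[1:])]
    if not intervals:
        return None
    return 60.0 / median(intervals)

approx_bpm = tempo_from_taps([0.0, 0.5, 1.0, 1.5])
```

The result is only a rough estimate, which matches the role described in the text: it sets the time-axis scaling for the comparison, and the audio matching then refines the position and tempo.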

<< Second Embodiment >>
FIG. 4 is a block diagram of a content reproduction apparatus according to the second embodiment of the present invention. In FIG. 4, parts having the same configuration as in the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted.

In the content reproduction apparatus of this embodiment, the storage unit 7 stores video data (Video Data) 14 instead of the video sequence data 9 (see FIG. 1). As the functional unit that reproduces the video data 14, a video playback unit (Video Player) 15 is provided instead of the video sequencer 10 (see FIG. 1). A variable-frame-rate playback device (or playback software) is used as the video playback unit 15. This enables video playback synchronized with the clock signal input from the performance clock prediction generation unit 3.

<< Third Embodiment >>
FIG. 5 is a block diagram of a content reproduction apparatus according to the third embodiment of the present invention. In FIG. 5, components identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted.

In the content reproduction apparatus of this embodiment, the storage unit 7 stores, in addition to the video sequence data 9, external sequence data (AuxSequenceData) 17 for controlling external devices. An external sequencer 18 is provided as the functional unit that sequences the external sequence data 17. The control data sequenced by the external sequencer 18 is output from the control signal output unit 19 to the external devices.

The control signal output unit 19 is connected to, for example, a player piano or staging equipment such as lighting, fireworks, or fountains. By controlling such devices in synchronization with the audio signal input from the audio signal input unit 1, effects can be added in which an acoustic piano plays, the lighting changes, or fireworks are ignited in time with the live performance.
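The behavior of an external sequencer driven by the reproduction clock can be sketched as follows in Python. The event format (tick, command string), the class name, and the command names are illustrative assumptions; the specification does not define the format of the external sequence data 17.

```python
class ExternalSequencer:
    """Emits device commands as the reproduction clock reaches their tick."""

    def __init__(self, events):
        # events: iterable of (tick, command) pairs; sorted so they fire in order
        self._events = sorted(events)
        self._next = 0

    def advance(self, tick):
        """Return the commands that have become due at or before the given tick."""
        due = []
        while self._next < len(self._events) and self._events[self._next][0] <= tick:
            due.append(self._events[self._next][1])
            self._next += 1
        return due


sequencer = ExternalSequencer([
    (480, "lights:blue"),
    (0, "piano:start"),
    (960, "fireworks:ignite"),
])
first = sequencer.advance(0)     # commands due at the start of the song
later = sequencer.advance(500)   # commands that became due by tick 500
```

Because each command is released only when the clock reaches its tick, a tempo change in the live performance shifts the external effects along with it.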

<< Fourth Embodiment >>
FIG. 6 is a block diagram of a content reproduction apparatus according to the fourth embodiment of the present invention. In FIG. 6, components identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The content reproduction apparatus of this embodiment includes an audio reproduction unit 21 as the functional unit that reproduces audio data 8A. The audio data 8A may be the same data (the same track) as the audio data 8 used as the reference for detecting synchronization with the audio signal, or it may be different data. For example, in the song data shown in FIG. 2 there are several performance data tracks that sequence musical tones, including the guide melody track, so one of them (for example, the guide melody track) can be used as the reference (audio data 8) and the other tracks can be used as the audio data 8A for reproduction. Audio waveform data such as ADPCM or MP3 may also be stored as the audio data 8A for reproduction.

When the audio data 8A for reproduction is sequence data such as MIDI, the audio reproduction unit 21 has the functions of a sequencer and a tone generator. When the audio data 8A for reproduction is audio waveform data, the audio reproduction unit 21 includes a decoder.

The audio signal of the audio data 8A reproduced by the audio reproduction unit 21 is input to the mixer 22. The audio signal input from the audio input unit 1 is also input to the mixer 22. The mixer 22 mixes the input audio signal with the reproduced audio signal and outputs the result from the audio output unit 11.

As a result, video such as a lyrics telop can be reproduced in synchronization with the input audio signal (live performance sound), and in addition an audio signal can be reproduced, mixed with the input audio signal, and output. For example, accompaniment can be added to match the user's performance (melody).
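The mixing step can be illustrated with a minimal Python sketch that sums two sample streams and clips the result. Samples are assumed to be floats in the range -1.0 to 1.0, and the gain parameters are hypothetical; the specification does not describe the internals of the mixer 22.

```python
def mix(live, reproduced, live_gain=1.0, reproduced_gain=1.0):
    """Sum two sample sequences, padding the shorter with silence and clipping to [-1, 1]."""
    length = max(len(live), len(reproduced))
    mixed = []
    for i in range(length):
        a = live[i] if i < len(live) else 0.0
        b = reproduced[i] if i < len(reproduced) else 0.0
        sample = a * live_gain + b * reproduced_gain
        mixed.append(max(-1.0, min(1.0, sample)))  # hard clip to avoid overflow
    return mixed


# Live melody mixed with a slightly longer reproduced accompaniment.
output = mix([0.5, -0.25], [0.25, 0.25, 0.75])
```

A practical mixer would more likely lower the gains or apply a limiter than hard-clip, but the summation itself is the same.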

<< Fifth Embodiment >>
FIG. 7 is a block diagram of a content reproduction apparatus according to the fifth embodiment of the present invention. In FIG. 7, components identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The content reproduction apparatus of this embodiment includes a video input unit 24 in addition to the audio input unit 1. Live video, for example, is input to the video input unit 24. The content reproduction apparatus further includes a video mixer 26.

The video signal input to the video input unit 24 is input to the video mixer 26. The video reproduced by the video sequencer 10 is also input to the video mixer 26. The video mixer 26 combines the video signal input from the video input unit 24 with the video reproduced by the video sequencer 10. The video (video signal) combined by the video mixer 26 is output from the video output unit 12.

For example, when the input video signal is live video and the video reproduced by the video sequencer 10 is a lyrics telop, the lyrics telop is superimposed on the live video, and the combined video is output externally and displayed.
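Superimposing a telop on live video is commonly done by per-pixel alpha blending, and the Python sketch below shows that technique on small grayscale frames. Representing frames as nested lists and using a separate alpha mask are assumptions made for illustration; the specification does not state how the video mixer 26 combines the two images.

```python
def superimpose(frame, overlay, alpha):
    """Blend overlay onto frame pixel by pixel.

    alpha is 0.0 where the live video shows through and 1.0 where the
    telop fully covers it. All three arguments share the same dimensions.
    """
    return [
        [round(a * o + (1.0 - a) * f) for f, o, a in zip(frame_row, overlay_row, alpha_row)]
        for frame_row, overlay_row, alpha_row in zip(frame, overlay, alpha)
    ]


# One row of live video with a telop that fades in from left to right.
live_frame = [[100, 100, 100]]
telop = [[200, 200, 200]]
mask = [[0.0, 0.5, 1.0]]
combined = superimpose(live_frame, telop, mask)
```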

<< Sixth Embodiment >>
FIG. 8 is a block diagram of a content reproduction apparatus according to the sixth embodiment of the present invention. In FIG. 8, components identical to those of the fifth embodiment shown in FIG. 7 are given the same reference numerals, and their description is omitted.

In the content reproduction apparatus of this embodiment, the video signal input from the video input unit 24 is not only combined with the reproduced video and output, but is also analyzed, and the analysis result is supplied to the performance clock prediction generation unit 3 as auxiliary information. A pattern recognition unit (PatternRecognition) 28 is provided as the functional unit that performs this analysis. When recognizing a live performance, the pattern recognition unit 28 performs the following kinds of pattern recognition and inputs the results to the performance clock prediction generation unit 3 as auxiliary information.

It recognizes the movement of the singer's mouth to detect the start of singing and the approximate phrases.
It recognizes the performers' movements to detect the approximate phrases, repeats, the end of the song, and so on.
It uses changes in lighting (changes in screen brightness) to detect the start of the song and the approximate beat.

By taking in these recognition results as auxiliary information, the performance clock prediction generation unit 3 can more easily recognize the performance position, beat timing, tempo, and so on even when no auxiliary information is input from outside. When externally input auxiliary information is available, it can recognize them with even higher accuracy.
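Of the cues listed above, the lighting-change cue is the simplest to sketch. The Python example below flags frames whose average brightness differs sharply from the previous frame. The threshold value and the idea of using one mean brightness value per frame are assumptions; the specification does not describe the recognition algorithm.

```python
def brightness_jumps(frame_brightness, threshold):
    """Return indices of frames whose brightness changed by at least threshold."""
    return [
        i
        for i in range(1, len(frame_brightness))
        if abs(frame_brightness[i] - frame_brightness[i - 1]) >= threshold
    ]


# Mean brightness per frame (0-255); the stage lights come up at frame 2
# and drop again at frame 4.
candidates = brightness_jumps([10, 12, 80, 82, 20], threshold=30)
```

Frame indices found this way would only be candidate song-start or beat positions, to be weighed together with the audio comparison.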

<< Additional Notes >>
The third embodiment described above is an example in which an external device control function, based on the external sequence data 17, the external sequencer 18, and so on, is added to the first embodiment. This external device control function may also be provided in any of the second and fourth to sixth embodiments, and every such form falls within the technical scope of the present invention.

Similarly, the audio reproduction function of the fourth embodiment, based on the audio data 8A, the audio reproduction unit 21, and so on, and the video synthesis function of the fifth embodiment, based on the video input unit 24, the video mixer 26, and so on, may each be combined with any embodiment other than the first embodiment, and every such form falls within the technical scope of the present invention. Likewise, the video analysis function of the sixth embodiment, based on the video input unit 24 and the pattern recognition unit 28, may be combined with any embodiment other than the first and fifth embodiments, and every such form falls within the technical scope of the present invention.

The above embodiments have been described mainly using a user's performance or the relay of a live performance from another location as examples, but the present invention is not limited to these and can be applied to any situation in which video is reproduced in synchronization with an audio signal. For example, it can be applied to a television caption display device, a multilingual subtitle display device for movies, a language learning device, and the like.
The above embodiments also showed examples in which the audio signal and video signal are sent over a network or by broadcast. However, the content reproduction apparatus can also be used in a form in which the audio signal, video signal, and auxiliary information are stored on media or storage such as a DVD or HDD, and what is played back from them is input as the audio signal, video signal, and auxiliary information.

The above embodiments were also described with the content reproduction apparatus installed on the local side (the karaoke box). However, the content reproduction apparatus may instead be installed on the distribution side (for example, at the live venue), so that the audio signal and the video synchronized with it are distributed together.

FIG. 1: Block diagram of the content reproduction apparatus according to the first embodiment of the present invention
FIG. 2: Diagram showing the structure of the song data stored in the content reproduction apparatus
FIG. 3: Diagram explaining the display method of the lyrics telop
FIG. 4: Block diagram of the content reproduction apparatus according to the second embodiment of the present invention
FIG. 5: Block diagram of the content reproduction apparatus according to the third embodiment of the present invention
FIG. 6: Block diagram of the content reproduction apparatus according to the fourth embodiment of the present invention
FIG. 7: Block diagram of the content reproduction apparatus according to the fifth embodiment of the present invention
FIG. 8: Block diagram of the content reproduction apparatus according to the sixth embodiment of the present invention

Explanation of symbols

1 Audio input unit
2 Auxiliary information input unit
3 Performance clock prediction generation unit
6 Signal processing unit
7 Storage unit
8 Audio data
8A Audio data (for reproduction)
9 Video sequence data
10 Video sequencer
11 Audio output unit
12 Video output unit
14 Video data
15 Video playback unit
17 External sequence data
18 External sequencer
19 Control signal output unit
21 Audio reproduction unit
22 Mixer
24 Video input unit
26 Video mixer
28 Pattern recognition unit

Claims

A content data storage unit for storing content data in which audio time-series data and video time-series data are recorded in association with each other in time series;
An audio signal input section for inputting an audio signal from the outside;
An auxiliary information input unit for inputting auxiliary information from the outside;
A clock prediction generation unit that performs a calculation process of calculating the corresponding position on the time axis and the progress speed of the audio signal by comparing the audio signal with the audio time-series data, corrects the calculation process based on the auxiliary information, and generates a reproduction clock synchronized with the audio signal, advanced by the time required for the calculation process;
A video reproduction unit for reproducing the video time-series data based on the reproduction clock generated by the clock prediction generation unit;
A content playback apparatus comprising:

A video signal input unit for inputting a video signal from the outside;
A video synthesis unit that synthesizes and outputs the video signal input by the video signal input unit and the video signal reproduced by the video reproduction unit;
The content reproduction apparatus according to claim 1, further comprising:

An auxiliary information extraction unit that extracts auxiliary information from the video signal input by the video signal input unit and supplies the auxiliary information to the clock prediction generation unit;
The content reproduction apparatus according to claim 2, further comprising:

A video signal input unit for inputting a video signal synchronized with the audio signal from the outside;
An auxiliary information extraction unit that extracts auxiliary information from the video signal input by the video signal input unit and supplies the auxiliary information to the clock prediction generation unit;
The content reproduction apparatus according to claim 1, further comprising:

The content reproduction apparatus according to claim 1, wherein the clock prediction generation unit further instructs the video reproduction unit to reproduce the video time-series data based on the auxiliary information.

The auxiliary information input unit inputs the progress position information of the audio signal as auxiliary information,
The content reproduction apparatus according to claim 1, wherein the clock prediction generation unit corrects a comparison position of the audio time-series data in the calculation process based on the auxiliary information.

The auxiliary information input unit inputs progress speed information of the audio signal as auxiliary information,
The content reproduction device according to claim 1, wherein the clock prediction generation unit corrects the progress speed in the calculation process based on the auxiliary information.

The video time-series data stored in the content data storage unit is sequence data including video data to be displayed and timing data indicating the display timing thereof,
The content reproduction apparatus according to claim 1, wherein the video reproduction unit includes a sequence processing unit that reproduces the sequence data.

The video time-series data stored in the content data storage unit is video data including a moving image,
The content reproduction apparatus according to claim 1, wherein the video reproduction unit includes a video reproduction unit that reproduces the video data.

The content data storage unit further stores device control time-series data for controlling external devices in time series,
The content reproduction apparatus according to any one of claims 1 to 9, further comprising an external device control unit that reads out the device control time-series data based on the reproduction clock generated by the clock prediction generation unit and outputs a control signal.

The content reproduction apparatus according to claim 1, further comprising an audio reproduction unit that reproduces the audio time-series data based on a reproduction clock generated by the clock prediction generation unit.

The audio time-series data stored in the content data storage unit is composed of a plurality of tracks formed in parallel,
The clock prediction generation unit generates a reproduction clock using one track corresponding to the audio signal among the plurality of tracks,
The content reproduction device according to claim 11, wherein the audio reproduction unit reproduces a part or all of the plurality of tracks that are not used by the clock prediction generation unit.