JP2025124424A

JP2025124424A - Data processing method and data processing device

Info

Publication number: JP2025124424A
Application number: JP2024020476A
Authority: JP
Inventors: 祐希末光; 将俊長谷川; 洋平和田
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2024-02-14
Filing date: 2024-02-14
Publication date: 2025-08-26
Also published as: WO2025173478A1

Abstract

Problem: To provide a data processing method that can play back video suited to the playback environment without impairing real-time performance. Solution: The data processing method generates multi-track audio data that stores audio data for multiple channels, including at least a first channel and a second channel. A data string of a digital audio signal is stored in the first channel, and motion data, which is information on a character's movement and is associated with the digital audio signal, is stored in the second channel as a data string of a digital audio signal. Selected figure: Figure 3

Description

One embodiment of the present invention relates to a data processing method for processing multi-track audio data that stores audio data for multiple channels.

Patent Document 1 discloses a transmission means that associates operation information and operation time stored in a second storage means with identification information identifying the music information output by a first control means, and transmits them to a predetermined server device.

Patent Document 2 describes generating a moving image of a character from the movement of a singer's head.

Patent Document 1: JP 2015-47261 A
Patent Document 2: JP 2012-78526 A

Video data is larger than audio data. If audio data and video data are distributed separately, the video data therefore arrives with a larger delay, which makes it difficult to synchronize the two. If audio data and video data are instead distributed in synchronization, real-time performance suffers, because delivery of the audio data is delayed to match the video data.

Furthermore, distributed video data may not suit the environment of the playback venue. For example, when video data shot outdoors is played back at an indoor venue, the user finds it incongruous.

An object of one embodiment of the present invention is therefore to provide a data processing method that can play back video suited to the playback environment without impairing real-time performance.

A data processing method according to one embodiment of the present invention generates multi-track audio data that stores audio data for multiple channels, including at least a first channel and a second channel. The method stores a data string of a digital audio signal in the first channel, and stores motion data, which is information on a character's movement and is associated with the digital audio signal, in the second channel as a data string of a digital audio signal.

According to one embodiment of the present invention, video suited to the playback environment can be played back without impairing real-time performance.

Figure 1 is a block diagram showing the configuration of the data processing system 1.
Figure 2 is a block diagram showing the main configuration of the data processing device 10.
Figure 3 is a flowchart showing the operation of the CPU 104.
Figure 4 is a flowchart showing the operation of the processing unit 154.
Figure 5 is a diagram showing an example of the structure of motion data.
Figure 6 is a diagram showing an example of a format for storing motion data as a data string of a digital audio signal.
Figure 7 is a block diagram showing the configuration of the data processing system 1A in a playback environment.
Figure 8 is a flowchart showing the operation of the processing unit 154 of the data processing device 10 during playback.

Figure 1 is a block diagram showing the configuration of the data processing system 1. The data processing system 1 includes a data processing device 10, a mixer 11, and a motion capture device 12.

The devices are connected to one another over a link such as a USB cable, HDMI (registered trademark), Ethernet (registered trademark), or MIDI. The devices are installed, for example, in the room of a performer who gives a performance such as a live musical performance.

The mixer 11 connects multiple audio devices, such as microphones, musical instruments, or amplifiers, and receives digital or analog audio signals from them. When the mixer 11 receives an analog audio signal, it converts that signal into a digital audio signal with, for example, a sampling frequency of 48 kHz and a bit depth of 24 bits. The mixer 11 applies signal processing such as mixing, gain adjustment, equalization, or compression to the digital audio signals, and transmits the processed digital audio signals to the data processing device 10.

The motion capture device 12 is a sensor for capturing the performer's motion, for example an optical, inertial, or image-based sensor. The data processing device 10 receives sensor signals from the motion capture device 12 and generates motion data based on those sensor signals.

Figure 2 is a block diagram showing the main configuration of the data processing device 10. The data processing device 10 is, for example, a general-purpose personal computer, and includes a display 101, a user interface (I/F) 102, a flash memory 103, a CPU 104, a RAM 105, and a communication interface (I/F) 106.

The display 101 is, for example, an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode) display, and displays various information. The user I/F 102 consists of a switch, keyboard, mouse, trackball, touch panel, or the like, and accepts user operations. When the user I/F 102 is a touch panel, it forms a GUI (Graphical User Interface) together with the display 101.

The communication I/F 106 is connected to the mixer 11 and the motion capture device 12 via a communication line such as a USB cable, HDMI (registered trademark), Ethernet (registered trademark), or MIDI. The communication I/F 106 receives digital audio signals from the mixer 11 and sensor signals from the motion capture device 12. The mixer 11 may also transmit digital audio signals to the data processing device 10 over Ethernet (registered trademark) using a protocol such as Dante (registered trademark).

The CPU 104 corresponds to the processing unit of the present invention. The CPU 104 reads a program stored in the flash memory 103, which is a storage medium, into the RAM 105 and thereby implements predetermined functions. For example, the CPU 104 implements a GUI by displaying an image for accepting user operations on the display 101 and accepting selection operations and the like on that image via the user I/F 102. The CPU 104 receives digital signals from other devices via the communication I/F 106 and generates multi-track audio data based on the received digital signals. The CPU 104 then stores the generated multi-track audio data in the flash memory 103 or distributes it.

The program read by the CPU 104 does not have to be stored in the flash memory 103 of the device itself. For example, the program may be stored in a storage medium of an external device such as a server. In that case, the CPU 104 reads the program from the server into the RAM 105 each time and executes it.

The multi-track audio data conforms, for example, to the Dante (registered trademark) protocol. For example, Dante (registered trademark) can transmit 64 channels of audio signals over 100BASE-TX. Each channel stores, for example, a digital audio signal with a sampling frequency of 48 kHz and a bit depth of 24 bits (uncompressed digital audio data such as WAV). However, in the present invention, the number of channels, the sampling frequency, and the bit depth of the multi-track audio data are not limited to this example, and the multi-track audio data does not need to conform to the Dante (registered trademark) protocol.

Figure 3 is a functional block diagram showing the minimum configuration of the present invention. The processing unit 154 shown in Figure 3 is implemented by software executed by the CPU 104. Figure 4 is a flowchart showing the operation of the processing unit 154. The processing unit 154 receives a digital audio signal from the mixer 11 (S11) and a sensor signal from the motion capture device 12 (S12). S11 and S12 may be performed simultaneously, or either may be performed first.

The processing unit 154 then generates motion data based on the sensor signal received from the motion capture device 12 (S13).

Figure 5 shows an example of the structure of motion data. Motion data is information on a character's movement, and is used to control the motion of the character's 3D model. The motion data in Figure 5 includes description data and position information for multiple bones (bone data). The bones correspond to movable parts of the character, such as the limbs and fingers, and together make up a skeleton that corresponds to the character's body.

The description data includes identification information for identifying each bone (bone name information), information indicating the connection relationships with other bones, identification information for the skeleton, and the like. Because the description data is static information, it does not need to be included in the motion data. For example, the description data may be generated and output separately from the motion data. The description data may also be included once every few frames or every few dozen frames rather than in every frame.

The bone data includes position coordinate information and rotation coordinate information. The position coordinate information is expressed, for example, in three-axis Cartesian coordinates with a certain position as the origin. If 32 bits of information are stored per frame for each of the three axes (X, Y, Z), the position coordinate information amounts to 96 bits. If 32 bits of information are stored per frame for each of the four axes (X, Y, Z, W), the rotation coordinate information amounts to 128 bits. Each bone therefore takes 224 bits per frame, and if one skeleton contains, for example, 30 bones, the motion data amounts to 6,720 bits per frame. If such bone data is driven at, for example, 60 fps, the motion data requires a data capacity of 403,200 bits (50,400 bytes) per second. The motion data does not need to use absolute coordinates. It may use coordinates relative to a reference bone. Relative coordinates are not needed in every frame and may be included only in frames where there is a change relative to the reference bone.
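The size estimate above can be reproduced with a short calculation. The following is a minimal sketch that uses only the example figures from the text (32 bits per component, 30 bones, 60 fps); the function and constant names are illustrative and do not come from the source.

```python
# Worked arithmetic for the example motion-data size quoted in the text.
BITS_PER_VALUE = 32        # bits per coordinate component per frame
POSITION_AXES = 3          # X, Y, Z
ROTATION_AXES = 4          # X, Y, Z, W
BONES_PER_SKELETON = 30    # example skeleton size
FRAMES_PER_SECOND = 60     # example frame rate


def motion_bits_per_frame(bones=BONES_PER_SKELETON):
    """Bits needed for one frame of bone data."""
    position_bits = POSITION_AXES * BITS_PER_VALUE   # 96 bits
    rotation_bits = ROTATION_AXES * BITS_PER_VALUE   # 128 bits
    return bones * (position_bits + rotation_bits)


def motion_bits_per_second(bones=BONES_PER_SKELETON, fps=FRAMES_PER_SECOND):
    """Bits needed per second when every frame carries all bones."""
    return motion_bits_per_frame(bones) * fps


print(motion_bits_per_frame())         # 6720
print(motion_bits_per_second())        # 403200
print(motion_bits_per_second() // 8)   # 50400
```

Changing the bone count or frame rate shows how quickly the required capacity scales, which is why the text notes that relative coordinates may be sent only when a bone actually changes.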

After generating the motion data, the processing unit 154 generates multi-track audio data based on the digital audio signal and the motion data (S14). Specifically, the processing unit 154 stores the data string of the digital audio signal received from the mixer 11 in the first channel of the multi-track audio data. When the processing unit 154 receives audio data conforming to the Dante (registered trademark) protocol from the mixer 11, it stores the data string of the digital audio signal of each channel of the received audio data, unchanged, in the same channel of the multi-track audio data.

The processing unit 154 also stores the motion data in the second channel of the multi-track audio data as a data string of a digital audio signal.

Figure 6 shows an example of a format for storing motion data as a data string of a digital audio signal. As described above, the digital audio signal of each channel of the multi-track audio data has, as an example, a sampling frequency of 48 kHz and a bit depth of 24 bits. Figure 6 shows the data string of one sample in one channel of the multi-track audio data. The multi-track audio data has a first channel, in which the received digital audio signal is stored unchanged as a data string of a digital audio signal, and a second channel consisting of a data string such as that shown in Figure 6. There may be any number of first channels and any number of second channels. As an example, the processing unit 154 stores the digital audio signals of the audio data received from the mixer 11 in channels 1 to 32 as the first channels, and stores the motion data in channels 33 to 64 as the second channels.
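The channel assignment in this example can be sketched as follows. This is a minimal illustration, assuming that one frame holds one 24-bit word per channel for a single sample instant and that the data words for channels 33 to 64 have already been formatted; `build_frame` and the constants are hypothetical names, not part of the source.

```python
NUM_CHANNELS = 64
NUM_AUDIO = 32          # channels 1-32 carry audio in the example
WORD_MASK = 0xFFFFFF    # 24-bit sample word


def build_frame(audio_samples, data_words):
    """Build one 64-channel frame for a single sample instant.

    audio_samples: 32 signed 24-bit PCM values (channels 1-32).
    data_words:    32 unsigned 24-bit words already formatted for the
                   second channels (channels 33-64).
    """
    if len(audio_samples) != NUM_AUDIO:
        raise ValueError("expected 32 audio samples")
    if len(data_words) != NUM_CHANNELS - NUM_AUDIO:
        raise ValueError("expected 32 data words")
    # Two's-complement masking keeps negative PCM values within 24 bits.
    audio_part = [s & WORD_MASK for s in audio_samples]
    data_part = [w & WORD_MASK for w in data_words]
    return audio_part + data_part


# Example values only: one negative PCM sample and arbitrary data words.
frame = build_frame([-1] + [0] * 31, [0x811234] + [0x800000] * 31)
print(len(frame), hex(frame[0]), hex(frame[32]))
```

Because audio and data words share the same frame, they advance together sample by sample, which is the property the later synchronization discussion relies on.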

As shown in Figure 6, each sample of the second channel consists of 8 bits of header information and the data body of the motion data. The data body contains a 16-bit data string.

The 8-bit header information contains a start bit, a data size flag, and a type. The start bit is 1 bit of data that indicates whether the data in that sample is the leading data. A start bit of 1 indicates leading data. In other words, the samples from a sample whose start bit is 1 up to the sample before the next sample whose start bit is 1 correspond to one piece of motion data.

The data size flag is 1 bit of data that indicates the data size. For example, a data size flag of 1 indicates that the sample contains 8 bits of data, and a data size flag of 0 indicates that the sample contains 16 bits of data. For example, if one piece of motion data contains 24 bits of data, the first sample has a start bit of 1 and a data size flag of 0, and the second sample has a start bit of 0 and a data size flag of 1. If one piece of motion data is 8 bits of data, it fits in a single sample, so the start bit of each sample is 1.

The type is identification information that indicates the kind of data, and is 6 bits of data. Because the type is 6 bits in this example, it can indicate 64 kinds of data. The number of bits is of course not limited to 6 and can be set according to the number of kinds required.
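The header fields described above can be illustrated with a small encoder. This is a sketch under stated assumptions: Figure 6 is not reproduced here, so the bit order within the header (start bit as the most significant bit, then the data size flag, then the 6-bit type) and the placement of an 8-bit payload in the upper half of the 16-bit data body are assumptions, and all function names are hypothetical.

```python
START_SHIFT = 23   # assumed: start bit is the most significant bit
SIZE_SHIFT = 22    # assumed: data size flag follows the start bit
TYPE_SHIFT = 16    # assumed: 6-bit type fills the rest of the header
TYPE_MASK = 0x3F
BODY_MASK = 0xFFFF


def pack_sample(start, size_flag, type_id, body):
    """Pack one 24-bit sample: 8-bit header plus 16-bit data body."""
    if not 0 <= type_id <= TYPE_MASK:
        raise ValueError("type must fit in 6 bits")
    if not 0 <= body <= BODY_MASK:
        raise ValueError("body must fit in 16 bits")
    return (((start & 1) << START_SHIFT) | ((size_flag & 1) << SIZE_SHIFT)
            | (type_id << TYPE_SHIFT) | body)


def unpack_sample(word):
    """Split a 24-bit sample back into its header fields and body."""
    return {
        "start": (word >> START_SHIFT) & 1,
        "size_flag": (word >> SIZE_SHIFT) & 1,
        "type": (word >> TYPE_SHIFT) & TYPE_MASK,
        "body": word & BODY_MASK,
    }


def encode_message(type_id, payload):
    """Spread one piece of data (bytes) over consecutive samples.

    The first sample gets start bit 1. Full samples carry 16 bits
    (size flag 0); a trailing single byte is sent with size flag 1.
    """
    if not payload:
        raise ValueError("payload must not be empty")
    words = []
    for offset in range(0, len(payload), 2):
        chunk = payload[offset:offset + 2]
        start = 1 if offset == 0 else 0
        if len(chunk) == 2:
            body = (chunk[0] << 8) | chunk[1]
            words.append(pack_sample(start, 0, type_id, body))
        else:
            # 8-bit payload in the upper byte, lower 8 bits left at 0.
            words.append(pack_sample(start, 1, type_id, chunk[0] << 8))
    return words


# 24 bits of data with an example type value of 0x01.
print([hex(w) for w in encode_message(0x01, b"\x12\x34\x56")])
```

For a 24-bit payload this yields two samples: the first with start bit 1 and data size flag 0, the second with start bit 0 and data size flag 1, matching the example in the text.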

A type value of "01", for example, indicates motion data. A value of "02" indicates other data (for example, MIDI data), and a value of "00" indicates Idle data (empty data). However, "3F", in which all the type bits are 1, is preferably left unused. If the processing unit 154 does not use "3F", a device that plays the multi-track audio data can determine that some kind of failure has occurred when all the bit data is 1. When all the bit data is 1, the processing unit 154 does not play the multi-track audio data and does not output a control signal. As a result, an abnormal signal does not damage equipment such as lighting.

When the processing unit 154 sets the type to "00" for Idle data (empty data), which would otherwise make all the bit data 0, it preferably sets the start bit to 1 so that the bits are not all 0. A device that processes the multi-track audio data can then determine that some kind of failure has occurred when all the bit data is 0. When all the bit data is 0, the processing unit 154 does not play the multi-track audio data. In this case too, an abnormal signal is not output and does not damage the equipment.

The processing unit 154 may also refrain from playing the multi-track audio data when it contains data that does not match the data format shown in Figure 6. For example, when the data size flag is 1, the lower 8 bits are all 0 (or all 1). Therefore, if the lower 8 bits contain a mixture of 0s and 1s even though the data size flag is 1, the processing unit 154 does not play the multi-track audio data. In this case too, an abnormal signal does not damage the equipment.
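The three checks described above (all bits 1, all bits 0, and a data size flag of 1 with mixed lower bits) can be sketched as a single validation function. The position of the data size flag within the header is an assumption, since Figure 6 is not reproduced here, and `is_valid_sample` is a hypothetical name.

```python
WORD_MASK = 0xFFFFFF
SIZE_SHIFT = 22   # assumed position of the data size flag in the header


def is_valid_sample(word):
    """Return False for samples the text treats as signs of a failure."""
    word &= WORD_MASK
    if word == 0xFFFFFF:      # all bit data is 1
        return False
    if word == 0x000000:      # all bit data is 0
        return False
    size_flag = (word >> SIZE_SHIFT) & 1
    low8 = word & 0xFF
    # With an 8-bit payload, the lower 8 bits must be uniform.
    if size_flag == 1 and low8 not in (0x00, 0xFF):
        return False
    return True


for w in (0x811234, 0x415600, 0x415601, 0x800000, 0x000000, 0xFFFFFF):
    print(hex(w), is_valid_sample(w))
```

A playback device could call such a check on every second-channel sample and stop playback, or suppress control output, as soon as it returns False.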

The processing unit 154 may store only data of the same type in a given channel, or may store data of different types in the same channel. In other words, a given second channel may contain only one kind of identification information, or may contain multiple kinds of identification information.

The processing unit 154 may also store data of the same kind across multiple predetermined channels. For example, the processing unit 154 stores data in order in the three channels 33, 34, and 35. In this case, channels 33, 34, and 35 all contain the same type bits. If the data size flags of channels 33, 34, and 35 are all 0, the processing unit 154 can store three times as much data (48 bits) per sample. Likewise, if the processing unit 154 stores data of the same kind in, for example, four channels, it can store four times as much data (64 bits) per sample. By storing data across multiple channels in this way, the processing unit 154 can also store, within one sample, a kind of data that has a large amount of data per unit time (for example, video data).

Furthermore, when storing data across multiple channels, the processing unit 154 may add header information to only one representative channel (for example, channel 33) and omit it from the other channels. In this case, the processing unit 154 stores 16 bits of data in the representative channel and up to 24 bits of data in each of the other channels. That is, the processing unit 154 can store up to 64 bits of data in three channels. In this case, the data size flag of the representative channel may be, for example, 3 bits (bit data indicating eight data sizes) instead of 1 bit.
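The per-sample capacities quoted for multi-channel storage follow from simple arithmetic. The sketch below assumes 24-bit samples with an 8-bit header, as in the example; `bits_per_sample_slot` is a hypothetical helper name.

```python
SAMPLE_BITS = 24
HEADER_BITS = 8


def bits_per_sample_slot(channels, header_on_every_channel=True):
    """Data bits available per sample instant across `channels` channels."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    body_bits = SAMPLE_BITS - HEADER_BITS          # 16 bits
    if header_on_every_channel:
        return channels * body_bits
    # Only the representative channel carries a header.
    return body_bits + (channels - 1) * SAMPLE_BITS


print(bits_per_sample_slot(3))                                   # 48
print(bits_per_sample_slot(4))                                   # 64
print(bits_per_sample_slot(3, header_on_every_channel=False))    # 64
```

Dropping the header from the non-representative channels lets three channels carry what would otherwise need four.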

The data processing device 10 may also receive, from audio equipment such as the mixer 11, signal processing parameters that indicate the content of signal processing and data on the basic settings of the mixer 11, and store them in the second channel as a data string of a digital audio signal.

As described above, in the present invention the sampling frequency of the multi-track audio data is not limited to 48 kHz, and the bit depth is not limited to 24 bits. For example, when the sampling frequency of the multi-track audio data is 96 kHz, the processing unit 154 may make one of every two samples invalid data, so that the valid samples form 48 kHz data. When the bit depth is 32 bits, the processing unit 154 may leave the lower 8 bits unused and use only the upper 24 bits.
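The two adaptations described above can be sketched in a few lines. The text does not say what an invalid sample contains, so the filler word used here (an Idle word with the start bit set, under an assumed header layout) is an assumption, and the function names are hypothetical.

```python
ASSUMED_FILLER = 0x800000   # assumption: Idle word with start bit set


def to_double_rate(words, filler=ASSUMED_FILLER):
    """Place 48 kHz data words on a 96 kHz channel, one filler per word."""
    out = []
    for word in words:
        out.append(word)     # valid sample
        out.append(filler)   # sample treated as invalid data
    return out


def to_32bit_container(word24):
    """Put a 24-bit word in the upper 24 bits of a 32-bit sample."""
    return (word24 & 0xFFFFFF) << 8


def from_32bit_container(word32):
    """Recover the 24-bit word, ignoring the unused lower 8 bits."""
    return (word32 >> 8) & 0xFFFFFF


print([hex(w) for w in to_double_rate([0x811234, 0x415600])])
print(hex(to_32bit_container(0x811234)))
```

Both adaptations leave the 24-bit data words unchanged, so the same formatting and parsing logic could be reused at other sample rates and bit depths.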

When the multi-track audio data is, for example, 48 kHz and 24 bits, the data body of one sample is, for example, 16 bits. In this case, one channel of the multi-track audio data can store 16 × 48,000 = 768,000 bits (96,000 bytes) of data per second.

The motion data, on the other hand, requires a data capacity of 403,200 bits (50,400 bytes) per second when the bone data is driven at, for example, 60 fps. In other words, the multi-track audio data has a bit rate approximately twice that of the motion data. The data processing device 10 can therefore store the motion data, whose bit rate is lower than that of the multi-track audio data. The data processing device 10 may treat at least one of multiple samples as invalid data and store data in the remaining samples. For example, the data processing device 10 may treat one of every two samples as invalid data and store motion data in the other sample. In this case, the motion data in the second channel may be offset by about one sample on the playback side. However, one sample at a sampling frequency of 48 kHz corresponds to only 0.0208 msec, and even an offset of several samples stays below 1 msec. The viewer therefore does not perceive any lag between the sound and the image generated from the motion data.
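The bit-rate comparison and the timing offset can be checked numerically. The sketch below uses only the example figures from the text (48 kHz, a 16-bit data body, 403,200 bits per second of motion data); the function names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000
BODY_BITS_PER_SAMPLE = 16
MOTION_BITS_PER_SECOND = 403_200


def channel_capacity_bps(valid_every=1):
    """Data-body bits per second on one channel when one sample in every
    `valid_every` samples carries data and the rest are invalid."""
    return BODY_BITS_PER_SAMPLE * SAMPLE_RATE_HZ // valid_every


def lag_ms(samples):
    """Time offset in milliseconds for a given number of samples."""
    return samples * 1000.0 / SAMPLE_RATE_HZ


print(channel_capacity_bps())                                       # 768000
print(round(channel_capacity_bps() / MOTION_BITS_PER_SECOND, 2))    # 1.9
print(channel_capacity_bps(valid_every=2))                          # 384000
print(round(lag_ms(1), 4))                                          # 0.0208
```

With these example numbers, using only every other sample halves the capacity of one channel to 384,000 bits per second, which is slightly below the 403,200 bits per second of the 30-bone example. That layout would therefore presumably be combined with a smaller payload, such as relative coordinates sent only on change, or with more than one channel.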

In this way, multi-track audio data recording the live-streamed performance is generated. The processing unit 154 outputs the multi-track audio data generated in this way (S15). The multi-track audio data may be stored in the flash memory 103 of the device itself, or may be distributed to another device via the communication I/F 106.

As described above, the data processing system 1 is installed, for example, in the room of a streamer who gives a performance such as a live musical performance. The data processing device 10 stores the digital audio signal received from the audio equipment during the live performance in the first channel, and stores the motion data in the second channel. The digital audio signal and the motion data are generated as the same performance progresses. The motion data is stored at the same frequency as the sampling frequency of the digital audio signal (for example, 48 kHz).

The data processing device 10 can therefore synchronize the audio signal and the motion data without using time codes, by generating multi-track audio data in which a digital audio signal with a predetermined sampling frequency (for example, 48 kHz) is stored in the first channel and motion data with the same frequency as the first channel (48 kHz) is stored in the second channel. As a result, the data processing system 1 does not need a dedicated recording device for each protocol used by the various devices in order to record each kind of data separately. The data processing system 1 also does not need a time code generator, cables, interfaces, or the like for synchronizing the audio signal and the motion data by time code, nor does it need settings such as matching the time code frame rate on each device.

Because the second channel in this embodiment stores the motion data as a data string of a digital audio signal, it conforms to a given multi-track audio data protocol (for example, the Dante (registered trademark) protocol). The second channel can therefore be distributed and played back as audio data, and can be edited (copied, cut, pasted, or adjusted in timing) with an audio editing application such as a DAW (Digital Audio Workstation). For example, when a user uses a DAW to cut the audio data of the first and second channels in a certain time range and paste it into a different time range, the motion data moves together with the audio data without losing synchronization.
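The cut-and-paste property can be illustrated with plain Python lists standing in for tracks. This is only a sketch of the idea, not how any particular DAW works; `move_region` and the sample values are hypothetical.

```python
def move_region(tracks, start, end, dest):
    """Cut samples [start, end) from every track and paste them at
    index `dest` of the remaining material, using identical indices
    for all tracks so that they stay sample-aligned."""
    moved = []
    for track in tracks:
        clip = track[start:end]
        rest = track[:start] + track[end:]
        moved.append(rest[:dest] + clip + rest[dest:])
    return moved


# Stand-in data: audio sample n is paired with motion word "m<n>".
audio = [10, 11, 12, 13, 14, 15]
motion = ["m0", "m1", "m2", "m3", "m4", "m5"]

new_audio, new_motion = move_region([audio, motion], 1, 3, 3)
print(new_audio)    # [10, 13, 14, 11, 12, 15]
print(new_motion)   # ['m0', 'm3', 'm4', 'm1', 'm2', 'm5']
```

Because the same edit is applied at the same sample indices on every track, each audio sample keeps the motion word that was recorded alongside it.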

Figure 7 is a block diagram showing the configuration of the data processing system 1A in a playback environment. The data processing system 1A is installed, for example, at a venue for reproducing a performance such as a live musical performance at a remote location.

Components that are the same as those of the data processing system 1 shown in Figure 1 are given the same reference numerals, and their description is omitted. The data processing system 1A includes a display 13. The display 13 may be a panel-type display device such as an LCD, or a video display device such as a projector.

Note that the data processing device 10 at the playback venue does not need to have the same hardware configuration as the one used at the venue where the event, such as a live performance, was held.

Figure 8 is a flowchart showing the operation of the processing unit 154 of the data processing device 10 during playback. First, the processing unit 154 accepts multi-track audio data (S21). The multi-track audio data is received from the data processing device 10 at the venue where the live performance is being held, or from a server. Alternatively, the data processing device 10 reads multi-track audio data of a past live performance stored in the flash memory 103, or reads multi-track audio data of a past live performance stored in another device such as a server.

The processing unit 154 decodes the accepted multi-track audio data and recovers the digital audio signal and the motion data (S22). For example, the processing unit 154 extracts the digital audio signal from channels 1 to 32, which serve as the first channel. The processing unit 154 outputs the extracted digital audio signal to the mixer 11 (S23). The mixer 11 outputs the received digital audio signal to audio equipment such as speakers and reproduces the singing and performance sounds.

In the processing of S22, the processing unit 154 also reads the 8-bit header information of each sample of channels 33 to 64, which serve as the second channel, and extracts the 8-bit or 16-bit body of the motion data. The second channel may also contain data on the signal processing parameters and basic settings of the mixer 11. The processing unit 154 may extract this data and output it to the mixer 11.
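
The per-sample extraction described above can be sketched as follows. The text states only that each sample carries an 8-bit header and an 8-bit or 16-bit body; everything else here is an assumption made for illustration: a 24-bit sample word, the header in the most significant byte, an 8-bit body in the least significant byte, and a hypothetical header flag `WIDE_BODY_FLAG` that selects the body width.

```python
from typing import Tuple

# Hypothetical header bit: 1 = 16-bit body, 0 = 8-bit body.
WIDE_BODY_FLAG = 0x01


def unpack_motion_word(word: int) -> Tuple[int, int]:
    """Split one 24-bit second-channel sample into (header, body).

    Assumed layout: bits 23..16 hold the header, the lower bits hold the body.
    """
    if not 0 <= word <= 0xFFFFFF:
        raise ValueError("expected a 24-bit sample")
    header = (word >> 16) & 0xFF
    if header & WIDE_BODY_FLAG:
        body = word & 0xFFFF  # 16-bit body
    else:
        body = word & 0xFF    # 8-bit body
    return header, body


wide = unpack_motion_word(0x011234)
narrow = unpack_motion_word(0x021234)
```

A real decoder would follow whatever header layout the generating side defines; the bit positions above are placeholders.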

The mixer 11 receives the signal processing parameters and setting data from the data processing device 10 and applies various kinds of signal processing to the received audio signal based on them. The mixer 11 thereby reproduces the singing and performance sounds in the same state as in the live performance.

The processing unit 154 renders video of a 3D model character based on the extracted motion data (S24). Rendering here means controlling the 3D model of the character based on the motion data and converting the 3D model into a two-dimensional video signal. As shown in Figure 5, the motion data includes bone data corresponding to movable parts of the character such as the limbs and fingers. The processing unit 154 determines the configuration of the character's 3D model in each frame based on the position coordinate information and rotation coordinate information contained in the bone data. The processing unit 154 then generates video of the character as the 3D model is seen from a specified viewpoint, using a predetermined method such as ray tracing or scanline rendering.
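
The conversion from a posed 3D model to a two-dimensional image can be illustrated with a much simpler technique than the ray tracing or scanline methods named above: a pinhole perspective projection of bone positions. The sketch below is an assumption-laden stand-in, not the renderer of the embodiment. The bone names, the camera-space coordinates, and the functions `project_point` and `project_pose` are hypothetical, and rotation information is ignored.

```python
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project_point(p: Vec3, focal_length: float = 1.0) -> Vec2:
    """Pinhole projection of a camera-space point onto the image plane."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def project_pose(bones: Dict[str, Vec3], focal_length: float = 1.0) -> Dict[str, Vec2]:
    """Project the position of every bone in one frame of motion data."""
    return {name: project_point(pos, focal_length) for name, pos in bones.items()}


# Hypothetical bone positions for one frame, already in camera space.
pose = {"forearm_r": (0.5, 1.0, 2.0), "hand_r": (1.0, 1.0, 4.0)}
image_points = project_pose(pose)
```

Changing the viewpoint amounts to transforming the bone positions into a different camera space before projection, which is why the same motion data can serve any playback venue.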

The processing unit 154 outputs the generated video signal to the display 13 (S25). The display 13 displays video based on the received video signal.

As described above, at the playback venue the data processing device 10 extracts and outputs the first-channel digital audio signal of each sample, and extracts the second-channel motion data of each sample to render video of the character. The digital audio signal and the video signal of the character can therefore be played back in synchronization without using time codes.

The amount of video data is usually larger than the amount of audio data. If the audio data were delayed to match the video data for synchronization, delivery of the audio data would be late and real-time performance would suffer. As one example, however, the motion data in this embodiment has a bit rate of 403.2 kbps, roughly half the 768 kbps bit rate given for the 48 kHz, 24-bit audio. The data processing system of this embodiment can therefore store the digital audio signal and the motion data in the same frame without reducing real-time performance. Even if motion data is stored across multiple samples, the time offset of one sample at a sampling frequency of 48 kHz (about 0.02 msec) stays well under 1 msec. Users of the data processing system of this embodiment can therefore enjoy a new customer experience: they can feel as if they were taking part in an event such as a live performance even though they are in a different venue from the one where the performance is held.
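
The timing claim above can be checked with a short calculation. The sketch below uses only the figures quoted in the text (48 kHz, 768 kbps, 403.2 kbps); the variable names are illustrative.

```python
SAMPLE_RATE_HZ = 48_000  # sampling frequency quoted in the text
AUDIO_KBPS = 768.0       # audio bit rate quoted in the text
MOTION_KBPS = 403.2      # motion data bit rate quoted in the text

# Duration of one sample in milliseconds.
sample_period_ms = 1_000 / SAMPLE_RATE_HZ

# Motion data bit rate relative to the quoted audio bit rate.
motion_to_audio_ratio = MOTION_KBPS / AUDIO_KBPS

print(f"one sample lasts {sample_period_ms:.4f} ms")
print(f"motion/audio bit rate ratio: {motion_to_audio_ratio:.3f}")
```

One sample lasts about 0.0208 msec, so an offset of even several samples remains far below 1 msec.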

As another example, if the source venue is outdoors and video shot outdoors is played back at an indoor playback venue, users feel that something is out of place. Because the data processing system of this embodiment distributes motion data, however, it can, for example, shoot the inside of the destination venue with a camera installed there and composite the video of the character onto the video of that venue. The data processing system of this embodiment can therefore play back video suited to the playback environment, and users can enjoy a new customer experience with a sense of immersion that was not previously achievable.

Note that the output of the digital audio signal in S23 and the output of the video signal in S25 shown in Figure 8 may be performed simultaneously, or either may be performed first.

(Variation 1)
The motion data of Variation 1 includes main motion data, which is data on the movement of the main parts of the character, and sub-motion data, which is data on the movement of parts subordinate to the main parts.

The main parts of a character are at least the upper arms and forearms, which are important for conveying the performance. Subordinate parts are parts that are not directly needed to convey the performance, such as the torso, legs, hands, and fingers. Which parts are main and which are subordinate, however, depends on the type of performance. For example, in a guitar performance the movement of the right forearm and the movement of the left forearm and fingers are important for conveying the performance, whereas in a piano performance the movements of both forearms are important.

The multi-track audio data of Variation 1 has a first channel that stores the digital audio signal, a second channel that stores the main motion data, and a third channel that stores the sub-motion data.

The processing unit 154 of the data processing device 10 may decide at playback time whether to play back the sub-motion data of the third channel according to its processing capability, the usage rate of the CPU 104, or the free capacity of the RAM 105. As a result, even when processing capability, CPU 104 usage, or free RAM 105 capacity leaves no margin, the data processing device 10 neither stops nor delays playback, and real-time performance is not reduced. Moreover, even when it does not play back the sub-motion data, the data processing device 10 still plays back the main motion data for the parts that are important for conveying the performance, so the user's sense of immersion is not impaired.
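
The playback decision described above can be sketched as a simple threshold check. The thresholds, the channel labels, and the functions `should_play_sub_motion` and `select_channels` are hypothetical; the text does not specify how the decision is made.

```python
from typing import List


def should_play_sub_motion(cpu_usage: float, free_ram_bytes: int,
                           max_cpu_usage: float = 0.8,
                           min_free_ram_bytes: int = 256 * 1024 * 1024) -> bool:
    """Return True when there is enough headroom to play the third channel.

    cpu_usage is a fraction between 0.0 and 1.0. Both thresholds are
    illustrative assumptions, not values from the embodiment.
    """
    return cpu_usage <= max_cpu_usage and free_ram_bytes >= min_free_ram_bytes


def select_channels(cpu_usage: float, free_ram_bytes: int) -> List[str]:
    """Audio and main motion are always played; sub-motion is optional."""
    channels = ["audio", "main_motion"]
    if should_play_sub_motion(cpu_usage, free_ram_bytes):
        channels.append("sub_motion")
    return channels


relaxed = select_channels(cpu_usage=0.3, free_ram_bytes=1 << 30)
loaded = select_channels(cpu_usage=0.95, free_ram_bytes=1 << 30)
```

Dropping only the optional channel keeps audio and main motion flowing, which matches the stated goal of never stopping or delaying playback.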

(Variation 2)
The data processing device 10 according to Variation 2 accepts the designation of a character to be played back and renders video of the designated character using the extracted motion data.

For example, if a first destination venue is in the Northern Hemisphere and it is winter there, the data processing device 10 at the first venue accepts the designation of a character wearing winter clothing. If a second destination venue is in the Southern Hemisphere and it is summer there, the data processing device 10 at the second venue accepts the designation of a character wearing summer clothing.

The data processing device 10 of Variation 2 can thereby play back video that is better suited to the playback environment.

(Variation 3)
The data processing device 10 according to Variation 3 additionally corrects the extracted motion data based on the character designated as in Variation 2, and renders video of the designated character using the corrected motion data.

The size of the 3D model data may differ from character to character. The data processing device 10 therefore corrects the length of each piece of bone data included in the motion data based on data for the designated character (for example, data indicating its height).
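
One simple way to picture the bone-length correction described above is uniform scaling by a height ratio. This is a sketch under assumptions: the text says only that bone lengths are corrected based on character data such as height, so the uniform ratio, the bone names, and the function `scale_bone_lengths` are all hypothetical.

```python
from typing import Dict


def scale_bone_lengths(bone_lengths: Dict[str, float],
                       captured_height: float,
                       character_height: float) -> Dict[str, float]:
    """Scale every bone length by character_height / captured_height.

    Assumes the designated character has the same proportions as the
    captured performer, which a real correction may not assume.
    """
    if captured_height <= 0 or character_height <= 0:
        raise ValueError("heights must be positive")
    ratio = character_height / captured_height
    return {name: length * ratio for name, length in bone_lengths.items()}


# Hypothetical values in meters: a 1.6 m performer mapped to a 0.8 m character.
scaled = scale_bone_lengths({"upper_arm": 0.30, "forearm": 0.25}, 1.6, 0.8)
```

A per-bone table of ratios could replace the single ratio when the character's proportions differ from the performer's.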

The data processing device 10 of Variation 3 can thereby play back video that is better suited to the designated character.

(Variation 4)
The data processing device 10 according to Variation 4 accepts the designation of a background video and superimposes the rendered video of the character on the designated background video.

As mentioned above, if the source venue is outdoors and video shot outdoors is played back at an indoor playback venue, users feel that something is out of place. The data processing device 10 of Variation 4, however, superimposes (composites) the rendered video of the character on the designated background video. For example, if the playback venue is indoors, the data processing device 10 of Variation 4 uses an indoor background video. Alternatively, it may use a summer background video if it is summer at the playback venue, or a winter background video if it is winter there.
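
The superimposition described above can be illustrated with standard alpha compositing (the "over" operation) on small pixel grids. The text does not say which compositing method is used, so this is a sketch under assumptions: the character frame is taken to carry an alpha channel, and the functions `composite_pixel` and `superimpose` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_pixel(fg: RGBA, bg: RGB) -> RGB:
    """Blend one character pixel over one background pixel by its alpha."""
    r, g, b, a = fg
    alpha = a / 255
    blended = [round(alpha * f + (1 - alpha) * k) for f, k in zip((r, g, b), bg)]
    return (blended[0], blended[1], blended[2])


def superimpose(character: List[List[RGBA]],
                background: List[List[RGB]]) -> List[List[RGB]]:
    """Composite a rendered character frame onto a background frame of equal size."""
    return [[composite_pixel(f, k) for f, k in zip(f_row, b_row)]
            for f_row, b_row in zip(character, background)]


# One opaque red character pixel and one fully transparent pixel over blue.
character_frame = [[(255, 0, 0, 255), (255, 0, 0, 0)]]
background_frame = [[(0, 0, 255), (0, 0, 255)]]
result = superimpose(character_frame, background_frame)
```

Swapping `background_frame` for an indoor, summer, or winter frame changes the output without touching the motion data or the rendered character.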

The data processing device 10 of Variation 4 can therefore play back video suited to the playback environment, and users can enjoy a new customer experience with a sense of immersion that was not previously achievable.

The description of this embodiment should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims, not by the embodiments described above, and includes the range of equivalents of the claims. For example, in the above embodiment the processing unit of the present invention is configured by a program read by the CPU 104, but the processing unit can also be realized by, for example, an FPGA (Field-Programmable Gate Array).

The data processing system shown in this embodiment can also be applied to systems that need to combine video and sound, such as those in theme parks. It can also be applied to a remote session system. In a remote session it is important to convey not only the sound but also the movements of the performer or singer. As mentioned above, however, the amount of video data is larger than the amount of audio data, so delivering the audio data and video data separately makes the video data lag considerably, while delivering them in synchronization reduces real-time performance. In contrast, the data processing system shown in this embodiment sends and receives motion data, which is smaller than video data, and can therefore convey the movements of a remote performer or singer with low delay and without reducing real-time performance.

1: Data processing system, 1A: Data processing system, 10: Data processing device, 11: Mixer, 12: Motion capture, 13: Display, 101: Display, 102: User I/F, 103: Flash memory, 104: CPU, 105: RAM, 106: Communication I/F, 154: Processing unit

Claims

1. A data processing method for generating multi-track audio data storing audio data of multiple channels including at least a first channel and a second channel, comprising:
storing a data string of a digital audio signal in the first channel; and
storing motion data, which is information on character movement and is associated with the digital audio signal, as a data string of a digital audio signal in the second channel.

2. The data processing method according to claim 1, further comprising distributing the multi-track audio data.

3. A data processing method for reproducing multi-track audio data that stores audio data of multiple channels, comprising:
reproducing the data string stored in the first channel as a digital audio signal; and
extracting the data string stored in the second channel as motion data, which is information on character movement associated with the digital audio signal, and rendering an image of the character using the extracted motion data.

4. The data processing method according to any one of claims 1 to 3, wherein the second channel includes identification information of the motion data.

5. The data processing method according to any one of claims 1 to 3, wherein:
the motion data includes main motion data, which is data on the movement of a main part of the character, and sub-motion data, which is data on the movement of a part subordinate to the main part; and
the audio data further comprises a third channel in which the sub-motion data is stored.

6. The data processing method according to claim 3, further comprising:
accepting a designation of the character to be played back; and
rendering an image of the designated character using the extracted motion data.

7. The data processing method according to claim 6, further comprising:
correcting the extracted motion data based on the designated character; and
rendering an image of the designated character using the corrected motion data.

8. The data processing method according to claim 3, further comprising:
accepting a designation of a background video; and
superimposing the rendered image of the character on the designated background video.

9. A data processing device that generates multi-track audio data storing audio data of multiple channels including at least a first channel and a second channel, the data processing device comprising a processing unit that:
stores a data string of a digital audio signal in the first channel; and
stores motion data, which is information on character movement and is associated with the digital audio signal, as a data string of a digital audio signal in the second channel.