JP2010074238A - Receiver and control method therefor

Info

Publication number: JP2010074238A
Application number: JP2008236277A
Authority: JP
Inventors: Atsushi Mizutome; 敦水留
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2008-09-16
Filing date: 2008-09-16
Publication date: 2010-04-02
Anticipated expiration: 2028-09-16
Also published as: JP5253062B2

Abstract

[Problem] To provide a receiving apparatus, and a control method therefor, that can keep the multi-channel sound image appropriate for the position and angle of view of the displayed region even when only part of the screen (a representative screen specified by the broadcast station or a screen selected by the user) is cut out and viewed according to the capabilities and functions of the receiving apparatus.
[Solution] The receiving apparatus includes a receiving unit 102 that receives a first video accompanied by multi-channel audio, a video output control unit 109 that cuts out part of the first video and outputs it as a second video, and an audio output control unit 111 that generates output audio to be output together with the second video. Based on the cut-out position of the second video, the receiving apparatus determines which of the multi-channel input audio signals of the first video are to be combined when synthesizing the output audio, and generates the output audio by synthesizing the determined combination of input audio signals.
[Selected drawing] FIG. 1

Description

The present invention relates to a receiving apparatus that allows part of a video accompanied by multi-channel audio to be cut out and viewed, and to a control method therefor.

High-definition video (2k × 1k: 1920 × 1024; hereinafter sometimes referred to as HD) has become common in digital broadcasting. For advanced satellite digital broadcasting, schemes for transmitting even higher-resolution video (4k × 2k: digital cinema; 8k × 4k: Super Hi-Vision, hereinafter sometimes referred to as SHV) and 22.2-channel multi-channel audio are under study.

For broadcasts such as Super Hi-Vision that exceed conventional resolutions, display modes suited to the capabilities and functions of the receiving apparatus are also under study. For example, in addition to down-converting the entire screen for viewing, a mode is being considered in which the receiver cuts out and displays a representative screen designated by the broadcast station (a part of the SHV screen) or a screen selected by the user (hereinafter sometimes referred to as trimmed viewing). In this case, the relationship between the multi-channel sound image and the position and angle of view (size) of the cut-out screen being viewed must be kept appropriate.

The following conventional techniques switch audio according to the screen the user is viewing. Patent Document 1 discloses a method that detects which screen the user is looking at in a multi-screen display and switches the output to the audio corresponding to that screen. Patent Document 2 discloses a method that obtains an audio zoom-in effect by weighting the sound sources close to a user-designated position on the screen when synthesizing the output.

On the other hand, as a means of providing a correct sound image in an environment where the listening position is asymmetric with respect to the speakers, there is a sound image position correction device that measures the distance between the listener and each speaker using sound detection means placed near the listener, and localizes the sound field at the listener's listening position (Patent Document 3).
Patent Document 1: JP 2000-278626 A
Patent Document 2: JP H8-298635 A
Patent Document 3: JP H7-75200 A

Multi-channel audio transmitted together with high-resolution video normally has its sound image adjusted to be optimal when the video is displayed at full resolution and viewed from directly in front of the center of the screen. However, when part of the screen is cut out and viewed as described above, the cut-out position is not always at the center of the screen. If the output balance of the multi-channel audio is left as in the original, the sound image no longer matches the screen being viewed, and the viewer perceives the mismatch as unnatural.

The conventional techniques described above either switch the output completely to the audio of the screen the user is looking at in a multi-screen display (Patent Document 1) or zoom the audio in to a user-designated position on the screen (Patent Document 2). Neither can correct the multi-channel sound image to the optimum position when the user is viewing a cut-out part of the screen. The conventional technique for obtaining an appropriate sound image at the listening position (Patent Document 3), meanwhile, performs audio correction based only on the positional relationship between the user and the speakers and does not take the relationship with the viewed screen into account.

Accordingly, an object of the present invention is to provide a receiving apparatus, and a control method therefor, that can obtain appropriate multi-channel sound image localization for the cut-out screen when part of the screen is cut out and viewed.

To achieve the above object, the present invention adopts the following configuration.

A receiving apparatus according to the present invention comprises receiving means for receiving a first video accompanied by audio of a plurality of channels, video output control means for cutting out part of the first video and outputting a second video, and audio output control means for generating output audio to be output together with the second video. The receiving apparatus is characterized by further comprising determining means for determining, based on the cut-out position of the second video, a combination of input audio to be used for synthesizing the output audio from among the input audio of the plurality of channels that is the audio of the first video, and in that the audio output control means generates the output audio by synthesizing the input audio determined by the determining means.

A control method for a receiving apparatus according to the present invention comprises a step of receiving a first video accompanied by audio of a plurality of channels, a step of cutting out part of the first video and outputting a second video, and a step of generating output audio to be output together with the second video. The method is characterized in that the step of generating the output audio includes a step of determining, based on the cut-out position of the second video, a combination of input audio to be used for synthesizing the output audio from among the input audio of the plurality of channels that is the audio of the first video, and a step of generating the output audio by synthesizing the determined input audio.

According to the present invention, appropriate multi-channel sound image localization can be achieved for the cut-out viewing screen even when part of the screen is cut out and viewed according to the capabilities and functions of the receiving apparatus.

An embodiment of the apparatus and method according to the present invention is described below with reference to the drawings. The embodiment is an example in which a digital broadcast receiver capable of outputting Hi-Vision broadcast video and audio (video format: 1920 × 1080/60/i; audio mode: 5.1ch) receives and processes a Super Hi-Vision broadcast (7680 × 4320/60/p; audio mode: 22.2ch).

[Example 1]
FIG. 1 is a block diagram of the digital broadcast receiver of the present invention.

The antenna 101 receives a digitally modulated digital television broadcast signal in which multiple streams of video data, audio data, metadata, and the like are multiplexed. More specifically, it receives a digital broadcast program whose video format is 7680 × 4320/60/p (with trimming designation) and whose audio mode is 22.2ch multi-channel stereo.

The receiving unit 102 performs demodulation, error correction, and related processing on the digital television broadcast signal and outputs an MPEG-2 TS (Transport Stream) signal.

The signal separation unit 103 separates video data, audio data, and metadata from the multiplexed MPEG-2 TS signal according to the packet IDs of the program selected by the user, and sends them to the video playback unit 104, the audio playback unit 105, and the metadata processing unit 106, respectively. The multiplexed MPEG-2 TS signal may also contain data broadcast (multimedia) data, but the unit that processes it is not shown in FIG. 1.
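As an illustration of separation by packet ID, the following minimal Python sketch reads the 13-bit PID from the header of a 188-byte MPEG-2 TS packet and routes packets to processing units. The function names and the PID-to-unit mapping are hypothetical and are not taken from this specification; a real demultiplexer would obtain the PIDs from the PSI tables.

```python
TS_PACKET_SIZE = 188  # fixed MPEG-2 TS packet length in bytes
SYNC_BYTE = 0x47      # every TS packet starts with this byte


def packet_pid(packet: bytes) -> int:
    """Return the 13-bit packet ID (PID) of one transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    # PID = low 5 bits of byte 1 followed by all 8 bits of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]


def route_packets(packets, pid_to_unit):
    """Group packets by destination unit; packets with unlisted PIDs are dropped.

    pid_to_unit is a hypothetical mapping such as {0x100: "video"}.
    """
    routed = {unit: [] for unit in pid_to_unit.values()}
    for packet in packets:
        unit = pid_to_unit.get(packet_pid(packet))
        if unit is not None:
            routed[unit].append(packet)
    return routed
```

Calling `route_packets(packets, {0x100: "video", 0x110: "audio"})` with made-up PIDs returns one packet list per unit, which mirrors how unit 103 hands data to units 104 to 106.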

The video playback unit 104 decodes video data encoded with ITU-T H.264 | ISO/IEC 14496-10 (MPEG-4 AVC) (hereinafter, H.264).

The audio playback unit 105 decodes audio data encoded with MPEG-2 AAC. When multi-channel audio data is included, each channel is sent to the audio output control unit 111 independently, without being mixed.

The metadata processing unit 106 extracts, from the PSI/SI (Program Specific Information/Service Information) data, the information that the signal separation unit 103 needs to filter the MPEG-2 TS signal. The metadata processing unit 106 also extracts from the PSI/SI data the program information used for the electronic program guide and the like, as well as the information relevant to the present invention: the video format, the trimming information (cut-out position and size), and the audio mode.

Based on the trimming information from the metadata processing unit 106, the display position detection unit 107 obtains the coordinates of the part of the SHV (7680 × 4320) screen to be cut out and sends them to the video output control unit 109. From the coordinates of the cut-out screen, the display position detection unit 107 also determines the coordinates of the front speakers (three positions) of this receiving apparatus, which can output the 5.1ch audio mode. For each of the three front speakers, the display position detection unit 107 then determines which of the front speakers (11 positions) in the received SHV audio mode (22.2ch) are to have their audio combined and output.

The correction data calculation unit 108 calculates correction data for forming an appropriate sound image during cut-out viewing, based on the positional relationship among the speaker positions of this receiving apparatus, the speaker positions in the original SHV audio mode (22.2ch), and the viewing position for the cut-out screen.

In this example, the SHV video corresponds to the first video accompanied by audio of a plurality of channels (22.2ch), and the receiving unit 102 corresponds to the receiving means that receives the first video. The video output control unit 109 corresponds to the video output control means that cuts out part of the SHV video and outputs the second video (the cut-out screen at HD resolution). The audio output control unit 111 corresponds to the audio output control means that generates the output audio (here, 5.1ch) to be output together with the second video. The display position detection unit 107 corresponds to the determining means that determines, based on the cut-out position of the second video, the combination of SHV input audio to be used for synthesizing the output audio.

The correction data calculation unit 108 is now described in somewhat more detail.

FIG. 2 shows a configuration example of the correction data calculation unit 108. The correction data calculation unit 108 consists of an angle analysis unit 201, a mixing gain data calculation unit 202, a distance analysis unit 203, a gain correction data calculation unit 204, and a delay amount correction data calculation unit 205.

The angle analysis unit 201 calculates the angle between each front speaker of this receiving apparatus and the 22.2ch front speakers whose audio is output to it. Hereinafter, the front left speaker of this receiving apparatus is abbreviated as TFL, the front center speaker as TFC, and the front right speaker as TFR, and the 22.2ch front speakers are referred to as sources.

Based on the calculated angles, the mixing gain data calculation unit 202 determines the mixing ratio (gain) of the audio data of each source to be combined, so that the direction from which the viewer perceives the sound to arrive is appropriate for the cut-out screen being viewed.

Based on the angles calculated by the angle analysis unit 201, the distance analysis unit 203 calculates the difference in distance between each 22.2ch source and the front speakers (TFL/TFC/TFR) of this receiving apparatus. Based on this distance difference information, the gain correction data calculation unit 204 calculates gain (volume level) correction data for the audio signals of the front speakers (TFL/TFC/TFR), and the delay amount correction data calculation unit 205 calculates delay amount correction data for the same audio signals. The gain correction data and the delay amount correction data are determined so that the volume the viewer perceives and the perceived distance to the sound source are appropriate for the cut-out screen being viewed.
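The passage above does not give formulas for the gain and delay corrections. As one hedged illustration of how a distance difference could be turned into correction data, the Python sketch below assumes distances measured in meters from the viewing position, a speed of sound of 343 m/s, and the inverse-distance law for level. All of these are assumptions made for illustration, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at room temperature


def delay_correction_ms(source_distance_m, speaker_distance_m):
    """Extra delay (ms) so the receiver's speaker sounds as far away as the source.

    Assumption: the source is at least as far away as the speaker; a nearer
    source would need a negative delay, which is clamped to zero here.
    """
    extra_path = source_distance_m - speaker_distance_m
    return max(0.0, extra_path / SPEED_OF_SOUND_M_S * 1000.0)


def gain_correction_db(source_distance_m, speaker_distance_m):
    """Level change (dB) under the inverse-distance law; negative means attenuate."""
    return 20.0 * math.log10(speaker_distance_m / source_distance_m)
```

For example, a source assumed to be twice as far from the viewer as the receiver's speaker yields an attenuation of about 6 dB under this model.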

The description now returns to FIG. 1. Based on the position information of the cut-out screen from the display position detection unit 107, the video output control unit 109 cuts an HD-resolution screen out of the SHV screen and sends it to the display unit 110.

The display unit 110 is a display device with a resolution capable of displaying HD (1920 × 1080).

The audio output control unit 111 combines the audio data of the 22.2ch front speakers output from the audio playback unit 105 at predetermined ratios based on the mixing gain data from the correction data calculation unit 108. It then performs gain adjustment and delay amount adjustment according to the gain correction data and the delay amount correction data, and generates the audio data for the front speakers (TFL/TFC/TFR) of this receiving apparatus. The generated audio data is output to the TFL/TFC/TFR speakers 112 via a D/A (digital-to-analog) converter and an amplifier. The details of the mixing, gain adjustment, and delay amount adjustment are described later.

FIG. 3 shows the position of the cut-out viewing screen in Example 1. In Example 1, the central part of the Super Hi-Vision screen 301 is cut out (trimmed) at Hi-Vision resolution 302 and viewed.

FIG. 4 shows a configuration example of the audio output control unit 111 in Example 1, in which the central part of the Super Hi-Vision screen is cut out and viewed. Because the cut-out is centered on the center of the screen, the audio data of the 22.2ch front center speaker (FC) is output to the front center speaker (TFC) of this receiving apparatus. The front left speaker (TFL) receives audio data that mixes three sources: the 22.2ch front left speaker (FL), front left center speaker (FLc), and front center speaker (FC). The front right speaker (TFR) receives audio data that mixes three sources: the 22.2ch front right speaker (FR), front right center speaker (FRc), and front center speaker (FC).
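The source combinations of FIG. 4 can be sketched as a weighted sum per output channel. In the Python sketch below, the channel names follow the text, but the weights are hypothetical placeholders: the actual mixing ratios come from the mixing gain data, whose values are not given in this passage. The function names `mix` and `render_front` are likewise illustrative.

```python
def mix(sources, weights):
    """Weighted sum of equal-length sample lists, one list per source channel."""
    length = len(next(iter(sources.values())))
    out = [0.0] * length
    for name, weight in weights.items():
        for i, sample in enumerate(sources[name]):
            out[i] += weight * sample
    return out


# Source combinations for a center cut-out, as described for FIG. 4.
# The weights are hypothetical placeholders, not values from the specification.
CENTER_CROP_MIX = {
    "TFL": {"FL": 0.25, "FLc": 0.5, "FC": 0.25},
    "TFC": {"FC": 1.0},
    "TFR": {"FR": 0.25, "FRc": 0.5, "FC": 0.25},
}


def render_front(sources, mix_table=CENTER_CROP_MIX):
    """Produce the three front output channels from the 22.2ch front sources."""
    return {out: mix(sources, weights) for out, weights in mix_table.items()}
```

Keeping the combinations in a table separates the choice of sources (made by the display position detection unit 107) from the arithmetic of the mix itself.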

FIG. 5 shows the standard speaker arrangement of a 22.2ch system. The 22.2ch system has a speaker configuration of 22 channels (11 at the front, 4 at the sides, 6 at the rear, and 1 overhead) plus 2 LFE (Low Frequency Effects) channels for the low range (treated as 0.2ch). The 5.1ch and 6.1ch/7.1ch systems can reproduce the spread of sound in the front-rear and left-right directions, but they have difficulty expressing vertical sound images. In the 22.2ch system, the middle layer, at the same height as the viewer, is increased to 10 channels, and 9 channels are placed in the upper layer and 3 channels in the lower layer, so that vertical movement of the sound image can also be reproduced.

To simplify the explanation, the examples in this specification describe only the relationship between the front speakers of the 22.2ch system (one of the audio modes of SHV broadcasting) and those of the 5.1ch system (one of the audio modes of HD broadcasting). The rear speakers, side speakers, and low-frequency speakers are not described. In addition, because Examples 1 and 2 trim the vertically central part of the screen, only the middle-layer front speakers of the 22.2ch system are described. The upper-layer front speakers (TpFL/TpFC/TpFR in FIG. 5) and the lower-layer front speakers (BtFL/BtFC/BtFR) are not described.

Next, the operation of the blocks relevant to the present invention is described using flowcharts.

FIG. 6 shows an example of the processing flow of the metadata processing unit 106.

In digital broadcasting, various kinds of information about video, audio, and programs can be embedded in the PSI/SI data as tables and transmitted. For details, see the standard "Service Information for Digital Broadcasting System" (ARIB STD-B10) issued by the Association of Radio Industries and Businesses (hereinafter, ARIB).

In this example, a component descriptor inserted in the PMT (Program Map Table) or the EIT (Event Information Table) is used to signal that the video is Super Hi-Vision video (video format 7680 × 4320/60/p) and that trimming information from the broadcast station is present. The EIT is a table that carries information about a program, such as the program name, broadcast date and time, and program content.

The component descriptor has the data structure shown in FIG. 7. Its 8-bit component type field (component_type) indicates the type of the video or audio component.

FIG. 8 shows the contents of the component descriptor in more detail. Component types 0x00 through 0xC0 indicate the video components that are currently standardized. The video stream assumed in this example uses a video format with a higher resolution than Hi-Vision video, and no component type is currently defined for it. In this example, component types 0xE1 through 0xF3 are assigned to indicate digital cinema and Super Hi-Vision video formats (FIG. 8).

When the component type indicates a video format with trimming designation, trimming (cut-out) information is also transmitted. The trimming information can be transmitted using an extended event descriptor inserted in the EIT or a similar table. The extended event descriptor has the data structure shown in FIG. 9, and the trimming information is added as one of its item_description_char entries. As shown in FIG. 10, item_description_char indicates the trimming size or address, and item_char carries the corresponding value.

The following description assumes that a digital broadcast in which such metadata is multiplexed is being received.

In step S602 of FIG. 6, the metadata processing unit 106 extracts the component identification information from the component descriptor of the PMT. Next, the metadata processing unit 106 determines from the component identification information whether the program being received is Super Hi-Vision (SHV) video (S603). If the program is in a normal Hi-Vision video mode rather than the SHV video mode (for example, component type 0xB2 in FIG. 8), the video is displayed at its native resolution without cutting out part of the screen (S604).

If the SHV video mode is detected in step S603, the metadata processing unit 106 determines in step S605 whether the broadcast station has designated trimming. If the video mode has a trimming designation (for example, 0xF2 in FIG. 8), the process proceeds to step S606. If the video is SHV but has no trimming designation, the process proceeds to step S607.

In step S606, the metadata processing unit 106 extracts the trimming information from the extended event descriptor of the EIT. FIG. 10 is an example of trimming information that specifies trimming size "1" (meaning HD resolution, 1920 × 1080) and an upper-left (X, Y) address of (−960, +540). Because the center of the screen is taken as coordinate (0, 0) in this example, the trimming information of FIG. 10 designates that the central part of the SHV screen be trimmed (cut out) at HD resolution. The trimming information extracted in step S606 is sent to the display position detection unit 107 (S610).
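To make the coordinate convention concrete, the following Python sketch converts the trimming information of FIG. 10 into a pixel rectangle with its origin at the top-left of the SHV frame. It assumes that +Y points upward in the center-origin coordinates (consistent with an upper-left Y of +540 for a centered 1080-line window). Only size code 1 is described in the text, so the size table holds only that entry, and the function name is hypothetical.

```python
SHV_WIDTH, SHV_HEIGHT = 7680, 4320

# Only size code 1 is described in the text; other codes are left undefined.
TRIM_SIZES = {1: (1920, 1080)}


def crop_rectangle(size_code, top_left_x, top_left_y):
    """Convert center-origin trimming info to (left, top, right, bottom) pixels.

    Input coordinates put (0, 0) at the screen center with +Y pointing up
    (an assumption); the result uses a top-left origin with +Y pointing down.
    """
    width, height = TRIM_SIZES[size_code]
    left = SHV_WIDTH // 2 + top_left_x
    top = SHV_HEIGHT // 2 - top_left_y
    rect = (left, top, left + width, top + height)
    if left < 0 or top < 0 or rect[2] > SHV_WIDTH or rect[3] > SHV_HEIGHT:
        raise ValueError("cut-out lies outside the SHV frame")
    return rect
```

With the FIG. 10 values, `crop_rectangle(1, -960, 540)` gives the rectangle from (2880, 1620) to (4800, 2700), which is the 1920 × 1080 window centered in the 7680 × 4320 frame.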

If the broadcast station has not designated trimming in step S605, the metadata processing unit 106 determines whether trimming has been designated by a user operation (S607). If the receiving apparatus allows the user to trim and view an arbitrary position, and the user is actually doing so, the cut-out position and size managed within the receiving apparatus are extracted as the trimming information (S608), and the process proceeds to step S610. If the user has not designated trimming, the metadata processing unit 106 instructs the video output control unit 109 to down-convert the entire screen for display (S609).
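The branching of FIG. 6 (steps S603 to S609) can be summarized in a short Python sketch. The three boolean inputs stand in for the results of the checks on the component type and on the receiver's user settings; the enum and function names are hypothetical and serve only to illustrate the flow.

```python
from enum import Enum


class Action(Enum):
    SHOW_AS_IS = "display at native resolution (S604)"
    USE_BROADCASTER_TRIM = "extract trimming info from the EIT (S606)"
    USE_USER_TRIM = "use the receiver-managed cut-out (S608)"
    DOWNCONVERT = "down-convert the full screen (S609)"


def decide_action(is_shv, broadcaster_trim, user_trim):
    """Follow the S603 -> S605 -> S607 decisions of the metadata processing flow."""
    if not is_shv:            # S603: not an SHV video mode
        return Action.SHOW_AS_IS
    if broadcaster_trim:      # S605: trimming designated by the broadcast station
        return Action.USE_BROADCASTER_TRIM
    if user_trim:             # S607: trimming designated by user operation
        return Action.USE_USER_TRIM
    return Action.DOWNCONVERT
```

Note that a broadcaster designation is checked before a user designation, matching the order of steps S605 and S607.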

FIG. 11 shows an example of the processing flow of the display position detection unit 107.

First, in step S1102, the display position detection unit 107 reads the trimming information extracted by the metadata processing unit 106. Next, the display position detection unit 107 obtains the coordinates of the cut-out screen based on trimming information such as that shown in FIG. 10 (S1103). The coordinate data of the cut-out screen is sent to the video output control unit 109 in FIG. 1 (S1104), and the specified region (in this example, the central part of the SHV screen at HD resolution) is cut out and displayed on the display unit 110.

In step S1105, the display position detection unit 107 calculates the positions of the front speakers of the 5.1ch system according to the position of the cut-out screen.

FIG. 12 shows the arrangement of the front speakers when the central part of the SHV screen is trimmed at HD resolution and viewed from the viewing position 1201. As described above, FL, FLc, FC, FRc, and FR denote the middle-layer front left speaker, front left center speaker, front center speaker, front right center speaker, and front right speaker of the 22.2ch system, respectively. TFL, TFC, and TFR denote the front speakers of the 5.1ch system used during trimmed viewing. As shown in FIG. 12, the coordinates of TFL, TFC, and TFR in this example are (−960, 0), (0, 0), and (+960, 0), respectively. TFL and TFR can be placed at any position between the respective edge of the cut-out screen and FL or FR, depending on the user's preference for the sense of spaciousness, the content being viewed, and so on (FIG. 13). To simplify the explanation, TFL is placed here at the left edge of the cut-out screen and TFR at the right edge.
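The simplified placement rule used here (TFL at the left edge, TFC at the center, TFR at the right edge of the cut-out, all at its vertical center) can be written as a small Python sketch. The function name is hypothetical, and the coordinates are assumed to be center-origin with +Y pointing up, as in the trimming information.

```python
def front_speaker_positions(top_left_x, top_left_y, width, height):
    """Place TFL/TFC/TFR at the left edge, center, and right edge of the cut-out.

    Coordinates are center-origin with +Y up (an assumption); all three
    speakers sit at the vertical center of the cut-out.
    """
    center_x = top_left_x + width // 2
    center_y = top_left_y - height // 2
    return {
        "TFL": (top_left_x, center_y),
        "TFC": (center_x, center_y),
        "TFR": (top_left_x + width, center_y),
    }
```

For the center cut-out of this example, `front_speaker_positions(-960, 540, 1920, 1080)` returns (−960, 0), (0, 0), and (+960, 0) for TFL, TFC, and TFR, matching FIG. 12.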

Next, in step S1106 of FIG. 11, the display position detection unit 107 determines whether the y-coordinates of TFL, TFC, and TFR are 0 (zero). A y-coordinate of 0 means that the screen has been cut out at exactly the vertical center of the SHV screen, as in FIG. 12. In this embodiment the y-coordinate is 0, so the process proceeds to step S1107.

In step S1107, the display position detection unit 107 determines the positional relationship along the x-axis between TFL, TFC, TFR and FL, FLc, FC, FRc, FR. In this embodiment, for example, TFL is determined to lie between FLc and FC.

In steps S1108 to S1110, the display position detection unit 107 decides, according to the positional relationship determined in step S1107, which of the FL, FLc, FC, FRc, and FR speaker signals are to be synthesized and output to each of TFL, TFC, and TFR. The display position detection unit 107 first checks whether a source exists at the same position as TFL. If one exists, it selects that source; if not, it selects the N sources closest to TFL (N is an integer of 2 or more). When N is 3, the combination FL, FLc, FC is selected for TFL in this embodiment. Likewise, the combination FR, FRc, FC is selected for TFR. Because this embodiment cuts out the center of the screen, the coordinates of TFC and FC coincide, and only FC is selected for TFC. The rule (algorithm) for deciding the combination is not limited to this example, and any rule may be adopted. For example, a table that predefines the combination of sources for each position (x, y coordinates) of TFL, TFC, and TFR may be consulted. The number of sources to combine may also be varied dynamically according to the distance between TFL (or another output speaker) and the sources.
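
One possible reading of this selection rule is sketched below. Several things here are assumptions made only for illustration: the speaker x-coordinates (in particular FLc and FRc, taken as midpoints between the screen edge and the center), the function name, and the tie-breaking rule that prefers a source on the same side of the screen as the target when two sources are equally distant. The text itself leaves the rule open.

```python
# Hypothetical x-coordinates (pixels, origin at the screen centre) of the
# middle-layer front speakers on a 7680-pixel-wide SHV screen.
# The FLc/FRc values are assumed midpoints, not taken from the source.
SOURCES = {"FL": -3840, "FLc": -1920, "FC": 0, "FRc": 1920, "FR": 3840}


def select_sources(target_x, n=3, sources=SOURCES):
    """Pick the input channels to synthesize for one output speaker."""
    # A source at exactly the same position is used on its own.
    exact = [name for name, x in sources.items() if x == target_x]
    if exact:
        return exact[:1]

    def rank(item):
        _, x = item
        # Assumed tie-break: prefer sources on the target's side of centre.
        opposite_side = 1 if x * target_x < 0 else 0
        return (abs(x - target_x), opposite_side)

    ranked = sorted(sources.items(), key=rank)
    return [name for name, _ in ranked[:n]]


tfl_sources = select_sources(-960)
tfc_sources = select_sources(0)
tfr_sources = select_sources(960)
```

Under these assumptions the sketch yields the combinations named in the text: FL, FLc, FC for TFL; FC alone for TFC; and FR, FRc, FC for TFR. A lookup table keyed by position, which the text also mentions, would replace the ranking step entirely.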

The display position detection unit 107 outputs the position information (coordinates) of TFL, TFC, and TFR calculated in step S1105, together with the combination information for FL, FLc, FC, FRc, and FR determined in steps S1108 to S1110, to the correction data calculation unit 108 (S1111).

Steps S1112 to S1122 in FIG. 11 are the processing flow for other cut-out positions. Steps S1113 to S1117 are an example of the processing when the cut-out position changes only in the vertical direction. In that case, the two sources TpFC and BtFC are also used for audio synthesis. Steps S1112 and S1118 to S1122 are described in another embodiment (Example 4).

FIG. 14 shows an example of the processing flow in the correction data calculation unit 108.

First, in step S1402, the correction data calculation unit 108 reads the position information (coordinates) of TFL, TFC, and TFR calculated by the display position detection unit 107, together with the combination information specifying which of FL, FLc, FC, FRc, and FR are output to each of them.

Next, in step S1403, the correction data calculation unit 108 calculates the angle of each speaker as seen from the viewing position. The viewing position in this embodiment (1201 in FIG. 12) corresponds to a viewing angle of 100 degrees and a viewing distance of 0.75H (H is the screen height) for SHV viewing, and to a viewing angle of 30 degrees and a viewing distance of 3H for HD viewing. These are the standard viewing parameters for SHV and HD viewing; at the respective standard viewing angles, 0.75H for SHV and 3H for HD are the same viewing distance.
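
The equality of the two viewing distances, and the angle calculation of step S1403, can be checked with a short sketch. The pixel dimensions (7680×4320 for SHV, 1920×1080 for HD) and the helper name are assumptions for illustration; distances are expressed in SHV pixels so that screen coordinates and viewing distance share one unit.

```python
import math

SHV_HEIGHT = 4320  # assumed SHV screen height in pixels
HD_HEIGHT = 1080   # assumed HD cut-out height in pixels

shv_distance = 0.75 * SHV_HEIGHT  # 0.75H for SHV viewing
hd_distance = 3 * HD_HEIGHT       # 3H for HD viewing


def speaker_angle_deg(speaker_x, viewer_x, distance):
    """Horizontal angle of a speaker seen from the viewer.

    Negative values are to the left of straight ahead, positive to the right.
    """
    return math.degrees(math.atan2(speaker_x - viewer_x, distance))


# FL on the left edge of the SHV screen, seen from the standard position.
fl_angle = speaker_angle_deg(-3840, 0, shv_distance)
```

With these assumed dimensions both distances come out equal, and FL on the left screen edge lands close to the 50 degrees quoted for the standard viewing position.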

In step S1404, the correction data calculation unit 108 calculates the mixing ratio of FL, FLc, FC, FRc, and FR for each of TFL, TFC, and TFR from the angles between the viewing position and each speaker calculated in step S1403. In step S1405, the correction data calculation unit 108 outputs the calculated mixing ratio as mixing gain data to the audio output control unit 111 in FIG. 1.

FIG. 15 shows an example in which the audio output from TFL is generated by mixing the audio of FL, FLc, and FC. The angles between the viewing position and each speaker in this embodiment are the values obtained when the viewing position is the standard viewing position (viewing angle of 100 degrees for SHV and 30 degrees for HD). By applying the law of cosines or a similar relation to the angle of each speaker, the audio vectors of FL, FLc, and FC are combined to generate an audio vector VTFL whose arrival direction is the angle of TFL. Normalizing so that VTFL = 1 gives the mixing ratio of the FL, FLc, and FC audio vectors. In FIG. 15, the audio vector VTFL for TFL is generated by combining the FL output in the proportion VFL, the FLc output in the proportion VFLc, and the FC output in the proportion VFC. When three or more source vectors are combined in two dimensions, as in the example of FIG. 15, additional constraints (for example, a range for the vector magnitudes) may be added as needed. Here, the magnitude of the vector VFC, one decomposition component of the combined vector VTFL, can be limited to a range in which the direction of the other decomposition component can be synthesized from the vectors VFLc and VFL.

FIG. 16 shows an example in which the position of TFL is moved closer to FL than in FIG. 15. Because TFL is now closer to FL, the proportion of VFC in the mixing ratio of the FL, FLc, and FC audio vectors decreases, and the proportions of VFL and VFLc increase.
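
The vector decomposition behind FIGS. 15 and 16 can be illustrated with a simplified two-source case. This sketch is not the three-source procedure of the figures: it assumes only two sources bracketing the target, for which the two gains are uniquely determined, whereas a third source needs an extra constraint as noted above. The function names and the sign convention (negative angles to the left) are assumptions.

```python
import math


def unit(angle_deg):
    """Unit vector for a direction; x points right, y points at the screen."""
    a = math.radians(angle_deg)
    return (math.sin(a), math.cos(a))


def pair_gains(target_deg, src_a_deg, src_b_deg):
    """Solve g_a * u_a + g_b * u_b = u_target for the gains (g_a, g_b).

    The target vector has unit length, matching the VTFL = 1 normalization.
    """
    tx, ty = unit(target_deg)
    ax, ay = unit(src_a_deg)
    bx, by = unit(src_b_deg)
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("source directions must differ")
    g_a = (tx * by - ty * bx) / det
    g_b = (ax * ty - ay * tx) / det
    return g_a, g_b


# Target halfway (in angle) between a source at -30 degrees and one at 0.
g_left, g_centre = pair_gains(-15.0, -30.0, 0.0)
```

Moving the target angle toward one source raises that source's gain and lowers the other's, which is the qualitative behavior described for FIG. 16.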

Returning to FIG. 14, the description of the processing flow continues.

In step S1406, the correction data calculation unit 108 calculates the difference in distance between FL, FLc, FC, FRc, FR and TFL, TFC, TFR from the angles of the speakers as seen from the viewing position, calculated in step S1403. FIG. 17 shows an example of obtaining the difference in distance between FL and TFL. At the standard viewing position, FL is 50 degrees to the left of straight ahead, and TFL in this embodiment is 15 degrees to the left.

Let L_FL be the distance between the viewer and FL, and L_TFL the distance between the viewer and TFL. Since L_FL·cos 50° = L_TFL·cos 15° holds, L_TFL = L_FL × cos 50° / cos 15°.
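
The relation above follows because both speakers lie in the screen plane, so each distance multiplied by the cosine of its angle equals the same perpendicular viewing distance. A minimal sketch, with a hypothetical function name:

```python
import math


def trimmed_distance(l_src, src_angle_deg, tgt_angle_deg):
    """Distance to the trimmed-view speaker, given the distance to the source.

    Uses l_src * cos(src_angle) = l_tgt * cos(tgt_angle).
    """
    return (l_src * math.cos(math.radians(src_angle_deg))
            / math.cos(math.radians(tgt_angle_deg)))


# Ratio L_TFL / L_FL for FL at 50 degrees and TFL at 15 degrees.
ratio = trimmed_distance(1.0, 50.0, 15.0)
```

For the angles of this embodiment the ratio is roughly 0.67, so TFL is about two thirds as far from the viewer as FL.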

Next, in step S1407, the correction data calculation unit 108 calculates gain correction data based on the distance relationship obtained in step S1406. For example, in FIG. 17, let G(TFL_FL) be the gain of TFL with respect to FL, and G(FL) the gain of FL. Because volume is inversely proportional to the square of the distance, the gain correction data is generated so that G(TFL_FL) = G(FL) × (L_TFL)^2 / (L_FL)^2, which keeps the perceived volume the same during trimming viewing. Gain correction data is calculated in the same way for the other audio channels synthesized and output to TFL in this embodiment (FLc, FC), and the total gain correction data for FL, FLc, and FC is determined and sent to the audio output control unit 111 in FIG. 1 (S1408).

In step S1409, the correction data calculation unit 108 calculates delay correction data based on the distance relationship obtained in step S1406. As with step S1407, FIG. 17 is used for the explanation. In FIG. 17, let D(TFL_FL) be the delay of TFL with respect to FL, and D(FL) the delay of FL. Because delay is proportional to distance, the delay correction data is generated so that D(TFL_FL) = D(FL) × L_TFL / L_FL, which keeps the viewer's sense of distance to the sound source the same during trimming viewing. Delay correction data is calculated in the same way for the other audio channels synthesized and output to TFL in this embodiment (FLc, FC), and the total delay correction data for FL, FLc, and FC is determined and sent to the audio output control unit 111 in FIG. 1 (S1410).
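
The gain and delay relations of steps S1407 and S1409 translate directly into two small functions. The function names are hypothetical; the formulas are the ones stated in the text, applied per source channel. How the per-channel values are combined into "total" correction data is not specified in the text and is not sketched here.

```python
def gain_correction(g_src, l_src, l_tgt):
    """G(target_src) = G(src) * l_tgt^2 / l_src^2 (inverse-square law)."""
    return g_src * (l_tgt ** 2) / (l_src ** 2)


def delay_correction(d_src, l_src, l_tgt):
    """D(target_src) = D(src) * l_tgt / l_src (delay proportional to distance)."""
    return d_src * l_tgt / l_src


# Example: the trimmed-view speaker is half as far away as the source speaker.
g = gain_correction(1.0, 2.0, 1.0)
d = delay_correction(10.0, 2.0, 1.0)
```

Halving the distance quarters the gain and halves the delay, as the two proportionalities require.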

The mixing gain correction data, gain correction data, and delay correction data for trimming viewing, obtained through the processing of the metadata processing unit 106 (FIG. 6), the display position detection unit 107 (FIG. 11), and the correction data calculation unit 108 (FIG. 14), are input to the audio output control unit 111.

The audio output control unit 111 generates TFL, TFC, and TFR from FL, FLc, FC, FRc, and FR. As an example, the flow for generating TFL is described with reference to FIG. 4. First, the audio output control unit 111 mixes FL, FLc, and FC at the specified mixing ratio based on the mixing gain correction data. Next, it adjusts the gain based on the gain correction data so that the volume stays approximately the same, and then adjusts the delay based on the delay correction data so that the sense of distance to the sound source stays approximately the same. The adjusted data is sent to the TFL speaker via the D/A converter and the amplifier (AMP). The flow for TFR is the same as for TFL, except that FC, FRc, and FR are mixed at the specified mixing ratio. In this embodiment, because the center of the screen is cut out for viewing, TFC = FC holds and no particular correction is applied to TFC. (In FIG. 4 the signal passes through the gain correction unit and the delay correction unit, but the correction amount is 0 (zero) in both.)
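
The mix, gain, and delay chain of FIG. 4 can be sketched on plain lists of samples. Everything concrete here is an assumption for illustration: the function name, the representation of channels as equal-length Python lists, a single overall gain, and a delay expressed as a whole number of samples implemented by prepending zeros.

```python
def render_output(inputs, mix_gains, gain, delay_samples):
    """Produce one output channel (e.g. TFL) from several input channels.

    inputs:        dict of channel name -> list of samples (equal lengths)
    mix_gains:     dict of channel name -> mixing gain (arrival direction)
    gain:          overall gain correction (volume)
    delay_samples: delay correction in whole samples (sense of distance)
    """
    length = len(next(iter(inputs.values())))
    mixed = [0.0] * length
    # 1. Mix the selected sources at the given ratio.
    for name, g in mix_gains.items():
        for i, sample in enumerate(inputs[name]):
            mixed[i] += g * sample
    # 2. Apply the gain correction.
    scaled = [gain * s for s in mixed]
    # 3. Apply the delay correction, keeping the original length.
    return ([0.0] * delay_samples + scaled)[:length]


tfl = render_output(
    {"FL": [1.0, 0.0, 0.0], "FC": [0.0, 1.0, 0.0]},
    {"FL": 0.5, "FC": 0.5},
    gain=2.0,
    delay_samples=1,
)
```

A real receiver would process streaming buffers and fractional delays; the sketch only shows the order of the three stages described in the text.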

The audio output control unit in this embodiment (FIG. 4) first corrects the arrival direction (angle) of the sound in the mixing gain adjustment unit, and then performs gain correction to adjust the volume and delay correction to adjust the sense of distance to the sound source. However, a configuration that performs the gain and delay corrections first and the arrival direction (angle) correction afterwards can make the gain and delay corrections more accurate.

According to this embodiment, when the central portion of a Super Hi-Vision (SHV) screen is trimmed and viewed at Hi-Vision (HD) resolution, multi-channel audio whose arrival direction, volume, and sense of distance are appropriately corrected for the cut-out viewing screen can be provided.

[Example 2]
Next, Example 2 of the present invention will be described.

FIG. 18 shows the position of the cut-out viewing screen in Example 2. In Example 2, a part of the Super Hi-Vision screen 1701 is cut out (trimmed) and viewed at Hi-Vision resolution 1702. The difference from Example 1 is that the cut-out position in the x-axis direction is not at the center of the screen.

The block configuration of the digital broadcast receiving apparatus of Example 2 (FIG. 1) and the processing flows of the metadata processing unit, the display position detection unit, and the correction data calculation unit (FIGS. 6, 11, and 14) are basically the same as in Example 1.

The following description focuses on the differences from Example 1.

FIG. 19 shows the positions of the front speakers TFL, TFC, and TFR during trimming viewing in Example 2, and the combination of FL, FLc, FC, FRc, and FR that is synthesized and output to TFL. In Example 2, the FC and FRc audio data are combined and output to TFL. This is because the synthesis is based primarily on the sound around the cut-out screen, but FL and FLc may additionally be mixed in at a fixed ratio.

As in Example 1, the mixing ratio is obtained from the angle to each speaker. In Example 2, however, the viewing position is not at the origin (x, y) = (0, 0), so the angle to each speaker must be calculated taking the cut-out position coordinates into account.

Let a be the x-coordinate of the viewing position, b the x-coordinate of the right end of the SHV screen, 50 degrees the angle to FL at the origin (the standard viewing position, as in Example 1), and K degrees the angle to FL at the viewing position. The angle K can then be expressed by the following relation.
tan K° = ((a + b) / b) tan 50°

In this way, the angle to each speaker at the viewing position can be obtained from the viewing position and the coordinates of the screen edges (FIG. 20).
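
The relation for K can be evaluated as in the sketch below. The function name and the default of 50 degrees are illustrative; a and b are the viewing-position and right-screen-edge x-coordinates as defined in the text, with FL assumed to sit at x = −b.

```python
import math


def angle_to_fl_deg(a, b, origin_angle_deg=50.0):
    """Angle K (degrees) to FL from a viewing position at x = a.

    Implements tan K = ((a + b) / b) * tan(origin_angle), where b is the
    x-coordinate of the right edge of the SHV screen.
    """
    tan_k = ((a + b) / b) * math.tan(math.radians(origin_angle_deg))
    return math.degrees(math.atan(tan_k))


k_centre = angle_to_fl_deg(0, 3840)     # viewer at the origin
k_shifted = angle_to_fl_deg(960, 3840)  # viewer shifted to the right
```

At a = 0 the formula reduces to the 50 degrees of the standard position, and shifting the viewer to the right increases the angle to FL, as expected from the geometry of FIG. 20.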

FIG. 21 shows the combination of FL, FLc, FC, FRc, and FR that is synthesized and output to TFC in Example 2. The FRc and FR audio data are combined and output to the front center speaker TFC in Example 2. As described above, this is because the sound around the cut-out screen is output preferentially.

FIG. 22 shows an example of the audio output control unit 111 in Example 2. As in Example 1, the audio data for each of the front speakers TFL, TFC, and TFR during trimming viewing is generated based on the mixing gain data, gain correction data, and delay correction data calculated from the cut-out screen position. As described with reference to FIGS. 19 and 21, in Example 2 FC and FRc are synthesized and output to TFL, and FRc and FR are synthesized and output to TFC. For TFR as well, FRc and FR are synthesized and output, as for TFC. As FIG. 18 shows, this is because, at the cut-out screen position of Example 2, both TFC and TFR lie between the two speakers FRc and FR. The FRc and FR audio data are therefore synthesized for both TFC and TFR, but the mixing ratios differ because the angular relationship to each speaker differs. After synthesis, as in Example 1, gain correction for volume adjustment and delay correction for distance adjustment are performed, and the result is output through the D/A converter and AMP to the front speakers TFL, TFC, and TFR used during trimming viewing.

According to this embodiment, when a part of the SHV screen (vertically centered but not horizontally centered) is trimmed and viewed at HD resolution, multi-channel audio whose arrival direction, volume, and sense of distance are appropriately corrected for the cut-out viewing screen can be provided.

[Example 3]
Next, Example 3 of the present invention will be described.

In Example 3, the block configuration of the digital broadcast receiving apparatus (FIG. 1) and the processing flows of the metadata processing unit, the display position detection unit, and the correction data calculation unit (FIGS. 6, 11, and 14) are basically the same as in Example 1.

Example 3 is another embodiment based on the cut-out screen configuration of Example 2. It is an example of a configuration in which, among the audio events of a program, narration, BGM, and the like are output from the front center speaker TFC regardless of the cut-out screen position.

FIG. 23 shows an example of the component descriptor values for the audio mode to which Example 3 applies. These are assumed to be described in addition to the component descriptor explained in Example 1 (FIG. 7) and the description of its video component (FIG. 8). In FIG. 23, the component content 0x02 indicates an audio component, and component types 0x00 to 0x09 indicate the audio modes that are currently standardized. For the purpose of explanation, Example 3 assigns 0x0A to 0x0F as component types for the audio formats of digital cinema and Super Hi-Vision, and defines component type 0x0F as an audio mode with audio channel type information. The audio mode with audio channel type information is a mode in which a specific audio channel is assigned to each audio event, such as a character's voice, narration, or BGM. It assumes that, in a 22.2ch system or the like, specific audio events are assigned to specific audio channels.

When the component type is the audio mode with audio channel type information, information on which audio event is transmitted on which audio channel must also be sent. As with the transmission of trimming information in Example 1, this can be transmitted using the extended format event descriptor inserted in the EIT or a similar table. FIG. 24 shows an example in which an audio event is added as one of the item_description_char entries in the extended format event descriptor, and the corresponding item_char indicates the audio channel for that audio event.

Example 3 can be realized by multiplexing and transmitting such metadata. The processing of the metadata for this audio component in the digital broadcast receiving apparatus of Example 3 is similar to the processing for the case with trimming information in Example 1 (FIG. 6), so its description is omitted here.

FIG. 25 shows an example of the audio output control unit 111 in Example 3. As in Example 2, the audio data for each of the front speakers TFL, TFC, and TFR during trimming viewing is generated based on the mixing gain data, gain correction data, and delay correction data calculated from the cut-out screen position. In addition, in Example 3, based on the audio channel information corresponding to the audio events transmitted in the extended format event descriptor, the narration and BGM audio channels (channel numbers 5 and 6 in FIG. 24) are mixed into TFC during trimming viewing (labeled "specific ch" in FIG. 25).
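
The extra mixing stage of Example 3 can be sketched as adding the designated channels onto the position-dependent TFC signal. The function name, the list-based sample representation, and the unity gain applied to the narration and BGM channels are assumptions; the text does not state the gain used for these channels.

```python
def mix_fixed_channels(position_mix, inputs, fixed_channels):
    """Add position-independent channels (e.g. narration, BGM) to TFC.

    position_mix:   TFC samples already derived from the cut-out position
    inputs:         dict of channel number -> list of samples
    fixed_channels: channel numbers signalled in the event metadata
    Assumption: the fixed channels are added at unity gain.
    """
    out = list(position_mix)
    for ch in fixed_channels:
        for i in range(len(out)):
            out[i] += inputs[ch][i]
    return out


tfc = mix_fixed_channels(
    [0.5, 0.25],
    {5: [1.0, 0.0], 6: [0.0, 1.0]},  # channels 5 and 6 as in FIG. 24
    [5, 6],
)
```

Because the fixed channels bypass the position-dependent source selection, they reach TFC unchanged wherever the cut-out screen is placed.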

According to this embodiment, in addition to the effects of Examples 1 and 2, audio that normally does not depend on the cut-out position, such as narration and BGM, can also be heard stably.

[Example 4]
Next, Example 4 of the present invention will be described.

To simplify the description, the examples so far have covered trimming viewing with no offset in the vertical direction of the screen (y-coordinate of 0). Example 4 briefly describes the case in which the screen is cut out with an offset in the vertical direction as well.

FIG. 26 shows the position of the cut-out viewing screen in Example 4. As in the previous examples, a part of the Super Hi-Vision screen 2501 is cut out and viewed at Hi-Vision resolution 2502, but Example 4 differs in that the cut-out position is off-center in both the x-axis and y-axis directions.

When the screen is cut out and viewed as in FIG. 26, the combination to be output to the front speakers TFL, TFC, and TFR for trimming viewing is determined from not only the middle-layer speakers of the 22.2ch system but also the upper-layer and lower-layer speakers.

This corresponds to steps S1118 to S1122 of the processing flow of the display position detection unit 107 (FIG. 11). In Examples 1 to 3, the combination was chosen only from the middle-layer FL, FLc, FC, FRc, and FR. In Example 4, the combination is chosen from a set that also includes the upper-layer TpFL, TpFC, and TpFR and the lower-layer BtFL, BtFC, and BtFR.

After the combination is determined, the processing in the correction data calculation unit 108 and in the audio output control unit 111 is basically the same as in Examples 1 to 3, except that the number of channels to be synthesized increases.

The specific configuration of the present invention has been described above with reference to several examples, but the scope of the present invention is not limited to these examples. For instance, the examples used the 22.2ch system as the audio mode before cut-out viewing and the 5.1ch system as the audio mode during cut-out viewing, but the present invention can also be applied to other combinations of audio modes.

The examples above described a configuration that receives broadcast waves through the antenna 101, but the present invention can also be applied when content (programs) is received over an IP network such as the Internet. In that case as well, the processing that detects the display position and controls the multi-channel audio output based on that position and size is the same.

Furthermore, the examples above described two ways of specifying the cut-out position of the screen: specification by metadata sent from the broadcasting station, and specification by the user through a function of the receiving apparatus. The present invention can also be applied when the broadcasting station sends an application that runs on the receiving apparatus and that application controls the cut-out position.

FIG. 1: Block diagram showing the configuration of the digital broadcast receiving apparatus according to the present invention
FIG. 2: Block diagram showing the configuration of the correction data calculation unit according to the present invention
FIG. 3: Diagram showing the cut-out screen position in Example 1
FIG. 4: Block diagram showing the configuration of the audio output control unit according to Example 1
FIG. 5: Diagram showing the speaker arrangement of the 22.2ch audio system
FIG. 6: Flowchart explaining the metadata processing according to the present invention
FIG. 7: Diagram showing the data structure of the component descriptor used in digital broadcasting
FIG. 8: Example of video component types according to the present invention
FIG. 9: Diagram showing the data structure of the extended format event descriptor used in digital broadcasting
FIG. 10: Example of trimming information according to Example 1
FIG. 11: Flowchart explaining the display position detection processing according to the present invention
FIG. 12: Diagram showing the relationship between the cut-out screen position and the speaker positions according to Example 1
FIG. 13: Diagram showing the relationship between the cut-out screen position and the speaker positions according to Example 1
FIG. 14: Flowchart explaining the correction data calculation processing according to the present invention
FIG. 15: Diagram explaining the synthesis of the audio data output to TFL (front left speaker of the cut-out screen) according to Example 1
FIG. 16: Diagram explaining the synthesis of the audio data output to TFL (front left speaker of the cut-out screen) according to Example 1 (another example)
FIG. 17: Diagram explaining the distance relationship between TFL (front left speaker of the cut-out screen) and FL (front left speaker of the original screen) according to Example 1
FIG. 18: Diagram showing the cut-out screen position in Example 2
FIG. 19: Diagram explaining the synthesis of the audio data output to TFL (front left speaker of the cut-out screen) according to Example 2
FIG. 20: Diagram explaining the calculation of the angle to each speaker at the viewing position according to the present invention
FIG. 21: Diagram explaining the synthesis of the audio data output to TFC (front center speaker of the cut-out screen) according to Example 2
FIG. 22: Block diagram showing the configuration of the audio output control unit according to Example 2
FIG. 23: Example of audio component types according to Example 3
FIG. 24: Example of audio event information according to Example 3
FIG. 25: Block diagram showing the configuration of the audio output control unit according to Example 3
FIG. 26: Diagram showing the cut-out screen position in Example 4

Explanation of symbols

106 Metadata processing unit
107 Display position detection unit
108 Correction data calculation unit
111 Audio output control unit
201 Angle analysis unit
202 Mixing gain data calculation unit
203 Distance analysis unit
204 Gain correction data calculation unit
205 Delay amount correction data calculation unit
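
As a rough illustration of how the units 201 to 205 listed above could relate to one another, the following Python sketch derives a mixing gain from speaker angles and a gain and delay from distances. Everything in it is an assumption made for illustration: the constant-power panning law, the inverse-distance gain rule, the speed of sound, and the names `CorrectionData`, `mixing_gains`, and `correction_data` do not come from this publication.

```python
import math
from dataclasses import dataclass
from typing import Tuple

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees C


@dataclass
class CorrectionData:
    mix_gains: Tuple[float, float]  # weights for two neighbouring input channels
    gain: float                     # level correction for the output channel
    delay_s: float                  # delay correction in seconds


def mixing_gains(angle_a: float, angle_b: float, angle_target: float) -> Tuple[float, float]:
    """Assumed constant-power pan between input speakers at angle_a and angle_b."""
    if angle_a == angle_b:
        raise ValueError("input speaker angles must differ")
    # Fraction of the way from speaker A to speaker B, clamped to [0, 1].
    t = (angle_target - angle_a) / (angle_b - angle_a)
    t = min(max(t, 0.0), 1.0)
    return (math.cos(t * math.pi / 2), math.sin(t * math.pi / 2))


def correction_data(angle_a: float, angle_b: float, angle_target: float,
                    target_dist: float, speaker_dist: float) -> CorrectionData:
    """Hypothetical combination of angle analysis and distance analysis.

    target_dist  : distance from which the sound should appear to arrive
    speaker_dist : actual distance from the viewing position to the output speaker
    """
    if target_dist <= 0 or speaker_dist <= 0:
        raise ValueError("distances must be positive")
    gains = mixing_gains(angle_a, angle_b, angle_target)
    # Assumed inverse-distance law: a farther apparent source is quieter.
    gain = speaker_dist / target_dist
    # Only extra path length is delayed; a negative delay cannot be applied.
    delay_s = max(target_dist - speaker_dist, 0.0) / SPEED_OF_SOUND_M_S
    return CorrectionData(gains, gain, delay_s)
```

For example, `correction_data(-30.0, 30.0, 0.0, target_dist=4.0, speaker_dist=2.0)` weights both input channels equally, halves the level, and delays the output by the time sound needs to travel the additional 2 m.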

Claims

1. A receiving apparatus comprising:
receiving means for receiving a first video accompanied by a plurality of channels of audio;
video output control means for cutting out a part of the first video and outputting a second video;
audio output control means for generating output audio to be output together with the second video; and
determining means for determining, based on the cut-out position of the second video, a combination of input audio used for synthesizing the output audio from among a plurality of channels of input audio that is the audio of the first video,
wherein the audio output control means generates the output audio by synthesizing the combination of input audio determined by the determining means.

2. The receiving apparatus according to claim 1, further comprising correction data calculating means for calculating correction data for correcting the sound image at the viewing position, based on the positional relationship, determined according to the cut-out position of the second video, among the speaker position of the input audio, the speaker position of the output audio, and the viewing position of the second video,
wherein the audio output control means generates the output audio using the correction data.

3. The receiving apparatus according to claim 2, wherein the correction data includes, as data for correcting the direction of arrival of sound at the viewing position, data for determining a mixing ratio of the input audio used to generate the output audio.

4. The receiving apparatus according to claim 2, wherein the correction data includes, as data for correcting the volume at the viewing position, data for determining a gain to be applied to the output audio.

5. The receiving apparatus according to claim 2, wherein the correction data includes, as data for correcting the distance to the sound source at the viewing position, data for determining a delay amount to be applied to the output audio.

6. The receiving apparatus according to Item 1, wherein, when a predetermined type of input audio is included in the plurality of channels of input audio, the audio output control means synthesizes the predetermined type of input audio into the output audio of a predetermined channel regardless of the cut-out position of the second video.

7. A control method for a receiving apparatus, comprising:
receiving a first video accompanied by a plurality of channels of audio;
cutting out a part of the first video and outputting a second video; and
generating output audio to be output together with the second video,
wherein generating the output audio comprises:
determining, based on the cut-out position of the second video, a combination of input audio used for synthesizing the output audio from among a plurality of channels of input audio that is the audio of the first video; and
synthesizing the determined combination of input audio to generate the output audio.
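
The determining and synthesizing steps recited above can be pictured with a minimal Python sketch. It is not the claimed implementation: it assumes three front input speakers (FL, FC, FR) at normalized horizontal positions across the original screen, three output speakers (TFL, TFC, TFR) at the left edge, center, and right edge of the cut-out, and simple linear weighting between the two nearest input speakers. The names FC, FR, and TFR, the positions, and the functions `choose_inputs` and `synthesize` are hypothetical.

```python
from bisect import bisect_right

# Assumed horizontal positions of the front input speakers across the
# original screen (0.0 = left edge, 1.0 = right edge).
INPUT_SPEAKERS = [("FL", 0.0), ("FC", 0.5), ("FR", 1.0)]


def choose_inputs(x):
    """Determine which input channels feed an output speaker at position x.

    Returns [(channel, weight), (channel, weight)] for the two input
    speakers that bracket x, with weights that sum to 1.
    """
    xs = [pos for _, pos in INPUT_SPEAKERS]
    x = min(max(x, xs[0]), xs[-1])          # clamp to the original screen
    i = bisect_right(xs, x)                 # index of the right-hand neighbour
    i = min(max(i, 1), len(xs) - 1)
    (left, xl), (right, xr) = INPUT_SPEAKERS[i - 1], INPUT_SPEAKERS[i]
    w = (x - xl) / (xr - xl)
    return [(left, 1.0 - w), (right, w)]


def synthesize(cutout, frame):
    """Mix one frame of input samples for a cut-out spanning (x0, x1).

    frame maps each input channel name to its sample value.
    """
    x0, x1 = cutout
    targets = {"TFL": x0, "TFC": (x0 + x1) / 2, "TFR": x1}
    return {
        name: sum(weight * frame[ch] for ch, weight in choose_inputs(x))
        for name, x in targets.items()
    }
```

With a cut-out covering the left half of the original screen, `synthesize((0.0, 0.5), frame)` feeds TFL from FL alone, TFC from an equal mix of FL and FC, and TFR from FC alone, so the selected combination of input channels follows the cut-out position.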