HK1164003B - Information processing device, information processing method, playback device, playback method, and recording medium

Publication number: HK1164003B (also published as HK1164003A1)
Application number: HK12104573.3A
Authority: HK (Hong Kong)
Other languages: Chinese (zh)
Inventor: Shinobu Hattori
Applicant: Sony Corporation
Priority claimed from: JP2010065112A (external priority; also published as JP2010263615A)


Description

Information processing apparatus, information processing method, playback apparatus, playback method, and recording medium
Technical Field
The present invention relates to an information processing device, an information processing method, a playback device, a playback method, and a recording medium, and more particularly to an information processing device, an information processing method, a playback device, a playback method, and a recording medium that enable the same information to be added to corresponding pictures of a basic stream and an extended stream used for displaying a 3D image.
Background
Two-dimensional image content is the mainstream of content such as movies, but recently, stereoscopic image content that enables stereoscopic viewing has been attracting attention.
A dedicated device is necessary for displaying a stereoscopic image; one example of such a device is the IP (Integral Photography) stereoscopic image system developed by NHK (Japan Broadcasting Corporation).
The image data of a stereoscopic image includes image data of multiple viewpoints (image data of images shot from multiple viewpoints). The greater the number of viewpoints and the wider the range they cover, the more directions from which the subject can be viewed, realizing a sort of "television into which depth can be seen".
Among stereoscopic images, the one with the fewest viewpoints is the two-viewpoint stereoscopic image (also referred to as a 3D image). The image data of such a stereoscopic image includes data of a left image, which is the image observed by the left eye, and data of a right image, which is the image observed by the right eye.
On the other hand, high-resolution image content has a large data volume, and therefore a large-capacity recording medium is necessary for recording such large-data-volume content.
An example of such a large-capacity recording medium is the Blu-ray (registered trademark) Disc (hereinafter also referred to as BD), such as a BD-ROM (Read Only Memory).
Reference list
Patent document
PTL 1: Japanese Unexamined Patent Application Publication No. 2005-348314
Disclosure of Invention
Technical problem
Incidentally, the BD standard does not specify how image data of a stereoscopic image such as a 3D image is to be recorded into a BD or how it is to be played back.
For example, the image data of a stereoscopic image consists of two data streams: a data stream of the left image and a data stream of the right image. The corresponding left and right images therefore need to be identified and decoded. If the corresponding left and right images can be identified at the decoder level using information included in those streams themselves, rather than information managed by a higher-layer system, the processing can be performed efficiently.
The present invention has been made in consideration of such a situation, and enables the same information to be added to pictures corresponding to a basic stream and an extended stream for displaying a 3D image.
Means for solving the problems
An information processing apparatus according to a first aspect of the present invention includes: generating means configured to generate first additional information representing an output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method, and second additional information representing an output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream, so that the information added to corresponding pictures of the basic stream and the extended stream represents the same timing; and encoding means configured to generate data of pictures of the basic stream and the extended stream by encoding the video stream, add the first additional information to the data of each picture of the basic stream, and add the second additional information to the data of each picture of the extended stream.
An information processing method according to a first aspect of the present invention includes the steps of: generating first additional information representing an output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method, and second additional information representing an output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream, so that information to be added to pictures corresponding to the basic stream and the extended stream represents the same timing; and generating data of pictures of the basic stream and the extended stream by encoding a video stream, adding the first additional information to the data of each picture of the basic stream, and adding the second additional information to the data of each picture of the extended stream.
A playback apparatus according to a second aspect of the present invention includes: an acquisition means configured to acquire a basic stream which is obtained by encoding a video stream by a predetermined encoding method and to which first additional information indicating an output timing of each picture is added to data of each picture, and an extended stream which is used for displaying a 3D image together with the basic stream and to which second additional information indicating an output timing of each picture is added to data of each picture; and decoding means configured to decode data of pictures corresponding to the basic stream and the extended stream at the same timing in accordance with timings indicated by the first additional information and the second additional information, and to output pictures obtained by the decoding at the same timing in accordance with timings indicated by the first additional information and the second additional information.
The decoding means may calculate a value representing the display order of pictures obtained by decoding the basic stream, output the picture of the basic stream for which the highest value representing the display order has been calculated, in accordance with the timing represented by the first additional information, and also output the corresponding picture of the extended stream at the same timing.
The first additional information and the second additional information may be SEIs conforming to the H.264/AVC standard.
The playback method according to the second aspect of the present invention includes the steps of: acquiring a basic stream and an extended stream, the basic stream being obtained by encoding a video stream by a predetermined encoding method and to the data of each picture of which first additional information indicating the output timing of each picture is added, the extended stream being used for displaying a 3D image together with the basic stream and to the data of each picture of which second additional information indicating the output timing of each picture is added; and decoding data of pictures corresponding to the basic stream and the extended stream at the same timing according to timings indicated by the first additional information and the second additional information, and outputting pictures obtained by the decoding at the same timing according to timings indicated by the first additional information and the second additional information.
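As a rough illustration of the decoding and output steps just described, the following Python sketch pairs base view and dependent view pictures by the output timing carried in their additional information, decodes the base view picture first, and outputs both at that timing. The Picture type and the decoder object are illustrative placeholders, not part of the standard or of the embodiments described later.

```python
# Minimal sketch, not the actual player implementation: corresponding pictures
# of the basic stream and the extended stream carry additional information
# indicating the same output timing, so they can be paired by that value.
from dataclasses import dataclass

@dataclass
class Picture:
    data: bytes
    output_time: int   # timing carried by the first/second additional information

def play_3d(base_pictures, dependent_pictures, decoder):
    dep_by_time = {p.output_time: p for p in dependent_pictures}
    for base_pic in base_pictures:
        t = base_pic.output_time
        dep_pic = dep_by_time.get(t)   # the same value marks the corresponding picture
        if dep_pic is None:
            continue
        # The base view picture is decoded first because the dependent view
        # picture may use it as a reference.
        decoded_base = decoder.decode(base_pic)
        decoded_dep = decoder.decode(dep_pic, reference=decoded_base)
        # Output both pictures at the same timing t.
        decoder.output(decoded_base, decoded_dep, at=t)
```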
In a recording medium according to a third aspect of the present invention, first additional information indicating the output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method is added to the data of each picture of the basic stream, second additional information indicating the output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream is added to the data of each picture of the extended stream, and the streams are recorded such that the information added to corresponding pictures of the basic stream and the extended stream represents the same timing. When the recording medium is mounted on a playback device, the basic stream and the extended stream are read, and the data of corresponding pictures of the basic stream and the extended stream are decoded at the same timing in accordance with the timings indicated by the first additional information and the second additional information, whereby the recording medium is played by the playback device.
An information processing apparatus according to a fourth aspect of the present invention includes: generating means configured to generate first constraint information, second constraint information, and third constraint information, wherein the first constraint information relates to a processing constraint at the time of decoding a basic stream obtained by encoding a video stream by a predetermined encoding method, the second constraint information relates to a processing constraint at the time of decoding an extended stream to be used for displaying a 3D image together with the basic stream, and the third constraint information relates to a processing constraint at the time of decoding the basic stream and the extended stream; and encoding means configured to generate data of pictures of the basic stream and the extended stream by encoding the video stream, add the first constraint information to the data of each picture of the basic stream, and add the second constraint information and the third constraint information to the data of each picture of the extended stream.
An information processing method according to a fourth aspect of the present invention includes the steps of: generating first constraint information, second constraint information, and third constraint information, wherein the first constraint information relates to a processing constraint when decoding a basic stream obtained by encoding a video stream with a predetermined encoding method, the second constraint information relates to a processing constraint when decoding an extended stream to be used for displaying a 3D image together with the basic stream, and the third constraint information relates to a processing constraint when decoding the basic stream and the extended stream; and generating data of pictures of the basic stream and the extended stream by encoding a video stream, adding the first constraint information to the data of each picture of the basic stream, and adding the second constraint information and the third constraint information to the data of each picture of the extended stream.
A playback apparatus according to a fifth aspect of the present invention includes: acquiring means configured to acquire only a basic stream or a basic stream and an extended stream, of the basic stream and the extended stream, wherein the basic stream is obtained by encoding a video stream using a predetermined encoding method, and first constraint information relating to a processing constraint at the time of decoding the basic stream is added to data of each picture of the basic stream, wherein the extended stream is to be used for displaying a 3D image together with the basic stream, and second constraint information relating to a processing constraint at the time of decoding the extended stream is added to data of each picture of the extended stream, and third constraint information relating to a processing constraint at the time of decoding the basic stream and the extended stream is also added to data of each picture of the extended stream; and decoding means configured to decode the stream obtained by the obtaining means in accordance with a constraint represented by information added to data of each picture of the stream obtained by the obtaining means, from among the first to third constraint information.
If the obtaining means obtains only the basic stream, the decoding means may decode the basic stream in accordance with the constraint indicated by the first constraint information.
The decoding means may include a single decoder; in that case, if the obtaining means has obtained the basic stream and the extended stream, the decoding means decodes the basic stream and the extended stream in accordance with the constraint indicated by the third constraint information.
The decoding means may include a decoder for the basic stream and a decoder for the extended stream; in that case, if the obtaining means has obtained the basic stream and the extended stream, the decoding means decodes the basic stream with the decoder for the basic stream in accordance with the constraint indicated by the first constraint information, and decodes the extended stream with the decoder for the extended stream in accordance with the constraint indicated by the second constraint information.
Each of the first to third constraint information may include rate information representing a maximum bit rate of data to be input into the decoder, and picture number information representing a maximum number of pictures that can be stored in a buffer for storing data of decoded pictures.
The rate information may be hrd_parameters specified by H.264/AVC, and the picture number information may be max_dec_frame_buffering specified by H.264/AVC.
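Purely as an illustration of the constraint selection described above, the sketch below shows how a player might choose which constraint information to follow, depending on whether only the basic stream or both streams were obtained and on whether a single decoder or separate decoders are used. The attribute names (first_constraint and so on) are hypothetical.

```python
def select_constraints(base_stream, dependent_stream=None, single_decoder=True):
    """Return the constraint information (e.g. hrd_parameters and
    max_dec_frame_buffering) that should govern decoding."""
    if dependent_stream is None:
        # Only the basic stream was obtained: follow the first constraint information.
        return [base_stream.first_constraint]
    if single_decoder:
        # One decoder decodes both streams: follow the third constraint information,
        # which covers the basic stream and the extended stream together.
        return [dependent_stream.third_constraint]
    # Separate decoders: the basic-stream decoder follows the first constraint
    # information and the extended-stream decoder follows the second.
    return [base_stream.first_constraint, dependent_stream.second_constraint]
```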
The playback method according to the fifth aspect of the present invention includes the steps of: obtaining only a basic stream or a basic stream and an extended stream of a basic stream and an extended stream, wherein the basic stream is obtained by encoding a video stream using a predetermined encoding method, and first constraint information relating to a processing constraint at the time of decoding the basic stream is added to data of each picture of the basic stream, wherein the extended stream is to be used for displaying a 3D image together with the basic stream, and second constraint information relating to a processing constraint at the time of decoding the extended stream is added to data of each picture of the extended stream, and third constraint information relating to a processing constraint at the time of decoding the basic stream and the extended stream is also added to data of each picture of the extended stream; and decoding the obtained stream in accordance with a constraint indicated by information added to data of each picture of the obtained stream among the first to third constraint information.
In a recording medium according to a sixth aspect of the present invention, first constraint information relating to a processing constraint for decoding a basic stream obtained by encoding a video stream by a predetermined encoding method is added to the data of each picture of the basic stream, second constraint information relating to a processing constraint for decoding an extended stream to be used for displaying a 3D image together with the basic stream, and third constraint information relating to a processing constraint for decoding the basic stream and the extended stream, are added to the data of each picture of the extended stream, and these streams are recorded. When the recording medium is mounted on a playback device, only the basic stream, or the basic stream and the extended stream, is read, and the read stream is decoded in accordance with the constraint represented by whichever of the first to third constraint information is added to the data of each picture of the read stream, whereby the recording medium is played by the playback device.
In the first aspect of the present invention, first additional information indicating the output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method, and second additional information indicating the output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream, are generated so that the information added to corresponding pictures of the basic stream and the extended stream indicates the same timing. In addition, data of pictures of the basic stream and the extended stream are generated by encoding the video stream, the first additional information is added to the data of each picture of the basic stream, and the second additional information is added to the data of each picture of the extended stream.
In the second aspect of the present invention, a basic stream, which is obtained by encoding a video stream by a predetermined encoding method and to the data of each picture of which first additional information indicating the output timing of each picture is added, and an extended stream, which is to be used for displaying a 3D image together with the basic stream and to the data of each picture of which second additional information indicating the output timing of each picture is added, are obtained. In addition, the data of corresponding pictures of the basic stream and the extended stream are decoded at the same timing in accordance with the timings represented by the first additional information and the second additional information, and the pictures obtained by the decoding are output at the same timing in accordance with those timings.
In a fourth aspect of the present invention, first constraint information, which relates to a processing constraint at the time of decoding a basic stream obtained by encoding a video stream by a predetermined encoding method, second constraint information, which relates to a processing constraint at the time of decoding an extended stream to be used for display of a 3D image together with the basic stream, and third constraint information, which relates to a processing constraint at the time of decoding the basic stream and the extended stream, are generated. In addition, data of pictures of the basic stream and the extended stream are generated by encoding a video stream, the first constraint information is added to the data of each picture of the basic stream, and the second constraint information and the third constraint information are added to the data of each picture of the extended stream.
In a fifth aspect of the present invention, only a basic stream, or a basic stream and an extended stream, are obtained, wherein the basic stream is obtained by encoding a video stream using a predetermined encoding method, and first constraint information relating to a processing constraint at the time of decoding the basic stream is added to data of each picture of the basic stream, and wherein the extended stream is to be used for displaying a 3D image together with the basic stream, and second constraint information relating to a processing constraint at the time of decoding the extended stream is added to data of each picture of the extended stream, and third constraint information relating to a processing constraint at the time of decoding the basic stream and the extended stream is also added to data of each picture of the extended stream. In addition, the obtained stream is decoded in accordance with the constraint indicated by the information added to the data of each picture of the obtained stream among the first to third constraint information.
Advantageous Effects of the Invention
According to the present invention, the same information may be added to pictures corresponding to a basic stream and an extended stream for displaying a 3D image.
Drawings
Fig. 1 shows a configuration example of a playback system including a playback device to which the present invention is applied.
Fig. 2 shows a shooting example.
Fig. 3 is a block diagram showing a configuration example of an MVC encoder.
Fig. 4 shows an example of a reference image.
Fig. 5 shows a configuration example of the TS.
Fig. 6 shows another configuration example of the TS.
Fig. 7 shows still another configuration example of the TS.
Fig. 8 shows an example of AV stream management.
Fig. 9 shows the structure of the Main Path (Main Path) and the Sub Path (Sub Path).
Fig. 10 shows an example of a management structure of files to be recorded into an optical disc.
Fig. 11 shows the syntax of a PlayList (play list) file.
Fig. 12 shows an example of how reserved_for_future_use in fig. 11 is used.
Fig. 13 shows the meaning of the value of 3D_PL_type.
Fig. 14 shows the meaning of the value of view_type.
Fig. 15 shows the syntax of PlayList() in fig. 11.
Fig. 16 shows the syntax of SubPath() in fig. 15.
Fig. 17 shows the syntax of SubPlayItem(i) in fig. 16.
Fig. 18 shows the syntax of PlayItem() in fig. 15.
Fig. 19 shows the syntax of STN_table() in fig. 18.
Fig. 20 shows a configuration example of a playback device.
Fig. 21 shows a configuration example of the decoder unit in fig. 20.
Fig. 22 shows a configuration for performing video stream processing.
Fig. 23 shows a configuration for performing video stream processing.
Fig. 24 shows another configuration for performing video stream processing.
Fig. 25 shows an example of an Access Unit.
Fig. 26 shows yet another configuration for performing video stream processing.
Fig. 27 shows the configuration of the composition unit and its preceding stage.
Fig. 28 is another diagram showing the configuration of the composition unit and its preceding stage.
Fig. 29 is a block diagram showing a configuration example of the software assembling processing unit.
Fig. 30 shows an example of respective configurations including a software assembly processing unit.
Fig. 31 shows a configuration example of a 3D video TS generating unit to be provided to a recording apparatus.
Fig. 32 shows another configuration example of a 3D video TS generating unit to be provided to a recording apparatus.
Fig. 33 shows still another configuration example of the 3D video TS generating unit to be provided to the recording apparatus.
Fig. 34 shows a configuration on the playback device side for decoding access units.
Fig. 35 shows the decoding process.
Fig. 36 shows a closed GOP structure.
Fig. 37 shows an open GOP structure.
Fig. 38 shows the maximum number of frames/fields within one GOP.
Fig. 39 shows a closed GOP structure.
Fig. 40 shows an open GOP structure.
Fig. 41 shows an example of the decoding start position set to EP_map.
Fig. 42 illustrates a problem caused when a GOP structure of Dependent view video is not defined.
Fig. 43 shows the concept of picture search.
Fig. 44 shows the structure of an AV stream recorded on an optical disc.
Fig. 45 shows an example of a Clip AV stream.
Fig. 46 conceptually shows the EP_map corresponding to the Clip AV stream in fig. 45.
Fig. 47 shows an example of the data structure of a source packet indicated by SPN_EP_start.
Fig. 48 shows a sub-table included in EP_map.
Fig. 49 shows an example of the formats of the entry PTS_EP_coarse and the entry PTS_EP_fine.
Fig. 50 shows an example of the formats of the entry SPN_EP_coarse and the entry SPN_EP_fine.
Fig. 51 shows the configuration of the access unit.
Fig. 52 is a block diagram showing a configuration example of the recording apparatus.
Fig. 53 is a block diagram showing a configuration example of the MVC encoder in fig. 52.
Fig. 54 is a flowchart describing a recording process of the recording apparatus.
Fig. 55 is a flowchart describing the encoding process performed in step S2 of fig. 54.
Fig. 56 is a block diagram showing a configuration example of a playback device.
Fig. 57 is a block diagram showing a configuration example of the MVC decoder in fig. 56.
Fig. 58 is a flowchart describing playback processing of the playback device.
Fig. 59 is a flowchart describing the decoding process performed in step S32 in fig. 58.
Fig. 60 is a flowchart, continued from fig. 59, describing the decoding process performed in step S32 in fig. 58.
Fig. 61 is a flowchart describing the random access playback processing by the playback unit.
Fig. 62 shows the states of the base view video stream and the dependent view video stream.
Fig. 63 shows an example of encoding positions of HRD parameters of the base view video stream.
Fig. 64 shows a description format when HRD parameters are encoded at the positions shown in fig. 63.
Fig. 65 shows an example of the encoding position of max_dec_frame_buffering of the base view video stream.
Fig. 66 shows the description format when max_dec_frame_buffering is encoded at the position shown in fig. 65.
Fig. 67 shows an example of encoding positions of HRD parameters of the dependent view video stream.
Fig. 68 shows a description format when HRD parameters are encoded at the positions shown in fig. 67.
Fig. 69 shows another description format when HRD parameters are encoded at the positions shown in fig. 67.
Fig. 70 shows an example of the encoding position of max_dec_frame_buffering of the dependent view video stream.
Fig. 71 shows a description format when max_dec_frame_buffering is encoded at the position shown in fig. 70.
Fig. 72 shows another description format when max_dec_frame_buffering is set at the position shown in fig. 70.
Fig. 73 is a flowchart describing a recording process of the recording apparatus.
Fig. 74 is a flowchart describing the playback processing of the playback device.
Fig. 75 shows one setting example of the parameters.
Fig. 76 shows another setting example of the parameters.
Fig. 77 is a block diagram showing another configuration example of the MVC decoder.
Fig. 78 shows still another setting example of the parameters.
Fig. 79 shows one setting example of the parameters.
Fig. 80 shows another setting example of the parameters.
Fig. 81 shows still another setting example of the parameters.
Fig. 82 shows an authentication apparatus.
Fig. 83 shows a functional configuration of the HRD.
Fig. 84 shows an example of authentication.
Fig. 85 shows another example of authentication.
Fig. 86 shows a description example of view_type.
Fig. 87 shows another description example of view_type.
Fig. 88 is a block diagram showing a configuration example of hardware of a computer.
Detailed Description
< first embodiment >
[ configuration example of playback system ]
Fig. 1 shows a configuration example of a playback system including a playback device 1 to which the present invention is applied.
As shown in fig. 1, the playback system includes a playback device 1 and a display device 3 connected by an HDMI (high definition multimedia interface) cable or the like. An optical disc 2 such as a BD is mounted to the playback device 1.
Streams necessary for displaying a stereoscopic image (also referred to as a 3D image) with the number of viewpoints being 2 are recorded in the optical disc 2.
The playback device 1 is a player compatible with 3D playback of streams recorded in the optical disc 2. The playback device 1 plays the stream recorded in the optical disc 2, and displays a 3D image obtained by playback on a display device 3 constituted by a television receiver or the like. The playback device 1 plays audio in the same manner, and outputs from a speaker or the like provided to the display device 3.
Various methods have been proposed as a 3D image display method. Now, the following type-1 display method and type-2 display method will be adopted as the 3D image display method.
The type-1 display method is a method in which the data of a 3D image includes data of an image viewed by the left eye (L image) and data of an image viewed by the right eye (R image), and the 3D image is displayed by alternately displaying the L image and the R image.
The type-2 display method is a method of displaying a 3D image by displaying an L image and an R image generated using data of an original image, which is an image serving as the source for generating the 3D image, and Depth data. The data of a 3D image to be used by the type-2 display method includes the data of the original image and the Depth data from which the L image and the R image can be generated.
The type-1 display method requires glasses for viewing and listening, whereas the type-2 display method allows a 3D image to be viewed and listened to without glasses.
The stream is recorded into the optical disc 2 so that a 3D image can be displayed using one of a type-1 display method and a type-2 display method.
For example, the H.264 AVC (Advanced Video Coding)/MVC (Multi-view Video Coding) profile standard is used as a coding method for recording such a stream in the optical disc 2.
[ H.264 AVC/MVC Profile ]
In the H.264 AVC/MVC profile standard, an image stream called base view video and an image stream called dependent view video are defined. Hereinafter, the H.264 AVC/MVC profile standard is referred to simply as MVC, as appropriate.
Fig. 2 shows a shooting example.
As shown in fig. 2, shooting is performed with a camera for L images and a camera for R images, taking the same object as the subject. The elementary streams of the video captured by the camera for L images and the camera for R images are input to the MVC encoder.
Fig. 3 is a block diagram showing a configuration of an MVC encoder.
As shown in fig. 3, the MVC encoder 11 includes an H.264/AVC encoder 21, an H.264/AVC decoder 22, a Depth calculation unit 23, a dependent view video encoder 24, and a multiplexer 25.
The stream of video #1 shot by the camera for L images is input to the H.264/AVC encoder 21 and the Depth calculation unit 23. In addition, the stream of video #2 shot by the camera for R images is input to the Depth calculation unit 23 and the dependent view video encoder 24. Another arrangement may be made in which the stream of video #2 is input to the H.264/AVC encoder 21 and the Depth calculation unit 23, and the stream of video #1 is input to the Depth calculation unit 23 and the dependent view video encoder 24.
The H.264/AVC encoder 21 encodes the stream of video #1 into, for example, an H.264 AVC/High Profile video stream. The H.264/AVC encoder 21 outputs the AVC video stream obtained by the encoding to the H.264/AVC decoder 22 and the multiplexer 25 as the base view video stream.
The H.264/AVC decoder 22 decodes the AVC video stream supplied from the H.264/AVC encoder 21, and outputs the stream of video #1 obtained by the decoding to the dependent view video encoder 24.
The Depth calculation unit 23 calculates Depth based on the stream of video #1 and the stream of video #2, and outputs the calculated Depth data to the multiplexer 25.
The dependent view video encoder 24 encodes the stream of video #1 supplied from the H.264/AVC decoder 22 and the stream of the externally input video #2, and outputs the dependent view video stream.
For base view video, predictive coding using another stream as a reference picture is not allowed, but for dependent view video, predictive coding using base view video as a reference picture is allowed, as shown in fig. 4. For example, when encoding is performed with the L image as the base view video and the R image as the dependent view video, the data amount of the dependent view video stream obtained as a result of the encoding is smaller than that of the base view video stream.
Note that, in encoding conforming to H.264/AVC, prediction in the temporal direction is performed on the base view video. Also, for the dependent view video, prediction in the temporal direction is performed together with prediction between views. In order to decode the dependent view video, decoding of the corresponding base view video, which is taken as a reference target at the time of encoding, needs to be completed first.
The dependent view video encoder 24 outputs a dependent view video stream obtained by encoding with such inter-view prediction to the multiplexer 25.
The multiplexer 25 multiplexes the base view video stream supplied from the H.264/AVC encoder 21, the dependent view video stream (Depth data) supplied from the Depth calculation unit 23, and the dependent view video stream supplied from the dependent view video encoder 24 into, for example, an MPEG2 TS. The base view video stream and the dependent view video stream may be multiplexed into a single MPEG2 TS, or may be included in separate MPEG2 TSs.
The multiplexer 25 outputs the generated TS (MPEG2 TS). The TS output from the multiplexer 25 is recorded on the optical disc 2 at the recording device together with other management data, and is supplied to the playback device 1 in the form recorded on the optical disc 2.
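The wiring of the MVC encoder 11 described above (fig. 3) can be summarized, for illustration only, by the following sketch. The encoder, decoder, and multiplexer objects are placeholders; only the dataflow named in the text is shown.

```python
# Rough dataflow sketch of the MVC encoder in fig. 3; not an actual codec.
def mvc_encode(video1_l, video2_r, avc_encoder, avc_decoder,
               depth_calculator, dependent_encoder, multiplexer):
    # Video #1 (L image) is encoded by the H.264/AVC encoder into the
    # base view video stream.
    base_view_stream = avc_encoder.encode(video1_l)

    # The AVC stream is locally decoded and fed to the dependent view video
    # encoder together with video #2 (R image), which is encoded using
    # inter-view prediction (D1 view video).
    decoded_video1 = avc_decoder.decode(base_view_stream)
    d1_view_stream = dependent_encoder.encode(decoded_video1, video2_r)

    # Depth is calculated from video #1 and video #2 (D2 view video).
    d2_view_stream = depth_calculator.calculate(video1_l, video2_r)

    # All streams are multiplexed into MPEG2 TS (one TS or separate TSs).
    return multiplexer.multiplex(base_view_stream, d1_view_stream, d2_view_stream)
```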
If it is necessary to distinguish a dependent view video used together with a base view video in the type-1 display method from a dependent view video (Depth) used together with a base view video in the type-2 display method, the former will be referred to as a D1 view video and the latter will be referred to as a D2 view video.
In addition, 3D playback in the type-1 display method to be performed using the base view video and the D1 view video will be referred to as B-D1 playback. The 3D playback in the type-2 display method to be performed using the base view video and the D2 view video will be referred to as B-D2 playback.
If the B-D1 playback is performed according to an instruction of a user or the like, the playback device 1 reads and plays the base view video stream and the D1 view video stream from the optical disc 2.
Also, if B-D2 playback is performed, the playback device 1 reads and plays the base view video stream and the D2 view video stream from the optical disc 2.
Further, if playback of a normal 2D image is performed, the playback device 1 reads and plays only the base view video stream from the optical disc 2.
The base view video stream is an AVC video stream encoded by H.264/AVC, and therefore, as long as the playback device 1 is a player compatible with the BD format, the playback device 1 can play its base view video stream to display 2D images.
In the following, the case where the dependent view video is the D1 view video will be described in principle; references to simply "dependent view video" mean the D1 view video. The D2 view video is also recorded on the optical disc 2 and played in the same manner as the D1 view video.
[ configuration example of TS ]
Fig. 5 shows a configuration example of the TS.
The streams of each of the base view video, dependent view video, primary audio, base PG, dependent PG, base IG, and dependent IG are multiplexed into the main TS in fig. 5. As described above, the dependent view video stream may be included in the main TS together with the base view video stream.
The main TS and the sub TS are recorded in the optical disc 2. The main TS is a TS including at least a base view video stream. The sub TS is a TS including a stream other than the base view video stream to be used with the main TS.
For PG and IG, which will be described later, streams of a base view and a dependent view are each prepared so that display in 3D is available in the same manner as with video.
The base view planes of PG and IG obtained by decoding the respective streams are displayed by plane synthesis with the base view video obtained by decoding the base view video stream. Similarly, the dependent view planes of PG and IG are displayed by plane synthesis with the dependent view video obtained by decoding the dependent view video stream.
For example, if the base view video stream is a stream of L images and the dependent view video stream is a stream of R images, then for PG and IG as well, the base view streams are graphics streams of L images, and the dependent view PG stream and IG stream are graphics streams of R images.
On the other hand, if the base view video stream is a stream of R images and the dependent view video stream is a stream of L images, then for PG and IG as well, the base view streams are graphics streams of R images, and the dependent view PG stream and IG stream are graphics streams of L images.
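As an illustration of the plane synthesis described above, a minimal sketch follows; compose() merely groups the planes, whereas a real player overlays the PG and IG planes on the video plane, and the mapping to L and R depends on which view the base view video stream carries.

```python
# Illustrative sketch only; plane objects are placeholders.
def compose(video_plane, pg_plane, ig_plane):
    # Placeholder plane synthesis: a real player overlays PG and IG on video.
    return (video_plane, pg_plane, ig_plane)

def build_lr_output(base_video, dep_video, base_pg, dep_pg, base_ig, dep_ig,
                    base_is_l_image=True):
    # Base view graphics planes are composed with the base view video plane,
    # dependent view graphics planes with the dependent view video plane.
    base_output = compose(base_video, base_pg, base_ig)
    dep_output = compose(dep_video, dep_pg, dep_ig)
    # Whether the base view carries the L image or the R image decides which
    # composed plane is presented to which eye.
    if base_is_l_image:
        return {"L": base_output, "R": dep_output}
    return {"L": dep_output, "R": base_output}
```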
Fig. 6 shows another configuration example of the TS.
The streams of each of the base view video and the dependent view video are multiplexed into the main TS in fig. 6.
On the other hand, the streams of each of the primary audio, the base PG, the dependent PG, the base IG, and the dependent IG are multiplexed into the sub TS.
Therefore, an arrangement may be made in which a video stream is multiplexed into the main TS and streams of PG, IG, and the like are multiplexed into the sub TS.
Fig. 7 shows still another configuration example of the TS.
The streams of each of the base view video, the primary audio, the base PG, the dependent PG, the base IG, and the dependent IG are multiplexed into the main TS in A in fig. 7.
On the other hand, the dependent view video stream is included in the sub TS.
Therefore, the dependent view video stream may be included in another TS different from the base view video stream.
The streams of each of the base view video, the primary audio, PG, and IG are multiplexed into the main TS in B in fig. 7. On the other hand, the streams of each of the dependent view video, the base PG, the dependent PG, the base IG, and the dependent IG are multiplexed into the sub TS.
PG and IG included in the main TS are streams for 2D playback. The stream included in the sub TS is a stream for 3D playback.
In this way, the PG streams and the IG streams may be arranged so as not to be shared between 2D playback and 3D playback.
As described above, the base view video stream and the dependent view video stream may be included in different MPEG2 TSs. The advantage of including and recording the base view video stream and the dependent view video stream in different MPEG2 TSs will now be described.
For example, consider a case where the bit rate allowed for multiplexing into a single MPEG2 TS is limited. In this case, when both the base view video stream and the dependent view video stream are included in a single MPEG2 TS, the bit rate of each stream needs to be lowered to satisfy that constraint. As a result, the image quality deteriorates.
By including the streams in different MPEG2 TSs, there is no need to reduce the bit rate, so deterioration of the image quality can be prevented.
[ Application format ]
Fig. 8 shows an example of AV stream management of the playback device 1.
AV stream management is performed using the two layers of PlayList and Clip shown in fig. 8. The AV stream may be recorded not only on the optical disc 2 but also in the local storage of the playback device 1.
Here, one pair including one AV stream and Clip information as information accompanying it is taken as one object and will be collectively referred to as a Clip. Herein, a file storing an AV stream will be referred to as an AV stream file. In addition, a file storing Clip information is also referred to as a Clip information file.
The AV streams are mapped on the time axis, and the access point of each Clip is principally specified by a time stamp in the PlayList. The Clip information file is used to find an address or the like within the AV stream at which decoding starts.
A PlayList is a set of playback sections (sections) of an AV stream. One playback section within the AV stream is called one PlayItem. The PlayItem is represented on a time axis by a pair of In-point (In point) and Out-point (Out point) of a playback section. As shown in fig. 8, a PlayList is composed of single or multiple playitems.
The first PlayList from the left side of fig. 8 includes two playitems, and the first half and the second half of the AV stream included in the Clip on the left side are respectively referenced by its two playitems.
The second PlayList from the left side includes one PlayItem, by which the entire AV stream included in the Clip on the right side is referenced.
The third PlayList from the left includes two playitems by which a portion of the AV stream included in the Clip on the left side and a portion of the AV stream included in the Clip on the right side are referenced.
For example, if a left PlayItem included in the first PlayList from the left side has been designated as a playback object by the disc navigation program, playback of the first half of the AV stream included in the left Clip, which is referenced by the PlayItem, is performed. Thus, the PlayList is used as playback management information for managing playback of the AV stream.
The playback Path created by the arrangement of one or more playitems within the PlayList will be referred to as a Main Path (Main Path).
In addition, a playback Path created by the arrangement of one or more subplayitems within the PlayList will be referred to as a Sub Path (Sub Path).
Fig. 9 shows the structure of the primary path and the secondary path.
One PlayList may have one main path and one or more sub paths.
The above-described base view video stream is managed as a stream referenced by playitems constituting the main path. In addition, the dependent view video stream is managed as a stream referenced by SubPlayItem constituting the sub path.
The PlayList in fig. 9 has one main path including an arrangement of three playitems and three sub paths.
IDs are set to the PlayItems constituting the main path in order from the beginning. IDs are likewise set to the sub paths in order from the beginning: Subpath_id=0, Subpath_id=1, and Subpath_id=2.
In the example in fig. 9, one SubPlayItem is included in the sub path whose Subpath_id is 0, two SubPlayItems are included in the sub path whose Subpath_id is 1, and one SubPlayItem is included in the sub path whose Subpath_id is 2.
A Clip AV stream referred to by one PlayItem includes at least one video stream (main image data).
In addition, the Clip AV stream may or may not include one or more audio streams to be played at the same timing (synchronization) as the video stream included in the Clip AV stream.
The Clip AV stream may or may not include one or more bitmap subtitle data (PG (presentation graphics)) streams to be played in synchronization with a video stream included in the Clip AV stream.
The Clip AV stream may or may not include one or more IG (interactive graphics) streams to be played in synchronization with a video stream included in the Clip AV stream file. The IG stream is used to display graphics such as buttons to be operated by the user.
In a Clip AV stream referred to by one PlayItem, a video stream, zero or more audio streams to be played in synchronization therewith, zero or more PG streams, and zero or more IG streams are multiplexed.
In addition, one SubPlayItem references a video stream, an audio stream, a PG stream, and the like, which are streams different from the Clip AV stream referenced by the PlayItem.
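The two-layer management described above can be summarized, purely for illustration, by the following simplified data model; the field sets are reduced and do not reflect the full format.

```python
# Illustrative data model of PlayList/Clip management (simplified).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Clip:
    av_stream_file: str        # e.g. "00001.m2ts"
    clip_info_file: str        # e.g. "00001.clpi", used to find decode addresses

@dataclass
class PlayItem:
    clip: Clip
    in_time: int               # start of the playback section (timestamp)
    out_time: int              # end of the playback section (timestamp)

@dataclass
class SubPlayItem:
    clip: Clip
    in_time: int
    out_time: int

@dataclass
class SubPath:
    sub_play_items: List[SubPlayItem] = field(default_factory=list)

@dataclass
class PlayList:
    main_path: List[PlayItem] = field(default_factory=list)   # Main Path
    sub_paths: List[SubPath] = field(default_factory=list)    # Sub Paths
```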
Management of AV streams using such PlayLists, PlayItems, and SubPlayItems is described in Japanese Unexamined Patent Application Publication No. 2008-252740 and Japanese Unexamined Patent Application Publication No. 2005-348314.
[ Directory structure ]
Fig. 10 shows an example of a management structure of files to be recorded on the optical disc 2.
As shown in fig. 10, files are managed through a directory structure in a hierarchical manner. A root directory is created on the optical disc 2. Under the root directory is a scope to be managed by one recording/playback system.
The BDMV directory is arranged under the root directory.
An Index file (Index file), which is a file with the name "Index.bdmv", is arranged under the BDMV directory.
A BACKUP directory, a PLAYLIST directory, a CLIPINF directory, a STREAM directory, etc. are also provided under the BDMV directory.
A PlayList file describing a PlayList is stored in the PLAYLIST directory. A name combining a 5-digit number and the extension ".mpls" is set to each PlayList file. The file name "00000.mpls" is set to the one PlayList file shown in fig. 10.
Clip information files are stored in the CLIPINF directory. A name combining a 5-digit number and the extension ".clpi" is set to each Clip information file.
The file names "00001.clpi", "00002.clpi", and "00003.clpi" are respectively set to the three Clip information files in fig. 10. Hereinafter, a Clip information file will be referred to as a clpi file as appropriate.
For example, the clpi file "00001.clpi" is a file in which information on the Clip of the base view video is described.
The clpi file "00002.clpi" is a file in which information on the Clip of the D2 view video is described.
The clpi file "00003.clpi" is a file in which information on the Clip of the D1 view video is described.
Stream files are stored in the STREAM directory. A name combining a 5-digit number with the extension ".m2ts", or a name combining a 5-digit number with the extension ".ilvt", is set to each stream file. Hereinafter, a file with the extension ".m2ts" will be referred to as an m2ts file as appropriate, and a file with the extension ".ilvt" will be referred to as an ilvt file.
The m2ts file "00001. m2 ts" is a file for 2D playback, and reading of the base view video stream is performed by specifying the file.
The m2ts file "00002. m2 ts" is a D2 view video stream file, and the m2ts file "00003. m2 ts" is a D1 view video stream file.
The ilvt file "10000. ilvt" is a file for playback of B-D1, and reading of the base view video stream and the D1 view video stream is performed by specifying the file.
The ilvt file "20000. ilvt" is a file for B-D2 playback, and reading of the base view video stream and the D2 view video stream is performed by specifying the file.
In addition to the directories shown in fig. 10, a directory storing audio stream files and the like is provided under the BDMV directory.
[ syntax of each data ]
Fig. 11 shows the syntax of the PlayList file.
The PlayList file is a file with the extension ".mpls" set, and is stored in the PLAYLIST directory in fig. 10.
type_indicator in fig. 11 represents the type of the "xxxxx.mpls" file.
version_number indicates the version number of the "xxxxx.mpls" file. version_number is composed of a 4-digit number. For example, "0240", representing the "3D specification version", is set for a PlayList file for 3D playback.
PlayList_start_address represents the start address of PlayList(), in units of relative bytes from the first byte of the PlayList file.
PlayListMark_start_address represents the start address of PlayListMark(), in units of relative bytes from the first byte of the PlayList file.
ExtensionData_start_address represents the start address of ExtensionData(), in units of relative bytes from the first byte of the PlayList file.
A 160-bit reserved_for_future_use is included after ExtensionData_start_address.
Parameters related to playback control of the PlayList, such as playback restrictions, are stored in AppInfoPlayList().
Parameters related to the main path, the sub path, and the like are stored in PlayList(). The content of PlayList() will be described later.
PlayList mark information, that is, information about a mark serving as a jump destination (jump point) of a user operation or of a command instructing a chapter jump or the like, is stored in PlayListMark().
Private data can be inserted into ExtensionData().
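Purely as an illustration of the header fields just listed, the sketch below reads the top of a PlayList file; the exact field widths (4-byte type_indicator and version_number, 32-bit big-endian addresses) are assumptions made for this sketch, not taken from the text.

```python
# Illustrative reader for the top-level fields of an .mpls file (fig. 11).
import struct

def read_mpls_header(path):
    with open(path, "rb") as f:
        type_indicator = f.read(4).decode("ascii")     # type of the file
        version_number = f.read(4).decode("ascii")     # e.g. "0240" for 3D
        playlist_start, playlistmark_start, extensiondata_start = \
            struct.unpack(">III", f.read(12))          # relative byte addresses
    return {
        "type_indicator": type_indicator,
        "version_number": version_number,
        "PlayList_start_address": playlist_start,
        "PlayListMark_start_address": playlistmark_start,
        "ExtensionData_start_address": extensiondata_start,
    }
```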
Fig. 12 shows a specific example of the description of the PlayList file.
As shown in fig. 12, a 2-bit 3D_PL_type and a 1-bit view_type are described in the PlayList file. view_type is described in, for example, AppInfoPlayList() in fig. 11.
3D_PL_type represents the type of the PlayList.
view_type indicates whether the base view video stream whose playback is managed by the PlayList is an L image (L view) stream or an R image (R view) stream.
Fig. 13 shows the meaning of the value of 3D_PL_type.
The value 00 of 3D_PL_type indicates that this is a PlayList for 2D playback.
The value 01 of 3D_PL_type indicates that this is a PlayList for B-D1 playback in 3D playback.
The value 10 of 3D_PL_type indicates that this is a PlayList for B-D2 playback in 3D playback.
For example, if the value of 3D_PL_type is 01 or 10, 3D PlayList information is registered in ExtensionData() of the PlayList file. For example, information related to reading the base view video stream and the dependent view video stream from the optical disc 2 is registered as the 3D PlayList information.
Fig. 14 shows the meaning of the value of view_type.
If 3D playback is performed, a value of 0 of view_type indicates that the base view video stream is an L view stream. If 2D playback is performed, a value of 0 of view_type indicates that the base view video stream is an AVC video stream.
A value of 1 of view_type indicates that the base view video stream is an R view stream.
The playback device 1 can identify whether the base view video stream is an L view stream or an R view stream from the view_type described in the PlayList file.
For example, if a video signal is output to the display device 3 via an HDMI cable, the playback device 1 may be required to distinguish the L view signal from the R view signal and then output each of them.
By being enabled to identify whether the base view video stream is an L view stream or an R view stream, the playback device 1 can distinguish and output the L view signal and the R view signal.
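The value tables of fig. 13 and fig. 14 can be transcribed directly, for illustration, as follows.

```python
# Direct transcription of the value tables in fig. 13 and fig. 14.
def describe_3d_pl_type(value):
    return {
        0b00: "PlayList for 2D playback",
        0b01: "PlayList for B-D1 playback (3D)",
        0b10: "PlayList for B-D2 playback (3D)",
    }.get(value, "reserved")

def base_view_is_l_view(view_type, playback_is_3d=True):
    # view_type == 0: L view stream for 3D playback (plain AVC stream for 2D);
    # view_type == 1: R view stream.
    if view_type == 0:
        return True if playback_is_3d else None   # None: 2D AVC video stream
    return False
```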
Fig. 15 shows the syntax of PlayList() in fig. 11.
length is a 32-bit unsigned integer indicating the number of bytes from the length field to the end of PlayList(). That is, length represents the number of bytes from reserved_for_future_use to the end of the PlayList.
A 16-bit reserved_for_future_use is prepared after length.
number_of_PlayItems is a 16-bit field indicating the number of PlayItems within the PlayList. In the case of the example in fig. 9, the number of PlayItems is 3. Values are assigned to PlayItem_id from 0 in the order in which PlayItem() appears in the PlayList. For example, PlayItem_id=0, 1, and 2 are given in fig. 9.
number_of_SubPaths is a 16-bit field indicating the number of sub paths within the PlayList. In the case of the example in fig. 9, the number of sub paths is 3. Values are assigned to SubPath_id from 0 in the order in which SubPath() appears in the PlayList. For example, SubPath_id=0, 1, and 2 are given in fig. 9. In the subsequent for statement, PlayItem() is referenced as many times as the number of PlayItems, and SubPath() is referenced as many times as the number of sub paths.
Fig. 16 shows the syntax of SubPath() in fig. 15.
length is a 32-bit unsigned integer indicating the number of bytes from the length field to the end of SubPath(). That is, length represents the number of bytes from reserved_for_future_use to the end of the PlayList.
A 16-bit reserved_for_future_use is prepared after length.
SubPath_type is an 8-bit field indicating the type of application of the sub path. SubPath_type is used to indicate, for example, whether the type of the sub path is audio, bitmap subtitle, or text subtitle.
A 15-bit reserved_for_future_use is prepared after SubPath_type.
is_repeat_SubPath is a 1-bit field specifying the playback method of the sub path, and indicates whether playback of the sub path is performed repeatedly or only once during playback of the main path. For example, this field is used if the playback timing of the Clip referenced by the main path differs from that of the Clip referenced by the sub path (for instance, if the main path is a path for a slideshow of still images and the sub path is a path for audio serving as BGM or the like).
An 8-bit reserved_for_future_use is prepared after is_repeat_SubPath.
number_of_SubPlayItems is an 8-bit field indicating the number of SubPlayItems (number of entries) in one sub path. For example, the number_of_SubPlayItems of the sub path whose SubPath_id is 0 in fig. 9 is 1, and that of the sub path whose SubPath_id is 1 is 2. In the subsequent for statement, SubPlayItem() is referenced as many times as the number of SubPlayItems.
Fig. 17 shows the syntax of SubPlayItem(i) in fig. 16.
length is a 16-bit unsigned integer indicating the number of bytes from the length field to the end of SubPlayItem().
SubPlayItem(i) in fig. 17 is described separately for the case where the SubPlayItem references one Clip and the case where the SubPlayItem references a plurality of Clips.
Description will be made regarding a case where SubPlayItem refers to one Clip.
Clip_Information_file_name[0] represents the Clip to be referenced.
Clip_codec_identifier[0] represents the codec method of the Clip. reserved_for_future_use is included after Clip_codec_identifier[0].
is_multi_Clip_entries is a flag indicating the presence or absence of registration of multiple Clips. If the is_multi_Clip_entries flag is on, the syntax for the case where the SubPlayItem references multiple Clips is referenced.
ref_to_STC_id[0] is information about an STC discontinuity (discontinuity of the system time base).
SubPlayItem_IN_time represents the start position of the playback section of the sub path, and SubPlayItem_OUT_time represents the end position.
sync_PlayItem_id and sync_start_PTS_of_PlayItem represent the point in time at which the sub path starts playback on the time axis of the main path.
SubPlayItem_IN_time, SubPlayItem_OUT_time, sync_PlayItem_id, and sync_start_PTS_of_PlayItem are used in common by the Clips referenced by the SubPlayItem.
A description will now be given of the case of "if is_multi_Clip_entries == 1b", that is, where the SubPlayItem references a plurality of Clips.
num_of_Clip_entries indicates the number of Clips to be referenced. Clip_Information_file_name[SubClip_entry_id] specifies the Clips other than the Clip specified by Clip_Information_file_name[0].
Clip_codec_identifier[SubClip_entry_id] represents the codec method of the Clip.
ref_to_STC_id[SubClip_entry_id] is information about an STC discontinuity (discontinuity of the system time base). reserved_for_future_use is included after ref_to_STC_id[SubClip_entry_id].
Fig. 18 shows the syntax of PlayItem() in fig. 15.
length is a 16-bit unsigned integer indicating the number of bytes from the length field to the end of PlayItem().
Clip_Information_file_name[0] represents the file name of the Clip information file of the Clip referenced by the PlayItem. Note that the file name of the Clip information file and the file name of the m2ts file containing the Clip contain the same 5-digit number.
Clip_codec_identifier[0] represents the codec method of the Clip. reserved_for_future_use is included after Clip_codec_identifier[0]. is_multi_angle and connection_condition follow reserved_for_future_use.
ref_to_STC_id[0] is information about an STC discontinuity (discontinuity of the system time base).
IN_time represents the start position of the playback section of the PlayItem, and OUT_time represents the end position.
UO_mask_table(), PlayItem_random_access_mode, and still_mode follow OUT_time.
STN_table() includes information on the AV stream referenced by the target PlayItem. In addition, if there is a sub path to be played in association with the target PlayItem, information on the AV streams referenced by the SubPlayItems constituting that sub path is also included.
Fig. 19 shows the syntax of STN_table() in fig. 18.
STN_table() is set as an attribute of the PlayItem.
length is a 16-bit unsigned integer indicating the number of bytes from the length field to the end of STN_table(). A 16-bit reserved_for_future_use is also prepared after length.
number_of_video_stream_entries represents the number of streams that are entered (registered) in the STN_table() and given a video_stream_id.
video_stream_id is information for identifying a video stream. For example, the base view video stream is determined by the video_stream_id.
The ID of the dependent view video stream may be defined in the STN_table(), or may be obtained by calculation, for example by adding a predetermined value to the ID of the base view video stream.
video_stream_number is the video stream number as seen from the user, and is used for video switching.
number_of_audio_stream_entries represents the number of streams of the first audio streams that are entered in the STN_table() and given an audio_stream_id. audio_stream_id is information for identifying an audio stream, and audio_stream_number is the audio stream number as seen from the user, used for audio switching.
number_of_audio_stream2_entries represents the number of streams of the second audio streams that are entered in the STN_table() and given an audio_stream_id2. audio_stream_id2 is information for identifying an audio stream, and audio_stream_number is the audio stream number as seen from the user, used for audio switching. In this example, switching of the audio to be played is enabled.
number_of_PG_txtST_stream_entries represents the number of streams that are entered in the STN_table() and given a PG_txtST_stream_id. Among these, PG streams obtained by run-length coding bitmap subtitles and text subtitle files (txtST) are entered. PG_txtST_stream_id is information for identifying a subtitle stream, and PG_txtST_stream_number is the subtitle stream number as seen from the user, used for subtitle switching.
number_of_IG_stream_entries represents the number of streams that are entered in the STN_table() and given an IG_stream_id. Among these, IG streams are entered. IG_stream_id is information for identifying an IG stream, and IG_stream_number is the graphics stream number as seen from the user, used for graphics switching.
The IDs of the main TS and the sub TS are also registered in the STN_table(). In stream_attribute(), the fact that these IDs are not the IDs of elementary streams but the IDs of TSs is described.
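Similarly, the registrations carried by STN_table() can be pictured with the following illustrative sketch. The structure and the helper for deriving a dependent view video stream ID are assumptions for explanation only; the actual entry and attribute syntax defined for STN_table() is omitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of the STN_table() registrations described above.
struct StnTable {
    uint16_t length;                           // bytes from this field to the end

    // Video: the base view video stream is determined by video_stream_id; the
    // dependent view video stream ID may instead be derived by calculation.
    std::vector<uint16_t> video_stream_ids;    // number_of_video_stream_entries entries

    std::vector<uint16_t> audio_stream_ids;    // first audio streams
    std::vector<uint16_t> audio_stream_ids2;   // second audio streams
    std::vector<uint16_t> pg_txtst_stream_ids; // PG (run-length coded bitmap subtitles) / text subtitles
    std::vector<uint16_t> ig_stream_ids;       // interactive graphics streams

    uint16_t main_ts_id;                       // ID of the main TS (described in stream_attribute())
    uint16_t sub_ts_id;                        // ID of the sub TS
};

// Example of the derivation mentioned above: the dependent view video stream ID
// obtained by adding a predetermined value to the base view video stream ID.
inline uint16_t dependent_view_id(uint16_t base_view_id, uint16_t offset = 1) {
    return static_cast<uint16_t>(base_view_id + offset);
}
```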
[ configuration example of playback device 1]
Fig. 20 is a block diagram showing a configuration example of the playback device 1.
The controller 51 executes a prepared control program to control the entire operation of the playback device 1.
For example, the controller 51 controls the disc drive 52 to read a PlayList file for 3D playback. In addition, the controller 51 also controls the disc drive 52 to read the main TS and the sub TS based on the ID registered in the STN _ table, and supplies these to the decoder unit 56.
The disk drive 52 reads data from the optical disk 2 according to the control of the controller 51, and outputs the read data to the controller 51, the memory 53, and the decoder unit 56.
The memory 53 appropriately stores data necessary for the controller 51 to perform various types of processing.
The local storage device 54 is configured by, for example, an HDD (hard disk drive). The dependent view video stream or the like downloaded from the server 72 is recorded in the local storage device 54. The stream recorded in the local storage device 54 is also supplied to the decoder unit 56 as appropriate.
The internet interface 55 performs communication with the server 72 via the network 71 according to the control of the controller 51, and provides data downloaded from the server 72 to the local storage device 54.
The data for updating the data recorded in the optical disc 2 is downloaded from the server 72. By enabling the downloaded dependent view video stream to be used together with the base view video stream recorded in the optical disc 2, 3D playback of content different from that of the optical disc 2 can be realized. When the dependent view video stream is downloaded, the content of the PlayList is also updated appropriately.
The decoder unit 56 decodes the stream supplied from the disk drive 52 or the local storage device 54, and outputs the obtained video signal to the display device 3. The audio signal is also output to the display device 3 via a predetermined route.
The operation input unit 57 includes an input device such as a button, a key, a touch panel, a dial, a mouse, and a receiving unit for receiving a signal such as infrared rays transmitted from a predetermined remote controller. The operation input unit 57 detects an operation by the user, and supplies a signal representing the content of the detected operation to the controller 51.
Fig. 21 shows a configuration example of the decoder unit 56.
Fig. 21 shows a configuration in which processing of a video signal is performed. In the decoder unit 56, decoding processing of the audio signal is also performed. The result of the decoding process performed on the audio signal as an object is output to the display device 3 via a route not shown.
The PID filter 101 identifies whether the TS supplied from the disk drive 52 or the local storage device 54 is the main TS or the sub TS, based on the IDs of the streams constituting the TS and the PIDs of the packets. The PID filter 101 outputs the main TS to the buffer 102 and the sub TS to the buffer 103.
Based on the PID, the PID filter 104 sequentially reads the packets of the main TS stored in the buffer 102 to distribute the packets.
For example, the PID filter 104 outputs the packets constituting the base view video stream included in the main TS to the B video buffer 106, and outputs the packets constituting the dependent view video stream to the switch 107.
In addition, the PID filter 104 also outputs packets constituting the base IG stream included in the main TS to the switch 114, and outputs packets constituting the dependent IG stream to the switch 118.
The PID filter 104 outputs packets constituting the base PG stream included in the main TS to the switch 122, and outputs packets constituting the dependent PG stream to the switch 126.
As described with reference to fig. 5, streams of each of the base view video, the dependent view video, the base PG, the dependent PG, the base IG, and the dependent IG may be multiplexed into the main TS.
Based on the PID, the PID filter 105 sequentially reads the packets of the sub TS stored in the buffer 103 to distribute the packets.
For example, the PID filter 105 outputs packets constituting the dependent view video stream included in the sub TS to the switch 107.
In addition, the PID filter 105 also outputs packets constituting the base IG stream included in the sub TS to the switch 114, and outputs packets constituting the dependent IG stream to the switch 118.
The PID filter 105 outputs packets constituting the base PG stream included in the sub TS to the switch 122, and outputs packets constituting the dependent PG stream to the switch 126.
As described with reference to fig. 7, the dependent view video stream may be included in the sub TS. In addition, as described with reference to fig. 6, streams of each of the base PG, the dependent PG, the base IG, and the dependent IG may be multiplexed into the sub TS.
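The packet distribution performed by the PID filter 104 (and, for the sub TS, by the PID filter 105) can be summarized by a sketch such as the following. The PID values and destination names are placeholders chosen for illustration; as noted later in the text, a fixed value such as PID=0 identifies the base view video packets.

```cpp
#include <cstdint>

// Destinations corresponding to the buffers/switches described above.
enum class Dest { BVideoBuffer, DependentVideoSwitch, BaseIgSwitch, DependentIgSwitch,
                  BasePgSwitch, DependentPgSwitch, Discard };

// Hypothetical PID assignment; only PID=0 for the base view video follows the text,
// the remaining values are placeholders for illustration.
constexpr uint16_t kPidBaseVideo = 0x0000;
constexpr uint16_t kPidDepVideo  = 0x1012;
constexpr uint16_t kPidBaseIg    = 0x1400;
constexpr uint16_t kPidDepIg     = 0x1401;
constexpr uint16_t kPidBasePg    = 0x1200;
constexpr uint16_t kPidDepPg     = 0x1201;

// Distribution rule of the PID filter 104 (main TS); the PID filter 105 applies
// the same rule to the streams multiplexed into the sub TS.
Dest route_main_ts_packet(uint16_t pid) {
    switch (pid) {
        case kPidBaseVideo: return Dest::BVideoBuffer;          // -> B video buffer 106
        case kPidDepVideo:  return Dest::DependentVideoSwitch;  // -> switch 107
        case kPidBaseIg:    return Dest::BaseIgSwitch;           // -> switch 114
        case kPidDepIg:     return Dest::DependentIgSwitch;      // -> switch 118
        case kPidBasePg:    return Dest::BasePgSwitch;           // -> switch 122
        case kPidDepPg:     return Dest::DependentPgSwitch;      // -> switch 126
        default:            return Dest::Discard;
    }
}
```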
The switch 107 outputs packets constituting the dependent view video stream supplied from the PID filter 104 or the PID filter 105 to the D video buffer 108.
The switch 109 sequentially reads the base view video packets stored in the B video buffer 106 and the dependent view video packets stored in the D video buffer 108 according to time point information specifying the decoding timing. For example, the same time point information is set to a packet storing the data of a certain picture of the base view video and a packet storing the data of the corresponding picture of the dependent view video.
The switch 109 outputs the packet read from the B video buffer 106 or the D video buffer 108 to the video decoder 110.
The video decoder 110 decodes the packet supplied from the switch 109 to output the base view video or the dependent view video obtained by the decoding to the switch 111.
The switch 111 outputs data obtained by decoding the base view video packet to the B video plane generating unit 112, and outputs data obtained by decoding the dependent view video packet to the D video plane generating unit 113.
The B video plane generating unit 112 generates a base viewpoint video plane based on the data supplied from the switch 111, and outputs it to the synthesizing unit 130.
The D video plane generating unit 113 generates a dependent view video plane based on the data supplied from the switch 111, and outputs it to the synthesizing unit 130.
The switch 114 outputs the packets constituting the base IG stream supplied from the PID filter 104 or the PID filter 105 to the B IG buffer 115.
The B IG decoder 116 decodes the packets constituting the base IG stream stored in the B IG buffer 115 to output the data obtained by the decoding to the B IG plane generating unit 117.
The B IG plane generating unit 117 generates a basic IG plane based on the data supplied from the B IG decoder 116, and outputs it to the synthesizing unit 130.
The switch 118 outputs the packets constituting the dependent IG stream supplied from the PID filter 104 or the PID filter 105 to the D IG buffer 119.
The D IG decoder 120 decodes the packets constituting the dependent IG stream stored in the D IG buffer 119, and outputs the data obtained by the decoding to the D IG plane generating unit 121.
The D IG plane generating unit 121 generates a dependent IG plane based on the data supplied from the D IG decoder 120, and outputs it to the synthesizing unit 130.
The switch 122 outputs the packets constituting the base PG stream supplied from the PID filter 104 or the PID filter 105 to the B PG buffer 123.
The B PG decoder 124 decodes the packets constituting the base PG stream stored in the B PG buffer 123, and outputs data obtained by the decoding to the B PG plane generating unit 125.
The B PG plane generating unit 125 generates a base PG plane based on the data supplied from the B PG decoder 124, and outputs it to the synthesizing unit 130.
The switch 126 outputs the packets constituting the dependent PG stream supplied from the PID filter 104 or the PID filter 105 to the D PG buffer 127.
The D PG decoder 128 decodes the packets constituting the dependent PG stream stored in the D PG buffer 127, and outputs data obtained by the decoding to the D PG plane generating unit 129.
The D PG plane generating unit 129 generates a dependent PG plane based on the data supplied from the D PG decoder 128, and outputs it to the synthesizing unit 130.
The synthesizing unit 130 synthesizes the base viewpoint video plane supplied from the B video plane generating unit 112, the base IG plane supplied from the B IG plane generating unit 117, and the base PG plane supplied from the B PG plane generating unit 125 by overlaying them in a predetermined order, thereby generating a base viewpoint plane.
In addition, the synthesizing unit 130 also synthesizes the dependent view video plane supplied from the D video plane generating unit 113, the dependent IG plane supplied from the D IG plane generating unit 121, and the dependent PG plane supplied from the D PG plane generating unit 129 by overlaying them in a predetermined order, thereby generating a dependent view plane.
The synthesizing unit 130 outputs data of the base view plane and the dependent view plane. The video data output from the synthesizing unit 130 is output to the display device 3, and 3D display is performed by alternately displaying the base view plane and the dependent view plane.
[ first example of T-STD (transport stream-System target decoder) ]
Now, a configuration of the decoder and its surroundings in the configuration shown in fig. 21 will be described.
Fig. 22 shows a configuration in which processing of a video stream is performed.
In fig. 22, the same configurations as those shown in fig. 21 are denoted by the same reference numerals. Fig. 22 shows the PID filter 104, the B video buffer 106, the switch 107, the D video buffer 108, the switch 109, the video decoder 110, and the DPB (decoded picture buffer) 151. Although not shown in fig. 21, the DPB 151, in which the data of decoded pictures is stored, is provided at a stage subsequent to the video decoder 110.
The PID filter 104 outputs the packets constituting the base view video stream included in the main TS to the B video buffer 106, and outputs the packets constituting the dependent view video stream to the switch 107.
For example, PID=0 has been assigned as a fixed PID value to the packets constituting the base view video stream. In addition, a fixed value other than 0 has been assigned as the PID to the packets constituting the dependent view video stream.
The PID filter 104 outputs packets whose headers describe PID=0 to the B video buffer 106, and outputs packets whose headers describe a PID other than 0 to the switch 107.
The packets output to the B video buffer 106 are stored in VSB1 via TB1 (transport buffer) and MB1 (multiplexing buffer). The data of the elementary stream of the base view video is stored in VSB1.
Both the packets output from the PID filter 104 and the packets constituting the dependent view video stream extracted from the sub TS at the PID filter 105 in fig. 21 are supplied to the switch 107.
If the packets constituting the dependent view video stream have been supplied from the PID filter 104, the switch 107 outputs them to the D video buffer 108.
In addition, if the packets constituting the dependent view video stream have been supplied from the PID filter 105, the switch 107 also outputs them to the D video buffer 108.
The packets output to the D video buffer 108 are stored in VSB2 via TB2 and MB2. The data of the elementary stream of the dependent view video is stored in VSB2.
The switch 109 sequentially reads the video packets stored in VSB1 of the B video buffer 106 and the video packets stored in VSB2 of the D video buffer 108, and outputs them to the video decoder 110.
For example, the switch 109 continuously outputs a base view video packet and the dependent view video packet at the same time point to the video decoder 110, so that immediately after the base view video packet at a certain time point is output, the dependent view video packet at the same time point is output.
For a packet storing data of a certain picture of base view video and a packet storing data of a picture of dependent view video corresponding thereto, at the encoding timing thereof, the same time point information ensuring PCR (program clock reference) synchronization is set. Even in the case where the base view video stream and the dependent view video stream are each included in a different TS, the same point-in-time information is set to the packet storing the data of the corresponding picture.
The time point information is DTS (decoding time stamp) and PTS (presentation time stamp), and is set to each PES (packetized elementary stream) packet.
That is, when the pictures of each stream are arranged in the encoding order/decoding order, the picture of the base view video and the picture of the dependent view video at the same point in time become corresponding pictures. The same DTS is set to a PES packet storing data of a certain base view video picture and a PES packet storing data of a dependent view video picture corresponding to the picture in decoding order.
In addition, when the pictures of each stream are arranged in the display order, the base view video picture and the dependent view video picture at the same point in time also become corresponding pictures. The same PTS is set to a PES packet storing data of a certain base view video picture and a PES packet storing data of a dependent view video picture corresponding to the picture in display order.
If the GOP structure of the base view video stream and the GOP structure of the dependent view video stream are the same structure, the corresponding picture in decoding order becomes the corresponding picture in display order, which will be described later.
If packet transfer is performed serially, as shown in fig. 22, the DTS1 of the packet read from VSB1 of the B video buffer 106 at a certain timing and the DTS2 of the packet read from VSB2 of the D video buffer 108 at the immediately subsequent timing represent the same point in time.
The switch 109 outputs the base view video packet read from VSB1 of the B video buffer 106 or the dependent view video packet read from VSB2 of the D video buffer 108 to the video decoder 110.
The video decoder 110 sequentially decodes the packets supplied from the switch 109 to store data of the base view video picture or data of the dependent view video picture obtained by decoding into the DPB 151.
The switch 111 reads out data of the decoded picture stored in the DPB 151 at a predetermined timing. In addition, the data of the decoded picture stored in the DPB 151 is also used by the video decoder 110 to predict another picture.
If data transfer is performed serially, the PTS of data of a base view video picture output at a certain timing and the PTS of data of a dependent view video picture output at an immediately subsequent timing represent the same point in time.
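A rough sketch of the alternating supply by the switch 109 of a base view video packet and the dependent view video packet having the same DTS is shown below. The packet structure and the buffer interfaces are hypothetical simplifications of VSB1 and VSB2.

```cpp
#include <cstdint>
#include <deque>
#include <functional>

struct VideoPacket {
    uint64_t dts;  // decoding time stamp set to the PES packet
    // ... payload omitted ...
};

// Simplified model of the switch 109 in serial transfer: for each DTS, the base
// view video packet is output first, and the dependent view video packet having
// the same DTS is output immediately afterwards.
void feed_decoder(std::deque<VideoPacket>& vsb1,   // VSB1 of the B video buffer 106
                  std::deque<VideoPacket>& vsb2,   // VSB2 of the D video buffer 108
                  const std::function<void(const VideoPacket&)>& decode) {
    while (!vsb1.empty() && !vsb2.empty()) {
        const VideoPacket base = vsb1.front();
        const VideoPacket dep  = vsb2.front();
        // Corresponding pictures carry the same DTS even when the two streams
        // are included in different TSs.
        if (base.dts != dep.dts) break;  // no corresponding pair available yet
        vsb1.pop_front();
        decode(base);                    // base view video picture first
        vsb2.pop_front();
        decode(dep);                     // then the corresponding dependent view video picture
    }
}
```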
The base view video stream and the dependent view video stream may be multiplexed into a single TS, for example, as described with reference to fig. 5 or the like, or may each be included in a different TS, as described with reference to fig. 7.
Whether the base view video stream and the dependent view video stream are multiplexed into a single TS or each included in a different TS, the playback device 1 can handle either case by implementing the decoder model in fig. 22.
For example, as shown in fig. 23, if only a case where a single TS is provided is assumed, the playback device 1 cannot handle a case where the base view video stream and the dependent view video stream are each included in a different TS, or the like.
In addition, according to the decoder model in fig. 22, even if the base view video stream and the dependent view video stream are each included in a different TS, they have the same DTS, and thus the packets can be supplied to the video decoder 110 at the correct timing.
The decoder for the base view video and the decoder for the dependent view video may be arranged in parallel. In this case, packets at the same point in time are supplied to each of the decoder for base view video and the decoder for dependent view video at the same timing.
[ second example ]
Fig. 24 shows another configuration for performing processing on a video stream.
In addition to the configuration in fig. 22, fig. 24 also shows a switch 111, an L video plane generating unit 161, and an R video plane generating unit 162. In addition, the PID filter 105 is shown on the preceding stage of the switch 107. Redundant description is appropriately omitted.
The L video plane generating unit 161 generates an L viewpoint video plane, and is provided in place of the B video plane generating unit 112 in fig. 21.
The R video plane generating unit 162 generates an R viewpoint video plane, and is provided in place of the D video plane generating unit 113 in fig. 21.
In this example, the switch 111 needs to recognize and output L viewpoint video data and R viewpoint video data.
That is, the switch 111 needs to identify whether data obtained by decoding the base view video packet is arbitrary video data of L view or R view.
In addition, the switch 111 also needs to identify whether data obtained by decoding the dependent view video packet is arbitrary video data of L view or R view.
The view _ type described with reference to fig. 12 and 14 is used to identify the L view and the R view. For example, the controller 51 outputs view _ type described in the PlayList file to the switch 111.
If the value of view_type is 0, the switch 111 outputs, of the data stored in the DPB 151, the data obtained by decoding the base view video packets identified by PID=0 to the L video plane generating unit 161. As described above, the value 0 of view_type indicates that the base view video stream is an L view stream.
In this case, the switch 111 outputs data obtained by decoding the dependent view video packets identified by the PIDs other than 0 to the R video plane generating unit 162.
On the other hand, if the value of view_type is 1, the switch 111 outputs, of the data stored in the DPB 151, the data obtained by decoding the base view video packets identified by PID=0 to the R video plane generating unit 162. The value 1 of view_type indicates that the base view video stream is an R view stream.
In this case, the switch 111 outputs data obtained by decoding the dependent view video packets identified by PIDs other than 0 to the L video plane generating unit 161.
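The output-destination decision made by the switch 111 from view_type and the PID can be summarized as in the following sketch; the destination enumeration is a placeholder, and only the decision rule follows the description above.

```cpp
#include <cstdint>

enum class PlaneDest { LVideoPlaneGen, RVideoPlaneGen };

// Simplified decision of the switch 111: view_type==0 means the base view video
// stream is the L view stream, view_type==1 means it is the R view stream.
// A packet with PID==0 is a base view video packet; any other PID marks a
// dependent view video packet.
PlaneDest route_decoded_picture(int view_type, uint16_t pid) {
    const bool is_base_view = (pid == 0);
    if (view_type == 0) {
        return is_base_view ? PlaneDest::LVideoPlaneGen   // base view -> L video plane generating unit 161
                            : PlaneDest::RVideoPlaneGen;  // dependent view -> R video plane generating unit 162
    }
    return is_base_view ? PlaneDest::RVideoPlaneGen       // base view -> R
                        : PlaneDest::LVideoPlaneGen;      // dependent view -> L
}
```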
The L video plane generating unit 161 generates an L viewpoint video plane based on the data supplied from the switch 111 and outputs it to the synthesizing unit 130.
The R video plane generating unit 162 generates an R viewpoint video plane based on the data supplied from the switch 111, and outputs it to the synthesizing unit 130.
There is no information (field) indicating whether the stream is an L view or an R view in the elementary streams of the base view video and dependent view video encoded with the h.264avc/MVC profile standard.
Accordingly, the recording apparatus sets view_type in the PlayList file so that it can be identified whether each of the base view video stream and the dependent view video stream is an L view stream or an R view stream.
The playback device 1 identifies whether each of the base view video stream and the dependent view video stream is an L view or an R view, and can switch the output destination according to the identification result.
Since the L view and the R view of the video stream can be distinguished, if an L view and an R view are also prepared for each of the IG and PG planes, the playback device 1 can easily synthesize the L view planes with each other and the R view planes with each other.
As described above, if a video signal is output via the HDMI cable, it is required that the L-viewpoint signal and the R-viewpoint signal each be separately output, and the playback device 1 can handle the requirement.
The identification of the data obtained by decoding the base view video packets stored in the DPB 151 and the data obtained by decoding the dependent view video packets may be performed based on the view _ id instead of the PID.
When encoding is performed using the h.264AVC/MVC profile standard, view _ id is set to an access unit of a stream constituting an encoding result. Which view component unit each access unit is can be identified from the view _ id.
Fig. 25 shows an example of an access unit.
Access unit #1 in fig. 25 is a unit including data of base view video. The dependent unit #2 is a unit including data of dependent view video. An access unit (a dependent unit in the case of a dependent view) is, for example, a unit that collects data of one picture so as to be accessible in units of pictures.
By performing encoding in compliance with the h.264AVC/MVC profile standard, data of each picture of the base view video and the dependent view video is stored in such a unit. In encoding conforming to the h.264avc/MVC profile standard, an MVC header is added to each view component as shown in dependent unit # 2. view _ id is included in the MVC header.
In the case of the example in fig. 25, for the dependent unit #2, it can be recognized from view _ id that the view component to be stored in its unit is dependent view video.
On the other hand, as shown in fig. 25, the MVC header is not added to the base view video as the view component stored in access unit # 1.
As described above, the base view video stream is data to be used also for 2D playback. Therefore, to ensure compatibility therewith, the MVC header is not added to the base view video at the time of encoding. Alternatively, the MVC header that was added is removed. Encoding with the recording apparatus will be described below.
In the playback device 1, a view component to which no MVC header is added is defined (set) such that its view _ id is 0, and the view component is recognized as a base view video. A value other than 0 is set to the dependent view video as view _ id at the time of encoding.
Accordingly, the playback device 1 can identify the base view video based on the view _ id identified as 0, and can identify the dependent view video based on the actually set view _ id other than 0.
In the switch 111 in fig. 24, identification of data obtained by decoding the base view video packet and data obtained by decoding the dependent view video can be performed based on such view _ id.
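Under that convention, the identification could be sketched as follows; the view component representation is hypothetical, and only the rule that an absent MVC header is regarded as view_id 0 is taken from the description above.

```cpp
#include <cstdint>
#include <optional>

struct ViewComponent {
    std::optional<uint16_t> mvc_header_view_id;  // absent when no MVC header is added
    // ... picture data omitted ...
};

// A component without an MVC header is regarded as view_id 0, i.e. base view video;
// a non-zero view_id actually set at encoding time marks dependent view video.
bool is_base_view(const ViewComponent& vc) {
    const uint16_t view_id = vc.mvc_header_view_id.value_or(0);
    return view_id == 0;
}
```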
[ third example ]
Fig. 26 shows still another example in which processing of a video stream is performed.
In the example in fig. 26, the B video plane generating unit 112 is provided instead of the L video plane generating unit 161 in fig. 24, and the D video plane generating unit 113 is provided instead of the R video plane generating unit 162. A switch 171 is provided on the rear stage of the B video plane generating unit 112 and the D video plane generating unit 113. Also with the configuration shown in fig. 26, it is arranged to switch the data output destination based on view _ type.
The switch 111 outputs data obtained by decoding the base view video packet among the data stored in the DPB 151 to the B video plane generation unit 112. In addition, the switch 111 outputs data obtained by decoding the dependent view video packet to the D video plane generating unit 113.
As described above, data obtained by decoding the base view video packet and data obtained by decoding the dependent view video packet are identified based on the PID or view _ id.
The B video plane generating unit 112 generates a base viewpoint video plane based on the data supplied from the switch 111 and outputs it.
The D video plane generating unit 113 generates a dependent view video plane based on the data supplied from the switch 111 and outputs it.
The view_type described in the PlayList file is supplied from the controller 51 to the switch 171.
The switch 171 outputs the base view video plane supplied from the B video plane generating unit 112 as an L view video plane to the synthesizing unit 130 if the value of view _ type is 0. A value of 0 of view _ type indicates that the base view video stream is an L view stream.
In addition, in this case, the switch 171 outputs the dependent view video plane supplied from the D video plane generating unit 113 to the synthesizing unit 130 as an R view video plane.
On the other hand, if the value of view _ type is 1, the switch 171 outputs the dependent view video plane supplied from the D video plane generating unit 113 to the synthesizing unit 130 as an L view video plane. A value of 1 for view _ type indicates that the base view video stream is an R view stream.
In addition, in this case, the switch 171 outputs the base viewpoint video plane supplied from the B video plane generating unit 112 to the synthesizing unit 130 as an R viewpoint video plane.
Also according to the configuration of fig. 26, the playback device 1 recognizes the L viewpoint and the R viewpoint, and can switch the output destination according to the recognition result.
[ first example of planar Synthesis model ]
Fig. 27 shows a configuration of the synthesizing unit 130 and its subsequent stage in the configuration shown in fig. 21.
Also in fig. 27, the same configurations as those shown in fig. 21 are denoted by the same reference numerals.
The packets constituting the IG stream included in the main TS or the sub TS are input to the switch 181. The packets constituting the IG stream to be input to the switch 181 include a base view packet and a dependent view packet.
The packets constituting the PG stream included in the main TS or the sub TS are input to the switch 182. The packets constituting the PG stream to be input to the switch 182 include a base view packet and a dependent view packet.
As described with reference to fig. 5 and the like, also for IG and PG, a base view stream and a dependent view stream for performing 3D display are prepared.
The IG of the base view is displayed in a manner synthesized with the base view video, and the IG of the dependent view is displayed in a manner synthesized with the dependent view video, so that the user can view buttons, icons, and the like in 3D along with the video.
The PG of the base view is displayed in a manner synthesized with the base view video, and the PG of the dependent view is displayed in a manner synthesized with the dependent view video, so that the user can view subtitle text and the like in 3D along with the video.
The switch 181 outputs the packets constituting the base IG stream to the B IG decoder 116, and outputs the packets constituting the dependent IG stream to the D IG decoder 120. The switch 181 includes the functions of the switch 114 and the switch 118 in fig. 21. In fig. 27, the illustration of each buffer is omitted.
The B IG decoder 116 decodes the packet constituting the base IG stream supplied from the switch 181, and outputs the data obtained by the decoding to the B IG plane generating unit 117.
The B IG plane generating unit 117 generates a basic IG plane based on the data supplied from the B IG decoder 116, and outputs it to the synthesizing unit 130.
The D IG decoder 120 decodes the packet constituting the dependent IG stream supplied from the switch 181, and outputs the data obtained by the decoding to the D IG plane generating unit 121. The base IG stream and the dependent IG stream can be arranged to be decoded by one decoder.
The D IG plane generating unit 121 generates a dependent IG plane based on the data supplied from the D IG decoder 120, and outputs it to the synthesizing unit 130.
The switch 182 outputs packets constituting the base PG stream to the B PG decoder 124, and outputs packets constituting the dependent PG stream to the D PG decoder 128. Switch 182 includes the functionality of switch 122 and switch 126 in fig. 21.
The B PG decoder 124 decodes the packet constituting the base PG stream supplied from the switch 182, and outputs data obtained by the decoding to the B PG plane generating unit 125.
The B PG plane generating unit 125 generates a base PG plane based on the data supplied from the B PG decoder 124 to output it to the synthesizing unit 130.
The D PG decoder 128 decodes the packet constituting the dependent PG stream supplied from the switch 182, and outputs data obtained by the decoding to the D PG plane generating unit 129. The base PG stream and the dependent PG stream can be arranged to be decoded by one decoder.
The D PG plane generating unit 129 generates a dependent PG plane based on the data supplied from the D PG decoder 128 to output it to the synthesizing unit 130.
The video decoder 110 sequentially decodes the packets supplied from the switch 109 (fig. 22 and the like), and outputs the data of the base view video and the data of the dependent view video obtained by decoding to the switch 111.
The switch 111 outputs data obtained by decoding the packet of the base view video to the B video plane generating unit 112, and outputs data obtained by decoding the packet of the dependent view video to the D video plane generating unit 113.
The B video plane generating unit 112 generates a base viewpoint video plane based on the data supplied from the switch 111 and outputs it.
The D video plane generating unit 113 generates a dependent view video plane based on the data supplied from the switch 111 and outputs it.
The synthesizing unit 130 includes adding units 191 to 194 and a switch 195.
The addition unit 191 synthesizes the dependent PG plane supplied from the D PG plane generation unit 129 onto the dependent view video plane supplied from the D video plane generation unit 113 in an overlaying manner, and outputs the synthesis result to the addition unit 193. The dependent PG plane supplied from the D PG plane generating unit 129 to the adding unit 191 is subjected to color information conversion processing (CLUT (color lookup table) processing).
The addition unit 192 synthesizes the base PG plane supplied from the B PG plane generation unit 125 onto the base viewpoint video plane supplied from the B video plane generation unit 112 in an overlaying manner, and outputs the synthesis result to the addition unit 194. The base PG plane supplied from the B PG plane generating unit 125 to the adding unit 192 is subjected to color information conversion processing or correction processing using an offset value.
The addition unit 193 synthesizes the dependent IG plane supplied from the D IG plane generating unit 121 on the synthesis result of the addition unit 191 in an overlaid manner, and outputs the synthesis result as a dependent viewpoint plane. The dependent IG plane supplied from the D IG plane generating unit 121 to the adding unit 193 is subjected to color information conversion processing.
The addition unit 194 synthesizes the base IG plane supplied from the B IG plane generation unit 117 onto the synthesis result of the addition unit 192 in an overlaying manner, and outputs the synthesis result as a base viewpoint plane. The basic IG plane supplied from the B IG plane generating unit 117 to the adding unit 194 is subjected to color information conversion processing or correction processing using an offset value.
The image displayed on the basis of the base view plane and the dependent view plane thus generated becomes an image in which the buttons and icons are seen in front, the subtitle text is seen beneath them (in the depth direction), and the video is seen beneath that.
If the value of view _ type is 0, the switch 195 outputs the base view plane as the L view plane and the dependent view plane as the R view plane. view _ type is supplied from controller 51 to switch 195.
In addition, if the value of view _ type is 1, the switch 195 outputs the base view plane as the R view plane and the dependent view plane as the L view plane. Which plane is a base view plane or a dependent view plane among the provided planes is identified based on PID or view _ id.
Thus, in the playback device 1, synthesis of the base view plane, the dependent view plane, and the plane of each of the video, IG, and PG is performed.
At the stage where the synthesis of all the planes of the video, IG, and PG has been completed, it is determined whether the synthesis result of the base view plane is the L view or the R view based on the view _ type, and the R view plane and the L view plane are each output.
In addition, at a stage where the synthesis of all the planes of the video, IG, and PG has been completed, it is determined whether the synthesis result of the dependent view plane is the L view or the R view based on the view _ type, and the R view plane and the L view plane are each output.
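A rough sketch of the overlay order of the adding units 191 to 194 and of the final assignment by the switch 195 is given below. The plane type and the overlay operation are placeholder abstractions; the CLUT conversion and offset correction mentioned above are assumed to have been applied beforehand.

```cpp
struct Plane { /* pixel data omitted */ };

// Placeholder overlay: composites 'top' onto 'bottom' and returns the result
// (the actual blending is omitted in this sketch).
Plane overlay(Plane bottom, const Plane& top) {
    (void)top;
    return bottom;
}

struct StereoOutput {
    Plane l_view;
    Plane r_view;
};

// First synthesis model: PG and IG are stacked onto the video per view, and only
// at the final stage does view_type decide which synthesized plane is output as
// the L view plane and which as the R view plane (switch 195).
StereoOutput synthesize(int view_type,
                        const Plane& base_video, const Plane& base_pg, const Plane& base_ig,
                        const Plane& dep_video,  const Plane& dep_pg,  const Plane& dep_ig) {
    // Adding units 192 and 194: base view video, then base PG, then base IG.
    Plane base_plane = overlay(overlay(base_video, base_pg), base_ig);
    // Adding units 191 and 193: dependent view video, then dependent PG, then dependent IG.
    Plane dep_plane = overlay(overlay(dep_video, dep_pg), dep_ig);

    if (view_type == 0) return {base_plane, dep_plane};  // base view plane is the L view plane
    return {dep_plane, base_plane};                      // base view plane is the R view plane
}
```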
[ second example ]
Fig. 28 shows a configuration of the synthesizing unit 130 and its preceding stage.
In the configuration shown in fig. 28, the same configurations as those shown in fig. 27 are denoted by the same reference numerals. In fig. 28, the configuration of the combining unit 130 is different from that in fig. 27. In addition, the operation of the switch 111 is different from that of the switch 111 in fig. 27. The L video plane generating unit 161 is provided to replace the B video plane generating unit 112, and the R video plane generating unit 162 is provided to replace the D video plane generating unit 113. Redundant description will be omitted.
The same value of view _ type is supplied from the controller 51 to the switch 111 and the switches 201 and 202 of the synthesizing unit 130.
The switch 111 switches the output destinations of the data obtained by decoding the packet of the base view video and the data obtained by decoding the packet of the dependent view video on the basis of the view _ type in the same manner as the switch 111 in fig. 24.
For example, if the value of view _ type is 0, the switch 111 outputs data obtained by decoding the packet of the base view video to the L video plane generating unit 161. In this case, the switch 111 outputs data obtained by decoding the packet of the dependent view video to the R video plane generating unit 162.
On the other hand, if the value of view _ type is 1, the switch 111 outputs data obtained by decoding the packet of the base view video to the R video plane generating unit 162. In this case, the switch 111 outputs data obtained by decoding the packet of the dependent view video to the L video plane generating unit 161.
The L video plane generating unit 161 generates an L viewpoint video plane based on the data supplied from the switch 111 and outputs it to the synthesizing unit 130.
The R video plane generating unit 162 generates an R viewpoint video plane based on the data supplied from the switch 111, and outputs it to the synthesizing unit 130.
The synthesizing unit 130 includes a switch 201, a switch 202, and adding units 203 to 206.
The switch 201 switches the output destinations of the base IG plane supplied from the B IG plane generating unit 117 and the dependent IG plane supplied from the D IG plane generating unit 121 based on the view _ type.
For example, if the value of view _ type is 0, the switch 201 outputs the base IG plane supplied from the B IG plane generating unit 117 as an L view plane to the adding unit 206. In this case, the switch 201 outputs the dependent IG plane supplied from the D IG plane generating unit 121 to the adding unit 205 as an R viewpoint plane.
On the other hand, if the value of view _ type is 1, the switch 201 outputs the dependent IG plane supplied from the D IG plane generating unit 121 to the adding unit 206 as an L view plane. In this case, the switch 201 outputs the basic IG plane supplied from the B IG plane generating unit 117 to the adding unit 205 as an R viewpoint plane.
The switch 202 switches the output destinations of the base PG plane supplied from the B PG plane generating unit 125 and the dependent PG plane supplied from the D PG plane generating unit 129 based on the view _ type.
For example, if the value of view _ type is 0, the switch 202 outputs the base PG plane supplied from the B PG plane generating unit 125 to the adding unit 204 as an L view plane. In this case, the switch 202 outputs the dependent PG plane supplied from the D PG plane generating unit 129 to the adding unit 203 as an R viewpoint plane.
On the other hand, if the value of view _ type is 1, the switch 202 outputs the dependent PG plane supplied from the D PG plane generating unit 129 to the adding unit 204 as an L view plane. In this case, the switch 202 outputs the base PG plane supplied from the B PG plane generating unit 125 to the adding unit 203 as an R viewpoint plane.
The addition unit 203 synthesizes the PG plane of the R viewpoint supplied from the switch 202 onto the R viewpoint video plane supplied from the R video plane generation unit 162 in an overlaying manner, and outputs the synthesis result to the addition unit 205.
The addition unit 204 synthesizes the PG plane of the L viewpoint supplied from the switch 202 onto the L viewpoint video plane supplied from the L video plane generation unit 161 in an overlay manner, and outputs the synthesis result to the addition unit 206.
The addition unit 205 synthesizes the IG plane of the R viewpoint supplied from the switch 201 onto the plane of the synthesis result of the addition unit 203 in an overlaid manner, and outputs the synthesis result as the R viewpoint plane.
The addition unit 206 synthesizes the IG plane of the L viewpoint supplied from the switch 201 onto the plane of the synthesis result of the addition unit 204 in an overlaid manner, and outputs the synthesis result as an L viewpoint plane.
In this way, in the playback device 1, it is determined whether each of the base view plane and the dependent view plane of the video, IG, and PG is the L view or the R view before it is synthesized with another plane.
After this determination, the planes of the video, IG, and PG are synthesized such that the L view planes are synthesized with each other and the R view planes are synthesized with each other.
[ configuration example of recording apparatus ]
Fig. 29 is a block diagram showing a configuration example of the software assembling processing unit 301.
The video encoder 311 has the same configuration as the MVC encoder 11 in fig. 3. The video encoder 311 generates a base view video stream and a dependent view video stream by encoding a plurality of video data according to the h.264AVC/MVC profile standard, and outputs them to the buffer 312.
For example, the video encoder 311 sets a DTS and a PTS having the same PCR as a reference at the time of encoding. That is, the video encoder 311 sets the same DTS to a PES packet for storing data of a certain base view video picture and a PES packet for storing data of a dependent view video picture corresponding to the picture in decoding order.
In addition, the video encoder 311 sets the same PTS to a PES packet for storing data of a certain base view video picture and a PES packet for storing data of a dependent view video picture corresponding to the picture in display order.
As will be described later, the video encoder 311 sets the same information, as additional information that is supplementary information related to decoding, to a base view video picture and the corresponding dependent view video picture in decoding order.
Further, as will be described later, the video encoder 311 sets the same value, as the value of the POC indicating the output order of pictures, to a base view video picture and the corresponding dependent view video picture in display order.
In addition, as will be described later, the video encoder 311 performs encoding so that the GOP structure of the base view video stream and the GOP structure of the dependent view video stream match.
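The way the video encoder 311 pairs the timing information of corresponding pictures can be illustrated by the following sketch, which simply copies the DTS, PTS, and POC of each base view video picture to the dependent view video picture at the same position. The picture structure and the assumption that corresponding pictures share the same index are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct EncodedPicture {
    uint64_t dts;  // decoding time stamp of the PES packet storing the picture
    uint64_t pts;  // presentation time stamp of the PES packet
    uint32_t poc;  // value indicating the output order of the picture
    // ... coded data omitted ...
};

// Corresponding pictures of the base view video stream and the dependent view
// video stream receive the same DTS, the same PTS, and the same POC, with the
// same PCR as a reference.
void align_view_timing(const std::vector<EncodedPicture>& base_view,
                       std::vector<EncodedPicture>& dependent_view) {
    assert(base_view.size() == dependent_view.size());
    for (std::size_t i = 0; i < base_view.size(); ++i) {
        dependent_view[i].dts = base_view[i].dts;
        dependent_view[i].pts = base_view[i].pts;
        dependent_view[i].poc = base_view[i].poc;
    }
}
```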
The audio encoder 313 encodes an input audio stream and outputs the obtained data to the buffer 314. The audio stream input to the audio encoder 313 is the audio stream to be recorded in the disc together with the base view video stream and the dependent view video stream.
The data encoder 315 encodes the aforementioned various types of data (e.g., PlayList files, etc.) other than video and audio, and outputs the data obtained by the encoding to the buffer 316.
The data encoder 315 sets view _ type, which indicates whether the base view video stream is an L view stream or an R view stream, to the PlayList file according to encoding by the video encoder 311. Information indicating whether the dependent view video stream is an L view stream or an R view stream may be set instead of the type of the base view video stream.
In addition, the data encoder 315 also sets an EP_map, which will be described later, to each of the Clip Information file of the base view video stream and the Clip Information file of the dependent view video stream. The picture of the base view video stream and the picture of the dependent view video stream that are set in the EP_map as decoding start positions are corresponding pictures.
The multiplexing unit 317 multiplexes the video data and audio data stored in each buffer and data other than the stream together with the synchronization signal, and outputs it to the error correction encoding unit 318.
The error correction encoding unit 318 adds an error correction code to the data multiplexed by the multiplexing unit 317.
The modulation unit 319 modulates the data supplied from the error correction encoding unit 318 and outputs it. The output of the modulation unit 319 becomes the software to be recorded onto the optical disc 2 that can be played in the playback device 1.
The software assembling processing unit 301 having such a configuration is also provided to the recording apparatus.
Fig. 30 shows a configuration example including the software assembly processing unit 301.
A part of the configuration shown in fig. 30 may be provided inside the recording apparatus.
The recording signal generated by the software assembling processing unit 301 is subjected to mastering processing in the premastering processing unit 331, which generates a signal in the format to be recorded onto the optical disc 2. The generated signal is supplied to the master recording unit 333.
In the mastering unit 332, a master disc made of glass or the like is prepared, and a recording material such as a photoresist is applied thereto, thereby producing a master for recording.
In the master recording unit 333, a laser beam is modulated according to the recording signal supplied from the premastering processing unit 331 and is irradiated onto the photoresist on the master, so that the photoresist on the master is exposed in accordance with the recording signal. Subsequently, the master is developed, and pits appear on the master.
In the metal mastering unit 334, the master is subjected to a process such as electroforming, thereby producing a metal master to which pits on the glass master are transferred. Further, a metal stamper (stamp) is produced from the metal master, and this is used as a molding die (molding die).
In the molding processing unit 335, a material such as PMMA (acrylic) or PC (polycarbonate) is injected into the molding die and fixed. Alternatively, after 2P (ultraviolet curing resin) or the like is applied to the metal stamper, it is irradiated with ultraviolet rays and hardened. Thus, the pits on the metal stamper can be transferred to a replica made of resin.
In the film formation processing unit 336, a reflection film is formed on the replica by vapor deposition or sputtering. Alternatively, a reflective film is formed on the replica by spin coating.
In the post-processing unit 337, the disc is subjected to necessary processing such as diameter adjustment, and two discs are bonded together. Further, after a label is attached or a hub is attached, the disc is inserted into a cartridge. Thus, the optical disc 2 in which data playable by the playback device 1 is recorded is completed.
< second embodiment >
[ operation 1 of H.264AVC/MVC Profile video stream ]
As described above, in the BD-ROM standard, which is the standard of the optical disc 2, encoding of 3D video is realized using the h.264avc/MVC profile.
In addition, in the BD-ROM standard, the base view video stream is treated as an L view video stream, and the dependent view video stream is treated as an R view video stream.
The base view video is encoded as an H.264 AVC/High Profile video stream, so that the optical disc 2, which is a 3D compatible disc, can be played even in a legacy player or in a player compatible only with 2D playback. That is, downward compatibility can be ensured.
Specifically, the base view video stream can be decoded (played) by itself even by a decoder that does not comply with the H.264 AVC/MVC profile standard. That is, the base view video stream becomes a stream that can be played without fail even in an existing 2D BD player.
In addition, the base view video stream is used in both 2D playback and 3D playback, so that the burden at the time of authoring can be reduced. Regarding the AV stream, the authoring side can make a 3D compatible disc by preparing a dependent view video stream in addition to the conventional work.
Fig. 31 shows a configuration example of a 3D video TS generating unit to be set to a recording apparatus.
The 3D video TS generating unit in fig. 31 includes an MVC encoder 401, an MVC header removing unit 402, and a multiplexer 403. Data of the L view video #1 and data of the R view video #2 photographed as described with reference to fig. 2 are input to the MVC encoder 401.
The MVC encoder 401 encodes the data of the L view video #1 with h.264/AVC in the same manner as the MVC encoder 11 in fig. 3, and outputs AVC video data obtained by the encoding as a base view video stream. In addition, the MVC encoder 401 generates a dependent view video stream based on the data of the L view video #1 and the data of the R view video #2, and outputs it.
The base view video stream output from the MVC encoder 401 is composed of access units, and data of each picture of the base view video is stored in the access units. The dependent view video stream output from the MVC encoder 401 is composed of dependent units, and data of each picture of the dependent view video is stored in the dependent units.
Each access unit constituting the base view video stream and each dependent unit constituting the dependent view video stream include an MVC header in which view _ id for identifying the stored view component is described.
A fixed value equal to or greater than 1 is used as the view _ id to be described in the MVC header of the dependent view video. This also applies to the examples in fig. 32 and 33.
That is, unlike the MVC encoder 11 in fig. 3, the MVC encoder 401 is an encoder that generates and outputs the streams of the base view video and the dependent view video each in a form to which an MVC header is added. In the MVC encoder 11 of fig. 3, an MVC header is added only to the dependent view video encoded with the h.264 AVC/MVC profile standard.
The base view video stream output from the MVC encoder 401 is supplied to the MVC header removing unit 402, and the dependent view video stream is supplied to the multiplexer 403.
The MVC header removal unit 402 removes MVC headers included in access units constituting the base view video stream. The MVC header removal unit 402 outputs the base view video stream composed of the access units from which the MVC header has been removed to the multiplexer 403.
The multiplexer 403 generates and outputs a TS including the base view video stream supplied from the MVC header removal unit 402 and the dependent view video stream supplied from the MVC encoder 401. In the example of fig. 31, the TS including the base view video stream and the TS including the dependent view video stream are output separately, but these streams may be output by being multiplexed into the same TS as described above.
Thus, depending on the implementation, an MVC encoder can be conceived to which L view video and R view video are input and from which the streams of the base view video and the dependent view video, each with an MVC header added, are output.
Note that the entire configuration shown in fig. 31 may be included in the MVC encoder shown in fig. 3. This also applies to the configurations shown in fig. 32 and 33.
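The flow of the 3D video TS generating unit in fig. 31 (encoding both views, removing the MVC header only from the access units of the base view video, and then multiplexing) can be summarized by a sketch such as the following; the stream and unit representations are placeholders.

```cpp
#include <optional>
#include <utility>
#include <vector>

struct MvcHeader { int view_id; };

struct AccessUnit {
    std::optional<MvcHeader> mvc_header;  // present right after encoding
    // ... data of one picture omitted ...
};

using Stream = std::vector<AccessUnit>;

// MVC header removing unit 402: deletes the MVC header from every access unit of
// the base view video stream so that a legacy H.264/AVC decoder can handle it.
Stream remove_mvc_headers(Stream base_view) {
    for (AccessUnit& au : base_view) au.mvc_header.reset();
    return base_view;
}

struct TransportStream { Stream base_view; Stream dependent_view; };

// Multiplexer 403 (simplified): here both streams are carried in one TS, but as
// noted in the text they may instead be output as separate TSs.
TransportStream multiplex(Stream base_view_no_header, Stream dependent_view) {
    return {std::move(base_view_no_header), std::move(dependent_view)};
}
```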
Fig. 32 shows another configuration example of a 3D video TS generating unit to be set to a recording apparatus.
The 3D video TS generating unit in fig. 32 includes a mixing processing unit 411, an MVC encoder 412, a separating unit 413, an MVC header removing unit 414, and a multiplexer 415. The data of the L view video #1 and the data of the R view video #2 are input to the blend processing unit 411.
The mixing processing unit 411 arranges the pictures of the L view and the pictures of the R view in the encoding order. Each picture of the dependent view video is encoded with reference to the corresponding picture of the base view video, and thus, as a result of being arranged in the encoding order, the pictures of the L view and the pictures of the R view are arranged alternately.
The mixing processing unit 411 outputs pictures of L views and pictures of R views arranged in the coding order to the MVC encoder 412.
The MVC encoder 412 encodes each picture supplied from the mixing processing unit 411 according to the h.264 AVC/MVC profile standard, and outputs the stream obtained by the encoding to the separation unit 413. The base view video stream and the dependent view video stream are multiplexed into the stream output from the MVC encoder 412.
The base view video stream included in the stream output from the MVC encoder 412 is composed of access units in which data of each picture of the base view video is stored. In addition, the dependent view video stream included in the stream output from the MVC encoder 412 is composed of dependent units in which data of each picture of the dependent view video is stored.
An MVC header in which view _ id for identifying the stored view component is described is included in each access unit constituting the base view video stream and each dependent unit constituting the dependent view video stream.
The separation unit 413 separates the base view video stream and the dependent view video stream multiplexed in the stream supplied from the MVC encoder 412, and outputs them. The base view video stream output from the separation unit 413 is supplied to the MVC header removing unit 414, and the dependent view video stream is supplied to the multiplexer 415.
The MVC header removing unit 414 removes an MVC header included in each access unit constituting the base view video stream supplied from the separating unit 413. The MVC header removal unit 414 outputs the base view video stream composed of the access units from which the MVC header has been removed to the multiplexer 415.
The multiplexer 415 generates and outputs a TS including the base view video stream supplied from the MVC header removing unit 414 and the dependent view video stream supplied from the separating unit 413.
Fig. 33 shows still another configuration example of the 3D video TS generating unit to be set to the recording apparatus.
The 3D video TS generating unit in fig. 33 includes an AVC encoder 421, an MVC encoder 422, and a multiplexer 423. Data of L view video #1 is input to the AVC encoder 421, and data of R view video #2 is input to the MVC encoder 422.
The AVC encoder 421 encodes the data of the L view video #1 in accordance with h.264/AVC, and outputs an AVC video stream obtained by the encoding to the MVC encoder 422 and the multiplexer 423 as a base view video stream. The access unit constituting the base view video stream output from the AVC encoder 421 does not include an MVC header.
The MVC encoder 422 decodes the base view video stream (AVC video stream) supplied from the AVC encoder 421 to generate data of L view video # 1.
In addition, the MVC encoder 422 generates a dependent view video stream based on the data of the L view video #1 obtained by decoding and the data of the R view video #2 externally input, and outputs it to the multiplexer 423. Each dependent unit constituting the dependent view video stream output from the MVC encoder 422 includes an MVC header.
The multiplexer 423 generates and outputs a TS including the base view video stream supplied from the AVC encoder 421 and the dependent view video stream supplied from the MVC encoder 422.
The AVC encoder 421 in fig. 33 has the function of the H.264/AVC encoder 21 in fig. 3, and the MVC encoder 422 has the functions of the H.264/AVC decoder 22 and the dependent view video encoder 24 in fig. 3. In addition, the multiplexer 423 has the function of the multiplexer 25 in fig. 3.
The 3D video TS generating unit having such a configuration is provided in the recording apparatus, so that encoding of the MVC header of the access unit storing the data of the base view video can be prohibited. In addition, an MVC header whose view _ id is set to be equal to or greater than 1 may be included in a dependent unit storing data of dependent view video.
Fig. 34 shows a configuration for decoding an access unit on the playback device 1 side.
Fig. 34 shows the switch 109 and the video decoder 110 and the like described with reference to fig. 22. The access unit #1 including data of the base view video and the slave unit #2 including data of the slave view video are read out from the buffer and supplied to the switch 109.
Encoding is performed with reference to base view video, and thus in order to correctly decode dependent view video, it is first necessary that the corresponding base view video has already been decoded.
In the h.264/MVC profile standard, the decoding side is arranged to calculate the decoding order of each unit using the view _ id included in the MVC header. In addition, the minimum value is always arranged to be set to the base view video as a value of view _ id at the time of encoding. The decoder is arranged to be able to decode the base view video and the dependent view video in the correct order by starting decoding from a unit including the MVC header in which the minimum view _ id is set.
Incidentally, encoding of the MVC header of the access unit storing the base view video to be supplied to the video decoder 110 of the playback device 1 is prohibited.
Therefore, in the playback device 1, a view component stored in an access unit having no MVC header is defined so that its view_id is recognized as 0.
Accordingly, the playback device 1 can identify the base view video based on the view_id recognized as 0, and can identify the dependent view video based on the view_id actually set to a value other than 0.
The switch 109 in fig. 34 first outputs the access unit #1, identified by the minimum value 0 set as view_id, to the video decoder 110 for decoding.
In addition, after the decoding of the access unit #1 is completed, the switch 109 outputs the dependent unit #2, which is a unit in which the fixed value Y greater than 0 is set as view_id, to the video decoder 110 for decoding. The picture of the dependent view video stored in the dependent unit #2 is the picture corresponding to the picture of the base view video stored in the access unit #1.
In this way, encoding of the MVC header of the access unit storing the base view video is prohibited, so that the base view video stream recorded in the optical disc 2 can be handled as a playable stream even in a conventional player.
If the condition that the base view video be a stream playable even by a conventional player is set as one of the conditions for extending the BD-ROM standard to the BD-ROM 3D standard, that condition can be satisfied in this way.
For example, as shown in fig. 35, if an MVC header has been added in advance to each of the base view video and the dependent view video and decoding is performed from the base view video first, the base view video cannot be played on a legacy player. For an H.264/AVC decoder installed in a legacy player, the MVC header is undefined data. Depending on the decoder, such undefined data cannot be ignored when input, and thus the processing may fail.
Note that, in fig. 35, the view_id of the base view video is X, and the view_id of the dependent view video is Y, which is larger than X.
In addition, even if encoding of the MVC header is prohibited, by defining the view_id of the base view video to be regarded as 0, the playback device 1 can be caused to perform decoding of the base view video first and then perform decoding of the corresponding dependent view video. That is, decoding can be performed in the correct order.
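A minimal sketch of this decoding-order rule is shown below. The dictionary-based unit representation, field names, and helper functions are illustrative assumptions for this document only; real access units and dependent units are byte streams defined by the standard.

```python
# Sketch: choose the decoding order of units supplied to the decoder.
# Assumption: each unit is a dict with an optional "mvc_header" entry
# carrying view_id; a unit without an MVC header stores base view video.

def effective_view_id(unit):
    header = unit.get("mvc_header")
    if header is None:
        # No MVC header: treated as if view_id = 0 (the rule described above).
        return 0
    return header["view_id"]

def decoding_order(units):
    # Units with smaller view_id are decoded first, so the base view picture
    # is always decoded before the corresponding dependent view picture.
    return sorted(units, key=effective_view_id)

access_unit_1 = {"data": b"base view picture"}                       # no MVC header
dependent_unit_2 = {"mvc_header": {"view_id": 1}, "data": b"R view"}  # view_id >= 1

for unit in decoding_order([dependent_unit_2, access_unit_1]):
    print(effective_view_id(unit))  # prints 0 then 1
```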
[Operation 2]
With respect to GOP structure
In the h.264/AVC standard, a GOP (group of pictures) structure according to the MPEG-2 video standard is not defined.
Therefore, in the BD-ROM standard for processing an H.264/AVC video stream, the GOP structure of the H.264/AVC video stream is defined, and various functions utilizing the GOP structure, such as random access, are realized.
The base view video stream and the dependent view video stream, which are video streams obtained by encoding compliant with the H.264 AVC/MVC profile standard, have no definition of a GOP structure, just like the H.264/AVC video stream.
The base view video stream is an H.264/AVC video stream. Therefore, the GOP structure of the base view video stream has the same structure as that of the H.264/AVC video stream defined in the BD-ROM standard.
The GOP structure of the dependent view video stream is also defined as the same structure as that of the base view video stream, i.e., the GOP structure of the H.264/AVC video stream defined in the BD-ROM standard.
The GOP structure of an H.264/AVC video stream defined in the BD-ROM standard has the following characteristics.
1. Features relating to flow structure
(1) Open GOP/closed GOP structure
Fig. 36 shows a closed GOP structure.
Each picture in FIG. 36 is a picture constituting an H.264/AVC video stream. The closed GOP includes an IDR (instantaneous decoding refresh) picture.
An IDR picture is an I picture, and is the first picture decoded in the GOP including that IDR picture. When an IDR picture is decoded, all decoding-related information, such as the state of the reference picture buffer (the DPB 151 in fig. 22), the frame numbers managed so far, and the POC (picture order count), is reset.
As shown in fig. 36, for a current GOP that is a closed GOP, pictures of the current GOP whose display order is earlier than (older than) that of the IDR picture are prevented from referring to pictures in the previous GOP.
In addition, among the pictures of the current GOP, pictures whose display order is later than (newer than) that of the IDR picture are prevented from referring to pictures beyond the IDR picture in the previous GOP. In H.264/AVC, a P picture following an I picture in display order is allowed to refer to a picture preceding that I picture.
Fig. 37 shows an open GOP structure.
As shown in fig. 37, for a current GOP that is an open GOP, pictures of the current GOP whose display order is earlier than (older than) that of a non-IDR I picture (an I picture other than an IDR picture) are allowed to refer to pictures in the previous GOP.
In addition, pictures of the current GOP whose display order is later than that of the non-IDR I picture are prohibited from referring to pictures beyond the non-IDR I picture in the previous GOP.
(2) The SPS and PPS must be encoded in the first access unit of the GOP.
An SPS (sequence parameter set) is header information of a sequence, including information related to encoding of the entire sequence. When decoding a certain sequence, the SPS, which includes identification information of the sequence and the like, is required first. A PPS (picture parameter set) is header information of a picture, including information related to encoding of the entire picture.
(3) Up to 30 PPSs may be encoded in the first access unit of a GOP. If multiple PPSs are encoded in the first access unit, the id (pic_parameter_set_id) of each PPS must not be the same.
(4) Up to one PPS may be encoded in an access unit other than the first access unit of a GOP.
2. Features relating to reference structures
(1) I, P, and B pictures are required to be pictures configured of only I slices, only P slices, and only B slices, respectively.
(2) It is required that B pictures immediately preceding a reference picture (I or P picture) in display order must be encoded immediately after their reference picture in encoding order.
(3) The coding order and display order of reference pictures (I or P pictures) are required to be maintained (same).
(4) B pictures are prohibited from being referenced from P pictures.
(5) If a non-reference B picture (B1) is earlier in coding order than another non-reference B picture (B2), B1 is also required to be earlier in display order than B2.
A non-reference B picture is a B picture that is not referenced by another picture later in display order.
(6) The reference B picture may refer to the last or next reference picture (I or P picture) in display order.
(7) A non-reference B picture may refer to the last or next reference picture (I or P picture) in display order, or one reference B picture.
(8) The number of consecutive B pictures is required to be at most 3.
3. Features relating to maximum number of frames or fields within a GOP
The maximum number of frames or fields within one GOP is specified according to the frame rate of the video, as shown in fig. 38.
As shown in fig. 38, for example, if interlaced display is performed at a frame rate of 29.97 frames per second, the maximum number of fields that can be displayed with the pictures of one GOP is 60. In addition, if progressive display is performed at a frame rate of 59.94 frames per second, the maximum number of frames that can be displayed with the pictures of one GOP is 60.
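As a rough illustration of this constraint, the hypothetical check below encodes only the two example values stated above; the full table of fig. 38 is not reproduced here, and the function names are assumptions.

```python
# Sketch: check the number of displayed frames/fields in one GOP against the
# limit for the video frame rate. Only the two example values stated in the
# text are included; the other rows of fig. 38 are omitted.

MAX_UNITS_PER_GOP = {
    (29.97, "interlaced"): 60,   # fields
    (59.94, "progressive"): 60,  # frames
}

def gop_within_limit(frame_rate, scan_type, units_in_gop):
    limit = MAX_UNITS_PER_GOP.get((frame_rate, scan_type))
    if limit is None:
        raise ValueError("limit for this frame rate is not listed in this sketch")
    return units_in_gop <= limit

print(gop_within_limit(29.97, "interlaced", 60))   # True
print(gop_within_limit(59.94, "progressive", 72))  # False
```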
The GOP structure having the above-described characteristics is also defined as a GOP structure of the dependent view video stream.
In addition, it is also specified as a constraint that the structure of a certain GOP of the base view video stream matches the structure of a GOP of the corresponding dependent view video stream.
Fig. 39 shows a closed GOP structure of the base view video stream or dependent view video stream as defined above.
As shown in fig. 39, for a current GOP that is a closed GOP, a picture whose display order among pictures of the current GOP is prior to (older than) an IDR picture or an anchor picture is prohibited from referring to a picture of the last GOP. Anchor pictures will be described later.
In addition, pictures in the current GOP that are displayed in order later than the (newer) IDR picture or anchor picture are prohibited from referring to pictures in the last GOP that are beyond the IDR picture or anchor picture.
Fig. 40 shows an open GOP structure of the base view video stream or the dependent view video stream.
As shown in fig. 40, for a current GOP that is an open GOP, pictures that are displayed in order before a non-IDR anchor picture (an anchor picture that is not an IDR picture) in pictures of the current GOP are allowed to refer to pictures of the previous GOP.
In addition, pictures in the current GOP that are displayed in a later order than the non-IDR anchor picture are prohibited from referencing pictures in the previous GOP that are beyond the non-IDR anchor picture.
The GOP structure is defined as described above, so that a certain GOP of the base view video stream and a GOP of the corresponding dependent view video stream match in terms of the characteristics of the stream structure such as an open GOP or a closed GOP.
In addition, a plurality of features of the picture reference structure are matched so that a picture of the dependent view video corresponding to a non-reference B picture of the base view video is necessarily a non-reference B picture.
Further, the number of frames and the number of fields are also matched between a certain GOP of the base view video stream and a GOP of the corresponding dependent view video stream.
In this way, the GOP structure of the dependent view video stream is defined as the same structure as that of the base view video stream, so that the corresponding GOPs between the streams can have the same characteristics.
In addition, even if decoding is performed from the middle of a stream, it can be performed without problems. Decoding starting from the middle of a stream is performed, for example, at the time of trick play or random access.
If the structures of corresponding GOPs differ between the streams, for example if the numbers of frames differ, a situation may occur in which one stream can be played normally while the other stream cannot be played; this can be prevented.
If the structures of corresponding GOPs were to differ between the streams and decoding were started from the middle of the streams, a picture of the base view video necessary for decoding the dependent view video might not have been decoded yet. In that case, the dependent view video picture cannot be decoded, and thus 3D display cannot be performed. In addition, depending on the implementation, the image of the base view video might not be output either; such inconvenience can also be prevented.
[EP_map]
The start positions of decoding at the time of random access and trick play can be set in the EP_map by using the GOP structures of the base view video stream and the dependent view video stream. The EP_map is included in the Clip information file.
The following two constraints are specified as constraints on pictures that can be set to the EP_map as decoding start positions.
1. The position of an anchor picture arranged following SubsetSPS, or the position of an IDR picture arranged following SubsetSPS, is taken as a position that can be set to the EP_map of the dependent view video stream.
An anchor picture is a picture specified by the H.264 AVC/MVC profile standard, and is a picture of the dependent view video stream encoded by performing reference between views instead of reference in the temporal direction.
2. If a certain picture of the dependent view video stream is set to the EP_map as a decoding start position, the corresponding picture of the base view video stream is also set to the EP_map as a decoding start position.
Fig. 41 shows an example of decoding start positions set to the EP_map that satisfy the above-described two constraints.
In fig. 41, pictures constituting the base view video stream and pictures constituting the dependent view video stream are shown in decoding order.
Picture P1, shown in color among the pictures of the dependent view video stream, is an anchor picture or an IDR picture. The access unit including the data of picture P1 includes SubsetSPS.
In the example of fig. 41, as indicated by white arrow #11, picture P1 is set to the EP_map of the dependent view video stream as a decoding start position.
Picture P11, which is the picture of the base view video stream corresponding to picture P1, is an IDR picture. As indicated by white arrow #12, picture P11, being an IDR picture, is also set to the EP_map of the base view video stream as a decoding start position.
If decoding is started from pictures P1 and P11 in response to an instruction for random access or trick play, decoding of picture P11 is performed first. Picture P11 is an IDR picture, and thus it can be decoded without referring to another picture.
When decoding of picture P11 is completed, picture P1 is decoded next. The decoded picture P11 is referenced when decoding picture P1. Picture P1 is an IDR picture or an anchor picture, and thus decoding of picture P1 can be performed once decoding of picture P11 has been completed.
Subsequently, decoding is performed in order on, for example, the picture of the base view video following picture P11, the picture of the dependent view video following picture P1, and so on.
The structures of the corresponding GOPs are the same and decoding is started from corresponding positions, and thus, for both the base view video and the dependent view video, the picture set to the EP_map and the subsequent pictures can be decoded without problems. Random access can thus be realized.
The pictures arranged on the left side of the vertical dotted line shown in fig. 41 are pictures that do not need to be decoded.
Fig. 42 illustrates a problem that would result if the GOP structure of dependent view video was not defined.
In the example of fig. 42, picture P21, shown in color, which is an IDR picture of the base view video, is set to the EP_map as a decoding start position.
Consider a case where decoding is started from picture P21 of the base view video, and picture P31 of the dependent view video, which is the picture corresponding to picture P21, is not an anchor picture. If the GOP structure is not defined, it is not guaranteed that the picture of the dependent view video corresponding to an IDR picture of the base view video is an IDR picture or an anchor picture.
In this case, even when decoding of picture P21 of the base view video is completed, picture P31 cannot be decoded. Reference in the temporal direction is also necessary for decoding picture P31, but the pictures on the left side of the vertical dotted line are not decoded.
Picture P31 cannot be decoded, and as a result, the other pictures of the dependent view video that reference picture P31 cannot be decoded either.
This situation can be prevented by defining the GOP structure of the dependent view video stream.
For both the base view video and the dependent view video, the decoding start positions are set to the EP_map, so that the playback device 1 can easily determine the decoding start positions.
If only a certain picture of the base view video were set to the EP_map as a decoding start position, the playback device 1 would need to determine, by calculation, the picture of the dependent view video corresponding to the picture at the decoding start position, and the processing would become complicated.
Even if corresponding pictures of the base view video and the dependent view video have the same DTS/PTS, if the bit rates of the videos differ, the byte arrangements in the TS do not match, and in this case as well the processing would become complicated.
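Constraint 2 above can be sketched as a simple consistency check. The list-of-dictionaries representation of EP_map entries and the use of PTS values as the correspondence key are assumptions made only for illustration.

```python
# Sketch of constraint 2: every decoding start position set to the EP_map of
# the dependent view video stream must have a corresponding entry in the
# EP_map of the base view video stream. PTS values are used here as the
# correspondence key, which is an assumption for illustration.

def ep_maps_consistent(base_ep_entries, dependent_ep_entries):
    base_pts = {entry["PTS_EP_start"] for entry in base_ep_entries}
    return all(entry["PTS_EP_start"] in base_pts for entry in dependent_ep_entries)
```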
Fig. 43 shows the concept of the picture search required when performing random access or trick play with an MVC stream composed of a base view video stream and a dependent view video stream as the target.
As shown in fig. 43, when random access or trick play is performed, a non-IDR anchor picture or an IDR picture is searched for, and the decoding start position is determined.
Now, the EP_map will be described. The description will be made for the case where the decoding start positions of the base view video are set to the EP_map of the base view video, but the decoding start positions of the dependent view video are set to the EP_map of the dependent view video in the same manner.
Fig. 44 shows the structure of an AV stream recorded on the optical disc 2.
The TS including the base view video stream is configured of an integer number of aligned units having a size of 6144 bytes.
An aligned unit is composed of 32 source packets. A source packet has 192 bytes. One source packet is composed of a 4-byte TP_extra_header and a 188-byte transport packet.
Data of the base view video is packetized into MPEG2 PES packets. The PES packet is formed by adding a PES packet header to the data portion of the PES packet. The PES packet includes a stream ID for determining the type of elementary stream to be transmitted through the PES packet.
The PES packets are also packetized into transport packets. That is, the PES packet is divided into the size of the payload of the transport packet, and the transport packet header is added to the payload, thereby forming the transport packet. The transport packet header includes a PID as identification information of data to be stored in the payload.
Note that a source packet number is provided to each source packet; the source packet number is, for example, 0 at the beginning of the Clip AV stream and is incremented by 1 for each source packet. In addition, an aligned unit starts from the first byte of a source packet.
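The byte layout described above can be sketched as follows. The code only slices a buffer according to the stated sizes (192-byte source packets, a 4-byte TP_extra_header, a 188-byte transport packet, and 32 source packets per aligned unit); the function and variable names are illustrative.

```python
# Sketch: split a Clip AV stream buffer into source packets according to the
# sizes stated above.

SOURCE_PACKET_SIZE = 192      # 4-byte TP_extra_header + 188-byte transport packet
TP_EXTRA_HEADER_SIZE = 4
ALIGNED_UNIT_SIZE = 6144      # 32 source packets

def iter_source_packets(clip_av_stream: bytes):
    assert len(clip_av_stream) % ALIGNED_UNIT_SIZE == 0
    for spn in range(len(clip_av_stream) // SOURCE_PACKET_SIZE):
        start = spn * SOURCE_PACKET_SIZE
        packet = clip_av_stream[start:start + SOURCE_PACKET_SIZE]
        tp_extra_header = packet[:TP_EXTRA_HEADER_SIZE]
        transport_packet = packet[TP_EXTRA_HEADER_SIZE:]
        # The source packet number starts from 0 and increases by 1 per packet.
        yield spn, tp_extra_header, transport_packet
```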
Given the time stamp of an access point of the Clip, the EP_map is used to search for the data address at which to start reading data within the Clip AV stream file. The EP_map is a list of entry points extracted from the elementary stream and the transport stream.
The EP_map has address information for searching for an entry point at which to start decoding within the AV stream. One piece of EP data in the EP_map is configured of a pair of the PTS of an access unit and the address corresponding to that PTS in the AV stream. In AVC/H.264, data equivalent to one picture is stored in one access unit.
FIG. 45 shows an example of a Clip AV stream.
The Clip AV stream in fig. 45 is a video stream (base view video stream) composed of source packets identified by PID = x. The video stream is distinguished, for each source packet, by the PID included in the header of the transport packet within the source packet.
In fig. 45, the source packets including the first byte of an IDR picture among the source packets of the video stream are marked with color. The boxes without color indicate source packets including data that is not a random access point, or source packets including data of another stream.
For example, the source packet having source packet number X1, which is distinguished by PID = x and includes the first byte of a randomly accessible IDR picture of the video stream, is arranged at the position of PTS = pts(x1) on the time axis of the Clip AV stream.
Similarly, the next source packet including the first byte of a randomly accessible IDR picture is the source packet having source packet number X2, and is arranged at the position of PTS = pts(x2).
Fig. 46 conceptually shows an example of the EP_map corresponding to the Clip AV stream in fig. 45.
As shown in fig. 46, the EP_map is configured of stream_PID, PTS_EP_start, and SPN_EP_start.
stream_PID represents the PID of the transport packets for transmitting the video stream.
PTS_EP_start represents the PTS of an access unit starting from a randomly accessible IDR picture.
SPN_EP_start represents the address of the source packet including the first byte of the access unit to be referenced by the value of PTS_EP_start.
The PID of the video stream is stored in stream_PID, and EP_map_for_one_stream_PID(), which is table information indicating the correlation between PTS_EP_start and SPN_EP_start, is generated.
For example, PTS = pts(x1) and source packet number X1, PTS = pts(x2) and source packet number X2, ..., and PTS = pts(xk) and source packet number Xk are each described in a correlated manner in EP_map_for_one_stream_PID[0] of the video stream of PID = x.
Such a table is also generated for each of the video streams multiplexed in the same Clip AV stream. The EP_map including the generated tables is stored in the Clip information file corresponding to the Clip AV stream.
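A simplified sketch of building such a table is given below. The dictionary-based representation and the example values are assumptions made for illustration and do not follow the binary syntax of the Clip information file.

```python
# Sketch: build an EP_map entry table for one video stream (one PID) from
# pairs of PTS_EP_start and SPN_EP_start, as in fig. 46.

def build_ep_map_for_one_stream_pid(stream_pid, entry_points):
    """entry_points: iterable of (pts_ep_start, spn_ep_start) pairs,
    one pair per randomly accessible IDR picture."""
    return {
        "stream_PID": stream_pid,
        "entries": [
            {"PTS_EP_start": pts, "SPN_EP_start": spn}
            for pts, spn in sorted(entry_points)
        ],
    }

# Example with placeholder values.
ep_map_for_one_stream_pid = build_ep_map_for_one_stream_pid(
    stream_pid=0x1011,
    entry_points=[(90000, 1), (270000, 500), (450000, 1200)],
)
```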
Fig. 47 shows an example of the data structure of the source packet specified by SPN_EP_start.
As described above, a source packet is configured in a form in which a 4-byte header is added to a 188-byte transport packet. The transport packet portion is composed of a header portion (TP header) and a payload portion. SPN_EP_start represents the source packet number of the source packet including the first byte of an access unit starting from an IDR picture.
In AVC/H.264, an access unit (i.e., a picture) starts from an AU delimiter (access unit delimiter). The AU delimiter is followed by the SPS and PPS. Thereafter, the leading portion or the whole of the slice data of the IDR picture is stored.
A value of 1 for payload_unit_start_indicator in the TP header of a transport packet indicates that a new PES packet starts from the payload of that transport packet. An access unit starts from such a source packet.
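The role of payload_unit_start_indicator can be sketched as below. Parsing is reduced to the single flag relevant here, and the helper names are assumptions; a real implementation would also confirm that the payload actually begins with an AU delimiter of an IDR picture before using the packet as SPN_EP_start.

```python
# Sketch: find source packets whose transport packet starts a new PES packet
# (payload_unit_start_indicator = 1). Such a packet can carry the first byte
# of an access unit, which is what SPN_EP_start points at.

def starts_new_pes_packet(transport_packet: bytes) -> bool:
    # TS packet: sync byte 0x47, then in the next byte one bit of
    # transport_error_indicator followed by payload_unit_start_indicator.
    return transport_packet[0] == 0x47 and bool(transport_packet[1] & 0x40)

def candidate_spn_ep_start(source_packets):
    """source_packets: iterable of (spn, transport_packet) pairs."""
    return [spn for spn, tp in source_packets if starts_new_pes_packet(tp)]
```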
Such an EP_map is prepared for each of the base view video stream and the dependent view video stream.
Fig. 48 shows the sub tables included in the EP_map.
As shown in fig. 48, the EP_map is divided into EP_coarse and EP_fine as sub tables. The sub table EP_coarse is a table for performing searches in coarse units, and the sub table EP_fine is a table for performing searches in finer units.
As shown in fig. 48, the sub table EP_fine is a table in which the entries PTS_EP_fine and SPN_EP_fine are associated with each other. Each entry in the sub table is provided with an entry number in ascending order, with the top row being "0". For the sub table EP_fine, the data width of the combination of the entry PTS_EP_fine and the entry SPN_EP_fine is 4 bytes.
The sub table EP_coarse is a table in which the entries ref_to_EP_fine_id, PTS_EP_coarse, and SPN_EP_coarse are associated with each other. The data width of the combination of the entries ref_to_EP_fine_id, PTS_EP_coarse, and SPN_EP_coarse is 8 bytes.
An entry of the sub table EP_fine is composed of the bit information on the LSB (least significant bit) side of each of the entry PTS_EP_start and the entry SPN_EP_start. An entry of the sub table EP_coarse is composed of the bit information on the MSB (most significant bit) side of each of the entry PTS_EP_start and the entry SPN_EP_start, and the entry number in the table of the sub table EP_fine corresponding thereto. This entry number is the number of the entry in the sub table EP_fine having the bit information on the LSB side extracted from the same data PTS_EP_start.
Fig. 49 shows an example of the formats of the entry PTS_EP_coarse and the entry PTS_EP_fine.
The data length of the entry PTS_EP_start is a 33-bit value. If the MSB bit is the 32nd bit and the LSB bit is the 0th bit, the 14 bits from the 32nd bit to the 19th bit of the entry PTS_EP_start are used for the entry PTS_EP_coarse. With the entry PTS_EP_coarse, a search can be performed in a range of up to 26.5 hours with a resolution of 5.8 seconds.
In addition, the 11 bits from the 19th bit to the 9th bit of the entry PTS_EP_start are used for the entry PTS_EP_fine. With the entry PTS_EP_fine, a search can be performed in a range of up to 11.5 seconds with a resolution of 5.7 milliseconds. Note that the 19th bit is shared by the entry PTS_EP_coarse and the entry PTS_EP_fine. In addition, the 9 bits in total from the 0th bit to the 8th bit on the LSB side are not used.
Fig. 50 shows an example of the formats of the entry SPN_EP_coarse and the entry SPN_EP_fine.
The data length of the entry SPN_EP_start is a 32-bit value. If the MSB bit is the 31st bit and the LSB bit is the 0th bit, all bits from the 31st bit to the 0th bit of the entry SPN_EP_start are used for the entry SPN_EP_coarse.
In addition, the 17 bits from the 16th bit to the 0th bit of the entry SPN_EP_start are used for the entry SPN_EP_fine.
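The bit splits described above can be written out directly as follows; this sketch merely restates the stated bit ranges, and the function names are assumptions.

```python
# Sketch: derive the coarse/fine entries from PTS_EP_start (33 bits) and
# SPN_EP_start (32 bits) according to the bit ranges stated above.

def pts_ep_coarse(pts_ep_start: int) -> int:
    # 14 bits, bit 32 down to bit 19
    return (pts_ep_start >> 19) & 0x3FFF

def pts_ep_fine(pts_ep_start: int) -> int:
    # 11 bits, bit 19 down to bit 9 (bit 19 is shared with the coarse entry)
    return (pts_ep_start >> 9) & 0x7FF

def spn_ep_coarse(spn_ep_start: int) -> int:
    # all 32 bits, bit 31 down to bit 0
    return spn_ep_start & 0xFFFFFFFF

def spn_ep_fine(spn_ep_start: int) -> int:
    # 17 bits, bit 16 down to bit 0
    return spn_ep_start & 0x1FFFF
```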
How to determine the read start address when random access is performed using EP_coarse and EP_fine will be described later. The EP_map is also described in, for example, Japanese Unexamined Patent Application Publication No. 2005-348314.
[Operation 3]
At the time of decoding, the same value as the POC (picture order count) value of the corresponding picture of the base view video stream is used as the POC value of a picture of the dependent view video stream. The POC is a value representing the display order of pictures specified by the AVC/H.264 standard, and is obtained by calculation at the time of decoding.
For example, the POC value of a picture of the base view video stream is obtained by calculation, and the pictures of the base view video stream are output from the decoder in the order indicated by the obtained values. In addition, at the same time that a picture of the base view video stream is output, the corresponding picture of the dependent view video stream is also output. Thus, effectively the same value as the POC value of the picture of the base view video stream is used as the POC value of the picture of the dependent view video stream.
In addition, SEI (supplemental enhancement information) is added to data of each picture constituting the base view video stream and the dependent view video stream. The SEI is additional information including secondary information related to decoding specified by h.264/AVC.
The picture timing SEI, which is one of the SEI, includes a read time point from a CPB (coded picture buffer) at the time of decoding, a read time point from a DPB, and the like. In addition, information of a display time point, information of a picture structure, and the like are also included in the picture timing SEI.
Fig. 51 shows the configuration of the access unit.
As shown in fig. 51, an access unit of the base view video, including data of one picture of the base view video stream, and a dependent unit of the dependent view video, including data of one picture of the dependent view video stream, have the same configuration. One unit is composed of a delimiter indicating the boundary of the unit, an SPS, a PPS, SEI, and picture data.
At the time of encoding, the picture timing SEI to be added to a picture of the base view video stream and the picture timing SEI to be added to the corresponding picture of the dependent view video stream are handled in a unified manner.
For example, if picture timing SEI representing a read time point of T1 from the CPB is added to the first picture in encoding order of the base view video stream, picture timing SEI representing a read time point of T1 from the CPB is also added to the first picture in encoding order of the dependent view video stream.
That is, picture timing SEI having the same content for the corresponding pictures in encoding order or decoding order is added to each picture of the base view video stream and the dependent view video stream.
Thus, the playback device 1 can treat view components to which the same picture timing SEI has been added as corresponding view components in decoding order.
The picture timing SEI is included in the elementary stream of the base view video and the dependent view video, and is referred to by the video decoder 110 in the playback device 1.
The video decoder 110 can identify the corresponding view components based on information included in the elementary streams. Also, the video decoder 110 can perform the decoding processing in the correct decoding order based on the picture timing SEI.
The corresponding view components are identified without referring to a PlayList or the like, so that handling is possible even if a problem occurs regarding the system layer or a layer above it. In addition, a decoder implementation that does not depend on the layer in which the problem occurred can be realized.
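A minimal sketch of this pairing is shown below. The representation of a view component as a dictionary holding the CPB read time point taken from its picture timing SEI is an assumption for illustration only.

```python
# Sketch: pair base view and dependent view view components using the picture
# timing SEI added to both (the same CPB read time point is written to
# corresponding pictures).

def pair_view_components(base_units, dependent_units):
    """Each unit is assumed to be a dict with a "cpb_removal_time" value taken
    from its picture timing SEI."""
    dependent_by_time = {u["cpb_removal_time"]: u for u in dependent_units}
    pairs = []
    for base in base_units:
        dep = dependent_by_time.get(base["cpb_removal_time"])
        if dep is not None:
            pairs.append((base, dep))
    return pairs
```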
[Configuration of recording apparatus]
Fig. 52 is a block diagram showing a configuration example of a recording apparatus for performing encoding according to the above-described operation to record a base view video stream and a dependent view video stream into a recording medium.
In the recording apparatus 501 in fig. 52, the base view video stream is generated, and the D1 view video stream is also generated as the dependent view video stream. That is, in the recording apparatus 501, the information of Depth described with reference to fig. 3 is not generated.
As shown in fig. 52, the recording apparatus 501 includes an information generating unit 511, an MVC encoder 512, and a recording unit 513. The information generating unit 511 corresponds to the data encoder 315 in fig. 29 described above, and the MVC encoder 512 corresponds to the video encoder 311 in fig. 29. The L image data and the R image data are input to the MVC encoder 512.
The information generating unit 511 generates database information composed of a PlayList file, a Clip information file including the EP_map of the base view video, and a Clip information file including the EP_map of the dependent view video. The generation of the database information by the information generating unit 511 is performed in accordance with input by a user (content creator) of the recording apparatus 501. The information generating unit 511 outputs the generated database information to the recording unit 513.
In addition, the information generating unit 511 also generates additional information of the base view video, such as SPS, PPS, SEI, or the like in fig. 51 to be added to each picture of the base view video, and additional information of the dependent view video, such as SPS, PPS, SEI, or the like to be added to each picture of the dependent view video. The additional information of the base view video and the additional information of the dependent view video to be generated by the information generating unit 511 each include a picture timing SEI. The information generation unit 511 outputs the generated additional information to the MVC encoder 512.
The MVC encoder 512 encodes the L image data and the R image data in accordance with the h.264AVC/MVC profile standard, and generates data of each picture of base view video obtained by encoding the L image data and data of each picture of dependent view video obtained by encoding the R image data.
In addition, the MVC encoder 512 also generates a base view video stream by adding additional information of the base view video generated by the information generating unit 511 to data of each picture of the base view video. Similarly, the MVC encoder 512 generates a dependent view video stream by adding the additional information of the dependent view video generated by the information generating unit 511 to the data of each picture of the dependent view video.
The MVC encoder 512 outputs the generated base view video stream and dependent view video stream to the recording unit 513.
The recording unit 513 records the database information supplied from the information generating unit 511 and the base view video stream and the dependent view video stream supplied from the MVC encoder 512 into a recording medium such as a BD. The recording medium in which the recording unit 513 has recorded data is supplied to the device on the playback side, for example, as the above-described optical disc 2.
Note that in the recording unit 513, a plurality of types of processing are performed before the base view video stream and the dependent view video stream are recorded. For example, processing for multiplexing the base view video stream and the dependent view video stream into the same TS or multiplexing each of these streams into a different TS together with other data, processing for removing the MVC header from the access unit of the base view video, packetizing processing for dividing the base view video stream and the dependent view video stream into source packets, and the like are performed.
Fig. 53 is a block diagram showing a configuration example of the MVC encoder 512 in fig. 52.
As shown in fig. 53, the MVC encoder 512 includes a base view video encoder 521 and a dependent view video encoder 522. The L image data is input to the base view video encoder 521 and the dependent view video encoder 522, and the R image is input to the dependent view video encoder 522. An arrangement may be made in which R image data is input to the base view video encoder 521 and encoded as base view video.
The base view video encoder 521 encodes the L image data according to the H.264/AVC standard. In addition, the base view video encoder 521 adds the additional information of the base view video to each picture obtained by encoding, and outputs the result as the base view video stream.
The dependent view video encoder 522 encodes the R image data according to the H.264 AVC/MVC profile standard, referring to the L image data as appropriate. In addition, the dependent view video encoder 522 adds the additional information of the dependent view video to each picture obtained by encoding, and outputs the result as the dependent view video stream.
[Operation of recording apparatus]
The recording process of the recording apparatus 501 will now be described with reference to the flowchart in fig. 54.
In step S1, the information generating unit 511 generates database information composed of a PlayList file, a Clip information file, and additional information to be added to each picture of the L image data and the R image data.
In step S2, the MVC encoder 512 executes encoding processing. The base view video stream and dependent view video stream generated by the encoding process are supplied to the recording unit 513.
In step S3, the recording unit 513 records the database information generated by the information generating unit 511 and the base view video stream and the dependent view video stream generated by the MVC encoder 512 into a recording medium. Then, the process ends.
Next, the encoding process performed in step S2 in fig. 54 will be described with reference to the flowchart in fig. 55.
In step S11, the base view video encoder 521 selects one picture (one frame) of the input L image as an encoding target picture.
In step S12, the base view video encoder 521 determines whether the L image to be encoded is to be encoded as an I picture or an IDR picture. If encoding conditions, such as the number of pictures constituting one GOP and the number of I pictures or IDR pictures included in one GOP, have been set, the picture type of the L image to be encoded is determined, for example, according to the position of the picture in the encoding order.
If it is determined in step S12 that the L image to be encoded is encoded as an I picture or an IDR picture, the base view video encoder 521 determines the type of the L image to be encoded as an I picture or an IDR picture in step S13.
In step S14, the dependent view video encoder 522 detects, among the input R images, the picture corresponding to the L image whose picture type was determined in step S13. As described above, the L image and the R image located at the same time point and the same position when the images are arranged in display order or coding order become corresponding pictures.
In step S15, the dependent view video encoder 522 determines that the picture type of the detected R image is an anchor picture.
On the other hand, if it is determined in step S12 that the L image to be encoded is not encoded as an I picture or an IDR picture, the base view video encoder 521 determines a picture type according to the position of the L image to be encoded in step S16.
In step S17, the dependent view video encoder 522 detects one picture corresponding to the L image of which the picture type has been determined in step S16 among the input R images.
In step S18, the dependent view video encoder 522 determines, as the picture type of the detected R image, the same picture type as that of the L image selected as the encoding target.
In step S19, the base view video encoder 521 encodes an L image to be encoded according to the determined picture type. Also, the dependent view video encoder 522 encodes the R image detected in step S14 or S17 according to the determined picture type.
In step S20, the base view video encoder 521 adds additional information to the base view video picture obtained by encoding. Also, the dependent view video encoder 522 adds additional information to the dependent view video obtained by encoding.
In step S21, the base view video encoder 521 determines whether the L image currently selected as the encoding target is the last picture.
If it is determined in step S21 that the L image currently selected as the encoding target is not the last picture, the flow returns to step S11, the picture to be encoded is switched, and the above-described processing is repeated. If it is determined in step S21 that the currently selected L image is the last picture, the flow returns to step S2 in fig. 54, and the subsequent processing is performed.
According to the above-described processing, the data of the L picture and the data of the R picture can be encoded so that the encoded base view video stream and the encoded dependent view video stream have the same GOP structure.
In addition, additional information having the same content may be added to the picture of the base view video and the corresponding picture of the dependent view video.
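A rough sketch of the encoding loop of fig. 55 is given below. The helpers decide_picture_type() and the encoder objects with an encode() method are hypothetical and stand in for the base view video encoder 521 and the dependent view video encoder 522; the mapping of non-I picture types is simplified.

```python
# Sketch of the fig. 55 loop: the picture type decided for each L image is
# also applied to the corresponding R image, so that the base view and
# dependent view streams end up with the same GOP structure.

def encode_pair_sequence(l_pictures, r_pictures, base_encoder, dep_encoder,
                         decide_picture_type):
    for index, l_picture in enumerate(l_pictures):
        picture_type = decide_picture_type(index)     # e.g. "IDR", "I", "P", "B"
        r_picture = r_pictures[index]                  # corresponding picture
        if picture_type in ("I", "IDR"):
            r_type = "Anchor"                          # step S15
        else:
            r_type = picture_type                      # matched type (steps S16-S18)
        base_encoder.encode(l_picture, picture_type)   # step S19
        dep_encoder.encode(r_picture, r_type, ref=l_picture)
        # Additional information (SPS, PPS, SEI) is then added to both
        # encoded pictures with matching content (step S20).
```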
[Configuration of playback apparatus]
Fig. 56 is a block diagram showing a configuration example of a playback device for playing a recording medium in which data is recorded by the recording device 501.
As shown in fig. 56, the playback device 502 includes an acquisition unit 531, a control unit 532, an MVC decoder 533, and an output unit 534. The acquisition unit 531 corresponds to, for example, the disk drive 52 in fig. 20, and the control unit 532 corresponds to the controller 51 in fig. 20. The MVC decoder 533 corresponds to a partial configuration of the decoder unit 56 in fig. 20.
The acquisition unit 531 reads data from a recording medium, in which the recording apparatus 501 has recorded data, installed in the playback apparatus 502, according to the control of the control unit 532. The acquisition unit 531 outputs the database information read from the recording medium to the control unit 532, and outputs the base view video stream and the dependent view video stream to the MVC decoder 533.
The control unit 532 controls the overall operation of the playback device 502, such as reading data from a recording medium.
For example, the control unit 532 obtains the database information by controlling the acquisition unit 531 to read the database information from the recording medium. In addition, if playback of a PlayList for 3D playback (a PlayList whose value of 3D_PL_type in fig. 13 is 01) included in the obtained database information is instructed, the control unit 532 supplies information such as the stream ID described in the PlayList to the acquisition unit 531, and controls the acquisition unit 531 to read the base view video stream and the dependent view video stream from the recording medium. The control unit 532 controls the MVC decoder 533 to decode the base view video stream and the dependent view video stream.
The MVC decoder 533 decodes the base view video stream and the dependent view video stream according to the control of the control unit 532. The MVC decoder 533 outputs the data obtained by decoding the base view video stream and the dependent view video stream to the output unit 534. For example, the MVC decoder 533 outputs the data obtained by decoding the base view video stream as L image data, and the data obtained by decoding the dependent view video stream as R image data, according to the view_type (fig. 14).
The output unit 534 outputs the L image data and the R image data supplied from the MVC decoder 533 to a display to display the L image and the R image.
Fig. 57 is a block diagram showing a configuration example of the MVC decoder 533.
As shown in fig. 57, the MVC decoder 533 includes a CPB541, a decoder 542, and a DPB 543. The CPB541 includes the B video buffer 106 and the D video buffer 108 in fig. 22. The decoder 542 corresponds to the video decoder 110 in fig. 22, and the DPB543 corresponds to the DPB 151 in fig. 22. Although not shown in the drawing, a circuit corresponding to the switch 109 in fig. 22 is also provided between the CPB541 and the decoder 542.
The CPB541 stores the data of the base view video stream and the data of the dependent view video stream supplied from the acquiring unit 531. The decoder 542 reads out data of the base view video stream stored in the CPB541 in units of data constituting one access unit. The decoder 542 also similarly reads data of the dependent view video stream stored in the CPB541 in units of data constituting one dependent unit.
The decoder 542 decodes the data read from the CPB541, and outputs data of each picture of the base view video and the dependent view video obtained by the decoding to the DPB 543.
The DPB543 stores the data supplied from the decoder 542. The decoder 542 appropriately refers to data of each picture of the base view video and the dependent view video stored in the DPB543 when decoding a subsequent picture in decoding order. Data of each picture stored in the DPB543 is output according to a display time point of each picture represented by the picture timing SEI.
[Operation of playback apparatus]
The playback process of the playback device 502 will now be described with reference to the flowchart of fig. 58.
Note that fig. 58 shows each step as performing the processing on the dependent view video stream after performing the processing on the base view video stream, but the processing on the base view video stream and the processing on the dependent view video stream may be performed in parallel as appropriate. The same is true for other flowcharts relating to the processing of the base view video stream and the processing of the dependent view video stream.
In step S31, the acquisition unit 531 reads data from the recording medium mounted on the playback device 502. The acquisition unit 531 outputs the read database information to the control unit 532, and outputs the data of the base view video stream and the data of the dependent view video stream to the MVC decoder 533.
In step S32, the MVC decoder 533 performs decoding processing.
In step S33, the output unit 534 outputs the L image data and the R image data supplied from the MVC decoder 533 to the display to display the L image and the R image. Subsequently, the process ends.
The decoding process to be performed in step S32 in fig. 58 will now be described with reference to the flowcharts in fig. 59 and 60.
In step S41, the CPB541 stores data of the base view video stream and data of the dependent view video stream. The data stored in the CPB541 is read appropriately by the control unit 532.
In step S42, the control unit 532 detects the boundary of the access unit of the base view video stream with reference to the data stored in the CPB 541. Detecting the boundaries of the access unit is performed by, for example, detecting an access unit delimiter. Data from a certain boundary position to the next boundary position becomes data of one access unit. The data of one access unit includes data of one picture of the base view video and additional information added thereto.
In step S43, the control unit 532 determines whether or not the picture timing SEI has been encoded into (included in) one access unit of the base view video whose boundary has been detected.
If it is determined in step S43 that the picture timing SEI has been encoded, the control unit 532 reads the picture timing SEI in step S44.
In step S45, in accordance with the extracted point-in-time (read point-in-time) described in the read picture timing SEI, the control unit 532 supplies picture data of the base view video, of the data of one access unit whose boundary has been detected, from the CPB541 to the decoder 542.
On the other hand, if it is determined in step S43 that the picture timing SEI has not been encoded, in step S46, the control unit 532 supplies picture data of base view video from among data of one access unit whose boundary has been detected, from the CPB541 to the decoder 542, in accordance with system information (DTS).
In step S47, the decoder 542 decodes the data supplied from the CPB 541. In decoding a picture of base view video, the decoded picture stored in the DPB543 is referred to as appropriate.
In step S48, the DPB543 stores the picture data of the base view video obtained by decoding.
In step S49, the control unit 532 calculates the POC of the decoded picture of the base view video and stores it.
In step S50, the control unit 532 detects the boundary of the slave unit of the slave view video stream, and detects a slave unit in the slave view video stream corresponding to the access unit of the base view video stream whose boundary has been detected in step S42.
In step S51, the control unit 532 determines whether or not the picture timing SEI has been encoded into one dependent unit of the dependent view video whose boundary has been detected.
If it is determined in step S51 that the picture timing SEI has been encoded, the control unit 532 reads the picture timing SEI in step S52.
In step S53, the control unit 532 supplies picture data of the dependent view video from among data of one dependent unit whose boundary has been detected, to the decoder 542 from the CPB541, in accordance with the extracted point-in-time described in the read picture timing SEI.
On the other hand, if it is determined in step S51 that the picture timing SEI has not been encoded, in step S54, the control unit 532 supplies picture data of the dependent view video from among data of one dependent unit, the boundary of which has been detected, from the CPB541 to the decoder 542 in accordance with the system information.
Note that if the decoder for base view video and the decoder for dependent view video are both provided to the MVC decoder 533, the picture data of dependent view video stored in the CPB541 is provided to the decoder for dependent view video at the same timing as when the picture data of base view video is provided from the CPB541 to the decoder for base view video.
In step S55, the decoder 542 decodes the data supplied from the CPB 541. In decoding the pictures of the dependent view video, the decoded picture of the base view video and the decoded picture of the dependent view video stored in the DPB543 are referred to as appropriate.
In step S56, the DPB543 stores the picture data of the dependent view video obtained by decoding. According to the above-described process being repeated, a plurality of pictures of the base view video whose POC values have been calculated and corresponding pictures of the dependent view video are stored into the DPB 543. For pictures of dependent view video, POC values are not calculated.
In step S57, the control unit 532 outputs, from the DPB543, the picture with the smallest POC value among the pictures of the base view video stored in the DPB543, and also outputs, at the same timing, the corresponding picture of the dependent view video from the DPB 543. The pictures output from the DPB543 are supplied to the output unit 534.
If the picture timing SEI is added to a picture of a base view video, the picture of the base view video is output according to a display time point described in the picture timing SEI. On the other hand, if the picture timing SEI is not added, the output is performed according to a display time point represented by system information (PTS).
In step S58, the control unit 532 determines whether all pictures of the base view video and the dependent view video have been output. If it is determined in step S58 that not all pictures have been output, the control unit 532 returns to step S41, and the above-described processing is repeated. If it is determined in step S58 that all pictures have been output, the control unit 532 returns to step S32 in fig. 58, and the subsequent processing is performed.
According to the above-described processing, encoding is performed so as to have the same GOP structure, and the base view video stream and the dependent view video stream, to each of which the same additional information has been added, can be decoded.
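The output step (step S57) of the above decoding process can be sketched as below. The DPB is modeled here as a list of (POC, base picture, dependent picture) tuples, which is an assumption for illustration; actual DPB management follows the picture timing SEI and system information as described above.

```python
# Sketch of step S57: output the base view picture with the smallest POC
# stored in the DPB, and output the corresponding dependent view picture at
# the same timing.

def output_next_pair(dpb):
    if not dpb:
        return None
    entry = min(dpb, key=lambda e: e[0])   # entry with the smallest POC value
    dpb.remove(entry)
    poc, base_picture, dependent_picture = entry
    return base_picture, dependent_picture
```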
Next, the random access playback processing of the playback device 502 performed using the EP_map will be described with reference to the flowchart of fig. 61.
In step S71, the control unit 532 controls the acquisition unit 531 to read the Clip information file of each of the Clip of the base view video stream and the Clip of the dependent view video stream. In addition, the control unit 532 obtains the EP_map of the base view video and the EP_map of the dependent view video. As described above, the EP_map of the base view video and the EP_map of the dependent view video are prepared separately.
In step S72, the control unit 532 obtains a PTS indicating the start time point of random access playback based on an operation by the user or the like. For example, if a chapter set to a video stream has been selected from a menu screen, the PTS of the selected chapter is obtained.
In step S73, the control unit 532 determines the source packet number indicated by the SPN_EP_start corresponding to the PTS of the playback start time point, obtained from the EP_map of the base view video. In addition, the control unit 532 sets, as the read start address, the address on the recording medium at which the source packet identified by the determined source packet number is recorded.
For example, based on the 14 bits on the MSB side among the 32 bits constituting the PTS, a search is performed with EP_coarse, the sub table of the EP_map of the base view video, as the object, and PTS_EP_coarse and the corresponding ref_to_EP_fine_id and SPN_EP_coarse are determined. In addition, based on the determined ref_to_EP_fine_id, a search is performed with EP_fine as the object, and the entry PTS_EP_fine corresponding to the 11-bit value from the 10th bit on the LSB side is determined.
The source packet number indicated by the SPN_EP_coarse corresponding to that PTS_EP_fine is determined, and the address at which the source packet identified by that source packet number is recorded is determined as the read start address. The address of each source packet on the recording medium is determined using the file system for managing the data recorded in the recording medium.
In step S74, the control unit 532 determines the source packet number indicated by the SPN_EP_start corresponding to the PTS of the playback start time point, obtained from the EP_map of the dependent view video. The source packet number indicated by the SPN_EP_start corresponding to the PTS is also determined using the sub tables constituting the EP_map of the dependent view video. In addition, the control unit 532 determines, as the read start address, the address on the recording medium at which the source packet identified by the determined source packet number is recorded.
In step S75, the obtaining unit 531 starts reading data of each source packet constituting the base view video stream from the reading start address set in step S73. In addition, the acquisition unit 531 also starts reading data of each source packet constituting the dependent view video stream from the reading start address set in step S74.
The read data of the base view video stream and the read data of the dependent view video stream are supplied to the MVC decoder 533. The processing described with reference to fig. 59 and 60 is performed, and thus decoding from the playback start position specified by the user is performed.
In step S76, the control unit 532 determines whether the next search is to be performed, that is, whether the start of random access playback from another position has been instructed. If it is determined that the start of random access playback from another position has been instructed, the control unit 532 repeatedly performs the processing of step S71 and thereafter.
If it is determined in step S76 that the start of random access playback from another position is not instructed, the process ends.
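The coarse/fine search in steps S73 and S74 can be sketched as below. This is a simplified, assumption-laden sketch: the table layout loosely follows figs. 48 to 50, the coarse table is assumed to be sorted by PTS_EP_coarse, PTS wrap-around is ignored, and the way the coarse and fine values are combined into the final source packet number is a simplification rather than the exact procedure of the standard.

```python
# Sketch: narrow the range with the coarse table, pick an entry with the fine
# table, and return the source packet number to start reading from.

def find_read_start_spn(ep_map, pts):
    coarse = ep_map["EP_coarse"]   # dicts with PTS_EP_coarse, SPN_EP_coarse,
                                   # ref_to_EP_fine_id (assumed sorted by PTS)
    fine = ep_map["EP_fine"]       # dicts with PTS_EP_fine, SPN_EP_fine

    target_coarse = (pts >> 19) & 0x3FFF
    candidates = [e for e in coarse if e["PTS_EP_coarse"] <= target_coarse]
    if not candidates:
        return None
    coarse_entry = candidates[-1]          # last coarse entry not past the target

    target_fine = (pts >> 9) & 0x7FF
    index = coarse_entry["ref_to_EP_fine_id"]
    while index + 1 < len(fine) and fine[index + 1]["PTS_EP_fine"] <= target_fine:
        index += 1

    # Combine the MSB side of SPN_EP_coarse with the 17 LSBs of SPN_EP_fine
    # (simplified; ignores carry across the 17-bit boundary).
    return (coarse_entry["SPN_EP_coarse"] & ~0x1FFFF) | fine[index]["SPN_EP_fine"]
```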
[Buffer control information]
As described above, in the h.264AVC/MVC profile standard, a base view video stream, which is a video stream as a base, and a dependent view video stream, which is a video stream on which encoding/decoding is performed using the base view video stream as a base, are defined.
In the h.264AVC/MVC profile standard, the base view video stream and the dependent view video stream are allowed to exist as a single video stream or as independent video streams, respectively.
A in fig. 62 shows a state in which the base view video stream and the dependent view video stream exist as a single video stream.
In the example in a of fig. 62, the entire base view video stream and the entire dependent view video stream are each divided for each predetermined section, and a single elementary stream is configured such that the respective sections are mixed. In a of fig. 62, a section denoted by the letter "B" represents a section of the base view video stream, and a section denoted by the letter "D" represents a section of the dependent view video stream.
B of fig. 62 shows a state in which the base view video stream and the dependent view video stream exist as independent video streams, respectively.
In the BD-ROM 3D standard, as shown in B of fig. 62, the base view video stream and the dependent view video stream are required to be recorded on the disc as independent elementary streams. In addition, the base view video stream is required to be a stream encoded according to the H.264/AVC standard. These constraints allow the base view video stream alone to be played (2D playback) even by a BD player that is not compatible with 3D playback.
Therefore, in the BD-ROM 3D standard, the streams need to be encoded in advance on the recording apparatus side so that they can be played correctly both in the case where only the base view video stream encoded according to the H.264/AVC standard is played and in the case where the base view video stream and the dependent view video stream are played together. Specifically, encoding needs to be performed so that the buffer does not underflow or overflow.
In the H.264/AVC standard, two kinds of buffer control information are allowed to be encoded into a stream so that the buffer does not underflow or the like. In the BD-ROM 3D standard as well, buffer control information needs to be encoded into the streams, assuming both the case of decoding only the base view video stream and the case of decoding the base view video stream and the dependent view video stream together.
Incidentally, playback apparatuses compatible with the BD-ROM 3D standard include playback apparatuses that decode the base view video stream and the dependent view video stream with one decoder, and playback apparatuses that decode them with two decoders, one for the base view video and one for the dependent view video. In the BD-ROM 3D standard, the number of decoders is not specified.
Therefore, in the BD-ROM 3D standard, whether the base view video stream and the dependent view video stream are decoded with one decoder or with two decoders, the buffer control information needs to be encoded into the streams on the recording apparatus side so that the streams can be played correctly.
Therefore, in the recording apparatus, the buffer control information is encoded as follows.
1. If only the base view video stream is played, a value for enabling such playback to be performed correctly is encoded into the base view video stream.
2. If the dependent view video stream is played with a separate decoder (decoder for dependent view video), a value for enabling such playback to be performed correctly is encoded into the dependent view video stream.
3. If the base view video stream and the dependent view video stream are played together using a single decoder, a value for enabling such playback to be performed correctly is encoded into the dependent view video stream.
[ specific examples of encoding positions ]
In the base view video stream and the dependent view video stream, the HRD parameters and max _ dec _ frame _ buffering are encoded as buffer control information.
The HRD parameters include information representing the maximum bit rate of the input from the CPB to the decoder. They may also include information indicating the maximum bit rate of the input to the CPB, information indicating the buffer size of the CPB, or a flag indicating whether the HRD is CBR (constant bit rate).
max _ dec _ frame _ buffering is information indicating the maximum number of pictures (reference pictures) that can be stored in the DPB.
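As a rough illustration only, the buffer control information described above can be pictured as the following plain data structures. This is a sketch in Python; the class and field names are our own and do not appear in the H.264/AVC or MVC specifications.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HrdParameters:
    """Buffer control information on the CPB side (illustrative field names)."""
    max_input_bit_rate: int          # maximum bit rate of the input to the decoder, in bps
    cpb_size: Optional[int] = None   # buffer size of the CPB, if signaled
    cbr_flag: Optional[bool] = None  # True if the HRD operates at a constant bit rate

@dataclass
class BufferControlInfo:
    """Pair of values encoded per stream, as described in the text."""
    hrd: HrdParameters
    max_dec_frame_buffering: int     # maximum number of pictures storable in the DPB
```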
Fig. 63 shows an example of the encoding positions of the HRD parameters in the base view video stream.
As shown in fig. 63, the HRD parameters are encoded as a single piece of SPS information included in each access unit constituting the base view video stream. In the example shown in fig. 63, the HRD parameters are encoded as a single piece of VUI (video usability information) information included in the SPS.
The HRD parameter in fig. 63 represents the maximum bit rate of the input to the decoder in the case where only the base view video stream is played. If a bus between the CPB and the decoder is used to transmit data of only the base view video stream, the transmission rate is limited to the bit rate represented by the HRD parameters or less.
Note that AUD in fig. 63 denotes the AU delimiter described with reference to fig. 51, and slices corresponding to the data of one picture are included in the access unit in fig. 63.
Fig. 64 shows a description format of seq _ parameter _ set _ data () (SPS) in the case where HRD parameters are encoded into the positions shown in fig. 63.
As shown in fig. 64, HRD _ parameters () (HRD parameters) are described in VUI _ parameters () (VUI) within seq _ parameter _ set _ data ().
Fig. 65 shows an example of the encoding position of max _ dec _ frame _ buffering in the base view video stream.
As shown in fig. 65, max _ dec _ frame _ buffering is also encoded as a single piece of information included in each of the access units constituting the base view video stream. In the example in fig. 65, max _ dec _ frame _ buffering is encoded as a single piece of VUI information included in the SPS.
max_dec_frame_buffering in fig. 65 represents the maximum number of pictures that can be stored in the DPB in the case of playing only the base view video stream. If one DPB is used to store decoded pictures of only the base view video stream, the number of pictures to be stored in the DPB is limited to the number represented by max_dec_frame_buffering or less.
Fig. 66 shows the description format of seq _ parameter _ set _ data () in the case of encoding max _ dec _ frame _ buffering into the positions shown in fig. 65.
As shown in fig. 66, max _ dec _ frame _ buffering is described in vui _ parameters () within seq _ parameter _ set _ data ().
Hereinafter, as shown in fig. 63, the HRD parameters encoded in the base view video stream will be referred to as first HRD parameters as appropriate. In addition, as shown in fig. 65, max _ dec _ frame _ buffering encoded in the base view video stream will be referred to as first max _ dec _ frame _ buffering.
Fig. 67 shows an example of the encoding positions of the HRD parameters in the dependent view video stream.
As shown in fig. 67, the HRD parameters are encoded as a single piece of SubsetSPS information included in each of the dependent units constituting the dependent view video stream. In the example shown in fig. 67, the HRD parameters are encoded as a single piece of SPS information included in the SubsetSPS.
The HRD parameters encoded as a single piece of SPS information represent the maximum bit rate of input to the decoder for dependent view video in the case where the dependent view video stream is played with a separate decoder. If the data of the dependent view video stream only is transmitted using the bus between the CPB and the independent decoder, the transmission rate is limited to the bit rate represented by the HRD parameters or less.
Fig. 68 shows a description format of subset _ seq _ parameter _ set _ data () (SubsetSPS) in the case where the HRD parameters are encoded as a single piece of SPS information. The SubsetSPS is a description of parameters extending the SPS of H.264/AVC, and includes information indicating dependency relationships between views and the like.
As shown in fig. 68, hrd _ parameters () is described in vui _ parameters () within seq _ parameter _ set _ data () in subset _ seq _ parameter _ set _ data ().
In the example in fig. 67, the HRD parameters are also encoded as a single piece of MVC VUI Ext information included in the SubsetSPS.
The HRD parameters encoded as a single piece of MVC VUI Ext information represent the maximum bit rate of the input to a single decoder in the case where the base view video stream and the dependent view video stream are played together with that decoder. If the data of the base view video stream and the data of the dependent view video stream are transmitted using a bus between the CPB and the single decoder, the transmission rate is limited to the bit rate represented by the HRD parameters or less.
Fig. 69 shows a description format of subset _ seq _ parameter _ set _ data () in the case of encoding HRD parameters as single information of MVC VUI Ext.
As shown in fig. 69, hrd _ parameters () is described in MVC _ VUI _ parameters _ extension () (MVC VUI Ext) within subset _ seq _ parameter _ set _ data ().
Hereinafter, as shown in fig. 67, the HRD parameters (left side in fig. 67) encoded as the SPS single information in the dependent view video stream will be referred to as second HRD parameters as appropriate. In addition, the HRD parameters (right side in fig. 67) encoded as single information of the MVC VUI Ext in the dependent view video stream will be referred to as third HRD parameters.
Fig. 70 shows an example of the encoding position of max _ dec _ frame _ buffering in the dependent view video stream.
As shown in fig. 70, max _ dec _ frame _ buffering is encoded as a single piece of SubsetSPS information included in each of the dependent units constituting the dependent view video stream. In the example of fig. 70, max _ dec _ frame _ buffering is encoded as SPS single information included in the SubsetSPS.
Max _ dec _ frame _ buffering, encoded as a single piece of SPS information, represents the maximum number of pictures that can be stored in the DPB if the dependent view video stream is played back with an independent decoder. If a single DPB is employed to store decoded pictures of only the dependent view video stream, the number of pictures to be stored into the DPB is limited to the number of pictures represented by max _ dec _ frame _ buffering or less.
Fig. 71 shows the description format of subset _ seq _ parameter _ set _ data () in the case of encoding max _ dec _ frame _ buffering as single information of SPS.
As shown in fig. 71, max _ dec _ frame _ buffering is described in vui _ parameters () within seq _ parameter _ set _ data () in subset _ seq _ parameter _ set _ data ().
In the example in fig. 70, max _ dec _ frame _ buffering is encoded as SEI single information.
The max_dec_frame_buffering encoded as a single piece of SEI information represents the maximum number of pictures that can be stored into the DPB in the case where the base view video stream and the dependent view video stream are played together with a single decoder. If a single DPB is employed to store decoded pictures of the base view video stream and decoded pictures of the dependent view video stream, the number of pictures to be stored into the DPB is limited to the number represented by max_dec_frame_buffering or less.
Fig. 72 shows a description format of SEI _ message () (SEI) in the case where max _ dec _ frame _ buffering is encoded as SEI single information.
As shown in fig. 72, max _ dec _ frame _ buffering is described in view _ scalability _ info () (view scalability information SEI) in SEI _ message ().
Hereinafter, as shown in fig. 70, the max _ dec _ frame _ buffering (left side in fig. 70) encoded as single information of the SPS in the dependent view video stream will be referred to as second max _ dec _ frame _ buffering as appropriate. In addition, max _ dec _ frame _ buffering (right side in fig. 70) encoded as a single piece of information of SEI in the dependent view video stream will be referred to as third max _ dec _ frame _ buffering.
Thus, three types each of the HRD parameters and max_dec_frame_buffering are encoded into the base view video stream and the dependent view video stream.
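The three encoding positions can be summarized as follows; the mapping below is an illustrative restatement of the text above in Python, not a normative table.

```python
# Illustrative summary of where each buffer control value is carried and what it constrains.
BUFFER_CONTROL_POSITIONS = {
    "first": {
        "stream": "base view video stream",
        "carried_in": "SPS / vui_parameters()",
        "constrains": "playback of the base view video stream alone",
    },
    "second": {
        "stream": "dependent view video stream",
        "carried_in": "SubsetSPS / seq_parameter_set_data() / vui_parameters()",
        "constrains": "playback of the dependent view video stream with an independent decoder",
    },
    "third": {
        "stream": "dependent view video stream",
        "carried_in": "SubsetSPS / MVC VUI Ext (HRD parameters) and SEI (max_dec_frame_buffering)",
        "constrains": "playback of both streams with a single decoder",
    },
}
```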
[ arrangement of devices ]
A recording apparatus for recording data including buffer control information into a BD has the same configuration as the recording apparatus 501 shown in fig. 52. In addition, a playback device for playing data recorded in a BD has the same configuration as the playback device 502 shown in fig. 56.
Hereinafter, the configurations of a recording apparatus and a playback apparatus for performing processing using buffer control information will be described with reference to the configurations in fig. 52 and 56. Redundant description in the above description will be appropriately omitted.
The information generating unit 511 of the recording apparatus 501 generates database information composed of a PlayList file and a Clip information file, and also generates additional information of base view video and additional information of dependent view video. The additional information of the base view video includes the first HRD parameters and the first max _ dec _ frame _ buffering. In addition, the additional information of the dependent view video includes second and third HRD parameters, and second and third max _ dec _ frame _ buffering.
The information generation unit 511 outputs the generated database information to the recording unit 513, and outputs the generated additional information to the MVC encoder 512.
The MVC encoder 512 encodes the L image data and the R image data in accordance with the h.264AVC/MVC profile standard to generate data of each picture of base view video obtained by encoding the L image data and data of each picture of dependent view video obtained by encoding the R image data.
In addition, the MVC encoder 512 also generates a base view video stream by adding additional information of the base view video generated by the information generating unit 511 to the data of each picture of the base view video. In this base view video stream, the first HRD parameters are encoded into the positions shown in fig. 63, and the first max _ dec _ frame _ buffering is encoded into the positions shown in fig. 65.
Similarly, the MVC encoder 512 also generates a dependent view video stream by adding the additional information of the dependent view video generated by the information generating unit 511 to the data of each picture of the dependent view video. In the dependent view video stream, the second and third HRD parameters are encoded into the positions shown in fig. 67, and the second and third max _ dec _ frame _ buffering are encoded into the positions shown in fig. 70.
The MVC encoder 512 outputs the generated base view video stream and dependent view video stream to the recording unit 513.
The recording unit 513 records the database information supplied from the information generating unit 511 and the base view video stream and the dependent view video stream supplied from the MVC encoder 512 into the BD. The BD in which the recording unit 513 records data is supplied to the playback device 502.
The acquisition unit 531 of the playback device 502 reads data from a BD, which has been mounted to the playback device 502, in which the recording device 501 has recorded data. The acquisition unit 531 outputs the database information read from the BD to the control unit 532, and outputs the base view video stream and the dependent view video stream to the MVC decoder 533.
The control unit 532 controls the overall operation of the playback device 502, such as reading data from a recording medium or the like.
For example, if only the base view video stream is played, the control unit 532 reads the first HRD parameters and the first max _ dec _ frame _ buffering from the base view video stream. The control unit 532 controls decoding of the base view video stream with the MVC decoder 533 based on the read information.
In addition, if the base view video stream and the dependent view video stream are played (3D playback), when the MVC decoder 533 includes a single decoder, the control unit 532 reads the third HRD parameter and the third max _ dec _ frame _ buffering from the dependent view video stream. The control unit 532 controls decoding of the base view video stream and the dependent view video stream by the MVC decoder 533 based on the read information.
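A minimal sketch of this selection logic is given below. The helper function and the dictionary keys are hypothetical, and the two-decoder branch assumes the shared-DPB configuration that is described later with reference to fig. 79.

```python
def select_buffer_control(play_3d, single_decoder, base_info, dependent_info):
    """Pick the buffer control values the control unit 532 applies to the MVC decoder 533.

    base_info / dependent_info are hypothetical dictionaries holding the first,
    second and third values read from the respective streams.
    Pictures are assumed to be available as already-parsed values.
    """
    if not play_3d:
        # Only the base view video stream is played (2D playback).
        return {"hrd": base_info["first_hrd"],
                "max_dec_frame_buffering": base_info["first_max_dec_frame_buffering"]}
    if single_decoder:
        # Base view and dependent view are decoded together by one decoder.
        return {"hrd": dependent_info["third_hrd"],
                "max_dec_frame_buffering": dependent_info["third_max_dec_frame_buffering"]}
    # Two decoders: base decoder uses the first HRD parameters, the dependent
    # decoder uses the second; the shared DPB is sized by the third value
    # (the shared-DPB case of fig. 79 is assumed here).
    return {"base_hrd": base_info["first_hrd"],
            "dependent_hrd": dependent_info["second_hrd"],
            "max_dec_frame_buffering": dependent_info["third_max_dec_frame_buffering"]}
```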
The MVC decoder 533 decodes the base view video stream, or the base view video stream and the dependent view video stream, according to the control of the control unit 532. The MVC decoder 533 outputs the data obtained by decoding to the output unit 534.
The output unit 534 outputs the image supplied from the MVC decoder 533 to a display to display a 2D image or a 3D image.
[ operation of the apparatus ]
The recording process of the recording apparatus 501 is now described with reference to a flowchart in fig. 73.
In step S101, the information generating unit 511 generates database information and additional information including buffer control information to be added to each picture of the base view video and the dependent view video.
In step S102, the MVC encoder 512 executes encoding processing. Here, the same processing as that described with reference to fig. 55 is performed. The buffer control information generated in step S101 is added to each picture of the base view video and the dependent view video. The base view video stream and dependent view video stream generated by the encoding process are supplied to the recording unit 513.
In step S103, the recording unit 513 records the database information generated by the information generating unit 511 and the base view video stream and the dependent view video stream generated by the MVC encoder 512 into the BD. Subsequently, the process ends.
Next, the playback processing of the playback device 502 will be described with reference to the flowchart in fig. 74.
In step S111, the acquisition unit 531 reads data from the BD mounted in the playback device 502. The acquisition unit 531 outputs the read database information to the control unit 532, and outputs the data of the base view video stream and the dependent view video stream to the MVC decoder 533 if, for example, 3D playback is performed.
In step S112, the control unit 532 reads the buffer control information from the data of the stream read and supplied from the BD, and sets parameters to the MVC decoder 533. The stream serving as a read source of the buffer control information is changed according to the stream read from the BD or the configuration of the MVC decoder 533, which will be described later.
In step S113, the MVC decoder 533 executes the processing described with reference to fig. 59 and 60 according to the parameters set by the control unit 532.
In step S114, the output unit 534 outputs image data obtained by the decoding process performed with the MVC decoder 533 to the display. Subsequently, the process ends.
[ specific examples of parameter settings ]
A specific example of parameter setting to be performed using the buffer control information will now be described.
Now, it is assumed that the maximum bit rate of the input to the decoder in the case of playing only the base view video stream is 40 Mbps. In addition, it is assumed that the maximum bit rate of input to the decoder for dependent view video is 40Mbps in the case where the dependent view video stream is played with an independent decoder. It is assumed that the maximum bit rate of the input to a decoder in the case where a base view video stream and a dependent view video stream are played together with a single decoder is 60 Mbps.
In this case, in the recording apparatus 501, values representing 40 Mbps are encoded as the value of the first HRD parameters and as the value of the second HRD parameters, respectively. A value representing 60 Mbps is encoded as the value of the third HRD parameters.
In addition, it is assumed that the maximum number of pictures that can be stored in the DPB is 4 when only the base view video stream is played. Assume that the maximum number of pictures storable into the DPB is 4 in the case of playing back the dependent view video stream with the independent decoder. It is assumed that the maximum number of pictures that can be stored into the DPB in the case where the base view video stream and the dependent view video stream are played together using a single decoder is 6.
In this case, in the recording apparatus 501, values representing 4 pictures are encoded as the value of the first max_dec_frame_buffering and as the value of the second max_dec_frame_buffering, respectively. A value representing 6 pictures is encoded as the value of the third max_dec_frame_buffering.
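With the example figures above, the values written by the recording apparatus 501 can be summarized as in the following sketch; the constant names are illustrative only.

```python
# Example values from the text (40 Mbps / 40 Mbps / 60 Mbps, 4 / 4 / 6 pictures).
MBPS = 1_000_000

first_hrd_max_bit_rate = 40 * MBPS    # base view video stream played alone
second_hrd_max_bit_rate = 40 * MBPS   # dependent view video stream on its own decoder
third_hrd_max_bit_rate = 60 * MBPS    # both streams on a single decoder

first_max_dec_frame_buffering = 4     # DPB pictures, base view only
second_max_dec_frame_buffering = 4    # DPB pictures, dependent view on its own decoder
third_max_dec_frame_buffering = 6     # DPB pictures, both streams on a single decoder
```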
Fig. 75 shows an example of decoding only the base view video stream in the MVC decoder 533 including a single decoder.
In this case, as shown in fig. 75, the control unit 532 reads the first HRD parameters and the first max _ dec _ frame _ buffering encoded in the base view video stream. The buffer control information D1 marked by adding diagonal lines on the base view video stream represents the first HRD parameters and the first max _ dec _ frame _ buffering.
In addition, the control unit 532 sets the maximum bit rate of the input from the CPB 541 to the decoder 542 to 40 Mbps based on the first HRD parameters. For example, the maximum bit rate is set by securing a bus bandwidth of 40 Mbps between the CPB 541 and the decoder 542.
Further, the control unit 532 sets the maximum number of pictures storable in the DPB 543 to 4 based on the first max_dec_frame_buffering. For example, in the storage area of the DPB 543, an area capable of storing four decoded pictures is secured, whereby the maximum number of storable pictures is set.
In this way, decoding of only the base view video stream with a single decoder is performed within the limits assumed by the recording side. If the base view video stream has been encoded on the recording side so that it can be decoded within these limits, the buffers on the playback side can be prevented from failing.
Fig. 76 shows an example of a case where the base view video stream and the dependent view video stream are decoded by the MVC decoder 533 including a single decoder.
In this case, as shown in fig. 76, the control unit 532 reads the third HRD parameter and the third max _ dec _ frame _ buffering encoded in the dependent view video stream. The buffer control information D2, marked by adding diagonal lines on the dependent view video stream, represents the second HRD parameters and the second max _ dec _ frame _ buffering. In addition, the buffer control information D3 represents the third HRD parameter and the third max _ dec _ frame _ buffering.
In addition, the control unit 532 sets the maximum bit rate of the input from the CPB541 to the decoder 542 to 60Mbps based on the third HRD parameters.
Further, the control unit 532 sets the maximum number of pictures storable into the DPB543 to 6 based on the third max _ dec _ frame _ buffering.
In this way, decoding of the base view video stream and the dependent view video stream with a single decoder is performed within the limits assumed by the recording side. If the base view video stream and the dependent view video stream have been encoded on the recording side so that they can be decoded within these limits, the buffers on the playback side can be prevented from failing.
Fig. 77 is a block diagram showing another configuration of the MVC decoder 533.
In the configuration shown in fig. 77, the same configurations as those shown in fig. 57 are denoted by the same reference numerals. Redundant description is appropriately omitted.
In the example of fig. 77, two decoders are provided: decoders 542-1 and 542-2. The decoder 542-1 is a decoder for base view video, and the decoder 542-2 is a decoder for dependent view video.
The decoder 542-1 reads data of the base view video stream stored in the CPB541 in units of data constituting one access unit. In addition, the decoder 542-2 reads the dependent view video stream stored in the CPB541 in units of data constituting one dependent unit.
The decoder 542-1 decodes the data read from the CPB541, and outputs data of each picture of the base view video obtained by the decoding to the DPB 543.
The decoder 542-2 decodes the data read from the CPB541, and outputs data of each picture of the dependent view video obtained by the decoding to the DPB 543.
A case where the MVC decoder 533 includes two decoders will be described, for example.
Fig. 78 shows an example of decoding only the base view video stream at the MVC decoder 533 including two decoders.
In this case, as shown in fig. 78, the control unit 532 reads the first HRD parameters and the first max _ dec _ frame _ buffering encoded in the base view video stream.
In addition, the control unit 532 sets the maximum bit rate of the input from the CPB541 to the decoder 542 to 40Mbps based on the first HRD parameters.
Further, the control unit 532 sets the maximum number of pictures storable into the DPB543 as 4 based on the first max _ dec _ frame _ buffering.
In fig. 78, decoder 542-2 is marked with a dashed line to indicate that no processing is performed at decoder 542-2.
Fig. 79 shows an example of decoding a base view video stream and a dependent view video stream at the MVC decoder 533 including two decoders.
In this case, as shown in fig. 79, the control unit 532 reads the first HRD parameters encoded in the base view video stream and the second HRD parameters and the third max _ dec _ frame _ buffering encoded in the dependent view video stream.
In addition, the control unit 532 sets the maximum bit rate of the input from the CPB541 to the decoder 542-1 to 40Mbps based on the first HRD parameters, and sets the maximum bit rate of the input from the CPB541 to the decoder 542-2 to 40Mbps based on the second HRD parameters.
Further, the control unit 532 sets the maximum number of pictures storable into the DPB543 to 6 based on the third max _ dec _ frame _ buffering. The DPB543 is shared for the base view video and the dependent view video, and thus, the third max _ dec _ frame _ buffering is used as a parameter for setting the maximum number of pictures storable into the DPB 543.
Fig. 80 shows another example of decoding the base view video stream and the dependent view video stream at the MVC decoder 533 including two decoders.
In the MVC decoder 533 in fig. 80, a buffer for base view video and a buffer for dependent view video are also provided for each of the CPB541 and the DPB 543.
In this case, as shown in fig. 80, the control unit 532 reads the first HRD parameters and the first max _ dec _ frame _ buffering encoded in the base view video stream. In addition, the control unit 532 also reads the second HRD parameters and the second max _ dec _ frame _ buffering encoded in the dependent view video stream.
The control unit 532 sets the maximum bit rate from the input of the CPB 541-1, which is the CPB for the base view video, to the decoder 542-1 to 40Mbps based on the first HRD parameters. In addition, the maximum bit rate from the input of the CPB 541-2, which is the CPB for the dependent view video, to the decoder 542-2 is set to 40Mbps based on the second HRD parameter.
Further, the control unit 532 sets the maximum number of pictures storable into the DPB 543-1, which is the DPB for the base view video, to 4 based on the first max _ dec _ frame _ buffering. In addition, the maximum number of pictures storable into the DPB 543-2 as the DPB for the dependent view video is set to 4 based on the second max _ dec _ frame _ buffering.
Fig. 81 shows still another example of decoding a base view video stream and a dependent view video stream at the MVC decoder 533 including two decoders.
In the MVC decoder 533 in fig. 81, a buffer for base view video and a buffer for dependent view video are provided for the CPB, and one buffer is provided for both the base view video and the dependent view video for the DPB. In addition, data transmission between the CPB 541-1 as the CPB for the base view video and the decoder 542-1 and data transmission between the CPB 541-2 as the CPB for the dependent view video and the decoder 542-2 are performed via the same bus.
In this case, as shown in fig. 81, the control unit 532 reads the third HRD parameters and the third max _ dec _ frame _ buffering encoded in the dependent view video stream.
In addition, the control unit 532 sets the maximum bit rate of the bus to be used for data transmission between the CPB 541-1 and the decoder 542-1 and data transmission between the CPB 541-2 and the decoder 542-2 to 60Mbps based on the third HRD parameter.
Further, the control unit 532 sets the maximum number of pictures storable into the DPB543 to 6 based on the third max _ dec _ frame _ buffering.
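The configurations of figs. 75 to 81 can be condensed into the following illustrative lookup, which records which of the encoded values each player topology reads; it is a restatement of the text above, not part of any standard.

```python
# Which of the encoded values each player topology uses (see figs. 75 to 81).
PLAYER_CONFIGURATIONS = {
    "one decoder, base view only (fig. 75)": {
        "cpb_rate": "first HRD parameters (40 Mbps)",
        "dpb_size": "first max_dec_frame_buffering (4 pictures)",
    },
    "one decoder, both streams (fig. 76)": {
        "cpb_rate": "third HRD parameters (60 Mbps)",
        "dpb_size": "third max_dec_frame_buffering (6 pictures)",
    },
    "two decoders, base view only (fig. 78)": {
        "cpb_rate": "first HRD parameters (40 Mbps)",
        "dpb_size": "first max_dec_frame_buffering (4 pictures)",
    },
    "two decoders, shared CPB and DPB (fig. 79)": {
        "cpb_rate": "first and second HRD parameters (40 Mbps each)",
        "dpb_size": "third max_dec_frame_buffering (6 pictures)",
    },
    "two decoders, separate CPBs and DPBs (fig. 80)": {
        "cpb_rate": "first and second HRD parameters (40 Mbps each)",
        "dpb_size": "first and second max_dec_frame_buffering (4 pictures each)",
    },
    "two decoders, separate CPBs, shared bus and DPB (fig. 81)": {
        "cpb_rate": "third HRD parameters (60 Mbps) for the shared bus",
        "dpb_size": "third max_dec_frame_buffering (6 pictures)",
    },
}
```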
[ verification device ]
Fig. 82 shows a verification device for verifying whether a video stream recorded in a BD by the recording device 501 can be correctly played in the playback device 502.
The verification device 551 in fig. 82 is configured by a computer. The video stream read from the BD is input to the verification device 551.
In the base view video stream to be input as a video stream to the verification device 551, the first HRD parameters and the first max_dec_frame_buffering are encoded. In addition, in the dependent view video stream, the second and third HRD parameters and the second and third max_dec_frame_buffering are encoded.
In the verification device 551, the control unit 551A is realized by a predetermined program executed by the CPU. The control unit 551A verifies whether the input video stream can be correctly played in the playback device 502, and outputs information indicating the verification result. The verification result is displayed, for example, and confirmed by the user who performs verification with the verification device 551.
In addition, in the verification device 551, an HRD (hypothetical reference decoder) is implemented by the CPU executing a predetermined program. The HRD virtually reproduces the MVC decoder 533 of the playback device 502. The functional configuration of the HRD is shown in fig. 83.
As shown in fig. 83, the HRD 561 includes a CPB 571, a decoder 572, and a DPB 573.
The CPB 571 stores data of the input base view video stream and data of the input dependent view video stream. The decoder 572 reads the data of the base view video stream stored in the CPB 571 in units of data constituting one access unit. Similarly, the decoder 572 reads the data of the dependent view video stream stored in the CPB 571 in units of data constituting one dependent unit.
The decoder 572 decodes the data read from the CPB 571, and outputs data of each picture of the base view video and the dependent view video obtained by the decoding to the DPB 573.
DPB 573 stores data provided from decoder 572. The data of each picture of the base view video and the dependent view video stored in the DPB 573 is output according to a display time point of each picture indicated by the picture timing SEI.
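What the HRD 561 checks can be pictured with the following simplified simulation: coded data enters the CPB no faster than the signaled maximum bit rate, and decoded pictures must fit within max_dec_frame_buffering in the DPB. This is a coarse sketch under simplified assumptions (constant-rate fill, instantaneous decoding, pictures sorted by decode time), not the normative HRD algorithm.

```python
def verify_stream(pictures, max_bit_rate, cpb_size, max_dec_frame_buffering):
    """Coarse buffer-model check: returns True if no CPB/DPB violation is found.

    pictures: list of dicts with 'bits' (coded size), 'decode_time' and
    'output_time' in seconds (hypothetical input format, sorted by decode_time).
    """
    cpb_bits = 0.0
    last_time = 0.0
    dpb = []  # decoded pictures awaiting output
    for pic in pictures:
        # Fill the CPB at the maximum allowed rate until the decode time.
        cpb_bits = min(cpb_size, cpb_bits + max_bit_rate * (pic["decode_time"] - last_time))
        last_time = pic["decode_time"]
        if cpb_bits < pic["bits"]:
            return False          # CPB underflow: data not available in time
        cpb_bits -= pic["bits"]   # instantaneous removal at decode time
        # Release pictures whose output time has passed, then store the new one.
        dpb = [p for p in dpb if p["output_time"] > pic["decode_time"]]
        dpb.append(pic)
        if len(dpb) > max_dec_frame_buffering:
            return False          # DPB overflow: too many pictures held
    return True
```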
Specific examples of the verification will be described.
In the same manner as described above, it is assumed that values representing 40 Mbps, 40 Mbps, and 60 Mbps have been encoded as the values of the first, second, and third HRD parameters, respectively. In addition, values representing 4 pictures, 4 pictures, and 6 pictures have been encoded as the values of the first, second, and third max_dec_frame_buffering, respectively.
Fig. 83 shows an example of decoding only the base view video stream.
In this case, as shown in fig. 83, the control unit 551A reads the first HRD parameters and the first max _ dec _ frame _ buffering encoded in the base view video stream.
In addition, the control unit 551A sets the maximum bit rate of the input from the CPB 571 to the decoder 572 to 40Mbps based on the first HRD parameters. Further, the control unit 551A sets the maximum number of pictures storable into the DPB 573 to 4 based on the first max _ dec _ frame _ buffering.
In this state, whether or not decoding of the base view video stream can be correctly performed is verified by the control unit 551A, and information indicating the verification result is output. If it is determined that the decoding can be correctly performed, it is determined that the input base view video stream is a stream that can be correctly played back based on the first HRD parameter and the first max _ dec _ frame _ buffering encoded therein, as described with reference to fig. 75, fig. 78, and fig. 80.
Fig. 84 shows an example of decoding only a dependent view video stream using a decoder for dependent view video.
In this case, as shown in fig. 84, the control unit 551A reads the second HRD parameters and the second max _ dec _ frame _ buffering encoded in the dependent view video stream.
In addition, the control unit 551A sets the maximum bit rate of the input from the CPB 571 to the decoder 572 to 40Mbps based on the second HRD parameters. Further, the control unit 551A sets the maximum number of pictures storable into the DPB 573 to 4 based on the second max _ dec _ frame _ buffering.
In this state, whether or not decoding of the dependent view video stream can be correctly performed is verified by the control unit 551A, and information indicating the verification result is output. If it is determined that the decoding can be correctly performed, it is determined that the input dependent view video stream is a stream that can be correctly played by a decoder for dependent view video based on the second HRD parameters and the second max _ dec _ frame _ buffering encoded therein, as described with reference to fig. 80.
Note that, in order to decode the dependent view video stream, the base view video stream is necessary. The data of the decoded picture of the base view video stream is appropriately input to the decoder 572 in fig. 84, and is used for decoding of the dependent view video stream.
Fig. 85 shows an example of decoding the base view video stream and the dependent view video stream with a single decoder.
In this case, as shown in fig. 85, the control unit 551A reads the third HRD parameters and the third max _ dec _ frame _ buffering encoded in the dependent view video stream.
In addition, the control unit 551A sets the maximum bit rate of the input from the CPB 571 to the decoder 572 to 60Mbps based on the third HRD parameters.
Further, the control unit 551A sets the maximum number of pictures storable into the DPB 573 to 6 based on the third max _ dec _ frame _ buffering.
In this state, whether or not decoding of the base view video stream and the dependent view video stream can be correctly performed is verified by the control unit 551A, and information indicating the verification result is output. If it is determined that the decoding can be correctly performed, it is determined that the input base view video stream and dependent view video stream are streams that can be correctly played back based on the third HRD parameter and the third max _ dec _ frame _ buffering, as described with reference to fig. 76.
[ position of view _ type ]
In the above description, as described with reference to fig. 12, the view _ type indicating whether the base view video stream is an L image stream or an R image stream is arranged to be described in a PlayList, but may be described in other positions.
For example, it is conceivable that the base view video stream and the dependent view video stream are multiplexed into the same TS, or each is multiplexed into a different TS, and transmitted via a broadcast wave or a network. In this case, the view _ type is described in PSI as transmission control information, or in a base view video stream or a dependent view video stream (elementary stream).
Fig. 86 shows an example of describing the view _ type in a PMT (program map table) included in the PSI (program specific information).
As shown in fig. 86, an arrangement may be made in which MVC _ video _ stream _ descriptor () is newly defined as a descriptor of MVC, and view _ type is described in MVC _ video _ stream _ descriptor (). Note that, for example, 65 is assigned as a value of descriptor _ tag.
In the playback device 1 that has received the TS, based on the value of the view _ type described in the PMT, it is determined whether the base view video stream multiplexed in the TS is an L image stream or an R image stream, and thus the processing described with reference to fig. 24 and 26, such as switching of the output destination of the decoding result, or the like, is performed.
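A receiving side could locate the view_type in the PMT along the lines of the following sketch. The byte layout of MVC_video_stream_descriptor() and the mapping of the numeric value to L or R are not given here, so the bit position and the L/R assignment below are assumptions made only for illustration.

```python
def find_view_type_in_pmt(descriptors):
    """Scan PMT descriptors for the MVC descriptor (tag 65) and read view_type.

    descriptors: iterable of (descriptor_tag, payload_bytes) pairs taken from the PMT.
    """
    for tag, payload in descriptors:
        if tag == 65 and payload:            # MVC_video_stream_descriptor()
            view_type = payload[0] & 0x01    # assumed: low bit of the first payload byte
            # Assumed mapping: 0 means the base view is the L image stream.
            return "L image stream" if view_type == 0 else "R image stream"
    return None  # no MVC descriptor found: treat the stream as a conventional 2D stream
```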
The view _ type may also be described in another location such as SIT (selection information table) instead of within the PMT.
Fig. 87 shows an example of describing view _ type in an elementary stream.
As shown in fig. 87, view _ type can be described in MVC _ video _ stream _ info () within SEI. As described above, the SEI is additional information to be added to data of each picture constituting the base view video stream and the dependent view video stream. The SEI including the view _ type is added to each picture of at least one of the base view video stream and the dependent view video stream.
In the playback device 1 that has read the SEI, it is determined whether the base view video stream is an L image stream or an R image stream based on the value of the view _ type described in the SEI, and thereby the processing described with reference to fig. 24 and fig. 26, such as switching of the output destination of the decoding result, or the like, is performed.
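However the value is obtained (PlayList, PMT, or SEI), the way it is used can be sketched as a simple switch of the output destination of the decoding results; the L/R mapping is again an assumption for illustration.

```python
def route_decoded_pictures(base_view_is_l, base_picture, dependent_picture):
    """Switch the output destination of the decoding results.

    If the base view video is the L image stream, its pictures go to the left
    output and the dependent view pictures to the right output, and vice versa.
    """
    if base_view_is_l:
        return {"left": base_picture, "right": dependent_picture}
    return {"left": dependent_picture, "right": base_picture}
```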
The above-described processing sequence may be executed by hardware or by software. If the processing sequence is executed by software, a program constituting the software is installed from a program recording medium into a computer built into dedicated hardware, a general-purpose personal computer, or the like.
Fig. 88 is a block diagram showing a configuration example of hardware of a computer that executes the above-described processing sequence by a program.
A CPU (central processing unit) 701, a ROM (read only memory) 702, and a RAM (random access memory) 703 are interconnected by a bus 704.
The bus 704 is also connected to an input/output interface 705. The input/output interface 705 is connected to an input unit 706 configured by a keyboard, a mouse, and the like, and an output unit 707 configured by a display, a speaker, and the like. In addition, the input/output interface 705 is connected to a storage unit 708 constituted by a hard disk, a nonvolatile memory, or the like, a communication unit 709 constituted by a network interface or the like, and a drive 710 for driving a removable medium 711.
In the computer thus configured, the above-described processing sequence is executed by the CPU 701 loading a program stored in the storage unit 708 into the RAM 703 via the input/output interface 705 and the bus 704 and executing it.
The program executed by the CPU 701 is provided by being recorded in, for example, the removable medium 711, or via a wired or wireless transmission medium (e.g., a local area network, the Internet, or digital broadcasting), and is installed in the storage unit 708.
Note that the program executed by the computer may be a program for executing processing in time series according to the order described in this specification, or may be a program for executing processing in parallel or executing processing at necessary timing such as when called, or the like.
The embodiments of the present invention are not limited to the above-described embodiments, but various types of modifications may be made without departing from the spirit of the present invention.
List of labels
1 playback device, 2 optical disc, 3 display device, 11MVC encoder, 21 h.264/AVC encoder, 22 h.264/AVC decoder, 23 depth calculation unit, 24 dependent view video encoder, 25 multiplexer, 51 controller, 52 disk drive, 53 memory, 54 local storage, 55 internet interface, 56 decoder unit, 57 operation input unit

Claims (15)

1. An information processing apparatus comprising:
generating means configured to generate first additional information representing an output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method, and second additional information representing an output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream so that information to be added to pictures corresponding to the basic stream and the extended stream represents the same timing; and
an encoding device configured to generate data of pictures of the basic stream and the extended stream by encoding a video stream, add the first additional information to the data of each picture of the basic stream, and add the second additional information to the data of each picture of the extended stream.
2. An information processing method comprising the steps of:
generating first additional information representing an output timing of each picture of a basic stream obtained by encoding a video stream by a predetermined encoding method, and second additional information representing an output timing of each picture of an extended stream to be used for displaying a 3D image together with the basic stream, so that information to be added to pictures corresponding to the basic stream and the extended stream represents the same timing; and
generating data of pictures of the basic stream and the extended stream by encoding a video stream, adding the first additional information to the data of each picture of the basic stream, and adding the second additional information to the data of each picture of the extended stream.
3. A playback device, comprising:
an acquisition means configured to acquire a basic stream and an extended stream, the basic stream being obtained by encoding a video stream by a predetermined encoding method and to which first additional information indicating an output timing of each picture is added to data of each picture of the basic stream, the extended stream being used for displaying a 3D image together with the basic stream, and to which second additional information indicating an output timing of each picture is added to data of each picture of the extended stream; and
decoding means configured to decode data of pictures corresponding to the basic stream and the extended stream at the same timing in accordance with timings indicated by the first additional information and the second additional information, and output pictures obtained by the decoding at the same timing in accordance with timings indicated by the first additional information and the second additional information.
4. The playback device according to claim 3, wherein the decoding means calculates a value representing a display order of pictures obtained by decoding the basic stream, outputs a picture of the basic stream whose display order is calculated to have the highest priority value according to the timing represented by the first additional information, and also outputs a corresponding picture of the extended stream at the same timing.
5. The playback device of claim 3, wherein the first additional information and the second additional information are SEIs in compliance with the H.264/AVC standard.
6. A playback method comprising the steps of:
acquiring a basic stream and an extended stream, the basic stream being obtained by encoding a video stream by a predetermined encoding method and to data of each picture of the basic stream being added first additional information indicating an output timing of each picture, the extended stream being used for displaying a 3D image together with the basic stream and to data of each picture of the extended stream being added second additional information indicating an output timing of each picture; and
decoding data of pictures corresponding to the basic stream and the extended stream at the same timing in accordance with timings indicated by the first additional information and the second additional information, and outputting pictures obtained by the decoding at the same timing in accordance with timings indicated by the first additional information and the second additional information.
7. An information processing apparatus comprising:
generating means configured to generate first constraint information, second constraint information, and third constraint information, wherein the first constraint information relates to a processing constraint at the time of decoding a basic stream obtained by encoding a video stream by a predetermined encoding method, the second constraint information relates to a processing constraint at the time of decoding an extended stream to be used for displaying a 3D image together with the basic stream, and the third constraint information relates to a processing constraint at the time of decoding the basic stream and the extended stream; and
an encoding device configured to generate data of pictures of the basic stream and the extended stream by encoding a video stream, add the first constraint information to the data of each picture of the basic stream, and add the second constraint information and the third constraint information to the data of each picture of the extended stream.
8. An information processing method comprising the steps of:
generating first constraint information, second constraint information, and third constraint information, wherein the first constraint information relates to a processing constraint when decoding a basic stream obtained by encoding a video stream with a predetermined encoding method, the second constraint information relates to a processing constraint when decoding an extended stream to be used for displaying a 3D image together with the basic stream, and the third constraint information relates to a processing constraint when decoding the basic stream and the extended stream; and
generating data of pictures of the basic stream and the extended stream by encoding a video stream, adding the first constraint information to the data of each picture of the basic stream, and adding the second constraint information and the third constraint information to the data of each picture of the extended stream.
9. A playback device, comprising:
acquiring means configured to acquire only a basic stream or a basic stream and an extended stream, of the basic stream and the extended stream, wherein the basic stream is obtained by encoding a video stream using a predetermined encoding method, and first constraint information relating to a processing constraint at the time of decoding the basic stream is added to data of each picture of the basic stream, wherein the extended stream is to be used for displaying a 3D image together with the basic stream, and second constraint information relating to a processing constraint at the time of decoding the extended stream is added to data of each picture of the extended stream, and third constraint information relating to a processing constraint at the time of decoding the basic stream and the extended stream is also added to data of each picture of the extended stream; and
decoding means configured to decode the stream obtained by the obtaining means in accordance with a constraint indicated by information added to data of each picture of the stream obtained by the obtaining means, from among the first to third constraint information.
10. The playback device of claim 9, wherein if the acquisition means has acquired only the elementary stream, the decoding means decodes the elementary stream in accordance with the constraint indicated by the first constraint information.
11. The playback apparatus of claim 9, wherein if the acquisition means has acquired the basic stream and the extended stream, when the decoding means includes a single decoder, the decoding means decodes the basic stream and the extended stream in accordance with the constraint indicated by the third constraint information.
12. The playback apparatus according to claim 9, wherein if the obtaining means has obtained the basic stream and the extended stream, when the decoding means includes two decoders, a decoder for a basic stream and a decoder for an extended stream, the decoding means decodes the basic stream with the decoder for a basic stream in accordance with the constraint indicated by the first constraint information, and decodes the extended stream with the decoder for an extended stream in accordance with the constraint indicated by the second constraint information.
13. The playback device of claim 9, wherein each of the first to third constraint information includes rate information representing a maximum bit rate of data to be input into a decoder, and picture count information representing a maximum number of pictures that can be stored into a buffer for storing data of decoded pictures.
14. The playback device of claim 13, wherein the rate information is hrd _ parameters specified by h.264/AVC;
and wherein the picture count information is max _ dec _ frame _ buffering specified by H.264/AVC.
15. A playback method comprising the steps of:
obtaining only a basic stream or a basic stream and an extended stream of a basic stream and an extended stream, wherein the basic stream is obtained by encoding a video stream using a predetermined encoding method, and first constraint information relating to a processing constraint at the time of decoding the basic stream is added to data of each picture of the basic stream, wherein the extended stream is to be used for displaying a 3D image together with the basic stream, and second constraint information relating to a processing constraint at the time of decoding the extended stream is added to data of each picture of the extended stream, and third constraint information relating to a processing constraint at the time of decoding the basic stream and the extended stream is also added to data of each picture of the extended stream; and
the obtained stream is decoded in accordance with the constraint indicated by the information added to the data of each picture of the obtained stream among the first to third constraint information.
HK12104573.3A 2009-04-08 2010-04-06 Information processing device, information processing method, playback device, playback method, and recording medium HK1164003B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
JP2009-094256 2009-04-08
JP2009094256 2009-04-08
JP2010-065112 2010-03-19
JP2010065112A JP2010263615A (en) 2009-04-08 2010-03-19 Information processing apparatus, information processing method, playback apparatus, and playback method
PCT/JP2010/056236 WO2010116997A1 (en) 2009-04-08 2010-04-06 Information processing device, information processing method, playback device, playback method, and recording medium

Publications (2)

Publication Number Publication Date
HK1164003A1 HK1164003A1 (en) 2012-09-14
HK1164003B true HK1164003B (en) 2013-11-15


Similar Documents

Publication Publication Date Title
CN102292992B (en) Information processing device, information processing method, playback device, playback method, and recording medium
CN102282858B (en) Playback device, playback method and program
CN103179418B (en) Recording method, playback apparatus and back method
JP4985882B2 (en) Recording method
HK1164003B (en) Information processing device, information processing method, playback device, playback method, and recording medium
HK1163999A (en) Playback device, playback method, and program
HK1172176A (en) Recording device, recording method, reproduction device, reproduction method, program, and recording medium
HK1172176B (en) Recording device, recording method, reproduction device, reproduction method, program, and recording medium
HK1157547B (en) Recording device, recording method, reproduction device, and reproduction method
HK1157551A1 (en) Recording device, recording method, reproduction device, reproduction method, recording medium, and program
HK1157551B (en) Recording device, recording method, reproduction device, reproduction method, recording medium, and program
JP2012105359A (en) Reproduction device, reproduction method, and recording medium