
HK1182245B - 3d disparity maps - Google Patents


Info

Publication number: HK1182245B
Application number: HK13109319.0A
Authority: HK (Hong Kong)
Prior art keywords: resolution, disparity, disparity value, picture, resolutions
Other languages: Chinese (zh)
Other versions: HK1182245A1 (en)
Inventors: Thierry Borel, Ralf Ostermann, Wolfram Putzke-Roeming
Original assignee: InterDigital CE Patent Holdings
Priority claimed from: PCT/IB2011/000708 (WO2011121437A1)

Description

3D Disparity Maps
Cross Reference to Related Applications
This application claims the benefit of the filing dates of the following U.S. provisional applications, which are hereby incorporated by reference in their entirety: (i) No. 61/397,418, entitled "3D Disparity Maps", filed on June 11, 2010; and (ii) No. 61/319,566, entitled "Dense Disparity Maps", filed on March 31, 2010.
Technical Field
Implementations related to 3D (three-dimensional) video are described herein. Various specific implementations relate to disparity maps for video images.
Background
Stereoscopic video provides two video images including a left video image and a right video image. Depth and/or disparity information may also be provided for both video images. The depth and/or disparity information may be used for a variety of processing operations on the two video images.
Disclosure of Invention
According to one general aspect, a disparity value for a particular location in a picture is accessed. The disparity value indicates disparity with respect to a particular resolution. The accessed disparity value is modified based on multiple resolutions to produce a modified disparity value.
According to another general aspect, a signal or signal structure includes a disparity portion that includes a disparity value for a particular location in a picture. The picture has a particular resolution. The disparity value indicates disparity with respect to another resolution that is different from the particular resolution and is based on multiple resolutions.
According to another general aspect, a disparity value for a particular location in a picture is accessed. The picture has a particular resolution. The disparity value indicates disparity with respect to another resolution that is different from the particular resolution and is based on multiple resolutions. The accessed disparity value is modified to produce a modified disparity value that indicates disparity with respect to the particular resolution.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular way, it should be clear that the implementations may be configured or embodied in various ways. For example, an implementation may be performed as a method, embodied in a device such as, for example, a device configured to perform a set of operations or a device storing instructions to perform a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.
Drawings
Fig. 1 is a graphical representation of actual depth with parallel cameras.
Fig. 2 is a graphical representation of disparity values.
Fig. 3 is a graphical representation of the relationship between apparent depth and disparity.
Fig. 4 is a graphical representation of crossing (converged) cameras.
Fig. 5 is a graphical representation of occlusions in a stereoscopic video image pair.
Fig. 6 is a block/flow diagram depicting one implementation with different native and transmission formats.
Fig. 7 is a tabular representation of an example of a common-multiple representation of disparity values.
Fig. 8 is a block/flow diagram depicting one example of a process for transmitting and using a common-multiple representation of disparity values.
Fig. 9 is a block/flow diagram depicting one example of a transmission system that may be used with one or more implementations.
Fig. 10 is a block/flow diagram depicting one example of a receiving system that may be used with one or more implementations.
Detailed Description
As a preview of some of the features presented in this application, at least one implementation describes the use of disparity values based on a resolution much greater than the maximum resolution of any standard display. In this application, the term "resolution" generally refers to horizontal resolution and is measured in terms of, for example, the number of pixels of a display, the number of pixel cells of a display, or the number of elements of a digital image. The non-standard resolution is an integer that is easily converted to one or more of several standard display resolutions. In this particular implementation, the effective display resolution is the least common multiple of several standard display resolutions. Disparity values for the effective display resolution are represented in integer format. Because the non-display resolution is large, the disparity values may be large. However, the integer representation guarantees sub-pixel accuracy when the disparity values are converted down to a standard display resolution.
Moving on from the preview above, fig. 1 illustrates the concept of depth in a video image. Fig. 1 shows a right camera 105 with a sensor 107, and a left camera 110 with a sensor 112. The two cameras 105, 110 capture images of an object 115. For illustrative purposes, the object 115 is depicted as a cross, with a detail 116 located at the right side of the cross (see fig. 2). The right camera 105 has a capture angle 120 and the left camera 110 has a capture angle 125. The two capture angles 120, 125 overlap in a 3D volumetric region 130.
Because the object 115 is in the 3D volumetric region 130, the object 115 is visible to both cameras 105, 110, and the object 115 can be perceived as having depth. The object 115 has an actual depth 135. The actual depth 135 is generally referred to as the distance from the object 115 to the cameras 105, 110. More specifically, the actual depth 135 may be referred to as the distance from the object 115 to a stereo camera baseline 140, the stereo camera baseline 140 being a plane defined by the entrance pupil planes of the two cameras 105, 110. The entrance pupil plane of the camera is typically inside the zoom lens and is therefore typically not physically accessible.
The cameras 105, 110 are also shown as having a focal length 145. The focal length 145 is the distance from the exit pupil plane to the sensors 107, 112. For illustrative purposes, the entrance pupil plane and the exit pupil plane are shown as coincident, although in most cases they are slightly separated. In addition, the cameras 105, 110 are shown as having a baseline length 150. The baseline length 150 is the distance between the centers of the entrance pupils of the cameras 105, 110, and is therefore measured at the stereo camera baseline 140.
Object 115 is imaged by each of cameras 105 and 110 as a real image on each of sensors 107 and 112. These real images include a real image 117 of the detail 116 on the sensor 107, and a real image 118 of the detail 116 on the sensor 112. As shown in fig. 1, the real image is inverted as known in the art.
Depth is closely related to disparity. Fig. 2 shows a left image 205 captured from the camera 110, and a right image 210 captured from the camera 105. Both images 205, 210 include representations of the object 115 with the detail 116. The image 210 includes a detail image 217 of the detail 116, and the image 205 includes a detail image 218 of the detail 116. The rightmost point of the detail 116 is captured in a pixel 220 in the detail image 218 of the left image 205, and in a pixel 225 in the detail image 217 of the right image 210. The horizontal distance between the locations of the pixel 220 and the pixel 225 is the disparity 230. The object images 217, 218 are assumed to be vertically aligned, so that the image of the detail 116 has the same vertical position in both images 205, 210. The disparity 230 provides the perception that the object 115 has depth when the left and right images 205, 210 are viewed by the left and right eyes, respectively, of a viewer.
Fig. 3 shows the relationship between parallax and perceived depth. Three observers 305, 307, 309 are shown viewing a stereoscopic pair of objects on respective screens 310, 320, 330.
The first observer 305 views a left view 315 of the object and a right view 317 of the object with positive parallax. The positive disparity reflects the fact that the left view 315 of the object is to the left of the right view 317 of the object on the screen 310. Positive parallax results in a perceived or virtual object 319 appearing behind the plane of the screen 310.
The second observer 307 views a left view 325 of the object and a right view 327 of the object with zero disparity. The zero disparity reflects the fact that the left view 325 of the object is in the same position on the screen 320 as the right view 327 of the object. Zero disparity results in a perceived or virtual object 329 that appears at the same depth as the screen 320.
The third observer 309 views a left view 335 of the object and a right view 337 of the object with negative disparity. The negative disparity reflects the fact that the left view 335 of the object is to the right of the right view 337 of the object on the screen 330. Negative parallax results in a perceived or virtual object 339 appearing in front of the plane of the screen 330.
It is worth noting at this point that disparity and depth may be used interchangeably in various implementations unless context indicates otherwise or requires otherwise. From equation 1, it is known that disparity is inversely proportional to depth:

d = (f × b) / D    (equation 1)

where "D" is the depth (135 in fig. 1), "b" is the baseline length (150 in fig. 1) between the two stereo-image cameras, "f" is the focal length (145 in fig. 1) of each camera, and "d" is the disparity (230 in fig. 2) between two corresponding feature points.
Equation 1 above is valid for parallel cameras with the same focal length. More complex formulas can be defined for other situations, but in most cases equation 1 can be used as an approximation. In addition, however, as known to those of ordinary skill in the art, equation 2 below is valid at least for various arrangements of crossing (converged) cameras:

D = (f × b) / (d∞ − d)    (equation 2)

where d∞ is the disparity value of an object at infinity. d∞ depends on the angle of intersection and on the focal length, and is expressed in, for example, meters rather than in a number of pixels. The focal length has been discussed above with reference to fig. 1 and the focal length 145. The angle of intersection is shown in fig. 4.
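As a numeric illustration of equations 1 and 2, consider the following sketch. The camera parameters (baseline, focal length, disparities) and the function names are illustrative assumptions only, not values taken from the patent:

```python
# Sketch of equations 1 and 2; all parameter values are illustrative.

def depth_parallel(d, b, f):
    """Equation 1: D = f * b / d, valid for parallel cameras with the
    same focal length. d, b, and f share one linear unit (e.g., meters)."""
    return f * b / d

def depth_converged(d, d_inf, b, f):
    """Equation 2: D = f * b / (d_inf - d), for converged (crossing)
    cameras, where d_inf is the disparity of an object at infinity."""
    return f * b / (d_inf - d)

# 65 mm baseline, 50 mm focal length, 0.5 mm disparity on the sensor:
print(depth_parallel(d=0.0005, b=0.065, f=0.05))              # 6.5 (meters)
# Zero disparity with converged cameras corresponds to the convergence distance:
print(depth_converged(d=0.0, d_inf=0.0005, b=0.065, f=0.05))  # 6.5 (meters)
```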
Fig. 4 includes the camera 105 and the camera 110 in a crossing (converged) configuration rather than the parallel configuration of fig. 1. An angle 410 shows the angle at which the lines of sight of the cameras 105, 110 cross, and the angle 410 may be referred to as the angle of intersection.
A disparity map is used to provide disparity information for a video image. A disparity map generally refers to a set of disparity values with a geometry corresponding to the pixels of an associated video image.
A dense disparity map generally refers to a disparity map having a spatial and temporal resolution that is generally the same as the resolution of the associated video image. Temporal resolution refers to, for example, frame rate, and may be, for example, 50Hz or 60 Hz. Thus, a dense disparity map typically has one disparity sample per pixel location. The geometry of a dense disparity map is typically the same as that of the corresponding video image, e.g. a rectangle with the following horizontal and vertical dimensions in pixels:
(i) 1920 × 1080 (or 1920 × 1200);
(ii) 1440 × 1080 (or 1440 × 900);
(iii) 1280 × 720 (or 1280 × 1024, 1280 × 960, 1280 × 900, 1280 × 800);
(iv) 960 × 640 (or 960 × 600, 960 × 576, 960 × 540);
(v) 2048 × 1536 (or 2048 × 1152);
(vi) 4096 × 3072 (or 4096 × 3112, 4096 × 2304, 4096 × 2400, 4096 × 2160, 4096 × 768); or
(vii) 8192 × 4320 (or 8192 × 8192, 8192 × 4096, 7680 × 4320).
The resolution of a dense disparity map may be substantially the same as, but may also differ somewhat from, the resolution of the associated image. In one implementation, disparity information is difficult to obtain at the image boundaries. Thus, in that implementation, disparity values for the boundary pixels are not included in the disparity map, and the disparity map is smaller than the associated image.
Downsampled disparity maps generally refer to disparity maps having a resolution less than the native video resolution (e.g., divided by a factor of 4). The downsampled disparity map will, for example, have one disparity value per block of pixels.
A sparse disparity map generally refers to a set of disparities corresponding to a limited number of pixels (e.g., 1000) in the respective video image that are considered to be easily trackable. The limited number of pixels selected generally depends on the content itself, and is far fewer than the one to two million pixels in an image (1280 × 720 or 1920 × 1080). The pixel subset is typically selected automatically or semi-automatically by a tracking tool capable of detecting feature points; such tracking tools are readily available. The feature points may be, for example, edge or corner points in the picture that can be easily tracked in other images. The selected subset of pixels preferably represents features such as the high-contrast edges of objects.
Disparity maps, or more generally disparity information, may be used for a variety of processing operations. Such operations include, for example, view interpolation (rendering) to adjust the 3D effect on consumer devices, intelligent subtitle placement, visual effects, and graphics insertion.
In one particular implementation, graphics are inserted into the background of an image. In this implementation, a 3D presentation includes a stereoscopic video interview between a sports commentator and a soccer player, both in the foreground. The background includes a view of a stadium. In this example, a pixel is selected from the stereoscopic video interview when the corresponding disparity value is smaller (that is, closer) than a predetermined value. Conversely, if the disparity value is larger (that is, farther) than the predetermined value, the pixel is selected from the graphic. This allows, for example, a director to display the interview participants in front of a graphical image rather than in front of the actual stadium background. In other variations, the background is replaced with another environment, such as, for example, a court, during a replay of the player's most recent scoring play.
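A rough sketch of this selection rule follows; the array shapes, names, and threshold are illustrative assumptions rather than anything specified in the patent:

```python
import numpy as np

def insert_graphics(video, graphic, disparity, threshold):
    """Per-pixel selection for background replacement: keep the video
    pixel where its disparity is smaller (closer) than the threshold,
    otherwise take the pixel from the inserted graphic.

    video, graphic: (H, W, 3) arrays; disparity: (H, W) array, using
    the convention that smaller (more negative) values are closer."""
    foreground = disparity < threshold
    return np.where(foreground[..., None], video, graphic)

h, w = 4, 6
video = np.full((h, w, 3), 200, dtype=np.uint8)            # interview pixels
graphic = np.zeros((h, w, 3), dtype=np.uint8)              # inserted background
disparity = np.full((h, w), 30)
disparity[:, :3] = -10                                     # left half is closer
composite = insert_graphics(video, graphic, disparity, threshold=0)
print(composite[0, :, 0])                                  # [200 200 200 0 0 0]
```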
In one implementation, the 3D effect is softened (reduced) according to user preferences. To reduce the 3D effect (reduce the absolute value of the disparity), a new view is interpolated using the disparity and the video images. For example, the new view is placed at a position between the existing left and right views, and the new view replaces one of the left and right views. The new stereoscopic image pair thus has a shorter baseline length and reduced parallax, and therefore a reduced 3D effect.
In another implementation, extrapolation rather than interpolation is performed to enlarge the apparent depth, thereby enhancing the 3D effect. In this implementation, the new view is extrapolated corresponding to a virtual camera having a longer baseline length relative to one of the original left and right views.
In another embodiment, disparity maps are used to intelligently place subtitles in a video image so as to reduce or avoid viewer discomfort. For example, subtitles should generally have a perceived depth in front of any object that the subtitles occlude. At the same time, the subtitles should generally have a depth comparable to that of the region of interest, rather than appearing too far in front of the objects in the region of interest.
For many 3D processing operations, dense disparity maps are preferred over downsampled disparity maps or sparse disparity maps. For example, when disparity maps are used to achieve a user-controllable 3D effect, per-pixel based disparity information is generally preferred. Better results may generally be achieved based on per-pixel disparity information, as the use of sparse or downsampled disparity maps may degrade the quality of the synthesized view.
The disparity values may be represented in a variety of formats. Several implementations represent disparity values for storage or transmission using the following format:
(i) signed integer, in two's complement:
(a) a negative disparity value indicates depth in front of the screen;
(b) a zero disparity value indicates an object in the plane of the screen;
(ii) units of 1/8 pixel;
(iii) disparity values represented with 16 bits.
A typical parallax range varies from +80 pixels to −150 pixels. This is generally sufficient for a forty-inch display with a horizontal resolution of 1920 or 2048.
With 1/8-pixel precision, that range spans +640 to −1200 units, which can be represented with 11 bits + 1 sign bit = 12 bits.
To maintain the same 3D effect on an 8k display (with a horizontal resolution equal to four times that of a display approximately 1920 or 2048 pixels wide), two additional bits are typically needed to encode the disparity: 12 + 2 = 14 bits.
The remaining 2 bits are reserved for future use.
Various implementations using the above-described format also provide dense disparity maps. Thus, to form a dense disparity map in such an implementation, the above-described 16-bit value is provided for each pixel location in the respective video image.
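A sketch of this 16-bit sample format (signed two's complement, 1/8-pixel units) might look as follows; the helper names and byte order are assumptions for illustration:

```python
import struct

def encode_disparity(pixels):
    """Encode a disparity in pixels as a 16-bit two's-complement
    integer counted in 1/8-pixel units."""
    units = round(pixels * 8)                  # 1/8-pixel units
    if not -32768 <= units <= 32767:
        raise ValueError("disparity out of 16-bit range")
    return units & 0xFFFF                      # 16-bit two's-complement pattern

def decode_disparity(raw):
    """Decode the 16-bit pattern back to a disparity in pixels."""
    units = struct.unpack('<h', struct.pack('<H', raw))[0]
    return units / 8.0

# The typical range quoted above (+80 to -150 pixels) fits easily:
assert decode_disparity(encode_disparity(-150.0)) == -150.0
assert decode_disparity(encode_disparity(80.125)) == 80.125    # 1/8-pixel step
```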
Parallax and related depth variations cause occlusions between different views of a scene. Fig. 5 shows a left view 510 and a right view 520 that are combined in the viewer's brain to produce a 3D scene 530. The left view 510, the right view 520, and the 3D scene 530 each contain three objects: a fat cylinder 532, an ellipsoid 534, and a thin cylinder 536. However, as shown in fig. 5, two of the three objects 532, 534, 536 are at different relative locations in the two views 510, 520 and the 3D scene 530. The two objects are the fat cylinder 532 and the thin cylinder 536. The ellipsoid 534 is at the same relative location in each of the views 510, 520 and the 3D scene 530.
As will be explained in the following simplified discussion, the different relative locations may produce occlusions. The left view 510 is shown in a left image 540, which also reveals occlusion regions 545 and 548. The occlusion regions 545 and 548 are visible only in the left view 510 and not in the right view 520. This is because (i) the region corresponding to the occlusion region 545 in the right view 520 is covered by the fat cylinder 532, and (ii) the region corresponding to the occlusion region 548 in the right view 520 is covered by the thin cylinder 536.
Similarly, a right view 520 is shown in the right image 550, which also reveals two occlusion regions 555 and 558. The occlusion regions 555, 558 are visible only in the right view 520 and not in the left view 510. This is because (i) the region corresponding to the occlusion region 555 in the left view 510 is covered by the fat cylinder 532 and (ii) the region corresponding to the occlusion region 558 in the left view 510 is covered by the thin cylinder 536.
In view of the possible occlusions in a stereoscopic image pair, it is useful to provide two disparity maps for the pair. In one such implementation, a left disparity map is provided for the left video image and a right disparity map is provided for the right video image. Known algorithms can be used to assign disparity values to the pixel locations of each image for which a disparity value cannot be determined by standard disparity-estimation means. Occlusion regions can then be determined by comparing the left and right disparity values.
As an example of comparing left and right disparity values, consider a left-eye image and the corresponding right-eye image. A pixel L lies on the Nth line of the left-eye image with horizontal coordinate xL. The pixel L is determined to have a disparity value dL. The pixel R is the pixel on the Nth line of the corresponding right-eye image whose horizontal coordinate is closest to xL + dL. The pixel R is determined to have a disparity value dR of about −dL. Then it can be considered, with a large degree of confidence, that there is no occlusion at L or R, because the disparities correspond to each other. That is, given their determined disparities, the pixels L and R point at each other.
However, if dR is not approximately equal to −dL, then an occlusion may exist. For example, if the two disparity values are significantly different after the sign is taken into account, an occlusion can generally be considered to be present with a high degree of confidence. In one implementation, significantly different is indicated by |dL + dR| > 1. In addition, if one of the disparity values (dR or dL) is unavailable, an occlusion can generally be considered to be present with an even greater degree of confidence. A disparity value may be unavailable because, for example, it cannot be determined. Occlusions generally involve only one of the two images. For example, the portion of the scene shown by the pixel whose disparity has the smaller magnitude, or whose disparity value is unavailable, is generally considered to be occluded in the other image.
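The consistency check just described can be sketched as follows, assuming the convention above (dR ≈ −dL for a consistent pair) and using None to mark an unavailable value:

```python
def is_occluded(x, left_disp_row, right_disp_row):
    """Check one left-image pixel at column x against the right map.
    left_disp_row / right_disp_row: one scan line of each disparity
    map; None marks an unavailable disparity value."""
    dL = left_disp_row[x]
    if dL is None:
        return True                          # unavailable -> assume occluded
    xR = round(x + dL)                       # closest corresponding column
    if not 0 <= xR < len(right_disp_row):
        return True                          # no counterpart in the right image
    dR = right_disp_row[xR]
    if dR is None:
        return True
    return abs(dL + dR) > 1                  # dR should be about -dL

left_row = [2, 2, 2, None]
right_row = [0, 0, -2, -2]
print(is_occluded(0, left_row, right_row))   # False: pixels point at each other
print(is_occluded(3, left_row, right_row))   # True: disparity unavailable
```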
One possibility to represent disparity values is to use integers to represent the number of pixels of disparity at a given pixel location in the video image. The disparity value represents the number of pixels of disparity for a particular horizontal resolution of the video image. Therefore, the disparity value depends on a particular horizontal resolution. Such an implementation is useful and may be efficient.
However, other implementations require disparity values with sub-pixel accuracy. Such implementations typically use floating-point numbers to represent disparity values so that a fractional part may be included in the disparity value. Several of these implementations provide disparity values that are specific to a given horizontal resolution. These implementations are also useful and may be efficient.
Some other implementations represent disparity values as percentage values. Thus, instead of representing disparity as a number of pixels, disparity is represented as a percentage of the horizontal resolution. For example, if the disparity for a given pixel location is ten pixels, and the horizontal resolution is 1920, then the percentage disparity value is (10/1920) × 100. Such an implementation may also provide sub-pixel accuracy of parallax. The percentage value representation is typically a floating point representation rather than an integer representation. For example, one pixel disparity for a display with horizontal resolution 1920 is 1/1920, equal to 0.0005208 or 0.05208%.
Also, such a percentage disparity value may be directly applied to other horizontal resolutions. For example, assume that (i) the video image has a horizontal resolution of 1920, (ii) the video image is transmitted to the user's home, and (iii) the user's display device has a horizontal resolution of 1440. In this case, the user's display device (or set-top box, some other processor, or processing device) typically converts the horizontal resolution of the video image from 1920 to 1440, and also converts the disparity value so that the disparity value corresponds to the horizontal resolution of 1440. The conversion may be performed, for example, by multiplying the percentage disparity value by the horizontal resolution. For example, if the percentage disparity for a given pixel location is 0.5%, and the horizontal resolution is 1920, then the absolute disparity value is 1/2 x 1920/100. Several of these implementations use a single disparity value equal to a percentage disparity value in the transmission and storage of disparity values, regardless of the horizontal resolution of the video image and disparity map. Such an implementation is also useful and may be efficient.
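A sketch of the percentage representation and the conversions just described (the function names are illustrative):

```python
def to_percent(disparity_px, width):
    """Absolute disparity in pixels -> percentage of horizontal resolution."""
    return disparity_px / width * 100.0

def to_pixels(percent, width):
    """Percentage disparity -> absolute pixels at a given resolution."""
    return percent * width / 100.0

p = to_percent(10, 1920)       # 10 px at 1920 -> 0.52083...%
print(p)
print(to_pixels(p, 1440))      # the same percentage at 1440 -> 7.5 px
print(to_pixels(0.5, 1920))    # 0.5% at 1920 -> 9.6 px, as in the text
```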
As described above, the transmission system may use a horizontal resolution of a transmission format different from that of the video image. In addition, the receiving system may display video images using different horizontal resolutions. Thus, it may be necessary to convert from one horizontal resolution to another. Such conversion not only changes the resolution of the video image but also requires adjustment of the disparity value. In general, such conversion is required not only for absolute disparity values but also for percent disparity values.
The following example provides more detail regarding some tradeoffs between various implementations:
● (i) One implementation formats the disparity value as an absolute value (a number of pixels) for a given video resolution, with a precision of 1/8 of a pixel (e.g., an object has a disparity of 10 pixels on video content with 1920 horizontal pixels).
● (ii) This format has many advantages, including simplicity and ease of processing.
● (iii) In one such system, 11 bits are used: 8 bits provide an integer part of up to 255 pixels of disparity, and 3 bits provide the fractional part (to achieve 1/8 accuracy or precision). Note that a sign bit may also be added, or the system may instead devote one of the bits to the sign and provide disparity values of +/−127 pixels.
● (iv) If the video image needs to be reformatted during transmission, the disparity map is also reformatted, which may result in information loss. For example, referring to fig. 6, one implementation uses a native format 610 with a horizontal resolution of 1920 and a transmission format 620 downsampled to a horizontal resolution of 1280 (or, in another implementation, 1440). The depth or disparity map is filtered, like the video image, before the sub-sampling, which typically results in a loss of depth detail. The filtering occurs in a filtering and sub-sampling operation 630, which is applied to both the video image and the disparity image.
● (v) Additionally, the disparity values themselves are converted and typically corrupted. For example, after the down-sampling that lowers the resolution of the disparity map (that is, reduces the number of disparity values), the disparity values are converted to the resolution of the transmission format. When changing from 1920 to 1280, a disparity value of 10 pixels becomes 6.6666..., which is then rounded, for example, to 6.625, since the fractional part can only be a multiple of 0.125 (1/8).
● (vi) After transmission, if the display is 1920 pixels wide, the final disparity value is 6.625 × 1920/1280 = 9.9375, which represents some distortion compared to the original value of 10. 9.9375 may be rounded up or down to the nearest integer or, for example, to the nearest 1/8, potentially causing further information loss. If the value is rounded down, the loss is more severe. (A numeric sketch of this loss follows the list.)
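The numbers in items (v) and (vi) can be reproduced directly; the following sketch assumes simple linear rescaling plus 1/8-pixel rounding:

```python
def quantize_eighths(d):
    """Round to the nearest multiple of 1/8, the format's precision."""
    return round(d * 8) / 8

d_native = 10.0                               # pixels at 1920
d_transmit = d_native * 1280 / 1920           # 6.666... at 1280
d_quantized = quantize_eighths(d_transmit)    # 6.625 after 1/8 rounding
d_display = d_quantized * 1920 / 1280         # 9.9375 instead of the original 10

print(d_transmit, d_quantized, d_display)
```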
One solution is to use a percentage disparity that may be common to all horizontal resolutions. Such implementations described above have both advantages and disadvantages. The use of the percentage disparity value allows the conversion operation before transmission to be omitted.
Another solution is to use integer values that are not specific to any one of the commonly used resolutions (note that it is usually assumed that the pictures have been vertically rectified and subjected to other processing). This solution suggests defining a reference resolution (or virtual resolution) of 11520 pixels, which is referred to in this application as the smallest common multiple ("SCM") of several standard TV resolutions (720, 960, 1280, 1440, 1920). Note that the SCM is also referred to as the "least common multiple" or "lowest common multiple" in various references.
At least one implementation of such SCM solution has many advantages (other implementations need not have all of these advantages) including:
● (i) because the disparity values are integers, determining and storing the disparity values is simple and makes the disparity values easy to manipulate and process.
● (ii) the disparity value is no longer strictly absolute but has a relative aspect and is therefore independent of the native video resolution.
● (iii) does not require a fractional part.
● (iv) The disparity value resembles a percentage, because it is relative and independent of the native video resolution. However, the disparity values are integers, and therefore there is no obvious need to encode a complicated number like 0.00868% to describe the minimum disparity value. (The minimum disparity value is one pixel, and 1/11520 equals 0.00868%.)
● (v) there is no apparent need to transcode the disparity value during transmission, as the disparity value refers to 11520.
● (vi) when the SCM-based disparity value arrives, for example, at a set-top box ("STB"), the STB calculates the true absolute disparity for a given video resolution by performing very simple operations such as, for example, the following:
o (a) for 1920 resolution, disparity/6;
o (b) for 1440 resolution, disparity/8;
o (c) for 1280 resolution, disparity/9; and
o (d) disparity/12 for 960 resolution.
● (vii) Regardless of which channels are used, the disparity information does not need to be touched during transmission, since it is never transcoded.
● (viii) its operation is simple to implement even for newer consumer resolutions like 2k, 4k, 8k and can be easily implemented in the STB processing unit. Note that 2k generally refers to an image having a horizontal pixel resolution of 2048, 4k generally refers to 4096, and 8k generally refers to 8192. The operation is as follows:
o (a) for 2048 resolution, disparity × 8/45;
o (b) disparity × 16/45 for 4096 resolution; and
o (c) disparity x 32/45 for 8192 resolution.
In practice, one or more SCM implementations (1) determine disparity values for existing horizontal resolutions of respective video content; (2) converting those disparity values to 11520 scale by simple multiplication and/or division to generate SCM disparity values; (3) storing and transmitting the SCM disparity values without transcoding; and (4) converting the received SCM disparity value to a resolution of an output display using simple multiplication and/or division. Because transcoding is not used, such solutions generally do not suffer from information loss (e.g., rounding loss) caused by transcoding. Note that the above process does not change the resolution of the disparity map. Instead, the existing disparity values (for the existing resolution) are scaled so that they are based on, or reflect, a reference resolution (or virtual resolution) that is different from the actual resolution.
Various implementations generate disparity values through simple mathematical operations that are inverse to those described above. For example, to generate an SCM disparity value, the received absolute disparity value is multiplied and/or divided by one or two integers as follows:
o (i) 1920 disparity × 6 = SCM disparity;
o (ii) 1440 disparity × 8 = SCM disparity;
o (iii) 1280 disparity × 9 = SCM disparity;
o (iv) 960 disparity × 12 = SCM disparity;
o (v) 2048 disparity × 45/8 = SCM disparity;
o (vi) 4096 disparity × 45/16 = SCM disparity;
o (vii) 8192 disparity × 45/32 = SCM disparity.
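A sketch of the resulting round trip, with the scale factors copied from the two lists above; exact fractions stand in for the integer arithmetic an STB would use:

```python
from fractions import Fraction

# Factor such that SCM disparity = native disparity * factor.
SCM_FACTOR = {960: Fraction(12), 1280: Fraction(9), 1440: Fraction(8),
              1920: Fraction(6), 2048: Fraction(45, 8),
              4096: Fraction(45, 16), 8192: Fraction(45, 32)}

def to_scm(disparity, resolution):
    """Native disparity -> disparity at the 11520 reference resolution."""
    return disparity * SCM_FACTOR[resolution]

def from_scm(scm_disparity, resolution):
    """11520-based disparity -> disparity at a target display resolution."""
    return scm_disparity / SCM_FACTOR[resolution]

d_scm = to_scm(10, 1920)               # 10 px at 1920 -> 60 at 11520
print(d_scm)                           # 60
print(from_scm(d_scm, 1920))           # back to 10, with no transcoding loss
print(float(from_scm(d_scm, 1280)))    # 6.666...: sub-pixel value retained
```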
Fig. 7 provides a more detailed process for determining the least common multiple of the various horizontal resolutions. Column 710 lists the different horizontal resolutions. Column 720 lists the prime factorization of each horizontal resolution. For example, 960 is factored into 2^6 × 3 × 5, where 2^6 is 2 raised to the 6th power. Thus, 960 = 64 × 3 × 5. Also note that, with respect to the horizontal resolution of 1280, 3^0 is equal to 1.
The least common multiple of the first four resolutions 960, 1280, 1440, and 1920 is 2^8 × 3^2 × 5, that is, 11520. By multiplying by 2 to the appropriate power, and then dividing by the factors 3^2 and 5, which do not appear in 2k, 4k, and 8k, 11520 can also be used with the 2k, 4k, and 8k resolutions. Note that in various implementations, multiplication by a power of 2 is performed using a bitwise left-shift operation rather than an actual multiplication. Fig. 7 includes a column 730 that provides conversion equations for converting between 11520 and the various resolutions shown in column 710.
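The derivation of 11520, and of the 368640 alternative discussed below, can be checked in a few lines (math.lcm requires Python 3.9 or later):

```python
from math import lcm   # Python 3.9+

print(lcm(960, 1280, 1440, 1920))                     # 11520 = 2**8 * 3**2 * 5
print(lcm(960, 1280, 1440, 1920, 2048, 4096, 8192))   # 368640 = 11520 * 2**5
print(11520 << 5)                                     # the same, via a left shift
```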
The conversion equations of column 730 may be used to scale disparity values according to the resolutions supported by a variety of common display sizes (display size refers to the physical size of the display, measured in, for example, inches or centimeters). In the example of fig. 6, an input disparity value based on, for example, the 1920 horizontal resolution is scaled by a factor of 6 to convert the disparity value to a new disparity value based on the 11520 horizontal resolution. The new disparity values are also based on the horizontal resolutions of 960, 1280, and 1440, since those resolutions are accommodated by, and were used in determining, the 11520 resolution.
An alternative implementation simply uses a disparity resolution of 11520 × 2^5 = 368640. In this alternative implementation, no multiplication is required to convert 368640-based values back to any of the original resolutions.
11520 is used for various implementations. However, other values may be used in other implementations. In one implementation, 11520 is doubled to 23040. In a second implementation, 368640 is doubled to 737280.
Alternatively, a different set of horizontal resolutions may be used in various implementations, which results in a different SCM. For example, in another implementation, only the 1920 and 1440 output resolutions are of interest, so this implementation uses an SCM of 5760. Then, to generate the SCM disparity values, disparity values from the 1920 resolution are multiplied by a factor of 3, while disparity values from the 1440 resolution are multiplied by a factor of 4.
It should be clear that not all of the various implementations are strictly SCM implementations. For example, even the 11520 value is not the SCM of all seven resolutions listed in column 710; rather, the 368640 value is. However, the implementations described in this application are generally referred to as SCM implementations even if the reference value is not the least common multiple of all of the horizontal resolutions.
Note that SCM implementations provide sub-pixel accuracy. For example, for the 1920 resolution, disparity values are converted to and from the 11520 resolution using a factor of 6, thus providing 1/6-pixel accuracy. More specifically, if a disparity value based on 11520 is 83, then the disparity value based on 1920 is 83/6, that is, 13 5/6, which clearly carries 1/6-pixel accuracy. This provides various advantages in terms of quality, as well as headroom for future use. For example, if the 1920 resolution is replaced by the 2k resolution, the 11520-based disparity values still provide sub-pixel accuracy of 8/45 pixel, which is slightly coarser than 1/6 (7.5/45) pixel but still finer than 1/5 (9/45) pixel.
At least one implementation using the SCM resolution of 11520 operates with a two-byte (sixteen-bit) format. Typical disparity values on a 1920 × 1080 display tend to vary between +80 and −150 pixels. Multiplying those numbers by six for the 11520 reference resolution yields a range of +480 to −900. This range of 1380 values can be represented using eleven bits (2^11 = 2048). An alternative implementation uses ten bits to represent the absolute value of the disparity (the maximum absolute value of the disparity being 900) and one additional bit to represent the sign.
Yet another implementation saves a bit by making the sign of the disparity implicit. For example, the disparity of a pixel in the left view is encoded together with its sign, but the disparity of the corresponding pixel in the right view is assumed to have the opposite sign.
Another implementation allocates a bit indicating the view to which a dense disparity map corresponds, in order to be able to provide one dense disparity map for each view (both left and right), thereby alleviating the problems caused by occlusions. Yet another implementation provides an implicit association between an image (left or right) and its dense disparity map, so that no bits need to be spent on this information. Variations of these implementations use one or more additional bits to signal other types of maps or images. One such implementation uses two bits to indicate whether the map is (i) a left-image disparity map, (ii) a right-image disparity map, (iii) an occlusion map, or (iv) a transparency map. One implementation uses a sixteen-bit format, with 11 bits indicating the range −900 to +480, 2 bits indicating the type of map, and 3 bits held in reserve.
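One possible sketch of such a sixteen-bit layout follows; the exact bit positions (disparity in the upper bits, map type in the lowest two) are an assumption for illustration:

```python
MAP_TYPES = {0: "left disparity", 1: "right disparity",
             2: "occlusion", 3: "transparency"}

def pack_sample(disparity, map_type):
    """Pack an 11-bit two's-complement disparity (-900..+480 fits) and
    a 2-bit map type into 16 bits; the top 3 bits are spare."""
    assert -1024 <= disparity <= 1023 and 0 <= map_type <= 3
    return ((disparity & 0x7FF) << 2) | map_type

def unpack_sample(sample):
    raw = (sample >> 2) & 0x7FF
    disparity = raw - 2048 if raw >= 1024 else raw    # sign-extend 11 bits
    return disparity, MAP_TYPES[sample & 0x3]

s = pack_sample(-900, 1)
print(unpack_sample(s))            # (-900, 'right disparity')
```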
FIG. 8 provides a block/flow diagram illustrating the operation of one or more implementations. Fig. 8 also illustrates some tradeoffs between different implementations.
Fig. 8 includes a processing chain 810 that processes video. The video image 811 has a horizontal resolution of 1920. However, the transmit format of processing chain 810 has a resolution of 1280. The video image 811 is then filtered and downsampled in operation 812 to generate a video image 813 having a horizontal resolution of 1280. Filtering and downsampling are performed together in the processing chain 810. However, in other implementations filtering and downsampling are performed separately. This filtering is used, for example, when downsampling the video image 811 to low-pass filter the video image 811 to prevent aliasing. The video image 813 is transmitted in a sending and/or storing operation 814.
The receive side of the processing chain 810 accesses a received video image 815 that may be the same as, similar to, or different from the video image 813. For example, in one implementation, the video image 815 is a stored version of the video image 813. Additionally, in another implementation, video picture 815 represents a reconstructed version of video picture 813 after a source encoding and decoding operation (not shown). And, in yet another implementation, video picture 815 represents an error corrected version of video picture 813 after a channel encoding and decoding (including error correction) operation (not shown). The video image 815 is processed in an upsampling operation 816 to produce a video image 817 having a 1920 horizontal resolution as in the original video image 811.
Fig. 8 also includes a processing chain 820 that processes disparity images corresponding to video images processed in processing chain 810. The parallax image 821 has a horizontal resolution of 1920 and includes an integer-valued parallax value based on the resolution of 11520. Note that a disparity image generally refers to any accumulation of disparity information, such as, for example, a dense disparity map, a downsampled disparity map, or a sparse disparity map. Also, the disparity map may correspond to, for example, a picture, a frame, a field, a slice, a macroblock, a partition, or some other set of disparity information.
However, the transmit format of processing chain 820 has a horizontal resolution of 1280. Accordingly, the parallax image is filtered and downsampled in operation 822 to generate a parallax image 823 having a horizontal resolution of 1280. Filtering and downsampling are performed together in the processing chain 820, although other implementations separate filtering and downsampling. The filtering, for example, low-pass filters the parallax values of the parallax image 821 before downsampling in order to prevent aliasing.
The parallax values of the parallax image 823 are kept as integer values, which can be accomplished in various ways. In one implementation, the results of the filtering and downsampling operations are rounded to the nearest integer. In another implementation, any fractional portion is simply discarded. Yet another implementation uses a floating-point representation for the parallax values of the parallax image 823. Note that even after the filtering and downsampling produce the 1280 resolution of the parallax image 823, the parallax values are still based on the resolution of 11520.
The parallax image 823 is transmitted in a sending and/or storing operation 824. The receiving side of the processing chain 820 accesses the received parallax image 825. The parallax image 825 may be the same as, similar to, or different from the parallax image 823. For example, in one implementation, the parallax image 825 is a stored version of the parallax image 823. Additionally, in another implementation, the disparity image 825 represents a reconstructed version of the disparity image 823 after source encoding and decoding operations (not shown). Also, in yet another implementation, the parallax image 825 represents an error-corrected version of the parallax image 823 after a channel encoding and decoding (including error correction) operation (not shown). However, if necessary, the parallax values in the parallax image 825 are kept in integer numbers by using, for example, rounding.
The parallax image 825 is processed in an upsample operation 826 to produce a parallax image 827 having 1920 horizontal resolution as in the original parallax image 821. Operation 826 generates an integer value for the disparity image 827 using, for example, rounding and truncation.
The parallax values of the parallax image 827 are converted from values based on the 11520 resolution to values based on the 1920 resolution in a conversion operation 828. As described above, the conversion operation 828 divides each parallax value by 6. The conversion operation 828 generates a parallax image 829. The parallax values of the parallax image 829 are expressed as floating-point numbers in order to maintain sub-pixel precision.
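Processing chain 820 can be condensed into a few lines; the nearest-neighbor resampling below is an assumption standing in for the patent's unspecified filtering, and the random disparity values are purely illustrative:

```python
import numpy as np

def resample_row(row, new_width):
    """Nearest-neighbor horizontal resampling of one disparity row."""
    idx = np.arange(new_width) * len(row) // new_width
    return row[idx]

rng = np.random.default_rng(0)
native = rng.integers(-900, 481, size=1920)    # integer, 11520-based (1920-wide)
transmitted = resample_row(native, 1280)       # 1280-wide, still 11520-based
received = resample_row(transmitted, 1920)     # upsampled at the receiver
display = received / 6.0                       # 1920-based values for the display

print(native.dtype, transmitted.dtype, display.dtype)   # int64 int64 float64
```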
It should be clear that the processing chain 820 provides at least two important advantages. First, the parallax values are integers throughout the processing chain 820 until the final parallax image 829 is provided. Second, although the horizontal resolution of the transmission format differs from that of the native parallax image 821, the actual parallax values are never transcoded. Thus, the parallax values remain applicable to a variety of different horizontal resolutions.
Then, the receiving system processes the video image 817 using the parallax image 829. As described above, this processing may include adjusting 3D effects, positioning subtitles, inserting graphics, or implementing visual effects.
Fig. 8 also depicts a processing chain 830 for comparison purposes. The processing chain 830 also processes parallax images corresponding to the video images processed in the processing chain 810. Processing chain 830 is an alternative to processing chain 820. It should be clear that the entire processing chain 830 is not shown in order to simplify fig. 8, as described below.
The parallax image 831 has a horizontal resolution of 1920 and includes a percentage-based parallax value with a floating-point representation. However, the transmit format of processing chain 830 has a horizontal resolution of 1280. Then, the parallax image 831 is filtered and down-sampled in operation 832 to generate the parallax image 833 with a horizontal resolution of 1280. Operation 832 may be similar to, for example, filtering and downsampling operations 812 or 822. The percentage-based disparity value of the disparity image 833 continues to be represented in floating point format.
The remainder of processing chain 830 (not shown) reflects the remainder of processing chain 820. The parallax image 833 is transferred in a sending and/or storing operation. The receiving side of the processing chain 830 accesses the received disparity image. The received disparity image is up-sampled to a horizontal resolution of 1920 and then the disparity value is converted from a percentage-based value to a value based on 1920 resolution. As described above, this conversion operation is a multiplication of percentage times 1920. However, in contrast to the processing chain 820, the disparity values of the disparity images in the processing chain 830 are always represented in floating point format.
Fig. 8 also depicts a processing chain 840 for comparison purposes. The processing chain 840 also processes the disparity images corresponding to the video images processed in the processing chain 810. Processing chain 840 is an alternative to processing chain 820. It should be appreciated that, as described below, the entire processing chain 840 is not shown in order to simplify FIG. 8.
The parallax image 841 has a horizontal resolution of 1920, and includes parallax values that are based on the 1920 resolution and have a floating-point representation. However, the transmit format of processing chain 840 has a horizontal resolution of 1280. The parallax image 841 is therefore filtered and downsampled in operation 842 to generate a parallax image 843 having a horizontal resolution of 1280. Operation 842 may be similar to, for example, the filtering and downsampling operations 812, 822, or 832. The parallax values of the parallax image 843 continue to be represented in floating-point format.
Then, the parallax values of the parallax image 843 are converted in a conversion operation 850 so as to generate a parallax image 860. A conversion operation 850 converts the disparity value from a value based on 1920 horizontal resolution to a value based on 1280 horizontal resolution. The disparity values of the disparity image 860 continue to be represented in floating-point format.
The remainder of processing chain 840 (not shown) mirrors the remainder of processing chain 820. The parallax image 860 is transmitted in a transmission and/or storage operation. The receiving side of the processing chain 840 accesses the received parallax image. The received parallax image is up-sampled to a horizontal resolution of 1920, and the parallax values are then converted from values based on the 1280 resolution to values based on the 1920 resolution, a conversion that involves multiplying each parallax value by 1920/1280. As with the processing chain 830, and in contrast to the processing chain 820, the parallax values of the parallax images in the processing chain 840 are always represented in floating-point format.
In another implementation of processing chain 840, the conversion operation 850 is not performed. The parallax values of the parallax image 843 therefore remain based on the 1920 horizontal resolution, even though the horizontal resolution of the parallax image 843 is 1280. Thus, this implementation avoids a conversion prior to transmission and a possible reconversion after reception or retrieval. Avoiding the conversion and reconversion, in at least some implementations, also avoids rounding errors. This implementation, like all other implementations in this application, has advantages and may be useful. However, the disparity values are represented by floating-point numbers throughout this implementation.
Referring now to fig. 9, a video transmission system or apparatus 900 is shown to which the above-described features and principles may be applied. The video transmission system or apparatus 900 may be, for example, a headend or transmission system that transmits signals using any of a variety of media, such as, for example, satellite, cable, telephone line, or terrestrial broadcast. The video transmission system or apparatus 900 may also or alternatively be used, for example, to provide signals for storage. The transmission may be provided over the internet or some other network. The video transmission system or apparatus 900 is capable of generating and delivering, for example, video content as well as other content such as, for example, depth indications including, for example, depth and/or disparity values. It should also be appreciated that in addition to providing a block diagram of a video transmission system or apparatus, the blocks of fig. 9 also provide a flow chart of the video transmission process.
The video transmission system or apparatus 900 receives input video from the processor 901. In one implementation, the processor 901 simply provides raw resolution images, such as the parallax images 821, 831, 841 and/or the video image 811, to the video transmission system or apparatus 900. However, in another implementation, the processor 901 is a processor configured to filter and downsample, for example, as described above for operations 812, 822, 832, 842, to generate images such as the video image 813 and/or the parallax images 823, 833, 843. In yet another implementation, the processor 901 is configured to perform disparity conversion, such as, for example, operation 850, to generate a disparity image having converted disparity values, such as, for example, disparity image 860. The processor 901 may also provide metadata to the video transmission system or device 900 indicating, for example, the horizontal resolution of the input image, the horizontal resolution on which the disparity value is based, whether the disparity value is based on a percentage or a common multiple, and other information describing one or more input images.
The video transmission system or apparatus 900 includes an encoder 902 and a transmitter 904 capable of transmitting the encoded signal. The encoder 902 receives video information from the processor 901. The video information may include, for example, video images and/or parallax (or depth) images. The encoder 902 generates an encoded signal from the video and/or disparity information. The encoder 902 may be, for example, an AVC encoder, which can be applied to both video and disparity information. AVC refers to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) H.264 Recommendation (referred to hereinafter as the "H.264/MPEG-4 AVC standard" or variants thereof, such as the "AVC standard", the "H.264 standard", or simply "AVC" or "H.264").
Encoder 902 may include sub-modules including, for example, an assembly unit that receives and assembles various pieces of information into a structured format for storage or transmission. The various pieces of information may include, for example, coded or uncoded video, coded or uncoded disparity (or depth) values, and coded or uncoded elements such as, for example, motion vectors, coding format indicators, and syntax elements. In some implementations, the encoder 902 includes the processor 901, and thus performs the operations of the processor 901.
Transmitter 904 receives the encoded signal from encoder 902 and transmits the encoded signal in one or more output signals. The transmitter 904 may, for example, be adapted to transmit a program signal containing one or more bit streams representing the encoded pictures and/or information associated therewith. A typical transmitter performs functions such as, for example, providing one or more of error correction coding, interleaving data in the signal, randomizing the energy in the signal, and modulating the signal on one or more carrier waves using a modulator 906. The transmitter 904 may include, or interface with, an antenna (not shown). Also, implementations of the transmitter 904 may not be limited to the modulator 906.
The video transmission system or apparatus 900 is also communicatively coupled with a storage unit 908. In one implementation, the storage unit 908 is coupled to the encoder 902, and the storage unit 908 stores the encoded bitstream from the encoder 902. In another implementation, the storage unit 908 is coupled to the transmitter 904 and stores the bit stream from the transmitter 904. The bit stream from the transmitter 904 may comprise, for example, one or more encoded bit streams that have been further processed by the transmitter 904. In different implementations, the storage unit 908 is one or more of a standard DVD, a blu-ray disc, a hard drive, or some other storage device.
Referring now to fig. 10, shown is a video receiving system or apparatus 1000 to which the features and principles described above may be applied. The video receiving system or apparatus 1000 may be configured to receive signals on a variety of media such as, for example, satellite, wire, telephone line, or terrestrial broadcast. The signal may be received over the internet or some other network. It should also be appreciated that the blocks of fig. 10 provide a flow chart of a video reception process in addition to providing a block diagram of a video reception system or apparatus.
The video receiving system or apparatus 1000 may be, for example, a cellular telephone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video signals for display (e.g., to a user), processing, or storage. Thus, the video receiving system or apparatus 1000 may provide its output to a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.
The video receiving system or apparatus 1000 is capable of receiving and processing video information, which may include, for example, video images and/or parallax (or depth) images. The video receiving system or apparatus 1000 includes a receiver 1002 that receives an encoded signal, such as, for example, a signal described in implementations of the present application. The receiver 1002 may receive, for example, one or more of a signal providing the video image 815 and/or the parallax image 825, or a signal output from the video transmission system 900 of fig. 9.
The receiver 1002 may, for example, be adapted to receive a program signal containing a plurality of bit streams representing encoded pictures. A typical receiver performs functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers using a demodulator, derandomizing the energy in the signal, deinterleaving the data in the signal, and error correction decoding the signal. The receiver 1002 may include, or interface with, an antenna (not shown). Also, implementations of the receiver 1002 may not be limited to the demodulator 1004.
The video receiving system or device 1000 includes a decoder 1006. The receiver 1002 provides a received signal to a decoder 1006. The signal provided by the receiver 1002 to the decoder 1006 may comprise one or more encoded bitstreams. The decoder 1006 outputs a decoded signal such as, for example, a decoded video signal including video information. The decoder 1006 may be, for example, an AVC decoder.
The video receiving system or apparatus 1000 is also communicatively coupled with a storage unit 1007. In one implementation, the storage unit 1007 is coupled to the receiver 1002, and the receiver 1002 accesses the bit stream from the storage unit 1007. In another implementation, the storage unit 1007 is coupled to the decoder 1006, and the decoder 1006 accesses the bit stream from the storage unit 1007. In different implementations, the bitstream accessed from the storage unit 1007 includes one or more encoded bitstreams. In different implementations, the storage unit 1007 is one or more of a standard DVD, a blu-ray disc, a hard drive, or some other storage device.
In one implementation, the output video from decoder 1006 is provided to a processor 1008. In one implementation, processor 1008 is a processor configured to perform upsampling such as, for example, that described with respect to upsampling operations 816 and/or 826. In some implementations, the decoder 1006 includes a processor 1008, and thus performs the operations of the processor 1008. In other implementations, the processor 1008 is part of a downstream device such as, for example, a set-top box or television.
Note that at least one implementation uses extra bits to generate two disparity maps. The first disparity map is calculated for the "left" view and the second disparity map is calculated for the "right" view. Given that objects may be occluded, having two disparity maps helps improve the management of occlusions. For example, by comparing the corresponding disparity values, the system can determine whether an occlusion exists and, if so, take steps to fill in the resulting holes. Additional implementations provide more disparity maps and allocate the appropriate number of bits to accommodate the number of disparity maps. For example, in a multi-view context such as, for example, MVC (that is, AVC with the MVC extension (Annex H)), it may be desirable to transmit a set of disparity maps giving the computed disparity on a view-by-view basis. Alternatively, an implementation may send disparity maps relating to only a small set of views. The disparity may, for example, be calculated in a manner similar to the calculation of motion vectors. Alternatively, as is well known and described above, the disparity may be calculated from the depth values.
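As a purely illustrative aside, one common way to exploit two disparity maps for occlusion management is a left/right consistency check: a pixel whose left-view disparity is not confirmed by the right-view map is treated as occluded. The sketch below assumes that a positive disparity d at left-view pixel (x, y) maps to column x - d in the right view; the function and variable names are hypothetical and are not taken from this document.

    # Minimal left/right consistency check (illustrative names only).
    # Assumes positive disparity d at left pixel (x, y) maps to column
    # x - d in the right view; sign conventions vary between systems.
    def find_occlusions(disp_left, disp_right, tolerance=1.0):
        height, width = len(disp_left), len(disp_left[0])
        occluded = [[False] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                d = disp_left[y][x]
                xr = int(round(x - d))
                if xr < 0 or xr >= width:
                    occluded[y][x] = True   # projects outside the right image
                elif abs(disp_right[y][xr] - d) > tolerance:
                    occluded[y][x] = True   # the two maps disagree: likely occlusion
        return occluded

Pixels flagged in this way are candidates for hole filling, for example by propagating the disparity of neighboring background pixels.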
Various implementations also have advantages arising from the use of disparity values instead of depth values. Such advantages may include: (1) disparity values are bounded, while depth values may be infinite, so depth values are more difficult to represent/encode; and (2) disparity values can be represented directly, whereas representing potentially very large depth values often requires a logarithmic scale. In addition, determining depth from disparity is generally straightforward. Metadata is included in various implementations to provide information such as the focal length, the baseline distance (length), and the convergence plane distance. The convergence plane distance is the distance at which the camera axes intersect when the cameras are converged (toed in). The point at which the camera axes intersect can be seen in fig. 4 as the vertex of angle 410. When the cameras are parallel, the convergence plane distance is infinite.
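For concreteness, the relation between depth and disparity alluded to above can be written in the usual pinhole-camera form. This is the common textbook relation, not a formula quoted from this document, and all names in the sketch are assumptions for illustration.

    # Textbook depth-to-disparity relation (illustrative, with assumed names).
    # focal_length_px: focal length in pixels; baseline: camera separation;
    # convergence: convergence plane distance (infinite for parallel cameras).
    def disparity_from_depth(depth, focal_length_px, baseline,
                             convergence=float("inf")):
        inv_conv = 0.0 if convergence == float("inf") else 1.0 / convergence
        # Points at the convergence distance get zero disparity; for parallel
        # cameras this reduces to f * b / Z. Sign conventions vary.
        return focal_length_px * baseline * (1.0 / depth - inv_conv)

Because disparity falls off as 1/depth, the bounded range noted above follows directly: even infinite depth maps to a finite disparity.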
Accordingly, we provide one or more implementations having particular features and aspects. In particular, we provide several implementations related to dense disparity maps. Dense disparity maps may enable a variety of applications, such as, for example, relatively complex 3D effect adjustment on consumer devices, and relatively simple subtitle placement at a post-production stage. However, variations and additional applications of these implementations are contemplated, all of which are within the present disclosure, and features and aspects of the described implementations may be applied to other implementations.
Note that a range of +80 to -150 pixels is used in at least one of the implementations described above for one or more particular display sizes. However, other implementations may use different disparity ranges, with the end values of the range and/or the size of the range itself varying, even for those particular display sizes. In one implementation, a theme-park presentation uses much larger negative disparity (for example, depicting objects that come out of the screen to closer than halfway toward the viewer) to achieve a more dramatic effect. In another implementation, professional devices are made to support a wider disparity range than consumer devices.
Several implementations and features described herein may be used in the context of the AVC standard, AVC with the MVC extension (Annex H), and/or AVC with the SVC extension (Annex G). Additionally, these implementations and features may be used in the context of another standard (existing now or developed in the future), or in contexts that do not involve a standard.
Reference to "one embodiment," "an embodiment," "one implementation," or "an implementation," as well as other variations thereof, of the present principles means that a particular feature, structure, characteristic, and the like described in connection with the embodiment is included in at least one embodiment of the present principles. Thus, the appearances of the phrases "in one embodiment," "in an embodiment," "in one implementation," or "in an implementation," as well any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.
In addition, this application or its claims may refer to "determining" various pieces of information. Determining information may include, for example, one or more of estimating information, calculating information, predicting information, or retrieving information from memory.
It should be understood that a given display may support a variety of different resolutions. Thus, a given display may be capable of displaying video content having a horizontal resolution of, for example, 1280, 1440, or 1920. Such a display is nonetheless often referred to as a 1920 display because the highest supported resolution is 1920. When a large display shows a low-resolution image, each element of the image may cover multiple display pixels. For example, if the display supports horizontal resolutions of 800 and 1920, the display is typically at least 1920 pixels wide. When the display shows an 800-resolution image full-width, each element of the image covers 1920/800 = 2.4 display pixels on average, that is, two or three pixels per element.
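As a minimal numeric sketch of this relationship (the helper name is hypothetical):

    # Panel pixels covered by each element of a lower-resolution image
    # shown full-width on a wider panel (illustrative only).
    def pixels_per_element(display_width, image_width):
        return display_width / image_width

    print(pixels_per_element(1920, 800))    # 2.4
    print(pixels_per_element(1920, 1280))   # 1.5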
Various implementations use a floating point representation of disparity values. A particular variant of such an implementation uses a fixed-point representation of disparity values instead of a floating-point representation.
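A minimal sketch of what such a fixed-point representation could look like, assuming two fractional bits (quarter-pixel steps); both the bit count and the names are assumptions for illustration rather than details from this document.

    # Fixed-point disparity with two fractional bits (illustrative only).
    FRACTIONAL_BITS = 2
    SCALE = 1 << FRACTIONAL_BITS   # 4 steps per pixel

    def to_fixed(disparity):
        return round(disparity * SCALE)    # e.g., 3.25 pixels -> 13

    def from_fixed(quantized):
        return quantized / SCALE           # 13 -> 3.25 pixels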
It should be appreciated that the use of any of "/", "and/or", and "at least one of", for example in the cases of "A/B", "A and/or B", and "at least one of A and B", is intended to encompass the selection of only the first listed option (A), the selection of only the second listed option (B), or the selection of both options (A and B). As a further example, in the cases of "A, B, and/or C", "at least one of A, B, and C", and "at least one of A, B, or C", such phrasing is intended to encompass the selection of only the first listed option (A), the selection of only the second listed option (B), the selection of only the third listed option (C), the selection of only the first and second listed options (A and B), the selection of only the first and third listed options (A and C), the selection of only the second and third listed options (B and C), or the selection of all three options (A, B, and C). As one of ordinary skill in this and related arts will readily recognize, this may be extended for as many items as are listed.
In addition, many implementations may be implemented in one or more of an encoder (e.g., encoder 902), a decoder (e.g., decoder 1006), a post-processor (e.g., processor 1008) that processes output from the decoder, or a pre-processor (e.g., processor 901) that provides input to the encoder. Also, other implementations are contemplated by the present disclosure.
Implementations described herein may be implemented, for example, in the form of a method or process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (e.g., discussed only as a method), the implementation of the features discussed may be implemented in other forms (e.g., an apparatus or a program). An apparatus may be implemented in, for example, suitable hardware, software, or firmware. The methods may be implemented, for example, in an apparatus such as, for example, a processor, which refers generally to a processing device, including, for example, a computer, microprocessor, integrated circuit, or programmable logic device. Processing devices also include communication devices such as, for example, computers, cellular telephones, portable/personal data assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, for example, equipment or applications associated with data encoding, data decoding, view generation, depth or disparity processing, and other processing of images and related depth and/or disparity maps. Examples of such equipment include encoders, decoders, post-processors that process output from decoders, pre-processors that provide input to encoders, video decoders, video codecs, web servers, set-top boxes, laptops, personal computers, cellular telephones, PDAs, and other communication devices. It should be clear that the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being executed by a processor, and such instructions may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier, or another storage device such as, for example, a hard disk, a compact disc ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may be in, for example, hardware, firmware, software, or a combination thereof. The instructions may be found in, for example, an operating system, a separate application, or a combination of the two. Thus, a processor may be characterized, for example, as both a device configured to execute a process and a device including a processor-readable medium (such as a storage device) containing instructions for executing a process. Also, a processor-readable medium may store, in addition to or in place of instructions, data values produced by an implementation.
It will be apparent to those of ordinary skill in the art that implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of the spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating the encoded data stream onto a carrier. The information that the signal carries may be, for example, analog or digital information. As is well known, the signal may be transmitted over a variety of different wired or wireless links, and the signal may be stored on a processor-readable medium.
A number of implementations have been described herein. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, those of ordinary skill in the art will appreciate that other structures and processes may be substituted for those disclosed herein, and that the resulting implementations will perform at least substantially the same function, in at least substantially the same way, to achieve at least substantially the same result as the implementations disclosed herein. Accordingly, these and other implementations are contemplated by this application.
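Before turning to the claims, a short numeric sketch may help make the reference-resolution scaling they recite concrete. The value 11520 is the least common multiple of the common horizontal TV resolutions 1280, 1440, and 1920; the helper names below are assumptions for illustration, not terminology from this document.

    from math import gcd
    from functools import reduce

    STANDARD_WIDTHS = [1280, 1440, 1920]   # common horizontal TV resolutions
    REFERENCE = reduce(lambda a, b: a * b // gcd(a, b),
                       STANDARD_WIDTHS)    # least common multiple: 11520

    def to_reference(disparity, picture_width):
        # Scale a disparity up to the reference resolution. The result can
        # remain an integer yet carry sub-pixel precision at every standard
        # width (11520 / 1920 = 6, i.e., 1/6-pixel steps at 1920).
        return round(disparity * REFERENCE / picture_width)

    def from_reference(ref_disparity, target_width):
        # Scale a reference-resolution disparity down to a display width.
        return ref_disparity * target_width / REFERENCE

    d_ref = to_reference(10, 1920)         # 10 pixels at 1920 -> 60
    print(from_reference(d_ref, 1280))     # about 6.67 pixels at 1280

Because 11520/1920 = 6, integer disparities at the reference resolution represent 1/6-pixel steps on a 1920-wide display, which is finer than the quarter-pixel precision mentioned in the claims below.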

Claims (30)

1. A method of video processing, comprising:
accessing a disparity value for a particular location in a picture, the disparity value corresponding to a particular resolution of the picture; and
generating a modified disparity value using the accessed disparity value according to a ratio of a reference resolution to the particular resolution of the picture, the modified disparity value corresponding to the reference resolution, the reference resolution being greater than both the particular resolution and a resolution at which the picture is to be displayed, and being a least common multiple of standard TV resolutions.
2. The method of claim 1, wherein the generating comprises scaling the accessed disparity value by a ratio of the reference resolution to the particular resolution, and wherein the reference resolution is determined based on a plurality of resolutions, the reference resolution being greater than each of the plurality of resolutions.
3. The method of claim 2, wherein the plurality of resolutions correspond to resolutions supported by a standard display.
4. A method as claimed in any one of claims 1 to 3, wherein said generating comprises scaling the accessed disparity value according to a common multiple of a plurality of resolutions.
5. The method of claim 4, wherein the common multiple is a least common multiple of the plurality of resolutions.
6. The method of claim 4, wherein the common multiple is 11520.
7. A method as claimed in claim 2 or 3, wherein the modified disparity value indicates disparity relative to a non-standard resolution that is much larger than any resolution of a standard display.
8. The method of claim 7, wherein the non-standard resolution is different from each of the plurality of resolutions.
9. A method as claimed in any one of claims 1 to 3, wherein the picture has a particular resolution.
10. A method as claimed in any one of claims 1 to 3, wherein the modified disparity value is an integer.
11. The method of claim 10, wherein the integer provides sub-pixel precision of disparity for a plurality of resolutions.
12. The method of claim 11, wherein the integer provides disparity precision finer than 1/4 pixel.
13. A video processing apparatus, comprising:
means for accessing a disparity value for a particular location in a picture, the disparity value corresponding to a particular resolution of the picture; and
means for generating a modified disparity value using the accessed disparity value according to a ratio of a reference resolution to the particular resolution of the picture, the modified disparity value corresponding to the reference resolution, the reference resolution being greater than both the particular resolution and a resolution at which the picture is to be displayed, and being a least common multiple of standard TV resolutions.
14. A video processing apparatus, comprising:
a processor configured to:
accessing a disparity value for a particular location in a picture, the disparity value corresponding to a particular resolution of the picture; and
generating a modified disparity value using the accessed disparity value according to a ratio of a reference resolution to the particular resolution of the picture, the modified disparity value corresponding to the reference resolution, the reference resolution being greater than both the particular resolution and a resolution at which the picture is to be displayed, and being a least common multiple of standard TV resolutions; and
a modulator configured to modulate data indicative of the modified disparity value on a signal.
15. A method of video processing, comprising:
accessing a disparity value for a particular location in a picture, the picture having a particular resolution, and the disparity value corresponding to a reference resolution that is greater than the particular resolution; and
generating a modified disparity value using the accessed disparity value according to a ratio of the particular resolution to the reference resolution, the modified disparity value corresponding to the particular resolution,
wherein the reference resolution is a least common multiple of standard TV resolutions.
16. The method of claim 15, wherein the generating comprises scaling the accessed disparity value by a ratio of the particular resolution to the reference resolution.
17. The method of claim 15 or 16, wherein the generating comprises scaling the accessed disparity value according to a common multiple of a plurality of resolutions.
18. The method of claim 17, wherein the plurality of resolutions correspond to resolutions supported by a standard display.
19. The method of claim 18, wherein the common multiple is a least common multiple of the plurality of resolutions.
20. The method of claim 18, wherein the common multiple is 11520.
21. The method of any of claims 18-20, wherein the reference resolution is a non-standard resolution much larger than any resolution of a standard display.
22. The method of claim 21, wherein the non-standard resolution is different from each of the plurality of resolutions.
23. The method of claim 15 or 16, wherein the accessed disparity value is an integer.
24. The method of claim 15 or 16, wherein the modified disparity value is an integer.
25. The method of claim 23, wherein the integer provides sub-pixel precision of disparity for a plurality of resolutions.
26. The method of claim 24, wherein the integer provides disparity precision finer than 1/4 pixel.
27. A video processing apparatus, comprising:
means for accessing a disparity value for a particular location in a picture, the picture having a particular resolution, and the disparity value corresponding to a reference resolution that is greater than the particular resolution; and
means for generating a modified disparity value using the accessed disparity value according to a ratio of the particular resolution to the reference resolution, the modified disparity value corresponding to the particular resolution,
wherein the reference resolution is a least common multiple of standard TV resolutions.
28. A video processing apparatus, comprising:
a demodulator for demodulating a signal including data indicating a disparity value for a particular location in a picture, the picture having a particular resolution, and the disparity value corresponding to a reference resolution greater than the particular resolution; and
a processor configured to generate a modified disparity value using the disparity value according to a ratio of the particular resolution to the reference resolution, the modified disparity value corresponding to the particular resolution,
wherein the reference resolution is a least common multiple of standard TV resolutions.
29. The method of any of claims 1-3 or 15-16, wherein the particular resolution is a horizontal resolution and the disparity value is a horizontal disparity.
30. The apparatus of any one of claims 13, 14, 27, 28, wherein the particular resolution is a horizontal resolution and the disparity value is a horizontal disparity.
HK13109319.0A 2010-03-31 2011-03-31 3d disparity maps HK1182245B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US31956610P 2010-03-31 2010-03-31
US61/319,566 2010-03-31
US39741810P 2010-06-11 2010-06-11
US61/397,418 2010-06-11
PCT/IB2011/000708 WO2011121437A1 (en) 2010-03-31 2011-03-31 3d disparity maps

Publications (2)

Publication Number Publication Date
HK1182245A1 (en) 2013-11-22
HK1182245B (en) 2018-01-05
