
HK1010624B - Image conversion and encoding techniques - Google Patents


Info

Publication number
HK1010624B
HK1010624B (application HK98111644.0A)
Authority
HK
Hong Kong
Prior art keywords
image
original
mesh
images
right eye
Prior art date
Application number
HK98111644.0A
Other languages
German (de)
French (fr)
Chinese (zh)
Other versions
HK1010624A1 (en)
Inventor
Duncan Richard Angus
Original Assignee
Dynamic Digital Depth Research Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AUPN732395A0
Application filed by Dynamic Digital Depth Research Pty Ltd
Priority to HK1040864B (HK02102097.6A)
Publication of HK1010624A1
Publication of HK1010624B

Description

The present invention is generally directed towards stereoscopic image synthesis, and more particularly toward a method of converting two dimensional (2D) images for further encoding, transmission and decoding for the purpose of stereoscopic image display on two dimensional (2D) or three dimensional (3D) stereoscopic display systems.

Recent improvements in technology in the areas of compact high performance video projection systems, image processing, digital video and liquid crystal panels have made possible many practical 3D display systems, utilising both active and passive polarising glasses, and both single- and multi-viewer autostereoscopic displays.

Three dimensional display systems have moved out of the arena of technological curiosities and are now becoming practical display systems for entertainment, commercial and scientific applications. There has now emerged a requirement for 3D media to display on these devices. Traditionally there have been only two ways to produce this 3D media (ie. media that contains image information for at least two separate views of the same scene from different perspectives). These are:
  • 1) Generation of two separate views (usually in real time) by a computer.
  • 2) Videoing or filming with two laterally displaced cameras.
In the case of computer generated images for use in Computer Aided Design (CAD) systems, simulators or video game equipment, it is not a complex process to produce two separate images with different perspectives.

The filming of movies utilising two laterally displaced cameras to produce 3D has been well understood for many years. However, there are many problems with this approach. It is considerably more difficult to film or video in 3D than in 2D because there are limits to the permissible distance between the nearest and farthest objects in the scene (practical 3D depth of field), as well as framing problems (such as near objects being seen by only one camera, thus highlighting the inaccuracy of the 3D image generation when re-played).
Another problem is maintaining a smooth pan without causing false 3D artefacts due to latency between the images from the two cameras and so on.
Because of the complexity and high cost of production and implementation, and the fact that there are as yet still only a very small number of 3D display systems being produced for the domestic and commercial markets, there has not been a large incentive for the major producers of films or videos to produce 3D media. However, if a technique were devised that would allow conventional 2D films to be re-processed into a 3D version, it would be possible not only to convert new films into 3D format for significantly less cost than filming them directly in 3D in the first place, but also to re-process the vast archives of 2D film and video material for re-release to both the cinema and video markets.
It would however be advantageous to be able to convert an existing 2D image so that it can be viewed as a 3D image. One way of achieving this is to convert a single 2D image to two separate left and right images by a 'cut and paste' technique. In this technique, an object is 'cut' from the image, laterally displaced left or right, then 'pasted' back onto the original image to produce the required separate images. This however results in a blank region in the area formerly occupied by the object within the image.
US4925294 discloses that selected image areas of the left and right eye images are displaced and increased in size in order to fill the empty space caused by the displacement.
The present invention seeks to overcome or minimise at least one of these problems.
According to one aspect of the present invention there is provided a method of producing left and right eye images for a stereoscopic display from an original 2D image comprising displacing selected areas of said original 2D image by a determined amount and direction to form left and right eye images, characterised in that said displacing comprises generating stretched images wherein each of said left and right eye images includes portions of said selected original 2D image areas which have been compressed and portions which have been stretched.
The two converted images when respectively viewed by the left and right eye of a viewer can provide a 3D image without any blank region as would be the case with images produced by the 'cut and paste' technique.
The method of producing left and right eye images for a stereoscopic display from an original 2D image may include the steps of:
  • a) identifying at least one object within said original image;
  • b) outlining said or each object;
  • c) defining a depth characteristic for said or each object;
  • d) respectively displacing selected areas of said or each image by a determined amount in a lateral direction as a function of the depth characteristic of said or each object, to form two stretched images for viewing by the left and right eyes of the viewer.
These image pairs may be either mirrored or similar to each other so that the stereoscopic 3D effect is optimised.
The image may include a plurality of objects with each object being provided with a said respective depth characteristic. Images may be converted on an individual basis. Alternatively, a series of related images as in a video or film may be converted.
The image may be digitised and the image may be stretched or converted electronically by temporarily placing a mesh over the image, the mesh initially having a plurality of parallel lateral mesh lines and a plurality of parallel longitudinal mesh lines positioned at right angles to the lateral mesh lines. Each intersection of the mesh lines on the mesh may provide a mesh sub-point. The image can move together with the mesh so that distortion of the mesh results in the stretching of the underlying image. The mesh lines may remain continuous to provide for a smooth stretching of the image. The amount of displacement of each of the mesh sub-points from their initial position may provide the conversion data for said original image. The sub-points may be displaced in a lateral direction.
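The mesh and its conversion data described above can be sketched in a few lines of code. This is a minimal illustration only: the function names `build_mesh` and `displacements`, the tuple-based sub-point representation and the coordinate conventions are assumptions, not part of the original description.

```python
def build_mesh(width, height, cols, rows):
    """Place a regular mesh over a width x height image.

    Returns a grid of (x, y) sub-points, one at each intersection of
    the lateral and longitudinal mesh lines.
    """
    xs = [c * (width - 1) / (cols - 1) for c in range(cols)]
    ys = [r * (height - 1) / (rows - 1) for r in range(rows)]
    return [[(x, y) for x in xs] for y in ys]

def displacements(original, distorted):
    """Lateral (x) offset of each sub-point from its initial position.

    This matrix of offsets is the conversion data for the image.
    """
    return [[dx - ox for (ox, _), (dx, _) in zip(orow, drow)]
            for orow, drow in zip(original, distorted)]
```

A distorted copy of the mesh is produced by moving selected sub-points laterally; `displacements` then recovers the per-sub-point offsets that fully describe the stretch.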
The displacement of the mesh sub-points may also be defined by a mathematical algorithm to thereby provide for automatic conversion of images. Further enhancements to the method could be to add shadow, blurring and motion interpolation data to the conversion data, including forced parallax information, and field delay and direction for motion parallax delays.
The present invention can be applicable for use in image transmission systems sending video signals that provide 2D images.
The invention is not limited to the conversion of existing 2D video images. Rather the process can be readily used to produce the conversion data simultaneously with the creation of the 2D video image.
According to a second aspect of the present invention there is provided a system for producing left and right eye images for a stereoscopic display from an original 2D image, wherein said system includes means to select areas of said original image and means to displace said areas by a determined amount and direction to form left and right eye images, characterised by said displacement means generating stretched images wherein each of said left and right eye images includes portions of said selected original 2D image areas which have been compressed and portions which have been stretched.
It will be convenient to further describe the invention by reference to the accompanying drawings which illustrate possible implementations of the present invention.
Other implementations of the invention are also possible; consequently, the particularity of the accompanying drawings is not to be understood as superseding the generality of the preceding description.
In the drawings:
  • Figure 1 shows an original image and conventional left and right images for providing a 3D or stereoscopic image;
  • Figure 2 shows an original image and left and right images for providing a 3D image produced using a cut and paste technique;
  • Figure 3 shows an original image and an image generated by the Dynamic Depth Cueing (DDC) method according to the present invention;
  • Figure 4 shows a left and right image and the resultant 3D image according to the present invention;
  • Figure 5 shows an image distorted discontinuously by a distortion mesh;
  • Figure 6 shows an image distorted continuously by a distortion mesh;
  • Figure 7 shows example mesh Spatial Displacement (MSD) data for a left and right mesh; and
  • Figure 8 shows a flow chart illustrating how MSD data is added to a video image according to the present invention.
The method according to the present invention for enabling 2D or "monoscopic" video signals to be converted to 3D or "stereoscopic" video signals is referred to as Dynamic Depth Cueing (DDC) in the following description and embraces the following but is not limited to these techniques:
  • a) 3D GENERATION - A technique and procedure for converting 2D images into 3D stereoscopic image pairs and for producing the 3D conversion data.
  • b) 3D SCRIPTING - A technique to describe the changes required to be made to a 2D image in order to convert it to a 3D stereoscopic image pair. Describes which objects are selected, how they are processed and provides for a means of storage of 3D data.
  • c) 3D DATA ENCODING - A technique for adding information to a 2D video image in a defined format. The resulting modified video is compatible with existing video recording, editing, transmission and receiving systems.
  • d) 3D STANDARDISED PROTOCOL - The 3D Conversion data is added to the 2D video using a defined data format or standardised protocol. This protocol may well become a world-wide standard for adding 3D Conversion data to 2D transmissions.
  • e) 3D DATA DECODING - A technique for receiving the 2D video image plus conversion data and extracting the information added to the 2D video image so that a 3D stereoscopic image pair may be synthesised.
  • f) 3D SYNTHESIS - A technique to manipulate the 2D video image using the conversion data to synthesise a 3D stereoscopic image pair.
In order to convert a 2D image to a simulated 3D image it is necessary to modify the original image to produce two slightly different images and present these separate images to the left and right eyes independently. The modification to the original image consists of a lateral shift of objects within the image plane (located at the projection or viewing screen) in order to give the impression of depth.

To make an object in an image appear farther away from the viewer, with respect to the image plane, it is necessary to present the object within the image to the left eye with a slight left lateral shift, and that to the right eye with a slight right lateral shift. This is illustrated in Figure 1. To make an object appear to the viewer to be closer, it is necessary to shift the object within the image for the left eye laterally to the right and the object within the image for the right eye laterally to the left. For an object to be positioned at the image plane, the object is placed in the image at the same position for both eyes.

When viewing objects in the real world, a viewer also makes use of focus information. However, with simulated 3D this information is not present, and if the lateral shifts are made too great, particularly in order to bring an object closer to the viewer, the object appears to break into two separate images and the 3D effect is lost.

The left and right images can be produced using a computer. The image is firstly digitised using a video digitiser and the resulting data stored in memory. The two new images can then be generated.
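The shift directions described above can be captured in a small helper. This is a sketch under stated assumptions: the function name, the sign convention (positive depth meaning behind the image plane) and the `max_shift` clamp are illustrative choices, not values given in the text.

```python
def lateral_shifts(depth, max_shift=8.0):
    """Return (left_image_shift, right_image_shift) in pixels.

    Assumed convention: depth > 0 places the object behind the image
    plane (farther away), so the left-eye image shifts left (negative)
    and the right-eye image shifts right (positive); depth < 0 reverses
    the shifts; depth == 0 leaves the object on the image plane.  The
    shift is clamped to +/- max_shift, since the text notes that too
    large a shift makes the object break into two separate images.
    """
    s = max(-1.0, min(1.0, depth)) * max_shift
    return (-s, s)
```

For example, an object pushed fully behind the plane shifts left in the left-eye image and right in the right-eye image by the same amount.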
The simplest way to generate the new left and right images with the required lateral shift is to "cut" the objects from the image and "paste" them back with the necessary lateral displacement; this will be referred to as the "cut and paste" technique. This can be done by first defining the position of an object to be "moved" by identifying it, then "cutting" the object from the image and moving it laterally.

The problem with this simple technique is that once the selected object is moved the background is also removed, and a blank region in the background results; see Figure 2.

According to the present invention, an object within an image is "stretched" to provide the necessary lateral shift and retain the original background detail. The resulting lateral distortions of the image are smoothed mathematically so that the resultant effect is perceived as 'real' 3D with little or no visual artefacts.

To better visualise the effect of this stretching on the image, imagine that the image to be converted has been printed onto a thin sheet of rubber. It is possible to pick a point on the surface of the image, adjacent to an object, and stretch it into a new position, to the right of its original position for example. The section of the image to the right of the object is therefore compressed, and that to the left stretched; see Figure 3. To the viewer the object now appears distorted if viewed by both eyes.

However, if a similar but oppositely stretched image is presented to the other eye, then the viewer does not see a distorted image, but rather an object that has 3D characteristics. This is illustrated in Figure 4.

The "stretching" of an object within an image can be undertaken electronically. The objects of interest in each video frame are firstly identified by outlining them. For each object a depth or mesh distortion characteristic is also defined.
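The rubber-sheet stretch of a single scanline can be sketched as below. This is an illustrative implementation only: `stretch_row`, its `anchor`/`shift` parameters and the nearest-neighbour sampling are assumptions; it simply moves one anchor pixel laterally while keeping both row ends fixed, compressing one side and stretching the other.

```python
def stretch_row(row, anchor, shift):
    """Stretch a scanline so the pixel at `anchor` lands at `anchor + shift`.

    Both endpoints stay fixed, so the segment on one side of the anchor
    is stretched and the other compressed, as in the rubber-sheet
    analogy.  Assumes 0 < anchor + shift < len(row) - 1.  Samples the
    piecewise-linear inverse map with nearest-neighbour rounding.
    """
    n = len(row)
    moved = anchor + shift
    out = []
    for x in range(n):
        if x <= moved:   # left segment: [0, moved] maps back to [0, anchor]
            src = x * anchor / moved if moved > 0 else 0
        else:            # right segment: [moved, n-1] maps back to [anchor, n-1]
            src = anchor + (x - moved) * (n - 1 - anchor) / (n - 1 - moved)
        out.append(row[min(n - 1, max(0, round(src)))])
    return out
```

Applying the same anchor with the opposite `shift` yields the other eye's scanline, giving the similar-but-oppositely-stretched pair the text describes.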
The stretching can be conducted by enabling an operator to stretch the image and view the effect of the resulting 3D image in real time. Operator skill and artistic intervention can be used to determine the 3D impact of the resulting image and subsequent video sequence. Whilst individual video frames could be converted manually (ie. non real time), we also envisage automatically (ie. real time) converting a series of related frames that form a video "clip". The operator will define the start and end frames of the video clip to be converted. They will also determine the relative depth of each object, relative to the image plane, in the start and end frames. The video clip will be processed using the start and end positions and depth of each object in the clip, to interpolate the required stretching or manipulation for the intermediate frames.

In the case of multiple overlapping objects which have different depths, foreground objects are given priority. This is inherently the case, since the original 2D image has been captured with a single camera, and thus the pixel information has automatically been prioritised to the foreground.

This "stretching" of the image can be undertaken electronically by manipulating the digitised image. A mesh (grid) is temporarily placed over the image to be distorted such that, prior to distortion, the co-ordinates of each row and column of the mesh are 0,0. The mesh x co-ordinates are altered, which results in the underlying image being distorted. Rather than just the image area immediately under the mesh line being moved, which would result in a discontinuity (Figure 5), adjacent mesh lines are also moved to produce a smooth distortion (Figure 6).

The coarseness of the distortion mesh determines the impact of the 3D effect. The coarser the mesh, the more other objects close to the object being stretched are affected. This results in a lower 3D impact in the resulting image.
A finer mesh results in sharper edges to the objects and a higher-impact 3D effect, but greater edge discontinuities. The order of the distortion mesh will, for explanation purposes, be assumed to be 16 x 16. Information on each sub-point on the mesh (ie. its co-ordinate position after distorting) is encoded so as to produce background and foreground sub-points. For example, 4 bits can be used for the sub-point encoding, which results in 16 different levels: 4 background and 12 foreground. The format of the sub-point encoding can also be determined by experimentation and adjusted to suit the application.

Alternatively, this mesh distortion process may be defined by a mathematical algorithm which would enable automatic processing of images.
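The 4-bit sub-point encoding suggested above might be sketched as follows. The code split (levels 0-3 background, 4-15 foreground) follows the 4/12 division in the text, but the function names and the packing of two codes per byte are hypothetical details, not a format defined by the document.

```python
def encode_subpoint(level):
    """Clamp a depth level to 4 bits (0..15).

    By the assumed convention, codes 0-3 mark background sub-points and
    codes 4-15 mark foreground sub-points.
    """
    return max(0, min(15, int(level)))

def pack_codes(codes):
    """Pack two 4-bit sub-point codes per byte for storage/transmission."""
    if len(codes) % 2:
        codes = codes + [0]          # pad an odd count with a zero code
    return bytes((codes[i] << 4) | codes[i + 1]
                 for i in range(0, len(codes), 2))
```

A full 16 x 16 mesh at 4 bits per sub-point packs into 128 bytes, consistent with the roughly 100 bytes per frame estimated later in the text.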
Note that once the mesh distortions for the left eye have been determined, the co-ordinates of the distortions for the right eye are simply obtained by scalar multiplication of the matrix by -1 (ie. shifted in the opposite lateral direction by the same amount) and can be calculated automatically. This is illustrated in Figure 7.

The matrix that is formed from the relative horizontal offset of each intersection point of the distorted mesh defines the Mesh Spatial Displacement (MSD) data.

In order to fully define and reproduce the resultant 3D image, all that is necessary is to provide the original, unaltered 2D image and the MSD data. Thus 3D images may be stored, transmitted, generated, edited and manipulated by considering the 2D image and an associated MSD data file.

It is therefore possible to store and transmit 3D images over conventional 2D video systems by encoding MSD data within each video frame. Since the original 2D video image is stored and can be transmitted without alteration, the resulting video is fully compatible with all existing video and television systems. Existing 2D TV receivers will display a normal picture.

A number of existing techniques can be used to add the MSD data to the 2D image such that it is not detected by the viewer and is compatible with existing video standards. These techniques include, but are not limited to:
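Deriving the right-eye MSD data from the left-eye data by multiplying by -1, as described above, is a one-line transform; the list-of-lists matrix representation and the function name are assumptions for this sketch.

```python
def right_eye_msd(left_msd):
    """Mirror left-eye Mesh Spatial Displacement data for the right eye.

    Each lateral offset is negated, shifting every sub-point in the
    opposite lateral direction by the same amount.
    """
    return [[-dx for dx in row] for row in left_msd]
```

Only one eye's matrix therefore needs to be stored or transmitted; the other is recovered automatically at the decoder.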
  • a) inserting the MSD information in the spare lines at the top and bottom of the picture that are set at black level, ie. in a similar manner to the addition of "Teletext" data;
  • b) in the unseen over-scan region at the left and right of each image;
  • c) in the horizontal sync period, along the lines of the British Broadcasting Corporation "sound in sync" system.
In the future, with the introduction of digital HDTV, spare digital data frames will be available to insert the MSD data.

The process of adding the MSD data to a 2D video image to form a DDC encoded video frame is illustrated in Figure 8.

The amount of MSD data is small, estimated to be approximately 100 bytes per frame. This can be further compressed if necessary, for storage and transmission, by using standard data compression techniques such as run-length or differential encoding.
Because of the small amount of data, the required data rate is also low. It is also possible to use spatial and temporal compression to further reduce the data required, since the MSD data does not vary rapidly over a number of frames. The exact time relationship between the MSD data and its associated frame is not critical; a displacement error of one frame is probably acceptable.
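The run-length compression mentioned above can be illustrated with a generic encoder/decoder pair; this is standard run-length encoding, not a format the document defines, and the (count, value) pair representation is an assumption.

```python
def rle_encode(data):
    """Run-length encode a sequence into (count, value) pairs.

    Runs are capped at 255 so each count fits in one byte.
    """
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out.append((j - i, data[i]))
        i = j
    return out

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [v for count, v in pairs for _ in range(count)]
```

Because MSD values change slowly from frame to frame, long runs of identical offsets are common and compress well under this scheme.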
Again, due to the small amount of data, low data rate and non-critical alignment, the MSD data could be sent over a number of frames, eg. four frames with a quarter of the information in each frame.

Claims (16)

  1. A method of producing left and right eye images for a stereoscopic display from an original 2D image comprising displacing selected areas of said original 2D image by a determined amount and direction to form left and right eye images, characterised in that said displacing comprises generating stretched images wherein each of said left and right eye images includes portions of said selected original 2D image areas which have been compressed and portions which have been stretched.
  2. A method as claimed in claim 1, characterised in that said method further includes the steps of:
    a) identifying at least one object within said original image;
    b) outlining said or each object;
    c) defining a depth characteristic for said or each object, wherein the amount of displacement of said areas is determined as a function of the depth characteristic of said or each object.
  3. A method as claimed in any preceding claim, characterised in that one of said stretched left and right eye images is a mirror image of the other said stretched image.
  4. A method as claimed in claim 2, characterised in that a separate depth characteristic is provided for each said object.
  5. A method as claimed in any preceding claim, characterised in that a plurality of 2D images are converted.
  6. A method according to claim 1, wherein said method includes the steps of:
    producing a digitised 2D image;
    forming a mesh over said digitised image, said mesh initially having a plurality of parallel lateral mesh lines and a plurality of parallel longitudinal lines, wherein said lateral lines are positioned at right angles to said longitudinal lines and intersect to form a plurality of sub-points; and
    distorting the mesh by moving selected ones of said sub-points by determined amounts to thereby displace selected areas of the underlying image to stretch the underlying image.
  7. A method as claimed in claim 6, characterised in that said mesh lines between adjacent sub-points remain continuous at the conclusion of any distortion of the mesh.
  8. A method as claimed in claim 6 or claim 7, characterised in that the subpoints are displaced in a lateral direction to distort the mesh.
  9. A method as claimed in any one of claims 6 to 8, characterised in that the amount of distortion of each sub-point is used to produce data to enable the conversion of said original 2D image into left and right eye images for a stereoscopic display; said data describing which objects within an image are to be processed, how said objects will be processed, priority of said objects over other objects and their depth characteristics.
  10. A method as claimed in claim 9, characterised in that a mathematical algorithm is generated to define the distortion required for each said subpoint.
  11. A method as claimed in claim 9 or claim 10, characterised in that the original 2D image and conversion data is capable of transmission along standard 2D technology.
  12. A system for producing left and right eye images for a stereoscopic display from an original 2D image, wherein said system includes means to select areas of said original image and means to displace said areas by a determined amount and direction to form left and right eye images, characterised by said displacement means generating stretched images wherein each of said left and right eye images includes portions of said selected original 2D image areas which have been compressed and portions which have been stretched.
  13. A system as claimed in claim 12, characterised in that said system further includes:
    a means to identify objects within said original image; and
    means for defining a depth characteristic for each object, wherein the amount of displacement of said areas is determined as a function of the depth characteristic of said or each object.
  14. A system as claimed in claim 12 or claim 13, characterised in that said system further includes a means to create a mirror image of one stretched image whereby said one stretched image and the mirror image form said left and right eye images.
  15. A system as claimed in any one of claims 12 to 14, characterised in that said means for defining a depth characteristic is capable of defining a separate depth characteristic for each object in the image.
  16. A system as claimed in any one of claims 12 to 15, characterised in that said system is capable of converting a plurality of 2D images.
HK98111644.0A 1995-12-22 1996-12-20 Image conversion and encoding techniques HK1010624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
HK02102097.6A HK1040864B (en) 1995-12-22 1998-10-30 Image conversion and encoding techniques

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AUPN7323A AUPN732395A0 (en) 1995-12-22 1995-12-22 Image conversion and encoding techniques
AUPN7323 1995-12-22
PCT/AU1996/000820 WO1997024000A1 (en) 1995-12-22 1996-12-20 Image conversion and encoding techniques

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
HK02102097.6A Division HK1040864B (en) 1995-12-22 1998-10-30 Image conversion and encoding techniques

Related Child Applications (1)

Application Number Title Priority Date Filing Date
HK02102097.6A Addition HK1040864B (en) 1995-12-22 1998-10-30 Image conversion and encoding techniques

Publications (2)

Publication Number Publication Date
HK1010624A1 (en) 1999-06-25
HK1010624B (en) 2003-10-03


Similar Documents

Publication Publication Date Title
EP0868818B1 (en) Image conversion and encoding techniques
US6496598B1 (en) Image processing method and apparatus
US8471898B2 (en) Medial axis decomposition of 2D objects to synthesize binocular depth
US8228327B2 (en) Non-linear depth rendering of stereoscopic animated images
AU2011338589B2 (en) Pseudo-3D forced perspective methods and devices
CN100565589C (en) The apparatus and method that are used for depth perception
US9196080B2 (en) Medial axis decomposition of 2D objects to synthesize binocular depth
EP1897056A1 (en) Combined exchange of image and related data
KR101754976B1 (en) Contents convert method for layered hologram and apparatu
KR100496513B1 (en) Image conversion method and image conversion system, encoding method and encoding system
HK1010624B (en) Image conversion and encoding techniques
AU8964598A (en) Image processing method and apparatus
JPH0211093A (en) Stereoscopic video display method