GB2458305A - Providing a volumetric representation of an object - Google Patents
Providing a volumetric representation of an object
- Publication number
- GB2458305A GB2458305A GB0804660A GB0804660A GB2458305A GB 2458305 A GB2458305 A GB 2458305A GB 0804660 A GB0804660 A GB 0804660A GB 0804660 A GB0804660 A GB 0804660A GB 2458305 A GB2458305 A GB 2458305A
- Authority
- GB
- United Kingdom
- Prior art keywords
- image
- silhouette
- voxel
- voxels
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/10—Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
-
- G06T7/0065—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/08—Indexing scheme for image data processing or generation, in general involving all processing steps from image acquisition to 3D model generation
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Geometry (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Graphics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Processing (AREA)
Abstract
A method of producing a 3D model of an object 40 in an image comprises the steps of projecting a plurality of voxels 32 onto a 2D image plane, selecting the voxels which are within a predetermined distance of the object 40 and generating a 3D model based on the selected voxels. This may be done by segmenting the image into foreground and background to create a silhouette of the object and performing intersection tests for each voxel. The silhouette or the voxel footprint 38 (or an approximation of the footprint) may be expanded by the predetermined distance such that the intersection test will select both voxels which intersect the object and voxels which are close to the object. A distance map may be created for each pixel of the 2D image, which may then be used to determine if the voxel is within the predetermined distance of the object. This method avoids truncation of the reconstructed model, as the target volume is over-estimated.
Description
Providing a volumetric representation of an object The present invention relates in certain aspects to methods and systems for generating volumetric representations or 3D models of objects or scenes from one or more two-dimensional images.
The generation of high-quality 3D models of scenes and objects from 2D images is a complex computational problem. Some prior art techniques are disclosed in patent publications GB2399703 and GB2418827. The general goal of such techniques is to reconstruct the 3D shape of an object from a number of 2D silhouette images obtained from different cameras.
GB2418827 describes a reconstruction approach using an octree data representation and describes methods to improve the accuracy of the 3D model generated. GB2399703 describes methods for reconstructing object models in which a volumetric representation is refined using line segments which intersect object boundary positions.
Many prior art solutions rely on source images taken under controlled conditions (e.g. in a studio), where lighting conditions can be carefully controlled and camera positions can be calibrated accurately, especially since cameras are usually static.
Particular problems are encountered when attempting to extend these techniques to real-world settings. One particular example is sports broadcasts, where the ability to reconstruct reliable 3D models of a scene from images recorded by conventional television cameras could enable new views of the scene to be synthesised from the reconstructed model, thus allowing events to be presented from viewpoints other than those originally recorded. For example, a replay of an event could be generated from an arbitrary viewpoint (for example from a player position). This, however, requires techniques that are robust to camera calibration errors, variable lighting conditions and that can operate at real-time or near real-time speeds.
Shape-from-silhouette is a popular technique for generating 3D models from images. It requires a set of camera images with known camera parameters and a segmentation of the objects in the images. The latter can be realised with chroma-keying techniques in a studio or in sporting events using the green of the pitch, as described in patent publication GB2321814, or with difference keying, as described in patent publication EP1480469.
The 3D shape of the foreground object can be reconstructed using its silhouettes from the segmented images by projecting volumetric elements (voxels) into the segmented images using the camera parameters and testing whether they intersect with the silhouettes.
Both the segmentation and the camera parameters are usually prone to errors. In the case where a silhouette is too small the reconstructed 3D model will be truncated. The camera parameters are estimated with a process called camera calibration. Errors in these parameters lead to a truncation of the resulting 3D model, as depicted in Figure 1.
The present invention seeks to alleviate some of the problems associated with prior art techniques.
Accordingly, in a first aspect of the invention, there is provided a method of generating a 3D model of an object represented in an image, comprising: projecting a plurality of voxels into the image plane (of the image); selecting voxels of the projected plurality of voxels in dependence on the projection; and generating the 3D model based on the selected voxels; wherein the selecting step comprises, for a given voxel, determining whether the projected voxel extends, in the image plane, to within a predefined distance of the object.
The 3D model generated may be in the form of a volumetric representation (e.g. the set of selected voxels), or some other representation derived from a volumetric representation, for example a surface model such as a polygon mesh. The term voxel refers to a volume element. Preferably, cube-shaped voxels are used, though other shapes may also be used.
Projecting the voxel preferably comprises computing information relating to the projection of the voxel in the image plane. However, the full projection of the voxel need not necessarily be generated. For example, the projection of one or more (representative) 3D points of the voxel may be computed, for example one or more (but not necessarily all) vertices of the voxel, or a centre point of the voxel. The projection of a voxel is referred to herein as the footprint of the voxel in the image plane. Since some embodiments use only an approximation of the footprint, it may not be necessary to compute a complete description of the footprint. Other embodiments calculate a complete description of the footprint, for example by projecting each vertex of the voxel into the image plane.
Since the test to determine whether to select a voxel does not require the projected voxel to intersect the object, but only that it extends to within a defined distance of the object, voxels are more likely to be selected using this approach, which will tend to result in a larger volumetric representation of the target object. This tendency can serve to alleviate the impact of certain errors, for example camera calibration and segmentation errors, which could otherwise lead to truncation of the reconstructed model. The present method provides a conservative estimate of the target volume, which is more likely to include the target surface, and which can serve as the starting point for model refinement processes.
The term 'predefined distance' preferably refers to a positive non-zero distance value. The test performed is thus not merely an intersection test, but a proximity test, testing proximity of the footprint to the object. Thus, a voxel will satisfy the test both in the case where the footprint intersects the object and in the case where it does not intersect the object but extends to within the defined distance of the object. If the footprint neither intersects nor extends to within the defined distance, then the voxel will not satisfy the test. Proximity to the object is typically determined based on a silhouette image providing a silhouette of the object as set out in more detail below.
Thus the method preferably comprises not selecting the given voxel if the projection of the given voxel does not extend to within the predefined distance. If it does extend to within the distance, it may be selected, but this may depend on other images of the object as described in more detail later. In a particular example, the method comprises performing the projecting step for each of a plurality of images representing the object, and selecting comprises selecting a voxel if, for each image, the voxel extends to within a predefined distance of the object as represented in the image. Thus, the information from multiple images (typically from different viewpoints) may be combined to provide an improved model.
Preferably, the method comprises generating a silhouette image (from the image) representing a silhouette of the object; and the determining step comprises determining whether the projected voxel extends to within the predefined distance of the object silhouette. This allows proximity to the object to be tested more efficiently.
The proximity test can be performed in a number of ways. In one embodiment, set out in more detail below, the determining step comprises performing an intersection test, the intersection test comprising determining whether an expanded approximation to the voxel footprint intersects the object silhouette. In another embodiment, set out in more detail below, the determining step comprises determining whether the voxel footprint or an approximation thereto intersects an expanded object silhouette.
More specifically, the method may comprise: generating an initial silhouette image comprising an object silhouette; and expanding the object silhouette to produce the silhouette image. Preferably the object silhouette is expanded by a predetermined number of pixels corresponding to the predefined distance. Preferably, the determining step comprises determining whether the projected voxel intersects the expanded silhouette. By using the expanded silhouette, the proximity test can be performed efficiently.
Generating the initial silhouette image or the silhouette image may comprise segmenting the image into at least image foreground and image background, the image foreground comprising the object silhouette. Multiple scene objects may also be represented by the image foreground (i.e. the image foreground may define multiple object silhouettes). The expansion may then be performed for each silhouette, or for the image foreground generally.
Expanding the silhouette preferably comprises performing a dilation operation on the image foreground or an erosion operation on the image background (or vice versa, depending on the encoding of foreground and background). This can provide a simple and efficient method of performing silhouette expansion, which can accommodate multiple object silhouettes.
The method may comprise testing one or more pixel locations corresponding to the projected voxel to determine whether at least one such pixel location falls within the expanded silhouette.
To simplify the intersection test, the method may comprise determining an approximation to the area of the projected voxel (i.e. an approximation to the voxel footprint), the determining step (or intersection test) being performed using the approximation. This can in some cases avoid the need for testing each individual footprint pixel to determine if any footprint pixels fall within the object silhouette.
The method may comprise generating a distance map for the silhouette image, the distance map specifying, for one or more pixels of the silhouette image (preferably at least for each background pixel), the distance to the nearest object (or foreground) pixel, the determining step using the distance map. This can enable the intersection test to be performed more efficiently.
When referring to silhouette images, the terms object, silhouette and foreground are generally used interchangeably (e.g. an object pixel is identified in the silhouette image by a foreground pixel), unless the context requires otherwise.
Preferably, the method comprises determining a measure of the extent of the projected voxel in relation to a reference point in the projected voxel, and determining whether the projected voxel intersects using the determined measure. This can simplify the intersection test, since intersection can be determined using the single determined measure, instead of testing multiple (possibly all) pixels within the footprint. The method preferably comprises determining whether a distance value obtained from the distance map for the reference point exceeds the determined measure by at least a predetermined value corresponding to the predefined distance (e.g. by determining whether the determined measure plus the predetermined value is less than or equal to the distance value from the distance map). In this way, the intersection test can be achieved by way of a simple numerical comparison.
To simplify computation, the measure preferably defines the extent of an approximation of the area of the projected voxel. In preferred examples, the measure is the radius of a circle representing an approximation of the projected voxel footprint, preferably a bounding circle, more preferably a minimum bounding circle of the voxel footprint. The reference point is then preferably the circle centre. This can provide a fairly close approximation to the voxel footprint and can enable simple calculation of the intersection test.
In a further aspect, there is provided a method of generating a volumetric representation of an object represented in at least one source image, comprising, for the or each source image: segmenting the source image into foreground and background, the image foreground representing at least one object silhouette, to produce a silhouette image corresponding to the source image; expanding the object silhouette by a predefined distance (at least one pixel); and projecting a plurality of voxels into the image plane and performing intersection tests to determine whether projected voxels intersect the expanded silhouette; the method further comprising: selecting voxels for inclusion in a volumetric representation of the object in dependence on the outcome of the projections and intersection tests; and generating the 3D model based on the selected voxels.
In a further aspect, there is provided a method of generating a volumetric representation of an object represented in at least one source image, comprising: segmenting the source image into foreground and background, the image foreground representing at least one object silhouette, to produce a silhouette image corresponding to the source image; generating a distance map for the silhouette image, the distance map specifying, for given pixels in the silhouette image, the distance to the nearest foreground pixel; and for each of a plurality of voxels considered for inclusion in a volumetric representation of the object: determining a measure of the extent of the voxel footprint resulting from projection of the voxel into the image plane, the measure determined in relation to a reference point, and excluding the voxel from the volumetric representation if a value from the distance map at a location corresponding to the reference point exceeds the measure by at least a predefined distance value (where the distance value is greater than zero as mentioned above). As previously mentioned, an approximation to the footprint is preferably used, e.g. a bounding circle, with the circle radius providing the determined measure.
Preferably, in any of the above aspects, the predefined distance is determined in dependence on camera calibration information for the image.
This can allow the distance to be set to an appropriate value to improve robustness by generating a conservative volumetric representation of the object (i.e. one that is more likely to include the target object). Preferably, the distance is determined automatically. Accordingly, the method may comprise performing camera calibration for the image to determine one or more camera parameters, preferably by comparing a set of one or more known points in the scene with their projection in the camera. The method may further comprise deriving a probabilistic covariance matrix of the camera parameters, determining an error ellipsoid in which the projection of a given 3D point will most likely lie, and determining the predefined distance in dependence on the error ellipsoid, preferably as a function of (or equal to) the maximum expansion of the error ellipsoid.
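As a non-authoritative sketch of the final step of this approach: assuming the calibration analysis yields a 2x2 covariance matrix for a point's projection in the image (derived from the camera-parameter covariance), the maximum expansion of the error ellipse is given by the square root of its largest eigenvalue. The function name and the confidence scale factor k are assumptions for illustration.

```python
import numpy as np

def distance_from_covariance(cov_2d, k=3.0):
    """Derive a predefined distance from a reprojection-error covariance.

    cov_2d: 2x2 covariance matrix of a point's projection in the image.
    k: hypothetical confidence scale (e.g. 3 for a ~3-sigma ellipse).
    Returns the semi-major axis of the scaled error ellipse, i.e. the
    maximum expansion of the ellipse, in pixels.
    """
    # Eigenvalues of a symmetric covariance matrix are real and
    # non-negative; eigvalsh returns them in ascending order.
    eigvals = np.linalg.eigvalsh(cov_2d)
    return k * np.sqrt(eigvals[-1])
```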
Alternatively or additionally, the method may comprise performing segmentation of the image, and determining the predefined distance in dependence on one or more segmentation errors in the segmented image.
Alternatively or additionally, the method may comprise performing voxel projection for a plurality of source images, and using different distance values for at least two of the source images. Preferably, a distance value of zero is used for a first source image, and a positive non-zero value is used for at least one other source image.
The method preferably comprises generating a different 3D model or volumetric representation of the object for each source image. Specifically, a different model may be derived for each source image, using, for a given model, a distance value of zero for the corresponding source image and a positive non-zero value for all other source images. By use of view-dependent geometry in this way, more reliable reconstruction can be achieved in some cases.
The method preferably comprises synthesising a view of the object using the generated 3D model or volumetric representation. The synthesised image may be generated for a (virtual camera) viewpoint which is different from the viewpoints for the source images.
Preferably, the view is synthesised for a given viewpoint, the method comprising selecting the predefined distance value in dependence on the given viewpoint, preferably as a function of the distance of the virtual camera to one or more real cameras from which the source image(s) were obtained, more preferably as a function of the angle between the viewing direction of the virtual camera and at least one source image camera. The distance value can thus be adjusted so as to provide for more conservative reconstruction when the reconstructed view diverges more significantly from the actual recorded views, and accordingly when calibration and segmentation errors might otherwise have a more pronounced effect on the reconstruction quality, and to provide for less conservative reconstruction in situations where the impact of errors might be less significant.
The above methods preferably generate a volumetric representation of the object (or of a scene comprising multiple objects), the volumetric representation given by the selected voxels. Preferably, the method further comprises generating a polygonal mesh based on the volumetric representation, i.e. the selected voxels. Additional post-processing may be performed, for example for smoothing the mesh.
In a further aspect of the invention, there is provided a method of generating a 3D model of an object represented in a source image, comprising: projecting a plurality of voxels into the image plane (of the source image); selecting voxels of the projected plurality of voxels in dependence on the projection; and generating the 3D model based on the selected voxels; wherein the selecting step comprises, for a given voxel, determining whether the projected voxel extends, in the image plane, to within a predefined distance of the object; and wherein the method further comprises determining the predefined distance in dependence on camera calibration information for the source image.
The method aspects set out above are preferably computer-implemented methods.
The invention also provides a computer program or computer program product having software code adapted, when executed on a data processing apparatus, to perform a method as set out herein and apparatus, preferably an image processing system, having means for performing a method as set out herein.
More generally, the invention also provides a computer program and a computer program product for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, and a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The invention also provides a signal embodying a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein, a method of transmitting such a signal, and a computer product having an operating system which supports a computer program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.
The invention extends to methods and/or apparatus substantially as herein described with reference to the accompanying drawings.
Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa.
Furthermore, features implemented in hardware may generally be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.
Preferred features of the present invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:
Figure 1 illustrates the problem of truncation due to camera calibration errors;
Figure 2 illustrates the shape-from-silhouette process;
Figure 3 illustrates the intersection of a projected voxel with an object silhouette;
Figure 4 illustrates generation of an expanded silhouette for use in a modified intersection test;
Figure 5 illustrates a modified intersection test using the radius of a projected voxel footprint;
Figure 6 illustrates generation of distance maps for use in performing intersection tests;
Figure 7 illustrates a modified intersection test using a distance map;
Figure 8 illustrates a typical multi-camera studio setup;
Figure 9 illustrates a video broadcasting system;
Figure 10 illustrates an image of a scene including a foreground object and scene background; and
Figure 11 shows segmentation of the image of Figure 10 into foreground and background, for use in a shape-from-silhouette technique.
Embodiments of the invention provide methods to generate new synthetic views using 3D models and texture maps which are robust against unreliable camera calibration and segmentation.
When reconstructed 3D models are used to synthesise new views, the original camera images are usually mapped onto the reconstructed 3D shape (texture mapping). In the case of truncation the visual quality might be reduced significantly depending on the magnitude of the errors in calibration and segmentation.
The truncation problem is illustrated in Figure 1. Here, a 3D shape 10 is reconstructed from two silhouette images 12, 14, derived from respective camera views cam-1 and cam-2 by way of an image segmentation process, for example using chroma keying. As illustrated, a segmentation error in image 12 leads to an incorrect object silhouette, which in turn leads to a truncated area 16 in the reconstructed model 10.
It has been found that a reasonably good visual quality can be preserved by ensuring that the reconstructed 3D shape is the same size or larger than the real shape and then using texture mapping with transparency (see Matusik, W.; Pfister, H.; Beardsley, P.A.; Ngan, A.; Ziegler, R.; McMillan, L., "Image-Based 3D Photography Using Opacity Hulls", ACM SIGGRAPH, ISSN:0730-0301, pp.427-437, July 2002).
Further techniques are described in "Dual-Mode Deformable Models for Free-Viewpoint Video of Sports Events", Kilner, J.; Starck, J.; Hilton, A.; Grau, O., 3-D Digital Imaging and Modeling, 2007 (3DIM '07), Sixth International Conference, 21-23 Aug. 2007, pages 177-184, the entire contents of which are incorporated herein by reference. The described approach uses an initial approximation of the 3D shape which is larger than the target shape as input to a model refinement algorithm. However, the model refinement algorithm is computationally intensive.
Embodiments of the present invention provide methods of generating 3D models that can be more robust against errors, including segmentation and camera calibration errors. The resulting models can be used with transparent texture mapping or as input to model refinement techniques to provide improved visual quality of the reconstructed model.
Preferred embodiments use a volumetric representation, in which the 3D model is constructed from a set of voxels (volume elements). For efficiency, an octree representation can be used. Other model representations may also be used.
The volumetric representation is generated from a set of source images of a scene recorded by multiple cameras from different viewing directions.
In an initial step, the source images are segmented to form silhouette images.
Segmentation is illustrated in Figures 10 and 11. Figure 10 shows an image recorded by a camera depicting the object 180 for which a 3D model is to be derived against a scene background 182. The image is segmented into (typically) two distinct portions as shown in Figure 11, representing image foreground and background. The segmented image may, for example, be represented as a binary image. The segmented image is referred to as a silhouette image, as it defines a silhouette 186 of the foreground object 180.
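By way of a hedged example, the difference-keying segmentation mentioned earlier might be sketched as follows in Python with numpy; the function name and the threshold tau are assumptions for illustration, not part of the described method.

```python
import numpy as np

def difference_key(image, background, tau=30.0):
    """Segment foreground by difference keying against a clean plate.

    image, background: (H, W, 3) arrays of the scene with and without
    the foreground object. tau is a hypothetical keying threshold.
    Returns a binary silhouette image (True = foreground).
    """
    # Sum of absolute per-channel differences; pixels that differ
    # strongly from the clean background plate are keyed as foreground.
    diff = np.abs(image.astype(float) - background.astype(float)).sum(axis=2)
    return diff > tau
```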
In some cases (such as a sports scene involving several players), the silhouette image may comprise multiple object silhouettes, from which a multi-object scene can then be reconstructed. In the present example, the silhouette is shown white on a dark background; however, this is for illustrative purposes only, and any suitable representation may be used.
A silhouette image is obtained for each camera view and provided as input to the reconstruction algorithm which derives a volumetric representation of the object(s).
The basic algorithm for reconstruction is illustrated in Figure 2. The process starts with a subdivision of the total reconstruction volume into a set of voxels which enclose (by definition) the object to be reconstructed. The voxels are then processed for each silhouette, with voxels which do not intersect the silhouettes being excluded from the volumetric representation.
Starting with the first silhouette, in step 20, a voxel is projected into the image plane of the silhouette image being processed. The resulting 2D projection of the 3D voxel is referred to herein as the voxel footprint. The image plane is determined from the viewing direction of the camera from which the silhouette image originates (e.g. cam-2 for silhouette image 14 in Figure 1). This information may be obtained using a camera calibration process.
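The projection step can be illustrated with a short Python (numpy) sketch; the 3x4 projection-matrix convention and the function name are assumptions for illustration. The voxel footprint is the convex hull of the eight projected vertices.

```python
import numpy as np

def project_voxel(centre, half_size, P):
    """Project the 8 vertices of a cube-shaped voxel into the image plane.

    centre: (3,) voxel centre in world coordinates.
    half_size: half the voxel edge length.
    P: 3x4 camera projection matrix (obtained via camera calibration).
    Returns an (8, 2) array of image-plane points.
    """
    # Offsets from the centre to the 8 cube vertices.
    offsets = half_size * np.array([[sx, sy, sz]
                                    for sx in (-1, 1)
                                    for sy in (-1, 1)
                                    for sz in (-1, 1)])
    verts = centre + offsets                    # (8, 3) world points
    homog = np.c_[verts, np.ones(8)] @ P.T      # homogeneous image points
    return homog[:, :2] / homog[:, 2:]          # perspective divide
```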
The algorithm then determines (step 22), based on the voxel footprint in relation to the object silhouette, whether the voxel should be excluded from the volumetric representation of the object. This is referred to as the voxel exclusion test. In step 24, it is determined whether any voxels remain to be processed for this silhouette; if so, the process returns to step 20 to process the next voxel.
Once all voxels have been processed it is determined (step 26) whether all silhouette images have been processed. If not, the processing continues for the next silhouette image (step 28) by projecting the remaining voxels into the next silhouette image (repeating steps 22 to 24) and excluding any that do not intersect the next silhouette image. Once all silhouettes have been processed, the remaining voxels (i.e. those that have not been excluded) form a volumetric representation of the target object or scene.
Thus, starting with a complete volume of voxels, the algorithm discards voxels until only the set of voxels matching each silhouette image remain.
This process is equivalent to determining for each silhouette the set of voxels which should be included and then taking the intersection of the sets for each silhouette to give the final voxel set. However, the above process is computationally more efficient since a voxel which has been excluded once is not reconsidered for another silhouette.
Post-processing of the reconstructed volume can be used to convert from the volumetric to a surface representation, e.g. to produce a conventional polygonal mesh as output (for example, a marching cubes algorithm may be used).
Since cube-shaped voxels are typically used in these methods, the resulting volumetric representation (if rendered directly) would usually not provide a smooth, natural surface for the reconstructed object. Thus, post-processing may also be performed to smooth the volumetric or surface model.
The resulting 3D model can be used to synthesise new views of the object or scene, for example for use in a sports replay. Texture mapping can be used to provide a realistic appearance. The 3D model may also be combined with artificially generated model elements to produce synthesised views combining real-world objects with synthetic objects (e.g. in a virtual studio system).
The voxel exclusion test is typically an intersection test: if the footprint of a voxel in the image plane (as resulting from the projection) does not intersect the silhouette foreground (i.e. a scene object), then it is excluded from the reconstructed model. If the voxel footprint does intersect the foreground, then the voxel is retained (though it may be excluded during processing of a subsequent silhouette image).
The algorithm is summarised in the following pseudo-code:

    Initialise all voxels Vi to true
    For all silhouette images Si
        For all voxels Vi == true
            Project Vi into Si
            If Vi intersects with background only then
                set Vi := false

The resolution at which the initial set of voxels Vi is defined determines the detail in the resulting model. For efficiency, a process of iterative refinement can be used in which the reconstruction volume is divided into an initial set of voxels at an initial resolution. The above process is then performed to discard voxels which do not intersect the object in all silhouette images. The remaining voxels are then subdivided into smaller voxels (i.e. at a higher resolution), and the test is repeated. This is repeated until a defined voxel resolution is reached. To further improve efficiency, voxels which when projected are completely contained in each silhouette Si need not be further subdivided. An octree representation can be used to implement this process efficiently. An iterative method as described in GB2418827 may be used.
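A runnable rendition of the carving loop, in Python with numpy, is sketched below. The data layout (a flat array of voxel centres, binary masks, 3x4 projection matrices) and the function name are assumptions for illustration; for brevity only the projected voxel vertices are tested against the silhouette, which is one of the footprint approximations discussed herein.

```python
import numpy as np

def carve(voxel_centres, half_size, silhouettes, cameras):
    """Basic shape-from-silhouette carving.

    voxel_centres: (N, 3) array of voxel centre positions (world units).
    half_size: half the voxel edge length.
    silhouettes: list of binary (H, W) masks, True = foreground.
    cameras: list of 3x4 projection matrices, one per silhouette.
    Returns a boolean array marking the retained voxels.
    """
    keep = np.ones(len(voxel_centres), dtype=bool)
    # Offsets from the centre to the 8 cube vertices.
    corners = half_size * np.array([[sx, sy, sz]
                                    for sx in (-1, 1)
                                    for sy in (-1, 1)
                                    for sz in (-1, 1)])
    for mask, P in zip(silhouettes, cameras):
        h, w = mask.shape
        for i in np.flatnonzero(keep):  # excluded voxels are not revisited
            verts = voxel_centres[i] + corners          # (8, 3)
            homog = np.c_[verts, np.ones(8)] @ P.T      # homogeneous points
            px = np.round(homog[:, :2] / homog[:, 2:]).astype(int)
            inside = ((0 <= px[:, 0]) & (px[:, 0] < w) &
                      (0 <= px[:, 1]) & (px[:, 1] < h))
            # Exclude the voxel if no projected vertex lands on the
            # silhouette foreground in this view.
            if not np.any(mask[px[inside, 1], px[inside, 0]]):
                keep[i] = False
    return keep
```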
The intersection test typically involves checking whether the footprint of a voxel intersects with at least one foreground pixel in the silhouette image.
The intersection test is illustrated in Figure 3. Figure 3 shows a volume 32 of voxels. The projection of one voxel 34 into the image plane results in a voxel footprint 38. In this example, footprint 38 intersects silhouette foreground 40, and is thus retained.
The intersection test can be performed by testing all silhouette pixels in the footprint of the voxel (as described in GB2418827). An alternative approach uses a distance map, as described in R. Szeliski, "Rapid octree construction from image sequences", CVGIP: Image Understanding, 58(1):23-32, July 1993. In this approach, a distance map is determined for each silhouette image. The distance map gives, for each pixel, the distance to the nearest foreground pixel.
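The distance map can be illustrated with a brute-force sketch. A practical implementation would use a linear-time distance transform (e.g. scipy.ndimage.distance_transform_edt); the function name below and the O(H·W·M) approach are assumptions of this sketch:

```python
import numpy as np

def distance_map(silhouette):
    """Euclidean distance from every pixel to the nearest foreground
    pixel; zero on the silhouette itself. Assumes at least one
    foreground pixel. Brute force, for illustration only."""
    fg = np.argwhere(silhouette)                       # (M, 2) foreground coords
    h, w = silhouette.shape
    gy, gx = np.mgrid[0:h, 0:w]
    grid = np.stack([gy.ravel(), gx.ravel()], axis=1)  # (H*W, 2) pixel coords
    # For each pixel, distance to every foreground pixel, then the minimum
    d = np.sqrt(((grid[:, None, :] - fg[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(h, w)
```

Since the silhouette images do not change during reconstruction, this map is computed once per image as a pre-processing step.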
To simplify computation, preferred embodiments may use an approximation to the voxel footprint. This may be a bounding circle, preferably a minimum bounding circle. The intersection test can then be reduced to checking whether the radius Rfootprint of the bounding circle is less than the value Vdistance found in the distance map at the centre of the bounding circle.
In preferred embodiments of the invention, the robustness to errors is improved by modifying the intersection test so that individual voxels are more likely to be included in the generated model. In this modified test, the voxel need not necessarily intersect the silhouette; instead, it suffices for the voxel footprint to extend near the silhouette for it to be included.
In one approach, this can be achieved by expanding the silhouette in the silhouette image by a certain distance (i.e. by a predetermined number Di of pixels, where Di > 0). The intersection test is then performed in relation to the expanded silhouette. Specifically, the pixel intersection test can be used to determine whether any pixel within the voxel footprint falls within the expanded silhouette, or a distance map can be used.
The expansion can be achieved by performing a dilation operation on the image foreground to achieve a dilation by Di pixels (or equivalently, an erosion operation on the image background). Known morphological dilation/erosion operators can be used to achieve this. The silhouette expansion can be performed as a pre-processing step prior to the execution of the reconstruction algorithm.
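For illustration, the dilation step might be sketched in pure NumPy as below. A real implementation would typically use a library morphological operator (e.g. from scipy.ndimage); the 8-connected structuring element is a choice of this sketch:

```python
import numpy as np

def expand_silhouette(silhouette, di):
    """Dilate the boolean foreground mask by `di` pixels
    (equivalently, erode the background), as in step 54 of Figure 4."""
    out = silhouette.copy()
    for _ in range(di):
        grown = out.copy()
        # OR-in the eight neighbours of every foreground pixel
        grown[1:, :] |= out[:-1, :]
        grown[:-1, :] |= out[1:, :]
        grown[:, 1:] |= out[:, :-1]
        grown[:, :-1] |= out[:, 1:]
        grown[1:, 1:] |= out[:-1, :-1]
        grown[:-1, :-1] |= out[1:, 1:]
        grown[1:, :-1] |= out[:-1, 1:]
        grown[:-1, 1:] |= out[1:, :-1]
        out = grown
    return out
```

One iteration grows the silhouette by one pixel in every direction, so `di` iterations give the Di-pixel expansion used by the modified intersection test.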
This process is illustrated in Figure 4. For each of the available camera images, the image is segmented into foreground and background at step 50, for example using known chroma-keying techniques, resulting in a binary silhouette image 52, in which the foreground corresponds to one or more silhouettes of scene objects. The object silhouette(s) are then expanded by Di pixels in step 54 to produce an expanded silhouette image 56, which is then stored. If images remain to be processed (step 58), then the process returns to step 50 to segment the next image.
Once the expanded silhouette images have been generated, the reconstruction process proceeds as described above (step 60) but using the expanded silhouette images, i.e. by projecting voxels into the expanded silhouette images and performing pixel intersection tests to determine which voxels are to be included in the model.
In the alternative approach mentioned above using distance maps, the intersection test is simplified by using a bounding circle for the projected voxel as an approximation to its actual footprint. This method will now be described in more detail.
The process is illustrated in Figure 5. Figure 5 shows projection of a voxel 70 from a set of voxels 72 into silhouette image 80, resulting in projected voxel footprint 74. The projection is in relation to the camera viewpoint 82. A bounding circle 76 is determined as an approximation for voxel footprint 74. Intersection with silhouette 78 is then tested as follows: first, a lookup is performed in the distance map corresponding to silhouette image 80 at the centre of the bounding circle 76. The resulting value Vdistance gives the distance from the footprint centre to the nearest foreground pixel.
This value is then compared to the radius Rfootprint of the bounding circle to determine whether the footprint intersects the silhouette; e.g., if Rfootprint ≤ Vdistance then the footprint is taken not to intersect silhouette 78 and the voxel is excluded from the reconstructed model.
The distance map for each silhouette image can be computed in a pre-processing step. The process is illustrated in Figure 6. As shown, the first or next image is segmented in step 90 to produce silhouette image 92. In step 94, a distance map 96 is computed for the silhouette image, giving, for each pixel (or at least for background pixels), the distance to the nearest foreground pixel. The distance map can be computed using known techniques. The process continues (step 98) until all images have been processed. Voxel projection and intersection tests can then be performed for each silhouette image as described above using the computed distance maps (step 100).
In this variant, to achieve the desired improved robustness, instead of expanding the silhouette prior to the intersection test as described above, the intersection test is modified to add the value Di (where Di > 0) to the radius Rfootprint prior to comparison with the distance map value. Thus, referring back to Figure 5, if Rfootprint + Di ≤ Vdistance, then the voxel footprint is taken not to intersect the silhouette 78 and is excluded from the reconstruction. It should be noted that equivalent or similar tests can be substituted for the above, e.g. Rfootprint ≤ Vdistance − Di, or Rfootprint + Di < Vdistance.
In general terms, the test thus determines whether Vdistance exceeds Rfootprint by at least Di. If it does, the voxel footprint is considered not to extend near the silhouette and the voxel is excluded. If it does not, the voxel footprint is considered to extend near the silhouette (i.e. it almost intersects), and the voxel is considered for inclusion in the model.
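This family of equivalent tests can be captured in one short sketch; with Di = 0 it reduces to the strict bounding-circle test described above (the function name is an assumption of this sketch):

```python
def voxel_excluded(r_footprint, v_distance, di=0.0):
    """Exclusion test of Figure 7, step 116: exclude the voxel when
    the distance-map value at the footprint centre exceeds the
    bounding-circle radius by at least the lenience margin Di.
    With di == 0 this is the strict test (exclude when
    Rfootprint <= Vdistance)."""
    return r_footprint + di <= v_distance
```

For example, a footprint of radius 2 whose centre lies 5 pixels from the nearest foreground pixel is excluded by the strict test, but retained when Di = 4, since its footprint then counts as extending "near" the silhouette.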
The process is summarised in Figure 7.
As shown, for a given silhouette image, the next voxel is projected into the silhouette image plane in step 110 to produce a voxel footprint. At step 112, the centre point and radius of a bounding circle for the footprint are determined. A distance map lookup is performed at step 114 for the centre point of the bounding circle. The resulting distance value is Vdistance. The exclusion test is performed at step 116. Specifically, if Rfootprint + Di > Vdistance, the voxel remains included, step 118 (it may later be excluded based on another silhouette image). If Rfootprint + Di ≤ Vdistance, the voxel is excluded (step 120).
If, at step 122, all voxels have been processed, then the algorithm proceeds to process the next silhouette image at step 124; if not, the process returns to step 110 to project the next voxel. Once all silhouette images have been processed, the remaining (not excluded) voxels provide the volumetric representation of the scene object(s).
Thus, the voxel exclusion test used in both the above variants is not a strict intersection test. Instead, both variants effectively test whether the voxel footprint extends to within some defined distance (given by Di) from the silhouette boundary. If a voxel footprint does not extend to within the defined distance, it is excluded from the reconstruction. The result of this is that a given voxel is more likely to be included in the reconstructed model, thus leading to a larger reconstructed volume and providing a more conservative estimate of the actual shape being reconstructed, which is more likely to include the target object surface. This conservative estimate can thus provide a more reliable starting point for certain model refinement techniques, as indicated above, and can avoid errors such as truncation errors resulting from incorrect segmentation.
The distance value Di (used to expand the silhouette or the voxel footprint respectively in the above variants) in effect determines a degree of lenience in the modified voxel intersection test. If the value chosen for Di is too small, the desired error robustness may not be achieved.
Conversely, if the value is too large, too many voxels may be included in the model, and the resulting reconstructed model may not bear sufficient resemblance to the target surface to be useful for subsequent texture mapping or model refinement.
The value of Di may be determined for each silhouette in a variety of ways.
In a first example, Di is determined as a function of the camera calibration error. If camera parameters for a particular camera cam-i are unreliable (high uncertainty) then Di should be large. A method to derive this automatically can make use of the calibration covariance matrix.
Specifically, the camera parameters used for the reconstruction are usually determined by camera calibration. This is a known process in photogrammetry and computer vision. The camera is typically described by its intrinsic parameters (usually focal length, centre point shift, radial lens distortion) and its extrinsic parameters (position and orientation). Any appropriate subset of parameters may be used. For example, certain parameters may be known and/or fixed, in which case they may not need to be estimated. The relevant parameters are usually estimated by comparing a set of known points in the scene with their projection in the camera.
However, the estimated parameters are subject to errors. Various known calibration techniques give an estimate of this calibration error in the form of a calibration uncertainty. A known representation is the probabilistic covariance matrix Cov(Y) of the camera parameters Y. With the covariance matrix it is possible to predict an error ellipsoid in which the projection p of a 3D point P will most likely lie. Di can be derived as a function of (or be set equal to) the maximum expansion of this error ellipsoid.
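As an illustration of this derivation, and assuming the parameter covariance Cov(Y) has already been propagated through the projection model to a 2×2 covariance of the image point p (that propagation step is not shown, and the function name and confidence factor k are assumptions of this sketch), the maximum expansion of the error ellipse is governed by the largest covariance eigenvalue:

```python
import numpy as np

def di_from_covariance(cov_projection, k=3.0):
    """Di from the 2x2 covariance of a projected image point: the
    error-ellipse semi-axes are the square roots of the covariance
    eigenvalues, scaled by a confidence factor k (k = 3 covers most
    of the Gaussian error mass). Returns the longest semi-axis."""
    eigvals = np.linalg.eigvalsh(np.asarray(cov_projection, dtype=float))
    return float(k * np.sqrt(eigvals.max()))
```

An unreliable calibration yields a large covariance, and hence the large Di the text calls for.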
In another example, Di can be determined based on segmentation errors. Segmentation errors include areas of silhouettes that are smaller than the true object due to incorrect segmentation. Where such segmentation faults cannot be detected automatically, a practical approach is to adapt Di by human intervention. An interactive process can be provided where an operator sets Di based on visual inspection of the source image and/or segmented silhouette image. Di may optionally also be refined after generation of an initial 3D reconstruction.
Di can also be adapted where objects are only visible on a small scale (due to wide angle lenses or distant cameras). This can again be achieved, for example, by human inspection. Alternatively, Di can be varied automatically proportionally to the size (or radius) of the footprint.
As a further variation of the above approaches the distance Di can be set to zero (or some small value) for one camera/silhouette and as described above for the others. Then a model is derived that will fit the silhouette of that particular camera closely (it will typically fit that camera view more closely than other camera views). The approach is repeated with all the other cameras/silhouettes and results in having a different 3D model for each camera. This is also known as view-dependent geometry. In reconstructing an image from a particular desired viewing angle, the model generated for the viewing angle closest to the desired angle can then be selected, or interpolation can be performed between individual models (e.g. between the closest models).
As a further example, a function to compute Di can take into account the position of the (virtual) camera for which a view should be synthesised.
This variation requires that the position and camera parameters of the virtual camera cam-j for which a view should be synthesised are known before the scene model is computed (on-the-fly computation). Di can then be computed as a function of the 'distance' K of the virtual camera to the real cameras. A function to determine K can be based on the angle between the position of the virtual camera cam-j and the real camera cam-i. When the angle is zero then Di should be small. For example: Di = a * angle(position cam-j, position cam-i). The parameter a can be chosen freely. Other functions are possible.
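A minimal sketch of such a function, assuming the angle is measured from a scene reference point (the reference point and the default scale a are choices of this sketch; the text leaves both open):

```python
import numpy as np

def di_from_view_angle(virtual_pos, real_pos, scene_centre, a=1.0):
    """Di = a * angle(position cam-j, position cam-i): the angle
    (in radians) between the virtual and real camera positions as
    seen from a scene reference point."""
    c = np.asarray(scene_centre, dtype=float)
    u = np.asarray(virtual_pos, dtype=float) - c
    v = np.asarray(real_pos, dtype=float) - c
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding error outside [-1, 1]
    return a * np.arccos(np.clip(cosang, -1.0, 1.0))
```

When the virtual camera coincides with a real camera the angle, and hence Di, is zero, so the model fits that view closely, consistent with the view-dependent geometry approach described above.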
The above approaches may also be combined to provide more complex functions for selecting a value for Di. For example, Di could be calculated based on a combination of camera calibration error and the position of the virtual camera. In a simple approach, Di may be calculated as an average (possibly weighted) of individual Di values computed in accordance with any of the methods described above or any other suitable methods.
The above methods result in a volumetric representation of the target object, consisting of a number of voxels. As mentioned above, the resulting representation may be post-processed to produce a refined model, for example a smoothed polygonal mesh. Texture mapping can then additionally be performed using known techniques. This typically involves re-projecting mesh polygons or vertices into the original camera images to derive a texture mapping (see, for example EP1398734). A realistic textured model can be generated in this way, enabling high-quality synthesis of artificial views of a scene.
An example of a studio environment in which the present method can be used is illustrated in Figure 8. The system depicted in Figure 8 uses multiple cameras and chroma-key segmentation techniques to compute a 3D model of an actor 140 using the processes described above. Studio cameras 142 are arranged to take images of the scene. A retro-reflective background 146 may be provided to improve the segmentation accuracy.
However, as mentioned above, due to the improved robustness, embodiments of the invention are also suitable for use in less controlled environments. Typical scenarios are sport scenes with a limited number of cameras (usually less than 12) and dynamic objects. One example is a live sports event such as a football match recorded at a sports stadium with multiple (possibly moveable) cameras. In that scenario, segmentation can still be improved by making use of domain knowledge (e.g. the green colour of a football pitch can be identified as background; feature detection can also be used to eliminate any undesired background artefacts from the image prior to or after segmentation, e.g. pitch lines).
An image processing system and broadcasting system utilising the above methods is shown schematically in Figure 9. The system comprises a plurality of cameras 142, which may be conventional television cameras.
Scene images recorded by the cameras are provided to capture device 160, which may include analogue-digital conversion circuitry if analogue cameras are used (digital cameras can alternatively be used). The images may be provided directly to a broadcast system 168 for broadcast to a television network 170 (for example an analogue or digital terrestrial, cable or satellite network). Alternatively, the images may be transmitted via any other form of communications network, for example a data network such as the Internet.
Recorded images may also be recorded to storage system 166 for later use, for example for broadcast at a later time. The storage system may include hard disk storage, optical disk storage, tape storage, or any other suitable storage media or combinations of storage media.
The images captured by the cameras may also be provided to model generator 162 to generate a 3D model of a scene at any point in time. The resulting model may be used by image synthesiser 164 to synthesise an artificial view of the scene. Typically, this will be for a viewpoint other than those captured by the cameras as mentioned above, for example for use in sports replays. However, the image synthesis may also serve other purposes, for example to generate a modified view in which certain elements are highlighted or replaced, and hence need not necessarily be generated for a viewpoint that differs from the source images. Multiple views may be synthesised, for example to produce an animated sequence or an artificial panning or tracking shot.
The model and/or synthesised view(s) may be stored in and retrieved from storage 166. Previously recorded images from storage 166 may also be used as input to the model generator 162. Synthesised images may also be provided from synthesiser 164 directly to broadcast system 168 for inclusion in a television broadcast.
The model generator 162 and synthesiser 164 may be implemented as software modules executing on a general purpose computer. The computer may include dedicated video processing hardware, and may include an operator interface (e.g. screen, keyboard, mouse) for controlling model generation and image synthesis.
It will be understood that the present invention has been described above purely by way of example, and modification of detail can be made within the scope of the invention.
For example, the segmentation step to form silhouette images may be omitted where other information is available which enables classification into foreground/background. For example, foreground objects may be identified manually by an operator. Also, in scenes where pixel colour is strongly indicative of foreground classification (e.g. a particular colour always represents an object), the segmentation step could be omitted and the intersection test could instead directly test pixel values in the source images.
Claims (33)
- 1. A method of generating a 3D model of an object represented in an image, comprising: projecting a plurality of voxels into the image plane; selecting voxels of the projected plurality of voxels in dependence on the projection; and generating the 3D model based on the selected voxels; wherein the selecting step comprises, for a given voxel, determining whether the projected voxel extends, in the image plane, to within a predefined distance of the object.
- 2. A method according to claim 1, comprising not selecting the given voxel if the projection of the given voxel does not extend to within the predefined distance.
- 3. A method according to claim 2, comprising performing the projecting step for each of a plurality of images representing the object, wherein selecting comprises selecting a voxel if, for each image, the voxel extends to within a predefined distance of the object as represented in the image.
- 4. A method according to any of the preceding claims, comprising generating a silhouette image representing a silhouette of the object; and wherein the determining step comprises determining whether the projected voxel extends to within the predefined distance of the object silhouette.
- 5. A method according to claim 4, comprising: generating an initial silhouette image comprising an object silhouette; and expanding the object silhouette to produce the silhouette image.
- 6. A method according to claim 5, comprising expanding the object silhouette by a predetermined number of pixels corresponding to the predefined distance.
- 7. A method according to claim 5 or 6, wherein the determining step comprises determining whether the projected voxel intersects the expanded silhouette.
- 8. A method according to any preceding claim, wherein generating the initial silhouette image or the silhouette image comprises segmenting the image into at least image foreground and image background, the image foreground comprising the object silhouette.
- 9. A method according to any of claims 5 to 8, wherein expanding the silhouette comprises performing a dilation operation on the image foreground or an erosion operation on the image background.
- 10. A method according to any of claims 5 to 9, comprising testing one or more pixel locations corresponding to the projected voxel to determine whether at least one such pixel location falls within the expanded silhouette.
- 11. A method according to any of the preceding claims, comprising determining an approximation to the area of the projected voxel, the determining step being performed using the approximation.
- 12. A method according to any of claims 4 to 11, comprising generating a distance map for the silhouette image, the distance map specifying, for one or more pixels of the silhouette image, the distance to the nearest object pixel, the determining step using the distance map.
- 13. A method according to claim 12, comprising determining a measure of the extent of the projected voxel in relation to a reference point in the projected voxel, and determining whether the projected voxel intersects using the determined measure.
- 14. A method according to claim 13, comprising determining whether a distance value obtained from the distance map for the reference point exceeds the determined measure by at least a predetermined value corresponding to the predefined distance.
- 15. A method according to claim 13 or 14, wherein the measure defines the extent of an approximation of the area of the projected voxel.
- 16. A method according to any of claims 13 to 15, wherein the measure is the radius of a circle representing an approximation of the projected voxel footprint, preferably a bounding circle, more preferably a minimum bounding circle.
- 17. A method of generating a volumetric representation of an object represented in at least one source image, comprising, for the or each source image: segmenting the source image into foreground and background, the image foreground representing at least one object silhouette, to produce a silhouette image corresponding to the source image; expanding the object silhouette by a predefined distance; and projecting a plurality of voxels into the image plane and performing intersection tests to determine whether projected voxels intersect the expanded silhouette; the method further comprising: selecting voxels for inclusion in the volumetric representation of the object in dependence on the outcome of the projections and intersection tests; and generating the volumetric representation based on the selected voxels.
- 18. A method of generating a volumetric representation of an object represented in at least one source image, comprising: segmenting the source image into foreground and background, the image foreground representing at least one object silhouette, to produce a silhouette image corresponding to the source image; generating a distance map for the silhouette image, the distance map specifying, for pixels in the silhouette image, the distance to the nearest foreground pixel; and for each of a plurality of voxels considered for inclusion in the volumetric representation of the object: determining a measure of the extent of the voxel footprint resulting from projection of the voxel into the image plane, the measure determined in relation to a reference point, and excluding the voxel from the volumetric representation if a value from the distance map at a location corresponding to the reference point exceeds the measure by at least a predefined distance value.
- 19. A method according to any of the preceding claims, wherein the predefined distance is determined in dependence on camera calibration information for the image.
- 20. A method according to claim 19, comprising performing camera calibration for the image to determine one or more camera parameters, preferably by comparing a set of one or more known points in the scene with their projection in the camera.
- 21. A method according to claim 19 or 20, comprising deriving a probabilistic covariance matrix of the camera parameters, determining an error ellipsoid in which the projection of a given 3D point will most likely lie, and determining the predefined distance as a function of the maximum expansion of the error ellipsoid.
- 22. A method according to any of the preceding claims, comprising performing segmentation of the image, and determining the predefined distance in dependence on one or more segmentation errors in the segmented image.
- 23. A method according to any of the preceding claims, comprising performing voxel projection for a plurality of source images, comprising using different distance values for at least two of the source images.
- 24. A method according to claim 23, comprising using a distance value of zero for a first source image, and a positive non-zero value for at least one other source image.
- 25. A method according to claim 23 or 24, comprising generating a different 3D model or volumetric representation of the object for each source image.
- 26. A method according to any of the preceding claims, comprising synthesising a view of the object using the generated 3D model or volumetric representation.
- 27. A method according to claim 26, wherein the view is synthesised for a given viewpoint, the method comprising selecting the predefined distance value in dependence on the given viewpoint, preferably as a function of the distance of the virtual camera to one or more real cameras from which the source image(s) were obtained, more preferably as a function of the angle between the viewing direction of the virtual camera and at least one source image camera.
- 28. A method according to any of the preceding claims, comprising generating a polygonal mesh based on the selected voxels.
- 29. A method of generating a 3D model of an object represented in a source image, comprising: projecting a plurality of voxels into the image plane of the source image; selecting voxels of the projected plurality of voxels in dependence on the projection; and generating the 3D model based on the selected voxels; wherein the selecting step comprises, for a given voxel, determining whether the projected voxel extends, in the image plane, to within a predefined distance of the object; and wherein the method further comprises determining the predefined distance in dependence on camera calibration information for the source image.
- 30. A computer program or computer program product having software code adapted, when executed on a data processing apparatus, to perform a method as claimed in any of the preceding claims.
- 31. Apparatus, preferably an image processing system, having means for performing a method as claimed in any of claims 1 to 29.
- 32. A method of generating a 3D model substantially as herein described with reference to and/or as illustrated in any of the accompanying drawings.
- 33. An image processing system substantially as herein described with reference to and/or as illustrated in any of the accompanying drawings.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0804660.9A GB2458305B (en) | 2008-03-13 | 2008-03-13 | Providing a volumetric representation of an object |
Publications (4)
| Publication Number | Publication Date |
|---|---|
| GB0804660D0 GB0804660D0 (en) | 2008-04-16 |
| GB2458305A true GB2458305A (en) | 2009-09-16 |
| GB2458305A8 GB2458305A8 (en) | 2012-05-30 |
| GB2458305B GB2458305B (en) | 2012-06-27 |
Family
ID=39328043
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| GB0804660.9A Active GB2458305B (en) | 2008-03-13 | 2008-03-13 | Providing a volumetric representation of an object |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2458305B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2006089417A1 (en) * | 2005-02-23 | 2006-08-31 | Craig Summers | Automatic scene modeling for the 3d camera and 3d video |
| US20070133865A1 (en) * | 2005-12-09 | 2007-06-14 | Jae-Kwang Lee | Method for reconstructing three-dimensional structure using silhouette information in two-dimensional image |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2671210A4 (en) * | 2011-01-31 | 2017-03-29 | Microsoft Technology Licensing, LLC | Three-dimensional environment reconstruction |
| US9418478B2 (en) | 2012-06-05 | 2016-08-16 | Apple Inc. | Methods and apparatus for building a three-dimensional model from multiple data sets |
| US10163260B2 (en) | 2012-06-05 | 2018-12-25 | Apple, Inc. | Methods and apparatus for building a three-dimensional model from multiple data sets |
| US12002161B2 (en) | 2012-06-05 | 2024-06-04 | Apple Inc. | Methods and apparatus for building a three-dimensional model from multiple data sets |
| US11215711B2 (en) | 2012-12-28 | 2022-01-04 | Microsoft Technology Licensing, Llc | Using photometric stereo for 3D environment modeling |
| US11710309B2 (en) | 2013-02-22 | 2023-07-25 | Microsoft Technology Licensing, Llc | Camera/object pose from predicted coordinates |
| EP3035291A1 (en) | 2014-12-19 | 2016-06-22 | Donya Labs AB | Rendering based generation of occlusion culling models |
| WO2016097373A1 (en) * | 2014-12-19 | 2016-06-23 | Donya Labs Ab | Rendering based generation of occlusion culling models |
| EP3309750A1 (en) * | 2016-10-12 | 2018-04-18 | Canon Kabushiki Kaisha | Image processing apparatus and image processing method |
| US10657703B2 (en) | 2016-10-12 | 2020-05-19 | Canon Kabushiki Kaisha | Image processing apparatus and image processing method |