WO2001037222A1 - Image processing system - Google Patents
Image processing system
- Publication number
- WO2001037222A1 (PCT/GB2000/004411)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- input parameters
- parameters
- appearance
- model
- function
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T17/005—Tree description, e.g. octree, quadtree
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
Definitions
- the present invention relates to the parametric modelling of the appearance of objects.
- the resulting model can be used, for example, to track the object, such as a human face, in a video sequence.
- the appearance model proposed by Cootes et al includes a single appearance model matrix which linearly relates a set of parameters to corresponding image data. Blanz et al segmented the face into a number of completely independent appearance models, each of which is used to render a separate region of the face. The results are then merged using a general interpolation technique.
- the present invention aims to provide an alternative way of modelling the appearance of objects which will allow subsequent image interpretation through appropriate processing of parameters generated for the image.
- the present invention provides a hierarchical parametric model for modelling the shape of an object, the model comprising data defining a hierarchical set of functions in which a function in a top layer of the hierarchy is operable to generate a set of output parameters from a set of input parameters and in which one or more functions in a bottom layer of the hierarchy are operable to receive parameters output from one or more functions from a higher layer of the hierarchy and to generate therefrom the relative positions of a plurality of predetermined points on the object.
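- as an illustration only (not taken from the patent), a hierarchy of this kind can be sketched as follows; the layer structure, matrix sizes and the way the top layer's output is shared among the bottom-layer functions are hypothetical.

```python
import numpy as np

# Minimal sketch of a two-layer hierarchical parametric shape model: a top-layer
# function turns a small set of input parameters into output parameters, and each
# bottom-layer function turns its share of those parameters into the relative
# (x, y) positions of predetermined points. The matrices are random stand-ins.
rng = np.random.default_rng(0)
TOP = rng.standard_normal((12, 4))                        # top layer: 4 inputs -> 12 outputs
BOTTOM = [rng.standard_normal((6, 4)) for _ in range(3)]  # 3 components, 3 points each

def top_layer(p_global):
    """Top-layer function: global parameters -> parameters for the lower layer."""
    return TOP @ p_global

def bottom_layer(k, p_component):
    """Bottom-layer function k: component parameters -> point positions."""
    return (BOTTOM[k] @ p_component).reshape(3, 2)        # 3 (x, y) points

def evaluate(p_global):
    p_out = top_layer(p_global)
    # each bottom-layer function receives a 4-parameter slice of the top layer's output
    return [bottom_layer(k, p_out[4 * k:4 * (k + 1)]) for k in range(3)]

points = evaluate(np.array([0.5, -0.2, 0.1, 0.0]))
print(points[0])                                          # point positions from component 0
```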
- a hierarchical parametric model has the advantage that small changes in some parts of the object can still be modelled by the parameters, even though they are significantly smaller than variations in other less important parts of the object.
- This model can be used for face tracking, video compression, 2D and 3D character generation, face recognition for security purposes, image editing etc.
- the present invention provides an apparatus and method of determining a set of appearance parameters representative of the appearance of an object, the method comprising the steps of storing a hierarchical parametric model such as the one discussed above and at least one function which relates a change in input parameters to an error between actual appearance data for the object and appearance data determined from the set of input parameters and the parametric model; initially receiving a current set of input parameters for the object; determining appearance data for the object from the current set of input parameters and the stored parametric model; determining the error between the actual appearance data of the object and the appearance data determined from the current set of input parameters; determining a change in the input parameters using the at least one stored function and said determined error; and updating the current set of input parameters with the determined change in the input parameters.
- Figure 1 is a schematic block diagram illustrating a general arrangement of a computer system which can be programmed to implement the present invention
- Figure 2 is a block diagram of an appearance model generation unit which receives some of the image frames of a source video sequence together with a target image frame and generates therefrom an appearance model;
- Figure 3 is a block diagram of a target video sequence generation unit which generates a target video sequence from a source video sequence using a set of stored difference parameters;
- Figure 4 is a flow chart illustrating the processing steps which the target video sequence generation unit shown in Figure 3 performs to generate the target video sequence;
- Figure 5 schematically illustrates the form of a hierarchical appearance model generated in one embodiment of the invention
- Figure 6 shows a head with a mesh of triangular facets placed over the head and whose positions are defined by the position of landmark points at the corners of the facets;
- Figure 7 is a flow chart illustrating the processing steps required to generate a facet appearance model from the training images
- Figure 8 schematically illustrates the way in which a transformation is defined between a facet in a training image and a predefined shape of facet which allows texture information to be extracted from the facet;
- Figure 9 is a flow chart illustrating the main processing steps involved in determining an appearance model for the mouth using the appearance models for the facets which appear in the mouth and using the training images;
- Figure 10 schematically illustrates the way in which training images are used to determine some of the appearance models which form the hierarchical appearance model illustrated in Figure 5;
- Figure 11a is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix associated with a current facet
- Figure 11b is a flow chart illustrating the processing steps performed during a training routine to identify an Active matrix associated with the mouth
- Figure 12 is a flow chart illustrating the processing steps involved in determining a set of parameters which define the appearance of a face within an input image
- Figure 13a shows three frames of an example source video sequence which is applied to the target video sequence generation unit shown in Figure 3;
- Figure 13b shows an example target image used to generate a set of difference parameters used by the target video sequence generation unit shown in Figure 3;
- Figure 13c shows a corresponding three frames from a target video sequence generated by the target video sequence generation unit shown in Figure 3 from the three frames of the source video sequence shown in Figure 13a using the difference parameters generated using the target image shown in Figure 13b;
- Figure 13d shows a second example of a target image used to generate a set of difference parameters for use by the target video sequence generation unit shown in Figure 3;
- Figure 13e shows the corresponding three frames from the target video sequence generated by the target video sequence generation unit shown in Figure 3 when the three frames of the source video sequence shown in Figure 13a are input to the target video sequence generation unit together with the difference parameters calculated using the target image shown in Figure 13d.
- Figure 1 shows an image processing apparatus according to an embodiment of the present invention.
- the apparatus comprises a computer 1 having a central processing unit (CPU) 3 connected to a memory 5 which is operable to store a program defining the sequence of operations of the CPU 3 and to store object and image data used in calculations by the CPU 3.
- an input device 7 is also provided, which in this embodiment comprises a keyboard and a computer mouse.
- alternatively, a pointing device such as a digitiser with an associated stylus may be used.
- a frame buffer 9 is also provided and is coupled to the CPU 3 and comprises a memory unit (not shown) arranged to store image data relating to at least one image, for example by providing one (or several) memory location(s) per pixel of the image.
- the value stored in the frame buffer for each pixel defines the colour or intensity of that pixel in the image.
- the images are represented by 2-D arrays of pixels, and are conveniently described in terms of Cartesian coordinates, so that the position of a given pixel can be described by a pair of x-y coordinates. This representation is convenient since the image is displayed on a raster scan display 11. Therefore, the x-coordinate maps to the distance along the line of the display and the y-coordinate maps to the number of the line.
- the frame buffer 9 has sufficient memory capacity to store at least one image. For example, for an image having a resolution of 1000 x 1000 pixels, the frame buffer 9 includes 10^6 pixel locations, each addressable directly or indirectly in terms of a pixel coordinate x,y.
- a video tape recorder (VTR) 13 is also coupled to the frame buffer 9, for recording the image or sequence of images displayed on the display 11.
- a mass storage device 15, such as a hard disc drive, having a high data storage capacity is also provided and coupled to the memory 5.
- a floppy disc drive 17 which is operable to accept removable data storage media, such as a floppy disc 19 and to transfer data stored thereon to the memory 5.
- the memory 5 is also coupled to a printer 21 so that generated images can be output in paper form, an image input device 23 such as a scanner or video camera and a modem 25 so that input images and output images can be received from and transmitted to remote computer terminals via a data network, such as the Internet.
- the CPU 3, memory 5, frame buffer 9, display unit 11 and mass storage device 15 may be commercially available as a complete system, for example as an IBM compatible personal computer (PC) or a workstation such as the Sparc station available from Sun Microsystems.
- a number of embodiments of the invention can be supplied commercially in the form of programs stored on a floppy disc 19 or on other media, or as signals transmitted over a data link, such as the Internet, so that the receiving hardware becomes reconfigured into an apparatus embodying the present invention.
- the computer 1 is programmed to receive a source video sequence input by the image input device 23 and to generate a target video sequence from the source video sequence using a target image.
- the source video sequence is a video clip of an actor acting out a scene
- the target image is an image of a second actor
- the resulting target video sequence is a video sequence showing the second actor acting out the scene.
- a hierarchical parametric appearance model which models the variability of shape and texture of the head images is used.
- This appearance model makes use of the fact that some prior knowledge is available about the contents of head images in order to facilitate their modelling. For example, it can be assumed that two frontal images of a human face will each include eyes, a nose and a mouth.
- the hierarchical parametric appearance model 35 is generated by an appearance model generation unit 31 from training images which are stored in an image database 32.
- all the training images are colour images having 500 x 500 pixels, with each pixel having a red, green and a blue pixel value.
- the resulting appearance model 35 is a parameterisation of the appearance of the class of head images defined by the heads in the training images, so that a relatively small number of parameters (for example 50) can describe the detailed (pixel level) appearance of a head image from the class.
- the hierarchical appearance model 35 defines a function (F) which relates a set of global appearance parameters p to the pixel values of the corresponding head image, such that: I = F(p) (1), where I is the vector of RGB pixel values of the modelled head.
- a target video sequence can be generated from a source video sequence.
- the source video sequence is input to a target video sequence generation unit 51 which processes the source video sequence using a set of difference parameters 53 to generate and to output the target video sequence.
- the difference parameters 53 are determined by subtracting the appearance parameters which are generated for the first actor's head in one of the source video frames, from the appearance parameters which are generated for the second actor's head in the target image. The way in which these appearance parameters are determined for these images will be described later.
- the pose and facial expression of the first actor's head in the source video frame used should match, as closely as possible, the pose and facial expression of the second actor's head in the target image.
- in step s1 the appearance parameters (p_s^i) for the first actor's head in the current video frame (I_s^i) are automatically calculated. The way that this is achieved will be described later.
- in step s3 the difference parameters (p_dif) are added to the appearance parameters for the first actor's head in the current video frame to generate a set of modified appearance parameters: p_mod^i = p_s^i + p_dif
- the resulting appearance parameters (p_mod^i) are then used, in step s5, to regenerate the head for the current target video frame.
- the modified appearance parameters are inserted into equation (1) above to regenerate a modified head image which is then composited, in step s7, into the source video frame to generate the corresponding target video frame.
- a check is then made, in step s9, to determine whether or not there are any more source video frames. If there are, then the processing returns to step si where the procedure described above is repeated for the next source video frame. If there are no more source video frames, then the processing ends.
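- as an illustrative sketch only (not the patent's own code), the per-frame loop of steps s1 to s9 might look like the following; the helpers fit_appearance_parameters, render_head and composite are hypothetical stand-ins for the parameter search described later, for equation (1) and for the compositing step.

```python
import numpy as np

def generate_target_sequence(source_frames, p_dif, fit_appearance_parameters,
                             render_head, composite):
    """Apply the stored difference parameters to every source frame (Figure 4)."""
    target_frames = []
    p_prev = None
    for frame in source_frames:                      # step s9 loops over the frames
        # step s1: appearance parameters for the first actor's head, initialised
        # from the previous frame's (unmodified) parameters where available
        p_s = fit_appearance_parameters(frame, initial=p_prev)
        p_prev = p_s
        p_mod = p_s + np.asarray(p_dif)              # step s3: add the difference parameters
        head_image = render_head(p_mod)              # step s5: regenerate via equation (1)
        target_frames.append(composite(frame, head_image))   # step s7: composite
    return target_frames
```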
- Figure 13 illustrates the results of this animation technique (although showing black and white images and not colour).
- Figure 13a shows three frames of the source video sequence
- Figure 13b shows the target image (which in this embodiment is computer generated)
- Figure 13c shows the corresponding three frames of the target video sequence obtained in the manner described above.
- an animated sequence of the computer generated character has been generated from a video clip of a real person and a single image of the computer generated character.
- the parametric model is created by placing a number of landmark points on a training image and then identifying the same landmark points on the other training images in order to identify how the location of and the pixel values around the landmark points vary within the training images.
- a principal component analysis is then performed on the matrix which consists of vectors of the landmark points.
- This PCA yields a set of Eigenvectors which describe the directions of greatest variation along which the landmark points change.
- Their appearance model includes the linear combination of the Eigenvectors plus parameters for translation, rotation and scaling. This single appearance model relates a compact set of appearance parameters to pixel values.
- a hierarchical appearance model comprising several appearance models which model variations in components of the object is used.
- the hierarchical appearance model may include an appearance model for the mouth, one for the left eye, one for the right eye and one for the nose. Since it may be possible to model various components of the object, the particular hierarchical structure which will be used for a particular object and application must first of all be defined by the system designer.
- Figure 5 schematically illustrates the structure of the hierarchical appearance model used in this embodiment.
- at the top of the hierarchy there is a general face appearance model 61. Beneath the face appearance model there is a mouth appearance model 63, a left eye appearance model 65, a right eye appearance model 67, a left eyebrow appearance model 69, a rest of left eye appearance model 71, a right eyebrow appearance model 73, a rest of right eye appearance model 75 and, in this embodiment, a facet appearance model for each facet defined in the training images.
- Figure 6 shows the head of a training image in which the set of landmark points has been placed at the appropriate points on the head.
- the face appearance model 61 operates to relate a small number of "global" appearance parameters to a further set of appearance parameters, some of which are input to facet appearance models 77, some of which are input to the mouth appearance model 63, some of which are input to the left eye appearance model 65 and the rest of which are input to the right eye appearance model 67.
- each of the facet appearance models 77 operates to convert the input parameters it receives from the appearance model above it in the hierarchy into corresponding pixel values for that facet.
- the mouth appearance model 63 is operable to convert the parameters it receives from the face appearance model 61 into a further set of appearance parameters, respective ones of which are output to the respective facet appearance models 77 for the facets which are associated with the mouth.
- the left and right eye appearance models 65 and 67 operate to convert the parameters they receive from the face appearance model 61 into a further set of appearance parameters, some of which are input to the appropriate eyebrow appearance model and the rest of which are input to the appropriate rest of eye appearance model.
- These appearance models in turn convert these parameters into parameters for input to the facet appearance models associated with the facets which appear in the left and right eyes respectively.
- a small compact set of "global" appearance parameters input to the face appearance model 61 can filter through the hierarchical structure illustrated in Figure 5 to generate a set of pixel values for all the facets in a head which can then be used to regenerate the image of the head.
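- the top-down synthesis just described can be sketched as follows; this is an illustration only, with made-up matrices and an assumed split of each output parameter vector among the child models, not the stored appearance models themselves.

```python
import numpy as np

class LinearModel:
    """One node of the hierarchy: p_out = mean + basis @ p_in."""
    def __init__(self, mean, basis, children=None, split=None):
        self.mean, self.basis = mean, basis
        self.children = children or []      # models in the layer below
        self.split = split or []            # how p_out is shared among the children

    def synthesise(self, p_in):
        p_out = self.mean + self.basis @ p_in
        if not self.children:               # facet level: p_out are pixel/shape values
            return [p_out]
        values, start = [], 0
        for child, n in zip(self.children, self.split):
            values += child.synthesise(p_out[start:start + n])
            start += n
        return values

# toy two-level example: a "face" node feeding two "facet" nodes
rng = np.random.default_rng(1)
facet_a = LinearModel(rng.standard_normal(8), rng.standard_normal((8, 3)))
facet_b = LinearModel(rng.standard_normal(8), rng.standard_normal((8, 2)))
face = LinearModel(rng.standard_normal(5), rng.standard_normal((5, 2)),
                   children=[facet_a, facet_b], split=[3, 2])

pixel_blocks = face.synthesise(np.array([0.3, -0.1]))   # global parameters in
print([block.shape for block in pixel_blocks])
```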
- each of the training images stored in the image database 32 is labelled with eighty six landmark points. In this embodiment, this is performed manually by the user via the user interface 33.
- each training image is displayed on the display 11 and the user places the landmark points over the head in the training image. These points delineate the main features in the head, such as the position of the hairline, neck, eyes, nose, ears and mouth.
- each landmark point is associated with the same point on each face. In this embodiment, the following landmark points are used:
- the result of the manual placement of the landmark points is a table of landmark points for each training image, which identifies the (x, y) coordinate of each landmark point within the image. As shown in Figure 6, these landmark points are also used to define the location of predetermined triangular facets or areas within the training image.
- Figure 7 shows a flow chart illustrating the main processing steps involved in this embodiment in determining a facet appearance model for facet (i).
- the system determines, for each training image, the apex coordinates of facet (i) and texture values from within facet (i).
- a transformation which transforms the facet onto a reference facet is determined.
- Figure 8 illustrates this transformation.
- Figure 8 shows facet f_i^v taken from the V-th training image, which is defined by the landmark points (x_1^v, y_1^v), (x_2^v, y_2^v) and (x_3^v, y_3^v).
- the transformation (T_i^v) which transforms those coordinates onto coordinates (0,0), (1,0) and (0,1) is determined.
- the texture information extracted from each training facet is defined by the regular array of pixels shown in the reference facet.
- the inverse transformation ([T_i^v]^-1) is used to transform the pixel locations in the reference facet into corresponding locations in the training facet, from which the RGB pixel values are determined.
- this transformation may not result in an exact correspondence with a single image pixel location since the pixel resolution in the actual facet may be different to the resolution in the reference facet.
- the texture information (RGB pixel values) which is determined is obtained by interpolating between the surrounding image RGB pixel values.
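- a minimal sketch of this facet-to-reference mapping and the inverse warp used to sample texture is given below, assuming for simplicity a single-channel image array; the regular grid of sample points inside the reference facet is an arbitrary choice for illustration.

```python
import numpy as np

def affine_to_reference(p1, p2, p3):
    """Affine map taking facet corners p1, p2, p3 to (0,0), (1,0) and (0,1)."""
    M = np.column_stack([np.subtract(p2, p1), np.subtract(p3, p1)])
    A = np.linalg.inv(M)                   # x_ref = A @ x_img + b
    return A, -A @ np.asarray(p1, float)

def sample_facet_texture(image, p1, p2, p3, n=10):
    """Sample texture at a regular grid of reference-facet points by applying the
    inverse transform and bilinearly interpolating the surrounding image pixels."""
    A, b = affine_to_reference(p1, p2, p3)
    M_inv = np.linalg.inv(A)
    values = []
    for i in range(n):
        for j in range(n - i):             # points inside the unit reference triangle
            ref = np.array([(i + 0.5) / n, (j + 0.5) / n])
            x, y = M_inv @ (ref - b)       # inverse transform into the training image
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            dx, dy = x - x0, y - y0        # bilinear interpolation weights
            v = ((1 - dx) * (1 - dy) * image[y0, x0] + dx * (1 - dy) * image[y0, x0 + 1]
                 + (1 - dx) * dy * image[y0 + 1, x0] + dx * dy * image[y0 + 1, x0 + 1])
            values.append(v)
    return np.array(values)

img = np.arange(100.0).reshape(10, 10)
print(sample_facet_texture(img, (1, 1), (8, 2), (2, 8), n=4).shape)
```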
- the texture information for facet (i) from the V-th training image can then be represented by a vector (t_i^v) of the form: t_i^v = [t_i1^v, t_i2^v, t_i3^v, ... ]^T
- where t_i1^v is the RGB texture information for the first reference pixel extracted from facet (i) in the V-th training image, etc.
- the facet appearance models 77 treat shape and texture separately. Therefore, in step s63, the system performs a principal component analysis (PCA) on the set of texture training vectors generated in step s61.
- the reader is referred to the book by W. J. Krzanowski entitled “Principles of Multivariate Analysis - A User's Perspective” 1998, Oxford Statistical Science Series.
- this principal component analysis determines all possible modes of variation within the training texture vectors. However, since each of the facets is associated with a similar point on the face, most of the variation within the data can be explained by a few modes of variation.
- the result of the principal component analysis is a facet texture appearance model (defined by matrix F_i) which relates a vector of facet texture parameters to a vector of texture pixel values by: p_Fi^v = F_i (t_i^v - f̄_i) (3)
- where t_i^v is the RGB texture vector defined above;
- f̄_i is the mean RGB texture vector for facet (i);
- F_i is a matrix which defines the facet texture appearance model for facet (i); and
- p_Fi^v is a vector of the facet texture parameters which describes the RGB texture vector t_i^v.
- the matrix F_i describes the main modes of variation of the texture within the training facets; and the vector of facet texture parameters (p_Fi^v) for a given input facet has a parameter associated with each mode of variation whose value relates the texture of the input facet to the corresponding mode of variation.
- in step s65 the system determines how many texture parameters are needed for the current facet and stores the appropriate facet appearance model matrix.
- equation (3) can be solved with respect to the texture vector t_i^v to give: t_i^v = f̄_i + F_i^T p_Fi^v
- a facet texture appearance model will have been generated for each of those facets.
- the facet appearance model does not compress the parameters defining the shape of the facets, since only six parameters are needed to define the shape of each facet - two parameters for each (x,y) coordinate of the facet's apexes.
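- a sketch of how a facet texture model of this kind could be built with a principal component analysis is given below, assuming the texture vectors have already been extracted; the choice of keeping 95% of the variance when deciding how many texture parameters to retain is an assumption for illustration.

```python
import numpy as np

def build_facet_texture_model(T, variance_kept=0.95):
    """T: (num_training_images, texture_length) array of texture vectors t_i^v.
    Returns the mean texture vector and the model matrix F_i (one mode per row)."""
    mean = T.mean(axis=0)
    U, s, Vt = np.linalg.svd(T - mean, full_matrices=False)   # PCA via SVD
    var = s ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), variance_kept)) + 1
    return mean, Vt[:k]                    # keep the k strongest modes of variation

def texture_to_params(t, mean, F_i):       # analysis, as in equation (3)
    return F_i @ (t - mean)

def params_to_texture(p, mean, F_i):       # synthesis, the solved form of equation (3)
    return mean + F_i.T @ p

rng = np.random.default_rng(2)
T = rng.standard_normal((30, 150))         # e.g. 30 training images, 50 RGB samples
mean, F_i = build_facet_texture_model(T)
p = texture_to_params(T[0], mean, F_i)
print(F_i.shape, float(np.abs(params_to_texture(p, mean, F_i) - T[0]).max()))
```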
- Figure 9 shows a flow chart illustrating the main processing steps required in order to generate the mouth appearance model 63.
- the system uses the facet appearance models for the facets which form part of the mouth to generate shape and texture parameters from those facets for each training image. Therefore, referring to Figure 10, the mouth appearance model 63 will receive texture and shape parameters from the facet appearance model for facet (i), facet (j) and facet (n) for the corresponding facets in each of the training images 79.
- the appearance model for facet (i) is operable to generate, for each training image, six shape parameters (corresponding to the three (x,y) coordinates of the apexes of facet (i)) and six texture parameters.
- the appearance model for facet (j) is operable to generate, for each training image, six shape parameters and four texture parameters
- the appearance model for facet (n) is operable to generate, for each training image, six shape parameters and three texture parameters.
- in step s69 the system performs a principal component analysis on the shape and texture parameters generated for the training images by the facet appearance models associated with the mouth.
- the mouth appearance model 63 treats the shape and texture separately.
- the system concatenates the six shape parameters for the facets associated with the mouth to form the following shape vector:
- x_FMs^v = [x_1^fi, y_1^fi, x_2^fi, y_2^fi, x_3^fi, y_3^fi, x_1^fj, y_1^fj, x_2^fj, y_2^fj, ... ]^T, and concatenates the facet texture parameters output by the facet appearance models associated with the mouth to form the following texture vector: p_FMt^v = [p_Fi^v ; p_Fj^v ; ... ; p_Fn^v]
- the system then performs a principal component analysis on the shape vectors generated by all the training images to generate a shape appearance model for the mouth (defined by matrix M_s) which relates each mouth shape vector to a corresponding vector of mouth shape parameters by:
- p_Ms^v = M_s (x_FMs^v - x̄_FMs) (5)
- where x_FMs^v is the mouth shape vector for the mouth in the V-th training image;
- x̄_FMs is the mean mouth shape vector from the training vectors; and
- p_Ms^v is a vector of mouth shape parameters for the mouth shape vector x_FMs^v. The mouth shape model, defined by matrix M_s, describes the main modes of variation of the shape of the mouths within the training images.
- the vector of mouth shape parameters (p_Ms^v) for the mouth in the V-th training image has a parameter associated with each mode of variation whose value relates the shape of the input mouth to the corresponding mode of variation.
- equation (5) above can be rewritten with respect to the mouth shape vector x_FMs^v to give: x_FMs^v = x̄_FMs + M_s^T p_Ms^v
- the system then performs a principal component analysis on the mouth texture parameter vectors (p_FMt^v) which are generated for the training images.
- This principal component analysis generates a mouth texture model (defined by matrix M_t) which relates each of the facet texture parameter vectors for the facets associated with the mouth to a corresponding vector of mouth texture parameters, by: p_Mt^v = M_t (p_FMt^v - p̄_FMt)
- where p_FMt^v is the vector of mouth facet texture parameters generated by the facet appearance models associated with the mouth for the mouth in the V-th training image;
- p̄_FMt is the mean vector of mouth facet texture parameters from the training vectors; and p_Mt^v is a vector of mouth texture parameters for the facet texture parameter vector p_FMt^v.
- the matrix M t describes the main modes of variation within the training images of the facet texture parameters generated by the facet appearance models which are associated with the mouth
- the vector of mouth texture parameters (p_Mt^v) has a parameter associated with each of those modes of variation whose value relates the texture of the input mouth to the corresponding mode of variation.
- in step s71 shown in Figure 9 the system determines the number of shape parameters and texture parameters needed to describe the training data received from the facet appearance models which are associated with the mouth.
- the mouth appearance model 63 requires five shape parameters and four texture parameters to be able to model most of this variation. The system therefore stores the appropriate mouth shape and texture appearance model matrices for subsequent use.
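- a sketch of how the mouth shape and texture models could be built from the concatenated facet parameters is given below; the helper pca_model and the 95% variance criterion are illustrative assumptions, and the parameter counts in the toy example simply mirror the facets (i), (j) and (n) discussed above.

```python
import numpy as np

def pca_model(X, variance_kept=0.95):
    """Return (mean, modes-as-rows) for a matrix X whose rows are observations."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s ** 2) / np.sum(s ** 2), variance_kept)) + 1
    return mean, Vt[:k]

def build_mouth_model(shape_vectors, texture_param_vectors):
    """shape_vectors: one concatenated apex-coordinate vector x_FMs per training image;
    texture_param_vectors: one concatenated facet texture parameter vector p_FMt
    per training image."""
    shape_mean, M_s = pca_model(np.vstack(shape_vectors))        # equation (5)
    tex_mean, M_t = pca_model(np.vstack(texture_param_vectors))  # mouth texture model
    return (shape_mean, M_s), (tex_mean, M_t)

rng = np.random.default_rng(3)
shapes = [rng.standard_normal(3 * 6) for _ in range(30)]    # 3 mouth facets, 6 values each
textures = [rng.standard_normal(6 + 4 + 3) for _ in range(30)]   # facets (i), (j), (n)
(shape_mean, M_s), (tex_mean, M_t) = build_mouth_model(shapes, textures)
print(M_s.shape, M_t.shape)
```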
- the resulting hierarchical appearance model allows a small number of global face appearance parameters to be input to the face appearance model 61, which generates further parameters which propagate down through the hierarchical model structure until facet pixel values are generated, from which an image which corresponds to the global appearance parameters can be generated.
- appearance parameters for an image were generated from a manual placement of a number of landmark points over the image.
- the appearance parameters for the heads in the input images were automatically calculated. This task involves finding the set of global appearance parameters p which best describe the pixels in view. This problem is complicated because the inverse of each of the appearance models in the hierarchical appearance model is not necessarily one-to-one.
- the appearance parameters for the head in an input image are calculated in a two-step process.
- an initial set of global appearance parameters for the head in the current frame (I_s^i) is found using a simple and rapid technique. For all but the first frame of the source video sequence, this is achieved by simply using the appearance parameters from the preceding video frame (I_s^(i-1)) before modification in step s3 (i.e. parameters p_s^(i-1)).
- the global appearance parameters (p) effectively define the shape and colour texture of the head.
- the initial estimate of the appearance parameters is set to the mean set of appearance parameters and the scale, position and orientation is initially estimated by the user manually placing the mean head over the head in the image.
- an iterative technique is used in order to make fine adjustments to the initial estimate of the appearance parameters.
- the adjustments are made in an attempt to minimise the difference between the head described by the global appearance parameters (the model head) and the head in the current video frame (the image head).
- this represents a difficult optimisation problem.
- This can be performed by using a standard steepest descent optimisation technique to iteratively reduce the mean squared error between the given image pixels and those predicted by a particular set of appearance parameter values.
- this is done by minimising the following error function E(p): E(p) = |I_a - F(p)|^2 (8)
- where I_a is a vector of actual image RGB pixel values at the locations where the appearance model predicts values (the appearance model does not predict all pixel values since it ignores background pixels and only predicts a subsample of pixel values within the object being modelled) and F(p) is the vector of image RGB pixel values predicted by the hierarchical appearance model.
- E(p) will only be zero when the model head (i.e. F(p)) predicts the actual image head (I_a) exactly.
- Standard steepest descent optimisation techniques stipulate that a step in the direction -∇E(p) should result in a reduction in the error function E(p), provided the error function is well behaved. Therefore, the change (δp) in the set of parameter values should be proportional to -∇E(p).
- an Active matrix is determined and used for each of the individual appearance models which form part of the hierarchical appearance model.
- the way in which these Active matrices are determined in this embodiment will now be described with reference to Figures 11a and 11b, which illustrate the processing steps performed to generate the Active matrix for each facet appearance model and the Active matrix for the mouth appearance model.
- in step s73 the system chooses a random facet parameter vector (p_Fi) for the current facet (i) and then, in step s75, perturbs this facet parameter vector by a small random amount to create p_Fi + δp_Fi.
- the facet parameter vectors include not only the texture parameters, but also the six shape parameters which define the (x,y) coordinates of the facet's location within the image.
- in step s77 the system uses the parameter vector p_Fi and the perturbed parameter vector p_Fi + δp_Fi to create model images I_0^Fi and I_1^Fi respectively.
- in step s79 the system records the parameter change δp_Fi and the image difference I_0^Fi - I_1^Fi.
- in step s81 the system determines whether or not there is sufficient training data for the current facet. If there is not, then the processing returns to step s73. Once sufficient training data has been generated, the processing proceeds to step s83 where the system performs multiple multivariate linear regressions on the data for the current facet to identify an Active matrix (A_Fi) for the current facet.
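- a sketch of this perturbation-and-regression procedure is given below; the callable render_facet is a hypothetical stand-in for the facet appearance model, and the least-squares solve plays the role of the multiple multivariate linear regression.

```python
import numpy as np

def train_active_matrix(render_facet, n_params, n_samples=500, scale=0.1, seed=0):
    """Learn a matrix A such that dp ~ A @ (I0 - I1), from random perturbations of
    facet parameter vectors (steps s73 to s83 of Figure 11a)."""
    rng = np.random.default_rng(seed)
    d_params, d_images = [], []
    for _ in range(n_samples):
        p = rng.standard_normal(n_params)            # step s73: random parameter vector
        dp = scale * rng.standard_normal(n_params)   # step s75: small random perturbation
        I0, I1 = render_facet(p), render_facet(p + dp)   # step s77: model images
        d_params.append(dp)                          # step s79: record the training pair
        d_images.append(I0 - I1)
    D_img = np.vstack(d_images)                      # (n_samples, n_pixels)
    D_par = np.vstack(d_params)                      # (n_samples, n_params)
    # step s83: regression from image differences to parameter changes
    X, *_ = np.linalg.lstsq(D_img, D_par, rcond=None)    # D_img @ X ~ D_par
    return X.T                                       # so that dp ~ A @ (I0 - I1)

# toy linear "facet renderer" used only to exercise the code
R = np.random.default_rng(1).standard_normal((60, 8))
A_Fi = train_active_matrix(lambda p: R @ p, n_params=8)
print(A_Fi.shape)                                    # (8, 60)
```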
- Figure 11b shows the processing steps required to calculate the Active matrix for the mouth appearance model.
- in step s85 the system chooses a random mouth parameter vector p_M.
- this vector includes both the mouth shape parameters and the mouth texture parameters.
- in step s87 the system perturbs this mouth parameter vector by a small random amount to create p_M + δp_M.
- the processing then proceeds to step s89 where the system uses the mouth parameter vector p_M and the perturbed mouth parameter vector p_M + δp_M to create model images I_0^M and I_1^M respectively, using the mouth appearance model and the facet appearance models associated with the mouth.
- in step s91 the facet appearance models associated with the mouth are used again to transform the mouth model images I_0^M and I_1^M into corresponding facet appearance parameters p_0^FM and p_1^FM, which are then subtracted to determine the corresponding change δp_FM in the mouth facet parameters.
- in step s93 the system records the mouth parameter change δp_M and the mouth facet parameter change δp_FM.
- in step s95 the system determines whether or not there is sufficient training data. If there is not, then the processing returns to step s85.
- in step s97 the system performs multiple multivariate linear regressions on the training data for the mouth to identify the Active matrix (A_M) for the mouth which relates changes in mouth parameters δp_M to changes in facet parameters δp_FM for the facets associated with the mouth.
- in step s101 the system initially estimates a set of global parameters for the head in the current source video frame.
- in step s103 the system generates a model image from the estimated global parameters and the hierarchical appearance model.
- in step s105 it determines the image error between the model image and the current source video frame.
- in step s107 the system uses this image error to propagate parameter changes up the hierarchy of the hierarchical appearance model using the stored Active matrices to determine a change in the global parameters.
- This change in global parameters is then used, in step s109, to update the current global parameters for the current source video frame.
- the system determines, in step s111, whether or not convergence has been reached by comparing the error obtained from equation (8) using the updated global parameters with a predetermined threshold (Th). If convergence has not been reached, then the processing returns to step s103. Once convergence is reached, the processing proceeds to step s113, where the current global appearance parameters are output as the global appearance parameters for the current source video frame and then the processing ends.
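- a sketch of this iterative search is given below; the callables render and active_update are hypothetical stand-ins for the hierarchical appearance model of equation (1) and for the propagation of the image error up the hierarchy via the stored Active matrices, and the toy example at the end simply checks that the loop converges for a linear model.

```python
import numpy as np

def fit_global_parameters(frame_pixels, render, active_update, p_init,
                          threshold=1.0, max_iters=50):
    """Iterative parameter search of Figure 12."""
    p = np.array(p_init, dtype=float)                 # step s101: initial estimate
    for _ in range(max_iters):
        model_pixels = render(p)                      # step s103: model image
        error_vec = frame_pixels - model_pixels       # step s105: image error
        dp = active_update(error_vec)                 # step s107: change via Active matrices
        p = p + dp                                    # step s109: update the parameters
        E = float(np.sum((frame_pixels - render(p)) ** 2))   # equation (8)
        if E < threshold:                             # step s111: convergence test
            break
    return p                                          # step s113: output the parameters

rng = np.random.default_rng(4)
R = rng.standard_normal((200, 10))                    # toy linear "renderer"
p_true = rng.standard_normal(10)
A = np.linalg.pinv(R)                                 # ideal update matrix for this toy case
p_hat = fit_global_parameters(R @ p_true, lambda p: R @ p, lambda e: A @ e,
                              np.zeros(10), threshold=1e-6)
print(np.allclose(p_hat, p_true, atol=1e-3))
```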
- each of the appearance models within the hierarchical model may model the combined variation of the shape and texture within the training images.
- a facet appearance model was generated for each facet defined within the training images.
- many of the facets may be grouped together such that a single facet appearance model is generated for those facets.
- a single facet appearance model may be determined which models the variability of texture within each facet of the training images .
- the same amount of texture information was extracted from each facet within the training images.
- fifty RGB texture values were extracted from each training facet.
- the amount of texture information extracted from each facet may vary in dependence upon the size of the facet. For example, more texture information may be extracted from larger facets or more texture information may be extracted from facets associated with important features of the face, such as the mouth, eyes or nose.
- each appearance model was determined from a principal component analysis of a set of training data.
- This principal component analysis determines a linear relationship between the training data and a set of model parameters.
- techniques other than principal component analysis can be used to determine a parametric model which relates a set of parameters to the training data.
- This model may define a non-linear relationship between the training data and the model parameters.
- one or more of the models within the hierarchy may comprise a neural network which relates the set of input parameters to the training data.
- a principal component analysis was performed on a set of training data in order to identify a relatively small number of parameters which describe the main modes of variation within the training data. This allows a relatively small number of input parameters to be able to generate a larger set of output parameters from the model.
- One or more of the appearance models may act as transformation models in which the number of input parameters is the same as or greater than the number of output parameters. This can be used to generate a set of input parameters which can be changed by the user in some intuitive way, for example in order to identify parameters which have a linear relationship with features in the object, such as a parameter that linearly changes the amount of smile within a face image.
- a set of Active matrices were used in order to identify automatically a set of appearance parameters for an input image.
- a global Active matrix may be used instead.
- suitable Active matrices can be determined using just the shape information.
- the target image illustrated a computer generated head.
- the target image might be a hand-drawn head or an image of a real person.
- Figures 13d and 13e illustrate how an embodiment with a hand-drawn character might be used in character animation.
- Figure 13d shows a hand-drawn sketch of a character which, when combined with the images from the source video sequence (some of which are shown in Figure 13a) generate a target video sequence, some frames of which are shown in Figure 13e.
- the hand-drawn sketch has been animated automatically using this technique.
- the appearance model was used to model the variations in facial expressions and 3D pose of human heads.
- the appearance model can be used to model the appearance of any deformable object such as parts of the body and other animals and objects.
- the above techniques can be used to track the movement of lips in a video sequence. Such an embodiment could be used in film dubbing applications in order to synchronise the lip movements with the dubbed sound.
- This animation technique might also be used to give animals and other objects human-like characteristics by combining images of them with a video sequence of an actor. This technique can also be used for monitoring the shape and appearance of objects passing along a production line for quality control purposes.
- the appearance model was generated by using a principal component analysis of shape and texture data which is extracted from the training images.
- other modelling techniques such as vector quantisation and wavelet techniques can be used.
- the training images used to generate the appearance model were all colour images in which each pixel had an RGB value.
- the way in which the colour is represented in this embodiment is not important.
- each pixel having a red, green and blue value they might be represented by a chrominance and a luminance component or by hue, saturation and value components.
- the training images may be black and white images, in which case only grey level data would be extracted from the facets in the training images. Additionally, the resolution of each training image may be different.
- the difference parameters were determined by comparing the image of the first actor from one of the frames of the source video sequence with the image of the second actor in the target image.
- a separate image of the first actor may be provided which does not form part of the source video sequence.
- each of the appearance models modelled variations in two-dimensional images.
- the above modelling technique could be adapted to work with 3D images and animations.
- the training images used to generate the appearance model would normally include 3D images instead of 2D images.
- the three-dimensional models may be obtained using a three-dimensional scanner, which typically works either by using laser range-finding over the object or by using one or more stereo pairs of cameras.
- once a 3D hierarchical appearance model has been created from the training models, new 3D models can be generated by adjusting the appearance parameters and existing 3D models can be animated using the same differencing technique that was used in the two-dimensional embodiment described above.
- This 3D model can then be used to track 3D objects directly within a 3D animation.
- a 2D model may be used to track the 3D object within a video sequence and then use the result to generate 3D data for the tracked object.
- a set of difference parameters were identified which describe the main differences between the head in the video sequence and the head in the target image, which difference parameters were used to modify the video sequence so as to generate a target video sequence showing the second head.
- the set of difference parameters were added to a set of appearance parameters for the current frame being processed.
- the difference parameters may be weighted so that, for example, the target video sequence shows a head having characteristics from both the first and second actors.
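- a one-line illustration of this weighting is given below; the weight w is a hypothetical blending factor, with w = 0 reproducing the first actor and w = 1 applying the full difference towards the second actor.

```python
import numpy as np

def blend_difference(p_source_frame, p_dif, w=0.5):
    """Weighted application of the difference parameters to one frame's parameters."""
    return np.asarray(p_source_frame) + w * np.asarray(p_dif)

print(blend_difference(np.zeros(3), np.array([1.0, -2.0, 0.5]), w=0.25))
```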
- a hierarchical appearance model is used to model the appearance of human faces. The model is then used to modify a source video sequence showing a first actor performing a scene to generate a target video sequence showing a second actor performing the same scene.
- the hierarchical model presented above can be used in various other applications.
- the hierarchical appearance model can be used for synthetic two-dimensional or three-dimensional character generation; video compression when the video is substantially that of an object which is modelled by the appearance model; object recognition for security purposes; face tracking for human performance analysis or human computer interaction and the like; 3D model generation from two-dimensional images; and image editing (for example making people look older or younger, fatter or thinner, etc.).
- an iterative process was used to update an estimated set of appearance parameters for an input image. This iterative process continued until an error between the actual image and the image predicted by the model was below a predetermined threshold. In an alternative embodiment, where there is only a predetermined amount of time available for determining a set of appearance parameters for an input image, this iterative routine may be performed for a predetermined period of time or for a predetermined number of iterations .
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00976195A EP1272979A1 (en) | 1999-11-18 | 2000-11-20 | Image processing system |
AU14070/01A AU1407001A (en) | 1999-11-18 | 2000-11-20 | Image processing system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9927314.6 | 1999-11-18 | ||
GB9927314A GB2359971A (en) | 1999-11-18 | 1999-11-18 | Image processing system using hierarchical set of functions |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2001037222A1 true WO2001037222A1 (en) | 2001-05-25 |
WO2001037222A9 WO2001037222A9 (en) | 2001-08-09 |
Family
ID=10864763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB2000/004411 WO2001037222A1 (en) | 1999-11-18 | 2000-11-20 | Image processing system |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1272979A1 (en) |
AU (1) | AU1407001A (en) |
GB (1) | GB2359971A (en) |
WO (1) | WO2001037222A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004036500A3 (en) * | 2002-10-16 | 2004-09-16 | Koninkl Philips Electronics Nv | Hierarchical image segmentation |
GB2402311A (en) * | 2003-05-27 | 2004-12-01 | Canon Kk | Facial recognition using synthetic images |
EP1330128A3 (en) * | 2001-12-03 | 2006-02-08 | Microsoft Corporation | Automatic detection and tracking of multiple individuals' faces using multiple cues |
US7362886B2 (en) | 2003-06-05 | 2008-04-22 | Canon Kabushiki Kaisha | Age-based face recognition |
US20100135530A1 (en) * | 2008-12-03 | 2010-06-03 | Industrial Technology Research Institute | Methods and systems for creating a hierarchical appearance model |
WO2014043755A1 (en) * | 2012-09-19 | 2014-03-27 | Commonwealth Scientific And Industrial Research Organisation | System and method of generating a non-rigid model |
-
1999
- 1999-11-18 GB GB9927314A patent/GB2359971A/en not_active Withdrawn
-
2000
- 2000-11-20 AU AU14070/01A patent/AU1407001A/en not_active Abandoned
- 2000-11-20 WO PCT/GB2000/004411 patent/WO2001037222A1/en not_active Application Discontinuation
- 2000-11-20 EP EP00976195A patent/EP1272979A1/en not_active Withdrawn
Non-Patent Citations (2)
Title |
---|
COOTES T F ET AL: "ACTIVE SHAPE MODELS - THEIR TRAINING AND APPLICATION", COMPUTER VISION AND IMAGE UNDERSTANDING, ACADEMIC PRESS, US, vol. 61, no. 1, January 1995 (1995-01-01), pages 38 - 59, XP000978654, ISSN: 1077-3142 * |
EDWARDS G J ET AL: "ADVANCES IN ACTIVE APPEARANCE MODELS", KERKYRA, GREECE, SEPT. 20 - 27, 1999, LOS ALAMITOS, CA: IEEE COMP. PRESS, US, vol. CONF. 7, 1999, pages 137 - 142, XP000980072, ISBN: 0-7695-0165-6 * |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7428315B2 (en) | 2001-12-03 | 2008-09-23 | Microsoft Corporation | Automatic detection and tracking of multiple individuals using multiple cues |
US7433495B2 (en) | 2001-12-03 | 2008-10-07 | Microsoft Corporation | Automatic detection and tracking of multiple individuals using multiple cues |
EP1330128A3 (en) * | 2001-12-03 | 2006-02-08 | Microsoft Corporation | Automatic detection and tracking of multiple individuals' faces using multiple cues |
EP1838104A3 (en) * | 2001-12-03 | 2009-09-30 | Microsoft Corporation | Automatic detection and tracking of multiple individuals' faces using multiple cues |
US7151843B2 (en) | 2001-12-03 | 2006-12-19 | Microsoft Corporation | Automatic detection and tracking of multiple individuals using multiple cues |
US7171025B2 (en) | 2001-12-03 | 2007-01-30 | Microsoft Corporation | Automatic detection and tracking of multiple individuals using multiple cues |
EP1838104A2 (en) | 2001-12-03 | 2007-09-26 | Microsoft Corporation | Automatic detection and tracking of multiple individuals' faces using multiple cues |
WO2004036500A3 (en) * | 2002-10-16 | 2004-09-16 | Koninkl Philips Electronics Nv | Hierarchical image segmentation |
GB2402311A (en) * | 2003-05-27 | 2004-12-01 | Canon Kk | Facial recognition using synthetic images |
GB2402311B (en) * | 2003-05-27 | 2006-03-08 | Canon Kk | Image processing |
US7362886B2 (en) | 2003-06-05 | 2008-04-22 | Canon Kabushiki Kaisha | Age-based face recognition |
US20100135530A1 (en) * | 2008-12-03 | 2010-06-03 | Industrial Technology Research Institute | Methods and systems for creating a hierarchical appearance model |
EP2194487A1 (en) * | 2008-12-03 | 2010-06-09 | Industrial Technology Research Institute | Methods and systems for creating a hierarchical appearance model |
US8422781B2 (en) | 2008-12-03 | 2013-04-16 | Industrial Technology Research Institute | Methods and systems for creating a hierarchical appearance model |
WO2014043755A1 (en) * | 2012-09-19 | 2014-03-27 | Commonwealth Scientific And Industrial Research Organisation | System and method of generating a non-rigid model |
AU2013317700B2 (en) * | 2012-09-19 | 2019-01-17 | Commonwealth Scientific And Industrial Research Organisation | System and method of generating a non-rigid model |
TWI666592B (en) * | 2012-09-19 | 2019-07-21 | 澳洲聯邦科學暨工業研究組織 | System and method of generating a non-rigid model |
Also Published As
Publication number | Publication date |
---|---|
EP1272979A1 (en) | 2003-01-08 |
GB9927314D0 (en) | 2000-01-12 |
WO2001037222A9 (en) | 2001-08-09 |
AU1407001A (en) | 2001-05-30 |
GB2359971A (en) | 2001-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Blanz et al. | A morphable model for the synthesis of 3D faces | |
US6556196B1 (en) | Method and apparatus for the processing of images | |
US5745668A (en) | Example-based image analysis and synthesis using pixelwise correspondence | |
US5774129A (en) | Image analysis and synthesis networks using shape and texture information | |
KR100571115B1 (en) | System and method using a data driven model for monocular face tracking | |
Pighin et al. | Resynthesizing facial animation through 3d model-based tracking | |
Beymer et al. | Example based image analysis and synthesis | |
Hwang et al. | Reconstruction of partially damaged face images based on a morphable face model | |
Pighin et al. | Modeling and animating realistic faces from images | |
US5844573A (en) | Image compression by pointwise prototype correspondence using shape and texture information | |
KR20230097157A (en) | Method and system for personalized 3D head model transformation | |
US20220292772A1 (en) | Methods and systems for constructing facial position map | |
JP2024506170A (en) | Methods, electronic devices, and programs for forming personalized 3D head and face models | |
KR20230085931A (en) | Method and system for extracting color from face images | |
CN113628327A (en) | Head three-dimensional reconstruction method and equipment | |
Kang et al. | Appearance-based structure from motion using linear classes of 3-d models | |
EP1116189A1 (en) | Graphics and image processing system | |
WO2001037222A1 (en) | Image processing system | |
US20030146918A1 (en) | Appearance modelling | |
GB2360183A (en) | Image processing using parametric models | |
Wong | Artistic rendering of portrait photographs | |
Sarris et al. | Building three dimensional head models | |
Kang | A structure from motion approach using constrained deformable models and appearance prediction | |
US6356669B1 (en) | Example-based image synthesis suitable for articulated figures | |
Goldenstein et al. | Adaptive deformable models for graphics and vision |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
AK | Designated states |
Kind code of ref document: C2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: C2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
COP | Corrected version of pamphlet |
Free format text: PAGE 10/14, DRAWINGS, REPLACED BY CORRECT PAGE 10/14 |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2000976195 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 2000976195 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2000976195 Country of ref document: EP |