
WO2003049039A1 - Performance-driven facial animation techniques - Google Patents

Performance-driven facial animation techniques

Info

Publication number
WO2003049039A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial
sequence
image
image sequence
ordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2002/005418
Other languages
English (en)
Inventor
Glyn Cowe
Alan Johnston
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University College London
Original Assignee
University College London
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University College London filed Critical University College London
Priority to AU2002349141A priority Critical patent/AU2002349141A1/en
Publication of WO2003049039A1 publication Critical patent/WO2003049039A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings

Definitions

  • The invention relates to performance-driven facial animation techniques and to films or other media products generated by the operation of such techniques.
  • Performance-driven facial animation has previously been approached with a variety of tracking techniques and deformable facial models.
  • Parke introduced the first computer-generated facial model (Parke 1972; Parke 1974).
  • A polygonal mesh was painted onto a face, the face was then photographed from two angles, and the 3D location of each vertex was calculated by measurement and geometry. His choice of a polygonal mesh has remained the most popular approach, and the model was animated simply by interpolating between key-frames.
  • Sophisticated underlying muscle models, based on those of Platt and Badler (Platt and Badler 1981), have furthermore been incorporated to enable more anatomically realistic movements (Waters 1987; Terzopoulos and Waters 1993), and alternative tracking strategies have been employed, such as active contour models ('snakes') (Terzopoulos and Waters 1993), deformable templates (Yuille 1991) and simpler feature trackers (for cartoon animation) (Buck, Finkelstein et al. 2000). Essa and Pentland (Essa and Pentland 1997) and Black and Yacoob (Black and Yacoob 1995) sought denser motion information by tracking facial motion with optical flow for recognition of expression.
  • Tiddeman and Perrett demonstrated a technique, based on prototyping, which allows them to transform existing facial image sequences in dimensions such as age, race and sex (Tiddeman and Perrett 2001). Prototypes are generated by averaging shape and texture information from a set of similar images (of the same race, for example). Each frame from the sequence can then be transformed towards this prototype. A set of points (179 of them) must be located in each image and, although this can be automated using active shape models (Cootes, Taylor et al. 1995), a set of examples must first be delineated, so manual intervention cannot be avoided. Although this is not technically performance-driven facial animation, a new face is generated that is driven by the original motion.
  • Video Rewrite, a system developed by Bregler et al., automatically lip-synchs existing footage to a new audio track (Bregler, Covell et al. 1997).
  • The video sequence remains the same, except for the mouth. The mouth is driven only by the other actor's voice; their facial expressions are ignored. They track the lips using eigenpoints (Covell and Bregler 1996) and employ hidden Markov models to learn the deformations associated with phonemes from the original audio track. Mouth shapes are predicted for each frame from the new audio track, and these are incorporated into the existing sequence by warping and blending.
  • An extension of this approach, driven by a text-to-speech synthesiser, is described by Cosatto and Graf (Cosatto and Graf 2000). Ezzat and Poggio also describe similar work (Ezzat 1998; Ezzat 2000).
  • The invention is preferably developed around the application of principal components analysis (PCA) to vectors representing faces; we now proceed to discuss previous work applying PCA in this area.
  • Principal components analysis is a mathematical technique that extracts uncorrelated vectors from a set of correlated vectors, in order of the variance they account for within the set. Early components thus provide strong descriptors of change within the set and later vectors have less relevance.
  • Sirovich and Kirby were the first to apply PCA to vectorised images of faces, principally as a means of data compression (Sirovich and Kirby 1987). Face images were turned into vectors by concatenating rows of pixel-wise grey-level intensity values and transposing. They demonstrated how the weighted sum of just a small number of principal components can be used to reconstruct recognisable faces, requiring only the storage of the weights.
  • The principal components extracted from sets of facial images in this way are often termed eigenfaces and have been applied successfully since, particularly for facial recognition (Turk and Pentland 1991; Pentland, Moghaddam et al. 1994).
  • A problem with the application of PCA to the intensity values of images is blurring, since linearly combining images results in deterioration of sharp edges.
  • Shape and texture information can thus be separated for an improved vectorisation.
  • Beymer, and Vetter and Troje, presented such improved vectorisations, using optic flow to find pixel-to-pixel correspondences between images (Beymer 1995; Vetter and Troje 1995).
  • Flow fields were extracted from each face to a chosen reference face; these could be averaged to define the mean shape and, for each face, shape could be encoded as the flow-field deviation from this mean.
  • Shape was then removed by warping the faces onto the average shape, leaving only texture.
  • PCA has often been used to find axes of variation between people; variations within a person have been considered less often.
  • PCA has been applied to dot tracking data from facial sequences.
  • Arslan et al. used it simply for dimensionality reduction in building codebooks relating acoustic data and phonemes to three-dimensional positions of dots for speech-driven facial animation (Arslan and Talkin 1998).
  • Kshirsagar et al. used PCA on these vectors of dot positions and mapped a configuration associated with each phoneme into the principal component space (Kshirsagar, Molet et al. 2001).
  • Kuratate et al. captured laser scans of a face in eight different poses and used PCA to reduce the dimensionality of the data (Kuratate, Yehia et al. 1998). By relating the positions of a small number of points on the meshes to their principal component scores via a linear estimator, they were able to drive the 3D mesh by tracking points positioned analogously on an actor.
  • The invention takes a new approach, applying mathematical analysis techniques to the information available from a real facial image sequence in order to enable that information to be used to drive the movements of another face appropriately co-ordinately aligned with the original.
  • Each example can be vectorised in a chosen manner.
  • The resultant virtual avatar can be controlled by projecting novel deformations into this co-ordinate frame if the new sequence of movements is appropriately aligned with the original in position and scale (although this alignment need not be precise).
  • The invention envisages an essentially space-based performance-driven facial animation technique in which frames from a pre-existing real facial image sequence are analysed to generate a co-ordinate frame characterising an individual's permissible facial actions, with a new sequence then being projected into the thus-defined co-ordinate frame to animate the end image accordingly.
  • FIGS. 1 through 7 are derived from facial image sequences of two male subjects, Glyn (the younger man) and Harry (the elder), with the recorded facial movements of Glyn being used to drive the facial end image of Harry, as will be described below.
  • Each n×m image is represented as a vector of grey-level intensity values, one value for each pixel of the image.
  • Principal components analysis is a mathematical technique that seeks to linearly transform a set of correlated N-dimensional variables, {x_1, x_2, ..., x_M}, into an uncorrelated set of basis vectors. Writing x̄ for the mean of the set and φ_i = x_i − x̄ for the mean-centred image vectors, the matrix C = (1/M) Σ_i φ_i φ_iᵀ is, by definition, the covariance matrix of the set of image vectors (recall that the φ_i's are centred on their mean), and u_iᵀ C u_i gives a measure of the variance in the set that u_i accounts for. (A minimal code sketch of this vectorisation and decomposition is given after this list.)
  • Figure 3 shows the first five principal components from an image sequence of Harry speaking, vectorised as described previously. Together, these five principal components alone account for 75% of the variance in the sequence of 317 frames.
  • The central column always shows the mean image from the sequence.
  • The first P principal components can be learned by a neural network (Sanger 1989), or can be extracted using a convergence algorithm (Ro Stamm 1998).
  • Each element of c represents a weighting on the respective basis vector.
  • An optional rescaling step can be included, where the distribution of the c_i's can be transformed so that the means and standard deviations of the weights associated with each basis vector match those for the training set.
  • The distribution can also be rescaled for exaggeration or anti-exaggeration purposes.
  • A face space is defined for Harry (the first five dimensions of which are shown in Figure 3). The top row of Figure 4 shows five frames from a real image sequence of Glyn telling a joke. These are then projected into Harry's face space using the procedure defined, and the resulting images, transformed back to image space, are shown below their corresponding frames. (A sketch of this projection and decoding step in code appears after this list.)
  • RGB colour images can be vectorised and the procedure outlined above can be applied.
  • FIG. 5 demonstrates this warping approach.
  • Let I be a matrix containing the colour information for each pixel of the image, for example as an RGB triple.
  • We write I(x, y) to represent the colour information for the pixel at (x, y).
  • We choose the image shown in (a) as a reference; although this is a somewhat arbitrary choice, we select the image closest to the luminance mean, additionally ensuring that it is in a 'neutral' pose with eyes open and mouth slightly open; this is because, for example, an open mouth can be warped onto a closed mouth, but a closed mouth cannot be warped onto an open mouth.
  • The Multi-channel Gradient Model (McGM) is an optic flow algorithm modelled on the processing of the human visual system (Johnston, McOwan et al. 1999). We chose to apply the model to just two images for each frame, the reference and the target, since optic flow provides only an estimate and errors would be disproportionately magnified for frames temporally further from the reference were fields to be combined over time. Some adaptation is thus required for this application, since the McGM would usually have a large temporal buffer of images to work with and, in this case, we have only two. This can be overcome by replacing the zeroth and first temporal derivatives with their average and difference, respectively, and discarding all those of higher order.
  • A coarse-to-fine implementation of the McGM was applied at three spatial scales, 0.25, 0.5 and 1.0, progressively warping the reference facial image onto the target frame (a sketch of this coarse-to-fine warping appears after this list).
  • The reconstruction R is then given by R(x, y) = Q(x − U(x, y), y − V(x, y)). Since (x − U(x, y), y − V(x, y)) will rarely correspond exactly to pixel locations in Q, an interpolation technique is employed; here we use bilinear interpolation. All images in the sequence can be represented as warps from Q, and the entire sequence can be reconstructed by warping this one reference frame. Each vector field (U, V) can be vectorised by concatenating each row of U and V, joining them and transposing to form one long vector.
  • Figure 6 shows the first five principal components from Harry's sequence, vectorised as warps from a reference as described above.
  • The middle column shows the chosen reference image, and the left and right columns show the warp −2 standard deviations and +2 standard deviations respectively in the direction of each shown component. Together, these five components account for 85% of the variance in the whole set.
  • A first basis can be extracted from the set of forward training warps, and a second basis can be extracted from the image-based vectorisations (luminance, RGB, etc.) of the stabilised training sequence.
  • We refer to the first basis as the configural basis, and to the second as the image basis.
  • The image basis captures feature-aligned texture, since the stabilised images that it is generated from will be aligned.
  • The feature-aligned texture information and the configural information, in the form of the flow fields, can be combined into one single vector for each frame of the sequence.
  • The basis can then be extracted as before from this combined information.
  • Such vectors can then be converted into images by simply warping the texture component by its configural component.
  • The decoding step would then be as before (from 3.22), followed by splitting the reconstructed vector into its texture and configural parts and warping the former by the latter (a sketch of this combined decoding appears after this list).
  • PCA is not the only way to generate a set of bases.
  • The original vectors from the training sequence could be used, for example (the transformation matrix U would then be the matrix with these vectors as its columns, after normalisation, and the inverse transformation matrix would be its pseudoinverse rather than Uᵀ). A brief comparison in code appears after this list.
  • PCA happens to be particularly good because it orders the bases in terms of descriptive importance, so noise can be truncated away, and orthogonality is enforced, so no pseudoinverses need to be calculated.
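
The analysis steps described in the list above can be illustrated with short code sketches. These are minimal illustrations written in Python with NumPy, an assumption of this presentation only: the patent does not prescribe an implementation language, and the function names used here are hypothetical. The first sketch covers the vectorisation of grey-level frames (one intensity value per pixel, with the rows concatenated into one long vector) and the extraction of principal components from the mean-centred set; the thin SVD used here yields the same components as an eigendecomposition of the covariance matrix C = (1/M) Σ_i φ_i φ_iᵀ without forming that matrix explicitly.

    import numpy as np

    def vectorise(frame):
        # Concatenate the rows of an n x m grey-level image into one long vector.
        return frame.astype(np.float64).reshape(-1)

    def build_face_space(frames, n_components=5):
        # frames: equally sized 2-D grey-level arrays from the training sequence.
        X = np.stack([vectorise(f) for f in frames])     # one row per frame
        mean = X.mean(axis=0)
        Phi = X - mean                                    # mean-centred vectors, phi_i
        # Thin SVD of Phi: the rows of Vt are the eigenvectors of the covariance matrix.
        _, s, Vt = np.linalg.svd(Phi, full_matrices=False)
        components = Vt[:n_components]                    # principal components, u_i
        variance = s ** 2 / len(frames)                   # variance each u_i accounts for
        explained = variance[:n_components].sum() / variance.sum()
        return mean, components, explained

For a sequence of a few hundred frames the data matrix has one row per frame and one column per pixel, so the thin SVD is far cheaper than forming the full pixel-by-pixel covariance matrix; `explained` reports the fraction of variance captured, which the description above quotes as 75% for five components of the 317-frame sequence.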
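
The next sketch shows how a new, roughly aligned sequence could be projected into the face space defined above, how the optional rescaling of the weight distributions (including exaggeration) might be applied, and how a point in face space is decoded back to an image vector. The rescaling formula and the `gain` parameter are illustrative assumptions; only the idea of matching the means and standard deviations of the weights comes from the description.

    import numpy as np

    def encode(frame_vec, mean, components):
        # Project a vectorised frame into the face space: one weight per basis vector.
        return components @ (frame_vec - mean)

    def rescale(weights, new_stats, train_stats, gain=1.0):
        # Optional step: shift and scale each weight so its distribution over the new
        # sequence matches that of the training sequence; gain > 1 exaggerates.
        new_mean, new_std = new_stats
        train_mean, train_std = train_stats
        z = (weights - new_mean) / np.where(new_std > 0, new_std, 1.0)
        return train_mean + gain * z * train_std

    def decode(weights, mean, components):
        # Transform a point in face space back into a (vectorised) image.
        return mean + components.T @ weights

Driving Harry's face space with Glyn's performance then amounts to encoding each of Glyn's vectorised frames with Harry's mean and components, optionally rescaling, decoding, and reshaping the result back to the image dimensions.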
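
The warping stage can be sketched in the same spirit. The reconstruction R(x, y) = Q(x − U(x, y), y − V(x, y)) is implemented with explicit bilinear interpolation, the flow fields are vectorised by concatenating the rows of U and V, and a simple coarse-to-fine loop runs a two-frame flow estimator at the scales 0.25, 0.5 and 1.0. Here `estimate_flow` is a stand-in for the Multi-channel Gradient Model (or any other two-frame optic flow routine); the decimation-based pyramid and the additive accumulation of the flow are simplifying assumptions of this sketch, not details taken from the patent.

    import numpy as np

    def bilinear_sample(img, x, y):
        # Sample img at real-valued coordinates (x, y) with bilinear interpolation.
        h, w = img.shape
        x = np.clip(x, 0.0, w - 1.001)
        y = np.clip(y, 0.0, h - 1.001)
        x0 = np.floor(x).astype(int); y0 = np.floor(y).astype(int)
        fx = x - x0; fy = y - y0
        return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1] +
                (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

    def warp_from(Q, U, V):
        # Reconstruction R(x, y) = Q(x - U(x, y), y - V(x, y)).
        h, w = Q.shape
        ys, xs = np.mgrid[0:h, 0:w]
        return bilinear_sample(Q, xs - U, ys - V)

    def vectorise_flow(U, V):
        # Concatenate the rows of U and of V and join them into one long vector.
        return np.concatenate([U.reshape(-1), V.reshape(-1)])

    def coarse_to_fine_flow(reference, target, estimate_flow, factors=(4, 2, 1)):
        # Estimate flow at scales 0.25, 0.5 and 1.0 (decimation factors 4, 2, 1),
        # warping the reference towards the target and accumulating the field.
        h, w = reference.shape
        U = np.zeros((h, w)); V = np.zeros((h, w))
        warped = reference
        for k in factors:
            du, dv = estimate_flow(warped[::k, ::k], target[::k, ::k])
            U = U + np.kron(du, np.ones((k, k)))[:h, :w] * k   # upsample and rescale
            V = V + np.kron(dv, np.ones((k, k)))[:h, :w] * k
            warped = warp_from(reference, U, V)
        return U, V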
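
For the combined vectorisation, each frame is represented by its feature-aligned texture followed by its flow field; decoding a reconstructed combined vector then splits it back into those parts and warps the texture by the configural component. This sketch uses SciPy's map_coordinates for the bilinear warp; the layout of the combined vector (texture first, then U, then V) is an assumption made for illustration.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def combine(texture_vec, flow_vec):
        # One long vector per frame: feature-aligned texture followed by its flow field.
        return np.concatenate([texture_vec, flow_vec])

    def decode_combined(vec, shape):
        # Split a reconstructed combined vector and warp the texture by its
        # configural (flow field) component.
        h, w = shape
        n = h * w
        texture = vec[:n].reshape(h, w)
        U = vec[n:2 * n].reshape(h, w)
        V = vec[2 * n:].reshape(h, w)
        ys, xs = np.mgrid[0:h, 0:w]
        # R(x, y) = texture(x - U(x, y), y - V(x, y)), bilinear interpolation (order=1).
        return map_coordinates(texture, [ys - V, xs - U], order=1, mode='nearest')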
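
Finally, the remark about alternative bases can be made concrete. If the normalised, mean-centred training vectors themselves are used as the columns of the transformation matrix, encoding a new vector requires the pseudoinverse of that matrix, whereas with the orthonormal PCA basis the transpose suffices and the components are ordered by the variance they explain. A minimal comparison, reusing the data-matrix conventions of the first sketch:

    import numpy as np

    def basis_from_training(Phi):
        # Phi: mean-centred training vectors, one per row. Use them (normalised)
        # as the columns of the transformation matrix; encoding needs a pseudoinverse.
        U = Phi.T / np.linalg.norm(Phi, axis=1)
        return U, np.linalg.pinv(U)

    def basis_from_pca(components):
        # components: orthonormal PCA basis vectors, one per row. The inverse
        # transformation is simply the transpose, and truncation discards noise.
        U = components.T
        return U, U.T

    # Either way, encoding and decoding are
    #   c = U_inv @ (x - mean)        and        x_hat = mean + U @ c.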

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to a method of generating a facial animation sequence comprising the steps of: a) observing a real facial image sequence (the original image sequence) and capturing the information thereby generated; b) aligning another facial image (the end image), using appropriate co-ordinates, with the original image; c) mathematically analysing the information from the original image sequence; and d) using the results thus obtained to drive the movements which generate the end image sequence. Principal components analysis is applied to successive frames from the original image sequence so as to generate the necessary co-ordinate frames characterising the facial actions permitted to a subject, and the resulting new sequence is then projected into the co-ordinate frames thus defined so as to drive the end image accordingly. The necessary vectorisations may also be generated by analytical techniques not based on principal components analysis, in which the analysis is carried out from vector bases.
PCT/GB2002/005418 2001-12-01 2002-11-29 Techniques d'animation faciale orientees performances Ceased WO2003049039A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2002349141A AU2002349141A1 (en) 2001-12-01 2002-11-29 Performance-driven facial animation techniques

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0128863.8 2001-12-01
GB0128863A GB0128863D0 (en) 2001-12-01 2001-12-01 Performance-driven facial animation techniques and media products generated therefrom

Publications (1)

Publication Number Publication Date
WO2003049039A1 true WO2003049039A1 (fr) 2003-06-12

Family

ID=9926877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/005418 Ceased WO2003049039A1 (fr) 2001-12-01 2002-11-29 Techniques d'animation faciale orientees performances

Country Status (3)

Country Link
AU (1) AU2002349141A1 (fr)
GB (1) GB0128863D0 (fr)
WO (1) WO2003049039A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8279228B2 (en) 2006-04-24 2012-10-02 Sony Corporation Performance driven facial animation
US10574883B2 (en) 2017-05-31 2020-02-25 The Procter & Gamble Company System and method for guiding a user to take a selfie
US10614623B2 (en) 2017-03-21 2020-04-07 Canfield Scientific, Incorporated Methods and apparatuses for age appearance simulation
US10621771B2 (en) 2017-03-21 2020-04-14 The Procter & Gamble Company Methods for age appearance simulation
US10818007B2 (en) 2017-05-31 2020-10-27 The Procter & Gamble Company Systems and methods for determining apparent skin age
US11055762B2 (en) 2016-03-21 2021-07-06 The Procter & Gamble Company Systems and methods for providing customized product recommendations

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285794B1 (en) * 1998-04-17 2001-09-04 Adobe Systems Incorporated Compression and editing of movies by multi-image morphing

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285794B1 (en) * 1998-04-17 2001-09-04 Adobe Systems Incorporated Compression and editing of movies by multi-image morphing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOCKUSCH S ET AL: "Analysis-by-synthesis and example based animation with topology conserving neural nets", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) AUSTIN, NOV. 13 - 16, 1994, LOS ALAMITOS, IEEE COMP. SOC. PRESS, US, vol. 3 CONF. 1, 13 November 1994 (1994-11-13), pages 953 - 957, XP010146492, ISBN: 0-8186-6952-7 *
LIU Z ET AL: "EXPRESSIVE EXPRESSION MAPPING WITH RATIO IMAGES", COMPUTER GRAPHICS. SIGGRAPH 2001. CONFERENCE PROCEEDINGS. LOS ANGELES, CA, AUG. 12 - 17, 2001, COMPUTER GRAPHICS PROCEEDINGS. SIGGRAPH, NEW YORK, NY: ACM, US, 12 August 2001 (2001-08-12), pages 271 - 275, XP001049896, ISBN: 1-58113-374-X *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8279228B2 (en) 2006-04-24 2012-10-02 Sony Corporation Performance driven facial animation
US11055762B2 (en) 2016-03-21 2021-07-06 The Procter & Gamble Company Systems and methods for providing customized product recommendations
US10614623B2 (en) 2017-03-21 2020-04-07 Canfield Scientific, Incorporated Methods and apparatuses for age appearance simulation
US10621771B2 (en) 2017-03-21 2020-04-14 The Procter & Gamble Company Methods for age appearance simulation
US10574883B2 (en) 2017-05-31 2020-02-25 The Procter & Gamble Company System and method for guiding a user to take a selfie
US10818007B2 (en) 2017-05-31 2020-10-27 The Procter & Gamble Company Systems and methods for determining apparent skin age

Also Published As

Publication number Publication date
GB0128863D0 (en) 2002-01-23
AU2002349141A1 (en) 2003-06-17

Similar Documents

Publication Publication Date Title
Noh et al. A survey of facial modeling and animation techniques
Blanz et al. Reanimating faces in images and video
Thies et al. Headon: Real-time reenactment of human portrait videos
Ichim et al. Dynamic 3D avatar creation from hand-held video input
Thies et al. Real-time expression transfer for facial reenactment.
US6556196B1 (en) Method and apparatus for the processing of images
Jones et al. Multidimensional morphable models: A framework for representing and matching object classes
Vlasic et al. Face transfer with multilinear models
  • Chai et al. Vision-based control of 3D facial animation
Vetter et al. Estimating coloured 3D face models from single images: An example based approach
Bickel et al. Multi-scale capture of facial geometry and motion
US6967658B2 (en) Non-linear morphing of faces and their dynamics
Essa et al. Modeling, tracking and interactive animation of faces and heads//using input from video
Bronstein et al. Calculus of nonrigid surfaces for geometry and texture manipulation
US6400828B2 (en) Canonical correlation analysis of image/control-point location coupling for the automatic location of control points
Pighin et al. Modeling and animating realistic faces from images
WO2021228183A1 (fr) Reconstitution faciale
Pighin et al. Realistic facial animation using image-based 3D morphing
Kwolek Model based facial pose tracking using a particle filter
Fidaleo et al. Classification and volume morphing for performance-driven facial animation
WO2003049039A1 (fr) Techniques d'animation faciale orientees performances
Paier et al. Neural face models for example-based visual speech synthesis
EP4564301A1 (fr) Procédé de création d'un avatar pouvant être commandé
Peng et al. RMAvatar: Photorealistic human avatar reconstruction from monocular video based on rectified mesh-embedded Gaussians
Cowe Example-based computer-generated facial mimicry

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP