
WO2001022355A1 - Occlusion tolerant pattern recognition - Google Patents


Info

Publication number
WO2001022355A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
image
identifying
elements
difference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2000/003628
Other languages
French (fr)
Other versions
WO2001022355A8 (en)
Inventor
Christopher John Taylor
Timothy Francis Cootes
Gareth Edwards
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Manchester
Original Assignee
Victoria University of Manchester
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Victoria University of Manchester filed Critical Victoria University of Manchester
Priority to AU73032/00A (AU7303200A)
Publication of WO2001022355A1
Publication of WO2001022355A8
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/755Deformable models or variational models, e.g. snakes or active contours
    • G06V10/7553Deformable models or variational models, e.g. snakes or active contours based on shape, e.g. active shape models [ASM]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/164Detection; Localisation; Normalisation using holistic features
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole


Abstract

A method of identifying an object using a model based on appearance parameters derived by comparing a series of images of examples of that object, the model including a pre-learned estimated relationship describing the effect of perturbations of the appearance parameters upon an image difference, the image difference comprising a set of elements describing the difference between an image of an object generated according to the model and an image of the object itself. During searching for an object using the model, the set of elements comprising the image difference is compared to corresponding elements of a threshold vector, and only those elements of the image difference having values less than the corresponding elements of the threshold vector are used in the identification of the object, the remainder of the elements of the image difference being discarded.

Description

Identification
The present invention relates to a method of identification.
Model-based approaches have proved successful in the location and interpretation of objects in images. Model-based approaches are particularly useful where the objects of interest are significantly variable in shape, texture, or both. In general, robust interpretation in the presence of noisy or missing data is achieved by constraining the interpretation to plausible solutions. Explaining image data in terms of model parameters provides a natural basis for further interpretation. The field of Face Recognition has seen extensive use of model-based methods, due to the need to deal with the large amount of variability present in images of faces (G. J. Edwards, C. J. Taylor, and T. Cootes. Face recognition using active appearance models. In 5th European Conference on Computer Vision, pages 581-595, 1998; A. Lanitis, C. J. Taylor, and T. F. Cootes. A unified approach to coding and interpreting face images. In 5th International Conference on Computer Vision, pages 368-373, June 1995; B. Moghaddam, W. Wahid, and A. Pentland. Beyond eigenfaces: Probabilistic matching for face recognition. In 3rd International Conference on Automatic Face and Gesture Recognition 1998, pages 30-35, Los Alamitos, California, 1998. IEEE Computer Society Press; M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991.)
The usefulness of a model depends on its specificity - it should only be able to represent plausible instances of the object class. In order to be truly specific, a model must represent all the possible data in images of the modelled objects. For example, models of shape such as the Active Shape Model described by Cootes et al (T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38-59, Jan. 1995) can only generate plausible object shapes, but it is possible to find plausible shapes which enclose invalid regions of texture for the object class of interest. Photo-realistic, generative models attempt to represent all object information, including both shape and texture, and can be used to generate synthetic reconstructions of objects. Edwards et al (G. Edwards, A. Lanitis, C. Taylor, and T. Cootes. Statistical model of face images - improving specificity. Image and Vision Computing, 16:203-211, 1998) described the construction of 2D Appearance Models of faces, which included both shape and grey-level texture. Similar models of faces have been described by Jones and Poggio (M. J. Jones and T. Poggio. Multidimensional morphable models. In 6th International Conference on Computer Vision, pages 683-688, 1998) and Vetter (T. Vetter. Learning novel views to a single face image. In 2nd International Conference on Automatic Face and Gesture Recognition 1996, pages 22-27, Los Alamitos, California, Oct. 1996. IEEE Computer Society Press). Vetter and Blanz (T. Vetter and V. Blanz. Estimating coloured 3d face models from single images: An example based approach. In 5th European Conference on Computer Vision, pages 499-513. Springer, June 1998) describe a 3D colour model of faces, which is used to interpret 2D images by projection. In principle, each of these approaches could be used to model other types of variable objects - Cootes et al (T. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. In 5th European Conference on Computer Vision, pages 484-498. Springer, June 1998) have described Appearance Models of MRI scans of knee joints.
Using such models for image interpretation requires a method of fitting them to new image data. Jones and Poggio, and Vetter and Blanz fit their appearance models using forms of stochastic gradient descent. This is a difficult method to apply to a model containing up to 100 parameters, and image search is very slow (around 10 minutes in Jones and Poggio's approach). An additional drawback is the need for very good initialisation of the model to ensure convergence. A faster, more robust method known as the Active Appearance Model (AAM) was introduced by Edwards et al (G. Edwards, C. Taylor, and T. Cootes. Interpreting face images using active appearance models. In 3rd International Conference on Automatic Face and Gesture Recognition 1998, pages 300-305, Nara, Japan, Apr. 1998. IEEE Computer Society Press) and described further by Cootes et al (T. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. In 5th European Conference on Computer Vision, pages 484-498. Springer, June 1998). Their approach uses a prior training stage in which the relationship between model parameter displacements and induced residual errors is learnt. This provides a method of matching high-dimensional models to image data in a fraction of a second.
Convergence is achieved by the Active Appearance Model when the measured difference between a reconstructed image generated by the model and the underlying target image tends towards zero. Since the model can only represent complete, plausible faces, this difference can never become zero in the presence of occlusion and the search fails in these cases.
It is an object of the first aspect of the invention to provide a method of identification which overcomes the above disadvantage.
According to a first aspect of the invention there is provided a method of identifying an object using a model based on appearance parameters derived by comparing a series of images of examples of that object, the model including a pre-learned estimated relationship describing the effect of perturbations of the appearance parameters upon an image difference, the image difference comprising a set of elements describing the difference between an image of an object generated according to the model and an image of the object itself, wherein during searching for an object using the model, the set of elements comprising the image difference is compared to corresponding elements of a threshold vector, and only those elements of the image difference having values less than the corresponding elements of the threshold vector are used in the identification of the object, the remainder of the elements of the image difference being discarded.
The inventors have realised that an occlusion will produce a much larger image difference than would be seen in the absence of an occlusion, and have used this to provide a robust object identification method. Preferably, the elements of the threshold vector are larger for areas of the object which exhibited more variation during construction of the model.
Preferably, the elements of the threshold vector correspond to the maximum magnitude of each element observed in the absence of occlusions during construction of the model.
Preferably, the threshold vector does not include elements which arose as a result of an area of an image of an object generated according to the model overlapping with a background during training.
Alternatively, the threshold vector may include elements which arose as a result of an area of an image of an object generated according to the model overlapping with a background during training. This may be useful in applications where the background does not change, for example the blue background commonly used by the film industry.
Preferably, the object is a face. The object may alternatively be a car, a horse or any other suitable object.
The known Active Appearance Model requires an initial placement at an image in order to identify an object. A correlation based method is used in the known Active Appearance Model to provide the initial placement.
It is an object of the second aspect of the invention to provide a method of identifying an object which does not require a correlation based method to provide an initial placement.
According to a second aspect of the invention there is provided a method of identifying an object using a model based on appearance parameters derived by comparing a series of images of examples of that object, the model including a pre-learned estimated relationship describing the effect of perturbations of the appearance parameters upon an image difference, the image difference comprising a set of elements describing the difference between an image of an object generated according to the model and an image of the object itself, wherein searching for an object using the model is begun at a plurality of locations in an image, a value representative of the total image difference of the model at a given location after an iteration of the model is compared to a threshold value, and searching is continued by performing further iterations of the model only if the value representative of the total image difference is less than the threshold value.
The threshold value is preferably a single scalar value.
The method according to the second aspect of the invention represents a complete and unified scheme for model-based image interpretation, requiring no hand-initialisation or correlation based initialisation.
Suitably, the value representative of the total image difference is compared to the threshold value after a single iteration of the model. This allows many searches to be abandoned after a single iteration.
Suitably, the value representative of the total image difference is compared to the threshold value after the second and subsequent iterations of the model. This allows further searches to be abandoned.
Preferably, the threshold value corresponds to the largest value representative of the total image difference which has been observed to result in a successful identification of an object. By setting the threshold at this value, the likelihood of the method failing to locate an object is minimised.
Preferably, the model is considered to have converged to a solution when the value representative of the total image difference is less than a predetermined value. Preferably, the object is a face. The object may alternatively be a car, a horse or any other suitable object.
Suitably, the method may comprise the first and second aspects of the invention.
A specific embodiment of the first and second aspects of the invention will now be described by way of example only, with reference to the accompanying drawings, in which:
Figure 1 is an example of a representation of an unseen image using a combined model;
Figure 2 shows typical face hypotheses generated using a correlation method;
Figure 3 is an example of shape normalisation of an image;
Figure 4 shows three example test images, and reconstructions of the images based upon hand annotation;
Figure 5 shows reconstructions of the test images of Figure 4 obtained using an Active Appearance Model;
Figure 6 shows a graph of reconstruction error versus iteration obtained using the Active Appearance Model;
Figure 7 is an illustration of a threshold vector utilised by a first aspect of the invention;
Figure 8 shows a comparison of an image search using the known Active Appearance Model algorithm with an image search using the Active Appearance Model algorithm according to the first aspect of the invention;
Figure 9 is a graph illustrating the effect of varying the threshold utilised by a second aspect of the invention; and
Figure 10 is a selection of test images illustrating operation of the first and second aspects of the invention.
The first and second aspects of the invention are based upon the Active Appearance Model (G. Edwards, C. Taylor, and T. Cootes. Interpreting face images using active appearance models. In 3rd International Conference on Automatic Face and Gesture Recognition 1998, pages 300-305, Nara, Japan, Apr. 1998. IEEE Computer Society Press) and described further by Cootes et al (T. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. In 5th European Conference on Computer Vision, pages 484-498. Springer, June 1998).
The Active Appearance Model uses the difference between a reconstructed image generated by the model and the underlying target image, to drive the model parameters towards better values. In a prior learning stage, known displacements, δc, are applied to known model instances and the resulting difference between model and image, δv, is measured. Multivariate linear regression is applied to a large set of such training displacements and an approximate linear relationship is established:
δc = Aδv
When searching an image, the current difference between model and image, δv, is used to predict an adjustment, -δc, to the model parameters which improves model fit. For simplicity of notation, the vector δc is assumed to include displacements in scale, rotation, and translation.
The Active Appearance Model was constructed using sets of face images. To do this, facial appearance models were generated following the approach described by Edwards et al (G. Edwards, A. Lanitis, C. Taylor and T. Cootes. Statistical model of face images - improving specificity. Image and Vision Computing, 16:203-211, 1998). The models were generated by combining a model of face shape variation with a model of the appearance variations of a shape-normalised face. The models were trained on 400 face images, each labelled with 122 landmark points representing the positions of key features. The shape model was generated by representing each set of landmarks as a vector, x, and applying a principal component analysis (PCA) to the data. Any example can then be approximated using:
x = x̄ + Ps bs (1)
where x̄ is the mean shape, Ps is a set of orthogonal modes of variation and bs is a set of shape parameters. Each example image was warped so that its control points match the mean shape (using a triangulation algorithm), and the grey-level information g was sampled from this shape-normalised face patch. By applying PCA to this data a similar model is obtained:
g = ḡ + Pg bg (2)
The shape and appearance of any example can thus be summarised by the vectors bs and bg. Since there are correlations between the shape and grey-level variations, a further PCA was applied to the concatenated vectors, to obtain a combined model of the form:
x = x̄ + Qs c (3)
g = ḡ + Qg c (4)
where c is a vector of appearance parameters controlling both the shape and grey-levels of the model, and Qs and Qg map the value of c to changes in the shape and shape-normalised grey-level data. A face can be synthesised for a given c by generating the shape-free grey-level image from the vector g and warping it using the control points described by x (this process is described in detail in [3]).
The 400 examples lead to 23 shape parameters, bs, and 114 grey-level parameters, bg. However, only 80 combined appearance model parameters, c, are required to explain 98% of the observed variation.
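The model-building stage described above can be sketched as follows. This is a minimal illustration only, assuming the 400 landmark vectors and shape-normalised grey-level samples are already available as arrays; the file names, the build_pca helper and the shape/grey weighting scheme are assumptions made for illustration, not details taken from the patent.

```python
import numpy as np

def build_pca(X, var_fraction=0.98):
    """PCA on the rows of X; keep enough modes to explain var_fraction of variance."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, var_fraction)) + 1
    P = Vt[:k].T                       # orthogonal modes of variation (columns)
    b = (X - mean) @ P                 # parameters for each training example
    return mean, P, b

# Hypothetical training data: 400 examples of landmarks and grey-level samples.
shapes = np.load("shapes.npy")         # shape (400, 2 * 122): x/y landmark vectors
greys = np.load("greys.npy")           # shape (400, n_pixels): shape-normalised samples

x_mean, Ps, bs = build_pca(shapes)     # eq. (1): x = x_mean + Ps @ bs
g_mean, Pg, bg = build_pca(greys)      # eq. (2): g = g_mean + Pg @ bg

# Combined model (eqs. 3-4): weight the shape parameters so shape and grey-level
# units are commensurate (this weighting is an assumption), concatenate, and
# apply PCA once more to obtain the combined appearance parameters c.
Ws = np.sqrt(bg.var() / bs.var())
combined = np.hstack([Ws * bs, bg])
_, Q, c = build_pca(combined)
```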
Figure 1 is an example of a representation of an unseen image using the combined model (the original image on left, and the model reconstruction, overlaid on the original image is on the right).
In the prior art embodiment of the Active Appearance Model, a two-stage strategy is adopted for matching the appearance model to face images. The first step is to find an approximate match using a simple and rapid approach. No initial knowledge is assumed of where the face may lie in the image, or of its scale and orientation. A simple eigen-face model (M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991) is used for this stage of the location. A correlation score, S, between the eigen-face representation of the image data, M, and the image itself, I, can be calculated at various scales, positions and orientations:
S = |I - M| (5)
Although in principle the image could be searched exhaustively, it is much more efficient to use a stochastic scheme similar to that of Matas et al (K. Jonsson, J. Matas and J. Kittler. Fast face localisation and verification. In British Machine Vision Conference 1997, Colchester, UK, 1997). Both the model and image are sub-sampled to calculate the correlation score using only a small fraction of the model sample points. Figure 2 shows typical face hypotheses generated using this method. The average time for location was around 0.2 sec using 10% of the model sample points.
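A sketch of this coarse localisation stage is given below, assuming the eigenface basis P and mean g_mean are available from the model build. The sample_patch helper, the candidate grid, and the use of sub-sampled reconstruction error as the fit score are illustrative assumptions rather than the patent's exact formulation; orientation is omitted for brevity.

```python
import numpy as np

def eigenface_error(patch, g_mean, P, idx):
    """Approximate eigenface fit error for one candidate patch, computed on a
    random subset idx of sample points (about 10%, as in the text)."""
    d = patch.ravel()[idx] - g_mean[idx]
    b = P[idx].T @ d                   # approximate projection onto the eigenfaces
    return float(np.sum((d - P[idx] @ b) ** 2))   # lower is a better match

def coarse_face_search(image, g_mean, P, candidates, frac=0.1, n_keep=5, seed=0):
    """Score sub-sampled candidate positions/scales and keep the best few."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(g_mean), size=int(frac * len(g_mean)), replace=False)
    scored = []
    for (x, y, scale) in candidates:
        patch = sample_patch(image, x, y, scale)   # hypothetical resampling helper
        scored.append((eigenface_error(patch, g_mean, P, idx), x, y, scale))
    scored.sort()
    return scored[:n_keep]             # a handful of face hypotheses (cf. Figure 2)
```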
Once a reasonable starting approximation of the position of a face has been determined, the parameters of the appearance model are adjusted such that a synthetic face is generated which matches the image as closely as possible. The basic idea is outlined below, followed by details of the algorithm.
Interpretation is treated as an optimisation problem in which the difference between a real face image and one synthesised by the appearance model is minimised. A difference vector δI can be defined:
δI = Ii - Im (6)
where Ii is the vector of grey-level values in the image, and Im is the vector of grey-level values for the current model parameters. To locate a best match between model and image, the magnitude of the difference vector, Δ = |δI|, is minimised by varying the model parameters, c.
Since the model has around 80 parameters, this appears at first to be a very difficult optimisation problem involving search in a very high-dimensional space. However, it is noted that each attempt to match the model to a new face image is actually a similar optimisation problem. Therefore, the model learns something about how to solve this class of problems in advance. Providing a-priori knowledge of how to adjust the model parameters during image search yields an efficient run-time algorithm. In particular, the spatial pattern in δI might be expected to encode information about how the model parameters should be changed in order to achieve a better fit. For example, if the largest differences between the model and the image occurred at the sides of the face, that would imply that a parameter adjusting the width of the model face should be changed. This expected effect is seen in Figure 3; an original image being shown at top left, a perturbed model displacement being shown at top right, and a shape-normalised difference image being shown at bottom centre.
In adopting this approach there are two parts to the problem: learning the relationship between δI and the error in the model parameters, δc, and using this knowledge in an iterative algorithm for minimising Δ.
The simplest model that could be chosen for the relationship between δI and the error in the model parameters (and thus the correction which needs to be made) is linear:
δc = AδI (7)
This is a good enough approximation to provide good results. To find A, multivariate linear regression is performed on a large sample of known model displacements, δc, and the corresponding difference images, δI. These large sets of random displacements are generated by perturbing the 'true' model parameters for the images in the training set by a known amount. As well as perturbations in the model parameters, small displacements in 2D position, scale, and orientation are also modelled. These extra 4 parameters are included in the regression; for simplicity of notation, they can, however, be regarded simply as extra elements of the vector δc. In order to obtain a well-behaved relationship it is important to choose carefully the frame of reference in which the image difference is calculated. The most suitable frame of reference is the shape-normalised face patch described above. A difference is calculated thus: for the current location of the model, calculate the image grey-level sample vector, gi, by warping the image data at the current location into the shape-normalised face patch. This is compared with the model grey-level sample vector, gm, calculated using equation 4:
δg = gi - gm (8)
Thus, equation 7 can be modified:
δc = Aδg (9)
The best range of values of δc to use during training is determined experimentally. Ideally it is desired to model a relationship that holds over as large a range of errors, δg, as possible. However, the real relationship is found to be linear only over a limited range of values. In experiments, the model used 80 parameters. The optimum perturbation level was found to be around 0.5 standard deviations (over the training set) for each model parameter. Each parameter was perturbed from the mean by a value between 0 and 1 standard deviation. The scale, angle and position were perturbed by values ranging from 0 to +/- 10% (positional displacements are relative to the face width). After performing linear regression, an R² statistic is calculated for each parameter perturbation, δcᵢ, to measure how well the displacement is 'predicted' by the error vector δg. The average R² value for the 80 parameters was 0.82, with a maximum of 0.98 (the 1st parameter) and a minimum of 0.48. Figure 3 illustrates the shape-free error image reconstructed for δg, for a deviation of 2 standard deviations in the 1st model parameter, and a horizontal displacement of 10 pixels.
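A sketch of this training regression, under the perturbation ranges just described, might look like the following. The model.synthesise_grey and model.sample_image calls are hypothetical stand-ins for the model synthesis and shape-normalised image-sampling steps, and the pseudo-inverse solve is one standard way of performing the multivariate regression; none of these details are taken verbatim from the patent.

```python
import numpy as np

def learn_update_matrix(training_set, model, n_disp=20, seed=0):
    """Estimate A in dc = A @ dg by multivariate linear regression over known
    perturbations of the true parameters (pose is folded into dc, as in the text)."""
    rng = np.random.default_rng(seed)
    C, G = [], []
    for image, c_true, pose_true in training_set:
        for _ in range(n_disp):
            dc = rng.uniform(-1.0, 1.0, model.n_params) * model.param_sd  # up to 1 s.d.
            dpose = rng.uniform(-0.1, 0.1, 4)       # scale, angle, x, y: +/- 10%
            g_model = model.synthesise_grey(c_true + dc)
            g_image = model.sample_image(image, c_true + dc, pose_true + dpose)
            C.append(np.concatenate([dc, dpose]))   # known displacement
            G.append(g_image - g_model)             # measured residual dg (eq. 8)
    C, G = np.asarray(C), np.asarray(G)
    # Least-squares solve of C.T ~= A @ G.T for A.
    return C.T @ np.linalg.pinv(G.T)
```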
Given a method for predicting the correction which needs to be made in the model parameters, an iterative method may be constructed for solving the optimisation problem. For a given model projection into the image, c, the grey-level sample error vector, δg, is calculated, and the model estimate is updated thus:
c' = c - Aδg (10)
If the initial approximation is far from the correct solution, the predicted model parameters at the first iteration will generally not be very accurate but should reduce the energy in the difference image. This can be ensured by scaling A so that the prediction reduces the magnitude of the difference vector, |δg|², for all the examples in the training set. Given the improved value of the model parameters, the prediction made in the next iteration should be better. The procedure is iterated to convergence. Typically the algorithm converges in around 5-10 iterations from fairly poor starting approximations. More quantitative data are given below.
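The iterative search of equation 10 can be sketched as below. The model.residual call is a hypothetical helper returning the shape-normalised sample error δg at the given parameters, and the step-halving loop is one simple way of realising the scaling of A described above; both are assumptions for illustration.

```python
import numpy as np

def aam_search(image, model, A, c0, max_iters=15, tol=1e-3):
    """Iterate c' = c - A @ dg (eq. 10); accept a step only if it reduces
    |dg|^2, otherwise retry with a damped step.
    c0 includes the four pose parameters as extra elements (as in the text)."""
    c = c0.copy()
    dg = model.residual(image, c)            # hypothetical: g_image - g_model at c
    err = float(dg @ dg)
    for _ in range(max_iters):
        step = A @ dg
        for scale in (1.0, 0.5, 0.25):       # full step first, then damped retries
            dg_new = model.residual(image, c - scale * step)
            err_new = float(dg_new @ dg_new)
            if err_new < err:
                c, dg, err = c - scale * step, dg_new, err_new
                break
        else:
            break                            # no step improved the fit: stop
        if err < tol:
            break                            # converged (typically 5-10 iterations)
    return c, err
```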
The method was tested on a set of 80 previously unseen face images. Figure 4 shows three example images used for testing and the 'true' model reconstruction, based on hand-annotation of the face location and shape.
Figure 5 illustrates the result of applying AAM search to these images. The left hand image shows the original overlaid with the initial hypothesis for face location. In general, the starting hypotheses used are better than those shown here. However, the hypotheses generated by the stochastic generator were deliberately displaced in order to illustrate the convergence properties of AAM search. Alongside the initial approximation are shown the search results after iterations 1, 5 and 12 respectively. The reconstruction error of AAM search was tested over a test set of 80 unseen images. The reconstruction error for each image was calculated as the magnitude of the shape-normalised grey-level sample vector, |δg|. Figure 6 shows a graph of reconstruction error versus iteration. Two plots are shown in Figure 6. The solid curve is a plot of average error versus iteration for the test set. The dashed curve shows the worst case encountered in the test. The two horizontal lines indicate the error measured when the model is fitted using accurate, hand-labelled points, for the average and worst case respectively. The error is measured in average grey-level difference per sample pixel, where pixels take a value from 0 to 63.
The Active Appearance Model relies on pre-learnt variations between images to identify a face. However, the Active Appearance Model cannot pre-learn variations which arise from the location of an occlusion in front of part of a face, because there is an infinite number of ways that the occlusion may occur. Thus, the Active Appearance Model will fail to converge if an occlusion is located in front of part of a face.
The inventors have realised that an occlusion will produce a much larger difference δv between pixels than would be seen in the absence of an occlusion. By observing the magnitude of the elements of δv for each training displacement, a suitable threshold vector, vt, has been estimated during training.
The threshold is a vector rather than a single value. The threshold vector vt is illustrated in Figure 7. The brighter pixels in Figure 7 correspond to areas where the model tolerates larger image differences during search before regarding the difference as due to occlusion (i.e. the magnitude of vt is large). It is noted that areas of significant facial structure are more inclined to show large differences δv during search. In this embodiment of the invention, the threshold vector vt is chosen to be the maximum magnitude of each element of δv observed in the absence of occlusions during training.
When the model is used to search for a face, elements of the vector δv having a magnitude greater than the threshold vt are assumed to represent an occlusion. These elements are then ignored during subsequent search steps, thereby removing the effect of the occlusion from the search.
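A sketch of this occlusion handling follows: vt is the element-wise maximum residual magnitude seen over the occlusion-free training displacements, and residual elements exceeding it are discarded before the parameter update. Zeroing the flagged elements is used here as one simple way of discarding them while keeping the update matrix fixed; that detail is an assumption, not necessarily the patent's exact mechanism.

```python
import numpy as np

def estimate_threshold_vector(training_residuals):
    """v_t: per-element maximum |dv| over all occlusion-free training
    displacements (rows of training_residuals)."""
    return np.abs(np.asarray(training_residuals)).max(axis=0)

def robust_update(dv, v_t, A):
    """Flag residual elements exceeding v_t as occluded, discard them from
    the residual, and predict the parameter update from what remains."""
    occluded = np.abs(dv) > v_t
    dv_used = np.where(occluded, 0.0, dv)   # occluded elements contribute nothing
    return A @ dv_used, occluded
```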
It is noted that, during training, some of the training displacements will result in the displaced model instance overlapping the background. When this occurs, the estimate of vt will be too high, particularly for those sample points near the edge of the model. To overcome this, the invention ignores displaced model pixels not overlapping a pixel from the real object (for training examples the location of the real object pixels is known exactly). Such pixels are also ignored in the linear regression used to calculate A. Ignoring the background in this manner has the added advantage of preventing the AAM learning to use the background during an image search. This would cause a problem if the background seen when searching an image is different to that used during training.
In some applications it may be possible to include the background when training, for example where the background is always a blue screen (as is used in the film industry).
Figure 8 shows a comparison of an image search using the simple AAM algorithm and using the new robust AAM algorithm. The simple algorithm locks onto the section of the face that is not occluded and does not provide a good fit, whereas the robust AAM algorithm provides an accurate fit.
The prior art embodiment of the AAM uses a simple eigen-face model (M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71-86, 1991) to obtain an initial estimate of the location of a face in an image. A correlation score between the eigen-face representation of the image data and the image itself is calculated at various scales, positions and orientations. Once a candidate location for a face in the image has been found, the AAM is used to identify the face. The AAM method itself is not suitable for direct correlation-based matching because it contains a complete model of appearance, including both shape and texture.
The second aspect of the invention provides a hierarchical method of using the AAM to rapidly detect objects in images without first using correlation based matching.
An AAM search will converge from a limited number of different starting positions. For example, in the case of the face model described above, if at least two-thirds of the model instance at the initial placement overlaps the true face, the search will converge. In cases where the initial placement is smaller than the true face, the same condition applies, but the model needs to be at least half the size of the face.
The inventors have noted that in every observed case of successful AAM search, the magnitude of the residual between model and image, |δv|, decreases with the first iteration. It is therefore possible to provide an initial estimate of possible face locations by running one iteration of AAM search over an equally spaced grid of locations and measuring the residual magnitude. The grid spacing is selected to be the widest possible that guarantees at least one starting position sufficiently overlapping the true object.
It is assumed that an instance might be a plausible place to try further iterations of the AAM if the following condition is met:
|δv|² < Ti
where Ti is a pre-learnt threshold. Clearly, the choice of Ti is important. Too low and there is a risk of missing faces; too high and many false positives will be detected. A false positive will cost further wasted iterations and thus processing time. The training images used to build the model may be used to estimate a suitable value of Ti in advance. Given the known positions and model parameters for the training images, the value of |δv|² is measured after one iteration of AAM search from several displaced starting positions, c, for each image (again, for simplicity of notation c includes scale, angle and translation of the model). In each case whether or not the full AAM search finds the solution is recorded.
A 'final-fit' threshold, Tf, defines whether the solution is found. A reasonable choice for Tf is the maximum value of |δv|² for the known 'best-fit' to the training examples.
Figure 9 shows an example for the face model, illustrating the effect of varying Ti. The y-axis shows the percentage of possible locations that would have been rejected, even though a full search would have converged on the face. The x-axis shows the percentage of wasted searches, i.e. when a full search would be requested but would not converge on the face. The example compares the number of missed convergent searches with the number of wasted searches for varying threshold level, Ti, for the first iteration of AAM search.
In this case, to be sure not to reject possible convergent searches, 55% wasted searches must be tolerated. The key point is that there exists a threshold above which all convergent searches are guaranteed to be kept.
The threshold level, Tl s can be chosen to include all possible convergent starting locations. Some of these (about 55% in this case) will not be successful and will start to diverge after further iterations. The key is to reject these as soon as possible. The search algorithm is as follows:
1. Perform one iteration at an initial grid of n locations, {c1, c2, ..., cn}.
2. Measure |δv|² at the updated locations {c1', c2', ..., cn'}.
3. Reject instances where |δv|² > Ti.
4. Add instances where |δv|² < Tf to the 'solutions' list.
5. Perform a further iteration on the m remaining instances, {c1', c2', ..., cm'}.
6. Return to 2 until no instances remain.
Thus, as the algorithm iterates, solutions which fall below the final fit threshold, Tf, are accepted as plausible objects, whilst solutions which show divergence beyond Ti are rejected. The final list of solutions should describe both the location and model parameters of all objects (faces, in this case) in the image. A sketch of this pruning loop is given below.
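In the following sketch, model.residual is the same hypothetical helper as before, the grid of starting parameters is assumed to be given, and the outer iteration cap is an added safety measure not present in the algorithm as stated.

```python
import numpy as np

def multi_start_search(image, model, A, grid, T_i, T_f, max_iters=20):
    """One AAM iteration per surviving instance; accept below T_f, keep
    iterating below T_i, reject (as diverging) otherwise."""
    instances = list(grid)                   # step 1: {c_1, ..., c_n} over the grid
    solutions = []
    for _ in range(max_iters):               # safety cap (an assumption)
        if not instances:
            break                            # step 6: no instances remain
        survivors = []
        for c in instances:
            c_new = c - A @ model.residual(image, c)                # one AAM iteration
            err = float(np.sum(model.residual(image, c_new) ** 2))  # |dv|^2
            if err < T_f:
                solutions.append((c_new, err))    # step 4: accepted as an object
            elif err < T_i:
                survivors.append(c_new)           # steps 2-3: plausible, keep going
            # else: diverging beyond T_i, reject (step 3)
        instances = survivors                # step 5: iterate remaining instances
    return solutions                         # locations and parameters of all objects
```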
The method has been tested on 200 unseen images containing faces. The first 100 images were fairly straightforward, containing a single face. The second 100 images contained faces hand-segmented from test images and placed in a new image, in order to produce multiple faces on difficult, cluttered backgrounds.
The method has also been tested, including the first aspect of the invention, on the same 200 images with the addition of random occlusion. Some typical test images are shown in Figure 10.
Table 1 shows the detection rates for each of the image types. The best performance was obtained on the simple, single-face images. This type of image was not significantly affected by occlusion. In the images with multiple faces and cluttered background, occlusion was found to have a small detrimental effect on performance. The system did not return any false-positive matches; the specificity of the model ensured that a low value of |δv| must correspond to an instance of a face.
[Table 1: detection rates for each image type (reproduced only as an image in the original document)]
The time required to search an image depends on the rate at which the hierarchy of possible model instances, {c1', c2', ..., cm'}, is trimmed; images containing multiple faces generally take longer to search. The range of times required to successfully search a 640x480 image was 10 to 35 seconds on a single-processor Pentium II 450 MHz machine.

Claims

1. A method of identifying an object using a model based on appearance parameters derived by comparing a series of images of examples of that object, the model including a pre-learned estimated relationship describing the effect of perturbations of the appearance parameters upon an image difference, the image difference comprising a set of elements describing the difference between an image of an object generated according to the model and an image of the object itself, wherein during searching for an object using the model, the set of elements comprising the image difference is compared to corresponding elements of a threshold vector, and only those elements of the image difference having values less than the corresponding elements of the threshold vector are used in the identification of the object, the remainder of the elements of the image difference being discarded.
2. A method of identifying an object according to claim 1, wherein the elements of the threshold vector are larger for areas of the object which exhibited more variation during construction of the model.
3. A method of identifying an object according to claim 1 or 2, wherein the elements of the threshold vector correspond to the maximum magnitude of each element observed in the absence of occlusions during construction of the model.
4. A method of identifying an object according to any preceding claim, wherein the threshold vector does not include elements which arose as a result of an area of an image of an object generated according to the model overlapping with a background during training.
5. A method of identifying an object according to any of claims 1 to 3, wherein the threshold vector includes elements which arose as a result of an area of an image of an object generated according to the model overlapping with a background during training.
6. A method according to any preceding claim, wherein the object is a face.
7. A method of identifying an object using a model based on appearance parameters derived by comparing a series of images of examples of that object, the model including a pre-learned estimated relationship describing the effect of perturbations of the appearance parameters upon an image difference, the image difference comprising a set of elements describing the difference between an image of an object generated according to the model and an image of the object itself, wherein searching for an object using the model is begun at a plurality of locations in an image, a value representative of the total image difference of the model at a given location after an iteration of the model is compared to a threshold value, and searching is continued by performing further iterations of the model only if the value representative of the total image difference is less than the threshold value.
8. A method of identifying an object according to claim 7, wherein the threshold value is a single scalar value.
9. A method of identifying an object according to claim 8, wherein the value representative of the total image difference is compared to the threshold value after a single iteration of the model.
10. A method of identifying an object according to any of claims 7 to 9, wherein the value representative of the total image difference is compared to the threshold value after the second and subsequent iterations of the model.
11. A method of identifying an object according to any of claims 7 to 10, wherein the threshold value corresponds to the largest value representative of the total image difference which was observed to result in a successful identification of an object.
12. A method of identifying an object according to any of claims 7 to 11, wherein the model is considered to have converged to a solution when the value representative of the total image difference is less than a predetermined value.
13. A method of identifying an object according to any of claims 7 to 12, wherein the object is a face.
14. A method of identifying an object according to claim 1 and claim 7.
15. A method of identifying an object substantially as hereinbefore described with reference to the accompanying figures.
PCT/GB2000/003628 1999-09-22 2000-09-21 Occlusion tolerant pattern recognition Ceased WO2001022355A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU73032/00A AU7303200A (en) 1999-09-22 2000-09-21 Identification

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB9922295.2A GB9922295D0 (en) 1999-09-22 1999-09-22 Identification
GB9922295.2 1999-09-22

Publications (2)

Publication Number Publication Date
WO2001022355A1 true WO2001022355A1 (en) 2001-03-29
WO2001022355A8 WO2001022355A8 (en) 2001-06-14

Family

ID=10861307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2000/003628 Ceased WO2001022355A1 (en) 1999-09-22 2000-09-21 Occlusion tolerant pattern recognition

Country Status (3)

Country Link
AU (1) AU7303200A (en)
GB (1) GB9922295D0 (en)
WO (1) WO2001022355A1 (en)


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
COOTES T F ET AL: "MULTI-RESOLUTION SEARCH WITH ACTIVE SHAPE MODELS", PROCEEDINGS OF THE IAPR INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION,US,LOS ALAMITOS, IEEE COMP. SOC. PRESS, vol. CONF. 12, 9 October 1994 (1994-10-09), pages 610 - 612, XP000515229, ISBN: 0-8186-6267-0 *
EDWARDS G J ET AL: "Advances in active appearance models", PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, PROCEEDINGS OF THE SEVENTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, KERKYRA, GREECE, 20-27 SEPT. 1999, 1999, Los Alamitos, CA, USA, IEEE Comput. Soc, USA, pages 137 - 142 vol.1, XP000980072, ISBN: 0-7695-0164-8 *
MANDALIA ANIL D ET AL: "Low-level and high-level correlation for image registration", 1990 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS;LOS ANGELES, CA, USA NOV 4-7 1990, November 1990 (1990-11-01), Proc IEEE Int Conf Syst Man Cybern;Proceedings of the IEEE International Conference on Systems, Man and Cybernetics Nov 1990 Publ by IEEE, Piscataway, NJ, USA, pages 206 - 208, XP002919345 *
MOGHADDAM B ET AL: "PROBABILISTIC VISUAL LEARNING FOR OBJECT REPRESENTATION", IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,US,IEEE INC. NEW YORK, vol. 19, no. 7, 1 July 1997 (1997-07-01), pages 696 - 710, XP000698169, ISSN: 0162-8828 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008020038A1 (en) * 2006-08-16 2008-02-21 Guardia A/S A method of identifying a person on the basis of a deformable 3d model
EP1953675A1 (en) 2007-01-30 2008-08-06 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and storage medium
US8165399B2 (en) 2007-01-30 2012-04-24 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and program
US8494233B2 (en) 2007-01-30 2013-07-23 Canon Kabushiki Kaisha Image processing apparatus, image processing method, and program
US9704025B2 (en) 2015-07-31 2017-07-11 King Abdulaziz City For Science And Technology Identifying non-occluded faces by learning from partially occluded faces

Also Published As

Publication number Publication date
AU7303200A (en) 2001-04-24
WO2001022355A8 (en) 2001-06-14
GB9922295D0 (en) 1999-11-17

Similar Documents

Publication Publication Date Title
Edwards et al. Interpreting face images using active appearance models
Cootes et al. A Comparative Evaluation of Active Appearance Model Algorithms.
Cootes et al. Active appearance models
KR101304374B1 (en) Method of locating features of an object
Cristinacce et al. Feature detection and tracking with constrained local models.
Cootes et al. View-based active appearance models
EP1393243B1 (en) Object identification
Wu et al. Robust facial landmark detection under significant head poses and occlusion
Cristinacce et al. Facial feature detection and tracking with automatic template selection
Lai et al. Deformable contours: Modeling and extraction
Cristinacce et al. A comparison of shape constrained facial feature detectors
EP1496466B1 (en) Face shape recognition from stereo images
AU2002304495A1 (en) Object identification
Li et al. Modelling faces dynamically across views and over time
Everingham et al. Automated person identification in video
Ratan et al. Object detection and localization by dynamic template warping
Wimmer et al. Learning local objective functions for robust face model fitting
Hu et al. Active wavelet networks for face alignment.
Bashier et al. Face detection based on graph structure and neural networks
WO2001022355A1 (en) Occlusion tolerant pattern recognition
Mostafa et al. Dynamic weighting of facial features for automatic pose-invariant face recognition
Shin et al. Combination of warping robust elastic graph matching and kernel-based projection discriminant analysis for face recognition
Cootes et al. Modeling facial shape and appearance
Jabberi 3D Face Alignment Method Applied to Face Recognition
Everingham et al. Automated detection and identification of persons in video using a coarse 3-D head model and multiple texture maps

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
AK Designated states

Kind code of ref document: C1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: C1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

CFP Corrected version of a pamphlet front page

Free format text: REVISED TITLE RECEIVED BY THE INTERNATIONAL BUREAU AFTER COMPLETION OF THE TECHNICAL PREPARATIONS FOR INTERNATIONAL PUBLICATION

WA Withdrawal of international application