GB2549941A - Method, apparatus and computer program product for tracking feature points in video frames
- Publication number
- GB2549941A (application GB1607583.0A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- poi
- location
- video frame
- pois
- reference image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20076—Probabilistic image processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Abstract
A method and apparatus for tracking a reference image captured in video frames comprising: identifying a target by matching descriptors for a subset of points of interest (POIs), against a video frame, where the subset of POIs is selected from a set of POIs for the reference image; tracking the movement of the set of POIs of the reference image between video frames; and updating the selection of the subset of POIs dependent on the steps of matching and/or tracking. Also claimed is a method of tracking a reference image captured in video frames comprising: determining an estimated location of a POI based on the location of the POI in a previous frame; and determining the location of the POI based on the estimated location and the location of the POI in the reference image.
Description
METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR TRACKING FEATURE
POINTS IN VIDEO FRAMES
Field of the invention [0001] The invention relates to augmented reality, in which video elements (e.g. elements of a live direct or indirect view of a physical, real-world environment) are augmented or supplemented by computer-generated video or graphical input. In particular, it relates to a method, an apparatus and a computer program product for tracking feature points (e.g. points in a reference image or a marker) in a plurality of video frames, e.g. for the purpose of superimposing graphical input on an object represented by a reference image.
Background [0002] The optical flow method is conventionally used for tracking a reference image, e.g. a 2D marker, captured in a plurality of video frames V(x, y, ti). V represents a pixel value, x represents a first direction, y represents a second direction and ti represents a time. The reference image has a plurality of points of interest with respective locations xi, yi within the video frames V(x, y, ti).
[0003] In a first iteration, a processor determines velocity components vx and vy (i.e. the optical flow) for each point of interest at a location xi-1, yi-1 within a previous video frame V(x,y,ti-1). The processor then estimates the location xi, yi of each point of interest within a current video frame V(x,y,ti) based on the velocity components vx, vy and the location xi-1, yi-1 of each point of interest within the previous video frame V(x,y,ti-1).
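The prediction step of this first iteration can be sketched as follows (an illustrative Python fragment, not part of the patent disclosure; the function name and the unit frame interval are assumptions):

```python
# Illustrative sketch of the optical-flow prediction step: the location
# of a POI in the current frame is estimated from its location in the
# previous frame plus its velocity components vx, vy.
def estimate_location(x_prev, y_prev, vx, vy, dt=1.0):
    return x_prev + vx * dt, y_prev + vy * dt

# A POI at (120, 80) moving 3 px/frame right and 1 px/frame down:
x_est, y_est = estimate_location(120.0, 80.0, 3.0, 1.0)
# → (123.0, 81.0)
```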
[0004] In a second iteration, the processor determines the velocity components vx and vy for each point of interest at a location xi, yi within the current video frame V(x,y,ti). The processor then estimates the location xi+1, yi+1 of each point of interest within a next video frame V(x,y,ti+1) based on the velocity components vx, vy and the location xi, yi within the current video frame V(x,y,ti), and so forth.
[0005] A shortcoming of the optical flow method is that the determination of a location xi, yi of a point of interest within a video frame V(x,y,ti) involves an error. As the location xi, yi is used to determine the velocity components vx, vy and then to determine the location xi+1, yi+1 of the point of interest within the next video frame V(x,y,ti+1), the location xi+1, yi+1 of the point of interest within the next video frame V(x,y,ti+1) inevitably involves a larger error (i.e. the errors accumulate). Said differently, the accuracy of the location xi, yi of each point of interest within the video frames V(x, y, ti) gets poorer over time. Moreover, errors accumulate to the point at which a particular point of interest is effectively lost. When too many points of interest for a given reference image are lost, it is no longer possible to track the reference image.
[0006] The invention aims at addressing this shortcoming at minimum expense for a processor.
Summary [0007] A method for tracking a reference image captured in video frames, comprising: matching descriptors for a subset of points of interest, POIs, against a video frame, where the subset of POIs is selected from a set of POIs for the reference image; tracking the movement of the set of POIs between video frames; and updating the selection of the subset dependent on the steps of matching and/or tracking.
[0008] The selection of the subset is preferably updated depending on movement between video frames. For example, the selection is updated by deselecting POIs that, by their movement, are likely to be unavailable in a subsequent video frame (e.g. because they are close to the edge of the video frame and moving towards the edge) and/or by adding POIs that, by their movement, are likely to become available in a subsequent video frame (e.g. because they are beyond the edge of the frame but moving into the frame). POIs may be deselected and alternative POIs selected dependent on their relative degrees of matching in the step of matching.
[0009] The selection of the subset is preferably updated after a step of tracking POIs from a previous video frame to a present video frame and before performing matching in a subsequent video frame.
[0010] In accordance with another aspect of the invention, a method for tracking a reference image captured in video frames V(x, y, ti) is provided. V represents a pixel value, x represents a first direction, y represents a second direction and ti represents a time. The reference image has a set of POIs with respective locations xref, yref within the reference image and respective locations xi, yi within the plurality of video frames V(x, y, ti). The method comprises: determining an estimation xi*, yi* of the location xi, yi of a point of interest, POI, within a given video frame V(x,y,ti) based on a location xi-1, yi-1 of that POI within a previous video frame V(x,y,ti-1); and determining the location xi, yi of the POI within the given video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti).
By so doing, the accuracy of the location xi, yi of the POI within the given video frame V(x, y, ti) is improved or, to put this another way, the location is determined with high accuracy but at a reduced processing burden.
[0011] According to one feature, the method further comprises detecting a subset of a set of POIs and their respective locations xi-1, yi-1 within the previous video frame V(x,y,ti-1).
[0012] According to another feature, determining the estimation xi*, yi* of the location xi, yi of a POI within the given video frame V(x,y,ti) comprises: determining (604) velocity components vx, vy for the location xi-1, yi-1 of the POI within the previous video frame V(x,y,ti-1); and determining (606) the estimation xi*, yi* of the location xi, yi of the POI within the given video frame V(x,y,ti) based on the velocity components vx, vy and the location xi-1, yi-1 of the point of interest within the previous video frame V(x,y,ti-1). By so doing, the estimation xi*, yi* of the location xi, yi of the POI within the given video frame V(x,y,ti) is obtained by the optical flow method.
[0013] According to one feature, determining the location xi, yi of a POI within a given video frame V(x,y,ti) comprises: determining a search area in the given video frame V(x,y,ti) based on the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti); determining a reference patch in the reference image based on the location xref, yref of that POI within the reference image; comparing the reference patch with at least one patch selected within the search area to determine whether there is a match; and determining the location xi, yi of the POI within the given video frame V(x,y,ti) based on the match. By so doing, the location xi, yi of the POI within the given video frame V(x,y,ti) is obtained by a template match method. The computing resources required to perform the template match method can be minimized by setting the search area to only a portion of the given video frame V(x,y,ti).
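The bounded template match described above can be sketched as follows (an illustrative Python fragment, not the patent's implementation; the function names, the sum-of-squared-differences score and the square search radius are all assumptions):

```python
# Slide the reference patch over a small search area centred on the
# estimated POI location and keep the best-scoring candidate patch.
def ssd(patch_a, patch_b):
    # Sum of squared differences between two equally sized patches.
    return sum((a - b) ** 2
               for row_a, row_b in zip(patch_a, patch_b)
               for a, b in zip(row_a, row_b))

def match_in_search_area(frame, ref_patch, cx, cy, radius):
    # Search a (2*radius+1)^2 neighbourhood of the estimate (cx, cy);
    # lower SSD score means a closer match.
    h, w = len(ref_patch), len(ref_patch[0])
    best = None
    for oy in range(-radius, radius + 1):
        for ox in range(-radius, radius + 1):
            top, left = cy + oy - h // 2, cx + ox - w // 2
            candidate = [row[left:left + w] for row in frame[top:top + h]]
            score = ssd(ref_patch, candidate)
            if best is None or score < best[2]:
                best = (cx + ox, cy + oy, score)
    return best

# A 3x3 bright patch placed at centre (6, 4) in a 9x9 frame; the
# estimate (5, 4) is one pixel off, but the search recovers it.
frame = [[0] * 9 for _ in range(9)]
for r in range(3, 6):
    for c in range(5, 8):
        frame[r][c] = 9
ref_patch = [[9] * 3 for _ in range(3)]
x, y, score = match_in_search_area(frame, ref_patch, cx=5, cy=4, radius=2)
# → best match at (6, 4) with score 0
```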
[0014] According to another feature, determining whether there is a match based on the results of the comparing comprises: determining whether a maximum result from the comparing is above a threshold; or determining whether a minimum result from the comparing is below a threshold.
[0015] According to one feature, determining the location xi, yi of a POI within a given video frame V(x,y,ti) further comprises determining whether the at least one patch selected within the search area is blurry; and comparing the reference patch with the at least one patch selected within the search area only if that patch is not blurry. By so doing, the computing resources required to perform the template match method can be minimized. In addition, the number of false positive matches can be minimized.
[0016] According to a feature, the method further comprises determining an angle, a distance and/or a motion of a video camera relative to the reference image; and adjusting the resolution of the video camera based on the angle, the distance and/or the motion. By so doing, the accuracy of the location xi, yi of the POI within the given video frame can further be improved.
[0017] According to an aspect of the invention, an apparatus for tracking a reference image in a video comprising a camera for capturing video frames and at least one processor configured to perform the above methods is provided.
[0018] According to an aspect of the invention, a computer program product is provided, comprising code which, when executed by a processor, causes the processor to perform the above methods.
[0019] According to an aspect of the invention, an apparatus for tracking a reference image captured in video frames is provided. The apparatus comprises means for capturing video frames; means for storing the reference image; means for matching descriptors for a subset of points of interest, POIs, against a video frame, where the subset of POIs is selected from a set of POIs for the reference image; means for tracking the movement of the set of POIs between video frames; and means for updating the selection of the subset dependent on the steps of matching and/or tracking.
[0020] According to an aspect of the invention, an apparatus for tracking a reference image captured in video frames V(x, y, ti) is provided. The apparatus comprises means for determining an estimation xi*, yi* of the location xi, yi of a point of interest, POI, within a given video frame V(x,y,ti) based on a location xi-1, yi-1 of that POI within a previous video frame V(x,y,ti-1); and means for determining the location xi, yi of the POI within the given video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti).
[0021] Other features and advantages of the invention will become apparent after review of the entire application, including the following sections: brief description of the drawings, detailed description and claims.
Brief description of the drawings [0022] The accompanying drawings illustrate exemplary aspects of the invention, and, together with the general description given above and the detailed description given below, serve to explain features of the invention.
[0023] Fig. 1 is a process flow diagram illustrating certain high level processes performed by a general purpose or application specific processor.
[0024] Fig. 2 is a flow chart of a method for tracking a reference image in video frames V(x, y, ti).
[0025] Fig. 3 shows optional additional steps in the process of Fig. 1 or Fig. 2.
[0026] Fig. 4 is a flow chart of a method for determining an estimation xi*, yi* of the location xi, yi of each POI of a subset of POIs within a given video frame V(x, y, ti).
[0027] Fig. 5 is a flow chart of a method for determining the location xi, yi of each POI of a subset of POIs within a given video frame V(x, y, ti).
[0028] Fig. 6 is an alternative representation of the process of Fig. 5.
[0029] Fig. 7 shows three consecutive video frames V(x, y, ti) capturing a reference image.
[0030] Fig. 8 is a flow chart of a method for updating a subset of POIs.
[0031] Fig. 9 shows a mobile device for implementing any one of the methods of Figs. 1 to 6 and 8.
Detailed Description [0032] Referring to Fig. 1, a process flow diagram is given, illustrating certain high level processes performed by a general purpose or application specific processor.
[0033] The process begins at step 10 with detection of a reference image. A camera is pointing at a scene and, in that scene, there is a pre-determined reference image that is recognized by known recognition techniques. The reference image (which may be a 2D marker) is stored in the form of a JPEG or PNG file and has a set of POIs with respective locations xref, yref within the reference image. Each POI is represented as a feature descriptor. A feature descriptor represents some aspect of a small array of pixels, such as brightness, colour and/or shape but may also represent texture or other aspects of a POI. A feature descriptor may be a simple number or vector (e.g. in which each digit/element represents a pixel in the array). Alternatively, each POI in the reference image is represented as a template, which is preferably some larger matrix representing the individual pixels in the vicinity of the POI (e.g. at full resolution or partial resolution). There may be 1000 POIs in a reference image. When a sufficient number of POIs are detected in correct position relative to each other, the reference image is deemed to have been recognized.
[0034] If, for example, POIs A, B and C are present in the reference image in a straight horizontal line relative to each other, and these POIs are found in a line in any orientation, this would contribute to a positive recognition. The detection process outputs the position of the reference image in the frame and a 2-dimensional measure of pose of the reference image. I.e., without reference to any 3D co-ordinates, a value is generated indicating the orientation of the reference image. E.g., if the POIs A, B and C are found in a straight horizontal line, the pose would be zero. If the POIs C, B and A are found in a straight horizontal line, the pose would be 180 degrees. If the POIs A, C and B are found in a straight line, these would not contribute to the recognition of the reference image. A threshold can be set at which a sufficient number of identified POIs in correct positional relationship results in a reference image being deemed recognized. E.g. 10% may be sufficient, meaning that discovery of 100 templates out of 1000 is sufficient (given that part of the reference image may be obscured behind something else in the scene, or may be off-camera and out of the frame, or may be out of focus, etc.). A vector may also be generated indicating the location of the reference image in the first 2D frame in which it is detected and a value may be generated for its scale. At this stage, pose, position and scale are all 2D values.
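The A-B-C example above can be made concrete with a toy sketch (illustrative Python, not part of the patent disclosure; deriving the 2D pose from the angle of the line through the outer POIs is an assumption about one possible realisation):

```python
import math

def pose_from_line(a, c):
    # Angle, in degrees, of the line from POI A to POI C in the frame.
    # A..C left-to-right on a horizontal line gives pose 0;
    # the reversed order C..B..A gives pose 180.
    return math.degrees(math.atan2(c[1] - a[1], c[0] - a[0])) % 360.0

print(pose_from_line((10, 50), (90, 50)))   # A left of C -> 0.0
print(pose_from_line((90, 50), (10, 50)))   # A right of C -> 180.0
```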
[0035] The reference image may have a third dimension, in which case discovering it in two dimensions results in an orientation (for the reference image) and pose (for the camera relative to some plane in the image).
[0036] It may be noted that there are other ways of performing the detection step 10 that do not involve template matching and do not deliver a measure of pose. For example, the reference image may be represented as a histogram of feature descriptors, and if POIs are discovered in numbers that match the histogram for the reference image, this can be deemed a successful recognition. Then the templates for the reference image can be fetched, and template matching can be initiated. In such a case, 2D pose is determined after template matching in the first frame. This description will continue for the case illustrated, but those skilled in the art will be aware that the process steps can be executed in other sequences.
[0037] Once a reference image has been detected, process 20 is initiated. In this process, a subset of POIs in the reference image are selected for tracking. These can be selected from among the set of POIs discovered, but other factors in selection will be described. If, for example, 500 of the 1000 POIs have been discovered, process 20 may select 100 of them. In selecting a subset of POIs for tracking, process 20 may simply select the "best matching" POIs - i.e. those that most closely resembled their corresponding feature descriptors in the preceding stage. A better selection, however, is to select POIs that (a) have a good or fair match and (b) are fairly evenly distributed across the reference image. It is of lesser value, for example, to have many POIs clustered around one corner of the reference image, as that corner may become obscured in a later frame. The preferred process therefore rank-orders the POIs by degree (closeness) of match (e.g. using an RMS error measurement for each feature descriptor and the corresponding pixels in the scene) and then looks down the rank-ordered list and skips POIs that are too close to a POI already selected, until a subset of a satisfactory size (e.g. 100) has been selected, returning to the top of the list to include points that have been skipped if necessary. The set of POIs selected may include POIs (from the same reference image) that were not discovered in step 10 but may be expected in the next frame.
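The preferred rank-and-skip selection can be sketched as follows (an illustrative Python fragment, not the patent's implementation; the names, the squared-distance test and the tuple layout are assumptions):

```python
# Greedy, spatially distributed POI selection: take well-matched POIs
# in rank order, skipping any that fall too close to one already
# chosen, then top up from the skipped list if the subset is short.
def select_subset(pois, target_size, min_dist):
    # pois: list of (x, y, match_error); lower error = closer match.
    ranked = sorted(pois, key=lambda p: p[2])
    chosen, skipped = [], []
    for p in ranked:
        if len(chosen) >= target_size:
            break
        if all((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 >= min_dist ** 2
               for q in chosen):
            chosen.append(p)
        else:
            skipped.append(p)
    # Return to the top of the skipped list if still short.
    chosen.extend(skipped[:target_size - len(chosen)])
    return chosen

subset = select_subset(
    [(0, 0, 0.1), (1, 0, 0.2), (50, 0, 0.3), (100, 0, 0.4)],
    target_size=3, min_dist=10)
# → [(0, 0, 0.1), (50, 0, 0.3), (100, 0, 0.4)]
# (1, 0) is skipped: it matches well but sits on top of (0, 0).
```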
[0038] The process proceeds to step 30. Using the knowledge of the camera pose and the detected reference image and the discovered (or expected) POIs for that reference image, a process of template matching can be performed. This process is made vastly easier by having selected a subset of all the POIs against which to perform template matching.
[0039] Template matching is a documented process by which small parts of an image are matched against template images. It is a known problem when performing template matching that the reference image may be partly obscured. It is impractical to provide a multitude of templates to cover each possible occlusion. One documented approach (F. Jurie and M. Dhome, "Real time robust template matching", British Machine Vision Conference 2002, pages 123-131) is to divide the template image into multiple sub-images and perform matching on each subdivision.
[0040] Once POIs have been matched against their respective feature descriptors in step 30, the process continues to step 40 in which the optical flow from the present frame to the next frame is calculated. This is a matrix representing movement vectors from one frame to the next. It is a process that is documented in the literature. It may utilize POIs discovered in the present frame and rediscovered in the next frame to calculate vectors for the various POIs, but it can be performed independent of POIs. The output of process 40 is a 3D value of pose relative to a set of 3D co-ordinates of the camera's choosing. E.g. the camera may have a gyroscope to determine vertical, and may select co-ordinate axes based on the determined vertical direction. Process 40 determines the pose of the reference image in 3D space based on the original 3D pose and the optical flow since the previous frame (taking into account any movement of the camera between frames if necessary).
Determining the pose of a reference image relative to a set of 3D co-ordinates of a scene (or vice versa) is referred to in the industry as simultaneous localization and mapping (SLAM) and is described in, for example, US20120146998A1.
[0041] The process then returns to process 20 and a new selection of points is made. The previously selected subset is updated by the same process. E.g. POIs that have dropped off the edge of the frame are deselected. E.g. POIs of poor matching quality may be deselected (on the basis that they are out of focus) or POIs that have disappeared may be deselected (on the basis that they are assumed to have been obscured by something in the scene), but some such POIs may be retained (on the basis that whatever is obscuring them may move in the next frame).
[0042] POIs may be added to the list, in particular POIs in the reference image that are out-of-frame but are in the direction of movement as determined by the optical flow process. POIs are known to be out of frame but close to the edge of the frame by virtue of their positions in the reference image relative to POIs that are in the frame (taking into account 3-D pose of the reference image relative to the camera).
[0043] POIs that are about to drop off the edge of the frame (on the basis of proximity to the edge of the frame and the direction of movement as determined by the optical flow process) can be deselected.
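The deselection rule in the preceding paragraphs can be sketched as a simple predicate (illustrative Python, not part of the patent disclosure; the margin value and function name are assumptions):

```python
# A POI is flagged for deselection when it is within `margin` pixels of
# a frame edge AND its optical-flow vector points further toward that
# same edge.
def leaving_frame(x, y, vx, vy, width, height, margin=10):
    return ((x < margin and vx < 0) or
            (x > width - margin and vx > 0) or
            (y < margin and vy < 0) or
            (y > height - margin and vy > 0))

# POI near the right edge of a 640x480 frame, still moving right:
print(leaving_frame(635, 200, 4.0, 0.0, 640, 480))   # True -> deselect
# Same POI moving back into the frame: keep it.
print(leaving_frame(635, 200, -4.0, 0.0, 640, 480))  # False
```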
[0044] When the new set of POIs has been selected, as an optional step, the expected locations of those POIs in the next frame can be updated, based on the optical flow process. I.e., for each POI location (or estimated location) in the present frame, a movement vector is determined to project the location of that POI in the next frame. Each movement vector can be determined by extrapolation of the optical flow from process 40. This updating of the expected location of the POI in the next frame can assist the next execution of the template matching process 30 in the next cycle of the process.
[0045] Following process 20, the entire process executes another cycle for the new frame. Template matching process 30 operates to perform matching for the updated set of POIs, optionally using extrapolated locations of those POIs to narrow down the search space in which the template for each POI is to be matched in the new frame.
[0046] When process 30 operates in this new cycle, it confirms the presence of POIs in the set (some of which are rediscovered and some may be discovered for the first time) and it accurately identifies their locations. The optical flow process 40 may use the change in position of rediscovered POIs to give its output (a new vector for pose) but this is not essential - it may use other known means.
[0047] The selection of the subset of POIs may be updated dependent on the step of matching, by deselecting POIs whose descriptors no longer result in a satisfactory match within a frame of video and selecting POIs not selected in that frame.
[0048] From the above explanation, the process of detection and the process of tracking work from different sets of points. A few points may be detected, which are enough to calculate a 3D pose for the image in question. Using this pose, the method can predict a large set of points that should be visible. It is not limited to reusing the ones from detection. These points are template matched and then fed into optical flow. Optical flow tracks to the next frame, and allows an updated pose to be calculated. The process is repeated: the pose is used to predict points that should be visible. Optical flow points are discarded following the pose estimation.
[0049] Fig. 2 shows a flow chart of a method for tracking a reference image captured in a plurality of video frames V(x, y, ti). An optional process of narrowing down the search space for template matching is illustrated.
[0050] Having selected the reference image in step 100 (e.g. loading the reference image into volatile memory) the process is ready to search for a plurality of POIs (e.g. 1000 POIs) having respective locations xref, yref within the reference image.
[0051] At step 200, the processor selects a video frame V(x,y,ti). The POIs have respective locations xi, yi within the video frames V(x, y, ti).
[0052] At step 300, the processor selects a subset of POIs (e.g. 100 out of 1000) and their respective locations xi, yi within the video frame V(x,y,ti).
[0053] At step 400, the processor determines the location xi, yi of each POI of the subset in the video frame V(x,y,ti). This can be achieved by template matching (or other means). As will be explained below, this process may be assisted by an estimation xi*, yi* of the location xi, yi of that POI within the video frame V(x,y,ti) determined at step 600.
[0054] At step 500, the processor increments i and selects another video frame V(x,y,ti).
[0055] At step 600, the processor determines, e.g. by means of optical flow, an estimation xi*, yi* of the location xi, yi of each POI of the subset within the video frame V(x,y,ti) based on the location xi-1, yi-1 of that POI detected within the video frame V(x,y,ti-1) at step 300.
[0056] At step 300, the processor updates the subset of POIs, preferably by removing POIs and adding POIs. In this way, the subset does not gradually diminish as POIs are lost, but is replenished without necessarily having to return to step 100 and begin again with all 1000 original POIs.
[0057] Referring now to Fig. 3 various processes can be run following process 40 (or 30 depending on the order in which the various processes are run) or in process 300.
[0058] At step 900, the processor determines an angle (i.e. a pose) of the reference image relative to the camera. At step 1000, the processor determines a distance of the reference image from the camera. At step 1100, the processor determines a motion of the reference image relative to the video camera.
[0059] There is an optional step 1200 at which the processor adjusts the resolution of the video camera based on at least one of the determined angle, distance and motion. For example, the resolution is increased when the angle is steep and decreased when the angle is shallow. The resolution is decreased when the distance is short and increased when the distance is large. The resolution is decreased when the motion is fast and increased when the motion is slow. Advantageously, the processor makes a trade-off between these three criteria.
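The trade-off in step 1200 could be realised in many ways; the following is a purely illustrative Python sketch (the patent gives no formulas, so the scoring scheme, thresholds and resolution levels here are all invented for illustration):

```python
# Toy heuristic for optional step 1200: steep angle, large distance and
# slow motion each argue for more resolution; shallow angle, short
# distance and fast motion each argue for less. The three votes are
# summed to trade the criteria off against one another.
def choose_resolution(angle_deg, distance_m, speed_px,
                      levels=(480, 720, 1080)):
    score = 0
    score += 1 if angle_deg > 45 else -1     # steep vs shallow angle
    score += 1 if distance_m > 2.0 else -1   # far vs near
    score += 1 if speed_px < 5.0 else -1     # slow vs fast motion
    if score >= 2:
        return levels[2]   # high resolution
    if score <= -2:
        return levels[0]   # low resolution
    return levels[1]       # middle ground

print(choose_resolution(60, 3.0, 1.0))    # 1080: steep, far, slow
print(choose_resolution(20, 1.0, 10.0))   # 480: shallow, near, fast
```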
[0060] Fig. 4 shows a flow chart of a possible implementation of step 600 of Fig. 2, in which the estimation xi*, yi* of the location xi, yi of each POI of the subset within the video frame V(x,y,ti) is determined using the optical flow method.
[0061] At operation 602, the processor selects a POI in the subset and at operation 604, the processor determines velocity components vx, vy (i.e. the optical flow) for the POI at the location xi-1, yi-1 within the video frame V(x,y,ti-1). At operation 606, the processor determines the estimation xi*, yi* of the location xi, yi of the POI in the video frame V(x,y,ti) based on the velocity components vx, vy and the location xi-1, yi-1 of the POI within the video frame V(x,y,ti-1). At operation 608, the processor determines whether each POI has been selected in the subset. If at least one POI of the subset has not been selected, the processor repeats operation 602. Otherwise, the processor performs operation 610 and step 600 comes to an end.
[0062] Fig. 5 shows a flow chart of a possible implementation of step 400 of Fig. 2, in which the location xi, yi of each POI of the subset within the video frame is determined based on the location xref, yref of that POI in the reference image and the estimation xi*, yi* of the location xi, yi of that POI within the video frame V(x,y,ti). The determination is made using a template match method with a limited search area, bounded based on the estimation xi*, yi*.
[0063] At operation 702, the processor selects a POI in the subset. At operation 704, using a value of pose from optical flow, the processor determines a search area within the video frame V(x, y, ti). The search area is a portion of the video frame V(x, y, ti) advantageously centred at the estimation xi*, yi* of the location xi, yi of the POI within the video frame V(x, y, ti). At operation 706, the processor determines a reference patch within the reference image. The reference patch is a portion of the reference image advantageously centred at the location xref, yref of the POI in the reference image. At operation 708, the processor selects a patch within the search area. The selected patch and the reference patch preferably have the same size.
[0064] At operation 710, the processor determines whether the selected patch is blurry. For example, the processor determines the variance of the pixel values within the selected patch and determines that the selected patch is blurry if the variance is below a threshold. If the selected patch is blurry, operation 714 is performed. Otherwise operation 716 is performed.
[0065] At operation 716, the processor compares the selected patch with the reference patch and weights the result of the comparison based on the variance determined at 710. The higher the variance, the higher the weight.
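The variance-based blur test of operation 710 can be sketched as follows (illustrative Python, not the patent's implementation; the threshold value is an assumption, and the computed variance would then serve as the weight in operation 716):

```python
# Operation 710 sketch: a patch whose pixel values barely vary is
# treated as blurry and is excluded from template comparison.
def variance(patch):
    vals = [v for row in patch for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def is_blurry(patch, threshold=50.0):
    return variance(patch) < threshold

flat = [[100, 101], [100, 99]]   # nearly uniform -> blurry
sharp = [[0, 255], [255, 0]]     # high contrast -> not blurry
print(is_blurry(flat), is_blurry(sharp))   # True False
```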
[0066] At step 720, the processor determines if there is a match. For example, the processor determines the maximum result or the minimum result of the comparisons and determines if the maximum result is above a threshold or if the minimum result is below a threshold. If there is a match, operation 722 is performed. Otherwise operation 714 is performed.
[0067] At operation 722, the processor determines the location xi,yi of the POI within the video frame V(x, y, ti) based on the best match. For example, the best match corresponds to a selected patch centred on the location xi,yi of the POI within the video frame V(x, y, ti).
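As an illustrative sketch of operations 704 to 722 (assumptions: a sum-of-squared-differences comparison metric, so the best match is the minimum result and a match exists when it is below a threshold; the search radius and threshold values are invented for illustration), the limited-area template match could look like:

```python
import numpy as np

def locate_poi(frame, ref_patch, est_x, est_y,
               search_radius=4, ssd_threshold=5000.0):
    """Search a small window around the estimated POI location
    (est_x, est_y) for the patch that best matches ref_patch.
    Returns the matched centre (x_i, y_i), or None when no candidate
    scores below the SSD threshold (the POI is then removed from the
    subset, operation 714). Parameter values are assumptions."""
    h, w = ref_patch.shape
    best_score, best_xy = None, None
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y0 = est_y + dy - h // 2
            x0 = est_x + dx - w // 2
            if y0 < 0 or x0 < 0 or y0 + h > frame.shape[0] or x0 + w > frame.shape[1]:
                continue  # candidate patch falls outside the frame
            patch = frame[y0:y0 + h, x0:x0 + w]
            score = float(np.sum((patch - ref_patch) ** 2))
            if best_score is None or score < best_score:
                best_score, best_xy = score, (est_x + dx, est_y + dy)
    if best_score is not None and best_score < ssd_threshold:
        return best_xy  # best match centred on the POI location x_i, y_i
    return None
```

The best match corresponds to the selected patch whose centre gives the location of the POI within the video frame, as in operation 722.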
[0068] At operation 714, the processor removes the POI from the subset and proceeds to operation 724. Such removal takes place when the search area is too blurry to find a reliable match or when the search area is not blurry but no match has been found.
[0069] At operation 724, the processor determines if all the points of interest of the subset have been selected. If at least one POI has not been selected, operation 702 is repeated. Otherwise, operation 726 is performed and step 700 comes to an end.
[0070] A determination of whether a patch is blurry preferably comprises determining a variance of the pixel values within a patch, and determining that the patch is blurry if the variance is below a threshold.
[0071] Each result from the comparing is preferably weighted based on the variance determined for the patch.
[0072] Determining the location xi,yi of a POI of the subset within a video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation of the location of that POI within the video frame V(x,y,ti) preferably further comprises removing (714) each POI from the subset for which there is no match.
[0073] Persons of ordinary skill in the art will understand that the steps and operations of Fig. 5 can be performed in sequences other than the one described, without being incompatible with the method described with reference to Fig. 1. E.g. the steps and processes of Fig. 5 are described with reference to a given frame, whereas some (e.g. operations 704 or 714) can be performed with reference to a subsequent or a previous frame.
[0074] Thus, starting with a value of pose (step 704), a large set of points is projected via that pose to determine whether they fall within the bounds of the camera frame. For each such point, the area of the reference image (stored in memory) surrounding the point is warped by that pose to what it is expected to look like at that position. E.g. if the pose represents a 90 degree rotation, the resulting warped template would be the area surrounding the point, rotated by 90 degrees. The warped template is typically 8x8 pixels. A search area is defined around the predicted position of the point within the camera frame; this is typically a little larger than the warped template, to account for lack of accuracy in the predicted location. The warped template is moved around the search area, comparing pixels at each position. If the template is found (step 720), the point is added to the list of points for optical flow (as above).
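A simplified sketch of the template-warping step (not the patent's implementation): the paragraph above gives a 90-degree rotation as its example pose, so the warp below is restricted to quarter-turn rotations of an 8x8 template; a full implementation would warp by an arbitrary homography derived from the pose.

```python
import numpy as np

def warp_template(ref_image, x_ref, y_ref, quarter_turns=0, size=8):
    """Cut a size x size template centred on (x_ref, y_ref) in the
    reference image and rotate it by the given number of 90-degree
    turns, standing in for warping the template by the current pose.
    The size of 8 follows the typical 8x8 template mentioned above."""
    half = size // 2
    template = ref_image[y_ref - half:y_ref + half,
                         x_ref - half:x_ref + half]
    return np.rot90(template, k=quarter_turns)
```

The warped template, rather than the raw reference patch, is then compared against patches in the search area.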
[0075] A refined pose (which is more accurate) is calculated from the set of successful matches, and this is used for example, for rendering.
[0076] Fig. 6 is an alternative representation of the process of Fig. 5, illustrating that process 716 may be arranged as a process of repeated selection (716A) of a patch and comparison (716B) to find a "best match" until, at step 720 the best match is good enough to proceed to determine the location of the POI within the video frame.
[0077] Fig. 7a, Fig. 7b and Fig. 7c show an object (e.g. the reference image) 750 and an object 760 captured in three consecutive video frames. The object 750 has four POIs 751, 752, 753 and 754.
[0078] In the first video frame, the locations of the POIs 751, 752, 753 are detected (e.g. by a full template match process across the first video frame). POI 754 may also be detected (e.g. detected and de-selected) or may be too blurry to be detected. In the second video frame, the object 750 has moved while the object 760 has remained stationary. Note that the movement of object 750 can be of any nature and may include a translational component, a rotational component, a pitch component, a roll component and/or a yaw component. As a result, the locations of the three points of interest 751, 752, 753 have changed. In the second video frame, the locations of the three points of interest 751, 752, 753 are estimated using the optical flow method (as is known in the art, velocity components are determined for the points of interest by video frame to video frame analysis and estimations of the locations of the three points of interest 751, 752, 753 within the second video frame can be derived). Optionally, the location of the undetected POI 754 is also estimated, from information in the reference image about its position relative to other POIs in the reference image.
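The optical-flow estimation described above (velocity components determined by frame-to-frame analysis, then extrapolated) can be sketched as follows; the function name and the unit frame interval are illustrative assumptions:

```python
def estimate_location(prev_xy, velocity, dt=1.0):
    """Predict the POI location xi*, yi* in the current frame from its
    location xi-1, yi-1 in the previous frame and its frame-to-frame
    velocity components (operations 604-606). dt of 1.0 treats the
    frame interval as the unit of time."""
    x_prev, y_prev = prev_xy
    vx, vy = velocity
    return (x_prev + vx * dt, y_prev + vy * dt)
```

E.g. a POI last seen at (10, 20) moving at 2 pixels per frame rightward and 3 upward is predicted at (12, 17) in the next frame.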
[0079] Optionally, search areas 781, 782, 783 are drawn up around those estimated locations, and template matching is performed to find the points of interest within those search areas 781, 782, 783. The locations of the points of interest 751, 752, 753 are updated and the process is repeated to determine the locations of the points of interest 751, 752, 753 within the third video frame.
[0080] Note that in the third video frame, one of the points of interest, point 752, has become occluded behind object 760. The template match attempt will fail. Rather than using this occluded POI to track the object 750, it is discarded and another POI will be tracked instead, e.g. POI 754. The set of POIs to be located in the next frame (not shown) is supplemented to include POI 754. Similar processes will apply if a template match fails due to blurriness or movement beyond the video frame boundaries or for some other reason.
[0081] In the next frame, a search area 784 may be drawn up around POI 754 to facilitate its discovery and/or facilitate matching of a template for that POI.
[0082] Referring to Fig. 8, preferably the process further comprises determining (802) an area within the reference image excluding each POI removed from the subset; and replacing (804) each POI deleted from the subset with a replacement POI within the area. It may further comprise: determining (812) an estimation xi*, yi* of the location xi, yi of each replacement POI within a video frame V(x,y,ti); and determining (814) the location xi, yi of each replacement POI within the video frame V(x,y,ti) based on the location xref yref of that replacement POI within the reference frame and the estimation xi*, yi* of the location xi, yi of that POI within the video frame V(x,y,ti). It may further comprise: determining (810) at least one neighbouring POI within the subset located in a neighbourhood of the replacement POI within the reference image; and determining (812) the estimation xi*, yi* of the location xi, yi of the replacement POI within the video frame V(x,y,ti) based on the location xi, yi of the neighbouring POI within the video frame V(x,y,ti).
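A minimal sketch of operations 810 and 812, estimating a replacement POI's location from a neighbouring tracked POI: the offset between the two POIs in the reference image is applied to the neighbour's location in the frame. This assumes near-pure translation between reference image and frame; a full implementation would apply the estimated pose instead. The function name is invented for illustration.

```python
def estimate_replacement(neighbour_frame_xy, neighbour_ref_xy,
                         replacement_ref_xy):
    """Estimate the frame location xi*, yi* of a replacement POI from
    a neighbouring POI of the subset: carry the reference-image offset
    between the two POIs over to the neighbour's frame location."""
    nx, ny = neighbour_frame_xy
    rx, ry = neighbour_ref_xy
    px, py = replacement_ref_xy
    return (nx + (px - rx), ny + (py - ry))
```

E.g. if the neighbour sits at (10, 10) in the reference image and (100, 50) in the frame, a replacement POI at (14, 7) in the reference image is estimated at (104, 47) in the frame.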
[0085] Fig. 9 shows a mobile device 1300 implementing the methods described above. The mobile device 1300 comprises a video camera 1310 for capturing the video frames V(x,y, ti), a display 1320 for rendering the video frames V(x,y, ti), at least one sensor 1330 (e.g. a gyroscope), a processor 1350 and volatile and non-volatile memories 1340.
[0086] The various aspects will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the invention or the claims.
[0087] It will be understood that above embodiments of the present invention have been described by way of example only, and that various changes and modifications may be made without departing from the scope of the invention.
[0088] It will be further understood that the above aspects can be performed independently from each other.
[0089] In particular, steps 900, 1000, 1100 and 1200 shown on Fig.3 could be performed independently. Step 600 shown on Fig.2 could be performed independently from steps 100, 200, 300, 400, 500, 700, 800, 900, 1000, 1100 and 1200. Step 700 shown on Fig.5 could be performed independently from steps 100, 200, 300, 400, 500, 600, 800, 900, 1000, 1100 and 1200. The steps of Fig. 8 could be performed independently from steps 100, 200, 300, 400, 500, 600, 700, 900, 1000, 1100 and 1200.
Claims (20)
1. A method for tracking a reference image captured in video frames, comprising: matching descriptors for a subset of points of interest, POIs, against a video frame, where the subset of POIs is selected from a set of POIs for the reference image; tracking the movement of the set of POIs between video frames; and updating the selection of the subset dependent on the steps of matching and/or tracking.
2. The method of claim 1, wherein the selection of the subset is updated depending on movement between video frames.
3. The method of claim 2, wherein the selection is updated by deselecting POIs that, by their movement, are likely to be unavailable in a subsequent video frame.
4. The method of claim 2 or 3, wherein the selection is updated by adding POIs that, by their movement, are likely to become available in a subsequent video frame.
5. The method of any one of claims 1 to 4, wherein the selection is updated by deselecting POIs and selecting alternative POIs dependent on their relative degrees of matching in the step of matching.
6. The method of any one of claims 1 to 5, wherein the selection of the subset is updated after a step of tracking POIs from a previous video frame to a present frame and before performing matching.
7. The method of any one of claims 1 to 6, wherein the process includes: initial discovery of POIs in a video frame and provision of a value of pose for the reference image; and provision of a three-dimensional value of pose after tracking.
8. The method of any one of claims 1 to 7, further comprising determining (600) an estimation xi*, yi* of a location xi, yi of a POI in a given video frame V(x,y,ti) based on a location xi-1, yi-1 of that POI in a previous video frame V(x,y,ti-1).
9. The method of claim 8, further comprising determining (700) the location xi, yi of the POI in the given video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation xi*, yi*.
10. A method for tracking a reference image captured in video frames V(x, y, ti), comprising: determining (600) an estimation xi*, yi* of location xi, yi of a point of interest, POI, within a given video frame V(x,y,ti) based on a location xi-1, yi-1 of that POI within a previous video frame V(x,y,ti-1); and determining (700) the location xi, yi of the POI within the given video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti).
11. The method of claim 10, further comprising detecting (300) a subset of a set of POIs and their respective locations xi-1, yi-1 within the previous video frame V(x,y,ti-1).
12. The method of claim 10 or 11, wherein determining (600) the estimation xi*, yi* of the location xi, yi of a POI within the given video frame V(x,y,ti) comprises: determining (604) velocity components for the location xi-1, yi-1 of the POI within the previous video frame V(x,y,ti-1); and determining (606) the estimation xi*, yi* of the location xi, yi of the POI within the given video frame V(x,y,ti) based on the velocity components and the location xi-1, yi-1 of the point of interest within the previous video frame V(x,y,ti-1).
13. The method of any of claims 10 to 12, wherein determining (700) the location xi, yi of a POI within a given video frame V(x,y,ti) comprises: determining (704) a search area in the given video frame V(x,y,ti) based on the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti); determining (706) a reference patch in the reference image based on the location xref, yref of that POI within the reference image; comparing (716) the reference patch with at least one patch selected within the search area to determine (720) whether there is a match; and determining (722) the location xi, yi of the POI within the given video frame V(x,y,ti) based on the match.
14. The method of claim 13, wherein determining (720) whether there is a match based on the results of the comparing comprises: determining whether a maximum result from the comparing is above a threshold; or determining whether a minimum result from the comparing is below a threshold.
15. The method of claim 13 or 14, wherein determining the location xi, yi of a POI within a given video frame V(x,y,ti) further comprises determining (710) whether the at least one patch selected within the search area is blurry; and comparing (716) the reference patch with the at least one patch only if not blurry.
16. The method of any of claims 1 to 12, further comprising: determining (900, 1000, 1100) an angle, a distance and/or a motion of a video camera relative to the reference image; and adjusting (1200) the resolution of the video camera based on the angle, the distance and/or the motion.
17. An apparatus for tracking a reference image in a video comprising a camera for capturing video frames and at least one processor configured to perform a method according to any of claims 1 to 16 on the video frames.
18. A computer program product for equipping an apparatus, comprising code which, when executed by a processor, causes the processor to perform a method according to any of claims 1 to 16.
19. An apparatus for tracking a reference image captured in video frames, comprising: means for capturing video frames; means for storing the reference image; means for matching descriptors for a subset of points of interest, POIs, against a video frame, where the subset of POIs is selected from a set of POIs for the reference image; means for tracking the movement of the set of POIs between video frames; and means for updating the selection of the subset dependent on the steps of matching and/or tracking.
20. An apparatus for tracking a reference image captured in video frames V(x, y, ti), comprising: means for determining an estimation xi*, yi* of location xi, yi of a point of interest, POI, within a given video frame V(x,y,ti) based on a location xi-1, yi-1 of that POI within a previous video frame V(x,y,ti-1); and means for determining the location xi, yi of the POI within the given video frame V(x,y,ti) based on the location xref, yref of that POI within the reference image and the estimation xi*, yi* of the location xi, yi of that POI within the given video frame V(x,y,ti).
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB1607583.0A GB2549941A (en) | 2016-04-29 | 2016-04-29 | Method,apparatus and computer program product for tracking feature points in video frames |
| PCT/EP2017/059427 WO2017186576A1 (en) | 2016-04-29 | 2017-04-20 | Method, apparatus and computer program product for tracking a reference image |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| GB201607583D0 GB201607583D0 (en) | 2016-06-15 |
| GB2549941A true GB2549941A (en) | 2017-11-08 |
Family
ID=56234196
Country Status (2)
| Country | Link |
|---|---|
| GB (1) | GB2549941A (en) |
| WO (1) | WO2017186576A1 (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110930429B (en) * | 2018-09-19 | 2023-03-31 | 杭州海康威视数字技术股份有限公司 | Target tracking processing method, device and equipment and readable medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6222940B1 (en) * | 1999-01-06 | 2001-04-24 | National Instruments Corporation | Pattern matching system and method which detects rotated and scaled template images |
| WO2012083982A1 (en) * | 2010-12-21 | 2012-06-28 | Metaio Gmbh | Method for determining a parameter set designed for determining the pose of a camera and/or for determining a three-dimensional structure of the at least one real object |
| US20120288152A1 (en) * | 2011-05-10 | 2012-11-15 | Canon Kabushiki Kaisha | Object recognition apparatus, control method for object recognition apparatus and storage medium |
| US20160063702A1 (en) * | 2014-08-26 | 2016-03-03 | Kabushiki Kaisha Toshiba | Medical image processing apparatus, medical image processing method and medical image device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| WAP | Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1) |