HK1169735B - Non-rigid tracking-based human-machine interface - Google Patents
Non-rigid tracking-based human-machine interface
- Publication number
- HK1169735B (application HK12110131.5A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- interest
- image
- points
- region
- movement
- Prior art date
Description
The present invention relates to the detection of objects by image analysis and to their tracking in a video stream representing a sequence of images, and in particular to a process and a device for detecting and tracking, in real time, non-rigid objects in motion in a video stream, allowing a user to interact with a computer system.
Augmented reality is used to insert one or more virtual objects into the images of a video stream representing a sequence of images. Depending on the type of application, the position and orientation of these virtual objects may be determined by data external to the scene represented by the images, for example coordinates coming directly from a game scenario, or by data related to certain elements of that scene, for example coordinates of a particular point in the scene such as a player's hand. When the nature of the objects present in the real scene is identified and the position and orientation are determined by data related to certain elements of that scene, it may be necessary to track these elements as they move within the scene.
In addition, users in such applications may be asked to interact, in the real scene represented, at least partially, by the image stream, with a computer system in order, in particular, to trigger specific actions or scenarios which allow, for example, interaction with virtual elements superimposed on images.
The same is true for many other types of applications, for example in video game applications.
For this purpose, it is necessary to identify particular movements such as hand movements to identify a predetermined command or commands, such as those initiated by a computer pointing device such as a mouse.
The applicant has developed visual tracking algorithms for textured objects, which have various geometries, do not use a marker and are unique in that they match particular points between a common image in a video stream and a set of key images, which are automatically obtained when the system is initialized. However, such algorithms, described in French patent applications 0753482, 0752810, 0902764, 0752809 and 0957353, do not allow the detection of motion of non-textured or nearly uniformly textured objects such as a user's hands.
While there are solutions that allow a user to interact, in a scene represented by a sequence of images, with a computer system, these solutions are usually complex to implement.
One known solution relies on sensors worn by the user. Although this approach is often used for motion capture applications, particularly for special cinematic effects, it also makes it possible to track the position and orientation of an actor and, in particular, of his hands and feet, to allow him to interact in a virtual scene with a computer system. However, the use of this technique is costly because it requires equipping the actors with bulky sensors, which may also be subject to disturbances related to their environment (e.g. electromagnetic disturbances).
Another solution, developed in particular in the European projects OCETRE and HOLONICS , is to use multiple image sources, e.g. multiple cameras, to allow a three-dimensional, real-time reconstruction of the environment and the spatial movements of users. An example of such approaches is described in the document Holographic and action capture techniques , T. Rodriguez, A. Cabo de Leon, B. Uzzan, N. Livet, E. Boyer, F. Geffray, T. Balogh, Z. Megyesi and A. Barsi, August 2007, SIGGRAPH '07, 2007 ACM SIGGRAPH Emerging Technologies.
There are also touchscreens used to visualize augmented reality scenes that allow for determining user interactions with a computer system. However, these screens are expensive and unsuitable for augmented reality applications.
For user interactions in video games, an image is typically captured from a webcam-type camera connected to a computer or a console. This image, after being stored in a memory of the system to which the camera is connected, is usually analyzed by an object tracking algorithm, also called blob tracking, to calculate, in real time, the contours of certain elements of the user moving in the image using, for example, an optical flow algorithm.
However, the limitations of this approach are mainly a lack of precision, since it is not possible to maintain the correct execution of the process when the camera moves, and a lack of semantics, since it is not possible to distinguish movements in the foreground from those in the background.
There is also an approach to real-time detection of an interaction between a user and a computer system in an augmented reality scene, from an image of a sequence of images, the interaction resulting from the change in the appearance of the representation of an object present in the image. However, this process, described in particular in French patent application 0854382, does not allow the identification of precise user movements and applies only to sufficiently textured areas of the image.
The invention allows at least one of the problems described above to be solved.
The invention thus concerns a computer process for detecting interactions with a software application by the motion of at least one object within the field of view of an image sensor connected to a computer implementing the process, the image sensor transmitting an image stream to that computer, which process comprises the following steps:
receiving at least one first image from the image sensor; identifying at least one first region of interest in that first image, the first region of interest corresponding to a part of that image; receiving at least one second image from the image sensor; identifying at least one second region of interest in the second image, the second region of interest corresponding to the first region of interest of the first image; comparing those first and second regions of interest and determining a mask of interest characterizing a variation of at least one characteristic of corresponding points in those first and second regions of interest; determining a movement of the at least one object from that mask of interest, the at least one object being at least partially represented in at least one of those first and second regions of interest; and analyzing that movement and, in response to that analysis step, triggering or not triggering a predetermined action.
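By way of rough illustration of this sequence of steps, a minimal sketch is given below, assuming OpenCV, a webcam as the image sensor, and placeholder values for the region of interest, the threshold and the triggered action:

```python
import cv2

# Minimal sketch of the claimed steps (hypothetical values throughout):
# receive images, compare corresponding regions of interest, build a mask
# of interest, and analyze the resulting motion to trigger an action.
cap = cv2.VideoCapture(0)            # image sensor providing the image stream
x, y, w, h = 200, 150, 160, 160      # assumed initial region of interest

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, curr = cap.read()            # receive the next (second) image
    if not ok:
        break
    curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

    # compare the first and second regions of interest, point by point
    roi_prev = prev_gray[y:y + h, x:x + w]
    roi_curr = curr_gray[y:y + h, x:x + w]
    diff = cv2.absdiff(roi_curr, roi_prev)

    # mask of interest: points whose variation exceeds a threshold
    _, mask = cv2.threshold(diff, 100, 255, cv2.THRESH_BINARY)

    # placeholder analysis step: a real implementation would estimate the
    # movement of points of interest inside the mask (see the steps below)
    if cv2.countNonZero(mask) > 0.2 * w * h:
        print("movement detected: a predetermined action could be triggered")

    prev_gray = curr_gray
```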
The process according to the invention thus allows tracking of objects, in particular deformable and poorly textured objects, particularly for augmented reality applications. Furthermore, the limited amount of processing allows the process to be implemented in devices with limited resources (in particular computing) such as mobile platforms.
The process of the invention allows fast motion of objects to be tracked even in the presence of blur in the images acquired from the image sensor. Furthermore, the processing according to the process of the invention does not depend on a specific colorimetry of moving objects, so it is possible for objects such as a hand or a textured object to move in front of the image sensor used.
The number of degrees of freedom defining the movements of each object tracked can be set for each region of interest.
It is possible to track several areas of interest simultaneously, in particular to allow for multiple controls, for example by using two-hand tracking to increase the number of possible interactions between a user and a software application.
The process of the invention thus combines the advantages of tracking points of interest while limiting the areas where these points are located in order to limit processing and focus it on the object being tracked.
According to a particular embodiment, the step of determining a movement includes a step of determining and matching a plurality of pairs of points of interest in the at least one first and second images, at least one point of each of those pairs of points of interest belonging to the mask of interest, the movement being estimated from a transformation of a first set of points of interest into a second set of points of interest, the points of interest of those first and second sets belonging to the plurality of pairs of points of interest, the points of interest of the first set further belonging to the first image and the points of interest of the second set further belonging to the second image. Thus, the movement of an object may be determined from the movements of at least part of a set of points of interest.
This transformation preferably implements a weighting function based on the distance between two points of interest of the same pair of points of interest of the said plurality of pairs of points of interest in order to improve the estimation of the motion of the object being followed.
Still in a particular embodiment, the process also includes a step of validating at least one point of interest of the at least one first image, belonging to at least one pair of points of interest, according to the determined movement, at least one validated point of interest being used to track the object in at least one third image following the at least one second image and at least one validated point of interest being used to modify a mask of interest created from the at least one second and third images. It is therefore possible to reuse the same points of interest from one image to the next if they contribute effectively to the estimation of the overall movement of the tracked object. In addition, validated points of interest are used during the selection of new points of interest in order to avoid an excessive accumulation of points of interest in a limited region.
The comparison of the at least one first and second regions of interest advantageously comprises a point-by-point subtraction of the values of corresponding points in those first and second regions of interest and a comparison of the result of the subtraction with a predetermined threshold.
Depending on the particular embodiment, the process also includes a detection step of at least one predetermined feature in the said at least one first image, the said at least one first region of interest being at least partially identified in response to the said detection step. The process according to the invention can thus automatically initialize or reset itself according to elements of the content of the processed image.
In a particular embodiment, the process also includes a step of estimating at least one modified second region of interest in the second image, allowing the processing of the next image to be anticipated for object tracking. This estimation, for example, implements a KLT-type object tracking algorithm.
This movement may be characterised by translation, rotation and/or a scale factor.
When the movement is characterised by a scale factor, the triggering or not of the predetermined action can be determined by that scale factor; for example, a scale factor may characterize a mouse click.
In a particular embodiment, the motions of at least two objects in the field of view of the image sensor are determined, the triggering or not of the predetermined action being determined by a combination of the motions associated with the said at least two objects.
The invention also concerns a computer program containing instructions suitable for the implementation of each step of the process described above when the said program is executed on a computer and a device containing means suitable for the implementation of each step of the process described above.
Other advantages, purposes and features of the present invention are shown by the following detailed description, made as a non-limiting example, in the light of the attached drawings in which:
Figure 1, including Figures 1a and 1b, shows two successive images of a stream of images which can be used to determine the motion of objects and the interaction of a user;
Figure 2, including Figures 2a to 2d, shows examples of variation of a region of interest in one image with respect to the corresponding region of interest in a following image;
Figure 3 shows schematically the determination of a motion of an object at least part of which is represented in a region and in a mask of interest of two consecutive images;
Figure 4 shows schematically some steps implemented in accordance with the invention to identify, in a continuous mode, variations in the position of objects between two consecutive (or close) images of a sequence of images;
Figure 5 illustrates some aspects of the invention where four parameters characterize a motion of an object tracked in consecutive (or close) images of a sequence of images;
Figure 6, including Figures 6a, 6b and 6c, illustrates an example of implementation of the invention in a driving simulation game where two regions of interest allow the real-time tracking of a user's hands, characterizing a steering movement of a vehicle, in a sequence of images; and
Figure 7 illustrates an example of a device adapted to implement the invention.
In general, the invention is intended to track objects in particular regions of images in an image stream, these regions, called regions of interest, comprising part of the objects being followed and part of the scene depicted in the images.
Regions of interest, whose initial position can be predetermined, determined by a user, by an event such as the appearance of a shape or color, or by predefined characteristics, for example using key images, can be characterized by points of interest, i.e. singular points of the image such as points having a strong luminosity gradient. These regions can also be moved according to the movements of the tracked objects or keep a fixed position and orientation depending on the application. The use of several regions of interest allows, for example, multiple interactions, in particular with several objects (or several hands) and several users.
Points of interest are used to track the changes in the regions of interest in an image stream from one image to the next (or to a nearby one), using point-of-interest tracking techniques based, for example, on the algorithm known as FAST for detection and on the KLT algorithm (acronym of Kanade, Lucas and Tomasi) for tracking in the next image. The points of interest of a region of interest may vary over the analyzed images depending on, for example, the deformation of the tracked objects and on their movements, which may mask parts of the scene represented in the images and/or reveal areas previously hidden.
In addition, objects whose movements are likely to create an interaction are tracked in each region of interest using a tracking mechanism of points of interest in masks defined in the regions of interest.
Figures 1 and 2 illustrate the general principle of the invention.
Figure 1, including Figures 1a and 1b, shows two successive images of a stream of images which can be used to determine the movement of objects and user interaction.
As shown in Figure 1a, image 100-1 represents a scene comprising fixed elements (not shown), such as set elements, and moving elements, here characters (real or virtual). Image 100-1 includes here a region of interest 105-1. As previously stated, several regions of interest can be processed simultaneously; however, for the sake of clarity, only one region of interest is represented here, the processing being similar for each of them.
Image 100-2 of Figure 1b represents an image following image 100-1 of Figure 1a in a sequence of images. In image 100-2, it is possible to define a region of interest 105-2, corresponding to the position and dimensions of the region of interest 105-1 defined in the previous image, in which disturbances can be estimated. The region of interest 105-1 is thus compared to the region of interest 105-2 of Figure 1b, for example by subtracting these parts of images, pixel by pixel (pixel being an acronym for PICture ELement in English terminology), in order to extract a map of the pixels considered to be in motion.
Points of interest, generically referenced 110 in Figure 1a, can be determined in image 100-1, particularly in the region of interest 105-1, using standard image analysis algorithms.
The points of interest 110 defined in the region of interest 105-1 are tracked in image 100-2, preferably in the region of interest 105-2, for example using KLT tracking principles, by comparing the portions of images 100-1 and 100-2 associated with the neighborhoods of the points of interest.
These matches, referenced 115, between image 100-1 and image 100-2 allow estimating the movements of the hand, represented with reference 120-1 in image 100-1 and reference 120-2 in image 100-2.
The hand movement can then be advantageously used to move the region of interest 105-2 of image 100-2, yielding the modified region of interest 125, which can be used to estimate the hand movement in an image following image 100-2 in the image stream.
It is noted here that, as previously stated, some points of interest of image 100-1 have disappeared in image 100-2 due, inter alia, to the presence and movements of the hand.
The determination of points of interest in an image is preferably limited to the area corresponding to the region of interest as positioned on the current image, or to an area comprising all or part of it, when a mask of interest characterizing the moving pixels is defined in that region of interest.
Depending on the particular embodiment, information characterizing the relative position of the objects to be tracked (e.g. the hand referenced 120-1 in Figure 1a) in relation to a reference related to the camera from which the images are taken is estimated.
Similarly, it is possible to track the changes in the region of interest 125 defined in image 100-2, relative to the region of interest 105-1 of image 100-1, according to a movement estimated between image 100-2 and the next image in the image stream. For this purpose, a new region of interest is first identified in the next image from the region of interest 125.
Figure 2, including Figures 2a to 2d, illustrates the change in a region of interest of one image compared to the corresponding region of interest, at the same position, of a subsequent image, as described with reference to Figure 1. The resulting image, having the same shape as the region of interest, is made up of pixels that can take two states, with the first state being associated, by default, with each pixel. A second state is associated with pixels corresponding to pixels in the regions of interest whose variation exceeds a predetermined threshold. This second state forms a mask used to limit the search for points of interest to areas located on or near the tracked objects, in order to characterize the motion of those objects and, possibly, trigger particular actions.
Figure 2a represents a region of interest in a first image while Figure 2b represents the corresponding region of interest, at the same position, in a subsequent image. As shown in Figure 2a, the region of interest 200-1 includes a hand 205-1 as well as another object 210-1. Similarly, the corresponding region of interest, referenced 200-2 and shown in Figure 2b, includes the hand and the object, referenced here 205-2 and 210-2, respectively. The hand, generically referenced 205, has moved substantially while the object, generically referenced 210, has moved only slightly.
Figure 2c illustrates the image 215 resulting from the comparison of the regions of interest 200-1 and 200-2. The black part, forming a mask of interest, represents the pixels for which the difference is above a predetermined threshold, while the white part represents the pixels for which the difference is below that threshold. The black part includes in particular the part referenced 220, corresponding to the difference in position of the hand 205 between the regions of interest 200-1 and 200-2.
The image 215 of Figure 2c can be analysed to infer an interaction between a computer system processing the images and the user who has moved his hand in the field of view of the camera providing the images from which the regions of interest 200-1 and 200-2 are extracted.
However, a skeletonization step, which may include the removal of spurious movements such as the one shown by reference 225, is preferably performed before the movement of the points of interest belonging to the mask of interest is analysed.
In addition, the resulting interest mask is advantageously modified to remove parts around points of interest identified recursively between the image from which the region of interest 200-1 is extracted and the image preceding it.
Figure 2d thus illustrates the mask of interest represented in Figure 2c, referenced here 235, from which the parts 240 located around the identified points of interest 245 have been deleted.
The interest mask 235 is thus cut off from areas where points of interest already detected are located and where it is therefore not necessary to detect new ones.
Again, the interest mask 235 can be used to identify points of interest whose movements can be analyzed to trigger a particular action if necessary.
Figure 3 again schematically illustrates the determination of a motion of an object at least partly represented in a region and a mask of interest of two consecutive (or close) images. Image 300 here corresponds to the mask of interest resulting from the comparison of the regions of interest 200-1 and 200-2, as described with reference to Figure 2d. However, a skeletonization step has been performed to remove disturbances (notably the disturbance 225). Thus, image 300 includes a mask 305 that can be used to identify new points of interest whose movements characterize the movement of objects in this region of interest.
For example, consider the point of interest located at the end of the user's index finger. Reference 310-1 designates this point of interest as located in the region of interest 200-1 and reference 310-2 designates it as located in the region of interest 200-2. Thus, using standard point-of-interest tracking techniques, for example an optical flow tracking algorithm, it is possible to find in the region of interest 200-2 the point of interest 310-2 corresponding to the point of interest 310-1 and, therefore, to determine the corresponding translation.
Analysis of the movements of several points of interest, in particular of point 310-1 and previously detected and validated points of interest, e.g. points 245, allows the determination of a set of parameters of the movement of the object being tracked, in particular related to translation, rotation and/or change of scale.
Figure 4 shows schematically some steps implemented in accordance with the invention to identify, on a permanent basis, variations in the position of objects between two consecutive (or close) images of a sequence of images.
The images are acquired here by means of an image sensor such as a camera, in particular a webcam-type camera, connected to a computer system implementing the process described here.
After a current frame 400 has been acquired and if this frame is the first to be processed, i.e. if a previous frame 405 of the same video stream has not been previously processed, a first initialization step (step 410) is performed.
As described above, a region of interest can be defined in relation to a corresponding region of interest determined in a previous image (in the recursive tracking phase, in which case the initialization step 410 is not necessary) or according to predetermined characteristics and/or particular events (corresponding to the initialization phase).
Thus, for example, a region of interest may not be defined in an initial state, the system being waiting for a triggering event, e.g. a particular movement of the user facing the camera (moving pixels in the image are analyzed for a particular movement), the location of a particular color such as skin color or the recognition of a particular predetermined object whose position defines that of the region of interest.
The initialization step 410 can therefore take several forms depending on the object to be followed in the image sequence and depending on the application being implemented.
In a simple case, the initial position of the region of interest is predetermined (determined offline) and the tracking algorithm waits for a disturbance.
The initialization phase may also include a step of recognizing objects of a specific type. For example, detection principles based on Haar wavelet descriptors can be implemented. The principle of these descriptors is described in the paper by Viola and Jones, Rapid object detection using a boosted cascade of simple features, Computer Vision and Pattern Recognition, 2001.
Another approach is to segment the image according to colorimetric properties and to identify certain predefined shapes. When a shape and/or a segmented region of the processed image is similar to the object being searched for, for example a skin tone and a hand contour, the tracking process is initialized as described above.
In a next step (step 415), a region of interest whose characteristics have been previously determined (at initialization or on the previous image) is positioned on the current image to extract the corresponding part of the image.
This part of the image is then compared with the corresponding region of interest in the previous image (step 420), which may include subtracting each pixel from the region of interest in the current image from the corresponding pixel in the corresponding region of interest in the previous image.
The detection of moving points is thus carried out, according to this example, by the absolute difference between parts of the current image and of the previous image. This difference allows a mask of interest to be created that can be used to distinguish a moving object from the mostly static background. However, since the object/background segmentation is not a priori perfect, it is possible to update such a mask of interest recursively, based on the movements, in order to distinguish the movements of pixels belonging to the tracked object from those of pixels belonging to the background of the image.
A thresholding is then preferably applied to the difference between pixels, according to a predetermined threshold value (step 425). Such a threshold can, for example, apply to the luminance; if 8-bit encoding is used, its value is, for example, equal to 100. It allows the pixels whose movement between two consecutive (or close) images is considered sufficiently large to be isolated. The difference between the pixels of the current and previous images is then binary coded (step 430), for example black if the difference exceeds the predetermined threshold, characterizing a movement, and white otherwise.
If points of interest have been previously validated, the mask is modified (step 460) to exclude from the mask areas in which points of interest are recursively tracked. Thus, as represented by the use of dotted lines, step 460 is only performed if there are validated points of interest. As noted earlier, this step involves removing from the created mask areas, e.g. disks of a predetermined diameter, around the previously validated points of interest.
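A minimal sketch of this mask modification, assuming `mask` is the binary mask of interest, `validated_points` a hypothetical list of previously validated points and the disk radius an assumed value:

```python
import cv2

def carve_validated_points(mask, validated_points, radius=8):
    """Remove disks (assumed radius) around already-validated points of
    interest so that new points are not detected where points are already
    being tracked recursively."""
    for (px, py) in validated_points:
        cv2.circle(mask, (int(px), int(py)), radius, 0, -1)  # mark as static
    return mask
```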
Points of interest are then searched for in the region of the previous image corresponding to the interest mask thus defined (step 435), the interest mask here being either the mask created in step 430 or that mask as modified in step 460.
The search for points of interest is, for example, limited to the detection of twenty points of interest.
This search is advantageously done using the algorithm known as FAST, according to which a Bresenham circle with a circumference of sixteen pixels is built around each pixel of the image. If k adjacent pixels (k typically having a value of 9, 10, 11 or 12) contained in this circle all have an intensity greater than that of the central pixel, or all lower, then the central pixel is considered to be a point of interest.
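For illustration, a sketch of such a masked detection limited to the strongest twenty points, using OpenCV's FAST implementation (the detection threshold of 25 is an assumption):

```python
import cv2

def detect_points(gray_image, interest_mask, max_points=20):
    """Detect FAST points of interest, restricted to the mask of interest."""
    fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
    keypoints = fast.detect(gray_image, interest_mask)
    # keep only the strongest responses, limited here to twenty points
    keypoints = sorted(keypoints, key=lambda k: k.response, reverse=True)
    return [k.pt for k in keypoints[:max_points]]
```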
The points of interest detected in the previous image according to the interest mask and, where applicable, the points of interest previously detected and validated are used to identify the corresponding points of interest in the current image.
A search for the corresponding points of interest in the current image is thus performed (step 440), preferably using an optical flow method. The use of this technique allows better robustness when the image is blurry, notably through the use of image pyramids smoothed with a Gaussian filter.
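A possible sketch of this tracking step with OpenCV's pyramidal Lucas-Kanade implementation (window size and number of pyramid levels are assumptions):

```python
import cv2
import numpy as np

def track_points(prev_gray, curr_gray, prev_points):
    """Track points of interest from the previous image into the current one
    using pyramidal Lucas-Kanade optical flow (Gaussian-smoothed pyramids)."""
    p0 = np.float32(prev_points).reshape(-1, 1, 2)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, p0, None,
        winSize=(21, 21), maxLevel=3)     # 3 pyramid levels help with blur
    good = status.ravel() == 1
    return p0[good].reshape(-1, 2), p1[good].reshape(-1, 2)
```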
When the points of interest of the current image corresponding to the points of interest of the previous image (determined by the mask of interest or by recursive tracking) have been identified, the parameters of the movement of the objects tracked in the region of interest of the previous image relative to the region of interest of the current image are estimated (step 445). Such parameters, also called degrees of freedom, include, for example, a translation parameter along the x-axis, a translation parameter along the y-axis, a rotation parameter (θ) and/or a scale parameter (s), the associated transformation moving a set of two-dimensional points from one plane to another. The estimation method is based on a nonlinear least squares error minimization, for example using the Gauss-Newton method, which aims to minimize the reprojection error over all the points of interest. In order to improve the estimation of the model parameters (position and orientation), it is advantageous, in a specific embodiment, to estimate these parameters separately: the translation parameters are calculated during a first iteration, then the change of scale and/or rotation parameters (possibly less precisely) during a second iteration.
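The text describes a Gauss-Newton least-squares estimation; as an illustrative stand-in, OpenCV's robust 4-degree-of-freedom fit recovers the same translation, rotation and scale parameters:

```python
import cv2
import numpy as np

def estimate_similarity(prev_pts, curr_pts):
    """Fit the four degrees of freedom (Tx, Ty, rotation, scale) mapping the
    previous points onto the current ones; OpenCV's robust 4-DOF estimator
    stands in here for the Gauss-Newton minimization described in the text."""
    M, inliers = cv2.estimateAffinePartial2D(
        np.float32(prev_pts), np.float32(curr_pts))
    if M is None:                          # estimation can fail (too few points)
        return None
    tx, ty = M[0, 2], M[1, 2]
    s = np.hypot(M[0, 0], M[1, 0])         # scale factor
    theta = np.arctan2(M[1, 0], M[0, 0])   # rotation around the viewing axis
    return tx, ty, theta, s, M, inliers
```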
In a subsequent step, the points of interest of the previous image for which a corresponding point has been found in the current image are preferably analysed in order to recursively determine the points of interest that are valid with respect to the movement estimated in the previous step. For these purposes, it is checked, for each point of interest previously determined in the previous image (by the mask of interest or by recursive tracking), whether the movement of the corresponding point of interest in the current image conforms to the identified movement. If so, the point of interest is considered valid; otherwise, it is considered invalid. A threshold, typically predetermined and expressed in pixels, allows a margin between the position of the point of interest tracked in the current image and its theoretical position obtained by applying the estimated movement (step 445) to the corresponding point of the previous image.
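A sketch of this validation, assuming the estimated movement is available as a 2x3 similarity matrix M and taking a 3-pixel margin as the (assumed) threshold:

```python
import cv2
import numpy as np

def validate_points(prev_pts, curr_pts, M, pixel_threshold=3.0):
    """Keep the points whose tracked position agrees, within a pixel margin,
    with the position predicted by the estimated movement M."""
    predicted = cv2.transform(np.float32(prev_pts).reshape(-1, 1, 2), M)
    residuals = np.linalg.norm(
        predicted.reshape(-1, 2) - np.float32(curr_pts), axis=1)
    return [tuple(p) for p, r in zip(curr_pts, residuals) if r < pixel_threshold]
```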
The valid points of interest, referenced here 455, are considered to belong to an object whose movement is being tracked, while the invalid points (also called outliers in Anglo-Saxon terminology) are considered to belong to the background of the image or to portions of an object that are no longer visible in the image.
As previously stated, valid points of interest are tracked in the following image and are used to modify the mask of interest created by comparing a region of interest of the current image with the corresponding region of interest of the next image (step 460), in order to exclude from the mask of pixels moving between the current and following images the portions where points of interest are tracked recursively, as described with reference to Figure 2d.
The new region of interest (or modified region of interest), used for processing the current and next images, is then estimated (step 465) from the previously estimated degrees of freedom (step 445). For example, if the degrees of freedom are translations along the x and y axes, the new position of the region of interest is estimated from its previous position using these two pieces of information. If a change of scale (s) is estimated and taken into account in this step, it is also possible, depending on the scenario considered, to change the size of the new region of interest used in the subsequent images of the video stream.
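A minimal sketch of this update of the region of interest from the estimated degrees of freedom (an axis-aligned rectangular region is assumed for simplicity):

```python
def update_region(x, y, w, h, tx, ty, s=1.0):
    """Move an axis-aligned region of interest by the estimated translation
    and, optionally, resize it by the estimated scale factor."""
    cx, cy = x + w / 2.0 + tx, y + h / 2.0 + ty   # translated center
    w2, h2 = w * s, h * s                          # optional change of scale
    return int(cx - w2 / 2), int(cy - h2 / 2), int(w2), int(h2)
```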
At the same time, when the different degrees of freedom are calculated, it is possible to estimate a particular interaction according to these parameters (step 470).
Depending on the particular embodiment, the estimation of a scale change (s) is used to detect the triggering of an action in a similar way to a mouse click. Similarly, it is possible to use orientation changes, particularly around the camera's axis of view (called Roll in English terminology) to, for example, allow the rotation of a virtual element displayed in a scene or control a potentiometer button to, for example, set a sound volume of an application.
This detection of interactions based on a scale factor to detect an action such as a mouse click can, for example, be implemented as follows: the number of images over which the norm of the motion vector (translation) and the scale factor (determined from corresponding regions of interest) remain below certain predetermined values is counted. Such a number characterizes the stability of motion of the tracked objects. If the number of images over which the motion is stable exceeds a certain threshold, the system is placed in a state of waiting for the detection of a click. A click is then detected by measuring the average of the absolute differences of the scale factors between successive images; if this average exceeds a certain threshold over a given number of images, a click is validated.
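This scheme can be sketched as a small state machine; every threshold below is an assumption:

```python
class ClickDetector:
    """Stability-then-scale-change scheme; all thresholds are assumptions."""

    def __init__(self, stable_frames=15, motion_eps=1.0,
                 scale_eps=0.02, click_delta=0.1):
        self.stable_frames = stable_frames  # frames of stability required
        self.motion_eps = motion_eps        # max translation norm (pixels)
        self.scale_eps = scale_eps          # max scale deviation when stable
        self.click_delta = click_delta      # scale jump interpreted as a click
        self.stable_count = 0
        self.armed = False                  # waiting-for-click state

    def update(self, translation_norm, scale_factor):
        """Feed the per-frame motion estimates; returns True on a click."""
        if not self.armed:
            stable = (translation_norm < self.motion_eps
                      and abs(scale_factor - 1.0) < self.scale_eps)
            self.stable_count = self.stable_count + 1 if stable else 0
            if self.stable_count >= self.stable_frames:
                self.armed = True
            return False
        if abs(scale_factor - 1.0) > self.click_delta:
            self.armed, self.stable_count = False, 0
            return True
        return False
```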
When an object is no longer tracked in a sequence of images (either because it disappears from the image or because it is lost), the algorithm returns, preferably, to the initialization step. In addition, a stall leading to the re-run of the initialization step can be identified by measuring a user's movements. Thus, it may be decided to reset the process when these movements are stable or non-existent for a predetermined period or when a tracked object is out of the field of view of the image sensor.
Figure 5 illustrates more precisely some aspects of the invention where four parameters characterize the motion of an object tracked in consecutive (or close) images of a sequence of images. These four parameters are here a translation denoted (Tx, Ty), a rotation denoted θ around the optical axis of the image sensor, and a scale factor denoted s. These four parameters define a similarity, that is the transformation making it possible to transform a point M of a plane into a point M'.
As shown in Figure 5, O is the origin of a reference frame 505 of the object in the previous image and O' is the origin of a reference frame 510 of the object in the current image, obtained according to the object tracking process, the image reference frame here bearing the reference 500. This similarity may be expressed as follows:

XM' = s * (cos(θ) * (XM - X0) - sin(θ) * (YM - Y0)) + X0 + Tx
YM' = s * (sin(θ) * (XM - X0) + cos(θ) * (YM - Y0)) + Y0 + Ty

where (XM, YM) are the coordinates of the point M expressed in the image reference frame, (X0, Y0) are the coordinates of the point O in the image reference frame and (XM', YM') are the coordinates of the point M' in the image reference frame.
The points Ms and Msθ represent the transformation of the point M according to the scale change s and the scale change s combined with the rotation θ, respectively.
As described above, it is possible to use the nonlinear least squares approach to solve this system using all points of interest followed in step 440 described in reference to Figure 4.
To calculate the new position of the object in the current image (step 465 of Figure 4), it is theoretically sufficient to apply the estimated translation (Tx, Ty) to the previous position of the object as follows:

X0' = X0 + Tx
Y0' = Y0 + Ty

where (X0', Y0') are the coordinates of the point O' in the image reference frame.
Advantageously, the partial derivatives associated with each point considered, i.e. the movements associated with each of these points, are weighted according to the associated movement, so that the points of interest that move the most have a greater importance in the estimation of the parameters; this prevents points of interest belonging to the background from disturbing the tracking of objects.
Thus, it has been observed that it is advantageous to add an influence of the centre of gravity of the points of interest tracked in the current image to the previous equation. This centre of gravity corresponds approximately to the local centre of gravity of the motion (the points tracked in the current image come from moving points in the previous image). The centre of the region of interest thus tends to shift towards the centre of motion as long as the distance of the object from the centre of gravity is greater than the estimated translational motion. The position of the origin may then, for example, be computed as

X0' = WT * (X0 + Tx) + WGC * XGC
Y0' = WT * (Y0 + Ty) + WGC * YGC

where (XGC, YGC) represents the centre of gravity of the points of interest in the current image, WGC the weight given to the influence of the centre of gravity and WT the weight given to the influence of the translation.
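Under the weighted formulation reconstructed above, the update of the origin of the region of interest can be sketched as follows (the weight values are assumptions):

```python
def update_origin(x0, y0, tx, ty, xgc, ygc, w_t=0.8, w_gc=0.2):
    """Blend the translated origin with the centre of gravity of the tracked
    points; the weights WT=0.8 and WGC=0.2 are assumed values."""
    return (w_t * (x0 + tx) + w_gc * xgc,
            w_t * (y0 + ty) + w_gc * ygc)
```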
Figure 6, including Figures 6a, 6b and 6c, illustrates an example of the application of the invention in a driving simulation game where two regions of interest allow the real-time tracking of a user's hands, characterizing a steering wheel movement of a vehicle, in a sequence of images.
More specifically, Figure 6a presents the context of the game, in an imaged manner, while Figure 6b represents the visual of the game as perceived by a user. Figure 6c illustrates the estimation of the motion parameters, or degrees of freedom, of objects followed to infer a motion from a steering wheel of a vehicle.
Figure 6a contains an image 600 taken from the image sequence from the image sensor used. The image 600 is placed in front of the user, as if it were fixed to the windscreen of the vehicle being driven by the user. This image 600 contains an area 605 comprising two circular regions of interest 610 and 615 associated with a steering wheel 620 superimposed by a synthesis image.
The initial position of regions 610 and 615 is fixed on a predetermined horizontal line, equidistant on either side, from a point representing the centre of the steering wheel, waiting for disturbance. When the user positions his hands in these two regions, he is then able to turn the steering wheel either to the left or to the right.
The radius of the circle corresponding to the steering wheel 620 may also vary when the user moves his hands closer or further away from the centre of this circle.
These two degrees of freedom are then used to control the orientation of a vehicle (hands on the circle corresponding to the steering wheel 620) and its speed (scale factor related to the position of the hands relative to the centre of the circle corresponding to the steering wheel 620).
Figure 6b, illustrating the application's visual 625, includes the portion of image 605 extracted from image 600, which allows the user to observe and control his movements.
The regions 610 and 615 of image 600 allow the control of the movements of the steering wheel 620, i.e. the direction of the vehicle referenced 630 on the visual 625 as well as its speed relative to the elements 635 of the set, the vehicle 630 and the elements 635 of the set here being created as synthesis images.
Figure 6c describes more precisely the estimation of the freedom parameters related to each of the regions of interest and the deduction of the degrees of freedom of the steering wheel.
In order to analyze the components of the movement, several reference frames are defined. The frame Ow here corresponds to a global reference frame (world frame), the frame Owh is a local frame attached to the steering wheel 620 and the frames Oa1 and Oa2 are local frames attached to the regions of interest 610 and 615. The vectors Va1 (Xva1, Yva1) and Va2 (Xva2, Yva2) are the displacement vectors resulting from the analysis of the movement of the user's hands in the regions of interest 610 and 615, expressed in the frames Oa1 and Oa2, respectively.
The new steering wheel orientation θ' is calculated from its previous orientation θ and from the movement of the user's hands (determined by means of the two regions of interest 610 and 615), for example as

θ' = θ + (Δθ1 + Δθ2) / 2

where Δθ1 and Δθ2 represent the rotations induced by each of the user's hands. Δθ1 can be calculated by the following relation:

Δθ1 = atan2(Yva1wh, D/2)

with Yva1wh = Xva1 * sin(-(θ + π)) + Yva1 * cos(-(θ + π))

which characterizes the translation along the Y axis expressed in the Owh frame. Δθ2 can be calculated similarly.
Similarly, the new steering wheel diameter D' is calculated from its previous diameter D and from the movement of the user's hands (determined by means of the two regions of interest 610 and 615), for example as

D' = D + Xva1wh + Xva2wh

where Xva1wh = Xva1 * cos(-(θ + π)) - Yva1 * sin(-(θ + π)) and Xva2wh = Xva2 * cos(-θ) - Yva2 * sin(-θ).
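Under the relations above, the rotation increment contributed by one hand can be sketched as follows (the frame-rotation angles follow the expressions given for Yva1wh and Xva2wh):

```python
import math

def hand_rotation(xva, yva, theta, diameter, left_hand=True):
    """Rotation increment contributed by one hand: express the hand's
    displacement in the wheel frame Owh, then take atan2 against the
    wheel radius, following the relations given above."""
    phi = -(theta + math.pi) if left_hand else -theta   # frame rotation
    y_wh = xva * math.sin(phi) + yva * math.cos(phi)    # Y in the Owh frame
    return math.atan2(y_wh, diameter / 2.0)

# Combining both hands as reconstructed above:
# theta_new = theta + (hand_rotation(xva1, yva1, theta, D, True)
#                      + hand_rotation(xva2, yva2, theta, D, False)) / 2
```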
Thus, by knowing the steering wheel's angular position and diameter, the game scenario can calculate a corresponding synthesis image.
Figure 7 shows an example of a device that can be used to identify the motion of objects represented in images from a camera and trigger specific actions based on the identified movements.
The device 700 comprises a communication bus to which are connected, in particular, a central processing unit 704, a non-volatile memory 706, a random access memory 708, a hard disk 720 and a communication interface 726 connected to a communication network 728.
The communication bus allows communication and interoperability between the various elements included in or connected to the device 700. The representation of the bus is not limited and, in particular, the central unit is capable of communicating instructions to any element of the device 700 directly or through another element of the device 700.
The executable code of each program enabling the programmable device to implement the processes of the invention may be stored, for example, in hard disk 720 or in memory 706.
In one variant, the executable code of the programs can be received via the communication network 728 via the interface 726 to be stored in the same way as described above.
More generally, the program(s) may be loaded into one of the storage means of the device 700 before being executed.
The central unit 704 will command and direct the execution of the instructions or portions of software code of the program(s) of the invention, instructions which are stored on the hard disk 720 or in the memory 706 or in the other aforementioned storage elements. When powered on, the program(s) stored in non-volatile memory, e.g. the hard disk 720 or the memory 706, are transferred to the memory 708, which then contains the executable code of the program(s) of the invention, as well as registers for storing the variables and parameters necessary for the implementation of the invention.
It should be noted that the communication device incorporating the device of the invention may also be a programmed device, which contains the code of the computer program(s), for example frozen in an application-specific integrated circuit (ASIC).
Naturally, to meet specific needs, a person skilled in the field of the invention may apply modifications to the preceding description.
Claims (15)
- A computer method for detecting interactions with a software application according to a movement of at least one object situated in the field of an image sensor connected to a computer implementing the method, said image sensor providing a stream of images to said computer, the method being characterized in that it comprises the following steps:
- receiving at least one first image from said image sensor;
- identifying at least one first region of interest in said first image, said at least one first region of interest corresponding to a part of said at least one first image;
- receiving at least one second image from said image sensor;
- identifying at least one second region of interest of said at least one second image, said at least one second region of interest corresponding to said at least one first region of interest of said at least one first image;
- comparing (440) said at least one first and second regions of interest and determining a mask of interest characterizing a variation of at least one feature of corresponding points in said at least one first and second regions of interest;
- determining (445) a movement of said at least one object from said mask of interest, said at least one object being at least partially represented in at least one of said at least one first and second regions of interest; and
- analyzing (470) said movement and, in response to said analyzing step, triggering or not triggering a predetermined action.
- A method according to claim 1, wherein said step (445) of determining a movement comprises a step of determining and matching at least one pair of points of interest in said at least one first and second images, at least one point of said at least one pair of points of interest belonging to said mask of interest.
- A method according to claim 2, wherein said step (445) of determining a movement comprises a step of determining and matching a plurality of pairs of points of interest in said at least one first and second images, at least one point of each of said pairs of points of interest belonging to said mask of interest, said movement being estimated on the basis of a transformation of a first set of points of interest into a second set of points of interest, the points of interest of said first and second sets belonging to said plurality of pairs of points of interest, the points of interest of said first set of points of interest furthermore belonging to said at least one first image and the points of interest of said second set of points of interest furthermore belonging to said at least one second image.
- A method according to claim 3, wherein said transformation implements a weighting function based on a distance between two points of interest from the same pairs of points of interest of said plurality of pairs of points of interest.
- A method according to claim 3 or claim 4, further comprising a step of validating at least one point of interest of said at least one first image, belonging to said at least one pair of points of interest, according to said determined movement, said at least one validated point of interest being used to track said object in at least one third image following said at least one second image and said at least one validated point of interest being used for modifying a mask of interest created on the basis of said at least one second and third images.
- A method according to any one of the preceding claims, wherein said step of comparing said at least one first and second regions of interest comprises a step of performing subtraction, point by point, of values of corresponding points of said at least one first and second regions of interest and a step of comparing a result of said subtraction to a predetermined threshold.
- A method according to any one of the preceding claims, further comprising a step of detecting at least one predetermined feature in said at least one first image, said at least one first region of interest being at least partially identified in response to said detecting step.
- A method according to claim 7, wherein said at least one predetermined feature is a predetermined shape and/or a predetermined color.
- A method according to any one of the preceding claims, further comprising a step of estimating at least one modified second region of interest in said at least one second image, said at least one modified second region of interest of said at least one second image being estimated according to said at least one first region of interest of said at least one first image and of said at least one second region of interest of said at least one second image.
- A method according to claim 9, wherein said estimation of said at least one modified second region of interest of said at least one second image implements an object tracking algorithm of KLT type.
- A method according to any one of the preceding claims, wherein said movement is characterized by a translation, a rotation and/or a scale factor.
- A method according to claims 1 to 10, wherein said movement is characterized by a scale factor, whether or not said predetermined action is triggered being determined on the basis of said scale factor.
- A method according to any one of the preceding claims, wherein the movements of at least two objects situated in the field of said image sensor are determined, whether or not said predetermined action is triggered being determined according to a combination of the movements associated with said at least two objects.
- A computer program comprising instructions adapted for the carrying out of each of the steps of the method according to any one of the preceding claims when said program is executed on a computer.
- A device comprising means adapted for the implementation of each of the steps of the method according to any one of claims 1 to 13.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FR1059541 | 2010-11-19 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1169735A HK1169735A (en) | 2013-02-01 |
| HK1169735B true HK1169735B (en) | 2018-04-13 |