
HK1190218A - Self learning face recognition using depth based tracking for database generation and update - Google Patents


Info

Publication number: HK1190218A
Application number: HK14103206.8A
Authority: HK (Hong Kong)
Prior art keywords: person, facial, frame, environment, frames
Other languages: Chinese (zh)
Inventors: Harshavardhana Narayana Kikkeri, Michael F. Koenig, Jeffrey Cole
Original Assignee: Microsoft Technology Licensing, LLC
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1190218A


Description

Self-learning facial recognition with depth-based tracking to generate and update databases
Background
The problem of identifying persons based on the appearance of their faces as depicted in images has been studied for many years. Face recognition systems and processes essentially work by comparing some type of model of a human face to an image or characterization of a human face extracted from an input image. These face models are typically obtained by training the face recognition system using images of (or characterizations of) a person's face. Therefore, a database of training face images or characterizations is typically required to train a face recognition system.
Disclosure of Invention
Face recognition training database generation technique embodiments described herein generally involve collecting characterizations of a person's face captured over time, as the person moves through an environment, to create a training database of facial characterizations of the person. In one embodiment, a computer-implemented process is employed to generate a facial recognition training database for each person detected in an environment. Processing begins with inputting a sequence of simultaneously captured frame pairs. Each frame pair includes a frame output from a color camera and a frame output from a depth camera. Next, potential people in the environment are detected using a face detection method and the color camera frames. In addition, a motion detection method and the depth camera frames are used to detect potential people in the environment.
The locations of one or more persons in the environment are determined using the detection results generated via the aforementioned face detection method and motion detection method. The detection results generated via the face detection method also include, for each potential person detected, a facial characterization of the portion of the color camera frame depicting the person's face. For each person detected only via the motion detection method, the processing further includes identifying the corresponding location of the person in a frame captured simultaneously by the color camera and generating a facial characterization of the portion of that color camera frame depicting the person's face.
For each person detected in the environment, each facial characterization generated for that person is assigned to an unknown personal identifier established specifically for that person, and the facial characterizations are stored in a memory associated with the computer used to implement the processing. An attempt is then made to confirm the identity of each person. If the attempt for a person is successful, each facial characterization assigned to the unknown personal identifier established for that person is reassigned to a facial recognition training database established for that person.
It should be noted that the summary above is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Drawings
The specific features, aspects, and advantages of the disclosure will become better understood with regard to the following description, appended claims, and accompanying drawings where:
FIGS. 1A-1B are flow diagrams generally outlining one embodiment of a computer-implemented process for generating a facial recognition training database for each person detected in an environment.
FIGS. 2A-2E are flow diagrams generally outlining one embodiment of a computer-implemented process for generating or supplementing a facial recognition training database for each person detected in the environment based on a new sequence of simultaneously captured frame pairs.
FIG. 3 is a flow diagram outlining one embodiment of a computer-implemented process for discarding the facial characterizations assigned to an unknown personal identifier whenever the person is not identified after more than a prescribed number of identification attempts.
FIG. 4 is a flow diagram outlining one embodiment of a computer-implemented process for capturing a zoomed image of a person located in the environment at a distance exceeding a prescribed maximum distance from the color camera.
FIGS. 5A-5C are flow diagrams generally outlining one embodiment of a computer-implemented process for generating or supplementing a facial recognition training database for each person detected in an environment based on a sequence of simultaneously captured frame pairs output by an additional color camera and depth camera pair capturing the same scene from a different viewpoint.
FIGS. 6A-6F are flow diagrams generally outlining one embodiment of a computer-implemented process for generating or supplementing a facial recognition training database for each person detected in an environment based on a sequence of simultaneously captured frame pairs output by an additional color camera and depth camera pair capturing a different scene within the environment.
FIGS. 7A-7D are flow diagrams generally outlining one embodiment of a computer-implemented motion detection process used by embodiments of the facial recognition training database generation techniques described herein.
FIG. 8 is a simplified component diagram of a suitable mobile robotic device in which embodiments of the facial recognition training database generation techniques described herein may be implemented.
FIG. 9 is a diagram depicting a general purpose computing device constituting an exemplary system for implementing embodiments of the facial recognition training database generation techniques described herein.
Detailed Description
In the following description of embodiments of the face recognition training database generation technique, reference is made to the accompanying drawings which form a part hereof, and in which is shown by way of illustration specific embodiments in which the technique may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present technology.
It is also noted that, for the sake of clarity, specific terminology will be used to describe the invention; this is not intended to limit the invention to the specific terms selected. Moreover, it is to be understood that each specific term includes all technical equivalents thereof that operate in a broadly similar manner to accomplish a similar purpose. Reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, the order of process flow representing one or more embodiments of the invention does not inherently indicate any particular order or imply any limitation of the invention.
1.0 Training database generation for face recognition systems
Facial recognition training database generation technique embodiments described herein generally involve collecting characterizations of a person's face captured over time, as the person moves through an environment, to create a training database of facial characterizations of the person. Because the facial characterizations are captured over time, they will represent the person's face viewed from different angles and distances, at different resolutions, and under different environmental conditions (e.g., lighting and blur conditions). Furthermore, over a longer period of time in which facial characterizations of a person are collected periodically, these characterizations can represent the evolution of the person's appearance. For example, a person may gain or lose weight; grow or remove facial hair; change hairstyle; wear different hats; and so on. Thus, the resulting training database may be built and populated even before training begins, and added to over time to capture the above-described changes in the person's facial pose and appearance. This results in a rich training resource for face recognition systems. In addition, since a person's face recognition training database can be built before the face recognition system needs it, training will be faster once the database is employed. Further, embodiments of the facial recognition training database generation techniques described herein may generate training databases for multiple persons found in an environment. In addition, existing databases may be updated with incremental face changes. This allows a person's facial changes to be captured gradually enough that the person can be identified even when the person's features change greatly over a period of time. For example, if a person is growing a beard, their facial features will change slowly. However, since the daily changes are small, a new face with a partial beard can be added to the database each day. In this way, when the person's beard has grown out completely, he can still be recognized without the new face ever having been used for manual training. The same principle applies to any gradual change due to age, weight, and the like.
It is noted that the term "environment" as used throughout this disclosure should be broadly interpreted as any external surroundings of a person. This includes indoor settings, outdoor settings, or a combination of both.
1.1 Process for generating a facial recognition training database
Referring to FIGS. 1A-1B, one embodiment of a computer-implemented process for generating a facial recognition training database for each person detected as being located in an environment is presented. Processing begins with the input of a sequence of simultaneously captured frame pairs (process action 100). Each frame pair includes a frame output from a color camera and a frame output from a depth camera. The cameras are synchronized such that both cameras capture an image of the scene at the same time. Thus, each time the scene is captured, a simultaneous pair of color and depth frames is generated. Next, potential people in the environment are detected using a face detection method and the color camera frames (process action 102). It is noted that any suitable face detection method employing color video frames may be used to accomplish this task. In addition, potential people in the environment are detected using a motion detection method and the depth camera frames (process action 104). It is noted that any suitable motion detection method employing depth video frames may be used to accomplish this task. In one implementation (as shown in FIG. 1A), process actions 102 and 104 are completed at approximately the same time.
The detection results generated via the aforementioned face detection method and motion detection method are used to determine the locations of one or more persons in the environment (process action 106). The detection results generated via the face detection method also include, for each potential person detected, a facial characterization of the portion of the color camera frame depicting the person's face. The type of facial characterization is specific to the particular face detection method employed and is compatible with the aforementioned face recognition system that will use the generated training database. Next, each person detected only via the motion detection method is identified (process action 108), and the corresponding position of each identified person is looked up in the simultaneously captured frames of the color camera (process action 110). Additionally, a facial characterization of the relevant portion of the color camera frame is generated for each identified person (process action 112).
Processing continues with the selection of a previously unselected person from among the persons detected in the environment (process action 114). Each facial characterization generated for the selected person is assigned to an unknown personal identifier that is established specifically for that person (process action 116) and is stored in a memory associated with the computer used to implement the processing (process action 118). The aforementioned computer may be, for example, one of the computers described in the exemplary operating environment section of this disclosure.
It is noted that, by this point in the process, facial characterizations have been assigned to unknown personal identifiers. In this way, facial characterizations are created and saved even though the identities of the detected persons are still unknown. Then, if the identity of a detected person is ultimately established, the saved facial characterizations can be reassigned to the facial recognition training database established for that person. To do so, processing continues with an attempt to confirm the identity of the person (process action 120). This identification action is accomplished using any suitable conventional method, including inviting the unknown person to interact with the computer to provide identifying information. It is next determined whether the attempt was successful (process action 122). If the attempt is successful, each facial characterization assigned to the unknown personal identifier established for the selected person is reassigned to the facial recognition training database established for that person (process action 124). Regardless of whether the attempt of process action 120 was successful, it is next determined whether all detected persons have been selected (process action 126). If not, process actions 114 through 126 are repeated until all detected persons have been selected and considered. At that point, the process ends.
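To make the flow of process actions 100 through 126 concrete, the following is a minimal Python sketch of one pass over a frame-pair sequence. Every helper callable (detect_faces, detect_motion, map_depth_to_color, characterize_face, track_person, try_identify) is a hypothetical stand-in for the face detection, motion detection, tracking, and identification methods the process deliberately leaves open:

```python
from collections import defaultdict

def generate_training_data(frame_pairs, detect_faces, detect_motion,
                           map_depth_to_color, characterize_face,
                           track_person, try_identify):
    """Sketch of process actions 100 through 126 (FIGS. 1A-1B).

    All helper callables are hypothetical stand-ins for methods the
    process leaves open.  Returns (training_db, unknown_store).
    """
    unknown_store = defaultdict(list)  # unknown personal identifier -> characterizations
    training_db = {}                   # confirmed identity -> training characterizations

    for color_frame, depth_frame in frame_pairs:               # action 100
        persons = dict(detect_faces(color_frame))              # action 102: {location: characterization}
        for loc in detect_motion(depth_frame):                 # action 104
            if loc not in persons:                             # detected only via motion
                region = map_depth_to_color(loc, color_frame)  # action 110
                persons[loc] = characterize_face(region)       # action 112
        for loc, characterization in persons.items():          # actions 114-118
            unknown_id = track_person(loc)  # tracking keeps one identifier per person
            unknown_store[unknown_id].append(characterization)

    for unknown_id, characterizations in list(unknown_store.items()):
        identity = try_identify(characterizations)             # action 120
        if identity is not None:                               # actions 122-124
            training_db.setdefault(identity, []).extend(characterizations)
            del unknown_store[unknown_id]                      # reassigned to the database

    return training_db, unknown_store
```

In practice, the location match between the two detection methods would rely on the depth-to-color pixel correspondence described in section 2.0 rather than the exact equality test used in this simplified sketch.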
1.2 Sequence of successively captured frame pairs
To prevent a situation in which a person detected in a new sequence of successively captured frame pairs is associated with a new unknown personal identifier even though such an identifier was previously issued to that same person, the location of each person detected in the foregoing process is tracked over time. Any suitable conventional tracking method may be employed for this purpose. Thus, when analyzing future sequences of frame pairs, it is known whether a detected person was previously detected and associated with an unknown personal identifier or with a facial recognition training database. In this way, facial characterizations created for a person can be assigned to the appropriate set, and no new unknown personal identifier needs to be established.
Given the foregoing, there are a number of possibilities as to how a person detected in a sequence of successively captured frame pairs will be handled. For example, if a detected person was previously detected and tracked, any facial characterization created from the new sequence will be assigned either to the person's existing unknown personal identifier, if the person has not yet been recognized, or to the person's facial recognition training database, if the person was previously recognized. On the other hand, if the detected person is new to the scene, an unknown personal identifier will be created and assigned to the generated facial characterizations. In addition, whenever a facial characterization is assigned to an unknown personal identifier (whether an existing or a new identifier), an attempt will be made to identify the person.
More specifically, referring to FIGS. 2A-2E, in one embodiment, when a new sequence of simultaneously captured frame pairs becomes available, the new sequence is input (process action 200). The new sequence of frame pairs is then used to perform process actions 102 through 112 of FIGS. 1A-1B.
Processing then continues with the selection of one of the people detected in the environment using the new sequence of frame pairs (process action 202). It is then determined whether the selected person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence (process action 204). As indicated previously, in one embodiment, this is done by tracking the locations of previously detected people over time. If it is determined that the person corresponds to such a previously detected person, a determination is next made as to whether the identity of the person was previously confirmed (process action 206). If the identity of the person was previously confirmed, then a previously unselected one of the facial characterizations generated for the person from the new sequence of simultaneously captured frame pairs is selected (process action 208). Note that the facial characterizations are generated as previously described. A determination is made as to whether the selected facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person (process action 210). If it differs by the prescribed degree, the selected facial characterization is assigned to the facial recognition training database established for the selected person (process action 212) and stored in the memory associated with the computer (process action 214). Otherwise, the selected facial characterization is discarded (process action 216). In either case, it is then determined whether all of the facial characterizations created for the selected person from the new frame pair sequence have been selected (process action 218). If not, process actions 208 through 218 are repeated until all facial characterizations have been selected and considered.
If, however, it is determined in process action 206 that the identity of the selected person was not previously confirmed, then a previously unselected one of the facial characterizations generated for that person from the new sequence of simultaneously captured frame pairs is selected (process action 220). It is then determined whether the selected facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person (process action 222). If it differs by the prescribed degree, the selected facial characterization is assigned to the unknown personal identifier established for the selected person (process action 224) and stored in the memory associated with the computer (process action 226). Otherwise, the selected facial characterization is discarded (process action 228). In either case, it is then determined whether all of the facial characterizations created for the selected person from the new frame pair sequence have been selected (process action 230). If not, process actions 220 through 230 are repeated until all facial characterizations have been selected and considered. Processing then continues with an attempt to confirm the identity of the person (process action 232). As before, this identification action is accomplished using any suitable conventional method, including inviting the unknown person to interact with the computer to provide identifying information. It is next determined whether the attempt was successful (process action 234). If the attempt is successful, each facial characterization assigned to the unknown personal identifier established for the selected person is reassigned to the facial recognition training database established for that person (process action 236).
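The "differs by a prescribed degree" test of process actions 210 and 222 keeps near-duplicate characterizations from bloating the stored sets. A minimal sketch follows, assuming characterizations are fixed-length numeric feature vectors compared by Euclidean distance; the process specifies neither the representation nor the difference measure, and min_distance is an illustrative value:

```python
import math

def differs_enough(candidate, existing, min_distance=0.35):
    """Return True if the candidate facial characterization differs from
    every stored one by at least min_distance (process actions 210/222).
    Vectors and the Euclidean measure are assumptions for illustration."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return all(dist(candidate, stored) >= min_distance for stored in existing)

def add_if_novel(candidate, store):
    if differs_enough(candidate, store):
        store.append(candidate)  # actions 212/224: assign and store
        return True
    return False                 # actions 216/228: discard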
The following possibility also exists: the selected person is new to the environment or was not detected in the past. To this end, if it is determined in process action 204 that the selected person does not correspond to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence, then each facial characterization generated for the selected person is assigned to an unknown personal identifier specifically established for that person (process action 238), and the facial characterizations are stored in a memory associated with the computer used to implement the processing (process action 240). Next, an attempt is made to confirm the identity of the person (process action 242). A determination is then made as to whether the attempt was successful (process action 244). If the attempt is successful, each facial characterization assigned to the unknown personal identifier established for the selected person is reassigned to the facial recognition training database established for that person (process action 246).
Once the currently selected person is considered as outlined above, a determination is made as to whether all of the persons detected in the environment using the new frame pair sequence have been selected (process action 248). If not, process actions 202 through 248 are repeated until all detected persons have been selected and considered. At this point, the current iteration of processing ends. However, the process may be repeated the next time a new sequence of simultaneously captured frame pairs becomes available.
Face recognition methods typically use facial characterizations, such as those previously described, in identifying a person from an image of the person's face. With regard to the foregoing process actions for attempting to confirm the identity of a person, it is noted that the facial characterizations generated for the person and assigned to the person's unknown personal identifier may be employed in the attempt.
1.2.1 Persons who cannot be identified
There is also the possibility in the foregoing process that a detected person's identity will never be confirmed. To conserve memory space, in one embodiment as outlined in FIG. 3, if the identity of the selected person is not confirmed in any of process actions 122, 234, or 244, the number of times a sequence of simultaneously captured frame pairs has been input and processed without confirming the identity of the person is recorded (process action 300). It is then determined whether the recorded number exceeds a prescribed maximum number (e.g., 100) (process action 302). If not, the process outlined above continues as is, and the memory-saving process ends. If, however, the recorded number exceeds the prescribed maximum number, then each facial characterization assigned to the unknown personal identifier established for the selected person is deleted from the computer's memory (process action 304).
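A minimal sketch of this memory-saving rule (process actions 300 through 304) follows, assuming a dictionary of characterization lists keyed by unknown personal identifier; the maximum of 100 attempts is the example value given above:

```python
def prune_unidentified(unknown_store, attempt_counts, unknown_id, max_attempts=100):
    """Count failed identification passes for an unknown personal
    identifier and delete its characterizations once the count exceeds
    max_attempts (process actions 300-304)."""
    attempt_counts[unknown_id] = attempt_counts.get(unknown_id, 0) + 1
    if attempt_counts[unknown_id] > max_attempts:
        del unknown_store[unknown_id]   # action 304: free the memory
        del attempt_counts[unknown_id]
```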
1.2.2 Zoom scheme
It is noted that many face recognition methods that may be employed in connection with the face recognition training database generation technique embodiments described herein will often fail to identify persons detected in the environment but located at a greater distance from the camera. Although not necessarily so, the foregoing situation can occur when a person is detected via the motion detection method only. This situation can be handled using a zoom scheme. The zoom scheme is completed before each facial characterization generated for a person is assigned to an unknown personal identifier established for that person. More specifically, referring to FIG. 4, in one embodiment, a previously unselected person detected (by any of the aforementioned methods) as being present in the environment is selected (process action 400). It is then determined whether the selected person is located in the environment at a distance exceeding a prescribed maximum distance (e.g., 3 meters) from the color camera (process action 402). If so, the position of the selected person is provided to a controller that controls a color camera with zoom capability (process action 404). The controller causes the color camera to zoom in on the face of the selected person to a degree proportional to the distance from the color camera to the person, and then captures a zoomed image of the person's face. It is noted that this color camera may be the aforementioned color camera or a separate camera positioned to capture images of the environment. The degree of zoom is calculated such that, given the distance from the camera to the selected person, the resulting image will depict the person's face at a resolution that facilitates face recognition. The zoomed image is then input (process action 406), and a facial characterization of the portion of the zoomed image depicting the person's face is generated (process action 408). This facial characterization is then assigned, along with all the other facial characterizations generated for the selected person, to the unknown personal identifier established for the person.
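Beyond being proportional to the camera-to-person distance, the degree of zoom is left open above. The following sketch illustrates one way to compute it, under the assumption that the apparent face size falls off roughly linearly with distance (which makes the required zoom grow in proportion to distance, matching the rule above). The 3-meter threshold is the example value given in the text; the pixel counts are assumptions:

```python
def zoom_factor(distance_m, max_distance_m=3.0, face_px_needed=100.0,
                face_px_at_max=40.0):
    """Decide whether to zoom and by how much (process actions 402-404).

    Illustrative model only: assume a face at max_distance_m spans
    face_px_at_max pixels without zoom, and that the recognizer needs
    roughly face_px_needed pixels across the face.
    """
    if distance_m <= max_distance_m:
        return 1.0  # close enough; no zoom needed
    unzoomed_px = face_px_at_max * (max_distance_m / distance_m)
    return face_px_needed / unzoomed_px

# e.g., a person 9 m away: zoom_factor(9.0) -> 7.5x zoom
```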
1.3 Additional color camera and depth camera
The environment in which embodiments of the facial recognition training database generation techniques described herein operate may be quite large. Thus, in one embodiment, more than one pair of color and depth cameras is employed to cover the environment. Given that more than one pair of cameras is available in an environment, the pairs may be configured to capture the same scene but from different points of view. This configuration allows more facial characterizations to be generated in the same time period for the same person detected by different camera pairs, or for a person whom one camera pair cannot "see" but another camera pair can. In this regard, it is advantageous for each camera pair to know the location of a person in the scene so that it can easily be determined whether that person is the same person or a different person from one detected using the other camera pairs. In one embodiment, this is accomplished by configuring the camera pairs to capture their frame pairs substantially simultaneously. In this way, the position of a person calculated by one pair of cameras will match the position of a person calculated by another pair of cameras if it is the same person, and will not match if it is a different person.
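A minimal sketch of the position-matching test implied here, assuming both camera pairs report person locations in a shared world coordinate frame and capture frames substantially simultaneously; the tolerance value is an assumption:

```python
def same_person_across_pairs(pos_a, pos_b, tolerance_m=0.3):
    """Treat two detections from different camera pairs, captured at
    substantially the same instant, as the same person when their 3-D
    positions in a shared world coordinate frame agree within
    tolerance_m (an assumed value absorbing calibration and timing
    error)."""
    dx, dy, dz = (a - b for a, b in zip(pos_a, pos_b))
    return (dx * dx + dy * dy + dz * dz) ** 0.5 <= tolerance_m
```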
1.3.1 Capturing the same scene but from different viewpoints
More specifically, referring to FIGS. 5A-5C, for each additional color camera and depth camera pair that captures the same scene from a different viewpoint, an additional sequence of simultaneously captured frame pairs is input (process action 500). Next, potential persons in the environment are detected using a face detection method and the color camera frames output by the color camera of the additional camera pair (process action 502). In addition, potential people in the environment are detected using a motion detection method and the depth camera frames output by the depth camera of the additional camera pair (process action 504). The locations of one or more persons in the environment are determined using the detection results generated via the aforementioned face detection method and motion detection method (process action 506). The detection results generated via the face detection method also include, for each potential person detected, a facial characterization of the portion of the color camera frame depicting the person's face.
Next, each person detected only via the motion detection method is identified (process action 508), and the corresponding position of each identified person is looked up in the frames captured simultaneously by the color camera of the additional camera pair (process action 510). Additionally, a facial characterization of the portion of the color camera frame depicting the person's face is generated for each identified person (process action 512).
Processing continues with the selection of a previously unselected one of the persons detected in the environment based on the frame pairs output from the additional color camera and depth camera pair (process action 514). Then, based on the determined location of the person, it is determined whether the person has also been detected using the other color camera and depth camera pair (process action 516). If so, each facial characterization generated for the selected person based on the frame pairs output from the additional color camera and depth camera pair is assigned to the unknown personal identifier established for that person based on the person's detection using the other color camera and depth camera pair (process action 518). Otherwise, each facial characterization generated for the selected person based on the frame pairs output from the additional color camera and depth camera pair is assigned to a newly established unknown personal identifier for that person (process action 520). In either case, each facial characterization generated for the selected person based on the frame pairs output from the additional color camera and depth camera pair is stored in a memory associated with the computer (process action 522). In addition, an attempt is made to confirm the identity of the person (process action 524). A determination is then made as to whether the attempt was successful (process action 526). If the attempt is successful, each facial characterization assigned to the unknown personal identifier established for the selected person is reassigned to the facial recognition training database established for that person (process action 528). Regardless of whether the attempt of process action 524 was successful, it is next determined whether all detected persons have been selected (process action 530). If not, process actions 514 through 530 are repeated until all detected persons have been selected and considered. The process ends at this point, but can be repeated each time a new sequence of simultaneously captured frame pairs is input from another color camera and depth camera pair.
1.3.2 Capturing different scenes
It may also be the case that more than one pair of cameras is available in the environment and the pairs are configured to capture different scenes. This configuration is useful in situations where one pair of cameras cannot cover the entire environment. In view of this, a person detected in one scene covered by a pair of cameras can be tracked, and if the person moves into the portion of the environment covered by another pair of cameras, knowledge of the person's location as he or she leaves one scene for the other can be used to confirm that the person detected in the new scene is the same person detected in the previous scene. In addition, if feasible, a face recognition method or some other method of distinguishing people may be employed to confirm whether a person detected in a new scene is the same person detected in a previous scene. This facilitates assigning the facial characterizations generated for a person in the new section of the environment to the correct unknown personal identifier (or to the correct facial recognition training database, if the person was previously recognized).
More specifically, referring to FIGS. 6A-6F, given an additional color camera and depth camera pair capturing a different scene within the environment, an additional sequence of simultaneously captured frame pairs is input (process action 600). Next, potential persons in the environment are detected using a face detection method and the color camera frames output by the color camera of the additional camera pair (process action 602). In addition, potential people in the environment are detected using a motion detection method and the depth camera frames output by the depth camera of the additional camera pair (process action 604). The locations of one or more persons in the environment are determined using the detection results generated via the aforementioned face detection method and motion detection method (process action 606). The detection results generated via the face detection method also include, for each detected potential person, a facial characterization of the portion of the color camera frame depicting the person's face.
Next, each person detected only via the motion detection method is identified (process action 608), and the corresponding position of each identified person is looked up in the frames captured simultaneously by the color camera of the additional camera pair (process action 610). Additionally, a facial characterization of the portion of the color camera frame depicting the person's face is generated for each identified person (process action 612).
Processing continues with the selection of a previously unselected one of the persons detected in the environment based on the frame pairs output from the additional color camera and depth camera pair (process action 614). A determination is then made as to whether the selected person was previously detected in another scene in the environment using another color camera and depth camera pair (process action 616). As indicated previously, this may be based on tracking of the person's location as he or she leaves one scene for another, on a face recognition method, or on some other method of identifying the person. If the selected person was previously detected in another scene, a further determination is made as to whether the identity of the selected person was previously confirmed (process action 618). If the selected person was not previously identified, then a previously unselected one of the facial characterizations generated from the additional sequence of simultaneously captured frame pairs is selected (process action 620), and a determination is made as to whether the selected facial characterization differs by a prescribed degree from each of the facial characterizations assigned to the unknown personal identifier previously established for the selected person (process action 622). If so, the selected facial characterization is assigned to the unknown personal identifier previously established for the person (process action 624) and stored in a memory associated with the computer (process action 626). Otherwise, the selected facial characterization is discarded (process action 628). It is then determined whether all facial characterizations generated from the additional sequence of simultaneously captured frame pairs have been selected (process action 630). If not, process actions 620 through 630 are repeated until all facial characterizations have been selected and considered. Next, an attempt is made to confirm the identity of the selected person (process action 632). A determination is then made as to whether the attempt was successful (process action 634). If the attempt is successful, each facial characterization assigned to the unknown personal identifier established for the selected person is reassigned to the facial recognition training database established for that person (process action 636).
However, if it is determined in process action 618 that the selected person was previously recognized, then a previously unselected one of the facial characterizations generated from the additional sequence of simultaneously captured frame pairs is selected (process action 638), and it is determined whether the selected facial characterization differs by a prescribed degree from each of the facial characterizations assigned to the facial recognition training database previously established for the selected person (process action 640). If so, the selected facial characterization is assigned to the facial recognition training database established for the person (process action 642) and stored in a memory associated with the computer (process action 644). Otherwise, the selected facial characterization is discarded (process action 646). A determination is then made as to whether all facial characterizations generated from the additional sequence of simultaneously captured frame pairs have been selected (process action 648). If not, process actions 638 through 648 are repeated until all facial characterizations have been selected and considered.
However, if it is determined in process action 616 that the selected person was not previously detected in other scenes in the environment, then processing continues with assigning each facial characterization generated for the selected person based on the frame pairs output from the additional color camera and depth camera pair to a newly established unknown personal identifier for the person (process action 650). Each of these facial characterizations is also stored in a memory associated with the computer (process action 652). An attempt is then made to confirm the identity of the selected person (process action 654). A determination is then made as to whether the attempt was successful (process action 656). If the identity of the selected person is confirmed, each facial characterization assigned to the unknown personal identifier established for that person is reassigned to the facial recognition training database established for that person (process action 658).
Once the selected person has been considered as described above, a determination is made as to whether all of the detected persons have been selected (process action 660). If not, process actions 614 through 660 are repeated until all detected persons have been selected and considered. The process ends at this point, but may be repeated each time a new sequence of simultaneously captured frame pairs is input from another color camera and depth camera pair.
1.4 Motion detection
Although any motion detection method may be employed by the face recognition training database generation technique embodiments described herein, the following method is employed in one embodiment. Generally, the method utilizes short-term variations in the depth data extracted from the depth camera frames to detect potential people in the environment.
More specifically, referring to FIGS. 7A-7D, in one embodiment, the motion detection process first involves designating all pixels in the first depth camera frame as background pixels (process action 700). A determination is then made as to whether a new, successively captured depth frame has become available (process action 702). If not, process action 702 is repeated until a new frame is available. When a new depth frame is input, a previously unselected pixel of the depth frame is selected (process action 704), and it is determined whether the depth value of the selected pixel has changed by more than a prescribed amount from the value of the pixel representing the same location within the environment in the depth frame captured immediately prior to the currently considered frame (process action 706). If the depth value has changed by more than the prescribed amount, the selected pixel is designated as a foreground pixel (process action 708). It is next determined whether any previously unselected pixels of the depth frame remain (process action 710). If pixels remain, process actions 704 through 710 are repeated. If not, it is determined whether the depth frame currently under consideration is the last frame in the sequence (process action 712). If not, process actions 702 through 712 are repeated.
However, if it is the last frame, then a seed point is established among the foreground pixels in the last frame, and the pixel associated with that point is assigned as part of a blob (process action 714). Next, a previously unselected pixel that is adjacent to a pixel assigned to the blob (which initially is just the seed point pixel) and that has not yet been assigned to the blob is selected (process action 716). It is first determined whether the selected pixel is assigned to a different blob (process action 718). If so, the two blobs are merged into one blob (process action 720). Next, a determination is made as to whether there are any previously unselected pixels adjacent to the pixels assigned to the merged blob that have not been assigned to the merged blob (process action 722). If so, a previously unselected one of those pixels is selected (process action 724), and process actions 718 through 724 are repeated. However, whenever it is determined in process action 718 that the selected pixel is not assigned to a different blob, a determination is made as to whether the depth value of the selected pixel is the same, within a specified tolerance, as the current average depth value of the pixels assigned to the blob (process action 726). If so, the selected pixel is assigned to the blob (process action 728). If not, no action is taken. In either case, it is next determined whether there are any previously unselected pixels that are adjacent to a pixel assigned to the blob (merged or not) and that have not yet been assigned to the blob (process action 730). If such a pixel exists, process actions 716 through 730 are repeated. Otherwise, no action is taken. Thus, the pixels surrounding the seed point pixel are all considered and are either merged into the blob or assigned to it if they have the requisite depth value; then the pixels surrounding the enlarged blob (merged or not) are considered, and so on, causing the blob to grow. This continues until no more neighboring pixels can be found that are not assigned to the blob and that have a depth value matching, within the specified tolerance, the current average of the pixels assigned to the blob.
Next, it is determined whether there are foreground pixels that have not yet been assigned to the blob (process action 732). If such pixels remain, a seed point is established among the unassigned foreground pixels in the last frame, and the pixel associated with that point is assigned as part of a new blob (process action 734). Process actions 716 through 734 are then repeated until no unassigned foreground pixels remain.
Once no unassigned foreground pixels remain (and thus no new blobs can be formed), a previously unselected one of the blobs is selected (process action 736). A determination is then made as to whether the blob satisfies a set of prescribed criteria indicating that the blob represents a person (process action 738). If not, the blob is removed (process action 740). However, if the selected blob satisfies the prescribed criteria, then the blob is designated as representing a potential person located within the environment (process action 742).
It is noted that the criteria used to indicate that a blob represents a person may be any conventional set of criteria. Additionally, the criteria may include whether the blob conforms to normal human proportions in real spatial dimensions; for example, whether the blob exhibits a rectangular region corresponding to a human chest and head.
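The following sketch condenses the motion detection process of FIGS. 7A-7D. It assumes depth frames are 2-D integer arrays; the change threshold, depth tolerance, and minimum blob size are illustrative stand-ins for the "prescribed amount", "specified tolerance", and person-indicating criteria. Because each blob here is grown by a single flood fill restricted to unassigned foreground pixels (a simplifying assumption), the blob-merging case of process actions 718 through 720 never arises and is omitted:

```python
import numpy as np
from collections import deque

def detect_person_blobs(depth_frames, change_thresh=50, depth_tol=100,
                        min_pixels=400):
    """Condensed sketch of the FIGS. 7A-7D motion detection process.

    depth_frames: sequence of equal-sized 2-D integer arrays (e.g., depth
    in millimeters).  All threshold values are illustrative assumptions.
    """
    # Actions 700-712: mark pixels whose depth changed by more than
    # change_thresh between consecutive frames as foreground.
    foreground = np.zeros(depth_frames[0].shape, dtype=bool)
    for prev, cur in zip(depth_frames, depth_frames[1:]):
        foreground |= np.abs(cur.astype(int) - prev.astype(int)) > change_thresh

    last = depth_frames[-1].astype(float)
    labels = np.zeros(last.shape, dtype=int)  # 0 = not yet assigned to a blob
    blobs = []

    # Actions 714-734: grow blobs from seed points among the foreground
    # pixels, absorbing neighbors whose depth stays within depth_tol of
    # the blob's running average.
    for seed in zip(*np.nonzero(foreground)):
        if labels[seed]:
            continue
        blob_id = len(blobs) + 1
        labels[seed] = blob_id
        members, total = [], 0.0
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            members.append((y, x))
            total += last[y, x]
            mean = total / len(members)
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < last.shape[0] and 0 <= nx < last.shape[1]
                        and foreground[ny, nx] and not labels[ny, nx]
                        and abs(last[ny, nx] - mean) <= depth_tol):
                    labels[ny, nx] = blob_id
                    queue.append((ny, nx))
        blobs.append(members)

    # Actions 736-742: keep only blobs meeting person-indicating criteria
    # (here just a crude size test; real criteria would check real-world
    # proportions such as a chest-and-head profile).
    return [b for b in blobs if len(b) >= min_pixels]
```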
2.0 Color camera and depth camera
The aforementioned color camera and depth camera employed by embodiments of the facial recognition training database generation techniques described herein will now be described in greater detail. In general, a color camera outputs a continuous sequence of digital color images of a scene captured by the camera. As in the previous description, these images are sometimes referred to as frames or image frames. An example of a suitable color camera is a conventional RGB camera. The depth camera outputs a continuous sequence of digital depth images of a scene captured by the camera. As in the previous description, these images are sometimes referred to herein as frames or depth frames. The pixel values in the depth frame indicate the distance between the depth camera and objects in the environment. For example, one suitable depth camera is a conventional infrared-based depth camera. This type of camera projects a known infrared pattern into the environment and determines depth based on the pattern deformation captured by the infrared imager.
As previously described, embodiments of the facial recognition training database generation techniques described herein may use the pixel correlation between pairs of simultaneously captured color and depth frames. In other words, it is sometimes useful to know which pixel in one frame of a frame pair depicts the same location in the scene as a given pixel in the other frame. Although the pixel correlation could be computed using conventional methods each time a pair of simultaneous frames is captured, in one embodiment a pre-computed transformation that maps pixel coordinates between the two frames is employed. More specifically, if the color camera and the depth camera are fixed relative to each other such that they move together in the same manner, the relative transformation between them will not change. Thus, the transform may be pre-computed and used to determine the pixel correlation for each pair of simultaneously captured frames.
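A minimal sketch of applying such a pre-computed transform, assuming a standard pinhole camera model with intrinsic matrices K_depth and K_color and a fixed rotation R and translation t from depth-camera to color-camera coordinates (the specific calibration model is an assumption; the passage above only establishes that the transform can be pre-computed):

```python
import numpy as np

def depth_pixel_to_color_pixel(u, v, depth, K_depth, K_color, R, t):
    """Map one depth-frame pixel (u, v) with depth value `depth` (in the
    depth camera's distance units) to the corresponding color-frame
    pixel, using pre-computed pinhole intrinsics and a fixed rigid
    transform between the two cameras."""
    # Back-project the depth pixel to a 3-D point in depth-camera coordinates.
    p_depth = depth * np.linalg.inv(K_depth) @ np.array([u, v, 1.0])
    # Apply the fixed depth-to-color rigid transform.
    p_color = R @ p_depth + t
    # Project into the color image.
    uvw = K_color @ p_color
    return uvw[0] / uvw[2], uvw[1] / uvw[2]
```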
Embodiments of the facial recognition training database generation techniques described herein may also employ fixed-position color and depth cameras. Fixed position means that a camera is placed at a specific location within the environment and does not move from that location on its own. This does not, of course, preclude the camera from being relocated within the environment. However, it is anticipated that the cameras remain in the same position during operation. In addition, although a fixed-position camera does not change location, this does not mean that the camera cannot be panned, tilted, rotated, or zoomed while in that position.
Alternatively, the facial recognition training database generation technique embodiments described herein may employ moving color and depth cameras. For example, the cameras may be mounted in a mobile robotic device. A suitable mobile robotic device may generally be any conventional mobile robotic device that exhibits the following attributes. First, referring to FIG. 8, the robotic device 800 is able to move around the environment in which it is intended to travel. Thus, the mobile robotic device 800 includes a maneuvering section 802 for moving the device through the environment. The mobile robotic device 800 also has sensors for tracking and following a person through the applicable environment. In particular, these sensors include the aforementioned color camera 804 and depth camera 806. The color camera 804 and the depth camera 806 are repositionable so that different portions of the environment can be captured. To this end, the color camera 804 and the depth camera 806 may be disposed in a head 808 of the mobile robotic device 800, the head 808 generally being placed above the aforementioned maneuvering section 802. The viewpoints of the cameras 804, 806 may be changed by reorienting the cameras themselves, by moving the head 808, or both. An example of the latter case is a configuration in which the head is rotated about a vertical axis to provide a 360-degree panning motion while the cameras are pivoted up and down to provide a tilting motion. The cameras may also have a zoom feature.
The mobile robotic device 800 further includes a control unit 810, which controls the maneuvering section 802 to move the robotic device through the environment in a conventional manner, and which controls movement of the head 808 or the cameras 804, 806, or both, to capture different scenes within the environment. In addition, the control unit 810 includes a computing device 812 (such as one of those described in the exemplary operating environment section of this disclosure). The computing device 812 includes a control module that is responsible for issuing movement control signals to the maneuvering section and the head, and for generating a facial recognition training database using the frames captured by the color camera and the depth camera in the manner previously described. Control of the movement of the maneuvering section and the head is performed using conventional methods, while the latter function is handled by a facial recognition training database generation submodule.
It is noted that, in operation, the motion detection process previously described in connection with FIGS. 7A-7D is performed when the mobile robotic device is stationary and the cameras are not moving (e.g., no pan, tilt, rotation, or zoom). This prevents false positives caused by the relative movement of the cameras.
3.0 Exemplary operating environment
Embodiments of the face recognition training database generation techniques described herein are operational with numerous types of general purpose or special purpose computing system environments or configurations. FIG. 9 illustrates a simplified example of a general-purpose computer system upon which various embodiments and elements of the facial recognition training database generation techniques as described herein may be implemented. It is noted that any blocks represented by broken or dashed lines in FIG. 9 represent alternative embodiments of a simplified computing device, and that any or all of these alternative embodiments, as described below, may be used in conjunction with other alternative embodiments described throughout this document.
For example, FIG. 9 shows a general system diagram illustrating a simplified computing device 10. Such computing devices may generally be found in devices having at least some minimal computing power, including, but not limited to, personal computers, server computers, hand-held computing devices, portable or mobile computers, communication devices such as cellular telephones and PDAs (personal digital assistants), multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, audio or video media players, and the like.
In order for a device to implement the facial recognition training database generation technique embodiments described herein, the device should have sufficient computing power and system memory to enable basic computing operations. In particular, as shown in FIG. 9, the computing power is illustrated generally by one or more processing units 12, and may also include one or more GPUs (graphics processing units) 14, with either or both of the processing units 12 and GPUs 14 in communication with system memory 16. Note that the processing unit 12 of the general purpose computing device may be a specialized microprocessor such as a DSP (digital signal processor), a VLIW (very long instruction word) processor, or other microcontroller, or may be a conventional CPU having one or more processing cores, including a specialized GPU-based core in a multi-core CPU.
In addition, the simplified computing device of FIG. 9 may also include other components, such as a communication interface 18, for example. The simplified computing device of FIG. 9 may also include one or more conventional computer input devices 20 (e.g., pointing devices, keyboards, audio input devices, video input devices, tactile input devices, devices for receiving wired or wireless data transmissions, etc.). The simplified computing device of fig. 9 may also include other optional components, such as, for example, one or more conventional display devices 24 and other computer output devices 22 (e.g., audio output devices, video output devices, devices for communicating wired or wireless data transmissions, etc.). Note that typical communication interfaces 18, input devices 20, output devices 22, and storage devices 26 for a general purpose computer are well known to those skilled in the art and will not be described in detail herein.
The simplified computing device of FIG. 9 may also include a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 10 via storage devices 26, and includes both volatile and nonvolatile media that is either removable 28 and/or non-removable 30, for storing information such as computer-readable or computer-executable instructions, data structures, program modules, or other data. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes, but is not limited to, computer or machine readable media or storage devices (such as DVDs (digital versatile discs), CDs (compact discs), floppy disks, tape drives, hard disk drives, optical disk drives, solid state memory devices, RAM (random access memory), ROM (read only memory), EEPROM (electrically erasable programmable read only memory), flash memory or other memory technology, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices), or any other device which can be used to store the desired information and which can be accessed by one or more computing devices.
The retention of information such as computer-readable or computer-executable instructions, data structures, program modules, etc., may also be accomplished by encoding one or more modulated data signals or carrier waves using any of the various aforementioned communication media or other transmission mechanisms or communication protocols, and includes any wired or wireless information delivery mechanism. Note that the term "modulated data signal" or "carrier wave" generally refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. For example, communication media includes wired media, such as a direct-wired connection or a wired network carrying one or more modulated data signals, and wireless media, such as acoustic, RF (radio frequency), infrared, laser, and other wireless media for transmitting and/or receiving one or more modulated data signals or carrier waves. Combinations of any of the above should also be included within the scope of communication media.
Furthermore, software, programs, and/or computer program products that implement some or all of the various facial recognition training database generation technique embodiments described herein, or portions thereof, may be stored, received, transmitted, or read from any desired combination of computer or machine readable media or storage devices and communication media, in the form of computer-executable instructions or other data structures.
Finally, the facial recognition training database generation technique embodiments described herein may be further described in the general context of computer-executable instructions, such as program modules, being executed by a computing device. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments described herein may also be practiced in distributed computing environments where tasks are performed by one or more remote processing devices, or within a "cloud" of one or more devices that are linked through one or more communications networks. In a distributed computing environment, program modules may be located in both local and remote computer storage media including media storage devices. Further, the foregoing instructions may be implemented partially or wholly as hardware logic circuitry, which may or may not include a processor.
4.0 Other embodiments
In the foregoing description of the face recognition training database generation technique embodiments, a depth camera and a motion detection method using the depth frames from such a camera are employed. However, there are also conventional motion detection methods that can detect people in an environment using only a color camera. In view of this, in an alternative embodiment, the depth camera is eliminated and only the color camera is used to detect potential people in the environment. Thus, the previously described processes would be modified such that a sequence of frames output from the color camera is input. These image frames are then used in conjunction with a face detection method to detect potential people in the environment, and are also used in conjunction with an appropriate motion detection method to detect potential people in the environment. Likewise, when new sequences of frames are employed as previously described, these too are just new sequences of frames output from the color camera.
It is also noted that any or all of the aforementioned embodiments throughout the specification can be used in any desired combination to form additional hybrid embodiments. Furthermore, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
According to embodiments of the present disclosure, the following supplementary notes are also disclosed:
1. a computer-implemented process for generating a facial recognition training database for each person detected as being located in an environment, comprising:
using a computer to perform the following processing actions:
(a) inputting a sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from a color camera and a frame output from a depth camera;
(b) detecting potential people in the environment using a face detection method and a color camera frame;
(c) detecting potential people in the environment using a motion detection method and depth camera frames;
(d) determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
(e) for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured color camera frame,
generating the facial characterization of the portion of the color camera frame depicting the person's face;
(f) for each person detected in the environment,
assigning each facial characterization generated for the person to an unknown personal identifier established for the person,
storing each said facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
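For illustration only, the bookkeeping in process action (f) of supplementary note 1 can be sketched as follows, assuming facial characterizations are opaque objects and that confirm_identity is a hypothetical hook returning a person's name or None (the note fixes neither the characterization format nor the confirmation mechanism):

    import uuid

    class TrainingDatabaseBuilder:
        def __init__(self, confirm_identity):
            # Hypothetical callback: given a list of facial characterizations,
            # return the person's identity if it can be confirmed, else None.
            self.confirm_identity = confirm_identity
            self.unknown = {}      # unknown personal identifier -> characterizations
            self.training_db = {}  # confirmed identity -> training database

        def ingest_person(self, characterizations):
            # Assign every characterization generated for the person to an
            # unknown personal identifier established specifically for them.
            uid = uuid.uuid4().hex
            self.unknown[uid] = list(characterizations)
            # Attempt to confirm the person's identity; on success, reassign
            # the characterizations to that person's training database.
            identity = self.confirm_identity(self.unknown[uid])
            if identity is not None:
                self.training_db.setdefault(identity, []).extend(self.unknown.pop(uid))
            return uid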
2. The process according to supplementary note 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has been previously confirmed, determining, for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person,
for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person, assigning the facial characterization to the facial recognition training database established for the person and storing the facial characterization in a memory associated with the computer.
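One plausible reading of the "differs by a prescribed degree" test in supplementary note 2 is a distance threshold over feature vectors; the sketch below assumes fixed-length numeric characterizations and an illustrative Euclidean threshold, neither of which is prescribed by the note:

    import numpy as np

    def differs_by_prescribed_degree(candidate, stored, min_distance=0.4):
        # The candidate is retained only if it lies at least min_distance
        # away from every characterization already held for this person.
        for existing in stored:
            if np.linalg.norm(np.asarray(candidate, dtype=float) -
                              np.asarray(existing, dtype=float)) < min_distance:
                return False  # too close to an existing characterization
        return True

Under this reading, only views that add pose or lighting variety are appended, so the training database grows without filling up with near-duplicates.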
3. The process of supplementary note 2, wherein the color camera and depth camera are disposed on a mobile robotic device movable about the environment, and wherein the new sequence of simultaneously captured frame pairs is captured from a different viewpoint within the environment than the viewpoint from which the previously captured frame pairs were captured, the new viewpoint being achieved by at least one of: the mobile robotic device changing the direction in which the color camera and depth camera are pointed without changing its location within the environment, or the mobile robotic device changing its location within the environment.
4. The process according to supplementary note 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining, for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
assigning the facial characterization to the unknown personal identifier established for the person, and storing the facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
5. The process of supplementary note 4, wherein the color camera and depth camera are disposed on a mobile robotic device movable about the environment, and wherein the new sequence of simultaneously captured frame pairs is captured from a different viewpoint within the environment than the viewpoint from which the previously captured frame pairs were captured, the new viewpoint being achieved by at least one of: the mobile robotic device changing the direction in which the color camera and depth camera are pointed without changing its location within the environment, or the mobile robotic device changing its location within the environment.
6. The process according to supplementary note 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining the number of times a sequence of simultaneously captured frame pairs has been input and processed without the identity of the person being confirmed, and determining whether that number exceeds a prescribed maximum number,
whenever it is determined that the number of times a sequence of simultaneously captured frame pairs has been input and processed without the identity of the person being confirmed exceeds the prescribed maximum number, deleting from the memory each facial characterization assigned to the unknown personal identifier established for the person.
7. The process according to supplementary note 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person does not correspond to a person whose position was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence,
assigning each facial characterization generated for the person to an unknown personal identifier established for the person,
storing each said facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
8. The process of supplementary note 7, wherein the color camera and depth camera are disposed on a mobile robotic device movable about the environment, and wherein the new sequence of simultaneously captured frame pairs is captured from a different viewpoint within the environment than the viewpoint from which the previously captured frame pairs were captured, the new viewpoint being achieved by at least one of: the mobile robotic device changing the direction in which the color camera and depth camera are pointed without changing its location within the environment, or the mobile robotic device changing its location within the environment.
9. The process according to supplementary note 1, further comprising, prior to performing the process action of assigning each facial characterization generated for a person to an unknown personal identifier established for the person, performing the process actions of:
for each person detected at a distance exceeding a prescribed maximum distance from the color camera,
providing the location of the person to a controller that controls a color camera having zoom capability, the controller being capable, based on the location of the person, of magnifying the face of the person to a degree proportional to the distance from the color camera to the person and capturing a zoomed image of the person's face,
inputting the zoomed image of the person's face, and
generating the facial characterization of the portion of the zoomed image depicting the person's face.
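A minimal sketch of the distance-proportional zoom selection in supplementary note 9 follows; the baseline distance and the clamp are assumptions, since the note states only that magnification is proportional to the person's distance from the camera:

    def zoom_factor(distance_m, baseline_m=2.0, max_zoom=10.0):
        # Within the baseline range the unzoomed frame already resolves
        # the face adequately, so no magnification is applied.
        if distance_m <= baseline_m:
            return 1.0
        # Beyond the baseline, magnify in proportion to distance, clamped
        # to the lens's maximum zoom.
        return min(max_zoom, distance_m / baseline_m)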
10. The process of supplementary note 1, wherein the process action of attempting to confirm the identity of the person comprises an action of employing the facial characterizations generated for the person and assigned to the unknown personal identifier established for the person in an attempt to confirm the person's identity.
11. The process according to supplementary note 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising a frame output from a further color camera and a frame output from a further depth camera, the further color and depth cameras capturing the same scene in the environment as captured by the other color and depth cameras but captured from different viewpoints, and each further pair of frames being captured substantially simultaneously with the pairs of frames output from the other color and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining, based on the identified location of the person, whether the person has also been detected using the other color and depth cameras,
whenever it is determined that the person has also been detected using the other color and depth cameras, assigning each facial characterization generated for the person based on the frame pairs output from the further color camera and further depth camera to the unknown personal identifier established for the person based on the detection of the person using the other color and depth cameras, and storing each said facial characterization in a memory associated with the computer,
whenever it is determined that the person has not also been detected using the other color and depth cameras, assigning each facial characterization generated for the person based on the frame pairs output from the further color camera and the further depth camera to an unknown personal identifier established for the person, attempting to confirm the identity of the person, and, whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
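The cross-camera association in supplementary note 11 can be reduced to a position test, assuming both detections have been mapped into a shared world coordinate frame (extrinsic calibration is outside the scope of the note) and an illustrative tolerance:

    import math

    def same_person(position_a, position_b, tolerance_m=0.5):
        # Detections from the two camera pairs are treated as the same
        # person when their estimated world positions nearly coincide.
        return math.dist(position_a, position_b) <= tolerance_m

When the test passes, the characterizations from the further camera pair are pooled under the unknown personal identifier already established from the other camera pair, as the note describes.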
12. The process according to supplementary note 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising frames output from a further colour camera and frames output from a further depth camera, the further colour and depth cameras capturing a different scene of the environment than the scene captured by the other colour and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining whether the detected person was previously detected in a different scene in the environment,
if the person was previously detected in a different scene in the environment, determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining, for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier previously established for the person,
for each facial characterization generated from the further sequence of simultaneously captured frame pairs, assigning the facial characterization to an unknown personal identifier previously established for the person and storing the facial characterization in a memory associated with the computer whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to an unknown personal identifier previously established for the person,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person,
if the person has not been previously detected in a different scene in the environment,
assigning each facial characterization generated for the person based on the pair of frames output from the further color camera and the further depth camera to a newly established unknown personal identifier for the person,
storing each said facial characterization generated for the person based on the pair of frames output from the further color camera and the further depth camera in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
13. The process according to supplementary note 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising frames output from a further colour camera and frames output from a further depth camera, the further colour and depth cameras capturing a different scene of the environment than the scene captured by the other colour and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining whether the detected person was previously detected in a different scene in the environment,
if the person was previously detected in a different scene in the environment,
determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person was previously confirmed, determining, for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person,
for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person, assigning the facial characterization to the facial recognition training database established for the person and storing the facial characterization in a memory associated with the computer.
14. The process of supplementary note 1, wherein the process actions of detecting potential people in the environment using motion detection methods and depth camera frames comprise the actions of:
designating all pixels in the first depth camera frame as background pixels;
for each pixel of each frame in the sequence of successively captured depth frames contained in the sequence of simultaneously captured frame pairs, in the order in which the frames were captured:
determining whether the depth value of the pixel has changed by more than a prescribed amount from the value of the pixel representing the same location within the environment in the depth frame captured immediately preceding the frame currently under consideration;
designating the pixel as a foreground pixel each time the depth value of the pixel changes by more than the prescribed amount;
once the last frame included in the sequence of simultaneously captured frame pairs has been processed to discern whether its pixel depth values have changed by more than the prescribed amount,
(i) establishing a seed point among foreground pixels in the last frame and assigning pixels associated with the seed point as part of a separate blob,
(ii) for each pixel adjacent to the pixel assigned to the blob that has not been assigned to the blob, recursively determining whether its depth value is the same as the current average of the pixels assigned to the blob within a specified tolerance, and if so, assigning the adjacent pixel as part of the blob until an adjacent pixel that is not assigned to a blob and has a depth value that is the same as the current average of the pixels assigned to the blob within the specified tolerance can no longer be found,
(iii) whenever, during the execution of recursive determination act (ii), a neighboring pixel assigned to a different blob is found, merging the two blobs into one blob and continuing recursive determination act (ii), and
(iv) repeating process acts (i) through (iii) for unassigned foreground pixels until no more blobs can be formed,
once no more blobs can be formed, for each blob,
determining whether the blob satisfies a set of prescribed criteria indicating that the blob represents a person,
removing each blob that does not satisfy the set of prescribed criteria, and
designating each remaining blob as representing a different potential person located within the environment.
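A compact sketch of the blob-forming actions in supplementary note 14 follows, assuming a boolean foreground mask from the frame-to-frame depth change test and per-pixel depth values; the tolerance and the person-size criterion are illustrative, and explicit blob merging (action (iii)) is unnecessary in this sketch because each flood fill consumes every pixel reachable from its seed:

    from collections import deque
    import numpy as np

    def segment_person_blobs(foreground, depth, tolerance=0.05, min_pixels=800):
        h, w = depth.shape
        assigned = np.zeros((h, w), dtype=bool)
        blobs = []
        for sy, sx in zip(*np.nonzero(foreground)):
            if assigned[sy, sx]:
                continue  # pixel already belongs to an earlier blob
            # (i) establish a seed point and start a separate blob.
            members, depth_sum = [(sy, sx)], float(depth[sy, sx])
            assigned[sy, sx] = True
            queue = deque(members)
            while queue:
                y, x = queue.popleft()
                mean = depth_sum / len(members)  # running average blob depth
                # (ii) grow into 4-connected neighbors whose depth matches
                # the blob's current average within the tolerance.
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and foreground[ny, nx]
                            and not assigned[ny, nx]
                            and abs(float(depth[ny, nx]) - mean) <= tolerance):
                        assigned[ny, nx] = True
                        members.append((ny, nx))
                        depth_sum += float(depth[ny, nx])
                        queue.append((ny, nx))
            # Keep only blobs that satisfy a crude person-size criterion.
            if len(members) >= min_pixels:
                blobs.append(members)
        return blobs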
15. A computer-implemented process for generating a facial recognition training database for each person detected as being located in an environment, comprising:
using a computer to perform the following processing actions:
(a) inputting a sequence of frames output from a color camera;
(b) detecting potential people in the environment using a face detection method and a color camera frame;
(c) detecting potential people in the environment using a motion detection method and color camera frames;
(d) determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
(e) for each person detected only via the motion detection method,
locating the portion of a color camera frame depicting the person's face, and
generating the facial characterization of the portion of the color camera frame depicting the person's face;
(f) for each person detected in the environment,
assigning each facial characterization generated for the person to an unknown personal identifier established for the person,
storing each said facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
16. The process according to supplementary note 15, further comprising:
inputting a new sequence of frames output from the color camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of frames output from the color camera,
determining whether the person corresponds to a person whose position was previously determined using a sequence of color camera frames captured before the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined, determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person was previously confirmed, determining, for each facial characterization generated from the new sequence of frames, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person,
for each facial characterization generated from the new sequence of frames, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person, assigning the facial characterization to the facial recognition training database established for the person and storing the facial characterization in a memory associated with the computer.
17. The process according to supplementary note 15, further comprising:
inputting a new sequence of frames output from the color camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of frames output from the color camera,
determining whether the person corresponds to a person whose position was previously determined using a sequence of color camera frames captured before the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined, determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person was not previously confirmed, determining, for each facial characterization generated from the new sequence of frames, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
for each facial characterization generated from the new sequence of frames, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
assigning the facial characterization to the unknown personal identifier established for the person, and storing the facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
18. The process according to supplementary note 15, further comprising:
inputting a new sequence of frames output from the color camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of frames output from the color camera,
determining whether the person corresponds to a person whose position was previously determined using a sequence of color camera frames captured before the new sequence,
whenever it is determined that the person does not correspond to a person whose position was previously determined,
assigning each facial characterization generated for the person to an unknown personal identifier established for the person,
storing each said facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
19. A computer-implemented process for detecting a person located in an environment, comprising:
using a computer to perform the following processing actions:
inputting a sequence of frames output from a depth camera;
designating all pixels in the first depth camera frame as background pixels;
for each pixel of each frame in the sequence of successively captured depth frames included in the sequence of frames, in the order in which the frames were captured:
determining whether the depth value of the pixel has changed by more than a prescribed amount from the value of the pixel representing the same location within the environment in the depth frame captured immediately preceding the frame currently under consideration;
designating the pixel as a foreground pixel each time the depth value of the pixel changes by more than the prescribed amount;
once the last frame included in the sequence of frames has been processed to discern whether its pixel depth values have changed by more than the prescribed amount,
(i) establishing a seed point among foreground pixels in the last frame and assigning pixels associated with the seed point as part of a separate blob,
(ii) for each pixel adjacent to the pixel assigned to the blob that has not been assigned to the blob, recursively determining whether its depth value is the same as the current average of the pixels assigned to the blob within a specified tolerance, and if so, assigning the adjacent pixel as part of the blob until an adjacent pixel that is not assigned to a blob and has a depth value that is the same as the current average of the pixels assigned to the blob within the specified tolerance can no longer be found,
(iii) whenever, during the execution of recursive determination act (ii), a neighboring pixel assigned to a different blob is found, merging the two blobs into one blob and continuing recursive determination act (ii), and
(iv) repeating process acts (i) through (iii) for unassigned foreground pixels until no more blobs can be formed,
once no more blobs can be formed, for each blob,
determining whether the blob satisfies a set of prescribed criteria indicating that the blob represents a person,
removing each blob that does not satisfy the set of prescribed criteria, and
designating each remaining blob as representing a different potential person located within the environment.
20. The process of supplementary note 19, wherein the process action of determining whether a blob satisfies a set of prescribed criteria indicating that the blob represents a person comprises an action of determining whether at least a portion of the blob exhibits a substantially rectangular shape.

Claims (10)

1. A computer-implemented process for generating a facial recognition training database for each person detected as being located in an environment, comprising:
using a computer to perform the following processing actions:
(a) inputting a sequence of simultaneously captured pairs of frames, each pair of frames comprising a frame output from a color camera and a frame output from a depth camera (100);
(b) detecting potential people in the environment using a face detection method and color camera frames (102);
(c) detecting potential people in the environment using a motion detection method and depth camera frames (104);
(d) determining a location (106) of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method comprising, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
(e) for each person (108) detected only via the motion detection method,
identifying a corresponding location (110) of the person in a simultaneously captured color camera frame,
generating the facial characterization (112) of the portion of the color camera frame depicting the person's face;
(f) for each person (114) detected in the environment,
assigning each facial characterization generated for the person to an unknown personal identifier (116) established for the person,
storing each said facial characterization in a memory associated with the computer (118),
attempting to confirm the identity of the person (120), and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database (124) established for the person.
2. The process of claim 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has been previously confirmed, determining, for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person,
for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person, assigning the facial characterization to the facial recognition training database established for the person and storing the facial characterization in a memory associated with the computer.
3. The process of claim 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining, for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
for each facial characterization generated from the new sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier established for the person,
assigning the facial characterization to the unknown personal identifier established for the person, and storing the facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
4. The process of claim 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person corresponds to a person whose location was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence, determining whether the identity of the person has been previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining the number of times a sequence of simultaneously captured frame pairs has been input and processed without the identity of the person being confirmed, and determining whether that number exceeds a prescribed maximum number,
whenever it is determined that the number of times a sequence of simultaneously captured frame pairs has been input and processed without the identity of the person being confirmed exceeds the prescribed maximum number, deleting from the memory each facial characterization assigned to the unknown personal identifier established for the person.
5. The process of claim 1, further comprising:
inputting a new sequence of simultaneously captured frame pairs, each frame pair comprising a frame output from the color camera and a frame output from the depth camera;
repeating process acts (b) through (e);
for each person detected in the environment and depicted in the new sequence of simultaneously captured frame pairs,
determining whether the person corresponds to a person whose position was previously determined using a sequence of simultaneously captured frame pairs preceding the new sequence,
whenever it is determined that the person does not correspond to a person whose position was previously determined using the sequence of simultaneously captured frame pairs preceding the new sequence,
assigning each facial characterization generated for the person to an unknown personal identifier established for the person,
storing each said facial characterization in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
6. The process of claim 1, further comprising, prior to performing the processing action of assigning each facial characterization generated for a person to an unknown personal identifier established for the person, performing the processing action of:
for each person detected at a distance exceeding a prescribed maximum distance from the color camera,
providing the location of the person to a controller that controls a color camera having zoom capability, the controller being capable, based on the location of the person, of magnifying the face of the person to a degree proportional to the distance from the color camera to the person and capturing a zoomed image of the person's face,
inputting the zoomed image of the person's face, and
generating the facial characterization of the portion of the zoomed image depicting the person's face.
7. The process of claim 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising a frame output from a further color camera and a frame output from a further depth camera, the further color and depth cameras capturing the same scene in the environment as captured by the other color and depth cameras but captured from different viewpoints, and each further pair of frames being captured substantially simultaneously with the pairs of frames output from the other color and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining, based on the identified location of the person, whether the person has also been detected using the other color and depth cameras,
whenever it is determined that the person has also been detected using the other color and depth cameras, assigning each facial characterization generated for the person based on the frame pairs output from the further color camera and further depth camera to the unknown personal identifier established for the person based on the detection of the person using the other color and depth cameras, and storing each said facial characterization in a memory associated with the computer,
whenever it is determined that the person has not also been detected using the other color and depth cameras, assigning each facial characterization generated for the person based on the frame pairs output from the further color camera and the further depth camera to an unknown personal identifier established for the person, attempting to confirm the identity of the person, and, whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
8. The process of claim 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising frames output from a further colour camera and frames output from a further depth camera, the further colour and depth cameras capturing a different scene of the environment than the scene captured by the other colour and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining whether the detected person was previously detected in a different scene in the environment,
if the person was previously detected in a different scene in the environment, determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person has not been previously confirmed, determining, for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the unknown personal identifier previously established for the person,
for each facial characterization generated from the further sequence of simultaneously captured frame pairs, assigning the facial characterization to an unknown personal identifier previously established for the person and storing the facial characterization in a memory associated with the computer whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to an unknown personal identifier previously established for the person,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person,
if the person has not been previously detected in a different scene in the environment,
assigning each facial characterization generated for the person based on the pair of frames output from the further color camera and the further depth camera to a newly established unknown personal identifier for the person,
storing each said facial characterization generated for the person based on the pair of frames output from the further color camera and the further depth camera in a memory associated with the computer,
attempting to confirm the identity of the person, and
whenever the identity of the person is confirmed, reassigning each facial characterization assigned to the unknown personal identifier established for the person to a facial recognition training database established for the person.
9. The process of claim 1, further comprising the process actions of:
inputting a further sequence of simultaneously captured pairs of frames, each further pair of frames comprising frames output from a further colour camera and frames output from a further depth camera, the further colour and depth cameras capturing a different scene of the environment than the scene captured by the other colour and depth cameras;
detecting potential persons in the environment using a face detection method and frames from the further color camera;
detecting potential people in the environment using a motion detection method and frames from the further depth camera;
determining a location of one or more persons in the environment using detection results generated via the face detection method and motion detection method, the detection results generated via the face detection method including, for each detected person, a facial characterization of a portion of a color camera frame depicting the person's face;
for each person detected only via the motion detection method,
identifying the corresponding location of the person in a simultaneously captured further color camera frame,
generating the facial characterization of the portion of the further color camera frame depicting the person's face;
for each person detected in the environment based on the pairs of frames output from the further color camera and the further depth camera,
determining whether the detected person was previously detected in a different scene in the environment,
if the person was previously detected in a different scene in the environment,
determining whether the identity of the person was previously confirmed,
whenever it is determined that the identity of the person was previously confirmed, determining, for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whether the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person,
for each facial characterization generated from the further sequence of simultaneously captured frame pairs, whenever it is determined that the facial characterization differs by a prescribed degree from each facial characterization assigned to the facial recognition training database established for the person, assigning the facial characterization to the facial recognition training database established for the person and storing the facial characterization in a memory associated with the computer.
10. The process of claim 1, wherein the processing action of using the motion detection method and the depth camera frame to detect potential people in the environment comprises the actions of:
designating all pixels in the first depth camera frame as background pixels;
for each pixel of each frame in the sequence of successively captured depth frames contained in the sequence of simultaneously captured frame pairs, in the order in which the frames were captured:
determining whether the depth value of the pixel has changed by more than a prescribed amount from the value of the pixel representing the same location within the environment in the depth frame captured immediately preceding the frame currently under consideration;
designating the pixel as a foreground pixel each time the depth value of the pixel changes by more than the prescribed amount;
once the last frame included in the sequence of simultaneously captured frame pairs has been processed to discern whether its pixel depth values have changed by more than the prescribed amount,
(i) establishing a seed point among foreground pixels in the last frame and assigning pixels associated with the seed point as part of a separate blob,
(ii) for each pixel adjacent to the pixel assigned to the blob that has not been assigned to the blob, recursively determining whether its depth value is the same as the current average of the pixels assigned to the blob within a specified tolerance, and if so, assigning the adjacent pixel as part of the blob until an adjacent pixel that is not assigned to a blob and has a depth value that is the same as the current average of the pixels assigned to the blob within the specified tolerance can no longer be found,
(iii) whenever, during the execution of recursive determination act (ii), a neighboring pixel assigned to a different blob is found, merging the two blobs into one blob and continuing recursive determination act (ii), and
(iv) repeating process acts (i) through (iii) for unassigned foreground pixels until no more blobs can be formed,
once no more blobs can be formed, for each blob,
determining whether the blob satisfies a set of prescribed criteria indicating that the blob represents a person,
removing each blob that does not satisfy the set of prescribed criteria, and
designating each remaining blob as representing a different potential person located within the environment.