HK1236300A - A camera system and method for dynamic object classification
Description
This application is a divisional application; the application number of its parent is 200980107334.6, filed March 3, 2009.
Technical Field
This disclosure relates generally, but not exclusively, to video surveillance and, more particularly, to object classification.
Cross reference to related applications
This application claims the benefit under 35 U.S.C. § 119(e) of U.S. Provisional Application No. 61/033,349, entitled "Method for Dynamic Object and Event Classification," filed March 3, 2008, and U.S. Provisional Application No. 61/033,284, entitled "Method and System for Tracking Objects Under Video Surveillance," filed March 3, 2008, both of which are hereby incorporated by reference in their entirety.
Background
Automated security and surveillance systems typically employ video cameras or other image capture devices or sensors to collect image data. In the simplest systems, the images represented by the image data are displayed for concurrent review by security personnel and/or recorded for subsequent reference after a security violation. In those systems, the task of detecting objects of interest is performed by a human observer. A significant advance occurs when the system itself can perform object detection and classification, either partially or completely.
In a typical surveillance system, for example, it may be of interest to detect objects such as humans, vehicles, and animals moving through an environment. Different types of objects may pose different threats or raise different levels of alarm. For example, an animal in the scene may be normal, but a human or vehicle in the scene may be cause for an alarm and may require the immediate attention of security personnel. Existing systems capable of classifying detected objects tend to rely on simple heuristics to distinguish broad categories of objects from one another; for example, predetermined expectations for aspect ratio and height are used to classify a detected object as a human. Heuristic methods are computationally inexpensive and easy to implement, but they are far less robust than the optimized parametric classifiers produced by known machine learning algorithms such as AdaBoost (Adaptive Boosting). Known parametric classifiers, however, suffer from one or more of the following limitations: (1) a lack of labeled data for training, and (2) an inability to evolve automatically.
Prior art classifiers typically require manual geometric calibration and adjustment. Such calibration and adjustment typically involves intermediate user inputs (e.g., object heights) that affect system performance only indirectly, and typically demands time-consuming manual labor by trained personnel during installation. Moreover, recalibration and readjustment are typically required as seasons change or if a camera is moved.
Disclosure of Invention
Detailed Description
This section describes certain embodiments and their detailed construction and operation with reference to the accompanying drawings. The embodiments described herein are set forth by way of illustration only and not by way of limitation. Those skilled in the art will recognize, in light of the disclosure herein, that there are a number of equivalents to the embodiments described. In particular, other embodiments are possible, variations may be made to the embodiments described herein, and equivalents may exist to the components, parts, or steps that constitute the described embodiments.
For the sake of brevity, certain aspects of components or steps of certain embodiments are presented without undue detail where such detail would be apparent to one of ordinary skill in the art in light of the disclosure herein, and/or where such detail would obscure an understanding of more pertinent aspects of the embodiments.
SUMMARY
As will be understood by those skilled in the art in light of this disclosure, certain embodiments are capable of achieving certain advantages over the known prior art, including some or all of the following: (1) improved object classification accuracy; (2) use of user feedback for training and adapting an object classifier; (3) learning new object classes in a field-deployed camera system; (4) online evaluation and deployment of new object classifiers; (5) aggregation of feedback from a group of camera systems to train new and/or more accurate general object classifiers; (6) calibration of a field-deployed camera system during live operation; (7) reduction or even elimination of the need for manual field calibration of the system during setup and for readjustment due to seasonal variations or camera movement; and (8) automatic adaptation of the camera system to changing conditions. These and other advantages of various embodiments will be apparent upon reading the remainder of this document.
According to one embodiment, a camera system comprises an image capture device and an object classification module connected to the image capture device. The image capture device has a field of view and produces image data representing an image of the field of view. The object classification module is operable to determine whether an object in an image is a member of an object class. The object classification module includes N decision steps configured in a cascade configuration, wherein at least one of the N decision steps is operable to (a) accept an object as a member of the object class, (b) reject an object as a member of the object class, and (c) call on a next step to determine whether an object is a member of the object class.
According to another embodiment, a method classifies an object captured by a camera system that includes an object classification module with N decision steps configured in a cascade configuration. The method captures an image of an object and transmits image data representing the object to a first one of the N decision steps. The method identifies a feature of the object represented in the image data to determine whether the object is a member of an object class, wherein a decision step value is derived from the feature of the object. The method decides to accept the object as a member of the object class, reject the object as a member of the object class, or forward the image data to a second one of the N decision steps for further analysis. The decision is based on a comparison of the decision step value to one or more of an acceptance threshold and a rejection threshold, the acceptance threshold being higher than the rejection threshold. The object is accepted as a member of the object class when the decision step value is above the acceptance threshold, and rejected as a member of the object class when the decision step value is below the rejection threshold. The image data is forwarded to the second decision step when the decision step value falls between the rejection and acceptance thresholds.
According to another embodiment, a camera system includes an image capture device and an object classification module connected to the image capture device. The image capture device has a field of view and produces image data representing an image of the field of view. The object classification module is operable to determine whether an object in the image is a member of an object class. The object classification module includes N decision steps configured in a cascade configuration, each of the decision steps including one or more stages for mapping object features to scalar values. A first one of the stages includes a first discriminant function for determining a first scalar value, and a second one of the stages includes a second discriminant function for determining a second scalar value, the first and second discriminant functions being of different types.
According to another embodiment, a method classifies an object captured by a camera system. The method generates image data representing an image of an object captured by the camera system and identifies first and second features of the object represented in the image data. The method maps the first and second features of the object to respective first and second scalar values: a first discriminant function is used to produce the first scalar value, and a second discriminant function is used to produce the second scalar value. The first and second discriminant functions are selected during a training process from a group consisting of multiple different discriminant functions. The method determines whether the object is a member of an object class based on a decision step value derived from the first and second scalar values.
According to another embodiment, a camera system includes an image capture device, an object classification module connected to the image capture device, and a calibration module connected to the object classification module. The image capture device has a field of view and an image plane, and produces image data representing an image of the field of view projected onto the image plane. The object classification module is operable to detect and classify objects captured in the field of view based on the image data, classifying objects as members or non-members of an object class. The calibration module is connected to the object classification module for estimating representative sizes of members of the object class, the representative sizes corresponding to different regions of the image plane. The calibration module is operable during online operation to automatically update the representative sizes in response to classifications made by the object classification module, and to supply information representing the updated representative sizes to the object classification module to improve its classification performance.
According to another embodiment, a method automatically calibrates a field-deployed camera system. The method captures multiple images of the camera system's field of view, the images corresponding to an image plane of the camera system onto which the field of view is projected. The method detects a first object in the multiple images, the first object being detected at different locations of the image plane with image sizes that differ according to location. The method classifies the first object as a first member of an object class and calculates parameters of a size function for the image plane based on the different sizes of the first object. The size function is used to estimate a representative size of members of the object class across the image plane. The method updates the parameters of the size function in response to detection and classification of a second member of the object class, the second member being detected and classified during online operation of the camera system.
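As one illustration of how such a size function might work, the sketch below models the representative image height of classified members as a planar function of image position, fit by least squares and refit as new confidently classified members arrive during online operation. The planar model, class names, and fitting scheme are assumptions for illustration, not the patent's method.

```python
import numpy as np

class SizeCalibration:
    """Sketch: representative member height h(x, y) ~ a*x + b*y + c."""

    def __init__(self):
        self.samples = []    # (x, y, observed_height) from classified members
        self.coeffs = None   # plane coefficients (a, b, c), set once fit

    def add_observation(self, x, y, height):
        # Called whenever the object classification module confidently
        # classifies a member of the class at image position (x, y).
        self.samples.append((x, y, height))
        if len(self.samples) >= 3:                      # need 3+ points for a plane
            pts = np.asarray(self.samples)
            A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
            self.coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)

    def expected_height(self, x, y):
        if self.coeffs is None:                         # not yet calibrated
            return None
        a, b, c = self.coeffs
        return a * x + b * y + c                        # representative size here
```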
According to another embodiment, a method modifies an object classification module used in a field-deployed camera system operated by a user. The method captures multiple images of the camera system's field of view, the images representing multiple objects, a first group of which are members of an object class and a second group of which are not. The method classifies the objects as members or non-members of the object class, wherein the object classification module produces one or more misclassifications. The method generates error metadata based on the user confirming at least some of the one or more misclassifications, and modifies the object classification module based on the error metadata so as to reduce the number of misclassifications, the modification being performed automatically during field operation of the camera system.
According to another embodiment, a camera system includes an image capture device, an object classification module connected to the image capture device, a user station connected to the image capture device, and a classifier evolution module. The image capture device has a field of view and produces image data representing an image of the field of view. The object classification module is operable to determine whether an object in the image is a member of an object class, and may produce misclassifications. The user station has a display for presenting images of the field of view to a user, and is operable to present representations of the misclassifications produced by the object classification module on the display. The user station is operable to generate user feedback information in response to the user confirming the misclassifications; the user feedback information constitutes error metadata. The classifier evolution module receives the error metadata and is operable to use it to modify the object classification module so as to reduce the number of misclassifications, thereby producing a specialized classifier.
According to another embodiment, a method constructs a new object classification module for a field-deployed camera system, the new object classification module being used to classify objects as members or non-members of a new object class selected by a user. The method captures multiple images of the camera system's field of view, the images including representations of objects, a first group of which are members of the new object class and a second group of which are not. The method uses a deployed object classification module to classify the objects as members of a deployed object class, and presents representations of the classified objects on a display. A user labels the presented objects as members or non-members of the new object class. The method generates metadata based on the labels and modifies the deployed object classification module based on the metadata to form the new object classification module.
Further aspects and details of construction and operation of the foregoing embodiments and other embodiments are set forth in the following subsections which make reference to the accompanying drawings.
Integrated system
FIG. 1 is a depiction of a camera system 100 according to one embodiment. The camera system 100 includes image capture devices 102, a user interface 104, and a remote storage/processing unit 106, connected to one another over a network 108. The network 108 may include any type of wired or wireless network. Although the camera system 100 of FIG. 1 includes multiple image capture devices 102 connected to the network, the camera system 100 may include a single image capture device 102. An image capture device 102 may include an internal storage system 110 comprising a hard disk drive (HD) 111 and a metadata database (DB) 112. For example, the image capture devices 102 may include the storage systems described in commonly owned U.S. patent application Nos. 12/105,971 and 12/105,893, entitled "Content Aware Storage of Video Data" and "Extending the Operational Lifetime of a Hard Drive Used in Video Data Storage Applications," respectively, both of which are hereby incorporated by reference in their entirety. The user interface 104 includes a display 114 and an input device 116. The image capture devices 102 capture images of their respective fields of view and generate image data representing the images. It is to be understood that an image may be a still image or a moving video image. The image data is communicated over the network 108 to the user interface 104, and one or more views of the images are presented on the display 114. The input device 116 is operable to allow a user to provide feedback information to the camera system 100. The image data may also be communicated over the network 108 to the remote storage/processing unit 106, in which the storage system 110, or portions thereof, or a similar storage system may reside instead of, or in addition to, residing in the image capture devices 102.
FIG. 2 is a simplified block diagram of one of the image capture devices 102. The image capture device 102 may be a high-resolution video camera, such as a megapixel video camera. The image capture device 102 may also capture data from outside the visible spectrum (e.g., thermal energy). In addition to the storage system 110, the image capture device 102 includes an image processing unit that includes a video analysis module 200 for analyzing images captured by the image capture device 102. Data generated by the video analysis module 200 may be used by a rules engine (not shown) to determine whether one or more user-specified rules have been violated. For example, the rules engine may trigger an alarm, presented on the display 114 of the user interface 104, if a human is detected in the field of view of one of the image capture devices 102. The image processing unit need not be housed within a housing 202 of the image capture device 102 as depicted in FIG. 2, and the remote storage/processing unit 106 may also include an image processing unit.
The video analysis module 200 includes modules for performing various tasks. For example, the video analysis module 200 includes an object detection module 204 for detecting objects appearing in the field of view of the image capture device 102. The input to the object detection module 204 is video data, preferably live video data from an imager (not shown) or video buffer memory (not shown). The object detection module 204 may employ any known object detection method, such as motion detection or binary large object (blob) detection. The object detection module 204 may include, and utilize, the systems described in commonly owned U.S. patent application No. 10/884,486, entitled "Method and System for Detecting Objects in a Spatio-Temporal Signal," the entire contents of which are incorporated herein by reference.
The video analysis module 200 also includes an object tracking module 206 connected to the object detection module 204. As used herein, the term "connected" means logically or physically connected, directly or indirectly through one or more intermediaries. The object tracking module 206 is operable to temporally associate instances of an object detected by the object detection module 204. The object tracking module 206 may include, and utilize, the systems and methods described in the commonly owned U.S. patent application entitled "Object Matching for Tracking, Indexing, and Search" (attorney docket No. 37686/7:2), the entire contents of which are incorporated herein by reference. The object tracking module 206 generates metadata corresponding to the objects it tracks. The metadata may correspond to a signature of an object representing the object's appearance or other characteristics, and may be transmitted to the metadata database 112 for storage.
The video analysis module 200 also includes a temporal object classification module 208 connected to the object tracking module 206. The temporal object classification module 208 is operable to classify an object according to its type (e.g., human, vehicle, animal) by considering the object's appearance over time. In other words, the object tracking module 206 tracks an object over multiple frames (i.e., multiple images), and the temporal object classification module 208 determines the object's type based on its appearance in those multiple frames. For example, gait analysis of the way a person walks may be used to classify a person, or analysis of a person's legs may be used to classify a cyclist. The temporal object classification module 208 may combine information about the trajectory of an object (e.g., whether the trajectory is smooth or chaotic, whether the object is moving or motionless) with the confidences of the classifications made by the object classification module 210 (described in detail below), averaged over multiple frames. For example, classification confidence values determined by the object classification module 210 may be adjusted based on the smoothness of the object's trajectory. The temporal object classification module 208 may assign an object to an unknown class until the object has been classified by the object classification module a sufficient number of times and a predetermined number of statistics have been gathered. In classifying an object, the temporal object classification module 208 may also take into account how long the object has been in the field of view. The temporal object classification module may make a final determination about an object's class based on the information described above. The temporal object classification module 208 may also use a hysteresis approach for changing an object's class. More specifically, a threshold may be set for transitioning an object's classification from unknown to a definite class, and that threshold may be larger than the threshold for the opposite transition (e.g., from human to unknown). The temporal object classification module 208 may generate metadata regarding an object's classification, and the metadata may be stored in the metadata database 112. The temporal object classification module 208 may aggregate the classifications made by the object classification module 210.
Object classification
The video analysis module 200 also includes the object classification module 210, preferably connected directly or indirectly to the object detection module 204. In contrast to the temporal object classification module 208, the object classification module 210 may determine an object's type based on a single instance (i.e., a single image) of the object. The input to the object classification module 210 is preferably a detected object rather than raw video or image data. A benefit of supplying detected objects to the object classification module 210 is that the whole scene need not be analyzed for classification, which requires less processing power. Other preliminary modules, such as a heuristics-based module that captures obvious classifications, may also be included to further simplify the work of the object classification module 210.
The object detection, tracking, and temporal classification modules 204, 206, and 208 are optional but preferred components of the video analysis module 200. In an alternative arrangement, the object classification module 210 is placed after the object detection module 204 and before the object tracking module 206, so that object classification occurs before object tracking. In another alternative arrangement, the object detection, tracking, temporal classification, and classification modules 204-210 are interrelated as described in the above-referenced U.S. patent application No. 10/884,486.
The object classification module 210 includes a number of object classifiers, as depicted in the block diagram of FIG. 3. For example, the object classification module 210 may include: a whole body classifier 300 that determines whether an image of a detected object corresponds to a whole human body; a human torso classifier 302 that determines whether an image of a detected object corresponds to a human torso; and a vehicle classifier 304 that determines whether an image of a detected object corresponds to a vehicle. The object classification module 210 may include any number of different classifiers, and, as described in more detail below, a user may define new classes of objects for the object classification module 210 even after the camera system is deployed and operational. In other words, the object classification module 210 is field-trainable.
The object classifier is operable to classify an object based on object characteristics, such as appearance characteristics. For example, the whole body classifier 300 receives data corresponding to the characteristics of an object (i.e., an input pattern X) and determines whether the object corresponds to a whole body. After the object classification module 210 classifies an object, metadata representing the object class and object characteristics may be stored in the metadata database 112.
Features that may be employed by the object classification module 210 are described in more detail below. A training algorithm, described below, selects a subset of features F̂ = {f̂1, f̂2, …, f̂m} from a set of features F = {f1, f2, f3, …, fn}. The input pattern X is composed of the elements of F̂. Each feature f̂i may be regarded as a transformation of an image region R that is considered to correspond to an object. Accordingly, X may take the following form:

X = [f̂1(R), f̂2(R), …, f̂m(R)]^T   (1)
characteristics of an objectMay correspond to some appearance characteristic such as, without limitation: aspect ratio, color, edge orientation, and normalized saturation. What is more, characteristicsAre feature vectors (e.g., histograms, where bins of the histogram correspond to vector components) that may represent appearance characteristics and may be employed by one or more object classifiers to determine a class (i.e., type) of an object. For example, a histogram of the edge orientations of the object may be plotted for different regions (e.g., sub-windows) of the object image. In other words, the image of an object can be divided into sub-windows, and the edge orientation can be calculated for each pixel of the sub-windows. The edge orientation of a pixel may be derived using a steerable filter (e.g., using a gaussian derivative filter in multiple directions). Using a steerable filter allows the dominant direction to be the pixels assigned to a sub-window and the histogram of the directions to be plotted for that sub-window. For example, for a given pixel, a steerable filter is applied in multiple directions to generate multiple responses, and the direction corresponding to the largest directional derivative response is designated as the direction of the pixel.
The classification problem for an object classifier may be defined by a classifier function Γ(X): an object represented by an input pattern X is declared a member of the object class when Γ(X) > 0, or a non-member of the object class when Γ(X) < 0. In general, the classifier function Γ(X) is parameterized by a set of parameters, and the input pattern X is composed of the features described above. A specific classifier Γc(X) is trained for each object class of interest. The multi-class classification model represented by the object classification module 210 of FIG. 3 may be defined mathematically as follows:

Ω = {ω1, ω2, …, ωc}
ω = ωc : Γc(X) > 0 and Γc(X) = max{Γk(X) : k = 1, …, c}   (2)

where ω represents the declared object class and Ω represents the set of all object classes.
One example configuration of the object classification module 210 will now be described in more detail with reference to FIGS. 4-6. For the sake of clarity, the whole body classifier 300 is described in detail; the following description is equally applicable to the other object classifiers of the object classification module 210. The classifier 300 includes multiple steps 400 (N decision steps) configured in a cascade configuration (i.e., step 1, followed by step 2, …, followed by step N), as shown in FIG. 4. The N steps operate to determine whether the appearance of an object corresponds to the object class (i.e., a whole human body). Unlike the known cascade classification system proposed by Paul Viola and Michael Jones ("Robust Real-Time Face Detection," International Journal of Computer Vision, pp. 137-154, 2004), in which each of the N−1 initial steps (i.e., all steps but the last) can only reject an object or forward it, each of the initial N−1 steps of the classifier 300 is operable to make one of three decisions: (1) accept the object as a member of the object class (i.e., the positive class); (2) reject the object as a member of the object class (i.e., the negative class); and (3) forward the decision to the next step. For example, an input pattern X for an object is supplied to step 1, and step 1 decides whether to (1) accept the object as a whole human body, (2) reject the object as a whole human body, or (3) forward the input pattern X to step 2 for determination. The decision to accept, reject, or forward is based on a value produced by the step (i.e., a decision step value). The last, or Nth, step is operable only to (1) accept the object as a whole human body or (2) reject the object as a whole human body.
FIG. 5A is a block diagram showing one of the initial N−1 steps 400 in greater detail. Each step 400 includes one or more stages 500. For each stage 500, one feature f̂i is extracted from the input pattern X (represented by block 502) and supplied to that stage 500; each stage 500 therefore has a corresponding feature associated with it. The stage/feature combinations are predetermined by a learning algorithm during training, as described below. Moreover, the feature supplied to the first stage (stage 1) of FIG. 5, for example, may be different from or the same as the feature supplied to the second stage (stage 2).
In general, a stage 500 can be represented by a stage function g defined as follows:

g: x → γ, where g ∈ G, x ∈ X, and γ ∈ [−1, 1]   (3)

where G represents the general set from which the specific stage function g is chosen, and x represents an arbitrary input to the stage. The set G is a rich family of functions that map a feature set to scalar values whose sign indicates the class of an object. As described above, each stage 500 receives one feature f̂i; the set G can therefore be written as G = F × T, where F is defined above and T represents a set of possible transformations (i.e., mappings) such that, for t ∈ T, t: f̂ → γ. The stage function g may thus take the following expanded form:

gi(X) = ti(f̂i(R))   (4)
a stage 500 represents a discriminant function comprising a weight vectorAnd an activation function β, as described above, the characteristicsIs representative of a feature vector. The phase function is rewritable as follows:
wherein the content of the first and second substances,is representative of a weight vectorAnd feature vectorThe launch function β i may be any function such as, but not limited to, an sigmoid function or a radial basis function the launch function β i is implemented with image weight pairsMeasurement ofAnd feature vectorIs set to a value between 0 and 1. The scalar value γ is determined by calculating a differential of the discriminant function. Unlike known cascade classifiers, which include the same discriminant function for all stages, the discriminant functions for the stages of the present embodiment may be different from each other. What is more, the weighting vectors for each stage 500The activation function β i is automatically determined during training as described below.
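A minimal sketch of the stage discriminant of equation (5) follows. The tanh activation is an illustrative choice that keeps the output in [−1, 1], standing in for the sigmoid or radial basis functions named above; the weights shown are arbitrary example values.

```python
import numpy as np

def make_stage(weights, activation=np.tanh):
    """Build a stage function g_i(X) = beta_i(w_i . f_i(R)) per equation (5)."""
    w = np.asarray(weights, dtype=float)
    def stage(feature_vector):
        # Inner product of weight vector and feature vector, then activation.
        return float(activation(w @ np.asarray(feature_vector, dtype=float)))
    return stage

# Example: a stage over a 3-bin edge-orientation histogram feature.
g1 = make_stage([0.8, -0.4, 1.2])
gamma = g1([0.2, 0.5, 0.3])  # scalar stage output in [-1, 1]
```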
Each step 400 is a linear combination (represented by the scale-and-sum block 504) of one or more of the stages 500. In other words, the scalar values γ of the stages 500 are scaled and summed to produce the decision step value s(X). This may be represented mathematically as follows:

s(X) = Σi αi·gi(X), where αi > 0 and Σi αi = 1   (6)

Because s(X) (i.e., the decision step value) is a convex combination of stage outputs, s(X) and g(x) have the same range. The weighting coefficients αi are chosen by a learning algorithm during training. The decision step value is compared to one or both of an acceptance threshold τa and a rejection threshold τr (represented by block 506) to determine whether to accept the object as a whole human body, reject the object as a whole human body, or forward the decision to the next step 400. The comparison may be represented as follows:

Accept if τa < s(X) ≤ 1
Forward if τr < s(X) ≤ τa   (7)
Reject if −1 ≤ s(X) ≤ τr
FIG. 5B illustrates an example of acceptance and rejection thresholds on the range [−1, +1]. The acceptance threshold τa and rejection threshold τr are chosen during training by a learning algorithm, based on user-specified false positive and false negative rates. Each step 400 may have values of τa and τr that differ from, or equal, those of the other steps 400. A decision to "accept" means that the classifier is confident that the input pattern X belongs to the positive class (e.g., a whole human body). A decision to "forward" means that the classifier is unsure and defers the decision to the next step. A decision to "reject" means that the classifier is confident that the input pattern X belongs to the negative class (e.g., not a whole human body). At any step, if the decision is not "forward," the accept/reject decision is made at that point and the evaluation is complete. The decision step value may correspond to a decision confidence for the corresponding step 400. For example, compared to a decision step value slightly above the acceptance threshold τa, a decision step value close to 1 may indicate that the corresponding step 400 is more confident that the object is a whole human body. Alternatively, an increase in the decision step value may not necessarily correspond to a higher decision confidence (i.e., a higher probability that the decision is correct). Because objects are classified correctly and incorrectly at different decision step values, the confidence associated with each decision step value can be estimated empirically during training; the confidence of decision step values is described in more detail below. As noted above, the last step of the classifier (step N) is forced always to accept or reject:
Accept if 0 < s(X) ≤ 1
Reject if −1 ≤ s(X) ≤ 0   (8)
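The following Python sketch makes equations (6)-(8) concrete. The class and parameter names are illustrative, the stage functions are supplied as callables (for example, those produced by make_stage above), and the whole is a sketch of the described behavior rather than the patented implementation.

```python
import numpy as np

ACCEPT, REJECT, FORWARD = "accept", "reject", "forward"

class DecisionStep:
    def __init__(self, stages, alphas, tau_a, tau_r):
        assert tau_r < tau_a
        self.stages = stages              # stage functions g_i, each -> [-1, 1]
        self.alphas = np.asarray(alphas)  # convex weights: alpha_i > 0, sum == 1
        self.tau_a, self.tau_r = tau_a, tau_r

    def step_value(self, features):
        # Equation (6): s(X) = sum_i alpha_i * g_i(X); convex, so s(X) in [-1, 1].
        return float(sum(a * g(f) for a, g, f in
                         zip(self.alphas, self.stages, features)))

    def decide(self, features, is_last=False):
        s = self.step_value(features)
        if is_last:                       # equation (8): final step must commit
            return (ACCEPT if s > 0 else REJECT), s
        if s > self.tau_a:                # confident member
            return ACCEPT, s
        if s <= self.tau_r:               # confident non-member
            return REJECT, s
        return FORWARD, s                 # unsure: defer to the next step
```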
FIG. 6 is a flowchart depicting a classification method 600 used by the camera system, according to one embodiment. By way of example only, the method 600 is described with reference to the camera system 100; it is operable with any suitable camera system. First, an image of an object is captured by one of the image capture devices 102 (step 602), and the object is detected by the object detection module 204 (step 604). The input pattern X is transmitted to the first step 400 of, for example, the whole body classifier 300 (step 606). Alternatively, rather than transmitting the entire pattern X to the first step 400, only the combined features used by the stages 500 of the first step 400 may be selected and transmitted. The features used by stages 1 through S are identified for the input pattern X and selected from it (step 608). The selected features are supplied to their respective stages 500, and the stages 500 map the selected features to scalar values (step 610). The scalar values are scaled (i.e., weighted) and summed to produce a decision step value s(X) (step 612). The decision step value is compared to one or more of the acceptance threshold τa and the rejection threshold τr (step 614). If the decision step value is greater than the acceptance threshold τa, the object is accepted as a member of the object class (e.g., as a whole human body) (step 616). If the decision step value is less than or equal to the rejection threshold τr, the object is rejected as a member of the object class (step 618). If the decision step value is greater than the rejection threshold τr and less than or equal to the acceptance threshold τa, the input pattern X (or, alternatively, only the feature combinations used by the second step 400) is forwarded to the second step 400 (step 620). An object may be accepted or rejected as a member of the object class at any step 400 in the cascade.
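Continuing the sketch, the cascade walk of method 600 is a loop over the steps; the per-step feature selection of step 608 is assumed to have been done upstream, so features_per_step holds, for each step, the feature values its stages consume.

```python
def run_cascade(steps, features_per_step):
    """Walk the cascade of DecisionStep objects until an accept/reject."""
    for i, (step, feats) in enumerate(zip(steps, features_per_step)):
        decision, s = step.decide(feats, is_last=(i == len(steps) - 1))
        if decision != FORWARD:
            return decision, s, i  # accept/reject can occur at any step
    raise RuntimeError("unreachable: the last step never forwards")
```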
The input pattern X may be supplied simultaneously to all the classifiers of the object classification module 210, each of which accepts or rejects the object as a member of its corresponding class. If more than one object classifier accepts the object as a member of its class, the decision step values produced by the accepting classifiers may be compared. For example, the whole body classifier 300 and the vehicle classifier 304 might each determine that an object is, respectively, a whole human body and a vehicle. In that case, the decision step values produced at the steps that accepted the object are compared, and the object is assigned the class of the classifier with the most confident (e.g., largest) decision step value. For example, if the decision step value of the whole body classifier corresponds to a 90% confidence and the decision step value of the vehicle classifier corresponds to an 80% confidence, the object is classified as a whole human body. If none of the classifiers declares a positive output (i.e., the object is not accepted as a member of any class), the object may be classified as unknown.
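This arbitration can be sketched as follows, reusing run_cascade above. Using the raw decision step value as the confidence proxy is a simplification: as discussed later, an empirically estimated confidence may be compared instead of the raw value.

```python
def classify_all(classifiers, features_by_class):
    """Run every classifier; pick the most confident accepting class, else unknown."""
    accepted = {}
    for name, steps in classifiers.items():
        decision, s, _ = run_cascade(steps, features_by_class[name])
        if decision == ACCEPT:
            accepted[name] = s
    if not accepted:
        return "unknown", None          # no classifier declared a positive output
    best = max(accepted, key=accepted.get)
    return best, accepted[best]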
Tracking performed by the object tracking module 206 may be taken into account when the object classification module 210 determines the classification of an object. FIG. 7 is a flowchart illustrating an object tracking method 700. By way of example only, the method 700 is described with reference to the object tracking module 206 used in conjunction with the object classification module 210; the method 700 may operate in any suitable system. The method 700 associates an image of an object with a previous instance of the object (block 702) and recalls the classification and classification confidence associated with that previous instance. The object classification module 210 determines whether the object was previously classified with high confidence (block 704). The confidence level considered high may be predetermined by the user (e.g., 70% confidence or higher). If the object was not previously classified with high confidence, the full set of object classifiers is executed (block 705). If the object was classified with high confidence as a previously declared class, a portion of the classifier corresponding to that class may be executed (block 706) instead of all of the classifiers. For example, only the first step 400 of the classifier corresponding to the previously declared class may be evaluated. The output of the first step 400 is checked for consistency with the previously declared class (block 708). If the output is consistent with the previously declared class, no further evaluation is needed: the class of the object for the current image is determined, and the history of the object is updated and stored in the metadata database 112 (block 710). If, on the other hand, the decision of the first step 400 is to reject the object as a member of the object class, one or more of the other object classifiers are executed (block 705). If the output of the first step 400 is to forward, one or more of the subsequent steps 400 are evaluated until the output is either consistent or inconsistent with the previously declared class.
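A sketch of this confidence-gated shortcut, reusing the functions above, follows. The 0.7 threshold is an illustrative stand-in for the user-configurable high-confidence level, and the control flow is a simplification of method 700.

```python
HIGH_CONF = 0.7  # assumed user-configurable (e.g., 70% confidence or higher)

def classify_tracked(prev_class, prev_conf, classifiers, features_by_class):
    if prev_class in classifiers and prev_conf is not None and prev_conf >= HIGH_CONF:
        steps = classifiers[prev_class]
        # Block 706: evaluate only the first step of the previous class's classifier.
        decision, s = steps[0].decide(features_by_class[prev_class][0])
        if decision == ACCEPT:           # consistent with prior class: stop early
            return prev_class, s
        if decision == FORWARD:          # evaluate later steps of the same classifier
            decision, s, _ = run_cascade(steps, features_by_class[prev_class])
            if decision == ACCEPT:
                return prev_class, s
    # Block 705: previous classification rejected or low confidence: run everything.
    return classify_all(classifiers, features_by_class)
```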
Training object classifier
One way to train the object classifiers of the object classification module 210 is as follows. Existing classification systems are often trained using the AdaBoost learning algorithm or some variation of it. Although AdaBoost has proven its value in some applications, the algorithm and the objective function used in that learning method have limitations. For example, for AdaBoost to be effective, members and non-members of an object class cannot significantly overlap each other in feature space; in other words, the features should separate the classes relatively well. Moreover, because AdaBoost builds on weak learners, a large number of weak learners may be necessary to form a combined classifier that achieves the desired accuracy.
According to one embodiment, an alternative objective function and learning algorithm, called Sequential Discriminant Error Minimization (SDEM), is preferably employed to train the object classifiers of the object classification module 210. SDEM was proposed in Saptharishi, "Sequential Discriminant Error Minimization: The Theory and Its Application to Real-Time Video Object Recognition" (Carnegie Mellon University, 2005), the entire contents of which are incorporated herein by reference. SDEM can handle features that do not necessarily separate the classes well in feature space. Unlike AdaBoost and other similar boosting techniques, SDEM may employ discriminant functions that are not necessarily weak learners. Accordingly, the number of steps 400 and stages 500 of an object classifier trained with SDEM may be significantly smaller than with AdaBoost. For a given feature, SDEM often learns the best classifier for the corresponding feature space, and the best features of an object may be selected automatically for a given classification problem.
In general, the SDEM algorithm is employed to train the combination of stages 500 that forms each step 400. As defined in equation (4), a stage 500 includes a stage function gi(X) that is equal to a transformation ti applied to a feature f̂i(R). The training task is to choose the best transformation t and the best feature f̂ such that, when a particular stage 500 is added to a step 400, the performance of the object classifier is maximized. In other words, the SDEM algorithm selects the transformation t and the feature f̂ for a particular stage so as to maximize an objective function. The set of features F may be finite, and the set of transformations T may be continuous and differentiable. Using the SDEM algorithm, for each feature f̂ a search is performed over the set T to identify the transformation t that performs best on a training data set. The search over T is performed using standard unconstrained optimization techniques, such as, but not limited to, a quasi-Newton optimization method. Once the best transformation t is identified for each feature f̂, the best feature may be selected based on an estimated generalization error ε̂. The choice of the best feature may be written as follows:

f̂* = arg min over all f̂ ∈ F of ε̂(f̂, t*(f̂))   (9)

where t*(f̂) denotes the best transformation found for the feature f̂.
one property of the SDEM algorithm is that: when stage 500 is added to stage 400, the addition of stage 500 improves the performance of the object classifier on the training data set. If a new phase is not identified as improving the performance of the object classifier, the SDEM algorithm is automatically terminated. Alternatively, rather than waiting for the SDEM algorithm to automatically terminate, some of the stages 500 of the one-step stage 400 may be user-determined. In other words, the SDEM algorithm terminates training when a maximum number of phases 500 set by the designer are reached or when no phases 500 are added that will improve performance.
The SDEM algorithm selects a series of feature/transformation pairs such that, when combined, the combination outperforms any single one of the pairs. For example, although the aspect ratio of an object may be a poor classification feature on its own, when combined with local gradient information, the aspect ratio may improve the classification accuracy achievable with local gradient information alone. A number of simple features and transformations may be combined to produce an extremely accurate object classifier. In effect, the training task creates super-features by combining a set of appearance features of an object.
A training method 800 for building the stages 500 of the first step 400 will now be described in more detail with reference to the flowchart of FIG. 8; the following description is equally applicable to the stages 500 of the other steps 400. The training data set for a classifier represents both members and non-members of the particular object class. For example, to train the whole body classifier 300, the training data set includes images of whole human bodies and images of other objects. Features of the objects are extracted from the training data set. The objects of the training data set are manually labeled by a user as members or non-members of the particular object class, resulting in labeled objects 802. The features f̂1, f̂2, …, f̂n of each labeled object 802 are identified and extracted (steps 804a, 804b, and 804c). Each feature f̂i is used to train a separate stage, and the stage that maximizes the value of the objective function is selected. Any number of features may be employed. For example, of M (e.g., M = 60) features, one may be the aspect ratio and the other M − 1 features may be vectors of size B corresponding to edge orientation histograms with B bins computed for M − 1 different regions of a labeled object's image.
After the features f̂i are extracted from the labeled objects 802, the best transformation t is selected for each feature (steps 806a, 806b, and 806c). The transformation may be selected using standard optimization techniques. A transformation t may be viewed as a decision boundary that separates the labeled objects in feature space; the best transformation t thus corresponds to the decision boundary that best separates the members and non-members of the object class. In terms of the discriminant function formed by the weight vector wi and the activation function βi, selecting the best transformation t corresponds to selecting the activation function βi and weight vector wi that best separate the members and non-members of the object class. The activation function βi may be selected from a set of multiple function types, such as, but not limited to, a sigmoid function and a radial basis function (e.g., a Gaussian function).
After a transformation has been selected for each feature, the value of the objective function corresponding to each feature/transformation combination is calculated (steps 808a, 808b, and 808c). The objective function may be proportional to a measure of the classification error, or may be some nonlinear but monotonic function of the classification error. The calculated values of the objective function reflect the number and/or severity of the classification errors made by the different feature/transformation combinations. For example, a first calculated value may correspond to the number of classification errors made with the feature f̂1 and its transformation. The calculated values of the objective function are compared, and the feature/transformation combination with the largest calculated value is selected for the first stage 500 of the first step 400 (step 810).
After the feature and transformation are selected for the first stage 500, the labeled objects 802 are reweighted so that their weights take into account the decisions made by the first stage 500 (step 812). Objects may be weighted as a function of how close their corresponding data points in feature space are to the decision boundary represented by the first stage 500. For example, objects whose data points lie close to the decision boundary may be weighted more heavily than objects whose data points lie far from it, so that a second stage 500 can be trained with emphasis on the objects about which the first stage 500 is somewhat confused. The distance between an object's data point and the decision boundary of the first stage 500 may be related to the scalar value γ computed for that object.
The training method 800 is then repeated for the next stage. After the labeled objects are reweighted, the best transformation is again selected for each feature (steps 806a, 806b, and 806c are repeated). This time, however, the objects are weighted, and the best transformation t for each feature is selected taking the first stage 500 into account; the best transformation t is the one producing the largest increase in the value of the objective function. The values of the objective function are recalculated and compared to determine the feature/transformation combination for the second stage 500 (steps 808a, 808b, 808c, and 810 are repeated). To build a third stage 500, the labeled objects are reweighted once more, with higher weights given to the objects that are somewhat confusing to the first and second stages 500. Steps 806a, 806b, and 806c are repeated again with the reweighted objects, and the best transformation t for each feature is selected taking both the first and second stages 500 into account. The values of the objective function are again calculated and compared to determine the feature/transformation combination for the third stage 500 (steps 808a, 808b, 808c, and 810 are repeated). Iterating the selection of the best feature/transformation combination and the reweighting of the labeled objects for each new stage may be viewed as gradient ascent in function space, or as a process of steadily increasing the total value of the objective function.
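The per-stage selection loop of steps 806-810 can be sketched as follows. This is a conceptual stand-in rather than the SDEM implementation: the weighted logistic fit below substitutes for the quasi-Newton search over the transformation set T, weighted accuracy substitutes for the objective function, and all names are illustrative.

```python
import numpy as np

def fit_best_transform(feature_values, labels, sample_weights):
    """Stand-in for searching T: weighted gradient ascent on a sigmoid discriminant."""
    w = np.zeros(feature_values.shape[1]); b = 0.0
    for _ in range(200):
        p = 1.0 / (1.0 + np.exp(-(feature_values @ w + b)))  # predicted P(member)
        grad = sample_weights * (labels - p)                 # weighted logistic gradient
        w += 0.1 * feature_values.T @ grad / len(labels)
        b += 0.1 * grad.mean()
    return w, b

def objective(w, b, feature_values, labels, sample_weights):
    """Stand-in objective: weighted count of correct classifications."""
    p = 1.0 / (1.0 + np.exp(-(feature_values @ w + b)))
    correct = (p > 0.5) == labels.astype(bool)
    return float(np.sum(sample_weights * correct))

def select_stage(features_by_name, labels, sample_weights):
    best = None
    for name, values in features_by_name.items():            # steps 806: best t per feature
        w, b = fit_best_transform(values, labels, sample_weights)
        score = objective(w, b, values, labels, sample_weights)  # steps 808: score each pair
        if best is None or score > best[0]:
            best = (score, name, w, b)                        # step 810: keep the best pair
    return best
```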
Once the first step 400 is trained, the thresholds τa and τr are selected so that the desired false positive and false negative rates are achieved. Moreover, as the stages 500 are constructed for the first step 400, the weighting coefficients α are also selected. For example, as each stage 500 is added to the first step 400, the weighting coefficient α for that stage is adjusted to find the value that yields the lowest overall error rate for the first step 400. The weighting coefficient α may be selected, for example, by applying a line search optimization strategy.
After the first step 400 is trained, the stages 500 of a second step 400 may be trained. However, the training data used to train the second step 400 is a subset of the training data set used to train the first step 400. This subset corresponds to the labeled objects that the first step 400 could neither accept nor reject as members of the object class. In other words, the second step 400 is trained on the labeled objects whose decision step values are greater than the rejection threshold τr and less than or equal to the acceptance threshold τa. This allows the second step 400 to focus only on those objects that confuse the first step 400.
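Continuing the earlier sketch, the second step's training subset can be formed by keeping only the labeled objects that the trained first step forwards, so each later step specializes on the objects that confuse its predecessors.

```python
def forwarded_subset(step, features, labels):
    """Keep only labeled objects with tau_r < s(X) <= tau_a at the given step."""
    keep_feats, keep_labels = [], []
    for f, y in zip(features, labels):
        decision, _ = step.decide(f)
        if decision == FORWARD:          # confusing to this step: train the next on it
            keep_feats.append(f)
            keep_labels.append(y)
    return keep_feats, keep_labels
```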
Because the training method 800 proceeds stepwise, the N steps of the classifier 300 are effectively ordered so that, on average, only the minimum number of steps needed to effect a classification is executed. As a result, classification performed by a deployed, trained, in-field system should require minimal execution time to output a classification and minimal processing power to generate it.
Once an object classifier has been trained with labeled objects, the object classifier can be taken through further training to improve the feature/transformation combinations selected for its various stages. A high-level training process for an object classifier is shown in the flowchart of FIG. 9, which illustrates a training method 900. Image data 901 (e.g., raw video data) is supplied to a simple base, or seed, system capable of basic detection, tracking, and classification of objects. The base system detects, tracks, and classifies the objects represented in the image data 901 and generates metadata corresponding to the objects (step 902). The base system selects a set of the objects it detects and tracks (step 904). The selection of an object may depend on the amount of time the object was in the field of view of an image capture device 102, or on how confident the base system was in its classification of the object. Other rules may be specified to dictate whether an object is selected by the base system.
The images of the objects selected by the base classifier are presented to the user on a display so that the user can manually label the objects as members or non-members of the particular object class for which the object classifier is being trained. The user at the user interface 104 manually labels the objects, and the labeled objects are supplied to the object classifier being trained (step 906). The objects manually labeled by the user may correspond to the labeled objects described above. The object classifier is trained with the labeled objects, for example according to the training method 800 described above with reference to FIG. 8. The trained object classifier classifies the objects represented in the image data 901 and generates metadata representing the classes of the objects (step 910). Each object classified by the trained object classifier has a classification confidence associated with it; the classification confidence corresponds to the decision step value of the step 400 that classified the object as a member or non-member of the object class. The classification confidences generated by the trained object classifier are analyzed to identify objects that are confusing to it (e.g., objects with low classification confidence). The performance of the trained object classifier is then evaluated to determine whether it is acceptable (step 912).
To determine whether the performance of the object classifier is acceptable, a disjoint test set may be used, in which the classes of the objects are known prior to classification by the trained object classifier. The image data 901 supplied to the trained object classifier may correspond to the disjoint test set, and the classifications made by the trained object classifier can be compared to the actual classes of the objects. From this comparison, the performance of the trained object classifier can be determined. If the performance is not equal to or above a predetermined performance level, the confusing objects are presented to the user for manual labeling (step 904). The user labels the confusing objects, and the newly labeled objects are used to retrain the object classifier (steps 906 and 800). When the object classifier is retrained, the feature/transformation combinations of the various stages 500 may be updated based on the newly labeled objects. The retrained object classifier is then used to classify the objects represented in the image data 901, and its performance is evaluated (steps 910 and 912). The retraining process may continue until the performance of the retrained object classifier is acceptable, at which point it may be deployed (step 914). The training process can be restated as the following steps (a code sketch of this loop follows the list):
1. Manually label a small portion of a data set.
2. Train an object classifier using the labeled portion of the data set.
3. Apply the newly trained classifier to automatically label the complete data set.
4. Select a set of automatically labeled data points that are confusing to the object classifier.
5. Manually label the confusing data points.
6. Repeat the training with all newly labeled data points.
7. Go to step 3.
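A compact sketch of this seven-step loop appears below. Every function and method name (label_manually, train, sample_small_portion, confidence_is_low, performance_ok) is a placeholder for the operations described in the text, not a real API.

```python
def bootstrap_training(dataset, label_manually, train, performance_ok):
    labeled = label_manually(dataset.sample_small_portion())   # step 1
    classifier = train(labeled)                                # step 2
    while True:
        auto_labeled = classifier.label(dataset)               # step 3
        confusing = [x for x in auto_labeled
                     if x.confidence_is_low()]                 # step 4
        labeled += label_manually(confusing)                   # step 5
        classifier = train(labeled)                            # step 6
        if performance_ok(classifier):                         # deploy when acceptable
            return classifier                                  # (step 914 of FIG. 9)
```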
Classification confidence
The decision step value s(X) is correlated with classification confidence. The correlation may be nonlinear: a step 400 may produce a high positive value even though the object is not a member of the object class. Because of the training method, however, the higher the value of s(X), the less likely it is that the step 400 has made a mistake. The confidence associated with a value of s(X) may be computed by first defining an indicator function ε(X) that equals 1 when the step's declared class disagrees with the object's true class ω (i.e., the classification is in error), and equals 0 when the classification is correct.
A confidence function Ψ(X) may be defined as the probability that the step 400 declares an object to belong to the positive class and is correct, for an output s(X) = v. Thus, for a small quantization interval [v − Δ, v + Δ], the confidence function can be expressed as follows:

Ψ(X) = P(ε(X) = 0, ω = +class | s(X) < v + Δ) − P(ε(X) = 0, ω = +class | s(X) < v − Δ)   (11)
Note that the step 400 may be considered to declare an object as belonging to the positive class when s(X) > 0, i.e., P_{Ω|S}(ω = +class | s(X) > 0) = 1. Thus, for v > 0, equation (11) can be represented as:
Ψ(X) = P_{Γ|Ω,S}(Γ(X) = 0 | ω = +class, 0 < s(X) ≤ v + Δ) − P_{Γ|Ω,S}(Γ(X) = 0 | ω = +class, 0 < s(X) ≤ v − Δ)    (12)
Equation (12) represents the true positive rate when v ∈ [Δ, 1 − Δ] and s(X) ∈ [v − Δ, v + Δ].
Similarly, the confidence function for the step 400 declaring an object as belonging to the negative class, for v ≤ −Δ, may be expressed as:
Ψ(X) = P_{Γ|Ω,S}(Γ(X) = 0 | ω = −class, v + Δ ≤ s(X) ≤ 0) − P_{Γ|Ω,S}(Γ(X) = 0 | ω = −class, v − Δ < s(X) ≤ 0)    (13)
Equation (13) represents the true negative rate when v ∈ [−1 + Δ, −Δ] and s(X) ∈ [v − Δ, v + Δ]. Therefore, if the probability that the step 400 is correct for an observed output value s(X) = v (as defined in equations (12) and (13)) is high, the step 400 can be relied upon for its answer. For this self-assessment of confidence, a probability measure is estimated from the training data set, and the confidence function Ψ(X) is inferred from it. If the confidence function Ψ(X) for an output value of s(X) is below a predetermined confidence threshold c, the step 400 is considered to be indecisive or confused with respect to that output value, and the corresponding classification is forwarded to the next step 400. Thus, the confidence function Ψ(X) may be applied during training to identify objects that are confusing to an object classifier. If the discriminant functions that constitute the stages 500 are good approximations of the optimal Bayes decision boundary, the decision step value s(X) will be monotonically related to the confidence function Ψ(X). For the initial steps 400, however, the object classifier may not yet approximate the Bayes decision boundary sufficiently well. Thus, the decision step value s(X) and the confidence function Ψ(X) for a given step 400 may not always be monotonically related.
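The confidence function lends itself to a simple empirical estimate: quantize the observed decision step values and, within each interval [v − Δ, v + Δ], count how often the step's declaration (the sign of s(X)) matched the true class. The sketch below is one reading of equations (11) through (13) and is illustrative only; the bin width and the +1/−1 label encoding are assumptions:

```python
import numpy as np

def estimate_confidence(decision_values, true_labels, delta=0.05):
    """Empirical confidence function Psi(v): for each quantization interval
    [v - delta, v + delta], the fraction of training objects whose declared
    class (sign of s(X)) matched the true class omega."""
    decision_values = np.asarray(decision_values)   # s(X) values in [-1, 1]
    true_labels = np.asarray(true_labels)           # +1 positive, -1 negative
    centers = np.arange(-1 + delta, 1, 2 * delta)
    psi = {}
    for v in centers:
        in_bin = (decision_values > v - delta) & (decision_values <= v + delta)
        if not in_bin.any():
            continue  # no training data observed near this output value
        declared = np.where(decision_values[in_bin] > 0, 1, -1)
        psi[round(float(v), 3)] = float(np.mean(declared == true_labels[in_bin]))
    return psi  # maps an output value v to an estimated confidence Psi(v)
```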
The confidence function Ψ(X) may be employed to determine the acceptance threshold τa and the rejection threshold τr for the different steps 400. Unlike other cascade classifier architectures, if the confidence function Ψ(X) for the positive class increases monotonically with the decision step value s(X), the acceptance threshold τa is chosen such that a desired true positive rate is met. If the confidence function Ψ(X) does not increase monotonically with the decision step value s(X), the acceptance threshold τa may remain saturated at 1, i.e., no objects are accepted into the positive class at the corresponding step 400. A lack of monotonicity indicates that, in the positive region, the decision boundary does not yet reflect the optimal Bayes classifier sufficiently well. Similarly, the rejection threshold τr is selected if the confidence for the negative class is monotonically related to the decision step value s(X). In practice, the negative class may populate the feature space more densely than the positive class. Thus, while a monotonic relationship may not exist for the positive class at the initial steps, it is highly likely to exist for the negative class even at the initial steps.
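Under that description, selecting the acceptance threshold reduces to a scan over the empirical confidence estimate from the previous sketch, with saturation when monotonicity fails. The target rate and the monotonicity test below are assumptions introduced for illustration:

```python
def choose_acceptance_threshold(psi, target_rate=0.99):
    """Pick tau_a as the smallest positive output value whose estimated
    confidence meets the target rate, but only if Psi is monotonically
    non-decreasing over the positive region; otherwise saturate tau_a at 1
    so that the step accepts no objects into the positive class."""
    positive = sorted((v, p) for v, p in psi.items() if v > 0)
    confidences = [p for _, p in positive]
    monotone = all(a <= b for a, b in zip(confidences, confidences[1:]))
    if not monotone:
        return 1.0  # decision boundary not trusted in the positive region
    for v, p in positive:
        if p >= target_rate:
            return v
    return 1.0  # target rate never reached; accept nothing at this step
```

The rejection threshold τr can be chosen by the mirror-image scan over the negative region.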
Run-time classifier evolution
Thus far, the classifier model, its classification operation, and its offline active learning have been described. Next, the online evolution of the classifier is described. Run-time classifier evolution is similar to the offline active learning approach described above and illustrated in FIG. 9. Run-time evolution comprises the following steps: (1) collecting user feedback; (2) training a new additional step for the classifier; (3) validating the classifier via passive observation; (4) requesting user verification and deploying the classifier if it passes passive verification; and, if possible, (5) uploading the specialized classifier, with performance statistics and site information, to a central feedback server so that a more general classifier absorbing the specializations can be produced and trained.
FIG. 10 illustrates a method 1000 for collecting feedback from a user operating the user interface 104. In many security and surveillance systems, a user, such as a security officer, responds to an alarm and acknowledges its receipt. If an alarm is considered false, it may be documented as a false alarm or simply ignored. A preferred embodiment of a system with run-time classifier evolution has the user explicitly notify the system when an alarm is a false alarm. Accordingly, the method 1000 presents an alert to the user (step 1002). The alert presented to the user includes the classification result (i.e., the class of the object as determined by the classifier) and the video data in which the object appears. The user may enter an acceptance or rejection of the classification. The method 1000 receives the user feedback (step 1010) and determines whether the classifier misclassified the object (step 1012). If so, the method 1000 collects the set of features used for the classification and stores it as "error metadata" (step 1014). When the number of errors exceeds a predetermined value, the method 1000 may initiate a corrective "specialization" training procedure.
An error may take the form of a false positive, meaning that the classifier incorrectly accepted an object as part of the positive class at a previous step of the cascade (e.g., the classifier classified a non-human as a human). An error may also take the form of a false negative, meaning that the classifier rejected an object as a non-member of an object class when the object was actually a member of that class (e.g., the classifier failed to classify a human as a human). For example, a classifier may assign a "suspect" or "unknown" class to an object that it cannot classify with sufficient confidence. If the object is actually a human or vehicle or the like, the user may indicate the error.
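The error metadata can be pictured as a small record per misclassification. The fields below are assumptions chosen to match the uses described later (per-camera and per-site specialization, performance statistics), not a format specified herein:

```python
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    FALSE_POSITIVE = "false_positive"   # non-member accepted into the class
    FALSE_NEGATIVE = "false_negative"   # member rejected from the class

@dataclass
class ErrorRecord:
    """One unit of "error metadata" collected from user feedback (step 1014)."""
    error_type: ErrorType
    declared_class: str       # class the classifier reported (e.g., "human")
    features: list[float]     # feature set used for the classification
    camera_id: str            # enables camera specialization
    site_id: str              # enables site specialization
    timestamp: float
```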
One specialization training procedure adds an additional step at the end of the steps that make up the classifier's cascade and trains the new step to separate false alarms from valid or "true" alarms, as indicated by the user feedback. The step added to the classifier may be referred to as a "specialization step." In a sense, the specialization step helps the classifier become more specific to the objects presented to that particular classifier, given its site, camera, and so on.
According to one embodiment, specialization takes one of two forms: (1) site specialization; and (2) camera specialization. In this embodiment, the specialization steps are trained using false alarm errors. Thus, as shown in FIG. 11, when the classifier 300 has made an erroneous positive classification at one of the steps 1 through N of its cascade, the false positive is sent from the classifier 300 to a site specialization step 1110 and then, if necessary and if present, to a camera specialization step 1120.
The site specialization step 1110 is a general step that is trained to reduce false alarms using features already computed as part of the operation of the general classifier 300. The data used to train the site specialization step 1110 is site-specific. Thus, a site-specific classifier 1130 (i.e., a classifier modified or augmented to include the site specialization step 1110) may not operate with increased accuracy at a different site.
The camera specialization step 1120 is a step that is trained to reduce false alarms for a particular camera only. If the site-specific classifier 1130 fails to sufficiently reduce the number of false alarms, a camera-specific classifier 1140 can be trained.
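Architecturally, a specialization step is one more step appended to the cascade, trained on accumulated user feedback. The sketch below illustrates this reading and reuses the ErrorRecord type from the earlier sketch; the train_stage routine and the cascade interface are assumptions:

```python
def add_specialization_step(cascade, error_records, true_alarm_features,
                            train_stage, camera_id=None):
    """Append a specialization step trained to separate false alarms
    (user-rejected) from true alarms (user-confirmed). With camera_id=None,
    a site specialization step is built from all errors at the site;
    otherwise only the named camera's errors are used."""
    if camera_id is not None:
        error_records = [e for e in error_records if e.camera_id == camera_id]
    false_alarm_features = [e.features for e in error_records
                            if e.error_type is ErrorType.FALSE_POSITIVE]
    new_step = train_stage(positives=true_alarm_features,
                           negatives=false_alarm_features)
    cascade.steps.append(new_step)  # the cascade grows by one step at its end
    return cascade
```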
FIG. 12 is a flow diagram of a specialization training method 1200 that includes active verification. After a sufficient number of errors has been collected (steps 1202 and 1204), a site specialization step is added and trained (step 1206). The performance of the site-specific classifier is verified via a verification method (step 1208). If its performance is acceptable (step 1210), i.e., if the error rate is sufficiently lower than that of the generic classifier, the method proceeds to step 1218. However, if the error rate is not sufficiently reduced, a camera specialization step is added for each camera that is a source of errors (step 1212). The performance of the camera-specific classifier is verified via a verification method (step 1214). If its performance is acceptable (step 1216), the method proceeds to step 1218. If the net error rate has not been sufficiently reduced, user feedback collection continues at step 1202. If an improved classifier has been constructed, any previously trained specializations are tested (step 1218) to see whether they are consistent with the new specialization. If there is a consistent previously trained specialization, it is selected (step 1220) and directed to the passive verification step 1224 of the method 1200. Otherwise, the newly trained specialization is selected (step 1222) and passively verified in step 1224 of the method 1200. If the new specialization is validated and may be deployed (step 1226), it is added to a database of specialization steps (step 1228) and actually deployed (step 1230). The storing step 1228 is advantageous because different specializations may be needed for different seasons of the year or different configurations of the scene under surveillance. Thus, a previously deployed specialization is likely to be reapplied at a later time.
Two different validation operations may be performed before deploying a classifier. The first, passive verification, compares the feedback provided by the user when confirming alarms against the decisions of the specialized classifier. If the specialized classifier agrees with the user more closely than the deployed classifier does, the specialized classifier is considered valid and may then undergo active verification, the second type of verification. During active verification, the system actively presents the specialized classifier's disagreements to the user, displaying false alarms rejected by the specialized classifier and/or true alarms rejected by the specialized classifier. The user selects which errors are acceptable and which are not. The system then attempts to adjust the rejection threshold τr and/or the acceptance threshold τa of the specialized classifier so that the user's preferences are most closely achieved. If the performance goal is not achieved, the classifier is declared invalid and the data collection process continues. Otherwise, the specialized classifier is deployed.
FIG. 13 illustrates a passive verification method 1300 in greater detail. The method 1300 provides alerts from a camera 102 to both the user and the specialized classifier (steps 1302 and 1304). The user at the user interface 104 accepts or rejects each alert, and this user feedback is received by the method 1300 (step 1310). Similarly, the specialized classifier accepts or rejects, as a member of the class, each alerted object, and those decisions are received by the method 1300 (step 1340). The method 1300 automatically compares the user feedback on the alerts with the decisions made by the specialized classifier (step 1350). If the specialized classifier is not more consistent with the user than the deployed classifier (step 1360), it continues to be refined (step 1370). If the specialized classifier is more consistent with the user than the deployed classifier (step 1360), the specialized classifier is considered valid and proceeds to an active verification operation (step 1380). Improved consistency means that the specialization is able to reject most false alarms without rejecting true alarms. Only a more consistent specialization is forwarded to the active verification step 1380 and eventual deployment (step 1390).
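The passive comparison of step 1350 reduces to counting agreement with the user over the same alerts. A minimal sketch, with the per-alert boolean verdict lists assumed for illustration:

```python
def more_consistent_with_user(user_verdicts, specialized_verdicts,
                              deployed_verdicts):
    """Return True if the specialized classifier agrees with the user's
    accept/reject feedback on more alerts than the deployed classifier
    does (step 1360). All three sequences are aligned per alert."""
    agree_spec = sum(u == s for u, s in zip(user_verdicts, specialized_verdicts))
    agree_depl = sum(u == d for u, d in zip(user_verdicts, deployed_verdicts))
    return agree_spec > agree_depl
```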
FIG. 14 shows an active verification method 1400 that actively engages the user in verifying a specialized classifier. The user engaged in the active verification method is preferably a supervisor, manager, or other senior person who is more skilled at catching errors or intentional attempts to disrupt the system. The method 1400 sorts the specialized classifier's disagreements into false positives and false negatives (step 1410). The method 1400 presents both to the user (steps 1420 and 1430). The user at the user interface 104 then classifies each error as acceptable or unacceptable. A false positive/false negative tradeoff is performed automatically by choosing τr appropriately (step 1440). Increasing τr increases the number of false negatives and decreases the number of false positives. The system aims to adjust τr so that the specialized classifier best satisfies the user. If the performance is not acceptable (step 1450), the classifier is declared invalid and the data collection process continues (step 1460). Otherwise, the specialized classifier is deployed (step 1470).
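The tradeoff in step 1440 amounts to a one-dimensional search over candidate values of τr against the user's acceptability judgments. The case encoding and scoring rule below are assumptions introduced for the sketch:

```python
def tune_rejection_threshold(cases, candidates):
    """Choose tau_r to minimize user-unacceptable errors (step 1440).
    Each case is (s, is_positive, fp_ok, fn_ok): the decision step value,
    the true class per the user, and whether the user would tolerate this
    object as a false positive or a false negative, respectively."""
    def unacceptable_errors(tau):
        count = 0
        for s, is_positive, fp_ok, fn_ok in cases:
            rejected = s < tau
            if rejected and is_positive and not fn_ok:
                count += 1   # false negative the user will not tolerate
            elif not rejected and not is_positive and not fp_ok:
                count += 1   # false positive the user will not tolerate
        return count
    # Raising tau_r trades false positives for false negatives; pick the
    # candidate that best matches the user's stated preferences.
    return min(candidates, key=unacceptable_errors)
```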
Through learning and/or specialization, the classifier adapts to its environment and changes automatically with it. A camera system with such a classifier may require little or no manual in-field geometric calibration or adjustment. This can result in substantial cost savings by reducing or eliminating the need for trained personnel to install or adjust the system, such as when seasons change or a camera is moved. A camera system that utilizes a classifier as described herein can be installed by virtually anyone familiar with camera installation.
Another benefit of an accurate classifier is that improved accuracy in classifying objects improves the quality of the feedback supplied to the object detection module and its components, such as the foreground/background separator described in the above-mentioned U.S. patent application 10/884,486, thereby further improving the performance of the overall system.
Another benefit is that feedback about accurate classifiers can be aggregated as it is collected from a variety of sites. In particular, if a specialization has been trained and the site specialization step provides a considerable performance improvement, the specialized classifier can be uploaded to a central feedback server. FIG. 15 shows a detailed feedback collection and generalization method 1500. Error metadata for the errors addressed by the specialization, if permitted by the user/site, is collected (step 1505), packaged with performance statistics and site information (step 1510), and provided to a feedback server via a network 1520 (step 1515). At the feedback server, the metadata and associated data are stored in a feedback database 1525. The method 1500 evaluates the performance of the specialized classifier using the error metadata and video data stored in a video database 1530 at, or accessible by, the feedback server. In this way, a specialized classifier can be used to automatically label large amounts of training data. Unlabeled data on which the generic classifier disagrees with the specialized classifier is presented to the user for labeling at a central training facility (not shown). A new generic classifier is then trained to agree with the specialized classifier on the patterns that it classifies correctly (step 1540). The specializations accumulated from multiple sites may be employed in a similar manner. If a new generic classifier can be trained to be more consistent with all of the uploaded specialized classifiers, the new generic classifier is distributed to all sites for possible deployment. Specifically, the method 1500 tests whether the new generic classifier is better than the previous one (step 1545). If so, it can be released as the new generic classifier for deployment at the sites (step 1550). If not, the specialized classifier is marked as a site template. When the performance of a specialized classifier is deemed site-specific, its decisions are compared to the stored site templates (step 1555). If there is a site template that is more consistent with the specialized classifier, that site template may be updated (step 1565). Otherwise, the specialized classifier is stored as a new site template (step 1560).
Optionally, the method 1500 may test whether the site template represents a seasonal improvement (step 1570) and, if so, schedule the specialization as a seasonal improvement (step 1575).
Central integration and distribution of feedback, together with new or updated classifiers or classifier parameters, enables integrated community feedback based on recognized errors. Data from customer sites that encounter similar problems can be integrated, and a new classifier can be trained and propagated. This broad-based feedback allows a wide collection of information to be included when training a new classifier; for example, information about false alarms from a variety of systems may be shared. In general, the metadata includes sufficient information for training the classifier without access to the original video data. When a new classifier is updated internally for one customer site, the new version of the classifier is also sent to other customers; for example, a new classifier may be deployed to customer sites determined to have similar sources of false alarms. As the number of deployed systems increases, the amount and quality of the feedback collected also increases, thereby enabling ever more accurate generic classifiers to be generated from the feedback.
Periodic updates can be pushed to all networked cameras, much as in a virus protection system. A classifier specialization can be regarded as a new object definition file. Each new definition file can be verified at a customer site using both the passive and active verification mechanisms. If the new definition file is verified, it is proposed to the user for deployment.
Learning a new generic object class uses the same procedure as the active learning method described above. Site-specific learning of a new object class works in the same way as the false alarm reduction method. In a typical scenario, the new class is a particular type of an existing, more general class. For example, a user may want to distinguish delivery trucks from other vehicles; a "delivery truck" is then a particular type of vehicle. A specialized architecture, such as that shown in FIG. 16 with a new object classifier 1610, may then be implemented as a separate classifier rather than as a modification of an existing class. The specialization method can be viewed as a false alarm reduction method in which vehicles that are not delivery trucks are treated as false alarms. The passive and active verification operations are preferably completed before a new classifier for the new object class is deployed.
Automatic calibration
A calibration module may be included in the video analysis module 200 (FIG. 2) to automatically update the representative sizes of the various classified objects in response to the classifications performed by the object classification module 210 during field operation. In turn, the calibration module may provide information representing the updated representative sizes to the object classification module to improve its classification performance.
FIG. 17 is a flow chart of a method 1700 that utilizes and updates a size function relating the size of an object of a given class to its position in the field of view. The size function is a parameterized function of position, such as a second-order polynomial in the X and Y coordinates. As classified objects 1702 belonging to a class with a generally fixed size for all members (e.g., adult human height) are made available by the object classification module, the method 1700 determines whether each was classified as a member of the class with high or low confidence (step 1710), preferably by checking the confidence estimates described above. If the confidence is low, the size function is applied to the object at its current location (step 1720), and the value obtained from the size function is compared to the actual size of the object in the image to determine whether they match sufficiently closely (step 1725). If so, the classification of the object 1702 as a member of the object class is confirmed as correct (step 1730). If the actual size does not sufficiently closely match the size computed by the size function, the method 1700 classifies the object as a non-member of the object class (step 1735). In either case, the size function is not changed when the classifier's confidence is low, since it would be imprudent to calibrate the size function using suspect data.
When the confidence is high, the size function can be updated (step 1740) using the actual size of the object as an additional data point for the expected size of objects of that class at the position where it appears in the image. The size function is updated by modifying its parameters, for example by a recursive least squares algorithm or the like. Thus, the next time a low-confidence object is presented, the updated size function will be applied to confirm or reject the classification. In this manner, the object classification is automatically calibrated during run-time operation using reliable field data.
The size function may be any parameterized function whose parameters can be determined and adjusted by a fit. For example, a height size function of the following form can be used:
height(x, y) = ax + by + c    (14)
Other functions, such as higher-order polynomials, can be used as desired. The parameters a, b, and c can be determined by a least-squares-error fit or other suitable criterion, preferably performed recursively, with an update each time step 1740 is performed.
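A recursive least squares update of the plane parameters in equation (14) can be sketched as follows. This is the standard RLS formulation offered as an illustration; the forgetting factor and the initialization constants are assumptions, not values given herein:

```python
import numpy as np

class HeightSizeFunction:
    """Recursive least squares fit of height(x, y) = a*x + b*y + c (eq. 14).
    Each high-confidence observation refines the parameters (step 1740)."""

    def __init__(self, forgetting=0.99):
        self.theta = np.zeros(3)      # parameters [a, b, c]
        self.P = np.eye(3) * 1e6      # large initial covariance: no prior
        self.lam = forgetting         # discounts stale observations

    def predict(self, x, y):
        return float(self.theta @ np.array([x, y, 1.0]))

    def update(self, x, y, observed_height):
        phi = np.array([x, y, 1.0])
        # Standard RLS gain, parameter, and covariance updates.
        k = self.P @ phi / (self.lam + phi @ self.P @ phi)
        self.theta = self.theta + k * (observed_height - phi @ self.theta)
        self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
```

A low-confidence object at (x, y) would then be checked against predict(x, y), within some tolerance, before its classification is confirmed (steps 1720 and 1725).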
It is also possible to estimate the expected size error using a selected size error function. The size error function error(x, y) is a parameterized function of the coordinates in the image field of view, resembling the size function, and is an estimate of the difference between the size function and the actual size. The size error function itself can be updated recursively each time the actual size of an object is measured. If a value returned by the size error function is too high (i.e., above a threshold), the size function may be invalid and should not be used to assist in the classification of objects classified with low confidence by the object classification module 210. The size error function can therefore serve as a self-check on the automatic calibration, to avoid mis-calibration. If there are many large errors for both high- and low-confidence objects, the calibration may be invalidated due to an external change such as camera movement. During the period when calibration is suspended, the method 1700 can continue to update the size and size error functions (i.e., the high-confidence branch on the right side of FIG. 17) until the size error becomes acceptable and the automatic confirmation/rejection of low-confidence objects (i.e., the low-confidence branch on the left side of FIG. 17) can resume.
An object height grid may optionally be constructed in the image plane, wherein for each grid cell the average height of an object is estimated without the aid of manual calibration. A polynomial fit may then be estimated to map the bottom position of an object to its top position, and vice versa. Over time, accurate object size estimates can be automatically generated for different portions of the scene, regardless of whether active user feedback is incorporated. In a preferred embodiment, a manual calibration step is not necessary to achieve accurate object size estimation. As accuracy improves, the confidence in the learned information increases, so that the object size estimates can be used to reduce false detections. Using the height information of self-verified and tracked objects, together with camera lens information, a complete set of camera parameters can be estimated and then used to estimate a ground plane and an image-to-real-world coordinate mapping. With sufficient confidence, this geometric information can be transferred to detect objects that lie on other ground planes, such as the upper levels of a multi-level parking garage that holds similarly sized vehicles.
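The height grid can be pictured as a per-cell running average of observed heights, which can later feed the bottom-to-top polynomial fit. The grid resolution and the class interface below are assumptions for illustration:

```python
import numpy as np

class HeightGrid:
    """Per-cell running average of observed object heights in the image
    plane, accumulated without manual calibration."""

    def __init__(self, image_width, image_height, cell=32):
        self.cell = cell
        rows, cols = image_height // cell + 1, image_width // cell + 1
        self.totals = np.zeros((rows, cols))
        self.counts = np.zeros((rows, cols))

    def add(self, x, y, object_height):
        r, c = int(y) // self.cell, int(x) // self.cell
        self.totals[r, c] += object_height
        self.counts[r, c] += 1

    def average(self, x, y):
        r, c = int(y) // self.cell, int(x) // self.cell
        return self.totals[r, c] / self.counts[r, c] if self.counts[r, c] else None
```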
FIG. 18 is a block diagram of the video analysis module 200 of FIG. 3, including a calibration module 240 that performs an auto-calibration process such as that described above, according to one embodiment. FIG. 18 also illustrates several other optional modules, such as a velocity estimation module 250, which can estimate the velocities of classified objects in the field of view using scale information derived from the size information generated by the calibration module 240.
FIG. 18 also depicts a classifier evolution module 260, which can perform on-site or in-use self-learning or evolution of the object classification module, for example by any of the techniques described herein. FIG. 18 further depicts one or more steerable filters 220 that can be used to calculate edge orientation values. Finally, FIG. 18 depicts one or more histogram data structures 230 that represent various histograms, such as edge orientation histograms or color histograms, used as object features for object classification purposes. The histogram information can be stored in a data structure having a number of bins and bin counts, wherein the counts represent occurrences of a variable whose value falls between the bin boundaries. Unlike the depiction in the figure, one or more of the modules and other items illustrated in FIG. 18 can be independent of the video analysis module 200 and can reside elsewhere in the camera 102 or in other portions of the camera system 100.
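As an illustration of the bin/bin-count layout just described, a minimal edge orientation histogram might be built as follows. The plain gradient-based orientation estimate stands in for the steerable filters 220; that substitution, and the bin count, are assumptions made to keep the sketch short:

```python
import numpy as np

def edge_orientation_histogram(gray_patch, num_bins=8):
    """Histogram of edge orientations over an object's image patch.
    Each bin spans 180/num_bins degrees; bin counts are weighted by
    gradient magnitude so that strong edges dominate, then normalized."""
    gy, gx = np.gradient(gray_patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0  # fold to [0, 180)
    hist, _ = np.histogram(orientation, bins=num_bins, range=(0.0, 180.0),
                           weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist  # normalized bin counts
```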
As used herein, the term "module" means a component that may include one or more hardware circuits or devices, and/or one or more software routines, functions, objects, or the like. A module may be entirely hardware, entirely software, include firmware, or include some combination of the foregoing. As used herein, the term "system" means a tangible entity.
The methods, modules, and systems illustrated and described herein can exist in a variety of forms, both active and inactive. For example, they may exist partially or wholly as one or more software programs comprising program instructions in source code, object code, executable code, or other formats. Any of the above may be embodied, in compressed or uncompressed form, on a computer-readable medium, which includes storage devices. Exemplary computer-readable storage devices include random access memory (RAM), read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, and magnetic or optical disks or tapes of conventional computer systems.
Conclusion
The terms and descriptions used above are set forth by way of illustration only and are not meant as limitations. For example, the classifier may be part of, and the classification methods may be performed at, a remote processing unit, such as the remote storage/processing unit 106 (FIG. 1), a computer associated with the user interface 104, another node in the camera system 108, or another server, such as one at a central location or on another network. Those skilled in the art will recognize that these and many other variations, enhancements, and modifications of the concepts described herein are possible without departing from the underlying principles of the invention. The scope of the invention should, therefore, be determined only by the following claims and their equivalents.
Brief description of the drawings
FIG. 1 is a depiction of a camera system according to one embodiment.
FIG. 2 is a simplified block diagram of an image capture device of the system shown in FIG. 1.
FIG. 3 is a block diagram of the object classification module shown in FIG. 2.
FIG. 4 is a block diagram of a classifier of FIG. 3.
FIG. 5A is a block diagram showing one of the initial N-1 steps of the classifier shown in FIG. 4.
FIG. 5B is a plot of the acceptance and rejection thresholds utilized in the step illustrated in FIG. 5A.
FIG. 6 is a flow chart depicting a method of utilizing a camera system in accordance with one embodiment.
FIG. 7 is a flowchart illustrating an object tracking method.
FIG. 8 is a flow chart of a method of object classifier training.
FIG. 9 is a flow chart of another method of object classifier training.
FIG. 10 illustrates a method for collecting feedback from a user operating a user interface.
FIG. 11 is a block diagram of a specialized classifier.
FIG. 12 is a flow chart of a specialized training method that includes active verification.
FIG. 13 is a diagram illustrating a passive verification method in more detail.
FIG. 14 is a flow chart of an active verification method.
FIG. 15 shows a feedback collection and generalization method.
FIG. 16 is a block diagram of a classifier with an additional step to identify a new object type.
FIG. 17 is a flow chart of a calibration method for utilizing and updating a size function.
FIG. 18 is a block diagram of a video analysis module of FIG. 3 in accordance with one embodiment.
Claims (29)
1. A camera system, comprising:
video analytics for processing image data representing an image of a field of view projected onto an image plane of an image capture device, the video analytics comprising:
an object classification module comprising an object classifier operative to classify an object captured in the field of view based on the image data, wherein the object classifier is operative to classify the object as a member or non-member of an object class; and
a calibration module connected to the object classification module for estimating representative dimensions of members of the object class, the representative dimensions corresponding to different regions of the image plane, wherein the calibration module is operative to automatically update the representative dimensions in response to the classification performed by the object classifier during online operation, and the calibration module is operative to supply information representative of the updated representative dimensions to the object classifier to improve its object classification performance.
2. The camera system of claim 1, wherein a confidence parameter is associated with an object when the object classifier classifies the object as a member of the object class, and wherein the calibration module updates the representative size when the confidence parameter indicates a high confidence that the classification is correct.
3. The camera system of claim 1, further comprising:
a user station having a display and an input device to provide user feedback information in response to the classification by the object classifier, wherein the user feedback information is used to refine the estimated value of the representative dimension.
4. The camera system of claim 1, wherein calibration information from the calibration module is used to improve object detection and classification accuracy of the camera system.
5. The camera system of claim 1, further comprising:
a velocity estimation module connected to the calibration module and operative to estimate velocities of classified objects located in different regions of the field of view.
6. The camera system of claim 1, wherein the object classification module is operative to classify objects without an initial manual calibration.
7. The camera system of claim 1, wherein the object classifier comprises stages corresponding to feature/transition combinations, wherein each feature/transition combination corresponds to a feature extracted from the image data and a discriminant function for mapping the feature to a scalar value, the feature being selected from a set of possible features and the discriminant function being selected from a set of possible discriminant functions, the features and discriminant functions being selected so as to maximize the object classification performance of the object classifier.
8. The camera system of claim 7, wherein the set of possible features includes one or more of aspect ratio, edge orientation histogram, color information, and normalized saturation.
9. The camera system of claim 8, wherein the edge orientation histogram is generated using a steerable filter.
10. The camera system of claim 7, wherein the set of possible discriminant functions includes one or both of a radial basis function and a sigmoid function.
11. The camera system of claim 7, wherein the object classifier is operative to classify an object as human or non-human.
12. The camera system of claim 7, wherein the object classifier is operative to classify an object as a vehicle or a non-vehicle.
13. A method of automatically calibrating a camera system, the method comprising:
receiving image data representing an image of a scene, the image corresponding to an image plane on which the scene is projected;
detecting a first object in the image, the image of the first object being detected at a location of the image plane, and the image of the first object having a size corresponding to the location;
classifying the first object as a first member of an object class;
calculating parameters of a size function for the image plane based on the size of the first object; and
updating the parameters of the size function in response to detection and classification of a second member of the object class.
14. A camera system, comprising:
video analytics for processing image data representing an image of a field of view projected onto an image plane of an image capture device, the video analytics comprising:
an object classification module comprising an object classifier operative to classify objects captured in the field of view based on the image data, wherein the object classifier is operative to classify the objects as members or non-members of an object class; and
a calibration module connected to the object classification module for estimating representative dimensions of members of the object class, the representative dimensions corresponding to different regions of the image plane, wherein the calibration module is operative during online operation to automatically update the representative dimensions in response to the classification performed by the object classifier, associate a confidence parameter with a classification of an object by the object classification module, and determine the confidence parameter based at least in part on the representative dimensions of the associated object.
15. A camera system, comprising:
video analytics for processing image data representing an image of a field of view of an image capture device, the video analytics comprising:
an object classification module comprising an object classifier operative to determine whether an object represented in the image is a member of an object class, the object classifier comprising N decision steps configured in a cascade configuration, wherein at least one of the N decision steps is operative to (a) accept an object as a member of the object class, (b) reject an object as a member of the object class, and (c) forward the object to a next decision step to determine whether the object is a member of the object class.
16. The camera system as claimed in claim 15, wherein the at least one of the N decision steps comprises a stage for mapping a feature of the object to a scalar value.
17. A method of classifying an object, the object captured by a camera system, the camera system including an object classification module having N decision steps configured in a cascade configuration, the method comprising:
transmitting image data representing an image of the object to a first one of the N decision steps;
identifying features of the object represented in the image data to determine whether the object is a member of an object class, wherein a decision step value is derived from the features of the object; and
making a decision to accept the object as a member of the object class, reject the object as a member of the object class, or forward the image data to a second one of the N decision steps for further analysis, the decision being based on a comparison of the decision step value to one or more of an acceptance threshold and a rejection threshold, the acceptance threshold having a higher value than the rejection threshold, wherein when the decision step value is above the acceptance threshold, the object is accepted as a member of the object class; when the decision step value is below the rejection threshold, the object is rejected as a member of the object class; and when the decision step value is between the acceptance threshold and the rejection threshold, the image data is forwarded to the second decision step.
18. A camera system, comprising:
video analytics for processing image data representing an image of a field of view of an image capture device, the image data including a representation of an object, the video analytics comprising:
an object classification module comprising an object classifier operative to determine whether an object represented in the image data is a member of an object class, the object classifier comprising stages corresponding to feature/transition combinations, wherein each feature/transition combination corresponds to a feature extracted from the image data and a discriminant function for mapping the feature to a scalar value, the feature being selected from a set of possible features and the discriminant function being selected from a set of possible discriminant functions, the features and discriminant functions being selected so as to maximize the object classification performance of the object classifier.
19. The camera system of claim 18, wherein the video analytics includes an object detection module connected to the object classification module, the object detection module receiving the image data and operative to detect whether the object is represented in the image data.
20. The camera system of claim 18, wherein the stages are part of a decision stage of the object classifier, the object classifier including a plurality of decision stages configured in a cascade configuration, and each of the plurality of decision stages including one or more stages.
21. The camera system of claim 20, wherein:
one of the plurality of decision steps comprises a first stage and a second stage to generate a first scalar value and a second scalar value;
the object classifier is operative to apply a first weighting coefficient and a second weighting coefficient to the first scalar value and the second scalar value, respectively; and
the object classifier is operative to sum the weighted first and second scalar values to generate a decision step value.
22. The camera system of claim 18, wherein the scalar value is correlated to a classification confidence.
23. The camera system of claim 18, wherein the set of possible features includes one or more of an aspect ratio, an edge orientation histogram, color information, and a normalized saturation.
24. The camera system of claim 23, wherein the edge orientation histogram is generated using a steerable filter.
25. The camera system of claim 18, wherein the set of possible discriminant functions includes one or both of a radial basis function and a sigmoid function.
26. The camera system of claim 18, wherein the object classifier is operative to classify an object as human or non-human.
27. The camera system of claim 18, wherein the object classifier is operative to classify an object as a vehicle or a non-vehicle.
28. The camera system of claim 18, wherein the object classifier is self-learning.
29. A method of constructing an object classifier in a camera system, the method comprising:
receiving image data including representations of objects, each object being labeled as a member or non-member of an object class;
processing the image data to extract a set of features for each of the objects;
for each feature in the set of features of each of the objects, searching a set of discriminant functions to select the discriminant function that best separates the labeled objects, the selected discriminant function being associated with its given feature to form one of a group of feature/transition combinations; and
selecting feature/transition combinations from the group of feature/transition combinations for inclusion in the object classifier, the selected feature/transition combinations maximizing the object classification performance of the object classifier as compared to unselected ones of the group of feature/transition combinations.