WO2009049314A2 - Video processing system employing behavior subtraction between reference and observed video image sequences - Google Patents
- Publication number
- WO2009049314A2 (PCT/US2008/079839)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- behavior
- sequences
- motion
- sequence
- images
- Prior art date
- 2007-10-11
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
An image processing method includes calculating sequences of motion labels from each of first and second sequences of images, each sequence of motion labels including motion label values indicating levels of motion activity in corresponding regions of the corresponding sequence of images. First and second behavior representations are calculated, each from the sequences of motion labels calculated from the respective image sequence, such that each region of each behavior representation represents a measure of motion activity of the corresponding region throughout the respective sequence of images. A comparing function of the first and second behavior representations is calculated, and an output signal is generated that reflects a result of calculating the comparing function. Applications include surveillance applications in which anomalous behavior is identified in an observed image sequence versus background behavior in a training image sequence.
Description
VIDEO PROCESSING SYSTEM EMPLOYING BEHAVIOR SUBTRACTION BETWEEN REFERENCE AND OBSERVED VIDEO IMAGE SEQUENCES
BACKGROUND

Pervasive, wide-area visual surveillance, unthinkable even 10 years ago, is a reality today due to the invention of the wireless network video camera. Easily deployed and remotely managed, such a camera can generate round-the-clock visual data (including at night) that can be used in access control, vehicle and pedestrian traffic analysis, gait recognition, detection of unattended luggage, etc. However, the sheer amount of data produced by each camera prevents human-operator monitoring. Therefore, automatic algorithms need to be developed for specific tasks. One such task recently gaining prominence is the detection of suspicious behavior, i.e., locating someone or something whose behavior differs from behavior observed in a reference video sequence. Many surveillance methods are based on a general pipeline-based framework. Moving objects are first detected in a motion segmentation step, then classified and tracked over a certain number of frames, and, finally, the resulting paths are used to distinguish "normal" from "suspicious" objects (a path different from that of most objects). In general, these methods contain a training phase during which a probabilistic model is built using paths followed by "normal" objects.
SUMMARY
While there are advantages to using object path as a motion attribute, there are several drawbacks as well. Fundamentally, object-path discriminant techniques focus on the question of which path is anomalous. However, there is often important information in the dynamics of the path itself, usually ignored by these schemes. The present disclosure addresses a slightly different question - how conventional paths are visited, and whether these visits are statistically anomalous. To illustrate the importance of this approach, the following scenarios can be considered: (a) the lingering time of an object along a normal path (or in some regions) may be statistically significant; (b) an unusually large/small object along a conventional path traveling at normal speeds may be anomalous even though its path and speed are normal. One novelty in the disclosed approach is the modeling of temporal behavior; another is the solution to the problem. While path-based schemes require many stages of processing ranging from low-level detection to high-level inferencing, and can often fail dramatically because of those multiple stages (imprecise motion field, tracking error), the disclosed technique may require only low-level processing and may be efficiently implemented in real time. Furthermore, on account of its simplicity it may be possible to provide performance guarantees. A third advantage is its generality, i.e., it applies to individual moving objects (car, truck, person), groups of objects, and merging and/or splitting objects. Finally, the disclosed technique is robust against harsh environments (jittery cameras with unknown calibration, highly cluttered scenes, rain/snow/fog, etc.).
This disclosure describes a method and system for the detection and localization of either dissimilarities or similarities between sequences of images based on analysis of patterns of motion and non-motion in pixels of the image sequences. The technique can be used in a security context, for example, to identify anomalies in sequences of images obtained from a security camera. It can also be used in an identification context, for example to identify unknown/suspicious video material by comparison with a database or library of known (e.g., copyrighted) video material.
More particularly, the disclosed methods and apparatus form a compact, low-dimensional representation (referred to as behavior representation or behavior image) to capture temporal activity (or lack thereof) and account for broad temporal variations in size, shape and color characteristics of real-time video sequences. The disclosed system accumulates motion data from multiple image sequences to form corresponding behavior representations. In one class of embodiments, one behavior representation is a reference behavior representation obtained by an accumulation applied to a reference (training) image sequence, and the other behavior representation is an observed behavior representation obtained by an accumulation applied to an observed image sequence. The image sequences can be obtained from a visible-light or infra-red video camera (sequence of fields if interlaced scan is used, or sequence of frames if progressive scan is used), both analog and digital, from a web camera, computer camera, surveillance camera, etc.
A comparing function is applied to the behavior representations to detect either dissimilar (anomalous, unusual, abnormal) or similar (highly-correlated) dynamics, or motion patterns, between the image sequences. The detection of unusual motion patterns is an enabling technology for suspicious behavior detection in surveillance, homeland security, military, dynamic data analysis (e.g., bird motion patterns), and other applications. The detection of similar motion patterns, on the other hand, is an enabling technology for the
identification of copyrighted video material (against a database of such material) for the purposes of copyright policing/enforcement. The technique is a completely new approach to visual anomaly (or similarity) detection and is generic, with no specific models needed for the analysis of motion patterns of humans, animals, vehicles, etc. The temporal accumulation of images (or video frames) is, in general, non-linear and can take many different forms. In one embodiment, anomaly detection based on motion patterns is performed by applying motion detection first (e.g., by means of background subtraction) in order to obtain per-pixel motion detection labels, e.g., 1 for moving, 0 for stationary pixels, from both observed and reference image sequences. Subsequently, the motion detection labels of the observed image sequence undergo temporal summation over N images (current image and N-1 prior images) to form a cumulative label field, which is the observed behavior representation. A separate cumulative label field is computed similarly from the reference image sequence, but additionally the motion labels undergo a nonlinear "max" operation in time, pixel-by-pixel, i.e., a maximum cumulative label is found at each pixel over all time instants. This produces the reference behavior representation.
The technique can be used for comparison of a target video sample against representative images (and hence video sequences). The output can be identification of copyrighted video material, detection and localization of abnormal behavior in surveillance, or detection of illegal activities, for example, and it can be employed indoors, outdoors, or on mobile units. It may be implemented on specialized integrated circuit(s) or a general-purpose processor, and can be employed in a stand-alone scenario such as within a single camera unit or as components within a larger networked system. This technique can also be used to perform detection of "essential" motion in harsh environments (i.e., detection of moving items of interest (vehicles, people, etc.) in environments that also have noisy non-essential motion such as camera jitter, trees rustling, waves, etc.). It can also be used to automatically register multiple cameras by comparing reference behavior representations from multiple cameras. A networked system can then perform local processing and may use these results in combination with advanced fusion technologies for providing area-wide, real-time pervasive surveillance in limited communication environments. Advantages of the disclosed technique are (a) Compactness: typical temporal behavior is projected into a low-dimensional image or feature space; (b) Generality: requires little prior information, training or knowledge of the video environment; (c) Robustness: provides a
degree of insensitivity to harsh environments; (d) Real-time processing: can be implemented to provide real-time information with existing hardware in current stand-alone video cameras.
Other embodiments of the accumulation of images are possible. For example, the temporal segmentation of motion detection labels into 1 (motion activity) and 0 (no motion activity) states may be followed by measurement of the average temporal length of the 1 and
0 states ("busy" and "idle" times) over a time window, and forming a scatter plot of these average lengths. Now, the behavior representation is a single two-dimensional histogram of average "busy" versus "idle" times over a time window. Alternatively, a separate two- dimensional histogram of average busy versus idle times over a time window can describe each pixel position separately. Although more memory-consuming, this embodiment allows for finer granularity of anomalous motion pattern detection.
The comparing function applied to the behavior representations may be a simple difference operation followed by "floor-to-zero" operation, or it may be a more elaborate scheme. For example, the histograms of average "busy" and "idle" times for the observed and reference image sequences can be compared using histogram equalization. In yet another embodiment, the histograms are transformed into continuous probability density functions by means of adding low-variance random noise to each histogram count, followed by non-parametric kernel estimation to evaluate the likelihood of an observed pixel's busy and idle times being drawn from the estimated non-parametric probability density function. The basic motion information may also be augmented by other information to enhance operation, such as information identifying the size, shape and/or color of the object that each pixel or region is part of.
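A sketch of the kernel-estimation variant follows, assuming SciPy's gaussian_kde; the jitter variance, seed, and shape conventions are illustrative choices, not specified by the disclosure.

```python
import numpy as np
from scipy.stats import gaussian_kde

def busy_idle_likelihood(train_pts, observed_pt, jitter_sigma=0.5, seed=0):
    """Likelihood that an observed (busy, idle) pair was drawn from the
    reference distribution. train_pts: (N, 2) array of (busy, idle)
    samples from the training sequence. Low-variance Gaussian noise
    smooths the discrete run-length counts into a continuous density
    before non-parametric kernel estimation."""
    rng = np.random.default_rng(seed)
    jittered = train_pts + rng.normal(0.0, jitter_sigma, size=train_pts.shape)
    kde = gaussian_kde(jittered.T)  # gaussian_kde expects shape (dims, N)
    return float(kde(np.reshape(observed_pt, (2, 1)))[0])
```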
The use of "busy" and "idle" cycles registered at each pixel (or region) can also be used to find correspondence between cameras. Since the busy-idle distributions are unique for each pixel (or region) containing a non-zero activity, they can be used as a signature to uniquely define pixels in the video. Furthermore, since binary motion labels are almost invariant, the position and orientation of the camera (under the assumption that the moving objects' height be significantly smaller than the elevation of the camera), pixels of different cameras looking at the same region shall have a similar busy-idle distribution. Thus, with a simple distance metric between busy-idle distributions (be it a Euclidean distance, a Kullback-Leibler distance, or a Kolmogorov-Smirnov distance for example), the correspondences between pixels of different cameras looking at a single scene can be found. Once such pixel-to-pixel (or region-to-region) correspondence has been established, the
behavior model learned by one camera can be transferred to other cameras looking at the scene, enabling them to detect abnormal activity.
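As an illustration of the correspondence idea, the sketch below scores candidate pixel pairs with a symmetrized Kullback-Leibler distance between their busy-idle histograms; the names and the choice of KL over the Euclidean or Kolmogorov-Smirnov alternatives are assumptions.

```python
import numpy as np

def kl_distance(p, q, eps=1e-9):
    """Symmetrized Kullback-Leibler distance between two busy-idle
    histograms (assumed to register non-zero activity)."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    p, q = p + eps, q + eps  # avoid log(0)
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def best_match(signature, candidates):
    """Pixel of the other camera whose busy-idle signature is closest.
    candidates: dict mapping pixel coordinates to histograms."""
    return min(candidates, key=lambda xy: kl_distance(signature, candidates[xy]))
```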
Based on a similar idea, one can use a light projector (be it laser-based, LCD-based or any other) to project patterns of light (typically, patterns containing white and black regions) on a three-dimensional object. If one or more cameras look at that object while the light patterns are projected, those patterns can be interpreted by the cameras as being binary motion labels. Thus, following the same busy-idle distance metric, a correspondence map between the pixels (or the regions) of each camera and the projector can be established. This correspondence map can then be used as a disparity map to build a three-dimensional view of the observed object.
One embodiment of the video identification application is as follows. A database of copyrighted video material is assembled by applying one of the behavior representations described above to each of a collection of known copyrighted videos. Then, the same representation is computed for a suspicious video. The two resulting behavior representations are compared using either the difference and floor-to-zero operators, or a histogram-based metric. If these behavior representations are judged very similar, a copyright violation alert is issued. The use of a single-image representation in this case leads to a very compact and fast video identification system. It can be used as a triage system to separate definitive non-violations from possible violations. In another embodiment that requires more memory and computing power but achieves higher robustness, a behavior sequence representation is used. A behavior sequence is a sequence of behavior images, each such image computed from a sub-sequence (temporal segment) of an analyzed video sequence. The sub-sequences can be obtained either by uniform temporal partitioning of the analyzed video sequence (e.g., 100-frame segments) or by a non-uniform temporal partitioning (for example, partitioning based on scene-cut detection). Using the same type of partitioning (uniform or non-uniform), the behavior-image representation and comparison metric are applied per sub-sequence to both the database and the suspicious video material. However, since the suspicious video material may be cropped (in spatial dimensions and time), the matching of the observed behavior sequence with the video database is preferably performed through a "sliding window" (space and time) approach, sketched below. The observed behavior sequence is matched against all same-length temporal segments and same-size spatial windows of all behavior sequences in the copyright material database. Although more demanding computationally and memory-wise, this approach assures robustness to spatial and temporal video cropping.
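A brute-force sketch of that sliding-window search, assuming behavior sequences stored as (time, height, width) arrays and a mean-absolute-difference score; the threshold and the score itself are illustrative choices.

```python
import numpy as np

def sliding_window_match(obs_seq, ref_seq, threshold):
    """Search one reference behavior sequence for the observed behavior
    sequence over all temporal offsets and spatial windows. Returns the
    best-matching (t, y, x) offset, or None if no window scores below
    threshold. obs_seq: (To, Ho, Wo); ref_seq: (Tr, Hr, Wr), with each
    dimension at least as large as its observed counterpart."""
    To, Ho, Wo = obs_seq.shape
    Tr, Hr, Wr = ref_seq.shape
    best_score, best_pos = np.inf, None
    for t in range(Tr - To + 1):
        for y in range(Hr - Ho + 1):
            for x in range(Wr - Wo + 1):
                window = ref_seq[t:t + To, y:y + Ho, x:x + Wo]
                score = np.mean(np.abs(window - obs_seq))
                if score < best_score:
                    best_score, best_pos = score, (t, y, x)
    return best_pos if best_score < threshold else None
```

Repeating this search over every behavior sequence in the database makes the computational cost mentioned in the text explicit.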
The generality of the proposed representations and comparison mechanisms makes the anomaly detection and video identification robust to spatial resolution change (spatial scaling), compression, spatial filtering, and other forms of spatial video manipulation.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of various embodiments of the invention. Figure 1 is a block diagram of an image-processing system; Figure 2 is a flow diagram of operation of the system of Figure 1; Figures 3(a)-3(f) and 4(a)-4(f) are video images and image-like representations illustrating examples of the disclosed method.
DETAILED DESCRIPTION
Figure 1 shows a video processing system 10 including a camera 12, video storage 14, image processor 16, and a database 18. In operation, the camera 12 generates a video signal 20 representing captured video frames, which are stored in the storage 14. The stored video frames are provided to the image processor 16 as image sequences 22. As described in more detail below, the image processing circuitry 16 calculates reference behavior representations 24 and stores these in the database 18, and then later retrieves the stored reference behavior representations 24 as part of processing observed image sequences 22 and generating user output. The nature of the user output varies based on the particular application. In a surveillance application, the user output may include alerts as well as an image on a user output device (e.g., display) indicating the location of anomalous behavior in an imaging field. In an identification application, the user output may include an identification of a reference image sequence that has matched an observed image sequence.
Figure 2 illustrates pertinent operation of the image processor 16. At step 26, sequences of motion labels are calculated from corresponding sequences of images, each sequence including motion label values each indicating presence and absence of motion on a
region-by-region basis. Various region sizes and motion label types may be employed. As mentioned above, one simple motion label is the presence or absence of motion, i.e., for each pixel in a frame, whether or not that pixel is part of an object in motion. The result of this operation for each video frame is a corresponding frame of per-pixel motion label values, such as "1" for "moving" and "0" for "not moving". In alternative embodiments, regions larger than a pixel may be covered by a label, and the label values may be more complex. In particular, as noted elsewhere herein, the label values may be accompanied by additional information that can enhance operation, such as information identifying other attributes of the moving objects of which each pixel/region is a part, such as size, shape, color, etc. Referring again to Figure 2, the image processor 16 at step 32 uses the motion labels for the first and second image sequences to calculate respective first and second "behavior representations", each being a low-dimensional representation of the corresponding image sequence. In general, one or both of the behavior representations is stored in the database 18, especially in applications such as surveillance in which a reference behavior representation is calculated from a reference image sequence that precedes an observed image sequence from which the observed behavior representation is calculated. This operation may be repeated for multiple image sequences. At step 34, the image processor 16 calculates a comparing function of the behavior representations from step 32, which may include retrieving one or more of the subject behavior representations from the database 18 if necessary. The result of the calculation of step 34 can be used to provide an indication to a user or other application-specific functionality. For example, in the case of video surveillance, the calculation of step 34 may identify anomalous behavior, and in that case appropriate visual and/or other indications can be made to a user. In a video matching application, the calculation of step 34 may identify which of several reference image sequences is matched by the observed image sequence.
Before describing the details of operation, example images/representations are presented in order to briefly illustrate how the process works. Reference is made to Figures 3(a) through 3(f). Figure 3(a) is a representative image from a video of a highway scene near an overpass. It will be appreciated that over a typical short interval (e.g., 15 minutes), many cars will pass by the camera both on the highway (left-to-right) as well as on the overpass
(top right corner of image). Figure 3(b) is a single motion label field derived from the image of Figure 3(a), with the pixels of moving objects (e.g., cars and trucks) showing as white and the pixels of stationary objects showing as black. It will be appreciated that for each pixel,
the set of motion labels calculated for that pixel over all images of the sequence forms a respective sequence of motion labels. Figure 3(c) is a reference behavior representation which is calculated from all the per-pixel sequences of motion labels over a typical interval. The behavior representation of Figure 3(c) is taken as a reference behavior representation. The image can be interpreted as indicating (by pixel amplitude) the amount of motion experienced in different areas of the image.
Figures 3(d) through 3(f) illustrate the use of the reference behavior representation of Figure 3(c) to identify anomalous behavior. Figure 3(d) shows the scene at a time when a train is passing by (near the top of the image), and Figure 3(e) shows the corresponding motion label field. Figure 3(f) shows the result of a comparing function applied between the reference behavior representation of Figure 3(c) and an observed behavior representation calculated from the sequences of motion labels for the observed image sequence. As shown in Figure 3(f), only the areas where the train is located show up, indicating that the passing of the train is anomalous based on the training sequence represented by Figure 3(a). Further details of operation are now provided.
In many cases, the goal of a video surveillance system is to detect unusual behavior such as a motor vehicle accident, vehicle breakdown, an intrusion into a restricted area, or someone leaving a suspicious object (e.g. walking away from a suitcase). Thus, the starting point is a definition of "normality". It is proposed that a video sequence exhibiting normal behavior is one whose dynamic content has already been observed in a training sequence.
Consequently, an abnormal sequence is one whose dynamic content has not been observed in the training sequence. Following this definition, let $\bar{I}(\mathbf{x},t)$ be a training video sequence exhibiting normal activity and $I(\mathbf{x},t)$ be an observed video sequence which may contain unusual behavior, where $\mathbf{x}$ is a pixel location and $t$ denotes time. Also, let $\Lambda$ be an abnormality label field that is sought to be estimated; $\Lambda(\mathbf{x})$ is 0 for normal behavior and 1 for abnormal behavior.
To discriminate between normal and abnormal behavior, a measure is used of how likely it is that the dynamic content of an $N$-frame video sequence $\{I(\mathbf{x},t_i)\}_1^N = \{I(\mathbf{x},t_1), I(\mathbf{x},t_2), \ldots, I(\mathbf{x},t_N)\}$ has already been observed in the $M$-frame training sequence $\{\bar{I}(\mathbf{x},t_i)\}_1^M = \{\bar{I}(\mathbf{x},t_1), \bar{I}(\mathbf{x},t_2), \ldots, \bar{I}(\mathbf{x},t_M)\}$. Since the comparison is to be made based on dynamic content in both video sequences, a motion attribute at $(\mathbf{x},t)$ is of interest. Let $\{L(\mathbf{x},t_i)\}_1^N$ be an observed motion label sequence, i.e., a sequence of motion labels with value 1 if a pixel belongs to a moving object and 0 otherwise, computed from the observed sequence using motion detection. Similarly, let $\{\bar{L}(\mathbf{x},t_i)\}_1^M$ be a reference motion label sequence computed in the same way from the reference (training) sequence $\{\bar{I}(\mathbf{x},t_i)\}_1^M$. Many motion detection techniques have been devised to date, including fixed- and adaptive-threshold hypothesis testing, background subtraction, Markov Random Field (MRF), and level-set inspired methods.
Applying a motion detection algorithm to all frames of the training video sequence and, independently, to all frames of the observed video sequence results in respective temporal series of zeros and ones at each pixel location. Each such binary sequence reflects the amount of activity occurring at a given pixel location during a certain period of time. For instance, a pixel whose zero-one sequence contains many 1's is located in a relatively "busy" area (exhibiting relatively more motion over time), whereas a pixel associated with many 0's is located in a relatively "static" area (exhibiting relatively less motion over time). From the observed and reference motion label sequences $\{L(\mathbf{x},t_i)\}_1^N$ and $\{\bar{L}(\mathbf{x},t_i)\}_1^M$, respectively, observed and reference behavior representations are computed and compared to establish abnormality. More specifically, suppose that abnormality is defined as follows: a series of observations is abnormal if it contains an unusually high amount of activity. Therefore, a $W$-frame time series $\{L(\mathbf{x}_0,t_i)\}_{i=N-W+1}^{N}$ is abnormal if it contains more 1's than any $W$-frame time series $\{\bar{L}(\mathbf{x}_0,t_i)\}_{i=k-W+1}^{k}$ in the training sequence, for $W \le k \le M$. In other words, a pixel at $\mathbf{x}_0$ is abnormal if the previous $W$ observations exhibit more activity than the maximum amount of activity registered at $\mathbf{x}_0$ in the training sequence.
Since the maximum amount of activity in the training sequence is constant, it may be pre-calculated and stored in a background behavior image $B$:

$$B(\mathbf{x}) = \max_{W \le k \le M} \; \sum_{i=k-W+1}^{k} \bar{L}(\mathbf{x}, t_i).$$

The $B$ image succinctly synthesizes the ongoing activity in the training sequence and thus is a form of reference (training) behavior representation. It implicitly includes the paths followed by moving objects as well as the amount of activity registered at every point in the training sequence.
Similarly, the activity in the observed sequence is measured by computing an observed behavior image $v$:

$$v(\mathbf{x}) = \sum_{i=N-W+1}^{N} L(\mathbf{x}, t_i),$$

which contains the total amount of activity during the last $W$ frames of the observed sequence. Thus, $v$ is a form of observed behavior representation. Once images $B$ and $v$ have been computed, one simple possibility for the comparing function is a distance-measuring function $D$ such as

$$D(B(\mathbf{x}), v(\mathbf{x})) = \lfloor v(\mathbf{x}) - B(\mathbf{x}) \rfloor_0,$$

where $\lfloor a \rfloor_0$ is a "floor-to-zero" operator (0 if $a < 0$ and $a$ otherwise). With such a distance measure, the behavior detection problem is reduced to background subtraction, i.e., subtraction of image $v$, containing a snapshot of activity just prior to $t$, from the background image $B$, containing an aggregate of long-term activity in the training sequence. Based on this formulation, the method may be seen as involving behavior subtraction.
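This formulation maps directly onto array operations. The following Python/NumPy sketch is illustrative only: the (frames, height, width) array layout, function names, and threshold parameter are assumptions, not part of the disclosure.

```python
import numpy as np

def background_behavior(train_labels, W):
    """B(x): maximum activity over any W-frame window of the M-frame
    training label sequence, computed per pixel via cumulative sums.
    train_labels: (M, H, W_img) binary array of motion labels."""
    csum = np.cumsum(train_labels, axis=0)
    zeros = np.zeros((1,) + csum.shape[1:], dtype=csum.dtype)
    # sum over the window ending at frame k equals csum[k] - csum[k-W]
    window_sums = csum[W - 1:] - np.concatenate([zeros, csum[:-W]], axis=0)
    return window_sums.max(axis=0)

def observed_behavior(obs_labels, W):
    """v(x): total activity over the last W frames of the observed labels."""
    return obs_labels[-W:].sum(axis=0)

def behavior_subtraction(B, v, tau=0):
    """D(B, v) = floor-to-zero(v - B); pixels whose excess activity
    exceeds tau are flagged as abnormal."""
    D = np.clip(v - B, 0, None)
    return D > tau
```

The cumulative-sum trick evaluates every $W$-frame window sum in a single pass, consistent with the observation that the reference representation can be pre-calculated once and stored.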
In another embodiment, average activities can be compared, with the background behavior image computed using an averaging operator over the $M$ frames of the reference motion label sequence:

$$B(\mathbf{x}) = \frac{1}{M} \sum_{i=1}^{M} \bar{L}(\mathbf{x}, t_i),$$

and, similarly, the observed behavior image computed as a $W$-frame average over the observed motion label sequence:

$$v(\mathbf{x}) = \frac{1}{W} \sum_{i=N-W+1}^{N} L(\mathbf{x}, t_i).$$
The background and observed behavior images obtained by means of averaging are again compared using the distance-measuring function D. Since in this embodiment an average motion activity is compared between the observed and reference sequences, only a departure from average motion is detected. Therefore, this approach is a very effective method for
motion detection in the presence of some nominal activity in the scene, such as that due to camera vibrations, animated water, or fluttering leaves.
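Under the same illustrative array layout, the averaging embodiment amounts to replacing the windowed sums and maximum with per-frame means:

```python
def average_behavior(labels):
    """Mean activity per pixel; apply to all M training frames for B and
    to the last W observed frames for v, then compare with D as before."""
    return labels.mean(axis=0)
```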
The behavior subtraction method has been tested on real video sequences from network video cameras. Example results are shown in Figures 3(a)-3(f) and 4(a)-4(f). As noted above, Figures 3(a)-3(f) are of a highway scene with adjacent overpass and train tracks. Figures 4(a)-4(f) are of a busy intersection, and the anomalous behavior identified by Figure 4(f) is the presence of a streetcar visible in the center of the image of Figure 4(d) but absent from the training sequence represented by Figure 4(a). These figures correspond to the above mathematical designations of the images/fields as follows:

| Figures | Designation |
|---|---|
| 3(a), 4(a) | frame of the training sequence $\bar{I}(\mathbf{x},t)$ |
| 3(b), 4(b) | reference motion label field $\bar{L}(\mathbf{x},t)$ |
| 3(c), 4(c) | background behavior image $B$ |
| 3(d), 4(d) | frame of the observed sequence $I(\mathbf{x},t)$ |
| 3(e), 4(e) | observed motion label field $L(\mathbf{x},t)$ |
| 3(f), 4(f) | comparing-function output $D(B, v)$ |
The results of Figures 3(a)-3(f) and 4(a)-4(f) were obtained with 1700 and 2000 training frames, respectively. In both cases, values of $W = 100$, $\tau = 20$, and $T = 30$ were used. Motion detection was implemented using simple background subtraction, with the background derived from a temporal median filter.
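The motion detector itself can be as simple as the median-based background subtraction mentioned here. A sketch under the same assumptions (grayscale frames stacked in a (T, H, W) array; the threshold value is illustrative):

```python
import numpy as np

def motion_labels(frames, thresh=20):
    """Binary motion labels via background subtraction, with the
    background estimated as the temporal median of the frame stack."""
    background = np.median(frames, axis=0)
    diff = np.abs(frames.astype(np.float64) - background)
    return (diff > thresh).astype(np.uint8)
```

Chaining this detector with the behavior-representation sketches above gives a complete, if simplistic, end-to-end pipeline.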
While various embodiments of the invention have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims
1. A method, comprising: calculating sequences of motion labels from each of first and second sequences of images, each sequence of motion labels including a plurality of motion label values indicating levels of motion activity in a corresponding region of the corresponding sequence of images; calculating a first behavior representation from the sequences of motion labels calculated from the first image sequence, such that each region of the first behavior representation represents a measure of motion activity of the corresponding region throughout the first sequence of images; calculating a second behavior representation from the sequences of motion labels calculated from the second image sequence, such that each region of the second behavior representation represents a measure of motion activity of the corresponding region throughout the second sequence of images; calculating a comparing function of the first and second behavior representations; and generating an output signal reflecting a result of calculating the comparing function.
2. A method according to claim 1, wherein: the first sequence of images is a reference image sequence and the first behavior representation is a reference behavior representation calculated in a training mode of operation; and the second sequence of images is an observed image sequence and the second behavior representation is an observed behavior representation calculated in an observation mode of operation.
3. A method according to claim 2, wherein (1) the sequences of images are of a scene captured by a video camera and the method is performed to detect anomalous behavior in the scene, (2) the reference image sequence is a training sequence in which the anomalous behavior is absent, and (3) calculating the comparing function generates a user output identifying the location of the anomalous behavior in the scene.
4. A method according to claim 2, wherein (1) the reference image sequence is one of a plurality of reference image sequences and the method is performed to determine whether the observed image sequence is present among the plurality of reference image sequences, (2) calculating sequences of motion labels and calculating a reference behavior representation is performed for each of the plurality of reference image sequences to create a corresponding plurality of reference behavior representations, and (3) calculating the comparing function is performed for the observed behavior representation and each of the reference behavior representations.
5. A method according to claim 1, wherein (1) the sequences of images are of a scene captured by a video camera and the method performs robust and essential motion detection of the scene, and (2) calculating the comparing function generates a user output which identifies essential motion in the scene.
6. A method according to claim 1, wherein (1) activity in the image sequences is captured by a plurality of video cameras, and (2) motion features captured by each of the video cameras are used to calculate a correspondence map between regions of the cameras.
7. A method according to claim 6, wherein (1) the activity in the image sequences is induced by a projector projecting a pattern of light onto the scene, and (2) the correspondence map is used to perform a three dimensional reconstruction of the scene.
8. A method according to claim 6, wherein the video cameras are part of a multiple-camera two-dimensional imaging system and the correspondence map is used to register the video cameras with each other.
9. A method according to claim 1, wherein the comparing function includes a distance- measuring function and a floor-to-zero operator with a predetermined threshold.
10. A method according to claim 1, wherein the motion label values are respective binary values, a first binary value indicating that a region of an image is part of a moving object, and a second binary value indicating that the region of the image is not part of a moving object.
11. A method according to claim 1, wherein each region is a pixel.
12. A method according to claim 1, wherein calculating the first and second behavior representations comprises temporal summation of the motion label values over the first and second sequences of motion labels respectively to form respective cumulative label fields.
13. A method according to claim 12, wherein calculating one of the first and second behavior representations includes applying a maximum function to the cumulative label fields.
14. A method according to claim 12, wherein calculating the first and second behavior representations includes using an averaging operator over a predetermined number of frames of the sequences of motion labels.
15. A method according to claim 1, wherein calculating the first and second behavior representations comprises measuring average temporal lengths of busy and idle periods of the sequences of motion labels, and forming respective scatter plots of the measured average temporal lengths.
16. A method according to claim 15, wherein calculating the comparing function comprises creating respective histograms of the measured average temporal lengths of the busy and idle periods of the sequences of motion labels.
17. A method according to claim 16, further comprising comparing the histograms using histogram equalization.
18. A method according to claim 16, further comprising adding low-variance random noise to the histograms to transform the histograms into respective probability density functions, followed by non-parametric kernel estimation.
19. A method according to claim 1, wherein: the first behavior representation includes a first behavior sequence calculated from sequences of motion label fields calculated from respective temporal partitions of the first image sequence; and the second behavior representation includes a second behavior sequence calculated from sequences of motion labels calculated from respective temporal partitions of the second image sequence.
20. A method according to claim 1, further comprising for each of the sequences of images, identifying augmented information about objects moving in the sequences of images and capturing the augmented information in the first and second behavior representations.
21. An image processing system, comprising: one or more video cameras for capturing video images; storage for storing the captured video images as sequences of images; an image processor operative to perform the method of any one of claims 1 to 20 on the sequences of images stored in the storage.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US97915907P | 2007-10-11 | 2007-10-11 | |
| US60/979,159 | 2007-10-11 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2009049314A2 (en) | 2009-04-16 |
| WO2009049314A3 (en) | 2009-07-30 |
Family
ID=40409759
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2008/079839 (WO2009049314A2, Ceased) | Video processing system employing behavior subtraction between reference and observed video image sequences | 2007-10-11 | 2008-10-14 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2009049314A2 (en) |
Non-Patent Citations (7)
| Title |
|---|
| BOBICK, A. F. et al., "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, 1 March 2001, pp. 257-267, XP001005771, ISSN 0162-8828 |
| GRAVES, A.; GONG, S., "Surveillance video indexing with iconic patterns of activity," IEE International Conference on Visual Information Engineering (VIE 2005), Glasgow, UK, 4-6 April 2005, pp. 409-416, XP008105068, ISBN 0-86341-507-5 |
| JODOIN, P.-M. et al., "Behavior subtraction," Proceedings of the SPIE, vol. 6822, 1 January 2008, pp. 68220B-1, XP008098903, ISSN 0277-786X |
| JODOIN, P.-M. et al., "Modeling background activity for behavior subtraction," Second ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2008), 7 September 2008, pp. 1-10, XP031329216, ISBN 978-1-4244-2664-5 |
| XIANG, T. et al., "Beyond tracking: modelling activity and understanding behaviour," International Journal of Computer Vision, vol. 67, no. 1, 1 April 2006, pp. 21-51, XP019216513, ISSN 1573-1405 |
| XIANG, T.; GONG, S.; PARKINSON, D., "Autonomous visual events detection and classification without explicit object-centred segmentation and tracking," BMVC 2002: British Machine Vision Conference, Cardiff, UK, 2-5 September 2002, pp. 233-242, XP008105073, ISBN 1-901725-20-0 |
| XIANG, T. et al., "Activity based surveillance video content modelling," Pattern Recognition, vol. 41, no. 7, 1 July 2008, pp. 2309-2326, XP022594772, ISSN 0031-3203 |
Cited By (87)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8620028B2 (en) | 2007-02-08 | 2013-12-31 | Behavioral Recognition Systems, Inc. | Behavioral recognition system |
| US9946934B2 (en) | 2007-07-11 | 2018-04-17 | Avigilon Patent Holding 1 Corporation | Semantic representation module of a machine-learning engine in a video analysis system |
| US9665774B2 (en) | 2007-07-11 | 2017-05-30 | Avigilon Patent Holding 1 Corporation | Semantic representation module of a machine-learning engine in a video analysis system |
| US9235752B2 (en) | 2007-07-11 | 2016-01-12 | 9051147 Canada Inc. | Semantic representation module of a machine-learning engine in a video analysis system |
| US8411935B2 (en) | 2007-07-11 | 2013-04-02 | Behavioral Recognition Systems, Inc. | Semantic representation module of a machine-learning engine in a video analysis system |
| US10706284B2 (en) | 2007-07-11 | 2020-07-07 | Avigilon Patent Holding 1 Corporation | Semantic representation module of a machine-learning engine in a video analysis system |
| US8189905B2 (en) | 2007-07-11 | 2012-05-29 | Behavioral Recognition Systems, Inc. | Cognitive model for a machine-learning engine in a video analysis system |
| US9489569B2 (en) | 2007-07-11 | 2016-11-08 | 9051147 Canada Inc. | Semantic representation module of a machine-learning engine in a video analysis system |
| US10423835B2 (en) | 2007-07-11 | 2019-09-24 | Avigilon Patent Holding 1 Corporation | Semantic representation module of a machine-learning engine in a video analysis system |
| US10198636B2 (en) | 2007-07-11 | 2019-02-05 | Avigilon Patent Holding 1 Corporation | Semantic representation module of a machine-learning engine in a video analysis system |
| US8175333B2 (en) | 2007-09-27 | 2012-05-08 | Behavioral Recognition Systems, Inc. | Estimator identifier component for behavioral recognition system |
| US8200011B2 (en) | 2007-09-27 | 2012-06-12 | Behavioral Recognition Systems, Inc. | Context processor for video analysis system |
| US8300924B2 (en) | 2007-09-27 | 2012-10-30 | Behavioral Recognition Systems, Inc. | Tracker component for behavioral recognition system |
| US8705861B2 (en) | 2007-09-27 | 2014-04-22 | Behavioral Recognition Systems, Inc. | Context processor for video analysis system |
| US11468660B2 (en) | 2008-09-11 | 2022-10-11 | Intellective Ai, Inc. | Pixel-level based micro-feature extraction |
| US9633275B2 (en) | 2008-09-11 | 2017-04-25 | Wesley Kenneth Cobb | Pixel-level based micro-feature extraction |
| US10755131B2 (en) | 2008-09-11 | 2020-08-25 | Intellective Ai, Inc. | Pixel-level based micro-feature extraction |
| US12244967B2 (en) | 2008-09-11 | 2025-03-04 | Intellective Ai, Inc. | Pixel-level based micro-feature extraction |
| US9373055B2 (en) | 2008-12-16 | 2016-06-21 | Behavioral Recognition Systems, Inc. | Hierarchical sudden illumination change detection using radiance consistency within a spatial neighborhood |
| US8285046B2 (en) | 2009-02-18 | 2012-10-09 | Behavioral Recognition Systems, Inc. | Adaptive update of background pixel thresholds using sudden illumination change detection |
| US8416296B2 (en) | 2009-04-14 | 2013-04-09 | Behavioral Recognition Systems, Inc. | Mapper component for multiple art networks in a video analysis system |
| US9959630B2 (en) | 2009-08-18 | 2018-05-01 | Avigilon Patent Holding 1 Corporation | Background model for complex and dynamic scenes |
| US8295591B2 (en) | 2009-08-18 | 2012-10-23 | Behavioral Recognition Systems, Inc. | Adaptive voting experts for incremental segmentation of sequences with prediction in a video surveillance system |
| US8493409B2 (en) | 2009-08-18 | 2013-07-23 | Behavioral Recognition Systems, Inc. | Visualizing and updating sequences and segments in a video surveillance system |
| US8379085B2 (en) | 2009-08-18 | 2013-02-19 | Behavioral Recognition Systems, Inc. | Intra-trajectory anomaly detection using adaptive voting experts in a video surveillance system |
| US8625884B2 (en) | 2009-08-18 | 2014-01-07 | Behavioral Recognition Systems, Inc. | Visualizing and updating learned event maps in surveillance systems |
| US8358834B2 (en) | 2009-08-18 | 2013-01-22 | Behavioral Recognition Systems | Background model for complex and dynamic scenes |
| WO2011022275A3 (en) * | 2009-08-18 | 2011-06-03 | Behavioral Recognition Systems, Inc. | Adaptive voting experts for incremental segmentation of sequences with prediction in a video surveillance system |
| US8340352B2 (en) | 2009-08-18 | 2012-12-25 | Behavioral Recognition Systems, Inc. | Inter-trajectory anomaly detection using adaptive voting experts in a video surveillance system |
| US9805271B2 (en) | 2009-08-18 | 2017-10-31 | Omni Ai, Inc. | Scene preset identification using quadtree decomposition analysis |
| WO2011022276A3 (en) * | 2009-08-18 | 2011-04-28 | Behavioral Recognition Systems, Inc. | Intra-trajectory anomaly detection using adaptive voting experts in a video surveillance system |
| US10796164B2 (en) | 2009-08-18 | 2020-10-06 | Intellective Ai, Inc. | Scene preset identification using quadtree decomposition analysis |
| US8280153B2 (en) | 2009-08-18 | 2012-10-02 | Behavioral Recognition Systems | Visualizing and updating learned trajectories in video surveillance systems |
| US10032282B2 (en) | 2009-08-18 | 2018-07-24 | Avigilon Patent Holding 1 Corporation | Background model for complex and dynamic scenes |
| US10248869B2 (en) | 2009-08-18 | 2019-04-02 | Omni Ai, Inc. | Scene preset identification using quadtree decomposition analysis |
| US8285060B2 (en) | 2009-08-31 | 2012-10-09 | Behavioral Recognition Systems, Inc. | Detecting anomalous trajectories in a video surveillance system |
| US10489679B2 (en) | 2009-08-31 | 2019-11-26 | Avigilon Patent Holding 1 Corporation | Visualizing and updating long-term memory percepts in a video surveillance system |
| US8270732B2 (en) | 2009-08-31 | 2012-09-18 | Behavioral Recognition Systems, Inc. | Clustering nodes in a self-organizing map using an adaptive resonance theory network |
| US8167430B2 (en) | 2009-08-31 | 2012-05-01 | Behavioral Recognition Systems, Inc. | Unsupervised learning of temporal anomalies for a video surveillance system |
| US8270733B2 (en) | 2009-08-31 | 2012-09-18 | Behavioral Recognition Systems, Inc. | Identifying anomalous object types during classification |
| US8797405B2 (en) | 2009-08-31 | 2014-08-05 | Behavioral Recognition Systems, Inc. | Visualizing and updating classifications in a video surveillance system |
| US8786702B2 (en) | 2009-08-31 | 2014-07-22 | Behavioral Recognition Systems, Inc. | Visualizing and updating long-term memory percepts in a video surveillance system |
| US8218819B2 (en) | 2009-09-01 | 2012-07-10 | Behavioral Recognition Systems, Inc. | Foreground object detection in a video surveillance system |
| US8218818B2 (en) | 2009-09-01 | 2012-07-10 | Behavioral Recognition Systems, Inc. | Foreground object tracking |
| US8180105B2 (en) | 2009-09-17 | 2012-05-15 | Behavioral Recognition Systems, Inc. | Classifier anomalies for observed behaviors in a video surveillance system |
| US8170283B2 (en) | 2009-09-17 | 2012-05-01 | Behavioral Recognition Systems Inc. | Video surveillance system configured to analyze complex behaviors using alternating layers of clustering and sequencing |
| US8494222B2 (en) | 2009-09-17 | 2013-07-23 | Behavioral Recognition Systems, Inc. | Classifier anomalies for observed behaviors in a video surveillance system |
| US10096235B2 (en) | 2012-03-15 | 2018-10-09 | Omni Ai, Inc. | Alert directives and focused alert directives in a behavioral recognition system |
| US11727689B2 (en) | 2012-03-15 | 2023-08-15 | Intellective Ai, Inc. | Alert directives and focused alert directives in a behavioral recognition system |
| US9208675B2 (en) | 2012-03-15 | 2015-12-08 | Behavioral Recognition Systems, Inc. | Loitering detection in a video surveillance system |
| US11217088B2 (en) | 2012-03-15 | 2022-01-04 | Intellective Ai, Inc. | Alert volume normalization in a video surveillance system |
| US9349275B2 (en) | 2012-03-15 | 2016-05-24 | Behavioral Recognition Systems, Inc. | Alert volume normalization in a video surveillance system |
| US12094212B2 (en) | 2012-03-15 | 2024-09-17 | Intellective Ai, Inc. | Alert directives and focused alert directives in a behavioral recognition system |
| US10257466B2 (en) | 2012-06-29 | 2019-04-09 | Omni Ai, Inc. | Anomalous stationary object detection and reporting |
| US9317908B2 (en) | 2012-06-29 | 2016-04-19 | Behavioral Recognition Systems, Inc. | Automatic gain control filter in a video analysis system |
| US9111148B2 (en) | 2012-06-29 | 2015-08-18 | Behavioral Recognition Systems, Inc. | Unsupervised learning of feature anomalies for a video surveillance system |
| US11233976B2 (en) | 2012-06-29 | 2022-01-25 | Intellective Ai, Inc. | Anomalous stationary object detection and reporting |
| US9113143B2 (en) | 2012-06-29 | 2015-08-18 | Behavioral Recognition Systems, Inc. | Detecting and responding to an out-of-focus camera in a video analytics system |
| US9911043B2 (en) | 2012-06-29 | 2018-03-06 | Omni Ai, Inc. | Anomalous object interaction detection and reporting |
| US9723271B2 (en) | 2012-06-29 | 2017-08-01 | Omni Ai, Inc. | Anomalous stationary object detection and reporting |
| US11017236B1 (en) | 2012-06-29 | 2021-05-25 | Intellective Ai, Inc. | Anomalous object interaction detection and reporting |
| US10410058B1 (en) | 2012-06-29 | 2019-09-10 | Omni Ai, Inc. | Anomalous object interaction detection and reporting |
| US10848715B2 (en) | 2012-06-29 | 2020-11-24 | Intellective Ai, Inc. | Anomalous stationary object detection and reporting |
| US9111353B2 (en) | 2012-06-29 | 2015-08-18 | Behavioral Recognition Systems, Inc. | Adaptive illuminance filter in a video analysis system |
| US9104918B2 (en) | 2012-08-20 | 2015-08-11 | Behavioral Recognition Systems, Inc. | Method and system for detecting sea-surface oil |
| US9232140B2 (en) | 2012-11-12 | 2016-01-05 | Behavioral Recognition Systems, Inc. | Image stabilization techniques for video surveillance systems |
| US10827122B2 (en) | 2012-11-12 | 2020-11-03 | Intellective Ai, Inc. | Image stabilization techniques for video |
| US9674442B2 (en) | 2012-11-12 | 2017-06-06 | Omni Ai, Inc. | Image stabilization techniques for video surveillance systems |
| US10237483B2 (en) | 2012-11-12 | 2019-03-19 | Omni Ai, Inc. | Image stabilization techniques for video surveillance systems |
| US10735446B2 (en) | 2013-08-09 | 2020-08-04 | Intellective Ai, Inc. | Cognitive information security using a behavioral recognition system |
| US9507768B2 (en) | 2013-08-09 | 2016-11-29 | Behavioral Recognition Systems, Inc. | Cognitive information security using a behavioral recognition system |
| US9639521B2 (en) | 2013-08-09 | 2017-05-02 | Omni Ai, Inc. | Cognitive neuro-linguistic behavior recognition system for multi-sensor data fusion |
| US11991194B2 (en) | 2013-08-09 | 2024-05-21 | Intellective Ai, Inc. | Cognitive neuro-linguistic behavior recognition system for multi-sensor data fusion |
| US12470580B2 (en) | 2013-08-09 | 2025-11-11 | Intellective Ai, Inc. | Cognitive neuro-linguistic behavior recognition system for multi-sensor data fusion |
| US11818155B2 (en) | 2013-08-09 | 2023-11-14 | Intellective Ai, Inc. | Cognitive information security using a behavior recognition system |
| US9973523B2 (en) | 2013-08-09 | 2018-05-15 | Omni Ai, Inc. | Cognitive information security using a behavioral recognition system |
| US12200002B2 (en) | 2013-08-09 | 2025-01-14 | Intellective Ai, Inc. | Cognitive information security using a behavior recognition system |
| US10187415B2 (en) | 2013-08-09 | 2019-01-22 | Omni Ai, Inc. | Cognitive information security using a behavioral recognition system |
| US10409910B2 (en) | 2014-12-12 | 2019-09-10 | Omni Ai, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
| US11847413B2 (en) | 2014-12-12 | 2023-12-19 | Intellective Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
| US10409909B2 (en) | 2014-12-12 | 2019-09-10 | Omni Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
| US12032909B2 (en) | 2014-12-12 | 2024-07-09 | Intellective Ai, Inc. | Perceptual associative memory for a neuro-linguistic behavior recognition system |
| US11017168B2 (en) | 2014-12-12 | 2021-05-25 | Intellective Ai, Inc. | Lexical analyzer for a neuro-linguistic behavior recognition system |
| US11455836B2 (en) | 2018-08-24 | 2022-09-27 | Shanghai Sensetime Intelligent Technology Co., Ltd. | Dynamic motion detection method and apparatus, and storage medium |
| CN109543590A (en) * | 2018-11-16 | 2019-03-29 | 中山大学 | A kind of video human Activity recognition algorithm of Behavior-based control degree of association fusion feature |
| CN110135409B (en) * | 2019-04-04 | 2023-11-03 | 平安科技(深圳)有限公司 | Optimization method and device for recognition model |
| CN110135409A (en) * | 2019-04-04 | 2019-08-16 | 平安科技(深圳)有限公司 | The optimization method and device of identification model |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2009049314A3 (en) | 2009-07-30 |
Similar Documents
| Publication | Title |
|---|---|
| WO2009049314A2 (en) | Video processing system employing behavior subtraction between reference and observed video image sequences | |
| US10242282B2 (en) | Video redaction method and system | |
| US7391907B1 (en) | Spurious object detection in a video surveillance system | |
| Wang | Real-time moving vehicle detection with cast shadow removal in video based on conditional random field | |
| US20090041297A1 (en) | Human detection and tracking for security applications | |
| Kumar et al. | Study of robust and intelligent surveillance in visible and multi-modal framework | |
| Li et al. | Decoupled appearance and motion learning for efficient anomaly detection in surveillance video | |
| TW200807338A (en) | Object density estimation in video | |
| CN107122743B (en) | Security monitoring method and device and electronic equipment | |
| Kong et al. | Blind image quality prediction for object detection | |
| CA2670021A1 (en) | System and method for estimating characteristics of persons or things | |
| Kongurgsa et al. | Real-time intrusion—detecting and alert system by image processing techniques | |
| US7382898B2 (en) | Method and apparatus for detecting left objects | |
| Sharma | Human detection and tracking using background subtraction in visual surveillance | |
| Susan et al. | Unsupervised detection of nonlinearity in motion using weighted average of non-extensive entropies | |
| Jodoin et al. | Modeling background activity for behavior subtraction | |
| Boufares et al. | Moving object detection system based on the modified temporal difference and otsu algorithm | |
| Chen et al. | Dust particle detection in traffic surveillance video using motion singularity analysis | |
| Pless | Spatio-temporal background models for outdoor surveillance | |
| Lagorio et al. | Automatic detection of adverse weather conditions in traffic scenes | |
| Shammi et al. | An automated way of vehicle theft detection in parking facilities by identifying moving vehicles in CCTV video stream | |
| WO2018050644A1 (en) | Method, computer system and program product for detecting video surveillance camera tampering | |
| Harasse et al. | People Counting in Transport Vehicles. | |
| Jodoin et al. | Behavior subtraction | |
| Foresti et al. | Vehicle detection and tracking for traffic monitoring |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08837090; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 08837090; Country of ref document: EP; Kind code of ref document: A2 |