
CN1200397C - Method for object action set-up mold - Google Patents

Method for object action set-up mold

Info

Publication number
CN1200397C
CN1200397C CNB011226315A CN01122631A
Authority
CN
China
Prior art keywords
behavior
video
vector
state
object behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB011226315A
Other languages
Chinese (zh)
Other versions
CN1352439A (en)
Inventor
崔良林
刘允柱
班加洛尔·S·曼朱纳思
孙信鼎
陈清威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
University of California San Diego UCSD
Original Assignee
Samsung Electronics Co Ltd
University of California San Diego UCSD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR10-2000-0086284A external-priority patent/KR100421740B1/en
Application filed by Samsung Electronics Co Ltd, University of California San Diego UCSD filed Critical Samsung Electronics Co Ltd
Publication of CN1352439A publication Critical patent/CN1352439A/en
Application granted granted Critical
Publication of CN1200397C publication Critical patent/CN1200397C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

An object behavior modeling method capable of efficiently modeling complex objects such as the human body is provided. The method comprises the steps of: (a) obtaining optical flow vectors from a video sequence; (b) using the optical flow vectors, obtaining the probability distribution of the feature vectors of a plurality of video frames; (c) performing state modeling using the probability distribution of the feature vectors; and (d) expressing the behavior of objects in the video sequence on the basis of state transitions. With this modeling method, complex behaviors such as human activities can be modeled and recognized efficiently in the fields of video indexing and recognition, without segmenting the objects.

Description

Object Behavior Modeling Method

Technical Field

The present invention relates to an object behavior modeling method and, more particularly, to a method for effectively analyzing the behavior of complex objects such as human beings. The invention also relates to an object behavior recognition method that recognizes the behavior or events of objects in a video sequence using a behavior model built by the object behavior modeling method.

Background Art

Human behaviors such as sitting, walking, standing up, or turning around can be captured with a camera and stored as digital video, whose content can then be analyzed. For example, stochastic models trained on sample data can characterize the temporal and spatial features of the behaviors in a digital video. Such models can be used to match a query video sequence against database videos for pattern recognition, and after pattern analysis the videos can be indexed semantically using these patterns. A semantic summary of the video content can also be obtained in this process.

Conventional object behavior analysis methods fall into two categories. In the first, a device designed for behavior analysis is attached to the human body. In the second, the geometric features or images of the objects are used. The first method, however, restricts the person's movement because the device must be attached to the body. The second method requires each individual object to be segmented from the video, and in many cases objects cannot be segmented accurately; it is particularly difficult to apply to complex objects, such as the human body, that cannot be easily segmented.

Summary of the Invention

To solve the above problems, an object of the present invention is to provide an object behavior modeling method capable of modeling complex objects such as the human body.

Another object of the present invention is to provide an object behavior recognition method that uses a behavior model built by the object behavior modeling method.

To achieve the first object, there is provided an object behavior modeling method comprising the steps of: (a) obtaining optical flow vectors from a video sequence; (b) using the optical flow vectors, obtaining the probability distribution of the feature vectors of a plurality of video frames, each feature vector being a d×L-dimensional vector, where d is the dimensionality and L is the number of pixels in a video frame or region of interest; (c) performing state modeling using the probability distribution of the feature vectors; and (d) expressing the behavior of objects in the video sequence on the basis of state transitions.

Preferably, step (a) is based on affine motion estimation.

Preferably, step (a) further includes the sub-steps of: (a-1) grouping the input video frames into a plurality of video frame groups and assigning each group to a separate state; (a-2) obtaining affine motion parameters for each video frame in the group of each state; and (a-3) obtaining optical flow vectors from the affine motion parameters.

Preferably, step (a-2) includes the step of determining as the motion parameters those that minimize, over a given video, the sum of squared differences of pixel intensities on the object, $\sum_x\left(I_t(x)-I_{t-1}(x-V(x))\right)^2$, based on the relation $I_t(x)=I_{t-1}(x-V(x))$, where I denotes intensity, t denotes time, x denotes the pixel position (x, y), and V denotes the motion vector.

Preferably, step (b) includes the step of calculating the probability distribution P(Z|Ω) according to the following formula:

$$P(Z\mid\Omega)=\frac{\exp\left(-\frac{1}{2}(Z-m)^{T}Q^{-1}(Z-m)\right)}{(2\pi)^{N/2}\,|Q|^{1/2}}$$

where P = (p_1, p_2, ..., p_d) denotes the motion vector computed at each pixel position (x, y), L denotes the number of pixels in a video frame or region of interest, d denotes the dimensionality, the d×L-dimensional feature vector Z is $Z=(p_1^1,p_2^1,\dots,p_L^1,\;p_1^2,p_2^2,\dots,p_L^2,\;\dots,\;p_1^d,p_2^d,\dots,p_L^d)^{T}$, m is the mean vector of the feature vector Z, Q is the covariance matrix of Z, and the feature vector Z is assumed to be drawn from the observation class Ω.

Preferably, step (b) further includes the step of decomposing the covariance matrix Q according to the following formula:

$$Q=\phi\Lambda\phi^{T}$$

where the columns of φ are the orthonormal eigenvectors of the covariance matrix Q and Λ is the corresponding diagonal matrix of its eigenvalues; and calculating the probability distribution P(Z|Ω) according to the following formula:

$$P(Z\mid\Omega)=\left[\frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M}y_i^{2}/\alpha_i\right)}{(2\pi)^{M/2}\,|\Lambda|^{1/2}}\right]\left[\frac{\exp\left(-\sum_{i=M+1}^{N}y_i^{2}/2\rho\right)}{(2\pi\rho)^{(N-M)/2}}\right]$$

where M is the number of principal components, y_i is the i-th element of Y, α_i is the i-th eigenvalue of Q, ρ is the optimal value obtained through $\rho=\frac{1}{N-M}\sum_{i=M+1}^{N}\alpha_i$, and the feature vector Z is assumed to be drawn from the observation class Ω.

Preferably, in step (c), the object behavior in the video sequence is expressed with a hidden Markov model (HMM) based on state transitions.

Preferably, the hidden Markov model (HMM) is expressed as λ = {Ξ, A, B, Π}, where N is the number of possible states, Ξ satisfies Ξ = {q_1, q_2, ..., q_N}, A = {a_ij} is the transition probability between hidden states i and j, B = {b_j(·)} is the observation symbol probability corresponding to state j, and Π is the initial state distribution; the states Ξ = {q_1, q_2, ..., q_N} and the initial state distribution Π are determined in advance on the basis of the video data.

To achieve the second object, there is provided an object behavior recognition method comprising the steps of: (a) obtaining feature vectors of video frames through motion estimation, each feature vector being a d×L-dimensional vector, where d is the dimensionality and L is the number of pixels in a video frame or region of interest; (b) determining, using the obtained feature vectors, the state to which each frame belongs; and (c) using the transition matrices for the determined states, determining as the recognized behavior the model in a given behavior-model dictionary that maximizes the probability of the provided video frames.

Brief Description of the Drawings

The above objects and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments with reference to the attached drawings, in which:

FIG. 1 is a flowchart of the main steps of an object behavior modeling method according to an embodiment of the present invention;

FIG. 2A shows an example of a hidden Markov model (HMM), before training, of a behavior in which a person starts to stand up but returns to a sitting position;

FIG. 2B shows an example of the HMM of the same behavior after training; and

FIG. 3 is a flowchart of the main steps of an object behavior recognition method according to an embodiment of the present invention.

Detailed Description of the Embodiments

Embodiments of the present invention will now be described in detail with reference to the attached drawings. The invention is not limited to the following embodiments, and many variations are possible within its spirit and scope. The embodiments are provided only to explain the invention more fully to those skilled in the art.

FIG. 1 is a flowchart of the main steps of an object behavior modeling method according to an embodiment of the present invention. Since any type of object behavior can be interpreted as an object changing through different types of motion, a behavior is best regarded as a motion distribution over that object. Accordingly, in the present invention human behavior is modeled on the basis of motion distributions, and model-based motion estimation is used instead of exact motion estimation.

Referring to FIG. 1, in step 102 the input video frames are first grouped into a plurality of video frame groups through manual selection of a state model, and each group is assigned to a separate state.

In step 104, affine motion parameters are obtained through affine motion estimation for each video frame in the group of each state. Here, with I denoting intensity, t time, x the pixel position (x, y), and V the motion vector, the motion estimation is based on the pixel intensities on the object, expressed as Equation 1 below:

$I_t(x)=I_{t-1}(x-V(x))$ ......(1)

That is, in a given region, the parameters that minimize the sum of squared differences $\sum_x\left(I_t(x)-I_{t-1}(x-V(x))\right)^2$ are estimated as the motion parameters.
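As a numerical illustration of this criterion, the sketch below uses a toy constant-translation model (an assumption made for brevity; the embodiment fits per-window affine parameters) and recovers a known shift by exhaustively minimizing the sum of squared differences:

```python
import numpy as np

# Toy SSD minimization: shift an image by a known displacement, then recover
# that displacement as the V minimizing sum_x (I_t(x) - I_{t-1}(x - V(x)))^2.
rng = np.random.default_rng(0)
prev = rng.random((12, 12))                      # I_{t-1}
true_v = (1, 2)                                  # assumed constant (dy, dx)
curr = np.roll(prev, true_v, axis=(0, 1))        # I_t(x) = I_{t-1}(x - V)

def ssd(v):
    """Sum of squared intensity differences for candidate motion vector v."""
    return np.sum((curr - np.roll(prev, v, axis=(0, 1))) ** 2)

candidates = [(dy, dx) for dy in range(-3, 4) for dx in range(-3, 4)]
best_v = min(candidates, key=ssd)                # the SSD-minimizing parameters
```

With a wrap-around shift the reconstruction is exact, so the SSD at the true displacement is zero; a real estimator instead performs this minimization over the affine parameter space.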

In model-based motion estimation, if the size of the object is much smaller than the distance between the camera and the object, the motion of the object can be approximated with an affine model. If each point in the video is represented by a local window, for example a 5×5-pixel window, the motion can be approximated with affine model parameters. The affine model is expressed by Equation 2:

$V(x,y)=\psi(x,y)K$ ......(2)

Here, (x, y) denotes the coordinates of an arbitrary point on the object, $V(x,y)=(u(x,y),v(x,y))^{T}$ is the motion vector, $K=(k_1,k_2,k_3,k_4,k_5,k_6)^{T}$ are the affine model parameters, and, in the standard affine layout consistent with the roles of the parameters below,

$$\psi(x,y)=\begin{pmatrix}1&x&y&0&0&0\\0&0&0&1&x&y\end{pmatrix}.$$

Note that k_1 and k_4 correspond to translational motion, while k_2, k_3, k_5 and k_6 correspond to deformation of the surface. If k_2, k_3, k_5 and k_6 are ignored, the motion vector V can be expressed as $V=(k_1,k_4)^{T}$, which is a typical optical flow vector. Thus, in step 106, optical flow vectors can be obtained from the affine motion parameters.

Now consider the motion vector P = (p_1, p_2, ..., p_d) computed at each pixel position (x, y). For example, P may be a six-dimensional (6-D) affine motion parameter vector or a 2-D optical flow vector. With L denoting the number of pixels in a video frame or region of interest and d the dimensionality, the feature vector can be expressed by Equation 3 below:

$Z=(p_1^1,p_2^1,\dots,p_L^1,\;p_1^2,p_2^2,\dots,p_L^2,\;\dots,\;p_1^d,p_2^d,\dots,p_L^d)^{T}$ ......(3)

That is, the feature vector Z, composed of affine motion vectors or optical flow vectors, can be represented as a d×L-dimensional vector. In this way, in step 108, the feature vector Z is obtained from the optical flow vectors.
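Assembling Z as in Equation 3 is a component-major stacking of the per-pixel motion vectors; a minimal sketch with arbitrary toy sizes d = 2 and L = 6:

```python
import numpy as np

d, L = 2, 6                              # e.g. 2-D optical flow over 6 pixels
rng = np.random.default_rng(1)
P = rng.random((L, d))                   # row i: motion vector (p_i^1, ..., p_i^d)
# Z = (p_1^1 ... p_L^1, p_1^2 ... p_L^2, ..., p_1^d ... p_L^d)^T: all first
# components, then all second components, and so on.
Z = P.T.reshape(-1)
```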

The feature vector Z can be modeled with a Gaussian function whose mean is denoted m and whose covariance matrix, expressed in matrix form, is denoted Q. If Z is drawn from an observation class Ω, the probability distribution P(Z|Ω) is calculated according to Equation 4 below:

$$P(Z\mid\Omega)=\frac{\exp\left(-\frac{1}{2}(Z-m)^{T}Q^{-1}(Z-m)\right)}{(2\pi)^{N/2}\,|Q|^{1/2}}$$ ......(4)

Here, Z denotes the feature vector, m the mean vector of Z, and Q the covariance matrix of Z.

However, if the probability for the observation class is computed directly from Equation 4, the amount of computation required is very large, considering the number and dimensionality of the video pixels. Therefore, in this embodiment, the computation is simplified using the Karhunen-Loeve transform (KLT). First, define $\tilde{Z}=Z-m$, and let $Y=\phi^{T}\tilde{Z}$ denote its KLT projection, whose i-th element is $y_i$. Next, if the columns of φ are the orthonormal eigenvectors of Q and Λ is the corresponding diagonal matrix of eigenvalues, the covariance matrix can be decomposed according to Equation 5 below:

$Q=\phi\Lambda\phi^{T}$ ......(5)

On this basis, with M the number of principal components, y_i the i-th element of Y, α_i the i-th eigenvalue of Q, and ρ the optimal value obtained through $\rho=\frac{1}{N-M}\sum_{i=M+1}^{N}\alpha_i$, Equation 4 can be approximated by Equation 6 below:

$$P(Z\mid\Omega)=\left[\frac{\exp\left(-\frac{1}{2}\sum_{i=1}^{M}y_i^{2}/\alpha_i\right)}{(2\pi)^{M/2}\,|\Lambda|^{1/2}}\right]\left[\frac{\exp\left(-\sum_{i=M+1}^{N}y_i^{2}/2\rho\right)}{(2\pi\rho)^{(N-M)/2}}\right]$$ ......(6)

Thus, in this embodiment, if the feature vector Z is drawn from the observation class Ω, the probability distribution P(Z|Ω) is calculated in step 110 using Equation 6. Then, in step 112, each state is modeled using the probability distributions calculated as described above.
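The approximation of Equation 6 can be checked against the full Gaussian of Equation 4. In the sketch below (function names are illustrative, not from the patent), the minor eigenvalues of Q are chosen equal, so ρ coincides with them and the approximation reproduces the full density exactly:

```python
import numpy as np

def exact_gaussian(z, m, Q):
    """Full Gaussian density of Equation 4."""
    n = len(m)
    diff = z - m
    return float(np.exp(-0.5 * diff @ np.linalg.solve(Q, diff))
                 / ((2 * np.pi) ** (n / 2) * np.linalg.det(Q) ** 0.5))

def approx_gaussian(z, m, Q, M):
    """KLT-based approximation of Equation 6 (sketch)."""
    n = len(m)
    alpha, phi = np.linalg.eigh(Q)
    order = np.argsort(alpha)[::-1]          # principal components first
    alpha, phi = alpha[order], phi[:, order]
    y = phi.T @ (z - m)                      # KLT projection of Z - m
    rho = alpha[M:].mean()                   # rho = (1/(N-M)) * sum of minor eigenvalues
    principal = (np.exp(-0.5 * np.sum(y[:M] ** 2 / alpha[:M]))
                 / ((2 * np.pi) ** (M / 2) * np.prod(alpha[:M]) ** 0.5))
    residual = (np.exp(-np.sum(y[M:] ** 2) / (2 * rho))
                / (2 * np.pi * rho) ** ((n - M) / 2))
    return float(principal * residual)

m = np.zeros(3)
Q = np.diag([4.0, 1.0, 1.0])   # equal minor eigenvalues -> approximation is exact
z = np.array([1.0, 0.5, -0.3])
p_exact = exact_gaussian(z, m, Q)
p_approx = approx_gaussian(z, m, Q, M=1)
```

With distinct minor eigenvalues the two values differ slightly; M trades accuracy against the cost of keeping principal components.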

Then, in step 114, the behavior of the object in the video sequence is expressed on the basis of state transitions. In this embodiment, a hidden Markov model (HMM) is used. The HMM is a well-known stochastic model for training on and recognizing time-varying data, and is widely used for recognizing on-line handwritten characters or continuously input speech. In HMM-based speech recognition, on the assumption that speech can be modeled as a Markov model, reference Markov models are generated by estimating their probability parameters during training; at recognition time, speech is recognized by finding the reference Markov model most similar to the input expression. A hidden Markov model is used for speech recognition because it accommodates the variability of speech patterns; the word "hidden" indicates that the states are hidden in the model, independent of the observed pattern. With N the number of possible states, Ξ = {q_1, q_2, ..., q_N}, A = {a_ij} the transition probability between hidden states i and j, B = {b_j(·)} the observation symbol probability corresponding to state j, and Π the initial state distribution, a general HMM can be expressed as Equation 7 below:

$\lambda=\{\Xi,A,B,\Pi\}$ ......(7)

The states Ξ = {q_1, q_2, ..., q_N} and the initial state distribution Π are determined in advance on the basis of the video data. The HMM parameters A and B can be trained iteratively using the well-known Baum-Welch re-estimation formulas.

The state model, that is, the number of states, can be determined empirically; this embodiment illustrates an example using four states. Likewise, in this embodiment the behavior is modeled over these four states, and an example is described in which the same value is initially set for the transition probabilities of every state.

FIGS. 2A and 2B show an example HMM of a behavior in which a person starts to stand up but returns to a sitting position (hereinafter "bd"). FIG. 2A shows the HMM of bd before training; FIG. 2B shows it after training. Referring to FIG. 2A, the probabilities of a transition from one state to the next and from one state to the previous state are uniformly set to 0.333. For convenience of model development, the probability of returning from the state labeled 4 to state 4 is assumed to be 1. Referring to FIG. 2B, however, after training the forward and backward transition probabilities take different values. Using these different transition probabilities, a transition matrix is obtained. The states, each defined by its own probability distribution, together with the obtained transition matrix are then determined as the behavior model. This completes the behavior modeling.
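The untrained model of FIG. 2A can be sketched as follows, assuming stay, forward, and backward transitions are weighted equally and that boundary rows are simply renormalized (the handling of state 1, which has no predecessor, is an assumption the patent does not spell out):

```python
import numpy as np

N = 4
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i] = 1.0          # stay in the same state
    A[i, i + 1] = 1.0      # move to the next state
    if i > 0:
        A[i, i - 1] = 1.0  # move back to the previous state
A[N - 1, N - 1] = 1.0      # state 4 returns to itself with probability 1
A /= A.sum(axis=1, keepdims=True)  # rows become distributions (~0.333 per entry)
```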

According to the object behavior modeling method described above, complex behaviors such as human activities can be modeled effectively in the fields of video indexing and recognition. In particular, the object behavior required for behavior recognition can be modeled without segmenting the objects.

The object behavior modeling method of the above embodiment can be applied to systems such as those with a static camera. If the system to which the method is to be applied uses a moving camera, however, the person's motion must first be recovered; the subsequent steps are the same as in the above embodiment.

The behavior recognition process will now be described. FIG. 3 is a flowchart of the main steps of an object behavior recognition method according to an embodiment of the present invention. Referring to FIG. 3, in step 302 video frames containing the behavior to be recognized are input. Next, in step 304, feature vectors are obtained through motion estimation on the input frames; step 304 can be regarded as essentially the same as step 106 described with reference to FIG. 1.

Next, in step 306, the state to which each video frame belongs is determined using the obtained feature vectors. With T a positive integer denoting the number of frames forming the video sequence, Z_1, Z_2, ..., Z_T the feature vectors of the first, second, ..., T-th frames, a given frame sequence O = {Z_1, Z_2, ..., Z_T}, and E the number of state models, step 308 determines as the recognized behavior the model in the given behavior-model dictionary {λ_1, λ_2, ..., λ_E} that maximizes the probability P(O|λ) of the provided video frames. The transition matrix is obtained during the training process using an expectation-maximization (EM) algorithm based on the observation symbol probabilities {b_j(·)} corresponding to scene j. To improve the search speed, it is preferable to track the motion trajectory in a window of the same size as that used in training, on the basis of a prediction algorithm using Kalman filtering that comprises initialization, state prediction, measurement, and update steps.
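Maximizing P(O|λ) over a model dictionary is conventionally done with the forward algorithm; the patent does not name its evaluation procedure, so the sketch below, with toy two-state models, is one standard way to realize step 308:

```python
import numpy as np

def forward_loglik(pi, A, B):
    """log P(O | lambda) via the scaled forward algorithm.
    B[t, j] holds b_j(Z_t), the likelihood of frame t's feature vector
    under state j (e.g. from the per-state Gaussians)."""
    alpha = pi * B[0]
    log_p = 0.0
    for t in range(1, len(B)):
        c = alpha.sum()
        log_p += np.log(c)                  # accumulate scale to avoid underflow
        alpha = ((alpha / c) @ A) * B[t]    # predict, then weight by b_j(Z_t)
    return log_p + np.log(alpha.sum())

pi = np.array([0.5, 0.5])
B = np.array([[1.0, 0.01], [0.01, 1.0], [1.0, 0.01]])  # frames favor 0 -> 1 -> 0
dictionary = {
    "alternating": np.array([[0.05, 0.95], [0.95, 0.05]]),
    "static":      np.array([[0.95, 0.05], [0.05, 0.95]]),
}
recognized = max(dictionary, key=lambda name: forward_loglik(pi, dictionary[name], B))
```

The model whose transition structure best matches the observed per-state evidence scores highest and is returned as the recognized behavior.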

In this way, complex object behaviors in video sequences, such as human activities, can be recognized effectively. According to this object behavior recognition method, complex behaviors such as human behavior can be recognized efficiently; in particular, behaviors can be recognized without segmenting the objects.

Furthermore, the object behavior modeling method and the object behavior recognition method of the present invention can be written as programs executed on a personal computer or server computer. Computer programmers skilled in the art can readily derive the program code and code segments for constructing such a program. The program can also be stored on a computer-readable recording medium, which may include magnetic recording media, optical recording media, and radio media.

As described above, according to the present invention, complex behaviors such as human behavior can be modeled and recognized effectively in the fields of video indexing and recognition, without segmenting objects.

Claims (11)

1.一种对象行为建模方法,包括下列步骤:1. A method for object behavior modeling, comprising the following steps: (a)从视频序列中获取光流向量;(a) Obtain the optical flow vector from the video sequence; (b)使用该光流向量,获取关于多个视频帧的一个特征向量的概率分布,其中所述特征向量为d×L维向量,d为维数而L为在一个视频帧或感兴趣的区域中的像素数目;(b) Using the optical flow vector, obtain the probability distribution of a feature vector for multiple video frames, where the feature vector is a d×L dimensional vector, where d is the dimension and L is the the number of pixels in the region; (c)使用该特征向量的概率分布,进行状态建模;以及(c) performing state modeling using the probability distribution of the feature vector; and (d)基于状态变换,表达视频序列中对象的行为。(d) Express the behavior of objects in video sequences based on state transitions. 2.如权利要求1所述的对象行为建模方法,其中步骤(a)基于仿射运动估计。2. The object behavior modeling method of claim 1, wherein step (a) is based on affine motion estimation. 3.如权利要求2所述的对象行为建模方法,其中步骤(a)还包括子步骤:3. The object behavior modeling method as claimed in claim 2, wherein step (a) also comprises sub-steps: (a-1)将输入视频帧分组为多个视频帧组,并将每一个视频帧组划分为独立状态;(a-1) grouping the input video frame into a plurality of video frame groups, and dividing each video frame group into independent states; (a-2)为每一独立状态的视频帧组中的每一视频获取仿射运动参数;和(a-2) obtaining affine motion parameters for each video in the group of video frames in each independent state; and (a-3)从仿射运动参数中获取光流向量。(a-3) Obtain the optical flow vector from the affine motion parameters. 4.如权利要求3所述的对象行为建模方法,其中步骤(a-2)包括步骤:当I表示强度、t表示时间、x表示像素位置(x,y)、V表示运动向量时,使给定视频中的、基于表达为It(x)=It-1(x-V(x))的对象上的像素强度的差的平方和∑(It(x)-It-1(x-V(x)))2最小的参数确定为运动参数。4. The object behavior modeling method as claimed in claim 3, wherein step (a-2) comprises the step: when I represents intensity, t represents time, x represents pixel position (x, y), V represents motion vector, Let Σ(I t ( x )−I t−1 ( xV(x))) 2 The smallest parameter is determined as the motion parameter. 5.如权利要求1所述的对象行为建模方法,其中步骤(b)包括步骤:按照下式计算概率分布P(Z|Ω):5. 
The object behavior modeling method as claimed in claim 1, wherein step (b) comprises the step of calculating the probability distribution P(Z|Ω) according to the following formula:

P(Z|Ω) = exp(-(1/2)(Z - m)^T Q^{-1}(Z - m)) / ((2π)^{N/2} |Q|^{1/2})

where p = (p^1, p^2, ..., p^d) represents the motion vector calculated at each pixel position (x, y), L represents the number of pixels in a video frame or region of interest, d represents the dimension, the feature vector Z is the d×L-dimensional vector Z = (p_1^1, p_2^1, ..., p_L^1, p_1^2, p_2^2, ..., p_L^2, ..., p_1^d, p_2^d, ..., p_L^d)^T, m is the mean vector of the feature vector Z, Q is the covariance matrix of the feature vector Z, and the feature vector Z is assumed to be drawn from the observation class Ω.

6. The object behavior modeling method as claimed in claim 1, wherein step (b) further comprises the steps of:

decomposing the covariance matrix Q according to the following formula:

Q = ΦΛΦ^T

where the columns of Φ are the orthonormal eigenvectors of the covariance matrix Q and Λ is the diagonal matrix of the corresponding eigenvalues of the covariance matrix Q; and

calculating the probability distribution P(Z|Ω) according to the following formula:

P(Z|Ω) = [exp(-(1/2) Σ_{i=1}^{M} y_i^2 / α_i) / ((2π)^{M/2} |Λ|^{1/2})] · [exp(-(Σ_{i=M+1}^{N} y_i^2) / (2ρ)) / ((2πρ)^{(N-M)/2})]

where M is the number of principal components, y_i is the i-th element of Y, α_i is the i-th eigenvalue of Q, ρ is the optimal value obtained from ρ = (1/(N-M)) Σ_{i=M+1}^{N} α_i, and the feature vector Z is assumed to be drawn from the observation class Ω.

7. The object behavior modeling method as claimed in claim 1, wherein in step (c) the object behavior in the video sequence is expressed using a Hidden Markov Model (HMM) based on state transitions.

8. The object behavior modeling method as claimed in claim 7, wherein the Hidden Markov Model (HMM) is expressed as λ = {Ξ, A, B, Π}, where N is the number of possible states, Ξ satisfies Ξ = {q_1, q_2, ..., q_N}, A is the set of transition probabilities {a_ij} between hidden states i and j, B is the set of observation symbol probabilities {b_j(·)} corresponding to state j, Π is the initial state distribution, and the states Ξ = {q_1, q_2, ..., q_N} and the initial state distribution Π are determined in advance from the video data.

9. An object behavior recognition method, comprising the steps of:

(a) obtaining the feature vector of a video frame by motion estimation, wherein said feature vector is a d×L-dimensional vector, d being the dimension and L being the number of pixels in a video frame or region of interest;

(b) determining, using the obtained feature vector, the state to which each frame belongs; and

(c) determining, using the transition matrices for the determined states, the behavior model from a given behavior model dictionary that maximizes the probability of the given video frames, as the recognized behavior.

10. The object behavior recognition method as claimed in claim 9, wherein step (c) comprises the step of: when T is a positive integer representing the number of frames forming the video sequence, Z_1, Z_2, ..., Z_T are the feature vectors of the first, second, ..., T-th frames respectively, the given video frames are O = {Z_1, Z_2, ..., Z_T}, and E is the number of state models, finding the behavior model that maximizes the probability P(O|λ) from the given behavior model dictionary {λ_1, λ_2, ..., λ_E}.

11. The object behavior recognition method as claimed in claim 10, wherein the transition matrix is obtained during a training process by using an expectation-maximization (EM) algorithm based on the observation symbol probability {b_j(·)} corresponding to state j.
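The multivariate Gaussian likelihood P(Z|Ω) in the claims above is straightforward to compute once the optical-flow field is stacked into the feature vector Z. The following sketch is illustrative only, not code from the patent: the function names are assumptions, and log-likelihood is used to avoid numerical underflow.

```python
import numpy as np

def feature_vector(flow):
    """Stack an H x W x d optical-flow field into the d*L feature vector
    Z = (p^1_1..p^1_L, ..., p^d_1..p^d_L)^T (pixel index varies fastest
    within each flow component, matching the ordering in the claim)."""
    h, w, d = flow.shape
    return flow.reshape(h * w, d).T.reshape(-1)

def gaussian_log_likelihood(z, m, q):
    """log P(Z|Omega) for a multivariate Gaussian with mean m, covariance Q."""
    n = z.size
    diff = z - m
    _, logdet = np.linalg.slogdet(q)           # log |Q|
    mahal = diff @ np.linalg.solve(q, diff)    # (Z-m)^T Q^{-1} (Z-m)
    return -0.5 * (mahal + n * np.log(2.0 * np.pi) + logdet)
```

In practice the covariance Q of a d×L-dimensional Z is enormous, which is precisely why the next claim replaces the full Gaussian with an eigenspace approximation.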
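The two-factor density of claim 6 is a Moghaddam–Pentland-style eigenspace likelihood: exact in the M principal directions, with the remaining N−M directions summarized by the single variance ρ. A hypothetical sketch follows; the names are mine, and the eigendecomposition is assumed precomputed with eigenvalues sorted in descending order.

```python
import numpy as np

def eigenspace_log_likelihood(z, m, eigvals, eigvecs, num_principal):
    """log P(Z|Omega) split into an M-dim principal subspace and its
    complement, with rho the mean of the discarded eigenvalues."""
    n = z.size
    M = num_principal
    y = eigvecs.T @ (z - m)                  # coordinates in the eigenbasis
    rho = eigvals[M:].mean()                 # rho = (1/(N-M)) sum_{i>M} alpha_i
    in_span = -0.5 * (np.sum(y[:M] ** 2 / eigvals[:M])
                      + M * np.log(2.0 * np.pi)
                      + np.sum(np.log(eigvals[:M])))      # the |Lambda|^{1/2} term
    residual = -0.5 * (np.sum(y[M:] ** 2) / rho
                       + (n - M) * np.log(2.0 * np.pi * rho))
    return in_span + residual
```

When the discarded eigenvalues all equal ρ this reduces to the exact Gaussian, which is a convenient sanity check on an implementation.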
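Recognition in claims 9–10 reduces to scoring the frame sequence O = (Z_1, ..., Z_T) against every HMM λ_e in the dictionary and taking the argmax of P(O|λ_e), typically with the forward algorithm. A minimal log-space sketch, assuming the per-frame state log-likelihoods log b_j(Z_t) have already been evaluated (all names here are illustrative, not from the patent):

```python
import numpy as np

def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return np.squeeze(s)

def log_forward(log_b, log_a, log_pi):
    """log P(O | lambda) by the forward algorithm.
    log_b: (T, N) log b_j(Z_t); log_a: (N, N) log transition matrix {a_ij};
    log_pi: (N,) log initial state distribution."""
    alpha = log_pi + log_b[0]
    for t in range(1, log_b.shape[0]):
        # sum over previous states i for each current state j
        alpha = log_b[t] + logsumexp(alpha[:, None] + log_a, axis=0)
    return float(logsumexp(alpha))

def recognize(dictionary, log_b):
    """Index of the model lambda_e maximizing log P(O | lambda_e);
    dictionary is a list of (log_a, log_pi) pairs."""
    scores = [log_forward(log_b, la, lpi) for la, lpi in dictionary]
    return int(np.argmax(scores))
```

Log space keeps the T-fold products from underflowing, and the E scores are directly comparable because every model is evaluated on the same observation sequence.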
CNB011226315A 2000-11-14 2001-06-26 Method for object action set-up mold Expired - Fee Related CN1200397C (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US24801000P 2000-11-14 2000-11-14
US60/248,010 2000-11-14
KR86284/2000 2000-12-29
KR10-2000-0086284A KR100421740B1 (en) 2000-11-14 2000-12-29 Object activity modeling method
KR86284/00 2000-12-29

Publications (2)

Publication Number Publication Date
CN1352439A CN1352439A (en) 2002-06-05
CN1200397C true CN1200397C (en) 2005-05-04

Family

ID=26638683

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011226315A Expired - Fee Related CN1200397C (en) 2000-11-14 2001-06-26 Method for object action set-up mold

Country Status (1)

Country Link
CN (1) CN1200397C (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2235602A4 (en) * 2008-01-23 2018-03-28 The Regents of The University of California Systems and methods for behavioral monitoring and calibration
TW201222476A (en) * 2010-11-26 2012-06-01 Chung-Chiu Wu Image processing system and method thereof, computer readable storage media and computer program product
CN102436487B (en) * 2011-11-03 2014-03-05 北京电子科技学院 Optical flow method based on video retrieval system
US9286693B2 (en) * 2013-02-25 2016-03-15 Hanwha Techwin Co., Ltd. Method and apparatus for detecting abnormal movement
CN107169423B (en) * 2017-04-24 2020-08-04 南京邮电大学 Method for identifying motion type of video character

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019137137A1 (en) * 2018-01-11 2019-07-18 Huawei Technologies Co., Ltd. Activity recognition method using videotubes
US10628667B2 (en) 2018-01-11 2020-04-21 Futurewei Technologies, Inc. Activity recognition method using videotubes
US11100316B2 (en) 2018-01-11 2021-08-24 Futurewei Technologies, Inc. Activity recognition method using videotubes

Also Published As

Publication number Publication date
CN1352439A (en) 2002-06-05

Similar Documents

Publication Publication Date Title
US7308030B2 (en) Object activity modeling method
CN101971190B (en) Real-time body segmentation system
CN107563345B (en) A human behavior analysis method based on spatiotemporal saliency region detection
Chin et al. Incremental kernel principal component analysis
US8755593B2 (en) Apparatus and method for video sensor-based human activity and facial expression modeling and recognition
CN104573665B Continuous action recognition method based on an improved Viterbi algorithm
CN108415883B (en) Convex non-negative matrix factorization method based on subspace clustering
CN111931549B (en) An action prediction method for human skeleton based on multi-task non-autoregressive decoding
CN111274978B (en) Micro expression recognition method and device
Chong et al. Modeling representation of videos for anomaly detection using deep learning: A review
CN102938070A (en) Behavior recognition method based on action subspace and weight behavior recognition model
Kortylewski et al. Probabilistic Compositional Active Basis Models for Robust Pattern Recognition.
CN112818779A (en) Human behavior recognition method based on feature optimization and multiple feature fusion
CN109492610A Pedestrian re-identification method and device, and readable storage medium
CN1200397C (en) Method for object action set-up mold
CN112257600B (en) A face recognition method and system
Chakraborty et al. Hand gesture recognition: A comparative study
CN107424174B (en) Motion salient region extraction method based on locally constrained non-negative matrix factorization
Lu et al. Image-specific prior adaptation for denoising
Huang et al. Human emotion recognition based on face and facial expression detection using deep belief network under complicated backgrounds
CN105069427B Iris identification method and device based on improved sparse coding
Vitria et al. Bayesian classification of cork stoppers using class-conditional independent component analysis
CN116884067B (en) A micro-expression recognition method based on improved implicit semantic data enhancement
CN108416389B (en) Image classification method based on denoising sparse autoencoder and density space sampling
CN108009586B (en) Capped Concept Decomposition Method and Image Clustering Method

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050504