HK1220795B

HK1220795B - Teaching data generating device, method, and program, and crowd state recognition device, method, and program

Info

Publication number: HK1220795B
Application number: HK16108787.2A
Authority: HK
Inventors: 池田浩雄
Original assignee: 日本电气株式会社
Priority date: 2013-06-28
Filing date: 2014-05-21
Publication date: 2021-01-22

Description

Training data generation device, method, and program, as well as crowd state recognition device, method, and program

Technical Field

The present invention relates to a training data generating device, a training data generating method, and a training data generating program for generating training data, and to a crowd state recognition device, a crowd state recognition method, and a crowd state recognition program for recognizing the state of a crowd in an image.

Background Art

Various techniques have been proposed for recognizing the state of a crowd in an image (hereinafter referred to as the crowd state) (see PTL 1 to PTL 3).

The human behavior determination device described in PTL 1 extracts a changed region from a video, that is, a region in which a difference arises through background subtraction or the like, and calculates a feature quantity from the changed region. The device then detects person regions by using a person discriminator, obtained by machine learning on such feature quantities, to determine whether each changed region is a person region. The device further associates the detected person regions between frames, taking distance or color histograms into account, and tracks each person region over a predetermined number of frames. From the person trajectory obtained by tracking, the device calculates trajectory feature quantities such as average speed, tracking time, and direction of motion, and determines human behavior based on those feature quantities.

The head counting device described in PTL 2 counts the number of people in a video in which a crowd is captured. The device extracts the heads of the people in an image based on a head model. It then links, between frames, the head positions determined to belong to the same person, using feature quantities such as position information or color distribution, and counts the number of people from the linking result (the person tracking result).

The system described in PTL 3 detects states such as steady (for example, the main flow of people) and unsteady (for example, movement against the main flow). The system aggregates the optical flow attributes of a determination block, which is the unit of determination, and calculates an evaluation value for assessing how steady the optical flow is. The system then determines the state of the determination block from the evaluation value.

Citation List

Patent Literature

PTL 1: Japanese Patent Application Laid-Open No. 2011-100175 (paragraphs 0028 to 0030)

PTL 2: Japanese Patent Application Laid-Open No. 2010-198566 (paragraphs 0046 to 0051)

PTL 3: Japanese Patent Application Laid-Open No. 2012-22370 (paragraph 0009)

Summary of the Invention

Technical Problem

With the techniques described in PTL 1 to PTL 3, determination performance degrades for video with a low frame rate. In particular, these techniques cannot determine the crowd state in a still image.

This is because the techniques described in PTL 1 to PTL 3 use every frame of the video, so their state determination performance depends on the interval between frames. For example, the technique described in PTL 1 associates person regions between frames to obtain person trajectories. The technique described in PTL 2 links head positions between frames and treats the result as the person tracking result. To obtain such trajectories or tracking results, person regions or head positions must be associated between frames. At a low frame rate, a person moves farther between frames, so the change in the person region or head position and the change in shape (posture) both increase. The influence of disturbances such as lighting also increases. Person regions or head positions therefore become difficult to associate between frames. As a result, the accuracy of the person trajectories and the like falls, and with it the accuracy of determining the crowd state in the image. Likewise, with the technique described in PTL 3, optical flow is difficult to obtain correctly at a low frame rate, so the accuracy of the aggregated attributes falls and state determination performance degrades.

For example, consider a method that uses a discriminator with a learned dictionary to recognize the crowd state in an image. The dictionary is learned from training data such as images showing crowd states. However, a large amount of training data (learning data) must be collected to learn the dictionary. For example, the arrangement of people (overlap between people or deviation in their positions), the direction of people, and the density (people per unit area) must be defined in various states, and for each state a large number of images must be collected in which the shooting angle, background, lighting, clothing, posture, and so on of the people are varied. The dictionary of the discriminator is obtained by performing machine learning with these images. Collecting such a large amount of training data, however, increases the workload.

Therefore, an object of the present invention is to provide a training data generating device, a training data generating method, and a training data generating program capable of easily generating a large amount of training data for machine learning of the dictionary of a discriminator for recognizing a crowd state.

Another object of the present invention is to provide a crowd state recognition device, a crowd state recognition method, and a crowd state recognition program capable of suitably recognizing the crowd state in an image regardless of the frame rate.

Solution to the Problem

A training data generating device according to the present invention includes: a background extraction device for selecting a background image from a plurality of background images prepared in advance, extracting a region in the background image, and enlarging or reducing the image corresponding to the extracted region to an image of a predetermined size; a person state determination device for determining the person state of a crowd according to a multi-person state control designation, which is designation information about the person state of a plurality of people, and an individual person state control designation, which is designation information about the state of an individual person among the plurality of people; and a crowd state image synthesis device for generating a crowd state image, which is an image in which person images corresponding to the person state determined by the person state determination device are synthesized with the image of the predetermined size obtained by the background extraction device, specifying a training label for the crowd state image, and outputting a pair of the crowd state image and the training label.

In addition, a crowd state recognition device according to the present invention includes: a rectangular region group storage device for storing a group of rectangular regions indicating the portions of an image whose crowd state is to be recognized; a crowd state recognition dictionary storage device for storing a dictionary of a discriminator obtained by machine learning using a plurality of pairs of a crowd state image and a training label for that crowd state image, the crowd state image being an image that expresses a crowd state at a predetermined size and includes a person whose reference part is expressed at the same size as the reference part size defined for the predetermined size; and a crowd state recognition device for extracting, from a given image, the regions indicated by the group of rectangular regions stored in the rectangular region group storage device, and recognizing, based on the dictionary, the state of the crowd captured in each extracted image.

In addition, a training data generating method according to the present invention includes: a background extraction step of selecting a background image from a plurality of background images prepared in advance, extracting a region in the background image, and enlarging or reducing the image corresponding to the extracted region to an image of a predetermined size; a person state determination step of determining the person state of a crowd according to a multi-person state control designation, which is designation information about the person state of a plurality of people, and an individual person state control designation, which is designation information about the state of an individual person among the plurality of people; and a crowd state image synthesis step of generating a crowd state image, which is an image in which person images corresponding to the person state determined in the person state determination step are synthesized with the image of the predetermined size obtained in the background extraction step, specifying a training label for the crowd state image, and outputting a pair of the crowd state image and the training label.

In addition, in a crowd state recognition method according to the present invention, a rectangular region group storage device stores a group of rectangular regions indicating the portions of an image whose crowd state is to be recognized, and a crowd state recognition dictionary storage device stores a dictionary of a discriminator obtained by machine learning using a plurality of pairs of a crowd state image and a training label for that crowd state image, the crowd state image being an image that expresses a crowd state at a predetermined size and includes a person whose reference part is expressed at the same size as the reference part size defined for the predetermined size. The method includes a crowd state recognition step of extracting, from a given image, the regions indicated by the group of rectangular regions stored in the rectangular region group storage device, and recognizing, based on the dictionary, the state of the crowd captured in each extracted image.

In addition, a training data generating program according to the present invention causes a computer to execute: background extraction processing of selecting a background image from a plurality of background images prepared in advance, extracting a region in the background image, and enlarging or reducing the image corresponding to the extracted region to an image of a predetermined size; person state determination processing of determining the person state of a crowd according to a multi-person state control designation, which is designation information about the person state of a plurality of people, and an individual person state control designation, which is designation information about the state of an individual person among the plurality of people; and crowd state image synthesis processing of generating a crowd state image, which is an image in which person images corresponding to the person state determined in the person state determination processing are synthesized with the image of the predetermined size obtained in the background extraction processing, specifying a training label for the crowd state image, and outputting a pair of the crowd state image and the training label.

In addition, a crowd state recognition program according to the present invention is for a computer that includes a rectangular region group storage device for storing a group of rectangular regions indicating the portions of an image whose crowd state is to be recognized, and a crowd state recognition dictionary storage device for storing a dictionary of a discriminator obtained by machine learning using a plurality of pairs of a crowd state image and a training label for that crowd state image, the crowd state image being an image that expresses a crowd state at a predetermined size and includes a person whose reference part is expressed at the same size as the reference part size defined for the predetermined size. The program causes the computer to execute crowd state recognition processing of extracting, from a given image, the regions indicated by the group of rectangular regions stored in the rectangular region group storage device, and recognizing, based on the dictionary, the state of the crowd captured in each extracted image.

Advantageous Effects of the Invention

With the training data generating device, the training data generating method, and the training data generating program according to the present invention, a large amount of training data for machine learning of the dictionary of a discriminator for recognizing a crowd state can be generated easily.

With the crowd state recognition device, the crowd state recognition method, and the crowd state recognition program according to the present invention, the crowd state in an image can be suitably recognized regardless of the frame rate.

Brief Description of the Drawings

[Fig. 1] A block diagram illustrating an exemplary structure of a training data generating device according to the present invention.

[Fig. 2] A schematic diagram illustrating exemplary information stored in the crowd state control designation storage device.

[Fig. 3] A schematic diagram illustrating exemplary information stored in the person state control designation storage device.

[Fig. 4] A diagram illustrating, by way of example, person images stored in the person image storage device and the person region images corresponding to the person images.

[Fig. 5] A schematic diagram illustrating examples in which the condition for a background person state is satisfied.

[Fig. 6] A schematic diagram illustrating examples in which the condition for a foreground person state is satisfied.

[Fig. 7] A block diagram illustrating an exemplary structure of a crowd state recognition device according to the present invention.

[Fig. 8] A schematic diagram illustrating, by way of example, how the degree of congestion (the number of people) is recognized.

[Fig. 9] A schematic diagram illustrating, by way of example, how the direction of a crowd is recognized.

[Fig. 10] A schematic diagram illustrating, by way of example, how a non-abnormal crowd or an abnormal crowd is recognized.

[Fig. 11] A schematic diagram illustrating, by way of example, how a disordered state or an ordered state is recognized.

[Fig. 12] A flowchart illustrating an exemplary processing flow of the training data generating device.

[Fig. 13] A flowchart illustrating an exemplary processing flow of step S1.

[Fig. 14] A flowchart illustrating an exemplary processing flow of step S2.

[Fig. 15] A flowchart illustrating an exemplary processing flow of step S3.

[Fig. 16] A flowchart illustrating an exemplary processing flow of step S4.

[Fig. 17] A flowchart illustrating an exemplary processing flow of the crowd state recognition device.

[Fig. 18] A block diagram illustrating, by way of example, a specific structure of a training data generating device according to the present invention.

[Fig. 19] A block diagram illustrating, by way of example, a specific structure of a crowd state recognition device according to the present invention.

[Fig. 20] A block diagram illustrating the main parts of a training data generating device according to the present invention.

[Fig. 21] A block diagram illustrating the main parts of a crowd state recognition device according to the present invention.

Detailed Description

Exemplary embodiments according to the present invention will be described below with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an exemplary structure of a training data generating device according to the present invention. The training data generating device 10 according to the present invention generates training data for machine learning of the crowd state in an image. Specifically, the training data generating device 10 creates a plurality of pairs of a local image of a crowd state and a training label corresponding to that local image. Here, "local" means a region smaller than the region of the image whose crowd state is to be recognized (the image acquired by the image acquisition device 3 described below; see FIG. 7). A local image of a crowd state shows, within such a region, a set of basic parts (hereinafter referred to as reference parts) of the people making up the crowd. This exemplary embodiment is described on the assumption that the head is used as the reference part, but a part other than the head may be used as the reference part. A local image of a crowd state is hereinafter referred to as a crowd patch. A crowd patch may also show parts of a person other than the reference part (the head in this example).

The training data generating device 10 includes a data processing device 1 that operates under program control and a storage device 2 that stores information.

The storage device 2 includes a background image storage device 21, a learning local image information storage device 22, a crowd state control designation storage device 23, a person state control designation storage device 24, a person image storage device 25, and a person region image storage device 26.

The background image storage device 21 stores a plurality of background images (a group of background images) used as the background of crowd patches. The background images contain no people. An image of the actual place where the images to be recognized for crowd state are captured may be used as a background image. A background image generated using CG (computer graphics) or the like may also be used.
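The stored background images feed the background extraction summarized earlier: select a background image, extract a region from it, and enlarge or reduce that region to the predetermined size. The following is a minimal sketch of that sequence, not the patented implementation. It assumes grayscale images held as nested lists, random selection of both the image and the region, and nearest-neighbor scaling; the function names `resize_nearest` and `extract_background` are hypothetical, and the sketch places no constraint on the aspect ratio of the extracted region.

```python
import random
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel values (assumption)


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Enlarge or reduce img to out_h x out_w by nearest-neighbor sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def extract_background(backgrounds: Sequence[Image], patch_h: int,
                       patch_w: int, rng: random.Random) -> Image:
    """Select a background image, extract a region, scale it to the patch size."""
    bg = rng.choice(list(backgrounds))       # random selection is an assumption
    bg_h, bg_w = len(bg), len(bg[0])
    region_h = rng.randint(1, bg_h)          # random region size (assumption)
    region_w = rng.randint(1, bg_w)
    top = rng.randint(0, bg_h - region_h)
    left = rng.randint(0, bg_w - region_w)
    region = [row[left:left + region_w] for row in bg[top:top + region_h]]
    return resize_nearest(region, patch_h, patch_w)


# Two tiny synthetic "background images" stand in for the stored group.
backgrounds = [
    [[10 * r + c for c in range(8)] for r in range(6)],
    [[100 + r + c for c in range(5)] for r in range(5)],
]
patch = extract_background(backgrounds, patch_h=4, patch_w=3, rng=random.Random(0))
```

Whatever region is drawn, the result always has the requested patch height and width, which is the property the later synthesis step relies on.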

The learning local image information storage device 22 stores the size of the crowd patch (the local image of a crowd state used for machine learning) and the size of a person's reference part for the crowd patch. For example, assume that the crowd patch is h pixels high and w pixels wide, and that the height of the reference part (the head in this example) of the people making up the crowd captured in the crowd patch is 1/α of the height of the crowd patch, that is, h/α pixels. In this case, the height of h pixels and the width of w pixels are stored in the learning local image information storage device 22 as the size of the crowd patch, and the height of h/α pixels is stored as the size of the person's reference part. This example stores the height as the size of the reference part, but the stored size is not limited to the height. For example, assume that the width of the reference part is defined as 1/α of the width of the crowd patch, that is, w/α pixels. In this case, the height of h pixels and the width of w pixels are stored as the size of the crowd patch, and the width of w/α pixels may be stored as the size of the person's reference part. In actual use, either the height or the width may be used as the size of the reference part. It is sufficient that the relationship between the size of the crowd patch and the size of the reference part is known, so a diagonal size or the like may also be used.
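The relationship between the patch size and the reference part size can be sketched as a small record. This is only an illustration: the class name `PatchInfo`, its field names, and the numeric values (h = 120, w = 90, α = 6) are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchInfo:
    """Sizes kept for crowd patches (illustrative names, not from the source)."""
    height_px: int   # h: crowd patch height in pixels
    width_px: int    # w: crowd patch width in pixels
    alpha: float     # ratio between the patch size and the reference part size

    def reference_height(self) -> float:
        """Reference part (e.g. head) height: h / alpha pixels."""
        return self.height_px / self.alpha

    def reference_width(self) -> float:
        """Reference part width: w / alpha pixels."""
        return self.width_px / self.alpha


# Hypothetical values chosen only to make the arithmetic easy to follow.
info = PatchInfo(height_px=120, width_px=90, alpha=6.0)
print(info.reference_height())  # 20.0
print(info.reference_width())   # 15.0
```

Storing h, w, and α is equivalent to storing h, w, and h/α (or w/α), since only the relationship between the two sizes needs to be known.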

Here, the size of the reference part is the size at which a person is recognized as a person: a person whose reference part appears in the crowd patch at about this size is recognized as a person. For example, a person whose reference part appears markedly larger or markedly smaller in the crowd patch is part of the crowd but is treated merely as background.

The crowd state control designation storage device 23 stores designation information about the person state of a plurality of people (hereinafter referred to as the multi-person state control designation) used when a plurality of person images are synthesized in a crowd patch. The multi-person state control designation is defined in advance by the operator of the training data generating device 10 and stored in the crowd state control designation storage device 23. It is defined per item, such as the item "arrangement of people", which concerns the positional relationship among people (for example, overlap between people or deviation in position) when a plurality of person images are synthesized, the item "direction of people", which concerns the orientation of people, and the item "number of people", which concerns the number or density of people. The items for which a multi-person state control designation is defined are not limited to these. FIG. 2 is a schematic diagram illustrating exemplary information stored in the crowd state control designation storage device 23. FIG. 2 illustrates the multi-person state control designations defined for "arrangement of people", "direction of people", and "number of people".

The forms of the multi-person state control designation are "predetermined state", "random", and "predetermined rule".

"Predetermined state" is a form of designation that designates a specific state for the corresponding item. In the example shown in FIG. 2, "three people", defined for the item "number of people", corresponds to "predetermined state". In this example, the "number of people" is specifically designated as "three people". As another example of "predetermined state", "all people face right" may be designated for the item "direction of people".

"Random" indicates that the state of the corresponding item may be defined arbitrarily. In the example shown in FIG. 2, the multi-person state control designation "random" is defined for "arrangement of people" and "direction of people".

"Predetermined rule" is a form of designation indicating that the state of the corresponding item may be defined within the range that satisfies a rule designated by the operator. For example, when the rule "people are arranged with 50% overlap" is defined for the item "arrangement of people", the designation is that the arrangement of people is to be defined so as to at least satisfy that rule. Likewise, when the rule "people placed to the right of the center of the crowd patch face right, and people placed to the left of the center face left" is defined for the item "direction of people", the designation is that the direction of people is to be defined so as to at least satisfy that rule.
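The three designation forms can be read as three ways of turning an item's designation into one concrete state. The sketch below is one possible reading under stated assumptions, not the patent's procedure: the `Designation` class, the `resolve` function, the candidate state lists, and the choice to pick uniformly at random among allowed states are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Designation:
    """One item's state control designation (illustrative structure)."""
    form: str                                     # "predetermined_state", "random", or "predetermined_rule"
    value: Optional[str] = None                   # used by "predetermined_state"
    rule: Optional[Callable[[str], bool]] = None  # used by "predetermined_rule"


def resolve(d: Designation, candidates: Sequence[str], rng: random.Random) -> str:
    """Pick one concrete state for an item from a list of candidate states."""
    if d.form == "predetermined_state":
        return d.value                            # the designated state itself
    if d.form == "random":
        return rng.choice(list(candidates))       # any state is acceptable
    if d.form == "predetermined_rule":
        allowed = [c for c in candidates if d.rule(c)]
        if not allowed:
            raise ValueError("no candidate state satisfies the rule")
        return rng.choice(allowed)                # any state satisfying the rule
    raise ValueError(f"unknown designation form: {d.form}")


rng = random.Random(0)
number = resolve(Designation("predetermined_state", value="three people"),
                 ["one person", "three people"], rng)
direction = resolve(Designation("random"), ["left", "right", "front", "back"], rng)
arrangement = resolve(
    Designation("predetermined_rule", rule=lambda s: s == "50% overlap"),
    ["no overlap", "50% overlap"], rng)
```

Treating "predetermined rule" as a filter over candidates keeps "random" as the special case in which every candidate passes.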

The crowd state control designation storage device 23 also stores, for each item, whether a training label is designated. In the example shown in FIG. 2, "○" indicates that a training label is designated and "×" indicates that no training label is designated. The same applies to FIG. 3, described later.

From among the items for which multi-person state control designations are defined, the operator selects one or more items to be designated with a training label. The operator also defines a multi-person state control designation for every item, regardless of whether the item is designated with a training label. In the example shown in FIG. 2, multi-person state control designations (here, "random") are defined for the items "arrangement of people" and "direction of people", which are not designated with a training label. For an item designated with a training label, the operator sets the form of the multi-person state control designation to "predetermined state". In the example shown in FIG. 2, the specific state "three people" is designated for the item "number of people", which is designated with a training label. The crowd state control designation storage device 23 stores the multi-person state control designation and the presence or absence of a designated training label, as defined by the operator for each item.
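The FIG. 2 example can be written out as a small table, together with a check of the constraint stated above that an item designated with a training label uses the "predetermined state" form. This is a sketch only: the dictionary layout and the function names `check_label_items` and `labelled_states` are hypothetical, and collecting the states of the labelled items into a dictionary is an assumption about how a training label might be represented.

```python
from typing import Dict, Optional, TypedDict


class Entry(TypedDict):
    form: str              # designation form for the item
    state: Optional[str]   # concrete state when the form is "predetermined state"
    label: bool            # True where the figure shows a circle, False for a cross


# The three items and values described for FIG. 2.
fig2: Dict[str, Entry] = {
    "arrangement of people": {"form": "random", "state": None, "label": False},
    "direction of people": {"form": "random", "state": None, "label": False},
    "number of people": {"form": "predetermined state", "state": "three people", "label": True},
}


def check_label_items(table: Dict[str, Entry]) -> None:
    """Items designated with a training label must use the 'predetermined state' form."""
    for item, entry in table.items():
        if entry["label"] and entry["form"] != "predetermined state":
            raise ValueError(f"item {item!r} has a label but form {entry['form']!r}")


def labelled_states(table: Dict[str, Entry]) -> Dict[str, Optional[str]]:
    """Collect the designated states of the items that carry a training label."""
    check_label_items(table)
    return {item: e["state"] for item, e in table.items() if e["label"]}


print(labelled_states(fig2))  # {'number of people': 'three people'}
```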

FIG. 2 illustrates the items "arrangement of people", "direction of people", and "number of people" by way of example, but the items for which the operator defines a multi-person state control designation and the presence or absence of a designated training label are not limited to these. This exemplary embodiment is described on the assumption that the crowd state control designation storage device 23 stores the multi-person state control designation and the presence or absence of a designated training label defined by the operator for at least the items "arrangement of people", "direction of people", and "number of people".

The person state control designation storage device 24 stores information designating the state of each individual person (hereinafter referred to as the individual person state control designation) used when a plurality of person images are synthesized in a crowd patch. Whereas the multi-person state control designation designates the person state of a plurality of people, the individual person state control designation designates the state of an individual person belonging to the group of people. The individual person state control designation is defined in advance by the operator of the training data generating device 10 and stored in the person state control designation storage device 24. It is defined per item, such as "shooting angle of person", "lighting on person", "posture of person", "clothing of person", "body shape of person", "hairstyle of person", and "size of person when synthesized with crowd patch". The items for which an individual person state control designation is defined are not limited to these. FIG. 3 is a schematic diagram illustrating exemplary information stored in the person state control designation storage device 24. FIG. 3 illustrates the individual person state control designations defined for the items "shooting angle of person", "lighting on person", and "posture of person".

Like the multi-person state control designation, the individual person state control designation takes one of three forms: "predetermined state", "random", and "predetermined rule".

As described for the multi-person state control designation, "predetermined state" is a form of designation that specifies a particular state for the corresponding item. In the example shown in FIG. 3, "walking", defined for the item "posture of the person", corresponds to "predetermined state". In this example, the posture of the person is specifically designated as a walking posture.

As described for the multi-person state control designation, "random" indicates that the state of the corresponding item may be defined arbitrarily. In the example shown in FIG. 3, the individual person state control designation "random" is defined for "lighting of the person".

As described for the multi-person state control designation, "predetermined rule" is a form of designation indicating that the state of the corresponding item is to be defined within a range that satisfies a rule specified by the operator. In the example shown in FIG. 3, a predetermined rule is defined for "shooting angle of the person". In this example, the rule specifies that the shooting angle of the person is calculated from the person arrangement at the time of synthesis by using an equation based on camera parameters, and that the person state is defined according to that shooting angle. As another example, when the rule "determine the person size at the time of synthesis based on the person arrangement at the time of synthesis and the size of the reference part stored in the learning local image information storage device 22" is defined for "person size when synthesized with the crowd patch", the person size is defined so as to satisfy at least that rule.
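By way of a non-limiting illustration, the three forms of designation might be represented and resolved as in the following Python sketch. The class name `Designation`, the function `resolve`, the candidate lighting values, and the simplified size rule (which merely returns a reference part height supplied in a context dictionary) are hypothetical and are not taken from the embodiment.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class Designation:
    """State control designation for one item (hypothetical structure)."""
    form: str                                      # "predetermined state", "random", or "predetermined rule"
    value: Any = None                              # used by "predetermined state"
    choices: Sequence[Any] = ()                    # candidate states for "random" (assumed)
    rule: Optional[Callable[[dict], Any]] = None   # used by "predetermined rule"
    has_label: bool = False                        # whether a training label is designated


def resolve(designation: Designation, context: dict, rng: random.Random) -> Any:
    """Return a state for the item according to the form of its designation."""
    if designation.form == "predetermined state":
        return designation.value
    if designation.form == "random":
        return rng.choice(list(designation.choices))
    if designation.form == "predetermined rule":
        return designation.rule(context)
    raise ValueError(f"unknown designation form: {designation.form}")


# Designations loosely modeled on FIG. 3; the concrete values are assumptions.
posture = Designation("predetermined state", value="walking", has_label=True)
lighting = Designation("random", choices=["front", "side", "back"])
size = Designation("predetermined rule",
                   rule=lambda ctx: ctx["reference_part_height"])
```

In this sketch, an item with a designated training label uses the "predetermined state" form, so its resolved state can be used directly as label text.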

The person state control designation storage device 24 also stores, for each item, the presence or absence of a designated training label.

The operator may select one or more items for which a training label is designated not only from the items with a defined multi-person state control designation but also from the items with a defined individual person state control designation. In this case as well, the operator defines an individual person state control designation for each item regardless of whether a training label is designated for the item. In the example shown in FIG. 3, individual person state control designations are defined for the items "shooting angle of the person" and "lighting of the person", for which no training label is designated. For an item with a designated training label, the operator uses "predetermined state" as the form of the individual person state control designation. In the example shown in FIG. 3, the item "posture of the person", for which a training label is designated, is specifically designated as the walking state. The person state control designation storage device 24 stores the individual person state control designation and the presence or absence of a designated training label, as defined by the operator for each item.

The operator need not designate a training label for any of the items with a defined individual person state control designation. As described above, however, the operator designates a training label for one or more of the items with a defined multi-person state control designation.

The present exemplary embodiment is described on the assumption that the person state control designation storage device 24 stores the individual person state control designation and the presence or absence of a designated training label, as defined by the operator, for at least the items "shooting angle of the person", "lighting of the person", "posture of the person", "clothes of the person", "body shape of the person", "hairstyle of the person", and "person size when synthesized with the crowd patch".

The content of the multi-person state control designation defined for an item with a designated training label serves as a training label for the crowd patch generated according to the information stored in the crowd state control designation storage device 23. Similarly, the content of the individual person state control designation defined for an item with a designated training label serves as a training label for the crowd patch generated according to the information stored in the person state control designation storage device 24. The training label based on the multi-person state control designation is the main training label, and the training label based on the individual person state control designation is a supplementary training label that supplements the main training label.

Specifically, the data processing device 1 (see FIG. 1) determines person states and generates a crowd patch in which people are synthesized, according to the multi-person state control designation of each item stored in the crowd state control designation storage device 23 and the individual person state control designation of each item stored in the person state control designation storage device 24. The data processing device 1 uses the contents of the multi-person state control designation and the individual person state control designation defined for the items with a designated training label as the training label of the crowd patch. For example, assume that the data processing device 1 generates a crowd patch according to the multi-person state control designation and the individual person state control designation shown in FIG. 2 and FIG. 3. In this case, three walking people appear in the crowd patch, and the data processing device 1 defines "three people, walking" as the training label for the crowd patch.
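As a non-limiting sketch of how such a training label might be assembled, the following Python fragment concatenates the contents of the designations whose items have a designated training label, with the main label (multi-person) first and the supplementary label (individual person) second. The dictionary layout and the function name `build_training_label` are hypothetical, and the states shown for "arrangement of people" and "direction of people" are placeholders rather than the contents of FIG. 2.

```python
def build_training_label(multi_person: dict, individual: dict) -> str:
    """Join the states of all items for which a training label is designated."""
    main = [d["state"] for d in multi_person.values() if d["has_label"]]
    supplementary = [d["state"] for d in individual.values() if d["has_label"]]
    return ", ".join(main + supplementary)


# Multi-person designations (placeholder states except "number of people").
multi_person = {
    "arrangement of people": {"state": "random", "has_label": False},
    "direction of people": {"state": "random", "has_label": False},
    "number of people": {"state": "three people", "has_label": True},
}

# Individual person designations following the FIG. 3 example in the text.
individual = {
    "shooting angle of the person": {"state": "predetermined rule", "has_label": False},
    "lighting of the person": {"state": "random", "has_label": False},
    "posture of the person": {"state": "walking", "has_label": True},
}

label = build_training_label(multi_person, individual)
```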

The item "person size when synthesized with the crowd patch" is stored in the person state control designation storage device 24. When a person to be recognized as a human is synthesized in the crowd patch, the individual person state control designation for this item may, for example, designate the size of the person's reference part stored in the learning local image information storage device 22, or may designate "random". If, as a result of the "random" designation, the person state is temporarily determined with a reference part size that differs greatly from the size stored in the learning local image information storage device 22, the temporary determination of the person state may be made again. When a person who is to serve as background is synthesized in the crowd patch, the individual person state control designation for this item may, for example, designate a size that differs greatly from the size of the reference part stored in the learning local image information storage device 22, or may designate "random". If, as a result of the "random" designation, a person state that does not correspond to the background is temporarily determined, the temporary determination of the person state may be made again.

As described below, according to the present exemplary embodiment, the data processing device 1 determines the state of a person to be recognized as a human (hereinafter sometimes denoted as a foreground person) and the state of a background person. The multi-person state control designation and individual person state control designation for determining the foreground person state and those for determining the background person state may be defined separately by the operator. In this case, the crowd state control designation storage device 23 stores the multi-person state control designation for determining the foreground person state and the multi-person state control designation for determining the background person state. The person state control designation storage device 24 stores the individual person state control designation for determining the foreground person state and the individual person state control designation for determining the background person state. Alternatively, the multi-person state control designation and the individual person state control designation need not be defined separately for the foreground person state and the background person state.

The person image storage device 25 stores a plurality of person images (a group of person images), each annotated with information on the person state, such as the direction of the person, the shooting angle of the person, the lighting of the person, the posture of the person, and the clothes, body shape, and hairstyle of the person. That is, the data processing device 1 can read from the person image storage device 25 a person image that matches a determined state.

The person region image storage device 26 stores a group of person region images corresponding to the group of person images stored in the person image storage device 25. A person region image is an image indicating the region of the person in a person image stored in the person image storage device 25. FIG. 4 is a diagram illustrating, by way of example, person images stored in the person image storage device 25 and the person region images corresponding to them. FIG. 4 shows four pairs of a person image and a person region image. A person region image may be an image in which the region of the person captured in the person image is expressed in a single color (white in the example shown in FIG. 4) and the region other than the person is expressed in another single color (black in the example shown in FIG. 4). The person region image is not limited to this example; it may be any image capable of indicating the region of the person in the person image.

The person region image is used to crop only the person (that is, only the person region) from the corresponding person image.

Instead of preparing a group of various person images in advance and storing them in the person image storage device 25, the data processing device 1 may be configured to include a person image generating device (not shown) that generates, by CG or the like, a person image matching a determined person state.

The data processing device 1 includes a background extraction device 11, a person state determination device 15, a crowd state image synthesis device 14, and a control device 16.

The background extraction device 11 selects a background image from the group of background images stored in the background image storage device 21. The background extraction device 11 calculates the aspect ratio of the crowd patch size stored in the learning local image information storage device 22. The background extraction device 11 then temporarily extracts, from the selected background image, a background region at a suitable position and of a suitable size that satisfies the aspect ratio. In addition, the background extraction device 11 enlarges or reduces the temporarily extracted background so that it matches the crowd patch size stored in the learning local image information storage device 22. Enlarging or reducing a region extracted from an image so that it matches the crowd patch size in this way may be referred to as normalization.

When temporarily extracting a background at a suitable position and of a suitable size, the background extraction device 11 may extract a region at a random position and of a random size that satisfies the aspect ratio. Alternatively, assuming that the size of a person's reference part at each position in the image is known, the background extraction device 11 may determine, for each position, the crowd patch size enlarged or reduced by the same rate at which the reference part size stored in the learning local image information storage device 22 must be enlarged or reduced to match the reference part size known at that position. The background extraction device 11 may then extract a region of the size determined for that position in the image. The background extraction device 11 may also use other methods to temporarily extract a region from the selected background image.
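The two extraction approaches and the normalization factor may be sketched, purely by way of illustration, as follows. The function names and the lower bound on the crop height are hypothetical, and the sketch assumes the background image is large enough to contain at least one region of the required aspect ratio.

```python
import random


def random_crop_rect(img_w, img_h, patch_w, patch_h, rng):
    """Pick a random region (x, y, w, h) whose aspect ratio matches the crowd patch."""
    # Largest crop height that still fits inside the image at the patch aspect ratio.
    max_h = min(img_h, int(img_w * patch_h / patch_w))
    min_h = max(1, max_h // 4)  # hypothetical lower bound, not from the source
    h = rng.randint(min_h, max_h)
    w = min(img_w, max(1, round(h * patch_w / patch_h)))
    x = rng.randint(0, img_w - w)
    y = rng.randint(0, img_h - h)
    return x, y, w, h


def normalization_scale(crop_h, patch_h):
    """Factor by which the extracted region is enlarged or reduced (normalization)."""
    return patch_h / crop_h


def crop_size_at_position(ref_size_at_position, stored_ref_size, patch_w, patch_h):
    """Crop size when the reference part size at an image position is known.

    The crowd patch size is scaled by the same rate that maps the stored
    reference part size onto the size known at that position.
    """
    rate = ref_size_at_position / stored_ref_size
    return patch_w * rate, patch_h * rate
```

For instance, if the stored reference part height is 20 pixels and a head at some position is known to be 40 pixels high, a 100 x 100 crowd patch corresponds to a 200 x 200 region at that position, which is then reduced by a factor of 0.5 during normalization.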

The person state determination device 15 temporarily determines a person state based on the multi-person state control designation stored in the crowd state control designation storage device 23 and the individual person state control designation stored in the person state control designation storage device 24, and determines the final person state based on conditions on the size of the person's reference part relative to the crowd patch size and on how the reference part is expressed.

Because "random" may be designated in the designations, a person state determined so as to satisfy the multi-person state control designation and the individual person state control designation is not necessarily appropriate. In that case, a person state satisfying the multi-person state control designation and the individual person state control designation is determined again. When an appropriate person state is obtained, it is finally determined as the person state. Because the person state may be determined again in this way, the expression "temporary determination" is used.
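The repetition of the temporary determination can be viewed as a simple retry loop, sketched below without limitation. The function name `determine_state`, its callable arguments, and the retry limit `max_tries` are hypothetical; the embodiment does not specify an upper bound on the number of retries.

```python
def determine_state(sample_state, is_appropriate, max_tries=1000):
    """Repeat the temporary determination until an appropriate state is obtained.

    sample_state: callable returning a temporarily determined person state.
    is_appropriate: callable checking the foreground or background condition.
    max_tries: assumed safety limit, not part of the described embodiment.
    """
    for _ in range(max_tries):
        candidate = sample_state()      # temporary determination
        if is_appropriate(candidate):   # condition satisfied
            return candidate            # final determination
    raise RuntimeError("no appropriate person state found")


# Example: reference part heights are drawn one by one until one falls in a range.
heights = iter([5, 50, 20])
final = determine_state(lambda: next(heights), lambda size: 18 <= size <= 22)
```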

According to the present exemplary embodiment, the person state determination device 15 determines the foreground person state and the background person state. When determining whether a temporarily determined foreground person state is appropriate, the person state determination device 15 makes the determination based on whether a reference part size comparable to the size of the person's reference part relative to the crowd patch size is obtained, or on how the reference part is expressed. When determining whether a temporarily determined background person state is appropriate, the person state determination device 15 makes the determination based on whether a reference part size that differs greatly from the size of the person's reference part relative to the crowd patch size is obtained, or on how the reference part is expressed.

The person state determination device 15 is described in more detail below. The person state determination device 15 includes a background person state determination device 12 and a foreground person state determination device 13.

The background person state determination device 12 defines the arrangement of people, the direction of people, the number of people, the shooting angle of the person, the lighting of the person, the posture of the person, the clothes of the person, the body shape of the person, the hairstyle of the person, the person size when synthesized with the crowd patch, and the like, and temporarily determines the state of a person corresponding to the background according to the multi-person state control designation stored in the crowd state control designation storage device 23 and the individual person state control designation stored in the person state control designation storage device 24. The background person state determination device 12 determines whether the temporarily determined person state satisfies the condition for the background person state and, if the condition is not satisfied, makes the temporary determination of the person state again. If the temporarily determined person state satisfies the condition, the background person state determination device 12 finally determines it as the state of the person corresponding to the background.

The condition for the background person state is, for example, that the person is arranged such that the person's reference part is not within the crowd patch, or that the size of the person's reference part when synthesized is much larger or much smaller than the size of the reference part stored in the learning local image information storage device 22. Under such a condition, the state of the person corresponding to the background is finally determined based on the size of the person's reference part relative to the crowd patch size or on how the reference part is expressed. The conditions listed here are examples, and other conditions may be used as the condition for the background person state.

The statement that the reference part of a person is within the crowd patch denotes a state in which more than a predetermined ratio of the region expressing the person's reference part appears in the crowd patch. Conversely, the statement that the reference part of a person is not within the crowd patch denotes a state in which less than the predetermined ratio of the region expressing the person's reference part appears in the crowd patch. For example, assume that the predetermined ratio is defined in advance as 80%. In this case, if 85% of the region expressing the reference part appears in the crowd patch, the person's reference part can be said to be within the crowd patch. If only 20% of the region expressing the reference part appears in the crowd patch, the person's reference part can be said not to be within the crowd patch. The value of 80% is an example, and a value other than 80% may be defined as the predetermined ratio.
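A minimal sketch of this test, under the assumption that the reference part is approximated by an axis-aligned rectangle (x, y, w, h) in crowd patch coordinates, might read as follows. The function names are hypothetical, and treating a fraction exactly equal to the predetermined ratio as "not within" is an assumption, since the text does not address that case.

```python
def visible_fraction(part, patch_w, patch_h):
    """Fraction of the reference part rectangle (x, y, w, h) lying inside the patch."""
    x, y, w, h = part
    overlap_w = max(0, min(x + w, patch_w) - max(x, 0))
    overlap_h = max(0, min(y + h, patch_h) - max(y, 0))
    return (overlap_w * overlap_h) / (w * h)


def reference_part_in_patch(part, patch_w, patch_h, ratio=0.8):
    """True when more than the predetermined ratio of the part appears in the patch."""
    return visible_fraction(part, patch_w, patch_h) > ratio


# A 20 x 20 head shifted 3 pixels past the left edge: 85% visible, so "within".
inside = reference_part_in_patch((-3, 0, 20, 20), 100, 100)
# The same head shifted 16 pixels past the left edge: 20% visible, so "not within".
outside = reference_part_in_patch((-16, 0, 20, 20), 100, 100)
```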

According to the present exemplary embodiment, a first threshold indicating a size larger than the size of the reference part stored in the learning local image information storage device 22 and a second threshold indicating a size smaller than that size are defined in advance. The statement that the size of a person's reference part when synthesized is as large as the size of the reference part stored in the learning local image information storage device 22 means that the size when synthesized is equal to or larger than the second threshold and equal to or smaller than the first threshold. The statement that the size of a person's reference part when synthesized is much larger than the stored size means that the size when synthesized is larger than the first threshold. The statement that the size of a person's reference part when synthesized is much smaller than the stored size means that the size when synthesized is smaller than the second threshold.
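The three size categories follow directly from the two thresholds, as in the non-limiting sketch below. The function name is hypothetical, and the example threshold values (1.5 times and 0.5 times the stored reference part size) are assumptions chosen only for illustration.

```python
def classify_reference_size(size, first_threshold, second_threshold):
    """Classify a synthesized reference part size against the two thresholds."""
    if size > first_threshold:
        return "much larger"
    if size < second_threshold:
        return "much smaller"
    return "as large as"  # second_threshold <= size <= first_threshold


# Stored reference part height h / alpha for a patch height of h pixels.
h, alpha = 120, 6
stored = h / alpha              # 20.0 pixels
first_threshold = 1.5 * stored  # assumed value
second_threshold = 0.5 * stored # assumed value
```

Under the exemplary conditions described in this embodiment, a foreground person requires the category "as large as", whereas either of the other two categories satisfies one of the exemplary background conditions.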

FIGS. 5(a) to 5(d) are schematic diagrams illustrating examples in which the condition for the background person state is satisfied. In this example, it is assumed that the learning local image information storage device 22 stores the height of the person's reference part (the head in this example) as 1/α of the crowd patch height of h pixels (that is, h/α pixels). In the person states shown in FIGS. 5(a) and 5(b), the person is arranged such that the person's reference part does not appear within the crowd patch, and thus the condition for the background person state is satisfied. In the person state shown in FIG. 5(c), the size of the reference part is much smaller than the defined size of the reference part, and thus the condition for the background person state is satisfied. In the person state shown in FIG. 5(d), the size of the reference part is much larger than the defined size of the reference part, and thus the condition for the background person state is satisfied.

The foreground person state determination device 13 defines the arrangement of people, the direction of people, the number of people, the shooting angle of the person, the lighting of the person, the posture of the person, the clothes of the person, the body shape of the person, the hairstyle of the person, the person size when synthesized with the crowd patch, and the like, and temporarily determines the state of a person corresponding to the foreground according to the multi-person state control designation stored in the crowd state control designation storage device 23 and the individual person state control designation stored in the person state control designation storage device 24. The foreground person state determination device 13 then determines whether the temporarily determined person state satisfies the condition for the foreground person state and, if the condition is not satisfied, makes the temporary determination of the person state again. If the temporarily determined person state satisfies the condition, the foreground person state determination device 13 finally determines it as the state of the person corresponding to the foreground.

The condition for the foreground person state is, for example, that the person's reference part is arranged within the crowd patch and that the size of the person's reference part when synthesized is as large as the size of the reference part stored in the learning local image information storage device 22. Under such a condition, the state of the person corresponding to the foreground is finally determined based on the size of the person's reference part relative to the crowd patch size or on how the reference part is expressed. The conditions listed here are examples, and other conditions may be used as the condition for the foreground person state.

FIGS. 6(a) to 6(d) are schematic diagrams illustrating examples in which the condition for the foreground person state is satisfied. As described with reference to FIGS. 5(a) to 5(d), it is assumed that the learning local image information storage device 22 stores the height of the person's reference part (the head in this example) as 1/α of the crowd patch height of h pixels (that is, h/α pixels). In each of the person states shown in FIGS. 6(a) to 6(d), the person's reference part is within the crowd patch and the size of the reference part is as large as the size of the reference part stored in the learning local image information storage device 22. Therefore, each of the person states shown in FIGS. 6(a) to 6(d) satisfies the condition for the foreground person state.

As described above, the multi-person state control designation and individual person state control designation for determining the foreground person state and those for determining the background person state may be defined separately by the operator. In this case, the background person state determination device 12 may temporarily determine a person state according to the multi-person state control designation and individual person state control designation for determining the background person state. Likewise, the foreground person state determination device 13 may temporarily determine a person state according to the multi-person state control designation and individual person state control designation for determining the foreground person state. When the designations are defined separately for the foreground person state and the background person state in this way, the number of foreground people and the number of background people can be changed.

The crowd state image synthesis device 14 reads, from the person image storage device 25, a person image that satisfies the person state finally determined by the background person state determination device 12 (such as the direction of the person, the number of people, the shooting angle of the person, the lighting of the person, the posture of the person, the clothes of the person, the body shape of the person, and the hairstyle of the person), and further reads, from the person region image storage device 26, the person region image corresponding to that person image. The crowd state image synthesis device 14 then crops an image of only the person from the person image by using the person region image (that is, crops only the person region). Similarly, the crowd state image synthesis device 14 reads, from the person image storage device 25, a person image that satisfies the person state finally determined by the foreground person state determination device 13, and further reads, from the person region image storage device 26, the person region image corresponding to that person image. The crowd state image synthesis device 14 then crops an image of only the person from the person image by using the person region image.

The crowd state image synthesis device 14 synthesizes the person-only images cropped as described above with the background image. At this time, the crowd state image synthesis device 14 synthesizes the person-only image cropped based on the person state finally determined by the background person state determination device 12 with the background image according to the "arrangement of people" and the "person size when synthesized with the crowd patch" determined by the background person state determination device 12. In addition, the crowd state image synthesis device 14 synthesizes the person-only image cropped based on the person state finally determined by the foreground person state determination device 13 with the background image according to the "arrangement of people" and the "person size when synthesized with the crowd patch" determined by the foreground person state determination device 13. Here, the background image is the image normalized by the background extraction device 11. The result of the synthesis is a crowd patch.

When synthesizing the images of only people with the background image, the crowd state image synthesis means 14 overlays and synthesizes the images sequentially, starting from the image of the person whose arrangement position is farthest from the camera. For example, when the upper part of the image is farther from the camera, the crowd state image synthesis means 14 overlays and synthesizes the images of people sequentially from the upper part of the screen. When camera calibration information is provided, the crowd state image synthesis means 14 takes the 3D positions of the images of people into account and overlays and synthesizes them starting from the image farthest from the camera.
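The far-to-near overlay order can be illustrated with a minimal Python sketch. It is not taken from the specification: the tuple layout `(distance, top, left, pixels, mask)`, the grayscale list-of-rows image representation, and the function name `composite_far_to_near` are all assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image as rows of pixel values (assumed format)
Mask = List[List[bool]]   # True where a pixel belongs to the person region

# (distance_from_camera, top, left, person_pixels, person_mask); hypothetical layout
Cutout = Tuple[float, int, int, Image, Mask]


def composite_far_to_near(background: Image, people: List[Cutout]) -> Image:
    """Paste person cutouts onto a copy of the background, farthest first.

    People nearer to the camera are pasted later, so they occlude
    people who are farther away.
    """
    out = [row[:] for row in background]
    # Sort by decreasing distance: the farthest person is drawn first.
    for _, top, left, pixels, mask in sorted(people, key=lambda p: p[0], reverse=True):
        for dy, mask_row in enumerate(mask):
            for dx, is_person in enumerate(mask_row):
                y, x = top + dy, left + dx
                inside = 0 <= y < len(out) and 0 <= x < len(out[0])
                if is_person and inside:
                    out[y][x] = pixels[dy][dx]
    return out
```

When no calibration is available, the row coordinate of each person could serve as the distance key (for example, a smaller `top` meaning farther away), which corresponds to the "upper part of the screen" case in the text.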

The above example has been described on the assumption that the crowd state image synthesis means 14 uses the person region image to crop an image of only the person from the person image and synthesizes the image of only the person with the background image. Alternatively, the crowd state image synthesis means 14 may divide the person image read from the person image storage means 25 into the person region and the region other than the person based on the person region image corresponding to the person image, may weight the person region and the region other than the person, and may blend and synthesize the person image based on the weights. In this case, the person region is weighted more heavily than the region other than the person. The weights may also be varied within each region.
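The weighted alternative can be read as per-pixel alpha blending. The sketch below assumes that reading; the specification does not give a blending formula, and the default weights `0.9` and `0.1` and the function name `blend_person` are illustrative assumptions only.

```python
from typing import List


def blend_person(background: List[List[float]],
                 person: List[List[float]],
                 mask: List[List[bool]],
                 w_person: float = 0.9,
                 w_other: float = 0.1) -> List[List[float]]:
    """Blend a person image into a background of the same size.

    Pixels inside the person region use the heavier weight w_person;
    pixels outside it use the lighter weight w_other.
    """
    if not w_person > w_other:
        raise ValueError("the person region must be weighted more heavily")
    blended = []
    for bg_row, p_row, m_row in zip(background, person, mask):
        row = []
        for bg, p, is_person in zip(bg_row, p_row, m_row):
            w = w_person if is_person else w_other
            row.append(w * p + (1.0 - w) * bg)  # convex combination per pixel
        blended.append(row)
    return blended
```

Varying the weights within a region, as the text allows, would amount to replacing the two scalar weights with a per-pixel weight map.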

As described above, the data processing device 1 may include person image generation means (not shown) for generating, by CG or the like, a person image that matches a designated person state. In this case, the person image generation means (not shown) generates a person image that matches the person state determined by the background person state determination means 12 or the person state determined by the foreground person state determination means 13, and the crowd state image synthesis means 14 may synthesize that person image to generate a crowd patch.

When generating a crowd patch, the crowd state image synthesis means 14 reads the training label from the crowd state control designation storage means 23 and the person state control designation storage means 24. That is, the crowd state image synthesis means 14 reads, from the crowd state control designation storage means 23, the contents of the multi-person state control designation for each item that has a designated training label, and reads, from the person state control designation storage means 24, the contents of the individual person state control designation for each item that has a designated training label. The crowd state image synthesis means 14 then outputs the pair of the crowd patch and the training label. The crowd patch and the training label are used as training data for machine learning to recognize the crowd state in an image.

The control means 16 causes the background extraction means 11, the person state determination means 15 (specifically, the background person state determination means 12 and the foreground person state determination means 13), and the crowd state image synthesis means 14 to repeatedly perform this series of processes. As a result, the data processing device 1 outputs a large number of pairs of crowd patches and training labels.

When changing the person state designation or the training label, the operator resets the multi-person state control designation, the individual person state control designation, and the presence or absence of a designated training label, so that the data processing device 1 outputs a large number of pairs of crowd patches and training labels according to the new settings. The operator can thus obtain a large amount of the desired training data.

FIG. 7 is a block diagram illustrating an exemplary structure of the crowd state recognition device according to the present invention. The crowd state recognition device 30 according to the present invention recognizes the crowd state in a given image. The crowd state recognition device 30 includes an image acquisition device 3, a data processing device 4 that operates under program control, and a storage device 5 that stores information.

The image acquisition device 3 is a camera that acquires the image in which the crowd state is to be recognized.

The storage device 5 includes search window storage means 51 and crowd state recognition dictionary storage means 52.

The search window storage means 51 stores a group of rectangular regions indicating the portions of the image in which the crowd state is to be recognized. A rectangular region may be referred to as a search window. The group of rectangular regions may be set by determining how the crowd patch size changes with position on the image, based on camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device 3 and on the size of the reference part corresponding to the crowd patch size (the size of the reference part stored in the learning local image information storage means 22). For example, the size of the reference part of a person as captured in the image can be derived from the camera parameters. The crowd patch size is then enlarged or reduced by the same magnification or reduction ratio that scales the size of the reference part stored in the learning local image information storage means 22 to the size of the reference part in the image, and the result is set as the size of the rectangular region. The group of rectangular regions may be set so as to cover the positions on the image. The group of rectangular regions is not limited to this method and may be set freely. The rectangular regions may also be set so as to overlap one another.
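The scaling rule above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the specification: `apparent_ref_size` is a hypothetical callable standing in for the camera-parameter computation (it returns the pixel size of the reference part at a given image row), the scale is evaluated only at the top row of each window, and the `overlap` fraction is an assumed parameter.

```python
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def build_search_windows(image_w: int, image_h: int,
                         patch_w: int, patch_h: int,
                         stored_ref_size: float,
                         apparent_ref_size: Callable[[int], float],
                         overlap: float = 0.5) -> List[Rect]:
    """Tile the image with windows whose size follows the apparent person size."""
    windows: List[Rect] = []
    top = 0
    while top < image_h:
        # Same ratio that maps the stored reference part size to the
        # reference part size seen at this image row.
        scale = apparent_ref_size(top) / stored_ref_size
        w = max(1, round(patch_w * scale))
        h = max(1, round(patch_h * scale))
        if top + h > image_h:
            break  # the window would leave the image
        step_x = max(1, round(w * (1.0 - overlap)))
        left = 0
        while left + w <= image_w:
            windows.append((left, top, w, h))
            left += step_x
        top += max(1, round(h * (1.0 - overlap)))
    return windows
```

Setting `overlap` to `0.0` gives non-overlapping windows; a positive value gives the overlapping arrangement that the text also permits.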

The crowd state recognition dictionary storage means 52 stores a dictionary of a discriminator learned from the training data (a large number of pairs of crowd patches and training labels) generated by the training data generating device 10 shown in FIG. 1. The discriminator is an algorithm for recognizing the crowd state, and the dictionary of the discriminator is used when the crowd state recognition processing is performed according to that algorithm. The dictionary of the discriminator stored in the crowd state recognition dictionary storage means 52 is obtained, for example, by machine learning using the large number of pairs of crowd patches and training labels generated by the training data generating device 10. Any well-known machine learning method may be used.

The data processing device 4 includes crowd state recognition means 41.

The crowd state recognition means 41 extracts, from the image acquired by the image acquisition device 3, the local region images corresponding to the group of rectangular regions stored in the search window storage means 51, and normalizes each extracted local region image to match the crowd patch size. The crowd state recognition means 41 then recognizes (determines) the crowd state in each normalized local region image according to the recognition algorithm (the discriminator), using the dictionary of the discriminator stored in the crowd state recognition dictionary storage means 52.

The training data generating device 10 shown in FIG. 1 can generate a large amount of the training data (pairs of crowd patches and training labels) that the operator desires. The crowd state recognition means 41 recognizes the crowd state in a local region image by using the dictionary of the discriminator obtained as a result of machine learning with this training data. The crowd state recognition device 30 can therefore recognize a variety of crowd states.

FIG. 8 is a schematic diagram illustrating, by way of example, how the degree of congestion (the number of people) is recognized as the crowd state in an image. For example, assume that the operator of the training data generating device 10 mainly controls the "number of people" in a stepwise manner and obtains many crowd patches and training labels (see the upper part of FIG. 8). Assume further that the dictionary of the discriminator obtained by machine learning from this training data is stored in the crowd state recognition dictionary storage means 52. In the image 61 shown in FIG. 8, the rectangular regions from which local region images are extracted are indicated by dashed lines. The recognition result of the crowd state for the local region image extracted from each rectangular region is shown in correspondence with the dashed-line region. The same applies to FIGS. 9 to 11 described below. In practice, the rectangular regions are basically set so as to cover the entire image, but only some rectangular regions are shown here so that the recognition results can be illustrated simply. In this example, the crowd state recognition means 41 can recognize the number of people (the degree of congestion) in various regions of the image 61, as shown in FIG. 8.

FIG. 9 is a schematic diagram illustrating, by way of example, how the direction of a crowd is recognized as the crowd state in an image. For example, assume that the operator of the training data generating device 10 mainly controls the "direction of people" and obtains many crowd patches and training labels (see the upper part of FIG. 9). Assume further that the dictionary of the discriminator obtained by machine learning from this training data is stored in the crowd state recognition dictionary storage means 52. In this example, the crowd state recognition means 41 can recognize the direction of the crowd in various regions of the image 62, as shown in FIG. 9.

FIG. 10 is a schematic diagram illustrating, by way of example, how a non-abnormal crowd (a crowd that is not significantly congested) or an abnormal crowd (a crowd that is significantly congested) is recognized as the crowd state in an image. For example, assume that the operator of the training data generating device 10 mainly controls the "number of people" and obtains many crowd patches and training labels. Here, assume that a large amount of training data is obtained in two classes: one in which the number of people is less than n and one in which the number of people is n or more (see the upper part of FIG. 10). Assume further that the dictionary of the discriminator obtained by machine learning from this training data is stored in the crowd state recognition dictionary storage means 52. In this example, the crowd state recognition means 41 can recognize whether the crowd state in various regions of the image 63 is a non-abnormal crowd or an abnormal crowd, as shown in FIG. 10.

FIG. 11 is a schematic diagram illustrating, by way of example, how a disordered state (the directions of people are not uniform) or an ordered state (the directions of people are uniform) is recognized as the crowd state in an image. For example, assume that the operator of the training data generating device 10 obtains a large amount of training data in two classes: one in which the "direction of people" is uniform and one in which it is not (see the upper part of FIG. 11). Assume further that the dictionary of the discriminator obtained by machine learning from this training data is stored in the crowd state recognition dictionary storage means 52. In this example, the crowd state recognition means 41 can recognize whether the crowd state in various regions of the image 64 is a disordered state or an ordered state, as shown in FIG. 11.
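The two-class labelings used in the examples of FIG. 10 and FIG. 11 can be written down as small labeling functions. The sketch below is only one possible reading: the label strings, the angular tolerance of 30 degrees, and the choice to compare every direction against the first person's direction are assumptions, since the specification does not define how "uniform" is measured.

```python
from typing import Sequence


def congestion_label(num_people: int, n: int) -> str:
    """Two classes from FIG. 10: fewer than n people, or n or more."""
    return "abnormal" if num_people >= n else "non-abnormal"


def order_label(directions_deg: Sequence[float],
                tolerance_deg: float = 30.0) -> str:
    """Two classes from FIG. 11: directions uniform or not uniform.

    Assumption: directions count as uniform when every person faces
    within tolerance_deg of the first person's direction.
    """
    if not directions_deg:
        raise ValueError("at least one direction is required")
    ref = directions_deg[0]
    for d in directions_deg[1:]:
        # Smallest angular difference on a circle, in [0, 180].
        diff = abs((d - ref + 180.0) % 360.0 - 180.0)
        if diff > tolerance_deg:
            return "disordered"
    return "ordered"
```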

Because a large amount of the training data desired by the operator can be generated, the crowd state recognition means 41 can recognize various states in addition to the cases shown in FIGS. 8 to 11, such as a dispersed state in which a crowd scatters and runs, a gathering state in which a crowd gathers at one location, an avoidance state in which a crowd avoids something, a lingering (hanging) state indicating a particular crowd cluster, and a line state.

The processing of the training data generating device 10 according to the present invention will be described below. FIG. 12 is a flowchart illustrating an exemplary processing flow of the training data generating device 10.

The background extraction means 11 selects a background image from the group of background images stored in the background image storage means 21 and extracts an image to be used as the background of a crowd patch (step S1).

FIG. 13 is a flowchart illustrating an exemplary processing flow of step S1. In step S1, the background extraction means 11 first selects one background image from the group of background images stored in the background image storage means 21 (step S101). The selection method is not particularly limited. For example, the background extraction means 11 may select an arbitrary background image from the group.

The background extraction means 11 then calculates the aspect ratio of the crowd patch stored in the learning local image information storage means 22, and tentatively extracts, from the selected background image, a background region at a suitable position and of a suitable size that satisfies the aspect ratio (step S102).

The background extraction means 11 enlarges or reduces (that is, normalizes) the tentatively extracted background image to match the crowd patch size, thereby obtaining the image that serves as the background of the crowd patch (step S103). Step S1 ends here.
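Steps S101 to S103 can be sketched as below. The sketch assumes a uniformly random choice of background, crop scale, and crop position (the text leaves all three open), uses nearest-neighbour resampling as a stand-in for the enlarge/reduce step, and represents images as lists of pixel rows; the function names are hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel values (assumed format)


def resize_nearest(img: Image, out_w: int, out_h: int) -> Image:
    """Nearest-neighbour resize; stands in for the enlarge/reduce step."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def extract_patch_background(backgrounds: List[Image],
                             patch_w: int, patch_h: int,
                             rng: random.Random) -> Image:
    image = rng.choice(backgrounds)                      # S101: pick one background
    in_h, in_w = len(image), len(image[0])
    # S102: choose a crop with (approximately) the patch aspect ratio.
    max_scale = min(in_w / patch_w, in_h / patch_h)      # largest crop that fits
    scale = rng.uniform(min(1.0, max_scale), max_scale)
    crop_w = min(in_w, max(1, int(patch_w * scale)))
    crop_h = min(in_h, max(1, int(patch_h * scale)))
    left = rng.randint(0, in_w - crop_w)
    top = rng.randint(0, in_h - crop_h)
    crop = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return resize_nearest(crop, patch_w, patch_h)        # S103: normalize to patch size
```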

After step S1, the background person state determination means 12 determines the state of the people corresponding to the background (step S2).

FIG. 14 is a flowchart illustrating an exemplary processing flow of step S2. The background person state determination means 12 tentatively determines the state of the people corresponding to the background by specifying the arrangement of people, the direction of people, the number of people, the shooting angle of people, the lighting on people, the posture of people, the clothing of people, the body shape of people, the hairstyle of people, the person size when synthesized with the crowd patch, and so on, in accordance with the multi-person state control designation stored in the crowd state control designation storage means 23 and the individual person state control designation stored in the person state control designation storage means 24 (step S201).

The background person state determination means 12 then determines whether the person state tentatively determined in step S201 satisfies the condition for the background person state (step S202). The condition has been described above, so its description is omitted here.

The multi-person state control designation or the individual person state control designation may include a designation such as "random", so the state tentatively determined in step S201 may not satisfy the condition for the background person state. In that case ("No" in step S202), the background person state determination means 12 repeats the processing from step S201 onward.

When the state tentatively determined in step S201 satisfies the condition for the background person state ("Yes" in step S202), the background person state determination means 12 adopts the most recent person state tentatively determined in step S201 as the state of the people corresponding to the background (step S203). Step S2 ends here.

After step S2, the foreground person state determination means 13 determines the state of the people corresponding to the foreground (step S3).

FIG. 15 is a flowchart illustrating an exemplary processing flow of step S3. The foreground person state determination means 13 tentatively determines the state of the people corresponding to the foreground by specifying the arrangement of people, the direction of people, the number of people, the shooting angle of people, the lighting on people, the posture of people, the clothing of people, the body shape of people, the hairstyle of people, the person size when synthesized with the crowd patch, and so on, in accordance with the multi-person state control designation stored in the crowd state control designation storage means 23 and the individual person state control designation stored in the person state control designation storage means 24 (step S301).

The foreground person state determination means 13 then determines whether the person state tentatively determined in step S301 satisfies the condition for the foreground person state (step S302). The condition has been described above, so its description is omitted here.

The multi-person state control designation or the individual person state control designation may include a designation such as "random", so the state tentatively determined in step S301 may not satisfy the condition for the foreground person state. In that case ("No" in step S302), the foreground person state determination means 13 repeats the processing from step S301 onward.

When the state tentatively determined in step S301 satisfies the condition for the foreground person state ("Yes" in step S302), the foreground person state determination means 13 adopts the most recent person state tentatively determined in step S301 as the state of the people corresponding to the foreground (step S303). Step S3 ends here.
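Steps S201 to S203 and steps S301 to S303 share the same propose-and-check loop, which can be sketched as rejection sampling. In the sketch below, `propose` and `satisfies` are hypothetical callables standing in for the tentative determination and the background or foreground condition. The `max_tries` bound is an added safeguard that is not in the specification, where the loop simply repeats until the condition holds.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]  # e.g. {"num_people": 3, "direction": "random"} (assumed format)


def determine_person_state(propose: Callable[[], State],
                           satisfies: Callable[[State], bool],
                           max_tries: int = 10000) -> State:
    """Repeat the tentative determination until the condition is met."""
    for _ in range(max_tries):
        state = propose()       # S201 / S301: tentatively determine a state
        if satisfies(state):    # S202 / S302: check the condition
            return state        # S203 / S303: adopt the latest tentative state
    raise RuntimeError("no tentative state satisfied the condition")
```

The same function can serve both determination means by passing the background condition in one call and the foreground condition in the other.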

After step S3, the crowd state image synthesis means 14 generates a crowd patch based on the person states determined in steps S2 and S3, reads the training label corresponding to the crowd patch, and outputs the pair of the crowd patch and the training label (step S4).

FIG. 16 is a flowchart illustrating an exemplary processing flow of step S4. The crowd state image synthesis means 14 selects and reads, from the group of person images in the person image storage means 25, person images that satisfy the person states determined in steps S2 and S3 (such as the direction of people, the number of people, the shooting angle of people, the lighting on people, the posture of people, the clothing of people, the body shape of people, and the hairstyle of people) (step S401).

The crowd state image synthesis means 14 then reads, from the person region image storage means 26, the person region image corresponding to each person image selected in step S401. For each person image, the crowd state image synthesis means 14 crops an image of only the person by using the person region image corresponding to that person image (step S402).

The crowd state image synthesis means 14 determines the arrangement state of each image of only a person generated in step S402, in accordance with the "arrangement of people" and the "person size when synthesized with the crowd patch" determined in steps S2 and S3 (step S403). The crowd state image synthesis means 14 then synthesizes each image of only a person with the background image obtained in step S1 according to the arrangement state, thereby generating a crowd patch (step S404).

The crowd state image synthesis means 14 then obtains the training label corresponding to the crowd patch (step S405). That is, the crowd state image synthesis means 14 reads, from the crowd state control designation storage means 23, the contents of the multi-person state control designation for each item that has a designated training label, and reads, from the person state control designation storage means 24, the contents of the individual person state control designation for each item that has a designated training label. The contents read in this way constitute the training label.

The crowd state image synthesis means 14 outputs the pair of the crowd patch generated in step S404 and the training label obtained in step S405 (step S406). Step S4 ends here.
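The label lookup in step S405 can be sketched as a filter over the two designations. The dictionary layout below, with a `content` field and an `is_training_label` flag per item, is a hypothetical encoding chosen for illustration; the specification only states that each item carries a designated training label or not.

```python
from typing import Any, Dict

# item name -> {"content": ..., "is_training_label": bool}  (assumed layout)
Designation = Dict[str, Dict[str, Any]]


def collect_training_label(multi_person: Designation,
                           individual: Designation) -> Dict[str, Any]:
    """Gather the contents of every item flagged as a training label."""
    label: Dict[str, Any] = {}
    for designation in (multi_person, individual):
        for item, entry in designation.items():
            if entry["is_training_label"]:
                label[item] = entry["content"]
    return label
```

For example, flagging only the "number of people" item would yield a label such as `{"number of people": 3}`, which is then paired with the crowd patch in step S406.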

After step S4, the control means 16 determines whether the number of repetitions of steps S1 to S4 has reached a predetermined number (step S5). When the number of repetitions of steps S1 to S4 has not reached the predetermined number ("No" in step S5), the control means 16 causes the background extraction means 11, the person state determination means 15 (specifically, the background person state determination means 12 and the foreground person state determination means 13), and the crowd state image synthesis means 14 to perform steps S1 to S4 again.

When the number of repetitions of steps S1 to S4 reaches the predetermined number ("Yes" in step S5), the processing ends.

Performing steps S1 to S4 once yields one pair of a crowd patch and a training label. The data processing device 1 therefore repeats steps S1 to S4 the predetermined number of times to obtain a large amount of training data. For example, when the predetermined number is set to 100,000, 100,000 pairs of crowd patches and training labels that match the multi-person state control designation and the individual person state control designation are obtained.
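The outer loop of FIG. 12 (steps S1 to S5) can be summarized in a few lines. The four callables in this sketch are hypothetical placeholders for the means described in the text, and collecting the pairs in a list is an assumption made for illustration; the specification only says that the pairs are output.

```python
from typing import Any, Callable, List, Tuple

Pair = Tuple[Any, Any]  # (crowd patch, training label)


def generate_training_data(extract_background: Callable[[], Any],
                           determine_background_state: Callable[[], Any],
                           determine_foreground_state: Callable[[], Any],
                           synthesize: Callable[[Any, Any, Any], Pair],
                           repetitions: int) -> List[Pair]:
    """Run steps S1 to S4 a fixed number of times and collect the pairs."""
    pairs: List[Pair] = []
    for _ in range(repetitions):                      # S5: stop after the set count
        background = extract_background()             # S1
        bg_state = determine_background_state()       # S2
        fg_state = determine_foreground_state()       # S3
        pairs.append(synthesize(background, bg_state, fg_state))  # S4
    return pairs
```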

The order of steps S1, S2, and S3 in the flowchart shown in FIG. 12 may be interchanged.

The processing of the crowd state recognition device 30 according to the present invention will be described below. FIG. 17 is a flowchart illustrating an exemplary processing flow of the crowd state recognition device 30.

The image acquisition device 3 acquires the image in which the crowd state is to be recognized and inputs the image to the crowd state recognition means 41 (step S21).

The crowd state recognition means 41 then determines whether all of the rectangular regions in the group stored in the search window storage means 51 have been selected (step S22).

When the group of rectangular regions stored in the search window storage means 51 contains a rectangular region that has not yet been selected ("No" in step S22), the crowd state recognition means 41 selects one unselected rectangular region from the group (step S23).

The crowd state recognition means 41 then extracts the local region image corresponding to the selected rectangular region from the image input in step S21 (step S24). The crowd state recognition means 41 then normalizes the local region image to match the crowd patch size (step S25).

The crowd state recognition means 41 then recognizes the crowd state in the normalized local region image by using the dictionary of the discriminator stored in the crowd state recognition dictionary storage means 52 (step S26).

After step S26, the crowd state recognition means 41 repeats the processing from step S22 onward. When it determines that all of the rectangular regions in the group have been selected ("Yes" in step S22), the crowd state recognition means 41 ends the processing.
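The recognition flow of FIG. 17 (steps S21 to S26) amounts to a loop over the stored search windows. In this sketch, `classify` is a hypothetical callable standing in for the discriminator and its dictionary, images are lists of pixel rows, and nearest-neighbour resampling is assumed for the normalization step; none of these choices are prescribed by the specification.

```python
from typing import Any, Callable, List, Tuple

Image = List[List[int]]
Rect = Tuple[int, int, int, int]  # (left, top, width, height)


def resize_nearest(img: Image, out_w: int, out_h: int) -> Image:
    """Nearest-neighbour resize used to normalize to the crowd patch size."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def recognize_crowd_states(image: Image, windows: List[Rect],
                           patch_w: int, patch_h: int,
                           classify: Callable[[Image], Any]) -> List[Tuple[Rect, Any]]:
    """Return one recognition result per search window."""
    results: List[Tuple[Rect, Any]] = []
    for left, top, w, h in windows:                                   # S22 / S23
        local = [row[left:left + w] for row in image[top:top + h]]    # S24: extract
        normalized = resize_nearest(local, patch_w, patch_h)          # S25: normalize
        results.append(((left, top, w, h), classify(normalized)))     # S26: recognize
    return results
```

Because each window is processed independently on a single image, the loop needs no information from other frames.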

With the training data generating device according to the present invention, the person state determination means 15 determines the states of the people constituting a crowd in accordance with the multi-person state control designation defined by the operator (a state designation for multiple people, such as the "arrangement of people", the "direction of people", and the "number of people") and the individual person state control designation (a state designation for an individual person, such as the "shooting angle of people", the "lighting on people", the "posture of people", the "clothing of people", the "body shape of people", the "hairstyle of people", and the "person size when synthesized with the crowd patch"). The crowd state image synthesis means 14 then synthesizes person images in the determined states to generate a crowd patch, and reads the training label corresponding to that crowd patch. The processing of determining the person state, generating the crowd patch, and specifying the training label is repeated a predetermined number of times, so that a large amount of varied training data (many pairs of crowd patches and training labels) for the crowd states desired by the operator can be generated automatically.

Furthermore, once a large amount of such training data has been obtained, the dictionary of the discriminator can be machine-learned from the training data. The crowd state recognition device 30 can then easily recognize complex crowd states in a still image by using that dictionary.

The crowd state recognition means 41 in the crowd state recognition device 30 recognizes the crowd state in a given image by using a dictionary learned from crowd patches that express crowds and the training labels corresponding to those crowd patches. The crowd state recognition means 41 therefore recognizes the crowd state not in units of a single object such as a person's head, but in the larger unit of a crowd, that is, a collection of people whose reference parts are captured. As a result, the crowd state can be recognized even in a small region in which heads or individual people cannot be recognized.

In the crowd state recognition device 30 according to the present invention, the crowd state recognition means 41 recognizes the crowd state by using a dictionary (the dictionary of the discriminator). The accuracy of crowd state recognition therefore does not depend on the frame rate. The crowd state recognition device according to the present invention can thus suitably recognize the crowd state in an image regardless of the frame rate. For example, the crowd state recognition device 30 according to the present invention can suitably recognize the crowd state even in a still image.

根据以上示例性实施例的训练数据生成设备10根据多人状态控制指明来确定诸如人之间的重叠之类的“人的布置”的人状态，并且生成指示人状态的人群补丁。当通过使用这种人群补丁来执行机器学习时，也可以学习包括人之间的遮挡(occlusion)在内的状态。因此，即使当发生难以通过头部识别或者人识别来识别的人之间的重叠(遮挡)时，人群状态识别设备30也可以通过使用作为学习的结果而获取的字典来优选地识别人群状态。The training data generation device 10 according to the above exemplary embodiment determines the human state of "human arrangement" such as overlap between people based on the multi-person state control specification, and generates a crowd patch indicating the human state. When machine learning is performed using such a crowd patch, states including occlusion between people can also be learned. Therefore, even when overlap (occlusion) between people occurs, which is difficult to recognize by head recognition or person recognition, the crowd state recognition device 30 can preferably recognize the crowd state by using the dictionary acquired as a result of learning.

根据以上示例性实施例的训练数据生成设备10确定人状态、生成拍摄该状态下的人的人群补丁并且根据指明多人的人状态的信息(多人状态控制指明)和指明每个人的人状态的信息(个别人状态控制指明)来指定对应于该人群补丁的训练标签。因此，操作者定义多人状态控制指明或者个别人状态控制指明以由此容易地获取用于识别不同性质人群状态的训练数据。然后，训练数据被机器学习，由此容易地制成用于识别不同性质人群状态的人群状态识别设备30。The training data generation device 10 according to the above exemplary embodiment determines a person state, generates a crowd patch that captures the person in that state, and assigns a training label corresponding to the crowd patch based on information indicating the person state of multiple people (multi-person state control designation) and information indicating the person state of each person (individual person state control designation). Thus, the operator defines the multi-person state control designation or the individual person state control designation to easily obtain training data for identifying different types of crowd states. The training data is then machine-learned, thereby easily creating a crowd state recognition device 30 for identifying different types of crowd states.

根据以上示例性实施例，如果指示图像获取设备(相机)3在人群拍摄环境中的位置、姿势、焦距和透镜畸变的相机参数可以被得到，则限于该环境的多人状态控制指明或个别人状态控制指明可以通过使用相机参数而被定义。训练数据生成设备10根据多人状态控制指明或个别人状态控制指明来确定人状态并且生成训练数据，由此学习适合于人群拍摄环境的鉴别器的字典。结果，人群状态识别设备30可以按照高精度在静止图像等中识别复杂的人群状态。According to the above exemplary embodiment, if the camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device (camera) 3 in a crowd shooting environment can be obtained, a multi-person state control designation or an individual person state control designation limited to that environment can be defined by using the camera parameters. The training data generation device 10 determines the state of a person based on the multi-person state control designation or the individual person state control designation and generates training data, thereby learning a dictionary of discriminators suitable for the crowd shooting environment. As a result, the crowd state recognition device 30 can recognize complex crowd states in still images, etc. with high accuracy.

根据上面的示例性实施例，如果指示图像获取设备3在识别环境中的位置、姿势、焦距和透镜畸变的相机参数可以被获取，则人的人状态和每个人的人状态可以按照图像上的局部区域而被控制。然后，可以通过基于受控的人状态合成人图像来自动地生成大量操作者期望的人群补丁和对应于人群补丁的训练标签。然后，可以基于人群补丁和训练标签按照图像上的局部区域来学习鉴别器的字典，并且可以通过按照图像上的区域使用鉴别器的字典来增加识别复杂的人群状态的精度。According to the exemplary embodiment described above, if camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device 3 in the recognition environment can be acquired, the human state of multiple people and the human state of each person can be controlled for each local area on the image. Then, by synthesizing human images based on the controlled human states, a large number of crowd patches desired by the operator and training labels corresponding to the crowd patches can be automatically generated. A discriminator dictionary can then be learned for each local area on the image from the crowd patches and training labels, and using the discriminator dictionary that corresponds to each area on the image can increase the accuracy of recognizing complex crowd states.

根据本发明的训练数据生成设备和人群状态识别设备的具体结构在下面将通过示例方式来描述。图18是通过示例方式图示了根据本发明的训练数据生成设备的具体结构的框图。与在图1中示出的组件相同的组件用与图1中相同的标号来表示，并且其详细描述将被省略。在图18中示出的示例性结构中，包括背景图像存储装置21、学习局部图像信息存储装置22、人群状态控制指明存储装置23、人状态控制指明存储装置24、人图像存储装置25和人区域图像存储装置26的存储设备2被连接到计算机100。用于将训练数据生成程序101存储在其中的计算机可读存储介质102也被连接到计算机100。The specific structure of the training data generating device and the crowd state recognition device according to the present invention will be described below by way of example. Figure 18 is a block diagram illustrating the specific structure of the training data generating device according to the present invention by way of example. Components identical to those shown in Figure 1 are denoted by the same reference numerals as in Figure 1, and their detailed descriptions will be omitted. In the exemplary structure shown in Figure 18, a storage device 2 including a background image storage device 21, a learning local image information storage device 22, a crowd state control indication storage device 23, a human state control indication storage device 24, a human image storage device 25, and a human area image storage device 26 is connected to the computer 100. A computer-readable storage medium 102 for storing the training data generating program 101 therein is also connected to the computer 100.

计算机可读存储介质102例如由磁盘、半导体存储器等实现。例如，当被激活时，计算机100从计算机可读存储介质102读取训练数据生成程序101。计算机100然后根据训练数据生成程序101作为在图1中示出的数据处理设备1中的背景提取装置11、人状态确定装置15(更具体地说，背景人状态确定装置12和前景人状态确定装置13)、人群状态图像合成装置14和控制装置16操作。The computer-readable storage medium 102 is implemented by, for example, a magnetic disk, a semiconductor memory, or the like. For example, when activated, the computer 100 reads the training data generation program 101 from the computer-readable storage medium 102. The computer 100 then operates according to the training data generation program 101 as the background extraction device 11, the human state determination device 15 (more specifically, the background human state determination device 12 and the foreground human state determination device 13), the crowd state image synthesis device 14, and the control device 16 in the data processing apparatus 1 shown in FIG. 1.

图19是通过示例方式图示了根据本发明的人群状态识别设备的具体结构的框图。与在图7中示出的组件相同的组件用与图7中相同的标号来表示，并且其详细描述将被省略。在图19中示出的示例性结构中，包括搜索窗口存储装置51和人群状态识别字典存储装置52的存储设备5被连接到计算机150。用于将人群状态识别程序103存储在其中的计算机可读存储介质104也被连接到计算机150。FIG. 19 is a block diagram illustrating, by way of example, the specific structure of a crowd state recognition device according to the present invention. Components identical to those shown in FIG. 7 are denoted by the same reference numerals as in FIG. 7, and their detailed descriptions will be omitted. In the exemplary structure shown in FIG. 19, a storage device 5 including a search window storage device 51 and a crowd state recognition dictionary storage device 52 is connected to a computer 150. A computer-readable storage medium 104 storing a crowd state recognition program 103 is also connected to the computer 150.

计算机可读存储介质104例如由磁盘、半导体存储器等实现。例如，当被激活时，计算机150从计算机可读存储介质104读取人群状态识别程序103。计算机150然后根据人群状态识别程序103作为在图7中示出的数据处理设备4中的人群状态识别装置41操作。The computer-readable storage medium 104 is implemented by, for example, a magnetic disk, a semiconductor memory, or the like. For example, when activated, the computer 150 reads the crowd state recognition program 103 from the computer-readable storage medium 104. The computer 150 then operates according to the crowd state recognition program 103 as the crowd state recognition device 41 in the data processing apparatus 4 shown in FIG. 7.

在上面的示例性实施例中已经描述了其中人群状态识别字典存储装置52(见图7)存储通过利用由训练数据生成设备10(见图1)生成的训练数据进行学习而获取的字典的情况。换言之，在上面的示例性实施例中已经描述了其中通过利用人群补丁和人群补丁的训练标签的多个配对进行机器学习而获取的字典被存储在人群状态识别字典存储装置52中的情况，人群补丁通过合成与控制为期望状态的人状态匹配的人图像而被获取。In the above exemplary embodiment, the case where the crowd state recognition dictionary storage device 52 (see FIG. 7 ) stores a dictionary acquired through learning using the training data generated by the training data generating device 10 (see FIG. 1 ) has been described. In other words, in the above exemplary embodiment, the case where a dictionary acquired through machine learning using a plurality of pairs of crowd patches and training labels of the crowd patches is stored in the crowd state recognition dictionary storage device 52 has been described, the crowd patches being acquired by synthesizing images of people that match the state of people controlled to a desired state.

人群状态识别字典存储装置52可以将通过利用除由训练数据生成设备10生成的训练数据之外的数据进行机器学习而获取的字典存储为训练数据。即使对于除由训练数据生成设备10生成的训练数据之外的训练数据，人群补丁和人群补丁的训练标签的多个配对被准备并且可被用作训练数据，该人群补丁包括一人，其基准部位被表达为与针对人群补丁的尺寸定义的人的基准部位的尺寸一样大。就是说，通过利用多对人群补丁和训练标签进行机器学习而获取的鉴别器的字典可以被存储在人群状态识别字典存储装置52中。同样在这种情况下，可以得到一种效果，即无论帧速率如何都可以优选地识别图像中的人群状态。The crowd state recognition dictionary storage device 52 may instead store a dictionary acquired through machine learning that uses, as training data, data other than the training data generated by the training data generation device 10. Even in that case, a plurality of pairs of crowd patches and their training labels can be prepared and used as training data, provided that each crowd patch includes a person whose reference part is expressed about as large as the size of the reference part of a person defined for the size of the crowd patch. That is, a dictionary of the discriminator acquired through machine learning using such pairs of crowd patches and training labels can be stored in the crowd state recognition dictionary storage device 52. In this case, too, the crowd state in an image can be favorably recognized regardless of the frame rate.

根据本发明的主要部分在下面将被描述。图20是图示了根据本发明的训练数据生成设备中的主要部分的框图。根据本发明的训练数据生成设备包括背景提取单元71、人状态确定单元72和人群状态图像合成单元73。The main parts of the present invention will be described below. Figure 20 is a block diagram illustrating the main parts of the training data generation device according to the present invention. The training data generation device according to the present invention includes a background extraction unit 71, a human state determination unit 72, and a crowd state image synthesis unit 73.

背景提取单元71(例如，背景提取装置11)从多个预先准备的背景图像选择背景图像，提取该背景图像中的区域并且将与提取出的区域对应的图像放大或者缩小为预定尺寸的图像。The background extraction unit 71 (for example, the background extraction device 11 ) selects a background image from a plurality of pre-prepared background images, extracts a region in the background image, and enlarges or reduces the image corresponding to the extracted region to an image of a predetermined size.
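The three operations of the background extraction unit 71 (select, extract a region, rescale) can be sketched as follows. This is only an illustrative sketch: the random choice of image and region, the nested-list image representation, the nearest-neighbour resampling, and the names `extract_background`, `patch_w`, and `patch_h` are assumptions made for the example and are not taken from the description.

```python
import random

def extract_background(backgrounds, patch_w, patch_h, rng=random):
    """Pick a background, crop a region, and rescale it to the patch size.

    backgrounds: list of images, each a list of rows of pixel values
    (a hypothetical representation chosen to keep the sketch dependency-free).
    """
    image = rng.choice(backgrounds)              # select one prepared background
    img_h, img_w = len(image), len(image[0])
    # Assumed here: the region is chosen at random inside the image.
    reg_w = rng.randint(1, img_w)
    reg_h = rng.randint(1, img_h)
    x0 = rng.randint(0, img_w - reg_w)
    y0 = rng.randint(0, img_h - reg_h)
    # Nearest-neighbour resampling enlarges or reduces the region
    # so that the result always has the predetermined size.
    return [
        [image[y0 + (j * reg_h) // patch_h][x0 + (i * reg_w) // patch_w]
         for i in range(patch_w)]
        for j in range(patch_h)
    ]
```

Whatever size the extracted region has, the returned image has exactly `patch_h` rows and `patch_w` columns, which is what allows a person image to be synthesized onto it at a known scale.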

人状态确定单元72(例如，人状态确定装置15)根据作为关于人的人状态的指明信息的多人状态控制指明和作为关于多人中的个别人的状态的指明信息的个别人状态控制指明来确定人群的人状态。The human state determination unit 72 (for example, the human state determination device 15) determines the human state of the crowd based on the multi-person state control designation, which is designation information about the human state of multiple people, and the individual person state control designation, which is designation information about the state of an individual person among those people.

人群状态图像合成单元73生成作为其中与人状态确定单元72所确定的人状态对应的人图像被与背景提取单元71所得到的预定尺寸的图像合成的图像的人群状态图像(诸如人群补丁)，指定人群状态图像的训练标签，并且输出一对人群状态图像和训练标签。The crowd state image synthesis unit 73 generates a crowd state image (such as a crowd patch), which is an image in which a human image corresponding to the human state determined by the human state determination unit 72 is synthesized with the image of the predetermined size obtained by the background extraction unit 71, specifies a training label for the crowd state image, and outputs the crowd state image and the training label as a pair.

例如，背景提取单元71、人状态确定单元72和人群状态图像合成单元73顺序地重复这些操作。背景提取单元71、人状态确定单元72和人群状态图像合成单元73的操作可以不被顺序地执行。例如，背景提取单元71和人状态确定单元72可以并行地执行操作。For example, the background extraction unit 71, the human state determination unit 72, and the crowd state image synthesis unit 73 repeat these operations sequentially. The operations of the background extraction unit 71, the human state determination unit 72, and the crowd state image synthesis unit 73 may not be performed sequentially. For example, the background extraction unit 71 and the human state determination unit 72 may perform operations in parallel.

利用该结构，用于机器学习用于识别人群状态的鉴别器的字典的大量训练数据可以被容易地生成。With this structure, a large amount of training data for machine learning a dictionary of a discriminator for recognizing crowd states can be easily generated.

图21是图示了根据本发明的人群状态识别设备中的主要部分的框图。根据本发明的人群状态识别设备包括矩形区域组存储单元81、人群状态识别字典存储单元82和人群状态识别单元83。FIG. 21 is a block diagram illustrating the main parts of the crowd state recognition device according to the present invention. The crowd state recognition device according to the present invention includes a rectangular area group storage unit 81, a crowd state recognition dictionary storage unit 82, and a crowd state recognition unit 83.

矩形区域组存储单元81(例如，搜索窗口存储装置51)存储指示图像上将针对人群状态而被识别的部分的一组矩形区域。The rectangular region group storage unit 81 (eg, the search window storage device 51 ) stores a group of rectangular regions indicating portions on an image to be recognized for a crowd state.

人群状态识别字典存储单元82(例如，人群状态识别字典存储装置52)存储通过利用多对人群状态图像(诸如人群补丁)和人群状态图像的训练标签进行机器学习而得到的鉴别器的字典，人群状态图像是其中包括一人的图像，该人的基准部位被表示为与针对表示人群状态的图像的预定尺寸定义的人的基准部位的尺寸一样大。The crowd state recognition dictionary storage unit 82 (e.g., the crowd state recognition dictionary storage device 52) stores a dictionary of discriminators obtained by machine learning using multiple pairs of crowd state images (such as crowd patches) and training labels of the crowd state images, where the crowd state image is an image including a person whose reference part is represented as the same size as the reference part of the person defined for a predetermined size of the image representing the crowd state.

人群状态识别单元83(例如人群状态识别装置41)从给定图像提取由在矩形区域组存储单元81中存储的该组矩形区域所指示的区域，并且基于字典来识别在提取出的图像中拍摄的人群的状态。The crowd state recognition unit 83 (eg, the crowd state recognition device 41 ) extracts the area indicated by the group of rectangular areas stored in the rectangular area group storage unit 81 from a given image, and recognizes the state of the crowd captured in the extracted image based on a dictionary.
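The behaviour of the crowd state recognition unit 83 can be illustrated with a short sketch. It assumes that each stored rectangular area is an `(x, y, width, height)` tuple and that the discriminator with its learned dictionary is available as a callable named `classify`. Both are hypothetical stand-ins for this example, and any rescaling of the extracted area before classification is left out.

```python
def recognize_crowd_states(image, rectangles, classify):
    """Extract each stored rectangular area and classify its crowd state.

    image: list of rows of pixel values.
    rectangles: iterable of (x, y, width, height) tuples (assumed layout).
    classify: hypothetical callable standing in for the discriminator
    that uses the learned dictionary.
    """
    results = []
    for (x, y, w, h) in rectangles:
        # Cut the rectangular area out of the given image.
        patch = [row[x:x + w] for row in image[y:y + h]]
        # Recognize the crowd state from the patch alone, frame by frame.
        results.append(((x, y, w, h), classify(patch)))
    return results
```

Because each call looks at a single image, nothing in the sketch depends on how often frames arrive, which mirrors the frame-rate independence stated in the text.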

利用该结构，无论帧速率如何都可以优选地识别图像中的人群状态。With this structure, the state of a crowd in an image can be preferably recognized regardless of the frame rate.

上面的示例性实施例中的部分或者全部可以如在以下补充说明中描述，但是不限于以下。Part or all of the above exemplary embodiments may be as described in the following supplementary notes, but are not limited to the following.

(补充说明1)(Supplementary Note 1)

一种训练数据生成设备，包括：A training data generating device, comprising:

背景提取装置，用于从多个预先准备的背景图像选择背景图像，提取该背景图像中的区域，并且将对应于提取的区域的图像放大或者缩小为预定尺寸的图像；Background extraction means for selecting a background image from a plurality of pre-prepared background images, extracting a region in the background image, and enlarging or reducing an image corresponding to the extracted region to an image of a predetermined size;

人状态确定装置，用于根据作为关于多人的人状态的指明信息的多人状态控制指明和作为关于多人中的个别人的状态的指明信息的个别人状态控制指明来确定人群的人状态；以及a person state determining means for determining the person state of a group of people based on a multi-person state control designation as designation information on the person state of a plurality of people and an individual person state control designation as designation information on the state of an individual person among the plurality of people; and

人群状态图像合成装置，用于生成人群状态图像、指定用于该人群状态图像的训练标签以及输出人群状态图像和训练标签的配对，人群状态图像是其中与人状态确定装置所确定的人状态对应的人图像被与由背景提取装置获取的预定尺寸的图像合成的图像。Crowd state image synthesis means is used to generate a crowd state image, specify a training label for the crowd state image, and output a pair of the crowd state image and the training label, wherein the crowd state image is an image in which a human image corresponding to the human state determined by the human state determination means is synthesized with an image of a predetermined size acquired by the background extraction means.

(补充说明2)(Supplementary Note 2)

根据补充说明1所述的训练数据生成设备，The training data generating device according to Supplementary Note 1,

其中人状态确定装置根据多人状态控制指明和个别人状态控制指明来临时确定人群的人状态，在临时确定的人状态满足用于针对预定尺寸定义的人的基准部位的尺寸和基准部位如何被表达的条件时、将临时确定的人状态确定为人群的人状态，以及当临时确定的人状态不满足这些条件时、重复地进行对人群的人状态的临时确定。The human state determination device temporarily determines the human state of the crowd based on the multi-person state control indication and the individual person state control indication, and when the temporarily determined human state meets the conditions of the size of the reference part of a person defined for a predetermined size and how the reference part is expressed, the temporarily determined human state is determined as the human state of the crowd, and when the temporarily determined human state does not meet these conditions, the temporary determination of the human state of the crowd is repeated.
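The tentative determination described in Supplementary Note 2 amounts to a sample-and-check loop. The sketch below is one possible reading: `sample_candidate` and `satisfies_conditions` are hypothetical callables, and the `max_tries` bound is an added safeguard that the note itself does not mention.

```python
import random

def determine_person_state(sample_candidate, satisfies_conditions, max_tries=1000):
    """Repeat the tentative determination until the conditions are satisfied."""
    for _ in range(max_tries):
        candidate = sample_candidate()            # tentative determination
        if satisfies_conditions(candidate):       # size / expression conditions
            return candidate                      # adopt as the person state
    raise RuntimeError("no tentative state satisfied the conditions")

# Illustrative use: a head size is drawn at random and accepted only when it
# is close to a reference size of 8 pixels (both numbers are made up).
rng = random.Random(0)
state = determine_person_state(
    sample_candidate=lambda: {"head_size": rng.uniform(2.0, 20.0)},
    satisfies_conditions=lambda s: abs(s["head_size"] - 8.0) <= 1.6,
)
```

Rejected candidates are simply discarded, so the accepted states keep whatever randomness the designations allow while always meeting the conditions.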

(补充说明3)(Supplementary Note 3)

根据补充说明1或2所述的训练数据生成设备，包括：The training data generating device according to Supplementary Note 1 or 2, comprising:

人群状态控制指明存储装置，用于存储按照项目定义的多人状态控制指明以及存储针对该项目定义的指明的训练标签的存在；以及crowd state control designation storage means for storing the multi-person state control designation defined for each item and storing, for each item, whether a training label is designated; and

人状态控制指示存储装置，用于存储按照项目定义的个别人状态控制指明以及存储针对该项目定义的指明的训练标签的存在，person state control designation storage means for storing the individual person state control designation defined for each item and storing, for each item, whether a training label is designated,

其中人状态确定装置根据在人群状态控制指明存储装置中存储的多人状态控制指明和在人状态控制指示存储装置中存储的个别人状态控制指明，来确定人群的人状态，并且wherein the human state determining means determines the human state of the crowd according to the multi-person state control indication stored in the crowd state control indication storage means and the individual human state control indication stored in the human state control indication storage means, and

人群状态图像合成装置通过从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明、和从人状态控制指明存储装置读取被定义为具有指明的训练标签的项目的个别人状态控制指明，来指定训练标签。The crowd state image synthesis device specifies the training label by reading the multi-person state control designation defined as an item with the specified training label from the crowd state control designation storage device, and reading the individual person state control designation defined as an item with the specified training label from the person state control designation storage device.

(补充说明4)(Supplementary Note 4)

根据补充说明3所述的训练数据生成设备，The training data generating device according to Supplementary Note 3,

其中，人群状态控制指明存储装置将至少一个项目存储为具有指明的训练标签，并且wherein the crowd state control designation storage means stores at least one item as having a designated training label, and

人群状态图像合成装置从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明。The crowd state image synthesizing means reads a multi-person state control specification defined as an item having a specified training label from the crowd state control specification storage means.

(补充说明5)(Supplementary Note 5)

根据补充说明3或4所述的训练数据生成设备，The training data generating device according to Supplementary Note 3 or 4,

其中人群状态控制指明存储装置按照诸如人的布置、人的方向和人的数目之类的项目来存储多人状态控制指明和指明的训练标签的存在，并且以指示特定状态的第一形式、指明可以定义任意状态的第二形式和指明可以在预定规则内定义状态的第三形式中的任一形式来存储对应于每个项目的多人状态控制指明，wherein the crowd state control indication storage device stores the multi-person state control indication and the existence of the indicated training label according to items such as the arrangement of people, the direction of people, and the number of people, and stores the multi-person state control indication corresponding to each item in any one of a first form indicating a specific state, a second form indicating that an arbitrary state can be defined, and a third form indicating that a state can be defined within a predetermined rule,

人状态控制指示存储装置按照诸如人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状、人的发型和当与人群状态图像合成时的人尺寸之类的项目，来存储个别人状态指明和指明的训练标签的存在，并且以第一形式、第二形式和第三形式中的任一形式，来存储对应于每个项目的个别人状态控制指明，并且The person state control indication storage device stores the existence of individual person state indications and the indicated training labels according to items such as the shooting angle of the person, the lighting of the person, the posture of the person, the clothes of the person, the body shape of the person, the hairstyle of the person, and the size of the person when synthesized with the crowd state image, and stores the individual person state control indication corresponding to each item in any one of the first form, the second form, and the third form, and

人状态确定装置根据在人群状态控制指明存储装置中存储的多人状态控制指明和在人状态控制指示存储装置中存储的个别人状态控制指明，来确定人群的人状态。The human state determining means determines the human state of the crowd based on the multi-person state control designation stored in the crowd state control designation storage means and the individual human state control designation stored in the human state control designation storage means.
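The three forms of designation in Supplementary Note 5 (a specific state, an arbitrary state, and a state within a predetermined rule) can be pictured as a small resolver. The dictionary layout with the keys `form`, `value`, and `rule`, and the function name `resolve_designation`, are assumptions introduced only for this sketch.

```python
import random

def resolve_designation(designation, choices, rng=random):
    """Turn a stored designation for one item into a concrete state."""
    form = designation["form"]
    if form == "specific":       # first form: the state is fixed by the operator
        return designation["value"]
    if form == "arbitrary":      # second form: any state may be chosen
        return rng.choice(choices)
    if form == "rule":           # third form: any state permitted by the rule
        allowed = [c for c in choices if designation["rule"](c)]
        return rng.choice(allowed)
    raise ValueError(f"unknown designation form: {form}")

# Illustrative item "number of people" with made-up candidate values.
counts = list(range(0, 11))
fixed = resolve_designation({"form": "specific", "value": 4}, counts)
free = resolve_designation({"form": "arbitrary"}, counts, random.Random(1))
ruled = resolve_designation(
    {"form": "rule", "rule": lambda n: n >= 6}, counts, random.Random(2)
)
```

The same resolver would apply to items of the individual person state control designation, such as posture or clothing, with a different list of candidate values.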

(补充说明6)(Supplementary Note 6)

根据补充说明1至5中任一项所述的训练数据生成设备，The training data generating device according to any one of Supplementary Notes 1 to 5,

其中人群状态图像合成装置从一组预先准备的人图像选择与诸如人的方向、人的数目、人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状和人的发型之类的确定的人状态匹配的人图像，从选择的人图像裁剪人的区域，由此生成只有人的图像，并且根据被确定为人状态的人的布置和当与人群状态图像合成时的人尺寸，来使只有人的图像与背景提取装置获取的预定尺寸的图像合成。The crowd state image synthesis device selects a human image that matches a determined human state such as the direction of the human, the number of people, the shooting angle of the human, the lighting of the human, the posture of the human, the clothes of the human, the body shape of the human and the hairstyle of the human from a group of pre-prepared human images, crops the human area from the selected human image, thereby generating an image of only the human, and synthesizes the image of only the human with an image of a predetermined size obtained by the background extraction device based on the arrangement of the people determined to be in human state and the size of the people when synthesized with the crowd state image.

(补充说明7)(Supplementary Note 7)

根据补充说明6所述的训练数据生成设备，The training data generating device according to Supplementary Note 6,

其中，人群状态图像合成装置从与距离相机的最远布置位置对应的只有人的图像顺序地与背景提取装置获取的预定尺寸的图像合成。wherein the crowd state image synthesizing means synthesizes the images of only people with the image of the predetermined size acquired by the background extraction means sequentially, starting from the image of only a person corresponding to the arrangement position farthest from the camera.
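Compositing from the farthest person to the nearest is what lets nearer people occlude farther ones. A minimal sketch, assuming each person-only image is a dictionary with hypothetical keys `distance`, `x`, `y`, and `pixels`, where `None` marks a transparent pixel outside the person region:

```python
def composite_far_to_near(background, people):
    """Paste person-only images onto the background, farthest first."""
    canvas = [row[:] for row in background]       # leave the input untouched
    # Sort by decreasing distance from the camera so near people are drawn last.
    for person in sorted(people, key=lambda p: p["distance"], reverse=True):
        for dy, row in enumerate(person["pixels"]):
            for dx, value in enumerate(row):
                y, x = person["y"] + dy, person["x"] + dx
                inside = 0 <= y < len(canvas) and 0 <= x < len(canvas[0])
                if value is not None and inside:
                    canvas[y][x] = value          # nearer pixels overwrite farther ones
    return canvas
```

Because the list is sorted inside the function, the result does not depend on the order in which the person-only images are supplied.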

(补充说明8)(Supplementary Note 8)

根据补充说明1至7中任一项所述的训练数据生成设备，The training data generating device according to any one of Supplementary Notes 1 to 7,

其中，人状态确定装置包括：Wherein, the human state determination device includes:

背景人状态确定装置，用于根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的背景的人群的人状态，在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸、和基准部位如何被表达的第一条件时，将临时确定的人状态确定为作为背景的人群的人状态，并且在临时确定的人状态不满足第一条件时、重复地进行对作为背景的人群的人状态的临时确定；以及Background human state determining means for temporarily determining the human state of a crowd serving as a background in a crowd state image based on a multi-person state control designation and an individual person state control designation, determining the temporarily determined human state as the human state of the crowd serving as the background when the temporarily determined human state satisfies a first condition of a size of a reference part of a person defined for a predetermined size of the crowd state image and how the reference part is expressed, and repeatedly performing the temporary determination of the human state of the crowd serving as the background when the temporarily determined human state does not satisfy the first condition; and

前景人状态确定装置，用于根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的前景的人群的人状态，在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸、和基准部位如何被表达的第二条件时将临时确定的人状态确定为作为前景的人群的人状态，并且在临时确定的人状态不满足第二条件时、重复地进行对作为前景的人群的人状态的临时确定。A foreground person state determination device is used to temporarily determine the person state of the crowd serving as the foreground in a crowd state image based on a multi-person state control indication and an individual person state control indication, and to determine the temporarily determined person state as the person state of the crowd serving as the foreground when the temporarily determined person state meets a second condition of the size of a reference part of a person defined for a predetermined size of the crowd state image and how the reference part is expressed, and to repeatedly perform the temporary determination of the person state of the crowd serving as the foreground when the temporarily determined person state does not meet the second condition.

(补充说明9)(Supplementary Note 9)

根据补充说明8所述的训练数据生成设备，The training data generating device according to Supplementary Note 8,

其中第一条件是人的基准部位不在人群状态图像内、或者基准部位的尺寸比针对预定尺寸定义的人的基准部位的尺寸大得多或者小得多，并且The first condition is that the reference part of the person is not within the crowd state image, or the size of the reference part is much larger or smaller than the size of the reference part of the person defined for the predetermined size, and

第二条件是人的基准部位在人群状态图像内、并且基准部位的尺寸与针对预定尺寸定义的人的基准部位的尺寸一样大。The second condition is that the reference part of the person is within the crowd state image, and the size of the reference part is as large as the size of the reference part of the person defined for a predetermined size.
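The first and second conditions of Supplementary Note 9 can be written as two predicates. The note does not quantify "as large as" or "much larger or smaller", so the tolerance `tol` and the ratio `factor` below are illustrative assumptions, as are the `PersonState` fields and the use of the head as the reference part.

```python
from dataclasses import dataclass

@dataclass
class PersonState:
    head_x: float     # position of the reference part (head) in the patch
    head_y: float
    head_size: float  # size of the reference part in pixels

def head_inside(state, patch_w, patch_h):
    return 0 <= state.head_x < patch_w and 0 <= state.head_y < patch_h

def second_condition(state, patch_w, patch_h, ref_size, tol=0.2):
    """Foreground: head inside the patch and about the reference size."""
    return (head_inside(state, patch_w, patch_h)
            and abs(state.head_size - ref_size) <= tol * ref_size)

def first_condition(state, patch_w, patch_h, ref_size, factor=2.0):
    """Background: head outside the patch, or far larger/smaller than the reference."""
    ratio = state.head_size / ref_size
    return (not head_inside(state, patch_w, patch_h)
            or ratio >= factor or ratio <= 1.0 / factor)
```

With these thresholds a state can satisfy neither condition, for example a head inside the patch at 1.5 times the reference size, and such a state would be used for neither the foreground nor the background.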

(补充说明10)(Supplementary Note 10)

一种人群状态识别设备，包括：A crowd status recognition device, comprising:

矩形区域组存储装置，用于存储指示图像上将针对人群状态而被识别的部分的一组矩形区域；Rectangular region group storage means for storing a group of rectangular regions indicating portions of an image to be recognized for crowd status;

人群状态识别字典存储装置，用于存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像是以预定尺寸表达人群状态、并且包括其基准部位被表达为与针对预定尺寸定义的人的基准部位的尺寸一样大的人的图像；以及a crowd state recognition dictionary storage device for storing a dictionary of a discriminator obtained by performing machine learning using a plurality of pairs of crowd state images and training labels of the crowd state images, the crowd state images being images expressing a crowd state in a predetermined size and including images of persons whose reference parts are expressed as large as the size of the reference parts of persons defined for the predetermined size; and

人群状态识别装置，用于从给定图像提取在矩形区域组存储装置中存储的该一组矩形区域中指示的区域，并且基于字典来识别在提取的图像中拍摄的人群的状态。Crowd state recognition means is configured to extract, from a given image, an area indicated in the group of rectangular areas stored in the rectangular area group storage means, and recognize the state of a crowd captured in the extracted image based on a dictionary.

(补充说明11)(Supplementary Note 11)

根据补充说明10所述的人群状态识别设备，The crowd state recognition device according to Supplementary Note 10,

其中人群状态识别字典存储装置存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像通过合成与控制为期望状态的人状态匹配的人图像而被获取，并且wherein the crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels of the crowd state images, the crowd state images being obtained by synthesizing images of people whose states match those of people controlled to a desired state, and

人群状态识别装置基于字典来识别在图像中拍摄的人群的状态。The crowd state recognition means recognizes the state of a crowd captured in an image based on a dictionary.

(补充说明12)(Supplementary Note 12)

根据补充说明10或11所述的人群状态识别设备，The crowd state recognition device according to Supplementary Note 10 or 11,

其中矩形区域组存储装置存储基于指示用于获取图像的图像获取设备的位置、姿势、焦距和透镜畸变的相机参数的一组尺寸定义的矩形区域，以及针对预定尺寸定义的人的基准部位的尺寸，并且wherein the rectangular area group storage means stores a group of rectangular areas whose sizes are defined based on camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device that acquires the image, and on the size of the reference part of a person defined for the predetermined size, and

人群状态识别装置从给定图像提取在该一组矩形区域中指示的区域。The crowd state recognition means extracts, from a given image, the areas indicated in the group of rectangular areas.

(补充说明13)(Supplementary Note 13)

根据补充说明10至12中任一项所述的人群状态识别设备，The crowd state recognition device according to any one of Supplementary Notes 10 to 12,

其中人群状态识别字典存储装置存储通过改变在人群状态图像中表达的人的数目并且通过利用针对人的该数目准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，并且wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by changing the number of people represented in a crowd state image and by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for the number of people, and

人群状态识别装置基于字典来识别在图像中拍摄的人群中的人的数目。The crowd state recognition means recognizes the number of persons in a crowd captured in an image based on a dictionary.

(补充说明14)(Supplementary Note 14)

根据补充说明10至13中任一项所述的人群状态识别设备，The crowd state recognition device according to any one of Supplementary Notes 10 to 13,

其中人群状态识别字典存储装置存储通过改变在人群状态图像中表示的人的方向、并且通过利用针对人的这些方向准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，并且wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by changing the directions of people represented in crowd state images and by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for these directions of people, and

人群状态识别装置基于字典来识别在图像中拍摄的人群的方向。The crowd state recognition device recognizes the direction of a crowd captured in an image based on a dictionary.

(补充说明15)(Supplementary Note 15)

根据补充说明10至14中任一项所述的人群状态识别设备，The crowd state recognition device according to any one of Supplementary Notes 10 to 14,

其中人群状态识别字典存储装置存储通过利用针对非显著拥挤的人群和显著拥挤的人群准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，并且wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels prepared for non-significantly crowded crowds and significantly crowded crowds, and

人群状态识别装置基于字典来识别在图像中拍摄的人群是否是显著拥挤的。The crowd state recognition means recognizes whether a crowd captured in an image is remarkably crowded based on a dictionary.

(补充说明16)(Supplementary Note 16)

根据补充说明10至15中任一项所述的人群状态识别设备，The crowd state recognition device according to any one of Supplementary Notes 10 to 15,

其中人群状态识别字典存储装置存储通过利用针对其中人的方向统一的人群和其中人的方向不统一的人群准备的人群状态图像和训练标签的配对进行机器学习而获取的鉴别器的字典，并且wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by machine learning using pairs of crowd state images and training labels prepared for a crowd in which the directions of people are uniform and a crowd in which the directions of people are not uniform, and

人群状态识别装置基于字典来识别在图像中拍摄的人群中的人的方向是否是统一的。The crowd state recognition device recognizes whether the directions of people in a crowd captured in an image are unified based on a dictionary.

(补充说明17)(Supplementary Note 17)

一种训练数据生成方法，包括：A training data generation method, comprising:

背景提取步骤，从多个预先准备的背景图像选择背景图像、提取该背景图像中的区域并且将对应于提取出的区域的图像放大或者缩小为预定尺寸的图像；A background extraction step of selecting a background image from a plurality of pre-prepared background images, extracting a region in the background image, and enlarging or reducing an image corresponding to the extracted region to an image of a predetermined size;

人状态确定步骤，根据作为关于多人的人状态的指明信息的多人状态控制指明和作为关于多人中的个别人的状态的指明信息的个别人状态控制指明，来确定人群的人状态；以及a person state determining step of determining the person state of the group based on the multi-person state control designation as designation information on the person state of the plurality of persons and the individual person state control designation as designation information on the state of an individual person among the plurality of persons; and

人群状态图像合成步骤，生成人群状态图像、指定用于该人群状态图像的训练标签以及输出人群状态图像和训练标签的配对，人群状态图像是其中与在人状态确定步骤中确定的人状态对应的人图像被与在背景提取步骤中得到的预定尺寸的图像合成的图像。A crowd state image synthesis step generates a crowd state image, specifies a training label for the crowd state image, and outputs a pair of the crowd state image and the training label, wherein the crowd state image is an image in which a human image corresponding to the human state determined in the human state determination step is synthesized with an image of a predetermined size obtained in the background extraction step.

(补充说明18)(Supplementary Note 18)

根据补充说明17所述的训练数据生成方法，包括：The training data generation method according to Supplementary Note 17 includes:

根据多人状态控制指明和个别人状态控制指明来临时确定人群的人状态、在临时确定的人状态满足针对预定尺寸定义的人的基准部位的尺寸和基准部位如何被表示的条件时，将临时确定的人状态确定为人群的人状态并且当临时确定的人状态不满足这些条件时重复地进行对人群的人状态的临时确定的人状态确定步骤。The human state of a group of people is temporarily determined based on the multi-person state control indication and the individual person state control indication. When the temporarily determined human state meets the conditions of the size of the human reference part defined for the predetermined size and how the reference part is represented, the temporarily determined human state is determined as the human state of the group of people, and when the temporarily determined human state does not meet these conditions, the human state determination step of temporarily determining the human state of the group of people is repeated.
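The determine-and-retry loop of Supplementary Note 18 resembles rejection sampling. The sketch below assumes a person state reduced to a head position and a head size; the constants `PATCH`, `REF_HEAD`, and `TOLERANCE` and all function names are hypothetical, since the note does not fix numeric values.

```python
import random

PATCH = 64        # hypothetical predetermined image size in pixels
REF_HEAD = 16.0   # hypothetical reference-part (head) size defined for that size
TOLERANCE = 0.25  # hypothetical: accept head sizes within +/-25 % of REF_HEAD

def propose_state(rng):
    """Temporarily determine a person state (head position and head size)."""
    return {
        "x": rng.uniform(-PATCH, 2 * PATCH),
        "y": rng.uniform(-PATCH, 2 * PATCH),
        "head": rng.uniform(4.0, 48.0),
    }

def satisfies_conditions(state):
    """Check how the reference part is represented (inside the image) and its size."""
    inside = 0 <= state["x"] < PATCH and 0 <= state["y"] < PATCH
    size_ok = abs(state["head"] - REF_HEAD) <= TOLERANCE * REF_HEAD
    return inside and size_ok

def determine_state(rng, max_tries=10_000):
    """Repeat the temporary determination until the conditions are met."""
    for _ in range(max_tries):
        state = propose_state(rng)
        if satisfies_conditions(state):
            return state
    raise RuntimeError("no acceptable person state found")

state = determine_state(random.Random(1))
```

The `max_tries` bound is a practical safeguard added for the sketch; the note itself only says the temporary determination is repeated.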

(补充说明19)(Supplementary Note 19)

根据补充说明17或18所述的训练数据生成方法，According to the training data generation method described in Supplementary Note 17 or 18,

其中人群状态控制指明存储装置存储按照项目定义的多人状态控制指明并且存储针对该项目定义的指明的训练标签的存在，并且wherein the crowd state control designation storage device stores a multi-person state control designation defined for each item and stores, for each item, whether a designated training label is present, and

人状态控制指明存储装置存储按照项目定义的个别人状态控制指明并且存储针对该项目定义的指明的训练标签的存在，the person state control designation storage device stores an individual person state control designation defined for each item and stores, for each item, whether a designated training label is present,

该方法包括：The method includes:

人状态确定步骤，根据在人群状态控制指明存储装置中存储的多人状态控制指明和在人状态控制指示存储装置中存储的个别人状态控制指明来确定人群的人状态；以及a human state determining step of determining the human state of the crowd based on the multi-person state control designation stored in the crowd state control designation storage device and the individual human state control designation stored in the human state control designation storage device; and

人群状态图像合成步骤，通过从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明和从人状态控制指示存储装置读取被定义为具有指明的训练标签的项目的个别人状态控制指明来指定训练标签。The crowd state image synthesis step specifies the training label by reading the multi-person state control indication defined as an item with the specified training label from the crowd state control indication storage device and reading the individual person state control indication defined as an item with the specified training label from the person state control indication storage device.
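One way to picture how the training label is specified in Supplementary Note 19: each store maps an item to a designation plus a flag saying whether that item carries a designated training label, and the label is assembled from the flagged items only. The dictionary layout, the key names, and the sample values below are assumptions made for illustration.

```python
# Hypothetical contents of the two designation stores.
crowd_designations = {
    "arrangement": {"value": "random", "is_label": False},
    "direction": {"value": "all facing left", "is_label": True},
    "count": {"value": 5, "is_label": True},
}
person_designations = {
    "posture": {"value": "walking", "is_label": False},
    "clothes": {"value": "any", "is_label": False},
}

def training_label(*stores):
    """Collect the designations of every item flagged as having a training label."""
    label = {}
    for store in stores:
        for item, entry in store.items():
            if entry["is_label"]:
                label[item] = entry["value"]
    return label

label = training_label(crowd_designations, person_designations)
```

Here `label` is `{"direction": "all facing left", "count": 5}`; items without the flag still shape the image but are left out of the label.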

(补充说明20)(Supplementary Note 20)

根据补充说明19所述的训练数据生成方法，According to the training data generation method described in Supplementary Note 19,

其中人群状态控制指明存储装置将至少一个项目存储为具有指明的训练标签，并且wherein the crowd state control designation storage device stores at least one item as having a designated training label, and

该方法包括人群状态图像合成步骤，从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明。The method includes a crowd state image synthesis step of reading a multi-person state control specification defined as an item having a specified training label from a crowd state control specification storage device.

(补充说明21)(Supplementary Note 21)

根据补充说明19或20所述的训练数据生成方法，According to the training data generation method described in Supplementary Note 19 or 20,

其中人群状态控制指明存储装置按照诸如人的布置、人的方向和人的数目之类的项目来存储多人状态控制指明和指明的训练标签的存在，并且以指示特定状态的第一形式、指明可以定义任意状态的第二形式和指明可以在预定规则内定义状态的第三形式中的任一形式来存储对应于每个项目的多人状态控制指明，并且wherein the crowd state control designation storage device stores the multi-person state control designation and the existence of the designated training label according to items such as arrangement of people, direction of people, and number of people, and stores the multi-person state control designation corresponding to each item in any one of a first form indicating a specific state, a second form indicating that an arbitrary state can be defined, and a third form indicating that a state can be defined within a predetermined rule, and

人状态控制指示存储装置按照诸如人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状、人的发型和当与人群状态图像合成时的人尺寸之类的项目，来存储个别人状态指明和指明的训练标签的存在，并且以第一形式、第二形式和第三形式中的任一形式，来存储对应于每个项目的个别人状态控制指明，The person state control indication storage device stores the existence of individual person state indications and the indicated training labels according to items such as the shooting angle of the person, the lighting of the person, the posture of the person, the clothes of the person, the body shape of the person, the hairstyle of the person, and the size of the person when synthesized with the crowd state image, and stores the individual person state control indication corresponding to each item in any one of the first form, the second form, and the third form.

该方法包括人状态确定步骤，根据在人群状态控制指明存储装置中存储的多人状态控制指明和在人状态控制指示存储装置中存储的个别人状态控制指明来确定人群的人状态。The method includes a human state determining step of determining the human state of a crowd based on multi-person state control indications stored in a crowd state control indication storage device and individual human state control indications stored in a human state control indication storage device.
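The three designation forms of Supplementary Note 21 (a specific state, an arbitrary state, a state within a predetermined rule) can be sketched as a small resolver. The `form` strings, the `resolve` function, and the rule-as-predicate encoding are assumptions introduced here, not terminology from the patent.

```python
import random

def resolve(designation, choices, rng):
    """Turn a stored designation into one concrete state for an item."""
    form = designation["form"]
    if form == "specific":   # first form: the state is given outright
        return designation["value"]
    if form == "arbitrary":  # second form: any state may be chosen
        return rng.choice(choices)
    if form == "rule":       # third form: any state allowed by the rule
        allowed = [c for c in choices if designation["rule"](c)]
        return rng.choice(allowed)
    raise ValueError(f"unknown designation form: {form}")

rng = random.Random(0)
counts = list(range(0, 11))  # candidate numbers of people

fixed = resolve({"form": "specific", "value": 3}, counts, rng)
free = resolve({"form": "arbitrary"}, counts, rng)
ruled = resolve({"form": "rule", "rule": lambda n: n >= 5}, counts, rng)
```

The same resolver would serve both stores, since the note applies the three forms to multi-person items (arrangement, direction, number) and to individual items (angle, lighting, posture, and so on) alike.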

(补充说明22)(Supplementary Note 22)

根据补充说明17至21中任一项所述的训练数据生成方法，包括：The training data generation method according to any one of Supplementary Notes 17 to 21, comprising:

人群状态图像合成步骤，从一组预先准备的人图像选择与诸如人的方向、人的数目、人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状和人的发型之类的确定的人状态匹配的人图像，从选择的人图像裁剪人的区域，由此生成只有人的图像，并且根据被确定为人状态的人的布置和当与人群状态图像合成时的人尺寸，来使只有人的图像与背景提取装置获取的预定尺寸的图像合成。A crowd state image synthesis step selects a person image that matches a determined person state such as the person's direction, the number of people, the shooting angle of the person, the lighting of the person, the person's posture, the person's clothes, the person's body shape and the person's hairstyle from a group of pre-prepared person images, crops the person's area from the selected person image, thereby generating an image of only the person, and synthesizes the image of only the person with an image of a predetermined size obtained by a background extraction device based on the arrangement of the people determined to be in the human state and the size of the person when synthesized with the crowd state image.

(补充说明23)(Supplementary Note 23)

根据补充说明22所述的训练数据生成方法，包括：The training data generation method according to Supplementary Note 22 includes:

人群状态图像合成步骤，从与距离相机的最远布置位置对应的只有人的图像顺序地与由背景提取装置获取的预定尺寸的图像合成。a crowd state image synthesis step of synthesizing the person-only images with the image of the predetermined size acquired by the background extraction means, sequentially, starting from the person-only image corresponding to the arrangement position farthest from the camera.
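Compositing from the farthest arrangement position first, as in Supplementary Note 23, is the classic painter's algorithm: nearer people are pasted later and so correctly cover farther ones. The sketch below uses character grids in place of images; the `composite` function and the dictionary fields are hypothetical.

```python
def composite(background, people):
    """Paste person-only sprites onto a copy of the background, farthest first."""
    canvas = [row[:] for row in background]
    for person in sorted(people, key=lambda p: p["distance"], reverse=True):
        for dr, sprite_row in enumerate(person["sprite"]):
            for dc, pixel in enumerate(sprite_row):
                r, c = person["top"] + dr, person["left"] + dc
                in_bounds = 0 <= r < len(canvas) and 0 <= c < len(canvas[0])
                # None marks a transparent pixel outside the cropped person region.
                if pixel is not None and in_bounds:
                    canvas[r][c] = pixel
    return canvas

background = [["."] * 5 for _ in range(3)]
near = {"distance": 2.0, "top": 1, "left": 1, "sprite": [["N", "N"], ["N", "N"]]}
far = {"distance": 9.0, "top": 0, "left": 2, "sprite": [["F", "F"], ["F", "F"]]}

# Input order does not matter; the sort enforces farthest-first pasting.
image = composite(background, [near, far])
```

At the overlapping cell (row 1, column 2) the near person wins, so `image[1]` is `[".", "N", "N", "F", "."]`.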

(补充说明24)(Supplementary Note 24)

根据补充说明17至23中任一项所述的训练数据生成方法，According to the training data generation method described in any one of Supplementary Notes 17 to 23,

其中人状态确定步骤包括：wherein the person state determining step includes:

背景人状态确定步骤，根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的背景的人群的人状态，在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸和基准部位如何被表示的第一条件时将临时确定的人状态确定为作为背景的人群的人状态，并且在临时确定的人状态不满足第一条件时重复地做出对作为背景的人群的人状态的临时确定；以及a background person state determining step of temporarily determining the person state of the crowd serving as the background in the crowd state image based on the multi-person state control designation and the individual person state control designation, determining the temporarily determined person state as the person state of the crowd serving as the background when it satisfies a first condition on the size of the reference part of a person defined for the predetermined size of the crowd state image and on how the reference part is represented, and repeating the temporary determination of the person state of the crowd serving as the background when the temporarily determined person state does not satisfy the first condition; and

前景人状态确定步骤，根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的前景的人群的人状态，在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸和基准部位如何被表示的第二条件时将临时确定的人状态确定为作为前景的人群的人状态，并且在临时确定的人状态不满足第二条件时重复地做出对作为前景的人群的人状态的临时确定。A foreground person state determination step is to temporarily determine the person state of the crowd serving as the foreground in the crowd state image based on the multi-person state control indication and the individual person state control indication, and to determine the temporarily determined person state as the person state of the crowd serving as the foreground when the temporarily determined person state meets a second condition of the size of the person's reference part defined for a predetermined size of the crowd state image and how the reference part is represented, and to repeatedly make a temporary determination of the person state of the crowd serving as the foreground when the temporarily determined person state does not meet the second condition.

(补充说明25)(Supplementary Note 25)

根据补充说明24所述的训练数据生成方法，According to the training data generation method described in Supplementary Note 24,

其中第一条件是人的基准部位不在人群状态图像内或者基准部位的尺寸比针对预定尺寸定义的人的基准部位的尺寸大得多或者小得多，并且The first condition is that the reference part of the person is not within the crowd state image or the size of the reference part is much larger or smaller than the size of the reference part of the person defined for the predetermined size, and

第二条件是人的基准部位在人群状态图像内并且基准部位的尺寸与针对预定尺寸定义的人的基准部位的尺寸一样大。The second condition is that the reference part of the person is within the crowd state image and the size of the reference part is as large as the size of the reference part of the person defined for a predetermined size.
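The first and second conditions of Supplementary Note 25 are not complements of each other: a head that lies inside the image but is only moderately off the reference size satisfies neither, so such a state would be retried under both. The sketch below makes this explicit. The thresholds `same` and `far_off` are hypothetical readings of "as large as" and "much larger or much smaller", which the note leaves unquantified.

```python
def classify(head_x, head_y, head_size, patch=64, ref=16.0, same=0.25, far_off=2.0):
    """Say which condition a candidate person state satisfies, if any."""
    inside = 0 <= head_x < patch and 0 <= head_y < patch
    ratio = head_size / ref
    # Second condition: reference part inside the image and about the reference size.
    if inside and abs(ratio - 1.0) <= same:
        return "foreground"
    # First condition: reference part outside the image, or much larger/smaller.
    if not inside or ratio >= far_off or ratio <= 1.0 / far_off:
        return "background"
    # Neither condition holds: unusable as foreground or as background.
    return "rejected"

examples = [
    classify(10, 10, 16.0),  # inside, reference size
    classify(-5, 10, 16.0),  # head outside the image
    classify(10, 10, 40.0),  # inside, much larger
    classify(10, 10, 24.0),  # inside, moderately larger
]
```

With these assumed thresholds, `examples` is `["foreground", "background", "background", "rejected"]`.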

(补充说明26)(Supplementary Note 26)

一种人群状态识别方法，A crowd state recognition method,

其中矩形区域组存储装置存储指示图像上将针对人群状态而被识别的部分的一组矩形区域，并且wherein the rectangular region group storage means stores a group of rectangular regions indicating portions of the image to be recognized for the crowd state, and

人群状态识别字典存储装置存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像是以预定尺寸表达人群状态、并且包括其基准部位被表达为与针对预定尺寸定义的人的基准部位的尺寸一样大的人的图像，The crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by performing machine learning using a plurality of pairs of crowd state images and training labels of the crowd state images, the crowd state images being images expressing a crowd state in a predetermined size and including images of people whose reference parts are expressed as large as the reference parts of people defined for the predetermined size,

该方法包括人群状态识别步骤，从给定图像提取在矩形区域组存储装置中存储的该一组矩形区域中指示的区域，并且基于字典来识别在提取出的图像中拍摄的人群的状态。The method includes a crowd state recognition step of extracting, from a given image, an area indicated among the group of rectangular areas stored in a rectangular area group storage device, and recognizing a state of a crowd photographed in the extracted image based on a dictionary.
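The recognition step of Supplementary Note 26 can be sketched as a loop over the stored rectangles: crop each one, scale it to the size the discriminator was trained on, and classify it. The `toy_discriminator` below is only a stand-in for the learned dictionary, and the `(top, left, height, width)` rectangle encoding is an assumption.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2D list of pixel values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def toy_discriminator(patch):
    """Stand-in for the learned discriminator: thresholds the share of set pixels."""
    filled = sum(1 for row in patch for value in row if value)
    total = len(patch) * len(patch[0])
    return "crowded" if 2 * filled >= total else "not crowded"

def recognize(image, rectangles, discriminator, patch_size=4):
    """Classify the crowd state inside each stored rectangular region."""
    results = []
    for top, left, height, width in rectangles:
        region = [row[left:left + width] for row in image[top:top + height]]
        patch = resize_nearest(region, patch_size, patch_size)
        results.append(((top, left, height, width), discriminator(patch)))
    return results

# 8 x 8 test image: left half set, right half empty.
image = [[1] * 4 + [0] * 4 for _ in range(8)]
results = recognize(image, [(0, 0, 8, 4), (0, 4, 8, 4)], toy_discriminator)
```

Because every region is scaled to the same patch size before classification, one discriminator can serve rectangles of any size.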

(补充说明27)(Supplementary Note 27)

根据补充说明26所述的人群状态识别方法，According to the crowd state recognition method described in Supplementary Note 26,

其中，人群状态识别字典存储装置存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像通过合成与控制为期望状态的人状态匹配的人图像而被获取，The crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels of the crowd state images, wherein the crowd state images are obtained by synthesizing images of people whose states match those of people controlled to a desired state.

该方法包括基于字典来识别在图像中拍摄的人群的状态的人群状态识别步骤。The method includes a crowd state recognition step of recognizing the state of a crowd captured in an image based on a dictionary.

(补充说明28)(Supplementary Note 28)

根据补充说明26或27所述的人群状态识别方法，According to the crowd state recognition method described in Supplementary Note 26 or 27,

其中矩形区域组存储装置存储基于指示用于获取图像的图像获取设备的位置、姿势、焦距和透镜畸变的相机参数的一组尺寸定义的矩形区域，以及针对预定尺寸定义的人的基准部位的尺寸，wherein the rectangular region group storage means stores a group of rectangular regions whose sizes are defined based on camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device used to acquire the image, and on the size of the reference part of a person defined for the predetermined size,

该方法包括从给定图像提取在该组矩形区域中指示的区域的人群状态识别步骤。The method includes a crowd state recognition step of extracting, from a given image, an area indicated in the set of rectangular areas.
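To illustrate why rectangle sizes in Supplementary Note 28 depend on camera parameters, the sketch below uses a deliberately simplified pinhole model: it considers only focal length and distance, and ignores camera posture and lens distortion, which the note also lists. All function names and numeric values are hypothetical.

```python
import math

def head_size_px(focal_px, head_m, distance_m):
    """Pinhole projection: apparent size falls off in proportion to distance."""
    return focal_px * head_m / distance_m

def rectangle_side_px(focal_px, head_m, distance_m, patch_px, ref_head_px):
    """Side of the square region that, once resized to patch_px,
    shows a head at the reference size ref_head_px."""
    return patch_px * head_size_px(focal_px, head_m, distance_m) / ref_head_px

# Assumed values: 800 px focal length, 0.25 m head, 64 px patch, 16 px reference head.
side_near = rectangle_side_px(800.0, 0.25, 5.0, 64, 16.0)
side_far = rectangle_side_px(800.0, 0.25, 10.0, 64, 16.0)
```

Under these assumptions a person 5 m away needs a 160 px region and a person 10 m away an 80 px region, so rectangles shrink toward the far end of the scene.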

(补充说明29)(Supplementary Note 29)

根据补充说明26至28中任一项所述的人群状态识别方法，According to the crowd state recognition method described in any one of Supplementary Notes 26 to 28,

其中人群状态识别字典存储装置存储通过改变在人群状态图像中表达的人的数目并且通过利用针对人的该数目准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by varying the number of persons expressed in a crowd state image and by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for the number of persons,

该方法包括人群状态识别步骤，基于字典来识别在图像中拍摄的人群中的人的数目。The method includes a crowd state recognition step of recognizing the number of people in a crowd captured in an image based on a dictionary.

(补充说明30)(Supplementary Note 30)

根据补充说明26至29中任一项所述的人群状态识别方法，According to the crowd state recognition method described in any one of Supplementary Notes 26 to 29,

其中人群状态识别字典存储装置存储通过改变在人群状态图像中表达的人的方向并且通过利用针对人的这些方向准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by changing the directions of people expressed in crowd state images and by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for these directions of people,

该方法包括基于字典来识别在图像中拍摄的人群的方向的人群状态识别步骤。The method includes a crowd state recognition step of recognizing the direction of a crowd captured in an image based on a dictionary.

(补充说明31)(Supplementary Note 31)

根据补充说明26至30中任一项所述的人群状态识别方法，According to the crowd state recognition method described in any one of Supplementary Notes 26 to 30,

其中人群状态识别字典存储装置存储通过利用针对非显著拥挤的人群和显著拥挤的人群准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，wherein the crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels prepared for non-significantly crowded crowds and significantly crowded crowds,

该方法包括基于字典来识别在图像中拍摄的人群是否是显著拥挤的人群状态识别步骤。The method includes a crowd state recognition step of recognizing, based on the dictionary, whether a crowd captured in an image is significantly crowded.

(补充说明32)(Supplementary Note 32)

根据补充说明26至31中任一项所述的人群状态识别方法，According to the crowd state recognition method described in any one of Supplementary Notes 26 to 31,

其中人群状态识别字典存储装置存储通过利用针对其中人的方向统一的人群和其中人的方向不统一的人群准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典，wherein the crowd state recognition dictionary storage means stores a dictionary of discriminators obtained by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for a crowd in which the directions of people are uniform and a crowd in which the directions of people are not uniform,

该方法包括基于字典来识别在图像中拍摄的人群中的人的方向是否统一的人群状态识别步骤。The method includes a crowd state recognition step of recognizing, based on the dictionary, whether the directions of people in a crowd captured in an image are uniform.

(补充说明33)(Supplementary Note 33)

一种用于使计算机执行以下处理的训练数据生成程序：A training data generating program for causing a computer to execute the following processing:

背景提取处理，从多个预先准备的背景图像选择背景图像、提取该背景图像中的区域并且将对应于提取的区域的图像放大或者缩小为预定尺寸的图像；a background extraction process of selecting a background image from a plurality of pre-prepared background images, extracting a region in the background image, and enlarging or reducing the image corresponding to the extracted region to an image of a predetermined size;

人状态确定处理，根据作为关于多人的人状态的指明信息的多人状态控制指明和作为关于多人中的个别人的状态的指明信息的个别人状态控制指明来确定人群的人状态；以及a person state determination process of determining the person state of a group of people based on a multi-person state control designation as designation information on the person states of a plurality of people and an individual person state control designation as designation information on the states of individual people among the plurality of people; and

人群状态图像合成处理，生成人群状态图像、指定该人群状态图像的训练标签并且输出人群状态图像和训练标签的配对，人群状态图像是其中与在人状态确定处理中确定的人状态对应的人图像被与在背景提取处理中得到的预定尺寸的图像合成的图像。Crowd state image synthesis processing generates a crowd state image, specifies a training label for the crowd state image, and outputs a pair of the crowd state image and the training label. The crowd state image is an image in which a human image corresponding to the human state determined in the human state determination processing is synthesized with an image of a predetermined size obtained in the background extraction processing.

(补充说明34)(Supplementary Note 34)

根据补充说明33所述的训练数据生成程序，该程序用于使计算机执行：The training data generation program according to Supplementary Note 33, the program being configured to cause a computer to execute:

人状态确定处理，根据多人状态控制指明和个别人状态控制指明来临时确定人群的人状态、在临时确定的人状态满足针对预定尺寸定义的人的基准部位的尺寸和基准部位如何被表达的条件时将临时确定的人状态确定为人群的人状态并且当临时确定的人状态不满足这些条件时重复地进行对人群的人状态的临时确定。Human state determination processing, temporarily determining the human state of a group of people based on the multi-person state control indication and the individual person state control indication, determining the temporarily determined human state as the human state of the group when the temporarily determined human state meets the conditions of the size of the human reference part defined for the predetermined size and how the reference part is expressed, and repeating the temporary determination of the human state of the group when the temporarily determined human state does not meet these conditions.

(补充说明35)(Supplementary Note 35)

根据补充说明33或34所述的训练数据生成程序，该程序用于使包括人群状态控制指明存储装置和人状态控制指示存储装置的计算机执行：The training data generation program according to Supplementary Note 33 or 34 is configured to cause a computer including a crowd state control instruction storage device and a person state control instruction storage device to execute:

人群状态图像合成步骤，通过从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明和从人状态控制指示存储装置读取被定义为具有指明的训练标签的项目的个别人状态控制指明来指定训练标签，a crowd state image synthesis step of specifying a training label by reading a multi-person state control designation defined as an item having a specified training label from a crowd state control designation storage device and reading an individual person state control designation defined as an item having a specified training label from a person state control designation storage device;

其中人群状态控制指明存储装置用于存储按照项目定义的多人状态控制指明并且存储针对该项目定义的指定训练标签的存在，并且wherein the crowd state control designation storage device is used to store a multi-person state control designation defined for each item and to store, for each item, whether a designated training label is present, and

人状态控制指示存储装置用于存储按照项目定义的个别人状态控制指明并且存储针对该项目定义的指定训练标签的存在。the person state control designation storage device is used to store an individual person state control designation defined for each item and to store, for each item, whether a designated training label is present.

(补充说明36)(Supplementary Note 36)

根据补充说明35所述的训练数据生成程序，该程序用于使得包括用于将至少一个项目存储为具有指明的训练标签的人群状态控制指明存储装置的计算机执行：The training data generation program according to Supplementary Note 35, the program being configured to cause a computer including a crowd state control designation storage device for storing at least one item as having a designated training label to execute:

人群状态图像合成处理，从人群状态控制指明存储装置读取被定义为具有指明的训练标签的项目的多人状态控制指明。The crowd state image synthesis process reads a multi-person state control specification defined as an item having a specified training label from a crowd state control specification storage device.

(补充说明37)(Supplementary Note 37)

根据补充说明35或36所述的训练数据生成程序，该程序用于使得包括人群状态控制指明存储装置和人状态控制指示存储装置的计算机执行：The training data generation program according to Supplementary Note 35 or 36, the program being configured to cause a computer including a crowd state control instruction storage device and a person state control instruction storage device to execute:

人状态确定处理，根据在人群状态控制指明存储装置中存储的多人状态控制指明和在人状态控制指示存储装置中存储的个别人状态控制指明来确定人群的人状态，The human state determination processing determines the human state of the crowd based on the multi-person state control indication stored in the crowd state control indication storage device and the individual person state control indication stored in the human state control indication storage device,

其中人群状态控制指明存储装置用于按照诸如人的布置、人的方向和人的数目之类的项目来存储多人状态控制指明和指明的训练标签的存在，并且以指示特定状态的第一形式、指示可以定义任意状态的第二形式和指示可以在预定规则内定义状态的第三形式中的任一项来存储对应于每个项目的多人状态控制指明，并且wherein the crowd state control designation storage device is used to store the multi-person state control designation and the existence of the designated training label according to items such as the arrangement of people, the direction of people, and the number of people, and stores the multi-person state control designation corresponding to each item in any one of a first form indicating a specific state, a second form indicating that an arbitrary state can be defined, and a third form indicating that a state can be defined within a predetermined rule, and

人状态控制指示存储装置用于按照诸如人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状、人的发型和当与人群状态图像合成时的人尺寸之类的项目来存储个别人状态控制指明和指明的训练标签的存在，并且以第一形式、第二形式和第三形式中的任一项来存储对应于每个项目的个别人状态控制指明。The person status control indication storage device is used to store the existence of individual person status control indications and indicated training labels according to items such as the person's shooting angle, the person's lighting, the person's posture, the person's clothes, the person's body shape, the person's hairstyle and the person's size when synthesized with the crowd status image, and stores the individual person status control indication corresponding to each item in any one of the first form, the second form and the third form.

(补充说明38)(Supplementary Note 38)

根据补充说明33至37中任一项所述的训练数据生成程序，该程序用于使得计算机执行：The training data generation program according to any one of Supplementary Notes 33 to 37, the program being configured to cause a computer to execute:

人群状态图像合成处理，从一组预先准备的人图像选择与诸如人的方向、人的数目、人的拍摄角度、对人的照明、人的姿势、人的衣服、人的身体形状和人的发型之类的确定的人状态匹配的人图像、从选择的人图像中裁剪人的区域，由此生成只有人的图像并且根据被确定为人状态的人的布置和当与人群状态图像合成时的人尺寸来使只有人的图像与背景提取装置获取的预定尺寸的图像合成。crowd state image synthesis processing of selecting, from a group of pre-prepared person images, a person image that matches the determined person state, such as the direction of the person, the number of persons, the shooting angle of the person, the lighting on the person, the posture of the person, the clothes of the person, the body shape of the person, and the hairstyle of the person; cropping the person region from the selected person image, thereby generating a person-only image; and synthesizing the person-only image with the image of the predetermined size acquired by the background extraction means, according to the arrangement of persons and the person size when synthesized with the crowd state image, both of which are determined as part of the person state.

(补充说明39)(Supplementary Note 39)

根据补充说明38所述的训练数据生成程序，该程序用于使得计算机执行：The training data generation program according to Supplementary Note 38, the program being configured to cause a computer to execute:

用于从与距离相机的最远布置位置对应的只有人的图像顺序地与由背景提取装置获取的预定尺寸的图像合成的人群状态图像合成处理。crowd state image synthesis processing of synthesizing the person-only images with the image of the predetermined size acquired by the background extraction means, sequentially, starting from the person-only image corresponding to the arrangement position farthest from the camera.

(补充说明40)(Supplementary Note 40)

根据补充说明33至39中任一项所述的训练数据生成程序，该程序用于使得计算机在人状态确定处理中执行：The training data generation program according to any one of Supplementary Notes 33 to 39, the program being configured to cause a computer to execute, in a human state determination process:

背景人状态确定处理，根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的背景的人群的人状态、在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸和基准部位如何被表达的第一条件时将临时确定的人状态确定为作为背景的人群的人状态并且在临时确定的人状态不满足第一条件时重复地进行对作为背景的人群的人状态的临时确定；以及background person state determination processing of temporarily determining the person state of the crowd serving as the background in the crowd state image based on the multi-person state control designation and the individual person state control designation, determining the temporarily determined person state as the person state of the crowd serving as the background when it satisfies a first condition on the size of the reference part of a person defined for the predetermined size of the crowd state image and on how the reference part is expressed, and repeating the temporary determination of the person state of the crowd serving as the background when the temporarily determined person state does not satisfy the first condition; and

前景人状态确定处理，根据多人状态控制指明和个别人状态控制指明来临时确定作为人群状态图像中的前景的人群的人状态、在临时确定的人状态满足针对人群状态图像的预定尺寸定义的人的基准部位的尺寸和基准部位如何被表达的第二条件时将临时确定的人状态确定为作为前景的人群的人状态并且在临时确定的人状态不满足第二条件时重复地进行对作为前景的人群的人状态的临时确定。The foreground person state determination processing temporarily determines the person state of the crowd as the foreground in the crowd state image based on the multi-person state control indication and the individual person state control indication, determines the temporarily determined person state as the person state of the crowd as the foreground when the temporarily determined person state meets the second condition of the size of the person's reference part defined for the predetermined size of the crowd state image and how the reference part is expressed, and repeats the temporary determination of the person state of the crowd as the foreground when the temporarily determined person state does not meet the second condition.

(补充说明41)(Supplementary Note 41)

根据补充说明40所述的训练数据生成程序，The training data generation program according to Supplementary Note 40,

(补充说明42)(Supplementary Note 42)

一种人群状态识别程序，用于使得包括矩形区域组存储装置和人群状态识别字典存储装置的计算机执行：A crowd state recognition program is configured to cause a computer including a rectangular region group storage device and a crowd state recognition dictionary storage device to execute:

人群状态识别处理，从给定图像提取在矩形区域组存储装置中存储的该组矩形区域中指示的区域并且基于字典来识别在提取出的图像中拍摄的人群的状态，a crowd state recognition process of extracting, from a given image, an area indicated in the group of rectangular areas stored in the rectangular area group storage means and recognizing the state of a crowd photographed in the extracted image based on a dictionary,

其中矩形区域组存储装置用于存储指示图像上将针对人群状态而被识别的部分的一组矩形区域，并且The rectangular region group storage device is used to store a group of rectangular regions indicating the portion of the image to be identified for the crowd state, and

人群状态识别字典存储装置用于存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像是以预定尺寸表示人群状态并且包括其基准部位被表达为与针对预定尺寸定义的人的基准部位的尺寸一样大的人的图像。the crowd state recognition dictionary storage device is used to store a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels for the crowd state images, each crowd state image being an image that expresses a crowd state in a predetermined size and includes a person whose reference part is expressed as large as the size of the reference part of a person defined for the predetermined size.

(补充说明43)(Supplementary Note 43)

根据补充说明42所述的人群状态识别程序，该程序用于使得包括人群状态识别字典存储装置的计算机执行：According to the crowd state recognition program described in Supplementary Note 42, the program is configured to cause a computer including a crowd state recognition dictionary storage device to execute:

人群状态识别处理，基于字典来识别在图像中拍摄的人群的状态，Crowd state recognition processing, based on the dictionary to identify the state of the crowd captured in the image,

其中人群状态识别字典存储装置用于存储通过利用人群状态图像和人群状态图像的训练标签的多个配对进行机器学习而获取的鉴别器的字典，人群状态图像是通过合成与控制为期望状态的人状态匹配的人图像而得到的。The crowd state recognition dictionary storage device is used to store a dictionary of discriminators obtained by machine learning using multiple pairs of crowd state images and training labels of crowd state images, and the crowd state images are obtained by synthesizing human images that match the human state controlled to the desired state.

(补充说明44)(Supplementary Note 44)

根据补充说明42或43所述的人群状态识别程序，该程序用于使得包括矩形区域组存储装置的计算机执行：The crowd state recognition program according to Supplementary Note 42 or 43 is configured to cause a computer including a rectangular region group storage device to execute:

人群状态识别处理，从给定图像提取在该组矩形区域中指示的区域，Crowd state recognition processing, extracting the area indicated in the set of rectangular areas from a given image,

其中矩形区域组存储装置用于存储基于指示用于获取图像的图像获取设备的位置、姿势、焦距和透镜畸变的相机参数的一组尺寸定义的矩形区域，以及针对预定尺寸定义的人的基准部位的尺寸。wherein the rectangular region group storage device is used to store a group of rectangular regions whose sizes are defined based on camera parameters indicating the position, posture, focal length, and lens distortion of the image acquisition device used to acquire the image, and on the size of the reference part of a person defined for the predetermined size.

(补充说明45)(Supplementary Note 45)

根据补充说明42至44中任一项所述的人群状态识别程序，该程序用于使包括人群状态识别字典存储装置的计算机执行：A crowd state recognition program according to any one of Supplementary Notes 42 to 44, the program being configured to cause a computer including a crowd state recognition dictionary storage device to execute:

基于字典来识别在图像中拍摄的人群中的人的数目的人群状态识别处理，Crowd state recognition processing of recognizing the number of people in a crowd captured in an image based on a dictionary,

其中，人群状态识别字典存储装置用于存储通过改变在人群状态图像中表达的人的数目并且通过利用针对人的该数目准备的多对人群状态图像和训练标签进行机器学习而得到的鉴别器的字典。wherein the crowd state recognition dictionary storage device is used to store a dictionary of discriminators obtained by varying the number of people expressed in a crowd state image and by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for each number of people.

(补充说明46)(Supplementary Note 46)

根据补充说明42至45中任一项所述的人群状态识别程序，该程序用于使得包括人群状态识别字典存储装置的计算机执行：The crowd state recognition program according to any one of Supplementary Notes 42 to 45, the program being configured to cause a computer including a crowd state recognition dictionary storage device to execute:

人群状态识别处理，基于字典来识别在图像中拍摄的人群的方向，Crowd state recognition processing, based on the dictionary to identify the direction of the crowd captured in the image,

其中人群状态识别字典存储装置用于存储通过改变在人群状态图像中表达的人的方向并且通过利用针对人的这些方向准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典。The crowd state recognition dictionary storage device is used to store a dictionary of discriminators obtained by changing the directions of people expressed in crowd state images and by performing machine learning using multiple pairs of crowd state images and training labels prepared for these directions of people.

(补充说明47)(Supplementary Note 47)

根据补充说明42至46中任一个所述的人群状态识别程序，该程序用于使得包括人群状态识别字典存储装置的计算机执行：The crowd state recognition program according to any one of Supplementary Notes 42 to 46, the program being configured to cause a computer including a crowd state recognition dictionary storage device to execute:

人群状态识别处理，基于字典来识别在图像中拍摄的人群是否显著拥挤，Crowd state recognition processing, based on the dictionary to identify whether the crowd captured in the image is significantly crowded,

其中人群状态识别字典存储装置存储通过利用针对非显著拥挤的人群和显著拥挤的人群准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典。The crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by performing machine learning using a plurality of pairs of crowd state images and training labels prepared for non-significantly crowded crowds and significantly crowded crowds.

(补充说明48)(Supplementary Note 48)

根据补充说明42至47中任一项所述的人群状态识别程序，该程序用于使得包括人群状态识别字典存储装置的计算机执行：The crowd state recognition program according to any one of Supplementary Notes 42 to 47, the program being configured to cause a computer including a crowd state recognition dictionary storage device to execute:

人群状态识别处理，基于字典来识别在图像中拍摄的人群中的人的方向是否统一，Crowd state recognition processing, based on the dictionary to identify whether the orientation of people in the crowd captured in the image is unified,

其中人群状态识别字典存储装置存储通过利用针对其中人的方向是统一的人群和其中人的方向不是统一的人群准备的人群状态图像和训练标签的多个配对进行机器学习而获取的鉴别器的字典。wherein the crowd state recognition dictionary storage device stores a dictionary of discriminators obtained by machine learning using a plurality of pairs of crowd state images and training labels prepared for a crowd in which the directions of people are uniform and a crowd in which the directions of people are not uniform.

本发明已经参考了示例性实施例而得到描述，但是本发明不限于上面的示例性实施例。在本领域技术人员可以理解的本发明的范围内可以不同地改变本发明的结构和细节。The present invention has been described with reference to the exemplary embodiments, but the present invention is not limited to the above exemplary embodiments. The structure and details of the present invention can be variously changed within the scope of the present invention that can be understood by those skilled in the art.

This application claims the benefit of priority based on Japanese Patent Application No. 2013-135915, filed on Jun. 28, 2013, the disclosure of which is incorporated herein in its entirety by reference.

Industrial Applicability

The present invention is suitably applicable to a training data generating device that generates training data used when learning a dictionary of a discriminator for recognizing a crowd state.

The present invention is also suitably applicable to a crowd state recognition device for recognizing the crowd state in an image. In particular, it is suited to recognizing the crowd state in images with a low frame rate. It can also be used when the frame rate is unstable and crowd state recognition processing that relies on temporal information cannot be performed. It can further be used to recognize, from a still image, a complicated crowd state that includes overlap between people. In the surveillance field, the present invention can be used for suspicious person recognition, recognition of suspicious objects left behind, tailgating recognition, abnormal state recognition, abnormal behavior recognition, and the like, in order to recognize the crowd state from an image acquired by a camera. The present invention can also be used to output the recognition result of the crowd state in an image, together with the position (2D position or 3D position) of the crowd, to other systems. Finally, it can be used to acquire the recognition result of the crowd state in an image and the position (2D position or 3D position) of the crowd, and to perform a video search using that acquisition as a trigger.

Reference Signs List

11 Background extraction device

12 Background person state determination device

13 Foreground person state determination device

14 Crowd state image synthesis device

15 Person state determination device

16 Control device

21 Background image storage device

22 Learning local image information storage device

23 Crowd state control specification storage device

24 Person state control specification storage device

25 Person image storage device

26 Person region image storage device

41 Crowd state recognition device

51 Search window storage device

52 Crowd state recognition dictionary storage device

Claims

1. A training data generation device, comprising:

A background extraction unit that selects a background image from a plurality of background images prepared in advance, extracts a region from the background image, and enlarges or reduces the image corresponding to the extracted region to a predetermined size;

A person state determination unit that determines the person state of a crowd based on a multi-person state control specification, which is specification information on the person state of a plurality of people, and an individual person state control specification, which is specification information on the state of an individual person among the plurality of people; and

A crowd state image synthesis unit that generates a crowd state image, specifies a training label for the crowd state image, and outputs a pair of the crowd state image and the training label. The crowd state image is an image in which a person image corresponding to the person state determined by the person state determination unit is synthesized with the image of the predetermined size obtained by the background extraction unit.

The person state determination unit includes:

A background person state determination unit that temporarily determines the person state of a crowd serving as the background in the crowd state image based on the multi-person state control specification and the individual person state control specification. When the temporarily determined person state satisfies a first condition regarding the size of a reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the background. When the temporarily determined person state does not satisfy the first condition, the temporary determination of the person state of the crowd serving as the background is repeated.

A foreground person state determination unit that temporarily determines the person state of a crowd serving as the foreground in the crowd state image based on the multi-person state control specification and the individual person state control specification. When the temporarily determined person state satisfies a second condition regarding the size of the reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the foreground. When the temporarily determined person state does not satisfy the second condition, the temporary determination of the person state of the crowd serving as the foreground is repeated.

The first condition is that the reference part of the person is not within the crowd state image, or the size of the reference part is much larger or much smaller than the size of the reference part of the person defined for the predetermined size.

The second condition is that the reference part of the person is within the crowd state image, and the size of the reference part is the same as the size of the reference part of the person defined for the predetermined size.

The training data generation device includes:

A crowd state control specification storage unit that stores the multi-person state control specification defined for each item, and stores, for each item, whether a training label is specified; and

A person state control specification storage unit that stores the individual person state control specification defined for each item, and stores, for each item, whether a training label is specified.

The person state determination unit determines the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.

The crowd state image synthesis unit specifies the training label by reading, from the crowd state control specification storage unit, the multi-person state control specification of each item defined as having a specified training label, and by reading, from the person state control specification storage unit, the individual person state control specification of each item defined as having a specified training label.

The crowd state control specification storage unit stores the multi-person state control specification and the presence of a specified training label for each of the items relating to the arrangement of people, the orientation of people, and the number of people, and stores the multi-person state control specification corresponding to each item in any one of the following forms: a first form that specifies a specific state, a second form that specifies that any state may be defined, and a third form that specifies that a state may be defined within a predetermined rule.

The person state control specification storage unit stores the individual person state control specification and the presence of a specified training label for each of the items relating to the shooting angle of a person, the lighting on a person, the posture of a person, the clothing of a person, the body shape of a person, the hairstyle of a person, and the size of a person when synthesized with the crowd state image, and stores the individual person state control specification corresponding to each item in any one of the first form, the second form, and the third form.

The person state determination unit determines the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.
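The three specification forms recited in claim 1 (a specific state, any state, a state within a predetermined rule) can be illustrated with a small sketch. The `ControlSpec` class, its field names, and the `resolve` function are hypothetical, and drawing uniformly at random from the allowed states is an assumption made only for this example.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ControlSpec:
    """One item of a control specification (hypothetical layout)."""
    item: str                     # e.g. "arrangement", "orientation", "number"
    form: str                     # "specific", "any", or "rule"
    candidates: Sequence[Any]     # every state the item could take
    value: Any = None             # used only by the first form
    rule: Optional[Callable[[Any], bool]] = None  # used only by the third form
    has_label: bool = False       # whether a training label is specified


def resolve(spec: ControlSpec, rng: random.Random) -> Any:
    """Turn one stored specification into a concrete state."""
    if spec.form == "specific":   # first form: the state is fixed
        return spec.value
    if spec.form == "any":        # second form: any state may be chosen
        return rng.choice(list(spec.candidates))
    if spec.form == "rule":       # third form: any state that obeys the rule
        allowed = [c for c in spec.candidates if spec.rule(c)]
        if not allowed:
            raise ValueError(f"no candidate satisfies the rule for {spec.item}")
        return rng.choice(allowed)
    raise ValueError(f"unknown form: {spec.form}")


rng = random.Random(0)
specs = [
    ControlSpec("orientation", "specific", ["front", "back", "left", "right"],
                value="front", has_label=True),
    ControlSpec("arrangement", "any", ["random", "row", "cluster"]),
    ControlSpec("number", "rule", list(range(1, 11)), rule=lambda n: n >= 5),
]
person_state = {s.item: resolve(s, rng) for s in specs}
print(person_state)
```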

2. The training data generation device according to claim 1,

The person state determination unit temporarily determines the person state of the crowd based on the multi-person state control specification and the individual person state control specification. When the temporarily determined person state satisfies the conditions for the size of the reference part of the person defined for the predetermined size and how the reference part is expressed, the temporarily determined person state is determined as the person state of the crowd. When the temporarily determined person state does not satisfy the conditions, the temporary determination of the person state of the crowd is repeated.
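The temporary determination that is repeated until a condition is satisfied behaves like rejection sampling. The sketch below illustrates it using the first (background) and second (foreground) conditions of claim 1. All numeric values are assumptions: the claims do not quantify "the same size" or "much larger or much smaller", so `TOLERANCE` and `FAR_FACTOR` are placeholders, and `propose` stands in for whatever the control specifications would actually generate.

```python
import random
from dataclasses import dataclass

IMAGE_W, IMAGE_H = 64, 64  # assumed predetermined size of the crowd state image
REF_SIZE = 16.0            # assumed reference-part (e.g. head) size for that image size
TOLERANCE = 0.25           # assumed: "same size" means within 25% of REF_SIZE
FAR_FACTOR = 2.0           # assumed: "much larger/smaller" means beyond 2x or 1/2x


@dataclass
class PersonState:
    ref_x: float     # position of the reference part in image coordinates
    ref_y: float
    ref_size: float  # size of the reference part in pixels


def inside_image(p: PersonState) -> bool:
    return 0 <= p.ref_x < IMAGE_W and 0 <= p.ref_y < IMAGE_H


def foreground_ok(p: PersonState) -> bool:
    """Second condition: reference part inside the image and of the defined size."""
    return inside_image(p) and abs(p.ref_size - REF_SIZE) <= TOLERANCE * REF_SIZE


def background_ok(p: PersonState) -> bool:
    """First condition: reference part outside the image, or much larger/smaller."""
    much_larger = p.ref_size >= FAR_FACTOR * REF_SIZE
    much_smaller = p.ref_size <= REF_SIZE / FAR_FACTOR
    return (not inside_image(p)) or much_larger or much_smaller


def propose(rng: random.Random) -> PersonState:
    """Temporary determination (hypothetical stand-in for the control specifications)."""
    return PersonState(rng.uniform(-32, 96), rng.uniform(-32, 96), rng.uniform(4, 48))


def determine(condition, rng: random.Random, max_tries: int = 10_000) -> PersonState:
    """Repeat the temporary determination until the condition is satisfied."""
    for _ in range(max_tries):
        candidate = propose(rng)
        if condition(candidate):
            return candidate
    raise RuntimeError("no person state satisfied the condition")


rng = random.Random(1)
print(determine(foreground_ok, rng))
print(determine(background_ok, rng))
```

With these thresholds the two conditions are mutually exclusive, so a state accepted for the foreground is never accepted for the background.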

3. The training data generation device according to claim 1,

The crowd state control specification storage unit stores at least one item for which a training label is specified, and

The crowd state image synthesis unit reads, from the crowd state control specification storage unit, the multi-person state control specification of the item defined as having a specified training label.
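Specifying the training label by reading only the flagged items can be sketched as below. The `Item` tuple and the dictionary-based "storage units" are hypothetical, and the sketch assumes that the label for a flagged item is simply the stored specification of that item.

```python
from typing import Dict, NamedTuple


class Item(NamedTuple):
    specification: str  # stored control specification for the item
    has_label: bool     # whether a training label is specified for the item


def read_training_labels(crowd_store: Dict[str, Item],
                         person_store: Dict[str, Item]) -> Dict[str, str]:
    """Collect the specifications of every item flagged as carrying a label."""
    labels = {}
    for store in (crowd_store, person_store):
        for name, item in store.items():
            if item.has_label:
                labels[name] = item.specification
    return labels


crowd_store = {
    "arrangement": Item("random", False),
    "orientation": Item("facing left", True),
    "number": Item("many", True),
}
person_store = {
    "posture": Item("walking", False),
    "clothing": Item("any", False),
}
print(read_training_labels(crowd_store, person_store))
# {'orientation': 'facing left', 'number': 'many'}
```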

4. The training data generation device according to claim 1,

The crowd state image synthesis unit selects, from a group of person images prepared in advance, a person image that matches the determined person state, namely the orientation of the person, the number of people, the shooting angle of the person, the lighting on the person, the posture of the person, the clothing of the person, the body shape of the person, and the hairstyle of the person. It then crops the person region from the selected person image to generate an image containing only the person. In accordance with the arrangement of people and the size of the person when synthesized with the crowd state image, both determined as part of the person state, it synthesizes the image containing only the person with the image of the predetermined size obtained by the background extraction unit.

5. The training data generation device according to claim 4,

The crowd state image synthesis unit synthesizes the images containing only a person with the image of the predetermined size obtained by the background extraction unit sequentially, starting from the image of the person whose arrangement position is farthest from the camera.
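Synthesizing from the farthest person to the nearest means that nearer people overwrite farther ones where they overlap. A minimal sketch follows, assuming images are plain nested lists of integers and that pixels outside the person region are marked with `None`; `paste` and `synthesize` are hypothetical names.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Patch = List[List[Optional[int]]]  # None marks pixels outside the person region
Placement = Tuple[float, Patch, int, int]  # (distance from camera, patch, top, left)


def paste(canvas: Image, patch: Patch, top: int, left: int) -> None:
    """Copy the person pixels of a patch onto the canvas, clipping at the borders."""
    for dy, row in enumerate(patch):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            if pixel is not None and 0 <= y < len(canvas) and 0 <= x < len(canvas[0]):
                canvas[y][x] = pixel


def synthesize(background: Image, placements: List[Placement]) -> Image:
    """Paste person-only images in order from farthest to nearest."""
    canvas = [row[:] for row in background]  # leave the background untouched
    ordered = sorted(placements, key=lambda p: p[0], reverse=True)
    for _, patch, top, left in ordered:
        paste(canvas, patch, top, left)
    return canvas


background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
far_person = [[1, 1], [1, 1]]
near_person = [[2, 2], [2, 2]]
# The near person is listed first on purpose; sorting fixes the drawing order.
result = synthesize(background, [(1.0, near_person, 1, 1), (5.0, far_person, 0, 0)])
print(result)  # [[1, 1, 0], [1, 2, 2], [0, 2, 2]]
```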

6. A method for generating training data, comprising:

A background extraction step of selecting a background image from a plurality of background images prepared in advance, extracting a region from the background image, and enlarging or reducing the image corresponding to the extracted region to a predetermined size;

A person state determination step of determining the person state of a crowd based on a multi-person state control specification, which is specification information on the person state of a plurality of people, and an individual person state control specification, which is specification information on the state of an individual person among the plurality of people; and

A crowd state image synthesis step of generating a crowd state image, specifying a training label for the crowd state image, and outputting a pair of the crowd state image and the training label. The crowd state image is an image in which a person image corresponding to the person state determined in the person state determination step is synthesized with the image of the predetermined size obtained in the background extraction step.

The person state determination step includes:

A background person state determination step of temporarily determining the person state of a crowd serving as the background in the crowd state image based on the multi-person state control specification and the individual person state control specification. If the temporarily determined person state satisfies a first condition regarding the size of a reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the background. If the temporarily determined person state does not satisfy the first condition, the temporary determination of the person state of the crowd serving as the background is repeated.

A foreground person state determination step of temporarily determining the person state of a crowd serving as the foreground in the crowd state image based on the multi-person state control specification and the individual person state control specification. If the temporarily determined person state satisfies a second condition regarding the size of the reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the foreground. If the temporarily determined person state does not satisfy the second condition, the temporary determination of the person state of the crowd serving as the foreground is repeated.

wherein

A crowd state control specification storage unit stores the multi-person state control specification defined for each item, and stores, for each item, whether a training label is specified.

A person state control specification storage unit stores the individual person state control specification defined for each item, and stores, for each item, whether a training label is specified.

The person state determination step includes:

Determining the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.

The crowd state image synthesis step includes:

Specifying the training label by reading, from the crowd state control specification storage unit, the multi-person state control specification of each item defined as having a specified training label, and by reading, from the person state control specification storage unit, the individual person state control specification of each item defined as having a specified training label.

The person state determination step includes:

Determining the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.
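The background extraction step of the method (select a background image, extract a region, scale it to the predetermined size) can be sketched as follows. Choosing the image and the region at random and using nearest-neighbour scaling are assumptions made for this illustration, and `extract_background` is a hypothetical name; images are plain nested lists so the example needs no image library.

```python
import random
from typing import List

Image = List[List[int]]


def extract_background(backgrounds: List[Image], out_h: int, out_w: int,
                       rng: random.Random) -> Image:
    """Pick a background, crop a region, and scale it to out_h x out_w."""
    image = rng.choice(backgrounds)
    height, width = len(image), len(image[0])

    # Choose the size and position of the extracted region (assumed random).
    crop_h = rng.randint(1, height)
    crop_w = rng.randint(1, width)
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)

    # Nearest-neighbour scaling: works for both enlargement and reduction.
    return [
        [image[top + (y * crop_h) // out_h][left + (x * crop_w) // out_w]
         for x in range(out_w)]
        for y in range(out_h)
    ]


# One toy 4x5 background whose pixel values encode their own coordinates.
backgrounds = [[[r * 10 + c for c in range(5)] for r in range(4)]]
patch = extract_background(backgrounds, 8, 8, random.Random(0))
print(len(patch), len(patch[0]))  # 8 8
```

The resulting fixed-size image is what the later synthesis step would paste person images onto.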

7. A computer-readable recording medium on which a training data generation program is recorded, the training data generation program causing a computer to perform the following processes:

Background extraction processing of selecting a background image from a plurality of background images prepared in advance, extracting a region from the background image, and enlarging or reducing the image corresponding to the extracted region to a predetermined size;

Person state determination processing of determining the person state of a crowd based on a multi-person state control specification, which is specification information on the person state of a plurality of people, and an individual person state control specification, which is specification information on the state of an individual person among the plurality of people; and

Crowd state image synthesis processing of generating a crowd state image, specifying a training label for the crowd state image, and outputting a pair of the crowd state image and the training label. The crowd state image is an image in which a person image corresponding to the person state determined in the person state determination processing is synthesized with the image of the predetermined size obtained in the background extraction processing.

The training data generation program causes the computer to execute, in the person state determination processing:

Background person state determination processing of temporarily determining the person state of a crowd serving as the background in the crowd state image based on the multi-person state control specification and the individual person state control specification. If the temporarily determined person state satisfies a first condition regarding the size of a reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the background. If the temporarily determined person state does not satisfy the first condition, the temporary determination of the person state of the crowd serving as the background is repeated.

Foreground person state determination processing of temporarily determining the person state of a crowd serving as the foreground in the crowd state image based on the multi-person state control specification and the individual person state control specification. If the temporarily determined person state satisfies a second condition regarding the size of the reference part of a person defined for the predetermined size of the crowd state image and how the reference part is represented, the temporarily determined person state is determined as the person state of the crowd serving as the foreground. If the temporarily determined person state does not satisfy the second condition, the temporary determination of the person state of the crowd serving as the foreground is repeated.

The computer includes a crowd state control specification storage unit and a person state control specification storage unit.

The training data generation program causes the computer to execute:

In the person state determination processing, determining the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.

In the crowd state image synthesis processing, specifying the training label by reading, from the crowd state control specification storage unit, the multi-person state control specification of each item defined as having a specified training label, and by reading, from the person state control specification storage unit, the individual person state control specification of each item defined as having a specified training label.

The training data generation program causes the computer, in the person state determination processing, to determine the person state of the crowd based on the multi-person state control specification stored in the crowd state control specification storage unit and the individual person state control specification stored in the person state control specification storage unit.