CN109815867A

CN109815867A - A crowd density estimation and people flow statistics method

Info

Publication number: CN109815867A
Application number: CN201910031587.5A
Authority: CN
Inventors: 朱杰; 沈波
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2019-01-14
Filing date: 2019-01-14
Publication date: 2019-05-28

Abstract

The present invention relates to a method for crowd density estimation and people flow statistics. The invention uses a multi-scale fusion crowd density estimation model consisting of a deep network and a shallow network. The deep network is based on VGG-16, and the shallow network mainly learns the features of objects that occupy relatively few pixels in the image. The model also extracts features from several convolutional layers of the deep network and fuses them with the outputs of the deep and shallow networks to estimate crowd density. In addition, the output of the crowd density estimation model is used as the input of the people flow statistics model, so that the two models are fused into one. This greatly speeds up training of the neural network and broadens its practical applicability. The invention improves the accuracy of both the crowd density estimation task and the people flow statistics task, and completes both tasks in a single model.

Description

A crowd density estimation and people flow statistics method

Technical field

The present invention relates to a neural-network-based crowd density estimation and people flow statistics method and belongs to the technical field of video security monitoring.

Background art

Crowd density estimation is the process of using machines and software to process information about dense crowds, extract their features and compute the total number of people in a crowd density image. People flow statistics extracts information across consecutive frames to compute how many people pass a given location over a period of time. Both are commonly used in video security monitoring. For example, crowded and complex places such as railway stations and subway stations are monitored by video, and crowd analysis enables intelligent monitoring that detects abnormal group behaviour in the scene.

Current research on crowd density estimation and people flow statistics mainly relies on an end-to-end regression model. This model is based on AlexNet, but the 4096 neurons of the last fully connected layer are replaced by a single neuron that estimates the number of people in the image. Because the images handled by these tasks contain dense crowds, the research faces many difficulties, such as target occlusion, target deformation and scale variation. For example, in a dense image a person close to the camera occupies a large share of the pixels, while a person far from the camera occupies very few, so it is hard for a neural network to capture all the density information of the image in one pass. At the same time, when the crowd density is high, occlusion between people becomes severe, so the final estimates are often unsatisfactory.

Summary of the invention

The purpose of the present invention is to improve the accuracy of the crowd density estimation and people flow statistics tasks.

To achieve the above purpose, the technical solution of the present invention provides a crowd density estimation and people flow statistics method, comprising the following steps:

Step 1: preprocess the crowd density image;

Step 2: mark each person's head in the preprocessed crowd density image with a point to generate a binary image of the same size, and apply a Gaussian kernel normalization algorithm to the generated binary image to generate a heat map; each heat map and its corresponding crowd density image form one pair of the training set;

Step 3: train the density estimation model on the training set obtained in Step 2. The density estimation model consists of a deep network and a shallow network, wherein:

the deep network is VGG-16 with its fully connected layers removed. VGG-16 consists of five groups of convolutional layers: the first and second groups each contain two convolutional layers, and the third, fourth and fifth groups each contain three convolutional layers. Only the first, second, third and fourth groups are each followed by a max pooling layer; the max pooling layers after the first, second and third groups have a stride of 2, and the max pooling layer after the fourth group has a stride of 1;

the shallow network has three convolutional layers and three pooling layers;

the output of the deep network and the output of the shallow network are concatenated directly and followed by two up-convolution (upsampling) layers, so that the feature map finally output by the density estimation model has the same size as the original crowd density image;

Step 4: using the trained density estimation model, take its output as the input of a two-layer LSTM, thereby connecting the crowd density estimation model to the people flow statistics model so that the crowd density estimation task and the people flow statistics task are completed simultaneously.

Preferably, in Step 2, generating the heat map comprises the following steps:

first normalize the two-dimensional Gaussian kernel so that its weights sum to one, then apply the two-dimensional Gaussian kernel normalization algorithm to the binary image to generate the heat map. Each point with pixel value one in the binary image is thereby spread out, so that the pixel values around that point sum to one.

Preferably, in Step 2, the Gaussian kernel normalization H(x) is calculated as:

$$H(x) = \sum_{a \in A} \mathcal{N}(x - a;\, \sigma)$$

where A is the set of all points at which people's heads are marked in the crowd density image, a is a point in A, x is a pixel position in the heat map, and σ is the variance of the Gaussian kernel N.

Owing to the above technical solution, the present invention has the following advantages and positive effects over the prior art. It uses a multi-scale fusion crowd density estimation model consisting of a deep network and a shallow network. The deep network is designed on the basis of VGG-16: it only removes the fully connected layers and the fifth pooling layer of VGG-16 and changes the stride of the fourth pooling layer to 1. The shallow network consists of only three convolutional layers and three pooling layers and is mainly used to learn the features of objects that occupy a small share of pixels in the image. The multi-scale fusion model also extracts features from different convolutional layers of the deep network and fuses them with the outputs of the deep and shallow networks to estimate crowd density. At the same time, the output of the crowd density estimation model is used as the input of the people flow statistics model, fusing the two models together; this greatly speeds up training of the neural network and broadens its use in practice. The invention improves the accuracy of both the crowd density estimation task and the people flow statistics task, and completes both tasks in a single model.

Brief description of the drawings

Fig. 1 is a flow chart of the crowd density estimation and people flow statistics method provided by the present invention.

Specific embodiment

The present invention is further explained below with reference to specific embodiments. It should be understood that these embodiments are merely illustrative and do not limit the scope of the invention. It should also be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to it, and such equivalent forms likewise fall within the scope defined by the appended claims of this application.

An embodiment of the present invention relates to a neural-network-based crowd density estimation and people flow statistics method which, as shown in Fig. 1, comprises the following steps: preprocess the crowd density images; mark each person's head in the preprocessed image with a point to generate a binary image of the same size, and apply the Gaussian kernel normalization algorithm to the binary image to generate a heat map, each original image and its heat map forming one training pair; train the crowd density estimation network on this training set, where the network consists of a deep network designed on the basis of VGG-16 and a shallow network with three convolutional layers and three pooling layers; finally, use the output of the trained crowd density estimation network as the input of a two-layer LSTM, connecting the crowd density estimation network to the people flow statistics network so that the combined network completes the crowd density estimation task and the people flow statistics task simultaneously. The details are as follows:

Step 1: crop image blocks (patches) from the current frame of the collected data and preprocess each patch: first resize it to 225 × 225 pixels, then compute the mean value over all pixels, and finally subtract this mean from each pixel value to obtain a normalized patch.
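A minimal NumPy sketch of the Step 1 preprocessing, assuming nearest-neighbour resizing and a single scalar mean taken over all pixels of the patch; the text specifies neither the interpolation method nor whether the mean is computed per patch or per dataset. The function name `preprocess_patch` is hypothetical.

```python
import numpy as np

def preprocess_patch(patch: np.ndarray, size: int = 225) -> np.ndarray:
    """Resize a patch to size x size and subtract its mean pixel value.

    Assumptions: nearest-neighbour resizing and one scalar mean over all
    pixels (and channels) of the resized patch.
    """
    h, w = patch.shape[:2]
    # Nearest-neighbour source row/column for every output pixel.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = patch[rows[:, None], cols[None, :]].astype(np.float64)
    # Zero-centre the patch by subtracting the mean of all pixel values.
    return resized - resized.mean()

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
out = preprocess_patch(patch)
print(out.shape)  # (225, 225, 3)
```

Subtracting the mean centres the inputs around zero, which is a common way to stabilise training of a pre-trained network such as VGG-16.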

Step 2: mark each person's head in the preprocessed image with a point to generate a binary image of the same size, and apply the Gaussian kernel normalization algorithm to the binary image to generate a heat map; each original image and its heat map form one training pair. The specific steps are as follows:

Step 2.1: first normalize the two-dimensional Gaussian kernel so that its weights sum to one, then apply the two-dimensional Gaussian kernel normalization algorithm to the binary image to generate the heat map. This spreads out each point with pixel value one in the binary image so that the pixel values around the point sum to one. The Gaussian kernel normalization is computed as follows:

$$H(x) = \sum_{a \in A} \mathcal{N}(x - a;\, \sigma) \qquad (1)$$

In formula (1), A is the set of all points at which people's heads are marked in the original image, a is a point in A, x is a pixel position in the heat map, and σ is the variance of the Gaussian kernel N.

Step 2.2: the sum of all pixel values of the heat map generated in this way equals the number of people in the original image, so the total number of people in the original image can be calculated by the following formula:

$$\mathrm{count} = \sum_{x} H(x)$$

where H(x) is the pixel value of point x in the heat map. Training the crowd density estimation network on such heat maps means the network no longer needs to focus on the positions of people in the image, only on the number of people. In addition, a heat map generated in this way shows the approximate distribution of the crowd in the original image.

Step 3: train the density estimation network on the above training set. The density estimation network consists of a deep network and a shallow network, where the deep network is designed on the basis of VGG-16 and the shallow network has three convolutional layers and three pooling layers. The specific steps are as follows:

Step 3.1: the deep network, a VGG-16-like model, learns high-level semantic information from the density images. A pre-trained VGG-16 is used for crowd density estimation. Although VGG-16 was originally designed as an image classification network, its convolution kernels are very good general-purpose visual descriptors, and VGG-16 has been used in many computer vision tasks such as object segmentation, object detection and object tracking. Based on this property, the pre-trained VGG-16 classification network is fine-tuned for crowd density estimation. However, classification and crowd density estimation are different tasks: classification assigns a class to a whole image, whereas crowd density estimation makes a pixel-level prediction for the image and obtains the total number of people by summing all pixel values of the output. Put simply, the output of a classification task is a set of discrete values, while the output of crowd density estimation is a two-dimensional image. The present invention therefore removes the fully connected layers of VGG-16, so that the crowd density estimation network becomes a fully convolutional network.

VGG-16 consists of five groups of convolutional layers: the first and second groups each contain two convolutional layers, and the third, fourth and fifth groups each contain three. All convolution kernels in every group are 3 × 3, and the convolutions within a group do not change the spatial size of the output feature maps. However, each group is followed by a max pooling layer with a stride of 2. The whole deep network therefore contains five max pooling layers, so the final output is 1/32 of the input size. To make the output of the deep network the same size as that of the shallow network, the present invention removes the pooling layer after the fifth group and sets the stride of the pooling layer after the fourth group to 1. After this modification, the output of the deep network is 1/8 of the input size.
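A small pure-Python calculation of the spatial sizes described above. It assumes 3 × 3 convolutions with padding 1 (size-preserving) and 2 × 2 pooling windows. The text only gives the stride of the fourth pooling layer, so its 2 × 2 window with one pixel of total padding, chosen here to keep the size unchanged, is an assumption.

```python
def conv_out(size, k, s=1, pad_total=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + pad_total - k) // s + 1

CONVS_PER_GROUP = [2, 2, 3, 3, 3]
# (kernel, stride, total padding) of the pooling layer after each group;
# None means no pooling layer follows that group.
ORIGINAL_POOLS = [(2, 2, 0)] * 5
MODIFIED_POOLS = [(2, 2, 0), (2, 2, 0), (2, 2, 0), (2, 1, 1), None]

def deep_output_size(size, pools):
    for n_convs, pool in zip(CONVS_PER_GROUP, pools):
        for _ in range(n_convs):
            size = conv_out(size, 3, 1, 2)  # 3x3 conv, padding 1: size unchanged
        if pool is not None:
            size = conv_out(size, *pool)
    return size

print(deep_output_size(224, ORIGINAL_POOLS))  # 7  (224 / 32)
print(deep_output_size(224, MODIFIED_POOLS))  # 28 (224 / 8)
print(deep_output_size(225, MODIFIED_POOLS))  # 28
```

Because no fully connected layer remains, the same function works for any input size, which is what lets the network emit a two-dimensional density map rather than a fixed-length vector.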

Step 3.2: a density image contains many people, and those far from the camera often show only a hand or a leg in the image, occupying only a few pixels or even a single pixel. Because their features are indistinct, detecting small targets in crowd density images is very difficult. If the deep network were also used to detect these small targets, the final accuracy would be very low. The cause is the receptive field: in the deep network described above, the receptive field of one output pixel of the crowd density estimation network covers tens of pixels of the input image, while a small target occupies only a few pixels, so its features are lost after so many convolution layers, which lowers the accuracy of crowd density estimation.
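The receptive-field argument can be checked with the standard recursion r ← r + (k − 1)·j, j ← j·s over (kernel, stride) pairs. The sketch assumes the modified VGG-16 layout of Step 3.1 with a 2 × 2, stride-1 fourth pooling layer, and, for the shallow network described below, an alternating conv → pool order; both layouts are assumptions. It computes theoretical receptive fields only; the effective receptive field of a trained network is typically smaller.

```python
def receptive_field(layers):
    """Theoretical receptive field (in input pixels) of one output pixel.

    layers: sequence of (kernel, stride) pairs, applied in order.
    j is the cumulative stride (distance in input pixels between
    neighbouring positions of the current layer).
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r

conv3 = (3, 1)
deep = ([conv3] * 2 + [(2, 2)] + [conv3] * 2 + [(2, 2)] +
        [conv3] * 3 + [(2, 2)] + [conv3] * 3 + [(2, 1)] + [conv3] * 3)
shallow = [(5, 1), (2, 2)] * 3  # assumed conv -> average-pool ordering

print(receptive_field(deep))     # 148
print(receptive_field(shallow))  # 36
```

Under these assumptions one deep-network output pixel aggregates a window far larger than a distant person's few pixels, while the shallow network looks at a much smaller neighbourhood.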

Therefore, the present invention adds a shallow network alongside the deep network, mainly to learn the features of small targets in the image. Since it does not need to learn high-level semantic information, the shallow network has only three convolutional layers and three average pooling layers. Each convolutional layer has 24 convolution kernels of size 5 × 5. Because max pooling may lose information needed for counting the total number of people, average pooling is used instead of max pooling. Each average pooling layer has a stride of 2, so the output of the shallow network is 1/8 of the input size.
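A NumPy illustration of why average pooling suits counting. With non-overlapping 2 × 2 windows, average pooling divides the map's total by exactly 4 (a constant factor the network can absorb), while max pooling changes the total in a way that depends on how each head's mass is spread. The toy map and the function `pool2x2` are hypothetical.

```python
import numpy as np

def pool2x2(x, mode):
    """Non-overlapping 2x2 pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3)) if mode == "avg" else blocks.max(axis=(1, 3))

D = np.zeros((4, 4))
D[0, 0], D[0, 1], D[1, 0] = 0.5, 0.25, 0.25  # one head spread over 3 pixels
D[2:, 2:] = 0.25                             # another head spread over 4 pixels

print(D.sum())                       # 2.0 heads
print(pool2x2(D, "avg").sum() * 4)   # 2.0: total preserved up to the factor 4
print(pool2x2(D, "max").sum() * 4)   # 3.0: total distorted
```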

Step 3.3: since the outputs of the deep network and the shallow network are both 1/8 of the input size, they can be concatenated directly, i.e. stacked along the channel dimension without changing the spatial size. In addition, to better accomplish crowd density estimation, the present invention extracts the features of the Conv2-2, Conv3-3 and Conv4-3 layers of the deep network. One convolutional layer is appended after the extracted Conv2-2 features, with 128 kernels of size 9 × 9 and a stride of 4; one after the Conv3-3 features, with 128 kernels of size 9 × 9 and a stride of 2; and one after the Conv4-3 features, with 128 kernels of size 4 × 4 and a stride of 1. These convolutional layers give the feature maps taken from Conv2-2, Conv3-3 and Conv4-3 the same size, so that they can be concatenated directly with the outputs of the shallow and deep networks. By extracting features from different layers, the crowd density estimation network can learn more advanced density information from the images, which improves the accuracy of the crowd density estimation model.
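A pure-Python size check, assuming a 225 × 225 input, 2 × 2 stride-2 max pooling in VGG-16, and "same"-style padding for the three added convolutions (8 pixels in total for the 9 × 9 kernels, 3 pixels split 1/2 for the 4 × 4 kernel). The text gives kernel sizes and strides but not paddings, and does not state how the shallow network's 5 × 5 convolutions are padded (2 pixels per side is assumed). Under these assumptions all three branches and the shallow network end at 28 × 28, matching the 1/8-size output of the deep network.

```python
def conv_out(size, k, s=1, pad_total=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + pad_total - k) // s + 1

INPUT = 225
# Spatial size of the tapped VGG-16 layers (3x3 convs keep the size,
# each 2x2 stride-2 max pooling roughly halves it).
conv2_2 = conv_out(INPUT, 2, 2)     # after pool1: 112
conv3_3 = conv_out(conv2_2, 2, 2)   # after pool2: 56
conv4_3 = conv_out(conv3_3, 2, 2)   # after pool3: 28

# Added 128-kernel convolutions; the paddings are assumptions.
branches = {
    "Conv2-2 + 9x9/4": conv_out(conv2_2, 9, 4, 8),
    "Conv3-3 + 9x9/2": conv_out(conv3_3, 9, 2, 8),
    "Conv4-3 + 4x4/1": conv_out(conv4_3, 4, 1, 3),
}

# Shallow network: 5x5 conv (2 pixels padding per side), then 2x2 stride-2
# average pooling, repeated three times.
shallow = INPUT
for _ in range(3):
    shallow = conv_out(conv_out(shallow, 5, 1, 4), 2, 2)

print(branches)  # every branch is 28
print(shallow)   # 28
```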

After the above operations, the fused output has 920 channels and a spatial size of 1/8 of the input. To make the output the same size as the input while reducing the number of network parameters, the present invention adds two upsampling layers after fusing the multi-scale features. The first upsampling layer has 512 convolution kernels of size 5 × 5 with a stride of 4; the second also has 512 convolution kernels, of size 3 × 3. After these two layers, a 512-channel feature map of the same size as the original image is obtained. This not only makes the output the same size as the input, so that crowd density estimation can be completed, but also reduces the parameters of the crowd density estimation network and speeds up training. Finally, a 1 × 1 convolution is added at the end of the crowd density estimation model, mainly to change the number of channels of the feature map so that it can be compared with the heat map to compute the loss function. The total number of people in the input image is then obtained simply by summing all pixel values of the output feature map.
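The channel count and the final counting step can be sketched in NumPy. The 920 channels are consistent with 512 channels from the last VGG-16 group, 24 from the shallow network and 3 × 128 from the added branches. The 1 × 1 convolution is written as a per-pixel dot product over channels; the pixel-wise mean squared error is an assumed loss, since the text does not name one. The names `count_from_features` and `mse_loss` are hypothetical.

```python
import numpy as np

DEEP, SHALLOW, BRANCH = 512, 24, 128          # channels from the text
fused_channels = DEEP + SHALLOW + 3 * BRANCH  # concatenation along channels
print(fused_channels)  # 920

def count_from_features(features, weights, bias=0.0):
    """Single-output 1x1 convolution followed by a sum over all pixels.

    features: (C, H, W) map, e.g. the 512-channel output of the upsampling layers.
    weights: (C,) kernel of the 1x1 convolution.
    Returns the predicted density map (H, W) and the predicted head count.
    """
    density = np.einsum("chw,c->hw", features, weights) + bias
    return density, float(density.sum())

def mse_loss(pred, target):
    """Pixel-wise mean squared error (an assumed choice of loss)."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
feats = rng.random((512, 16, 16))  # small stand-in for the full-size feature map
w = rng.normal(scale=0.01, size=512)
pred, count = count_from_features(feats, w)
print(pred.shape)  # (16, 16)
```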

Step 4: use the output of the trained crowd density estimation network as the input of a two-layer LSTM, thereby connecting the crowd density estimation network to the people flow statistics network; the combined network completes the crowd density estimation task and the people flow statistics task simultaneously. The specific steps are as follows:

Step 4.1: in general, the deeper a neural network, the better its performance, and this also holds for recurrent neural networks. A two-layer LSTM is therefore used in the fused crowd density estimation and people flow statistics model. By stacking two LSTM layers, the people flow statistics model can learn information from the time-series images more effectively. A standard recurrent neural network has only one layer; here a second LSTM layer is added, and the output of the first layer is the input of the second.
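A minimal NumPy forward pass of a generic two-layer (stacked) LSTM, showing that the hidden-state sequence of the first layer is the input sequence of the second. Weights are random and the sizes (6 time steps, 32 input features, hidden sizes 16 and 8) are illustrative assumptions; this is not the trained model of the invention, and how each frame's heat map is turned into an input vector is not specified in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(xs, W, U, b):
    """Run one LSTM layer over a sequence and return all hidden states.

    xs: (T, D) inputs; W: (4n, D); U: (4n, n); b: (4n,), with the gate blocks
    stacked in the order input, forget, candidate, output.
    """
    n = U.shape[1]
    h, c = np.zeros(n), np.zeros(n)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        outputs.append(h)
    return np.stack(outputs)

def init_layer(rng, d_in, n):
    return (rng.normal(scale=0.1, size=(4 * n, d_in)),
            rng.normal(scale=0.1, size=(4 * n, n)),
            np.zeros(4 * n))

rng = np.random.default_rng(0)
xs = rng.random((6, 32))                       # 6 time steps, 32 features each
h1 = lstm_layer(xs, *init_layer(rng, 32, 16))  # first LSTM layer
h2 = lstm_layer(h1, *init_layer(rng, 16, 8))   # second layer reads the first layer's outputs
print(h1.shape, h2.shape)  # (6, 16) (6, 8)
```

In a people flow setting, a linear layer on the last hidden state of the second layer would produce the count for the period.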

Step 4.2: the input is a sequence of images over time. At each moment some people enter the monitored area while others leave it, so the total number of people in the image may stay the same over time even though the number of people passing through the area keeps changing. The image at each time step is passed through the multi-scale fusion crowd density estimation model described above to obtain a heat map, and summing all pixel values of the heat map gives the total number of people in that image. The heat map output at each time step is used as the input of the two-layer LSTM. Because the LSTM can aggregate information across consecutive time-series images, the last LSTM step can output the total number of people who passed through the monitored area during the period, thereby combining the crowd density estimation task with the people flow statistics task.
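A toy calculation with hypothetical entry and exit numbers shows why per-frame head counts alone cannot give the people flow: the count in each frame stays at 10, so its change over the period is 0, yet 10 people entered the area. This is the information the two-layer LSTM is meant to recover from the sequence of heat maps.

```python
def frame_counts(initial, entered, left):
    """Per-frame head counts given how many people enter and leave between frames."""
    counts = [initial]
    for e, l in zip(entered, left):
        counts.append(counts[-1] + e - l)
    return counts

entered = [3, 2, 4, 1]  # hypothetical arrivals between consecutive frames
left = [3, 2, 4, 1]     # hypothetical departures
counts = frame_counts(10, entered, left)
print(counts)                  # [10, 10, 10, 10, 10]
print(counts[-1] - counts[0])  # 0: change in the per-frame count
print(sum(entered))            # 10: people who entered during the period
```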

It can be seen that the present invention has a stronger ability to learn information from sample images than an end-to-end regression model. Instead of feeding the raw time-series images directly into the LSTM to perform people flow statistics, the people flow statistics model proposed by the present invention takes the output of the crowd density estimation model as its input. The present invention thus improves the accuracy of the people flow statistics task and also combines the crowd density estimation task with the people flow statistics task.

Claims

1. A crowd density estimation and people flow statistics method, characterized by comprising the following steps:

Step 1. Preprocess the crowd density image;

Step 2. Mark each person's head in the preprocessed crowd density image with a point, generate a binary image of the same size, and apply the Gaussian kernel normalization algorithm to the generated binary image to generate a heat map; each heat map and the crowd density image corresponding to it form one pair of the training set;

Step 3. Put the training set obtained in step 2 into the density estimation model for training. The density estimation model consists of a deep network and a shallow network, where:

The deep network is VGG-16 with the fully connected layers removed. VGG-16 consists of five groups of convolutional layers: the first and second groups each have two convolutional layers, and the third, fourth and fifth groups each consist of three convolutional layers. Only the first, second, third and fourth groups are each followed by a max pooling layer; the max pooling layers after the first, second and third groups have a stride of 2, and the max pooling layer after the fourth group has a stride of 1;

The shallow network has three convolutional layers and three pooling layers;

The output of the deep network and the output of the shallow network are directly concatenated and then followed by two up-convolution layers, so that the feature map output by the density estimation model has the same size as the original crowd density image;

Step 4. Using the trained density estimation model, take the output of the density estimation model as the input of a two-layer LSTM to connect the crowd density estimation model and the people flow statistics model, so that the crowd density estimation task and the people flow statistics task are completed simultaneously.

2. The crowd density estimation and people flow statistics method according to claim 1, characterized in that, in Step 2, generating the heat map comprises the following steps:

First, normalize the two-dimensional Gaussian kernel so that the sum of its parameters equals one, and then apply the two-dimensional Gaussian kernel normalization algorithm to the binary image to generate the heat map, so that each point with a pixel value of one in the binary image is spread out and all pixel values around the point add up to one.

3. The crowd density estimation and people flow statistics method according to claim 1, characterized in that, in Step 2, the Gaussian kernel normalization H(x) is calculated as:

$$H(x) = \sum_{a \in A} \mathcal{N}(x - a;\, \sigma)$$

where A represents all the points marking people's heads in the crowd density image, a is a point in A, x represents a pixel position in the heat map, and σ is the variance of the Gaussian kernel N.