Summary of the invention
The purpose of the present invention is to improve the accuracy of crowd density estimation and people-flow counting.
To achieve the above object, the technical solution of the present invention provides a crowd density estimation and people-flow counting method, which comprises the following steps:
Step 1: pre-process the crowd images;
Step 2: mark the head of each person in the pre-processed crowd image with a point to generate a binary image of the same size, and apply a Gaussian kernel normalization algorithm to the binary image to generate a heat map; each heat map together with its corresponding crowd image forms one sample of the training set;
Step 3: train the density estimation model on the training set obtained in Step 2. The density estimation model consists of a deep network and a shallow network, wherein:
the deep network is VGG-16 with its fully connected layers removed. VGG-16 consists of five groups of convolutional layers: the first and second groups each contain two convolutional layers, and the third, fourth and fifth groups each contain three convolutional layers. A max-pooling layer follows only the first, second, third and fourth groups; the max-pooling layers after the first three groups have a stride of 2, and the max-pooling layer after the fourth group has a stride of 1;
the shallow network has three convolutional layers and three pooling layers;
the outputs of the deep network and the shallow network are concatenated directly, and two up-convolution (up-sampling) layers are added after the concatenation so that the feature map finally output by the density estimation model has the same size as the original crowd image;
Step 4: using the trained density estimation model, feed its output as the input of a two-layer LSTM, thereby connecting the crowd density estimation model with the people-flow counting model so that the crowd density estimation task and the people-flow counting task are completed at the same time.
Preferably, in Step 2, generating the heat map comprises the following steps: first, the two-dimensional Gaussian kernel is normalized so that its weights sum to one; then the normalized two-dimensional Gaussian kernel is applied to the binary image to generate the heat map, so that each point with pixel value one in the binary image is spread over its neighborhood and the pixel values around that point sum to one.
Preferably, in Step 2, the Gaussian kernel normalization H(x) is calculated as:
$$H(x) = \sum_{a \in A} \mathcal{N}(x; a, \sigma)$$
where A is the set of all points marked on the heads of persons in the crowd image, a is a point in A, x is a pixel location in the heat map, and σ is the variance of the Gaussian kernel N.
Owing to the adoption of the above technical solution, the present invention has the following advantages and positive effects compared with the prior art: the present invention uses a multi-scale fusion crowd density estimation model composed of a deep network and a shallow network. The deep network is designed based on VGG-16: it only removes the fully connected layers and the fifth pooling layer of VGG-16 and changes the stride of the fourth pooling layer to 1. The shallow network consists of only three convolutional layers and three pooling layers and is mainly used to learn the features of targets that occupy a small proportion of the pixels in the picture. In addition, the multi-scale fusion crowd density estimation model extracts features from different convolutional layers of the deep network and fuses them with the outputs of the deep and shallow networks to perform crowd density estimation. At the same time, the output of the crowd density estimation model is used as the input of the people-flow counting model, so that the two models are fused together; this not only greatly speeds up the training of the neural network but also broadens its application in real life. The present invention not only improves the accuracy of the crowd density estimation task and the people-flow counting task, but also completes both tasks in a single model.
Specific embodiment
The present invention is further explained below with reference to specific examples. It should be understood that these embodiments are merely intended to illustrate the present invention rather than to limit its scope. In addition, it should be understood that, after reading the teachings of the present invention, those skilled in the art may make various changes or modifications to the present invention, and such equivalent forms likewise fall within the scope defined by the appended claims of this application.
An embodiment of the present invention relates to a neural-network-based crowd density estimation and people-flow counting method which, as shown in Fig. 1, comprises the following steps: pre-processing the crowd images; marking the head of each person in the pre-processed image with a point to generate a binary image of the same size, and applying a Gaussian kernel normalization algorithm to the binary image to generate a heat map, where one original image and one heat map form one sample of the training set; training the crowd density estimation network on this training set, where the crowd density estimation network consists of a deep network designed based on VGG-16 and a shallow network with three convolutional layers and three pooling layers; and, using the trained crowd density estimation network, feeding its output as the input of a two-layer LSTM so as to connect the crowd density estimation network with the people-flow counting network, so that the combined network can complete the crowd density estimation task and the people-flow counting task at the same time. The details are as follows:
Step 1: cut image blocks out of the current frame of the collected data and pre-process each image block: first resize the image block to 225 × 225 pixels, then compute the mean value of all its pixels, and finally subtract this mean from the value of each pixel to obtain the normalized image block.
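As an illustration of Step 1, the following Python sketch resizes an image block to 225 × 225 and subtracts the mean over all of its pixel values. It is a minimal sketch under stated assumptions: the text does not name an interpolation method, so nearest-neighbour resizing is used, and a single scalar mean over all pixels and channels is subtracted; the function names are hypothetical.

```python
import numpy as np

def resize_nearest(img: np.ndarray, size: int = 225) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array to (size, size).

    The interpolation method is an assumption; the text only fixes the 225 x 225 target.
    """
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]

def preprocess_block(block: np.ndarray, size: int = 225) -> np.ndarray:
    """Resize an image block, then subtract the mean over all of its pixel values."""
    resized = resize_nearest(block, size).astype(np.float64)
    return resized - resized.mean()
```

For example, a 480 × 640 colour block becomes a 225 × 225 × 3 array whose values have zero mean.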
Step 2: mark the head of each person in the pre-processed image with a point to generate a binary image of the same size, and apply a Gaussian kernel normalization algorithm to the binary image to generate a heat map; one original image and one heat map form one sample of the training set. The specific steps are as follows:
Step 2.1: first normalize the two-dimensional Gaussian kernel so that its weights sum to one, then apply the normalized two-dimensional Gaussian kernel to the binary image to generate the heat map. This spreads each point with pixel value one in the binary image over its neighborhood, such that the pixel values around that point sum to one. The Gaussian kernel normalization is calculated as follows:
$$H(x) = \sum_{a \in A} \mathcal{N}(x; a, \sigma) \qquad (1)$$
In formula (1), A is the set of all points marked on the heads of persons in the original image, a is a point in A, x is a pixel location in the heat map, and σ is the variance of the Gaussian kernel N.
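A minimal NumPy sketch of formula (1) follows. It places one Gaussian on every annotated head point and renormalizes each kernel over the image grid, so that each head contributes exactly one to the heat map, which is the property described in Step 2.1. The parameter `var` plays the role of σ and is treated as a variance, as in the text; its default value, the renormalization at image borders and the function name are assumptions, and head points are assumed to lie inside the image.

```python
import numpy as np

def gaussian_heatmap(points, shape, var=16.0):
    """Heat map H(x) = sum over a in A of a normalized Gaussian centred at a.

    points: iterable of (row, col) head annotations (the set A), inside the image.
    shape:  (height, width) of the binary image / heat map.
    var:    variance of the Gaussian kernel (default value is an assumption).
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * var))
        # Normalize over the grid so every head adds exactly 1, even near borders.
        heat += kernel / kernel.sum()
    return heat
```

Renormalizing over the visible grid, rather than using the analytic Gaussian constant, keeps the total exact for heads near the border at the cost of slightly taller peaks there.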
Step 2.2: the sum of all pixel values of the heat map generated by the above method equals the number of persons in the original image, so the total number of persons in the original image can be calculated by the following formula:
$$C = \sum_{x} H(x)$$
where C is the total number of persons in the original image and H(x) is the pixel value at point x of the heat map. Training the crowd density estimation network with this heat map means the network no longer needs to focus on the exact positions of persons in the image, but on the number of persons in it. In addition, a heat map generated in this way shows the approximate distribution of the crowd in the original image.
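The count C and a region count can be sketched in a few lines. The toy heat map below spreads each of two hypothetical persons uniformly over a 3 × 3 patch instead of a Gaussian, purely to keep the example self-contained; summing a sub-window illustrates how the heat map also conveys the approximate spatial distribution of the crowd.

```python
import numpy as np

def count_people(heat: np.ndarray) -> float:
    """Total count C: the sum of H(x) over all pixels x."""
    return float(heat.sum())

def count_in_region(heat: np.ndarray, top: int, left: int, bottom: int, right: int) -> float:
    """Approximate count inside the rectangle [top:bottom, left:right)."""
    return float(heat[top:bottom, left:right].sum())

# Toy heat map: two persons, each spread uniformly over a 3x3 patch (sum 1 each).
heat = np.zeros((20, 20))
heat[2:5, 2:5] = 1.0 / 9.0
heat[12:15, 14:17] = 1.0 / 9.0
```

Here `count_people(heat)` is 2 (up to rounding), and the upper-left 10 × 10 window contains one person.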
Step 3: train the density estimation network on the above training set. The density estimation network consists of a deep network and a shallow network, where the deep network is designed based on VGG-16 and the shallow network has three convolutional layers and three pooling layers. The specific steps are as follows:
Step 3.1: use the deep network, a model similar to VGG-16, to learn high-level semantic information from the crowd density image. The crowd density estimation task is realized using a pre-trained VGG-16. Although VGG-16 was originally designed as an image classification network, its convolution kernels are excellent general-purpose visual descriptors, and VGG-16 has been used in various computer vision tasks such as object segmentation, object detection and object tracking. Based on this property, the pre-trained VGG-16 classification network is fine-tuned to perform crowd density estimation. However, classification differs from crowd density estimation: classification assigns a class to a whole picture, whereas crowd density estimation makes pixel-level predictions on the picture, and the total number of persons is obtained by adding up all pixel values of the output picture. Simply put, the output of a classification task is a set of discrete values, whereas the output of crowd density estimation is a two-dimensional image. Therefore, the present invention removes the fully connected layers of VGG-16, so that the crowd density estimation network becomes a fully convolutional network.
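The move from a classifier to a fully convolutional network can be illustrated with a single-channel convolution. A fully connected layer has a weight matrix whose shape is tied to one input size, whereas a convolution slides the same small kernel over the input, so a network built only from convolutions accepts pictures of any size and returns a two-dimensional map. The naive NumPy sketch below is an illustration, not the patented network; it uses zero padding so that a 3 × 3 convolution keeps the spatial size, as the 3 × 3 convolutions of VGG-16 do.

```python
import numpy as np

def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Single-channel 2-D convolution (cross-correlation, as in CNN layers).

    Zero padding of k // 2 on each side keeps the output the same size as the
    input; an odd kernel size such as 3x3 is assumed.
    """
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(x, p)               # zero padding on all sides
    h, w = x.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every output pixel.
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out
```

The same kernel works on a 7 × 9 map and on a 32 × 48 map, which is exactly what a fixed-size fully connected layer cannot do.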
VGG-16 consists of five groups of convolutional layers: the first and second groups each have two convolutional layers, and the third, fourth and fifth groups each have three convolutional layers. All convolutional layers use 3 × 3 kernels, and the convolutions within each group do not change the spatial size of the output feature map. However, a max-pooling layer follows each group, and its stride is 2. The original deep network therefore contains 5 max-pooling layers in total, so its final output is 1/32 of the input picture size. To make the output of the deep network the same size as that of the shallow network, the present invention removes the pooling layer after the fifth group of convolutional layers and sets the stride of the pooling layer after the fourth group to 1. After this modification, the output of the deep network is 1/8 of the input picture size.
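The size bookkeeping behind the 1/32 and 1/8 figures can be checked with a few lines of Python. The sketch assumes 2 × 2 pooling windows, floor division for the stride-2 pools, and padding on the stride-1 pool so that it leaves the size unchanged; the text implies this padding but does not state it, and the names are hypothetical.

```python
def pooled_size(size: int, pool_strides) -> int:
    """Spatial size after a chain of 2x2 pooling layers.

    Stride-2 pools halve the size (floor division, as with unpadded 2x2/2 pooling).
    A stride-1 pool is assumed to be padded so that it keeps the size unchanged.
    """
    for s in pool_strides:
        size = size // s
    return size

ORIGINAL_VGG16 = [2, 2, 2, 2, 2]   # a stride-2 max pool after each of the 5 groups
MODIFIED_DEEP = [2, 2, 2, 1]       # 5th pool removed, 4th pool stride set to 1
```

For a 224 × 224 input the original VGG-16 yields 7 × 7 (1/32) and the modified deep network 28 × 28 (1/8). For the 225 × 225 blocks of Step 1 the modified network also yields 28 × 28, i.e. 1/8 up to rounding, since 225 is not a multiple of 8.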
Step 3.2: because density images contain many people, persons far from the camera often show only a hand or a leg in the picture and may occupy only a few pixels or even a single pixel. Because their features are very inconspicuous, detecting such small targets in crowd density images is extremely difficult. If the deep network alone were used to detect small targets, the final accuracy would be very low. The reason is the receptive field: in the above deep network, each pixel of the output picture corresponds to a receptive field of dozens of pixels in the input picture, whereas a small target occupies only a few pixels, so its features are lost after so many layers of convolution, which lowers the accuracy of crowd density estimation.
Therefore, the present invention adds a shallow network alongside the deep network, mainly to learn the features of small targets in the picture. Because it does not need to learn high-level semantic information, the shallow network has only three convolutional layers and three average-pooling layers. Each convolutional layer has 24 convolution kernels of size 5 × 5. Because max pooling may lose information needed to count the total number of persons, max pooling is replaced here by average pooling. Each average-pooling layer has a stride of 2, so the output of the shallow network is 1/8 of the input picture size.
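The argument for replacing max pooling with average pooling can be made concrete. Average pooling over 2 × 2 windows preserves the total of a density map up to the constant window area 4, whereas the result of max pooling depends on how the density is spread. In the sketch below, two toy maps that each contain exactly one person give identical average-pooled totals but different max-pooled totals. The convolutional layers of the shallow network are omitted, and `pool2x2` is a hypothetical helper.

```python
import numpy as np

def pool2x2(x: np.ndarray, mode: str) -> np.ndarray:
    """2x2 pooling with stride 2 on an (H, W) map; odd H or W is cropped to even."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    blocks = x[:h, :w].reshape(h // 2, 2, w // 2, 2)   # blocks[i, a, j, b] = x[2i+a, 2j+b]
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    if mode == "max":
        return blocks.max(axis=(1, 3))
    raise ValueError(f"unknown mode: {mode}")

# Two toy density maps, each holding exactly one person (total 1.0).
peaked = np.zeros((8, 8))
peaked[2:4, 2:4] = [[0.4, 0.3], [0.2, 0.1]]
spread = np.zeros((8, 8))
spread[4:8, 0:4] = 1.0 / 16.0
```

Multiplying a pooled total by 4 recovers 1.0 for both maps with average pooling, but gives 1.6 for `peaked` and 1.0 for `spread` with max pooling. Three stride-2 pools also take a 225 × 225 input to 28 × 28, the 1/8 size stated above.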
Step 3.3: because the outputs of the deep network and the shallow network are both 1/8 of the input picture size, they can be concatenated directly, i.e. stacked along the channel dimension without changing the spatial size. Meanwhile, to better perform crowd density estimation, the present invention extracts the features of the Conv2-2, Conv3-3 and Conv4-3 layers of the deep network. A convolutional layer with 128 convolution kernels of size 9 × 9 and a stride of 4 is added after the extracted Conv2-2 features; a convolutional layer with 128 kernels of size 9 × 9 and a stride of 2 is added after the extracted Conv3-3 features; and a convolutional layer with 128 kernels of size 4 × 4 and a stride of 1 is added after the extracted Conv4-3 features. These added convolutional layers give the feature maps extracted from Conv2-2, Conv3-3 and Conv4-3 the same size, so that they can be concatenated directly with the outputs of the shallow network and the deep network. By extracting features from different layers, the crowd density estimation network can learn higher-level density information from density pictures, which improves the accuracy of the crowd density estimation model.
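The side-branch sizes and the 920-channel figure given in the text can be cross-checked with standard convolution arithmetic. In this sketch the input size 225 and the total padding of each added convolution are assumptions, since the text gives only kernel sizes and strides; with them, all three branches produce 28 × 28 maps, matching the 1/8-resolution outputs of the deep and shallow networks. The channel sum assumes the deep network ends at VGG-16's 512-channel Conv5-3 layer.

```python
def conv_out(size: int, kernel: int, stride: int, pad_total: int) -> int:
    """Output size of a convolution: floor((n + total padding - k) / s) + 1."""
    return (size + pad_total - kernel) // stride + 1

N = 225  # input block size from Step 1
# (name, resolution of the tapped layer, kernel, stride, assumed total padding)
SIDE_BRANCHES = [
    ("Conv2-2", N // 2, 9, 4, 8),   # 1/2 resolution, 9x9 stride 4
    ("Conv3-3", N // 4, 9, 2, 8),   # 1/4 resolution, 9x9 stride 2
    ("Conv4-3", N // 8, 4, 1, 3),   # 1/8 resolution, 4x4 stride 1 (pad 1 + 2, asymmetric)
]

def branch_sizes():
    """Spatial output size of each added side-branch convolution."""
    return {name: conv_out(res, k, s, p) for name, res, k, s, p in SIDE_BRANCHES}

# Channels stacked before up-sampling: deep output (512), shallow output (24),
# and three 128-channel side branches.
FUSED_CHANNELS = 512 + 24 + 3 * 128
```

Note that a 4 × 4 kernel at stride 1 needs an odd total padding (3 here) to preserve the size, so its padding cannot be symmetric.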
After the above operations, the fused feature map has 920 channels, and its size is 1/8 of the input picture size. To make the output the same size as the input picture while keeping the number of network parameters small, the present invention adds two up-sampling layers after the multi-scale features are fused. The first up-sampling layer has 512 convolution kernels of size 5 × 5 with a stride of 4; the second up-sampling layer also has 512 convolution kernels, of size 3 × 3. After these two up-sampling layers, a 512-channel feature map with the same size as the original picture is obtained. This operation not only makes the output the same size as the input picture, so that the crowd density estimation task can be completed smoothly, but also reduces the parameters of the crowd density estimation network and speeds up its training. Finally, a 1 × 1 convolution is added at the end of the crowd density estimation model, mainly to change the number of channels of the feature map so that it can be compared with the heat map to calculate the loss function. The total number of persons in the input picture is then obtained simply by adding up all pixel values of the output feature map.
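The following sketch checks the up-sampling arithmetic and the final 1 × 1 convolution. Several points are assumptions: the text gives no stride for the second up-sampling layer, so a stride of 2 is assumed because 1/8 × 4 × 2 = 1; the paddings are chosen so that a 28 × 28 map returns to 224 × 224 (a 225 × 225 input cannot be recovered exactly by a factor of 8 without an extra pad or crop); and the loss type is not stated, so a pixel-wise mean squared error is used as a common choice. The size formula is the usual one for transposed convolutions with dilation 1.

```python
import numpy as np

def deconv_out(size, kernel, stride, padding, output_padding):
    """Output size of a transposed convolution with dilation 1:
    (n - 1) * s - 2 * p + k + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def head_1x1(features: np.ndarray, weights: np.ndarray, bias: float = 0.0) -> np.ndarray:
    """1x1 convolution mapping a (C, H, W) feature map to a single (H, W) density map."""
    return np.einsum("chw,c->hw", features, weights) + bias

def mse_loss(pred: np.ndarray, heat: np.ndarray) -> float:
    """Pixel-wise squared error against the ground-truth heat map (loss type assumed)."""
    return float(np.mean((pred - heat) ** 2))
```

With these assumptions `deconv_out(28, 5, 4, 1, 1)` gives 112 and `deconv_out(112, 3, 2, 1, 1)` gives 224; the predicted count is then `head_1x1(...).sum()`.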
Step 4: using the trained crowd density estimation network, feed its output as the input of a two-layer LSTM, thereby connecting the crowd density estimation network with the people-flow counting network; the combined network can complete the crowd density estimation task and the people-flow counting task at the same time. The specific steps are as follows:
Step 4.1: in theory, the deeper a neural network, the better its performance, and this also holds for recurrent neural networks. Therefore, a two-layer LSTM is used when fusing the crowd density estimation and people-flow counting tasks. By stacking two LSTM layers, the people-flow counting model can learn the information in time-series pictures more effectively. A standard recurrent neural network has only one layer; the present invention adds a second LSTM layer, and the output of the first layer is the input of the second layer.
Step 4.2: the input is a time series of pictures. At each moment some people enter the monitored area while others leave it, so the total number of persons in the image may stay the same at different moments even though the people passing through the area keep changing. The picture at each moment is passed through the multi-scale fusion crowd density estimation model described above to compute a heat map, and adding up all pixel values of the heat map gives the total number of persons in the original image. The heat map output at each time point is used as the input of the two-layer LSTM; because the LSTM can process information across a continuous time series of pictures, the total number of persons who passed through the monitored area during this period can be output at the last LSTM time step, thereby combining the crowd density estimation task with the people-flow counting task.
It is not difficult to see that the present invention has a stronger ability to learn the information in sample pictures than an end-to-end regression model. Whereas other approaches feed the original time-series pictures directly into an LSTM to perform people-flow counting, the input of the people-flow counting model proposed by the present invention is the output of the crowd density estimation model. The present invention not only improves the accuracy of the people-flow counting task, but also combines the crowd density estimation task with the people-flow counting task.