
WO2022188030A1 - Crowd density estimation method, electronic device and storage medium - Google Patents

Crowd density estimation method, electronic device and storage medium

Info

Publication number
WO2022188030A1
WO2022188030A1 · PCT/CN2021/079755 · CN2021079755W
Authority
WO
WIPO (PCT)
Prior art keywords
crowd
feature
crowd density
image
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/079755
Other languages
English (en)
Chinese (zh)
Inventor
胡金星
杨戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to PCT/CN2021/079755 priority Critical patent/WO2022188030A1/fr
Publication of WO2022188030A1 publication Critical patent/WO2022188030A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 - Adapting or protecting infrastructure or their operation
    • Y02A30/60 - Planning or developing urban green infrastructure

Definitions

  • the present application relates to the technical field of crowd density estimation, and in particular, to a crowd density estimation method, an electronic device and a storage medium.
  • the technical problem mainly solved by the present application is to provide a crowd density estimation method, electronic device and storage medium, which can improve the accuracy of crowd density estimation for crowd images collected by a collection device at different viewing angles and different fields of view.
  • a technical solution adopted in the present application is to provide a method for estimating crowd density, the method comprising: acquiring multiple crowd images, wherein the multiple crowd images are respectively acquired by multiple image acquisition devices; inputting the crowd images into a crowd density estimation network to obtain a first crowd density image corresponding to each crowd image, wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths; and combining the multiple first crowd density images to form a second crowd density image according to the positions and image acquisition angles of the multiple image acquisition devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • combining multiple first crowd density images to form a second crowd density image according to the positions and image capturing angles of multiple image capturing devices includes: determining the perspective transformation relationship of each capturing device according to the position and image capturing angle of that capturing device; using the perspective transformation relationship to perform plane projection on each first crowd density image to obtain the corresponding crowd density plane image; normalizing the multiple crowd density plane images; and combining the normalized crowd density plane images to form the second crowd density image.
  • determining the perspective transformation relationship of each acquisition device according to the position of each acquisition device and the image acquisition angle includes: determining at least four spatial coordinates in the acquisition area corresponding to the position of each acquisition device; determining, in the crowd image, the pixel coordinates corresponding to the at least four spatial coordinates; and determining the perspective transformation relationship of each acquisition device by using the at least four spatial coordinates and the pixel coordinates corresponding to them.
  • normalizing the plurality of crowd density plane images includes: determining a normalization weight matrix; and multiplying each crowd density plane image element-wise with the normalization weight matrix to normalize each crowd density plane image.
  • determining the normalization weight matrix includes determining its elements from a formula in which: (x0, y0) represents the pixel coordinates on the crowd image; (x, y) represents the pixel coordinates on the crowd density plane image corresponding to the pixel coordinates on the crowd image; the first crowd density image concerned is the one whose Gaussian blur kernel center falls on crowd image pixel (x0, y0); i, j and m, n are the pixel coordinates on the crowd image and the pixel coordinates on the crowd density plane image, respectively; and w_xy is the value, at position (x, y) of the crowd density plane image, of the first crowd density image whose Gaussian blur kernel center falls on crowd image pixel (x0, y0), where, before the Gaussian blur is calculated, the pixel value of pixel (x0, y0) is 1 and the pixel values of all other pixels are 0.
  • combining each normalized crowd density plane image to form a second crowd density image includes: determining the weighted average weight of each crowd density plane image; acquiring the first pixel values of the pixels corresponding to the same plane position in each crowd density plane image to obtain a pixel value set; performing a weighted average of the first pixel values in the pixel value set using the weighted average weights to obtain a second pixel value; and using the second pixel value as the pixel value of the corresponding pixel in the second crowd density image, so as to form the second crowd density image.
  • several feature extraction layers include a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the network depths of the first, second, third and fourth feature extraction layers increase in turn;
  • several feature fusion layers include the first feature fusion layer, the second feature fusion layer, the third feature fusion layer, the fourth feature fusion layer and the fifth feature fusion layer;
  • the network depths of the first feature fusion layer, the second feature fusion layer, the third feature fusion layer, and the fourth feature fusion layer are the same, and the network depth of the fifth feature fusion layer is greater than the network depth of the first feature fusion layer.
  • inputting a plurality of crowd images into the crowd density estimation network to obtain a first crowd density image corresponding to each crowd image includes: inputting each crowd image into the first feature extraction layer to output a first feature map; inputting the first feature map into the second feature extraction layer to output a second feature map; inputting the second feature map into the third feature extraction layer to output a third feature map, and inputting the second feature map into the first feature fusion layer to output a first feature fusion map; inputting the third feature map into the fourth feature extraction layer to output a fourth feature map, inputting the third feature map and the first feature fusion map into the fifth feature fusion layer to output a second feature fusion map, and inputting the third feature map into the second feature fusion layer to output a third feature fusion map; inputting the fourth feature map, the second feature fusion map and the third feature fusion map into the third feature fusion layer to output a fourth feature fusion map; inputting the fourth feature fusion map into the fourth feature fusion layer to output a fifth feature fusion map; and inputting the fifth feature fusion map into the crowd density estimation layer to output the first crowd density image corresponding to each crowd image.
  • the number of channels of the first feature extraction layer is 3, 64, 64 and 64 from the input to the output direction; the number of channels of the second feature extraction layer is 64, 128, 128 and 128 from the input to the output direction; the number of channels of the third feature extraction layer is 128, 256, 256, 256, 256, 256, 256 and 256 from the input to the output direction; the number of channels of the fourth feature extraction layer is 256, 512, 512, 512, 512, 512 and 512 from the input to the output direction; wherein the pooling layers in the first, second, third and fourth feature extraction layers have a stride of 2 and a receptive field of 2; the number of channels of the first feature fusion layer is 128 and 16 from the input to the output direction; the number of channels of the second feature fusion layer is 16 and 16 from the input to the output direction; and the number of channels of the third feature fusion layer is 16 and 16 from the input to the output direction.
  • the method further includes: when the sizes and numbers of channels of the feature maps input to the first, second, third, fourth and fifth feature fusion layers are inconsistent, upsampling and downsampling the feature maps using the bilinear interpolation method, and processing them with preset convolutional layers to output feature maps with a uniform number of channels.
  • another technical solution adopted in the present application is to provide an electronic device; the electronic device includes a processor and a memory connected to the processor, wherein the memory is used for storing program data and the processor is used for executing the program data, so as to implement the method provided by the above technical solution.
  • another technical solution adopted in the present application is to provide a computer-readable storage medium; the computer-readable storage medium is used to store program data, and when the program data is executed by a processor, it is used to implement the method provided by the above technical solution.
  • a method for estimating crowd density of the present application includes: acquiring a plurality of crowd images, wherein the plurality of crowd images are respectively acquired by a plurality of image acquisition devices; inputting the plurality of crowd images into a crowd density estimation network to obtain a first crowd density image corresponding to each crowd image, wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths; and combining the plurality of first crowd density images to form a second crowd density image according to the positions and image acquisition angles of the plurality of image acquisition devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • the feature fusion layers and the feature extraction layers with different network depths are used to extract and fuse features of different scales from each crowd image, so as to adapt to the different collection heights of the crowd images and thereby better perform feature extraction and crowd density estimation.
  • This can improve the accuracy of crowd density estimation for crowd images collected by collection devices from different viewing angles and with different fields of view, and improve the accuracy of crowd density estimation in cross-video crowd distribution statistics.
  • FIG. 1 is a schematic flowchart of an embodiment of a crowd density estimation method provided by the present application.
  • FIG. 2 is a schematic diagram of the arrangement of image acquisition devices in an embodiment of the crowd density estimation method provided by the present application.
  • FIG. 3 is a schematic flowchart of another embodiment of the crowd density estimation method provided by the present application.
  • FIG. 4 is a detailed flowchart of step 33 provided by the present application.
  • FIG. 5 is a detailed flowchart of step 35 provided by the present application.
  • FIG. 6 is a detailed flowchart of step 36 provided by the present application.
  • FIG. 7 is a schematic flowchart of another embodiment of the crowd density estimation method provided by the present application.
  • FIG. 8 is a schematic diagram of an application of the crowd density estimation method provided by the present application.
  • FIG. 9 is a schematic structural diagram of an embodiment of an electronic device provided by the present application.
  • FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided by the present application.
  • FIG. 1 is a schematic flowchart of an embodiment of a crowd density estimation method provided by the present application. The method includes:
  • Step 11 Acquire multiple crowd images.
  • crowd images are acquired by multiple image acquisition devices respectively. It can be understood that crowd images do not necessarily contain crowds.
  • a plurality of image capturing devices may be distributed at different positions in an area to capture crowd images at the corresponding positions. If the area is an intersection, then, referring to FIG. 2, the plan view of the intersection is divided by an XOY coordinate system: acquisition device D is set in the area corresponding to the first quadrant, acquisition device A in the area corresponding to the second quadrant, acquisition device B in the area corresponding to the third quadrant, and acquisition device C in the area corresponding to the fourth quadrant. Acquisition devices A, B, C and D can then respectively acquire crowd images of their corresponding regions.
  • after step 11, the plurality of crowd images may be preprocessed. Specifically, since the multiple crowd images are collected by different collection devices, they can first be classified by collection device and, after classification, sorted according to their generation times. The crowd images corresponding to each acquisition device are then traversed to obtain the crowd images that share the same generation time.
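  • as an illustration of this preprocessing, a minimal Python sketch is given below; the record fields 'device_id', 'timestamp' and 'image' are assumed names for illustration only and do not come from the present application.

        from collections import defaultdict

        def group_frames_by_time(records):
            # Classify crowd-image records by acquisition device, sort each group by
            # generation time, and collect the images that share the same timestamp.
            by_device = defaultdict(list)
            for rec in records:
                by_device[rec["device_id"]].append(rec)
            for recs in by_device.values():
                recs.sort(key=lambda r: r["timestamp"])
            # Keep only timestamps seen by every device, so each group holds one
            # simultaneously generated crowd image per acquisition device.
            common = set.intersection(*(set(r["timestamp"] for r in recs)
                                        for recs in by_device.values()))
            return {t: [r["image"] for recs in by_device.values()
                        for r in recs if r["timestamp"] == t]
                    for t in sorted(common)}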
  • Step 12 Input multiple crowd images into the crowd density estimation network to obtain a first crowd density image corresponding to each crowd image; wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths.
  • each crowd image may be input to a crowd density estimation network to obtain a first crowd density image corresponding to the crowd image.
  • the plurality of crowd images are sorted, and then the plurality of crowd images are sequentially input to the crowd density estimation network according to the sorted order, so that the crowd density estimation network outputs the first crowd density image corresponding to each crowd image.
  • the crowd image is input to the feature extraction layer with the smallest network depth among the several feature extraction layers, so as to perform feature extraction corresponding to that network depth and obtain a first target feature map; the first target feature map is then input to the next feature extraction layer to obtain a second target feature map; the second target feature map is input to the next feature extraction layer and to a feature fusion layer, respectively, to obtain a third target feature map and a first target fusion map; and so on, corresponding feature extraction and feature fusion are performed according to the number of feature extraction layers and feature fusion layers.
  • the target fusion map output by the last feature fusion layer is input to the crowd density estimation layer, and the first crowd density image corresponding to each crowd image is obtained.
  • each feature extraction layer includes several convolutional layers, with a ReLU activation layer after each convolutional layer.
  • each feature fusion layer likewise includes several convolutional layers (each followed by a ReLU activation layer), and the crowd density estimation layer includes several convolutional layers (each followed by a ReLU activation layer); together, the feature extraction layers, the feature fusion layers and the crowd density estimation layer form the crowd density estimation network.
  • each feature extraction layer also downsamples the feature map, that is, the width and height of the target feature map output by the feature extraction layer are reduced to 1/2 of those of its input; this can be realized by a max pooling layer or by a convolutional layer.
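  • a brief PyTorch illustration of the two downsampling options follows; the channel count and feature-map size are arbitrary example values.

        import torch
        import torch.nn as nn

        x = torch.randn(1, 64, 128, 128)                      # N, C, H, W

        pool = nn.MaxPool2d(kernel_size=2, stride=2)          # max pooling option
        conv = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)  # strided convolution option

        print(pool(x).shape, conv(x).shape)   # both halve H and W: [1, 64, 64, 64]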
  • the crowd density estimation network calculates and outputs the first crowd density image in N stages; except for the first stage, whose feature extraction layer takes the crowd image as input, the feature extraction layer of each stage takes only the output of the feature extraction layer of the previous stage as input.
  • the first stage of the crowd density estimation network is composed of two feature extraction layers in series.
  • the second stage is composed of a 4x feature fusion layer and a feature extraction layer in parallel.
  • the third stage is composed of a 4x feature fusion layer, an 8x feature fusion layer and a feature extraction layer in parallel.
  • the fourth stage is composed of a 4x feature fusion layer, an 8x feature fusion layer, a 16x feature fusion layer and a feature extraction layer.
  • the fifth stage is composed of a 4x feature fusion module and a crowd density estimation layer in series.
  • the 4x feature fusion module of the fourth stage is composed of several parallel convolutional layers with different dilation rates, which serve as a feature fusion layer implementing multi-scale feature fusion (with a ReLU activation layer after each convolutional layer).
  • when a feature fusion layer accepts the outputs of multiple feature fusion layers and feature extraction layers as inputs at the same time, the feature maps are first added element by element, and the sum is then input to the feature fusion layer for calculation.
  • the first, second, third and fourth stages of the network perform the fusion and extraction of multi-scale features to extract multi-scale hidden features;
  • the 4x feature fusion layer in the fifth stage constitutes a multi-scale receptive field convolutional network module that further fuses or transforms the multi-scale hidden features;
  • the crowd density estimation layer in the fifth stage takes as input the multi-scale hidden features output by the feature fusion layer formed by the multi-scale receptive field convolutional network module, and calculates and outputs the first crowd density image.
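  • purely as an illustration of such a multi-scale receptive field module, a PyTorch sketch with parallel dilated convolutions is shown below; the number of branches and the dilation rates are assumptions and are not taken from the present application.

        import torch
        import torch.nn as nn

        class MultiScaleReceptiveFieldFusion(nn.Module):
            # Parallel 3x3 convolutions with different dilation rates; the branch
            # outputs are summed element-wise to fuse multi-scale information.
            def __init__(self, channels=16, dilations=(1, 2, 3)):
                super().__init__()
                self.branches = nn.ModuleList(
                    nn.Sequential(
                        nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                        nn.ReLU(inplace=True),      # a ReLU follows every convolution
                    )
                    for d in dilations
                )

            def forward(self, x):
                return sum(branch(x) for branch in self.branches)

        out = MultiScaleReceptiveFieldFusion()(torch.randn(1, 16, 32, 32))
        print(out.shape)                            # torch.Size([1, 16, 32, 32])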
  • Step 13 Combine multiple first crowd density images to form a second crowd density image according to the positions and image capturing angles of multiple image capturing devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • coordinate transformation is performed on each first crowd density image according to the position and image collection angle of the corresponding collection device, so that the first crowd density image is converted into a plane image of the area acquired by that collection device.
  • a plurality of plane images corresponding to the areas collected by the acquisition devices will be obtained, and then these plane images will be processed to obtain a second crowd density image.
  • the pixel area representing the crowd in the second crowd density image is represented by a specific color.
  • different pixel values can be set for pixel points in the pixel area to represent different crowd densities.
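  • for illustration, a short sketch of how the flow of people in a target area can be read from the second crowd density image is given below; it assumes the usual density-map convention in which each person's contribution integrates to 1 (an assumption, not stated in the present application), and the mask and values are illustrative.

        import numpy as np

        def region_count(density_map, mask=None):
            # Estimate the number of people in (a region of) the second crowd density
            # image by summing pixel values; `mask` optionally selects the target area.
            if mask is not None:
                density_map = density_map * mask
            return float(density_map.sum())

        second_density = np.array([[0.0, 0.2, 0.3],
                                   [0.1, 0.25, 0.15]])
        print(region_count(second_density))         # 1.0, i.e. roughly one person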
  • in the above solution, multiple crowd images are input to the crowd density estimation network, where the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths; according to the positions and image acquisition angles of the multiple image acquisition devices, the multiple first crowd density images are combined to form a second crowd density image, so as to use the second crowd density image to estimate the flow of people in the target area.
  • the feature fusion layers and the feature extraction layers with different network depths are used to extract and fuse features of different scales from each crowd image, so as to adapt to the different collection heights of the crowd images and thereby better perform feature extraction and crowd density estimation.
  • This can improve the accuracy of crowd density estimation for crowd images collected by collection devices from different viewing angles and with different fields of view, and improve the accuracy of crowd density estimation in cross-video crowd distribution statistics.
  • FIG. 3 is a schematic flowchart of another embodiment of the crowd density estimation method provided by the present application.
  • the method includes:
  • Step 31 Acquire multiple crowd images.
  • Step 32 Input multiple crowd images into the crowd density estimation network to obtain a first crowd density image corresponding to each crowd image; wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths.
  • Steps 31 to 32 have the same or similar technical solutions as the above-mentioned embodiments, and are not repeated here.
  • Step 33 Determine the perspective transformation relationship of each acquisition device according to the position of each acquisition device and the image acquisition angle.
  • each acquisition device corresponds to a perspective transformation relationship.
  • the perspective transformation relationship between the crowd image collected by the collection device and the spatial coordinates of the area can be calculated according to the spatial coordinates and collection angle of the area collected by the collection device.
  • step 33 may be the following process:
  • Step 331 Determine at least four spatial coordinates in the collection area corresponding to the location of each collection device; and determine pixel point coordinates corresponding to the at least four spatial coordinates in the crowd image corresponding to the collection device.
  • the at least four spatial coordinates may be spatial coordinates of landmark buildings in the acquisition area corresponding to the location of the acquisition device. Since building coordinates, unlike the crowd in the collection area, are fixed, the spatial coordinates of the building and the coordinates of its pixel points in the crowd image are used as corresponding reference coordinates, and step 332 is executed.
  • Step 332 Determine the perspective transformation relationship of each acquisition device by using at least four spatial coordinates and pixel point coordinates corresponding to the at least four spatial coordinates.
  • At least four spatial coordinates and pixel point coordinates corresponding to the at least four spatial coordinates can be used to determine the perspective transformation matrix, and the perspective transformation matrix can be used as the perspective transformation relationship of each acquisition device.
  • the perspective transformation matrix can be calculated using the following formula: [x', y', w']^T = A [x, y, w]^T;
  • [x', y', w'] are the transformed coordinates, that is, the spatial coordinates of the collection area; [x, y, w] are the coordinates before transformation, that is, the pixel coordinates in the crowd image; and A is the 3×3 perspective transformation matrix;
  • by substituting the at least four spatial coordinates and their corresponding pixel coordinates, the parameters a11, a12, a13, a21, a22, a23, a31, a32 and a33 of the perspective transformation matrix A can be obtained.
  • w' and w in the coordinates can be set to 1.
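  • a minimal sketch of this calculation using OpenCV is given below; the four pixel/world point pairs are illustrative values, not data from the present application.

        import numpy as np
        import cv2

        # Four pixel coordinates in the crowd image and the corresponding spatial
        # (ground-plane) coordinates of the collection area, e.g. of a landmark building.
        pixel_pts = np.float32([[120, 480], [620, 470], [700, 240], [60, 250]])
        world_pts = np.float32([[0, 0], [10, 0], [10, 20], [0, 20]])   # e.g. metres

        # A maps [x, y, 1] in the crowd image to [x', y', w'] in the collection area.
        A = cv2.getPerspectiveTransform(pixel_pts, world_pts)
        print(A)                                     # the 3x3 perspective transformation matrix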
  • Step 34 Using the perspective transformation relationship, perform plane projection on each first crowd density image to obtain a corresponding crowd density plane image.
  • each pixel in the first crowd density image is transformed with the perspective transformation relationship, which is equivalent to performing a plane projection, to obtain the spatial coordinates of the corresponding position in the collection area; the corresponding crowd density plane image is then formed from these spatial coordinates.
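  • a corresponding sketch of the plane projection is shown below; the density image, matrix and plane size are placeholders (in practice A would be the matrix determined in step 33).

        import numpy as np
        import cv2

        density = np.random.rand(540, 960).astype(np.float32)   # stands in for a first crowd density image
        A = np.eye(3, dtype=np.float32)                          # placeholder perspective transformation matrix
        plane_w, plane_h = 400, 800                              # assumed discretisation of the ground plane

        crowd_density_plane = cv2.warpPerspective(density, A, (plane_w, plane_h))
        print(crowd_density_plane.shape)                         # (800, 400)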
  • Step 35 Normalize the plurality of crowd density plane images.
  • step 35 may be the following process:
  • Step 351 Determine the normalized weight matrix.
  • determining the normalization weight matrix includes determining it from a formula in which: (x0, y0) represents the pixel coordinates on the crowd image; (x, y) represents the pixel coordinates on the crowd density plane image corresponding to the pixel coordinates on the crowd image; the first crowd density image concerned is the one whose Gaussian blur kernel center falls on crowd image pixel (x0, y0); i, j and m, n are the pixel coordinates on the crowd image and the pixel coordinates on the crowd density plane image, respectively; and w_xy is the value, at position (x, y) of the crowd density plane image, of the first crowd density image whose Gaussian blur kernel center falls on crowd image pixel (x0, y0), where, before the Gaussian blur is calculated, the pixel value of pixel (x0, y0) is 1 and the pixel values of all other pixels are 0.
  • Step 352 Multiply each crowd density plane image element-wise with the normalization weight matrix to normalize each crowd density plane image.
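  • the following sketch illustrates how the quantity w_xy described above could be computed and how step 352 applies the weight matrix; the Gaussian kernel size, sigma and the all-ones placeholder weight matrix are illustrative assumptions.

        import numpy as np
        import cv2

        def w_xy(x0, y0, img_shape, A, plane_size, ksize=15, sigma=4.0):
            # Value, at each plane position, of the projected first crowd density image
            # whose Gaussian blur kernel centre falls on crowd-image pixel (x0, y0):
            # before blurring, pixel (x0, y0) is 1 and every other pixel is 0.
            impulse = np.zeros(img_shape, dtype=np.float32)
            impulse[y0, x0] = 1.0
            blurred = cv2.GaussianBlur(impulse, (ksize, ksize), sigma)
            return cv2.warpPerspective(blurred, A, plane_size)

        plane = np.random.rand(800, 400).astype(np.float32)     # a crowd density plane image
        W = np.ones_like(plane)                                  # placeholder normalization weight matrix
        normalized = plane * W                                   # element-wise product, as in step 352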
  • Step 36 Combine each of the normalized crowd density plane images to form a second crowd density image.
  • step 36 may be the following process:
  • Step 361 Determine the weighted average weight of each crowd density plane image.
  • Step 362 Acquire the first pixel value of the pixel point corresponding to the same plane position in each crowd density plane image to obtain a set of pixel values.
  • Step 363 Use the weighted average weight to perform a weighted average of the first pixel values in the pixel value set to obtain a second pixel value.
  • Step 364 Use the second pixel value as the pixel value of the corresponding pixel in the second crowd density image to form a second crowd density image.
  • the weighted average weight is the reciprocal of the number of collection devices whose surveillance video covers each pixel position (corresponding to a position on the world coordinate plane) in the crowd density plane images.
  • the collection areas of the collection devices may overlap due to the settings of the collection devices, and at this time, the overlapping parts need to be processed according to steps 361-364.
  • the non-overlapping parts can also be processed according to the above steps, except that the weighted average weight of the non-overlapping parts is 1.
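  • a sketch of this weighted-average fusion is given below; the coverage masks (1 where a device's surveillance video covers a plane position, 0 elsewhere) and the toy values are illustrative.

        import numpy as np

        def fuse_plane_images(plane_images, coverage_masks):
            # Average, at every plane position, over the collection devices covering it
            # (weighted average weight = reciprocal of the number of covering devices).
            stacked = np.stack(plane_images).astype(np.float32)
            masks = np.stack(coverage_masks).astype(np.float32)
            n_covering = masks.sum(axis=0)
            weights = np.where(n_covering > 0, 1.0 / np.maximum(n_covering, 1.0), 0.0)
            return (stacked * masks).sum(axis=0) * weights

        a = np.ones((2, 4)); b = np.full((2, 4), 3.0)
        ma = np.ones((2, 4)); mb = np.array([[0, 0, 1, 1], [0, 0, 1, 1]])
        print(fuse_plane_images([a, b], [ma, mb]))   # 1.0 where only device a covers, 2.0 in the overlap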
  • the perspective transformation relationship is used to project the first crowd density images of multiple collection devices onto the same plane, and normalization and spatial fusion are performed to realize cross-video human flow estimation.
  • FIG. 7 is a schematic flowchart of another embodiment of the crowd density estimation method provided by the present application
  • FIG. 8 is an application schematic diagram of the crowd density estimation method provided by the present application.
  • several feature extraction layers include a first feature extraction layer, a second feature extraction layer, a third feature extraction layer and a fourth feature extraction layer, wherein the network depths of the first, second, third and fourth feature extraction layers increase in turn;
  • several feature fusion layers include a first feature fusion layer, a second feature fusion layer, a third feature fusion layer, a fourth feature fusion layer, and a fifth feature fusion layer;
  • the network depths of the first feature fusion layer, the second feature fusion layer, the third feature fusion layer, and the fourth feature fusion layer are the same, and the network depth of the fifth feature fusion layer is greater than the network depth of the first feature fusion layer.
  • the method includes:
  • Step 71 Acquire multiple crowd images.
  • Step 72 Input each crowd image to the first feature extraction layer to output a first feature map.
  • Step 73 Input the first feature map to the second feature extraction layer to output the second feature map.
  • Step 74 Input the second feature map to the third feature extraction layer to output the third feature map, and input the second feature map to the first feature fusion layer to output the first feature fusion map.
  • Step 75 Input the third feature map to the fourth feature extraction layer to output the fourth feature map, and input the third feature map and the first feature fusion map to the fifth feature fusion layer to output the second feature fusion map , and input the third feature map to the second feature fusion layer to output the third feature fusion map.
  • Step 76 Input the fourth feature map, the second feature fusion map and the third feature fusion map to the third feature fusion layer to output the fourth feature fusion map.
  • Step 77 Input the fourth feature fusion map to the fourth feature fusion layer to output the fifth feature fusion map.
  • Step 78 Input the fifth feature fusion map to the crowd density estimation layer to output a first crowd density image corresponding to each image.
  • Step 79 Combine multiple first crowd density images to form a second crowd density image according to the positions and image capturing angles of multiple image capturing devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • the number of channels of the first feature extraction layer is 3, 64, 64 and 64 in order from the input to the output direction.
  • the structure of the first feature extraction layer is {C(3,3,64), C(3,64,64), M(2,2)}, where C(3,3,64) represents a convolutional layer with a kernel size of 3, 3 input channels and 64 output channels, with ReLU as the default activation function, and M(2,2) represents a max pooling layer with a receptive field size of 2 and a stride of 2.
  • the number of channels of the second feature extraction layer is 64, 128, 128 and 128 in order from input to output.
  • the structure of the second feature extraction layer is ⁇ C(3, 64, 128), C(3, 128, 128), M(2, 2) ⁇ .
  • the number of channels of the third feature extraction layer is 128, 256, 256, 256, 256, 256 and 256 in order from input to output.
  • the structure of the third feature extraction layer is ⁇ C(3,128,256), C(3,256,256), C(3,256,256), C(3,256,256), M(2,2) ⁇ .
  • the number of channels of the fourth feature extraction layer is 256, 512, 512, 512, 512, 512 and 512 in order from input to output.
  • the structure of the fourth feature extraction layer is ⁇ C(3,256,512), C(3,512,512), C(3,512,512), C(3,512,512), M(2,2) ⁇ .
  • the number of channels of the first feature fusion layer is 128 and 16 sequentially from input to output.
  • the structure of the first feature fusion layer is ⁇ C(3, 128, 16) ⁇ .
  • the number of channels of the second feature fusion layer is 16 and 16 from the input to the output direction.
  • the structure of the second feature fusion layer is ⁇ C(3,16,16) ⁇ .
  • the number of channels of the third feature fusion layer is 16 and 16 in order from input to output.
  • the number of channels of the fourth feature fusion layer is 16, 16, 16, 16, 16 and 16 in turn from input to output; specifically, the structure of the third feature fusion layer is {C(3,16,16)}, and the structure of the fourth feature fusion layer is {C(3,16,16), C(3,16,16), C(3,16,16)}.
  • the number of channels of the fifth feature fusion layer is 256 and 16 sequentially from input to output.
  • the structure of the fifth feature fusion layer is ⁇ C(3, 256, 16) ⁇ .
  • the bilinear interpolation method is adopted to upsample and downsample the target feature maps, and a preset convolutional layer is used for processing, so as to output target feature maps with a uniform number of channels.
  • the convolutional layer is {C(3,x,16)}, where x represents the number of input channels of the received target feature map.
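  • to make the C(kernel, in, out)/M(field, stride) notation above concrete, a partial PyTorch sketch follows; it only illustrates the notation, the bilinear resizing and the element-wise addition used when fusing multi-source inputs, not the full five-stage wiring of steps 72 to 78.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def C(k, c_in, c_out):
            # C(kernel, in_channels, out_channels): convolution followed by ReLU by default.
            return nn.Sequential(nn.Conv2d(c_in, c_out, k, padding=k // 2),
                                 nn.ReLU(inplace=True))

        def M(field, stride):
            # M(receptive_field, stride): max pooling layer.
            return nn.MaxPool2d(kernel_size=field, stride=stride)

        extract1 = nn.Sequential(C(3, 3, 64), C(3, 64, 64), M(2, 2))
        extract2 = nn.Sequential(C(3, 64, 128), C(3, 128, 128), M(2, 2))
        extract3 = nn.Sequential(C(3, 128, 256), C(3, 256, 256), C(3, 256, 256), C(3, 256, 256), M(2, 2))
        fuse1 = nn.Sequential(C(3, 128, 16))       # {C(3,128,16)}
        fuse5 = nn.Sequential(C(3, 256, 16))       # {C(3,256,16)}

        def match(a, b):
            # Bilinearly resize feature map `a` to the spatial size of `b` before fusion.
            return F.interpolate(a, size=b.shape[-2:], mode="bilinear", align_corners=False)

        img = torch.randn(1, 3, 256, 256)
        f1 = extract1(img)                         # 1/2 resolution, 64 channels
        f2 = extract2(f1)                          # 1/4 resolution, 128 channels
        f3 = extract3(f2)                          # 1/8 resolution, 256 channels
        g1 = fuse1(f2)                             # a first feature fusion map
        h = fuse5(f3)
        g2 = h + match(g1, h)                      # element-wise addition of multi-source inputs
        print(f3.shape, g2.shape)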
  • the training method of the crowd density estimation network is described below.
  • the crowd density estimation network as in any of the above embodiments is constructed.
  • the collection of training samples is carried out.
  • the training samples should include crowd images of different regions collected by collection devices at different locations, together with the real crowd density images corresponding to the crowd images. In this way, hidden features at more scales can be obtained during training, and the estimation accuracy of the crowd density estimation network can be improved.
  • the training samples are used to train the crowd density estimation network, where the loss function is defined as a weighted combination of the following terms:
  • λ1 and λ2 are the sub-term weights of the loss function;
  • L_c represents the loss between the number of people in the real crowd density image and the number of people in the first crowd density image;
  • L_ot represents the optimal transport loss;
  • L_tv represents the loss between the pixels of the real crowd density image and the corresponding pixels of the first crowd density image.
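  • as an illustration only, a PyTorch sketch of one plausible combination of these terms, L = L_c + λ1·L_ot + λ2·L_tv, is shown below; the weight values, the assignment of λ1 and λ2 to the sub-terms and the stubbed optimal transport term are assumptions, not the formula of the present application.

        import torch

        def count_loss(pred, gt):
            # L_c: difference between the head counts of the real and predicted density images
            # (a density map's head count is the sum of its pixel values).
            return (pred.sum(dim=(1, 2, 3)) - gt.sum(dim=(1, 2, 3))).abs().mean()

        def tv_loss(pred, gt):
            # L_tv: per-pixel loss between the real density image and the prediction.
            return torch.abs(pred - gt).mean()

        def ot_loss(pred, gt):
            # L_ot: optimal transport loss between the normalised density maps;
            # it needs a Sinkhorn-style solver, so only a placeholder is given here.
            return torch.tensor(0.0, device=pred.device)

        def total_loss(pred, gt, lam1=0.1, lam2=0.01):
            return count_loss(pred, gt) + lam1 * ot_loss(pred, gt) + lam2 * tv_loss(pred, gt)

        pred = torch.rand(2, 1, 64, 64, requires_grad=True)
        gt = torch.rand(2, 1, 64, 64)
        print(total_loss(pred, gt))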
  • the training can then be ended, the training of the crowd density estimation network is completed, and the trained crowd density estimation network can be used in any of the above embodiments.
  • FIG. 9 is a schematic structural diagram of an embodiment of an electronic device provided by the present application.
  • the electronic device 90 includes a processor 91 and a memory 92 connected to the processor 91; wherein, the memory 92 is used to store program data, and the processor 91 is used to execute the program data to realize the following method:
  • acquiring a plurality of crowd images, wherein the plurality of crowd images are respectively acquired by a plurality of image acquisition devices; inputting the plurality of crowd images into a crowd density estimation network to obtain a first crowd density image corresponding to each crowd image, wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths; and combining the plurality of first crowd density images to form a second crowd density image according to the positions and image acquisition angles of the plurality of image acquisition devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • processor 91 is further configured to execute program data to implement the method provided in any of the foregoing embodiments, and the specific implementation steps may refer to any of the foregoing embodiments, which will not be repeated here.
  • FIG. 10 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided by the present application.
  • the computer-readable storage medium 100 is used to store program data 101, and when the program data 101 is executed by a processor, it is used to realize The following method:
  • acquiring a plurality of crowd images, wherein the plurality of crowd images are respectively acquired by a plurality of image acquisition devices; inputting the plurality of crowd images into a crowd density estimation network to obtain a first crowd density image corresponding to each crowd image, wherein the crowd density estimation network includes several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers have different network depths;
  • combining the plurality of first crowd density images to form a second crowd density image according to the positions and image acquisition angles of the plurality of image acquisition devices, so as to use the second crowd density image to estimate the flow of people in the target area.
  • the disclosed method and device may be implemented in other manners.
  • the device implementations described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other divisions.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this implementation manner.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • if the integrated units in the above embodiments are implemented in the form of software functional units and are sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the parts thereof that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

Landscapes

  • Image Analysis (AREA)

Abstract

The present application discloses a crowd density estimation method. The method comprises: acquiring a plurality of crowd images, the plurality of crowd images being respectively acquired by means of a plurality of image acquisition devices; inputting the plurality of crowd images into a crowd density estimation network so as to obtain a first crowd density image corresponding to each crowd image, the crowd density estimation network comprising several feature extraction layers, several feature fusion layers and a crowd density estimation layer, and the several feature extraction layers having different network depths; and, according to the positions and image acquisition angles of the plurality of image acquisition devices, combining the plurality of first crowd density images to form a second crowd density image, so as to estimate the pedestrian flow of a target area using the second crowd density image. By means of the method, the accuracy of crowd density estimation for crowd images acquired by acquisition devices at different viewing angles and different field-of-view distances can be improved.
PCT/CN2021/079755 2021-03-09 2021-03-09 Crowd density estimation method, electronic device and storage medium Ceased WO2022188030A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/079755 WO2022188030A1 (fr) 2021-03-09 2021-03-09 Crowd density estimation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/079755 WO2022188030A1 (fr) 2021-03-09 2021-03-09 Crowd density estimation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022188030A1 true WO2022188030A1 (fr) 2022-09-15

Family

ID=83227351

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/079755 Ceased WO2022188030A1 (fr) 2021-03-09 2021-03-09 Procédé d'estimation de densité de foule, dispositif électronique et support de stockage

Country Status (1)

Country Link
WO (1) WO2022188030A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527166A (zh) * 2022-09-27 2022-12-27 阿里巴巴(中国)有限公司 图像处理方法、计算机可读存储介质以及电子设备
CN115797873A (zh) * 2023-02-06 2023-03-14 泰山学院 一种人群密度检测方法、系统、设备、存储介质及机器人
CN115937772A (zh) * 2022-12-06 2023-04-07 天翼云科技有限公司 一种基于深层次聚合的人群密度估计方法
CN117315428A (zh) * 2023-10-30 2023-12-29 燕山大学 一种跨模态特征对齐融合的人群计数系统及方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717528A (zh) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 一种基于深度网络的多策略全局人群分析方法
CN109815867A (zh) * 2019-01-14 2019-05-28 东华大学 一种人群密度估计和人流量统计方法
CN111914819A (zh) * 2020-09-30 2020-11-10 杭州未名信科科技有限公司 一种多摄像头融合的人群密度预测方法、装置、存储介质及终端
US20210027069A1 (en) * 2018-03-29 2021-01-28 Nec Corporation Method, system and computer readable medium for crowd level estimation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210027069A1 (en) * 2018-03-29 2021-01-28 Nec Corporation Method, system and computer readable medium for crowd level estimation
CN108717528A (zh) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 一种基于深度网络的多策略全局人群分析方法
CN109815867A (zh) * 2019-01-14 2019-05-28 东华大学 一种人群密度估计和人流量统计方法
CN111914819A (zh) * 2020-09-30 2020-11-10 杭州未名信科科技有限公司 一种多摄像头融合的人群密度预测方法、装置、存储介质及终端

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115527166A (zh) * 2022-09-27 2022-12-27 阿里巴巴(中国)有限公司 图像处理方法、计算机可读存储介质以及电子设备
CN115937772A (zh) * 2022-12-06 2023-04-07 天翼云科技有限公司 一种基于深层次聚合的人群密度估计方法
CN115797873A (zh) * 2023-02-06 2023-03-14 泰山学院 一种人群密度检测方法、系统、设备、存储介质及机器人
CN115797873B (zh) * 2023-02-06 2023-05-26 泰山学院 一种人群密度检测方法、系统、设备、存储介质及机器人
CN117315428A (zh) * 2023-10-30 2023-12-29 燕山大学 一种跨模态特征对齐融合的人群计数系统及方法
CN117315428B (zh) * 2023-10-30 2024-04-05 燕山大学 一种跨模态特征对齐融合的人群计数系统及方法

Similar Documents

Publication Publication Date Title
CN112488210B (zh) 一种基于图卷积神经网络的三维点云自动分类方法
CN108710830B (zh) 一种结合密集连接注意力金字塔残差网络和等距限制的人体3d姿势估计方法
CN114170290B (zh) 图像的处理方法及相关设备
CN108596961B (zh) 基于三维卷积神经网络的点云配准方法
CN114519819B (zh) 一种基于全局上下文感知的遥感图像目标检测方法
WO2022188030A1 (fr) Procédé d'estimation de densité de foule, dispositif électronique et support de stockage
CN113837202B (zh) 特征点的提取方法、图像的重建方法及装置
CN110246181A (zh) 基于锚点的姿态估计模型训练方法、姿态估计方法和系统
CN110263716B (zh) 一种基于街景图像的遥感影像超分辨率土地覆被制图方法
WO2020186385A1 (fr) Procédé de traitement d'image, dispositif électronique et support d'informations lisible par ordinateur
CN115222578A (zh) 图像风格迁移方法、程序产品、存储介质及电子设备
CN116091871B (zh) 一种针对目标检测模型的物理对抗样本生成方法及装置
CN113902802A (zh) 视觉定位方法及相关装置、电子设备和存储介质
CN114764856A (zh) 图像语义分割方法和图像语义分割装置
CN111105452A (zh) 基于双目视觉的高低分辨率融合立体匹配方法
CN117392508A (zh) 一种基于坐标注意力机制的目标检测方法和装置
CN113158780A (zh) 区域人群密度估计方法、电子设备及存储介质
CN117078629B (zh) 基于改进YOLOv5的手套缺陷检测方法
Lan et al. Autonomous robot photographer with KL divergence optimization of image composition and human facial direction
CN113487713A (zh) 一种点云特征提取方法、装置及电子设备
CN111914938A (zh) 一种基于全卷积二分支网络的图像属性分类识别方法
CN108171731B (zh) 一种顾及拓扑几何多要素约束的最小影像集自动优选方法
CN120635207A (zh) 全景相机在点云地图中的定位方法的系统
CN119762999A (zh) 基于无人机三维倾斜摄影的积水定位方法、装置、设备及介质
CN113392858B (zh) 一种图像数据处理方法、计算机设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21929512

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21929512

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24-05-2024)

122 Ep: pct application non-entry in european phase

Ref document number: 21929512

Country of ref document: EP

Kind code of ref document: A1