Crowd concentration degree prediction model construction method and device based on urban space structure
Technical Field
The invention relates to a method and a device for building a crowd concentration prediction model, in particular to a method and a device for building a crowd concentration prediction model based on an urban space structure.
Background
With the gradual change of population and economic conditions in China, urban development in China has shifted from incremental planning to inventory planning, which focuses on improving city management and comfort; one of the main means of doing so is to improve walking convenience for pedestrians.
In past research, there have been two main methods for estimating pedestrian concentration. The first adds a pedestrian mode to the traditional four-stage traffic estimation model, but it splits the pedestrian cells only on the assumption of evenly distributed spatial characteristics and therefore cannot estimate the demand characteristics of pedestrian travel. The second uses the integration value computed by space syntax as the basis for calculating pedestrian flow and pedestrian-vehicle conflicts, but its data acquisition is limited by objective conditions, yields few data points, and is difficult to apply to an entire urban road network.
In summary, existing pedestrian concentration estimation methods produce inaccurate prediction results.
Disclosure of Invention
The invention aims to provide a method and a device for constructing a crowd concentration prediction model based on an urban space structure, so as to solve problems of prior-art pedestrian concentration estimation methods and devices, such as inaccurate prediction results.
In order to realize the task, the invention adopts the following technical scheme:
A crowd concentration prediction model construction method based on an urban space structure is used for obtaining a crowd concentration prediction model according to a walking road network of an area to be estimated, and is executed according to the following steps:
Step 1, obtaining a walking road network of an area to be estimated, and uniformly segmenting the walking road network to obtain a plurality of road section sampling points;
acquiring the pedestrian concentration of each road section sampling point to obtain a label set;
Step 2, acquiring the environment variables of each road section sampling point to obtain a sampling point environment parameter set;
the environment variables comprise a road network reachability parameter, a public transport reachability parameter, a subway reachability parameter and an urban commercial activity parameter;
Step 3, taking the environment parameter set as input and the label set as output, training a random forest model to obtain a crowd concentration prediction model of the area to be estimated.
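The training in Step 3 can be illustrated with a minimal pure-Python sketch. It assumes the model is a regression forest (the text only says "random forest"), and the feature rows, labels, function names and hyperparameters below are all hypothetical, synthetic stand-ins rather than data or settings from the invention; in practice an established library implementation would normally be used.

```python
import random
from statistics import mean

def _sse(values):
    """Sum of squared errors around the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, rng, max_depth=4, min_leaf=2):
    """Grow one regression tree, trying a random subset of features at each split."""
    if max_depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: predicted concentration
    n_feat = len(X[0])
    feats = rng.sample(range(n_feat), max(1, n_feat // 2))
    best = None
    for f in feats:
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], rng, max_depth - 1, min_leaf),
            build_tree([X[i] for i in right], [y[i] for i in right], rng, max_depth - 1, min_leaf))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a bootstrap resample of the sampling points."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic rows: [road network, bus, subway, commercial activity] per sampling point.
X = [[float(v), 2.0 * v, float(v * v), 3.0 * v + 1.0] for v in range(20)]
y = [5.0 * v + 2.0 for v in range(20)]  # synthetic label set
forest = train_forest(X, y)
```

Calling `predict_forest(forest, row)` on a new environment parameter row then returns the estimated concentration for that sampling point.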
Further, when the road network reachability parameter of each road section sampling point is acquired in step2, the method specifically includes:
Step A, uniformly segmenting the walking road network at a fixed distance to obtain a plurality of line segment nodes;
Step B, obtaining the road network reachability parameters of each line segment node within a plurality of radius ranges;
wherein Formula I is used to obtain the road network parameter $I(d_i, r)$ of the i-th line segment node $d_i$ within the radius range r, the unit of r being m:
wherein $d_j$ denotes the j-th line segment node, $i \neq j$, $j = 1, 2, \ldots, J$, J is the total number of line segment nodes obtained in Step A other than the i-th line segment node $d_i$, J is a positive integer, and the distance term in Formula I denotes the shortest distance from the i-th line segment node $d_i$ to the j-th line segment node $d_j$, in m;
Step C, obtaining the road network parameters of each line segment node within the same radius range to obtain a first road network parameter corresponding to each line segment node;
assigning the first road network parameter corresponding to each line segment node to the road section sampling points obtained in Step 1 by using a nearest-neighbor method, to obtain the road network parameter of each road section sampling point corresponding to the current radius range;
calculating the correlation between the road network parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in Step 1, to obtain a correlation value corresponding to the current radius range;
Step D, repeating Step C until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
Step E, selecting the radius range corresponding to the maximum correlation value as the optimal radius range; and taking the road network parameter of each road section sampling point corresponding to the optimal radius range as the road network reachability parameter of each road section sampling point.
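As an illustration of the distance ingredient used in Step B, the following sketch computes the shortest network distance from one line segment node to every other node lying within a radius r. It does not evaluate Formula I itself, which is not reproduced in this text; the graph layout, node names and function name are hypothetical.

```python
import heapq

def distances_within_radius(graph, source, radius_m):
    """Shortest path lengths (m) from source to every other node within radius_m.

    graph maps a node to a list of (neighbour, link_length_m) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in graph.get(node, []):
            nd = d + length
            # Only keep nodes that stay inside the radius range.
            if nd <= radius_m and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    dist.pop(source)  # the node itself is excluded (i != j)
    return dist

# Hypothetical toy network: nodes a-b-c-d spaced 30 m apart, plus a 100 m link a-d.
graph = {
    "a": [("b", 30.0), ("d", 100.0)],
    "b": [("a", 30.0), ("c", 30.0)],
    "c": [("b", 30.0), ("d", 30.0)],
    "d": [("c", 30.0), ("a", 100.0)],
}
near = distances_within_radius(graph, "a", 60.0)
```

Repeating the call with a different `radius_m` gives the node set and distances for each radius range considered in Steps C and D.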
Further, when the bus reachability parameter of each road section sampling point is acquired in the step2, the method specifically includes:
Step a, acquiring the bus routes of the area to be estimated, merging same-name bus stops in the bus routes, and reconstructing the topology to obtain a topological graph;
the topological graph comprises a plurality of bus stops and a plurality of bus line segments, each bus line segment being formed by connecting a plurality of bus stops;
Step b, repeating this step to obtain the bus parameters of all bus stops within a plurality of radius ranges:
Formula II is used to obtain the bus parameter $B(p, R)$ of the p-th bus stop within the radius range R, the unit of R being m:
wherein N is the total number of bus stops obtained in Step a, N is a positive integer, $k_p$ is the number of bus line segments passing through the p-th bus stop, $k_p$ is a positive integer, and $d_{pq}$ is the minimum bus line length from the p-th bus stop to the q-th bus stop, in m;
Step c, obtaining the bus parameter of each bus stop within the same radius range to obtain a first bus parameter corresponding to each bus stop;
generating urban public transport heat map raster data from the first bus parameter corresponding to each bus stop;
after converting the urban public transport heat map raster data into point data, obtaining the bus parameter of each road section sampling point corresponding to the current radius range by using a nearest-neighbor method;
calculating the correlation between the bus parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in Step 1, to obtain a correlation value corresponding to the current radius range;
Step d, repeating Step c until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
Step e, selecting the radius range corresponding to the maximum correlation value as the optimal radius range; and taking the bus parameter of each road section sampling point corresponding to the optimal radius range as the bus reachability parameter of each road section sampling point.
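The stop merging and topology reconstruction of Step a can be sketched as follows. The sketch assumes that each route is given as an ordered list of stop names with the distance from the previous stop, and it counts the distinct routes serving each merged stop as one possible reading of $k_p$; the route data, stop names and function name are hypothetical.

```python
from collections import defaultdict

def build_stop_graph(routes):
    """Merge same-name stops across routes and link consecutive stops.

    routes maps a route name to an ordered list of
    (stop_name, metres_from_previous_stop) pairs.
    Returns (graph, routes_per_stop); graph[stop][neighbour] is the shortest
    direct link length in metres between two merged stops.
    """
    graph = defaultdict(dict)
    serving = defaultdict(set)
    for route, stops in routes.items():
        prev = None
        for name, gap_m in stops:
            serving[name].add(route)  # same-name stops collapse to one key
            if prev is not None:
                # Keep the shortest link when several routes join the same pair.
                best = min(gap_m, graph[prev].get(name, float("inf")))
                graph[prev][name] = best
                graph[name][prev] = best
            prev = name
    return graph, {stop: len(r) for stop, r in serving.items()}

# Hypothetical routes sharing the stop "Hub".
routes = {
    "Route 1": [("A", 0.0), ("Hub", 800.0), ("B", 700.0)],
    "Route 2": [("C", 0.0), ("Hub", 900.0), ("D", 950.0)],
}
graph, k = build_stop_graph(routes)
```

The resulting `graph` can be fed to a shortest-path search to obtain the minimum bus line length $d_{pq}$ between any two stops.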
Further, when the subway reachability parameter of each road section sampling point is acquired in step2, the method specifically includes:
Step I, acquiring a subway line map of the area to be estimated, merging same-name stations in the subway line map, and breaking the lines at the stations to obtain a space-syntax axial map;
the space-syntax axial map comprises a plurality of subway stations and a plurality of subway line segments, each subway line segment being formed by connecting a plurality of subway stations;
Step II, repeating this step to obtain the subway parameters of all subway stations within a plurality of radius ranges:
Formula III is used to obtain the subway parameter $S(u, t)$ of the u-th subway station within the radius range t, the unit of t being m:
wherein U is the total number of subway stations obtained in Step I, U is a positive integer, and $d_{uv}$ is the minimum subway line length from the u-th subway station to the v-th subway station, in m;
Step III, obtaining the subway parameter of each subway station within the same radius range to obtain a first subway parameter corresponding to each subway station;
generating urban subway heat map raster data from the first subway parameter corresponding to each subway station;
after converting the urban subway heat map raster data into point data, obtaining the subway parameter of each road section sampling point corresponding to the current radius range by using a nearest-neighbor method;
calculating the correlation between the subway parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in Step 1, to obtain a correlation value corresponding to the current radius range;
Step IV, repeating Step III until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
Step V, selecting the radius range corresponding to the maximum correlation value as the optimal radius range; and taking the subway parameter of each road section sampling point corresponding to the optimal radius range as the subway reachability parameter of each road section sampling point.
A crowd concentration degree prediction method based on an urban space structure is implemented according to the following steps:
Step 1, obtaining a walking road network of the area to be estimated, and obtaining the environment parameters of each road section sampling point in the area to be estimated by the method of Step 2 of the crowd concentration prediction model construction method based on an urban space structure, to obtain an environment parameter set;
Step 2, inputting the environment parameter set obtained in Step 1 into the crowd concentration prediction model constructed by the crowd concentration prediction model construction method based on an urban space structure, to obtain the crowd concentration.
A crowd concentration prediction model construction device based on an urban space structure comprises a data acquisition module, a data preprocessing module and a model construction module, wherein:
the data acquisition module is used for acquiring a walking road network of an area to be estimated, and uniformly segmenting the walking road network to obtain a plurality of road section sampling points;
and for acquiring the pedestrian concentration of each road section sampling point to obtain a label set;
the data preprocessing module is used for acquiring the environment variables of each road section sampling point to obtain a sampling point environment parameter set;
the environment variables comprise a road network reachability parameter, a public transport reachability parameter, a subway reachability parameter and an urban commercial activity parameter;
and the model construction module is used for training a random forest model by taking the environment parameter set as input and the label set as output, to obtain a crowd concentration prediction model of the area to be estimated.
Further, the data preprocessing module comprises a road network reachability parameter calculation submodule for acquiring road network reachability parameters of each road section sampling point;
the road network reachability parameter calculation submodule comprises a road network segmentation unit, a road network parameter calculation unit, a road network correlation calculation unit and a road network reachability parameter acquisition unit;
the road network segmentation unit is used for uniformly segmenting the walking road network to obtain a plurality of line segment nodes;
the road network parameter calculation unit is used for obtaining the road network reachability parameters of each line segment node within a plurality of radius ranges, wherein Formula I is used to obtain the road network parameter $I(d_i, r)$ of the i-th line segment node $d_i$ within the radius range r, the unit of r being m:
wherein $d_j$ denotes the j-th line segment node, $i \neq j$, $j = 1, 2, \ldots, J$, J is the total number of line segment nodes obtained in Step A other than the i-th line segment node $d_i$, J is a positive integer, and the distance term in Formula I denotes the shortest distance from the i-th line segment node $d_i$ to the j-th line segment node $d_j$, in m;
the road network correlation calculation unit is used for obtaining the road network parameters of each line segment node within the same radius range to obtain a first road network parameter corresponding to each line segment node;
assigning the first road network parameter corresponding to each line segment node to the road section sampling points obtained in Step 1 by using a nearest-neighbor method, to obtain the road network parameter of each road section sampling point corresponding to the current radius range;
calculating the correlation between the road network parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data acquisition module, to obtain a correlation value corresponding to the current radius range;
and repeating the above until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
the road network reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value as the optimal radius range, and taking the road network parameter of each road section sampling point corresponding to the optimal radius range as the road network reachability parameter of each road section sampling point.
Furthermore, the data preprocessing module also comprises a public transport reachability parameter calculation submodule for acquiring the public transport reachability parameter of each road section sampling point;
the bus reachability parameter calculation submodule comprises a bus segmentation unit, a bus parameter calculation unit, a bus correlation calculation unit and a bus reachability parameter acquisition unit;
the bus segmentation unit is used for acquiring a bus route of an area to be estimated, merging the same-name bus stops in the bus route and reconstructing topology to obtain a topological graph;
the topological graph comprises a plurality of bus stops and a plurality of bus line segments, and the bus line segments are formed by connecting a plurality of bus stops;
the bus parameter calculation unit is used for obtaining the bus parameters of all bus stops within a plurality of radius ranges, wherein Formula II is used to obtain the bus parameter $B(p, R)$ of the p-th bus stop within the radius range R, the unit of R being m:
wherein N is the total number of bus stops obtained in Step a, N is a positive integer, $k_p$ is the number of bus line segments passing through the p-th bus stop, $k_p$ is a positive integer, and $d_{pq}$ is the minimum bus line length from the p-th bus stop to the q-th bus stop, in m;
the bus correlation calculation unit is used for obtaining the bus parameter of each bus stop within the same radius range to obtain a first bus parameter corresponding to each bus stop;
generating urban public transport heat map raster data from the first bus parameter corresponding to each bus stop;
after converting the urban public transport heat map raster data into point data, obtaining the bus parameter of each road section sampling point corresponding to the current radius range by using a nearest-neighbor method;
calculating the correlation between the bus parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data acquisition module, to obtain a correlation value corresponding to the current radius range;
and repeating the above until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
the bus reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value as the optimal radius range, and taking the bus parameter of each road section sampling point corresponding to the optimal radius range as the bus reachability parameter of each road section sampling point.
Furthermore, the data preprocessing module also comprises a subway reachability parameter calculation submodule for acquiring the subway reachability parameter of each road section sampling point;
the subway reachability parameter calculation submodule comprises a subway segmentation unit, a subway parameter calculation unit, a subway correlation calculation unit and a subway reachability parameter acquisition unit;
the subway segmentation unit is used for acquiring a subway line map of the area to be estimated, merging same-name stations in the subway line map, and breaking the lines at the stations to obtain a space-syntax axial map;
the space-syntax axial map comprises a plurality of subway stations and a plurality of subway line segments, each subway line segment being formed by connecting a plurality of subway stations;
the subway parameter calculation unit is used for obtaining the subway parameters of all subway stations within a plurality of radius ranges, wherein Formula III is used to obtain the subway parameter $S(u, t)$ of the u-th subway station within the radius range t, the unit of t being m:
wherein U is the total number of subway stations obtained in Step I, U is a positive integer, and $d_{uv}$ is the minimum subway line length from the u-th subway station to the v-th subway station, in m;
the subway correlation calculation unit is used for obtaining the subway parameter of each subway station within the same radius range to obtain a first subway parameter corresponding to each subway station;
generating urban subway heat map raster data from the first subway parameter corresponding to each subway station;
after converting the urban subway heat map raster data into point data, obtaining the subway parameter of each road section sampling point corresponding to the current radius range by using a nearest-neighbor method;
calculating the correlation between the subway parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data acquisition module, to obtain a correlation value corresponding to the current radius range;
and repeating the above until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
the subway reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value as the optimal radius range, and taking the subway parameter of each road section sampling point corresponding to the optimal radius range as the subway reachability parameter of each road section sampling point.
A crowd concentration degree prediction device based on an urban space structure comprises a data acquisition module, a crowd concentration degree prediction model construction device based on the urban space structure and a prediction module;
the data acquisition module is used for acquiring a walking road network of a region to be estimated;
the data acquisition module is also used for inputting the walking road network of the area to be estimated to the data preprocessing module of the crowd concentration prediction model construction device based on an urban space structure, to obtain the environment parameters of each road section sampling point in the area to be estimated and thereby an environment parameter set;
the crowd concentration prediction model construction device based on the urban space structure is used for obtaining a crowd concentration prediction model;
the prediction module is used for inputting the environment parameter set into the crowd concentration prediction model to obtain the crowd concentration.
Compared with the prior art, the invention has the following technical effects:
1. The method and device for constructing a crowd concentration prediction model based on an urban space structure estimate crowd concentration using four environment variables: road network accessibility, bus accessibility, subway accessibility and commercial activity. The values of these four variables reflect the level of crowd concentration in an area to a great extent and are highly correlated with the crowd concentration value, so they can serve as four important environment variables for measuring crowd concentration, which improves the accuracy of crowd concentration estimation;
2. The method and device provide a new way of calculating the road network accessibility parameter: when the road network is obtained, it is made as detailed as possible, including minor walking paths, and it is processed after being re-segmented, so that overly long or overly short road sections are avoided and calculation accuracy is improved; meanwhile, a simple and generally accepted space syntax calculation method is adopted, the values are reassigned to the road section sampling points by nearest-neighbor analysis, and the radius most strongly correlated with crowd concentration is selected by comparison as the calculation radius, so that the accuracy of crowd concentration estimation is improved, and the calculation is simple and convenient, accurate and practical;
3. The method and device provide a new way of calculating the bus accessibility and subway accessibility parameters, in which stations are taken as the nodes for calculating bus and subway accessibility; this matches real-life conditions, where places with a higher overlap of bus and subway stations and more convenient transfer have a higher crowd concentration; meanwhile, the calculated node values are first integrated into a heat map, so that the accuracy of the result can be seen and verified visually, and the point values obtained after converting the raster data into point data are then used to assign values to the sampling points, which increases the node density and improves the accuracy of the calculated result and hence of the crowd concentration estimation, and the calculation is simple and convenient, accurate and practical.
Drawings
FIG. 1 is a crowd aggregation heat map provided in one embodiment of the invention;
FIG. 2 is a graph of the correlation between pedestrian concentration and the walking reachability metric radius provided in an embodiment of the invention;
FIG. 3 is a graph of the correlation between the bus stop radius and the crowd concentration provided in an embodiment of the invention;
FIG. 4 is a diagram of the calculation result of the bus reachability parameter provided in an embodiment of the invention;
FIG. 5 is a diagram of the calculation result of the subway reachability parameter provided in an embodiment of the invention;
FIG. 6 is a commercial POI concentration map provided in one embodiment of the invention;
FIG. 7 is a scatter plot of estimated versus measured crowd concentration values provided in an embodiment of the invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples, so that those skilled in the art can better understand the present invention. It is to be expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
The following definitions or conceptual connotations relating to the present invention are provided for illustration:
Road section sampling point: a data point that represents a road section and carries information such as the position of the road section.
Road network accessibility: the degree to which a road network can be reached under the influence of network layout, transport conditions (traffic modes) and land use; here it mainly refers to accessibility in the space of the network layout.
Bus accessibility: the degree to which bus service can be reached at a given place on the network layout; it mainly reflects how conveniently an individual can obtain bus service.
Subway accessibility: the accessibility of the subway on the spatial network structure and the convenience of transfer.
Urban commercial activity: the vitality of commercial development, mainly reflected by the density of commercial points of interest (POIs); the denser the commercial POIs, the higher the commercial activity value of the place.
Urban public transport heat map (thermodynamic diagram): a representation of bus accessibility by highlighting, in which brighter places indicate better bus accessibility.
Urban subway heat map (thermodynamic diagram): a representation of subway accessibility by highlighting, in which brighter places indicate more convenient subway service and better accessibility.
Example one
A crowd concentration prediction model construction method based on an urban space structure is used for obtaining a crowd concentration prediction model according to a walking road network of an area to be estimated, and is executed according to the following steps:
Step 1, obtaining a walking road network of an area to be estimated, and uniformly segmenting the walking road network to obtain a plurality of road section sampling points;
acquiring the pedestrian concentration of each road section sampling point to obtain a label set;
In this embodiment, the road network of Xi'an is obtained, broken into small road sections at 30 m intervals in ArcGIS, and the small road sections are converted into road section sampling points.
Crowd aggregation heat maps of Xi'an at different time periods, obtained from "West confidence" and "suitable traveling", are used as the data basis, as shown in FIG. 1; nearest-neighbor analysis is performed on the crowd aggregation heat maps to obtain the crowd concentration value, namely the label value, of each road section sampling point in the road network.
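The two operations of Step 1 in this embodiment, breaking the road network at 30 m intervals and labeling each sampling point from the nearest heat map value, can be sketched in plain Python as a stand-in for the ArcGIS tools. The sketch assumes projected coordinates in metres; the polyline, heat map cells and function names are hypothetical.

```python
import math

def sample_points(polyline, spacing_m=30.0):
    """Return points every spacing_m metres along a polyline of (x, y) vertices."""
    points = [polyline[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = spacing_m - carried  # offset of the next point on this segment
        while pos <= seg:
            t = pos / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing_m
        carried = seg - (pos - spacing_m)
    return points

def nearest_value(point, cells):
    """Value of the heat map cell whose centre is closest to point (brute force)."""
    return min(cells, key=lambda c: math.dist(point, c[0]))[1]

# Hypothetical 100 m road and two heat map cells given as ((x, y), value).
road = [(0.0, 0.0), (100.0, 0.0)]
cells = [((10.0, 5.0), 3.0), ((80.0, 5.0), 9.0)]
pts = sample_points(road)
labels = [nearest_value(p, cells) for p in pts]
```

The brute-force search is adequate for a sketch; a city-scale network would normally use a spatial index for the nearest-neighbor lookup.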
Step2, acquiring an environment variable of each road section sampling point, and acquiring a sampling point environment parameter set;
the environment variables comprise a road network reachability parameter, a public transport reachability parameter, a subway reachability parameter and an urban commercial activity parameter;
In this embodiment, the road network reachability parameter, the bus reachability parameter, the subway reachability parameter and the urban commercial activity parameter can all be obtained by prior-art methods.
In the invention, however, the four parameters of road network accessibility, bus accessibility, subway accessibility and commercial activity are selected as the main factors closely related to crowd concentration, and a series of data processing methods is provided to improve calculation accuracy when processing the data and performing the calculation; the example results further show that the selected parameters and the methods used have high accuracy and practicability.
Optionally, when the road network reachability parameter of each road segment sampling point is acquired in step2, the method specifically includes:
Step A, uniformly segmenting the walking road network to obtain a plurality of line segment nodes;
Step B, obtaining the road network reachability parameters of each line segment node within a plurality of radius ranges;
wherein Formula I is used to obtain the road network parameter $I(d_i, r)$ of the i-th line segment node $d_i$ within the radius range r, the unit of r being m:
wherein $d_j$ denotes the j-th line segment node, $i \neq j$, $j = 1, 2, \ldots, J$, J is the total number of line segment nodes obtained in Step A other than the i-th line segment node $d_i$, J is a positive integer, and the distance term in Formula I denotes the shortest distance from the i-th line segment node $d_i$ to the j-th line segment node $d_j$, in m;
Step C, obtaining the road network parameters of each line segment node within the same radius range to obtain a first road network parameter corresponding to each line segment node;
assigning the first road network parameter corresponding to each line segment node to the road section sampling points obtained in Step 1 by using a nearest-neighbor method, to obtain the road network parameter of each road section sampling point corresponding to the current radius range;
calculating the correlation between the road network parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in Step 1, to obtain a correlation value corresponding to the current radius range;
Step D, repeating Step C until a correlation value is obtained for each radius range, so as to obtain a plurality of correlation values;
Step E, selecting the radius range corresponding to the maximum correlation value as the optimal radius range; and taking the road network parameter of each road section sampling point corresponding to the optimal radius range as the road network reachability parameter of each road section sampling point.
In this embodiment, the statistical process is shown in FIG. 2; when the radius is 6000 m, the correlation between walking accessibility and crowd concentration is strongest. Finally, for each sampling point, the value of the nearest line segment node is assigned to it by the nearest-neighbor method and taken as the road network reachability value of that sampling point.
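The radius screening of Steps C to E can be sketched as below. The text does not name the correlation measure, so the Pearson coefficient is used here as an assumption; the radii, parameter values and labels are made-up illustrative numbers, not results of the embodiment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def best_radius(params_by_radius, labels):
    """Return (radius, correlation) for the radius with the largest correlation."""
    scores = {r: pearson(v, labels) for r, v in params_by_radius.items()}
    r = max(scores, key=scores.get)
    return r, scores[r]

# Illustrative values only: sampling point parameters computed at three radii (m).
labels = [1.0, 2.0, 3.0, 4.0]
params_by_radius = {
    500: [4.0, 1.0, 3.0, 2.0],
    1000: [2.0, 4.0, 6.0, 8.0],
    2000: [1.0, 3.0, 2.0, 4.0],
}
radius, score = best_radius(params_by_radius, labels)
```

The parameter values stored under the returned radius are then the ones kept as the reachability parameter of each sampling point.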
Optionally, when the bus reachability parameter of each road section sampling point is acquired in step2, the method specifically includes:
step a, acquiring a bus route of an area to be estimated, merging the same-name bus stops in the bus route, and reconstructing topology to obtain a topological graph;
the topological graph comprises a plurality of bus stops and a plurality of bus line segments, wherein the bus line segments are formed by connecting a plurality of bus stops;
step b, repeating this step to obtain the bus parameters of all bus stops within a plurality of radius ranges:
formula II is adopted to obtain the bus parameter B(p, R) of the p-th bus stop within the radius range R, where the unit of R is m:
wherein N is the total number of bus stops obtained in step a, N is a positive integer, $k_p$ is the number of bus line segments passing through the p-th bus stop, $k_p$ is a positive integer, and $d_{pq}$ is the minimum bus line length from the p-th bus stop to the q-th bus stop, in m;
step c, acquiring the bus parameters of each bus stop within the same radius range, and acquiring first bus parameters corresponding to each bus stop;
acquiring urban public transport heat map grid data according to the first bus parameter corresponding to each bus stop;
after converting the urban public transport heat map grid data into point data, acquiring the bus parameter of each road section sampling point corresponding to the current radius range by using a neighbor calculation method;
calculating the correlation between the bus parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in step 1, and obtaining a correlation value corresponding to the current radius range;
step d, repeating step c until a correlation value is obtained for each radius range, thereby obtaining a plurality of correlation values;
step e, selecting the radius range corresponding to the maximum correlation value to obtain an optimal radius range; and taking the bus parameter of each road section sampling point corresponding to the optimal radius range as the bus reachability parameter of each road section sampling point.
In the embodiment, as shown in fig. 3, the analysis shows that the correlation between bus reachability and crowd concentration is strongest when the radius is 2600 meters. A reachability coverage area with a radius of 2600 meters is then constructed by the kernel density method and fused into a bus reachability heat map of the whole city. Finally, the bus reachability parameter of each road section sampling point is acquired by the neighbor analysis method; the final bus reachability processing result is shown in fig. 4.
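By way of illustration only, the merging of same-name bus stops in step a and the minimum bus line length between stops used in step b may be sketched as follows. The sketch assumes that stops are identified by name and that the minimum line length is found with Dijkstra's algorithm; it does not evaluate formula II, and the route data and function names are hypothetical.

```python
import heapq
from collections import defaultdict

def build_stop_graph(routes):
    """Merge same-name stops and build an undirected weighted stop graph.

    `routes` maps a route name to an ordered list of
    (stop_name, length_to_next_stop_m); the last stop carries length 0.
    Stops are keyed by name, so same-name stops on different routes
    collapse into a single graph node.
    """
    graph = defaultdict(dict)
    for stops in routes.values():
        for (a, length), (b, _) in zip(stops, stops[1:]):
            # Keep the shortest link when two routes connect the same pair.
            if length < graph[a].get(b, float("inf")):
                graph[a][b] = length
                graph[b][a] = length
    return graph

def stops_within_radius(graph, source, radius_m):
    """Minimum bus line length from `source` to every stop reachable within radius_m."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd <= radius_m and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    dist.pop(source)
    return dist

# Hypothetical routes sharing the stop "B".
routes = {
    "Route 1": [("A", 800), ("B", 1200), ("C", 0)],
    "Route 2": [("B", 500), ("D", 2500), ("E", 0)],
}
graph = build_stop_graph(routes)
reachable = stops_within_radius(graph, "A", 2600)
```

In this toy network, stops B, C and D lie within 2600 m of stop A along the bus lines, while stop E (3800 m) is excluded.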
Optionally, when the subway reachability parameter of each road section sampling point is acquired in step2, the method specifically includes:
step I, acquiring a subway line map of the area to be estimated, merging the same-name stations in the subway line map, and breaking the lines at the stations to obtain a space syntax axial map;
the space syntax axial map comprises a plurality of subway stations and a plurality of subway line segments, wherein each subway line segment is formed by connecting a plurality of subway stations;
step II, repeating this step to obtain the subway parameters of all subway stations within a plurality of radius ranges:
formula III is adopted to obtain the subway parameter S(u, t) of the u-th subway station within the radius range t, where the unit of t is m:
wherein U is the total number of subway stations obtained in step I, U is a positive integer, and $d_{uv}$ is the minimum subway line length from the u-th subway station to the v-th subway station, in m;
step III, acquiring subway parameters of each subway station within the same radius range, and acquiring first subway parameters corresponding to each subway station;
acquiring urban subway heat map grid data according to the first subway parameter corresponding to each subway station;
after converting the urban subway heat map grid data into point data, acquiring the subway parameter of each road section sampling point corresponding to the current radius range by using a neighbor calculation method;
calculating the correlation between the subway parameters of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained in the step1 to obtain a correlation numerical value corresponding to the current radius range;
step IV, repeating step III until a correlation value is obtained for each radius range, thereby obtaining a plurality of correlation values;
step V, selecting the radius range corresponding to the maximum correlation value to obtain an optimal radius range; and taking the subway parameter of each road section sampling point corresponding to the optimal radius range as the subway reachability parameter of each road section sampling point.
In this embodiment, three completed subway lines in Xi'an are first selected, the same-name stations are merged, and the lines are broken at the stations, so that the subway network is converted into a space syntax axial map; the spatial integration is then computed using the DepthMap space syntax software. Finally, with each subway station as the center, a reachability coverage area with a radius of 1500 m is constructed by the kernel density method and fused into a subway reachability heat map of the whole city, and the subway reachability of each sampling point is calculated by the neighbor calculation method; the processing result is shown in fig. 5.
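By way of illustration only, the merging of same-name stations and the breaking of lines at the stations may be sketched as follows. The sketch assumes that each line is given as an ordered list of named stations with planar coordinates and that a merged station takes the mean coordinate of its occurrences; both are assumptions, and the station names are hypothetical.

```python
def lines_to_segments(lines):
    """Merge same-name stations and break each line at every station.

    `lines` maps a line name to an ordered list of (station_name, (x, y)).
    Returns (stations, segments): `stations` maps each unique name to one
    coordinate (mean of all same-name occurrences, an assumed rule), and
    `segments` is a sorted list of unique station-name pairs, one per
    station-to-station link.
    """
    occurrences = {}
    for stops in lines.values():
        for name, xy in stops:
            occurrences.setdefault(name, []).append(xy)
    stations = {
        name: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for name, pts in occurrences.items()
    }
    segments = set()
    for stops in lines.values():
        names = [name for name, _ in stops]
        for a, b in zip(names, names[1:]):
            if a != b:
                # Undirected link: store the pair in a canonical order.
                segments.add(tuple(sorted((a, b))))
    return stations, sorted(segments)

# Hypothetical two-line network crossing at the station "Center".
lines = {
    "Line 1": [("North", (0, 0)), ("Center", (0, 10)), ("South", (0, 20))],
    "Line 2": [("West", (-10, 10)), ("Center", (0, 12)), ("East", (10, 10))],
}
stations, segments = lines_to_segments(lines)
```

Here the two occurrences of "Center" collapse into one station, and the two lines become four station-to-station segments that can serve as the elements of the axial map.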
In this embodiment, obtaining the city business activity parameters of the road segment sampling points specifically includes:
1. obtaining a map of a region to be estimated and commercial activity data of catering, entertainment and shopping, wherein the commercial activity data comprises attribute information such as the geographic position, name and category of commercial activities;
2. placing the obtained commercial POI data on the map according to coordinate position, and generating an urban commercial activity heat map;
3. converting the commercial activity heat map grid data into point data, performing neighbor analysis on the i-th road section sampling point, selecting the value of the commercial activity point nearest to it, and assigning that value to the sampling point as the urban commercial activity parameter of the i-th sampling point;
In this embodiment, commercial catering and shopping POI data of Xi'an for May 2018 are first collected, 151,819 records in total, including attribute information such as geographic location, name and category. Secondly, for simplicity, the influence of the various commercial POIs on pedestrians is assumed to be similar, regardless of differences in scale and category among the POIs. Further, the commercial activity heat map of the whole of Xi'an is calculated by the kernel density method (the buffer radius is set to 300 meters), as shown in fig. 6, and the commercial activity of each sampling point is then calculated by the neighbor calculation method.
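By way of illustration only, the generation of a kernel density surface from POI points and its assignment to sampling points may be sketched as follows. The sketch assumes a quartic kernel, a square grid and equal weight for every POI; the kernel shape, cell size and all coordinates are assumptions not given in this embodiment.

```python
import math

def kernel_density_grid(points, radius_m, cell_m, extent):
    """Rasterise point events into a density grid with a quartic kernel.

    Returns a list of (cell_center_x, cell_center_y, density) tuples,
    i.e. the grid data already converted to point data.
    """
    xmin, ymin, xmax, ymax = extent
    cells = []
    y = ymin + cell_m / 2
    while y < ymax:
        x = xmin + cell_m / 2
        while x < xmax:
            density = 0.0
            for px, py in points:
                d = math.hypot(x - px, y - py)
                if d < radius_m:
                    # Quartic kernel: weight falls to zero at the buffer radius.
                    density += (1 - (d / radius_m) ** 2) ** 2
            cells.append((x, y, density))
            x += cell_m
        y += cell_m
    return cells

def sample_nearest(cells, sample_points):
    """Assign each sampling point the density of its nearest grid cell center."""
    values = []
    for sx, sy in sample_points:
        _, _, density = min(cells, key=lambda c: math.hypot(c[0] - sx, c[1] - sy))
        values.append(density)
    return values

# Hypothetical POIs: two close together, one isolated; 300 m buffer radius.
pois = [(100.0, 100.0), (120.0, 110.0), (900.0, 900.0)]
cells = kernel_density_grid(pois, radius_m=300.0, cell_m=100.0, extent=(0.0, 0.0, 1000.0, 1000.0))
samples = [(110.0, 105.0), (500.0, 500.0), (880.0, 910.0)]
activity = sample_nearest(cells, samples)
```

The sampling point beside the two clustered POIs receives the highest value, the point beside the isolated POI a lower one, and the point more than 300 m from every POI receives zero.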
Step 3, taking the environment variable set as input and the label set as output, training a neural network model, and obtaining a crowd concentration prediction model of the area to be estimated.
In this embodiment, the neural network model is a random forest prediction model. First, the number of trees (N) and the tree depth (M) of the random forest are initialized, where N is 500 (the default) and M is 4. Secondly, 70% of the data in the obtained parameter matrix is taken as the training set, and N (500) sub-training sets are randomly drawn with replacement using the Bootstrap resampling technique. Finally, the 500 extracted sub-training sets are learned independently to generate 500 decision trees, which form the required random forest regression model.
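By way of illustration only, the Bootstrap resampling and independent tree learning described above may be sketched with the standard library as follows. This is a simplified stand-in, not the implementation used in this embodiment: it bags depth-limited regression trees without per-split feature subsampling, uses synthetic data, and grows 25 trees in the demonstration to keep it fast (the function default remains 500). All names are hypothetical.

```python
import random

def build_tree(rows, targets, depth, max_depth, min_size=2):
    """Grow a regression tree by greedy splits that minimise squared error."""
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_size:
        return mean  # leaf: mean target of the node
    best = None  # (sse, feature, threshold, left_idx, right_idx)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < threshold]
            right = [i for i, r in enumerate(rows) if r[f] >= threshold]
            if len(left) < min_size or len(right) < min_size:
                continue
            sse = 0.0
            for idx in (left, right):
                m = sum(targets[i] for i in idx) / len(idx)
                sse += sum((targets[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, left, right)
    if best is None:
        return mean
    _, f, threshold, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                  depth + 1, max_depth, min_size)
    return (f, threshold, grow(left), grow(right))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node

def fit_forest(rows, targets, n_trees=500, max_depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx], 0, max_depth))
    return forest

def predict_forest(forest, row):
    """Final prediction: average of the individual tree predictions."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)

# Synthetic stand-in for the parameter matrix: four environment variables per row.
data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(4)] for _ in range(100)]
targets = [2.0 * r[0] + r[1] for r in rows]
train_rows, train_targets = rows[:70], targets[:70]   # 70% training set
valid_rows, valid_targets = rows[70:], targets[70:]   # 30% verification set
forest = fit_forest(train_rows, train_targets, n_trees=25, max_depth=4)
predictions = [predict_forest(forest, r) for r in valid_rows]
```

In practice a library implementation of random forest regression would normally be preferred; the sketch only makes the resampling, the depth limit and the averaging explicit.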
Example two
A crowd concentration degree prediction method based on an urban space structure is implemented according to the following steps:
Step 1, obtaining a walking road network of the area to be estimated, and obtaining the environmental parameters of each road section sampling point in the area to be estimated by the method of step 2 of the crowd concentration prediction model construction method based on the urban space structure in the first embodiment, to obtain an environmental parameter set;
Step 2, inputting the environmental parameter set obtained in Step 1 into the crowd concentration prediction model constructed by the crowd concentration prediction model construction method based on the urban space structure in the first embodiment, and obtaining the crowd concentration.
In this embodiment, the four environmental variables (walking convenience, bus accessibility, subway accessibility and urban commercial activity) of the remaining 30% of the data in the first embodiment, which serve as the verification set, are used as inputs to obtain predicted crowd concentration values and to calculate the accuracy of the model. The specific steps are as follows:
firstly, on the basis of the obtained random forest, the remaining 30% of the data is used as the verification set, and the four environmental variables of walking convenience, bus accessibility, subway accessibility and urban commercial activity are input to obtain predicted crowd concentration data; secondly, the average of the crowd concentration values predicted by the individual decision trees is taken as the final predicted value. Finally, the mean square error (MSE) and the coefficient of determination ($R^2$) are calculated from the predicted and actual values to judge the accuracy and reference value of the model, using the following formulas:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Q_i - \hat{Q}_i\right)^2$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(Q_i - \hat{Q}_i\right)^2}{\sum_{i=1}^{n}\left(Q_i - \bar{Q}\right)^2}$$
in the formulas, $Q_i$ is the actual value, $\hat{Q}_i$ is the predicted value, $\bar{Q}$ is the mean of the actual values, and n is the number of samples in the verification set.
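By way of illustration only, the two evaluation indexes may be computed as follows, assuming the standard definitions of the mean square error and the coefficient of determination; the sample values are hypothetical and are not results of this embodiment.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical actual and predicted crowd concentration values.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
```

For these four values the single error of 1 gives an MSE of 0.25, and the residual sum of squares of 1 against a total sum of squares of 5 gives a coefficient of determination of about 0.8.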
The model precision of the random forest regression model is judged from the mean square error (MSE) and the coefficient of determination ($R^2$): the smaller the MSE, the higher the model precision, and the closer $R^2$ is to 1, the stronger the reference value of the model. The calculation results are shown in table 1. The coefficients of determination of the random forest regression models for all time periods are higher than 95% and the mean square errors are less than 0.02, so the random forest regression models established here achieve high prediction accuracy while retaining high reference value. The model is therefore sufficient to predict the spatial distribution of crowds across a city from the relevant data, and is well founded for predicting the actual crowd concentration.
TABLE 1 statistical table of prediction results of random forest regression models in different time periods
To allow the relationship between the actual and predicted values of the crowd concentration to be observed visually, a scatter diagram of the two is shown in fig. 7; the distributions of the actual and predicted values show high consistency. Therefore, the random forest regression model trained on the four kinds of basic data for Xi'an performs well on both the accuracy and the coefficient of determination indexes for crowd concentration prediction, with the coefficient of determination exceeding 95%, and the prediction model established by random forest regression can predict the crowd distribution within the city quite accurately from the basic data.
Example three
A crowd concentration degree prediction model building device based on an urban space structure comprises a data obtaining module, a data preprocessing module and a model construction module, wherein:
the data obtaining module is used for acquiring a walking road network of the area to be estimated, uniformly segmenting the walking road network and acquiring a plurality of road section sampling points;
and for acquiring the pedestrian concentration of each road section sampling point to obtain a label set;
the data preprocessing module is used for acquiring the environment variable of each road section sampling point and acquiring a sampling point environment parameter set;
the environment variables comprise a road network reachability parameter, a public transport reachability parameter, a subway reachability parameter and an urban commercial activity parameter;
the model construction module is used for taking the environment variable set as input and the label set as output, training the neural network model and obtaining a crowd concentration prediction model of the area to be estimated.
Optionally, the data preprocessing module includes a road network reachability parameter calculation sub-module for acquiring a road network reachability parameter of each road segment sampling point;
the road network reachability parameter calculation submodule comprises a road network segmentation unit, a road network parameter calculation unit, a road network correlation calculation unit and a road network reachability parameter acquisition unit;
the road network segmentation unit is used for uniformly segmenting the walking road network to obtain a plurality of line segment nodes;
the road network parameter calculation unit is used for obtaining the road network parameters of each line segment node within a plurality of radius ranges, wherein the road network parameter $I(d_i, r)$ of the i-th line segment node $d_i$ within the radius range r is obtained by adopting formula I, and the unit of r is m:
wherein $d_j$ denotes the j-th line segment node, i ≠ j, j = 1, 2, …, J, J is the total number of all line segment nodes obtained in step A other than the i-th line segment node $d_i$, J is a positive integer, and the distance term in formula I denotes the shortest distance from the i-th line segment node $d_i$ to the j-th line segment node $d_j$, in m;
the road network correlation calculation unit is used for obtaining road network parameters of each line segment node in the same radius range and obtaining a first road network parameter corresponding to each line segment node;
assigning a first road network parameter corresponding to each line segment node to all the road segment sampling points obtained in the step1 by using a neighbor calculation method, and obtaining the road network parameter of each road segment sampling point corresponding to the current radius range;
calculating the correlation between the road network parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data obtaining module, to obtain a correlation value corresponding to the current radius range;
obtaining a correlation value corresponding to each radius range, and obtaining a plurality of correlation values;
the road network reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value to obtain the optimal radius range; and taking the road network parameter of each road section sampling point corresponding to the optimal radius range as the road network reachability parameter of each road section sampling point.
Optionally, the data preprocessing module further comprises a bus reachability parameter calculation submodule for acquiring a bus reachability parameter of each road section sampling point;
the bus reachability parameter calculation submodule comprises a bus segmentation unit, a bus parameter calculation unit, a bus correlation calculation unit and a bus reachability parameter acquisition unit;
the bus segmentation unit is used for acquiring a bus route of an area to be estimated, merging the same-name bus stops in the bus route and reconstructing topology to obtain a topological graph;
the topological graph comprises a plurality of bus stops and a plurality of bus line segments, wherein the bus line segments are formed by connecting a plurality of bus stops;
the bus parameter calculation unit is used for obtaining bus parameters of all bus stops within a plurality of radius ranges, wherein the bus parameters B (p, R) of the p-th bus stop within the radius range R are obtained by adopting a formula II, and the unit of R is m:
wherein N is the total number of bus stops obtained in step a, N is a positive integer, $k_p$ is the number of bus line segments passing through the p-th bus stop, $k_p$ is a positive integer, and $d_{pq}$ is the minimum bus line length from the p-th bus stop to the q-th bus stop, in m;
the bus correlation calculation unit is used for obtaining bus parameters of each bus stop in the same radius range and obtaining first bus parameters corresponding to each bus stop;
acquiring urban public transport heat map grid data according to the first bus parameter corresponding to each bus stop;
after converting the urban public transport heat map grid data into point data, acquiring the bus parameter of each road section sampling point corresponding to the current radius range by using a neighbor calculation method;
calculating the correlation between the bus parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data obtaining module, to obtain a correlation value corresponding to the current radius range;
repeating the above until a correlation value is obtained for each radius range, thereby obtaining a plurality of correlation values;
the bus reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value to obtain the optimal radius range; and taking the bus parameter of each road section sampling point corresponding to the optimal radius range as the bus reachability parameter of each road section sampling point.
Optionally, the data preprocessing module further includes a subway reachability parameter calculation submodule for acquiring a subway reachability parameter of each road section sampling point;
the subway reachability parameter calculation submodule comprises a subway segmentation unit, a subway parameter calculation unit, a subway correlation calculation unit and a subway reachability parameter acquisition unit;
the subway segmentation unit is used for acquiring a subway line map of the area to be estimated, merging the same-name stations in the subway line map and then breaking the lines at the stations to obtain a space syntax axial map;
the space syntax axial map comprises a plurality of subway stations and a plurality of subway line segments, wherein each subway line segment is formed by connecting a plurality of subway stations;
the subway parameter calculation unit is used for obtaining the subway parameters of all subway stations within a plurality of radius ranges, wherein the subway parameter S(u, t) of the u-th subway station within the radius range t is obtained by adopting formula III, and the unit of t is m:
wherein U is the total number of subway stations obtained in step I, U is a positive integer, and $d_{uv}$ is the minimum subway line length from the u-th subway station to the v-th subway station, in m;
the subway correlation calculation unit is used for obtaining subway parameters of each subway station within the same radius range and obtaining a first subway parameter corresponding to each subway station;
acquiring urban subway heat map grid data according to the first subway parameter corresponding to each subway station;
after converting the urban subway heat map grid data into point data, acquiring the subway parameter of each road section sampling point corresponding to the current radius range by using a neighbor calculation method;
calculating the correlation between the subway parameter of each road section sampling point corresponding to the current radius range and the pedestrian concentration of each road section sampling point obtained by the data obtaining module, to obtain a correlation value corresponding to the current radius range;
repeating the above until a correlation value is obtained for each radius range, thereby obtaining a plurality of correlation values;
the subway reachability parameter obtaining unit is used for selecting the radius range corresponding to the maximum correlation value to obtain the optimal radius range; and taking the subway parameter of each road section sampling point corresponding to the optimal radius range as the subway reachability parameter of each road section sampling point.
Example four
A crowd concentration degree prediction device based on an urban space structure comprises a data acquisition module, a crowd concentration degree prediction model construction device based on the urban space structure in the third embodiment and a prediction module;
the data acquisition module is used for acquiring a walking road network of the area to be estimated;
the data acquisition module is also used for inputting the road network of the area to be estimated into the data preprocessing module of the crowd concentration prediction model building device based on the urban space structure in the third embodiment, to obtain the environmental parameters of each road section sampling point in the area to be estimated and obtain an environmental parameter set;
the crowd concentration degree prediction model construction device based on the urban space structure is used for obtaining a crowd concentration degree prediction model;
the prediction module is used for inputting the environment parameter set into the crowd concentration prediction model to obtain the crowd concentration.
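By way of illustration only, the cooperation of the modules of the prediction device may be sketched as follows. The class and the three stand-in functions are hypothetical placeholders for the data acquisition module, the data preprocessing module and the trained prediction model; they do not implement the parameter calculations of the third embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]

@dataclass
class CrowdConcentrationPredictionDevice:
    """Chains the modules of the prediction device; every callable is a placeholder."""
    acquire_sampling_points: Callable[[Sequence[Segment]], List[Point]]  # data acquisition
    compute_environment: Callable[[List[Point]], List[List[float]]]      # data preprocessing
    model: Callable[[List[float]], float]                                # trained model

    def predict(self, road_network: Sequence[Segment]) -> List[float]:
        points = self.acquire_sampling_points(road_network)
        environment_set = self.compute_environment(points)
        # Prediction module: one crowd concentration value per sampling point.
        return [self.model(row) for row in environment_set]

def midpoints(network):
    """Stand-in sampler: one sampling point at the midpoint of each road segment."""
    return [((x1 + x2) / 2, (y1 + y2) / 2) for (x1, y1), (x2, y2) in network]

def toy_environment(points):
    """Stand-in preprocessing: four placeholder environment parameters per point."""
    return [[x / 100.0, y / 100.0, 0.0, 0.0] for x, y in points]

def toy_model(row):
    """Stand-in model: sum of the environment parameters."""
    return sum(row)

device = CrowdConcentrationPredictionDevice(midpoints, toy_environment, toy_model)
network = [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 200.0))]
predicted = device.predict(network)
```

Keeping the modules as interchangeable callables mirrors the device structure, in which the preprocessing module and the trained model of the third embodiment are reused unchanged by the prediction device.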