CN111210111A

CN111210111A - Urban environment assessment method and system based on online learning and crowdsourcing data analysis

Info

Publication number: CN111210111A
Application number: CN201911332515.0A
Authority: CN
Inventors: 张一杨; 马小雯; 舒元昊; 张惠根; 林兴萍
Original assignee: CETHIK Group Ltd
Current assignee: CETHIK Group Ltd
Priority date: 2019-12-22
Filing date: 2019-12-22
Publication date: 2020-05-29
Anticipated expiration: 2039-12-22
Also published as: CN111210111B

Abstract

The invention discloses an urban environment assessment method and system based on online learning and crowdsourced data analysis. The method comprises the following steps: collecting urban environment assessment data from a crowdsourcing data platform, and preprocessing the collected data with a crowdsourcing algorithm; constructing an urban environment assessment model, and training it with the preprocessed urban environment assessment data; and collecting new model training data, constructing an online learning algorithm, periodically optimizing the urban environment assessment model with the online learning algorithm based on the new model training data, and using the latest urban environment assessment model to output the city impression attribute comparison result to be evaluated. The method and system fully account for differences in how different groups of people evaluate the urban environment and can update the assessment model in real time. They are also highly general and markedly improve the accuracy of the assessment results.

Description

Urban environment assessment method and system based on online learning and crowdsourcing data analysis

Technical Field

The present application belongs to the field of smart cities, and in particular relates to an urban environment assessment method and system based on online learning and crowdsourced data analysis.

Background

The expansion of urban scale and accelerating urbanization have brought great challenges to urban development, and smart cities have emerged in response. The urban environment, i.e., the natural and man-made external conditions that affect human activities in a city, is one of the important indicators of urban development. Urban environment assessment is an important part of smart city research. Its ultimate purposes are to improve the urban environment, raise citizen satisfaction, support relevant policy making and achieve sustainable urban development.

The urban environment includes the economic, social, ecological and aesthetic environment, among others. Urban environment assessment has traditionally relied on field surveys, which are costly and inefficient, and which make macroscopic analysis difficult. With advances in sensor technology and the arrival of the big-data era, new data acquisition methods supply large numbers of street view pictures for urban environment assessment research, and street view assessment has become a new branch of urban environment assessment. Street view pictures cover a wide area, come in large volumes and are rich in detail, so they can reflect the state of a city at both the micro and macro level. Meanwhile, rapidly developing deep learning techniques, especially computer vision models, are now being applied to urban environment assessment. This reduces the cost of image feature extraction, makes the assessment more realistic, and broadens the application of smart cities.

Street-view-based urban environment assessment (street view assessment) has recently become a new direction in smart city research. Current research focuses mainly on improving how accurately a model scores urban perception from pairs of street view pictures. It neglects the fact that urban environment assessment is highly subjective and that annotation results differ greatly between individuals, so current assessment results vary widely and have low accuracy. Moreover, existing assessment systems cannot update their models in real time, cannot cope with changing street view data, and generalize poorly.

Disclosure of Invention

The present application aims to provide an urban environment assessment method and system based on online learning and crowdsourced data analysis. The method and system fully account for differences in how different groups of people assess the environment and can update the assessment model in real time. They are also highly general and markedly improve the accuracy of the assessment results.

In order to achieve the purpose, the technical scheme adopted by the application is as follows:

An urban environment assessment method based on online learning and crowdsourced data analysis comprises the following steps:

step S1, collecting urban environment assessment data from the crowdsourcing data platform, and preprocessing the collected data with a crowdsourcing algorithm;

step S2, constructing an urban environment assessment model, and training it with the preprocessed urban environment assessment data;

step S3, collecting new model training data and constructing an online learning algorithm. The online learning algorithm periodically optimizes the urban environment assessment model based on the new model training data, and the latest urban environment assessment model outputs the city impression attribute comparison result to be evaluated.

Preferably, collecting urban environment assessment data from the crowdsourcing data platform comprises:

step S1.1, releasing pairs of street view pictures on the crowdsourcing data platform;

step S1.2, receiving the comparison result given by each annotator on the crowdsourcing data platform for each pair of street view pictures;

step S1.3, taking each comparison result as the label generated by the current annotator for the current pair of street view pictures, and establishing a one-to-one correspondence among the annotator, the pair of street view pictures and the label.
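The annotator/pair/label correspondence of step S1.3 can be held as a list of triples and converted into the per-annotator count tensor used by the EM preprocessing. The following is a minimal sketch only. It assumes that each comparison arrives as a hypothetical `(annotator_id, pair_id, label)` tuple with labels 1..I. The names `Record` and `build_counts` are illustrative and do not come from the original.

```python
from typing import List, Tuple

# Hypothetical record layout: one (annotator_id, pair_id, label) tuple per
# comparison received from the crowdsourcing platform; labels are 1..I.
Record = Tuple[str, str, int]


def build_counts(records: List[Record], num_classes: int):
    """Index annotators and pairs, and count labels per (annotator, pair).

    Returns (annotators, pairs, counts) where counts[k][n][b] is how many
    times annotator k gave pair n the label b + 1 (0-based class index b).
    """
    annotators = sorted({a for a, _, _ in records})
    pairs = sorted({p for _, p, _ in records})
    a_idx = {a: k for k, a in enumerate(annotators)}
    p_idx = {p: n for n, p in enumerate(pairs)}
    counts = [[[0] * num_classes for _ in pairs] for _ in annotators]
    for annotator, pair, label in records:
        if not 1 <= label <= num_classes:
            raise ValueError(f"label {label} is outside 1..{num_classes}")
        counts[a_idx[annotator]][p_idx[pair]][label - 1] += 1
    return annotators, pairs, counts
```

Annotators that never saw a given pair simply keep an all-zero row for it. This matches the counting convention of the EM initialization.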

Preferably, preprocessing the collected urban environment assessment data with a crowdsourcing algorithm comprises:

step S1.4, setting the total number of samples to be labeled as N, i.e., the total number of pairs of street view pictures is N. The comparison results, i.e., labels, of K annotators are obtained from the crowdsourcing data platform, and the labels fall into I classes in total;

step S1.5, for the nth sample, the following relation holds:

$$\sum_{i=1}^{I} P(T_n = i) = 1$$

wherein $P(T_n = i)$ represents the probability that the true label $T_n$ of the nth sample is i, denoted $p_i = P(T_n = i)$; i represents label class i, and $i \in \{1, \dots, I\}$;

step S1.6, each annotator corresponds to an I × I confusion matrix, and the confusion matrix of the kth annotator is denoted $\pi^{(k)}$. Its entry $\pi^{(k)}_{ab}$ denotes the probability that the kth annotator labels a sample whose true label is a as b, where a denotes label class a and $a \in \{1, \dots, I\}$, b denotes label class b and $b \in \{1, \dots, I\}$, and

$$\sum_{b=1}^{I} \pi^{(k)}_{ab} = 1;$$

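step S1.7, constructing the conditional probability:

$$Q_{ni} = P(T_n = i \mid S_n, \mathbf{p}, \pi) = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$

wherein the symbols are defined as follows:
- $S_n$ represents the nth sample;
- $Q_{ni}$ represents the probability, under the current parameters, that the true label of the nth sample is i;
- $\mathbf{p} = \{p_1, \dots, p_i, \dots, p_I\}$ represents the set of probabilities that the true label of each sample is 1, …, i, …, I, where $p_i$ is the probability that the true label of the nth sample is i;
- π represents the confusion matrices of the annotators;
- $n^{(k)}_{nb}$ is the number of times the kth annotator labeled the nth sample as b.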

step S1.8, calculating with a crowdsourcing algorithm the set $\mathbf{p}$ of probabilities that the true label of each sample is 1, …, i, …, I, and the confusion matrix π of each annotator. From the computed $\mathbf{p}$ and π, the conditional probability formula for $Q_{ni}$ gives the conditional probability of each sample for every possible comparison result. The comparison result with the largest conditional probability, $\hat{T}_n = \arg\max_i Q_{ni}$, is selected as the final label of the sample.

Preferably, calculating with the crowdsourcing algorithm the probability set $\mathbf{p}$ that the true label of each sample is i, together with the confusion matrix π of each annotator, comprises:

the crowdsourcing algorithm adopted is the EM algorithm, which calculates $\mathbf{p}$ and π as follows:

the initial value of $Q_{ni}$ is defined as:

$$Q_{ni} = \frac{\sum_{k=1}^{K} n^{(k)}_{ni}}{\sum_{k=1}^{K} \sum_{b=1}^{I} n^{(k)}_{nb}}$$

wherein k denotes the annotator index, $\sum_{k=1}^{K} n^{(k)}_{ni}$ is the number of times all annotators labeled the nth sample as i, and $n^{(k)}_{nb}$ is the number of times the kth annotator labeled the nth sample as b. The initial value of $Q_{ni}$ is therefore the number of times all annotators labeled the nth sample as i, divided by the total number of labels all annotators gave the nth sample;
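As a worked illustration with hypothetical numbers, suppose I = 3 and five annotators each label the nth pair once. Three of them choose class 1, one chooses class 2 and one chooses class 3. The initial values are then simply the label shares:

$$Q_{n1} = \frac{3}{5} = 0.6, \qquad Q_{n2} = \frac{1}{5} = 0.2, \qquad Q_{n3} = \frac{1}{5} = 0.2$$

These coincide with majority-vote proportions. The subsequent E and M steps reweight them by each annotator's estimated reliability.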

the M step is defined as follows: constructing an auxiliary function, namely the expected complete-data log-likelihood under the current $Q_{ni}$, and maximizing it by maximum likelihood estimation to update the parameters $\mathbf{p}$ and π;
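The explicit form of the auxiliary function did not survive extraction. The expression below assumes it takes the standard Dawid-Skene form. That form matches the definitions of $p_i$, $\pi^{(k)}_{ab}$ and $Q_{ni}$ above, but it is an assumption rather than the filing's exact expression. Here $n^{(k)}_{nb}$ is the number of times annotator k labeled sample n as b:

$$\mathcal{L}(\mathbf{p}, \pi) = \sum_{n=1}^{N} \sum_{i=1}^{I} Q_{ni} \Big[ \log p_i + \sum_{k=1}^{K} \sum_{b=1}^{I} n^{(k)}_{nb} \log \pi^{(k)}_{ib} \Big]$$

Maximizing it under the constraints $\sum_i p_i = 1$ and $\sum_b \pi^{(k)}_{ab} = 1$ (for example with Lagrange multipliers) gives the closed-form updates:

$$p_i = \frac{1}{N} \sum_{n=1}^{N} Q_{ni}, \qquad \pi^{(k)}_{ab} = \frac{\sum_{n=1}^{N} Q_{na}\, n^{(k)}_{nb}}{\sum_{b'=1}^{I} \sum_{n=1}^{N} Q_{na}\, n^{(k)}_{nb'}}$$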

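the E step is defined as follows: updating $Q_{ni}$ from $\mathbf{p}$ and π according to Bayes' formula:

$$Q_{ni} = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$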

the E step and the M step are executed alternately until the termination condition of the EM algorithm is met, yielding the final parameters $\mathbf{p}$ and π.
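The EM procedure above can be sketched in plain Python as follows. The sketch assumes the standard Dawid-Skene closed-form M step and uses a convergence threshold on the change in $Q_{ni}$ as the end condition. The original does not state its end condition, so the threshold, the tiny smoothing constant and the names `dawid_skene` and `final_labels` are all illustrative assumptions. Classes are 0-based inside the code.

```python
import math


def dawid_skene(counts, num_classes, max_iter=100, tol=1e-6):
    """Sketch of Dawid-Skene EM: returns (p, pi, Q).

    counts[k][n][b]: times annotator k labelled sample n as class b (0-based).
    p[i]: class prior; pi[k][a][b]: annotator k's confusion matrix;
    Q[n][i]: posterior probability that sample n's true class is i.
    """
    K, N, I = len(counts), len(counts[0]), num_classes
    eps = 1e-9  # small smoothing (an addition) so that log(0) never occurs

    # Initialisation: share of all labels on sample n that equal class i.
    Q = []
    for n in range(N):
        totals = [sum(counts[k][n][i] for k in range(K)) for i in range(I)]
        s = sum(totals)
        Q.append([t / s if s else 1.0 / I for t in totals])

    for _ in range(max_iter):
        # M step: closed-form maximisers of the expected log-likelihood.
        p = [sum(Q[n][i] for n in range(N)) / N for i in range(I)]
        pi = []
        for k in range(K):
            matrix = []
            for a in range(I):
                row = [eps + sum(Q[n][a] * counts[k][n][b] for n in range(N))
                       for b in range(I)]
                z = sum(row)
                matrix.append([v / z for v in row])
            pi.append(matrix)

        # E step: Bayes' rule in log space, then normalise each row.
        new_Q = []
        for n in range(N):
            logs = []
            for i in range(I):
                lp = math.log(p[i] + eps)
                for k in range(K):
                    for b in range(I):
                        lp += counts[k][n][b] * math.log(pi[k][i][b])
                logs.append(lp)
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            z = sum(w)
            new_Q.append([v / z for v in w])

        change = max(abs(new_Q[n][i] - Q[n][i])
                     for n in range(N) for i in range(I))
        Q = new_Q
        if change < tol:  # assumed end condition: posteriors have settled
            break
    return p, pi, Q


def final_labels(Q):
    """Step S1.8: pick the class with the largest posterior (1-based)."""
    return [max(range(len(q)), key=q.__getitem__) + 1 for q in Q]
```

Working in log space avoids underflow when a pair has many annotations. The returned `pi` lets an operator spot annotators whose confusion matrix is close to uniform, i.e., likely random labelers.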

Preferably, constructing the urban environment assessment model comprises:

step S2.1, constructing a twin network consisting of two city impression scoring models that share the same weights. Each scoring model takes one picture of a pair of street view pictures as input and outputs the city impression attribute score of that picture. The city impression scoring model comprises a computer vision model that extracts features from the street view picture and a fully connected layer that outputs the city impression attribute score from the extracted features;

step S2.2, constructing a logistic regression model on top of the twin network, the twin network and the logistic regression model together forming the urban environment assessment model. The logistic regression model takes as input the difference between the two city impression attribute scores output by the twin network. Its output is the probability that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second street view picture. The dependent variable of the logistic regression model takes the values 0, 0.5 and 1, i.e., the target output is represented by 0, 0.5 and 1.
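The shared-weight scoring plus logistic regression on the score difference can be sketched without any deep learning library. In this sketch, a single linear layer over pre-extracted feature vectors stands in for the computer vision model and fully connected layer. The parameter names `lr_w` and `lr_b` are hypothetical labels for the logistic-regression weight and bias and are not taken from the original.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def impression_score(features, weights, bias):
    """One branch of the twin network. Here a single fully connected layer
    maps pre-extracted image features (standing in for a CNN backbone such
    as VGG) to a scalar city impression attribute score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


def compare_pair(feat_first, feat_second, weights, bias, lr_w=1.0, lr_b=0.0):
    """Probability that the first picture's subjective feeling is stronger.

    Both branches call impression_score with the SAME weights (shared
    weights), and the logistic regression takes the score difference.
    """
    s1 = impression_score(feat_first, weights, bias)
    s2 = impression_score(feat_second, weights, bias)
    return sigmoid(lr_w * (s1 - s2) + lr_b)
```

Because the branches share weights, the scorer's bias cancels in the difference. With `lr_b = 0`, swapping the two pictures turns a probability y into 1 − y, which is consistent with the 0 / 0.5 / 1 coding of the dependent variable.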

Preferably, collecting new model training data comprises:

1) collecting new urban environment assessment data from the crowdsourcing data platform;

2) collecting user feedback data, i.e., the user's judgment of whether the city impression attribute comparison result output by the urban environment assessment model agrees with the user's own subjective impression.

Preferably, constructing an online learning algorithm, and using it to periodically optimize the urban environment assessment model based on the new model training data, comprises:

step S3.1, let x denote the input of the urban environment assessment model, i.e., the pair of street view pictures fed into the twin network. Let f denote the urban environment assessment model and θ its parameters. Let $y_p$ denote the model output, i.e., the probability output by the logistic regression model that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second. The urban environment assessment model can then be expressed as:

$$y_p = f(x \mid \theta)$$

step S3.2, establishing a loss function $L(y_p, y_t)$ between the model output and $y_t$, where $y_t$ represents the actual city impression attribute comparison result of the pair of street view pictures;

step S3.3, differentiating the loss function to obtain the gradient values ξ of all parameters, and updating the parameters of the urban environment assessment model to θ′ by stochastic gradient descent:

$$\theta' = \theta - \eta\,\xi$$

where η is the learning rate.
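The concrete loss function did not survive extraction. The following assumes a binary cross-entropy loss, which accepts the soft target 0.5 as well as 0 and 1. Under that assumption, the gradient in step S3.3 becomes particularly simple when $y_p = \sigma(z)$, with z the logistic-regression input. For example, $z = w(s_1 - s_2) + c$, where $s_1, s_2$ are the two twin-network scores and $w, c$ are hypothetical logistic-regression parameters:

$$L = -\big[y_t \log y_p + (1 - y_t)\log(1 - y_p)\big], \qquad \frac{\partial L}{\partial z} = y_p - y_t$$

$$\xi = \frac{\partial L}{\partial \theta} = (y_p - y_t)\,\frac{\partial z}{\partial \theta}, \qquad \theta' = \theta - \eta\,\xi$$

Under this choice, a pair whose prediction already equals $y_t$ produces no parameter update.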

The present application also provides an urban environment assessment system based on online learning and crowdsourced data analysis, comprising a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the urban environment assessment method based on online learning and crowdsourced data analysis according to any of the above technical solutions.

In this urban environment assessment method and system, the crowdsourcing algorithm accounts for differences between annotators and thereby improves the reliability of the annotation data. The online learning algorithm periodically optimizes the assessment model, making the model highly general and markedly improving the accuracy of the assessment results.

Drawings

Fig. 1 is a flowchart of an urban environment assessment method based on online learning and crowd-sourced data analysis according to the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It should be understood that steps in this application are not limited to being performed in the exact order described, and that steps may be performed in other orders, unless explicitly stated otherwise. Moreover, at least a portion of the steps may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, and the order of performance of the sub-steps or stages is not necessarily sequential, but may be performed in turn or alternating with other steps or at least a portion of the sub-steps or stages of other steps.

In one embodiment, an urban environment assessment method based on online learning and crowdsourced data analysis is provided. It uses visual information in street view images, such as the urban landscape, to obtain people's subjective impressions and evaluations of a city, and these serve as important indicators for urban construction.

As shown in fig. 1, the urban environment assessment method based on online learning and crowd-sourced data analysis includes:

Step S1: collecting urban environment assessment data from the crowdsourcing data platform, and preprocessing the collected data with a crowdsourcing algorithm.

Data collected through a crowdsourcing data platform are comprehensive and come from a broad population, which benefits model training. Collecting data through the crowdsourcing data platform mainly involves the following steps:

Step S1.1: releasing pairs of street view pictures on the crowdsourcing data platform. The pictures are released together with a specific city impression attribute (such as safety or beauty). Annotators are asked to compare, with respect to that attribute, the degree of subjective feeling produced by each picture of the pair.

Step S1.2: receiving the comparison result given by each annotator on the crowdsourcing data platform for each pair of street view pictures.

To make the comparison results comprehensive, in one embodiment they are divided into three categories, i.e., the labels have three classes. Calling one picture of the pair the first street view picture and the other the second street view picture, the three categories are:

- label category 1: the subjective feeling produced by the first street view picture is stronger than that produced by the second street view picture;
- label category 2: the subjective feeling produced by the first street view picture is equal in degree to that produced by the second;
- label category 3: the subjective feeling produced by the first street view picture is weaker than that produced by the second.

In addition, to make the collected data comprehensive, an annotator's subjective feeling may be shaped by the annotator's own aesthetic standards or by specific professional knowledge (such as industry evaluation standards).

The data collected on the crowdsourcing data platform involve a wide range of people, and multiple annotators label the same pair of street view pictures. The same pair therefore receives multiple comparison results, which may agree or differ, and the final label of each pair must be determined by integrating this information.

In the prior art, "majority voting" is usually used, but it ignores differences between individual annotators. Such differences could adversely affect the output of the final assessment model. To prevent this, in one embodiment a crowdsourcing algorithm models both the annotators' abilities and the comparison results, and the EM algorithm solves for the model parameters to complete the data preprocessing. The crowdsourcing algorithm also accounts for malicious annotators who deliberately label at random. The annotator-ability model can identify such annotators and so reduce their influence on the comparison results.

Preprocessing the collected urban environment assessment data with the crowdsourcing algorithm comprises the following steps:

Step S1.4: setting the total number of samples to be labeled as N, i.e., the total number of pairs of street view pictures is N. The comparison results, i.e., labels, of K annotators are obtained from the crowdsourcing data platform, and the labels fall into I classes in total. Since the comparison results in this embodiment fall into the three categories described above, I = 3.
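The difference between majority voting and modeling annotator ability shows up even for a single pair. This sketch fixes hypothetical confusion matrices for one reliable annotator and two annotators who answer at random, and uses a uniform prior. It then compares majority voting with the Bayes posterior used in the crowdsourcing model. The names `majority_vote` and `posterior` and all numbers are illustrative.

```python
from collections import Counter


def majority_vote(labels):
    """Most frequent label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def posterior(labels, annotators, prior, confusion):
    """Posterior over the true class of one pair via Bayes' rule.

    labels[j] is the 0-based class chosen by annotators[j];
    confusion[a][t][b] = P(annotator a answers b | true class t).
    """
    weights = []
    for t, p_t in enumerate(prior):
        w = p_t
        for a, b in zip(annotators, labels):
            w *= confusion[a][t][b]
        weights.append(w)
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical confusion matrices: "A" is reliable, "B" and "C" answer at random.
reliable = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]
random_guess = [[1 / 3] * 3 for _ in range(3)]
confusion = {"A": reliable, "B": random_guess, "C": random_guess}

labels = [0, 2, 2]  # A says category 1; B and C say category 3
print(majority_vote(labels))  # majority vote picks index 2 (category 3)
post = posterior(labels, ["A", "B", "C"], [1 / 3] * 3, confusion)
print(max(range(3), key=post.__getitem__))  # the posterior picks index 0
```

The random annotators' answers are equally likely under every true class, so they carry no evidence. The posterior is driven entirely by the reliable annotator, giving 0.9 for category 1, while majority voting follows the two uninformative labels.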

Step S1.5: for the nth sample, the following relation holds:

$$\sum_{i=1}^{I} P(T_n = i) = 1$$

where $P(T_n = i)$ is the probability that the true label $T_n$ of the nth sample is i, denoted $p_i = P(T_n = i)$; i represents label class i, and $i \in \{1, \dots, I\}$.

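Step S1.6: each annotator corresponds to an I × I confusion matrix, and the confusion matrix of the kth annotator is denoted $\pi^{(k)}$. Its entry $\pi^{(k)}_{ab}$ is the probability that the kth annotator labels a sample whose true label is a as b, with $a, b \in \{1, \dots, I\}$ and $\sum_{b=1}^{I} \pi^{(k)}_{ab} = 1$.

Step S1.7: constructing the conditional probability:

$$Q_{ni} = P(T_n = i \mid S_n, \mathbf{p}, \pi) = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$

where the symbols are defined as follows:
- $S_n$ denotes the nth sample;
- $Q_{ni}$ denotes the probability, under the current parameters, that the true label of the nth sample is i;
- $\mathbf{p} = \{p_1, \dots, p_I\}$, with $p_i$ the probability that the true label of the nth sample is i;
- π denotes the confusion matrices of the annotators;
- $n^{(k)}_{nb}$ is the number of times the kth annotator labeled the nth sample as b.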

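Step S1.8: calculating $\mathbf{p}$ and π with a crowdsourcing algorithm, and computing through the conditional probability formula for $Q_{ni}$ the conditional probability of each sample for every comparison result. The comparison result with the largest conditional probability is selected as the final label of the sample.

To suit the application environment of the crowdsourcing algorithm in this embodiment, the EM algorithm is used as the example below. It should be understood that the algorithm used is not limited to the EM algorithm.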

The parameters of the EM algorithm are defined as follows. The number of samples is N. The observed value is $x^{(j)}$, where j ranges from 1 to N and denotes the sample index. The hidden variable is $\gamma_{jk}$, which takes the value 1 when the jth sample is labeled as the kth class and 0 otherwise.

The EM algorithm consists mainly of parameter initialization followed by alternating E and M steps until the model converges. The initialization, the E step and the M step are defined as follows.
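Read $\gamma_{jk}$ as the indicator that sample j truly belongs to class k. This is an interpretation, and note that k indexes classes here, whereas elsewhere it indexes annotators. Under that reading, its conditional expectation under the current parameters is exactly the posterior computed in the E step, which links this notation to $Q_{ni}$:

$$\mathbb{E}\big[\gamma_{jk} \mid x^{(j)}, \mathbf{p}, \pi\big] = P\big(\gamma_{jk} = 1 \mid x^{(j)}, \mathbf{p}, \pi\big) = Q_{jk}$$

The M step then re-estimates $\mathbf{p}$ and π with each unobserved $\gamma_{jk}$ replaced by its expectation $Q_{jk}$.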

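The initial value of $Q_{ni}$ is defined as:

$$Q_{ni} = \frac{\sum_{k=1}^{K} n^{(k)}_{ni}}{\sum_{k=1}^{K} \sum_{b=1}^{I} n^{(k)}_{nb}}$$

where k denotes the annotator index, $\sum_{k=1}^{K} n^{(k)}_{ni}$ is the number of times all annotators labeled the nth sample as i, and $n^{(k)}_{nb}$ is the number of times the kth annotator labeled the nth sample as b.

The M step is defined as follows: an auxiliary function, namely the expected complete-data log-likelihood under the current $Q_{ni}$, is constructed and maximized by maximum likelihood estimation. This updates the class-probability set $\mathbf{p} = \{p_1, \dots, p_I\}$ and the confusion matrices π.

The E step is defined as follows: $Q_{ni}$ is updated from $\mathbf{p}$ and π according to Bayes' formula:

$$Q_{ni} = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$

The E and M steps are executed alternately until the termination condition of the EM algorithm is met, yielding the final parameters $\mathbf{p}$ and π.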

Preprocessing the urban environment assessment data in this way takes into account the differences in subjective feeling between annotators when the final label of each pair of street view pictures is selected. This effectively improves data reliability and lays a foundation for the output accuracy of the subsequent assessment model.

Step S2: constructing an urban environment assessment model, and training it with the preprocessed urban environment assessment data.

Because the street view pictures come in pairs, a twin network is used to process them. In one embodiment, the urban environment assessment model is constructed as follows:

Step S2.1: constructing a twin network consisting of two city impression scoring models that share the same weights. Each scoring model takes one picture of a pair of street view pictures as input and outputs the city impression attribute score of that picture.

The city impression scoring model comprises a computer vision model (such as a VGG model) that extracts features from the street view picture and a fully connected layer that outputs the city impression attribute score from the extracted features.

The twin network uses a prior-art structure and is not the focus of the improvements in this application.

Step S2.2: constructing a logistic regression model on top of the twin network, the twin network and the logistic regression model together forming the urban environment assessment model. The logistic regression model takes as input the difference between the two city impression attribute scores output by the twin network. Its output is the probability that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second. The dependent variable of the logistic regression model takes the values 0, 0.5 and 1, i.e., the target output is represented by 0, 0.5 and 1.

The values 0, 0.5 and 1 of the dependent variable mean that the probability that the first street view picture produces a stronger subjective feeling than the second is 0%, 50% and 100%, respectively. They correspond to the crowdsourced label categories in which the subjective feeling produced by the first street view picture is weaker than, equal to, or stronger than that produced by the second.
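The correspondence between crowdsourced label categories and regression targets can be written as a small lookup table. This is a sketch: the category numbering follows step S1.2 of this embodiment, while the names `CATEGORY_TO_TARGET` and `to_target` are illustrative.

```python
# Label categories from step S1.2 mapped to logistic-regression targets:
# 1 = first picture felt more strongly, 2 = equal, 3 = first felt less.
CATEGORY_TO_TARGET = {1: 1.0, 2: 0.5, 3: 0.0}


def to_target(category: int) -> float:
    """Return the 0 / 0.5 / 1 dependent-variable value for a label category."""
    if category not in CATEGORY_TO_TARGET:
        raise ValueError(f"unknown label category: {category}")
    return CATEGORY_TO_TARGET[category]
```

Keeping the mapping in one table makes the direction explicit: category 1, not category 3, maps to 1.0. This is easy to invert by mistake when the pair order changes.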

Step S3: collecting new model training data and constructing an online learning algorithm. The online learning algorithm periodically optimizes the urban environment assessment model based on the new model training data, and the latest urban environment assessment model outputs the city impression attribute comparison result to be evaluated.

The urban environment assessment model is updated periodically so that its output stays accurate over time and adapts to changes in the environment and in people's subjective feelings. At the same time, the model can be optimized in a targeted manner to fit the subjective feelings of specific users.

The new model training data collected for this purpose comprise the following two parts:

1) new urban environment assessment data collected from the crowdsourcing data platform;

2) user feedback data, i.e., the user's judgment of whether the city impression comparison result output by the urban environment assessment model agrees with the user's own subjective impression.

User feedback data can be collected, for example, by building an urban travel route recommendation system on top of the urban environment assessment model. The system shows pairs of street view pictures together with the model's predictions, and the user judges whether each comparison result matches his or her own subjective impression. The urban environment assessment results are thereby optimized for that specific user.

The online learning algorithm aims to further optimize the urban environment assessment model with the new data obtained in the previous step. In one embodiment, stochastic gradient descent is used to keep the online optimization accurate. For each newly input sample, the existing model produces a prediction, and a loss function is constructed from the sample's true result and the model's prediction. Gradient descent then updates the model parameters.
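The per-sample update described above can be sketched with a toy model. The sketch assumes that both pictures are represented by pre-extracted feature vectors and scored by the same linear weights θ, and that the loss is binary cross-entropy. The original does not specify either choice, and a real twin network would obtain ξ by back-propagation through the CNN. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, x_first, x_second):
    """y_p = f(x | theta) for a toy linear twin scorer.

    Both pictures are scored with the same weights theta, so the logistic
    input is theta . (x_first - x_second); returns (y_p, feature difference).
    """
    d = [a - b for a, b in zip(x_first, x_second)]
    return sigmoid(sum(t * v for t, v in zip(theta, d))), d


def sgd_step(theta, sample, eta=0.1):
    """One online update theta' = theta - eta * xi for a single new sample.

    Assumes a binary cross-entropy loss, whose gradient with respect to the
    logistic input is (y_p - y_t), so xi = (y_p - y_t) * d.
    """
    x_first, x_second, y_t = sample
    y_p, d = predict(theta, x_first, x_second)
    xi = [(y_p - y_t) * v for v in d]
    return [t - eta * g for t, g in zip(theta, xi)]


def periodic_update(theta, new_samples, eta=0.1):
    """Scheduled optimisation: one pass of SGD over newly collected samples."""
    for sample in new_samples:
        theta = sgd_step(theta, sample, eta)
    return theta
```

Each sample is a `(first_features, second_features, y_t)` triple with `y_t` in {0, 0.5, 1}. Both new crowdsourced pairs and user feedback that confirms a comparison can be fed through the same `periodic_update` pass.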

The method comprises the following specific steps:

Step S3.1: let x denote the input of the urban environment assessment model, i.e., the pair of street view pictures fed into the twin network. Let f denote the urban environment assessment model and θ its parameters. Let $y_p$ denote the model output, i.e., the probability output by the logistic regression model that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second. The urban environment assessment model can then be expressed as:

$$y_p = f(x \mid \theta)$$

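Step S3.2: establishing a loss function $L(y_p, y_t)$, where $y_t$ represents the actual city impression attribute comparison result of the pair of street view pictures.

Step S3.3: differentiating the loss function to obtain the gradient values ξ of all parameters, and updating the parameters of the urban environment assessment model to θ′ by stochastic gradient descent:

$$\theta' = \theta - \eta\,\xi$$

where η is the learning rate.

Online learning optimization is performed periodically to keep the model optimal. This improves the accuracy of the city impression attribute comparison results and provides reliable data for urban construction.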

In another embodiment, a city environment assessment system based on online learning and crowd-sourced data analysis is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the city environment assessment method based on online learning and crowd-sourced data analysis according to any embodiment when executing the computer program.

In this embodiment, an urban environment assessment system based on online learning and crowd-sourced data analysis is a computer device, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a city environment assessment method based on online learning and crowd-sourced data analysis. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

The urban environment assessment system based on online learning and crowdsourcing data analysis provided by the embodiment comprises a crowdsourcing data platform, an urban environment evaluation model and an online learning module, and aims to obtain reliable urban environment evaluation data through a crowdsourcing method and establish an online learning mechanism to achieve the aim of updating the model in real time.

For further limitation of the urban environment assessment system based on online learning and crowd-sourced data analysis, reference may be made to the above-mentioned limitation on the urban environment assessment method based on online learning and crowd-sourced data analysis, and details are not repeated.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

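1. An urban environment assessment method based on online learning and crowdsourced data analysis, characterized in that the method comprises the following steps:

step S1, collecting urban environment assessment data from a crowdsourcing data platform, and preprocessing the collected urban environment assessment data with a crowdsourcing algorithm;

step S2, constructing an urban environment assessment model, and training it with the preprocessed urban environment assessment data;

step S3, collecting new model training data and constructing an online learning algorithm, the online learning algorithm periodically optimizing the urban environment assessment model based on the new model training data, and outputting the city impression attribute comparison result to be evaluated with the latest urban environment assessment model.

2. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 1, wherein collecting urban environment assessment data from the crowdsourcing data platform comprises:

step S1.1, releasing pairs of street view pictures on the crowdsourcing data platform;

step S1.2, receiving the comparison result given by each annotator on the crowdsourcing data platform for each pair of street view pictures;

step S1.3, taking each comparison result as the label generated by the current annotator for the current pair of street view pictures, and establishing a one-to-one correspondence among the annotator, the pair of street view pictures and the label.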

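3. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 2, wherein preprocessing the collected urban environment assessment data with a crowdsourcing algorithm comprises:

step S1.4, setting the total number of samples to be labeled as N, i.e., the total number of pairs of street view pictures is N, and obtaining from the crowdsourcing data platform the comparison results, i.e., labels, of K annotators, the labels having I classes in total;

step S1.5, for the nth sample, the following relation holds:

$$\sum_{i=1}^{I} P(T_n = i) = 1$$

wherein $P(T_n = i)$ represents the probability that the true label $T_n$ of the nth sample is i, denoted $p_i$; i represents label class i, and $i \in \{1, \dots, I\}$;

step S1.6, each annotator corresponding to an I × I confusion matrix, the confusion matrix of the kth annotator being denoted $\pi^{(k)}$, wherein $\pi^{(k)}_{ab}$ denotes the probability that the kth annotator labels a sample whose true label is a as b, a denotes label class a and $a \in \{1, \dots, I\}$, b denotes label class b and $b \in \{1, \dots, I\}$, and $\sum_{b=1}^{I} \pi^{(k)}_{ab} = 1$;

step S1.7, constructing the conditional probability:

$$Q_{ni} = P(T_n = i \mid S_n, \mathbf{p}, \pi) = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$

wherein $S_n$ represents the nth sample; $Q_{ni}$ represents the probability, under the current parameters, that the true label of the nth sample is i; $\mathbf{p} = \{p_1, \dots, p_i, \dots, p_I\}$ represents the set of probabilities that the true label of each sample is 1, …, i, …, I; π represents the confusion matrices of the annotators; and $n^{(k)}_{nb}$ is the number of times the kth annotator labeled the nth sample as b;

step S1.8, calculating the probability set $\mathbf{p}$ and the confusion matrix π of each annotator with a crowdsourcing algorithm, calculating from $\mathbf{p}$ and π, through the conditional probability formula for $Q_{ni}$, the conditional probability of each sample for every comparison result, and selecting the comparison result with the largest conditional probability as the final label of the sample.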

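4. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 3, wherein calculating with the crowdsourcing algorithm the probability set $\mathbf{p}$ that the true label of each sample is i, together with the confusion matrix π of each annotator, comprises:

the crowdsourcing algorithm adopted is the EM algorithm, which calculates $\mathbf{p}$ and π as follows:

the initial value of $Q_{ni}$ is defined as:

$$Q_{ni} = \frac{\sum_{k=1}^{K} n^{(k)}_{ni}}{\sum_{k=1}^{K} \sum_{b=1}^{I} n^{(k)}_{nb}}$$

wherein k denotes the annotator index, $\sum_{k=1}^{K} n^{(k)}_{ni}$ represents the number of times all annotators labeled the nth sample as i, and $n^{(k)}_{nb}$ represents the number of times the kth annotator labeled the nth sample as b;

the M step is defined as: constructing an auxiliary function and maximizing it by maximum likelihood estimation to update the parameters $\mathbf{p}$ and π;

the E step is defined as: updating $Q_{ni}$ from $\mathbf{p}$ and π according to Bayes' formula:

$$Q_{ni} = \frac{p_i \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{ib}\big)^{n^{(k)}_{nb}}}{\sum_{q=1}^{I} p_q \prod_{k=1}^{K} \prod_{b=1}^{I} \big(\pi^{(k)}_{qb}\big)^{n^{(k)}_{nb}}}$$

the E step and the M step are executed alternately until the termination condition of the EM algorithm is met, yielding the final parameters $\mathbf{p}$ and π.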

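5. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 1, wherein constructing the urban environment assessment model comprises:

step S2.1, constructing a twin network consisting of two city impression scoring models that share the same weights, each taking one picture of a pair of street view pictures as input and outputting the city impression attribute score of that picture, wherein the city impression scoring model comprises a computer vision model for extracting features from the street view picture and a fully connected layer for outputting the city impression attribute score from the extracted features;

step S2.2, constructing a logistic regression model on top of the twin network, the twin network and the logistic regression model together forming the urban environment assessment model, wherein the logistic regression model takes as input the difference between the two city impression attribute scores output by the twin network and outputs the probability that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second street view picture, and the dependent variable of the logistic regression model takes the values 0, 0.5 and 1, i.e., the target output is represented by 0, 0.5 and 1.

6. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 5, wherein collecting new model training data comprises:

1) collecting new urban environment assessment data from the crowdsourcing data platform;

2) collecting user feedback data, the user feedback data being the user's judgment of whether the city impression attribute comparison result output by the urban environment assessment model agrees with the user's own subjective impression.

7. The urban environment assessment method based on online learning and crowdsourced data analysis according to claim 6, wherein constructing an online learning algorithm, and using it to periodically optimize the urban environment assessment model based on the new model training data, comprises:

step S3.1, letting x denote the input of the urban environment assessment model, i.e., the pair of street view pictures fed into the twin network; f denote the urban environment assessment model; θ denote its parameters; and $y_p$ denote its output, i.e., the probability output by the logistic regression model that the subjective feeling produced by the first street view picture of the pair is stronger than that produced by the second, so that the urban environment assessment model is expressed as:

$$y_p = f(x \mid \theta)$$

step S3.2, establishing a loss function $L(y_p, y_t)$, wherein $y_t$ represents the actual city impression attribute comparison result of the pair of street view pictures;

step S3.3, differentiating the loss function to obtain the gradient values ξ of all parameters, and updating the parameters of the urban environment assessment model to θ′ by stochastic gradient descent:

$$\theta' = \theta - \eta\,\xi$$

wherein η is the learning rate.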

8. An urban environment assessment system based on online learning and crowd-sourced data analysis, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the urban environment assessment method based on online learning and crowd-sourced data analysis according to any one of claims 1 to 7 when executing the computer program.