CN115035715B

CN115035715B - Expressway flow prediction method based on decision tree and multi-element auxiliary information

Info

Publication number: CN115035715B
Application number: CN202210588240.2A
Authority: CN
Inventors: 李保; 王东京; 沈航; 万峰; 于涵诚; 俞东进; 张煜; 裴洋
Original assignee: Zhejiang Institute of Mechanical and Electrical Engineering Co Ltd
Current assignee: Zhejiang Institute of Mechanical and Electrical Engineering Co Ltd
Priority date: 2022-05-26
Filing date: 2022-05-26
Publication date: 2023-08-29
Anticipated expiration: 2042-05-26
Also published as: CN115035715A

Abstract

The invention discloses an expressway flow prediction method based on decision trees and multivariate auxiliary information. The method first builds multivariate auxiliary-information time series over multi-scale time spans and uses a multi-information-aware LSTM model to learn feature representations of information such as traffic flow and weather. It then builds a time-window feature sequence from the multivariate information and trains a gradient boosting decision tree model on it, improving the accuracy of flow prediction in expressway scenarios. The invention constructs a multivariate data set from real expressway microwave vehicle detector and meteorological detector data, uses sliding windows to take into account the various factors that affect traffic flow, and predicts expressway flow with a model based on gradient boosting decision trees, achieving higher accuracy.

Description

Freeway Flow Forecasting Method Based on Decision Tree and Multivariate Auxiliary Information

Technical Field

The invention belongs to the fields of data mining and intelligent transportation, and in particular relates to an expressway flow prediction method based on decision trees and multivariate auxiliary information.

Background

In recent years China has made historic achievements in transportation. According to the Ministry of Transport of China, by the end of 2020 the country's expressway mileage had reached 161,000 kilometers, ranking first in the world. With rapid economic development the number of motor vehicles is also growing quickly, and routine maintenance and congestion management are becoming increasingly difficult for expressway management departments. To ease this burden, more and more expressway sections have been fitted with smart sensors such as microwave vehicle detectors, which record lane-level data over fixed time intervals: total traffic flow, average vehicle speed, average inter-vehicle distance, average vehicle length, and traffic flow broken down by small, medium and large vehicle types. These records reflect current traffic patterns on the expressway and are an important basis for scientific decision-making by expressway management departments.

Researchers in China and abroad have done much valuable work on road flow prediction. Existing algorithms based on time series data fall into two main categories: statistical learning models and machine learning models. Statistical learning methods include the autoregressive integrated moving average (ARIMA) model, Kalman filtering and linear regression; machine learning methods include the support vector machine (SVM), the K-nearest neighbor algorithm and gradient boosting decision trees (XGBoost). In addition, with the rapid development of deep learning, some studies use neural network models such as the recurrent neural network (RNN) and the convolutional neural network (CNN) to improve the accuracy of road flow prediction. For example, the long short-term memory network (LSTM) introduces a memory unit so that existing information is not attenuated as the time series grows, which lets it capture time-series features over longer time spans and gives better performance.

However, the time series of expressway flow is usually the result of several factors acting together. Historical flow data alone can hardly support high-accuracy prediction, which leaves road management departments without data support for anticipating future traffic conditions. Expressway flow data also has the following characteristics that make prediction more challenging: 1) flow is strongly affected by factors such as time of day, holidays and weather, which makes prediction difficult; 2) microwave vehicle detector installations are incomplete and operate unstably, so there are many missing or isolated records, which reduces prediction accuracy. Most existing road flow prediction methods target urban road scenarios, and methods for expressway scenarios are lacking.

Summary of the Invention

Flow prediction on expressways is difficult and inaccurate because of the many complex factors involved and the large number of missing vehicle detector records. To address this, the invention proposes an expressway flow prediction method based on decision trees and multivariate auxiliary information, which improves the accuracy of flow prediction in expressway scenarios.

The specific steps of the invention are:

Step (1). Considering the variety of factors that affect flow prediction in expressway scenarios, collect microwave vehicle detector and meteorological detector data for the expressway section to construct a multivariate auxiliary information data set.

Step (2). Perform feature extraction and data preprocessing on the basis of step (1), including the following sub-steps:

Step (2.1). Extract the lane-level total traffic flow tr_t, the traffic flows of small, medium and large vehicle types tr_s, tr_m, tr_l, the average vehicle speed s, the average vehicle length l, and so on;

Step (2.2). Extract the meteorological detector information, including the visibility w_v and the road surface slipperiness w_p within a certain range of the expressway section;

Step (2.3). Extract time features from the timestamp of each record, including the time-period (hour) feature ti_h, the date feature ti_d, the day-of-week feature ti_w and the month feature ti_m.
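The time features of step (2.3) can be sketched with the Python standard library. The function name `time_features` and the timestamp layout are assumptions made for illustration; the source does not state the detector's timestamp format or how each feature is encoded.

```python
from datetime import datetime

def time_features(timestamp: str) -> dict:
    """Derive the four time features of step (2.3) from one record timestamp."""
    # Assumed timestamp layout; the detector's real format is not given in the source.
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "ti_h": ts.hour,          # time period (hour of day), 0-23
        "ti_d": ts.day,           # date (day of month), 1-31
        "ti_w": ts.isoweekday(),  # day of week, 1 = Monday ... 7 = Sunday
        "ti_m": ts.month,         # month, 1-12
    }

print(time_features("2022-05-26 08:35:00"))
```

Integer encodings are only one option; a model may equally use one-hot or cyclic encodings of the same four fields.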

Step (3). On the basis of step (2), set sliding window sizes for different time spans, construct time series for the different types of information, and use a multi-information-aware LSTM (Long Short-Term Memory) model to perform feature learning on the time series and obtain feature representations of the multivariate information. After feature fusion, the traffic flow information is represented as tr′_t, tr′_s, tr′_m, tr′_l, s′, l′, the meteorological information as w′_v, w′_p, and the time information as ti′_h, ti′_d, ti′_w, ti′_m.

Step (4). Facilities such as microwave vehicle detectors operate unstably in expressway scenarios, which produces many missing or isolated records. To mitigate this, the invention builds a feature sequence based on time windows and exploits the periodic patterns in the data. The time-window feature sequence is defined as follows:

Step (4.1). Let the multivariate information sequence available over all time intervals be s_1, s_2, ..., s_t, where s_t denotes the multivariate information of the t-th time interval and consists of traffic flow, weather and time information, that is, s_t = (tr′_t, tr′_s, tr′_m, tr′_l, s′, l′, w′_v, w′_p, ti′_h, ti′_d, ti′_w, ti′_m).

Step (4.2). The time-window feature sequence concatenates the multivariate information of several consecutive time intervals in chronological order. With window size size, the fused time series is:

$$s'_t = s_{t-size} \,\|\, s_{t-size+1} \,\|\, \cdots \,\|\, s_t$$

where (a||b) denotes concatenating two sequences a and b of dimension 12 into one sequence of dimension 24.
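A minimal sketch of the time-window concatenation follows, assuming each interval is already a 12-dimensional list of numbers. The function name `windowed_features` is hypothetical, and skipping intervals that lack a full history is an assumption; the source does not say how the first intervals are handled here.

```python
from typing import List, Sequence

def windowed_features(seq: Sequence[Sequence[float]], size: int) -> List[List[float]]:
    """Build s'_t = s_{t-size} || ... || s_t for every t that has a full history."""
    out: List[List[float]] = []
    for t in range(size, len(seq)):
        fused: List[float] = []
        for step in seq[t - size : t + 1]:  # size + 1 consecutive intervals
            fused.extend(step)              # '||' is plain concatenation
        out.append(fused)
    return out

# Three toy intervals with 12 features each (6 traffic, 2 weather, 4 time).
s = [[float(t)] * 12 for t in range(3)]
fused = windowed_features(s, size=1)
print(len(fused), len(fused[0]))
```

With size = 1 each fused vector joins two 12-dimensional intervals into one 24-dimensional vector, which matches the (a||b) example in the text.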

Step (5). On the basis of the time-window feature sequence, construct a multi-feature gradient boosting decision tree model that incorporates the multivariate auxiliary information, and further train it on the multivariate auxiliary information feature representations. The objective function for model learning is:

where n denotes the sample space size and y_t denotes the true flow value. The predicted flow value is obtained by combining multiple decision trees:

$$\hat{y}_t = \sum_{k=1}^{K} f_k(s'_t)$$

where K denotes the number of regression trees and f_k(·) denotes the k-th tree. The loss is defined as an L2-regularized squared loss, which reduces the model's risk of overfitting.
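As a sketch only, an objective of the kind described here, a squared error over n samples with an L2 penalty on the trees, can be written in the usual gradient boosting notation. The symbols ŷ_t (predicted flow), λ (penalty strength) and w_k (vector of leaf values of the k-th tree) are assumptions borrowed from that convention and are not taken from the patent.

$$\mathcal{L} = \sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2 + \lambda \sum_{k=1}^{K} \lVert w_k \rVert_2^2$$

The first term rewards accurate predictions; the second shrinks leaf values toward zero, which is how an L2 term limits overfitting.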

Step (6). Flow prediction. Using the decision tree model trained in step (5), input the historical flow information, meteorological information and time information of the road to be predicted. The flow information includes the total traffic flow, the traffic flows of small, medium and large vehicle types, the average vehicle speed and the average vehicle length; the meteorological information includes the visibility of the road section and the road surface slipperiness; the time information includes the time period (hour), date, day of week and month. Feeding these data into the decision tree model yields the flow prediction.

Beneficial effects of the invention:

Many existing approaches rely only on historical flow time series as input, which yields limited accuracy and ignores the influence of multivariate information such as time context and weather context. The invention extracts multivariate auxiliary information in the expressway scenario and takes into account how strongly each kind of feature affects flow prediction.

The invention constructs a multivariate data set from real expressway microwave vehicle detector and meteorological detector data, uses sliding windows to take into account the various factors that affect traffic flow, and predicts expressway flow with a model based on gradient boosting decision trees, achieving higher accuracy.

Brief Description of the Drawings

Figure 1. Flowchart of the method of the invention;

Figure 2. Schematic diagram of the multivariate auxiliary information extraction module;

Figure 3. Schematic diagram of the sliding windows based on time span.

Detailed Description

The expressway flow prediction method based on decision trees and multivariate auxiliary information proposed by the invention is described in detail below.

As shown in Figure 1, the specific steps of the invention are as follows:

Step (1). Input: considering the variety of factors that affect flow prediction in expressway scenarios, collect microwave vehicle detector and meteorological detector data for the expressway section to construct a multivariate auxiliary information data set. The specific process is as follows:

Step (1.1). The microwave vehicle detector records information every 5 minutes, including the timestamp, total traffic flow per lane, traffic flow per vehicle type, average vehicle speed, average vehicle length and average inter-vehicle distance. The meteorological detector also records information every 5 minutes, including precipitation, visibility, road surface slipperiness, wind speed and wind direction, but many of these attributes have missing or abnormal values. For the expressway scenario, the invention collects visibility and road surface slipperiness, the two attributes with the greatest effect on flow. On expressways there are moments when no vehicle passes or the flow is very low. To focus on flow prediction under normal conditions, the invention aggregates the 5-minute data: each group of 12 record points (1 hour) is averaged to give the data for one time point, and the data set is built on this basis.
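The hourly aggregation can be sketched as follows. The function name `aggregate_hourly` is hypothetical, and dropping a trailing incomplete group is an assumption; the source does not say how a partial hour is treated.

```python
from statistics import mean
from typing import List

def aggregate_hourly(readings: List[float], group: int = 12) -> List[float]:
    """Average consecutive groups of `group` five-minute readings (12 = one hour)."""
    hourly = []
    for start in range(0, len(readings) - group + 1, group):
        hourly.append(mean(readings[start : start + group]))
    return hourly  # a trailing incomplete group is dropped (assumption)

# Two hours of toy five-minute flow counts.
five_min_flow = [10.0] * 12 + [20.0] * 6 + [40.0] * 6
print(aggregate_hourly(five_min_flow))
```

Passing `group=6` gives half-hour points instead, for embodiments that aggregate six records at a time.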

Step (2.1). Extract the lane-level total traffic flow tr_t, the traffic flows of small, medium and large vehicle types tr_s, tr_m, tr_l, the average vehicle speed s and the average vehicle length l. For these attributes, missing or abnormal values are filled using a smoothed mean;

Step (2.2). Extract the weather information, including the visibility w_v and the road surface slipperiness w_p within a certain range of the expressway section. For these attributes, missing or abnormal values are filled using a smoothed mean;
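The source does not define the smoothed mean precisely. One plausible reading, shown here purely as an assumption, replaces each missing value with the mean of the valid readings in a small neighbourhood and falls back to the overall mean when the whole neighbourhood is missing. The function name and the `radius` parameter are hypothetical, and abnormal values are assumed to have been marked as `None` by a separate check.

```python
from typing import List, Optional

def fill_with_smoothed_mean(values: List[Optional[float]], radius: int = 2) -> List[float]:
    """Replace None entries with the mean of valid neighbours within `radius` steps."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("series contains no valid readings")
    overall = sum(valid) / len(valid)
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        lo, hi = max(0, i - radius), min(len(values), i + radius + 1)
        near = [x for x in values[lo:hi] if x is not None]
        # Fall back to the overall mean when every neighbour is missing too.
        filled.append(sum(near) / len(near) if near else overall)
    return filled

speeds = [100.0, None, 90.0, 80.0, None, None, None, 70.0]
print(fill_with_smoothed_mean(speeds, radius=1))
```

Neighbours are read from the original series, not from already filled values, so one long gap does not propagate a single reading across the whole gap.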

Step (3). On the basis of step (2), set sliding window sizes for different time spans, construct time series for the different types of information, and use the multi-information-aware LSTM model to perform feature learning on the time series and obtain feature representations of the multivariate information. The details are as follows:

Step (3.1). For the flow and weather information, such as the total traffic flow tr_t of each time interval, define sliding windows over two time spans, day and week, and build a time series for each span. Where the available history is shorter than the window size, the gap is padded with the mean of all flow values.
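The day and week windows with mean padding can be sketched as below, assuming hourly data so that the two spans are 24 and 168 steps long. The function name `window_ending_at` is hypothetical, and padding on the left (the oldest side) is an assumption; the source only says that short windows are filled with the mean of all values.

```python
from typing import List

DAY, WEEK = 24, 168  # window lengths in hourly steps

def window_ending_at(series: List[float], t: int, span: int) -> List[float]:
    """Return the `span` values ending at index t, left-padded with the series mean."""
    start = t - span + 1
    history = series[max(0, start) : t + 1]
    pad = [sum(series) / len(series)] * (span - len(history))
    return pad + history

flow = [float(v) for v in range(30)]  # 30 hourly totals (toy data)
day_win = window_ending_at(flow, t=29, span=DAY)
week_win = window_ending_at(flow, t=29, span=WEEK)
print(len(day_win), len(week_win), week_win[0])
```

Here the day window is fully covered by history, while the week window holds only 30 real values and is padded with the series mean of 14.5.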

Step (3.2). Let the time series length be T; for the day and week spans the lengths are 24 and 168, respectively. The construction of the time series is shown in Figure 3. The total traffic flow tr_t, the traffic flows of small, medium and large vehicle types tr_s, tr_m, tr_l, the average vehicle speed s, the average vehicle length l, and the weather features visibility w_v and road surface slipperiness w_p are each fed as the input x_t of the LSTM model for feature learning, as follows:

Here the superscript (·) indicates which of the two time spans a feature learning result belongs to, taking the values d (day) and w (week). At each step the model uses the current input and the output of the previous time step. The LSTM cell comprises an input gate, a forget gate and an output gate, together with the current input state, the memory state of the previous time step, and the memory state and output state of the current time step. W and b denote the weight matrices and biases used with the nonlinear activation functions σ and tanh when computing the gates and states. The input gate decides which parts of the current input state are written to the current memory state, the forget gate controls which parts of the previous memory state are kept at the current time step, and the output gate determines the output state at the current time step.
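For reference, the gates and states named above match the standard LSTM cell, which is commonly written as follows. This is the textbook formulation and is given as an assumption; the symbols i_t, f_t, o_t, c̃_t, c_t and h_t, and the omission of the span superscript, may differ from the patent's own notation.

$$
\begin{aligned}
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here [h_{t-1}, x_t] is the previous output concatenated with the current input, and ⊙ is element-wise multiplication. One such cell would be run per time span, giving the day and week representations.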

After training, the LSTM model yields feature representations of the total traffic flow for the day and week time spans. The same procedure gives day-span and week-span feature representations for the remaining attributes: the traffic flows of small, medium and large vehicle types, the average vehicle speed, the average vehicle length, and the weather features visibility and road surface slipperiness.

Step (3.3). The feature representations of the two time spans are fused as follows:

$$h' = \gamma h^{(d)} + (1 - \gamma) h^{(w)}$$

where γ is the weight that balances the two time-span representations and h′ is a generic symbol for a fused feature representation. Applying the formula to every attribute in step (3.2) gives the final feature representations of the flow and meteorological information: tr′_t, tr′_s, tr′_m, tr′_l, s′, l′, w′_p, w′_v. Because steps (3.2) and (3.3) already incorporate time information over different spans, the results of step (2.3) are used directly as the final time feature representations: ti′_h, ti′_d, ti′_w, ti′_m.
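The fusion is an element-wise weighted average, sketched below. The function name `fuse` is hypothetical, and restricting γ to the interval [0, 1] is an assumption; the source only calls γ an influence weight.

```python
from typing import List

def fuse(h_day: List[float], h_week: List[float], gamma: float) -> List[float]:
    """Compute h' = gamma * h_day + (1 - gamma) * h_week element-wise."""
    if len(h_day) != len(h_week):
        raise ValueError("day and week representations must have equal length")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma is assumed to lie in [0, 1]")
    return [gamma * d + (1.0 - gamma) * w for d, w in zip(h_day, h_week)]

print(fuse([1.0, 2.0], [3.0, 6.0], gamma=0.5))
```

Setting gamma to 1 keeps only the day-span representation and setting it to 0 keeps only the week-span one, so γ can be tuned on validation data.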

where K denotes the number of regression trees and f_k() denotes the k-th tree. The loss is defined as an L2-regularized squared loss, which reduces the model's risk of overfitting.

The results obtained by the method of the invention are compared with those of other methods in the following table:

As the table shows, the invention significantly improves the accuracy of flow prediction in expressway scenarios.

The expressway flow prediction method based on decision trees and multivariate auxiliary information proposed by the invention can be implemented as two modules: a multivariate auxiliary information extraction module and a gradient boosting decision tree flow prediction module.

In one embodiment, the multivariate auxiliary information extraction module corresponds to steps (1), (2) and (3) above, as shown in Figure 2. The data collected by the microwave vehicle detector every 5 minutes are first aggregated, with the mean of every 6 records used as the flow data for each half hour. The flow-related information in the records, such as vehicle type, inter-vehicle distance and vehicle speed, is processed with steps such as filling null values. Time information such as hour, day of week and month is computed from the record timestamps, and weather information such as visibility and road surface slipperiness is obtained from nearby meteorological detectors. Finally, the multi-information-aware LSTM model performs feature learning on the time series of the different features to obtain feature representations of the flow and multivariate information, which are used for flow prediction in the next module.

In one embodiment, the gradient boosting decision tree flow prediction module corresponds to steps (4) and (5) above. The module is a flow prediction model based on gradient boosting decision trees: it takes the multivariate auxiliary information learned by the LSTM as input, builds multiple sub-decision-trees over the various features, and finally aggregates the losses of all sub-trees to obtain the final flow prediction. During tree-based prediction, a time-window feature sequence is built, and the periodic patterns it captures mitigate the unstable operation of facilities such as microwave vehicle detectors in expressway scenarios and the resulting large number of missing or isolated records.
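To make the boosting idea concrete, here is a deliberately small pure-Python stand-in. It is not the patent's model and not XGBoost: it uses depth-1 trees (stumps), squared loss, and an L2 term `lam` that shrinks each leaf value. All names, the toy data and the hyperparameters are assumptions for illustration; in the method described, each row would be a fused vector s′_t.

```python
from typing import List, Tuple

# (feature index, threshold, left leaf value, right leaf value)
Stump = Tuple[int, float, float, float]

def fit_stump(X: List[List[float]], r: List[float], lam: float) -> Stump:
    """Fit a depth-1 regression tree to residuals r with L2-shrunk leaf values."""
    best: Stump = (0, 0.0, 0.0, 0.0)
    best_err = float("inf")
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r[i] for i, row in enumerate(X) if row[j] <= thr]
            right = [r[i] for i, row in enumerate(X) if row[j] > thr]
            # The value w minimising sum((r - w)^2) + lam * w^2 is sum(r) / (n + lam).
            wl = sum(left) / (len(left) + lam)
            wr = sum(right) / (len(right) + lam) if right else 0.0
            err = sum((v - wl) ** 2 for v in left) + sum((v - wr) ** 2 for v in right)
            if err < best_err:
                best, best_err = (j, thr, wl, wr), err
    return best

def stump_value(stump: Stump, x: List[float]) -> float:
    j, thr, wl, wr = stump
    return wl if x[j] <= thr else wr

def fit_gbdt(X, y, n_trees: int = 30, lr: float = 0.5, lam: float = 1.0):
    """Additive model: each new stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        residual = [yt - pt for yt, pt in zip(y, pred)]
        stump = fit_stump(X, residual, lam)
        trees.append(stump)
        pred = [p + lr * stump_value(stump, x) for p, x in zip(pred, X)]
    return base, lr, trees

def predict(model, x: List[float]) -> float:
    """Sum the contributions of all trees on top of the base value."""
    base, lr, trees = model
    return base + lr * sum(stump_value(s, x) for s in trees)

def mse(model, X, y) -> float:
    return sum((yt - predict(model, x)) ** 2 for x, yt in zip(X, y)) / len(y)

# Toy rows: [hour of day, low-visibility flag]; targets are hourly flows.
X = [[8.0, 0.0], [8.0, 1.0], [14.0, 0.0], [14.0, 1.0], [20.0, 0.0], [20.0, 1.0]]
y = [300.0, 240.0, 200.0, 150.0, 260.0, 210.0]
baseline = fit_gbdt(X, y, n_trees=0)   # predicts the mean flow only
model = fit_gbdt(X, y, n_trees=30)
print(mse(baseline, X, y), mse(model, X, y))
```

The printed training error of the boosted model is lower than that of the mean-only baseline. A production system would use a library implementation with deeper trees and held-out validation.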

Claims

1. The expressway flow prediction method based on the decision tree and the multi-element auxiliary information is characterized by comprising the following steps of:

step (1), acquiring data of a microwave vehicle detector and a meteorological instrument detector of a highway section, and constructing a multi-element auxiliary information data set;

step (2), extracting features on the basis of the step (1);

step (3), setting sliding window sizes for different time spans on the basis of the step (2), constructing time series of the different types of information, and performing feature learning on the time series with a multi-information-aware LSTM model to obtain feature representations of the multivariate information, wherein the multivariate information relates to flow, weather and time;

step (4), establishing a characteristic sequence based on a time window, which is specifically as follows:

step (4.1), denoting the multivariate information sequence available over all time intervals as s_1, s_2, ..., s_t, wherein s_t represents the multivariate information of the t-th time interval and consists of traffic flow, weather and time information;

step (4.2), splicing a plurality of time-interval multi-element information sequences according to time sequence based on the characteristic sequence of the time window;

step (5), constructing, on the basis of the time-window feature sequence, a multi-feature gradient boosting decision tree model combined with the multivariate auxiliary information, and further training on the multivariate auxiliary information feature representations;

step (6), based on the decision tree model trained in the step (5), inputting historical flow information, weather information and time information of the road to be predicted, and obtaining a flow prediction result;

the step (3) specifically comprises:

step (3.1), dividing the size of a time sliding window by two time spans of day and week for flow information and weather information respectively, and constructing flow time sequences of different time spans;

step (3.2), setting the time sequence length as T, and setting the time sequence lengths of the day and week as spans as 24 and 168 respectively;

the total traffic flow tr_t, the traffic flows of small, medium and large vehicle types tr_s, tr_m, tr_l, the average vehicle speed s, the average vehicle length l, and the weather features visibility w_v and road surface slipperiness w_p are each used as the input of the LSTM model for feature learning;

feature representations over the day and week time spans are obtained for the total traffic flow, the traffic flows of small, medium and large vehicle types, the average vehicle speed, the average vehicle length, the visibility and the road surface slipperiness;

step (3.3), fusing the feature representations of the two time spans to finally obtain the feature representations of the flow information and the meteorological information: tr′_t, tr′_s, tr′_m, tr′_l, s′, l′, w′_p, w′_v; and the time feature representations: ti′_h, ti′_d, ti′_w, ti′_m.

2. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 1, wherein: the step (1) specifically comprises:

the microwave vehicle detector records information once every 5 minutes, including a time stamp, the total traffic of the split lanes, the traffic of the split vehicle types, the average vehicle speed, the average vehicle length and the average vehicle distance;

the meteorological instrument detector records information once every 5 minutes, including precipitation, visibility, road surface slipperiness, wind speed and wind direction.

3. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 2, wherein: an aggregation operation is performed on the data collected every 5 minutes, namely the mean of each group of 12 record points is taken as the data of one time point, and the data set is constructed on this basis.

4. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 1, wherein: the step (2) specifically comprises:

step (2.1), extracting the lane-level total traffic flow tr_t, the traffic flows of small, medium and large vehicle types tr_s, tr_m, tr_l, the average vehicle speed s and the average vehicle length l;

step (2.2), extracting weather information including the visibility w_v and the road surface slipperiness w_p within a certain range of the expressway section;

step (2.3), extracting time features based on the information acquisition timestamps, including the time-period feature ti_h, the date feature ti_d, the day-of-week feature ti_w and the month feature ti_m.

5. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 1, wherein: in the steps (2.1) and (2.2), missing values or abnormal values are filled using a smoothed mean.

6. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 1, wherein: in the step (4.1) the window size is set to size, and the fused time series is: s′_t = s_{t-size} || s_{t-size+1} || ... || s_t, where (a||b) denotes concatenating two sequences a and b of dimension 12 into one sequence of dimension 24.

7. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 1, wherein: the objective function for learning the multi-feature gradient boosting decision tree model in the step (5) is as follows:

wherein n represents the sample space size, y_t represents the true flow value, and the objective further comprises the predicted flow value and an L2-regularized squared loss function.

8. The method for predicting highway traffic based on decision tree and multivariate assistance information as set forth in claim 7, wherein: the predicted flow value is obtained by combining a plurality of decision trees, calculated as follows:

where K represents the number of regression trees, f_k() represents the k-th tree, and s′_t is the fused time series.