CN119672961B

CN119672961B - Traffic data prediction method and device

Info

Publication number: CN119672961B
Application number: CN202510171827.7A
Authority: CN
Inventors: 邵健轩; 郑立勇; 郝勇刚; 李文婧; 谢易珏
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2025-02-17
Filing date: 2025-02-17
Publication date: 2025-05-30
Anticipated expiration: 2045-02-17
Also published as: CN119672961A

Abstract

The application discloses a traffic data prediction method. The method uses a trained prediction model to obtain predicted traffic data for a traffic location at a future time, based on historical traffic data of that location at historical times. The prediction model comprises a multi-model layer and a routing layer. The multi-model layer consists of at least two of: a temporal sub-model that extracts temporal features of the historical traffic data to obtain a first prediction result, a static spatial sub-model that extracts static spatial features of the historical traffic data to obtain a second prediction result, and a dynamic spatial sub-model that extracts dynamic spatial features of the historical traffic data to obtain a third prediction result. The routing layer selects among the prediction results output by the sub-models in the multi-model layer. The application thereby combines predictions from at least two of the temporal, static spatial, and dynamic spatial features, and improves the adaptability of traffic data prediction across application scenarios.

Description

Traffic data prediction method and device

Technical Field

The present invention relates to the field of intelligent transportation, and in particular to a traffic data prediction method.

Background Art

In an intelligent transportation system, accurate traffic data prediction is a prerequisite for accurate analysis of traffic problems. Traffic data includes, but is not limited to, road network indicators at a given traffic location, such as traffic flow, length of the queue waiting to pass, and traffic speed, for example the flow, queue length, and speed at a particular intersection. Traffic data prediction covers many different scenarios, including but not limited to highways and urban roads. Besides normally connected roads, a road network also contains spatially isolated roads, such as roads on the edge of a city that are not connected to other roads, and roads with complex spatiotemporal characteristics, such as urban highway ramps, whose structure is unusual and which are easily affected by specific events such as holidays.

Related traffic data prediction methods usually rely on a single modeling approach that adapts well to prediction in one scenario but is difficult to generalize to other scenarios with good results.

Summary of the Invention

The present invention provides a traffic data prediction method that generalizes across traffic data prediction in different scenarios.

A first aspect of the present application provides a traffic data prediction method, the method comprising:

using a trained prediction model for traffic data prediction to obtain, based on historical traffic data of a traffic location to be predicted at historical times, predicted traffic data of the traffic location to be predicted at a future time,

wherein,

the prediction model comprises: a multi-model layer consisting of at least two of a temporal sub-model for extracting temporal features of the historical traffic data to obtain a first prediction result, a static spatial sub-model for extracting static spatial features of the historical traffic data to obtain a second prediction result, and a dynamic spatial sub-model for extracting dynamic spatial features of the historical traffic data to obtain a third prediction result; and a routing layer for selecting among the prediction results output by the sub-models in the multi-model layer.

As a possible implementation, the prediction model is trained in the following manner:

the historical traffic data is divided into time series segments of a set sample time length, and the time series segments are input as sample data into the prediction model to be trained,

a prediction loss function value of the prediction model is calculated from the number of sub-models, the classification prediction loss function value of each sub-model, and the sample output probability of each sub-model, wherein the classification prediction loss function value includes a first-type prediction loss function value used to discard poorer sub-models and a second-type prediction loss function value used to select better sub-models,

back propagation is performed based on the prediction loss function value of the prediction model,

and training is repeated until it ends.
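The first training step, cutting the history into fixed-length segments, can be sketched as follows. This is a minimal illustration only: the function name `slice_into_segments` and the `stride` parameter are assumptions, since the text specifies a set sample time length but not how far apart consecutive segments start.

```python
def slice_into_segments(series, segment_len, stride=1):
    """Cut a chronological list of readings into fixed-length windows.

    `stride` (how far the window advances each time) is an assumption;
    the source only fixes the segment length.
    """
    if segment_len <= 0 or stride <= 0:
        raise ValueError("segment_len and stride must be positive")
    return [
        series[start:start + segment_len]
        for start in range(0, len(series) - segment_len + 1, stride)
    ]


# Hypothetical flow counts at one traffic location, oldest first.
flows = [12, 15, 11, 18, 20, 17]
segments = slice_into_segments(flows, segment_len=4, stride=1)
```

Each returned segment would serve as one training sample; a larger `stride` yields fewer, less overlapping samples.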

As a possible implementation, the prediction loss function value of the prediction model is calculated as follows:

for any sub-model, the product of the classification prediction loss function value of the sub-model and the logarithm of the sample output probability of the sub-model is calculated to obtain the prediction loss function value of the sub-model,

the average of the prediction loss function values of all sub-models is calculated to obtain the prediction loss function value of the prediction model;

the first-type prediction loss function value is determined as follows:

for any sub-model,

when the difference between the predicted sample value output by the sub-model and the true value is less than a set first difference threshold, and the sample output probability of the sub-model is greater than a set sample output probability threshold, the first-type prediction loss function value of the sub-model is set to 1,

when the difference between the predicted sample value output by the sub-model and the true value is greater than the first difference threshold, and the sample output probability of the sub-model is not greater than the sample output probability threshold, the first-type prediction loss function value of the sub-model is set to the reciprocal of the number of sub-models minus 1, that is, 1/(K - 1) for K sub-models,

otherwise, the first-type prediction loss function value of the sub-model is set to 0;

the second-type prediction loss function value is determined as follows:

for any sub-model,

when the difference between the predicted sample value output by the sub-model and the true value is less than a second difference threshold, and the sample output probability of the sub-model is greater than the sample output probability threshold, the second-type prediction loss function value of the sub-model is set to 1,

when the difference between the predicted sample value output by the sub-model and the true value is greater than the second difference threshold, and the sample output probability of the sub-model is not greater than the sample output probability threshold, the second-type prediction loss function value of the sub-model is set to the reciprocal of the number of sub-models minus 1, that is, 1/(K - 1) for K sub-models,

otherwise, the second-type prediction loss function value of the sub-model is set to 0,

wherein,

the second difference threshold is 1 minus the first difference threshold, each difference threshold being a quantile threshold that marks an equal-division point within the range of the probability distribution,

the output probability threshold is determined according to the number of types of sub-model prediction results that need to be output.
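The loss rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, the difference is assumed to be already expressed on the same 0-to-1 quantile scale as the thresholds, and the text does not say how the first-type and second-type values are combined, so the aggregator simply takes one classification value per sub-model as given. The product is written exactly as described, without an added sign.

```python
import math


def class_loss_value(diff, prob, diff_threshold, prob_threshold, num_models):
    """Piecewise rule shared by the first-type and second-type values.

    Assumes `diff` and `diff_threshold` are on the same quantile scale.
    """
    if num_models < 2:
        raise ValueError("at least two sub-models are required")
    if diff < diff_threshold and prob > prob_threshold:
        return 1.0
    if diff > diff_threshold and prob <= prob_threshold:
        return 1.0 / (num_models - 1)  # reciprocal of (K - 1)
    return 0.0


def model_prediction_loss(class_loss_values, output_probs):
    """Average over sub-models of class_value * log(output probability)."""
    if not output_probs or len(class_loss_values) != len(output_probs):
        raise ValueError("need one class value and one probability per sub-model")
    per_model = [c * math.log(p) for c, p in zip(class_loss_values, output_probs)]
    return sum(per_model) / len(per_model)


# Hypothetical settings: three sub-models, first threshold 0.3, so second is 0.7.
first_threshold = 0.3
second_threshold = 1 - first_threshold
first_type = class_loss_value(0.1, 0.6, first_threshold, 0.5, num_models=3)
second_type = class_loss_value(0.8, 0.2, second_threshold, 0.5, num_models=3)
```

With three sub-models the "discard" branch yields 1/2, so a poorly performing, low-probability sub-model contributes half the weight of a well-performing, high-probability one.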

As a possible implementation, using the trained prediction model for traffic data prediction to obtain, based on historical traffic data of the traffic location to be predicted at historical times, predicted traffic data of the traffic location to be predicted at a future time includes:

extracting the historical traffic data at the traffic location to be predicted according to a set historical time step, to obtain historical traffic data at each historical time point spaced by the set historical time step,

inputting the historical traffic data at each historical time point into the trained prediction model in chronological order, so that each sub-model in the multi-model layer of the prediction model obtains its prediction result from the currently input historical sequence data, and the routing layer selects, according to the output probability of each sub-model, the prediction results of the sub-models whose output probability is greater than a set output probability threshold,

wherein,

the output probability of a sub-model represents the proportion that the similarity between the prediction result output by that sub-model and the pattern feature of the current historical sequence data occupies in the sum of the similarities between the prediction results output by all sub-models and the pattern feature of the current historical sequence data,

the pattern feature represents the correlation between the current historical sequence data and the current time series pattern,

the current time series pattern represents the traffic data patterns with a high probability of recurrence found by searching the time series.

As a possible implementation, the prediction result is prediction sequence data composed of predicted traffic data at each future time point, wherein adjacent future time points are separated by a set prediction time step;

As a possible implementation, the output probability is determined in the following manner:

for any sub-model,

multi-model layer prediction sequence data, which represents the multi-model layer's prediction result for the current historical sequence data, is calculated from a linear weight and a linear offset,

the correlation between the multi-model layer prediction sequence data and each traffic data pattern in the current time series pattern is calculated, and the traffic data patterns in the current time series pattern are weighted by the calculated correlations to obtain the pattern feature of the current historical sequence data,

the similarity between the prediction result output by the sub-model and the pattern feature of the current historical sequence data is calculated,

the similarity between the prediction result output by each sub-model and the pattern feature of the current historical sequence data is calculated in the same way, and the similarities of all sub-models are summed,

the ratio of the sub-model's similarity to the sum of the similarities is calculated to obtain the output probability of the sub-model.
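The final ratio step can be sketched as below. The text does not name the similarity measure, so this sketch assumes the exponential of a dot product, which keeps every similarity positive and the ratios well defined; the function name `output_probabilities` is likewise hypothetical. Subtracting the largest score before exponentiating rescales all similarities by the same factor and leaves the ratios unchanged.

```python
import math


def output_probabilities(predictions, pattern_feature):
    """Return one output probability per sub-model.

    Assumed similarity: exp(dot(prediction, pattern_feature)).
    """
    scores = [
        sum(p * f for p, f in zip(pred, pattern_feature))
        for pred in predictions
    ]
    top = max(scores)  # shift for numerical stability; ratios are unaffected
    similarities = [math.exp(s - top) for s in scores]
    total = sum(similarities)
    return [s / total for s in similarities]


# Hypothetical predictions from three sub-models and a pattern feature vector.
probs = output_probabilities(
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [1.0, 0.0],
)
```

The sub-model whose prediction aligns best with the pattern feature receives the largest share, and the shares sum to 1.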

As a possible implementation, calculating, from the linear weight and the linear offset, the multi-model layer prediction sequence data that represents the multi-model layer's prediction result for the current historical sequence data includes:

multiplying the current historical sequence data by the linear weight and adding the linear offset to the product, to obtain the multi-model layer prediction sequence data of the current historical sequence data;

As a possible implementation, calculating the correlation between the multi-model layer prediction sequence data and each traffic data pattern in the current time series pattern includes:

for any traffic data pattern in the current time series pattern,

calculating the value of the exponential function whose base is the natural constant e and whose exponent is the product of the multi-model layer prediction sequence data and the transpose of the row vector representing the traffic data pattern, to obtain the exponential function value of the traffic data pattern, wherein the multi-model layer prediction sequence data is a row vector, the transpose of the row vector of the traffic data pattern is a column vector, the current time series pattern is a time series pattern matrix, each row of the time series pattern matrix corresponds to one traffic data pattern, and each column corresponds to one point in the time sequence,

calculating in the same way the exponential function value of every traffic data pattern, and summing the exponential function values of all traffic data patterns,

calculating the ratio of the exponential function value of the traffic data pattern to the sum of the exponential function values of all traffic data patterns, to obtain the correlation between the multi-model layer prediction sequence data and the traffic data pattern.
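The correlation described here is a softmax over the dot products between the multi-model layer prediction sequence and each pattern row. A minimal pure-Python sketch follows; the names `pattern_correlations`, `projected`, and `pattern_matrix` are illustrative assumptions, and the max-shift is a standard numerical safeguard that does not change the ratios.

```python
import math


def pattern_correlations(projected, pattern_matrix):
    """Softmax of projected . row^T over the rows of the pattern matrix.

    projected      : row vector (multi-model layer prediction sequence data)
    pattern_matrix : one traffic data pattern per row
    """
    logits = [
        sum(x * m for x, m in zip(projected, row))
        for row in pattern_matrix
    ]
    top = max(logits)  # shift so the largest exponent is 0
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-pattern matrix and a projected sequence of length 2.
corr = pattern_correlations([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the first pattern has dot product 1 and the second 0, so the correlations are e/(e + 1) and 1/(e + 1).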

As a possible implementation, weighting the traffic data patterns in the current time series pattern by the calculated correlations includes:

for each traffic data pattern, calculating the product of the correlation of the traffic data pattern and the row vector of the traffic data pattern, to obtain the weighted row vector of the traffic data pattern,

adding the weighted row vectors of all traffic data patterns to obtain a pattern feature vector;
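The weighting step is a correlation-weighted sum of the pattern rows. A short sketch, with the hypothetical name `pattern_feature` and made-up numbers chosen so the result is easy to check by hand:

```python
def pattern_feature(correlations, pattern_matrix):
    """Sum of each pattern row scaled by its correlation."""
    if len(correlations) != len(pattern_matrix):
        raise ValueError("need one correlation per pattern row")
    feature = [0.0] * len(pattern_matrix[0])
    for weight, row in zip(correlations, pattern_matrix):
        for j, value in enumerate(row):
            feature[j] += weight * value
    return feature


# Two hypothetical patterns weighted 0.25 and 0.75.
feature = pattern_feature([0.25, 0.75], [[4.0, 0.0], [0.0, 4.0]])
```

The result has the same length as one pattern row, which is what allows it to be compared with each sub-model's prediction sequence in the similarity step.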

As a possible implementation, selecting the prediction results of the sub-models whose output probability is greater than the set output probability threshold further includes:

determining fusion weights according to the output probabilities of the selected sub-models,

performing a weighted sum of the selected prediction results using the fusion weights, to obtain the predicted traffic data.
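Selection and fusion can be sketched together. The text says only that the fusion weights are determined from the output probabilities of the selected sub-models; renormalizing those probabilities so they sum to 1 is an assumption made here for illustration, as is the function name `route_and_fuse`.

```python
def route_and_fuse(predictions, output_probs, prob_threshold):
    """Keep sub-models above the threshold and fuse their predictions.

    Fusion weights are the selected probabilities renormalized to sum to 1
    (an assumed rule; the source does not fix the exact formula).
    """
    selected = [
        (p, pred)
        for p, pred in zip(output_probs, predictions)
        if p > prob_threshold
    ]
    if not selected:
        raise ValueError("no sub-model exceeded the output probability threshold")
    total = sum(p for p, _ in selected)
    horizon = len(selected[0][1])
    fused = [0.0] * horizon
    for p, pred in selected:
        weight = p / total
        for t in range(horizon):
            fused[t] += weight * pred[t]
    return fused


# Three hypothetical sub-model forecasts over two future time points.
fused = route_and_fuse(
    [[10.0, 20.0], [30.0, 40.0], [100.0, 100.0]],
    [0.25, 0.75, 0.0],
    prob_threshold=0.2,
)
```

In this example the third sub-model falls below the threshold and is ignored, so the fused forecast is 0.25 times the first prediction plus 0.75 times the second.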

As a possible implementation, the temporal sub-model is a large time series model, which models the temporal features of the traffic data at any traffic location at the next time point as depending only on the temporal features of the traffic data at that traffic location at the previous time point,

As a possible implementation, the static spatial sub-model is a graph neural network model, which models the static spatial features of the traffic data at any traffic location at the next time point as depending on the static spatial features of the traffic data at that location at the previous time point and on the static spatial features of the traffic data at the static traffic locations statically adjacent to that location,

As a possible implementation, the dynamic spatial sub-model is an attention mechanism model, which models the spatial features of the traffic data at any traffic location at the next time point as depending on the spatial features of all traffic locations at the previous time point, the dependence being obtained through the attention mechanism.

A second aspect of the present application provides a prediction model structure for traffic data prediction, the prediction model structure comprising:

a multi-model layer consisting of at least two of a temporal sub-model for extracting temporal features of historical traffic data to obtain a first prediction result, a static spatial sub-model for extracting static spatial features of historical traffic data to obtain a second prediction result, and a dynamic spatial sub-model for extracting dynamic spatial features of historical traffic data to obtain a third prediction result, and

a routing layer for selecting among the prediction results output by the sub-models in the multi-model layer.

A third aspect of the present application provides a traffic data prediction device, the device comprising:

a prediction module, configured to use the trained prediction model for traffic data prediction to obtain, based on historical traffic data of the traffic location to be predicted at historical times, predicted traffic data of the traffic location to be predicted at a future time,

wherein,

The traffic data prediction method provided by the present application uses the multiple types of sub-models in the multi-model layer and the routing layer of the prediction model to fuse the prediction results of multiple sub-models on historical traffic data. This avoids dependence on the prediction result of a single model, combines predictions from at least two of the temporal, static spatial, and dynamic spatial features, and improves the adaptability of traffic data prediction across application scenarios. Further, using a large time series model as the temporal sub-model makes it possible to combine a large time series model with small spatial models, which enhances the overall temporal generalization ability of the prediction model.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of the traffic data prediction method according to an embodiment of the present application.

FIG. 2 is a schematic diagram of the prediction model of this embodiment.

FIG. 3 is a schematic diagram of the modeling of the temporal sub-model in this embodiment.

FIG. 4 is a schematic diagram of the modeling of the static spatial sub-model in this embodiment.

FIG. 5 is a schematic diagram of the modeling of the static spatial sub-model in this embodiment.

FIG. 6 is a schematic diagram of the gating network layer obtaining control data in this embodiment.

FIG. 7 is a schematic diagram of the trained prediction model of this embodiment performing prediction based on input historical traffic data.

FIG. 8 is a schematic diagram of a traffic data prediction device according to an embodiment of the present application.

FIG. 9 is another schematic diagram of the traffic data prediction device according to an embodiment of the present application.

Detailed Description

To make the objectives, technical means, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings.

The embodiments of the present application use a trained prediction model having a multi-model layer composed of multiple sub-models to obtain predicted traffic data based on historical traffic data.

Referring to FIG. 1, FIG. 1 is a schematic flow chart of the traffic data prediction method according to an embodiment of the present application. The method comprises:

wherein,

As an example, the first prediction result represents the predicted traffic data of the traffic location to be predicted at each future time point; the second prediction result represents the traffic data of the traffic location to be predicted at each future time point, predicted from the traffic data at the fixed traffic locations in the road network that are related to the traffic location to be predicted together with the historical traffic data at the traffic location to be predicted; and the third prediction result represents the predicted traffic data of the traffic location to be predicted at each future time point, predicted from the historical traffic data at all traffic locations in the road network.

As an example, the historical traffic data at the traffic location to be predicted is extracted according to the set historical time step, to obtain historical traffic data at each historical time point spaced by the set historical time step,

wherein the prediction result is prediction sequence traffic data covering a set prediction duration. The total number of future time points in the prediction sequence traffic data may or may not equal the total number of historical time points in the currently input historical sequence data, and adjacent future time points are separated by the set prediction time step.

For example, the historical traffic data at each historical time point is input into the trained prediction model in ascending order of time. The first future time point in the prediction sequence data is then the time point immediately following the last historical time point in the currently input historical sequence data, and the two differ by the set prediction time step.

The set historical time step may be determined according to the collection interval of the historical traffic data. It should be understood that the set historical time step may be a fixed value or a dynamically changing value, which is not limited in the present application. The set prediction time step may be the same as or different from the historical time step, which is likewise not limited in the present application.
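The relation between the last historical time point and the future time points can be made concrete with a small sketch. Times are plain integers in minutes and the function name `future_time_points` is hypothetical; the example deliberately uses a prediction time step that differs from the historical one, which the text permits.

```python
def future_time_points(last_history_time, prediction_step, horizon):
    """Return `horizon` future time points following the last historical one.

    The first future point lies exactly one prediction step after the last
    historical point, and later points are one prediction step apart.
    """
    if prediction_step <= 0 or horizon <= 0:
        raise ValueError("prediction_step and horizon must be positive")
    return [last_history_time + prediction_step * (k + 1) for k in range(horizon)]


# Hypothetical history sampled every 5 minutes, forecast every 10 minutes.
history_times = [0, 5, 10, 15]
future_times = future_time_points(history_times[-1], prediction_step=10, horizon=3)
```

Here four historical points produce three future points, illustrating that the two counts need not match.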

The current time series pattern represents the traffic data patterns with a high probability of recurrence obtained from the time series.

It should be understood that the traffic location to be predicted may be a single location, for example any intersection in the road network, or multiple locations, for example several intersections some distance apart in the road network. Multiple traffic locations to be predicted may take different forms, including but not limited to crossroads, T-junctions, one-way intersections, tidal-flow intersections, multi-way intersections, and designated places, which is not limited in the present application.

For multiple traffic locations to be predicted, the historical traffic data at each location may be input into the prediction model in parallel, in series, or alternately in parallel and in series, which is not limited in the present application. As a possible implementation, when the historical traffic data at each traffic location to be predicted is input in parallel in the same chronological order, the input streams of historical traffic data may have the same set duration, which helps improve prediction accuracy and reduces the complexity of the prediction process.

In the traffic data prediction method provided by the embodiments of the present application, the multi-model layer composed of several types of sub-models, each with independent prediction capability, gives the prediction model the function of multiple expert models, and the routing layer allows the prediction results of the sub-models to be selected and fused. This is equivalent to implementing a mixture-of-experts model and helps improve the generalization ability of the prediction model.

To aid understanding of the embodiments of the present application, the following description takes as an example a prediction model whose multi-model layer consists of the temporal sub-model, the static spatial sub-model, and the dynamic spatial sub-model. It should be understood that the embodiments of the present application are not limited to this, and any two of these sub-models are equally applicable.

Referring to FIG. 2, FIG. 2 is a schematic diagram of the prediction model of this embodiment. The prediction model includes a first fully connected layer, a multi-model layer, a second fully connected layer, and a routing layer connected in sequence. The routing layer includes a gating network layer and a routing selection layer. The input of the gating network layer is connected to the output of the multi-model layer and to the input of the first fully connected layer, and the output of the gating network layer is connected to the selection input of the routing selection layer.

The first fully connected layer combines and transforms the traffic feature data of each historical time point at each traffic location.

The multi-model layer performs spatiotemporal prediction based on the traffic feature data output by the first fully connected layer. The multi-model layer includes a temporal sub-model, a static spatial sub-model, and a dynamic spatial sub-model, wherein:

the temporal sub-model predicts the traffic data of each traffic location at each future time,

the static spatial sub-model predicts the traffic data at static traffic locations,

the dynamic spatial sub-model predicts the traffic data at dynamic traffic locations;

A static traffic location is a traffic location with fixed attributes, for example a road junction or intersection. A dynamic traffic location is one with non-fixed attributes, for example a junction or intersection under temporary control or traffic restriction.

The multi-model layer described above includes multiple types of sub-models. Each type of sub-model acts as an expert model responsible for a specific part of the input data or a subset of the task. Division of labor and cooperation among the expert models improves the performance and efficiency of the model, making it suitable for complex and diverse traffic data.

Much work exists on large time series models, but a large time series model alone ignores spatial information, which plays a very important role in tasks such as traffic time series prediction and completion in complex, large-scale traffic networks. Spatial modeling methods are also varied, for example those based on graph neural networks or attention mechanisms, but the temporal part of these methods is usually a small time series model, whose temporal modeling and generalization abilities are weaker than those of a large time series model. In other words, large time series models and small spatial models each have strengths and weaknesses. In this embodiment, multiple spatial modeling approaches are combined with the temporal modeling ability of a large time series model, and the multi-model layer includes all three of the temporal sub-model, the static spatial sub-model, and the dynamic spatial sub-model.

As an example, the time series sub-model is a large time series model, that is, a deep learning model for processing and analyzing time series data that captures long-term dependencies and dynamic changes in a time series, which helps with tasks such as predicting future trends and detecting anomalies. The time series sub-model extracts the time series features of the traffic data without considering the relationships between traffic positions. It models the time series feature of the traffic data at any traffic position at the next time point as depending only on the time series feature of the traffic data at that same position at the previous time point. In other words, the future time series feature of a traffic position at the next time point is predicted from the historical time series feature of that position at the previous time point. It can be expressed as:
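Written out from the symbol definitions that follow, this relation takes the form below. The name f for the mapping learned by the time series sub-model is an assumption; the text does not name it.

```latex
H_i^{(t)} = f\left(H_i^{(t-1)}\right) \tag{1}
```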

where H_i^(t) denotes the traffic time series mapping of the traffic data at traffic position i at the next time t, and H_i^(t-1) denotes the traffic time series mapping of the traffic data at traffic position i at the previous time t-1. The traffic time series mapping characterizes the time series feature information of the traffic data. The traffic data includes, but is not limited to, road network indicators such as flow, speed, and the length of the queue waiting to pass.

参见图3所示，图3为本实施例时序子模型建模的一种示意图。图中，不同颜色的圆表示不同交通位置，不同平面表示不同时间点。See Figure 3, which is a schematic diagram of the modeling of the time series sub-model in this embodiment. In the figure, circles of different colors represent different traffic locations, and different planes represent different time points.

The static space sub-model extracts the static spatial features of the traffic data at each traffic position. As an example, the static space sub-model is a graph convolutional network (GCN) model that extracts the static correlation between traffic positions. It models the static spatial feature of the traffic data at any traffic position at the next time point as depending on the static spatial feature of the traffic data at that position at the previous time point and on the static spatial features of the traffic data at the static traffic positions statically adjacent to it. In other words, the future static spatial feature of a position at the next time point is predicted from its historical static spatial feature at the previous time point together with the static spatial features of its statically adjacent static traffic positions. It can be expressed as:
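Written out from the definitions of g() and θ_j that follow, this relation takes the form below. The neighbor set N(i), meaning the static traffic positions statically adjacent to position i, is assumed notation that the text does not define.

```latex
S_i^{(t)} = g\left(S_i^{(t-1)}, \{\theta_j\}_{j \in \mathcal{N}(i)}\right) \tag{2}
```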

where S_i^(t) denotes the static spatial feature of the traffic data at traffic position i at the next time t, S_i^(t-1) denotes the static spatial feature of the traffic data at traffic position i at the previous time t-1, g() denotes the static spatial feature mapping, and θ_j is the static spatial feature of the traffic data at a static traffic position j that is statically adjacent to traffic position i. For example, the four road sections at a crossroads have a fixed spatial layout, and the entrance and exit of a highway are fixed in space.

参见图4所示，图4为本实施例静态空间子模型建模的一种示意图。图中，不同颜色的圆表示不同交通位置，不同平面表示不同时间点，黑色线段表示该线段两端圆所表征位置处的空间位置关系，红色线段表示该线段两端圆所表征位置处的空间位置之间在不同时间点的关联。See Figure 4, which is a schematic diagram of modeling the static space sub-model of this embodiment. In the figure, circles of different colors represent different traffic positions, different planes represent different time points, black line segments represent the spatial position relationship of the positions represented by the circles at both ends of the line segment, and red line segments represent the relationship between the spatial positions of the positions represented by the circles at both ends of the line segment at different time points.

The dynamic space sub-model extracts the dynamic spatial features of the traffic data at each traffic position. As an example, the dynamic space sub-model is an attention mechanism model that extracts the dynamic correlation between traffic positions. It models the spatial feature of the traffic data at any traffic position in the road network at the next time point as depending on the spatial features at all traffic positions in the road network at the previous time point, with the correlation obtained through the attention mechanism. In other words, the future dynamic spatial feature of a traffic position at the next time is predicted from the historical dynamic spatial features of all traffic positions at the previous time. It can be expressed as:
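Written out from the definitions that follow, and assuming that Attention() takes the features of all positions J as its only argument, this relation takes the form:

```latex
D_i^{(t)} = \mathrm{Attention}\left(D_J^{(t-1)}\right) \tag{3}
```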

where D_i^(t) denotes the dynamic spatial feature of the traffic data at traffic position i at the next time t, D_J^(t-1) denotes the dynamic spatial features of the traffic data at all traffic positions J at the previous time t-1, and Attention() denotes the attention mechanism operation.

See Figure 5, which is a schematic diagram of the modeling of the dynamic space sub-model of this embodiment. In the figure, circles of different colors represent different traffic positions, different planes represent different time points, black line segments represent the spatial relationship between the positions represented by the circles at their two ends, and red line segments represent the association, across different time points, between the positions represented by the circles at their two ends.

上述子模型中，时间t与时间t-1之间的时间间隔可以为设定的预测时间步长。In the above sub-model, the time interval between time t and time t-1 may be a set prediction time step.

第二全连接层，用于将来自子模型的预测结果进行组合和转换。The second fully connected layer is used to combine and transform the prediction results from the sub-models.

The routing layer selects among the prediction results of the sub-models to output the predicted traffic data. As an example, a gating network layer computes a probability for each sub-model, and a routing selection layer selects sub-models according to the resulting probability vector and fuses their outputs, producing a mixed output of multiple models.

路由层包括门控网络层和路由选择层，多模型层中各子模型输出的预测结果分别输入至门控网络层，门控网络层输出的控制数据输入至路由选择层，第二全连接层输出的各子模型的交通特征数据输入至路由选择层以作为被选择数据。The routing layer includes a gating network layer and a routing selection layer. The prediction results output by each sub-model in the multi-model layer are respectively input into the gating network layer, the control data output by the gating network layer is input into the routing selection layer, and the traffic feature data of each sub-model output by the second fully connected layer is input into the routing selection layer as the selected data.

See Figure 6, which is a schematic diagram of how the gating network layer obtains the control data in this embodiment. A time series pattern matrix is learned that characterizes the current time series patterns, that is, the traffic data patterns found by searching the time series that have a high probability of recurring. Each row of the matrix represents one traffic data pattern, and the row vector is the pattern feature vector of that pattern. Each column represents one piece of time series information, that is, one time t. Any element of the matrix is therefore the feature value of the traffic data pattern of its row at the time of its column. A traffic data pattern may be a characteristic behavior pattern, for example a peak, normal, low, or congestion pattern.

输入至预测模型的当前输入序列交通数据输入至一线性权重网络，得到多模型层的预测序列交通数据，可用数学式表示为：The current input sequence traffic data input to the prediction model is input to a linear weight network to obtain the predicted sequence traffic data of the multi-model layer, which can be expressed mathematically as follows:
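Written out from the definitions that follow, the linear weight network takes the form below, numbered (4) to match the later reference to formulas (4) to (7). The product order X W_q assumes row vectors, which fits the later statement that Q_i^(t) is a row vector; the text does not state the order.

```latex
Q_i^{(t)} = X_i^{(t)} W_q + b_q \tag{4}
```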

where X_i^(t) is the current input sequence traffic data at traffic position i at time t, that is, the input data fed to the prediction model, W_q is the weight of the linear weight network, b_q is the bias of the linear weight network, and Q_i^(t) is the multi-model layer prediction sequence traffic data at traffic position i at time t output by the linear weight network.

根据多模型层预测序列交通数据、以及时序模式矩阵中各交通数据模式的行向量，获取该多模型层预测序列交通数据与各交通数据模式的相关性。具体地，对于当前时序模式中的任一交通数据模式：According to the multi-model layer prediction sequence traffic data and the row vector of each traffic data pattern in the time series pattern matrix, the correlation between the multi-model layer prediction sequence traffic data and each traffic data pattern is obtained. Specifically, for any traffic data pattern in the current time series pattern:

计算以自然常数为底数、且以多模型层预测序列数据与用于表征该交通数据模式的行向量的转置的乘积结果为指数的指数函数值，得到该交通数据模式的指数函数值，其中，多模型层预测序列数据为一行向量，该交通数据模式的行向量的转置为列向量，An exponential function value with a natural constant as the base and a product of the multi-model layer prediction sequence data and the transpose of the row vector used to characterize the traffic data pattern as the exponent is calculated to obtain the exponential function value of the traffic data pattern, wherein the multi-model layer prediction sequence data is a row vector, and the transpose of the row vector of the traffic data pattern is a column vector,

用数学式表示为：Mathematically expressed as:
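The numerator below is the exponential described above. The normalization over all M patterns (a softmax) is an assumption; it is suggested by the definition of M as the total number of patterns, but the text does not spell it out.

```latex
\alpha_m = \frac{\exp\left(Q_i^{(t)} M[m]^{T}\right)}{\sum_{m'=1}^{M} \exp\left(Q_i^{(t)} M[m']^{T}\right)} \tag{5}
```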

where α_m is the correlation between the multi-model layer prediction sequence traffic data and traffic data pattern m, M[m]^T is the transpose of the row vector M[m] corresponding to traffic data pattern m in the time series pattern matrix and is a column vector, Q_i^(t) is a row vector, and M is the total number of traffic data patterns, that is, the total number of rows of the time series pattern matrix.

对各交通数据模式进行加权求和，得到用于表征该交通数据的模式特征。具体地，对于每一交通数据模式的相关度，计算该交通数据模式的相关度与该交通数据模式的行向量的乘积，得到该交通数据模式加权后的行向量，将所有交通数据模式加权后的行向量进行相加，得到模式特征向量。用数学式表示为：Perform weighted summation on each traffic data pattern to obtain the pattern feature used to characterize the traffic data. Specifically, for each traffic data pattern relevance, calculate the product of the traffic data pattern relevance and the row vector of the traffic data pattern to obtain the weighted row vector of the traffic data pattern, and add the weighted row vectors of all traffic data patterns to obtain the pattern feature vector. It can be expressed as:
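The weighted sum described in this paragraph can be written as:

```latex
O_i^{(t)} = \sum_{m=1}^{M} \alpha_m M[m] \tag{6}
```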

where O_i^(t) is the pattern feature vector of the traffic data at traffic position i at time t, which is a row vector, and α_m M[m] is the weighted row vector of traffic data pattern m.

将该交通数据的模式特征向量与多模型层中各子模型的预测结果进行相关性计算，得到用于表征各子模型的预测结果与模式特征向量相似程度的子模型相似度，用数学式表示为：The correlation between the pattern feature vector of the traffic data and the prediction results of each sub-model in the multi-model layer is calculated to obtain the sub-model similarity used to characterize the similarity between the prediction results of each sub-model and the pattern feature vector, which is expressed as follows:
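Using the symbols defined in the next paragraph, and assuming that the similarity function takes the pattern feature vector first and the sub-model output second, the similarity can be written as:

```latex
r_e = R\left(O_i^{(t)}, z_e\right) \tag{7}
```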

where r_e denotes the similarity between the prediction result of sub-model e and the pattern feature vector, z_e denotes the prediction result of sub-model e, which may be sequence data and may be expressed as a vector, and R() denotes the similarity function, which may be cosine similarity, Euclidean distance, or the like.

基于各子模型的相似度，计算各子模型的输出概率，并将所计算的各子模型的输出概率作为门控网络层输出的控制数据输入至路由选择层。具体地，累计各子模型的相似度，得到各相似度的总和；对于任一子模型，计算该子模型的相似度与各相似度的总和的比值，得到该子模型的输出概率，用数学式表示为：Based on the similarity of each sub-model, the output probability of each sub-model is calculated, and the calculated output probability of each sub-model is input into the routing selection layer as the control data output by the gated network layer. Specifically, the similarities of each sub-model are accumulated to obtain the sum of each similarity; for any sub-model, the ratio of the similarity of the sub-model to the sum of each similarity is calculated to obtain the output probability of the sub-model, which is expressed as follows:
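The ratio described in this paragraph can be written as follows, with E denoting the total number of sub-models:

```latex
p_e = \frac{r_e}{\sum_{e'=1}^{E} r_{e'}} \tag{8}
```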

where p_e is the output probability of sub-model e. In this embodiment the index e ranges over the 3 sub-models included, that is, the total number of sub-models is 3.
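The gating computation described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name, the array shapes, the softmax normalization of the pattern weights, and the choice of cosine similarity shifted to be non-negative (so that the ratio normalization gives valid probabilities) are all assumptions.

```python
import numpy as np

def gate_probabilities(x, W_q, b_q, M, z, eps=1e-12):
    """Return (alpha, o, p) for one traffic position at one time step.

    x:   (d_in,)  input sequence vector (hypothetical shape)
    W_q: (d_in, d), b_q: (d,)  linear weight network
    M:   (num_patterns, d)     time series pattern matrix, one pattern per row
    z:   (E, d)                prediction results of the E sub-models
    """
    q = x @ W_q + b_q                        # linear weight network output Q
    scores = M @ q                           # Q times M[m]^T for every pattern m
    weights = np.exp(scores - scores.max())  # shifted for numerical stability
    alpha = weights / weights.sum()          # pattern correlations (softmax assumed)
    o = alpha @ M                            # weighted sum of rows: pattern feature vector O
    # Similarity R: cosine similarity, shifted to be non-negative (assumption).
    denom = np.maximum(np.linalg.norm(z, axis=1) * np.linalg.norm(o), eps)
    r = ((z @ o) / denom + 1.0) / 2.0 + eps
    p = r / r.sum()                          # output probability of each sub-model
    return alpha, o, p

rng = np.random.default_rng(0)
alpha, o, p = gate_probabilities(
    x=rng.normal(size=12),
    W_q=rng.normal(size=(12, 12)),
    b_q=np.zeros(12),
    M=rng.normal(size=(4, 12)),
    z=rng.normal(size=(3, 12)),
)
print(p)
```

If Euclidean distance were used for R instead, it would need to be converted into a similarity (for example by inversion) before the ratio normalization, since a larger distance means a weaker match.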

It should be understood that the prediction result of each sub-model may be the traffic data output by a hidden layer of the sub-model, or the traffic data output by the output layer of the sub-model in the multi-model layer; this application does not limit this.

路由选择层根据各子模型的输出概率，在每一时间t选择输出概率大于设定输出概率阈值的子模型所输出的预测结果作为预测交通数据。例如，选择输出概率最高的k个子模型所输出的预测结果作为预测交通数据。The routing selection layer selects the prediction results output by the sub-models whose output probability is greater than the set output probability threshold at each time t as the predicted traffic data according to the output probability of each sub-model. For example, the prediction results output by the k sub-models with the highest output probability are selected as the predicted traffic data.
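The two selection rules mentioned here, a probability threshold and the k highest probabilities, can be sketched as below. The function name and its arguments are hypothetical, and the snippet only returns the indices of the selected sub-models.

```python
import numpy as np

def select_sub_models(p, threshold=None, k=None):
    """Return the indices of the selected sub-models, in ascending order.

    p: output probabilities of the sub-models.
    Pass k to keep the k most probable sub-models, or threshold to keep
    every sub-model whose probability is strictly greater than it.
    """
    p = np.asarray(p, dtype=float)
    if k is not None:
        order = np.argsort(-p, kind="stable")  # most probable first
        return sorted(order[:k].tolist())
    if threshold is None:
        raise ValueError("either threshold or k must be given")
    return np.flatnonzero(p > threshold).tolist()

top2 = select_sub_models([0.5, 0.4, 0.1], k=2)
above = select_sub_models([0.5, 0.4, 0.1], threshold=0.3)
```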

本实施例中，预测模型以如下方式进行训练。将历史交通数据切分为设定样本时间长度的时序片段，将时序片段作为样本数据，输入至待训练的预测模型，为了找到合适的路由，在反向传播时采用两类分类预测损失函数值：一类是用于放弃较差模型的第一类预测损失函数值，另一类是用于选择较佳模型的第二类预测损失函数值，以便选择预测结果相对更好的k个子模型的输出结果。In this embodiment, the prediction model is trained in the following manner: the historical traffic data is divided into time series segments of a set sample time length, and the time series segments are used as sample data and input into the prediction model to be trained. In order to find a suitable route, two types of classification prediction loss function values are used in back propagation: one type is the first type of prediction loss function value used to abandon the poor model, and the other type is the second type of prediction loss function value used to select the better model, so as to select the output results of k sub-models with relatively better prediction results.

作为一种示例，预测模型的损失函数以如下方式计算：根据各子模型的分类预测损失函数值、子模型的数量、以及各子模型的样本输出概率，计算预测模型的预测损失函数值。具体地，As an example, the loss function of the prediction model is calculated as follows: the prediction loss function value of the prediction model is calculated according to the classification prediction loss function value of each sub-model, the number of sub-models, and the sample output probability of each sub-model. Specifically,

对于任一子模型，计算该子模型的分类预测损失函数值与该子模型的样本输出概率的对数之间的乘积，得到该子模型的预测损失函数值，其中，对数可以为以2为底的对数。For any sub-model, the product of the classification prediction loss function value of the sub-model and the logarithm of the sample output probability of the sub-model is calculated to obtain the prediction loss function value of the sub-model, where the logarithm can be a logarithm with base 2.

计算所有子模型的预测损失函数值的平均值，得到预测模型的预测损失函数值。数学式表达为：Calculate the average of the prediction loss function values of all sub-models to get the prediction loss function value of the prediction model. The mathematical expression is:
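Written out from the two steps above, with the base-2 logarithm mentioned as one option, the loss takes the form below. No leading sign is stated in the text, so none is added here; whether the original formula carries a minus sign is not known.

```latex
L(P') = \frac{1}{E} \sum_{e=1}^{E} l_e \log_2 p_e' \tag{9}
```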

where L(P′) denotes the loss function value of the prediction model when the sample data within the set sample time step, that is, the sample time interval P′, is input, E is the total number of sub-models, p_e′ is the sample output probability of sub-model e, and l_e is the classification prediction loss function value of sub-model e.
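The two-step computation, a per-sub-model product followed by an average, can be sketched as follows. The function name is hypothetical, no sign is applied because the text states none, and sub-models whose l_e is 0 are skipped so that a zero probability does not break the logarithm. The sample values mirror the worked example later in this description (probabilities 0.5, 0.4, 0.1 and first-type losses 1, 0, 0).

```python
import math

def mixture_loss(l, p, base=2.0):
    """Average over sub-models of l_e * log_base(p_e')."""
    if len(l) != len(p):
        raise ValueError("l and p must have one entry per sub-model")
    total = 0.0
    for l_e, p_e in zip(l, p):
        if l_e == 0:
            continue  # zeroed sub-models contribute nothing
        total += l_e * math.log(p_e, base)
    return total / len(l)

loss_first_type = mixture_loss([1.0, 0.0, 0.0], [0.5, 0.4, 0.1])
```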

当采用第一类预测损失函数值时，设置一用于表征在概率分布范围内所在等份数值点的分位数阈值qth大于0.5，如此一来，任一子模型的样本预测值准确度及该子模型的输出概率会有以下4种情况。When the first type of prediction loss function value is used, a quantile threshold qth used to characterize the numerical points in the probability distribution range is set to be greater than 0.5. In this way, the accuracy of the sample prediction value of any sub-model and the output probability of the sub-model will have the following four situations.

（1）该子模型样本预测值与该子模型真实值相差小于qth，且该子模型样本输出概率在top k中。(1) The difference between the predicted value of the sub-model sample and the true value of the sub-model is less than qth, and the output probability of the sub-model sample is among the top k.

（2）该子模型样本预测值与该子模型真实值相差小于等于qth，且该子模型样本输出概率不在top k中。(2) The difference between the predicted value of the sub-model sample and the true value of the sub-model is less than or equal to qth, and the output probability of the sub-model sample is not in the top k.

（3）该子模型样本预测值与该子模型真实值相差大于等于qth，且该子模型样本输出概率在top k中。(3) The difference between the predicted value of the sub-model sample and the true value of the sub-model is greater than or equal to qth, and the output probability of the sub-model sample is among the top k.

（4）该子模型样本预测值与该子模型真实值相差大于qth，且该子模型输出样本概率不在top k中。(4) The difference between the predicted value of the sub-model sample and the true value of the sub-model is greater than qth, and the probability of the output sample of the sub-model is not in the top k.

Cases (2) and (3) correspond to a poor model, so for them the prediction loss function value l_e is set to 0.

基于此，任一子模型的第一类预测损失函数值可以如此方式确定：Based on this, the first-class prediction loss function value of any sub-model can be determined as follows:

用数学式表示为：Mathematically expressed as:

where l_e is the loss function value of sub-model e; the difference between the sample prediction value of sub-model e and the true value y is compared with the first difference threshold, which takes the value of the quantile threshold qth; top k(P′) denotes the k highest sample output probabilities when the sample data within the sample time step, that is, the sample time interval P′, is input to the prediction model; and p_e′ is the sample output probability of sub-model e.

当采用第二类预测损失函数值时，任一子模型样本预测值准确度、该子模型的概率会有以下4种情况。When the second type of prediction loss function value is used, the accuracy of the sample prediction value of any sub-model and the probability of the sub-model will have the following four situations.

（1）该子模型样本预测值与该子模型真实值相差小于1-qth，且该子模型的样本输出概率在top k中。(1) The difference between the sample prediction value of the sub-model and the true value of the sub-model is less than 1-qth, and the sample output probability of the sub-model is among the top k.

（2）该子模型样本预测值与该子模型真实值相差小于等于1-qth，且该子模型的样本输出概率不在top k中。(2) The difference between the sample prediction value of the sub-model and the true value of the sub-model is less than or equal to 1-qth, and the sample output probability of the sub-model is not in the top k.

（3）该子模型样本预测值与该子模型真实值相差大于等于1-qth，且该子模型的样本输出概率在top k中。(3) The difference between the sample prediction value of the sub-model and the true value of the sub-model is greater than or equal to 1-qth, and the sample output probability of the sub-model is among the top k.

（4）该子模型样本预测值与该子模型真实值相差大于1-qth，且该子模型的样本输出概率不在top k中。(4) The difference between the sample prediction value of the sub-model and the true value of the sub-model is greater than 1-qth, and the sample output probability of the sub-model is not in the top k.

Cases (1) and (4) correspond to a better model, so for them the prediction loss function value l_e is not set to 0.

基于此，任一子模型的第二类预测损失函数值可以如此方式确定：对于任一子模型，Based on this, the second type prediction loss function value of any sub-model can be determined in this way: for any sub-model,

否则，该子模型的第二类预测损失函数值设置为0数学式表示为：Otherwise, the second type of prediction loss function value of the sub-model is set to 0. The mathematical formula is:

其中，1-qth为第二差异阈值。Among them, 1-qth is the second difference threshold.
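The four-case rule shared by both loss types can be sketched as a single predicate that says whether a sub-model keeps a non-zero loss. Only the keep-or-zero decision is shown, because the non-zero value itself is not recoverable from this text. The function name and the three difference values are hypothetical; the differences were chosen only so that the resulting cases match those listed in the worked example later in this description.

```python
def keeps_loss(diff, in_top_k, threshold):
    """True for cases (1) and (4), False for cases (2) and (3).

    Case (1): diff below the threshold and the sub-model is in the top k.
    Case (4): diff above the threshold and the sub-model is not in the top k.
    """
    if in_top_k:
        return diff < threshold
    return diff > threshold

qth = 0.7
diffs = [0.1, 0.8, 0.5]          # assumed prediction errors of sub-models 1 to 3
in_top_k = [True, True, False]   # k = 2 with probabilities 0.5, 0.4, 0.1

# First type: threshold qth. Second type: threshold 1 - qth.
first_type = [keeps_loss(d, s, qth) for d, s in zip(diffs, in_top_k)]
second_type = [keeps_loss(d, s, 1 - qth) for d, s in zip(diffs, in_top_k)]
```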

参见图7所示，图7为本实施例训练后的预测模型基于输入的历史交通数据进行预测的一种示意图。Refer to FIG. 7 , which is a schematic diagram of the trained prediction model of this embodiment performing prediction based on input historical traffic data.

In this embodiment there are N traffic positions to be predicted. According to the set historical time step P, the historical traffic data at each traffic position to be predicted is extracted, giving the historical traffic data at each historical time point of the set historical time step. In increasing order of time, the historical traffic data of each traffic position at each historical time point is input in turn to the trained prediction model, so that the historical sequence traffic data of each traffic position to be predicted is input. For example, the current input historical traffic data X_P in Figure 7 includes the historical traffic data of the N traffic positions to be predicted at the time points t-P-1, t-P, …, t. The traffic data may be one-dimensional, for example flow, or multi-dimensional, for example flow and speed.

在预测模型中：In the prediction model:

各待预测交通位置处的历史序列交通数据输入至路由层中的门控网络层，并同时输入至第一全连接层，The historical sequence traffic data at each traffic location to be predicted is input into the gating network layer in the routing layer and simultaneously input into the first fully connected layer.

各待预测交通位置处的历史序列交通数据经第一全连接层组合后，同时输入多模型层中的各子模型，The historical sequence traffic data at each traffic location to be predicted are combined by the first fully connected layer and then input into each sub-model in the multi-model layer at the same time.

Each sub-model outputs its prediction result for each traffic position to be predicted over the prediction time step S. The prediction result is predicted sequence traffic data. The prediction time step S may be the same or different across the traffic positions to be predicted; this embodiment does not limit this.

各子模型输出的预测结果分别经第二全连接层输入至路由层中的路由选择层，并且分别输入至路由层中的门控网络层。The prediction results output by each sub-model are respectively input to the routing selection layer in the routing layer through the second fully connected layer, and are respectively input to the gating network layer in the routing layer.

门控网络层根据输入的预测结果、输入的历史交通数据，计算各子模型的输出概率，并输入至路由层中的路由选择层，The gated network layer calculates the output probability of each sub-model based on the input prediction results and input historical traffic data, and inputs it to the routing selection layer in the routing layer.

路由选择层根据各子模型的输出概率，从各子模型所输出的预测结果中选择输出概率大于设定输出概率阈值的子模型所输出的预测结果。The routing selection layer selects the prediction results output by the sub-model whose output probability is greater than the set output probability threshold from the prediction results output by the sub-models according to the output probability of each sub-model.

Furthermore, the prediction model may fuse the selected prediction results according to the output probabilities of the selected sub-models to obtain the predicted traffic data. For example, the prediction result in Figure 7 includes the predicted traffic data of the N traffic positions to be predicted at the time points t+1, t+S+1, …. It should be understood that the selected prediction results may be fused in the routing selection layer or outside the routing layer; this application does not limit this.

为便于理解本实施例，以下以预测200个交叉口路网的流量数据为例。To facilitate understanding of this embodiment, the following takes the prediction of traffic data of a road network of 200 intersections as an example.

假设路网流量采集时间间隔为5分钟，预测任务为根据过去1小时的全路网流量数据来预测未来1小时的全路网流量数据，则可以以采集时间间隔为历史时间步长和预测时间步长，利用训练后的预测模型，基于12个历史时间步长的流量数据预测未来12个预测时间步长的流量数据。Assuming that the road network traffic collection time interval is 5 minutes, and the prediction task is to predict the entire road network traffic data for the next hour based on the entire road network traffic data for the past hour, the collection time interval can be used as the historical time step and the prediction time step. The trained prediction model can be used to predict the traffic data for the next 12 prediction time steps based on the traffic data for 12 historical time steps.

预测模型可以如下方式进行训练：The prediction model can be trained as follows:

获取路网过去若干天例如7天的历史流量数据，将每个交叉口的历史流量数据以滑动窗口方式切分为时间长度为2小时的时序片段，作为预测模型的训练样本数据。如图7中所示，则设历史样本数据的历史时间步长P的数量为12，设预测时间步长S的数量为12，交叉口数量N=200，流量数据维度C=1。Obtain the historical traffic data of the road network for the past several days, for example, 7 days, and divide the historical traffic data of each intersection into time series segments of 2 hours in a sliding window manner as the training sample data of the prediction model. As shown in Figure 7, the number of historical time steps P of the historical sample data is set to 12, the number of prediction time steps S is set to 12, the number of intersections N=200, and the traffic data dimension C=1.
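The sliding-window slicing can be sketched as follows. With a 5-minute collection interval, a 2-hour segment holds 24 steps, split here into P = 12 history steps and S = 12 target steps. The function name, the stride of one step, and the zero-filled placeholder array are assumptions for illustration.

```python
import numpy as np

def make_windows(series, p_steps=12, s_steps=12, stride=1):
    """Slice a (T, N, C) series into history and target windows.

    Returns x with shape (W, P, N, C) and y with shape (W, S, N, C).
    """
    length = p_steps + s_steps
    if series.shape[0] < length:
        raise ValueError("series is shorter than one window")
    starts = range(0, series.shape[0] - length + 1, stride)
    x = np.stack([series[s:s + p_steps] for s in starts])
    y = np.stack([series[s + p_steps:s + length] for s in starts])
    return x, y

steps_per_day = 24 * 60 // 5  # 288 samples per day at a 5-minute interval
flow = np.zeros((7 * steps_per_day, 200, 1), dtype=np.float32)  # placeholder data
x, y = make_windows(flow)
```

Each x[w] then has the [12, 200, 1] shape that the prediction model takes as one input sample.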

The input sample data has dimension [12, 200, 1] and is first transformed by the first fully connected (FCN) layer. The output of the first FCN layer is fed to each sub-model in the multi-model layer, giving the outputs z_1, z_2, and z_3.

Based on the input sample data and the time series pattern matrix M, the routing selection layer obtains the pattern feature vector O according to formulas (4), (5), and (6). The similarity between the pattern feature vector O and the outputs z_1 to z_3 of the sub-models is computed according to formula (7), giving r_e, from which the sample output probability p_e′ of each sub-model is obtained.

From the sample prediction value y output by each sub-model and the sample output probability p_e′ of that sub-model, the prediction loss function value is obtained from the loss function of the prediction model and back-propagated to optimize the network parameters.

For example, 2 sub-models are selected, that is, k=2, and the sample output probabilities p_e′ of the 3 sub-models are p_1=0.5, p_2=0.4, and p_3=0.1. The differences between the sample prediction values and the actual values of the 3 sub-models are:

qth设置为0.7； qth is set to 0.7;

若放弃较差模型损失函数值，则有：If we abandon the loss function value of the worse model, we have:

Sub-model 1 is case (1), l_e = 1;

Sub-model 2 is case (3), l_e = 0;

Sub-model 3 is case (2), l_e = 0;

所以子模型2、3的较差，预测模型的预测损失函数值如下：Therefore, sub-models 2 and 3 are worse, and the prediction loss function value of the prediction model is as follows:

若选择较佳模型损失函数值，则有：If a better model loss function value is selected, then:

Sub-model 1 is case (1), l_e = 1;

Sub-model 2 is case (3), l_e = 0;

Sub-model 3 is case (4), l_e = 1/2;

所以子模型1、3的选择比较好，预测模型的预测损失函数值如下：Therefore, sub-models 1 and 3 are better choices, and the prediction loss function value of the prediction model is as follows:

根据所采用的两类预测损失函数所计算得到的预测模型的两个预测损失函数值，调整预测模型的模型参数，直至训练结束，以得到训练后的预测模型。According to the two prediction loss function values of the prediction model calculated by the two types of prediction loss functions adopted, the model parameters of the prediction model are adjusted until the training is completed to obtain the trained prediction model.

基于训练得到的预测模型，对输入数据进行处理：通过3个子模型对时序、空间进行建模；每一个时刻利用路由层对各子模型输出的预测结果进行选择，以得到最佳的k个模型，实现对全路网流量的预测。Based on the trained prediction model, the input data is processed: time series and space are modeled through three sub-models; at each moment, the routing layer is used to select the prediction results output by each sub-model to obtain the best k models to achieve the prediction of the entire road network traffic.

具体的，如t时刻，对交叉路口处A的流量数据进行预测，各子模型所输出的预测流量分别为v1,v2,v3，各子模型的输出概率分别为0.6, 0.37, 0.03，则对于t时刻在对交叉路口处A的预测流量为：Specifically, at time t, the traffic flow data of intersection A is predicted, and the predicted traffic flows output by each sub-model are v1, v2, and v3, respectively. The output probabilities of each sub-model are 0.6, 0.37, and 0.03, respectively. Then, the predicted traffic flow at intersection A at time t is:
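The fusion step for this example can be sketched as below. The text does not state whether the probabilities of the selected sub-models are used as they are or renormalized to sum to 1, so the sketch supports both through a flag. The function name and the flow values 100, 120, and 300 standing in for v1, v2, and v3 are hypothetical.

```python
def fuse_top_k(values, probs, k, renormalize=True):
    """Probability-weighted fusion of the k most probable sub-model outputs."""
    order = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    weights = [probs[e] for e in order]
    if renormalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    return sum(w * values[e] for w, e in zip(weights, order))

v = [100.0, 120.0, 300.0]  # placeholder values for v1, v2, v3
fused = fuse_top_k(v, [0.6, 0.37, 0.03], k=2)
fused_raw = fuse_top_k(v, [0.6, 0.37, 0.03], k=2, renormalize=False)
```

With k = 2 the third sub-model, whose output probability is 0.03, does not contribute to either result.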

By using a prediction model composed of several kinds of sub-models, this embodiment selects better prediction results and can handle traffic data prediction in different scenarios. Specifically, predicting with the time series sub-model increases the temporal generalization ability of the prediction model; the static space sub-model helps capture regular traffic features; the dynamic space sub-model helps capture irregular traffic features; and the routing layer's selection among the sub-models' prediction results both fuses those results and avoids relying on the prediction of a single model. The large time series model used by the time series sub-model captures long-term dependencies and dynamic changes in the time series, so combining it with small spatial models strengthens the temporal generalization ability of the prediction model as a whole. During training, the prediction loss function value of the prediction model is computed from the classification prediction loss function values during back propagation, so that the model learns by selecting better models and avoiding worse ones.

Referring to FIG. 8, FIG. 8 is a schematic diagram of a traffic data prediction device according to an embodiment of the present application. The device includes:

a prediction module, configured to use the trained prediction model for traffic data prediction to obtain, based on historical traffic data of the traffic location to be predicted at a historical time, predicted traffic data of the traffic location to be predicted at a future time,

wherein,

Referring to FIG. 9, FIG. 9 is another schematic diagram of a traffic data prediction device, or of an electronic device for traffic data prediction, according to an embodiment of the present application. The device includes a memory and a processor; the memory stores a computer program, and the processor is configured to execute the computer program to implement the steps of the traffic data prediction method of the embodiments of the present application.

The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

An embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, the steps of the traffic data prediction method described in this embodiment are implemented.

For the device, network-side device, and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively brief; for relevant details, refer to the corresponding parts of the description of the method embodiments.

In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A traffic data prediction method, the method comprising:

using a trained prediction model for traffic data prediction to obtain, based on historical traffic data of a traffic location to be predicted at a historical time, predicted traffic data of the traffic location to be predicted at a future time,

wherein,

the prediction model comprises: a multi-model layer formed by at least two of a time sequence sub-model for acquiring time sequence characteristics of the historical traffic data to obtain a first prediction result, a static space sub-model for acquiring static space characteristics of the historical traffic data to obtain a second prediction result, and a dynamic space sub-model for acquiring dynamic space characteristics of the historical traffic data to obtain a third prediction result; and a routing layer for selecting among the prediction results output by the sub-models in the multi-model layer;

the using the trained prediction model for traffic data prediction to obtain, based on the historical traffic data of the traffic location to be predicted at the historical time, the predicted traffic data of the traffic location to be predicted at the future time comprises:

each sub-model in the multi-model layer obtaining its own prediction result from the currently input historical sequence data,

the routing layer selecting, according to the output probability of each sub-model, the prediction results of the sub-models whose output probability is greater than a set output probability threshold,

wherein,

the output probability of a sub-model represents the proportion that the similarity between the prediction result output by that sub-model and the pattern feature of the current historical sequence data occupies in the sum of the similarities between the prediction results output by all the sub-models and the pattern feature of the current historical sequence data,

the pattern feature characterizes the correlation between the current historical sequence data and a current time sequence pattern,

the current time sequence pattern represents the traffic data patterns, found by searching the time series, that have a high probability of recurring.

2. The traffic data prediction method according to claim 1, wherein the prediction model is trained in the following manner:

dividing the historical traffic data into time sequence segments of a set sample duration, taking the time sequence segments as sample data, and inputting the sample data into the prediction model to be trained,

calculating a prediction loss function value of the prediction model based on the number of sub-models, the classification prediction loss function value of each sub-model, and the sample output probability of each sub-model, wherein the classification prediction loss function value includes a first-type prediction loss function value for discarding worse sub-models and a second-type prediction loss function value for selecting better sub-models,

performing back propagation based on the prediction loss function value of the prediction model,

repeating the training until the training is finished.

3. The traffic data prediction method according to claim 2, wherein the prediction loss function value of the prediction model is calculated as follows:

for any sub-model, calculating the product of the classification prediction loss function value of the sub-model and the logarithm of the sample output probability of the sub-model, to obtain the prediction loss function value of the sub-model,

calculating the average of the prediction loss function values of all the sub-models, to obtain the prediction loss function value of the prediction model;

the first-type prediction loss function value is determined as follows:

for any sub-model,

in the case where the difference between the predicted sample value output by the sub-model and the true value is smaller than a set first difference threshold, and the sample output probability of the sub-model is greater than a set sample output probability threshold, the first-type prediction loss function value of the sub-model is set to 1,

in the case where the difference between the predicted sample value output by the sub-model and the true value is greater than the first difference threshold, and the sample output probability of the sub-model is not greater than the sample output probability threshold, the first-type prediction loss function value of the sub-model is set to the reciprocal of the number of sub-models minus 1,

otherwise, the first-type prediction loss function value of the sub-model is set to 0;

the second-type prediction loss function value is determined as follows:

for any sub-model,

in the case where the difference between the predicted sample value output by the sub-model and the true value is smaller than a second difference threshold, and the sample output probability of the sub-model is greater than the sample output probability threshold, the second-type prediction loss function value of the sub-model is set to 1,

in the case where the difference between the predicted sample value output by the sub-model and the true value is greater than the second difference threshold, and the sample output probability of the sub-model is not greater than the sample output probability threshold, the second-type prediction loss function value of the sub-model is set to the reciprocal of the number of sub-models minus 1,

otherwise, the second-type prediction loss function value of the sub-model is set to 0,

wherein,

the second difference threshold is the difference between 1 and the first difference threshold, and each difference threshold is a quantile threshold representing a quantile point within the probability distribution range,

the output probability threshold is determined according to the number of sub-model prediction results required to be output.
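The piecewise rule and the averaging step recited in claim 3 can be sketched as below. This is a minimal sketch, not the claimed implementation, and it rests on several assumptions: the "difference" is taken to be an error already normalized to the range 0 to 1; "the reciprocal of the number of sub-models minus 1" is read as 1/(N - 1); the product with the logarithm is taken literally, with no sign convention added; and the way first-type and second-type values are combined is not stated in this section, so the aggregation function simply accepts one classification value per sub-model. All function and parameter names are hypothetical.

```python
import math


def class_loss_value(diff, prob, diff_threshold, prob_threshold, n_models):
    """Piecewise classification value for one sub-model and one loss type.

    Assumption: diff is a normalized error in [0, 1]; the second case is
    read as 1 / (n_models - 1).
    """
    if diff < diff_threshold and prob > prob_threshold:
        return 1.0
    if diff > diff_threshold and prob <= prob_threshold:
        return 1.0 / (n_models - 1)
    return 0.0


def model_loss(class_values, probs):
    """Average over sub-models of class_value * log(sample output probability)."""
    per_model = [c * math.log(p) for c, p in zip(class_values, probs)]
    return sum(per_model) / len(per_model)


# Hypothetical sample: three sub-models, first difference threshold 0.25,
# sample output probability threshold 0.33.
probs = [0.6, 0.37, 0.03]
diffs = [0.1, 0.5, 0.5]
values = [class_loss_value(d, p, 0.25, 0.33, len(probs)) for d, p in zip(diffs, probs)]
loss = model_loss(values, probs)
```

Under the claim's wording, the same `class_loss_value` function would be called with the second difference threshold (1 minus the first) to obtain second-type values.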

4. The traffic data prediction method according to any one of claims 1 to 3, wherein the currently input historical sequence data is input to the trained prediction model as follows: extracting the historical traffic data at the traffic location to be predicted according to a set historical time step, to obtain the historical traffic data at each historical time point separated by the set historical time step,

and sequentially inputting the historical traffic data at each historical time point into the trained prediction model in time order.

5. The traffic data prediction method according to claim 1, wherein the prediction result is predicted sequence data composed of predicted traffic data at each future time point, and two adjacent future time points are separated by a set prediction time step;

the output probability is determined as follows:

for any sub-model,

calculating, according to a linear weight and a linear offset, multi-model layer predicted sequence data characterizing the prediction result of the multi-model layer for the current historical sequence data,

calculating, according to the multi-model layer predicted sequence data and the current time sequence pattern, the correlation between the multi-model layer predicted sequence data and each traffic data pattern in the current time sequence pattern, and weighting each traffic data pattern in the current time sequence pattern by its calculated correlation, to obtain the pattern feature of the current historical sequence data,

calculating, according to the pattern feature of the current historical sequence data and the prediction result output by the sub-model, the similarity between the prediction result output by the sub-model and the pattern feature of the current historical sequence data,

calculating, according to the pattern feature of the current historical sequence data and the prediction result output by each sub-model, the similarity between the prediction result output by each sub-model and the pattern feature of the current historical sequence data, to obtain the similarity of each sub-model, and accumulating the similarities of all the sub-models to obtain the sum of the similarities,

and calculating the ratio of the similarity between the prediction result output by the sub-model and the pattern feature of the current historical sequence data to the sum of the similarities, to obtain the output probability of the sub-model.
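The last two steps of claim 5 (per-sub-model similarity, then its share of the summed similarities) can be sketched as follows. The claim does not say how similarity is measured, so this sketch assumes the exponential of a dot product, which keeps every similarity positive so the ratios form a valid probability distribution. It also assumes each predicted sequence has the same length as the pattern feature vector. The function name and sample values are hypothetical.

```python
import math


def output_probabilities(sub_model_predictions, pattern_feature):
    """Return one output probability per sub-model.

    Assumption: similarity = exp(dot(prediction, pattern_feature)).
    Subtracting the maximum score rescales every similarity by the same
    factor, so the ratios are unchanged while overflow is avoided.
    """
    scores = [
        sum(a * b for a, b in zip(pred, pattern_feature))
        for pred in sub_model_predictions
    ]
    top = max(scores)
    sims = [math.exp(s - top) for s in scores]
    total = sum(sims)
    return [s / total for s in sims]


# Hypothetical predicted sequences from three sub-models and a pattern feature.
probs = output_probabilities(
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [0.2, 0.8],
)
```

The sub-model whose predicted sequence aligns best with the pattern feature receives the largest output probability.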

6. The traffic data prediction method according to claim 5, wherein the calculating, according to the linear weight and the linear offset, the multi-model layer predicted sequence data characterizing the prediction result of the multi-model layer for the current historical sequence data comprises:

adding the linear offset to the product of the linear weight and the current historical sequence data, to obtain the multi-model layer predicted sequence data of the current historical sequence data;

the calculating, according to the multi-model layer predicted sequence data and the current time sequence pattern, the correlation between the multi-model layer predicted sequence data and each traffic data pattern in the current time sequence pattern comprises:

for any traffic data pattern in the current time sequence pattern,

calculating an exponential function value whose base is the natural constant and whose exponent is the product of the multi-model layer predicted sequence data and the transpose of the row vector representing the traffic data pattern, to obtain the exponential function value of the traffic data pattern, wherein the multi-model layer predicted sequence data is a row vector, the transpose of the row vector of the traffic data pattern is a column vector, the current time sequence pattern is a time sequence pattern matrix, each row of the time sequence pattern matrix corresponds to one traffic data pattern, and each column corresponds to one time sequence,

calculating, for each traffic data pattern, an exponential function value whose base is the natural constant and whose exponent is the product of the multi-model layer predicted sequence data and the transpose of the row vector of that traffic data pattern, to obtain the exponential function value of each traffic data pattern, and accumulating the exponential function values of all the traffic data patterns,

and calculating the ratio of the exponential function value of the traffic data pattern to the accumulated exponential function values of all the traffic data patterns, to obtain the correlation between the multi-model layer predicted sequence data and the traffic data pattern.
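The two computations recited in claim 6 amount to a linear projection followed by a softmax over the rows of the pattern matrix. A minimal sketch follows; the shapes (history of length h, weight of shape h by d, offset of length d, pattern matrix of shape m by d) and all names and sample values are assumptions made for illustration.

```python
import math


def project(history, weight, offset):
    """Linear step: history (length h) times weight (h x d) plus offset (length d)."""
    return [
        sum(history[i] * weight[i][j] for i in range(len(history))) + offset[j]
        for j in range(len(offset))
    ]


def pattern_correlations(projected, patterns):
    """Softmax over patterns of exp(projected . pattern_row^T).

    Subtracting the largest exponent leaves every ratio unchanged and only
    guards against overflow.
    """
    logits = [sum(x * y for x, y in zip(projected, row)) for row in patterns]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values: identity weight, offset 0.5, two stored patterns.
projected = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
corr = pattern_correlations(projected, [[1.0, 0.0], [0.0, 1.0]])
```

Each entry of `corr` is the correlation for one row of the pattern matrix, and the entries sum to 1.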

7. The traffic data prediction method according to claim 6, wherein the weighting each traffic data pattern in the current time sequence pattern by its calculated correlation comprises:

for the correlation of each traffic data pattern, calculating the product of the correlation of the traffic data pattern and the row vector of the traffic data pattern, to obtain the weighted row vector of the traffic data pattern,

adding the weighted row vectors of all the traffic data patterns to obtain a pattern feature vector;

after the selecting of the prediction results of the sub-models whose output probability is greater than the set output probability threshold, the method further comprises:

determining fusion weights according to the output probabilities of the selected sub-models,

and performing a weighted summation of the selected prediction results using the fusion weights, to obtain the predicted traffic data.
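The weighting step at the start of claim 7 is a correlation-weighted sum of the rows of the pattern matrix. A short sketch, with a hypothetical function name and sample values:

```python
def pattern_feature(correlations, patterns):
    """Sum of pattern rows, each scaled by its correlation.

    correlations: one value per traffic data pattern (length m).
    patterns: m rows, each a list of the same length.
    """
    width = len(patterns[0])
    return [
        sum(c * row[j] for c, row in zip(correlations, patterns))
        for j in range(width)
    ]


# Hypothetical correlations for two stored patterns.
feature = pattern_feature([0.25, 0.75], [[1.0, 0.0], [0.0, 1.0]])
```

The resulting vector has the same length as one pattern row, which is what allows it to be compared with a sub-model's predicted sequence in the similarity step.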

8. The traffic data prediction method according to claim 6, wherein the time sequence sub-model is a large time sequence model that models the traffic time sequence characteristics of the traffic data at any traffic location at the next time point as related only to the traffic time sequence characteristics of the traffic data at that traffic location at the previous time point,

the static space sub-model is a graph neural network model that models the traffic static space characteristics of the traffic data at any traffic location at the next time point as related to the traffic static space characteristics of the traffic data at that traffic location at the previous time point and to the traffic static space characteristics of the traffic data at the traffic locations statically adjacent to that traffic location,

the dynamic space sub-model is a graph neural network model and an attention mechanism model, the attention mechanism model models the spatial characteristics of the traffic data at any traffic location at the next time point as related to the spatial characteristics of all traffic locations at the previous time point, and the spatial characteristics are acquired through an attention mechanism.

9. A predictive model structure for traffic data prediction, the predictive model structure comprising:

a multi-model layer formed by at least two of: a time sequence sub-model for acquiring time sequence characteristics of historical traffic data to obtain a first prediction result, a static space sub-model for acquiring static space characteristics of the historical traffic data to obtain a second prediction result, and a dynamic space sub-model for acquiring dynamic space characteristics of the historical traffic data to obtain a third prediction result; and

a routing layer for selecting among the prediction results output by the sub-models in the multi-model layer;

wherein,

10. A traffic data prediction apparatus, the apparatus comprising:

a prediction module, configured to use a trained prediction model for traffic data prediction to obtain, based on historical traffic data of a traffic location to be predicted at a historical time, predicted traffic data of the traffic location to be predicted at a future time,

wherein,

each sub-model in the multi-model layer obtains its own prediction result from the currently input historical sequence data, and the routing layer selects, according to the output probability of each sub-model, the prediction results of the sub-models whose output probability is greater than a set output probability threshold,

wherein,