TWI755941B

TWI755941B - Hierarchical time-series prediction method

Info

Publication number: TWI755941B
Application number: TW109140783A
Authority: TW
Inventors: 大衛傅; 陳佩君
Original assignee: 英業達股份有限公司
Priority date: 2020-11-20
Filing date: 2020-11-20
Publication date: 2022-02-21
Also published as: TW202221548A

Abstract

A hierarchical time-series prediction method is adapted to a plurality of reconciled prediction values of a plurality of nodes of a hierarchical structure, wherein the plurality of nodes respectively has a plurality of time-series, the plurality of reconciled prediction values corresponds to the plurality of time-series, and the plurality of nodes comprises a plurality of bottom-level nodes. The method comprises generating a plurality of individual prediction values corresponding to the plurality of time-series by a plurality of prediction models, generating a plurality of bottom-level prediction values corresponding to the plurality of bottom-level nodes according to the plurality of individual prediction values and an encoder network, and generating the plurality of reconciled prediction values according to the plurality of bottom-level prediction values and a decoder associated with the hierarchical structure, wherein the number of the plurality of individual prediction values is greater than that of the plurality of bottom-level prediction values and is identical to that of the plurality of reconciled prediction values, and the individual prediction value and the reconciled prediction values which correspond to the same time-series are continuous in time.

Description

Hierarchical Time Series Forecasting Methods

本發明關於時間序列的預測，特別是一種階層式時間序列預測方法。 The present invention relates to the prediction of time series, especially a hierarchical time series prediction method.

階層式時間序列(hierarchical time-series)是一種在階層結構下隨時間改變的觀察結果的收集。階層式時間序列經常出現在商業及經濟領域，其中隨時間改變的量需要在不同的精細度(granularity)層級下被預測。例如在供應鏈中，可能需要在國家、城市或商店級別預測需求以組織物流。在眾多應用中，需要針對不同階層層級的多個時間序列產生預測。由於階層結構的限制，獨立的預測通常不能正確地相加，因而需要調和(reconciliation)步驟。 A hierarchical time-series is a collection of observations that change over time in a hierarchical structure. Hierarchical time series often appear in business and economics, where quantities that change over time need to be predicted at different levels of granularity. In a supply chain, for example, it may be necessary to forecast demand at the country, city or store level to organize logistics. In many applications, forecasts need to be generated for multiple time series at different hierarchical levels. Due to the constraints of the hierarchical structure, independent predictions often do not add up correctly, thus requiring a reconciliation step.

階層式時間序列的預測通常分為兩階段。第一階段是對所有或部分的時間序列進行預測。第二階段是這些預測值被調和(reconcile)以強制符合階層限制。調和時間序列的現有方式例如：自下而上(bottom-up)、自上而下(top-down)、最佳組合(optimal-combination)及跡最小化(trace-minimization)等，這些方式皆有彈性(flexibility)不足的問題，換言之，這些方式無法針對用來評估預測模型的特定指標(metric)進行優化。 The forecasting of hierarchical time series is usually divided into two stages. The first stage is to forecast all or part of the time series. The second stage is that these predictions are reconcile to enforce compliance with class constraints. Existing ways to reconcile time series such as bottom-up, top-down, optimal-combination and trace-minimization, etc. Insufficient flexibility, in other words, these approaches cannot be optimized for the specific metric used to evaluate the predictive model.

隨著在過去數年間興起的深度學習(deep learning)，也有嘗試在損失函數中施加軟約束(soft constraint)並利用階層式結構以規範訓練過程的方式，希望能改善預測效能並減少調和的間隙(gap)。然而這種方式並無法保證正確的調和，即不滿足階層的約束。 With the rise of deep learning in the past few years, there are also attempts to impose soft constraints on the loss function and use a hierarchical structure to regulate the training process, hoping to improve the prediction performance and reduce the reconciliation gap. (gap). However, this approach does not guarantee correct reconciliation, i.e. does not satisfy the class constraints.

有鑑於此，本發明提出了一種基於編碼器-解碼器(encoder-decoder)模型的調和策略，其具有通用性及彈性，且容易實現。藉由在真實世界的資料集上進行測試，本發明提出的階層式時間序列的預測方法皆可達到或超過調和設置中現有方法的效能。 In view of this, the present invention proposes a reconciliation strategy based on an encoder-decoder model, which is versatile, flexible, and easy to implement. By testing on real-world datasets, the proposed hierarchical time-series forecasting method can meet or exceed the performance of existing methods in a reconciliation setting.

依據本發明一實施例的一種階層式時間序列預測方法，適用於產生一階層結構的多個節點的多個調和預測值，其中節點分別具有多個時間序列，調和預測值對應於時間序列，且節點包含多個底層節點，所述的方法包括：以多個預測模型分別產生對應於時間序列的多個獨立預測值；依據這些獨立預測值及編碼網路產生多個底層預測值；以及依據底層預測值及關聯於階層結構的解碼器產生調和預測值；其中獨立預測值的數量大於底層預測值的數量，且等於調和預測值的數量；對應於時間序列中同一者的獨立預測值及調和預測值在時間上連續。 A hierarchical time series prediction method according to an embodiment of the present invention is suitable for generating a plurality of harmonic prediction values of a plurality of nodes in a hierarchical structure, wherein the nodes respectively have a plurality of time series, the harmonic prediction values correspond to the time series, and The node includes a plurality of underlying nodes, and the method includes: generating a plurality of independent forecasting values corresponding to the time series with a plurality of forecasting models; generating a plurality of underlying forecasting values according to the independent forecasting values and the coding network; Predictors and decoders associated with the hierarchy produce harmonic predictors; where the number of independent predictors is greater than the number of underlying predictors and equal to the number of harmonic predictors; independent predictors and harmonic predictors corresponding to the same in the time series Values are continuous in time.

以上之關於本揭露內容之說明及以下之實施方式之說明係用以示範與解釋本發明之精神與原理，並且提供本發明之專利申請範圍更進一步之解釋。 The above description of the present disclosure and the following description of the embodiments are used to demonstrate and explain the spirit and principle of the present invention, and provide further explanation of the scope of the patent application of the present invention.

10:調和器 10: Conditioner

P:編碼網路 P: Code Network

S:解碼器 S: decoder

S1、S2、S3:步驟 S1, S2, S3: Steps

圖1繪示一個階層結構的範例；圖2是本發明一實施例提出的階層式時間序列預測方法的流程圖；圖3是本發明一實施例提出的階層式時間序列預測方法對應的架構示意圖；圖4A是標準的全連接網路的示意圖；以及圖4B是簡化版的全連接網路的示意圖。 FIG. 1 shows an example of a hierarchical structure; FIG. 2 is a flowchart of a hierarchical time series prediction method proposed by an embodiment of the present invention; FIG. 3 is a schematic structural diagram corresponding to the hierarchical time series prediction method proposed by an embodiment of the present invention 4A is a schematic diagram of a standard fully connected network; and FIG. 4B is a schematic diagram of a simplified version of a fully connected network.

以下在實施方式中詳細敘述本發明之詳細特徵以及特點，其內容足以使任何熟習相關技藝者了解本發明之技術內容並據以實施，且根據本說明書所揭露之內容、申請專利範圍及圖式，任何熟習相關技藝者可輕易地理解本發明相關之構想及特點。以下之實施例係進一步詳細說明本發明之觀點，但非以任何觀點限制本發明之範疇。 The detailed features and characteristics of the present invention are described in detail in the following embodiments, and the content is sufficient to enable any person skilled in the relevant art to understand the technical content of the present invention and implement accordingly, and according to the content disclosed in this specification, the scope of the patent application and the drawings , any person skilled in the related art can easily understand the related concepts and features of the present invention. The following examples further illustrate the viewpoints of the present invention in detail, but do not limit the scope of the present invention in any viewpoint.

請參考圖1，其繪示一個階層結構的範例。此階層結構具有多個節點A~G，其中包含多個底層節點D、E、F及G。每個節點A~G各自具有一個時間序列A_t~G_t。時間序列(time-series)例如是以序列方式呈現生產者每個月的月產量。圖1繪示的階層結構描述這些時間序列之間的相依關係，舉一個實際範例如下述：在一工廠a中包含產線b及產線c，其中產線b具有機器d及機器e，且產線c具有機器f及機器g。時間序列D_t、E_t、F_t及G_t分別代表機器d、e、f及g的月產量，時間序列B_t及C_t分別代表產線b及c的月產量，時間序列A_t代表工廠a的月產量。這些生產者a~g的生產量A_t~G_t滿足「A_t=B_t+C_t」、「B_t=D_t+E_t」且「C_t=F_t+G_t」的關係。 Please refer to FIG. 1, which shows an example of a hierarchical structure. This hierarchical structure has multiple nodes A~G, including multiple bottom nodes D, E, F and G. Each node A~G has a time series At _~ _Gt . A time-series, for example, presents the monthly output of a producer in a serial manner. The hierarchical structure shown in Figure 1 describes the dependencies between these time series. A practical example is as follows: a factory a includes a production line b and a production line c, wherein the production line b has a machine d and a machine e, and Production line c has machine f and machine g. The time series D _t , E _t , F _t and G _t represent the monthly output of machines d, e, f and g respectively, the time series B _t and C _t represent the monthly output of production lines b and c respectively, and the time series A _t represent The monthly output of factory a. The production amounts A _t to G _t of these producers a to g satisfy the relationships of “A _t =B _t +C _t ”, “B _t =D _t +E _t ”, and “C _t =F _t +G _t ”.

除了上述範例，多個時間序列例如可用於描述政府機關中各部門每個月的預算，氣象預報中每天的氣溫及濕度，便利商店內各種產品的月營業額等。本發明不特別限制時間序列應用的領域。 In addition to the above examples, multiple time series can be used to describe, for example, the monthly budget of various departments in government agencies, the daily temperature and humidity in weather forecasts, and the monthly turnover of various products in convenience stores. The present invention does not particularly limit the field of time series application.

所述的階層結構用於描述多個時間序列之間的相依關係。承前例來說：產線b本月的月產量被機器d及機器e的本月的月產量所影響。在某些狀況下，產線b本月的月產量可能受到機器d及機器e過去數個月的月產量的影響。本發明提出的階層式時間序列預測方法可從時間序列A_t~G_t過去的實際產量和預測產量等大量維度的資料中，選擇部分或全部的資料進行計算，以產生A_t~G_t各自在未來的預測值(如預估月產量)，同時滿足A_t~G_t彼此的相依關係。例如：B_t及C_t的預估產量的總和不會超過A_t的預估月產量。 The hierarchical structure described is used to describe the dependencies between multiple time series. Taking the previous example, the monthly output of production line b is affected by the monthly output of machine d and machine e this month. In some cases, the monthly production of line b this month may be affected by the monthly production of machine d and machine e over the past few months. The hierarchical time series forecasting method proposed by the present invention can select some or all of the data from the data of a large number of dimensions, such as the past actual output and predicted output of the time series At ~ G _t _, to generate the respective data of At ~ G _t _. In the future forecast value (such as the estimated monthly output), at the same time satisfy the mutual dependence of A _t ~ G _t . For example: the sum of the estimated production of B _t and C _t will not exceed the estimated monthly production of A _t .

圖2是本發明一實施例提出的階層式時間序列預測方法的流程圖。圖3是本發明一實施例提出的階層式時間序列預測方法對應的架構示意圖。在圖3中，多個獨立的預測值組成的獨立預測向量被輸入至調和器10(reconciler)。調和器10包含可訓練的編碼網路P及固定的解碼器S。調和器10的輸出為多個調和預測值，這些調和預測值分別對應於時間序列。以下按照圖2的流程詳細說明圖3架構的實施細節。 FIG. 2 is a flowchart of a hierarchical time series prediction method proposed by an embodiment of the present invention. FIG. 3 is a schematic structural diagram corresponding to a hierarchical time series prediction method proposed by an embodiment of the present invention. In FIG. 3, an independent prediction vector composed of a plurality of independent prediction values is input to a reconciler 10 (reconciler). The blender 10 includes a trainable encoding network P and a fixed decoder S. The output of the blender 10 is a plurality of harmonic forecast values, which respectively correspond to time series. The implementation details of the architecture of FIG. 3 will be described in detail below according to the flow of FIG. 2 .

步驟S1是「以預測模型取得一獨立預測向量」。詳言之，以多個預測模型分別產生對應於多個時間序列的多個獨立預測值。所述的獨立預測向量包含多個時間序列的預測值，如圖3的

所示。其中A _t ,B _t ,...,G _t的每一者代表一個時間序列。在本發明一實施例中，獨立預測向量中的每個獨立預測值代表時間序列的下一個預測值。即

,

,...,

例如是下個月的產量。但本發明不限於上述範例，在另一實施例中，

,

,...,

中的每一者亦可代表連續多個預測週期(例如下兩個月)的多個預測值。 Step S1 is "obtaining an independent prediction vector with the prediction model". Specifically, a plurality of independent forecast values corresponding to a plurality of time series are respectively generated by a plurality of forecast models. The independent prediction vector contains the prediction values of multiple time series, as shown in Figure 3

shown. where each of A _t , B _t ,...,G _t represents a time series. In an embodiment of the present invention, each independent prediction value in the independent prediction vector represents the next prediction value of the time series. which is

,

,...,

For example, next month's output. However, the present invention is not limited to the above examples. In another embodiment,

,

,...,

Each of the may also represent multiple forecast values for multiple consecutive forecast periods (eg, the next two months).

在步驟S1中，所述的預測模型用於產生時間序列的下一個預測值。此預測模型在步驟S1之前已被訓練完成。例如，對時間序列A_t而言，收集工廠a過去多個月的歷史預測產量A_HP以及工廠a過去多個月的實際產量A_HR作為預測模型訓練時的輸入。 In step S1, the forecast model is used to generate the next forecast value of the time series. This prediction model has been trained before step S1. For example, for the time series A _t , the historical forecast production A _HP of factory a in the past several months and the actual production A _HR of factory a in the past several months are collected as the input for the training of the prediction model.

關於步驟S1所述的多個預測模型，本發明採用以下兩種策略之一。第一種策略是針對每個時間序列使用獨立的預測模型，第二種策略是針對所有時間序列使用一個通用(global)的預測模型。 Regarding the multiple prediction models described in step S1, the present invention adopts one of the following two strategies. The first strategy is to use an independent forecasting model for each time series, and the second strategy is to use a common (global) forecasting model for all time series.

在第一種策略中，本發明採用線性自迴歸模型(autoregressive model)並以滯後的時間序列值作為輸入。在此策略中，對應於每個時間序列的預測模型中的超參數(hyperparameter)各自獨立地被調整。例如，時間序列A_t的超參數調整不會影響到時間序列B_t的超參數調整。在第一種策略中，本發明執行網格搜索(grid search)以最佳化預測的滯後值的數量。 In the first strategy, the present invention employs a linear autoregressive model with lagged time series values as input. In this strategy, the hyperparameters in the forecasting model corresponding to each time series are adjusted independently. For example, hyperparameter tuning of time series A _t does not affect hyperparameter tuning of time series B _t . In the first strategy, the present invention performs a grid search to optimize the number of predicted lag values.

在第二種策略中，對於具有大量時間序列的資料集，例如超過500個時間序列的資料集，本發明採用單一個通用的預測模型去預測所有的時間序列。這個方法允許利用時間序列之間的相似性並建立一個複雜的預測模型。本發明採用輕量級的提升樹(Light Gradient Boosting，Light GBM)模型，並以縮放後的滯後的時間序列值、時間序列特定的特徵(time- series specific features)以及時間的特徵(temporal features)等作為輸入同理，在此預測模型的訓練階段亦採用具有上述類型的歷史預測資料以及歷史實際資料作為訓練資料。假設預測模型是用於預測下個月的飲料銷售量，則時間序列特定的特徵例如可採用溫度、濕度或季節等，且時間的特徵例如可採用上個月晴天的天數等。在此策略中，對應於每個時間序列的預測模型中的超參數彼此可以互相參照或沿用。例如，時間序列C_t的超參數設定可能與時間序列B_t的超參數設定相同。在第二種策略中，本發明執行網格搜索以最佳化Light GBM建立的樹中的樹葉節點的數量及在每個樹葉節點超參數中的最小觀察量(minimum number of observations)，並且對於其他參數維持預設值，其中超參數用於表示樹的結構及分支狀況。只要找到最佳的超參數配置，為了驗證調和器10的參數，本發明在交叉驗證期間使用多個模型進行訓練。調和器10依據階層約束將預測模型的預測結果修正得更為精確。上述的網格搜索亦可改用隨機搜索(random search)以降低搜索時間。 In the second strategy, for a data set with a large number of time series, such as a data set with more than 500 time series, the present invention adopts a single general prediction model to predict all the time series. This method allows to exploit the similarity between time series and build a complex forecasting model. The present invention adopts a light-weight boosting tree (Light Gradient Boosting, Light GBM) model, and uses the scaled lag time-series values, time-series specific features and temporal features. For the same reason as the input, in the training phase of the prediction model, the historical prediction data and the historical actual data of the above-mentioned types are also used as the training data. Assuming that the prediction model is used to predict the sales volume of beverages in the next month, the time series specific features such as temperature, humidity or season can be used, and the time features can be used such as the number of sunny days in the last month. In this strategy, the hyperparameters in the forecasting model corresponding to each time series can be cross-referenced or inherited from each other. For example, the hyperparameter settings for time series C _t may be the same as the hyperparameter settings for time series B _t . In the second strategy, the present invention performs a grid search to optimize the number of leaf nodes in the tree built by Light GBM and the minimum number of observations in each leaf node hyperparameter, and for Other parameters maintain default values, among which hyperparameters are used to represent the structure and branch status of the tree. As long as the optimal hyperparameter configuration is found, the present invention uses multiple models for training during cross-validation in order to validate the parameters of the blender 10. The blender 10 revises the prediction result of the prediction model to be more accurate according to the hierarchical constraints. The above grid search can also be replaced by random search to reduce the search time.

對於上述兩種策略，本發明執行十折的分塊交叉驗證(ten-fold blocked cross-validation)，即驗證集位於相同的時間窗中，藉此選擇最佳的超參數(hyperparameter)組合。上述的驗證技術對於時間序列而言已被證明是有效的。 For the above two strategies, the present invention performs ten-fold blocked cross-validation, that is, the validation sets are located in the same time window, thereby selecting the best hyperparameter combination. The verification techniques described above have been shown to be effective for time series.

在真實世界的預測模型中，經常需要最小化給定的指標(metric)，例如平均絕對值比例誤差(mean absolute scaled error，MASE)或平均絕對誤差(mean absolute error，MAE)，而且預測模型的選擇取決於被選定的指標。 In real-world prediction models, it is often necessary to minimize a given metric, such as mean absolute scaled error (MASE) or mean absolute error (MAE), and the prediction model's The choice depends on the selected indicator.

為了展示本發明的彈性，本發明一實施例採用以下兩種指標：第一種為平均絕對比例誤差(Mean Absolute Scaled Error，MASE)，第二種為平均對數絕對誤差(Mean Logarithm of Absolute Error，MLAE)。本發明基於這兩種指標訓練具有不同損失函數的多個預測模型。例如針對MASE指標使用MASE作為損失函數，針對MLAE指標使用MLAE作為損失函數。針對預測模型的指標選擇對應於指標的損失函數是一種直觀且方便的做法。 In order to demonstrate the flexibility of the present invention, an embodiment of the present invention adopts the following two indicators: the first one is Mean Absolute Scaled Error (MASE), and the second one is Mean Logarithm of Absolute Error (Mean Logarithm of Absolute Error, MLAE). The present invention trains multiple prediction models with different loss functions based on these two indicators. For example, MASE is used as the loss function for the MASE indicator, and MLAE is used as the loss function for the MLAE indicator. Selecting a loss function corresponding to the metric for the metrics of the predictive model is an intuitive and convenient approach.

在本發明另一實施例中，對於幾乎恆定的時間序列，本發明不採用MASE作為損失函數，以避免產生不穩定的預測結果。在此另一實施例中，損失函數例如是：將誤差放大一倍加上平均樣本內的原始誤差。 In another embodiment of the present invention, for an almost constant time series, the present invention does not use MASE as a loss function to avoid generating unstable prediction results. In this alternative embodiment, the loss function is, for example, doubling the error plus the original error within the average sample.

在本發明又一實施例中，在預測模型的訓練階段或驗證階段，可針對預測值與實際值相減的差值設置不同的權重。舉例來說，若預測產量大於實際產量，則以損失函數的輸出值乘以第一權重作為預測誤差，若預測產量小於或等於實際產量，則損失函數的輸出值乘以第二權重作為第二誤差，其中第一權重大於第二權重。實務上，考量到生產量大於銷售量將導致庫存率提高，因此藉由設置不同的權重產生不同的誤差值反映此實際狀況。 In yet another embodiment of the present invention, in the training stage or the verification stage of the prediction model, different weights may be set for the difference between the predicted value and the actual value. For example, if the predicted yield is greater than the actual yield, the output value of the loss function is multiplied by the first weight as the prediction error; if the predicted yield is less than or equal to the actual yield, the output value of the loss function is multiplied by the second weight as the second weight. error, where the first weight is greater than the second weight. In practice, considering that the production volume is greater than the sales volume, the inventory rate will increase, so different error values are generated by setting different weights to reflect the actual situation.

在本發明再一實施例中，可依據階層結構中每個節點的重要程度決定每個時間序列的預測模型採用的指標。舉例來說，若在圖1中，節點A相較於節點D及E更為重要，則在時間序列A_t的預測模型中採用MASE作損失函數，且在時間序列D_t及E_t的預測模型中，採用平均絕對誤差(Mean Absolute Error，MAE)。反之，若是節點D及E的重要程度大於節點A的重要程度，則在時間序列D_t及E_t的預測模型中採用MLAE作損失函數，且在時間序列A_t的預測模型中採用MAE作損失函數。 In yet another embodiment of the present invention, the index adopted by the prediction model of each time series can be determined according to the importance of each node in the hierarchical structure. For example, if in Figure 1, node A is more important than nodes D and E, MASE is used as the loss function in the prediction model of time series A _t , and the prediction of time series D _t and E _t is used as the loss function. In the model, the Mean Absolute Error (MAE) is used. Conversely, if the importance of nodes D and E is greater than that of node A, MLAE is used as the loss function in the prediction model of the time series D _t and E _t , and MAE is used as the loss function in the prediction model of the time series A _t . function.

請參考圖2，步驟S2是「依據獨立預測向量及一編碼網路產生一底層向量」。詳言之，依據步驟S1產生的多個獨立預測值及圖3繪示的編碼網路P產生對應於底層節點的多個底層預測值，這些底層預測值組成底層向量。步驟S1中獨立預測值的數量大於步驟S2中底層預測值的數量。 Please refer to FIG. 2 , step S2 is “generating an underlying vector according to the independent prediction vector and an encoding network”. To be more specific, according to the plurality of independent predicted values generated in step S1 and the encoding network P shown in FIG. 3 , a plurality of low-level predicted values corresponding to the bottom-level nodes are generated, and these low-level predicted values constitute a bottom-level vector. The number of independent predicted values in step S1 is greater than the number of underlying predicted values in step S2.

在本發明一實施例中，編碼網路P例如是一個編碼矩陣。該編碼矩陣用於將獨立預測向量映射至底層(bottom level)向量。 In an embodiment of the present invention, the encoding network P is, for example, an encoding matrix. The encoding matrix is used to map independent prediction vectors to bottom level vectors.

在本發明一實施例中，使用以神經網路表示的一個通用函數(generic function)P：R^N→R^M去通用化(generalize)M×N矩陣，其中N代表階層結構中所有節點的數量，M代表階層結構中葉節點的數量。 In an embodiment of the present invention, a generic function P: R ^N → R ^M represented by a neural network is used to generalize the M×N matrix, where N represents the number of all nodes in the hierarchical structure , M represents the number of leaf nodes in the hierarchical structure.

基於圖1的階層範例，請參考圖3，編碼網路P將7維的獨立預測向量

轉換為4維的底層向量

。底層預設值

分別對應於圖1的底層節點D、E、F及G。 Based on the hierarchical example of Fig. 1, please refer to Fig. 3, the encoding network P converts the 7-dimensional independent prediction vector

Convert to 4-dimensional underlying vector

. low-level default

They correspond to the underlying nodes D, E, F and G of FIG. 1, respectively.

對於調和器10中的編碼網路P，本發明採用以整流線性單位函數(Rectified Linear Unit，ReLU)作為激勵函數(activation function)的前饋網路(feed-forward network)，並且將輸出層的大小設定為底層預設值的數量。 For the encoding network P in the blender 10, the present invention adopts a feed-forward network with a Rectified Linear Unit (ReLU) as an activation function, and uses the output layer of the feed-forward network. Size is set to the number of underlying presets.

本發明提出以下兩種架構去實現前饋網路：第一種為標準的全連接(fully connected)網路，如圖4A所示。第二種為簡化版的全連接網路，如圖4B所示。第二種架構可讓給定底層時間序列的輸出僅取決於本身及上一階層在所有層級中的預測值。 The present invention proposes the following two architectures to implement a feedforward network: the first is a standard fully connected network, as shown in FIG. 4A . The second is the simplified version of the full connection connected to the network, as shown in Figure 4B. The second architecture allows the output of a given underlying time series to depend only on itself and the forecasts of the previous level at all levels.

請參考圖1所示的階層關係並舉例說明如下：在圖4A所示的全連接網路中，對應於底層節點D的底層預測值

受所有層級的時間序列預測值影響，即

。在圖4B簡化版的全連接網路中，

只受本身，其父節點及其祖父節點的影響，即

。由於圖1中節點B的預測值並不受節點F及G的預測值的影響，因此，在全連接網路中節點B與F的連線以及節點B與G的連線可被移除。簡化版的全連接網路雖然在表示能力(representative power)弱於全連接網路，然而其仍保留自下而上、自上而下、自中向上下(middle-out)以及上述組合的表示空間。 Please refer to the hierarchical relationship shown in FIG. 1 and illustrate as follows: In the fully connected network shown in FIG. 4A , the bottom prediction value corresponding to the bottom node D

is affected by the time series forecast values at all levels, i.e.

. In the fully connected network of the simplified version of Figure 4B,

is only affected by itself, its parent and its grandparents, i.e.

. Since the predicted value of node B in FIG. 1 is not affected by the predicted value of nodes F and G, the connection between nodes B and F and the connection between nodes B and G can be removed in the fully connected network. Although the simplified version of the fully connected network is weaker than the fully connected network in terms of representation power, it still retains the representations of bottom-up, top-down, middle-out, and combinations of the above. space.

對於給定層級數量的階層關係，隨著時間序列數量的成長，所需訓練的參數數量在全連接網路的範例中二次式地(quadratically)成長，在簡化版的全連接網路中線性成長。為了減少過擬合(overfitting)的風險，在本發明一實施例中，當時間序列的數量超過本身長度的10倍時，編碼網路P採用簡化版的全連接網路。此外，採用簡化版的選項也可做為可調整的一個超參數。 For a given number of levels of hierarchy, as the number of time series grows, the number of parameters required to train grows quadratically in the case of a fully connected network and linearly in a simplified version of a fully connected network growing up. In order to reduce the risk of overfitting, in an embodiment of the present invention, when the number of time series exceeds 10 times its own length, the coding network P adopts a simplified version of the fully connected network. In addition, the option to use a simplified version can also be used as a hyperparameter that can be adjusted.

步驟S3是「依據底層向量及一解碼器產生一調和預測向量」。詳言之，依據多個底層預測值及關聯於階層結構的解碼器產生多個調和預測值。如圖3所示，解碼器S將底層調和預測作為輸入，重建出所有層級的預測。調和預測向量及獨立預測向量二者具有相同維度，即步驟S1中獨立預測值的數量等於步驟S3中調和預測值的數量。 Step S3 is "generating a harmonic prediction vector according to the underlying vector and a decoder". Specifically, a plurality of harmonic predictors are generated according to a plurality of underlying predictors and decoders associated with the hierarchical structure. As shown in Figure 3, the decoder S takes the low-level harmonic predictions as input and reconstructs the predictions of all levels. Both the harmonic prediction vector and the independent prediction vector have the same dimension, that is, the number of independent prediction values in step S1 is equal to the number of harmonic prediction values in step S3.

請參考圖3，在本發明一實施例中，解碼器S是一個固定的0-1矩陣，如下所示。 Referring to FIG. 3 , in an embodiment of the present invention, the decoder S is a fixed 0-1 matrix, as shown below.

解碼器S按照如圖1的階層結構約束條件將4維的底層向量

轉換為7維的調和預測向量

。 The decoder S converts the 4-dimensional underlying vector according to the hierarchical structure constraints as shown in Figure 1.

Convert to a 7-dimensional harmonic prediction vector

.

調和預測向量的每一個元素對應於一個預測值。在本發明其他實施例中，解碼器S也可以是線性或非線性函數。 Each element of the harmonic prediction vector corresponds to a prediction value. In other embodiments of the present invention, the decoder S may also be a linear or nonlinear function.

在本發明一實施例中，圖3的架構可對應至如下方程式：

In an embodiment of the present invention, the structure of FIG. 3 may correspond to the following equation:

其中

為獨立預測向量，

為調和預測向量，S為解碼器S對應的0-1矩陣，P為編碼網路對應的0-1矩陣。因此，在最佳組合及跡最小化等方式中，P矩陣為一個稠密矩陣(dense matrix)。在自上而下方式中，將P矩陣中的第一行填入多個總和為1的數值，其餘元素則設置為0。換言之，本發明提出的架構中涵蓋傳統用來預測階層式時間序列的方式。 in

is an independent prediction vector,

is the harmonic prediction vector, S is the 0-1 matrix corresponding to the decoder S, and P is the 0-1 matrix corresponding to the encoding network. Therefore, in the best combination and trace minimization, the P matrix is a dense matrix. In a top-down approach, the first row of the P matrix is filled with values that sum to 1, and the remaining elements are set to 0. In other words, the framework proposed by the present invention covers the traditional methods used to predict hierarchical time series.

本發明具有兩個主要的優點：廣義化(Generalization)及彈性。廣義化優點指的是本發明的模型具有寬廣的表示空間，其涵蓋了過去的預測方法並加以延伸，且允許非線性(的轉換)。彈性優勢指的是本發明允許在訓練階段使用適當的損失函數以達到指定的目標效能。現有的方法是簡單的啟發法(如自下而上、自上而下)或在不同的假設下將估計係數誤差最小化(如最佳調和、跡最小化)。舉例來說，如果目標是平均對數絕對誤差(Mean Logarithm of Absolute Error，MLAE)，則以MLAE作為損失函數。或者，如果對階層中的特定層級有興趣，可指定一個較高的權重至對應的損失函數項。類似地，若誤差的重要性取決於度量或非對稱，本發明可據以改變損失函數以朝向目標進行最佳化。 The present invention has two main advantages: generalization and flexibility. The generalization advantage means that the model of the present invention has a broad representation space, which covers and extends past prediction methods, and allows for nonlinearity. The elasticity advantage means that the present invention allows the use of an appropriate loss function in the training phase to achieve a specified target performance. Existing methods are simple heuristics (such as bottom-up, top-down) or under different assumptions that estimate Coefficient error minimization (eg optimal harmonic, trace minimization). For example, if the target is Mean Logarithm of Absolute Error (MLAE), use MLAE as the loss function. Alternatively, if you are interested in a specific level in the hierarchy, you can assign a higher weight to the corresponding loss function term. Similarly, if the importance of the error depends on the metric or asymmetry, the present invention can change the loss function accordingly to optimize towards the goal.

此外，本發明提出的模型具有實際上容易實作的優點，並且容易被快速成長的深度學習社群存取。相對於傳統方式採用複雜的統計模型，如最佳化結合或跡最小化。此外，如果可以以深度學習框架表達預測模型，本發明允許將調和器網路(reconciler network)堆疊在預測模型上，並同時訓練調和器和微調預測模型。 Furthermore, the model proposed by the present invention has the advantage of being practically easy to implement and easily accessible to the rapidly growing deep learning community. Complex statistical models such as optimal binding or trace minimization are employed relative to traditional approaches. Furthermore, if the predictive model can be expressed in a deep learning framework, the present invention allows stacking a reconciler network on top of the predictive model and simultaneously train the reconciler and fine-tune the predictive model.

本發明具有下列貢獻及功效：本發明所提出的調和方法建立在任何以時間序列為輸入的預測方法的上層。本發明提出的易於實現的調和方法，比起現有方法更通用且更有彈性。傳統方法可視為本發明提出的架構中的特例。本發明容許用於各種實際考量的相異的損失函數。 The present invention has the following contributions and effects: The reconciliation method proposed by the present invention is built on top of any prediction method that takes time series as input. The easy-to-implement reconciliation method proposed by the present invention is more versatile and flexible than existing methods. The traditional method can be regarded as a special case in the architecture proposed by the present invention. The present invention allows for distinct loss functions for various practical considerations.

綜上所述，基於編碼器-解碼器模型的神經網路，本發明提出一個新的方法以調和階層式時間序列的預測。編碼器是一個可訓練的神經網路，其將獨立的預測結果作為輸入，並輸出位於底層的被調和的預測結果。解碼器是一個固定矩陣，其以底層的編碼的預測結果精確地重建出所有層級的預測結果。本發明包含並擴展了現有方法的表示空間(representation space)，極具彈性，並且易於實現。 In conclusion, based on the neural network of the encoder-decoder model, the present invention proposes a new method to reconcile the prediction of hierarchical time series. The encoder is a trainable neural network that takes as input the independent predictions and outputs the reconciled predictions at the bottom layer. The decoder is a fixed matrix that accurately reconstructs the predictions of all levels from the underlying encoded predictions. The present invention encompasses and extends the representation space of existing methods, is extremely flexible, and is easy to implement.

雖然本發明以前述之實施例揭露如上，然其並非用以限定本發明。在不脫離本發明之精神和範圍內，所為之更動與潤飾，均屬本發明之專利保護範圍。關於本發明所界定之保護範圍請參考所附之申請專利範圍。 Although the present invention is disclosed in the foregoing embodiments, it is not intended to limit the present invention. Changes and modifications made without departing from the spirit and scope of the present invention belong to the present invention the scope of patent protection. For the protection scope defined by the present invention, please refer to the attached patent application scope.

S1、S2、S3:步驟 S1, S2, S3: Steps

Claims

A hierarchical time series forecasting method is suitable for generating a plurality of harmonic forecast values of a plurality of nodes in a hierarchical structure, wherein the nodes respectively have a plurality of time series, the harmonic forecast values correspond to the time series, and the Some nodes include multiple underlying nodes, and the method includes: generating a plurality of independent forecast values corresponding to the time series using a plurality of forecast models; generating a plurality of underlying predicted values corresponding to the underlying nodes according to the independent predicted values and an encoding network; and generating the harmonic predictors according to the underlying predictors and a decoder associated with the hierarchy; wherein The number of independent predictors is greater than the number of underlying predictors and equal to the number of harmonic predictors; The independent predictor and the reconciled predictor corresponding to the same one of the time series are consecutive in time.

The hierarchical time series prediction method according to claim 1, wherein the encoding network is a feed-forward neural network, and the training data of the encoding network includes a plurality of historical prediction values and a plurality of historical actual values value.

The hierarchical time series forecasting method of claim 1, wherein each of the forecasting models is a linear autoregressive model, and hyperparameters of each of the forecasting models are independently adjusted in the training phase.

The hierarchical time series forecasting method as claimed in claim 1, wherein the forecasting models are lightweight boosted tree models, and hyperparameters of the forecasting models are referred to each other in the training phase.

The hierarchical time series prediction method according to claim 1, wherein when the number of the time series is greater than a threshold, the prediction models are lightweight boosted tree models.

The hierarchical time series forecasting method according to claim 1, wherein the loss function of each of the forecasting models corresponds to the verification index of the forecasting model.

The hierarchical time series prediction method according to claim 1, wherein the loss function corresponding to the time series located at the upper layer in the hierarchical structure is the mean absolute proportional error, and the loss function corresponding to another time series located at the lower layer is mean absolute error.

The hierarchical time series prediction method according to claim 1, wherein the loss function corresponding to the time series located at the lower layer in the hierarchical structure is the mean logarithmic absolute error, and the loss function corresponding to another time series located at the upper layer is mean absolute error.

The hierarchical time series forecasting method according to claim 1, wherein the forecasting model is used to output a forecast value and a forecast error, and when the forecast value is greater than an actual value, multiply the loss function by a first weight as For the prediction error, when the predicted value is not greater than the actual value, the loss function is multiplied by a second weight as the second error, wherein the first weight is greater than the second weight.

The hierarchical time series prediction method according to claim 2, wherein the feedforward network has a fully connected layer, and each element of the fully connected layer in the independent prediction vector is based on each element in the underlying vector The hierarchical relationship determines whether there is a connection.