TWI724734B

TWI724734B - Method of building and applying an attack identification data model

Info

Publication number: TWI724734B
Application number: TW109100150A
Authority: TW
Inventors: 陳建源
Original assignee: 台達電子工業股份有限公司 (Delta Electronics, Inc.)
Priority date: 2020-01-03
Filing date: 2020-01-03
Publication date: 2021-04-11
Also published as: TW202127837A

Abstract

A method of building and applying an attack identification data model is provided. In a training mode, the method computes statistics over a plurality of sample traffic flows to obtain a plurality of characteristic values, and executes a classification learning algorithm to build an attack identification data model that includes a plurality of identification features. In an identification mode, an identification module uses the attack identification data model to classify each of a plurality of undefined traffic flows into a traffic category belonging to either the whitelist or the blacklist. The disclosed example can identify more traffic categories and accurately determine whether each undefined traffic flow belongs to the whitelist or the blacklist.

Description

Method for generating and applying an attack identification data model

The present invention relates to the identification of network attacks, and in particular to a method for generating and applying an attack identification data model.

In a communication network (such as the Internet or a local area network), computer devices communicate by exchanging data traffic. However, malicious traffic (i.e., an attack) may cause a computer device to malfunction.

To detect attacks from the network, existing attack identification techniques collect the values of known traffic (such as packets) in advance and, when unfamiliar traffic is received, compare its values with those of the known traffic. If the values of the unfamiliar traffic match those of any known traffic, the purpose of the unfamiliar traffic (e.g., normal traffic or attack traffic) can be determined.

The drawback of existing attack identification techniques is that they can only identify unfamiliar traffic that is exactly identical to known traffic. As soon as the unfamiliar traffic differs from the known traffic, its purpose can no longer be identified reliably.

Existing network attack identification techniques therefore suffer from the problems described above, and a more effective solution is urgently needed.

The main objective of the present invention is to provide a method for generating and applying an attack identification data model that can identify more traffic categories from the same amount of sample traffic.

To achieve the above objective, the present invention provides a method for generating and applying an attack identification data model for use in an automatic control system. The automatic control system includes a control device, a controlled device, and an identification module. The method includes the following steps: in a training mode, computing statistics over multiple values of multiple whitelist or blacklist sample traffic flows to obtain multiple sample values, where all the sample values together can identify a first number of traffic categories; executing a classification learning algorithm based on the sample values and their corresponding traffic categories to classify values other than the sample values, thereby generating the attack identification data model, where the model includes multiple identification features that together can identify a second number of traffic categories, the second number being greater than the first number; controlling the identification module to receive multiple unfamiliar traffic flows in an identification mode; and classifying each unfamiliar traffic flow into a whitelist traffic category or a blacklist traffic category based on the identification features of the model and the values of that traffic flow, where the unfamiliar traffic is sent from the control device to the controlled device, or from the controlled device to the control device.

The present invention can identify many traffic categories from a small amount of sample traffic, and can accurately determine whether undefined, unfamiliar traffic belongs to the whitelist or the blacklist.

100-102: Attack traffic

11: Attack detection system

110-111: Sample

120: Whitelist

121: Blacklist

20: Control device

21: Controlled device

200, 210, 300, 31: Identification module

30: Relay device

400: Processing device

401: Storage device

402: Human-machine interface

403: Transmission device

404: Functional device

500: Whitelist sample value

501: Blacklist sample value

51: Classification learning algorithm

52: Whitelist

53: Blacklist

54: Attack identification data model

60-63: Traffic category

70-73: Identification feature

S100-S103: Training steps

S104-S108: First identification steps

S20-S21: First sample value acquisition steps

S22: Second sample value acquisition step

S30-S33: Classification steps

S400-S409: Second identification steps

S500-S509: Second identification steps

FIG. 1 is a schematic diagram of the operation of an existing attack detection system.

FIG. 2 is an architecture diagram of an automatic control system according to an embodiment of the present invention.

FIG. 3 is an architecture diagram of an automatic control system according to an embodiment of the present invention.

FIG. 4 is an architecture diagram of a computer device according to an embodiment of the present invention.

FIG. 5 is a flowchart of the method for generating and applying an attack identification data model according to the first embodiment of the present invention.

FIG. 6 is a partial flowchart of the method for generating and applying an attack identification data model according to the second embodiment of the present invention.

FIG. 7 is a flowchart of the classification learning algorithm according to the third embodiment of the present invention.

FIG. 8 is a flowchart of the method for generating and applying an attack identification data model according to the fourth embodiment of the present invention.

FIG. 9 is a flowchart of the method for generating and applying an attack identification data model according to the fifth embodiment of the present invention.

FIG. 10 is a schematic diagram of generating an attack identification data model according to an embodiment of the present invention.

FIG. 11 is a schematic diagram of executing a single-field decision tree algorithm according to an embodiment of the present invention.

FIG. 12 is a schematic diagram of executing a multi-field decision tree algorithm according to an embodiment of the present invention.

FIG. 13 is a schematic diagram of multiple fields of multiple unfamiliar traffic flows according to an embodiment of the present invention.

A preferred embodiment of the present invention is described in detail below with reference to the drawings.

Please refer to FIG. 1, a schematic diagram of the operation of an existing attack detection system, which illustrates more clearly the technical problem the present invention addresses.

As shown in FIG. 1, the attack detection system 11 stores in advance a plurality of samples 110-111 of the blacklist 121, whose values are A and B respectively; that is, the attack detection system 11 can identify only two traffic categories.

When performing attack detection, the attack detection system 11 compares the value of each unfamiliar traffic flow it receives (for example, attack traffic 100-102, whose values are A, B, and C respectively) with the values of all samples 110-111 to determine whether each flow is whitelist 120 traffic or blacklist 121 traffic.

In the example of FIG. 1, attack traffic 100-101 has the same values as samples 110-111 of the blacklist 121 and is therefore identified as blacklist 121 traffic. Attack traffic 102, however, has a value different from samples 110-111 and is therefore misjudged as whitelist 120 traffic.

Existing attack detection systems can therefore identify only unfamiliar traffic that is exactly identical to a known sample; attacks that have never occurred before, or that are not on record, cannot be identified. This makes the probability of identification failure or misjudgment too high and reduces the reliability of the system.
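To make the failure mode of FIG. 1 concrete, the following minimal Python sketch models the prior-art detector as an exact-match lookup. The set `BLACKLIST_SAMPLES`, the function `exact_match_classify`, and the string values are hypothetical illustrations, not part of the original disclosure.

```python
# Hypothetical model of the prior-art detector in FIG. 1:
# blacklist samples 110-111 carry the values "A" and "B".
BLACKLIST_SAMPLES = {"A", "B"}


def exact_match_classify(value: str) -> str:
    """Flag a flow only if its value equals a stored sample; anything else passes as whitelist."""
    return "blacklist" if value in BLACKLIST_SAMPLES else "whitelist"


# Attack traffic 100-102 carries the values A, B and C.
attack_traffic = {100: "A", 101: "B", 102: "C"}
results = {ref: exact_match_classify(v) for ref, v in attack_traffic.items()}
print(results)  # {100: 'blacklist', 101: 'blacklist', 102: 'whitelist'}
```

Attack traffic 102 slips through solely because "C" was never stored, which is the gap the learned model is meant to close.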

In addition, in different types of automatic control systems (e.g., with different application purposes or different network protocols), the content of the transmitted traffic differs and calls for different detection rules. A solution is therefore needed that automatically performs training and attack detection according to the application type.

Moreover, industrial control network protocols are not as universal as IPv4: different industrial control networks often use different industrial control protocols, and no single attack identification data model applies to every type of industrial control network. A solution is therefore needed that automatically performs training and attack detection for the current type of industrial control network.

To solve the above problems, the present invention mainly provides a method for generating and applying an attack identification data model. An attack identification data model is generated by learning from multiple sample traffic flows, and network traffic is then inspected with this model to identify the purpose of each traffic flow (e.g., normal traffic, suspicious traffic, or attack traffic). Because the model is produced by learning and takes a different approach to classification, the number of traffic categories it can identify is expanded beyond the number of traffic categories covered by the training sample traffic.

Please refer to FIG. 2, an architecture diagram of an automatic control system according to an embodiment of the present invention. The method for generating and applying the attack identification data model of the present invention can be applied to the automatic control system 2 shown in FIG. 2.

Specifically, the automatic control system 2 mainly includes a control device 20 (e.g., a server or control host) and one or more controlled devices 21 (e.g., robots, IoT nodes, industrial automation equipment, or end devices). The control device 20 is connected to each controlled device 21 via the network, and can transmit control instructions (i.e., traffic) to a controlled device 21 to make it perform specified operations, or receive returned data (i.e., traffic) from the controlled device 21.

In one implementation, the control device 20 includes an identification module 200, which identifies the traffic received by the control device 20 based on the aforementioned attack identification data model to determine the traffic category of each received flow. In this way, the present invention can perform network attack detection on the control device 20.

In one implementation, the controlled device 21 includes an identification module 210, which identifies the traffic received by the controlled device 21 based on the aforementioned attack identification data model to determine the traffic category of each received flow. In this way, the present invention can perform network attack detection on the controlled device 21.

Notably, the attack identification data model of the present invention can be used to classify each unfamiliar traffic flow into one of multiple predefined traffic categories, and each traffic category can be assigned in advance to the whitelist or the blacklist. Thus, once an unfamiliar traffic flow has been classified, it can be judged to be whitelist or blacklist traffic according to its traffic category.

Please also refer to FIG. 3, an architecture diagram of an automatic control system according to an embodiment of the present invention. The control device 20 and the controlled device 21 in FIG. 3 are the same as or similar to those shown in FIG. 2 and are not described again here.

In the implementation of FIG. 3, the automatic control system 3 further includes a relay device 30 (e.g., a network switch, router, or bridge). The control device 20 is connected to the controlled device 21 through the relay device 30; that is, the relay device 30 forwards the traffic sent by the control device 20 to the controlled device 21, and the traffic sent by the controlled device 21 to the control device 20.

In one implementation, the relay device 30 includes an identification module 300, which identifies the traffic received (i.e., forwarded) by the relay device 30 based on the aforementioned attack identification data model to determine the traffic category of each received flow. In this way, the present invention only needs to deploy the identification module 300 on the relay device 30 to perform attack detection for the entire network.

In one implementation, the identification module 31 is a standalone device (e.g., a separate computer host or server) connected to the relay device 30 via the network. When the relay device 30 receives unfamiliar traffic, it can transmit the traffic (or a copy of it) to the identification module 31, which then determines the traffic category of each received flow. This reduces the load on the relay device.

Please also refer to FIG. 4, an architecture diagram of a computer device according to an embodiment of the present invention. The aforementioned control device 20, controlled device 21, relay device 30, and identification module 31 may each be the computer device 4 shown in FIG. 4.

Specifically, the computer device 4 may include a storage device 401, a human-machine interface 402, a transmission device 403, a functional device 404, and a processing device 400 electrically connected to these devices.

The storage device 401 stores data, such as the attack identification data model or programs for controlling the functional device 404. The human-machine interface 402 accepts user input and outputs information; it may include any combination of input and output devices, such as a touch screen, keypad, mouse, display, indicator light, or speaker, without limitation. The transmission device 403 connects to a network and may be, for example, an Ethernet module, a Wi-Fi module, or a mobile network module.

The functional device 404 implements the function the equipment is designed for. For example, if the controlled device 21 is automated manufacturing equipment, the functional device 404 may be a conveyor belt, a robotic arm, or another device for automated manufacturing. If the controlled device 21 is automated inspection equipment, the functional device 404 may be a camera, a device that moves the camera or the inspected object, or another device for automated inspection. If the control device 20 is an industrial management host, the functional device 404 may be a management system or a backup device. If the relay device 30 is a network switch or router, the functional device 404 may be a switching module or a routing module.

In one implementation, the storage device 401 may store a computer program containing computer-executable code. When the processing device 400 executes the computer program, it implements the method for generating and applying the attack identification data model of the embodiments described below.

Please also refer to FIG. 5, a flowchart of the method for generating and applying an attack identification data model according to the first embodiment of the present invention. The description of FIG. 5 uses the application of the method to the automatic control system 3 shown in FIG. 3 as an example, but the method is not limited thereto.

In one embodiment, the method for generating and applying the attack identification data model can also be implemented in the automatic control system 2 shown in FIG. 2.

The method of the present invention is divided into two main stages: a training mode and an identification mode. In the training mode, the invention trains on known traffic to generate the attack identification data model. In the identification mode, it uses the attack identification data model to identify unfamiliar traffic.

Notably, although the training mode and the identification mode are executed by the identification module 31 in the following description, the invention is not limited thereto.

In one embodiment, the training mode and the identification mode may instead be executed by the identification module 200, the identification module 210, and/or the identification module 300.

In one embodiment, the training mode and the identification mode can be executed by different computer devices. For example, the identification module 31 executes the training mode and transmits the generated attack identification data model to other identification modules (e.g., the identification module 300, or the identification modules 200 and 210), which then execute the identification mode. This distributes the training load and the identification load.

First, the identification module 31 executes steps S100-S103 to generate the attack identification data model in the training mode.

Step S100: The identification module 31 switches to the training mode in response to a user operation or automatic control, in preparation for learning.

Step S101: The identification module 31 obtains multiple sample traffic flows and computes statistics over their values to obtain multiple sample values. The sample traffic is either traffic with a known purpose (e.g., whitelist or blacklist traffic) or highly trustworthy traffic (e.g., traffic sent by a trusted device, which can be presumed directly to be whitelist traffic). All the sample values determined in this way can identify a first number of traffic categories (e.g., 800 or 1000).

In one embodiment, each sample traffic flow includes multiple fields (e.g., packet length, protocol code, function code, packets per second, and/or sending timestamp). The identification module 31 selects all or some of these fields as designated fields and computes statistics over the values of the designated fields across all sample traffic to obtain one or more sample values for each designated field.

In one embodiment, the identification module 31 uses every value that has appeared in a designated field as a sample value of that field, but the invention is not limited thereto.

In one embodiment, the identification module 31 performs statistical analysis on all values that have appeared in each designated field to obtain the sample values. For example, it may take values whose number of occurrences exceeds a preset count (e.g., 5) as sample values, take multiple values that appear regularly (e.g., in consecutive traffic) as multiple sample values, or take values with a high frequency of occurrence (e.g., the top 30% by frequency) as sample values.
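The selection rules above can be sketched per designated field as follows. This is a minimal illustration under stated assumptions: flows are represented as Python dicts, `select_sample_values` and `sample_values_per_field` are hypothetical helpers, the "top 30%" rule is read as the most frequent 30% of distinct values, and the rule about regularly recurring values is not implemented.

```python
from collections import Counter


def select_sample_values(observed, min_count=None, top_fraction=None):
    """Pick the sample values of one designated field (hypothetical helper).

    - default: every value that has appeared is a sample value
    - min_count: keep values occurring more than min_count times
    - top_fraction: keep the most frequent fraction of distinct values
    """
    counts = Counter(observed)
    if min_count is not None:
        return {v for v, n in counts.items() if n > min_count}
    if top_fraction is not None:
        ranked = [v for v, _ in counts.most_common()]  # most frequent first
        keep = max(1, round(len(ranked) * top_fraction))
        return set(ranked[:keep])
    return set(counts)


def sample_values_per_field(flows, designated_fields, **rule):
    """Apply the same selection rule to each designated field of the sample flows."""
    return {field: select_sample_values([flow[field] for flow in flows], **rule)
            for field in designated_fields}


# Hypothetical sample flows: 12 + 7 + 3 + 1 packets with different lengths.
flows = ([{"packet_length": 64, "function_code": 3}] * 12
         + [{"packet_length": 128, "function_code": 3}] * 7
         + [{"packet_length": 256, "function_code": 16}] * 3
         + [{"packet_length": 1500, "function_code": 16}] * 1)

print(sample_values_per_field(flows, ["packet_length"]))
print(sample_values_per_field(flows, ["packet_length"], min_count=5))
print(sample_values_per_field(flows, ["packet_length"], top_fraction=0.3))
```

Passing only `["packet_length"]` mirrors the option of designating a subset of the available fields.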

In one embodiment, while the automatic control system 3 is operating normally, the identification module 31 continuously captures the traffic between the control device 20 and the controlled device 21 through the relay device 30 as sample traffic (e.g., capturing 10 consecutive minutes of traffic).

Step S102: The identification module 31 executes a classification learning algorithm based on the sample values and their corresponding traffic categories to classify values other than the sample values, and generates the attack identification data model.

The attack identification data model may include multiple identification features that together can identify a second number of traffic categories. The second number is greater than the first number; that is, the attack identification data model expands the number of traffic categories that can be identified.

Notably, the present invention mainly provides a solution that applies existing machine learning techniques to network attack detection.

How to train a machine learning model on sample values to generate an attack identification data model is documented extensively in the machine learning literature and is not repeated here.

For example, a classification algorithm may be used, such as an unsupervised or a supervised classification algorithm. The unsupervised classification algorithm may be K-means, a neural network, or BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), among others. The supervised classification algorithm may be a decision tree, a support vector machine (SVM), or Naive Bayes, among others.
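As a rough illustration of how a supervised learner can assign categories to values that never appeared in the samples, the sketch below learns a one-field threshold classifier: it sorts the labelled sample values and places a split halfway between neighbouring samples wherever the category changes, which yields the kind of interval partition a single-field decision tree produces. It is a simplified stand-in written for this explanation, not the patent's algorithm and not a library API; `fit_thresholds`, `predict`, the sample values, and the category names are hypothetical.

```python
import math


def fit_thresholds(samples):
    """Learn split points for one numeric field from (value, category) pairs.

    Returns (upper_bound, category) pairs sorted by upper bound; a value
    belongs to the first interval whose upper bound exceeds it.
    """
    ordered = sorted(samples)
    intervals = []
    for (v1, c1), (v2, c2) in zip(ordered, ordered[1:]):
        if c1 != c2:
            # Split halfway between two neighbouring samples of different categories.
            intervals.append(((v1 + v2) / 2, c1))
    intervals.append((math.inf, ordered[-1][1]))  # last interval is open-ended
    return intervals


def predict(intervals, value):
    """Classify any value, including values never seen during training."""
    for upper, category in intervals:
        if value < upper:
            return category
    return None  # only reached for NaN input


# Hypothetical labelled packet lengths.
samples = [(20, "runt"), (40, "runt"),
           (64, "normal"), (128, "normal"), (256, "normal"),
           (1400, "oversized"), (1500, "oversized")]
intervals = fit_thresholds(samples)
print(intervals)  # [(52.0, 'runt'), (828.0, 'normal'), (inf, 'oversized')]
print(predict(intervals, 60))   # normal (60 was never a sample value)
print(predict(intervals, 900))  # oversized
```

Seven sample values end up covering the whole numeric range, which is the sense in which a learned model can classify "values other than the sample values".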

In one embodiment, the classification learning algorithm analyzes the relationships among multiple sample values within the same field or across fields, and can further incorporate the extreme values of each field (e.g., the minimum or maximum allowable value based on general experience) to compute the identification features.
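One plausible reading of combining sample values with field extremes is sketched below: the observed whitelist values of a field define a normal range, and the allowable minimum and maximum divide the rest of the value space into additional blacklist categories. This is a hypothetical illustration of how one learned category can become several identification features (a larger "second number" of categories). The `Feature` type, the category names, and the bounds 60 and 1518 are assumptions for the example, and integer-valued fields are assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    field: str
    low: float       # inclusive lower bound
    high: float      # inclusive upper bound
    category: str
    list_name: str   # "whitelist" or "blacklist"

    def matches(self, value):
        return self.low <= value <= self.high


def derive_features(field, whitelist_values, min_allowed, max_allowed):
    """Turn the observed whitelist values of one integer field into range features.

    If an observed extreme coincides with an allowable extreme, the corresponding
    "below"/"above" range is empty (low > high) and never matches.
    """
    lo, hi = min(whitelist_values), max(whitelist_values)
    return [
        Feature(field, lo, hi, f"{field}/observed", "whitelist"),
        # Allowable but never observed: treated as suspicious in this sketch.
        Feature(field, min_allowed, lo - 1, f"{field}/below_observed", "blacklist"),
        Feature(field, hi + 1, max_allowed, f"{field}/above_observed", "blacklist"),
        # Outside the allowable extremes of the field.
        Feature(field, -math.inf, min_allowed - 1, f"{field}/out_of_bounds", "blacklist"),
        Feature(field, max_allowed + 1, math.inf, f"{field}/out_of_bounds", "blacklist"),
    ]


features = derive_features("packet_length", [64, 128, 256], min_allowed=60, max_allowed=1518)
categories = {f.category for f in features}
print(len(categories))  # 4 categories derived from one observed category
```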

In one embodiment, the identification features correspond to respective traffic categories, and each traffic category belongs to either the whitelist or the blacklist. Thus, when an unfamiliar traffic flow matches one of the identification features, it belongs to the traffic category of the matched feature, and can further be judged to be normal or suspicious traffic according to whether that category belongs to the whitelist or the blacklist.
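The two-level lookup described above (feature to traffic category, then category to whitelist or blacklist) can be sketched as follows. The feature table, the function-code ranges, the category names, and `judge` are hypothetical, and since the source does not say how traffic matching no feature is handled, the sketch simply returns `None` in that case.

```python
# Hypothetical single-field identification features: (field, low, high, category).
FEATURES = [
    ("function_code", 1, 6, "routine_read_write"),
    ("function_code", 7, 127, "unexpected_function"),
]

# Each traffic category is assigned to exactly one list in advance.
CATEGORY_TO_LIST = {
    "routine_read_write": "whitelist",
    "unexpected_function": "blacklist",
}


def judge(field, value):
    """Return (category, verdict) for the first matching feature, or None if none match."""
    for feat_field, low, high, category in FEATURES:
        if feat_field == field and low <= value <= high:
            verdict = "normal" if CATEGORY_TO_LIST[category] == "whitelist" else "suspicious"
            return category, verdict
    return None  # unmatched traffic: behaviour not specified by the source


print(judge("function_code", 3))   # ('routine_read_write', 'normal')
print(judge("function_code", 90))  # ('unexpected_function', 'suspicious')
```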

Step S103: The identification module 31 outputs the attack identification data model, for example by exporting it as a file that is stored in the storage device 401 or transmitted to other identification modules through the transmission device 403.
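The patent says only that the model is exported "as a file" and does not specify a format. Assuming JSON purely for illustration, a round trip between the module that exports the model (step S103) and a module that later loads it (step S105) could look like this; the file name, the `version` key, and the feature layout are hypothetical.

```python
import json
import os
import tempfile


def export_model(features, path):
    """Write the identification features to a JSON file (step S103)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"version": 1, "features": features}, fh, indent=2)


def load_model(path):
    """Read the identification features back (step S105)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["features"]


features = [
    {"field": "packet_length", "low": 64, "high": 256,
     "category": "observed", "list": "whitelist"},
    {"field": "packet_length", "low": 257, "high": 1518,
     "category": "above_observed", "list": "blacklist"},
]

path = os.path.join(tempfile.mkdtemp(), "attack_model.json")
export_model(features, path)
restored = load_model(path)
print(restored == features)  # True
```

A plain-text format keeps the model easy to transmit to other identification modules; a binary or signed format would be an equally valid assumption.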

In this way, the present invention can quickly train, from input sample traffic, an attack identification data model dedicated to the current network environment, making it applicable to different types of network environments or automatic control systems.

Next, the identification module 31 can execute steps S104-S108 to detect network traffic attacks in the identification mode.

Step S104: The identification module 31 switches to the identification mode in response to a user operation or automatic control, in preparation for attack detection.

Step S105: The identification module 31 loads the attack identification data model output in step S103.

Step S106: The identification module 31 starts receiving multiple unfamiliar traffic flows, which may be sent from the control device 20 to the controlled device 21 and/or from the controlled device 21 to the control device 20.

Step S107: The identification module 31 classifies each unfamiliar traffic flow based on the identification features of the attack identification data model and the values of that flow, to identify the traffic category to which it belongs.

Furthermore, because each traffic category has been assigned in advance to either the whitelist or the blacklist, the identification module 31 can determine from the traffic category of each unfamiliar flow whether it is whitelist traffic (i.e., normal behavior) or blacklist traffic (i.e., suspicious or attack behavior).

In one embodiment, each unfamiliar traffic flow, like each sample flow, may include multiple fields. In step S107, the identification module 31 compares the identification features of the model one by one with the values of the fields of each unfamiliar flow; when the field values match an identification feature, the traffic category associated with that feature is taken as the category of the unfamiliar flow. This is how unfamiliar traffic is classified.
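A minimal sketch of this comparison, assuming each identification feature is a set of per-field value ranges (in the spirit of the multi-field case of FIG. 12) and that features are checked in a fixed order with the first match winning. The field names, ranges, category names, and `classify` are hypothetical.

```python
from typing import Optional

# Hypothetical multi-field identification features, checked in order.
# Each entry: ({field: (low, high)}, traffic category); bounds are inclusive.
FEATURES = [
    ({"packet_length": (64, 256), "packets_per_second": (0, 50)}, "routine_polling"),
    ({"packets_per_second": (51, 10**6)}, "flooding"),
    ({"packet_length": (257, 1518)}, "oversized_command"),
]


def classify(flow: dict) -> Optional[str]:
    """Return the category of the first feature whose every field condition holds."""
    for conditions, category in FEATURES:
        if all(field in flow and low <= flow[field] <= high
               for field, (low, high) in conditions.items()):
            return category
    return None  # no feature matched


print(classify({"packet_length": 128, "packets_per_second": 10}))    # routine_polling
print(classify({"packet_length": 128, "packets_per_second": 5000}))  # flooding
```

Because the first match wins, the order of the features matters when their ranges overlap; a real model would need a defined precedence.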

步驟S108：辨識模組31判斷是否結束流量辨識。具體而言，辨識模組31是於預設的結束條件滿足時，自動結束流量辨識，即結束攻擊偵測。 Step S108: The identification module 31 determines whether to end traffic identification. Specifically, the identification module 31 automatically ends traffic identification, and thus attack detection, when a preset end condition is met.

於一實施例中，前述結束條件可為用戶手動關閉流量辨識功能、持續未收到任何陌生流量達預設結束時間、或受控制將處理資源釋放給其他程式或應用使用等等，不加以限定。 In one embodiment, the end condition may be, without limitation, that the user manually disables the traffic identification function, that no unfamiliar traffic has been received for a preset end time, or that the module is instructed to release its processing resources to other programs or applications.

若辨識模組31判斷結束條件滿足，則終止流量辨識。否則，辨識模組31持續執行步驟S106-S107以持續進行流量辨識。 If the identification module 31 determines that the end condition is met, traffic identification is terminated. Otherwise, the identification module 31 keeps performing steps S106-S107 to continue traffic identification.
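The S106-S108 loop, together with two of the end conditions listed above, can be sketched as follows. This is a minimal sketch assuming the unfamiliar traffic is replayed as a list of `(timestamp, traffic)` pairs; the function and parameter names are hypothetical. In a replay the idle timeout is only noticed when the next arrival shows a gap longer than the end time; a live implementation would instead use a receive timeout on its socket or queue.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_identification(arrivals: Iterable[Tuple[float, dict]],
                       classify: Callable[[dict], str],
                       idle_timeout_s: float,
                       stop_requested: Callable[[], bool] = lambda: False) -> List[str]:
    """Replay timestamped unfamiliar traffic through S106-S108 until an end condition holds."""
    verdicts: List[str] = []
    last_arrival: Optional[float] = None
    for ts, traffic in arrivals:            # S106: receive unfamiliar traffic
        if stop_requested():                # end condition: user disabled identification
            break
        if last_arrival is not None and ts - last_arrival > idle_timeout_s:
            break                           # end condition: idle longer than the end time
        verdicts.append(classify(traffic))  # S107: classify with the loaded model
        last_arrival = ts
    return verdicts
```

The `classify` callable stands in for the model lookup of step S107 and is deliberately left abstract.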

本發明基於相同數量的樣本流量可辨識更多種的流量類別，並可準確判斷未定義的陌生流量屬於白名單或黑名單。 With the same amount of sample traffic, the invention can identify more traffic categories and can accurately determine whether undefined unfamiliar traffic belongs to the whitelist or the blacklist.

於一實施例中，前述樣本流量可為離線流量或即時流量。 In one embodiment, the aforementioned sample traffic may be offline traffic or real-time traffic.

以樣本流量為離線流量為例，辨識模組31於步驟S101中是於離線狀態(如中斷與控制設備20及受控設備21之間的連接，或是中斷網路連接)取得流量(如自其他電腦裝置接收流量或自儲存裝置401讀取流量)，並作為樣本流量。並且，辨識模組31於步驟S106中是於上線狀態(如連接控制設備20及受控設備21，或是恢復網路連接)取得流量，並作為陌生流量。 Taking offline sample traffic as an example: in step S101, the identification module 31 acquires traffic while offline (for example, with its connections to the control device 20 and the controlled device 21, or its network connection, interrupted), such as by receiving traffic from other computer devices or reading traffic from the storage device 401, and uses it as sample traffic. In step S106, the identification module 31 acquires traffic while online (for example, after connecting to the control device 20 and the controlled device 21, or after restoring the network connection) and treats it as unfamiliar traffic.

以樣本流量為即時流量為例，辨識模組31是自控制設備20及受控設備21持續接收多個流量，並於步驟S101中是將連續的多個流量的第一部分(如前三分鐘所收到的流量，或同一檔案/指令的前半部)作為樣本流量，於步驟S102、S103中即時產生並輸出攻擊辨識資料模型，並於步驟S104-S107中即時使用攻擊辨識資料模型來將連續的多個流量的第二部分(如第三分鐘以後的流量，或同一檔案/指令的後半部)作為陌生流量來進行分類以判斷連續的多個流量的第二部分的各流量是屬於白名單或黑名單。藉此，由於連續的多個流量之間通常具有較高關聯性或相近格式，本發明經由即時使用同一組流量的一部分來辨識另一部分，不僅可節省離線訓練的時間與樣本流量，還可具有較高辨識正確性。 Taking real-time sample traffic as an example, the identification module 31 continuously receives traffic from the control device 20 and the controlled device 21. In step S101, it uses a first part of this continuous traffic (such as the traffic received in the first three minutes, or the first half of a file or command) as sample traffic; in steps S102 and S103, it generates and outputs the attack identification data model in real time; and in steps S104-S107, it immediately uses that model to classify a second part of the continuous traffic (such as the traffic after the third minute, or the second half of the same file or command) as unfamiliar traffic, determining whether each traffic in the second part belongs to the whitelist or the blacklist. Because consecutive traffic usually shows high correlation or similar formats, using one part of the same traffic to identify the other part in real time not only saves offline training time and sample traffic but can also yield higher identification accuracy.
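The real-time variant hinges on splitting one continuous stream into a training head and a classification tail. A minimal sketch, assuming the traffic arrives as time-ordered `(timestamp_in_seconds, traffic)` pairs and using the three-minute window from the example above; the helper name `split_stream` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_stream(records: Sequence[Tuple[float, T]],
                 training_window_s: float = 180.0) -> Tuple[List[T], List[T]]:
    """Split time-ordered (timestamp, traffic) records into a training head and a test tail."""
    if not records:
        return [], []
    start = records[0][0]
    head = [r for ts, r in records if ts - start < training_window_s]   # sample traffic (S101)
    tail = [r for ts, r in records if ts - start >= training_window_s]  # unfamiliar traffic (S106)
    return head, tail


head, tail = split_stream([(0.0, "a"), (60.0, "b"), (179.9, "c"), (180.0, "d"), (300.0, "e")])
print(head, tail)  # ['a', 'b', 'c'] ['d', 'e']
```

Splitting by "first half of the same file or command" would use byte offsets instead of timestamps; the structure of the split is the same.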

續請一併參閱圖5及圖6，圖6為本發明第二實施例的攻擊辨識資料模型的生成與應用方法的部分流程圖。相較於圖5所示的實施例，本實施例進一步提供一種樣本值擴增功能，可於執行訓練前增加樣本值的數量，藉以提升攻擊辨識資料模型的準確性。 Please refer to FIG. 5 and FIG. 6 together. FIG. 6 is a partial flowchart of the method for generating and applying an attack identification data model according to the second embodiment of the present invention. Compared with the embodiment shown in FIG. 5, this embodiment further provides a sample value augmentation function, which increases the number of sample values before training in order to improve the accuracy of the attack identification data model.

具體而言，於本實施例中，步驟S101包括步驟S20-S21及/或步驟S22。更進一步地，於步驟S101中經由統計獲得的多個樣本值可能僅包括白名單樣本值(即樣本流量皆為白名單流量)或同時包括白名單樣本值與黑名單樣本值(即樣本流量包括白名單流量與黑名單流量)。 Specifically, in this embodiment, step S101 includes steps S20-S21 and/or step S22. Furthermore, the sample values obtained through statistics in step S101 may include only whitelist sample values (that is, all sample traffic is whitelist traffic), or both whitelist and blacklist sample values (that is, the sample traffic includes both whitelist traffic and blacklist traffic).

前述第一種情況中，由於缺乏黑名單樣本值，所訓練出來的攻擊辨識資料模型對於黑名單的辨識能力較差；前述第二種情況中，由於白名單樣本值與黑名單樣本值的數量未必為相等，所訓練出來的攻擊辨識資料模型對於白名單或黑名單其中之一的辨識能力可能較差。 In the first case, the lack of blacklist sample values makes the trained attack identification data model poor at recognizing blacklist traffic. In the second case, because the numbers of whitelist and blacklist sample values are not necessarily equal, the trained model may be poor at recognizing one of the two lists.

對此，本發明提出一種樣本值擴增功能，可經由下述步驟S20-S21解決缺乏黑名單樣本值的問題。 To address this, the present invention provides a sample value augmentation function that solves the lack of blacklist sample values through the following steps S20-S21.

步驟S20：辨識模組31對白名單的多個樣本流量的多個數值進行統計以獲得多個白名單樣本值。 Step S20: The identification module 31 computes statistics over the values of the whitelist sample traffic to obtain multiple whitelist sample values.

步驟S21：辨識模組31對所獲得的多個白名單樣本值執行反向分析處理以獲得對應的多個黑名單樣本值。 Step S21: The identification module 31 performs a reverse analysis process on the obtained multiple whitelist sample values to obtain the corresponding multiple blacklist sample values.

於一實施例中，前述反向分析處理是依照當前使用的網路協定(如Modbus等工業控制協定)的傳輸限制、慣用數值(如最大長度、常見長度、常用功能碼、已定義功能碼等等)及/或白名單樣本值未涵蓋的數值範圍，來產生黑名單樣本值。 In one embodiment, the reverse analysis generates blacklist sample values based on the transmission limits and customary values (such as the maximum length, common lengths, commonly used function codes, and defined function codes) of the network protocol currently in use (such as Modbus or another industrial control protocol), and/or on the value ranges not covered by the whitelist sample values.

於一實施例中，為了平衡訓練用的白名單樣本與黑名單樣本的數量，前述反向分析處理後，複製原始樣本值使得黑名單樣本與白名單樣本數量一致。 In one embodiment, in order to balance the number of whitelist samples and blacklist samples for training, after the aforementioned reverse analysis process, the original sample values are copied to make the numbers of blacklist samples and whitelist samples consistent.

於一實施例中，前述反向分析處理可將白名單樣本值中的最大值增加一定數量作為黑名單樣本值，或將最小值減少一定數量作為黑名單樣本值。 In one embodiment, the reverse analysis may increase the maximum whitelist sample value by a certain amount, or decrease the minimum whitelist sample value by a certain amount, to obtain a blacklist sample value.
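The reverse analysis and the balancing step above can be sketched for a single integer field. This is a minimal sketch under stated assumptions: `domain` is a hypothetical stand-in for the value range a protocol permits for the field, `delta` is the hypothetical "certain amount", and balancing is read as cyclically duplicating the generated blacklist values until both classes are equally large (the text only says the original sample values are copied). The function names are illustrative.

```python
from itertools import cycle, islice
from typing import Iterable, List


def reverse_analysis(whitelist_values: List[int], domain: Iterable[int],
                     delta: int = 1) -> List[int]:
    """Derive blacklist sample values for one field from its whitelist sample values."""
    seen = set(whitelist_values)
    blacklist = {max(whitelist_values) + delta,   # just above the observed maximum
                 min(whitelist_values) - delta}   # just below the observed minimum
    blacklist |= {v for v in domain if v not in seen}  # values the whitelist never covers
    return sorted(blacklist)


def balance(whitelist: List[int], blacklist: List[int]) -> List[int]:
    """Duplicate blacklist values cyclically until both classes have the same size."""
    if not blacklist or len(blacklist) >= len(whitelist):
        return list(blacklist)
    return list(islice(cycle(blacklist), len(whitelist)))


print(reverse_analysis([3, 3, 4], range(1, 8)))  # [1, 2, 5, 6, 7]
print(balance([3, 3, 4, 3, 3, 4], [9, 10]))      # [9, 10, 9, 10, 9, 10]
```

Note that the derivation only ever runs from whitelist to blacklist, never the reverse, which matches the restriction the description places on reverse analysis.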

並且，當多個流量包括黑名單的樣本流量時，本發明可經由下述步驟S22來取得對應的黑名單樣本值。 Moreover, when the acquired traffic includes blacklist sample traffic, the corresponding blacklist sample values can be obtained through the following step S22.

步驟S22：辨識模組31對黑名單的多個樣本流量的多個數值進行統計以獲得多個黑名單樣本值。 Step S22: The identification module 31 computes statistics over the values of the blacklist sample traffic to obtain multiple blacklist sample values.

值得一提的是，於本發明中，步驟S22的執行主要是用來增加黑名單樣本值的數量，以進一步提升攻擊辨識資料模型對於黑名單的辨識正確性，並非本發明之必要步驟。 It is worth mentioning that, in the present invention, the execution of step S22 is mainly used to increase the number of blacklist sample values to further improve the accuracy of the blacklist identification of the attack identification data model, which is not a necessary step of the present invention.

於一實施例中，即便有黑名單的樣本流量，亦可不執行步驟S22，而僅由步驟S20-S21所獲得的白名單樣本值與其反向分析後的黑名單樣本值來訓練的攻擊辨識資料模型。並且，前述攻擊辨識資料模型具有同樣具有分辨白名單以外的不尋常流量的能力。 In one embodiment, even when blacklist sample traffic is available, step S22 may be skipped and the attack identification data model trained only on the whitelist sample values obtained in steps S20-S21 together with the blacklist sample values derived from them by reverse analysis. Such a model still has the ability to distinguish unusual traffic outside the whitelist.

並且，當所取得的樣本流量缺乏黑名單流量時，是僅執行步驟S20-S21以產生黑名單流量；當所取得的樣本流量僅包括黑名單流量時，則可僅執行步驟S22以獲取對應的黑名單樣本值。 Moreover, when the acquired sample traffic lacks blacklist traffic, only steps S20-S21 are performed to generate blacklist sample values; when the acquired sample traffic includes only blacklist traffic, only step S22 may be performed to obtain the corresponding blacklist sample values.

藉此，本發明可增加樣本值的數量，而可提升攻擊辨識資料模型的分類精確度。 In this way, the present invention can increase the number of sample values, and can improve the classification accuracy of the attack identification data model.

值得一提的是，由於實務上不可能獲得所有的黑名單樣本值，即不符合黑名單樣本值的數值可能是白名單樣本值，也可能是黑名單樣本值。若對不完全的黑名單樣本執行反向分析，可能獲得錯誤的白名單樣本值，而造成攻擊辨識資料模型將未知的攻擊流量誤判為正常流量，而造成攻擊偵測失準。 It is worth noting that, in practice, it is impossible to obtain every possible blacklist sample value, so a value that matches no known blacklist sample value may be either a whitelist value or an unseen blacklist value. Performing reverse analysis on an incomplete set of blacklist samples could therefore yield wrong whitelist sample values, causing the attack identification data model to misjudge unknown attack traffic as normal traffic and degrading attack detection accuracy.

對此，本發明不會對黑名單樣本值進行反向分析來獲得可能錯誤的白名單樣本值，以避免上述攻擊偵測失準的情況。 For this reason, the present invention does not perform reverse analysis on blacklist sample values to derive possibly wrong whitelist sample values, thereby avoiding the loss of detection accuracy described above.

續請一併參閱圖10，為本發明一實施態樣的攻擊辨識資料模型的生成示意圖，用以簡單說明本發明如何建構攻擊辨識資料模型54。 Please also refer to FIG. 10, which is a schematic diagram of generating an attack identification data model of an implementation aspect of the present invention, which is used to briefly explain how the present invention constructs an attack identification data model 54.

如圖10所示，於要進行訓練時，用戶可將多個白名單樣本值500與黑名單樣本值501輸入至分類學習演算法51。 As shown in FIG. 10, when training is to be performed, the user can input multiple whitelist sample values 500 and blacklist sample values 501 into the classification learning algorithm 51.

接著，本發明經由執行分類學習演算法51可以產生白名單52的多個辨識特徵70-71與黑名單53的多個辨識特徵72-73。並且，前述多個辨識特徵70-71是分別與白名單52的多個流量類別60-61相關聯，並用來辨識陌生流量是否屬於對應的流量類別60-61；前述多個辨識特徵72-73是分別與黑名單53的多個流量類別62-63相關聯，並用來辨識陌生流量是否屬於對應的流量類別62-63。 Next, the present invention can generate multiple identification features 70-71 of the white list 52 and multiple identification features 72-73 of the black list 53 by executing the classification learning algorithm 51. In addition, the aforementioned multiple identification features 70-71 are respectively associated with multiple traffic categories 60-61 of the white list 52, and are used to identify whether the unfamiliar traffic belongs to the corresponding traffic category 60-61; the aforementioned multiple identification features 72-73 They are respectively associated with multiple traffic categories 62-63 of the blacklist 53, and used to identify whether unfamiliar traffic belongs to the corresponding traffic categories 62-63.

值得一提的是，前述各流量類別60-61、62-63可以理解為是對網路行為進行分類，即本發明是將不同的網路行為(如具有不同的欄位值的流量)分類至不同的流量類別，藉以判斷此網路行為屬於白名單(善意行為或正常行為)或黑名單(可疑行為或攻擊行為)。 It is worth mentioning that the traffic categories 60-61 and 62-63 can be understood as a classification of network behaviors: the present invention classifies different network behaviors (such as traffic with different field values) into different traffic categories in order to judge whether a given behavior belongs to the whitelist (benign or normal behavior) or the blacklist (suspicious or attacking behavior).

最後，本發明將多個辨識特徵70-71、72-73與上述關聯封裝為攻擊辨識資料模型54。 Finally, the present invention encapsulates a plurality of identification features 70-71, 72-73 and the above-mentioned association into an attack identification data model 54.
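Packaging the features together with their category and list associations amounts to serializing a small data structure. The sketch below assumes, purely for illustration, a JSON layout and the hypothetical names `Feature`, `package_model`, and `load_model`; the description does not specify the storage format of the attack identification data model 54.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Feature:
    conditions: List[list]  # hypothetical: each entry is [field index, threshold, holds]
    category: int           # traffic category (e.g. a leaf node number)
    list_name: str          # list the category belongs to: "whitelist" or "blacklist"


def package_model(features: List[Feature]) -> str:
    """Serialize features together with their category/list associations."""
    return json.dumps({"features": [asdict(f) for f in features]})


def load_model(text: str) -> List[Feature]:
    """Rebuild the feature list from its serialized form (e.g. at step S105)."""
    return [Feature(**d) for d in json.loads(text)["features"]]
```

Keeping the association explicit in every entry means the identification side never needs the training data to decide whitelist or blacklist.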

續請一併參閱圖5及圖7，圖7為本發明第三實施例的分類學習演算法的流程圖。除了使用現有的演算法作為本發明之分類學習演算法之外，於本實施例中，本發明進一步提出一種新穎且進步的分類學習演算法。前述分類學習演算法是基於決策樹演算法來建構決策樹(即樹狀分類結構)，決策樹的多個葉節點(即後述的符合預設純度的子群所對應的節點)即分別對應前述多個流量類別，而決策樹的多個分支的多個分類條件即構成前述的多個辨識特徵。 Please refer to FIG. 5 and FIG. 7 together. FIG. 7 is a flowchart of the classification learning algorithm according to the third embodiment of the present invention. Besides using an existing algorithm as the classification learning algorithm, this embodiment proposes a novel and inventive classification learning algorithm. It builds a decision tree (a tree-shaped classification structure) with a decision tree algorithm: the leaf nodes of the tree (the nodes corresponding to subgroups that meet the preset purity, described below) correspond to the traffic categories, and the classification conditions along the branches of the tree constitute the identification features.

具體而言，本實施例的分類學習演算法(即圖5的步驟S102所示「執行分類學習演算法」)包括以下步驟。 Specifically, the classification learning algorithm of this embodiment (ie, "execute the classification learning algorithm" shown in step S102 in FIG. 5) includes the following steps.

步驟S30：辨識模組31執行決策樹演算法來決定分類條件。前述分類條件是將多個樣本流量劃分為多個子群(各子群分別包括部分的樣本流量)。 Step S30: The identification module 31 executes the decision tree algorithm to determine a classification condition, which divides the sample traffic into multiple subgroups (each subgroup containing part of the sample traffic).

於一實施例中，前述分類條件是樣本流量的多個欄位的其中之一的數值或數值範圍，且是基於此欄位的白名單樣本值(即產生白名單的分類條件)或黑名單樣本值(即產生黑名單的分類條件)所加以決定。 In one embodiment, the classification condition is a value or value range of one of the fields of the sample traffic, and is determined from the whitelist sample values of that field (yielding a whitelist classification condition) or from its blacklist sample values (yielding a blacklist classification condition).

步驟S31：計算各子群的純度，即此分類的可信度指標(即評估依據各子群所對應的分類條件進行分類，則分類結果的可信度如何)。 Step S31: Calculate the purity of each subgroup, which serves as a credibility index for the classification (that is, it evaluates how credible the result is when classifying by the condition corresponding to each subgroup).

並且，基於所對應的分類條件(如為白名單的分類條件或黑名單的分類條件)，各子群會被分別對應至白名單的流量類別或黑名單的流量類別。 Moreover, based on the corresponding classification conditions (for example, whitelist classification conditions or blacklist classification conditions), each subgroup is respectively corresponding to the traffic category of the whitelist or the traffic category of the blacklist.

於現有技術中已有許多方式可計算純度，如計算資訊增益(Information gain)，計算熵(Entropy)或計算吉尼係數(Gini index)，於此不再贅述。 The prior art offers many ways to measure purity, such as information gain, entropy, or the Gini index, which are not described further here.

值得一提的是，雖於本實施例中，是以計算子群純度進行說明，但本發明所屬技術領域中具有通常知識者應理解，本發明的「計算子群純度」實際上應包括計算純度及計算不純度(因為不純度僅是純度的反向指標，其計算仍與純度的計算有關)。 It is worth mentioning that although in this embodiment, the calculation of subgroup purity is described, those skilled in the art to which the present invention belongs should understand that the "calculation of subgroup purity" in the present invention should actually include calculation Purity and calculated impurity (because impurity is only a reverse indicator of purity, its calculation is still related to the calculation of purity).
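For concreteness, the standard Gini impurity and entropy measures, and the information gain built on them, can be written in a few lines. This is a generic sketch of those textbook formulas (the function names are illustrative); the final loop checks them against the `value[a, b]` counts of FIG. 12, where the stated gini values are 0.392, 0.169 and 0.006.

```python
import math
from typing import Callable, Sequence


def gini(counts: Sequence[int]) -> float:
    """Gini impurity 1 - sum(p_k^2); 0.0 means the subgroup is pure."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def entropy(counts: Sequence[int]) -> float:
    """Shannon entropy in bits; 0.0 means the subgroup is pure."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0) if n else 0.0


def information_gain(parent: Sequence[int], children: Sequence[Sequence[int]],
                     impurity: Callable[[Sequence[int]], float] = entropy) -> float:
    """Impurity of the parent minus the size-weighted impurity of its subgroups."""
    n = sum(parent)
    return impurity(parent) - sum(sum(ch) / n * impurity(ch) for ch in children)


# value[a, b] counts (whitelist, blacklist) of three inner nodes in FIG. 12:
for node, counts in [(91, (1000, 365)), (93, (1000, 103)), (95, (1000, 3))]:
    print(node, round(gini(counts), 3))  # 91 0.392 / 93 0.169 / 95 0.006
```

All three measures reach their extreme at a pure subgroup, which is why the description treats purity and impurity as interchangeable views of the same criterion.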

步驟S32：辨識模組31取得預設純度，並判斷是否任一子群的純度不符預設純度，如判斷子群的純度是否高於預設純度，或低於預設不純度。 Step S32: The identification module 31 obtains the preset purity and determines whether the purity of any subgroup fails to meet it, for example by checking whether each subgroup's purity is higher than the preset purity, or whether its impurity is lower than a preset impurity.

若辨識模組31判斷所有子群的純度都符合預設純度，則完成分類，即完成決策樹的建構。 If the identification module 31 determines that the purity of all subgroups meets the preset purity, the classification is completed, that is, the construction of the decision tree is completed.

若辨識模組31判斷任一子群的純度不符預設純度，則執行步驟S33：對純度不符預設純度的子群再次執行前述決策樹演算法來決定另一分類條件。前述另一分類條件是將純度不符預設純度的子群再劃分為多個子群。 If the identification module 31 determines that the purity of any subgroup does not meet the preset purity, step S33 is executed: the aforementioned decision tree algorithm is executed again on the subgroup whose purity does not meet the preset purity to determine another classification condition. The aforementioned another classification condition is to subdivide the subgroups whose purity does not meet the preset purity into multiple subgroups.

接著，辨識模組31再次執行步驟S32，以判斷新劃分的多個子群是否符合預設純度，以此類推，直到所有子群的純度皆符合預設純度。 Then, the identification module 31 executes step S32 again to determine whether the newly divided multiple subgroups meet the preset purity, and so on, until the purity of all the subgroups meet the preset purity.

接著，辨識模組31(於圖5的步驟S102中)進一步將決策樹的各葉節點(即純度符合預設純度的各子群)所對應的所有分類條件設定為所對應的流量類別的辨識特徵。 Then, the identification module 31 (in step S102 of FIG. 5) further sets all the classification conditions corresponding to each leaf node of the decision tree (that is, each subgroup whose purity meets the preset purity) as the identification of the corresponding traffic category feature.
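Steps S30-S33 and the leaf-to-feature step can be sketched as a CART-style recursion. This is a minimal sketch under stated assumptions: binary splits at midpoints between adjacent observed values, Gini impurity as in the FIG. 11 and FIG. 12 examples, labels `0` for whitelist and `1` for blacklist, and the hypothetical names `best_split` and `grow`. Each returned leaf carries the conditions along its path, which play the role of that traffic category's identification feature. When two splits score equally, the first one found is kept, so the split order can differ from the figures.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]        # (field values, 0 = whitelist / 1 = blacklist)
Condition = Tuple[int, float, bool]         # (field index, threshold, "X[i] <= thr" holds)
Leaf = Tuple[List[Condition], int, Tuple[int, int]]  # (path conditions, label, value[a, b])


def gini(labels: List[int]) -> float:
    p = sum(labels) / len(labels) if labels else 0.0
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(samples: List[Sample], fields: Sequence[int]) -> Optional[Tuple[int, float]]:
    """S30: pick the (field, threshold) whose subgroups have the lowest weighted Gini."""
    best: Optional[Tuple[float, int, float]] = None
    for f in fields:
        values = sorted({x[f] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2                      # CART-style midpoint threshold
            left = [y for x, y in samples if x[f] <= thr]
            right = [y for x, y in samples if x[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return None if best is None else (best[1], best[2])


def grow(samples: List[Sample], fields: Sequence[int], max_impurity: float = 0.0,
         path: Tuple[Condition, ...] = ()) -> List[Leaf]:
    """S31-S33: split recursively until every subgroup meets the preset purity."""
    labels = [y for _, y in samples]
    split = best_split(samples, fields) if gini(labels) > max_impurity else None
    if split is None:                                # pure enough (or cannot be split): leaf
        label = int(2 * sum(labels) >= len(labels))  # majority label; ties go to blacklist
        return [(list(path), label, (labels.count(0), labels.count(1)))]
    f, thr = split
    left = [s for s in samples if s[0][f] <= thr]
    right = [s for s in samples if s[0][f] > thr]
    return (grow(left, fields, max_impurity, path + ((f, thr, True),)) +
            grow(right, fields, max_impurity, path + ((f, thr, False),)))


# Tiny single-field data set in the spirit of FIG. 11: only 3 and 4 are whitelisted.
data = [([3], 0), ([4], 0), ([1], 1), ([2], 1), ([5], 1), ([6], 1)]
for leaf in grow(data, fields=[0]):
    print(leaf)
```

Setting `max_impurity` above 0.0 corresponds to a looser preset purity and stops splitting earlier, at the cost of leaves that mix both lists.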

藉此，本發明可有效且準確地對樣本值與樣本值以外的數值進行分類，並產生攻擊辨識資料模型的多個辨識特徵。 In this way, the present invention can effectively and accurately classify both the sample values and values outside them, and generate the identification features of the attack identification data model.

請參閱圖11及圖12，圖11為本發明一實施態樣的基於單欄位的決策樹演算法的執行示意圖，圖12為本發明一實施態樣的基於多欄位的決策樹演算法的執行示意圖。圖11及圖12用以示例性說明前述決策樹演算法。 Please refer to FIG. 11 and FIG. 12. FIG. 11 is a schematic diagram of the execution of a decision tree algorithm based on a single field in an embodiment of the present invention, and FIG. 12 is a decision tree algorithm based on multiple fields in an embodiment of the present invention Schematic diagram of the implementation. Figures 11 and 12 are used to exemplarily illustrate the aforementioned decision tree algorithm.

於圖11及圖12的例子中，決策樹演算法採用是分類與迴歸樹演算法(Classification And Regression Tree Algorithm)，而純度是吉尼係數(Gini Index)。並且，X[n]表示流量的欄位[n]的值；gini為不純度，當其值為0.0(預設純度)時表示所有樣本值都可以被正確分類；value[a,b]表示a+b個樣本值中，有a個(流量的欄位的)白名單樣本值，b個(流量的欄位的)黑名單樣本值。白名單樣本值及/或與黑名單樣本值可自樣本流量獲得，或經由前述反向分析處理獲得。 In the examples of FIG. 11 and FIG. 12, the decision tree algorithm is the Classification And Regression Tree (CART) algorithm, and purity is measured with the Gini index. X[n] denotes the value of field n of the traffic; gini is the impurity, and a value of 0.0 (the preset purity) means that all sample values in the node are classified correctly; value[a,b] means that, of the a+b sample values, a are whitelist sample values and b are blacklist sample values of the traffic fields. The whitelist and/or blacklist sample values can be obtained from the sample traffic or through the reverse analysis described above.

如圖11所示，本例子是輸入1256個樣本值(包括1000個白名單樣本值與256個黑名單樣本值)。首先於節點80(根節點)以「欄位X[2]<=4.5」的分類條件(1)進行分類，可以獲得兩個子群(即分類條件(1)符合時的節點81與分類條件(1)不符時的節點82)。 As shown in FIG. 11, this example takes 1256 sample values as input (1000 whitelist sample values and 256 blacklist sample values). First, node 80 (the root node) is split with classification condition (1), "field X[2]<=4.5", yielding two subgroups: node 81, where condition (1) is met, and node 82, where it is not.

節點82的子群共包括253個樣本值，且都為黑名單樣本值，故節點82的gini為0，此子群已正確分類(即節點82為葉節點)。 The subgroup of node 82 includes a total of 253 sample values, and all of them are blacklist sample values, so the gini of node 82 is 0, and this subgroup has been correctly classified (that is, node 82 is a leaf node).

節點81的子群共包括1003個樣本值(1000個白名單樣本值，3個黑名單樣本值)，節點81的gini為0.006，即此子群尚未正確分類。 The subgroup of node 81 includes a total of 1003 sample values (1000 whitelist sample values, 3 blacklist sample values), and the gini of node 81 is 0.006, that is, this subgroup has not been correctly classified.

對此，決策樹演算法會以「欄位X[2]<=2.5」分類條件(2)對節點81進行分類，可以獲得兩個子群(即分類條件(2)符合時的節點83與分類條件(2)不符時的節點84)。 In this regard, the decision tree algorithm will classify the node 81 according to the classification condition (2) of "Field X[2]<=2.5", and obtain two subgroups (that is, the node 83 and the node 83 when the classification condition (2) is met. The node 84 when the classification condition (2) is not met).

節點83的子群共包括3個樣本值，且都為黑名單樣本值，故節點83的gini為0，此子群已正確分類(即節點83為葉節點)。 The subgroup of node 83 includes a total of 3 sample values, all of which are blacklist sample values, so the gini of node 83 is 0, and this subgroup has been correctly classified (that is, node 83 is a leaf node).

節點84的子群共包括1000個樣本值，且都為白名單樣本值，故節點84的gini為0，此子群已正確分類(即節點84為葉節點)。 The subgroup of node 84 includes a total of 1000 sample values, and all of them are whitelist sample values, so the gini of node 84 is 0, and this subgroup has been correctly classified (that is, node 84 is a leaf node).

由於所有子群的純度都符合預設純度，故分類完成。於本次分類中，共有3個流量類別，即節點82-84。並且，屬於黑名單的節點82所對應的辨識特徵為：分類條件(1)不符；屬於黑名單的節點83所對應的辨識特徵為：分類條件(1)符合且分類條件(2)符合；屬於白名單的節點84所對應的辨識特徵為：分類條件(1)符合且分類條件(2)不符。 Since the purity of all subgroups meets the preset purity, the classification is completed. In this classification, there are 3 traffic categories, namely nodes 82-84. In addition, the identification feature corresponding to the node 82 belonging to the blacklist is: the classification condition (1) does not match; the identification feature corresponding to the node 83 belonging to the blacklist is: the classification condition (1) is met and the classification condition (2) is met; The identification feature corresponding to the node 84 of the whitelist is: the classification condition (1) is met and the classification condition (2) is not met.
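The three identification features of FIG. 11 can be read off as nested threshold tests. The sketch below simply transcribes the tree described above into a function; the name `classify_fig11` is hypothetical, and the input is assumed to be a sequence of field values indexed like X[n].

```python
from typing import Sequence, Tuple


def classify_fig11(x: Sequence[float]) -> Tuple[int, str]:
    """Return (leaf node, list) for the FIG. 11 tree, which tests only field X[2]."""
    if x[2] <= 4.5:                  # classification condition (1) met
        if x[2] <= 2.5:              # condition (2) met -> node 83
            return 83, "blacklist"
        return 84, "whitelist"       # condition (2) not met -> node 84
    return 82, "blacklist"           # condition (1) not met -> node 82


print(classify_fig11([0, 0, 3]))  # (84, 'whitelist')
```

Read together, the features whitelist exactly the interval 2.5 < X[2] <= 4.5; if X[2] takes only integer values, that means the values 3 and 4.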

藉此，本發明可規劃多個流量類別，並計算所有流量類別的辨識特徵。 In this way, the present invention can plan multiple traffic categories and calculate the identification characteristics of all traffic categories.

值得一提的是，雖於圖11的例子中，僅針對單一欄位的樣本值進行分類，但不以此限定。 It is worth mentioning that although the example of FIG. 11 classifies the sample values of only a single field, the invention is not limited to this.

用戶可以依需求選擇多個欄位來執行前述決策演算法以提升後續分類的準確度。藉以解決因樣本流量過少而無法準確進行分類的問題。 The user can select multiple fields as needed when executing the decision tree algorithm, improving the accuracy of subsequent classification and thereby addressing inaccurate classification caused by too little sample traffic.

舉例來說，圖12的例子是輸入2280個樣本值(包括1000個白名單樣本值與1280個黑名單樣本值)。首先於節點90(根節點)以「欄位X[2]<=4.5」的分類條件(1)進行分類，可以獲得兩個子群(即分類條件(1)符合時的節點91與分類條件(1)不符時的節點92)。 For example, the example of FIG. 12 takes 2280 sample values as input (1000 whitelist sample values and 1280 blacklist sample values). First, node 90 (the root node) is split with classification condition (1), "field X[2]<=4.5", yielding two subgroups: node 91, where condition (1) is met, and node 92, where it is not.

節點92的子群共包括915個樣本值(皆為黑名單樣本值)，故gini為0。 The subgroup of node 92 includes a total of 915 sample values (all of which are blacklist sample values), so gini is 0.

節點91的子群共包括1365個樣本值(1000個白名單樣本值，365個黑名單樣本值)，節點91的gini為0.392(未正確分類)。對此，決策樹演算法會以「欄位X[3]<=2.5」分類條件(2)對節點91進行分類(其欄位與臨界值的選擇可透過機器學習方式計算獲得)，以獲得兩個子群(即分類條件(2)符合時的節點93與分類條件(2)不符時的節點94)。 The subgroup of node 91 contains 1365 sample values (1000 whitelist and 365 blacklist sample values), and the gini of node 91 is 0.392 (not yet correctly classified). The decision tree algorithm therefore splits node 91 with classification condition (2), "field X[3]<=2.5" (the field and threshold can be computed by machine learning), yielding two subgroups: node 93, where condition (2) is met, and node 94, where it is not.

節點94的子群共包括262個樣本值(皆為黑名單樣本值)，故gini為0。 The subgroup of node 94 includes a total of 262 sample values (all blacklist sample values), so gini is 0.

節點93的子群共包括1103個樣本值(1000個白名單樣本值，103個黑名單樣本值)，節點93的gini為0.169(未正確分類)。對此，決策樹演算法會以「欄位X[0]<=32774.0」分類條件(3)對節點93進行分類，以獲得兩個子群(即分類條件(3)符合時的節點95與分類條件(3)不符時的節點96)。 The subgroup of node 93 contains 1103 sample values (1000 whitelist and 103 blacklist sample values), and the gini of node 93 is 0.169 (not yet correctly classified). The decision tree algorithm therefore splits node 93 with classification condition (3), "field X[0]<=32774.0", yielding two subgroups: node 95, where condition (3) is met, and node 96, where it is not.

節點96的子群共包括100個樣本值(皆為黑名單樣本值)，故gini為0。 The subgroup of node 96 includes 100 sample values (all blacklist sample values), so gini is 0.

節點95的子群共包括1003個樣本值(1000個白名單樣本值，3個黑名單樣本值)，節點95的gini為0.006(未正確分類)。對此，決策樹演算法會以「欄位X[2]<=2.5」分類條件(4)對節點95進行分類，以獲得兩個子群(即分類條件(4)符合時的節點97與分類條件(4)不符時的節點98)。 The subgroup of node 95 contains 1003 sample values (1000 whitelist and 3 blacklist sample values), and the gini of node 95 is 0.006 (not yet correctly classified). The decision tree algorithm therefore splits node 95 with classification condition (4), "field X[2]<=2.5", yielding two subgroups: node 97, where condition (4) is met, and node 98, where it is not.

節點97的子群共包括3個樣本值(皆為黑名單樣本值)，故gini為0。 The subgroup of node 97 includes 3 sample values (all blacklist sample values), so gini is 0.

節點98的子群共包括1000個樣本值(皆為白名單樣本值)，故gini為0。 The subgroup of node 98 includes a total of 1000 sample values (all are whitelist sample values), so gini is 0.

由於所有子群的純度都符合預設純度，故分類完成。於本次分類中，共有5個流量類別，即節點92、94、96-98。並且，屬於黑名單的節點92所對應的辨識特徵為：分類條件(1)不符；屬於黑名單的節點94所對應的辨識特徵為：分類條件(1)符合且分類條件(2)不符；屬於黑名單的節點96所對應的辨識特徵為：分類條件(1)、(2)符合且分類條件(3)不符；屬於黑名單的節點97所對應的辨識特徵為：分類條件(1)-(4)皆符合；屬於白名單的節點98所對應的辨識特徵為：分類條件(1)-(3)符合且分類條件(4)不符。 Since the purity of all subgroups meets the preset purity, the classification is complete. This classification yields 5 traffic categories, namely nodes 92, 94, and 96-98. The identification features are as follows: for node 92 (blacklist), condition (1) is not met; for node 94 (blacklist), condition (1) is met and condition (2) is not; for node 96 (blacklist), conditions (1) and (2) are met and condition (3) is not; for node 97 (blacklist), conditions (1)-(4) are all met; for node 98 (whitelist), conditions (1)-(3) are met and condition (4) is not.
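The five identification features of FIG. 12 can likewise be transcribed as an early-return chain, one return per leaf. This is a direct transcription of the tree described above; the function name `classify_fig12` is hypothetical, and the input is assumed to hold at least the fields X[0] to X[3].

```python
from typing import Sequence, Tuple


def classify_fig12(x: Sequence[float]) -> Tuple[int, str]:
    """Return (leaf node, list) for the FIG. 12 tree over fields X[0], X[2] and X[3]."""
    if x[2] > 4.5:
        return 92, "blacklist"   # condition (1) not met
    if x[3] > 2.5:
        return 94, "blacklist"   # (1) met, (2) not met
    if x[0] > 32774.0:
        return 96, "blacklist"   # (1), (2) met, (3) not met
    if x[2] <= 2.5:
        return 97, "blacklist"   # (1)-(4) all met
    return 98, "whitelist"       # (1)-(3) met, (4) not met


print(classify_fig12([100, 0, 3, 1]))  # (98, 'whitelist')
```

Only traffic that passes every check on all three fields reaches the single whitelist leaf, which is how combining several fields narrows what counts as normal behavior.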

藉此，本發明可規劃關聯多個欄位的流量類別，而可有效提升分類準確度。 In this way, the present invention can plan the traffic categories associated with multiple fields, and can effectively improve the classification accuracy.

續請一併參閱圖3及圖8，圖8為本發明第四實施例的攻擊辨識資料模型的生成與應用方法的流程圖。於圖8的實施例中，是將所產生的攻擊辨識資料模型用於入侵偵測系統(Intrusion Detection System,IDS)，即僅辨識陌生流量屬於白名單或黑名單，即便陌生流量屬於黑名單，也不會阻擋陌生流量的傳輸。 Please refer to FIG. 3 and FIG. 8 together. FIG. 8 is a flowchart of the method for generating and applying an attack identification data model according to the fourth embodiment of the present invention. In the embodiment of FIG. 8, the generated attack identification data model is used in an intrusion detection system (IDS): it only identifies whether unfamiliar traffic belongs to the whitelist or the blacklist, and does not block the transmission of unfamiliar traffic even when it belongs to the blacklist.

具體而言，本實施例的攻擊辨識資料模型的生成與應用方法是包括以下辨識步驟。 Specifically, the method for generating and applying the attack identification data model of this embodiment includes the following identification steps.

步驟S400：辨識模組31依據用戶操作或自動控制切換至辨識模式，以準備執行攻擊偵測。 Step S400: The identification module 31 switches to the identification mode, either in response to a user operation or under automatic control, in preparation for attack detection.

步驟S401：辨識模組31載入攻擊辨識資料模型。 Step S401: The identification module 31 loads the attack identification data model.

步驟S402：中繼設備30判斷是否收到任一流量。若中繼設備30未收到任何流量，則再次執行步驟S402以持續偵測。 Step S402: The relay device 30 judges whether any traffic is received. If the relay device 30 does not receive any traffic, step S402 is executed again to continue the detection.

若中繼設備30收到流量，則執行步驟S403：中繼設備30產生所收到的流量的副本，並傳輸所產生的副本至辨識模組31作為陌生流量。 If the relay device 30 receives the traffic, step S403 is executed: the relay device 30 generates a copy of the received traffic, and transmits the generated copy to the identification module 31 as an unfamiliar traffic.

步驟S404：中繼設備30依據流量的目的地欄位轉傳此流量至所指示的控制設備20或受控設備21。 Step S404: The relay device 30 forwards the traffic to the indicated control device 20 or controlled device 21 according to the destination field of the traffic.

步驟S405：辨識模組31自中繼設備30接收陌生流量。 Step S405: The identification module 31 receives the unfamiliar traffic from the relay device 30.

值得一提的是，中繼設備30可即時將所收到的流量的副本傳送至辨識模組31，亦可累積固定數量的流量後再一次傳送至辨識模組31，或定時傳送至辨識模組31，不加以限定。 It is worth mentioning that the relay device 30 may send a copy of each received traffic to the identification module 31 immediately, accumulate a fixed amount of traffic and send it to the identification module 31 in one batch, or send copies to the identification module 31 periodically, without limitation.

步驟S406：辨識模組31基於攻擊辨識資料模型對所收到的陌生流量進行分類，以決定此陌生流量的流量類別。 Step S406: The identification module 31 classifies the received unfamiliar traffic based on the attack identification data model to determine the traffic category of the unfamiliar traffic.

步驟S407：辨識模組31判斷陌生流量是屬於白名單的流量類別或黑名單的流量類別。若陌生流量是屬於白名單，則執行步驟S409。 Step S407: The identification module 31 determines whether the unfamiliar traffic belongs to the traffic category of the whitelist or the traffic category of the blacklist. If the unfamiliar traffic belongs to the whitelist, step S409 is executed.

若陌生流量是屬於黑名單，則執行步驟S408：辨識模組31經由人機介面402發出警示以通知用戶，及/或做成記錄並儲存於儲存裝置401以供用戶日後查閱或作為下次訓練攻擊辨識資料模型的樣本流量。 If the unfamiliar traffic belongs to the blacklist, step S408 is executed: the identification module 31 issues an alert via the man-machine interface 402 to notify the user, and/or makes a record and stores it in the storage device 401 for the user to review in the future or for the next training The sample traffic of the attack identification data model.

步驟S409：判斷是否結束流量辨識。若辨識模組31判斷結束條件滿足，則終止流量辨識。否則，再次執行步驟S402以持續進行流量辨識。 Step S409: Determine whether to end traffic identification. If the identification module 31 determines that the end condition is met, traffic identification is terminated. Otherwise, step S402 is performed again to continue traffic identification.

藉此，本發明可有效實現入侵偵測，並減低中繼設備30的負載。 In this way, the present invention can effectively realize intrusion detection and reduce the load of the relay device 30.
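The IDS flow of FIG. 8 can be sketched as a relay loop in which the original traffic is always forwarded and only a copy is classified. This is a minimal sketch under stated assumptions: traffic is modeled as hypothetical dicts with a `dst` destination field, `classify` stands in for steps S406-S407, and `outbox` and `alerts` are illustrative stand-ins for delivery to devices 20/21 and for the alert/record of step S408.

```python
import copy
from typing import Callable, Dict, List


def ids_relay(packets: List[dict], classify: Callable[[dict], str],
              outbox: Dict[str, List[dict]], alerts: List[dict]) -> None:
    """Passive IDS loop: forward every packet, classify a copy, alert on blacklist."""
    for pkt in packets:                                  # S402: traffic received
        mirrored = copy.deepcopy(pkt)                    # S403: copy for the identification module
        outbox.setdefault(pkt["dst"], []).append(pkt)    # S404: forward by destination field
        if classify(mirrored) == "blacklist":            # S406-S407: classify the copy
            alerts.append(mirrored)                      # S408: alert/log only, never block


# Hypothetical rule: a "fc" field above 16 is treated as blacklist traffic.
outbox: Dict[str, List[dict]] = {}
alerts: List[dict] = []
ids_relay([{"dst": "plc1", "fc": 3}, {"dst": "plc1", "fc": 90}],
          lambda p: "blacklist" if p["fc"] > 16 else "whitelist", outbox, alerts)
print(len(outbox["plc1"]), alerts)  # 2 [{'dst': 'plc1', 'fc': 90}]
```

Because classification happens on the copy, it could equally run in a separate process or in batches, which is what keeps the relay device's forwarding path light.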

續請一併參閱圖3及圖9，圖9為本發明第五實施例的攻擊辨識資料模型的生成與應用方法的流程圖。於圖9的實施例中，是將所產生的攻擊辨識資料模型用於入侵預防系統(Intrusion Prevention System,IPS)，即即時辨識陌生流量屬於白名單或黑名單，並於陌生流量屬於黑名單時即時進行處理。後續是以中繼設備30的辨識模組300執行入侵預防為例進行說明，但不以此限定，亦可改由辨識模組200、210或31來執行。 Please refer to FIG. 3 and FIG. 9 together. FIG. 9 is a flowchart of the method for generating and applying an attack identification data model according to the fifth embodiment of the present invention. In the embodiment of FIG. 9, the generated attack identification data model is used in an intrusion prevention system (IPS): it identifies in real time whether unfamiliar traffic belongs to the whitelist or the blacklist, and handles the traffic immediately when it belongs to the blacklist. The following description takes the identification module 300 of the relay device 30 performing intrusion prevention as an example, without limitation; the identification module 200, 210, or 31 may perform it instead.

Step S500: The identification module 300 switches to the identification mode in response to a user operation or automatic control, in preparation for performing attack prevention.

Step S501: The identification module 300 loads the attack identification data model.

Step S502: The relay device 30 determines whether any traffic has been received. If the relay device 30 has not received any traffic, step S502 is executed again to continue detection.

If the relay device 30 receives traffic, step S503 is executed: the traffic is transmitted to the identification module 300 as unfamiliar traffic.

Step S504: The identification module 300 receives the unfamiliar traffic from the relay device 30.

Step S505: The identification module 300 classifies the received unfamiliar traffic based on the attack identification data model to determine its traffic category.

Step S506: The identification module 300 determines whether the unfamiliar traffic belongs to a whitelist traffic category or a blacklist traffic category.

If the unfamiliar traffic belongs to the blacklist, step S507 is executed: the identification module 300 blocks the transmission of the unfamiliar traffic, that is, the unfamiliar traffic is not transmitted to its destination. This prevents the attack from reaching the destination device.

If the unfamiliar traffic belongs to the whitelist, step S508 is executed: the identification module 300 forwards the unfamiliar traffic to the control device 20 or the controlled device 21 indicated by its destination field.

Step S509: Determine whether to end traffic identification. If the identification module 300 determines that the termination condition is satisfied, traffic identification ends. Otherwise, step S502 is executed again to continue traffic identification.
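Unlike the detection flow, steps S505 to S508 put the verdict on the forwarding path: traffic reaches its destination only after it has been classified as whitelist. A minimal Python sketch of that decision is given below, assuming traffic is a dict of integer field values. `InlineFilter`, `handle`, and the `classify` callback (standing in for the trained model) are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InlineFilter:
    """Hypothetical intrusion-prevention relay: traffic is held until it is
    classified, then either dropped or delivered (cf. steps S505-S508)."""
    classify: Callable[[Dict[str, int]], str]  # stands in for the trained model
    delivered: Dict[str, List[Dict[str, int]]] = field(default_factory=dict)
    blocked: List[Dict[str, int]] = field(default_factory=list)

    def handle(self, destination: str, fields: Dict[str, int]) -> bool:
        """Return True if the traffic was forwarded, False if it was blocked."""
        if self.classify(fields) == "blacklist":  # S505/S506: classify and decide
            self.blocked.append(fields)           # S507: never reaches the destination
            return False
        self.delivered.setdefault(destination, []).append(fields)  # S508: forward
        return True
```

The trade-off relative to the copy-and-forward sketch is that any classification delay now adds latency to every packet, in exchange for stopping blacklisted traffic before it arrives.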

In this way, the present invention can effectively implement intrusion prevention.

Please refer to FIG. 13, which is a schematic diagram of multiple fields of multiple unfamiliar traffic according to an embodiment of the present invention. FIG. 13 illustrates, by way of example, the improvement of the present invention over the prior art.

FIG. 13 shows the field data of 21 unfamiliar traffic records (traffic 1-21) and the identification results produced by the present invention. Traffic 1-6 were used in advance as whitelist sample traffic to train the attack classification model. After identification, traffic 1-11 belong to whitelist traffic categories 0-4, and traffic 12-21 belong to blacklist traffic categories 5-14.

In the example of FIG. 13, the attack identification data model used for attack detection is generated from three fields: the length field, the function code field, and the forwarding rate field. The whitelist sample values are 11 and 12 for the length field, 3 and 4 for the function code field, and 1 and 2 for the forwarding rate field.

During identification, every field value of traffic 1-10 equals a whitelist sample value, so the traffic categories 0-3 to which they belong are judged to be whitelist.

Although the value of the length field of traffic 11 (13) is not among the whitelist sample values, most of its features still match the whitelist and lie within the empirically allowable range, so the trained attack identification data model judges its traffic category 4 to be whitelist.

The length field of traffic 12 (value 16) does not match the whitelist sample values and clearly exceeds the empirically allowable range, so the trained attack identification data model judges its traffic category 5 to be blacklist.

Although all fields of traffic 13 match the whitelist sample values, the combination of its function code field (value 3) and forwarding rate field (value 0) is empirically rare or never occurs, so the trained attack identification data model judges its traffic category 6 to be blacklist.

The function code fields of traffic 14-21 (values 2 and 5-11, respectively) do not match the whitelist sample values and clearly exceed the empirically allowable range, so the trained attack identification data model judges their traffic categories 7-14 to be blacklist.

Therefore, because the present invention can also judge values other than the whitelist and blacklist sample values, it further improves the accuracy of attack detection.
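The contrast with the exact-value comparison described in the background can be made concrete by encoding the FIG. 13 whitelist sample values. This is only a sketch under stated assumptions: the traffic record below is hypothetical (the other field values of traffic 11 are not reproduced here), treating length 16 as a blacklist sample is an assumption, and the midpoint cut mimics how CART-style trees commonly place thresholds between observed values rather than reproducing the patent's exact criterion. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Whitelist sample values per field, as listed for the FIG. 13 example.
WHITELIST_SAMPLES: Dict[str, Set[int]] = {
    "length": {11, 12},
    "function_code": {3, 4},
    "forwarding_rate": {1, 2},
}


def exact_match(traffic: Dict[str, int]) -> str:
    """Lookup in the style of the prior art described in the background:
    whitelist only if every field equals a known sample value."""
    known = all(traffic[name] in values for name, values in WHITELIST_SAMPLES.items())
    return "whitelist" if known else "blacklist"


def midpoint_threshold(white: Iterable[int], black: Iterable[int]) -> float:
    """CART-style cut point: halfway between the largest whitelist value and
    the smallest blacklist value above it."""
    top_white = max(white)
    low_black = min(v for v in black if v > top_white)
    return (top_white + low_black) / 2


# Hypothetical record resembling traffic 11: only the length (13) is unseen.
record = {"length": 13, "function_code": 3, "forwarding_rate": 1}
print(exact_match(record))                # blacklist: 13 is not a sample value
cut = midpoint_threshold({11, 12}, {16})  # assumes 16 was labelled blacklist
print(cut, record["length"] <= cut)       # 14.0 True: falls on the whitelist side
```

The exact lookup rejects any value it has never seen, while a learned cut point assigns unseen values such as 13 to a side, which is the behaviour the FIG. 13 discussion attributes to the trained model.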

The above are merely preferred specific examples of the present invention and do not thereby limit the patent scope of the present invention. All equivalent changes made using the content of the present invention are likewise included within the scope of the present invention.

S100-S103: training steps

S104-S108: first identification steps

Claims

1. A method for generating and applying an attack identification data model, for an automatic control system, wherein the attack identification system includes a control device, a controlled device, and an identification module, the method comprising the following steps:
a) in a training mode, performing statistics on a plurality of values of a plurality of sample traffic of a whitelist or a blacklist to obtain a plurality of sample values, wherein a first number of traffic categories can be identified based on all the sample values;
b) executing a classification learning algorithm based on the plurality of sample values and the corresponding traffic categories, so as to classify values other than the sample values and thereby generate an attack identification data model, wherein the attack identification data model includes a plurality of identification features, a second number of traffic categories can be identified based on all the identification features, and the second number is greater than the first number;
c) in an identification mode, controlling the identification module to receive a plurality of unfamiliar traffic; and
d) classifying each unfamiliar traffic into a traffic category of the whitelist or a traffic category of the blacklist based on the identification features of the attack identification data model and the values of that unfamiliar traffic, wherein the unfamiliar traffic is sent by the control device to the controlled device, or by the controlled device to the control device;
wherein each sample traffic and each unfamiliar traffic includes a plurality of fields, and the classification learning algorithm includes the following steps:
e1) executing a decision tree algorithm to determine a classification condition, wherein the classification condition divides the sample traffic into a plurality of subgroups, each subgroup corresponds to a traffic category of the whitelist or a traffic category of the blacklist, and the classification condition is a value or value range of one of the fields, determined based on at least one whitelist sample value or at least one blacklist sample value of that field;
e2) calculating a purity of each subgroup;
e3) when the purity of any subgroup does not meet a preset purity, executing the decision tree algorithm on that subgroup to determine another classification condition, wherein the other classification condition subdivides that subgroup into a plurality of subgroups; and
e4) repeating steps e2) and e3) until the purity of every subgroup meets the preset purity.

2. The method for generating and applying an attack identification data model according to claim 1, wherein the plurality of sample values include a plurality of whitelist sample values and a plurality of blacklist sample values, and step a) includes the following steps: a11) performing statistics on the values of the sample traffic of the whitelist to obtain the whitelist sample values; and a12) performing a reverse analysis process on the whitelist sample values to obtain the blacklist sample values.

3. The method for generating and applying an attack identification data model according to claim 1, wherein the plurality of sample values include a plurality of whitelist sample values and a plurality of blacklist sample values, and step a) includes a step a21) of performing statistics on the values of the sample traffic of the blacklist to obtain the blacklist sample values.

4. The method for generating and applying an attack identification data model according to claim 1, wherein each sample traffic and each unfamiliar traffic includes a plurality of fields, and the identification features are respectively used to identify different traffic categories; step a) selects at least one of the fields and performs statistics on all values of the selected field of all the sample traffic to obtain the sample values of the selected field; and step d) determines whether the value of at least one of the fields of each unfamiliar traffic meets any of the identification features, so as to determine the traffic category to which that unfamiliar traffic corresponds.

5. The method for generating and applying an attack identification data model according to claim 1, wherein step b) sets all the classification conditions corresponding to each subgroup whose purity meets the preset purity as the identification features of the corresponding traffic category.

6. The method for generating and applying an attack identification data model according to claim 1, wherein the plurality of classification conditions are values or value ranges of the plurality of fields.

7. The method for generating and applying an attack identification data model according to claim 1, wherein the decision tree algorithm is the Classification and Regression Tree (CART) algorithm, and the purity is the Gini index.

8. The method for generating and applying an attack identification data model according to claim 1, wherein the attack identification system includes a relay device, and the method further includes the following steps before step c): f1) when the relay device receives any traffic, generating a copy of the traffic; f2) transmitting the copy to the identification module as the unfamiliar traffic; and f3) forwarding the traffic to the control device or the controlled device indicated by a destination field of the traffic.

9. The method for generating and applying an attack identification data model according to claim 1, further including a step g) after step d): when any unfamiliar traffic is determined to belong to a traffic category of the blacklist, issuing an alert or making a record.

10. The method for generating and applying an attack identification data model according to claim 1, wherein the attack identification system includes a relay device, and the method further includes a step h) before step c): when the relay device receives any traffic, transmitting the traffic to the identification module as the unfamiliar traffic.

11. The method for generating and applying an attack identification data model according to claim 10, further including the following steps after step d): i1) when the unfamiliar traffic is determined to belong to a traffic category of the blacklist, blocking the transmission of the traffic; and i2) when the unfamiliar traffic is determined to belong to a traffic category of the whitelist, forwarding the traffic to the control device or the controlled device indicated by a destination field of the traffic.

12. The method for generating and applying an attack identification data model according to claim 1, wherein the sample traffic is offline traffic or real-time traffic.

13. The method for generating and applying an attack identification data model according to claim 12, wherein the sample traffic is offline traffic; step a) uses traffic obtained in an offline state as the sample traffic; and step c) uses traffic received in an online state as the plurality of unfamiliar traffic.

14. The method for generating and applying an attack identification data model according to claim 12, wherein the sample traffic is real-time traffic; step a) receives a first part of a continuous stream of unfamiliar traffic as the sample traffic; step b) generates the attack identification data model in real time; step c) receives a second part of the continuous stream of unfamiliar traffic; and step d) classifies the second part of the unfamiliar traffic.
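Claims 1, 5 and 7 describe the classification learning algorithm as a CART decision tree that keeps splitting subgroups until each one meets a preset purity measured by the Gini index. The following is a minimal, unoptimized Python sketch of steps e1) to e4) under stated assumptions: numeric fields, binary `<=`/`>` cuts at midpoints between observed sample values, a hypothetical `max_gini` parameter standing in for the preset purity, and hypothetical function names throughout. It is an illustration, not the patent's implementation; a production system would more likely rely on a library such as scikit-learn's `DecisionTreeClassifier` with `criterion="gini"`.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Sample = Tuple[Dict[str, float], str]     # (field values, traffic category)
Condition = Tuple[str, str, float]        # (field name, "<=" or ">", threshold)
Rule = Tuple[Tuple[Condition, ...], str]  # (path of conditions, traffic category)


def gini(samples: List[Sample]) -> float:
    """Gini index of a subgroup; 0.0 means every sample shares one category."""
    counts = Counter(label for _, label in samples)
    n = len(samples)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split(samples: List[Sample], name: str, t: float) -> Tuple[List[Sample], List[Sample]]:
    left = [s for s in samples if s[0][name] <= t]
    right = [s for s in samples if s[0][name] > t]
    return left, right


def best_split(samples: List[Sample]) -> Optional[Tuple[str, float]]:
    """e1) Choose the field and threshold giving the lowest weighted Gini index.
    Returns None when no candidate lowers the Gini index of the parent."""
    best, best_score = None, gini(samples)
    for name in samples[0][0]:
        values = sorted({f[name] for f, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate cut between two observed sample values
            left, right = split(samples, name, t)
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if score < best_score:
                best, best_score = (name, t), score
    return best


def grow(samples: List[Sample], max_gini: float = 0.0,
         path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """e2)-e4) Keep splitting any subgroup whose Gini index exceeds max_gini.
    Each leaf's conditions become one identification feature (cf. claim 5)."""
    cut = None if gini(samples) <= max_gini else best_split(samples)
    if cut is None:
        label = Counter(lab for _, lab in samples).most_common(1)[0][0]
        return [(path, label)]
    name, t = cut
    left, right = split(samples, name, t)
    return (grow(left, max_gini, path + ((name, "<=", t),))
            + grow(right, max_gini, path + ((name, ">", t),)))


def classify(rules: List[Rule], fields: Dict[str, float]) -> str:
    """Step d): return the category of the leaf whose conditions all hold."""
    for path, label in rules:
        if all(fields[n] <= t if op == "<=" else fields[n] > t for n, op, t in path):
            return label
    raise ValueError("no identification feature matched")
```

For example, with whitelist samples `{"length": 11, "function_code": 3}` and `{"length": 12, "function_code": 4}` plus two assumed blacklist samples `{"length": 16, "function_code": 3}` and `{"length": 11, "function_code": 9}`, `grow` produces three rules, and the whitelist rule is `length <= 14.0` and `function_code <= 6.5`. An unseen length of 13 with function code 3 is therefore classified as whitelist, much like traffic 11 in FIG. 13. As a simplification, this sketch also stops splitting when no candidate cut lowers the Gini index, which standard CART implementations handle with additional stopping rules.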