TWI724734B - Method of building and applying an attack identification data model - Google Patents
- Publication number: TWI724734B
- Application number: TW109100150A
- Authority: TW (Taiwan)
Abstract
Description
The present invention relates to the identification of network attacks, and in particular to a method of generating and applying an attack identification data model.
In a communication network (such as the Internet or a local area network), computer devices communicate by exchanging data traffic. Malicious traffic (that is, an attack), however, may cause a computer device to malfunction.
To detect attacks from the network, existing attack identification technology collects the values of known traffic (such as packets) in advance. When unknown traffic is received, its value is compared with the values of the known traffic; if it matches the value of any known traffic, the purpose of the unknown traffic (such as normal traffic or attack traffic) can be determined.
The shortcoming of existing attack identification technology is that it can only identify unknown traffic that is exactly identical to known traffic. Once the unknown traffic differs from the known traffic, its purpose can no longer be identified reliably.
The existing network attack identification technology therefore has the above problem, and a more effective solution is needed.
The main objective of the present invention is to provide a method of generating and applying an attack identification data model that can identify more traffic categories based on the same number of sample traffic items.
To achieve the above objective, the present invention provides a method of generating and applying an attack identification data model for an automatic control system that includes a control device, a controlled device, and an identification module. The method includes the following steps: in a training mode, performing statistics on multiple values of multiple sample traffic items of a whitelist or a blacklist to obtain multiple sample values, wherein a first number of traffic categories can be identified based on all of the sample values; executing a classification learning algorithm based on the sample values and their corresponding traffic categories to classify values other than the sample values and thereby generate an attack identification data model, wherein the attack identification data model includes multiple identification features, a second number of traffic categories can be identified based on all of the identification features, and the second number is greater than the first number; controlling the identification module to receive multiple unknown traffic items in an identification mode; and classifying each unknown traffic item into a traffic category of the whitelist or a traffic category of the blacklist based on the identification features of the attack identification data model and the values of that unknown traffic item, wherein the unknown traffic is sent from the control device to the controlled device or from the controlled device to the control device.
Based on a small amount of sample traffic, the present invention can identify many traffic categories and can accurately determine whether undefined unknown traffic belongs to the whitelist or the blacklist.
100-102: attack traffic
11: attack detection system
110-111: samples
120: whitelist
121: blacklist
20: control device
21: controlled device
200, 210, 300, 31: identification modules
30: relay device
400: processing device
401: storage device
402: human-machine interface
403: transmission device
404: functional device
500: whitelist sample values
501: blacklist sample values
51: classification learning algorithm
52: whitelist
53: blacklist
54: attack identification data model
60-63: traffic categories
70-73: identification features
S100-S103: training steps
S104-S108: first identification steps
S20-S21: first sample value acquisition steps
S22: second sample value acquisition step
S30-S33: classification steps
S400-S409: second identification steps
S500-S509: second identification steps
FIG. 1 is a schematic diagram of the operation of an existing attack detection system.
FIG. 2 is an architecture diagram of an automatic control system according to one implementation aspect of the present invention.
FIG. 3 is an architecture diagram of an automatic control system according to one implementation aspect of the present invention.
FIG. 4 is an architecture diagram of a computer device according to one implementation aspect of the present invention.
FIG. 5 is a flowchart of the method of generating and applying an attack identification data model according to the first embodiment of the present invention.
FIG. 6 is a partial flowchart of the method of generating and applying an attack identification data model according to the second embodiment of the present invention.
FIG. 7 is a flowchart of the classification learning algorithm according to the third embodiment of the present invention.
FIG. 8 is a flowchart of the method of generating and applying an attack identification data model according to the fourth embodiment of the present invention.
FIG. 9 is a flowchart of the method of generating and applying an attack identification data model according to the fifth embodiment of the present invention.
FIG. 10 is a schematic diagram of the generation of an attack identification data model according to one implementation aspect of the present invention.
FIG. 11 is a schematic diagram of the execution of a single-field decision tree algorithm according to one implementation aspect of the present invention.
FIG. 12 is a schematic diagram of the execution of a multi-field decision tree algorithm according to one implementation aspect of the present invention.
FIG. 13 is a schematic diagram of multiple fields of multiple unknown traffic items according to one implementation aspect of the present invention.
A preferred embodiment of the present invention is described in detail below with reference to the drawings.
Please refer to FIG. 1, a schematic diagram of the operation of an existing attack detection system, which illustrates more clearly the technical problem the present invention is intended to solve.
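As shown in FIG. 1, the attack detection system 11 stores in advance multiple samples 110-111 of the blacklist 121, whose values are A and B respectively; that is, the attack detection system 11 can identify only two traffic categories.
When performing attack detection, the attack detection system 11 compares the value of each received unknown traffic item (taking the attack traffic 100-102 as an example, with values A, B, and C respectively) with the values of all samples 110-111 to decide whether each traffic item belongs to the whitelist 120 or the blacklist 121.
In the example of FIG. 1, the attack traffic 100-101 has the same values as the samples 110-111 of the blacklist 121 and is therefore identified as blacklist 121 traffic. The attack traffic 102, however, has a value different from those of the samples 110-111 and is misjudged as whitelist 120 traffic.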
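The limitation illustrated by FIG. 1 can be reproduced in a few lines. The following is a minimal sketch, assuming that each traffic item is reduced to a single value; the names `BLACKLIST_SAMPLES` and `exact_match_detect` are hypothetical and do not come from the patent.

```python
# Hypothetical stored blacklist samples 110-111 with values A and B (FIG. 1).
BLACKLIST_SAMPLES = {"A", "B"}


def exact_match_detect(value: str) -> str:
    """Label traffic as blacklist only if its value equals a stored sample."""
    return "blacklist" if value in BLACKLIST_SAMPLES else "whitelist"


# Attack traffic 100-102 carries the values A, B and C respectively.
results = {value: exact_match_detect(value) for value in ("A", "B", "C")}
print(results)
```

In this sketch the value C is labeled whitelist only because no identical sample was stored, which is the misjudgment the description attributes to exact matching.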
Existing attack detection systems can therefore identify only unknown traffic that is exactly identical to a known sample. Attacks that have never occurred, or attack samples that are not on record, cannot be identified, so the probability of identification failure or misjudgment is too high and system reliability is reduced.
In addition, different types of automatic control systems (for example, those with different application purposes or different network protocols) carry different traffic content and require different detection rules. A solution that can be trained and perform attack detection automatically according to the application type is therefore needed.
Moreover, industrial control network protocols are not as universal as IPv4. Different industrial control networks often use different industrial control network protocols, and no single attack identification data model is applicable to all types of industrial control networks. A solution that can be trained and perform attack detection automatically according to the current type of industrial control network is therefore needed.
To solve the above problems, the present invention mainly provides a method of generating and applying an attack identification data model. An attack identification data model is generated by training on multiple sample traffic items, and the model is then used to inspect network traffic in order to identify the purpose of each network traffic item (such as normal traffic, suspicious traffic, or attack traffic). Because the attack identification data model is generated through training and uses a different approach to classification, the number of traffic categories it can identify is expanded beyond the number of traffic categories of the sample traffic used for training.
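Please refer to FIG. 2, an architecture diagram of an automatic control system according to one implementation aspect of the present invention. The method of generating and applying the attack identification data model of the present invention can be applied to the automatic control system 2 shown in FIG. 2.
Specifically, the automatic control system 2 mainly includes a control device 20 (such as a server or a control host) and one or more controlled devices 21 (such as robots, Internet of Things nodes, industrial automation equipment, or end devices). The control device 20 is connected to each controlled device 21 through a network, and can transmit control commands (that is, traffic) to a controlled device 21 to make it perform a specified operation, or receive returned data (that is, traffic) from a controlled device 21.
In one implementation aspect, the control device 20 includes an identification module 200. The identification module 200 identifies the traffic received by the control device 20 based on the attack identification data model to determine the traffic category of each received traffic item. In this way, the present invention can implement network attack detection on the control device 20.
In one implementation aspect, the controlled device 21 includes an identification module 210. The identification module 210 identifies the traffic received by the controlled device 21 based on the attack identification data model to determine the traffic category of each received traffic item. In this way, the present invention can implement network attack detection on the controlled device 21.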
It is worth noting that the attack identification data model of the present invention can be used to classify each unknown traffic item into one of multiple predefined traffic categories, and each traffic category can be assigned in advance to the whitelist or the blacklist. Accordingly, once an unknown traffic item has been classified, it can be determined to be whitelist or blacklist traffic according to its traffic category.
Please also refer to FIG. 3, an architecture diagram of an automatic control system according to one implementation aspect of the present invention. The control device 20 and the controlled device 21 of FIG. 3 are the same as or similar to those shown in FIG. 2 and are not described again here.
In the implementation aspect of FIG. 3, the automatic control system 3 further includes a relay device 30 (such as a network switch, router, or bridge). The control device 20 is connected to the controlled device 21 through the relay device 30; that is, the relay device 30 forwards the traffic sent from the control device 20 to the controlled device 21, or the traffic sent from the controlled device 21 to the control device 20.
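In one implementation aspect, the relay device 30 includes an identification module 300. The identification module 300 identifies the traffic received (that is, forwarded) by the relay device 30 based on the attack identification data model to determine the traffic category of each received traffic item. In this way, attack detection for the entire network can be implemented simply by deploying the identification module 300 on the relay device 30.
In one implementation aspect, the identification module 31 is a standalone device (such as a standalone computer host or server) connected to the relay device 30 through the network. On receiving unknown traffic, the relay device 30 can transmit the unknown traffic (or a copy of it) to the identification module 31, which then determines the traffic category of each received traffic item. In this way, the present invention can reduce the load on the relay device.
Please also refer to FIG. 4, an architecture diagram of a computer device according to one implementation aspect of the present invention. The aforementioned control device 20, controlled device 21, relay device 30, and identification module 31 may each be a computer device 4 as shown in FIG. 4.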
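Specifically, the computer device 4 may include a storage device 401, a human-machine interface 402, a transmission device 403, a functional device 404, and a processing device 400 electrically connected to these devices.
The storage device 401 stores data, such as the attack identification data model or programs for controlling the functional device 404. The human-machine interface 402 accepts user input and outputs information; it may include any combination of input and output devices, such as a touch screen, a keypad, a mouse, a display, indicator lights, or a speaker, without limitation. The transmission device 403 connects to the network and may be, for example, an Ethernet module, a Wi-Fi module, or a mobile network module.
The functional device 404 implements the designated function of the device. For example, if the controlled device 21 is automated manufacturing equipment, the functional device 404 may be a conveyor belt, a robotic arm, or another device used for automated manufacturing. If the controlled device 21 is automated inspection equipment, the functional device 404 may be a camera, a device for moving the camera or the object, or another device used for automated inspection. If the control device 20 is an industrial management host, the functional device 404 may be a management system or a backup device. If the relay device 30 is a network switch or router, the functional device 404 may be a switch module or a router module.
In one implementation aspect, the storage device 401 may store a computer program containing computer-executable code. When the processing device 400 executes the computer program, the method of generating and applying the attack identification data model of the embodiments described below can be carried out.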
Please also refer to FIG. 5, a flowchart of the method of generating and applying an attack identification data model according to the first embodiment of the present invention. The description of FIG. 5 assumes that the method is applied to the automatic control system 3 shown in FIG. 3, but the method is not limited to this.
In one embodiment, the method of generating and applying the attack identification data model can also be implemented on the automatic control system 2 shown in FIG. 2.
The method of generating and applying the attack identification data model of the present invention is divided into two stages: a training mode and an identification mode. In the training mode, the present invention trains on known traffic to generate the attack identification data model. In the identification mode, the present invention uses the attack identification data model to identify unknown traffic.
It is worth noting that although the identification module 31 executes the training mode and the identification mode in the following description, the invention is not limited to this.
In one embodiment, the training mode and the identification mode may instead be executed by the identification module 200, the identification module 210, and/or the identification module 300.
In one embodiment, the training mode and the identification mode may be executed by different computer devices. For example, the identification module 31 executes the training mode and transmits the generated attack identification data model to other identification modules (such as the identification module 300, or the identification modules 200 and 210), which then execute the identification mode. In this way, the present invention can distribute the training load and the identification load.
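First, the identification module 31 executes steps S100-S103 to generate the attack identification data model in the training mode.
Step S100: The identification module 31 switches to the training mode, in response to a user operation or under automatic control, to prepare for training.
Step S101: The identification module 31 obtains multiple sample traffic items and performs statistics on their values to obtain multiple sample values. The sample traffic is traffic whose purpose is known (such as whitelist traffic or blacklist traffic) or traffic with high credibility (such as traffic sent by a trusted device, which can be presumed directly to be whitelist traffic). A first number (such as 800 or 1000) of traffic categories can be identified based on all of the resulting sample values.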
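In one embodiment, each sample traffic item includes multiple fields (such as packet length, protocol code, function code, packets per second, and/or sending timestamp). The identification module 31 selects all or some of these fields as designated fields and performs statistics on the values of the designated fields across all sample traffic to obtain one or more sample values for each designated field.
In one embodiment, the identification module 31 takes every value that has appeared in a designated field as a sample value of that field, but the invention is not limited to this.
In one embodiment, the identification module 31 performs statistical analysis on all values that have appeared in each designated field to obtain the sample values. For example, values that appear more than a preset number of times (such as 5 times) may be taken as sample values, multiple values that appear in a regular pattern (such as in consecutive traffic) may be taken as sample values, or values with a relatively high frequency of occurrence (such as the top 30% by frequency) may be taken as sample values.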
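The statistical selection of sample values for one designated field can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_sample_values`, its parameters, and the reading of "top 30%" as the most frequent 30% of distinct values are all hypothetical, and the function codes are made-up data.

```python
from collections import Counter


def select_sample_values(values, min_count=None, top_fraction=None):
    """Pick sample values for one designated field from its observed values.

    min_count:    keep values seen more than this many times (e.g. 5).
    top_fraction: keep the most frequent fraction of distinct values (e.g. 0.3).
    With neither option set, every observed value becomes a sample value.
    """
    counts = Counter(values)
    if min_count is not None:
        return sorted(v for v, c in counts.items() if c > min_count)
    if top_fraction is not None:
        keep = max(1, int(len(counts) * top_fraction))
        return sorted(v for v, _ in counts.most_common(keep))
    return sorted(counts)


# Hypothetical function codes observed in captured sample traffic.
function_codes = [3] * 8 + [16] * 6 + [4] * 2 + [43]
frequent = select_sample_values(function_codes, min_count=5)   # [3, 16]
top = select_sample_values(function_codes, top_fraction=0.3)   # [3]
everything = select_sample_values(function_codes)              # [3, 4, 16, 43]
```

The rule based on values that appear in a regular pattern is left out here because the description does not say how regularity would be measured.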
In one embodiment, while the automatic control system 3 is operating normally, the identification module 31 continuously captures, through the relay device 30, the traffic transmitted between the control device 20 and the controlled device 21 as sample traffic (for example, 10 consecutive minutes of traffic).
Step S102: The identification module 31 executes a classification learning algorithm based on the sample values and their corresponding traffic categories to classify values other than the sample values and to generate the attack identification data model.
The attack identification data model may include multiple identification features, and a second number of traffic categories can be identified based on all of the identification features. The second number is greater than the first number; that is, the attack identification data model expands the number of traffic categories that can be identified.
It is worth noting that the present invention mainly provides a solution that applies existing machine learning technology to network attack detection.
How to perform machine learning training on sample values to generate the attack identification data model is well documented in the machine learning literature and is not described in detail here.
For example, a classification algorithm may be used, such as an unsupervised or a supervised classification algorithm. The unsupervised classification algorithm may be K-means, a neural network, balanced iterative reducing and clustering using hierarchies (BIRCH), and so on. The supervised classification algorithm may be a decision tree, a support vector machine (SVM), naive Bayes, and so on.
In one embodiment, the classification learning algorithm analyzes the associations among multiple sample values of the same field or across fields, and may further combine the extreme values of each field (such as the minimum or maximum allowable value from general experience) to compute the identification features.
In one embodiment, the identification features correspond respectively to multiple traffic categories, and each traffic category belongs to either the whitelist or the blacklist. When an unknown traffic item matches one of the identification features, it belongs to the traffic category corresponding to the matched identification feature, and it can further be determined to be normal traffic or suspicious traffic according to whether that traffic category belongs to the whitelist or the blacklist.
Step S103: The identification module 31 outputs the attack identification data model, for example by exporting it as a file, storing it in the storage device 401, or transmitting it to other identification modules through the transmission device 403.
In this way, the present invention can quickly train an attack identification data model dedicated to the current network environment from the input sample traffic, and is applicable to different types of network environments or automatic control systems.
Next, the identification module 31 may execute steps S104-S108 to detect network traffic attacks in the identification mode.
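Step S104: The identification module 31 switches to the identification mode, in response to a user operation or under automatic control, to prepare for attack detection.
Step S105: The identification module 31 loads the attack identification data model output in step S103.
Step S106: The identification module 31 starts receiving multiple unknown traffic items. The unknown traffic may be sent from the control device 20 to the controlled device 21, and/or from the controlled device 21 to the control device 20.
Step S107: The identification module 31 classifies each unknown traffic item based on the identification features of the attack identification data model and the values of that traffic item to identify the traffic category to which it belongs.
Furthermore, because each traffic category has been assigned in advance to either the whitelist or the blacklist, the identification module 31 can decide, according to the traffic category of each unknown traffic item, whether it is whitelist traffic (that is, normal behavior) or blacklist traffic (that is, suspicious behavior or attack behavior).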
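In one embodiment, like each sample traffic item, each unknown traffic item may include multiple fields. In step S107, the identification module 31 compares the identification features of the attack identification data model one by one with the values of the fields of each unknown traffic item, and when the field values match any identification feature, takes the traffic category associated with that identification feature as the traffic category of the unknown traffic item. The unknown traffic is classified in this way.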
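The one-by-one comparison in step S107 can be pictured with a small sketch. It assumes, purely for illustration, that an identification feature is an inclusive value range on a single field; the class `IdentificationFeature`, the function `classify`, the field names, the category names, and the numeric limits are all hypothetical. Returning `None` for traffic that matches no feature is also an assumption, since the description does not say how that case is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdentificationFeature:
    """A hypothetical feature: an inclusive value range on one field."""
    field_name: str
    low: int
    high: int
    category: str   # traffic category the feature points to
    list_name: str  # "whitelist" or "blacklist"


def classify(traffic: dict, features: list):
    """Return (category, list_name) of the first matching feature, else None."""
    for feature in features:
        value = traffic.get(feature.field_name)
        if value is not None and feature.low <= value <= feature.high:
            return feature.category, feature.list_name
    return None


FEATURES = [
    IdentificationFeature("length", 0, 260, "normal_length", "whitelist"),
    IdentificationFeature("length", 261, 65535, "oversized", "blacklist"),
]

normal = classify({"length": 12, "function_code": 3}, FEATURES)
attack = classify({"length": 300, "function_code": 3}, FEATURES)
```

Because the features are ranges rather than stored values, a length of 300 is assigned to the blacklist category even though no sample with that exact length needs to exist.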
Step S108: The identification module 31 determines whether to end traffic identification. Specifically, the identification module 31 automatically ends traffic identification, that is, attack detection, when a preset end condition is satisfied.
In one embodiment, the end condition may be that the user manually turns off the traffic identification function, that no unknown traffic has been received for a preset end time, or that the module is instructed to release processing resources to other programs or applications, without limitation.
If the identification module 31 determines that the end condition is satisfied, it terminates traffic identification. Otherwise, the identification module 31 keeps executing steps S106-S107 to continue traffic identification.
Based on the same number of sample traffic items, the present invention can identify more traffic categories and can accurately determine whether undefined unknown traffic belongs to the whitelist or the blacklist.
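In one embodiment, the sample traffic may be offline traffic or real-time traffic.
Taking offline sample traffic as an example, in step S101 the identification module 31 obtains traffic in an offline state (for example, with the connection to the control device 20 and the controlled device 21 interrupted, or with the network connection interrupted), for instance by receiving traffic from another computer device or reading traffic from the storage device 401, and uses it as sample traffic. In step S106 the identification module 31 obtains traffic in an online state (for example, connected to the control device 20 and the controlled device 21, or with the network connection restored) and treats it as unknown traffic.
Taking real-time sample traffic as an example, the identification module 31 continuously receives multiple traffic items from the control device 20 and the controlled device 21. In step S101 it takes a first part of the continuous traffic (such as the traffic received in the first three minutes, or the first half of the same file or command) as sample traffic; in steps S102 and S103 it generates and outputs the attack identification data model in real time; and in steps S104-S107 it uses the attack identification data model in real time to classify a second part of the continuous traffic (such as the traffic after the third minute, or the second half of the same file or command) as unknown traffic, determining whether each traffic item in the second part belongs to the whitelist or the blacklist. Because consecutive traffic items are usually closely related or similar in format, using one part of a set of traffic to identify another part in real time not only saves offline training time and sample traffic but can also yield higher identification accuracy.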
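The division of one continuous capture into a first part used for training and a second part treated as unknown traffic might look like the sketch below. The record layout `(timestamp, value)`, the function name `split_stream`, and the 180-second cutoff (standing in for "the first three minutes") are assumptions made for illustration.

```python
def split_stream(records, cutoff_seconds=180.0):
    """Split (timestamp, value) records into a training part and an unknown part.

    Records received within cutoff_seconds of the first record form the first
    part (sample traffic); later records form the second part (unknown traffic).
    """
    if not records:
        return [], []
    start = records[0][0]
    first = [r for r in records if r[0] - start < cutoff_seconds]
    second = [r for r in records if r[0] - start >= cutoff_seconds]
    return first, second


# Hypothetical capture: (seconds since start of capture, packet length).
stream = [(0.0, 12), (60.0, 12), (179.9, 14), (180.0, 12), (400.0, 300)]
samples, unknown = split_stream(stream)
```

Here `samples` would feed steps S101-S103 and `unknown` would be classified in steps S104-S107.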
Please also refer to FIG. 5 and FIG. 6. FIG. 6 is a partial flowchart of the method of generating and applying an attack identification data model according to the second embodiment of the present invention. Compared with the embodiment shown in FIG. 5, this embodiment further provides a sample value augmentation function that increases the number of sample values before training, thereby improving the accuracy of the attack identification data model.
Specifically, in this embodiment, step S101 includes steps S20-S21 and/or step S22. The sample values obtained through statistics in step S101 may include only whitelist sample values (that is, all sample traffic is whitelist traffic) or both whitelist and blacklist sample values (that is, the sample traffic includes both whitelist traffic and blacklist traffic).
In the first case, because blacklist sample values are lacking, the trained attack identification data model identifies the blacklist poorly. In the second case, because the numbers of whitelist and blacklist sample values are not necessarily equal, the trained model may identify one of the two lists poorly.
To address this, the present invention provides a sample value augmentation function that solves the lack of blacklist sample values through the following steps S20-S21.
Step S20: The identification module 31 performs statistics on the values of the whitelist sample traffic to obtain multiple whitelist sample values.
Step S21: The identification module 31 performs reverse analysis on the obtained whitelist sample values to obtain corresponding blacklist sample values.
In one embodiment, the reverse analysis generates blacklist sample values according to the transmission limits and customary values (such as maximum length, common lengths, commonly used function codes, and defined function codes) of the network protocol currently in use (such as Modbus or another industrial control protocol), and/or the value ranges not covered by the whitelist sample values.
In one embodiment, to balance the numbers of whitelist and blacklist samples used for training, original sample values are duplicated after the reverse analysis so that the number of blacklist samples equals the number of whitelist samples.
In one embodiment, the reverse analysis may take the maximum whitelist sample value increased by a certain amount, or the minimum whitelist sample value decreased by a certain amount, as a blacklist sample value.
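A minimal sketch of this reverse analysis, assuming a numeric field whose protocol-allowed range is 0 to 255, is given below. The function names `reverse_analysis` and `balance`, the `margin` parameter, and the field limits are hypothetical; duplicating the derived blacklist values until the two sets are equal in size is only one possible reading of the balancing step.

```python
def reverse_analysis(whitelist_values, margin=1, lower_limit=0, upper_limit=255):
    """Derive hypothetical blacklist sample values from whitelist sample values.

    Values just outside the observed whitelist range are proposed, clipped to
    the limits allowed by the protocol field (assumed here to be 0..255).
    """
    low, high = min(whitelist_values), max(whitelist_values)
    candidates = []
    if low - margin >= lower_limit:
        candidates.append(low - margin)
    if high + margin <= upper_limit:
        candidates.append(high + margin)
    return candidates


def balance(blacklist_values, target_count):
    """Duplicate blacklist values in order until target_count is reached."""
    if not blacklist_values:
        return []
    return [blacklist_values[i % len(blacklist_values)] for i in range(target_count)]


whitelist = [3, 4, 6, 16]                      # hypothetical whitelisted function codes
blacklist = reverse_analysis(whitelist)        # [2, 17]
balanced = balance(blacklist, len(whitelist))  # [2, 17, 2, 17]
```

The sketch covers only the minimum and maximum rule; values inside the range that the whitelist never uses (such as 5 in this data) would need the protocol-specific knowledge mentioned in the description.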
When the traffic includes blacklist sample traffic, the present invention can obtain the corresponding blacklist sample values through the following step S22.
Step S22: The identification module 31 performs statistics on the values of the blacklist sample traffic to obtain multiple blacklist sample values.
It is worth noting that step S22 mainly serves to increase the number of blacklist sample values so as to further improve the accuracy with which the attack identification data model identifies the blacklist; it is not an essential step of the present invention.
In one embodiment, even when blacklist sample traffic is available, step S22 may be skipped and the attack identification data model trained only on the whitelist sample values obtained in steps S20-S21 and the blacklist sample values derived from them by reverse analysis. Such a model is likewise able to distinguish unusual traffic outside the whitelist.
When the obtained sample traffic lacks blacklist traffic, only steps S20-S21 are executed to generate blacklist sample values; when the obtained sample traffic includes only blacklist traffic, only step S22 may be executed to obtain the corresponding blacklist sample values.
In this way, the present invention can increase the number of sample values and improve the classification accuracy of the attack identification data model.
It is worth noting that in practice it is impossible to obtain all blacklist sample values; a value that does not match any blacklist sample value may be either a whitelist or a blacklist value. If reverse analysis were performed on an incomplete set of blacklist samples, incorrect whitelist sample values might be obtained, causing the attack identification data model to misjudge unknown attack traffic as normal traffic and making attack detection inaccurate.
For this reason, the present invention does not perform reverse analysis on blacklist sample values to derive potentially incorrect whitelist sample values, which avoids the inaccurate attack detection described above.
Please also refer to FIG. 10, a schematic diagram of the generation of an attack identification data model according to one implementation aspect of the present invention, which briefly illustrates how the present invention constructs the attack identification data model 54.
As shown in FIG. 10, when training is to be performed, the user can input multiple whitelist sample values 500 and blacklist sample values 501 into the classification learning algorithm 51.
Next, by executing the classification learning algorithm 51, the present invention can generate multiple identification features 70-71 of the whitelist 52 and multiple identification features 72-73 of the blacklist 53. The identification features 70-71 are associated respectively with the traffic categories 60-61 of the whitelist 52 and are used to identify whether unknown traffic belongs to the corresponding traffic category 60-61; the identification features 72-73 are associated respectively with the traffic categories 62-63 of the blacklist 53 and are used to identify whether unknown traffic belongs to the corresponding traffic category 62-63.
It is worth noting that the traffic categories 60-61 and 62-63 can be understood as a classification of network behavior. That is, the present invention classifies different network behaviors (such as traffic with different field values) into different traffic categories in order to determine whether a network behavior belongs to the whitelist (benign or normal behavior) or the blacklist (suspicious or attack behavior).
Finally, the present invention packages the identification features 70-71 and 72-73, together with the above associations, into the attack identification data model 54.
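One way to picture this packaging is a plain data structure that keeps each identification feature next to its traffic category and its list, and that can be serialized for export to another identification module. The dictionary layout, the `build_model` helper, the `version` key, and the choice of JSON are all assumptions for illustration; the description only states that the model can be exported as a file.

```python
import json


def build_model(features):
    """Bundle identification features and their category associations."""
    return {"version": 1, "features": features}


# Hypothetical features: each ties a field range to a category and a list.
features = [
    {"field": "length", "low": 0, "high": 260, "category": "60", "list": "whitelist"},
    {"field": "length", "low": 261, "high": 65535, "category": "62", "list": "blacklist"},
]

model = build_model(features)
exported = json.dumps(model, sort_keys=True)  # text that could be saved or sent
restored = json.loads(exported)               # what another module would load
```

Keeping the association inside each feature entry means a loaded model needs no separate lookup table to decide whether a matched category is whitelist or blacklist.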
續請一併參閱圖5及圖7,圖7為本發明第三實施例的分類學習演算法的流程圖。除了使用現有的演算法作為本發明之分類學習演算法之外,於本實施例中,本發明進一步提出一種新穎且進步的分類學習演算法。前述分類 學習演算法是基於決策樹演算法來建構決策樹(即樹狀分類結構),決策樹的多個葉節點(即後述的符合預設純度的子群所對應的節點)即分別對應前述多個流量類別,而決策樹的多個分支的多個分類條件即構成前述的多個辨識特徵。 Please refer to FIG. 5 and FIG. 7 together. FIG. 7 is a flowchart of the classification learning algorithm according to the third embodiment of the present invention. In addition to using the existing algorithm as the classification learning algorithm of the present invention, in this embodiment, the present invention further proposes a novel and advanced classification learning algorithm. The foregoing classification The learning algorithm is based on the decision tree algorithm to construct a decision tree (that is, a tree-like classification structure). The multiple leaf nodes of the decision tree (that is, the nodes corresponding to the subgroups that meet the preset purity described later) correspond to the aforementioned multiple Traffic category, and multiple classification conditions of multiple branches of the decision tree constitute the aforementioned multiple identification features.
Specifically, the classification learning algorithm of this embodiment (i.e., "execute the classification learning algorithm" in step S102 of FIG. 5) includes the following steps.
Step S30: The identification module 31 executes a decision tree algorithm to determine a classification condition. The classification condition divides the plurality of sample traffic into a plurality of subgroups, each of which contains part of the sample traffic.
In one embodiment, the classification condition is a value or value range of one of the fields of the sample traffic, and is determined based on the whitelist sample values of that field (yielding a whitelist classification condition) or its blacklist sample values (yielding a blacklist classification condition).
Step S31: Calculate the purity of each subgroup. Purity serves as a confidence indicator for the classification, i.e., it evaluates how trustworthy the classification result is when traffic is classified according to the classification condition corresponding to that subgroup.
In addition, based on the corresponding classification condition (e.g., a whitelist classification condition or a blacklist classification condition), each subgroup is mapped to a whitelist traffic category or a blacklist traffic category.
Many ways of calculating purity already exist in the prior art, such as calculating information gain, entropy, or the Gini index, and they are not described further here.
It is worth noting that although this embodiment is described in terms of calculating subgroup purity, a person having ordinary skill in the art to which the present invention belongs should understand that "calculating subgroup purity" in the present invention in fact covers calculating either purity or impurity, because impurity is merely the inverse indicator of purity and its calculation is still tied to that of purity.
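As an illustration of the purity measures named above, the following minimal Python sketch computes Gini impurity and entropy from per-class sample counts. The function names and the [whitelist, blacklist] count layout are assumptions made for this example; the patent does not prescribe any particular implementation.

```python
import math

def gini_impurity(counts):
    """Gini impurity of a subgroup given per-class sample counts.

    0.0 means the subgroup is pure (all samples share one class).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (in bits) of a subgroup given per-class sample counts."""
    total = sum(counts)
    # Empty classes contribute nothing, so they are skipped to avoid log2(0).
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# Hypothetical layout: [whitelist sample count, blacklist sample count].
print(round(gini_impurity([1000, 3]), 3))
print(entropy([500, 500]))
```

Both measures are impurity indicators: lower values mean a purer subgroup, which matches the remark that impurity is the inverse indicator of purity.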
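Step S32: The identification module 31 obtains a preset purity and determines whether the purity of any subgroup fails to meet it, for example by checking whether the purity of the subgroup is higher than the preset purity, or whether its impurity is lower than a preset impurity.
If the identification module 31 determines that the purity of every subgroup meets the preset purity, the classification is complete, i.e., construction of the decision tree is finished.
If the identification module 31 determines that the purity of any subgroup does not meet the preset purity, step S33 is executed: the decision tree algorithm is run again on each subgroup whose purity does not meet the preset purity to determine another classification condition, which further divides that subgroup into a plurality of subgroups.
Next, the identification module 31 executes step S32 again to determine whether the newly divided subgroups meet the preset purity, and so on, until the purity of every subgroup meets the preset purity.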
Next, the identification module 31 (in step S102 of FIG. 5) sets, for each leaf node of the decision tree (i.e., each subgroup whose purity meets the preset purity), all of the classification conditions leading to that leaf node as the identification feature of the corresponding traffic category.
In this way, the present invention can effectively and accurately classify both the sample values and values outside the sample values, and generate the identification features of the attack identification data model.
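Steps S30 to S33 can be sketched as a small recursive procedure. The Python below is a generic CART-style illustration under stated assumptions, not the patented implementation: the names `gini`, `best_split`, and `build_features`, the midpoint thresholds, and the representation of an identification feature as a tuple of (field index, threshold, condition met) tests are all chosen for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subgroup)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """S30: choose the (field, threshold) with the lowest weighted impurity."""
    best = None
    for field in range(len(rows[0])):
        values = sorted({row[field] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # midpoint between adjacent values
            left = [lab for row, lab in zip(rows, labels) if row[field] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[field] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, field, threshold)
    return best  # None when no field can separate the samples

def build_features(rows, labels, max_impurity=0.0, path=()):
    """S31-S33: split until every subgroup meets the preset purity.

    Returns one (conditions, label) pair per leaf node, where conditions is
    the tuple of tests on the branch leading to that leaf.
    """
    split = best_split(rows, labels) if gini(labels) > max_impurity else None
    if split is None:
        # Leaf: pure enough (S32), or the samples cannot be separated further.
        return [(path, Counter(labels).most_common(1)[0][0])]
    _, field, threshold = split
    leaves = []
    for met in (True, False):  # S33: recurse into both new subgroups
        idx = [i for i, row in enumerate(rows) if (row[field] <= threshold) == met]
        leaves += build_features([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_impurity, path + ((field, threshold, met),))
    return leaves

# Toy sample traffic with three fields; only field 2 varies.
rows = [(0, 0, v) for v in (1, 2, 3, 4, 5, 6)]
labels = ["blacklist", "blacklist", "whitelist", "whitelist", "blacklist", "blacklist"]
for conditions, label in build_features(rows, labels):
    print(label, conditions)
```

Each returned pair plays the role of one traffic category and its identification feature. The split order chosen by this sketch depends on tie-breaking and need not match the order drawn in the figures of the patent.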
Please refer to FIG. 11 and FIG. 12. FIG. 11 is a schematic diagram of the execution of a decision tree algorithm based on a single field according to an implementation of the present invention, and FIG. 12 is a schematic diagram of the execution of a decision tree algorithm based on multiple fields according to an implementation of the present invention. FIG. 11 and FIG. 12 illustrate the aforementioned decision tree algorithm by way of example.
In the examples of FIG. 11 and FIG. 12, the decision tree algorithm is the Classification and Regression Tree (CART) algorithm, and purity is measured by the Gini index. X[n] denotes the value of field [n] of the traffic. gini is the impurity; a value of 0.0 (the preset purity) indicates that all sample values are correctly classified. value[a,b] indicates that, of a+b sample values, a are whitelist sample values (of the traffic field) and b are blacklist sample values (of the traffic field). The whitelist sample values and/or blacklist sample values can be obtained from the sample traffic or through the aforementioned reverse analysis processing.
As shown in FIG. 11, this example takes 1256 sample values as input (1000 whitelist sample values and 256 blacklist sample values). First, at node 80 (the root node), the samples are classified with classification condition (1), "field X[2] <= 4.5", which yields two subgroups: node 81, where classification condition (1) is met, and node 82, where it is not.
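The subgroup of node 82 contains 253 sample values, all of which are blacklist sample values, so the gini of node 82 is 0 and this subgroup is correctly classified (i.e., node 82 is a leaf node).
The subgroup of node 81 contains 1003 sample values (1000 whitelist sample values and 3 blacklist sample values). The gini of node 81 is 0.006, so this subgroup is not yet correctly classified.
The decision tree algorithm therefore classifies node 81 with classification condition (2), "field X[2] <= 2.5", which yields two subgroups: node 83, where classification condition (2) is met, and node 84, where it is not.
The subgroup of node 83 contains 3 sample values, all of which are blacklist sample values, so the gini of node 83 is 0 and this subgroup is correctly classified (i.e., node 83 is a leaf node).
The subgroup of node 84 contains 1000 sample values, all of which are whitelist sample values, so the gini of node 84 is 0 and this subgroup is correctly classified (i.e., node 84 is a leaf node).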
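Since the purity of every subgroup meets the preset purity, the classification is complete. This classification yields three traffic categories, namely nodes 82-84. The identification feature of node 82, which belongs to the blacklist, is: classification condition (1) is not met. The identification feature of node 83, which belongs to the blacklist, is: classification condition (1) is met and classification condition (2) is met. The identification feature of node 84, which belongs to the whitelist, is: classification condition (1) is met and classification condition (2) is not met.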
In this way, the present invention can define multiple traffic categories and compute the identification features of every traffic category.
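The FIG. 11 example can be checked with a few lines of Python. The sketch below recomputes the Gini impurity reported for node 81 and encodes the three identification features as a function; the function names are hypothetical, while the thresholds, node numbers, and sample counts are the ones given in the description above.

```python
def gini_from_counts(white, black):
    """Gini impurity for a node holding `white` whitelist and `black` blacklist samples."""
    total = white + black
    return 1.0 - (white / total) ** 2 - (black / total) ** 2

def classify_fig11(x2):
    """Return (leaf node, list) for a value of field X[2], following FIG. 11."""
    if not x2 <= 4.5:        # classification condition (1) not met
        return 82, "blacklist"
    if x2 <= 2.5:            # conditions (1) and (2) both met
        return 83, "blacklist"
    return 84, "whitelist"   # condition (1) met, condition (2) not met

print(round(gini_from_counts(1000, 3), 3))  # node 81
print(classify_fig11(3))
```

The two thresholds together carve out the interval 2.5 < X[2] <= 4.5 as the only whitelisted range, so values that never appeared in the samples are still assigned to a category.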
It is worth noting that although the example of FIG. 11 classifies the sample values of only a single field, the present invention is not limited to this.
The user may select multiple fields as needed when executing the aforementioned decision tree algorithm. This improves the accuracy of subsequent classification and addresses the problem of being unable to classify accurately when there is too little sample traffic.
For example, the example of FIG. 12 takes 2280 sample values as input (1000 whitelist sample values and 1280 blacklist sample values). First, at node 90 (the root node), the samples are classified with classification condition (1), "field X[2] <= 4.5", which yields two subgroups: node 91, where classification condition (1) is met, and node 92, where it is not.
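The subgroup of node 92 contains 915 sample values (all blacklist sample values), so its gini is 0.
The subgroup of node 91 contains 1365 sample values (1000 whitelist sample values and 365 blacklist sample values), and the gini of node 91 is 0.392 (not yet correctly classified). The decision tree algorithm therefore classifies node 91 with classification condition (2), "field X[3] <= 2.5" (the choice of field and threshold can be computed by machine learning), which yields two subgroups: node 93, where classification condition (2) is met, and node 94, where it is not.
The subgroup of node 94 contains 262 sample values (all blacklist sample values), so its gini is 0.
The subgroup of node 93 contains 1103 sample values (1000 whitelist sample values and 103 blacklist sample values), and the gini of node 93 is 0.169 (not yet correctly classified). The decision tree algorithm therefore classifies node 93 with classification condition (3), "field X[0] <= 32774.0", which yields two subgroups: node 95, where classification condition (3) is met, and node 96, where it is not.
The subgroup of node 96 contains 100 sample values (all blacklist sample values), so its gini is 0.
The subgroup of node 95 contains 1003 sample values (1000 whitelist sample values and 3 blacklist sample values), and the gini of node 95 is 0.006 (not yet correctly classified). The decision tree algorithm therefore classifies node 95 with classification condition (4), "field X[2] <= 2.5", which yields two subgroups: node 97, where classification condition (4) is met, and node 98, where it is not.
The subgroup of node 97 contains 3 sample values (all blacklist sample values), so its gini is 0.
The subgroup of node 98 contains 1000 sample values (all whitelist sample values), so its gini is 0.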
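Since the purity of every subgroup meets the preset purity, the classification is complete. This classification yields five traffic categories, namely nodes 92, 94, and 96-98. The identification feature of node 92, which belongs to the blacklist, is: classification condition (1) is not met. The identification feature of node 94, which belongs to the blacklist, is: classification condition (1) is met and classification condition (2) is not met. The identification feature of node 96, which belongs to the blacklist, is: classification conditions (1) and (2) are met and classification condition (3) is not met. The identification feature of node 97, which belongs to the blacklist, is: classification conditions (1)-(4) are all met. The identification feature of node 98, which belongs to the whitelist, is: classification conditions (1)-(3) are met and classification condition (4) is not met.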
In this way, the present invention can define traffic categories that are associated with multiple fields, which effectively improves classification accuracy.
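One way to picture how such multi-field identification features could be packaged and applied is a lookup table keyed by leaf node. The Python sketch below transcribes the conditions of the FIG. 12 example; the `FEATURES` layout and the `classify` helper are assumptions made for illustration, and the patent does not specify a storage format for the data model.

```python
# Each leaf node maps to (list, tests), where every test is
# (field index n of X[n], threshold, whether "X[n] <= threshold" must hold).
FEATURES = {
    92: ("blacklist", [(2, 4.5, False)]),
    94: ("blacklist", [(2, 4.5, True), (3, 2.5, False)]),
    96: ("blacklist", [(2, 4.5, True), (3, 2.5, True), (0, 32774.0, False)]),
    97: ("blacklist", [(2, 4.5, True), (3, 2.5, True), (0, 32774.0, True),
                       (2, 2.5, True)]),
    98: ("whitelist", [(2, 4.5, True), (3, 2.5, True), (0, 32774.0, True),
                       (2, 2.5, False)]),
}

def classify(x):
    """Return (leaf node, list) for a traffic record x, indexed like X[n]."""
    for node, (verdict, tests) in FEATURES.items():
        if all((x[field] <= threshold) == met for field, threshold, met in tests):
            return node, verdict
    raise ValueError("record matches no traffic category")

# Hypothetical record: X[0]=100, X[1]=0, X[2]=3, X[3]=1.
print(classify([100, 0, 3, 1]))
```

Because the leaves of a decision tree partition the input space, exactly one entry matches any record with comparable numeric fields.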
Please refer to FIG. 3 and FIG. 8 together. FIG. 8 is a flowchart of the method of building and applying an attack identification data model according to the fourth embodiment of the present invention. In the embodiment of FIG. 8, the generated attack identification data model is used in an intrusion detection system (IDS): the system only identifies whether unknown traffic belongs to the whitelist or the blacklist, and does not block the transmission of unknown traffic even if it belongs to the blacklist.
Specifically, the method of building and applying an attack identification data model of this embodiment includes the following identification steps.
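Step S400: The identification module 31 switches to identification mode in response to a user operation or automatic control, in preparation for attack detection.
Step S401: The identification module 31 loads the attack identification data model.
Step S402: The relay device 30 determines whether any traffic has been received. If the relay device 30 has not received any traffic, step S402 is executed again to continue detection.
If the relay device 30 receives traffic, step S403 is executed: the relay device 30 creates a copy of the received traffic and transmits the copy to the identification module 31 as unknown traffic.
Step S404: The relay device 30 forwards the traffic, according to its destination field, to the indicated control device 20 or controlled device 21.
Step S405: The identification module 31 receives the unknown traffic from the relay device 30.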
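It is worth noting that the relay device 30 may send the copy of the received traffic to the identification module 31 in real time, may accumulate a fixed amount of traffic and send it to the identification module 31 in one batch, or may send it to the identification module 31 at regular intervals; no limitation is imposed.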
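Step S406: The identification module 31 classifies the received unknown traffic based on the attack identification data model to determine the traffic category of the unknown traffic.
Step S407: The identification module 31 determines whether the unknown traffic belongs to a whitelist traffic category or a blacklist traffic category. If the unknown traffic belongs to the whitelist, step S409 is executed.
If the unknown traffic belongs to the blacklist, step S408 is executed: the identification module 31 issues an alert through the human-machine interface 402 to notify the user, and/or creates a record and stores it in the storage device 401 for the user to review later or to serve as sample traffic the next time the attack identification data model is trained.
Step S409: Determine whether to end traffic identification. If the identification module 31 determines that an end condition is met, traffic identification is terminated. Otherwise, step S402 is executed again to continue traffic identification.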
In this way, the present invention can effectively implement intrusion detection while reducing the load on the relay device 30.
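The detection-only behavior of steps S402 to S408 can be sketched as follows. This Python is a simplified, single-threaded illustration; `run_ids` and its callback parameters (`classify`, `forward`, `alert`) are hypothetical names standing in for the relay device, the identification module, and the human-machine interface, and the packet dictionaries are invented sample data.

```python
import copy

def run_ids(incoming, classify, forward, alert):
    """Handle each received packet the way steps S402-S408 describe."""
    for packet in incoming:                    # S402: traffic has been received
        unknown = copy.deepcopy(packet)        # S403: copy for the identification module
        forward(packet)                        # S404: the original always reaches its destination
        if classify(unknown) == "blacklist":   # S406-S407: classify the copy
            alert(unknown)                     # S408: warn or log only, never block

# Hypothetical usage with a toy classifier on a "func" (function code) field.
forwarded, alerts = [], []
run_ids(
    [{"dst": "controlled-device-21", "func": 3},
     {"dst": "controlled-device-21", "func": 9}],
    classify=lambda p: "whitelist" if p["func"] in (3, 4) else "blacklist",
    forward=forwarded.append,
    alert=alerts.append,
)
print(len(forwarded), len(alerts))
```

Forwarding happens before classification and independently of its result, which is the property that distinguishes this embodiment from the intrusion prevention embodiment.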
Please refer to FIG. 3 and FIG. 9 together. FIG. 9 is a flowchart of the method of building and applying an attack identification data model according to the fifth embodiment of the present invention. In the embodiment of FIG. 9, the generated attack identification data model is used in an intrusion prevention system (IPS): the system identifies in real time whether unknown traffic belongs to the whitelist or the blacklist, and handles the traffic immediately when it belongs to the blacklist. The following description takes intrusion prevention performed by the identification module 300 of the relay device 30 as an example, but the invention is not limited to this; intrusion prevention may instead be performed by the identification module 200, 210, or 31.
Specifically, the method of building and applying an attack identification data model of this embodiment includes the following identification steps.
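Step S500: The identification module 300 switches to identification mode in response to a user operation or automatic control, in preparation for attack prevention.
Step S501: The identification module 300 loads the attack identification data model.
Step S502: The relay device 30 determines whether any traffic has been received. If the relay device 30 has not received any traffic, step S502 is executed again to continue detection.
If the relay device 30 receives traffic, step S503 is executed: the traffic is transmitted to the identification module 300 as unknown traffic.
Step S504: The identification module 300 receives the unknown traffic from the relay device 30.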
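Step S505: The identification module 300 classifies the received unknown traffic based on the attack identification data model to determine the traffic category of the unknown traffic.
Step S506: The identification module 300 determines whether the unknown traffic belongs to a whitelist traffic category or a blacklist traffic category.
If the unknown traffic belongs to the blacklist, step S507 is executed: the identification module 300 blocks the transmission of the unknown traffic, i.e., the unknown traffic is not transmitted to its destination. This prevents the attack from reaching the destination device.
If the unknown traffic belongs to the whitelist, step S508 is executed: the identification module 300 forwards the unknown traffic to the control device 20 or controlled device 21 indicated by its destination field.
Step S509: Determine whether to end traffic identification. If the identification module 300 determines that an end condition is met, traffic identification is terminated. Otherwise, step S502 is executed again to continue traffic identification.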
In this way, the present invention can effectively implement intrusion prevention.
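Steps S502 to S508 differ from the detection-only flow in that classification sits inline, before forwarding. A minimal Python sketch under the same kind of assumptions (`run_ips`, `classify`, and `forward` are hypothetical names, and the packet dictionaries are invented sample data):

```python
def run_ips(incoming, classify, forward):
    """Forward whitelisted packets and drop blacklisted ones (steps S502-S508).

    Returns the list of blocked packets so the caller can log them.
    """
    blocked = []
    for packet in incoming:                   # S502-S504: traffic handed to the module
        if classify(packet) == "blacklist":   # S505-S506: classify before forwarding
            blocked.append(packet)            # S507: never sent to the destination
        else:
            forward(packet)                   # S508: forward to the destination device
    return blocked

# Hypothetical usage with a toy classifier on a "func" (function code) field.
delivered = []
dropped = run_ips(
    [{"dst": "controlled-device-21", "func": 4},
     {"dst": "controlled-device-21", "func": 9}],
    classify=lambda p: "whitelist" if p["func"] in (3, 4) else "blacklist",
    forward=delivered.append,
)
print(len(delivered), len(dropped))
```

Because every packet waits for the classification result, classification latency adds directly to forwarding latency in this arrangement.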
Please refer to FIG. 13, which is a schematic diagram of multiple fields of multiple pieces of unknown traffic according to an implementation of the present invention. FIG. 13 illustrates by example how the present invention improves on the prior art.
FIG. 13 shows the field data of 21 pieces of unknown traffic (traffic 1-21) and the identification results produced by the present invention. Traffic 1-6 was used beforehand as whitelist sample traffic to train the attack classification model. After identification, traffic 1-11 belongs to whitelist traffic categories 0-4, and traffic 12-21 belongs to blacklist traffic categories 5-14.
In the example of FIG. 13, the attack identification data model is generated from three fields, namely the length field, the function code field, and the forwarding rate field, and is used for attack detection. The whitelist sample values of the length field are 11 and 12; the whitelist sample values of the function code field are 3 and 4; the whitelist sample values of the forwarding rate field are 1 and 2.
During identification, since every field value of traffic 1-10 matches a whitelist sample value, the traffic categories 0-3 to which they belong are determined to be whitelisted.
Although the length field of traffic 11 (value 13) is not among the whitelist sample values, most of its features still match the whitelist and the value falls within the empirically acceptable range, so the trained attack identification data model determines traffic category 4, to which it belongs, to be whitelisted.
The length field of traffic 12 (value 16) does not match the whitelist sample values and clearly exceeds the empirically acceptable range, so the trained attack identification data model determines traffic category 5, to which it belongs, to be blacklisted.
Although all fields of traffic 13 match the whitelist sample values, the combination of its function code field (value 3) and forwarding rate field (value 0) is one that is empirically rare or does not occur, so the trained attack identification data model determines traffic category 6, to which it belongs, to be blacklisted.
The function code fields of traffic 14-21 (values 2 and 5-11, respectively) do not match the whitelist sample values and clearly exceed the empirically acceptable range, so the trained attack identification data model determines traffic categories 7-14, to which they belong, to be blacklisted.
Therefore, because the present invention can make determinations about values outside the whitelist sample values and blacklist sample values, it can further improve the accuracy of attack detection.
The above are merely preferred specific embodiments of the present invention and do not limit the patent scope of the present invention. All equivalent changes made using the content of the present invention are therefore likewise included within the scope of the present invention.
S100-S103: training steps
S104-S108: first identification steps
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109100150A TWI724734B (en) | 2020-01-03 | 2020-01-03 | Method of building and applying an attack identification data model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109100150A TWI724734B (en) | 2020-01-03 | 2020-01-03 | Method of building and applying an attack identification data model |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI724734B true TWI724734B (en) | 2021-04-11 |
TW202127837A TW202127837A (en) | 2021-07-16 |
Family
ID=76604940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109100150A TWI724734B (en) | 2020-01-03 | 2020-01-03 | Method of building and applying an attack identification data model |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI724734B (en) |