[go: up one dir, main page]

TWI509441B - Can flexibly set the data width of the multi-character string alignment device - Google Patents

Can flexibly set the data width of the multi-character string alignment device Download PDF

Info

Publication number
TWI509441B
TWI509441B TW103143825A TW103143825A TWI509441B TW I509441 B TWI509441 B TW I509441B TW 103143825 A TW103143825 A TW 103143825A TW 103143825 A TW103143825 A TW 103143825A TW I509441 B TWI509441 B TW I509441B
Authority
TW
Taiwan
Prior art keywords
output
state
rule
input
character
Prior art date
Application number
TW103143825A
Other languages
Chinese (zh)
Other versions
TW201624315A (en
Inventor
Chien Chi Chen
Sheng De Wang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed filed Critical
Priority to TW103143825A priority Critical patent/TWI509441B/en
Application granted granted Critical
Publication of TWI509441B publication Critical patent/TWI509441B/en
Publication of TW201624315A publication Critical patent/TW201624315A/en

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

可彈性設定資料寬度之多字元字串比對裝置Multi-character string comparison device capable of flexibly setting data width

本發明係有關於一種字串比對裝置,特別是有關於一種可彈性設定資料寬度之多字元字串比對裝置。The present invention relates to a string comparison device, and more particularly to a multi-word string alignment device that can flexibly set a data width.

由Alfred V.Aho及Margaret J.Corasick所提出的字串比對演算法,一般稱為AC演算法,是一個有效率的完全字串比對(exact string matching)的方法,只要經由一次搜尋即可於一字串中找出所有的關鍵字(keyword)。其中一個重要的應用就是網路入侵偵測系統(Network Intrusion Detection System,簡稱NIDS),例如一般所熟知的SNORT。The string alignment algorithm proposed by Alfred V. Aho and Margaret J. Corasick, commonly referred to as the AC algorithm, is an efficient method of exact string matching, as long as it is searched through a single search. All keywords (keywords) can be found in a string. One of the most important applications is the Network Intrusion Detection System (NIDS), such as the well-known SNORT.

依據AC演算法所建立的搜尋樹(search tree)稱為AC-trie(AC搜尋樹)。請參照圖1,其繪示依關鍵字集{enhappy,happy,happen,happygo}而建立之一AC-trie。在圖1中,包含數字的圓圈代表狀態(state);雙層圓圈的狀態為輸出狀態-表示當轉移(transit)到該狀態時,有符合的輸出字串,例如狀態7代表符合的輸出字串"enhappy happy";實線為前進函式(goto function),虛線為失敗函式(failure function),前進函式和失敗函式的功用將於以下說明。事實上,除了初始狀態以外,每個狀態都有一失敗函式以指到初始狀態或另一個狀態。為了簡潔,圖1中未摽示出指到初始狀態的失敗函式(例如,狀態1的失敗函式)。當一個狀態的失敗函式不是指到初始 狀態,而是指到另一個狀態,表示該狀態所代表的字串包含了所述另一個狀態所代表的字串。例如,狀態7的失敗函式指到狀態12,其中狀態7所代表的字串包含了狀態12所代表的字串(狀態7所代表的字串為"enhappy",狀態12所代表的字串為"happy")。The search tree established according to the AC algorithm is called AC-trie (AC search tree). Please refer to FIG. 1 , which illustrates one AC-trie established according to the keyword set {enhappy, happy, happen, happygo}. In Fig. 1, a circle containing numbers represents a state; a state of a double-layered circle is an output state - indicating that when a transition is made to the state, there is a matching output string, for example, state 7 represents a matching output word. The string "enhappy happy"; the solid line is the goto function, the dotted line is the failure function, and the functions of the forward function and the failure function will be explained below. In fact, in addition to the initial state, each state has a failure function to point to the initial state or another state. For the sake of brevity, the failure function to the initial state (for example, the failure function of state 1) is not shown in FIG. When a state's failure function does not refer to the initial State, but refers to another state, indicating that the string represented by the state contains the string represented by the other state. For example, the failure function of state 7 refers to state 12, where the string represented by state 7 contains the string represented by state 12 (the string represented by state 7 is "enhappy", the string represented by state 12 Is "happy").

依圖1之AC-trie進行字串比對時,先將當前狀態(current state)設為初始狀態0。然後,對於輸入的字串,一次比對一個字元。針對輸入的字元,從對應當前狀態的至少一前進函式中找到一符合的前進函式以決定次狀態(next state);假如無法找到符合的前進函式,則依據失敗函式,將失敗函式所指的狀態指定為當前狀態,再繼續尋找符合的前進函式。透過失敗函式,最終將指到初始狀態。由於初始狀態的前進函式函蓋所有的字元,所以透過前進函式及失敗函式,在每次輸入一個字元後,保證一定能夠決定次狀態。從接受一個輸入字元開始,到透過前進函式及失敗函式決定次狀態為止,算一個比對週期(matching cycle)。比對完一個字元,決定次狀態後,再將次狀態指定為當前狀態,然後接受輸入下一個字元以進入下一個比對週期。依據前述比對過程的說明,建立AC-trie後,透過前進函式及失敗函式進行字串比對,只要一階段的搜尋(one pass search)即可找到所有符合關鍵字的字串,因此其搜尋時間與輸入的字串長度成線性關係,即O(n)。When the string comparison is performed according to the AC-trie of FIG. 1, the current state (current state) is first set to the initial state 0. Then, for the input string, one character is compared at a time. For the input character, find a matching forward function from at least one forward function corresponding to the current state to determine the next state; if the forward function cannot be found, it will fail according to the failure function. The state indicated by the function is specified as the current state, and then continue to find the forward function that matches. Through the failure function, it will eventually point to the initial state. Since the forward function of the initial state covers all the characters, the advance function and the failure function ensure that the secondary state can be determined after each input of one character. A matching cycle is calculated from the acceptance of an input character to the determination of the secondary state by the advance function and the failure function. After comparing one character, after determining the secondary state, the secondary state is designated as the current state, and then accepting the next character to enter the next comparison cycle. According to the description of the foregoing comparison process, after AC-trie is established, the string comparison is performed through the forward function and the failure function, and all the keywords matching the keyword can be found by one pass search, so Its search time is linear with the length of the input string, ie O(n).

請參照圖2(a),其繪示利用圖1之AC-trie進行一字串比對程序之一實例。如圖2(a)所示,該字串比對程序在進行至輸出狀態14及輸出狀態7時,會分別經由一失敗函式轉移到狀態2及狀態12,再進行後續的字元比對。圖2(b)係與圖2(a)對應的NFA(non-deterministic finite automata;非決定性有 限狀態機)字串比對方式,其允許多個狀態同時動作(active),其中,狀態0永遠在動作中,亦即,只要在狀態0偵測到符合關鍵字元e或h的輸入字元,NFA即開啟一字串比對程序。在圖2(b)中,第一字串比對程序(用以比對"happen")和第二字串比對程序(用以比對"enhappy")在比對en時係同時進行;第二字串比對程序和第三字串比對程序(用以比對"happygo")在比對"happy"時係同時進行。Referring to FIG. 2(a), an example of a string comparison procedure using the AC-trie of FIG. 1 is illustrated. As shown in FIG. 2(a), when the string comparison program proceeds to the output state 14 and the output state 7, it will respectively transfer to the state 2 and the state 12 via a failure function, and then perform subsequent character comparison. . Figure 2(b) is an NFA (non-deterministic finite automata corresponding to Figure 2(a); Limit state machine) string comparison mode, which allows multiple states to be active at the same time, wherein state 0 is always in action, that is, as long as the input word matching the key element e or h is detected in state 0 Yuan, NFA opens a string comparison program. In FIG. 2(b), the first string comparison program (for comparing "happen") and the second string comparison program (for comparing "enhappy") are performed simultaneously when comparing en; The second string comparison program and the third string comparison program (for comparing "happygo") are performed simultaneously when comparing "happy".

由上述的說明可知,AC-trie在形式上類似於 DFA(deterministic finite automata;決定性有限狀態機),因為其在一個時間內,只能一個狀態是動作的(active)。但對比於NFA的字串比對方式,可以看出,在每一個比對週期,被作用中之一狀態的失敗函式所指到的另一狀態,在NFA中會同時動作(active)。因為所有的非初始狀態,透過失敗函式最終都會連結到初始狀態,因此初始狀態永遠都是動作的。因此,AC-trie透過失敗函式的作用,可以達到類似於NFA的效果。DFA形式的AC-trie只維持一個狀態是動作的,有利於軟體的建構,因為在程式中,是循序(sequential)的執行程式碼;但是因為每檢視一個字元,狀態的移轉可能超過一次,不利於硬體的建構。As can be seen from the above description, AC-trie is similar in form. DFA (deterministic finite automata) because it can only have one state active at a time. However, compared with the NFA string alignment method, it can be seen that in each comparison period, another state pointed out by the failure function of one of the active states is active in the NFA. Because of all non-initial states, the failure function will eventually link to the initial state, so the initial state will always be active. Therefore, AC-trie can achieve an effect similar to NFA through the function of the failure function. The DFA-form AC-trie only maintains a state that is action-oriented, which is beneficial to the construction of the software, because in the program, it is a sequential execution code; but because each character is viewed, the state may be transferred more than once. Not conducive to the construction of hardware.

假若我們允許AC-trie中的多個狀態同時動作,則其運作方式 即是NFA,不需要考慮失敗函式,此將有利於硬體的實作。另外,因為AC-trie中的每一個狀態代表一個唯一的字串。我們以每一狀態與初始狀態之間的距離為其深度。在每一個時間內,同一深度的狀態中,一定只有一個狀態是動作的。因為同一深度的所有狀態代表長度相同但內容不同的字串,假如在同一個深度有超過一個的狀態同時動作,則與AC-trie定義矛盾,故不 可能發生此種情形。假如,我們將在相同深度的狀態歸於同一層級(level),那麼,在硬體實作時,每一層級只需要一個保留狀態的暫存器(register)。例如,在一組關鍵字集合(keyword set)中,最長的字串為q,則我們最多需要q個暫存器來保留每的層級的狀態。If we allow multiple states in AC-trie to act simultaneously, how they operate That is the NFA, there is no need to consider the failure function, which will be beneficial to the implementation of the hardware. In addition, because each state in AC-trie represents a unique string. We take the distance between each state and the initial state as its depth. At each time, in the same depth state, there must be only one state that is active. Because all states of the same depth represent strings of the same length but different contents, if there is more than one state at the same depth, the contradiction with the AC-trie definition is not This may happen. If we assign the same level of state to the same level, then at the hardware implementation, each level requires only one reserved state register. For example, in a set of keyword sets, the longest string is q, then we need at most q scratchpads to preserve the state of each level.

如上所述,使用AC-trie來進行搜尋,適用於以字元導向 (character-oriented)的方式來處理資料的應用,即以字元為基礎來決定所處理的資訊。亦即,在使用AC-trie來進行字串比對時,只能一個字元接著一個字元的檢視,無法一次同時檢視多個字元。同樣的,習知的基於AC-trie所建構的硬體架構在一個時脈週期只能比對一個字元,其最高的比對字元數將受限於硬體的時脈速率。As described above, use AC-trie for searching, suitable for character-oriented A (character-oriented) approach to the processing of data, that is, based on the character to determine the information processed. That is, when AC-trie is used for string comparison, only one character and one character can be viewed, and multiple characters cannot be viewed at the same time. Similarly, the conventional hardware architecture based on AC-trie can only match one character in one clock cycle, and the highest number of aligned characters will be limited by the clock rate of the hardware.

另外,半導體技術持續的進步,可以很容易的依據需求,自 行設計開發所需的硬體架構,而且在相同的面積可以放入更多的電路。但是,要提昇電路的運作速度卻是比較不容易的,而且高速運作的電路也會消耗更多的功率,因此,以一般通用的處理器(CPU)為例,其解決之道是在一個晶片中放入多個核心,藉由平行處理的方式提昇效能。相同的,字串比對的硬體裝置若是能夠在一個比對週期同時檢視多個字元,其效能必然會有倍數的成長。In addition, the continuous advancement of semiconductor technology can be easily based on demand. The hardware architecture required for line design development, and more circuits can be placed in the same area. However, it is not easy to speed up the operation of the circuit, and the circuit that operates at high speed consumes more power. Therefore, taking a general-purpose processor (CPU) as an example, the solution is in a single chip. Put multiple cores in it and improve performance by parallel processing. In the same way, if the hardware device of the string comparison can view multiple characters at the same time in one comparison cycle, the performance will inevitably increase by a multiple.

本發明之主要目的在於提出一種可彈性設定資料寬度之多字元字串比對裝置,其彈性設定在一個比對週期內要比對的字元數目以充份發揮硬體的效能。The main object of the present invention is to provide a multi-character string comparison device capable of flexibly setting a data width, which is elastically set to match the number of characters in a comparison period to fully exert the performance of the hardware.

為達到上述之目的,一可彈性設定資料寬度之多字元字串比 對裝置乃被提出,其具有:一規則電路,其具有複數個通用規則單元,各所述通用規則單元均具有至少一字串輸入端、複數個次態輸入端、複數個次態輸出端、複數個比對輸出端、一第一併接輸入端、一第二併接輸入端、一第一併接輸出端、以及一第二併接輸出端,且各所述通用規則單元各具有一轉移規則執行單元以及一邏輯電路,該轉移規則執行單元係依所述至少一字串輸入端及所述複數個次態輸入端之輸入資料執行以一AC-trie為基礎所建立之一轉移規則而產生一比對結果及在所述複數個次態輸出端及所述複數個比對輸出端輸出資料,該轉移規則執行單元具有一暫存單元以儲存一起始旗標及一最終旗標,該邏輯電路係依該第一併接輸入端之輸入信號、該第二併接輸入端之輸入信號、該起始旗標、該最終旗標、以及該比對結果進行一邏輯運算以在該第一併接輸出端和該第二併接輸出端分別產生一輸出信號,其中,所述複數個通用規則單元係分成複數個子群,且各所述子群內之複數個所述通用規則單元係藉由所述第一併接輸入端、所述第一併接輸出端、所述第二併接輸入端、和所述第二併接輸出端互相連接;一狀態電路,具有複數個輸入端以耦合所述複數個通用規則單元之所述複數個次態輸出端,及複數個輸出端以耦合所述複數個通用規則單元之所述複數個次態輸入端;以及一輸出電路,具有複數個輸入端以耦合所述複數個通用規則單元之所述複數個比對輸出端,及複數個輸出端以產生複數個比對輸出信號。To achieve the above purpose, a multi-character string ratio that can flexibly set the data width The device is proposed to have: a regular circuit having a plurality of general rule units, each of the general rule units having at least one string input terminal, a plurality of secondary state input terminals, and a plurality of secondary state output terminals, a plurality of comparison output terminals, a first parallel connection input terminal, a second parallel connection input terminal, a first parallel connection output terminal, and a second parallel output terminal, and each of the common rule units has one a transfer rule execution unit and a logic circuit, the transfer rule execution unit performing a transfer rule based on the AC-trie based on the input data of the at least one string input end and the plurality of secondary state input ends And generating a comparison result and outputting data at the plurality of secondary output terminals and the plurality of comparison output terminals, the transfer rule execution unit having a temporary storage unit for storing a start flag and a final flag. The logic circuit performs a logic operation according to the input signal of the first parallel input, the input signal of the second parallel input, the start flag, the final flag, and the comparison result. The first parallel output terminal and the second parallel output terminal respectively generate an output signal, wherein the plurality of general rule units are divided into a plurality of subgroups, and the plurality of the general rule units in each of the subgroups The first parallel connection input terminal, the first parallel connection output terminal, the second parallel connection input terminal, and the second parallel connection output terminal are connected to each other; a state circuit having a plurality of inputs a plurality of secondary output terminals coupled to the plurality of general rule units, and a plurality of output terminals for coupling the plurality of secondary input terminals of the plurality of general rule units; and an output circuit having A plurality of inputs are coupled to the plurality of aligned outputs of the plurality of general rule units, and the plurality of outputs to generate a plurality of aligned output signals.

在一實施例中,所述的邏輯運算為: CO=/L AND M AND(F OR(CI AND/F)),以及EO=/F AND((EI AND/L)OR(CI AND M AND L)),其中CO代表所述第一併接輸出端的輸出信號,EO代表所述第二併接輸出端的輸出信號,CI代表所述第一併接輸入端的輸入信號,EI代表所述第二併接輸入端的輸入信號,F代表所述起始旗標,L代表所述最終旗標,M代表所述比對結果,/代表反相運算子,AND代表邏輯且運算子,OR代表邏輯或運算子。In an embodiment, the logical operation is: CO=/L AND M AND(F OR(CI AND/F)), and EO=/F AND((EI AND/L)OR(CI AND M AND L)), where CO represents the first parallel Output signal at the output, EO represents the output signal of the second parallel output, CI represents the input signal of the first parallel input, EI represents the input signal of the second parallel input, and F represents the start Flag, L stands for the final flag, M stands for the comparison result, / stands for the inverse operator, AND stands for logical and operator, and OR stands for logical OR operator.

為使 貴審查委員能進一步瞭解本發明之結構、特徵及其目的,茲附以圖式及較佳具體實施例之詳細說明如后。The detailed description of the drawings and the preferred embodiments are set forth in the accompanying drawings.

1210‧‧‧規則電路1210‧‧‧Regular Circuit

1220‧‧‧狀態電路1220‧‧‧ State Circuit

1230‧‧‧輸出電路1230‧‧‧Output circuit

12101‧‧‧通用規則單元12101‧‧‧General Rule Unit

121011‧‧‧轉移規則執行單元121011‧‧‧Transfer rule execution unit

121012‧‧‧邏輯電路121012‧‧‧Logical Circuit

121011a‧‧‧處理單元121011a‧‧‧Processing unit

121011b、121011c‧‧‧多工器121011b, 121011c‧‧‧Multiplexer

121011d、121011e‧‧‧解多工器121011d, 121011e‧‧ ‧ multiplexer

圖1繪示依關鍵字集{enhappy,happy,happen,happygo}而建立之一AC-trie。Figure 1 illustrates one of the AC-tries created by the keyword set {enhappy, happy, happen, happygo}.

圖2(a)繪示利用圖1之AC-trie進行一字串比對程序之一實例。Figure 2(a) shows an example of a string alignment procedure using the AC-trie of Figure 1.

圖2(b)為與圖2(a)對應之一NFA(non-deterministic finite automata;非決定性有限狀態機)字串比對方式示意圖。2(b) is a schematic diagram showing a comparison manner of a NFA (non-deterministic finite automata) word string corresponding to FIG. 2(a).

圖3(a)為一多階層式平行多字元字串比對裝置之方塊圖。Figure 3 (a) is a block diagram of a multi-level parallel multi-word string alignment device.

圖3(b)-3(c)繪示本發明用以進行字串比對之一規則表。3(b)-3(c) illustrate a rule table for performing string alignment of the present invention.

圖4(a)-4(b)繪示依本發明之設計進行字串比對之一實例。4(a)-4(b) illustrate an example of string alignment in accordance with the design of the present invention.

圖5為本發明將圖1的AC-trie區分成兩部分以進行不同型態的轉移函式之示意圖。FIG. 5 is a schematic diagram showing the transfer function of the AC-trie of FIG. 1 divided into two parts for different types according to the present invention.

圖6繪示本發明之複數種輔助轉移函式。Figure 6 illustrates a plurality of auxiliary transfer functions of the present invention.

圖7-8繪示本發明之1字元轉移函式之一實施例。7-8 illustrate an embodiment of a 1-character transfer function of the present invention.

圖9-10繪示本發明之3字元轉移函式之一實施例。9-10 illustrate an embodiment of a 3-character transfer function of the present invention.

圖11繪示本發明之演算法之一實施例。Figure 11 illustrates an embodiment of an algorithm of the present invention.

圖12繪示本發明可彈性設定資料寬度之多字元字串比對裝置一較佳實施例之方塊圖。FIG. 12 is a block diagram showing a preferred embodiment of a multi-character string alignment device capable of flexibly setting a data width according to the present invention.

圖13繪示圖12所示一通用規則單元之一實施例的方塊圖。13 is a block diagram showing an embodiment of a general rule unit shown in FIG.

圖14繪示圖12所示一通用規則單元之一實施例的電路圖。14 is a circuit diagram of an embodiment of a general rule unit shown in FIG.

圖15為3個圖12所示之通用規則單元藉由各通用規則單元所設之第一併接輸入端、第一併接輸出端、第二併接輸入端、和第二併接輸出端互相連接而形成一子群的示意圖。15 is a first parallel input terminal, a first parallel output terminal, a second parallel input terminal, and a second parallel output terminal provided by the general rule unit shown in FIG. A schematic diagram of interconnecting to form a subgroup.

硬體架構的說明Description of the hardware architecture

為讓讀者瞭解本發明所提出的裝置的運作方式,先說明硬體架構,接著說明用於此硬體架構的轉移規則,然後再說明如何推導轉移規則。現在要以實例來說明轉移規則,且此說明的實例是以關鍵字集{enhappy,happy,happen,happygo}為基礎,依此關鍵字集所建立的AC-trie請參照圖1。In order to let the reader know how the device proposed by the present invention works, the hardware architecture will be explained first, followed by the transfer rules for the hardware architecture, and then how to derive the transfer rules. Now let's explain the transfer rule by example, and the example of this description is based on the keyword set {enhappy, happy, happen, happygo}. Please refer to Figure 1 for the AC-trie created by this keyword set.

首先,為了說明本發明的字串比對的法,我們將AC-trie的每個狀態依其深度歸類於各別的層級(level),例如初始狀態位於層級0,狀態1及8的深度為1則歸類於層級1,依此類推。First, in order to explain the string alignment method of the present invention, we classify each state of AC-trie according to its depth to a respective level, for example, the initial state is at level 0, the depth of states 1 and 8. 1 is classified as level 1, and so on.

為了讓讀者能夠瞭解本發明的作法,請參照圖3(a),其繪示具有7個階層單元之多階層式平行多字元字串比對裝置一實施例之方塊圖,其係用以平行比對3個字元。此實施例在一個比對週期內平行比對3個字元,因此前面6個階層單元被串接成3個管線(pipeline),每個管線包括2個 階層單元。其中第1~3個階層單元對應於層級0,階層單元STG(1)及STG(2)分別處理只符合樣版字元的最後兩個字元及最後一個字元的情形。階層單元STG(4)~STG(6)則對應於層級1~3,層級4及其之後的狀態則集中對應於階層單元STG(7)。前面6個階層單元的運作方式係等同於NFA。最後一個階層單元,即階層單元STG(7)的運作方式則是等同於DFA。另外還包括一個優先權多工器pmux(0),用以由階層單元STG(4)~STG(7)所輸出的NX決定送至階層單元STG(7)的CUR_ST。另外,優先權多工器pmux(1)~(3)用來決定比對輸出OP(1)~OP(3),用以由7個階層單元的比對輸出選擇最後的比對輸出。In order to enable the reader to understand the practice of the present invention, please refer to FIG. 3( a ), which is a block diagram of an embodiment of a multi-hierarchical parallel multi-word string alignment device having seven hierarchical units. Parallel to 3 characters. This embodiment parallels 3 characters in a comparison period, so the first 6 hierarchical units are serially connected into 3 pipelines, each of which includes 2 pipelines. Hierarchy unit. The first to third hierarchical units correspond to level 0, and the hierarchical units STG(1) and STG(2) respectively handle the case where only the last two characters and the last character of the pattern character are matched. The hierarchical units STG(4) to STG(6) correspond to the levels 1 to 3, and the levels 4 and subsequent states are collectively corresponding to the hierarchical units STG(7). The first six hierarchical units operate in the same way as the NFA. The last hierarchical unit, the hierarchical unit STG(7), operates in the same way as DFA. In addition, a priority multiplexer pmux(0) is included for determining the CUR_ST sent to the hierarchical unit STG(7) by the NX outputted by the hierarchical units STG(4)~STG(7). In addition, the priority multiplexer pmux(1)~(3) is used to determine the comparison output OP(1)~OP(3) for selecting the final comparison output from the comparison output of the seven hierarchical units.

此方塊圖的階層及管線只是用以說明本發明的架構,在一實際應用中,管線的數目則是由一個比對週期平行比對的字元數決定,而階層的數目則依應用需求決定。The hierarchy and pipelines of this block diagram are only used to illustrate the architecture of the present invention. In a practical application, the number of pipelines is determined by the number of characters in a parallel alignment, and the number of classes is determined by application requirements. .

轉移規則說明Transfer rule description

圖3(b)及3(c)繪示一規則表,其共有30條轉移規則,這些轉移規則係依前述關鍵字集所產生,分屬於7個階層單元。其中圖3(b)繪示第1至第19條規則,其為階層單元(7)和(6)的規則,3(c)繪示其他階層單元的規則。Figures 3(b) and 3(c) illustrate a rule table having a total of 30 transfer rules, which are generated according to the aforementioned keyword set and belong to 7 hierarchical units. FIG. 3(b) shows the first to the 19th rules, which are the rules of the hierarchical units (7) and (6), and 3(c) shows the rules of the other hierarchical units.

此規則表中的轉移規則,其排列順序係依據優先權由最高至最低。其中,較後面的階層單元的規則的優先權高於較前面的階層單元的規則,例如階層單元(7)的規則的優先權高於階層單元(1)至(6)的規則,故階層單元(7)的規則放在最前面。在同一階層單元中,則是具有完全比對的樣版字元的規則的優先順序較高,例如,考慮規則6及7,其為在相同的階層單元(7)中,因為規則6要完全(exact)比對輸入的3個字元,而規則7只比對前 兩個輸入字元,且規則6與規則7的樣版字元的前兩個字元同樣為"py",因此當規則6觸發時,規則7也一定會被觸發,但輸出的比對結果是由優先順序較高的規則6決定。規則6會同時決定對應於3個輸入字元的比對輸出以及次狀態,而規則7只決定對應於前2個輸入字元的比對輸出,但不影響次狀態及對應於第三個輸入字元的比對輸出。The transfer rules in this rule table are ranked in order of priority from highest to lowest. Wherein, the rule of the later hierarchical unit has higher priority than the rule of the preceding hierarchical unit, for example, the priority of the rule of the hierarchical unit (7) is higher than the rule of the hierarchical unit (1) to (6), so the hierarchical unit The rules of (7) are placed at the top. In the same hierarchical unit, the rules with the exact matching pattern characters have higher priority, for example, considering rules 6 and 7, which are in the same hierarchical unit (7), because the rule 6 is completely (exact) compares the input 3 characters, while rule 7 only compares before Two input characters, and the first two characters of rule 6 and rule 7 are also "py", so when rule 6 is triggered, rule 7 will also be triggered, but the output is compared. It is determined by Rule 6 with higher priority. Rule 6 will simultaneously determine the comparison output and the secondary state corresponding to the three input characters, while rule 7 only determines the comparison output corresponding to the first two input characters, but does not affect the secondary state and corresponds to the third input. The output of the characters is compared.

規則表的第一欄為規則號碼(rule no),規則號碼僅是為了方 便說明,並無實質作用,在規則單元中也不儲存規則號碼的資料。規則表的第二欄為階層號碼(stage no),代表此規則所屬的階層單元。依據前面所述的電路架構,階層號碼-用以表示該規則所屬的階層單元-並不需要儲存於暫存器中。但在後面將要說明的電路架構中,階層號碼必須儲存在規則單元中,因為其係用以控制相關的資料傳輸電路以選擇該規則單元的輸入資料及決定輸出資料的去處,從而達到前述多個階層單元的相同功能。The first column of the rule table is the rule number (rule no), and the rule number is only for the square. It is stated that there is no substantial effect and the data of the rule number is not stored in the rule unit. The second column of the rule table is the stage number, which represents the hierarchical unit to which this rule belongs. According to the circuit architecture described above, the hierarchical number - used to indicate the hierarchical unit to which the rule belongs - does not need to be stored in the scratchpad. However, in the circuit architecture to be described later, the hierarchical number must be stored in the rule unit because it is used to control the associated data transmission circuit to select the input data of the rule unit and determine the location of the output data, thereby achieving the foregoing plurality. The same function of the hierarchy unit.

規則表的其他欄位大致可分為樣版(pattern)資料及輸出資 料,樣版資料包括欄位PMASK、P_ST、及P_CHRS,其中欄位PMASK為樣版遮罩,欄位P_ST為當前狀態,欄位P_CHRS為樣版字元。輸出資料包括欄位OMASK、NX_ST、及OP1~OP3,欄位OMASK為輸出遮罩,欄位NX_ST為次狀態,欄位OP1~OP3為比對輸出。The other fields of the rules table can be roughly divided into pattern data and output resources. The sample data includes fields PMASK, P_ST, and P_CHRS, wherein the field PMASK is a pattern mask, the field P_ST is the current state, and the field P_CHRS is a pattern character. The output data includes fields OMASK, NX_ST, and OP1~OP3. The field OMASK is the output mask, the field NX_ST is the secondary state, and the fields OP1~OP3 are the comparison outputs.

樣版字元P_CHRS的數目對應於平行比對的字元數,例如, 在此實例中,一次比對3個字元,則每一轉移規則有3個樣版字元。樣版遮罩PMASK的位元依序對應於當前狀態P_ST及個別的樣版字元(pattern characters)P_CHRS,例如最高位元(Most Significant Bit,MSB),即PMASK的位元3(bit 3),對應於當前狀態P_ST,而次高位元,即PMASK的位元2(bit 2),則對應第一個樣版字元,最低位元(Least Significant Bit,LSB),即PMASK的位元0(bit 0),對應於最後一個樣版字元。然而,在此只是舉出一種樣版遮罩的資料表示方式,並不限定於此作法,熟習此技藝者,可以很容易的設計出不同的資料表示方式,能夠達到在此所提到的功能。The number of pattern characters P_CHRS corresponds to the number of characters of the parallel alignment, for example, In this example, three characters are compared at a time, and each transfer rule has three pattern characters. The bits of the pattern mask PMASK sequentially correspond to the current state P_ST and the individual pattern characters P_CHRS, such as the Most Significant Bit (MSB), that is, the bit 3 of the PMASK (bit 3). , corresponding to the current state P_ST, and the next highest bit, that is, bit 2 of PMASK (bit 2), corresponding to the first pattern character, the Least Significant Bit (LSB), that is, bit 0 (bit 0) of PMASK, corresponding to the last pattern character. However, here is only a data representation of a pattern mask, and is not limited to this method. Those skilled in the art can easily design different data representation manners to achieve the functions mentioned herein. .

當對應的樣版遮罩位元為'1'時,表示要比對對應的樣版資 料,而當其為'0'時,則表示對應的樣版資料為略過(don't care),為便於分辨,在P_CHRS中以通配字元(wildcard character)'?'來表示對應的樣版字元為略過(don't care)。例如,當階層單元(1)的轉移規則中的PMASK的位元2(bit 2)及位元1(bit 1)皆為'0'而bit0為'1'時,則P_CHRS的第1及第2樣版字元皆為'?',表示不論第1及2個輸入字元為何,皆為符合,只對第3個輸入字元進行實際的比對。階層單元(2)的P_CHRS則只要比對第2個及第3個字元。規則表的欄位P_ST係樣版資料的當前狀態。前面的階層單元(1)~(3),其當前狀態P_ST都不比對,只比對輸入字元與樣版字元P_CHRS,但為了讓規則的格式一致,其樣版資料的P_ST皆標示出來,而由轉移規則中的PMASK的位元3決定是否要比對當前狀態。When the corresponding pattern mask bit is '1', it indicates that the corresponding template is to be matched. Material, and when it is '0', it means that the corresponding pattern data is don't care. For easy resolution, the wildcard character is used in P_CHRS. 'To indicate that the corresponding template character is don't care. For example, when bit 2 (bit 2) and bit 1 (bit 1) of PMASK in the transfer rule of the hierarchical unit (1) are both '0' and bit 0 is '1', then the first and the first of P_CHRS 2 sample characters are '? ', indicating that regardless of the first and second input characters, it is a match, only the actual comparison of the third input character. The P_CHRS of the hierarchical unit (2) is only required to match the second and third characters. The current status of the P_ST system data in the rules table. The previous hierarchical units (1)~(3) have no current state P_ST, and only the input characters and the pattern characters P_CHRS are compared. However, in order to make the rules format consistent, the P_ST of the pattern data is marked. And it is determined by bit 3 of PMASK in the transfer rule whether or not to compare the current state.

由轉移規則表可以看出各個階層單元所對應的AC-trie的層 級,例如階層單元(1)~(3)的轉移規則的樣版資料的P_ST皆為0,代表其對應於層級(level)0,另外階層單元(4)的轉移規則的樣版資料的P_ST為1及8,其在AC-trie的層級1,代表階層單元(4)對應於層級1。另外,階層單元(5)的轉移規則的樣版資料的P_ST為2及9,其在AC-trie的level 2,代表階層單元(5)對應於層級2。同樣可依此類推階層單元(6)對應於AC-trie的層級3。但階層單元(7)則對應於AC-trie的層級4及其以後的所有層級。The transfer rule table shows the layer of AC-trie corresponding to each hierarchical unit. The P_ST of the pattern data of the transfer rule such as the hierarchical unit (1) to (3) is 0, which represents the P_ST of the pattern data corresponding to the level 0 and the transfer rule of the other hierarchical unit (4). It is 1 and 8, which is at level 1 of AC-trie, and represents hierarchical unit (4) corresponding to level 1. In addition, the P_ST of the pattern data of the transfer rule of the hierarchical unit (5) is 2 and 9, which is at level 2 of AC-trie, and represents that the hierarchical unit (5) corresponds to the level 2. Similarly, the hierarchical unit (6) can correspond to the level 3 of the AC-trie. However, the hierarchical unit (7) corresponds to the level 4 of the AC-trie and all subsequent levels.

假如一個轉移規則的樣版資料與輸入的CUR_ST及 IN_CHRS符合(matching),則該轉移規則被觸發。被觸發的轉移規則,其輸出資料可能會被用以決定該規則所屬的階層單元的次狀態NX及比對輸出OP(1)~OP(3)。If a transfer rule of the pattern data and the input CUR_ST and If the IN_CHRS matches, the transfer rule is triggered. The triggered transfer rule whose output data may be used to determine the secondary state NX of the hierarchical unit to which the rule belongs and the comparison output OP(1)~OP(3).

轉移規則中的NX_ST為次狀態(next state),在每一比對週 期中,同一階層單元中被觸發的轉移規則中優先權最高的規則的次狀態會被送至同一管線的後面的階層單元,例如,在此例中一次比對3個字元,由階層單元(3)所決定的次狀態會被送到階層單元(6)當成當前狀態,而階層單元(4)所決定的次狀態則送到階層單元(7)。假若沒有任何轉移規則被觸發,則該階層單元的輸出的次狀態為0。轉移規則中的OP1、OP2、及OP3為分別對應於第1個、第2個、及第3個輸入字元的比對輸出。同樣的,在每一比對週期中,同一階層單元中被觸發的轉移規則中優先權最高的比對輸出會被當作該階層單元比對輸出。NX_ST in the transfer rule is the next state (next state), in each comparison week During the period, the secondary state of the highest priority rule in the triggered rule in the same hierarchical unit is sent to the hierarchical unit behind the same pipeline. For example, in this example, three characters are aligned at a time, and the hierarchical unit is 3) The determined secondary state is sent to the hierarchical unit (6) as the current state, and the secondary state determined by the hierarchical unit (4) is sent to the hierarchical unit (7). If no transfer rule is triggered, the secondary state of the output of the hierarchical unit is zero. OP1, OP2, and OP3 in the transfer rule are comparison outputs corresponding to the first, second, and third input characters, respectively. Similarly, in each comparison cycle, the highest priority comparison output among the triggered transition rules in the same hierarchical unit is treated as the hierarchical unit comparison output.

輸出遮罩OMASK的最高位元(MSB),即上述規則的OMASK 的位元3(bit 3),係對應於次狀態NX_ST。其餘的位元則依序對應於個別的輸出字串(output strings),例如次高位元,即輸出遮罩OMASK的位元2(bit 2),對應於第一個比對輸出OP1,而最低位元(LSB),即輸出遮罩OMASK的位元0(bit 0),則對應於最後一個比對輸出OP3。當OMASK的某個位元為'0'時,表示對應的資料不是有效的,例如轉移規則5的OMASK的位元3及位元0(bit 3及bit 0)皆為'0',表示此規則的NX_ST及OP3皆不是有效的,亦即,當此規則被觸發時,並不會用來決定次狀態,而且也不決定對應於輸入字元的第3個字元的符合字串輸出。由規則表中,我們可以看到,當轉 移規則的樣版字元P_CHRS中排在後面的字元為通配字元(wildcard character)時,則輸出遮罩OMASK的位元3(bit 3)將為'0',代表此規則只決定對應的比對輸出而不決定次狀態(next state)。例如,在規則4中,由於其樣版字元P_CHRS為"y?‘?",故該規則4不決定次狀態。Output mask OMASK highest bit (MSB), which is the above rule OMASK Bit 3 (bit 3) corresponds to the secondary state NX_ST. The remaining bits are sequentially corresponding to individual output strings, such as the next highest bit, that is, bit 2 (bit 2) of the output mask OMASK, corresponding to the first comparison output OP1, and the lowest The bit (LSB), which is the bit 0 (bit 0) of the output mask OMASK, corresponds to the last comparison output OP3. When a certain bit of OMASK is '0', it indicates that the corresponding data is not valid. For example, bit 3 and bit 0 (bit 3 and bit 0) of OMASK of transfer rule 5 are both '0', indicating that The rules NX_ST and OP3 are not valid, that is, when this rule is triggered, it is not used to determine the secondary state, and does not determine the conforming string output corresponding to the third character of the input character. From the rules table, we can see that when When the character in the pattern character P_CHRS of the shift rule is a wildcard character, the bit 3 (bit 3) of the output mask OMASK will be '0', indicating that the rule only determines The corresponding alignment output does not determine the next state. For example, in rule 4, since its pattern character P_CHRS is "y?'?", the rule 4 does not determine the secondary state.

由於輸出字串的長度不固定,為了方便硬體的設計,可在資 料暫存器中儲存對應於比對輸出字串的狀態代碼,例如規則表中的轉移規則3,其OP3為"happygo",以對應的狀態16來代表該輸出字串。在上述的規則表中,雖然僅以狀態的編號來表示輸出字串,但熟習相關技藝者,可以很容易的達到以代碼來表示輸出字串的作法。Since the length of the output string is not fixed, in order to facilitate the design of the hardware, it can be used The status register stores a status code corresponding to the aligned output string, such as transfer rule 3 in the rules table, OP3 is "happygo", and the corresponding status string 16 represents the output string. In the above-described rule table, although the output string is represented only by the number of the state, it is easy for a person skilled in the art to realize the practice of outputting the string by code.

字串比對範例String comparison example

為了讓讀者更進一步瞭解本發明的運作,請參照圖4(a)及圖4(b),其所繪示為一比對的實例。輸入的待比對的字串為"enhappenhappygo",我們在進行比對時,一次比對三個字元,因此將字串分成數段"enh"、"app"、"enh"、"app"、及"ygo",依序進行比對。In order to further understand the operation of the present invention, please refer to FIG. 4(a) and FIG. 4(b), which are illustrated as an example of alignment. The input string to be compared is "enhappenhappygo", we compare three characters at a time, so we divide the string into several segments "enh", "app", "enh", "app" And "ygo", in order to compare.

在比對之前,先將所有的暫存器的狀態初始化(initialization)。在第一個比對週期,依據輸入的字元"enh",將觸發規則25及規則30,其中規則30屬於階層單元(1),規則25屬於階層單元(3)。規則30將決定次狀態為8,此結果將被送至串接於階層單元(1)後面的階層單元(4),在下一比對週期被階層單元(4)當成當前狀態。規則25將決定次狀態為3,此結果將被送至串接於階層單元(3)後面的階層單元(6),在下一比對週期被階層單元(6)當成當前狀態。依據前述兩個被觸發的規則所決定的比對輸出皆為空字串。Initialize the state of all registers before comparing them. In the first comparison cycle, according to the input character "enh", rule 25 and rule 30 will be triggered, wherein rule 30 belongs to hierarchical unit (1) and rule 25 belongs to hierarchical unit (3). Rule 30 will determine the secondary state as 8, and this result will be sent to the hierarchical unit (4) that is concatenated behind the hierarchical unit (1), and is treated as the current state by the hierarchical unit (4) in the next comparison period. Rule 25 will determine the secondary state to be 3, and this result will be sent to the hierarchical unit (6) that is concatenated behind the hierarchical unit (3), and will be regarded as the current state by the hierarchical unit (6) in the next comparison period. The aligned outputs determined according to the two triggered rules are all empty strings.

在第二個比對週期,依據輸入的字元"app"及前一比對週期 所決定的次狀態,將觸發規則16及規則24,其中規則24屬於階層單元(4),規則16屬於階層單元(6)。規則24將決定次狀態為11,規則16將決定次狀態為6。階層單元(4)及(6)所決定的次狀態會送至優先權多工器pmux(0)。因階層單元(6)的輸出具有較高的優先順序,故優先權多工器pmux(0)會決定次狀態為6,並將其送至最後的階層單元(7),以在下一比對週期被階層單元(7)當成當前狀態。另外,前述兩個被觸發的規則所決定的比對輸出皆為空字串。In the second comparison cycle, according to the input character "app" and the previous comparison period The determined secondary state will trigger rule 16 and rule 24, where rule 24 belongs to hierarchy unit (4) and rule 16 belongs to hierarchy unit (6). Rule 24 will determine the secondary state to be 11, and rule 16 will determine the secondary state to be 6. The secondary states determined by the hierarchical units (4) and (6) are sent to the priority multiplexer pmux(0). Since the output of the hierarchical unit (6) has a higher priority order, the priority multiplexer pmux(0) determines the secondary state to be 6 and sends it to the last hierarchical unit (7) for the next comparison. The period is regarded as the current state by the hierarchical unit (7). In addition, the comparison outputs determined by the two triggered rules are all empty strings.

在第三個比對週期,依據輸入的字元"enh"及前一比對週期 所決定的次狀態,將觸發規則13、規則25及規則30,其中規則30屬於階層單元(1),規則25屬於階層單元(3),規則13屬於階層單元(7)。規則13未決定有效的次狀態,但決定比對輸出OP2為14,其為"happen"。規則30將決定次狀態為8,此結果將被送至串接於階層單元(1)後面的階層單元(4),在下一比對週期被階層單元(4)當成當前狀態。規則25將決定次狀態為3,此結果將被送至串接於階層單元(3)的階層單元(6),在下一比對週期被階層單元(6)當成當前狀態。依據所觸發的規則,經優先權多工器pmux(2)決定最後的比對輸出OP2為14,OP1及OP3則皆為空字串。表示這個比對週期之後,對應於第2個輸入字元的比對結果為"happen"。In the third comparison cycle, according to the input character "enh" and the previous comparison period The determined secondary state will trigger rule 13, rule 25 and rule 30, where rule 30 belongs to hierarchical unit (1), rule 25 belongs to hierarchical unit (3), and rule 13 belongs to hierarchical unit (7). Rule 13 does not determine a valid secondary state, but determines that the comparison output OP2 is 14, which is "happen". Rule 30 will determine the secondary state as 8, and this result will be sent to the hierarchical unit (4) that is concatenated behind the hierarchical unit (1), and is treated as the current state by the hierarchical unit (4) in the next comparison period. Rule 25 will determine the secondary state to be 3, and this result will be sent to the hierarchical unit (6) connected in series to the hierarchical unit (3), which is regarded as the current state by the hierarchical unit (6) in the next comparison period. According to the triggered rule, the priority comparison output OP2 is determined by the priority multiplexer pmux(2) to be 14, and both OP1 and OP3 are empty strings. After the comparison period is indicated, the comparison result corresponding to the second input character is "happen".

在第四個比對週期,依據輸入的字元"app"及前一比對週期 所決定的次狀態,將觸發規則16及規則24,其中規則24屬於階層單元(4),規則16屬於階層單元(6)。規則24將決定次狀態為11,規則16將決定次狀態為6。階層單元(4)及(6)所輸出的次狀態11及6經優先權多工器pmux(0)依一優 先規則比較後,次狀態6被輸出至最後的階層單元(7),以在下一比對週期被階層單元(7)當成當前狀態。另外,前述兩個被觸發的規則所決定的比對輸出皆為空字串。In the fourth comparison cycle, according to the input character "app" and the previous comparison period The determined secondary state will trigger rule 16 and rule 24, where rule 24 belongs to hierarchy unit (4) and rule 16 belongs to hierarchy unit (6). Rule 24 will determine the secondary state to be 11, and rule 16 will determine the secondary state to be 6. The sub-states 11 and 6 output by the hierarchical units (4) and (6) are prioritized by the priority multiplexer pmux(0) After the rule comparison, the secondary state 6 is output to the last hierarchical unit (7) to be regarded as the current state by the hierarchical unit (7) in the next comparison cycle. In addition, the comparison outputs determined by the two triggered rules are all empty strings.

在第五個比對週期,依據輸入的字元"ygo"及前一比對週期所決定的次狀態,將觸發規則11及規則12,這兩個規則皆屬於階層單元(7)。規則11將決定次狀態為16及比對輸出OP1為7以及OP3為16,其中對應於狀態7的輸出為"enhappy happy",對應於狀態16的輸出為"happygo"。規則11未決定有效的次狀態,但比對輸出OP1為7。因此階層單元(7)所決定次狀態為16,且仍送回階層單元(7),作為下一比對週期當前狀態。因為規則11具有較高的優先權,所以決定最後的比對輸出OP1為7及OP3為16,而比對輸出OP2為空字串。In the fifth comparison period, according to the input character "ygo" and the secondary state determined by the previous comparison period, rule 11 and rule 12 are triggered, both of which belong to the hierarchical unit (7). Rule 11 will determine that the secondary state is 16 and the comparison output OP1 is 7 and OP3 is 16, wherein the output corresponding to state 7 is "enhappy happy" and the output corresponding to state 16 is "happygo". Rule 11 does not determine a valid secondary state, but the comparison output OP1 is 7. Therefore, the hierarchical unit (7) determines that the secondary state is 16, and still returns the hierarchical unit (7) as the current state of the next comparison cycle. Since rule 11 has a higher priority, it is determined that the final comparison output OP1 is 7 and OP3 is 16, and the comparison output OP2 is an empty string.

多字元轉移函式的推導及規則產生Derivation and rule generation of multi-character transfer function

本發明提出一種簡單且有效的作法,可以依據原始的AC-trie產生平行比對多字元的轉移規則。在下面將要說明本發明所提出的產生轉移規則的作法。The present invention proposes a simple and efficient method for generating parallel-directed multi-word transfer rules based on the original AC-trie. The method of generating a transfer rule proposed by the present invention will be described below.

在推導多字元轉移規則之前,本發明先從原來的AC-trie的前進函式及失敗函式得到單一字元轉移函式(1-character transition function),再以此單一字元轉移函式為基礎,推導出多字元轉移函式(multi-character transition function),再由所推導出的多字元轉移函式產生所需的轉移規則。Before deriving the multi-character transfer rule, the present invention first obtains a 1-character transition function from the forward function and the failure function of the original AC-trie, and then uses the single character transfer function. Based on this, a multi-character transition function is derived, and the derived multi-character transfer function produces the required transfer rules.

其中,由初始狀態0開始的轉移函式NX1 (0,﹁{e,h})=0,﹁{e,h}是指'e'及'h'以外的任何字元,在此實施例所敘述的規則中,本發明係以 通配字元'?'來表示,在硬體實作上,則是以遮罩的方式來實現,亦即將對應的樣版遮罩位元設為'0',使其符合於任何字元,但是因為將具有通配字元'?'的規則的優先順序放在完全比對的規則之後,所以會由完全比對的結果會優先決定最後的結果。Wherein, the transfer function NX 1 (0, "{e, h}) = 0 from the initial state 0, "{e, h} refers to any character other than 'e' and 'h', implemented here In the rules described in the examples, the present invention is a wildcard character '? 'To show, in hardware implementation, it is implemented in the form of a mask, that is, the corresponding pattern mask bit is set to '0', so that it conforms to any character, but because it will have With the character '? The priority of the 'rules' is placed after the rules that are completely aligned, so the result of the complete comparison will give priority to the final result.

在AC-trie中,假如一個狀態的失敗函式指向不是初始狀態 的狀態,表示兩個狀態之間有共同的字首(prefix)。因此,當對應所有的狀態的轉移規則(transition rule)都不符合時,將會比對由初始狀態開始的轉移規則。基於此事實,在平行處理多個字元的比對時,利用硬體的特性,可以多個規則同時作用以平行比對不同部分的字元。In AC-trie, if a state's failure function points to an initial state The state indicates that there is a common prefix between the two states. Therefore, when the transition rules corresponding to all states do not match, the transfer rule starting from the initial state will be compared. Based on this fact, when the alignment of a plurality of characters is processed in parallel, with the characteristics of the hardware, a plurality of rules can be simultaneously applied to parallelly match the characters of the different portions.

另外,平行比對的字元個數k不論是多少,狀態的個數仍會 保持不變,與原始的AC-trie一樣。因為,依據相同關鍵字建立的不同個數字元的轉移規則,處理相同的輸入字串,最後得到的狀態一定是一樣的,這樣這些不同個數字元的轉移規則才能用以比對相同的關鍵字。In addition, the number of characters in the parallel alignment, regardless of the number of states, will still be Stay the same, the same as the original AC-trie. Because, according to the transfer rules of different digital elements established by the same keyword, the same input string is processed, and the obtained state must be the same, so that the transfer rules of these different digital elements can be used to compare the same keywords. .

為了說明推導的過程,本發明定義一個k字元轉移函式為 NXk (S1 ,T)=S2 ,其中S1 為當前狀態,T為k字元的字串,S2 為次狀態。此轉移函式所代表的意義為當前狀態為S1 ,接受輸入的k字元的字串T之後,狀態會轉換為S2 。例如,一個3字元轉移函式NX3 (1,nha)=4,其當前狀態為1,接受3字元的字串"nha”之後,其狀態將轉換為4。與3字元字串比對裝置相對應的即是3字元轉移函式。To illustrate the derivation process, the present invention defines a k-character transfer function NX k (S 1 ,T)=S 2 , where S 1 is the current state, T is a string of k characters, and S 2 is a secondary state. . The meaning of this transfer function is that the current state is S 1 , and after accepting the input string of the k character, the state is converted to S 2 . For example, a 3-character transfer function NX 3 (1, nha) = 4, whose current state is 1, and after accepting the 3-character string "nha", its state will be converted to 4. Corresponding to the 3-character string comparison device is a 3-character transfer function.

另外,本發明把轉移函式NX k (S 2 ,T 2 )=S 3 稱為轉移函式NX 1 (S 1 ,T 1 )=S e 的後續轉移函式(successive transitionfunction),假如S 2 =S e ,其表示NX k (S 2 ,T 2 )=S 3 的初始狀態為NX 1 (S 1 ,T 1 )=S e 的結束狀態。In addition, the present invention refers to the transfer function NX k ( S 2 , T 2 )= S 3 as a subsequent transition function of the transfer function NX 1 ( S 1 , T 1 )= S e , if S 2 = S e , which indicates that the initial state of NX k ( S 2 , T 2 )= S 3 is the end state of NX 1 ( S 1 , T 1 )= S e .

本發明可藉著串接一轉移函式與其後續轉移函式而得到新 的多字元轉移函式。此串接運算(concatenating operation)定義如後。The invention can be newly obtained by concatenating a transfer function and its subsequent transfer function Multi-character transfer function. This concatenating operation is defined as follows.

定義(兩個轉移函式的串接):給定一k 字元轉移函式NX k (S 1 ,S 1 )=S 2 及一l 字元轉移函式NX 1 (S 2 ,T 2 )=S 3 ,其中S 1 S 2 、及S 3 為狀態(state),以及T 1 T 2 分別為k 字元字串(k -character string)及l 字元字串(l -character string),則串接此二轉移函式可得到一(k +l )字元轉移函式NX k +l (S 1 ,T 1 T 2 )=S 3 Definition (serial connection of two transfer functions): given a k- character transfer function NX k ( S 1 , S 1 )= S 2 and a 1- character transfer function NX 1 ( S 2 , T 2 ) = S 3, wherein S 1, S 2, and S 3 is at the state (state), and T 1 and T 2 each character string is k (k -character string) and the string of characters l (l -character string Then, concatenating the two transfer function can obtain a ( k + l ) character transfer function NX k + l ( S 1 , T 1 T 2 )= S 3 .

另外,本發明定義一虛擬狀態(pseudo state)及兩種輔助轉 移函式(assistant transition function)。在圖9中,符號'-'代表虛擬狀態,其係為了在推導過程中構成一個完整的轉移函式而虛擬定義的狀態,並非AC-trie中實際存在的狀態。此外,符號'?'代表一任意字元(arbitrary character)。接著,本發明定義兩種輔助轉移函式(assistant transitions)以輔助建構輸出狀態(output state)之完整多字元轉移函式(complete multi-character transition functions)。In addition, the present invention defines a pseudo state and two auxiliary rotations. "assistant transition function". In Fig. 9, the symbol '-' represents a virtual state, which is a state that is virtually defined in order to constitute a complete transfer function in the derivation process, and is not a state actually existing in the AC-trie. Also, the symbol '? 'Represents an arbitrary character. Next, the present invention defines two auxiliary transitions to assist in constructing the complete multi-character transition functions of the output state.

第一種輔助轉移函式定義為NX 1 (-,?)=-,其係依據一任意字元由一虛擬狀態轉移至另一虛擬狀態(transits from a pseudo state to another pseudo state on an arbitrary character.)。第二種輔助轉移函式定義為NX 1 (S op ,?)=-,其係依據一任意字元由一輸出狀態S op 轉移至一虛擬狀態(transits from an output stateS op to a pseudo state on an arbitrary character.)The first auxiliary transfer function is defined as NX 1 (-,?)=-, which is transferred from a virtual state to another pseudo state according to an arbitrary character (transits from a pseudo state to another pseudo state on an arbitrary character) .). The second sub-transfer function is defined as NX 1 (S op,?) = -, which according to a line of arbitrary characters is transferred from an output state to a virtual state S op (transits from an output state S op to a pseudo state On an arbitrary character.)

圖6繪示前述定義的幾種輔助轉移函式。為了與實際的轉移函式有所區別,圖中以虛線來表示輔助轉移函式。NX 1 (0,?)=0並非前述定義的輔助轉移函式,而是依據原來AC-trie中的NX1 (0,﹁{e,h})=0,以通配字元'?' 取代﹁{e,h}得到的,其係用以解決對齊(alignment)的問題。輔助轉移函式NX 1 (S op ,?)=-是用來保存輸出狀態(output state)Sop 的比對輸出(matching output)。因此每個有符合的比對輸出的狀態都會多加一個以虛線表示的輔助轉移函式,以輔助多字元轉移函式的推導,例如狀態7、12、14、及16。而NX 1 (-,?)=-為額外的一字元輔助轉移函式,用以輔助於輸出狀態之後建構完整的多字元轉移函式。通配字元'?'及虛擬狀態'-'是為了建構多字元轉移函式而提出的,在實際的轉移規則(transition rule)中係以樣版遮罩PMASK及輸出遮罩OMASK的對應位元來控制對應的行為。Figure 6 illustrates several auxiliary transfer functions as defined above. In order to distinguish it from the actual transfer function, the auxiliary transfer function is indicated by a broken line in the figure. NX 1 (0,?) = 0 is not the auxiliary transfer function defined above, but is based on the original AC-trie NX 1 (0, "{e, h}) = 0, to match the character '? 'Replaced by {e,h}, it is used to solve the problem of alignment. The auxiliary transfer function NX 1 ( S op ,?)=- is used to save the matching output of the output state S op . Therefore, each of the states of the aligned output will have an auxiliary transfer function indicated by a broken line to assist in the derivation of the multi-word transfer function, such as states 7, 12, 14, and 16. NX 1 (-,?)=- is an additional one-character auxiliary transfer function to assist in constructing a complete multi-word transfer function after the output state. Wildcard character '? 'and virtual state'-' is proposed to construct a multi-character transfer function. In the actual transition rule, the corresponding bit of the pattern mask PMASK and the output mask OMASK is used to control the corresponding behavior.

以下以3字元轉移函式的實例來進一步說明推導過程。The derivation process is further illustrated by an example of a 3-character transfer function.

本發明在推導多字元轉移函式時,係以依據AC-trie所得到 的1字元轉移函式為基礎,然後藉由重複的串接運算得到所需的多字元轉移函式。基於本發明所提出的硬體架構,本發明以兩種方式得到所需的1字元轉移函式,其中第一種係用以推導構成管線的階層單元所使用的轉移函式,第二種則係用以推導最後一個階層單元所使用的轉移函式。The invention is based on AC-trie when deriving the multi-word transfer function. Based on the 1-character transfer function, the desired multi-word transfer function is obtained by repeated concatenation operations. Based on the hardware architecture proposed by the present invention, the present invention obtains the required 1-character transfer function in two ways, the first of which is used to derive the transfer function used by the hierarchical elements constituting the pipeline, and the second It is used to derive the transfer function used by the last hierarchical unit.

以下說明係配合一7階層3字元字串比對裝置的架構。如圖5 所示,本發明係依照圖1的AC-trie,將其區分成兩部分,左半邊包含層級(level)0至3的狀態,右半邊則包含層級4及其以上的狀態。其中左半邊係用以推導管線的階層單元所使用的轉移函式,不需考慮失敗函式,因此表示失敗函式的虛線都被移除。右半邊則係用以推導最後一個階層單元所使用的轉移函式,此部分仍要保留失敗函式。左半邊和右半邊的分界點係依據前述字串比對裝置的階層單元的數目來決定,例如在圖4的實施例中,其中構成3個管線的階層單元有6個,即以NFA方式運作的係Ac-trie的層級0至3。The following description is based on the architecture of a 7-level 3-character string comparison device. Figure 5 As shown, the present invention is divided into two parts according to the AC-trie of Fig. 1, the left half containing the level 0 to 3 state, and the right half containing the level 4 and above. The left half is the transfer function used to push the hierarchical unit of the conduit line, and the failure function is not considered, so the dotted line indicating the failure function is removed. The right half is used to derive the transfer function used by the last hierarchical unit, which still retains the failure function. The boundary points of the left half and the right half are determined according to the number of hierarchical units of the preceding string comparison device. For example, in the embodiment of FIG. 4, there are six hierarchical units constituting three pipelines, that is, operating in an NFA manner. The level of Ac-trie is 0 to 3.

圖7繪示層級0至3的1字元轉移函式,其中,因為不考慮失 敗函式,故圖中的一字元轉移函式係直接由圖1之AC-trie中的前進函式得到。如圖7所示,由同一層級的狀態開始的一字元轉移函式被歸為一組。例如在(a)組中為層級0的一字元轉移函式,在層級0只有初始狀態0(initial state)。如前所述,轉移函式NX1 (0,?)=0係用以處理關鍵字未與輸入的開始字元對齊的對齊問題(alignment problem)。(b)組為層級1的一字元轉移函式,層級1包括狀態1及8。(c)組為層級2的一字元轉移函式,層級2包括狀態2及9。(d)組為層級3的一字元轉移函式,層級3包括狀態3及10。FIG. 7 illustrates a 1-character transfer function of levels 0 to 3, wherein the character transfer function in the figure is directly obtained from the forward function in AC-trie of FIG. 1 because the failure function is not considered. . As shown in Figure 7, a character transfer function starting from the state of the same level is grouped together. For example, in the (a) group, it is a character transfer function of level 0, and at level 0, there is only an initial state (initial state). As mentioned earlier, the transfer function NX 1 (0,?) = 0 is used to process the alignment problem in which the keyword is not aligned with the input start character. (b) The group is a character transfer function of level 1, and level 1 includes states 1 and 8. (c) Group is a character transfer function of level 2, and level 2 includes states 2 and 9. (d) Group is a character transfer function of level 3, and level 3 includes states 3 and 10.

圖8繪示層級4至7的1字元轉移函式,其中,因失敗函式而 衍生的1字元轉移函式係以虛線表示。(a)組為層級4的1字元轉移函式,層級4包括狀態4及11。(b)組為層級5的1字元轉移函式,層級5包括狀態5、12及13,其中狀態12具有符合的比對輸出,故多了輔助轉移函式NX 1 (12,?)=-。 (c)組為層級6的1字元轉移函式,層級6包括狀態6、14、及15,其中,狀態14多了輔助轉移函式NX 1 (14,?)=-;狀態6因失敗函式指向狀態11而多了一個轉移函式NX 1 (6,e)=13。為了讓讀者容易分別,因失敗函式而增加的轉移函式係以虛線表示。(d)組為層級7的1字元轉移函式,層級7包括狀態7及16,其中狀態7因失敗函式指向狀態12而多了一個轉移函式NX 1 (7,g)=15;狀態16則多了輔助轉移函式NX 1 (16,?)=-。Figure 8 illustrates a 1-character transfer function of levels 4 through 7, wherein the 1-character transfer function derived from the failure function is indicated by a dashed line. (a) The group is a 1-character transfer function of level 4, and level 4 includes states 4 and 11. (b) The group is a 1-character transfer function of level 5, and level 5 includes states 5, 12, and 13, where state 12 has a matching output, so the auxiliary transfer function NX 1 (12,?) = -. (c) The group is a 1-character transfer function of level 6, and level 6 includes states 6, 14, and 15, wherein state 14 has more auxiliary transfer functions NX 1 (14,?)=-; state 6 fails The function points to state 11 and has a transfer function NX 1 (6,e)=13. In order to make the reader easy to separate, the transfer function added by the failure function is indicated by a broken line. (d) The group is a 1-character transfer function of level 7, and level 7 includes states 7 and 16, where state 7 has a transfer function NX 1 (7, g) = 15 due to the failure function pointing to state 12; State 16 has more auxiliary transfer functions NX 1 (16,?)=-.

圖9及圖10係用以說明依據前述1字元轉移函式經由串接運算得到3字元轉移函式(3-character transition function)的推導過程,其中,符號'+'代表兩個轉移函式的串接。圖9繪示前6個階層單元所使用的3字元轉移函式,圖10繪示第7個階層單元所使用的3字元轉移函式,其中'=>'的左 邊的係串接的過程,'=>'的右邊則係串接的結果。為了清楚起見,於圖9、10中只標明非空字串的比對輸出,而比對輸出皆為空字串者,則不予以標示。9 and FIG. 10 are diagrams for explaining a derivation process of obtaining a 3-character transition function by a concatenation operation according to the aforementioned 1-character transfer function, wherein the symbol '+' represents two transfer functions. Serial connection. Figure 9 shows the 3-character transfer function used by the first 6 hierarchical units, and Figure 10 shows the 3-character transfer function used by the 7th hierarchical unit, where the left of '=>' The process of serial connection, the right side of '=>' is the result of serial connection. For the sake of clarity, only the comparison output of the non-empty string is indicated in FIGS. 9 and 10, and the comparison output is an empty string, and is not indicated.

在圖9中,所繪示之階層1的3字元轉移函式的推導過程,其 係先串接前面的兩個NX 1 (0,?)=0,再分別與狀態0的2個1字元轉移函式NX 1 (0,e )=1及NX 1 (0,h )=8串接而得到2個3字元轉移函式NX 3 (0,??e )=1及NX 3 (0,??h )=8,其中前兩個轉移函式NX1 (0,?)=0表示狀態持續停在初始狀態,直到第三個字元符合'e'或'h'時才轉移至層級1的狀態1或8。同樣的,於所繪示之階層2的3字元轉移函式的推導過程中,其係先串接NX 1 (0,?)=0與狀態0的2個1字元轉移函式,再分別與狀態1和8的後續轉移函式串接而得到2個3字元轉移函式NX 3 (0,?en )=2及NX 3 (0,?ha )=9。In Fig. 9, the derivation process of the 3-character transfer function of the hierarchical 1 is shown, which is preceded by concatenating the two preceding NX 1 (0, ?) = 0, and then respectively with the 1 of the state 0 The character transfer function NX 1 (0, e )=1 and NX 1 (0, h )=8 are connected in series to obtain two 3-character transfer functions NX 3 (0,?? e )=1 and NX 3 (0,?? h )=8, where the first two transfer functions NX 1 (0,?)=0 indicate that the state continues to be in the initial state until the third character matches 'e' or 'h'. Transfer to state 1 or 8 of level 1. Similarly, in the derivation of the 3-character transfer function of the stratum 2 shown, it is first to concatenate the two 1-character transfer functions of NX 1 (0,?)=0 and state 0, and then Two congruent transfer functions NX 3 (0,? en )=2 and NX 3 (0,? ha )=9 are obtained by concatenating with the subsequent transfer functions of states 1 and 8, respectively.

階層1及2的3字元轉移函式可用以解決多字元字串比對時的 對齊問題(aligmnent problem)。階層1的3字元轉移函式不比對前兩個字元一以通配字元'?'表示,只比對最後一個字元,故其為使一樣版(pattern)的開頭與輸入字串的第3個字元對齊的轉移函式。階層2的3字元轉移函式不比對第1個字元-以通配字元'?'表示,只比對後2個字元,故其為使一樣版的開頭與輸入字串的第2個字元對齊的轉移函式。Hierarchical 1 and 2 3-character transfer functions can be used to solve multi-word string alignment Align problem (aligmnent problem). The 3-character transfer function of level 1 is no more than the first two characters. 'Represents that only the last character is compared, so it is a transfer function that aligns the beginning of the same pattern with the third character of the input string. The 3 character transfer function of level 2 is no better than the first character - with wild characters '? ' indicates that only the last two characters are compared, so it is a transfer function that aligns the beginning of the same version with the second character of the input string.

依據前述方式,透過串接運算可以推導階層3至7的3字元轉移函式。According to the foregoing manner, the 3-character transfer function of the hierarchy 3 to 7 can be derived through the concatenation operation.

依據前述的說明,本發明技術領域具通常知識者可以很容易地推導出任何字元數的多字元轉移函式。推導出多字元轉移函式後,即可依此推導結果產生所需的多字元轉移規則。In accordance with the foregoing description, a multi-character transfer function of any number of characters can be readily derived by one of ordinary skill in the art. After deriving the multi-character transfer function, the resulting multi-character transfer rule can be derived from this result.

演算法Algorithm

以下,本發明將以程式的演算法來說明依據1字元轉移函式 產生k字元轉移函式的作法。從原始的AC-trie推導出1字元轉移函式,是很直接的,依據前面的說明,本發明技術領域具通常知識者可以很容易的達成此作法,故在此說明中略過關於此部分的敘述。以下說明請參照圖11所繪示之演算法。Hereinafter, the present invention will be described by a program algorithm based on a 1-character transfer function. The practice of generating a k-character transfer function. It is very straightforward to derive the 1-character transfer function from the original AC-trie. According to the foregoing description, the person skilled in the art can easily achieve this practice, so in this description, the relevant part is skipped. Narration. Please refer to the algorithm illustrated in FIG. 11 for the following description.

演算法的輸入參數k為要平行比對的字元數。輸入參數 NXSET包含原始的1字元轉移函數,演算法的結果為存放於變數TRSET中之k字元轉移函數。傳回的k字元轉移函數係用以產生本發明字串比對裝置所需的多字元轉移規則。The input parameter k of the algorithm is the number of characters to be aligned in parallel. Input parameters NXSET contains the original 1-character transfer function, and the result of the algorithm is the k-character transfer function stored in the variable TRSET. The returned k-character transfer function is used to generate the multi-word transfer rules required by the string alignment device of the present invention.

此演算法係經由多層迴圈針對原始AC-trie的每一狀態推 導對應的k字元轉移函數。在開始的第2行,將TRSET清空。在第3行至第21行之間的迴圈中,針對AC-trie中的每一狀態Si推導其對應的所有k字元轉移函數。在第5行,將狀態Si的所有1字元轉移函數複製到NSET。This algorithm is based on a multi-layer loop for each state of the original AC-trie The corresponding k-character transfer function is derived. In the second line of the beginning, clear TRSET. In the loop between the 3rd line and the 21st line, all of the corresponding k-character transfer functions are derived for each state Si in the AC-trie. In line 5, all 1-character transfer functions of state Si are copied to NSET.

重複執行k-1次第7行至第19行之間的迴圈,以疊代的方式串 接Si的1字元轉移函數及其後續的k-1個1字元轉移函數,藉以得到Si的k字元轉移函數。重複執行k-1次第7行至第19行之間的迴圈之後,NSET中包含由Si開始的所有的k字元轉移函數。在第20行,將NSET加到TRSET中,然後回到第5行,繼續處理下一個狀態。當所有的狀態都處理好了,則結束此演算法。Repeat the k-1 line between the 7th line and the 19th line, in the form of iteration The Si character transfer function of Si and its subsequent k-1 1-character transfer function are used to obtain the k-character transfer function of Si. After repeating the loop between line 7 and line 19 of k-1 times, NSET contains all the k-character transfer functions starting from Si. On line 20, add NSET to TRSET, then back to line 5 and continue processing the next state. When all the states are processed, the algorithm ends.

接著進一步討論第7行至第19行之間的迴圈。在第8行,清空 TMPSET。在第10行至第17行之間的迴圈,係依儲存在NSET中的每一轉移 函式NXi執行一次。在第11行,將NXi的次狀態指定至NX_ST。在第13行至第16行之間的迴圈,係依NX_ST開始的每一轉移函式NXj執行一次。在第14行,將轉移函式NXj與轉移函式NXi串接,得到新的轉移函式NEW_TR。在第15行,將NEW_TR加至TMPSET中。其中,轉移函式NEW_TR的樣版字元的數目會比轉移函式NXi的樣版字元的數目多一個。 在前述串接過程中,所推導出來的多字元轉移函式可能全部都是由輔助轉移函式所構成,在第22行中,將沒用的轉移函式移除掉。The loop between line 7 and line 19 is then discussed further. On line 8, empty TMPSET. The loop between line 10 and line 17 is based on each transfer stored in NSET The function NXi is executed once. On line 11, assign the secondary state of NXi to NX_ST. The loop between lines 13 to 16 is executed once for each transfer function NXj starting with NX_ST. In line 14, the transfer function NXj is concatenated with the transfer function NXi to obtain a new transfer function NEW_TR. On line 15, add NEW_TR to TMPSET. Among them, the number of pattern characters of the transfer function NEW_TR will be one more than the number of pattern characters of the transfer function NXi. In the foregoing concatenation process, the derived multi-word transfer functions may all be composed of auxiliary transfer functions. In the 22nd line, the useless transfer function is removed.

依據前述演算法的說明,推導出多字元轉移函式之後,即可 根據推導出的多字元轉移函式產生應用於本發明的字串比對裝置的多字元轉移規則。建立k字元的轉移規則後,還需將規則的順序予以重新排列,其中較後面的階層所設定的規則具有較高的優先順序,例如階層7的優先順序最高,而階層1的優先順序最低。在同一個階層中,則是完全匹配(exact matching)的樣版(pattern)的優先順序要在部分匹配的樣版之前。例如在圖3(a)-3(b)的規則表中,其係將由同一個狀態(表中的p_st欄位)開始的規則擺在一起,例如規則3和4都是由狀態11開始,規則3的樣版字串為"ygo",而規則4的樣版字串為"y?‘?",後者的最後兩個字元是不比對的,所以規則4的優先順序較低。According to the description of the foregoing algorithm, after deriving the multi-word transfer function, A multi-word transfer rule applied to the string comparison device of the present invention is generated based on the derived multi-word transfer function. After the k-character transfer rule is established, the order of the rules needs to be rearranged, wherein the rules set by the later classes have higher priority, for example, the priority of the hierarchy 7 is the highest, and the priority of the hierarchy 1 is the lowest. . In the same hierarchy, the pattern of exact matching patterns is prior to the partially matched pattern. For example, in the rule table of Figures 3(a)-3(b), it is a rule that starts with the same state (p_st field in the table), for example, rules 3 and 4 start with state 11, The pattern string of rule 3 is "ygo", and the pattern string of rule 4 is "y?'?", the last two characters of the latter are not aligned, so rule 4 has a lower priority.

一個簡單的作法,是先依階層排序,然後於相同階層中,再 依據樣版遮罩PMASK的二進位數值排序-數值大的優先順序較高。例如規則3的PMASK為"1111",而規則4的PMASK為"1100","1100"小於"1111",所以規則4排在後面。雖然,在前述實施例的規則並非依照樣版遮罩PMASK的二進位數值排序,但本發明技術領域具通常知識者應可很容易地依此法 實施。A simple approach is to sort by the hierarchy and then in the same class, then Sorting according to the binary value of the pattern mask PMASK - the numerical order has a higher priority. For example, the PMASK of rule 3 is "1111", while the PMASK of rule 4 is "1100", and "1100" is less than "1111", so rule 4 is listed later. Although the rules of the foregoing embodiments are not ordered according to the binary values of the pattern mask PMASK, those skilled in the art of the present invention should be able to easily follow the method. Implementation.

對於1字元轉移函式,每一個轉移函式的比對輸出(matching output)均可以其次狀態代表。然而對於多字元轉移函式,例如k字元轉移函式,其中k>1,假若每一k字元轉移函式只輸出次狀態,則關於其中間所經過的狀態的資訊會被隱藏起來。在這樣的情況下,只能知道對應於最後一個輸入字元的比對輸出,對應於前面k-1個輸入字元的比對輸出將會被遺漏掉。因此,在使用前述的演算法來推導多字元轉移函式時,在串接的過程中必須保留對應於每一字元的比對結果。雖然這部分在前述演算法的說明中並未提及,但本發明技術領域具通常知識者應能輕易達成。For the 1-character transfer function, the comparison output of each transfer function (matching Output) can be represented by the second state. However, for a multi-character transfer function, such as a k-character transfer function, where k>1, if each k-character transfer function outputs only the secondary state, information about the state passing through it will be hidden. . In such a case, only the aligned output corresponding to the last input character can be known, and the aligned output corresponding to the previous k-1 input characters will be missed. Therefore, when using the aforementioned algorithm to derive a multi-word transfer function, the alignment result corresponding to each character must be retained in the process of concatenation. Although this part is not mentioned in the description of the aforementioned algorithm, the person skilled in the art of the present invention should be able to easily achieve it.

依前述之原理為基礎,本發明提出一可彈性設定資料寬度的 多字元字串比對架構,以提升字串比對效率。Based on the foregoing principles, the present invention proposes an elastically configurable data width Multi-word string alignment architecture to improve string alignment efficiency.

請參照圖12,其繪示本發明可彈性設定資料寬度之多字元字 串比對裝置一較佳實施例之方塊圖。如圖12所示,該裝置包括一規則電路1210、一狀態電路1220、及一輸出電路1230。Please refer to FIG. 12, which illustrates the multi-character word of the flexible width of the present invention. A block diagram of a preferred embodiment of a string comparison device. As shown in FIG. 12, the device includes a regular circuit 1210, a state circuit 1220, and an output circuit 1230.

規則電路1210具有M個通用規則單元12101。請參照圖13, 其繪示通用規則單元12101一實施例之方塊圖。如圖13所示,通用規則單元12101具有至少一字串輸入端IN-CHRS、複數個次態輸入端NXI 、複數個次態輸出端NXO 、複數個比對輸出端OP、一第一併接輸入端CI、一第二併接輸入端EI、一第一併接輸出端CO、以及一第二併接輸出端EO,且各所述通用規則單元12101各具有一轉移規則執行單元121011以及一邏輯電路121012。The rules circuit 1210 has M general rule units 12101. Please refer to FIG. 13, which illustrates a block diagram of an embodiment of a general rule unit 12101. As shown in FIG. 13, the general rule unit 12101 has at least one string input terminal IN-CHRS, a plurality of secondary state input terminals N XI , a plurality of secondary state output terminals N XO , a plurality of comparison output terminals OP, and a first The input terminal CI, a second parallel input terminal EI, a first parallel output terminal CO, and a second parallel output terminal EO, and each of the general rule units 12101 each have a transfer rule execution unit 121011 And a logic circuit 121012.

該轉移規則執行單元121011係依所述至少一字串輸入端 IN-CHRS及所述複數個次態輸入端NXI 之輸入資料執行以一AC-trie為基礎所建立之一轉移規則而產生一比對結果M及在所述複數個次態輸出端NXO 及所述複數個比對輸出端OP輸出資料,該轉移規則執行單元121011具有一暫存單元(未示於圖中)以儲存一規則資料、一起始旗標F及一最終旗標L。The transfer rule execution unit 121011 generates a transfer rule based on the AC-trie based on the input data of the at least one string input terminal IN-CHRS and the plurality of secondary input terminals N XI to generate a transfer rule. Comparing the result M and outputting data at the plurality of secondary output terminals N XO and the plurality of comparison output terminals OP, the transfer rule execution unit 121011 has a temporary storage unit (not shown) for storing one Rule data, a starting flag F and a final flag L.

該邏輯電路121012係依該第一併接輸入端CI之輸入信號、該 第二併接輸入端EI之輸入信號、該起始旗標F、該最終旗標L、以及該比對結果M進行一邏輯運算以在該第一併接輸出端CO和該第二併接輸出端EO分別產生一輸出信號。The logic circuit 121012 is based on the input signal of the first parallel input terminal CI, The second parallel input signal EI, the start flag F, the final flag L, and the comparison result M are logically operated to connect the first parallel output terminal CO and the second parallel The output EO generates an output signal, respectively.

請參照圖14,其繪示用以實施轉移規則執行單元121011和邏 輯電路121012之一電路圖。如圖14所示,轉移規則執行單元121011係藉由一處理單元121011a、一多工器121011b、一多工器121011c、一解多工器121011d、以及一解多工器121011e執行以一AC-trie為基礎所建立之一轉移規則而產生一比對結果M及在所述複數個次態輸出端NXO 及所述複數個比對輸出端OP輸出資料。處理單元121011a內部具有一暫存單元(未示於圖中)以儲存一規則資料、一起始旗標F及一最終旗標L。處理單元121011a會將輸入的CUR_ST、XC1、及XC2與其儲存之規則資料比對,當其為符合時,則比對結果M為1,否則比對結果M為0。Please refer to FIG. 14 , which illustrates a circuit diagram for implementing the transfer rule execution unit 121011 and the logic circuit 121012. As shown in FIG. 14, the transfer rule execution unit 121011 is executed by a processing unit 121011a, a multiplexer 121011b, a multiplexer 121011c, a demultiplexer 121011d, and a demultiplexer 121011e. Based on one of the transfer rules established by the trie, a comparison result M is generated and the data is outputted at the plurality of secondary output terminals N XO and the plurality of comparison outputs OP. The processing unit 121011a internally has a temporary storage unit (not shown) for storing a rule data, a start flag F and a final flag L. The processing unit 121011a compares the input CUR_ST, XC1, and XC2 with the stored rule data. When it is a match, the comparison result M is 1, otherwise the comparison result M is 0.

處理單元121011a之信號ST_SEL用以控制多工器121011b以 選擇對應之狀態值,當作輸入之當前狀態CUR_ST。信號NX_SEL用以控制解多工器121011d以將次態NX_ST送至對應之輸出。SEQ_NO用以控制多工器121011c以選擇對應之輸入字元,以及控制解多工器121011e以將比對輸出XOP1及XOP2送至對應之輸出。其中,解多工器121011d及121011e在信號 EN為1才輸出資料。The signal ST_SEL of the processing unit 121011a is used to control the multiplexer 121011b to Select the corresponding status value as the current status of the input CUR_ST. The signal NX_SEL is used to control the demultiplexer 121011d to send the secondary NX_ST to the corresponding output. SEQ_NO is used to control the multiplexer 121011c to select the corresponding input character, and to control the demultiplexer 121011e to send the comparison outputs XOP1 and XOP2 to the corresponding outputs. Among them, the solution multiplexer 121011d and 121011e are in the signal The output is only when EN is 1.

邏輯電路121012主要係用以執行一邏輯運算,其內容如下:CO=/L AND M AND(F OR(CI AND/F)),以及EO=/F AND((EI AND/L)OR(CI AND M AND L)),其中/代表反相運算子,AND代表邏輯且運算子,OR代表邏輯或運算子。The logic circuit 121012 is mainly used to perform a logic operation, and its contents are as follows: CO=/L AND M AND(F OR(CI AND/F)), and EO=/F AND((EI AND/L)OR(CI AND M AND L)), where / represents an inverting operator, AND represents a logical and operator, and OR represents a logical OR operator.

於操作時,所述複數個通用規則單元12101係分成複數個子群,且各所述子群內之複數個所述通用規則單元12101係藉由所述第一併接輸入端CI、所述第一併接輸出端CO、所述第二併接輸入端EI、和所述第二併接輸出端EO互相連接。In operation, the plurality of general rule units 12101 are divided into a plurality of subgroups, and the plurality of the common rule units 12101 in each of the subgroups are by the first parallel input terminal CI, the first The output terminal CO, the second parallel input terminal EI, and the second parallel output terminal EO are connected to each other.

為更進一步說明通用規則單元之作用,以下將以一範例來說明。每一通用規則單元可處理2個輸入字元,於此範例中,將3個通用規則單元連接成一群組,因而可用以處理一6字元之轉移規則,在此範例為NX 6 (0,happyg )=15,其中對應於第5個輸入字元有一符合的比對輸出OP5,其值為12。此6字元之轉移規則係依前述演算法所產生。請參照圖15,其為3個通用規則單元12101藉由所述第一併接輸入端CI、所述第一併接輸出端CO、所述第二併接輸入端EI、和所述第二併接輸出端EO互相連接而形成一子群以一次比對6個字元(C1-C6)的示意圖。如圖15所示,F=1的通用規則單元12101係起始通用規則單元,即規則單元1,其SEQ_NO=1,故選擇字元C1及C2為輸入字元。中間的通用規則單元12101,即規則單元2,其F及L皆為0,且其SEQ_NO=2,故選擇字元C3及C4為輸入字元。而L=1的通用規則單元12101則為最終通用規則單元,即規則單元2,其SEQ_NO=3,故選擇字元C5及C6為輸入字元。且當前述3個規則單元之比對結果為符合時,只有 最終通用規則單元要輸出次態NX。藉由此方式,吾人即可併接出所需的資料寬度以進行一多字元字串比對程序。To further illustrate the role of the general rule unit, an example will be described below. Each general rule unit can process 2 input characters. In this example, 3 general rule units are connected into a group, and thus can be used to process a 6-character transfer rule, in this example NX 6 (0, Happyg )=15, where a corresponding alignment output OP5 corresponding to the fifth input character has a value of 12. This 6-character transfer rule is generated according to the aforementioned algorithm. Referring to FIG. 15 , the three general rule units 12101 are configured by the first parallel input terminal CI, the first parallel output terminal CO, the second parallel input terminal EI, and the second A schematic diagram in which the output terminals EO are connected to each other to form a subgroup to compare six characters (C1-C6) at a time. As shown in FIG. 15, the general rule unit 12101 of F=1 is the start general rule unit, that is, the rule unit 1, whose SEQ_NO=1, so that the selected characters C1 and C2 are input characters. The intermediate general rule unit 12101, that is, the rule unit 2, has both F and L 0, and its SEQ_NO=2, so the selected characters C3 and C4 are input characters. The general rule unit 12101 of L=1 is the final general rule unit, that is, the rule unit 2, and SEQ_NO=3, so the selected characters C5 and C6 are input characters. And when the comparison result of the foregoing three rule units is met, only the final general rule unit outputs the state NX. In this way, we can pick up the required data width to perform a multi-character string comparison program.

另外,狀態電路1220具有複數個輸入端以耦合所述複數個通 用規則單元之所述複數個次態輸出端NXO ,及複數個輸出端以耦合所述複數個通用規則單元之所述複數個次態輸入端NXI 。狀態電路1220係依所述複數個次態輸出端NXO 之一優先順序決定所述複數個次態輸入端NXI 的狀態值,而該優先順序係依所述的轉移規則而決定。狀態電路1220具有暫存單元(未示於圖中)以儲存儲存該些所決定之次態之狀態值。輸出電路1230具有複數個輸入端以耦合所述複數個通用規則單元之所述複數個比對輸出端OP,及複數個輸出端以產生複數個比對輸出信號MOP。輸出電路1230係依所述複數個比對輸出端OP之一優先順序決定所述複數個比對輸出信號MOP的內容,而該優先順序亦係依所述的轉移規則而決定。Additionally, state circuit 1220 has a plurality of inputs coupled to said plurality of secondary output terminals N XO of said plurality of general rule units, and a plurality of outputs coupled to said plurality of said plurality of general rule units Secondary input N XI . The state circuit 1220 determines the state values of the plurality of secondary inputs N XI in accordance with one of the plurality of secondary outputs N XO , and the priority is determined according to the transfer rule. The state circuit 1220 has a temporary storage unit (not shown) for storing state values for storing the determined secondary states. The output circuit 1230 has a plurality of inputs for coupling the plurality of aligned output terminals OP of the plurality of general rule units, and a plurality of outputs for generating a plurality of aligned output signals MOP. The output circuit 1230 determines the content of the plurality of comparison output signals MOP according to a priority order of the plurality of comparison output terminals OP, and the priority order is also determined according to the transfer rule.

至此,本發明之內容已詳盡揭露,而依其可彈性設定資料寬 度之設計,本發明確可依需要彈性設定要比對的字元數目以充份發揮硬體的效能。So far, the content of the present invention has been disclosed in detail, and the data width can be flexibly set according to its According to the design of the degree, the present invention can flexibly set the number of characters to be matched as needed to fully exert the performance of the hardware.

本案所揭示者,乃較佳實施例,舉凡局部之變更或修飾而源 於本案之技術思想而為熟習該項技藝之人所易於推知者,俱不脫本案之專利權範疇。The present invention is the preferred embodiment, and the source is changed or modified locally. Those who are acquainted with the technical idea of the case and who are familiar with the skill can not deduct the patent right of the case.

綜上所陳,本案無論就目的、手段與功效,在在顯示其迥異 於習知之技術特徵,且其首先發明合於實用,亦在在符合發明之專利要件,懇請 貴審查委員明察,並祈早日賜予專利,俾嘉惠社會,實感德便。In summary, this case shows its difference in terms of purpose, means and function. In the technical characteristics of Xizhi, and its first invention is practical and practical, it is also in compliance with the patent requirements of the invention. Please ask your review committee to inspect it and pray for an early patent.

12101‧‧‧通用規則單元12101‧‧‧General Rule Unit

Claims (2)

一種可彈性設定資料寬度之多字元字串比對裝置,其具有:一規則電路,其具有複數個通用規則單元,各所述通用規則單元均具有至少一字串輸入端、複數個次態輸入端、複數個次態輸出端、複數個比對輸出端、一第一併接輸入端、一第二併接輸入端、一第一併接輸出端、以及一第二併接輸出端,且各所述通用規則單元各具有一轉移規則執行單元以及一邏輯電路,該轉移規則執行單元係依所述至少一字串輸入端及所述複數個次態輸入端之輸入資料執行以一AC-trie為基礎所建立之一轉移規則而產生一比對結果及在所述複數個次態輸出端及所述複數個比對輸出端輸出資料,該轉移規則執行單元具有一暫存單元以儲存一起始旗標及一最終旗標,該邏輯電路係依該第一併接輸入端之輸入信號、該第二併接輸入端之輸入信號、該起始旗標、該最終旗標、以及該比對結果進行一邏輯運算以在該第一併接輸出端和該第二併接輸出端分別產生一輸出信號,其中,所述複數個通用規則單元係分成複數個子群,且各所述子群內之複數個所述通用規則單元係藉由所述第一併接輸入端、所述第一併接輸出端、所述第二併接輸入端、和所述第二併接輸出端互相連接;一狀態電路,具有複數個輸入端以耦合所述複數個通用規則單元之所述複數個次態輸出端,及複數個輸出端以耦合所述複數個通用規則單元之所述複數個次態輸入端;以及一輸出電路,具有複數個輸入端以耦合所述複數個通用規則單元之所述複數個比對輸出端,及複數個輸出端以產生複數個比對輸出信號。A multi-character string matching device capable of flexibly setting a data width, comprising: a rule circuit having a plurality of general rule units, each of the general rule units having at least one string input end and a plurality of sub-states An input end, a plurality of secondary output terminals, a plurality of comparison output terminals, a first parallel connection input terminal, a second parallel connection input terminal, a first parallel connection output terminal, and a second parallel connection output terminal, Each of the general rule units has a transfer rule execution unit and a logic circuit, and the transfer rule execution unit performs an AC according to the input data of the at least one string input terminal and the plurality of secondary state input terminals. -trie generates a comparison result based on one of the transfer rules, and outputs data at the plurality of secondary output terminals and the plurality of comparison outputs, the transfer rule execution unit has a temporary storage unit for storing a start flag and a final flag, the logic circuit is based on the input signal of the first parallel input, the input signal of the second parallel input, the start flag, the final flag, and the Performing a logic operation on the comparison result to generate an output signal at the first parallel output end and the second parallel output end, wherein the plurality of general rule units are divided into a plurality of subgroups, and each of the sub-groups The plurality of the general rule units in the group are mutually coupled by the first parallel input, the first parallel output, the second parallel input, and the second parallel output a state circuit having a plurality of inputs for coupling the plurality of secondary outputs of the plurality of general rule units, and a plurality of outputs for coupling the plurality of times of the plurality of general rule units And an output circuit having a plurality of inputs for coupling the plurality of aligned outputs of the plurality of general rule units and a plurality of outputs for generating a plurality of aligned output signals. 如申請專利範圍第1項所述之可彈性設定資料寬度之多字元字串比對裝置,其中所述的邏輯運算為:CO=/L AND M AND(F OR(CI AND/F)),以及EO=/F AND((EI AND/L)OR(CI AND M AND L)),其中CO代表所述第一併接輸出端的輸出信號,EO代表所述第二併接輸出端的輸出信號,CI代表所述第一併接輸入端的輸入信號,EI代表所述第二併接輸入端的輸入信號,F代表所述起始旗標,L代表所述最終旗標,M代表所述比對結果,/代表反相運算子,AND代表邏輯且運算子,OR代表邏輯或運算子。The multi-character string alignment device capable of flexibly setting a data width as described in claim 1, wherein the logical operation is: CO=/L AND M AND(F OR(CI AND/F)) And EO=/F AND((EI AND/L)OR(CI AND M AND L)), wherein CO represents an output signal of the first parallel output, and EO represents an output signal of the second parallel output CI represents the input signal of the first parallel input, EI represents the input signal of the second parallel input, F represents the start flag, L represents the final flag, and M represents the comparison As a result, / represents an inverting operator, AND represents a logical and operator, and OR represents a logical OR operator.
TW103143825A 2014-12-16 2014-12-16 Can flexibly set the data width of the multi-character string alignment device TWI509441B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW103143825A TWI509441B (en) 2014-12-16 2014-12-16 Can flexibly set the data width of the multi-character string alignment device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW103143825A TWI509441B (en) 2014-12-16 2014-12-16 Can flexibly set the data width of the multi-character string alignment device

Publications (2)

Publication Number Publication Date
TWI509441B true TWI509441B (en) 2015-11-21
TW201624315A TW201624315A (en) 2016-07-01

Family

ID=55220163

Family Applications (1)

Application Number Title Priority Date Filing Date
TW103143825A TWI509441B (en) 2014-12-16 2014-12-16 Can flexibly set the data width of the multi-character string alignment device

Country Status (1)

Country Link
TW (1) TWI509441B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5784298A (en) * 1993-10-15 1998-07-21 International Business Machines Corporation Apparatus and method for using finite state machines (FSMs) to monitor a serial data stream for characteristic patterns
TW201007497A (en) * 2008-08-01 2010-02-16 Univ Ishou Bidirectional string-matching method and string-searching method for intrusion detection system
US20110145544A1 (en) * 2009-12-15 2011-06-16 Micron Technology, Inc. Multi-level hierarchical routing matrices for pattern-recognition processors
TW201308110A (en) * 2011-08-11 2013-02-16 Univ Nat Taiwan Multi-level parallel multi-character string comparison device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5784298A (en) * 1993-10-15 1998-07-21 International Business Machines Corporation Apparatus and method for using finite state machines (FSMs) to monitor a serial data stream for characteristic patterns
TW201007497A (en) * 2008-08-01 2010-02-16 Univ Ishou Bidirectional string-matching method and string-searching method for intrusion detection system
US20110145544A1 (en) * 2009-12-15 2011-06-16 Micron Technology, Inc. Multi-level hierarchical routing matrices for pattern-recognition processors
TW201308110A (en) * 2011-08-11 2013-02-16 Univ Nat Taiwan Multi-level parallel multi-character string comparison device

Also Published As

Publication number Publication date
TW201624315A (en) 2016-07-01

Similar Documents

Publication Publication Date Title
US8972450B2 (en) Multi-stage parallel multi-character string matching device
Lee et al. Testing finite-state machines: State identification and verification
US7725510B2 (en) Method and system for multi-character multi-pattern pattern matching
CN106797446B (en) Memory-based history search
US8843508B2 (en) System and method for regular expression matching with multi-strings and intervals
US8504510B2 (en) State machine compression for scalable pattern matching
Le et al. A memory-efficient and modular approach for large-scale string pattern matching
US20140244554A1 (en) Non-deterministic finite state machine module for use in a regular expression matching system
CN108021569A (en) The structure of AC automatic machines and Chinese multi-model matching method and relevant apparatus
CN102736888B (en) With the data retrieval circuit of synchronization of data streams
CN103226551B (en) The matching process of the non deterministic finite automaton based on TCAM and device
TW200921435A (en) Apparatus, method and system for performing a rule matching on a datastream
JP5120263B2 (en) Pattern matching apparatus and method
Kari Automata and formal languages
US8463988B2 (en) System and method for matching patterns
TWI509441B (en) Can flexibly set the data width of the multi-character string alignment device
CN111651137A (en) Sorting method, device, electronic device and computer equipment
KR101276796B1 (en) Apparatus and method for matching pattern
TWI521364B (en) Multi-stage parallel multi-character string matching device
TWI443538B (en) Multi - hierarchical parallel multi - character string alignment device
Tadaki Phase transition and strong predictability
CN104008136A (en) Method and device for text searching
TWI417784B (en) Parallel multiple-character string matching device
Schewe et al. Tight bounds for complementing parity automata
Kumar et al. Fast map-addressing for content addressable memories using register reconfiguration

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees