HK1208104B - A method and computer system of compiling a pattern into a non-deterministic finite automata (nfa) graph - Google Patents
A method and computer system of compiling a pattern into a non-deterministic finite automata (nfa) graph Download PDFInfo
- Publication number
- HK1208104B HK1208104B HK15108608.0A HK15108608A HK1208104B HK 1208104 B HK1208104 B HK 1208104B HK 15108608 A HK15108608 A HK 15108608A HK 1208104 B HK1208104 B HK 1208104B
- Authority
- HK
- Hong Kong
- Prior art keywords
- node
- count
- type
- pattern
- character
- Prior art date
Links
Description
技术领域Technical Field
本发明的各实施例涉及为具有高级特征的正则表达式图样生成非确定有限自动机(NFA)图形。Embodiments of the present invention are directed to generating non-deterministic finite automaton (NFA) graphs for regular expression patterns having high-level features.
背景技术Background Art
开放系统互连(OSI)参考模型定义了用于通过传输介质进行通信的7个网络协议层(L1-L7)。上层(L4-L7)表示端到端通信并且下层(L1-L3)表示本地通信。The Open Systems Interconnection (OSI) reference model defines seven network protocol layers (L1-L7) for communication over a transmission medium. The upper layers (L4-L7) represent end-to-end communication and the lower layers (L1-L3) represent local communication.
联网应用感知系统需要处理、过滤和切换L3到L7网络协议层的范围,例如,L7网络协议层诸如超文本传输协议(HTTP)和简单邮件传输协议(SMTP),以及L4网络协议层诸如传输控制协议(TCP)。除了处理网络协议层以外,联网应用感知系统需要通过 L4-L7网络协议层来同时通过基于访问和内容的安全性来保护这些协议,这些协议层包括防火墙、虚拟专用网(VPN)、安全套接字层(SSL)、入侵检测系统(IDS)、互联网协议安全(IPSec)、线速的防病毒(AV)和防垃圾邮件功能。线速是在其上传输和接收数据的网络的物理介质上的数据传送速率。Networked application-aware systems need to process, filter, and switch a range of L3 to L7 network protocol layers, for example, L7 network protocol layers such as Hypertext Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), and L4 network protocol layers such as Transmission Control Protocol (TCP). In addition to processing network protocol layers, networked application-aware systems need to protect these protocols through both access-based and content-based security across L4-L7 network protocol layers, including firewalls, virtual private networks (VPNs), secure sockets layers (SSL), intrusion detection systems (IDS), Internet Protocol Security (IPSec), and line-speed antivirus (AV) and anti-spam capabilities. Line speed is the data transfer rate on the physical medium of the network over which data is transmitted and received.
网络处理器可用于高吞吐量L2和L3网络协议处理,即,执行数据包处理从而以线速转发数据包。通常,通用处理器用于处理需要更多智能处理的L4-L7网络协议。虽然通用处理器可以执行计算密集型任务,但是没有足够用于处理数据使得其能够被以线速转发的性能。Network processors are used for high-throughput Layer 2 and Layer 3 network protocol processing, meaning they perform packet processing to forward packets at line speed. Typically, general-purpose processors are used to process Layer 4-7 network protocols, which require more intelligent processing. While general-purpose processors can perform computationally intensive tasks, they lack the performance to process data at line speed.
内容感知联网需要以“线速”的对数据包的内容的检查。可以对内容进行分析,以确定是否存在安全漏洞或入侵。应用大量正则表达式形式的图样和规则以确保所有的安全漏洞或入侵被检测到。正则表达式是用于描述值/字符/字母串中的图样的紧凑型方法。由正则表达式所匹配的最简单图样是单个值/字符/字母或值/字符/字母串,例如,/c/或/cat/。正则表达式还包括具有特殊含义的运算符和元字符。Content-aware networking requires inspection of the contents of data packets at "wire speed". The content can be analyzed to determine if there are security vulnerabilities or intrusions. A large number of patterns and rules in the form of regular expressions are applied to ensure that all security vulnerabilities or intrusions are detected. Regular expressions are a compact method for describing patterns in strings of values/characters/letters. The simplest pattern matched by a regular expression is a single value/character/letter or a string of values/characters/letters, for example, /c/ or /cat/. Regular expressions also include operators and metacharacters with special meanings.
通过使用元字符,正则表达式可以用于更复杂的搜索,诸如“abc.*xyz”。即,在“abc”和“xyz”之间的无限量字符的情况下,找到字符串“abc”,之后是字符串“xyz”。另一示例是正则表达式“abc..abc.*xyz;”,即,找到字符串“abc”,后面两个字符,然后是字符串“abc”并且在无限量字符后由字符串“xyz”跟随。By using metacharacters, regular expressions can be used for more complex searches, such as "abc.*xyz". That is, with an unlimited number of characters between "abc" and "xyz", find the string "abc" followed by the string "xyz". Another example is the regular expression "abc..abc.*xyz;", that is, find the string "abc", followed by two characters, then the string "abc" and followed by the string "xyz" after an unlimited number of characters.
入侵检测系统(IDS)应用检查所有流过网络的单独数据包的内容,并且标识可能指示尝试闯入或威胁系统的可疑图样。可疑图样的一个示例可以是数据包中的特定文本串,该特定文本串在100 个字符以后跟随另一特定文本串。通常使用搜索算法(如用于处理正则表达式的确定有限自动机(DFA)或非确定有限自动机(NFA)) 执行内容搜索。Intrusion detection system (IDS) applications examine the contents of all individual packets flowing through a network and identify suspicious patterns that could indicate an attempt to break into or compromise a system. An example of a suspicious pattern might be a specific string of text within a packet that is followed 100 characters later by another specific string of text. Content searches are typically performed using search algorithms, such as deterministic finite automata (DFA) or nondeterministic finite automata (NFA) for processing regular expressions.
发明内容Summary of the Invention
在一个实施例中,一种将图样编译成非确定有限自动机 (NFA)图形的方法包括对该图样进行多个元素和多种节点类型检查。每种节点类型与一个元素相对应。可以对该图样的每个元素进行匹配至少零次。该方法进一步包括生成该NFA图形的多个节点。该多个节点中的每个节点可以被配置成用于针对该多个元素其中之一进行匹配。该元素可以指示该NFA图形中的下一个节点地址、计数值、和/或与该元素相对应的节点类型。该节点还可以指示该元素表示字符、字符类或字符串。字符还可以是一个值或一个字母。In one embodiment, a method for compiling a pattern into a non-deterministic finite automaton (NFA) graph includes checking the pattern for multiple elements and multiple node types. Each node type corresponds to an element. Each element of the pattern can be matched at least zero times. The method further includes generating multiple nodes of the NFA graph. Each of the multiple nodes can be configured to match one of the multiple elements. The element can indicate the next node address in the NFA graph, a count value, and/or a node type corresponding to the element. The node can also indicate that the element represents a character, a character class, or a string. A character can also be a value or a letter.
在一个实施例中,该节点类型是以下各项中的至少一项:可变计数、固定计数、固定计数和可变计数、字符、不区分大小写的字符、字符类、字符串、不区分大小写的字符串、标记的和分离的。In one embodiment, the node type is at least one of: variable count, fixed count, fixed count and variable count, character, case-insensitive character, character class, string, case-insensitive string, tokenized, and separated.
在一个实施例中,对该图样进行该节点类型检查可以包括针对以下各节点类型的指示中的至少一种对该图样进行搜索:可变计数节点类型、固定计数节点类型、固定和可变计数节点类型、字符类以及字符串。In one embodiment, checking the pattern for the node type may include searching the pattern for indications of at least one of the following node types: variable count node type, fixed count node type, fixed and variable count node types, character class, and string.
在一个实施例中,字符串节点类型可以表示多个值、字母、字符、或其他数据类型的图样。对该图样进行该字符串节点类型检查可以包括确定该图样指示多个连续值。对该图样进行该字符串节点类型检查可以进一步包括确定这些连续值没有介于多种节点类型中间。每个值可以是一个字节、字母、和/或字符。In one embodiment, a string node type can represent a pattern of multiple values, letters, characters, or other data types. Performing a string node type check on the pattern can include determining that the pattern indicates multiple consecutive values. Performing a string node type check on the pattern can further include determining that the consecutive values are not interposed between multiple node types. Each value can be a byte, letter, and/or character.
在一个实施例中,对该图样进行该可变计数节点类型检查可以包括确定该图样指示针对该元素进行匹配可变次数。对该图样进行该可变计数节点类型检查可以进一步包括确定该图样指示针对该元素进行匹配至少零次。In one embodiment, performing the variable count node type check on the pattern may include determining that the pattern indicates matching the element a variable number of times. Performing the variable count node type check on the pattern may further include determining that the pattern indicates matching the element at least zero times.
在一个实施例中,一种固定计数节点类型表示有待针对一个元素进行匹配固定次数的图样。该固定计数和可变计数节点类型可以表示有待针对一个元素进行匹配固定次数然后可变次数的图样。该可变次数可以是有限次数或无限次数。对该图样进行该固定计数和可变计数节点类型检查可以包括确定该图样指示针对该元素进行匹配至少一次和最多固定次数或无限次数。该图样包括一个表示无限的符号,该符号用于触发确定该图样指示针对该元素进行匹配该无限次数。In one embodiment, a fixed count node type represents a pattern to be matched a fixed number of times for an element. The fixed count and variable count node types can represent a pattern to be matched a fixed number of times and then a variable number of times for an element. The variable number of times can be a finite number of times or an infinite number of times. Checking the pattern for the fixed count and variable count node types can include determining that the pattern indicates that the element is to be matched at least once and at most a fixed number of times or an infinite number of times. The pattern includes a symbol representing infinity, which is used to trigger a determination that the pattern indicates that the element is to be matched an infinite number of times.
在一个实施例中,一种字符类节点类型可以表示至少一个值的布尔或运算。该方法可以进一步包括存储每个字符类作为掩码。如果该掩码中的每个可能的值/字符/字母为该字符类的一部分,则可以设置该值/字符/字母,如果其不是该字符类的一部分,则不设置。每个节点可以包括一个与唯一字符类相对应的字符类索引。字符类编号和有效载荷段可以被用作该掩码的索引,从而使得如果设置所标引的条目,则图形行走引擎可以确定该有效载荷与该字符类匹配。In one embodiment, a character class node type may represent a Boolean OR operation of at least one value. The method may further include storing each character class as a mask. Each possible value/character/letter in the mask may be set if the value/character/letter is part of the character class, and not set if it is not part of the character class. Each node may include a character class index corresponding to a unique character class. The character class number and payload segment may be used as an index into the mask so that if the indexed entry is set, the graph walking engine can determine that the payload matches the character class.
在一个实施例中,该可变计数节点类型可以进一步指示该节点为一个贪婪节点、懒惰节点、领属节点、或全匹配节点。该贪婪匹配类型和领属匹配类型的可变计数节点类型可以针对该有效载荷中的最长可能的匹配而进行匹配。然而,当到达该有效载荷的末端后,领属匹配类型的节点类型不回溯,而贪婪匹配类型的节点类型却回溯。懒惰匹配类型的可变计数类型的节点可以针对该有效载荷中的最短可能的匹配而进行匹配。全节点匹配类型的可变计数类型的节点可以针对该有效载荷中的所有匹配而进行匹配。In one embodiment, the variable count node type can further indicate that the node is a greedy node, a lazy node, a possessive node, or a full match node. The greedy match type and the possessive match type's variable count node type can be matched for the longest possible match in the payload. However, after reaching the end of the payload, the possessive match type's node type does not backtrack, while the greedy match type's node type does backtrack. The lazy match type's variable count node type can be matched for the shortest possible match in the payload. The full node match type's variable count node type can be matched for all matches in the payload.
在一个实施例中,对该图样进行节点类型和元素检查包括标识该图样的至少一部分作为一种字符类节点类型和相应的元素。对该图样进行该节点类型和该元素检查可以包括标识该图样的至少两个部分作为一种字符类节点类型和相应的元素。该方法可以进一步包括为这些部分中的每个部分生成一个位图。该位图中的每个位可以表示一个与该元素匹配的值。每个位图可以与一个字符类索引相关联。该方法可以进一步包括,如果该至少两个部分中的一个第一部分和该至少两个部分中的一个第二部分具有同一个相应元素,则使该第一和第二部分与同一位图相关联。对该图样进行检查可以包括检查多个图样。该第一和第二部分可以在分开的图样中。In one embodiment, performing a node type and element check on the pattern includes identifying at least a portion of the pattern as a character class node type and a corresponding element. Performing a node type and element check on the pattern may include identifying at least two portions of the pattern as a character class node type and a corresponding element. The method may further include generating a bitmap for each of the portions. Each bit in the bitmap may represent a value that matches the element. Each bitmap may be associated with a character class index. The method may further include associating the first and second portions with the same bitmap if a first portion of the at least two portions and a second portion of the at least two portions have the same corresponding element. Checking the pattern may include checking multiple patterns. The first and second portions may be in separate patterns.
在一个实施例中,一种用于将图样编译成非确定有限自动机(NFA)图形的方法可以包括一个图样确定模块,该图样确定模块被配置成用于对该图样进行多个元素和多种节点类型检查。每种节点类型与一个元素相对应。可以对该图样的每个元素进行匹配至少零次。该系统可以进一步包括一个节点生成模块,该节点生成模块被配置成用于生成该NFA图形的多个节点。该多个节点中的每个节点可以被配置成用于针对该多个元素其中之一进行匹配。该节点可以指示与该元素相对应的节点类型。该节点可以进一步指示该 NFA图形中的下一个节点地址、计数值、以及该元素表示字符、字符类或字符串。字符还可以是一个值或一个字母。In one embodiment, a method for compiling a pattern into a non-deterministic finite automaton (NFA) graph may include a pattern determination module configured to perform multiple element and multiple node type checks on the pattern. Each node type corresponds to an element. Each element of the pattern can be matched at least zero times. The system may further include a node generation module configured to generate multiple nodes of the NFA graph. Each node in the multiple nodes can be configured to match one of the multiple elements. The node can indicate the node type corresponding to the element. The node can further indicate the next node address, count value, and the element representation character, character class, or string in the NFA graph. A character can also be a value or a letter.
可变计数节点为针对一个元素进行匹配可变次数的节点,该次数由一个范围(例如,零到五次)限定。可变计数节点可以具有四种特性中的一种:懒惰、贪婪、领属、或全匹配。可变计数懒惰节点被配置成用于找到该范围内的最短可能的元素匹配。可变计数领属节点被配置成用于找到该范围内的最长可能的元素匹配,而不在到达有效载荷的末端时回溯。可变计数贪婪节点被配置成用于找到该范围内的最长可能的元素匹配,其中,当到达有效载荷的末端时进行回溯。可变计数全匹配节点被配置成用于返回有效载荷中的所有匹配。A variable count node is a node that matches an element a variable number of times, where the number of matches is defined by a range (e.g., zero to five times). A variable count node can have one of four characteristics: lazy, greedy, possessive, or full match. A variable count lazy node is configured to find the shortest possible match for an element within the range. A variable count possessive node is configured to find the longest possible match for an element within the range without backtracking when the end of the payload is reached. A variable count greedy node is configured to find the longest possible match for an element within the range, where backtracking is performed when the end of the payload is reached. A variable count full match node is configured to return all matches in the payload.
固定计数节点针对一个元素进行匹配固定次数。固定计数和可变计数图样可以是被配置成用于针对一个范围进行匹配的可变计数图样的表达式,其中,该范围以高于零的数字开始。例如,针对一个元素进行匹配10到20次的可变计数图样可以被表达为针对该元素进行匹配十次的固定计数节点然后针对该元素进行匹配0到 10次的可变计数节点。字符串节点为按具体顺序针对字符串(字符集合)进行匹配的节点。A fixed count node matches an element a fixed number of times. Fixed count and variable count patterns can be expressions configured as variable count patterns that match a range, where the range starts with a number greater than zero. For example, a variable count pattern that matches an element 10 to 20 times can be expressed as a fixed count node that matches the element ten times and then a variable count node that matches the element 0 to 10 times. A string node is a node that matches a string (set of characters) in a specific order.
固定-可变计数节点为针对一个元素进行匹配固定次数然后针对同一元素进行匹配可变次数的节点。例如,考虑图样“b{2, 5}”,其针对字符元素“b”进行匹配2至5次。这种图样可以被编译成一个具有两个计数值的固定-可变计数节点。第一计数值指示该元素进行匹配的固定次数,并且第二计数值指示对该元素进行匹配的可变次数。该第二计数值可以是例如总的最大值(在本示例中,对该元素共进行五次匹配)或固定匹配完成之后(在本示例中,共三次,或最大可变次数五次与该第一计数两次之间的差值)的最大值。如下所述,固定-可变节点的处理与可变计数节点相同。A fixed-variable count node is a node that matches a fixed number of times for an element and then matches a variable number of times for the same element. For example, consider the pattern "b{2, 5}", which matches 2 to 5 times for the character element "b". This pattern can be compiled into a fixed-variable count node with two count values. The first count value indicates the fixed number of times that the element is matched, and the second count value indicates the variable number of times that the element is matched. The second count value can be, for example, a total maximum value (in this example, matching the element five times in total) or the maximum value after the fixed match is completed (in this example, three times in total, or the difference between the maximum variable number of times five times and the first count twice). As described below, the processing of fixed-variable nodes is identical to that of variable count nodes.
标记节点为指示在有效载荷中找到图样的匹配的节点。分离节点为对图形中两条路径之间的选择进行指示的节点。A marker node is a node that indicates a match was found for a pattern in the payload. A split node is a node that indicates a choice between two paths in the graph.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
从本发明的示例实施例的以下更具体的说明中上述内容将是明显的,如在这些附图中所展示的,其中,贯穿这些不同的视图,相似的参考字符是指相同的部分。附图不一定按比例,而是着重于展示本发明的实施例。The foregoing will be apparent from the following more particular description of example embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the invention.
图1A和图1B为包括网络服务处理器的示例安全装置的框图。1A and 1B are block diagrams of example security devices including a network services processor.
图2A分别为图1A和图1B中所示的网络服务处理器、或协议处理器的框图。FIG. 2A is a block diagram of the network service processor or the protocol processor shown in FIG. 1A and FIG. 1B , respectively.
图2B为框图,展示了图2A的引擎(例如,网络服务处理器)的环境的示例实施例。2B is a block diagram illustrating an example embodiment of an environment for the engine (eg, network services processor) of FIG. 2A .
图3A为图解,展示了NFA图形的示例实施例。FIG. 3A is a diagram illustrating an example embodiment of an NFA graph.
图3B为本发明所使用的NFA图形的示例实施例的图解。FIG. 3B is a diagram of an example embodiment of an NFA graph used with the present invention.
图3C为图解,展示了示出可以使用的其他类型的计数节点的NFA图形的示例实施例。FIG. 3C is a diagram illustrating an example embodiment of an NFA graph showing other types of counting nodes that may be used.
图4A为现有技术系统所使用的NFA图形的示例实施例。FIG. 4A is an example embodiment of an NFA graph used by prior art systems.
图4B为图解,展示了本发明所使用的NFA图形的示例实施例。FIG. 4B is a diagram illustrating an example embodiment of an NFA graph used in the present invention.
图4C为使用五个单独节点的图样“USPTO”的常规图形的示例实施例。FIG. 4C is an example embodiment of a conventional graph of the pattern “USPTO” using five separate nodes.
图4D展示了使用字符串节点的图形的示例实施例。FIG4D illustrates an example embodiment of a graph using string nodes.
图5为图解,展示了NFA图形的示例实施例,该图形展示了本发明的示例实施例。FIG. 5 is a diagram illustrating an example embodiment of an NFA graph illustrating an example embodiment of the present invention.
图6A为框图,展示了编译器处理图样的示例实施例。FIG6A is a block diagram illustrating an example embodiment of a compiler processing pattern.
图6B为图6A的图样产生的编译NFA图形的图解。FIG. 6B is a diagram of the compiled NFA graph generated by the pattern of FIG. 6A .
图7为框图,展示了对图样进行编译的示例实施例。FIG7 is a block diagram illustrating an example embodiment of compiling a pattern.
图8为流程图,展示了对图样进行编译的示例实施例。FIG8 is a flow chart illustrating an example embodiment of compiling a pattern.
图9为流程图,展示了图形行走引擎对节点进行处理的示例实施例。FIG9 is a flow chart illustrating an example embodiment of processing nodes by a graph walking engine.
图10为框图,展示了图形行走引擎对NFA图形的节点进行处理的示例实施例。FIG10 is a block diagram illustrating an example embodiment of a graph walking engine processing nodes of an NFA graph.
图11为流程图,展示了使本发明所使用的NFA图形行走的过程。FIG11 is a flow chart showing the process of walking the NFA graph used in the present invention.
图12为流程图,展示了对节点进行处理的示例实施例。FIG12 is a flow chart illustrating an example embodiment of processing a node.
图13为流程图,展示了对字符类节点进行处理的示例实施例。FIG13 is a flow chart showing an example embodiment of processing character nodes.
图14为流程图,展示了图形行走引擎对字符串节点进行处理的示例实施例。FIG14 is a flow chart showing an example embodiment of processing a string node by a graph walking engine.
图15A和图15B为流程图,展示了对固定计数节点进行处理的示例实施例。15A and 15B are flow charts illustrating example embodiments of processing fixed count nodes.
图16为流程图,展示了对可变计数节点进行处理的示例实施例。FIG16 is a flow chart illustrating an example embodiment of processing a variable count node.
图17为流程图,展示了对可变计数懒惰节点进行处理的示例实施例。FIG17 is a flow chart illustrating an example embodiment of processing a variable count lazy node.
图18为流程图,展示了对可变计数贪婪节点进行处理的示例实施例。FIG18 is a flow chart illustrating an example embodiment of processing a variable count greedy node.
图19为流程图,展示了对可变计数领属节点进行处理的示例实施例。FIG19 is a flow chart illustrating an example embodiment of processing a variable count owner node.
图20为流程图,展示了对可变计数全匹配节点进行处理的示例实施例。FIG20 is a flow chart illustrating an example embodiment of processing a variable count full match node.
图21为表,展示了字符类中所使用的位图/掩码的示例实施例。FIG. 21 is a table showing an example embodiment of bitmaps/masks used in character classes.
图22为表,展示了字符类匹配节点的格式。FIG22 is a table showing the format of a character class matching node.
图23为表,展示了字符串匹配节点的格式。FIG23 is a table showing the format of a string matching node.
图24为表,展示了固定计数匹配节点的格式。FIG24 is a table showing the format of a fixed count matching node.
图25为表,展示了可变计数匹配节点的格式。FIG25 is a table showing the format of a variable count match node.
图26为表,展示了字符类匹配堆栈条目的格式。Figure 26 is a table showing the format of a character class matching stack entry.
图27为表,展示了字符串匹配堆栈条目的格式。FIG27 is a table showing the format of a string matching stack entry.
图28为表,展示了固定计数匹配堆栈条目的格式。Figure 28 is a table showing the format of a fixed count match stack entry.
图29为表,展示了可变计数匹配堆栈条目的格式。FIG29 is a table showing the format of a variable count match stack entry.
具体实施方式DETAILED DESCRIPTION
以下是本发明的多个示例实施例的描述。What follows is a description of several example embodiments of the invention.
戈亚尔(Goyal)等人的被公开为美国公开号2013/0133064 的美国第13/303,855号申请“逆向NFA生成和处理(Reverse NFA Generation and Processing)”和戈亚尔(Goyal)等人的被公开为美国公开号2012/0221497的美国第13/168,395号申请“正则表达式处理自动机(Regular Expression Processing Automaton)”描述了NFA 和表达式匹配概念。以上申请的全部教导通过引用结合于此。U.S. Application No. 13/303,855 to Goyal et al., “Reverse NFA Generation and Processing,” published as U.S. Publication No. 2013/0133064, and U.S. Application No. 13/168,395 to Goyal et al., “Regular Expression Processing Automaton,” published as U.S. Publication No. 2012/0221497, describe NFA and expression matching concepts. The entire teachings of these applications are incorporated herein by reference.
perl兼容正则表达式(PCRE)已经成为安全和联网应用中正则表达式语法的约定俗成的标准。随着更多应用需要深度数据包检查已经兴起或更多威胁在互联网中变得普遍,用于标识病毒/攻击的相应特征/图样或应用也已经变得更加复杂。特征数据库从具有简单字符串图样到具有通配字符/范围/字符类的正则表达式(regex)图样进化到高级PCRE特征。高级PCRE特征具体地是指如起始偏移、反向引用、捕捉组、和断言的特征。本发明的实施例支持线速下的高级PCRE特征。Perl Compatible Regular Expressions (PCRE) has become the de facto standard for regular expression syntax in security and networking applications. As more applications requiring deep packet inspection have emerged or more threats have become prevalent on the Internet, the corresponding features/patterns or applications for identifying viruses/attacks have also become more complex. Feature databases have evolved from having simple string patterns to regular expression (regex) patterns with wildcard characters/ranges/character classes to advanced PCRE features. Advanced PCRE features specifically refer to features such as starting offsets, backreferences, capture groups, and assertions. Embodiments of the present invention support advanced PCRE features at line rate.
在详细描述本发明的示例实施例之前,以下紧接着描述了可以使用DFA和NFA在其中实施这些实施例的示例网络安全应用,以帮助读者理解本发明的发明特征。Before describing example embodiments of the present invention in detail, an example network security application in which these embodiments may be implemented using DFA and NFA is described immediately below to help the reader understand the inventive features of the present invention.
图1A为包括网络服务处理器100的示例安全装置102的框图。安全装置102可以是可以将在一个以太网端口(Gig E)接收到的数据包切换到另一个以太网端口(Gig E)和在转发这些数据包之前在所接收到的数据包上执行多个安全功能的独立系统。例如,安全装置102可以用于在将所处理的数据包转发至局域网之前对在广域网上接收到的数据包执行安全处理。FIG1A is a block diagram of an example security appliance 102 including a network services processor 100. Security appliance 102 may be a standalone system that can switch packets received on one Ethernet port (Gig E) to another Ethernet port (Gig E) and perform multiple security functions on the received packets before forwarding them. For example, security appliance 102 may be configured to perform security processing on packets received on a wide area network (WAN) before forwarding the processed packets to a local area network (LAN).
网络服务处理器100对所接收到的数据包中所封装的开放系统互连网络L2-L7层协议进行处理。如本领域技术人员所熟知的,开放系统互连(OSI)参考模型定义了七层网络协议层(L1-7)。物理层(L1)表示将设备连接到传输媒介的实际接口,包括电气接口和物理接口。数据链路层(L2)执行数据组帧。网络层(L3)将数据格式化为数据包。传输层(L4)处理端到端的传输。会话层(L5) 管理设备之间的通信,例如,无论通信是半双工的还是全双工的。表现层(L6)管理数据格式化及表现,例如,语法、控制代码、特殊图形及字符集。应用层(L7)允许多个用户之间进行通信,例如,文件传输及电子邮件。The network service processor 100 processes the open systems interconnection network L2-L7 layer protocols encapsulated in the received data packets. As is well known to those skilled in the art, the open systems interconnection (OSI) reference model defines seven layers of network protocol layers (L1-7). The physical layer (L1) represents the actual interface that connects the device to the transmission medium, including the electrical interface and the physical interface. The data link layer (L2) performs data framing. The network layer (L3) formats the data into data packets. The transport layer (L4) handles end-to-end transmission. The session layer (L5) manages communication between devices, for example, whether the communication is half-duplex or full-duplex. The presentation layer (L6) manages data formatting and presentation, for example, syntax, control codes, special graphics and character sets. The application layer (L7) allows communication between multiple users, for example, file transfer and email.
网络服务处理器100可以为上层网络协议(例如,L4-L7) 调度和排列工作(数据包处理操作),并且允许在所接收到的待执行的数据包中进行上层网络协议的处理,以便以线速转发数据包。通过处理这些协议来以线速转发这些数据包,该网络服务处理器不会降低网络数据传送速率。The network services processor 100 can schedule and queue work (packet processing operations) for upper-layer network protocols (e.g., L4-L7) and allow upper-layer network protocol processing to be performed on received packets to forward packets at line speed. By processing these protocols to forward packets at line speed, the network services processor does not reduce the network data transfer rate.
网络服务处理器100可以包括多个以太网媒体访问控制接口,其中,标准简化的千兆比特媒体独立接口(RGMII)连接至芯片外PHY 104a和PHY 104b。The network services processor 100 may include multiple Ethernet media access control interfaces, wherein a standard Reduced Gigabit Media Independent Interface (RGMII) is connected to the off-chip PHY 104a and PHY 104b.
网络服务处理器100还可以通过物理接口PHY 104a和PHY 104b从以太网端口(GigE)接收数据包以及对所接收到的数据包执行L2-L7网络协议处理以及将所处理的数据包转发通过物理接口 104a和104b到达网络中的另一跳或最终目的地或通过外围组件互连 /外围组件互连扩展接口(PCI/PCI-X)总线106以便由主机处理器进行进一步处理。网络协议处理可以包括网络安全协议的处理,如防火墙、应用防火墙、包括IP安全(IPSec)和/或安全套接字层(SSL) 的虚拟专用网(VPN)、入侵检测系统(IDS)和防病毒(AV)。The network services processor 100 may also receive data packets from the Ethernet port (GigE) via the physical interfaces PHY 104a and PHY 104b, perform L2-L7 network protocol processing on the received data packets, and forward the processed data packets via the physical interfaces 104a and 104b to another hop or final destination in the network or via the peripheral component interconnect/peripheral component interconnect extended interface (PCI/PCI-X) bus 106 for further processing by the host processor. The network protocol processing may include processing of network security protocols such as firewalls, application firewalls, virtual private networks (VPNs) including IP Security (IPSec) and/or Secure Sockets Layer (SSL), intrusion detection systems (IDS), and antivirus (AV).
网络服务处理器100还可以包括用于控制外部本地存储器 108的存储器控制器,如动态随机存取存储器(DRAM)和双倍数据速率同步动态随机存取存储器(DDR SDRAM)。在某些实施例中,外部本地存储器118为低延迟存储器。The network services processor 100 may also include a memory controller, such as dynamic random access memory (DRAM) and double data rate synchronous dynamic random access memory (DDR SDRAM), for controlling the external local memory 108. In some embodiments, the external local memory 118 is a low-latency memory.
外部本地存储器118可以用于允许快速查找的互联网服务和安全应用,包括入侵检测系统(IDS)或防病毒(AV)应用或需要字符串匹配的其他应用可能所需的字符串匹配。The external local memory 118 may be used for Internet services and security applications that allow fast lookups, including string matching as may be required by intrusion detection systems (IDS) or anti-virus (AV) applications or other applications requiring string matching.
根据本发明的一个实施例,网络服务处理器100可以执行图样搜索、正则表达式处理、内容验证、转换和安全以加速数据包处理。正则表达式处理和图样搜索可以用于针对IDS和AV应用以及需要字符串匹配的其他应用执行字符串匹配。According to one embodiment of the present invention, the network service processor 100 can perform pattern searching, regular expression processing, content verification, conversion, and security to accelerate packet processing. Regular expression processing and pattern searching can be used to perform string matching for IDS and AV applications and other applications that require string matching.
网络服务处理器100中的DRAM控制器可以控制对耦合到网络服务处理器100上的外部动态随机存取存储器(DRAM)108的访问。DRAM 108可以存储从PHY接口104a和104b或PCI/PCI-X 接口106接收到的数据包以供网络服务处理器100进行处理。在一个实施例中,DRAM接口支持运行高达800MHz的64或128位双倍数据速率II同步动态随机存取存储器(DDRII SDRAM)。DRAM 还可以存储DFA和NFA图形表达式搜索中查找和图样匹配所需的规则数据。A DRAM controller in the network services processor 100 can control access to an external dynamic random access memory (DRAM) 108 coupled to the network services processor 100. DRAM 108 can store data packets received from the PHY interfaces 104a and 104b or the PCI/PCI-X interface 106 for processing by the network services processor 100. In one embodiment, the DRAM interface supports 64- or 128-bit double data rate II synchronous dynamic random access memory (DDRII SDRAM) running at up to 800 MHz. DRAM can also store rule data required for lookups and pattern matching in DFA and NFA graph expression searches.
启动总线110可以提供可以存储在闪速存储器112内并且当网络服务处理器100通电或复位时可以由网络服务处理器100执行的必要启动代码。应用代码还可以通过启动总线110从实施紧凑式闪存标准的装置114、或从可以是磁盘通过PCI/PCI-X总线106附接的另一个大容量装置加载到网络服务处理器100内。The boot bus 110 can provide the necessary boot code that can be stored in the flash memory 112 and executed by the network services processor 100 when the network services processor 100 is powered on or reset. Application code can also be loaded into the network services processor 100 through the boot bus 110 from a device 114 that implements the Compact Flash standard, or from another large storage device that can be a disk attached through the PCI/PCI-X bus 106.
杂项I/O接口116提供辅助接口,如通用输入/输出接口 (GPIO)、闪存、IEEE 802双线管理数据输入/输出(MDIO)接口、通用非同步收发器(UART)和串行接口。The miscellaneous I/O interface 116 provides auxiliary interfaces such as a general purpose input/output interface (GPIO), flash memory, an IEEE 802 two-wire managed data input/output (MDIO) interface, a universal asynchronous receiver/transmitter (UART), and a serial interface.
应认识到,示例安全装置102可以替代性地包括协议处理器101(图1B)。协议处理器101可以包括网络服务处理器100的元件,并且添加了内容处理加速器107,通过PCI/PCI-X连接106耦合到处理器101,并且外部DRAM 111耦合到加速器107。加速器 107和DRAM 111可以用于内容搜索应用中,从而在处理器101外部进行所有内容搜索操作。It should be appreciated that the example security appliance 102 may alternatively include a protocol processor 101 ( FIG. 1B ). The protocol processor 101 may include elements of the network services processor 100 with the addition of a content processing accelerator 107 coupled to the processor 101 via a PCI/PCI-X connection 106, and external DRAM 111 coupled to the accelerator 107. The accelerator 107 and DRAM 111 may be used in content search applications, thereby performing all content search operations external to the processor 101.
图2A分别为图1A和图1B中所示的网络服务处理器100、或协议处理器101的框图。网络服务处理器100、和/或协议处理器101使用多个处理器(内核)202提供高应用性能。网络应用可以被分类成数据平面和控制平面操作。内核202中的每个内核可以专用于执行数据平面或控制平面操作。数据平面操作可以包括数据包操作以便转发数据包。控制平面操作可以包括处理复杂的高层协议的多个部分,如互联网协议安全(IPSec)、传输控制协议(TCP)和安全套接字层(SSL)。数据平面操作可以包括处理这些复杂的高层协议的其他部分。FIG2A is a block diagram of the network service processor 100 or protocol processor 101 shown in FIG1A and FIG1B , respectively. The network service processor 100 and/or protocol processor 101 use multiple processors (cores) 202 to provide high application performance. Network applications can be classified into data plane and control plane operations. Each core in the core 202 can be dedicated to performing data plane or control plane operations. Data plane operations can include data packet operations to forward data packets. Control plane operations can include multiple parts of processing complex high-level protocols, such as Internet Protocol Security (IPSec), Transmission Control Protocol (TCP) and Secure Sockets Layer (SSL). Data plane operations can include other parts of processing these complex high-level protocols.
可以由接口单元210a、210b中的任一个通过SPI-4.2或 RGM II接口接收数据包。PCI接口224也可以接收数据包。接口单元210a、210b处理L2网络协议,该网络协议对所接收到的数据包进行以下预处理:检查所接收到的数据包内所包括的L2网络协议报头中的各字段。在接口单元210a、210b已经执行L2网络协议处理之后,将数据包转发至数据包输入单元214。数据包输入单元214 可以执行所接收到的数据包中所包括的L3和L4网络协议报头的预处理。该预处理包括对传输控制协议(TCP)/用户数据报协议(UDP) (L3网络协议)的校验和检查。Data packets can be received by either interface unit 210a or 210b via an SPI-4.2 or RGM II interface. PCI interface 224 can also receive data packets. Interface units 210a and 210b process the L2 network protocol, which pre-processes received data packets by checking various fields in the L2 network protocol header included in the received data packet. After interface units 210a and 210b have performed L2 network protocol processing, they forward the data packets to packet input unit 214. Packet input unit 214 can pre-process the L3 and L4 network protocol headers included in the received data packet. This pre-processing includes checksum checks on the Transmission Control Protocol (TCP)/User Datagram Protocol (UDP) (L3 network protocols).
数据包输入单元214可以用对至少一个处理器202中所执行的高层软件方便的格式将数据包数据写入到2级高速缓存212或 DRAM 108中的缓冲区中,以便进一步处理高层网络协议。数据包输入单元214还可以支持可编程缓冲区大小并且可以跨多个缓冲区分配数据包数据以支持大的数据包输入大小。The packet input unit 214 may write packet data to a buffer in the level 2 cache 212 or DRAM 108 in a format convenient for higher-level software executing in the at least one processor 202 for further processing of higher-level network protocols. The packet input unit 214 may also support programmable buffer sizes and may distribute packet data across multiple buffers to support large packet input sizes.
数据包次序/工作(POW)模块(单元)228可以为处理器 202对工作(数据包处理操作)进行排队和调度。工作被定义为处理器有待执行的、由工作队列上的条目标识的任何任务。该任务可以包括数据包处理操作,例如,针对有待由在工作队列上的工作队列条目所标识的所接收到的数据包上执行的L4-L7层的数据包处理操作。每个单独的数据包处理操作为处理器有待在存储器(L2高速缓存212或DRAM 108)中所存储的所接收到的数据包上执行的一份工作。例如,该工作可以是所接收到的防火墙/虚拟专用网络(VPN) 数据包的处理。防火墙/VPN数据包的处理可以包括以下单独的数据包处理操作(多份工作):(1)碎片整理,以对所接收到的数据包内的碎片进行重新排序;(2)IPSec解密;(3)IPSec加密;以及 (4)转发数据包之前的网络地址转换(NAT)或TCP序列号调整。The packet order/work (POW) module (unit) 228 may queue and schedule work (packet processing operations) for the processor 202. Work is defined as any task to be performed by the processor identified by an entry on a work queue. The task may include packet processing operations, such as L4-L7 layer packet processing operations to be performed on a received packet identified by a work queue entry on the work queue. Each individual packet processing operation is a piece of work to be performed by the processor on a received packet stored in memory (L2 cache 212 or DRAM 108). For example, the work may be processing of a received firewall/virtual private network (VPN) packet. The processing of a firewall/VPN packet may include the following individual packet processing operations (pieces of work): (1) defragmentation to reorder fragments within a received packet; (2) IPSec decryption; (3) IPSec encryption; and (4) network address translation (NAT) or TCP sequence number adjustment before forwarding the packet.
网络服务处理器100、和/或协议处理器101还可以包括存储器子系统。存储器子系统可以包括每个处理器202中的1级数据高速缓存204、每个处理器202中的指令高速缓存、2级高速缓存212、外部DRAM存储器的DRAM控制器216以及外部本地存储器118(例如,DDRSDRAM)的接口230。该存储器子系统被架构成用于支持多处理器并且被调谐成用于实现存储器密集型内容联网应用所需的高吞吐量和低延迟。处理器202和I/O协处理器装置全都可以共享2 级高速缓存212和(图1A和图1B的)外部DRAM存储器108。The network services processor 100 and/or the protocol processor 101 may also include a memory subsystem. The memory subsystem may include a Level 1 data cache 204 in each processor 202, an instruction cache in each processor 202, a Level 2 cache 212, a DRAM controller 216 for external DRAM memory, and an interface 230 for external local memory 118 (e.g., DDR SDRAM). The memory subsystem is architected to support multiple processors and is tuned to achieve the high throughput and low latency required for memory-intensive content networking applications. The processors 202 and the I/O coprocessor devices may all share the Level 2 cache 212 and the external DRAM memory 108 (of Figures 1A and 1B).
网络服务处理器100和/或协议处理器101还可以包括卸载处理器202从而使得网络服务处理器实现高吞吐量的特定用途协处理器。这些特定用途协处理器包括执行以下更加详细描述的非确定型有限自动机(NFA)处理的协处理器244和执行压缩和解压缩的压缩/解压缩协处理器208。The network services processor 100 and/or the protocol processor 101 may also include special purpose coprocessors that offload the processor 202 to enable the network services processor to achieve high throughput. These special purpose coprocessors include a coprocessor 244 that performs non-deterministic finite automaton (NFA) processing, described in more detail below, and a compression/decompression coprocessor 208 that performs compression and decompression.
每个处理器202可以是带有指令高速缓存206、1级数据高速缓存204、用于密码算法的内置硬件加速(加密加速模块)200的双发射超标量处理器,其中,通过低延迟存储器总线230直接访问本地存储器。至本地存储器118的低延迟直接访问路径绕过L2高速缓存212并且可以从处理器(内核)202和NFA协处理器244两者直接访问。Each processor 202 may be a dual-issue superscalar processor with an instruction cache 206, a level 1 data cache 204, built-in hardware acceleration for cryptographic algorithms (cryptographic acceleration module) 200, and direct access to local memory via a low-latency memory bus 230. The low-latency direct access path to local memory 118 bypasses the L2 cache 212 and is directly accessible from both the processor (core) 202 and the NFA coprocessor 244.
在进一步详细描述用于正则表达式处理的内容搜索宏命令和图样搜索的操作之前,将描述网络服务处理器100中的其他模块。在一个示例中,在处理器202已经处理了数据包之后,数据包输出单元(PKO)218从L2高速缓存或DRAM读取数据包数据,执行 L4网络协议后处理(例如,生成TCP/UDP校验和),转发数据包通过接口单元210a、210b以及释放用于存储数据包的L2高速缓存212或DRAM 108位置。Before further describing the operations of the content search macros and pattern search for regular expression processing, the other modules in the network services processor 100 will be described. In one example, after the processor 202 has processed a packet, the packet output unit (PKO) 218 reads the packet data from the L2 cache or DRAM, performs L4 network protocol post-processing (e.g., generates a TCP/UDP checksum), forwards the packet through the interface units 210a, 210b, and releases the L2 cache 212 or DRAM 108 location used to store the packet.
每个处理器202通过一致存储器总线234连接至L2高速缓存。一致存储器总线234(在一个实施例中宽度为384位)为用于处理器202、I/O桥(IOB)232与2级高速缓存212和控制器216之间的所有存储器和I/O事务的通信通道。Each processor 202 is connected to the L2 cache through a consistent memory bus 234. The consistent memory bus 234 (384 bits wide in one embodiment) is the communication channel for all memory and I/O transactions between the processor 202, the I/O bridge (IOB) 232, and the level 2 cache 212 and controller 216.
空闲池分配器(FPA)236维护多个指针池,以释放2级高速缓存212及DRAM 108中的存储器。为每个空闲指针池实现带宽高效(后进先出(LIFO))堆栈。如果指针池太大而不能安装在空闲池分配器(FPA)236内,则空闲池分配器(FPA)236使用指针池中用于存储附加指针的释放存储器在2级高速缓存212或DRAM 108中建立树/列表结构。The free pool allocator (FPA) 236 maintains multiple pointer pools to free memory in the Level 2 cache 212 and DRAM 108. A bandwidth-efficient (last-in, first-out (LIFO)) stack is implemented for each free pointer pool. If a pointer pool is too large to fit within the free pool allocator (FPA) 236, the free pool allocator (FPA) 236 uses the freed memory in the pointer pool to store additional pointers to build a tree/list structure in the Level 2 cache 212 or DRAM 108.
I/O桥(IOB)232管理整体协议及仲裁并且提供一致的I/O 划分。IOB 232包括桥238和FAU 240。桥238包括多个缓冲区队列,用于存储有待在I/O总线、一致存储器总线、数据包输入单元214 与数据包输出单元218之间传输的信息。I/O bridge (IOB) 232 manages the overall protocol and arbitration and provides consistent I/O partitioning. IOB 232 includes bridge 238 and FAU 240. Bridge 238 includes multiple buffer queues for storing information to be transferred between the I/O bus, the consistent memory bus, the packet input unit 214, and the packet output unit 218.
提取和添加单元(FAU)240为支持读取、写入、自动提取和添加、以及自动更新操作的2KB寄存器组。可以从处理器202和数据包输出单元218访问提取和添加单元(FAU)240。寄存器存储高度使用的值并因此减少访问这些值的流量。FAU 240中的寄存器用来维护用于通过数据包输出单元218转发所处理的数据包的输出队列的长度。The Fetch and Add Unit (FAU) 240 is a 2KB register set that supports read, write, auto-fetch and add, and auto-update operations. The Fetch and Add Unit (FAU) 240 is accessible from both the processor 202 and the packet output unit 218. The registers store highly used values and thus reduce the amount of traffic required to access these values. Registers in the FAU 240 are used to maintain the length of the output queue used to forward processed packets through the packet output unit 218.
PCI接口控制器224具有允许处理器202在网络服务处理器中的本地存储器与远程(PCI)存储器之间在两个方向上异步地移动数据的DMA引擎。The PCI interface controller 224 has a DMA engine that allows the processor 202 to asynchronously move data in both directions between local memory in the network services processor and remote (PCI) memory.
典型地,内容感知应用处理使用或者确定有限自动机(DFA)或者非确定有限自动机(NFA)来识别所接收到的数据包的内容中的图样。DFA和NFA两者都是有限状态机,即,计算模型,计算模型中的每一个都包括状态集合、开始状态、输入字母(所有可能的符号集合)和转换函数。计算在开始状态中开始,并且根据转换函数而改变为新的状态。Typically, content-aware application processing uses either a deterministic finite automaton (DFA) or a non-deterministic finite automaton (NFA) to identify patterns in the content of received data packets. Both DFAs and NFAs are finite state machines, i.e., computational models, each of which consists of a set of states, a start state, an input alphabet (the set of all possible symbols), and a transition function. Computation begins in the start state and changes to a new state according to the transition function.
图样通常使用正则表达式来表达,正则表达式包括基本元素,例如,诸如A-Z、0-9的正常文本字符以及诸如*、^和|的元字符或其他值。正则表达式的基本元素是要被匹配的符号(单个字符)。这些与允许元素(+)、交替(|)、Kleene星号(*)中的一个或多个匹配的元字符组合,其与元素匹配零次或多次。在一个实施例中,元字符可以由PCRE图样标准来限定。用于级联的元字符用于从单个字符(或子串)创建多个字符匹配图样,而用于交替(|)的元字符用于创建可以匹配两个或更多个子串中任何一个的正则表达式。元字符Kleene星号(*)允许图样匹配任意次数,包括不会发生有效载荷段的在前字符或字符类或字符串与在前字符或字符类匹配。组合不同的运算符和单个字符允许构建复杂的表达式。例如,表达式(th(is|at)*)将匹配以下字符串:th、this、that、thisis、thisat、thatis 或thatat。当元字符(?)跟着一个元素时,元字符(?)可以是{0, 1}的等效物。例如,图样“zzza?”可以与“zzz”有效载荷匹配或与有效载荷“zzza”匹配。Patterns are typically expressed using regular expressions, which include basic elements such as normal text characters such as A-Z, 0-9, and metacharacters such as *, ^, and | or other values. The basic elements of a regular expression are symbols (single characters) to be matched. These are combined with metacharacters that allow one or more matches of an element (+), alternation (|), or Kleene's asterisk (*), which match an element zero or more times. In one embodiment, metacharacters can be defined by the PCRE pattern standard. Metacharacters used for concatenation are used to create multiple character matching patterns from a single character (or substring), while metacharacters used for alternation (|) are used to create regular expressions that can match any one of two or more substrings. The metacharacter Kleene's asterisk (*) allows a pattern to match any number of times, including matching a preceding character or character class or a string with a preceding character or character class where no payload segment occurs. Combining different operators and single characters allows for the construction of complex expressions. For example, the expression (th(is|at)*) will match the following strings: th, this, that, thisis, thisat, thatis, or thatat. When the metacharacter (?) follows an element, the metacharacter (?) can be the equivalent of {0, 1}. For example, the pattern "zzza?" can match a "zzz" payload or a payload of "zzza".
字符类结构[...]允许列出要匹配的字符的列表,例如gr[ea]y 查找grey和gray两者。破折号指示字符的范围,例如[A-Z]或[0-9]。字符类可以进一步具有多个范围,例如,[a-zA-Z0-9]将包括所有字母、小写字母和大写字母、以及所有数字。除了换行字符以外,元字符“.”匹配任何一个字符。此外,元字符“^”指示除了后面的一个字符以外的每个字符。例如,“[^\n]”指示除了“换行”字符以外的每个字符(其中“\n”指示换行)。另一个示例为“[^0-9]”,其指示除了数字“0”至“9”以外的任何字符。The character class construct [...] allows for a list of characters to be matched, such as gr[ea]y finds both grey and gray. A dash indicates a range of characters, such as [A-Z] or [0-9]. A character class can further have multiple ranges, for example, [a-zA-Z0-9] would include all letters, lowercase and uppercase, and all digits. The metacharacter "." matches any one character except the newline character. Additionally, the metacharacter "^" indicates every character except the one that follows it. For example, "[^\n]" indicates every character except the "newline" character (where "\n" indicates a newline). Another example is "[^0-9]", which indicates any character except the digits "0" through "9".
典型地,ASCII字符作为来自0-128或0-256的二进制数分别存储在7位和8位实施例中。例如,换行(或跳行)字符可以被表示为ASCII下的数字12。然后,换行可以用二进制被表示为分别 7位和8位实施例中的“000 1010”或“0000 1010”然而,这这对于存储字符类而言不是最佳的。Typically, ASCII characters are stored as binary numbers from 0-128 or 0-256 in 7-bit and 8-bit embodiments, respectively. For example, the line feed (or line skip) character can be represented as the number 12 under ASCII. The line feed can then be represented in binary as "000 1010" or "0000 1010" in the 7-bit and 8-bit embodiments, respectively. However, this is not optimal for storing character classes.
对DFA或NFA状态机的输入通常是(8位)字节的串,即,字母是单字节(一个字符或符号)。输入流中的每个字节产生从一个状态到另一状态的转换。The input to a DFA or NFA state machine is typically a string of (8-bit) bytes, ie, an alphabet is a single byte (a character or symbol). Each byte in the input stream generates a transition from one state to another.
DFA或NFA状态机的状态和转换函数可以通过图形来表示,其中,图形中的每个节点表示状态,并且图形中的圆弧表示状态转换。状态机的当前状态由选择特定图形节点的节点标识符来表示。The states and transition functions of a DFA or NFA state machine can be represented by a graph, where each node in the graph represents a state and arcs in the graph represent state transitions. The current state of the state machine is represented by a node identifier that selects a particular graph node.
使用DFA来处理正则表达式并且找到字符的输入流中由正则表达式描述的一个或多个图样的特征在于:Use a DFA to process a regular expression and find one or more patterns described by the regular expression in an input stream of characters that are characterized by:
1)确定的运行时间性能:可以从输入字符(或符号)以及 DFA的当前状态确定DFA的下一个状态。换言之,每DFA状态仅存在一次状态转换。这样,DFA的运行时间性能被认为是确定的并且完全可以从输入预测行为。1) Deterministic Runtime Performance: The next state of a DFA can be determined from the input character (or symbol) and the current state of the DFA. In other words, there is only one state transition per DFA state. Thus, the runtime performance of a DFA is considered deterministic and its behavior is completely predictable from the input.
2)支持跨多个数据包匹配所需要的较小的每流上下文(例如,状态或节点指针):在搜索跨越构成流的若干数据包的输入的图样中,搜索可能在一个数据包处停止并且然后在另一个数据包处恢复。通常,确定要恢复搜索的状态需要跟踪、记住或以另外的方式存储(例如,作为状态指针或堆栈条目)直至搜索停止时所遍历的所有状态。然而,在DFA中,为了恢复搜索,仅需要记录搜索停止时的状态。这样,DFA的特征为需要较小的每流上下文,以支持跨多个输入数据包的图样匹配,例如,以若干字节的量级存储状态或节点指针。2) Smaller per-flow context (e.g., state or node pointers) required to support matching across multiple packets: In searching for a pattern spanning the inputs of several packets that make up a flow, the search may stop at one packet and then resume at another. Typically, determining the state to resume the search requires tracking, remembering, or otherwise storing (e.g., as a state pointer or stack entry) all states traversed up to the time the search stopped. However, in a DFA, only the state at the time the search stopped needs to be recorded in order to resume the search. Thus, a characteristic of a DFA is that a smaller per-flow context is required to support pattern matching across multiple input packets, e.g., storing state or node pointers on the order of several bytes.
3)其中节点的数量(或图形大小)可以随着图样的大小呈指数增长的图形。3) Graphs in which the number of nodes (or graph size) can grow exponentially with the size of the graph.
相比之下,使用NFA来处理正则表达式并且找到由字符的输入流中的正则表达式所描述的一个或多个图样的特征在于:In contrast, the use of an NFA to process regular expressions and find one or more patterns described by the regular expression in an input stream of characters is characterized by:
1)不确定的运行时间性能:给定输入字符(或符号)和 NFA的当前状态,可能存在要转换到其上的NFA的多于一个的下一状态。换言之,不能唯一地从NFA的输入和当前状态确定NFA的下一状态。这样,NFA的运行时间性能被认为是不确定的,并且不能完全从输入预测行为。1) Uncertain Runtime Performance: Given an input character (or symbol) and the current state of the NFA, there may be more than one next state of the NFA to transition to. In other words, the next state of the NFA cannot be uniquely determined from the input and the current state of the NFA. As such, the runtime performance of the NFA is considered uncertain, and its behavior cannot be fully predicted from the input.
2)支持跨多个数据包匹配所需要的较大的每流上下文(例如,状态或节点指针):如前所述,跨多个输入数据包的图样匹配,其中,搜索在一个数据包处停止并且然后在另一数据包处恢复,需要跟踪直至在搜索停止时所遍历的所有状态。在NFA中,越多输入被匹配,需要跟踪的当前状态的数量就越多。这样,可以说,与DFA 相比,NFA的特征为需要较大的每流上下文来支持跨多个输入数据包的图样匹配。2) Large per-flow context (e.g., state or node pointers) required to support matching across multiple packets: As previously mentioned, pattern matching across multiple input packets, where the search stops at one packet and then resumes at another, requires tracking all states traversed up to the point where the search stopped. In an NFA, the more inputs to be matched, the greater the number of current states that need to be tracked. Thus, compared to a DFA, an NFA is characterized by the need for a larger per-flow context to support pattern matching across multiple input packets.
3)其中节点的数量(或图形大小)通常随着图样的大小呈线性增长的图形。3) Graphs in which the number of nodes (or graph size) typically grows linearly with the size of the graph.
图2B为框图250,展示了图2A的引擎252(例如,网络服务处理器,(例如NFA引擎))的环境的示例实施例。引擎252 操作性地耦合成用于读取来自指令队列254的一个或多个指令253。指令队列254存储主机所发送的有待由引擎252处理的指令。引擎 252通过读取其中所存储的指针对指令253进行处理。指令253中的指针包括到输入缓冲区258(其可以被称为输入堆栈,即使其不具有堆栈的LIFO特性)的条目的指针、到有效载荷262的指针、到匹配结果缓冲区266的指针、到保存缓冲区264(其可以被称为保存堆栈,即使其不具有堆栈的LIFO特性)的指针以及到运行堆栈260的指针。FIG2B is a block diagram 250 illustrating an example embodiment of an environment for the engine 252 (e.g., a network services processor, such as an NFA engine) of FIG2A . The engine 252 is operatively coupled to read one or more instructions 253 from an instruction queue 254. The instruction queue 254 stores instructions sent by the host to be processed by the engine 252. The engine 252 processes the instructions 253 by reading pointers stored therein. The pointers in the instruction 253 include pointers to entries in an input buffer 258 (which may be referred to as an input stack, even though it does not have the LIFO nature of a stack), a pointer to a payload 262, a pointer to a match result buffer 266, a pointer to a save buffer 264 (which may be referred to as a save stack, even though it does not have the LIFO nature of a stack), and a pointer to a run stack 260.
引擎252将来自指针的一个或多个条目加载至输入缓冲区 258(例如,S1、S2、和/或S3)。然后,该引擎将来自输入缓冲区 258的一个或多个条目推送至运行堆栈260。在本示例中,该引擎可以将条目S1、S2、和S3推送至运行堆栈260。然后,引擎252弹出运行堆栈上的第一条目(例如,S1)并开始对其进行处理。在一个实施例中,该运行堆栈为后进先出(LIFO)堆栈。来自输入缓冲区 258的每个条目(例如,S1、S2、和S3)包括有效载荷偏移和到图形257的指针。然后,该引擎可以加载来自图形存储器256的图形 257并开始使用与有效载荷262的偏移相对应的有效载荷段对图形进行处理。Engine 252 loads one or more entries from the pointer into input buffer 258 (e.g., S1, S2, and/or S3). The engine then pushes one or more entries from input buffer 258 onto run stack 260. In this example, the engine may push entries S1, S2, and S3 onto run stack 260. Engine 252 then pops the first entry (e.g., S1) on the run stack and begins processing it. In one embodiment, the run stack is a last-in, first-out (LIFO) stack. Each entry from input buffer 258 (e.g., S1, S2, and S3) includes a payload offset and a pointer to graphic 257. The engine then loads graphic 257 from graphics memory 256 and begins processing the graphic using the payload segment corresponding to the offset in payload 262.
当引擎252使用来自有效载荷262的有效载荷段对图形257 进行处理时,其可以将条目推送和弹出至运行堆栈260。当需要将其位置保存在图形中时,引擎252将条目推送至运行堆栈260。当图形呈现多条处理路径时,引擎252需要将其位置保存在图形中。引擎 252可以遍历这些路径其中之一,并且在失配情况下,可以返回至运行堆栈260条目中所指示的节点和有效载荷以遍历其他一条或多条路径。图形257中的分离节点或可变计数节点可以在图形中呈现此类多条路径。As engine 252 processes graph 257 using payload segments from payload 262, it can push and pop entries onto run stack 260. Engine 252 pushes entries onto run stack 260 when its location in the graph needs to be saved. When a graph presents multiple processing paths, engine 252 needs to save its location in the graph. Engine 252 can traverse one of these paths and, in the event of a mismatch, can return to the node and payload indicated in the run stack 260 entry to traverse the other one or more paths. Split nodes or variable count nodes in graph 257 can present such multiple paths in the graph.
在处理有效载荷262和图形257中,有效载荷262可以在处理完成之前用完数据。有效载荷262可以是数据包或来自数据流(或有效载荷流)的其他分组数据。该流可以具有多个有效载荷262 (例如,数据包),每个有效载荷262在该流中具有一个顺序。有效载荷262的每个段为该有效载荷的具有具体粒度的部分,如但不限于一个字节。在一个实施例中,该粒度是可调整的或可选的。这种情况的一个示例为当有效载荷262的有效载荷偏移开始朝向数据包的末端并且在数据包结束前进行找到部分匹配时。为了继续该工作,引擎252将当前堆栈条目保存到保存缓冲区264内。从而,当有效载荷用尽时,保存缓冲区264存储运行堆栈260的一个或多个运行堆栈条目。然后,当引擎252从数据包的数据流加载有效载荷 262的后续部分时,引擎252可以从保存缓冲区264加载运行堆栈条目并将它们推送至运行堆栈260内以继续该工作。这种将保存缓冲区条目加载到运行堆栈内还可以由主机处理器来执行,同时将指令提交至用于同一流的后续数据包的引擎。During the processing of payload 262 and graph 257, payload 262 may run out of data before processing is complete. Payload 262 may be a packet or other packetized data from a data stream (or payload stream). The stream may have multiple payloads 262 (e.g., packets), each payload 262 having a sequence within the stream. Each segment of payload 262 is a portion of the payload with a specific granularity, such as, but not limited to, a byte. In one embodiment, this granularity is adjustable or selectable. An example of this situation is when the payload offset of payload 262 begins toward the end of a packet and a partial match is found before the end of the packet. To continue processing, engine 252 saves the current stack entry into save buffer 264. Thus, when the payload is exhausted, save buffer 264 stores one or more run stack entries from run stack 260. Then, when engine 252 loads subsequent portions of payload 262 from the packet's data stream, engine 252 can load run stack entries from save buffer 264 and push them into run stack 260 to continue processing. This loading of save buffer entries into the run stack may also be performed by the host processor while submitting instructions to the engine for subsequent packets of the same flow.
当找到有效载荷262对图形257的匹配后,除非引擎252 被配置成用于返回所有匹配,否则其弹出并可以丢弃运行堆栈260 内的与从输入缓冲区258加载的工作相关联的所有条目(例如,第一条目S1)。然后,引擎252将结果(例如,匹配位置和长度)保存在匹配结果缓冲区266存储器内。然后,引擎252可以从之前已经从输入缓冲区258(例如,S2)加载的运行堆栈加载下一个条目。然后,引擎252可以对与那个条目相对应的图形和有效载荷段进行处理,并继续对附加工作进行处理直到运行堆栈260是空的。When a match is found for payload 262 against graph 257, unless engine 252 is configured to return all matches, it pops and may discard all entries in run stack 260 associated with the work loaded from input buffer 258 (e.g., the first entry S1). Engine 252 then saves the results (e.g., the match position and length) in match result buffer 266 memory. Engine 252 may then load the next entry from the run stack that was previously loaded from input buffer 258 (e.g., S2). Engine 252 may then process the graph and payload segments corresponding to that entry and continue processing additional work until run stack 260 is empty.
当找到有效载荷262对图形257的失配后,该引擎弹出并处理运行堆栈260内的与从输入缓冲区258加载的工作相关联的下一个条目(例如,第一条目S1)。如果运行堆栈260内没有留下与从输入缓冲区258加载的工作相关联的条目(例如,第一目S1),则引擎252完成当前工作并从之前已经从输入缓冲区258加载的运行堆栈加载下一个条目(例如,S2)。然后,引擎252可以对与那个条目相对应的图形和有效载荷段进行处理,并继续对附加工作进行处理直到运行堆栈260是空的。When a mismatch is found between the payload 262 and the graphics 257, the engine pops and processes the next entry (e.g., the first entry S1) in the run stack 260 associated with the job loaded from the input buffer 258. If there are no entries (e.g., the first entry S1) left in the run stack 260 associated with the job loaded from the input buffer 258, the engine 252 completes the current job and loads the next entry (e.g., S2) from the run stack that was previously loaded from the input buffer 258. The engine 252 can then process the graphics and payload segments corresponding to that entry and continue processing additional jobs until the run stack 260 is empty.
图3A为图解300,展示了例如戈亚尔(Goyal)等人的被公开为美国公开号2013/0133064的美国第13/303,855号申请“逆向 NFA生成和处理(Reverse NFA Generation andProcessing)”和戈亚尔(Goyal)等人的被公开为美国公开号2012/0221497的美国第 13/168,395申请“正则表达式处理自动机(Regular Expression Processing Automaton)”中所描述的系统所使用的NFA图形320的示例实施例。以上申请的全部教导通过引用结合于此。NFA图形320 被配置成用于匹配图样“ab{0,5}x”。“b{0,5}”针对图样中的‘b’在任何地方匹配从零到五次。从而,该图样对以下有效载荷进行匹配:ax、abx、abbx、abbbx、abbbbx、或abbbbbx。FIG3A is a diagram 300 illustrating an example embodiment of an NFA graph 320 used by systems such as those described in U.S. Application No. 13/303,855, “Reverse NFA Generation and Processing,” published as U.S. Publication No. 2013/0133064 to Goyal et al., and U.S. Application No. 13/168,395, “Regular Expression Processing Automaton,” published as U.S. Publication No. 2012/0221497 to Goyal et al., all of which are incorporated herein by reference. NFA graph 320 is configured to match the pattern “ab{0,5}x.” “b{0,5}” matches anywhere from zero to five times for the ‘b’ in the pattern. Thus, the pattern matches the following payloads: ax, abx, abbx, abbbx, abbbbx, or abbbbx.
NFA图形320以节点N0 302开始。当加载节点N0 302后,图形行走引擎被配置成用于确定有效载荷的第一段(例如,字节) 是否匹配‘a’。如果匹配,则图形行走引擎加载节点N1 304和有效载荷的下一段,并且如果不匹配,则图形行走引擎返回不匹配。NFA graph 320 begins with node N0 302. After loading node N0 302, the graph walking engine is configured to determine whether the first segment (e.g., byte) of the payload matches 'a'. If so, the graph walking engine loads node N1 304 and the next segment of the payload, and if not, the graph walking engine returns a no match.
当加载节点N1 304后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为‘b’,则图形行走引擎加载节点N2 306。如果有效载荷的下一段为除了‘x’或‘b’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N1 304, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is 'b', the graph walking engine loads node N2 306. If the next segment of the payload is anything other than 'x' or 'b', the graph walking engine determines that there is no match in the payload and returns a no match.
当加载节点N2 306后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为‘b’,则图形行走引擎加载节点N3 308。如果有效载荷的下一段为除了‘x’或‘b’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N2 306, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is 'b', the graph walking engine loads node N3 308. If the next segment of the payload is anything other than 'x' or 'b', the graph walking engine determines that there is no match in the payload and returns a no match.
当加载节点N3 308后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为‘b’,则图形行走引擎加载节点N4 310。如果有效载荷的下一段为除了‘x’或‘b’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N3 308, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is 'b', the graph walking engine loads node N4 310. If the next segment of the payload is anything other than 'x' or 'b', the graph walking engine determines that there is no match in the payload and returns a no match.
当加载节点N4 310后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为‘b’,则图形行走引擎加载节点N5 312。如果有效载荷的下一段为除了‘x’或‘b’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N4 310, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is 'b', the graph walking engine loads node N5 312. If the next segment of the payload is anything other than 'x' or 'b', the graph walking engine determines that there is no match in the payload and returns a no match.
当加载节点N5 312后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为‘b’,则图形行走引擎加载节点N6 314。如果有效载荷的下一段为除了‘x’或‘b’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N5 312, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is 'b', the graph walking engine loads node N6 314. If the next segment of the payload is anything other than 'x' or 'b', the graph walking engine determines that there is no match in the payload and returns a no match.
当加载节点N6 314后,如果有效载荷的下一段为‘x’,则图形行走引擎加载为标记节点的节点N7 316。该标记节点指示在有效载荷内找到匹配,从而使得图形行走引擎返回匹配。如果有效载荷的下一段为除了‘x’以外的任何事物,则图形行走引擎确定有效载荷中没有匹配并返回不匹配。After loading node N6 314, if the next segment of the payload is 'x', the graph walking engine loads node N7 316, which is a marked node. This marked node indicates that a match was found within the payload, causing the graph walking engine to return a match. If the next segment of the payload is anything other than 'x', the graph walking engine determines that there is no match in the payload and returns a no match.
图3B为本发明所使用的NFA图形370的示例实施例的图解。NFA图形370被配置成用于匹配与图3A中相同的图样“ab{0,5}x”。如上所述,“b{0,5}”针对图样中的‘b’在任何地方匹配从零到五次。从而,该图样与以下有效载荷匹配:ax、abx、 abbx、abbbx、abbbbx、或abbbbbbx。FIG3B illustrates an example embodiment of an NFA graph 370 used in the present invention. NFA graph 370 is configured to match the same pattern "ab{0,5}x" as in FIG3A . As described above, "b{0,5}" matches anywhere from zero to five times for the 'b' in the pattern. Thus, this pattern matches the following payloads: ax, abx, abbx, abbbx, abbbbx, or abbbbbbx.
节点N0 352为被配置成用于针对元素‘a’进行匹配的字符节点。节点N1 354为被配置成用于针对元素‘b’在任何地方匹配从‘0’和‘5’次的可变计数节点。可变计数节点可以被配置成用于针一个对元素进行匹配任何次数,包括无限次数。节点N2 356 为被配置成用于针对元素‘x’进行匹配的字符节点。节点N3 358 为被配置成用于表示图样末端并且发出在有效载荷中已经找到匹配的信号的标记节点。Node N0 352 is a character node configured to match element 'a'. Node N1 354 is a variable count node configured to match element 'b' anywhere between '0' and '5' times. Variable count nodes can be configured to match a pair of elements any number of times, including an unlimited number of times. Node N2 356 is a character node configured to match element 'x'. Node N3 358 is a marker node configured to indicate the end of the pattern and to signal that a match has been found in the payload.
图形行走引擎从NFA图形370加载节点N0 352。然后,图形行走引擎对有效载荷的第一段进行处理。如果该有效载荷段为‘a’,则图形行走引擎加载节点N1 354。否则,图形行走引擎返回不匹配。The graph walking engine loads node N0 352 from the NFA graph 370. The graph walking engine then processes the first segment of the payload. If the payload segment is 'a', the graph walking engine loads node N1 354. Otherwise, the graph walking engine returns a no match.
当加载节点N1 354后,图形行走引擎将该节点解释为针对字符类‘b’进行匹配从0到5次的可变计数节点。从此节点开始,图形行走引擎被配置成用于针对有效载荷中的这种图样进行匹配,并且然后加载下一个节点(节点N2 356)。然后,节点N2 356确定有效载荷的下一段是否为‘x’。如果是,则图形行走引擎加载节点 N3 358(标记节点),指示图样为匹配。如果不是,则图形行走引擎返回不匹配。以下描述了图形行走引擎使用运行堆栈行走过可变计数节点的特定细节。After loading node N1 354, the graph walking engine interprets the node as a variable count node that matches from 0 to 5 times for the character class 'b'. Starting from this node, the graph walking engine is configured to match this pattern in the payload and then loads the next node (node N2 356). Node N2 356 then determines whether the next segment of the payload is 'x'. If so, the graph walking engine loads node N3 358 (marker node) indicating that the pattern is a match. If not, the graph walking engine returns a mismatch. The following describes the specific details of the graph walking engine using the run stack to walk through the variable count nodes.
NFA图形370标识与图3A的NFA图形320相同的图样,然而用更少的节点来这样做。因此,NFA图形370使用更少的存储器并且具有降低的复杂性。NFA graph 370 identifies the same pattern as NFA graph 320 of FIG 3A, but does so with fewer nodes.Thus, NFA graph 370 uses less memory and has reduced complexity.
图3C为图解380,展示了示出其他类型的计数节点的NFA 图形390的示例实施例。固定计数节点针对一个元素对有效载荷段搜索固定次数,而不是使用一个范围。例如,图样“ab{5}x”匹配有效载荷“abbbbbx”,但不匹配“ax”、“abx”、“abbx”、“abbbx”、或“abbbbx”。同样,以一个范围而不是零开始的可变计数匹配图样可以被转换成固定计数图样、然后转换成可变计数图样。例如,“ab{5,10}x”还可以被表达为“ab{5}b{0,5}x”。图3C中的NFA 图形390示出了这种等效图样。如上所述,这生成用于针对“a”进行匹配的节点N0 382、针对“b”进行匹配五次的节点N1 384、针对“b”进行匹配从零到五次的可变计数节点N2 386、针对“x”进行匹配的节点N3 388、以及用于表示找到匹配的标记节点N4 389。Figure 3C is a diagram 380 showing an example embodiment of an NFA graph 390 illustrating other types of counting nodes. Fixed counting nodes search the payload segment a fixed number of times for an element, rather than using a range. For example, the pattern "ab{5}x" matches the payload "abbbbbx", but does not match "ax", "abx", "abbx", "abbbx", or "abbbbx". Similarly, variable counting matching patterns that start with a range other than zero can be converted to fixed counting patterns and then to variable counting patterns. For example, "ab{5,10}x" can also be expressed as "ab{5}b{0,5}x". The NFA graph 390 in Figure 3C shows such an equivalent pattern. As described above, this generates node N0 382 for matching against "a", node N1 384 for matching against "b" five times, variable count node N2 386 for matching against "b" from zero to five times, node N3 388 for matching against "x", and a marker node N4 389 to indicate that a match was found.
作为本发明的示例实施例,每个节点存储一个元素,其中,元素为或者单独的值/字符/字母、字符类ID(例如,字符类索引)、或者字符串。每个节点进一步存储其节点类型和节点类型所要求的任何其他信息,例如可变计数节点存储针对每个元素进行匹配的最大(并且可选地最小)次数和其是否为懒惰/贪婪/领属/全匹配类型节点,固定计数节点存储针对每个元素进行匹配的次数。As an example embodiment of the present invention, each node stores an element, where an element is either a single value/character/letter, a character class ID (e.g., a character class index), or a string. Each node further stores its node type and any other information required by the node type, such as a variable count node storing the maximum (and optionally minimum) number of matches for each element and whether it is a lazy/greedy/possessive/full match type node, and a fixed count node storing the number of matches for each element.
图4A为现有技术系统所使用的NFA图形440的示例实施例。NFA图形440被配置成用于对“[aA][bB]”图样进行匹配,其对包括“ab”、“aB”、“Ab”、和“AB”的有效载荷进行匹配。4A is an example embodiment of an NFA graph 440 used by prior art systems. NFA graph 440 is configured to match the pattern "[aA][bB]", which matches payloads including "ab", "aB", "Ab", and "AB".
图形行走引擎首先处理节点N0 402。如果有效载荷为“a”,则图形行走引擎加载节点N1 404。然后,图形行走引擎对有效载荷的下一段进行处理。如果有效载荷为‘b’,则图形行走引擎加载节点N3 408,该节点是标记节点。如果有效载荷为‘B’,则图形行走引擎加载节点N4 410,该节点也是标记节点。两个标记节点指令图形行走引擎返回匹配。The graph walking engine first processes node N0 402. If the payload is "a," the graph walking engine loads node N1 404. The graph walking engine then processes the next segment of the payload. If the payload is "b," the graph walking engine loads node N3 408, which is a marked node. If the payload is "B," the graph walking engine loads node N4 410, which is also a marked node. Two marked nodes indicate that the graph walking engine returns a match.
另一方面,如果当处理节点N0 402时,图形行走引擎处理为“A”的有效载荷,则图形行走引擎加载节点N2 406。然后,图形行走引擎对有效载荷的下一段进行处理。如果有效载荷为‘b’,则图形行走引擎加载节点N5 412,该节点是标记节点。如果有效载荷为‘B’,则图形行走引擎加载节点N6 414,该节点也是标记节点。两个标记节点指令图形行走引擎返回匹配。On the other hand, if the graph walking engine processes a payload of "A" when processing node N0 402, the graph walking engine loads node N2 406. The graph walking engine then processes the next segment of the payload. If the payload is "b," the graph walking engine loads node N5 412, which is a marked node. If the payload is "B," the graph walking engine loads node N6 414, which is also a marked node. The two marked nodes indicate that the graph walking engine returns a match.
NFA图形440甚至可以用短图样如“[aA][bB]”来增加复杂性。即使每个字符类仅指定两个值/字符/字母,则添加至图样的每个附加字符类使图形中的节点数量翻倍。进一步地,字符类可以具有所指示的任何数量的字符,字符越多,则甚至进一步增加图形的复杂性越大。NFA graph 440 can even increase in complexity with short patterns such as "[aA][bB]". Even if each character class specifies only two values/characters/letters, each additional character class added to the pattern doubles the number of nodes in the graph. Furthermore, a character class can have any number of characters indicated, with more characters increasing the complexity of the graph even further.
在一个实施例中,每个字符类可以存储在128位或256位图中。字符类中的每个位表示其相应的ASCII值。例如,位图中的第12位表示“换行”字符。如果第12位为1,其意味着字符类包括“换行”字符。如果第12位为0,则字符类不包括“换行”字符。以相同方式,每个字符类可以存储多个ASCII值。例如,[^\n](即,具有除了换行以外的所有字符的字符类)将除了第12位以外的所有位标记为“1”。举另一个示例来讲,字符类[a-z]包括97–122的ASCII值。因此,字符类[a-z]的位图将具有被设置为“1”的位97–122和被设置为“0”的所有其他位。In one embodiment, each character class can be stored in a 128-bit or 256-bit map. Each bit in the character class represents its corresponding ASCII value. For example, bit 12 in the bitmap represents the "new line" character. If bit 12 is 1, it means that the character class includes the "new line" character. If bit 12 is 0, the character class does not include the "new line" character. In the same way, each character class can store multiple ASCII values. For example, [^\n] (i.e., a character class with all characters except new line) marks all bits except bit 12 as "1". As another example, the character class [a-z] includes ASCII values 97–122. Therefore, the bitmap for the character class [a-z] will have bits 97–122 set to "1" and all other bits set to "0".
当图形行走引擎对有效载荷段与字符类进行匹配时,其可以将有效载荷的ASCII值用作字符类的索引。例如,当字符类为[a-z] 时,假设图形行走引擎处理字母“r”,该字母具有为114的ASCII 值。图形行走引擎可以访问字符类的第114位并且确定其是否被设置为用于确定其是否与该字符类匹配。这可以用以下逻辑语句来表达:“if(CharacterClass[PayLoadASCIIValue]==true),return match; else return nomatch”,其中,PayLoadASCIIValue为有效载荷的当前段的ASCII值,或者在这种情况下为114。When the graphics walking engine matches a payload segment to a character class, it can use the ASCII value of the payload as an index into the character class. For example, when the character class is [a-z], suppose the graphics walking engine processes the letter "r," which has an ASCII value of 114. The graphics walking engine can access bit 114 of the character class and determine if it is set to determine if it matches the character class. This can be expressed as the following logic statement: "if (CharacterClass[PayLoadASCIIValue] == true), return match; else return nomatch," where PayLoadASCIIValue is the ASCII value of the current segment of the payload, or in this case, 114.
给定图样还可以包括多个字符类。例如,图样“[a-z][0-9][^\n][a-z]”具有四个字符类,但仅三个唯一字符类(即, [a-z]、[0-9]、和[^\n]),因为[a-z]为重复字符类。所以,编译器首先确定该或这些图样中存在的唯一字符类的数量。然后,编译器为每个字符类分配唯一号码(例如,索引或标识符)。例如,编译器为[a-z] 分配为1的索引、为[0-9]分配为2的索引、以及为[^\n]分配为3的索引。即使其出现两次,字符类[a-z]作为位图被存储一次,并且可通过其索引“1”来访问。A given pattern may also include multiple character classes. For example, the pattern "[a-z][0-9][^\n][a-z]" has four character classes, but only three unique character classes (i.e., [a-z], [0-9], and [^\n]) because [a-z] is a repeating character class. Therefore, the compiler first determines the number of unique character classes present in the pattern(s). The compiler then assigns a unique number (e.g., an index or identifier) to each character class. For example, the compiler assigns [a-z] an index of 1, [0-9] an index of 2, and [^\n] an index of 3. Even though it appears twice, the character class [a-z] is stored once as a bitmap and can be accessed through its index "1."
编译器存储字符类作为二维矩阵,可以用作为输入的两个索引来对其进行访问。第一索引标识字符类,并且第二索引标识那个字符类内的值。The compiler stores character classes as a two-dimensional matrix that can be accessed using two indices as input. The first index identifies the character class, and the second index identifies the value within that character class.
在NFA图形的上下文中,节点类型=“字符类”的每个节点的“元素”字段包含字符类编号。此外,“可变计数”或“固定计数”类型的节点的“元素”字段还可以是字符类的索引,从而使得图形行走引擎针对该字符类分别进行匹配可变次数或固定次数。In the context of an NFA graph, the "Element" field of each node with node type "Character Class" contains the character class number. Additionally, the "Element" field of a node with type "Variable Count" or "Fixed Count" can also contain the index of a character class, causing the graph walking engine to match that character class a variable or fixed number of times, respectively.
此外,编译器确定所有图样的字符类。例如,编译器可以接收图样一“[a-z][0-9]”、图样二“[a-z][^\n]”以及图样三“[0-9][A-F]”。虽然图样一、二和三共有六个字符类,但其仅具有四个唯一字符类。因此,编译器给[a-z]分配索引1、给[0-9]分配索引2、给[^\n]分配索引3和给[A-F]分配索引4。图形的任何节点可以通过访问字符类的位图来对字符类进行访问,而不管其所出现在其中的一个或多个图样。这减少了存储所有字符类所需的存储器。In addition, the compiler determines the character classes for all patterns. For example, the compiler may receive pattern one, "[a-z][0-9]", pattern two, "[a-z][^\n]", and pattern three, "[0-9][A-F]". Although patterns one, two, and three have six character classes in total, they only have four unique character classes. Therefore, the compiler assigns index 1 to [a-z], index 2 to [0-9], index 3 to [^\n], and index 4 to [A-F]. Any node in the graph can access a character class by accessing its bitmap, regardless of the pattern or patterns in which it appears. This reduces the memory required to store all character classes.
在行走过程中,图形行走引擎将节点中所存储的指示(节点类型字符类的)字符类的元素用作第一索引并且将有效载荷段(例如,有效载荷字节)用作具体字符类位图的第二索引。这加载了二维矩阵的具体位,其中,在两个索引的位置处加载的位指示有效载荷段(例如,有效载荷字节)是否在具体字符类内。During the walking process, the graph walking engine uses the element indicating the character class (of the node type character class) stored in the node as the first index and the payload segment (e.g., payload byte) as the second index of the specific character class bitmap. This loads specific bits of the two-dimensional matrix, where the bits loaded at the two index positions indicate whether the payload segment (e.g., payload byte) is within the specific character class.
图4B为图解450,展示了带有本发明所使用的密集节点和相应字符类矩阵472(例如,位图表)的NFA图形470的示例实施例。NFA图形470被配置成用于对图样“[aA][bB]”进行匹配,其与包括“ab”、“aB”、“Ab”、和“AB”的有效载荷匹配。在本实施例中,NFA图形470利用图形的节点内的字符类来减少图形中节点的数量并降低图形复杂性。编译器确定该图样是否包括两个唯一字符类([aA]和[bB])。编译器为字符类[aA]分配索引0并为字符类[bB]分配索引1,并且两者作为位图存储在二维矩阵中。Figure 4B is a diagram 450 showing an example embodiment of an NFA graph 470 with a dense matrix 472 (e.g., a bitmap table) of nodes and corresponding character classes used by the present invention. The NFA graph 470 is configured to match the pattern "[aA][bB]", which matches a payload including "ab", "aB", "Ab", and "AB". In this embodiment, the NFA graph 470 utilizes character classes within the nodes of the graph to reduce the number of nodes in the graph and reduce the complexity of the graph. The compiler determines whether the pattern includes two unique character classes ([aA] and [bB]). The compiler assigns index 0 to the character class [aA] and index 1 to the character class [bB], and both are stored as bitmaps in a two-dimensional matrix.
字符类矩阵472示出了字符类[aA]和[bB]在其相应索引处的展示。字符类0(即,[aA])示出了用于设置“A”和“a”的条目,并且字符类1(即,[bB])示出了用于设置“b”和“B”的条目。使用相同字符类的其他图形可以利用这些字符类,并且该矩阵可以进一步包括与其他图形不同的字符类。关于图21,示出了字符类矩阵的另一个示例。The character class matrix 472 shows a display of the character classes [aA] and [bB] at their respective indices. Character class 0 (i.e., [aA]) shows entries for setting "A" and "a," and character class 1 (i.e., [bB]) shows entries for setting "b" and "B." Other graphics using the same character classes can utilize these character classes, and the matrix can further include character classes that are different from other graphics. With respect to FIG. 21 , another example of a character class matrix is shown.
图22为表2200,展示了字符类匹配节点的格式。表2200 包括节点类型2202、匹配类型2204、元素2206、下一个节点地址 2208、以及计数值2210。对于字符类匹配节点而言,节点类型2202 指示字符类。匹配类型2204指示其不适用(例如,空值)。元素2206 指示用于访问字符类矩阵中的字符类的字符类索引。下一个节点地址2208包括图形中的下一个节点的地址。计数值2210对于字符类匹配节点而言是不适用的。FIG22 is a table 2200 illustrating the format of a character class match node. Table 2200 includes a node type 2202, a match type 2204, an element 2206, a next node address 2208, and a count value 2210. For a character class match node, node type 2202 indicates the character class. Match type 2204 indicates that it is not applicable (e.g., a null value). Element 2206 indicates the character class index used to access the character class in the character class matrix. Next node address 2208 includes the address of the next node in the graph. Count value 2210 is not applicable for a character class match node.
再次参照图4B,当读取节点N0 452时,图形行走引擎确定节点N0 452是否针对所指定的字符类中的任何值/字符/字母匹配,在这种情况下,其为“a”或“A”,并且加载有效载荷的第一段。图形行走引擎加载节点的节点类型和节点的元素,节点类型指示其为字符类,节点的元素指示字符类具有索引0。然后,图形行走引擎将有效载荷的当前段用作到位图的索引(例如,加载 Matrix[0][PayloadSegmentValue])以确定该有效载荷段是否与字符类匹配。如果该有效载荷的第一段为所指定的字符类中的任何值/字符/ 字母,如从这些索引的位置处的位图加载的值所指示的,图形行走引擎加载由节点N0 452中所存储的“下一个节点地址”所指向的节点N1 454。Referring again to FIG. 4B , when reading node N0 452, the graph walking engine determines whether node N0 452 matches any value/character/letter in the specified character class, in this case, "a" or "A," and loads the first segment of the payload. The graph walking engine loads the node type of the node and the element of the node, where the node type indicates that it is a character class and the element of the node indicates that the character class has index 0. The graph walking engine then uses the current segment of the payload as an index into the bitmap (e.g., loading Matrix[0][PayloadSegmentValue]) to determine whether the payload segment matches the character class. If the first segment of the payload is any value/character/letter in the specified character class, as indicated by the values loaded from the bitmap at these indexed positions, the graph walking engine loads node N1 454 pointed to by the "next node address" stored in node N0 452.
当读取节点N1 454时,图形行走引擎确定节点N1 454是否针对所指定的字符类中的任何值/字符/字母匹配,在这种情况下,其为“b”或“B”,并且加载有效载荷的下一段。图形行走引擎加载节点的节点类型和节点的元素,节点类型指示其为字符类,节点的元素指示字符类具有索引1。然后,图形行走引擎将有效载荷的当前段用作位图的索引(例如,加载Matrix[1][PayloadSegmentValue]) 以确定该有效载荷段是否与字符类匹配。如果该有效载荷的当前段为所指定的字符类中的任何值/字符/字母,如从这些索引的位置处的位图加载的值所指示的,图形行走引擎加载由节点N1 454(即,节点N2 456)中所存储的“下一个节点地址”所指向的节点。当加载节点N2 456时,基于节点N2 456的“节点类型”,图形行走引擎确定其是否为标记节点。然后,图形行走引擎可以返回匹配。When reading node N1 454, the graph walking engine determines whether node N1 454 matches any value/character/letter in the specified character class, in this case, "b" or "B", and loads the next segment of the payload. The graph walking engine loads the node type of the node and the element of the node, where the node type indicates that it is a character class and the element of the node indicates that the character class has an index of 1. The graph walking engine then uses the current segment of the payload as an index into the bitmap (e.g., loading Matrix[1][PayloadSegmentValue]) to determine whether the payload segment matches the character class. If the current segment of the payload is any value/character/letter in the specified character class, as indicated by the values loaded from the bitmap at these indexed locations, the graph walking engine loads the node pointed to by the "next node address" stored in node N1 454 (i.e., node N2 456). When loading node N2 456, based on the "node type" of node N2 456, the graph walking engine determines whether it is a marked node. The graph walking engine can then return a match.
NFA图形470具有降低的复杂性和减少的大小。进一步地,每个字符类中的值/字符/字母的数量不增加或减少NFA图形470的大小。此外,增加图形中不同字符类的数量会线性地增加NFA图形 470的大小,而不是增加字符类中的值/字符/字母的数量的倍数。NFA graph 470 has reduced complexity and size. Further, the number of values/characters/letters in each character class does not increase or decrease the size of NFA graph 470. Moreover, increasing the number of different character classes in the graph increases the size of NFA graph 470 linearly, rather than increasing it by a multiple of the number of values/characters/letters in the character class.
除了字符类以外,根据本发明的示例实施例,另一种节点类型为字符串节点。字符串节点为一个针对连续值/字母/字符进行匹配的节点。In addition to character classes, another node type according to an exemplary embodiment of the present invention is a string node. A string node is a node that matches consecutive values/letters/characters.
图23为表2300,展示了字符串匹配节点的格式。字符串节点表2330包括节点类型2302、匹配类型2304、元素2306、下一个节点地址2308、以及计数值2310。节点类型2302指示“字符串匹配”。匹配类型2304不适用(例如,空值)。元素2306指示字符串数据2340的地址。下一个节点地址2308包括图形中的下一个节点的地址。计数值2310指示字符串的长度。Figure 23 is a table 2300 showing the format of a string match node. String node table 2330 includes a node type 2302, a match type 2304, an element 2306, a next node address 2308, and a count value 2310. Node type 2302 indicates "string match." Match type 2304 is not applicable (e.g., null value). Element 2306 indicates the address of string data 2340. Next node address 2308 includes the address of the next node in the graph. Count value 2310 indicates the length of the string.
字符串节点2330的元素2306的字符串数据的地址所指示的字符串数据2340包括节点类型2312、匹配类型2314、元素2316、下一个节点地址2318、以及计数值2320。节点类型2312指示其为“字符串数据”。元素2316指示字符串中的字符。匹配类型2314、下一个节点地址2318、以及计数2320全都不适用。The string data 2340 indicated by the address of the string data of the element 2306 of the string node 2330 includes a node type 2312, a matching type 2314, an element 2316, a next node address 2318, and a count value 2320. The node type 2312 indicates that it is "string data". The element 2316 indicates the characters in the string. The matching type 2314, the next node address 2318, and the count 2320 are all not applicable.
字符串节点的相似变体为不区分大小写的字符串节点。在一个示例实施例中,字符串前面的修饰符可以指示图样中的不区分大小写的字符串节点,如“{i}abc”,其将与以下有效载荷匹配:“abc”、“abC”、“aBc”、“aBC”、“Abc”、“AbC”、“ABc”、和“ABC”。本领域的普通技术人员可以认识到修饰符“{i}”可以是任何所指示的符号或系列符号。A similar variation of the string node is a case-insensitive string node. In one example embodiment, a modifier preceding the string can indicate a case-insensitive string node in the pattern, such as "{i}abc", which will match the following payloads: "abc", "abC", "aBc", "aBC", "Abc", "AbC", "ABc", and "ABC". One of ordinary skill in the art will recognize that the modifier "{i}" can be any indicated symbol or series of symbols.
为了处理不区分大小写的字符串节点(以及不区分大小写的字符节点),在进行比较之前,对字母的位之一进行掩码。例如,大写字母(A-Z)的ASCII值在65-90之间和在97-122之间。‘A’ (例如,十进制的97)的二进制表示为1100001,而‘a’(例如,十进制的65)的二进制表示为1000001。因此,在两个二进制值之间,仅一个位不同(例如,位[5],如果从自0开始的最低有效位进行标引)。对于每对相应的不区分大小写的字母字符而言,在比较之前,掩码元素和有效载荷段内的位[5](其中,每个的最低有效位为零)。该比较返回匹配,因为除了仅表示大写变化的位[5]以外,这些值是相同的。本领域的普通技术人员可以认识到,除了位[5]以外的其他一个或多个位可以用作例如其他字符方案中的掩码位。To handle case-insensitive string nodes (and case-insensitive character nodes), one of the alphabetic bits is masked before the comparison is performed. For example, the ASCII values for uppercase letters (A-Z) are between 65-90 and between 97-122. The binary representation of 'A' (e.g., 97 decimal) is 1100001, while the binary representation of 'a' (e.g., 65 decimal) is 1000001. Therefore, only one bit differs between the two binary values (e.g., bit [5], if indexed from the least significant bit starting at 0). For each pair of corresponding case-insensitive alphabetic characters, the mask element and bit [5] in the payload segment (where the least significant bit of each is zero) are compared before the comparison. The comparison returns a match because the values are the same except for bit [5], which only represents the uppercase change. One of ordinary skill in the art will recognize that one or more bits other than bit [5] can be used as mask bits, for example, in other character schemes.
图4C为使用五个单独节点的图样“USPTO”的常规图形 475的示例实施例,每个节点进行值/字符/字母检查。因此,常规图形475具有针对‘U’匹配的第一节点N0 476、针对‘S’匹配的第二节点N1 477、针对‘P’匹配的第三节点N2 478、针对‘T’匹配的第三节点N3479、针对‘O’匹配的第三节点N4 480、以及指示匹配的标记节点N5 481。FIG4C is an example embodiment of a conventional graph 475 for the pattern “USPTO” using five separate nodes, each performing a value/character/letter check. Thus, the conventional graph 475 has a first node N0 476 that matches for a ‘U’, a second node N1 477 that matches for a ‘S’, a third node N2 478 that matches for a ‘P’, a third node N3 479 that matches for a ‘T’, a third node N4 480 that matches for an ‘O’, and a marker node N5 481 that indicates a match.
图4D展示了使用字符串节点的图形490的示例实施例。节点N0 492为包括指向字符串“USPTO”的指针的字符串节点。节点 N0 492指令引擎针对整个字符串“USPTO”进行匹配而不是按照图 4C针对每个单独的字母进行匹配并且然后加载下一个节点。FIG4D shows an example embodiment of a graph 490 using string nodes. Node N0 492 is a string node that includes a pointer to the string "USPTO." Node N0 492 instructs the engine to match against the entire string "USPTO" rather than matching against each individual letter as in FIG4C and then loading the next node.
图24为表2400,展示了固定计数匹配节点的格式。对于固定计数节点而言,节点类型2402指示固定计数匹配2402。匹配类型字段2404对固定计数节点而言不适用。对于固定计数匹配节点而言,元素2406可以指示进行匹配所针对的字符,或者其可以指示进行匹配所针对的字符类索引。如果匹配成功,则下一个节点地址2408包含有待处理的下一个节点的地址。计数值2410包含进行元素匹配的固定次数。Figure 24 is a table 2400, which shows the format of a fixed count matching node. For a fixed count node, a node type 2402 indicates a fixed count matching 2402. Match type field 2404 is not applicable to a fixed count node. For a fixed count matching node, an element 2406 can indicate the character to be matched, or it can indicate the character class index to be matched. If the match is successful, then next node address 2408 comprises the address of the next node to be processed. Count value 2410 comprises the fixed number of times that the element is matched.
图25为表2500,展示了可变计数匹配节点的格式。该节点包括指示可变计数匹配的节点类型2502。该节点进一步包括指示可变计数节点是否为懒惰、贪婪、领属、或全匹配节点的匹配类型2504。元素2506可以包含进行匹配所针对的字符,或者其可以指示进行匹配所针对的字符类索引。如果匹配成功,则下一个节点地址2508包括有待处理的下一个节点的地址。计数值2510包括进行元素匹配的最大次数,其包括用于表示无限的特殊符号。Figure 25 is a table 2500, which shows the format of a variable count matching node. This node comprises a node type 2502 indicating that a variable count matches. This node further comprises a matching type 2504 indicating whether the variable count node is lazy, greedy, possessive, or fully matched. Element 2506 can comprise the character for which matching is performed, or it can indicate the character class index for which matching is performed. If the match is successful, then next node address 2508 comprises the address of the next node to be processed. Count value 2510 comprises the maximum number of times that the element matches, which comprises special symbols for representing infinity.
可选地,计数值2510还可以包含用于存储元素必须匹配的最大次数的第二计数值(如果没有提供第二计数值,则默认为零)。这可以用于表示范围匹配。此类图样还可以由进行元素匹配最小次数的固定计数节点然后进行匹配剩余次数的可变计数节点的组合来表示。Optionally, count value 2510 may also include a second count value for storing the maximum number of times an element must match (if no second count value is provided, it defaults to zero). This can be used to represent range matches. Such a pattern can also be represented by a combination of a fixed count node for matching an element a minimum number of times and a variable count node for matching the remaining number of times.
图5为图解500,展示了NFA图形510的示例实施例,该图形展示了本发明的示例实施例。NFA图形510被配置成用于检测图样“[^\n]*[zZ]b{5}”,其中,[^\n]为指示除了换行字符以外的任何值/字符/字母的字符,并且[“zZ”]为表示或者“z”或者“Z”的字符类。FIG5 is a diagram 500 illustrating an example embodiment of an NFA graph 510 illustrating an example embodiment of the present invention. NFA graph 510 is configured to detect the pattern "[^\n]*[zZ]b{5}," where [^\n] is a character indicating any value/character/letter except a newline character, and ["zZ"] is a character class indicating either "z" or "Z."
节点N0 502为可变计数节点。可变计数节点可以是或者懒惰、贪婪、领属(其是贪婪节点的优化形式)或全匹配类型节点。当从图样编译图形时,设置节点类型。用户可以在图样中指示可变计数节点应被编译为哪种匹配节点类型。可替代地,取决于所希望的图形行为,用户还可以将编译器设置为默认到四种模式中的任何模式。假设图形行走引擎处理有效载荷“yyyZbbbbbzyyyZbbbbb”。Node N0 502 is a variable count node. A variable count node can be either lazy, greedy, possessive (which is an optimized form of a greedy node), or a full match type node. When compiling a graph from a pattern, the node type is set. The user can indicate in the pattern which match node type the variable count node should be compiled as. Alternatively, depending on the desired graph behavior, the user can also set the compiler to default to any of the four modes. Assume that the graph walking engine processes the payload "yyyZbbbbbzyyyZbbbbb".
如果节点N0 502是懒惰的,则图形行走引擎找出到下一个节点(节点N1 504)的可能的最短路径。即,图形行走引擎在节点 N1 504而不是在节点N0 502处理有效载荷中的“z”或“Z”的第一实例,即使节点N0 502元素包括找出除换行以外的任何有效载荷段,其包括“z”或“Z”。然而,如果节点N0 502以此类方式处理有效载荷,则其将不利用通过图形的最短路径。If node N0 502 is lazy, the graph walking engine finds the shortest possible path to the next node (node N1 504). That is, the graph walking engine processes the first instance of "z" or "Z" in the payload at node N1 504 rather than at node N0 502, even though node N0 502 elements include finding any payload segments other than line breaks that include "z" or "Z." However, if node N0 502 processes the payload in this manner, it will not utilize the shortest path through the graph.
当按照可变计数懒惰节点处理节点N0时,图形行走引擎将具有零有效载荷偏移的节点N0的运行堆栈条目推送至运行堆栈。当推送运行堆栈条目后,图形行走引擎提取下一个节点N1 504。图形行走引擎提取与零有效载荷偏移相对应的下一个有效载荷字节(‘y’),并试图使其与节点N1 504的元素字符类[zZ]匹配。由于该字节与该字符类不匹配,图形行走引擎弹出该运行堆栈条目。然后,图形行走引擎对带有包含节点N0 502的所弹出的堆栈条目的同一字节进行处理。字节‘y’与字符类[^\n]匹配,所以其实现匹配。然后,图形引擎使有效载荷偏移增量1并推送包含节点N0 502的运行堆栈条目。When processing node N0 according to the variable count lazy node, the graph walking engine pushes the run stack entry of node N0 with zero payload offset to the run stack. After pushing the run stack entry, the graph walking engine extracts the next node N1 504. The graph walking engine extracts the next payload byte ('y') corresponding to the zero payload offset and attempts to match it with the element character class [zZ] of node N1 504. Because this byte does not match this character class, the graph walking engine pops up this run stack entry. Then, the graph walking engine processes the same byte with the popped stack entry containing node N0 502. Byte 'y' matches the character class [^\n], so it achieves a match. Then, the graph engine increments the payload offset by 1 and pushes the run stack entry containing node N0 502.
当推送运行堆栈条目后,图形行走引擎提取下一个节点N1 504。图形行走引擎提取与1有效载荷偏移相对应的下一个有效载荷字节,‘y’,并试图使其与节点N1 504的元素字符类[zZ]匹配。由于该字节与该字符类不匹配,图形行走引擎弹出该运行堆栈条目。然后,图形行走引擎对带有包含节点N0 502的所弹出的堆栈条目的同一字节进行处理。字节‘y’与字符类[^\n]匹配,所以其实现匹配。图形行走引擎使有效载荷偏移增量1并推送包含节点N0 502的运行堆栈条目。After pushing the run stack entry, the graph walking engine extracts the next node, N1 504. The graph walking engine extracts the next payload byte, 'y', corresponding to a payload offset of 1, and attempts to match it with the element character class [zZ] of node N1 504. Since the byte does not match the character class, the graph walking engine pops the run stack entry. The graph walking engine then processes the same byte with the popped stack entry containing node N0 502. The byte 'y' matches the character class [^\n], so it matches. The graph walking engine increments the payload offset by 1 and pushes the run stack entry containing node N0 502.
当推送运行堆栈条目后,图形行走引擎提取下一个节点N1 504。图形行走引擎提取与2有效载荷偏移相对应的下一个有效载荷字节(‘y’),并试图使其与节点N1 504的元素字符类[zZ]匹配。由于该字节与该字符类不匹配,图形行走引擎弹出该运行堆栈条目。然后,图形行走引擎对带有包含节点N0 502的所弹出的堆栈条目的同一字节进行处理。字节‘y’与字符类[^\n]匹配,所以其实现匹配。图形行走引擎使有效载荷偏移增量1并推送包含节点N0 502的运行堆栈条目。After pushing the run stack entry, the graph walking engine extracts the next node N1 504. The graph walking engine extracts the next payload byte ('y') corresponding to the payload offset of 2 and attempts to match it with the element character class [zZ] of node N1 504. Since the byte does not match the character class, the graph walking engine pops the run stack entry. The graph walking engine then processes the same byte with the popped stack entry containing node N0 502. The byte 'y' matches the character class [^\n], so it matches. The graph walking engine increments the payload offset by 1 and pushes the run stack entry containing node N0 502.
当推送运行堆栈条目后,图形行走引擎提取下一个节点N1 504。图形行走引擎提取与3有效载荷偏移相对应的下一个有效载荷字节(‘Z’),并试图使其与节点N1 504的元素字符类[zZ]匹配。由于该字节与该字符类匹配,图形行走引擎提取下一个节点N2 506。After pushing the run stack entry, the graph walking engine extracts the next node N1 504. The graph walking engine extracts the next payload byte ('Z') corresponding to the payload offset of 3 and attempts to match it with the element character class [zZ] of node N1 504. Since the byte matches the character class, the graph walking engine extracts the next node N2 506.
然后,图形行走引擎加载固定计数节点N2,该节点针对‘b’进行匹配五次。图形行走引擎加载有效载荷的下五个段,所有这些段为‘b’,固定计数节点与其元素匹配,该元素也为‘b’。在固定计数节点N2 506匹配之后,然后,图形行走引擎加载为标记节点的节点N3508。标记节点指示找到匹配。然后,如果复制位为‘1’,则图形行走引擎弹出运行堆栈中的所有条目并丢弃它们,在这种情况下,其丢弃运行堆栈中的包含具有有效载荷偏移3的节点N0 502 的单个条目。复制位为一个标志位,当到达NFA图形中的标记节点后(例如,找出有效载荷中的匹配),可以从运行堆栈弹出标志标记复制位(例如,设置为‘1’)的任何运行堆栈条目并将其丢弃而不进行进一步处理。如果没有标记复制位(例如,设置为‘0’),则当被弹出后,不丢弃运行堆栈条目,而是被处理以试图找到附加(例如,针对全匹配节点)匹配。The graph walking engine then loads fixed-count node N2, which matches five times for 'b'. The graph walking engine then loads the next five segments of the payload, all of which are 'b'. The fixed-count node matches its element, which is also 'b'. After fixed-count node N2 506 matches, the graph walking engine then loads node N3 508, which is a marked node. A marked node indicates a match was found. If the copy bit is '1', the graph walking engine then pops all entries from the run stack and discards them. In this case, it discards the single entry in the run stack containing node N0 502 with a payload offset of 3. The copy bit is a flag bit. Upon reaching a marked node in the NFA graph (e.g., finding a match in the payload), any run stack entry with the copy bit marked (e.g., set to '1') can be popped from the run stack and discarded without further processing. If the copy bit is not marked (e.g., set to '0'), the run stack entry is not discarded upon being popped, but instead is processed to attempt to find additional matches (e.g., for a fully matched node).
关于图17,更加详细地描述了处理可变计数懒惰节点。With respect to FIG. 17 , handling variable count lazy nodes is described in more detail.
如果节点N0 502是贪婪的,则图形行走引擎找出到下一个节点(节点N1 504)的可能的最长路径。例如,有效载荷中的第一“z”或“Z”不一定意味着处理节点N1 504。假设图形行走引擎处理“yyyZbbbbbzyyyZbbbbb”的相同有效载荷。虽然懒惰节点N0 502 返回“yyyZbbbbb”作为匹配,但贪婪节点N0 502返回“yyyZbbbbbzyyyZbbbbb”。换言之,节点N0502忽略第一可能的匹配而继续对有效载荷进行匹配以找到最长可能的匹配。以此类方式对有效载荷进行匹配需要图形行走引擎保存其步伐,例如,通过将有效载荷位置的节点和偏移推送至运行堆栈。这样,如果图形行走引擎到达有效载荷末端而没有找到匹配,其可以从运行堆栈弹出节点从而回溯以匹配早前可能的匹配。If node N0 502 is greedy, then the graph walking engine finds out the possible longest path to the next node (node N1 504). For example, the first "z" or "Z" in the payload does not necessarily mean processing node N1 504. Suppose that the graph walking engine processes the identical payload of "yyyZbbbbbzyyyZbbbbb". Although lazy node N0 502 returns "yyyZbbbbb" as a match, greedy node N0 502 returns "yyyZbbbbbzyyyZbbbbb". In other words, node N0 502 ignores the first possible match and continues to match the payload to find the longest possible match. In this way, matching the payload requires the graph walking engine to save its steps, for example, by pushing the node and offset of the payload position to the run stack. Like this, if the graph walking engine arrives at the payload end and does not find a match, it can pop up a node from the run stack to backtrack to match a possible match earlier.
在本发明的一个示例实施例中,在处理贪婪或领属节点N0 502时,图形行走引擎加载有效载荷的字节并针对元素对它们进行匹配直到其找到非匹配或其用完有效载荷。因为字符类为[^\n],其涵盖有效载荷中的所有值/字符/字母,所以图形行走引擎用完有效载荷。然后,图形行走引擎将节点推送至包括被设置的复制位、有效载荷偏移、和指示当匹配可变计数节点内指示的元素时所消耗的字节数量的计数(即,在这种情况下,该计数为19)的运行堆栈。然后,图形行走引擎加载字符类节点N1 504,但由于没有来自有效载荷的字节以供消耗,其返回非匹配。In an example embodiment of the present invention, when processing greedy or possessive node N0 502, the graph walking engine loads the bytes of the payload and matches them against the elements until it finds a non-match or it runs out of payload. Because the character class is [^\n], which covers all values/characters/letters in the payload, the graph walking engine runs out of payload. The graph walking engine then pushes the node onto a run stack that includes a set copy bit, a payload offset, and a count indicating the number of bytes consumed when matching the element indicated in the variable count node (i.e., in this case, the count is 19). The graph walking engine then loads character class node N1 504, but since there are no bytes from the payload to consume, it returns a non-match.
然后,图形行走引擎从运行堆栈弹出可变计数节点并将该计数减少1。然后,图形行走引擎将节点推送至包括被设置的复制位、有效载荷偏移、和指示所消耗的字节数量的计数(18)的运行堆栈。然后,图形行走引擎加载字符类节点N1 504。图形行走引擎试图消耗有效载荷中的第19字节,其为‘b’,但这不与节点N1 504的字符类[zZ]匹配。然后,图形行走引擎再次弹出运行堆栈条目。重复此内容直到计数减少至节点N1 504消耗的字节为一个匹配的数量,其是当该计数为13时。当该计数为13时,可变计数节点有效地消耗“yyyZbbbbbzyyy”。然后,节点N1 504试图消耗第14字节,其为“Z”,其是针对字符类[zZ]的匹配。然后,图形行走引擎加载节点 N2 506。节点N2消耗有效载荷中的下5个“b”。然后,图形行走引擎加载节点N3 508,其是指示找到匹配的标记节点。处理标记节点N3 508之后,图形行走引擎弹出并丢弃复制位被设置为1的所有运行堆栈条目,并且在这种情况下,运行堆栈中仅存在一个此类条目。因此,贪婪节点发现有效载荷中的最长匹配。设置/未设置复制位是将引擎在运行时间期间推送的(标记)运行堆栈条目与也存在于运行堆栈中的初始输入缓冲区条目分开,然而,这还可以通过其他方式来实现。关于图18,更加详细地描述了处理可变计数贪婪节点。The graph walking engine then pops the variable count node from the run stack and decrements the count by 1. The graph walking engine then pushes the node to the run stack including the set copy bit, the payload offset, and a count (18) indicating the number of bytes consumed. The graph walking engine then loads the character class node N1 504. The graph walking engine attempts to consume the 19th byte in the payload, which is a ‘b’, but this does not match the character class [zZ] of node N1 504. The graph walking engine then pops the run stack entry again. This is repeated until the count is reduced to the number of bytes consumed by node N1 504 that is a match, which is when the count is 13. When the count is 13, the variable count node effectively consumes “yyyZbbbbbzyyy”. Node N1 504 then attempts to consume the 14th byte, which is a “Z”, which is a match for the character class [zZ]. The graph walking engine then loads node N2 506. Node N2 consumes the next 5 “b”s in the payload. Then, the graph walking engine loads node N3 508, which is a marked node indicating that a match has been found. After processing marked node N3 508, the graph walking engine pops up and discards all run stack entries whose copy bit is set to 1, and in this case, there is only one such entry in the run stack. Therefore, the greedy node finds the longest match in the payload. Setting/not setting the copy bit is to separate the (marked) run stack entry pushed by the engine during runtime from the initial input buffer entry also present in the run stack, however, this can also be achieved in other ways. With respect to Figure 18, processing variable count greedy nodes is described in more detail.
如果节点N0 502是领属的,则图形行走引擎找出到下一个节点(节点N1 504)的可能的最长路径。针对领属节点,图形行走引擎产生于上述贪婪节点相同的结果,但执行以下更优化的过程:如关于图19更加详细描述的,当到达有效载荷末端后不回溯。If node N0 502 is owned, the graph walking engine finds the longest possible path to the next node, node N1 504. For owned nodes, the graph walking engine produces the same results as for greedy nodes above, but performs a more optimized process: as described in more detail with respect to FIG. 19 , it does not backtrack after reaching the end of the payload.
如果节点N0 502为可变计数全匹配节点,则图形行走引擎找出到下一个节点(节点N1 504)的可能的所有可能的路径。图形行走引擎可以针对可变计数全匹配节点返回多个匹配。关于图20,更加详细地描述了处理可变计数全匹配节点。If node N0 502 is a variable count full match node, the graph walking engine finds all possible paths to the next node (node N1 504). The graph walking engine can return multiple matches for a variable count full match node. With respect to FIG. 20 , processing a variable count full match node is described in more detail.
图6A为框图600,展示了编译器604处理图样602的示例实施例。在本示例中,图样602为“ACMEa*b{5,10}c{5}[def]”。图样602包括可以分别被分成字符串节点(例如,“ACME”)、可变计数节点(例如,“a*”)、固定计数和可变计数节点(例如,可转换成“b{5}b{0,5}”的“b{5,10}”)、固定计数节点(例如,c{5}) 以及字符类(例如,[def])的图样段620、622、624、626和628。FIG6A is a block diagram 600 illustrating an example embodiment of a compiler 604 processing a pattern 602. In this example, the pattern 602 is "ACMEa*b{5,10}c{5}[def]." The pattern 602 includes pattern segments 620, 622, 624, 626, and 628, which can be respectively divided into a string node (e.g., "ACME"), a variable-count node (e.g., "a*"), a fixed-count and variable-count node (e.g., "b{5,10}" which can be converted to "b{5}b{0,5}"), a fixed-count node (e.g., c{5}), and a character class (e.g., [def]).
编译器604包括字符串检测模块610、可变计数检测模块 612、固定计数检测模块614、固定计数和可变计数检测模块616、以及字符类检测模块618。每个模块610、612、614、616和618接收图样602、或其中对应的图样段620、622、624、626、和628,并基于图样为图形组装模块606组装的编译NFA图形640生成节点 630、632、634、636a-b、638。Compiler 604 includes a string detection module 610, a variable count detection module 612, a fixed count detection module 614, a fixed count and variable count detection module 616, and a character class detection module 618. Each module 610, 612, 614, 616, and 618 receives the graph 602, or corresponding graph segments 620, 622, 624, 626, and 628 thereof, and generates nodes 630, 632, 634, 636a-b, and 638 for a compiled NFA graph 640 assembled by graph assembly module 606 based on the graph.
在另一个实施例中,编译器604对图样602进行元素和元素类型而非单独模块检查以针对每个元素和节点类型进行匹配。In another embodiment, compiler 604 examines graph 602 element by element and element type rather than individual modules to perform a match for each element and node type.
图6B为图6A的图样602产生的编译NFA图形640的图解 601。编译NFA图形640以针对字符串“ACME”进行匹配的字符串节点650开始。然后,图形640具有被配置成用于针对元素“a”进行匹配无限次数的下一个可变计数节点652。可变计数节点可以或者是懒惰的、贪婪的、领属的或者全的。基于图样的语法,该节点可以被设置为懒惰的、贪婪的、领属的或全匹配类型。例如,如果元字符后面跟着第二元字符“?”,如图样“*?”、“+?”、“??”或“{n,m}?”,则编译器可以创建匹配类型懒惰可变计数节点。如果元字符后面跟着第二元字符“+”,如图样“*+”、“++”、“?+”和“{n,m}+”,则编译器可以创建匹配类型领属节点。例如,如果元字符后面跟着第二元字符“*”,如图样“**”、“+*”、“?*”、和“{n,m}*”,则编译器可以创建匹配类型全可变计数节点。FIG6B is a diagram 601 of a compiled NFA graph 640 generated by the pattern 602 of FIG6A . The compiled NFA graph 640 begins with a string node 650 that matches the string "ACME". Graph 640 then has a next variable count node 652 configured to match an infinite number of times against the element "a". Variable count nodes can be lazy, greedy, possessive, or full. Based on the grammar of the pattern, the node can be set to a lazy, greedy, possessive, or full match type. For example, if a metacharacter is followed by a second metacharacter "?", such as in the patterns "*?", "+?", "??", or "{n,m}?", the compiler can create a matching type lazy variable count node. If a metacharacter is followed by a second metacharacter "+", such as in the patterns "*+", "++", "?+", and "{n,m}+", the compiler can create a matching type possessive node. For example, if the metacharacter is followed by a second metacharacter "*", as in the patterns "**", "+*", "?*", and "{n,m}*", the compiler may create a matching type full variable count node.
例如,考虑有效载荷“abbbbbbb”。针对“ab*”图样,生成贪婪匹配类型可变计数节点。结果是节点消耗整个有效载荷,从而使得结果为“abbbbbbb”。For example, consider the payload "abbbbbbb". For the "ab*" pattern, a greedy match type variable count node is generated. The result is that the node consumes the entire payload, resulting in the result "abbbbbbb".
类似地,针对“ab*+”图样,创建领属匹配类型可变计数节点。领属节点具有与贪婪节点类似的特性,然后,被配置成用于当到达有效载荷末端后不进行回溯。同样,结果为可变计数领属节点此处消耗整个有效载荷并且不进行回溯,从而使得结果为“abbbbbbb”,这刚好与贪婪节点相同。Similarly, for the "ab*+" pattern, a possessive match type variable count node is created. The possessive node has similar properties to the greedy node, but is configured not to backtrack after reaching the end of the payload. Again, the result is a variable count possessive node that consumes the entire payload without backtracking, resulting in "abbbbbbb," which is exactly the same as the greedy node.
针对““ab*?”图样,创建懒惰匹配类型可变计数节点。结果为,可变计数节点消耗最短可能匹配,其为“a”。For the "ab*?" pattern, a lazy matching type variable count node is created. As a result, the variable count node consumes the shortest possible match, which is "a."
针对“ab**”图样,创建全匹配类型可变计数节点。结果为,找到全可能匹配,从而使得找到“a”、“ab”、“abb”、“abbb”、“abbbb”、“abbbbb”、“abbbbbb”、和“abbbbbbb”。For the "ab**" pattern, a full match type variable count node is created. As a result, all possible matches are found, such as "a", "ab", "abb", "abbb", "abbbb", "abbbbb", "abbbbb", "abbbbbb", and "abbbbbbb".
在其他实施例中,各种符号可以用于例如通过指定特殊字符为图样的前缀或后缀来指示匹配类型。在其他实施例中,生成图形640的编译器的设置可以设置节点的匹配类型。In other embodiments, various symbols may be used to indicate the matching type, for example, by specifying a special character as a prefix or suffix to the pattern. In other embodiments, settings of the compiler that generates the graph 640 may set the matching type of a node.
然后,图形640具有固定计数节点654a和可变计数节点 654b,其基于逻辑上被分成b{5}和“b{0,5}”的“b{5,10}”图样段。固定计数节点654a针对“b”进行匹配五次。可变计数节点654b针对“b”在任何地方进行匹配从零到五次。然后,图形640具有在有效载荷中针对“c”进行匹配五次的固定计数节点656。字符类节点 658针对元素[def]进行匹配,其为字符“d”、“e”、或“f”中任一项。Graph 640 then has a fixed count node 654a and a variable count node 654b, which are based on the "b{5,10}" pattern segment logically divided into b{5} and "b{0,5}." Fixed count node 654a matches "b" five times. Variable count node 654b matches "b" anywhere from zero to five times. Graph 640 then has a fixed count node 656 that matches "c" five times in the payload. Character class node 658 matches the element [def], which is any of the characters "d," "e," or "f."
该图形还可以针对作为可变计数节点或固定计数节点的一部分的字符类进行匹配。例如,图样“[xyz]{0,5}”编译成针对字符类[xyz]进行匹配从零到五次的可变计数节点。例如,“xyzzx”是与该图样匹配的有效载荷。The graph can also match character classes that are part of a variable-count node or a fixed-count node. For example, the pattern "[xyz]{0,5}" compiles to a variable-count node that matches the character class [xyz] from zero to five times. For example, "xyzzx" is a payload that matches this pattern.
图7为框图700,展示了对图样702进行编译的示例实施例。图样确定模块703对图样702进行匹配项检查。匹配项包括元素和节点类型。如果图样确定模块703找到匹配项,则其将该匹配项作为元素704和节点类型706输出至节点生成模块708。如果图样确定模块703没有找到匹配项,则其指示图样结束,并且图样确定模块 703可以消耗另一个图样,或者如果没有更多图样,则完成编译。节点生成模块708生成包括元素704和节点类型706的密集节点710,该元素可以是值/字符/字母、字符类、或字符串,该节点类型可以是值/字符/字母、字符类、可变计数、固定计数、可变计数和固定计数、字符串、或分离节点(用于交替)或用于宣布匹配的标记节点(被用作图形的最终节点)。Fig. 7 is a block diagram 700 showing an example embodiment of compiling a pattern 702. A pattern determination module 703 performs a matching item check on the pattern 702. The matching item includes an element and a node type. If the pattern determination module 703 finds a matching item, it outputs the matching item to a node generation module 708 as an element 704 and a node type 706. If the pattern determination module 703 does not find a matching item, it indicates that the pattern is finished, and the pattern determination module 703 can consume another pattern, or if there are no more patterns, the compilation is completed. The node generation module 708 generates a dense node 710 including an element 704 and a node type 706. The element can be a value/character/letter, a character class, or a string, and the node type can be a value/character/letter, a character class, a variable count, a fixed count, a variable count and a fixed count, a string, or a separation node (for alternation) or a marked node (used as the final node of the graph) for announcing a match.
图8为流程图800,展示了对图样进行编译的示例实施例。编译以对图样进行匹配项检查而开始,匹配项包括元素和节点类型 (802)。然后,该方法确定是否找到匹配项(804)。如果找到了,则该方法生成指示节点类型和元素的节点(806)如果没找到,则该方法结束(808)并可选地对另一个图样进行编译。FIG8 is a flow chart 800 illustrating an example embodiment of compiling a pattern. Compilation begins by checking the pattern for a match, including elements and node types (802). The method then determines whether a match is found (804). If a match is found, the method generates a node indicating the node type and element (806). If not, the method ends (808) and optionally compiles another pattern.
图9为流程图900,展示了图形行走引擎对节点进行处理的示例实施例。图形行走引擎从节点提取节点类型和元素(902)。如上所述,该元素可以是值/字符/字母、字符类索引、或字符串值。然后,图形行走引擎确定节点是否需要继续与同一元素进行匹配 (904)。图形行走引擎可以例如通过使用索引或计数变量来跟踪其针对可变计数节点或固定计数节点已经匹配的元素的数量。如果节点类型指示继续针对该元素进行匹配,则图形行走引擎使有效载荷段与该元素匹配(906)。然后,图形行走引擎确定该有效载荷段是否该元素匹配(910)。如果匹配,则确定该节点需要继续进行确定 (904)。如果节点类型不指示继续进行匹配,则图形行走引擎针对该节点返回匹配或不匹配(908),并且可用于处理图形中的下一个节点。Figure 9 is a flowchart 900 illustrating an example embodiment of a graph walking engine processing a node. The graph walking engine extracts the node type and element from the node (902). As described above, the element can be a value/character/letter, a character class index, or a string value. The graph walking engine then determines whether the node needs to continue matching with the same element (904). The graph walking engine can, for example, track the number of elements it has matched for a variable count node or a fixed count node by using an index or a count variable. If the node type indicates that the element should continue matching, the graph walking engine matches the payload segment with the element (906). The graph walking engine then determines whether the payload segment matches the element (910). If so, it determines that the node needs to continue matching (904). If the node type does not indicate that the node should continue matching, the graph walking engine returns a match or no match for the node (908) and can be used to process the next node in the graph.
如果该有效载荷段与该元素不匹配(910),然而,图形行走引擎返回不匹配(912)。If the payload segment does not match the element (910), however, the graphics walking engine returns no match (912).
图10为框图1000,展示了图形行走引擎对NFA图形1002 的节点1004a-d进行处理的示例实施例。确定模块1006接收包括节点1004a-d的NFA图形1002。NFA图形1002可以包括任意数量的节点1004a-d。进一步地,在一个实施例中,确定模块1006可以接收单独节点1004a-d。确定模块1006将节点类型1008和元素1010 输出至匹配模块1011。基于节点类型1008,匹配模块1011针对元素1010对一个或多个有效载荷段1014进行匹配。匹配模块1011可以接收基于节点类型1008的一个或多个附加段1014,例如,被配置成匹配一个或多个有效载荷段的可变计数节点或固定计数节点。当完成处理后,匹配模块1011输出匹配或不匹配1012。可选地,匹配模块1011可以请求确定模块1006处理NFA图形1002的下一个节点。匹配模块1011可以进一步处理早前或稍后的有效载荷段以及 NFA图形的早前或稍后的节点。Figure 10 is a block diagram 1000 illustrating an example embodiment of a graph walking engine processing nodes 1004a-d of an NFA graph 1002. A determination module 1006 receives an NFA graph 1002 including nodes 1004a-d. NFA graph 1002 may include any number of nodes 1004a-d. Furthermore, in one embodiment, determination module 1006 may receive individual nodes 1004a-d. Determination module 1006 outputs a node type 1008 and an element 1010 to a matching module 1011. Based on the node type 1008, matching module 1011 matches one or more payload segments 1014 against the element 1010. Matching module 1011 may receive one or more additional segments 1014 based on the node type 1008, such as variable-count nodes or fixed-count nodes configured to match one or more payload segments. Upon completion of processing, matching module 1011 outputs a match or mismatch 1012. Optionally, matching module 1011 may request determination module 1006 to process the next node of NFA graph 1002. Matching module 1011 may further process earlier or later payload segments and earlier or later nodes of the NFA graph.
图11为流程图1100,展示了使本发明所使用的NFA图形行走的过程。在一个实施例中,执行该过程的元素可以是与关于图 2B中所示的框图250描述的元素。FIG11 is a flow chart 1100 illustrating a process for walking an NFA graph used in the present invention. In one embodiment, the elements performing this process may be the same as those described with respect to block diagram 250 shown in FIG2B .
图形行走引擎252包括多个存储器,这些存储器存储用于保存步伐通过图形的其他部分的路径的运行堆栈260和当有效载荷以仅部分匹配完成被处理时用于存储保存缓冲区/堆栈264的保存缓冲区/堆栈264,从而使得当加载同一流的下一个有效载荷时,该引擎可以从保存缓冲区将堆栈条目重新加载到运行堆栈内。在一个实施例中,运行堆栈260或保存缓冲区264可以被保持为片上存储器中的循环缓冲区,并且其可以溢出至外部系统存储器,但可以使用其他堆栈实现方式和存储器类型。并且,当将下一个指令馈送至引擎以处理同一流的后续有效载荷时,主机可以从保存缓冲区将条目拷贝(移动)到运行堆栈(输入缓冲区)内。The graphics walk engine 252 includes a plurality of memories that store a run stack 260 for storing the path of the steps through other parts of the graphics and a save buffer/stack 264 for storing a save buffer/stack 264 when the payload is processed with only a partial match, so that when the next payload of the same stream is loaded, the engine can reload the stack entries from the save buffer into the run stack. In one embodiment, the run stack 260 or save buffer 264 can be maintained as a circular buffer in the on-chip memory, and it can overflow to the external system memory, but other stack implementations and memory types can be used. In addition, when the next instruction is fed to the engine to process the subsequent payload of the same stream, the host can copy (move) the entries from the save buffer into the run stack (input buffer).
运行堆栈260将堆栈条目推送至头指针并从头指针弹出堆栈条目。保存缓冲区/堆栈对其尾指针处的堆栈条目进行排队。因为保存缓冲区/堆栈264对其尾指针处的条目进行排队(例如LILO),其被结构化为一个队列。与处理器耦合的主机为初始运行堆栈提供至少一个填充的条目(例如,从图2B 的输入缓冲区258输入)。该主机还可以提供初始指令(例如,来自指令队列254)。行走指令包含以下与堆栈相关的信息:(1)运行堆栈头指针;(2)保存堆栈尾指针;(3)运行堆栈条目的数量;以及(4)按照条目数量的运行堆栈和保存堆栈大小。The run stack 260 pushes stack entries to the head pointer and pops stack entries from the head pointer. The save buffer/stack queues the stack entries at its tail pointer. Because the save buffer/stack 264 queues the entries at its tail pointer (e.g., LILO), it is structured as a queue. A host coupled to the processor provides at least one populated entry for the initial run stack (e.g., input from input buffer 258 of Figure 2B). The host may also provide an initial instruction (e.g., from instruction queue 254). The walk instruction contains the following stack-related information: (1) the run stack head pointer; (2) the save stack tail pointer; (3) the number of run stack entries; and (4) the run stack and save stack sizes in terms of the number of entries.
在本发明的一个示例实施例中,运行堆栈条目包括指示节点类型字段、复制字段、逆向处理字段、有效载荷偏移字段、类型特定数据字段、和地址字段的字段。如果节点类型为“NOP”(例如,无操作(No-op)),则图形行走引擎丢弃运行堆栈条目并弹出有待处理的下一个运行堆栈条目。如果节点类型为提取(Fetch),则运行堆栈条目不包含节点信息,并且类型特定数据字段无效。如果该类型为除了“NOP”或Fetch以外的任何类型(例如,固定字符、可变计数、分离节点、字符串节点、字符类、字符、或标记节点),则运行堆栈条目本身包含类型特定数据字段中的节点信息。下表列出了可能的节点类型。In an example embodiment of the present invention, the run stack entry includes a field indicating a node type field, a copy field, a reverse processing field, a payload offset field, a type specific data field, and an address field. If the node type is "NOP" (for example, no operation (No-op)), the graph walking engine discards the run stack entry and pops out the next run stack entry to be processed. If the node type is extraction (Fetch), the run stack entry does not include node information, and the type specific data field is invalid. If the type is any type (for example, fixed character, variable count, separation node, string node, character class, character, or mark node) except "NOP" or Fetch, the run stack entry itself includes the node information in the type specific data field. The following table has listed possible node types.
复制字段用于将图形行走引擎在运行时间期间推送的运行堆栈条目与也存在于同一运行堆栈中的初始输入缓冲区条目分开。逆向字段指示是否应在处理当前节点后使有效载荷偏移增量或减量。这允许在正向和逆向上处理有效载荷。偏移字段指示当前节点所处理的有效载荷的位置。如果节点类型为提取,则地址字段包含起始节点地址。以另外的方式,如果当处理堆栈条目时有效载荷匹配,则地址字段包含有待提取的下一个节点的地址。The Copy field is used to separate the run stack entries pushed by the graph walk engine during runtime from the initial input buffer entries also present in the same run stack. The Reverse field indicates whether the payload offset should be incremented or decremented after processing the current node. This allows payloads to be processed in both the forward and reverse directions. The Offset field indicates the location of the payload being processed by the current node. If the node type is Extract, the Address field contains the starting node address. Alternatively, if a payload matches when processing a stack entry, the Address field contains the address of the next node to be extracted.
将运行堆栈条目推送到运行堆栈260内允许图形行走引擎处理其他NFA节点或NFA图形的另一个分支,同时如果在那个分支中没有找到匹配,则能够返回到运行堆栈260内所记录的节点。Pushing a run stack entry into run stack 260 allows the graph walking engine to process other NFA nodes or another branch of the NFA graph while being able to return to the node recorded in run stack 260 if no match is found in that branch.
保存缓冲区/堆栈264允许图形行走引擎保存部分匹配,例如,在图形行走引擎到达有效载荷的末端的情况下。当加载同一流的后续有效载荷后,该引擎将堆栈条目从保存缓冲区/堆栈264拷贝到运行堆栈260内。在另一个实施例中,当向图形行走引擎提供下一个指令后,主机装置的主机软件可以将保存堆栈的内容拷贝至输入堆栈。在本实施例中,由于图形行走引擎由主机软件管理,其没有意识到数据包流或该流中的后续数据包。图11展示了实现所述使用运行堆栈和保存堆栈的系统的示例实施例,然而,本领域的普通技术人员可以设想其他实现方式。The save buffer/stack 264 allows the graphics walking engine to save partial matches, for example, when the graphics walking engine reaches the end of a payload. When a subsequent payload of the same stream is loaded, the engine copies the stack entry from the save buffer/stack 264 to the run stack 260. In another embodiment, after providing the next instruction to the graphics walking engine, the host software of the host device can copy the contents of the save stack to the input stack. In this embodiment, because the graphics walking engine is managed by the host software, it is unaware of the packet stream or subsequent packets in that stream. FIG11 illustrates an example embodiment of a system that uses a run stack and a save stack, however, other implementations are contemplated by those skilled in the art.
该过程以启动图形行走而开始(1102)。然后,该过程确定运行堆栈(例如,运行堆栈260)是否是空的(1104)。如果运行堆栈(例如,运行堆栈260)是空的,则该过程返回(1122)。响应于来自主机的指令253,可以从输入缓冲区258推送运行堆栈(例如,运行堆栈260)条目。如果运行堆栈(例如,运行堆栈260)不是空的(例如,具有至少一个条目),则图形行走引擎(例如,引擎252) 弹出运行堆栈(例如,运行堆栈260)以加载下一个堆栈条目(1106)。该运行堆栈(例如,运行堆栈260)是后进先出数据结构,所以从该运行堆栈(例如,运行堆栈260)弹出的条目是最近被推送到该运行堆栈(例如,运行堆栈260)内的条目。The process begins by initiating a graphics walk (1102). The process then determines whether a run stack (e.g., run stack 260) is empty (1104). If the run stack (e.g., run stack 260) is empty, the process returns (1122). In response to an instruction 253 from the host, a run stack (e.g., run stack 260) entry may be pushed from the input buffer 258. If the run stack (e.g., run stack 260) is not empty (e.g., has at least one entry), the graphics walk engine (e.g., engine 252) pops the run stack (e.g., run stack 260) to load the next stack entry (1106). The run stack (e.g., run stack 260) is a last-in, first-out data structure, so the entry popped from the run stack (e.g., run stack 260) is the most recently pushed entry into the run stack (e.g., run stack 260).
然后,图形行走引擎确定该运行堆栈条目是否存储节点信息(1108)。如果存储,则图形行走引擎从所弹出的运行堆栈条目读取节点信息(1110)。如果没有存储,则图形行走引擎从所弹出的运行堆栈条目所指示的存储器地址提取节点(1112)。The graph walking engine then determines whether the run stack entry stores the node information (1108). If so, the graph walking engine reads the node information from the popped run stack entry (1110). If not, the graph walking engine extracts the node from the memory address indicated by the popped run stack entry (1112).
然后,图形行走引擎在结果中设置“终止行走”位(也被称为“完成(done)”位)为假(1114)。然后,图形行走引擎对运行堆栈条目(1118)所指示的节点进行处理,这关于图12进行了更加详细的解释。关于图11,然后,图形行走引擎确定该终止行走位在被处理的节点内是否被赋值为真(TRUE)(1120)。如果没有,则图形行走引擎提取在当前节点的“下一节点地址”字段处所指示的节点(1116)。如果有,则图形行走引擎确定该运行堆栈是否是空的(1104)。The graph walking engine then sets the "end walk" bit (also known as the "done" bit) in the result to false (1114). The graph walking engine then processes the node indicated by the run stack entry (1118), which is explained in more detail with respect to FIG. 12. With respect to FIG. 11, the graph walking engine then determines whether the end walk bit is set to true (1120) within the node being processed. If not, the graph walking engine extracts the node indicated at the "next node address" field of the current node (1116). If so, the graph walking engine determines whether the run stack is empty (1104).
图12为流程图1200,展示了对节点进行处理的示例实施例。流程图1200为图11的对节点进行处理(1118)的扩展。FIG12 is a flowchart 1200 illustrating an example embodiment of processing a node. Flowchart 1200 is an extension of the process a node (1118) of FIG11.
图形行走引擎开始处理节点(1202)。图形行走引擎确定该图形行走引擎是否为密集节点(1204)。如果其不是密集节点,则该图形行走引擎按照非密集NFA节点(例如,字符节点、分离节点、或标记节点)对该节点进行处理(1214)。然后,图形行走引擎返回(1224)。The graph walking engine begins processing the node (1202). The graph walking engine determines whether the node is a dense node (1204). If it is not a dense node, the graph walking engine processes the node as a non-dense NFA node (e.g., a character node, a separation node, or a label node) (1214). The graph walking engine then returns (1224).
如果该节点为密集图形节点(1204),则图形行走引擎确定该节点是否为字符类节点(1206)。如果是,则图形行走引擎对该字符类节点进行处理(1216)。关于图13,更加详细地描述了处理字符类节点。然后,该图形行走引擎返回(1224)。If the node is a dense graph node (1204), the graph walking engine determines whether the node is a character node (1206). If so, the graph walking engine processes the character node (1216). Processing character nodes is described in more detail with respect to FIG. 13. The graph walking engine then returns (1224).
如果该节点不是字符类节点(1206),则图形行走引擎确定该节点是否为字符串节点(1208)。如果是,则图形行走引擎按照字符串节点对该节点进行处理(1218)。关于图14,更加详细地描述了处理字符串节点。然后,图形行走引擎返回(1224)。If the node is not a character node (1206), the graph walking engine determines whether the node is a string node (1208). If so, the graph walking engine processes the node as a string node (1218). Processing string nodes is described in more detail with respect to FIG. 14. The graph walking engine then returns (1224).
如果该节点不是字符串节点(1208),则图形行走引擎确定该节点是否为固定计数节点(1210)。如果是,则其对该固定计数节点进行处理(1220)。关于图15,进一步详细地描述了处理固定计数节点。然后,图形行走引擎返回(1224)。If the node is not a string node (1208), the graph walking engine determines whether the node is a fixed count node (1210). If so, it processes the fixed count node (1220). Processing fixed count nodes is described in further detail with respect to FIG. 15. The graph walking engine then returns (1224).
关于图12,如果节点不是固定计数节点(1210),则图形行走引擎确定该节点是否为可变计数节点(1211)。如果是,则图形行走引擎按照可变计数节点对该节点进行处理(1222)。关于图 16,进一步详细地描述了处理可变计数节点。然后,图形行走引擎返回(1224)。如果图形行走引擎确定该节点为可变计数节点(1211),则其返回错误代码(1226)。With respect to FIG12 , if the node is not a fixed-count node ( 1210 ), the graph walking engine determines whether the node is a variable-count node ( 1211 ). If so, the graph walking engine processes the node as a variable-count node ( 1222 ). Processing variable-count nodes is described in further detail with respect to FIG16 . The graph walking engine then returns ( 1224 ). If the graph walking engine determines that the node is a variable-count node ( 1211 ), it returns an error code ( 1226 ).
图形行走引擎可以使用对该节点进行处理的其他实施例。例如,图形行走引擎可以通过以不同的顺序检查每种节点类型来确定该节点的类型。The graph walking engine may use other embodiments for processing the node. For example, the graph walking engine may determine the type of the node by examining each node type in a different order.
图13为流程图1300,展示了对字符类节点进行处理的示例实施例。关于图22,以上描述了字符类节点的格式。关于图13,流程图1300为图12中所描述的对字符类节点进行处理(1216)的扩展。FIG13 is a flow chart 1300 illustrating an example embodiment of processing a character class node. The format of a character class node is described above with respect to FIG22. With respect to FIG13, flow chart 1300 is an extension of the processing of a character class node (1216) described in FIG12.
图26为表2600,展示了在对字符类节点类型进行处理的上下文中推送的堆栈条目的示例实施例。该堆栈条目包括指示字符类匹配的堆栈条目类型2602、指示字符类索引的元素2606、以及指示图形中的下一节点的下一节点地址2608。该堆栈条目进一步包括复制位2612、逆向位2614(指示图形是否要逆向行走)、以及指示下一字节的偏移以在有效载荷中进行处理的偏移位2616。该堆栈条目进一步包括匹配类型2604和计数值2610,这两个都指示它们不适用。字符类堆栈条目仅被排队到保存缓冲区/堆栈内,并且不被推送至运行堆栈,因为没有必要将其推送到运行堆栈内。Figure 26 is table 2600, has shown the example embodiment of the stack entry that is pushed in the context that character class node type is processed.This stack entry comprises the stack entry type 2602 of indication character class coupling, the element 2606 of indication character class index and the next node address 2608 of the next node in indication graph.This stack entry further comprises copy bit 2612, reverse bit 2614 (indication graph whether will walk in reverse) and the offset 2616 of indication next byte with processing in payload.This stack entry further comprises matching type 2604 and count value 2610, and both of these indicate that they are not applicable.The character class stack entry is only queued in the preservation buffer/stack, and is not pushed to the run stack, because it is not necessary to be pushed to the run stack.
关于图13,图形行走引擎开始对字符类节点进行处理 (1302)。图形行走引擎从字符类节点(例如,图22的元素2206) 加载字符类索引,并使用该字符类索引读取二维矩阵中所存储的位图/掩码(1304)。然后,图形行走引擎检查有效载荷中是否存在至少又一个字节要处理(1306)。With reference to FIG13 , the graphics walking engine begins processing the character class node ( 1302 ). The graphics walking engine loads the character class index from the character class node (e.g., element 2206 in FIG22 ) and uses the character class index to read the bitmap/mask stored in the two-dimensional matrix ( 1304 ). The graphics walking engine then checks to see if there is at least one more byte in the payload to process ( 1306 ).
如果存在至少又一个字节,则图形行走引擎从有效载荷提取下一字节(或其他数据大小)(1308)。图形行走引擎使用有效载荷的字节来访问位图/掩码的位(或其他数据大小)和确定是否设置该位(1310)。如果设置了该位,则图形行走引擎确定该有效载荷的字节与节点所表示的字符类匹配,并返回(1312)。如果没有设置该位(1310),则图形行走引擎将结果中的终止行走位设置为“真”(1314)并且然后返回(1312)。终止行走位指示当前图形行走没有找到匹配并且指示该引擎应中止当前图形行走线程而不是提取图形的下一个节点。If there is at least one more byte, the graph walk engine extracts the next byte (or other data size) from the payload (1308). The graph walk engine uses the byte of the payload to access the bit of the bitmap/mask (or other data size) and determines whether the bit is set (1310). If the bit is set, the graph walk engine determines that the byte of the payload matches the character class represented by the node and returns (1312). If the bit is not set (1310), the graph walk engine sets the terminate walk bit in the result to "true" (1314) and then returns (1312). The terminate walk bit indicates that the current graph walk did not find a match and indicates that the engine should abort the current graph walk thread instead of extracting the next node of the graph.
在其他方面,如果图形行走引擎确定不再有要处理的有效载荷(1306),则图形行走引擎将节点推送至保存缓冲区/堆栈,从而使得可以针对同一流的后续数据包恢复匹配(1316)。然后,图形行走引擎将结果中的终止行走位设置为“真”(1314)并且然后返回(1312)。In other aspects, if the graph walking engine determines that there are no more payloads to process (1306), the graph walking engine pushes the node to a save buffer/stack so that matching can be resumed for subsequent packets of the same flow (1316). The graph walking engine then sets the terminated walk bit in the result to "true" (1314) and then returns (1312).
图14为流程图1400,展示了图形行走引擎对字符串节点进行处理的示例实施例。如上所述,关于图23,展示了字符串节点的格式和字符串数据。关于图14,流程图1400为关于图12所描述的对字符串节点进行处理(1218)的扩展。FIG14 is a flowchart 1400 illustrating an example embodiment of a graph walking engine processing a string node. As described above with respect to FIG23 , the format and string data of a string node are illustrated. With respect to FIG14 , flowchart 1400 is an extension of the string node processing (1218) described with respect to FIG12 .
图27为表2700,展示了用于字符串匹配类型的堆栈条目的示例实施例。堆栈条目包括指示字符串匹配的堆栈条目类型2702、指示剩余字符串数据的地址的元素2706、指示图形中的下一节点的下一节点地址2708、以及指示有待处理的字符串的剩余长度的计数值2710。该堆栈条目进一步包括指示运行堆栈中的条目是否为副本的复制位2712、指示图形是否要逆向行走的逆向位2714、以及指示下一字节的偏移以在有效载荷中进行处理的偏移位2716。该堆栈条目进一步包括指示其不适用的匹配类型2704。针对字符串匹配类型,堆栈条目被排队到保存缓冲区/堆栈,因为不需要将它们推送至运行堆栈。Figure 27 is a table 2700 showing an example embodiment of a stack entry for a string match type. The stack entry includes a stack entry type 2702 indicating a string match, an element 2706 indicating the address of the remaining string data, a next node address 2708 indicating the next node in the graph, and a count value 2710 indicating the remaining length of the string to be processed. The stack entry further includes a copy bit 2712 indicating whether the entry in the run stack is a copy, a reverse bit 2714 indicating whether the graph is to be walked in reverse, and an offset bit 2716 indicating the offset of the next byte to be processed in the payload. The stack entry further includes a match type 2704 indicating that it is not applicable. For the string match type, the stack entry is queued to the save buffer/stack because it does not need to be pushed to the run stack.
关于图14,图形行走引擎开始对字符串节点进行处理 (1402)。图形行走引擎加载字符串数据,其包括来自节点的字符串的长度(例如,图23的字符串节点2330的计数2310),确定有效载荷中可用字节的数量(或其他数据大小),以及确定有效载荷中可用字节的数量是否等于大于字符串的长度(1404)。如果是,则图形行走引擎将“匹配长度”设置为“字符串长度”(1406)。以另外的方式,图形行走引擎将“匹配长度”设置为可用有效载荷段的数量。“匹配长度”为字符串的有待与有效载荷匹配的字节数量。如果匹配长度小于字符串长度(1404),则匹配长度被设置为可用字节数量(1405),从而使得字符串可以是部分匹配的,并且继续进行与后续数据包的匹配。With respect to Figure 14, the graph walking engine begins processing the string node (1402). The graph walking engine loads string data, which includes the length of the string from the node (e.g., count 2310 of string node 2330 of Figure 23), determines the number of available bytes in the payload (or other data size), and determines whether the number of available bytes in the payload is equal to or greater than the length of the string (1404). If so, the graph walking engine sets the "match length" to the "string length" (1406). Alternatively, the graph walking engine sets the "match length" to the number of available payload segments. The "match length" is the number of bytes of the string to be matched with the payload. If the match length is less than the string length (1404), the match length is set to the number of available bytes (1405), so that the string can be partially matched, and continues to match with subsequent data packets.
设置匹配长度之后,(1405或1406),图形行走引擎从有效载荷中提取多个字节,其中,字节的数量为该匹配长度,并且还提取字符串数据节点(例如,图23的字符串数据2340)(1408)。字符串数据节点包括有待与有效载荷段进行比较的实际字符串元素 (例如,图23的字符串数据2340的元素2316)。然后,图形行走引擎将所提取的有效载荷段的数量与相同字符串字节数量进行并行比较(1410)。然后,节点确定有效载荷的“匹配长度”字节是否与所有所提取的字符串字节匹配(1412)。如果不匹配,则图形行走引擎将结果的终止行走位设置为真(1418)并且返回(1420)。如果有效载荷的字节与字符串的字节匹配,则图形行走引擎确定匹配长度是否与字符串长度相同(1414)。After setting the match length, (1405 or 1406), the graph walk engine extracts a number of bytes from the payload, where the number of bytes is the match length, and also extracts a string data node (e.g., string data 2340 of Figure 23) (1408). The string data node includes the actual string element to be compared with the payload segment (e.g., element 2316 of string data 2340 of Figure 23). The graph walk engine then compares the number of extracted payload segments with the number of the same string bytes in parallel (1410). The node then determines whether the "match length" bytes of the payload match all the extracted string bytes (1412). If not, the graph walk engine sets the end walk bit of the result to true (1418) and returns (1420). If the bytes of the payload match the bytes of the string, the graph walk engine determines whether the match length is the same as the string length (1414).
如果匹配长度和字符串长度相同,则图形行走引擎返回 (1420)。如果匹配长度和字符串长度不相同,则图形行走引擎将包含字符串的剩余长度的堆栈条目(图27)推送至保存缓冲区/堆栈,从而使得来自同一流的后续有效载荷的剩余“字符串长度”字节可以与“剩余字符串数据”、和以上关于图27中所描述的信息一起匹配(1416),将结果的终止行走位设置为真(1418)并返回(1420)。If the match length and the string length are the same, the graph walk engine returns (1420). If the match length and the string length are not the same, the graph walk engine pushes the stack entry containing the remaining length of the string (Figure 27) to the save buffer/stack so that the remaining "string length" bytes of the subsequent payload from the same stream can be matched together with the "remaining string data" and the information described above with respect to Figure 27 (1416), sets the end walk bit of the result to true (1418) and returns (1420).
图15A和图15B为流程图1500和1501,展示了对固定计数节点进行处理的示例实施例。关于图24,以上描述了固定计数节点的格式。关于图15A-B,流程图1500和1501为关于图12所描述的对固定计数节点进行处理(1220)的扩展。Figures 15A and 15B are flow charts 1500 and 1501 illustrating an example embodiment of processing a fixed count node. The format of a fixed count node was described above with respect to Figure 24. With respect to Figures 15A-B, flow charts 1500 and 1501 are extensions of the processing of a fixed count node (1220) described with respect to Figure 12.
图28为表2800,展示了固定计数匹配类型的堆栈条目的示例实施例。堆栈条目包括指示固定计数匹配的堆栈条目类型2802、指示字符或字符类索引的元素2806、指示图形中的下一节点的下一节点地址2808、以及指示有待匹配的字节的剩余计数的计数值2810。该堆栈条目进一步包括指示运行堆栈中的节点是否为副本的复制位 2812、指示图形是否要逆向行走的逆向位2814、以及指示下一字节的偏移以在有效载荷中进行处理的偏移位2816。该堆栈条目进一步包括指示其不适用的匹配类型2804。针对固定计数匹配类型,堆栈条目被排队到保存缓冲区/堆栈,因为不需要将它们推送至运行堆栈。FIG28 is a table 2800 illustrating an example embodiment of a stack entry for a fixed count match type. The stack entry includes a stack entry type 2802 indicating a fixed count match, an element 2806 indicating a character or character class index, a next node address 2808 indicating the next node in the graph, and a count value 2810 indicating the remaining count of bytes to be matched. The stack entry further includes a copy bit 2812 indicating whether the node in the run stack is a copy, a reverse bit 2814 indicating whether the graph is to be walked in reverse, and an offset bit 2816 indicating the offset of the next byte to be processed in the payload. The stack entry further includes a match type 2804 indicating that it is not applicable. For a fixed count match type, stack entries are queued to a save buffer/stack because they do not need to be pushed to the run stack.
关于图15A,图形行走引擎开始对固定计数节点进行处理 (1502)。图形行走引擎读取节点中所存储的“计数”(例如,图 24的计数值2410)(1504)。节点中所存储的计数表示字符或字符类有待与有效载荷匹配的次数。例如,针对源自部分图样“b{5}”的固定节点,因为字符‘b’要与有效载荷匹配5次,计数为5。With respect to FIG15A , the graph walking engine begins processing a fixed count node ( 1502 ). The graph walking engine reads the "count" stored in the node (e.g., count value 2410 in FIG24 ) ( 1504 ). The count stored in the node represents the number of times a character or character class is to be matched with the payload. For example, for the fixed node derived from the partial pattern "b{5}," since the character 'b' is to be matched with the payload five times, the count is 5.
然后,图形行走引擎确定有效载荷中是否有可用的字节“计数”数量(1506)。如果有,则图形行走引擎将匹配长度设置为“计数”(1510)。以另外的方式,图形行走引擎将匹配长度设置为可用有效载荷段的数量。“匹配长度”为固定计数图样的有待与有效载荷匹配的字节数量。如果匹配长度小于固定计数节点的计数 (1508),则匹配长度被设置为可用字节数量,从而使得固定计数节点可以是部分匹配的,并且继续进行与同一流的后续数据包的匹配。设置匹配长度(1508或1510)之后,图形行走引擎从有效载荷提取字节的“匹配长度”数量的字节(1512)。The graph walk engine then determines if there are a "count" number of bytes available in the payload (1506). If so, the graph walk engine sets the match length to "count" (1510). Alternatively, the graph walk engine sets the match length to the number of available payload segments. The "match length" is the number of bytes of the fixed count pattern that are to be matched with the payload. If the match length is less than the count of the fixed count node (1508), the match length is set to the number of available bytes, so that the fixed count node can be partially matched, and matching continues with subsequent packets of the same flow. After setting the match length (1508 or 1510), the graph walk engine extracts a "match length" number of bytes from the payload (1512).
然后,图形行走引擎例如通过读取图24的元素2406中的数据来确定节点是否为固定计数字符类节点或固定计数字符节点,该数据指示字符类中的字符或索引数量(1514)。如果其为固定计数字符类节点,则图形行走引擎使用从固定字符类节点(例如,图 24的元素2406)提取的字符类索引来读取字符类位图/掩码(1516)。然后,图形行走引擎试图使“匹配长度”数量的有效载荷段与掩码中的相应条目并行匹配(1518)。以与以上字符类节点的上下文中所述的相同的方式执行字符类节点匹配。如果节点为固定计数字符节点,图形行走引擎将“匹配长度”数量的有效载荷段平行于节点中所存储的元素(例如,图24的元素2406)进行匹配(1520)。Then, the graph walking engine determines whether the node is a fixed count character class node or a fixed count character node, for example, by reading the data in element 2406 of Figure 24, which indicates the number of characters or indexes in the character class (1514). If it is a fixed count character class node, the graph walking engine uses the character class index extracted from the fixed character class node (e.g., element 2406 of Figure 24) to read the character class bitmap/mask (1516). The graph walking engine then attempts to match the payload segments of the "match length" number in parallel with the corresponding entries in the mask (1518). Character class node matching is performed in the same manner as described in the context of the above character class node. If the node is a fixed count character node, the graph walking engine matches the payload segments of the "match length" number in parallel with the elements stored in the node (e.g., element 2406 of Figure 24) (1520).
确定节点是否为固定计数字符类节点或固定计数字符节点 (1514)和响应于该确定(分别为1516和1518或1520)之后,参照图15B的流程图1501,图形行走引擎确定有效载荷的“匹配长度”数量的字节是否与该字符或字符类匹配(1522)。如果匹配,则图形行走引擎确定该匹配长度是否与该固定计数节点的计数相同 (1524)。如果相同,则图形行走引擎返回(1530)。如果不相同,图形行走引擎将堆栈条目(图28)推送至保存缓冲区/堆栈,从而使得来自同一流的后续有效载荷的剩余“计数”字节与剩余固定计数节点元素(1526)匹配,将结果的终止行走位设置为真(1528)并返回(1530)。After determining whether the node is a fixed-count character class node or a fixed-count character node (1514) and responding to the determination (1516 and 1518 or 1520, respectively), the graph walk engine determines, with reference to flowchart 1501 of FIG15B, whether the "match length" number of bytes of the payload matches the character or character class (1522). If so, the graph walk engine determines whether the match length is the same as the count of the fixed-count node (1524). If so, the graph walk engine returns (1530). If not, the graph walk engine pushes the stack entry (FIG. 28) to the save buffer/stack so that the remaining "count" bytes of the subsequent payload from the same stream match the remaining fixed-count node element (1526), sets the end-of-walk bit of the result to true (1528), and returns (1530).
如果有效载荷的“匹配长度”数量的字节与字符类的字符不匹配(1522),则图形行走引擎将结果的终止行走位设置为真 (1528)并返回(1530)。If the "match length" number of bytes of the payload do not match a character of the character class (1522), the graphics walk engine sets the end walk bit of the result to true (1528) and returns (1530).
图16为流程图1600,展示了对可变计数节点进行处理的示例实施例。关于图25,以上描述了可变计数节点的格式。关于图16,流程图1600为关于图12所描述的对可变计数节点进行处理(1222) 的扩展。FIG16 is a flowchart 1600 illustrating an example embodiment of processing a variable count node. The format of a variable count node is described above with respect to FIG25. With respect to FIG16, flowchart 1600 is an extension of the processing of a variable count node (1222) described with respect to FIG12.
图29为流程图2900,展示了用于可变计数匹配类型的堆栈条目的示例实施例。该堆栈条目包括指示可变计数匹配的堆栈条目类型2902、指示字符或字符类索引的元素2906、指示图形中的下一节点的下一节点地址2908、以及指示有待匹配的字节的剩余计数的计数值2910。该堆栈条目进一步包括指示运行堆栈中的节点是否为副本的复制位2912、指示图形是否要逆向行走的逆向位2914、以及指示下一字节的偏移以在有效载荷中进行处理的偏移位2916。该堆栈条目进一步包括指示节点是否为懒惰、贪婪、领属、或全匹配节点的匹配类型2904。可以将该堆栈条目推送和弹出至运行堆栈,或在用尽有效载荷的情况下,可以将其从运行堆栈拷贝至保存缓冲区/ 堆栈。Figure 29 is a flow chart 2900 showing an example embodiment of a stack entry for a variable count match type. This stack entry includes a stack entry type 2902 indicating a variable count match, an element 2906 indicating a character or character class index, a next node address 2908 indicating the next node in the graph, and a count value 2910 indicating the remaining count of bytes to be matched. This stack entry further includes a copy bit 2912 indicating whether the node in the run stack is a copy, a reverse bit 2914 indicating whether the graph is to be walked in reverse, and an offset bit 2916 indicating the offset of the next byte to be processed in the payload. This stack entry further includes a match type 2904 indicating whether the node is lazy, greedy, possessive, or a fully matched node. This stack entry can be pushed and popped onto the run stack, or, when the payload is exhausted, can be copied from the run stack to a save buffer/stack.
关于图16,图形行走引擎开始对可变计数节点进行处理 (1602)。图形行走引擎加载图25的匹配类型2504并确定该节点匹配类型是否为懒惰的(1604)。如果是,则其对该可变计数懒惰节点进行处理(1614),在图17中对此进行了进一步的详细解释。然后,该图形行走引擎返回(1622)。With reference to FIG16 , the graph walking engine begins processing the variable count node ( 1602 ). The graph walking engine loads the match type 2504 of FIG25 and determines whether the node match type is lazy ( 1604 ). If so, it processes the variable count lazy node ( 1614 ), as further explained in FIG17 . The graph walking engine then returns ( 1622 ).
如果不是,则图形行走引擎确定该节点匹配类型是否为贪婪的(1606)。如果是,则其对该可变计数贪婪节点进行处理(1616),在图18中对此进行了进一步的详细解释。然后,该图形行走引擎返回(1622)。If not, the graph walking engine determines whether the node matching type is greedy (1606). If so, it processes the variable count greedy node (1616), which is explained in further detail in Figure 18. The graph walking engine then returns (1622).
如果不是,则图形行走引擎确定该节点是否为领属匹配类型(1608)。如果是,则其对该可变计数领属节点进行处理(1618),在图19中对此进行了进一步的详细解释。然后,图形行走引擎返回 (1622)。If not, the graph walking engine determines whether the node is a possessive match type (1608). If so, it processes the variable count possessed node (1618), which is explained in further detail in Figure 19. The graph walking engine then returns (1622).
如果不是,则图形行走引擎确定该节点匹配类型是否为“全”或“全匹配”节点并按照可变计数全匹配节点对该节点进行处理(1620),图20中对此进行了进一步的详细解释。然后,该图形行走引擎返回(1622)。If not, the graph walking engine determines whether the node match type is a "full" or "full match" node and processes the node as a variable count full match node (1620), which is explained in further detail in Figure 20. The graph walking engine then returns (1622).
图17为流程图1700,展示了对可变计数懒惰节点进行处理的示例实施例。关于图25,以上描述了可变计数节点的格式,并且关于图29,以上描述了可变计数堆栈条目的格式。关于图17,流程图1700为关于图16所描述的对可变计数懒惰节点进行处理(1614) 的扩展。FIG17 is a flowchart 1700 illustrating an example embodiment of processing a variable-count lazy node. The format of a variable-count node was described above with respect to FIG25 , and the format of a variable-count stack entry was described above with respect to FIG29 . With respect to FIG17 , flowchart 1700 is an extension of the processing of a variable-count lazy node ( 1614 ) described with respect to FIG16 .
图形行走引擎开始处理可变计数懒惰节点(1702)。图形行走引擎确定该节点是否是从运行堆栈条目读取的(1704)。如果该节点不是从从运行堆栈条目读取的,这意味着该节点第一次被处理,则图形行走引擎确定该计数(例如,图25的计数值2510)是否大于零,并且如果大于,则其推送其复制位被设置为“1”(例如,图29的复制位2912)的带有如上解释填充的所有相关信息的运行堆栈条目(图29,2900)(1706)。然后,图形行走引擎返回(1724)。所推送的运行堆栈条目允许图形行走引擎记住其返回路径并继续行走至位于下一节点地址(例如,图25的2508)处的下一节点。如果当行走下一节点路径时找到匹配,则将复制位设置为“1”允许从运行堆栈弹出和丢弃节点。如果没有找到匹配,则当从运行堆栈弹出这些这些节点时,可以对它们进行处理。The graph walking engine begins processing the variable count lazy node (1702). The graph walking engine determines whether the node is read from a run stack entry (1704). If the node is not read from a run stack entry, this means that the node is being processed for the first time. The graph walking engine determines whether the count (e.g., count value 2510 in FIG. 25 ) is greater than zero, and if it is, it pushes a run stack entry ( FIG. 29 , 2900 ) with all relevant information filled as explained above, with its copy bit set to “1” (e.g., copy bit 2912 in FIG. 29 ) (1706). The graph walking engine then returns (1724). The pushed run stack entry allows the graph walking engine to remember its return path and continue walking to the next node at the next node address (e.g., 2508 in FIG. 25 ). If a match is found when walking the next node path, the copy bit is set to “1” to allow the node to be popped and discarded from the run stack. If no match is found, then when these nodes are popped from the run stack, they can be processed.
如果该节点是从运行堆栈条目读取的(1704),则图形行走引擎确定有效载荷中是否有有待处理的至少又一个字节(1708)。如果不再有有效载荷的字节,图形行走引擎将带有节点信息的堆栈条目(图29,2900)推送至保存缓冲区/堆栈(1710),将结果的终止行走位设置为“真”(1712)并返回(1724)。将节点推送至保存缓冲区/堆栈(1710)保存了匹配的进度,从而使得当图形行走引擎处理属于同一应用流的后续数据包时,其可以从保存缓冲区/堆栈加载之前的匹配进度并恢复匹配。If the node was read from a running stack entry (1704), the graph walking engine determines whether there are at least one more byte in the payload to be processed (1708). If there are no more bytes in the payload, the graph walking engine pushes the stack entry (Figure 29, 2900) with the node information to a save buffer/stack (1710), sets the terminated walk bit of the result to "true" (1712), and returns (1724). Pushing the node to the save buffer/stack (1710) saves the progress of the match so that when the graph walking engine processes subsequent packets belonging to the same application flow, it can load the previous match progress from the save buffer/stack and resume the match.
如果有效载荷没有用尽(即,如果存在有效载荷的有待处理的至少一个字节),则图形行走引擎通过检查图29的元素2906 来确定可变计数节点是否为字符类节点或字符节点(1714)。如果该可变计数节点为可变计数字符类节点,则其使用可变计数字符类节点中的图29的元素2906中所存储的字符类索引来读取位图/掩码 (1720)。然后,图形行走引擎从有效载荷提取一个字节并通过将来自该有效载荷的字节用作该位图/掩码的索引来将该字节与该位图 /掩码中的相应条目进行比较(1722)。如果设置了该条目,则图形行走引擎确定匹配。If the payload is not exhausted (i.e., if there is at least one byte of the payload to be processed), the graph walking engine determines whether the variable count node is a character class node or a character node by checking element 2906 of Figure 29 (1714). If the variable count node is a variable count character class node, it uses the character class index stored in element 2906 of Figure 29 in the variable count character class node to read the bitmap/mask (1720). The graph walking engine then extracts a byte from the payload and compares the byte to the corresponding entry in the bitmap/mask by using the byte from the payload as an index into the bitmap/mask (1722). If the entry is set, the graph walking engine determines a match.
另一方面,如果该可变计数节点为可变计数字符节点,则图形行走引擎从有效载荷中提取一个字节并将其与该节点中所存储的图29的元素2906进行匹配(1716)。On the other hand, if the variable count node is a variable count character node, the graph walking engine extracts one byte from the payload and matches it with element 2906 of FIG. 29 stored in the node ( 1716 ).
确定该节点是否为可变计数字符类节点或可变计数字符节点(1714)并响应于该确定(分别为1720和1722或1716),则图形行走引擎确定该字节是否与该元素匹配(1718)。如果匹配,则图形行走引擎使该计数(例如,图29的计数值2910)减量1(1705),如果该计数大于零则推送设置了复制位(例如,图29的复制位2912) 的运行堆栈条目(例如,图29的2900)(1706)并返回(1724)。如果该计数等于零,则不将条目推送至运行堆栈内。以另外的方式,图形行走引擎将结果中的终止行走位设置为“真”(1712)并且返回(1724)。Determine whether the node is a variable count character class node or a variable count character node (1714) and in response to the determination (1720 and 1722 or 1716, respectively), the graphics walk engine determines whether the byte matches the element (1718). If so, the graphics walk engine decrements the count (e.g., count value 2910 of FIG. 29 ) by 1 (1705), and if the count is greater than zero, pushes a run stack entry (e.g., 2900 of FIG. 29 ) with the copy bit (e.g., copy bit 2912 of FIG. 29 ) set (1706) and returns (1724). If the count is equal to zero, the entry is not pushed into the run stack. Alternatively, the graphics walk engine sets the terminate walk bit in the result to "true" (1712) and returns (1724).
图18为流程图1800,展示了对可变计数贪婪节点进行处理的示例实施例。关于图25,以上描述了可变计数节点的格式,并且关于图29,以上描述了可变计数堆栈条目的格式。关于图18,流程图1800为关于图16所描述的对可变计数贪婪节点进行处理(1616) 的扩展。FIG18 is a flowchart 1800 illustrating an example embodiment of processing a variable-count greedy node. The format of a variable-count node was described above with respect to FIG25 , and the format of a variable-count stack entry was described above with respect to FIG29 . With respect to FIG18 , flowchart 1800 is an extension of the processing of a variable-count greedy node (1616) described with respect to FIG16 .
图形行走引擎开始处理可变计数贪婪节点(1802)。图形行走引擎确定该节点是否从运行堆栈条目读取该节点(1804)。如果是,则图形行走引擎在运行堆栈条目中使该计数(例如,图29的计数值2910)减量1(1806)。然后,如果该计数(例如,图29的计数值2910)大于零,则其将运行堆栈条目与所设置的复制位一起推送至运行堆栈(1808)。然后,该图形行走引擎返回(1818)。The graph walking engine begins processing a variable-count greedy node (1802). The graph walking engine determines whether the node is read from a run stack entry (1804). If so, the graph walking engine decrements the count (e.g., count value 2910 in FIG. 29 ) in the run stack entry by 1 (1806). Then, if the count (e.g., count value 2910 in FIG. 29 ) is greater than zero, it pushes the run stack entry onto the run stack with the copy bit set (1808). The graph walking engine then returns (1818).
如果没有从运行堆栈读取运行堆栈条目(即,第一次处理该节点),则图形行走引擎通过检查图25的元素2506来确定该可变计数节点是否为可变计数字符类节点或可变计数字符节点 (1810)。如果该可变计数节点为可变计数字符类节点,则其通过读取图25的元素2506来读取与该可变计数字符类节点中所存储的字符类索引相对应的位图/掩码(1814)。然后,图形行走引擎从有效载荷提取一个字节并通过将来自该有效载荷的字节用作该位图/掩码的索引来将该字节与该位图/掩码中的相应条目进行比较,直到存在不匹配或该有效载荷中不再具有可用的字节,或所匹配的字节数量等于该计数值(图25的2510)(1816)。然后,图形行走引擎将有待存储的可用计数(图29的2910)分配在运行堆栈条目中作为可用计数节点所匹配的字节数量(1817)。然后,如果该运行堆栈条目的计数大于零,则图形行走引擎将复制位设置为1的运行堆栈条目(图29的2900)(1808)。如果该运行堆栈条目的计数等于零,则图形行走引擎不推送运行堆栈条目。然后,图形行走引擎返回 (1818)。If the run stack entry has not been read from the run stack (i.e., the node is being processed for the first time), the graph walk engine determines whether the variable count node is a variable count character class node or a variable count character node by checking element 2506 of Figure 25 (1810). If the variable count node is a variable count character class node, it reads the bitmap/mask corresponding to the character class index stored in the variable count character class node by reading element 2506 of Figure 25 (1814). The graph walk engine then extracts a byte from the payload and compares the byte with the corresponding entry in the bitmap/mask by using the byte from the payload as an index into the bitmap/mask until there is a mismatch, there are no more available bytes in the payload, or the number of bytes matched equals the count value (2510 of Figure 25) (1816). The graph walk engine then assigns the available count to be stored (2910 of Figure 29) to the run stack entry as the number of bytes matched by the available count node (1817). Then, if the count of the run stack entry is greater than zero, the graphics walk engine copies the run stack entry with the bit set to 1 (2900 of FIG. 29 ) ( 1808 ). If the count of the run stack entry is equal to zero, the graphics walk engine does not push the run stack entry. The graphics walk engine then returns ( 1818 ).
如果该节点为可变计数字符节点,则图形行走引擎从有效载荷提取字节并将它们与节点元素(图25的2506)所存储的字符进行匹配,直到其失败、用尽有效载荷、或所匹配的字节数量等于该计数(图25的2510)(1812)。然后,图形行走引擎将有待存储的计数值(例如,图29的计数值2910)分配在运行堆栈条目中作为可用计数节点所匹配的字节数量(1817)。If the node is a variable count character node, the graph walking engine extracts bytes from the payload and matches them against the characters stored by the node element (2506 of FIG. 25 ) until it fails, runs out of payload, or the number of bytes matched equals the count (2510 of FIG. 25 ) ( 1812 ). The graph walking engine then allocates the count value to be stored (e.g., count value 2910 of FIG. 29 ) in the run stack entry as the number of bytes matched by the available count node ( 1817 ).
图19为流程图1900,展示了对可变计数领属节点进行处理的示例实施例。关于图25,以上描述了可变计数节点的格式,并且关于图29,以上描述了可变计数堆栈条目的格式。关于图19,流程图1900为关于图16所描述的对可变计数领属节点进行处理(1618) 的扩展。FIG19 is a flowchart 1900 illustrating an example embodiment of processing a variable-count owned node. The format of a variable-count owned node was described above with respect to FIG25 , and the format of a variable-count stack entry was described above with respect to FIG29 . With respect to FIG19 , flowchart 1900 is an extension of the processing of a variable-count owned node (1618) described with respect to FIG16 .
关于图19,图形行走引擎开始对可变计数节点进行处理 (1902)。图形行走引擎通过检查图25的元素2506来确定该节点是否为可变计数字符类节点或可变计数字符节点(1904)。如果该节点为可变计数字符类节点,则其读取与该可变计数字符类节点元素(图25的2506)中所存储的字符类索引相对应的位图/掩码。然后,图形行走引擎从有效载荷提取字节并通过将来自该有效载荷的字节用作该位图/掩码的索引来将它们与该位图/掩码中的相应条目进行比较,直到存在不匹配,该有效载荷中不再具有可用的字节或所匹配的字节数量等于该计数(图25的2510)。With reference to FIG19 , the graph walking engine begins processing a variable count node ( 1902 ). The graph walking engine determines whether the node is a variable count character class node or a variable count character node ( 1904 ) by examining element 2506 of FIG25 . If the node is a variable count character class node, it reads the bitmap/mask corresponding to the character class index stored in the variable count character class node element ( 2506 of FIG25 ). The graph walking engine then extracts bytes from the payload and compares them to corresponding entries in the bitmap/mask by using the bytes from the payload as indices into the bitmap/mask until there is a mismatch, there are no more bytes available in the payload, or the number of bytes matched equals the count ( 2510 of FIG25 ).
如果该节点为可变计数字符节点,则图形行走引擎从有效载荷提取一个字节并将其与该节点中所存储的元素(图25的2506) 进行比较,并继续对字节进行匹配,直到存在不匹配、该有效载荷中不再具有可用字节、或所匹配的字节数量等于该计数(图25的 2510)(1906)。If the node is a variable count character node, the graph walking engine extracts a byte from the payload and compares it to the element stored in the node (2506 of Figure 25) and continues matching bytes until there is a mismatch, there are no more available bytes in the payload, or the number of bytes matched equals the count (2510 of Figure 25) (1906).
将来自有效载荷的字节与字符类或值/字符/字母进行匹配 (分别为1916或1906)后,图形行走引擎确定该有效载荷中是否具有剩余的字节(1908)。如果图形行走引擎已经用尽有效载荷(即,没有剩余字节)(1908),则图形行走引擎将该节点推送至保存缓冲区/堆栈(1910),将终止行走位设置为真(1912),并返回(1918)。如果图形行走引擎没有用尽有效载荷(即,有剩余字节)(1908),图形行走引擎返回(1918)。After matching the bytes from the payload to the character class or value/character/letter (1916 or 1906, respectively), the graph walking engine determines whether there are any remaining bytes in the payload (1908). If the graph walking engine has exhausted the payload (i.e., no bytes remain) (1908), the graph walking engine pushes the node to the save buffer/stack (1910), sets the terminated walk bit to true (1912), and returns (1918). If the graph walking engine has not exhausted the payload (i.e., there are remaining bytes) (1908), the graph walking engine returns (1918).
图20为流程图2000,展示了对可变计数全匹配节点进行处理的示例实施例。关于图25,以上描述了可变计数节点的格式。关于图20,流程图2000为关于图16所描述的对可变计数全匹配节点进行处理(1620)的扩展。FIG20 is a flow chart 2000 illustrating an example embodiment of processing a variable count full match node. The format of a variable count node is described above with respect to FIG25. With respect to FIG20, flow chart 2000 is an extension of the processing of a variable count full match node (1620) described with respect to FIG16.
图形行走引擎开始处理可变计数节点(2002)。图形行走引擎确定该节点是否是从运行堆栈条目读取的(2004)。如果该节点不是从运行堆栈读取的(2004),则其推送没有设置(例如,设置为0)复制位(图29,2912)的运行堆栈条目(图29,2900)(2007)。然后,图形行走引擎返回(2020)。The graph walking engine begins processing the variable count node (2002). The graph walking engine determines whether the node was read from a run stack entry (2004). If the node was not read from the run stack (2004), it pushes a run stack entry (2900, FIG. 29 ) with the copy bit (2912, FIG. 29 ) not set (e.g., set to 0) (2007). The graph walking engine then returns (2020).
如果该节点是从运行堆栈读取的,(2004),则图形行走引擎确定其是否用尽有效载荷(例如,有效载荷中是否没有剩余字节)(2005)。如果没有,或如果有效载荷中剩余了字节,则图形行走引擎通过检查图29的元素2906来确定该可变计数节点是否为可变计数字符类节点或可变计数字符节点(2006)。If the node is read from the run stack (2004), the graph walking engine determines whether it has exhausted its payload (e.g., whether there are no bytes remaining in the payload) (2005). If not, or if there are bytes remaining in the payload, the graph walking engine determines whether the variable count node is a variable count character class node or a variable count character node by examining element 2906 of FIG. 29 (2006).
如果该节点为可变计数字符类节点,则图形行走引擎读取与该可变计数字符类节点(2012)中所存储的字符类索引相对应的位图/掩码。然后,图形行走引擎从有效载荷提取一个字节并通过将来自该有效载荷的字节用作该位图/掩码的索引来将该字节与该位图/掩码中的相应条目进行比较(2014)。If the node is a variable-count character class node, the graph walking engine reads the bitmap/mask corresponding to the character class index stored in the variable-count character class node (2012). The graph walking engine then extracts a byte from the payload and compares the byte with the corresponding entry in the bitmap/mask by using the byte from the payload as an index into the bitmap/mask (2014).
如果该节点为可变计数字符节点,则图形行走引擎从有效载荷中提取一个字节并将其与该节点中所存储的值/字符/字母进行匹配(2008)。If the node is a variable count character node, the graph walking engine extracts a byte from the payload and matches it with the value/character/letter stored in the node (2008).
将来自有效载荷的字节与字符类或字符进行匹配(分别为 2014或2008)后,图形行走引擎确定该字节是否与该字符类或字符匹配(2010)。如果存在匹配(2010),则图形行走引擎使计数(即,图29的计数值2910)减量1(2022)。如果该计数大于零,则图形行走引擎推送没有设置(例如,设置为0)复制位(图29,2912) 的运行堆栈条目(图29,2900)(2007)并返回(2020)。如果该计数等于零,则图形行走引擎不推送任何堆栈条目并返回(2020)。如果没有匹配,则图形行走引擎将终止行走位设置为真(2018)并且返回(2020)。After matching a byte from the payload to a character class or character (2014 or 2008, respectively), the graphics walk engine determines whether the byte matches the character class or character (2010). If there is a match (2010), the graphics walk engine decrements a count (i.e., count value 2910 of FIG. 29 ) by 1 (2022). If the count is greater than zero, the graphics walk engine pushes a run stack entry ( FIG. 29 , 2900) with the copy bit ( FIG. 29 , 2912) not set (e.g., set to 0) (2007) and returns (2020). If the count is equal to zero, the graphics walk engine does not push any stack entry and returns (2020). If there is no match, the graphics walk engine sets the terminate walk bit to true (2018) and returns (2020).
如果图形行走引擎已经用尽有效载荷,或者没有剩余的有效载荷字节(2005),则图形行走引擎将该节点推送至保存缓冲区/ 堆栈(2016)。然后,图形行走引擎将终止行走位设置为真(2018) 并且返回(2020)。If the graph walking engine has run out of payload, or has no payload bytes remaining (2005), the graph walking engine pushes the node to the save buffer/stack (2016). The graph walking engine then sets the terminated walk bit to true (2018) and returns (2020).
图21为表2100,展示了字符类中所使用的位图/掩码的示例实施例。表2100示出了字符类索引2102、字符类定义2104、以及ASCII值2106。在实现字符类表的实施例中,存储器可以存储字符类索引2102的值、字符类定义2104、或ASCII值2106;然而,在此示出了它们以展示这些字符类定义如何与字符类矩阵相关和这些索引可以如何访问该字符类矩阵。图21示出了五个字符类定义仅作为一个示例实施例。其他实施例可以包括不同种类的字符类,并且唯一字符类的数量可以是任何数量。FIG21 is a table 2100 showing an example embodiment of a bitmap/mask used in a character class. Table 2100 shows a character class index 2102, a character class definition 2104, and an ASCII value 2106. In an embodiment implementing a character class table, a memory may store the value of the character class index 2102, the character class definition 2104, or the ASCII value 2106; however, they are shown here to show how these character class definitions relate to the character class matrix and how these indexes can access the character class matrix. FIG21 shows five character class definitions as only one example embodiment. Other embodiments may include different kinds of character classes, and the number of unique character classes may be any number.
被分配有字符类索引1的[^\n]字符类转换以与除了换行以外的每个字符匹配,因为“^”运算符产生跟在其后面的任何事物的倒数,并且“\n”指示换行。因此,位图/掩码中的每个位被设置为“1”,除了与换行相对应的ASCII值,其为12。因此,处理具有为 12的值的字节的节点访问此字符类字符类矩阵[1][12],其中“1”为字符类索引并且“12”为到字符类的有效载荷的值。由于该表中的此位置处的值为“0”,该有效载荷不匹配。然而,加载到CharacterClassMatrix[1][PayloadByte]中的任何其他有效载荷产生匹配。The [^\n] character class, assigned character class index 1, is converted to match every character except newline because the "^" operator produces the inverse of whatever follows it, and "\n" indicates a newline. Therefore, every bit in the bitmap/mask is set to "1" except for the ASCII value corresponding to newline, which is 12. Therefore, a node processing a byte with a value of 12 accesses this character class CharacterClassMatrix[1][12], where "1" is the character class index and "12" is the value of the payload to the character class. Since the value at this position in the table is "0", the payload does not match. However, any other payload loaded into CharacterClassMatrix[1][PayloadByte] produces a match.
被分配有字符类索引2的[a-z]字符类转换以与‘a’至‘z’的范围内的每个字符匹配。因此,在与字符类索引2相对应的位图/ 掩码中,来自97至122的值被设置为“1”并且所有其他值被设置为“0”。因此,处理表示ASCII值“c”的有效载荷段的节点访问字符类矩阵[2][99],其中“2”为字符类索引并且“99”为有效载荷的值。由于该表中的此位置处的值为“1”,该有效载荷与该字符类匹配。然而,针对此字符类,97-122范围以外的有效载荷不匹配。例如,如果该有效载荷为数字“4”,则该节点访问字符类矩阵[2][52],其具有为“0”的值,该值指示不匹配。The [a-z] character class, which is assigned character class index 2, is converted to match every character in the range of 'a' to 'z'. Therefore, in the bitmap/mask corresponding to character class index 2, the values from 97 to 122 are set to "1" and all other values are set to "0". Therefore, the node processing the payload segment representing the ASCII value "c" accesses the character class matrix [2][99], where "2" is the character class index and "99" is the value of the payload. Since the value at this position in the table is "1", the payload matches the character class. However, for this character class, payloads outside the range of 97-122 do not match. For example, if the payload is the number "4", the node accesses the character class matrix [2][52], which has a value of "0", which indicates a mismatch.
被分配有字符类索引3的[^a-z]字符类转换以与除了‘a’至‘z’的范围内的那些以外的每个值/字符/字母匹配。因此,在与字符类索引3相对应的位图/掩码中,来自97至122的值被设置为“0”并且所有其他值被设置为“1”。因此,处理表示ASCII值“c”的有效载荷段的节点访问字符类矩阵[3][99],其中“3”为字符类索引并且“99”为有效载荷的值。由于该表中的此位置处的值为“0”,该有效载荷不与该字符类匹配。然而,针对此字符类,97-122范围以外的有效载荷是匹配。例如,如果该有效载荷为数字“4”,则该节点访问字符类矩阵[3][52],其具有为“1”的值,该值指示匹配。The [^a-z] character class assigned character class index 3 is converted to match every value/character/letter except those in the range of 'a' to 'z'. Therefore, in the bitmap/mask corresponding to character class index 3, the values from 97 to 122 are set to "0" and all other values are set to "1". Therefore, the node processing the payload segment representing the ASCII value "c" accesses the character class matrix [3][99], where "3" is the character class index and "99" is the value of the payload. Since the value at this position in the table is "0", the payload does not match the character class. However, for this character class, payloads outside the range 97-122 are matches. For example, if the payload is the number "4", the node accesses the character class matrix [3][52], which has a value of "1", which indicates a match.
被分配有字符类索引4的[0-9]字符类转换以与‘0’至‘9’的范围内的每个值/字符/字母匹配。因此,在与字符类索引4相对应的位图/掩码中,来自48至57的值被设置为“1”并且所有其他值被设置为“0”。因此,处理表示ASCII值“D”的有效载荷段的节点访问CharacterClassMatrix[4][68],其中“4”为字符类索引并且“68”为有效载荷的值。由于该表中的此位置处的值为“0”,该有效载荷不与该字符类匹配。然而,针对此字符类,48-57范围以内的有效载荷是匹配。例如,如果该有效载荷为数字“4”,则该节点访问CharacterClassMatrix[4][52],其具有为“1”的值,该值指示匹配。The [0-9] character class, which is assigned character class index 4, is converted to match every value/character/letter in the range of '0' to '9'. Therefore, in the bitmap/mask corresponding to character class index 4, values from 48 to 57 are set to "1" and all other values are set to "0". Therefore, the node processing the payload segment representing the ASCII value "D" accesses CharacterClassMatrix[4][68], where "4" is the character class index and "68" is the value of the payload. Since the value at this position in the table is "0", the payload does not match the character class. However, for this character class, payloads in the range 48-57 are matches. For example, if the payload is the number "4", the node accesses CharacterClassMatrix[4][52], which has a value of "1", which indicates a match.
被分配有字符类索引5的[ABCabc]字符类转换以与单独值/ 字符/字母“A”、“B”、“C”、“a”、“b”、和“c”匹配。因此,在与字符类索引5相对应的位图/掩码中,来自65、66、67、97、 98和99的值被设置为“1”并且所有其他值被设置为“0”。因此,处理表示ASCII值“c”的有效载荷段的节点访问 CharacterClassMatrix[5][99],其中“5”为字符类索引并且“99”为有效载荷的值。由于该表中的此位置处的值为“1”,该有效载荷与该字符类匹配。然而,针对此字符类,65、66、67、97、98和99 的值不匹配。例如,如果该有效载荷为数字“4”,则该节点访问 CharacterClassMatrix[5][52],其具有为“0”的值,该值指示不匹配。The [ABCabc] character class, which is assigned character class index 5, is converted to match the individual values/characters/letters "A", "B", "C", "a", "b", and "c". Therefore, in the bitmap/mask corresponding to character class index 5, the values from 65, 66, 67, 97, 98, and 99 are set to "1" and all other values are set to "0". Therefore, the node processing the payload segment representing the ASCII value "c" accesses CharacterClassMatrix[5][99], where "5" is the character class index and "99" is the value of the payload. Since the value at this position in the table is "1", the payload matches the character class. However, for this character class, the values 65, 66, 67, 97, 98, and 99 do not match. For example, if the payload is the number "4", the node accesses CharacterClassMatrix[5][52], which has a value of "0", which indicates a mismatch.
在一个实施例中,该字符类矩阵可以用于任何数据类型或数据长度。在上述实施例中,这些有效载荷为字符,其可以是7位或8位。然而,可以使用任何长度的数据并且其不一定必须以字符为形式。可以使用其他数据编码。此类表的其他应用的示例为视频处理、音频处理、二分搜索、或任何图样搜索应用。In one embodiment, this character class matrix can be used for any data type or data length.In the above-described embodiment, these payloads are characters, which can be 7 or 8. However, data of any length can be used and it does not necessarily have to be in the form of characters. Other data encodings can be used. Examples of other applications of this type of table are video processing, audio processing, binary search or any pattern search application.
在此引证的全部专利、公开的申请以及参考文献的教导通过引用以其全文进行结合。The teachings of all patents, published applications, and references cited herein are incorporated by reference in their entirety.
尽管本发明已经参照其示例实施例做了具体的展示和说明,本领域技术人员将理解到通过在不偏离由所附的权利要求书涵盖的本发明的范围下可以从中做出在形式和细节上的不同的变化。While the invention has been particularly shown and described with reference to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention as encompassed by the appended claims.
Claims (50)
Applications Claiming Priority (6)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201361872612P | 2013-08-30 | 2013-08-30 | |
| US201361872622P | 2013-08-30 | 2013-08-30 | |
| US61/872,612 | 2013-08-30 | ||
| US61/872,622 | 2013-08-30 | ||
| US14/186,978 US9563399B2 (en) | 2013-08-30 | 2014-02-21 | Generating a non-deterministic finite automata (NFA) graph for regular expression patterns with advanced features |
| US14/186,978 | 2014-02-21 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1208104A1 HK1208104A1 (en) | 2016-02-19 |
| HK1208104B true HK1208104B (en) | 2019-10-11 |
Family
ID=
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9563399B2 (en) | Generating a non-deterministic finite automata (NFA) graph for regular expression patterns with advanced features | |
| US8819217B2 (en) | Intelligent graph walking | |
| US9652505B2 (en) | Content search pattern matching using deterministic finite automata (DFA) graphs | |
| US7949683B2 (en) | Method and apparatus for traversing a compressed deterministic finite automata (DFA) graph | |
| US8180803B2 (en) | Deterministic finite automata (DFA) graph compression | |
| US9762544B2 (en) | Reverse NFA generation and processing | |
| US9398033B2 (en) | Regular expression processing automaton | |
| US8473523B2 (en) | Deterministic finite automata graph traversal with nodal bit mapping | |
| US8086609B2 (en) | Graph caching | |
| US8176300B2 (en) | Method and apparatus for content based searching | |
| US20110016154A1 (en) | Profile-based and dictionary based graph caching | |
| CN104714995A (en) | System and method to traverse a non-deterministic finite automata (NFA) graph generated for regular expression patterns with advanced features | |
| HK1208104B (en) | A method and computer system of compiling a pattern into a non-deterministic finite automata (nfa) graph | |
| HK1211718B (en) | System and method to traverse nfa generated for regular expression patterns |