HK1212119B - Method and apparatus for processing of finite automata - Google Patents
Method and apparatus for processing of finite automata
- Publication number
- HK1212119B HK15112773.1A
- Authority
- HK
- Hong Kong
- Prior art keywords
- node
- given
- segment
- offset
- nodes
- Prior art date
Abstract
A method, and corresponding apparatus and system are provided for optimizing matching at least one regular expression pattern in an input stream by walking at least one finite automaton in a speculative manner. The speculative manner may include walking at least two nodes of a given finite automaton, of the at least one finite automaton, in parallel, with a segment, at a given offset within a payload of a packet in the input stream. The walking may include determining a match result for the segment, at the given offset within the payload, at each node of the at least two nodes. The walking may further include determining at least one subsequent action for walking the given finite automaton, based on an aggregation of each match result determined.
Description
Technical Field
Embodiments of the invention relate to methods and apparatus for processing finite automata.
Background
The Open Systems Interconnection (OSI) reference model defines 7 network protocol layers (L1-L7) for communication over a transmission medium. The upper layers (L4-L7) represent end-to-end communications and the lower layers (L1-L3) represent local communications.
Networking application aware systems need to process, filter and switch a range of L3 to L7 network protocol layers, e.g., L7 network protocol layers such as HyperText Transfer Protocol (HTTP) and Simple Mail Transfer Protocol (SMTP), and L4 network protocol layers such as Transmission Control Protocol (TCP). In addition to handling network protocol layers, networking application aware systems need to protect these protocols at wire speed (i.e., the data transfer rate on the physical medium of the network over which data is transmitted and received) with L4-L7 network protocol functions, including firewalls, Virtual Private Networks (VPNs), Secure Sockets Layer (SSL), Intrusion Detection Systems (IDS), Internet Protocol Security (IPSec), anti-virus (AV), and anti-spam functions, while simultaneously providing access and content based security.
Network processors are available for high throughput L2 and L3 network protocol processing, i.e., performing packet processing to forward packets at wire speed. Typically, a general purpose processor is used to process L4-L7 network protocols that require more intelligent processing. While general purpose processors may perform such compute intensive tasks, they do not provide sufficient performance to process the data so that it can be forwarded at wire speed.
Intrusion Detection System (IDS) applications can examine the contents of individual data packets flowing through a network and can identify suspicious patterns that may indicate an attempt to intrude or compromise a system. One example of a suspicious pattern may be a particular text string in a data packet followed 100 characters later by another particular text string. Such content aware networking may require inspecting the contents of data packets at line speed. The contents may be analyzed to determine whether there has been a security breach or intrusion.
A large number of patterns and rules in the form of regular expressions (also referred to herein as regular expression patterns) may be applied to ensure that all security vulnerabilities or intrusions are detected. Regular expressions are a compact method for describing patterns in character strings. The simplest pattern matched by a regular expression is a single character or string of characters, e.g., /c/ or /cat/. Regular expressions may also include operators and meta-characters that have special meaning. Through the use of meta-characters, regular expressions may be used for more complex searches, such as "abc.*xyz", i.e., find the string "abc", followed by the string "xyz", with an unlimited number of characters in between. Another example is the regular expression "abc..abc.*xyz", i.e., find the string "abc", followed by exactly two characters, then the string "abc", followed by the string "xyz" after an unlimited number of characters. Content searches are typically performed using search algorithms such as Deterministic Finite Automata (DFA) or Non-deterministic Finite Automata (NFA) to process regular expressions.
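For illustration only (the embodiments herein target hardware automata, not a software regex library), the wildcard patterns described above, "abc.*xyz" and "abc..abc.*xyz", can be exercised with Python's `re` module; the payload strings below are hypothetical:

```python
import re

# "abc.*xyz": the string "abc", then "xyz" after any number of characters.
assert re.search(r"abc.*xyz", "abc-some-filler-xyz") is not None
assert re.search(r"abc.*xyz", "abcxyz") is not None        # zero characters between

# "abc..abc.*xyz": "abc", exactly two characters, "abc" again, then "xyz"
# after an unlimited number of characters.
assert re.search(r"abc..abc.*xyz", "abc12abc----xyz") is not None
assert re.search(r"abc..abc.*xyz", "abc123abcxyz") is None  # three chars between: no match
```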
Disclosure of Invention
Embodiments of the present invention provide a method, apparatus, computer program product, and corresponding system that may search an input stream for at least one regular expression pattern using at least one finite automaton.
According to one embodiment, a method may store, in at least one memory, at least one finite automaton comprising a plurality of nodes generated from at least one regular expression pattern. The method includes operatively coupling the at least one memory to at least one processor. The at least one processor may be configured to walk the at least one finite automaton with segments of an input stream received via a hardware network interface operatively coupled to a network to match the at least one regular expression pattern in the input stream. The walking may comprise walking at least two nodes of a given finite automaton of the at least one finite automaton, in parallel, with a segment at a given offset within a payload of a data packet in the input stream. The walking may include determining a match result for the segment at the given offset within the payload at each of the at least two nodes. The walking may further include determining at least one follow-up action for walking the given finite automaton based on an aggregation of each match result determined.
The segment may also be referred to herein as a payload segment or a segment of the payload. The segment may be a portion of the payload being examined to determine whether the segment matches an element indicated by a node of the at least one finite automaton being traversed (i.e., walked) with the segment. The segment may be a value, character, letter, byte, or other suitable type of data. The segment may have any suitable granularity (e.g., length or size). For example, the granularity of the segment may be one byte, multiple bytes, less than one byte, any number of bits, or any other suitable granularity. The type of the element may be a character, a character string, a character class, or any other suitable element type.
The at least one finite automaton may comprise a Deterministic Finite Automaton (DFA) and at least one non-deterministic finite automaton (NFA). The given finite automaton of the at least one finite automaton may be a given NFA of the at least one NFA.
The determination at each of the at least two nodes may be within a same processing cycle of the at least one processor.
The at least two nodes may include an element node and a parallel node. The element node may be configured to match a single instance of a first element in the payload. The first element may be a first character or a first character class. The parallel node may be one of: (i) a variable-count node configured to match a variable number of consecutive instances of a second element in the payload, or (ii) a speculative node. The speculative node may be configured to match a variable number of consecutive instances of the second element in the payload based on transition arcs from or to a split node. The second element may be a second character or a second character class.
The variable count node may be a combination of the split node and the speculative node. The split node may be configured to advance the walk to the element node and the speculative node via epsilon transition arcs, independent of and without consuming (i.e., processing) a segment from the payload, and to walk the element node and the speculative node in parallel with the segment at the given offset. The speculative node may be configured to advance the walk back to the split node and consume the segment, by updating the given offset, based on a positive match of the segment with the second element at the speculative node.
The given finite automaton may be an NFA graph, which may include a transition arc from the variable count node to the element node. In the NFA graph, the variable count node may precede the element node.
The variable-count node may be a lazy-type node associated with metadata that either directly or indirectly identifies the element node to advance the walk to the element node based on a single matching instance of the variable number of consecutive instances of the second element in the payload.
The metadata associated with the variable count lazy node may include a count for tracking a total number of consecutive instances of the second element that match in the payload to allow comparison of the total number to the variable number.
The given finite automaton may include a split node of the plurality of nodes based on the parallel node being the speculative node. The element node and the speculative node may be identified based on metadata associated with the split node. The split node may be configured to advance the walk to the element node and the speculative node via epsilon transition arcs, independent of and without consuming a segment from the payload, and to walk the element node and the speculative node in parallel with the segment at the given offset based on a speculative processing indicator included in metadata associated with the split node.
The method may not walk the element node and the speculative node in parallel based on the speculative processing indicator not being included in the metadata associated with the split node.
Walking the speculative node with the segment at the given offset may be based on a negative match result for the segment at the element node at the given offset.
Walking the speculative node with the segment at the given offset may be further based on storing and retrieving unexplored context. The unexplored context may identify, either directly or indirectly, the speculative node and the given offset.
Storing the unexplored context may include storing the unexplored context in a stack entry and pushing the stack entry onto a stack. Retrieving the unexplored context may include popping the stack entry from the stack.
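A minimal software sketch of this store-and-retrieve mechanism is shown below; the names `UnexploredContext`, `node_id`, and `offset` are illustrative and not drawn from the embodiments themselves:

```python
from dataclasses import dataclass

@dataclass
class UnexploredContext:
    node_id: int   # directly identifies the speculative node to resume at
    offset: int    # the given offset within the payload at which to resume

stack = []  # run-time stack; each stack entry holds one unexplored context

# Storing: place the unexplored context in a stack entry and push it.
stack.append(UnexploredContext(node_id=7, offset=12))

# Retrieving: pop the stack entry from the stack (LIFO order, so the most
# recently pushed unexplored context is retrieved first).
ctx = stack.pop()
print(ctx.node_id, ctx.offset)  # -> 7 12
```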
The speculative node may be configured to advance the walk to the split node based on a positive match at the speculative node with the second element.
Based on the aggregation including a negative match result at each of the at least two nodes walked in parallel, the at least one follow-up action may include discontinuing the walk of a given path through each of the at least two nodes walked in parallel. The given path may partially match the at least one regular expression pattern in the given finite automaton. The method may walk a next node of the plurality of nodes with a next segment at a next given offset within the payload based on sensing unexplored context. The method may terminate the walk based on not sensing the unexplored context.
The unexplored context may identify, either directly or indirectly, the next node and the next given offset to advance the walk with the next segment at the next node along another path that partially matches the at least one regular expression pattern in the given finite automaton.
Sensing the unexplored context may include determining a non-empty state of a stack and popping a stack entry from the stack. The stack entry may include the unexplored context and may be an entry that was recently pushed onto the stack.
The at least two nodes may include one element node and one parallel node. Based on the aggregation including a positive match result at the element node for the segment and either the positive match result or a negative match result at the parallel node for the segment, the at least one follow-up action may include updating the given offset to produce a next offset, identifying a next node of the plurality of nodes based on metadata associated with the element node, walking the identified next node with a next segment at the next offset within the payload, determining a next match result at the identified next node for the next segment, and determining at least one next follow-up action for walking the given finite automaton based on the determined next match result.
Based on the positive match result for the segment at the parallel node, the at least one follow-up action may further include storing an unexplored context in a stack entry and pushing the stack entry onto a stack, the unexplored context either directly or indirectly identifying the parallel node and the given offset.
Based on the next match result being the negative match result, the at least one next follow-up action may include walking the parallel node with the segment at the given offset within the payload based on sensing an unexplored context, and terminating the walk based on not sensing the unexplored context.
Sensing the unexplored context may include determining a non-empty state of a stack and popping a stack entry from the stack. The stack entry may include the unexplored context and may be an entry that was recently pushed onto the stack.
Updating the given offset to generate the next offset may include increasing the given offset based on the direction of the walk being a forward direction and decreasing the given offset based on the direction of the walk being a reverse direction.
The at least two nodes walked in parallel may include one element node and one parallel node. Based on the aggregation comprising a negative match result at the element node for the segment and a positive match result at the parallel node for the segment, the at least one subsequent action may further comprise updating the given offset to produce a next offset and walking the element node and the parallel node in parallel with a next segment at the next offset.
Updating the given offset to produce the next offset may include increasing the given offset based on the direction of the walk being a forward direction and decreasing the given offset based on the direction of the walk being a reverse direction.
Walking the at least two nodes in parallel may optimize performance of the matching by obviating the storing and retrieving of context (which would be required if the at least two nodes were not walked in parallel) to advance the walk with the segment at the given offset from a first node of the at least two nodes to a second node of the at least two nodes based on a negative match result for the segment at the given offset.
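The follow-up-action rules above can be sketched, in greatly simplified form, as a software loop that tests one segment against both nodes per cycle. The sketch below is illustrative rather than the hardware implementation: it assumes a lazy pattern of the form `[ab]*?b` (a speculative/variable-count character class followed by an element character) and omits the unexplored-context stack and reverse walking:

```python
def speculative_walk(payload, var_class="ab", elem_ch="b"):
    """Walk an element node (elem_ch) and a speculative node (var_class)
    in parallel with the segment at each offset, aggregating the two
    match results to select the follow-up action."""
    offset = 0
    while offset < len(payload):
        segment = payload[offset]
        elem_match = segment == elem_ch     # element node, this cycle
        spec_match = segment in var_class   # speculative node, same cycle
        if elem_match:
            return offset + 1               # positive at element: advance past it
        if spec_match:
            offset += 1                     # positive at speculative only:
            continue                        # consume and re-walk both nodes
        return None                         # negative at both: the path dies
    return None                             # payload exhausted without a match

print(speculative_walk("aab"))   # -> 3 (matched after consuming "aab")
print(speculative_walk("aac"))   # -> None
```

A single read of each segment thus serves both nodes, which is the saving described above: no context needs to be stored and later retrieved merely to re-test the same segment at the second node.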
Another example embodiment disclosed herein includes an apparatus corresponding to the operations consistent with the method embodiments disclosed herein.
Additionally, yet another example embodiment may include a non-transitory computer readable medium having stored thereon a sequence of instructions which, when loaded and executed by a processor, causes the processor to perform the method disclosed herein.
Drawings
The foregoing will be apparent from the following more particular description of exemplary embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments of the invention.
FIG. 1 is a block diagram of an embodiment of a security device in which embodiments disclosed herein may be implemented.
Figs. 2A-G are diagrams illustrating NFA and DFA graphs and the concept of graph explosion.
FIG. 3 is another block diagram of an embodiment of a security device in which embodiments disclosed herein may be implemented.
FIG. 4 is a block diagram of an example embodiment of an environment for a hyper-non-deterministic automata (HNA) coprocessor.
FIG. 5A is a block diagram of an example embodiment of a non-deterministic finite automata (NFA) graph that may be used by a walker to match regular expression patterns in an input stream.
Fig. 5B is a table of an example embodiment of processing cycles for walking the NFA graph of fig. 5A with a payload in a non-speculative manner.
Fig. 5C is a block diagram of an example embodiment of a table of speculative processing rules.
Fig. 5D is a table of an example embodiment of a processing cycle for traversing the NFA graph of fig. 5A with the payload in a speculative manner.
FIG. 6A is a block diagram of another example embodiment of an NFA graph that may be used by a walker to match regular expression patterns in an input stream.
Fig. 6B is a table of an example embodiment of a processing cycle for traversing the NFA graph of fig. 6A with the payload in a non-speculative manner.
Fig. 6C is a table of another example embodiment of a processing cycle for traversing the NFA graph of fig. 6A with the payload.
Fig. 6D is a block diagram of another payload that may be traversed with the NFA graph of fig. 6A.
Fig. 6E is a table of an example embodiment of a processing cycle for traversing the NFA graph of fig. 6A with the payload of fig. 6D in a non-speculative manner.
Fig. 6F is a table of another example embodiment of a processing cycle for traversing the NFA graph of fig. 6A with the payload of fig. 6D in a speculative manner.
Fig. 7 is a flow diagram of an example embodiment of a method that may be implemented in at least one processor operatively coupled with at least one memory in a security device operatively coupled to a network.
FIG. 8 is a block diagram of an example internal structure of a computer, optionally within embodiments disclosed herein.
Detailed Description
Before describing exemplary embodiments of the present invention in detail, an exemplary security application in which the embodiments may be implemented, along with typical processing using Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA), is described immediately below to help the reader understand the inventive features disclosed herein.
FIG. 1 is a block diagram of an embodiment of a security device 102 in which embodiments disclosed herein may be implemented. The security device 102 may include a network services processor 100. The security device 102 may be a separate system that may switch packets received at one network interface 103a to another network interface 103b and may perform security functions on the received data packets before forwarding the packets. For example, the security device 102 may be configured to perform security processing on packets 101a, which may be received over a Wide Area Network (WAN)105a or any other suitable network, and then forward the processed packets 101b to a Local Area Network (LAN)105b or any other suitable network.
Network services processor 100 may be configured to process an Open Systems Interconnection (OSI) network layer L2-L7 protocol encapsulated in received packets. As is well known to those skilled in the art, the OSI reference model defines seven network protocol layers (L1-L7). The physical layer (L1) represents the actual interface that connects the device to the transmission medium, including electrical and physical interfaces. The data link layer (L2) performs data framing. The network layer (L3) formats the data into packets. The transport layer (L4) handles end-to-end transport. The session layer (L5) manages communication between devices, e.g., whether the communication is half-duplex or full-duplex. The presentation layer (L6) manages data formatting and presentation, such as syntax, control codes, special graphics and character sets. The application layer (L7) allows communication between multiple users, such as file transfers and e-mail.
The network services processor 100 may schedule and queue work (e.g., packet processing operations) for higher layer network protocols (e.g., L4-L7) and allow processing of the higher layer network protocols among received packets to be executed to forward the packets at wire speed. By processing these protocols to forward these packets at wire speed, the network services processor 100 does not reduce the network data transfer rate. Network services processor 100 may receive packets from network interfaces 103a or 103b, which may be physical hardware interfaces and may perform L2-L7 network protocol processing on the received packets. Network services processor 100 may subsequently forward processed data packet 101b through network interface 103a or 103b to another hop in the network, to a final destination, or through another bus (not shown) for further processing by a host processor (not shown). Network protocol processing may include processing of network security protocols such as firewalls, application firewalls, Virtual Private Networks (VPNs) including IP layer security protocol (IPSec) and/or Secure Sockets Layer (SSL), Intrusion Detection Systems (IDS), and anti-viruses (AV).
Network services processor 100 may use multiple processors (i.e., cores) to provide high application performance. Each core (not shown) may be dedicated to performing operations of a data plane or a control plane. The data plane operations may include packet operations for forwarding packets. The control plane operations may include processing complex higher layer protocol portions such as IP layer security protocol (IPSec), Transmission Control Protocol (TCP), and Secure Sockets Layer (SSL). The data plane operations may include processing other portions of these complex higher layer protocols.
Network services processor 100 may also include an application specific coprocessor that may offload cores, enabling network services processor 100 to achieve high throughput. For example, the network services processor 100 may include an acceleration unit 106 that may include a hyper-non-deterministic automata (HNA) coprocessor 108 for hardware accelerated NFA processing and a hyper-finite automata (HFA) coprocessor 110 for hardware accelerated DFA processing. The HNA 108 and HFA 110 coprocessors may be configured to offload a general purpose core (not shown) of the network services processor 100 from performing the computational and memory intensive pattern matching methods.
The network services processor 100 may perform pattern searching, regular expression processing, content verification, transformation, and security accelerated packet processing. Regular expression processing and pattern searching may be used to perform string matching for AV and IDS applications, as well as other applications requiring string matching. A memory controller (not shown) in network services processor 100 may control access to memory 104, which is operatively coupled to network services processor 100. The memory may be internal (i.e., on-chip) or external (i.e., off-chip), or a combination thereof, and may be configured to store received data packets, such as data packet 101a, for processing by network services processor 100. The memory may be configured to store compiled rule data used for lookup and pattern matching in DFA and NFA graph expression searches. The compiled rule data may be stored as a single binary image 112 that includes compiled rule data for both the DFA and the NFA, or as multiple binary images that separate the DFA compiled rule data from the NFA compiled rule data.
Typical content aware application processing may use either a DFA or an NFA to recognize patterns in the content of received data packets. Both DFA and NFA are finite state machines, i.e., computational models each including a set of states, a start state, an input alphabet (the set of all possible symbols), and a transition function. Computation begins in the start state and changes to new states depending on the transition function.
The pattern is typically expressed using a regular expression that includes basic elements, e.g., normal text characters such as A-Z and 0-9, and meta-characters such as *, ^ and |. The basic elements of a regular expression are the symbols (single characters) to be matched. Basic elements may be combined with meta-characters that allow concatenation (+), alternation (|), and Kleene star (*). The meta-character for concatenation may be used to create multiple-character matching patterns from single characters (or sub-strings), while the meta-character for alternation (|) may be used to create a regular expression that can match any of two or more sub-strings. The meta-character Kleene star (*) allows a pattern to match any number of times, including no occurrences of the preceding character or string of characters.
Combining different operators and single characters allows complex sub-patterns of expressions to be constructed. For example, a sub-pattern such as (th(is|at)*) may match multiple character strings, such as: th, this, that, thisis, thisat, thatis, or thatat. Another example of a complex sub-pattern of an expression may be one that incorporates a character class construct that allows listing a list of characters for which to search. For example, gr[ea]y looks for both grey and gray. Other examples of complex sub-patterns are those that use a dash to indicate a range of characters, e.g., [A-Z], or a meta-character "." that matches any one character. An element of the pattern may be a basic element or a combination of one or more basic elements with one or more meta-characters.
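The sub-patterns above can again be illustrated with Python's `re` module (the example strings are hypothetical and used only to demonstrate the constructs):

```python
import re

# (th(is|at)*): "th" followed by any number of "is"/"at" repetitions.
for s in ("th", "this", "that", "thisis", "thisat", "thatis", "thatat"):
    assert re.fullmatch(r"th(is|at)*", s) is not None

# Character class: gr[ea]y finds both spellings.
assert re.fullmatch(r"gr[ea]y", "grey") is not None
assert re.fullmatch(r"gr[ea]y", "gray") is not None

# A range [A-Z] and the "." meta-character that matches any one character.
assert re.fullmatch(r"[A-Z].", "Q7") is not None
print("all sub-pattern examples matched")
```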
The input to the DFA or NFA state machine is typically a segment, such as a string of (8-bit) bytes, i.e., the alphabet may be a single byte (one character or symbol) from the input stream (i.e., a received packet). Each segment (e.g., byte) in the input stream may cause a transition from one state to another. The states and transition functions of the DFA or NFA state machine may be represented by a graph. Each node in the graph may represent a state, and arcs in the graph (also referred to herein as transition arcs) may represent state transitions. The current state of the state machine may be represented by a node identifier that selects a particular node of the graph.
Processing a regular expression using a DFA, to find one or more patterns described by the regular expression in an input stream of characters, may be characterized as having deterministic run-time performance. A next state of the DFA may be determined from an input character (or symbol) and the current state of the DFA, because there is only one state transition per DFA state. As such, run-time performance of the DFA is said to be deterministic, and its behavior can be completely predicted from the input. However, the tradeoff for determinism is a graph in which the number of nodes (or graph size) may grow exponentially with the size of the pattern.
In contrast, the number of nodes (or graph size) of an NFA graph may be characterized as growing linearly with the size of the pattern. However, processing a regular expression using an NFA, to find one or more patterns described by the regular expression in an input stream of characters, may be characterized as having non-deterministic run-time performance. For example, given an input character (or symbol) and a current state of the NFA, there may be transitions of the NFA to more than one next state. As such, a next state of the NFA cannot be uniquely determined from the input and the current state of the NFA. Thus, run-time performance of the NFA is said to be non-deterministic, as its behavior cannot be completely predicted from the input.
FIGS. 2A-G illustrate the concept of graph explosion. FIGS. 2A, 2B, and 2C show NFA graphs for the patterns ".*a[^\n]", ".*a[^\n][^\n]", and ".*a[^\n][^\n][^\n]", respectively, and FIGS. 2D, 2E, and 2F show DFA graphs for the same patterns, respectively. As shown in FIGS. 2A-2F, and summarized by the table of FIG. 2G, an NFA graph may grow linearly for certain patterns while a DFA graph may grow exponentially for the same patterns, resulting in graph explosion. As shown, for a given pattern or patterns, the number of DFA states may be larger than the number of NFA states, typically on the order of several hundred or thousand more states. This is an example of "graph explosion," which is a hallmark characteristic of DFAs.
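The explosion can be reproduced with a short subset-construction experiment. The sketch below is purely illustrative, not the patented compiler: it builds the (k+1)-state NFA for a pattern of the form ".*a.{k-1}" over a two-symbol alphabet and counts the DFA states reachable by subset construction, which grows as 2^k:

```python
def nfa_step(states, ch, k):
    """Advance a set of NFA states for '.*a.{k-1}' on one input symbol.
    State 0 self-loops on any symbol and also moves to state 1 on 'a';
    states 1..k-1 advance on any symbol; state k is accepting."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)
            if ch == "a":
                nxt.add(1)
        elif s < k:
            nxt.add(s + 1)
    return frozenset(nxt)

def dfa_state_count(k, alphabet="ab"):
    """Count DFA states reachable from {0} via subset construction."""
    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        cur = frontier.pop()
        for ch in alphabet:
            nxt = nfa_step(cur, ch, k)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

for k in (2, 3, 4, 5):
    print(f"k={k}: NFA states={k + 1}, DFA states={dfa_state_count(k)}")
```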
According to embodiments disclosed herein, content searches may be conducted using DFAs, NFAs, or a combination thereof. According to one embodiment, a runtime processor, coprocessor, or combination thereof may be implemented in hardware and may be configured to implement a compiler or walker.
The compiler may compile a pattern or a list of pattern inputs (also referred to as signatures or rules) into the DFA, NFA, or a combination thereof. The DFA and NFA may be binary data structures such as DFA and NFA graphs and tables.
The walker may perform run-time processing, e.g., actions to identify the existence of a pattern in an input stream, or to match the pattern to content in the input stream. The content may be a payload portion of an Internet Protocol (IP) datagram, or any other suitable payload in the input stream. Run-time processing of a DFA or NFA graph may be referred to herein as walking or traversing the DFA or NFA graph, with the payload, to determine a pattern match. A processor configured to generate a DFA, an NFA, or a combination thereof, may be referred to herein as a compiler. A processor configured to implement run-time processing of a payload using the generated DFA, NFA, or combination thereof, may be referred to herein as a walker. According to embodiments disclosed herein, the network services processor 100 may be configured to implement a compiler and a walker in the security device 102.
Fig. 3 is a block diagram of another embodiment of the security device 102 of fig. 1 in which embodiments disclosed herein may be implemented. As described with reference to fig. 1, the security device 102 may be operatively coupled to one or more networks and may include a memory 104 and a network services processor 100, which may include an acceleration unit 106. Referring to FIG. 3, the network services processor 100 may be configured to implement a compiler 306 that generates the binary image 112 and a walker 320 that uses the binary image 112. For example, compiler 306 may generate binary image 112 including compilation rule data used by walker 320 to perform a pattern matching method on received data packet 101a (shown in fig. 1). The compiler 306 may generate the binary image 112 by determining compiled rule data for the DFA, the NFA, or a combination thereof based on determining the rule data that advantageously fits the DFA and the NFA.
According to embodiments described herein, the compiler 306 may generate the binary image 112 by processing a rule set 310, which may include a set of one or more regular expression patterns 304 and optional qualifiers 308. From rule set 310, compiler 306 may generate a unified DFA 312 and at least one NFA 314 for at least one pattern in the set of one or more regular expression patterns 304 using sub-patterns selected from all patterns in the one or more regular expression patterns for use by walker 320 during runtime processing, and metadata (not shown) including mapping information for transitioning walker 320 between a state of unified DFA 312 and a state of the at least one NFA 314. According to embodiments disclosed herein, each NFA generated may be for a particular pattern in the group, and a unified DFA may be generated based on all sub-patterns from all patterns in the group.
The unified DFA 312 and the at least one NFA 314 may be represented as graphs, or in any other suitable form of data structure, and the mapping in the metadata may be represented as one or more tables, or in any other suitable form of data structure. According to embodiments disclosed herein, if a sub-pattern selected from a given pattern is the entire given pattern, then no NFA is generated for the given pattern.
The walker 320 may be configured to walk the unified DFA 312 and the at least one NFA 314 with the payload by transitioning between states of the unified DFA 312 and the at least one NFA 314 based on consuming segments of the payload from the received data packet 101a. Consuming may include updating a current offset from a current segment to another segment within the payload. Updating the current offset may be based on a walk direction; for example, the walker 320 may walk the unified DFA 312 or the at least one NFA 314 in a forward or reverse direction, increasing the current offset based on the forward walk direction and decreasing the current offset based on the reverse walk direction. The walker 320 thereby walks segments of the payload through the unified DFA 312 and the at least one NFA 314.
Rule set 310 may include a set of one or more regular expression patterns 304 and may be in the form of a Perl-compatible regular expression (PCRE) script file or any other suitable form now known or later developed. The PCRE has become a de facto standard for regular expression syntax in security and networking applications. As more applications have emerged that require deep packet inspection or more threats have become prevalent on the internet, the corresponding signatures/patterns used to identify viruses/attacks or applications have also become more complex. For example, signature databases have evolved from having simple string patterns to regular expression (regex) patterns with wildcards, ranges, character classes, and advanced PCRE signatures.
As shown in FIG. 3, the optional qualifiers 308 may each be associated with one pattern in the set of regular expression patterns 304. For example, an optional qualifier 322 may be associated with the pattern 316. The optional qualifiers 308 may each be one or more qualifiers that specify a desired customization, advanced PCRE signature options, or other options appropriate for processing the patterns associated with those qualifiers. The compiler 306 may generate the unified DFA 312 using sub-patterns 302 selected from all of the set of one or more regular expression patterns 304, selecting a sub-pattern 302 from each pattern in the set. The compiler 306 may also generate at least one NFA 314 for at least one pattern 316 in the set, where a portion (not shown) of the at least one pattern 316 is used to generate the at least one NFA 314. At least one walk direction for run-time processing (i.e., walking) of the at least one NFA 314 may be determined based on whether a length of the selected sub-pattern 318 is fixed or variable and on a position of the selected sub-pattern 318 within the at least one pattern 316. The compiler 306 may store the unified DFA 312 and the at least one NFA 314 in the at least one memory 104.
A sub-pattern is a set of one or more consecutive elements from a pattern, where each element from the pattern may be represented by a node in a DFA or NFA graph for the purpose of matching segments from the payload. As described above, an element may be a single text character represented by a node or a character class represented by a node. Based on whether a sub-pattern is likely to result in excessive DFA growth, the compiler 306 may determine which sub-pattern of a pattern better fits the NFA, as described above with reference to figs. 2A-G. For example, generating a DFA from a sub-pattern that includes consecutive text characters will not result in a DFA explosion, whereas a complex sub-pattern as described above may include operators as well as characters and may thus result in a DFA explosion. For example, sub-patterns that include wildcards or large character classes repeated multiple times (e.g., [^\n]* or [^\n]{16}) may result in too many states in the DFA and thus may be more advantageously suited to the NFA.
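The sub-pattern classification above can be sketched as follows. This is a minimal illustrative heuristic, not the compiler 306's actual logic; the function name and the classification rules are assumptions:

```python
# Hypothetical heuristic: literal text sub-patterns are DFA-friendly,
# while repeated wildcards or character classes (e.g., [^\n]* or
# [^\n]{16}) risk DFA state explosion and better fit the NFA.
import re

def dfa_friendly(sub_pattern: str) -> bool:
    """Return True if the sub-pattern is unlikely to cause DFA explosion."""
    # A run of plain text characters compiles to a linear chain of DFA nodes.
    if re.fullmatch(r"[A-Za-z0-9]+", sub_pattern):
        return True
    # A wildcard or character class followed by a repetition operator
    # multiplies DFA states and is better handled by the NFA.
    if re.search(r"(\.|\[\^?[^\]]*\])(\*|\+|\{\d+(,\d*)?\})", sub_pattern):
        return False
    return True

print(dfa_friendly("habc"))        # True:  literal text -> DFA
print(dfa_friendly("[^\\n]*"))     # False: repeated class -> NFA
print(dfa_friendly("[^\\n]{16}"))  # False: counted class -> NFA
```

The heuristic only inspects the sub-pattern's syntax; a real compiler would estimate the actual state count of the compiled automaton.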
A match for the entire pattern may be determined by employing match results from the unified DFA, the at least one NFA, or a combination thereof. According to embodiments disclosed herein, if a payload in the received data packet 101 includes content matching the selected sub-pattern 318 of pattern 316, the walker may transition to walking the at least one NFA generated for pattern 316. The walker 320 may report a match of the selected sub-pattern 318 together with an offset that identifies the position of the last character of the matching sub-pattern in the received data packet as the ending offset of the sub-pattern in the payload.
The sub-pattern match may be a partial match for the pattern if the sub-pattern is a subset of the pattern. The walker 320 may then continue searching the payload for the remainder of the pattern by walking the at least one NFA of the pattern to determine a final match for the pattern. It should be appreciated that the pattern may traverse one or more payloads in the received packets 101a.
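The DFA-to-NFA hand-off described above can be illustrated with a small sketch in which Python's `re` module stands in for both automata. The function name and the pattern split are hypothetical, not the walker 320's actual interface:

```python
# Illustration only: re.search plays the unified-DFA stage (sub-pattern
# search), and re.match plays the NFA stage (remainder of the pattern),
# resumed at the offset just past the sub-pattern's ending offset.
import re

def dfa_then_nfa(payload, sub_pattern, remainder):
    m = re.search(sub_pattern, payload)   # "DFA" stage: find the sub-pattern
    if not m:
        return None
    end_offset = m.end() - 1              # offset of last matched character
    # "NFA" stage: continue the rest of the pattern from end_offset + 1
    tail = re.match(remainder, payload[end_offset + 1:])
    return (end_offset, tail is not None and tail.group())

print(dfa_then_nfa("xxhxab", "h", "[^\n]*ab"))  # -> (2, 'xab')
```

The reported ending offset (2 here) is what lets the NFA stage resume at the correct position in the payload.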
FIG. 4 is a block diagram 450 of an example embodiment of an environment of the HNA coprocessor 108 of FIG. 1. According to embodiments disclosed herein, the HFA 110 may be configured to implement the functionality of the walker 320 with respect to DFA processing, and the HNA 108 may be configured to implement the functionality of the walker 320 with respect to NFA processing.
According to embodiments disclosed herein, the HNA 108 may be configured to read at least one instruction 453 from an instruction queue 454. The instruction queue 454 may be configured to store the at least one instruction 453, which may be sent by a host (not shown) for processing by the HNA 108. The at least one instruction 453 may include at least one job, such as S1 459a, S2 459b, or S3 459c. Each of the at least one job may be determined based on partial match results identified by the HFA coprocessor 110 of FIG. 1 for a given sub-pattern of the sub-patterns 302 of FIG. 3 being matched in the input stream.
A given job of the at least one job may indicate a given NFA of the at least one NFA 314, at least one given node of the given NFA, at least one given offset in a given payload, and at least one walk direction, each of the at least one walk direction corresponding to one of the at least one given node. Each of the at least one job may include results of the processing by the HFA, enabling the HNA to advance a match of a given sub-pattern, in the given NFA, for a given pattern of the one or more regular expression patterns 304. Each job thus represents a partial match result determined by the HFA coprocessor 110, from which matching of the given pattern may be advanced by the HNA coprocessor 108.
The HNA 108 may process the at least one instruction 453 by reading at least one pointer (not shown) or other suitable instruction information stored therein. The at least one pointer may include an input buffer pointer (not shown) to an input buffer 458. The at least one instruction 453 may also include a payload pointer (not shown) to a payload 462, a result buffer pointer (not shown) to a match result buffer 466, a save buffer pointer (not shown) to a save buffer 464, and a run stack pointer (not shown) to a run stack 460.
The input buffer 458, the run stack 460, and the save buffer 464 may be referred to herein as an input stack, a run stack, and a save stack, respectively, although each of them may or may not exhibit the last-in-first-out (LIFO) characteristics of a stack. The input buffer 458, the run stack 460, and the save buffer 464 may be located within the same or different physical buffers. If located within the same physical buffer, the entries of the input buffer 458, the run stack 460, and the save buffer 464 may be differentiated based on field settings of the entries, or in any other suitable manner. The input buffer 458 and the run stack 460 may be located in the same physical buffer (which may be on-chip), while the save buffer 464 may be located in another physical buffer (which may be off-chip).
The at least one job of the at least one instruction 453, such as S1 459a, S2 459b, or S3 459c, may be stored in the input stack 458 for processing by the HNA 108. The at least one job of the at least one instruction may each belong to a same given payload, such as the payload 462, processed by the HFA 110.
The HNA 108 may be configured to load (i.e., fetch or retrieve) at least one job, such as job S1 459a, S2 459b, or S3 459c, from the input buffer 458 based on the input buffer pointer. The HNA 108 may push (i.e., store) the at least one job onto the run stack 460. The HNA 108 may pop (i.e., read, fetch, load, etc.) a given job, such as entry S1 459a, S2 459b, or S3 459c, from the run stack and process the given job. Each of the at least one job (e.g., S1 459a, S2 459b, or S3 459c) may include a payload offset (not shown) to a segment (not shown) of the payload 462 and a pointer to a graph 457, which may be a given finite automaton of at least one finite automaton, such as the at least one NFA 314 of FIG. 3.
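The job flow just described might be sketched as follows. The names, list-based stack, and return convention are illustrative assumptions, not the HNA's actual instruction format:

```python
# Toy sketch of the HNA job flow: load jobs from the input buffer,
# push them onto the run stack, then pop and process each job until
# the run stack is empty.

def process_instruction(input_buffer, walk_job):
    run_stack = []
    # Push jobs so that the first job (e.g., S1) is popped first.
    run_stack.extend(reversed(input_buffer))
    results = []
    while run_stack:
        job = run_stack.pop()              # pop a job (or a stored thread)
        # walk_job walks the job's graph and may push unexplored
        # context back onto run_stack; here it just reports completion.
        results.extend(walk_job(job, run_stack))
    return results

jobs = ["S1", "S2", "S3"]
done = process_instruction(jobs, lambda job, rs: [job + ":done"])
print(done)   # -> ['S1:done', 'S2:done', 'S3:done']
```

Jobs are processed sequentially, consistent with the description that the HNA processes the jobs of an instruction in order.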
The HNA 108 may load (i.e., fetch) the graph 457 from a graph memory 456, which may be included in the binary image 112 of FIGS. 1 and 3, and begin processing the graph 457 with the payload segment corresponding to the given payload offset of the payload 462. The HNA 108 may process the graph 457 by walking nodes of the graph 457 with payload segments. A partially matching path of the graph 457 may include at least two nodes of the graph 457 that match consecutive segments of the payload to a given pattern used to generate the graph 457. A partially matching path may be referred to herein as a thread or an active thread.
The HNA 108 may process the graph 457 with payload segments from the payload 462, pushing/popping entries to/from the run stack 460 to save and restore its position in the graph 457. For example, the HNA 108 may need to save its position in the graph if a walked node presents multiple options for the next node to walk. For example, the HNA 108 may walk a node presenting multiple processing path options, such as a fork represented in the graph. According to embodiments disclosed herein, a node of a DFA or NFA is associated with a node type. A node associated with a split or variable count node type may present multiple processing path options. Split and variable count node types are further disclosed below with reference to FIGS. 5A and 6A.
According to embodiments disclosed herein, the HNA 108 may be configured to select a given path of the multiple processing paths and push an entry onto the run stack 460, which may allow the HNA 108 to return and proceed along an unselected path of the multiple processing paths based on determining a mismatch (i.e., a negative match result) at a walked node along the selected path. Thus, an entry pushed onto the run stack 460 may save a location in the graph 457 that represents unexplored context. The unexplored context may indicate a given node of the graph 457 and a corresponding payload offset, to enable the HNA 108 to return to the given node and walk the given node with a given segment of the payload 462, as the given segment may be located at the corresponding payload offset in the payload 462. The run stack 460 thereby allows the HNA 108 to remember and walk an unexplored path of the graph 457 at a later time. Pushing or storing an entry indicating a given node and a corresponding offset in a given payload may be referred to herein as storing unexplored context, a thread, or an inactive thread. Popping, fetching, or loading an entry indicating the given node and the corresponding offset in the given payload, in order to walk the given node with a segment located at the corresponding offset, may be referred to herein as activating a thread. Discarding an entry indicating the given node and corresponding offset in the given payload may be referred to herein as clearing an entry or terminating a thread.
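A minimal sketch of storing and activating unexplored context on a run stack follows; the entry layout and function names are hypothetical, not the HNA's actual format:

```python
# Each run stack entry is a (node_id, payload_offset) pair: enough to
# return to the unselected path and walk it with the right segment.

run_stack = []  # stored threads (unexplored context)

def walk_split(selected_node, unselected_node, offset):
    # Save the path not taken so it can be walked later ("stored thread").
    run_stack.append((unselected_node, offset))   # push unexplored context
    return selected_node                          # continue on selected path

def on_mismatch():
    # Mismatch on the active thread: activate the most recent stored thread.
    if run_stack:
        return run_stack.pop()                    # pop -> (node, offset)
    return None                                   # no stored threads left

current = walk_split("N2", "N3", 1)   # a split node offers N2 and N3
print(current)                        # -> N2 (the selected path)
print(on_mismatch())                  # -> ('N3', 1): resume at N3, offset 1
```

The LIFO pop order means the most recently stored thread is tried first, which matches the backtracking behavior described above.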
In the event that the end of the payload 462 is reached while walking segments of the payload 462 through the graph 457, the run stack 460 may allow the HNA 108 to save its position in the graph 457. For example, the HNA 108 may determine that the payload 462, or a portion of it, partially matches a given pattern and that the current payload offset is an end offset of the payload 462. The HNA 108 may thus determine that only a partial match of the given pattern has been found while the entire payload 462 has been consumed. The HNA 108 may then save the contents of the run stack 460 to the save buffer 464 in order to continue the walk with a next payload corresponding to the same flow as the consumed payload 462. The save buffer 464 may be configured to store at least one run stack entry of the run stack 460, mirroring the running state of the run stack 460, in the event the payload 462 is consumed.
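Mirroring the run stack into the save buffer and restoring it for the next payload of the same flow might look like the following sketch (structure and function names are assumptions):

```python
# Toy save/restore across payload boundaries: the run stack is drained
# into the save buffer when the payload is consumed mid-match, and
# restored before the next payload of the same flow is walked.

def on_payload_consumed(run_stack, save_buffer):
    # Mirror every pending entry so the running state survives the packet.
    save_buffer.extend(run_stack)
    run_stack.clear()

def on_next_payload(run_stack, save_buffer):
    # Restore the mirrored state before walking the next payload.
    run_stack.extend(save_buffer)
    save_buffer.clear()

run_stack, save_buffer = [("N2", 4)], []
on_payload_consumed(run_stack, save_buffer)
print(save_buffer)   # -> [('N2', 4)]: state saved
on_next_payload(run_stack, save_buffer)
print(run_stack)     # -> [('N2', 4)]: walk resumes at node N2
```

In hardware the save buffer may be off-chip, as noted above, so this save/restore trades memory traffic for the ability to match patterns that span packets.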
Based on finding a final (i.e., full or complete) match for the pattern, the HNA 108 may pop and discard the entries in the run stack 460 associated with the current job, e.g., the job loaded from the input buffer, such as S1 459a, and save the match result (not shown) to the match result buffer 466. Alternatively, the HNA 108 may continue to process the entries of the run stack 460 relevant to the current job, as all possible matching paths may be of interest.
The match result may include a node address associated with the node at which the final match of the pattern was determined. A node at which the final match of a pattern is determined may be referred to herein as a marker node. The node address or another identifier of the final match location in the graph 457, an identifier of the matching pattern, a length of the matching pattern, any other suitable match result, or a combination thereof, may be included in the match results.
Based on processing all run stack entries associated with the current job, the HNA 108 may load from the run stack the next job that was previously loaded from the input buffer 458 (e.g., S2 459b), because the HNA 108 may be configured to process the jobs of the instruction 453 sequentially. The HNA 108 may then fetch a next graph (not shown) from the graph memory 456, walk the next graph with one or more payload segments from the payload 462, and continue processing additional jobs until the run stack 460 is empty.
Based on finding a mismatch in the payload 462 while walking the graph 457 with the payload 462, the HNA 108 may pop an entry associated with the current job (e.g., S1 459a) from the run stack 460 and walk a next node with a next segment of the payload 462 based on the content of the popped entry. If the run stack 460 does not include an entry associated with the current job, the HNA 108 may end the current job and may load from the run stack 460 the next job that was previously loaded from the input buffer 458 (e.g., S2 459b). The HNA 108 may then be configured to walk a next graph based on the loaded next job, and continue processing additional jobs until the run stack 460 is empty.
According to embodiments disclosed herein, the walker 320 functionality of the HNA 108 may include optimizing the matching of at least one regular expression pattern in an input stream by walking a given NFA in a speculative manner. The speculative manner may include walking at least two nodes of the given NFA in parallel with a segment at a given offset within a payload of a packet in the input stream. The walking may include determining a match result for the segment, at the given offset within the payload, at each of the at least two nodes. The walking may further include determining at least one subsequent action for walking the given finite automaton based on an aggregation of each match result determined. Such optimized matching of the at least one regular expression pattern in the input stream by walking the given NFA in a speculative manner is further disclosed below.
FIG. 5A is a block diagram 500 of an example embodiment of an NFA graph 504 that may be used by the walker 320 to match a regular expression pattern 502 in an input stream (not shown). As described above, the HNA 108 may be configured to implement the functionality of the walker 320 with respect to NFA processing.
In the example embodiment, the input stream may include a packet (not shown) with a payload 542. The regular expression pattern 502 is the pattern "h[^\n]*ab", which specifies that the character "h" is followed by an unlimited number of consecutive characters that do not match the newline character (i.e., [^\n]*). The unlimited number may be zero or more. The pattern 502 further includes the characters "a" and "b" consecutively following the unlimited number of characters that do not match the newline character. In the example embodiment, the payload 542 includes segments 522a-d (i.e., h, x, a, and b) at offsets 520a-d (i.e., 0, 1, 2, and 3), respectively, in the payload 542.
It should be appreciated that the regular expression pattern 502, the NFA graph 504, the payload 542, the segments 522a-d, and the offsets 520a-d represent examples for illustrative purposes, and that the systems, methods, and corresponding apparatus disclosed herein may apply to any suitable regular expression pattern, NFA graph, payload, segment, and offset. Further, it should be understood that the NFA graph 504 may be a sub-section of a larger NFA graph (not shown). Additionally, the payload 542 may be a portion of a larger payload (not shown), and the portion may be at the beginning, end, or any other location of the larger payload, resulting in offsets different from those of the example embodiment.
In the example embodiment, the NFA graph 504 is configured to match the regular expression pattern 502 against the input stream. For example, the NFA graph 504 may be a graph including a plurality of nodes generated by the compiler 306, such as nodes N0 506, N1 508, N2 510, N3 512, N4 514, and N5 515. Node N0 506 may represent a starting node of the pattern 502, and node N5 515 may represent a marker node of the pattern 502. The marker node N5 515 may be associated with an indicator reflecting a final (i.e., full or complete) match of the pattern 502 in the input stream. Thus, the walker 320 may determine that the pattern 502 matches the input stream based on traversing the marker node N5 515.
According to embodiments disclosed herein, the walker 320 may walk the segments 522a-d of the payload 542 through the NFA graph 504 one segment at a time in order to match the regular expression pattern 502 in the input stream. A given segment of the segments 516 used to walk a given node may be determined based on its corresponding offset of the offsets 518 being the current offset within the payload 542. According to embodiments disclosed herein, the walker 320 may update the current offset by increasing or decreasing it. For example, the walker 320 may walk the NFA graph 504 in a forward or reverse direction, and may thus walk segments from the payload 542 in a forward 543 or reverse 546 direction by increasing or decreasing the current offset, respectively.
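The current-offset bookkeeping above can be sketched as a toy helper (the direction names and one-segment step are illustrative assumptions; the walker's real segment addressing is not shown):

```python
# One segment is consumed per step; the offset moves according to the
# walk direction: forward increases it, reverse decreases it.

def next_offset(current, direction):
    return current + 1 if direction == "forward" else current - 1

payload = "hxab"
off = 0
for _ in range(3):
    off = next_offset(off, "forward")
print(off, payload[off])   # -> 3 b
```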
Nodes N0 506, N2 510, N3 512, and N4 514 may be configured to match a corresponding element against a given segment of the payload 542, while nodes N1 508 and N5 515 may be nodes of node types that indicate no matching function and therefore do not consume a segment from the payload 542. In the example embodiment, node N1 508 is a split node that presents multiple transition path options to the walker 320. For example, walking the split node N1 508 presents epsilon paths 530a and 530b. According to embodiments disclosed herein, the walker 320 may select a given path of the multiple paths 530a and 530b based on an implicit setting mutually agreed upon with the compiler 306. For example, the compiler 306 may generate the NFA graph 504 based on an implicit understanding that the walker 320 follows a deterministic path, e.g., an implicit understanding that the walker 320 selects the upper epsilon path 530a when walking the split node 508. According to embodiments disclosed herein, the upper epsilon path 530a may be selected because the upper epsilon path 530a represents a lazy path. The lazy path may be the shortest possible path for a match of the represented elements.
According to embodiments disclosed herein, the split node 508 may be associated with split node metadata (not shown) to present the multiple path options. For example, in the example embodiment, the split node metadata may indicate, either directly or indirectly, a plurality of next nodes, such as nodes N2 510 and N3 512. If the plurality of next nodes are directly indicated, the metadata may include absolute addresses or pointers to the next nodes N2 510 and N3 512. If the plurality of next nodes are indirectly indicated, the metadata may include an index or offset that may be used to resolve the absolute addresses or pointers to the next nodes N2 510 and N3 512. Alternatively, any other suitable form for directly or indirectly indicating the plurality of next nodes may be used.
The implicit understanding may include configuring the walker 320 to select a given next node of the plurality of next nodes based on node metadata included at a particular entry location within the split node metadata. The compiler 306 may be configured to generate the split node metadata with an indication of the given next node at that particular entry location. The compiler 306 may then generate the NFA graph 504 with the implicit understanding that a given path, such as the upper epsilon path 530a, will be selected by the walker 320 at the split node N1 508.
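The entry-location convention might be sketched as follows; the index-0 convention, node names, and addresses are all illustrative assumptions rather than the compiler 306's actual metadata layout:

```python
# Hypothetical split-node metadata: by convention agreed between the
# compiler and the walker, entry 0 is the path the walker selects
# (the lazy/upper epsilon path); entry 1 becomes unexplored context.

node_table = {"N2": 0x40, "N3": 0x80}   # hypothetical node addresses

split_metadata = {
    "next_nodes": ["N2", "N3"],         # indirect indication by name
}

def resolve(name):
    # Resolve an indirect indication to an absolute node address.
    return node_table[name]

selected = split_metadata["next_nodes"][0]   # walker's implicit choice
stored = split_metadata["next_nodes"][1]     # stored as unexplored context
print(selected, hex(resolve(selected)))      # -> N2 0x40
print(stored, hex(resolve(stored)))          # -> N3 0x80
```

Because the choice is positional, the walker needs no extra per-node flag to know which path to take first.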
Fig. 5B is a table 538 of an example embodiment of processing cycles for walking the NFA graph of fig. 5A with the payload 542 in a non-speculative manner. It should be understood that a processing cycle may include one or more clock cycles.
As shown in table 538, processing cycles 540a-h may include walking a current node 530, at a current offset 532, with a segment from the payload 542 to determine a match result 534, and determining a walk action 536 based on the match result 534. In the example embodiment, node N0 506 may have a character node type. For example, node N0 506 may be a character node configured to match the character "h" in the input stream. In the example embodiment, the walker 320 may walk the starting node N0 506 with the segment 522a (i.e., "h") at the current offset 520a in processing cycle 540a.
Because the segment 522a matches the character "h" at node N0 506, the walker 320 may determine that the match result 534 is a positive match result. As specified by the compiler 306 via metadata (not shown) associated with the starting node N0 506, the walker 320 may walk in the forward direction, fetch the next node indicated by the metadata associated with node N0 506, and increase the current offset from 520a (i.e., "0") to 520b (i.e., "1"). The next node indicated by node N0 506 is, in the example embodiment, the split node N1 508. The walker 320 thus takes the action 536 of processing cycle 540a, which includes updating the current offset in the payload 542 to "1" and transitioning to the split node N1 508. The transition may include fetching (also referred to herein as loading) the split node N1 508.
Because the split node N1 508 presents multiple transition path options, such as the epsilon paths 530a and 530b, the action 536 of processing cycle 540b may include selecting the upper epsilon path 530a and fetching node N2 510 without consuming a segment from the payload 542. Because no matching function is performed by the split node N1 508, there is no change to the current offset/segment 532, and thus no payload is consumed in processing cycle 540b.
Because the split node N1 508 presents multiple path options, the action 536 may include storing unexplored context, such as by storing an indirect or direct identifier of node N3 512 together with the current offset 520b (i.e., "1"). The selected transition path may be referred to herein as the current or active thread, and each stored untraversed transition path may be referred to herein as a stored thread. Each thread may be identified by a corresponding node identifier and an offset in the payload. The unexplored context thus identifies an unexplored thread (i.e., path).
Storing this unexplored context may enable the walker 320 to remember to return to node N3 512 and walk node N3 512 with the segment "x" at offset 520b (i.e., "1") in the payload 542 in the event that a negative match result occurs along the selected partially matching path, e.g., if a negative match result is determined at node N2 510 or at a node along a path extending from node N2 510. According to embodiments disclosed herein, the unexplored context may be marked with a Discard Unexplored Processing (DUP) indicator that indicates whether the walker 320 is to discard or process the unexplored context in the event that a final match for the pattern 502 is identified along the selected transition path.
For example, based on reaching the marker node N5 515 indicating a final (i.e., full or complete) match of the pattern 502 in the input stream, the walker 320 may employ the DUP indicator to determine whether to process the unexplored context by walking node N3 512 with the segment "x" at offset 520b, in an effort to determine another path through the NFA graph 504 that matches the pattern 502, or whether to discard the unexplored context. Marking the unexplored context with a DUP indicator may include marking the unexplored context in any suitable manner, such as by setting a bit or field associated with the unexplored context to true to indicate desired processing of the stack entry, or to false to indicate desired discarding of the stack entry.
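The DUP-driven drain of stored threads after a final match might be sketched as follows (entry fields and function names are hypothetical):

```python
# After a final match, each stored entry is popped and either discarded
# or activated according to its DUP flag: True means "process the
# unexplored path too" (e.g., to find all matches), False means discard.

def on_final_match(run_stack, walk):
    """Drain stored threads after a final match, honoring DUP."""
    results = []
    while run_stack:
        node, offset, dup_process = run_stack.pop()
        if dup_process:
            results.append(walk(node, offset))   # activate the stored thread
        # dup_process False: the entry is simply discarded
    return results

stack = [("N3", 2, False), ("N3", 1, True)]
matched = on_final_match(stack, lambda n, o: (n, o))
print(matched)   # -> [('N3', 1)]: only the DUP=True entry is processed
```

Entries pop in LIFO order, so the most recently stored thread is considered first.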
Whether a stored thread is traversed may be determined by the compiler 306. For example, the compiler 306 may control whether the DUP indicator is set by configuring a corresponding setting in the metadata of each node. Alternatively, the compiler 306 may configure a global setting, included in global metadata associated with the finite automaton, specifying that all stored threads are to be traversed so that all possible matches can be identified.
In the example embodiment, the selection of the epsilon transition path 530a may result in a failure to detect a match at node N2 510 or at a subsequent node of the current thread, such as N4 514. Thus, if a match failure is detected, the stored thread for the epsilon transition path 530b may be traversed. Alternatively, if so specified by the compiler 306, the epsilon transition path 530b may be traversed regardless of whether a match failure is detected.
Storing the untraversed transition path may include pushing an entry onto a stack (e.g., the run stack 460 of FIG. 4) by storing an identifier of the next node N3 512 in association with an indication of the current offset 520b in the entry. The identifier of the next node N3 512 may be a value, a pointer, or any other suitable indicator of the next node. The indication of the offset may be a numerical value, a pointer, or any other suitable value identifying the location of a segment of the segments 516 in the payload 542.
According to the example embodiment, based on selecting the upper path (i.e., the epsilon transition path 530a), the walker 320 may fetch node N2 510 and try to match the segment 522b (i.e., "x") at the current offset 520b (i.e., "1") against the element "a" of node N2 510 in processing cycle 540c. Because "x" does not match the element "a" at node N2 510, the action 536 of processing cycle 540c may include popping an entry from the run stack 460. The popped entry 544b may be the most recently pushed entry, such as the stored entry 544a indicating node N3 512 and offset 520b (i.e., "1") in the example embodiment.
The walker 320 may transition to and walk node N3 512 with the segment "x" located at offset 520b in the payload 542. As processing cycle 540d then shows, the match result 534 is positive for processing cycle 540d. The action 536 of processing cycle 540d may include updating the current offset to offset 520c and transitioning back to the split node N1 508, which may be the next node indicated by node N3 512.
Because all arc transitions from the split node 508 are epsilon transitions, the walker 320 may again select one of the multiple path options, and does not consume from the payload 542, as the current offset is not updated for processing cycle 540e. In the example embodiment, the walker 320 again selects the epsilon transition path 530a. The walker 320 thus again stores a thread by pushing node N3 512 and the current offset, now 520c (i.e., "2"), onto the run stack 460. As shown for processing cycle 540f, the walker 320 fetches node N2 510 and matches the segment 522c (i.e., "a") at offset 520c (i.e., "2") against the element "a" of node N2 510. Because "a" matches at node N2 510, the walker 320 updates the current offset to 520d (i.e., "3") and transitions to node N4 514, which is specified by the metadata of node N2 510 as configured by the compiler 306.
Thus, for processing cycle 540g, the walker 320 may fetch the next node N4 514 and the next segment 522d (i.e., "b") at offset 520d. Because "b" matches at node N4 514, the walker 320 may transition to the next node N5 515. Node N5 515 is a marker node associated with an indicator representing a final (i.e., full or complete) match of the regular expression pattern 502 in the input stream. Thus, for processing cycle 540h, the walker 320 may discontinue walking along the current path and report the final match by storing an entry in the match result buffer 466. The walker 320 may then check the run stack 460 for stored threads and either discard them or activate them as indicated by the corresponding DUP indicator. Accordingly, the walker 320 pops the entry identifying node N3 512 and offset 520c (i.e., "2"), and determines, according to the DUP indicator associated with the popped entry, whether to activate the stored thread by walking node N3 512 with the segment 522c at offset 520c or to discard it.
As shown in table 538 of fig. 5B, the number of processing cycles for matching the payload 542 against the pattern 502 is eight, and the walker 320 pushes and pops unexplored context to remember and return to node N3 512 twice. Further, table 538 shows that walking the NFA graph 504 with the payload 542 in a non-speculative manner results in processing the segment "x" at nodes N2 510 and N3 512 in two separate processing cycles. According to embodiments disclosed herein, such performance may be improved by reducing the number of processing cycles needed for matching, reducing the number of times a segment is processed, and reducing the number of memory accesses for the push and pop operations needed to store and retrieve unexplored context.
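The eight-cycle non-speculative walk of table 538 can be reproduced with a toy simulation; the node encoding and control flow here are illustrative, not the HNA's binary-image format:

```python
def walk(payload):
    # Graph of Fig. 5A: N0 'h' -> N1 split(N2, N3); N2 'a' -> N4 'b' -> N5;
    # N3 matches any non-newline character and loops back to N1.
    stack, cycles = [], 0
    node, off = "N0", 0
    while True:
        cycles += 1                          # one processing cycle per node walk
        if node == "N0":                     # character node: match 'h'
            if off >= len(payload) or payload[off] != "h":
                return None
            node, off = "N1", off + 1
        elif node == "N1":                   # split node: consumes no segment
            stack.append(("N3", off))        # store the unexplored context
            node = "N2"                      # take the upper (lazy) epsilon path
        elif node == "N2":                   # character node: match 'a'
            if off < len(payload) and payload[off] == "a":
                node, off = "N4", off + 1
            elif stack:
                node, off = stack.pop()      # mismatch: activate stored thread
            else:
                return None
        elif node == "N3":                   # character class node: [^\n]
            if off < len(payload) and payload[off] != "\n":
                node, off = "N1", off + 1
            else:
                return None
        elif node == "N4":                   # character node: match 'b'
            if off < len(payload) and payload[off] == "b":
                node = "N5"
            elif stack:
                node, off = stack.pop()
            else:
                return None
        else:                                # N5: marker node, final match
            return cycles

print(walk("hxab"))   # -> 8 processing cycles, as in Table 538
```

Note that the segment "x" costs two cycles (a mismatch at N2, then a match at N3), which is exactly the redundancy that the speculative walk below removes.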
The performance optimizations obtained from the embodiments disclosed herein may be based on the following observation: when a given segment at a given offset is processed by at least two nodes in the NFA, then for most of the time (e.g., 99%) the given segment does not match at a first node of the at least two nodes and matches at a second node of the at least two nodes. For example, in the example embodiment of fig. 5A, as disclosed above with reference to table 538 of fig. 5B, the segment 522b (i.e., "x") at the given offset 520b (i.e., "1") is processed by the two nodes N2 510 and N3 512, and does not match at node N2 510 but matches at node N3 512.
According to embodiments disclosed herein, matching performance may be optimized by processing the segment at the given offset at each of the at least two nodes in parallel. Processing the at least two nodes in parallel may be referred to herein as speculative processing. Embodiments disclosed herein may be based on the assumption that a match operation at a selected node of the at least two nodes will result in a mismatch. The selected node of the at least two nodes may be referred to herein as the element node. The non-selected node of the at least two nodes (which would be traversed based on a mismatch at the selected node) may be referred to herein as the parallel node and may be speculatively processed with the same segment, in the same processing cycle, as the selected node to improve matching performance. As described below with reference to fig. 5D, both node N2 510 and node N3 512 may be processed with the segment "x" at the given offset 520b, optimizing matching performance by speculatively walking node N3 512 with the segment "x" at offset 520b in the same processing cycle in which node N2 510 is walked with the segment "x" at the given offset 520b.
FIG. 5C is a block diagram of an example embodiment of a table 570 of speculative processing rules 578a-d. Table 570 is a truth table with follow-up actions 576 that are based on element node match results 574 and parallel node match results 572. Four possible scenarios are presented according to the speculative processing rules 578a-d. For example, speculative processing rules 578a, 578b, 578c, and 578d each have a corresponding follow-up action 576 and correspond to parallel/element match results of positive/positive, positive/negative, negative/positive, and negative/negative, respectively. Each follow-up action 576 may be based on an aggregation of the match result at the parallel node 572 and the match result at the element node 574.
Speculative processing rule 578b may be particularly significant because it optimizes matching performance by matching at the element node and the parallel node in parallel, as indicated by action 576 updating the offset without transitioning to another node. Speculative processing rule 578b thus enables the element node and the parallel node to process the next segment in parallel, thereby eliminating memory accesses for node fetches.
Because the parallel node is processed speculatively, follow-up action 576 is directed to providing follow-up actions for the element node if the match at the element node is positive. For example, if the match at the element node is positive, then follow-up action 576 includes updating the current offset in the payload and transitioning to a next node, which is specified by the metadata associated with the element node. If the match result at the element node is positive, then the match result at the parallel node determines whether follow-up action 576 includes pushing the parallel node and the current offset for storage as unexplored context.
For example, speculative processing rule 578a pushes an entry onto run stack 460 to allow walker 320 to return to the parallel node with the segment at the current offset, as the return may result in another partially matching thread in the NFA graph. However, if the match result at the parallel node is negative, as is the case for speculative processing rule 578c, then the unexplored context is not pushed onto the stack, since returning to the parallel node with the segment at the current offset would not advance the partial match of the pattern. Thus, matching performance may also be optimized by speculative processing rule 578c, because speculative processing rule 578c eliminates at least one set of push and pop operations for the match.
As shown by speculative processing rule 578d, the at least one follow-up action may include discontinuing the walk of a given path based on the aggregation of the match results for parallel node 572 and element node 574 including a negative match result at each node. The next segment at the next offset within the payload may then be walked based on detecting unexplored context, such as by checking the run stack 460 for a stored thread and popping the stored thread if one is stored. The walk may be terminated based on not detecting unexplored context.
As shown by speculative processing rules 578a and 578c, based on the set of match results for element node 574 including a positive match result at the element node, and for parallel node 572 including either a positive or a negative match result for the segment at the parallel node, the at least one follow-up action includes updating the given offset to produce a next offset and transitioning to the next node. The next node may be identified based on metadata associated with the element node. The next node may then be walked with the next segment at the next offset within the payload. Based on a positive match result for the segment at the parallel node, the at least one follow-up action may further include storing unexplored context in a stack entry and pushing the stack entry onto a stack, as shown by speculative processing rule 578a. The unexplored context identifies the parallel node and the given offset, either directly or indirectly.
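The four rules above can be summarized in a short sketch. The function name and action strings below are illustrative labels, not the patent's encoding; only the truth-table logic follows the description of FIG. 5C.

```python
def follow_up_actions(element_match: bool, parallel_match: bool) -> list:
    """Map the aggregated match results of one speculative processing cycle
    to follow-up actions, per the truth table of FIG. 5C."""
    if element_match and parallel_match:
        # Rule 578a: advance, but remember the parallel path for later.
        return ["push unexplored context", "update offset", "transition to next node"]
    if element_match and not parallel_match:
        # Rule 578c: advance; no push, eliminating a push/pop pair.
        return ["update offset", "transition to next node"]
    if parallel_match:
        # Rule 578b: stay on the same two nodes with the next segment;
        # no node fetch is needed.
        return ["update offset"]
    # Rule 578d: both negative -- this path ends; resume a stored thread
    # if the run stack holds one, otherwise terminate the walk.
    return ["pop stored thread or terminate"]
```

Note that only rule 578a touches the run stack, which is where the memory-access savings of rules 578b-d come from.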
Fig. 5D is a table 550 for an example embodiment of processing cycles 554a-f of traversing the NFA graph 504 of fig. 5A with the payload 542 in a speculative manner. As shown in table 550, processing cycles 554a-f may include traversing a current node 530' at a current offset 532' with a segment from payload 542 to determine a match result 534' and determining a walking action 536' based on match result 534'. According to embodiments disclosed herein, walker 320 may process node N2510 and node N3512 in parallel with a given segment at a given offset in payload 542, optimizing matching performance using the speculative processing rules disclosed in fig. 5C. For example, as disclosed below, processing cycles 554c and 554d may determine walking action 536' based on the aggregated match results of node N2510 and node N3512.
Similar to the embodiment of fig. 5B disclosed above, the walker 320 may walk the starting node N0506 with the segment 522a (i.e., "h") at the current offset 520a (i.e., "0"). When segment 522a matches the character "h" at node N0506, walker 320 may determine that match result 534' is a positive match result. Similar to the embodiment of fig. 5B, the next node indicated by node N0506 is split node N1508. Thus, walker 320 takes action 536' of processing cycle 554a, which includes updating the current offset in payload 542 to 520b (i.e., "1") and transitioning to split node N1508. The transition may include fetching (also referred to herein as loading) split node N1508.
According to the example embodiment of fig. 5D, the metadata associated with split node 508 may include a speculative processing indicator. If the speculative processing indicator is not included in the split node metadata, walker 320 may continue as in the example embodiment of fig. 5B. Including the speculative processing indicator may include setting a field or other suitable data in the split node metadata. Setting the field may include configuring the field to true to indicate speculative processing and configuring the field to false to indicate non-speculative processing. The speculative processing indicator may be included in any suitable manner that enables walker 320 to walk (i.e., process) at least two nodes of NFA graph 504 speculatively, that is, in parallel.
According to the example embodiment of fig. 5D, if the split node metadata includes the speculative processing indicator, then no segment from the payload is consumed for processing cycle 554b; however, walker 320 fetches both node N2510 and node N3512. Node N2510 may be referred to as an element node, while node N3512 may be referred to as a parallel node or, in this example embodiment, a speculative node, since node N3512 is speculatively processed (i.e., walked).
As shown for processing cycle 554c, walker 320 may determine a negative match result for section 522b (i.e., "x") at element node N2510 and a positive match result at parallel node N3512. These sets of matching results map to speculative processing rule entry 578b of fig. 5C. Then, the follow-up action 576 of speculative processing rule entry 578b specifies that the current offset is to be updated and the element and parallel nodes N2510 and N3512, respectively, are to be processed again. Since nodes N2510 and N3512 have already been fetched for processing cycle 554c, no node is fetched for processing cycle 554 d.
As shown for processing cycle 554d, walker 320 walks element node N2510 and parallel node N3512 with segment 522c (i.e., "a") at the updated offset, which is offset 520c (i.e., "2"). The match result 534' is positive at both element node N2510 and parallel node N3512, because segment "a" matches element "a" at node N2510 and also matches the "[^\n]" element at node N3512, as "a" is not a newline character. Thus, the set of positive match results 534' for processing cycle 554d maps to speculative processing rule entry 578a of FIG. 5C. Thus, unexplored context indicating the parallel node N3512 and the current offset 520c (i.e., "2") may be pushed onto the run stack 460, and the next node specified by the metadata of the element node may be fetched.
According to this example embodiment, the current offset may be updated to 520d (i.e., "3") and node N4514 may be fetched, with walker 320 transitioning to it. A positive match result for segment 522d (i.e., "b") may be determined at node N4514 for processing cycle 554e, and walker 320 may fetch and transition to marker node N5515, which may be designated in the metadata associated with node N4514 as the next node of node N4514. Because node N5515 is a marker node, the walker may store the final match result in match result buffer 466, discontinue walking the active thread (e.g., the current path), and activate a stored thread if run stack 460 is not empty.
For example, the walker 320 may check the run stack 460 for an empty status. In this example embodiment, the run stack 460 is not empty because unexplored context was pushed onto the run stack 460 in processing cycle 554d. Thus, walker 320 may pop the unexplored context, indicating that segment 522c (i.e., "a") at offset 520c (i.e., "2") is used to advance the walk at parallel node N3512, and may determine whether to discard or process the unexplored context based on the DUP indicator associated with the stack entry, as disclosed above. As shown in table 550 of this example embodiment, the number of processing cycles for matching payload 542 with pattern 502 is six, which is less than the eight processing cycles used in the example embodiment of fig. 5B.
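The six-cycle speculative walk of table 550 can be modelled in software. The sketch below assumes the pattern "h[^\n]*ab" of FIG. 5A and a simplified per-node cycle count; it is an illustration of the speculative rules, not the patent's hardware walker.

```python
def speculative_walk(payload):
    """Walk the pattern h[^\n]*ab speculatively; return (matched, cycles)."""
    cycles = 0
    # Node N0: match the literal 'h'.
    cycles += 1
    if not payload or payload[0] != "h":
        return False, cycles
    offset = 1
    # Split node N1: fetch element node N2 ('a') and parallel node N3
    # ([^\n]) together; no payload segment is consumed this cycle.
    cycles += 1
    run_stack = []
    matched_a = False
    while offset < len(payload):
        segment = payload[offset]
        cycles += 1
        element_match = segment == "a"        # element node N2
        parallel_match = segment != "\n"      # parallel node N3
        if element_match:
            if parallel_match:                # rule 578a: remember N3's path
                run_stack.append(("N3", offset))
            offset += 1
            matched_a = True
            break                             # transition to node N4 ('b')
        if not parallel_match:                # rule 578d: both mismatch
            return False, cycles
        offset += 1                           # rule 578b: same nodes, next segment
    if not matched_a:
        return False, cycles
    # Node N4: match the literal 'b', then marker node N5 reports the match.
    cycles += 1
    if offset < len(payload) and payload[offset] == "b":
        return True, cycles + 1               # marker-node reporting cycle
    return False, cycles
```

With the payload "hxab" this model reproduces the six processing cycles of table 550; the "x" at offset 1 is handled in a single cycle by rule 578b, with no push, pop, or node fetch.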
FIG. 6A is a block diagram 600 of an NFA graph 604 that may be used by the walker 320 to match the regular expression pattern 502 in an input stream. In this example embodiment, region 507 of FIG. 5A (including split node N1508, speculative node N3512, and epsilon transition paths 530a and 530b) is represented by a variable count node N1N3'607. Variable count node N1N3'607 combines split node N1508 and parallel (i.e., speculative) node N3512 of FIG. 5A.
According to embodiments disclosed herein, the variable count node N1N3'607 may be configured to match a variable number of instances 613 of a given element, such as character class 611 (i.e., [^\n]), where the variable number may be infinite, as indicated by the variable count node. The variable number of instances 613 may be at least zero, or any other suitable number of instances. It should be appreciated that the given element character class 611 is for purposes of illustrating this example embodiment and that the given element may be any suitable element to be matched by the variable count node N1N3'.
A variable count node is a node that can match an element a variable number of times, which may be bounded by a range (e.g., zero to five times). The variable count node may be one of four types of variable count node: lazy, greedy, possessive, or all-match. A lazy-type variable count node may be configured to find the shortest possible element match within the range. A greedy- or possessive-type variable count node may be configured to find the longest possible element match within the range. An all-match variable count node may be configured to return all matches in the payload.
The lazy-type variable count node may be configured to consume (i.e., process) a single instance of a segment from the payload based on a mismatch of a segment at the next node identified by metadata associated with the lazy-type variable count node. The greedy variable count node may be configured to consume consecutive segments from the payload until a mismatch of one of the consecutive segments is determined at the greedy variable count node or until the variable number of consecutive segments has been consumed (i.e., processed) by the greedy variable count node.
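The lazy/greedy distinction described above corresponds to the familiar regular-expression quantifiers. A quick illustration using Python's `re` module, standing in here for the hardware walker:

```python
import re

text = "axbxb"
# Lazy ("shortest possible element match"): ".*?" stops at the first 'b'.
lazy = re.search(r"a.*?b", text).group()    # "axb"
# Greedy ("longest possible element match"): ".*" runs to the last 'b'.
greedy = re.search(r"a.*b", text).group()   # "axbxb"
```

Both quantifiers report a match here; they differ only in how many instances of the variable-count element they consume before handing off to the next node.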
In the example embodiment of fig. 6A, variable count node N1N3'607 is a lazy-type variable count node associated with metadata 609 that identifies, either directly or indirectly, the next node 617, such as element node N2610. In this example embodiment, the walker advances the walk to element node N2610 based on zero or more matching instances of a variable number of consecutive instances 613 of the given element 611 in the input stream. For example, in the example embodiment, lazy-type variable count node N1N3'607 is configured to match zero or more instances of the character class element "[^\n]" (i.e., any character that is not a newline character), with no upper bound on the number of instances.
According to embodiments disclosed herein, each node of the NFA may be associated with metadata that includes at least four fields, such as a node type, element, count, and next node, although one or more of the at least four fields may not be applicable based on the node type.
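A minimal sketch of such per-node metadata, assuming a simple record layout (the patent specifies the four fields, not this particular representation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeMetadata:
    node_type: str                   # e.g. "character", "split", "variable count", "marker"
    element: Optional[str] = None    # character or character class, when applicable
    count: Optional[int] = None      # variable-count bound; None here models "unbounded"
    next_node: Optional[int] = None  # identifier of the next node, when applicable

# A lazy-type variable count node such as N1N3' might carry:
lazy_node = NodeMetadata(node_type="variable count", element="[^\\n]", next_node=2)
```

Fields that do not apply to a given node type (e.g., `element` for a marker node) are simply left unset in this sketch.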
Metadata 609 associated with lazy-type variable count node N1N3'607 may include a count (not shown) for tracking the total number of consecutive instances (not shown) of positively matching elements 611 in the payload to allow comparison of the total number to the variable number 613.
According to embodiments disclosed herein, walker 320 may be configured to walk NFA graph 604 speculatively to optimize performance of matching regular expression pattern 502 in the input stream.
Fig. 6B is a table 618 of an example embodiment of processing cycles 628a-g for traversing NFA graph 604 of fig. 6A with payload 542 in a non-speculative manner. Similar to the embodiments of figs. 5A and 5B disclosed above, walker 320 may walk start node N0606 with segment 522a (i.e., "h") at current offset 520a (i.e., "0"). When segment 522a matches the character "h" at node N0606, walker 320 may determine that match result 624 is a positive match result for processing cycle 628a. In the embodiment of FIG. 6A, the next node indicated by node N0606 is lazy-type variable count node N1N3'607. The walker 320 then takes action 626 of processing cycle 628a, which includes updating the current offset in payload 542 to 520b (i.e., "1") and transitioning to the lazy-type variable count node N1N3'607. The transition may include fetching (also referred to herein as loading) the lazy-type variable count node N1N3'607.
Because the lazy-type variable count node N1N3'607 is lazy, the actions 626 of processing cycle 628b may include storing the unexplored context, such as by storing an indirect or direct identifier of node N1N3'607 and the current offset 520b (i.e., "1"), and advancing to the next node 617 identified by the lazy-type variable count node N1N3'607 without updating the current offset. Thus, for processing cycle 628b, lazy-type variable count node N1N3'607 does not consume a payload segment.
Storing this unexplored context may enable walker 320 to remember to return to lazy-type variable count node N1N3'607, to walk it with segment "x" at offset 520b in payload 542, in the event of a negative match result along the selected partial match path, e.g., if a negative match result is determined at node N2610 or at a node along a path extending from node N2610. To store the unexplored context, walker 320 may push 630a an entry onto the run stack 460 that includes an identifier for lazy-type variable count node N1N3'607 and the offset 520b.
According to embodiments disclosed herein, unexplored context may be marked with a DUP indicator that indicates whether walker 320 discards or processes the pushed unexplored context in the event that a final match for pattern 502 is identified along the selected transition path. For example, based on reaching marker node N5615, indicating the final (i.e., complete) match for pattern 502 in the input stream, walker 320 may employ the DUP indicator of the pushed stack entry to determine whether to process the unexplored context, by walking lazy-type variable count node N1N3'607 with segment "x" at offset 520b in an effort to determine another matching path of pattern 502 through NFA graph 604, or whether to discard the unexplored context, since only a single matching path of pattern 502 in the input stream may be of interest.
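The DUP-driven decision at final-match time can be sketched as follows; the stack-entry layout and the convention that a set DUP indicator means "discard" are assumptions for illustration.

```python
def drain_run_stack(run_stack, process_thread):
    """After a final match is reported, pop each stored thread and either
    discard it or process it, as directed by its DUP indicator."""
    while run_stack:
        node, offset, dup = run_stack.pop()
        if dup:
            # A duplicate match path adds nothing when only a single
            # matching path is of interest; discard the context.
            continue
        process_thread(node, offset)  # walk the unexplored path

processed = []
drain_run_stack([("N1N3'", 1, True), ("N1N3'", 3, False)],
                lambda node, offset: processed.append((node, offset)))
# Only the entry whose DUP indicator is clear is processed.
```

Entries are popped most-recent-first, matching the last-in, first-out behavior of the run stack described above.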
According to the example embodiment of fig. 6B, the walker 320 may fetch node N2610 and may attempt to match (i.e., search for) the segment 522b (i.e., "x") at the current offset 520b (i.e., "1") in processing cycle 628c against the element "a" of node N2610. Because "x" does not match element "a" at node N2610, action 626 of processing cycle 628c may include popping 630b an entry from the run stack 460. The popped entry may be the most recently pushed 630a entry, indicating the lazy-type variable count node N1N3'607 and offset 520b (i.e., "1").
Walker 320 may transition to and walk lazy-type variable count node N1N3'607 with segment "x" located at offset 520b in payload 542. Because "x" is not a newline character, "x" is a positive match at the lazy-type variable count node N1N3'607, and processing cycle 628d exhibits a match result 624 that is positive. Action 626 of processing cycle 628d may include updating the current offset to offset 520c and transitioning back to element node N2610, which may be the next node indicated by metadata 609 associated with lazy-type variable count node N1N3'607.
As shown for processing cycle 628e, walker 320 fetches node N2610 and compares segment 522c (i.e., "a") at offset 520c (i.e., "2") against it. Because "a" is a positive match at element node N2610, walker 320 updates the current offset to 520d (i.e., "3") and transitions to node N4614.
Thus, for processing cycle 628f, walker 320 may fetch node N4614 and match segment 522d (i.e., "b") at offset 520d. Because "b" is a positive match at node N4614, walker 320 may transition to node N5615. Node N5615 is a marker node associated with an indicator that represents a final (i.e., complete) match of regular expression pattern 502 in the input stream. Thus, for processing cycle 628g, walker 320 may discontinue walking and report the final match by storing an entry in match result buffer 466. The walker may then examine the run stack 460 for stored threads and either discard the stored threads or activate them, as indicated by the corresponding DUP indicators of those entries in the run stack 460.
As shown in table 618 of fig. 6B, the number of processing cycles for matching payload 542 to pattern 502 is seven, and walker 320 pushes 630a and pops 630b unexplored context in order to remember and return to lazy-type variable count node N1N3'607 with segment "x" at offset 520b. Thus, table 618 also reveals that segment "x" is processed (i.e., consumed) twice by walker 320, at both lazy-type variable count node N1N3'607 and node N2610, being a mismatch (i.e., a negative match) at node N2610 and a positive match at lazy-type variable count node N1N3'607.
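The fig. 6B walk can likewise be modelled in software. The sketch below follows the seven-cycle, non-speculative walk of table 618 for the pattern "h[^\n]*ab"; the node model and cycle accounting are simplified assumptions, including re-pushing the lazy node's context each time it consumes a segment.

```python
def non_speculative_walk(payload):
    """Walk h[^\n]*ab non-speculatively; return (matched, cycles, pushes, pops)."""
    cycles = pushes = pops = 0
    stack = []          # run stack of saved offsets for the lazy node
    cycles += 1         # node N0: literal 'h'
    if not payload or payload[0] != "h":
        return False, cycles, pushes, pops
    node, offset = "N1N3'", 1
    while True:
        if node == "N1N3'":             # lazy node: save context, don't consume
            cycles += 1
            stack.append(offset); pushes += 1
            node = "N2"
        elif node == "N2":              # element node: literal 'a'
            cycles += 1
            if offset < len(payload) and payload[offset] == "a":
                offset += 1; node = "N4"
            elif stack:
                offset = stack.pop(); pops += 1
                node = "consume"        # retry the lazy node with this segment
            else:
                return False, cycles, pushes, pops
        elif node == "consume":         # lazy node consumes one segment
            cycles += 1
            if offset < len(payload) and payload[offset] != "\n":
                offset += 1
                stack.append(offset); pushes += 1   # remember the new context
                node = "N2"
            else:
                return False, cycles, pushes, pops
        elif node == "N4":              # literal 'b'
            cycles += 1
            if offset < len(payload) and payload[offset] == "b":
                node = "N5"
            elif stack:
                offset = stack.pop(); pops += 1
                node = "consume"
            else:
                return False, cycles, pushes, pops
        else:                           # node N5: marker, report final match
            cycles += 1
            return True, cycles, pushes, pops
```

For the payload "hxab" this model yields seven cycles with stack traffic, versus six cycles and no churn for the speculative walk of fig. 5D.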
According to embodiments disclosed herein, such performance may be improved by reducing the number of processing cycles required for matching, by reducing the number of times a given segment is processed, and by reducing the number of times memory is accessed, through reducing the number of push and pop operations used to store and retrieve unexplored context. Similar to the walking disclosed above with respect to fig. 5D, embodiments disclosed herein may walk NFA graph 604 in a speculative manner.
For example, walker 320 may be configured to walk at least two nodes in NFA 604 in parallel with a given section at a given offset within payload 542. The walker 320 may determine a match result for the segment at the given offset within the payload at each of the at least two nodes. Walker 320 may determine at least one follow-up action for walking NFA graph 604 based on each determined set of matching results.
Fig. 6C is a table 648 for another example embodiment of processing cycles 658a-e for traversing the NFA graph 604 of fig. 6A with the payload 542. As shown in table 648, processing cycles 658a-e may include traversing a current node 650 at a current offset 652 with a section from payload 542 to determine a match result 654 and determining a walking action 656 based on the match result 654. According to embodiments disclosed herein, walker 320 may process lazy-type variable counting node N1N3'607 and element node N2610 in parallel with a given segment at a given offset in payload 542, thereby optimizing matching performance using the speculative processing rules disclosed above with reference to fig. 5C.
According to the example embodiment of FIG. 6C, metadata 609 associated with lazy-type variable count node N1N3'607 may include a speculative processing indicator (not shown). If the speculative processing indicator is not included in the lazy-type variable count node metadata, walker 320 may continue as in the example embodiment of fig. 6B.
The speculative processing indicator may be included in any suitable manner that enables walker 320 to walk (i.e., process) at least two nodes of NFA graph 604 speculatively, that is, in parallel. The at least two nodes being processed in parallel may include an element node and a parallel node. In this example embodiment, node N2610 may be referred to as an element node and lazy-type variable count node N1N3'607 may be referred to as a parallel node.
According to the example embodiment of fig. 6C, if the lazy-type variable count node metadata includes the speculative processing indicator, the segment at the current offset in the payload may be processed for processing cycle 658b, and walker 320 may fetch both element node N2610 and lazy-type variable count node N1N3'607. As shown for processing cycle 658b, walker 320 may determine a negative match result for segment 522b (i.e., "x") at element node N2610 and a positive match result at parallel node N1N3'607. This set of match results maps to speculative processing rule entry 578b. Thus, follow-up action 576 specifies that the current offset is to be updated and that the element and parallel nodes (e.g., nodes N2610 and N1N3'607), respectively, are to be processed (i.e., walked) again. Since nodes N2610 and N1N3'607 have already been fetched for processing cycle 658b, no node fetch is needed for processing cycle 658c.
As shown for processing cycle 658c, walker 320 walks element node N2610 and parallel node N1N3'607 in payload 542 with segment 522c (i.e., "a") at the updated offset, which is offset 520c (i.e., "2"). The match result 654 is positive at both element node N2610 and parallel node N1N3'607, because segment "a" matches element "a" at element node N2610 and also matches the "[^\n]" element at parallel node N1N3'607, as "a" is not a newline character.
The set of positive match results 654 in the match results for processing cycle 658c maps to speculative processing rule entry 578 a. Thus, an unexplored context indicating the parallel node N1N3'607 and the current offset 520c (i.e., "2") may be pushed onto the run stack 460 and the next node specified by the metadata of the element node 610 may be extracted. According to this example embodiment, the current offset may be updated to 520d (i.e., "3") and node N4614 may be extracted, since node N4614 is the next node indicated by the metadata of element node N2610.
A positive match result for segment 522d (i.e., "b") may be determined at node N4614 for processing cycle 658d, and walker 320 may transition to marker node N5615, which may be designated as the next node of node N4614 in the metadata associated with node N4614. Because node N5615 is a marker node, the walker may store the final match result in the match result buffer 466 and discontinue walking the active thread (e.g., the current path).
The walker 320 may check the run stack 460 for an empty status. In this example embodiment, the run stack 460 is not empty because unexplored context was pushed onto the run stack 460 in processing cycle 658c. Thus, walker 320 may pop the unexplored context, indicating that segment 522c (i.e., "a") at offset 520c (i.e., "2") is used to advance the walk at parallel node N1N3'607, and may determine whether to discard or process the unexplored context based on the DUP indicator associated with the stack entry.
As shown in table 648 of this example embodiment, the number of processing cycles to match payload 542 with pattern 502 using speculative processing is five, which is less than the seven processing cycles used in the non-speculative processing example embodiment of fig. 6B. It should be understood that such performance gains based on the speculative processing shown in the example embodiments disclosed above are for illustrative purposes, and that the performance gains achieved by using speculative processing may be greater than those illustrated. For example, such performance gains may increase depending on the input payload. Depending on the content of the input stream, other payloads may incur even more churning of push 630a and pop 630b operations, such as those of fig. 6B, for transitioning to and from a parallel node, resulting in performance gains greater than those described above.
Fig. 6D is a block diagram 660 of another payload 662 that may be walked using NFA graph 604 of fig. 6A. The input payload 662 includes segments 670 at offsets 672. Segments 674a-f, drawn from the characters h, x, a, and b, map to offsets 676a-f (i.e., 0, 1, 2, 3, 4, and 5), respectively.
Fig. 6E is a table 680 for an example embodiment of processing cycles 681a-k of walking the NFA graph 604 of fig. 6A with the payload 662 of fig. 6D in a non-speculative manner. As shown in table 680, processing cycles 681a-k may include walking a current node 682 at a current offset 684 with a segment from payload 662 to determine a match result 686 and determining a walking action 688 based on match result 686. As shown in this example embodiment, eleven processing cycles are required before the final match of pattern 502 in payload 662 is found. Additionally, the processing cycles reflect that the unexplored context of parallel node N1N3'607 is pushed and popped multiple times, because walker 320 determines mismatching segments at element node N2610, resulting in churning of walker 320 between element node N2610 and parallel node 607. Such churning between nodes causes walker 320 to fetch parallel node 607 and element node N2610 repeatedly, at a cost to performance, because the corresponding memory accesses require additional processor cycles. Such memory accesses can be expensive, particularly because these memories may be Error Correction Code (ECC) protected memories. As such, accessing ECC protected memory for a push or pop operation may take four clock cycles or more.
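A back-of-envelope figure for the stack-churn cost, assuming (per the text) about four clock cycles per ECC-protected memory access; the operation counts are hypothetical:

```python
ECC_ACCESS_CYCLES = 4        # clock cycles per protected-memory access (assumed)

# Hypothetical churn from a non-speculative walk: four pushes and three pops.
pushes, pops = 4, 3
overhead = (pushes + pops) * ECC_ACCESS_CYCLES
# overhead == 28 clock cycles spent on run-stack maintenance alone
```

Eliminating even one push/pop pair per mismatched segment, as speculative processing rule 578c does, removes eight of those cycles under this assumption.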
Fig. 6F is a table 690 of another example embodiment of processing cycles 691a-g for traversing NFA graph 604 of fig. 6A with payload 662 of fig. 6D in a speculative manner. As shown in table 690, processing cycles 691a-g may include traversing a current node 692 at a current offset 694 with a section from payload 662 to determine a match result 696 and determining a walking action 698 based on match result 696. As shown in this example embodiment, seven processing cycles are required before finding a final match for pattern 502 in payload 662, as opposed to eleven processing cycles 681a-k for the non-speculative processing embodiment of fig. 6E. According to embodiments disclosed herein, walker 320 may process lazy-type variable counting nodes N1N3'607 (parallel nodes in this example embodiment) and node N2610 (element nodes in this example embodiment) in parallel with a given segment at a given offset in payload 662 to optimize matching performance using the speculative processing rules disclosed above with reference to fig. 5C.
Fig. 7 is a flow diagram 700 of an example embodiment of a method that may be implemented in at least one processor operatively coupled with at least one memory in a security device operatively coupled to a network. The method may begin (702) and store in at least one memory at least one finite automaton (704) comprising a plurality of nodes generated from at least one regular expression pattern. The method may include operatively coupling the at least one memory to at least one processor and the at least one processor may be configured to walk the at least one finite automaton with segments of an input stream received via a hardware network interface operatively coupled to a network to match the at least one regular expression pattern in the input stream (706). The walking may include walking at least two nodes of a given finite automaton of the at least one finite automaton in parallel with a segment at a given offset within a payload of a packet of the input stream (708). The walking may include determining a match result for the segment at the given offset within the payload at each of the at least two nodes (710). The walking may further include determining at least one follow-up action for walking the given finite automaton based on the determined set of each match result (712), and in the example embodiment, the method then ends (714).
FIG. 8 is a block diagram of an example of the internal structure of a computer 800 in which embodiments of the present invention may be implemented. Computer 800 includes a system bus 802, which is a set of hardware lines used for data transfer between components of the computer or processing system. The system bus 802 is essentially a shared conduit that connects different elements of a computer system (e.g., processor, disk storage, memory, input/output ports, network ports, etc.) that enables the transfer of information between the elements. Working with system bus 802 is an I/O device interface 804 for connecting various input and output devices (e.g., keyboard, mouse, displays, printers, speakers, etc.) to the computer 800. The network interface 806 allows the computer 800 to connect to various other devices attached to a network. Memory 808 provides volatile storage for computer software instructions 810 and data 812 that may be used to implement embodiments of the present invention. Disk storage 814 provides non-volatile storage for computer software instructions 810 and data 812, which may be used to implement embodiments of the present invention. Central processor unit 818 also operates with system bus 802 and provides for the execution of computer instructions.
Other example embodiments of the invention may be configured using a computer program product; for example, the controls may be programmed in software to implement exemplary embodiments of the present invention. Other example embodiments of the invention may include a non-transitory computer readable medium containing instructions executable by a processor and that, when executed, cause the processor to perform the method described herein. It should be understood that the elements of the block diagrams and flowchart illustrations described herein may be implemented in software, hardware, firmware, or other similar implementations as determined in the future. Furthermore, the elements of the block diagrams and flow diagrams described herein may be combined or divided in any manner in software, hardware, or firmware.
It is to be understood that the term "herein" is transferable to applications or patents that incorporate the teachings presented herein such that the subject matter, definitions or data are maintained throughout the applications or patents that are incorporated herein.
If implemented in software, the software may be written in any language capable of supporting the example embodiments disclosed herein. The software may be stored in any form of computer readable medium, such as Random Access Memory (RAM), Read Only Memory (ROM), compact disc read only memory (CD-ROM), and the like. In operation, a general-purpose or application-specific processor loads and executes the software in a manner well understood in the art. It should be understood that the block diagrams and flow diagrams may include more or fewer elements, be arranged or oriented differently, or be represented differently. It should also be understood that implementation may dictate the block, flow, and network diagrams and the number of block, flow, and network diagrams illustrating the execution of embodiments of the invention.
While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present invention as defined by the following claims.
Claims (51)
1. A security device operatively coupled to a network, the security device comprising:
at least one memory configured to store at least one finite automaton comprising a plurality of nodes generated from at least one regular expression pattern;
at least one processor operatively coupled to the at least one memory and configured to walk the at least one finite automaton with segments of an input stream received via a network to match the at least one regular expression pattern in the input stream, the walking comprising:
walking in parallel at least two nodes of a given finite automaton of the at least one finite automaton with a segment at a given offset within a payload of a data packet in the input stream;
determining a match result for the segment at the given offset within the payload at each of the at least two nodes; and
determining at least one subsequent action for walking the given finite automaton based on an aggregation of each match result determined.
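Stated outside the claim language, the parallel walk recited in claim 1 can be sketched in ordinary code. The following is a minimal illustration only, not the claimed hardware implementation; representing the two nodes as match predicates and naming the subsequent actions with string labels are assumptions made for the sketch:

```python
# Hypothetical sketch: walk two NFA nodes in parallel with the same payload
# segment, then choose a subsequent action from the aggregation of the two
# match results. Node representation and action labels are illustrative.

def walk_parallel(element_matches, parallel_matches, payload, offset):
    """Walk an element node and a parallel node with the segment at `offset`.

    `element_matches` / `parallel_matches` are predicates over one payload
    byte; the return value labels the subsequent action selected from the
    aggregation of both match results.
    """
    segment = payload[offset]
    elem_hit = element_matches(segment)    # match result at the element node
    par_hit = parallel_matches(segment)    # match result at the parallel node
    if elem_hit and par_hit:
        return "advance-and-save-context"  # follow element path, defer the other
    if elem_hit:
        return "advance"                   # follow the element node's path
    if par_hit:
        return "consume-and-repeat"        # stay on the variable-count path
    return "pop-or-terminate"              # negative at both: unwind or stop
```

For a lazy pattern fragment such as `[^\n]*?b`, the element predicate would test for `b` and the parallel predicate for "any byte except newline", so both can be evaluated against the same segment in one step.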
2. The security apparatus of claim 1, wherein the at least one finite automaton comprises a Deterministic Finite Automaton (DFA) and at least one non-deterministic finite automaton (NFA), the given finite automaton being a given NFA of the at least one NFA.
3. The security apparatus of claim 1, wherein the match result for the segment is determined at each of the at least two nodes in a same processing cycle of the at least one processor.
4. The security apparatus of claim 1, wherein the at least two nodes include an element node configured to match a single instance of a first element in the payload, the first element being a first character or first character class, and a parallel node that is one of: (i) a variable count node configured to match a variable number of consecutive instances of a second element in the payload, the second element being a second character or second character class, or (ii) a speculative node configured to match the variable number of consecutive instances of the second element in the payload based on a transition arc from or to a split node.
5. The security device of claim 4, wherein the variable count node is a combination of:
the split node, configured to advance the walk to the element node and the speculative node via epsilon transition arcs, independently of and without consuming from the payload, and to walk the element node and the speculative node in parallel with the segment at the given offset; and
the speculative node, configured to advance the walk back to the split node and to consume the segment by updating the given offset, based on a positive match at the speculative node with the second element.
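As an informal model of the split/speculative combination in claims 4 and 5, consider a lazy pattern such as `a[^\n]*?b` after the leading `a` has matched: the split node fans out via epsilon arcs without consuming payload, the element node tests for `b`, and the speculative node consumes a matching byte and walks back to the split node. The function below is a sketch under those assumptions, not the claimed circuit:

```python
# Illustrative model of the split -> (element | speculative) loop. The
# element node is tested first at each offset (lazy semantics); on a
# speculative-node match the byte is consumed and the walk returns to the
# split node. All names here are assumptions for the sketch.

def match_lazy_tail(payload, offset, element, spec_class):
    """Return the offset just past `element` for a lazy `spec_class*?element`
    tail starting at `offset`, or None if the tail does not match."""
    while offset < len(payload):
        # epsilon fan-out: both successor nodes see the same segment
        if payload[offset] == element:      # element node: single instance
            return offset + 1
        if payload[offset] in spec_class:   # speculative node: consume the
            offset += 1                     # segment, walk back to the split
            continue
        return None                         # negative match at both nodes
    return None                             # payload exhausted without a match
```

Note how neither epsilon transition consumes payload; only the speculative node's positive match updates the offset, mirroring the claim language.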
6. The security apparatus of claim 4, wherein the given finite automaton is an NFA graph comprising a transition arc from the variable count node to the element node, the variable count node preceding the element node in the NFA graph.
7. The security device of claim 4, wherein the variable count node is a variable count lazy node associated with metadata that either directly or indirectly identifies the element node, to advance the walk to the element node based on a single matching instance of the variable number of consecutive instances of the second element in the payload.
8. The security apparatus of claim 7, wherein the metadata associated with the variable count lazy node comprises a count for tracking a total number of consecutive instances of the second element that match in the payload to allow comparison of the total number to the variable number.
9. The security apparatus of claim 4, wherein, based on the parallel node being the speculative node, the given finite automaton comprises a split node of the plurality of nodes, the element node and the speculative node being identified based on metadata associated with the split node, the split node configured to:
advance the walk to the element node and the speculative node via epsilon transition arcs, independently of and without consuming from the payload, and walk the element node and the speculative node in parallel with the segment at the given offset based on a speculative processing indicator included in the metadata associated with the split node.
10. The security apparatus of claim 9, wherein the element node and the speculative node are not walked in parallel based on the speculative processing indicator not being included in the metadata associated with the split node.
11. The security device of claim 10, wherein walking the speculative node with the segment at the given offset is based on a negative match for the segment at the given offset at the element node.
12. The security device of claim 11, wherein walking the speculative node with the segment at the given offset is further based on storage and retrieval of an unexplored context that either directly or indirectly identifies the speculative node and the given offset.
13. The security apparatus of claim 12, wherein the storing of the unexplored context comprises:
storing the unexplored context in a stack entry; and
pushing the stack entry onto a stack, and retrieving the unexplored context includes popping the stack entry from the stack.
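The store-and-retrieve mechanism of claims 12 and 13 behaves like a last-in, first-out stack of unexplored contexts, each identifying a deferred node and offset. A minimal sketch, with the class and field names assumed for illustration:

```python
# Minimal model of the unexplored-context stack: a deferred (node, offset)
# pair is pushed when a path is postponed, and the most recently pushed
# entry is popped to resume the walk after a dead end. Names are assumed.
from collections import namedtuple

UnexploredContext = namedtuple("UnexploredContext", ["node_id", "offset"])

class RunStack:
    def __init__(self):
        self._entries = []

    def push(self, node_id, offset):
        """Store an unexplored context in a stack entry and push it."""
        self._entries.append(UnexploredContext(node_id, offset))

    def sense(self):
        """'Sensing' unexplored context: report a non-empty stack state."""
        return bool(self._entries)

    def pop(self):
        """Pop the most recently pushed entry to resume its deferred path."""
        return self._entries.pop()
```

The LIFO order matters: popping the most recently pushed entry resumes the deepest deferred path first, which is what the later "sensing" claims (17, 21, 42, 46) describe.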
14. The security device of claim 13, wherein the speculative node is configured to advance the walk to the split node based on a positive match at the speculative node with the second element.
15. The security device of claim 1, wherein, based on the aggregation including a negative match result at each of the at least two nodes walked in parallel, the at least one subsequent action includes:
not continuing to walk a given path at each of the at least two nodes walked in parallel, the given path partially matching the at least one regular expression pattern in the given finite automaton;
walking a next node of the plurality of nodes with a next segment at a next given offset within the payload based on sensing unexplored context; and
terminating the walk based on not sensing the unexplored context.
16. The security device of claim 15, wherein the unexplored context identifies, either directly or indirectly, the next node and the next given offset to advance the walk with the next segment at the next node along another path that partially matches the at least one regular expression pattern in the given finite automaton.
17. The security apparatus of claim 16, wherein the sensing of the unexplored context comprises:
determining a non-empty state of a stack; and
popping, from the stack, a stack entry that includes the unexplored context and is the entry most recently pushed onto the stack.
18. The security apparatus of claim 1, wherein the at least two nodes include an element node and a parallel node, and wherein, based on the aggregation including:
a positive match result at the element node for the segment; and
the positive match result or a negative match result at the parallel node for the segment, the at least one subsequent action comprises:
updating the given offset to produce a next offset;
identifying a next node in the plurality of nodes based on metadata associated with the element node;
walking the next node with a next segment at the next offset within the payload;
determining a next match result for the next segment at the identified next node; and
determining at least one next subsequent action for walking the given finite automaton based on the next match result determined.
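One step of the walk described in claim 18, where a positive match at the element node consumes the segment and follows the element node's metadata to the next node, can be modeled roughly as follows; the dictionary-based graph encoding and field names are assumptions of the sketch:

```python
# Hedged sketch of a single forward walk step: a positive match at the
# element node updates the offset and follows the node's metadata to the
# next node; a negative match yields no advance. Encoding is illustrative.

def step_on_element_match(graph, node_id, payload, offset):
    """Return (next_node_id, next_offset) after a positive match at element
    node `node_id`, or None on a negative match for the segment."""
    node = graph[node_id]
    if payload[offset] != node["element"]:
        return None                        # negative match: do not advance
    next_offset = offset + 1               # consume the segment (forward walk)
    return node["next"], next_offset       # next node from the node's metadata

# Toy graph for the pattern "ab": n0 matches 'a', n1 matches 'b'.
graph = {"n0": {"element": "a", "next": "n1"},
         "n1": {"element": "b", "next": "accept"}}
```

A reverse walk would decrement the offset instead, as claims 22 and 24 spell out.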
19. The security device of claim 18, wherein, based on the positive match result for the segment at the parallel node, the at least one subsequent action further comprises:
storing an unexplored context in a stack entry; and
pushing the stack entry onto a stack, the unexplored context identifying, either directly or indirectly, the parallel node and the given offset.
20. The security apparatus of claim 19, wherein based on the next match result being the negative match result, the at least one next subsequent action comprises:
walking the parallel node with the segment at the given offset in the payload based on sensing unexplored context; and
terminating the walk based on not sensing the unexplored context.
21. The security apparatus of claim 20, wherein the sensing of the unexplored context comprises:
determining a non-empty state of a stack; and
popping, from the stack, a stack entry that includes the unexplored context and is the entry most recently pushed onto the stack.
22. The security device of claim 18, wherein updating the given offset to generate the next offset comprises:
increasing the given offset based on the direction of the walk being a forward direction; and
decreasing the given offset based on the direction of the walk being a reverse direction.
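The direction-dependent offset update of claim 22 reduces to a one-line rule; the string direction labels are an assumption made for the sketch:

```python
# Claim 22's update rule: increment the offset for a forward walk,
# decrement it for a reverse walk. Direction labels are illustrative.

def next_offset(offset, direction):
    """Produce the next offset from the given offset and walk direction."""
    return offset + 1 if direction == "forward" else offset - 1
```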
23. The security device of claim 1, wherein the at least two nodes walked in parallel include an element node and a parallel node, and wherein, based on the aggregation including a negative match result at the element node for the segment and a positive match result at the parallel node for the segment, the at least one subsequent action includes:
updating the given offset to produce a next offset; and
walking the element node and the parallel node in parallel with a next segment at the next offset.
24. The security device of claim 23, wherein updating the given offset to generate the next offset comprises:
increasing the given offset based on the direction of the walk being a forward direction; and
decreasing the given offset based on the direction of the walk being a reverse direction.
25. The security apparatus of claim 1, wherein walking the at least two nodes in parallel optimizes the performance of the matching by:
refraining from storing and retrieving context that would otherwise be needed, if the at least two nodes were not walked in parallel, to advance the walk with the segment at the given offset from a first node of the at least two nodes to a second node of the at least two nodes based on a negative match result for the segment at the given offset.
26. A method for processing a finite automaton, comprising:
storing, in at least one memory, at least one finite automaton comprising a plurality of nodes generated from at least one regular expression pattern; and
operatively coupling the at least one memory to at least one processor configured to walk the at least one finite automaton with segments of an input stream received via a hardware network interface operatively coupled to a network to match the at least one regular expression pattern in the input stream, the walking comprising:
walking in parallel at least two nodes of a given finite automaton of the at least one finite automaton with a segment at a given offset within a payload of a data packet in the input stream;
determining a match result for the segment at the given offset within the payload at each of the at least two nodes; and
determining at least one subsequent action for walking the given finite automaton based on an aggregation of each match result determined.
27. The method of claim 26, wherein the at least one finite automaton comprises a Deterministic Finite Automaton (DFA) and at least one non-deterministic finite automaton (NFA), the given finite automaton being a given NFA of the at least one NFA.
28. The method of claim 26, wherein the determination at each of the at least two nodes is in a same processing cycle of the at least one processor.
29. The method of claim 26, wherein the at least two nodes include an element node configured to match a single instance of a first element in the payload, the first element being a first character or first character class, and a parallel node that is one of: (i) a variable count node configured to match a variable number of consecutive instances of a second element in the payload, the second element being a second character or second character class, or (ii) a speculative node configured to match the variable number of consecutive instances of the second element in the payload based on a transition arc from or to a split node.
30. The method of claim 29, wherein the variable count node is a combination of:
the split node, configured to advance the walk to the element node and the speculative node via epsilon transition arcs, independently of and without consuming from the payload, and to walk the element node and the speculative node in parallel with the segment at the given offset; and
the speculative node, configured to advance the walk back to the split node and to consume the segment by updating the given offset, based on a positive match at the speculative node with the second element.
31. The method of claim 29, wherein the given finite automaton is an NFA graph that includes a transition arc from the variable count node to the element node, the variable count node preceding the element node in the NFA graph.
32. The method of claim 29, wherein the variable count node is a variable count lazy node associated with metadata that either directly or indirectly identifies the element node to advance the walk to the element node based on a single matching instance of the variable number of consecutive instances of the second element in the payload.
33. The method of claim 32, wherein the metadata associated with the variable count lazy node comprises a count for tracking a total number of consecutive instances of the second element that match in the payload to allow comparison of the total number to the variable number.
34. The method of claim 29, wherein, based on the parallel node being the speculative node, the given finite automaton comprises a split node of the plurality of nodes, the element node and the speculative node being identified based on metadata associated with the split node, the split node configured to:
advance the walk to the element node and the speculative node via epsilon transition arcs, independently of and without consuming from the payload, and walk the element node and the speculative node in parallel with the segment at the given offset based on a speculative processing indicator included in the metadata associated with the split node.
35. The method of claim 34, wherein the element node and the speculative node are not walked in parallel based on the speculative processing indicator not being included in the metadata associated with the split node.
36. The method of claim 35, wherein walking the speculative node with the segment at the given offset is based on a negative match for the segment at the element node at the given offset.
37. The method of claim 36, wherein walking the speculative node with the segment at the given offset is further based on storage and retrieval of an unexplored context that either directly or indirectly identifies the speculative node and the given offset.
38. The method of claim 37, wherein the storing of the unexplored context comprises:
storing the unexplored context in a stack entry; and
pushing the stack entry onto a stack, and retrieving the unexplored context includes popping the stack entry from the stack.
39. The method of claim 38, wherein the speculative node is configured to advance the walk to the split node based on a positive match at the speculative node with the second element.
40. The method of claim 26, wherein, based on the aggregation including a negative match result at each of the at least two nodes walked in parallel, the at least one subsequent action includes:
not continuing to walk a given path at each of the at least two nodes walked in parallel, the given path partially matching the at least one regular expression pattern in the given finite automaton;
walking a next node of the plurality of nodes with a next segment at a next given offset within the payload based on sensing unexplored context; and
terminating the walk based on not sensing the unexplored context.
41. The method of claim 40, wherein the unexplored context identifies, either directly or indirectly, the next node and the next given offset to advance the walk with the next segment at the next node along another path that partially matches the at least one regular expression pattern in the given finite automaton.
42. The method of claim 41, wherein the sensing of the unexplored context comprises:
determining a non-empty state of a stack; and
popping, from the stack, a stack entry that includes the unexplored context and is the entry most recently pushed onto the stack.
43. The method of claim 26, wherein the at least two nodes include an element node and a parallel node, and wherein, based on the aggregation including:
a positive match result at the element node for the segment; and
the positive match result or a negative match result at the parallel node for the segment, the at least one subsequent action comprises:
updating the given offset to produce a next offset;
identifying a next node in the plurality of nodes based on metadata associated with the element node;
walking the next node with a next segment at the next offset within the payload;
determining a next match result for the next segment at the identified next node; and
determining at least one next subsequent action for walking the given finite automaton based on the next match result determined.
44. The method of claim 43, wherein, based on the positive match result for the segment at the parallel node, the at least one subsequent action further comprises:
storing an unexplored context in a stack entry; and
pushing the stack entry onto a stack, the unexplored context identifying, either directly or indirectly, the parallel node and the given offset.
45. The method of claim 44, wherein based on the next match result being the negative match result, the at least one next subsequent action comprises:
walking the parallel node with the segment at the given offset in the payload based on sensing unexplored context; and
terminating the walk based on not sensing the unexplored context.
46. The method of claim 45, wherein the sensing of the unexplored context comprises:
determining a non-empty state of a stack; and
popping, from the stack, a stack entry that includes the unexplored context and is the entry most recently pushed onto the stack.
47. The method of claim 46, wherein updating the given offset to generate the next offset comprises:
increasing the given offset based on the direction of the walk being a forward direction; and
decreasing the given offset based on the direction of the walk being a reverse direction.
48. The method of claim 26, wherein the at least two nodes walked in parallel include an element node and a parallel node, and wherein, based on the aggregation including a negative match result at the element node for the segment and a positive match result at the parallel node for the segment, the at least one subsequent action includes:
updating the given offset to produce a next offset; and
walking the element node and the parallel node in parallel with a next segment at the next offset.
49. The method of claim 48, wherein updating the given offset to generate the next offset comprises:
increasing the given offset based on the direction of the walk being a forward direction; and
decreasing the given offset based on the direction of the walk being a reverse direction.
50. The method of claim 26, wherein walking the at least two nodes in parallel optimizes the performance of the matching by:
refraining from storing and retrieving context that would otherwise be needed, if the at least two nodes were not walked in parallel, to advance the walk with the segment at the given offset from a first node of the at least two nodes to a second node of the at least two nodes based on a negative match result for the segment at the given offset.
51. A non-transitory computer-readable medium having encoded thereon a sequence of instructions that, when executed by at least one processor, cause the at least one processor to:
walk at least one finite automaton with segments of an input stream, the at least one finite automaton comprising a plurality of nodes generated from at least one regular expression pattern, to match the at least one regular expression pattern in the input stream, the walking comprising:
walking in parallel at least two nodes of a given finite automaton of the at least one finite automaton with a segment at a given offset within a payload of a data packet in the input stream;
determining a match result for the segment at the given offset within the payload at each of the at least two nodes; and
determining at least one subsequent action for walking the given finite automaton based on an aggregation of each match result determined.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/143,586 | 2013-12-30 | ||
US14/143,586 US9419943B2 (en) | 2013-12-30 | 2013-12-30 | Method and apparatus for processing of finite automata |
Publications (2)
Publication Number | Publication Date |
---|---|
HK1212119A1 HK1212119A1 (en) | 2016-06-03 |
HK1212119B true HK1212119B (en) | 2019-08-02 |