WO2004079571A2 - Hardware accelerator state table compiler - Google Patents
Hardware accelerator state table compiler Download PDFInfo
- Publication number
- WO2004079571A2 WO2004079571A2 PCT/US2003/031312 US0331312W WO2004079571A2 WO 2004079571 A2 WO2004079571 A2 WO 2004079571A2 US 0331312 W US0331312 W US 0331312W WO 2004079571 A2 WO2004079571 A2 WO 2004079571A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- state
- finite automata
- recited
- grammar
- recursive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Definitions
- the present invention generally relates to processing of applications and documents for controlling the operations of general purpose computers and, more particularly, to performing parsing operations on applications programs, documents and/or other logical sequences of symbols in a given but arbitrary language or format .
- certain character strings correspond to certain commands or identifications, including special characters and other important data (collectively referred to as control words) which allow data or operations to, in effect, identify themselves so that they may be thereafter treated as "objects" such that associated data and commands can be translated into the appropriate formats and commands of different applications in different languages in order to engender a degree of compatibility of respective connected platforms sufficient to support the desired processing at a given machine.
- the detection of these character strings is performed by an operation known as parsing, similar to the more conventional usage of resolving the syntax of an expression, such as a sentence, into its component parts and describing them grammatically.
- control words will be limited to a finite but possibly large number and thus allowable sequences of symbols will be similarly limited as an incident of the content and grammar of the language .
- parsing of a document to identify its contents has proven to be an important tool for providing security in processors and networks through detection of control words which may represent an attack, unauthorized access or other possible breach of security.
- the conventional approach to parsing a document is to implement a table-based finite state machine (FSM) in software to search for these strings of interest.
- the state table resides in memory and is designed to search for the specific patterns of interest in the document.
- the current state is used as the base address into the state table and the ASCII representation of the input character is an index into the table. For example, assume the state machine is in state 0 (zero) and the first input character is ASCII value 02, the absolute address for the state entry would be the sum/concatenation of the base address (state 0) and the index/ASCII character (02) .
- the FSM begins with the CPU fetching the first character of the input document from memory.
- the CPU then constructs the absolute address into the state table in memory corresponding to the initialized/current state and the input character and then fetches the state data from the state table. Based on the state data that is returned, the CPU updates the current state to the new value, if different (indicating that the character corresponds to the first character of a string of interest) and performs any other action indicated in the state data (e.g. issuing a token or an interrupt if the single character is a special character or if the current character is found, upon a further repetition of the foregoing, to be the last character of a string of interest) .
- any other action indicated in the state data e.g. issuing a token or an interrupt if the single character is a special character or if the current character is found, upon a further repetition of the foregoing, to be the last character of a string of interest.
- the above process is repeated and the state is changed as successive characters of a string of interest are found. That is, if the initial character is of interest as being the initial character of a string of interest, the state of the FSM can be advanced to a new state (e.g. from initial state 0 to state 1) . If the character is not of interest, the state machine would (generally) remain the same by specifying the same state (e.g. state 0) or not commanding a state update) in the state table entry that is returned from the state table address. Possible actions include, but are not limited to, setting interrupts, storing tokens and updating pointers. The process is then repeated with the following character.
- the invention provides a methodology and a compiler for performing the method and loader, preferably implemented in software within an arrangement such as a hardware parser accelerator, which can read a language specification or specification summarizing desired performable functions to produce an output which can be loaded into a memory accessible by a device, such as a parsing accelerator, including a finite state machine (FSM) in order to customize the personality of the FSM and, in turn, the device including the
- a hardware parser accelerator which can read a language specification or specification summarizing desired performable functions to produce an output which can be loaded into a memory accessible by a device, such as a parsing accelerator, including a finite state machine (FSM) in order to customize the personality of the FSM and, in turn, the device including the
- FSM finite state machine
- the language or other specification is preferably written in a formal notation such as the Backus-Naur Form (BNF) or its derivatives or other regular expressions.
- BNF Backus-Naur Form
- the compiler in accordance with the invention generates the corresponding state transitions to form a state transition specification comprising one or more state tables.
- FIG. 1 is a high level schematic block diagram of the invention
- Figure 2A is a diagram representing a state table useful in understanding the invention
- Figure 2B is a high level flow chart showing the basic operation of a generalized form of the invention
- Figure 3 is a high level flow chart showing the operation of a preferred embodiment of the invention.
- FIG. 4 is a high level context diagram of the preferred embodiment of the invention.
- Figures 5A, 5B, 5C, 5D, 5E, 5F, 5G, 5H and 51 illustrate grouping and recognizing sub-expressions in grammar rule definitions
- Figure 6 comprising Figures 6A and 6B, illustrates an example of an output state table specification file represented completely in a self- describing data format.
- FIG. 1 a high level schematic block diagram of a basic form of the personality compiler in accordance with the invention and connected to provide state tables to a finite state machine (FSM) in a device, preferably a hardware parsing accelerator.
- FSM finite state machine
- the personality compiler 100 can be implemented as a stand-alone device which can be connected to a memory 105 (e.g.
- Figure 2A illustrates a portion of an exemplary state table as disclosed therein.
- an XMLTM document is used herein as an example of one type of logical data sequence which can be processed using an accelerator in accordance with the invention.
- Other logical data sequences can also be constructed from network data packet contents such as user terminal command strings intended for execution by shared server computers. (Such command strings are frequently generated by malicious users and sent to shared server computers as part of a longer term intrusion attempt.)
- the accelerator in accordance with the invention is suitable for processing many such logical data sequences. It will also be helpful observe that many entries in the portion of the state table illustrated in Figure 1 are duplicative .
- the hexadecimal representation of a symbol be used as an index into the state table and the vertical columns thereof are accordingly labelled "00" to "FF" .
- the rows are numbered to reflect the various states which the FSM can assume.
- the rows of the base address are thus divided into a number of columns corresponding to the number of codes which may be used to represent characters in the document to be parsed; in this example, two hundred fifty-six (256) columns corresponding to a basic eight bit hexadecimal byte for a character. As many characters as may be required, printable or non- printable, may be accommodated in this fashion.
- an entry of "stay in state n" provides for the state to be maintained through potentially long runs of one or more characters such as might be encountered, for example, in numerical arguments of commands, as is commonly encountered.
- the invention provides special handling of this type of character string to provide enhanced acceleration, as will be discussed in detail below.
- an entry of "go to state 0" signifies detection of a character which distinguishes the string from any string of interest, regardless of how many matching characters have previously been detected and returns the parsing process to the initial/default state to begin searching for another string of interest . (For this reason, the "go to state 0" entry will generally be, by far, the most frequent or numerous entry in the state table.) Returning to state 0 may require the parsing operation to return to a character in the document subsequent to the character which began the string being followed at the time the distinguishing character was detected.
- An entry including a command with "go to state 0 indicates completion of detection of a complete string of interest.
- the command will be to store a token (with an address and length of the token) which thereafter allows the string to be treated as an object.
- a command with "go to state n" provides for launching of an operation at an intermediate point while continuing to follow a string which could potentially match a string of interest.
- the parsing operation begins with the system in a given default/initial state, depicted in Figure 2A as state 0, and then progresses to higher numbered states as matching characters of a character string of interest are found upon repetitions of the process.
- state 0 When a string of interest has been completely identified or when a special operation is specified at an intermediate location in a string which is potentially a match, the operation such as storing a token or issuing an interrupt is performed.
- the character must be fetched from CPU memory, the state table entry must be fetched (again from CPU memory) and various pointers (e.g. to a character of the document and base address in the state table) and registers (e.g.
- the hardware parser accelerators disclosed in the above-incorporated applications accelerate the parsing process by providing for many of these operations to be performed in parallel while subsequent symbols of a document are being evaluated by the finite state machine therein.
- the basic function of a parser is to uniquely recognize an input character (e.g. symbol or binary signal sequence) string of interest and issue a unique token and other information upon such recognition. Recognition of nested strings of interest must also be detected and validated in some cases and for some purposes. Therefore, it is important to recognize that all character strings which can result in the issuance of a token are incidents of the language of the document being parsed as defined by control words and the characteristic syntax of that language.
- incidents of the language which are represented by control words and/or their arrangement in a sequence may also be regarded as tokens in regard to the language specification. It follows that the language specification contains sufficient information to define all character strings of interest that can result in the issuance of tokens by the parser for a given language or set of character strings of interest and is thus sufficient for generation of a state table to recognize them.
- a "next token" is called, as shown at 210. It is assumed that some order will exist in the language specification if only in the serial order of the data in which it is expressed. The actual order, to the extent an order exists, may be arbitrary and, in any event, does not affect the usability of the state transition specifications which will be developed since the parser is arranged to recognize strings of interest in any order.
- the order of tokens may affect the assigned state numbers but those state number are of no practical consequence. That is, any string of interest will cause advancement through a sequence of states of the state table to arrive at a terminal state at which a string of interest will have been uniquely identified but the numbers of the states and their sequence have no effect on the result .
- the calling of a "next token” thus functions to provide a mechanism to cause the consideration of the entire language specification by looping over the entire process until all tokens have been considered.
- this operation is carried out by reading the grammar input file 215, identifying the grammar entities such as control words and syntax requirements for characters/symbols (e.g. branching statements, characters delimiting fields, and the like) and tokenizing them by assigning unique tokens to each identified entity.
- Particular matching rules or criteria e.g. specifying numbers of arbitrary characters
- These functions are collectively indicated at 220 of Figure 2B.
- transition diagrams or finite automata (by which terminology such transition diagrams may be referenced hereinafter), as indicated at 230, for some grammar entities such as control words representing commands provided in the language while other grammar entities such as branching statements and delimiter symbols which are recursive will require additional processing and transformation to obtain character strings which can be expressed in a state table .
- remaining grammar rules that have not been transformed into character strings are tested to determine if they are recursive or express other properties such as exclusion. If needed, in accordance with this test, the grammar rules are simplified to be expressed as a character string or expanded into expanded grammar rules at 245.
- a nested subprocess at 246 that duplicates the steps as indicated by the loop 249 is performed to generate a new set of finite automata for the recursive symbol.
- This recursive symbol becomes the starting state for the new set of finite automata, and any additional recursive symbols encountered within the nested subprocess will be treated as if they were literal symbols.
- Literal symbols are symbols that can be used directly as an input for a state transition.
- the new set of finite automata generated for the recursive symbol is saved away in memory for processing later, and the recursive symbol is marked as a literal symbol in the grammar rule so that it breaks up the recursion when the processing is returned to step 230.
- the process is then repeated by looping to 210, as indicated by the loop 249 alluded to above, until all grammar entities have been considered and processed to form a complete sequence of finite automata, or state transition diagrams.
- a state transition diagram is made up of nodes for states and label edges for transitions.
- the label edges identify two pieces of information: input (e.g. condition for transition) and next state. If the same input (e.g. a character) can cause multiple transitions, to different states, the finite automaton is known as non-deterministic.
- the transformation processing at 230 produces both non- deterministic finite automata (NFA) and deterministic finite automata (DFA) .
- NFA is not suitable for building state tables for an FSM of a hardware accelerator.
- a check is performed at 260 to pick out the NFA.
- the NFA are then transformed into DFA at 265 by collapsing states that have certain properties into a closure set .
- the finite automata created previously for the recursive symbol are gathered together at 296 so that the same process to transform the finite automata into a state table can be performed again with the steps starting from 260.
- the loop at 292 repeats until all recursive symbols are transformed into state tables.
- the grammar file is read and the grammar entities are identified and tokenized as illustrated at 310.
- the tokenized grammar rules are then stored in a production table, as illustrated at 320.
- the grammar rule operations are then transformed into character strings (CharSet) insofar as is possible, as illustrated at 330.
- the grammar file is preferably expressed in a formal notation such as Backus-Naur Form (BNF) or a derivative thereof such as Extended Backus-Naur form (EBNF) .
- BNF Backus-Naur Form
- EBNF Extended Backus-Naur form
- XMLTM is documented in this form by the World Wide Web
- a language is made up of symbols with a set of rules (grammar) that govern how they can be correctly combined together.
- a language starts with a start symbol, and the symbol is defined with the right hand side expression as shown in the above notation using additional symbols, descriptors, attributes, and operators. New symbols are defined in the subsequent rules until all symbols for the language are defined.
- #xN where N is a hexadecimal integer, the expression matches the character in ISO/IEC 10646 whose canonical (UCS-4) code value, when interpreted as an unsigned binary number, has the value indicated.
- the number of leading zeros in the #xN form is insignificant; the number of leading zeros in the corresponding code value is governed by the character encoding in use and is not significant.
- [a-zA-Z] , [#xN-#xN] matches any character with a value in the inclusive range (s) indicated.
- [abc] , [#xN#xN#xN] matches any character with a value among the characters enumerated. Enumerations and ranges can be mixed in one set of brackets.
- [ a-z] , [ A #xN-#xN] matches any character with a value not among the characters given. Enumerations and ranges of forbidden values can be mixed in one set of brackets.
- a B matches A followed by B. This operator has higher precedence than alternation; thus A B
- a I B matches A or B but not both; also known as alternation.
- a - B matches any string that matches A but does not match B; (A excludes B) .
- A+ matches one or more occurrences of A. Concatenation has higher precedence than alternation; thus A+
- A* matches zero or more occurrences of A. Concatenation has higher precedence than alternation; thus A*
- Name : : (Letter] '_'
- a 'Namechar' is either an alphabetic character, a numeric character, a period, a dash, an underscore or a colon. It will be appreciated that some of the foregoing notations specify exclusion operations (e.g. A - B) .
- each of the previously identified recursive symbols is used as a starting symbol for a new expansion that will end up with a complete continuous grammar rule for the recursive symbol. It enables a new set of finite automata to be generated specifically for each of the recursive symbols. A set of states associated with these recursive symbols will be generated later in the process based on the finite automata created at this step.
- the loader populates the state table (s) within the hardware accelerator FSM according to the state information produced by the Hardware Accelerator Personality Compiler (HAPC) .
- the HAPC In addition to state identifications and state transitions, the HAPC also identifies all recursive symbols to the loader as shown in Figure 6.
- the loader processes a state transition involving a recursive symbol, it recognizes the recursive symbol. Instead of having the FSM to go to the next state immediately, the loader loads commands into the FSM as actions for this particular transition to push the next state information on to the stack within the hardware accelerator and branch to the starting state of the grammar rule for the recursive symbol.
- the loader For each of the terminal states in the grammar for the recursive symbol, the loader loads commands as actions for the terminal states in the FSM to pop the state information off the stack and go to the next state that is popped off from the stack.
- the loader performs the same operations as just described.
- the stack within the hardware accelerator enables the handling of these nested state transitions as a result of having recursive definition in the grammar rule.
- Non-deterministic finite automata are then generated from the expanded grammar rules (350) and transformed into deterministic finite automata
- DFA DFA
- the DFA can then be optimized (360) and the optimized DFA transformed into state table entries (370) which are then stored as discussed above.
- object oriented programming As is well- understood in the art, objects are essentially modules of a larger program which encapsulate and hide the details of their operations (which are irrelevant to the function of the overall function of the program and the interaction of the objects themselves) while the objects are able to call other objects, as needed, to carry out the program.
- the objects also can be arranged into classes which have relationships forming a context which is illustrated in Figure 4. In the following descriptions of classes of software objects and the objects therein, the descriptions of the objects and their functions which are provided are sufficient to the successful practice of the invention and further details thereof which are encapsulated by the objects are not important to the successful practice of the invention.
- the hardware accelerator personality compiler in accordance with the invention comprises a main HAPC class and twelve additional classes: 1. InputMgr
- the HAPC class contains the main program, and methods to direct the execution from reading the input, doing the compilation processing, and writing the output.
- the InputMgr class object is responsible to tokenize the input from a grammar rule specification file.
- the Token class object defines the supported token categories and provides support to access, set, and update tokens.
- the RuleMgr class object organizes the tokenized grammar production rules in a hash table allowing the software to have quick access to the grammar rules.
- the CharSet class object provides special support for character set entities in a grammar rule.
- the ExpandedRule class object provides a facility to refine grammar rules into a continuous rule for a language starting from a specific token.
- the RecursiveSymbolMgr class object provides a repository to identify symbols that are used recursively in the grammar rule definitions.
- the RSEntry class object defines recursive symbol repository entry format.
- the NFAMgr class object provides support to create a non-deterministic finite automata from a grammar rule.
- the StateMgr class object manages a repository that contains state transition information which is used for the creation of the state table (s) .
- the StateEntry class object defines the format used for entries in the state repository.
- the TransitionEntry class object provides a facility to store the state transition information.
- the DFAMgr class object provides support to convert a non-deterministic finite automata into a deterministic finite automata that is suitable for state table generation.
- the Hardware Accelerator Personality Compiler (HAPC) class contains the main program to start off the whole compilation process. In addtion to the main method, the class contains the following methods : genStates writeStateTransitions timestampToString
- the genStates method is the main driver of the compilation process. It creates and interfaces with other class objects to read the input grammar specification, process the grammar specification information into finite states, and write the state transition information out to a file.
- the writeStateTransition method creates an output stream for the state transition specification produced by the HAPC, and write out the information to the output file.
- the timestampToString method is a utility method supporting the writeStateTransition method to format the timestamp information into a printable string.
- the Hardware Accelerator Personality Compiler Input Manager is responsible for reading the input file that contains rules for a language grammar and encoding the input rule data as tokens. Information in the input file is broken up into tokens so that they are readily identifiable by their category.
- the InputMgr class supports the following constructor and methods: InputMgr next_token startNewSection next line parseCharLiteral
- the InputMgr constructor sets up the Java Buffer Reader to read in the Input Grammar Rule file.
- the Input Grammar Rule file consists of three sections: User Directives, Production Rule, and
- the User Directives section appears first at the beginning of the file. All user directive keywords are prefixed with the "%" character. Currently, the only supported user directive is %StartSymbol which has one argument . The argument specifies the starting symbol for the language that is defined in the Production Rule section. Comments which are enclosed within the symbol set: /* and */ can appear anywhere inside the input file.
- the Production Rule section contains the grammar rules for the language to be processed. Currently, it is assumed that the grammar rules are represented in the EBNF format. All left hand side symbols of the production rules must start in column 1. A production rule may span over a number of lines.
- the Production Rule Overrides section is the last and optional section. It allows the user to re- specify some of the production rules that appeared earlier in the Production Rule section. This allows the user to specify all grammar rules as they were defined by the creator of a language without any changes in the Production Rule section. If certain rules have notations that cannot be processed automatically by this software, the user can re- specify those rules using only notations supported by this software in the Production Rule Overrides section.
- the Hardware Accelerator Personality Compiler software can start extracting the entire input grammar production rules from the input file one token at a time by invoking the next_token method repeatedly.
- Each token is initially formed by recognizing the delimiter characters in the input character stream created from the input file. The token is then classified into different token categories. These token categories are described in further detail in the Token section.
- the InputMgr handles formatting information transparently and skips all comments in the input file. Character literals which are specified as numeric values in the input file are converted into character values internally via the parseCharLiteral method before it is being tokenized.
- the startNewSection is a simple method allowing a caller to reset the InputMgr from the "end of the rule section" state and thus allowing the software to read in additional production rules to override some of the previous grammar rule specifications.
- the constructor, the startNewSection and the next_token methods are the primary external interfaces into the InputMgr class object.
- Other private methods implemented in the InputMgr class are: next_line, and parserCharLiteral .
- the private method, next_line gets a line of characters from the input file and returns a trimmed version of the input line to the caller. It keeps a line count for the input file, and it trims off the blank spaces at the beginning and at the end of an input line.
- the other private method is parseCharLiteral . It converts a character literal represented as a hexadecimal number into an internal ASCII character. This allows the non-printable characters to be processed within the software in the same way as the printable characters. Token
- the Token class provides a facility to create and maintain tokens . By breaking the input character stream into tokens, the software can easily classify each logical character sequence within the input file and process the information accordingly.
- EAF End Of File
- Tokens belonging to the Symbol category include: StrProd (Start Production), Symbol (regular grammar symbol) , RecursiveSymbol, Literal, Set, and CharSet.
- StrProd Start Production
- Symbol regular grammar symbol
- RecursiveSymbol Literal
- Set Set
- CharSet CharSet
- RecursiveSymbol is a token that is reclassified from a general Symbol token after the software determines that the symbol has been used recursively in the grammar rules.
- Single characters, numeric representation of characters, and character strings are marked as literals when they are tokenized.
- Numeric representation of characters are converted into regular ASCII characters before they are tokenized. By doing it this way, all characters are handled the same way.
- Input string that are enclosed within the square brackets are assigned to the Set token.
- the Set token may have a set of discrete characters, or a range of characters. When the values within a set are processed into a bit set that marks each individual character belonging to the set, the Set token is converted into a CharSet. Characters that are associated together using the "OR" operators in a grammar rule are also grouped into a CharSet.
- OpOr is the "or” operator which is denoted by the "
- OpExclude is the "exclude” operator which is denoted by the "-” symbol in the EBNF notation.
- Attribute tokens are used to describe the allowable occurrence frequency for a symbol in a particular rule for a language.
- the tokens in this category include: AttZeroOrOne; AttZeroOrMany; and AttOneOrMany.
- AttZeroOrOne is denoted by the "?” character in EBNF and it is used to indicate that the symbol that appears immediately before this token is an optional symbol. That optional symbol can appear zero or exactly one time in this particular context within the language.
- AttZeroOrMany is denoted by the "*" character in EBNF and it is used to indicate that the symbol that appears immediately before this token can occur zero or many times in the current context. While
- AttOneOrMany similarly allows the previous tokenized symbol to appear one or many times and the attribute is denoted by the "+" character in EBNF.
- the Group category have two tokens defined: LParen and RParen.
- LParen signals the beginning of a group, while RParen indicates the end of a group.
- a group is defined by the expression enclosed by the left parenthesis and the right parenthesis. The entire expression within a group is treated as a unit. Groups may be embedded within another group.
- the Misc category contains meta tokens. These tokens include BlockStart; BlockEnd; and RecExp. These tokens are inserted into the grammar rules stored in the internal production table primarily for debug purpose. As part of the state transition generation process, the grammar rules are expanded inline starting from the "language starting symbol” until all symbols becomes terminal symbols or recursive symbols. Recursive symbols are not expanded inline, of course, since recursive expansion would result in an infinite loop, as discussed above. To aid with debugging, the BlockStart and BlockEnd tokens are inserted into the resulting rule during the inline expansion to identify the beginning and the end of a rule segment within the expanded rule . The tokens contain the left hand side symbol name from the original input production rule to help with the identification. RecExp indicates a recursive expression.
- the Unknown token category is a place holder category for the software to hold an unknown token temporarily while it is being resolved, or before it is reported to the users as an error.
- the Token class provides the constructors and the following methods :
- the Token constructors and the setToken method allows the caller to construct a token from scratch.
- the caller may use the getCategory, equals, and the various isCategoryXXXX methods to perform inquiries on a token.
- the print methods will print all information related to a token to the screen.
- the RuleMgr class provides a facility to create and maintain the grammar production rules in a hash table known as the ruleTable.
- the right hand side expression of a grammar production rule is stored as a vector of tokens. The vector is saved into the hash table using the left hand side symbol of the production rule as the hash key.
- the RuleMgr constructor provides a common mechanism to initialize the RuleMgr class.
- Other methods are provided by the RuleMgr class to help to construct the ruleTable, to make queries on the ruleTable, to perform conversions, and to support debugging. These methods are: parseEBNFRules checkRule componentLength extractCharSet replaceGroupsWithCharSets convertCharSetEntities findExclusion findAlternation groupRightAltParam groupLeftAltParam groupAltParams printRule replaceRule parseEBNFRules is an import method provided by the RuleMgr class. parseEBNFRules allows a caller to extract the grammar rule specification from an input grammar file. The method uses the passed in InputMgr to read the grammar file. It then reconstructs each of the production rules as a vector of tokens. The rules are saved into the ruleTable, and each rule is keyed by its left hand side symbol .
- checkRule allows a caller to determine if a rule has already been defined in the ruleTable. This eliminates the need for the caller to access the hash table that implements the ruleTable directly.
- the method Given a symbol name for a grammar rule, the method, componentLength, returns the number of tokens required to define the grammar rule .
- a typical use of this method is to determine if the rule has only a single component (for example: a set) in the grammar rule expression.
- the method extractCharSet , checks a segment of the token vector for a grammar production rule as specified by a pair of indices as the input, and determines if the expression subset can be resolved into a CharSet. The method will return the CharSet to the caller if the expression subset can be transformed into a CharSet . This method supports the convertCharSetEntities method.
- the method replaceGroupsWithCharSets, goes through the passed in vector containing a sequence of tokens and replace all suitable expression subsets with CharSets. This method supports the convertCharSetEntities method.
- the method convertCharSetEntities, goes through the entire ruleTable and transforms all sets and eligible expression subsets into CharSets.
- the method, findExclusion goes through the entire ruleTable and finds all grammar production rules that contain the "exclude” operator. At completion, the method returns those grammar rules in a vector.
- the method, findAlternation goes through the entire ruleTable and finds all grammar production rules that contain the "OR” operator. At completion, the method returns those grammar rules in a vector.
- the method, groupRightAltParam adds a pair of parentheses around the sub-expression on the right hand side of the "OR" operator in a grammar rule if the sub-expression is not already grouped with parentheses .
- the method groupLeftAltParam, adds a pair of parentheses around the sub-expression on the left hand side of the "OR" operator in a grammar rule if the sub-expression is not already grouped with parentheses.
- the method, groupAltParam adds a pair of parentheses around the two sub-expressions on the each side of the "OR" operator in a grammar rule if the sub-expression is not already grouped with parentheses.
- the method, printRule provides debug support by printing the grammar rule that is named by the input left hand side symbol as a sequence of tokens to the screen.
- the method, replaceRule replaces the vector of tokens for a grammar rule as named by the input symbol .
- the primary purpose of the ExpandedRule class is to provide a facility to expand the grammar rule starting from a starting symbol, and continuously expand all production rules inline until all rule symbols have been refined into CharSets, character string literals, or recursive symbols.
- CharSet and character string literals are terminal symbols which cannot be further refined.
- Recursive symbols require a stack to perform its state transition due to its nature of recursively entering the same state.
- a separate special process will be implemented to handle recursive symbols. For the purpose of rule expansion though, they are being treated as if they are terminal symbols.
- Two constructors are provided to expand the grammar production rules contained in the passed in RuleMgr object.
- the RuleMgr is an input argument to the constructors .
- One other input argument required by the constructors is the "language starting symbol" . This gives the constructor a starting point to expand the rules.
- One of the two constructors also requires a Boolean flag argument to indicate if it is desirable to compress the resulting expanded production rule. The compression is carried out by avoiding the generation of tokens, especially Misc Tokens, that are generated primarily for debug purpose, and by aggressively transforming rule segments into CharSets.
- These constructors are the primary interfaces required by the callers to expand a grammar rule.
- the constructors will invoke the internal private methods to expand the production rules inline resulting in a single grammar rule that covers the entire language. In the process of expanding the rules, these methods will also identify recursive symbols. These recursive symbols are treated in the expansion effort as if they are terminal symbols. The recursive symbols are also saved away by the constructors into a table maintained by the RecursiveSymbolMgr for processing later. After the top level production rule has been expanded, the caller may invoke the "expandAllRS" method to expand all recursive symbols that were identified and saved away by the constructors.
- the expandAllRS and performSimpleExclude methods are the only other external interface in the ExpandedRule class.
- the expandAllRS method gets a list of all recursive symbols from the RecursiveSymbolMgr class, and expands each recursive symbol one at a time. Similar to the top level expansion, any recursive symbols encountered during the expansion process will be treated as terminal symbols. These recursive symbols will cause special action code to be generated during the state transition table creation so that it can request a stack to support recursion.
- the performSimpleExclude method goes through the expanded grammar rule to locate the "exclusion (-)" operators. For each one it encounters, if the operands of the exclusion operation are determined to be a CharSet with a character literal, or two CharSets, the method will perform the exclusion operation immediately, and replace the operation expression in the grammar rule with the resulting
- the init method helps the constructors to initialize the class variables and to kick off the grammar rule inline expansion processing.
- the isOnTheStack method provides internal support for the constructors to determine if a grammar symbol is a recursive symbol .
- the software keeps track of the grammar symbols along the expansion chain by pushing each symbol being expanded onto the stack. Once the symbol is fully expanded, it is popped off the stack. Before expanding a symbol, the code checks if the symbol is already on the stack. If that is the case, the symbol is identified as a recursive symbol.
- the expand method is a recursive method that performs inline expansion of grammar rules by obtaining the right hand side expression of each non terminal symbol it encountered and replacing the symbol with the expression. It begins with a starting symbol, and it continues the substitution with each symbol in the expanded rule until all symbols become terminal symbols or recursive symbols.
- a stack is used to identify all recursive symbols as described above in the isOnTheStack method.
- the expandRS method is very similar to the expand method described above. It supports the expandAllRS method to expand the grammar rules specifically for recursive symbols.
- the expansion is done like the expand method by means of copying the vector of tokens that represent the production rule named by a non terminal symbol out of the ruleMgr, and replace that symbol in the rule being expanded with the vector of tokens. The process is repeated continuously until all symbols in the expanded rules are terminal symbols or recursive symbols. If a recursive symbol, including the symbol of the recursive rule that is being expanded itself, is encountered during the expansion, it is treated as if it is a terminal symbol.
- CharSet is a class that supports a set facility for storing the set of valid characters used in an expression in a grammar production rule or derived from a sub-expression in the grammar rule. Character sets initially specified in a production rule in EBNF are enclosed within a pair of square brackets. The contents within the square brackets may be expressed in a number of ways:
- CharSet class will handle all these different ways of specifying a set of valid characters and convert them into a CharSet object transparently for the caller. Additional methods are available from the class allowing the caller to maintain a CharSet object.
- CharSet constructors There are two CharSet constructors available.
- a parameter-less constructor allows the caller to set up a CharSet object with contents to be added at a later time.
- the other constructor allows the caller to set up a CharSet and initialize its contents by specifying a string that is formatted with information as described above.
- the methods defined in the CharSet class are: add remove isln isEqual print charCount iterator
- Each add method allows the caller to add more characters into a CharSet object.
- the first variant allows the caller to specify a number of characters using a string format as described above .
- the second add method allows a caller to add a character to the CharSet object.
- the third variation allows a caller to copy the contents of another CharSet object into the current object.
- the first version allows a caller to remove a character from the current CharSet object.
- the second version accepts a CharSet object as an input parameter. It removes all characters that are found in the input CharSet from the current CharSet object.
- the isln method allows a caller to find out if a particular character is currently in the CharSet object .
- the isEqual method compares another CharSet object with the current object to determine if they have the same contents.
- the print method is provided for debug purpose.
- the charCount method returns the number of characters currently in the CharSet.
- the iterator method returns an iterator object to the caller allowing the caller to access each of the characters inside the CharSet one at a time.
- the CharSet class also contains an inner class, CharSetIterator.
- CharSetlterator is an implementation of the Iterator interface .
- the RecursiveSymbolMgr maintains a hash table allowing the caller to set up a table to contain production rules that are recursive in nature.
- the recursive symbol table is used by the InputMgr, the ExpandedRule, and the NFAMgr classes.
- the class creates a Java hash table with the constructor. Since the table is implemented using a Java hash table, access to and maintenance of the recursive symbol table are performed using the hash table methods .
- the class does not define any additional methods .
- the RSEntry class defines the structure of the entries for the Recursive Symbol Table that is implemented as a hash table in the RecursiveSymbolMgr class.
- the purpose of the class is to define the data structure. As such, only a constructor is provided to initialize the class variables. All fields in the data structure are directly accessible using their native methods.
- NFAMgr The NFAMgr class provides supports to transform an expanded grammar production rule into a non- deterministic finite automata (NFA) .
- the NFAMgr class encapsulates a StateMgr class that is used for storing the state transition information generated from the expanded input grammar rule.
- the StateMgr is instantiated by the NFAMgr constructor.
- the NFAMgr class also defines the following methods: genStates genNFA findLoopbackState checkAttributeNext eliminateDoubleEpsilons optimizeEpsilonTransitions
- the genStates method allows the caller to start the processing to transform an expanded grammar rule into a non-deterministic finite automata.
- the input expanded grammar rule is passed in as a vector of tokens.
- the method then calls the recursive genNFA method to decompose the expanded grammar rules into manageable segments and converts these segments into state transitions.
- the genNFA method process a segment of the input expanded grammar rule at a time in a recursive fashion until the entire grammar rule is transformed into a complete non-deterministic finite automata.
- the processing is done by grouping and recognizing the common sub-expressions used in the grammar rule definition as illustrated in Figures 5A -51.
- Figures 5A - 51 illustrate several commonly occurring language patterns described as non- deterministic finite automata (NFA) which are defined above by labels contained in the respective Figures.
- NFA non- deterministic finite automata
- Figure 5A For Example, the pattern “a*”, representing zero or more occurrences of "a”, is illustated in Figure 5A; the pattern “a?”, representing zero or one occurrences of"a” is illustrated in Figure 5B, etc.
- This notation and logical processing of a corresponding pattern is a well-known technique used in compilers to concisely represent these patterns.
- the transformation is preferably not done in the most optimized fashion 'at this point in order to come up with common state transition patterns to make it easy to group and combine the outcome from the grammar rule sub-expressions. Redundant states will be eliminated and common states will be combined once a complete NFA state transition sequence is created.
- the findLoopbackState method supports the attribute (i.e., *+?) transformation processing in the checkAttributeNext method to determine the starting state for the current grammar subexpression group so that one or more transition arcs can be added correctly for each of the attributes.
- the checkAttributeNext method checks to find out if an attribute is defined for a grammar rule sub-expression that has just been transformed into a
- the eliminateDoubleEpsilons method optimizes the NFA transition sequence to remove redundant state transitions.
- the StateMgr class supports the creation and maintenance of a state transition table. It provide supports to both the NFAMgr class and the DFAMgr class.
- the class constructor initializes class variables and allocate storage for the state transition table. Additionally, the constructor creates a hash table that maps the NFA states (old states) to the DFA state (new state) to support the DFA transformation.
- the assignNewState method reserves a state table entry and returns the corresponding state number to be used for a new transition state.
- the recycleState method allows a caller to release a state table entry back to the pool for reallocation.
- the addStateTransition method creates a transition arc from the current state to the next state based on the input transition information. It also creates a reverse link from the next state back to the current state transparently for the caller.
- the removeStateTransition method removes a transition arc between two states. It removes both the forward and the reverse links for the same transition between the two states.
- the getAHOutTransitions method returns a list of all outbound transitions related to the specified state to the caller.
- the getAllInTransitions method returns a list of all inbound transitions related to the specified state to the caller.
- the getEpsilonOutTransitions method returns to the caller a list of outbound epsilon transitions, transitions that are caused by an "empty" input, related to the specified state.
- the getEpsilonlnTransitions method returns to the caller a list of inbound epsilon transitions related to the specified state.
- the getEpsilonArcs method returns a list of transitions that are related to an epsilon input taken out from the passed in list of transitions. ⁇ This method exists primarily to support the getEpsilonOutTransitions and the getEpsilonlnTransitions methods.
- the getNonEpsilonOutTransitions method returns to the caller a list of all outbound transitions that exclude the epsilon transitions related to the specified state.
- the getNonEpsilonlnTransitions method returns to the caller a list of all inbound transitions that exclude the epsilon transitions related to the specified state.
- the getNonEpsilonArcs method returns a list of transitions that are not related to an epsilon input taken out from the passed in list of transitions. This method exists primarily to support the getNonEpsilonOutTransitions and the getNonEpsilonlnTransitions methods.
- the allocateEntry method allocates a state table entry off from the locally controlled vector of state table entries .
- the recycleEntry method puts a state table entry to the list of state table entries that are to be reused.
- the updateEntry method copies the state entry information into the appropriate location in the state table vector maintained internally within the StateMgr class object.
- the getEntry method retrieves the information related to a state from the internal state table vector.
- the locateState method provide supports to the DFA transformation. It will find a matching DFA state, if existed, that was created for a set of NFA states matching the input parameter.
- the printStatistics method provides debug support. It prints out to the screen the usage information related to the internally controlled state table.
- the printStateWithExt method provides debug support. It prints all information related to a state with additional information that was maintained to support DFA transformation.
- the printState method provides debug support. It prints all information related to a state.
- the listStatesWithNFAStateSet returns a list of DFA states that include the specified NFA state set.
- the listStatesWithClosureStateSet returns a list of states that are part of the epsilon closure.
- the peekNextNewStateNum returns the state number to be assigned to the next new state.
- the writeXMLOutput method supports writing the state table out to an output file stream in the XML format .
- the StateEntry class defines the content of a state table entry.
- a state entry contains three major fields: state number, a list of outgoing transition arcs, and a list of incoming transition arcs.
- the class constructor initializes the fields and creates the vectors for the outgoing and incoming arcs.
- the support the creation and maintenance of state table entries also defines the following methods: addToArc addFromArc removeToArc removeFromArc doesTransitionExist removeArc compareNFAStates printToArcs printFromArcs printArc printExtension isInNFAStateSet isInClosureStateSet writeXMLOutput
- the addToArc method adds an outgoing transition entry for the current state to the outgoing transition arc vector.
- the addFromArc method adds an incoming transition entry for the current state to the incoming transition arc vector.
- the removeToArc method removes an outgoing transition entry for the current state from the outgoing transition arc vector.
- the removeFromArc method removes an incoming transition entry for the current state from the incoming transition arc vector.
- the doesTransitionExist method allows the caller to do an inquiry to determine if the specified transition matches any of the transition entries in the outgoing transition arc vector.
- the removeArc method supports both the removeToArc and the removeFromArc methods to remove a particular transition entry from the passed in transition arc vector.
- the compareNFAStates method compares if the input set of NFA states matches the set of NFA states that are being replaces by the current DFA state .
- the printToArcs method provides debug support to print out the information of all outgoing transition arcs for the current state.
- the printFromArcs method provides debug support to print out the information of all incoming transition arcs for the current state.
- the printArc method supports both the printToArcs and the printFromArcs methods to print out to the screen all transition entry information stored in the passed in transition arc vector.
- the printExtension method provides debug support to print out the DFA transformation support information maintained in the state entry to the screen.
- the isInNFAStateSet method provides support to the DFA transformation to check if a particular NFA state is already included in the NFA state set maintained within the current state entry.
- the isInClosureStateSet method provides support to the DFA transformation to check if a particular NFA state is already included in the empty input closure state set maintained within the current state entry.
- the writeXMLOutput method supports writing a state table entry out to an output file stream in the XML format .
- the TransitionEntry class defines the data fields for information describing the transition arc going from one state to another.
- the information includes the type of the input that is causing the state transition; the actual value of the input that is causing the state transition; and the state number of the next state caused by this transition.
- the clear method set all data fields to an initial known state.
- the setSymbolName method sets the transition input type to "RELOCATE" as an indication that a branch to another state table may be needed to handle a recursive symbol .
- the name of the symbol is passed in as an input parameter and is saved in the symbol name field for reference later.
- the setInput methods are made up of three overloading methods, differing only in their input parameters.
- the first version of setlnput does not require any input. It sets the transition input type for the transition entry as an empty (epsilon) input .
- the second version requires a character input parameter. The method sets the transition entry input type to character type, and save away the input character value.
- the third version requires a CharSet input parameter. It sets the transition entry input type to CharSet, and saves the CharSet value away.
- the setTransition method allows the caller to specify the state number to go to for the transition.
- the setCheckedFlag method supports the DFA transformation. It allows the DFA transformation processing to mark this transition entry so that this entry is only processed once to expedite the transformation.
- the getlnputType method returns the input type of this transition entry to the caller.
- the getCharSet method returns the input CharSet value of this transition entry to the caller.
- the getInputChar method returns the input character value of this transition entry to the caller.
- the getTransition method returns the transition state number that is specified in this transition entry.
- the getSymbolName method returns the value of the input symbol stored in this entry to the caller.
- the getCheckedFlag method returns the current flag setting for the CheckedFlag in this entry to the caller.
- the isEqual method compares all values including the transition state information stored in the transition entry that is passed in as an input parameter with those stored in this transition entry. It returns true if the values are the same; false otherwise.
- the comparelnput method compares the input type and the input value stored in the transition entry that is passed in as an input parameter with the input type and the input value stored in this transition entry. It returns true if the values are the same; false otherwise.
- the copylnput method allows a caller to copy the input type and the input value information from a transition entry that is passed in as an input parameter to the current entry.
- the print method provides debug support to print out the content of this transition entry to the screen.
- the writeXMLCharlnput method supports the writeXMLOutput method by determining if the input character is a printable ASCII character and write it out to the output file stream in the appropriate XML format.
- the writeXMLOutput method supports writing the state transition information out to an output file stream in the XML format .
- the DFAMgr class supports the transformation of a Non-deterministic Finite Automata (NFA) into a Deterministic Finite Automata (DFA) .
- the DFAMgr class constructor accepts a NFAMgr which contains the NFA state table to be transformed into a DFA as an input.
- the constructor also requires two additional parameters to specify the NFA starting state and the NFA final state so that the DFAMgr can map them into the DFA starting state and the DFA final states.
- the constructor creates a new StateMgr to maintain the new DFA states to be generated.
- the caller can invoke the NFA2DFA method to perform the DFA transformation.
- the createDFAState method provide supports to the NFA2DFA method to perform DFA transformation. It creates a state table entry for a new DFA state. After the state entry is created, the method initializes the entry with the associated NFA state set and the epsilon closure set.
- the NFA2DFA method is the primary method for performing the transformation from a NFA into a DFA. It employs some of the commonly known compiler construction techniques to transform a NFA into a DFA.
- the addEpsilonOutStates is a recursive method that exists to support the eClosure method.
- the method adds epsilon (empty input) transition states in a recursive manner to the closure set originated from a set of NFA states that is mapped to a DFA state .
- the eClosure method builds and returns a set of epsilon closure states that are associated with the set of NFA states passed in as an input parameter.
- the getNFATransitionSet method builds and returns a set of non-epsilon transition entries that are associated with the set of states which are passed in as an input parameter.
- the extractNFAInputSet method looks at a set of transition entries that are passed in as an input parameter, and returns a set of input extracted from these transition entries to the caller.
- the extractNFATargetStateSet method looks at a set of transition entries that are passed in as the first input parameter, and returns a set of target states that have input matching the input specified in the transition entry which is passed in as the second input parameter for this method.
- the findDFAFinalStates method returns a set of DFA states that are designated as the allowable final states in the DFA state table.
- the set is determined based on the original NFA final state which is passed in as an input parameter.
- the printFinalStates method provides debug support to print out to the screen the set of DFA final states as determined by the NFA2DFA method.
- the writeXMLOutput method supports writing the state table corresponding to the Deterministic Finite Automata created by the DFAMgr out to an output file stream in the XML format.
- the file header at 600 identifies the contents of the file, the date it was generated, and the source of the grammar rules input.
- the next section of the file at 610 provides some general information about the identity and the layout of the state table being specified. At 611, it identifies the number of logical state tables described in this file. These logical state tables can be combined into one single physical state table by the loader by appending the states from the subsequent logical state tables to the first one and adjusting their transitions accordingly. (For example, if the current last state in the physical state table is 1205. The next available state entry in the physical state table is 1206.
- the initial state which was logically labelled as state 0 is loaded to the physical state table entry 1206. All state transitions from the logical state table will be adjusted with an offset of 1206. Therefore, if there were a transition to State 5 of the logical state table, the transition will become 1211 (1206+5) in the physical state table.)
- it identifies the names of the logical tables. The recursive symbols themselves are used as the name for the logical state tables for the recursive symbols.
- it provides information to label the column (state input) of the physical state table.
- the next segment of the file at 620 provides detailed specification for each of the logical state tables.
- the section at 621 provides a complete description of a logical state table specified by this file. It identifies the table by name at 622. It then identifies the logical initial state for this state table at 623. The allowable final states are listed at 624. The number of states for this logical state table is specified at 625. Detailed information of all the different states for this logical state table and their transitions are identified in the section of the file at 626. It first provides a logical state number as shown at 627. And then it lists all transitions originated from this state with their input at 628. The states that have a transition into this logical state are identified at 629. The section of the file at 626 is repeated for each state in the logical state table. And the information specified at 621 is repeated for each of the logical state tables. This provides the complete information to the loader to personalize the hardware accelerator.
- the invention can directly and automatically provide error-free state table data for any computer language or for other purposes from a language or function specification, preferably in a formal notation such as BNF or its derivative.
- the process is rapidly executable and results in error-free state table data at low cost.
- the invention allows a personality for a FSM to be rapidly changed, at will, to accommodate or provide different functions or reflect different languages or character stings of interest. While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
Description
Claims
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP03816198A EP1604277A2 (en) | 2003-02-28 | 2003-10-03 | Hardware accelerator personality compiler |
| AU2003277247A AU2003277247A1 (en) | 2003-02-28 | 2003-10-03 | Hardware accelerator state table compiler |
| CA002521576A CA2521576A1 (en) | 2003-02-28 | 2003-10-03 | Hardware accelerator state table compiler |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US45032003P | 2003-02-28 | 2003-02-28 | |
| US60/450,320 | 2003-02-28 |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| WO2004079571A2 true WO2004079571A2 (en) | 2004-09-16 |
| WO2004079571A3 WO2004079571A3 (en) | 2005-03-24 |
| WO2004079571B1 WO2004079571B1 (en) | 2005-05-19 |
Family
ID=32962492
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2003/031312 Ceased WO2004079571A2 (en) | 2003-02-28 | 2003-10-03 | Hardware accelerator state table compiler |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US20040172234A1 (en) |
| EP (1) | EP1604277A2 (en) |
| CN (1) | CN100470480C (en) |
| AU (1) | AU2003277247A1 (en) |
| CA (1) | CA2521576A1 (en) |
| WO (1) | WO2004079571A2 (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1607823A3 (en) * | 2004-06-14 | 2006-01-25 | Lionic Corporation | Method and system for virus detection based on finite automata |
| WO2006102849A1 (en) | 2005-03-30 | 2006-10-05 | Huawei Technologies Co., Ltd. | A method and device for pattern matching and parsing on abnf character string |
| US7216364B2 (en) | 2004-06-14 | 2007-05-08 | Lionic Corporation | System security approaches using state tables |
| WO2007101726A3 (en) * | 2006-03-08 | 2007-10-25 | Tomtom Int Bv | Processing device, method and computer program for detecting a certain computer command |
| US7596809B2 (en) | 2004-06-14 | 2009-09-29 | Lionic Corporation | System security approaches using multiple processing units |
| US7685637B2 (en) | 2004-06-14 | 2010-03-23 | Lionic Corporation | System security approaches using sub-expression automata |
| US8429605B2 (en) | 2009-12-30 | 2013-04-23 | The United States Of America As Represented By The Secretary Of The Navy | Finite state machine architecture for software development |
| DE112006000260B4 (en) * | 2005-01-21 | 2014-04-10 | Huawei Technologies Co., Ltd. | Parser for analyzing a text-coded protocol |
Families Citing this family (72)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7213265B2 (en) * | 2000-11-15 | 2007-05-01 | Lockheed Martin Corporation | Real time active network compartmentalization |
| US7225467B2 (en) * | 2000-11-15 | 2007-05-29 | Lockheed Martin Corporation | Active intrusion resistant environment of layered object and compartment keys (airelock) |
| EP1554663A4 (en) * | 2002-07-26 | 2009-02-11 | Kumar Bulusu Gopi | Method for specifying equivalence of language grammars and automatically translating sentences in one language to sentences in another language in a computer environment |
| US7080094B2 (en) * | 2002-10-29 | 2006-07-18 | Lockheed Martin Corporation | Hardware accelerated validating parser |
| US7146643B2 (en) * | 2002-10-29 | 2006-12-05 | Lockheed Martin Corporation | Intrusion detection accelerator |
| US20070061884A1 (en) * | 2002-10-29 | 2007-03-15 | Dapp Michael C | Intrusion detection accelerator |
| US7672965B2 (en) * | 2003-02-24 | 2010-03-02 | Avaya, Inc. | Finite-state machine augmented for multiple evaluations of text |
| FI115367B (en) * | 2003-03-07 | 2005-04-15 | First Hop Oy | Transaction control system and method |
| JP3982623B2 (en) * | 2003-03-25 | 2007-09-26 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Information processing apparatus, database search system, and program |
| US7949732B1 (en) | 2003-05-12 | 2011-05-24 | Sourcefire, Inc. | Systems and methods for determining characteristics of a network and enforcing policy |
| US7275069B2 (en) * | 2004-04-26 | 2007-09-25 | Tarari, Inc. | System and method for tokening documents |
| US7512592B2 (en) * | 2004-07-02 | 2009-03-31 | Tarari, Inc. | System and method of XML query processing |
| US7539681B2 (en) * | 2004-07-26 | 2009-05-26 | Sourcefire, Inc. | Methods and systems for multi-pattern searching |
| US8560475B2 (en) | 2004-09-10 | 2013-10-15 | Cavium, Inc. | Content search mechanism that uses a deterministic finite automata (DFA) graph, a DFA state machine, and a walker process |
| US8301788B2 (en) * | 2004-09-10 | 2012-10-30 | Cavium, Inc. | Deterministic finite automata (DFA) instruction |
| US8392590B2 (en) * | 2004-09-10 | 2013-03-05 | Cavium, Inc. | Deterministic finite automata (DFA) processing |
| US20060117307A1 (en) * | 2004-11-24 | 2006-06-01 | Ramot At Tel-Aviv University Ltd. | XML parser |
| US20060155526A1 (en) * | 2005-01-10 | 2006-07-13 | At&T Corp. | Systems, Devices, & Methods for automating non-deterministic processes |
| US7703006B2 (en) * | 2005-06-02 | 2010-04-20 | Lsi Corporation | System and method of accelerating document processing |
| US7665016B2 (en) * | 2005-11-14 | 2010-02-16 | Sun Microsystems, Inc. | Method and apparatus for virtualized XML parsing |
| US8046833B2 (en) * | 2005-11-14 | 2011-10-25 | Sourcefire, Inc. | Intrusion event correlation with network discovery information |
| US7733803B2 (en) * | 2005-11-14 | 2010-06-08 | Sourcefire, Inc. | Systems and methods for modifying network map attributes |
| US7665015B2 (en) * | 2005-11-14 | 2010-02-16 | Sun Microsystems, Inc. | Hardware unit for parsing an XML document |
| US7716577B2 (en) * | 2005-11-14 | 2010-05-11 | Oracle America, Inc. | Method and apparatus for hardware XML acceleration |
| US7948988B2 (en) * | 2006-07-27 | 2011-05-24 | Sourcefire, Inc. | Device, system and method for analysis of fragments in a fragment train |
| US7701945B2 (en) * | 2006-08-10 | 2010-04-20 | Sourcefire, Inc. | Device, system and method for analysis of segments in a transmission control protocol (TCP) session |
| CN100437482C (en) * | 2006-12-31 | 2008-11-26 | 中国建设银行股份有限公司 | Developing platform of application software, generating method and operation platform and operation method |
| US8069352B2 (en) * | 2007-02-28 | 2011-11-29 | Sourcefire, Inc. | Device, system and method for timestamp analysis of segments in a transmission control protocol (TCP) session |
| EP2156290B1 (en) * | 2007-04-30 | 2020-03-25 | Cisco Technology, Inc. | Real-time awareness for a computer network |
| US8819217B2 (en) * | 2007-11-01 | 2014-08-26 | Cavium, Inc. | Intelligent graph walking |
| US7949683B2 (en) * | 2007-11-27 | 2011-05-24 | Cavium Networks, Inc. | Method and apparatus for traversing a compressed deterministic finite automata (DFA) graph |
| US8180803B2 (en) * | 2007-11-27 | 2012-05-15 | Cavium, Inc. | Deterministic finite automata (DFA) graph compression |
| US8474043B2 (en) | 2008-04-17 | 2013-06-25 | Sourcefire, Inc. | Speed and memory optimization of intrusion detection system (IDS) and intrusion prevention system (IPS) rule processing |
| US8311806B2 (en) * | 2008-06-06 | 2012-11-13 | Apple Inc. | Data detection in a sequence of tokens using decision tree reductions |
| US8272055B2 (en) | 2008-10-08 | 2012-09-18 | Sourcefire, Inc. | Target-based SMB and DCE/RPC processing for an intrusion detection system or intrusion prevention system |
| US8473523B2 (en) * | 2008-10-31 | 2013-06-25 | Cavium, Inc. | Deterministic finite automata graph traversal with nodal bit mapping |
| JP5809238B2 (en) | 2010-04-16 | 2015-11-10 | シスコ テクノロジー,インコーポレイテッド | System and method for near real-time network attack detection, and system and method for integrated detection by detection routing |
| US8433790B2 (en) | 2010-06-11 | 2013-04-30 | Sourcefire, Inc. | System and method for assigning network blocks to sensors |
| US8671182B2 (en) | 2010-06-22 | 2014-03-11 | Sourcefire, Inc. | System and method for resolving operating system or service identity conflicts |
| US9002876B2 (en) * | 2010-12-02 | 2015-04-07 | Sap Se | Interpreted computer language to analyze business object data with defined relations |
| US9398033B2 (en) | 2011-02-25 | 2016-07-19 | Cavium, Inc. | Regular expression processing automaton |
| US8601034B2 (en) | 2011-03-11 | 2013-12-03 | Sourcefire, Inc. | System and method for real time data awareness |
| US9858051B2 (en) * | 2011-06-24 | 2018-01-02 | Cavium, Inc. | Regex compiler |
| US8990259B2 (en) | 2011-06-24 | 2015-03-24 | Cavium, Inc. | Anchored patterns |
| WO2013020001A1 (en) | 2011-08-02 | 2013-02-07 | Cavium, Inc. | Lookup front end output processor |
| US9203805B2 (en) * | 2011-11-23 | 2015-12-01 | Cavium, Inc. | Reverse NFA generation and processing |
| US9082073B2 (en) * | 2011-11-30 | 2015-07-14 | Metaswitch Networks Ltd. | Method and apparatus for operating a finite state machine |
| US9141738B2 (en) * | 2012-06-04 | 2015-09-22 | Reveal Design Automation | Sequential non-deterministic detection in hardware design |
| US9563399B2 (en) | 2013-08-30 | 2017-02-07 | Cavium, Inc. | Generating a non-deterministic finite automata (NFA) graph for regular expression patterns with advanced features |
| US9426166B2 (en) | 2013-08-30 | 2016-08-23 | Cavium, Inc. | Method and apparatus for processing finite automata |
| US9426165B2 (en) | 2013-08-30 | 2016-08-23 | Cavium, Inc. | Method and apparatus for compilation of finite automata |
| US9419943B2 (en) | 2013-12-30 | 2016-08-16 | Cavium, Inc. | Method and apparatus for processing of finite automata |
| US9275336B2 (en) | 2013-12-31 | 2016-03-01 | Cavium, Inc. | Method and system for skipping over group(s) of rules based on skip group rule |
| US9544402B2 (en) | 2013-12-31 | 2017-01-10 | Cavium, Inc. | Multi-rule approach to encoding a group of rules |
| US9667446B2 (en) | 2014-01-08 | 2017-05-30 | Cavium, Inc. | Condition code approach for comparing rule and packet data that are provided in portions |
| US9904630B2 (en) | 2014-01-31 | 2018-02-27 | Cavium, Inc. | Finite automata processing based on a top of stack (TOS) memory |
| US9602532B2 (en) | 2014-01-31 | 2017-03-21 | Cavium, Inc. | Method and apparatus for optimizing finite automata processing |
| US9438561B2 (en) | 2014-04-14 | 2016-09-06 | Cavium, Inc. | Processing of finite automata based on a node cache |
| US10110558B2 (en) | 2014-04-14 | 2018-10-23 | Cavium, Inc. | Processing of finite automata based on memory hierarchy |
| US10002326B2 (en) | 2014-04-14 | 2018-06-19 | Cavium, Inc. | Compilation of finite automata based on memory hierarchy |
| CN104503728B (en) | 2015-01-04 | 2017-11-24 | 华为技术有限公司 | A kind of hardware accelerator and chip |
| US10027346B2 (en) * | 2015-05-11 | 2018-07-17 | Via Alliance Semiconductor Co., Ltd. | Hardware data compressor that maintains sorted symbol list concurrently with input block scanning |
| EP3370150B1 (en) | 2015-11-25 | 2020-02-19 | Huawei Technologies Co., Ltd. | Program generation method and system for accelerator |
| US9684496B1 (en) * | 2016-03-25 | 2017-06-20 | Norman L. Reid | Method for parsing programming languages and structured data |
| CN105791021A (en) * | 2016-04-12 | 2016-07-20 | 上海斐讯数据通信技术有限公司 | Hardware acceleration device and method |
| CN106057211B (en) * | 2016-05-27 | 2018-08-21 | 广州多益网络股份有限公司 | A kind of Signal Matching method and device |
| US10330773B2 (en) * | 2016-06-16 | 2019-06-25 | Texas Instruments Incorporated | Radar hardware accelerator |
| US10198646B2 (en) | 2016-07-01 | 2019-02-05 | International Business Machines Corporation | Hardware compilation of cascaded grammars |
| US10481881B2 (en) * | 2017-06-22 | 2019-11-19 | Archeo Futurus, Inc. | Mapping a computer code to wires and gates |
| US9996328B1 (en) * | 2017-06-22 | 2018-06-12 | Archeo Futurus, Inc. | Compiling and optimizing a computer code by minimizing a number of states in a finite machine corresponding to the computer code |
| US10586051B2 (en) * | 2017-08-31 | 2020-03-10 | International Business Machines Corporation | Automatic transformation of security event detection rules |
| US11782983B1 (en) * | 2020-11-27 | 2023-10-10 | Amazon Technologies, Inc. | Expanded character encoding to enhance regular expression filter capabilities |
Family Cites Families (97)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4279034A (en) * | 1979-11-15 | 1981-07-14 | Bell Telephone Laboratories, Incorporated | Digital communication system fault isolation circuit |
| US4527270A (en) * | 1983-05-04 | 1985-07-02 | Allen-Bradley Company | Communications network with stations that detect and automatically bypass faults |
| US5280577A (en) * | 1988-01-19 | 1994-01-18 | E. I. Du Pont De Nemours & Co., Inc. | Character generation using graphical primitives |
| US5027342A (en) * | 1989-05-03 | 1991-06-25 | The University Of Toronto Innovations Foundation | Local area network |
| US5003531A (en) * | 1989-08-11 | 1991-03-26 | Infotron Systems Corporation | Survivable network using reverse protection ring |
| US5193192A (en) * | 1989-12-29 | 1993-03-09 | Supercomputer Systems Limited Partnership | Vectorized LR parsing of computer programs |
| EP0436194A3 (en) * | 1990-01-02 | 1992-12-16 | National Semiconductor Corporation | Media access controller |
| US5214778A (en) * | 1990-04-06 | 1993-05-25 | Micro Technology, Inc. | Resource management in a multiple resource system |
| US5319776A (en) * | 1990-04-19 | 1994-06-07 | Hilgraeve Corporation | In transit detection of computer virus with safeguard |
| EP0459912B1 (en) * | 1990-05-30 | 1996-09-11 | Fujitsu Limited | An issue processing system for a right to use a resource |
| US5327159A (en) * | 1990-06-27 | 1994-07-05 | Texas Instruments Incorporated | Packed bus selection of multiple pixel depths in palette devices, systems and methods |
| US5247664A (en) * | 1991-03-28 | 1993-09-21 | Amoco Corporation | Fault-tolerant distributed database system and method for the management of correctable subtransaction faults by the global transaction source node |
| US5511213A (en) * | 1992-05-08 | 1996-04-23 | Correa; Nelson | Associative memory processor architecture for the efficient execution of parsing algorithms for natural language processing and pattern recognition |
| FR2706652B1 (en) * | 1993-06-09 | 1995-08-18 | Alsthom Cge Alcatel | Device for detecting intrusions and suspicious users for a computer system and security system comprising such a device. |
| US5519830A (en) * | 1993-06-10 | 1996-05-21 | Adc Telecommunications, Inc. | Point-to-multipoint performance monitoring and failure isolation system |
| US5414833A (en) * | 1993-10-27 | 1995-05-09 | International Business Machines Corporation | Network security system and method using a parallel finite state machine adaptive active monitor and responder |
| WO1995015529A1 (en) * | 1993-12-01 | 1995-06-08 | Marathon Technologies Corporation | Fault resilient/fault tolerant computing |
| US5606668A (en) * | 1993-12-15 | 1997-02-25 | Checkpoint Software Technologies Ltd. | System for securing inbound and outbound data packet flow in a computer network |
| JP3339741B2 (en) * | 1994-01-13 | 2002-10-28 | 株式会社リコー | Language analyzer |
| JP3438105B2 (en) * | 1994-03-18 | 2003-08-18 | 富士通株式会社 | Detour route search method |
| FR2721781B1 (en) * | 1994-06-28 | 1996-07-19 | Thomson Csf | Method for ensuring the confidentiality of a phonic link and local telecommunication network implementing the method. |
| US5737526A (en) * | 1994-12-30 | 1998-04-07 | Cisco Systems | Network having at least two routers, each having conditional filter so one of two transmits given frame and each transmits different frames, providing connection to a subnetwork |
| US5794177A (en) * | 1995-07-19 | 1998-08-11 | Inso Corporation | Method and apparatus for morphological analysis and generation of natural language text |
| KR100244836B1 (en) * | 1995-11-02 | 2000-02-15 | 포만 제프리 엘 | How to isolate a computer system and one of the multiple function cards |
| JP3165366B2 (en) * | 1996-02-08 | 2001-05-14 | 株式会社日立製作所 | Network security system |
| US6233704B1 (en) * | 1996-03-13 | 2001-05-15 | Silicon Graphics, Inc. | System and method for fault-tolerant transmission of data within a dual ring network |
| US5798706A (en) * | 1996-06-18 | 1998-08-25 | Raptor Systems, Inc. | Detecting unauthorized network communication |
| US6119236A (en) * | 1996-10-07 | 2000-09-12 | Shipley; Peter M. | Intelligent network security device and method |
| US6182029B1 (en) * | 1996-10-28 | 2001-01-30 | The Trustees Of Columbia University In The City Of New York | System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters |
| US5958015A (en) * | 1996-10-29 | 1999-09-28 | Abirnet Ltd. | Network session wall passively listening to communication session, with use of access rules, stops further communication between network devices by emulating messages to the devices |
| US5922049A (en) * | 1996-12-09 | 1999-07-13 | Sun Microsystems, Inc. | Method for using DHCP and marking to override learned IP addesseses in a network |
| US5920698A (en) * | 1997-01-06 | 1999-07-06 | Digital Equipment Corporation | Automatic detection of a similar device at the other end of a wire in a computer network |
| US5905859A (en) * | 1997-01-09 | 1999-05-18 | International Business Machines Corporation | Managed network device security method and apparatus |
| US5805801A (en) * | 1997-01-09 | 1998-09-08 | International Business Machines Corporation | System and method for detecting and preventing security |
| US6173333B1 (en) * | 1997-07-18 | 2001-01-09 | Interprophet Corporation | TCP/IP network accelerator system and method which identifies classes of packet traffic for predictable protocols |
| US5919257A (en) * | 1997-08-08 | 1999-07-06 | Novell, Inc. | Networked workstation intrusion detection system |
| US6094731A (en) * | 1997-11-24 | 2000-07-25 | Symantec Corporation | Antivirus accelerator for computer networks |
| US6021510A (en) * | 1997-11-24 | 2000-02-01 | Symantec Corporation | Antivirus accelerator |
| US6279113B1 (en) * | 1998-03-16 | 2001-08-21 | Internet Tools, Inc. | Dynamic signature inspection-based network intrusion detection |
| US6393386B1 (en) * | 1998-03-26 | 2002-05-21 | Visual Networks Technologies, Inc. | Dynamic modeling of complex networks and prediction of impacts of faults therein |
| US6083276A (en) * | 1998-06-11 | 2000-07-04 | Corel, Inc. | Creating and configuring component-based applications using a text-based descriptive attribute grammar |
| US6282546B1 (en) * | 1998-06-30 | 2001-08-28 | Cisco Technology, Inc. | System and method for real-time insertion of data into a multi-dimensional database for network intrusion detection and vulnerability assessment |
| US6366934B1 (en) * | 1998-10-08 | 2002-04-02 | International Business Machines Corporation | Method and apparatus for querying structured documents using a database extender |
| US6421656B1 (en) * | 1998-10-08 | 2002-07-16 | International Business Machines Corporation | Method and apparatus for creating structure indexes for a data base extender |
| US6370648B1 (en) * | 1998-12-08 | 2002-04-09 | Visa International Service Association | Computer network intrusion detection |
| US6374207B1 (en) * | 1999-02-10 | 2002-04-16 | International Business Machines Corporation | Methods, data structures, and computer program products for representing states of interaction in automatic host access and terminal emulation using scripts |
| US6418446B1 (en) * | 1999-03-01 | 2002-07-09 | International Business Machines Corporation | Method for grouping of dynamic schema data using XML |
| US6405318B1 (en) * | 1999-03-12 | 2002-06-11 | Psionic Software, Inc. | Intrusion detection system |
| US6446110B1 (en) * | 1999-04-05 | 2002-09-03 | International Business Machines Corporation | Method and apparatus for representing host datastream screen image information using markup languages |
| US7188168B1 (en) * | 1999-04-30 | 2007-03-06 | Pmc-Sierra, Inc. | Method and apparatus for grammatical packet classifier |
| US6408311B1 (en) * | 1999-06-30 | 2002-06-18 | Unisys Corp. | Method for identifying UML objects in a repository with objects in XML content |
| US6763499B1 (en) * | 1999-07-26 | 2004-07-13 | Microsoft Corporation | Methods and apparatus for parsing extensible markup language (XML) data streams |
| US6684335B1 (en) * | 1999-08-19 | 2004-01-27 | Epstein, Iii Edwin A. | Resistance cell architecture |
| US6363489B1 (en) * | 1999-11-29 | 2002-03-26 | Forescout Technologies Inc. | Method for automatic intrusion detection and deflection in a network |
| US6721727B2 (en) * | 1999-12-02 | 2004-04-13 | International Business Machines Corporation | XML documents stored as column data |
| US6697950B1 (en) * | 1999-12-22 | 2004-02-24 | Networks Associates Technology, Inc. | Method and apparatus for detecting a macro computer virus using static analysis |
| US6295276B1 (en) * | 1999-12-31 | 2001-09-25 | Ragula Systems | Combining routers to increase concurrency and redundancy in external network access |
| US20020073091A1 (en) * | 2000-01-07 | 2002-06-13 | Sandeep Jain | XML to object translation |
| US20020108059A1 (en) * | 2000-03-03 | 2002-08-08 | Canion Rodney S. | Network security accelerator |
| US7159237B2 (en) * | 2000-03-16 | 2007-01-02 | Counterpane Internet Security, Inc. | Method and system for dynamic network intrusion monitoring, detection and response |
| CA2307529A1 (en) * | 2000-03-29 | 2001-09-29 | Pmc-Sierra, Inc. | Method and apparatus for grammatical packet classifier |
| US6768716B1 (en) * | 2000-04-10 | 2004-07-27 | International Business Machines Corporation | Load balancing system, apparatus and method |
| JP2001296881A (en) * | 2000-04-14 | 2001-10-26 | Sony Corp | Information processing apparatus and method, and recording medium |
| AU2001264928A1 (en) * | 2000-05-25 | 2001-12-03 | Kanisa Inc. | System and method for automatically classifying text |
| US7007301B2 (en) * | 2000-06-12 | 2006-02-28 | Hewlett-Packard Development Company, L.P. | Computer architecture for an intrusion detection system |
| AUPQ849500A0 (en) * | 2000-06-30 | 2000-07-27 | Canon Kabushiki Kaisha | Hash compact xml parser |
| FR2811782B1 (en) * | 2000-07-12 | 2003-09-26 | Jaxo Europ | DOCUMENT CONVERSION SYSTEM WITH TREE STRUCTURE BY SELECTIVE PATHWAY OF SAID STRUCTURE |
| EP1314098A1 (en) * | 2000-08-02 | 2003-05-28 | Biospace.Com, Inc. | Apparatus and method for producing contextually marked-up electronic content |
| EP1307828B1 (en) * | 2000-08-02 | 2004-06-09 | Philipp Kutter | Xml-robot |
| US20020120697A1 (en) * | 2000-08-14 | 2002-08-29 | Curtis Generous | Multi-channel messaging system and method |
| US7475405B2 (en) * | 2000-09-06 | 2009-01-06 | International Business Machines Corporation | Method and system for detecting unusual events and application thereof in computer intrusion detection |
| US6799248B2 (en) * | 2000-09-11 | 2004-09-28 | Emc Corporation | Cache management system for a network data node having a cache memory manager for selectively using different cache management methods |
| US8108543B2 (en) * | 2000-09-22 | 2012-01-31 | Axeda Corporation | Retrieving data from a server |
| US7225467B2 (en) * | 2000-11-15 | 2007-05-29 | Lockheed Martin Corporation | Active intrusion resistant environment of layered object and compartment keys (airelock) |
| US7213265B2 (en) * | 2000-11-15 | 2007-05-01 | Lockheed Martin Corporation | Real time active network compartmentalization |
| US20020099734A1 (en) * | 2000-11-29 | 2002-07-25 | Philips Electronics North America Corp. | Scalable parser for extensible mark-up language |
| US20020069317A1 (en) * | 2000-12-01 | 2002-06-06 | Chow Yan Chiew | E-RAID system and method of operating the same |
| US6671689B2 (en) * | 2001-01-19 | 2003-12-30 | Ncr Corporation | Data warehouse portal |
| EP1225516A1 (en) * | 2001-01-22 | 2002-07-24 | Sun Microsystems, Inc. | Storing data of an XML-document in a relational database |
| US6959416B2 (en) * | 2001-01-30 | 2005-10-25 | International Business Machines Corporation | Method, system, program, and data structures for managing structured documents in a database |
| US20020116644A1 (en) * | 2001-01-30 | 2002-08-22 | Galea Secured Networks Inc. | Adapter card for wirespeed security treatment of communications traffic |
| US6631379B2 (en) * | 2001-01-31 | 2003-10-07 | International Business Machines Corporation | Parallel loading of markup language data files and documents into a computer database |
| US20020111963A1 (en) * | 2001-02-14 | 2002-08-15 | International Business Machines Corporation | Method, system, and program for preprocessing a document to render on an output device |
| US7194683B2 (en) * | 2001-03-02 | 2007-03-20 | International Business Machines Corporation | Representing and managing dynamic data content for web documents |
| US6862588B2 (en) * | 2001-07-25 | 2005-03-01 | Hewlett-Packard Development Company, L.P. | Hybrid parsing system and method |
| US20020010715A1 (en) * | 2001-07-26 | 2002-01-24 | Garry Chinn | System and method for browsing using a limited display device |
| US20030041302A1 (en) * | 2001-08-03 | 2003-02-27 | Mcdonald Robert G. | Markup language accelerator |
| US7024351B2 (en) * | 2001-08-21 | 2006-04-04 | Microsoft Corporation | Method and apparatus for robust efficient parsing |
| US7639257B2 (en) * | 2002-07-31 | 2009-12-29 | Adobe Systems Incorporated | Glyphlets |
| US7493603B2 (en) * | 2002-10-15 | 2009-02-17 | International Business Machines Corporation | Annotated automaton encoding of XML schema for high performance schema validation |
| US7080094B2 (en) * | 2002-10-29 | 2006-07-18 | Lockheed Martin Corporation | Hardware accelerated validating parser |
| US20070061884A1 (en) * | 2002-10-29 | 2007-03-15 | Dapp Michael C | Intrusion detection accelerator |
| US20040083466A1 (en) * | 2002-10-29 | 2004-04-29 | Dapp Michael C. | Hardware parser accelerator |
| US20040194016A1 (en) * | 2003-03-28 | 2004-09-30 | International Business Machines Corporation | Dynamic data migration for structured markup language schema changes |
| US7774386B2 (en) * | 2003-07-24 | 2010-08-10 | International Business Machines Corporation | Applying abstraction to object markup definitions |
| US7437374B2 (en) * | 2004-02-10 | 2008-10-14 | International Business Machines Corporation | Efficient XML schema validation of XML fragments using annotated automaton encoding |
| US20050177578A1 (en) * | 2004-02-10 | 2005-08-11 | Chen Yao-Ching S. | Efficient type annontation of XML schema-validated XML documents without schema validation |
-
2003
- 2003-10-03 US US10/677,744 patent/US20040172234A1/en not_active Abandoned
- 2003-10-03 AU AU2003277247A patent/AU2003277247A1/en not_active Abandoned
- 2003-10-03 WO PCT/US2003/031312 patent/WO2004079571A2/en not_active Ceased
- 2003-10-03 CN CNB2003801102873A patent/CN100470480C/en not_active Expired - Fee Related
- 2003-10-03 CA CA002521576A patent/CA2521576A1/en not_active Abandoned
- 2003-10-03 EP EP03816198A patent/EP1604277A2/en not_active Withdrawn
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1607823A3 (en) * | 2004-06-14 | 2006-01-25 | Lionic Corporation | Method and system for virus detection based on finite automata |
| US7216364B2 (en) | 2004-06-14 | 2007-05-08 | Lionic Corporation | System security approaches using state tables |
| US7596809B2 (en) | 2004-06-14 | 2009-09-29 | Lionic Corporation | System security approaches using multiple processing units |
| US7685637B2 (en) | 2004-06-14 | 2010-03-23 | Lionic Corporation | System security approaches using sub-expression automata |
| DE112006000260B4 (en) * | 2005-01-21 | 2014-04-10 | Huawei Technologies Co., Ltd. | Parser for analyzing a text-coded protocol |
| WO2006102849A1 (en) | 2005-03-30 | 2006-10-05 | Huawei Technologies Co., Ltd. | A method and device for pattern matching and parsing on abnf character string |
| EP1868090A4 (en) * | 2005-03-30 | 2008-08-27 | Huawei Tech Co Ltd | METHOD AND DEVICE FOR PATTERN COMPARISON AND ANALYSIS OF AN ABNF CHARACTER |
| WO2007101726A3 (en) * | 2006-03-08 | 2007-10-25 | Tomtom Int Bv | Processing device, method and computer program for detecting a certain computer command |
| US8429605B2 (en) | 2009-12-30 | 2013-04-23 | The United States Of America As Represented By The Secretary Of The Navy | Finite state machine architecture for software development |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2004079571B1 (en) | 2005-05-19 |
| CN100470480C (en) | 2009-03-18 |
| EP1604277A2 (en) | 2005-12-14 |
| CN1781078A (en) | 2006-05-31 |
| CA2521576A1 (en) | 2004-09-16 |
| US20040172234A1 (en) | 2004-09-02 |
| WO2004079571A3 (en) | 2005-03-24 |
| AU2003277247A1 (en) | 2004-09-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20040172234A1 (en) | Hardware accelerator personality compiler | |
| DK2778914T3 (en) | PROCEDURE AND SYSTEM FOR GENERATING A PARSER AND PARISING COMPLEX DATA | |
| Lesk et al. | Lex: A lexical analyzer generator | |
| US7080094B2 (en) | Hardware accelerated validating parser | |
| EP3336721B1 (en) | Method and system for generating a parser and parsing complex data | |
| US7458022B2 (en) | Hardware/software partition for high performance structured data transformation | |
| JPS61103247A (en) | Translation program generation system | |
| Parr et al. | PCCTS reference manual: version 1.00 | |
| US20070061884A1 (en) | Intrusion detection accelerator | |
| JPH11249903A (en) | Abstract syntax tree processing method, computer readable storage medium recording abstract syntax tree processing program, computer readable storage medium recording abstract syntax tree data, and abstract syntax tree processing | |
| Seindal | GNU m4, version 1.4 | |
| Parr et al. | PCCTS reference manual | |
| Shukla | Converting Regex to Parsing Expression Grammar with Captures | |
| CA2504491A1 (en) | Hardware accelerated validating parser | |
| WO1997007452A1 (en) | Programmable compiler | |
| Marcotty et al. | The definition mechanism for standard PL/I | |
| Pagan | Formal semantics of a SNOBOL4 subset | |
| Sommerville | A pattern matching system | |
| CN121364885A (en) | Code searching method, device, equipment, storage medium and product | |
| Dempsey et al. | A regular expression pattern matching processor for APL | |
| Grune et al. | Program Text to Tokens—Lexical Analysis | |
| EP1244011A1 (en) | Displaying user readable information during linking | |
| Van Dijk | An Estelle compiler | |
| Wette | Not Yet Another Compiler-Compiler | |
| Syme et al. | Lexing and Parsing |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
| DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
| B | Later publication of amended claims |
Effective date: 20050401 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2003277247 Country of ref document: AU |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2003816198 Country of ref document: EP Ref document number: 1907/KOLNP/2005 Country of ref document: IN |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2521576 Country of ref document: CA |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 20038B02873 Country of ref document: CN |
|
| WWP | Wipo information: published in national office |
Ref document number: 2003816198 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: JP |
|
| WWW | Wipo information: withdrawn in national office |
Ref document number: JP |
|
| DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101) |