WO1992003782A1 - Flux de donnees pour programme d'analyse syntaxique - Google Patents
Flux de donnees pour programme d'analyse syntaxique Download PDFInfo
- Publication number
- WO1992003782A1 WO1992003782A1 PCT/US1991/004071 US9104071W WO9203782A1 WO 1992003782 A1 WO1992003782 A1 WO 1992003782A1 US 9104071 W US9104071 W US 9104071W WO 9203782 A1 WO9203782 A1 WO 9203782A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- syntax
- parsing
- programming language
- machine
- input stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/42—Syntactic analysis
- G06F8/427—Parsing
Definitions
- the present invention relates to parsing a data stream having a plurality of different computer language syntax.
- parsing is an analysis of a stream of program expressions (sentences) for determining whether or not the program expressions are syntactically correct. Once it is determined that a stream of program expressions is syntactically correct, that steam of program expressions can be compiled into executable modules. Parsing is automatically performed in a computer using a computer program.
- Fortran for example, a scanner using a set of rules groups predetermined characters in the input stream into tokens. Scanners are programs constructed to recognize different types of tokens, such as identifiers, decimal constants, floating point constants, and the like. In recognizing or identifying a token, a scanner may look ahead in the input stream for additional characters.
- the parser imposes a structure on the sequence of tokens using a set of rules appropriate for the language.
- rules are referred to as a context-free grammar; such rules are often specified in the so-called and well known Backus Naur form.
- Such a grammar specification for a programming language expression consisting of decimal digits and the operations + and "production" may be represented as ollows:
- Each of the five grammar rules above, one on each line, is referred to a a "production".
- the tokens detected by the scanner are +,* and decimal-digits. Such tokens are passed to the parser program.
- Each string in the input stream that is parsed as having correct syntax is said to be "accepted”. For example, the string 2+3*5 is "accepted” while the string 2++5 will be rejected as syntactically incorrect.
- a left-to-right, right-most derivation(LR) parser accepts a subset of a context-free grammar.
- Each LR parser has an input, an output, a push-down stack, a driver program and a
- the parsing table is created from the grammar of the language to be parsed and is unique to such language and its grammar.
- the driver program serially reads tokens one at a time from the input stream.
- the input stream is typically stored in a computer storage and is scanned by the driver program scanning the stored input stream to fetch the tokens.
- the driver program may shift the input token into the stack, reduce it by one of the productions, accept a string of such tokens, or reject the string of such tokens as being syntactically wrong. Reduction means that the right-hand side of a production is replaced by the left-hand side.
- LR parser may also fetch a next token from the input stream for determining whether or not to shift or to reduce the token. Such a token is termed a "lookahead" token and is referred to herein as a look ahead portion of the input stream. The lookahead portion may include more than one token.
- additional semantic checks also termed semantic actions
- a debugger which may employ the above described parsing, is a software or program tool that assists a programmer to guide the execution of a program. It enables the programmer to monitor and modify the status of an executing program.
- a debugger usually can display values of variables, evaluate expressions, and display a procedure call chain at predetermined points in the executing program.
- a source level debugger allows a programmer to refer to information in a program in terms of the programming langut "e in which the program was written
- An interactive debugger enables interactive operations between the debugger and the programmer. The programmer has to know the syntax and semantics of the debuggers command language.
- a debugger command might be PRINT expr where
- SUBSTITUTE SHEET expr is a programming language expression being evaluated with respect to the program being debugged.
- Known debuggers often define a syntax for expr that is independent of any programming language or is unique to one programming language. Since a programmer is familiar with the syntax and semantics of the programming language used to write the program being debugged, it makes the programmer's task easier if the same syntax were used to specify the expression the in PRINT command.
- Present day compiler technology permits the development of programs in which each module is written in a different programming language. Debugging such multi-language programs would be easier if the programmer could switch between different syntaxes during a debugging session. Then, to debug a multi-language program, switching parsing operations between the unique expression syntax may be required.
- debuggers have been designed in a variety of ways. Such designs have included language restrictions, unfamiliar debugger command languages and the lack of extensibility, i.e. adding another language to the debugger may be difficult or impossible.
- some debugger command languages are independent of any programming language. Such a selection may simplify the debugger but does not require each programmer to learn a new syntax to use the debugger.
- Another approach is to design a separate debugger for each programming language used and employ a syntax in the debugger command language that is similar to the respective syntax of the languages. While the programmer is not required to learn a new syntax, the debugger is necessarily limited to debugging programs written in one language.
- SUBSTITUTE SHEET for a debugger command language or for expression syntax of the programming language.
- Each of the plural programming languages and the debugger command language is parsed by a separate parser, each parser having its own parsing table and scanner.
- the command language parser initiates and controls the parsing including assigning parsing of programming language expressions to its respective parser.
- the command language parser initiates parsing of each command and determines when the expression is to be parsed by which parser.
- the language parsers are called a a part of semantic actions performed by the command parser. All of the scanners operate with the same input stream. Each individual parser requests tokens only from its respective scanner. This selection of tokens results in diverse interpretations and meanings for tokens having identical character strings.
- Lookahead operations are handled, and include returning back to the input stream those lookahead portions of the input stream which will not be parsed by the parser that looked ahead.
- Selection of the language parsers can be explicit separately from the input stream or implicitly from the input stream. Parsing of syntax of the diverse programming languages and debugger commands is isolated by a plurality of separate parsers. The use of small parsing tables enhances the speed of parsing.
- Figure 1 is a block diagram of a .prior art parsing system used in a program debugger.
- Figure 2 is a simplified diagram of a parsing system using the present invention.
- FIG 3 is a schematic showing the operations of the Figure 2 illustrated system.
- FIG. 1 illustrates a typical parsing system for a debugger.
- the input stream 10 contains debugger commands and programming-language specific expressions.
- the input stream is stored in a working data storage (not shown) of a data processing system (not shown) .
- the scanner 11, a program serially fetches the characters of the input stream 10 and groups these characters to identify tokens.
- the tokens are passed to debugger command language parser 12.
- Each identified token and other groups of characters in the input stream have but one meaning and interpretation for the command language parser 12.
- the three parsers ( Figure 2), C language expression parser 25, Fortran language expression parser 23 and command language parser 16 are created using a well-known parser generator called YACC (Yet Another Compiler Compiler).
- YACC Yet Another Compiler Compiler
- the YACC driver program (skeleton of
- YACC enables semantic actions to be specified within grammar rules.
- input stream 10 is first accessed by scanner 15 for parsing.
- Scanner 15 is a debugger command language scanner for ft ching and grouping characters from the input stream 10.
- Scanner 15 detected tokens are passed to command parser 1.
- Command parser 16 only parses the syntax for the command language of the debugger. The decision to use a particular language expression parser is based upon a debugger command termed the language command (see Table I below) .
- This debugger command designates which programming language parser is to be used for parsing ensuing language-specific syntax in input stream 10. The programming language parser chosen by the language command remains in effect until a new language command is seen by the command parser 16.
- the language command is indicated by the LANGUAGE token while which one of the plurality of syntax is to be used is identified by the language command qualifier which. As shown in Table I, the qualifier which can indicate C, Fortran, or any future language.
- the extensibility also maintains consistency in language used by the programmer when debugging. This combination, plus the later-described token analysis for selecting a parser for a given programming language or command expression, provides enhanced capability and flexibility in parsers.
- SUBSTITUTE SHEET programming languages would be parsed by the combination of scanner 26 and future language parser 27. Additional programming language parsers can be added in a modular form as indicated by ellipsis 29. All of the scanner-parser combinations have a common access to input stream 10 via programming calls to or activations of the respective parsers 23, 25, 27 all as represented in Figure 2. When any of the parsers 23, 25, 27 complete the parsing, control is returned to the command parser 16.
- Table I below includes an illustrative subset of a command language grammar.
- Each expr is a place holder for the expression grammar to be parsed by one of the programming language parsers 23, 25 or 27.
- the character strings BREAK, IF, THEN, ENDBREAK, PRINT, HEX, DECIMAL, LANGUAGE, C, FORTRAN, and ⁇ n are keywords or tokens.
- Source_line is a token that represents a source line at which to stop.
- the command language scanner 15 recognizes the tokens BREAK, IF, THEN, ENDBREAK, PRINT, HEX, DECIMAL, C, FORTRAN, LANGUAGE and source_line.
- the command language grammar can be modified to enable parsing of additional programming languages. Such language additions occur in the "which :" portion of Table I. For each new programming language added, a separate parser is designed for the language.
- the command parser 16 acts differently in the case of a command token such as the IF token, which stands alone, and a command token like the PRINT token, which may be followed by another qualifying command token (HEX or DECIMAL) .
- a command token such as the IF token, which stands alone
- a command token like the PRINT token which may be followed by another qualifying command token (HEX or DECIMAL) .
- the command language parser 16 recognizes that the tokens following each IF token are a part of the expression, it calls the currently active language expression parser. Since an expression token rather than a command token will follow the IF token, command parser 16 does not look ahead and consequently does not return the lookahead token to input stream 10. In the case of the PRINT token, command parser 16 looks ahead to determine whether or not the next following token is a HEX or a DECIMAL token.
- the token following PRINT token is a first token of a language expression.
- command parser 16 writes the lookahead token back into input stream 10 and calls the currently active language parser 23, 25 or 27 to parse the expression beginning with the token following the PRINT
- the command parser 16 When the lookahead token is either a HEX or DECIMAL token, the command parser 16 writes back the lookahead token (HEX OR DECIMAL) token, the command parser 16 writes back the lookahead token (HEX OR DECIMAL) and then calls the currently active language parser 23, 25 or 27 to parse subsequent expressions as a part of its semantic actions.
- Table II below lists an illustrative subset for C language expression grammar. This subset grammar accepts expressions consisting of +, -, *, / and identifiers. Scanner 24 recognizes the +, -, /. * and c identifier tokens.
- the scanners 22 and 24 use different rules for identifying
- ITUTE SHEET language parser then reduces the expression such as by the rule expr : additive_expr.
- command parser 16 finds the end-of-line marker ⁇ n and then resumes parsing at the first token beyond marker ⁇ n in data stream 10.
- Figure 3 shows operation of the invention in a two-level parsing system having command parser 16 at a first or initial level and programming language parsers 23 and 25 in a second level.
- parsers 23 and 25 appear to be identical, it being understood that the rules and token operations in the two parsers are different, as discussed above.
- parser 23 is described in detail, that description applies equally to parser 25.
- parsers 16 and 23 have been loaded into a computer, i.e. established for operation, the input stream 10 is in data storage mans, and the program is ready to execute.
- Scanner 15 obtains a first token from input stream 10. The first sequence of tokens is taken by command parser 16 for shifting and reduction at step 40. Any syntactical error is detected at branch step 41.
- the error rule is invoked with the error reported via step 42 to the user (programmer) interface 43. The end of the current line is then located such that parsing beginning with the next line in input stream 10 can ensue.
- command parser 16 determines at step 45 whether
- SUBSTITUTE SHEET or not the token following the examined token represents a programming language or is part of the command language.
- Such determination may require a lookahead in input stream
- parser 16 Upon successful reduction, the parsing of a command is complete. The cycle is started again for the next command in input stream 10. This processing continues until no more characters remain in input stream 10.
- Command parser 16 calls one of the programming language parsers 23, 25 or 27, such as Fortran parser 23 as a part of its semantic actions.
- Fortran parser 23 invokes scanner 22.
- Scanner 22 serially fetches tokens from input stream 10.
- Fortran parser 23 then operates on a sequence of tokens until it recognizes a complete expression or encounters a syntax error. Such errors are reported to user interface 43. If there are no errors and Fortran parser 23 recognized a complete expression, then Fortran parser 23 writes back any lookahead token to input stream 10 and returns control to command parser 16.
- command parser 16 Upon retrieving control from Fortran parser 23, command parser 16 continues performing the shift/reduce operations until it successfully parses a command or encounters an error.
- This action completes parsing one command. If there are more characters in input stream 10, the entire described cycle is repeated. A token that is not parsed is written back to input stream 10 such that a next parser can read the token from the input stream 10 such that a next parser can rei the token from the input stream via its scanner.
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
Cette invention concerne un procédé d'analyse syntaxique à deux niveaux, extensible, constitué de plusieurs analyseurs syntaxiques comprenant un analyseur syntaxique (16) de langage de commande et une pluralité d'analyseurs syntaxiques de langages de programmation (23, 25, 27, 29)) tels que le langage Fortran (23) ou le langage C (25). L'analyseur syntaxique (16) de commande analyse le langage jusqu'à ce qu'il reconnaisse un élément d'information spécifique à un langage, puis il appelle l'analyseur syntaxique approprié au langage de programmation. Chaque analyseur syntaxique incorporé possède sa propre table d'analyse distincte et son propre explorateur lexical (15, 22, 24 26). Tous les explorateurs lexicaux partagent un flux commun (10) de données d'entrées. Pour chaque analyseur syntaxique, la grammaire est plus simple que la grammaire globale couvrant un langage de commande et plusieurs expressions de langages de programmation. Cette simplicité améliore la rapidité et l'efficacité de l'analyse syntaxique. Le procédé utilise un procédé d'extension permettant d'incorporer la syntaxe spécifique à un langage de programmation lorsque c'est nécessaire, dans le cadre de la syntaxe de sous-programme de mise au point.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US57195490A | 1990-08-23 | 1990-08-23 | |
| US571,954 | 1990-08-23 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1992003782A1 true WO1992003782A1 (fr) | 1992-03-05 |
Family
ID=24285742
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US1991/004071 Ceased WO1992003782A1 (fr) | 1990-08-23 | 1991-06-10 | Flux de donnees pour programme d'analyse syntaxique |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO1992003782A1 (fr) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5903756A (en) * | 1996-10-11 | 1999-05-11 | Sun Microsystems, Incorporated | Variable lookahead parser generator |
| WO2001018649A3 (fr) * | 1999-09-03 | 2002-02-21 | Cadence Design Systems Inc | Procede et systeme permettant la compilation par division d'un programme de langages hybride |
| WO2003005193A3 (fr) * | 2001-07-05 | 2003-12-04 | Electronic Data Syst Corp | Systeme et procede de comptage de lignes de code source |
| GB2394089A (en) * | 2002-09-10 | 2004-04-14 | Sun Microsystems Inc | Parsing test results having diverse formats |
| US6959434B2 (en) * | 1999-12-01 | 2005-10-25 | International Business Machines Corporation | Method of determining the syntactic correctness of expressions |
| US8850414B2 (en) | 2007-02-02 | 2014-09-30 | Microsoft Corporation | Direct access of language metadata |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4686623A (en) * | 1985-06-07 | 1987-08-11 | International Business Machines Corporation | Parser-based attribute analysis |
| US4833606A (en) * | 1986-10-09 | 1989-05-23 | Hitachi, Ltd. | Compiling method for vectorizing multiple do-loops in source program |
| US4885684A (en) * | 1987-12-07 | 1989-12-05 | International Business Machines Corporation | Method for compiling a master task definition data set for defining the logical data flow of a distributed processing network |
-
1991
- 1991-06-10 WO PCT/US1991/004071 patent/WO1992003782A1/fr not_active Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US4686623A (en) * | 1985-06-07 | 1987-08-11 | International Business Machines Corporation | Parser-based attribute analysis |
| US4833606A (en) * | 1986-10-09 | 1989-05-23 | Hitachi, Ltd. | Compiling method for vectorizing multiple do-loops in source program |
| US4885684A (en) * | 1987-12-07 | 1989-12-05 | International Business Machines Corporation | Method for compiling a master task definition data set for defining the logical data flow of a distributed processing network |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5903756A (en) * | 1996-10-11 | 1999-05-11 | Sun Microsystems, Incorporated | Variable lookahead parser generator |
| WO2001018649A3 (fr) * | 1999-09-03 | 2002-02-21 | Cadence Design Systems Inc | Procede et systeme permettant la compilation par division d'un programme de langages hybride |
| US7010784B1 (en) | 1999-09-03 | 2006-03-07 | Cadence Design Systems, Inc. | Method and system for split-compiling a hybrid language program |
| US7873948B2 (en) | 1999-09-03 | 2011-01-18 | Cadence Design Systems, Inc. | Method and system for split-compiling a hybrid language program |
| US6959434B2 (en) * | 1999-12-01 | 2005-10-25 | International Business Machines Corporation | Method of determining the syntactic correctness of expressions |
| WO2003005193A3 (fr) * | 2001-07-05 | 2003-12-04 | Electronic Data Syst Corp | Systeme et procede de comptage de lignes de code source |
| GB2394089A (en) * | 2002-09-10 | 2004-04-14 | Sun Microsystems Inc | Parsing test results having diverse formats |
| GB2394089B (en) * | 2002-09-10 | 2004-11-10 | Sun Microsystems Inc | Parsing test results having diverse formats |
| US7191362B2 (en) | 2002-09-10 | 2007-03-13 | Sun Microsystems, Inc. | Parsing test results having diverse formats |
| US8850414B2 (en) | 2007-02-02 | 2014-09-30 | Microsoft Corporation | Direct access of language metadata |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6434742B1 (en) | Symbol for automatically renaming symbols in files during the compiling of the files | |
| US5048018A (en) | Debugging parallel programs by serialization | |
| US5581696A (en) | Method using a computer for automatically instrumenting a computer program for dynamic debugging | |
| US5193192A (en) | Vectorized LR parsing of computer programs | |
| US5651111A (en) | Method and apparatus for producing a software test system using complementary code to resolve external dependencies | |
| US6353925B1 (en) | System and method for lexing and parsing program annotations | |
| KR950006619B1 (ko) | 번역 코드 실행용의 개선된 에러 기록 방법 및 시스템 | |
| US8099721B2 (en) | Parsing of declarations in all branches of preprocessor conditionals | |
| Cheatham Jr | The introduction of definitional facilities into higher level programming languages | |
| JPH05508494A (ja) | ソフトウェア開発のためのコンピュータプログラムの統合階層表示 | |
| US6598181B1 (en) | Method and system for debugging multiple function calls | |
| US6330714B1 (en) | Method and computer program product for implementing redundant lock avoidance | |
| US5822589A (en) | Method for locating errors in a computer program | |
| WO1992003782A1 (fr) | Flux de donnees pour programme d'analyse syntaxique | |
| Grune et al. | A programmer‐friendly LL (1) parser generator | |
| Richards | The BCPL programming manual | |
| Leroy | The Caml Light system release 0.74 | |
| US7318221B2 (en) | Windows™ F-language interpreter | |
| CN111767033A (zh) | 用于机械臂程序开发的编程系统及功能扩展方法 | |
| US5029170A (en) | Assembly language programming potential error detection scheme which recognizes incorrect symbolic or literal address constructs | |
| JPH0784797A (ja) | ロードモジュールへのソースコード行番号登録方法および装置 | |
| Leroy | The caml light system release 0.71 | |
| Saxena et al. | Compiler Design | |
| Maliavko et al. | Functionally Imperative Programming Language El and its Implementation | |
| CN120255902A (zh) | 一种基于dsp的机载软件编译过程的自动追溯方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AK | Designated states |
Kind code of ref document: A1 Designated state(s): JP KR |
|
| AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE |