[go: up one dir, main page]

WO1992003782A1 - Flux de donnees pour programme d'analyse syntaxique - Google Patents

Flux de donnees pour programme d'analyse syntaxique Download PDF

Info

Publication number
WO1992003782A1
WO1992003782A1 PCT/US1991/004071 US9104071W WO9203782A1 WO 1992003782 A1 WO1992003782 A1 WO 1992003782A1 US 9104071 W US9104071 W US 9104071W WO 9203782 A1 WO9203782 A1 WO 9203782A1
Authority
WO
WIPO (PCT)
Prior art keywords
syntax
parsing
programming language
machine
input stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US1991/004071
Other languages
English (en)
Inventor
Ashok Chandramouli
Robert E. Ii Strout
Jon A. Masamitsu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUPER-COMPUTER SYSTEMS LP
Original Assignee
SUPER-COMPUTER SYSTEMS LP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUPER-COMPUTER SYSTEMS LP filed Critical SUPER-COMPUTER SYSTEMS LP
Publication of WO1992003782A1 publication Critical patent/WO1992003782A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • G06F8/427Parsing

Definitions

  • the present invention relates to parsing a data stream having a plurality of different computer language syntax.
  • parsing is an analysis of a stream of program expressions (sentences) for determining whether or not the program expressions are syntactically correct. Once it is determined that a stream of program expressions is syntactically correct, that steam of program expressions can be compiled into executable modules. Parsing is automatically performed in a computer using a computer program.
  • Fortran for example, a scanner using a set of rules groups predetermined characters in the input stream into tokens. Scanners are programs constructed to recognize different types of tokens, such as identifiers, decimal constants, floating point constants, and the like. In recognizing or identifying a token, a scanner may look ahead in the input stream for additional characters.
  • the parser imposes a structure on the sequence of tokens using a set of rules appropriate for the language.
  • rules are referred to as a context-free grammar; such rules are often specified in the so-called and well known Backus Naur form.
  • Such a grammar specification for a programming language expression consisting of decimal digits and the operations + and "production" may be represented as ollows:
  • Each of the five grammar rules above, one on each line, is referred to a a "production".
  • the tokens detected by the scanner are +,* and decimal-digits. Such tokens are passed to the parser program.
  • Each string in the input stream that is parsed as having correct syntax is said to be "accepted”. For example, the string 2+3*5 is "accepted” while the string 2++5 will be rejected as syntactically incorrect.
  • a left-to-right, right-most derivation(LR) parser accepts a subset of a context-free grammar.
  • Each LR parser has an input, an output, a push-down stack, a driver program and a
  • the parsing table is created from the grammar of the language to be parsed and is unique to such language and its grammar.
  • the driver program serially reads tokens one at a time from the input stream.
  • the input stream is typically stored in a computer storage and is scanned by the driver program scanning the stored input stream to fetch the tokens.
  • the driver program may shift the input token into the stack, reduce it by one of the productions, accept a string of such tokens, or reject the string of such tokens as being syntactically wrong. Reduction means that the right-hand side of a production is replaced by the left-hand side.
  • LR parser may also fetch a next token from the input stream for determining whether or not to shift or to reduce the token. Such a token is termed a "lookahead" token and is referred to herein as a look ahead portion of the input stream. The lookahead portion may include more than one token.
  • additional semantic checks also termed semantic actions
  • a debugger which may employ the above described parsing, is a software or program tool that assists a programmer to guide the execution of a program. It enables the programmer to monitor and modify the status of an executing program.
  • a debugger usually can display values of variables, evaluate expressions, and display a procedure call chain at predetermined points in the executing program.
  • a source level debugger allows a programmer to refer to information in a program in terms of the programming langut "e in which the program was written
  • An interactive debugger enables interactive operations between the debugger and the programmer. The programmer has to know the syntax and semantics of the debuggers command language.
  • a debugger command might be PRINT expr where
  • SUBSTITUTE SHEET expr is a programming language expression being evaluated with respect to the program being debugged.
  • Known debuggers often define a syntax for expr that is independent of any programming language or is unique to one programming language. Since a programmer is familiar with the syntax and semantics of the programming language used to write the program being debugged, it makes the programmer's task easier if the same syntax were used to specify the expression the in PRINT command.
  • Present day compiler technology permits the development of programs in which each module is written in a different programming language. Debugging such multi-language programs would be easier if the programmer could switch between different syntaxes during a debugging session. Then, to debug a multi-language program, switching parsing operations between the unique expression syntax may be required.
  • debuggers have been designed in a variety of ways. Such designs have included language restrictions, unfamiliar debugger command languages and the lack of extensibility, i.e. adding another language to the debugger may be difficult or impossible.
  • some debugger command languages are independent of any programming language. Such a selection may simplify the debugger but does not require each programmer to learn a new syntax to use the debugger.
  • Another approach is to design a separate debugger for each programming language used and employ a syntax in the debugger command language that is similar to the respective syntax of the languages. While the programmer is not required to learn a new syntax, the debugger is necessarily limited to debugging programs written in one language.
  • SUBSTITUTE SHEET for a debugger command language or for expression syntax of the programming language.
  • Each of the plural programming languages and the debugger command language is parsed by a separate parser, each parser having its own parsing table and scanner.
  • the command language parser initiates and controls the parsing including assigning parsing of programming language expressions to its respective parser.
  • the command language parser initiates parsing of each command and determines when the expression is to be parsed by which parser.
  • the language parsers are called a a part of semantic actions performed by the command parser. All of the scanners operate with the same input stream. Each individual parser requests tokens only from its respective scanner. This selection of tokens results in diverse interpretations and meanings for tokens having identical character strings.
  • Lookahead operations are handled, and include returning back to the input stream those lookahead portions of the input stream which will not be parsed by the parser that looked ahead.
  • Selection of the language parsers can be explicit separately from the input stream or implicitly from the input stream. Parsing of syntax of the diverse programming languages and debugger commands is isolated by a plurality of separate parsers. The use of small parsing tables enhances the speed of parsing.
  • Figure 1 is a block diagram of a .prior art parsing system used in a program debugger.
  • Figure 2 is a simplified diagram of a parsing system using the present invention.
  • FIG 3 is a schematic showing the operations of the Figure 2 illustrated system.
  • FIG. 1 illustrates a typical parsing system for a debugger.
  • the input stream 10 contains debugger commands and programming-language specific expressions.
  • the input stream is stored in a working data storage (not shown) of a data processing system (not shown) .
  • the scanner 11, a program serially fetches the characters of the input stream 10 and groups these characters to identify tokens.
  • the tokens are passed to debugger command language parser 12.
  • Each identified token and other groups of characters in the input stream have but one meaning and interpretation for the command language parser 12.
  • the three parsers ( Figure 2), C language expression parser 25, Fortran language expression parser 23 and command language parser 16 are created using a well-known parser generator called YACC (Yet Another Compiler Compiler).
  • YACC Yet Another Compiler Compiler
  • the YACC driver program (skeleton of
  • YACC enables semantic actions to be specified within grammar rules.
  • input stream 10 is first accessed by scanner 15 for parsing.
  • Scanner 15 is a debugger command language scanner for ft ching and grouping characters from the input stream 10.
  • Scanner 15 detected tokens are passed to command parser 1.
  • Command parser 16 only parses the syntax for the command language of the debugger. The decision to use a particular language expression parser is based upon a debugger command termed the language command (see Table I below) .
  • This debugger command designates which programming language parser is to be used for parsing ensuing language-specific syntax in input stream 10. The programming language parser chosen by the language command remains in effect until a new language command is seen by the command parser 16.
  • the language command is indicated by the LANGUAGE token while which one of the plurality of syntax is to be used is identified by the language command qualifier which. As shown in Table I, the qualifier which can indicate C, Fortran, or any future language.
  • the extensibility also maintains consistency in language used by the programmer when debugging. This combination, plus the later-described token analysis for selecting a parser for a given programming language or command expression, provides enhanced capability and flexibility in parsers.
  • SUBSTITUTE SHEET programming languages would be parsed by the combination of scanner 26 and future language parser 27. Additional programming language parsers can be added in a modular form as indicated by ellipsis 29. All of the scanner-parser combinations have a common access to input stream 10 via programming calls to or activations of the respective parsers 23, 25, 27 all as represented in Figure 2. When any of the parsers 23, 25, 27 complete the parsing, control is returned to the command parser 16.
  • Table I below includes an illustrative subset of a command language grammar.
  • Each expr is a place holder for the expression grammar to be parsed by one of the programming language parsers 23, 25 or 27.
  • the character strings BREAK, IF, THEN, ENDBREAK, PRINT, HEX, DECIMAL, LANGUAGE, C, FORTRAN, and ⁇ n are keywords or tokens.
  • Source_line is a token that represents a source line at which to stop.
  • the command language scanner 15 recognizes the tokens BREAK, IF, THEN, ENDBREAK, PRINT, HEX, DECIMAL, C, FORTRAN, LANGUAGE and source_line.
  • the command language grammar can be modified to enable parsing of additional programming languages. Such language additions occur in the "which :" portion of Table I. For each new programming language added, a separate parser is designed for the language.
  • the command parser 16 acts differently in the case of a command token such as the IF token, which stands alone, and a command token like the PRINT token, which may be followed by another qualifying command token (HEX or DECIMAL) .
  • a command token such as the IF token, which stands alone
  • a command token like the PRINT token which may be followed by another qualifying command token (HEX or DECIMAL) .
  • the command language parser 16 recognizes that the tokens following each IF token are a part of the expression, it calls the currently active language expression parser. Since an expression token rather than a command token will follow the IF token, command parser 16 does not look ahead and consequently does not return the lookahead token to input stream 10. In the case of the PRINT token, command parser 16 looks ahead to determine whether or not the next following token is a HEX or a DECIMAL token.
  • the token following PRINT token is a first token of a language expression.
  • command parser 16 writes the lookahead token back into input stream 10 and calls the currently active language parser 23, 25 or 27 to parse the expression beginning with the token following the PRINT
  • the command parser 16 When the lookahead token is either a HEX or DECIMAL token, the command parser 16 writes back the lookahead token (HEX OR DECIMAL) token, the command parser 16 writes back the lookahead token (HEX OR DECIMAL) and then calls the currently active language parser 23, 25 or 27 to parse subsequent expressions as a part of its semantic actions.
  • Table II below lists an illustrative subset for C language expression grammar. This subset grammar accepts expressions consisting of +, -, *, / and identifiers. Scanner 24 recognizes the +, -, /. * and c identifier tokens.
  • the scanners 22 and 24 use different rules for identifying
  • ITUTE SHEET language parser then reduces the expression such as by the rule expr : additive_expr.
  • command parser 16 finds the end-of-line marker ⁇ n and then resumes parsing at the first token beyond marker ⁇ n in data stream 10.
  • Figure 3 shows operation of the invention in a two-level parsing system having command parser 16 at a first or initial level and programming language parsers 23 and 25 in a second level.
  • parsers 23 and 25 appear to be identical, it being understood that the rules and token operations in the two parsers are different, as discussed above.
  • parser 23 is described in detail, that description applies equally to parser 25.
  • parsers 16 and 23 have been loaded into a computer, i.e. established for operation, the input stream 10 is in data storage mans, and the program is ready to execute.
  • Scanner 15 obtains a first token from input stream 10. The first sequence of tokens is taken by command parser 16 for shifting and reduction at step 40. Any syntactical error is detected at branch step 41.
  • the error rule is invoked with the error reported via step 42 to the user (programmer) interface 43. The end of the current line is then located such that parsing beginning with the next line in input stream 10 can ensue.
  • command parser 16 determines at step 45 whether
  • SUBSTITUTE SHEET or not the token following the examined token represents a programming language or is part of the command language.
  • Such determination may require a lookahead in input stream
  • parser 16 Upon successful reduction, the parsing of a command is complete. The cycle is started again for the next command in input stream 10. This processing continues until no more characters remain in input stream 10.
  • Command parser 16 calls one of the programming language parsers 23, 25 or 27, such as Fortran parser 23 as a part of its semantic actions.
  • Fortran parser 23 invokes scanner 22.
  • Scanner 22 serially fetches tokens from input stream 10.
  • Fortran parser 23 then operates on a sequence of tokens until it recognizes a complete expression or encounters a syntax error. Such errors are reported to user interface 43. If there are no errors and Fortran parser 23 recognized a complete expression, then Fortran parser 23 writes back any lookahead token to input stream 10 and returns control to command parser 16.
  • command parser 16 Upon retrieving control from Fortran parser 23, command parser 16 continues performing the shift/reduce operations until it successfully parses a command or encounters an error.
  • This action completes parsing one command. If there are more characters in input stream 10, the entire described cycle is repeated. A token that is not parsed is written back to input stream 10 such that a next parser can read the token from the input stream 10 such that a next parser can rei the token from the input stream via its scanner.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Cette invention concerne un procédé d'analyse syntaxique à deux niveaux, extensible, constitué de plusieurs analyseurs syntaxiques comprenant un analyseur syntaxique (16) de langage de commande et une pluralité d'analyseurs syntaxiques de langages de programmation (23, 25, 27, 29)) tels que le langage Fortran (23) ou le langage C (25). L'analyseur syntaxique (16) de commande analyse le langage jusqu'à ce qu'il reconnaisse un élément d'information spécifique à un langage, puis il appelle l'analyseur syntaxique approprié au langage de programmation. Chaque analyseur syntaxique incorporé possède sa propre table d'analyse distincte et son propre explorateur lexical (15, 22, 24 26). Tous les explorateurs lexicaux partagent un flux commun (10) de données d'entrées. Pour chaque analyseur syntaxique, la grammaire est plus simple que la grammaire globale couvrant un langage de commande et plusieurs expressions de langages de programmation. Cette simplicité améliore la rapidité et l'efficacité de l'analyse syntaxique. Le procédé utilise un procédé d'extension permettant d'incorporer la syntaxe spécifique à un langage de programmation lorsque c'est nécessaire, dans le cadre de la syntaxe de sous-programme de mise au point.
PCT/US1991/004071 1990-08-23 1991-06-10 Flux de donnees pour programme d'analyse syntaxique Ceased WO1992003782A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US57195490A 1990-08-23 1990-08-23
US571,954 1990-08-23

Publications (1)

Publication Number Publication Date
WO1992003782A1 true WO1992003782A1 (fr) 1992-03-05

Family

ID=24285742

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/004071 Ceased WO1992003782A1 (fr) 1990-08-23 1991-06-10 Flux de donnees pour programme d'analyse syntaxique

Country Status (1)

Country Link
WO (1) WO1992003782A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5903756A (en) * 1996-10-11 1999-05-11 Sun Microsystems, Incorporated Variable lookahead parser generator
WO2001018649A3 (fr) * 1999-09-03 2002-02-21 Cadence Design Systems Inc Procede et systeme permettant la compilation par division d'un programme de langages hybride
WO2003005193A3 (fr) * 2001-07-05 2003-12-04 Electronic Data Syst Corp Systeme et procede de comptage de lignes de code source
GB2394089A (en) * 2002-09-10 2004-04-14 Sun Microsystems Inc Parsing test results having diverse formats
US6959434B2 (en) * 1999-12-01 2005-10-25 International Business Machines Corporation Method of determining the syntactic correctness of expressions
US8850414B2 (en) 2007-02-02 2014-09-30 Microsoft Corporation Direct access of language metadata

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4686623A (en) * 1985-06-07 1987-08-11 International Business Machines Corporation Parser-based attribute analysis
US4833606A (en) * 1986-10-09 1989-05-23 Hitachi, Ltd. Compiling method for vectorizing multiple do-loops in source program
US4885684A (en) * 1987-12-07 1989-12-05 International Business Machines Corporation Method for compiling a master task definition data set for defining the logical data flow of a distributed processing network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4686623A (en) * 1985-06-07 1987-08-11 International Business Machines Corporation Parser-based attribute analysis
US4833606A (en) * 1986-10-09 1989-05-23 Hitachi, Ltd. Compiling method for vectorizing multiple do-loops in source program
US4885684A (en) * 1987-12-07 1989-12-05 International Business Machines Corporation Method for compiling a master task definition data set for defining the logical data flow of a distributed processing network

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5903756A (en) * 1996-10-11 1999-05-11 Sun Microsystems, Incorporated Variable lookahead parser generator
WO2001018649A3 (fr) * 1999-09-03 2002-02-21 Cadence Design Systems Inc Procede et systeme permettant la compilation par division d'un programme de langages hybride
US7010784B1 (en) 1999-09-03 2006-03-07 Cadence Design Systems, Inc. Method and system for split-compiling a hybrid language program
US7873948B2 (en) 1999-09-03 2011-01-18 Cadence Design Systems, Inc. Method and system for split-compiling a hybrid language program
US6959434B2 (en) * 1999-12-01 2005-10-25 International Business Machines Corporation Method of determining the syntactic correctness of expressions
WO2003005193A3 (fr) * 2001-07-05 2003-12-04 Electronic Data Syst Corp Systeme et procede de comptage de lignes de code source
GB2394089A (en) * 2002-09-10 2004-04-14 Sun Microsystems Inc Parsing test results having diverse formats
GB2394089B (en) * 2002-09-10 2004-11-10 Sun Microsystems Inc Parsing test results having diverse formats
US7191362B2 (en) 2002-09-10 2007-03-13 Sun Microsystems, Inc. Parsing test results having diverse formats
US8850414B2 (en) 2007-02-02 2014-09-30 Microsoft Corporation Direct access of language metadata

Similar Documents

Publication Publication Date Title
US6434742B1 (en) Symbol for automatically renaming symbols in files during the compiling of the files
US5048018A (en) Debugging parallel programs by serialization
US5581696A (en) Method using a computer for automatically instrumenting a computer program for dynamic debugging
US5193192A (en) Vectorized LR parsing of computer programs
US5651111A (en) Method and apparatus for producing a software test system using complementary code to resolve external dependencies
US6353925B1 (en) System and method for lexing and parsing program annotations
KR950006619B1 (ko) 번역 코드 실행용의 개선된 에러 기록 방법 및 시스템
US8099721B2 (en) Parsing of declarations in all branches of preprocessor conditionals
Cheatham Jr The introduction of definitional facilities into higher level programming languages
JPH05508494A (ja) ソフトウェア開発のためのコンピュータプログラムの統合階層表示
US6598181B1 (en) Method and system for debugging multiple function calls
US6330714B1 (en) Method and computer program product for implementing redundant lock avoidance
US5822589A (en) Method for locating errors in a computer program
WO1992003782A1 (fr) Flux de donnees pour programme d'analyse syntaxique
Grune et al. A programmer‐friendly LL (1) parser generator
Richards The BCPL programming manual
Leroy The Caml Light system release 0.74
US7318221B2 (en) Windows™ F-language interpreter
CN111767033A (zh) 用于机械臂程序开发的编程系统及功能扩展方法
US5029170A (en) Assembly language programming potential error detection scheme which recognizes incorrect symbolic or literal address constructs
JPH0784797A (ja) ロードモジュールへのソースコード行番号登録方法および装置
Leroy The caml light system release 0.71
Saxena et al. Compiler Design
Maliavko et al. Functionally Imperative Programming Language El and its Implementation
CN120255902A (zh) 一种基于dsp的机载软件编译过程的自动追溯方法

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE