
HK1022027B - Method and system for generating a logical form graph for a sentence in a natural language - Google Patents


Info

Publication number
HK1022027B
HK1022027B HK00100970.4A HK00100970A HK1022027B HK 1022027 B HK1022027 B HK 1022027B HK 00100970 A HK00100970 A HK 00100970A HK 1022027 B HK1022027 B HK 1022027B
Authority
HK
Hong Kong
Prior art keywords
syntactic
logical form
parse tree
semantic
word
Prior art date
Application number
HK00100970.4A
Other languages
Chinese (zh)
Other versions
HK1022027A1 (en)
Inventor
George Heidorn
Karen Jensen
Original Assignee
Microsoft Technology Licensing, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US08/674,610 external-priority patent/US5966686A/en
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of HK1022027A1 publication Critical patent/HK1022027A1/en
Publication of HK1022027B publication Critical patent/HK1022027B/en

Description

Method and system for generating logical form graph for natural language sentence
The present invention relates to the field of natural language processing ("NLP") and more particularly to a method and system for generating logical form graphs from syntax trees.
Computer systems that automatically process natural language use several subsystems, corresponding roughly to the linguistic domains of morphological, syntactic, and semantic analysis, to analyze input text and reach some level of machine understanding of it. Having understood the input text at some level, the computer system may, for example, suggest grammatical and stylistic changes to the input text, answer questions posed in the input text, or efficiently store information represented by the input text.
Morphological analysis identifies the input words and provides, for each word, the kind of information a human speaker of the language could obtain from a dictionary. Such information can include the syntactic roles a word can play (e.g., noun or verb) and the ways a word can be changed by adding prefixes or suffixes to generate different related words. For example, in addition to the word "fish," the dictionary may list related words derived from it, such as "fished," "fisher," "fisherman," "fishery," "fishbowl," "fishnet," and "fishlike."
As a starting point, the parsing process parses each input sentence using information provided by the morphological analysis of the input words and a set of syntactic rules that specify the grammar of the language in which the input sentence is written. The following are example syntactic rules:
sentence = noun phrase + verb phrase
noun phrase = adjective + noun
verb phrase = adverb + verb
Syntactic analysis attempts to find an ordered subset of the set of syntactic rules that, applied to the words of an input sentence, combines the words into phrases and the phrases into a complete sentence. For example, consider the input sentence: "Big dogs fiercely bit." Using the three simple rules listed above, syntactic analysis can identify the words "Big" and "dogs" as an adjective and a noun, respectively, and apply the second rule to generate the noun phrase "Big dogs"; it can identify the words "fiercely" and "bit" as an adverb and a verb, respectively, and apply the third rule to generate the verb phrase "fiercely bit." Finally, syntactic analysis applies the first rule to form a complete sentence from the previously generated noun and verb phrases. The result of syntactic analysis is usually represented as an acyclic tree branching downward, whose nodes represent the input words, punctuation marks, and phrases, and whose root node represents the complete sentence; this result is called a parse.
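The three example rules above can be sketched as data in a small program. The following is a minimal illustration (not the patent's implementation; the rule and tag names are invented) of how a reducer might combine the parts of speech of "Big dogs fiercely bit" into a complete sentence:

```python
# A minimal sketch of the three example syntactic rules, written as a tiny
# bottom-up reducer over part-of-speech tags. (Illustrative only.)
RULES = [
    (("noun_phrase", "verb_phrase"), "sentence"),  # sentence = NP + VP
    (("adjective", "noun"), "noun_phrase"),        # NP = adjective + noun
    (("adverb", "verb"), "verb_phrase"),           # VP = adverb + verb
]

def parse(tags):
    """Repeatedly replace an adjacent pair matching a rule with its phrase label."""
    tags = list(tags)
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            for lhs, rhs in RULES:
                if tuple(tags[i:i + 2]) == lhs:
                    tags[i:i + 2] = [rhs]
                    changed = True
                    break
            if changed:
                break
    return tags

# "Big dogs fiercely bit" -> adjective noun adverb verb -> sentence
print(parse(["adjective", "noun", "adverb", "verb"]))  # ['sentence']
```

A real parser would build tree nodes rather than discard the constituents, but the reduction order mirrors the rule applications described above.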
However, some sentences admit several different parses. An example of a sentence with multiple parses is: "Time flies like an arrow." There are at least three possible parses, corresponding to three possible meanings of the sentence. In the first parse, "time" is the subject of the sentence, "flies" is the verb, and "like an arrow" is a prepositional phrase that modifies the verb "flies." There are, however, at least two unexpected parses as well. In the second parse, "time" is an adjective that modifies "flies," "like" is a verb, and "an arrow" is the object of the verb. The meaning of this parse is that a certain type of fly, "time flies," likes or is attracted to an arrow. In the third parse, "time" is an imperative verb, "flies" is its object, and "like an arrow" is a prepositional phrase that modifies "time." This parse corresponds to a command to time flies the way one would time an arrow, perhaps using a stopwatch.
One or more hierarchical trees, called syntactic parse trees, are typically constructed during syntactic parsing. Each leaf node of a syntactic parse tree typically represents a word or punctuation mark of the input sentence. Application of a syntactic rule generates an intermediate-level node linked downward to one, two, or occasionally more existing nodes. The existing nodes initially comprise only leaf nodes, but as the syntactic analysis applies syntactic rules, the existing nodes come to include both leaf nodes and intermediate-level nodes. The single root node of a complete parse tree represents the complete sentence.
Semantic analysis produces a logical form graph that describes the meaning of the input text more deeply than a syntactic parse tree alone can. The logical form graph is a first attempt at understanding the input text at a level analogous to that reached by a human speaker of the language.
The logical form graph has nodes and links, but unlike the syntactic parse tree described above, it is not hierarchically ordered. Each link of the logical form graph carries a label that indicates the relationship between a pair of nodes. For example, semantic analysis may identify certain nouns in a sentence as deep subjects or deep objects of verbs. The deep subject of a verb is the actor of the action, and the deep object of a verb is the object of the action specified by the verb. The deep subject of an active verb may be the syntactic subject of the sentence, and the deep object of an active verb may be the syntactic object of the verb. By contrast, the deep subject of a passive verb may be expressed in a prepositional "by" phrase, while the deep object of a passive verb may be the syntactic subject of the sentence. For example, consider two sentences: (1) "Dogs bit people" and (2) "People were bitten by dogs." The first sentence has an active verb and the second a passive verb. The syntactic subject of the first sentence is "Dogs," and the syntactic object of the verb "bit" is "people." In contrast, the syntactic subject of the second sentence is "People," and the verb phrase "were bitten" is modified by the prepositional phrase "by dogs." For both sentences, "dogs" is the deep subject, and "people" is the deep object, of the verb or verb phrase of the sentence. Although the parse trees generated by syntactic analysis of sentences 1 and 2 above are different, the logical form graph generated by semantic analysis is the same, because the underlying meaning of the two sentences is the same.
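The point that both sentences share one deep structure can be illustrated with a toy representation (purely hypothetical; the patent's node records are far richer). Here a logical form graph is reduced to labeled, directed edges, and the active and passive sentences produce the same graph:

```python
# Hypothetical sketch: a logical form graph as labeled, directed edges keyed
# by (relation, verb). Both the active and passive sentences reduce to the
# same deep structure.
def logical_form(verb, deep_subject, deep_object):
    return {("Dsub", verb): deep_subject, ("Dobj", verb): deep_object}

# (1) "Dogs bit people."  (2) "People were bitten by dogs."
active  = logical_form("bite", "dog", "person")
passive = logical_form("bite", "dog", "person")  # the by-phrase supplies the deep subject
print(active == passive)  # True
```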
Further semantic processing after the logical form graph is generated may rely on a knowledge base to link the analyzed documents with real world concepts to achieve a deeper understanding. An example of a knowledge base is an online encyclopedia from which more sophisticated definitions and context information for a particular word can be obtained.
The three NLP subsystems (morphological, syntactic, and semantic) are described below in the context of the example input sentence "The person whom I met was my friend." FIG. 1 is a block diagram illustrating the flow of information between the NLP subsystems. The morphological subsystem 101 receives the input text and outputs, for each word, the identified parts of speech in which the word can be used and its meanings; the syntactic subsystem 102 receives this information and applies syntactic rules to generate a syntactic parse tree. The semantic subsystem 103 receives the syntactic parse tree and generates a logical form graph.
FIGS. 2-5 show the dictionary information, stored on an electronic storage medium, retrieved for the input words of the example input sentence during morphological analysis. FIG. 2 shows the dictionary entries for the input words "the" 201 and "person" 202. Entry 201 includes the keyword "the" and a list of attribute/value pairs. The first attribute, "Adj" 204, has as its value the symbols contained within brackets 205 and 206. These symbols in turn comprise two attribute/value pairs: (1) "Lemma"/"the" and (2) "Bits"/"singPlur Wa6 Det Art Bo def". A lemma is the base, uninflected form of a word. Thus the attribute "Lemma" indicates that "the" is the base, uninflected form of the word represented by this entry in the dictionary. The attribute "Bits" comprises a group of abbreviations representing certain morphological and syntactic information about the word. This information indicates that "the" is: (1) singular, (2) plural, (3) not inflected, (4) a determiner, (5) an article, (6) a plain adjective, and (7) definite. Attribute 204 indicates that the word "the" can be used as an adjective, and attribute 212 indicates that it can be used as an adverb. The attribute "Senses" 207 represents the different meanings of the word, each with its own definitions and examples, a portion of which is included in the lists of attribute/value pairs between brackets 208 and 209 and between brackets 210 and 211. Much of the actual content of the entry for "the" has been omitted from FIG. 2, as indicated by the parenthetical note "(more sense records)" 213.
In the first step of natural language processing, the morphological subsystem recognizes each word and punctuation mark of the input text as a separate token and, using the dictionary information, constructs an attribute/value record for each part of speech of each token. Attributes are fields of a record, each of which may take one of a number of values specified for that particular attribute. These attribute/value records are then passed to the syntactic subsystem for further processing, where they serve as the leaf nodes of the syntactic parse tree constructed by the syntactic subsystem. All nodes of the syntactic parse tree, and all logical form graph nodes subsequently constructed by the NLP subsystems, are attribute/value records.
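As a rough illustration of such a record (the field names are invented, loosely echoing the entry of FIG. 2), an attribute/value record can be modeled as a dictionary whose fields a rule can test:

```python
# Hypothetical attribute/value record for one part of speech of "the",
# loosely mirroring the dictionary entry of FIG. 2 (field names illustrative).
the_adj = {
    "Cat": "Adj",
    "Lemma": "the",
    "Bits": ["Sing", "Plur", "Det", "Art", "Def"],
}

def has_bit(record, bit):
    """A rule's condition test: does this record carry a given feature bit?"""
    return bit in record["Bits"]

print(has_bit(the_adj, "Det"))  # True
```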
The syntactic subsystem applies syntactic rules to the leaf nodes passed to it by the morphological subsystem in order to construct the higher-level nodes of a possible syntactic parse tree representing the input text. A complete parse tree comprises a root node, intermediate-level nodes, and leaf nodes. The root node represents the syntactic construct (e.g., a declarative sentence) of the input text. The intermediate-level nodes represent intermediate syntactic constructs (e.g., verb, noun, or prepositional phrases). The leaf nodes represent the initial attribute/value records.
In some NLP systems, syntactic rules are applied in a top-down manner. The syntactic subsystem of the NLP system described herein applies syntactic rules to the leaf nodes in a bottom-up manner. That is, the syntactic subsystem attempts to apply syntactic rules one at a time to single leaf nodes, to pairs of leaf nodes, and occasionally to larger groups of leaf nodes. If a syntactic rule requires two leaf nodes on which to operate, and if a pair of leaf nodes both have attributes that comply with the requirements specified in the rule, the rule is applied to them to construct a higher-level syntactic construct. For example, the words "my" and "friend" might represent an adjective and a noun, respectively, that can be combined into the higher-level syntactic construct of a noun phrase. A syntactic rule corresponding to the grammar rule "noun phrase = adjective + noun" would create an intermediate-level noun phrase node and link the two leaf nodes representing "my" and "friend" to the newly created intermediate-level node. As each new intermediate-level node is created, it is linked to existing leaf nodes and intermediate-level nodes and becomes part of the total set of nodes to which the syntactic rules are applied. The process of applying syntactic rules to ever-larger combinations of nodes continues until either a complete parse tree is generated or no more syntactic rules can be applied. A complete parse tree contains all of the words of the input sentence as its leaf nodes and represents one possible parse of the sentence.
This bottom-up syntactic analysis approach creates many middle-level nodes and subtrees that may never be included in the final complete syntactic analysis tree. In addition, the parsing method can simultaneously generate a plurality of complete parsing trees.
Rather than continuing to apply rules indefinitely in search of all possible complete syntactic parse trees until no more rules can be applied, the syntactic subsystem may apply heuristics to generate the most likely nodes first. After one or a few complete parse trees have been generated, the syntactic subsystem can generally terminate the search, because the parse tree most likely to best represent the input sentence is usually among the first parse trees generated. If no complete parse tree has been generated after a reasonable amount of searching, the most promising subtrees can be combined into a single tree, whose root node is generated using certain aggregation rules; this can still yield a serviceable parse.
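The bottom-up construction described above can be sketched as follows (a hypothetical data layout, not the patent's record format): each rule application creates a new intermediate-level node linked downward to its constituent nodes.

```python
# Sketch of bottom-up node building: each applied rule creates a new
# intermediate-level node linked downward to its constituents.
# (Node shape is illustrative only.)
def make_node(label, children):
    return {"label": label, "children": children}

# Leaf nodes for "my friend"
my     = make_node("Adj",  [])
friend = make_node("Noun", [])

# Applying the rule  noun phrase = adjective + noun  creates an
# intermediate-level node linked to the two leaves.
np = make_node("NP", [my, friend])
print(np["label"], [c["label"] for c in np["children"]])  # NP ['Adj', 'Noun']
```

Intermediate nodes built this way that never reach a complete parse are simply left unused, as the text notes.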
FIG. 6 illustrates the initial leaf nodes created by the syntactic subsystem from the dictionary entries shown in FIGS. 2-5. The leaf nodes include two special nodes, 601 and 614, that represent the beginning of the sentence and the period that ends the sentence, respectively. Each of the nodes 602-613 represents a single part of speech in which an input word can be used in the sentence. These parts of speech are found in the dictionary entries as attribute/value pairs. For example, leaf nodes 602 and 603 represent the two possible parts of speech of the word "The," which are found in FIG. 2 as attributes 204 and 212.
Fig. 7-22 show the construction of the final syntax analysis tree by the syntax subsystem on a rule-by-rule basis. Each drawing illustrates the application of a single syntactic rule to generate an intermediate level node representing the syntactic structure. Only those rules that produce the intermediate level nodes that are used to compose the final parse tree are set forth. The syntax subsystem generates a number of intermediate level nodes that are not ultimately included in the final syntax analysis tree.
In FIGS. 7-14, the syntactic subsystem applies unary rules to create intermediate-level nodes representing simple verb, noun, and adjective phrases. Beginning with FIG. 15, the syntactic subsystem applies binary syntactic rules to combine the simple verb, noun, and adjective phrases into multi-word syntactic constructs. The syntactic subsystem orders the rules by their likelihood of successful application and then attempts to apply them one by one until a rule is found that can be successfully applied to the existing nodes. For example, as shown in FIG. 15, the syntactic subsystem successfully applies a rule that creates a node representing a noun phrase from an adjective phrase and a noun phrase. The rule specifies the characteristics that the adjective phrase and the noun phrase must possess; the adjective phrase in this example must be a determiner. By following the pointer from node 1501 back to node 1503 and then accessing the morphological information contained in node 1503, the syntactic subsystem determines that node 1501 does represent a determiner. Having found two nodes, 1501 and 1502, with the characteristics required by the rule, the syntactic subsystem applies the rule to create an intermediate-level node representing the noun phrase "my friend" from the two simple phrases 1501 and 1502. In FIG. 22, the syntactic subsystem applies a ternary rule that combines the special BEGIN1 leaf node 2201, the verb phrase "The person whom I met was my friend" 2202, and the leaf node 2203 representing the final period into a node 2204 representing a declarative sentence, thereby generating the final complete syntactic parse tree representing the input sentence.
The semantic subsystem generates a logical form graph from a complete syntactic parse tree. In some NLP systems, the logical form graph is built out of the nodes of the syntactic parse tree, to which attributes and new bidirectional links are added. The logical form graph is a labeled, directed graph. It is a semantic representation of the input sentence. Because the nodes of the logical form graph refer back to the leaf nodes of the syntactic parse tree, the information obtained by the morphological subsystem for each word remains available. Both the directions and the labels of the links of the logical form graph represent semantic information, including the functional roles of the nodes of the logical form graph. During its analysis, the semantic subsystem adds links and nodes to represent (1) omitted but implied words; (2) missing or unclear arguments and modifiers of verb phrases; and (3) the referents of pronouns.
FIG. 23 illustrates the complete logical form graph generated by the semantic subsystem for the example input sentence. As a result of the successful application of semantic rules, the semantic subsystem assigns meaningful labels to the links 2301-2306. The six nodes 2307-2312 and the links between them represent the principal elements of the semantic meaning of the sentence. In general, the logical form nodes correspond roughly to the input words, but words not needed to express the semantic meaning, such as "The" and "whom," do not appear in the logical form graph, while the input verbs "met" and "was" appear in their uninflected forms "meet" and "be." These nodes are represented by records in the computer system and contain additional information not shown in FIG. 23. The fact that the verbs were input in the singular past tense is indicated by additional information in the logical form nodes 2307 and 2310 corresponding to the meanings of the verbs.
Comparing FIG. 23 with FIG. 22, the differences between a syntactic parse tree and a logical form graph are readily seen. The syntactic parse tree of FIG. 22 comprises 10 leaf nodes and 16 intermediate-level nodes linked together in a strict hierarchy, whereas the logical form graph of FIG. 23 comprises only 6 nodes. Unlike the syntactic parse tree, the logical form graph is not hierarchically ordered, as is evident from the two opposite-direction links between nodes 2307 and 2308. Furthermore, as noted above, the nodes no longer represent the exact forms of the input words but rather their meanings.
Semantic analysis is followed by further natural language processing steps. These involve combining logical form graphs with additional information obtained from a knowledge base, analyzing groups of sentences together, and generally attempting to assemble around each logical form graph a rich contextual environment approaching the one within which humans process natural language.
The prior art methods for generating logical form diagrams involve computationally complex adjustments and manipulations of the syntax analysis trees. As a result, it is increasingly difficult to add new semantic rules to NLP systems. Adding new rules involves new process logic that may conflict with process logic that has been programmed in the semantic subsystem. Furthermore, because the nodes of the syntactic parse tree are extended and reused as nodes of the logical form graph, the semantic subsystems of the prior art produce large, cumbersome, complex data structures. The size and complexity of the logical form graph overlaid on the syntax analysis tree makes the combined data structure prone to errors and inefficiencies in further use. It is therefore desirable to have a more extensible and manageable semantic subsystem for generating simple logical form graph data structures.
The invention relates to a method and a system for performing semantic analysis on an input sentence in an NLP system. The semantic analysis subsystem receives a syntactic analysis tree generated by the morphological and syntactic subsystem. The semantic analysis subsystem applies two sets of semantic rules to adjust the received syntactic analysis tree. The semantic analysis subsystem then applies a third set of semantic rules to build a frame logical form graph from the syntactic analysis tree. Finally, the semantic analysis subsystem applies two sets of additional semantic rules to the frame logical form graph to provide semantically meaningful labels for chains of the logical form graph, to establish additional logical form graph nodes for omitted nodes, and to unify redundant logical form graph nodes. The final logical form graph generated by the semantic analysis subsystem represents the complete semantic analysis of the input sentence.
Fig. 1 is a block diagram illustrating the flow of information between subsystems of an NLP system.
FIGS. 2-5 show the example dictionary entries, stored on an electronic storage medium, retrieved for each word of the input sentence "The person whom I met was my friend."
Fig. 6 shows leaf nodes generated by the syntax subsystem in a first step of parsing an input sentence.
Fig. 7-22 show the process by which the syntactic subsystem continuously applies syntactic rules to parse an input sentence and generate a syntactic parse tree.
FIG. 23 illustrates a logical form diagram generated by the semantic subsystem to represent the meaning of an input sentence.
FIG. 24 shows a block diagram illustrating a preferred computer system for natural language processing.
FIG. 25 illustrates three phases of a new preferred semantic subsystem.
FIG. 26 is a flow diagram of a New Semantic Subsystem (NSS).
FIG. 27 shows a first set of semantic rules.
FIG. 28A shows a detailed description of the semantic rule PrLF_You in the first set of semantic rules.
FIG. 28B shows an application example of the semantic rule PrLF_You in the first set of semantic rules.
FIG. 29 shows a second set of semantic rules.
FIGS. 30A-30B show a detailed description of the semantic rule TrLF_MoveProp in the second set of semantic rules.
FIG. 30C shows an example of the application of the semantic rule TrLF_MoveProp in the second set of semantic rules.
FIG. 31 shows a flow chart of the apply_rules routine.
FIG. 32 shows a flow chart for the first stage of NSS.
FIG. 33 shows a third set of semantic rules.
FIGS. 34A-C show a detailed description of the semantic rule SynToSem1 in the third set of semantic rules.
FIG. 34D shows an example of the application of the semantic rule SynToSem1 in the third set of semantic rules.
FIG. 35 shows a flow chart for the second stage of NSS.
FIGS. 36-38 show a fourth set of semantic rules.
FIG. 39A shows a detailed description of the semantic rule LF_Dobj2 in the fourth set of semantic rules.
FIG. 39B shows an example of an application of the semantic rule LF_Dobj2 in the fourth set of semantic rules.
FIG. 40 shows a fifth set of semantic rules.
FIGS. 41A-C show a detailed description of the semantic rule PsLF_PronAnaphora in the fifth set of semantic rules.
FIG. 41D shows an application example of the semantic rule PsLF_PronAnaphora in the fifth set of semantic rules.
Fig. 42 shows a flow chart of the third stage of NSS.
FIG. 43 is a block diagram of a computer system of the NSS.
FIGS. 44-59 show each rule successfully applied by the NSS as it processes the parse tree generated for the example input sentence.
The present invention provides a new semantic method and system for generating logical form graphs from syntax trees. In a preferred embodiment, the New Semantic Subsystem (NSS) performs semantic analysis in three stages: (1) filling and adjusting the syntactic parse tree, (2) generating a logical form graph, and (3) generating meaningful labels and constructing a complete logical form graph for the chain of logical form graphs. Each phase includes applying one or two sets of rules to either a set of syntax tree nodes or a set of logical form graph nodes.
The NSS addresses the shortcomings of the prior art semantic subsystems noted in the background section above. Each phase of the NSS uses a simple and extensible rule-based approach. As additional linguistic phenomena are recognized, the rules that handle them can easily be added to one of the sets of rules that the NSS applies. In addition, the second stage of the NSS generates a completely separate logical form graph rather than overlaying the logical form graph on an existing parse tree. As a result, the logical form graph data structure generated by the NSS is simple and space-efficient compared with prior art logical form graph data structures.
FIG. 24 is a block diagram illustrating a preferred computer system for NLP systems. Computer system 2401 includes a central processing unit, a memory, a storage device, and input/output devices. The NLP subsystems 2406-2409 are typically loaded into memory 2404 from a computer-readable storage device, such as a disk. Applications 2405 that use the services provided by the NLP system are also typically loaded into memory. Electronic dictionary 2411 is stored on a storage device, such as disk 2410, and its entries are read into memory for use by the morphological subsystem. In one embodiment, the user typically enters one or more natural language sentences on input device 2404 in response to prompts displayed on output device 2403. The natural language sentences are received and processed by the application program and then sent to the NLP system through the morphological subsystem 2406. The morphological subsystem uses the information in the electronic dictionary to construct records describing each input word and passes these records to the syntactic subsystem 2407. The syntactic subsystem parses the input words to construct a syntactic parse tree and sends the syntactic parse tree to the semantic subsystem 2408. The semantic subsystem generates a logical form graph from the received parse tree and sends the logical form graph to the other NLP subsystems 2409. The application can then exchange information with the other NLP subsystems 2409 to take advantage of the machine understanding of the input text achieved by the NLP system and finally output a response to the user on output device 2403.
FIG. 25 illustrates the three phases of the preferred new semantic subsystem. Stages 1-3 of the NSS are shown at 2502, 2504, and 2506, respectively. The states of the relevant data structures input to and output from each stage of the NSS are shown in FIG. 25 as labels 2501, 2503, 2505, and 2507. The NSS receives a syntactic parse tree 2501 generated by the syntactic subsystem. The first stage 2502 of the NSS fills out and adjusts the parse tree using semantic rules and passes the completed parse tree 2503 to the second stage 2504. The second stage generates an initial logical form graph 2505 and sends it to the third stage 2506. The third stage applies semantic rules to the initial logical form graph to add meaningful semantic labels to its links, to add new links and nodes that fill out the semantic representation of the input sentence, and occasionally to remove redundant nodes. The complete logical form graph 2507 is then sent to other NLP subsystems for further interpretation of the input sentence represented by the logical form graph, for answering questions, or for storing data based on the input sentence.
The flow diagram for NSS is shown in fig. 26, which shows the sequential invocation of three phases 2601, 2602, and 2603 of NSS. Each stage of NSS will be described in detail below.
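The three sequential phases can be sketched as a composition of three functions (the function names and data shapes are invented placeholders for the stages shown in FIG. 26, not the patent's actual code):

```python
# Hypothetical sketch of the three-stage NSS flow of FIG. 26.
def adjust_parse_tree(tree):
    # stage 1: apply the first and second rule sets to fill out/adjust the tree
    return dict(tree, adjusted=True)

def build_initial_graph(tree):
    # stage 2: build a skeletal logical form graph from the adjusted tree
    return {"nodes": [], "links": [], "source": tree}

def label_and_complete(graph):
    # stage 3: label links, add omitted nodes, merge redundant nodes
    return dict(graph, complete=True)

def nss(parse_tree):
    return label_and_complete(build_initial_graph(adjust_parse_tree(parse_tree)))

print(nss({"root": "DECL"})["complete"])  # True
```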
NSS first stage: completing and adjusting the syntactic parse tree.
In the first stage of the NSS, the NSS applies two different sets of semantic rules to the nodes of the parse tree received from the syntactic subsystem in order to modify the parse tree. These semantic rules can change the link structure of the parse tree or add new nodes.
The NSS applies the first set of semantic rules to handle various possible omissions and gaps that syntactic analysis cannot resolve. Applying this first set of semantic rules performs a preliminary adjustment of the input syntactic parse tree. The linguistic phenomena handled by the first set of semantic rules include verbs omitted after the words "to" or "not" but understood by listeners as implied; pronouns such as "you" or "we" omitted from imperative sentences; expansions related to coordinate structures built with "and" or "or"; and omitted objects and elided verb phrases. FIG. 27 lists a preferred first set of semantic rules that the NSS applies during the first phase. The name of each rule is shown, followed by a precise description of the linguistic phenomenon it handles.
The general form of each semantic rule is a set of conditions on a syntactic parse tree node or logical form graph node together with a series of operations on the syntactic parse tree or logical form graph. For example, the NSS tests the conditions of each rule in the first set of semantic rules against the series of syntactic records representing the syntactic parse tree, and, for each rule whose conditions are all satisfied, the NSS performs the series of operations contained in the rule, thereby changing the syntactic parse tree in a specific way. Of course, the actual form of each semantic rule depends on the representation details of the parse tree and the logical form graph, which may have many different representations. In the figures that follow, a semantic rule is described by a boldface "If" followed by a conditional expression, then a boldface "Then" followed by a series of operations. The "If" part of a semantic rule states the conditions that a parse tree node or logical form graph node must satisfy, in their entirety, for the rule to apply, and the "Then" part states the series of operations performed on the parse tree or logical form graph. The expressions shown correspond closely to the computer source code expressions of the semantic rules.
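This condition/operation format can be illustrated with a toy rule object (a hypothetical representation; the patent expresses rules as programming language statements): the "If" part is a predicate over a node, and the "Then" part is an action that mutates the tree or node.

```python
# Hypothetical sketch of the If/Then semantic rule format.
class Rule:
    def __init__(self, name, condition, action):
        self.name, self.condition, self.action = name, condition, action

    def try_apply(self, node, tree):
        if self.condition(node):      # the "If" part: conditions on the node
            self.action(node, tree)   # the "Then" part: operations on the tree
            return True
        return False

# Toy rule: mark any imperative node as having an implied subject.
rule = Rule(
    "MarkImplied",
    condition=lambda n: n.get("mood") == "imperative",
    action=lambda n, t: n.update(implied_subject="you"),
)
node, tree = {"mood": "imperative"}, {}
print(rule.try_apply(node, tree), node["implied_subject"])  # True you
```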
FIG. 28A shows an English representation of the semantic rule PrLF_You in the first set of semantic rules. As can be seen in FIG. 28A, the "If" expression refers to various attribute values of the parse tree node to which the rule is applied, and the "Then" expression specifies the creation of a pronoun node for the lexical entry "you," the creation of a noun phrase parent for the pronoun node, and the addition of the created nodes to the parse tree.
FIG. 28B shows an example of applying the semantic rule PrLF_You to a syntactic parse tree 2801 generated by the syntactic subsystem for the sentence "Please close the door." The result of applying PrLF_You is a modified parse tree 2802 with two new nodes, 2803 and 2804, linked to the root node of the sentence. The purpose of this semantic rule is to place the unexpressed "you" of the imperative sentence explicitly into the parse tree.
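The effect of a PrLF_You-style adjustment can be sketched on a toy tree (the node layout is illustrative only, not the patent's structure): a pronoun node for "you" is created under a new noun phrase node, which is then attached to the sentence's root.

```python
# Sketch of a PrLF_You-style tree adjustment: attach a new noun-phrase node
# for the implied "you" under the sentence's root node. (Illustrative only.)
def add_implied_you(root):
    pron = {"label": "Pron", "lemma": "you", "children": []}
    np   = {"label": "NP", "children": [pron]}
    root["children"].append(np)
    return root

root = {"label": "IMPR", "children": [{"label": "VP", "children": []}]}
add_implied_you(root)
print([c["label"] for c in root["children"]])  # ['VP', 'NP']
```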
After applying all applicable rules of the first set of semantic rules to the input parse tree, the NSS applies the second set of semantic rules to the nodes of the preliminarily adjusted parse tree to perform the primary adjustment of that parse tree. This second set includes rules for identifying and resolving long-distance attachment phenomena, for converting verb phrases into verbs with prepositional phrase objects, and, in certain cases, for replacing the word "it" with an infinitive clause.
FIG. 29 lists a preferred second set of semantic rules that the NSS applies in the first phase. The rule name of each rule is shown, followed by a precise description of the linguistic phenomenon it processes. FIGS. 30A-30B show an English representation of the semantic rule TrLF_MoveProp in the second set of semantic rules. As can be seen in FIGS. 30A-30B, the "If" expression tests various attribute values of the parse tree node to which the rule applies and of various related parse tree nodes, and the "Then" expression specifies a rather complex rearrangement of the parse tree.
FIG. 30C shows an example of applying the semantic rule TrLF_MoveProp to a parse tree 3001 generated by the syntactic subsystem for the sentence "I have no destination to see the man". The result of applying TrLF_MoveProp is a modified syntactic parse tree 3002. The infinitive clause represented by node 3003 in the original parse tree has moved from its position as a child of node 3004 to the position of child node 3005 of the root node DECL1 3006 of the modified parse tree. The purpose of this semantic rule is to move clauses, such as the infinitive clause 3003, from lower levels of the syntax tree to higher levels in order to facilitate the subsequent transition from the syntactic parse tree to the logical form graph.
In a preferred embodiment of the invention, semantic rules are statements in a programming language that, when executed, create a new tree or graph node from one, two, or sometimes more existing tree or graph nodes and create appropriate chains between the newly created node and the existing tree or graph nodes. In the preferred embodiment, the left-hand side of a semantic rule specifies the characteristics that an existing node or nodes must possess for the rule to apply. The right-hand side of the semantic rule specifies the type of new node to be created and the attribute values of the new node. The rules described in FIGS. 28 and 30 illustrate this form.
In a preferred embodiment of the present invention, each syntactic parse tree and each logical form graph is represented as a set of nodes, with the chains between the nodes represented by attribute values within the nodes. Each set of rules is represented as a table. The step of applying a set of rules to the syntactic parse tree involves selecting successive nodes from the set of nodes and attempting to apply each rule of the table representing the set of rules to each selected node. A rule can be successfully applied to a node if the node has the characteristics specified in the left-hand side of the rule. The result of successfully applying a rule is sometimes the creation of a new node, and sometimes the marking of an existing node as deleted.
A flow diagram of the subroutine "apply_rules", which applies a set of rules to the set of nodes representing a syntactic parse tree or logical form graph, is shown in FIG. 31. The NSS calls the subroutine "apply_rules" in each of its three phases to apply each set of rules. In step 3101, apply_rules receives a set of nodes as its first parameter and a set of rules as its second parameter. Steps 3102 through 3107 represent an outer loop that, on each iteration, attempts to apply all the rules of the input set of rules to a successive node selected from the input set of nodes. Steps 3103 through 3106 represent an inner loop that, on each iteration, attempts to apply a rule selected from the input set of rules to the selected node. In step 3102, apply_rules, starting with the first node, selects the next node from the input set of nodes. In step 3103, apply_rules, starting with the first rule, selects the next rule from the input set of rules. In step 3104, apply_rules determines whether the selected node has the characteristics specified by the left-hand side of the selected rule. If the node has the specified characteristics, then apply_rules applies the selected rule to the selected node in step 3105. If apply_rules determines in step 3106 that there are more rules to try for the selected node, then apply_rules returns to step 3103 to select the next rule. If apply_rules determines in step 3107 that there are more nodes on which to try the input set of rules, then apply_rules returns to step 3102 to select the next node.
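The control flow of FIG. 31 can be sketched as the guarded double loop below. The encoding of rules as (condition, operations) pairs and of nodes as simple objects is an assumption made for illustration; the patent does not prescribe a particular representation.

```python
# Sketch of the "apply_rules" subroutine of figure 31. The outer loop over
# nodes corresponds to steps 3102/3107, the inner loop over rules to steps
# 3103/3106, the condition test to step 3104, and the application to step
# 3105. Rules are assumed to be (condition, operations) pairs.

def apply_rules(nodes, rules):
    """Attempt to apply every rule in `rules` to every node in `nodes`."""
    for node in nodes:                       # select the next node
        for condition, operations in rules:  # select the next rule
            if condition(node):              # left-hand side satisfied?
                operations(node)             # apply the rule to the node

# Tiny demonstration with one rule that upper-cases the label of leaf nodes.
class DemoNode:
    def __init__(self, label, is_leaf):
        self.label = label
        self.is_leaf = is_leaf

demo_nodes = [DemoNode("np1", True), DemoNode("decl1", False)]
apply_rules(demo_nodes, [(lambda n: n.is_leaf,
                          lambda n: setattr(n, "label", n.label.upper()))])
```

After the call, only the leaf node's label has been rewritten; the non-leaf node is untouched because its rule condition failed at step 3104.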
A flowchart of the processing performed in the first phase of the NSS is shown in FIG. 32. In step 3201, the variable "parameter1" is assigned the table of parse tree nodes that make up the parse tree generated by the syntactic subsystem and input to the NSS. In step 3202, the variable "parameter2" is assigned the table of the first set of semantic rules shown in FIG. 27. In step 3203, the NSS calls the subroutine "apply_rules", passing it the variables "parameter1" and "parameter2". The subroutine "apply_rules" applies the first set of semantic rules to the parse tree to perform the preliminary adjustment. In step 3204, the variable "parameter1" is assigned the table of parse tree nodes that make up the preliminarily adjusted parse tree. In step 3205, the variable "parameter2" is assigned the table of the second set of semantic rules shown in FIG. 29. In step 3206, the NSS calls the subroutine "apply_rules", passing it the variables "parameter1" and "parameter2". The subroutine "apply_rules" applies the second set of semantic rules to the parse tree to perform the primary adjustment.
NSS second phase - generating the initial logical form graph
In the second phase, the NSS applies a third set of semantic rules to the adjusted syntax tree nodes. Each successful rule application in the second phase can create a new logical form graph node. By applying this third set of rules, the NSS builds a new logical form graph. The logical form graph nodes include only semantically meaningful attributes and a pointer back to the corresponding syntax tree node. Unlike prior art semantic subsystems, the logical form graph nodes built by the NSS in the second phase are completely separate from the parse tree nodes. The NSS constructs a framework logical form graph, including the chains, stored as attributes within the nodes, that interconnect the logical form graph nodes.
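The separation described above can be sketched as two distinct node types: a parse tree node, and a logical form graph node that carries only semantically meaningful attributes, a back-pointer to its parse tree node, and its chains stored as attribute values. All field names below are invented for illustration; they are not the system's actual data layout.

```python
# Hypothetical sketch of the two separate node types described above.
# Chains between logical form nodes are stored as attribute values,
# per the text; field names are invented.

class ParseTreeNode:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

class LogicalFormNode:
    def __init__(self, lemma, syntax_node):
        self.lemma = lemma              # e.g. "meet" for a clause headed by "met"
        self.syntax_node = syntax_node  # pointer back to the parse tree node
        self.attributes = {}            # only semantically meaningful attributes
        self.chains = {}                # label -> [LogicalFormNode], e.g. "Tmods"

    def add_chain(self, label, target):
        self.chains.setdefault(label, []).append(target)

# Building a fragment of the framework: "meet" with a temporary chain to "I".
relcl1 = ParseTreeNode("RELCL1")
np3 = ParseTreeNode("NP3")
meet = LogicalFormNode("meet", relcl1)
i_node = LogicalFormNode("I", np3)
meet.add_chain("Tmods", i_node)
```

Because the graph nodes are separate objects, the parse tree survives intact while the logical form graph is assembled alongside it, reachable from each graph node through the back-pointer.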
A table of the third set of semantic rules applied by the NSS in the second phase is shown in FIG. 33. FIG. 33 shows the rule name of each rule followed by a precise description of the syntactic phenomenon it handles. There are only three rules in this third set, and only the first rule, SynToSem1, is commonly used. The second and third rules are applied only in the special case where the syntactic subsystem generates a fitted parse, so that the adjusted syntax tree contains a fitted parse node.
FIGS. 34A-34C show an English representation of the semantic rule SynToSem1 in the third set of semantic rules. As can be seen in FIGS. 34A-34C, the "If" expression tests various attribute values of the parse tree node to which the rule applies and of various related parse tree nodes, and the "Then" expression specifies the creation of a logical form graph node and the placement of the new node in the emerging logical form graph.
FIG. 34D shows an example of applying the semantic rule SynToSem1 to a syntactic parse tree 3401 generated by the syntactic subsystem for the sentence "The book was written by John". The result of applying SynToSem1 is the framework logical form graph 3402. The framework logical form graph has three nodes joined by chains bearing the temporary modifier label. Attributes are assigned to the new nodes according to the syntactic attributes of the parse tree nodes from which they were built. The logical form graph has far fewer nodes than the corresponding syntactic parse tree, since the logical form graph represents the semantic meaning of the sentence. The grammatical meaning of the words "the", "was", and "by" in the original sentence is, or will be, captured in the attributes and labels of the logical form graph, and thus need not be carried by the complex hierarchy of nodes above the leaf nodes in the syntactic parse tree.
FIG. 35 shows a flowchart for the second phase of the NSS. In step 3501, the variable "parameter1" is assigned the set of nodes representing the adjusted parse tree, and in step 3502, the variable "parameter2" is assigned the table of the third set of semantic rules shown in FIG. 33. In step 3503, the NSS calls the subroutine "apply_rules" to apply the third set of semantic rules to the adjusted parse tree nodes, creating a new logical form graph corresponding to the adjusted parse tree.
NSS third phase - completing the logical form graph
In the third phase, the NSS applies a fourth set of semantic rules to the framework logical form graph to add semantically meaningful labels to the chains of the logical form graph. These new labels include "deep subject" ("Dsub"), "deep object" ("Dobj"), "deep indirect object" ("Dind"), "deep predicate nominative" ("Dnom"), "deep complement" ("Dcmp"), and "deep predicate adjective" ("Dadj"). The fourth set of semantic rules that the NSS applies in the third phase is shown in FIGS. 36-38. FIGS. 36-38 show the rule name of each rule followed by a precise description of the grammatical phenomenon it processes.
FIG. 39A shows an English representation of the semantic rule LF_Dobj2 in the fourth set of semantic rules. As can be seen in FIG. 39A, the "If" expression tests various attribute values of the logical form graph node to which the rule applies, and the "Then" expression specifies the labeling of a chain of the logical form graph.
FIG. 39B shows an example of applying the semantic rule LF_Dobj2 to the logical form graph 3901 generated by the NSS for the sentence "The book was written by John". Applying LF_Dobj2 to a logical form graph containing a passive clause identifies the syntactic subject as the deep object of the action. This is done in FIG. 39B by relabeling chain 3903, replacing the temporary modifier label with the label "Dobj" 3904 marking the deep object relationship.
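In a representation where chains are dictionary entries keyed by label, the relabeling performed by LF_Dobj2 amounts to a single move from the temporary label to "Dobj". The sketch below is hypothetical: the "voice" attribute and the chain encoding are assumptions for illustration, not the system's actual representation.

```python
# Hypothetical sketch of the relabeling performed by LF_Dobj2 for a
# passive clause such as "The book was written by John". The "voice"
# attribute and chain encoding are invented for illustration.

class GraphNode:
    def __init__(self, lemma, attributes=None):
        self.lemma = lemma
        self.attributes = attributes or {}
        self.chains = {}   # label -> [GraphNode]

def relabel_chain(node, old_label, new_label):
    """Move every chain filed under old_label to new_label."""
    targets = node.chains.pop(old_label, [])
    if targets:
        node.chains.setdefault(new_label, []).extend(targets)

def lf_dobj2(node):
    """In a passive clause, the syntactic subject is the deep object."""
    if node.attributes.get("voice") == "passive":
        relabel_chain(node, "Tmods", "Dobj")

# The temporary chain from "write" to "book" becomes a "Dobj" chain.
write = GraphNode("write", {"voice": "passive"})
book = GraphNode("book")
write.chains["Tmods"] = [book]
lf_dobj2(write)
```

Only the label changes; the node objects and their connectivity are untouched, which matches the figure's description of the rule as a relabeling rather than a restructuring.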
As the final step of the third phase, the NSS performs the final adjustment of the logical form graph by applying a fifth set of semantic rules. This set includes rules for combining relative pronouns with their antecedents, finding and explicitly including omitted pronouns, resolving number ellipsis, supplying omitted deep subjects, unifying redundant instances of a personal pronoun, and contracting parallel structures that were expanded in the first phase of semantic analysis. These rules also address the problem of anaphora: taking a pronoun and identifying the noun phrase to which it refers. In many cases, the correct noun phrase referent cannot be determined from the information provided by the logical form graph alone. In these cases, a set of most likely candidates is established and the resolution is deferred to a later step of the NLP system that can employ more global information. The fifth set of semantic rules applied by the NSS in the third phase is shown in FIG. 40. FIG. 40 shows the rule name of each rule followed by a precise description of the syntactic phenomenon it handles.
FIGS. 41A-41C show English representations of the semantic rule PsLF_PronAnaphora in the fifth set of semantic rules. As can be seen from FIGS. 41A-41C, the "If" expression tests various attribute values of the logical form graph node to which the rule applies and of related logical form graph nodes, and the "Then" expression specifies the addition of a logical form graph node representing the referent of the pronoun.
FIG. 41D shows an example of applying the semantic rule PsLF_PronAnaphora to the logical form graph 4101 generated by the NSS for the sentence "Mary likes the man who came to dinner, and Joan likes him too". The result of applying PsLF_PronAnaphora to a logical form graph containing a pronoun node whose referent is located in a different part of the graph is the addition of a new node directly connected to the pronoun node. In FIG. 41D, PsLF_PronAnaphora has added the new node 4103, marking node "he1" as referring to "man".
A flowchart of the processing performed in the third phase of the NSS is shown in FIG. 42. In step 4201, the variable "parameter1" is assigned the set of logical form graph nodes making up the logical form graph generated in the second phase. In step 4202, the variable "parameter2" is assigned the table of the fourth set of semantic rules shown in FIGS. 36-38. In step 4203, the NSS calls the subroutine "apply_rules", passing it the variables "parameter1" and "parameter2". The subroutine "apply_rules" applies the fourth set of semantic rules to the logical form graph to add semantically meaningful labels to the chains of the logical form graph. In step 4204, the variable "parameter1" is assigned the set of logical form graph nodes making up the labeled logical form graph generated in step 4203. In step 4205, the variable "parameter2" is assigned the table of the fifth set of semantic rules shown in FIG. 40. In step 4206, the NSS calls the subroutine "apply_rules", passing it the variables "parameter1" and "parameter2". The subroutine "apply_rules" applies the fifth set of semantic rules to the logical form graph to perform the final adjustment.
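Taken together, the flowcharts of FIGS. 32, 35 and 42 reduce to successive calls of the same rule-application driver over five rule tables. The schematic sketch below uses placeholder rule tables, dictionary nodes, and a single builder function standing in for the third rule set; the real tables are those of FIGS. 27, 29, 33, 36-38 and 40, and this shape is an illustrative assumption rather than the patent's implementation.

```python
# Schematic sketch of the three NSS phases as successive calls of one
# rule-application driver. Rule tables, node fields, and the builder
# function are placeholders invented for illustration.

def apply_rules(nodes, rules):
    for node in nodes:
        for condition, operations in rules:
            if condition(node):
                operations(node)

def run_nss(parse_nodes, set1, set2, make_graph_node, set4, set5):
    # Phase 1 (figure 32): preliminary, then primary, adjustment of the tree.
    apply_rules(parse_nodes, set1)
    apply_rules(parse_nodes, set2)
    # Phase 2 (figure 35): build a separate framework logical form graph;
    # here one builder function stands in for the third rule set.
    graph_nodes = [make_graph_node(n) for n in parse_nodes if n.get("semantic")]
    # Phase 3 (figure 42): add semantic chain labels, then final adjustment.
    apply_rules(graph_nodes, set4)
    apply_rules(graph_nodes, set5)
    return graph_nodes

# Trivial run: only semantically meaningful words yield graph nodes.
parse_nodes = [{"word": "you", "semantic": True},
               {"word": "the", "semantic": False}]
graph = run_nss(parse_nodes, [], [],
                lambda n: {"lemma": n["word"], "syntax_node": n}, [], [])
```

The driver pattern explains why only the rule tables differ between phases: "parameter1" and "parameter2" in the flowcharts are just the arguments to this one subroutine.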
FIG. 43 is a block diagram of a computer system for the NSS. Computer 4300 includes a memory containing semantic rules 4304 and a rule applier 4308. Under the control of the central processing unit, the rule applier applies the five sets of rules to the parse tree 4301 to generate the corresponding logical form graph 4302. The syntactic parse tree is preferably generated by morphological and syntactic subsystems, not shown. The syntax tree and logical form graph can then be used to accomplish subsequent tasks that require information similar to that obtained by a human reader from the input sentence. For example, a grammar checker may suggest new wording for the input sentence in order to state more accurately or concisely what the input sentence narrates. As another example, a computer operating system may perform a computational task described by the input sentence. As yet another example, information contained in the input sentence may be categorized and stored by a database management system for later retrieval.
Semantic processing of an example input sentence
The following discussion and FIGS. 44-59 describe the complete NSS processing of the example sentence "The person whom I met is my friend". Each semantic rule applied by the NSS, and a representation of the result of applying the rule, will be described.
No preliminary adjustment rule of the first set of semantic rules is successfully applied in the first phase to the parse tree input to the NSS from the syntactic subsystem. One primary adjustment rule of the second set of semantic rules is applied to the input syntactic parse tree. FIG. 44 shows the syntactic parse tree 4400 in its input form. Note that its representation in FIG. 44 is simpler than that in FIG. 22. The NSS successfully applies the semantic rule TrLF_LongDist1, shown in FIG. 29 as rule 1, to the relative clause node RELCL1 4401 of the parse tree 4400 to generate the adjusted parse tree 4402. The effect of applying rule TrLF_LongDist1 is to introduce a direct object attribute in the noun phrase node 4403 to indicate that the word "whom" is the direct object of the phrase "I met". In English, the direct object of a verb in normal word order follows the verb. Since "whom" does not follow "I met" in the sentence analyzed to generate syntax tree 4400, the fact that "whom" is the direct object of "I met" cannot be discerned by applying the syntax rules alone.
Seven rules of the third set of rules are successfully applied in the second phase. In FIG. 45, the NSS successfully applies the semantic rule SynToSem1, shown in FIG. 33 as rule 1, to the determiner phrase node DETP2 4501 of the parse tree to generate the logical form graph node "my" 4502. In FIG. 46, the NSS successfully applies SynToSem1 to the noun phrase node NP4 4601 of the syntactic parse tree to generate the logical form graph node "friend" 4602 and a chain 4603 with the temporary semantic label "Tmods" 4606. In FIG. 47, the NSS successfully applies SynToSem1 to the noun phrase node NP3 4701 of the syntactic parse tree to generate the logical form graph node "I" 4702. In FIG. 48, the NSS successfully applies SynToSem1 to the noun phrase node NP2 4801 of the syntactic parse tree to generate the logical form graph node "whom" 4802. In FIG. 49, the NSS successfully applies SynToSem1 to the relative clause node RELCL1 4901 of the syntactic parse tree to generate the logical form graph node "meet" 4902 and a chain 4903 with the temporary semantic label "Tmods" 4904. In FIG. 50, the NSS successfully applies SynToSem1 to the noun phrase node NP1 5001 of the syntactic parse tree to generate the logical form graph node "person" 5002 and a chain 5003 with the temporary semantic label "Tmods" 5004. In FIG. 51, the NSS successfully applies SynToSem1 to the declarative sentence node DECL1 5101 of the parse tree to generate the logical form graph node "be" 5102 and a chain 5103 with the temporary semantic label "Tmods" 5104. The framework logical form graph is thus complete at the end of the second phase.
Six rules of the fourth set of semantic rules are successfully applied in the third phase. In FIG. 52, the NSS successfully applies the semantic rule LF_Dsub1, shown in FIG. 36 as rule 1, to the logical form graph node "be" 5201 to generate the chain label "Dsub" 5202 and a chain 5203 with the temporary semantic label "Tmods" 5204. In FIG. 53, the NSS successfully applies the semantic rule LF_Dnom, shown in FIG. 36 as rule 10, to the logical form graph node "be" 5301 to generate the chain label "Dnom" 5302. In FIG. 54, the NSS successfully applies the semantic rule LF_Props, shown in FIG. 38 as rule 21, to the logical form graph node "person" 5401 to generate the chain label "Props" 5402. In FIG. 55, the NSS successfully applies the semantic rule LF_Dsub1, shown in FIG. 36 as rule 1, to the logical form graph node "meet" 5501 to generate the chain label "Dsub" 5502. In FIG. 56, the NSS successfully applies the semantic rule LF_Dobj1, shown in FIG. 36 as rule 3, to the logical form graph node "meet" 5601 to generate a chain labeled "Dobj" 5603 linking the node "meet" to the node "whom" 5602. In FIG. 57, the NSS successfully applies the semantic rule LF_PossBy, shown in FIG. 38 as rule 22, to the logical form graph node "friend" 5701 to generate the chain label "PossBy" 5702.
Two rules of the fifth set of semantic rules are successfully applied in the third phase. In FIG. 58, the NSS successfully applies the semantic rule PsLF_RelPro, shown in FIG. 40 as rule 1, to the logical form graph node "whom", shown as 5602 in FIG. 56, to generate a chain labeled "Dobj" 5801 and delete the node "whom". In FIG. 59, the NSS successfully applies the semantic rule PsLF_UnifyProns, shown in FIG. 40 as rule 10, to the logical form graph to merge the nodes "I" and "my" into a single node. This is the last rule that the NSS applies successfully. FIG. 59 thus shows the final, complete logical form graph generated by the NSS for the input sentence "The person whom I met is my friend".
While the invention has been described in connection with a preferred embodiment, it is not intended to be limited to that embodiment. Modifications within the spirit of the invention will be readily apparent to those skilled in the art, and the scope of the invention is defined by the appended claims.

Claims (34)

1. A method for generating a logical form graph for a natural language sentence in a computer system, the sentence being represented by a parse tree having nodes for representing syntactic structure components of the sentence, the parse tree being represented in a data structure, the method comprising:
adjusting the parse tree based on the semantic analysis of the parse tree, the adjustment being an adjustment that cannot be made based on the parsing of the parse tree;
generating a frame logical form graph for the adapted parse tree, the frame logical form graph being represented in a data structure separate from the parse tree data structure; and
and performing semantic analysis on the frame logic form diagram to complete the logic form diagram.
2. The method of claim 1, wherein the step of adapting the parse tree includes adding syntactic roles to syntactic structural components implied in the sentence.
3. The method of claim 2, wherein, when the sentence omits a verb after a predefined word, the step of adding syntactic roles adds a syntactic structure component for the omitted verb.
4. The method of claim 3, wherein the predefined word is the word "to".
5. The method of claim 3, wherein the predefined word is the word "not".
6. The method of claim 2, wherein, when the sentence omits a pronoun, the step of adding syntactic roles adds a syntactic structure component for the omitted pronoun.
7. The method of claim 6, wherein the omitted pronoun is the word "you" in the command sentence.
8. The method of claim 2, wherein, when the sentence includes a parallel structure, the step of adding syntactic roles adds syntactic structure components to expand the parallel structure.
9. The method of claim 8, wherein the parallel structure includes the word "and".
10. The method of claim 8, wherein the parallel structure includes the word "or".
11. The method of claim 2, wherein the step of adjusting the parse tree includes resolving long-distance attachment phenomena after adding the syntactic roles.
12. The method of claim 2, wherein the step of adjusting the syntactic parse tree includes converting verb phrases into verbs with prepositional phrase objects after adding the syntactic roles.
13. The method of claim 2, wherein the step of adjusting the parse tree includes replacing the word "it" with an infinitive clause after adding the syntactic roles.
14. The method of claim 1, wherein the step of generating the frame logical form graph includes assigning attributes to nodes of the frame logical form graph based on the adjusted attributes of the syntactic parse tree.
15. The method of claim 1, wherein the step of performing semantic analysis on the frame logical form graph comprises adding semantic tags to the frame logical form graph.
16. The method of claim 15, wherein the semantic tags mark deep grammatical roles.
17. The method of claim 15, wherein the step of performing semantic analysis on the frame logical form graph includes adding semantic structure components after adding semantic tags.
18. A computer system for generating a logical form graph for a natural language phrase, the phrase represented by a syntactic parse tree having nodes for representing syntactic structural components of the phrase, the computer system comprising:
a first component for adjusting the parse tree according to a semantic analysis of the parse tree, the adjustment being an adjustment that cannot be made according to a syntactic analysis of the parse tree;
a second component for generating a frame logical form graph for the adapted parse tree, the frame logical form graph being represented in a data structure separate from the parse tree data structure; and
and a third component for performing semantic analysis on the frame logical form graph to complete the logical form graph.
19. The computer system of claim 18, wherein the first component is configured to adjust the parse tree by adding syntactic roles to syntactic structure components implied in the phrase.
20. The computer system of claim 19, wherein the first component is configured to add syntactic roles, when a verb is omitted from the phrase after a predefined word, by adding a syntactic structure component for the omitted verb.
21. The computer system of claim 20, wherein the predefined word is the word "to".
22. The computer system of claim 20, wherein the predefined word is the word "not".
23. The computer system of claim 19, wherein the first component is configured to add syntactic roles, when a pronoun is omitted from the phrase, by adding a syntactic structure component for the omitted pronoun.
24. The computer system of claim 23 wherein the omitted pronoun is the word "you" in the command phrase.
25. The computer system of claim 19, wherein the first component is configured to add syntactic structure components to expand the parallel structure when the phrase includes a parallel structure.
26. The computer system of claim 25 wherein the parallel structure includes the word "and".
27. The computer system of claim 25, wherein the parallel structure includes the word "or".
28. The computer system of claim 19, wherein the first component is configured to adjust the parse tree by resolving long-distance attachment phenomena after adding the syntactic roles.
29. The computer system of claim 19, wherein the first component is configured to adjust the syntactic parse tree by converting verb phrases into verbs with prepositional phrase objects after adding the syntactic roles.
30. The computer system of claim 19, wherein the first component is configured to adjust the parse tree by replacing the word "it" with an infinitive clause after adding the syntactic roles.
31. The computer system of claim 18, wherein the second component is configured to generate the frame logical form graph by assigning attributes to the frame logical form graph nodes according to the adjusted attributes of the syntactic analysis tree.
32. The computer system of claim 18, wherein the third component is configured to perform semantic analysis on the frame logical form graph by adding semantic tags to the frame logical form graph.
33. The computer system of claim 32, wherein the semantic tags mark deep grammatical roles.
34. The computer system of claim 32, wherein the third component is configured to perform semantic analysis on the frame logical form graph by adding semantic structure components after adding semantic tags.
HK00100970.4A 1996-06-28 1997-06-27 Method and system for generating a logical form graph for a sentence in a natural language HK1022027B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US08/674,610 US5966686A (en) 1996-06-28 1996-06-28 Method and system for computing semantic logical forms from syntax trees
US08/674,610 1996-06-28
PCT/US1997/011160 WO1998000793A1 (en) 1996-06-28 1997-06-27 Method and system for computing semantic logical forms from syntax trees

Publications (2)

Publication Number Publication Date
HK1022027A1 HK1022027A1 (en) 2000-07-21
HK1022027B true HK1022027B (en) 2004-05-14
