Summary of the invention
In view of the problems and shortcomings of the prior art, the object of the present invention is to provide a method for extracting verb semantic information based on an event ontology. By matching verbs against verb roles, the method improves the accuracy of verb recognition, and it generates rich verb semantic information from verb sense information, time and tense information, and the relations between verbs and their verb roles.
To achieve the above object, the present invention adopts the following technical solution:
A method for extracting verb semantic information based on an event ontology, characterized in that: first, a sentence element array A is obtained from the input sentence; next, the verb roles in sentence element array A are extracted using role extraction rules; then the verb sense is determined by matching verbs and verb roles against verb concepts; then the time and tense information of the verb is identified according to time and tense extraction rules; finally, verb semantic information is generated from the verb roles, the verb sense, and the verb time and tense information. The specific steps are:
A. Obtain sentence element array A: input a sentence, obtain from it the words that qualify as sentence elements, and form sentence element array A from these words;
B. Verb role extraction: extract the verb roles in sentence element array A using the verb role extraction rules;
C. Verb sense identification: determine the verb sense by matching verbs and verb roles against the verb concepts in the event ontology;
D. Verb time and tense identification: identify the time and tense information of the verb according to the time and tense extraction rules;
E. Generate verb semantic information: generate verb semantic information from the verb roles extracted in step B, the verb sense identified in step C, and the verb time and tense information identified in step D.
Obtaining sentence element array A, as described in step A, comprises the following operations:
A1. Use a word segmentation tool to segment the input sentence and tag each resulting word with its part of speech;
A2. If the sentence contains no gerund or verb, ignore the sentence, i.e. perform no verb semantic information extraction on it;
A3. According to the segmentation and part-of-speech tagging results of step A1, extract the words in the sentence that meet the sentence element requirements, namely the structural words of the "ba" (把) construction or the "bei" (被) construction, nouns, gerunds, and verbs, and store them word by word in sentence element array A in the order in which they appear in the original sentence;
The verb role extraction described in step B uses the verb role extraction rules to extract the verb roles in sentence element array A, and proceeds as follows:
According to the noun components in sentence element array A, the positions of those noun components before and after the verb, and the structural features of the "bei" construction and the "ba" construction, extract the verb roles in sentence element array A using the verb role extraction rules;
The verb sense identification described in step C determines the verb sense by matching verbs and verb roles against the verb concepts in the event ontology, and proceeds as follows:
C1. Traverse sentence element array A and match verbs with verb roles, as follows:
Traverse the verbs in sentence element array A. For each verb, query the event ontology to determine the property of the verb. If the verb is an intransitive verb, it is matched only with the nearest verb role before it; if the verb is a nominal-object verb, it is matched with the nearest verb role before it and the nearest verb role after it; if the verb is a predicate-object verb, it is matched with the nearest verb role before it. This yields the verb and verb role matching set M;
C2. Determine whether the verb and verb role matching set M is empty. If it is empty, abandon the extraction of verb semantic information for this sentence. Otherwise, use the event ontology to determine whether each verb matches its verb roles: if they do not match, change the part of speech of the verb to gerund and re-identify the match pair of the previous verb; otherwise keep the match pair. If in the end no match pair remains in set M, abandon the extraction of verb semantic information for this sentence; otherwise traverse the elements of set M, map each verb and verb role match pair to a verb concept in the event ontology, and obtain the verb sense information;
The verb time and tense identification described in step D proceeds as follows:
According to the time and tense extraction rules, extract the time information of the sentence and its tense adverbs (e.g. …, …, "after" and similar words), and identify the time information and tense information of the verb;
The generation of verb semantic information described in step E proceeds as follows:
Generate the verb semantic information of the sentence from the verb roles extracted in step B, the verb sense information obtained in step C, and the time and tense information obtained in step D.
Compared with the prior art, the present invention has the following prominent substantive features and notable progress:
The present invention is based on an event ontology. By matching verbs with verb roles, it improves the accuracy of verb recognition; from verb sense information, time and tense information, and the relations between verbs and verb roles, it generates rich verb semantic information. It thereby addresses the low verb recognition accuracy and the insufficient expression of verb semantics found in the prior art.
Embodiment
A preferred embodiment of the method of the present invention for extracting verb semantic information based on an event ontology is described below with reference to Figs. 1 to 6. It should be noted that the embodiment is given to illustrate the technical and functional features of the method so that the invention is easier to understand, and is not intended to limit the scope of the invention.
With reference to Fig. 1, the module architecture of the method for extracting verb semantic information based on an event ontology is as follows:
(1) Obtain sentence element array A 201: use a word segmentation tool to segment the input sentence and tag each resulting word with its part of speech. If the sentence contains no gerund or verb, ignore the sentence; otherwise extract the structural words of the "ba" construction or the "bei" construction, the nouns, the gerunds, and the verbs in the sentence, and store them word by word in sentence element array A in the order in which they appear in the original sentence;
(2) Verb role extraction 202: according to the noun components in sentence element array A, the positions of those noun components before and after the verb, and the structural features of the "bei" construction and the "ba" construction, extract the verb roles in sentence element array A using role extraction rules 205;
(3) Verb sense identification 203: traverse the verbs in sentence element array A. For each verb, query event ontology 206 to determine the property of the verb. If the verb is an intransitive verb, it is matched only with the nearest verb role before it; if the verb is a nominal-object verb, it is matched with the nearest verb role before it and the nearest verb role after it; if the verb is a predicate-object verb, it is matched with the nearest verb role before it. This yields the verb and verb role matching set M. Determine whether set M is empty. If it is empty, abandon the extraction of verb semantic information for this sentence. Otherwise, use the event ontology to determine whether each verb matches its verb roles: if they do not match, change the part of speech of the verb to gerund and re-identify the match pair of the previous verb; otherwise keep the match pair. If in the end no match pair remains in set M, abandon the extraction of verb semantic information for this sentence; otherwise traverse the elements of set M, map each verb and verb role match pair to a verb concept in the event ontology, and obtain the verb sense information;
(4) Verb time and tense identification 204: according to time and tense extraction rules 207, extract the time information and tense adverbs (e.g. …, …, "after" and similar words), and identify the time and tense information of the verb;
(5) Generate verb semantic information 301: generate the verb semantic information of the sentence from the verb roles extracted by verb role extraction 202, the verb sense information obtained by verb sense identification 203, and the time and tense information obtained by verb time and tense identification 204.
With reference to Fig. 2, the overall flow of the method for extracting verb semantic information based on an event ontology comprises the following steps:
A. Obtain sentence element array A: input a sentence, obtain from it the words that qualify as sentence elements, and form sentence element array A from these words;
B. Verb role extraction: extract the verb roles in sentence element array A using the verb role extraction rules;
C. Verb sense identification: determine the verb sense by matching verbs and verb roles against the verb concepts in the event ontology;
D. Verb time and tense identification: identify the time and tense information of the verb according to the time and tense extraction rules;
E. Generate verb semantic information: generate verb semantic information from the verb roles extracted in step B, the verb sense identified in step C, and the verb time and tense information identified in step D.
Obtaining sentence element array A, as described in step A, comprises the following operations:
A1. Use a word segmentation tool to segment the input sentence and tag each resulting word with its part of speech;
A2. If the sentence contains no gerund or verb, ignore the sentence, i.e. perform no verb semantic information extraction on it;
A3. According to the segmentation and part-of-speech tagging results of step A1, extract the words in the sentence that meet the sentence element requirements, namely the structural words of the "ba" (把) construction or the "bei" (被) construction, nouns, gerunds, and verbs, and store them word by word in sentence element array A in the order in which they appear in the original sentence;
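Steps A2 and A3 can be sketched as a simple filter over the tagger output. This is a minimal sketch, assuming the segmentation tool returns (word, tag) pairs; the tag names and the function `build_element_array` are hypothetical and not part of the method as described.

```python
from typing import List, Tuple

# Hypothetical tag names; a real segmentation tool will use its own tag set.
NOUN, VERB, GERUND, BA, BEI = "n", "v", "vn", "ba", "bei"
ELEMENT_TAGS = {NOUN, VERB, GERUND, BA, BEI}

Token = Tuple[str, str]  # (word, part-of-speech tag)


def build_element_array(tagged: List[Token]) -> List[Token]:
    """Steps A2 and A3: return sentence element array A, or [] if the sentence is ignored."""
    # A2: a sentence with no verb and no gerund is not processed.
    if not any(tag in (VERB, GERUND) for _, tag in tagged):
        return []
    # A3: keep only sentence-element words, preserving their original order.
    return [(word, tag) for word, tag in tagged if tag in ELEMENT_TAGS]


# Tagged output for a "ba" construction (the tags are assumed, not produced by a real tool).
tagged_sentence = [("警察", "n"), ("迅速", "d"), ("把", "ba"),
                   ("小偷", "n"), ("抓住", "v"), ("了", "u")]
A = build_element_array(tagged_sentence)
```

Here the adverb and the particle are dropped, while the noun, the "ba" structural word, and the verb are kept in sentence order.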
The verb role extraction described in step B uses the verb role extraction rules to extract the verb roles in sentence element array A, and proceeds as follows:
According to the noun components in sentence element array A, the positions of those noun components before and after the verb, and the structural features of the "bei" construction and the "ba" construction, extract the verb roles in sentence element array A using the verb role extraction rules;
The verb sense identification described in step C determines the verb sense by matching verbs and verb roles against the verb concepts in the event ontology, and proceeds as follows:
C1. Traverse sentence element array A and match verbs with verb roles, as follows:
Traverse the verbs in sentence element array A. For each verb, query the event ontology to determine the property of the verb. If the verb is an intransitive verb, it is matched only with the nearest verb role before it; if the verb is a nominal-object verb, it is matched with the nearest verb role before it and the nearest verb role after it; if the verb is a predicate-object verb, it is matched with the nearest verb role before it. This yields the verb and verb role matching set M;
C2. Determine whether the verb and verb role matching set M is empty. If it is empty, abandon the extraction of verb semantic information for this sentence. Otherwise, use the event ontology to determine whether each verb matches its verb roles: if they do not match, change the part of speech of the verb to gerund and re-identify the match pair of the previous verb; otherwise keep the match pair. If in the end no match pair remains in set M, abandon the extraction of verb semantic information for this sentence; otherwise traverse the elements of set M, map each verb and verb role match pair to a verb concept in the event ontology, and obtain the verb sense information;
The verb time and tense identification described in step D proceeds as follows:
According to the time and tense extraction rules, extract the time information of the sentence and its tense adverbs (e.g. …, …, "after" and similar words), and identify the time information and tense information of the verb;
The generation of verb semantic information described in step E proceeds as follows:
Generate the verb semantic information of the sentence from the verb roles extracted in step B, the verb sense information obtained in step C, and the time and tense information obtained in step D.
As shown in Fig. 3, in the preferred embodiment a verb role extraction rule takes sentence element array A as its unit of operation. It is a template that allows a computer to extract verb roles automatically from sentence element array A, and it consists of a sequence of verb role words, verbs, "bei" construction and "ba" construction features, and other sentence components.
For example: [{*}{Actor}#{act_word}{*}]+
In a verb role extraction rule, the part inside [] is the pattern to be matched and each part inside {} is one sentence element. The symbol * stands for any sentence element other than verb roles, verbs, and the "bei" construction and "ba" construction elements; Actor stands for a role word of the verb; act_word stands for the verb; {}# means that the content inside {} may occur 0 times or 1 time; and []+ means that the content inside [] occurs at least once.
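One way to read the example rule is as a regular expression over a string of element codes. The sketch below is illustrative only: the one-letter encoding, the function `apply_rule`, and the assumption that {*} stands for zero or more other elements are all hypothetical choices, not details given in the text.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical one-letter encoding of the elements of array A:
#   A = {Actor}, V = {act_word}, O = {*} (any other sentence element)
RULE = re.compile(r"(?:O*A?VO*)+")   # [{*}{Actor}#{act_word}{*}]+
UNIT = re.compile(r"O*(A?)(V)O*")    # one repetition of the bracketed pattern


def apply_rule(codes: str) -> Optional[List[Tuple[Optional[int], int]]]:
    """Return (actor index or None, verb index) pairs, or None if the rule does not match."""
    if not RULE.fullmatch(codes):
        return None
    pairs = []
    for m in UNIT.finditer(codes):
        # {Actor}# is optional, so group 1 may be empty.
        actor = m.start(1) if m.group(1) else None
        pairs.append((actor, m.start(2)))
    return pairs
```

With this encoding, `apply_rule("AVOAV")` pairs each verb with the role word directly before it, while `apply_rule("AVA")` returns `None` because the trailing role word is not followed by a verb and this particular rule has no slot for it.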
With reference to Fig. 4, the verb sense identification procedure in the method for extracting verb semantic information based on an event ontology comprises the following steps:
(1) set the initial value of i, the index of the verb to be extracted from sentence element array A, to 1;
(2) obtain the i-th verb in array A;
(3) determine whether all verbs in array A have been traversed; if so, go to step (15), otherwise go to step (4);
(4) analyze the property of the verb: if it is an intransitive verb, go to step (5); if it is a nominal-object verb, go to step (6); if it is a predicate-object verb, go to step (7);
(5) take the nearest role before the verb as the antecedent role of the verb, match this antecedent role with the verb, and go to step (8);
(6) take the nearest roles before and after the verb as the antecedent role and the consequent role of the verb respectively, match the antecedent role and the consequent role with the verb, and go to step (8);
(7) take the nearest role before the verb as the antecedent role of the verb, and match this antecedent role with the verb;
(8) check the match between the verb and its verb roles against the verb and verb role matching constraints defined in the event ontology; if the match is correct, go to step (13), otherwise go to step (9);
(9) decrease the value of i by 1;
(10) check the value of i: if i equals 0, go to step (11), otherwise go to step (12);
(11) set the value of i to 1;
(12) change the part of speech of the verb to gerund, and go to step (2);
(13) increase the value of i by 1;
(14) add the correctly identified match pair to the verb and verb role matching set M, and go to step (2);
(15) determine whether set M is empty; if it is empty, go to step (17), otherwise go to step (16);
(16) traverse the verb and verb role match pairs in set M, map each match pair to a verb concept in the event ontology, and obtain the verb sense information;
(17) end.
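The loop of steps (1) to (15) can be sketched as follows. This is a minimal sketch under stated assumptions: the ontology is reduced to a hypothetical dictionary from each verb to its property and its permitted role words, nouns and gerunds are treated as candidate roles, the verb property is read directly from that dictionary, and the names `match_verbs` and `nearest_role` are illustrative. Step (16), the mapping to verb concepts, is left out.

```python
from typing import Dict, List, Optional, Set, Tuple

INTRANSITIVE = "intransitive"
NOMINAL_OBJECT = "nominal-object"
PREDICATE_OBJECT = "predicate-object"
ROLE_TAGS = {"n", "vn"}  # assumption: nouns and gerunds can serve as verb roles

# Hypothetical stand-in for the ontology: verb -> (property, permitted role words).
Ontology = Dict[str, Tuple[str, Set[str]]]
Match = Tuple[Optional[str], str, Optional[str]]  # (antecedent role, verb, consequent role)


def nearest_role(A: List[List[str]], pos: int, step: int) -> Optional[str]:
    """Nearest role word before (step=-1) or after (step=+1) position pos, if any."""
    k = pos + step
    while 0 <= k < len(A):
        if A[k][1] in ROLE_TAGS:
            return A[k][0]
        k += step
    return None


def match_verbs(A: List[List[str]], ontology: Ontology) -> Dict[int, Match]:
    """Build matching set M, keyed by verb position in A; A is modified in place."""
    M: Dict[int, Match] = {}
    i = 1                                                  # step (1)
    while True:
        verbs = [k for k, (_, tag) in enumerate(A) if tag == "v"]
        if i > len(verbs):                                 # step (3): all verbs traversed
            return M                                       # caller continues at step (15)
        pos = verbs[i - 1]                                 # step (2)
        verb = A[pos][0]
        entry = ontology.get(verb)
        prop, allowed = entry if entry else (None, set())  # step (4)
        before = nearest_role(A, pos, -1)                  # steps (5) to (7)
        after = nearest_role(A, pos, +1) if prop == NOMINAL_OBJECT else None
        roles = [r for r in (before, after) if r is not None]
        if roles and all(r in allowed for r in roles):     # step (8)
            i += 1                                         # step (13)
            M[pos] = (before, verb, after)                 # step (14)
        else:
            i = max(i - 1, 1)                              # steps (9) to (11)
            A[pos][1] = "vn"                               # step (12): verb becomes gerund
            M.pop(pos, None)                               # drop any stale pair for it


# English glosses stand in for the segmented words.
A1 = [["police", "n"], ["arrest", "v"], ["thief", "n"]]
M1 = match_verbs(A1, {"arrest": (NOMINAL_OBJECT, {"police", "thief"})})

A2 = [["team", "n"], ["begin", "v"], ["investigate", "v"]]
M2 = match_verbs(A2, {"begin": (PREDICATE_OBJECT, {"team"}),
                      "investigate": (NOMINAL_OBJECT, {"police", "case"})})
```

In the second example "investigate" fails the ontology check, is re-tagged as a gerund, and the previous verb "begin" is matched again, which is the back-off behaviour of steps (9) to (12). Keying M by position means the repeated match overwrites the earlier pair rather than duplicating it.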
With reference to Fig. 5, the verb property determination procedure in the method for extracting verb semantic information based on an event ontology comprises the following steps:
Step 501: obtain the verb whose property is to be analyzed;
Step 502: look up the property types defined for this verb in the event ontology, and store all property types of the verb in verb property type set C;
Step 503: determine whether verb property type set C contains the intransitive verb type; if it does, go to step 504; if it does not, go to step 505;
Step 504: determine whether there is a verb role after the verb; if there is, go to step 507; if there is not, go to step 505;
Step 505: determine whether verb property type set C contains the predicate-object verb type; if it does, go to step 506; if it does not, go to step 510;
Step 506: determine whether the verb is followed only by a noun or a gerund; if so, go to step 508, otherwise go to step 509;
Step 507: set the property of the verb to intransitive verb; verb property determination ends;
Step 508: set the property of the verb to nominal-object verb; verb property determination ends;
Step 509: set the property of the verb to predicate-object verb; verb property determination ends;
Step 510: set the property of the verb to nominal-object verb; verb property determination ends;
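The decision procedure of steps 503 to 510 can be written as a small function. This is a sketch only: the function name and its two boolean inputs are hypothetical, the two context tests (whether a verb role follows the verb, and whether only a noun or gerund follows it) are assumed to be computed elsewhere, and the branch directions are copied from the steps exactly as they are stated.

```python
from typing import Set

INTRANSITIVE = "intransitive"
NOMINAL_OBJECT = "nominal-object"
PREDICATE_OBJECT = "predicate-object"


def decide_property(types_c: Set[str], role_after: bool,
                    only_noun_or_gerund_after: bool) -> str:
    """Pick one verb property from type set C, following steps 503 to 510 as stated."""
    # Steps 503 and 504: intransitive is chosen only on the branch the steps name.
    if INTRANSITIVE in types_c and role_after:
        return INTRANSITIVE                      # step 507
    # Step 505: is the predicate-object type among the candidates?
    if PREDICATE_OBJECT in types_c:
        # Step 506: a following noun or gerund selects the nominal-object reading.
        if only_noun_or_gerund_after:
            return NOMINAL_OBJECT                # step 508
        return PREDICATE_OBJECT                  # step 509
    return NOMINAL_OBJECT                        # step 510
```

Passing the context tests in as booleans keeps the function independent of how sentence element array A is represented.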
Fig. 6 is a schematic diagram of the ontology elements used to construct the event ontology of this embodiment. In this embodiment the event ontology is described in OWL (Web Ontology Language), and the ontology elements involved in event ontology modeling are:
(1) the actConcept class 601 represents the concept of a verb;
(2) the eventClass class 602 represents the category of an event, such as the traffic accident class or the tsunami class;
(3) the ActoeProperty data attribute 603 represents the verb property type of verb concept 601; there are three verb property types: intransitive verb, nominal-object verb, and predicate-object verb;
(4) the MatchRole data attribute 604 represents the verb roles that match verb concept 601;
(5) the Language data attribute 605 represents the grammatical relations of verb concept 601;
(6) the Time data attribute 606 represents the time attribute of verb concept 601, i.e. the time at which the action denoted by the verb occurs;
(7) the Environment data attribute 607 represents the environment attribute of verb concept 601, i.e. features such as the place where the action occurs;
(8) the hasPartOf object property 608 indicates that the eventClass class 602 is composed of actConcept classes; the domain of the hasPartOf object property is the eventClass class and its range is the actConcept class;
(9) the hasSubClassOf object property 609 represents the parent-child relation between one actConcept class and another; both the domain and the range of the hasSubClassOf object property are the actConcept class.
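For readers who prefer code to a class diagram, the ontology elements above can be mirrored with plain data classes. This is an illustrative in-memory model, not OWL and not the embodiment's implementation: the field names are hypothetical renderings of the attribute names, the field types are assumed, and hasSubClassOf is modelled here as a list of child concepts although the text does not fix its direction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActConcept:
    """Mirrors the actConcept class 601 and its data attributes."""
    name: str
    actoe_property: List[str]            # ActoeProperty 603: verb property types
    match_role: List[str]                # MatchRole 604: roles that match this concept
    language: str = ""                   # Language 605: grammatical relations
    time: str = ""                       # Time 606: when the action occurs
    environment: str = ""                # Environment 607: place and similar features
    children: List["ActConcept"] = field(default_factory=list)  # hasSubClassOf 609 (assumed direction)


@dataclass
class EventClass:
    """Mirrors the eventClass class 602."""
    name: str
    parts: List[ActConcept] = field(default_factory=list)       # hasPartOf 608


def concepts_matching_role(event: EventClass, role: str) -> List[str]:
    """Names of the verb concepts in an event class whose MatchRole lists the role."""
    return [c.name for c in event.parts if role in c.match_role]


collide = ActConcept("collide", ["nominal-object"], ["vehicle", "pedestrian"])
brake = ActConcept("brake", ["intransitive"], ["vehicle", "driver"])
traffic_accident = EventClass("traffic accident", [collide, brake])
```

A lookup such as `concepts_matching_role(traffic_accident, "driver")` then plays the part of the role check that the matching procedure performs against the MatchRole attribute.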