
CN101901210A - Word meaning disambiguating system and method - Google Patents


Publication number
CN101901210A
Authority
CN
China
Prior art keywords
word
meaning
speech
significant degree
confidence level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2009101417374A
Other languages
Chinese (zh)
Inventor
赵凯
胡长建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC China Co Ltd
Renesas Electronics China Co Ltd
Original Assignee
NEC China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC China Co Ltd filed Critical NEC China Co Ltd
Priority to CN2009101417374A priority Critical patent/CN101901210A/en
Publication of CN101901210A publication Critical patent/CN101901210A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a word sense disambiguation system for disambiguating polysemous words. The system comprises an input device for inputting text that includes polysemous words, and a word sense disambiguation device that iteratively determines the sense of each word on the basis of the word's sense significance, where the sense significance is obtained from the word's sense confidence. The invention also relates to a word sense disambiguation method. The system and method can improve the consistency of disambiguation results and shorten computation time.

Description

Word sense disambiguation system and method
Technical field
The present invention relates to the field of natural language processing and, in particular, to a word sense disambiguation system and method.
Background art
In any language, some words have only one sense while others have several. For example, the Chinese word rendered here as "phone" has only one sense, a communication tool, whereas the word rendered as "clothes" has two senses: clothing, and to eat. Word sense disambiguation (WSD) determines the sense of a polysemous word in a specific context. For example, in "the spring clothes are ready; five or six capped men, six or seven boys", "clothes" has the sense of clothing, whereas in the phrase translated as "take medicine after a meal" the same word has the sense of eating.
Word sense disambiguation removes the ambiguity of a word and determines its actual meaning, which is very useful for text analysis and the services built on it.
In general, there are two approaches to word sense disambiguation: supervised and unsupervised. The former requires a manually annotated training set; the latter does not. Because the training set must be annotated manually and is generally domain-specific (different domains need different training sets), it is costly to build in both time and money. Unsupervised methods need no training set and are therefore faster and cheaper than supervised methods.
A basic idea of unsupervised methods is to take the context into account. For example, "clothes" has two senses, but when "Chinese tunic suit" appears in the context, "clothes" most likely takes the sense of clothing rather than the sense of eating. Specifically, Reference 1 (Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant word senses in untagged text. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL '04), Main Volume, pp. 279-286) provides a computation method.
Fig. 1 shows the flowchart of the word sense disambiguation method adopted in Reference 1. The processing has four steps. First, the context of each polysemous word is determined. Second, for each sense of each polysemous word, the similarity to the context is determined. Third, for each polysemous word, a confidence is computed for each of its senses by jointly considering the similarities between all of its senses and the context. Fourth, the sense with the highest confidence is selected as the sense of the polysemous word.
Specifically, suppose the context of a word w contains k words, denoted c(w) = {n_1, n_2, ..., n_k}, and w has m senses (a sense of w is abbreviated ws), denoted Senses(w) = (ws_1, ws_2, ..., ws_m). The confidence of sense ws_i of word w is computed as follows:
C(ws_i) = \sum_{n_j \in c(w)} \frac{S(ws_i, n_j)}{\sum_{ws_{i'} \in Senses(w)} S(ws_{i'}, n_j)}
where S(ws_i, n_j) is the similarity between ws_i and the context word n_j of w. Suppose n_j has l senses; then S(ws_i, n_j) = max(S(ws_i, ns_{j1}), S(ws_i, ns_{j2}), ..., S(ws_i, ns_{jl})), where ns_{jp} denotes the p-th sense of n_j. S(ws_i, ns_{j1}) is the similarity between two senses; some dictionaries, such as HowNet, provide this function.
The method used in Reference 1 is described below with an example. Suppose there are three words, {clothes, dress, bag}, each of which has the other two as its context; for example, c(clothes) = {dress, bag}. Their senses and the similarities between the senses are given in Table 1 and Table 2. Table 1 lists the senses of the three words, and Table 2 lists the similarities between senses. For example, the fifth row of Table 2 gives the similarity S(clothing (clothes), apparatus (tools)) = 0.3.
Word    | Sense 1            | Sense 2
Clothes | Clothing (clothes) | Eat (eat)
Dress   | Clothing (clothes) | Wrapping (wrap)
Bag     | Apparatus (tools)  | Wrapping (wrap)
Table 1
Sense 1            | Sense 2            | Similarity
Clothing (clothes) | Clothing (clothes) | 1
Apparatus (tools)  | Apparatus (tools)  | 1
Wrapping (wrap)    | Wrapping (wrap)    | 1
Eat (eat)          | Eat (eat)          | 1
Clothing (clothes) | Apparatus (tools)  | 0.3
Wrapping (wrap)    | Eat (eat)          | 0.2
Clothing (clothes) | Wrapping (wrap)    | 0
Clothing (clothes) | Eat (eat)          | 0
Apparatus (tools)  | Wrapping (wrap)    | 0
Apparatus (tools)  | Eat (eat)          | 0
Table 2
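The two tables can be held in a small lookup structure. The following Python sketch is only an illustration: the short sense labels (clothing, eat, wrap, tools), the names SENSES, sense_similarity and word_similarity, and the assumption that the similarity is symmetric are choices made here, not part of the described method.

```python
# Table 1: candidate senses of each word (short English labels are an
# illustrative choice, not names taken from the patent text).
SENSES = {
    "clothes": ("clothing", "eat"),
    "dress": ("clothing", "wrap"),
    "bag": ("tools", "wrap"),
}

# Table 2: only the non-zero similarities between different senses are stored;
# every other pair of different senses defaults to 0.
_PAIR_SIMILARITY = {
    frozenset(("clothing", "tools")): 0.3,
    frozenset(("wrap", "eat")): 0.2,
}


def sense_similarity(a: str, b: str) -> float:
    """Similarity S(a, b) between two senses, assumed symmetric."""
    if a == b:
        return 1.0
    return _PAIR_SIMILARITY.get(frozenset((a, b)), 0.0)


def word_similarity(sense: str, context_word: str) -> float:
    """S(ws_i, n_j): the best similarity between a sense and any sense of a context word."""
    return max(sense_similarity(sense, ns) for ns in SENSES[context_word])


if __name__ == "__main__":
    print(sense_similarity("clothing", "tools"))
    print(word_similarity("eat", "dress"))
```

Storing each pair as a frozenset makes the lookup independent of argument order, which matches how the worked example uses Table 2 in both directions.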
The method described in Reference 1 performs the four steps of the above flow on every word simultaneously.
For example, for w = clothes: first, its context is determined as c(w) = {n_1, n_2} = {dress, bag}.
Second, the similarity between each sense and the context is computed:
Senses(w) = (ws_1, ws_2) = (clothing (clothes), eat (eat)).
S(ws_1, n_1) = max(S(clothing (clothes), clothing (clothes)), S(clothing (clothes), wrapping (wrap))) = max(1, 0) = 1
S(ws_1, n_2) = max(S(clothing (clothes), apparatus (tools)), S(clothing (clothes), wrapping (wrap))) = max(0.3, 0) = 0.3
S(ws_2, n_1) = max(S(eat (eat), clothing (clothes)), S(eat (eat), wrapping (wrap))) = max(0, 0.2) = 0.2
S(ws_2, n_2) = max(S(eat (eat), apparatus (tools)), S(eat (eat), wrapping (wrap))) = max(0, 0.2) = 0.2
Third, the confidence of each sense is computed:
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1+0.2) + 0.3/(0.3+0.2) = 1.43
C(ws_2) = S(ws_2, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_2, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.2/(1+0.2) + 0.2/(0.3+0.2) = 0.57
Fourth, the sense of "clothes" is determined: because C(ws_1) > C(ws_2), "clothes" takes the sense ws_1 = clothing (clothes).
Similarly, for w = dress: first, its context is determined as c(w) = {n_1, n_2} = {clothes, bag}.
Second, the similarity between each sense and the context is computed:
Senses(w) = (ws_1, ws_2) = (clothing (clothes), wrapping (wrap)).
S(ws_1, n_1) = max(S(clothing (clothes), clothing (clothes)), S(clothing (clothes), eat (eat))) = max(1, 0) = 1
S(ws_1, n_2) = max(S(clothing (clothes), apparatus (tools)), S(clothing (clothes), wrapping (wrap))) = max(0.3, 0) = 0.3
S(ws_2, n_1) = max(S(wrapping (wrap), clothing (clothes)), S(wrapping (wrap), eat (eat))) = max(0, 0.2) = 0.2
S(ws_2, n_2) = max(S(wrapping (wrap), apparatus (tools)), S(wrapping (wrap), wrapping (wrap))) = max(0, 1) = 1
Third, the confidence of each sense is computed:
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1+0.2) + 0.3/(0.3+1) = 1.06
C(ws_2) = 0.2/(1+0.2) + 1/(0.3+1) = 0.94
Fourth, the sense of "dress" is determined: because C(ws_1) > C(ws_2), "dress" takes the sense ws_1 = clothing (clothes).
Similarly, for w = bag: first, its context is determined as c(w) = {n_1, n_2} = {clothes, dress}.
Second, the similarity between each sense and the context is computed:
Senses(w) = (ws_1, ws_2) = (apparatus (tools), wrapping (wrap)).
S(ws_1, n_1) = max(S(apparatus (tools), clothing (clothes)), S(apparatus (tools), eat (eat))) = max(0.3, 0) = 0.3
S(ws_1, n_2) = max(S(apparatus (tools), clothing (clothes)), S(apparatus (tools), wrapping (wrap))) = max(0.3, 0) = 0.3
S(ws_2, n_1) = max(S(wrapping (wrap), clothing (clothes)), S(wrapping (wrap), eat (eat))) = max(0, 0.2) = 0.2
S(ws_2, n_2) = max(S(wrapping (wrap), clothing (clothes)), S(wrapping (wrap), wrapping (wrap))) = max(0, 1) = 1
Third, the confidence of each sense is computed:
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.3/(0.3+0.2) + 0.3/(0.3+1) = 0.83
C(ws_2) = 0.2/(0.3+0.2) + 1/(0.3+1) = 1.17
Fourth, the sense of "bag" is determined: because C(ws_2) > C(ws_1), "bag" takes the sense ws_2 = wrapping (wrap).
Combining the three results, the output is: {clothes: clothing (clothes), dress: clothing (clothes), bag: wrapping (wrap)}.
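The simultaneous computation above can be checked with a short Python sketch. It is a minimal illustration, not the implementation of Reference 1: the short sense labels, the function names, the use of all other words as context, and the skipping of a context word whose similarity to every sense is zero are assumptions made here.

```python
# Tables 1 and 2 of the example (labels shortened for illustration).
SENSES = {
    "clothes": ["clothing", "eat"],
    "dress": ["clothing", "wrap"],
    "bag": ["tools", "wrap"],
}
NONZERO = {frozenset(("clothing", "tools")): 0.3, frozenset(("wrap", "eat")): 0.2}


def sense_similarity(a, b):
    """Similarity between two senses, assumed symmetric."""
    return 1.0 if a == b else NONZERO.get(frozenset((a, b)), 0.0)


def confidences(word, candidates):
    """C(ws_i) for every sense of `word`; all other words serve as context."""
    conf = {s: 0.0 for s in candidates[word]}
    for other, other_senses in candidates.items():
        if other == word:
            continue
        # S(ws_i, n_j) = max over the senses of the context word n_j.
        sims = {s: max(sense_similarity(s, ns) for ns in other_senses) for s in conf}
        total = sum(sims.values())
        if total > 0:  # assumption: a context word unrelated to every sense is skipped
            for s in conf:
                conf[s] += sims[s] / total
    return conf


def disambiguate_simultaneously(candidates):
    """Pick the most confident sense of every word independently."""
    result = {}
    for word in candidates:
        conf = confidences(word, candidates)
        result[word] = max(conf, key=conf.get)
    return result


if __name__ == "__main__":
    for word in SENSES:
        print(word, {s: round(c, 2) for s, c in confidences(word, SENSES).items()})
    print(disambiguate_simultaneously(SENSES))
```

Because each word is resolved against the full sense lists of its neighbours, nothing forces the three choices to agree with one another.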
Because the above process determines the senses of all words simultaneously, the result may be inconsistent. In the example above, "clothes" and "dress" both take the sense clothing (clothes), while "bag" takes the sense wrapping (wrap). A close look at the computation for "bag" shows that it takes this sense because the wrapping (wrap) sense of "dress" played the decisive role (S(ws_2, n_2) = max(..., S(wrapping (wrap), wrapping (wrap))) = max(0, 1) = 1). Yet the sense finally assigned to "dress" is not wrapping (wrap), which causes the inconsistency. The correct result for this example should be {clothes: clothing (clothes), dress: clothing (clothes), bag: apparatus (tools)}.
Summary of the invention
The present invention proposes a progressive word sense disambiguation system and method. Initially the sense of only one word is determined, rather than the senses of all words; the similarities between the other words and their contexts are then recomputed. During recomputation, a word whose sense has been determined contributes only that sense, and its other senses are ignored. This process is repeated until the senses of all words have been determined.
According to a first aspect of the invention, a word sense disambiguation system for disambiguating polysemous words is proposed, comprising: an input device for inputting text that contains polysemous words; and a word sense disambiguation device for iteratively determining the sense of each word based on the sense significance of the word, where the sense significance is obtained from the sense confidence of the word.
According to a second aspect of the invention, a word sense disambiguation method for disambiguating polysemous words is proposed, comprising: an input step of inputting text that contains polysemous words; and a word sense disambiguation step of iteratively determining the sense of each word based on the sense significance of the word, where the sense significance is obtained from the sense confidence of the word.
Preferably, to ensure the correctness of the result, the word whose sense is most evident is selected when a sense is to be determined. For example, the significance is computed from the sense confidence: the larger the confidence of a sense, the more evident the sense.
Because the progressive process may take longer to compute than the conventional method, the invention also proposes ways to reduce computation time and speed up the process. In these, the senses of several words, rather than of a single word, are determined at a time, choosing as far as possible words that are consistent with the senses already determined. Because reducing computation time may introduce inconsistencies into the result, this is a compromise.
Preferably, to save computation time, the words whose sense significance exceeds a threshold are selected when senses are to be determined.
Preferably, to save computation time, the words are sorted by sense significance and the top n words are selected when senses are to be determined.
Preferably, to save computation time, after the sense of one word has been determined, possible senses are guessed for the words whose senses are still undetermined, and the sense of such a word is obtained according to whether the guessed sense is consistent with the determined sense.
Thus, the present invention improves the consistency of word sense disambiguation results while preserving their correctness, and overcomes the drawback of long computation time.
Brief description of the drawings
Fig. 1 shows a flowchart of the prior-art word sense disambiguation method;
Fig. 2a shows a schematic diagram of the word sense disambiguation system according to the first embodiment of the invention;
Fig. 2b shows a flowchart of a word sense disambiguation method according to the invention;
Fig. 2c shows another flowchart of a word sense disambiguation method according to the invention;
Fig. 2d shows another flowchart of a word sense disambiguation method according to the invention;
Fig. 3a shows a schematic diagram of the word sense disambiguation system according to the second embodiment of the invention;
Fig. 3b shows another flowchart of a word sense disambiguation method according to the invention.
Detailed description of embodiments
Preferred embodiments of the present invention are described below with reference to the drawings. In the drawings, identical elements are denoted by identical reference symbols or numerals. In the following description, detailed descriptions of known functions and configurations are omitted so as not to obscure the subject matter of the invention.
Fig. 2a shows the word sense disambiguation system according to the first embodiment of the invention. The system comprises an input device 21, a context determination device 22, a word sense disambiguation device 2 and a memory (not shown). The input device 21 receives input text, which contains polysemous words having multiple senses. The context determination device 22 determines the context of each polysemous word in the text. For a polysemous word, the context can be taken to be one or more words adjacent to it in the text.
The word sense disambiguation device 2 comprises a similarity calculation unit 23, a sense confidence calculation unit 24, a sense significance calculation unit 25, a word selection unit 26, a sense determination unit 27 and a controller 28. The similarity calculation unit 23 computes the similarity between each sense of each polysemous word and its context. Some dictionaries already provide a function for computing the similarity between two senses; for example, WordNet (English) or HowNet (Chinese) can be used to obtain the similarity between the senses of two polysemous words. The sense confidence calculation unit 24 computes the sense confidences of a word from the similarities obtained; the method of Reference 1 can be used for this. The sense significance calculation unit 25 obtains the sense significance of a word from its sense confidences. The sense significance indicates how likely the polysemous word is to take a particular sense. The word selection unit 26 selects, according to sense significance, the words that satisfy a predetermined condition: for example, the word with the highest sense significance, the words whose sense significance exceeds a threshold, or the top n words after the polysemous words are sorted by significance. The sense determination unit 27 determines the sense of each selected word. Thus the sense of one word, or of several words, can be determined in each loop. The controller 28 controls the operation of the similarity calculation unit 23, the sense confidence calculation unit 24, the sense significance calculation unit 25, the word selection unit 26 and the sense determination unit 27. Under the control of the controller, the units repeatedly perform similarity calculation, confidence calculation, significance calculation, word selection and sense determination on the polysemous words in the input text, until the sense of every polysemous word in the text has been determined.
Although Fig. 2a shows the word sense disambiguation system as including the context determination device 22, it will be understood that the system may omit this device and instead take as input text whose contexts have already been determined.
Fig. 2b shows a word sense disambiguation method according to the invention. At S201, the input device 21 of the word sense disambiguation system inputs text. At S202, the context determination device 22 determines the context of each polysemous word in the text. At S203, the similarity calculation unit 23 of the word sense disambiguation device determines the similarity between each sense of each polysemous word and its context. At S204, the sense confidence calculation unit 24 computes the confidence of each sense of each polysemous word.
At S205, the sense significance calculation unit 25 computes the sense significance of each polysemous word. Either of the following two formulas can be used to compute the sense significance of a polysemous word.
E(w) = Max(C_w)
E(w) = \frac{Max(C_w) - Second\_Max(C_w)}{Second\_Max(C_w)}
where Max(C_w) is the largest of the sense confidences of word w and Second_Max(C_w) is the second largest. The second formula measures the degree to which the largest confidence exceeds the second largest.
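As a minimal Python sketch of the two formulas (the function names are illustrative, and the handling of a second-largest confidence of zero is an assumption made here, since the text does not address that case):

```python
import math


def significance_max(confidences):
    """First formula: E(w) = Max(C_w)."""
    return max(confidences)


def significance_margin(confidences):
    """Second formula: E(w) = (Max(C_w) - Second_Max(C_w)) / Second_Max(C_w).

    Expects at least two confidences, since a polysemous word has two or more senses.
    """
    best, second = sorted(confidences, reverse=True)[:2]
    if second == 0:
        # Assumption: a zero runner-up means the best sense is unopposed.
        return math.inf if best > 0 else 0.0
    return (best - second) / second


if __name__ == "__main__":
    # Rounded confidences from the example in the background section.
    print(round(significance_margin([1.43, 0.57]), 2))  # "clothes"
    print(round(significance_margin([1.06, 0.94]), 2))  # "dress"
    print(round(significance_margin([1.17, 0.83]), 2))  # "bag"
```

The second formula is relative, so a word with confidences 1.06 and 0.94 scores far lower than one with 1.43 and 0.57 even though the first formula would rank them much closer together.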
For both formulas, the larger E(w) is, the more evident the sense of word w, and hence the earlier in the loop its sense can be determined. In the "clothes, dress, bag" example, the two sense confidences of "clothes" are 1.43 and 0.57, and those of "dress" are 1.06 and 0.94. The two confidences of "clothes" differ greatly, so its sense is fairly certain: it should take the sense with confidence 1.43. The two confidences of "dress" differ little, so it cannot be decided which sense to take. Hence, considering only "clothes" and "dress", the sense of "clothes" should be determined first, and the sense of "dress" then determined on the basis of the already determined sense of "clothes".
Then, at S206, the word selection unit 26 selects the word with the highest sense significance, and the sense of the selected word is determined: the confidences of the senses of the selected word are compared, and the sense with the highest confidence is taken as the sense of the selected word. At S208, the controller 28 judges whether the senses of all polysemous words have been determined. If not, processing returns to S203; otherwise processing ends.
The above method is briefly illustrated below, again using the "clothes, dress, bag" example.
First loop:
(1) Determining the contexts and computing the similarities and confidences are done in the same way as in the prior art and are not described again here.
(2) The sense significance of each word is computed using the second formula for E(w) above:
E(clothes) = (1.43-0.57)/0.57 = 1.51
E(dress) = (1.06-0.94)/0.94 = 0.13
E(bag) = (1.17-0.83)/0.83 = 0.41
(3) The word with the highest sense significance is selected; here it is "clothes".
(4) Finally, the sense of "clothes" is determined. Because C(ws_1) > C(ws_2), it takes the sense ws_1 = clothing (clothes).
Second loop:
Two words remain, "dress" and "bag"; each is computed below. Because the sense of "clothes" was determined in the first loop, in the following computation "clothes" contributes only the sense clothing (clothes) and no longer the sense eat (eat).
w = dress: (c(w) = {n_1, n_2} = {clothes, bag}), Senses(w) = (ws_1, ws_2) = (clothing (clothes), wrapping (wrap)).
(1) Compute the similarities
S(ws_1, n_1) = max(S(clothing (clothes), clothing (clothes))) = max(1) = 1
S(ws_1, n_2) = max(S(clothing (clothes), apparatus (tools)), S(clothing (clothes), wrapping (wrap))) = max(0.3, 0) = 0.3
S(ws_2, n_1) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
S(ws_2, n_2) = max(S(wrapping (wrap), apparatus (tools)), S(wrapping (wrap), wrapping (wrap))) = max(0, 1) = 1
(2) Compute the sense confidences
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 1/(1+0) + 0.3/(0.3+1) = 1.23
C(ws_2) = 0/(1+0) + 1/(0.3+1) = 0.77
(3) Compute the sense significance
E(dress) = (1.23-0.77)/0.77 = 0.6
w = bag:
(c(w) = {n_1, n_2} = {clothes, dress}), Senses(w) = (ws_1, ws_2) = (apparatus (tools), wrapping (wrap)).
(1) Compute the similarities
S(ws_1, n_1) = max(S(apparatus (tools), clothing (clothes))) = max(0.3) = 0.3
S(ws_1, n_2) = max(S(apparatus (tools), clothing (clothes)), S(apparatus (tools), wrapping (wrap))) = max(0.3, 0) = 0.3
S(ws_2, n_1) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
S(ws_2, n_2) = max(S(wrapping (wrap), clothing (clothes)), S(wrapping (wrap), wrapping (wrap))) = max(0, 1) = 1
(2) Compute the sense confidences
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.3/(0.3+0) + 0.3/(0.3+1) = 1.23
C(ws_2) = 0/(0.3+0) + 1/(0.3+1) = 0.77
(3) Compute the sense significance
E(bag) = (1.23-0.77)/0.77 = 0.6
(4) Select the word with the highest sense significance (second loop): because "dress" and "bag" have the same significance, either can be selected. For example, "dress" is selected (selecting "bag" gives the same result).
(5) Determine the sense of the selected word
Because C(ws_1) > C(ws_2), "dress" takes the sense ws_1 = clothing (clothes).
Third loop: only the word "bag" remains. In the following computation, "clothes" and "dress" contribute only the sense clothing (clothes) and no longer their other senses.
w = bag: (c(w) = {n_1, n_2} = {clothes, dress}), Senses(w) = (ws_1, ws_2) = (apparatus (tools), wrapping (wrap)).
(1) Compute the similarities
S(ws_1, n_1) = max(S(apparatus (tools), clothing (clothes))) = max(0.3) = 0.3
S(ws_1, n_2) = max(S(apparatus (tools), clothing (clothes))) = max(0.3) = 0.3
S(ws_2, n_1) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
S(ws_2, n_2) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
(2) Compute the confidences
C(ws_1) = S(ws_1, n_1)/(S(ws_1, n_1) + S(ws_2, n_1)) + S(ws_1, n_2)/(S(ws_1, n_2) + S(ws_2, n_2)) = 0.3/(0.3+0) + 0.3/(0.3+0) = 2
C(ws_2) = 0/(0.3+0) + 0/(0.3+0) = 0
Because only one word remains, the steps of computing the sense significance and selecting the word with the highest significance can be omitted. When the sense is determined, because C(ws_1) > C(ws_2), "bag" takes the sense ws_1 = apparatus (tools).
The final output is: {clothes: clothing (clothes), dress: clothing (clothes), bag: apparatus (tools)}. This is the correct result, in which the sense of "bag" is consistent with the senses of "clothes" and "dress".
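The three loops above can be reproduced with the following Python sketch of the progressive procedure. It is an illustration under stated assumptions rather than the claimed implementation: the short sense labels, the function names, the use of all other words as context, the second significance formula, and the treatment of a zero second-largest confidence as infinite significance are all choices made here.

```python
import math

SENSES = {
    "clothes": ["clothing", "eat"],
    "dress": ["clothing", "wrap"],
    "bag": ["tools", "wrap"],
}
NONZERO = {frozenset(("clothing", "tools")): 0.3, frozenset(("wrap", "eat")): 0.2}


def sense_similarity(a, b):
    """Similarity between two senses, assumed symmetric."""
    return 1.0 if a == b else NONZERO.get(frozenset((a, b)), 0.0)


def confidences(word, candidates):
    """C(ws_i) for each sense of `word`, using the senses still allowed for its context."""
    conf = {s: 0.0 for s in candidates[word]}
    for other, other_senses in candidates.items():
        if other == word:
            continue
        sims = {s: max(sense_similarity(s, ns) for ns in other_senses) for s in conf}
        total = sum(sims.values())
        if total > 0:  # assumption: skip a context word unrelated to every sense
            for s in conf:
                conf[s] += sims[s] / total
    return conf


def significance(conf):
    """Second formula for E(w); a zero runner-up is treated as infinite (assumption)."""
    best, second = sorted(conf.values(), reverse=True)[:2]
    if second == 0:
        return math.inf if best > 0 else 0.0
    return (best - second) / second


def disambiguate_progressively(senses):
    """Fix one word per loop: the word whose sense is currently most evident."""
    candidates = {w: list(s) for w, s in senses.items()}
    undecided = list(senses)
    result = {}
    while undecided:
        confs = {w: confidences(w, candidates) for w in undecided}
        chosen = max(undecided, key=lambda w: significance(confs[w]))
        sense = max(confs[chosen], key=confs[chosen].get)
        result[chosen] = sense
        candidates[chosen] = [sense]  # later loops see only the fixed sense
        undecided.remove(chosen)
    return result


if __name__ == "__main__":
    print(disambiguate_progressively(SENSES))
```

Replacing a decided word's sense list with its single fixed sense is what removes the influence of the discarded senses in later loops.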
The above example shows that the word sense disambiguation method according to the invention keeps the senses consistent during disambiguation.
However, although the result of the above method is consistent, the method used three loops in the example and repeated some of the computation, so its computation time is longer than that of Reference 1.
To reduce computation time and speed up the process, the invention proposes improvements to the above word sense disambiguation method. The ideas are: (1) the senses of all words whose sense significance exceeds a certain threshold are determined in the same loop; (2) all words are sorted by sense significance, and the senses of the top n words are determined in the same loop. These two improved methods are described below with reference to Fig. 2c and Fig. 2d.
Fig. 2c shows a flowchart of a word sense disambiguation method. S401 to S405 are the same as S201 to S205, and their description is omitted here. At S406, the word selection unit 26 selects the polysemous words whose sense significance exceeds a threshold, and the senses of the selected words are determined. If the sense significance of a word is very high (above the threshold), the word is very likely to take that sense; even if the senses of some context words change in later loops, the word is unlikely to change its sense, so its sense can be fixed in the first loop. However, because the threshold is a value set in advance, the result may contain inconsistencies. At S407, the controller 28 judges whether the senses of all polysemous words have been determined. If not, processing returns to S403; otherwise processing ends.
This method is briefly illustrated below using the "clothes, dress, bag" example.
First loop:
Because the similarities and confidences of the words in "clothes, dress, bag" are computed as before, their description is omitted here.
Compute the sense significance: E(clothes) = 1.51, E(dress) = 0.13, E(bag) = 0.41.
Select the words whose sense significance exceeds the threshold: if the threshold is set to T = 0.5, only one word satisfies the condition: "clothes".
Determine the sense: the sense of "clothes" is determined to be clothing (clothes).
Second loop:
The computation of the similarities and confidences of "dress" and "bag" is likewise omitted.
E(dress) = E(bag) = 0.6. Because both exceed T, both words are selected and their senses determined. The process is not described again here. In the end, "dress" takes the sense clothing (clothes) and "bag" takes the sense apparatus (tools).
The final output is: {clothes: clothing (clothes), dress: clothing (clothes), bag: apparatus (tools)}. This is the correct result. The method used in this example obtains the correct result in only two loops, and thus saves computation time for the word sense disambiguation system.
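The threshold variant can be sketched in Python as follows. This is a hedged illustration: the sense labels, function names and use of all other words as context are assumptions, as is the fallback of fixing the single most significant word when no word exceeds the threshold, a case the text does not discuss but which is needed for the loop to terminate.

```python
import math

SENSES = {
    "clothes": ["clothing", "eat"],
    "dress": ["clothing", "wrap"],
    "bag": ["tools", "wrap"],
}
NONZERO = {frozenset(("clothing", "tools")): 0.3, frozenset(("wrap", "eat")): 0.2}


def sense_similarity(a, b):
    """Similarity between two senses, assumed symmetric."""
    return 1.0 if a == b else NONZERO.get(frozenset((a, b)), 0.0)


def confidences(word, candidates):
    """C(ws_i) for each sense of `word`, using the senses still allowed for its context."""
    conf = {s: 0.0 for s in candidates[word]}
    for other, other_senses in candidates.items():
        if other == word:
            continue
        sims = {s: max(sense_similarity(s, ns) for ns in other_senses) for s in conf}
        total = sum(sims.values())
        if total > 0:  # assumption: skip a context word unrelated to every sense
            for s in conf:
                conf[s] += sims[s] / total
    return conf


def significance(conf):
    """Second formula for E(w); a zero runner-up is treated as infinite (assumption)."""
    best, second = sorted(conf.values(), reverse=True)[:2]
    if second == 0:
        return math.inf if best > 0 else 0.0
    return (best - second) / second


def disambiguate_with_threshold(senses, threshold):
    """Fix, in each loop, every word whose significance exceeds `threshold`."""
    candidates = {w: list(s) for w, s in senses.items()}
    undecided = list(senses)
    result, loops = {}, 0
    while undecided:
        loops += 1
        confs = {w: confidences(w, candidates) for w in undecided}
        chosen = [w for w in undecided if significance(confs[w]) > threshold]
        if not chosen:
            # Assumption: fall back to the single most significant word.
            chosen = [max(undecided, key=lambda w: significance(confs[w]))]
        for w in chosen:
            sense = max(confs[w], key=confs[w].get)
            result[w] = sense
            candidates[w] = [sense]
            undecided.remove(w)
    return result, loops


if __name__ == "__main__":
    print(disambiguate_with_threshold(SENSES, 0.5))
```

All words selected in one loop are resolved from confidences computed at the start of that loop, which is where the time saving comes from and also why inconsistencies can reappear when the threshold is set too low.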
Fig. 2 d shows another process flow diagram of Word sense disambiguation method.Wherein S501 to S505 is identical with the processing procedure of S201 to S205, omits its description here.
At S506, select speech unit 26 polysemant to be sorted according to meaning of a word significant degree, and n speech before selecting.Owing to can determine the meaning of a word of a plurality of speech in this step, so can save certain computing time.But n also is the threshold value that is provided with, and may introduce inconsistent.
At S507, meaning of a word determining unit is determined the meaning of a word of the speech of selection.At S508, controller 28 judges whether to have determined the meaning of a word of all polysemants, if do not have, then carries out S503, otherwise end process.
Be example still, this method is carried out simple declaration with " clothes bag ".
First circulation:
Because the similarity and the confidence level of each speech of calculating " clothes bag " are the same, omit here its description.
Calculate meaning of a word significant degree: E (clothes)=1.51, E (dress)=0.13, E (bag)=0.41.
Ranking results: E (clothes)>E (bag)>E (dress).If n=2 is set, gets first two words and determine the meaning of a word.To " clothes ", because C is (ws 1)>C (ws 2), so get ws 1The meaning of a word of=clothing (clothes).To " bag ", because C is (ws 1)<C (ws 2), so get ws 2The meaning of a word of=wrapping (wrap).
Second iteration: only the word "dress" remains.
w = dress: c(w) = {n1, n2} = {clothes, bag}, Senses(w) = (ws1, ws2) = (clothing (clothes), wrapping (wrap)).
Calculate the similarity:
S(ws1, n1) = max(S(clothing (clothes), clothing (clothes))) = max(1) = 1
S(ws1, n2) = max(S(clothing (clothes), wrapping (wrap))) = max(0) = 0
S(ws2, n1) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
S(ws2, n2) = max(S(wrapping (wrap), wrapping (wrap))) = max(1) = 1
Calculate the confidence level:
C(ws1) = C(ws2) = 1
Because C(ws1) and C(ws2) are equal, either sense may be chosen; for example, the sense "clothing (clothes)" is taken.
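The confidence values of this iteration follow from the four similarity values just listed. A minimal sketch, assuming the confidence of a sense is the sum, over the context words, of that sense's similarity divided by the total similarity of all senses to the same context word (the form written out for the word "bag" later in this description). The function name confidence and the handling of an all-zero column are assumptions of the sketch.

```python
def confidence(sim):
    """sim[i][j] holds S(ws_i, n_j); returns one confidence value per sense."""
    conf = [0.0] * len(sim)
    for j in range(len(sim[0])):
        column_total = sum(row[j] for row in sim)
        if column_total == 0:
            continue  # assumption: such a context word contributes nothing
        for i, row in enumerate(sim):
            conf[i] += row[j] / column_total
    return conf


# "dress": rows are ws1 = clothing, ws2 = wrapping;
# columns are n1 = clothes, n2 = bag (both already determined).
sim_dress = [[1, 0],
             [0, 1]]
print(confidence(sim_dress))  # [1.0, 1.0]
```

Both senses score 1, which is the tie the text resolves by an arbitrary choice.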
The final output is then { clothes: clothing (clothes), dress: clothing (clothes), bag: wrapping (wrap) }. The method used in this example needed only two iterations, which saves computation time.
Fig. 3a shows a word sense disambiguation system according to a second embodiment of the invention. Compared with the system shown in Fig. 2a, this system further comprises a word sense guessing unit 38 and a word sense acquiring unit 39. The word sense guessing unit 38 guesses the possible sense of each polysemous word whose sense has not yet been determined. The word sense acquiring unit 39 judges whether a guessed sense is consistent with the determined sense and, if it is, takes the guessed sense as the sense of that polysemous word. Using the word sense guessing unit 38 and the word sense acquiring unit 39 reduces repeated calculation and therefore saves computation time.
The processing performed by the system of the second embodiment is described below with reference to Fig. 3b, which shows a word sense disambiguation method according to the present invention. At S601, the input device 31 of the word sense disambiguation system inputs the text. At S602, the context determination device 32 determines the context of each polysemous word in the text. At S603, the similarity calculation unit 33 of the word sense disambiguation device determines the similarity between each sense of each polysemous word and its context. At S604, the word sense confidence level calculation unit 34 calculates the confidence level of each sense of each polysemous word. At S605, the word sense significance degree calculation unit 35 calculates the word sense significance degree of each polysemous word; the method used at S205 can be applied here. At S606, the word selection unit 36 selects the word with the largest word sense significance degree, and the word sense determination unit 37 determines the sense of that word.
At S607, the word sense guessing unit 38 guesses the possible senses of the other words.
At S608, the word sense acquiring unit 39 selects the words whose guessed sense is consistent with the determined sense and takes the guessed sense as the sense of each such word. That is, after the word sense determination unit 37 has determined the sense of a word, the word sense guessing unit 38 and the word sense acquiring unit 39 cooperate to check, for every polysemous word whose sense is still undetermined, whether its guessed sense is consistent with the determined sense. If it is, the guessed sense is adopted as that word's sense within the current iteration, which reduces computation time.
At S609, the controller 40 judges whether the senses of all words have been determined. If not, processing returns to S603; otherwise processing ends.
The above method is briefly illustrated below, again using "clothes bag" as the example.
First iteration:
Determining the context and calculating the similarity and confidence level are done in the same manner as in the prior art and are not described again here. The sense of "clothes" is determined: for w = clothes, the sense ws = clothing (clothes) is taken.
Guess the senses that the undetermined words may have:
(1) For A = dress: ws1 = clothing (clothes), ws2 = wrapping (wrap). Because C(ws1) = 1.06 > C(ws2) = 0.94, "dress" takes As = ws1.
(2) For A = bag: ws1 = apparatus (tools), ws2 = wrapping (wrap). Because C(ws1) = 0.83 < C(ws2) = 1.17, "bag" takes As = ws2.
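The two confidence pairs in (1) and (2) also illustrate the ratio form of the significance degree E, that is, (largest confidence - second-largest confidence) / second-largest confidence, which is one of the two definitions given in the claims. The sketch below is hedged accordingly: the function name significance is hypothetical, and returning infinity when the second-largest confidence is zero is an assumption that the text does not state.

```python
def significance(confidences):
    """Ratio form of E: (largest - second largest) / second largest."""
    top, second = sorted(confidences, reverse=True)[:2]
    if second == 0:
        return float("inf")  # assumption: an unrivalled sense is maximally significant
    return (top - second) / second


print(round(significance([1.06, 0.94]), 2))  # dress: 0.13
print(round(significance([0.83, 1.17]), 2))  # bag: 0.41
```

The rounded results agree with the values E(dress) = 0.13 and E(bag) = 0.41 quoted for this example.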
Judge whether the guessed sense of each undetermined word is consistent with the sense of "clothes" (ws = clothing (clothes)); if it is, take the guessed sense as the sense of that word:
Here, for a word A whose sense is undetermined, a sense As of A is said to be consistent with the sense of word w if and only if S(As, w) = S(As, ws), where ws is the sense that has been determined for word w.
(1) For A = dress: S(As, w) = max(S(clothing (clothes), clothing (clothes)), S(clothing (clothes), eat (eat))) = max(1, 0) = 1.
And S(As, ws) = S(clothing (clothes), clothing (clothes)) = 1.
Because S(As, w) = S(As, ws), As is consistent with the sense of word w.
(2) For A = bag: S(As, w) = max(S(wrapping (wrap), clothing (clothes)), S(wrapping (wrap), eat (eat))) = max(0, 0.2) = 0.2.
And S(As, ws) = S(wrapping (wrap), clothing (clothes)) = 0.
Because S(As, w) ≠ S(As, ws), As is inconsistent with the sense of word w.
Because "dress" meets the requirement and "bag" does not, the sense of "dress" is determined, namely clothing (clothes).
Thus, after this iteration ends, two words have their senses determined: "clothes" and "dress".
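The consistency test S(As, w) = S(As, ws) can be sketched as follows. The table holds only sense-pair similarities quoted in this example; treating identical senses as similarity 1 and any unlisted pair as 0 is an assumption of the sketch, and the names PAIR_SIM, sense_sim and is_consistent are hypothetical.

```python
# Sense-pair similarities quoted in the worked example.
PAIR_SIM = {
    frozenset(["clothing", "eat"]): 0.0,
    frozenset(["wrapping", "clothing"]): 0.0,
    frozenset(["wrapping", "eat"]): 0.2,
}


def sense_sim(a, b):
    """Similarity between two senses; identical senses score 1 (assumption)."""
    if a == b:
        return 1.0
    return PAIR_SIM.get(frozenset([a, b]), 0.0)  # unlisted pairs: 0 (assumption)


def is_consistent(guess, word_senses, decided_sense):
    """True if and only if S(As, w) equals S(As, ws)."""
    s_word = max(sense_sim(guess, s) for s in word_senses)  # S(As, w)
    s_sense = sense_sim(guess, decided_sense)                # S(As, ws)
    return s_word == s_sense  # exact comparison is safe for these table values


clothes_senses = ["clothing", "eat"]  # candidate senses of w = clothes
print(is_consistent("clothing", clothes_senses, "clothing"))  # dress: True
print(is_consistent("wrapping", clothes_senses, "clothing"))  # bag: False
```

The guess for "bag" fails because its best match among the senses of "clothes" is the discarded sense eat (eat), not the sense that was actually determined.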
Second iteration: only the word "bag" remains.
w = bag: c(w) = {n1, n2} = {clothes, dress}, Senses(w) = (ws1, ws2) = (apparatus (tools), wrapping (wrap)).
Calculate the similarity:
S(ws1, n1) = max(S(apparatus (tools), clothing (clothes))) = max(0.3) = 0.3
S(ws1, n2) = max(S(apparatus (tools), clothing (clothes))) = max(0.3) = 0.3
S(ws2, n1) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
S(ws2, n2) = max(S(wrapping (wrap), clothing (clothes))) = max(0) = 0
Calculate the confidence level:
C(ws1) = S(ws1, n1)/(S(ws1, n1) + S(ws2, n1)) + S(ws1, n2)/(S(ws1, n2) + S(ws2, n2)) = 0.3/(0.3 + 0) + 0.3/(0.3 + 0) = 2
C(ws2) = 0/(0.3 + 0) + 0/(0.3 + 0) = 0
Because only one word remains, its sense can be decided directly. Because C(ws1) > C(ws2), "bag" takes the sense ws1 = apparatus (tools). The final output is: { clothes: clothing (clothes), dress: clothing (clothes), bag: apparatus (tools) }. This result eliminates the ambiguity while keeping the word senses in the text consistent, and it reduces computation time and speeds up the calculation.
Although the system and method for word sense disambiguation have been described using Chinese text as the example, it is clear to those skilled in the art that the present invention can also be applied to other languages, for example English and Japanese.
Although the invention has been described with reference to specific embodiments, the invention should not be limited by these embodiments but only by the appended claims. It should be understood that those of ordinary skill in the art can change or modify the embodiments without departing from the scope and spirit of the invention.

Claims (14)

1. A word sense disambiguation system for performing word sense disambiguation on polysemous words, comprising:
An input device for inputting text that contains polysemous words; and
A word sense disambiguation device for iteratively determining the sense of each word based on the word sense significance degree of said word, wherein the word sense significance degree is obtained from the word sense confidence level of said word.
2. The system as claimed in claim 1, wherein the word sense disambiguation device comprises:
A similarity calculation unit for calculating the similarity between the senses of said word and its context;
A word sense confidence level calculation unit for calculating the word sense confidence level of said word based on the obtained similarity;
A word sense significance degree calculation unit for obtaining the word sense significance degree of said word based on the word sense confidence level of said word;
A word selection unit for selecting, according to the word sense significance degree, the words that satisfy a predetermined condition;
A word sense determination unit for determining the senses of the selected words; and
A controller for controlling the above units to iteratively determine the sense of each word based on the word sense significance degree of said word.
3. The system as claimed in claim 1 or 2, wherein:
The word sense significance degree equals the largest of the word sense confidence levels of said word, or equals the ratio of the difference between the largest and the second-largest word sense confidence levels to the second-largest word sense confidence level.
4. The system as claimed in claim 2, wherein the word selection unit selects the word with the largest word sense significance degree.
5. The system as claimed in claim 2, wherein the word selection unit selects the words whose word sense significance degree is greater than a threshold.
6. The system as claimed in claim 2, wherein the word selection unit sorts said words by word sense significance degree and selects the top n words.
7. The system as claimed in claim 2, further comprising:
A word sense guessing unit for guessing the senses of words whose senses have not been determined;
A word sense acquiring unit for obtaining the senses of the words whose senses have not been determined, according to whether the guessed sense is consistent with the determined sense; and
Wherein said controller controls the above units to iteratively determine the sense of each word based on the word sense significance degree of said word.
8. The system as claimed in claim 1, further comprising:
A context determination device for determining the context of the words in said input text.
9. A word sense disambiguation method for performing word sense disambiguation on polysemous words, comprising:
An input step of inputting text that contains polysemous words; and
A word sense disambiguation step of iteratively determining the sense of each word based on the word sense significance degree of said word, wherein the word sense significance degree is obtained from the word sense confidence level of said word.
10. The method as claimed in claim 9, wherein the word sense disambiguation step comprises:
A similarity calculation step of calculating the similarity between the senses of said word and its context;
A word sense confidence level calculation step of calculating the word sense confidence level of said word based on the obtained similarity;
A word sense significance degree calculation step of obtaining the word sense significance degree of said word based on the word sense confidence level of said word;
A word selection step of selecting, according to the word sense significance degree, the words that satisfy a predetermined condition;
A word sense determination step of determining the senses of the selected words; and
Repeating the above steps until the sense of each word has been determined.
11. The method as claimed in claim 9 or 10, wherein:
The word sense significance degree equals the largest of the word sense confidence levels of said word, or equals the ratio of the difference between the largest and the second-largest word sense confidence levels to the second-largest word sense confidence level.
12. The method as claimed in claim 10, wherein the word selection step selects the words that satisfy the predetermined condition in one of the following ways:
Selecting the word with the largest word sense significance degree;
Selecting the words whose word sense significance degree is greater than a threshold; and
Sorting said words by word sense significance degree and selecting the top n words.
13. The method as claimed in claim 10, further comprising the following steps carried out after the word sense determination step:
Guessing the senses of the words whose senses have not been determined; and
Obtaining the senses of the words whose senses have not been determined, according to whether the guessed sense is consistent with the determined sense.
14. The method as claimed in claim 9, further comprising:
A context determination step of determining the context of the words in said input text.
CN2009101417374A 2009-05-25 2009-05-25 Word meaning disambiguating system and method Pending CN101901210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009101417374A CN101901210A (en) 2009-05-25 2009-05-25 Word meaning disambiguating system and method

Publications (1)

Publication Number Publication Date
CN101901210A true CN101901210A (en) 2010-12-01

Family

ID=43226753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009101417374A Pending CN101901210A (en) 2009-05-25 2009-05-25 Word meaning disambiguating system and method

Country Status (1)

Country Link
CN (1) CN101901210A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5541836A (en) * 1991-12-30 1996-07-30 At&T Corp. Word disambiguation apparatus and methods
WO2002010985A2 (en) * 2000-07-28 2002-02-07 Tenara Limited Method of and system for automatic document retrieval, categorization and processing
CN1871597A (en) * 2003-08-21 2006-11-29 伊迪利亚公司 System and method for associating documents with contextual advertisements
CN1916887A (en) * 2006-09-06 2007-02-21 哈尔滨工程大学 Method for eliminating ambiguity without directive word meaning based on technique of substitution words

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DIANA MCCARTHY等: "Finding Predominant Word Senses in Untagged Text", 《ACL "04 PROCEEDINGS OF THE 42ND ANNUAL MEETING ON ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104160392A (en) * 2012-03-07 2014-11-19 三菱电机株式会社 Device, method, and program for estimating meaning of word
CN104160392B (en) * 2012-03-07 2017-03-08 三菱电机株式会社 Semantic estimating unit, method
CN104731771A (en) * 2015-03-27 2015-06-24 大连理工大学 A system and method for disambiguating abbreviations based on word vectors
CN107291685A (en) * 2016-04-13 2017-10-24 北京大学 Method for recognizing semantics and semantics recognition system
CN107291685B (en) * 2016-04-13 2020-10-13 北京大学 Semantic recognition method and semantic recognition system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20101201