CN104794110B - Machine translation method and device - Google Patents

Machine translation method and device

Info

Publication number
CN104794110B
Authority
CN
China
Prior art keywords
item
triggered
point
text
language vocabulary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410026026.3A
Other languages
Chinese (zh)
Other versions
CN104794110A (en
Inventor
贲国生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410026026.3A
Publication of CN104794110A
Application granted
Publication of CN104794110B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a machine translation method and device, belonging to the technical field of text processing. The method includes: obtaining a first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word; determining, according to a corpus, a first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, according to the corpus, a second pointwise mutual information between each candidate target-language word and the second source-language words; and determining the translation result of the first source-language word according to the first and second pointwise mutual information corresponding to each candidate target-language word. Because the invention translates the source-language word using both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side, the quality of translation from the source language into the target language is higher.

Description

Machine translation method and device
Technical field
The present invention relates to the technical field of text processing, and in particular to a machine translation method and device.
Background
With the development of science and technology and the growth of information exchange between countries, language barriers between countries have become increasingly serious. Traditional manual translation is far from able to meet demand. Machine translation, a way of using a computer to convert one natural language into another, can translate quickly thanks to the processing speed of computers and can also take the context of the entire document into account to translate better. It has therefore gradually become the mainstream mode of translation.
The related art provides two machine translation methods. In the first method, a first source-language word to be translated in the current text is obtained, and at least one candidate target-language word corresponding to the first source-language word is determined; the pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text is determined according to a corpus; and the translation result of the first source-language word is determined according to the pointwise mutual information corresponding to each candidate target-language word. Here, lexical cohesion is mainly divided into repetition and collocation. Repetition refers to the recurrence of a lexical item in a text; collocation mainly involves lexical items with identical, similar, or related semantic relations, such as hypernymy, hyponymy, identity, similarity, opposition, and complementarity.
In the second method, a first source-language word to be translated in the current text is obtained, and at least one candidate target-language word corresponding to the first source-language word is determined; the pointwise mutual information between each candidate target-language word and the second source-language words is determined according to a corpus; and the translation result of the first source-language word is determined according to the pointwise mutual information corresponding to each candidate target-language word.
In the course of implementing the present invention, the inventor found that the related art has at least the following problems:
Both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side have reference value for machine translation, and either one can improve translation quality. However, each of the two related-art methods uses only one of them, so the quality of translation from the source language into the target language is low.
Summary of the invention
To solve the problems in the prior art, embodiments of the present invention provide a machine translation method and device. The technical solution is as follows:
In one aspect, a machine translation method is provided, the method including:
obtaining a first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word;
determining, according to a corpus, a first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, according to the corpus, a second pointwise mutual information between each candidate target-language word and the second source-language words;
determining the translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
In another aspect, a machine translation device is provided, the device including:
an obtaining module, configured to obtain a first source-language word to be translated in the current text;
a first determining module, configured to determine at least one candidate target-language word corresponding to the first source-language word;
a second determining module, configured to determine, according to a corpus, a first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text;
a third determining module, configured to determine, according to the corpus, a second pointwise mutual information between each candidate target-language word and the second source-language words;
a fourth determining module, configured to determine the translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
At least one candidate target-language word corresponding to the first source-language word to be translated in the current text is determined. After the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate target-language word and the second source-language words, are determined according to a corpus, the translation result of the first source-language word is determined according to the first and second pointwise mutual information corresponding to each candidate target-language word. Because the source-language word is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side, the quality of translation from the source language into the target language is higher.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a machine translation method provided by Embodiment One of the present invention;
Fig. 2 is a flowchart of a machine translation method provided by Embodiment Two of the present invention;
Fig. 3 is a schematic structural diagram of a machine translation device provided by Embodiment Three of the present invention;
Fig. 4 is a schematic structural diagram of a first determining module provided by Embodiment Three of the present invention;
Fig. 5 is a schematic structural diagram of a second determining module provided by Embodiment Three of the present invention;
Fig. 6 is a schematic structural diagram of a computing unit provided by Embodiment Three of the present invention;
Fig. 7 is a schematic structural diagram of another computing unit provided by Embodiment Three of the present invention;
Fig. 8 is a schematic structural diagram of a terminal provided by Embodiment Four of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment One
In machine translation, both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side have reference value, and either one can improve translation quality. If the translation result is determined according to only one of them, the quality of translation from the source language into the target language may be low.
To improve the quality of translation from the source language into the target language, this embodiment of the present invention provides a machine translation method. The method can be applied to a terminal, which includes but is not limited to a mobile phone, a computer, or a tablet computer; this embodiment does not limit the specific form of the terminal. Referring to Fig. 1, the method flow provided by this embodiment includes:
101: Obtain a first source-language word to be translated in the current text, and determine at least one candidate target-language word corresponding to the first source-language word;
102: Determine, according to a corpus, a first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determine, according to the corpus, a second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining, according to the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text includes but is not limited to:
taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relationship;
calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determining, according to these sub-pointwise mutual information values, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
Determining, according to the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words includes but is not limited to:
taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item corresponding to each triggered item, where a second trigger item is a second source-language word that corresponds to the triggered item under a preset lexical cohesion relationship;
calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, and determining, according to these sub-pointwise mutual information values, the second pointwise mutual information between each triggered item and the second source-language words.
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item includes but is not limited to:
calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship;
calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship, and the second marginal probability of each corresponding first trigger item under the corresponding preset lexical cohesion relationship;
calculating the sub-pointwise mutual information between each triggered item and each corresponding first trigger item according to the first joint probability, the first marginal probability, and the second marginal probability.
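The last step above can be sketched in Python as follows. This is a minimal sketch that assumes the standard definition of pointwise mutual information, log(p(x, y) / (p(x) p(y))), with all three probabilities taken under the same preset lexical cohesion relationship. The function name `sub_pmi` and the treatment of zero probabilities are illustrative assumptions, not details given in the text.

```python
import math


def sub_pmi(joint: float, marginal_triggered: float, marginal_trigger: float) -> float:
    """Sub-PMI of a (triggered item, trigger item) pair under one relationship.

    Assumes the standard definition: log(joint / (marginal_x * marginal_y)).
    """
    # Assumption: a pair with any zero probability gets the lowest possible score.
    if joint <= 0 or marginal_triggered <= 0 or marginal_trigger <= 0:
        return float("-inf")
    return math.log(joint / (marginal_triggered * marginal_trigger))


# Independent items score 0; items that co-occur more than chance score above 0.
independent = sub_pmi(0.25, 0.5, 0.5)
associated = sub_pmi(0.5, 0.5, 0.5)
```

A positive value indicates that the two items co-occur under the relationship more often than independence would predict.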
Calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship includes but is not limited to:
counting, among the texts of the corpus, a first quantity of texts in which the triggered item and the corresponding first trigger item both occur and satisfy the corresponding preset lexical cohesion relationship;
counting, among the texts of the corpus, a second quantity of texts that contain the preset lexical cohesion relationship corresponding to the triggered item and the first trigger item;
calculating the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship according to the first quantity and the second quantity.
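The counting steps above can be sketched as follows. The text does not state how the two quantities are combined, so this sketch assumes the joint probability is the ratio of the first quantity to the second quantity. The corpus format (each text represented as the set of relation triples that hold in it) and all names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical corpus format: each text is the set of
# (triggered item, trigger item, relationship) triples that hold in it.
Triple = Tuple[str, str, str]


def joint_probability(corpus: List[Set[Triple]], triggered: str,
                      trigger: str, relation: str) -> float:
    # First quantity: texts where both items occur and satisfy the relationship.
    first_quantity = sum(1 for text in corpus
                         if (triggered, trigger, relation) in text)
    # Second quantity: texts that contain the relationship at all.
    second_quantity = sum(1 for text in corpus
                          if any(r == relation for _, _, r in text))
    if second_quantity == 0:
        return 0.0
    # Assumption: the joint probability is first quantity / second quantity.
    return first_quantity / second_quantity


toy_corpus = [
    {("vehicle", "car", "hyponym")},
    {("vehicle", "car", "hyponym"), ("fruit", "orange", "hyponym")},
    {("fruit", "orange", "hyponym")},
    {("big", "small", "antonym")},
]
p_joint = joint_probability(toy_corpus, "vehicle", "car", "hyponym")
```

In the toy corpus the pair occurs in two of the three texts that contain a hyponym relationship, so the ratio is 2/3.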
Calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and the second marginal probability of each corresponding first trigger item under the corresponding preset lexical cohesion relationship includes but is not limited to:
counting, among the texts of the corpus, a third quantity of texts in which the triggered item occurs;
counting, among the texts of the corpus, a fourth quantity of texts in which the first trigger item corresponding to the triggered item occurs;
counting, among the texts of the corpus, the second quantity of texts that contain the preset lexical cohesion relationship corresponding to the triggered item and the first trigger item;
calculating the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the third quantity and the second quantity, and calculating the second marginal probability of each corresponding first trigger item under the corresponding preset lexical cohesion relationship according to the fourth quantity and the second quantity.
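A minimal sketch of these counts follows, assuming each marginal probability is the corresponding quantity divided by the second quantity (the text names the inputs but not the formula). The corpus format, a pair of word set and relationship-name set per text, is hypothetical. Note that under this literal reading a ratio can exceed 1 if an item also occurs in texts that do not contain the relationship; a real implementation would need to decide how to handle that.

```python
from typing import List, Set, Tuple

# Hypothetical corpus format: (words in the text, relationship names found in the text).
Text = Tuple[Set[str], Set[str]]


def marginal_probabilities(corpus: List[Text], triggered: str,
                           trigger: str, relation: str) -> Tuple[float, float]:
    # Third quantity: texts in which the triggered item occurs.
    third_quantity = sum(1 for words, _ in corpus if triggered in words)
    # Fourth quantity: texts in which the trigger item occurs.
    fourth_quantity = sum(1 for words, _ in corpus if trigger in words)
    # Second quantity: texts that contain the relationship.
    second_quantity = sum(1 for _, relations in corpus if relation in relations)
    if second_quantity == 0:
        return 0.0, 0.0
    # Assumption: each marginal is its quantity divided by the second quantity.
    return third_quantity / second_quantity, fourth_quantity / second_quantity


toy_texts = [
    ({"vehicle", "car"}, {"hyponym"}),
    ({"car", "road"}, {"hyponym"}),
    ({"vehicle", "bus"}, {"hyponym"}),
    ({"big", "small"}, {"antonym"}),
    ({"orange", "fruit"}, {"hyponym"}),
]
p_triggered, p_trigger = marginal_probabilities(toy_texts, "vehicle", "car", "hyponym")
```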
Calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item includes but is not limited to:
calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship;
calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship, and the fourth marginal probability of each corresponding second trigger item under the corresponding preset lexical cohesion relationship;
calculating the sub-pointwise mutual information between each triggered item and each corresponding second trigger item according to the second joint probability, the third marginal probability, and the fourth marginal probability.
Calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship includes but is not limited to:
counting, among the texts of the corpus, a fifth quantity of texts in which the triggered item and the corresponding second trigger item both occur and satisfy the corresponding preset lexical cohesion relationship;
counting, among the texts of the corpus, a sixth quantity of texts that contain the preset lexical cohesion relationship corresponding to the triggered item and the second trigger item;
calculating the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship according to the fifth quantity and the sixth quantity.
Calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and the fourth marginal probability of each corresponding second trigger item under the corresponding preset lexical cohesion relationship includes but is not limited to:
counting, among the texts of the corpus, a seventh quantity of texts in which the triggered item occurs;
counting, among the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to the triggered item occurs;
counting, among the texts of the corpus, the sixth quantity of texts that contain the preset lexical cohesion relationship corresponding to the triggered item and the second trigger item;
calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the seventh quantity and the sixth quantity, and calculating the fourth marginal probability of each corresponding second trigger item under the corresponding preset lexical cohesion relationship according to the eighth quantity and the sixth quantity.
103: Determine the translation result of the first source-language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target-language word.
In the method provided by this embodiment, at least one candidate target-language word corresponding to the first source-language word to be translated in the current text is determined. After the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate target-language word and the second source-language words, are determined according to a corpus, the translation result of the first source-language word is determined according to the first and second pointwise mutual information corresponding to each candidate target-language word. Because the source-language word is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side, the quality of translation from the source language into the target language is higher.
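Step 103 can be illustrated with the sketch below. The passage above does not say how the two values are combined, so the sketch assumes the simplest option: add the first and second pointwise mutual information of each candidate and pick the candidate with the highest sum. The function name and the scores are made up for illustration only.

```python
from typing import Dict, Tuple


def choose_translation(scores: Dict[str, Tuple[float, float]]) -> str:
    """Pick the candidate with the largest combined score.

    scores maps each candidate target-language word to
    (first PMI, second PMI). Summing the two is an assumption.
    """
    if not scores:
        raise ValueError("no candidate target-language words")
    return max(scores, key=lambda word: scores[word][0] + scores[word][1])


# Made-up scores for the two candidates used later in Embodiment Two.
example_scores = {"vehicle": (1.2, 0.8), "transportation": (0.3, 1.1)}
best = choose_translation(example_scores)
```

Other combinations, such as a weighted sum, would fit the same interface; the choice is left open here because the text does not fix it.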
Embodiment two
This embodiment of the present invention provides a machine translation method. With reference to the content of Embodiment One above and referring to Fig. 2, the method flow provided by this embodiment includes:
201: Obtain a first source-language word to be translated in the current text, and determine at least one candidate target-language word corresponding to the first source-language word;
Before the first source-language word to be translated in the current text is obtained, a text input box may be provided in advance, and the text to be translated that the user enters in the text input box is obtained; this embodiment does not specifically limit this. Among the source-language words in the current text, a source-language word to be translated is taken as the first source-language word and is obtained. The number of first source-language words obtained may be one or a preset number; this embodiment does not specifically limit it.
This embodiment does not specifically limit how the at least one candidate target-language word corresponding to the first source-language word is determined, which includes but is not limited to: searching a database according to the obtained first source-language word to be translated in the current text, and determining the at least one candidate target-language word according to the search result. The source language and the target language are any two natural languages; for example, the source language is Chinese and the target language is English, and this embodiment does not specifically limit this. It should be noted that the source language and the target language are two different natural languages. In addition, the database used for the search may be selected as needed, and this embodiment does not specifically limit it. The database may store different data for one natural language, such as the words, phrases, and morphemes of that language, or for multiple natural languages; this embodiment does not specifically limit the types of natural language stored in the database or the content stored for each.
For example, suppose a Chinese text is to be translated into a corresponding English text. If the first source-language word currently to be translated in the obtained Chinese text, that is, the Chinese word to be translated, means "vehicles", the database can be searched according to that Chinese word to determine at least one candidate target-language word. For example, the candidate target-language words may be vehicle, transportation, and so on. The Chinese word for "vehicles" may of course also correspond to other English words; this embodiment does not specifically limit this.
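The lookup in this example can be sketched with an in-memory dictionary standing in for the database. The dictionary contents, its name, and the function name are hypothetical; the key "vehicles" is an English gloss standing in for the Chinese source word, which is not reproduced in the text.

```python
from typing import Dict, List

# Hypothetical bilingual lexicon standing in for the database.
# The key is an English gloss of the Chinese source word.
LEXICON: Dict[str, List[str]] = {
    "vehicles": ["vehicle", "transportation"],
}


def candidate_translations(source_word: str,
                           lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the candidate target-language words for a source-language word."""
    # An unknown word yields no candidates rather than raising an error.
    return list(lexicon.get(source_word, []))


candidates = candidate_translations("vehicles", LEXICON)
```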
It should be noted that some of the above English words have multiple meanings, but each of them includes the meaning "vehicles".
202: Determine, according to a corpus, a first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text;
This embodiment does not specifically limit how the first pointwise mutual information for target-side lexical cohesion of each triggered item is determined according to the corpus and the target-language words already translated in the current text, which includes but is not limited to: taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relationship; calculating, according to the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determining, according to these sub-pointwise mutual information values, the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
Because the likelihood of each candidate target-language word being the translation result has to be determined from the content already determined in the current text, each candidate target-language word is taken as a triggered item. For example, if the first source-language word to be translated means "vehicles" and the corresponding candidate target-language words are vehicle, transportation, and so on, then vehicle can be taken as triggered item 1 and transportation as triggered item 2; this embodiment does not specifically limit this. Lexical cohesion is mainly divided into repetition and collocation. Repetition refers to the recurrence of a lexical item in a text; collocation mainly involves lexical items with identical, similar, or related semantic relations, such as hypernymy, hyponymy, identity, similarity, opposition, and complementarity. The preset lexical cohesion relationship can therefore be hypernymy, hyponymy, synonymy, antonymy, and so on; this embodiment does not specifically limit the types or the content of the preset lexical cohesion relationships.
Determined in the corresponding target language vocabulary of the second source language vocabulary the item that is each triggered it is corresponding at least one the When one triggering item, including but not limited to:Target of the item under default Lexical Cohesion relationship that be each triggered is searched in the database Language vocabulary, if each of finding the target language vocabulary being triggered item under default Lexical Cohesion relationship while also current In text in the corresponding target language vocabulary of translated each second source language vocabulary, then by finding in the database and Exist simultaneously the object language in the corresponding target language vocabulary of translated each second source language vocabulary in current text Target language vocabulary of the vocabulary as the condition that meets, and it is right that the target language vocabulary for the condition that meets is determined as each item that is triggered At least one answered first triggers item.For any one be triggered item and any one default Lexical Cohesion relationship, by the quilt Triggering item and Lexical Cohesion relationship are denoted as be triggered item 1 and Lexical Cohesion relationship 1 respectively, search the item 1 that is triggered in the database Target language vocabulary under Lexical Cohesion relationship 1, if target language of the item 1 under Lexical Cohesion relationship 1 that be triggered found Words converge 1 simultaneously also in current text in the corresponding target language vocabulary of translated each second source language vocabulary, then will Target language vocabulary of the target language vocabulary 1 as the condition that meets, and the target language vocabulary for the condition that meets 1 is used as and is triggered At least one the corresponding first triggering item under Lexical Cohesion relationship 1 of item 1.
For ease of description, take determining the first trigger items of one triggered item as an example. Denote this triggered item as triggered item 1, and suppose there is a single preset lexical cohesion relation, namely hyponymy. All hyponym target language words of triggered item 1 are looked up in the database; suppose the hyponyms found are target language word 1, target language word 3, and target language word 5. If the target language words corresponding to the second source language words already translated in the current text include target language word 3, then target language word 3 is taken as a first trigger item corresponding to triggered item 1. Of course, other ways of determining the first trigger items of each triggered item may be used as the situation requires; this embodiment does not specifically limit this. The database may be selected as needed, and this embodiment does not specifically limit the database used.
For example, assume that the target language words corresponding to the second source language words already translated in the current text are ...car...orange...bus..., where the omitted parts are the target language words corresponding to other translated source language words of the current text. For ease of understanding, take the triggered item vehicle and the preset lexical cohesion relation hyponymy as an example. All hyponym target language words of the triggered item vehicle are looked up in the database, such as bus, car, plane, and so on; this embodiment does not specifically limit this. It can then be determined that the target language words corresponding to the translated second source language words contain two hyponyms of the triggered item vehicle, namely car and bus. Therefore, when the lexical cohesion relation is hyponymy, the triggered item vehicle has two first trigger items among the target language words corresponding to the translated second source language words of the current text: car and bus.
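The lookup in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the in-memory dictionary `RELATION_DB`, the function name `find_trigger_items`, and the relation label `"hyponym"` are all hypothetical stand-ins for whatever database is actually used.

```python
# Hypothetical relation database: (triggered item, relation) -> related target words.
RELATION_DB = {
    ("vehicle", "hyponym"): {"bus", "car", "plane"},
}


def find_trigger_items(triggered_item, relation, translated_words, db=RELATION_DB):
    """Return the words related to `triggered_item` under `relation` that also
    occur among the already-translated target words, in order of first appearance."""
    related = db.get((triggered_item, relation), set())
    found = []
    for word in translated_words:
        if word in related and word not in found:
            found.append(word)
    return found


# Translated target words of the current text, as in the example above.
translated = ["car", "orange", "bus"]
first_trigger_items = find_trigger_items("vehicle", "hyponym", translated)
print(first_trigger_items)  # ['car', 'bus']
```

A triggered item with no entry in the database simply yields an empty list, which corresponds to the case where no first trigger item exists under that relation.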
In addition to the above method, determining at least one first trigger item for each triggered item among the target language words corresponding to the second source language words may also include, but is not limited to: pairing each triggered item with the target language word corresponding to each second source language word already translated in the current text, so that each pairing forms a target language word pair; looking up in the database the target language word pairs of each triggered item under the preset lexical cohesion relation; if a pair found in the database is also one of the pairs formed from the triggered item and the translated target language words, taking that pair as a target language word pair that satisfies the condition, and determining the target language word in that pair as a first trigger item corresponding to the triggered item. For any triggered item and any preset lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1 respectively, triggered item 1 is paired with the target language word corresponding to each second source language word already translated in the current text, and all target language word pairs of triggered item 1 under lexical cohesion relation 1 are looked up in the database. If a pair found in the database is also one of the pairs formed from triggered item 1 and the translated target language words, that pair is taken as a target language word pair that satisfies the condition, and the target language word in that pair is determined as a first trigger item corresponding to triggered item 1 under lexical cohesion relation 1.
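The pair-based variant can be sketched in the same spirit. The set `RELATION_PAIRS`, the function name `find_trigger_items_by_pairs`, and the sample data are assumptions made for illustration; the text does not specify how pairs are stored.

```python
# Hypothetical pair database: relation -> set of (triggered item, related word) pairs.
RELATION_PAIRS = {
    "hyponym": {("vehicle", "car"), ("vehicle", "bus"), ("vehicle", "plane")},
}


def find_trigger_items_by_pairs(triggered_item, relation, translated_words,
                                pairs=RELATION_PAIRS):
    """Pair the triggered item with every translated target word, then keep
    the words whose pair is present in the database under `relation`."""
    known_pairs = pairs.get(relation, set())
    # dict.fromkeys removes duplicates while preserving order of appearance.
    candidates = [(triggered_item, word) for word in dict.fromkeys(translated_words)]
    return [word for (item, word) in candidates if (item, word) in known_pairs]


pair_trigger_items = find_trigger_items_by_pairs(
    "vehicle", "hyponym", ["car", "orange", "bus"]
)
print(pair_trigger_items)  # ['car', 'bus']
```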
It should be noted that the following situation may arise: under a certain lexical cohesion relation, a triggered item has no corresponding first trigger item among the target language words corresponding to the second source language words. In this case, the above method may continue to be used to determine the first trigger items of that triggered item under the other types of lexical cohesion relation. If the triggered item has no corresponding first trigger item among the target language words corresponding to the second source language words under any lexical cohesion relation, the above method continues with the other triggered items, determining their first trigger items under each lexical cohesion relation.
This embodiment does not specifically limit how the sub pointwise mutual information between each triggered item and each corresponding first trigger item is calculated from the corpus. The calculation includes, but is not limited to: calculating from the corpus a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation; calculating from the corpus a first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and a second marginal probability of each corresponding first trigger item under that relation; and calculating the sub pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability, and the second marginal probability.
The first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation may be calculated from the corpus as follows: counting, among the texts of the corpus, a first quantity of texts in which the triggered item and the corresponding first trigger item both occur and satisfy the corresponding preset lexical cohesion relation; counting, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relation corresponding to the triggered item and the first trigger item; and calculating the first joint probability from the first quantity and the second quantity.
This embodiment does not specifically limit how the first joint probability is calculated from the first quantity and the second quantity. The calculation includes, but is not limited to: dividing the value of the first quantity by the value of the second quantity and taking the quotient as the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation.
For ease of understanding, the process of calculating the first joint probability of a triggered item and a corresponding first trigger item under the corresponding preset lexical cohesion relation is explained below, taking the triggered item vehicle, the trigger item car, and the preset lexical cohesion relation hyponymy as an example:
Assume that the second quantity, that is, the number of texts in the corpus counted as having the hyponymy relation, is 5. In the first text, the triggered item vehicle and the trigger item car both occur and satisfy the hyponymy relation, that is, the trigger item car is a hyponym target language word of the triggered item vehicle. In the second text, only the triggered item vehicle occurs. In the third text, only the trigger item car occurs. In the fourth text, the triggered item vehicle and the trigger item car both occur and satisfy the hyponymy relation. In the fifth text, only the triggered item vehicle occurs. The first quantity, that is, the number of texts in the corpus in which the triggered item vehicle and the trigger item car both occur and satisfy the hyponymy relation, is therefore 2.
The first joint probability of the triggered item vehicle and the trigger item car under the hyponymy relation is calculated from the first quantity and the second quantity, giving 2/5. Of course, other methods may be used to calculate the first joint probability from the first quantity and the second quantity; this embodiment does not specifically limit this. Other methods may likewise be used to calculate the first joint probability from the corpus; this embodiment does not specifically limit this either.
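The counting in this example can be reproduced with a short sketch. The data layout is an assumption: each of the five texts is modeled as a set of words plus the set of (triggered item, trigger item) pairs that satisfy the hyponymy relation in that text, and the names `TEXTS` and `joint_probability` are hypothetical.

```python
from fractions import Fraction

# The five texts of the example: (words that occur, pairs satisfying hyponymy).
TEXTS = [
    ({"vehicle", "car"}, {("vehicle", "car")}),  # text 1: both occur, relation holds
    ({"vehicle"}, set()),                        # text 2: only the triggered item
    ({"car"}, set()),                            # text 3: only the trigger item
    ({"vehicle", "car"}, {("vehicle", "car")}),  # text 4: both occur, relation holds
    ({"vehicle"}, set()),                        # text 5: only the triggered item
]


def joint_probability(triggered_item, trigger_item, texts):
    """First quantity divided by second quantity, as an exact fraction."""
    second_quantity = len(texts)  # texts counted under the relation
    first_quantity = sum(
        1
        for words, pairs in texts
        if triggered_item in words
        and trigger_item in words
        and (triggered_item, trigger_item) in pairs
    )
    return Fraction(first_quantity, second_quantity)


p_joint = joint_probability("vehicle", "car", TEXTS)
print(p_joint)  # 2/5
```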
Calculating from the corpus the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of each corresponding first trigger item under that relation may include, but is not limited to: counting, among the texts of the corpus, a third quantity of texts in which the triggered item occurs; counting, among the texts of the corpus, a fourth quantity of texts in which the corresponding first trigger item occurs; counting, among the texts of the corpus, the second quantity of texts that have the preset lexical cohesion relation corresponding to the triggered item and the first trigger item; calculating the first marginal probability of the triggered item under the corresponding preset lexical cohesion relation from the third quantity and the second quantity; and calculating the second marginal probability of the corresponding first trigger item under that relation from the fourth quantity and the second quantity.
This embodiment does not specifically limit how the first marginal probability is calculated from the third quantity and the second quantity, nor how the second marginal probability is calculated from the fourth quantity and the second quantity. The calculation includes, but is not limited to: dividing the value of the third quantity by the value of the second quantity and taking the quotient as the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation; and dividing the value of the fourth quantity by the value of the second quantity and taking the quotient as the second marginal probability of each corresponding first trigger item under the corresponding preset lexical cohesion relation.
For ease of understanding, the process of calculating the first marginal probability of a triggered item and the second marginal probability of its corresponding first trigger item under the corresponding preset lexical cohesion relation is explained below, again taking the triggered item vehicle, the trigger item car, and the preset lexical cohesion relation hyponymy as an example:
Assume that the second quantity, that is, the number of texts in the corpus counted as having the hyponymy relation, is 5. In the first text, the triggered item vehicle and the trigger item car both occur and satisfy the hyponymy relation, that is, the trigger item car is a hyponym target language word of the triggered item vehicle. In the second text, only the triggered item vehicle occurs. In the third text, only the trigger item car occurs. In the fourth text, the triggered item vehicle and the trigger item car both occur and satisfy the hyponymy relation. In the fifth text, only the triggered item vehicle occurs.
The third quantity, that is, the number of texts in the corpus in which the triggered item vehicle occurs, is then 4. Similarly, the fourth quantity, that is, the number of texts in the corpus in which the trigger item car corresponding to the triggered item vehicle occurs, is 3.
The first marginal probability of the triggered item vehicle under the hyponymy relation is calculated from the third quantity and the second quantity, giving 4/5. The second marginal probability of the trigger item car corresponding to the triggered item vehicle under the hyponymy relation is calculated from the fourth quantity and the second quantity, giving 3/5. Of course, other methods may be used as the situation requires to calculate the first marginal probability from the third quantity and the second quantity, or the second marginal probability from the fourth quantity and the second quantity; this embodiment does not specifically limit this.
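The two marginal probabilities of this example can be checked with a similar sketch. As before, the representation of each text as a set of words and the names `TEXT_WORDS` and `marginal_probability` are assumptions made only for illustration.

```python
from fractions import Fraction

# Words occurring in each of the five texts counted under the hyponymy relation.
TEXT_WORDS = [
    {"vehicle", "car"},  # text 1
    {"vehicle"},         # text 2
    {"car"},             # text 3
    {"vehicle", "car"},  # text 4
    {"vehicle"},         # text 5
]


def marginal_probability(word, texts):
    """Number of texts containing `word` divided by the second quantity."""
    second_quantity = len(texts)
    occurrences = sum(1 for words in texts if word in words)
    return Fraction(occurrences, second_quantity)


p_triggered = marginal_probability("vehicle", TEXT_WORDS)  # third / second quantity
p_trigger = marginal_probability("car", TEXT_WORDS)        # fourth / second quantity
print(p_triggered, p_trigger)  # 4/5 3/5
```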
This embodiment does not specifically limit how the sub pointwise mutual information between each triggered item and each corresponding first trigger item is calculated from the first joint probability, the first marginal probability, and the second marginal probability under the corresponding preset lexical cohesion relation. The calculation includes, but is not limited to: multiplying the first marginal probability by the second marginal probability, dividing the first joint probability by the resulting product, taking the logarithm of the quotient, and using the result as the sub pointwise mutual information between the triggered item and the corresponding first trigger item. This method can be expressed by the following formula:
$$\mathrm{PMI}(xRy) = \log \frac{p(x, y, R)}{p(x, R)\, p(y, R)}$$
Here, p(x, y, R) denotes the first joint probability, x denotes the trigger item, y denotes the triggered item, R denotes the preset lexical cohesion relation, p(x, R) denotes the first marginal probability, p(y, R) denotes the second marginal probability, and PMI(xRy) denotes the sub pointwise mutual information between the triggered item y and the trigger item x.
For example, using the results of the examples above, p(x, y, R) = 2/5, p(x, R) = 4/5, and p(y, R) = 3/5, so that PMI(xRy) = log((2/5) / ((4/5) × (3/5))) = log(5/6). It should be noted that the base of the logarithm may be 2 or any other value as needed; this embodiment does not specifically limit this.
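The arithmetic of this example can be verified with a few lines of code. The function name `sub_pmi` and the default base 2 are choices made for this sketch; the text leaves the base of the logarithm open.

```python
import math


def sub_pmi(joint, marginal_x, marginal_y, base=2):
    """Sub PMI = log( joint / (marginal_x * marginal_y) ) in the given base."""
    if joint <= 0 or marginal_x <= 0 or marginal_y <= 0:
        raise ValueError("all probabilities must be positive to take the logarithm")
    return math.log(joint / (marginal_x * marginal_y), base)


# Values from the example: joint 2/5, marginals 4/5 and 3/5, so the ratio is 5/6.
value = sub_pmi(2 / 5, 4 / 5, 3 / 5)
print(round(value, 3))  # -0.263
```

Because the ratio 5/6 is below 1, the result is negative regardless of the base chosen.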
After the sub pointwise mutual information between each triggered item and each corresponding first trigger item has been calculated, the first pointwise mutual information between each triggered item and the target language words corresponding to the second source language words can be determined from it. This embodiment does not specifically limit how this is done. The determination includes, but is not limited to: determining, from the sub pointwise mutual information between each triggered item and each corresponding first trigger item, the pointwise mutual information between each triggered item and the target language words corresponding to the second source language words under each lexical cohesion relation; and determining, from the pointwise mutual information under each lexical cohesion relation, the first pointwise mutual information between each triggered item and the target language words corresponding to the second source language words.
Determining, from the sub pointwise mutual information between each triggered item and each corresponding first trigger item, the pointwise mutual information between each triggered item and the target language words corresponding to the second source language words under each lexical cohesion relation may include, but is not limited to: counting the number of first trigger items corresponding to each triggered item under each lexical cohesion relation; multiplying together the sub pointwise mutual information between the triggered item and each of its first trigger items under that lexical cohesion relation; extracting a root of the product; and taking the result as the pointwise mutual information between the triggered item and the target language words corresponding to the second source language words under that lexical cohesion relation. The degree of the root may be the counted number of first trigger items corresponding to the triggered item under that lexical cohesion relation; this embodiment does not specifically limit this. For any triggered item and any lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1 respectively, the number of first trigger items corresponding to triggered item 1 under lexical cohesion relation 1 is counted, the sub pointwise mutual information between triggered item 1 and each of its first trigger items under lexical cohesion relation 1 is multiplied together, a root of the product is extracted, and the result is taken as the pointwise mutual information between triggered item 1 and the target language words corresponding to the second source language words under lexical cohesion relation 1. The degree of the root may be the counted number of first trigger items corresponding to triggered item 1 under lexical cohesion relation 1; this embodiment does not specifically limit this.
For ease of description, take determining the pointwise mutual information between one triggered item and the target language words corresponding to the second source language words under one lexical cohesion relation as an example. Denote the triggered item as triggered item 1 and assume that the lexical cohesion relation is hyponymy. If triggered item 1 has n trigger items in total under the hyponymy relation, the sub pointwise mutual information between triggered item 1 and each of these trigger items is multiplied together, the n-th root of the product is taken, and the result is used as the pointwise mutual information between triggered item 1 and the target language words corresponding to the second source language words under the hyponymy relation.
For example, suppose the preset lexical cohesion relation is hyponymy, and the trigger items corresponding to the triggered item vehicle under the hyponymy relation, among the target language words corresponding to the second source language words, are car and bus. Suppose the sub pointwise mutual information between the triggered item vehicle and the trigger item car is 0.2, and the sub pointwise mutual information between the triggered item vehicle and the trigger item bus is 0.8. The pointwise mutual information between the triggered item vehicle and the target language words corresponding to the second source language words under the hyponymy relation is then (0.8*0.2)^0.5 = 0.4. Specifically, the sub pointwise mutual information between vehicle and bus is multiplied by the sub pointwise mutual information between vehicle and car, and the square root of the product is taken. Correspondingly, when the triggered item vehicle has n trigger items under the hyponymy relation, the sub pointwise mutual information between vehicle and each trigger item is multiplied together and the n-th root of the product is taken, giving the pointwise mutual information between the triggered item vehicle and the target language words corresponding to the second source language words under the hyponymy relation.
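The product-and-root step amounts to a geometric mean, which can be sketched as below. The function name `relation_pmi` is hypothetical, and the sketch assumes a non-negative product, as in the example; the text does not say how a negative product (possible when some sub values are negative) should be handled.

```python
import math


def relation_pmi(sub_pmis):
    """n-th root of the product of the n sub PMI values under one relation."""
    n = len(sub_pmis)
    if n == 0:
        raise ValueError("no trigger items under this relation")
    product = math.prod(sub_pmis)
    if product < 0:
        # Assumption of this sketch: negative products are rejected rather than
        # silently producing a complex number.
        raise ValueError("negative product is not handled in this sketch")
    return product ** (1.0 / n)


# Sub PMI of vehicle with car (0.2) and with bus (0.8), as in the example.
hyponym_pmi = relation_pmi([0.2, 0.8])
print(round(hyponym_pmi, 6))  # 0.4
```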
This embodiment does not specifically limit how the first pointwise mutual information between each triggered item and the target language words corresponding to the second source language words is determined from the pointwise mutual information under each lexical cohesion relation. The determination includes, but is not limited to: summing the pointwise mutual information between each triggered item and the target language words corresponding to the second source language words over all lexical cohesion relations, and taking the sum as the first pointwise mutual information between the triggered item and the target language words corresponding to the second source language words. For any triggered item, denoted triggered item 1, the pointwise mutual information between triggered item 1 and the target language words corresponding to the second source language words under each lexical cohesion relation is summed, and the sum is taken as the first pointwise mutual information between triggered item 1 and the target language words corresponding to the second source language words.
For example, suppose there are two preset lexical cohesion relations, hyponymy and synonymy. If the pointwise mutual information between the triggered item vehicle and the target language words corresponding to the second source language words is 0.4 under the hyponymy relation and 0.6 under the synonymy relation, the first pointwise mutual information between the triggered item vehicle and the target language words corresponding to the second source language words is (0.4+0.6)=1. In addition, a weight may be set in advance for each lexical cohesion relation, and the pointwise mutual information under each lexical cohesion relation may be multiplied by its weight before summation. Of course, other ways of combining the values may be used as the situation requires; this embodiment does not specifically limit this.
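The summation, with and without weights, can be sketched as follows. The function name `first_pmi`, the relation labels, and in particular the weight values are assumptions made for illustration; the text gives no concrete weights.

```python
def first_pmi(per_relation, weights=None):
    """Sum the per-relation PMI values, multiplying each by its weight.

    Relations without an explicit weight default to a weight of 1.0,
    which reproduces the plain, unweighted sum.
    """
    weights = weights or {}
    return sum(weights.get(rel, 1.0) * value for rel, value in per_relation.items())


# Per-relation values for the triggered item vehicle, as in the example.
scores = {"hyponym": 0.4, "synonym": 0.6}

unweighted = first_pmi(scores)
# Hypothetical weights, chosen only to show the mechanism.
weighted = first_pmi(scores, {"hyponym": 0.5, "synonym": 2.0})
print(unweighted)
```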
203: Determine, according to the corpus, the second pointwise mutual information between each candidate target language word and the second source language words;
This embodiment does not specifically limit how the second pointwise mutual information between each candidate target language word and the second source language words is determined from the corpus. The determination includes, but is not limited to: taking each candidate target language word as a triggered item, and determining among the second source language words at least one second trigger item corresponding to each triggered item, a second trigger item being a second source language word that corresponds to the triggered item under a preset lexical cohesion relation; calculating from the corpus the sub pointwise mutual information between each triggered item and each corresponding second trigger item; and determining, from this sub pointwise mutual information, the second pointwise mutual information between each triggered item and the second source language words. For the definitions of the triggered item and the lexical cohesion relation, refer to step 202; they are not repeated here.
Determining, among the second source language words, at least one second trigger item corresponding to each triggered item includes, but is not limited to: looking up in a database the source language words related to each triggered item under the preset lexical cohesion relation; if a source language word found in this way also appears among the second source language words already translated in the current text, taking that word as a source language word that satisfies the condition, and determining it as a second trigger item corresponding to the triggered item. For any triggered item and any preset lexical cohesion relation, denoted triggered item 1 and lexical cohesion relation 1 respectively, the source language words of triggered item 1 under lexical cohesion relation 1 are looked up in the database. If a source language word 1 found in this way also appears among the second source language words already translated in the current text, source language word 1 is taken as a source language word that satisfies the condition and as a second trigger item corresponding to triggered item 1 under lexical cohesion relation 1.
For ease of description, take determining the second trigger items of one triggered item as an example. Denote the triggered item as triggered item 1, and suppose there is a single preset lexical cohesion relation, namely hyponymy. All hyponym source language words of triggered item 1 under the hyponymy relation are looked up in the database; suppose the hyponyms found are source language word 5, source language word 9, and source language word 11. If the second source language words already translated in the current text include source language word 5 and source language word 11, then source language word 5 and source language word 11 are taken as the two second trigger items corresponding to triggered item 1. Of course, other ways of determining the second trigger items of each triggered item may be used as the situation requires; this embodiment does not specifically limit this.
For example, assume that the second source language words already translated in the current text are ...car...orange...car..., where the omitted parts are other translated source language content of the current text. For ease of understanding, take the triggered item vehicle and the preset lexical cohesion relation hyponymy as an example. All hyponym source language words of the triggered item vehicle are looked up in the database, such as car, car, aircraft, and so on; this embodiment does not specifically limit this. It can then be determined that the second source language words already translated in the current text contain two hyponym source language words of the triggered item vehicle, namely "car" and "car". Therefore, when the lexical cohesion relation is hyponymy, the triggered item vehicle has two second trigger items among the translated second source language words of the current text: "car" and "car".
In addition to the above method, determining at least one second trigger item for each triggered item among the second source language words may also include, but is not limited to, the following. Each translated second source language word in the current text is paired with each triggered item to form a source language word pair. The database is searched for the source language word pairs of each triggered item under the preset lexical cohesion relation. If a word pair found in the database is also among the pairs formed from the translated second source language words of the current text and the triggered items, that pair is treated as a qualifying source language word pair, and the source language word in each qualifying pair is determined to be a second trigger item corresponding to the triggered item.
For any triggered item and any preset lexical cohesion relation, denote them as triggered item 1 and lexical cohesion relation 1, respectively. Each translated second source language word in the current text is paired with triggered item 1; suppose the pairs formed are the three pairs triggered item 1-source language word 6, triggered item 1-source language word 7, and triggered item 1-source language word 8. The database is searched for the source language word pairs of triggered item 1 under lexical cohesion relation 1. If the pair found is triggered item 1-source language word 8, then, because this pair is also among the pairs formed from the translated second source language words of the current text and triggered item 1, it is treated as a qualifying source language word pair, and source language word 8 in that pair is determined to be the second trigger item corresponding to triggered item 1 under lexical cohesion relation 1.
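The pair-based variant can be sketched in the same spirit. The representation of the database as a set of (triggered item, source word) tuples per relation is an assumption, as are the function name and the extra pair with `source_word_30`, which is invented purely to show that pairs absent from the current text are ignored.

```python
def trigger_items_from_pairs(triggered_item, relation, translated_source_words, pair_db):
    """Return source words whose pair with triggered_item is stored for the relation."""
    # Pair the triggered item with every translated second source word.
    candidate_pairs = [(triggered_item, word) for word in translated_source_words]
    # Pairs the (hypothetical) database stores for this relation.
    stored_pairs = pair_db.get(relation, set())
    # A pair qualifies when it appears in both collections.
    return [word for (item, word) in candidate_pairs if (item, word) in stored_pairs]


PAIR_DB = {
    "relation_1": {
        ("triggered_item_1", "source_word_8"),
        ("triggered_item_1", "source_word_30"),  # not in the current text
    },
}
translated = ["source_word_6", "source_word_7", "source_word_8"]
qualifying = trigger_items_from_pairs(
    "triggered_item_1", "relation_1", translated, PAIR_DB
)
print(qualifying)  # ['source_word_8']
```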
It should be noted that the following situation may occur: under a certain lexical cohesion relation, a triggered item has no corresponding second trigger item among the second source language words. In that case, the above method can be used to continue determining the second trigger items corresponding to that triggered item among the second source language words under the other types of lexical cohesion relation. If the triggered item has no corresponding second trigger item among the second source language words under any lexical cohesion relation, the above method is then used to determine the second trigger items corresponding to the other triggered items among the second source language words under each lexical cohesion relation.
This embodiment does not specifically limit how the sub pointwise mutual information (sub-PMI) between each triggered item and each corresponding second trigger item is calculated from the corpus. The calculation includes, but is not limited to: calculating, from the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation; calculating, from the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of each corresponding second trigger item under that relation; and calculating the sub-PMI between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability, and the fourth marginal probability.
Calculating, from the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation may include, but is not limited to: counting, among the texts of the corpus, a fifth quantity of texts in which the triggered item and the corresponding second trigger item both appear and satisfy the corresponding preset lexical cohesion relation; counting, among the texts of the corpus, a sixth quantity of texts having the preset lexical cohesion relation that corresponds to the triggered item and the second trigger item; and calculating the second joint probability from the fifth quantity and the sixth quantity. Of course, other calculation methods may be used depending on the actual situation; this embodiment does not specifically limit this.
This embodiment does not specifically limit how the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation is calculated from the fifth quantity and the sixth quantity. The calculation includes, but is not limited to: taking the quotient of the fifth quantity divided by the sixth quantity as the second joint probability.
For ease of understanding, the process of calculating the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation is explained below, taking the triggered item "vehicle", the trigger item "car", and the hyponymy relation as the preset lexical cohesion relation:
Assume the corpus stores bilingual Chinese-English texts, and that the sixth quantity, the number of texts in the corpus having the hyponymy relation, is 5. In the first text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation, that is, the trigger item "car" is a hyponym of the triggered item "vehicle". In the second text, only the triggered item "vehicle" appears. In the third text, only the trigger item "car" appears. In the fourth text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation. In the fifth text, only the triggered item "vehicle" appears. Counting therefore gives a fifth quantity of 2: the number of texts in the corpus in which the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation.
From the fifth quantity and the sixth quantity, the second joint probability of the triggered item "vehicle" and the trigger item "car" under the hyponymy relation is calculated as 2/5. Of course, other methods may be used to calculate the second joint probability from the fifth quantity and the sixth quantity, and other methods may be used to calculate the second joint probability from the corpus; this embodiment does not specifically limit either.
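The counting in this example can be reproduced with a short sketch. It assumes each text is reduced to the set of relevant words it contains and that all five example texts belong to the pool counted for the hyponymy relation (the sixth quantity); the names `relation_texts` and `second_joint_probability` are hypothetical.

```python
from fractions import Fraction

# The five example texts, each reduced to the relevant words it contains.
# Assumption: all five are counted for the hyponymy relation (sixth quantity = 5).
relation_texts = [
    {"vehicle", "car"},  # text 1: both appear and satisfy the relation
    {"vehicle"},         # text 2: only the triggered item
    {"car"},             # text 3: only the trigger item
    {"vehicle", "car"},  # text 4: both appear and satisfy the relation
    {"vehicle"},         # text 5: only the triggered item
]


def second_joint_probability(triggered_item, trigger_item, texts):
    """Fifth quantity (texts containing both words) divided by sixth quantity."""
    fifth_quantity = sum(
        1 for text in texts if triggered_item in text and trigger_item in text
    )
    sixth_quantity = len(texts)
    return Fraction(fifth_quantity, sixth_quantity)


joint = second_joint_probability("vehicle", "car", relation_texts)
print(joint)  # 2/5
```

`Fraction` is used only so that the result matches the 2/5 in the text exactly; floating-point division would serve equally well in practice.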
Calculating, from the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of each corresponding second trigger item under that relation may include, but is not limited to: counting, among the texts of the corpus, a seventh quantity of texts in which the triggered item appears; counting, among the texts of the corpus, an eighth quantity of texts in which the corresponding second trigger item appears; counting, among the texts of the corpus, a sixth quantity of texts having the preset lexical cohesion relation that corresponds to the triggered item and the second trigger item; calculating the third marginal probability of the triggered item under the corresponding preset lexical cohesion relation from the seventh quantity and the sixth quantity; and calculating the fourth marginal probability of the corresponding second trigger item under that relation from the eighth quantity and the sixth quantity. Of course, other calculation methods may be used depending on the actual situation; this embodiment does not specifically limit this.
This embodiment does not specifically limit how the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation is calculated from the seventh quantity and the sixth quantity, nor how the fourth marginal probability of each corresponding second trigger item under that relation is calculated from the eighth quantity and the sixth quantity. The calculations include, but are not limited to: taking the quotient of the seventh quantity divided by the sixth quantity as the third marginal probability of the triggered item under the corresponding preset lexical cohesion relation, and taking the quotient of the eighth quantity divided by the sixth quantity as the fourth marginal probability of the corresponding second trigger item under that relation.
For ease of understanding, the process of calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the fourth marginal probability of each corresponding second trigger item under that relation is explained below, again taking the triggered item "vehicle", the trigger item "car", and the hyponymy relation as the preset lexical cohesion relation:
Assume the sixth quantity, the number of texts in the corpus having the hyponymy relation, is 5. In the first text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation, that is, the trigger item "car" is a hyponym of the triggered item "vehicle". In the second text, only the triggered item "vehicle" appears. In the third text, only the trigger item "car" appears. In the fourth text, the triggered item "vehicle" and the trigger item "car" both appear and satisfy the hyponymy relation. In the fifth text, only the triggered item "vehicle" appears.
Counting among the texts of the corpus then gives a seventh quantity of 4, the number of texts in which the triggered item "vehicle" appears. Similarly, counting gives an eighth quantity of 3, the number of texts in which the second trigger item "car" corresponding to the triggered item "vehicle" appears.
From the seventh quantity and the sixth quantity, the third marginal probability of the triggered item "vehicle" under the hyponymy relation is calculated as 4/5. From the eighth quantity and the sixth quantity, the fourth marginal probability of the trigger item "car" corresponding to the triggered item "vehicle" under the hyponymy relation is calculated as 3/5. Of course, other methods may be used, depending on the actual situation, to calculate the third marginal probability from the seventh quantity and the sixth quantity, or to calculate the fourth marginal probability from the eighth quantity and the sixth quantity; this embodiment does not specifically limit either.
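The two marginal probabilities of this example can be checked with a sketch like the one below. As before, reducing each text to a set of relevant words and treating all five texts as the pool for the sixth quantity are assumptions, and `marginal_probability` is a hypothetical helper name.

```python
from fractions import Fraction

# The five example texts (assumed pool for the hyponymy relation, sixth quantity = 5).
relation_texts = [
    {"vehicle", "car"},
    {"vehicle"},
    {"car"},
    {"vehicle", "car"},
    {"vehicle"},
]


def marginal_probability(word, texts):
    """Number of texts containing the word, divided by the sixth quantity."""
    occurrences = sum(1 for text in texts if word in text)
    return Fraction(occurrences, len(texts))


# Seventh quantity / sixth quantity: triggered item "vehicle".
third_marginal = marginal_probability("vehicle", relation_texts)
# Eighth quantity / sixth quantity: trigger item "car".
fourth_marginal = marginal_probability("car", relation_texts)
print(third_marginal, fourth_marginal)  # 4/5 3/5
```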
This embodiment does not specifically limit how the sub pointwise mutual information (sub-PMI) between each triggered item and each corresponding second trigger item is calculated from the second joint probability, the third marginal probability, and the fourth marginal probability under the corresponding preset lexical cohesion relation. The calculation includes, but is not limited to: multiplying the third marginal probability by the fourth marginal probability, dividing the second joint probability by that product, taking the logarithm of the quotient, and using the result as the sub-PMI between the triggered item and the corresponding second trigger item. This method can be expressed by the following formula:
$$\mathrm{PMI}(xRy) = \log \frac{p(x, y, R)}{p(x, R)\, p(y, R)}$$
where p(x, y, R) denotes the second joint probability, x denotes the trigger item, y denotes the triggered item, R denotes the preset lexical cohesion relation, p(x, R) denotes the third marginal probability, p(y, R) denotes the fourth marginal probability, and PMI(xRy) denotes the sub pointwise mutual information between the triggered item y and the trigger item x.
For example, using the results of the examples in the steps above, p(x, y, R) = 2/5, p(x, R) = 4/5, and p(y, R) = 3/5, so PMI(xRy) = log((2/5) / ((4/5) × (3/5))) = log(5/6). It should be noted that the base of the logarithm may be 2 or any other value as needed; this embodiment does not specifically limit this.
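The arithmetic of this example can be checked with a small sketch. The function name `sub_pmi`, the default base of 2, and the guard against non-positive probabilities are assumptions added for illustration; the text only fixes the quotient-then-logarithm structure.

```python
import math


def sub_pmi(joint, marginal_a, marginal_b, base=2.0):
    """log_base( joint / (marginal_a * marginal_b) )."""
    if joint <= 0 or marginal_a <= 0 or marginal_b <= 0:
        # The logarithm is undefined here; how to handle this case is not
        # specified in the text, so this sketch simply refuses.
        raise ValueError("probabilities must be positive")
    return math.log(joint / (marginal_a * marginal_b), base)


# Values from the example: joint 2/5, marginals 4/5 and 3/5.
value = sub_pmi(2 / 5, 4 / 5, 3 / 5)
print(value)  # log2(5/6), a small negative number
```

Because 5/6 is less than 1, the result is negative for any base greater than 1, meaning the two words co-occur slightly less often in the example corpus than independence would predict.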
After the sub pointwise mutual information (sub-PMI) between each triggered item and each corresponding second trigger item has been calculated, the second pointwise mutual information between each triggered item and the second source language words can be determined from it. This embodiment does not specifically limit how the second pointwise mutual information is determined from the sub-PMI values. The determination includes, but is not limited to: determining, from the sub-PMI between each triggered item and each corresponding second trigger item, the pointwise mutual information between each triggered item and the second source language words under each lexical cohesion relation; and determining, from these per-relation values, the second pointwise mutual information between each triggered item and the second source language words.
Determining, from the sub pointwise mutual information (sub-PMI) between each triggered item and each corresponding second trigger item, the pointwise mutual information between each triggered item and the second source language words under each lexical cohesion relation may include, but is not limited to: counting the number of second trigger items corresponding to the triggered item under the lexical cohesion relation; multiplying together the sub-PMI values between the triggered item and each of its corresponding second trigger items under that relation; taking a root of the product; and using the result as the pointwise mutual information between the triggered item and the second source language words under that relation. The degree of the root may be the counted number of second trigger items corresponding to the triggered item under that relation; this embodiment does not specifically limit this.
For any triggered item and any lexical cohesion relation, denote them as triggered item 1 and lexical cohesion relation 1, respectively. The number of second trigger items corresponding to triggered item 1 under lexical cohesion relation 1 is counted, the sub-PMI values between triggered item 1 and each of its corresponding second trigger items under lexical cohesion relation 1 are multiplied together, a root of the product is taken, and the result is used as the pointwise mutual information between triggered item 1 and the second source language words under lexical cohesion relation 1. The degree of the root may be the counted number of second trigger items corresponding to triggered item 1 under lexical cohesion relation 1; this embodiment does not specifically limit this.
For ease of illustration, take determining the pointwise mutual information between one triggered item and the second source language words under one lexical cohesion relation as an example. Denote the triggered item as triggered item 1 and assume the lexical cohesion relation is the hyponymy relation. If the total number of trigger items of triggered item 1 under the hyponymy relation is n, the sub pointwise mutual information values between triggered item 1 and all of its trigger items under the hyponymy relation are multiplied together, the n-th root of the product is taken, and the result is used as the pointwise mutual information between triggered item 1 and the second source language words under the hyponymy relation.
For example, suppose the preset lexical cohesion relation is the hyponymy relation, and the trigger items corresponding to the triggered item "vehicle" under the hyponymy relation are "car" and "car", where the sub pointwise mutual information (sub-PMI) between the triggered item "vehicle" and one trigger item "car" is 0.2 and the sub-PMI between the triggered item "vehicle" and the other trigger item "car" is 0.8. The pointwise mutual information between the triggered item "vehicle" and the second source language words under the hyponymy relation is then (0.8*0.2)^0.5 = 0.4. Specifically, the two sub-PMI values are multiplied together and the square root of the product is taken. Accordingly, when the triggered item "vehicle" has n trigger items under the hyponymy relation, the sub-PMI values between the triggered item "vehicle" and each trigger item are multiplied together and the n-th root of the product is taken, which gives the pointwise mutual information between the triggered item "vehicle" and the second source language words under the hyponymy relation, where n is a positive integer greater than or equal to 1.
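The product-then-root step is a geometric mean and can be sketched as below. The function name `relation_pmi` is hypothetical, and the two guards (empty input, negative product) are assumptions: the text does not say how to treat a triggered item with no trigger items at this step, nor a negative product, for which an even root has no real value.

```python
import math


def relation_pmi(sub_pmis):
    """n-th root of the product of n sub-PMI values (their geometric mean)."""
    n = len(sub_pmis)
    if n == 0:
        raise ValueError("no trigger items under this relation")
    product = math.prod(sub_pmis)
    if product < 0:
        # Assumption: refuse rather than guess how a negative product is handled.
        raise ValueError("negative product has no real n-th root in general")
    return product ** (1.0 / n)


# Sub-PMI values from the example: 0.2 and 0.8.
result = relation_pmi([0.2, 0.8])
print(round(result, 6))  # 0.4
```

With a single trigger item (n = 1) the function returns that item's sub-PMI unchanged, which matches the statement that n may be 1.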
This embodiment does not specifically limit how the second pointwise mutual information between each triggered item and the second source language words is determined from the pointwise mutual information between the triggered item and the second source language words under each lexical cohesion relation. The determination includes, but is not limited to: adding together the pointwise mutual information between the triggered item and the second source language words under each lexical cohesion relation, and using the sum as the second pointwise mutual information between the triggered item and the second source language words. For any triggered item, denote it as triggered item 1; the pointwise mutual information between triggered item 1 and the second source language words under each lexical cohesion relation is added together, and the sum is used as the second pointwise mutual information between triggered item 1 and the second source language words.
For example, suppose there are two preset lexical cohesion relations, the hyponymy relation and the synonymy relation, and that the pointwise mutual information between the triggered item "vehicle" and the second source language words is 0.4 under the hyponymy relation and 0.6 under the synonymy relation. The second pointwise mutual information between the triggered item "vehicle" and the second source language words is then (0.4+0.6) = 1. In addition, a weight may be set in advance for each lexical cohesion relation, in which case the pointwise mutual information between each triggered item and the second source language words under each lexical cohesion relation is multiplied by the corresponding weight before the values are added together. Of course, other ways of combining the values may be used depending on the actual situation; this embodiment does not specifically limit this.
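The superposition across relations, with and without per-relation weights, can be sketched as follows. The function name `second_pmi`, the dictionary representation, and the weight values 0.5 and 2.0 are illustrative assumptions; the text gives no concrete relation weights.

```python
import math


def second_pmi(per_relation_pmi, weights=None):
    """Sum the per-relation PMI values, optionally scaling each by a relation weight."""
    if weights is None:
        weights = {}
    # A relation without an explicit weight is counted with weight 1.
    return sum(
        weights.get(relation, 1.0) * value
        for relation, value in per_relation_pmi.items()
    )


# Per-relation values from the example.
per_relation = {"hyponymy": 0.4, "synonymy": 0.6}
unweighted = second_pmi(per_relation)
# Hypothetical weights, chosen only to show the weighted variant.
weighted = second_pmi(per_relation, {"hyponymy": 0.5, "synonymy": 2.0})
print(math.isclose(unweighted, 1.0))  # True
```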
204: Determine the translation result of the first source language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target language word.
This embodiment does not specifically limit how the translation result of the first source language word is determined from the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target language word. The determination includes, but is not limited to: setting a weight for the first pointwise mutual information and a weight for the second pointwise mutual information; multiplying the first and second pointwise mutual information corresponding to each candidate target language word by their respective weights and adding the results together; using the sum as a metric value; comparing the metric values calculated in this way for all candidate target language words; and taking the candidate target language word with the largest metric value as the translation result of the first source language word.
For example, suppose the first source language word to be translated is "vehicles", its candidate target language words are "vehicle" and "transportation", the weight of the first pointwise mutual information is 0.4, and the weight of the second pointwise mutual information is 0.6. For the candidate target language word "vehicle", the first pointwise mutual information is 1 and the second pointwise mutual information is 2, so its metric value is (1*0.4+2*0.6) = 1.6. Similarly, for the candidate target language word "transportation", the first pointwise mutual information is 0.8 and the second pointwise mutual information is 3, so its metric value is (0.8*0.4+3*0.6) = 2.12. Comparing the two metric values shows that the metric value of "transportation" is larger; therefore, "transportation" can be taken as the translation result of the first source language word "vehicles" to be translated.
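The selection step in this example can be sketched as below. The function name `pick_translation` and the representation of each candidate as a (first PMI, second PMI) tuple are assumptions; the weights 0.4 and 0.6 and the PMI values are taken from the example.

```python
import math


def pick_translation(candidates, w_first=0.4, w_second=0.6):
    """Score each candidate by a weighted sum of its two PMI values; return the best."""
    metrics = {
        word: w_first * first_pmi + w_second * second_pmi
        for word, (first_pmi, second_pmi) in candidates.items()
    }
    best = max(metrics, key=metrics.get)
    return best, metrics


# Candidate target words with (first PMI, second PMI) from the example.
candidates = {"vehicle": (1.0, 2.0), "transportation": (0.8, 3.0)}
best, metrics = pick_translation(candidates)
print(best)  # transportation
```

If two candidates tie, `max` returns the first one encountered; the text does not specify a tie-breaking rule.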
In the method provided by this embodiment, at least one candidate target language word corresponding to the first source language word to be translated in the current text is determined; then, according to the corpus, the first pointwise mutual information between each candidate target language word and the target language words corresponding to each translated second source language word in the current text is determined, together with the second pointwise mutual information between each candidate target language word and the second source language words; finally, the translation result of the first source language word is determined from the first and second pointwise mutual information corresponding to each candidate target language word. Because the source language word to be translated is translated using both the pointwise mutual information within the target language side and the pointwise mutual information between the source language side and the target language side, the quality of translation from the source language into the target language is higher.
Embodiment three
An embodiment of the present invention provides a machine translation device, which is configured to perform the functions performed by the terminal in the method provided in Embodiment one or Embodiment two above. Referring to Fig. 3, the device includes:
an obtaining module 301, configured to obtain the first source language word to be translated in the current text;
a first determining module 302, configured to determine at least one candidate target language word corresponding to the first source language word;
a second determining module 303, configured to determine, according to the corpus, the first pointwise mutual information between each candidate target language word and the target language words corresponding to each translated second source language word in the current text;
a third determining module 304, configured to determine, according to the corpus, the second pointwise mutual information between each candidate target language word and the second source language words;
a fourth determining module 305, configured to determine the translation result of the first source language word according to the first pointwise mutual information and the second pointwise mutual information corresponding to each candidate target language word.
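How the five modules might be wired together can be sketched as follows. This is only an illustration of the data flow: the class name, the use of plain callables for modules 301 to 304, the equal default weights, and the stub lookups in the usage part are all assumptions, and the weighted sum in module 305 is just one of the non-limiting options the description mentions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MachineTranslationDevice:
    obtain_word: Callable[[str], str]            # obtaining module 301
    candidate_words: Callable[[str], List[str]]  # first determining module 302
    first_pmi: Callable[[str], float]            # second determining module 303
    second_pmi: Callable[[str], float]           # third determining module 304
    w_first: float = 0.5                         # hypothetical default weights
    w_second: float = 0.5

    def translate_next(self, current_text: str) -> str:
        word = self.obtain_word(current_text)
        options = self.candidate_words(word)
        # Fourth determining module 305: pick the candidate with the largest metric.
        return max(
            options,
            key=lambda c: self.w_first * self.first_pmi(c)
            + self.w_second * self.second_pmi(c),
        )


# Stub modules with made-up lookups, only to exercise the wiring.
first_values = {"vehicle": 1.0, "transportation": 0.8}
second_values = {"vehicle": 2.0, "transportation": 3.0}
device = MachineTranslationDevice(
    obtain_word=lambda text: text.split()[-1],
    candidate_words=lambda word: ["vehicle", "transportation"],
    first_pmi=first_values.get,
    second_pmi=second_values.get,
)
chosen = device.translate_next("... vehicles")
print(chosen)  # transportation
```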
As a preferred embodiment, referring to Fig. 4, the second determining module 303 includes:
a first determination unit 3031, configured to take each candidate target language word as a triggered item and to determine, among the target language words corresponding to the second source language words, at least one first trigger item corresponding to each triggered item, where a first trigger item is a target language word that corresponds to the triggered item under a preset lexical cohesion relation;
a first calculation unit 3032, configured to calculate, according to the corpus, the sub pointwise mutual information between each triggered item and each corresponding first trigger item;
a second determination unit 3033, configured to determine, from the sub pointwise mutual information between each triggered item and each corresponding first trigger item, the first pointwise mutual information between each triggered item and the target language words corresponding to the second source language words.
As a preferred embodiment, referring to Fig. 5, the third determining module 304 includes:
a third determination unit 3041, configured to take each candidate target language word as a triggered item and to determine, among the second source language words, at least one second trigger item corresponding to each triggered item, where a second trigger item is a second source language word that corresponds to the triggered item under a preset lexical cohesion relation;
a second calculation unit 3042, configured to calculate, according to the corpus, the sub pointwise mutual information between each triggered item and each corresponding second trigger item;
a fourth determination unit 3043, configured to determine, from the sub pointwise mutual information between each triggered item and each corresponding second trigger item, the second pointwise mutual information between each triggered item and the second source language words.
As a preferred embodiment, referring to Fig. 6, the first calculation unit 3032 includes:
a first calculation subunit 30321, configured to calculate, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
a second calculation subunit 30322, configured to calculate, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation and the second marginal probability of each corresponding first trigger item under that relation;
a third calculation subunit 30323, configured to calculate the sub pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability, and the second marginal probability under the corresponding preset lexical cohesion relation.
As a kind of preferred embodiment, the first computation subunit 30321, for counting while going out in the text of corpus Now each it is triggered the first of item and corresponding each first triggering item and the text that meets corresponding default Lexical Cohesion relationship Quantity;There is statistics the item that is each triggered to trigger the corresponding default vocabulary of item with corresponding each first in the text of corpus Second quantity of the text of joining relation;The item and corresponding each that is each triggered is calculated according to the first quantity and the second quantity First joint probability of the one triggering item under corresponding default Lexical Cohesion relationship.
As a kind of preferred embodiment, the second computation subunit 30322 occurs often for counting in the text of corpus The third quantity of the text of a item that is triggered;There is corresponding first triggering of item that is each triggered in statistics in the text of corpus 4th quantity of the text of item;There is statistics the item that is each triggered to trigger item with corresponding each first in the text of corpus Second quantity of the text of corresponding default Lexical Cohesion relationship;The item that is each triggered is calculated according to third quantity and the second quantity First edge probability under corresponding default Lexical Cohesion relationship, and each touched is calculated according to the 4th quantity and the second quantity Send out second edge probability of the corresponding first triggering item of item under corresponding default Lexical Cohesion relationship.
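A minimal sketch of how the four quantities could yield a sub-pointwise mutual information value. This passage only states that each probability is calculated "from" two quantities; treating each probability as a simple ratio with the second quantity as denominator, and combining them with the standard PMI formula log(p(x,y) / (p(x)·p(y))), are assumptions of this sketch. The function name `sub_pmi` and the handling of zero counts are likewise illustrative.

```python
import math


def sub_pmi(first_qty: int, second_qty: int,
            third_qty: int, fourth_qty: int) -> float:
    """Sub-PMI of one (triggered item, first trigger item) pair.

    first_qty  : texts where both items co-occur and satisfy the relation
    second_qty : texts exhibiting the relation (assumed denominator)
    third_qty  : texts containing the triggered item
    fourth_qty : texts containing the first trigger item
    """
    if min(first_qty, second_qty, third_qty, fourth_qty) <= 0:
        # Assumption: an unseen pair gets the lowest possible score.
        return float("-inf")
    joint = first_qty / second_qty        # first joint probability
    marginal_x = third_qty / second_qty   # first marginal probability
    marginal_y = fourth_qty / second_qty  # second marginal probability
    return math.log(joint / (marginal_x * marginal_y))


value = sub_pmi(first_qty=20, second_qty=1000, third_qty=40, fourth_qty=50)
```

With these toy counts the joint probability is 0.02 and the marginals are 0.04 and 0.05, so the ratio inside the logarithm is 10.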
In a preferred embodiment, referring to Fig. 7, the second computing unit 3042 includes:
Fourth computation subunit 30421, configured to calculate, from the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Fifth computation subunit 30422, configured to calculate, from the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Sixth computation subunit 30423, configured to calculate the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability and the fourth marginal probability under the corresponding preset lexical cohesion relation.
In a preferred embodiment, the fourth computation subunit 30421 is configured to: count, among the texts of the corpus, a fifth quantity of texts in which a triggered item and a corresponding second trigger item appear together and satisfy the corresponding preset lexical cohesion relation; count, among the texts of the corpus, a sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; and calculate, from the fifth quantity and the sixth quantity, the second joint probability of the triggered item and the second trigger item under the corresponding preset lexical cohesion relation.
In a preferred embodiment, the fifth computation subunit 30422 is configured to: count, among the texts of the corpus, a seventh quantity of texts in which the triggered item appears; count an eighth quantity of texts in which the second trigger item corresponding to the triggered item appears; count the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the second trigger item; calculate, from the seventh quantity and the sixth quantity, the third marginal probability of the triggered item under the corresponding preset lexical cohesion relation; and calculate, from the eighth quantity and the sixth quantity, the fourth marginal probability of the second trigger item under the corresponding preset lexical cohesion relation.
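The counting described for the fourth and fifth computation subunits can be sketched over a toy corpus. Several things here are assumptions: each corpus text is represented as a pair of word sets (source-language words, target-language words), the relation is a hypothetical predicate `relation(target_word, source_word)`, and the sixth quantity is interpreted as the number of texts in which at least one word pair satisfies the relation. The passage does not spell out any of these details.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical layout of one corpus text: (source words, target words).
Text = Tuple[Set[str], Set[str]]


def count_quantities(corpus: List[Text], triggered: str, trigger: str,
                     relation: Callable[[str, str], bool]
                     ) -> Tuple[int, int, int, int]:
    """Return (fifth, sixth, seventh, eighth) quantities for one
    (triggered item, second trigger item) pair."""
    fifth = sixth = seventh = eighth = 0
    for src_words, tgt_words in corpus:
        has_triggered = triggered in tgt_words  # target-side candidate
        has_trigger = trigger in src_words      # source-side trigger
        if has_triggered:
            seventh += 1
        if has_trigger:
            eighth += 1
        if has_triggered and has_trigger and relation(triggered, trigger):
            fifth += 1
        # Assumed reading of "texts that exhibit the relation".
        if any(relation(t, s) for t in tgt_words for s in src_words):
            sixth += 1
    return fifth, sixth, seventh, eighth


# Toy data; romanised source words keep the example ASCII-only.
RELATED = {("bank", "yinhang"), ("loan", "daikuan")}
toy_corpus = [
    ({"yinhang", "daikuan"}, {"bank", "loan"}),
    ({"yinhang"}, {"bank"}),
    ({"he"}, {"bank"}),
    ({"daikuan"}, {"loan"}),
]
counts = count_quantities(
    toy_corpus, "bank", "yinhang", lambda t, s: (t, s) in RELATED)
```

The same loop would serve for the first through fourth quantities if both word sets were taken from the target-language side.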
The device provided in this embodiment determines at least one candidate target-language word for the first source-language word to be translated in the current text. It then determines, from the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate target-language word and the second source-language words. Finally, it determines the translation result of the first source-language word from the first and second pointwise mutual information of each candidate. Because the word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side, the quality of translation from the source language into the target language is higher.
Embodiment Four
This embodiment provides a terminal that can be used to perform the machine translation method provided in the above embodiments. Referring to Fig. 8:
Terminal 800 may include components such as an RF (Radio Frequency) circuit 110, a memory 120 comprising one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 comprising one or more processing cores, and a power supply 190. Those skilled in the art will understand that the terminal structure shown in Fig. 8 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Specifically:
RF circuit 110 may be used to receive and send signals while information is being transmitted or received or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; it also sends uplink data to the base station. RF circuit 110 typically includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. RF circuit 110 may also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
Memory 120 may be used to store software programs and modules; processor 180 executes various functional applications and data processing by running the software programs and modules stored in memory 120. Memory 120 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of terminal 800 (such as audio data and a phone book). In addition, memory 120 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, memory 120 may also include a memory controller to provide processor 180 and input unit 130 with access to memory 120.
Input unit 130 may be used to receive input digits or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, input unit 130 may include a touch-sensitive surface 131 and other input devices 132. Touch-sensitive surface 131, also called a touch display screen or touchpad, collects touch operations by the user on or near it (for example, operations performed on or near touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. Optionally, touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to processor 180, and can receive and execute commands sent by processor 180. Touch-sensitive surface 131 may be implemented in several types, such as resistive, capacitive, infrared and surface acoustic wave. In addition to touch-sensitive surface 131, input unit 130 may also include other input devices 132, which may include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
Display unit 140 may be used to display information entered by the user, information provided to the user, and the various graphical user interfaces of terminal 800; these graphical user interfaces may be composed of graphics, text, icons, video and any combination thereof. Display unit 140 may include a display panel 141, which may optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, touch-sensitive surface 131 may cover display panel 141. When touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to processor 180 to determine the type of touch event, and processor 180 then provides the corresponding visual output on display panel 141 according to the type of touch event. Although in Fig. 8 touch-sensitive surface 131 and display panel 141 are shown as two separate components implementing the input and output functions, in some embodiments touch-sensitive surface 131 and display panel 141 may be integrated to implement the input and output functions.
Terminal 800 may also include at least one sensor 150, such as an optical sensor, a motion sensor, or other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of display panel 141 according to the ambient light level, and the proximity sensor can turn off display panel 141 and/or the backlight when terminal 800 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the phone's posture (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that terminal 800 may also be configured with, such as a gyroscope, barometer, hygrometer, thermometer and infrared sensor, are not described further here.
Audio circuit 160, loudspeaker 161 and microphone 162 can provide an audio interface between the user and terminal 800. Audio circuit 160 can convert received audio data into an electrical signal and transmit it to loudspeaker 161, which converts it into a sound signal for output. Conversely, microphone 162 converts a collected sound signal into an electrical signal, which audio circuit 160 receives and converts into audio data; the audio data is then output to processor 180 for processing and afterwards sent through RF circuit 110 to, for example, another terminal, or output to memory 120 for further processing. Audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and terminal 800.
WiFi is a short-range wireless transmission technology. Through WiFi module 170, terminal 800 can help the user send and receive e-mail, browse web pages, access streaming media and so on; it provides the user with wireless broadband Internet access. Although Fig. 8 shows WiFi module 170, it is understood that the module is not an essential part of terminal 800 and may be omitted as needed without changing the essence of the invention.
Processor 180 is the control center of terminal 800. It connects the various parts of the whole phone through various interfaces and lines, and performs the functions of terminal 800 and processes data by running or executing the software programs and/or modules stored in memory 120 and calling the data stored in memory 120, thereby monitoring the phone as a whole. Optionally, processor 180 may include one or more processing cores. Preferably, processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into processor 180.
Terminal 800 further includes a power supply 190 (such as a battery) that powers the components. Preferably, the power supply is logically connected to processor 180 through a power management system, so that charging, discharging and power consumption are managed through the power management system. Power supply 190 may also include any of one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and similar components.
Although not shown, terminal 800 may also include a camera, a Bluetooth module and the like, which are not described further here. Specifically, in this embodiment the display unit of the terminal is a touch-screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:
Obtain the first source-language word to be translated in the current text, and determine at least one candidate target-language word corresponding to the first source-language word;
Determine, from the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determine, from the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words;
Determine the translation result of the first source-language word from the first pointwise mutual information and the second pointwise mutual information of each candidate target-language word.
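The final selection step can be illustrated with a short sketch. The passage only says that the translation result is determined from the first and second pointwise (point-by-point) mutual information of each candidate; the weighted sum and the `weight` parameter used here are assumptions chosen for illustration, as is the function name `choose_translation`.

```python
from typing import Dict, Iterable


def choose_translation(candidates: Iterable[str],
                       first_pmi: Dict[str, float],
                       second_pmi: Dict[str, float],
                       weight: float = 0.5) -> str:
    """Pick the candidate with the highest combined score.

    first_pmi  : target-side PMI of each candidate with the context
    second_pmi : source-to-target PMI of each candidate with the context
    weight     : hypothetical interpolation weight (not from the patent)
    """
    def score(candidate: str) -> float:
        return (weight * first_pmi[candidate]
                + (1.0 - weight) * second_pmi[candidate])

    return max(candidates, key=score)


best = choose_translation(
    ["bank", "shore"],
    first_pmi={"bank": 1.2, "shore": -0.3},
    second_pmi={"bank": 0.8, "shore": 0.1},
)
```

In a full system these two values would more plausibly enter a larger scoring model alongside other features; the sketch isolates only the contribution of the two PMI values.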
Taking the above as the first possible implementation, in a second possible implementation provided on the basis of the first, the memory of the terminal further contains instructions for performing the following operations:
Take each candidate target-language word as a triggered item, and determine, among the target-language words corresponding to the second source-language words, at least one first trigger item for each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculate, from the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determine from it the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
In a third possible implementation provided on the basis of the first possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Take each candidate target-language word as a triggered item, and determine, among the second source-language words, at least one second trigger item for each triggered item, where a second trigger item is a second source-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculate, from the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, and determine from it the second pointwise mutual information between each triggered item and the second source-language words.
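Both the second and the third possible implementations reduce the per-trigger sub-values to a single pointwise (point-by-point) mutual information value for each triggered item. This passage does not say how the reduction is done, so the plain sum below is an assumption, and `aggregate_sub_pmi` is a hypothetical helper name.

```python
import math
from typing import Iterable


def aggregate_sub_pmi(sub_values: Iterable[float]) -> float:
    """Combine the sub-PMI values of one triggered item into one score.

    Assumption: the values are simply summed; an empty list (no trigger
    item found in the context) yields 0.0, i.e. no evidence either way.
    """
    return math.fsum(sub_values)


# One triggered item with three trigger items found in the context.
combined = aggregate_sub_pmi([0.5, 1.0, -0.25])
```

Other reductions, such as a mean or a maximum, would fit the wording equally well.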
In a fourth possible implementation provided on the basis of the second possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Calculate, from the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculate, from the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculate the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability and the second marginal probability under the corresponding preset lexical cohesion relation.
In a fifth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Count, among the texts of the corpus, a first quantity of texts in which a triggered item and a corresponding first trigger item appear together and satisfy the corresponding preset lexical cohesion relation;
Count, among the texts of the corpus, a second quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculate, from the first quantity and the second quantity, the first joint probability of the triggered item and the first trigger item under the corresponding preset lexical cohesion relation.
In a sixth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Count, among the texts of the corpus, a third quantity of texts in which each triggered item appears;
Count, among the texts of the corpus, a fourth quantity of texts in which the first trigger item corresponding to each triggered item appears;
Count, among the texts of the corpus, the second quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the first trigger item;
Calculate, from the third quantity and the second quantity, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and calculate, from the fourth quantity and the second quantity, the second marginal probability of the corresponding first trigger item under the corresponding preset lexical cohesion relation.
In a seventh possible implementation provided on the basis of the third possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Calculate, from the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relation;
Calculate, from the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculate the sub-pointwise mutual information between each triggered item and each corresponding second trigger item from the second joint probability, the third marginal probability and the fourth marginal probability under the corresponding preset lexical cohesion relation.
In an eighth possible implementation provided on the basis of the seventh possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Count, among the texts of the corpus, a fifth quantity of texts in which a triggered item and a corresponding second trigger item appear together and satisfy the corresponding preset lexical cohesion relation;
Count, among the texts of the corpus, a sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculate, from the fifth quantity and the sixth quantity, the second joint probability of the triggered item and the second trigger item under the corresponding preset lexical cohesion relation.
In a ninth possible implementation provided on the basis of the seventh possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Count, among the texts of the corpus, a seventh quantity of texts in which each triggered item appears;
Count, among the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to each triggered item appears;
Count, among the texts of the corpus, the sixth quantity of texts that exhibit the preset lexical cohesion relation corresponding to the triggered item and the second trigger item;
Calculate, from the seventh quantity and the sixth quantity, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and calculate, from the eighth quantity and the sixth quantity, the fourth marginal probability of the corresponding second trigger item under the corresponding preset lexical cohesion relation.
The terminal provided by the invention determines at least one candidate target-language word for the first source-language word to be translated in the current text. It then determines, from the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and the second pointwise mutual information between each candidate target-language word and the second source-language words. Finally, it determines the translation result of the first source-language word from the first and second pointwise mutual information of each candidate. Because the word to be translated is translated using both the pointwise mutual information within the target-language side and the pointwise mutual information between the source-language side and the target-language side, the quality of translation from the source language into the target language is higher.
Embodiment eight
An embodiment of the invention also provides a computer-readable storage medium. The computer-readable storage medium may be the one contained in the memory of the above embodiment, or it may exist separately without being assembled into the terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to execute a machine translation method, the method including:
Obtaining the first source-language word to be translated in the current text, and determining at least one candidate target-language word corresponding to the first source-language word;
Determining, from the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text, and determining, from the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words;
Determining the translation result of the first source-language word from the first pointwise mutual information and the second pointwise mutual information of each candidate target-language word.
Taking the above as the first possible implementation, in a second possible implementation provided on the basis of the first, determining, from the corpus, the first pointwise mutual information between each candidate target-language word and the target-language words corresponding to each already-translated second source-language word in the current text includes:
Taking each candidate target-language word as a triggered item, and determining, among the target-language words corresponding to the second source-language words, at least one first trigger item for each triggered item, where a first trigger item is a target-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, from the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item, and determining from it the first pointwise mutual information between each triggered item and the target-language words corresponding to the second source-language words.
In a third possible implementation provided on the basis of the first possible implementation, determining, from the corpus, the second pointwise mutual information between each candidate target-language word and the second source-language words includes:
Taking each candidate target-language word as a triggered item, and determining, among the second source-language words, at least one second trigger item for each triggered item, where a second trigger item is a second source-language word that corresponds to the triggered item under a preset lexical cohesion relation;
Calculating, from the corpus, the sub-pointwise mutual information between each triggered item and each corresponding second trigger item, and determining from it the second pointwise mutual information between each triggered item and the second source-language words.
In a fourth possible implementation provided on the basis of the second possible implementation, calculating, from the corpus, the sub-pointwise mutual information between each triggered item and each corresponding first trigger item includes:
Calculating, from the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relation;
Calculating, from the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relation, and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relation;
Calculating the sub-pointwise mutual information between each triggered item and each corresponding first trigger item from the first joint probability, the first marginal probability and the second marginal probability under the corresponding preset lexical cohesion relation.
In a fifth possible embodiment, provided on the basis of the fourth possible embodiment, calculating, according to the corpus, the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship includes:
Counting, among the texts of the corpus, a first quantity of texts in which each triggered item and each corresponding first trigger item occur together and satisfy the corresponding preset lexical cohesion relationship;
Counting, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item;
Calculating the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship according to the first quantity and the second quantity.
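The counting steps above can be sketched as follows. The data model is hypothetical: each corpus text is assumed to carry a set of annotated cohesion links of the form (triggered item, trigger item, relation label). The second quantity is read here as the number of texts that contain the relationship for any word pair, and the probability is assumed to be the ratio of the two counts. Neither reading is confirmed by this passage.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple

Link = Tuple[str, str, str]  # (triggered item, trigger item, relation label)


class Text(NamedTuple):
    """Hypothetical corpus text annotated with lexical cohesion links."""
    links: FrozenSet[Link]


def joint_probability(corpus: Sequence[Text], triggered: str, trigger: str, relation: str) -> float:
    # First quantity: texts in which the pair occurs together under the relation.
    first_quantity = sum(1 for text in corpus if (triggered, trigger, relation) in text.links)
    # Second quantity: texts that exhibit the relation for any pair (assumed reading).
    second_quantity = sum(
        1 for text in corpus if any(label == relation for _, _, label in text.links)
    )
    # Assumption: the joint probability is the ratio of the two counts.
    return first_quantity / second_quantity if second_quantity else 0.0


corpus = [
    Text(frozenset({("bank", "money", "collocation")})),
    Text(frozenset({("bank", "money", "collocation"), ("river", "water", "collocation")})),
    Text(frozenset({("river", "water", "collocation")})),
    Text(frozenset({("big", "large", "synonym")})),
]
p_joint = joint_probability(corpus, "bank", "money", "collocation")
```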
In a sixth possible embodiment, provided on the basis of the fourth possible embodiment, calculating, according to the corpus, the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship includes:
Counting, among the texts of the corpus, a third quantity of texts in which each triggered item occurs;
Counting, among the texts of the corpus, a fourth quantity of texts in which the first trigger item corresponding to each triggered item occurs;
Counting, among the texts of the corpus, the second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item;
Calculating the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the third quantity and the second quantity, and calculating the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the fourth quantity and the second quantity.
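As a minimal sketch of the final step, the two marginal probabilities can be taken as ratios of the counts. The ratio form and the function name `marginal_probabilities` are assumptions; the text says only that each probability is calculated from the two quantities.

```python
from typing import Tuple


def marginal_probabilities(
    third_quantity: int, fourth_quantity: int, second_quantity: int
) -> Tuple[float, float]:
    """Return (first marginal probability, second marginal probability).

    third_quantity  -- texts in which the triggered item occurs
    fourth_quantity -- texts in which the first trigger item occurs
    second_quantity -- texts that have the preset lexical cohesion relationship
    """
    if second_quantity == 0:
        # Assumption: with no supporting text, both probabilities are zero.
        return 0.0, 0.0
    return third_quantity / second_quantity, fourth_quantity / second_quantity


first_marginal, second_marginal = marginal_probabilities(3, 2, 6)
```

Under this reading the ratios stay within [0, 1] only if the third and fourth quantities are counted among the same texts as the second quantity.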
In a seventh possible embodiment, provided on the basis of the third possible embodiment, calculating, according to the corpus, the sub point-by-point mutual information between each triggered item and each corresponding second trigger item includes:
Calculating, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship;
Calculating, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship;
Calculating the sub point-by-point mutual information between each triggered item and each corresponding second trigger item according to the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship, the third marginal probability and the fourth marginal probability.
In an eighth possible embodiment, provided on the basis of the seventh possible embodiment, calculating, according to the corpus, the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship includes:
Counting, among the texts of the corpus, a fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relationship;
Counting, among the texts of the corpus, a sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item;
Calculating the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship according to the fifth quantity and the sixth quantity.
In a ninth possible embodiment, provided on the basis of the seventh possible embodiment, calculating, according to the corpus, the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship includes:
Counting, among the texts of the corpus, a seventh quantity of texts in which each triggered item occurs;
Counting, among the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to each triggered item occurs;
Counting, among the texts of the corpus, the sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item;
Calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the seventh quantity and the sixth quantity, and calculating the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the eighth quantity and the sixth quantity.
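For the source-to-target side, the seventh to ninth embodiments can be chained into a single calculation from the four counts. This is a sketch under stated assumptions: each probability is taken as a ratio of counts, the logarithm is natural, and the function name `cross_lingual_sub_pmi` is hypothetical.

```python
import math


def cross_lingual_sub_pmi(fifth: int, sixth: int, seventh: int, eighth: int) -> float:
    """Sub-value between a triggered item (target side) and a second trigger item (source side).

    fifth   -- texts where both items occur together and satisfy the relationship
    sixth   -- texts that have the preset lexical cohesion relationship
    seventh -- texts in which the triggered item occurs
    eighth  -- texts in which the second trigger item occurs
    """
    # Assumption: any zero count yields the lowest possible score.
    if min(fifth, sixth, seventh, eighth) <= 0:
        return float("-inf")
    second_joint = fifth / sixth        # second joint probability
    third_marginal = seventh / sixth    # third marginal probability
    fourth_marginal = eighth / sixth    # fourth marginal probability
    return math.log(second_joint / (third_marginal * fourth_marginal))
```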
With the computer readable storage medium provided in this embodiment of the present invention, at least one target language vocabulary to be selected corresponding to the first source language vocabulary to be translated in the current text is determined; the first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text, and the second point-by-point mutual information between each target language vocabulary to be selected and the second source language vocabulary, are determined according to the corpus; and the translation result of the first source language vocabulary is then determined according to the first point-by-point mutual information and the second point-by-point mutual information corresponding to each target language vocabulary to be selected. Because the source language vocabulary to be translated is translated using both the point-by-point mutual information within the target language side and the point-by-point mutual information between the source language side and the target language side, the quality of translation from the source language into the target language is higher.
Embodiment nine
An embodiment of the present invention provides a graphical user interface for use on a terminal, the terminal including a touch-screen display, a memory, and one or more processors for executing one or more programs; the graphical user interface includes:
Obtaining the first source language vocabulary to be translated in the current text, and determining at least one target language vocabulary to be selected corresponding to the first source language vocabulary;
Determining, according to a corpus, the first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text, and determining, according to the corpus, the second point-by-point mutual information between each target language vocabulary to be selected and the second source language vocabulary;
Determining the translation result of the first source language vocabulary according to the first point-by-point mutual information and the second point-by-point mutual information corresponding to each target language vocabulary to be selected.
With the graphical user interface provided in this embodiment of the present invention, at least one target language vocabulary to be selected corresponding to the first source language vocabulary to be translated in the current text is determined; the first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text, and the second point-by-point mutual information between each target language vocabulary to be selected and the second source language vocabulary, are determined according to the corpus; and the translation result of the first source language vocabulary is then determined according to the first point-by-point mutual information and the second point-by-point mutual information corresponding to each target language vocabulary to be selected. Because the source language vocabulary to be translated is translated using both the point-by-point mutual information within the target language side and the point-by-point mutual information between the source language side and the target language side, the quality of translation from the source language into the target language is higher.
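The final selection step can be sketched as follows. The text says only that the translation result is determined according to the two values; the sketch assumes the two are added and the highest-scoring candidate is chosen. The function name `select_translation` and the example scores are hypothetical.

```python
from typing import Dict, Iterable


def select_translation(
    candidates: Iterable[str],
    first_pmi: Dict[str, float],
    second_pmi: Dict[str, float],
) -> str:
    """Pick the target language vocabulary to be selected with the best combined score.

    first_pmi  -- score against the translations of already translated words (target side)
    second_pmi -- score against the already translated source words (source to target)
    Assumption: the two scores are summed with equal weight.
    """
    return max(candidates, key=lambda word: first_pmi[word] + second_pmi[word])


best = select_translation(
    ["bank", "shore"],
    first_pmi={"bank": 1.2, "shore": 0.3},
    second_pmi={"bank": 0.4, "shore": 0.9},
)
```

A weighted sum would be a natural variation if one side of the evidence is considered more reliable than the other.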
It should be noted that when the machine translation apparatus provided by the above embodiments translates a source language into a target language, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the machine translation apparatus provided by the above embodiments and the machine translation method embodiments belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. A machine translation method, characterized in that the method comprises:
Obtaining a first source language vocabulary to be translated in a current text, and determining at least one target language vocabulary to be selected corresponding to the first source language vocabulary;
Determining, according to a corpus, a first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text, and determining, according to the corpus, a second point-by-point mutual information between each target language vocabulary to be selected and each translated second source language vocabulary in the current text;
Determining a translation result of the first source language vocabulary according to the first point-by-point mutual information and the second point-by-point mutual information corresponding to each target language vocabulary to be selected.
2. The method according to claim 1, characterized in that the determining, according to a corpus, a first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text comprises:
Taking each target language vocabulary to be selected as a triggered item, and determining, among the target language vocabulary corresponding to the second source language vocabulary, at least one first trigger item corresponding to each triggered item, the first trigger item being a target language vocabulary that corresponds to the triggered item under a preset lexical cohesion relationship;
Calculating, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding first trigger item, and determining, according to the sub point-by-point mutual information between each triggered item and each corresponding first trigger item, the first point-by-point mutual information between each triggered item and the target language vocabulary corresponding to the second source language vocabulary.
3. The method according to claim 1, characterized in that the determining, according to the corpus, a second point-by-point mutual information between each target language vocabulary to be selected and each translated second source language vocabulary in the current text comprises:
Taking each target language vocabulary to be selected as a triggered item, and determining, among the second source language vocabulary, at least one second trigger item corresponding to each triggered item, the second trigger item being a second source language vocabulary that corresponds to the triggered item under a preset lexical cohesion relationship;
Calculating, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding second trigger item, and determining, according to the sub point-by-point mutual information between each triggered item and each corresponding second trigger item, the second point-by-point mutual information between each triggered item and the second source language vocabulary.
4. The method according to claim 2, characterized in that the calculating, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding first trigger item comprises:
Calculating, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship;
Calculating, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship;
Calculating the sub point-by-point mutual information between each triggered item and each corresponding first trigger item according to the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship, the first marginal probability and the second marginal probability.
5. The method according to claim 4, characterized in that the calculating, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship comprises:
Counting, among the texts of the corpus, a first quantity of texts in which each triggered item and each corresponding first trigger item occur together and satisfy the corresponding preset lexical cohesion relationship;
Counting, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item;
Calculating the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship according to the first quantity and the second quantity.
6. The method according to claim 4, characterized in that the calculating, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship comprises:
Counting, among the texts of the corpus, a third quantity of texts in which each triggered item occurs;
Counting, among the texts of the corpus, a fourth quantity of texts in which the first trigger item corresponding to each triggered item occurs;
Counting, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item;
Calculating the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the third quantity and the second quantity, and calculating the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the fourth quantity and the second quantity.
7. The method according to claim 3, characterized in that the calculating, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding second trigger item comprises:
Calculating, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship;
Calculating, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship;
Calculating the sub point-by-point mutual information between each triggered item and each corresponding second trigger item according to the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship, the third marginal probability and the fourth marginal probability.
8. The method according to claim 7, characterized in that the calculating, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship comprises:
Counting, among the texts of the corpus, a fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relationship;
Counting, among the texts of the corpus, a sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item;
Calculating the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship according to the fifth quantity and the sixth quantity.
9. The method according to claim 7, characterized in that the calculating, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship comprises:
Counting, among the texts of the corpus, a seventh quantity of texts in which each triggered item occurs;
Counting, among the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to each triggered item occurs;
Counting, among the texts of the corpus, a sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item;
Calculating the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the seventh quantity and the sixth quantity, and calculating the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the eighth quantity and the sixth quantity.
10. A machine translation apparatus, characterized in that the apparatus comprises:
An obtaining module, configured to obtain a first source language vocabulary to be translated in a current text;
A first determining module, configured to determine at least one target language vocabulary to be selected corresponding to the first source language vocabulary;
A second determining module, configured to determine, according to a corpus, a first point-by-point mutual information between each target language vocabulary to be selected and the target language vocabulary corresponding to each translated second source language vocabulary in the current text;
A third determining module, configured to determine, according to the corpus, a second point-by-point mutual information between each target language vocabulary to be selected and each translated second source language vocabulary in the current text;
A fourth determining module, configured to determine a translation result of the first source language vocabulary according to the first point-by-point mutual information and the second point-by-point mutual information corresponding to each target language vocabulary to be selected.
11. The apparatus according to claim 10, characterized in that the second determining module comprises:
A first determination unit, configured to take each target language vocabulary to be selected as a triggered item, and to determine, among the target language vocabulary corresponding to the second source language vocabulary, at least one first trigger item corresponding to each triggered item, the first trigger item being a target language vocabulary that corresponds to the triggered item under a preset lexical cohesion relationship;
A first computing unit, configured to calculate, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding first trigger item;
A second determination unit, configured to determine, according to the sub point-by-point mutual information between each triggered item and each corresponding first trigger item, the first point-by-point mutual information between each triggered item and the target language vocabulary corresponding to the second source language vocabulary.
12. The apparatus according to claim 10, characterized in that the third determining module comprises:
A third determination unit, configured to take each target language vocabulary to be selected as a triggered item, and to determine, among the second source language vocabulary, at least one second trigger item corresponding to each triggered item, the second trigger item being a second source language vocabulary that corresponds to the triggered item under a preset lexical cohesion relationship;
A second computing unit, configured to calculate, according to the corpus, a sub point-by-point mutual information between each triggered item and each corresponding second trigger item;
A fourth determination unit, configured to determine, according to the sub point-by-point mutual information between each triggered item and each corresponding second trigger item, the second point-by-point mutual information between each triggered item and the second source language vocabulary.
13. The apparatus according to claim 11, characterized in that the first computing unit comprises:
A first computation subunit, configured to calculate, according to the corpus, a first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship;
A second computation subunit, configured to calculate, according to the corpus, a first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship;
A third computation subunit, configured to calculate the sub point-by-point mutual information between each triggered item and each corresponding first trigger item according to the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship, the first marginal probability and the second marginal probability.
14. The apparatus according to claim 13, characterized in that the first computation subunit is configured to: count, among the texts of the corpus, a first quantity of texts in which each triggered item and each corresponding first trigger item occur together and satisfy the corresponding preset lexical cohesion relationship; count, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item; and calculate the first joint probability of each triggered item and each corresponding first trigger item under the corresponding preset lexical cohesion relationship according to the first quantity and the second quantity.
15. The apparatus according to claim 13, characterized in that the second computation subunit is configured to: count, among the texts of the corpus, a third quantity of texts in which each triggered item occurs; count, among the texts of the corpus, a fourth quantity of texts in which the first trigger item corresponding to each triggered item occurs; count, among the texts of the corpus, a second quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding first trigger item; calculate the first marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the third quantity and the second quantity; and calculate the second marginal probability of the first trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the fourth quantity and the second quantity.
16. The apparatus according to claim 12, characterized in that the second computing unit comprises:
A fourth computation subunit, configured to calculate, according to the corpus, a second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship;
A fifth computation subunit, configured to calculate, according to the corpus, a third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship and a fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship;
A sixth computation subunit, configured to calculate the sub point-by-point mutual information between each triggered item and each corresponding second trigger item according to the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship, the third marginal probability and the fourth marginal probability.
17. The apparatus according to claim 16, characterized in that the fourth computation subunit is configured to: count, among the texts of the corpus, a fifth quantity of texts in which each triggered item and each corresponding second trigger item occur together and satisfy the corresponding preset lexical cohesion relationship; count, among the texts of the corpus, a sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item; and calculate the second joint probability of each triggered item and each corresponding second trigger item under the corresponding preset lexical cohesion relationship according to the fifth quantity and the sixth quantity.
18. The apparatus according to claim 16, characterized in that the fifth computation subunit is configured to: count, among the texts of the corpus, a seventh quantity of texts in which each triggered item occurs; count, among the texts of the corpus, an eighth quantity of texts in which the second trigger item corresponding to each triggered item occurs; count, among the texts of the corpus, a sixth quantity of texts that have the preset lexical cohesion relationship corresponding to each triggered item and each corresponding second trigger item; calculate the third marginal probability of each triggered item under the corresponding preset lexical cohesion relationship according to the seventh quantity and the sixth quantity; and calculate the fourth marginal probability of the second trigger item corresponding to each triggered item under the corresponding preset lexical cohesion relationship according to the eighth quantity and the sixth quantity.
19. A computer readable storage medium, characterized in that a program is stored in the computer readable storage medium, and the program is loaded and executed by a processor to implement the machine translation method according to any one of claims 1 to 9.
CN201410026026.3A 2014-01-20 2014-01-20 Machine translation method and device Active CN104794110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410026026.3A CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410026026.3A CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Publications (2)

Publication Number Publication Date
CN104794110A CN104794110A (en) 2015-07-22
CN104794110B true CN104794110B (en) 2018-11-23

Family

ID=53558908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410026026.3A Active CN104794110B (en) 2014-01-20 2014-01-20 Machine translation method and device

Country Status (1)

Country Link
CN (1) CN104794110B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781662B (en) * 2019-10-21 2022-02-01 腾讯科技(深圳)有限公司 Method for determining point-to-point mutual information and related equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002075586A1 (en) * 2001-03-16 2002-09-26 Eli Abir Content conversion method and apparatus
CN1471029A (en) * 2002-06-28 2004-01-28 System and method for auto-detecting collcation mistakes of file
CN1503161A (en) * 2002-11-20 2004-06-09 Statistical method and apparatus for learning translation relationship among phrases
CN1567297A (en) * 2003-07-03 2005-01-19 中国科学院声学研究所 Method for extracting multi-word translation equivalent cells from bilingual corpus automatically
CN101763402A (en) * 2009-12-30 2010-06-30 哈尔滨工业大学 Integrated retrieval method for multi-language information retrieval
CN102375809A (en) * 2010-08-04 2012-03-14 英业达股份有限公司 System and method for instantly outputting second language from input first language
CN102486770A (en) * 2010-12-02 2012-06-06 财团法人资讯工业策进会 Text conversion method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
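"Research on translation selection technology based on improved mutual information"; 林晓庆 et al.; 《技术与方法》; 20101231 (No. 4); 68-70 *
"Research on OOV translation mining in web-based cross-language information retrieval"; 葛运东 et al.; 《微电子学与计算机》; 20091031; Vol. 26 (No. 10); 185-188 *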

Also Published As

Publication number Publication date
CN104794110A (en) 2015-07-22

Similar Documents

Publication Publication Date Title
CN105824958B (en) Method, device and system for querying logs
CN105095432B (en) Web page annotation display method and device
CN105278937B (en) Method and device for displaying pop-up box messages
US20170091335A1 (en) Search method, server and client
CN106227774B (en) Information search method and device
US9241242B2 (en) Information recommendation method and apparatus
WO2014169715A1 (en) Information recommendation method and apparatus
CN105022616A (en) Method and device for generating web page
CN104516624B (en) Method and device for inputting account information
CN104281394A (en) Method and device for intelligently selecting words
CN105530239B (en) Multimedia data acquisition method and device
CN105955597B (en) Information display method and device
CN104021129B (en) Method and terminal for displaying group pictures
CN107885718B (en) Semantic determination method and device
CN105868319B (en) Webpage loading method and device
CN105302452A (en) Gesture interaction-based operation method and device
CN105550316B (en) Method and device for pushing audio lists
CN104598542B (en) Display method and device for multimedia messages
CN109543014B (en) Man-machine conversation method, device, terminal and server
CN105653112B (en) Method and device for displaying floating layer
CN106486119A (en) Method and apparatus for recognizing voice information
CN106210838B (en) Caption presentation method and device
CN109002238A (en) Character count display method, device, terminal and computer-readable storage medium
CN104123308B (en) Web page generating method and web page generating device
CN109451295A (en) Method and system for obtaining virtual information

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190730

Address after: Room 403, East, Building 2, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518000

Co-patentee after: Tencent Cloud Computing (Beijing) Co., Ltd.

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Room 403, East, Building 2, SEG Science Park, Zhenxing Road, Futian District, Shenzhen, Guangdong Province 518000

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.