CN114154483A - A method, device, medium and equipment for measuring the similarity of sentences - Google Patents
A method, device, medium and equipment for measuring the similarity of sentences
- Publication number
- CN114154483A (application number CN202111211101.XA)
- Authority
- CN
- China
- Prior art keywords
- context
- similarity
- sentence
- calculated
- corpus
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method, a device, a medium and equipment for measuring sentence similarity. The method comprises the following steps: performing unsupervised learning, with a language model tool, on the context matching relationship of each sentence in a predetermined unlabeled corpus to obtain a context matching model; obtaining, from the unlabeled corpus, the contexts related to a plurality of sentences whose similarity is to be calculated so as to form a shared context set, using the context matching model to calculate the context score of each such sentence against each context in the shared context set, and assembling all the context scores of a sentence into its context score vector; and calculating the cosine similarity between the context score vectors, thereby obtaining the sentence similarity between the corresponding sentences. The method calculates sentence similarity without labeled data, reduces the dependence on labeled data, and has a simple calculation process.
Description
Technical Field
The present application relates to the field of language processing technologies, and in particular, to a method and an apparatus for measuring sentence similarity, a storage medium, and a computer device.
Background
Sentence similarity measures how close two sentences are in meaning. For example, "apple is a fruit" and "pear is a fruit" are semantically close, whereas "apple is a fruit" and "I love eating pears" have relatively low semantic similarity. A sentence similarity model aims to judge accurately how similar two sentences are in meaning.
Training a traditional sentence similarity model requires a data set consisting of sentence pairs and their similarity scores. Such labeled data is scarce: the similarity between two sentences must be annotated manually and can be judged from many aspects, so manual scoring is inefficient and existing labeled data sets are small. For example, the commonly used STS data set has only 8,600 training samples and the SICK data set only 9,800; neither reaches the ten-thousand level, so models trained on them do not perform well enough.
Disclosure of Invention
The invention provides a method and a device for measuring sentence similarity, a storage medium and computer equipment, which can calculate sentence similarity without labeled data, reduce the dependence on labeled data, and keep the calculation process simple.
In order to solve the above problems, the present invention adopts a technical solution that: a method for measuring sentence similarity is provided, which comprises the following steps:
performing unsupervised learning on the context matching relationship of each sentence in a predetermined unmarked corpus by using a language model tool to obtain a context matching model;
obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores; and
calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated.
The invention adopts another technical scheme that: there is provided a sentence similarity measuring apparatus, including:
a module for performing unsupervised learning on the context matching relationship of each sentence in a predetermined unmarked corpus by using a language model tool to obtain a context matching model;
a module for obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores; and
a module for calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated.
In another aspect of the present invention, a computer-readable storage medium is provided, which stores computer instructions, wherein the computer instructions are operable to perform the method for measuring sentence similarity of the above scheme.
In another aspect of the present invention, a computer device is provided, which includes a processor and a memory, the memory storing computer instructions, wherein the processor operates the computer instructions to perform the method for measuring sentence similarity of the above scheme.
The technical scheme of the invention can achieve the following beneficial effects: sentence similarity can be calculated without labeled data, the dependence on labeled data is reduced, and the calculation process is simple.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a diagram illustrating an embodiment of a sentence similarity measurement method according to the present invention;
FIG. 2 is a schematic diagram of a specific example of the sentence similarity measurement method according to the present invention;
FIG. 3 is a diagram illustrating an embodiment of a sentence similarity measuring apparatus according to the present invention.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the invention and the scope of the invention is defined more clearly.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Fig. 1 is a diagram illustrating an embodiment of a sentence similarity measurement method according to the present invention.
In this embodiment, the method for measuring sentence similarity mainly includes:
Process S101: performing unsupervised learning on the context matching relationship of each sentence in a predetermined unlabeled corpus by using a language model tool to obtain a context matching model;
Process S102: obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores;
Process S103: calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated.
With this sentence similarity measuring method, sentence similarity can be calculated without labeled data, the dependence on labeled data is reduced, and the calculation process is simple.
In the embodiment shown in fig. 1, the method for measuring sentence similarity according to the present invention includes a process S101, which utilizes a language model tool to perform unsupervised learning on a context matching relationship of each sentence in a predetermined unlabeled corpus to obtain a context matching model. The process predetermines the non-labeled corpus, reduces the dependence on labeled data, and obtains the context matching model, so as to further obtain the context score vector of each sentence with similarity to be calculated according to the context matching model, and further obtain the sentence similarity between the sentences with similarity to be calculated.
Specifically, in practical application, the unlabeled corpus may be input into a language model tool, which performs unsupervised learning on the context matching relationship of each sentence in the corpus to obtain a context matching model. No data is labeled during this unsupervised learning; through it the tool acquires the ability to output a context matching score. The resulting model is then used to obtain the context score vector of each sentence whose similarity is to be calculated, and from those vectors the sentence similarity between the sentences.
In one embodiment of the invention, the unlabeled corpus includes unlabeled data that is crawled over the Internet. The process reduces the dependence on the labeled data, and can obtain the context matching model according to the training of the label-free corpus so as to further obtain the context score vector of each sentence with the similarity to be calculated according to the context matching model, thereby further obtaining the sentence similarity between the sentences with the similarity to be calculated.
Specifically, a non-labeled corpus can be formed by directly crawling a large amount of non-labeled data from the internet, for example, a non-labeled corpus is formed by crawling a large amount of non-labeled data from encyclopedia knowledge, forums, news information, social media and the like. Here, unmarked data means unprocessed data.
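The text does not say how the language model tool turns raw crawled text into training signal. One common self-supervised setup, assumed here purely for illustration, treats each sentence and the sentence that follows it in the same document as a matching (sentence, context) pair. The helper names `split_sentences` and `adjacent_pairs` are hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def split_sentences(document: str) -> List[str]:
    """Naive splitter (assumption: ., !, ? and their CJK forms end a sentence)."""
    parts = re.split(r"(?<=[.!?。！？])\s*", document)
    return [p.strip() for p in parts if p.strip()]


def adjacent_pairs(documents: Iterable[str]) -> List[Tuple[str, str]]:
    """Build (sentence, following sentence) pairs; no manual labels are involved."""
    pairs: List[Tuple[str, str]] = []
    for doc in documents:
        sentences = split_sentences(doc)
        # zip stops at the shorter input, so a one-sentence document yields no pair
        pairs.extend(zip(sentences, sentences[1:]))
    return pairs


corpus = ["Apples are fruit. They grow on trees. I like them."]
for sentence, context in adjacent_pairs(corpus):
    print(sentence, "->", context)
```

Such pairs could then be fed to whatever language model tool is used; the choice of adjacency as the matching signal is an assumption of this sketch, not something the description specifies.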
In the embodiment shown in fig. 1, the method for measuring sentence similarity of the present invention includes a process S102, obtaining a shared context set by obtaining contexts related to a plurality of sentences with similarity to be calculated from an unlabeled corpus, calculating each sentence with similarity to be calculated by using a context matching model, scoring each context in the shared context set, and further obtaining a context score vector of each sentence with similarity to be calculated by using all the context scores. The process calculates a context score vector of each sentence with the similarity to be calculated so as to further obtain the sentence similarity between the sentences with the similarity to be calculated.
In an embodiment of the present invention, obtaining the shared context set includes: obtaining, from the unlabeled corpus, the context related to each sentence whose similarity is to be calculated; and merging all of these contexts to obtain the shared context set. The shared context set is the basis for the subsequent calculation of the context score vectors.
In an embodiment of the invention, the context related to each sentence whose similarity is to be calculated is obtained from the unlabeled corpus by using a term frequency-inverse document frequency (TF-IDF) algorithm. This reduces noise and improves the accuracy of the context score vectors obtained subsequently.
The TF-IDF algorithm is mature prior art; it is used here to find, relatively coarsely, the contexts related to the sentences whose similarity is to be calculated.
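As a rough illustration of this coarse retrieval step, the sketch below ranks corpus entries against a query sentence by TF-IDF cosine similarity using only the standard library. The whitespace tokenizer, the smoothed IDF formula, the `top_k` parameter and all function names are assumptions of the sketch; the description only says that a TF-IDF algorithm is used.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (assumption; Chinese text needs a word segmenter)."""
    return text.lower().split()


def tfidf_vector(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Term frequency times IDF; terms unseen in the corpus get weight 0."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}


def sparse_cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_contexts(query: str, corpus: List[str], top_k: int = 3) -> List[str]:
    """Return up to top_k corpus entries with a positive TF-IDF cosine score."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    doc_freq = Counter(t for d in docs for t in set(d))
    # Smoothed IDF (one of several common variants)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    q = tfidf_vector(tokenize(query), idf)
    scored = [(sparse_cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [corpus[i] for score, i in ranked[:top_k] if score > 0.0]
```

For a corpus of realistic size, a library vectorizer and an inverted index would normally replace this brute-force scan.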
Specifically, refer to the schematic diagram of a specific example of the sentence similarity measurement method provided in FIG. 2. First, the contexts related to the two given sentences are obtained from the unlabeled corpus, that is, contexts C1 and C2 related to sentence S1 and sentence S2, respectively. Suppose the given sentence S1 is "apple is a fruit" and the given sentence S2 is "I love eating pears". The context C1 related to S1 is actually a context set that may contain several contexts related to "apple is a fruit". If C1 contains 3 such contexts, they can be denoted by the labels a, b and c, so that $C_1 = \{a, b, c\}$. Similarly, if C2 also contains 3 contexts related to "I love eating pears", they can be denoted by the labels d, e and f, so that $C_2 = \{d, e, f\}$. Merging C1 and C2 yields the shared context set C; the merge is a mathematical union, so $C = C_1 \cup C_2 = \{a, b, c, d, e, f\}$.
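The merge step can be sketched as an order-preserving union. The function name `shared_context_set` is hypothetical, and the labels a to f stand in for actual context strings as in the example above.

```python
from typing import Iterable, List


def shared_context_set(*context_lists: Iterable[str]) -> List[str]:
    """Union of per-sentence context lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    shared: List[str] = []
    for contexts in context_lists:
        for context in contexts:
            if context not in seen:
                seen.add(context)
                shared.append(context)
    return shared


C1 = ["a", "b", "c"]  # contexts related to S1
C2 = ["d", "e", "f"]  # contexts related to S2
C = shared_context_set(C1, C2)
print(C)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

A list is returned rather than a plain `set` so that every sentence is later scored against the contexts in the same fixed order, which is what makes the resulting score vectors comparable position by position.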
In a specific example of the present invention, the context matching model is used to calculate the context score of each sentence whose similarity is to be calculated against each context in the shared context set, and all of a sentence's context scores form its context score vector. The shared context set of the given sentences S1 and S2 has already been obtained as $C = \{a, b, c, d, e, f\}$. First, the context scores of sentence S1 against C are calculated: S1 and C are input into the context matching model, which scores S1 against each context in C. For example, matching S1 with context a gives the score $[p(S_1 \mid a) + p(a \mid S_1)]/2$. Assume that the score of S1 and context a is 2, and similarly that the scores of S1 with contexts b, c, d, e and f are 3, 4, 2, 1 and 2, respectively. This gives the context score vector of sentence S1 over the shared context set C: $v_1 = \{2, 3, 4, 2, 1, 2\}$. The context scores of sentence S2 against C are calculated in the same way; assume the resulting context score vector is $v_2 = \{1, 4, 3, 3, 2, 2\}$.
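A minimal sketch of building a context score vector, reading the score above as the average of the two conditional probabilities p(S | c) and p(c | S). The callable `cond_prob` is a hypothetical stand-in for the trained context matching model, and the lookup table in the usage example contains made-up numbers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: cond_prob(x, y) approximates p(x | y)
CondProb = Callable[[str, str], float]


def context_score(sentence: str, context: str, cond_prob: CondProb) -> float:
    """Symmetric matching score: the mean of p(sentence | context) and p(context | sentence)."""
    return (cond_prob(sentence, context) + cond_prob(context, sentence)) / 2.0


def context_score_vector(
    sentence: str, shared_contexts: List[str], cond_prob: CondProb
) -> List[float]:
    """Score one sentence against every context, in the shared set's fixed order."""
    return [context_score(sentence, c, cond_prob) for c in shared_contexts]


# Toy stand-in for a trained model (values are invented for illustration only)
table: Dict[Tuple[str, str], float] = {
    ("S1", "a"): 0.4, ("a", "S1"): 0.2,
    ("S1", "b"): 0.6, ("b", "S1"): 0.6,
}


def toy_model(x: str, y: str) -> float:
    return table.get((x, y), 0.0)


v1 = context_score_vector("S1", ["a", "b"], toy_model)
```

Averaging both directions makes the score symmetric in the sentence and the context, so neither a very generic sentence nor a very generic context dominates on its own.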
In the embodiment shown in fig. 1, the method for measuring sentence similarity of the present invention includes a process S103 of calculating cosine similarity between each context score vector and the remaining context score vectors, so as to obtain sentence similarity between a sentence with similarity to be calculated corresponding to the context score vector and the remaining sentences with similarity to be calculated. The process calculates cosine similarity between context score vectors so as to further obtain sentence similarity between sentences with similarity to be calculated by referring to the cosine similarity.
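When more than two sentences are compared, process S103 amounts to filling a pairwise similarity matrix. The sketch below uses the standard definition of cosine similarity; the function names and the choice to return 0.0 for an all-zero vector are assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length score vectors."""
    if len(u) != len(v):
        raise ValueError("score vectors must share the same context set")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a sentence matching no context is treated as dissimilar
    return dot / (norm_u * norm_v)


def pairwise_similarity(vectors: List[Sequence[float]]) -> List[List[float]]:
    """sim[i][j] is the similarity between sentence i and sentence j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

The matrix is symmetric with ones on the diagonal for non-zero vectors, so only the upper triangle needs to be computed when the number of sentences is large.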
In an embodiment of the present invention, refer again to the schematic diagram of the specific example provided in FIG. 2. The example of process S102 has already given the context score vector of sentence S1 over the shared context set C as $v_1 = \{2, 3, 4, 2, 1, 2\}$ and that of sentence S2 as $v_2 = \{1, 4, 3, 3, 2, 2\}$. The cosine similarity $\mathrm{Sim}(v_1, v_2)$ between the two context score vectors is then calculated as follows:
$$\mathrm{Sim}(v_1, v_2) = \cos(v_1, v_2)$$
The cosine similarity $\mathrm{Sim}(v_1, v_2)$ reflects the size of the angle between the context score vectors $v_1$ and $v_2$, and it is positively correlated with sentence similarity: the smaller the angle, the larger the cosine value and the more similar the sentences. Following all the processes in the above example, the similarity between sentence S1 and sentence S2, that is, between "apple is a fruit" and "I love eating pears", can be calculated.
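For the example vectors (2, 3, 4, 2, 1, 2) for S1 and (1, 4, 3, 3, 2, 2) for S2, the calculation can be carried through with the standard definition of cosine similarity, which the text does not spell out:

```latex
\begin{aligned}
v_1 \cdot v_2 &= 2\cdot 1 + 3\cdot 4 + 4\cdot 3 + 2\cdot 3 + 1\cdot 2 + 2\cdot 2 = 38,\\
\lVert v_1 \rVert^2 &= 4 + 9 + 16 + 4 + 1 + 4 = 38,\qquad
\lVert v_2 \rVert^2 = 1 + 16 + 9 + 9 + 4 + 4 = 43,\\
\mathrm{Sim}(v_1, v_2) &= \frac{v_1 \cdot v_2}{\lVert v_1 \rVert\,\lVert v_2 \rVert}
= \frac{38}{\sqrt{38}\,\sqrt{43}} \approx 0.940.
\end{aligned}
```

Because the scores in the example are assumed values chosen for illustration, this result shows only the arithmetic and says nothing about how similar the two sentences really are.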
Fig. 3 is a schematic diagram illustrating an embodiment of a sentence similarity measuring apparatus according to the present invention.
In this embodiment, the sentence similarity measuring device mainly includes:
Module 301: a module for performing unsupervised learning on the context matching relationship of each sentence in a predetermined unlabeled corpus by using a language model tool to obtain a context matching model. Because the corpus is unlabeled, this module reduces the dependence on labeled data; the context matching model it produces is then used to obtain the context score vector of each sentence whose similarity is to be calculated, and from those vectors the sentence similarity.
Module 302: a module for obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores. The context score vectors calculated by this module are the basis for obtaining the sentence similarity.
Module 303: a module for calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated. The sentence similarity is obtained by reference to the cosine similarity between the context score vectors.
In an embodiment of the present invention, the module 301 further includes a sub-module for obtaining an unlabeled corpus by crawling unlabeled data through the internet. The submodule reduces the dependence on the labeled data, and can obtain the context matching model according to the training of the label-free corpus so as to further obtain the context score vector of each sentence with the similarity to be calculated according to the context matching model, thereby further obtaining the sentence similarity between the sentences with the similarity to be calculated.
In an embodiment of the present invention, the module 302 further includes a sub-module for obtaining, from the unlabeled corpus, the context related to each sentence whose similarity is to be calculated by using a term frequency-inverse document frequency (TF-IDF) algorithm. The sub-module reduces noise and improves the accuracy of the context score vectors obtained subsequently.
With this sentence similarity measuring device, sentence similarity can be calculated without labeled data, the dependence on labeled data is reduced, and the calculation process is simple.
The sentence similarity measuring device provided by the invention can be used for executing the sentence similarity measuring method described in any of the above embodiments, and the implementation principle and the technical effect are similar, and are not repeated herein.
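One way to picture how the modules fit together is a small class that is handed the three capabilities as callables. This is only a sketch of a possible software structure; the class name, field names and method name are hypothetical, and the description does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SentenceSimilarityDevice:
    """Hypothetical wiring of the retrieval, scoring and similarity steps."""

    retrieve: Callable[[str], List[str]]  # context retrieval (sub-module of module 302)
    score: Callable[[str, str], float]    # context matching model produced by module 301
    similarity: Callable[[Sequence[float], Sequence[float]], float]  # module 303

    def measure(self, sentences: List[str]) -> List[List[float]]:
        # Module 302, part 1: merge per-sentence contexts into one ordered shared set
        shared: List[str] = []
        for sentence in sentences:
            for context in self.retrieve(sentence):
                if context not in shared:
                    shared.append(context)
        # Module 302, part 2: one score vector per sentence over the shared set
        vectors = [[self.score(s, c) for c in shared] for s in sentences]
        # Module 303: pairwise similarity between the score vectors
        return [[self.similarity(u, v) for v in vectors] for u in vectors]
```

Passing the steps in as callables keeps the sketch independent of any specific retrieval algorithm or language model.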
In another embodiment of the present invention, a computer-readable storage medium stores computer instructions, wherein the computer instructions are operable to perform the method for measuring sentence similarity described in any of the embodiments. The method may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one embodiment of the present application, a computer device includes a processor and a memory, the memory storing computer instructions, wherein: the processor operates the computer instructions to perform the sentence similarity measure method described in any of the embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is only a logical division, and an actual implementation may divide them differently; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above embodiments are merely examples, which are not intended to limit the scope of the present disclosure, and all equivalent structural changes made by using the contents of the specification and the drawings, or any other related technical fields, are also included in the scope of the present disclosure.
Claims (9)
1. A method for measuring sentence similarity is characterized by comprising the following steps,
performing unsupervised learning on the context matching relationship of each sentence in a predetermined unmarked corpus by using a language model tool to obtain a context matching model;
obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores; and
calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated.
2. The method for measuring sentence similarity according to claim 1,
the unlabeled corpus includes unlabeled data crawled through the internet.
3. The method for measuring sentence similarity according to claim 1, wherein the process of obtaining, from the unlabeled corpus, the contexts related to the plurality of sentences whose similarity is to be calculated to obtain the shared context set comprises,
obtaining, from the unlabeled corpus, the context related to each of the plurality of sentences whose similarity is to be calculated;
and merging all the contexts to obtain the shared context set.
4. The method for measuring sentence similarity according to claim 3, wherein the process of obtaining, from the unlabeled corpus, the context related to each sentence whose similarity is to be calculated comprises,
obtaining, from the unlabeled corpus, the context related to each sentence whose similarity is to be calculated by using a term frequency-inverse document frequency (TF-IDF) algorithm.
5. A sentence similarity measuring device is characterized by comprising,
a module for performing unsupervised learning on the context matching relationship of each sentence in a predetermined unmarked corpus by using a language model tool to obtain a context matching model;
a module for obtaining, from the unlabeled corpus, contexts related to a plurality of sentences whose similarity is to be calculated to obtain a shared context set, calculating with the context matching model the context score of each such sentence against each context in the shared context set, and obtaining a context score vector for each such sentence from all of its context scores; and
a module for calculating the cosine similarity between each context score vector and the remaining context score vectors, thereby obtaining the sentence similarity between the sentence corresponding to that context score vector and the remaining sentences whose similarity is to be calculated.
6. The sentence similarity measuring device according to claim 5, wherein the module for performing unsupervised learning on the context matching relationship of each sentence in the predetermined unlabeled corpus by using the language model tool to obtain the context matching model further comprises:
a sub-module for obtaining the unlabeled corpus by crawling unlabeled data from the Internet.
7. The sentence similarity measuring device according to claim 5, wherein the module for obtaining the shared context set from the unlabeled corpus, for computing, by using the context matching model, the context score of each context in the shared context set for each sentence whose similarity is to be calculated, and for forming the context score vector of each such sentence from all of the context scores further comprises:
a sub-module for acquiring, from the unlabeled corpus, the contexts related to each sentence whose similarity is to be calculated by using a term frequency-inverse document frequency (TF-IDF) algorithm.
8. A computer-readable storage medium storing computer instructions, wherein the computer instructions, when executed, perform the method for measuring sentence similarity according to any one of claims 1-4.
9. A computer device comprising a processor and a memory, the memory storing computer instructions, wherein the processor executes the computer instructions to perform the method for measuring sentence similarity according to any one of claims 1-4.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111211101.XA CN114154483A (en) | 2021-10-18 | 2021-10-18 | A method, device, medium and equipment for measuring the similarity of sentences |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111211101.XA CN114154483A (en) | 2021-10-18 | 2021-10-18 | A method, device, medium and equipment for measuring the similarity of sentences |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114154483A (en) | 2022-03-08 |
Family
ID=80462768
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111211101.XA Pending CN114154483A (en) | 2021-10-18 | 2021-10-18 | A method, device, medium and equipment for measuring the similarity of sentences |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114154483A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102314418A (en) * | 2011-10-09 | 2012-01-11 | 北京航空航天大学 | Method for comparing Chinese similarity based on context relation |
| CN106610942A (en) * | 2016-07-27 | 2017-05-03 | 四川用联信息技术有限公司 | Word semantic similarity solution method based on context window |
| US20200104367A1 (en) * | 2018-09-30 | 2020-04-02 | International Business Machines Corporation | Vector Representation Based on Context |
| WO2021159613A1 (en) * | 2020-02-14 | 2021-08-19 | 深圳壹账通智能科技有限公司 | Text semantic similarity analysis method and apparatus, and computer device |
| US20210319260A1 (en) * | 2020-04-14 | 2021-10-14 | Samsung Sds Co., Ltd. | Apparatus and method for embedding sentence feature vector |
Non-Patent Citations (1)
| Title |
|---|
| XIAOFEI SUN, YUXIAN MENG, XIANG AO, FEI WU, TIANWEI ZHANG, JIWEI LI, CHUN FAN: "Sentence Similarity Based on Contexts", Retrieved from the Internet <URL:https://arxiv.org/abs/2105.07623v1> * |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN111709233B (en) | Intelligent diagnosis guiding method and system based on multi-attention convolutional neural network | |
| US10853401B2 (en) | Method, apparatus, and computer program product for classification and tagging of textual data | |
| CN107436864B (en) | Chinese question-answer semantic similarity calculation method based on Word2Vec | |
| WO2021000676A1 (en) | Q&A method, Q&A device, computer equipment and storage medium | |
| CN112800205B (en) | Method and device for obtaining question and answer related paragraphs based on semantic change manifold analysis | |
| CN108959531B (en) | Information search method, apparatus, device and storage medium | |
| WO2021184552A1 (en) | Medical text search method and apparatus, computer device and storage medium | |
| CN109472022B (en) | New word recognition method based on machine learning and terminal equipment | |
| CN111339764A (en) | Chinese named entity recognition method and device | |
| CN114300127A (en) | Method, device, equipment and storage medium for inquiry processing | |
| WO2021179688A1 (en) | Medical literature retrieval method and apparatus, electronic device, and storage medium | |
| CN109241529B (en) | Method and device for determining viewpoint label | |
| EP3867790A1 (en) | Enhanced intent matching using keyword-based word mover's distance | |
| CN113326383B (en) | Short text entity linking method, device, computing equipment and storage medium | |
| CN111859974B (en) | A semantic disambiguation method and device combined with knowledge graph, and intelligent learning device | |
| CN108461111A (en) | Chinese medical treatment text duplicate checking method and device, electronic equipment, computer read/write memory medium | |
| WO2025044865A1 (en) | Cross-domain problem processing methods and apparatuses, electronic device and storage medium | |
| CN111476026A (en) | Statement vector determination method and device, electronic equipment and storage medium | |
| CN113128234B (en) | Method and system for establishing entity recognition model, electronic equipment and medium | |
| CN111611796A (en) | Method, device, electronic device and storage medium for determining hypernym of hyponym | |
| US12197535B2 (en) | Determining a denoised named entity recognition model and a denoised relation extraction model | |
| CN109299467A (en) | Medical text recognition method and device, sentence recognition model training method and device | |
| CN110929507B (en) | Method, device and storage medium for text information processing | |
| CN118260431A (en) | A method, device, terminal device and storage medium for calculating person-job matching | |
| CN116504389A (en) | Artificial intelligence-based inquiry dialogue evaluation method and related equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |