
CN112417003A - Method, device, equipment and storage medium for synonym mining based on network search - Google Patents

Method, device, equipment and storage medium for synonym mining based on network search

Info

Publication number
CN112417003A
Authority
CN
China
Prior art keywords
word
search
similarity
information
link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011317040.0A
Other languages
Chinese (zh)
Inventor
张月涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202011317040.0A
Publication of CN112417003A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to natural language processing and provides a method for mining synonyms based on network search. The method includes: detecting a first word and a second word to be compared for similarity, and inputting them into a preset model to obtain a first word vector and a second word vector; calculating the similarity between the first word vector and the second word vector; comparing the similarity with a preset threshold and, if the similarity is greater than the preset threshold, inputting the first word and the second word into a preset search engine to obtain first search information corresponding to the first word and second search information corresponding to the second word; and determining a similarity comparison result for the first word and the second word according to the first search information and the second search information. The invention also discloses a network search-based synonym mining device, equipment and storage medium. By using a search engine to assist synonym mining, the invention improves the accuracy of synonym mining.

Description

Network search-based synonym mining method, device, equipment and storage medium
Technical Field
The invention relates to the field of natural language processing, and in particular to a method, a device, equipment and a storage medium for mining synonyms based on network search.
Background
In the field of Natural Language Processing (NLP), data preprocessing is an important stage that includes Chinese word segmentation, synonym replacement and noise-word cleaning. Synonym replacement is of great significance for the subsequent calculation of similarity between sentences and is frequently involved in NLP data preprocessing. For example, when calculating the similarity between two sentences, two synonyms are converted into the same word by synonym replacement, which improves the reliability of the similarity calculation.
NLP teams typically mine and maintain their own synonym libraries for their business scenarios. Common synonym mining schemes on the market are roughly as follows. 1. Word vectors (word2vec): this is currently the most common synonym mining method. It calculates the similarity between word vectors and treats words with high similarity as synonyms. Its drawback is that it easily misidentifies non-synonyms as synonyms, because in word-vector methods words that appear in the same position in a sentence receive very similar vectors. 2. Ready-made synonym libraries or synonym dictionaries available on the market, combined with the business scenario, are used to mine synonyms. This method is more accurate, but the data sources are limited, so mining efficiency is low.
Disclosure of Invention
The main object of the invention is to provide a method, a device, equipment and a storage medium for mining synonyms based on network search, so as to solve the technical problems of inaccurate mining and of low mining efficiency caused by limited data sources in existing synonym mining.
To achieve the above object, the present invention provides a network search-based synonym mining method, which includes the following steps:
detecting a first word and a second word to be subjected to similarity comparison, and inputting the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word;
calculating the similarity of the first word vector and the second word vector;
comparing the similarity with a preset threshold, and if the similarity is greater than the preset threshold, inputting the first word and the second word into a preset search engine for searching to respectively obtain first search information corresponding to the first word and second search information corresponding to the second word;
and determining a similarity comparison result of the first word and the second word according to the first search information and the second search information.
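The four steps above amount to a two-stage filter: a cheap word-vector check first, with the search-engine comparison run only for pairs that pass it. The sketch below is a minimal illustration under that reading, assuming the caller supplies the "preset model" and the "preset search engine" as plain functions. The class and parameter names (`SynonymMiner`, `embed`, `search`, `compare_results`, the two thresholds) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class SynonymMiner:
    """Two-stage synonym check: a word-vector gate, then a search comparison.

    All four callables are hypothetical hooks standing in for the
    "preset model" (embed) and the "preset search engine" (search).
    """
    embed: Callable[[str], Vector]                         # word -> word vector
    vector_similarity: Callable[[Vector, Vector], float]  # step S20 scoring rule
    search: Callable[[str], list]                          # word -> search information
    compare_results: Callable[[list, list], float]         # step S40 scoring rule
    vector_threshold: float
    search_threshold: float

    def is_synonym(self, first: str, second: str) -> bool:
        # S10/S20: embed both words and score the pair of vectors.
        sim = self.vector_similarity(self.embed(first), self.embed(second))
        # S30: pairs at or below the preset threshold are rejected without searching.
        if sim <= self.vector_threshold:
            return False
        # S30/S40: search each word and compare the two sets of search information.
        score = self.compare_results(self.search(first), self.search(second))
        return score > self.search_threshold
```

Keeping the model and the search engine injectable lets the gating logic be tested with stub functions and avoids committing to any particular embedding model or search API.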
Optionally, before the step of detecting a first word and a second word to be subjected to similarity comparison, and inputting the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word, the method includes:
receiving a synonym mining instruction, and acquiring a historical library corresponding to the synonym mining instruction;
performing word segmentation on the historical library to obtain a historical word bank, and denoising the historical word bank to obtain a preprocessed word bank;
and receiving a synonym judgment instruction, and acquiring a first word and a second word corresponding to the synonym judgment instruction from the preprocessed word bank.
Optionally, calculating the similarity between the first word vector and the second word vector includes:
acquiring the included angle between the first word vector and the second word vector, and calculating the length ratio of the first word vector to the second word vector;
and determining the similarity of the first word vector and the second word vector according to the included angle and the length ratio.
Optionally, the step of inputting the first word and the second word into a preset search engine for searching to obtain first search information corresponding to the first word and second search information corresponding to the second word includes:
inputting the first word and the second word into the preset search engine, resetting the preset search engine to an initial state, and setting its search-quantity parameter to a target value;
searching the first word and the second word, and acquiring the first search information and the second search information output by the preset search engine, wherein the first search information corresponds to the first word, the second search information corresponds to the second word, and the number of search links in the first search information and the number of search links in the second search information both equal the target value.
Optionally, the step of determining a similarity comparison result of the first word and the second word according to the first search information and the second search information includes:
identifying target search links contained in both the first search information and the second search information, and determining the target number of target search links;
calculating a target ratio of the target number to the target value, taking each search link in the first search information as a first search link and each search link in the second search information as a second search link;
calculating a similarity value of first link display information and second link display information, wherein the first link display information corresponds to the first search link and the second link display information corresponds to the second search link;
and determining the similarity comparison result of the first word and the second word according to the target ratio and the similarity value.
Optionally, the step of calculating a similarity value between the first link display information and the second link display information includes:
acquiring first resource locators (for example, URLs) corresponding to the first link display information and second resource locators corresponding to the second link display information;
identifying target resource locators that appear among both the first resource locators and the second resource locators;
and determining the similarity value of the first link display information and the second link display information according to the target number and the number of target resource locators.
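One possible reading of the locator-based similarity value is sketched below. The details are assumptions, not taken from the patent: the display information is reduced to sets of URLs, URLs are normalized (lowercased host, fragment and trailing slash dropped) before comparison, and the score is the number of shared URLs divided by the target number, clipped to 1. The function names are illustrative.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def url_similarity_value(first_urls: set, second_urls: set, target_number: int) -> float:
    """Shared resource locators divided by the number of shared search links.

    first_urls / second_urls: locators taken from the first and second link
    display information. target_number: count of target (shared) search links.
    """
    if target_number <= 0:
        return 0.0
    shared = {normalize_url(u) for u in first_urls} & {normalize_url(u) for u in second_urls}
    # Clip so the value stays in [0, 1] (an assumption, not from the source).
    return min(1.0, len(shared) / target_number)
```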
Optionally, after the step of calculating a similarity value between the first link display information and the second link display information, the method includes:
determining first link classification information corresponding to the first search link and second link classification information corresponding to the second search link according to a preset link classification label;
calculating the similarity degree of the first link classification information and the second link classification information;
and the step of determining the similarity comparison result of the first word and the second word according to the target ratio and the similarity value comprises:
determining the similarity comparison result of the first word and the second word according to the target ratio, the similarity value and the similarity degree.
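The classification step and the three-way combination might look like the sketch below. It assumes each search link carries one classification label (such as "encyclopedia" or "news"), measures the similarity degree as the overlap of the two label distributions (histogram intersection), and combines the three scores with weights of 0.4/0.4/0.2. The measure, the weights and the function names are illustrative choices, not values given in the patent.

```python
from collections import Counter


def classification_similarity(first_labels: list, second_labels: list) -> float:
    """Overlap of two label distributions (histogram intersection), in [0, 1]."""
    if not first_labels or not second_labels:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared label.
    shared = sum((Counter(first_labels) & Counter(second_labels)).values())
    return shared / max(len(first_labels), len(second_labels))


def similarity_comparison_score(target_ratio: float, similarity_value: float,
                                similarity_degree: float,
                                weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted sum of the three scores; the weights are hypothetical."""
    scores = (target_ratio, similarity_value, similarity_degree)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores are expected in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))
```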
In addition, in order to achieve the above object, the present invention further provides a network search-based synonym mining device, including:
the vector output module is used for detecting a first word and a second word to be subjected to similarity comparison, and inputting the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word;
the similarity calculation module is used for calculating the similarity of the first word vector and the second word vector;
the search module is used for comparing the similarity between the first word vector and the second word vector with a preset threshold value, and if the similarity between the first word vector and the second word vector is greater than the preset threshold value, inputting the first word and the second word into a preset search engine for searching to respectively obtain first search information corresponding to the first word and second search information corresponding to the second word;
and the comparison module is used for determining a similarity comparison result of the first word and the second word according to the first search information and the second search information.
In addition, in order to achieve the above object, the present invention further provides network search-based synonym mining equipment, comprising a memory, a processor, and a network search-based synonym mining program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the network search-based synonym mining method described above.
In addition, to achieve the above object, the present invention further provides a storage medium storing a network search-based synonym mining program which, when executed by a processor, implements the steps of the network search-based synonym mining method described above.
The embodiments of the invention provide a network search-based synonym mining method, device, equipment and storage medium. In these embodiments, the first word and the second word to be compared are input into the preset model to obtain the corresponding first word vector and second word vector. When the similarity of the two word vectors is greater than the preset threshold, the first word and the second word are input into the preset search engine, which yields the first search information corresponding to the first word and the second search information corresponding to the second word. Finally, the similarity comparison result of the first word and the second word is determined according to the first search information and the second search information.
Drawings
FIG. 1 is a schematic diagram of the hardware structure of an implementation of network search-based synonym mining equipment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of the network search-based synonym mining method according to the present invention;
FIG. 3 is a flowchart of a second embodiment of the network search-based synonym mining method according to the present invention;
FIG. 4 is a functional module diagram of an embodiment of the network search-based synonym mining device according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component" or "unit" used to denote elements serve only to facilitate the explanation of the present invention and have no specific meaning in themselves. Therefore, "module", "component" and "unit" may be used interchangeably.
The network search-based synonym mining terminal in the embodiments of the invention (also referred to as a terminal, equipment or terminal equipment) may be a PC, or a mobile terminal with a display function, such as a smartphone, a tablet computer or a portable computer.
As shown in FIG. 1, the terminal may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module and the like. The sensors include, for example, light sensors, motion sensors and other sensors. Specifically, the light sensors may include an ambient light sensor, which adjusts the brightness of the display screen according to the ambient light, and a proximity sensor, which turns off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and the magnitude and direction of gravity when the mobile terminal is stationary. It can be used in applications that recognize the attitude of the mobile terminal (such as portrait/landscape switching, related games and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer and tap detection). The mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described here.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a network search-based synonym mining program.
In the terminal shown in FIG. 1, the network interface 1004 is mainly used to connect to a backend server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be configured to invoke the network search-based synonym mining program stored in the memory 1005, which, when executed by the processor, implements the operations of the network search-based synonym mining method provided by the following embodiments.
Based on the above hardware structure, embodiments of the network search-based synonym mining method are provided.
Referring to FIG. 2, in a first embodiment of the network search-based synonym mining method of the present invention, the method includes:
Step S10, detecting a first word and a second word to be subjected to similarity comparison, inputting the first word and the second word into a preset model, and obtaining a first word vector corresponding to the first word and a second word vector corresponding to the second word.
In natural language processing, data preprocessing is an important stage that includes Chinese word segmentation, synonym replacement and noise-word cleaning. Synonym replacement plays a significant role in calculating the similarity between two sentences: two synonyms are converted into the same word, which improves the reliability of the sentence-similarity calculation. In this embodiment, similarity comparison refers to the process of determining whether two words (the first word and the second word) are synonyms; two words determined to be synonyms are stored in a synonym library for use in synonym replacement during natural language preprocessing. Deciding whether two words are synonyms is easy for a human but difficult for a computer based on logical operations. The first word and the second word in this embodiment are the words whose synonymy is to be determined. The preset model is a word vector generation model: a computer can judge whether two words are synonyms from the similarity of their features, so the preset model extracts the features of a word and generates a corresponding vector that can be compared directly. The first word vector is the vector the preset model generates for the first word, and the second word vector is the vector it generates for the second word. Specifically, the features of a word include its position in the original sentence and its part of speech in the original sentence (noun, adjective, verb, etc.); from these features a vector is determined whose direction and length can be compared.
Step S20, calculating a similarity between the first word vector and the second word vector.
The first word vector and the second word vector have the general properties of vectors, namely magnitude and direction. The magnitude of a vector can be measured by its length, and the included angle between the two vectors is obtained by placing their starting points at the same point. Two vectors are identical when they have the same direction and the same length; if their lengths are equal, their length ratio is 1. Therefore, the closer the length ratio of the two vectors is to 1, and the smaller the angle between them, the higher their similarity. The similarity of the first word vector and the second word vector thus depends on the length ratio and the included angle, and is calculated by weighting these two quantities. For example, with a weight of 0.3 for the length ratio and 0.7 for the angle, a length ratio of 0.8 and an angle of 30 degrees (1/6 of a straight angle), the similarity of the first word vector and the second word vector is 0.3 × 0.8 + 0.7 × (1 - 1/6) = 0.24 + 0.583 ≈ 0.82.
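The weighted score in the example above can be reproduced with a short Python sketch. It assumes the included angle comes from the cosine formula and that the length ratio is taken as shorter length over longer length so that it lies in (0, 1]; the text does not specify either detail. The 0.3/0.7 weights are the example's, and the function names are illustrative.

```python
import math
from typing import Sequence


def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Included angle in radians, from cos(theta) = u.v / (|u||v|)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        raise ValueError("a zero vector has no direction")
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def length_ratio(u: Sequence[float], v: Sequence[float]) -> float:
    """Shorter length over longer length, so identical lengths give 1."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    return min(nu, nv) / max(nu, nv)


def weighted_similarity(ratio: float, angle: float,
                        w_ratio: float = 0.3, w_angle: float = 0.7) -> float:
    """w_ratio * ratio + w_angle * (1 - angle / pi), as in the worked example."""
    return w_ratio * ratio + w_angle * (1 - angle / math.pi)


def vector_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    return weighted_similarity(length_ratio(u, v), angle_between(u, v))
```

For the example values, `weighted_similarity(0.8, math.pi / 6)` evaluates to about 0.823, which rounds to the 0.82 given in the text.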
Step S30, comparing the similarity with a preset threshold, and if the similarity is greater than the preset threshold, inputting the first word and the second word into a preset search engine for searching to obtain first search information corresponding to the first word and second search information corresponding to the second word, respectively.
The similarity between the first word vector and the second word vector is determined comprehensively by obtaining and comparing the included angle and the lengths of the two word vectors. For example, the included angle may be mapped to different weights in different value intervals, with smaller angles receiving larger weights; likewise, the length ratio may be mapped to different weights in different value intervals, with ratios closer to 1 receiving larger weights. The weights are added to obtain a composite weight that represents the similarity, and the composite weight is compared with the preset threshold, which is set empirically. If the similarity is less than or equal to the preset threshold, the first word and the second word are directly determined not to be synonyms. If the similarity is greater than the preset threshold, the first word and the second word are preliminarily determined to be synonyms; this preliminary determination means there is a high likelihood that they are synonyms.
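The interval-based variant described above can be sketched as two lookup tables. The interval bounds, weights and threshold below are hypothetical placeholders, since the text only states that the threshold is set empirically; the length ratio is again assumed to be shorter over longer, in [0, 1].

```python
# Hypothetical interval tables of (inclusive upper bound, weight).
# Smaller angles and length ratios closer to 1 earn larger weights.
ANGLE_WEIGHTS_DEG = [(15.0, 0.6), (30.0, 0.45), (60.0, 0.25), (180.0, 0.0)]
RATIO_WEIGHTS = [(0.5, 0.0), (0.8, 0.2), (1.0, 0.4)]


def interval_weight(value: float, table: list) -> float:
    """Return the weight of the first interval whose upper bound covers value."""
    for upper, weight in table:
        if value <= upper:
            return weight
    raise ValueError(f"{value} is outside every interval")


def composite_weight(angle_deg: float, ratio: float) -> float:
    """Sum of the angle weight and the length-ratio weight."""
    return interval_weight(angle_deg, ANGLE_WEIGHTS_DEG) + interval_weight(ratio, RATIO_WEIGHTS)


def passes_gate(angle_deg: float, ratio: float, threshold: float = 0.6) -> bool:
    """True means the pair is preliminarily treated as synonyms and sent to search."""
    return composite_weight(angle_deg, ratio) > threshold
```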
Further, the present scheme uses a search engine to judge again whether the first word and the second word are synonyms, which improves the accuracy of synonym judgment. Specifically, the first word and the second word are each input into the preset search engine, yielding search information that contains a number of search links: the first search information is obtained by searching the first word, and the second search information is obtained by searching the second word. Each search link has corresponding link display information, and a search link may be the same for both words while its link display information differs. For example, when two words that both mean "understanding" are searched, the search results of both words may include a link to the same encyclopedia website describing each word. That search link is therefore shared by the two words, but the encyclopedia's description of each word (that is, the link display information) is different: the search link is the same while the link display information differs.
Step S40, determining a similarity comparison result of the first word and the second word according to the first search information and the second search information.
Specifically, in this embodiment, the first search information and the second search information are compared to obtain a comparison result. The comparison result may be a numerical value that measures the degree of similarity, or an absolute conclusion such as yes or no. If the result is a similarity value, the similarity comparison result of the two words is determined from it. One way to determine the similarity value is as follows. The search information includes search links and link display information, and each search link has its own link display information. To simplify the calculation, the first search information and the second search information contain the same number of search links (the number of links returned can be set manually). The number of identical search links in the first search information and the second search information is counted, and the link display information output by each identical search link is then compared. For example, take the first word "Lu Xun" and the second word "Zhou Shuren": searching each word returns 100 search links, of which 80 are identical, so the identical links account for 80% of all search links, and this 80% corresponds to one weight. Next, the link display contents of each identical search link are compared to obtain a per-link similarity, and the average similarity is calculated; if the average similarity is 65%, it corresponds to a second weight. The similarity of the first search information and the second search information is determined by summing the two weights, and if this similarity is greater than a threshold, it is determined that the first word and the second word are synonyms.
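The link-overlap comparison in the "Lu Xun" / "Zhou Shuren" example can be sketched as follows. Assumptions not taken from the patent: each search information is modeled as a mapping from search link to display text, per-link display similarity uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in text measure, and "summing the two weights" is read as an equally weighted sum. The function names are illustrative.

```python
from difflib import SequenceMatcher


def overlap_ratio(first_links, second_links, target_value: int) -> float:
    """Share of the target_value returned links that appear in both result lists."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return len(set(first_links) & set(second_links)) / target_value


def average_display_similarity(first: dict, second: dict) -> float:
    """Mean text similarity of the display information of each shared link."""
    shared = first.keys() & second.keys()
    if not shared:
        return 0.0
    return sum(SequenceMatcher(None, first[link], second[link]).ratio()
               for link in shared) / len(shared)


def combine(overlap: float, display: float,
            w_overlap: float = 0.5, w_display: float = 0.5) -> float:
    """Weighted sum of the two scores; equal weights are an assumption."""
    return w_overlap * overlap + w_display * display
```

With equal weights, the example's 80% overlap and 65% average display similarity combine to 0.725, which would then be compared with the threshold.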
Specifically, the steps before step S10 include:
Step a1, receiving a synonym mining instruction, and acquiring a historical library corresponding to the synonym mining instruction.
Step a2, performing word segmentation on the historical library to obtain a historical word bank, and denoising the historical word bank to obtain a preprocessed word bank.
Step a3, receiving a synonym judgment instruction, and acquiring a first word and a second word corresponding to the synonym judgment instruction from the preprocessed word bank.
The historical text library in this embodiment is the corpus used for synonym mining, for example conversations between clients and customer service. By collecting such text, this solution makes the finally mined synonym library better match the actual application scenario, and also makes sentence similarity calculation more reliable where synonym replacement is needed. Specifically, the historical text library contains text entered by clients and customer service staff in the form of sentences, so these sentences must be segmented before synonym mining. Word segmentation in this solution divides a sentence into a number of words according to their literal meaning; for example, the sentence "The weather is good today" is segmented into "today", "weather", "is", and "good". There are many ways to segment a sentence; to separate words accurately, this solution may combine several methods and train them to obtain the best segmentation result.
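The description does not say which segmentation methods are combined. Forward maximum matching is one common dictionary-based method, sketched here only as an illustration; the toy dictionary and the Chinese sentence 今天天气很好 ("the weather is very good today") are hypothetical inputs.

```python
from __future__ import annotations


def forward_max_match(sentence: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation.

    At each position, take the longest dictionary word that starts there;
    if nothing matches, emit a single character and move on.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


toy_dict = {"今天", "天气", "很", "好"}  # hypothetical dictionary
print(forward_max_match("今天天气很好", toy_dict))  # ['今天', '天气', '很', '好']
```

Greedy matching is fast but can mis-split ambiguous strings, which is one reason a real system might combine it with statistical or trained segmenters as the paragraph suggests.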
Manually entered text may contain many tokens that are useless for synonym mining, such as punctuation marks, foreign-language text, and sensitive words (abusive or illegal words). This solution cleans these tokens out; this is the denoising of the historical lexicon in this embodiment. After denoising, a preprocessed lexicon containing many words is obtained. Each word in the preprocessed lexicon is bound to its position in the original sentence and to its part of speech there, so that word vectors can be obtained later. The first word and the second word to be judged may be specified manually or selected automatically by a program, and they may come from the preprocessed lexicon or from outside it.
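A minimal sketch of the denoising step, assuming each sentence arrives already segmented and tagged as `(word, part_of_speech)` pairs. The blocklist, the rule that any Latin letter marks foreign-language text, and the `LexiconEntry` type are all hypothetical choices for illustration.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    word: str
    position: int  # index of the token in its original sentence
    pos_tag: str   # part of speech in the original sentence


# Hypothetical blocklist of sensitive (abusive or illegal) words.
SENSITIVE_WORDS = {"笨蛋"}


def is_noise(word: str) -> bool:
    if all(unicodedata.category(ch).startswith("P") for ch in word):
        return True  # empty or punctuation-only token
    if any(ch.isascii() and ch.isalpha() for ch in word):
        return True  # Latin letters treated as foreign-language text (assumption)
    return word in SENSITIVE_WORDS


def denoise(tagged_sentence: list[tuple[str, str]]) -> list[LexiconEntry]:
    """Drop noise tokens, keeping each survivor's position and part of speech."""
    return [LexiconEntry(word, i, tag)
            for i, (word, tag) in enumerate(tagged_sentence)
            if not is_noise(word)]
```

Because positions are taken before filtering, each kept word still points at its original slot in the sentence, which is what the later word-vector step relies on.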
Specifically, step S20 is refined into the following steps:
Step b1: obtain the angle between the first word vector and the second word vector, and calculate the length ratio of the first word vector to the second word vector.
Step b2: determine the similarity of the first word vector and the second word vector according to the vector angle and the length ratio.
The first word vector and the second word vector have the general properties of vectors, namely a magnitude and a direction. The magnitude of a vector is measured by its length, and the angle between two vectors is obtained by placing their starting points at the same point. Two vectors are identical when they have the same direction and the same length; if their lengths are equal, their length ratio is 1. Hence, the closer the length ratio of the two vectors is to 1, the higher their similarity, and the smaller the angle between them, the higher their similarity. In short, the similarity of the first word vector and the second word vector depends on their length ratio and the size of their angle, and it is calculated by weighting the length ratio and the angle separately. For example, if the weight of the length ratio is 0.3, the weight of the angle is 0.7, the length ratio is 0.8, and the angle is 30 degrees (i.e., 1/6 of a half circle), the similarity of the first word vector and the second word vector is 0.3 × 0.8 + (0.7 − 0.7 × 1/6) ≈ 0.82.
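The weighted formula in the example above can be sketched as follows. Two points are assumptions inferred from the worked example rather than stated rules: the length ratio is taken as shorter/longer so it stays in (0, 1], and the angle term is w_angle × (1 − θ/π), which reproduces 0.7 − 0.7 × 1/6 at 30 degrees.

```python
import math


def vector_similarity(a, b, w_ratio=0.3, w_angle=0.7):
    """Similarity from length ratio and angle, weighted as in the example."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length word vector")
    # Length ratio taken as shorter/longer so it lies in (0, 1] (assumption).
    ratio = min(norm_a, norm_b) / max(norm_a, norm_b)
    cos_theta = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
    # Angle term: w_angle * (1 - theta / pi), i.e. 0.7 - 0.7 * 1/6 at 30 degrees.
    return w_ratio * ratio + w_angle * (1 - theta / math.pi)


a = [0.8, 0.0]                                        # length 0.8
b = [math.cos(math.pi / 6), math.sin(math.pi / 6)]    # length 1, 30 degrees from a
print(round(vector_similarity(a, b), 2))  # 0.82
```

Unlike plain cosine similarity, this score also penalizes vectors that point the same way but differ in length, which is the property the paragraph argues for.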
In this embodiment, the first word and the second word whose approximation degree is to be compared are input into the preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word. When the similarity of the first word vector and the second word vector is greater than a preset threshold, the first word and the second word are input into the preset search engine, yielding first search information corresponding to the first word and second search information corresponding to the second word. Finally, the approximation comparison result of the first word and the second word is determined from the first search information and the second search information.
Further, referring to fig. 3, a second embodiment of the network-search-based synonym mining method of the present invention is provided on the basis of the above embodiment.
This embodiment refines step S30 of the first embodiment, and differs from the above embodiment as follows:
Step S31: input the first word and the second word into a preset search engine, reset the preset search engine to its initial state, and set the search quantity parameter of the preset search engine to a target value.
Step S32: search for the first word and the second word, and acquire the first search information and the second search information output by the preset search engine, where the first search information corresponds to the first word, the second search information corresponds to the second word, and the number of search links in each of them equals the target value.
The preset search engine in this embodiment is a system that collects information from the Internet according to a certain policy, organizes and processes the collected information, provides a search service to users, and displays the retrieved information to them. The first word and the second word are input into the preset search engine, and its search quantity parameter is set to a specific value (the target value in this embodiment). The search quantity parameter is the number of related results that the preset search engine retrieves at one time for the user's input; each retrieved result exists in the form of a link, so the parameter equals the number of links returned in one search. The search quantity parameter of the preset search engine is adjustable. For example, if it is set to 100 and the first word and the second word are each input into the preset search engine, the first search information contains 100 links and the second search information also contains 100 links. Making the two link counts equal makes the first search information and the second search information easier to compare. In addition, a search engine may adjust its output according to the user's search habits, so in this solution the preset search engine is initialized before the first word or the second word is input. The purpose of the initialization is to clear previously stored user search records, so that the output search information is not affected by any factor other than the input (i.e., the first word and the second word in this embodiment).
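Steps S31 and S32 can be illustrated with a hypothetical in-memory stand-in for the preset search engine; no real search API is implied. `PresetSearchEngine`, its `index` of ranked link end points, and the `reset` / `set_num_results` / `search` methods are all invented names for this sketch. The engine is reset before each query, following the rule that it is initialized before the first word or the second word is input.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PresetSearchEngine:
    """Hypothetical in-memory stand-in for the preset search engine.

    index maps a query to its ranked list of link end points.
    """
    index: dict[str, list[str]]
    history: list[str] = field(default_factory=list)
    num_results: int = 10

    def reset(self) -> None:
        # Initial state: clear stored search records (a real engine may
        # personalize its ranking from them).
        self.history.clear()

    def set_num_results(self, target_value: int) -> None:
        # Search quantity parameter: links returned per search.
        self.num_results = target_value

    def search(self, query: str) -> list[str]:
        self.history.append(query)
        return self.index.get(query, [])[: self.num_results]


def fetch_search_info(engine: PresetSearchEngine, first_word: str,
                      second_word: str, target_value: int) -> tuple[list[str], list[str]]:
    """Steps S31-S32: fix the result count and reset the engine before each query."""
    engine.set_num_results(target_value)
    engine.reset()
    first_info = engine.search(first_word)
    engine.reset()
    second_info = engine.search(second_word)
    return first_info, second_info
```

Fixing the result count on the engine, rather than truncating afterwards, keeps both result lists the same length, which the later target-ratio step divides by.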
Specifically, step S40 is refined into the following steps:
Step c1: screen out the target search links contained in both the first search information and the second search information, and determine the target number of target search links.
Step c2: calculate the target ratio of the target number to the target value, take the search links in the first search information as first search links, and take the search links in the second search information as second search links.
Step c3: calculate a similarity value between first link display information and second link display information, where the first link display information corresponds to the first search links and the second link display information corresponds to the second search links.
Step c4: determine the approximation comparison result of the first word and the second word according to the target ratio and the similarity value.
Note that a target search link in this embodiment is a search link present in both the first search information and the second search information. By definition, a link is a connection from one target (hereinafter target one) to another target (hereinafter target two). The targets may be web pages, pictures, text, and so on, but whatever they are, each has an exact address, so a link is a connection from one address (hereinafter the starting point) to another address (hereinafter the end point). In this embodiment, the first search information and the second search information are both output by the same preset search engine, so all search links in them share the same starting point. If the end point of search link a in the first search information is the same as the end point of search link b in the second search information, search link a and search link b are target search links in this embodiment.
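Because all links share the engine as their starting point, steps c1 and c2 reduce to comparing end points. The sketch below represents each search link by its end-point string alone, a hypothetical simplification, and uses the 30-of-100 example from the next paragraph as a check.

```python
from __future__ import annotations


def screen_target_links(first_links: list[str], second_links: list[str]) -> list[str]:
    """Step c1: end points that occur in both result lists, in first-list order.

    Links are represented only by their end point (hypothetical simplification):
    all links come from the same engine, so their starting points coincide.
    """
    second_endpoints = set(second_links)
    seen: set[str] = set()
    target_links = []
    for endpoint in first_links:
        if endpoint in second_endpoints and endpoint not in seen:
            seen.add(endpoint)
            target_links.append(endpoint)
    return target_links


def target_ratio(first_links: list[str], second_links: list[str], target_value: int) -> float:
    """Step c2: target number divided by the target value (links per search)."""
    return len(screen_target_links(first_links, second_links)) / target_value


first = [f"https://example.com/{i}" for i in range(100)]
second = [f"https://example.com/{i}" for i in range(70, 170)]
print(target_ratio(first, second, 100))  # 0.3
```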
After the target search links are screened out of the first search information and the second search information, their number (the target number in this embodiment) can be determined. As described above, the target value is the number of links returned by the preset search engine in one search, i.e., the number of first search links and the number of second search links in this embodiment. For example, if the target value is 100, there are 100 first search links and 100 second search links, and if there are 30 target search links, the target ratio in this embodiment is 0.3. The link display information in this embodiment is the information displayed at target two; the first link display information corresponds to the first search links, and the second link display information corresponds to the second search links. Each piece of link display information corresponds to one URL (Uniform Resource Locator), which can be understood as the unique identifier of the information stored at an end point. For example, suppose search link c and search link d have the same starting point and end point, the end point of search link c stores information e, and the end point of search link d stores information f. If the URL of information e is the same as the URL of information f, search link c and search link d are completely identical; if the URLs differ, search links c and d are only partially identical. By obtaining the URL of each piece of link display information and checking whether the URL of the first link display information is the same as that of the second link display information, the identical link display information in the first link display information and the second link display information can be determined. The similarity value between the first link display information and the second link display information is then determined from the number of identical pieces of link display information: the larger the proportion of identical link display information, the greater the similarity value. For example, if the target value is 200, the target number is 50, and 2 pieces of link display information are identical in the first and second link display information, the similarity value is 4% (2/50), the target ratio is 25% (50/200), and the approximation degree of the first word and the second word is 81%. In this embodiment the approximation degree is determined by combining the target ratio and the similarity value: with the target ratio fixed, the larger the similarity value, the greater the approximation degree, and with the similarity value fixed, the larger the target ratio, the greater the approximation degree.
Specifically, step c3 is refined into the following steps:
Step d1: acquire the first resource locators corresponding to the first link display information and the second resource locators corresponding to the second link display information.
Step d2: screen out the target resource locators contained in both the first resource locators and the second resource locators.
Step d3: determine the similarity value of the first link display information and the second link display information according to the target number and the number of target resource locators.
Note that if a first resource locator is the same as a second resource locator, the link display information corresponding to the first resource locator is the same as the link display information corresponding to the second resource locator. After the first resource locators corresponding to the first link display information and the second resource locators corresponding to the second link display information are acquired, the target resource locators contained in both are screened out, and finally the similarity value of the first link display information and the second link display information is determined according to the target number and the number of target resource locators. For example, if the target value is 500, the target number is 80, and 2 pieces of link display information are identical in the first link display information and the second link display information, the similarity value is 2.5% (2/80).
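Steps d1 to d3 amount to a set intersection of URLs divided by the target number. In this sketch each piece of link display information is assumed, hypothetically, to be a dict with a `"url"` key; the checks reproduce the 2/80 = 2.5% example above and the 2/50 = 4% example given earlier.

```python
from __future__ import annotations


def resource_locators(display_infos: list[dict[str, str]]) -> set[str]:
    """Step d1: collect the URL of each piece of link display information.

    Each item is assumed (hypothetically) to be a dict with a "url" key.
    """
    return {info["url"] for info in display_infos}


def similarity_value(first_infos: list[dict[str, str]],
                     second_infos: list[dict[str, str]],
                     target_number: int) -> float:
    """Steps d2-d3: count URLs present on both sides, divide by the target number."""
    if target_number == 0:
        return 0.0
    target_locators = resource_locators(first_infos) & resource_locators(second_infos)
    return len(target_locators) / target_number


first = [{"url": "u1"}, {"url": "u2"}, {"url": "u3"}]
second = [{"url": "u2"}, {"url": "u3"}, {"url": "u9"}]
print(similarity_value(first, second, 80))  # 0.025
```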
Specifically, after step c3, the method includes the following steps:
Step e1: determine, according to preset link classification labels, first link classification information corresponding to the first search links and second link classification information corresponding to the second search links.
Step e2: calculate the similarity degree between the first link classification information and the second link classification information.
Specifically, step c4 is refined into the following step:
Step e3: determine the approximation comparison result of the first word and the second word according to the target ratio, the similarity value, and the similarity degree.
Besides the target search links contained in both the first search information and the second search information, the search links that differ between them also have reference value. In this solution, before the near-synonym comparison, search links may be classified in advance to obtain link classification labels, which may be organized into multiple levels. For example, first-level labels may include text, video, and pictures, and second-level labels under text may include books, documents, and users' public logs. The first search links and the second search links are classified according to the preset link classification labels to obtain the first link classification information and the second link classification information. Link classification information describes how all search links are classified, including the number of search links under each classification label at each level. The similarity degree is then calculated from the first link classification information and the second link classification information. Specifically, the difference between the numbers of first search links and second search links under the same classification label may be calculated, and the average of these differences over all classification labels is then taken: the smaller the average, the greater the similarity degree between the first link classification information and the second link classification information, and the larger the average, the smaller the similarity degree. Finally, the approximation degree of the first word and the second word can be determined from the target ratio, the similarity value, and the similarity degree, as described in the above embodiment.
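Steps e1 to e3 can be sketched as below. The description fixes only the direction of the relationship (a smaller mean count difference means a larger similarity degree), so the mapping 1 / (1 + mean difference), the path-style label strings such as `"text/books"`, and the (0.4, 0.3, 0.3) weights in the final weighted sum are all hypothetical choices.

```python
from __future__ import annotations

from collections import Counter


def classification_similarity(first_labels: list[str], second_labels: list[str]) -> float:
    """Similarity degree of two link-classification profiles (steps e1-e2).

    Each list holds one preset classification label per search link, e.g.
    "text/books" or "video" (hypothetical label format).
    """
    first_counts, second_counts = Counter(first_labels), Counter(second_labels)
    labels = first_counts.keys() | second_counts.keys()
    if not labels:
        return 1.0
    # Mean absolute difference in link counts per label (Counter returns 0 if absent).
    mean_diff = sum(abs(first_counts[label] - second_counts[label])
                    for label in labels) / len(labels)
    # Hypothetical mapping: smaller mean difference -> larger similarity degree.
    return 1.0 / (1.0 + mean_diff)


def combined_approximation(target_ratio: float, similarity_value: float,
                           similarity_degree: float,
                           weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Step e3 as a weighted sum; the weights are hypothetical."""
    w_ratio, w_value, w_degree = weights
    return w_ratio * target_ratio + w_value * similarity_value + w_degree * similarity_degree
```

For example, with first labels `["text/books"] * 3 + ["video"] * 2` and second labels `["text/books"] * 2 + ["video"] * 2 + ["picture"]`, the per-label differences are 1, 0 and 1, the mean is 2/3, and the similarity degree is 0.6.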
In this embodiment, the search links are classified with preset link classification labels, and the approximation comparison result of the first word and the second word is then determined from both the similarity of the classification information and the search information, which improves the precision of synonym mining.
In addition, referring to fig. 4, an embodiment of the present invention further provides a network search-based synonym mining device, where the network search-based synonym mining device includes:
the vector output module 10, configured to detect a first word and a second word whose approximation degree is to be compared, and input the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word;
the similarity calculation module 20, configured to calculate the similarity between the first word vector and the second word vector;
the search module 30, configured to compare the similarity between the first word vector and the second word vector with a preset threshold, and, if the similarity is greater than the preset threshold, input the first word and the second word into a preset search engine for searching to obtain first search information corresponding to the first word and second search information corresponding to the second word;
and the approximation calculation module 40, configured to determine the approximation comparison result of the first word and the second word according to the first search information and the second search information.
Optionally, the network-search-based synonym mining device further includes:
a mining instruction receiving module, configured to receive a synonym mining instruction and acquire the historical text library corresponding to the synonym mining instruction;
a word segmentation and denoising module, configured to perform word segmentation on the historical text library to obtain a historical lexicon, and denoise the historical lexicon to obtain a preprocessed lexicon;
and a synonym judgment instruction receiving module, configured to receive a synonym judgment instruction and acquire the first word and the second word corresponding to the synonym judgment instruction from the preprocessed lexicon.
Optionally, the similarity calculation module 20 includes:
a length ratio calculation unit, configured to obtain the angle between the first word vector and the second word vector and calculate the length ratio of the first word vector to the second word vector;
and a similarity determining unit, configured to determine the similarity of the first word vector and the second word vector according to the vector angle and the length ratio.
Optionally, the search module 30 includes:
a parameter adjusting unit, configured to input the first word and the second word into the preset search engine, reset the preset search engine to its initial state, and set the search quantity parameter of the preset search engine to a target value;
the search information acquisition unit is configured to search the first word and the second word and acquire first search information and second search information output by the preset search engine, where the first search information corresponds to the first word, the second search information corresponds to the second word, and the number of search links in the first search information and the number of search links in the second search information are both the target numerical value.
Optionally, the approximation calculation module 40 includes:
the first screening unit is used for screening target search links contained in the first search information and the second search information and determining the target number of the target search links;
a target ratio calculation unit, configured to calculate a target ratio between the target number and the target value, use a search link in the first search information as a first search link, and use a search link in the second search information as a second search link;
a similarity value calculation unit configured to calculate a similarity value between first link display information and second link display information, where the first link display information corresponds to the first search link and the second link display information corresponds to the second search link;
and a first comparison result determining unit, configured to determine the approximation comparison result of the first word and the second word according to the target ratio and the similarity value.
Optionally, the similarity value calculating unit includes:
a resource locator acquiring unit, configured to acquire a first resource locator corresponding to the first link display information, and a second resource locator corresponding to the second link display information;
a second screening unit, configured to screen out target resource locators included in both the first resource locator and the second resource locator;
and a similarity value determining unit, configured to determine the similarity value of the first link display information and the second link display information according to the target number and the number of target resource locators.
Optionally, the similarity value calculating unit includes:
the classified information determining unit is used for determining first link classified information corresponding to the first search link and second link classified information corresponding to the second search link according to a preset link classified label;
a similarity degree calculation unit configured to calculate a similarity degree between the first link classification information and the second link classification information;
the first comparison result determination unit includes:
and the second comparison result determining unit is used for determining the similarity comparison result of the first word and the second word according to the target ratio, the similarity value and the similarity degree.
In addition, an embodiment of the present invention further provides a storage medium storing a network-search-based synonym mining program which, when executed by a processor, implements the operations of the network-search-based synonym mining method provided in the above embodiments.
The method executed by each program module can refer to each embodiment of the method of the present invention, and is not described herein again.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity/action/object from another entity/action/object without necessarily requiring or implying any actual such relationship or order between such entities/actions/objects; the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
For the apparatus embodiment, since it is substantially similar to the method embodiment, it is described relatively simply, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described apparatus embodiments are merely illustrative, in that elements described as separate components may or may not be physically separate. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method for mining synonyms based on network search according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A synonym mining method based on network search, characterized in that the method comprises the following steps: detecting a first word and a second word whose approximation degree is to be compared, and inputting the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word; calculating the similarity between the first word vector and the second word vector; comparing the similarity with a preset threshold, and if the similarity is greater than the preset threshold, inputting the first word and the second word into a preset search engine for searching to obtain first search information corresponding to the first word and second search information corresponding to the second word, respectively; and determining an approximation comparison result of the first word and the second word according to the first search information and the second search information.

2. The synonym mining method based on network search according to claim 1, characterized in that, before the step of detecting a first word and a second word whose approximation degree is to be compared and inputting the first word and the second word into a preset model to obtain a first word vector corresponding to the first word and a second word vector corresponding to the second word, the method comprises: receiving a synonym mining instruction, and acquiring a historical text library corresponding to the synonym mining instruction; performing word segmentation on the historical text library to obtain a historical lexicon, and denoising the historical lexicon to obtain a preprocessed lexicon; and receiving a synonym judgment instruction, and acquiring a first word and a second word corresponding to the synonym judgment instruction from the preprocessed lexicon.

3. The synonym mining method based on network search according to claim 1, characterized in that calculating the similarity between the first word vector and the second word vector comprises: obtaining the vector angle between the first word vector and the second word vector, and calculating the length ratio of the first word vector to the second word vector; and determining the similarity between the first word vector and the second word vector according to the vector angle and the length ratio.

4. The synonym mining method based on network search according to claim 1, characterized in that the step of inputting the first word and the second word into a preset search engine for searching to obtain first search information corresponding to the first word and second search information corresponding to the second word, respectively, comprises: inputting the first word and the second word into the preset search engine, adjusting the preset search engine to an initial state, and adjusting the search quantity parameter of the preset search engine to a target value; and searching for the first word and the second word, and acquiring first search information and second search information output by the preset search engine, wherein the first search information corresponds to the first word, the second search information corresponds to the second word, and the number of search links in the first search information and the number of search links in the second search information are both the target value.

5. The synonym mining method based on network search according to claim 4, characterized in that the step of determining the approximation comparison result of the first word and the second word according to the first search information and the second search information comprises: screening out target search links contained in both the first search information and the second search information, and determining the target number of the target search links; calculating the target ratio of the target number to the target value, taking the search links in the first search information as first search links, and taking the search links in the second search information as second search links; calculating a similarity value between first link display information and second link display information, wherein the first link display information corresponds to the first search links and the second link display information corresponds to the second search links; and determining the approximation comparison result of the first word and the second word according to the target ratio and the similarity value.

6. The synonym mining method based on network search according to claim 5, characterized in that the step of calculating the similarity value between the first link display information and the second link display information comprises: acquiring first resource locators corresponding to the first link display information and second resource locators corresponding to the second link display information; screening out target resource locators contained in both the first resource locators and the second resource locators; and determining the similarity value of the first link display information and the second link display information according to the target number and the number of the target resource locators.

7. The synonym mining method based on network search according to claim 5, characterized in that, after the step of calculating the similarity value between the first link display information and the second link display information, the method comprises: determining, according to preset link classification labels, first link classification information corresponding to the first search links and second link classification information corresponding to the second search links; and calculating the similarity degree between the first link classification information and the second link classification information; wherein determining the approximation comparison result of the first word and the second word according to the target ratio and the similarity value comprises: determining the approximation comparison result of the first word and the second word according to the target ratio, the similarity value, and the similarity degree.
first word and the second word is determined. 8.一种基于网络搜索的近义词挖掘装置,其特征在于,所述基于网络搜索的近义词挖掘装置包括:8. A synonym mining device based on network search, wherein the synonym mining device based on network search comprises: 向量输出模块,用于检测待进行近似度对比的第一词语与第二词语,将所述第一词语与所述第二词语输入预设模型,得到所述第一词语对应的第一词向量,以及所述第二词语对应的第二词向量;The vector output module is used to detect the first word and the second word to be compared with the similarity degree, input the first word and the second word into the preset model, and obtain the first word vector corresponding to the first word , and the second word vector corresponding to the second word; 相似度计算模块,用于计算所述第一词向量和所述第二词向量的相似度;a similarity calculation module for calculating the similarity between the first word vector and the second word vector; 搜索模块,用于比较所述第一词向量与所述第二词向量的相似度与预设阈值,若所述第一词向量与所述第二词向量的相似度大于所述预设阈值,则将所述第一词语和所述第二词语输入预设搜索引擎进行搜索,分别得到所述第一词语对应的第一搜索信息,以及所述第二词语对应的第二搜索信息;A search module for comparing the similarity between the first word vector and the second word vector with a preset threshold, if the similarity between the first word vector and the second word vector is greater than the preset threshold , then the first word and the second word are input into a preset search engine for searching, and the first search information corresponding to the first word and the second search information corresponding to the second word are obtained respectively; 近似度计算模块,用于根据所述第一搜索信息与所述第二搜索信息,确定所述第一词语与所述第二词语的近似度对比结果。A similarity calculation module, configured to determine a similarity comparison result between the first word and the second word according to the first search information and the second search information. 9.一种基于网络搜索的近义词挖掘设备,其特征在于,所述基于网络搜索的近义词挖掘设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的基于网络搜索的近义词挖掘程序,所述基于网络搜索的近义词挖掘程序被所述处理器执行时实现如权利要求1至7中任一项所述的基于网络搜索的近义词挖掘方法的步骤。9. 
A synonym mining device based on network search, characterized in that the synonym mining device based on network search comprises: a memory, a processor, and a network-based network-based device that is stored on the memory and can run on the processor. A search-based synonym mining program, which implements the steps of the web search-based synonym mining method according to any one of claims 1 to 7 when executed by the processor. 10.一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有基于网络搜索的近义词挖掘程序,所述基于网络搜索的近义词挖掘程序被处理器执行时实现如权利要求1至7中任一项所述的基于网络搜索的近义词挖掘方法的步骤。10. A computer-readable storage medium, wherein the computer-readable storage medium stores a synonym mining program based on network search, and the synonym mining program based on network search is executed by a processor to achieve the method as claimed in the claims. Steps of the method for mining synonyms based on web search described in any one of 1 to 7.
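The claims name the quantities being compared but leave every formula open. The Python sketch below shows one hypothetical way to connect claims 3 and 5 to 8. Everything here is an assumption and does not come from the patent: the names `SearchResult`, `vector_similarity`, `compare_results` and `mine_pair`; the combination "cosine of the angle times the shorter/longer length ratio" for claim 3; reading each link's host as its "resource locator identifier" for claim 6; the label-histogram overlap for claim 7; and the weights 0.5/0.3/0.2 and threshold 0.7. The `embed` and `search` callables stand in for the unnamed preset model and preset search engine.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SearchResult:
    url: str    # the search link returned by the (hypothetical) search engine
    label: str  # hypothetical preset link classification label, e.g. "wiki"


def vector_similarity(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Claim 3 sketch: combine the vector angle with the length ratio.

    ASSUMPTION: cos(angle) is scaled by shorter/longer length, so identical
    vectors score 1.0 and orthogonal vectors score 0.0.
    """
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return cos_angle * min(n1, n2) / max(n1, n2)


def compare_results(first: Sequence[SearchResult], second: Sequence[SearchResult],
                    target_value: int,
                    weights: Sequence[float] = (0.5, 0.3, 0.2)) -> float:
    """Claims 5-7 sketch: score two result lists of length target_value.

    ASSUMPTION: the weights and each sub-score formula are illustrative only.
    """
    # Claim 5: links present in both lists, as a share of the target value.
    target_number = len({r.url for r in first} & {r.url for r in second})
    target_ratio = target_number / target_value
    # Claim 6 (assumed reading): each link's host is its resource locator;
    # average the exact-link overlap with the host overlap.
    hosts1 = {urlsplit(r.url).netloc for r in first}
    hosts2 = {urlsplit(r.url).netloc for r in second}
    similarity_value = (target_number + len(hosts1 & hosts2)) / (2 * target_value)
    # Claim 7: overlap of the classification-label histograms (multiset minimum).
    labels1 = Counter(r.label for r in first)
    labels2 = Counter(r.label for r in second)
    degree = sum((labels1 & labels2).values()) / target_value
    w1, w2, w3 = weights
    return w1 * target_ratio + w2 * similarity_value + w3 * degree


def mine_pair(word1: str, word2: str,
              embed: Callable[[str], Sequence[float]],
              search: Callable[[str, int], Sequence[SearchResult]],
              threshold: float = 0.7, target_value: int = 10) -> Optional[float]:
    """Claim 8 flow: search only when the vector similarity exceeds the threshold."""
    if vector_similarity(embed(word1), embed(word2)) <= threshold:
        return None  # not greater than the preset threshold: no web search
    return compare_results(search(word1, target_value),
                           search(word2, target_value), target_value)


if __name__ == "__main__":
    # Toy vectors and an in-memory "search engine"; all data is made up.
    vectors = {"laptop": [0.9, 0.1, 0.3], "notebook": [0.8, 0.2, 0.35],
               "banana": [0.0, 1.0, 0.0]}
    index = {
        "laptop": [SearchResult("https://a.example/x", "shop"),
                   SearchResult("https://b.example/y", "wiki")],
        "notebook": [SearchResult("https://a.example/x", "shop"),
                     SearchResult("https://b.example/z", "wiki")],
    }

    def fake_search(word: str, n: int) -> list:
        return index[word][:n]

    print(round(mine_pair("laptop", "notebook", vectors.__getitem__,
                          fake_search, target_value=2), 3))  # prints 0.675
    print(mine_pair("banana", "laptop", vectors.__getitem__,
                    fake_search, target_value=2))  # prints None
```

Putting the search step behind the threshold check follows claim 8: the inexpensive vector comparison filters out pairs before any network request is made, so only plausible candidates incur search-engine cost.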
CN202011317040.0A 2020-11-20 2020-11-20 Method, device, equipment and storage medium for synonym mining based on network search Pending CN112417003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011317040.0A CN112417003A (en) 2020-11-20 2020-11-20 Method, device, equipment and storage medium for synonym mining based on network search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011317040.0A CN112417003A (en) 2020-11-20 2020-11-20 Method, device, equipment and storage medium for synonym mining based on network search

Publications (1)

Publication Number Publication Date
CN112417003A (en) 2021-02-26

Family

ID=74777187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011317040.0A Pending CN112417003A (en) 2020-11-20 2020-11-20 Method, device, equipment and storage medium for synonym mining based on network search

Country Status (1)

Country Link
CN (1) CN112417003A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114153925A (en) * 2021-11-12 2022-03-08 城云科技(中国)有限公司 Data table association analysis method and device
CN114491215A (en) * 2021-12-28 2022-05-13 深圳市游迷天下科技有限公司 Search-based method, device, equipment and storage medium for updating word stock of similar senses
CN115708107A (en) * 2021-08-20 2023-02-21 中国移动通信集团有限公司 Internet of things equipment identifier construction method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101241512A (en) * 2008-03-10 2008-08-13 北京搜狗科技发展有限公司 Search method for redefining enquiry word and device therefor
CN102722498A (en) * 2011-03-31 2012-10-10 北京百度网讯科技有限公司 Search engine and implementation method thereof
CN107451126A (en) * 2017-08-21 2017-12-08 广州多益网络股份有限公司 A kind of near synonym screening technique and system
CN111666417A (en) * 2020-04-13 2020-09-15 百度在线网络技术(北京)有限公司 Method and device for generating synonyms, electronic equipment and readable storage medium

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN115708107A (en) * 2021-08-20 2023-02-21 中国移动通信集团有限公司 Internet of things equipment identifier construction method, device, equipment and storage medium
CN115708107B (en) * 2021-08-20 2025-08-26 中国移动通信集团有限公司 Method, device, equipment and storage medium for constructing IoT device identification
CN114153925A (en) * 2021-11-12 2022-03-08 城云科技(中国)有限公司 Data table association analysis method and device
CN114491215A (en) * 2021-12-28 2022-05-13 深圳市游迷天下科技有限公司 Search-based method, device, equipment and storage medium for updating word stock of similar senses
CN114491215B (en) * 2021-12-28 2024-08-30 深圳市游迷天下科技有限公司 Search-based paraphrasing library updating method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US9489401B1 (en) Methods and systems for object recognition
CN108388650B (en) Search processing method and device based on requirements and intelligent equipment
CN110597994A (en) Event element identification method and device
US20170004820A1 (en) Method for building a speech feature library, and method, apparatus, device, and computer readable storage media for speech synthesis
CN113656582A (en) Training method of neural network model, image retrieval method, equipment and medium
CN112417003A (en) Method, device, equipment and storage medium for synonym mining based on network search
US8359306B2 (en) Intelligent automatic recognition toolbar search method and system
US20220058214A1 (en) Document information extraction method, storage medium and terminal
US20160103915A1 (en) Linking thumbnail of image to web page
CN110795942B (en) Keyword determination method and device based on semantic recognition and storage medium
CN113806588A (en) Method and device for searching video
CN113704184A (en) File classification method, device, medium and equipment
CN112579729A (en) Training method and device for document quality evaluation model, electronic equipment and medium
CN118525274A (en) System and method for determining content similarity by comparing semantic entity attributes
CN120196754A (en) Sensitive text classification method, device, computer equipment and storage medium
JP5687312B2 (en) Digital information analysis system, digital information analysis method, and digital information analysis program
CN111552829A (en) Method and apparatus for analyzing image material
JP6640519B2 (en) Information analysis device and information analysis method
CN111144122B (en) Evaluation processing method, device, computer system and medium
CN118069834A (en) Method, device, equipment and medium for speech quality inspection and training method of classification model
CN110209880A (en) Video content retrieval method, Video content retrieval device and storage medium
WO2018171499A1 (en) Information detection method, device and storage medium
KR102546331B1 (en) Method and system for crawling based on image
KR20220024251A (en) Method and apparatus for building event library, electronic device, and computer-readable medium
CN111814483B (en) Method and device for analyzing emotion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210226