CN114818706B

CN114818706B - A text matching method, device, electronic device and storage medium, and a government service text matching method

Info

Publication number: CN114818706B
Application number: CN202110130726.7A
Authority: CN
Inventors: 王彬铸; 郭立帆; 李海军; 丁菱; 韩雨轩; 韩喆; 冉秋萍
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2025-01-17
Anticipated expiration: 2041-01-29
Also published as: CN114818706A

Abstract

The application provides a text matching method, which comprises the steps of obtaining a target entity category corresponding to an entity name text to be matched, obtaining a candidate standardized entity name text corresponding to the entity name text to be matched, of which the entity category is the same as the target entity category, according to the target entity category, and obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity of the entity name text to be matched and the candidate standardized entity name text. According to the text matching method provided by the application, the target standardized entity name text matched with the entity name text to be matched can be obtained according to the text similarity between the entity name text to be matched and the candidate standardized entity name text, and the entity name text to be matched and the target standardized entity name text are not required to be matched in a manual summarization mode, so that the efficiency of standardization of the entity name document is improved.

Description

Text matching method and device, electronic equipment, storage medium and government service text matching method

Technical Field

The application relates to the technical field of computers, in particular to a text matching method. The application also relates to a text matching device, electronic equipment, a storage medium and a government service text matching method.

Background

With the rapid development of internet technology, more and more internet-based business service systems are being launched or applied in different service fields; for example, the "internet+government service" technical system applied to government services has already been built. However, internet-based business service systems face a number of problems, such as the standardization of entity name documents.

Taking "internet+government service" as an example, optimizing government service matters requires analyzing which materials can be handled electronically. The common practice is to collect, from each local government, the materials that government service matters depend on most, and then to analyze which of those materials can be handled electronically. However, when describing the materials required for government service matters, local governments often use non-standardized names; for example, the "resident identification card of the people's republic of China" is described as "personal identification card", "two person identification card", "couple identification card" and so on. The non-standardized description of government service material names not only results in a poor user experience, but also presents significant challenges for the subsequent optimization of government service matters. Therefore, how to standardize non-standardized entity name documents has become a problem to be solved in the development of internet-based business service systems.

In the prior art, the method for solving the problem of standardization of non-standardized entity name documents generally comprises the steps of establishing a database containing a large number of entity name documents of the same category, summarizing the non-standardized entity name documents describing the same entity name in the database by manual summarization, and then linking the summarized non-standardized entity name documents to the standardized entity name documents describing the entity name. Because the existing method for solving the problem of non-standardized entity name document standardization is based on manual summarization, the non-standardized entity name document standardization efficiency is low.

Disclosure of Invention

The application provides a text matching method, a text matching device, electronic equipment and a storage medium, so as to improve the standardization efficiency of entity name documents.

The application provides a text matching method, which comprises the following steps:

obtaining a target entity category corresponding to the entity name text to be matched;

according to the target entity category, obtaining candidate standardized entity name texts corresponding to the entity name texts to be matched, wherein the entity category of the candidate standardized entity name texts is the same as the target entity category;

And obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.

Optionally, the method further comprises providing the target standardized entity name text to a user device.

Optionally, the method further comprises the step of associating the entity name text to be matched with the target standardized entity name text.

Optionally, the associating the entity name text to be matched with the target standardized entity name text includes establishing a corresponding relationship between the entity name text to be matched and the target standardized entity name text.

Optionally, the obtaining the target entity category corresponding to the entity name text to be matched includes obtaining a text matching instruction sent by the user equipment, wherein the text matching instruction carries the entity name text to be matched;

The providing the target standardized entity name text to the user equipment comprises providing the target standardized entity name text to the user equipment in response to the text matching instruction.

Optionally, the method further comprises the step of displaying the target standardized entity name text.

Optionally, the obtaining the target entity category corresponding to the entity name text to be matched includes:

performing word segmentation on the entity name text to be matched by adopting a preset word segmentation strategy to obtain category keywords in the entity name text to be matched;

And obtaining the target entity category according to the category keywords in the entity name text to be matched.

Optionally, the obtaining, according to the target entity category, a candidate standardized entity name text corresponding to the entity name text to be matched, where the entity category is the same as the target entity category, includes:

obtaining an associated standardized entity name text associated with the entity name to be matched according to the keywords in the entity name text to be matched;

Obtaining entity categories of the associated standardized entity name text;

and obtaining the candidate standardized entity name text from the associated standardized entity name text according to the target entity category and the entity category.

Optionally, the obtaining, according to the text similarity between the entity name text to be matched and the candidate standardized entity name text, the target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text includes:

obtaining the weight of the keywords in the entity name text to be matched and the weight of the keywords in the candidate standardized entity name text;

Obtaining a first word vector corresponding to a keyword in the entity name text to be matched and a second word vector corresponding to the keyword in the candidate standardized entity name text;

obtaining word vector similarity of the first word vector and the second word vector according to the weight of the keyword in the entity name text to be matched, the weight of the keyword in the candidate standardized entity name text, the first word vector and the second word vector;

And obtaining the text similarity according to the word vector similarity.

Optionally, the obtaining the text similarity according to the word vector similarity includes:

Obtaining a character string matched with the entity name text to be matched;

Obtaining character strings corresponding to the candidate standardized entity name text;

Obtaining the character string similarity of the character string matched by the entity name text to be matched and the character string corresponding to the candidate standardized entity name text according to the character string matched by the entity name text to be matched and the character string corresponding to the candidate standardized entity name text;

And obtaining the text similarity according to the word vector similarity and the character string similarity.

Optionally, the obtaining the text similarity according to the word vector similarity and the character string similarity includes weighting the word vector similarity and the character string similarity according to a preset first similarity weight corresponding to the word vector similarity and a preset second similarity weight corresponding to the character string similarity, so as to obtain the text similarity.

Optionally, the method further comprises judging whether the text similarity reaches a text similarity threshold;

The obtaining, according to the text similarity between the entity name text to be matched and the candidate standardized entity name text, the target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text comprises: if the text similarity reaches the text similarity threshold, obtaining, from the candidate standardized entity name text, the candidate standardized entity name text whose text similarity reaches the text similarity threshold as the target standardized entity name text.

Optionally, the obtaining the candidate standardized entity name text with the text similarity reaching the text similarity threshold from the candidate standardized entity name texts as a target standardized entity name text comprises obtaining the candidate standardized entity name text with the text similarity reaching the text similarity threshold and highest similarity from the candidate standardized entity name texts as a target standardized entity name text.

Optionally, if the text similarity does not reach the text similarity threshold, determining that the target standardized entity name text does not exist in the candidate standardized entity name text.

In another aspect of the present application, there is also provided a text matching apparatus, including:

the target entity category obtaining unit is used for obtaining a target entity category corresponding to the entity name text to be matched;

A candidate text obtaining unit, configured to obtain, according to the target entity category, a candidate standardized entity name text corresponding to the entity name text to be matched, where the entity category is the same as the target entity category;

And the target text matching unit is used for obtaining target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.

In another aspect of the present application, there is also provided an electronic device, including:

a processor; and

A memory for storing a program of a text matching method, the apparatus being powered on and executing the program of the text matching method by the processor, and performing the steps of:

In another aspect of the present application, there is also provided a storage medium storing a program of a text matching method, the program being executed by a processor, for obtaining a target entity category corresponding to a name text of an entity to be matched;

In another aspect of the present application, there is also provided a government service text matching method, including:

obtaining a target entity category corresponding to an entity name text to be matched for describing the government service material name;

obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text;

and associating the entity name text to be matched with the target standardized entity name text.

In another aspect of the present application, there is also provided an address text matching method, including:

Obtaining a target entity category corresponding to an entity name text to be matched for describing the geographical location name;

Compared with the prior art, the application has the following advantages:

The application provides a text matching method, which comprises the steps of obtaining a target entity category corresponding to an entity name text to be matched, obtaining a candidate standardized entity name text corresponding to the entity name text to be matched, of which the entity category is the same as the target entity category, according to the target entity category, and obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text. According to the text matching method provided by the application, the target standardized entity name text matched with the entity name text to be matched can be obtained according to the text similarity between the entity name text to be matched and the candidate standardized entity name text, and the entity name text to be matched and the target standardized entity name text are not required to be matched in a manual summarization mode, so that the efficiency of standardization of the entity name document is improved.

Drawings

Fig. 1 is a schematic diagram of a first scenario of a text matching method according to a first embodiment of the present application.

Fig. 2 is a flowchart of a text matching method according to a first embodiment of the present application.

Fig. 3 is a schematic diagram of a second scenario of a text matching method according to a first embodiment of the present application.

Fig. 4 is a schematic diagram of a text matching device according to a second embodiment of the present application.

Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.

Fig. 6 is a flowchart of a text matching method according to a fifth embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may be embodied in many other forms than those herein described, and those skilled in the art will readily appreciate that the present application may be similarly embodied without departing from the spirit or essential characteristics thereof, and therefore the present application is not limited to the specific embodiments disclosed below.

To present the text matching method provided in the first embodiment of the present application more clearly, an application scenario of the method is introduced first. In the first embodiment, the execution subject of the text matching method may be a server, a client on which a related text recognition application is installed, or both a server and a client, in which case the method is completed through interaction between the client and the server. The client is an application program or software installed on a user device and capable of implementing the text matching method provided by the embodiment of the present application. In a specific implementation, the user device is typically a mobile phone, a PC (personal computer) or a tablet computer, and the application program (APP) or software may be a mobile phone application, web-based online text matching software, computer software or the like. The server is a computing device that provides services such as data processing to the client, and in a specific implementation is typically a single server or a server cluster.

The first embodiment of the present application is described in detail by taking as an example an application scenario in which the text matching method is completed through interaction between the client and the server, and in which the client is computer software installed on a computer and capable of implementing the text matching method provided by the embodiment of the present application. Refer to fig. 1, which is a schematic diagram of a first scenario of the text matching method provided in the first embodiment of the present application.

First, after the user equipment detects a text matching triggering operation performed by the user on the client 101, the client 101 obtains the entity name text to be matched through the user equipment.

In the first embodiment of the present application, the entity name text is text for describing the name of the entity in the target text, and the target text is generally a document, a paragraph or a sentence, such as government service matters, network service matters outside the government service, equipment operation flow, and description of chemical experiment steps, etc. When the entity name text is a text describing the name of the government service material, the entity is generally the required government service material in the government service matters, and when the entity name text is a text describing the name of the network service material, the entity can also be the required network service material in the network service matters except the government service. In addition, the entity may be other types of entities, such as equipment in the operation flow of the equipment, chemicals in the introduction of chemical experiment steps, chemical reaction devices, and the like. That is, in the first embodiment of the present application, the target text and the entity are not particularly limited.

The entity name text to be matched is generally a non-standardized entity name text, but may also be a standardized entity name text. A standardized entity name text and a non-standardized entity name text are, respectively, a standard description text and a non-standard description text of the same entity name. Taking the "resident identification card of the people's republic of China" as an example, the standardized entity name text may be "resident identification card of the people's republic of China", and the non-standardized entity name text may be "personal identification card", "two person identification card", "couple identification card" and the like. Taking the "real estate registration application form" as an example, the standardized entity name text may be "real estate registration application form", and the non-standardized entity name text may be "real estate registration application form" or the like. Taking the "personnel file" as an example, the standardized entity name text may be "personnel file", and the non-standardized entity name text may be "applicant personnel file", "participant personnel file", "personnel file original" and the like.

In "internet+government service", the non-standardized description of government service material names not only results in a poor user experience, but also presents great challenges for the subsequent optimization of government service matters. In other application scenarios it likewise inconveniences the user. Therefore, the text matching method provided in the first embodiment of the present application needs to match the entity name text to be matched with the corresponding standardized entity name text, that is, to obtain the target standardized entity name text that matches the entity name text to be matched, thereby standardizing the non-standardized entity name text. The target standardized entity name text is a standard description text whose text similarity with the entity name text to be matched reaches a text similarity threshold and which describes the same entity name as the entity name text to be matched.

Then, after obtaining the entity name text to be matched, the client 101 further sends a text matching instruction to the server 102 based on the triggering operation of the user on the user equipment, where the text matching instruction carries the entity name text to be matched. In addition, after obtaining the entity name text to be matched, the client 101 may send the entity name text to be matched to the server 102, and then send a text matching instruction for the entity name text to be matched to the server 102. In the first embodiment of the present application, the specific manner in which the client 101 sends the entity name text to be matched to the server 102 is not specifically limited. After the server 102 obtains the text matching instruction carrying the entity name text to be matched, the following steps are sequentially performed to obtain the target standardized entity name text matched with the entity name text to be matched, and referring specifically to fig. 2, which is a flowchart of a text matching method provided in the first embodiment of the present application.

In step S201, a target entity category corresponding to the entity name text to be matched is obtained.

The entity category is the category of the entity described by an entity name text, and is one of the categories divided in advance according to a preset entity category division strategy. In a specific implementation, the target entity category is obtained as follows: first, word segmentation is performed on the entity name text to be matched using a preset word segmentation strategy to obtain the category keywords in the entity name text to be matched; then, the target entity category is obtained according to those category keywords. A category keyword is a word in the entity name text that identifies the category of the entity. For example, when the entity name texts to be matched are "some application form", "some certificate", "some table" and the like, "application form", "certificate" and "table" are the category keywords in the entity name texts to be matched. In the first embodiment of the present application, the categories divided in advance according to the preset entity category division strategy are entity categories determined from category keywords counted in advance in entity name texts. Therefore, the "application form" category, the "certificate" category, the "table" category and the like are categories divided in advance according to the preset entity category division strategy.

In the first embodiment of the application, the specific implementation manner of obtaining the target entity category according to the category keyword in the entity name text to be matched is that the target entity category is obtained according to the category keyword in the entity name text to be matched and the corresponding relation between the category keyword and the category.
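The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the table `CATEGORY_BY_KEYWORD`, the function name `target_entity_category`, and the choice to scan the tokens from the end are hypothetical and are not specified in the application; English tokens stand in for segmented Chinese words.

```python
from typing import Optional, Sequence

# Hypothetical correspondence between category keywords and entity categories.
CATEGORY_BY_KEYWORD = {
    "application form": "application form",
    "certificate": "certificate",
    "table": "table",
}


def target_entity_category(tokens: Sequence[str]) -> Optional[str]:
    """Return the category of the last token that is a known category keyword."""
    # Assumption: the category keyword tends to be the trailing word of the
    # name, so the tokens are scanned from the end.
    for token in reversed(tokens):
        category = CATEGORY_BY_KEYWORD.get(token)
        if category is not None:
            return category
    return None  # no known category keyword in this name


print(target_entity_category(["marriage", "certificate"]))
```

Because the correspondence is a plain dictionary, obtaining the category costs one lookup per token once segmentation is done.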

In the first embodiment of the application, the category of the entity is determined according to the category keywords in the entity name text counted in advance, so that the entity category of the entity name text can be obtained directly according to the category keywords after the category keywords in the entity name text are obtained, and the acquisition of the entity category of the entity name text is simpler and faster.

It should be noted that, when a new entity name text is encountered, the entity category of the new entity name text may be confirmed by using a reverse maximum matching method.
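A minimal sketch of reverse maximum matching is given below for illustration. The function name, the `max_len` window size and the tiny dictionary are assumptions made for this example; the application does not specify them. The text is scanned from right to left, and at each position the longest dictionary word ending there is taken, so the trailing word (often the category keyword) is identified first.

```python
from typing import List, Set


def reverse_maximum_matching(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Segment `text` right to left, preferring the longest dictionary word."""
    words: List[str] = []
    end = len(text)
    while end > 0:
        start = max(0, end - max_len)
        # Shrink the window from the left until it is a dictionary word
        # or only a single character remains.
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        words.append(text[start:end])
        end = start
    words.reverse()
    return words


# "个人身份证" is "personal identification card"; the dictionary is hypothetical.
print(reverse_maximum_matching("个人身份证", {"个人", "身份证"}))
```

Characters that match no dictionary entry fall out as single-character tokens, so the function always returns a complete segmentation.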

The preset word segmentation strategy may be to segment the entity name text to be matched with a Chinese word segmenter used for word segmentation in natural language processing. In the first embodiment of the present application, a personalized dictionary may further be introduced to control the granularity of word segmentation: the entity name text is split in order into continuous word sequences, and whether a continuous word sequence is kept as a final word segmentation result is then determined according to rules and according to whether the sequence appears in the given personalized dictionary.

In step S202, according to the target entity category, a candidate standardized entity name text corresponding to the entity name text to be matched, the entity category of which is the same as the target entity category, is obtained.

The candidate standardized entity name text corresponding to the entity name text to be matched is a standardized entity name text that is obtained based on the keywords in the entity name text to be matched and is associated with those keywords. In a specific implementation, a BM25 (Best Matching 25) recall strategy is applied to the entity name text to be matched, and an ES (Elasticsearch, a distributed full-text search engine) tool is used to quickly recall, from preset standardized entity name text data, the associated standardized entity name texts associated with the entity name to be matched.
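To make the recall step concrete, the following sketch scores a small in-memory list of standardized names with a common Okapi BM25 formulation. It is an assumption-laden illustration, not the application's implementation: in practice the search engine computes BM25 internally, and the function name `bm25_scores`, the parameters `k1=1.2` and `b=0.75`, and the toy documents are all chosen for this example.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score each tokenized document in `docs` (non-empty) against `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # the term does not occur in this document
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


standard_names = [
    ["resident", "identification", "card"],
    ["marriage", "certificate"],
    ["personnel", "file"],
]
scores = bm25_scores(["personal", "identification", "card"], standard_names)
print(scores)
```

Names that share no keyword with the query score zero and would not be recalled.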

In the first embodiment of the application, the candidate standardized entity name text is obtained as follows. First, the associated standardized entity name texts associated with the entity name to be matched are obtained according to the keywords in the entity name text to be matched. Then, the entity category of each associated standardized entity name text is obtained. Finally, the candidate standardized entity name texts are obtained from the associated standardized entity name texts according to the target entity category and these entity categories. The process of obtaining the entity category of an associated standardized entity name text is similar to the process of obtaining the target entity category corresponding to the entity name text to be matched; refer to the description of step S201, which is not repeated here.
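The category filter in this step can be sketched as below. The function `filter_candidates` and the `category_of` callback are hypothetical names introduced for illustration; `category_of` stands for whatever procedure (such as the one in step S201) returns the entity category of a standardized name.

```python
from typing import Callable, List, Optional


def filter_candidates(associated: List[str], target_category: str,
                      category_of: Callable[[str], Optional[str]]) -> List[str]:
    """Keep only the recalled names whose entity category equals the target category."""
    return [name for name in associated if category_of(name) == target_category]


# Toy category function (assumption): the last word of the name is its category.
def last_word(name: str) -> Optional[str]:
    parts = name.split()
    return parts[-1] if parts else None


recalled = ["resident identification card", "marriage certificate", "residence certificate"]
print(filter_candidates(recalled, "certificate", last_word))
```

Filtering by category before computing similarities keeps the more expensive similarity step limited to names that could plausibly describe the same kind of entity.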

In step S203, a target normalized entity name text matching the entity name text to be matched is obtained from the candidate normalized entity name text according to the text similarity between the entity name text to be matched and the candidate normalized entity name text.

In the implementation process, the text similarity obtaining process comprises the steps of firstly obtaining the weight of the keywords in the entity name text to be matched and the weight of the keywords in the candidate standardized entity name text. Secondly, a first word vector corresponding to the keyword in the entity name text to be matched and a second word vector corresponding to the keyword in the candidate standardized entity name text are obtained. Thirdly, obtaining the word vector similarity of the first word vector and the second word vector according to the weight of the keyword in the entity name text to be matched, the weight of the keyword in the candidate standardized entity name text, the first word vector and the second word vector. Fourth, according to the word vector similarity, the text similarity is obtained.

In the first embodiment of the application, the first word vector corresponding to a keyword in the entity name text to be matched and the second word vector corresponding to a keyword in the candidate standardized entity name text are obtained by mapping the keywords in the entity name text to be matched and the keywords in the candidate standardized entity name text to vectors with a Word2vec (word to vector, a family of models for generating word vectors) model. A Word2vec model is a network model that, once trained on a given corpus, represents a word in vector form quickly and effectively.

In the first embodiment of the present application, the specific implementation manner of obtaining the weight of the keyword in the entity name text to be matched and the weight of the keyword in the candidate standardized entity name text is as follows:

first, an entity name text to be matched and a candidate standardized entity name text are used as a target text set.

Then, the entity name text to be matched and the candidate standardized entity name text are respectively segmented, and different keywords in the entity name text to be matched and the candidate standardized entity name text are obtained.

Finally, the TF-IDF (Term Frequency-Inverse Document Frequency, a weighting technique commonly used in information retrieval and data mining) technique is used to obtain, respectively, the TF (term frequency) of each keyword in the entity name text to be matched, the IDF (inverse document frequency) of each such keyword in the target text set, the term frequency of each keyword in the candidate standardized entity name text, and the inverse document frequency of each such keyword in the target text set. The TF-IDF of each keyword in the entity name text to be matched and the TF-IDF of each keyword in the candidate standardized entity name text are then obtained from the term frequencies and the inverse document frequencies, and the weights of the keywords in the entity name text to be matched and in the candidate standardized entity name text are determined from these TF-IDF values.
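A small sketch of this computation follows. The application does not state which TF-IDF variant is used, so the formula here is an assumption: term frequency is the keyword count divided by the text length, and the inverse document frequency is smoothed as log((1 + N) / (1 + df)) + 1 so that a keyword occurring in every text of a small target text set still receives a non-zero weight. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def tf_idf(doc: Sequence[str], corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    """TF-IDF of every keyword of `doc`, with `corpus` as the target text set."""
    n_docs = len(corpus)
    result: Dict[str, float] = {}
    for term, count in Counter(doc).items():
        tf = count / len(doc)
        df = sum(1 for d in corpus if term in d)  # texts containing the term
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF (assumption)
        result[term] = tf * idf
    return result


target_text_set = [
    ["personal", "identification", "card"],   # text to be matched
    ["resident", "identification", "card"],   # candidate standardized text
]
print(tf_idf(target_text_set[0], target_text_set))
```

With this toy set, "personal" occurs in only one text and therefore gets a larger value than "identification" or "card", which occur in both.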

Taking the government service material name text "resident identification card of the people's republic of China" as an example, suppose that, for the target text set formed by this text and its corresponding candidate standardized entity name texts, the TF-IDF of "people's republic of China" is 2, the TF-IDF of "resident" is 1, and the TF-IDF of "identification card" is 7. Then, when the word vector similarity is calculated, the weight of "people's republic of China" is 0.2, the weight of "resident" is 0.1, and the weight of "identification card" is 0.7; that is, each weight is the keyword's TF-IDF divided by the sum of the TF-IDF values, which is 10 here.
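The arithmetic of this example can be reproduced with a few lines. The function name `normalize_weights` is hypothetical, and dividing by the sum is the reading that yields the stated weights from the stated TF-IDF values.

```python
from typing import Dict


def normalize_weights(tfidf: Dict[str, float]) -> Dict[str, float]:
    """Scale TF-IDF values so that the resulting weights sum to 1."""
    total = sum(tfidf.values())
    return {term: value / total for term, value in tfidf.items()}


weights = normalize_weights({
    "people's republic of China": 2,
    "resident": 1,
    "identification card": 7,
})
print(weights)
```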

In the first embodiment of the application, the word vector similarity of the first word vector and the second word vector is obtained from the weight of the keyword in the entity name text to be matched, the weight of the keyword in the candidate standardized entity name text, the first word vector and the second word vector as follows: the cosine similarity of the first word vector and the second word vector is computed and used as the word vector similarity. Specifically, when the cosine similarity is computed, the elements of the first word vector and of the second word vector are multiplied by the corresponding weights.
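One possible reading of this weighted cosine similarity is sketched below: each text is represented by the weighted sum of its keyword vectors, and the cosine of the two resulting vectors is taken. This interpretation, the function names and the two-dimensional toy vectors are assumptions for illustration; real word vectors would come from a trained Word2vec model.

```python
import math
from typing import Dict, List, Sequence


def weighted_text_vector(weights: Dict[str, float],
                         vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Sum the keyword vectors, each scaled by its keyword weight.

    `vectors` must be non-empty; keywords without a vector are skipped.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for word, weight in weights.items():
        if word not in vectors:
            continue
        for i, x in enumerate(vectors[word]):
            total[i] += weight * x
    return total


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between `a` and `b`; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors (assumption), not the output of a trained model.
toy_vectors = {"personal": [1.0, 0.0], "resident": [0.8, 0.6], "card": [0.0, 1.0]}
first = weighted_text_vector({"personal": 0.3, "card": 0.7}, toy_vectors)
second = weighted_text_vector({"resident": 0.3, "card": 0.7}, toy_vectors)
print(cosine_similarity(first, second))
```

Because the shared keyword "card" carries most of the weight in both texts, it dominates the direction of both text vectors.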

In order to improve the accuracy of the text similarity between the entity name text to be matched and the candidate standardized entity name text, the first embodiment of the application further introduces character string similarity when obtaining the text similarity. In the specific implementation process, first, the character string corresponding to the entity name text to be matched is obtained; second, the character string corresponding to the candidate standardized entity name text is obtained; third, the character string similarity between these two character strings is obtained; and finally, the text similarity is obtained according to the word vector similarity and the character string similarity. Specifically, the word vector similarity and the character string similarity are weighted according to a preset first similarity weight corresponding to the word vector similarity and a preset second similarity weight corresponding to the character string similarity, to obtain the text similarity.
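The weighted combination of the two similarities can be sketched in Python as follows. The application does not name a particular character string similarity measure; the sketch uses the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in, and the default weights 0.6 and 0.4 are illustrative assumptions rather than values from the application.

```python
from difflib import SequenceMatcher

def string_similarity(text_a, text_b):
    """Character string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, text_a, text_b).ratio()

def text_similarity(word_vector_sim, string_sim, first_weight=0.6, second_weight=0.4):
    """Weight the word vector similarity by the first similarity weight and the
    character string similarity by the second similarity weight, then add them."""
    return first_weight * word_vector_sim + second_weight * string_sim

# Hypothetical word vector similarity of 0.8 combined with a computed string similarity.
s = string_similarity("personal identity card", "resident identity card")
print(text_similarity(0.8, s))
```

Choosing weights that sum to 1 keeps the combined text similarity in the same range as its two inputs, which makes a single threshold easier to interpret.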

In the first embodiment of the present application, after the text similarity is obtained according to the word vector similarity and the character string similarity, it is determined whether the text similarity reaches the text similarity threshold. If it does, the candidate standardized entity name text whose text similarity reaches the threshold is obtained from the candidate standardized entity name texts as the target standardized entity name text. Specifically, among the candidate standardized entity name texts whose text similarity reaches the threshold, the one with the highest text similarity is obtained as the target standardized entity name text. For example, if the text similarity between "personal identity card" and "resident identity card of the People's Republic of China" is 0.78 and the preset text similarity threshold is 0.7, "resident identity card of the People's Republic of China" is used as the target standardized entity name text of "personal identity card".

In addition, if the text similarity does not reach the text similarity threshold, it is determined that the target normalized entity name text does not exist in the candidate normalized entity name text.
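The threshold check and the selection of the highest-scoring candidate can be sketched as follows. The function name `select_target` and the use of `None` to signal that no target exists are assumptions for illustration. The score 0.78 and the threshold 0.7 come from the example above, while the second candidate and its score are invented for the demonstration.

```python
def select_target(similarities, threshold):
    """Return the candidate with the highest text similarity if that similarity
    reaches the threshold; otherwise return None (no target exists)."""
    if not similarities:
        return None
    best_candidate = max(similarities, key=similarities.get)
    if similarities[best_candidate] >= threshold:
        return best_candidate
    return None

# Text similarities of candidates to the query "personal identity card".
scores = {
    "resident identity card of the People's Republic of China": 0.78,
    "household register": 0.31,  # hypothetical second candidate
}
print(select_target(scores, 0.7))
print(select_target(scores, 0.8))
```

With a threshold of 0.7 the resident identity card text is returned; with a stricter threshold of 0.8 no candidate qualifies and the function returns `None`.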

In the first embodiment of the present application, when the server 102 determines that the target standardized entity name text does not exist in the candidate standardized entity name text, feedback information that the target standardized entity name text does not exist is generated and fed back to the client 101, and the client 101 displays the feedback information through an interactive interface of the user equipment.

After the server 102 obtains the target standardized entity name text, it may provide the target standardized entity name text to the client 101 in response to the text matching instruction, and the client 101 then associates the entity name text to be matched with the target standardized entity name text, that is, establishes a correspondence between the entity name text to be matched and the target standardized entity name text. Alternatively, after the server 102 obtains the target standardized entity name text, the server 102 may itself associate the entity name text to be matched with the target standardized entity name text, and then provide the association result to the client 101.

The text matching method provided in the first embodiment of the present application may also be applied to an application scenario in which a server is an execution subject, please refer to fig. 3, which is a schematic diagram of a first scenario of the text matching method provided in the first embodiment of the present application.

Step S301: obtain the entity name text to be matched.

Step S302: obtain the target entity category, that is, the target entity category corresponding to the entity name text to be matched.

Step S303: obtain the candidate standardized entity name texts, that is, according to the target entity category, obtain the candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category.

Step S304-1: obtain the word vector similarity.

Step S304-2: obtain the character string similarity.

Step S305: obtain the text similarity according to the word vector similarity and the character string similarity.

Step S306: determine whether the text similarity reaches the text similarity threshold.

Step S306-1: if yes, obtain the target standardized entity name text, that is, obtain from the candidate standardized entity name texts the candidate standardized entity name text whose text similarity reaches the text similarity threshold as the target standardized entity name text.

Step S306-2: if not, obtain feedback information, that is, determine that the target standardized entity name text does not exist in the candidate standardized entity name texts, and obtain feedback information indicating that no target standardized entity name text was obtained.
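The flow of steps S301 to S306 can be summarized in a short Python sketch. Everything here is a simplified illustration: `match_entity_name`, `last_word_category` and `jaccard_similarity` are hypothetical names. The category rule (last word of the name) and the similarity measure (word-level Jaccard) are toy stand-ins for the category and text similarity computations described in the application. The sketch also filters all standardized names by category directly, instead of first recalling associated texts by keyword.

```python
def match_entity_name(query, standardized_names, get_category, text_similarity, threshold=0.7):
    """Sketch of steps S301 to S306 for one entity name text to be matched."""
    target_category = get_category(query)                          # S302
    candidates = [name for name in standardized_names
                  if get_category(name) == target_category]        # S303
    scored = [(text_similarity(query, name), name)
              for name in candidates]                              # S304 and S305
    if scored:
        best_score, best_name = max(scored)
        if best_score >= threshold:                                # S306 and S306-1
            return {"matched": True, "target": best_name, "similarity": best_score}
    # S306-2: no target standardized entity name text exists.
    return {"matched": False, "feedback": "no target standardized entity name text found"}

def last_word_category(name):
    """Toy category rule: the last word of the name (an assumption)."""
    return name.split()[-1]

def jaccard_similarity(a, b):
    """Toy text similarity: Jaccard overlap of the word sets (an assumption)."""
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a | words_b)

names = ["resident identity card", "temporary residence card", "marriage certificate"]
result = match_entity_name("personal identity card", names,
                           last_word_category, jaccard_similarity, threshold=0.4)
print(result)
```

Passing the category and similarity functions as parameters keeps the control flow of the steps separate from the specific techniques used in each step.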

The text matching method provided in the first embodiment of the present application can also be applied to an application scenario in which a client is an execution subject.

In the implementation process, after obtaining the entity name text to be matched, the client sequentially executes the following steps of firstly obtaining a target entity category corresponding to the entity name text to be matched, then obtaining a candidate standardized entity name text corresponding to the entity name text to be matched, of which the entity category is the same as the target entity category, according to the target entity category, and finally obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.

In the first embodiment of the present application, the application scenario of the text matching method provided in the first embodiment of the present application is not specifically limited, for example, the text matching method provided in the first embodiment of the present application may also be applied to other scenarios, and will not be described in detail herein. The above application scenario is provided to facilitate understanding of the text matching method provided in the first embodiment of the present application, and is not limited to the text matching method provided in the first embodiment of the present application.

The first embodiment of the application provides a text matching method, which comprises the steps of obtaining a plurality of position element texts in a position query text, wherein the plurality of position element texts are texts used for describing positions in the position query text, obtaining candidate position information corresponding to the plurality of position element texts aiming at the plurality of position element texts, and sorting the candidate position information corresponding to the plurality of position element texts according to at least one of the occurrence times of the plurality of position element texts in the position query text and the clustering score of the candidate position information corresponding to the plurality of position element texts, and obtaining position recommendation information aiming at the position query text according to the sorting result. According to the text matching method provided by the first embodiment of the application, the candidate position information corresponding to the plurality of position element texts is ordered based on the plurality of position element texts, so that the candidate position information in the position recommendation information aiming at the position query text can be ensured to correspond to the plurality of position element texts, and the accuracy of the position recommendation information when the plurality of position element texts exist in the position query text is improved.

The first embodiment of the application provides a text matching method, which comprises the steps of obtaining a target entity category corresponding to an entity name text to be matched, obtaining a candidate standardized entity name text corresponding to the entity name text to be matched, of which the entity category is the same as the target entity category, according to the target entity category, and obtaining the target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text. According to the text matching method provided by the first embodiment of the application, the target standardized entity name text matched with the entity name text to be matched can be obtained according to the text similarity between the entity name text to be matched and the candidate standardized entity name text, and the entity name text to be matched and the target standardized entity name text are not required to be matched in a manual summarization mode, so that the efficiency of standardization of the entity name document is improved.

Second embodiment

The second embodiment of the present application further provides a text matching device corresponding to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment. Since the device embodiment is basically similar to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment, its description is relatively brief; for relevant points, please refer to the corresponding description of the application scenario of the text matching method provided by the embodiment of the present application and of the text matching method provided by the first embodiment. The device embodiments described below are merely illustrative.

The text matching device comprises:

A target entity category obtaining unit 401, configured to obtain a target entity category corresponding to the entity name text to be matched;

A candidate text obtaining unit 402, configured to obtain, according to the target entity category, a candidate standardized entity name text corresponding to the entity name text to be matched, where the entity category is the same as the target entity category;

And a target text matching unit 403, configured to obtain, from the candidate normalized entity name text, a target normalized entity name text that matches the entity name text to be matched according to the text similarity between the entity name text to be matched and the candidate normalized entity name text.

Optionally, the text matching device provided in the second embodiment of the present application further includes a text providing unit, configured to provide the target standardized entity name text to the user equipment.

Optionally, the text matching device provided in the second embodiment of the present application further includes a text associating unit, configured to associate the entity name text to be matched with the target standardized entity name text.

Optionally, the text association unit is specifically configured to establish a correspondence between the entity name text to be matched and the target standardized entity name text.

Optionally, the target entity category obtaining unit 401 is specifically configured to obtain a text matching instruction sent by the user equipment, where the text matching instruction carries the entity name text to be matched;

The text providing unit is specifically configured to provide the target standardized entity name text to the user equipment for the text matching instruction.

Optionally, the text matching device provided in the second embodiment of the present application further includes a text display unit, configured to display the target standardized entity name text.

Optionally, the target entity category obtaining unit 401 is specifically configured to segment the entity name text to be matched by adopting a preset word segmentation policy to obtain category keywords in the entity name text to be matched, and obtain the target entity category according to the category keywords in the entity name text to be matched.

Optionally, the candidate text obtaining unit 402 is specifically configured to obtain an associated normalized entity name text associated with the entity name to be matched according to a keyword in the entity name text to be matched, obtain an entity category of the associated normalized entity name text, and obtain the candidate normalized entity name text from the associated normalized entity name text according to the target entity category and the entity category.

Optionally, the target text matching unit 403 is specifically configured to obtain a weight of a keyword in the entity name text to be matched and a weight of a keyword in the candidate standardized entity name text, obtain a first word vector corresponding to the keyword in the entity name text to be matched and a second word vector corresponding to the keyword in the candidate standardized entity name text, obtain a word vector similarity of the first word vector and the second word vector according to the weight of the keyword in the entity name text to be matched, the weight of the keyword in the candidate standardized entity name text, the first word vector and the second word vector, and obtain the text similarity according to the word vector similarity.

Optionally, obtaining the text similarity according to the word vector similarity comprises obtaining a character string matched with the entity name text to be matched;

Optionally, the text matching device provided in the second embodiment of the present application further includes a similarity judging unit, configured to judge whether the text similarity reaches a text similarity threshold;

The target text matching unit 403 is specifically configured to obtain, from the candidate normalized entity name texts, the candidate normalized entity name text whose text similarity reaches the text similarity threshold, as a target normalized entity name text, if the result of the similarity determination by the similarity determination unit is yes.

Optionally, the text matching device provided in the second embodiment of the present application further includes a result determining unit, configured to determine that the target standardized entity name text does not exist in the candidate standardized entity name text if the determination result of the similarity determining unit is negative.

Third embodiment

The third embodiment of the present application further provides an electronic device, corresponding to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment. Since the third embodiment is substantially similar to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment, its description is relatively brief; for relevant points, please refer to the corresponding description of the application scenario of the text matching method provided by the embodiment of the present application and of the text matching method provided by the first embodiment. The third embodiment described below is merely illustrative.

Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the application.

The electronic device includes a processor 501;

And a memory 502 for storing a program of the information processing method, the apparatus, after being powered on and running the program of the information processing method by the processor, performs the steps of:

It should be noted that, for the detailed description of the electronic device provided in the third embodiment of the present application, reference may be made to the application scenario of the text matching method provided in the embodiment of the present application and the related description of the text matching method provided in the first embodiment, which are not repeated herein.

Fourth embodiment

The fourth embodiment of the present application further provides a storage medium corresponding to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment. Since the fourth embodiment is substantially similar to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment, its description is relatively brief; for relevant points, please refer to the corresponding description of the application scenario of the text matching method provided by the embodiment of the present application and of the text matching method provided by the first embodiment. The fourth embodiment described below is merely illustrative.

The storage medium stores a computer program that is executed by a processor to perform the steps of:

It should be noted that, for the detailed description of the storage medium provided in the fourth embodiment of the present application, reference may be made to the application scenario of the text matching method provided in the embodiment of the present application and the related description of the text matching method provided in the first embodiment, which are not repeated here.

Fifth embodiment

The fifth embodiment of the present application further provides another text matching method corresponding to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment. Since the fifth embodiment is basically similar to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment, its description is relatively brief; for relevant points, please refer to the corresponding description of the application scenario of the text matching method provided by the embodiment of the present application and of the text matching method provided by the first embodiment. The method embodiments described below are merely illustrative.

A text matching method provided in a fifth embodiment of the present application is described below with reference to fig. 6.

Fig. 6 is a flowchart of a text matching method according to a fifth embodiment of the present application, which includes steps S601 to S604.

In step S601, a target entity category corresponding to the entity name text to be matched for describing the government service material name is obtained.

In the fifth embodiment of the present application, the entity name text is a text for describing the name of an entity in a target text, and the target text is generally a document, a paragraph or a sentence, such as a government service item, a network service item other than a government service, an equipment operation flow, or a description of chemical experiment steps. The entity is generally the government service material required in a government service item, in which case the entity name text is the text describing the government service material name; the entity may also be the network service material required in a network service item other than a government service, in which case the entity name text is the text describing the network service material name. In addition, the entity may be another type of entity, such as equipment in an equipment operation flow, or chemicals and chemical reaction devices in a description of chemical experiment steps. That is, in the fifth embodiment of the present application, the target text and the entity are not particularly limited.

The entity category is the category of the entity described by the entity name text, and is a category divided in advance according to a preset entity category division strategy. In the specific implementation process, the target entity category is obtained as follows: first, word segmentation is performed on the entity name text to be matched using a preset word segmentation strategy to obtain the category keyword in the entity name text to be matched; then, the target entity category is obtained according to that category keyword. A category keyword is a word in the entity name text that identifies the category of the entity. For example, if the entity name texts to be matched are "some application form", "some certificate" and "some table", then "application form", "certificate" and "table" are the category keywords in those entity name texts. In the fifth embodiment of the present application, the categories divided in advance according to the preset entity category division strategy are the entity categories determined from category keywords counted in advance in entity name texts. Therefore, the "application form" class, the "certificate" class, the "table" class, and the like are the classes divided in advance according to the preset entity category division strategy.
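A minimal Python sketch of obtaining the target entity category from a category keyword is given below. It assumes that the preset categories are stored as a simple list of keywords and that the category keyword appears at the end of the entity name text, as in the examples above. A real implementation would apply the preset word segmentation strategy first; the names `CATEGORY_KEYWORDS` and `get_target_category` are hypothetical.

```python
# Hypothetical list of category keywords counted in advance.
CATEGORY_KEYWORDS = ("application form", "certificate", "table")

def get_target_category(entity_name, category_keywords=CATEGORY_KEYWORDS):
    """Return the category keyword that ends the entity name text, or None.
    Longer keywords are tried first so that the most specific one wins."""
    name = entity_name.strip().lower()
    for keyword in sorted(category_keywords, key=len, reverse=True):
        if name.endswith(keyword):
            return keyword
    return None

for text in ("some application form", "some certificate", "some table"):
    print(text, "->", get_target_category(text))
```

Returning `None` for a name without a known category keyword leaves it to the caller to decide how such a text is handled.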

In step S602, according to the target entity category, a candidate standardized entity name text corresponding to the entity name text to be matched, the entity category of which is the same as the target entity category, is obtained.

In the fifth embodiment of the present application, the candidate standardized entity name text corresponding to the entity name text to be matched is a standardized entity name text that is associated with a keyword in the entity name text to be matched and is obtained based on that keyword. In the specific implementation process, a BM25 recall strategy is adopted for the entity name text to be matched, and an ES tool is used to quickly recall, from preset standardized entity name text data, the associated standardized entity name texts associated with the entity name to be matched.
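To make the BM25 recall step concrete, the following Python sketch scores tokenized standardized entity name texts against a tokenized query with a common form of the BM25 formula and returns the indices of the best-scoring texts. It is a self-contained illustration, not the ES tool mentioned above: the function names, the parameter values `k1=1.2` and `b=0.75`, and the whitespace tokenization in the demonstration are assumptions.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """BM25 score of every tokenized document for the tokenized query."""
    n_docs = len(docs_tokens)
    if n_docs == 0:
        return []
    avg_len = (sum(len(doc) for doc in docs_tokens) / n_docs) or 1.0
    doc_freq = Counter()
    for doc in docs_tokens:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            norm = term_freq[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def recall(query_tokens, docs_tokens, top_k=3):
    """Indices of up to top_k documents with a positive score, best first."""
    scores = bm25_scores(query_tokens, docs_tokens)
    ranked = sorted(range(len(docs_tokens)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:top_k] if scores[i] > 0]

standardized = ["resident identity card", "household register", "marriage certificate"]
docs = [name.split() for name in standardized]
hits = recall("personal identity card".split(), docs)
print([standardized[i] for i in hits])
```

The recalled texts would then be filtered by entity category, as in step S602, before the text similarity is computed.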

In step S603, a target normalized entity name text matching the entity name text to be matched is obtained from the candidate normalized entity name text according to the text similarity between the entity name text to be matched and the candidate normalized entity name text.

In the implementation process, after the text similarity is obtained according to the word vector similarity and the character string similarity, it is determined whether the text similarity reaches a text similarity threshold. If it does, the candidate standardized entity name text whose text similarity reaches the threshold is obtained from the candidate standardized entity name texts as the target standardized entity name text. Specifically, among the candidate standardized entity name texts whose text similarity reaches the threshold, the one with the highest text similarity is obtained as the target standardized entity name text.

In step S604, the entity name text to be matched is associated with the target standardized entity name text.

The specific implementation mode of the association is generally to establish a corresponding relation between the entity name text to be matched and the target standardized entity name text.

Sixth embodiment

The sixth embodiment of the present application further provides another text matching method corresponding to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment. Since the sixth embodiment is basically similar to the application scenario of the text matching method provided by the embodiment of the present application and the text matching method provided by the first embodiment, its description is relatively brief; for relevant points, please refer to the corresponding description of the application scenario of the text matching method provided by the embodiment of the present application and of the text matching method provided by the first embodiment. The method embodiments described below are merely illustrative.

First, a target entity category corresponding to entity name text to be matched for describing a geographical location name is obtained.

In the sixth embodiment of the present application, the entity name text is a text for describing the name of a geographic location in the target text, and the target text is generally a document, a paragraph or a sentence, for example, a document describing geographic locations collected when creating a map, or a document describing geographic locations in social administration and city management processes. In the sixth embodiment of the present application, the target text and the entity are not particularly limited.

The entity name text to be matched is generally a non-standardized entity name text, but may also be a standardized entity name text. The standardized entity name text and the non-standardized entity name text are, respectively, a standard description text and a non-standard description text for the same entity name. Taking the "People's Republic of China" as an example, the standardized entity name text may be "People's Republic of China", and the non-standardized entity name text may be "China", and the like. Taking the "Beijing Olympic park" as an example, the standardized entity name text may be "Beijing Olympic park", and the non-standardized entity name text may be "Olympic park", and the like.

The entity category is the category of the entity described by the entity name text, and is a category divided in advance according to a preset entity category division strategy. In the specific implementation process, the target entity category is obtained as follows: first, word segmentation is performed on the entity name text to be matched using a preset word segmentation strategy to obtain the category keyword in the entity name text to be matched; then, the target entity category is obtained according to that category keyword. A category keyword is a word in the entity name text that identifies the category of the entity. Specifically, taking a geographic location as the entity to be matched, if the entity name texts to be matched are "somewhere country", "somewhere province", "somewhere city" and "somewhere mountain", then "country", "province", "city" and "mountain" are the category keywords in those entity name texts. In the sixth embodiment of the present application, the categories divided in advance according to the preset entity category division strategy are the entity categories determined from category keywords counted in advance in entity name texts. Therefore, the "country" class, the "prefecture" class, the "mountain" class, the "district" class, and the like are classes divided in advance according to the preset entity category division strategy.

And secondly, according to the target entity category, obtaining a candidate standardized entity name text corresponding to the entity name text to be matched, wherein the entity category of the candidate standardized entity name text is the same as the target entity category.

In the sixth embodiment of the present application, the candidate standardized entity name text corresponding to the entity name text to be matched is a standardized entity name text that is associated with a keyword in the entity name text to be matched and is obtained based on that keyword. In the specific implementation process, a BM25 recall strategy is adopted for the entity name text to be matched, and an ES tool is used to quickly recall, from preset standardized entity name text data, the associated standardized entity name texts associated with the entity name to be matched.

And thirdly, obtaining a target standardized entity name text matched with the entity name text to be matched from the candidate standardized entity name text according to the text similarity between the entity name text to be matched and the candidate standardized entity name text.

And finally, associating the entity name text to be matched with the target standardized entity name text.

While the application has been described in terms of preferred embodiments, it is not intended to be limiting, but rather, it will be apparent to those skilled in the art that various changes and modifications can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

2. It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A text matching method, comprising:

Obtaining, according to the target entity category, candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category, wherein obtaining the candidate standardized entity name texts corresponding to the entity name text to be matched comprises obtaining, according to keywords in the entity name text to be matched, associated standardized entity name texts associated with the entity name to be matched;

2. The text matching method according to claim 1, further comprising providing the target standardized entity name text to a user device.

3. The text matching method according to claim 1 or 2, further comprising associating the entity name text to be matched with the target standardized entity name text.

4. The text matching method according to claim 3, wherein the associating the entity name text to be matched with the target standardized entity name text includes establishing a correspondence between the entity name text to be matched and the target standardized entity name text.

5. The text matching method according to claim 2, wherein the obtaining the target entity category corresponding to the entity name text to be matched includes obtaining a text matching instruction sent by the user device, wherein the text matching instruction carries the entity name text to be matched;

The providing the target standardized entity name text to the user device comprises providing the target standardized entity name text to the user device in response to the text matching instruction.

6. The text matching method according to claim 1, further comprising presenting the target standardized entity name text.

7. The text matching method according to claim 1, wherein the obtaining the target entity category corresponding to the entity name text to be matched includes:

8. The text matching method according to claim 1, wherein the obtaining, from the candidate standardized entity name text, a target standardized entity name text that matches the to-be-matched entity name text according to the text similarity between the to-be-matched entity name text and the candidate standardized entity name text, includes:

And obtaining the text similarity according to the word vector similarity.

9. The text matching method according to claim 8, wherein said obtaining said text similarity from said word vector similarity comprises:

Obtaining a character string matched with the entity name text to be matched;

10. The text matching method according to claim 9, wherein the obtaining the text similarity according to the word vector similarity and the character string similarity includes weighting the word vector similarity and the character string similarity according to a preset first similarity weight corresponding to the word vector similarity and a preset second similarity weight corresponding to the character string similarity, so as to obtain the text similarity.
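By way of non-limiting illustration only, the weighting recited in claims 9 and 10 might be sketched as follows. The claims do not fix a particular similarity measure or weight values; the cosine measure, the use of Python's `difflib.SequenceMatcher` for character string similarity, the function names, and the default weights 0.6 and 0.4 are all assumptions made for this sketch.

```python
import math
from difflib import SequenceMatcher
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Word vector similarity, assumed here to be cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # undefined for zero vectors; treated as no similarity
    return dot / (norm_u * norm_v)


def string_similarity(a: str, b: str) -> float:
    """Character string similarity, assumed here to be difflib's ratio."""
    return SequenceMatcher(None, a, b).ratio()


def text_similarity(word_vector_sim: float, string_sim: float,
                    first_weight: float = 0.6,
                    second_weight: float = 0.4) -> float:
    """Weighted sum of the two similarities (weights are hypothetical)."""
    return first_weight * word_vector_sim + second_weight * string_sim
```

With weights that sum to 1 and both inputs in the range [0, 1], the resulting text similarity also stays in [0, 1], which makes it directly comparable against a single threshold.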

11. The text matching method according to claim 1, further comprising determining whether the text similarity reaches a text similarity threshold;

12. The text matching method according to claim 11, wherein the obtaining, from the candidate standardized entity name texts, the candidate standardized entity name text whose text similarity reaches the text similarity threshold as the target standardized entity name text includes obtaining, from the candidate standardized entity name texts, the candidate standardized entity name text whose text similarity reaches the text similarity threshold and is the highest as the target standardized entity name text.

13. The text matching method according to claim 11, further comprising determining that the target standardized entity name text does not exist in the candidate standardized entity name texts if the text similarity does not reach the text similarity threshold.
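By way of non-limiting illustration only, the threshold test and highest-similarity selection of claims 11 to 13 might be sketched as follows. The function name `select_target`, the representation of candidates as (text, similarity) pairs, and the reading of "reaches the threshold" as greater than or equal to it are assumptions made for this sketch.

```python
from typing import Iterable, Optional, Tuple


def select_target(scored_candidates: Iterable[Tuple[str, float]],
                  threshold: float) -> Optional[str]:
    """Return the candidate text with the highest similarity that reaches
    the threshold, or None when no candidate reaches it."""
    best_text: Optional[str] = None
    best_score = float("-inf")
    for text, score in scored_candidates:
        # Only candidates that reach the threshold are eligible.
        if score >= threshold and score > best_score:
            best_text, best_score = text, score
    return best_text
```

A return value of `None` corresponds to the case in claim 13, where no target standardized entity name text exists among the candidates.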

14. A text matching apparatus, comprising:

The candidate text obtaining unit is configured to obtain, according to the target entity category, candidate standardized entity name texts that correspond to the entity name text to be matched and whose entity category is the same as the target entity category, wherein obtaining the candidate standardized entity name texts comprises obtaining, according to keywords in the entity name text to be matched, associated standardized entity name texts associated with the entity name to be matched;

15. An electronic device, comprising:

A processor; and

16. A storage medium storing a program for a text matching method, the program being executed by a processor to perform the steps of:

17. A government service text matching method, comprising:

18. An address text matching method, comprising: