
CN108415903A - Evaluation method, storage medium and device for judging the effectiveness of search intent recognition - Google Patents


Info

Publication number
CN108415903A
Authority
CN
China
Prior art keywords
intent
word
search
domain
participle
Prior art date
Legal status
Granted
Application number
CN201810202366.5A
Other languages
Chinese (zh)
Other versions
CN108415903B (en)
Inventor
王璐
陈少杰
张文明
Current Assignee
Beisu Information Technology Nanjing Co ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date 2018-03-12
Filing date 2018-03-12
Publication date 2018-08-17
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201810202366.5A
Publication of CN108415903A
Application granted
Publication of CN108415903B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an evaluation method for judging the effectiveness of search intent recognition, comprising the following steps: S1, obtaining the total number of word segments in the search intent recognition process to be evaluated and the information of each word segment; S2, calculating, according to the information, the summed term frequency and the inverse document frequency corresponding to each word segment; S3, calculating an effectiveness score of the search intent recognition process to be evaluated from the summed term frequency and inverse document frequency of each word segment obtained in step S2, and then judging whether the search intent recognition process is effective. The invention also relates to a corresponding computer-readable storage medium and electronic device.

Description

Evaluation method, storage medium and device for judging the effectiveness of search intent recognition

Technical field

The invention relates to the field of big data search, and in particular to an evaluation method for judging the effectiveness of search intent recognition, and to a related storage medium and electronic device.

Background

On a live-streaming platform, a user's real intent can be inferred from their search query, and more accurate search results can be returned based on that intent. However, the extent to which such results reflect the user's real intent needs to be measured: if the relevance is poor, the weak intent that was recognized has very little practical value. The problem to be solved is therefore how to measure the relevance of intent matching, and thereby judge the effectiveness of intent recognition.

Unlike scenarios where results are returned directly by text matching on the search terms, the results returned by an intent recognition algorithm may have no textual relation to the search terms at all, so measuring relevance by text edit distance is very one-sided.

Therefore, a new evaluation method for judging the effectiveness of search intent recognition is needed.

Summary of the invention

In view of this, to overcome at least one aspect of the above problems, embodiments of the present invention provide a TF-IDF-based evaluation method for judging the effectiveness of search intent recognition.

According to one aspect of the present invention, an evaluation method for judging the effectiveness of search intent recognition is provided, comprising the steps:

S1, obtaining the total number of word segments in the search intent recognition process to be evaluated and the information of each word segment;

S2, calculating, according to the information, the summed term frequency and the inverse document frequency corresponding to each word segment;

S3, calculating the effectiveness score of the search intent recognition process to be evaluated using the summed term frequency and inverse document frequency of each word segment obtained in step S2, and then judging whether the search intent recognition process is effective.

For example, the information includes: the intent domains matched by each word segment, where each intent domain has a preset weight; the number of times each word segment is matched in those intent domains; and the total number of searches made by users within a preset time period, together with the number of those searches that contain each word segment.

For example, the summed term frequency $TF(t_i)$ corresponding to each word segment $t_i$ is calculated according to the following formula:

$$TF(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f} \cdot w_f$$

where $H$ is the set of matched intent domains, composed of several different intent domains, and $f$ is one of them;

$c_f(t_i)$ is the number of times the word segment $t_i$ can be matched in intent domain $f$;

$n_f$ is the number of words in intent domain $f$;

$w_f$ is the weight of intent domain $f$.

For example, the inverse document frequency $IDF(t_i)$ corresponding to each word segment $t_i$ is calculated according to the following formula:

$$IDF(t_i) = \log\frac{N}{N(t_i)}$$

where $N$ is the total number of searches made by users within the preset time period, $N(t_i)$ is the number of searches containing the word segment $t_i$, and log is the natural logarithm.

For example, the effectiveness score $R$ of the search intent recognition process to be evaluated is calculated according to the following formula:

$$R = \sum_{i=1}^{n} TF(t_i) \cdot IDF(t_i)$$

where $n$ is the total number of word segments, and $TF(t_i)$ and $IDF(t_i)$ are the summed term frequency and the inverse document frequency of word segment $t_i$.

Further, step S3 further includes:

comparing the effectiveness score with a preset threshold: if the score is greater than the threshold, the search intent recognition process is judged effective; if the score is less than the threshold, it is judged ineffective.

The present invention also provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the steps of any of the evaluation methods for judging the effectiveness of search intent recognition described above.

The present invention also provides an electronic device, comprising:

a memory for storing executable instructions; and

a processor for executing the executable instructions stored in the memory, so as to implement the steps of any of the evaluation methods for judging the effectiveness of search intent recognition described above.

Compared with the prior art, the present invention can judge whether search intent recognition is effective even when the returned results have no textual relation to the query, solving the problem that traditional relevance evaluation methods such as text edit distance cannot be applied.

Description of drawings

Other objects and advantages of the present invention will become apparent from the following description with reference to the accompanying drawings, which may help provide a comprehensive understanding of the invention.

FIG. 1 is a flowchart of the steps of the evaluation method for judging the effectiveness of search intent recognition provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of the computer-readable storage medium provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions and advantages of the present invention clearer, the technical solution is described clearly and completely below with reference to the accompanying drawings of the embodiments. Evidently, the described embodiment is one embodiment of the invention rather than all embodiments. All other embodiments obtained by persons of ordinary skill in the art based on the described embodiments without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, technical or scientific terms used herein have the ordinary meanings understood by persons of ordinary skill in the art to which the invention belongs.

Herein, the term search intent refers to the content the user actually intends to search for, as inferred from the user's search query phrase.

The term intent domain refers to a set of intents into which users' search intents are divided based on business experience; common examples in live-streaming search are the streamer intent domain and the category-section intent domain. Each intent domain consists of a number of index terms.

According to one aspect of the present invention, an evaluation method for judging the effectiveness of search intent recognition is provided; the implementation idea is as follows:

Based on TF-IDF, the term frequencies of all word segments obtained by segmenting the query during search intent recognition are computed, together with how frequently those segments occur in searches within a preset time period. From these an evaluation score for the search intent recognition process is obtained, which is used to judge whether that recognition process is effective.

More specifically, the evaluation method of the present invention for judging the effectiveness of search intent recognition is described in detail below with reference to the drawings.

Referring to FIG. 1, the evaluation method for judging the effectiveness of search intent recognition provided by an embodiment of the present invention may include the following steps:

S1, obtaining the total number of word segments in the search intent recognition process to be evaluated and the information of each word segment;

In this embodiment, the information of each word segment may include the intent domains that the segment matches. The number and kind of intent domains matched can differ between segments: for example, segment $t_1$ may match intent domain A, while segment $t_2$ matches intent domains B and C. Each intent domain has a preset weight, which can be set based on prior business experience.

The information of each word segment may also include the number of times the segment matches words in each of its matched intent domains. For example, segment $t_1$ may match words in intent domain A 5 times, while segment $t_2$ matches words in intent domain B 2 times and words in intent domain C 3 times.

The information of each word segment may also include, for a preset time period, the total number of searches made by users and the number of those searches that contain the segment. In this embodiment the preset time period may be 30 days; other embodiments may use other lengths of time.

For example, within 30 days all users performed a total of 100,000 searches, of which 100 contained segment $t_1$ and 200 contained segment $t_2$.
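One way to hold the per-segment information gathered in step S1 is sketched below in Python. The class and field names (`IntentDomain`, `SegmentInfo`, `RecognitionRecord`) are hypothetical and not part of the patent. The match counts and search counts come from the example above; the domain sizes and weights are borrowed from the worked example later in this description, since this paragraph does not give them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IntentDomain:
    """An intent domain f with its size n_f and preset weight w_f."""
    num_words: int   # n_f: number of index words in the domain
    weight: float    # w_f: weight set from business experience


@dataclass
class SegmentInfo:
    """Information about one word segment t_i (hypothetical layout)."""
    term: str
    # match count per matched domain name; segments may match different domains
    matches: Dict[str, int] = field(default_factory=dict)
    # N(t_i): searches in the preset window that contain this segment
    searches_containing: int = 0


@dataclass
class RecognitionRecord:
    """Everything step S1 collects for one recognition run."""
    domains: Dict[str, IntentDomain]
    segments: List[SegmentInfo]
    total_searches: int  # N: all searches in the preset window (30 days here)

    @property
    def num_segments(self) -> int:
        """n: total number of word segments."""
        return len(self.segments)


# Example from the text: t1 matches A 5 times; t2 matches B 2 times and C 3 times.
record = RecognitionRecord(
    domains={
        "A": IntentDomain(num_words=1000, weight=1.0),
        "B": IntentDomain(num_words=500, weight=0.5),
        "C": IntentDomain(num_words=100, weight=0.8),
    },
    segments=[
        SegmentInfo("t1", {"A": 5}, searches_containing=100),
        SegmentInfo("t2", {"B": 2, "C": 3}, searches_containing=200),
    ],
    total_searches=100_000,
)
```

Keeping the match counts as a per-segment dictionary reflects the point above that different segments can match different numbers and kinds of intent domains.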

S2, calculating, according to the information, the summed term frequency and the inverse document frequency corresponding to each word segment;

In this embodiment, the summed term frequency $TF(t_i)$ corresponding to each word segment $t_i$ can be calculated according to the following formula:

$$TF(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f} \cdot w_f$$

where $H$ is the set of intent domains matched by the word segments, composed of several different intent domains, and $f$ is one of them;

$c_f(t_i)$ is the number of times the word segment $t_i$ can be matched in intent domain $f$;

$n_f$ is the number of words in intent domain $f$;

$w_f$ is the weight of intent domain $f$.
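The summed term frequency can be sketched as a small Python function, assuming (hypothetically) that match counts, domain sizes and weights are passed as dictionaries keyed by domain name. Only the domains a segment actually matched contribute, which is equivalent to summing over all domains with a match count of zero elsewhere. The sample values are those of $t_2$ in the worked example in this description.

```python
from typing import Dict


def summed_tf(matches: Dict[str, int],
              domain_sizes: Dict[str, int],
              domain_weights: Dict[str, float]) -> float:
    """Sum over matched domains f of (matches in f) / n_f * w_f."""
    total = 0.0
    for domain, count in matches.items():
        # normalise the match count by the domain size n_f, then apply weight w_f
        total += count / domain_sizes[domain] * domain_weights[domain]
    return total


sizes = {"A": 1000, "B": 500, "C": 100}
weights = {"A": 1.0, "B": 0.5, "C": 0.8}

tf_t2 = summed_tf({"B": 2, "C": 1}, sizes, weights)  # 2/500*0.5 + 1/100*0.8 = 0.01
```

Dividing by $n_f$ keeps a large domain from dominating simply because it contains more index words; the weight $w_f$ then encodes how much the business trusts that domain.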

In this embodiment, the inverse document frequency $IDF(t_i)$ corresponding to each word segment $t_i$ can be calculated according to the following formula:

$$IDF(t_i) = \log\frac{N}{N(t_i)}$$

where $N$ is the total number of searches made by users within the preset time period, $N(t_i)$ is the number of searches containing the word segment $t_i$, and log is the natural logarithm.
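A minimal Python sketch of the inverse document frequency follows; `math.log` with a single argument is the natural logarithm, as the text requires. Rejecting a zero search count is an assumption of this sketch: the method does not say what happens when no search in the window contains the segment, and the ratio is undefined in that case.

```python
import math


def idf(total_searches: int, searches_with_term: int) -> float:
    """ln(N / N(t_i)) for one word segment t_i."""
    if searches_with_term <= 0:
        # Not covered by the method: N / N(t_i) is undefined for unseen segments.
        raise ValueError("N(t_i) must be positive")
    return math.log(total_searches / searches_with_term)


idf_t1 = idf(100_000, 100)   # ln(1000), about 6.908
idf_t2 = idf(100_000, 200)   # ln(500), about 6.215
```

A segment that appears in every search gets an IDF of zero, so common words contribute nothing to the score regardless of how often they match an intent domain.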

S3, calculating the effectiveness score of the search intent recognition process to be evaluated using the summed term frequency and inverse document frequency of each word segment obtained in step S2, and then judging whether the search intent recognition process is effective.

In this embodiment, the effectiveness score $R$ of the search intent recognition process to be evaluated can be calculated according to the following formula:

$$R = \sum_{i=1}^{n} TF(t_i) \cdot IDF(t_i)$$

where $n$ is the total number of word segments, and $TF(t_i)$ and $IDF(t_i)$ are the summed term frequency and the inverse document frequency of word segment $t_i$ obtained in step S2.
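Putting S2 and S3 together, the score can be sketched in Python as below. Representing each segment as a tuple of (match counts per domain, number of searches containing the segment) is a hypothetical input format chosen for brevity. The sum is not divided by $n$, which agrees with the worked example in this description, where the two segment contributions are simply added.

```python
import math
from typing import Dict, List, Tuple


def effectiveness_score(segments: List[Tuple[Dict[str, int], int]],
                        domain_sizes: Dict[str, int],
                        domain_weights: Dict[str, float],
                        total_searches: int) -> float:
    """R = sum over the n segments of (summed TF) * (IDF).

    Each segment is (match count per matched domain, N(t_i)).
    """
    score = 0.0
    for matches, searches_with_term in segments:
        # summed term frequency: matches / n_f * w_f over the matched domains
        tf = sum(count / domain_sizes[f] * domain_weights[f]
                 for f, count in matches.items())
        idf = math.log(total_searches / searches_with_term)  # natural log
        score += tf * idf
    return score


r = effectiveness_score(
    segments=[({"A": 5}, 100), ({"B": 2, "C": 1}, 200)],
    domain_sizes={"A": 1000, "B": 500, "C": 100},
    domain_weights={"A": 1.0, "B": 0.5, "C": 0.8},
    total_searches=100_000,
)
print(round(r, 4))  # 0.0967
```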

In a further preferred embodiment, step S3 may further include:

comparing the effectiveness score with a preset threshold: if the score is greater than the threshold, the search intent recognition process can be judged effective; if the score is less than the threshold, it can be judged ineffective.
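The threshold decision can be sketched as below. The method does not give a threshold value, so the 0.05 used in the usage line is purely illustrative, and it does not say what happens when the score equals the threshold; this sketch returns `None` in that case rather than guessing.

```python
from typing import Optional


def judge_effective(score: float, threshold: float) -> Optional[bool]:
    """Compare the effectiveness score R with a preset threshold.

    True means effective (R > threshold), False means ineffective
    (R < threshold), None means R == threshold, a case the method leaves open.
    """
    if score > threshold:
        return True
    if score < threshold:
        return False
    return None


decision = judge_effective(0.0967, threshold=0.05)  # hypothetical threshold
```

In practice the threshold would presumably be tuned on historical recognitions whose usefulness is already known; the method itself leaves its choice to the implementer.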

A practical example is given below to illustrate how the present invention evaluates the effectiveness of search intent recognition.

Suppose there are three intent domains, whose word counts and weights are:

Intent domain A: $n_A = 1000$, $w_A = 1.0$

Intent domain B: $n_B = 500$, $w_B = 0.5$

Intent domain C: $n_C = 100$, $w_C = 0.8$

In one recognition run, the query is segmented into two words, $t_1$ and $t_2$,

where $t_1$ matches words in intent domain A 5 times, and $t_2$ matches words in intent domain B 2 times and words in intent domain C once.

Within 30 days, users made a total of 100,000 searches, 100 of which contain $t_1$ and 200 of which contain $t_2$.

The intent-matching relevance score for this search is then:

$$R = \frac{5}{1000} \times 1.0 \times \log\frac{100000}{100} + \left(\frac{2}{500} \times 0.5 + \frac{1}{100} \times 0.8\right) \times \log\frac{100000}{200} \approx 0.0967$$

0.0967 is then compared with the preset threshold to judge whether this search intent recognition is effective.
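The arithmetic above can be checked term by term with the short script below. It also recomputes the score with base-10 logarithms, which gives about 0.0420 instead, so the stated 0.0967 is consistent only with the natural logarithm specified for the inverse document frequency.

```python
import math

# Summed term frequencies of the two segments in the worked example
tf_t1 = 5 / 1000 * 1.0
tf_t2 = 2 / 500 * 0.5 + 1 / 100 * 0.8

ln_score = tf_t1 * math.log(100_000 / 100) + tf_t2 * math.log(100_000 / 200)
log10_score = tf_t1 * math.log10(100_000 / 100) + tf_t2 * math.log10(100_000 / 200)

print(f"t1 contributes {tf_t1 * math.log(1000):.4f}")   # 0.0345
print(f"t2 contributes {tf_t2 * math.log(500):.4f}")    # 0.0621
print(f"R with natural log: {ln_score:.4f}")            # 0.0967
print(f"R with base-10 log: {log10_score:.4f}")         # 0.0420
```

Although $t_1$ has more raw matches, $t_2$ contributes more because its matches fall in the small, fairly heavily weighted domain C.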

The evaluation method proposed in this embodiment addresses the problem that traditional relevance evaluation methods cannot be applied here, and allows the effectiveness of intent recognition to be judged more scientifically and effectively.

Based on the same inventive concept, and referring to FIG. 2, an embodiment of the present invention also provides a computer-readable storage medium 201 storing executable instructions 202 which, when executed by one or more processors, implement the steps of the evaluation method for judging the effectiveness of search intent recognition of the above embodiment.

Based on the same inventive concept, and referring to FIG. 3, an embodiment of the present invention also provides an electronic device 301, which may include:

a memory 310 for storing executable instructions 311; and

a processor 320 for executing the executable instructions 311 stored in the memory 310, so as to implement the steps of any evaluation method for judging the effectiveness of search intent recognition of the above embodiments.

It should also be noted that, where they do not conflict, the embodiments of the present invention and the features in them may be combined with each other to obtain new embodiments.

Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present invention. Although the invention has been described in detail with reference to preferred embodiments, persons of ordinary skill in the art should understand that the technical solutions may be modified or equivalently replaced without departing from their spirit and scope.

Claims (8)

1. An evaluation method for judging the effectiveness of search intent recognition, comprising the following steps: S1, obtaining the total number of word segments in the search intent recognition process to be evaluated and the information of each word segment; S2, calculating, according to the information, the summed term frequency and the inverse document frequency corresponding to each word segment; S3, calculating the effectiveness score of the search intent recognition process to be evaluated using the summed term frequency and inverse document frequency of each word segment obtained in step S2, and then judging whether the search intent recognition process is effective.

2. The method according to claim 1, wherein the information includes: the intent domains matched by each word segment, each intent domain having a preset weight; the number of times each word segment is matched in the intent domains; and the total number of searches made by users within a preset time period, together with the number of those searches that contain each word segment.

3. The method according to claim 2, wherein the summed term frequency $TF(t_i)$ corresponding to each word segment $t_i$ is calculated according to:

$$TF(t_i) = \sum_{f \in H} \frac{c_f(t_i)}{n_f} \cdot w_f$$

where $H$ is the set of matched intent domains, composed of several different intent domains, and $f$ is one of them; $c_f(t_i)$ is the number of times the word segment $t_i$ can be matched in intent domain $f$; $n_f$ is the number of words in intent domain $f$; and $w_f$ is the weight of intent domain $f$.

4. The method according to claim 3, wherein the inverse document frequency $IDF(t_i)$ corresponding to each word segment $t_i$ is calculated according to:

$$IDF(t_i) = \log\frac{N}{N(t_i)}$$

where $N$ is the total number of searches made by users within the preset time period, $N(t_i)$ is the number of searches containing the word segment $t_i$, and log is the natural logarithm.

5. The method according to claim 4, wherein the effectiveness score $R$ of the search intent recognition process to be evaluated is calculated according to:

$$R = \sum_{i=1}^{n} TF(t_i) \cdot IDF(t_i)$$

where $n$ is the total number of word segments.

6. The method according to any one of claims 1-5, wherein step S3 further comprises: comparing the effectiveness score with a preset threshold; if the score is greater than the threshold, judging that the search intent recognition process is effective; if the score is less than the threshold, judging that it is ineffective.

7. A computer-readable storage medium storing executable instructions, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-6.

8. An electronic device, comprising: a memory for storing executable instructions; and a processor for executing the executable instructions stored in the memory, so as to implement the steps of the method according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810202366.5A CN108415903B (en) 2018-03-12 2018-03-12 Evaluation method, storage medium and device for judging the effectiveness of search intent recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810202366.5A CN108415903B (en) 2018-03-12 2018-03-12 Evaluation method, storage medium and device for judging the effectiveness of search intent recognition

Publications (2)

Publication Number Publication Date
CN108415903A 2018-08-17
CN108415903B 2021-09-07

Family

ID=63131129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810202366.5A Active CN108415903B (en) 2018-03-12 2018-03-12 Evaluation method, storage medium and device for judging the effectiveness of search intent recognition

Country Status (1)

Country Link
CN (1) CN108415903B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101661474A (en) * 2008-08-26 2010-03-03 华为技术有限公司 Search method and system
CN101820592A (en) * 2009-02-27 2010-09-01 华为技术有限公司 Method and device for mobile search
CN102768679A (en) * 2012-06-25 2012-11-07 深圳市汉络计算机技术有限公司 A search method and search system
CN102999521A (en) * 2011-09-15 2013-03-27 北京百度网讯科技有限公司 Method and device for identifying search requirement
CN103186574A (en) * 2011-12-29 2013-07-03 北京百度网讯科技有限公司 Method and device for generating searching result
CN103246681A (en) * 2012-02-13 2013-08-14 腾讯科技(深圳)有限公司 Search method and search device
CN103823906A (en) * 2014-03-19 2014-05-28 北京邮电大学 Multi-dimension searching sequencing optimization algorithm and tool based on microblog data
US20140258283A1 (en) * 2013-03-11 2014-09-11 Hon Hai Precision Industry Co., Ltd. Computing device and file searching method using the computing device
CN104679778A (en) * 2013-11-29 2015-06-03 腾讯科技(深圳)有限公司 Search result generating method and device
CN104838375A (en) * 2012-11-13 2015-08-12 微软技术许可有限责任公司 Intent-based presentation of search results
CN105005589A (en) * 2015-06-26 2015-10-28 腾讯科技(深圳)有限公司 Text classification method and text classification device
US20160140231A1 (en) * 2014-11-18 2016-05-19 Oracle International Corporation Term selection from a document to find similar content
CN106021626A (en) * 2016-07-27 2016-10-12 成都四象联创科技有限公司 Data search method based on data mining
CN106502980A (en) * 2016-10-09 2017-03-15 武汉斗鱼网络科技有限公司 A kind of search method and system based on text morpheme cutting
CN106959971A (en) * 2016-01-12 2017-07-18 阿里巴巴集团控股有限公司 The processing method and processing device of user behavior data
CN107133259A (en) * 2017-03-22 2017-09-05 北京晓数聚传媒科技有限公司 A kind of searching method and device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ali El-Kahky: "Extending domain coverage of language understanding systems via intent transfer between domains using knowledge graphs and search query click logs", 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
Tang Xiaobo et al.: "Microblog retrieval based on semantic query expansion", Information Technology (信息技术) *

Also Published As

Publication number Publication date
CN108415903B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN104899304B (en) Name entity recognition method and device
US7961986B1 (en) Ranking of images and image labels
CN103377226B (en) A kind of intelligent search method and system thereof
CN103914494B (en) Method and system for identifying identity of microblog user
CN104199833B (en) A clustering method and clustering device for network search words
WO2021212801A1 (en) Evaluation object identification method and apparatus for e-commerce product, and storage medium
CN110457486A (en) Method and device for human entity alignment based on knowledge graph
CN110705612A (en) A hybrid multi-feature sentence similarity calculation method, storage medium and system
CN108875040A (en) Dictionary update method and computer readable storage medium
US9087122B2 (en) Corpus search improvements using term normalization
US20140032207A1 (en) Information Classification Based on Product Recognition
CN111274366B (en) Search recommendation method, device, equipment, and storage medium
CN102737112B (en) Concept Relevance Calculation Method Based on Representational Semantic Analysis
CN116089567A (en) Recommendation method, device, equipment and storage medium for search keywords
CN104462399B (en) The processing method and processing device of search result
CN108763272B (en) A kind of event information analysis method, computer readable storage medium and terminal device
CN105095188A (en) Sentence similarity computing method and device
CN110362813B (en) Search relevance measurement method, storage media, equipment and system based on BM25
CN110909532B (en) User name matching method and device, computer equipment and storage medium
CN108415903A (en) Judge evaluation method, storage medium and the equipment of search intention identification validity
CN103279545A (en) Method for preliminarily retrieving images
Ma et al. Web API discovery using semantic similarity and hungarian algorithm
CN117952079A (en) New word mining method and device, electronic equipment and storage medium
CN114328855B (en) Document retrieval methods, devices, electronic devices, and readable storage media
Krishnan et al. Towards in time music mood-mapping for drivers: A novel approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250523

Address after: 210000 Jiangsu Province Nanjing City Qinhuai District Shimenkan 104 Modern Service Building A Tower 501 Room A-02 Room

Patentee after: Beisu Information Technology Nanjing Co.,Ltd.

Country or region after: China

Address before: 430000 Wuhan Donghu Development Zone, Wuhan, Hubei Province, No. 1 Software Park East Road 4.1 Phase B1 Building 11 Building

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China