CN111814058A

CN111814058A - Push method, device, electronic device and storage medium based on user intent

Info

Publication number: CN111814058A
Application number: CN202010844662.2A
Authority: CN
Inventors: 刘曙铭
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd; Shenzhen Huantai Technology Co Ltd
Priority date: 2020-08-20
Filing date: 2020-08-20
Publication date: 2020-10-23

Abstract

Embodiments of the present application disclose a user-intent-based push method, apparatus, electronic device, and computer-readable medium, relating to the technical field of computer applications. The method includes: acquiring a search term input by a user and a set of to-be-pushed content mapping relationships, where the set includes mapping relationships between content to be pushed and keywords; obtaining, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword; calculating the similarity between the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity; and obtaining, according to the target content semantic vector, the content to be pushed that corresponds to it, and pushing that content. The user's search intent can thus be mined from the user's search terms, so that content is pushed effectively.

Description

Push method, device, electronic device and storage medium based on user intent

Technical field

The embodiments of the present application relate to the technical field of computer applications, and more particularly to a push method, apparatus, electronic device, and storage medium based on user intent.

Background

With the rapid development of the mobile era, the amount of information on the Internet keeps growing, and users often rely on search engines to find what they need among massive amounts of information. Search engine promotion is also currently the most effective Internet advertising channel. However, current advertisement delivery relies on a manually constructed label system to select suitable advertisements from a large pool. This approach depends on domain experience, is highly subjective, and cannot meet complex and changing delivery requirements. Therefore, how to accurately obtain the user's intent and push advertisements according to that intent, so as to improve push efficiency, is a problem that urgently needs to be solved.

Summary of the invention

In view of the above problems, embodiments of the present application provide a push method, apparatus, electronic device, and storage medium based on user intent, which can push content effectively.

In a first aspect, an embodiment of the present application provides a push method based on user intent. The method includes: acquiring a search term input by a user and a set of to-be-pushed content mapping relationships, where the set includes mapping relationships between content to be pushed and keywords; obtaining, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword; calculating the similarity between the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity; and obtaining, according to the target content semantic vector, the content to be pushed that corresponds to it, and pushing that content.

In a second aspect, an embodiment of the present application further provides a push apparatus based on user intent. The apparatus includes: an information acquisition module, which acquires a search term input by a user and a set of to-be-pushed content mapping relationships, where the set includes mapping relationships between content to be pushed and keywords; a vector acquisition module, which obtains, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword; a determining module, which calculates the similarity between the search semantic vector and the content semantic vector and determines a target content semantic vector from the content semantic vectors according to the similarity; and a processing module, which obtains, according to the target content semantic vector, the content to be pushed that corresponds to it, and pushes that content.

In a third aspect, an embodiment of the present application further provides an electronic device, including: one or more processors; a memory; and one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors, and the one or more application programs are configured to perform the above method.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium in which program code is stored, and the program code can be invoked by a processor to perform the above method.

Embodiments of the present application disclose a user-intent-based push method, apparatus, electronic device, and computer-readable medium, relating to the technical field of computer applications. The method includes: acquiring a search term input by a user and a set of to-be-pushed content mapping relationships, where the set includes mapping relationships between content to be pushed and keywords; obtaining, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword; calculating the similarity between the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity; and obtaining, according to the target content semantic vector, the content to be pushed that corresponds to it, and pushing that content. The user's search intent can thus be mined from the user's search terms, so that content is pushed effectively.

Description of drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, not all of them. All other embodiments and drawings obtained by persons of ordinary skill in the art based on the embodiments of the present application, without creative effort, fall within the protection scope of the present invention.

FIG. 1 shows a schematic diagram of an application environment suitable for the embodiments of the present application.

FIG. 2 shows a schematic flowchart of a user-intent-based push method provided by one embodiment of the present application.

FIG. 3 shows a schematic flowchart of a user-intent-based push method provided by another embodiment of the present application.

FIG. 4 shows a schematic flowchart of a user-intent-based push method provided by yet another embodiment of the present application.

FIG. 5 shows a schematic flowchart of a user-intent-based push method provided by still another embodiment of the present application.

FIG. 6 shows a module block diagram of a user-intent-based push apparatus provided by an embodiment of the present application.

FIG. 7 shows a block diagram of an electronic device for performing the user-intent-based push method according to the embodiments of the present application.

FIG. 8 shows a block diagram of a computer-readable storage medium for performing the user-intent-based push method according to the embodiments of the present application.

Detailed description

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit it.

With the continuous development of the Internet, the number of Internet users has grown explosively. Advertising users often search the massive information on the Internet for what they need, search engines have gradually become an indispensable tool, and advertising has increasingly come to use the Internet as its carrier. Search engines generally retrieve results according to the search text input by the user and present the results related to that text for the user to view. How to derive the user's search intent from the search text, so as to deliver advertisements effectively, has therefore become very important.

At present, however, user search intent is usually understood by using a text classification model to classify the user's search text and obtain a label for it; the association between the search text and an advertisement is then derived from the labels of the advertisements to be delivered, and delivery proceeds on that basis. The inventor studied the difficulties of this approach and found that, because advertisers' needs are complex and changeable, the corresponding label system is also extremely complex and usually requires hundreds or thousands of classification labels. A text classification model is a supervised model, so for every category, samples must be labeled from massive user search data to obtain training data. Labeling the training data therefore consumes a great deal of manpower, and the accuracy of the classification model depends on the quality and quantity of the training data.

Manually constructing a label system relies on individual domain knowledge, and the resulting human error may impair the recognition ability of the final model. A batch of keywords for each label can be obtained through data analysis, and user search text can then be labeled by keyword matching to obtain training data quickly, but this method has two drawbacks. On the one hand, words are ambiguous, so the same word may express completely different user intent in different scenarios; training data obtained by keyword matching may contain a lot of noise, and a text classification model trained on it recognizes poorly. On the other hand, the distribution of training data obtained by keyword matching is relatively narrow: it captures only part of the text that belongs under a label and easily misses much of that label's corpus space. A model trained on such data can therefore recognize only a relatively narrow range of search text, recognition is poor, and the final advertisement delivery is unsatisfactory.

Having studied the difficulties of current advertisement delivery methods and considered the delivery requirements of actual scenarios, the inventor proposes the user-intent-based push method, apparatus, electronic device, and storage medium of the present application, which mine the user's search intent from the user's search terms so that content is pushed effectively.

For a better understanding of the user-intent-based push method, apparatus, electronic device, and storage medium provided by the embodiments of the present application, an application environment suitable for the embodiments is described first.

Referring to FIG. 1, FIG. 1 shows a schematic diagram of an application environment suitable for the embodiments of the present application. The user-intent-based push method provided by the embodiments can be applied to the polymorphic interaction system 10 shown in FIG. 1. The polymorphic interaction system 10 includes a terminal device 100 and a server 200, and the server 200 is communicatively connected to the terminal device 100. The server 200 may be a traditional server or a cloud server, which is not specifically limited here.

In some embodiments, a user logs in with an account on the user terminal, and all information corresponding to that account can be stored in the storage space of the server 200. The server 200 may be a single server or a server cluster, and may be a local server or a cloud server. Multiple application programs are installed on the user terminal, and the server 200 can push content to the user terminal. Specifically, the content may be pushed to one of the application programs on the user terminal, which displays it, so that the content reaches the user of that terminal.

The server 200 can be connected to multiple user terminals. It can push the content to all of them, or it can select one of the user terminals according to some policy and push the content to the selected terminal. The specific policy may be determined according to the content to be pushed and the user corresponding to each user terminal. In the embodiments of the present application, the content to be pushed may be advertisement information, for example, product discount information from an e-commerce application.

The above application environment is only an example given for ease of understanding; the embodiments of the present application are not limited to it.

The user-intent-based push method, apparatus, terminal device, and storage medium provided by the embodiments of the present application are described in detail below through specific embodiments.

Referring to FIG. 2, FIG. 2 shows a schematic flowchart of a user-intent-based push method provided by one embodiment of the present application. The method can be applied to the server in the above system, that is, the method may be executed by the above server, and it is used to improve the accuracy of the content pushed to the user. Specifically, as shown in FIG. 2, the method includes S110 to S140.

S110: Acquire the search term input by the user and the set of to-be-pushed content mapping relationships.

The search term may be input as text, voice, or in another form. Specifically, if the user inputs text, the search term is obtained directly in text form; if the user inputs voice, the voice information is converted into a search term in text form.

The set of to-be-pushed content mapping relationships includes mapping relationships between content to be pushed and keywords. Content to be pushed is a physical or virtual item waiting to be pushed; it may take the form of text, video, or pictures, which is not limited here. For example, the content to be pushed may be news, ranking lists, or videos, or it may be recommended products, stores, or services. In the mapping relationship between pushed content and keywords, one keyword may correspond to one piece of content or to several pieces of content, which is not limited here. As one approach, the set can be stored in any of various data structures, for example as a list, table, hash table, array, or tree.
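As an illustration only, the mapping set described above could be held in a hash table keyed by keyword. The sketch below is a minimal example under that assumption; the function name `build_mapping`, the sample keywords, and the content identifiers are hypothetical and do not come from the application.

```python
from collections import defaultdict

# Hypothetical sample data: (keyword, content_id) pairs.
pairs = [
    ("wireless earbuds discount", "ad_001"),
    ("wireless earbuds discount", "ad_002"),
    ("summer travel deals", "ad_003"),
]


def build_mapping(pairs):
    """Group content identifiers by keyword.

    One keyword may map to one piece of content or to several.
    """
    mapping = defaultdict(list)
    for keyword, content_id in pairs:
        mapping[keyword].append(content_id)
    return dict(mapping)


mapping = build_mapping(pairs)
```

A hash table gives constant-time lookup from a keyword to its content, which is the lookup needed later in S140; a list or tree would serve equally well for small sets.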

In some implementations, the keyword in the mapping set may be the title of the content to be pushed, because a title usually distills and summarizes the content and can represent its main substance. The set of mapping relationships between content to be pushed and titles may be preset and stored locally on the terminal device or on the server.

In other implementations, the keyword in the mapping set may be a piece of key text from the content to be pushed. Specifically, when the content to be pushed contains text and includes an abstract, the keyword may be that abstract. If the content does not include an abstract, the keyword may be the first or last paragraph of its text, because the first paragraph usually gives an overview of the page and the last paragraph usually summarizes it.
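The keyword choices above can be sketched as a single fallback chain. This is only one possible reading: the application presents the title and the key text as alternative implementations, so trying them in a fixed order is an assumption of this sketch, as are the function name `pick_keyword` and the dictionary keys `title`, `abstract`, and `body`.

```python
def pick_keyword(content):
    """Return the text used as the keyword for one piece of content.

    `content` is a hypothetical dict with optional "title", "abstract"
    and "body" keys; the priority order below is an assumption.
    """
    if content.get("title"):
        return content["title"]
    if content.get("abstract"):
        return content["abstract"]
    # Split the body into non-empty paragraphs on blank lines.
    paragraphs = [p.strip() for p in content.get("body", "").split("\n\n") if p.strip()]
    # Fall back to the first paragraph; the last paragraph is the other
    # option named in the text and could be used instead.
    return paragraphs[0] if paragraphs else ""
```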

S120: Based on the pre-trained semantic understanding model, obtain the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword.

A semantic vector is a vector that represents the semantics of text, obtained by mapping the text into a preset vector space. Feeding the search term into the pre-trained semantic understanding model yields the corresponding search semantic vector, and feeding the keyword into the same model yields the corresponding content semantic vector. The semantic understanding model may be a supervised or an unsupervised model, and it is pre-trained on text data.

Specifically, the semantic understanding model includes, but is not limited to, one or a combination of: Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), Convolutional Neural Networks (CNN), and Transformer. In some implementations, the semantic understanding model may be Bidirectional Encoder Representations from Transformers (BERT), a word vector model (doc2Vec), or the like.

As one approach, the search semantic vector and the content semantic vector can be obtained at a specified time frequency, so as to track changes in the user's search intent and push information more effectively. The frequency can be set according to actual needs. For example, with a frequency of one day, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword are obtained every day from the pre-trained semantic understanding model.

S130: Calculate the similarity between the search semantic vector and the content semantic vector, and determine the target content semantic vector from the content semantic vectors according to the similarity.

After the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword have been obtained, the similarity between the search semantic vector and the content semantic vector can be calculated. The similarity represents the correlation between the two vectors, that is, the correlation between the search term and the keyword. Specifically, the higher the similarity, the better the keyword expresses the user intent reflected by the search term.
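The application does not name a particular similarity measure. Cosine similarity is a common choice for comparing semantic vectors, and the sketch below uses it purely as an assumption; the function name `cosine_similarity` and the plain-list vector representation are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero length, so that an empty
    embedding never matches anything.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With this measure, scores range from -1 to 1, and a higher score means the keyword is closer in meaning to the search term.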

In some embodiments, obtaining the search semantic vector and the content semantic vector from the pre-trained semantic understanding model may be performed by a server, and the resulting vectors may be stored in a database on an offline server or locally on the terminal device. The corresponding push apparatus on the offline server, or the terminal device, can then use the stored vectors to calculate the similarity between the search semantic vector and the content semantic vector in real time.

Specifically, determining the target content semantic vector according to the similarity may mean taking the content semantic vectors whose similarity satisfies a specified condition as target content semantic vectors. The specified condition depends on the actual push scenario, and there may be one target content semantic vector or several. As one approach, the specified condition may be that the similarity exceeds a particular threshold, in which case the content semantic vectors whose similarity exceeds that threshold are taken as target content semantic vectors. The specified condition may also be to take the N content semantic vectors with the highest similarity, where N is a preset integer greater than 0, in which case those N content semantic vectors are taken as target content semantic vectors.
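The two specified conditions described above (a threshold, or the N highest scores) can be sketched as follows. The sketch assumes the similarities have already been computed and are held in a dict from keyword to score; the function names and that representation are hypothetical.

```python
import heapq


def select_by_threshold(similarities, threshold):
    """Keep every keyword whose similarity is greater than the threshold."""
    return [keyword for keyword, score in similarities.items() if score > threshold]


def select_top_n(similarities, n):
    """Keep the n keywords with the highest similarity, best first."""
    return heapq.nlargest(n, similarities, key=similarities.get)
```

A threshold returns a variable number of targets and may return none, while top-N always returns up to N targets regardless of how weak the best match is; which one suits a given push scenario is a design decision the application leaves open.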

As one implementation, the keywords of the content to be pushed can be optimized according to the similarity between search semantic vectors and content semantic vectors, so as to attract users better and improve the conversion rate of the content. For example, when the content to be pushed is advertisement information, the advertiser can identify the user group the advertisement is meant to reach. By analyzing the content semantic vectors that are highly similar to that group's search semantic vectors, the keywords corresponding to the content semantic vectors that interest those users can be obtained. These keywords can then guide the advertiser in optimizing the keywords in the advertisement information and improve the advertisement's conversion.

S140: According to the target content semantic vector, obtain the content to be pushed that corresponds to the target content semantic vector, and push it.

After the target content semantic vector has been obtained, the corresponding keyword can be obtained from it, the content to be pushed that corresponds to that keyword can be looked up in the set of to-be-pushed content mapping relationships, and that content can be pushed.

As one approach, the similarity corresponding to each target content semantic vector can be obtained, and the content to be pushed can be sorted in descending order of similarity, so that content corresponding to target content semantic vectors with higher similarity is placed first, which makes selection easier for the user.
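A minimal sketch of S140 with the similarity ordering described above is given here. It assumes each target content semantic vector is identified by its keyword, that the similarities are held in a dict from keyword to score, and that the mapping set is a dict from keyword to a list of content identifiers; the function name `rank_contents` and the de-duplication step are additions of this sketch, not part of the application.

```python
def rank_contents(target_similarities, mapping):
    """Return content identifiers ordered by descending keyword similarity.

    target_similarities: keyword -> similarity of its target vector.
    mapping: keyword -> list of content identifiers to be pushed.
    """
    ranked = []
    seen = set()
    ordered = sorted(target_similarities.items(), key=lambda kv: kv[1], reverse=True)
    for keyword, _score in ordered:
        for content_id in mapping.get(keyword, []):
            # A piece of content reachable from several keywords is
            # listed once, at the position of its best keyword.
            if content_id not in seen:
                seen.add(content_id)
                ranked.append(content_id)
    return ranked
```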

As another approach, a content semantic vector can be specified in advance, and its similarity to each search semantic vector can be calculated. The search semantic vectors whose similarity satisfies the specified condition are obtained, and a push is then made to the users who entered the search terms corresponding to those search semantic vectors. The pushed information is the content to be pushed that corresponds to the specified content semantic vector.
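This reverse direction, starting from one piece of content and finding the users to push it to, can be sketched as follows. Cosine similarity and a threshold are assumed as the similarity measure and the specified condition; the function names and the `(user_id, search_vector)` log format are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def users_to_push(content_vector, search_log, threshold):
    """Return the users whose past searches are similar to the content.

    search_log: iterable of (user_id, search_vector) pairs.
    Each user appears once, in order of first matching search.
    """
    users = []
    for user_id, search_vector in search_log:
        if cosine(content_vector, search_vector) > threshold and user_id not in users:
            users.append(user_id)
    return users
```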

本申请实施例提供的基于用户意图的信息推送方法，在获取用户输入的搜索词和待推送内容映射关系集合，待推送内容映射关系集合包括待推送内容与关键词之间的映射关系后，基于预先训练的语义理解模型，分别获取搜索词对应的搜索语义向量和关键词对应的内容语义向量，计算搜索语义向量和内容语义向量的相似度，根据相似度从内容语义向量中确定目标内容语义向量，从而根据目标内容语义向量，获取与目标内容语义向量对应的待推送内容，进行推送。通过搜索语义向量和内容语义向量的相似度来获取待推送内容，可以使推送内容更加符合用户的搜索意图，进而提高推送内容的准确性。In the method for pushing information based on user intent provided by the embodiments of the present application, after obtaining a set of mapping relationships between a search term input by a user and the content to be pushed, and the set of mapping relationships between the content to be pushed includes the mapping relationship between the content to be pushed and keywords, based on The pre-trained semantic understanding model obtains the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword, calculates the similarity between the search semantic vector and the content semantic vector, and determines the target content semantic vector from the content semantic vector according to the similarity , so that according to the semantic vector of the target content, the content to be pushed corresponding to the semantic vector of the target content is obtained and pushed. Obtaining the content to be pushed by searching for the similarity between the semantic vector and the content semantic vector can make the pushed content more in line with the user's search intent, thereby improving the accuracy of the pushed content.

请参阅图3，图3示出了本申请另一个实施例提供的基于用户意图的推送方法的流程示意图，应用于上述系统中的服务器，即该方法的执行主体可以是上述的服务器，该方法用于提高为用户推送的内容的准确性，具体地，如图3所示，该方法包括：S210至S260。Please refer to FIG. 3. FIG. 3 shows a schematic flowchart of a user-intent-based push method provided by another embodiment of the present application, which is applied to the server in the above-mentioned system, that is, the execution body of the method may be the above-mentioned server. For improving the accuracy of the content pushed for the user, specifically, as shown in FIG. 3 , the method includes: S210 to S260.

S210: Acquire the search term input by the user and the set of mapping relationships of content to be pushed.

For step S210, refer to step S110.

In some embodiments, the keywords of the content to be pushed may be keywords included in the title of the content to be pushed. The user's historical search data may be obtained from terminal event-tracking logs, where the historical search data may include the user's historical search terms and the content to be pushed that the user browsed for each historical search term. In one approach, the mapping relationship between the content to be pushed and its title is obtained from the content the user browsed and is stored in the set of mapping relationships of content to be pushed. In another approach, the mapping relationship between the content to be pushed and its title is stored at the time the content to be pushed is acquired.
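
The first approach above can be sketched as follows. This is only an illustrative sketch: the record layout (a search term, a content identifier, and a title per browsed item) and the function name `build_mapping_set` are assumptions, not details given in the application.

```python
def build_mapping_set(history_records):
    """Build the content-to-title mapping set from historical search data.

    history_records: iterable of (search_term, content_id, title) tuples,
    a hypothetical shape for entries parsed from event-tracking logs.
    """
    mapping_set = {}
    for _search_term, content_id, title in history_records:
        # The title serves as the keyword text for the content to be pushed.
        mapping_set[content_id] = title
    return mapping_set
```

A dictionary keyed by content identifier keeps one title per item even when the same item was browsed under several search terms.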

S220: Obtain the feature vector of the search term and the feature vector of the keyword respectively.

In the embodiments of the present application, the semantic understanding model used to obtain the semantic vectors is a BERT model. The feature vector of the search term and the feature vector of the keyword are obtained as inputs to the BERT model. A feature vector is generated from text information by an encoder; encoding the text reduces the dimensionality of the data, but such a vector cannot take on different semantic representations depending on context. Inputting the feature vector into the BERT model yields the corresponding semantic vector, which can represent semantics according to the context of the sentence.

In some embodiments, obtaining the feature vector of the search term and the feature vector of the keyword respectively includes: obtaining the text vector, position vector, and initial word vector of the search term, and fusing them to form the feature vector of the search term; and obtaining the text vector, position vector, and initial word vector of the keyword, and fusing them to form the feature vector of the content to be pushed.

The text vector (Token Embeddings) contains the preset identifiers [CLS] and [SEP] and the hidden-layer vector representation of each character. For each search term or keyword, a [CLS] identifier is placed at the beginning to mark the start of the sentence, a [SEP] identifier separates two adjacent or parallel sentences, and a [SEP] symbol may also be placed at the end of the sentence. The initial word vector (Segment Embeddings) is used to distinguish different sentences, and the position vector (Position Embeddings) is used to represent the position of each word within the sentence.
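
The input layout described above can be illustrated with a minimal sketch. It assumes character-level tokens and uses a hypothetical helper `build_inputs`; it only produces the token sequence and the segment and position indices from which the three kinds of embeddings would be looked up, not the embeddings themselves.

```python
def build_inputs(first_sentence, second_sentence=None):
    """Return (tokens, segment_ids, position_ids) for one or two sentences."""
    # [CLS] marks the start; [SEP] closes the first sentence.
    tokens = ["[CLS]"] + list(first_sentence) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if second_sentence is not None:
        # The second sentence gets its own segment index and a closing [SEP].
        extra = list(second_sentence) + ["[SEP]"]
        tokens += extra
        segment_ids += [1] * len(extra)
    # Position indices simply count tokens from the start of the sequence.
    position_ids = list(range(len(tokens)))
    return tokens, segment_ids, position_ids
```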

Vector fusion refers to converting multiple vectors into one vector. Depending on the fusion method, it may include vector concatenation, vector addition, and so on. In one approach, in the BERT model and its variants, the position vector is learned during training on the sentences; the text vector, position vector, and initial word vector may be concatenated and then passed through a fully connected layer to produce the output. Fusing several kinds of vectors into the feature vector allows the semantics of the sentence to be perceived better in combination with its context.
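
The two fusion methods named above can be sketched in plain Python. The function names, and the `weights` and `bias` parameters standing in for a trained fully connected layer, are assumptions for illustration; real embeddings would have hundreds of dimensions.

```python
def fuse_by_addition(text_vec, position_vec, segment_vec):
    """Element-wise sum of three vectors of equal length."""
    return [t + p + s for t, p, s in zip(text_vec, position_vec, segment_vec)]


def fuse_by_concatenation(text_vec, position_vec, segment_vec, weights, bias):
    """Concatenate the three vectors, then apply one fully connected layer.

    weights: one row per output dimension, each row as long as the
    concatenated vector; bias: one value per output dimension.
    """
    concatenated = list(text_vec) + list(position_vec) + list(segment_vec)
    return [
        sum(w * x for w, x in zip(row, concatenated)) + b
        for row, b in zip(weights, bias)
    ]
```

Addition keeps the dimensionality unchanged, whereas concatenation triples it and relies on the fully connected layer to project back down.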

S230: Use the feature vector of the search term as the input of the bidirectional encoder representation network, and obtain the search semantic vector corresponding to the search term through that network.

The bidirectional encoder representation network is the Transformer-based network BERT (Bidirectional Encoder Representations from Transformers). The BERT model uses the context on both sides simultaneously and represents text with a bidirectional Transformer. A Transformer consists of an encoder and a decoder. The encoder is made up of multiple encoding modules, each containing a self-attention layer and a feed-forward neural network layer. The decoder is similar: it is made up of multiple decoding modules, each containing a self-attention layer and a feed-forward neural network layer, plus an additional encoder-decoder attention layer.

The BERT model is a typical two-stage model, divided into a pre-training stage and a fine-tuning stage. In the pre-training stage, the Transformer serves as a feature extractor that learns linguistic knowledge from massive amounts of unlabeled text and ultimately yields a representation of the text, that is, the semantic vector corresponding to the text. In the fine-tuning stage, the model is fine-tuned on the basis of the semantic knowledge learned during pre-training so that it adapts to the actual downstream business requirements. In this embodiment, the pre-training stage of the BERT model is used: the feature vector that fuses the text vector, position vector, and initial word vector of the search term is input to the BERT model to obtain a search semantic vector that represents semantics according to the context of the sentence.

As an implementation, user search behavior data may be obtained from the event-tracking log data of a search engine application or web page. The user search behavior data includes the search term input by the user and the content to be pushed that the user clicked in the search results for that term. Because the content the user clicks reflects the user's search intent, the search term and the clicked content can be used as weakly supervised data to train an ALBERT model, which learns the association between the user's search terms and the content to be pushed. This association is matched through full interaction during model training, so a better text representation can be obtained.
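
As a rough sketch of how such weakly supervised data might be assembled, the following pairs each search term with the titles the user clicked. The record fields `query` and `clicked_titles` are hypothetical names; the application does not specify the log format.

```python
def build_training_pairs(log_records):
    """Turn click-log records into (search_term, clicked_title) pairs.

    log_records: iterable of dicts with the hypothetical keys
    'query' (the search term) and 'clicked_titles' (titles the user clicked).
    """
    pairs = []
    for record in log_records:
        for title in record["clicked_titles"]:
            # A click is treated as a weak positive label for the pair.
            pairs.append((record["query"], title))
    return pairs
```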

As an implementation, the ALBERT model, a variant of the BERT model, may be used. Unlike the BERT model, which uses a 12-layer Transformer, the ALBERT model here uses only a 4-layer Transformer and reduces the number of model parameters through factorized embedding parameterization and cross-layer parameter sharing; it has about 4 million trainable parameters and a model size of only 14M. In addition, the ALBERT model replaces the Next Sentence Prediction (NSP) task of the BERT model with Sentence Order Prediction (SOP), which strengthens the model's ability to learn the coherence between sentences and improves its performance on self-supervised learning tasks. Removing dropout saves many temporary variables, which effectively improves memory utilization during training, improves the efficiency of the model, and reduces the amount of training data required. Compared with the BERT model, the accuracy of the ALBERT model is slightly lower, by 1% to 2%, but its training and prediction speeds are 2 to 3 times faster.

User search behavior is time-sensitive: for some events, the search engine must satisfy the user's search needs within a limited time. On the one hand, the ALBERT model can be trained on user search behavior data more quickly to obtain the search semantic vector corresponding to the search term. On the other hand, the ALBERT model can use user search behavior data closer to the current date as training data, and so better accommodates the time-sensitive nature of user search behavior. For example, for the same number of samples, training the BERT model takes 1 day, so only event-tracking log data from 1 day earlier can be used, whereas training the ALBERT model takes only 2 hours, so log data from 2 hours earlier can be used. Training data closer to the current time better reflects the user's time-sensitive search intent.

S240: Use the feature vector of the content to be pushed as the input of the bidirectional encoder representation network, and obtain the content semantic vector corresponding to the keyword through that network.

It can be understood that the content semantic vector corresponding to the keyword is obtained in the same way as the search semantic vector corresponding to the search term. For details, refer to step S230.

S250: Calculate the similarity between the search semantic vector and the content semantic vector, and determine the target content semantic vector from the content semantic vectors according to the similarity.

S260: According to the target content semantic vector, obtain the content to be pushed that corresponds to it, and push that content.

It should be noted that, for parts not described in detail in this embodiment, reference may be made to the foregoing embodiments, and details are not repeated here.

In the user-intent-based information push method provided by this embodiment of the present application, the semantic understanding model is the Transformer-based bidirectional encoder representation network, the BERT model. After the search term input by the user and the set of mapping relationships of content to be pushed are acquired, where the set includes the mapping relationship between the content to be pushed and keywords, the feature vector of the search term and the feature vector of the keyword are obtained respectively. The feature vector of the search term is input to the bidirectional encoder representation network to obtain the search semantic vector corresponding to the search term, and the feature vector of the keyword is input to the same network to obtain the content semantic vector corresponding to the keyword. The similarity between the search semantic vector and the content semantic vector is calculated, the target content semantic vector is determined from the content semantic vectors according to the similarity, and the content to be pushed that corresponds to the target content semantic vector is obtained and pushed. Through the BERT model, context-aware semantic vectors can be obtained for both the search term and the keyword, so that the user's search intent can be derived from the semantic vectors and used for pushing without a manually labeled training corpus. This saves labeling effort and also avoids the inaccurate pushes caused by overly coarse label granularity.

Please refer to FIG. 4, which shows a schematic flowchart of a user-intent-based push method provided by yet another embodiment of the present application. The method is applied to the server in the above system, that is, the method may be executed by the above server, and it is used to improve the accuracy of the content pushed to the user. Specifically, as shown in FIG. 4, the method includes S310 to S360.

S310: Acquire the search term input by the user and the set of mapping relationships of content to be pushed.

S320: Based on the pre-trained semantic understanding model, obtain the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword respectively.

S330: Calculate the vector distance between the search semantic vector and the content semantic vector.

The vector distance may be the Euclidean distance, Manhattan distance, Chebyshev distance, standardized Euclidean distance, cosine of the included angle, and so on, which is not limited here.
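
For reference, three of the distances listed above can be written in a few lines of standard-library Python. This is a sketch with illustrative function names; the standardized Euclidean distance is omitted because it additionally needs per-dimension variances from a data set.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev_distance(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))
```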

As an implementation, the search semantic vectors and content semantic vectors may be stored on an offline server. The information push apparatus on the online server may fetch these vectors from the offline server over the network and calculate the vector distance between the search semantic vector and the content semantic vector.

In some embodiments, the vector distance may be the cosine distance between the search semantic vector and the content semantic vector. Specifically, the vector lengths of the search semantic vector and of the content semantic vector are calculated; the inner product of the search semantic vector and the content semantic vector is calculated; and the cosine distance between the two vectors is calculated from the inner product and the vector lengths and used as the vector distance between the search semantic vector and the content semantic vector.
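
The three steps above (vector lengths, inner product, cosine) can be sketched as follows. The application does not state how the cosine of the angle is turned into a distance; taking the distance as one minus the cosine, so that a larger distance means a lower similarity, is an assumption made here, as are the function names.

```python
import math


def vector_length(v):
    """Euclidean length (L2 norm) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def inner_product(a, b):
    """Sum of the products of matching coordinates."""
    return sum(x * y for x, y in zip(a, b))


def cosine_of_angle(a, b):
    """Inner product divided by the product of the two vector lengths."""
    denominator = vector_length(a) * vector_length(b)
    if denominator == 0:
        raise ValueError("cosine is undefined for a zero-length vector")
    return inner_product(a, b) / denominator


def cosine_distance(a, b):
    """Assumed convention: distance = 1 - cosine of the included angle."""
    return 1.0 - cosine_of_angle(a, b)
```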

S340: Determine the similarity between the search semantic vector and the content semantic vector according to the vector distance.

The similarity between the search semantic vector and the content semantic vector can be determined from the vector distance: the greater the vector distance, the greater the difference between the two vectors and, accordingly, the lower their similarity.

In one approach, the normalized vector distance may be used as the similarity between the search semantic vector and the content semantic vector. In another approach, a loss function that calculates similarity from the vector distance may be integrated into the pre-trained semantic understanding model, with the softmax function converting the vector distances into vector similarities expressed as probabilities.
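
Both approaches can be illustrated with a small sketch. Two assumptions are made that the application leaves open: normalization is taken to be min-max scaling with the result inverted so that a smaller distance gives a higher similarity, and the softmax is applied to the negated distances. The function names are illustrative.

```python
import math


def similarity_from_normalized_distance(distances):
    """Min-max normalize the distances and invert them into similarities."""
    lowest, highest = min(distances), max(distances)
    if highest == lowest:
        # All candidates are equally distant, so equally similar.
        return [1.0] * len(distances)
    return [1.0 - (d - lowest) / (highest - lowest) for d in distances]


def softmax_similarity(distances):
    """Convert distances into probabilities; smaller distance, larger value."""
    scores = [-d for d in distances]
    top = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```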

S350: Take the content semantic vector whose similarity satisfies the specified condition as the target content semantic vector.

The specified condition may be set according to the information push strategy, and the target content semantic vector may be a single vector or multiple vectors. In one approach, the specified condition is that the similarity exceeds a specific threshold, and the content semantic vectors whose similarity exceeds the threshold are taken as the target content semantic vectors. The specified condition may also be to take the N content semantic vectors with the highest similarity, where N is a preset integer greater than 0; the N content semantic vectors with the highest similarity are then taken as the target content semantic vectors.
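
The two example conditions can be sketched as below, assuming the similarities are held in a dictionary keyed by a content identifier (a hypothetical layout chosen for illustration).

```python
import heapq


def select_by_threshold(similarities, threshold):
    """Return the content ids whose similarity is greater than the threshold."""
    return [cid for cid, score in similarities.items() if score > threshold]


def select_top_n(similarities, n):
    """Return the n content ids with the highest similarity, best first."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    return heapq.nlargest(n, similarities, key=similarities.get)
```

A threshold yields a variable number of targets, possibly none, while top-N always yields up to N targets regardless of how similar they are.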

S360: According to the target content semantic vector, obtain the content to be pushed that corresponds to it, and push that content.

In the user-intent-based information push method provided by this embodiment of the present application, the search term input by the user and the set of mapping relationships of content to be pushed are acquired, where the set includes the mapping relationship between the content to be pushed and keywords. Based on the pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword are obtained, the vector distance between them is calculated, the similarity between the search semantic vector and the content semantic vector is determined from the vector distance, and the content semantic vector whose similarity satisfies the specified condition is taken as the target content semantic vector. The content to be pushed that corresponds to the target content semantic vector is then obtained and pushed. Calculating a vector distance is simple and easy to implement, so the similarity between the search semantic vector and the content semantic vector can be obtained quickly and in real time, allowing content to be pushed efficiently.

Please refer to FIG. 5, which shows a schematic flowchart of a user-intent-based push method provided by still another embodiment of the present application. The method is applied to the server in the above system, that is, the method may be executed by the above server, and it is used to improve the accuracy of the content pushed to the user. Specifically, as shown in FIG. 5, the method includes S410 to S450.

S410: Acquire the page text information of the pages browsed by the user.

A page browsed by the user may be a page that the user views in the search results after performing a search. Alternatively, in an information-feed scenario, it may be a page that the user views mainly for the sake of browsing, without any explicit search behavior; this is not limited here. By acquiring the page text information of the pages the user browses, the mapping relationship between the user and the page text information can be obtained.

In one approach, the page text information of the pages browsed by the user may be obtained from the search history behavior data generated when the user searches. The page text information may include the content of the browsed pages and may also include context information from the browsing session. Each page the user clicks to view reflects the user's browsing tendencies; in other words, the text information of every page the user browses can reflect the user's intent. For example, if most of the pages a user browses are related to mother-and-baby forums, it can be inferred that the user may be a new parent interested in infant topics, and so may be more interested in pushed content about infant products. As another example, if analysis of the browsed pages shows that their text content is generally short, the user may prefer pushes with brief text content.

S420: Obtain the page semantic vector corresponding to the page text information based on the pre-trained semantic understanding model.

The pre-trained semantic understanding model may be a BERT model, an ALBERT model, or another machine learning model, which is not limited here. By inputting the page text information into the pre-trained semantic understanding model, the page semantic vector corresponding to the page text information can be obtained. The page semantic vector represents the semantic content of the page text information of the pages browsed by the user and reflects the intent behind the user's browsing behavior after searching.

S430: Calculate the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity.

The first similarity represents how similar the search semantic vector is to the content semantic vector. For the specific process of calculating the cosine distance between the search semantic vector and the content semantic vector and obtaining the first similarity, refer to steps S330 to S340.

S440: Calculate the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity.

The second similarity represents how similar the page semantic vector is to the content semantic vector. It can be understood that the process of calculating the cosine distance between the page semantic vector and the content semantic vector to obtain the second similarity is similar to step S430; for details, refer to steps S330 to S340.

S450: Determine the target content semantic vector according to the first similarity and the second similarity.

As an implementation, different weights may be assigned to the first similarity and the second similarity of each content semantic vector, and their weighted sum is taken as the combined similarity. The content semantic vector whose combined similarity satisfies the specified condition is taken as the target content semantic vector.
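
A minimal sketch of the weighted combination follows. The default weights of 0.5 each and the dictionary layout keyed by content identifier are assumptions; the application does not give weight values.

```python
def combined_similarity(first_sims, second_sims, w_first=0.5, w_second=0.5):
    """Weighted sum of the first and second similarity for each content id.

    first_sims and second_sims map the same content ids to their
    search-based and page-based similarities respectively.
    """
    return {
        cid: w_first * first_sims[cid] + w_second * second_sims[cid]
        for cid in first_sims
    }
```

The resulting combined similarities can then be filtered with the same threshold or top-N condition used in step S350.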

As another implementation, the target content semantic vector may be specified in advance. The search semantic vectors whose first similarity satisfies a first specified condition are obtained, and from them the corresponding first user group; the page semantic vectors whose second similarity satisfies a second specified condition are obtained, and from them the corresponding second user group. The intersection of the first user group and the second user group gives the target users for the information push, and the content to be pushed that corresponds to the target content semantic vector is pushed to those target users.
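
The user-group intersection can be sketched with Python sets. Here the first and second specified conditions are both assumed to be simple thresholds, and the similarities are assumed to be keyed by a user identifier; neither detail comes from the application.

```python
def select_target_users(first_sims, second_sims, first_threshold, second_threshold):
    """Intersect the users selected by each similarity.

    first_sims: user id -> first similarity (search vector vs. target content).
    second_sims: user id -> second similarity (page vector vs. target content).
    """
    first_group = {u for u, s in first_sims.items() if s > first_threshold}
    second_group = {u for u, s in second_sims.items() if s > second_threshold}
    # Only users who satisfy both conditions receive the push.
    return first_group & second_group
```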

In the user-intent-based information push method provided by this embodiment of the present application, the search term input by the user is acquired and, based on the pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword are obtained. In addition, the page text information of the pages browsed by the user is acquired, and the page semantic vector corresponding to the page text information is obtained based on the pre-trained semantic understanding model. The cosine distance between the search semantic vector and the content semantic vector is calculated to obtain the first similarity, the cosine distance between the page semantic vector and the content semantic vector is calculated to obtain the second similarity, and the target content semantic vector is determined according to the first similarity and the second similarity, so that the content to be pushed that corresponds to the target content semantic vector is pushed. Acquiring the page text information of the pages the user browses allows the user's intent to be understood along more dimensions, resulting in more effective information pushing.

Please refer to FIG. 6, which shows a structural block diagram of a user-intent-based push apparatus 600 provided by an embodiment of the present application. The apparatus is applied to the server in the above system and may include an information acquisition module 610, a vector acquisition module 620, a determination module 630, and a processing module 640.

The information acquisition module 610 is configured to acquire the search term input by the user and the set of mapping relationships of content to be pushed, where the set includes the mapping relationship between the content to be pushed and keywords.

Further, the information acquisition module 610 includes a search data acquisition sub-module, a data storage sub-module, and a video display sub-module, wherein:

The search data acquisition sub-module is configured to acquire the user's historical search data, where the historical search data includes the user's historical search terms and the content to be pushed that the user browsed for each historical search term.

The data storage sub-module is configured to store the mapping relationship between the content to be pushed and the title of the content to be pushed in the set of mapping relationships of content to be pushed.

The vector acquisition module 620 is configured to acquire, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword, respectively.

Further, the vector acquisition module 620 includes a feature vector acquisition submodule, a search semantic vector acquisition submodule, and a content semantic vector acquisition submodule, wherein:

The feature vector acquisition submodule is configured to acquire the feature vector of the search term and the feature vector of the keyword, respectively.

Further, the feature vector acquisition submodule includes a search term vector fusion unit and a keyword vector fusion unit, wherein:

The search term vector fusion unit is configured to acquire the text vector, position vector, and initial word vector of the search term, and to fuse them to form the feature vector of the search term.

The keyword vector fusion unit is configured to acquire the text vector, position vector, and initial word vector of the keyword, and to fuse them to form the feature vector of the content to be pushed.

The search semantic vector acquisition submodule is configured to use the feature vector of the search term as the input of the bidirectional coding representation network, and to obtain the search semantic vector corresponding to the search term through the bidirectional coding representation network.

The content semantic vector acquisition submodule is configured to use the feature vector of the keyword as the input of the bidirectional coding representation network, and to obtain the content semantic vector corresponding to the keyword through the bidirectional coding representation network.
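The fusion step performed by the two fusion units can be sketched as follows. The application does not state which operation "fuse" denotes; this sketch assumes element-wise addition, which is how BERT-style models commonly combine their input embeddings. The function name `fuse_vectors` and the toy values are hypothetical, and the encoder network itself is left out.

```python
def fuse_vectors(text_vec, position_vec, word_vec):
    """Fuse three equal-length vectors by element-wise addition (assumed)."""
    if not (len(text_vec) == len(position_vec) == len(word_vec)):
        raise ValueError("all three vectors must have the same dimension")
    return [t + p + w for t, p, w in zip(text_vec, position_vec, word_vec)]


# Toy 4-dimensional vectors for a single token; the values are made up.
text_vec = [0.5, 0.0, 0.25, 1.0]
position_vec = [0.25, 0.5, 0.0, 0.0]
word_vec = [0.25, 0.5, 0.75, 1.0]

# Feature vector that would be passed to the encoder network.
feature_vec = fuse_vectors(text_vec, position_vec, word_vec)
```

Other fusion choices, such as concatenation followed by a projection, would also fit the description; addition is used here only because it keeps the dimension unchanged.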

The determining module 630 is configured to calculate the similarity between the search semantic vector and the content semantic vectors, and to determine a target content semantic vector from the content semantic vectors according to the similarity.

Further, the determining module 630 includes a distance calculation submodule, a similarity determination submodule, and a target vector determination submodule, wherein:

The distance calculation submodule is configured to calculate the vector distance between the search semantic vector and the content semantic vector.

Further, the distance calculation submodule includes a length calculation unit, an inner product calculation unit, and a cosine distance calculation unit, wherein:

The length calculation unit is configured to calculate the vector lengths of the search semantic vector and the content semantic vector.

The inner product calculation unit is configured to calculate the vector inner product of the search semantic vector and the content semantic vector.

The cosine distance calculation unit is configured to calculate, based on the vector inner product and the vector lengths, the cosine distance between the search semantic vector and the content semantic vector, which serves as the vector distance between the two.

The similarity determination submodule is configured to determine the similarity between the search semantic vector and the content semantic vector according to the vector distance.

The target vector determination submodule is configured to take the content semantic vector whose similarity satisfies a specified condition as the target content semantic vector.
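The units of the determining module can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the cosine value is used directly as the similarity, and the "specified condition" is assumed to be a minimum threshold. The function names and the threshold value are hypothetical.

```python
import math


def vector_length(v):
    """Euclidean length of a vector (length calculation unit)."""
    return math.sqrt(sum(x * x for x in v))


def inner_product(a, b):
    """Inner product of two vectors (inner product calculation unit)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product divided by the product of the two lengths."""
    denom = vector_length(a) * vector_length(b)
    if denom == 0.0:
        return 0.0  # assumed convention for zero-length vectors
    return inner_product(a, b) / denom


def select_targets(search_vec, content_vecs, threshold=0.8):
    """Return (index, similarity) pairs meeting the threshold, best first."""
    scored = [(i, cosine_similarity(search_vec, v))
              for i, v in enumerate(content_vecs)]
    hits = [(i, s) for i, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


search_vec = [1.0, 0.0]
content_vecs = [[2.0, 0.0], [0.0, 3.0], [3.0, 4.0]]
targets = select_targets(search_vec, content_vecs, threshold=0.5)
```

Returning indices rather than the vectors themselves makes it straightforward to look up the corresponding content to be pushed afterwards. A top-k rule would be an equally plausible reading of the specified condition.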

The processing module 640 is configured to acquire, according to the target content semantic vector, the content to be pushed that corresponds to the target content semantic vector, and to push that content.

Further, the apparatus may also include a text information acquisition module, a page semantic acquisition module, a first calculation module, a second calculation module, and a comprehensive determination module, wherein:

The text information acquisition module is configured to acquire the page text information of the page browsed by the user.

The page semantic acquisition module is configured to acquire, based on the pre-trained semantic understanding model, the page semantic vector corresponding to the page text information.

The first calculation module is configured to calculate the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity.

The second calculation module is configured to calculate the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity.

The comprehensive determination module is configured to determine the target content semantic vector according to the first similarity and the second similarity.
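How the first and second similarities are combined is not specified here. One possible reading, shown purely as an assumption, is a weighted average followed by picking the highest combined score. The names `combine_similarities` and `pick_target` and the parameter `weight` are hypothetical.

```python
def combine_similarities(first, second, weight=0.5):
    """Weighted average of the search-based and page-based similarities."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * first + (1.0 - weight) * second


def pick_target(first_sims, second_sims, weight=0.5):
    """Index of the content vector with the highest combined score."""
    combined = [combine_similarities(f, s, weight)
                for f, s in zip(first_sims, second_sims)]
    return max(range(len(combined)), key=combined.__getitem__)


# One similarity per candidate content semantic vector; values are made up.
first_sims = [0.9, 0.6, 0.2]   # search term vs. content
second_sims = [0.1, 0.8, 0.3]  # browsed page vs. content
best = pick_target(first_sims, second_sims, weight=0.5)
```

With `weight=1.0` the page signal is ignored and the choice reduces to the search-only case; lowering the weight lets the browsed page influence which content is selected.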

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.

In the several embodiments provided in this application, the coupling between modules may be electrical, mechanical, or of another form.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

Please refer to FIG. 7, which shows a structural block diagram of an electronic device provided by an embodiment of the present application. The electronic device may be the above-mentioned server 100. The electronic device 700 in the present application may include one or more of the following components: a processor 710, a memory 720, and one or more application programs, where the one or more application programs may be stored in the memory 720 and configured to be executed by the one or more processors 710, and the one or more application programs are configured to perform the method described in the foregoing method embodiments.

The processor 710 may include one or more processing cores. The processor 710 connects the various parts of the electronic device 700 through various interfaces and lines, and performs the functions of the electronic device 700 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 720 and by calling the data stored in the memory 720. Optionally, the processor 710 may be implemented in at least one of the following hardware forms: Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 710 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, and application programs; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 710 and may instead be implemented by a separate communication chip.

The memory 720 may include random access memory (RAM) and may also include read-only memory (ROM). The memory 720 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 720 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like. The data storage area may also store data created by the terminal 100 during use (such as a phone book, audio and video data, and chat record data).

Please refer to FIG. 8, which shows a structural block diagram of a computer-readable storage medium provided by an embodiment of the present application. The computer-readable medium 800 stores program code that can be invoked by a processor to execute the method described in the foregoing method embodiments.

The computer-readable storage medium 800 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. Optionally, the computer-readable storage medium 800 includes a non-transitory computer-readable storage medium. The computer-readable storage medium 800 has storage space for program code 810 that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code 810 may, for example, be compressed in a suitable form.

In summary, the user-intent-based push method, apparatus, electronic device, and computer-readable medium provided by the present application acquire a search term input by the user and a set of mapping relationships of content to be pushed, where the set includes the mapping relationship between content to be pushed and keywords; acquire, based on a pre-trained semantic understanding model, the search semantic vector corresponding to the search term and the content semantic vector corresponding to the keyword, respectively; calculate the similarity between the search semantic vector and the content semantic vectors, and determine a target content semantic vector from the content semantic vectors according to the similarity; and acquire, according to the target content semantic vector, the corresponding content to be pushed and push it. The user's search intent can therefore be mined from the user's search terms, so that content is pushed effectively.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A push method based on user intent, comprising:

acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords;

respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model;

calculating the similarity of the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vector according to the similarity;

and acquiring, according to the target content semantic vector, the content to be pushed corresponding to the target content semantic vector, and pushing that content.

2. The method according to claim 1, wherein the semantic understanding model is a Transformer-based bidirectional coding representation network, and the obtaining of the search semantic vector corresponding to the search word and the content semantic vector corresponding to the keyword respectively based on the pre-trained semantic understanding model comprises:

respectively obtaining the feature vector of the search word and the feature vector of the keyword;

taking the feature vector of the search word as the input of the bidirectional coding representation network, and obtaining a search semantic vector corresponding to the search word through the bidirectional coding representation network;

and taking the feature vector of the keyword as the input of the bidirectional coding representation network, and obtaining a content semantic vector corresponding to the keyword through the bidirectional coding representation network.

3. The method according to claim 2, wherein the obtaining the feature vector of the search term and the feature vector of the keyword respectively comprises:

acquiring a text vector, a position vector and an initial word vector of the search word, and fusing the text vector, the position vector and the initial word vector of the search word to form a feature vector of the search word;

and acquiring a text vector, a position vector and an initial word vector of the keyword, and fusing the text vector, the position vector and the initial word vector of the keyword to form a feature vector of the content to be pushed.

4. The method of claim 1, wherein the calculating a similarity between the search semantic vector and the content semantic vector, and determining a target content semantic vector from the content semantic vectors according to the similarity comprises:

calculating a vector distance between the search semantic vector and the content semantic vector;

determining the similarity of the search semantic vector and the content semantic vector according to the vector distance;

and taking the content semantic vector with the similarity meeting a specified condition as the target content semantic vector.

5. The method of claim 4, wherein the calculating a vector distance between the search semantic vector and the content semantic vector comprises:

calculating the vector length of the search semantic vector and the content semantic vector;

calculating a vector inner product of the search semantic vector and the content semantic vector;

calculating a cosine distance between the search semantic vector and the content semantic vector based on the vector inner product and the vector length as a vector distance between the search semantic vector and the content semantic vector.

6. The method of claim 1, further comprising:

acquiring page text information of a page browsed by the user;

acquiring a page semantic vector corresponding to the page text information based on a pre-trained semantic understanding model;

calculating the cosine distance between the search semantic vector and the content semantic vector to obtain a first similarity;

calculating the cosine distance between the page semantic vector and the content semantic vector to obtain a second similarity;

and determining the target content semantic vector according to the first similarity and the second similarity.

7. The method according to any one of claims 1 to 6, wherein the keywords of the content to be pushed are keywords included in a title of the content to be pushed, and the obtaining of the search term input by the user and the mapping relationship set of the content to be pushed comprises:

acquiring historical search data of a user, wherein the historical search data of the user comprises historical search words of the user and the content to be pushed browsed by the user corresponding to the historical search words;

and storing the mapping relation between the content to be pushed and the title of the content to be pushed in the mapping relation set of the content to be pushed.

8. A push device based on user intent, comprising:

the information acquisition module is used for acquiring a search word input by a user and a mapping relation set of contents to be pushed, wherein the mapping relation set of the contents to be pushed comprises: mapping relation between the content to be pushed and the keywords;

the vector acquisition module is used for respectively acquiring a search semantic vector corresponding to the search word and a content semantic vector corresponding to the keyword based on a pre-trained semantic understanding model;

the determining module is used for calculating the similarity between the search semantic vector and the content semantic vector and determining a target content semantic vector from the content semantic vector according to the similarity;

and the processing module is used for acquiring the content to be pushed corresponding to the target content semantic vector according to the target content semantic vector and pushing the content.

9. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.

10. A computer-readable medium storing program code executable by a processor, wherein the program code, when executed by the processor, causes the processor to perform the method of any one of claims 1-7.