CN111324819B

CN111324819B - A method, device, computer equipment and storage medium for searching media content

Info

Publication number: CN111324819B
Application number: CN202010210951.7A
Authority: CN
Inventors: 王子昂; 张永华; 张梦琳
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2021-07-30
Anticipated expiration: 2040-03-24
Also published as: CN111324819A

Abstract

The present disclosure provides a method, an apparatus, a computer device and a storage medium for media content search, wherein the method comprises: receiving a search instruction for target media content; determining a first set of candidate media content matching the scene intent of the target media content and a second set of candidate media content matching the entity intent of the target media content based on the search instruction; and sending a search result corresponding to the target media content to the user side based on the first candidate media content set and the second candidate media content set. By adopting the scheme, the media content can be directly searched based on the media content, the user does not need to input text information to search the media content, and the searching efficiency and the searching accuracy of the user can be improved to a certain extent.

Description

A method, device, computer equipment and storage medium for searching media content

Technical Field

The present disclosure relates to the technical field of media content processing and, in particular, to a method, an apparatus, a computer device, and a storage medium for searching media content.

Background

With the development of the Internet, search engines have become an indispensable tool for going online. Traditional search engines are text-based; that is, they search on text input. Even search engines for media content (for example, video) search on textual information such as the title, description, introduction, and tags of a media content program.

Typically, in a media content search, the text entered by the user is recognized directly and the search is run on the recognition result. For example, when the text entered by the user is consistent with the media content tag information of a candidate media content, information about that candidate media content can be returned to the user.

This approach depends on the user both describing the intended media content accurately in text and performing the text input operation, which makes searching inefficient. Moreover, when the user cannot describe the content accurately in text, the accuracy of the media content search tends to be low.

Summary

Embodiments of the present disclosure provide at least one scheme for searching media content. The scheme automatically searches for similar media content from two aspects of the target media content, namely its entities and its scene, without requiring text input. This can improve search efficiency and search accuracy, and increases the probability of obtaining search results that satisfy the user's intent.

The disclosure mainly includes the following aspects:

In a first aspect, the present disclosure provides a method for searching media content, the method comprising:

receiving a search instruction for target media content;

determining, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content; and

sending, based on the first candidate media content set and the second candidate media content set, a search result corresponding to the target media content to a user terminal.

In a possible implementation, determining the first candidate media content set that matches the scene intent of the target media content includes:

determining a target media content feature vector corresponding to the target media content based on feature information of the target media content in multiple preset dimensions; and

matching the target media content feature vector against each candidate media content feature vector in a scene index library to determine at least one first candidate media content that matches the scene intent of the target media content, and forming the first candidate media content set from the at least one first candidate media content.

In a possible implementation, the media content is a video, and the multiple preset dimensions include two or more of the following dimensions:

a visual dimension, a text information dimension, and a music dimension.
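The feature fusion and scene matching described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the disclosure does not say how the per-dimension features are combined or which similarity measure is used, so the concatenation of normalized features, the cosine similarity, the `threshold` and `top_k` parameters, and all function names are assumptions.

```python
import math

DIMENSIONS = ("visual", "text", "music")  # assumed fixed order for fusion

def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def fuse_features(per_dimension):
    """Concatenate the normalized per-dimension features into one unit vector."""
    fused = []
    for name in DIMENSIONS:
        fused.extend(l2_normalize(per_dimension[name]))
    return l2_normalize(fused)

def match_scene(target_vec, scene_index, threshold=0.5, top_k=10):
    """Return ids of indexed items whose cosine similarity meets the threshold."""
    scored = []
    for media_id, candidate_vec in scene_index.items():
        # Both vectors are unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(target_vec, candidate_vec))
        if score >= threshold:
            scored.append((score, media_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [media_id for _, media_id in scored[:top_k]]
```

Normalizing each dimension before concatenation keeps any single dimension from dominating the fused vector; a production system would more likely use an approximate nearest-neighbor index than the linear scan shown here.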

In a possible implementation, the scene index library is generated according to the following steps:

performing scene intent recognition on each preliminarily selected media content, and determining, among the preliminarily selected media content, the first candidate media content that has a scene intent;

extracting feature information of the first candidate media content in multiple preset dimensions;

generating a candidate media content feature vector of the first candidate media content based on the feature information of the first candidate media content in the multiple preset dimensions; and

storing a first media content identifier of the first candidate media content in the scene index library in association with the candidate media content feature vector of that first candidate media content.
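The index generation steps above can be sketched as a simple filter-then-store loop. The scene intent recognizer and the feature extractor are passed in as callables because the disclosure does not specify them; `has_scene_intent`, `extract_features`, and the item layout with an `"id"` key are hypothetical.

```python
def build_scene_index(preliminary_items, has_scene_intent, extract_features):
    """Map media content identifier -> feature vector for items with a scene intent.

    has_scene_intent and extract_features are hypothetical stand-ins for the
    scene intent recognition and multi-dimension feature extraction steps.
    """
    index = {}
    for item in preliminary_items:
        if not has_scene_intent(item):
            continue  # items without a scene intent are not indexed
        index[item["id"]] = extract_features(item)
    return index
```

For example, with `has_scene_intent=lambda it: it["scene_score"] >= 0.5`, only items whose (assumed) classifier score reaches 0.5 would be stored.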

In a possible implementation, matching the target media content feature vector against each candidate media content feature vector in the scene index library to determine at least one first candidate media content that matches the scene intent of the target media content includes:

matching the target media content feature vector of the target media content against each candidate media content feature vector in the scene index library, and determining at least one candidate media content feature vector that matches the target media content feature vector; and

determining the at least one first candidate media content based on the first media content identifier that corresponds, in the scene index library, to the determined candidate media content feature vector.

In a possible implementation, forming the first candidate media content set from the at least one first candidate media content includes:

acquiring user behavior information corresponding to the first candidate media content identified by the first media content identifier; and

selecting, based on the user behavior information, the first candidate media content that meets a preset requirement from the first candidate media content identified by the first media content identifier, to form the first candidate media content set.
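The behavior-based selection above might look like the following sketch. The disclosure does not define what the user behavior information or the preset requirement are; the play count, like count, and the two thresholds below are purely illustrative assumptions.

```python
def filter_by_behavior(candidate_ids, behavior, min_plays=100, min_like_rate=0.05):
    """Keep candidates whose (assumed) play count and like rate meet the thresholds.

    behavior maps media id -> {"plays": int, "likes": int}; this layout and
    both thresholds are hypothetical examples of a "preset requirement".
    """
    kept = []
    for media_id in candidate_ids:
        stats = behavior.get(media_id)
        if stats is None:
            continue  # no behavior data recorded for this candidate
        plays, likes = stats["plays"], stats["likes"]
        if plays > 0 and plays >= min_plays and likes / plays >= min_like_rate:
            kept.append(media_id)
    return kept
```

The input order is preserved, so a similarity ranking computed earlier survives the filtering step.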

In a possible implementation, determining the second candidate media content set that matches the entity intent of the target media content includes:

determining a target entity feature vector of a target entity, in the target media content, that corresponds to the entity intent; and

matching the target entity feature vector of the target entity against each candidate entity feature vector in an entity index library, and determining the second candidate media content set that matches the target entity, the second candidate media content set containing at least one second candidate media content.

In a possible implementation, determining the target entity feature vector of the target entity, in the target media content, that corresponds to the entity intent includes:

detecting entities in the target media content;

performing intent recognition on the at least one detected entity, and determining at least one target entity; and

for each target entity, generating the target entity feature vector corresponding to that target entity based on the image information corresponding to that target entity in the target media content.

In a possible implementation, the entity index library is generated according to the following steps:

determining the entities contained in each preliminarily selected media content;

performing intent recognition on the entities contained in the preliminarily selected media content, and determining a candidate entity and the second candidate media content in which the candidate entity is located;

generating the candidate entity feature vector corresponding to the candidate entity based on the image information of the candidate entity in the corresponding second candidate media content; and

storing a second media content identifier of the second candidate media content in which the candidate entity is located in the entity index library in association with the candidate entity feature vector of that candidate entity.

In a possible implementation, matching the target entity feature vector of the target entity against each candidate entity feature vector in the entity index library and determining the second candidate media content set that matches the target entity includes:

matching the target entity feature vector of the target entity in the target media content against each candidate entity feature vector in the entity index library, and determining at least one candidate entity feature vector that matches the target media content; and

determining the second candidate media content set based on the second media content identifier that corresponds, in the entity index library, to the determined candidate entity feature vector.
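The entity index and the lookup against it can be sketched as below. Unlike the scene index, one media content may contribute several entries, one per candidate entity. The entity detector and intent recognizer are not specified in the disclosure, so `is_intent_entity`, the item layout, the cosine similarity, and the `threshold` value are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_entity_index(preliminary_items, is_intent_entity):
    """Return a list of (entity feature vector, media content identifier) pairs.

    Each item is assumed to look like
    {"id": ..., "entities": [{"vector": [...], ...}, ...]}.
    """
    index = []
    for item in preliminary_items:
        for entity in item["entities"]:
            if is_intent_entity(entity):
                index.append((entity["vector"], item["id"]))
    return index

def match_entity(target_vector, entity_index, threshold=0.8):
    """Return media ids whose indexed entity vectors are similar to the target."""
    media_ids = []
    for vector, media_id in entity_index:
        if cosine(target_vector, vector) >= threshold and media_id not in media_ids:
            media_ids.append(media_id)  # a media id is listed once even if several entities match
    return media_ids
```

Calling `match_entity` once per target entity yields one subset of second candidate media content per entity.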

In a possible implementation, determining the second candidate media content set based on the second media content identifier that corresponds, in the entity index library, to the determined candidate entity feature vector includes:

acquiring user behavior information corresponding to the second candidate media content identified by the second media content identifier; and

selecting, based on the user behavior information, the second candidate media content that meets a preset requirement from the second candidate media content identified by the second media content identifier, to form the second candidate media content set.

In a possible implementation, sending the search result corresponding to the target media content to the user terminal based on the first candidate media content set and the second candidate media content set includes:

generating first set identification information corresponding to the first candidate media content set, and generating second set identification information respectively corresponding to at least one second candidate media content subset in the second candidate media content set, wherein each second candidate media content subset corresponds to one candidate entity that matches the entity intent; and

sending, to the user terminal as the search result, the first candidate media content set together with its corresponding first set identification information, and the at least one second candidate media content subset in the second candidate media content set together with the second set identification information corresponding to each second candidate media content subset.

In a possible implementation, the first set identification information includes a first thumbnail image and/or first text description information; and

the second set identification information includes a second thumbnail image and/or second text description information.
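One way to picture the search result described above is as a small data structure: one scene set plus one subset per matched entity, each carrying its own identification information. The class names, field names, and the `"Similar scenes"` label in this sketch are hypothetical; the disclosure only states that identification information may be a thumbnail image and/or a text description.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SetIdentification:
    text: Optional[str] = None       # text description, if any
    thumbnail: Optional[str] = None  # reference to a thumbnail image, if any

@dataclass
class CandidateSet:
    identification: SetIdentification
    media_ids: List[str]

def build_search_result(scene_ids: List[str], entity_subsets: Dict[str, List[str]]):
    """Package the first set and the per-entity subsets of the second set.

    entity_subsets maps an (assumed) entity label to the media ids matched
    for that entity; the label doubles as the text description here.
    """
    return {
        "scene_set": CandidateSet(SetIdentification(text="Similar scenes"), list(scene_ids)),
        "entity_sets": [
            CandidateSet(SetIdentification(text=label), list(ids))
            for label, ids in entity_subsets.items()
        ],
    }
```

Keeping the entity subsets separate, rather than flattening them into one list, is what lets the user terminal show one selectable label or thumbnail per matched entity.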

In a second aspect, the present disclosure further provides a method for searching media content, the method comprising:

sending a search instruction for target media content to a server;

receiving a search result fed back by the server, the search result containing a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content; and

displaying a search result display page based on the search result.

In a possible implementation, sending the search instruction for the target media content to the server includes:

in response to a trigger operation on a search button on the target media content screen, sending the search instruction for the target media content to the server; or

in response to a trigger operation on a box-selection button acting on the target media content screen, sending to the server a search instruction for the box-selected media content.
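The two trigger paths above differ only in whether a selected region accompanies the instruction. The sketch below builds such an instruction on the user terminal; the disclosure does not define a message format, so every field name (`media_id`, `region`, and so on) and the pixel-based box are assumptions.

```python
def build_search_instruction(media_id, box=None):
    """Build a search instruction for whole content or for a box-selected region.

    box, if given, is (x, y, width, height) in pixels on the playback screen;
    this representation is a hypothetical choice for illustration.
    """
    instruction = {"media_id": media_id}
    if box is not None:
        x, y, width, height = box
        if width <= 0 or height <= 0:
            raise ValueError("selected region must have positive width and height")
        instruction["region"] = {"x": x, "y": y, "width": width, "height": height}
    return instruction
```

A search-button trigger would call `build_search_instruction("video-1")`, while a box selection would pass the selected rectangle as `box`.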

In a possible implementation, the search result further contains first set identification information corresponding to the first candidate media content set, and second set identification information respectively corresponding to at least one second candidate media content subset in the second candidate media content set, wherein each second candidate media content subset corresponds to one candidate entity that matches the entity intent;

displaying the search result display page based on the search result includes:

displaying, based on the search result, a search result display page containing the first set identification information and the second set identification information; and

after displaying the search result display page, the method further includes:

in response to a trigger operation on any set identification information, displaying the candidate media content corresponding to that set identification information, wherein the set identification information is the first set identification information or any of the second set identification information.

In a possible implementation, displaying, based on the search result, the search result display page containing the first set identification information and at least one second set identification information includes:

displaying, based on the search result, the search result display page including a first search result display area and a second search result display area;

wherein the first search result display area contains the first set identification information and the second set identification information, the second search result display area contains a media content list, and the media content list contains each media content in the first candidate media content set and the second candidate media content set.

In a possible implementation, the method further includes:

after displaying, in response to a trigger operation on any set identification information, the candidate media content corresponding to that set identification information, switching, in response to a sliding trigger operation, to displaying other candidate media content corresponding to other set identification information.

In a third aspect, the present disclosure further provides an apparatus for searching media content, the apparatus comprising:

an instruction receiving module, configured to receive a search instruction for target media content;

a set determining module, configured to determine, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content; and

a result search module, configured to send a search result corresponding to the target media content to a user terminal based on the first candidate media content set and the second candidate media content set.

In a fourth aspect, the present disclosure further provides an apparatus for searching media content, the apparatus comprising:

an instruction sending module, configured to send a search instruction for target media content to a server;

a result receiving module, configured to receive a search result fed back by the server, the search result containing a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content; and

a page display module, configured to display a search result display page based on the search result.

In a fifth aspect, the present disclosure further provides a computer device, comprising a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the computer device runs, the processor communicates with the memory through the bus, and the machine-readable instructions, when executed by the processor, perform the steps of the method for searching media content according to the first aspect or any of its implementations.

In a sixth aspect, the present disclosure further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method for searching media content according to the first aspect or any of its implementations.

With the above scheme for searching media content, after responding to the search instruction for the target media content, the server can, on the one hand, determine the first candidate media content set that matches the scene intent of the target media content and, on the other hand, determine the second candidate media content set that matches the entity intent of the target media content. The first candidate media content set and the second candidate media content set can then be returned to the user terminal for viewing as the search result responding to the search instruction.

The above scheme thus automatically searches for similar media content based on scene intent and also automatically searches for similar media content based on entity intent. Because the search can be performed directly on the media content, the user does not need to enter text to search for media content, which can improve the user's search efficiency and search accuracy to a certain extent. At the same time, the scheme can search for media content based on multiple user intents, which helps ensure that the user's search intent is satisfied. For example, when the target media content shows a cat interacting with its owner, the scheme can retrieve not only a set of media content related to the cat as the target entity but also a set of media content related to the interaction scene. This covers several possible search intents of the user and improves the comprehensiveness of the search results.

To make the above objects, features, and advantages of the present disclosure clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. The drawings are incorporated into and form part of this specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without creative effort.

FIG. 1 shows a flowchart of a method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 2 shows a schematic diagram of a search application of the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 3 shows a flowchart of a specific method for determining the first candidate media content set in the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 4 shows a flowchart of a specific method for generating the scene index library in the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 5 shows a flowchart of a specific method for determining the second candidate media content set in the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 6 shows a flowchart of a specific method for generating the target entity feature vector in the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 7 shows a flowchart of a specific method for generating the entity index library in the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 8 shows a schematic diagram of an application of the method for searching media content provided by Embodiment 1 of the present disclosure;

FIG. 9 shows a flowchart of a method for searching media content provided by Embodiment 2 of the present disclosure;

FIG. 10 shows a schematic diagram of an apparatus for searching media content provided by Embodiment 3 of the present disclosure;

FIG. 11 shows a schematic diagram of another apparatus for searching media content provided by Embodiment 3 of the present disclosure;

FIG. 12 shows a schematic diagram of a computer device provided by Embodiment 4 of the present disclosure;

FIG. 13 shows a schematic diagram of another computer device provided by Embodiment 4 of the present disclosure.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

Media content search generally relies on text entered by the user to describe the media content. This approach depends on the user describing the intended media content accurately in text, which makes searching inefficient; and when the user cannot describe the content accurately, the accuracy of the media content search tends to be low.

Based on the above research, embodiments of the present disclosure provide at least one scheme for searching media content. The scheme automatically searches for similar media content from two aspects of the target media content, namely its entities and its scene, without requiring text input. This can improve search efficiency and search accuracy and, because multiple search intents are considered, improves the comprehensiveness of the search results obtained.

The defects of the existing approaches described above were all identified by the inventors through practice and careful study. Therefore, both the process of discovering these problems and the solutions proposed below in the present disclosure should be regarded as contributions made by the inventors in the course of the present disclosure.

It should be noted that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.

To facilitate understanding of this embodiment, the method for searching media content disclosed in the embodiments of the present disclosure is first described in detail. The method is generally executed by a computer device with a certain computing capability, which may be a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the method for searching media content may be implemented by a processor invoking computer-readable instructions stored in a memory.

The method for searching media content provided by the embodiments of the present disclosure is first described below with a server as the executing entity.

Embodiment 1

FIG. 1 is a flowchart of the method for searching media content provided by Embodiment 1 of the present disclosure. The method includes steps S101 to S103:

S101: receiving a search instruction for target media content;

S102: determining, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content;

S103: sending, based on the first candidate media content set and the second candidate media content set, a search result corresponding to the target media content to the user terminal.
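Steps S101 to S103 can be sketched as a single server-side handler. The two retrieval steps are injected as callables so that the sketch stays independent of how scene and entity matching are actually performed; `find_scene_matches`, `find_entity_matches`, and the dictionary keys are hypothetical names, not part of the disclosure.

```python
def handle_search(instruction, find_scene_matches, find_entity_matches):
    """Run the three steps for one search instruction and return the result.

    find_scene_matches and find_entity_matches are hypothetical callables
    that take the target media id and return lists of candidate media ids.
    """
    # S101: receive the search instruction and identify the target media content.
    target_id = instruction["media_id"]

    # S102: determine the scene-intent set and the entity-intent set.
    first_set = find_scene_matches(target_id)
    second_set = find_entity_matches(target_id)

    # S103: assemble the search result to be sent to the user terminal.
    return {
        "target": target_id,
        "scene_matches": first_set,
        "entity_matches": second_set,
    }
```

The two retrievals in S102 do not depend on each other, so a real server could run them concurrently; the sequential form is used here only for clarity.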

To facilitate understanding of the media content search method provided by the embodiments of the present disclosure, a possible application scenario is briefly introduced first. After the user initiates a search instruction for the target media content, the client sends the search instruction to the server. Following the above method steps, the server obtains candidate media content that matches the scene intent of the target media content and candidate media content that matches the entity intent of the target media content. The server may then send both kinds of candidate media content to the client in the returned search result.

In this way, the client may display the candidate media content that matches the scene intent and the candidate media content that matches the entity intent in a single mixed ranking, or may display the two kinds of candidate media content separately by category.

The search instruction may be generated when a search button provided on the client is triggered; for example, a search button is provided on the media content playback page of the client, and the search instruction is generated after the button is triggered. The search instruction may also be generated by a search operation performed on the media content during playback; for example, a search box is used to select a region of the currently playing media content to initiate the search instruction. The search instruction may also be triggered by default after the client enters a media content playback page, that is, the media content search is performed automatically upon entering the page. The search instruction may further be triggered in any other manner capable of initiating a media content search, which is not specifically limited in the embodiments of the present disclosure.

The target media content may be one or more picture frames, a video, a picture frame in the currently playing video, or a picture group consisting of a picture frame in the currently playing video and several picture frames before and after it. It may also be other media content, which is not specifically limited in the embodiments of the present disclosure. Given the wide use of video search, video is used as the target media content in the examples that follow.

In response to the search instruction, the server may determine a first candidate media content set that matches the scene intent of the target media content and a second candidate media content set that matches the entity intent of the target media content.

The first candidate media content set may be a set of first candidate media content items, filtered from a preset media database, that are related to the scene intent of the target media content. In the embodiments of the present disclosure, the first candidate media content may be determined by comparing the target media content feature vector corresponding to the target media content with the media content feature vector corresponding to each primary-selection media content in the preset media database. The second candidate media content set may be a set of second candidate media content items, filtered from the preset media database, that are related to the entity intent of the target media content. In the embodiments of the present disclosure, the second candidate media content may be determined by comparing the target entity feature vector of the target entity in the target media content with the candidate entity feature vector of the candidate entity (an entity matching the target entity) in each primary-selection media content in the preset media database.

The scene intent in the embodiments of the present disclosure may correspond to classified scenes such as film and television scenes, game scenes, and natural scenes. It may also correspond to finer subdivisions of these scenes, such as outdoor film and television scenes or home film and television scenes, or to scenes with emotional attributes, such as funny, serious, sad, or cheerful scene styles. The corresponding scene intent may be determined according to different application requirements, which is not specifically limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the target media content may have a single scene attribute or may be composed of multiple scene attributes. To take all scene attributes into account, every primary-selection media content that matches any scene attribute of the scene intent of the target media content may be taken as a media element of the first candidate media content set. For example, when the scene of the target media content is determined to be both funny and serious, the first candidate media content set may be determined based on primary-selection media content that closely matches the funny intent, and also based on primary-selection media content that closely matches the serious intent.

The entity intent in the embodiments of the present disclosure may correspond to a target entity in the target media content, and there may be one target entity or several. Taking a video of a person interacting with a cat as an example, the person may serve as one target entity and the cat as another.

In the embodiments of the present disclosure, after the first candidate media content set and the second candidate media content set are determined, the search result corresponding to the target media content may be sent to the client.

For the first candidate media content set determined from the scene intent of the target media content, the scene intent of each first candidate media content in the set may be related to one or more scene attributes of the scene intent of the target media content. Therefore, to take all scene attributes into account, the first candidate media content set may be associated with one piece of first set identification information, so that the client can display the search results related to the scene intent according to the first set identification information.

The first set identification information in the embodiments of the present disclosure may include a first thumbnail image indicating the scene, and may also be first text description content describing the scene.

Similarly, for the second candidate media content set determined from the entity intent of the target media content, the entity intent of each second candidate media content in the set may be related to a target entity in the target media content. Therefore, to support searching for multiple target entities, each of at least one second candidate media content subset in the second candidate media content set may be associated with one piece of second set identification information. Each second candidate media content subset may correspond to one candidate entity that matches the entity intent, so that the client can display the search results related to the entity intent according to the second set identification information.

The second set identification information may include a first thumbnail image indicating the candidate entity, and may also be second text description content describing the candidate entity.

The client may display media content based on the received search result. The media content search method provided by the embodiments of the present disclosure is illustrated below with reference to the client interface renderings shown in FIGS. 2(a) to 2(c).

As shown in FIG. 2(a), the target media content picture presented by the client (a picture of a person interacting with a cat) includes a search button (○). After the user triggers the search button, a search instruction for the target media content is sent to the server. Based on the search instruction, the server may determine: the first candidate media content set corresponding to the target media content (multiple home videos) and its first set identification information (the "scene" identifier); one second candidate media content subset in the second candidate media content set (multiple cat videos) and its second set identification information (the "cat" identifier); and another second candidate media content subset in the second candidate media content set (multiple person videos) and its second set identification information (the "person" identifier).

The scene identifier, cat identifier, and person identifier may each be presented as a thumbnail image, as text description content, or as a combination of the two.
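One way to picture the grouped search result in this example is as a list of groups, each carrying its own identification information. The sketch below is an assumption made for illustration: the `ResultGroup` structure, the `build_groups` function, and the thumbnail file names are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultGroup:
    kind: str                  # "scene" or "entity"
    label: Optional[str]       # text description identifier, if used
    thumbnail: Optional[str]   # thumbnail identifier, if used (made-up name)
    media_ids: list            # candidate media content in this group


def build_groups(scene_ids, entity_subsets):
    """One group for the scene set, plus one group per candidate entity."""
    groups = [ResultGroup("scene", "Scene", "scene.jpg", list(scene_ids))]
    for name, ids in entity_subsets.items():
        groups.append(
            ResultGroup("entity", name, f"{name.lower()}.jpg", list(ids))
        )
    return groups


# The person-and-cat example: one scene group and two entity groups.
groups = build_groups(
    ["home-1", "home-2"],
    {"Cat": ["cat-1", "cat-2"], "Person": ["person-1"]},
)
labels = [group.label for group in groups]
```

Because each group carries both a label and a thumbnail, the client can show either identifier or both.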

The search result display in FIG. 2(b) shows three thumbnail identifiers: a scene image, a cat image, and a person image. Multiple home videos corresponding to the scene identifier may be displayed, that is, the search results for similar scenes. Through a switching operation, multiple cat videos corresponding to the cat identifier (not shown) or person videos corresponding to the person identifier (not shown) may be displayed instead. The search result display in FIG. 2(c) shows the same three thumbnail identifiers together with the text description identifier corresponding to each of them. The displayed search results are the same as in FIG. 2(b) and are not described again here.

The search results corresponding to the multiple identifiers may be switched by a sliding operation. For example, the user may slide left or right to switch among the three identifiers, so that the corresponding content is displayed by category. This keeps the search comprehensive while also keeping it targeted.

It should be noted that the specific display layout of the search results, such as how many results are shown per row and whether they are arranged vertically or horizontally, may be chosen according to different application requirements and is not specifically limited here.

To further meet the user's need for customized searches, the embodiments of the present disclosure may provide a manual selection button (shown in the upper right corner of FIGS. 2(b) and 2(c)) alongside the search results. After the user triggers this button, the client may jump to the target media content picture so that the user can further select the target media content. For the process of triggering a search instruction based on the selection operation and searching for similar media content according to that instruction, refer to the description above; it is not repeated here.

It should be noted that the embodiments of the present disclosure can support the user's manual selection while the search results are displayed, and can also send the server a search instruction for the box-selected media content directly based on the user's manual selection, so as to search for similar media content. The specific process is not repeated here.

In addition to the categorized display described above, the embodiments of the present disclosure may combine the first candidate media content set with the second candidate media content set and display the combination, that is, without distinguishing whether a similar media content item was found based on the entity intent or based on the scene intent. Other display manners may also be used, which is not specifically limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, determining the first candidate media content set and the second candidate media content set is the key step in searching for similar media content. Each is described in turn below.

The first candidate media content set may be determined according to the result of matching the target media content feature vector corresponding to the target media content against each candidate media content feature vector in a scene index library. As shown in FIG. 3, the method for determining the first candidate media content set includes the following steps:

S301. Based on feature information of the target media content in multiple preset dimensions, determine the target media content feature vector corresponding to the target media content;

S302. Match the target media content feature vector against each candidate media content feature vector in the scene index library to determine at least one first candidate media content that matches the scene intent of the target media content, and form the first candidate media content set from the at least one first candidate media content.

Here, the target media content feature vector may be a feature vector extracted for the scene intent of the target media content. The scene intent of the target media content may be related to feature information in various preset dimensions. Taking video as the media content, a preset dimension may be a visual dimension related to what the scene focuses on, which can represent information about the application scene, such as whether it is an outdoor or a home environment. A preset dimension may also be a text information dimension related to the description of the media content, or a music dimension related to the atmosphere of the scene, such as whether the music is sad or cheerful. Other preset dimensions related to the scene intent may also be used, which is not specifically limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the target media content feature vector may be determined directly from the feature information of the target media content in the preset dimensions; that is, the feature information in the preset dimensions is concatenated to obtain the target media content feature vector. For example, when there are three preset dimensions (a visual dimension, a text information dimension, and a music dimension), a preset dimension may correspond to a single feature value or to a feature vector. The text information dimension may correspond to a feature vector (for example, the text is converted into a text vector), while the music dimension may correspond to a single feature value (for example, 1 for cheerful music and 0 for sad music). Concatenating the feature information corresponding to each preset dimension yields the target media content feature vector.
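The concatenation described above can be sketched in a few lines of Python. The function name, the feature lengths, and the sample values below are assumptions chosen for illustration; how the visual and text features are extracted is outside the sketch.

```python
def build_feature_vector(visual, text, music_is_cheerful):
    """Concatenate per-dimension features into one media content vector.

    visual and text are lists of floats (assumed to be extracted already);
    the music dimension is a single value: 1.0 for cheerful, 0.0 for sad.
    """
    music = [1.0 if music_is_cheerful else 0.0]
    return list(visual) + list(text) + music


# Two visual features, three text features, one music feature.
vec = build_feature_vector([0.2, 0.7], [0.1, 0.0, 0.9], music_is_cheerful=True)
```

The resulting vector has one slot per feature value, so its length is the sum of the per-dimension lengths (six in this example).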

After the target media content feature vector is determined, it can be matched against each candidate media content feature vector in the scene index library to determine at least one first candidate media content that matches the scene intent of the target media content. The first candidate media content set is then determined from these first candidate media content items.

In the embodiments of the present disclosure, as shown in FIG. 4, the scene index library may be generated according to the following steps:

S401. Perform scene intent recognition on each primary-selection media content, and determine the first candidate media content that has a scene intent among the primary-selection media content;

S402. Extract feature information of the first candidate media content in multiple preset dimensions;

S403. Based on the feature information of the first candidate media content in the multiple preset dimensions, generate the candidate media content feature vector of the first candidate media content;

S404. Store the first media content identifier of the first candidate media content and the candidate media content feature vector of that first candidate media content in the scene index library in association with each other.
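Steps S401 to S404 can be sketched as an offline loop over the media content library. This is a minimal illustration under stated assumptions: the library is modeled as a dictionary, the scene index library as a dictionary from media content identifier to vector, and `has_scene_intent` and `extract_features` are hypothetical callables standing in for the scene intent model and the feature extractors.

```python
DIMENSIONS = ("visual", "text", "music")  # assumed preset dimensions


def build_scene_index(library, has_scene_intent, extract_features):
    """Build a mapping from media content identifier to feature vector."""
    index = {}
    for media_id, media in library.items():
        if not has_scene_intent(media):          # S401: keep scene-intent items
            continue
        features = extract_features(media)       # S402: per-dimension features
        vector = [x for dim in DIMENSIONS for x in features[dim]]  # S403
        index[media_id] = vector                 # S404: store id with vector
    return index


# Toy library: each entry already carries its features and a scene flag.
library = {
    "v1": {"scene": True, "visual": [0.1], "text": [0.2], "music": [1.0]},
    "v2": {"scene": False, "visual": [0.5], "text": [0.5], "music": [0.0]},
}
index = build_scene_index(
    library,
    has_scene_intent=lambda media: media["scene"],
    extract_features=lambda media: {dim: media[dim] for dim in DIMENSIONS},
)
```

Storing the identifier with the vector is what later allows a matched vector to be traced back to its media content.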

Each candidate media content feature vector in the scene index library is determined in a manner similar to the target media content feature vector; that is, in the embodiments of the present disclosure, the candidate media content feature vector is likewise determined from feature information in the preset dimensions.

The difference is that the target media content feature vector can be determined online, whereas the candidate media content feature vectors can be determined offline. In the embodiments of the present disclosure, before the feature information in the preset dimensions is extracted, scene intent recognition is first performed on each media content in the media content library to filter out the first candidate media content that matches a scene intent. Feature information in the preset dimensions is then extracted from the filtered media content, and the candidate media content feature vector of the first candidate media content is determined from the extracted feature information. For the process of determining the candidate media content feature vector, refer to the description of the target media content feature vector above; it is not repeated here.

The embodiments of the present disclosure may filter out the first candidate media content that matches a scene intent by using a scene intent model. The scene intent model may be trained in advance on the primary-selection media content in the media content library and the corresponding scene intent annotation information. The trained scene intent model can then determine whether each primary-selection media content in the media content library has a scene intent.

In a specific application, if primary-selection videos are used as the primary-selection media content, information related to scene intent recognition may first be extracted from each primary-selection video. The extracted information is then used as the input features of the scene intent model to be trained, and the scene intent annotation information is used as its output, so as to train the model parameters of the scene intent model.

The information related to the scene intent used in the embodiments of the present disclosure may include user-authorized behavior features of the primary-selection video, such as interaction statistics like video clicks and video likes. It may also include video text features of the primary-selection video, such as the video title text and text recognized by optical character recognition (OCR) and automatic speech recognition (ASR). It may further include video visual features, such as picture feature information obtained after frames are sampled from the video. The behavior features reflect, to some extent, the popularity of the video; the video text features reflect content related to the scene description; and the visual features capture details of the pictures in the video. Together, this information characterizes, to some extent, a user's scene intent for the annotated primary-selection media content.

Based on the scene intent recognition and feature vector extraction performed on the media content library, the candidate media content feature vectors belonging to the scene index library can be determined. For the scene index library, the embodiments of the present disclosure determine not only each candidate media content feature vector but also the media content it comes from; that is, a correspondence between each candidate media content feature vector and a media content identifier is established, and the correspondence is stored in the scene index library. In this way, after the target media content feature vector is matched against each candidate media content feature vector, the first candidate media content whose vector matching result meets a preset requirement can be formed into the first candidate media content set.

In the embodiments of the present disclosure, the vector matching result may be determined in two ways: first, by calculating the cosine similarity between the target media content feature vector of the target media content and each candidate media content feature vector in the scene index library; second, by using a scene correlation model.

For the former, at least one candidate media content feature vector that matches the target media content feature vector may be determined based on cosine similarity. The first candidate media content corresponding to each candidate media content feature vector that meets a preset requirement (for example, a cosine similarity greater than 0.8) is then included in the first candidate media content set.
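The cosine-similarity option can be sketched as follows. The 0.8 threshold is the example value from the text; the function names, the dictionary form of the index, and the sample vectors are assumptions made for illustration. A production system would likely use an approximate nearest-neighbor index rather than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_by_cosine(target, index, threshold=0.8):
    """Return identifiers whose vectors exceed the similarity threshold."""
    return [
        media_id
        for media_id, vec in index.items()
        if cosine_similarity(target, vec) > threshold
    ]


index = {
    "a": [1.0, 0.1, 0.9],  # close to the target direction
    "b": [0.0, 1.0, 0.0],  # orthogonal to the target
    "c": [1.0, 2.0, 1.0],  # partially aligned with the target
}
matched = match_by_cosine([1.0, 0.0, 1.0], index)
```

With these sample vectors, only "a" clears the threshold: "b" is orthogonal to the target, and "c" has a similarity of roughly 0.58.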

For the latter, the target media content feature vector of the target media content and each candidate media content feature vector in the scene index library may be input into the scene correlation model for processing, to obtain at least one candidate media content feature vector that matches the target media content feature vector.

The scene correlation model may be trained in advance on training sample media content labeled with scene correlation matching results. In a specific application, training may use any two training sample media content items in a training media content library together with the scene correlation matching result set for that pair. For example, media content feature vectors may be extracted from two media content items with high scene correlation, and the resulting pair of feature vectors used as one training sample. The parameters of the scene correlation model are adjusted according to the comparison between the correlation output by the model during training and the labeled scene correlation matching result. Likewise, two media content items with low scene correlation may be used as a training sample for parameter adjustment. The trained scene correlation model is obtained once the training cutoff condition is reached.

In this way, by inputting the target media content feature vector and each candidate media content feature vector in the scene index library into the trained scene correlation model, the matching degree between the target media content feature vector and each candidate media content feature vector can be determined. In the embodiments of the present disclosure, a higher correlation between two media content feature vectors indicates, to some extent, a better match. First candidate media content that meets a preset matching degree can then be selected from the primary-selection media content corresponding to the candidate media content feature vectors. For example, the primary-selection media content corresponding to a candidate media content feature vector with a matching degree of 0.75 or higher may be determined as matching first candidate media content.
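The model-based option can be sketched with the scoring model passed in as a callable. The 0.75 cutoff is the example value from the text; everything else is an assumption. In particular, `stand_in_model` is a deliberately simple placeholder (1 minus the mean absolute difference) and is not the trained scene correlation model.

```python
def stand_in_model(target, candidate):
    """Placeholder scorer: 1 minus the mean absolute difference.

    NOT the trained scene correlation model; assumes features lie in [0, 1].
    """
    diffs = [abs(t - c) for t, c in zip(target, candidate)]
    return 1.0 - sum(diffs) / len(diffs)


def match_by_model(target, index, model, min_score=0.75):
    """Return (media_id, score) pairs whose matching degree reaches min_score."""
    scored = [(media_id, model(target, vec)) for media_id, vec in index.items()]
    return [(media_id, score) for media_id, score in scored if score >= min_score]


index = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
matches = match_by_model([1.0, 0.0], index, stand_in_model)
matched_ids = [media_id for media_id, _ in matches]
```

Passing the model as a parameter keeps the thresholding logic independent of how the matching degree is computed, so a trained model can be substituted without changing the selection step.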

In the embodiments of the present disclosure, whether the matching candidate media content feature vector is determined by cosine similarity or by the scene correlation model, the first candidate media content corresponding to it is found in the same way. The first media content identifier corresponding to the matched candidate media content feature vector is determined from the correspondence between candidate media content feature vectors and media content identifiers stored in the scene index library, and the corresponding first candidate media content is then looked up in the media content library by that identifier.

Here, after the matching first candidate media content is determined, the matched first candidate media content may be combined directly to generate the first candidate media content set, or may first be filtered based on user behavior information and then combined into the first candidate media content set.

本公开实施例中，可以基于与第一媒体内容标识所标识的第一候选媒体内容对应的用户行为信息，从至少一个匹配的第一候选媒体内容中选取符合预设要求的第一候选媒体内容以作为第一候选媒体内容集合中的媒体内容元素。In this embodiment of the present disclosure, based on the user behavior information corresponding to the first candidate media content identified by the first media content identifier, the first candidate media content that meets the preset requirements may be selected from at least one matched first candidate media content as a media content element in the first candidate media content set.

The user behavior information may be information such as the number of likes, plays, and forwards of the first candidate media content. To a certain extent this indicates how much attention users pay to the first candidate media content, so it can further satisfy users' search needs and increase media content search traffic.

After the user behavior information is determined, the first candidate media content that meets the preset requirements can be determined as the first candidate media content set. For example, when the user behavior information is the number of likes and the number of plays of the media content, the first candidate media content whose number of likes is greater than a preset number of likes (e.g., greater than 50) and whose number of plays is greater than a preset number of plays (e.g., greater than 35) may be determined as the first candidate media content set.

In addition, in the embodiments of the present disclosure, the at least one matched first candidate media content may be ranked according to the user behavior information, and the first candidate media content at preset ranks selected as media content elements of the first candidate media content set. Still taking the number of likes and the number of plays as the user behavior information, the first candidate media content set may be determined from the top 20 ranked first candidate media contents. This is mainly because the first candidate media content set needs to be displayed on the user terminal; a ranked display can therefore be used, so that when the set is pushed to the user terminal, the media content is displayed according to the ranking result.
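The two selection strategies above (a preset threshold and a preset rank) can be sketched as follows. The `Candidate` record and the ordering key (likes first, then plays) are assumptions made for illustration; the disclosure does not specify how the two counts are combined for ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    media_id: str
    likes: int
    plays: int


def filter_by_threshold(candidates, min_likes=50, min_plays=35):
    """Keep candidates whose likes and plays both exceed the preset values."""
    return [c for c in candidates if c.likes > min_likes and c.plays > min_plays]


def top_ranked(candidates, k=20):
    """Rank by (likes, plays) in descending order and keep the first k.

    The ordering key is an assumption; any scoring of user behavior
    information could be substituted here.
    """
    ranked = sorted(candidates, key=lambda c: (c.likes, c.plays), reverse=True)
    return ranked[:k]
```

Because `top_ranked` returns the candidates in ranked order, the same list can be pushed to the user terminal for ranked display.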

针对第二候选媒体内容集合的确定而言，本公开实施例可以根据目标媒体内容对应的目标实体特征向量与实体索引库中的各个候选实体特征向量之间的匹配结果来确定，如图5所示，上述确定第二候选媒体内容集合的方法具体包括如下步骤：For the determination of the second candidate media content set, the embodiment of the present disclosure may be determined according to the matching result between the target entity feature vector corresponding to the target media content and each candidate entity feature vector in the entity index library, as shown in FIG. 5 . As shown, the above-mentioned method for determining the second candidate media content set specifically includes the following steps:

S501、确定目标媒体内容中，与实体意图对应的目标实体的目标实体特征向量；S501. Determine the target entity feature vector of the target entity corresponding to the entity intent in the target media content;

S502、将目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量进行匹配，确定与目标实体匹配的第二候选媒体内容集合；第二候选媒体内容集合中包含至少一个第二候选媒体内容。S502: Match the target entity feature vector of the target entity with each candidate entity feature vector in the entity index library to determine a second candidate media content set that matches the target entity; the second candidate media content set includes at least one second candidate media content.

Here, the target entity feature vector in the embodiments of the present disclosure may be the feature vector of a target entity determined after intent recognition is performed on the entities in the target media content. That is, the target entity feature vector corresponds to a target entity that matches the entity intent, and the target entity may be some or all of the entities in the target media content. This is mainly because the target media content, as a multimedia element that can cover both sound and images, may contain many kinds of entities; entities that do not match the entity intent not only increase the amount of computation in the media content search but also reduce the viewing traffic of the search results. On this basis, intent recognition may first be performed on the target media content, and the target entity feature vector of the target entity corresponding to the entity intent is then determined. As shown in FIG. 6, the target entity feature vector may be determined through the following steps:

S601、检测出目标媒体内容中的实体；S601. Detecting an entity in the target media content;

S602、对检测出的至少一个实体进行意图识别，确定出至少一个目标实体；S602. Perform intent recognition on the detected at least one entity to determine at least one target entity;

S603、针对每个目标实体，基于该目标实体在目标媒体内容中对应的图像信息，生成该目标实体对应的目标实体特征向量。S603. For each target entity, generate a target entity feature vector corresponding to the target entity based on image information corresponding to the target entity in the target media content.

这里，首先可以检测出目标媒体内容中的各个实体，然后可以对检测出的各个实体进行意图识别，以从各个实体中确定出目标实体。最后，针对目标实体可以基于该目标实体在目标媒体内容中对应的图像信息，生成该目标实体对应的目标实体特征向量。Here, each entity in the target media content can be detected first, and then the intent recognition can be performed on each detected entity to determine the target entity from the various entities. Finally, for the target entity, a target entity feature vector corresponding to the target entity may be generated based on the image information corresponding to the target entity in the target media content.

Here, still taking video as the media content, the entities in the target media content may be the entities in the current picture frame selected when a search instruction is triggered while the user is watching the target video, the entities in that picture frame and several picture frames before and after it, or the entities in every picture frame of the entire target video. An entity in the embodiments of the present disclosure can be understood as a foreground target of the target video. Foreground targets may be detected using the optical flow method, the frame difference method, the background difference method, and the like, or using machine learning; the embodiments of the present disclosure place no specific limitation on this.
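As an illustration of the frame difference method mentioned above, the following minimal sketch marks pixels whose grayscale value changes by more than a threshold between two consecutive frames and returns the bounding box of the changed region. Frames are modelled as nested lists of integers, and the threshold of 25 is an assumed value, not one taken from the disclosure.

```python
def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose grayscale value changed by more than the threshold."""
    return [
        [abs(curr - prev) > threshold for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the changed region, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, changed in enumerate(row) if changed]
    return (min(rows), min(cols), max(rows), max(cols))
```

The bounding box gives a rough location for a moving foreground target; a practical detector would typically add noise filtering and handle multiple targets.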

在检测得到各个实体之后，可以将该实体输入至预先训练好的实体意图模型中，以确定出用户意图对应的目标实体，而后基于该目标实体在目标媒体内容中对应的图像信息，生成对应的目标实体特征向量。After each entity is detected, the entity can be input into the pre-trained entity intent model to determine the target entity corresponding to the user's intent, and then based on the corresponding image information of the target entity in the target media content, the corresponding Target entity feature vector.

其中，上述与实体意图对应的目标实体可以是直接基于发起搜索指令的框选按钮的触发操作所确定的实体，还可以是基于实体意图模型确定的实体。这里的实体意图模型可以是预先训练完成的，也即，可以基于媒体内容库中的各个初选媒体内容及其对应的实体意图标注信息进行训练，这样，基于训练到的实体意图模型即可以确定媒体内容库中的每个媒体内容是否具有目标实体。Wherein, the above-mentioned target entity corresponding to the entity intent may be an entity determined directly based on a triggering operation of a frame selection button that initiates a search instruction, or an entity determined based on an entity intent model. The entity intent model here can be pre-trained, that is, it can be trained based on each primary selected media content in the media content library and its corresponding entity intent annotation information, so that it can be determined based on the trained entity intent model. Whether each media content in the media content library has a target entity.

在具体应用中，若以初选视频作为初选媒体内容，首先可以提取初选视频中与实体意图识别相关的信息，然后将提取的与实体意图识别相关的信息作为待训练的实体意图模型的输入特征，将上述实体意图标注信息作为待训练的实体意图模型的输出结果进行实体意图模型的模型参数的训练。In a specific application, if the primary selection video is used as the primary selection media content, the information related to entity intent recognition in the primary selection video can be extracted first, and then the extracted information related to entity intent recognition can be used as the information of the entity intent model to be trained. Input features, and use the entity intent annotation information as the output result of the entity intent model to be trained to train model parameters of the entity intent model.

The information related to entity intent used in the embodiments of the present disclosure may include: behavior features of the primary selected video that the video's users have authorized, such as interaction statistics like video clicks and video likes; text features of the primary selected video, such as the title text and text recognized using OCR and ASR; visual features of the primary selected video, such as image features of pictures extracted from the video frames; visual features of the entities in the primary selected video; and features describing the relationship between an entity and the video, such as the area the entity occupies in a picture frame, the position of the entity in the picture frame, and how frequently the entity appears in the video. To a certain extent, this information can characterize a user's entity intent with respect to the currently labelled primary selected media content.

In the embodiments of the present disclosure, one target entity may correspond to one media content object. Taking target media content that shows a person interacting with a cat as an example, the person can be one video object and the cat another; after entity intent labelling is performed for the cat, it can be determined whether the cat is present as a target entity in the target media content.

The image information corresponding to the target entity in the target media content may be the image position of the target entity, entity motion information determined from multiple picture frames in which the target entity appears, or other image information that can characterize the target entity; the embodiments of the present disclosure place no specific limitation on this. The image information is represented in vector form (for example, by inputting it into a trained feature extraction network) to obtain the target entity feature vector corresponding to the target entity. For example, for a 16*16 picture frame, 10111 may be used to indicate that the center of gravity of the target entity is located in the second row and third column of the picture frame; that is, the position information can be vectorized. Other image information can be vectorized in the same way, which is not described in detail here.

It is worth noting that one or more target entities may be determined for the target media content in the embodiments of the present disclosure, and for each target entity the corresponding target entity feature vector can be determined by the above method.

在确定目标实体特征向量之后，即通过将目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量进行匹配，确定与目标实体匹配的第二候选媒体内容集合。After the target entity feature vector is determined, the second candidate media content set matching the target entity is determined by matching the target entity feature vector of the target entity with each candidate entity feature vector in the entity index library.

本公开实施例中，如图7所示，可以按照如下步骤生成实体索引库：In the embodiment of the present disclosure, as shown in FIG. 7 , the entity index library can be generated according to the following steps:

S701、确定各个初选媒体内容中包含的实体；S701. Determine entities included in each primary selection media content;

S702、对初选媒体内容中包含的实体进行意图识别，确定候选实体及该候选实体所在的第二候选媒体内容；S702, performing intention identification on entities included in the primary selection media content, and determining a candidate entity and a second candidate media content where the candidate entity is located;

S703、基于候选实体在对应的第二候选媒体内容中的图像信息，生成该候选实体对应的候选实体特征向量；S703, based on the image information of the candidate entity in the corresponding second candidate media content, generate a candidate entity feature vector corresponding to the candidate entity;

S704、将候选实体所在的第二候选媒体内容的第二媒体内容标识，和该候选实体的候选实体特征向量对应存储在实体索引库中。S704: Store the second media content identifier of the second candidate media content where the candidate entity is located, and the candidate entity feature vector of the candidate entity in the entity index library correspondingly.
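Steps S701 to S704 can be sketched as the loop below. The three callables (`detect_entities`, `has_entity_intent`, `embed_entity`) are hypothetical placeholders for the entity detector, the entity intent model, and the feature extraction network; the entity index library is modelled as a plain list of (feature vector, media content identifier) pairs.

```python
def build_entity_index(media_library, detect_entities, has_entity_intent,
                       embed_entity):
    """Build an entity index as a list of (feature_vector, media_id) pairs.

    media_library     : dict mapping media_id -> media content
    detect_entities   : hypothetical detector, media -> iterable of entities
    has_entity_intent : hypothetical intent model, (entity, media) -> bool
    embed_entity      : hypothetical extractor, (entity, media) -> vector
    """
    index = []
    for media_id, media in media_library.items():
        for entity in detect_entities(media):            # S701
            if not has_entity_intent(entity, media):     # S702
                continue
            vector = embed_entity(entity, media)         # S703
            index.append((vector, media_id))             # S704
    return index
```

One media content may contribute several entries, one per candidate entity, which is why the identifier is stored alongside every vector.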

The candidate entity feature vectors in the entity index library are determined in a manner similar to the target entity feature vector. That is, in the embodiments of the present disclosure, entity intent recognition is likewise performed first, and feature vectors are then extracted for the candidate entities determined by the entity intent recognition.

The difference is that, whereas the target entity feature vector can be determined online, the candidate entity feature vectors can be determined offline. In the embodiments of the present disclosure, before entity intent recognition is performed, frames may be extracted from each primary selected media content in the media content library. For the extracted pictures, entity recognition is performed first, followed by entity intent recognition, to screen out the candidate entities that match the entity intent and the second candidate media content in which each candidate entity is located. The corresponding candidate entity feature vector can then be determined based on the image information of the candidate entity in the corresponding second candidate media content. For the process of determining the candidate entity feature vector, refer to the description of the target entity feature vector above, which is not repeated here.

The embodiments of the present disclosure may screen for second candidate media content matching the entity intent based on the entity intent model. For details of the entity intent model, refer to the description above, which is not repeated here.

Based on the entity intent recognition and feature vector extraction performed on the media content library, the candidate entity feature vectors belonging to the entity index library can be determined. For the entity index library, the embodiments of the present disclosure determine not only each candidate entity feature vector but also the media content from which that vector originates. That is, a correspondence between the candidate entity feature vector of a candidate entity and the second media content identifier can be established and stored in the entity index library. In this way, after the target entity feature vector is matched against each candidate entity feature vector, the second candidate media content whose vector matching result meets the preset requirement can be assembled into the second candidate media content set.

In the embodiments of the present disclosure, the vector matching result may be determined in two ways: first, by calculating the cosine similarity between the target entity feature vector of the target entity and each candidate entity feature vector in the entity index library; second, based on an entity correlation model.

对于前者，本公开实施例中可以基于余弦相似度确定与目标实体的目标实体特征向量匹配的至少一个候选实体特征向量，这样，即可以将符合预设要求(如余弦相似度大于0.8)的候选实体特征向量所对应的第二候选媒体内容归入第二候选媒体内容集合。For the former, in this embodiment of the present disclosure, at least one candidate entity feature vector that matches the target entity feature vector of the target entity may be determined based on the cosine similarity. In this way, the candidate entity that meets the preset requirements (eg, the cosine similarity is greater than 0.8) can be selected. The second candidate media content corresponding to the entity feature vector is classified into the second candidate media content set.
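The cosine-similarity variant can be sketched as follows. The entity index library is again modelled, as an assumption, as a list of (feature vector, media content identifier) pairs; the 0.8 threshold is the example value given above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_matches(target_vec, entity_index, threshold=0.8):
    """Return media ids whose candidate vector has similarity above the threshold."""
    return [
        media_id
        for vec, media_id in entity_index
        if cosine_similarity(target_vec, vec) > threshold
    ]
```

A linear scan is used here for clarity; a large index would normally be served by an approximate nearest-neighbor structure instead.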

对于后者，本公开实施例可以将目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量输入实体相关性模型进行处理，得到与目标实体特征向量匹配的至少一个候选实体特征向量。For the latter, in this embodiment of the present disclosure, the target entity feature vector of the target entity and each candidate entity feature vector in the entity index library can be input into the entity correlation model for processing to obtain at least one candidate entity feature vector matching the target entity feature vector. .

The entity correlation model may be trained in advance on training-sample media content labelled with entity correlation matching results. In a specific application, training may use the entities in any two training-sample media contents in the training media content library together with the entity correlation matching result set for that pair. Here, entity feature vectors may be extracted from two media contents whose entity correlation is relatively high, and the resulting pair of entity feature vectors used as one training sample; the parameters of the entity correlation model are adjusted according to the comparison between the correlation output by the model during training and the labelled entity correlation matching result. Similarly, two media contents whose entity correlation is relatively low may be used as a training sample for parameter adjustment. The trained entity correlation model is obtained once the training cut-off condition is reached.
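The training procedure above does not fix a model architecture. The following minimal sketch assumes a logistic model over the element-wise product of the two feature vectors, trained by stochastic gradient descent; the feature construction, learning rate, and cut-off condition (a mean loss below `tol` or a maximum number of epochs) are all illustrative assumptions rather than the method of the disclosure.

```python
import math


def pair_features(a, b):
    """Element-wise product of the two vectors (an assumed pairing scheme)."""
    return [x * y for x, y in zip(a, b)]


def predict_correlation(weights, bias, a, b):
    """Model output in (0, 1): the estimated correlation of the pair."""
    z = bias + sum(w * f for w, f in zip(weights, pair_features(a, b)))
    return 1.0 / (1.0 + math.exp(-z))


def train_correlation_model(pairs, labels, dim, lr=0.5, max_epochs=500, tol=0.05):
    """Adjust parameters by comparing model output with the labelled result."""
    weights, bias = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        total_loss = 0.0
        for (a, b), y in zip(pairs, labels):
            p = predict_correlation(weights, bias, a, b)
            err = p - y  # gradient of the log loss with respect to z
            feats = pair_features(a, b)
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
            total_loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
        if total_loss / len(pairs) < tol:  # training cut-off condition
            break
    return weights, bias
```

Pairs labelled 1 play the role of the high-correlation samples and pairs labelled 0 the low-correlation samples; the trained model's output can then be compared with a preset matching degree.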

In this way, by inputting the target entity feature vector and each candidate entity feature vector in the entity index library into the trained entity correlation model, the matching degree between the target entity feature vector and each candidate entity feature vector can be determined. As described above for the scene correlation model, the second candidate media content can likewise be determined based on the matching degree.

In the embodiments of the present disclosure, regardless of whether the candidate entity feature vectors matching the target entity feature vector are determined based on cosine similarity or on the entity correlation model, the second candidate media content corresponding to a matched candidate entity feature vector can be determined as follows: based on the correspondence between candidate entity feature vectors and media content identifiers stored in the entity index library, the second media content identifier corresponding to the matched candidate entity feature vector is determined, and the corresponding second candidate media content is then looked up in the media content library using the second media content identifier.

这里，在确定出匹配的第二候选媒体内容之后，可以直接将匹配得到的第二候选媒体内容进行组合以生成第二候选媒体内容集合，还可以先基于用户授权的用户行为信息进行筛选而后组合成第二候选媒体内容集合。Here, after the matching second candidate media content is determined, the second candidate media content obtained by matching can be directly combined to generate a second candidate media content set, and the user behavior information authorized by the user can also be filtered first and then combined. into a second candidate media content set.

本公开实施例中，可以基于与第二媒体内容标识所标识的第二候选媒体内容对应的用户行为信息，从至少一个匹配的第二候选媒体内容中选取符合预设要求的第二候选媒体内容以作为第二候选媒体内容集合中的媒体内容元素。In this embodiment of the present disclosure, based on user behavior information corresponding to the second candidate media content identified by the second media content identifier, a second candidate media content that meets preset requirements may be selected from at least one matched second candidate media content as a media content element in the second candidate media content set.

The user behavior information corresponding to the second candidate media content is determined in the same way as that corresponding to the first candidate media content, and is not described again here. In addition, the second candidate media content set is determined from the user behavior information in a manner similar to the first candidate media content set: it may be determined either by checking a preset requirement or from a ranking result.

Because the second candidate media content set determined in the embodiments of the present disclosure needs to be displayed on the user terminal, a ranked display can be used here as well, so that when the set is pushed to the user terminal, the media content is displayed according to the ranking result.

为了便于进一步理解本公开实施例提供的上述媒体内容搜索的方法，可以结合图8所示的应用示意图对上述媒体内容搜索的方法进行说明。In order to facilitate further understanding of the foregoing media content search method provided by the embodiments of the present disclosure, the foregoing media content search method may be described with reference to the application diagram shown in FIG. 8 .

如图8所示，上述媒体内容搜索的方法可以通过实体离线模块、场景离线模块和在线模块来实现，这里，将视频作为媒体内容进行示例说明。As shown in FIG. 8 , the above-mentioned method for searching for media content can be implemented by an entity offline module, a scene offline module and an online module. Here, video is used as an example of media content for description.

The entity offline module may extract picture frames from each video in the video library through a frame extraction operation, and then determine candidate entities through entity detection and the entity intent model. After the candidate entity feature vectors are determined based on the image information of the candidate entities in the respective videos, they are stored in the entity index library. The entity index library may store the correspondence between each candidate entity feature vector and a video identifier, such as the correspondences shown in the entity index library of FIG. 8 between candidate entity feature vector 1 and video 21 (i.e., vector 1 -> video 21), candidate entity feature vector 2 and video 22 (i.e., vector 2 -> video 22), and candidate entity feature vector 3 and video 23 (i.e., vector 3 -> video 23). At the same time, the user behavior information corresponding to each video may be recorded in preparation for the subsequent entity sorting.

另外，上述场景离线模块可以将视频库中的各个视频输入至场景意图模型，以确定与场景意图匹配的视频，通过视觉特征、文本信息特征、音乐特征这些预设维度下的特征向量的提取，可以确定对应的候选视频特征向量，将确定的各个候选视频特征向量存储至场景索引库。其中，该场景索引库中可以存储的各候选视频特征向量与各视频标识之间的对应关系，如图8场景索引库中所示的候选视频特征向量1与视频11(即向量1->视频11)、候选视频特征向量2与视频12(即向量2->视频12)、候选视频特征向量3与视频13(即向量3->视频13)之间的对应关系。与此同时，还可以记录各个视频所对应的用户行为信息以为后续的场景排序做准备。In addition, the above-mentioned scene offline module can input each video in the video library into the scene intent model to determine the video that matches the scene intent, through the extraction of feature vectors under the preset dimensions of visual features, text information features, and music features, Corresponding candidate video feature vectors may be determined, and each determined candidate video feature vector may be stored in a scene index library. Among them, the corresponding relationship between each candidate video feature vector and each video identifier that can be stored in the scene index library, such as the candidate video feature vector 1 and video 11 shown in the scene index library in FIG. 8 (that is, vector 1 -> video 11) Correspondence between candidate video feature vector 2 and video 12 (ie, vector 2->video 12), and candidate video feature vector 3 and video 13 (ie, vector 3->video 13). At the same time, user behavior information corresponding to each video can also be recorded to prepare for subsequent scene sorting.

针对在线模块而言，可以分成两路进行相关处理，一路是基于场景意图确定第一候选视频集合，另一路是基于实体意图确定第二候选视频集合。For the online module, correlation processing can be performed in two ways, one way is to determine the first candidate video set based on the scene intent, and the other way is to determine the second candidate video set based on the entity intent.

For the determination of the first candidate video set, target video feature vectors may first be extracted in the preset dimensions of visual features, text information features, and music features. The spliced target video feature vector is input into the scene correlation model, which matches it against the candidate video feature vectors in the scene index library to obtain the matched first candidate videos. The matched first candidate videos can then be sorted based on the scene sorting model and the recorded user behavior information of each video, and the first candidate videos at the preset ranks are selected as the final first candidate video set.

For the determination of the second candidate video set, entity detection and intent recognition under the entity intent model may first be performed on the picture frame clicked by the user. As shown in FIG. 8, one target entity may be identified (this is only a specific example), and the corresponding target entity feature vector can be determined. Based on the entity correlation model, the target entity feature vector is matched against the candidate entity feature vectors in the entity index library to obtain the matched second candidate videos. The matched second candidate videos can then be sorted based on the entity sorting model and the recorded user behavior information of each video, and the second candidate videos at the preset ranks are selected as the final second candidate video set.

接下来以执行主体为用户端对本公开实施例提供的一种媒体内容搜索的方法加以说明。Next, a method for searching media content provided by an embodiment of the present disclosure will be described with the execution subject as the client.

实施例二Embodiment 2

参见图9所示，为本公开实施例二提供的媒体内容搜索的方法的流程图，方法包括步骤S901～S903，其中：Referring to FIG. 9, which is a flowchart of a method for searching media content according to Embodiment 2 of the present disclosure, the method includes steps S901-S903, wherein:

S901、向服务器发送针对目标媒体内容的搜索指令；S901, sending a search instruction for target media content to a server;

S902、接收服务器反馈的搜索结果；搜索结果中包含与目标媒体内容的场景意图匹配的第一候选媒体内容集合，和/或与目标媒体内容的实体意图匹配的第二候选媒体内容集合；S902, receive the search result fed back by the server; the search result includes a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content;

S903、基于搜索结果，显示搜索结果展示页面。S903 , based on the search result, display a search result display page.

Here, a search instruction for the target media content may first be sent to the server; the search result determined by the server according to the media content search method of Embodiment 1 is then received, and the search result display page is displayed based on the first candidate media content set matching the scene intent of the target media content and the second candidate media content set matching the entity intent of the target media content that are contained in the search result.

The above search instruction may be initiated by the client to the server in response to a trigger operation on the search button on the target media content screen, or in response to a trigger operation on the frame selection button acting on the target media content screen. For the specific process of initiating the search instruction and the trigger operations of the search button and the frame selection button, see the application schematic diagrams of FIGS. 2(a) to 2(b) and the related description of Embodiment 1, which are not repeated here.

In order to take the individual scene attributes into account and to support searching for multiple target entities, the search result in the embodiment of the present disclosure may further include one piece of first set identification information corresponding to the first candidate media content set, and one piece of second set identification information corresponding to each of at least one second candidate media content subset in the second candidate media content set. For details, refer to the related description in Embodiment 1, which is not repeated here.

Here, upon receiving the above search result, the client may display a search result display page containing the first set identification information and the second set identification information, so that when the user triggers any piece of set identification information, the candidate media content corresponding to that set identification information can be displayed.

Still taking the application schematic diagram shown in FIG. 2(b) as an example, when the "scene" identifier is triggered, multiple home furnishing videos corresponding to that identifier can be displayed, that is, the search results corresponding to similar scenes are displayed. Similarly, when another identifier is triggered, the display can switch to the corresponding candidate media content (not shown).

In the embodiment of the present disclosure, the candidate media content displayed for the multiple identifiers can be switched by a sliding operation. Still taking FIG. 2(b) as an example, the user can switch among the above three identifiers by sliding left or right, so that the corresponding display content is presented by category, which keeps the search targeted while ensuring that it remains comprehensive.

In addition, the embodiments of the present disclosure support not only the above categorized display but also a combined display, in which the first candidate media content set and the second candidate media content set are combined and then displayed together. Specifically, based on the search result, the search result display page may present a first search result display area containing the first set identification information and the second set identification information, and a second search result display area containing a media content list made up of the media content in the first candidate media content set and the second candidate media content set, so that no distinction is made between similar media content found based on the entity intent and similar media content found based on the scene intent.
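As a non-limiting illustration, the combined display described above can be modeled as a page with two areas. The function name `build_combined_page`, the dictionary keys, and the removal of duplicates from the merged list are assumptions made for this sketch and are not stated in the disclosure.

```python
from typing import Dict, List


def build_combined_page(
    scene_label: str,
    scene_set: List[str],
    entity_subsets: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return a two-area page model.

    first_area  : set identification labels (scene first, then each entity).
    second_area : one merged media content list, with no scene/entity split.
    """
    first_area = [scene_label] + list(entity_subsets)
    merged: List[str] = []
    seen = set()
    entity_items = [v for items in entity_subsets.values() for v in items]
    for item in scene_set + entity_items:
        if item not in seen:  # de-duplication is an added assumption
            seen.add(item)
            merged.append(item)
    return {"first_area": first_area, "second_area": merged}
```

For example, `build_combined_page("Scene", ["v1", "v2"], {"Sofa": ["v2", "v3"]})` lists `v2` only once in the second area even though it was found by both intents.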

值得说明的是，本公开实施例还可以采用其它展示方式，本公开实施例对此不做具体的限制。It should be noted that the embodiments of the present disclosure may also adopt other display manners, which are not specifically limited in the embodiments of the present disclosure.

Those skilled in the art can understand that, in the above methods of the specific embodiments, the order in which the steps are written does not imply a strict order of execution and does not limit the implementation process in any way; the specific order of execution of the steps should be determined by their functions and possible internal logic.

Based on the same inventive concept, the embodiments of the present disclosure further provide an apparatus for searching media content corresponding to the method for searching media content. Because the apparatus solves the problem on a principle similar to that of the above method, the implementation of the apparatus may refer to the implementation of the method, and repeated details are omitted.

Embodiment 3

参照图10所示，为本公开实施例三提供的一种媒体内容搜索的装置的示意图，装置包括：指令接收模块1001、集合确定模块1002和结果搜索模块1003；其中，Referring to FIG. 10, which is a schematic diagram of an apparatus for searching media content according to Embodiment 3 of the present disclosure, the apparatus includes: an instruction receiving module 1001, a set determining module 1002, and a result searching module 1003; wherein,

指令接收模块1001，用于接收针对目标媒体内容的搜索指令；an instruction receiving module 1001, configured to receive a search instruction for target media content;

集合确定模块1002，用于基于搜索指令，确定与目标媒体内容的场景意图匹配的第一候选媒体内容集合，以及与目标媒体内容的实体意图匹配的第二候选媒体内容集合；a set determination module 1002, configured to determine, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content;

结果搜索模块1003，用于基于第一候选媒体内容集合，以及第二候选媒体内容集合，向用户端发送与目标媒体内容对应的搜索结果。The result search module 1003 is configured to send a search result corresponding to the target media content to the user terminal based on the first candidate media content set and the second candidate media content set.

The embodiment of the present disclosure automatically searches for similar media content based on both the entities and the scene in the target media content, without requiring text input. This can improve search efficiency and search accuracy, and increases the probability of obtaining search results that satisfy the user's intent.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤确定与目标媒体内容的场景意图匹配的第一候选媒体内容集合：In a possible implementation, the set determination module 1002 is configured to determine the first candidate media content set that matches the scene intent of the target media content according to the following steps:

基于目标媒体内容在多种预设维度下的特征信息，确定目标媒体内容对应的目标媒体内容特征向量；Determine the target media content feature vector corresponding to the target media content based on the feature information of the target media content in multiple preset dimensions;

Determine at least one first candidate media content that matches the scene intent of the target media content by matching the target media content feature vector against each candidate media content feature vector in the scene index library, and form the first candidate media content set from the at least one first candidate media content.

一种可能的实施方式中，媒体内容为视频，多种预设维度包括以下维度中的多种：In a possible implementation, the media content is a video, and the multiple preset dimensions include multiple of the following dimensions:

一种可能的实施方式中，集合确定模块1002，用于根据以下步骤生成场景索引库：In a possible implementation, the set determination module 1002 is configured to generate a scene index library according to the following steps:

提取第一候选媒体内容在多种预设维度下的特征信息；extracting feature information of the first candidate media content under multiple preset dimensions;

基于第一候选媒体内容在多种预设维度下的特征信息，生成第一候选媒体内容的候选媒体内容特征向量；generating a candidate media content feature vector of the first candidate media content based on feature information of the first candidate media content in multiple preset dimensions;

将第一候选媒体内容的第一媒体内容标识和该第一候选媒体内容的候选媒体内容特征向量对应存储在场景索引库中。The first media content identifier of the first candidate media content and the candidate media content feature vector of the first candidate media content are correspondingly stored in the scene index library.
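As a non-limiting illustration, the three steps above for generating the scene index library can be sketched as follows. The disclosure does not say how per-dimension features are fused, so concatenation followed by L2 normalization is an assumption, and the dimension names used in the usage note (`visual`, `audio`) are purely hypothetical.

```python
import math
from typing import Dict, List


def to_feature_vector(features_by_dimension: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-dimension features in sorted key order, then L2-normalize."""
    vector: List[float] = []
    for dimension in sorted(features_by_dimension):
        vector.extend(features_by_dimension[dimension])
    norm = math.sqrt(sum(x * x for x in vector))
    # An all-zero vector is returned unchanged to avoid division by zero.
    return [x / norm for x in vector] if norm else vector


def build_scene_index(
    candidates: Dict[str, Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Store each first media content identifier with its feature vector."""
    return {
        media_id: to_feature_vector(features)
        for media_id, features in candidates.items()
    }
```

A call such as `build_scene_index({"v1": {"visual": [3.0], "audio": [4.0]}})` stores one unit-length vector under the identifier `v1`. Sorting the dimension names fixes the component order so that target and candidate vectors stay comparable.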

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤通确定与目标媒体内容的场景意图匹配的至少一个第一候选媒体内容：In a possible implementation, the set determination module 1002 is configured to determine at least one first candidate media content that matches the scene intent of the target media content according to the following steps:

将目标媒体内容的目标媒体内容特征向量与场景索引库中的各个候选媒体内容特征向量进行匹配，确定与目标媒体内容特征向量匹配的至少一个候选媒体内容特征向量；Matching the target media content feature vector of the target media content with each candidate media content feature vector in the scene index library, and determining at least one candidate media content feature vector that matches the target media content feature vector;

基于场景索引库中，与确定的候选媒体内容特征向量对应的第一媒体内容标识，确定至少一个第一候选媒体内容。At least one first candidate media content is determined based on the first media content identifier corresponding to the determined feature vector of the candidate media content in the scene index library.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤将至少一个第一候选媒体内容组成第一候选媒体内容集合：In a possible implementation, the set determination module 1002 is configured to form at least one first candidate media content into a first candidate media content set according to the following steps:

获取与第一媒体内容标识所标识的第一候选媒体内容对应的用户行为信息；obtaining user behavior information corresponding to the first candidate media content identified by the first media content identifier;

Based on the user behavior information, select, from the first candidate media content identified by the first media content identifier, the first candidate media content that meets a preset requirement, to form the first candidate media content set.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤将目标媒体内容的目标媒体内容特征向量与场景索引库中的各个候选媒体内容特征向量进行匹配：In a possible implementation, the set determination module 1002 is configured to match the target media content feature vector of the target media content with each candidate media content feature vector in the scene index library according to the following steps:

Inputting the target media content feature vector of the target media content and each candidate media content feature vector in the scene index library into a scene correlation model for processing, to obtain at least one candidate media content feature vector matching the target media content feature vector, where the scene correlation model is trained on training sample media content labeled with scene correlation matching results; or,

通过计算目标媒体内容的目标媒体内容特征向量与场景索引库中的各个候选媒体内容特征向量之间的余弦相似度，确定与目标媒体内容特征向量匹配的至少一个候选媒体内容特征向量。At least one candidate media content feature vector matching the target media content feature vector is determined by calculating the cosine similarity between the target media content feature vector of the target media content and each candidate media content feature vector in the scene index library.
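As a non-limiting illustration, the cosine similarity alternative can be sketched as follows. The similarity threshold used to decide what counts as a match is not given in the disclosure; the `threshold` parameter and its default value are assumptions, as are the function names.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); defined as 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_by_cosine(
    target: List[float],
    index: Dict[str, List[float]],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (media_id, similarity) pairs at or above threshold, best first."""
    scored = [(mid, cosine_similarity(target, vec)) for mid, vec in index.items()]
    hits = [(mid, sim) for mid, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

This sketch scans every entry in the index; for a large index library an approximate nearest-neighbor structure would normally replace the linear scan, which is outside the scope of this illustration.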

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤确定与目标媒体内容的实体意图匹配的第二候选媒体内容集合：In a possible implementation manner, the set determination module 1002 is configured to determine the second candidate media content set that matches the entity intent of the target media content according to the following steps:

确定目标媒体内容中，与实体意图对应的目标实体的目标实体特征向量；Determine the target entity feature vector of the target entity corresponding to the entity intent in the target media content;

Matching the target entity feature vector of the target entity against each candidate entity feature vector in the entity index library to determine a second candidate media content set that matches the target entity, where the second candidate media content set includes at least one second candidate media content.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤确定目标媒体内容中，与实体意图对应的目标实体的目标实体特征向量：In a possible implementation, the set determination module 1002 is configured to determine the target entity feature vector of the target entity corresponding to the entity intent in the target media content according to the following steps:

检测出目标媒体内容中的实体；detecting entities in the target media content;

对检测出的至少一个实体进行意图识别，确定出至少一个目标实体；Perform intent recognition on the detected at least one entity, and determine at least one target entity;

针对每个目标实体，基于该目标实体在目标媒体内容中对应的图像信息，生成该目标实体对应的目标实体特征向量。For each target entity, a target entity feature vector corresponding to the target entity is generated based on the image information corresponding to the target entity in the target media content.
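As a non-limiting illustration, the three steps above (detection, intent recognition, feature generation) can be sketched as follows. The detector and the entity intent model are not specified in the disclosure, so their outputs are represented by a hypothetical `Detection` record, and the row-mean feature is a toy stand-in for a real image embedding.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str
    box: Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive
    intent_score: float             # output of a hypothetical intent model


def select_target_entities(
    detections: List[Detection], min_score: float = 0.5
) -> List[Detection]:
    """Keep detected entities whose intent score reaches the assumed threshold."""
    return [d for d in detections if d.intent_score >= min_score]


def entity_feature_vector(
    frame: List[List[int]], box: Tuple[int, int, int, int]
) -> List[float]:
    """Toy feature: mean intensity of each row of the entity's image region."""
    top, left, bottom, right = box
    crop = [row[left:right] for row in frame[top:bottom]]
    return [sum(row) / len(row) for row in crop if row]
```

The point of the sketch is the data flow: only entities that pass intent recognition are cropped from the frame, and one feature vector is produced per target entity.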

一种可能的实施方式中，集合确定模块1002，用于根据以下步骤生成实体索引库：In a possible implementation, the set determination module 1002 is configured to generate an entity index library according to the following steps:

Performing intent recognition on the entities contained in preliminarily selected media content, and determining the candidate entity and the second candidate media content in which the candidate entity is located;

基于候选实体在对应的第二候选媒体内容中的图像信息，生成该候选实体对应的候选实体特征向量；generating a candidate entity feature vector corresponding to the candidate entity based on the image information of the candidate entity in the corresponding second candidate media content;

将候选实体所在的第二候选媒体内容的第二媒体内容标识，和该候选实体的候选实体特征向量对应存储在实体索引库中。The second media content identifier of the second candidate media content where the candidate entity is located and the candidate entity feature vector of the candidate entity are correspondingly stored in the entity index library.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤确定与目标实体匹配的第二候选媒体内容集合：In a possible implementation, the set determination module 1002 is configured to determine the second candidate media content set matching the target entity according to the following steps:

将目标媒体内容中的目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量进行匹配，确定与目标媒体内容匹配的至少一个候选实体特征向量；Matching the target entity feature vector of the target entity in the target media content with each candidate entity feature vector in the entity index library, and determining at least one candidate entity feature vector matching the target media content;

基于实体索引库中，与确定的候选实体特征向量对应的第二媒体内容标识，确定第二候选媒体内容集合。Based on the second media content identifier corresponding to the determined candidate entity feature vector in the entity index library, the second candidate media content set is determined.
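As a non-limiting illustration, the lookup from matched candidate entity feature vectors to the second candidate media content set can be sketched as follows. The entity index library is modeled as a list of (vector, identifier) entries, and the matching rule is passed in as a callable because the disclosure allows either a correlation model or cosine similarity; the names `lookup_second_candidates` and `is_match` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
# (candidate entity feature vector, second media content identifier)
IndexEntry = Tuple[Vector, str]


def lookup_second_candidates(
    target_vectors: List[Vector],
    entity_index: List[IndexEntry],
    is_match: Callable[[Vector, Vector], bool],
) -> List[str]:
    """Collect identifiers of media content containing any matching entity."""
    result: List[str] = []
    for target in target_vectors:
        for candidate, media_id in entity_index:
            # One video may hold several entities, so identifiers are
            # de-duplicated while preserving first-seen order.
            if is_match(target, candidate) and media_id not in result:
                result.append(media_id)
    return result
```

Because the index stores one entry per candidate entity rather than per video, the same second media content identifier can appear under several vectors, which is why the sketch removes duplicates.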

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤确定第二候选媒体内容集合：In a possible implementation manner, the set determination module 1002 is configured to determine the second candidate media content set according to the following steps:

获取与第二媒体内容标识所标识的第二候选媒体内容对应的用户行为信息；obtaining user behavior information corresponding to the second candidate media content identified by the second media content identifier;

基于用户行为信息，从第二媒体内容标识所标识的第二候选媒体内容中选取符合预设要求的第二候选媒体内容，组成第二候选媒体内容集合。Based on the user behavior information, the second candidate media content that meets the preset requirements is selected from the second candidate media content identified by the second media content identifier to form a second candidate media content set.

一种可能的实施方式中，集合确定模块1002，用于按照以下步骤将目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量进行匹配：In a possible implementation, the set determination module 1002 is configured to match the target entity feature vector of the target entity with each candidate entity feature vector in the entity index library according to the following steps:

Inputting the target entity feature vector of the target entity and each candidate entity feature vector in the entity index library into an entity correlation model for processing, to obtain at least one candidate entity feature vector matching the target entity feature vector, where the entity correlation model is trained on training sample media content labeled with entity correlation matching results; or,

通过计算目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量之间的余弦相似度，确定与目标实体特征向量匹配的至少一个候选实体特征向量。At least one candidate entity feature vector matching the target entity feature vector is determined by calculating the cosine similarity between the target entity feature vector of the target entity and each candidate entity feature vector in the entity index library.

一种可能的实施方式中，结果搜索模块1003，用于按照以下步骤向用户端发送与目标媒体内容对应的搜索结果：In a possible implementation, the result search module 1003 is configured to send the search result corresponding to the target media content to the client according to the following steps:

Generating first set identification information corresponding to the first candidate media content set, and generating second set identification information corresponding to each of at least one second candidate media content subset in the second candidate media content set, where each second candidate media content subset corresponds to one candidate entity that matches the entity intent;

Sending, to the client as the search result, the first candidate media content set together with its corresponding first set identification information, and the at least one second candidate media content subset in the second candidate media content set together with the second set identification information corresponding to each second candidate media content subset.

一种可能的实施方式中，第一集合标识信息包括第一缩略图片和/或第一文字描述信息；In a possible implementation manner, the first set identification information includes a first thumbnail image and/or first text description information;

第二集合标识信息包括第二缩略图片和/或第二文字描述信息。The second set identification information includes second thumbnail images and/or second text description information.
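As a non-limiting illustration, a search result carrying the set identification information described above could be assembled as follows. The field names, the use of the first item as the thumbnail reference, and the fixed text "Scene" are assumptions for this sketch; the disclosure only requires a thumbnail image and/or text description per set.

```python
import json
from typing import Dict, List, Optional


def identification(text: str, items: List[str]) -> Dict[str, Optional[str]]:
    """Set identification information: text plus an optional thumbnail."""
    # Using the first item as the thumbnail reference is an assumption.
    return {"text": text, "thumbnail": items[0] if items else None}


def build_search_result(
    scene_set: List[str],
    entity_subsets: Dict[str, List[str]],
) -> dict:
    """One first set, plus one second subset per matched candidate entity."""
    return {
        "first_set": {
            "identification": identification("Scene", scene_set),
            "items": scene_set,
        },
        "second_subsets": [
            {"identification": identification(entity, items), "items": items}
            for entity, items in entity_subsets.items()
        ],
    }


def serialize(result: dict) -> str:
    # The wire format is not specified; JSON is used here for illustration.
    return json.dumps(result, ensure_ascii=False)
```

Keeping each second subset as its own entry lets the client render one identifier per target entity, which matches the multi-entity case the description calls out.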

FIG. 11 is a schematic diagram of another apparatus for searching media content according to Embodiment 3 of the present disclosure. The apparatus includes an instruction sending module 1101, a result receiving module 1102, and a page display module 1103, wherein:

指令发送模块1101，用于向服务器发送针对目标媒体内容的搜索指令；an instruction sending module 1101, configured to send a search instruction for the target media content to the server;

The result receiving module 1102 is configured to receive the search result fed back by the server, where the search result includes a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content;

页面显示模块1103，用于基于搜索结果，显示搜索结果展示页面。The page display module 1103 is configured to display a search result display page based on the search result.

在一种可能的实施方式中，指令发送模块1101，用于按照如下步骤向服务器发送针对目标媒体内容的搜索指令：In a possible implementation, the instruction sending module 1101 is configured to send a search instruction for the target media content to the server according to the following steps:

响应针对目标媒体内容画面上的搜索按钮的触发操作，向服务器发送针对目标媒体内容的搜索指令；或者，In response to the trigger operation of the search button on the target media content screen, send a search instruction for the target media content to the server; or,

In a possible implementation, the search result further includes first set identification information corresponding to the first candidate media content set, and second set identification information corresponding to each of at least one second candidate media content subset in the second candidate media content set, where each second candidate media content subset corresponds to one candidate entity that matches the entity intent;

页面显示模块1103，用于按照以下步骤显示搜索结果展示页面：The page display module 1103 is used to display the search result display page according to the following steps:

基于搜索结果，显示包含第一集合标识信息和第二集合标识信息的搜索结果展示页面；Based on the search result, displaying a search result display page containing the first set of identification information and the second set of identification information;

内容展示模块1104，用于显示搜索结果展示页面之后，响应针对任一集合标识信息的触发操作，展示与该任一集合标识信息对应的候选媒体内容；其中，任一集合标识信息为第一集合标识信息或任一第二集合标识信息。The content display module 1104 is configured to display the candidate media content corresponding to any set identification information in response to a trigger operation for any set identification information after displaying the search result display page; wherein, any set identification information is the first set identification information or any second set of identification information.

在一种实施方式中，页面显示模块1103，用于按照以下步骤显示包含第一集合标识信息和至少一个第二集合标识信息的搜索结果展示页面：In one embodiment, the page display module 1103 is configured to display a search result display page containing the first set of identification information and at least one second set of identification information according to the following steps:

基于搜索结果，显示包含第一搜索结果展示区域和第二搜索结果展示区域的搜索结果展示页面；Based on the search result, displaying a search result display page including the first search result display area and the second search result display area;

The first search result display area contains the first set identification information and the second set identification information; the second search result display area contains a media content list, and the media content list contains the media content in the first candidate media content set and the second candidate media content set.

在一种实施方式中，上述装置还包括：In one embodiment, the above device further includes:

The content switching module 1105 is configured to, after the candidate media content corresponding to any piece of set identification information has been displayed in response to a trigger operation on that set identification information, switch to displaying the other candidate media content corresponding to other set identification information in response to a sliding trigger operation.
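As a non-limiting illustration, the behavior of triggering a set identifier and then switching by sliding can be sketched as a small state holder. The class name `ResultTabs`, the direction convention (sliding left moves to the next identifier), and clamping at the ends instead of wrapping around are all assumptions.

```python
from typing import Dict, List


class ResultTabs:
    """Tracks which set identifier is shown and what content it maps to."""

    def __init__(self, tabs: Dict[str, List[str]]) -> None:
        if not tabs:
            raise ValueError("at least one set identifier is required")
        self._labels = list(tabs)
        self._tabs = tabs
        self._current = 0

    def select(self, label: str) -> List[str]:
        """Trigger operation on one piece of set identification information."""
        self._current = self._labels.index(label)
        return self._tabs[label]

    def swipe(self, direction: str) -> List[str]:
        """Sliding trigger operation; 'left' shows the next set, 'right' the previous."""
        step = {"left": 1, "right": -1}[direction]
        last = len(self._labels) - 1
        # Clamp at both ends rather than wrapping (an assumption).
        self._current = max(0, min(last, self._current + step))
        return self._tabs[self._labels[self._current]]
```

With three identifiers as in FIG. 2(b), two left slides from the first identifier reach the third, and a further left slide leaves the display unchanged.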

关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明，这里不再详述。For the description of the processing flow of each module in the apparatus and the interaction flow between the modules, reference may be made to the relevant descriptions in the foregoing method embodiments, which will not be described in detail here.

Embodiment 4

An embodiment of the present disclosure further provides a computer device, which may be a server or a client. When the computer device is a server, FIG. 12 shows a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, which includes a processor 1201, a memory 1202, and a bus 1203. The memory 1202 stores machine-readable instructions executable by the processor 1201 (such as the instructions corresponding to the instruction receiving module 1001, the set determination module 1002, and the result search module 1003 in the apparatus shown in FIG. 10). When the computer device runs, the processor 1201 communicates with the memory 1202 through the bus 1203, and the machine-readable instructions, when executed by the processor 1201, perform the following processing:

基于搜索指令，确定与目标媒体内容的场景意图匹配的第一候选媒体内容集合，以及与目标媒体内容的实体意图匹配的第二候选媒体内容集合；determining, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content;

基于第一候选媒体内容集合，以及第二候选媒体内容集合，向用户端发送与目标媒体内容对应的搜索结果。Based on the first candidate media content set and the second candidate media content set, a search result corresponding to the target media content is sent to the user terminal.

一种可能的实施方式中，上述处理器1201执行的指令中，确定与目标媒体内容的场景意图匹配的第一候选媒体内容集合，包括：In a possible implementation manner, in the instructions executed by the processor 1201, the first candidate media content set that matches the scene intent of the target media content is determined, including:

一种可能的实施方式中，上述处理器1201执行的指令中，根据以下步骤生成场景索引库：In a possible implementation manner, in the instructions executed by the processor 1201, the scene index library is generated according to the following steps:

In a possible implementation, in the instructions executed by the processor 1201, determining at least one first candidate media content that matches the scene intent of the target media content by matching the target media content feature vector against each candidate media content feature vector in the scene index library includes:

一种可能的实施方式中，上述处理器1201执行的指令中，将至少一个第一候选媒体内容组成第一候选媒体内容集合，包括：In a possible implementation manner, in the instructions executed by the processor 1201, at least one first candidate media content is formed into a first candidate media content set, including:

一种可能的实施方式中，上述处理器1201执行的指令中，将目标媒体内容的目标媒体内容特征向量与场景索引库中的各个候选媒体内容特征向量进行匹配，包括：In a possible implementation, in the instructions executed by the processor 1201, the target media content feature vector of the target media content is matched with each candidate media content feature vector in the scene index library, including:

一种可能的实施方式中，上述处理器1201执行的指令中，确定与目标媒体内容的实体意图匹配的第二候选媒体内容集合，包括：In a possible implementation manner, in the instructions executed by the processor 1201, the second candidate media content set that matches the entity intent of the target media content is determined, including:

一种可能的实施方式中，上述处理器1201执行的指令中，确定目标媒体内容中，与实体意图对应的目标实体的目标实体特征向量，包括：In a possible implementation manner, in the instructions executed by the processor 1201, the target entity feature vector of the target entity corresponding to the entity intent in the target media content is determined, including:

一种可能的实施方式中，上述处理器1201执行的指令中，根据以下步骤生成实体索引库：In a possible implementation manner, in the instructions executed by the processor 1201, an entity index library is generated according to the following steps:

In a possible implementation, in the instructions executed by the processor 1201, matching the target entity feature vector of the target entity against each candidate entity feature vector in the entity index library to determine the second candidate media content set that matches the target entity includes:

一种可能的实施方式中，上述处理器1201执行的指令中，基于实体索引库中，与确定的候选实体特征向量对应的第二媒体内容标识，确定第二候选媒体内容集合，包括：In a possible implementation, in the instruction executed by the processor 1201, based on the second media content identifier corresponding to the determined candidate entity feature vector in the entity index library, the second candidate media content set is determined, including:

一种可能的实施方式中，上述处理器1201执行的指令中，将目标实体的目标实体特征向量与实体索引库中的各个候选实体特征向量进行匹配，包括：In a possible implementation manner, in the instructions executed by the processor 1201, the target entity feature vector of the target entity is matched with each candidate entity feature vector in the entity index library, including:

一种可能的实施方式中，上述处理器1201执行的指令中，基于第一候选媒体内容集合，以及第二候选媒体内容集合，向用户端发送与目标媒体内容对应的搜索结果，包括：In a possible implementation, in the instructions executed by the processor 1201, based on the first candidate media content set and the second candidate media content set, the search results corresponding to the target media content are sent to the client, including:

将第一候选媒体内容集合及该第一候选媒体内容集合对应的第一集合标识信息，以及第二候选媒体内容集合中至少一个第二候选媒体内容子集及每个第二候选媒体内容子集分别对应的第二集合标识信息作为搜索结果发送给用户端。The first candidate media content set and the first set identification information corresponding to the first candidate media content set, and at least one second candidate media content subset and each second candidate media content subset in the second candidate media content set The corresponding second set identification information is sent to the user terminal as a search result.

When the computer device is a client, FIG. 13 shows a schematic structural diagram of the computer device provided by the embodiment of the present disclosure, which includes a processor 1301, a memory 1302, and a bus 1303. The memory 1302 stores machine-readable instructions executable by the processor 1301 (such as the instructions executed by the instruction sending module 1101, the result receiving module 1102, and the page display module 1103 in the apparatus shown in FIG. 11). When the computer device runs, the processor 1301 communicates with the memory 1302 through the bus 1303, and the machine-readable instructions, when executed by the processor 1301, perform the following processing:

接收服务器反馈的搜索结果；搜索结果中包含与目标媒体内容的场景意图匹配的第一候选媒体内容集合，和/或与目标媒体内容的实体意图匹配的第二候选媒体内容集合；Receive a search result fed back by the server; the search result includes a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content;

基于搜索结果，显示搜索结果展示页面。Based on the search results, display the search results display page.

一种可能的实施方式中，上述处理器1301执行的指令中，向服务器发送针对目标媒体内容的搜索指令，包括：In a possible implementation manner, in the instructions executed by the processor 1301, a search instruction for the target media content is sent to the server, including:

In a possible implementation, the search result further includes first set identification information corresponding to the first candidate media content set, and second set identification information corresponding to each of at least one second candidate media content subset in the second candidate media content set, where each second candidate media content subset corresponds to one candidate entity that matches the entity intent;

上述处理器1301执行的指令中，基于搜索结果，显示搜索结果展示页面，包括：In the instructions executed by the above-mentioned processor 1301, based on the search results, a search result display page is displayed, including:

显示搜索结果展示页面之后，上述处理器1301执行的指令还包括：After displaying the search result presentation page, the instructions executed by the processor 1301 further include:

In response to a trigger operation on any piece of set identification information, displaying the candidate media content corresponding to that set identification information, where the set identification information is the first set identification information or any piece of second set identification information.

In a possible implementation, in the instructions executed by the processor 1301, displaying, based on the search result, a search result display page containing the first set identification information and at least one second set identification information includes:

In a possible implementation, the instructions executed by the processor 1301 further include:

An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the method for media content search in method embodiment one above, or the steps of the method for media content search in method embodiment two above. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The computer program product of the method for media content search provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code. The instructions included in the program code can be used to perform the steps of the method for media content search in the above method embodiments. For details, refer to the above method embodiments, which are not repeated here.

An embodiment of the present disclosure further provides a computer program that, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a processor-executable non-volatile computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are only specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of the technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure, and shall all be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method for media content search, wherein the method comprises:

Receive a search instruction for target media content;

Based on the search instruction, determine a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content. The scene intent is related to the feature information under various preset dimensions;

Based on the first candidate media content set and the second candidate media content set, a search result corresponding to the target media content is sent to the user terminal.

2. The method according to claim 1, wherein the determining the first candidate media content set matching the scene intent of the target media content comprises:

determining the target media content feature vector corresponding to the target media content based on the feature information of the target media content in multiple preset dimensions;

determining, by matching the target media content feature vector with each candidate media content feature vector in the scene index library, at least one first candidate media content that matches the scene intent of the target media content, and forming the at least one first candidate media content into the first candidate media content set.

3. The method according to claim 2, wherein the media content is a video, and the multiple preset dimensions include more than one of the following dimensions:

Visual dimension, textual information dimension, musical dimension.
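As an illustration of claims 2 and 3, the following is a minimal sketch of how a target feature vector might be built from per-dimension features and matched against a scene index library. The fusion by concatenation, the cosine similarity measure, the `threshold` value, and the names `build_feature_vector` and `match_scene_index` are all assumptions made for this example; the claims do not fix any of them.

```python
import math


def build_feature_vector(visual, text, music):
    """Fuse per-dimension features by concatenation (an assumed fusion strategy)."""
    return list(visual) + list(text) + list(music)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_scene_index(target_vector, scene_index, threshold=0.8):
    """Return media identifiers whose vectors score at least `threshold`, best first.

    `scene_index` is a hypothetical in-memory mapping: media id -> feature vector.
    """
    scored = [
        (cosine_similarity(target_vector, vector), media_id)
        for media_id, vector in scene_index.items()
    ]
    return [media_id for score, media_id in sorted(scored, reverse=True) if score >= threshold]


# Toy data: three indexed videos with 4-dimensional fused vectors.
scene_index = {"v1": [1, 0, 0, 1], "v2": [0, 1, 1, 0], "v3": [1, 0, 0, 0.9]}
target = build_feature_vector(visual=[1, 0], text=[0], music=[1])
first_candidates = match_scene_index(target, scene_index)
```

A production system would more likely use an approximate nearest-neighbour index than a linear scan; the linear scan is used here only to keep the matching step visible.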

4. The method according to claim 2, wherein the scene index library is generated according to the following steps:

performing scene intent recognition on each primary selection media content, and determining, among the primary selection media content, the first candidate media content that has a scene intent;

extracting feature information of the first candidate media content under multiple preset dimensions;

generating a candidate media content feature vector of the first candidate media content based on feature information of the first candidate media content in multiple preset dimensions;

The first media content identifier of the first candidate media content and the candidate media content feature vector of the first candidate media content are stored in the scene index library correspondingly.
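The index generation steps of claim 4 can be sketched as follows. This is only an outline under stated assumptions: the scene intent recognizer and the feature extractor are passed in as hypothetical callables (`has_scene_intent`, `extract_features`), and the fused vector is again a plain concatenation in a fixed dimension order.

```python
def build_scene_index(primary_media, has_scene_intent, extract_features):
    """Build a mapping of media identifier -> candidate feature vector.

    primary_media: iterable of dicts, each with an "id" key (assumed shape).
    has_scene_intent: callable(media) -> bool, stands in for scene intent recognition.
    extract_features: callable(media) -> dict of dimension name -> list of floats.
    """
    index = {}
    for media in primary_media:
        # Keep only primary selection media content that has a scene intent.
        if not has_scene_intent(media):
            continue
        per_dimension = extract_features(media)
        # Concatenate in sorted dimension order so all vectors share one layout.
        vector = [value for name in sorted(per_dimension) for value in per_dimension[name]]
        # Store the identifier and the feature vector in correspondence.
        index[media["id"]] = vector
    return index


# Toy data with precomputed per-dimension features.
media_pool = [
    {"id": "a", "scene": True, "visual": [1.0], "text": [2.0], "music": [3.0]},
    {"id": "b", "scene": False, "visual": [0.0], "text": [0.0], "music": [0.0]},
]
scene_index = build_scene_index(
    media_pool,
    has_scene_intent=lambda m: m["scene"],
    extract_features=lambda m: {"visual": m["visual"], "text": m["text"], "music": m["music"]},
)
```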

5. The method according to claim 4, wherein the determining, by matching the target media content feature vector with each candidate media content feature vector in the scene index library, at least one first candidate media content that matches the scene intent of the target media content comprises:

Matching the target media content feature vector of the target media content with each candidate media content feature vector in the scene index library, and determining at least one candidate media content feature vector that matches the target media content feature vector;

The at least one first candidate media content is determined based on the first media content identifier corresponding to the determined feature vector of the candidate media content in the scene index library.

6. The method according to claim 5, wherein the forming the at least one first candidate media content into the first candidate media content set comprises:

acquiring user behavior information corresponding to the first candidate media content identified by the first media content identifier;

Based on the user behavior information, first candidate media content that meets a preset requirement is selected from the first candidate media content identified by the first media content identifier to form the first candidate media content set.
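The behavior-based selection in claim 6 might look like the sketch below. The claim only speaks of "user behavior information" and "a preset requirement"; the concrete fields (`likes`, `completion_rate`) and the thresholds used here are hypothetical choices for illustration.

```python
def filter_by_user_behavior(candidate_ids, behavior_by_id, min_likes=100, min_completion_rate=0.3):
    """Keep candidates whose user behavior statistics meet the assumed preset requirement.

    behavior_by_id: dict of media id -> {"likes": int, "completion_rate": float}.
    Candidates without recorded behavior are dropped (an assumption).
    """
    selected = []
    for media_id in candidate_ids:
        stats = behavior_by_id.get(media_id)
        if stats is None:
            continue
        if stats["likes"] >= min_likes and stats["completion_rate"] >= min_completion_rate:
            selected.append(media_id)
    return selected


behavior = {
    "v1": {"likes": 250, "completion_rate": 0.6},
    "v3": {"likes": 40, "completion_rate": 0.9},
}
first_candidate_set = filter_by_user_behavior(["v1", "v3", "v9"], behavior)
```

The same filter could be reused for the second candidate media content of claim 11, since that claim applies the same kind of requirement to the entity-side candidates.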

7. The method according to claim 1, wherein determining the second candidate media content set matching the entity intent of the target media content comprises:

Determine the target entity feature vector of the target entity corresponding to the entity intent in the target media content;

matching the target entity feature vector of the target entity with each candidate entity feature vector in the entity index library, and determining a second candidate media content set that matches the target entity, wherein the second candidate media content set includes at least one second candidate media content.

8. The method according to claim 7, wherein the determining the target entity feature vector of the target entity corresponding to the entity intent in the target media content comprises:

detecting an entity in the target media content;

performing intent recognition on the at least one detected entity, and determining at least one target entity;

For each target entity, the target entity feature vector corresponding to the target entity is generated based on the image information corresponding to the target entity in the target media content.

9. The method according to claim 7, wherein the entity index library is generated according to the following steps:

determining the entities included in each primary selection media content;

Perform intent identification on entities included in the primary selection media content, and determine a candidate entity and a second candidate media content where the candidate entity is located;

generating the candidate entity feature vector corresponding to the candidate entity based on the image information of the candidate entity in the corresponding second candidate media content;

The second media content identifier of the second candidate media content where the candidate entity is located and the candidate entity feature vector of the candidate entity are correspondingly stored in the entity index library.

10. The method according to claim 9, wherein the matching the target entity feature vector of the target entity with each candidate entity feature vector in the entity index library, and determining a second candidate media content set that matches the target entity comprises:

Matching the target entity feature vector of the target entity in the target media content with each candidate entity feature vector in the entity index library, and determining at least one candidate entity feature vector that matches the target media content;

The second candidate media content set is determined based on the second media content identifier corresponding to the determined candidate entity feature vector in the entity index library.
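The entity-side lookup of claims 7 to 10 can be sketched in the same spirit. Here the entity index library is modelled as a list of `(candidate_entity, vector, media_id)` entries, and matches are grouped per candidate entity so that each group corresponds to one second candidate media content subset. The entry layout, the cosine measure, and the `threshold` are assumptions for this example.

```python
import math
from collections import defaultdict


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_entities(target_entity_vectors, entity_index, threshold=0.85):
    """Group matching media identifiers by candidate entity.

    entity_index: list of (candidate_entity, feature_vector, media_id) tuples,
    a hypothetical in-memory stand-in for the entity index library.
    """
    subsets = defaultdict(list)
    for target in target_entity_vectors:
        for entity, vector, media_id in entity_index:
            if _cosine(target, vector) >= threshold and media_id not in subsets[entity]:
                subsets[entity].append(media_id)
    return dict(subsets)


entity_index = [
    ("handbag", [1.0, 0.0], "m1"),
    ("handbag", [0.9, 0.1], "m2"),
    ("dog", [0.0, 1.0], "m3"),
]
second_subsets = match_entities([[1.0, 0.0]], entity_index)
```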

11. The method according to claim 10, wherein the determining the second candidate media content set based on the second media content identifier corresponding to the determined candidate entity feature vector in the entity index library comprises:

acquiring user behavior information corresponding to the second candidate media content identified by the second media content identifier;

Based on the user behavior information, a second candidate media content that meets a preset requirement is selected from the second candidate media content identified by the second media content identifier to form the second candidate media content set.

12. The method according to claim 1, wherein the sending a search result corresponding to the target media content to the user terminal based on the first candidate media content set and the second candidate media content set comprises:

generating first set identification information corresponding to the first candidate media content set, and generating second set identification information respectively corresponding to at least one second candidate media content subset in the second candidate media content set; wherein each The second candidate media content subset corresponds to a candidate entity that matches the entity intent;

sending, to the user terminal as the search result, the first candidate media content set together with its corresponding first set identification information, and the at least one second candidate media content subset in the second candidate media content set together with the second set identification information respectively corresponding to each second candidate media content subset.

13. The method according to claim 12, wherein the first set identification information comprises a first thumbnail image and/or first text description information;

The second set identification information includes second thumbnail images and/or second text description information.
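To make the shape of the search result in claims 12 and 13 concrete, the sketch below assembles a payload in which each set carries identification information consisting of a thumbnail and/or a text description. The dictionary layout, the key names, and the default label `"Similar scenes"` are hypothetical; the claims do not define a wire format.

```python
def make_set_identification(text=None, thumbnail=None):
    """Build set identification information from a thumbnail and/or a text description."""
    if text is None and thumbnail is None:
        raise ValueError("set identification needs a thumbnail and/or a text description")
    info = {}
    if thumbnail is not None:
        info["thumbnail"] = thumbnail
    if text is not None:
        info["text"] = text
    return info


def build_search_result(scene_set, entity_subsets, scene_label="Similar scenes"):
    """Assemble the search result sent to the user terminal.

    scene_set: list of media ids in the first candidate media content set.
    entity_subsets: dict of candidate entity name -> list of media ids,
    one entry per second candidate media content subset.
    """
    return {
        "first_set": {
            "identification": make_set_identification(text=scene_label),
            "media_ids": list(scene_set),
        },
        "second_sets": [
            {"identification": make_set_identification(text=entity), "media_ids": list(ids)}
            for entity, ids in entity_subsets.items()
        ],
    }


search_result = build_search_result(["v1", "v3"], {"handbag": ["m1", "m2"]})
```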

14. A method for media content search, wherein the method comprises:

Send a search instruction for the target media content to the server;

receiving a search result fed back by the server, wherein the search result includes a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content, and the scene intent of the target media content is related to feature information under multiple preset dimensions;

displaying a search result display page based on the search result.

15. The method according to claim 14, wherein the sending a search instruction for the target media content to the server comprises:

In response to the trigger operation of the search button on the target media content screen, send a search instruction for the target media content to the server; or,

In response to the trigger operation of the frame selection button acting on the target media content screen, a search instruction for the frame-selected media content is sent to the server.

16. The method according to claim 14, wherein the search result further comprises first set identification information corresponding to the first candidate media content set, and second set identification information respectively corresponding to at least one second candidate media content subset in the second candidate media content set; wherein each second candidate media content subset corresponds to one candidate entity that matches the entity intent;

The displaying a search result display page based on the search result, including:

Based on the search result, displaying a search result display page containing the first set identification information and the second set identification information;

After the displaying the search result display page, the method further includes:

in response to a trigger operation on any set identification information, displaying the candidate media content corresponding to that set identification information; wherein that set identification information is the first set identification information or any of the second set identification information.

17. The method according to claim 16, wherein, based on the search result, displaying a search result display page containing the first set of identification information and at least one second set of identification information, comprising:

Based on the search result, displaying the search result display page including the first search result display area and the second search result display area;

wherein the first search result display area includes the first set identification information and the second set identification information; the second search result display area includes a media content list, and the media content list includes each media content in the first candidate media content set and the second candidate media content set.

18. The method of claim 16, wherein the method further comprises:

After displaying candidate media content corresponding to any set identification information in response to a trigger operation for any set identification information, in response to a sliding trigger operation, switching to display other candidate media content corresponding to other set identification information.
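The client-side behavior in claims 16 to 18 (trigger a set identifier to show its candidates, then slide to switch to another set) can be modelled as a small piece of page state. The class `SearchResultPage`, its method names, and the choice to clamp at the first and last set instead of wrapping around are assumptions made for this sketch, not details taken from the claims.

```python
class SearchResultPage:
    """Minimal state model of a search result display page."""

    def __init__(self, sets):
        # sets: ordered list of (set_identification_label, media_ids) pairs.
        self._labels = [label for label, _ in sets]
        self._media = {label: list(ids) for label, ids in sets}
        self._active = None  # index of the set currently displayed, if any

    def trigger(self, label):
        """Handle a trigger operation on one set identifier and return its candidates."""
        if label not in self._media:
            raise KeyError(label)
        self._active = self._labels.index(label)
        return self._media[label]

    def slide(self, step=1):
        """Handle a sliding operation: move to a neighbouring set, clamped at the ends."""
        if self._active is None:
            raise RuntimeError("no set is being displayed yet")
        self._active = max(0, min(len(self._labels) - 1, self._active + step))
        return self._media[self._labels[self._active]]


page = SearchResultPage([("scene", ["v1"]), ("handbag", ["m1", "m2"]), ("dog", ["m3"])])
shown = page.trigger("handbag")
after_slide = page.slide()
```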

19. An apparatus for searching media content, wherein the apparatus comprises:

an instruction receiving module for receiving a search instruction for the target media content;

a set determination module, configured to determine, based on the search instruction, a first candidate media content set that matches the scene intent of the target media content, and a second candidate media content set that matches the entity intent of the target media content, The scene intent of the target media content is related to feature information under various preset dimensions;

A result search module, configured to send a search result corresponding to the target media content to the user terminal based on the first candidate media content set and the second candidate media content set.

20. An apparatus for searching media content, wherein the apparatus comprises:

an instruction sending module, configured to send a search instruction for the target media content to the server;

a result receiving module, configured to receive a search result fed back by the server, wherein the search result includes a first candidate media content set that matches the scene intent of the target media content, and/or a second candidate media content set that matches the entity intent of the target media content, and the scene intent of the target media content is related to feature information under multiple preset dimensions;

A page display module, configured to display a search result display page based on the search result.

21. A computer device, comprising: a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the computer device runs, the processor and the memory communicate with each other through the bus; and when the machine-readable instructions are executed by the processor, the steps of the method for media content search according to any one of claims 1 to 18 are performed.

22. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the method for media content search according to any one of claims 1 to 18 are performed.