CN112115299B

CN112115299B - Video search method, device, recommendation method, electronic device and storage medium

Info

Publication number: CN112115299B
Application number: CN202010979533.4A
Authority: CN
Inventors: 冯博豪; 庞敏辉; 谢国斌
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-09-17
Filing date: 2020-09-17
Publication date: 2024-08-13
Anticipated expiration: 2040-09-17
Also published as: CN112115299A

Abstract

The embodiments of the present application disclose a video search method, device, recommendation method, electronic device and storage medium, relating to computer vision and video analysis technology. The method includes: obtaining a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information; if the search information includes text information, determining a first similarity between the text information and preset label information of each video, wherein the label information includes a text label and a video frame label; and selecting and outputting a target video from the videos according to the first similarity. Combining the video frame label describes each video in more detail and more accurately, which improves the accuracy of the similarity calculation and therefore the accuracy and reliability of video search, and improves the user's search experience.

Description

Video search method, device, recommendation method, electronic device and storage medium

Technical Field

The present application relates to computer vision, image technology and video analysis technology within artificial intelligence and computing technology, and in particular to a video search method, device, recommendation method, electronic device and storage medium.

Background Art

With the development of the Internet, video applications have become increasingly popular; short video applications in particular bring speed and convenience to users. However, as the number of videos grows, how to improve search accuracy has become an urgent problem.

In the prior art, when a user uploads a video to the server of a video application, the user, or the staff of the video application, can annotate the uploaded video to generate text tags corresponding to it. When the server receives a search request from a user, it searches the large collection of videos based on these text tags and returns matching videos to the user.

Summary of the Invention

Provided are a video search method, device, video recommendation method, electronic device, and storage medium for improving the accuracy of video search.

According to a first aspect, a video search method is provided, including: obtaining a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information;

if the search information includes text information, determining a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag; and

selecting the target video from the videos according to the first similarity, and outputting it.

In this embodiment, calculating the similarity between the text information and tag information that includes both text tags and video frame tags improves the accuracy and reliability of the search results.

According to a second aspect, an embodiment of the present application provides a video search device, including:

an acquisition module, configured to acquire a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information;

a first determination module, configured to determine, if the search information includes text information, a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag;

a selection module, configured to select the target video from the videos according to the first similarity; and

an output module, configured to output the target video.

According to a third aspect, an embodiment of the present application provides an electronic device, including:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described in any of the above embodiments.

According to a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method described in any of the above embodiments.

According to a fifth aspect, an embodiment of the present application provides a video recommendation method, including:

obtaining the user's video access history;

determining text information corresponding to the history;

determining a third similarity between the text information corresponding to the history and the preset tag information of each video, wherein the tag information includes a text tag and a video frame tag; and

selecting videos from the videos according to the third similarity and recommending them to the user.
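The recommendation flow of the fifth aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the history record schema (`video_id`, `tags`), building the history text by concatenating tag text, using Jaccard token overlap as the third similarity, and excluding already watched videos are all hypothetical choices.

```python
import re


def tokens(text: str) -> set[str]:
    # Lower-case word tokens; assumes whitespace-delimited text.
    return set(re.findall(r"\w+", text.lower()))


def history_text(history: list[dict]) -> str:
    # Hypothetical schema: each record holds the tag text of one watched video.
    return " ".join(record["tags"] for record in history)


def third_similarity(profile: str, tag_text: str) -> float:
    # Jaccard overlap, used here only as a stand-in similarity measure.
    a, b = tokens(profile), tokens(tag_text)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(history: list[dict], catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank unseen catalog videos by third similarity to the user's history."""
    seen = {record["video_id"] for record in history}
    profile = history_text(history)
    scored = [(third_similarity(profile, tags), vid)
              for vid, tags in catalog.items() if vid not in seen]
    scored.sort(key=lambda sv: (-sv[0], sv[1]))  # highest score first, ties by ID
    return [vid for _, vid in scored[:k]]


history = [{"video_id": "a", "tags": "cooking noodles"},
           {"video_id": "b", "tags": "cooking dumplings kitchen"}]
catalog = {"a": "cooking noodles", "c": "kitchen cooking tips",
           "d": "football match", "e": "noodles street food"}
print(recommend(history, catalog))  # ['c', 'e']
```

In this toy catalog, "c" shares two of five distinct tokens with the history (0.4) and "e" shares one of six, so both outrank the unrelated "d".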

According to a sixth aspect of the present application, a computer program product is provided, including a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium, and executing the computer program causes the electronic device to perform the method described in the first aspect.

The present application provides a video search method, device, recommendation method, electronic device and storage medium. The method includes: obtaining a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information; if the search information includes text information, determining a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag; and selecting and outputting a target video from the videos according to the first similarity. In this embodiment, the tag information covers two dimensions: text tags and video frame tags. Compared with the related-art solution that determines the target video based on text tags alone, combining video frame tags describes each video in more detail and more accurately, which improves the accuracy of the similarity calculation and, in turn, the accuracy and reliability of video search. In particular, when a video's text tags differ significantly from its content, searching with tag information that includes video frame tags can still accurately identify the target video matching the user's search intent, improving both search accuracy and the user's search experience.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily understood from the following description.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:

FIG. 1 is a schematic diagram of an application scenario of a video search method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a video search interface according to an embodiment of the present application;

FIG. 3 is a flow chart of a video search method according to an embodiment of the present application;

FIG. 4 is a flow chart of a video search method according to another embodiment of the present application;

FIG. 5 is a schematic diagram of a video frame image according to an embodiment of the present application;

FIG. 6 is a flow chart of a video search method according to another embodiment of the present application;

FIG. 7 is a schematic diagram of a video search device according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a video search device according to another embodiment of the present application;

FIG. 9 is a block diagram of an electronic device according to an embodiment of the present application;

FIG. 10 is a flow chart of a video recommendation method according to an embodiment of the present application.

DETAILED DESCRIPTION

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.

Please refer to FIG. 1, which is a schematic diagram of an application scenario of a video search method according to an embodiment of the present application.

In the application scenario shown in FIG. 1, an application with a video playback function, such as Douyin, Kuaishou or Weishi, is installed on the user device 100. Through this application, the user sends a search request to the server 200; the search request is used to request a search for a target video corresponding to the search information.

A schematic diagram of the search interface of the user device 100 is shown in FIG. 2.

As shown in FIG. 2, the user can enter, in the input box (labeled "Enter search content" in FIG. 2), information about the video the user wants to find (which may be called the target video). This information characterizes the user's search intent, for example the time at which the target video was uploaded to the server 200, the uploader of the target video, or the content of the target video.

When the user clicks "Search" as shown in FIG. 2, the user device 100 is triggered to send the search request to the server 200.

After receiving the search request, the server 200 may obtain the target video corresponding to the search information from the database based on the search request and send it to the user device 100, which then displays the target video.

The user device 100 is a device on which various applications can be installed and which can display objects provided by the installed applications. It may be mobile or fixed, for example, a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, a personal digital assistant (PDA), a point of sale (POS) terminal, a device capable of recommending short videos, or any other electronic device capable of the above functions.

The server 200 may be any device capable of providing Internet services.

The user device 100 and the server 200 communicate over a network, which may be a local area network, a wide area network, or the like.

It should be noted that the above scenario merely illustrates an application scenario to which the video search method of this embodiment can be applied, and should not be construed as limiting the application scenarios.

As the above scenario shows, the user device can generate a search request carrying search information based on the information about the target video entered by the user, and transmit the search request to the server over the communication network. In the related art, the search information is generally text information, such as the text the user enters in the input box in FIG. 2; the server matches this text information against preset text tags and returns target videos according to the matching result.

However, the text information may deviate from the actual content of a video, which lowers the accuracy of the matching results and degrades the user's search experience.

Through creative work, the inventors of the present application arrived at the following inventive concept: match the text information against tag information that includes both text tags and video frame tags, and determine the target video based on the matching result.

The technical solution of the present application, and how it solves the above technical problem, are described in detail below with specific embodiments. These embodiments can be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments. The embodiments are described below with reference to the accompanying drawings.

According to one aspect, an embodiment of the present application provides a video search method, applied in artificial intelligence and computer technology, specifically computer vision, image technology and video analysis technology, to improve the reliability and accuracy of video search.

Please refer to FIG. 3, which is a flow chart of a video search method according to an embodiment of the present application.

As shown in FIG. 3, the method includes:

S101: Obtain a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information.

The executor of this embodiment may be a video search device, which may specifically be a server (a cloud server or a local server). For a description of the server, refer to the above example; details are not repeated here.

In the application scenario shown in FIG. 1, the search request may be generated by the user device based on the search information entered by the user (i.e., the user's search intent).

S102: If the search information includes text information, determine a first similarity between the text information and the preset tag information of each video, wherein the tag information includes a text tag and a video frame tag.

Text information can be understood as information related to text or speech. For example, it may be the word "food" entered by the user in the input box shown in FIG. 2; alternatively, the user device may provide a voice component through which the user inputs speech, in which case the text information may be the text generated by the user device by parsing that speech.

Tag information can be understood as descriptive information about each video; that is, the content of each video can be determined from its tag information. For example, if the tag information is "food", the corresponding video should be food-related, such as a cooking video.

The word "first" in "first similarity" only distinguishes it from the second similarity described later, and does not limit the content of the similarity.

In this embodiment, the first similarity can be understood as the degree of similarity between the text information and the tag information.

It should be noted that, in this embodiment, the tag information may include content in two dimensions: text tags and video frame tags.

Text tags are described as follows:

In one possible implementation, when uploading a video to the server, the user can annotate the video with its name, a summary of its content, and its type (such as story, comedy or music). The server can then generate the text tag of the video from these annotations.

In another possible implementation, the server can provide some preset tags (such as video types). When uploading a video, the user can select some of these preset tags and also annotate the video; the server then generates the text tag from both the selected preset tags and the user's own annotations.

In other words, a text tag is a tag that the server generates to describe information related to a video, based on the user's textual description of the uploaded video and/or the server's preset textual descriptions selected by the user.

Video frame tags are described as follows:

A video consists of multiple frames of images, and a video frame tag characterizes those frames; that is, a video frame tag can be understood as a tag, determined from the multiple frames of the video, that describes the video.

It should be noted that, because the tag information in this embodiment includes both text tags and video frame tags, it is richer and describes each video in more detail and more clearly. When the similarity calculation is performed on this tag information to determine the target video, the accuracy and reliability of the calculation improve, and so does the precision of video search.
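To make the two-dimensional tag information concrete, the sketch below scores a query against a video's text tags and its video frame tags separately and blends the two scores. The application does not specify the similarity measure, so bag-of-words cosine similarity, the `TagInfo` container, and the equal 0.5 weighting are assumptions for illustration; a production system would more likely use semantic text embeddings and, for Chinese queries, a word segmenter.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class TagInfo:
    """Hypothetical container for one video's two-dimensional tag information."""
    text_tags: list[str] = field(default_factory=list)   # from uploader annotations
    frame_tags: list[str] = field(default_factory=list)  # from key-frame labeling


def tokenize(text: str) -> Counter:
    # Lower-case word counts; Chinese text would need a word segmenter first.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_similarity(query: str, tags: TagInfo, frame_weight: float = 0.5) -> float:
    """Weighted blend of query-to-text-tag and query-to-frame-tag similarity."""
    text_sim = cosine(tokenize(query), tokenize(" ".join(tags.text_tags)))
    frame_sim = cosine(tokenize(query), tokenize(" ".join(tags.frame_tags)))
    return (1 - frame_weight) * text_sim + frame_weight * frame_sim


video = TagInfo(text_tags=["funny vlog"], frame_tags=["noodles", "kitchen", "cooking"])
print(round(first_similarity("cooking noodles", video), 3))         # 0.408
print(first_similarity("cooking noodles", video, frame_weight=0.0))  # 0.0
```

Here the misleading text tag alone scores 0 (the text-tag-only case, `frame_weight=0.0`), while the frame tags still surface the cooking video, which is the situation the description highlights when text tags differ from the video content.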

S103: Select and output the target video from the videos according to the first similarity.

In one possible implementation, videos with relatively high first similarity are selected as target videos. For example, the videos whose first similarity exceeds a preset threshold may be determined as target videos; alternatively, the top n videos ranked by first similarity may be determined as target videos.
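Both selection strategies can be sketched in a few lines. The function name, the dictionary of per-video first similarities, and the example scores are hypothetical; the threshold and n are tuning parameters the application leaves open.

```python
from typing import Optional


def select_target_videos(similarities: dict[str, float],
                         threshold: Optional[float] = None,
                         top_n: Optional[int] = None) -> list[str]:
    """Return target video IDs, highest first similarity first.

    threshold: keep only videos whose similarity is greater than this value.
    top_n:     keep only the n most similar videos.
    """
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(vid, s) for vid, s in ranked if s > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [vid for vid, _ in ranked]


scores = {"v1": 0.91, "v2": 0.18, "v3": 0.64}
print(select_target_videos(scores, threshold=0.5))  # ['v1', 'v3']
print(select_target_videos(scores, top_n=1))        # ['v1']
```

The two filters can also be combined, for example returning at most n videos that all clear the threshold.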

In summary, the video search method of this embodiment obtains a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information; if the search information includes text information, determines a first similarity between the text information and the preset tag information of each video, wherein the tag information includes a text tag and a video frame tag; and selects and outputs the target video from the videos according to the first similarity. Because the tag information covers two dimensions, text tags and video frame tags, each video is described in more detail and more accurately than in the related art, which determines the target video from text tags alone. This improves the accuracy of the similarity calculation and thus the accuracy and reliability of video search. In particular, when a video's text tags differ significantly from its content, searching with tag information that includes video frame tags can still accurately identify the target video matching the user's search intent, improving the user's search experience.

The video search method of this embodiment is now described in more detail in combination with the principle of determining video frame tags.

Please refer to FIG. 4, which is a flow chart of a video search method according to another embodiment of the present application.

As shown in FIG. 4, the method includes:

S201: Slice each video to obtain the slice set corresponding to that video.

It should be understood that the server receives videos uploaded by many users, and each user can upload multiple videos, so the server stores many videos; the server can slice any video it receives.

In some embodiments, the videos stored in the server are those that pass a screening based on their video information and audio information.

For example, when the server receives a video uploaded by a user, it may review the video in terms of both its video information and its audio information, and store the video only if it passes the review.

It should be noted that reviewing videos in both the video and the audio dimension avoids the low review accuracy caused in the related art by reviewing videos in the video dimension alone, reduces missed and erroneous reviews, and thus makes video review more reliable and accurate.

For example, a video classification algorithm, learnable pooling with Context Gating (LPCG), may be deployed on the server. When the server receives a video uploaded by a user, it can call this algorithm to classify the video, thereby reviewing it.

As can be seen from the above, slicing can be understood as cutting a video into multiple frames of images, and a slice set can accordingly be understood as a set of multiple frames of images.

In some embodiments, S201 may include:

S2011: Slice each video using time as the slicing unit to obtain the images corresponding to that video.

That is, the video can be sliced by time, for example in units of 0.1 seconds, to obtain the images corresponding to the video.
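As an illustration of slicing by time, the sketch below computes which frame indices fall on a 0.1-second grid, given the frame count and frame rate. It assumes a constant frame rate and takes the frame nearest to each grid time; the function name is hypothetical, and in practice a video decoding library would supply the frames themselves.

```python
import math


def time_slice_indices(num_frames: int, fps: float, interval_s: float = 0.1) -> list[int]:
    """Frame indices sampled once every `interval_s` seconds (constant fps assumed)."""
    duration_s = num_frames / fps
    n_slices = math.ceil(duration_s / interval_s - 1e-9)  # tolerance for float error
    indices: list[int] = []
    for k in range(n_slices):
        idx = round(k * interval_s * fps)  # nearest frame to time k * interval_s
        if idx < num_frames and (not indices or idx != indices[-1]):
            indices.append(idx)  # skip duplicates when fps < 1 / interval_s
    return indices


print(time_slice_indices(30, 30.0))  # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

A one-second clip at 30 frames per second thus yields ten time slices, one every third frame.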

It should be noted that videos are generally in color, so in one possible implementation the video may be converted to grayscale before slicing.

Grayscale conversion avoids color interference and improves the reliability and accuracy of slicing and of subsequent steps such as aggregation.
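The application does not say how the grayscale conversion is done. A common choice, sketched here as an assumption, is the ITU-R BT.601 luma weighting 0.299 R + 0.587 G + 0.114 B, applied to each pixel of a frame represented as rows of (R, G, B) tuples.

```python
Pixel = tuple[int, int, int]


def to_grayscale(frame: list[list[Pixel]]) -> list[list[int]]:
    """Convert an RGB frame to 8-bit gray levels using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in frame]


frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
print(to_grayscale(frame))  # [[255, 0], [76, 29]]
```

The weights sum to 1, so white stays 255 and black stays 0, while pure red and pure blue map to different gray levels.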

S2012: Slice the images using objects as the slicing unit to obtain the slice set.

An object may be a living person or animal, or an inanimate object such as a building; this embodiment does not limit it.

That is, in this embodiment, the video is first sliced by time and then, on that basis, sliced by object. Slicing at these two levels amounts to first slicing the video over a broad range and then slicing it again around narrower points of focus. This improves the accuracy and reliability of slicing, so that the video frame tags generated from the slice set are more reliable.
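Object-level slicing can be sketched as cropping each time slice to the bounding boxes of the objects found in it. The object detector itself is out of scope here: the `Detection` records stand in for the output of whatever detector is used, and their fields and the slice dictionary layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str  # e.g. "person", "dog", "building"
    x0: int     # box covers columns x0..x1-1 and rows y0..y1-1
    y0: int
    x1: int
    y1: int


def object_slices(image: list[list[int]], timestamp: float,
                  detections: list[Detection]) -> list[dict]:
    """Crop one time slice (a grayscale image) into per-object slices."""
    slices = []
    for det in detections:
        crop = [row[det.x0:det.x1] for row in image[det.y0:det.y1]]
        slices.append({"t": timestamp, "label": det.label, "pixels": crop})
    return slices


image = [[10 * r + c for c in range(4)] for r in range(4)]
print(object_slices(image, 0.1, [Detection("person", 1, 1, 3, 3)]))
# [{'t': 0.1, 'label': 'person', 'pixels': [[11, 12], [21, 22]]}]
```

Keeping the timestamp on every object slice lets the later clustering step tell whether two slices of the same object are contiguous in time.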

S202: Cluster the slice set to obtain the target key frame images of each video.

In this embodiment, clustering the slice set groups highly similar images into the same category. This avoids labeling the same image repeatedly and limits the effect of image noise on the accuracy of the video frame tags, so that the resulting video frame tags fit the video more closely.

In some embodiments, S202 may include:

S2021: Cluster the slice set using objects as the category unit.

That is, during clustering, the images in the slice set can be divided into categories by object, such as persons, animals and buildings.

In one possible implementation, images in the slice set that are not contiguous in time are assigned to different categories, which avoids omitting video frame tags and makes the video frame tags more comprehensive and complete.
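One simple way to realize object-based clustering together with the time-discontinuity rule is sketched below: slices are grouped by object label, and a new cluster is opened whenever the same label reappears after a gap longer than the slicing interval. This is a simplified stand-in, since the application does not name a clustering algorithm; the slice dictionaries and the `interval_s` default are assumptions.

```python
def cluster_slices(slices: list[dict], interval_s: float = 0.1,
                   tol: float = 1e-6) -> list[list[dict]]:
    """Group object slices by label, splitting at gaps in time.

    Each slice is a dict with at least "t" (seconds) and "label".
    """
    clusters: list[list[dict]] = []
    open_cluster: dict[str, int] = {}  # label -> index of its most recent cluster
    for s in sorted(slices, key=lambda x: x["t"]):
        idx = open_cluster.get(s["label"])
        contiguous = idx is not None and s["t"] - clusters[idx][-1]["t"] <= interval_s + tol
        if contiguous:
            clusters[idx].append(s)
        else:  # new object, or same object after a time gap: new category
            clusters.append([s])
            open_cluster[s["label"]] = len(clusters) - 1
    return clusters


slices = [{"t": 0.0, "label": "person"}, {"t": 0.1, "label": "person"},
          {"t": 0.1, "label": "dog"}, {"t": 0.2, "label": "person"},
          {"t": 0.5, "label": "person"}, {"t": 0.6, "label": "person"}]
print([(c[0]["label"], len(c)) for c in cluster_slices(slices)])
# [('person', 3), ('dog', 1), ('person', 2)]
```

The person who reappears at 0.5 s forms a separate category, so that appearance can receive its own candidate key frame and its tags are not lost.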

S2022: For each category in the clustering result, determine the image with the largest image information entropy in that category as a candidate key frame image.

Image information entropy characterizes the amount of information in an image: the larger the entropy, the more information the image contains, and the smaller the entropy, the less information it contains.

In this embodiment, the image with the largest information entropy, which is the image carrying the most information, is determined as the candidate key frame image. The candidate key frame image is therefore representative of its category, and the resulting video frame tags are strongly representative of the video.
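The entropy criterion can be illustrated with the Shannon entropy of the gray-level histogram, H = -Σ p_i log2 p_i, where p_i is the fraction of pixels with gray level i. The application does not give the exact formula, so this definition is an assumption; the sketch computes H for each slice in a category and keeps the slice with the largest value as the candidate key frame image.

```python
import math
from collections import Counter


def image_entropy(gray: list[list[int]]) -> float:
    """Shannon entropy (bits) of the gray-level distribution of an image."""
    pixels = [p for row in gray for p in row]
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


def candidate_key_frame(cluster: list[dict]) -> dict:
    """Pick the slice with the largest image information entropy."""
    return max(cluster, key=lambda s: image_entropy(s["pixels"]))


cluster = [{"t": 0.0, "pixels": [[0, 0], [0, 0]]},       # flat: H = 0
           {"t": 0.1, "pixels": [[0, 255], [0, 255]]},   # two levels: H = 1
           {"t": 0.2, "pixels": [[0, 85], [170, 255]]}]  # four levels: H = 2
print(candidate_key_frame(cluster)["t"])  # 0.2
```

A flat frame carries no information (H = 0), while a frame with four equally frequent gray levels reaches 2 bits and is chosen.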

S2023：对候选关键帧图像进行冗余处理，获得目标关键帧图像。S2023: Perform redundancy processing on the candidate key frame images to obtain a target key frame image.

为了减少标注量，在获得候选关键帧图像之后，可以对候选关键帧图像进行冗余处理。In order to reduce the amount of annotation, after obtaining the candidate key frame images, the candidate key frame images may be subjected to redundancy processing.

在一些实施例中，S2023可以包括：In some embodiments, S2023 may include:

S20231：针对任一两个时间相邻的候选关键帧图像，确定任一两个时间相邻的候选关键帧图像各自对应的边缘直方图。S20231: For any two temporally adjacent candidate key frame images, determine the edge histogram of each of them.

其中，边缘直方图用于体现图像的边缘和纹理特征，确定某候选关键帧图像的边缘直方图的方法可以包括：将候选关键帧图像进行边缘算子运算；计算候选关键帧图像的各个像素的边缘方向；将边缘方向进行量化，得到边缘方向值；将边缘方向值进行直方图统计并归一化处理，得到候选关键帧图像的边缘直方图。The edge histogram reflects the edge and texture features of an image. A method for determining the edge histogram of a candidate key frame image may include: applying an edge operator to the candidate key frame image; calculating the edge direction of each pixel of the candidate key frame image; quantizing the edge directions to obtain edge direction values; and computing a histogram of the edge direction values and normalizing it to obtain the edge histogram of the candidate key frame image.
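The four steps above could look roughly like this, assuming a 3x3 Sobel operator as the edge operator and 8 equally spaced direction bins; the application names neither, so both are assumptions. Pixels with a near-zero gradient magnitude are skipped, since they have no meaningful edge direction. `edge_histogram` is a hypothetical helper name.

```python
import math
from typing import List, Sequence


def edge_histogram(img: Sequence[Sequence[float]], bins: int = 8,
                   min_magnitude: float = 1e-6) -> List[float]:
    """Normalized histogram of quantized Sobel gradient directions."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Step 1: 3x3 Sobel responses in x and y (border pixels skipped).
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if math.hypot(gx, gy) < min_magnitude:
                continue  # flat region: no edge direction
            # Step 2: edge direction in [0, 2*pi).
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Step 3: quantize the direction into one of `bins` sectors.
            b = int(angle / (2 * math.pi) * bins) % bins
            hist[b] += 1
    # Step 4: normalize so the histogram sums to 1 (all zeros if no edges).
    total = sum(hist)
    return [v / total for v in hist] if total else hist


# A vertical edge: dark left columns, bright right columns.
img = [[0, 0, 255, 255, 255] for _ in range(5)]
print(edge_histogram(img)[0])  # 1.0: every edge pixel points along +x
```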

S20232：确定各自对应的边缘直方图之间的差异信息。S20232: Determine the difference information between the corresponding edge histograms.

S20233：基于差异信息对候选关键帧图像进行冗余处理，获得目标关键帧图像。S20233: Perform redundancy processing on the candidate key frame images based on the difference information to obtain a target key frame image.

其中，差异信息可以为差值，如候选关键帧图像A和候选关键帧图像B在时间上相邻，则差异信息可以为关键帧图像A的边缘直方图的归一化值与候选关键帧图像B的边缘直方图的归一化值之间的差值。The difference information may be a difference value. For example, if candidate key frame images A and B are adjacent in time, the difference information may be the difference between the normalized edge histogram of candidate key frame image A and the normalized edge histogram of candidate key frame image B.

则相应的，可以将差值与预先设置的阈值进行比较，如果差值大于或等于阈值，则说明候选关键帧图像A和候选关键帧图像B之间的差异较大，为了确保视频帧标签的完整性和全面性，将候选关键帧图像A和关键图像帧B均确定为目标关键帧图像；如果差值小于阈值，则说明候选关键帧图像A和候选关键帧图像B之间的差异较小，为了降低标注量，且节约服务器的存储空间，则可以将候选关键帧图像A或者候选关键帧图像B确定目标关键帧图像，或者，可以将候选关键帧图像A和候选关键帧图像B中图像信息熵偏大的图像确定为目标关键帧图像。Correspondingly, the difference can be compared with a preset threshold. If the difference is greater than or equal to the threshold, candidate key frame images A and B differ considerably, so to keep the video frame labels complete and comprehensive, both A and B are determined as target key frame images. If the difference is less than the threshold, A and B differ only slightly, so to reduce the amount of annotation and save storage space on the server, either A or B can be determined as the target key frame image, or the one of A and B with the larger image information entropy can be determined as the target key frame image.
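A greedy sketch of the redundancy processing in S20232 and S20233 follows. It assumes the difference information is the L1 distance between normalized edge histograms and, for pairs below the threshold, keeps the frame with the larger information entropy (one of the options described above). Each candidate is compared with the most recently kept frame, which is one reasonable reading of "temporally adjacent" once earlier frames have been dropped. The function names and the threshold value are illustrative.

```python
from typing import List, Sequence


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two normalized edge histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def remove_redundant(histograms: List[Sequence[float]],
                     entropies: List[float],
                     threshold: float) -> List[int]:
    """Return indices of target key frames among time-ordered candidates."""
    kept: List[int] = []
    for i in range(len(histograms)):
        if kept and histogram_difference(histograms[kept[-1]], histograms[i]) < threshold:
            # Small difference: redundant pair, keep the more informative frame.
            if entropies[i] > entropies[kept[-1]]:
                kept[-1] = i
        else:
            # Large difference (or first frame): keep it as a target key frame.
            kept.append(i)
    return kept


hists = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
entropies = [1.0, 2.0, 0.5]
print(remove_redundant(hists, entropies, threshold=0.5))  # [1, 2]
```

Frames 0 and 1 are near-duplicates, so only the higher-entropy frame 1 survives; frame 2 differs strongly and is kept as well.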

S203：根据目标关键帧图像生成视频帧标签。S203: Generate a video frame label according to the target key frame image.

基于上述分析可知，由于目标关键帧图像为剔除了冗余图像的图像，且在此基础上，保留了图像信息量较大的图像，因此，根据目标关键帧图像生成的视频帧标签具有较高的完整性、准确性以及可靠性。Based on the above analysis, because redundant images have been eliminated from the target key frame images while the images carrying the most information have been retained, the video frame labels generated from the target key frame images are highly complete, accurate and reliable.

在一些实施例中，S203可以包括：In some embodiments, S203 may include:

S2031：生成用于对目标关键帧图像进行描述的描述信息。S2031: Generate description information for describing the target key frame image.

其中，描述信息可以用于表征对关键帧图像的内容进行表述的信息。The description information may be used to represent information describing the content of the key frame image.

在一些实施例中，S2031可以包括：In some embodiments, S2031 may include:

S20311：确定用于描述目标关键帧图像中的对象的短语。S20311: Determine a phrase used to describe an object in a target key frame image.

S20312：基于预先设置的连接词对短语进行连接，获得描述信息。S20312: Connect the phrases based on pre-set connecting words to obtain description information.

例如，可以由服务器可以对目标关键帧图像进行语义分析和句法分析，从而得到描述信息。For example, the server may perform semantic analysis and syntactic analysis on the target key frame image to obtain description information.

具体地，可以基于预先设置的网络模型，如图像字幕合成的神经网络模型(Neural Compositional Paradigm for Image Captioning，NCPIC)对目标关键帧图像进行语义分析和句法分析，从而得到描述信息。Specifically, semantic and syntactic analysis may be performed on the target key frame image using a preset network model, such as the Neural Compositional Paradigm for Image Captioning (NCPIC) model, to obtain the description information.

现结合图5对生成描述信息进行示范性地描述如下：The generation of description information is illustrated below with reference to FIG. 5:

服务器可以对目标关键帧图像的对象进行识别，并形成用于对对象进行描述的短语：小狗、皮球以及草地。服务器中可以预先设置有用于存储连接词的语料库，当确定出用于对对象进行描述的短语时，服务器可以语料库中选取各连接词，使得各短语以及各连接词可以组成合理的语句，且若语句为多个时，可以将各语句与目标关键帧图像进行语义比对，得到最终的语句(可能为一个，也可以为多个)，如：小狗在草地上玩皮球。The server can recognize the objects in the target key frame image and form phrases describing them: puppy, ball and grass. The server can be preconfigured with a corpus of connectives. Once the phrases describing the objects are determined, the server selects connectives from the corpus so that the phrases and connectives form reasonable sentences. If several sentences are formed, each can be semantically compared with the target key frame image to obtain the final sentence (there may be one or more), such as: the puppy is playing with a ball on the grass.
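The phrase-plus-connective step can be illustrated with a toy template filler. This is not NCPIC: the connective corpus is a hypothetical list of templates with slots, and the semantic comparison with the image is stood in for by an arbitrary scoring callable that a real system would replace with an image-text matching model.

```python
from itertools import product
from typing import Callable, Dict, List

# Hypothetical connective corpus: templates whose slots take detected phrases.
CONNECTIVE_TEMPLATES = [
    "{subject} is playing with {obj} on {place}",
    "{subject} and {obj} on {place}",
]


def compose_sentences(phrases: Dict[str, List[str]]) -> List[str]:
    """Fill every template with every combination of the detected phrases."""
    sentences = []
    for template in CONNECTIVE_TEMPLATES:
        for subject, obj, place in product(
                phrases["subject"], phrases["obj"], phrases["place"]):
            sentences.append(template.format(subject=subject, obj=obj, place=place))
    return sentences


def choose_sentence(sentences: List[str], score: Callable[[str], float]) -> str:
    """Keep the sentence the (assumed) image-text scorer rates highest."""
    return max(sentences, key=score)


phrases = {"subject": ["the puppy"], "obj": ["a ball"], "place": ["the grass"]}
candidates = compose_sentences(phrases)
# len is only a stand-in for a real image-text matching score.
print(choose_sentence(candidates, score=len))
# the puppy is playing with a ball on the grass
```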

值得说明的是，在本实施例中，通过先形成短语，而后将短语结合连接词得到描述信息，可以使得描述信息可贴切的对目标关键帧图像进行描述，从而实现视频帧标签的可靠性和准确性的技术效果。It is worth noting that, in this embodiment, forming phrases first and then joining them with connectives to obtain the description information lets the description information describe the target key frame image closely, which makes the video frame labels reliable and accurate.

S2032：根据描述信息生成视频帧标签。S2032: Generate a video frame label according to the description information.

在本实施例中，描述信息用于对目标关键视频帧图像进行描述，因此，根据描述信息生成的视频帧标签，可以较为全面、完整和准确的对视频进行描述，从而提高相似度计算的可靠性和准确性，进而实现视频搜索的可靠性和准确性的技术效果。In this embodiment, the description information describes the target key frame image, so the video frame labels generated from it can describe the video comprehensively, completely and accurately. This improves the reliability and accuracy of the similarity calculation and, in turn, of the video search.

在该步骤中，可以直接根据描述信息生成视频帧标签，如结合上述实施例，可以将“小狗在草地上玩皮球”确定为视频帧标签。也可以对描述信息进行扩展，进行同义替换，以对视频帧标签进行丰富。如，将“小狗在草地上玩皮球”扩展为“狗在草地上玩皮球”和“小狗在草坪上玩球”等。In this step, the video frame label can be directly generated according to the description information. For example, in combination with the above embodiment, "a puppy playing with a ball on the grass" can be determined as a video frame label. The description information can also be expanded and synonymous substitution can be performed to enrich the video frame label. For example, "a puppy playing with a ball on the grass" can be expanded to "a dog playing with a ball on the grass" and "a puppy playing with a ball on the lawn".
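Synonym expansion of a label, as in the puppy/dog and grass/lawn example, can be sketched with a small lookup table that swaps one word at a time. The table `SYNONYMS` and the function name are hypothetical; a production system would more likely draw on a thesaurus or word embeddings.

```python
from typing import Dict, List

# Hypothetical synonym table.
SYNONYMS: Dict[str, List[str]] = {
    "puppy": ["dog"],
    "grass": ["lawn"],
}


def expand_label(label: str) -> List[str]:
    """Return the label plus variants with one word replaced by a synonym."""
    words = label.split()
    variants = [label]
    for i, word in enumerate(words):
        for syn in SYNONYMS.get(word, []):
            variant = " ".join(words[:i] + [syn] + words[i + 1:])
            if variant not in variants:
                variants.append(variant)
    return variants


for v in expand_label("a puppy playing with a ball on the grass"):
    print(v)
```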

在一些实施例中，目标关键帧图像包括多帧图像，描述信息包括与多帧图像各自对应的描述信息，则S2032包括：In some embodiments, the target key frame image includes multiple frame images, and the description information includes description information corresponding to each of the multiple frame images, then S2032 includes:

S20321：确与多帧图像各自对应的描述信息的出现次数。S20321: Determine the number of occurrences of the description information corresponding to each of the multiple image frames.

S20322：基于预先设置的选取参数和出现次数从与多帧图像各自对应的描述信息中选取视频帧标签。S20322: Selecting a video frame label from the description information corresponding to each of the plurality of frames of images based on pre-set selection parameters and the number of occurrences.

其中，选取参数可以百分比，也可以选取阈值，且百分比和选取阈值可以由服务器基于需求、历史记录和试验等进行设置，本实施例不做限定。The selection parameter may be a percentage or a threshold, and the percentage and the selection threshold may be set by the server based on demand, historical records, and experiments, etc., and this embodiment does not limit this.

例如，可以选取出现次数在前的5％的描述信息确定视频帧标签；也可以选取出现次数大于预设次数阈值(即选取阈值)的描述信息确定视频帧标签。For example, the descriptions ranking in the top 5% by number of occurrences may be selected as video frame labels; alternatively, the descriptions whose number of occurrences exceeds a preset count threshold (i.e., the selection threshold) may be selected.
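Both selection rules in S20322 can be expressed with a frequency count. In this sketch the percentage is applied to the list of distinct descriptions ranked by count and rounded up so that at least one label survives; that interpretation, and the function name, are assumptions.

```python
import math
from collections import Counter
from typing import List, Optional


def select_frame_labels(descriptions: List[str],
                        top_fraction: Optional[float] = None,
                        min_count: Optional[int] = None) -> List[str]:
    """Select labels by occurrence count: a top fraction or a count threshold."""
    counts = Counter(descriptions)
    ranked = [d for d, _ in counts.most_common()]  # most frequent first
    if top_fraction is not None:
        k = max(1, math.ceil(len(ranked) * top_fraction))
        return ranked[:k]
    if min_count is not None:
        # Keep descriptions that occur more often than the threshold.
        return [d for d in ranked if counts[d] > min_count]
    return ranked


descs = ["puppy on grass"] * 5 + ["dog with ball"] * 3 + ["empty lawn"]
print(select_frame_labels(descs, top_fraction=0.05))  # ['puppy on grass']
print(select_frame_labels(descs, min_count=2))        # ['puppy on grass', 'dog with ball']
```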

在一些实施例中，可以基于标签排序(EmbedRank)算法对各描述信息的出现次数进行计算并排序，并基于百分比和出现次数确定视频帧标签。In some embodiments, the number of occurrences of each description information may be calculated and ranked based on a label ranking (EmbedRank) algorithm, and the video frame label may be determined based on the percentage and the number of occurrences.

在本实施例中，通过选取参数和出现次数选取视频帧标签，可以减少视频帧标签占用的存储空间，减少相似度计算时的计算资源等技术效果。In this embodiment, selecting video frame labels according to the selection parameter and the number of occurrences reduces the storage space occupied by the video frame labels and the computing resources required for similarity calculation.

S204：获取携带搜索信息的搜索请求，其中，搜索请求用于请求搜索与搜索信息对应的目标视频。S204: Obtain a search request carrying the search information, wherein the search request is used to request a search for a target video corresponding to the search information.

其中，关于S204的描述可以参见S101，此处不再赘述。The description of S204 can be found in S101 and will not be repeated here.

S205：若搜索信息包括文本信息，则确定文本信息和预先设置的各视频的标签信息的第一相似度，其中，标签信息包括文本标签和视频帧标签。S205: If the search information includes text information, determine a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag.

其中，关于S205的描述可以参见S102，此处不再赘述。The description of S205 can be found in S102 and will not be repeated here.

在一些实施例中，服务器预先设置用于存储标签信息的数据库，并确定文本信息和数据库中的标签信息的第一相似度。In some embodiments, the server pre-sets a database for storing tag information, and determines a first similarity between the text information and the tag information in the database.
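The application does not specify how the first similarity is computed. As a placeholder, the sketch below scores each video by the best bag-of-words cosine similarity between the query text and any of its stored tags (text tags and video frame tags alike). Whitespace tokenization only works for space-delimited languages; Chinese text would first need word segmentation, and a real system would more likely compare embeddings. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of whitespace-token bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(n * vb[t] for t, n in va.items())
    norm = (math.sqrt(sum(n * n for n in va.values()))
            * math.sqrt(sum(n * n for n in vb.values())))
    return dot / norm if norm else 0.0


def first_similarity(query: str, tag_db: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each video by its best-matching tag (text tag or video frame tag)."""
    return {vid: max((cosine_bow(query, tag) for tag in tags), default=0.0)
            for vid, tags in tag_db.items()}


tag_db = {
    "v1": ["puppy playing ball", "puppy is playing with a ball on the grass"],
    "v2": ["city street at night"],
}
scores = first_similarity("puppy ball", tag_db)
print(max(scores, key=scores.get))  # v1
```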

值得说明的是，通过预先构建用于存储标签信息的数据库，可以对各标签信息进行统一存储，从而实现计算相似度的便捷性，提高确定第一相似度的效率的技术效果。It is worth noting that pre-building a database for storing tag information allows all tag information to be stored in one place, which makes similarity calculation convenient and improves the efficiency of determining the first similarity.

且，在本实施例中，文本标签可以为服务器基于原始文本标签(如用户上传视频时提供的文本信息，或者，服务器(或工作人员)基于用户上传的视频标注的文本标签)进行扩展。Furthermore, in this embodiment, the text tags may be obtained by the server by expanding original text tags (such as text information provided by the user when uploading the video, or text tags annotated by the server or by staff for the uploaded video).

例如，服务器可以对原始文本标签进行语义分析，得到相近的标签，用于对原始文本标签进行补充和完善，并可以对补充后的原始文本标签进行排序(如基于标签排序(EmbedRank)算法)实现，并通过最大边缘相关(Maximal Marginal Relevance，MMR)模型对排序后的原始文本标签进行筛选，得到文本标签。For example, the server can perform semantic analysis on the original text tags to obtain similar tags that supplement the original text tags, rank the supplemented tags (for example, using the EmbedRank algorithm), and filter the ranked tags with the Maximal Marginal Relevance (MMR) model to obtain the text tags.
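Maximal Marginal Relevance trades off relevance to the source against redundancy with tags already chosen: at each step it picks the candidate c that maximizes lam * sim(c, d) - (1 - lam) * max over selected s of sim(c, s), where d is the document vector. A minimal greedy sketch over precomputed vectors follows. How the tag and document vectors are obtained (for example, the embeddings EmbedRank uses) is left as an assumption, and lam = 0.5 is an arbitrary default.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_select(doc: Vector, candidates: List[Vector], k: int,
               lam: float = 0.5) -> List[int]:
    """Greedy Maximal Marginal Relevance; returns indices of chosen candidates."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i: int) -> float:
            relevance = cosine(candidates[i], doc)
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


doc = [1.0, 1.0]
tags = [[1.0, 0.9], [1.0, 0.95], [0.0, 1.0]]  # tags 0 and 1 are near-duplicates
print(mmr_select(doc, tags, k=2))  # [1, 2]
```

Ranking by relevance alone would return tags 1 and 0; MMR skips the near-duplicate tag 0 in favor of the more diverse tag 2.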

S206：根据第一相似度从各视频中选取并输出目标视频。S206: Select and output a target video from each video according to the first similarity.

其中，关于S206的描述可以参见S103，此处不再赘述。The description of S206 can be found in S103 and will not be repeated here.

值得说明的是，本实施例为了支持视频搜索的多元化，还可以支持图片搜索视频，现结合图6进行示范性地说明。It is worth noting that, to make video search more diverse, this embodiment can also support searching for videos by picture, which is illustrated below with reference to FIG. 6.

其中，图6为本申请另一实施例的视频搜索方法的流程示意图。FIG. 6 is a flow chart of a video search method according to another embodiment of the present application.

如图6所示，该方法包括：As shown in FIG. 6, the method includes:

S301：获取携带搜索信息的搜索请求，其中，搜索请求用于请求搜索与搜索信息对应的目标视频。S301: Obtain a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information.

其中，关于S301的描述可以参见S101，此处不再赘述。The description of S301 can be found in S101 and will not be repeated here.

S302：判断搜索信息是否包括文本信息和/或图片，若搜索信息中包括文本信息，则执行S303至S304；若搜索信息中包括图片，则执行S305至S306；若搜索信息中包括文本信息和图片，则执行S307至S309。S302: Determine whether the search information includes text information and/or a picture. If the search information includes only text information, execute S303 to S304; if it includes only a picture, execute S305 to S306; if it includes both text information and a picture, execute S307 to S309.

S303：确定文本信息和预先设置的各视频的标签信息的第一相似度，其中，标签信息包括文本标签和视频帧标签。S303: Determine a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag.

其中，关于S303的描述可以参见S102，此处不再赘述。The description of S303 can be found in S102 and will not be repeated here.

S304：根据第一相似度从各视频中选取并输出目标视频。S304: Select and output a target video from each video according to the first similarity.

其中，关于S304的描述可以参见S103，此处不再赘述。The description of S304 can be found in S103 and will not be repeated here.

在一些实施例中，还可以结合用户的历史搜索记录、历史观看记录以及历史评论信息从目标视频中再次进行筛选，并输出筛选后的目标视频。In some embodiments, the target videos may be further screened based on the user's search history, viewing history and comment history, and the screened target videos may be output.

S305：确定图片与各视频的图像的第二相似度。S305: Determine a second similarity between the picture and the images of each video.

其中，该步骤中的图像可以为各视频的目标关键帧图像。The image in this step may be a target key frame image of each video.
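If each image is represented by a feature vector (for example, from a pretrained CNN, which is an assumption here), the second similarity of a video could be taken as the best cosine match between the query picture and any of that video's target key frames. Aggregating by maximum, and the function names, are illustrative choices rather than the application's stated method.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def second_similarity(query: Vector,
                      keyframe_features: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Score each video by its key frame that best matches the query picture."""
    return {vid: max((cosine(query, f) for f in feats), default=0.0)
            for vid, feats in keyframe_features.items()}


features = {"v1": [[0.0, 1.0], [1.0, 0.0]], "v2": [[0.0, 1.0]]}
print(second_similarity([1.0, 0.0], features))  # {'v1': 1.0, 'v2': 0.0}
```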

S306：根据第二相似度从各视频中选取并输出目标视频。S306: Select and output a target video from each video according to the second similarity.

其中，根据第二似度确定目标视频的原理可以参阅上述实施例中，根据第一相似度确定目标视频的原理，此处不再赘述。The principle of determining the target video according to the second similarity can refer to the principle of determining the target video according to the first similarity in the above embodiment, which will not be described in detail here.

S307：确定文本信息和预先设置的各视频的标签信息的第一相似度，其中，标签信息包括文本标签和视频帧标签。S307: Determine a first similarity between the text information and preset tag information of each video, wherein the tag information includes a text tag and a video frame tag.

S308：确定图片与各视频的图像的第二相似度。S308: Determine a second similarity between the picture and the images of each video.

S309：根据第一相似度和第二相似度的交集确定并输出目标视频。S309: Determine and output a target video according to the intersection of the first similarity and the second similarity.

也就是说，服务器可以分别确定第一相似度对应的视频(下文称为第一视频集合)、第二相似度对应的视频(下文称为第二视频集合)，则可以确定既包含于第一视频集合中的视频，且被包含于第二视频集合中的视频，且将该部分视频确定为目标视频。That is, the server can determine the videos corresponding to the first similarity (hereinafter the first video set) and the videos corresponding to the second similarity (hereinafter the second video set), and then determine the videos contained in both the first video set and the second video set as the target videos.

同理，也可以从交集中选取部分视频确定为目标视频，其实现原理可以参见上述实施例，此处不再赘述。Similarly, some videos may be selected from the intersection to be determined as target videos. The implementation principle thereof may be referred to the above embodiment and will not be described in detail here.

在另一些实施例中，也可以预先分别为第一相似度和第二相似度分配权重系数，并基于权重系数确定目标视频。In some other embodiments, weight coefficients may be respectively allocated in advance to the first similarity and the second similarity, and the target video may be determined based on the weight coefficients.
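Both fusion strategies, the intersection of S309 and the weighted alternative, can be sketched as follows. The score threshold used to form each video set, the equal default weights, and the tie-breaking by video ID are all assumptions made for the illustration.

```python
from typing import Dict, List, Set


def top_videos(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Videos whose similarity reaches the (assumed) threshold."""
    return {vid for vid, s in scores.items() if s >= threshold}


def intersect_results(first: Dict[str, float], second: Dict[str, float],
                      threshold: float) -> List[str]:
    """Videos in both the first and the second video set, best combined score first."""
    common = top_videos(first, threshold) & top_videos(second, threshold)
    return sorted(common, key=lambda v: (-(first[v] + second[v]), v))


def weighted_results(first: Dict[str, float], second: Dict[str, float],
                     w1: float = 0.5, w2: float = 0.5, k: int = 3) -> List[str]:
    """Rank videos by a weighted sum of the two similarities."""
    vids = set(first) | set(second)
    fused = {v: w1 * first.get(v, 0.0) + w2 * second.get(v, 0.0) for v in vids}
    return sorted(fused, key=lambda v: (-fused[v], v))[:k]


first = {"a": 0.9, "b": 0.8, "c": 0.2}   # text-based first similarity
second = {"a": 0.7, "b": 0.1, "c": 0.9}  # picture-based second similarity
print(intersect_results(first, second, threshold=0.5))  # ['a']
print(weighted_results(first, second))                  # ['a', 'c', 'b']
```

The intersection is strict (a video must match both the text and the picture), while the weighted sum still surfaces video "c", which matches the picture strongly but the text only weakly.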

在本实施例中，服务器既可以支持基于文本信息的视频搜索，又可以支持基于图片的视频搜索，可以提高视频搜索的灵活性和多样性的技术效果，且通过结合第一相似度和第二相似度对目标视频进行确定，可以提高目标视频的准确性的技术效果。In this embodiment, the server supports both text-based and picture-based video search, which makes video search more flexible and diverse, and determining the target video from both the first similarity and the second similarity improves the accuracy of the target video.

根据本申请实施例的另一个方面，本申请实施例还提供了一种视频搜索装置，用于执行如上任一实施例所述的视频搜索方法，如执行如图3、图4以及图6中任一实施例所示的方法。According to another aspect of the embodiments of the present application, the embodiments of the present application also provide a video search device for executing the video search method described in any of the above embodiments, such as executing the method shown in any of the embodiments in Figures 3, 4 and 6.

请参阅图7，图7为本申请一个实施例的视频搜索装置的示意图。Please refer to FIG. 7 , which is a schematic diagram of a video search device according to an embodiment of the present application.

如图7所示，该装置包括：As shown in FIG. 7, the device includes:

获取模块11，用于获取携带搜索信息的搜索请求，其中，所述搜索请求用于请求搜索与所述搜索信息对应的目标视频；An acquisition module 11 is used to acquire a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information;

第一确定模块12，用于若所述搜索信息包括文本信息，则确定所述文本信息和预先设置的各视频的标签信息的第一相似度，其中，所述标签信息包括文本标签和视频帧标签；A first determination module 12, configured to determine a first similarity between the text information and preset tag information of each video if the search information includes text information, wherein the tag information includes a text tag and a video frame tag;

选取模块13，用于根据所述第一相似度从各所述视频中选取所述目标视频；A selection module 13, configured to select the target video from the videos according to the first similarity;

输出模块14，用于输出所述目标视频。The output module 14 is used to output the target video.

结合图8可知，在一些实施例中，还包括：As shown in FIG. 8, in some embodiments, the device further includes:

切片模块15，用于对任一个视频均进行切片处理，获得所述任一个视频对应的切片集合；A slicing module 15 is used to perform slicing processing on any video to obtain a slice set corresponding to any video;

聚类模块16，用于对所述切片集合进行聚类处理，获得所述任一个视频的目标关键帧图像；A clustering module 16, configured to perform clustering processing on the slice set to obtain a target key frame image of any video;

生成模块17，用于根据所述目标关键帧图像生成所述视频帧标签。The generating module 17 is used to generate the video frame label according to the target key frame image.

在一些实施例中，所述切片模块15用于，以时间为切片单位对所述任一视频均进行切片，获得所述任一个所述视频对应的图像，以对象为切片单位对所述图像进行切片，获得所述切片集合。In some embodiments, the slicing module 15 is configured to slice each video using time as the slicing unit to obtain the images corresponding to that video, and to slice those images using objects as the slicing unit to obtain the slice set.

在一些实施例中，所述聚类模块16用于，以所述对象为类别单位对所述切片集合进行聚类处理，针对聚类结果中的任一个类别，对所述任一个类别中图像信息熵最大的图像确定为候选关键帧图像，对所述候选关键帧图像进行冗余处理，获得所述目标关键帧图像。In some embodiments, the clustering module 16 is configured to cluster the slice set using the object as the category unit; for each category in the clustering result, determine the image with the largest image information entropy in that category as a candidate key frame image; and perform redundancy processing on the candidate key frame images to obtain the target key frame images.

在一些实施例中，所述聚类模块16用于，针对任一两个时间相邻的候选关键帧图像，确定所述任一两个时间相邻的候选关键帧图像各自对应的边缘直方图，确定所述各自对应的边缘直方图之间的差异信息，基于所述差异信息对所述候选关键帧图像进行冗余处理，获得所述目标关键帧图像。In some embodiments, the clustering module 16 is used to determine, for any two temporally adjacent candidate key frame images, the edge histograms corresponding to each of the two temporally adjacent candidate key frame images, determine the difference information between the corresponding edge histograms, perform redundant processing on the candidate key frame images based on the difference information, and obtain the target key frame image.

在一些实施例中，所述生成模块17用于，生成用于对所述目标关键帧图像进行描述的描述信息，根据所述描述信息生成所述视频帧标签。In some embodiments, the generating module 17 is used to generate description information for describing the target key frame image, and generate the video frame label according to the description information.

在一些实施例中，所述目标关键帧图像包括多帧图像，所述描述信息包括与多帧图像各自对应的描述信息，所述生成模块17用于，确所述与多帧图像各自对应的描述信息的出现次数，基于预先设置的选取参数和所述出现次数从所述与多帧图像各自对应的描述信息中选取所述视频帧标签。In some embodiments, the target key frame image includes multiple frame images, and the description information includes description information corresponding to each of the multiple frame images. The generation module 17 is used to determine the number of occurrences of the description information corresponding to each of the multiple frame images, and select the video frame label from the description information corresponding to each of the multiple frame images based on pre-set selection parameters and the number of occurrences.

在一些实施例中，所述生成模块17用于，确定用于描述所述目标关键帧图像中的对象的短语，基于预先设置的连接词对所述短语进行连接，获得所述描述信息。In some embodiments, the generating module 17 is used to determine phrases for describing the object in the target key frame image, and connect the phrases based on preset connecting words to obtain the description information.

结合图8可知，在一些实施例中，若所述搜索信息中还包括图片，则还包括：As shown in FIG. 8, in some embodiments, if the search information further includes a picture, the device further includes:

第二确定模块18，用于确定所述图片与各所述视频的图像的第二相似度；A second determination module 18, configured to determine a second similarity between the picture and the images of each of the videos;

以及，所述选取模块13用于，将所述第一相似度和所述第二相似度的交集确定为所述目标视频。Furthermore, the selection module 13 is configured to determine, as the target videos, the videos in the intersection of the results given by the first similarity and the second similarity.

在一些实施例中，各所述视频为基于各所述视频的视频信息和音频信息进行筛选获得的。In some embodiments, each of the videos is obtained by screening based on video information and audio information of each of the videos.

处理模块19，用于对所述任一个视频均进行灰度处理；A processing module 19, configured to perform grayscale processing on any of the videos;

以及，所述切片模块用于，对灰度处理后的任一个视频均进行切片处理，获得所述切片集合。Furthermore, the slicing module is used to perform slicing processing on any video after grayscale processing to obtain the slicing set.

根据本申请的实施例，本申请还提供了一种电子设备和一种可读存储介质。According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.

根据本申请的实施例，本申请还提供了一种计算机程序产品，程序产品包括：计算机程序，计算机程序存储在可读存储介质中，电子设备的至少一个处理器可以从可读存储介质读取计算机程序，至少一个处理器执行计算机程序使得电子设备执行上述任一实施例提供的方案。According to an embodiment of the present application, the present application also provides a computer program product, the program product includes: a computer program, the computer program is stored in a readable storage medium, at least one processor of an electronic device can read the computer program from the readable storage medium, and at least one processor executes the computer program so that the electronic device executes the solution provided by any of the above embodiments.

如图9所示，是根据本申请实施例的电子设备的框图。电子设备旨在表示各种形式的数字计算机，诸如，膝上型计算机、台式计算机、工作台、个人数字助理、服务器、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置，诸如，个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例，并且不意在限制本文中描述的和/或者要求的本申请实施例的实现。FIG. 9 is a block diagram of an electronic device according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples, and are not intended to limit the implementations of the embodiments of the present application described and/or claimed herein.

如图9所示，该电子设备包括：一个或多个处理器101、存储器102，以及用于连接各部件的接口，包括高速接口和低速接口。各个部件利用不同的总线互相连接，并且可以被安装在公共主板上或者根据需要以其它方式安装。处理器可以对在电子设备内执行的指令进行处理，包括存储在存储器中或者存储器上以在外部输入/输出装置(诸如，耦合至接口的显示设备)上显示GUI的图形信息的指令。在其它实施方式中，若需要，可以将多个处理器和/或多条总线与多个存储器一起使用。同样，可以连接多个电子设备，各个设备提供部分必要的操作(例如，作为服务器阵列、一组刀片式服务器、或者多处理器系统)。图9中以一个处理器101为例。As shown in Figure 9, the electronic device includes: one or more processors 101, a memory 102, and interfaces for connecting various components, including high-speed interfaces and low-speed interfaces. The various components are connected to each other using different buses and can be installed on a common mainboard or installed in other ways as needed. The processor can process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, if necessary, multiple processors and/or multiple buses can be used together with multiple memories. Similarly, multiple electronic devices can be connected, and each device provides some necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system). In Figure 9, a processor 101 is taken as an example.

存储器102即为本申请实施例所提供的非瞬时计算机可读存储介质。其中，所述存储器存储有可由至少一个处理器执行的指令，以使所述至少一个处理器执行本申请实施例所提供的视频搜索方法。本申请实施例的非瞬时计算机可读存储介质存储计算机指令，该计算机指令用于使计算机执行本申请实施例所提供的视频搜索方法。The memory 102 is a non-transitory computer-readable storage medium provided in the embodiment of the present application. The memory stores instructions executable by at least one processor to enable the at least one processor to perform the video search method provided in the embodiment of the present application. The non-transitory computer-readable storage medium of the embodiment of the present application stores computer instructions, which are used to enable a computer to perform the video search method provided in the embodiment of the present application.

存储器102作为一种非瞬时计算机可读存储介质，可用于存储非瞬时软件程序、非瞬时计算机可执行程序以及模块，如本申请实施例中的程序指令/模块。处理器101通过运行存储在存储器102中的非瞬时软件程序、指令以及模块，从而执行服务器的各种功能应用以及数据处理，即实现上述方法实施例中的视频搜索方法。The memory 102 is a non-transient computer-readable storage medium that can be used to store non-transient software programs, non-transient computer executable programs and modules, such as program instructions/modules in the embodiments of the present application. The processor 101 executes various functional applications and data processing of the server by running the non-transient software programs, instructions and modules stored in the memory 102, that is, implements the video search method in the above method embodiment.

存储器102可以包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需要的应用程序；存储数据区可存储根据电子设备的使用所创建的数据等。此外，存储器102可以包括高速随机存取存储器，还可以包括非瞬时存储器，例如至少一个磁盘存储器件、闪存器件、或其他非瞬时固态存储器件。在一些实施例中，存储器102可选包括相对于处理器101远程设置的存储器，这些远程存储器可以通过网络连接至电子设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、区块链服务网络(Block-chain-based Service Network，BSN)、移动通信网及其组合。The memory 102 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required for at least one function; the data storage area may store data created according to the use of the electronic device, etc. In addition, the memory 102 may include a high-speed random access memory, and may also include a non-transient memory, such as at least one disk storage device, a flash memory device, or other non-transient solid-state storage device. In some embodiments, the memory 102 may optionally include a memory remotely arranged relative to the processor 101, and these remote memories may be connected to the electronic device via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, an intranet, a local area network, a Blockchain Service Network (BSN), a mobile communication network, and a combination thereof.

电子设备还可以包括：输入装置103和输出装置104。处理器101、存储器102、输入装置103和输出装置104可以通过总线或者其他方式连接，图9中以通过总线连接为例。The electronic device may further include: an input device 103 and an output device 104. The processor 101, the memory 102, the input device 103 and the output device 104 may be connected via a bus or other means, and FIG9 takes the connection via a bus as an example.

输入装置103可接收输入的数字或字符信息，以及产生与电子设备的用户设置以及功能控制有关的键信号输入，例如触摸屏、小键盘、鼠标、轨迹板、触摸板、指示杆、一个或者多个鼠标按钮、轨迹球、操纵杆等输入装置。输出装置104可以包括显示设备、辅助照明装置(例如，LED)和触觉反馈装置(例如，振动电机)等。该显示设备可以包括但不限于，液晶显示器(LCD)、发光二极管(LED)显示器和等离子体显示器。在一些实施方式中，显示设备可以是触摸屏。The input device 103 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 104 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.

此处描述的系统和技术的各种实施方式可以在数字电子电路系统、集成电路系统、专用ASIC(专用集成电路)、计算机硬件、固件、软件、和/或它们的组合中实现。这些各种实施方式可以包括：实施在一个或者多个计算机程序中，该一个或者多个计算机程序可在包括至少一个可编程处理器的可编程系统上执行和/或解释，该可编程处理器可以是专用或者通用可编程处理器，可以从存储系统、至少一个输入装置、和至少一个输出装置接收数据和指令，并且将数据和指令传输至该存储系统、该至少一个输入装置、和该至少一个输出装置。Various implementations of the systems and techniques described herein can be realized in digital electronic circuit systems, integrated circuit systems, dedicated ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which can be a special purpose or general purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.

这些计算机程序(也称作程序、软件、软件应用、或者代码)包括可编程处理器的机器指令，并且可以利用高级过程和/或面向对象的编程语言、和/或汇编/机器语言来实施这些计算机程序。如本文使用的，术语“机器可读介质”和“计算机可读介质”指的是用于将机器指令和/或数据提供给可编程处理器的任何计算机程序产品、设备、和/或装置(例如，磁盘、光盘、存储器、可编程逻辑装置(PLD))，包括，接收作为机器可读信号的机器指令的机器可读介质。术语“机器可读信号”指的是用于将机器指令和/或数据提供给可编程处理器的任何信号。These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for programmable processors and can be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or means (e.g., disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal for providing machine instructions and/or data to a programmable processor.

为了提供与用户的交互，可以在计算机上实施此处描述的系统和技术，该计算机具有：用于向用户显示信息的显示装置(例如，CRT(阴极射线管)或者LCD(液晶显示器)监视器)；以及键盘和指向装置(例如，鼠标或者轨迹球)，用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互；例如，提供给用户的反馈可以是任何形式的传感反馈(例如，视觉反馈、听觉反馈、或者触觉反馈)；并且可以用任何形式(包括声输入、语音输入或者、触觉输入)来接收来自用户的输入。To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other types of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a backend component (e.g., a data server), a middleware component (e.g., an application server), or a frontend component (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or in a computing system that includes any combination of such backend, middleware, or frontend components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a Block-chain-based Service Network (BSN), a wide area network (WAN), and the Internet.

A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may be a cloud server, also known as a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak business scalability of traditional physical hosts and virtual private server (VPS) services.

According to another aspect, an embodiment of the present application further provides a video recommendation method.

Please refer to FIG. 10, which is a flowchart of a video recommendation method according to an embodiment of the present application.

As shown in FIG. 10, the method includes:

S401: Obtain the user's video access history record.

S402: Determine the text information corresponding to the history record.

The text information corresponding to the history record can be understood along two dimensions. The first is the history record itself, such as the text tags, authors, upload times, and comments (from this user and from other users) of the videos the user has watched. The second is information extended from the history record, such as information about similar videos that, judging from the history record, the user is likely to be interested in.

S403: Determine a third similarity between the text information corresponding to the history record and the preset tag information of each video, where the tag information includes a text tag and a video frame tag.

For a description of the tag information, refer to the embodiments above; it is not repeated here.

That is, in this embodiment the server recommends videos to the user based on the history record and the tag information. Because the recommendations take into account both text tags and video frame tags, the accuracy and reliability of video recommendation can be improved, which in turn improves the user's viewing experience.

S404: Select videos from the candidate videos according to the third similarity and recommend them to the user.
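Steps S401 to S404 can be illustrated with a minimal sketch. The application does not specify how the third similarity is computed, so this example assumes a bag-of-words cosine similarity between the history text and each video's concatenated text tags and video frame tags. The names `recommend`, `cosine_similarity`, `bag_of_words`, and `top_k`, and the whitespace tokenizer, are hypothetical; Chinese text would need proper word segmentation.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Hypothetical tokenizer: lowercase + whitespace split.
    # Real Chinese text would need a word segmenter instead.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Cosine similarity between two term-frequency vectors stored as Counters.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(history_text, videos, top_k=2):
    """Rank videos by the (assumed) third similarity and return the top_k ids.

    videos maps a video id to {"text_tags": [...], "frame_tags": [...]}.
    """
    query = bag_of_words(history_text)
    scored = []
    for video_id, tags in videos.items():
        # Tag information = text tags plus video frame tags (S403).
        label_text = " ".join(tags["text_tags"] + tags["frame_tags"])
        scored.append((cosine_similarity(query, bag_of_words(label_text)), video_id))
    # Highest similarity first; ties broken by id for a stable result (S404).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [video_id for _, video_id in scored[:top_k]]


videos = {
    "v1": {"text_tags": ["cooking"], "frame_tags": ["a chef and a pan"]},
    "v2": {"text_tags": ["football"], "frame_tags": ["a player and a ball"]},
    "v3": {"text_tags": ["cooking", "dessert"], "frame_tags": ["a cake and a knife"]},
}
print(recommend("cooking cake", videos))  # ['v3', 'v1']
```

In practice the history text from S402 might concatenate the tags, authors, and comments of watched videos, and any text-similarity model could replace the cosine measure used here.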

It should be understood that steps may be reordered, added, or deleted using the various forms of processes shown above. For example, the steps described in this application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of this application can be achieved; no limitation is imposed here.

The specific embodiments above do not limit the protection scope of this application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims

1. A video search method, comprising:

acquiring a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information;

if the search information comprises text information, determining a first similarity between the text information and preset label information of each video, wherein the label information comprises a text label and a video frame label; and

selecting and outputting the target video from the videos according to the first similarity;

the method further comprising:

slicing any video to obtain a slice set corresponding to that video;

clustering the slice set to obtain a target key frame image of the video;

generating description information for describing the target key frame image; and

generating the video frame tag according to the description information;

wherein the target key frame image comprises a plurality of frame images, the description information comprises description information corresponding to each of the plurality of frame images, and generating the video frame tag according to the description information comprises:

determining the number of occurrences of the description information corresponding to each of the plurality of frame images; and

selecting the video frame tag from the description information corresponding to the plurality of frame images based on preset selection parameters and the numbers of occurrences;

and wherein generating the description information for describing the target key frame image comprises:

determining phrases describing objects in the target key frame image; and

connecting the phrases using preset connecting words to obtain the description information.
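The tag-generation steps of claim 1 can be sketched as follows, assuming an upstream object detector has already produced the phrases for each key frame. The connecting word `"and"`, the selection parameters `top_k` and `min_count`, and all function names are hypothetical; the claim only says that preset connecting words and preset selection parameters are used.

```python
from collections import Counter


def describe_frame(object_phrases, connector="and"):
    # Join the phrases for the objects in one key frame with a preset connecting word,
    # e.g. ["a dog", "a ball"] -> "a dog and a ball".
    return f" {connector} ".join(object_phrases)


def select_frame_tags(frames_phrases, top_k=2, min_count=2, connector="and"):
    """Pick video frame tags from per-frame descriptions by occurrence count.

    frames_phrases: one list of object phrases per target key frame image.
    top_k / min_count: hypothetical stand-ins for the "preset selection parameters".
    """
    descriptions = [describe_frame(p, connector) for p in frames_phrases]
    counts = Counter(d for d in descriptions if d)  # skip frames with no objects
    # most_common() sorts by count; equal counts keep first-seen order.
    return [desc for desc, n in counts.most_common(top_k) if n >= min_count]


frames = [
    ["a dog", "a ball"], ["a dog", "a ball"], ["a cat"],
    ["a dog", "a ball"], ["a cat"], ["a tree"],
]
print(select_frame_tags(frames))  # ['a dog and a ball', 'a cat']
```

Counting whole descriptions, rather than individual phrases, favors scenes that recur across many key frames; a one-off description such as "a tree" is filtered out by `min_count`.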

2. The method of claim 1, wherein slicing the video to obtain the slice set corresponding to the video comprises:

slicing the video with time as the slice unit to obtain images corresponding to the video; and

slicing the images with objects as the slice unit to obtain the slice set.

3. The method of claim 2, wherein clustering the slice set comprises:

clustering the slice set with objects as the category unit;

for each category in the clustering result, determining the image with the maximum image information entropy in that category as a candidate key frame image; and

performing redundancy processing on the candidate key frame images to obtain the target key frame image.
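Claim 3 selects, within each object category, the slice with the largest image information entropy. A minimal sketch, assuming the slices already carry an object label (so clustering reduces to grouping by label) and that image information entropy means the Shannon entropy of the grayscale histogram, H = -Σ p_i log2 p_i. The slice tuple layout and the function names are hypothetical.

```python
import math
from collections import Counter


def image_entropy(gray):
    """Shannon entropy (bits) of a grayscale image given as rows of 0-255 ints."""
    pixels = [v for row in gray for v in row]
    total = len(pixels)
    # Sum of p * log2(1/p) over intensities, equivalent to -sum p * log2 p.
    return sum((c / total) * math.log2(total / c) for c in Counter(pixels).values())


def candidate_key_frames(slices):
    """Return time-ordered frame indices of the max-entropy slice per category.

    slices: list of (object_label, frame_index, gray_image) tuples; grouping by
    object_label stands in for "clustering with objects as the category unit".
    """
    best = {}  # object_label -> (entropy, frame_index)
    for label, index, gray in slices:
        h = image_entropy(gray)
        if label not in best or h > best[label][0]:
            best[label] = (h, index)
    return sorted(index for _, index in best.values())


flat = [[0, 0], [0, 0]]       # one intensity -> entropy 0
split = [[0, 255], [0, 255]]  # two equally likely intensities -> entropy 1
print(candidate_key_frames([("dog", 0, flat), ("dog", 1, split),
                            ("cat", 2, split), ("cat", 3, flat)]))  # [1, 2]
```

Higher entropy roughly means a richer intensity distribution, so the chosen slice is the most detailed view of each object.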

4. The method of claim 3, wherein performing redundancy processing on the candidate key frame images to obtain the target key frame image comprises:

for any two temporally adjacent candidate key frame images, determining the edge histogram corresponding to each of them;

determining difference information between the two edge histograms; and

performing redundancy processing on the candidate key frame images based on the difference information to obtain the target key frame image.
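Claim 4 does not define the edge histogram or the difference measure. The sketch below assumes a simple orientation histogram of strong gradients (four bins, finite-difference gradients) and an L1 distance between normalized histograms; a candidate is dropped when it differs from its temporally preceding candidate by less than a hypothetical `threshold`. The MPEG-7 edge histogram descriptor is a more elaborate, standardized alternative.

```python
import math


def edge_histogram(gray, bins=4, mag_threshold=10):
    """Normalized histogram of gradient orientations for strong edges (hypothetical)."""
    hist = [0.0] * bins
    for y in range(len(gray) - 1):
        for x in range(len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x]  # horizontal finite difference
            gy = gray[y + 1][x] - gray[y][x]  # vertical finite difference
            if math.hypot(gx, gy) < mag_threshold:
                continue  # ignore weak gradients
            angle = math.atan2(gy, gx) % math.pi  # edge orientation in [0, pi)
            hist[int(angle / math.pi * bins) % bins] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def histogram_difference(a, b):
    # L1 distance; for normalized histograms it ranges from 0 (identical) to 2 (no overlap).
    return sum(abs(x - y) for x, y in zip(a, b))


def remove_redundant(candidates, threshold=0.5):
    """candidates: time-ordered (frame_index, gray_image) pairs.

    Keeps the first frame, then any frame whose edge histogram differs from its
    temporally adjacent predecessor by at least `threshold`.
    """
    kept, previous = [], None
    for index, gray in candidates:
        hist = edge_histogram(gray)
        if previous is None or histogram_difference(hist, previous) >= threshold:
            kept.append(index)
        previous = hist  # always compare adjacent pairs, as in claim 4
    return kept


vertical = [[0, 0, 100, 100]] * 4                      # one vertical edge
horizontal = [[0] * 4, [0] * 4, [100] * 4, [100] * 4]  # one horizontal edge
print(remove_redundant([(0, vertical), (1, vertical), (2, horizontal)]))  # [0, 2]
```

Edge histograms capture scene structure while being fairly insensitive to small brightness changes, which is one reason they suit near-duplicate detection between neighboring key frames.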

5. The method according to any one of claims 1 to 4, wherein, if the search information further includes a picture, the method further comprises:

determining a second similarity between the picture and images of each of the videos;

wherein selecting and outputting the target video from the videos according to the first similarity comprises: determining and outputting the target video based on the intersection of the first similarity and the second similarity.
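Claim 5 combines text and picture search by intersection. One plausible reading, sketched here under assumptions, is to keep only videos whose first and second similarities both reach a threshold and then rank the survivors by the sum of the two scores. The thresholds, the summed ranking, and the function name are hypothetical choices not stated in the claim.

```python
def intersect_results(text_scores, image_scores, text_threshold=0.5, image_threshold=0.5):
    """text_scores / image_scores: video id -> first / second similarity."""
    text_hits = {v for v, s in text_scores.items() if s >= text_threshold}
    image_hits = {v for v, s in image_scores.items() if s >= image_threshold}
    matches = text_hits & image_hits  # videos retrieved by both the text and the picture
    # Rank by combined score, breaking ties by id for deterministic output.
    return sorted(matches, key=lambda v: (-(text_scores[v] + image_scores[v]), v))


text_scores = {"v1": 0.9, "v2": 0.6, "v3": 0.2}
image_scores = {"v1": 0.7, "v2": 0.8, "v3": 0.9}
print(intersect_results(text_scores, image_scores))  # ['v1', 'v2']
```

Here `v3` closely matches the picture but not the text, so the intersection excludes it.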

6. The method according to any one of claims 1 to 4, wherein each of the videos is obtained by filtering based on video information and audio information of each of the videos.

7. The method of any one of claims 2 to 4, further comprising: performing grayscale processing on the video;

wherein slicing the video to obtain the slice set corresponding to the video comprises: slicing the grayscale-processed video to obtain the slice set.
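Claim 7 converts each video to grayscale before slicing, and claim 2 first slices by time. A minimal sketch of both, assuming frames are rows of (R, G, B) tuples, the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for the gray conversion, and a fixed sampling interval for time slicing. The application does not specify either choice, and the function names are hypothetical.

```python
def to_grayscale(frame):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in frame]


def sample_indices(num_frames, fps, interval_s):
    """Indices of frames taken every `interval_s` seconds (time as the slice unit)."""
    step = max(1, round(fps * interval_s))
    return list(range(0, num_frames, step))


def gray_time_slices(frames, fps, interval_s):
    # Grayscale first (claim 7), then slice by time (first step of claim 2).
    gray = [to_grayscale(f) for f in frames]
    return [gray[i] for i in sample_indices(len(gray), fps, interval_s)]


print(to_grayscale([[(255, 255, 255), (255, 0, 0)], [(0, 0, 0), (0, 255, 0)]]))
# [[255, 76], [0, 150]]
print(sample_indices(10, fps=25, interval_s=0.2))  # [0, 5]
```

Working in grayscale reduces each pixel to one value, which keeps the later entropy and edge-histogram computations simple.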

8. A video search apparatus, comprising:

an acquisition module, configured to acquire a search request carrying search information, wherein the search request is used to request a search for a target video corresponding to the search information;

a first determining module, configured to determine, if the search information comprises text information, a first similarity between the text information and preset label information of each video, wherein the label information comprises a text label and a video frame label;

a selecting module, configured to select the target video from the videos according to the first similarity; and

an output module, configured to output the target video;

the apparatus further comprising:

a slicing module, configured to slice any video to obtain a slice set corresponding to that video;

a clustering module, configured to cluster the slice set to obtain a target key frame image of the video; and

a generation module, configured to generate the video frame tag according to the target key frame image;

wherein the generation module is configured to generate description information for describing the target key frame image and to generate the video frame tag according to the description information;

the generation module is configured to determine the number of occurrences of the description information corresponding to each of the plurality of frame images, and to select the video frame tag from the description information corresponding to the plurality of frame images based on preset selection parameters and the numbers of occurrences; and

the generation module is configured to determine phrases describing objects in the target key frame image, and to connect the phrases using preset connecting words to obtain the description information.

9. The apparatus of claim 8, wherein the slicing module is configured to slice the video with time as the slice unit to obtain images corresponding to the video, and to slice the images with objects as the slice unit to obtain the slice set.

10. The apparatus of claim 9, wherein the clustering module is configured to cluster the slice set with objects as the category unit; to determine, for each category in the clustering result, the image with the maximum image information entropy in that category as a candidate key frame image; and to perform redundancy processing on the candidate key frame images to obtain the target key frame image.

11. The apparatus of claim 10, wherein the clustering module is configured to determine, for any two temporally adjacent candidate key frame images, the edge histogram corresponding to each of them; to determine difference information between the two edge histograms; and to perform redundancy processing on the candidate key frame images based on the difference information to obtain the target key frame image.

12. The apparatus according to any one of claims 8 to 11, wherein, if the search information further includes a picture, the apparatus further comprises:

a second determining module, configured to determine a second similarity between the picture and images of each video;

wherein the selecting module is configured to determine the target video based on the intersection of the first similarity and the second similarity.

13. The apparatus according to any one of claims 8 to 11, wherein each of the videos is obtained by filtering based on video information and audio information of each of the videos.

14. The apparatus of any one of claims 8 to 11, further comprising:

a processing module, configured to perform grayscale processing on the video;

wherein the slicing module is configured to slice the grayscale-processed video to obtain the slice set.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.

17. A video recommendation method, comprising:

acquiring a history record of a user's video access;

determining text information corresponding to the history record;

determining a third similarity between the text information corresponding to the history record and preset label information of each video, wherein the label information comprises a text label and a video frame label; and

selecting videos from the videos according to the third similarity and recommending the selected videos to the user;

the method further comprising:

slicing any video to obtain a slice set corresponding to that video;

clustering the slice set to obtain a target key frame image of the video;

generating description information for describing the target key frame image;

generating the video frame tag according to the description information;

determining phrases describing objects in the target key frame image;

18. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.