CN113239215B - Multimedia resource classification method, device, electronic device and storage medium

Info

Publication number: CN113239215B
Application number: CN202110497331.0A
Authority: CN
Inventors: 陈帅; 汪琦; 冯知凡; 柴春光; 朱勇
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-05-07
Filing date: 2021-05-07
Publication date: 2024-05-14
Anticipated expiration: 2041-05-07
Also published as: CN113239215A

Abstract

The application provides a method and a device for classifying multimedia resources, electronic equipment and a storage medium, belongs to the technical field of computers, and particularly relates to the technical fields of computer vision, deep learning, knowledge graph and the like. The specific implementation scheme is as follows: the method comprises the steps of obtaining multimedia resources and multimedia expression vectors corresponding to the multimedia resources, determining knowledge expression vectors corresponding to the multimedia resources according to a plurality of entities in the multimedia resources and relations among the entities, and further determining categories of the multimedia resources according to the multimedia expression vectors and the knowledge expression vectors. Therefore, the deep understanding of the multimedia resources is enhanced by utilizing knowledge, and the classifying effect of the multimedia resources is improved.

Description

Multimedia resource classification method, device, electronic device and storage medium

Technical Field

The present application relates to the field of computer technology, specifically to computer vision, deep learning, and knowledge graphs, and in particular to a method, device, electronic device, and storage medium for classifying multimedia resources.

Background

In recent years, with the rapid development of software and hardware, Internet content has gradually shifted from an era of images and text to an era of multimedia (for example, video). Every day a large number of multimedia resources (for example, UGC (User Generated Content) videos) are produced, distributed, and consumed, which puts great pressure on production systems. To keep up with this explosive growth, multimedia resource classification technology is used to automatically label each multimedia resource according to an established tag system, thereby easing the pressure of reviewing and distributing multimedia resources.

In the related art, content features are extracted from the multimedia resource itself and used for classification. For multimedia resources in certain specific fields, however, this classification does not take background knowledge into account. For example, for videos in fields such as medicine and economics, classifying a video solely from features extracted from its content yields poor results.

Summary of the Invention

The present application provides a method, device, electronic device, and storage medium for classifying multimedia resources.

According to one aspect of the present application, a method for classifying multimedia resources is provided, including: acquiring a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource; determining multiple entities in the multimedia resource and the relationships between the multiple entities; determining a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between them; and determining a category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

According to another aspect of the present application, a device for classifying multimedia resources is provided, including: an acquisition module, configured to acquire a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource; a first determination module, configured to determine multiple entities in the multimedia resource and the relationships between the multiple entities; a second determination module, configured to determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between them; and a third determination module, configured to determine a category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

According to another aspect of the present application, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for classifying multimedia resources described above.

According to another aspect of the present application, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method for classifying multimedia resources described above.

According to another aspect of the present application, a computer program product is provided, including a computer program which, when executed by a processor, implements the method for classifying multimedia resources described above.

It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present application, nor to limit the scope of the present application. Other features of the present application will become readily understood from the following description.

Brief Description of the Drawings

The accompanying drawings are provided for a better understanding of the present solution and do not limit the present application. In the drawings:

FIG. 1 is a schematic diagram according to a first embodiment of the present application;

FIG. 2 is a schematic diagram according to a second embodiment of the present application;

FIG. 3 is a schematic flowchart of determining the knowledge representation vector corresponding to a multimedia resource according to an embodiment of the present application;

FIG. 4 is a schematic diagram according to a third embodiment of the present application;

FIG. 5 is a schematic flowchart of determining the knowledge representation vector corresponding to a multimedia resource according to an embodiment of the present application;

FIG. 6 is a schematic diagram according to a fourth embodiment of the present application;

FIG. 7 is a schematic diagram of determining the weights of the various media resources in a multimedia resource according to an embodiment of the present application;

FIG. 8 is a schematic diagram according to a fifth embodiment of the present application;

FIG. 9 is a schematic diagram according to a sixth embodiment of the present application;

FIG. 10 is a schematic diagram of the input of a Transformer according to an embodiment of the present application;

FIG. 11 is a schematic diagram according to a seventh embodiment of the present application;

FIG. 12 is a schematic diagram of the classification of multimedia resources according to an embodiment of the present application;

FIG. 13 is a schematic diagram according to an eighth embodiment of the present application;

FIG. 14 is a block diagram of an electronic device used to implement the method for classifying multimedia resources according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.

In recent years, with the rapid development of software and hardware, Internet content has gradually shifted from an era of images and text to an era of multimedia (for example, video). Every day a large number of multimedia resources (for example, UGC (User Generated Content) videos) are produced, distributed, and consumed, which puts great pressure on production systems. To keep up with this explosive growth, multimedia resource classification technology is used to automatically label each original multimedia resource according to an established tag system, thereby easing the pressure of reviewing and distributing multimedia resources.

In the related art, taking video as an example of a multimedia resource, three approaches are mainly used for classification: (1) manual labeling, in which people tag videos by hand; (2) video classification algorithms based on key frames; and (3) video classification algorithms based on multimodal fusion. These approaches suit different scenarios and have different shortcomings. Approach (1) is time-consuming, and its labor cost becomes extremely high as the volume of video grows. Approach (2) uses only the key-frame information in the video and lacks supplementary information from other modalities, which easily leads to a one-sided understanding of the video content. Approach (3) uses information from multiple modalities of the video, but cannot form an effective understanding of content in video domains that require background knowledge.

In view of the above problems, the present application provides a method, device, electronic device, and storage medium for classifying multimedia resources.

FIG. 1 is a schematic diagram according to the first embodiment of the present application. It should be noted that the method for classifying multimedia resources of the embodiments of the present application can be applied to the device for classifying multimedia resources of the embodiments of the present application, and that device can be configured in an electronic device. The electronic device may be a mobile terminal, for example a mobile phone, a tablet computer, a personal digital assistant, or another hardware device running any of various operating systems.

As shown in FIG. 1, the method for classifying multimedia resources may include:

Step 101: Acquire a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource.

In the embodiments of the present application, a resource that includes multiple media such as video, audio, and text may be treated as a multimedia resource. The multimedia resource to be processed may be acquired by downloading from a network, by user upload, or in a similar manner.

To understand the content of the multimedia resource from multiple perspectives and improve the classification result, the multimedia representation vector corresponding to the multimedia resource to be processed can be acquired. In one example, each media resource in the multimedia resource is represented as a vector, and the multimedia representation vector is determined according to the weight corresponding to each media resource. In another example, each media resource in the multimedia resource is represented as a vector, and a weighted representation vector of the multimedia resource is determined according to the weight corresponding to each media resource; feature extraction is then performed on the multimedia resource to obtain its feature vector sequence, and the multimedia representation vector is obtained from the weighted representation vector and the feature vector sequence.

Step 102: Determine multiple entities in the multimedia resource and the relationships between the multiple entities.

It can be understood that the multimedia resource to be processed may include multiple media resources such as video, audio, and text, and that the video, audio, and text may contain entities of different kinds, such as people and things.

In the embodiments of the present application, the entities in the multimedia resource can be extracted to determine the multiple entities it contains. Once the entities are determined, the relationships between them can be determined according to a corresponding entity knowledge graph.

Step 103: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between the multiple entities.

Optionally, after the multiple entities in the multimedia resource and the relationships between them are determined, a knowledge graph can be constructed from those entities and relationships, and each node in the knowledge graph can be encoded, so that the knowledge representation vector corresponding to the multimedia resource can be determined.

Step 104: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

In the embodiments of the present application, the multimedia representation vector and the knowledge representation vector can be fused, and the category of the multimedia resource can be determined from the fused result.
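As an illustrative, non-limiting sketch of step 104, the fusion may be pictured as concatenating the two vectors and scoring the result against each category. The concatenation, the linear scoring, the softmax, and the names `classify`, `class_weights`, and the two category labels are all assumptions made for illustration; the text above only states that the vectors are fused and that the category is determined from the fused result.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(multimedia_vec, knowledge_vec, class_weights, class_names):
    """Fuse by concatenation (an assumption), then apply a linear classifier."""
    fused = list(multimedia_vec) + list(knowledge_vec)
    scores = [sum(w * x for w, x in zip(row, fused)) for row in class_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs


# Toy inputs: 2-d multimedia vector, 2-d knowledge vector, 2 hypothetical categories.
multimedia_vec = [0.2, 0.7]
knowledge_vec = [0.9, 0.1]
class_weights = [
    [1.0, 0.0, 1.0, 0.0],  # hypothetical weights for "medicine"
    [0.0, 1.0, 0.0, 1.0],  # hypothetical weights for "finance"
]
label, probs = classify(multimedia_vec, knowledge_vec, class_weights,
                        ["medicine", "finance"])
print(label, probs)
```

In a trained system the weights would be learned; fixed values are used here only so that the example is self-contained.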

In summary, the multimedia resource and its corresponding multimedia representation vector are acquired, the knowledge representation vector corresponding to the multimedia resource is determined according to the multiple entities in the multimedia resource and the relationships between them, and the category of the multimedia resource is then determined according to the multimedia representation vector and the knowledge representation vector. Knowledge is thereby used to deepen the understanding of the multimedia resource, which improves the classification result.

To determine the relationships between the multiple entities in the multimedia resource more accurately, FIG. 2 shows a schematic diagram according to the second embodiment of the present application. In this embodiment, multiple entities can be extracted from the multimedia resource, and the relationships between them can then be determined according to an entity knowledge graph. The steps of the embodiment shown in FIG. 2 are as follows:

Step 201: Acquire a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource.

Step 202: Extract multiple entities from the multimedia resource.

In the embodiments of the present application, the multimedia resource may include multiple entities, which can be acquired in one or more of the following ways: performing face detection on each frame image of the video in the multimedia resource to extract entities; performing optical character recognition on each frame image of the video in the multimedia resource to extract entities; and performing named entity recognition on the text in the multimedia resource to extract the entities in the text.

In other words, to acquire the entities in the multimedia resource more accurately, entities in different media of the multimedia resource are acquired in different ways. For example, face detection can be used to identify celebrity entities appearing in each frame image of the video; Optical Character Recognition (OCR) can be used to recognize subtitles appearing in each frame image of the video, so as to obtain program names or other valid entities; and Named Entity Recognition (NER) can be used to identify entity concepts contained in text of the multimedia resource such as the video title.

Step 203: Query the entity knowledge graph according to the multiple entities to determine the relationships between the multiple entities.

To determine the relationships between the multiple entities more accurately, after the entities are extracted from the multimedia resource, a preset entity knowledge graph can be queried for those entities, thereby determining the relationships between them.
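The query in step 203 might be sketched as follows, assuming the preset entity knowledge graph is available as a set of (head, relation, tail) triples. The triple store, the entity names, the relation names, and the function `relations_between` are hypothetical and serve only to illustrate the lookup; the source does not specify how the graph is stored or queried.

```python
# Hypothetical preset entity knowledge graph as (head, relation, tail) triples.
KNOWLEDGE_GRAPH = {
    ("Actor A", "stars_in", "Show X"),
    ("Show X", "broadcast_by", "Channel Y"),
    ("Actor B", "stars_in", "Show Z"),
}


def relations_between(entities, graph):
    """Keep only the triples whose head and tail were both extracted."""
    wanted = set(entities)
    return sorted(
        (h, r, t) for (h, r, t) in graph
        if h in wanted and t in wanted and h != t
    )


# Entities assumed to come from face detection, OCR, and NER respectively.
extracted = ["Actor A", "Show X", "Channel Y"]
triples = relations_between(extracted, KNOWLEDGE_GRAPH)
for triple in triples:
    print(triple)
```

The resulting triples form the small, resource-specific knowledge graph whose nodes are the extracted entities and whose edges are their relationships.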

Step 204: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between the multiple entities.

For example, as shown in FIG. 3, taking video as the multimedia resource, multiple entities in the video can be acquired through face detection, OCR, and NER. A knowledge graph can be constructed from these entities and the relationships between them, and each node in the knowledge graph can be encoded, so that the knowledge representation vector corresponding to the multimedia resource can be determined.

Step 205: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

In this embodiment, steps 201 and 205 can each be implemented in any of the ways described in the embodiments of the present application; this embodiment places no limitation on them, and they are not described again here.

To determine the knowledge representation vector accurately, understand the multimedia resource more deeply, and improve the classification result, FIG. 4 shows a schematic diagram according to the third embodiment of the present application. In this embodiment, a representation vector of each entity can be determined according to the multiple entities in the multimedia resource and the relationships between them, and the knowledge representation vector can then be determined from the representation vectors of the entities. The steps of the embodiment shown in FIG. 4 are as follows:

Step 401: Acquire a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource.

Step 402: Determine multiple entities in the multimedia resource and the relationships between the multiple entities.

Step 403: Determine a representation vector of each entity according to the multiple entities and the relationships between the multiple entities.

In the embodiments of the present application, a knowledge graph can be formed from the multiple entities in the multimedia resource and the relationships between them. For example, each node in the knowledge graph is an entity in the multimedia resource, and each edge is a relationship between entities in the multimedia resource. Each node in the knowledge graph is then encoded with a knowledge graph embedding learning algorithm to obtain the representation vector of each entity. The knowledge graph embedding learning algorithm may include, but is not limited to, TransE and TransH.
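For orientation, the scoring idea behind TransE can be sketched as below: a triple (head, relation, tail) is considered plausible when head + relation lies close to tail, and training pushes true triples below corrupted ones by a margin. The embeddings, the relation vector, and the margin value are made-up toy numbers; this sketch shows only the score and loss, not a full training loop and not necessarily the configuration used in the embodiment.

```python
import math


def transe_score(head, relation, tail):
    """L2 distance ||h + r - t||; a lower score means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def margin_loss(positive_score, negative_score, margin=1.0):
    """Margin ranking loss used to separate true triples from corrupted ones."""
    return max(0.0, margin + positive_score - negative_score)


# Toy 2-d embeddings (hypothetical values).
embeddings = {
    "Actor A": [0.1, 0.2],
    "Show X": [0.6, 0.4],
    "Channel Y": [0.0, 0.9],
}
stars_in = [0.5, 0.2]

pos = transe_score(embeddings["Actor A"], stars_in, embeddings["Show X"])
neg = transe_score(embeddings["Actor A"], stars_in, embeddings["Channel Y"])
print(pos, neg, margin_loss(pos, neg))
```

After training, the vector stored for each entity serves as that entity's representation vector in step 403.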

Step 404: Determine the knowledge representation vector according to the representation vector of each entity.

As an example, a graph neural network algorithm based on graph collapse can be used to obtain a global representation of the knowledge graph formed from the multiple entities in the multimedia resource and the relationships between them. As shown in FIG. 5, a GNN (Graph Neural Network) learns over the nodes of the knowledge graph to obtain the probability that each node belongs to each node cluster. The knowledge graph (the "Original graph" in FIG. 5) is collapsed according to this probability distribution, which reduces the number of clusters. This process is repeated until a single final super node is obtained. This super node is the global representation of the knowledge graph, that is, the knowledge representation vector corresponding to the multimedia resource.
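One way to picture a single collapse step is soft-assignment pooling: given node features X, adjacency A, and a matrix S whose row for each node holds its cluster probabilities, the collapsed graph has features S^T X and adjacency S^T A S. This form is an assumption about what "collapsing according to the probability distribution" means here. In the sketch below the assignment matrices are fixed toy values, whereas in the embodiment a GNN would produce them.

```python
def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def collapse(features, adjacency, assignment):
    """One collapse step: X' = S^T X and A' = S^T A S."""
    s_t = transpose(assignment)
    pooled_features = matmul(s_t, features)
    pooled_adjacency = matmul(matmul(s_t, adjacency), assignment)
    return pooled_features, pooled_adjacency


# Four entity nodes with toy 2-d embeddings, connected in a chain.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
A = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

# Hypothetical cluster probabilities: 4 nodes -> 2 clusters, then 2 -> 1.
S1 = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
S2 = [[1.0], [1.0]]

X1, A1 = collapse(X, A, S1)
X2, A2 = collapse(X1, A1, S2)
knowledge_vector = X2[0]  # the single remaining super node
print(knowledge_vector)
```

Repeating the step until one row remains mirrors the repeated collapse described above, with the last row playing the role of the super node.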

Step 405: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

In this embodiment, steps 401-402 and 405 can each be implemented in any of the ways described in the embodiments of the present application; this embodiment places no limitation on them, and they are not described again here.

To understand the content of the multimedia resource from multiple perspectives and improve the classification result, FIG. 6 shows a schematic diagram according to the fourth embodiment of the present application. In this embodiment, the multimedia representation vector can be determined according to the representation vectors and weights of the various media resources in the multimedia resource. The steps of the embodiment shown in FIG. 6 are as follows:

Step 601: Acquire a multimedia resource.

Optionally, the multimedia resource can be acquired by downloading from a network, by user upload, or in a similar manner.

Step 602: Determine the representation vectors and weights of the various media resources in the multimedia resource.

It can be understood that the multimedia resource may include, but is not limited to, resources of multiple media such as video, audio, and text. A video can be decomposed into multiple consecutive single-frame pictures; for example, the video can be split into frames to obtain consecutive pictures, each frame corresponds to one vector, and the video therefore corresponds to a sequence of vectors. Audio can be decomposed into multiple consecutive segments of equal duration, each corresponding to one vector, so that the audio corresponds to a sequence of vectors; for example, 5 seconds of audio can be decomposed into 5 consecutive 1-second segments. Text can be decomposed into multiple characters, each corresponding to one vector, so that the text corresponds to a sequence of vectors. In this way, the sequence of vectors corresponding to each media resource in the multimedia resource can be obtained.

As an example, the sequence of vectors corresponding to each media resource in the multimedia resource can be pooled to obtain the representation vector of that media resource. For example, the sequences corresponding to the video, audio, and text can each be pooled to obtain a video representation vector, an audio representation vector, and a text representation vector.
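The decomposition and pooling described above might look like the following sketch. Mean pooling is an assumption (the text says only that the sequences are pooled), and the helper names `split_audio` and `mean_pool`, the toy sample rate, and the frame vectors are hypothetical.

```python
def split_audio(samples, sample_rate, seconds_per_segment=1):
    """Cut raw audio samples into consecutive segments of equal duration."""
    step = sample_rate * seconds_per_segment
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def mean_pool(sequence):
    """Average a sequence of equal-length vectors into one representation vector."""
    count = len(sequence)
    return [sum(column) / count for column in zip(*sequence)]


# 5 seconds of audio at a toy rate of 8 samples per second -> five 1-second segments.
segments = split_audio(list(range(40)), sample_rate=8)

# Three video frames, each already encoded as a toy 2-d vector.
frame_vectors = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
video_vector = mean_pool(frame_vectors)
print(len(segments), video_vector)
```

The same pooling would be applied to the audio and text sequences to obtain the audio and text representation vectors.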

In the embodiments of the present application, after the representation vectors of the various media resources are obtained, the weights of the various media resources can be obtained according to those representation vectors and a preset context vector.

For example, as shown in FIG. 7, the similarity between the preset context vector and the representation vector of each media resource can be calculated according to formula (1), and the similarities can be normalized according to formula (2) to obtain the weight of each media resource.

Here, E_c denotes the preset context vector; E_i denotes the representation vectors of the various media resources; i denotes the number of media resources in the multimedia resource; E_t denotes the representation vector of one media resource (for example, E_t may be the video representation vector, the audio representation vector, or the text representation vector); and W denotes a preset value, the transformation matrix applied to the representation vector of a media resource.
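Formulas (1) and (2) are not reproduced in this text. Purely as an assumption, one common form consistent with the description is a similarity s_t = E_c · (W E_t) followed by softmax normalization a_t = exp(s_t) / Σ_i exp(s_i). The sketch below implements that assumed form; the function name `modality_weights` and all numeric values are hypothetical, and the actual formulas in the patent may differ.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    return [dot(row, vec) for row in matrix]


def modality_weights(context, representations, transform):
    """Assumed formula (1): s_t = E_c . (W E_t); assumed formula (2): softmax."""
    sims = {name: dot(context, matvec(transform, vec))
            for name, vec in representations.items()}
    peak = max(sims.values())
    exps = {name: math.exp(s - peak) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


E_c = [1.0, 0.0]                   # preset context vector (toy value)
W = [[1.0, 0.0], [0.0, 1.0]]       # transformation matrix (identity for clarity)
reps = {"video": [2.0, 0.0], "audio": [1.0, 0.0], "text": [0.0, 3.0]}

weights = modality_weights(E_c, reps, W)
print(weights)
```

The media resource whose transformed representation is most similar to the context vector receives the largest weight, and the weights sum to 1.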

Step 603: Perform weighted summation on the representation vectors of the various media resources using their weights to obtain a weighted representation vector.

To understand the content of the multimedia resource from multiple perspectives and improve the classification result, the representation vectors of the various media resources can be summed, each scaled by its weight, to obtain the weighted representation vector.

For example, when the multimedia resource includes video, audio, and text, the representation vectors of the media resources can be denoted E_video (video representation vector), E_audio (audio representation vector), and E_text (text representation vector), and their weights can be denoted a_v (video weight), a_a (audio weight), and a_t (text weight). The weighted representation vector can then be expressed as: weighted representation vector = a_v E_video + a_a E_audio + a_t E_text.
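The weighted sum a_v E_video + a_a E_audio + a_t E_text can be written out directly. The vectors and weights below are toy values chosen only for illustration, and the function name `weighted_sum` is hypothetical.

```python
def weighted_sum(representations, weights):
    """Compute sum over media of weight * representation vector, element-wise."""
    dim = len(next(iter(representations.values())))
    result = [0.0] * dim
    for name, vec in representations.items():
        for i, value in enumerate(vec):
            result[i] += weights[name] * value
    return result


reps = {
    "video": [1.0, 0.0],  # E_video
    "audio": [0.0, 1.0],  # E_audio
    "text": [1.0, 1.0],   # E_text
}
weights = {"video": 0.5, "audio": 0.3, "text": 0.2}  # a_v, a_a, a_t

weighted_vector = weighted_sum(reps, weights)
print(weighted_vector)
```

With these toy values the first component is 0.5 + 0.2 and the second is 0.3 + 0.2, so the result is approximately [0.7, 0.5].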

步骤604，将加权表示向量，作为多媒体表示向量。Step 604: Use the weighted representation vector as a multimedia representation vector.

进一步地，将根据各种媒体资源的表示向量以及权重进行加权求和处理，得到的加权表示向量作为多媒体表示向量。Furthermore, a weighted summation process is performed according to the representation vectors and weights of various media resources, and the obtained weighted representation vector is used as the multimedia representation vector.

步骤605，确定多媒体资源中的多个实体以及多个实体之间的关系。Step 605: Determine multiple entities in the multimedia resource and the relationship between the multiple entities.

步骤606，根据多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量。Step 606: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between the multiple entities.

步骤607，根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。Step 607: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

在本申请实施例中，步骤605-607可以分别采用本申请的各实施例中的任一种方式实现，本申请实施例并不对此作出限定，也不再赘述。In the embodiment of the present application, steps 605-607 can be implemented in any of the embodiments of the present application respectively. The embodiment of the present application does not limit this and will not be described in detail.

To understand the content of the multimedia resource more deeply and from multiple perspectives, and thereby further improve the classification effect, Figure 8 shows a schematic diagram according to the fifth embodiment of the present application. In this embodiment, after the weighted representation vector of the multimedia resource is determined from the representation vectors and weights of its media resources, a feature vector sequence of the multimedia resource is extracted, the feature vectors in that sequence are processed to obtain a processed feature vector sequence, and the multimedia representation vector is obtained from the processed feature vector sequence. The embodiment shown in Figure 8 includes the following steps:

步骤801，获取多媒体资源，并根据多媒体资源中各种媒体资源的表示向量以及权重，确定多媒体资源的加权表示向量。Step 801: Acquire multimedia resources, and determine a weighted representation vector of the multimedia resources according to representation vectors and weights of various media resources in the multimedia resources.

步骤802，提取多媒体资源的特征向量序列，并在特征向量序列中的每个特征向量上加上加权表示向量，得到处理后特征向量序列。Step 802: extract a feature vector sequence of multimedia resources, and add a weighted representation vector to each feature vector in the feature vector sequence to obtain a processed feature vector sequence.

As an example, feature extraction may be performed on each media resource in the multimedia resource to obtain several kinds of feature vector subsequences for that media resource. These subsequences are combined into the feature vector sequence of the media resource, and the feature vector sequences of all media resources are then concatenated to obtain the feature vector sequence of the multimedia resource. The features may include, but are not limited to, information features, type features, and position features of the media resources.

进而，将多媒体的特征向量序列中的每个特征向量上加上加权表示向量，相加结果作为处理后的特征向量序列。Furthermore, a weighted representation vector is added to each feature vector in the feature vector sequence of the multimedia, and the addition result is used as the processed feature vector sequence.
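The addition in step 802 amounts to adding the same vector to every element of a sequence. A minimal sketch, assuming the feature vectors and the weighted representation vector share one dimension (the function name `add_to_each` and the numbers are illustrative only):

```python
def add_to_each(sequence, offset):
    """Add the vector `offset` to every feature vector in `sequence`."""
    if any(len(vec) != len(offset) for vec in sequence):
        raise ValueError("feature vectors and offset must have the same dimension")
    return [[x + o for x, o in zip(vec, offset)] for vec in sequence]


feature_sequence = [[1.0, 2.0], [3.0, 4.0]]   # assumed toy feature vectors
weighted_vector = [0.5, -1.0]                 # assumed weighted representation vector

processed_sequence = add_to_each(feature_sequence, weighted_vector)
print(processed_sequence)
```

The sequence length is unchanged; only the content of each feature vector is shifted by the weighted representation vector.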

步骤803，对处理后特征向量序列进行语义编码，得到多媒体表示向量。Step 803: semantically encode the processed feature vector sequence to obtain a multimedia representation vector.

作为一种示例，可对处理后的特征向量序列进行池化，获取多媒体表示向量。As an example, the processed feature vector sequence may be pooled to obtain a multimedia representation vector.
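The text does not say which pooling operation is used. As a hedged illustration, the sketch below shows two common choices, mean pooling and max pooling, that reduce a sequence of vectors to a single vector; the function names and numbers are assumptions made for this example.

```python
def mean_pool(sequence):
    """Average the vectors of a sequence dimension by dimension."""
    count = len(sequence)
    return [sum(column) / count for column in zip(*sequence)]


def max_pool(sequence):
    """Take the per-dimension maximum over the vectors of a sequence."""
    return [max(column) for column in zip(*sequence)]


encoded_sequence = [[1.0, 4.0], [3.0, 2.0]]  # assumed output of the semantic encoder

mean_vector = mean_pool(encoded_sequence)
max_vector = max_pool(encoded_sequence)
print(mean_vector, max_vector)
```

Either result has the dimension of a single feature vector, so it can serve as a fixed-size multimedia representation vector regardless of sequence length.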

步骤804，确定多媒体资源中的多个实体以及多个实体之间的关系。Step 804: determine multiple entities in the multimedia resource and the relationship between the multiple entities.

步骤805，根据多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量。Step 805: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between the multiple entities.

步骤806，根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。Step 806: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

在本申请实施例中，步骤801、804-806可以分别采用本申请的各实施例中的任一种方式实现，本申请实施例并不对此作出限定，也不再赘述。In the embodiment of the present application, steps 801, 804-806 can be implemented in any way in the embodiments of the present application respectively. The embodiment of the present application does not limit this and will not be repeated.

综上，通过获取多媒体资源以及多媒体资源对应的多媒体表示向量，以及根据多媒体资源中的多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量，进而根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。由此，利用知识增强了对多媒体资源的深层次理解，提高了多媒体资源的分类效果。In summary, by obtaining multimedia resources and the multimedia representation vectors corresponding to the multimedia resources, and according to multiple entities in the multimedia resources and the relationship between the multiple entities, the knowledge representation vector corresponding to the multimedia resources is determined, and then the category of the multimedia resources is determined according to the multimedia representation vector and the knowledge representation vector. Thus, the use of knowledge enhances the deep understanding of multimedia resources and improves the classification effect of multimedia resources.

To make it easier to extract the feature vector sequence of the multimedia resource, and so to understand its content in depth from multiple perspectives and further improve the classification effect, Figure 9 shows, building on Figure 8, a schematic diagram according to the sixth embodiment of the present application. In this embodiment, feature extraction is performed on each media resource to obtain several kinds of feature vector subsequences; these are combined into the feature vector sequence of that media resource, and the feature vector sequences of all media resources are then concatenated to obtain the feature vector sequence of the multimedia resource. The embodiment shown in Figure 9 includes the following steps:

步骤901，获取多媒体资源，并根据多媒体资源中各种媒体资源的表示向量以及权重，确定多媒体资源的加权表示向量。Step 901: Acquire multimedia resources, and determine a weighted representation vector of the multimedia resources according to representation vectors and weights of various media resources in the multimedia resources.

步骤902，针对多媒体资源中的每种媒体资源，对媒体资源进行多种特征提取，得到媒体资源的多种特征向量子序列。Step 902: For each media resource in the multimedia resources, multiple features are extracted from the media resource to obtain multiple feature vector subsequences of the media resource.

在本申请实施例中，多媒体资源的特征可包括但不限于多种媒体资源信息特征、多种媒体资源的类型特征、多种媒体资源的位置特征等。In the embodiments of the present application, the characteristics of multimedia resources may include, but are not limited to, various media resource information characteristics, various media resource type characteristics, various media resource location characteristics, and the like.

As an example, the information features can be obtained from each kind of media resource. For video, a convolutional neural network can be used to obtain image features from each frame of the video in the multimedia resource. For audio, because audio can be treated as an image after signal conversion, a convolutional neural network can likewise be used for feature extraction. For text, after the text is tokenized, the corresponding features can be obtained with a pre-trained language model such as BERT or ERNIE.

From the image features of each video frame, the image features of each second of audio, and the text features of each character of text, the corresponding feature vector subsequences can be obtained. To tell these subsequences apart, the subsequence of each kind of media is marked with different data, and max pooling is applied to the marked subsequences to obtain the information feature vector subsequence of each media resource. Figure 10, a schematic diagram of the input of the Transformer according to an embodiment of the present application, shows the feature vector subsequences corresponding to the video, the audio, and the text. The numbers of feature vectors in the video, audio, and text subsequences may be the same or different.

为了对多种媒体资源信息特征进行区分，可根据多媒体资源中的各种媒体资源类型设置多种媒体资源的类型特征向量子序列，如图10所示，视频对应的类型特征向量子序列可表示为[M_v,...,M_v]，音频对应的类型特征向量子序列可表示为[M_a,...,M_a]，文本对应的类型特征向量子序列可表示为[M_t,...,M_t]。In order to distinguish the information features of multiple media resources, type feature vector subsequences of multiple media resources can be set according to various media resource types in the multimedia resources. As shown in Figure 10, the type feature vector subsequence corresponding to video can be expressed as [M _v ,...,M _v ], the type feature vector subsequence corresponding to audio can be expressed as [M _a ,...,M _a ], and the type feature vector subsequence corresponding to text can be expressed as [M _t ,...,M _t ].

To determine the positional relationships within the information feature vector subsequences, cosine encoding can be used to provide a relative position vector for each position. The relative position vectors are then aggregated, and the aggregated vector is position-encoded, which yields the position feature vector subsequences of the media resources. As shown in Figure 10, the position feature vector subsequence for video can be written as [P_agg, P_1, ..., P_t], for audio as [P_agg, P_1, ..., P_k], and for text as [P_agg, P_1, ..., P_j].
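The text does not spell out the encoding formula. One plausible reading, offered here purely as an assumption, is the sinusoidal position encoding commonly used with Transformer models, in which even dimensions use a sine and odd dimensions use a cosine of a position-dependent angle. The sketch below computes such vectors for P_1, P_2, ...; the aggregated entry P_agg is left out because the text does not describe how it is formed.

```python
import math


def sinusoidal_position_vector(position, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    vector = []
    for k in range(dim):
        # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / dim).
        angle = position / (10000 ** (((k // 2) * 2) / dim))
        vector.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vector


# Position vectors for a subsequence of three positions, dimension 4 (assumed sizes).
positions = [sinusoidal_position_vector(p, 4) for p in range(3)]
for vec in positions:
    print([round(x, 4) for x in vec])
```

Because each position maps to a distinct pattern of sines and cosines, adding these vectors to the content features lets a downstream encoder distinguish where in the subsequence each feature vector sits.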

步骤903，对媒体资源的多种特征向量子序列进行加和处理，得到媒体资源的特征向量序列。Step 903: Add up the multiple feature vector subsequences of the media resource to obtain the feature vector sequence of the media resource.

进一步地，可将媒体资源的多种特征向量子序列进行相加，将相加结果作为媒体资源的特征向量序列。Furthermore, multiple feature vector subsequences of the media resource may be added together, and the addition result may be used as the feature vector sequence of the media resource.

比如，可将视频对应的特征向量子序列、视频对应的类型特征向量子序列与视频对应的位置特征向量子序列相加，将相加结果作为视频的特征向量序列。For example, the feature vector subsequence corresponding to the video, the type feature vector subsequence corresponding to the video, and the position feature vector subsequence corresponding to the video may be added, and the addition result may be used as the feature vector sequence of the video.

又比如，可将音频对应的特征向量子序列、音频对应的类型特征向量子序列与音频对应的位置特征向量子序列相加，将相加结果作为音频的特征向量序列。For another example, the feature vector subsequence corresponding to the audio, the type feature vector subsequence corresponding to the audio, and the position feature vector subsequence corresponding to the audio may be added, and the addition result may be used as the feature vector sequence of the audio.

再比如，可将文本对应的特征向量子序列、文本对应的类型特征向量子序列与文本对应的位置特征向量子序列相加，将相加结果作为文本的特征向量序列。For another example, the feature vector subsequence corresponding to the text, the type feature vector subsequence corresponding to the text, and the position feature vector subsequence corresponding to the text may be added together, and the addition result may be used as the feature vector sequence of the text.

步骤904，对各种媒体资源的特征向量序列进行拼接，得到多媒体资源的特征向量序列。Step 904: concatenate the feature vector sequences of various media resources to obtain a feature vector sequence of multimedia resources.

For example, the feature vector sequence of the video, the feature vector sequence of the audio, and the feature vector sequence of the text may be concatenated, and the concatenation result used as the feature vector sequence of the multimedia resource.
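Steps 903 and 904 can be sketched together: the three subsequences of one media resource are added position by position, and the resulting per-media sequences are then joined end to end. The helper names and all numeric values below are assumptions made for illustration; the sketch also assumes that the three subsequences of a media resource have the same length and dimension.

```python
def add_subsequences(info, type_vecs, position):
    """Element-wise sum of information, type, and position subsequences."""
    if not (len(info) == len(type_vecs) == len(position)):
        raise ValueError("subsequences of one media resource must have equal length")
    return [
        [a + b + c for a, b, c in zip(i_vec, t_vec, p_vec)]
        for i_vec, t_vec, p_vec in zip(info, type_vecs, position)
    ]


def concatenate(sequences):
    """Join several feature vector sequences into one longer sequence."""
    joined = []
    for seq in sequences:
        joined.extend(seq)
    return joined


M_v = [0.5, 0.5]     # assumed type vector for video, repeated at every position
M_t = [-1.0, -1.0]   # assumed type vector for text

video_seq = add_subsequences(
    info=[[1.0, 1.0], [2.0, 2.0]],
    type_vecs=[M_v, M_v],
    position=[[0.0, 1.0], [1.0, 0.0]],
)
text_seq = add_subsequences(
    info=[[4.0, 4.0]],
    type_vecs=[M_t],
    position=[[0.0, 1.0]],
)

multimedia_seq = concatenate([video_seq, text_seq])
print(multimedia_seq)
```

Addition keeps the length of each per-media sequence, whereas concatenation makes the final sequence as long as all per-media sequences combined.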

步骤905，在特征向量序列中的每个特征向量上加上加权表示向量，得到处理后特征向量序列。Step 905: Add a weighted representation vector to each feature vector in the feature vector sequence to obtain a processed feature vector sequence.

步骤906，对处理后特征向量序列进行语义编码，得到多媒体表示向量。Step 906: semantically encode the processed feature vector sequence to obtain a multimedia representation vector.

步骤907，确定多媒体资源中的多个实体以及所述多个实体之间的关系。Step 907: Determine multiple entities in the multimedia resource and the relationship between the multiple entities.

步骤908，根据多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量。Step 908: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationships between the multiple entities.

步骤909，根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。Step 909: Determine the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector.

在本申请实施例中，步骤901、905-909可以分别采用本申请的各实施例中的任一种方式实现，本申请实施例并不对此作出限定，也不再赘述。In the embodiment of the present application, steps 901, 905-909 can be implemented in any way in the embodiments of the present application respectively. The embodiment of the present application does not limit this and will not be repeated.

为了可以自动地对多媒体资源进行分类，满足多媒体资源分类的时效性，减轻多媒体资源审核分发的压力，如图11所示，图11是根据本申请第七实施例的示意图。在本申请实施例中，利用知识增强对多媒体资源的深层次理解，确定多媒体资源的类型，图11所示实施例包括如下步骤：In order to automatically classify multimedia resources, meet the timeliness of multimedia resource classification, and reduce the pressure of multimedia resource review and distribution, as shown in Figure 11, Figure 11 is a schematic diagram according to the seventh embodiment of the present application. In the embodiment of the present application, knowledge is used to enhance the deep understanding of multimedia resources and determine the type of multimedia resources. The embodiment shown in Figure 11 includes the following steps:

步骤1101，获取待处理的多媒体资源以及多媒体资源对应的多媒体表示向量。Step 1101: Acquire the multimedia resources to be processed and the multimedia representation vectors corresponding to the multimedia resources.

步骤1102，确定多媒体资源中的多个实体以及多个实体之间的关系。Step 1102: Determine multiple entities in the multimedia resource and the relationship between the multiple entities.

步骤1103，根据多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量。Step 1103: Determine a knowledge representation vector corresponding to the multimedia resource according to the multiple entities and the relationship between the multiple entities.

Step 1104: Determine the weights of the multimedia representation vector and the knowledge representation vector.

在本申请实施例中，多媒体表示向量的权重可设置为1，知识表示向量的权重可根据如下公式进行获取：In the embodiment of the present application, the weight of the multimedia representation vector can be set to 1, and the weight of the knowledge representation vector can be obtained according to the following formula:

W_gate = sigmoid(W[K_trans, H_kno] + b) (3)

Here, W_gate denotes the weight of the knowledge representation vector, K_trans denotes the multimedia representation vector, H_kno denotes the knowledge representation vector, and b denotes a preset offset.

Step 1105: Weight the multimedia representation vector and the knowledge representation vector according to the weights to obtain a fused representation vector.

比如，可通过如下公式获取融合表示向量：For example, the fusion representation vector can be obtained by the following formula:

H = K_trans + W_gate·H_kno (4)

Here, H denotes the fused representation vector, K_trans denotes the multimedia representation vector, H_kno denotes the knowledge representation vector, and W_gate denotes the weight of the knowledge representation vector.
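Formulas (3) and (4) can be sketched as follows. The shapes are assumptions: the text does not state whether W_gate is a scalar or a vector, so this sketch treats it as a per-dimension gate, with W a d × 2d matrix acting on the concatenation of K_trans and H_kno. The function name `gated_fusion` and the numeric values are illustrative only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(k_trans, h_kno, W, b):
    """Compute W_gate = sigmoid(W [K_trans, H_kno] + b) and H = K_trans + W_gate * H_kno."""
    concatenated = k_trans + h_kno  # list concatenation stands in for [K_trans, H_kno]
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concatenated)) + bias)
        for row, bias in zip(W, b)
    ]
    fused = [k + g * h for k, g, h in zip(k_trans, gate, h_kno)]
    return gate, fused


K_trans = [1.0, 2.0]            # assumed multimedia representation vector
H_kno = [2.0, 4.0]              # assumed knowledge representation vector
W = [[0.0, 0.0, 0.0, 0.0],      # assumed 2 x 4 gate matrix (all zeros for clarity)
     [0.0, 0.0, 0.0, 0.0]]
b = [0.0, 0.0]                  # assumed offset

W_gate, H = gated_fusion(K_trans, H_kno, W, b)
print(W_gate, H)
```

With zero weights and offset the gate is 0.5 in every dimension, so half of the knowledge vector is added to the multimedia vector. In a trained model the gate would vary per input, letting the model decide how much knowledge to mix in.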

步骤1106，根据融合表示向量，确定多媒体资源的类型。Step 1106: Determine the type of multimedia resource according to the fusion representation vector.

Optionally, an intermediate vector for predicting the category at each level is determined from the fused representation vector. For each level, the category of the multimedia resource at that level is predicted from the intermediate vector of that level together with the intermediate vectors of the higher levels, where a higher level is any level above the level being predicted.

To determine the type of a multimedia resource more accurately, the type can be represented by labels, and these labels have a hierarchical structure. Taking a video as an example, its first-level label may be animation, film and television, and so on. Under the first-level label animation, the second-level labels may be domestic animation and foreign animation; under the first-level label film and television, the second-level labels may be costume drama, idol drama, and so on. Accordingly, an intermediate vector for predicting the category at each level can be determined from the fused representation vector, where the intermediate vector may be the classification representation vector of the corresponding label. When the category of the multimedia resource at a given level is predicted, the prediction can use the intermediate vector of that level and the intermediate vectors of the levels above it.

For example, as shown in Figure 12, H is the fused representation vector. Passing H through a neural network yields h_l1 and h_l2, the classification representation vectors of the first-level label and the second-level label. In the second-level label classification layer, the first-level label representation vector is concatenated in, which establishes a constraint between the first-level and second-level labels. The first-level label and the second-level label are then calculated according to a formula in which w_l1 and w_l2 are preset weights and b_l1 and b_l2 are preset offsets.
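The two-level prediction can be sketched as below. This is an assumption-laden illustration, not the formula of the application: it supposes that each level applies a linear layer followed by softmax, and that the second level operates on the concatenation of h_l1 and h_l2, as the description of Figure 12 suggests. All function names and numbers are hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vector, bias):
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def predict_two_levels(h_l1, h_l2, w_l1, b_l1, w_l2, b_l2):
    """Assumed form: level 1 uses h_l1; level 2 uses [h_l1, h_l2] concatenated."""
    level1 = softmax(linear(w_l1, h_l1, b_l1))
    level2 = softmax(linear(w_l2, h_l1 + h_l2, b_l2))
    return level1, level2


h_l1, h_l2 = [1.0, 0.0], [0.0, 1.0]            # assumed intermediate vectors
w_l1 = [[2.0, 0.0], [0.0, 2.0]]                # 2 first-level labels
b_l1 = [0.0, 0.0]
w_l2 = [[1.0, 0.0, 0.0, 0.0],                  # 3 second-level labels
        [0.0, 0.0, 0.0, 3.0],
        [0.0, 1.0, 0.0, 0.0]]
b_l2 = [0.0, 0.0, 0.0]

p1, p2 = predict_two_levels(h_l1, h_l2, w_l1, b_l1, w_l2, b_l2)
first_level = p1.index(max(p1))
second_level = p2.index(max(p2))
print(first_level, second_level)
```

Feeding h_l1 into the second-level layer is what lets the first-level evidence constrain the second-level decision in this sketch.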

在本申请实施例中，步骤1101-1103可以分别采用本申请的各实施例中的任一种方式实现，本申请实施例并不对此作出限定，也不再赘述。In the embodiment of the present application, steps 1101-1103 can be implemented in any of the embodiments of the present application respectively. The embodiment of the present application does not limit this and will not be repeated.

本申请实施例的多媒体资源的分类方法，通过获取多媒体资源以及多媒体资源对应的多媒体表示向量，以及根据多媒体资源中的多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量，进而根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。由此，利用知识增强了对多媒体资源的深层次理解，提高了多媒体资源的分类效果。The multimedia resource classification method of the embodiment of the present application obtains multimedia resources and multimedia representation vectors corresponding to the multimedia resources, and determines the knowledge representation vector corresponding to the multimedia resources according to multiple entities in the multimedia resources and the relationship between the multiple entities, and then determines the category of the multimedia resources according to the multimedia representation vector and the knowledge representation vector. Thus, the deep understanding of multimedia resources is enhanced by using knowledge, and the classification effect of multimedia resources is improved.

为了实现上述实施例，本申请实施例还提出一种多媒体资源的分类装置。In order to implement the above embodiment, the embodiment of the present application also proposes a classification device for multimedia resources.

Figure 13 is a schematic diagram according to the eighth embodiment of the present application. As shown in Figure 13, the multimedia resource classification device 1300 includes an acquisition module 1310, a first determination module 1320, a second determination module 1330, and a third determination module 1340.

其中，获取模块1310，用于获取待处理的多媒体资源以及多媒体资源对应的多媒体表示向量；第一确定模块1320，用于确定多媒体资源中的多个实体以及多个实体之间的关系；第二确定模块1330，用于根据多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量；第三确定模块1340，用于根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。Among them, the acquisition module 1310 is used to obtain the multimedia resources to be processed and the multimedia representation vector corresponding to the multimedia resources; the first determination module 1320 is used to determine multiple entities in the multimedia resources and the relationship between the multiple entities; the second determination module 1330 is used to determine the knowledge representation vector corresponding to the multimedia resources based on the multiple entities and the relationship between the multiple entities; the third determination module 1340 is used to determine the category of the multimedia resources based on the multimedia representation vector and the knowledge representation vector.

作为本申请实施例的一种可能实现方式，第一确定模块1320，具体用于：提取多媒体资源中的所述多个实体；根据多个实体查询实体知识图谱，确定多个实体之间的关系。As a possible implementation method of the embodiment of the present application, the first determination module 1320 is specifically used to: extract the multiple entities in the multimedia resources; query the entity knowledge graph according to the multiple entities, and determine the relationship between the multiple entities.
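The lookup performed by the first determination module can be pictured with a toy knowledge graph stored as (head, relation, tail) triples. Everything in this sketch is hypothetical: the entity names, the relation names, the triple-list storage, and the function `relations_among` are stand-ins, since the application does not describe the structure of the entity knowledge graph.

```python
# Toy knowledge graph; every entity and relation name here is made up.
TRIPLES = [
    ("Actor A", "stars_in", "Series B"),
    ("Series B", "genre", "Costume drama"),
    ("Actor C", "stars_in", "Film D"),
]


def relations_among(entities, triples):
    """Return the triples whose head and tail are both among the given entities."""
    wanted = set(entities)
    return [t for t in triples if t[0] in wanted and t[2] in wanted]


# Entities assumed to have been extracted from the multimedia resource.
extracted_entities = ["Actor A", "Series B", "Costume drama"]

found = relations_among(extracted_entities, TRIPLES)
for head, relation, tail in found:
    print(head, relation, tail)
```

Only relations that connect two extracted entities are kept, which matches the idea of determining relationships among the entities found in the resource rather than retrieving everything the graph knows about them.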

作为本申请实施例的一种可能实现方式，第一确定模块1320，还用于：对所述多媒体资源中视频的各帧图像进行人脸检测，提取实体；和/或，对多媒体资源中视频的各帧图像进行光学字符识别，提取实体；和/或，对多媒体资源中的文本进行命名体识别，提取文本中的实体。As a possible implementation method of an embodiment of the present application, the first determination module 1320 is also used to: perform face detection on each frame image of the video in the multimedia resource to extract entities; and/or, perform optical character recognition on each frame image of the video in the multimedia resource to extract entities; and/or, perform named entity recognition on the text in the multimedia resource to extract entities in the text.

作为本申请实施例的一种可能实现方式，第二确定模块1330，具体用于：根据多个实体以及多个实体之间的关系，确定每个实体的表示向量；根据每个实体的表示向量，确定知识表示向量。As a possible implementation method of the embodiment of the present application, the second determination module 1330 is specifically used to: determine the representation vector of each entity according to multiple entities and the relationship between the multiple entities; determine the knowledge representation vector according to the representation vector of each entity.

作为本申请实施例的一种可能实现方式，获取模块1310，具体用于：获取多媒体资源；确定多媒体资源中各种媒体资源的表示向量以及权重；根据各种媒体资源的表示向量以及权重进行加权求和处理，得到加权表示向量；将加权表示向量，作为多媒体表示向量。As a possible implementation method of an embodiment of the present application, the acquisition module 1310 is specifically used to: acquire multimedia resources; determine the representation vectors and weights of various media resources in the multimedia resources; perform weighted summation processing based on the representation vectors and weights of various media resources to obtain a weighted representation vector; and use the weighted representation vector as a multimedia representation vector.

作为本申请实施例的一种可能实现方式，获取模块1310，还用于：获取多媒体资源，并根据多媒体资源中各种媒体资源的表示向量以及权重，确定多媒体资源的加权表示向量；提取多媒体资源的特征向量序列，并在特征向量序列中的每个特征向量上加上加权表示向量，得到处理后特征向量序列；对处理后特征向量序列进行语义编码，得到多媒体表示向量。As a possible implementation method of an embodiment of the present application, the acquisition module 1310 is also used to: acquire multimedia resources, and determine a weighted representation vector of the multimedia resources based on the representation vectors and weights of various media resources in the multimedia resources; extract a feature vector sequence of the multimedia resources, and add the weighted representation vector to each feature vector in the feature vector sequence to obtain a processed feature vector sequence; and perform semantic encoding on the processed feature vector sequence to obtain a multimedia representation vector.

作为本申请实施例的一种可能实现方式，获取模块1310，还用于：针对多媒体资源中的每种媒体资源，对媒体资源进行多种特征提取，得到媒体资源的多种特征向量子序列；对媒体资源的多种特征向量子序列进行加和处理，得到媒体资源的特征向量序列；对各种媒体资源的特征向量序列进行拼接，得到多媒体资源的特征向量序列。As a possible implementation method of an embodiment of the present application, the acquisition module 1310 is also used to: for each media resource in the multimedia resources, perform multiple feature extraction on the media resources to obtain multiple feature vector subsequences of the media resources; add the multiple feature vector subsequences of the media resources to obtain a feature vector sequence of the media resources; splice the feature vector sequences of various media resources to obtain a feature vector sequence of the multimedia resources.

作为本申请实施例的一种可能实现方式，第三确定模块1340，具体用于：确定多媒体表示向量和知识表示向量的权重；根据权重，对多媒体表示向量和知识表示向量进行加权处理，得到融合表示向量；根据融合表示向量，确定多媒体资源的类型。As a possible implementation method of an embodiment of the present application, the third determination module 1340 is specifically used to: determine the weights of the multimedia representation vector and the knowledge representation vector; perform weighted processing on the multimedia representation vector and the knowledge representation vector according to the weights to obtain a fused representation vector; and determine the type of multimedia resource according to the fused representation vector.

As a possible implementation of the embodiment of the present application, the third determination module 1340 is further used to: determine, from the fused representation vector, an intermediate vector for predicting the category at each level; and, for each level, predict the category of the multimedia resource at that level from the intermediate vector of that level and the intermediate vectors of the higher levels, where a higher level is any level above the level being predicted.

本申请实施例的多媒体资源的分类装置，通过获取多媒体资源以及多媒体资源对应的多媒体表示向量，以及根据多媒体资源中的多个实体以及多个实体之间的关系，确定多媒体资源对应的知识表示向量，进而根据多媒体表示向量以及知识表示向量，确定多媒体资源的类别。由此，利用知识增强了对多媒体资源的深层次理解，提高了多媒体资源的分类效果。The multimedia resource classification device of the embodiment of the present application obtains multimedia resources and multimedia representation vectors corresponding to the multimedia resources, and determines the knowledge representation vector corresponding to the multimedia resources according to multiple entities in the multimedia resources and the relationship between the multiple entities, and then determines the category of the multimedia resources according to the multimedia representation vector and the knowledge representation vector. Thus, the deep understanding of multimedia resources is enhanced by using knowledge, and the classification effect of multimedia resources is improved.

根据本申请的实施例，本申请还提供了一种电子设备、一种可读存储介质和一种计算机程序产品。According to an embodiment of the present application, the present application also provides an electronic device, a readable storage medium and a computer program product.

Figure 14 shows a schematic block diagram of an example electronic device 1400 that can be used to implement embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.

如图14所示，设备1400包括计算单元1401，其可以根据存储在只读存储器(ROM)1402中的计算机程序或者从存储单元1408加载到随机访问存储器(RAM)1403中的计算机程序，来执行各种适当的动作和处理。在RAM 1403中，还可存储设备1400操作所需的各种程序和数据。计算单元1401、ROM 1402以及RAM 1403通过总线1404彼此相连。输入/输出(I/O)接口1405也连接至总线1404。As shown in FIG. 14 , the device 1400 includes a computing unit 1401, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1402 or a computer program loaded from a storage unit 1408 into a random access memory (RAM) 1403. In the RAM 1403, various programs and data required for the operation of the device 1400 can also be stored. The computing unit 1401, the ROM 1402, and the RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.

设备1400中的多个部件连接至I/O接口1405，包括：输入单元1406，例如键盘、鼠标等；输出单元1407，例如各种类型的显示器、扬声器等；存储单元1408，例如磁盘、光盘等；以及通信单元1409，例如网卡、调制解调器、无线通信收发机等。通信单元1409允许设备1400通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据。A number of components in the device 1400 are connected to the I/O interface 1405, including: an input unit 1406, such as a keyboard, a mouse, etc.; an output unit 1407, such as various types of displays, speakers, etc.; a storage unit 1408, such as a disk, an optical disk, etc.; and a communication unit 1409, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 1409 allows the device 1400 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.

计算单元1401可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元1401的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元1401执行上文所描述的各个方法和处理，例如多媒体资源的分类方法。例如，在一些实施例中，多媒体资源的分类方法可被实现为计算机软件程序，其被有形地包含于机器可读介质，例如存储单元1408。在一些实施例中，计算机程序的部分或者全部可以经由ROM 1402和/或通信单元1409而被载入和/或安装到设备1400上。当计算机程序加载到RAM 1403并由计算单元1401执行时，可以执行上文描述的多媒体资源的分类方法的一个或多个步骤。备选地，在其他实施例中，计算单元1401可以通过其他任何适当的方式(例如，借助于固件)而被配置为执行多媒体资源的分类方法。The computing unit 1401 may be a variety of general and/or special processing components with processing and computing capabilities. Some examples of the computing unit 1401 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, etc. The computing unit 1401 performs the various methods and processes described above, such as a classification method for multimedia resources. For example, in some embodiments, the classification method for multimedia resources may be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as a storage unit 1408. In some embodiments, part or all of the computer program may be loaded and/or installed on the device 1400 via the ROM 1402 and/or the communication unit 1409. When the computer program is loaded into the RAM 1403 and executed by the computing unit 1401, one or more steps of the classification method for multimedia resources described above may be performed. Alternatively, in other embodiments, the computing unit 1401 may be configured to execute the multimedia resource classification method in any other appropriate manner (for example, by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that receives data and instructions from a storage system, at least one input device, and at least one output device, and transmits data and instructions to the storage system, the at least one input device, and the at least one output device.

Program code for implementing the method of the present application may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user. For example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), the Internet, and a blockchain network.

A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other. The server may also be a server of a distributed system, or a server combined with a blockchain.

It should be noted that artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing. Artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.

In addition, the acquisition, storage, and application of the information involved in the technical solution of the present application comply with relevant laws and regulations and do not violate public order and good morals.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution proposed in the present application can be achieved; no limitation is imposed herein.

The specific implementations above do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims

1. A method for classifying multimedia resources, comprising:

acquiring a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource, wherein each type of media resource in the multimedia resource to be processed is represented by a vector, and the multimedia representation vector corresponding to the multimedia resource to be processed is determined according to the weight corresponding to each media resource;

determining a plurality of entities in the multimedia resource and relationships between the plurality of entities;

determining a knowledge representation vector corresponding to the multimedia resource according to the plurality of entities and the relationships between the plurality of entities, wherein a knowledge graph is formed according to the plurality of entities in the multimedia resource and the relationships between the plurality of entities, and each node in the knowledge graph is encoded to determine the knowledge representation vector corresponding to the multimedia resource; and

determining a category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector, wherein the multimedia representation vector and the knowledge representation vector are fused, and the category of the multimedia resource is determined according to the fused result.

2. The method according to claim 1, wherein determining the plurality of entities in the multimedia resource and the relationships between the plurality of entities comprises:

extracting the plurality of entities from the multimedia resource; and

querying an entity knowledge graph according to the plurality of entities to determine the relationships between the plurality of entities.
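The two steps of claim 2 can be illustrated with a minimal sketch. It assumes the entities have already been extracted and that the entity knowledge graph is a small in-memory dictionary; the names `ENTITY_KG` and `query_relations` and all sample entities are hypothetical and do not appear in the patent, which does not specify how the graph is stored or queried.

```python
from itertools import combinations

# Hypothetical toy entity knowledge graph: (head, tail) -> relation label.
ENTITY_KG = {
    ("Actor A", "Film X"): "stars_in",
    ("Director B", "Film X"): "directs",
}


def query_relations(entities, kg):
    """Return (head, relation, tail) triples for every entity pair found in kg."""
    triples = []
    for a, b in combinations(entities, 2):
        # The toy graph stores directed edges, so both orders are checked.
        if (a, b) in kg:
            triples.append((a, kg[(a, b)], b))
        if (b, a) in kg:
            triples.append((b, kg[(b, a)], a))
    return triples


# Entities assumed to have been extracted from the multimedia resource.
extracted = ["Actor A", "Director B", "Film X"]
relations = query_relations(extracted, ENTITY_KG)
```

The resulting triples are the raw material for the per-resource knowledge graph described in claim 1.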

3. The method according to claim 2, wherein extracting the plurality of entities from the multimedia resource comprises at least one of the following:

performing face detection on each frame image of a video in the multimedia resource to extract the entities;

performing optical character recognition on each frame image of the video in the multimedia resource to extract the entities;

performing named entity recognition on text in the multimedia resource to extract the entities in the text.

4. The method according to claim 1, wherein determining the knowledge representation vector corresponding to the multimedia resource according to the plurality of entities and the relationships between the plurality of entities comprises:

determining a representation vector of each entity according to the plurality of entities and the relationships between the plurality of entities; and

determining the knowledge representation vector according to the representation vector of each entity.
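The claims do not fix a particular node encoder or pooling rule. The sketch below is therefore only one possible reading: each entity vector is replaced by the mean of itself and its graph neighbours (a single round of neighbourhood averaging, ignoring relation labels), and the knowledge representation vector is the mean of the resulting entity vectors. The function names, the initial vectors, and both design choices are assumptions made for illustration.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def encode_entities(initial, triples):
    """One round of neighbour averaging over (head, relation, tail) triples."""
    neighbours = {name: [] for name in initial}
    for head, _relation, tail in triples:
        neighbours[head].append(tail)
        neighbours[tail].append(head)
    encoded = {}
    for name, vector in initial.items():
        # Each entity vector becomes the mean of itself and its neighbours.
        encoded[name] = mean([vector] + [initial[n] for n in neighbours[name]])
    return encoded


def knowledge_vector(encoded):
    """Pool the per-entity vectors into one knowledge representation vector."""
    return mean(list(encoded.values()))


# Hypothetical initial entity embeddings and one relation between them.
initial = {"Actor A": [1.0, 0.0], "Film X": [0.0, 1.0]}
triples = [("Actor A", "stars_in", "Film X")]
entity_vectors = encode_entities(initial, triples)
k_vec = knowledge_vector(entity_vectors)
```

A learned graph encoder could replace `encode_entities` without changing the overall shape of the computation: per-entity vectors first, one pooled vector second.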

5. The method according to claim 1, wherein acquiring the multimedia resource to be processed and the multimedia representation vector corresponding to the multimedia resource comprises:

acquiring the multimedia resource;

determining representation vectors and weights of the various media resources in the multimedia resource;

performing weighted summation according to the representation vectors and weights of the various media resources to obtain a weighted representation vector; and

using the weighted representation vector as the multimedia representation vector.
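The weighted summation in claim 5 can be sketched as follows. The media names, vectors, and weights are hypothetical values chosen for illustration; the patent does not state how the weights are obtained, so they are simply given here.

```python
def weighted_sum(vectors, weights):
    """Return sum_i weights[i] * vectors[i], computed element-wise."""
    if len(vectors) != len(weights):
        raise ValueError("each vector needs exactly one weight")
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


# Hypothetical per-media representation vectors and weights.
media_vectors = {
    "video": [1.0, 0.0, 2.0],
    "audio": [0.0, 1.0, 0.0],
    "text": [2.0, 2.0, 2.0],
}
media_weights = {"video": 0.5, "audio": 0.25, "text": 0.25}

names = list(media_vectors)
multimedia_vector = weighted_sum(
    [media_vectors[n] for n in names],
    [media_weights[n] for n in names],
)
```

With these inputs the weighted representation vector is `[1.0, 0.75, 1.5]`, which under claim 5 is used directly as the multimedia representation vector.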

6. The method according to claim 1, wherein acquiring the multimedia resource to be processed and the multimedia representation vector corresponding to the multimedia resource comprises:

acquiring the multimedia resource, and determining a weighted representation vector of the multimedia resource according to representation vectors and weights of the various media resources in the multimedia resource;

extracting a feature vector sequence of the multimedia resource, and adding the weighted representation vector to each feature vector in the feature vector sequence to obtain a processed feature vector sequence; and

semantically encoding the processed feature vector sequence to obtain the multimedia representation vector.

7. The method according to claim 6, wherein extracting the feature vector sequence of the multimedia resource comprises:

for each media resource in the multimedia resource, extracting multiple features from the media resource to obtain multiple feature vector subsequences of the media resource;

adding the multiple feature vector subsequences of the media resource to obtain the feature vector sequence of the media resource; and

concatenating the feature vector sequences of the various media resources to obtain the feature vector sequence of the multimedia resource.
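The sequence construction in claims 6 and 7 can be sketched as below, under stated assumptions: the subsequences of one media resource are aligned (same length and dimension) and are added position by position, sequences of different media are joined end to end, and the weighted representation vector is then added to every element. All names and numbers are hypothetical, and the semantic encoding step of claim 6 is deliberately left out because the claims do not specify the encoder.

```python
def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def sum_subsequences(subsequences):
    """Position-wise sum of aligned feature vector subsequences."""
    result = subsequences[0]
    for sub in subsequences[1:]:
        result = [add_vectors(u, v) for u, v in zip(result, sub)]
    return result


def build_sequence(media_subsequences):
    """Sum the subsequences of each media resource, then concatenate the results."""
    sequence = []
    for subsequences in media_subsequences.values():
        sequence.extend(sum_subsequences(subsequences))
    return sequence


def add_to_each(sequence, weighted_vector):
    """Add the weighted representation vector to every feature vector."""
    return [add_vectors(vector, weighted_vector) for vector in sequence]


# Hypothetical input: a video with two frames and a text with one token,
# each described by two feature subsequences (for example content and position).
media_subsequences = {
    "video": [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]],
    "text": [[[2.0, 2.0]], [[1.0, 0.0]]],
}
sequence = build_sequence(media_subsequences)
processed = add_to_each(sequence, [1.0, 1.0])
```

The `processed` sequence is what claim 6 passes to semantic encoding to obtain the multimedia representation vector.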

8. The method according to claim 1, wherein determining the category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector comprises:

determining weights of the multimedia representation vector and the knowledge representation vector;

weighting the multimedia representation vector and the knowledge representation vector according to the weights to obtain a fused representation vector; and

determining the category of the multimedia resource according to the fused representation vector.
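A minimal sketch of the fusion in claim 8 follows. It assumes the two vectors share a dimension, that the fusion weights are already known, and that the category is chosen by a hypothetical linear scorer with an argmax; the patent specifies none of these details, so `fuse`, `classify`, and the sample weights are illustrative only.

```python
def fuse(multimedia_vec, knowledge_vec, w_multimedia, w_knowledge):
    """Weighted combination of the two representation vectors."""
    return [
        w_multimedia * m + w_knowledge * k
        for m, k in zip(multimedia_vec, knowledge_vec)
    ]


def classify(fused, class_weights):
    """Pick the label whose (hypothetical) linear score is highest."""
    scores = {
        label: sum(w * x for w, x in zip(weights, fused))
        for label, weights in class_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical inputs: fusion weights 0.75 and 0.25, two candidate categories.
fused = fuse([1.0, 0.0], [0.0, 1.0], 0.75, 0.25)
class_weights = {"sports": [1.0, 0.0], "film": [0.0, 1.0]}
category = classify(fused, class_weights)
```

In practice the fusion weights would likely be learned or computed from the inputs; fixing them here keeps the example focused on the weighting step itself.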

9. The method according to claim 8, wherein determining the category of the multimedia resource according to the fused representation vector comprises:

determining, based on the fused representation vector, an intermediate vector for predicting the category at each level; and

for each level, predicting the category of the multimedia resource at the level according to the intermediate vector of the level and the intermediate vector of a higher level, wherein the higher level is a level above the level.
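The hierarchical prediction in claim 9 can be sketched as follows. The sketch assumes the intermediate vectors have already been derived from the fused representation vector, takes "a higher level" to mean the immediately higher level, and combines the two intermediate vectors by concatenation before a hypothetical linear scorer. Each of these choices is an assumption for illustration, not something the claims prescribe.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict_level(features, class_weights):
    """Pick the label with the highest (hypothetical) linear score."""
    return max(class_weights, key=lambda label: dot(class_weights[label], features))


def predict_hierarchy(intermediate, classifiers):
    """Predict one category per level; index 0 is the top level."""
    predictions = []
    for level, vector in enumerate(intermediate):
        if level == 0:
            # The top level has no higher level to draw on.
            features = vector
        else:
            # Assumption: concatenate with the immediately higher level's vector.
            features = intermediate[level - 1] + vector
        predictions.append(predict_level(features, classifiers[level]))
    return predictions


# Hypothetical intermediate vectors for a two-level category hierarchy.
intermediate = [[1.0, 0.0], [0.0, 1.0]]
classifiers = [
    {"sports": [1.0, 0.0], "film": [0.0, 1.0]},
    {"football": [1.0, 0.0, 0.0, 1.0], "drama": [0.0, 1.0, 1.0, 0.0]},
]
levels = predict_hierarchy(intermediate, classifiers)
```

Feeding the higher-level intermediate vector into the lower-level prediction lets the coarse category inform the fine one, which is the dependency claim 9 describes.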

10. A multimedia resource classification device, comprising:

an acquisition module, configured to acquire a multimedia resource to be processed and a multimedia representation vector corresponding to the multimedia resource, wherein each type of media resource in the multimedia resource to be processed is represented by a vector, and the multimedia representation vector corresponding to the multimedia resource to be processed is determined according to the weight corresponding to each media resource;

a first determining module, configured to determine a plurality of entities in the multimedia resource and relationships between the plurality of entities;

a second determining module, configured to determine a knowledge representation vector corresponding to the multimedia resource according to the plurality of entities and the relationships between the plurality of entities, wherein a knowledge graph is formed according to the plurality of entities in the multimedia resource and the relationships between the plurality of entities, and each node in the knowledge graph is encoded to determine the knowledge representation vector corresponding to the multimedia resource; and

a third determining module, configured to determine a category of the multimedia resource according to the multimedia representation vector and the knowledge representation vector, wherein the multimedia representation vector and the knowledge representation vector are fused, and the category of the multimedia resource is determined according to the fused result.

11. The device according to claim 10, wherein the first determining module is specifically configured to:

extracting the plurality of entities from the multimedia resource;

12. The apparatus according to claim 11, wherein the first determining module is further configured to:

and / or,

13. The device according to claim 10, wherein the second determining module is specifically configured to:

14. The device according to claim 10, wherein the acquisition module is specifically used to:

acquiring the multimedia resource;

15. The device according to claim 10, wherein the acquisition module is further used for:

16. The device according to claim 15, wherein the acquisition module is further used for:

17. The device according to claim 10, wherein the third determining module is specifically configured to:

18. The apparatus according to claim 17, wherein the third determining module is further configured to:

19. An electronic device comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to any one of claims 1 to 9.

20. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 9.

21. A computer program product, comprising a computer program, which, when executed by a processor, implements the method according to any one of claims 1 to 9.