
CN106611059A - Method and device for recommending multi-media files - Google Patents


Info

Publication number
CN106611059A
Authority
CN
China
Prior art keywords
information
multimedia file
probability
key word
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611235464.6A
Other languages
Chinese (zh)
Inventor
高阳
丁晓亮
刘爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd
Priority to CN201611235464.6A
Publication of CN106611059A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and device for recommending multimedia files, belonging to the field of Internet technology. The method includes: acquiring first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file; selecting from a multimedia file library, according to the first tag information and the second tag information of each second multimedia file in the library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that file; and sending an identifier of the second multimedia file to the terminal. Because the extracted first tag information and the second tag information of each second multimedia file are relatively accurate, recommending the second multimedia file to the user according to these tags can improve the accuracy of the recommendation.

Description

Method and device for recommending multimedia files

Technical field

The present disclosure relates to the field of Internet technology, and in particular to a method and device for recommending multimedia files.

Background

With the advent of the information age, servers store more and more video files, and a user can use a terminal to obtain video files of interest from a server. To improve the user experience, the server may also recommend video files that the user is interested in.

At present, when recommending video files to a user, the server counts the click-through rate of each video file it stores, selects the video file with the highest click-through rate, and recommends the selected video file to the user.

Summary of the invention

To overcome the problems in the related art, the present disclosure provides a method and device for recommending multimedia files. The technical solution is as follows:

According to a first aspect of the embodiments of the present disclosure, a method for recommending multimedia files is provided, the method comprising:

acquiring first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;

selecting from a multimedia file library, according to the first tag information and the second tag information of each second multimedia file in the library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file; and

sending an identifier of the second multimedia file to the terminal.

In the embodiments of the present disclosure, the first tag information is extracted based on the first subtitle information of the first multimedia file, and the second tag information of each second multimedia file is extracted based on the second subtitle information of that file. The extracted first and second tag information is therefore relatively accurate, so recommending the second multimedia file to the user according to the first tag information and the second tag information of each second multimedia file can improve the accuracy of the recommendation.
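The selection step above can be sketched as follows. This is only an illustration: the disclosure does not define here what "matches" means, so the sketch assumes that two files match when they share at least `min_shared` tags. The function name `select_matching_files`, the parameter `min_shared`, and the sample library are all hypothetical.

```python
from typing import Dict, List, Set


def select_matching_files(first_tags: Set[str],
                          library: Dict[str, Set[str]],
                          min_shared: int = 1) -> List[str]:
    """Return identifiers of library files whose tags match first_tags.

    Assumption (not from the disclosure): a file matches when it shares
    at least `min_shared` tags with the first multimedia file.
    """
    matches = []
    for file_id, second_tags in library.items():
        shared = first_tags & second_tags
        if len(shared) >= min_shared:
            matches.append((len(shared), file_id))
    # Most shared tags first; ties are broken by identifier for determinism.
    matches.sort(key=lambda item: (-item[0], item[1]))
    return [file_id for _, file_id in matches]


# Hypothetical library: identifier -> second tag information.
library = {
    "video_a": {"cooking", "travel"},
    "video_b": {"sports"},
    "video_c": {"travel", "cooking", "music"},
}
recommended = select_matching_files({"cooking", "travel"}, library)
```

Only the identifiers are returned, which mirrors the last step of the method: the server sends identifiers to the terminal rather than the files themselves.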

In a possible implementation, acquiring the first tag information of the first multimedia file corresponding to the terminal includes:

acquiring the first subtitle information of the first multimedia file;

performing word segmentation on the first subtitle information to obtain a first keyword set; and

analyzing each keyword in the first keyword set to obtain the first tag information.

In the embodiments of the present disclosure, the first tag information of the first multimedia file is extracted by performing semantic analysis on the first subtitle information of the first multimedia file, which improves the accuracy of the first tag information.

In a possible implementation, analyzing each keyword in the first keyword set to obtain the first tag information includes:

acquiring the probability of each keyword in the first subtitle information, and acquiring the probability that each keyword belongs to each piece of topic information in a topic information library, the topic information library being used to store a plurality of preset pieces of topic information;

determining, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, the probability that the first multimedia file belongs to each piece of topic information;

selecting, according to the probability that the first multimedia file belongs to each piece of topic information, a preset number of pieces of topic information with the highest probabilities; and

composing the first tag information from the selected preset number of pieces of topic information.

In the embodiments of the present disclosure, the probability that the first multimedia file belongs to each piece of topic information is determined from the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, and the preset number of pieces of topic information with the highest probabilities are then selected, which improves the accuracy of the first tag information.
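The final selection step can be sketched in a few lines. The sketch assumes the per-topic probabilities for the file have already been computed; the function name `top_topics` and the sample values are hypothetical.

```python
import heapq
from typing import Dict, List


def top_topics(topic_probs: Dict[str, float], preset_number: int) -> List[str]:
    """Pick the `preset_number` topics with the highest probability."""
    # heapq.nlargest returns the selected keys in descending order of probability.
    return heapq.nlargest(preset_number, topic_probs, key=topic_probs.get)


# Hypothetical probabilities that the first multimedia file belongs to each topic.
file_topic_probs = {"cooking": 0.5, "travel": 0.3, "sports": 0.15, "music": 0.05}
first_tag_info = top_topics(file_topic_probs, 2)
```

With a preset number of 2, the first tag information here consists of the two most probable topics.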

In a possible implementation, determining the probability that the first multimedia file belongs to each piece of topic information, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, includes:

composing a first probability matrix from the probabilities of the keywords in the first subtitle information, and composing a second probability matrix from the probabilities that each keyword belongs to each piece of topic information;

multiplying the inverse of the second probability matrix by the first probability matrix to obtain a third probability matrix; and

obtaining, from the third probability matrix, the probability that the first multimedia file belongs to each piece of topic information.

In the embodiments of the present disclosure, the probabilities of the keywords in the first subtitle information form the first probability matrix, and the probabilities that each keyword belongs to each piece of topic information form the second probability matrix. Determining the probability that the first multimedia file belongs to each piece of topic information from these two matrices improves the accuracy of that probability and, in turn, the accuracy of the first tag information.
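The matrix step can be written out as below. The passage does not state the matrix dimensions, so this is one possible reading under stated assumptions: there are as many keywords as topics (n of each), the second probability matrix is square and invertible, and the first probability matrix is a column vector. The symbols p_i (probability of keyword i in the first subtitle information), q_ij (probability that keyword i belongs to topic j), and theta_j (probability that the file belongs to topic j) are introduced here for illustration only.

```latex
P_1 = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix}, \qquad
P_2 = \begin{pmatrix}
  q_{11} & \cdots & q_{1n} \\
  \vdots & \ddots & \vdots \\
  q_{n1} & \cdots & q_{nn}
\end{pmatrix}, \qquad
P_3 = P_2^{-1} P_1 = \begin{pmatrix} \theta_1 \\ \vdots \\ \theta_n \end{pmatrix}
```

Under these assumptions, computing P_3 is equivalent to solving P_2 P_3 = P_1, that is, p_i = \sum_j q_{ij} \theta_j for every keyword i.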

In a possible implementation, acquiring the probability that each keyword belongs to each piece of topic information in the topic information library includes:

for each piece of topic information, acquiring the preset keyword set corresponding to that topic information; and

determining the probability that each keyword belongs to the topic information according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.

In a possible implementation, determining the probability that each keyword belongs to the topic information, according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set, includes:

if the preset keyword set contains the keyword, taking the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information; and

if the preset keyword set does not contain the keyword, determining that the probability that the keyword belongs to the topic information is zero.

In the embodiments of the present disclosure, the ratio of the probability of each keyword in the first subtitle information to the number of keywords contained in the preset keyword set is taken as the probability that the keyword belongs to the topic information. Because this takes into account the probability of each keyword in the first subtitle information, it improves the accuracy of the probability that each keyword belongs to the topic information and, in turn, the accuracy of the first tag information.
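The two-case rule above translates directly into code. This is a minimal sketch: the function name `keyword_topic_probability`, the sample keyword probabilities, and the sample preset keyword set are hypothetical, and how the keyword probabilities themselves are obtained is outside the scope of the sketch.

```python
from typing import Dict, Set


def keyword_topic_probability(keyword: str,
                              keyword_probs: Dict[str, float],
                              preset_keywords: Set[str]) -> float:
    """Probability that `keyword` belongs to the topic owning `preset_keywords`."""
    if keyword not in preset_keywords:
        # Case 2: the preset keyword set does not contain the keyword.
        return 0.0
    # Case 1: keyword probability divided by the size of the preset keyword set.
    return keyword_probs[keyword] / len(preset_keywords)


# Hypothetical probabilities of keywords in the first subtitle information.
keyword_probs = {"recipe": 0.4, "oven": 0.2, "goal": 0.4}
# Hypothetical preset keyword set for a "cooking" topic.
cooking_keywords = {"recipe", "oven", "kitchen", "chef"}

p_recipe = keyword_topic_probability("recipe", keyword_probs, cooking_keywords)
p_goal = keyword_topic_probability("goal", keyword_probs, cooking_keywords)
```

Here `p_recipe` is 0.4 divided by 4, and `p_goal` is zero because "goal" is not in the preset keyword set for the topic.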

In a possible implementation, performing word segmentation on the first subtitle information to obtain the first keyword set includes:

performing word segmentation on the first subtitle information, and composing a second keyword set from each word segment included in the first subtitle information; and

removing keywords of preset types from the second keyword set to obtain the first keyword set.

In the embodiments of the present disclosure, removing keywords of preset types from the second keyword set not only reduces the amount of computation but also improves the accuracy of the first tag information.
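The segmentation and filtering steps can be sketched as follows. Two assumptions are made that are not from the disclosure: a simple regular-expression tokenizer on English text stands in for a real word segmenter (Chinese subtitles would need a dedicated segmentation tool), and the "preset types" are modeled as a small hypothetical list of function words, `PRESET_FILTERED`.

```python
import re
from typing import List, Set

# Hypothetical stand-in for "keywords of preset types" (here: function words).
PRESET_FILTERED: Set[str] = {"the", "a", "of", "and", "is", "to", "in"}


def segment(subtitle_text: str) -> List[str]:
    """Stand-in word segmentation: extract lowercase word tokens."""
    return re.findall(r"[a-z']+", subtitle_text.lower())


def first_keyword_set(subtitle_text: str) -> List[str]:
    """Segment the subtitle text, then drop keywords of the preset types."""
    second_keyword_set = segment(subtitle_text)
    # A list is kept (not a Python set) so repeated words can still be counted later.
    return [word for word in second_keyword_set if word not in PRESET_FILTERED]


keywords = first_keyword_set("The chef puts the bread in the oven")
```

Filtering before any probability computation is what reduces the workload: the removed words never enter the later per-keyword steps.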

In a possible implementation, the first tag information includes at least one of: topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed.

In the embodiments of the present disclosure, the first tag information includes at least one of the topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed. Recommending the second multimedia file to the user on the basis of at least one of these can therefore improve the accuracy of the recommendation.

According to a second aspect of the embodiments of the present disclosure, a device for recommending multimedia files is provided, the device comprising:

an acquisition module, configured to acquire first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;

a selection module, configured to select from a multimedia file library, according to the first tag information and the second tag information of each second multimedia file in the library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file; and

a sending module, configured to send an identifier of the second multimedia file to the terminal.

In a possible implementation, the acquisition module includes:

an acquisition unit, configured to acquire the first subtitle information of the first multimedia file;

a word segmentation unit, configured to perform word segmentation on the first subtitle information to obtain a first keyword set; and

an analysis unit, configured to analyze each keyword in the first keyword set to obtain the first tag information.

In a possible implementation, the analysis unit is further configured to: acquire the probability of each keyword in the first subtitle information, and acquire the probability that each keyword belongs to each piece of topic information in a topic information library, the topic information library being used to store a plurality of preset pieces of topic information; determine, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, the probability that the first multimedia file belongs to each piece of topic information; select, according to the probability that the first multimedia file belongs to each piece of topic information, a preset number of pieces of topic information with the highest probabilities; and compose the first tag information from the selected preset number of pieces of topic information.

In a possible implementation, the analysis unit is further configured to: compose a first probability matrix from the probabilities of the keywords in the first subtitle information, and compose a second probability matrix from the probabilities that each keyword belongs to each piece of topic information; multiply the inverse of the second probability matrix by the first probability matrix to obtain a third probability matrix; and obtain, from the third probability matrix, the probability that the first multimedia file belongs to each piece of topic information.

In a possible implementation, the analysis unit is further configured to: for each piece of topic information, acquire the preset keyword set corresponding to that topic information; and determine the probability that each keyword belongs to the topic information according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.

In a possible implementation, the analysis unit is further configured to: if the preset keyword set contains the keyword, take the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information; and if the preset keyword set does not contain the keyword, determine that the probability that the keyword belongs to the topic information is zero.

In a possible implementation, the word segmentation unit is further configured to: perform word segmentation on the first subtitle information, compose a second keyword set from each word segment included in the first subtitle information, and remove keywords of preset types from the second keyword set to obtain the first keyword set.

In a possible implementation, the first tag information includes at least one of: topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed.

According to a third aspect of the embodiments of the present disclosure, a device for recommending multimedia files is provided, the device comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to: acquire first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;

select from a multimedia file library, according to the first tag information and the second tag information of each second multimedia file in the library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file; and

send the second multimedia file to the terminal.

The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:

In the embodiments of the present disclosure, the first tag information is extracted based on the first subtitle information of the first multimedia file, and the second tag information of each second multimedia file is extracted based on the second subtitle information of that file. The extracted first and second tag information is therefore relatively accurate, so recommending the second multimedia file to the user according to the first tag information and the second tag information of each second multimedia file can improve the accuracy of the recommendation.

It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

Fig. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment;

Fig. 2 is a flowchart of a method for recommending multimedia files according to an exemplary embodiment;

Fig. 3 is a flowchart of a method for recommending multimedia files according to an exemplary embodiment;

Fig. 4 is a block diagram of a device for recommending multimedia files according to an exemplary embodiment;

Fig. 5 is a block diagram of an acquisition module according to an exemplary embodiment;

Fig. 6 is a block diagram of a device for recommending multimedia files according to an exemplary embodiment.

Detailed description

To make the purpose, technical solutions, and advantages of the present disclosure clearer, the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present disclosure. Referring to Fig. 1, the implementation environment includes a terminal 101 and a server 102, which are connected through a communication network.

An application associated with the server 102 runs on the terminal 101. The user can log in to the application based on a user identifier and thereby interact with the server 102. The application may be any of various applications, such as a video application or an audio application, and the user identifier may be a user account, a phone number, or the like, which is not limited in the embodiments of the present disclosure.

The terminal 101 may be a mobile phone terminal, a PAD (portable android device, i.e. tablet computer) terminal, a computer terminal, or the like. The server 102 may be a single server, a server cluster composed of several servers, or a cloud computing server center, which is not limited in the embodiments of the present disclosure.

An application is installed on the terminal; it may be a video application or an audio application. When the terminal plays multimedia files through the application, the server can recommend a second multimedia file to the user based on a first multimedia file that the user is currently playing or has played in the past. To make the recommendation, the server acquires the first tag information of the first multimedia file, which is extracted based on the first subtitle information of the first multimedia file; the first multimedia file may be a single multimedia file or may include multiple multimedia files. The server then acquires the second tag information of each second multimedia file in the multimedia file library and, according to the first tag information and the second tag information of each second multimedia file, recommends the second multimedia file to the user. The second multimedia file may likewise be a single multimedia file or may include multiple multimedia files.

Because the first tag information is extracted based on the first subtitle information of the first multimedia file, and the second tag information of each second multimedia file is extracted based on the second subtitle information of that file, the extracted first and second tag information is relatively accurate. Recommending the second multimedia file to the user according to the first tag information and the second tag information of each second multimedia file can therefore improve the accuracy of the recommendation.

Fig. 2 is a flowchart of a method for recommending multimedia files according to an exemplary embodiment. The method may be executed by a server and, as shown in Fig. 2, includes the following steps.

In step S201, first tag information of a first multimedia file corresponding to a terminal is acquired, the first tag information being extracted based on first subtitle information of the first multimedia file.

In step S202, according to the first tag information and the second tag information of each second multimedia file in a multimedia file library, a second multimedia file whose second tag information matches the first tag information is selected from the multimedia file library, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file.

In step S203, an identifier of the second multimedia file is sent to the terminal.

In a possible implementation, acquiring the first tag information of the first multimedia file corresponding to the terminal includes:

acquiring the first subtitle information of the first multimedia file;

performing word segmentation on the first subtitle information to obtain a first keyword set; and

analyzing each keyword in the first keyword set to obtain the first tag information.

In a possible implementation, analyzing each keyword in the first keyword set to obtain the first tag information includes:

acquiring the probability of each keyword in the first subtitle information, and acquiring the probability that each keyword belongs to each piece of topic information in a topic information library, the topic information library being used to store a plurality of preset pieces of topic information;

determining, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, the probability that the first multimedia file belongs to each piece of topic information;

selecting, according to the probability that the first multimedia file belongs to each piece of topic information, a preset number of pieces of topic information with the highest probabilities; and

composing the first tag information from the selected preset number of pieces of topic information.

In a possible implementation, determining the probability that the first multimedia file belongs to each piece of topic information, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information, includes:

forming a first probability matrix from the probabilities of the keywords in the first subtitle information, and forming a second probability matrix from the probabilities that each keyword belongs to each piece of topic information;

multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix; and

acquiring, from the third probability matrix, the probability that the first multimedia file belongs to each piece of topic information.

In a possible implementation, acquiring the probability that each keyword belongs to each piece of topic information in the topic information library includes:

for each piece of topic information, acquiring a preset keyword set corresponding to that topic information; and

determining the probability that each keyword belongs to that topic information according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.

In a possible implementation, determining the probability that each keyword belongs to that topic information, according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set, includes:

if the preset keyword set contains the keyword, taking the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to that topic information; and

if the preset keyword set does not contain the keyword, determining that the probability that the keyword belongs to that topic information is zero.

In a possible implementation, performing word segmentation on the first subtitle information to obtain the first keyword set includes:

performing word segmentation on the first subtitle information, and forming a second keyword set from the segmented words included in the first subtitle information; and

removing keywords of a preset type from the second keyword set to obtain the first keyword set.

In a possible implementation, the first tag information includes at least one of: the topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed.

All of the optional technical solutions above may be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.

Fig. 3 is a flowchart of a method for setting tag information according to an exemplary embodiment. The method may be executed by a server and, as shown in Fig. 3, includes the following steps.

In step S301, the server acquires first tag information of a first multimedia file corresponding to a terminal, where the first tag information is extracted based on first subtitle information of the first multimedia file.

When the terminal logs in to the server or acquires a multimedia file from the server, the server acquires the first multimedia file corresponding to the terminal from the terminal's historical playback records according to the terminal identifier of the terminal. The first multimedia file may be a single multimedia file or may include multiple multimedia files. To improve accuracy, the server may take, from the terminal's historical playback records, a first preset number of multimedia files whose playback times are closest to the current time as the first multimedia file. The first tag information includes at least one of: the topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed.
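The selection of the most recently played files can be sketched as follows. This is only an illustration under assumptions: the record layout (a file identifier paired with a playback time) and the name `recent_files` are hypothetical and do not come from the disclosure.

```python
from datetime import datetime


def recent_files(history, first_preset_number):
    """Return the identifiers of the most recently played files.

    `history` is assumed to be a list of (file_id, played_at) pairs.
    """
    # Sort by playback time, newest first, then keep the first N entries.
    ordered = sorted(history, key=lambda record: record[1], reverse=True)
    return [file_id for file_id, _played_at in ordered[:first_preset_number]]


# Hypothetical playback history for one terminal.
history = [
    ("file_a", datetime(2016, 12, 1, 20, 0)),
    ("file_b", datetime(2016, 12, 3, 21, 30)),
    ("file_c", datetime(2016, 12, 2, 19, 15)),
]
print(recent_files(history, 2))
```

With a first preset number of 1, the same function returns only the single most recent file.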

The first multimedia file may be a video file or an audio file. When the first multimedia file is a video file, the information about the user who produced the first multimedia file may be the director and/or producer of the first multimedia file, and the information about the filmed user may be an actor who appears in the first multimedia file. When the first multimedia file is an audio file, the information about the user who produced the first multimedia file may be the singer. The first preset number may be, for example, 1 or 5.

When the first tag information includes the topic information to which the first multimedia file belongs, this step may be implemented through the following steps 3011-3013:

3011: The server acquires the first subtitle information of the first multimedia file.

The server stores a correspondence between multimedia file identifiers and subtitle files. Accordingly, this step may be:

According to the identifier of the first multimedia file, the server acquires the subtitle file of the first multimedia file from the correspondence between multimedia file identifiers and subtitle files, and acquires the first subtitle information of the first multimedia file from that subtitle file.

The first multimedia file may be a video file or an audio file. The identifier of the first multimedia file may be, for example, its name or serial number. The embodiments of the present disclosure do not specifically limit the identifier of the first multimedia file.

3012: The server performs word segmentation on the first subtitle information to obtain a first keyword set.

In this step, the server may perform word segmentation on the first subtitle information and form the first keyword set directly from the segmented words. Alternatively, the server may obtain the first keyword set through the following steps (1)-(2):

(1): The server performs word segmentation on the first subtitle information and forms a second keyword set from the segmented words included in the first subtitle information.

The server uses a preset word segmentation tool to segment the first subtitle information, obtains each segmented word, and forms the second keyword set from these words.

For example, suppose the first subtitle information is "最了解你的人不是你的朋友,而是你的敌人。" ("The person who knows you best is not your friend, but your enemy."). Segmenting it with the preset word segmentation tool yields the words "最" (most), "了解" (know), "你的" (your), "人" (person), "不是" (not), "你的" (your), "朋友" (friend), "而是" (but), "你的" (your) and "敌人" (enemy), so the second keyword set is {"最", "了解", "你的", "人", "不是", "你的", "朋友", "而是", "你的", "敌人"}.

The preset word segmentation tool may be StandardAnalyzer (a standard analyzer), ChineseAnalyzer (a Chinese analyzer), CJKAnalyzer (a CJK analyzer) or IKAnalyzer (the IK analyzer). The embodiments of the present disclosure do not specifically limit the preset word segmentation tool.

Keywords such as "的", "了", "么", "吧", "啊" and "最" (particles, interjections and similar function words) contribute little to the tag information. Therefore, to reduce the amount of computation and improve the accuracy of the tag information, the server may additionally remove such keywords from the second keyword set through the following step (2).

(2): The server removes keywords of a preset type from the second keyword set to obtain the first keyword set.

Keywords of the preset type may be modal particles, auxiliary words and the like. This step may be: the server tags the part of speech of each keyword in the second keyword set, finds the keywords of the preset type according to those part-of-speech tags, and removes them from the second keyword set to obtain the first keyword set.

For example, the server removes "最", "你的", "人", "不是" and "而是" from the second keyword set {"最", "了解", "你的", "人", "不是", "你的", "朋友", "而是", "你的", "敌人"} and obtains the first keyword set {"了解" (know), "朋友" (friend), "敌人" (enemy)}.
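The removal in step (2) can be sketched as a simple filter. This is a minimal illustration, not the disclosed implementation: it assumes the keywords to remove are already known as a set (the disclosure identifies them by part-of-speech tagging, which is not reproduced here), it drops repeated words as an added assumption, and it uses English glosses in place of the original tokens. The name `remove_preset_keywords` is hypothetical.

```python
def remove_preset_keywords(second_keyword_set, removed_keywords):
    """Build the first keyword set from the second keyword set."""
    first_keyword_set = []
    for keyword in second_keyword_set:
        # Skip keywords of the preset type and words already kept.
        if keyword in removed_keywords or keyword in first_keyword_set:
            continue
        first_keyword_set.append(keyword)
    return first_keyword_set


# English glosses of the segmented words from the example sentence.
second_keyword_set = ["most", "know", "your", "person", "not",
                      "your", "friend", "but", "your", "enemy"]
# Assumed result of the part-of-speech lookup for this example.
removed_keywords = {"most", "your", "person", "not", "but"}

print(remove_preset_keywords(second_keyword_set, removed_keywords))
```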

In a possible implementation, the first keyword set may contain synonyms or near-synonyms; for example, "capital" and "Beijing" are treated as synonyms. Therefore, to reduce the amount of computation, after obtaining the first keyword set the server may also merge multiple synonyms or near-synonyms in the first keyword set into a single keyword. Because this reduces the number of keywords in the first keyword set, it reduces the computation load on the server and improves the efficiency of acquiring the first tag information.

3013: The server analyzes each keyword in the first keyword set to obtain the first tag information.

This step may be implemented in a first way or a second way. In the first way, this step may be implemented through the following steps (1)-(5):

(1): The server acquires the probability of each keyword in the first subtitle information.

The server counts the number of occurrences of each keyword in the first subtitle information, calculates the sum of the occurrence counts of all keywords, and takes the ratio of each keyword's occurrence count to that sum as the probability of the keyword in the first subtitle information.

Note that if the server has merged multiple synonyms or near-synonyms in the first keyword set into one keyword, then when acquiring the probability of that keyword the server adds up the occurrences in the first subtitle information of all the synonyms or near-synonyms merged into it, calculates the sum of the occurrence counts of all keywords, and takes the ratio of the former to the latter as the probability of that keyword in the first subtitle information.
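The frequency calculation in step (1), including the synonym case, can be sketched as follows. The function name `keyword_probabilities` and the synonym table are hypothetical; the sketch assumes the subtitle text has already been segmented into a token list and uses English glosses as tokens.

```python
from collections import Counter


def keyword_probabilities(tokens, keyword_set, synonyms=None):
    """Return each keyword's share of all keyword occurrences.

    `synonyms` maps a synonym to the keyword it was merged into.
    """
    synonyms = synonyms or {}
    counts = Counter()
    for token in tokens:
        keyword = synonyms.get(token, token)  # fold synonyms together
        if keyword in keyword_set:
            counts[keyword] += 1
    total = sum(counts.values())
    if total == 0:
        return {keyword: 0.0 for keyword in keyword_set}
    return {keyword: counts[keyword] / total for keyword in keyword_set}


tokens = ["most", "know", "your", "person", "not",
          "your", "friend", "but", "your", "enemy"]
probabilities = keyword_probabilities(tokens, ["know", "friend", "enemy"])
print(probabilities)

# Synonym case: "capital" was merged into the keyword "Beijing".
merged = keyword_probabilities(["capital", "Beijing", "friend"],
                               ["Beijing", "friend"],
                               synonyms={"capital": "Beijing"})
print(merged)
```

In the first call each of the three keywords occurs once, so each receives one third; in the second call the merged keyword collects the occurrences of both of its forms.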

(2): The server acquires the probability that each keyword belongs to each piece of topic information in the topic information library, where the topic information library stores a plurality of preset pieces of topic information.

The preset topic information may be, for example, "friendship", "emotion" and "love". This step may be implemented through the following steps (2-1)-(2-2):

(2-1): For each piece of topic information, the server acquires the preset keyword set corresponding to that topic information.

For each piece of topic information in the topic information library, the server stores a correspondence between the topic information and a preset keyword set. Accordingly, this step may be:

According to the topic information, the server acquires the corresponding preset keyword set from the correspondence between topic information and preset keyword sets. The preset keyword set includes a plurality of preset keywords belonging to that topic information.

For example, the server acquires the preset keyword set {friend, friendship, loyalty} corresponding to the topic information "friendship".

(2-2): The server determines the probability that each keyword belongs to the topic information according to the probability of the keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.

For each keyword, the server checks whether the preset keyword set contains the keyword. If it does, the server takes the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information.

If the preset keyword set does not contain the keyword, the server determines that the probability that the keyword belongs to the topic information is zero.
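The rule in step (2-2) can be written as a short function. This is a minimal sketch; the name `keyword_topic_probability` and the sample probability of one third are assumptions used only for illustration.

```python
def keyword_topic_probability(keyword, keyword_probability, preset_keywords):
    """Probability that `keyword` belongs to one piece of topic information."""
    if keyword in preset_keywords:
        # Ratio of the keyword's probability to the size of the preset set.
        return keyword_probability / len(preset_keywords)
    return 0.0


# Preset keyword set for the topic information "friendship".
friendship = {"friend", "friendship", "loyalty"}

print(keyword_topic_probability("friend", 1 / 3, friendship))
print(keyword_topic_probability("enemy", 1 / 3, friendship))
```

Here "friend" is in the preset set of three keywords, so its probability is divided by three, while "enemy" is not in the set and receives zero.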

(3): The server determines the probability that the first multimedia file belongs to each piece of topic information according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each piece of topic information.

This step may be implemented through the following steps (3-1)-(3-3):

(3-1): The server forms a first probability matrix from the probabilities of the keywords in the first subtitle information, and forms a second probability matrix from the probabilities that each keyword belongs to each piece of topic information.

The server places the probability of each keyword in the first subtitle information in its own row to form the first probability matrix. For each keyword, the server places the probabilities that the keyword belongs to each piece of topic information in one row to form the second probability matrix.

The first probability matrix is an n×1 matrix and the second probability matrix is an n×m matrix, where n is the number of keywords in the first keyword set and m is the number of preset pieces of topic information in the topic information library.

For example, suppose the keywords are A, B and C, and their probabilities in the first subtitle information are $P_A$, $P_B$ and $P_C$ respectively. The topic information library contains topic 1, topic 2, topic 3 and topic 4. The probabilities that keyword A belongs to each piece of topic information are $A_1$, $A_2$, $A_3$ and $A_4$; those for keyword B are $B_1$, $B_2$, $B_3$ and $B_4$; and those for keyword C are $C_1$, $C_2$, $C_3$ and $C_4$.

The first probability matrix is then

$$\begin{pmatrix} P_A \\ P_B \\ P_C \end{pmatrix}$$

and the second probability matrix is

$$\begin{pmatrix} A_1 & A_2 & A_3 & A_4 \\ B_1 & B_2 & B_3 & B_4 \\ C_1 & C_2 & C_3 & C_4 \end{pmatrix}$$

(3-2): The server multiplies the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix.

The server determines the inverse matrix of the second probability matrix and multiplies it by the first probability matrix to obtain the third probability matrix. The third probability matrix is an m×1 matrix, and each of its rows is the probability that the first multimedia file belongs to the corresponding piece of topic information.

For example, the server obtains the third probability matrix

$$\begin{pmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \end{pmatrix}$$

(3-3): The server acquires, from the third probability matrix, the probability that the first multimedia file belongs to each piece of topic information.

Each row of the third probability matrix is the probability that the first multimedia file belongs to the corresponding piece of topic information, so the server can read these probabilities directly from the third probability matrix.

For example, if the third probability matrix is

$$\begin{pmatrix} P_1 \\ P_2 \\ P_3 \\ P_4 \end{pmatrix}$$

then $P_1$ is the probability that the first multimedia file belongs to topic information 1, $P_2$ is the probability that it belongs to topic information 2, $P_3$ is the probability that it belongs to topic information 3, and $P_4$ is the probability that it belongs to topic information 4.
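The shapes involved in steps (3-1)-(3-3) can be checked with a small sketch. The disclosure multiplies the inverse matrix of the n×m second probability matrix by the n×1 first probability matrix, but an n×m matrix with n ≠ m has no ordinary inverse and the text does not say how it is obtained. As a loudly labeled assumption, this sketch substitutes the transpose, only because it has the required m×n shape; it is a placeholder, not the disclosed computation. All values and function names are made up for illustration.

```python
def transpose(matrix):
    """Turn an n x m matrix (list of rows) into an m x n matrix."""
    return [list(column) for column in zip(*matrix)]


def multiply(left, right):
    """Multiply an r x k matrix by a k x c matrix."""
    return [[sum(a * b for a, b in zip(row, column))
             for column in zip(*right)]
            for row in left]


# n = 3 keywords, m = 4 pieces of topic information (made-up values).
first = [[0.5], [0.25], [0.25]]                      # n x 1
second = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5]]                      # n x m

# ASSUMPTION: the transpose stands in for the "inverse matrix".
third = multiply(transpose(second), first)           # m x 1
print(third)
```

Each row of `third` corresponds to one piece of topic information, which matches the m×1 shape described for the third probability matrix.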

(4): According to the probability that the first multimedia file belongs to each piece of topic information, the server selects a preset number of pieces of topic information with the highest probabilities.

To distinguish it from the earlier one, this preset number is called the second preset number. The second preset number can be set and changed as needed, and the embodiments of the present disclosure do not specifically limit it; for example, the second preset number may be 1 or 2.

(5): The server combines the selected second preset number of pieces of topic information into the first tag information of the first multimedia file.

For example, if the selected topic information is comedy and love, the first tag information of the first multimedia file is comedy and love.
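Steps (4) and (5) amount to a top-k selection, which can be sketched as follows. The function name `select_topics` and the probability values are hypothetical.

```python
def select_topics(topic_probabilities, second_preset_number):
    """Return the topics with the highest probabilities, best first."""
    ranked = sorted(topic_probabilities.items(),
                    key=lambda item: item[1], reverse=True)
    return [topic for topic, _probability in ranked[:second_preset_number]]


# Made-up probabilities that the first multimedia file belongs to each topic.
topic_probabilities = {"comedy": 0.4, "love": 0.35,
                       "thriller": 0.15, "horror": 0.1}

first_tag_information = select_topics(topic_probabilities, 2)
print(first_tag_information)
```

With a second preset number of 2, the two most probable topics form the first tag information.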

In the second way, this step may be:

The server acquires the probability of each keyword in the first subtitle information, selects a third preset number of keywords with the highest probabilities according to those probabilities, acquires the topic information to which the selected keywords belong, and combines that topic information into the tag information of the first multimedia file.
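The second way can be sketched as a lookup over the most frequent keywords. The mapping from keyword to topic information, the topic "conflict" and the name `tags_from_top_keywords` are assumptions for illustration; the disclosure does not specify how the topic of a keyword is looked up.

```python
def tags_from_top_keywords(keyword_probabilities, keyword_to_topic,
                           third_preset_number):
    """Collect the topics of the most probable keywords."""
    ranked = sorted(keyword_probabilities,
                    key=keyword_probabilities.get, reverse=True)
    tags = []
    for keyword in ranked[:third_preset_number]:
        topic = keyword_to_topic.get(keyword)
        # Ignore keywords without a known topic and avoid duplicate tags.
        if topic is not None and topic not in tags:
            tags.append(topic)
    return tags


# Made-up keyword probabilities and a hypothetical keyword-to-topic table.
keyword_probabilities = {"friend": 0.5, "enemy": 0.3, "know": 0.2}
keyword_to_topic = {"friend": "friendship", "enemy": "conflict"}

print(tags_from_top_keywords(keyword_probabilities, keyword_to_topic, 2))
```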

When the first tag information includes information about the user who produced the first multimedia file and/or information about the user who was filmed, the first subtitle information contains that information, and the server may extract it directly from the first subtitle information.

The server extracts the information about the user who produced the first multimedia file and/or the information about the user who was filmed from the first subtitle information by keyword matching.

For example, if the information about the user who produced the first multimedia file is the director of the first multimedia file, the server extracts the name that follows the keyword "director" in the first subtitle information.
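Keyword matching of this kind can be sketched with a regular expression. The subtitle layout ("Director: name" on one line) and the name `extract_name_after` are assumptions; real subtitle files may format credits differently.

```python
import re


def extract_name_after(keyword, subtitle_text):
    """Return the text following `keyword`, or None if it is absent."""
    # Allow an optional ASCII or full-width colon after the keyword and
    # stop at a line break, comma or semicolon.
    pattern = re.escape(keyword) + r"\s*[:：]?\s*([^\n,，;；]+)"
    match = re.search(pattern, subtitle_text)
    return match.group(1).strip() if match else None


# Hypothetical credit lines inside the first subtitle information.
subtitle_text = "Director: Zhang San\nStarring: Li Si"

print(extract_name_after("Director", subtitle_text))
print(extract_name_after("Producer", subtitle_text))
```

The same function works for other credit keywords, such as the one introducing the cast.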

Note that the server may acquire the tag information of every multimedia file on the server in advance and establish a correspondence between the identifier of each multimedia file and its tag information. Accordingly, this step may be:

According to the identifier of the first multimedia file, the server acquires the first tag information of the first multimedia file from the correspondence between multimedia file identifiers and tag information.

The process by which the server acquires the tag information of each multimedia file in advance is the same as the process by which it acquires the first tag information of the first multimedia file, and is not repeated here.

In step S302, the server acquires the second tag information of each second multimedia file in the multimedia file library, where the second tag information of each second multimedia file is extracted based on the second subtitle information of that second multimedia file.

For each second multimedia file, the server may acquire its second tag information by following the process described above for acquiring the first tag information of the first multimedia file. The server may also acquire the tag information of every multimedia file on the server in advance and establish a correspondence between the identifier of each multimedia file and its tag information. Accordingly, this step may be:

According to the identifier of each second multimedia file, the server acquires the second tag information of that second multimedia file from the correspondence between multimedia file identifiers and tag information.

The second tag information includes at least one of: the topic information to which the second multimedia file belongs, information about the user who produced the second multimedia file, and information about the user who was filmed.

In step S303, according to the first tag information and the second tag information of each second multimedia file in the multimedia file library, the server selects from the multimedia file library a second multimedia file whose second tag information matches the first tag information.

The server calculates the matching degree between the first tag information and the second tag information of each second multimedia file in the multimedia file library. According to these matching degrees, the server selects from the multimedia file library the second multimedia file whose second tag information has the highest matching degree, or whose matching degree exceeds a preset value.

The first tag information includes at least one of the topic information to which the first multimedia file belongs, information about the user who produced the first multimedia file, and information about the user who was filmed. Recommending a second multimedia file based on at least one of these therefore improves the accuracy of the recommendation.

For example, the first tag information includes topic information and information about the user who produced the first multimedia file, and is: love and romance, Zhang San. The multimedia file library contains three second multimedia files: second multimedia file 1, second multimedia file 2 and second multimedia file 3. The second tag information of second multimedia file 1 is: horror and thriller, Li Si. The second tag information of second multimedia file 2 is: love and romance, Zhang San. The second tag information of second multimedia file 3 is: costume drama and comedy, Zhao Wu. The server therefore selects second multimedia file 2 from the multimedia file library.
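The selection in this example can be sketched as follows. The disclosure does not define how the matching degree is computed; this sketch assumes, purely for illustration, that it is the share of tags two files have in common (Jaccard overlap). The names `matching_degree` and `select_best` are hypothetical.

```python
def matching_degree(first_tags, second_tags):
    """ASSUMED measure: shared tags divided by all distinct tags."""
    first, second = set(first_tags), set(second_tags)
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def select_best(first_tags, library):
    """Return the identifier of the best-matching second multimedia file."""
    return max(library,
               key=lambda file_id: matching_degree(first_tags, library[file_id]))


first_tags = {"love", "romance", "Zhang San"}
library = {
    "second multimedia file 1": {"horror", "thriller", "Li Si"},
    "second multimedia file 2": {"love", "romance", "Zhang San"},
    "second multimedia file 3": {"costume drama", "comedy", "Zhao Wu"},
}

print(select_best(first_tags, library))
```

To select every file above a preset value instead of only the best one, the same `matching_degree` function could be compared against a threshold for each entry in the library.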

In step S304, the server sends the identifier of the second multimedia file to the terminal.

The terminal receives the identifier of the second multimedia file sent by the server and displays it. When the user plays the second multimedia file, the terminal pulls the second multimedia file from the server according to that identifier. The identifier of the second multimedia file may be a playback link for the second multimedia file.

Because the first tag information is extracted based on the first subtitle information of the first multimedia file, and the second tag information of each second multimedia file is extracted based on the second subtitle information of that second multimedia file, the extracted first tag information and second tag information are relatively accurate. Recommending a second multimedia file to the user according to the first tag information and the second tag information of each second multimedia file can therefore improve the accuracy of the recommendation.

Fig. 4 is a block diagram of an apparatus for recommending multimedia files according to an exemplary embodiment. Referring to Fig. 4, the apparatus includes an acquisition module 401, a selection module 402 and a sending module 403.

The acquisition module 401 is configured to acquire first tag information of a first multimedia file corresponding to a terminal, where the first tag information is extracted based on first subtitle information of the first multimedia file.

The selection module 402 is configured to select, from a multimedia file library and according to the first tag information and the second tag information of each second multimedia file in the multimedia file library, a second multimedia file whose second tag information matches the first tag information, where the second tag information of each second multimedia file is extracted based on second subtitle information of that second multimedia file.

The sending module 403 is configured to send the second multimedia file to the terminal.

在一种可能实现方式中,参见图5,所述获取模块401,包括:In a possible implementation manner, referring to FIG. 5, the obtaining module 401 includes:

an obtaining unit 4011, configured to obtain the first subtitle information of the first multimedia file;

a word segmentation unit 4012, configured to perform word segmentation on the first subtitle information to obtain a first keyword set;

an analysis unit 4013, configured to analyze each keyword in the first keyword set to obtain the first tag information.

In a possible implementation, the analysis unit 4013 is further configured to: obtain the probability of each keyword in the first subtitle information, and obtain the probability that each keyword belongs to each topic information in a topic information library, the topic information library being configured to store a plurality of preset topic information; determine, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each topic information, the probability that the first multimedia file belongs to each topic information; select, according to the probability that the first multimedia file belongs to each topic information, a preset number of topic information with the highest probabilities; and compose the first tag information from the selected preset number of topic information.
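Two parts of this procedure can be sketched in Python. The sketch assumes that "the probability of each keyword in the first subtitle information" means the keyword's relative frequency among the segmented keywords, which the text does not state explicitly; `keyword_probabilities` and `top_topics` are hypothetical helper names.

```python
from collections import Counter


def keyword_probabilities(keywords):
    """Relative frequency of each keyword (assumed meaning of 'probability')."""
    counts = Counter(keywords)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def top_topics(topic_probs, preset_number):
    """Pick the preset number of topics with the highest probability."""
    ranked = sorted(topic_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, _ in ranked[:preset_number]]
```

For example, with topic probabilities `{"sports": 0.6, "news": 0.3, "food": 0.1}` and a preset number of 2, the first tag information would consist of the topics `sports` and `news`.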

In a possible implementation, the analysis unit 4013 is further configured to: compose a first probability matrix from the probability of each keyword in the first subtitle information, and compose a second probability matrix from the probability that each keyword belongs to each topic information; multiply the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix; and obtain, from the third probability matrix, the probability that the first multimedia file belongs to each topic information.
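The matrix step can be written compactly. The symbols below are introduced only for illustration: $P_1$ denotes the first probability matrix (keyword probabilities in the first subtitle information), $P_2$ the second probability matrix (probability that each keyword belongs to each topic information), and $P_3$ the third probability matrix (probability that the first multimedia file belongs to each topic information). Reading $P_1$ as a mixture of topics, $P_1 = P_2 P_3$, is a common topic-model interpretation and is an assumption here, not something the text states.

```latex
P_3 = P_2^{-1} P_1
\qquad\Longleftrightarrow\qquad
P_2 P_3 = P_1,
\qquad
P_2 \in \mathbb{R}^{n \times n},\;
P_1, P_3 \in \mathbb{R}^{n \times 1}
```

The dimensions shown assume $n$ keywords and $n$ topics, since the inverse exists only when $P_2$ is square and non-singular; the text does not say how other cases are handled.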

In a possible implementation, the analysis unit 4013 is further configured to: for each topic information, obtain a preset keyword set corresponding to the topic information; and determine the probability that each keyword belongs to the topic information according to the probability of each keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.

In a possible implementation, the analysis unit 4013 is further configured to: if the preset keyword set contains the keyword, take the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information; and if the preset keyword set does not contain the keyword, determine that the probability that the keyword belongs to the topic information is zero.
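The rule above translates directly into a short Python sketch. The function name and parameter names are hypothetical; only the ratio and the zero case come from the text.

```python
def keyword_topic_probability(keyword, keyword_prob, preset_keywords):
    """Probability that a keyword belongs to one topic.

    keyword_prob: probability of the keyword in the first subtitle information.
    preset_keywords: the preset keyword set of the topic.
    """
    if keyword in preset_keywords:
        # Ratio of the keyword's probability to the size of the preset set.
        return keyword_prob / len(preset_keywords)
    # Keyword is not in the topic's preset keyword set.
    return 0.0
```

For instance, a keyword with probability 0.5 in the subtitles and a topic whose preset keyword set holds four keywords (including this one) yields 0.5 / 4 = 0.125.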

In a possible implementation, the word segmentation unit 4012 is further configured to: perform word segmentation on the first subtitle information; compose a second keyword set from each segmented word included in the first subtitle information; and remove keywords of a preset type from the second keyword set to obtain the first keyword set.
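A minimal Python sketch of this two-stage segmentation follows. It makes two assumptions that are not in the text: segmentation is approximated by splitting on word characters (real Chinese subtitles would need a dedicated segmenter), and the "preset type" of keyword to remove is modeled as a small stop-word set. `STOP_WORDS`, `segment`, and `first_keyword_set` are hypothetical names.

```python
import re

# Hypothetical stand-in for the "preset type" of keywords to remove.
STOP_WORDS = {"the", "a", "of", "and", "is"}


def segment(subtitle_text):
    """Placeholder segmenter: lower-case the text and split on word characters."""
    return re.findall(r"\w+", subtitle_text.lower())


def first_keyword_set(subtitle_text, excluded=STOP_WORDS):
    """Build the second keyword set, then drop excluded keywords."""
    second_keyword_set = segment(subtitle_text)
    return [word for word in second_keyword_set if word not in excluded]
```

The result is kept as a list rather than a set so that repeated keywords remain available for computing per-keyword probabilities later.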

In a possible implementation, the first tag information includes at least one of: topic information to which the first multimedia file belongs, information on the user who produced the first multimedia file, and information on the user who was filmed.

Because the first tag information is extracted from the first subtitle information of the first multimedia file, and the second tag information of each second multimedia file is extracted from that file's second subtitle information, the extracted first tag information and second tag information are relatively accurate. Recommending second multimedia files to the user according to the first tag information and the second tag information of each second multimedia file therefore improves the accuracy of the recommendation.

All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present disclosure, which are not described one by one here.

It should be noted that, when the apparatus for recommending multimedia files provided by the above embodiments recommends multimedia files, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for recommending multimedia files provided by the above embodiments and the method embodiments for recommending multimedia files belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

Fig. 6 is a block diagram of an apparatus 600 for recommending multimedia files according to an exemplary embodiment. For example, the apparatus 600 may be provided as a server. Referring to Fig. 6, the apparatus 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute the instructions so as to perform the above method for recommending multimedia files.

The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. A method for recommending multimedia files, characterized in that the method comprises:
obtaining first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;
selecting, from a multimedia file library and according to the first tag information and second tag information of each second multimedia file in the multimedia file library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file;
sending an identifier of the second multimedia file to the terminal.
2. The method according to claim 1, characterized in that obtaining the first tag information of the first multimedia file corresponding to the terminal comprises:
obtaining the first subtitle information of the first multimedia file;
performing word segmentation on the first subtitle information to obtain a first keyword set;
analyzing each keyword in the first keyword set to obtain the first tag information.
3. The method according to claim 2, characterized in that analyzing each keyword in the first keyword set to obtain the first tag information comprises:
obtaining the probability of each keyword in the first subtitle information, and obtaining the probability that each keyword belongs to each topic information in a topic information library, the topic information library being used to store a plurality of preset topic information;
determining, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each topic information, the probability that the first multimedia file belongs to each topic information;
selecting, according to the probability that the first multimedia file belongs to each topic information, a preset number of topic information with the highest probabilities from the topic information;
composing the first tag information from the selected preset number of topic information.
4. The method according to claim 3, characterized in that determining, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each topic information, the probability that the first multimedia file belongs to each topic information comprises:
composing a first probability matrix from the probability of each keyword in the first subtitle information, and composing a second probability matrix from the probability that each keyword belongs to each topic information;
multiplying the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix;
obtaining, from the third probability matrix, the probability that the first multimedia file belongs to each topic information.
5. The method according to claim 3, characterized in that obtaining the probability that each keyword belongs to each topic information in the topic information library comprises:
for each topic information, obtaining a preset keyword set corresponding to the topic information;
determining the probability that each keyword belongs to the topic information according to the probability of each keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.
6. The method according to claim 5, characterized in that determining the probability that each keyword belongs to the topic information according to the probability of each keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set comprises:
if the preset keyword set contains the keyword, taking the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information;
if the preset keyword set does not contain the keyword, determining that the probability that the keyword belongs to the topic information is zero.
7. The method according to claim 2, characterized in that performing word segmentation on the first subtitle information to obtain the first keyword set comprises:
performing word segmentation on the first subtitle information, and composing a second keyword set from each segmented word included in the first subtitle information;
removing keywords of a preset type from the second keyword set to obtain the first keyword set.
8. The method according to claim 1, characterized in that the first tag information includes at least one of: topic information to which the first multimedia file belongs, information on the user who produced the first multimedia file, and information on the user who was filmed.
9. An apparatus for recommending multimedia files, characterized in that the apparatus comprises:
an acquisition module, configured to obtain first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;
a selection module, configured to select, from a multimedia file library and according to the first tag information and second tag information of each second multimedia file in the multimedia file library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file;
a sending module, configured to send an identifier of the second multimedia file to the terminal.
10. The apparatus according to claim 9, characterized in that the acquisition module comprises:
an obtaining unit, configured to obtain the first subtitle information of the first multimedia file;
a word segmentation unit, configured to perform word segmentation on the first subtitle information to obtain a first keyword set;
an analysis unit, configured to analyze each keyword in the first keyword set to obtain the first tag information.
11. The apparatus according to claim 10, characterized in that
the analysis unit is further configured to: obtain the probability of each keyword in the first subtitle information, and obtain the probability that each keyword belongs to each topic information in a topic information library, the topic information library being used to store a plurality of preset topic information; determine, according to the probability of each keyword in the first subtitle information and the probability that each keyword belongs to each topic information, the probability that the first multimedia file belongs to each topic information; select, according to the probability that the first multimedia file belongs to each topic information, a preset number of topic information with the highest probabilities from the topic information; and compose the first tag information from the selected preset number of topic information.
12. The apparatus according to claim 11, characterized in that
the analysis unit is further configured to: compose a first probability matrix from the probability of each keyword in the first subtitle information, and compose a second probability matrix from the probability that each keyword belongs to each topic information; multiply the inverse matrix of the second probability matrix by the first probability matrix to obtain a third probability matrix; and obtain, from the third probability matrix, the probability that the first multimedia file belongs to each topic information.
13. The apparatus according to claim 11, characterized in that
the analysis unit is further configured to: for each topic information, obtain a preset keyword set corresponding to the topic information; and determine the probability that each keyword belongs to the topic information according to the probability of each keyword in the first subtitle information, the preset keyword set, and the number of keywords contained in the preset keyword set.
14. The apparatus according to claim 13, characterized in that
the analysis unit is further configured to: if the preset keyword set contains the keyword, take the ratio of the probability of the keyword in the first subtitle information to the number of keywords contained in the preset keyword set as the probability that the keyword belongs to the topic information; and if the preset keyword set does not contain the keyword, determine that the probability that the keyword belongs to the topic information is zero.
15. The apparatus according to claim 10, characterized in that
the word segmentation unit is further configured to: perform word segmentation on the first subtitle information; compose a second keyword set from each segmented word included in the first subtitle information; and remove keywords of a preset type from the second keyword set to obtain the first keyword set.
16. The apparatus according to claim 9, characterized in that the first tag information includes at least one of: topic information to which the first multimedia file belongs, information on the user who produced the first multimedia file, and information on the user who was filmed.
17. An apparatus for recommending multimedia files, characterized in that it comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: obtain first tag information of a first multimedia file corresponding to a terminal, the first tag information being extracted based on first subtitle information of the first multimedia file;
select, from a multimedia file library and according to the first tag information and second tag information of each second multimedia file in the multimedia file library, a second multimedia file whose second tag information matches the first tag information, the second tag information of each second multimedia file being extracted based on second subtitle information of that second multimedia file;
send the second multimedia file to the terminal.
CN201611235464.6A 2016-12-28 2016-12-28 Method and device for recommending multi-media files Pending CN106611059A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611235464.6A CN106611059A (en) 2016-12-28 2016-12-28 Method and device for recommending multi-media files

Publications (1)

Publication Number Publication Date
CN106611059A true CN106611059A (en) 2017-05-03

Family

ID=58636204

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611235464.6A Pending CN106611059A (en) 2016-12-28 2016-12-28 Method and device for recommending multi-media files

Country Status (1)

Country Link
CN (1) CN106611059A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102244812A (en) * 2010-06-23 2011-11-16 微软公司 Video content recommendation
CN103488764A (en) * 2013-09-26 2014-01-01 天脉聚源(北京)传媒科技有限公司 Personalized video content recommendation method and system
CN103744835A (en) * 2014-01-02 2014-04-23 上海大学 Text keyword extracting method based on subject model
CN104093037A (en) * 2014-06-10 2014-10-08 腾讯科技(深圳)有限公司 Subtitle correction method and apparatus
CN105574132A (en) * 2015-12-15 2016-05-11 海信集团有限公司 Multimedia file recommendation method and terminal
CN105898495A (en) * 2016-05-26 2016-08-24 维沃移动通信有限公司 Method for pushing mobile terminal recommended information and mobile terminal

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108156480A (en) * 2017-12-27 2018-06-12 腾讯科技(深圳)有限公司 A kind of method, relevant apparatus and the system of video caption generation
CN116244496A (en) * 2022-12-06 2023-06-09 山东紫菜云数字科技有限公司 A resource recommendation method based on industry chain
CN116244496B (en) * 2022-12-06 2023-12-01 山东紫菜云数字科技有限公司 Resource recommendation method based on industrial chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170503
