CN104572952B

CN104572952B - Method and device for identifying live multimedia files

Info

Publication number: CN104572952B
Application number: CN201410849032.9A
Authority: CN
Inventors: 谭傅伦; 许泽军; 王晓萌; 王英杰; 袁斌
Original assignee: LeTV Information Technology Beijing Co Ltd
Current assignee: Shanghai Zongzhang Technology Group Co.,Ltd.
Priority date: 2014-12-29
Filing date: 2014-12-29
Publication date: 2018-04-17
Anticipated expiration: 2034-12-29
Also published as: CN104572952A

Abstract

The invention discloses a method and device for identifying live multimedia files. The method comprises: obtaining, from the real-time data stream of an input live multimedia file, feature information of the live multimedia file for the current period; locating, according to the identification information of the live multimedia file, the multimedia record to be updated in a feature database; updating the feature sample in that multimedia record according to the feature information of the current period; receiving an identification request for a target multimedia file, and matching the feature information of the target multimedia file included in the request against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file; and obtaining the identification information of the multimedia file corresponding to the target multimedia file. The invention makes it possible to identify live video.

Description

Method and device for identifying live multimedia files

Technical field

The present invention relates to the technical field of video identification, and in particular to a method and device for identifying live multimedia files.

Background

Current video search is usually performed with "keywords" describing the video. This requires not only that the user know information about the video, but also that the search service provider maintain, in a timely manner, a database of keywords corresponding one-to-one with videos. In practice, we often find ourselves in an awkward position: we come across an interesting video on the street or in front of the TV, but we are unfamiliar with it or know nothing about it at all, let alone how to find it by keyword search.

Sound-based video recognition has therefore emerged in response to this practical need. In sound-based video recognition, when a user wants to identify a video, the recording device of a mobile terminal (such as a smartphone) first captures the sound of the video; feature data representing that sound is then matched against a feature database on a cloud server, and the matching result (a video stream or information related to the video) is returned to the mobile terminal.

However, video content is updated and published quickly, and many videos are delivered as live webcasts, so the video a user wants to identify is often one that is being broadcast live. In the prior-art method above, the cloud server builds the feature database from the audio of a video only after it has obtained the complete video from the video source. The prior-art method therefore cannot identify live video.

No effective solution has yet been proposed to the problem that the prior art cannot identify live video.

Summary of the invention

The main object of the present invention is to provide a method and device for identifying live multimedia files, so as to solve the problem that the prior art cannot identify live video.

According to one aspect of the present invention, a method for identifying live multimedia files is provided.

The method for identifying live multimedia files according to the present invention comprises: obtaining, from the real-time data stream of an input live multimedia file, feature information of the live multimedia file for the current period; locating, according to the identification information of the live multimedia file, the multimedia record to be updated in a feature database, wherein the feature database stores at least one multimedia record, each multimedia record includes a feature sample of a multimedia file and the identification information corresponding to the feature sample, and the feature sample spans a first predetermined time; updating the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file for the current period; receiving an identification request for identifying a target multimedia file, and matching the feature information of the target multimedia file included in the identification request against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file; and obtaining the identification information of the multimedia file corresponding to the target multimedia file.
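A minimal Python sketch of the overall flow, offered under stated assumptions rather than as the patented implementation: fingerprints are modeled as hashable values, the first predetermined time is measured as a fixed number of fingerprints, and the match rate is taken to be the fraction of query fingerprints found in a record's sample. The names `FeatureDatabase`, `update` and `identify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MultimediaRecord:
    identifier: str        # identification information, e.g. a channel name
    feature_sample: list   # fingerprints in chronological order


class FeatureDatabase:
    """Hypothetical feature database holding one record per live source."""

    def __init__(self, sample_length):
        # Assumption: the "first predetermined time" is expressed as a
        # number of fingerprints rather than in seconds.
        self.sample_length = sample_length
        self.records = {}

    def update(self, identifier, current_features):
        # Locate the record by identifier (created on first use), append the
        # current period's features and keep only the most recent window.
        record = self.records.setdefault(identifier, MultimediaRecord(identifier, []))
        combined = record.feature_sample + list(current_features)
        record.feature_sample = combined[-self.sample_length:]

    def identify(self, query_features):
        # Match the query against every sample and return the identifier with
        # the highest match rate, or None if nothing matches at all.
        if not query_features:
            return None
        best_id, best_rate = None, 0.0
        for record in self.records.values():
            sample = set(record.feature_sample)
            rate = sum(f in sample for f in query_features) / len(query_features)
            if rate > best_rate:
                best_id, best_rate = record.identifier, rate
        return best_id


db = FeatureDatabase(sample_length=4)
db.update("CCTV1", [1, 2, 3])
db.update("CCTV1", [4, 5])            # sample is now [2, 3, 4, 5]
db.update("CCTV5", [10, 11, 12, 13])
print(db.identify([4, 5]))            # CCTV1
```

Keying records by identifier makes "locate the record to be updated" a single dictionary lookup, while identification has to scan every record because the query carries no identifier.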

Further, the feature information is fingerprint information of the audio data of a multimedia file, and obtaining the feature information of the live multimedia file for the current period from the input real-time data stream comprises: obtaining, from the real-time data stream, the audio data of the live multimedia file for the current period; dividing the audio data of the current period, in chronological order, into multiple audio clips each lasting a second predetermined time, the second predetermined time being shorter than the first predetermined time; and extracting the fingerprint information of each audio clip to obtain the feature information of the live multimedia file for the current period.
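The segmentation step can be sketched as follows, assuming the audio of the current period is a list of PCM samples at a known sample rate. `toy_fingerprint` is a hypothetical placeholder (a hash of coarsely quantized samples), not the fingerprint used by the patent, and dropping a trailing partial clip is likewise an assumption.

```python
import hashlib


def split_into_clips(samples, sample_rate, clip_seconds):
    """Split audio samples, in chronological order, into clips of clip_seconds.

    A trailing partial clip is dropped in this sketch.
    """
    clip_len = round(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip duration must cover at least one sample")
    return [samples[i:i + clip_len]
            for i in range(0, len(samples) - clip_len + 1, clip_len)]


def toy_fingerprint(clip):
    """Hypothetical stand-in for a real audio fingerprint of one clip."""
    # Coarsely quantise each sample to a byte, then hash the result.
    quantised = bytes((int(x * 8) + 128) % 256 for x in clip)
    return hashlib.sha1(quantised).hexdigest()[:16]


def current_period_features(samples, sample_rate, clip_seconds):
    """Feature information of the current period: one fingerprint per clip."""
    return [toy_fingerprint(c)
            for c in split_into_clips(samples, sample_rate, clip_seconds)]


features = current_period_features([0.0] * 10 + [0.5] * 10,
                                   sample_rate=10, clip_seconds=0.5)
print(len(features))   # 4
```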

Further, the feature sample consists of the fingerprint information of n audio clips, whose total duration is the first predetermined time, and the feature information of the live multimedia file for the current period consists of the fingerprint information of m audio clips, where m < n. Updating the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file comprises: deleting the earliest m fingerprints from the feature sample in the multimedia record to be updated; and placing the m fingerprints of the current period of the live multimedia file, in chronological order, into the feature sample of the multimedia record to be updated.
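The delete-m-then-append-m rule keeps the sample a fixed-length sliding window over the most recent n fingerprints. A minimal sketch, with fingerprints modeled as plain strings for illustration:

```python
def update_feature_sample(feature_sample, current_features):
    """Drop the earliest m fingerprints and append the m newest ones.

    feature_sample holds n fingerprints (the first predetermined time);
    current_features holds the m fingerprints of the current period, m < n.
    Returns a new list; the input list is not modified.
    """
    n, m = len(feature_sample), len(current_features)
    if not m < n:
        raise ValueError(f"expected m < n, got m={m}, n={n}")
    updated = feature_sample[m:]           # delete the earliest m entries
    updated.extend(current_features)       # append the m newest, in order
    return updated                         # still n fingerprints long


old = ["f1", "f2", "f3", "f4", "f5"]
print(update_feature_sample(old, ["f6", "f7"]))
# ['f3', 'f4', 'f5', 'f6', 'f7']
```

A `collections.deque` created with `maxlen=n` gives the same result when the m new fingerprints are appended one at a time, because a full bounded deque discards items from the opposite end.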

Further, updating the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file for the current period specifically comprises: Step S1: point a feature pointer at the first fingerprint in the feature information of the current period of the live multimedia file, and reset a timer to zero to start timing the feature extraction; Step S2: obtain the fingerprint pointed to by the feature pointer; Step S3: extract the feature sample of the multimedia record corresponding to the identification information of the live multimedia file to obtain a first feature sample; Step S4: append the fingerprint pointed to by the feature pointer to the end of the first feature sample to obtain a second feature sample; Step S5: delete one fingerprint from the beginning of the second feature sample; Step S6: determine whether the time on the timer has reached a third predetermined time; if not, point the feature pointer at the next fingerprint and repeat steps S2 to S6; if so, replace the feature sample corresponding to the multimedia identifier in the multimedia record with the resulting second feature sample. The third predetermined time is the playing time of the multimedia content corresponding to the m fingerprints.
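Steps S1 to S6 can be sketched as a loop, under two assumptions that are not in the text. First, the real timer is simulated: each consumed fingerprint advances it by the clip duration `clip_ms`, so the third predetermined time is m × `clip_ms`. Second, step S3 reads the stored record only once and later iterations continue on the working sample, since re-reading the still-unchanged record on every pass would discard earlier appends. `stepwise_update` is a hypothetical name.

```python
def stepwise_update(records, identifier, current_features, clip_ms):
    """Steps S1-S6, with the timer simulated in milliseconds.

    records maps identifier -> feature sample (list of fingerprints).
    Assumption: each consumed fingerprint advances the timer by clip_ms, so
    the third predetermined time is len(current_features) * clip_ms.
    """
    if not current_features:
        return records[identifier]
    if clip_ms <= 0:
        raise ValueError("clip_ms must be positive")
    third_time_ms = len(current_features) * clip_ms
    pointer, timer_ms = 0, 0                         # S1: first fingerprint, timer reset
    sample = list(records[identifier])               # S3 (read once): first feature sample
    while True:
        fingerprint = current_features[pointer]      # S2: fingerprint under the pointer
        sample = sample + [fingerprint]              # S4: splice onto the end
        sample = sample[1:]                          # S5: delete one from the start
        timer_ms += clip_ms                          # simulated passage of time
        if timer_ms >= third_time_ms:                # S6: third predetermined time reached
            records[identifier] = sample             # replace the stored feature sample
            return sample
        pointer += 1                                 # S6: advance and repeat S2-S6


records = {"CCTV1": ["a", "b", "c", "d"]}
print(stepwise_update(records, "CCTV1", ["e", "f"], clip_ms=500))
# ['c', 'd', 'e', 'f']
```

The net effect matches the batch rule of deleting the earliest m fingerprints and appending the m new ones; the stepwise form simply lets the sample advance one fingerprint at a time while the stream is still arriving.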

Further, extracting the fingerprint information of an audio clip comprises: merging the left-channel data and the right-channel data of the audio clip to obtain the stereo data of the audio clip; and extracting time-frequency feature data of the stereo data of the audio clip as the fingerprint information of the audio clip.
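A toy illustration of the two sub-steps, with both techniques chosen here as assumptions. The merge is modeled as an equally weighted combination of the two channels. The time-frequency feature is modeled as the index of the strongest DFT bin in each frame, computed with a naive DFT to stay dependency-free; production fingerprinting would use an FFT library and a richer feature.

```python
import cmath
import math


def merge_channels(left, right, a=0.5):
    """Merge left/right channel data: s = a*l + (1 - a)*r (equal weights by default)."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    return [a * lv + (1 - a) * rv for lv, rv in zip(left, right)]


def dominant_bins(signal, frame_len):
    """Toy time-frequency feature: index of the strongest DFT bin in each frame.

    Uses a naive DFT over bins 1..frame_len//2 (DC skipped).
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    peaks = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * t / frame_len)
                        for t, x in enumerate(frame)))
                for k in range(1, frame_len // 2 + 1)]
        peaks.append(1 + mags.index(max(mags)))
    return peaks


tone = [math.sin(2 * math.pi * 3 * t / 16) for t in range(32)]  # 3 cycles per 16-sample frame
print(dominant_bins(merge_channels(tone, tone), frame_len=16))   # [3, 3]
```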

Further, the feature information of the target multimedia file included in the identification request consists of N fingerprints of the current period of the live multimedia file, each of which is the time-frequency feature data of one of N pieces of stereo data of the target multimedia file. The i-th piece of stereo data is $s_i' = a_i' l' + b_i' r'$ with $a_i' + b_i' = 1$, where $l'$ is the left-channel data and $r'$ the right-channel data of the current period of the live multimedia file, $a_i'$ and $b_i'$ are preset parameters, and $i = 1, 2, 3, \ldots, N$. In this method, matching the feature information of the target multimedia file included in the identification request against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file comprises: matching each fingerprint of the target multimedia file against the feature samples in the feature database to obtain a matching rate for each fingerprint; and taking the multimedia record containing the feature sample with the highest matching rate as the multimedia record corresponding to the target multimedia file.
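The N-mix matching can be sketched as follows. Each preset $a_i'$ yields one candidate signal $s_i'$, each candidate is fingerprinted and matched, and the record with the highest matching rate wins. The quantizing `fingerprint` and the best-alignment `match_rate` are hypothetical stand-ins chosen for illustration; the text does not specify either.

```python
def mix(left, right, a):
    """One candidate signal s_i' = a_i' * l' + b_i' * r', with b_i' = 1 - a_i'."""
    b = 1 - a
    return [a * lv + b * rv for lv, rv in zip(left, right)]


def fingerprint(signal, step=0.25):
    """Hypothetical toy fingerprint: the signal quantised to a coarse grid."""
    return tuple(round(x / step) for x in signal)


def match_rate(query, sample):
    """Best fraction of equal elements over all alignments of query in sample."""
    if not query or len(query) > len(sample):
        return 0.0
    best = 0.0
    for offset in range(len(sample) - len(query) + 1):
        window = sample[offset:offset + len(query)]
        hits = sum(q == s for q, s in zip(query, window))
        best = max(best, hits / len(query))
    return best


def identify(left, right, weights, database):
    """Fingerprint one mix per preset a_i'; keep the record with the top rate."""
    best_id, best_rate = None, 0.0
    for a in weights:
        fp = fingerprint(mix(left, right, a))
        for record_id, sample in database.items():
            rate = match_rate(fp, sample)
            if rate > best_rate:
                best_id, best_rate = record_id, rate
    return best_id, best_rate


database = {"CCTV1": [0, 4, 0, 4, 0], "CCTV5": [2, 2, 3, 3]}
left, right = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]
print(identify(left, right, weights=(0.5, 0.0), database=database))
# ('CCTV1', 1.0)
```

In this example the balanced mix ($a_1' = 0.5$) matches only half of one sample, while the right-channel-only mix ($a_2' = 0$) matches "CCTV1" fully. Trying several mixes hedges against the user's recording capturing the two channels in unknown proportions.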

According to another aspect of the present invention, a device for identifying live multimedia files is provided.

The device for identifying live multimedia files according to the present invention comprises: an acquisition module, configured to obtain, from the real-time data stream of an input live multimedia file, feature information of the live multimedia file for the current period; a locating module, configured to locate, according to the identification information of the live multimedia file, the multimedia record to be updated in a feature database, wherein the feature database stores at least one multimedia record, each multimedia record includes a feature sample of a multimedia file and the identification information corresponding to the feature sample, and the feature sample spans a first predetermined time; an update module, configured to update the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file for the current period; a matching module, configured to receive an identification request for identifying a target multimedia file and to match the feature information of the target multimedia file included in the identification request against the feature samples in the feature database, so as to locate the multimedia record corresponding to the target multimedia file; and an identification module, configured to obtain the identification information of the multimedia file corresponding to the target multimedia file.

Further, the feature information is fingerprint information of the audio data of a multimedia file, and the acquisition module comprises: an audio data acquisition module, configured to obtain, from the real-time data stream, the audio data of the live multimedia file for the current period; an audio clip segmentation module, configured to divide the audio data of the current period, in chronological order, into multiple audio clips each lasting a second predetermined time, the second predetermined time being shorter than the first predetermined time; and a fingerprint extraction module, configured to extract the fingerprint information of each audio clip to obtain the feature information of the live multimedia file for the current period.

Further, the feature sample consists of the fingerprint information of n audio clips, whose total duration is the first predetermined time, and the feature information of the live multimedia file for the current period consists of the fingerprint information of m audio clips, where m < n. The update module comprises: a deletion module, configured to delete the earliest m fingerprints from the feature sample in the multimedia record to be updated; and an addition module, configured to place the m fingerprints of the current period of the live multimedia file, in chronological order, into the feature sample of the multimedia record to be updated.

Further, the update module specifically performs the following steps:

Step S1: point a feature pointer at the first fingerprint in the feature information of the current period of the live multimedia file, and reset a timer to zero to start timing the feature extraction; Step S2: obtain the fingerprint pointed to by the feature pointer; Step S3: extract the feature sample of the multimedia record corresponding to the identification information of the live multimedia file to obtain a first feature sample; Step S4: append the fingerprint pointed to by the feature pointer to the end of the first feature sample to obtain a second feature sample; Step S5: delete one fingerprint from the beginning of the second feature sample; Step S6: determine whether the time on the timer has reached a third predetermined time; if not, point the feature pointer at the next fingerprint and repeat steps S2 to S6; if so, replace the feature sample corresponding to the multimedia identifier in the multimedia record with the resulting second feature sample. The third predetermined time is the playing time of the multimedia content corresponding to the m fingerprints.

Further, the fingerprint extraction module comprises: a stereo data synthesis module, configured to merge the left-channel data and the right-channel data of an audio clip to obtain the stereo data of the audio clip; and a time-frequency feature extraction module, configured to extract time-frequency feature data of the stereo data of the audio clip as the fingerprint information of the audio clip.

Further, the feature information of the target multimedia file included in the identification request consists of N fingerprints of the current period of the live multimedia file, each of which is the time-frequency feature data of one of N pieces of stereo data of the target multimedia file. The i-th piece of stereo data is $s_i' = a_i' l' + b_i' r'$ with $a_i' + b_i' = 1$, where $l'$ is the left-channel data and $r'$ the right-channel data of the current period of the live multimedia file, $a_i'$ and $b_i'$ are preset parameters, and $i = 1, 2, 3, \ldots, N$. In this device, the matching module comprises: a matching-rate determination module, configured to match each fingerprint of the target multimedia file against the feature samples in the feature database to obtain a matching rate for each fingerprint; and a multimedia record determination module, configured to take the multimedia record containing the feature sample with the highest matching rate as the multimedia record corresponding to the target multimedia file.

With the present invention, a feature database is preset to store feature information of live multimedia. Specifically, the feature database stores at least one multimedia record; each record includes a feature sample of a multimedia file and the identification information corresponding to the feature sample, and the feature sample spans a first predetermined time. When a real-time data stream of a live multimedia file is input, the feature information of the live multimedia file for the current period is first obtained from that stream; the multimedia record to be updated is then located in the feature database according to the identification information of the live multimedia file, and the feature sample in that record is updated with the current-period feature information. This ensures that the feature database always holds the latest feature information of the live multimedia file.
When an identification request for a target multimedia file is received, the feature information of the target multimedia file included in the request is matched against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file, and the identification information of the corresponding multimedia file is then obtained. The target multimedia file is thereby identified, which solves the prior-art problem that live video cannot be identified.

The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be construed as limiting the present invention. Throughout the drawings, the same reference signs denote the same components. In the drawings:

Fig. 1 is a flowchart of the method according to Embodiment 1 of the present invention;

Fig. 2 is a flowchart of the method according to Embodiment 2 of the present invention;

Fig. 3 is a flowchart of the method according to Embodiment 3 of the present invention;

Fig. 4 is a schematic diagram of the system according to Embodiment 4 of the present invention;

Fig. 5 is a block diagram of the terminal according to Embodiment 4 of the present invention;

Fig. 6 is a block diagram of the video retrieval server according to Embodiment 4 of the present invention;

Fig. 7 is a block diagram of the fingerprint management server according to Embodiment 4 of the present invention;

Fig. 8 is a block diagram of the video management server according to Embodiment 4 of the present invention; and

Fig. 9 is a block diagram of the device according to Embodiment 5 of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another.

An embodiment of the present invention provides a method for identifying live multimedia files. In this method, a feature database is preset; while a live multimedia file is being broadcast, the feature database is updated from the real-time data stream of the live multimedia file, so that it always holds the latest information about the currently live multimedia files. When a user wants to identify a live target multimedia file, the feature information of the target multimedia file is matched against the feature samples in the preset feature database; if it successfully matches a feature sample, the identification information corresponding to that feature sample identifies the target multimedia file.

It should be noted that there is usually a time difference between the broadcast time of a live multimedia file and the time its real-time data stream is generated: the broadcast time (that is, the time the content is delivered to users) is later than the generation time of the real-time data stream. The embodiments of the present invention exploit this time difference so that the feature samples in the feature database are updated earlier than, or at the same time as, the live broadcast. The feature database therefore always holds the feature information of the currently live multimedia file, and a live target multimedia file can be matched in the feature database and thus identified.
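The time-difference argument can be stated as a simple inequality. The symbols are introduced here for illustration and do not appear in the original: $t_g$ is when a stretch of content enters the real-time data stream, $\Delta_p$ is the time the server needs to extract its features and write them into the feature database, and $\Delta_b$ is the delay before that content is broadcast to users.

```latex
t_{\text{update}} = t_g + \Delta_p, \qquad
t_{\text{air}} = t_g + \Delta_b, \qquad
t_{\text{update}} \le t_{\text{air}} \iff \Delta_p \le \Delta_b
```

So the database is ahead of, or in step with, the broadcast exactly when feature extraction and the database update take no longer than the broadcast delay.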

Specifically, the feature database stores one or more multimedia records, each corresponding to one live multimedia file: if the multimedia file is video, each record corresponds to one live video; if it is audio, each record corresponds to one live audio stream. Each multimedia record includes a feature sample of a live multimedia file and the identification information of that live multimedia file. The feature sample consists of the feature information of the multimedia file over a preset fixed duration; for example, for video, the feature sample consists of the feature information of a fixed-duration stretch of the video. The identification information is information that distinguishes one live multimedia file from another.

When a real-time data stream of a live multimedia file is input, the feature information of the live multimedia file for the current period is obtained from that stream, the multimedia record corresponding to the live multimedia file is found in the feature database using its identification information, and the feature sample in that record is updated with the feature information. This ensures that the feature database holds the latest feature information of the live multimedia file.

When an identification request for a target multimedia file is received, the feature information of the target multimedia file is obtained from the request and matched against the feature samples in the feature database. The matched feature sample locates a multimedia record, from which the identification information is read; this identification information yields the identification result for the target multimedia file.

Any of the identification methods of the embodiments of the present invention can be used in a method for searching live multimedia files. In such a search method, the target multimedia file is identified by the identification method of the embodiments, that is, its identification information is obtained; the link to the target multimedia file is then found from the identification information, and the target multimedia file is thereby found.

For example, a mobile phone user sees a video being broadcast live on a street advertising screen and wants to find and play it on the phone. The user operates the phone, which records the sound of the live video, generates an identification request for the target video from the recorded sound data, and sends the request to a cloud server. After the cloud server identifies the target video using the identification method of the embodiments of the present invention, there are two possibilities. In one, the cloud server returns the identification information of the target video to the phone, which looks up the link to the target video from that information and plays the video through the link. In the other, the cloud server looks up the link to the target video from its identification information and returns the link to the phone, which plays the video through that link.
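The two reply variants can be sketched as follows. `LINK_TABLE`, the reply dictionaries and the `example.com` URL are hypothetical; the text does not say how links are stored or how replies are encoded.

```python
# Hypothetical link table; the text does not say where links are stored.
LINK_TABLE = {"CCTV1": "https://example.com/live/cctv1"}


def server_reply(identifier, resolve_on_server):
    """Cloud server's reply once the target video has been identified."""
    if resolve_on_server:
        # Second variant: the server looks up the link and returns it.
        return {"link": LINK_TABLE[identifier]}
    # First variant: the server returns only the identification information.
    return {"identifier": identifier}


def link_on_terminal(reply):
    """Phone side: obtain a playable link from either kind of reply."""
    if "link" in reply:
        return reply["link"]
    return LINK_TABLE[reply["identifier"]]   # terminal does the lookup itself


for variant in (False, True):
    print(link_on_terminal(server_reply("CCTV1", resolve_on_server=variant)))
```

Either way the phone ends up with the same link; the variants differ only in which side performs the identifier-to-link lookup.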

Various embodiments provided by the present invention are described in detail below.

Embodiment 1

Embodiment 1 provides a method for identifying live multimedia files. The method is executed by a cloud server, and the identification request for the target multimedia file is sent by a user terminal. A feature database is preset in the cloud server and stores the latest information of the currently live multimedia files. When the user terminal needs to identify a live target multimedia file, the cloud server receives the identification request, matches the feature information of the target multimedia file against the feature samples in the preset feature database, and identifies the target multimedia file by the identification information corresponding to the successfully matched feature sample.

Fig. 1 is a flowchart of the method according to Embodiment 1 of the present invention. As shown in Fig. 1, the method comprises the following steps S102 to S116, where steps S102 to S108 update the feature database and steps S110 to S116 identify the target multimedia file using the feature database.

Step S102: obtain the real-time data stream and the identification information of the live multimedia file.

The cloud server communicates with the back-end database server of the live broadcast source and obtains the real-time data stream of the live multimedia file in real time; together with the stream, it can also obtain the identification information of the live multimedia file. The multimedia file here may be video or audio, in which case the obtained real-time data stream is a video stream or an audio stream respectively. The identification information of a live multimedia file is information that uniquely identifies that live multimedia file among multiple multimedia files (including both live and non-live multimedia files).

A live broadcast source can broadcast only one multimedia file at a time, and the identity information of a live source is simple, unique and easily recognizable. The identification information of a live multimedia file is therefore preferably the identity information of its live source. For example, when the live multimedia file is a video, its identification information is the channel data of the video source broadcasting it: if the live multimedia file is the news program Xinwen Lianbo, its identification information is "CCTV1". Likewise, when the live multimedia file is audio, its identification information is the channel data of the audio source broadcasting it: if the live multimedia file is a serialized storytelling (pingshu) radio broadcast, its identification information is "Central People's Broadcasting Station" (中央人民广播电台).

Step S104: Obtain the feature information of the live multimedia file for the current period from the real-time data stream.

Each time the cloud server obtains the real-time data stream of the live multimedia file, it processes the stream with a preset feature extraction module to extract its feature data, obtaining the feature information of the live multimedia file for the current period.

For example, after the video stream of a live video is obtained, the feature extraction module processes the video stream to extract the audio fingerprint of the live video, which serves as the feature information of the live video for the current period.

Step S106: Locate the multimedia record to be updated in the feature database according to the identification information of the live multimedia file.

The feature database stores multiple multimedia records. Each record consists of two parts: a feature sample of a multimedia file and the identification information of the multimedia file to which the feature sample corresponds. The time length of a feature sample is fixed: each update replaces the feature information in the sample with updated feature information, but the sample's time length does not change. This ensures that the feature samples stored in the feature database always hold the feature information of the most recent period of each live multimedia file.
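The record layout described above can be sketched as follows. This is a minimal illustration, assuming a Python dictionary keyed by identification information and a bounded `collections.deque` for the fixed-length feature sample; the class and field names (`MultimediaRecord`, `sample_length`) are hypothetical and do not come from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MultimediaRecord:
    """Hypothetical record: identification information plus a fixed-length feature sample."""
    identifier: str        # e.g. the channel name "CCTV1"
    sample_length: int     # fixed number of feature units kept in the sample
    sample: deque = field(default_factory=deque)

    def __post_init__(self):
        # A bounded deque keeps the sample at a fixed length: appending new
        # units automatically discards the oldest ones.
        self.sample = deque(self.sample, maxlen=self.sample_length)


# The feature database as a dict keyed by identification information, which
# makes locating the record to update (step S106) a single lookup.
feature_db = {
    "CCTV1": MultimediaRecord("CCTV1", sample_length=4, sample=[1, 2, 3, 4]),
    "CCTV5": MultimediaRecord("CCTV5", sample_length=4, sample=[7, 8, 9, 10]),
}

record = feature_db["CCTV1"]      # step S106: locate the record by identifier
record.sample.extend([5, 6])      # newer feature units arrive
# list(record.sample) is now [3, 4, 5, 6]; the sample length stays 4
```

The bounded deque is only one convenient way to enforce the fixed time length; any structure that drops old units as new ones arrive would serve.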

After the identification information of the live multimedia file has been obtained in step S102, step S106 searches the feature database and locates the multimedia record corresponding to that identification information; this record is the multimedia record to be updated.

Step S108: Update the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file.

Once the multimedia record to be updated has been located, its feature sample is updated with the feature information of the live multimedia file. If the time length of the feature information for the current period is greater than or equal to that of the feature sample, all of the feature information in the sample can be replaced; if it is shorter than the feature sample, a partial update can be performed.
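A minimal sketch of the full and partial updates, assuming a feature sample is a Python list of feature units ordered oldest first and that a partial update discards as many of the oldest units as are added (the approach Embodiment 2 spells out). The function name `update_sample` is hypothetical.

```python
def update_sample(sample, new_features):
    """Update a fixed-length feature sample (oldest unit first) with newer features."""
    n = len(sample)
    if n == 0:
        return []
    if len(new_features) >= n:
        # Full update: the new features cover the whole sample length,
        # so keep only the newest n units.
        return list(new_features[-n:])
    # Partial update: shift out as many old units as arrive, append the new ones.
    return list(sample[len(new_features):]) + list(new_features)
```

In both branches the returned sample has the same length as the input, which is what keeps the stored time length fixed.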

Whichever update method is used, data conflicts can be avoided by backing up the current feature database, updating the backup, and then overwriting the original feature database with the updated backup.
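The backup-then-overwrite pattern can be sketched as below, assuming an in-memory dictionary stands in for the feature database and that identification requests read through `snapshot()`. The class and method names are hypothetical; a real deployment might instead rely on database transactions or table swaps.

```python
import copy
import threading


class FeatureDatabase:
    """Hypothetical in-memory stand-in illustrating backup, update, overwrite."""

    def __init__(self, records):
        self._records = records              # identifier -> feature sample (list)
        self._write_lock = threading.Lock()  # serializes concurrent updaters

    def snapshot(self):
        """Identification requests read whichever complete dict is current."""
        return self._records

    def apply_updates(self, updates):
        with self._write_lock:
            # 1. Back up the current database.
            backup = copy.deepcopy(self._records)
            # 2. Update only the backup, so readers never see a half-applied update.
            for identifier, new_sample in updates.items():
                backup[identifier] = list(new_sample)
            # 3. Overwrite the original by swapping in the updated backup.
            self._records = backup
```

A request that obtained a snapshot before the update keeps reading the old, consistent data; requests arriving afterwards see the new data.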

The cloud server receives the data streams of different live multimedia files in real time and keeps the feature database up to date through steps S102 to S108, so that for every live multimedia file the feature database always holds the feature information of the most recent period.

Step S110: Receive an identification request for identifying a target multimedia file.

The identification request may be sent by a user terminal. When a user wants to identify a multimedia file that is being broadcast live (this application defines such a live multimedia file to be identified as the target multimedia file), the terminal captures the data stream of the target multimedia file over a period of time. When the captured data stream is large, the user terminal can extract the feature information of the target multimedia file and package it into an identification request sent to the cloud server; the cloud server then receives an identification request containing the feature information, which reduces the amount of data transmitted. When the captured data stream is small, the user terminal can package the data stream itself into the identification request; the cloud server then receives an identification request containing the data stream, which lowers the data-processing requirements on the user terminal.
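The terminal-side choice between sending features and sending the raw stream, together with the corresponding server-side handling in step S112, might look like the sketch below. The 64 KiB threshold, the request dictionary layout and the `toy_extract` feature extractor are hypothetical placeholders; the patent only distinguishes "large" from "small" captures.

```python
# Hypothetical size threshold separating "large" from "small" captures.
MAX_RAW_BYTES = 64 * 1024


def build_request(raw_stream: bytes, extract_features):
    """User terminal: build an identification request for a captured stream."""
    if len(raw_stream) > MAX_RAW_BYTES:
        # Large capture: send compact features to cut transmission volume.
        return {"type": "features", "payload": extract_features(raw_stream)}
    # Small capture: send the raw stream so the terminal does no heavy work.
    return {"type": "stream", "payload": raw_stream.hex()}


def features_from_request(request, extract_features):
    """Cloud server (step S112): parse features, or extract them from the stream."""
    if request["type"] == "features":
        return request["payload"]
    return extract_features(bytes.fromhex(request["payload"]))


def toy_extract(data: bytes):
    """Hypothetical stand-in for the real feature extraction module."""
    return [sum(data[i:i + 1024]) % 256 for i in range(0, len(data), 1024)]
```

Because both sides call the same extractor, the server obtains identical feature information whichever request type the terminal chooses.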

Step S112: Obtain the feature information of the target multimedia file according to the identification request.

After receiving the identification request, the cloud server obtains the feature information from it. If the request contains the feature information of the target multimedia file, the cloud server obtains it by parsing the request; if the request contains the data stream of the target multimedia file, the cloud server invokes the preset feature extraction module to process the data stream and extract the feature information of the target multimedia file.

Step S114: Match the feature information of the target multimedia file against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file.

After obtaining the feature information of the target multimedia file, the cloud server matches it against each feature sample in the feature database. As soon as the feature information matches a feature sample, matching stops, and the multimedia record containing that feature sample is the multimedia record corresponding to the target multimedia file.
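A minimal sketch of the first-match scan in step S114. The patent does not define how a match is judged, so this assumes a hypothetical `match_rate` (the fraction of query fingerprint units that also occur in a sample) and a hypothetical threshold of 0.6.

```python
def match_rate(query, sample):
    """Hypothetical matching rate: fraction of query units present in the sample."""
    if not query:
        return 0.0
    sample_set = set(sample)
    return sum(1 for unit in query if unit in sample_set) / len(query)


def identify(query, feature_db, threshold=0.6):
    """Scan records in order and stop at the first successful match."""
    for identifier, sample in feature_db.items():
        if match_rate(query, sample) >= threshold:
            return identifier   # step S116: the record's identification information
    return None                 # no record matched
```

Stopping at the first match keeps the scan short; Embodiment 3 instead compares matching rates and keeps the best one.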

Step S116: Obtain the identification information in the multimedia record corresponding to the target multimedia file, so as to identify the target multimedia file.

Once the multimedia record corresponding to the target multimedia file has been located, the identification information in that record identifies the target multimedia file. For example, if the identification information in the record is "CCTV5", the target multimedia file is the video currently being broadcast live on CCTV5.

In the method of this embodiment, a preset feature database stores the feature samples and identification information of live multimedia files and is updated and maintained in real time using the real-time data streams obtained from the back ends of the live sources. When a target multimedia file needs to be identified, the identification information of the corresponding live multimedia file is found in the feature database based on the feature information of the target file, thereby identifying it. In summary, the method of this embodiment can identify live multimedia files, and in particular live video, in real time.

Embodiment 2

Embodiment 2 is a preferred embodiment of the method for identifying live multimedia files that further refines Embodiment 1. The specific improvements are as follows:

First, this embodiment uses the fingerprint information of a multimedia file's audio data as its feature information. That is, the feature samples stored in the feature database are audio fingerprints, and the feature information obtained for the target multimedia file is likewise the fingerprint of its audio data. Consequently, both when updating the feature database and when obtaining the feature information of the target file, the system obtains the audio data of the multimedia file and extracts its fingerprint, which greatly reduces the amount of data transmitted, lowers bandwidth consumption and makes the identification method more practical. Specifically, the audio data of the live multimedia file for the current period is divided into multiple audio segments and a fingerprint is extracted from each segment, so that the feature information of the current period consists of the fingerprints of these segments.

Further, in this embodiment the time length of a feature sample (the length of n audio segments) is greater than the time length of the feature information obtained for the live multimedia file (the length of m audio segments). When the feature database is updated (when the feature samples are fingerprint information, the feature database is also called the fingerprint database and a feature sample is also called a fingerprint sample), the fingerprint information of the live multimedia file is appended to the fingerprint sample, and the oldest fingerprint information in the sample, spanning the same time length as the added feature information, is deleted. On the one hand, this keeps the length of the fingerprint sample constant, so that a target multimedia file captured at any point of the live broadcast can be reliably identified during matching; on the other hand, it keeps the fingerprint sample updated in real time, so that identification remains real-time.

Further, the fingerprint database is updated using a backup table combined with a timer, which both avoids data conflicts and allows the update cycle to be controlled as needed.

Specifically, Fig. 2 is a flowchart of the method according to Embodiment 2 of the present invention. As shown in Fig. 2, the method includes steps S202 to S216.

Step S202: Obtain the real-time data stream and identification information of the live multimedia file.

Step S204: Obtain the audio data of the live multimedia file for the current period from the real-time data stream.

When the cloud server obtains the real-time data stream of the live multimedia file, it can further obtain the audio data of the live multimedia file for the current period. For example, if the current period lasts a third predetermined time, the server invokes an audio extraction module to extract the audio data from the data stream, and the resulting audio data spans the third predetermined time.

Preferably, once the audio data has been obtained, it can be converted into a uniform format to simplify subsequent processing; it can be denoised, for example with a sliding-window denoising technique that removes "spikes" from the audio data; and it can be downsampled, which reduces storage and computation while preserving the required data accuracy.
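The three optional preprocessing steps can be sketched in plain Python. Assumptions: the input is little-endian 16-bit PCM, a sliding-window median filter is used as one possible spike remover, and downsampling averages blocks of samples; none of these specific choices is prescribed by the patent.

```python
import struct
from statistics import median


def pcm16_to_float(raw: bytes):
    """Format conversion: little-endian 16-bit PCM bytes -> floats in [-1, 1)."""
    count = len(raw) // 2
    return [s / 32768.0 for s in struct.unpack(f"<{count}h", raw[:2 * count])]


def remove_spikes(samples, window=3):
    """Sliding-window median filter: an isolated spike is replaced by the median."""
    half = window // 2
    return [median(samples[max(0, i - half):i + half + 1]) for i in range(len(samples))]


def downsample(samples, factor=2):
    """Average each block of `factor` samples (a crude anti-aliasing decimator)."""
    return [sum(samples[i:i + factor]) / len(samples[i:i + factor])
            for i in range(0, len(samples), factor)]
```

A median filter suits the "spike" description because a single outlier never becomes the median of its window, while slowly varying audio passes through almost unchanged.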

Step S206: Divide the audio data of the current period into m audio segments in chronological order.

After the audio data spanning the third predetermined time has been obtained, an audio segmentation module is invoked to divide the audio stream, in chronological order, into m audio segments each lasting a second predetermined time t.
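A sketch of the segmentation in step S206, assuming the audio is a list of samples at a known sample rate and that a trailing partial segment is simply dropped (the patent does not say how a remainder is handled). `split_segments` is a hypothetical name.

```python
def split_segments(samples, sample_rate, t_seconds):
    """Split audio into consecutive segments, each lasting t seconds.

    A trailing partial segment is dropped, so the result has
    m = floor(len(samples) / (sample_rate * t)) segments.
    """
    seg_len = int(sample_rate * t_seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    m = len(samples) // seg_len
    return [samples[k * seg_len:(k + 1) * seg_len] for k in range(m)]
```

For example, one second of audio at 8000 Hz split with t = 0.25 s gives m = 4 segments of 2000 samples each.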

It should be noted that steps S204 and S206 may alternatively be performed as follows: first divide the real-time data stream of the current period into m data segments in chronological order, then obtain the audio data of each data segment to obtain the audio segments. This approach is equivalent to steps S204 and S206 above, and both fall within the scope of protection of this application.

Step S208: Extract the fingerprint information of each audio segment to obtain the feature information of the live multimedia file for the current period.

After the m audio segments of length t have been obtained, the fingerprint information of each segment is extracted. The fingerprints of all segments together form the feature information of the live multimedia file for the current period, which therefore consists of m fingerprints and correspondingly spans the third predetermined time.

The feature information of the current period consists of m fingerprints arranged in chronological order: the first fingerprint is the oldest and the m-th fingerprint is the most recent.

The audio segments are preferably stereo data, and the feature information of the target multimedia file is likewise the fingerprint information of stereo data; using the same kind of data source on both sides improves matching accuracy.

When extracting the fingerprint information of an audio segment, either time-domain features of the audio, for example the amplitude of the segment, or time-frequency features of the audio can be used as the fingerprint. The former is faster to compute, while the latter is more robust to noise.
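As an illustration of the faster, time-domain option, the sketch below uses the quantized peak amplitude of each frame in a segment as its fingerprint. The frame count, the number of quantization levels and the assumption that samples lie in [-1, 1] are all hypothetical.

```python
def amplitude_fingerprint(segment, frames=4, levels=16):
    """Time-domain fingerprint: quantized peak amplitude of each frame."""
    frame_len = max(1, len(segment) // frames)
    fp = []
    for k in range(frames):
        frame = segment[k * frame_len:(k + 1) * frame_len]
        peak = max((abs(x) for x in frame), default=0.0)
        # Map a peak in [0, 1] to an integer level in [0, levels - 1].
        fp.append(min(levels - 1, int(peak * levels)))
    return tuple(fp)
```

Such a fingerprint needs only one pass over the samples, which reflects the speed advantage noted above, but a change in playback volume or added noise alters every level, which reflects its weaker noise robustness.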

Step S210: Locate the multimedia record to be updated in the fingerprint database according to the identification information of the live multimedia file.

The fingerprint database stores multiple multimedia records, each consisting of a feature sample of a multimedia file and the identification information of the multimedia file to which the feature sample corresponds. A feature sample consists of n fingerprints arranged in chronological order: the first fingerprint in the sample is the oldest and the last is the most recent. Each fingerprint spans time t, and the n fingerprints together span a first predetermined time T. Moreover, m < n, or equivalently the third predetermined time is shorter than the first predetermined time; that is, the feature information of the live multimedia file for the current period is shorter than the fingerprint sample.

Once the identification information has been obtained, the multimedia record containing it can be located in the fingerprint database.

Step S212: Delete the oldest m fingerprints from the fingerprint sample in the multimedia record to be updated.

After the multimedia record to be updated has been located, its fingerprint sample is updated by deleting the first m fingerprints, that is, the m oldest fingerprints currently in the sample.

Step S214: Place the m fingerprints of the live multimedia file for the current period, in chronological order, into the fingerprint sample of the multimedia record to be updated.

During the update, the m fingerprints of the current period are appended to the end of the fingerprint sample, so that they become the most recent fingerprints in the sample.
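The update in steps S212 and S214 amounts to a fixed-length sliding window over fingerprints. The sketch below assumes fingerprint samples are Python lists ordered oldest first and that m < n; it implements both orders of deletion and appending, which yield the same result, consistent with the two steps being allowed in either order. The function and variable names are hypothetical.

```python
def delete_then_append(sample, new_fps):
    """S212 then S214: drop the m oldest fingerprints, then append the m newest."""
    trimmed = sample[len(new_fps):]
    return trimmed + list(new_fps)


def append_then_delete(sample, new_fps):
    """S214 then S212: append the m newest fingerprints, then drop the m oldest."""
    extended = list(sample) + list(new_fps)
    return extended[len(new_fps):]


fingerprint_sample = ["fp1", "fp2", "fp3", "fp4", "fp5"]   # n = 5, oldest first
current_fps = ["fp6", "fp7"]                                # m = 2, oldest first
updated = delete_then_append(fingerprint_sample, current_fps)
# updated == ["fp3", "fp4", "fp5", "fp6", "fp7"]; its length is still n = 5
```

Both helpers return new lists and leave the stored sample untouched, which fits the backup-table style of update.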

It should be noted that in this embodiment step S212 may be performed before step S214, or step S214 before step S212. Steps S214 and S212 can be implemented with the following specific steps:

Step S1: Point the feature pointer at the first fingerprint in the feature information of the live multimedia file for the current period, and reset the timer to zero to start timing the feature extraction;

Step S2: Obtain the fingerprint pointed to by the feature pointer;

Step S3: Extract the feature sample of the multimedia record corresponding to the identification information of the live multimedia file to obtain a first feature sample;

Step S4: Append the fingerprint pointed to by the feature pointer to the end of the first feature sample to obtain a second feature sample;

Step S5: Delete one fingerprint from the beginning of the second feature sample;

Step S6: Determine whether the timer has reached the third predetermined time. If not, move the feature pointer to the next fingerprint and repeat steps S2 to S6; if so, replace the feature sample corresponding to the multimedia identifier in the multimedia record with the resulting second feature sample. Here the third predetermined time is the playback time of the multimedia content corresponding to the m fingerprints.
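Steps S1 to S6 can be sketched as a loop. Two assumptions are made because the text leaves them open: the timer is simulated by adding the segment length t (in integer milliseconds, to avoid floating-point drift) for each fingerprint processed, and iterations after the first keep working on the second feature sample from the previous iteration rather than re-reading the stored record. The function name `pointer_update` is hypothetical.

```python
def pointer_update(record_sample, current_fps, t, third_time):
    """Sketch of steps S1-S6 with a simulated timer (times in integer ms)."""
    if not current_fps:
        return list(record_sample)
    pointer = 0                        # S1: point at the first fingerprint
    timer = 0                          # S1: reset the timer
    working = list(record_sample)      # S3: first feature sample (a working copy)
    while True:
        fp = current_fps[pointer]      # S2: fingerprint under the pointer
        working = working + [fp]       # S4: append -> second feature sample
        working = working[1:]          # S5: delete one fingerprint from the start
        timer += t                     # simulated passage of one segment
        if timer >= third_time or pointer + 1 >= len(current_fps):
            return working             # S6: caller swaps this into the record
        pointer += 1                   # S6: advance and repeat S2 to S6
```

With n = 5, m = 3 and t = 2000 ms, the loop stops after three iterations (6000 ms) and the returned sample holds the two newest old fingerprints followed by the three new ones.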

The cloud server receives the data streams and identification information of different live multimedia files in real time and keeps the fingerprint database up to date through steps S202 to S214, so that for every live multimedia file the fingerprint database always holds the fingerprint information of the most recent period.

Step S216: Receive an identification request for identifying a target multimedia file, and identify the target multimedia file.

Specifically, step S216 comprises steps S110 to S116 of the embodiment above, which are not repeated here.

Embodiment 3

Embodiment 3 provides a method for identifying live multimedia files that further refines Embodiment 2. The specific improvements are as follows:

First, each audio segment of the live multimedia file is formed by merging its left-channel and right-channel data; correspondingly, the feature information of the target multimedia file is also the fingerprint information of stereo data. When the left- and right-channel data are merged into stereo data, weight parameters are set so that the proportions of the left and right channels in the stereo data can be adjusted as needed.

Further, when constructing the feature information of the target multimedia file, multiple sets of weights are set to convert the left- and right-channel data of the target file into multiple sets of stereo data, and a fingerprint is extracted from each set, so that the feature information of the target file includes multiple sets of fingerprint information. During identification, each set of fingerprint information is matched against the fingerprint samples in the fingerprint database, and the multimedia record containing the fingerprint sample with the highest matching rate is taken as the record corresponding to the target multimedia file, which improves identification accuracy.

Further, both when extracting the fingerprint information of the stereo data of an audio segment and when extracting the time-frequency feature data of each set of stereo data of the target multimedia file, time-frequency features of the stereo data are extracted, and fingerprints are constructed from the time, frequency and energy of the energy maxima, which makes the fingerprints highly stable. The constructed fingerprints are represented as hash codes to simplify storage and processing.

Specifically, Fig. 3 is a flowchart of the method according to Embodiment 3 of the present invention. As shown in Fig. 3, the method includes steps S302 to S318.

Step S302: Obtain the real-time data stream and identification information of the live multimedia file, and obtain multiple audio segments of the live multimedia file for the current period from the real-time data stream.

Specifically, step S302 comprises steps S202 to S206 of the embodiment above, which are not repeated here.

Step S304: For each audio segment, merge its left-channel data and right-channel data to obtain the stereo data of the audio segment.

Specifically, the stereo data can be obtained with the following formula:

s = a*l + b*r, where a + b = 1; s is the stereo data of the audio segment, l is the left-channel data of the audio segment, r is its right-channel data, and a and b are preset parameters.
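The weighted merge s = a*l + b*r can be written directly, assuming each channel is a list of float samples of equal length. The default a = 0.5 (an equal mix) and the name `merge_channels` are assumptions, not values or names from the patent.

```python
def merge_channels(left, right, a=0.5):
    """Return s = a*l + b*r sample by sample, with b = 1 - a so that a + b = 1."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    b = 1.0 - a
    return [a * l + b * r for l, r in zip(left, right)]
```

Deriving b from a enforces the constraint a + b = 1, so a single parameter controls the left/right balance.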

Step S306: Extract the time-frequency feature data of the stereo data of each audio segment as the fingerprint information of that segment, thereby obtaining the feature information of the live multimedia file for the current period.

Extracting the time-frequency feature data of the stereo data of an audio segment specifically includes the following steps:

First, a short-time Fourier transform is applied to the stereo data of the audio segment to obtain its time-frequency distribution. The energy maxima in the time-frequency distribution are then located. From two maxima at different times, A[ta, fa, Va] and B[tb, fb, Vb], a fingerprint fp[ta, fa, fb, tb-ta] is constructed and converted into a hash code fp[hashData, ta]. Here ta, fa and Va are the time, frequency and energy of maximum A; tb, fb and Vb are the time, frequency and energy of maximum B; ta < tb; and A and B are any two adjacent energy maxima in the time-frequency distribution. Finally, all constructed fingerprints are combined in chronological order to obtain the fingerprint information of the audio segment.
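The fingerprint construction described above can be sketched in pure Python. This is a simplified illustration under several assumptions: a naive Hann-windowed DFT stands in for an optimized STFT, only the strongest bin in each frame is treated as an energy maximum, "adjacent" maxima are taken to be consecutive in time, and the bit packing of fa, fb and tb - ta into `hash_data` is a hypothetical layout.

```python
import cmath
import math


def stft_power(signal, frame_len=8, hop=4):
    """Naive STFT: Hann-windowed DFT power spectrum of each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):          # non-negative frequency bins
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc) ** 2)
        frames.append(spectrum)
    return frames  # frames[t][f] = energy at time index t, frequency bin f


def energy_peaks(power):
    """Simplification: one maximum (t, f, V) per frame, the strongest bin."""
    peaks = []
    for t, spectrum in enumerate(power):
        f = max(range(len(spectrum)), key=spectrum.__getitem__)
        if spectrum[f] > 0:
            peaks.append((t, f, spectrum[f]))
    return peaks


def peak_pair_hashes(peaks):
    """Pair each maximum A with the next maximum B and hash [fa, fb, tb - ta]."""
    hashes = []
    for (ta, fa, _va), (tb, fb, _vb) in zip(peaks, peaks[1:]):
        # Hypothetical packing; assumes fb < 4096 and tb - ta < 256.
        hash_data = (fa << 20) | (fb << 8) | (tb - ta)
        hashes.append((hash_data, ta))               # fp[hashData, ta]
    return hashes                                    # already in chronological order
```

For a pure tone that falls exactly on DFT bin 2 of an 8-sample frame, every frame peaks at bin 2, so all hashes share the same `hash_data` and differ only in `ta`. Storing only frequencies and time differences in the hash, and keeping `ta` alongside, is what allows matching regardless of where in the broadcast a clip was captured.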

Step S308: Update the fingerprint database according to the feature information of the live multimedia file for the current period.

Specifically, step S308 comprises steps S210 to S214 of the embodiment above, which are not repeated here.

Step S310: Receive an identification request for identifying a target multimedia file.

Step S312: Obtain the feature information of the target multimedia file according to the identification request.

The feature information of the target multimedia file is the time-frequency feature data of its stereo data. It is obtained in the same way as the time-frequency feature data of the stereo data of an audio segment in step S306, which is not repeated here.

The stereo data of the target multimedia file is formed by merging the left-channel and right-channel data in the audio data of the target file. Using multiple sets of parameters yields multiple sets of stereo data, and correspondingly the feature information of the target multimedia file consists of multiple fingerprints.

When the fingerprint information of the target multimedia file is constructed, its feature information consists of N fingerprints, each of which is the time-frequency feature data of one of the N sets of stereo data of the target file. The i-th set of stereo data is si′ = ai′*l′ + bi′*r′, with ai′ + bi′ = 1 and i = 1, 2, 3, …, N, where l′ and r′ are the left-channel and right-channel data of the target multimedia file.

Step S314: Match each fingerprint of the target multimedia file against the fingerprint samples in the fingerprint database to obtain a matching rate for each fingerprint.

The fingerprint corresponding to each set of stereo data is matched against the fingerprint samples in the fingerprint database. The fingerprint of every set of stereo data yields a matching fingerprint sample, and each matched fingerprint sample has a matching rate. The multimedia record containing the fingerprint sample with the highest matching rate (the maximum matching rate) is taken as the multimedia record corresponding to the target multimedia file.
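The multi-weight matching of steps S312 to S316 can be sketched as follows. It assumes a hypothetical `match_rate` (the fraction of query fingerprint units found in a sample) and takes the fingerprint function as a parameter; the quantizing `toy_fingerprint` in the usage lines is purely illustrative and stands in for the time-frequency hashes.

```python
def match_rate(query, sample):
    """Hypothetical matching rate: fraction of query units present in the sample."""
    if not query:
        return 0.0
    sample_set = set(sample)
    return sum(1 for unit in query if unit in sample_set) / len(query)


def merge(left, right, a):
    """s_i' = a_i' * l' + b_i' * r' with b_i' = 1 - a_i'."""
    b = 1.0 - a
    return [a * l + b * r for l, r in zip(left, right)]


def identify_with_weights(left, right, weights, fingerprint, fingerprint_db):
    """Return (identifier, rate) of the record with the highest matching rate."""
    best_id, best_rate = None, -1.0
    for a in weights:                                # one stereo mix per weight set
        query = fingerprint(merge(left, right, a))   # fingerprint of s_i'
        for identifier, sample in fingerprint_db.items():
            rate = match_rate(query, sample)
            if rate > best_rate:                     # keep the maximum matching rate
                best_id, best_rate = identifier, rate
    return best_id, best_rate


def toy_fingerprint(samples):
    """Illustrative stand-in for the real time-frequency fingerprint."""
    return [round(x * 10) for x in samples]


left, right = [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
db = {"CCTV1": [5, 3, 4], "CCTV5": [8, 2, 10]}
result = identify_with_weights(left, right, [0.5, 0.8], toy_fingerprint, db)
# result == ("CCTV5", 1.0): the a = 0.8 mix matches CCTV5 completely
```

With only the equal mix (a = 0.5), the best candidate would be CCTV1 at a rate of 2/3, which illustrates how trying several weight sets can change the outcome when the captured audio favours one channel.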

Step S316: Take the multimedia record that contains the fingerprint sample with the maximum matching rate as the multimedia record corresponding to the target multimedia file.

Step S318: Obtain the identification information in the multimedia record corresponding to the target multimedia file, thereby identifying the target multimedia file.
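Steps S314 to S318 can be sketched as below. The document does not define how a matching rate is computed, so `matching_rate` is a hypothetical stand-in. It slides the query over a stored sample and returns the best fraction of sub-fingerprints that are exactly equal. The dictionary layout of the database and the track names are also only illustrative.

```python
from typing import Dict, List, Optional, Tuple


def matching_rate(query: List[int], sample: List[int]) -> float:
    """Hypothetical matching rate: best fraction of equal sub-fingerprints over all alignments."""
    if not query or len(query) > len(sample):
        return 0.0
    best = 0
    for offset in range(len(sample) - len(query) + 1):
        hits = sum(1 for j, q in enumerate(query) if sample[offset + j] == q)
        best = max(best, hits)
    return best / len(query)


def identify(queries: List[List[int]],
             database: Dict[str, dict]) -> Optional[Tuple[str, dict, float]]:
    """Match every query fingerprint (one per stereo mix) and keep the record with the maximum rate."""
    best: Optional[Tuple[str, dict, float]] = None
    for query in queries:                                  # S314: one rate per fingerprint
        for track_id, record in database.items():
            rate = matching_rate(query, record["fingerprints"])
            if best is None or rate > best[2]:
                best = (track_id, record["info"], rate)    # S316: keep the maximum rate
    return best                                            # S318: identification info of that record


# Toy database with made-up fingerprints.
db = {
    "track-A": {"fingerprints": [1, 2, 3, 4, 5], "info": {"channel": "channel-A"}},
    "track-B": {"fingerprints": [9, 8, 7, 6, 5], "info": {"channel": "channel-B"}},
}
result = identify([[3, 4, 0], [8, 7, 6]], db)
```

On ties, this sketch keeps the first record it encounters.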

Embodiment Four

Embodiment Four provides a method for identifying a live multimedia file. In this method, the live multimedia file is a live video, and the fingerprint database is built using time-frequency feature data of the video's audio data as fingerprint samples.

In this method, the video's audio data is always handled as stereo data. This keeps the data format of the audio source for the target video's fingerprint information consistent with that of the audio source for the fingerprint samples in the fingerprint database. In addition, ambient noise degrades audio quality when the target video's audio is captured by recording. To counter this, adaptive parameters are set when the target video's audio data is preprocessed, so that the extracted fingerprint information is more robust.

In summary, fast and accurate video identification is achieved on two sides: the user terminal and the fingerprint database. Because the fingerprint database is updated in real time, live webcast video can be identified online in real time.

Next, the method of this embodiment is described in detail from the perspective of a system that implements it.

As shown in Figure 4, the system consists of four parts: a user terminal, a video search server, a fingerprint management server, and a video management server. The video search server, fingerprint management server, and video management server together may constitute a cloud server.

Specifically, the user terminal acquires the audio data of the target video and presents the video search results. The video search server manages video identification requests from different user terminals and forwards them to the fingerprint management server. It also receives video identification results from the fingerprint management server and returns them to the user terminal that made the request. The fingerprint management server searches the fingerprint database for the fingerprint samples corresponding to the target video, and it also creates, updates, and maintains the fingerprint database. The video management server stores and manages the video data sent by video sources, saving it in the video database, and uploads the corresponding audio data and video information to the fingerprint management server. The video search server and the fingerprint management server cooperate to search for the target video. The fingerprint management server and the video management server cooperate to create and update the fingerprint database.

As shown in Figure 5, the user terminal includes the following modules:

Recording module: records the audio data while the video is playing.

Audio preprocessing module: mixes, down-samples, and denoises the audio data obtained by the recording module, reducing the effect of noise in the recording environment on the matching result.

Fingerprint extraction module: extracts the fingerprint information of the preprocessed audio data.

Result display module: displays the video search results using hardware resources of the user terminal such as the audio player, video player, and display screen (for example, playing the recognition result or presenting similar videos on the phone screen).

Network transmission module: handles data transmission between the user terminal and the video search server. It sends the video search server an identification request containing "scene information and the target fingerprint (that is, the fingerprint information of the target video)", and receives the video identification result returned by the video search server.
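The audio preprocessing module above can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's procedure. The two channels are mixed with equal weights. Removing the DC offset is only a crude placeholder for real noise reduction. Down-sampling averages each block of `factor` samples (a simple box filter) instead of applying a proper low-pass filter.

```python
from typing import List, Sequence


def preprocess(left: Sequence[float], right: Sequence[float], factor: int) -> List[float]:
    """Mix two recorded channels, remove the DC offset, and down-sample by an integer factor."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    mixed = [0.5 * (l + r) for l, r in zip(left, right)]   # equal-weight mixing (assumption)
    mean = sum(mixed) / len(mixed) if mixed else 0.0
    centered = [x - mean for x in mixed]                    # crude stand-in for noise reduction
    # Down-sample: average each complete block of `factor` samples; a trailing partial block is dropped.
    return [sum(centered[i:i + factor]) / factor
            for i in range(0, len(centered) - factor + 1, factor)]
```

For example, `factor=4` would take a 44.1 kHz recording to 11.025 kHz. The document does not specify the target sampling rate.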

As shown in Figure 6, the video search server includes the following modules:

Network transmission module 1: exchanges information with user terminals. It receives their video identification requests and returns video search results to them.

Video search management module: handles video identification requests from a large number of users and submits the requests sent by user terminals to the fingerprint management server.

Identification result management module: processes the video search results.

Network transmission module 2: exchanges information with the fingerprint management server. It sends the video identification requests from user terminals to the fingerprint management server and receives the video identification results it returns.

As shown in Figure 7, the fingerprint management server includes the following modules:

Network transmission module 1: exchanges information with the video search server. It receives video identification requests and returns video search results to the video search server.

Fingerprint search module: searches the fingerprint database for the target fingerprint and returns the identification result.

Fingerprint extraction module: extracts fingerprint information from the audio data sent by the video management server. It passes the generated fingerprint information, together with the video information (that is, the video's identification information), to the fingerprint management module.

Fingerprint management module: generates the data required by the fingerprint database from the video information and the fingerprint information, and stores that data in the fingerprint database.

Network transmission module 2: handles information exchange between the fingerprint management server and the video management server.

As shown in Figure 8, the video management server includes the following modules:

Network transmission module 1: handles data transmission between the video source and the video management server.

Video management module: stores the video stream at the corresponding location in the video database according to the video source information (such as channel and URL). It also passes the video stream, together with the video source information, to the audio extraction module.

Audio extraction module: obtains the data received by the video management server and extracts the audio stream from the video stream. It passes the audio stream, together with the video source information, to the audio segmentation module.

Audio preprocessing module: mixes two-channel audio data into stereo data, converts audio data in different formats into a unified format, and down-samples the audio data. It passes the processed audio data, together with the video source information, to the audio segmentation module.

Audio segmentation module: splits or splices the audio data, in chronological order, into audio segments of duration T. It uploads the segments, together with the video source information, to the fingerprint management server.

Network transmission module 2: handles data transmission between the video management server and the fingerprint management server.
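The audio segmentation module's "split or splice" behaviour amounts to a buffering segmenter. Incoming stream chunks of arbitrary size are appended to a buffer, and fixed-length segments are cut from its head. The sketch below assumes that the segment length is expressed in samples, for example the segment duration multiplied by the sample rate. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List


def segment_stream(chunks: Iterable[List[float]], segment_len: int) -> Iterator[List[float]]:
    """Split long chunks and splice short ones so that every yielded segment has segment_len samples."""
    if segment_len < 1:
        raise ValueError("segment_len must be >= 1")
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)                    # splice incoming data onto the tail of the buffer
        while len(buffer) >= segment_len:
            yield buffer[:segment_len]          # split off one full segment, oldest audio first
            buffer = buffer[segment_len:]
    # A remainder shorter than segment_len is never emitted; in this sketch it is dropped at end of stream.
```

Because the segmenter is a generator, segments can be forwarded to the fingerprint management server as soon as each one is complete, without waiting for the whole stream.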

When the method of this embodiment identifies a video using the above system, it proceeds as follows:

Step 1: obtain the audio data of the target video to be identified.

Step 2: preprocess the obtained audio data.

Step 3: obtain the fingerprint information of the target video's audio data.

Step 4: match the fingerprint information against the fingerprint samples in the fingerprint database, which is built in advance (or updated in real time), to obtain a matching result.

Step 5: return the matching result to the user terminal, which can then present and play related video content based on it.

The system stores its video data in a distributed manner across multiple video databases, which are managed by multiple video management servers. The video databases share resources through a unified video list, and all video management servers share this list. When the video list of one video database is updated, the corresponding video management server broadcasts a list-update message (carrying the updated video list) to the entire network. The other video databases then update their own video lists according to the message.
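The shared-list scheme above can be modelled in a few lines. In this hypothetical sketch, each server replaces its copy of the list with the full list carried in the broadcast message, which mirrors the description. As a design caveat, this approach would lose updates made concurrently on two servers, so a production system would need versioning or merging.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoManagementServer:
    name: str
    video_list: Dict[str, str] = field(default_factory=dict)   # video id -> storage location
    peers: List["VideoManagementServer"] = field(default_factory=list, repr=False, compare=False)

    def add_video(self, video_id: str, location: str) -> None:
        """Update the local list, then broadcast the updated list to every peer."""
        self.video_list[video_id] = location
        message = dict(self.video_list)          # the list-update message carries the whole updated list
        for peer in self.peers:
            peer.on_list_update(message)

    def on_list_update(self, message: Dict[str, str]) -> None:
        """Replace the local copy of the shared list with the broadcast one (no re-broadcast)."""
        self.video_list = dict(message)
```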

The system has a single fingerprint database, which is managed by the fingerprint management server. A fingerprint extraction module and a fingerprint search module are configured on that server. The fingerprint extraction module processes the information sent by the video management server to form fingerprint samples. The fingerprint search module processes the video identification requests sent by the video search server, searches the fingerprint database for the target fingerprint, and returns the identification result to the video search server.

Interaction between the video source and the video management server: the video source generates new video data and submits to the video management server a request message for uploading the video data (containing video source information and video content information), together with the video stream. The video management server extracts the video source information, video content information, and video stream from the message and updates its video list. First, it stores this information and the video stream in the video database, establishes an association between the list information and the stored video data, and adds the association to its local video list. At the same time, it broadcasts the video-list update to the entire network so that the network-wide video list is updated. Second, it extracts the audio data from the video data. It then encapsulates the video source information, video content information, and new video-list entry, together with the extracted audio data, into a video library update message, and submits this message to the fingerprint management server over the network.

Interaction between the fingerprint management server and the video management server: (1) On receiving a video library update message, the fingerprint management server generates a Track ID that uniquely corresponds to the video, based on the video source information and video content information in the message. It then adds the Track ID to its fingerprint list. (2) It obtains the audio data in the message and extracts the fingerprint information of that audio data, using different extraction schemes for different types of video source (live or non-live). (3) It encapsulates the Track ID, fingerprint information, video source information, and video content information and saves them to the fingerprint database. The fingerprint database stores the fingerprint information of video data, the association information of videos, and video-related information.
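The document does not say how the Track ID is derived from the video source and video content information. One hypothetical approach is to hash a canonical serialisation of those fields, so that the same inputs always produce the same ID. SHA-1 is used here only for a stable identifier, not for security.

```python
import hashlib
import json


def make_track_id(source_info: dict, content_info: dict) -> str:
    """Hypothetical Track ID: a stable hex digest over canonicalised source and content fields."""
    payload = json.dumps({"source": source_info, "content": content_info},
                         sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

The document states that each live channel has a unique Track ID. For a live channel, one would therefore pass only the channel fields (for example `{"channel": "channel-A"}`), so that the ID stays stable as programmes change.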

The fingerprint database is logically divided into two sub-databases: (1) a live video fingerprint sub-database and (2) a video fingerprint sub-database. The fingerprint management server manages both and performs their creation, update, and maintenance operations.

The live video fingerprint sub-database stores the following information about current live videos: Track ID, fingerprint information, video information, and video-related information. Each channel corresponds to a unique Track ID.

Track ID: the unique identifier of the video's fingerprint information in the fingerprint database.

Fingerprint information: only the fingerprint information of the most recent duration T of the live video is retained. It is updated as the live video database is updated. The implementation is described in detail in the next part.

Video information: information related to the live video content, plus video storage information. This includes the video channel, live broadcast name, live content, host, live channel URL, storage location, and so on.

Video-related information: links to other videos similar to the live video, information about products or locations that appear in the video, and so on.
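The sketch below shows one possible in-memory shape for a record of the live video fingerprint sub-database, using the four fields listed above. Representing each per-segment fingerprint as an integer is an assumption made for illustration, and so are the field names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LiveFingerprintRecord:
    track_id: str                                                      # unique per live channel
    fingerprints: List[int] = field(default_factory=list)             # only the latest window of duration T
    video_info: Dict[str, str] = field(default_factory=dict)          # channel, name, host, URL, storage location
    related_info: Dict[str, List[str]] = field(default_factory=dict)  # similar videos, products, places


# Live sub-database keyed by Track ID (illustrative values only).
live_db: Dict[str, LiveFingerprintRecord] = {}
rec = LiveFingerprintRecord("track-1", video_info={"channel": "channel-A"})
live_db[rec.track_id] = rec
```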

Updating the live video fingerprint sub-database:

(1) The video management server receives the video information and video stream sent by the live broadcast room (that is, the video source) and extracts the video information part. It calls the audio extraction module to extract the video's audio data and the audio preprocessing module to preprocess that audio data. It then calls the segmentation module to divide the stream, in chronological order, into audio segments of duration t (t much smaller than T). All audio segments are added to the send queue of network transmission module 1. Finally, the video information and each length-t audio segment in the send queue are encapsulated in turn into video library update messages and uploaded to the fingerprint management server.

(2) The fingerprint management server receives a video library update message, parses the live video's channel from the video information part, and generates the corresponding Track ID. It calls the fingerprint extraction module to extract the fingerprint information of each length-t piece of audio data. The update period of the fingerprint database is $P = kt$, where $k$ is an integer and $T \gg P$. In each period, new fingerprint information covering length $kt$ is added to the fingerprint list in the fingerprint database, and the oldest fingerprint information covering length $kt$ is removed. As a result, the fingerprint information in the list always spans length $T$.
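The rolling update in (2) keeps a window of $n = T/t$ per-segment fingerprints and refreshes it $k$ segments at a time. A minimal sketch is shown below. It uses `collections.deque`, whose `maxlen` makes a full deque discard items from the opposite end when new items are appended. The class name and the integer fingerprints are hypothetical.

```python
from collections import deque
from typing import Deque, List


class LiveWindow:
    """Rolling fingerprint window of n = T/t per-segment entries, refreshed k entries (P = k*t) at a time."""

    def __init__(self, n_segments: int, k: int) -> None:
        if not 0 < k < n_segments:
            raise ValueError("need 0 < k < n_segments, since P = k*t must be shorter than T = n*t")
        self.k = k
        self.window: Deque[int] = deque(maxlen=n_segments)  # a full deque drops its oldest (left) items
        self.pending: List[int] = []

    def push(self, segment_fp: int) -> bool:
        """Buffer one length-t fingerprint; after k of them (one period P), commit. Returns True on commit."""
        self.pending.append(segment_fp)
        if len(self.pending) < self.k:
            return False
        self.window.extend(self.pending)  # adds the k newest and, once full, evicts the k oldest
        self.pending.clear()
        return True
```

While the stream is starting up, the window holds fewer than n entries until the first T of audio has arrived.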

In this system, combining the fingerprint identification algorithm with real-time updating of the fingerprint database lets the user terminal identify live video quickly. In addition, the fingerprint database is created from fingerprint information extracted from stereo data. This makes the target video's fingerprint information, as obtained by the user terminal, more likely to match its true counterpart in the fingerprint database, which improves the noise robustness of identification.

Embodiment Five

Embodiment Five provides a device for identifying a live multimedia file. The device can be deployed on a cloud server. As shown in Figure 9, it comprises an acquisition module 610, a locating module 620, an update module 630, a matching module 640, and an identification module 650.

The cloud server communicates with the back-end database server of the live source to obtain the real-time data stream of the live multimedia file, and it obtains the identification information of the live multimedia file along with the stream. The acquisition module 610 obtains the feature information of the current period of the live multimedia file from the input real-time data stream. For example, after the video stream data of a live video is obtained, a feature extraction module processes it to extract the live video's audio fingerprint, which gives the feature information of the current period of the live video.

A feature database is set up in the cloud server to store at least one multimedia record. A multimedia record includes a feature sample of a multimedia file and the identification information corresponding to that feature sample. The time length of the feature sample is a first predetermined time. The locating module 620 locates the multimedia record to be updated in the feature database according to the identification information of the live multimedia file.

The update module 630 updates the feature sample in the multimedia record to be updated according to the feature information of the current period of the live multimedia file. As a result, for any live multimedia file, the feature database always holds the feature information of the most recent period.

The matching module 640 receives an identification request for identifying a target multimedia file. It matches the feature information of the target multimedia file included in the request against the feature samples in the feature database to locate the multimedia record corresponding to the target multimedia file. The identification module 650 obtains the identification information of the multimedia file corresponding to the target multimedia file.

With the device of this embodiment, a feature database is preset to store the feature samples and identification information of live multimedia files. This database is updated and maintained in real time from the real-time data stream obtained from the live source's back end. When a target multimedia file needs to be identified, the identification information of the corresponding live multimedia file is found in the feature database according to the target file's feature information, which identifies the live multimedia file. In summary, the device can identify live multimedia files, and therefore live video, in real time.

Preferably, the feature information is fingerprint information of the audio data of the multimedia file, and the acquisition module 610 includes an audio data acquisition module, an audio segment segmentation module, and a fingerprint information extraction module. The audio data acquisition module obtains the audio data of the current period of the live multimedia file from the real-time data stream. The audio segment segmentation module divides this audio data, in chronological order, into a plurality of audio segments, each lasting a second predetermined time that is shorter than the first predetermined time. The fingerprint information extraction module extracts the fingerprint information of each audio segment to obtain the feature information of the current period of the live multimedia file.

Preferably, the feature sample is the fingerprint information of n audio segments, and the feature information of the current period of the live multimedia file is the fingerprint information of m audio segments. Here m < n, and the total length of the n audio segments is the first predetermined time. The update module 630 includes a deletion module and an addition module. The deletion module deletes the earliest m pieces of fingerprint information from the feature sample in the multimedia record to be updated. The addition module places the m pieces of fingerprint information of the current period of the live multimedia file, in chronological order, into that feature sample.

Further preferably, the update module 630 specifically performs the following steps:
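Claim 4 of this document sets out these steps as S1 to S6. Below is a minimal Python sketch of them, which rests on two assumptions. First, the timer is simulated by counting the playback time of the processed segments in integer milliseconds instead of reading a wall clock. Second, the first feature sample is copied once before the loop, so each pass builds on the previous splice; re-extracting it on every pass would discard earlier splices. The function name and the dict-based record are hypothetical.

```python
from typing import List


def update_feature_sample(record: dict, new_fps: List[int], segment_ms: int) -> None:
    """Sketch of steps S1-S6: splice each new fingerprint onto the tail, drop one from the head,
    and commit the result once the simulated timer reaches the third predetermined time."""
    if not new_fps:
        return
    third_time_ms = len(new_fps) * segment_ms     # playback time covered by the m new fingerprints
    elapsed_ms = 0                                # S1: reset the (simulated) timer
    pointer = 0                                   # S1: point at the first new fingerprint
    working = list(record["feature_sample"])      # S3: first feature sample (copied once, see above)
    while True:
        fp = new_fps[pointer]                     # S2: fingerprint under the pointer
        working.append(fp)                        # S4: splice onto the end -> second feature sample
        working.pop(0)                            # S5: delete one fingerprint from the start
        elapsed_ms += segment_ms
        if elapsed_ms >= third_time_ms:           # S6: timer reached the third predetermined time
            record["feature_sample"] = working    # replace the stored sample with the second sample
            return
        pointer += 1                              # S6: advance the pointer and repeat S2 to S6
```

The stored sample is replaced only once, at the end. This matches the claim's single replacement in S6, and readers never see a half-updated sample.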

Further preferably, the fingerprint information extraction module includes a stereo data synthesis module and a time-frequency feature extraction module. The stereo data synthesis module merges the left-channel data and right-channel data of an audio segment to obtain the segment's stereo data. The time-frequency feature extraction module extracts the time-frequency feature data of that stereo data as the segment's fingerprint information.
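The document does not specify which time-frequency features are used. As one concrete and widely known example, the sketch below computes band-energy-difference sub-fingerprints in the style of the Philips (Haitsma and Kalker) audio fingerprinting scheme. Each frame is windowed and split into frequency bands, and each bit records whether the energy difference between two adjacent bands grew or shrank relative to the previous frame. The sketch uses equal-width bands, whereas the original scheme uses logarithmically spaced bands. Frame length, hop, and band count are hypothetical. A naive DFT is used only to keep the example dependency-free; a real implementation would use an FFT.

```python
import cmath
import math
from typing import List, Sequence


def band_energies(frame: Sequence[float], n_bands: int) -> List[float]:
    """Energies of a Hann-windowed frame in n_bands equal-width bands (naive DFT, DC bin skipped)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        power.append(abs(coeff) ** 2)
    width = len(power) // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def fingerprint(stereo: Sequence[float], frame_len: int = 64, hop: int = 32,
                n_bands: int = 9) -> List[int]:
    """One (n_bands - 1)-bit sub-fingerprint per pair of consecutive frames."""
    if not 1 < n_bands <= frame_len // 2:
        raise ValueError("need 1 < n_bands <= frame_len // 2")
    energies = [band_energies(stereo[s:s + frame_len], n_bands)
                for s in range(0, len(stereo) - frame_len + 1, hop)]
    sub_fingerprints = []
    for prev, cur in zip(energies, energies[1:]):
        bits = 0
        for m in range(n_bands - 1):
            # Sign of the change, between consecutive frames, of the energy difference of adjacent bands.
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if delta > 0 else 0)
        sub_fingerprints.append(bits)
    return sub_fingerprints
```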

Preferably, the feature information of the target multimedia file included in the identification request is N pieces of fingerprint information of the current period of the live multimedia file. Each piece is the time-frequency feature data of one of N sets of stereo data of the target multimedia file. The i-th set of stereo data is $s_i' = a_i' l' + b_i' r'$ with $a_i' + b_i' = 1$, where $l'$ is the left-channel data of the current period of the live multimedia file, $r'$ is the right-channel data of the current period of the live multimedia file, $a_i'$ and $b_i'$ are preset parameters, and $i = 1, 2, 3, \ldots, N$. The matching module 640 includes a matching rate determination module and a multimedia record determination module. The matching rate determination module matches each piece of fingerprint information of the target multimedia file against the feature samples in the feature database to obtain a matching rate for each piece. The multimedia record determination module takes the multimedia record that contains the feature sample with the maximum matching rate as the multimedia record corresponding to the target multimedia file.

The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of the claims.

Claims

1. A method for identifying a live multimedia file, characterized by comprising:

obtaining feature information of the current period of the live multimedia file according to an input real-time data stream of the live multimedia file;

locating a multimedia record to be updated in a feature database according to identification information of the live multimedia file, wherein the feature database is used to store at least one multimedia record, the multimedia record comprises a feature sample of a multimedia file and identification information corresponding to the feature sample, and the time length of the feature sample is a first predetermined time;

updating the feature sample in the multimedia record to be updated according to the feature information of the current period of the live multimedia file;

receiving an identification request for identifying a target multimedia file, and matching feature information of the target multimedia file included in the identification request against the feature samples in the feature database, so as to locate the multimedia record corresponding to the target multimedia file; and

obtaining the identification information of the multimedia file corresponding to the target multimedia file.
2. The method for identifying a live multimedia file according to claim 1, characterized in that the feature information is fingerprint information of audio data of a multimedia file, and obtaining the feature information of the current period of the live multimedia file according to the input real-time data stream of the live multimedia file comprises:

obtaining audio data of the current period of the live multimedia file according to the real-time data stream;

dividing the audio data of the current period, in chronological order, into a plurality of audio segments each lasting a second predetermined time, wherein the second predetermined time is less than the first predetermined time; and

extracting fingerprint information of each audio segment to obtain the feature information of the current period of the live multimedia file.
3. The method for identifying a live multimedia file according to claim 2, characterized in that the feature sample is fingerprint information of n audio segments, the feature information of the current period of the live multimedia file is fingerprint information of m audio segments, m < n, and the total time length of the n audio segments is the first predetermined time; and updating the feature sample in the multimedia record to be updated according to the feature information of the live multimedia file comprises:

deleting the earliest m pieces of fingerprint information of the feature sample in the multimedia record to be updated; and

placing the m pieces of fingerprint information of the current period of the live multimedia file, in chronological order, into the feature sample of the multimedia record to be updated.
4. The method for identifying a live multimedia file according to claim 3, characterized in that updating the feature sample in the multimedia record to be updated according to the feature information of the current period of the live multimedia file specifically comprises:

Step S1: pointing a feature pointer to the first piece of fingerprint information in the feature information of the current period of the live multimedia file, resetting a timer, and starting feature-extraction timing;

Step S2: obtaining the fingerprint information to which the feature pointer points;

Step S3: extracting the feature sample of the multimedia record corresponding to the identification information of the live multimedia file, to obtain a first feature sample;

Step S4: splicing the fingerprint information to which the feature pointer points onto the end of the first feature sample, to obtain a second feature sample;

Step S5: deleting one piece of fingerprint information from the start of the second feature sample;

Step S6: determining whether the time on the timer has reached a third predetermined time; if the third predetermined time has not been reached, pointing the feature pointer to the next piece of fingerprint information and repeating steps S2 to S6; if the third predetermined time has been reached, replacing the feature sample corresponding to the multimedia identifier in the multimedia record with the obtained second feature sample, wherein the third predetermined time is the total playback duration of the multimedia file content corresponding to the m pieces of fingerprint information.
5. The method for identifying a live multimedia file according to claim 2, characterized in that extracting the fingerprint information of the audio fragment comprises:

merging the left-channel data and right-channel data of the audio fragment to obtain stereo data of the audio fragment; and

extracting time-frequency feature data of the stereo data of the audio fragment as the fingerprint information of the audio fragment.
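Claim 5 specifies neither how the channels are merged nor which time-frequency feature is used. The sketch below assumes an equal-weight mix (the weighted form that claim 6 uses, with both weights set to 0.5) and, as a deliberately simple stand-in for a real fingerprint, records the index of the strongest DFT bin in each frame. A naive DFT keeps the example dependency-free; a production system would use an FFT and more robust features. `merge_channels`, `dominant_bins`, and `frame` are hypothetical names.

```python
import cmath
import math
from typing import List, Sequence


def merge_channels(left: Sequence[float], right: Sequence[float],
                   a: float = 0.5) -> List[float]:
    """Mix left and right into one signal: s = a*l + (1 - a)*r (equal weights assumed)."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    b = 1.0 - a
    return [a * l + b * r for l, r in zip(left, right)]


def dominant_bins(signal: Sequence[float], frame: int = 64) -> List[int]:
    """Per-frame index of the strongest DFT bin: a toy time-frequency fingerprint."""
    fingerprint = []
    for start in range(0, len(signal) - frame + 1, frame):
        chunk = signal[start:start + frame]
        # Magnitudes of bins 1..frame/2 (DC skipped), computed with a naive DFT.
        mags = []
        for k in range(1, frame // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / frame)
                      for t, x in enumerate(chunk))
            mags.append(abs(acc))
        fingerprint.append(1 + mags.index(max(mags)))
    return fingerprint
```

Each frame contributes one value, so the fingerprint is a time-ordered sequence of frequency indices. This is the kind of sequence the sliding-window update in claims 3 and 4 operates on.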
6. The method for identifying a live multimedia file according to claim 2, characterized in that the feature information of the target multimedia file included in the identification request is N pieces of fingerprint information for the current period of the live multimedia file, each of the N pieces of fingerprint information being the time-frequency feature data of one of N pieces of stereo data of the target multimedia file, wherein the i-th of the N pieces of stereo data is $s_i' = a_i' \cdot l' + b_i' \cdot r'$, where $a_i' + b_i' = 1$, $l'$ is the left-channel data of the current period of the live multimedia file, $r'$ is the right-channel data of the current period of the live multimedia file, $a_i'$ and $b_i'$ are preset parameters, and $i = 1, 2, 3, \ldots, N$;

and in the method, matching the feature information of the target multimedia file included in the identification request with the feature samples in the feature database, to locate the multimedia record corresponding to the target multimedia file, comprises:

matching each piece of fingerprint information of the target multimedia file against the feature samples in the feature database, respectively, to obtain a matching rate for each piece of fingerprint information; and

taking the multimedia record containing the feature sample with the maximum matching rate as the multimedia record corresponding to the target multimedia file.
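The weighted mixes of claim 6 and the maximum-matching-rate selection can be sketched as follows. The claim does not define the matching rate, so this sketch assumes it is the best fraction of equal fingerprint values over all alignments of a query inside a stored sample; fingerprints are again modelled as integer sequences. `weighted_mixes`, `matching_rate`, and `locate_record` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple


def weighted_mixes(left: Sequence[float], right: Sequence[float],
                   weights: Sequence[float]) -> List[List[float]]:
    """s_i' = a_i' * l' + b_i' * r' with b_i' = 1 - a_i', one mix per preset a_i'."""
    return [[a * l + (1 - a) * r for l, r in zip(left, right)] for a in weights]


def matching_rate(query: Sequence[int], sample: Sequence[int]) -> float:
    """Assumed metric: best fraction of equal values over all alignments."""
    if not query or len(query) > len(sample):
        return 0.0
    best = 0
    for off in range(len(sample) - len(query) + 1):
        hits = sum(q == s for q, s in zip(query, sample[off:off + len(query)]))
        best = max(best, hits)
    return best / len(query)


def locate_record(query_fps: Sequence[Sequence[int]],
                  db: Dict[str, Sequence[int]]) -> Tuple[str, float]:
    """Match each of the N fingerprints against every sample; keep the maximum."""
    best_id, best_rate = "", -1.0
    for fp in query_fps:
        for rec_id, sample in db.items():
            rate = matching_rate(fp, sample)
            if rate > best_rate:
                best_id, best_rate = rec_id, rate
    return best_id, best_rate
```

Submitting N differently weighted mixes gives the query some tolerance to channel imbalance in the captured audio. Whichever mix best resembles the broadcaster's merged audio is likely to produce the highest matching rate.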
7. A device for identifying a live multimedia file, characterized by comprising:

an acquisition module, configured to obtain the feature information of the current period of the live multimedia file according to the input real-time data stream of the live multimedia file;

a locating module, configured to locate a multimedia record to be updated in a feature database according to the identification information of the live multimedia file, wherein the feature database is used to store at least one multimedia record, the multimedia record comprises a feature sample of a multimedia file and identification information corresponding to the feature sample, and the time length of the feature sample is a first predetermined time;

an update module, configured to update the feature sample in the multimedia record to be updated according to the feature information of the current period of the live multimedia file;

a matching module, configured to receive an identification request for identifying a target multimedia file, and to match the feature information of the target multimedia file included in the identification request with the feature samples in the feature database, to locate the multimedia record corresponding to the target multimedia file; and

an identification module, configured to obtain the identification information of the multimedia file corresponding to the target multimedia file.
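The record layout and modules of claim 7 can be pictured as a small in-memory store. The sketch below is illustrative only. It assumes each fingerprint is an integer covering a fixed `fragment_ms` of audio, and its `identify` method uses a toy overlap score in place of the unspecified matching rate. The class, field, and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MultimediaRecord:
    identification: str        # identification info, e.g. a channel ID (hypothetical)
    feature_sample: List[int]  # n fingerprints, oldest first
    fragment_ms: int           # assumed fixed playback time per fingerprint

    @property
    def span_ms(self) -> int:
        """Time length of the feature sample (the first predetermined time)."""
        return len(self.feature_sample) * self.fragment_ms


@dataclass
class FeatureDatabase:
    records: Dict[str, MultimediaRecord] = field(default_factory=dict)

    def add(self, record: MultimediaRecord) -> None:
        self.records[record.identification] = record

    def locate(self, identification: str) -> Optional[MultimediaRecord]:
        """Locating module: find the record to update by identification info."""
        return self.records.get(identification)

    def update(self, identification: str, current: List[int]) -> None:
        """Update module: slide the window forward by the current period."""
        rec = self.locate(identification)
        if rec is None or not 0 < len(current) < len(rec.feature_sample):
            return
        rec.feature_sample = rec.feature_sample[len(current):] + list(current)

    def identify(self, query: List[int]) -> Optional[str]:
        """Matching and identification modules, using a toy overlap score."""
        best_id, best = None, 0.0
        for rec in self.records.values():
            present = set(rec.feature_sample)
            rate = sum(fp in present for fp in query) / len(query) if query else 0.0
            if rate > best:
                best_id, best = rec.identification, rate
        return best_id
```

Keying records by identification information makes the locating step a dictionary lookup, which suits a live feed that updates the same record every period.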
8. The device for identifying a live multimedia file according to claim 7, characterized in that the feature information is the fingerprint information of the audio data of a multimedia file, and the acquisition module comprises:

an audio data acquisition module, configured to obtain the audio data of the current period of the live multimedia file according to the real-time data stream;

an audio fragment splitting module, configured to divide the audio data of the current period, in chronological order, into multiple audio fragments each lasting a second predetermined time, wherein the second predetermined time is less than the first predetermined time; and

a fingerprint information extraction module, configured to extract the fingerprint information of each audio fragment, to obtain the feature information of the current period of the live multimedia file.
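The audio fragment splitting module of claim 8 amounts to cutting the current period's samples into consecutive, equal-length fragments. A minimal sketch follows, assuming PCM samples at a known sample rate and assuming that a trailing remainder shorter than one fragment is simply dropped (a real system might buffer it for the next period instead). `split_fragments` and its parameters are hypothetical.

```python
from typing import List, Sequence


def split_fragments(samples: Sequence[float], sample_rate: int,
                    fragment_seconds: float) -> List[List[float]]:
    """Cut audio into consecutive fragments of `fragment_seconds` each.

    `fragment_seconds` plays the role of the second predetermined time and
    should be shorter than the first predetermined time; a trailing remainder
    shorter than one fragment is dropped (an assumption).
    """
    size = int(round(sample_rate * fragment_seconds))
    if size <= 0:
        raise ValueError("fragment length must be at least one sample")
    return [list(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, size)]
```

Because the fragments are produced in chronological order, the fingerprints extracted from them come out oldest first, matching the order that the update steps expect.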
9. The device for identifying a live multimedia file according to claim 8, characterized in that the feature sample is the fingerprint information of n audio fragments, the feature information of the current period of the live multimedia file is the fingerprint information of m audio fragments, where m < n, the total duration of the n audio fragments is the first predetermined time, and the update module comprises:

a deletion module, configured to delete the m earliest pieces of fingerprint information from the feature sample in the multimedia record to be updated; and

an adding module, configured to place the m pieces of fingerprint information of the current period of the live multimedia file, in chronological order, into the feature sample of the multimedia record to be updated.
10. The device for identifying a live multimedia file according to claim 9, characterized in that the update module specifically performs the following steps:

Step S1: pointing a feature pointer to the first piece of fingerprint information in the feature information of the current period of the live multimedia file, resetting a timer, and starting feature-extraction timing;

Step S2: obtaining the fingerprint information pointed to by the feature pointer;

Step S3: extracting the feature sample of the multimedia record corresponding to the identification information of the live multimedia file, to obtain a first feature sample;

Step S4: appending the fingerprint information pointed to by the feature pointer to the end of the first feature sample, to obtain a second feature sample;

Step S5: deleting one piece of fingerprint information from the beginning of the second feature sample;

Step S6: judging whether the time on the timer has reached a third predetermined time; if the third predetermined time has not been reached, pointing the feature pointer to the next piece of fingerprint information and repeating steps S2 to S6; if the third predetermined time has been reached, replacing the feature sample corresponding to the multimedia identifier in the multimedia record with the obtained second feature sample, wherein the third predetermined time is the total playback duration of the multimedia content corresponding to the m pieces of fingerprint information.