CN103455642B

CN103455642B - Method and apparatus for multimedia file retrieval

Info

Publication number: CN103455642B
Application number: CN201310469487.3A
Authority: CN
Inventors: 胡锴亮 (Hu Kailiang)
Original assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Current assignee: Samsung Electronics China R&D Center; Samsung Electronics Co Ltd
Priority date: 2013-10-10
Filing date: 2013-10-10
Publication date: 2017-03-08
Anticipated expiration: 2033-10-10
Also published as: CN103455642A

Abstract

The present application discloses a method for retrieving multimedia files. The method includes: when a voice command is received, separating the action keyword and the description keyword in the voice command; using the separated description keyword to match stored multimedia files in which descriptive voice data is embedded; and, if a multimedia file is matched, performing the action corresponding to the separated action keyword on the multimedia file determined to be retrieved. Based on the same inventive concept, the application also proposes a device. Multimedia files can thus be retrieved simply by voice, providing a friendly user experience.

Description

Method and device for multimedia file retrieval

Technical field

The present application relates to the technical field of voice processing, and in particular to a method and device for multimedia file retrieval.

Background

At present, multimedia recording modules such as cameras and microphones have become standard components of everyday personal electronic devices such as cameras, mobile phones, and tablet computers. People increasingly use these devices to record their daily lives and accordingly store large numbers of multimedia files such as photos, videos, and audio recordings.

These files are generally named with sequential numbers or capture times, and it is precisely this naming scheme that makes retrieving and managing the files quite inconvenient.

For example, a user who wants to find certain photos usually has to browse through a large number of photos and pick out the desired ones. This is especially inconvenient on terminals without general-purpose input devices such as a mouse and keyboard: when retrieving photos on a TV, for instance, the user can only browse with a simple remote control.

In existing implementations, some technical solutions address multimedia file retrieval. They basically work by establishing mappings between keywords and multimedia files and storing these mappings either as separate files or as data records in a database.

The biggest drawback of these solutions is that the link between the retrieved files and the mapping files is loose. A large number of mapping files is generated, which makes the user's files even more cluttered. In addition, if a retrieved file is renamed, the mapping file or database may also need to be updated, and when a retrieved file is transferred to another storage or display device, the mapping file or database file must be transferred along with it.

Summary of the invention

In view of this, the present application provides a method and device for multimedia file retrieval that allow multimedia files to be retrieved simply by voice, providing a friendly user experience.

To solve the above technical problems, the technical solution of the present invention is implemented as follows:

A method for multimedia file retrieval, the method comprising:

storing multimedia files, wherein descriptive voice data for each multimedia file is embedded in that file when it is captured;

when a voice instruction for retrieving multimedia files is received, recognizing and separating the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved;

matching the stored multimedia files according to the separated description keyword, wherein, when each multimedia file is matched, the description keyword of that file is recognized from the descriptive voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file;

performing, according to the separated action keyword, the corresponding action on the matched multimedia files.

A device, comprising: a storage unit, a receiving unit, a recognition unit, a matching unit, and a processing unit;

the storage unit is configured to store multimedia files, wherein descriptive voice data for each multimedia file is embedded in that file when it is captured;

the receiving unit is configured to receive voice instructions for retrieving multimedia files;

the recognition unit is configured to, when the receiving unit receives a voice instruction for retrieving multimedia files, recognize and separate the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved;

the matching unit is configured to match the multimedia files stored in the storage unit according to the description keyword separated by the recognition unit, wherein, when each multimedia file is matched, the description keyword of that file is recognized from the descriptive voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file;

the processing unit is configured to perform, according to the action keyword separated by the recognition unit, the corresponding action on the multimedia files matched by the matching unit.

In summary, when a voice instruction is received, the present application separates the action keyword and the description keyword in the voice instruction, uses the separated description keyword to match stored multimedia files in which descriptive voice data is embedded, and, if a multimedia file is matched, performs the action corresponding to the separated action keyword on the multimedia file determined to be retrieved. Multimedia files can thus be retrieved by speech recognition without having to save and maintain a mapping between keywords and multimedia files.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 3 of the present invention;

FIG. 4 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 4 of the present invention;

FIG. 5 is a schematic flowchart of a method for determining the copyright of a multimedia file in Embodiment 5 of the present invention;

FIG. 6 is a schematic structural diagram of a device applying the above technique.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the solutions of the present invention are described in further detail below with reference to the accompanying drawings and embodiments.

An embodiment of the present invention proposes a method for multimedia file retrieval. When a voice instruction is received, the action keyword and the description keyword in the voice instruction are separated; the separated description keyword is used to match stored multimedia files in which descriptive voice data is embedded; and, if a multimedia file is matched, the action corresponding to the separated action keyword is performed on the multimedia file determined to be retrieved. Multimedia files can thus be retrieved simply by voice, providing a friendly user experience.

In specific embodiments of the present invention, the device that captures the multimedia files and the device that retrieves them may be the same device or different devices. If they are different devices, the multimedia files with embedded voice data must be transferred to the retrieving device.

The retrieving device stores the multimedia files, wherein descriptive voice data for each multimedia file is embedded in that file when it is captured.

If the device itself can capture multimedia files, it stores the captured files directly; if it cannot, the multimedia files are transferred to it from the device that captured them.

The descriptive voice data for a multimedia file may be a description of that file spoken by a speaker and captured at the time.

For example, when a picture is taken, the speaker's spoken description of the picture is captured directly and embedded in the picture; the content of the voice data might be "This is a picture taken in Shanghai." The description can also be more detailed, for example naming the specific location.

The descriptive voice data for a multimedia file may also be preset default voice data.

The preset default voice is a description captured in advance; suppose its content is likewise "This is a picture taken in Shanghai." Every picture taken in Shanghai can then have this preset default voice data embedded, so the descriptive voice data does not have to be captured each time.

Embedding the descriptive voice data in a multimedia file at capture time includes: embedding the descriptive voice data for the multimedia file into the captured file as extension data, as metadata, as a digital watermark, or in another form that preserves the original format of the voice data.

Embedding as extension data means embedding the descriptive voice data as an extension header of the multimedia file; embedding as metadata means embedding it as auxiliary data of the multimedia file; embedding as a digital watermark means embedding it in the multimedia file in the form of a digital watermark.
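The extension-header option can be illustrated with a minimal Python sketch. The container layout below (a 4-byte magic value `VDSC`, a 4-byte big-endian length, the voice bytes, then the original media bytes) and the names `embed_voice`/`extract_voice` are hypothetical assumptions for illustration only; real formats such as JPEG or MP4 define their own extension and metadata mechanisms, and the patent does not prescribe one.

```python
import struct
from typing import Optional, Tuple

# Hypothetical layout (an assumption, not a standard file format):
#   4-byte magic | 4-byte big-endian voice length | voice bytes | original media bytes
MAGIC = b"VDSC"
HEADER = struct.Struct(">4sI")


def embed_voice(media: bytes, voice: bytes) -> bytes:
    """Prepend the descriptive voice data to the media bytes as an extension header."""
    return HEADER.pack(MAGIC, len(voice)) + voice + media


def extract_voice(container: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return (voice, media) if the container carries an embedded voice header, else None."""
    if len(container) < HEADER.size:
        return None
    magic, length = HEADER.unpack_from(container, 0)
    end = HEADER.size + length
    if magic != MAGIC or len(container) < end:
        return None  # not one of our containers, or truncated
    return container[HEADER.size:end], container[end:]
```

Because the voice data travels inside the file bytes, renaming or copying the file carries the description with it, which is what removes the need for a separate mapping file.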

Because the descriptive voice data is embedded in the file itself, the specific embodiments of the present invention do not need to store description keywords separately, or the mapping between each multimedia file and its description keywords, as existing implementations do. When a multimedia file is transferred or renamed, no such mapping has to be maintained, and retrieval of the file is unaffected.

Embodiment 1

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 1 of the present invention. The specific steps are:

Step 101: when the device receives a voice instruction for retrieving multimedia files, it recognizes and separates the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved.

When the device receives a voice instruction, it recognizes the instruction using its own speech recognition engine or program, or one provided by another device, and separates the recognized keywords into description keywords and action keywords.

Suppose the received voice instruction is "display the photos of Shanghai"; the separated action keyword is then "display" and the description keyword is "Shanghai".

Recognizing and separating the keywords in a voice instruction can be implemented with existing speech recognition technology, so no specific implementation is given here.
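The separation step can be sketched as follows, assuming the speech recognizer has already produced a text transcript. The vocabularies `ACTION_WORDS` and `FILLER_WORDS` and the function `separate_keywords` are hypothetical; the sketch splits on whitespace, so a Chinese transcript such as 显示上海的照片 would first need word segmentation.

```python
from typing import List, Optional, Tuple

# Hypothetical vocabularies; a real system would derive them from its command grammar.
ACTION_WORDS = {"display", "show", "play", "delete", "share"}
FILLER_WORDS = {"the", "a", "an", "of", "all", "photo", "photos",
                "picture", "pictures", "video", "videos"}


def separate_keywords(transcript: str) -> Tuple[Optional[str], List[str]]:
    """Split a recognized command into (action keyword, description keywords)."""
    action: Optional[str] = None
    descriptions: List[str] = []
    for raw in transcript.lower().split():
        word = raw.strip(".,!?")
        if not word or word in FILLER_WORDS:
            continue  # drop punctuation-only tokens and words that carry no description
        if action is None and word in ACTION_WORDS:
            action = word  # the first action word is taken as the action keyword
        else:
            descriptions.append(word)
    return action, descriptions
```

With this sketch, "Display the photos of Shanghai" yields `("display", ["shanghai"])`; a command without an action word yields `None` for the action, which the caller would have to handle.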

Step 102: the device matches the stored multimedia files according to the separated description keyword.

When matching each multimedia file, the device recognizes the description keyword of that file from the descriptive voice data embedded in it. If the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file; otherwise, it is determined not to be a matching multimedia file.

For example, if the description keyword recognized from the descriptive voice embedded in a multimedia file is "Shanghai", the file is determined to be a matching multimedia file; if the recognized description keyword is "Beijing", the file is determined not to match.
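Step 102 can be sketched as a filter over the stored files. The sketch assumes a caller-supplied `transcribe` function that turns the embedded voice bytes into text (a stand-in for whatever speech recognizer is used) and treats "match" as every query keyword appearing among the words of the recognized description; both the `MediaFile` type and this matching rule are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MediaFile:
    name: str
    embedded_voice: bytes  # descriptive voice data carried inside the file


def match_by_keywords(files: List[MediaFile], query_keywords: List[str],
                      transcribe: Callable[[bytes], str]) -> List[MediaFile]:
    """Keep the files whose recognized description contains every query keyword."""
    wanted = [k.lower() for k in query_keywords]
    matched = []
    for f in files:
        # Recognize this file's description keywords from its embedded voice data.
        words = {w.strip(".,!?") for w in transcribe(f.embedded_voice).lower().split()}
        if all(k in words for k in wanted):
            matched.append(f)
    return matched
```

For instance, if one file's voice transcribes to "This is a picture taken in Shanghai." and another's to "...taken in Beijing.", the query `["Shanghai"]` keeps only the first.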

Step 103: the device performs the corresponding action on the matched multimedia files according to the separated action keyword.

Continuing the example above, when the separated action keyword is "display", the corresponding action is to display the matched multimedia files.
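Step 103 amounts to dispatching on the action keyword. In this sketch the handler table `ACTIONS` and the string-returning handlers are hypothetical placeholders for real display or delete operations on the device.

```python
from typing import Callable, Dict, List


def display(names: List[str]) -> str:
    return "displaying " + ", ".join(names)


def delete(names: List[str]) -> str:
    return "deleting " + ", ".join(names)


# Hypothetical mapping from action keywords to handlers; synonyms share a handler.
ACTIONS: Dict[str, Callable[[List[str]], str]] = {
    "display": display,
    "show": display,
    "delete": delete,
}


def perform_action(action: str, matched: List[str]) -> str:
    """Run the handler for the separated action keyword on the matched files."""
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unsupported action keyword: {action!r}")
    if not matched:
        return "no matching files"
    return handler(matched)
```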

Embodiment 2

Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 2 of the present invention. The specific steps are:

Step 201: when the device receives a voice instruction for retrieving multimedia files, it recognizes and separates the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved.

Step 202: the device determines whether the separated description keyword matches a preset description keyword; if so, step 203 is executed; otherwise, step 205 is executed.

If the received voice instruction is "show the photos I took", the recognized and separated action keyword is "show" and the description keyword is "the photos I took". In a specific implementation the description keyword could also be, for example, just "I"; this is configured per implementation.

In this case the separated description keyword refers to a specific person, so step 203 is executed and matching is performed using the speaker's voice features.

A preset description keyword is a description keyword that refers to the speaker themself, such as "mine" or "the photos I took".

Step 203: the device performs voice feature recognition on the received voice instruction to obtain the voice features of the speaker who issued it.

This step can be implemented with biometric techniques that extract the speaker's voice features; it is not described in detail here.

Step 204: the device matches the stored multimedia files using the obtained voice features of the speaker, and then step 206 is executed.

When matching each multimedia file, the device recognizes the voice features of the descriptive voice data embedded in that file. If the obtained voice features of the speaker match the recognized voice features, the file is determined to be a matching multimedia file; otherwise, it is determined not to be a matching multimedia file.
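The patent leaves the form of the voice features open. One common choice, assumed here, is to represent each recording by a fixed-length speaker-embedding vector and compare two vectors by cosine similarity against a threshold. The vector values and the 0.8 threshold are hypothetical and would have to be tuned for a real recognizer.

```python
import math
from typing import Dict, List, Sequence

THRESHOLD = 0.8  # hypothetical decision threshold


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_by_speaker(speaker_vec: Sequence[float],
                     file_vecs: Dict[str, Sequence[float]],
                     threshold: float = THRESHOLD) -> List[str]:
    """Return names of files whose embedded-voice feature matches the speaker's feature."""
    return [name for name, vec in file_vecs.items()
            if cosine(speaker_vec, vec) >= threshold]
```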

Step 205: the device matches the stored multimedia files according to the separated description keyword.

Step 206: the device performs the corresponding action on the matched multimedia files according to the separated action keyword.

In this embodiment, when the separated description keyword matches a preset description keyword, that is, when it refers to the speaker themself, multimedia files are matched using the voice features of the received voice instruction.
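The branch in step 202 can be sketched as below. It assumes each stored file has already been indexed with the words recognized from its embedded description and with a voice-feature vector; `SELF_KEYWORDS`, `IndexedFile`, the cosine comparison, and the 0.8 threshold are illustrative assumptions rather than details given in the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set

# Hypothetical preset description keywords that refer to the speaker themself.
SELF_KEYWORDS = {"i", "me", "my", "mine"}


@dataclass
class IndexedFile:
    name: str
    description_words: Set[str]  # recognized from the embedded voice data
    voice_vec: Sequence[float]   # voice feature of the embedded voice data


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(desc_keywords: List[str], speaker_vec: Sequence[float],
             files: List[IndexedFile], threshold: float = 0.8) -> List[str]:
    if any(k in SELF_KEYWORDS for k in desc_keywords):
        # Steps 203-204: the keyword refers to the speaker, so match on voice features.
        return [f.name for f in files if cosine(speaker_vec, f.voice_vec) >= threshold]
    # Step 205: ordinary keywords are matched against each file's recognized description.
    return [f.name for f in files
            if all(k in f.description_words for k in desc_keywords)]
```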

Embodiment 3

Referring to FIG. 3, FIG. 3 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 3 of the present invention. The specific steps are:

Step 301: the device receives a voice instruction for retrieving multimedia files, recognizes and separates the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved, and performs voice feature recognition on the received voice instruction to obtain the voice features of its speaker.

In this embodiment, voice feature recognition may be performed on the voice instruction as soon as it is received, or it may be deferred until the voice features are needed, that is, performed just before step 303.

Step 302: the device matches the stored multimedia files according to the separated description keyword.

When matching each multimedia file, the device recognizes the description keyword of that file from the descriptive voice data embedded in it. If the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file; otherwise, it is determined not to be a matching multimedia file.

Step 303: using the obtained voice features of the speaker, the device performs a second round of matching on the multimedia files already matched by the separated description keyword.

Retrieval with the separated description keyword yields a set of multimedia files that match it. This step performs a second round of matching within those files using the obtained voice features of the speaker.

When matching each of these files, the device recognizes the voice features of the descriptive voice data embedded in it. If the obtained voice features of the speaker match the recognized voice features, the file is determined to be a matching multimedia file; otherwise, it is determined not to be a matching multimedia file.

Step 304: the device performs the corresponding action, according to the separated action keyword, on the multimedia files matched by voice features.

In this embodiment, description keywords and voice features are combined for retrieval; the files finally presented are effectively the intersection of the results of the two criteria, which improves retrieval accuracy. Here, retrieval is performed first by description keyword and then by voice features.
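A sketch of Embodiment 3's two passes follows. It assumes each stored file carries the words recognized from its embedded description and a voice-feature vector, that voice features are compared by cosine similarity, and that 0.8 is an acceptable threshold; `IndexedFile`, `keyword_then_speaker`, and the threshold are all hypothetical. Only files that survive the keyword pass are compared by voice features.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass
class IndexedFile:
    name: str
    description_words: Set[str]  # recognized from the embedded voice data
    voice_vec: Sequence[float]   # voice feature of the embedded voice data


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_then_speaker(desc_keywords: List[str], speaker_vec: Sequence[float],
                         files: List[IndexedFile], threshold: float = 0.8) -> List[str]:
    # Step 302: first pass over all stored files using the description keywords.
    by_keyword = [f for f in files
                  if all(k in f.description_words for k in desc_keywords)]
    # Step 303: second pass, restricted to the keyword matches, using voice features.
    return [f.name for f in by_keyword if cosine(speaker_vec, f.voice_vec) >= threshold]
```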

Embodiment 4

Referring to FIG. 4, FIG. 4 is a schematic flowchart of a method for retrieving multimedia files in Embodiment 4 of the present invention. The specific steps are:

Step 401: when the device receives a voice instruction for retrieving multimedia files, it recognizes and separates the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved, and performs voice feature recognition on the received voice instruction to obtain the voice features of its speaker.

Step 402: the device matches the stored multimedia files using the obtained voice features of the speaker.

Step 403: according to the separated description keyword, the device performs a second round of matching on the multimedia files matched by the speaker's voice features.

Step 404: the device performs the corresponding action, according to the separated action keyword, on the multimedia files matched by the separated description keyword.

This embodiment likewise combines description keywords and voice features for retrieval; the files finally presented are effectively the intersection of the results of the two criteria, which improves retrieval accuracy. Here, retrieval is performed first by voice features and then by description keyword.
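Because each criterion is checked per file, the keyword-first order of Embodiment 3 and the voice-first order of Embodiment 4 return the same intersection; the order only changes how many comparisons of each kind are made, which matters when one check is much more expensive than the other. The sketch below makes this visible with call counters. The predicates `speaker_ok` and `keyword_ok` are hypothetical stand-ins: a precomputed `speaker` label replaces a real voice-feature comparison.

```python
from typing import Callable, Dict, List

FileRecord = Dict[str, object]
Predicate = Callable[[FileRecord], bool]

calls = {"speaker": 0, "keyword": 0}


def speaker_ok(f: FileRecord) -> bool:
    """Stand-in for comparing voice features (hypothetical label lookup)."""
    calls["speaker"] += 1
    return f["speaker"] == "alice"


def keyword_ok(f: FileRecord) -> bool:
    """Stand-in for matching the description keyword "shanghai"."""
    calls["keyword"] += 1
    return "shanghai" in f["words"]


def two_stage(files: List[FileRecord], first: Predicate, second: Predicate) -> List[str]:
    """Apply `first` to every file, then `second` only to the files that passed."""
    survivors = [f for f in files if first(f)]
    return [str(f["name"]) for f in survivors if second(f)]


FILES: List[FileRecord] = [
    {"name": "a.jpg", "speaker": "alice", "words": {"shanghai"}},
    {"name": "b.jpg", "speaker": "bob", "words": {"shanghai"}},
    {"name": "c.jpg", "speaker": "alice", "words": {"beijing"}},
]
```

On this data, `two_stage(FILES, speaker_ok, keyword_ok)` (voice first) makes 3 speaker checks and 2 keyword checks, while the keyword-first order makes 3 keyword checks and 2 speaker checks; both return `["a.jpg"]`.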

Embodiment 5

Referring to FIG. 5, FIG. 5 is a schematic flowchart of a method for determining the copyright of a multimedia file in Embodiment 5 of the present invention. The specific steps are:

Step 501: when the device receives voice data from a speaker, it performs voice feature recognition to obtain the speaker's voice features.

Step 502: the device performs voice feature recognition on the descriptive voice data embedded in the multimedia file whose copyright is to be determined, obtaining the voice features of the descriptive voice data.

Step 503: the device determines whether the obtained voice features of the speaker match the obtained voice features of the descriptive voice data.

Step 504: if the obtained voice features of the speaker match those of the descriptive voice data, the speaker is determined to be the copyright owner of the multimedia file; otherwise, the speaker is determined not to be the copyright owner.

In this way, the biometric consistency of a person's voice is used to provide copyright protection for multimedia files.
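Steps 503 and 504 reduce to a single comparison. This sketch assumes voice features are fixed-length vectors compared by cosine similarity with a hypothetical 0.8 threshold; `is_copyright_owner` is an illustrative name, and a production system would more likely use a dedicated speaker-verification model with calibrated scores.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_copyright_owner(speaker_vec: Sequence[float], embedded_vec: Sequence[float],
                       threshold: float = 0.8) -> bool:
    """Step 504: the speaker is taken as the owner if the two voice features match."""
    return cosine(speaker_vec, embedded_vec) >= threshold
```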

Based on the same inventive concept, the present application also proposes a device. Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a device applying the above technique. The device includes: a storage unit 601, a receiving unit 602, a recognition unit 603, a matching unit 604, and a processing unit 605.

The storage unit 601 is configured to store multimedia files, wherein descriptive voice data for each multimedia file is embedded in that file when it is captured.

The receiving unit 602 is configured to receive voice instructions for retrieving multimedia files.

The recognition unit 603 is configured to, when the receiving unit 602 receives a voice instruction for retrieving multimedia files, recognize and separate the action keyword in the voice instruction and the description keyword of the multimedia files to be retrieved.

The matching unit 604 is configured to match the multimedia files stored in the storage unit 601 according to the description keyword separated by the recognition unit 603, wherein, when each multimedia file is matched, the description keyword of that file is recognized from the descriptive voice data embedded in it, and if the separated description keyword matches the recognized description keyword, the file is determined to be a matching multimedia file.

The processing unit 605 is configured to perform, according to the action keyword separated by the recognition unit 603, the corresponding action on the multimedia files matched by the matching unit 604.
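The division into units can be mirrored by a small class whose collaborators are injected as functions. This is a structural sketch only: `RetrievalDevice` and its field names are hypothetical, the storage unit is modeled as a mapping from file name to the description already recognized from its embedded voice, and the receiving unit is assumed to deliver an already-transcribed command.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Separator = Callable[[str], Tuple[Optional[str], List[str]]]
Matcher = Callable[[Dict[str, str], List[str]], List[str]]
Handler = Callable[[str, List[str]], str]


@dataclass
class RetrievalDevice:
    storage: Dict[str, str]  # storage unit 601: name -> recognized description
    recognize: Separator     # recognition unit 603
    match: Matcher           # matching unit 604
    process: Handler         # processing unit 605

    def on_voice_command(self, transcript: str) -> str:
        """Receiving unit 602: entry point for an (already transcribed) voice command."""
        action, keywords = self.recognize(transcript)
        if action is None:
            return "no action keyword"
        matched = self.match(self.storage, keywords)
        return self.process(action, matched)
```

Injecting the units as functions keeps the wiring independent of any particular speech recognizer or matching rule.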

Preferably, the descriptive voice data for a multimedia file is preset default voice data, or a description of that file spoken by a speaker and captured at the time.

Preferably,

the storage unit 601 stores multimedia files into which, at capture time, the descriptive voice data for each file has been embedded as extension data, as metadata, as a digital watermark, or in another form that preserves the original format of the voice data.

Preferably,

the processing unit 605 is further configured to determine whether the description keyword separated by the recognition unit 603 matches a preset description keyword.

The recognition unit 603 is further configured to, if the processing unit 605 determines that the separated description keyword matches the preset description keyword, perform voice feature recognition on the received voice instruction to obtain the voice features of its speaker.

The matching unit 604 is further configured to match the stored multimedia files using the speaker's voice features obtained by the recognition unit 603, wherein, when each multimedia file is matched, the voice features of the descriptive voice data embedded in it are recognized, and if the obtained voice features of the speaker match the recognized voice features, the file is determined to be a matching multimedia file; and then to trigger the processing unit 605 to perform the corresponding action on the matched multimedia files according to the separated action keyword. When the processing unit 605 determines that the description keyword separated by the recognition unit 603 does not match the preset description keyword, the matching unit 604 matches the stored multimedia files according to the separated description keyword.

Preferably,

the recognition unit 603 is further configured to perform voice feature recognition on the received voice instruction to obtain the voice features of its speaker.

The matching unit 604 is further configured to, after matching the stored multimedia files according to the separated description keyword, further match the files matched by the description keyword using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the descriptive voice data embedded in it are recognized, and if the obtained voice features of the speaker match the recognized voice features, the file is determined to be a matching multimedia file.

处理单元604，进一步用于根据分离出的动作关键词，对匹配单元604通过语音特征匹配到的多媒体文件执行相应的动作。The processing unit 604 is further configured to perform corresponding actions on the multimedia files matched by the matching unit 604 through voice features according to the separated action keywords.
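A minimal sketch of this two-stage order (description keyword first, then voice features), assuming each stored file is represented by a hypothetical dictionary holding an already-recognized transcript and voice-feature vector, and assuming cosine similarity with a 0.9 threshold as the voice-matching rule (the patent does not specify one):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_then_voice(description, speaker_feature, files, threshold=0.9):
    """Filter by description keyword first, then by the speaker's voice."""
    # Stage 1: files whose recognized description contains the keyword.
    by_keyword = [f for f in files if description in f["transcript"]]
    # Stage 2: keep only those whose embedded description speech has a
    # voice feature close to that of the person issuing the command.
    return [f for f in by_keyword
            if cosine(f["voice_feature"], speaker_feature) >= threshold]


files = [
    {"name": "A.jpg", "transcript": "beach trip", "voice_feature": [1.0, 0.0]},
    {"name": "B.jpg", "transcript": "beach sunset", "voice_feature": [0.0, 1.0]},
    {"name": "C.jpg", "transcript": "office party", "voice_feature": [1.0, 0.0]},
]
print([f["name"] for f in keyword_then_voice("beach", [1.0, 0.0], files)])  # ['A.jpg']
```

Here two people have both described photos as "beach" photos; the second stage keeps only the one narrated by the person giving the command.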

Preferably,

The matching unit 604 is further configured to first match the stored multimedia files using the obtained voice features of the speaker. When matching each multimedia file, it identifies the voice features of the description speech data embedded in that file; if the speaker's voice features match the identified voice features, the file is determined to be a matching multimedia file. It then matches, according to the separated description keyword, among the multimedia files matched by voice features.

The processing unit 604 is further configured to perform the corresponding action, according to the separated action keyword, on the multimedia files that the matching unit 604 matched by description keyword.
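The reverse order (voice features first, then description keyword) can be sketched the same way. As before, this assumes hypothetical dictionaries with pre-extracted transcripts and feature vectors and an assumed cosine-similarity rule; the function also reports how many files reached the keyword stage, purely to illustrate how the first filter narrows the second.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voice_then_keyword(description, speaker_feature, files, threshold=0.9):
    """Filter by the speaker's voice first, then by description keyword.

    Returns the matches and the number of files that reached the
    keyword stage.
    """
    # Stage 1: files whose embedded description speech was, under the
    # assumed similarity rule, spoken by the person issuing the command.
    by_voice = [f for f in files
                if cosine(f["voice_feature"], speaker_feature) >= threshold]
    # Stage 2: keyword matching only among the voice-matched files.
    matches = [f for f in by_voice if description in f["transcript"]]
    return matches, len(by_voice)


files = [
    {"name": "A.jpg", "transcript": "beach trip", "voice_feature": [1.0, 0.0]},
    {"name": "B.jpg", "transcript": "beach sunset", "voice_feature": [0.0, 1.0]},
    {"name": "C.jpg", "transcript": "office party", "voice_feature": [1.0, 0.0]},
]
matches, checked = voice_then_keyword("beach", [1.0, 0.0], files)
print([f["name"] for f in matches], checked)  # ['A.jpg'] 2
```

Because both stages in this sketch are plain filters, the final set equals what the keyword-first order would return; the order only changes which check runs over the whole library and which runs over the reduced set.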

Preferably,

The receiving unit 602 is further configured to receive voice data from a speaker.

The recognition unit 603 is further configured, when the receiving unit 602 receives the speaker's voice data, to perform voice feature recognition to obtain the speaker's voice features, and to perform voice feature recognition on the description speech data embedded in the multimedia file whose copyright is to be determined, to obtain the voice features of that description speech data.

The matching unit 604 is further configured to match the obtained voice features of the speaker against the obtained voice features of the description speech data.

The processing unit 604 is further configured to determine that the speaker is the copyright owner of the multimedia file if the matching unit 604 determines that the speaker's voice features match those of the description speech data; otherwise, it determines that the speaker is not the copyright owner of the file.
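The ownership check reduces to a single voice-feature comparison. A minimal sketch, assuming the voice features have already been extracted as fixed-length vectors by some speaker-recognition front end and that cosine similarity at or above an assumed threshold counts as a match (the patent names neither the feature type nor the matching rule; `is_copyright_owner` is a hypothetical name):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_copyright_owner(speaker_feature, embedded_feature, threshold=0.9):
    """Decide ownership by comparing voice features.

    speaker_feature: extracted from voice data supplied by the claimant.
    embedded_feature: extracted from the description speech data embedded
    in the multimedia file at capture. Both are assumed to come from the
    same (unspecified) feature extractor.
    """
    return cosine(speaker_feature, embedded_feature) >= threshold


embedded = [0.6, 0.8]  # hypothetical feature of the embedded speech
print(is_copyright_owner([0.6, 0.8], embedded))   # True: same voice
print(is_copyright_owner([0.8, -0.6], embedded))  # False: different voice
```

The vectors are toy values; a practical speaker-verification system would use learned speaker embeddings and a threshold calibrated on real recordings.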

In summary, in the specific embodiments of the present invention, when a voice command is received, the action keyword and the description keyword in the command are separated; the separated description keyword is used to match the stored multimedia files in which description speech data is embedded; and if a multimedia file is matched, the corresponding action is performed on it according to the separated action keyword. Multimedia files can thus be retrieved simply by voice, without having to save and maintain a mapping between keywords and multimedia files, which provides a friendly user experience.
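The overall flow summarized above can be sketched end to end. This is a minimal illustration, not the patented implementation: speech recognition is assumed to have already turned both the voice command and each file's embedded description speech into text, the action vocabulary and handler names are hypothetical, and a file is taken to match when every description word appears in its transcript (an assumed rule).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action vocabulary mapping each action keyword to a handler.
# Real handlers would open, play, delete, or share the file.
ACTIONS: Dict[str, Callable[[str], str]] = {
    "play": lambda name: f"playing {name}",
    "delete": lambda name: f"deleting {name}",
    "share": lambda name: f"sharing {name}",
}


def separate(command: str) -> Tuple[Optional[str], List[str]]:
    """Split a recognized command into an action keyword and description words."""
    words = command.lower().split()
    action = next((w for w in words if w in ACTIONS), None)
    description = [w for w in words if w not in ACTIONS]
    return action, description


def retrieve_and_act(command: str, library: Dict[str, str]) -> List[str]:
    """Match files by description words and apply the action to each match.

    library maps a file name to the transcript recognized from the
    description speech embedded in that file (assumed precomputed).
    """
    action, description = separate(command)
    if action is None or not description:
        return []
    matched = [name for name, transcript in library.items()
               if set(description) <= set(transcript.lower().split())]
    return [ACTIONS[action](name) for name in matched]


library = {
    "IMG_0001.jpg": "our beach trip in July",
    "VID_0002.mp4": "birthday party at home",
}
print(retrieve_and_act("play beach trip", library))  # ['playing IMG_0001.jpg']
```

In this sketch the `library` transcripts are derived from speech embedded in the files themselves, so they can be regenerated from the files at any time rather than maintained as a separate keyword-to-file index.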

The specific embodiments also describe retrieval by the speaker's voice features, and retrieval by a combination of voice features and description keywords, so that the desired multimedia files can be retrieved more conveniently and accurately.

An embodiment is also provided that determines the copyright owner of a multimedia file by recognizing voice features, which protects the copyright of multimedia files simply and effectively.

The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for multimedia file retrieval, characterized in that the method comprises:

storing multimedia files, wherein each multimedia file has description speech data for that multimedia file embedded in it at capture;

when a voice command for retrieving multimedia files is received, recognizing and separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved;

matching the stored multimedia files according to the separated description keyword, wherein, when each multimedia file is matched, the description keyword of that multimedia file is identified from the description speech data embedded in it; if the separated description keyword matches the identified description keyword, the multimedia file is determined to be a matching multimedia file; and wherein, when the stored multimedia files are matched, the matching is performed according to the voice features of the speaker;

performing the corresponding action on the matched multimedia file according to the separated action keyword.

2. The method according to claim 1, characterized in that

the description speech data for the multimedia file is preset default speech data, or description speech data for the multimedia file captured from the speaker.

3. The method according to claim 1, characterized in that embedding, at capture, the description speech data for the multimedia file in the multimedia file comprises: embedding the description speech data for the multimedia file in the captured multimedia file in the form of extension data, metadata, a digital watermark, or reserved data of the original format.

4. The method according to any one of claims 1-3, characterized in that, after separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, and before matching the stored multimedia files according to the separated description keyword, the method further comprises:

determining whether the separated description keyword matches a preset description keyword; if so, performing voice feature recognition on the received voice command to obtain the voice features of the speaker of the voice command; matching the stored multimedia files using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features, the multimedia file is determined to be a matching multimedia file; and performing the corresponding action on the matched multimedia file according to the separated action keyword;

otherwise, performing the step of matching the stored multimedia files according to the separated description keyword and the subsequent steps.

5. The method according to any one of claims 1-3, characterized in that, when separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, the method further comprises: performing voice feature recognition on the received voice command to obtain the voice features of the speaker of the voice command;

after matching the stored multimedia files according to the separated description keyword, and before performing the corresponding action on the matched multimedia file according to the separated action keyword, the method further comprises:

further matching the multimedia files matched by the description keyword using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features of the speech data, the multimedia file is determined to be a matching multimedia file;

wherein performing the corresponding action on the matched multimedia file according to the separated action keyword comprises: performing the corresponding action, according to the separated action keyword, on the multimedia file matched by voice features.

6. The method according to any one of claims 1-3, characterized in that, when separating the action keyword in the voice command and the description keyword of the multimedia file to be retrieved, the method further comprises: performing voice feature recognition on the received voice command to obtain the voice features of the speaker of the voice command;

before matching the stored multimedia files according to the separated description keyword, the method further comprises: matching the stored multimedia files using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features of the speech data, the multimedia file is determined to be a matching multimedia file;

wherein matching the stored multimedia files according to the separated description keyword comprises: matching, according to the separated keyword, among the multimedia files matched by voice features.

7. The method according to any one of claims 1-3, characterized in that the method further comprises:

when voice data of a speaker is received, performing voice feature recognition to obtain the voice features of the speaker;

performing voice feature recognition on the description speech data embedded in the multimedia file whose copyright needs to be determined, to obtain the voice features of the description speech data;

matching the obtained voice features of the speaker against the obtained voice features of the description speech data;

if the obtained voice features of the speaker match the obtained voice features of the description speech data, determining that the speaker is the copyright owner of the multimedia file; otherwise, determining that the speaker is not the copyright owner of the multimedia file.

8. A device, characterized in that the device comprises: a storage unit, a receiving unit, a recognition unit, a matching unit, and a processing unit;

the storage unit is configured to store multimedia files, wherein each multimedia file has description speech data for that multimedia file embedded in it at capture;

the receiving unit is configured to receive a voice command for retrieving multimedia files;

the recognition unit is configured, when the receiving unit receives the voice command for retrieving multimedia files, to recognize and separate the action keyword in the voice command and the description keyword of the multimedia file to be retrieved;

the matching unit is configured to match the multimedia files stored in the storage unit according to the description keyword separated by the recognition unit, wherein, when each multimedia file is matched, the description keyword of that multimedia file is identified from the description speech data embedded in it; if the separated description keyword matches the identified description keyword, the multimedia file is determined to be a matching multimedia file; and wherein, when the stored multimedia files are matched, the matching is performed according to the voice features of the speaker;

the processing unit is configured to perform the corresponding action, according to the action keyword separated by the recognition unit, on the multimedia file matched by the matching unit.

9. The device according to claim 8, characterized in that the description speech data for the multimedia file is preset default speech data, or description speech data for the multimedia file captured from the speaker.

10. The device according to claim 8, characterized in that

in the multimedia files stored by the storage unit, the description speech data for each multimedia file is embedded in the captured multimedia file at capture in the form of extension data, metadata, a digital watermark, or reserved data of the original format.

11. The device according to any one of claims 8-10, characterized in that

the processing unit is further configured to determine whether the keyword separated by the recognition unit matches a preset keyword;

the recognition unit is further configured, if the processing unit determines that the separated description keyword matches the preset description keyword, to perform voice feature recognition on the received voice command to obtain the voice features of the speaker of the voice command;

the matching unit is further configured to match the stored multimedia files using the voice features of the speaker obtained by the recognition unit, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features, the multimedia file is determined to be a matching multimedia file; to trigger the processing unit to perform the corresponding action on the matched multimedia file according to the separated action keyword; and, when the processing unit determines that the description keyword separated by the recognition unit does not match the preset description keyword, to match the stored multimedia files according to the separated description keyword.

12. The device according to any one of claims 8-10, characterized in that

the recognition unit is further configured to perform voice feature recognition on the received voice command to obtain the voice features of the speaker of the voice command;

the matching unit is further configured, after matching the stored multimedia files according to the separated description keyword, to further match the multimedia files matched by the description keyword using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features of the speech data, the multimedia file is determined to be a matching multimedia file;

the processing unit is further configured to perform the corresponding action, according to the separated action keyword, on the multimedia file that the matching unit matched by voice features.

13. The device according to any one of claims 8-10, characterized in that

the matching unit is further configured to first match the stored multimedia files using the obtained voice features of the speaker, wherein, when each multimedia file is matched, the voice features of the description speech data embedded in that multimedia file are identified, and if the obtained voice features of the speaker match the identified voice features of the speech data, the multimedia file is determined to be a matching multimedia file; and then to match, according to the separated keyword, among the multimedia files matched by voice features;

the processing unit is further configured to perform the corresponding action, according to the separated action keyword, on the multimedia file that the matching unit matched by the description keyword.

14. The device according to any one of claims 8-10, characterized in that

the receiving unit is further configured to receive voice data of a speaker;

the recognition unit is further configured, when the receiving unit receives the voice data of the speaker, to perform voice feature recognition to obtain the voice features of the speaker, and to perform voice feature recognition on the description speech data embedded in the multimedia file whose copyright needs to be determined, to obtain the voice features of the description speech data;

the matching unit is further configured to match the obtained voice features of the speaker against the obtained voice features of the description speech data;

the processing unit is further configured, if the matching unit determines that the obtained voice features of the speaker match the obtained voice features of the description speech data, to determine that the speaker is the copyright owner of the multimedia file; otherwise, to determine that the speaker is not the copyright owner of the multimedia file.