
CN109033099A - A kind of multi-media management method and device - Google Patents

A multimedia management method and device

Info

Publication number
CN109033099A
Authority
CN
China
Prior art keywords
user
label
voice messaging
voice
multimedia file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710428940.4A
Other languages
Chinese (zh)
Inventor
马靖博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201710428940.4A
Priority to PCT/CN2018/090400 (WO2018224032A1)
Publication of CN109033099A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides a multimedia management method and device. Voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. Because the user's voice information is incorporated into multimedia file management, the user can flexibly customize how multimedia files are managed, the user's varied management needs are met, and the user experience is improved.

Description

A multimedia management method and device
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia management method and device.
Background
Among existing multimedia control means, besides basic functions such as play and pause, a further means has emerged for video playback: playback labels. A video is divided at time nodes so that the user can watch the divided video by label and reach the desired content more quickly. However, this way of managing labels is rigid and fixed. Users cannot manage labels according to their own preferences and actual needs, so it is difficult to meet the varied needs of today's different users.
Summary of the invention
Embodiments of the present invention provide a multimedia management method and device, intended to solve the problem that multimedia management means in the prior art are inflexible and limited, which results in a poor user experience.
To solve the above technical problem, an embodiment of the present invention provides a multimedia management method, comprising:
receiving voice information from a user;
extracting the user's characteristic information and command information from the voice information;
managing the corresponding multimedia file according to the characteristic information and the command information.
In addition, an embodiment of the present invention provides a multimedia management device, comprising:
a voice input module, configured to receive voice information from a user;
a speech recognition module, configured to extract the user's characteristic information and command information from the voice information;
a command processing module, configured to manage the corresponding multimedia file according to the characteristic information and the command information.
The beneficial effects of the present invention are as follows:
The present invention provides a multimedia management method and device. Voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the multimedia file is managed according to the characteristic information and the command information. Because the user's voice information is incorporated into multimedia file management, the user can flexibly customize how multimedia files are managed, the user's varied management needs are met, and the user experience is improved.
Brief description of the drawings
Fig. 1 is a flowchart of a multimedia management method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 4 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 5 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 6 is a schematic diagram of the composition of a multimedia management device provided by the third embodiment of the present invention.
Detailed description of the embodiments
The design concept of the present invention is to add the user's audio information to traditional multimedia management means so that management becomes customizable, thereby increasing the flexibility and freedom of user management, with high reliability and a good user experience.
Specific embodiments of the present invention are further explained below with reference to the accompanying drawings.
First embodiment
Referring to Fig. 1, Fig. 1 is a flowchart of a multimedia management method provided by the first embodiment of the present invention, comprising:
S101, receive the user's voice information;
S102, extract the user's characteristic information and command information from the voice information;
S103, manage the corresponding multimedia file according to the characteristic information and the command information.
Multimedia files, including audio files and video files, share one feature: they are all timeline-based files, that is, every multimedia file has a time attribute. For a given audio or video file, a given time point always corresponds to a fixed piece of playback content. An audio file contains only audio content, while a video file may contain both audio content and video content. In addition, because audio has a characteristic waveform, a given time point also corresponds to a determined audio curve. In other words, a specified position in a multimedia file can be located directly through the time attribute.
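As a concrete illustration of locating a position through the time attribute, the sketch below converts a time point into a byte offset for uncompressed PCM audio. The function name and the default parameters (44.1 kHz, stereo, 16-bit samples) are assumptions made for the example; the patent does not prescribe any file format, and compressed formats would need a container index rather than this arithmetic.

```python
def pcm_byte_offset(seconds: float,
                    sample_rate: int = 44100,
                    channels: int = 2,
                    bytes_per_sample: int = 2) -> int:
    """Map a time point to a byte offset in raw PCM audio data.

    Assumes uncompressed, interleaved PCM (an assumption of this sketch).
    """
    # One frame holds one sample per channel.
    frame_index = int(seconds * sample_rate)
    return frame_index * channels * bytes_per_sample


# One second of 44.1 kHz stereo 16-bit audio occupies 176400 bytes.
print(pcm_byte_offset(1.0))
```

The same idea applies to video, where the time point would typically be mapped to the nearest frame or keyframe instead of a byte offset.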
In S101, the user's voice information is received. The voice information may be produced by the user directly, for example by speaking, or produced by the user through another device, for example an electronic translator, or it may be a recording of the user's voice made by another recording device. The voice information reflects the control operation the user wants to perform on the multimedia file, including conventional control logic such as play and pause.
In S102, the user's characteristic information and the command information for managing the multimedia file are extracted from the voice information. The characteristic information serves as identification information for a user: voice information belonging to the same user can be identified by it, and voice information from different users can likewise be told apart by it.
In this embodiment, the characteristic information may include the user's voiceprint information, where the voiceprint information specifically includes the sound-wave spectrum that carries the speech information. A voiceprint is not only specific to the speaker but also relatively stable; after adulthood in particular, a person's voice can remain relatively unchanged over a long period. Experiments show that whether a speaker deliberately imitates another person's voice and tone or whispers softly, and however lifelike the imitation, the voiceprints still differ. Based on this property, voice information that belongs to a given user can be accurately grouped and distinguished from voice information that does not.
In this embodiment, the command information for managing the multimedia file refers to the specific operation on the multimedia file that the user's voice information expresses; this operation logic is conveyed to the system in the form of voice information. Owing to the nature of voice information, differences between individuals in voiceprint, ethnicity, and language all lead to differences in how the command information in the voice information is expressed. For example, when the user is Chinese, the voice information may be in Mandarin or a local dialect, or may even be a sentence mixed with some English; when the user is French, the voice information will generally be in French. For this reason, in this embodiment the command information can be extracted by multiple means suited to different languages, and the characteristic information in the voice information can be used to select among the different parsing modes.
Specifically, in this embodiment, extracting the user's characteristic information and command information from the voice information may include: constructing the user's voiceprint information from the voice information, the characteristic information including the voiceprint information; and performing speech recognition and natural-language semantic parsing on the voice information, and determining the command information from the parsing result. Natural-language semantic parsing means parsing the meaning expressed by the voice information; the parsing result differs according to the language.
In this embodiment, the command information may include at least one of: play, pause, stop, jump, create directory, open directory, add label, and so on. Play, pause, stop, and the like are conventional multimedia control commands; in this embodiment they can be parsed from the voice information to carry out the corresponding control. For example, in Chinese the voice information for play may be the Chinese equivalent of "play xxx", while in English it may be "play xxx". Besides the play command itself, the voice information may also contain the object to be played, such as the file name of the multimedia file or part of the file name, from which the corresponding multimedia file can be opened and played directly.
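The mapping from recognized speech to a command and its object can be sketched as a simple keyword lookup. This is only an illustration working on text that a speech recognizer is assumed to have already produced; the keyword table (including the Chinese keyword 播放 for play) and the function name `parse_command` are hypothetical and not taken from the patent, which leaves the parsing method open.

```python
from typing import Optional, Tuple

# Hypothetical keyword table: spoken prefix -> canonical command name.
KEYWORDS = {
    "play": "play",
    "播放": "play",
    "pause": "pause",
    "stop": "stop",
    "jump to": "jump",
    "add label": "add_label",
    "create directory": "create_directory",
    "open directory": "open_directory",
}


def parse_command(transcript: str) -> Optional[Tuple[str, str]]:
    """Return (command, argument), or None when no keyword matches."""
    text = transcript.strip()
    lowered = text.lower()
    # Try longer keywords first so a short prefix never shadows a longer one.
    for keyword in sorted(KEYWORDS, key=len, reverse=True):
        if lowered.startswith(keyword):
            argument = text[len(keyword):].strip()
            return KEYWORDS[keyword], argument
    return None


print(parse_command("play holiday video"))
print(parse_command("播放 春节"))
```

A real system would replace the prefix match with the natural-language parsing the text describes; the sketch only shows that the command and the object to be played are separated from one utterance.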
In this embodiment, managing the corresponding multimedia file according to the characteristic information and the command information may include: judging from the user's voiceprint information whether the user is an existing user; if so, managing the multimedia file according to the command information; if not, saving the voiceprint information, creating a directory for the user based on the voiceprint information, and managing the multimedia file according to the command information. The voiceprint information extracted from the voice information is compared with the stored voiceprint information of existing users, and the comparison result determines whether it belongs to an existing user. If it does, the voice information can be parsed into command information using that existing user's parsing mode, and the corresponding processing is carried out according to the command information. If it does not, no user in the existing user information corresponds to this voice information, and the user is a new user. If the new user's information is to be saved in the system, the voiceprint information is saved first, the new user's directory is then created based on that voiceprint information, and finally the corresponding processing is carried out according to the command information. If the new user's information does not need to be saved in the system, the command information can be parsed directly and the corresponding operation carried out.
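The existing-user check and the creation of a new user's directory can be sketched as follows. The patent does not say how voiceprints are compared, so this sketch assumes each voiceprint has already been reduced to a fixed-length numeric vector and compares vectors by cosine similarity against an assumed threshold of 0.9; the class `UserStore`, its methods, and the generated user ids are all hypothetical.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class UserStore:
    """Hypothetical in-memory store of voiceprints and per-user label directories."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.voiceprints: Dict[str, List[float]] = {}
        self.directories: Dict[str, list] = {}

    def find(self, voiceprint: List[float]) -> Optional[str]:
        """Return the best-matching existing user, or None if below threshold."""
        best_id, best_score = None, self.threshold
        for user_id, stored in self.voiceprints.items():
            score = cosine(voiceprint, stored)
            if score >= best_score:
                best_id, best_score = user_id, score
        return best_id

    def find_or_register(self, voiceprint: List[float]) -> str:
        """Existing user: return the id. New user: save voiceprint, create directory."""
        user_id = self.find(voiceprint)
        if user_id is None:
            user_id = f"user{len(self.voiceprints) + 1}"
            self.voiceprints[user_id] = list(voiceprint)
            self.directories[user_id] = []
        return user_id
```

In this sketch the "directory" is just an empty list that later holds the user's voice labels; a real device might use a folder or a database table instead.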
In this embodiment, when the command information includes add label, managing the corresponding multimedia file according to the characteristic information and the command information may include: determining the time point at which the voice information was entered; subtracting a preset compensation time from that time point to obtain the label time point; and creating a voice label at the label time point according to the specific content of the add-label command. A voice label is a means of marking a multimedia file: at the position of the voice label, that is, at a certain time point on the file's time attribute, label content that is preset or customized by the user makes it easy to locate that position quickly when watching or listening. A voice label is usually added while the user is watching or listening to the multimedia file. For example, while watching a video, the user reaches a position that the user thinks should be labeled and utters the voice information for adding a label; the time point at that moment is the entry time point of the voice information. This entry time point is in fact already past the point the user wanted to label, because the user necessarily sees the video content first and only then adds the voice label. The time point of the actual voice label should therefore be the entry time point minus the preset compensation time. The specific compensation time can depend on the user's viewing habits: different users can have different compensation times, and the same user can have different compensation times for different video files. Note that the compensation time is meant to help the user mark the desired position in the multimedia file, and the accuracy requirement can be flexible; that is, the label time point only needs to lie within a certain range of the position the user actually wants to label. When the user later jumps to the label time point and finds the position slightly off, the user can adjust the progress bar by voice information or manually, and can also modify the voice label as needed; the modification may cover the label time point and the label content of the voice label.
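The label time point computation described above is a single subtraction, sketched below. The `VoiceLabel` structure, the 3-second default compensation, and the clamping at zero (so that a label requested in the first seconds of a file does not get a negative time) are assumptions of this sketch rather than details given in the patent.

```python
from dataclasses import dataclass


@dataclass
class VoiceLabel:
    media_file: str    # file the label belongs to
    content: str       # label content spoken by the user
    time_point: float  # label time point, in seconds from the start


def make_label(media_file: str, content: str, entry_time: float,
               compensation: float = 3.0) -> VoiceLabel:
    """Create a voice label at (entry time - compensation time)."""
    # Clamp at 0.0: an assumption, so early labels stay inside the file.
    label_time = max(0.0, entry_time - compensation)
    return VoiceLabel(media_file, content, label_time)


label = make_label("movie.mp4", "goal scene", entry_time=125.0)
print(label.time_point)  # 122.0
```

Since the text allows the label time point to be approximate and later corrected, the dataclass is left mutable so that `time_point` and `content` can be modified afterwards.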
In this embodiment, when the command information includes jump, managing the corresponding multimedia file according to the characteristic information and the command information may include: matching the specific content of the jump command against the stored voice labels, and, when the matching degree reaches a preset threshold, jumping playback of the multimedia file to the label time point of the matched voice label. Once voice labels have been set, jump operations can be performed based on them; a jump moves the playback progress of the multimedia file directly to the desired label time point. Jumping covers two cases, according to whether the multimedia file is already open. If the multimedia file is open and playing, the playback progress jumps directly to the label time point of the voice label. If the multimedia file is not open, it is opened first, and the playback progress then jumps to the label time point of the voice label named in the voice information and playback starts from there. As for matching the specific content of the jump command against the stored voice labels, for voice information from the same user the two voice recordings can be matched directly, and the jump command is triggered when the matching degree exceeds the set threshold. The voice information contains at least the jump command and the label content of the voice label; it may also contain the file name of the multimedia file, which makes it possible to control a multimedia file that is not open.
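Threshold-based matching of a jump request against stored voice labels can be sketched as below. The patent describes matching the user's voice recordings directly; this sketch swaps in text similarity from Python's standard `difflib.SequenceMatcher` as a stand-in, and the function name `best_label` and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Label = Tuple[str, float]  # (label content, label time point in seconds)


def best_label(query: str, labels: List[Label],
               threshold: float = 0.8) -> Optional[Label]:
    """Return the stored label most similar to the query, if similar enough."""
    best, best_score = None, 0.0
    for content, time_point in labels:
        score = SequenceMatcher(None, query.lower(), content.lower()).ratio()
        if score > best_score:
            best, best_score = (content, time_point), score
    # Only trigger the jump when the matching degree reaches the threshold.
    return best if best_score >= threshold else None


labels = [("goal scene", 122.0), ("ending song", 5400.0)]
print(best_label("goal scenes", labels))
print(best_label("weather report", labels))
```

When a label is returned, the player would seek to its time point, opening the file first if it is not already open; when `None` is returned, the user would be told that no matching label exists.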
In this embodiment, the create-directory command refers to the command information for creating a user's directory, typically when a new user is added: when voice information with a new voiceprint appears and that voice information concerns creating a directory, the steps of saving the voiceprint information and creating the user's directory are performed. The open-directory command presents the directory of the corresponding user's voice labels. The directory can be shown as text on a display panel, or played back as voice information, either automatically or under user control; for example, a player icon is shown on the display panel and the user clicks it to start playback, or the user controls playback through voice information.
This embodiment provides a multimedia management method: voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. Because the user's voice information is incorporated into multimedia file management, the user can flexibly customize how multimedia files are managed, the user's varied management needs are met, and the user experience is improved.
Second embodiment
Referring to Fig. 2, Fig. 2 is a flowchart of a multimedia management method provided by the second embodiment of the present invention, comprising:
S201, obtain the voice information entered by the user;
S202, extract the characteristic information from the voice information;
S203, perform speech recognition and extract the command information;
S204, manage the corresponding multimedia file according to the command information;
S205, save the characteristic information from the entered voice information.
In S201, the voice information entered by the user is obtained. The voice information may be produced by the user directly, for example by speaking, or produced by the user through another device, for example an electronic translator, or it may be a recording of the user's voice made by another recording device.
In S202, the user's characteristic information is extracted from the voice information. The characteristic information serves as identification information for a user: voice information belonging to the same user can be identified by it, and voice information from different users can likewise be told apart by it. The characteristic information may include the user's voiceprint information, where the voiceprint information specifically includes the sound-wave spectrum that carries the speech information. A voiceprint is not only specific to the speaker but also relatively stable; after adulthood in particular, a person's voice can remain relatively unchanged over a long period. Experiments show that whether a speaker deliberately imitates another person's voice and tone or whispers softly, and however lifelike the imitation, the voiceprints still differ. Based on this property, voice information that belongs to a given user can be accurately grouped and distinguished from voice information that does not.
In S203, speech recognition is performed and the command information is extracted. The command information refers to the specific operation on the multimedia file that the user's voice information expresses; this operation logic is conveyed to the system in the form of voice information. Owing to the nature of voice information, differences between individuals in voiceprint, ethnicity, and language all lead to differences in how the command information in the voice information is expressed. For example, when the user is Chinese, the voice information may be in Mandarin or a local dialect, or may even be a sentence mixed with some English; when the user is French, the voice information will generally be in French. For this reason, in this embodiment the command information can be extracted by multiple means suited to different languages, and the characteristic information in the voice information can be used to select among the different parsing modes.
In this embodiment, the command information may include at least one of: play, pause, stop, jump, create directory, open directory, add label, and so on. Play, pause, stop, and the like are conventional multimedia control commands; in this embodiment they can be parsed from the voice information to carry out the corresponding control. For example, in Chinese the voice information for play may be the Chinese equivalent of "play xxx", while in English it may be "play xxx". Besides the play command itself, the voice information may also contain the object to be played, such as the file name of the multimedia file or part of the file name, from which the corresponding multimedia file can be opened and played directly.
Referring to Fig. 3, when the command information includes play, processing the multimedia file according to the command information may include:
S301, judge whether the command information is a play command; if so, go to S302;
S302, judge whether the command information corresponds to an existing user; if so, go to S303; if not, go to S304;
S303, open and play the corresponding multimedia file;
S304, prompt the user that there is no related voice label, and play the corresponding multimedia file from the beginning.
Referring to Fig. 4, when the command information includes add label, processing the multimedia file according to the command information may include:
S401, judge whether the command information is an add-label command;
S402, obtain the time point at which the voice information was entered;
S403, calculate the label time point of the voice label;
S404, generate the voice label;
S405, add the voice label to the corresponding user directory;
S406, feed back a prompt that the voice label was added successfully.
A voice label is a means of marking a multimedia file: at the position of the voice label, that is, at a certain time point on the file's time attribute, label content that is preset or customized by the user makes it easy to locate that position quickly when watching or listening.
In S402, a voice label is usually added while the user is watching or listening to the multimedia file. For example, while watching a video, the user reaches a position that the user thinks should be labeled and utters the voice information for adding a label; the time point at that moment is the entry time point of the voice information.
In S403, the entry time point of the voice information is in fact already past the point the user wanted to label, because the user necessarily sees the video content first and only then adds the voice label. The time point of the actual voice label should therefore be the entry time point minus the preset compensation time. The specific compensation time can depend on the user's viewing habits: different users can have different compensation times, and the same user can have different compensation times for different video files.
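One way to hold compensation times that vary by user and by file is a small lookup with fallbacks, sketched below. The class `CompensationTable`, its methods, and the 3-second default are hypothetical; the patent only states that such per-user and per-file differences may exist, not how they are stored.

```python
from typing import Dict, Optional, Tuple

DEFAULT_COMPENSATION = 3.0  # seconds; an assumed fallback value


class CompensationTable:
    """Hypothetical lookup of compensation times per user and per file."""

    def __init__(self) -> None:
        self._by_user_file: Dict[Tuple[str, str], float] = {}
        self._by_user: Dict[str, float] = {}

    def set(self, user: str, seconds: float,
            media_file: Optional[str] = None) -> None:
        """Store a user-wide value, or a file-specific one if media_file is given."""
        if media_file is None:
            self._by_user[user] = seconds
        else:
            self._by_user_file[(user, media_file)] = seconds

    def get(self, user: str, media_file: str) -> float:
        """Most specific setting wins: (user, file), then user, then default."""
        if (user, media_file) in self._by_user_file:
            return self._by_user_file[(user, media_file)]
        return self._by_user.get(user, DEFAULT_COMPENSATION)


table = CompensationTable()
table.set("alice", 5.0)
table.set("alice", 10.0, media_file="lecture.mp4")
print(table.get("alice", "lecture.mp4"))  # 10.0
```

The value returned by `get` would be the compensation time subtracted from the entry time point in S403.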
Referring to Fig. 5, when the command information includes jump, processing the multimedia file according to the command information may include:
S501, judge whether the command information is a jump command; if so, go to S502;
S502, judge whether the label directory is to be viewed; if so, go to S503; if not, go to S505;
S503, judge whether a corresponding label directory exists; if so, go to S504; if not, go to S507;
S504, display the user's label directory, and continue to obtain the voice information entered by the user;
S505, judge whether there is a user and label content matching the command information; if so, go to S506; if not, go to S508;
S506, jump the multimedia file to the label time point of the voice label and play it;
S507, prompt the user that there is no corresponding label directory, and end the process;
S508, prompt the user that there is no corresponding voice label, and end the process.
In S502, the label directory in question is the label directory of the current multimedia file. Whether it is viewed is determined by the content of the jump command: if the jump command specifies the content of a voice label, the label directory does not need to be displayed; if the jump command does not specify the content of a voice label, the label directory needs to be displayed.
In S505, judging whether there is a user and voice label content matching the jump command may include: matching the specific content of the jump command against the stored voice labels. When the matching degree reaches the preset threshold, a relevant voice label exists, and the jump is performed according to that voice label. Otherwise, no such label exists: the user may have misremembered the label content, or the voice information entered by the user may be wrong. The process ends after a prompt is issued to the user.
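The branching of S502 to S508 can be sketched as one function that returns the action to take. For brevity the sketch uses exact string equality where the text calls for threshold-based matching, and the function name `handle_jump`, the action names, and the data layout are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, float]  # (label content, label time point in seconds)


def handle_jump(user: str, label_content: Optional[str],
                directories: Dict[str, List[Label]]) -> Tuple[str, object]:
    """Return an (action, payload) pair for a jump command."""
    if label_content is None:
        # S502: no label content was named, so the directory must be shown.
        labels = directories.get(user)
        if not labels:
            return ("prompt", "no label directory")             # S503 -> S507
        return ("show_directory", [c for c, _ in labels])       # S504
    # S505: look for a label of this user with matching content.
    for content, time_point in directories.get(user, []):
        if content == label_content:
            return ("seek", time_point)                         # S506
    return ("prompt", "no matching voice label")                # S508


dirs = {"alice": [("goal scene", 122.0)]}
print(handle_jump("alice", "goal scene", dirs))
print(handle_jump("alice", None, dirs))
```

After `show_directory`, the device would wait for the next voice input, as S504 describes, and call the same function again once the user names a label.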
In this embodiment, the create-directory command refers to the command information for creating a user's directory, typically when a new user is added: when voice information with a new voiceprint appears and that voice information concerns creating a directory, the steps of saving the voiceprint information and creating the user's directory are performed. The open-directory command presents the directory of the corresponding user's voice labels. The directory can be shown as text on a display panel, or played back as voice information, either automatically or under user control; for example, a player icon is shown on the display panel and the user clicks it to start playback, or the user controls playback through voice information.
This embodiment provides a multimedia management method: voice information is received from a user, the user's characteristic information and the command information for managing the multimedia file are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. Because the user's voice information is incorporated into multimedia file management, the user can flexibly customize how multimedia files are managed, the user's varied management needs are met, and the user experience is improved.
Third embodiment
Referring to Fig. 6, Fig. 6 is a schematic diagram of the composition of a multimedia management device provided by the third embodiment of the present invention, comprising:
a voice input module 601, configured to receive the user's voice information;
a speech recognition module 602, configured to extract the user's characteristic information and command information from the voice information;
a command processing module 603, configured to manage the corresponding multimedia file according to the characteristic information and the command information.
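The three-module composition can be sketched as a small class whose modules are injected as callables, so that each one can be replaced independently. The class name, the field names, and the callable signatures are assumptions made for illustration; the patent describes the modules only by their function.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class MultimediaManagementDevice:
    # Module 601: returns raw voice information (modeled here as bytes).
    voice_input: Callable[[], bytes]
    # Module 602: returns (characteristic information, command information).
    speech_recognition: Callable[[bytes], Tuple[str, str]]
    # Module 603: manages the multimedia file and reports the result.
    command_processing: Callable[[str, str], str]

    def run_once(self) -> str:
        """Pass one piece of voice information through the three modules."""
        voice = self.voice_input()
        characteristic, command = self.speech_recognition(voice)
        return self.command_processing(characteristic, command)


# Stub modules standing in for real audio capture and recognition.
device = MultimediaManagementDevice(
    voice_input=lambda: b"alice|pause",
    speech_recognition=lambda raw: tuple(raw.decode().split("|", 1)),
    command_processing=lambda who, cmd: f"{who}:{cmd}",
)
print(device.run_once())  # alice:pause
```

The stubs only demonstrate the data flow between modules 601, 602, and 603; real modules would wrap a microphone, a speaker-identification and speech-recognition engine, and a media player.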
Multimedia file has a feature, is all based on the file of timeline, i.e., when multimedia file all has one Between attribute.For the audio determining for one or video file, at the time point that some is determined, always fixation corresponds to One determining broadcasting content, then only includes audio content for audio file, then may include in audio for video file Appearance and video content.In addition, it is special, due to the curve characteristic of audio, equally may correspond at determining time point Determining audio curve.It in other words, can be at time attribute be passed through, to be directly targeted to the position specified in multimedia file.
In this embodiment, the voice input module 601 is configured to receive the voice information of the user. The voice information may be uttered by the user directly, for example by speaking, or produced by the user with the help of other equipment, such as an electronic translator, or it may be the user's voice as recorded by another recording device. The voice information reflects the control that the user wants to exercise over the multimedia file, including conventional control logic such as play and pause.
In this embodiment, the speech recognition module 602 is configured to extract, from the voice information, the characteristic information of the user and the instruction information for managing the multimedia file. The characteristic information of the user can be extracted from the voice information and serves as the identification information of that user: voice information belonging to the same user can be determined from the characteristic information, and, correspondingly, the voice information of different users can be distinguished by it.
In this embodiment, the characteristic information may include the voiceprint information of the user, where the voiceprint information specifically comprises the sound wave spectrum that carries the spoken information. A voiceprint is not only specific to the individual but also relatively stable; particularly after adulthood, a person's voice can remain relatively unchanged for a long time. Experiments have shown that whether a speaker deliberately imitates another person's voice and tone or speaks in a whisper, the voiceprint remains different even when the imitation is remarkably lifelike. Based on this property of the voiceprint, voice information that belongs to the same user can be accurately identified and distinguished from voice information that does not.
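As a rough illustration of distinguishing speakers by voiceprint, the sketch below compares a voiceprint against enrolled ones. The application does not say how voiceprints are compared; representing a voiceprint as a plain feature vector, using cosine similarity, and the threshold value of 0.9 are all assumptions made for this example.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    voiceprint: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed value, not taken from the application
) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody is close enough."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, reference in enrolled.items():
        score = cosine_similarity(voiceprint, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user
```

A `None` result corresponds to voice information that does not belong to any known user.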
In this embodiment, the instruction information for managing the multimedia file refers to the specific operation on the multimedia file that is expressed by the voice information the user sends; this operation logic is conveyed to the system in the form of voice information. Because of the particular nature of voice information, differences between individuals in voiceprint, ethnicity and language all lead to differences in the form in which the instruction information is expressed within the voice information. For example, when the user is Chinese, the voice information may be in Mandarin or in a local dialect, or may even be mixed with some English; when the user is French, the voice information is generally in French. With this in mind, in this embodiment the instruction information for managing the multimedia file may be extracted by multiple means suited to different languages, and the characteristic information in the voice information may be used to select among the different parsing modes.
Specifically, the speech recognition module 602 may further be configured to construct the voiceprint information of the user from the voice information, the characteristic information including the voiceprint information. It may also be configured to perform speech recognition and natural semantic parsing on the voice information and to determine the instruction information from the parsing result. Natural semantic parsing means parsing the meaning expressed by the voice information; the parsing result differs according to the language.
In this embodiment, the specific instruction information may include at least one of: play, pause, stop, jump, create a directory, open a directory, add a label, and so on. Play, pause, stop and the like are conventional control instructions of existing multimedia; in this embodiment, these instructions can be parsed from the voice information to carry out the corresponding control. For example, the voice information corresponding to play may be the Chinese for "play xxx" in Chinese and "play xxx" in English. Besides the play instruction itself, the voice information may also contain the playback object, such as the file name of the multimedia file or part of the file name; based on this content, the corresponding multimedia file can be opened and played directly.
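Once speech recognition has produced text, determining the instruction and its object can be sketched as a keyword lookup. The keyword table below, including the Chinese phrases, is a hypothetical example; the application does not list the actual phrases or describe the parsing algorithm, and real natural semantic parsing would be considerably more involved.

```python
from typing import NamedTuple, Optional

# Hypothetical keyword table mapping spoken prefixes to instruction names.
KEYWORDS = {
    "play": "play", "播放": "play",
    "pause": "pause", "暂停": "pause",
    "stop": "stop", "停止": "stop",
    "jump to": "jump", "跳转到": "jump",
    "add label": "add_label", "添加标签": "add_label",
}


class Instruction(NamedTuple):
    action: str    # normalized instruction name
    argument: str  # remaining text, e.g. a file name or label content


def parse_instruction(text: str) -> Optional[Instruction]:
    """Match the recognized text against known prefixes, longest first."""
    normalized = text.strip()
    lowered = normalized.lower()
    for keyword in sorted(KEYWORDS, key=len, reverse=True):
        if lowered.startswith(keyword):
            argument = normalized[len(keyword):].strip()
            return Instruction(KEYWORDS[keyword], argument)
    return None
```

The argument returned alongside the action is what would carry the playback object, such as a file name or part of one.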
In this embodiment, the instruction processing module 603 may further be configured to: judge, from the voiceprint information of the user, whether the user is an existing user; if so, manage the multimedia file according to the instruction information; if not, save the voiceprint information, create the directory corresponding to the user based on the voiceprint information, and manage the multimedia file according to the instruction information. The voiceprint information extracted from the voice information is compared with the stored voiceprint information of existing users, and the comparison result determines whether it is the voiceprint information of an existing user. If it is, the voice information can be parsed into instruction information using the parsing mode of that existing user, and the corresponding processing is performed according to the instruction information. If it is not, the existing user information contains no user corresponding to this voice information, and the user is a new user. If the new user's information is to be saved in the system, the voiceprint information is saved first, the directory corresponding to the new user is then created based on the voiceprint information, and finally the corresponding processing is performed according to the instruction information. If the new user's information does not need to be saved in the system, the instruction information can be parsed directly and the corresponding operation carried out according to it.
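The existing-user check and the creation of a per-user directory can be sketched as follows. The `UserRegistry` class, the `user_N` naming scheme, the in-memory voiceprint store and the caller-supplied `same_speaker` comparison are all assumptions for illustration; the application leaves these details open.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Optional, Tuple

Voiceprint = List[float]


class UserRegistry:
    """Keeps stored voiceprints and one directory per known user."""

    def __init__(self, root: Path,
                 same_speaker: Callable[[Voiceprint, Voiceprint], bool]) -> None:
        self.root = root
        self.same_speaker = same_speaker  # hypothetical comparison function
        self.voiceprints: Dict[str, Voiceprint] = {}

    def find_user(self, voiceprint: Voiceprint) -> Optional[str]:
        for user_id, stored in self.voiceprints.items():
            if self.same_speaker(voiceprint, stored):
                return user_id
        return None

    def resolve(self, voiceprint: Voiceprint) -> Tuple[str, bool]:
        """Return (user_id, created). New users get a saved voiceprint and a directory."""
        user_id = self.find_user(voiceprint)
        if user_id is not None:
            return user_id, False
        user_id = f"user_{len(self.voiceprints) + 1}"
        self.voiceprints[user_id] = list(voiceprint)            # save the voiceprint
        (self.root / user_id).mkdir(parents=True, exist_ok=True)  # create the directory
        return user_id, True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        registry = UserRegistry(Path(tmp), lambda a, b: a == b)
        print(registry.resolve([0.1, 0.2]))  # new user, directory created
        print(registry.resolve([0.1, 0.2]))  # same user found again
```

Either way, the instruction information would then be processed for the resolved user.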
In this embodiment, when the instruction information includes adding a label, the instruction processing module 603 may further be configured to: determine the time point at which the voice information was entered; subtract a preset compensation time from that time point to obtain the label time point; and create a voice label at the label time point according to the specific content associated with the add-label instruction in the instruction information. A voice label is a means of marking a multimedia file. The position of a voice label corresponds to a time point in the time attribute of the multimedia file, and its content is either preset or customized by the user, so that the user can quickly locate that position when watching or listening. Voice labels are usually added while the user is watching or listening to the multimedia file. For example, while watching a video, the user reaches a position that they think should be labelled and therefore issues the voice information for adding a label; the time point at that moment is the time point at which the voice information is entered. In fact, however, this time point is already past the time point that the user wanted to label, because the user necessarily sees the video content first and adds the voice label afterwards. The time point at which the voice label really belongs is therefore the entry time point minus the preset compensation time, and the result is the label time point of the voice label. The specific compensation time may depend on the user's viewing habits: different users may have different compensation times, and the same user may have different compensation times when watching different video files. It is worth noting that the compensation time is intended to help the user mark the multimedia position that needs marking, so the accuracy requirement can be flexible; that is, the label time point only needs to fall within a certain range of the position the user really wants to label. When the user next wants to watch and jumps to the label time point, if the position deviates, the user can adjust the progress bar by voice information or manually, and can modify the voice label as needed. The modification may include modifying the label time point of the voice label and modifying the label content of the voice label.
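The label time calculation described above amounts to one subtraction. A minimal sketch follows; the `VoiceLabel` type, the default compensation of 3 seconds, and the clamping at zero (so that a label issued very early in the file does not get a negative time) are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceLabel:
    content: str   # label content, preset or user-defined
    time_s: float  # label time point in seconds from the start of the file


def label_time(entry_time_s: float, compensation_s: float) -> float:
    """Label time point = entry time point minus compensation time.

    Clamping at 0.0 is an assumption, not something the application states.
    """
    return max(0.0, entry_time_s - compensation_s)


def add_label(labels: List[VoiceLabel], content: str,
              entry_time_s: float, compensation_s: float = 3.0) -> VoiceLabel:
    """Create a voice label at the compensated time and store it."""
    label = VoiceLabel(content, label_time(entry_time_s, compensation_s))
    labels.append(label)
    return label
```

For instance, a label spoken at 125 s with a 5 s compensation time lands at 120 s. Because the compensation time may differ per user and per file, it is passed in as a parameter rather than fixed.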
In this embodiment, when the instruction information includes jumping, the instruction processing module 603 may further be configured to: match the specific content associated with the jump instruction in the instruction information against the stored voice labels, and, when the matching degree reaches a preset threshold, jump the playback progress of the multimedia file to the label time point corresponding to the matched voice label. Once voice labels have been set, jump operations can be performed on the basis of them; a jump moves the playback progress of the multimedia file directly to the required label time point. A multimedia file is in one of two states, opened or not opened. If the multimedia file is already open, the playback progress jumps directly to the label time point corresponding to the voice label. If the multimedia file is not open, it is opened first, and the playback progress then jumps to the label time point of the voice label named in the voice information and playback starts from there. Specifically, in this embodiment, when matching the content of the jump instruction against the stored voice labels, the two pieces of voice can be matched directly if they come from the same user, and the jump instruction is triggered when the matching degree is greater than the set threshold. The content of the voice information includes at least the jump instruction and the label content of the voice label, and may also include the file name of the multimedia file so that a multimedia file that is not open can be controlled.
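The threshold-based jump can be sketched as below. Note the substitution: the text describes matching the user's voice against stored voice labels directly, whereas this sketch compares recognized text using `difflib.SequenceMatcher` from the Python standard library, purely because it is easy to demonstrate. The `Player` class and the threshold of 0.8 are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def find_jump_target(spoken: str, labels: Dict[str, float],
                     threshold: float = 0.8) -> Optional[float]:
    """Return the label time of the best match, or None if below the threshold."""
    best_time: Optional[float] = None
    best_score = 0.0
    for content, time_s in labels.items():
        score = SequenceMatcher(None, spoken.lower(), content.lower()).ratio()
        if score > best_score:
            best_time, best_score = time_s, score
    return best_time if best_score >= threshold else None


class Player:
    """Hypothetical player that tracks the open file and playback position."""

    def __init__(self) -> None:
        self.open_file: Optional[str] = None
        self.position_s = 0.0

    def jump(self, filename: str, time_s: float) -> None:
        if self.open_file != filename:
            self.open_file = filename  # open the file first if it is not open
        self.position_s = time_s       # then move the playback progress


def handle_jump(player: Player, filename: str, spoken: str,
                labels: Dict[str, float]) -> bool:
    target = find_jump_target(spoken, labels)
    if target is None:
        return False  # matching degree too low, no jump is triggered
    player.jump(filename, target)
    return True
```

Both cases from the text are covered by `Player.jump`: an already open file only changes position, while a file that is not open is opened before the position is set.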
In this embodiment, the instruction information may also be to create a directory or to open a directory. Creating a directory refers to creating the directory that corresponds to a user, which usually happens when a new user is added: when voice information with a new voiceprint is received and that voice information concerns creating a directory, the steps of saving the voiceprint information and creating the directory corresponding to that user are carried out. Opening a directory presents the directory of the voice labels of the corresponding user. The directory may be presented as text on a display panel, or played back as voice information, either automatically or under the user's control. For example, a player icon may be presented on the display panel, which the user clicks to start playback; alternatively, the user may control playback by voice information.
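Presenting a user's label directory as text might look like the following sketch. The data layout (file name mapped to a list of time and content pairs) and the `MM:SS` formatting are assumptions chosen for the example; the application only says that the directory can be shown as text on a display panel.

```python
from typing import Dict, List, Tuple


def format_label_directory(user: str,
                           labels: Dict[str, List[Tuple[float, str]]]) -> str:
    """Render one user's voice labels as indented text, grouped by file."""
    lines = [f"Labels for {user}:"]
    for filename in sorted(labels):
        lines.append(f"  {filename}")
        for time_s, content in sorted(labels[filename]):
            minutes, seconds = divmod(int(time_s), 60)
            lines.append(f"    {minutes:02d}:{seconds:02d}  {content}")
    return "\n".join(lines)
```

Labels are sorted by time within each file so that the listing follows the timeline of the multimedia file.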
This embodiment may further include a voice storage module 604, configured to store the characteristic information of the user extracted by the speech recognition module 602, namely the voiceprint information. This embodiment may also include a display module 605, configured to present the corresponding content to the user according to the instruction information extracted from the voice information, for example the playback of a video file, the presentation of a label directory, or the presentation of the text information of a voice label.
This embodiment provides a multimedia management device: the voice information of a user is received; the characteristic information of the user, and the instruction information for managing a multimedia file, are extracted from the voice information; and the corresponding multimedia file is managed according to the characteristic information and the instruction information. By implementing this embodiment, multimedia files are managed in combination with the user's voice information, which lets users flexibly customize multimedia file management, meets users' varied management needs, and improves the user experience.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage medium (ROM/RAM, magnetic disk, optical disc) and executed by a computing device. In some cases, the steps shown or described can be performed in an order different from the one given here, or the modules or steps can each be made into an individual integrated circuit module, or several of them can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The above is a further detailed description of the present invention in combination with specific embodiments, and the specific implementation of the present invention should not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions can be made without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (12)

1. A multimedia management method, comprising:
receiving the voice information of a user;
extracting the characteristic information of the user and the instruction information from the voice information;
managing the corresponding multimedia file according to the characteristic information and the instruction information.
2. The multimedia management method according to claim 1, characterized in that extracting the characteristic information of the user and the instruction information from the voice information comprises:
constructing the voiceprint information of the user according to the voice information, the characteristic information including the voiceprint information.
3. The multimedia management method according to claim 1, characterized in that extracting the characteristic information of the user and the instruction information from the voice information further comprises:
performing speech recognition and natural semantic parsing according to the voice information, and determining the instruction information according to the parsing result.
4. The multimedia management method according to claim 2, characterized in that managing the corresponding multimedia file according to the characteristic information and the instruction information comprises:
judging, according to the voiceprint information of the user, whether the user is an existing user;
if so, managing the multimedia file according to the instruction information;
if not, saving the voiceprint information, creating the directory corresponding to the user based on the voiceprint information, and managing the multimedia file according to the instruction information.
5. The multimedia management method according to any one of claims 1-4, characterized in that, when the instruction information includes adding a label, managing the corresponding multimedia file according to the characteristic information and the instruction information comprises:
determining the time point at which the voice information is entered;
subtracting a preset compensation time from the time point to obtain a label time point;
creating a voice label at the label time point according to the specific content corresponding to adding the label in the instruction information.
6. The multimedia management method according to any one of claims 1-4, characterized in that, when the instruction information includes jumping, managing the corresponding multimedia file according to the characteristic information and the instruction information comprises:
matching the specific content corresponding to the jump in the instruction information against the stored voice labels, and, when the matching degree reaches a preset threshold, jumping the playback progress of the multimedia file to the label time point corresponding to the voice label.
7. A multimedia management device, characterized by comprising:
a voice input module, configured to receive the voice information of a user;
a speech recognition module, configured to extract the characteristic information of the user and the instruction information from the voice information;
an instruction processing module, configured to manage the corresponding multimedia file according to the characteristic information and the instruction information.
8. The multimedia management device according to claim 7, characterized in that the speech recognition module is further configured to:
construct the voiceprint information of the user according to the voice information, the characteristic information including the voiceprint information.
9. The multimedia management device according to claim 7, characterized in that the speech recognition module is further configured to:
perform speech recognition and natural semantic parsing according to the voice information, and determine the instruction information according to the parsing result.
10. The multimedia management device according to claim 8, characterized in that the instruction processing module is further configured to:
judge, according to the voiceprint information of the user, whether the user is an existing user;
if so, manage the multimedia file according to the instruction information;
if not, save the voiceprint information, create the directory corresponding to the user based on the voiceprint information, and manage the multimedia file according to the instruction information.
11. The multimedia management device according to any one of claims 7-10, characterized in that, when the instruction information includes adding a label, the instruction processing module is further configured to:
determine the time point at which the voice information is entered;
subtract a preset compensation time from the time point to obtain a label time point;
create a voice label at the label time point according to the specific content corresponding to adding the label in the instruction information.
12. The multimedia management device according to any one of claims 7-10, characterized in that, when the instruction information includes jumping, the instruction processing module is further configured to:
match the specific content corresponding to the jump in the instruction information against the stored voice labels, and, when the matching degree reaches a preset threshold, jump the playback progress of the multimedia file to the label time point corresponding to the voice label.
CN201710428940.4A 2017-06-08 2017-06-08 A kind of multi-media management method and device Pending CN109033099A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710428940.4A CN109033099A (en) 2017-06-08 2017-06-08 A kind of multi-media management method and device
PCT/CN2018/090400 WO2018224032A1 (en) 2017-06-08 2018-06-08 Multimedia management method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710428940.4A CN109033099A (en) 2017-06-08 2017-06-08 A kind of multi-media management method and device

Publications (1)

Publication Number Publication Date
CN109033099A true CN109033099A (en) 2018-12-18

Family

ID=64566419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710428940.4A Pending CN109033099A (en) 2017-06-08 2017-06-08 A kind of multi-media management method and device

Country Status (2)

Country Link
CN (1) CN109033099A (en)
WO (1) WO2018224032A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168764A (en) * 2021-11-04 2022-03-11 海南视联通信技术有限公司 Multimedia data processing method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399737A (en) * 2013-07-18 2013-11-20 百度在线网络技术(北京)有限公司 Multimedia processing method and device based on voice data
US20140280773A1 (en) * 2013-03-15 2014-09-18 Michael Sharp Systems and methods for expedited delivery of media content
CN106372246A (en) * 2016-09-20 2017-02-01 深圳市同行者科技有限公司 Audio playing method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103226966A (en) * 2013-04-26 2013-07-31 广东欧珀移动通信有限公司 Method capable of quickly positioning playing progress and mobile terminal
CN105872619A (en) * 2015-12-15 2016-08-17 乐视网信息技术(北京)股份有限公司 Video playing record matching method and matching device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168764A (en) * 2021-11-04 2022-03-11 海南视联通信技术有限公司 Multimedia data processing method and device
CN114168764B (en) * 2021-11-04 2024-05-17 海南视联通信技术有限公司 Multimedia data processing method and device

Also Published As

Publication number Publication date
WO2018224032A1 (en) 2018-12-13

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181218