CN109033099A - A kind of multi-media management method and device - Google Patents
Multimedia management method and device
- Publication number
- CN109033099A (application CN201710428940.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- label
- voice messaging
- voice
- multimedia file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present invention provides a multimedia management method and device. Voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. By bringing the user's voice information into the management of multimedia files, the invention lets users customize that management flexibly, meets their varied management needs, and improves the user experience.
Description
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia management method and device.
Background art
Among existing multimedia control means, beyond basic operations such as play and pause, a newer approach for video playback is the playback label: a video is divided at time nodes, so that the user can watch the divided video by label and reach the desired content faster. However, this kind of label management is rigid and fixed. Users cannot manage labels according to their own preferences and actual needs, so it is difficult to meet the varied demands of different users.
Summary of the invention
Embodiments of the present invention provide a multimedia management method and device, intended to solve the problem in the prior art that multimedia management means are rigid and limited and give a poor user experience.
To solve the above technical problem, an embodiment of the present invention provides a multimedia management method, comprising:
receiving voice information from a user;
extracting the user's characteristic information and command information from the voice information;
managing the corresponding multimedia file according to the characteristic information and the command information.
In addition, an embodiment of the present invention provides a multimedia management device, comprising:
a voice input module, for receiving voice information from a user;
a speech recognition module, for extracting the user's characteristic information and command information from the voice information;
a command processing module, for managing the corresponding multimedia file according to the characteristic information and the command information.
The beneficial effects of the present invention are as follows:
The present invention provides a multimedia management method and device. Voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the multimedia file is managed according to the characteristic information and the command information. By bringing the user's voice information into the management of multimedia files, the invention lets users customize that management flexibly, meets their varied management needs, and improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of a multimedia management method provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 4 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 5 is a flowchart of a multimedia management method provided by the second embodiment of the present invention;
Fig. 6 is a schematic diagram of the composition of a multimedia management device provided by the third embodiment of the present invention.
Detailed description of the embodiments
The central idea of the present invention is to add the user's audio information to traditional multimedia management means so that management becomes customizable, which increases the flexibility and freedom of user management while offering high reliability and a good user experience.
Specific embodiments of the present invention are further described below with reference to the accompanying drawings.
First embodiment
Referring to Fig. 1, Fig. 1 is a flowchart of a multimedia management method provided by the first embodiment of the present invention, comprising:
S101: receive the user's voice information;
S102: extract the user's characteristic information and command information from the voice information;
S103: manage the corresponding multimedia file according to the characteristic information and the command information.
Multimedia files, including audio files and video files, share one feature: they are all timeline-based files, that is, every multimedia file has a time attribute. For a given audio or video file, a given time point always corresponds to fixed playback content; an audio file contains only audio content, whereas a video file may contain both audio and video content. In addition, because of the curve characteristics of audio, a given time point also corresponds to a definite audio curve. In other words, the time attribute can be used to locate a specified position in a multimedia file directly.
In S101, the user's voice information is received. The voice information may be produced by the user directly, for example by speaking, or produced with the help of other equipment, for example an electronic translator or the user's voice recorded by another recording device. The voice information reflects the control operation that the user wants to perform on the multimedia file, including conventional control logic such as play and pause.
In S102, the user's characteristic information and the command information for managing the multimedia file are extracted from the voice information. The characteristic information extracted from the voice information serves as identification information for a user: it can be used to determine which voice information belongs to the same user and, correspondingly, to distinguish the voice information of different users.
In this embodiment, the characteristic information may include the user's voiceprint information, where the voiceprint information is specifically the sound wave spectrum that carries the speech information. A voiceprint is not only specific to an individual but also relatively stable; after adulthood in particular, a person's voice stays relatively unchanged over a long period. Experiments show that whether a speaker deliberately imitates another person's voice and tone or speaks in a soft whisper, the voiceprint still differs, however lifelike the imitation. Based on this property of voiceprints, voice information that belongs to the same user can be accurately grouped together and distinguished from voice information that does not.
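The description does not say how two voiceprints are compared. As a minimal sketch, assume each voiceprint has already been reduced to a fixed-length numeric feature vector and that two vectors are treated as the same speaker when their cosine similarity reaches a threshold. The function names and the 0.8 threshold are illustrative assumptions, not part of the patent.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)

def same_speaker(print_a, print_b, threshold=0.8):
    """Hypothetical decision rule: same user if similarity >= threshold."""
    return cosine_similarity(print_a, print_b) >= threshold
```

In practice the feature extraction step (for example a spectral or embedding model) would dominate accuracy; the comparison above only illustrates the grouping and distinguishing step.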
In this embodiment, the command information for managing the multimedia file refers to the specific operation on the multimedia file that is expressed by the voice information the user sends; this operation logic is conveyed to the system in the form of voice information. Because of the particular nature of voice information, differences in voiceprint between individuals, differences in ethnicity, and differences in language all lead to differences in the form in which the command information is expressed in the voice information. For example, when the user is Chinese, the voice information may be in Mandarin or a local dialect, or may even be a sentence mixed with some English; when the user is French, the voice information is generally in French. For this reason, in this embodiment the command information for managing the multimedia file can be extracted by several means adapted to different languages, and the characteristic information in the voice information can be used to select among the parsing modes.
Specifically, in this embodiment, extracting the user's characteristic information and command information from the voice information may include: constructing the user's voiceprint information from the voice information, the characteristic information including the voiceprint information; and performing speech recognition and natural-language semantic parsing on the voice information and determining the command information from the parsing result. Natural-language semantic parsing means parsing the meaning expressed by the voice information; the parsing result differs from language to language.
In this embodiment, the specific command information may include at least one of: play, pause, stop, jump, create directory, open directory, add label, and so on. Play, pause, stop and the like are conventional control commands of existing multimedia; in this embodiment they can be parsed from the voice information to realize the corresponding control. For example, in Chinese the voice information corresponding to play may be the Chinese phrase for "play xxx", and in English it may be "play xxx". Besides the play command itself, the voice information may also contain the spoken name of the object to be played, such as the file name of the multimedia file or part of the file name; based on this content, the corresponding multimedia file can be opened and played directly.
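The parsing step can be sketched as a keyword lookup over text that a speech recognizer has already produced. This is only an illustration: the keyword table, the command names, and the assumption that the command comes first in the utterance are all hypothetical, and a real system would use proper natural-language parsing per language.

```python
# Hypothetical keyword table mapping spoken phrases to command names.
# The Chinese entries are illustrative translations of the English ones.
COMMAND_KEYWORDS = {
    "play": "play", "播放": "play",
    "pause": "pause", "暂停": "pause",
    "stop": "stop", "停止": "stop",
    "jump to": "jump", "跳转": "jump",
    "add label": "add_label", "添加标签": "add_label",
    "create directory": "create_directory",
    "open directory": "open_directory",
}

def parse_command(text):
    """Return (command, target) from recognized text, or (None, '')."""
    stripped = text.strip()
    lowered = stripped.lower()
    for keyword, command in COMMAND_KEYWORDS.items():
        if lowered.startswith(keyword):
            # Whatever follows the keyword is taken as the object,
            # e.g. a file name or label content.
            return command, stripped[len(keyword):].strip()
    return None, ""
```

For instance, `parse_command("Play holiday video")` yields the command `play` with the target `holiday video`, which could then be matched against file names.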
In this embodiment, managing the corresponding multimedia file according to the characteristic information and the command information may include: judging, according to the user's voiceprint information, whether the user is an existing user; if so, managing the multimedia file according to the command information; if not, saving the voiceprint information, creating a directory for the user based on the voiceprint information, and managing the multimedia file according to the command information. The voiceprint information extracted from the voice information is compared with the stored voiceprint information of existing users, and the comparison result determines whether it belongs to an existing user. If it does, the voice information can be parsed into command information using the parsing mode of that existing user, and the corresponding processing is performed according to the command information. If it does not, the existing user information contains no user corresponding to this voice information, and the user is a new user. If the new user's information is to be saved in the system, the voiceprint information is saved first, a directory for the new user is then created based on the voiceprint information, and finally the corresponding processing is performed according to the command information. If the new user's information does not need to be saved in the system, the voice information can be parsed directly to obtain the command information, and the corresponding operation is performed according to it.
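The existing-user check described above can be sketched as a lookup over stored voiceprints. The registry layout, the generated user IDs, and the `is_same_speaker` callback are assumptions made for illustration; the description does not prescribe any of them.

```python
def resolve_user(registry, voiceprint, is_same_speaker, save_new_user=True):
    """Return (user_id, is_new) for the speaker of this voice information.

    registry maps a user ID to {"voiceprint": ..., "labels": [...]}, where
    "labels" plays the role of the user's directory of voice labels.
    is_same_speaker(a, b) is a caller-supplied comparison function.
    """
    for user_id, record in registry.items():
        if is_same_speaker(record["voiceprint"], voiceprint):
            return user_id, False  # existing user: reuse the stored profile
    if not save_new_user:
        return None, True  # new user, but nothing is persisted
    user_id = f"user{len(registry) + 1}"  # hypothetical ID scheme
    # Save the voiceprint first, then create the user's directory.
    registry[user_id] = {"voiceprint": voiceprint, "labels": []}
    return user_id, True
```

In either branch the command information is then processed as usual; only the existence of a stored profile and directory differs.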
In this embodiment, when the command information includes adding a label, managing the corresponding multimedia file according to the characteristic information and the command information may include: determining the time point at which the voice information is entered; subtracting a preset compensation time from that time point to obtain the label time point; and creating a voice label at the label time point according to the specific content of the add-label command. A voice label is a means of marking a multimedia file: at the position of the voice label, that is, at a certain time point of the file's time attribute, preset or user-defined label content makes it convenient for the user to locate a position quickly while watching or listening. Voice labels are usually added while the user is watching or listening to the multimedia file. For example, while watching a video, the user reaches a position that the user thinks needs a label and issues the voice information for adding a label; the time point at this moment is the time point at which the voice information is entered. However, this time point is already later than the time point the user wanted to label, because the user necessarily sees the video content first and only then adds the voice label. The real label time point is therefore the entry time point minus the preset compensation time. The specific compensation time can depend on the user's viewing habits: different users may have different compensation times, and the same user may have different compensation times for different video files. Note that the compensation time is intended to help the user mark the desired position in the multimedia file, and the requirement on its accuracy can be flexible: the label time point only needs to fall within a certain range of the position the user really wanted to label. If the user later jumps to the label time point and the position is slightly off, the user can adjust the progress bar by voice information or manually, and can modify the voice label as needed; the modification may cover both the label time point and the label content of the voice label.
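The label time point calculation is simple arithmetic: entry time minus compensation time. The sketch below also covers later modification of a label. The `VoiceLabel` type, the 3-second default, and the clamping at zero (so that a label never lands before the start of the file) are assumptions added for illustration.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VoiceLabel:
    media_file: str
    time_point: float  # label time point, in seconds from the start
    content: str       # preset or user-defined label content

def create_voice_label(media_file, entry_time, content, compensation=3.0):
    """Label time point = entry time - compensation time (not below 0)."""
    label_time = max(0.0, entry_time - compensation)
    return VoiceLabel(media_file, label_time, content)

def modify_voice_label(label, time_point=None, content=None):
    """Return a copy with the time point and/or the content changed."""
    return replace(
        label,
        time_point=label.time_point if time_point is None else time_point,
        content=label.content if content is None else content,
    )
```

With a 5-second compensation, a command spoken at 125 s produces a label at 120 s; the user can afterwards shift it with `modify_voice_label` if the position is slightly off.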
In this embodiment, when the command information includes jumping, managing the corresponding multimedia file according to the characteristic information and the command information may include: matching the specific content of the jump command against the stored voice labels and, when the matching degree reaches a preset threshold, jumping the playback of the multimedia file to the label time point of the matched voice label. Jumping covers two cases: if the multimedia file is playing, the playback progress jumps directly to the label time point of the voice label; if the multimedia file is not open, the file is opened and the playback progress then jumps to the label time point. Once voice labels have been set, jump operations can be performed based on them; a jump moves the playback progress of the multimedia file directly to the required label time point. A multimedia file is in one of two states, open or not open. For an open file, playback jumps directly to the label time point of the corresponding voice label. For a file that is not open, the file must first be opened, and playback then jumps to the label time point of the voice label named in the voice information. As for matching the content of the jump command with the stored voice labels: for voice information from the same user, the two voice recordings can be matched directly, and the jump command is triggered when the matching degree exceeds the set threshold. The voice information contains at least the jump command and the label content of the voice label, and may also contain the file name of the multimedia file, which makes it possible to control a file that is not open.
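The description matches the spoken jump content against stored voice labels by comparing the voice recordings directly. As a simpler stand-in, the sketch below compares recognized text with Python's `difflib.SequenceMatcher`; this swaps in text similarity for audio matching, and the 0.6 threshold, the label dictionaries, and the player-state dictionary are illustrative assumptions.

```python
from difflib import SequenceMatcher

def find_label(spoken, labels, threshold=0.6):
    """Return the best-matching label, or None if below the threshold."""
    best, best_score = None, 0.0
    for label in labels:
        score = SequenceMatcher(
            None, spoken.lower(), label["content"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = label, score
    return best if best_score >= threshold else None

def jump(player_state, media_file, spoken, labels):
    """Jump playback to the matched label; open the file first if needed."""
    label = find_label(spoken, labels)
    if label is None:
        return False  # matching degree too low: no jump is triggered
    if player_state.get("open_file") != media_file:
        player_state["open_file"] = media_file  # file was not open yet
    player_state["position"] = label["time_point"]
    return True
```

Both cases from the text are covered: an already open file only has its position changed, while a file that is not open is opened before the position is set.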
In this embodiment, the create-directory command is the command for creating a user's directory and usually applies when a new user is added: when voice information with a new voiceprint appears and the voice information concerns creating a directory, the steps of saving the voiceprint information and creating the user's directory are carried out. The open-directory command presents the user's directory of voice labels. The directory can be presented as text on a display panel, played back as voice information, or played either automatically or under the user's control; for example, a player icon is presented on the display panel and the user clicks it to play, or the user controls playback by voice information.
This embodiment provides a multimedia management method: voice information is received from a user, the user's characteristic information and command information are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. By bringing the user's voice information into the management of multimedia files, this embodiment lets users customize that management flexibly, meets their varied management needs, and improves the user experience.
Second embodiment
Referring to Fig. 2, Fig. 2 is a flowchart of a multimedia management method provided by the second embodiment of the present invention, comprising:
S201: obtain the voice information entered by the user;
S202: extract the characteristic information from the voice information;
S203: at the same time, perform speech recognition and extract the command information from the voice information;
S204: manage the corresponding multimedia file according to the command information;
S205: save the characteristic information from the entered voice information.
In S201, the voice information entered by the user is obtained. The voice information may be produced by the user directly, for example by speaking, or produced with the help of other equipment, for example an electronic translator or the user's voice recorded by another recording device.
In S202, the user's characteristic information is extracted from the voice information. The characteristic information serves as identification information for a user: it can be used to determine which voice information belongs to the same user and, correspondingly, to distinguish the voice information of different users. The characteristic information may include the user's voiceprint information, where the voiceprint information is specifically the sound wave spectrum that carries the speech information. A voiceprint is not only specific to an individual but also relatively stable; after adulthood in particular, a person's voice stays relatively unchanged over a long period. Experiments show that whether a speaker deliberately imitates another person's voice and tone or speaks in a soft whisper, the voiceprint still differs, however lifelike the imitation. Based on this property of voiceprints, voice information that belongs to the same user can be accurately grouped together and distinguished from voice information that does not.
In S203, speech recognition is performed and the command information is extracted. The command information refers to the specific operation on the multimedia file that is expressed by the voice information the user sends; this operation logic is conveyed to the system in the form of voice information. Because of the particular nature of voice information, differences in voiceprint between individuals, differences in ethnicity, and differences in language all lead to differences in the form in which the command information is expressed in the voice information. For example, when the user is Chinese, the voice information may be in Mandarin or a local dialect, or may even be a sentence mixed with some English; when the user is French, the voice information is generally in French. For this reason, in this embodiment the command information for managing the multimedia file can be extracted by several means adapted to different languages, and the characteristic information in the voice information can be used to select among the parsing modes.
In this embodiment, the specific command information may include at least one of: play, pause, stop, jump, create directory, open directory, add label, and so on. Play, pause, stop and the like are conventional control commands of existing multimedia; in this embodiment they can be parsed from the voice information to realize the corresponding control. For example, in Chinese the voice information corresponding to play may be the Chinese phrase for "play xxx", and in English it may be "play xxx". Besides the play command itself, the voice information may also contain the spoken name of the object to be played, such as the file name of the multimedia file or part of the file name; based on this content, the corresponding multimedia file can be opened and played directly.
Referring to Fig. 3, when the command information includes play, processing the multimedia file according to the command information may include:
S301: judge whether the command information corresponds to play; if so, go to S302;
S302: judge whether the command information corresponds to an existing user; if so, go to S303; if not, go to S304;
S303: open and play the corresponding multimedia file;
S304: prompt the user that there is no related voice label, and play the corresponding multimedia file from the beginning.
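Steps S301 to S304 amount to a small decision function. The sketch below follows the listed steps literally; the returned dictionaries, the action names, and the prompt text are hypothetical, and the case where the command is not play is simply left to other handlers because the flow does not specify it.

```python
def handle_play(command, is_existing_user, target_file):
    """Sketch of the play flow in Fig. 3 (S301 to S304)."""
    if command != "play":  # S301: not a play command
        return None        # left to other handlers (not specified here)
    if is_existing_user:   # S302: command comes from an existing user
        # S303: open and play the corresponding multimedia file
        return {"action": "open_and_play", "file": target_file}
    # S304: prompt that no related voice label exists, play from the start
    return {
        "action": "play_from_start",
        "file": target_file,
        "prompt": "No related voice label found.",
    }
```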
Referring to Fig. 4, when the command information includes adding a label, processing the multimedia file according to the command information may include:
S401: judge whether the command information corresponds to adding a label;
S402: obtain the time point at which the voice information is entered;
S403: calculate the label time point of the voice label;
S404: generate the voice label;
S405: add the voice label to the corresponding user directory;
S406: feed back a prompt that the voice label was added successfully.
A voice label is a means of marking a multimedia file: at the position of the voice label, that is, at a certain time point of the file's time attribute, preset or user-defined label content makes it convenient for the user to locate a position quickly while watching or listening.
In S402, voice labels are usually added while the user is watching or listening to the multimedia file. For example, while watching a video, the user reaches a position that the user thinks needs a label and issues the voice information for adding a label; the time point at this moment is the time point at which the voice information is entered.
In S403, the time point at which the voice information is entered is already later than the time point the user wanted to label, because the user necessarily sees the video content first and only then adds the voice label. The real label time point is therefore the entry time point minus the preset compensation time. The specific compensation time can depend on the user's viewing habits: different users may have different compensation times, and the same user may have different compensation times for different video files.
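Steps S401 to S406, together with compensation times that vary by user and by file, can be sketched as follows. The lookup order (a per-file value, then a per-user default, then a global default), the table layout, and the message text are assumptions made for illustration only.

```python
DEFAULT_COMPENSATION = 3.0  # seconds; illustrative global fallback

def add_label_flow(command, user_id, media_file, entry_time, content,
                   directories, compensation_table):
    """Sketch of the add-label flow in Fig. 4 (S401 to S406)."""
    if command != "add_label":  # S401
        return None
    # S402: entry_time is the time point at which the voice was entered.
    # Compensation lookup: per user and file, then per user, then default.
    compensation = compensation_table.get(
        (user_id, media_file),
        compensation_table.get((user_id, None), DEFAULT_COMPENSATION),
    )
    label_time = max(0.0, entry_time - compensation)  # S403
    label = {"file": media_file, "time_point": label_time,
             "content": content}                       # S404
    directories.setdefault(user_id, []).append(label)  # S405
    return f"Voice label '{content}' added at {label_time:.1f}s"  # S406
```

Keying the table on `(user_id, media_file)` is one way to express that the same user may need a different compensation time for different files.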
Referring to Fig. 5, when the command information includes jumping, processing the multimedia file according to the command information may include:
S501: judge whether the command information corresponds to jumping; if so, go to S502;
S502: judge whether the label directory is to be viewed; if so, go to S503; if not, go to S505;
S503: judge whether a corresponding label directory exists; if so, go to S504; if not, go to S507;
S504: display the user's label directory, and continue to obtain the voice information entered by the user;
S505: judge whether there is a user and label content matching the command information; if so, go to S506; if not, go to S508;
S506: jump the multimedia file to the label time point of the voice label and play it;
S507: prompt the user that there is no corresponding label directory, and end the flow;
S508: prompt the user that there is no corresponding voice label, and end the flow.
In S502, judging whether the label directory is to be viewed refers to the label directory of the current multimedia file. Whether to view it is determined by the content of the jump command: if the jump command specifies the content of a voice label, the label directory does not need to be displayed; if the jump command does not specify the content of a voice label, the label directory needs to be displayed.
In S505, judging whether there is a user and voice label content matching the jump command may include: matching the specific content of the jump command against the stored voice labels. When the matching degree reaches the preset threshold, a relevant voice label exists, and the jump is performed according to that voice label. Otherwise, no such label exists, possibly because the user misremembered the label content or because the entered voice information was wrong; a prompt is issued to the user and the flow ends.
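The branching in S501 to S508 can be sketched as one function. Here a jump command with no label content stands for "view the label directory", which follows the explanation of S502; the tuple return values and the `match` callback are hypothetical conventions chosen for the sketch.

```python
def jump_flow(command, label_content, user_id, directories, match):
    """Sketch of the jump flow in Fig. 5 (S501 to S508).

    label_content is None when the jump command names no voice label.
    match(spoken, stored) decides whether two label contents match.
    """
    if command != "jump":  # S501
        return None
    labels = directories.get(user_id)
    if label_content is None:  # S502: directory must be viewed
        if labels:             # S503: a label directory exists
            # S504: show the directory, then wait for more voice input
            return ("show_directory", [l["content"] for l in labels])
        return ("prompt", "no label directory")  # S507
    for label in labels or []:  # S505: look for a matching label
        if match(label_content, label["content"]):
            return ("jump", label["time_point"])  # S506
    return ("prompt", "no matching voice label")  # S508
```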
In this embodiment, the create-directory command is the command for creating a user's directory and usually applies when a new user is added: when voice information with a new voiceprint appears and the voice information concerns creating a directory, the steps of saving the voiceprint information and creating the user's directory are carried out. The open-directory command presents the user's directory of voice labels. The directory can be presented as text on a display panel, played back as voice information, or played either automatically or under the user's control; for example, a player icon is presented on the display panel and the user clicks it to play, or the user controls playback by voice information.
This embodiment provides a multimedia management method: voice information is received from a user, the user's characteristic information and the command information for managing the multimedia file are extracted from the voice information, and the corresponding multimedia file is managed according to the characteristic information and the command information. By bringing the user's voice information into the management of multimedia files, this embodiment lets users customize that management flexibly, meets their varied management needs, and improves the user experience.
Third embodiment
Referring to Fig. 6, Fig. 6 is a schematic diagram of the composition of a multimedia management device provided by the third embodiment of the present invention, comprising:
a voice input module 601, for receiving the user's voice information;
a speech recognition module 602, for extracting the user's characteristic information and command information from the voice information;
a command processing module 603, for managing the corresponding multimedia file according to the characteristic information and the command information.
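The three modules can be sketched as small classes wired together in sequence. The class and method names, the injected extractor functions, and the handler table are all assumptions made for illustration; the description defines only the responsibilities of modules 601, 602 and 603.

```python
class VoiceInputModule:
    """Module 601: receives the user's voice information."""
    def receive(self, audio):
        return audio  # a real module would capture from a microphone

class SpeechRecognitionModule:
    """Module 602: extracts characteristic and command information."""
    def __init__(self, extract_features, extract_command):
        self._extract_features = extract_features
        self._extract_command = extract_command

    def recognize(self, voice):
        return self._extract_features(voice), self._extract_command(voice)

class CommandProcessingModule:
    """Module 603: manages the multimedia file for a given command."""
    def __init__(self, handlers):
        self._handlers = handlers  # maps command name to a handler

    def process(self, features, command):
        handler = self._handlers.get(command)
        if handler is None:
            return "unsupported command"
        return handler(features)

class MultimediaManagementDevice:
    """Chains the three modules in the order 601, 602, 603."""
    def __init__(self, voice_input, recognizer, processor):
        self.voice_input = voice_input
        self.recognizer = recognizer
        self.processor = processor

    def handle(self, audio):
        voice = self.voice_input.receive(audio)
        features, command = self.recognizer.recognize(voice)
        return self.processor.process(features, command)
```

Passing the extractors and handlers in from outside keeps the sketch independent of any particular speech recognition library.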
Multimedia files share one feature: they are all timeline-based files, that is, every multimedia file has a time attribute. For a given audio or video file, a given time point always corresponds to fixed playback content; an audio file contains only audio content, whereas a video file may contain both audio and video content. In addition, because of the curve characteristics of audio, a given time point also corresponds to a definite audio curve. In other words, the time attribute can be used to locate a specified position in a multimedia file directly.
In this embodiment, the voice input module 601 is used to receive the user's voice information. The voice information may be produced by the user directly, for example by speaking, or produced with the help of other equipment, for example an electronic translator or the user's voice recorded by another recording device. The voice information reflects the control operation that the user wants to perform on the multimedia file, including conventional control logic such as play and pause.
In this embodiment, the speech recognition module 602 is configured to extract, from the voice information, the
characteristic information of the user and the command information for managing multimedia files. The
characteristic information extracted from the voice information serves as identification information for a user:
it can be used to determine which voice information belongs to the same user and, correspondingly, to distinguish
the voice information of different users.
In this embodiment, the characteristic information may include voiceprint information of the user, where the
voiceprint information specifically includes the sound wave spectrum that carries the verbal information. A
voiceprint is both specific to the speaker and relatively stable; after adulthood in particular, a person's voice
remains relatively stable for a long time. Experiments show that whether a speaker deliberately imitates another
person's voice and tone or speaks in a whisper, the voiceprint still differs, however lifelike the imitation.
Based on this property of voiceprints, voice information that belongs to the same user can be accurately
identified and distinguished from voice information that does not.
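The patent does not say how two voiceprints are compared. The sketch below assumes each voiceprint has already been reduced to a fixed-length numeric vector and uses cosine similarity with a threshold of 0.8; the vector representation, the similarity measure, the threshold value, and the function names are all illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("voiceprint vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no speaker information
    return dot / (norm_a * norm_b)

def same_speaker(voiceprint_a, voiceprint_b, threshold=0.8):
    """Treat two voiceprints as the same user when similarity reaches the assumed threshold."""
    return cosine_similarity(voiceprint_a, voiceprint_b) >= threshold
```

A production system would derive the vectors from a speaker-recognition model and tune the threshold on real recordings.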
In this embodiment, the command information for managing multimedia files refers to the specific operation on the
multimedia file that is expressed by the voice information the user sends; the operation logic is conveyed to the
system in the form of voice information. Because of the particular nature of voice information, differences
between individuals in voiceprint, ethnicity, and language all lead to differences in how the command information
contained in the voice information is expressed. For example, when the user is Chinese, the voice information may
be in Mandarin or a local dialect, or may even be mixed with some English; when the user is French, the voice
information is generally in French. For this reason, in this embodiment the command information for managing
multimedia files may be extracted by multiple means for different languages, and different parsing modes may be
selected according to the characteristic information in the voice information.
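One way to read the selection of a parsing mode is as a lookup on a profile stored for the identified user. The sketch below assumes such a profile is a dictionary with an optional `parse_mode` entry; the profile layout, the mode strings, and the function name are hypothetical.

```python
def choose_parse_mode(user_profile, default_mode="zh-mandarin"):
    """Return the parsing mode remembered for an identified user, else the default mode."""
    if user_profile is None:
        # Unknown speaker: no stored preference, so fall back to the default (assumption).
        return default_mode
    return user_profile.get("parse_mode", default_mode)
```

With this shape, a user whose profile records French is parsed with the French mode, while a speaker with no profile gets the default.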
Specifically, the speech recognition module 602 may be further configured to construct voiceprint information of
the user according to the voice information, the characteristic information including the voiceprint information.
It may also be configured to perform speech recognition and natural semantic parsing according to the voice
information and to determine the command information according to the parsing result. Natural semantic parsing
means parsing the meaning expressed by the voice information; the parsing result differs depending on the language.
In this embodiment, the specific command information may include at least one of: play, pause, stop, jump, create
directory, open directory, add label, and so on. Play, pause, stop, and the like are conventional multimedia
control commands; in this embodiment, these commands can be parsed from the voice information to implement the
corresponding control. For example, in Chinese the voice information corresponding to play may be the Chinese
phrase meaning "play xxx", and in English it may be "play xxx". Besides the play command itself, the voice
information may also include voice content identifying the object to be played, such as the file name of the
multimedia file or part of the file name; based on this content, the corresponding multimedia file can be opened
and played directly.
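A rough sketch of turning recognized text into a command and a playback object is shown below. It assumes speech recognition has already produced a text string and uses a small keyword table with prefix matching; the table contents and the function name `parse_instruction` are illustrative and stand in for the natural semantic parsing the embodiment describes.

```python
# Hypothetical keyword table mapping spoken keywords to command names.
INSTRUCTION_KEYWORDS = {
    "播放": "play", "play": "play",
    "暂停": "pause", "pause": "pause",
    "停止": "stop", "stop": "stop",
}

def parse_instruction(recognized_text):
    """Split recognized text into (command, target); target is None when no object is named."""
    text = recognized_text.strip()
    for keyword, instruction in INSTRUCTION_KEYWORDS.items():
        # Prefix match only: this sketch does not handle free word order.
        if text[:len(keyword)].lower() == keyword:
            target = text[len(keyword):].strip()
            return instruction, (target or None)
    return None, None
```

The returned target can then be compared with file names, or parts of file names, to pick the multimedia file to open.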
In this embodiment, the command processing module 603 may be further configured to: determine, according to the
voiceprint information of the user, whether the user is an existing user; if so, manage the multimedia file
according to the command information; if not, save the voiceprint information, create a directory corresponding
to the user based on the voiceprint information, and manage the multimedia file according to the command
information. The voiceprint information extracted from the voice information is compared with the stored
voiceprint information of existing users, and the comparison result determines whether it is the voiceprint
information of an existing user. If it is, the voice information can be parsed into command information using the
parsing mode of that existing user, and the corresponding processing is performed according to the command
information. If it is not, the existing user information contains no user corresponding to this voice
information, and the user is a new user. In that case, if the new user's information is to be saved in the system,
the voiceprint information is saved first, then a directory corresponding to the new user is created based on the
voiceprint information, and finally the corresponding processing is performed according to the command
information. If the new user's information does not need to be saved in the system, the command information can
be parsed directly and the corresponding operation performed according to it.
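The existing-user check described above can be sketched as follows. The class `UserStore`, the function `process`, and the callables `is_match` and `execute` are hypothetical names; how voiceprints are compared and how a command is executed are left to the caller.

```python
from dataclasses import dataclass, field

@dataclass
class UserStore:
    voiceprints: dict = field(default_factory=dict)  # user id -> stored voiceprint
    directories: dict = field(default_factory=dict)  # user id -> list of voice labels

    def find_user(self, voiceprint, is_match):
        """Return the id of the existing user whose voiceprint matches, or None."""
        for user_id, stored in self.voiceprints.items():
            if is_match(stored, voiceprint):
                return user_id
        return None

    def register(self, voiceprint):
        """Save the voiceprint first, then create the new user's directory."""
        user_id = f"user{len(self.voiceprints) + 1}"
        self.voiceprints[user_id] = voiceprint
        self.directories[user_id] = []
        return user_id

def process(store, voiceprint, instruction, is_match, execute, save_new_users=True):
    """Identify the speaker, optionally register a new user, then run the command."""
    user_id = store.find_user(voiceprint, is_match)
    if user_id is None and save_new_users:
        user_id = store.register(voiceprint)
    # With save_new_users=False an unknown speaker's command still runs, with user_id None.
    return execute(user_id, instruction)
```

The `save_new_users` flag corresponds to the case where a new user's information is not kept in the system and the command is simply carried out.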
In this embodiment, when the command information includes adding a label, the command processing module 603 may be
further configured to: determine the time point at which the voice information is entered; subtract a preset
compensation time from that time point to obtain a label time point; and create a voice label at the label time
point according to the specific content corresponding to adding the label in the command information. A voice
label is a means of marking a multimedia file: its position corresponds to a certain time point in the time
attribute of the multimedia file, and its content is either preset or customized by the user, which makes it
convenient for the user to locate a position quickly when viewing or listening. Voice labels are usually added
while the user is viewing or listening to the multimedia file. For example, while watching a video, the user
reaches a position that the user thinks needs a label and therefore issues the voice information corresponding to
adding a label; the time point at that moment is the time point at which the voice information is entered. This
time point, however, is already later than the time point the user wants to label, because the user necessarily
sees the video content first and adds the voice label afterwards. The time point where the voice label really
belongs is therefore the entry time point minus the preset compensation time, and the resulting time point is the
label time point of the voice label. The specific compensation time may depend on the user's viewing habits:
different users may have different compensation times, and the same user may have different compensation times
when watching different video files. It is worth noting that the compensation time is intended to help the user
mark the multimedia position that needs marking, so the accuracy requirement can be flexible; that is, the label
time point only needs to fall within a certain range of the position the user really wants to label. When the
user later wants to view the content and jumps to the label time point, any deviation in position can be
corrected by controlling the progress bar through voice information or by manual adjustment, and the voice label
can be modified according to the user's needs; modification may include modifying the label time point of the
voice label and modifying the label content of the voice label.
In this embodiment, when the command information includes a jump, the command processing module 603 may be further
configured to: match the specific content corresponding to the jump in the command information against the stored
voice labels, and, when the matching degree reaches a preset threshold, jump the playback progress of the
multimedia file to the label time point corresponding to the voice label. A jump mainly covers two cases: if the
multimedia file is playing, the playback progress jumps directly to the label time point corresponding to the
voice label; if the multimedia file is not open, the multimedia file is opened and the playback progress then
jumps to the label time point corresponding to the voice label. Once voice labels have been set, jump operations
can be performed based on them; a jump moves the multimedia playback progress directly to the required label time
point. A multimedia file is in one of two states, open or not open. For an open multimedia file, playback jumps
directly to the label time point of the corresponding voice label. For a multimedia file that is not open, the
file must be opened first; after it is opened, the playback progress jumps to the label time point of the voice
label named in the voice information, and playback starts there. Specifically, in this embodiment, regarding
matching the specific content corresponding to the jump against the stored voice labels: for voice information
from the same user, the two voice recordings can be matched directly, and the jump command is triggered when the
matching degree is greater than the set threshold. The voice information includes at least the jump command and
the label content of the voice label, and may also include the file name of the multimedia file so that a
multimedia file that is not open can be controlled.
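The embodiment matches the user's voice directly against the stored voice labels. As a stand-in that runs without audio, the sketch below compares recognized text with the text content of each label using `difflib.SequenceMatcher` from the Python standard library; the text-based comparison, the 0.8 threshold, and the function name are assumptions, not the patent's matching method.

```python
from difflib import SequenceMatcher

def find_jump_target(spoken_content, labels, threshold=0.8):
    """Return the label time point of the best-matching voice label, or None if none qualifies."""
    best_label, best_score = None, 0.0
    for label in labels:
        score = SequenceMatcher(None, spoken_content, label["content"]).ratio()
        if score > best_score:
            best_label, best_score = label, score
    # Trigger the jump only when the matching degree reaches the preset threshold.
    if best_label is not None and best_score >= threshold:
        return best_label["time_s"]
    return None
```

A caller would then seek the player to the returned time point, opening the multimedia file first if it is not already open.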
In this embodiment, create directory refers to the command information used when creating the directory
corresponding to a user, which usually applies when a new user is added: when voice information with new
voiceprint information appears and the voice information concerns creating a directory, the steps of saving the
voiceprint information and creating the directory corresponding to the user are performed. Open directory
presents the directory of the voice labels of the corresponding user. The directory can be presented as text on a
display panel or played as voice information, either automatically or under the user's control; for example, a
player icon is presented on the display panel and the user clicks it to play, or the user controls playback
through voice information.
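Presenting a user's label directory as text could look like the sketch below, which assumes each voice label is a dictionary with `content` and `time_s` entries and formats one line per label in playback order. The data layout, the `HH:MM:SS` format, and the function name are illustrative choices.

```python
def format_label_directory(labels):
    """Return one text line per voice label, sorted by label time point."""
    lines = []
    for label in sorted(labels, key=lambda item: item["time_s"]):
        total_seconds = int(label["time_s"])
        hours, rest = divmod(total_seconds, 3600)
        minutes, seconds = divmod(rest, 60)
        lines.append(f"{hours:02d}:{minutes:02d}:{seconds:02d} {label['content']}")
    return lines
```

The resulting lines can be shown on the display panel, or passed to a speech synthesizer when the directory is played as voice information.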
This embodiment may also include a voice storage module 604, configured to store the characteristic information of
the user extracted by the speech recognition module 602, namely the voiceprint information. This embodiment may
also include a display module 605, configured to present the corresponding content to the user according to the
command information extracted from the voice information, for example playing a video file, presenting the label
directory, or presenting the text information of a voice label.
This embodiment provides a multimedia management device that receives voice information of a user, extracts from
the voice information the characteristic information of the user and the command information for managing
multimedia files, and manages the corresponding multimedia file according to the characteristic information and
the command information. By implementing this embodiment, multimedia files are managed in combination with the
user's voice information, which lets the user flexibly customize multimedia file management, meets the user's
varied management needs, and improves the user experience.
Obviously, those skilled in the art should understand that each module or step of the present invention described
above can be implemented with a general-purpose computing device. The modules or steps can be concentrated on a
single computing device or distributed over a network formed by multiple computing devices. Optionally, they can
be implemented with program code executable by a computing device, so that they can be stored in a storage medium
(ROM/RAM, magnetic disk, optical disc) and executed by a computing device. In some cases, the steps shown or
described can be performed in an order different from the one given here, or the modules or steps can each be made
into an individual integrated circuit module, or several of them can be made into a single integrated circuit
module. The present invention is therefore not limited to any specific combination of hardware and software.
The above is a further detailed description of the present invention in combination with specific embodiments,
and the specific implementation of the present invention should not be regarded as limited to this description.
For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or
substitutions can be made without departing from the concept of the present invention, and all of them shall be
regarded as falling within the protection scope of the present invention.
Claims (12)
1. A multimedia management method, comprising:
receiving voice information of a user;
extracting characteristic information of the user and command information from the voice information;
managing a corresponding multimedia file according to the characteristic information and the command information.
2. The multimedia management method according to claim 1, wherein extracting the characteristic information of the
user and the command information from the voice information comprises:
constructing voiceprint information of the user according to the voice information, the characteristic information comprising the voiceprint information.
3. The multimedia management method according to claim 1, wherein extracting the characteristic information of the
user and the command information from the voice information further comprises:
performing speech recognition and natural semantic parsing according to the voice information, and determining the command information according to the parsing result.
4. The multimedia management method according to claim 2, wherein managing the corresponding multimedia file
according to the characteristic information and the command information comprises:
determining, according to the voiceprint information of the user, whether the user is an existing user;
if so, managing the multimedia file according to the command information;
if not, saving the voiceprint information, creating a directory corresponding to the user based on the voiceprint
information, and managing the multimedia file according to the command information.
5. The multimedia management method according to any one of claims 1-4, wherein, when the command information
includes adding a label, managing the corresponding multimedia file according to the characteristic information and the command information comprises:
determining the time point at which the voice information is entered;
subtracting a preset compensation time from the time point to obtain a label time point;
creating a voice label at the label time point according to the specific content corresponding to adding the label in the command information.
6. The multimedia management method according to any one of claims 1-4, wherein, when the command information
includes a jump, managing the corresponding multimedia file according to the characteristic information and the command information comprises:
matching the specific content corresponding to the jump in the command information against stored voice labels, and,
when the matching degree reaches a preset threshold, jumping the playback progress of the multimedia file to the label time point corresponding to the voice label.
7. A multimedia management device, comprising:
a voice input module, configured to receive voice information of a user;
a speech recognition module, configured to extract characteristic information of the user and command information from the voice information;
a command processing module, configured to manage a corresponding multimedia file according to the characteristic information and the command information.
8. The multimedia management device according to claim 7, wherein the speech recognition module is further configured to:
construct voiceprint information of the user according to the voice information, the characteristic information comprising the voiceprint information.
9. The multimedia management device according to claim 7, wherein the speech recognition module is further configured to:
perform speech recognition and natural semantic parsing according to the voice information, and determine the command information according to the parsing result.
10. The multimedia management device according to claim 8, wherein the command processing module is further configured to:
determine, according to the voiceprint information of the user, whether the user is an existing user;
if so, manage the multimedia file according to the command information;
if not, save the voiceprint information, create a directory corresponding to the user based on the voiceprint
information, and manage the multimedia file according to the command information.
11. The multimedia management device according to any one of claims 7-10, wherein, when the command information
includes adding a label, the command processing module is further configured to:
determine the time point at which the voice information is entered;
subtract a preset compensation time from the time point to obtain a label time point;
create a voice label at the label time point according to the specific content corresponding to adding the label in the command information.
12. The multimedia management device according to any one of claims 7-10, wherein, when the command information
includes a jump, the command processing module is further configured to:
match the specific content corresponding to the jump in the command information against stored voice labels, and,
when the matching degree reaches a preset threshold, jump the playback progress of the multimedia file to the label time point corresponding to the voice label.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710428940.4A CN109033099A (en) | 2017-06-08 | 2017-06-08 | A kind of multi-media management method and device |
PCT/CN2018/090400 WO2018224032A1 (en) | 2017-06-08 | 2018-06-08 | Multimedia management method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710428940.4A CN109033099A (en) | 2017-06-08 | 2017-06-08 | A kind of multi-media management method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109033099A true CN109033099A (en) | 2018-12-18 |
Family
ID=64566419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710428940.4A Pending CN109033099A (en) | 2017-06-08 | 2017-06-08 | A kind of multi-media management method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109033099A (en) |
WO (1) | WO2018224032A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114168764A (en) * | 2021-11-04 | 2022-03-11 | 海南视联通信技术有限公司 | Multimedia data processing method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103399737A (en) * | 2013-07-18 | 2013-11-20 | 百度在线网络技术(北京)有限公司 | Multimedia processing method and device based on voice data |
US20140280773A1 (en) * | 2013-03-15 | 2014-09-18 | Michael Sharp | Systems and methods for expedited delivery of media content |
CN106372246A (en) * | 2016-09-20 | 2017-02-01 | 深圳市同行者科技有限公司 | Audio playing method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103226966A (en) * | 2013-04-26 | 2013-07-31 | 广东欧珀移动通信有限公司 | Method capable of quickly positioning playing progress and mobile terminal |
CN105872619A (en) * | 2015-12-15 | 2016-08-17 | 乐视网信息技术(北京)股份有限公司 | Video playing record matching method and matching device |
- 2017-06-08 CN CN201710428940.4A patent/CN109033099A/en active Pending
- 2018-06-08 WO PCT/CN2018/090400 patent/WO2018224032A1/en active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114168764A (en) * | 2021-11-04 | 2022-03-11 | 海南视联通信技术有限公司 | Multimedia data processing method and device |
CN114168764B (en) * | 2021-11-04 | 2024-05-17 | 海南视联通信技术有限公司 | Multimedia data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2018224032A1 (en) | 2018-12-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20181218 |