CN108900886A - A method for intelligent dubbing generation and synchronization for hand-drawn videos - Google Patents
A method for intelligent dubbing generation and synchronization for hand-drawn videos Download PDF Info
- Publication number
- CN108900886A CN108900886A CN201810788821.4A CN201810788821A CN108900886A CN 108900886 A CN108900886 A CN 108900886A CN 201810788821 A CN201810788821 A CN 201810788821A CN 108900886 A CN108900886 A CN 108900886A
- Authority
- CN
- China
- Prior art keywords
- data
- dubbing
- generation
- intelligent
- hand-drawn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
The present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises: S10: acquiring dubbing script data; S20: standardizing the acquired dubbing script data; S30: extracting feature data from the dubbing script data; S40: generating intelligent dubbing data; S50: synchronizing the generated intelligent dubbing data into the target video scene; S60: fine-tuning the synchronized intelligent dubbing data and video scene. Based on speech-model training over large amounts of voice data of specified types, the user can select a voice of a given type and dub the target video with it.
Description
Technical field
The present invention relates to the field of intelligent dubbing for hand-drawn videos, and in particular to a method for intelligent dubbing generation and synchronization for hand-drawn videos.
Background technique
In hand-drawn short-video production, sound is an essential component: sound and hand-drawn animation together make up the short video.
Current techniques for adding sound to hand-drawn short videos fall into three main categories: 1. simply adding background music; 2. professional voice actors dubbing according to the video content; 3. synthesizing speech for a specified text with speech-synthesis technology. Each of the three has significant problems and drawbacks.
The first, simply adding background music, has the drawback that the music cannot be synchronized with the picture content and cannot narrate or explain the picture; it only provides sound in the simplest sense, so neither the fit between sound and picture nor the quality of the video can be guaranteed.
The second, professional dubbing, requires professional voice actors to write and produce a dubbing script, record it with professional dubbing equipment according to the video content, and then merge the recording into the video with professional editing tools. Its drawbacks are the communication and monetary costs of working with professional voice actors and the time cost of the whole dubbing-and-synthesis process; moreover, whenever the video content is modified, the entire dubbing process must be repeated, which is difficult for ordinary users and even for professional users.
The third, speech synthesis, produces machine-synthesized speech whose speed, intonation, tone handling, and fluency are all far from a human voice, so it cannot achieve high-quality dubbing for hand-drawn videos.
In summary, none of the existing technologies handles the sound in hand-drawn videos well.
Summary of the invention
In view of this, the present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos. Speech models are trained on large amounts of voice data of specified types; the user can select a voice of a given type and dub the target video with it. Each segment of the dubbing data is then fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
The present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises:
S10: acquiring dubbing script data and storing it in a dubbing script database;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data and storing it in a dubbing script feature database;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
Preferably, step S40 comprises:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model;
S404: synthesizing the speech for the corresponding dubbing script from the speech algorithm model and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
Preferably, step S40 is followed by a step S41: when special dubbing script data is involved, applying special processing to it.
Preferably, the dubbing script data includes audio text information, pause-point information, speech-rate information, and voice-data type information.
Preferably, step S60 comprises:
S601: applying fade (gradual-change) processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
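To make the data fields concrete, the dubbing script record described above could be modeled as a simple structure. The class name, field names, and the 0.3 s pause length below are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DubbingScript:
    """One dubbing script record holding the four fields named in the patent."""
    text: str                  # audio text information
    pause_points: List[float]  # pause-point information (seconds from segment start)
    speech_rate: float         # speech-rate information (words per second)
    voice_type: str            # voice-data type information, e.g. "male", "child"

    def estimated_duration(self) -> float:
        """Rough duration: words at the given rate plus an assumed 0.3 s per pause."""
        words = len(self.text.split())
        return words / self.speech_rate + 0.3 * len(self.pause_points)
```

A script of 10 words spoken at 2 words/s with one pause would then be estimated at about 5.3 seconds, which the synchronization step can compare against the scene duration.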
The invention has the following advantages and positive effects. Based on large amounts of voice data of specified types, it extracts features such as voiceprint, frequency, intonation, and tone, and trains speech models on them; the same feature extraction and model training can also be applied to voice data recorded by the user. A dubbing script entered by the user can thus be synthesized in any of the specified voice types, or even in the user's own voice. The user selects a voice of a given type and the target video is dubbed with it; each segment of the dubbing data is fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
Brief description of the drawings
Fig. 1 is a flow chart of the intelligent dubbing generation and synchronization method for hand-drawn videos of the first embodiment of the invention;
Fig. 2 is a flow chart of the intelligent dubbing generation and synchronization method for hand-drawn videos of the second embodiment of the invention;
Fig. 3 is a flow chart of generating intelligent dubbing data according to an embodiment of the invention;
Fig. 4 is a flow chart of fine-tuning the synchronized intelligent dubbing data and video scene according to an embodiment of the invention;
Fig. 5 is a functional diagram of the intelligent dubbing generation and synchronization engine for hand-drawn videos of an embodiment of the invention.
Specific embodiment
To better understand the present invention, it is further described below with reference to specific embodiments and the accompanying drawings.
A method for intelligent dubbing generation and synchronization for hand-drawn videos according to the present invention comprises:
S10: acquiring dubbing script data;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
In one embodiment of the invention, a dubbing script database VoiceScriptData is provided, which stores a large number of dubbing script records; each record includes audio text information, pause-point information, speech-rate information, voice-data type information, and so on. The user extracts the required dubbing script data from VoiceScriptData as needed, and the acquired data is then standardized by a script-normalization tool normalizeVoiceScriptData(). Specifically, this mainly filters forbidden and non-voiced characters out of the script text and, according to the speech rate and to the number and duration of the target video's scenes, pre-adjusts the text length so that the dubbing script data conforms to the specification expected by the subsequent algorithms.
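A minimal sketch of what such a normalization step might do. The forbidden-character set, the non-voiced-character pattern, and the words-per-second budget are all illustrative assumptions; the patent does not specify them:

```python
import re

FORBIDDEN = set('<>{}[]|\\^~')        # illustrative forbidden characters
NON_VOICED = re.compile(r'[*#_`]+')   # illustrative characters that produce no speech

def normalize_voice_script(text: str, video_seconds: float,
                           words_per_second: float = 2.5) -> str:
    """Filter forbidden/non-voiced characters and trim the script to a word
    budget derived from the target video duration and the speech rate."""
    text = ''.join(ch for ch in text if ch not in FORBIDDEN)
    text = NON_VOICED.sub('', text)
    text = ' '.join(text.split())                    # collapse whitespace
    budget = int(video_seconds * words_per_second)   # max words that fit
    return ' '.join(text.split()[:budget])
```

With a 2-second video at 2.5 words/s the budget is 5 words, so longer scripts are truncated to fit before synthesis.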
Further, a script feature-extraction tool getVoiceScriptFeatureData() extracts the feature data from the dubbing script, including the voice type, speech rate, and pause-point data.
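One way such a feature-extraction step could look. The inline '|' pause marker and the returned dictionary shape are invented for illustration; the patent only names the three features:

```python
def get_voice_script_features(script_text: str, voice_type: str,
                              words_per_second: float) -> dict:
    """Extract the features named in the patent: voice type, speech rate,
    and pause points (here recorded as word offsets of '|' marks)."""
    pause_points = []
    words = 0
    for token in script_text.split():
        if token == '|':                  # illustrative pause marker
            pause_points.append(words)    # pause after this many words
        else:
            words += 1
    return {
        'voice_type': voice_type,
        'speech_rate': words_per_second,
        'pause_points': pause_points,
        'word_count': words,
    }
```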
Further, intelligent dubbing data is generated. In one embodiment of the invention, the generation tool geneAIVoiceData() performs:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model AIVoiceModel;
S404: synthesizing the speech for the corresponding dubbing script from AIVoiceModel and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
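The four sub-steps can be sketched as a pipeline. The feature extraction, "model", and synthesis below are stubs keyed by voice type, since the patent does not specify the training algorithm; the 0.3 s pause gap is an assumption:

```python
def extract_sound_features(samples_by_type):
    """S402 stub: per-type feature extraction (here: mean sample value)."""
    return {vtype: sum(xs) / len(xs) for vtype, xs in samples_by_type.items()}

def train_voice_models(features_by_type):
    """S403 stub: one 'model' record per voice type."""
    return {vtype: {'voice_type': vtype, 'feature': feat}
            for vtype, feat in features_by_type.items()}

def synthesize_dubbing(model, script_words, speech_rate, pause_points):
    """S404 stub: one (word, start_time) pair per word, inserting an
    assumed 0.3 s gap after each pause point."""
    t, out, pauses = 0.0, [], set(pause_points)
    for i, word in enumerate(script_words, start=1):
        out.append((word, round(t, 3)))
        t += 1.0 / speech_rate
        if i in pauses:
            t += 0.3
    return {'voice': model['voice_type'], 'timeline': out}

def gene_ai_voice_data(samples_by_type, voice_type, script_words,
                       speech_rate, pause_points):
    """S401-S404 end to end for one chosen voice type."""
    models = train_voice_models(extract_sound_features(samples_by_type))
    return synthesize_dubbing(models[voice_type], script_words,
                              speech_rate, pause_points)
```

The timeline form makes the later alignment step easy: each word already carries the start time that must line up with the video scene.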
The invention extracts features such as voiceprint, frequency, intonation, and tone from large amounts of voice data of specified types and trains speech models on them; the same feature extraction and model training can also be applied to voice data recorded by the user. A dubbing script entered by the user can thus be synthesized in any of the specified voice types, or even in the user's own voice; the user selects a voice of a given type, and the target video is dubbed with it.
Further, when special dubbing script data is involved, it is given special processing; specifically, special dubbing audio is generated from the special dubbing script.
Further, after the dubbing data has been synthesized from the script data, the synchronization tool syncVoiceWithVideo() combines the generated dubbing data with the target short-video scenes to synchronize sound and picture. The concrete process first compares the durations of the dubbing data and the target short video, then performs an alignment operation on the two.
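A sketch of the duration comparison and alignment, assuming per-scene and per-segment durations are known. The stretch-to-fit rule is an assumption; the patent only says the two durations are compared and aligned:

```python
def sync_voice_with_video(segment_durations, scene_durations):
    """Compare each dubbing segment's duration with its scene's duration and
    return per-segment time-stretch factors that align the two: a factor > 1
    means the segment must be slowed or padded to fill the scene, < 1 means
    it must be sped up to fit."""
    if len(segment_durations) != len(scene_durations):
        raise ValueError("one dubbing segment per scene is assumed")
    factors = []
    for seg, scene in zip(segment_durations, scene_durations):
        if seg <= 0:
            raise ValueError("segment duration must be positive")
        factors.append(scene / seg)
    return factors
```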
Further, the synchronized intelligent dubbing data and video scene are fine-tuned, which specifically includes:
S601: applying fade (gradual-change) processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
Each segment of the dubbing data is fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
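The fade-at-pauses and overlap-volume ideas can be illustrated on per-millisecond gain envelopes. The linear fade and the equal-share mixing rule are illustrative choices, not specified by the patent:

```python
def fade_envelope(length_ms, fade_ms):
    """Linear fade-in/fade-out gain envelope for one dubbing segment,
    applied at its boundaries (e.g. at pauses or scene switches)."""
    env = []
    for t in range(length_ms):
        # gain ramps up over the first fade_ms samples and down over the last
        gain = min(1.0, (t + 1) / fade_ms, (length_ms - t) / fade_ms)
        env.append(round(gain, 3))
    return env

def overlap_gains(active_voices):
    """Where several voices sound at the same time point, share the volume
    equally so the mixed signal does not clip."""
    n = len(active_voices)
    return {voice: round(1.0 / n, 3) for voice in active_voices}
```

Applying the envelope to a segment softens its pause boundaries, and the per-voice gains give the "reasonable volume adjustment" for overlapping voices.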
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
Finally, it should be noted that the above embodiments are merely examples for clearly illustrating the present invention and are not a limitation of its implementations. Those of ordinary skill in the art may make variations or changes of other forms on the basis of the above description; it is neither necessary nor possible to exhaust all implementations here. Obvious changes or variations derived therefrom remain within the protection scope of the present invention.
Claims (5)
1. A method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises:
S10: acquiring dubbing script data and storing it in a dubbing script database;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data and storing it in a dubbing script feature database;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
2. The method according to claim 1, characterized in that step S40 comprises:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model;
S404: synthesizing the speech for the corresponding dubbing script from the speech algorithm model and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
3. The method according to claim 1 or 2, characterized in that step S40 is followed by a step S41: when special dubbing script data is involved, applying special processing to it.
4. The method according to claim 3, characterized in that the dubbing script data includes audio text information, pause-point information, speech-rate information, and voice-data type information.
5. The method according to claim 4, characterized in that step S60 comprises:
S601: applying fade processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810788821.4A CN108900886A (en) | 2018-07-18 | 2018-07-18 | A method for intelligent dubbing generation and synchronization for hand-drawn videos |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN108900886A true CN108900886A (en) | 2018-11-27 |
Family
ID=64350849
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810788821.4A Pending CN108900886A (en) | A method for intelligent dubbing generation and synchronization for hand-drawn videos | 2018-07-18 | 2018-07-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108900886A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111031386A (en) * | 2019-12-17 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
| CN111866582A (en) * | 2019-04-26 | 2020-10-30 | 广州声活圈信息科技有限公司 | Deduction for user matching opponent game and deduction synthesis method |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020193994A1 (en) * | 2001-03-30 | 2002-12-19 | Nicholas Kibre | Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems |
| EP1701527A1 (en) * | 2005-03-10 | 2006-09-13 | Avaya Technology Llc | Graphical menu generation in interactive voice response systems |
| CN101178896A (en) * | 2007-12-06 | 2008-05-14 | 安徽科大讯飞信息科技股份有限公司 | Unit selection voice synthetic method based on acoustics statistical model |
| CN101359473A (en) * | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Auto speech conversion method and apparatus |
| CN105118498A (en) * | 2015-09-06 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | Training method and apparatus of speech synthesis model |
| CN106531148A (en) * | 2016-10-24 | 2017-03-22 | 咪咕数字传媒有限公司 | Cartoon dubbing method and apparatus based on voice synthesis |
| CN107172449A (en) * | 2017-06-19 | 2017-09-15 | 微鲸科技有限公司 | Multi-medium play method, device and multimedia storage method |
- 2018-07-18 CN CN201810788821.4A patent/CN108900886A/en active Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Lacey | Listening publics: The politics and experience of listening in the media age | |
| TWI704805B (en) | Video editing method and device | |
| CN105244022B (en) | Audio-video method for generating captions and device | |
| US5880788A (en) | Automated synchronization of video image sequences to new soundtracks | |
| CN110428811B (en) | Data processing method and device and electronic equipment | |
| US20080275700A1 (en) | Method of and System for Modifying Messages | |
| JP5206553B2 (en) | Browsing system, method, and program | |
| CN102280104B (en) | File phoneticization processing method and system based on intelligent indexing | |
| CN104252872B (en) | Lyric generating method and intelligent terminal | |
| CN105244041B (en) | The evaluation method and device of song audition | |
| KR20140133056A (en) | Apparatus and method for providing auto lip-synch in animation | |
| CN113676772B (en) | Video generation method and device | |
| CN115515002A (en) | Intelligent admire class generation method and device based on virtual digital person and storage medium | |
| CN109584859A (en) | Phoneme synthesizing method and device | |
| CN111613224A (en) | Personalized voice synthesis method and device | |
| CN101615417B (en) | Synchronous Chinese lyrics display method which is accurate to words | |
| CN108900886A (en) | A kind of Freehandhand-drawing video intelligent dubs generation and synchronous method | |
| CN103544978A (en) | Multimedia file manufacturing and playing method and intelligent terminal | |
| CN102447785A (en) | Generation method of prompt information of mobile terminal and device | |
| Barbatsis et al. | Analyzing meaning in form: Soap opera's compositional construction of “realness” | |
| Koço et al. | Applying multiview learning algorithms to human-human conversation classification. | |
| CN118945440A (en) | A digital human video generation method and device based on AIGC technology | |
| Thain | Anarchival Images: The Labour of Chronic Collage | |
| JP2008084021A (en) | Movie scenario generation method, program, and apparatus | |
| CN121078294B (en) | Video call subtitle real-time refreshing method based on voice large model analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181127 |