CN108900886A - A method for intelligent dubbing generation and synchronization for hand-drawn videos - Google Patents
A method for intelligent dubbing generation and synchronization for hand-drawn videos Download PDF Info
- Publication number
- CN108900886A CN108900886A CN201810788821.4A CN201810788821A CN108900886A CN 108900886 A CN108900886 A CN 108900886A CN 201810788821 A CN201810788821 A CN 201810788821A CN 108900886 A CN108900886 A CN 108900886A
- Authority
- CN
- China
- Prior art keywords
- data
- dubbing
- generation
- intelligent
- hand-drawn
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
The present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises: S10: acquiring dubbing script data; S20: standardizing the acquired dubbing script data; S30: extracting feature data from the dubbing script data; S40: generating intelligent dubbing data; S50: synchronizing the generated intelligent dubbing data into the target video scene; S60: fine-tuning the synchronized intelligent dubbing data and video scene. Based on speech-model training over large amounts of voice data of specified types, the user can select a voice of a given type and dub the target video with it.
Description
Technical field
The present invention relates to the field of intelligent dubbing for hand-drawn videos, and in particular to a method for intelligent dubbing generation and synchronization for hand-drawn videos.
Background technique
In hand-drawn short-video production, sound is an essential component: sound and hand-drawn animation together make up the short video.
Current techniques for adding sound to hand-drawn short videos fall into three main categories: 1. simply adding background music; 2. professional voice actors dubbing according to the video content; 3. synthesizing speech for a specified text with speech-synthesis technology. Each of the three has significant problems and drawbacks.
The first, simply adding background music, has the drawback that the music cannot be synchronized with the picture content and cannot narrate or explain the picture; it only provides sound in the simplest sense, so neither the fit between sound and picture nor the quality of the video can be guaranteed.
The second, professional dubbing, requires professional voice actors to write and produce a dubbing script, record it with professional dubbing equipment according to the video content, and then merge the recording into the video with professional editing tools. Its drawbacks are the communication and monetary costs of working with professional voice actors and the time cost of the whole dubbing-and-synthesis process; moreover, whenever the video content is modified, the entire dubbing process must be repeated, which is difficult for ordinary users and even for professional users.
The third, speech synthesis, produces machine-synthesized speech whose speed, intonation, tone handling, and fluency are all far from a human voice, so it cannot achieve high-quality dubbing for hand-drawn videos.
In summary, none of the existing technologies handles the sound in hand-drawn videos well.
Summary of the invention
In view of this, the present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos. Speech models are trained on large amounts of voice data of specified types; the user can select a voice of a given type and dub the target video with it. Each segment of the dubbing data is then fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
The present invention provides a method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises:
S10: acquiring dubbing script data and storing it in a dubbing script database;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data and storing it in a dubbing script feature database;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
Preferably, step S40 comprises:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model;
S404: synthesizing the speech for the corresponding dubbing script from the speech algorithm model and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
Preferably, step S40 is followed by a step S41: when special dubbing script data is involved, applying special processing to it.
Preferably, the dubbing script data includes audio text information, pause-point information, speech-rate information, and voice-data type information.
Preferably, step S60 comprises:
S601: applying fade (gradual-change) processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
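To make the data fields concrete, the dubbing script record described above could be modeled as a simple structure. The class name, field names, and the 0.3 s pause length below are illustrative assumptions, not the patent's implementation:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DubbingScript:
    """One dubbing script record holding the four fields named in the patent."""
    text: str                  # audio text information
    pause_points: List[float]  # pause-point information (seconds from segment start)
    speech_rate: float         # speech-rate information (words per second)
    voice_type: str            # voice-data type information, e.g. "male", "child"

    def estimated_duration(self) -> float:
        """Rough duration: words at the given rate plus an assumed 0.3 s per pause."""
        words = len(self.text.split())
        return words / self.speech_rate + 0.3 * len(self.pause_points)
```

A script of 10 words spoken at 2 words/s with one pause would then be estimated at about 5.3 seconds, which the synchronization step can compare against the scene duration.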
The invention has the following advantages and positive effects. Based on large amounts of voice data of specified types, it extracts features such as voiceprint, frequency, intonation, and tone, and trains speech models on them; the same feature extraction and model training can also be applied to voice data recorded by the user. A dubbing script entered by the user can thus be synthesized in any of the specified voice types, or even in the user's own voice. The user selects a voice of a given type and the target video is dubbed with it; each segment of the dubbing data is fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
Brief description of the drawings
Fig. 1 is a flow chart of the intelligent dubbing generation and synchronization method for hand-drawn videos of the first embodiment of the invention;
Fig. 2 is a flow chart of the intelligent dubbing generation and synchronization method for hand-drawn videos of the second embodiment of the invention;
Fig. 3 is a flow chart of generating intelligent dubbing data according to an embodiment of the invention;
Fig. 4 is a flow chart of fine-tuning the synchronized intelligent dubbing data and video scene according to an embodiment of the invention;
Fig. 5 is a functional diagram of the intelligent dubbing generation and synchronization engine for hand-drawn videos of an embodiment of the invention.
Specific embodiment
To better understand the present invention, it is further described below with reference to specific embodiments and the accompanying drawings.
A method for intelligent dubbing generation and synchronization for hand-drawn videos according to the present invention comprises:
S10: acquiring dubbing script data;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
In one embodiment of the invention, a dubbing script database VoiceScriptData is provided, which stores a large number of dubbing script records; each record includes audio text information, pause-point information, speech-rate information, voice-data type information, and so on. The user extracts the required dubbing script data from VoiceScriptData as needed, and the acquired data is then standardized by a script-normalization tool normalizeVoiceScriptData(). Specifically, this mainly filters forbidden and non-voiced characters out of the script text and, according to the speech rate and to the number and duration of the target video's scenes, pre-adjusts the text length so that the dubbing script data conforms to the specification expected by the subsequent algorithms.
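A minimal sketch of what such a normalization step might do. The forbidden-character set, the non-voiced-character pattern, and the words-per-second budget are all illustrative assumptions; the patent does not specify them:

```python
import re

FORBIDDEN = set('<>{}[]|\\^~')        # illustrative forbidden characters
NON_VOICED = re.compile(r'[*#_`]+')   # illustrative characters that produce no speech

def normalize_voice_script(text: str, video_seconds: float,
                           words_per_second: float = 2.5) -> str:
    """Filter forbidden/non-voiced characters and trim the script to a word
    budget derived from the target video duration and the speech rate."""
    text = ''.join(ch for ch in text if ch not in FORBIDDEN)
    text = NON_VOICED.sub('', text)
    text = ' '.join(text.split())                    # collapse whitespace
    budget = int(video_seconds * words_per_second)   # max words that fit
    return ' '.join(text.split()[:budget])
```

With a 2-second video at 2.5 words/s the budget is 5 words, so longer scripts are truncated to fit before synthesis.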
Further, a script feature-extraction tool getVoiceScriptFeatureData() extracts the feature data from the dubbing script, including the voice type, speech rate, and pause-point data.
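One way such a feature-extraction step could look. The inline '|' pause marker and the returned dictionary shape are invented for illustration; the patent only names the three features:

```python
def get_voice_script_features(script_text: str, voice_type: str,
                              words_per_second: float) -> dict:
    """Extract the features named in the patent: voice type, speech rate,
    and pause points (here recorded as word offsets of '|' marks)."""
    pause_points = []
    words = 0
    for token in script_text.split():
        if token == '|':                  # illustrative pause marker
            pause_points.append(words)    # pause after this many words
        else:
            words += 1
    return {
        'voice_type': voice_type,
        'speech_rate': words_per_second,
        'pause_points': pause_points,
        'word_count': words,
    }
```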
Further, intelligent dubbing data is generated. In one embodiment of the invention, the generation tool geneAIVoiceData() performs:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model AIVoiceModel;
S404: synthesizing the speech for the corresponding dubbing script from AIVoiceModel and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
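The four sub-steps can be sketched as a pipeline. The feature extraction, "model", and synthesis below are stubs keyed by voice type, since the patent does not specify the training algorithm; the 0.3 s pause gap is an assumption:

```python
def extract_sound_features(samples_by_type):
    """S402 stub: per-type feature extraction (here: mean sample value)."""
    return {vtype: sum(xs) / len(xs) for vtype, xs in samples_by_type.items()}

def train_voice_models(features_by_type):
    """S403 stub: one 'model' record per voice type."""
    return {vtype: {'voice_type': vtype, 'feature': feat}
            for vtype, feat in features_by_type.items()}

def synthesize_dubbing(model, script_words, speech_rate, pause_points):
    """S404 stub: one (word, start_time) pair per word, inserting an
    assumed 0.3 s gap after each pause point."""
    t, out, pauses = 0.0, [], set(pause_points)
    for i, word in enumerate(script_words, start=1):
        out.append((word, round(t, 3)))
        t += 1.0 / speech_rate
        if i in pauses:
            t += 0.3
    return {'voice': model['voice_type'], 'timeline': out}

def gene_ai_voice_data(samples_by_type, voice_type, script_words,
                       speech_rate, pause_points):
    """S401-S404 end to end for one chosen voice type."""
    models = train_voice_models(extract_sound_features(samples_by_type))
    return synthesize_dubbing(models[voice_type], script_words,
                              speech_rate, pause_points)
```

The timeline form makes the later alignment step easy: each word already carries the start time that must line up with the video scene.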
The invention extracts features such as voiceprint, frequency, intonation, and tone from large amounts of voice data of specified types and trains speech models on them; the same feature extraction and model training can also be applied to voice data recorded by the user. A dubbing script entered by the user can thus be synthesized in any of the specified voice types, or even in the user's own voice; the user selects a voice of a given type, and the target video is dubbed with it.
Further, when special dubbing script data is involved, it is given special processing; specifically, special dubbing audio is generated from the special dubbing script.
Further, after the dubbing data has been synthesized from the script data, the synchronization tool syncVoiceWithVideo() combines the generated dubbing data with the target short-video scenes to synchronize sound and picture. The concrete process first compares the durations of the dubbing data and the target short video, then performs an alignment operation on the two.
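A sketch of the duration comparison and alignment, assuming per-scene and per-segment durations are known. The stretch-to-fit rule is an assumption; the patent only says the two durations are compared and aligned:

```python
def sync_voice_with_video(segment_durations, scene_durations):
    """Compare each dubbing segment's duration with its scene's duration and
    return per-segment time-stretch factors that align the two: a factor > 1
    means the segment must be slowed or padded to fill the scene, < 1 means
    it must be sped up to fit."""
    if len(segment_durations) != len(scene_durations):
        raise ValueError("one dubbing segment per scene is assumed")
    factors = []
    for seg, scene in zip(segment_durations, scene_durations):
        if seg <= 0:
            raise ValueError("segment duration must be positive")
        factors.append(scene / seg)
    return factors
```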
Further, the synchronized intelligent dubbing data and video scene are fine-tuned, which specifically includes:
S601: applying fade (gradual-change) processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
Each segment of the dubbing data is fine-tuned and synchronized according to the duration of each scene in the target short video; fades are applied at the pauses of each segment, and where multiple voices overlap at the same time point a reasonable volume adjustment is made, finally achieving intelligent dubbing generation and synchronization.
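The fade-at-pauses and overlap-volume ideas can be illustrated on per-millisecond gain envelopes. The linear fade and the equal-share mixing rule are illustrative choices, not specified by the patent:

```python
def fade_envelope(length_ms, fade_ms):
    """Linear fade-in/fade-out gain envelope for one dubbing segment,
    applied at its boundaries (e.g. at pauses or scene switches)."""
    env = []
    for t in range(length_ms):
        # gain ramps up over the first fade_ms samples and down over the last
        gain = min(1.0, (t + 1) / fade_ms, (length_ms - t) / fade_ms)
        env.append(round(gain, 3))
    return env

def overlap_gains(active_voices):
    """Where several voices sound at the same time point, share the volume
    equally so the mixed signal does not clip."""
    n = len(active_voices)
    return {voice: round(1.0 / n, 3) for voice in active_voices}
```

Applying the envelope to a segment softens its pause boundaries, and the per-voice gains give the "reasonable volume adjustment" for overlapping voices.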
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
Finally, it should be noted that the above embodiments are merely examples for clearly illustrating the present invention and are not a limitation of its implementations. Those of ordinary skill in the art may make variations or changes of other forms on the basis of the above description; it is neither necessary nor possible to exhaust all implementations here. Obvious changes or variations derived therefrom remain within the protection scope of the present invention.
Claims (5)
1. A method for intelligent dubbing generation and synchronization for hand-drawn videos, characterized in that the method comprises:
S10: acquiring dubbing script data and storing it in a dubbing script database;
S20: standardizing the acquired dubbing script data;
S30: extracting feature data from the dubbing script data and storing it in a dubbing script feature database;
S40: generating intelligent dubbing data;
S50: synchronizing the generated intelligent dubbing data into the target video scene;
S60: fine-tuning the synchronized intelligent dubbing data and video scene.
2. The method according to claim 1, characterized in that step S40 comprises:
S401: acquiring voice data of different types;
S402: extracting per-type sound features from the acquired voice data;
S403: training a per-type speech model on the extracted sound features to produce a speech algorithm model;
S404: synthesizing the speech for the corresponding dubbing script from the speech algorithm model and the corresponding type's sound feature data using a speech-generation method, adjusting the speech rate and the corresponding pause points, and synthesizing the dubbing data.
3. The method according to claim 1 or 2, characterized in that step S40 is followed by a step S41: when special dubbing script data is involved, applying special processing to it.
4. The method according to claim 3, characterized in that the dubbing script data includes audio text information, pause-point information, speech-rate information, and voice-data type information.
5. The method according to claim 4, characterized in that step S60 comprises:
S601: applying fade processing to the generated dubbing data at video scene switches;
S602: further fine-tuning the dubbing data and the video scene to guarantee their synchronization.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810788821.4A CN108900886A (en) | 2018-07-18 | 2018-07-18 | A method for intelligent dubbing generation and synchronization for hand-drawn videos |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN108900886A true CN108900886A (en) | 2018-11-27 |
Family
ID=64350849
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201810788821.4A Pending CN108900886A (en) | A method for intelligent dubbing generation and synchronization for hand-drawn videos | 2018-07-18 | 2018-07-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN108900886A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111031386A (en) * | 2019-12-17 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
| CN111866582A (en) * | 2019-04-26 | 2020-10-30 | 广州声活圈信息科技有限公司 | Deduction for user matching opponent game and deduction synthesis method |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020193994A1 (en) * | 2001-03-30 | 2002-12-19 | Nicholas Kibre | Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems |
| EP1701527A1 (en) * | 2005-03-10 | 2006-09-13 | Avaya Technology Llc | Graphical menu generation in interactive voice response systems |
| CN101178896A (en) * | 2007-12-06 | 2008-05-14 | 安徽科大讯飞信息科技股份有限公司 | Unit selection voice synthetic method based on acoustics statistical model |
| CN101359473A (en) * | 2007-07-30 | 2009-02-04 | 国际商业机器公司 | Auto speech conversion method and apparatus |
| CN105118498A (en) * | 2015-09-06 | 2015-12-02 | 百度在线网络技术(北京)有限公司 | Training method and apparatus of speech synthesis model |
| CN106531148A (en) * | 2016-10-24 | 2017-03-22 | 咪咕数字传媒有限公司 | Cartoon dubbing method and apparatus based on voice synthesis |
| CN107172449A (en) * | 2017-06-19 | 2017-09-15 | 微鲸科技有限公司 | Multi-medium play method, device and multimedia storage method |
- 2018-07-18 CN CN201810788821.4A patent/CN108900886A/en active Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Lacey | Listening publics: The politics and experience of listening in the media age | |
| TWI704805B (en) | Video editing method and device | |
| CN105244022B (en) | Audio-video method for generating captions and device | |
| US5880788A (en) | Automated synchronization of video image sequences to new soundtracks | |
| CN110428811B (en) | Data processing method and device and electronic equipment | |
| US20080275700A1 (en) | Method of and System for Modifying Messages | |
| JP5206553B2 (en) | Browsing system, method, and program | |
| CN102280104B (en) | File phoneticization processing method and system based on intelligent indexing | |
| CN104252872B (en) | Lyric generating method and intelligent terminal | |
| CN105244041B (en) | The evaluation method and device of song audition | |
| KR20140133056A (en) | Apparatus and method for providing auto lip-synch in animation | |
| CN113676772B (en) | Video generation method and device | |
| CN115515002A (en) | Intelligent admire class generation method and device based on virtual digital person and storage medium | |
| CN109584859A (en) | Phoneme synthesizing method and device | |
| CN111613224A (en) | Personalized voice synthesis method and device | |
| CN101615417B (en) | Synchronous Chinese lyrics display method which is accurate to words | |
| CN108900886A (en) | A kind of Freehandhand-drawing video intelligent dubs generation and synchronous method | |
| CN103544978A (en) | Multimedia file manufacturing and playing method and intelligent terminal | |
| CN102447785A (en) | Generation method of prompt information of mobile terminal and device | |
| Barbatsis et al. | Analyzing meaning in form: Soap opera's compositional construction of “realness” | |
| Koço et al. | Applying multiview learning algorithms to human-human conversation classification. | |
| CN118945440A (en) | A digital human video generation method and device based on AIGC technology | |
| Thain | Anarchival Images: The Labour of Chronic Collage | |
| JP2008084021A (en) | Movie scenario generation method, program, and apparatus | |
| CN121078294B (en) | Video call subtitle real-time refreshing method based on voice large model analysis |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181127 |