CN109819313A - Video processing method, device and storage medium - Google Patents
Video processing method, device and storage medium Download PDF Info
- Publication number
- CN109819313A CN109819313A CN201910023976.3A CN201910023976A CN109819313A CN 109819313 A CN109819313 A CN 109819313A CN 201910023976 A CN201910023976 A CN 201910023976A CN 109819313 A CN109819313 A CN 109819313A
- Authority
- CN
- China
- Prior art keywords
- video
- image
- target
- video image
- facial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Television Signal Processing For Recording (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
Abstract
Embodiments of the present application disclose a video processing method, device and storage medium. The video processing method includes: obtaining dubbed audio data input by a user; obtaining multiple frames of video image from a video file; determining, from the multiple frames, an initial video image containing a target face, and fusing the target face in the initial video image with a selected face image to obtain a target video image; and synthesizing at least the dubbed audio data and the target video image to obtain an audio-video composite file. By this scheme, the user's dubbing and elements such as the user's portrait are organically blended into the video production, increasing the user's depth of involvement in the production and the degree of personalization of the video.
Description
Technical field
The present application relates to the technical field of information processing, and in particular to a video processing method, device and storage medium.
Background technique
With the development of the Internet and of mobile communication networks, and with the rapid growth in the processing and storage capabilities of terminals, a vast number of applications have spread rapidly and come into wide use, video applications in particular.
Video refers to the set of techniques by which a series of still images is captured, recorded, processed, stored, transmitted and reproduced as electrical signals. When images change continuously at more than a certain number of frames per second, the human eye can no longer distinguish the individual still pictures, and the result appears as a smooth, continuous visual effect; such a continuous sequence of pictures is called a video. The advance of network technology has also allowed recorded video clips to be published on the Internet as streaming media and to be received and played by computers. In the related art, users may also perform operations on video material such as editing, recombination and format conversion.
Summary of the invention
Embodiments of the present application provide a video processing method, device and storage medium that can increase the user's depth of involvement in video production and the degree of personalization of the video.
An embodiment of the present application provides a video processing method, comprising:
obtaining dubbed audio data input by a user;
obtaining multiple frames of video image from a video file;
determining, from the multiple frames of video image, an initial video image containing a target face, and fusing the target face in the initial video image with a selected face image to obtain a target video image;
synthesizing at least the dubbed audio data and the target video image to obtain an audio-video composite file.
Correspondingly, an embodiment of the present application further provides a video processing device, comprising:
an audio obtaining unit, configured to obtain dubbed audio data input by a user;
an image obtaining unit, configured to obtain multiple frames of video image from a video file;
a processing unit, configured to determine, from the multiple frames of video image, an initial video image containing a target face, and to fuse the target face in the initial video image with a selected face image to obtain a target video image;
a synthesis unit, configured to synthesize at least the dubbed audio data and the target video image to obtain an audio-video composite file.
Correspondingly, an embodiment of the present application further provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the video processing method described above.
In embodiments of the present application, while a video file is being played, dubbed audio data input by the user is first obtained, and multiple frames of video image are obtained from the video file. Then an initial video image containing a target face is determined from the multiple frames of video image, and the target face in the initial video image is fused with a selected face image to obtain a target video image. Finally, at least the dubbed audio data and the target video image are synthesized to obtain an audio-video composite file. By this scheme, the user's dubbing and elements such as the user's portrait can be organically blended into the video production, increasing the user's depth of involvement in the production and the degree of personalization of the video.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of a video processing method provided by an embodiment of the present application.
Fig. 2 is a flow diagram of the video processing method provided by an embodiment of the present application.
Fig. 3 is another flow diagram of the video processing method provided by an embodiment of the present application.
Fig. 4 is a schematic diagram of an application scenario of the video processing method provided by an embodiment of the present application.
Fig. 5 is another architecture diagram of the video processing method provided by an embodiment of the present application.
Figs. 6a–6e are schematic diagrams of interface interactions of the video processing method provided by an embodiment of the present application.
Fig. 7 is another architecture diagram of the video processing method provided by an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a video processing device provided by an embodiment of the present application.
Fig. 9 is another schematic structural diagram of the video processing device provided by an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those skilled in the art without creative effort shall fall within the scope of protection of the present application.
Embodiments of the present application provide a video processing method, device and storage medium.
The video processing device may be integrated into a terminal that has a storage unit and a microprocessor with computing capability, such as a tablet PC (Personal Computer) or a mobile phone. Taking integration into a mobile phone as an example, and referring to Fig. 1: while playing a video file, the phone obtains dubbed audio data input by the user and at the same time captures the played video at preset frame-rate time intervals, extracting frames from the video as images. Then a target face to be processed is determined from the multiple frames of video image, and the target face is fused with the face image chosen by the user, yielding the processed target video image (i.e., a face-fusion image). The processed target video images are then encoded to obtain a video stream, and the audio data is encoded to obtain an audio stream. Finally, the video stream and the audio stream are synthesized and output to obtain the audio-video composite file.
Each aspect is described in detail below. Note that the numbering of the following embodiments does not limit their preferred order.
An embodiment of the present application provides a video processing method, comprising: obtaining dubbed audio data input by a user; obtaining multiple frames of video image from a video file; determining, from the multiple frames of video image, an initial video image containing a target face, and fusing the target face in the initial video image with a selected face image to obtain a target video image; and synthesizing at least the dubbed audio data and the target video image to obtain an audio-video composite file.
Referring to Fig. 2, which is a flow diagram of the video processing method provided by an embodiment of the present application, the method may proceed as follows:
101. Obtain dubbed audio data input by the user.
Specifically, the dubbed audio data may be obtained from the user while the video file is playing, or it may be recorded by the user in advance. For example, the dubbed audio data may be voice information recorded in real time by the user through a device such as the terminal's microphone or receiver while the video file plays. The video file may be a fully muted version (i.e., a video file with no audio data), a partially muted version (a video file in which only part of the audio data remains), or an unmuted version.
In this embodiment, the dubbed audio data may include user audio data, original audio data and background audio data. The user audio data is the user's actual voice recorded for a specific film or television role, or a voice-over recorded by the user for the content; the original audio data is the original sound of the non-specific roles; and the background audio data is the background sound of the production. For example, if the video file contains role A and role B, then when the file is played, the dubbed audio data may retain the original voice of role A while the dubbing user records the lines to be dubbed for the specific role B; in addition, the dubbed audio data may also include background sound, such as background music and background sound effects.
102. Obtain multiple frames of video image from the video file.
In this embodiment of the present application, the video file contains at least one role with a face image. By performing frame extraction on the video file, multiple frames of video image can be obtained from it.
103. Determine, from the multiple frames of video image, an initial video image containing the target face, and fuse the target face in the initial video image with the selected face image to obtain a target video image.
The target face in the initial video image may be the face of the specific role the user wants to dub, and the selected face image may be a face in a photo chosen by the user from the photo album, or a face captured directly with the camera.
Face-image fusion means replacing or covering the target face image with the selected face image, or deforming the face according to the characteristics of both the target face and the selected face image. In a specific implementation, the target face in the initial video image may first be detected to obtain its integrity information, orientation information, expression information and so on, for example whether the target face is occluded, whether it faces the lens sideways or frontally, and whether it is shouting or crying. After this information is obtained, the selected face image is processed accordingly. For example, when the target face is occluded, matching occlusion processing is applied to the selected face image; when the target face is in profile, a matching profile image is derived from the selected face image; and when the target face is crying, crying-image processing is applied to the selected face image, so that the selected face blends more naturally into the video image and a more natural target video image is obtained.
The selected face image may be a face image taken with the phone's camera, or a local face image stored on the phone. In practice, the selected face image may be the face image of the dubbing user mentioned above, while the target face contained in an initial video image among the captured frames is the face of one or more roles. In practice, the dubbing user's face image is then fused with the target face in the initial video image to obtain a face-fusion image carrying both the dubbing user's facial features and the target face's features. This face-fusion image then replaces the target face in the initial video image, yielding the processed target video image.
In some embodiments, before the multiple frames of video image are obtained from the video file, the flow may further include:
parsing the video file and extracting at least one face image from it;
receiving a sample-selection instruction from the user, and choosing a sample face image from the at least one face image based on the sample-selection instruction.
Specifically, the terminal can intelligently parse the video file to recognize the face images of all roles appearing in the video material (limited to roles with faces), and match a character identity for each role. The recognized roles are then presented on the display interface for the user to select. Finally, the user selects the one or more roles whose faces are to be replaced.
It should be understood that some of the captured frames may not contain the target face image. Therefore, to improve the efficiency of face-image fusion, the target video images containing the target face image can be filtered out of the multiple frames, so that the subsequent face-fusion operation is performed only on the filtered frames. The step "determining, from the multiple frames of video image, an initial video image containing the target face" may then include the following flow:
capturing the face in each frame of the multiple frames of video image;
judging whether a video image contains a target face matching the sample face image;
if so, taking that video image as an initial video image.
Specifically, the face image of the role selected by the user is matched against the faces captured in each video frame, so as to filter out the video images in the multiple frames that require face replacement.
In a specific implementation, a residual network, such as a deep Residual Network (ResNet), can be built to detect the face positions in a single video frame, so as to find all face positions in the frame. Face key-point detection is then performed, and the character identity is matched through the face key points. For example, when determining the target face image from the multiple frames of video image, the following steps may be included:
(11) build a 29-layer ResNet;
(12) extract face features based on histograms of oriented gradients;
(13) train the network on 3,000,000 pictures to complete network training;
(14) compute the face key-point features of the roles detected in the multiple frames of video image;
(15) retrieve the detected face images in the database;
(16) return the matched face identity.
The more layers the ResNet has, the higher the recognition accuracy. In this embodiment the number of layers can be set according to the actual situation and is not limited to the 29 layers above.
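The retrieval in steps (14)–(16) amounts to a nearest-neighbour lookup over face-feature vectors. A minimal sketch, assuming the feature vectors have already been produced by a network such as the 29-layer ResNet described above; the toy database contents and the 0.6 distance threshold are illustrative assumptions, not values from this application.

```python
import math

def euclidean(a, b):
    # Distance between two face-feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_identity(query, database, threshold=0.6):
    """Return the identity whose stored feature vector is nearest to
    `query`, or None if nothing lies within `threshold` (unknown face)."""
    best_name, best_dist = None, float("inf")
    for name, embedding in database.items():
        d = euclidean(query, embedding)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

# Toy 3-dimensional "features" standing in for real network outputs.
db = {"role_A": [0.1, 0.9, 0.2], "role_B": [0.8, 0.1, 0.7]}
print(match_identity([0.12, 0.88, 0.21], db))  # role_A
print(match_identity([9.0, 9.0, 9.0], db))     # None
```

A real system would index many embeddings per identity; the threshold step is what lets the lookup reject faces that match no registered role.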
104. Synthesize at least the dubbed audio data and the target video image to obtain the audio-video composite file.
Specifically, synthesis means superposing and encoding the dubbed audio data and at least the target video image, thereby obtaining the synthesized audio-video composite file.
In some embodiments, the playing duration of the resulting audio-video composite file may equal the duration of the initial video file; that is, the audio-video composite file includes both the target video images containing the face and the other images that do not, and the dubbed audio data is superposed and encoded with the target video images, so that the resulting audio-video composite file has a richer plot.
In other embodiments, the playing duration of the resulting audio-video composite file may be shorter than that of the initial video file; that is, the audio-video composite file may include only the target video images containing the face, so that superposing and encoding the dubbed audio data with the target video images produces a dubbing-highlight editing effect focused on the specific role.
The video processing method provided in this embodiment obtains dubbed audio data input by a user; obtains multiple frames of video image from a video file; determines, from the multiple frames of video image, an initial video image containing a target face and fuses the target face in the initial video image with a selected face image to obtain a target video image; and synthesizes at least the dubbed audio data and the target video image to obtain an audio-video composite file. This scheme organically blends the user's dubbing and elements such as the user's portrait into the video production, increasing the user's depth of involvement in the production and the degree of personalization of the video.
On the basis of the above embodiments, some steps are described further below.
Referring to Fig. 3: in practice, the captured dubbed audio and the processed video images need to be re-encoded in order to synthesize the output audio-video file. Meanwhile, in conjunction with Fig. 1, in some embodiments the step "synthesizing the dubbed audio data and at least the target video image" may include the following flow:
1041. Update the multiple frames of video image based on the target video image;
1042. Encode the updated multiple frames of video image to obtain a video stream;
1043. Encode the dubbed audio data to obtain an audio stream;
1044. Synthesize the video stream with the audio stream and output the result.
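Steps 1041–1044 above can be sketched as a small pipeline: fused frames replace their originals by index, both streams are "encoded" (here merely wrapped into tagged packets), and a muxer interleaves them for simultaneous output. A purely illustrative sketch; the packet structure is invented for illustration and does not reflect any real codec or container API.

```python
def update_frames(frames, fused):
    """1041: replace the original frames at the given indices
    with their face-fused versions."""
    out = list(frames)
    for idx, image in fused.items():
        out[idx] = image
    return out

def encode(stream, kind):
    """1042/1043: stand-in encoder that wraps each element in a
    (kind, timestamp-index, payload) packet."""
    return [(kind, i, data) for i, data in enumerate(stream)]

def mux(video_stream, audio_stream):
    """1044: interleave video and audio packets by timestamp index
    so that the two streams play out together."""
    return sorted(video_stream + audio_stream, key=lambda p: p[1])

frames = ["f0", "f1", "f2"]
video = encode(update_frames(frames, {1: "f1_fused"}), "video")
audio = encode(["a0", "a1", "a2"], "audio")
composite = mux(video, audio)
print(composite[0])  # ('video', 0, 'f0')
```

Because Python's sort is stable, video packets precede audio packets with the same index, mimicking a muxer emitting a frame and its accompanying audio together.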
In this embodiment, the dubbed audio data, the face-replaced target video images and the video images in the multiple frames that were not filtered out are synthesized together into the output audio-video file, producing a fully matched audio-video result.
The processed video images can be encoded in any of many ways, as long as the format is supported by the product system. For example, the processed video images can be encoded into a video stream based on video formats such as .mpg, .mpeg, .mp4, .rmvb, .wmv, .asf, .avi or .asx, thereby packaging the processed frames into a video file. In practice, the playing duration of the video stream can be controlled through different encoding modes; preferably, the playing duration can be controlled to within 15 seconds.
Likewise, the dubbed audio data can be encoded in any of many ways, as long as the format is supported by the product system. For example, the dubbed audio data input by the user can be encoded into an audio stream based on audio formats such as .act, .mp3, .wma or .wav, so that the audio stream is packaged into an audio file matched to the video file.
In some embodiments, the time points of each frame or sampling point of the video stream and the audio stream can be computed separately, and the encoded video stream and audio stream played and output simultaneously through the audio-video synthesis system, so as to obtain the audio-video composite file. That is, in some embodiments, the step "obtaining multiple frames of video image from the video file" may include the following flow:
Capture video images from the video file at preset frame-rate time intervals to obtain the multiple frames of video image.
When extracting frames from the video into images, the preset frame-rate time interval can be set by the manufacturer or by those skilled in the art. For example, the frame rate may be 20 frames/second or 50 frames/second, with corresponding frame-interval times of 50 milliseconds and 20 milliseconds respectively.
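The relationship between the preset frame rate and the capture interval described above can be sketched as follows; a simple illustration of the arithmetic, not this application's implementation.

```python
def capture_timestamps_ms(frame_rate, clip_duration_ms):
    """Timestamps (in milliseconds) at which frames are grabbed when
    capturing at `frame_rate` frames/second for `clip_duration_ms`."""
    interval_ms = 1000 / frame_rate  # 20 fps -> 50 ms, 50 fps -> 20 ms
    n_frames = int(clip_duration_ms // interval_ms)
    return [round(i * interval_ms) for i in range(n_frames)]

print(capture_timestamps_ms(20, 200))  # [0, 50, 100, 150]
```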
Then, step " encoding to audio data is dubbed, obtain audio code stream ", may include following below scheme:
Obtain the video image frame number captured in target time section, wherein target time section is defeated to dub audio data
The initial time entered to finish time time;
Determine total playing duration of video code flow;
According to the frame number and total playing duration, the target playing duration for dubbing audio data is calculated;
Sample frequency is determined based on target playing duration and the corresponding duration of target time section, and is based on the sample frequency pair
This dubs audio data coding, obtains audio code stream.
Note that in this embodiment the dubbing user may be allowed to input multiple segments of audio data while the video file is playing; the time from the start to the end is then the time of one such segment input by the dubbing user.
Specifically, based on the number of video image frames captured in the target time period and the total playing duration of the encoded video stream, the duration needed to play those frames after encoding can be computed. If the audio stream and the video stream are to be played simultaneously, the playing duration of those frames after encoding must equal the target playing duration of the dubbed audio data; the computed duration is therefore taken as the target playing duration of the dubbed audio data.
Once the target playing duration of the dubbed audio data and the duration of the target time period are known, the sampling frequency is determined by computing the ratio of the two, and the dubbed audio data is sample-encoded based on that sampling frequency. This compresses the dubbed audio data, achieves simultaneous playback of audio and video, and avoids lip-sync mismatch between the audio and the role's mouth movements.
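The two calculations above (the playing duration of the captured frames, then the sampling frequency from the duration ratio) can be sketched as follows. The concrete frame counts and the 44.1 kHz base rate are illustrative assumptions; the frames are assumed to be spread evenly across the encoded stream.

```python
def target_playing_duration(frames_in_period, total_frames, total_duration_s):
    """Duration consumed, after encoding, by the frames captured in the
    target time period (assuming an even spread over the video stream)."""
    return total_duration_s * frames_in_period / total_frames

def resampling_frequency(base_rate_hz, recorded_duration_s, target_duration_s):
    """Playing `recorded_duration_s * base_rate_hz` samples within
    `target_duration_s` seconds requires this sampling frequency."""
    return base_rate_hz * recorded_duration_s / target_duration_s

# A 10 s recording must fit the 8 s its frames occupy: speed the audio up.
target = target_playing_duration(frames_in_period=200, total_frames=500,
                                 total_duration_s=20.0)   # -> 8.0 s
rate = resampling_frequency(44100, recorded_duration_s=10.0,
                            target_duration_s=target)
print(target, rate)  # 8.0 55125.0
```

Raising the sampling frequency by the ratio 10/8 compresses the audio to the target duration, which is the mechanism the text describes for keeping mouth movements and dubbed lines aligned.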
In some embodiments, the step "synthesizing the video stream with the audio stream and outputting the result" may include the following flow:
determining, in the video stream, the playing start time point and end time point corresponding to the video images captured in the target time period;
configuring the playing start time point and end time point of the audio stream to the same playing start time point and end time point, and synthesizing the video stream with the audio stream for output.
Specifically, in the output data, the playing start and end time points of the video stream and the audio stream are synchronized, so that the audio and video are played simultaneously.
For example, in a piece of video material, if the video images captured in the target time period correspond to a playing start time point of 00:00:05 and an end time point of 00:00:10, then the playing start and end time points of the audio stream are also set to 00:00:05 and 00:00:10 respectively.
In some embodiments, step " the target face in initial video image is merged with the facial image of selection ",
May include following below scheme:
Facial key point to target face in initial video image and the facial key point in the facial image of selection
It is detected and is positioned;
The facial image of selection is aligned with target face by affine transformation;
It is updated based on facial characteristics of the facial image after alignment to target face.
Specifically, can use the machine learning algorithm of cascade residual error regression tree, as gradient promotes decision tree
(Gradient Boosting Decision Tree, abbreviation GBDT) algorithm, detects facial key point.It is with GBDT
Example, steps are as follows for specific algorithm model buildings:
(21) using the true shape of N figures of training, building returns original shape;
(22) using pixel difference as feature, tree construction is divided, every picture is made to fall into a leaf node;
(23) difference for calculating all picture shapes and current tree shape in each leaf node, is stored in after being averaged
Leaf node;
(24) shape of tree is updated using the value in leaf;
(25) enough subtrees are established, until GBDT tree shape indicates true shape.
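Steps (21)–(25) describe cascaded residual regression: each weak tree stores, in its leaves, the mean difference between the true shapes and the current estimate, and the estimate is updated leaf by leaf. The idea can be sketched in one dimension with trivial depth-1 "trees" that split on a single pixel-difference feature; the data, split rule and learning rate are invented for illustration only.

```python
def build_cascade(features, true_shapes, init_shape, n_trees=20, lr=0.5):
    """Fit a cascade of depth-1 'trees'. Each tree splits samples by the
    sign of one pixel-difference feature (step 22), stores the mean
    residual per leaf (step 23), and updates the estimate (step 24)."""
    est = [init_shape] * len(true_shapes)      # step 21: start from init shape
    cascade = []
    for _ in range(n_trees):                   # step 25: add enough subtrees
        leaves = {}
        for leaf in (False, True):
            idx = [i for i, f in enumerate(features) if (f > 0) == leaf]
            resid = [true_shapes[i] - est[i] for i in idx]
            leaves[leaf] = lr * sum(resid) / len(resid) if resid else 0.0
        for i, f in enumerate(features):
            est[i] += leaves[f > 0]
        cascade.append(leaves)
    return cascade, est

feats = [-1.0, -0.5, 0.5, 1.0]          # pixel-difference features
truth = [10.0, 10.0, 30.0, 30.0]        # "true shapes" (1-D stand-ins)
_, fitted = build_cascade(feats, truth, init_shape=20.0)
print([round(v, 3) for v in fitted])    # [10.0, 10.0, 30.0, 30.0]
```

Each added tree halves the remaining residual here, mirroring how the real cascade drives the regressed shape toward the true landmark positions.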
After the algorithm model is built, it can be used to detect the facial key points of the target face in the initial video image and of the selected face image. Then, based on the positions of the key points detected in the target face and in the selected face image, a Procrustes analysis is performed, and the affine transformation matrix from the preset face image to the target face image is computed by least squares. The selected face is then translated, rotated, scaled and otherwise geometrically transformed based on the obtained affine transformation matrix, aligning the face positions of the target face in the initial video image and the preset face image so that the facial feature points of the two come close together. For example, referring to Fig. 4, the transformed image d can be obtained from the preset face image a after the affine transformation.
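The Procrustes/least-squares alignment just described can be sketched by encoding 2-D key points as complex numbers: a similarity transform z ↦ s·z + t (s encodes rotation and scale, t translation) has a closed-form least-squares solution. A minimal sketch of that closed form, not this application's implementation; the key-point coordinates are toy values.

```python
def fit_similarity(src_pts, dst_pts):
    """Least-squares fit of dst ≈ s*src + t over complex-encoded 2-D
    points: s encodes rotation+scale, t the translation."""
    src = [complex(x, y) for x, y in src_pts]
    dst = [complex(x, y) for x, y in dst_pts]
    n = len(src)
    mean_s, mean_d = sum(src) / n, sum(dst) / n
    num = sum((d - mean_d) * (s - mean_s).conjugate()
              for s, d in zip(src, dst))
    den = sum(abs(s - mean_s) ** 2 for s in src)
    s = num / den
    t = mean_d - s * mean_s
    return s, t

def apply(s, t, pts):
    # Map each point through z -> s*z + t.
    return [((s * complex(x, y) + t).real, (s * complex(x, y) + t).imag)
            for x, y in pts]

# Selected-face key points, and the same points rotated 90 degrees,
# scaled by 2 and shifted -- standing in for target-face key points.
src = [(0, 0), (1, 0), (0, 1)]
dst = [(5, 5), (5, 7), (3, 5)]          # i.e. z -> 2j*z + (5+5j)
s, t = fit_similarity(src, dst)
print([(round(x, 6), round(y, 6)) for x, y in apply(s, t, src)])
# [(5.0, 5.0), (5.0, 7.0), (3.0, 5.0)]
```

Because the toy correspondence is an exact similarity, the fit recovers it exactly; with noisy detected landmarks the same formula gives the best alignment in the squared-error sense.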
In some embodiments, step " is carried out more based on facial characteristics of the facial image after alignment to target face
Newly ", may include following below scheme:
It is partitioned into pedestrian's face region division based on the facial key point in target face, obtains the facial characteristics of target face
Region;
Facial characteristic area is handled according to preset algorithm, obtains the facial characteristics template in facial characteristics region;
The facial image and target face fusion after alignment are obtained into face fusion image using facial characteristics template.
Specifically, can use the geometrical characteristic of face, the face characteristic with size, rotation and shift invariant is extracted
Point, for example the key feature points position of such as eyes, nose and lip position can be extracted.For example, choosing 9 of face
Characteristic point, the distributions of these characteristic points have an angle invariability, respectively 2 eyeball central points, 4 canthus points, two nostrils
Midpoint and 2 corners of the mouth points.
For example, in the present embodiment, facial triagnle profile template (i.e. eye mouth nose mould can be obtained by human face characteristic point
Plate and submits profile template using this and ticks input figure details with reference to facial characteristics template c) is used as in Fig. 4, and then superposition is pre-
If two kinds of input figures of facial image and target facial image complete image co-registration.
With reference to Fig. 4, wherein a is default facial image, and b is target facial image, and c is based in target facial image b
The face exposure mask of face characteristic Area generation, d are the image that target image a is obtained after affine transformation, and final output face melts
Blending image e after conjunction.
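The mask-driven compositing of Fig. 4 (aligned image d merged into target image b through mask c) can be sketched as a per-pixel blend on tiny grayscale "images" represented as nested lists; a toy illustration of the compositing step only, not of the full fusion pipeline.

```python
def fuse(aligned, target, mask):
    """Composite `aligned` over `target` wherever the face mask is set;
    fractional mask values feather the fusion boundary."""
    h, w = len(target), len(target[0])
    return [[mask[y][x] * aligned[y][x] + (1 - mask[y][x]) * target[y][x]
             for x in range(w)] for y in range(h)]

aligned = [[200, 200], [200, 200]]   # transformed preset face (image d)
target  = [[50, 50], [50, 50]]       # frame with target face (image b)
mask    = [[1.0, 0.5], [0.0, 0.0]]   # feathered face mask (image c)
print(fuse(aligned, target, mask))   # [[200.0, 125.0], [50.0, 50.0]]
```

The 0.5 mask entry shows why a soft mask matters: the boundary pixel becomes a mixture of both faces instead of a hard seam.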
However, when extracting facial features, some edge information cannot be organized effectively, so traditional edge-detection operators cannot reliably extract facial features such as the eye or lip regions. An algorithm such as the SUSAN operator can therefore be used to extract the facial features. The principle of the SUSAN operator is: taking a circular region of a given radius around a pixel as a mask, examine, for every point of the face image within that region, the consistency of its pixel value with the pixel value of the current point.
In some embodiments, after the aligned face image and the target face are fused using the face template to obtain the face-fusion image, the flow may further include:
computing the pixel-value difference of the facial features between the target face and the selected face image;
generating a color-adjustment parameter from the pixel-value difference;
adjusting the face-fusion image based on the color-adjustment parameter.
The color-adjustment parameter may specifically be the difference between pixel RGB values.
Specifically, because the skin tones of the selected face image and of the target face in the initial video image may differ considerably, after the face images are fused, the jagged-edge effect at the fusion boundary between the replaced region and the original face region can be quite visible. It is therefore necessary to reduce the edge aliasing by adjusting the pixel-value difference between the fused region and the original region, so as to enhance the degree of facial fusion.
For example, in some embodiments the pixel-value difference can be reduced by a blur effect, implemented as follows:
(31) compute the pixel-value difference of the facial features between the target face and the selected face image;
(32) compute a blur effect from the pixel-value difference;
(33) reduce the pixel-value difference between the target face and the selected face image by Gaussian blur.
With these operations, the skin color of the fused region is corrected to be closer to the facial skin color of the target face.
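Steps (31)–(33) can be sketched as: measure the mean pixel difference between the two face regions, shift the fused region by it, then smooth the seam with a small blur. A 1-D toy illustration with an assumed 3-tap kernel; a real implementation would apply a 2-D Gaussian kernel per RGB channel.

```python
def color_shift(fused_region, target_region):
    """(31) + adjustment: shift the fused pixels by the mean difference so
    the replacement skin tone approaches the target skin tone."""
    diff = sum(t - f for f, t in zip(fused_region, target_region)) \
        / len(fused_region)
    return [f + diff for f in fused_region]

def blur_1d(pixels, kernel=(0.25, 0.5, 0.25)):
    """(32)/(33): tiny blur standing in for the Gaussian blur that softens
    the remaining difference at the fusion boundary (edges clamped)."""
    n = len(pixels)
    out = []
    for i in range(n):
        left = pixels[max(i - 1, 0)]
        right = pixels[min(i + 1, n - 1)]
        out.append(kernel[0] * left + kernel[1] * pixels[i]
                   + kernel[2] * right)
    return out

fused = [100, 100, 100, 100]    # replacement-face skin values
target = [140, 140, 140, 140]   # surrounding target-face skin values
shifted = color_shift(fused, target)
print(blur_1d(shifted))  # [140.0, 140.0, 140.0, 140.0]
```

On this toy data the mean shift alone closes the gap and the blur leaves a constant region unchanged; on real images the blur is what hides the residual jagged boundary.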
Referring to Fig. 5, Figs. 6a–6e and Fig. 7: Fig. 5 is another architecture diagram of the video processing method provided by an embodiment of the present application; Figs. 6a–6e are schematic diagrams of interface interactions of the method; and Fig. 7 is a further architecture diagram of the method provided by an embodiment of the present application.
Firstly, user can log in the account dubbed and registered in application by Account Logon interface, master is dubbed to enter
Interface.As shown in Figure 6 a, when user, which opens, dubs main interface, popular material and other elements can be shown in current interface
Material, user can be by clicking the display control triggering selection current video material progress video preview of material or being directly entered
Dub the stage.In addition, the main interface can also include search column, it, can be from video element by inputting crucial words in search column
Matched video material is found in material library, promotes the retrieval rate of video material.
Referring to Fig. 6b, when a video material is chosen for dubbing, face recognition can be performed on the video material to identify the roles in the video, and the video characters with faces are parsed out of the video material. In Fig. 6b, three video characters are parsed from the chosen material and their character images are displayed. In the embodiment of the present application, a visible or invisible selection control can be set on each character image, and the video character whose face is to be replaced can be chosen through this selection control. For example, in Fig. 6b the selection icon in the upper right corner of the character image chooses the first video character.
In addition, an image addition interface can also be provided in the current interface, through which a replacement facial image can be added. In practical applications, replacement face material can be added from the local gallery through the image addition interface. In a specific implementation, the replacement face material is required to be a frontal real human face (no raised, lowered, or turned head), with the face and facial features unobstructed. If the added image does not meet the requirements, the next step is not performed, and a prompt is produced asking the user to add an image again.
Once image selection is completed, the locally added replacement facial image can be fused, by a cloud algorithm running in the background, into the video character the user selected in the video material, obtaining a face fusion image.
In some embodiments, to help the dubbing user dub accurately, the video file can be played while the user dubs, and the text corresponding to each character's lines can be displayed in the video playing interface to prompt the dubbing user with the lines and prevent forgetting them. That is, the video processing method may further include the following flow:
obtaining sample text;
displaying the sample text while obtaining the dubbing audio data input by the user.
The sample text may be text edited in advance, and can be displayed in any font, size, color, or other text format. For example, referring to the "subtitle" region in Fig. 6c, the sample text can be arranged at the dotted line. In addition, information such as playback progress and playing duration can be shown by a progress bar while the video material plays.
Further, the progress of the lines can also be indicated to remind the user to be ready to dub. For example, a color change can mark the subtitle currently being played.
In some embodiments, a text editing interface can also be set in the dubbing interface, through which the user can edit and adjust the existing sample text, to meet some users' demand for customized text.
With continued reference to Fig. 5, in some embodiments, in order to set the atmosphere for the dubbing user, background music of a matching style can be added to the video file according to the video content, and the background music is played while the video file plays, so that the dubbing user can get into the video's plot as early as possible. That is, the video processing method may further include the following flow:
obtaining sample background audio data;
playing the sample background audio data while obtaining the dubbing audio data input by the user.
The background audio data may be pure music played by physical instruments (such as piano or violin) or electronic instruments, or mixed music with voice and instruments. With continued reference to Fig. 6c, a music selection interface (such as the musical-note icon control in Fig. 6c) may be provided in the dubbing interface, through which music of various styles can be selected from the background music library. The background music library may be audio data stored in the cloud, or audio data local to the terminal.
In practical applications, a recording control interface (such as the microphone icon control in Fig. 6c) can be set in the dubbing interface; through the recording control interface, the terminal microphone can be driven to receive the dubbing voice uttered by the user, and functions such as start recording, pause recording, and continue recording can be realized.
In the embodiment of the present application, operations such as face replacement, text display, or background music addition can be optionally enabled as needed.
Referring to Fig. 6d, after dubbing is completed, the user can adjust the recorded audio-video composite file through the preview interface, subtitle display interface, voice setting interface, background music setting interface, etc., provided in the current interface. After adjustment, the new face-fused video, music, and dubbing can be previewed, and the audio-video composite file can be saved through the provided save interface.
Finally, referring to Fig. 6e, the dubbed video work can be viewed through the user's personal homepage. In practical applications, this interface may also be provided with a sharing interface, through which authorization can be granted to related social applications or platforms, so as to share the recorded audio-video file to other social platforms.
Through this scheme, the user's dubbing can be organically blended with elements such as the user's portrait into the video production, enhancing the user's depth of involvement and the degree of personalization of the video.
To facilitate better implementation of the video processing method provided by the embodiments of the present application, the embodiments of the present application also provide a device based on the above video processing method (the processing device for short). The terms have the same meanings as in the above video processing method, and specific implementation details can refer to the explanation in the method embodiments.
Referring to Fig. 8, Fig. 8 is a structural schematic diagram of the video processing device provided by the embodiments of the present application. The processing device may include an audio acquiring unit 301, an image acquiring unit 302, a processing unit 303, and a synthesis unit 304, specifically as follows:
the audio acquiring unit 301 is configured to obtain the dubbing audio data input by the user;
the image acquiring unit 302 is configured to obtain multi-frame video images from the video file;
the processing unit 303 is configured to determine, from the multi-frame video images, the initial video image containing the target face, and to fuse the target face in the initial video image with the selected facial image, obtaining the target video image;
the synthesis unit 304 is configured to perform synthesis processing on the dubbing audio data and at least the target video image, obtaining the audio-video composite file.
In some embodiments, referring to Fig. 9, the video processing device 300 may further include:
an extraction unit 305, configured to parse the video file before the multi-frame video images are obtained from the video file, and extract at least one facial image therefrom;
a selecting unit 306, configured to receive a sample selection instruction from the user, and choose a sample facial image from the at least one facial image based on the sample selection instruction.
The processing unit 303 may specifically be configured to:
capture the facial image in each frame of the multi-frame video images;
judge whether the video image contains a target face matching the sample facial image;
if so, take the video image as the initial video image.
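The per-frame matching step can be sketched as below, assuming a face detector/encoder (not specified in the application) has already produced one embedding per frame; the cosine-similarity threshold is an illustrative stand-in for whatever matching criterion a real implementation uses.

```python
import numpy as np

def find_initial_frames(frame_embeddings, sample_embedding, threshold=0.8):
    """Return the indices of frames whose detected-face embedding matches
    the sample facial image, i.e. the 'initial video images'.

    frame_embeddings: list of (frame_index, embedding_vector) pairs,
    produced upstream by a face detection + encoding stage (assumed)."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return [i for i, emb in frame_embeddings
            if cos(emb, sample_embedding) >= threshold]
```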
In some embodiments, the processing unit 303 may specifically be configured to:
detect and locate the facial key points of the target face in the initial video image and the facial key points in the selected facial image;
align the selected facial image with the target face by affine transformation;
update the facial features of the target face based on the aligned facial image.
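The affine alignment step can be sketched with a least-squares fit over corresponding key points. This is the generic technique the description names, written out under the assumption that both key-point sets are already detected and in correspondence; it is not the application's exact procedure.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping the selected face's
    key points (src) onto the target face's key points (dst)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Design matrix [x, y, 1]; solve A @ coef ~= dst in the least-squares sense
    A = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return coef.T  # shape (2, 3): rows [a, b, tx] and [c, d, ty]

def apply_affine(M, pts):
    """Apply a 2x3 affine transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    A = np.hstack([pts, np.ones((len(pts), 1))])
    return A @ M.T
```

With the transform estimated, the selected facial image is warped into the target face's coordinate frame before its features are copied over.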
In some embodiments, the synthesis unit 304 may include:
an updating subunit, configured to update the multi-frame video images based on the target video image;
a video encoding subunit, configured to encode the updated multi-frame video images, obtaining a video code stream;
an audio encoding subunit, configured to encode the dubbing audio data, obtaining an audio code stream;
a synthesizing subunit, configured to synthesize the video code stream with the audio code stream for output.
In some embodiments, the image acquiring unit 302 may specifically be configured to:
capture video images from the video file at time intervals given by a predetermined frame rate, obtaining the multi-frame video images.
The audio encoding subunit may be configured to:
obtain the number of video image frames captured in the target time period, wherein the target time period is the time from the start moment to the end moment of the input of the dubbing audio data;
determine the total playing duration of the video code stream;
calculate the target playing duration of the dubbing audio data according to the frame count and the total playing duration;
determine the sampling frequency based on the target playing duration and the duration corresponding to the target time period, and encode the dubbing audio data based on the sampling frequency.
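One plausible reading of this calculation, sketched under two stated assumptions: the target playing duration of the segment is the captured frame count divided by the frame rate, and the sampling frequency is scaled so that the recorded dubbing samples play back in exactly that duration. The exact formula is an interpretation consistent with the description, not taken from it.

```python
def dubbing_sample_rate(frames_in_period, frame_rate, recorded_seconds,
                        base_rate=44100):
    """Fit recorded dubbing audio to the video segment it covers.

    target_seconds = frames / fps is the duration the segment will occupy
    in the video code stream; encoding the audio at
    base_rate * recorded / target makes its samples span the target
    duration on playback (a simple time-stretch by resampling metadata)."""
    target_seconds = frames_in_period / frame_rate
    return int(round(base_rate * recorded_seconds / target_seconds))
```

For example, 250 frames at 25 fps give a 10-second target; a 20-second recording would then be encoded at double the base rate so it plays in 10 seconds.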
In the video processing device provided by the embodiments of the present application, the audio acquiring unit 301 obtains the dubbing audio data input by the user; the image acquiring unit 302 obtains multi-frame video images from the video file; the processing unit 303 determines, from the multi-frame video images, the initial video image containing the target face, and fuses the target face in the initial video image with the selected facial image, obtaining the target video image; the synthesis unit 304 performs synthesis processing on the dubbing audio data and at least the target video image, obtaining the audio-video composite file. This scheme organically blends the user's dubbing and elements such as the user's portrait into the video production, enhancing the user's depth of involvement and the degree of personalization of the video.
The embodiments of the present application also provide a terminal. As shown in Figure 10, the terminal may include a radio frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a Wireless Fidelity (WiFi) module 607, a processor 608 including one or more processing cores, a power supply 609, and other components. Those skilled in the art will appreciate that the terminal structure shown in Figure 10 does not constitute a limitation on the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different component layout. Wherein:
the RF circuit 601 can be used for receiving and sending signals during messaging or a call; in particular, after downlink information from a base station is received, it is handed to one or more processors 608 for processing; in addition, data relating to the uplink is sent to the base station. In general, the RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 601 can also communicate with networks and other devices by wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to the Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, the Short Messaging Service (SMS), and the like.
The memory 602 can be used to store software programs and modules; the processor 608 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 602. The memory 602 may mainly include a program storage area and a data storage area: the program storage area can store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and so on; the data storage area can store data created according to the use of the terminal (such as audio data, a phone book, etc.). In addition, the memory 602 may include high-speed random access memory, and may also include nonvolatile memory, for example at least one disk memory, flash memory device, or other volatile solid-state component. Correspondingly, the memory 602 may also include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 can be used to receive input numeric or character information, and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, in one embodiment, the input unit 603 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch display screen or touchpad, collects touch operations by the user on or near it (such as operations by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, and sends them to the processor 608, and can also receive and execute commands sent by the processor 608. Furthermore, the touch-sensitive surface can be realized in various types such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch-sensitive surface, the input unit 603 can also include other input devices, which can include, but are not limited to, one or more of a physical keyboard, function keys (such as a volume control button or a switch key), a trackball, a mouse, a joystick, and the like.
The display unit 604 can be used to display information input by the user, information provided to the user, and the various graphical user interfaces of the terminal; these graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel; optionally, the display panel can be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface can cover the display panel; after the touch-sensitive surface detects a touch operation on or near it, it transmits the operation to the processor 608 to determine the type of touch event, and the processor 608 then provides corresponding visual output on the display panel according to the type of touch event. Although in Figure 10 the touch-sensitive surface and the display panel realize the input and output functions as two independent components, in some embodiments the touch-sensitive surface and the display panel can be integrated to realize the input and output functions.
The terminal may also include at least one sensor 605, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor can turn off the display panel and/or backlight when the terminal is moved to the ear. As a kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when static, and can be used for applications that identify the phone's posture (such as landscape/portrait switching, related games, or magnetometer pose calibration), vibration-identification-related functions (such as a pedometer or tapping), and so on. The terminal can also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here.
The audio circuit 606, a loudspeaker, and a microphone can provide an audio interface between the user and the terminal. The audio circuit 606 can transmit the electric signal converted from received audio data to the loudspeaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electric signal, which is received by the audio circuit 606 and converted into audio data; after the audio data is processed by the processor 608, it is sent through the RF circuit 601 to, for example, another terminal, or the audio data is output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology; through the WiFi module 607, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although Figure 10 shows the WiFi module 607, it can be understood that it is not a necessary part of the terminal and can be omitted as needed within the scope that does not change the essence of the invention.
The processor 608 is the control center of the terminal; it connects the various parts of the whole phone using various interfaces and lines, and executes the various functions of the terminal and processes data by running or executing the software programs and/or modules stored in the memory 602 and calling the data stored in the memory 602, thereby monitoring the phone as a whole. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 can integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 608.
The terminal also includes a power supply 609 (such as a battery) supplying power to the various components; preferably, the power supply can be logically connected with the processor 608 through a power management system, so as to realize functions such as managing charging, discharging, and power consumption through the power management system. The power supply 609 may also include one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power adapter or inverter, a power status indicator, and other arbitrary components.
Although not shown, the terminal can also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the processor 608 in the terminal loads the executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 608 runs the application programs stored in the memory 602, thereby realizing various functions:
obtaining the dubbing audio data input by the user; obtaining multi-frame video images from the video file; determining, from the multi-frame video images, the initial video image containing the target face, and fusing the target face in the initial video image with the selected facial image, obtaining the target video image; performing synthesis processing on the dubbing audio data and at least the target video image, obtaining the audio-video composite file.
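The overall flow these instructions describe can be sketched end to end as below. This is a minimal, self-contained illustration; the data types and the "fusion" and "composite file" stand-ins are assumptions, since real face fusion and audio/video encoding are out of scope here.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int
    has_target_face: bool   # result of the per-frame matching step
    fused: bool = False     # set once the face replacement is applied

def process_video(frames, dubbing_audio):
    """Sketch of the claimed flow: determine the initial video images
    (frames containing the target face), fuse them with the selected
    facial image, then pair the updated frame sequence with the dubbing
    audio as a stand-in for the audio-video composite file."""
    for f in frames:
        if f.has_target_face:   # initial video image found
            f.fused = True      # fuse with the selected facial image
    return {"video": frames, "audio": dubbing_audio}
```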
In the embodiments of the present application, during playback of the video file, the dubbing audio data input by the user is obtained; multi-frame video images are obtained from the video file; the initial video image containing the target face is determined from the multi-frame video images, and the target face in the initial video image is fused with the selected facial image, obtaining the target video image; synthesis processing is performed on the dubbing audio data and at least the target video image, obtaining the audio-video composite file. This scheme organically blends the user's dubbing and elements such as the user's portrait into the video production, enhancing the user's depth of involvement and the degree of personalization of the video.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by instructions controlling the relevant hardware; the instructions can be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, the embodiments of the present application provide a storage medium storing a plurality of instructions that can be loaded by a processor to execute the steps in any video processing method provided by the embodiments of the present application. For example, the instructions can execute the following steps:
obtaining the dubbing audio data input by the user; obtaining multi-frame video images from the video file; determining, from the multi-frame video images, the initial video image containing the target face, and fusing the target face in the initial video image with the selected facial image, obtaining the target video image; performing synthesis processing on the dubbing audio data and at least the target video image, obtaining the audio-video composite file.
For the specific implementation of each of the above operations, refer to the preceding embodiments, which are not repeated here.
The storage medium may include: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
Since the instructions stored in the storage medium can execute the steps in any video processing method provided by the embodiments of the present application, they can realize the beneficial effects achievable by any such video processing method; for details, see the preceding embodiments, which are not repeated here.
The video processing method, device, and storage medium provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementation of the present application; the explanation of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, those skilled in the art may make changes to the specific implementation and scope of application according to the ideas of the present application. In conclusion, the content of this specification should not be construed as a limitation on the present application.
Claims (15)
1. A video processing method, comprising:
obtaining dubbing audio data input by a user;
obtaining multi-frame video images from a video file;
determining, from the multi-frame video images, an initial video image containing a target face, and fusing the target face in the initial video image with a selected facial image, obtaining a target video image;
performing synthesis processing on the dubbing audio data and at least the target video image, obtaining an audio-video composite file.
2. The video processing method according to claim 1, further comprising, before obtaining the multi-frame video images from the video file:
parsing the video file, and extracting at least one facial image therefrom;
receiving a sample selection instruction from the user, and choosing a sample facial image from the at least one facial image based on the sample selection instruction;
wherein determining, from the multi-frame video images, the initial video image containing the target face comprises:
capturing the face in each frame of the multi-frame video images;
judging whether the video image contains a target face matching the sample facial image;
if so, taking the video image as the initial video image.
3. The video processing method according to claim 1, wherein fusing the target face in the initial video image with the selected facial image comprises:
detecting and locating facial key points of the target face in the initial video image and facial key points in the selected facial image;
aligning the selected facial image with the target face by affine transformation;
updating facial features of the target face based on the aligned facial image.
4. The video processing method according to claim 3, wherein updating the facial features of the target face based on the aligned facial image comprises:
performing face region division based on the facial key points of the target face, obtaining a facial feature region of the target face;
processing the facial feature region according to a preset algorithm, obtaining a facial feature template of the facial feature region;
fusing the aligned facial image with the target face using the facial feature template, obtaining a face fusion image.
5. The video processing method according to claim 4, further comprising, after fusing the aligned facial image with the target face using the facial feature template and obtaining the face fusion image:
calculating a pixel value difference of facial features between the target face and the selected facial image;
generating a color adjustment parameter according to the pixel value difference;
adjusting the face fusion image based on the color adjustment parameter.
6. The video processing method according to claim 1, wherein performing synthesis processing on the dubbing audio data and at least the target video image comprises:
updating the multi-frame video images based on the target video image;
encoding the updated multi-frame video images, obtaining a video code stream;
encoding the dubbing audio data, obtaining an audio code stream;
synthesizing the video code stream with the audio code stream for output.
7. The video processing method according to claim 6, wherein obtaining the multi-frame video images from the video file comprises:
capturing video images from the video file at time intervals given by a predetermined frame rate, obtaining the multi-frame video images;
and wherein encoding the dubbing audio data to obtain the audio code stream comprises:
obtaining the number of video image frames captured in a target time period, wherein the target time period is the time from the start moment to the end moment of the input of the dubbing audio data;
determining a total playing duration of the video code stream;
calculating a target playing duration of the dubbing audio data according to the frame count and the total playing duration;
determining a sampling frequency based on the target playing duration and the duration corresponding to the target time period, and encoding the dubbing audio data based on the sampling frequency, obtaining the audio code stream.
8. The video processing method according to claim 7, wherein synthesizing the video code stream with the audio code stream for output comprises:
determining, in the video code stream, a playing start time point and an end time point corresponding to the video images captured in the target time period;
configuring the playing start time point and the end time point as the playing start time point and the end time point of the audio code stream, and synthesizing the video code stream with the audio code stream for output.
9. The video processing method according to any one of claims 1-8, further comprising:
obtaining sample background audio data and/or sample text;
playing the video file while the user inputs the dubbing audio data, and simultaneously playing the sample background audio data and/or displaying the sample text.
10. A video processing device, comprising:
an audio acquiring unit, configured to obtain dubbing audio data input by a user;
an image acquiring unit, configured to obtain multi-frame video images from a video file;
a processing unit, configured to determine, from the multi-frame video images, an initial video image containing a target face, and to fuse the target face in the initial video image with a selected facial image, obtaining a target video image;
a synthesis unit, configured to perform synthesis processing on the dubbing audio data and at least the target video image, obtaining an audio-video composite file.
11. The video processing device according to claim 10, further comprising:
an extraction unit, configured to parse the video file before the video file is played, and extract at least one facial image therefrom;
a selecting unit, configured to receive a sample selection instruction from the user, and choose a sample facial image from the at least one facial image based on the sample selection instruction;
wherein the processing unit is configured to:
capture the face in each frame of the multi-frame video images;
judge whether the video image contains a target face matching the sample facial image;
if so, take the video image as the initial video image.
12. The video processing apparatus according to claim 10, characterized in that the processing unit is further configured to:
detect and locate the facial key points of the target face in the initial video image and the facial key points in the selected facial image;
align the selected facial image with the target face through an affine transformation; and
update the facial features of the target face based on the aligned facial image.
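Claim 12's alignment step can be sketched as a least-squares affine fit between the detected key points of the selected facial image and those of the target face. This is a generic technique, not necessarily the patent's implementation; key points are assumed to be N×2 coordinate arrays.

```python
import numpy as np

def estimate_affine(src_pts: np.ndarray, dst_pts: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine matrix mapping the key points of the
    selected facial image (src) onto the target face's key points (dst)."""
    n = src_pts.shape[0]
    # Homogeneous source coordinates: [x, y, 1]
    A = np.hstack([src_pts, np.ones((n, 1))])
    # Solve A @ X ~= dst_pts for X (3x2), then return X.T as 2x3
    X, _, _, _ = np.linalg.lstsq(A, dst_pts, rcond=None)
    return X.T

def apply_affine(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Warp N×2 points with the 2x3 affine matrix M."""
    return pts @ M[:, :2].T + M[:, 2]
```

In a full pipeline the estimated matrix would warp the whole selected-face image (e.g. with an image-warping routine) before the facial features of the target face are updated.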
13. The video processing apparatus according to claim 10, characterized in that the synthesis unit comprises:
an updating subunit, configured to update the multi-frame video images based on the target video image;
a video coding subunit, configured to encode the updated multi-frame video images to obtain a video code stream;
an audio coding subunit, configured to encode the dubbed audio data to obtain an audio code stream; and
a synthesizing subunit, configured to synthesize the video code stream with the audio code stream for output.
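The synthesizing subunit of claim 13 multiplexes the encoded video and audio streams into one file. In practice this step is often delegated to an external muxer such as FFmpeg; the sketch below builds such a command line (the file names, stream-copy and mapping choices are illustrative assumptions, not from the patent):

```python
import subprocess

def build_mux_command(video_path: str, audio_path: str, out_path: str) -> list:
    """FFmpeg command that muxes an encoded video code stream and an
    encoded (dubbed) audio code stream into one audio-video composite file."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,          # encoded video code stream
        "-i", audio_path,          # encoded dubbed audio code stream
        "-c", "copy",              # no re-encoding, just multiplexing
        "-map", "0:v:0",           # take video from the first input
        "-map", "1:a:0",           # take audio from the second input
        out_path,
    ]

def mux(video_path: str, audio_path: str, out_path: str) -> None:
    """Run the muxing command (requires FFmpeg on the PATH)."""
    subprocess.run(build_mux_command(video_path, audio_path, out_path),
                   check=True)
```

Because both inputs are already encoded by the coding subunits, stream copy (`-c copy`) avoids a second lossy encode.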
14. The video processing apparatus according to claim 13, characterized in that the image obtaining unit is configured to:
capture video images from the video file at predetermined frame-rate time intervals to obtain the multi-frame video images;
and the audio coding subunit is configured to:
obtain the number of video image frames captured within a target time period, wherein the target time period runs from the start time to the end time of the input of the dubbed audio data;
determine the total playing duration of the video code stream;
calculate the target playing duration of the dubbed audio data according to the frame number and the total playing duration; and
determine a sampling frequency based on the target playing duration and the duration of the target time period, and encode the dubbed audio data based on the sampling frequency.
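The arithmetic of claim 14 can be illustrated as follows. This is one plausible reading, not the patent's exact formula: the target playing duration is taken as the share of the video's total playing duration covered by the frames captured during dubbing, and the sampling frequency rescales the recorded audio to fit that duration. `base_rate_hz` and all numbers below are assumptions for illustration.

```python
def target_playing_duration(frames_in_period: int,
                            total_frames: int,
                            total_duration_s: float) -> float:
    """Duration the dubbed audio should occupy: the fraction of the
    video's total playing duration covered by the frames captured
    while the user was dubbing."""
    return total_duration_s * frames_in_period / total_frames

def sampling_frequency(recorded_duration_s: float,
                       target_duration_s: float,
                       base_rate_hz: int = 44100) -> int:
    """Sample rate that makes audio recorded over recorded_duration_s
    play back over target_duration_s (time-stretch by resampling)."""
    return round(base_rate_hz * recorded_duration_s / target_duration_s)
```

For example, if dubbing covered 300 of 600 frames of a 20-second video, the dub should occupy 10 seconds; audio actually recorded over 5 seconds would then be encoded at half the base rate so that it stretches to fill those 10 seconds.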
15. A storage medium, characterized in that the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps in the video processing method according to any one of claims 1 to 9.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910023976.3A CN109819313B (en) | 2019-01-10 | 2019-01-10 | Video processing method, device and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109819313A (en) | 2019-05-28 |
| CN109819313B CN109819313B (en) | 2021-01-08 |
Family
ID=66603283
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910023976.3A Active CN109819313B (en) | 2019-01-10 | 2019-01-10 | Video processing method, device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109819313B (en) |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101504774A (en) * | 2009-03-06 | 2009-08-12 | 暨南大学 | Animation design engine based on virtual reality |
| US20100272417A1 (en) * | 2009-04-27 | 2010-10-28 | Masato Nagasawa | Stereoscopic video and audio recording method, stereoscopic video and audio reproducing method, stereoscopic video and audio recording apparatus, stereoscopic video and audio reproducing apparatus, and stereoscopic video and audio recording medium |
| WO2014001095A1 (en) * | 2012-06-26 | 2014-01-03 | Thomson Licensing | Method for audiovisual content dubbing |
| CN104092957A (en) * | 2014-07-16 | 2014-10-08 | 浙江航天长峰科技发展有限公司 | Method for generating screen video integrating image with voice |
| US9153031B2 (en) * | 2011-06-22 | 2015-10-06 | Microsoft Technology Licensing, Llc | Modifying video regions using mobile device input |
| CN107330408A (en) * | 2017-06-30 | 2017-11-07 | 北京金山安全软件有限公司 | Video processing method and device, electronic equipment and storage medium |
| CN107832741A (en) * | 2017-11-28 | 2018-03-23 | 北京小米移动软件有限公司 | The method, apparatus and computer-readable recording medium of facial modeling |
| CN108259788A (en) * | 2018-01-29 | 2018-07-06 | 努比亚技术有限公司 | Video editing method, terminal and computer readable storage medium |
| CN108765528A (en) * | 2018-04-10 | 2018-11-06 | 南京江大搏达信息科技有限公司 | Game charater face 3D animation synthesizing methods based on data-driven |
| CN108965740A (en) * | 2018-07-11 | 2018-12-07 | 深圳超多维科技有限公司 | A kind of real-time video is changed face method, apparatus, equipment and storage medium |
| CN109063658A (en) * | 2018-08-08 | 2018-12-21 | 吴培希 | A method of it is changed face using deep learning in multi-mobile-terminal video personage |
2019-01-10: Application CN201910023976.3A filed (CN); patent CN109819313B granted, status Active.
Non-Patent Citations (1)
| Title |
|---|
| ZHONG Qianli: "Research and Implementation of Automatic Face Replacement in Images", China Master's Theses Full-text Database * |
Cited By (36)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110266973A (en) * | 2019-07-19 | 2019-09-20 | 腾讯科技(深圳)有限公司 | Method for processing video frequency, device, computer readable storage medium and computer equipment |
| CN110363175A (en) * | 2019-07-23 | 2019-10-22 | 厦门美图之家科技有限公司 | Image processing method, device and electronic equipment |
| CN110543826A (en) * | 2019-08-06 | 2019-12-06 | 尚尚珍宝(北京)网络科技有限公司 | Image processing method and device for virtual wearing of wearable product |
| CN110807584A (en) * | 2019-10-30 | 2020-02-18 | 维沃移动通信有限公司 | Object replacement method and electronic equipment |
| CN110856014A (en) * | 2019-11-05 | 2020-02-28 | 北京奇艺世纪科技有限公司 | Moving image generation method, moving image generation device, electronic device, and storage medium |
| CN110868554A (en) * | 2019-11-18 | 2020-03-06 | 广州华多网络科技有限公司 | Method, device and equipment for changing faces in real time in live broadcast and storage medium |
| CN110868554B (en) * | 2019-11-18 | 2022-03-08 | 广州方硅信息技术有限公司 | Method, device and equipment for changing faces in real time in live broadcast and storage medium |
| CN110968736A (en) * | 2019-12-04 | 2020-04-07 | 深圳追一科技有限公司 | Video generation method and device, electronic equipment and storage medium |
| CN110968736B (en) * | 2019-12-04 | 2021-02-02 | 深圳追一科技有限公司 | Video generation method and device, electronic equipment and storage medium |
| CN111212245A (en) * | 2020-01-15 | 2020-05-29 | 北京猿力未来科技有限公司 | Method and device for synthesizing video |
| CN111353069A (en) * | 2020-02-04 | 2020-06-30 | 清华珠三角研究院 | Character scene video generation method, system, device and storage medium |
| CN111681676A (en) * | 2020-06-09 | 2020-09-18 | 杭州星合尚世影视传媒有限公司 | Method, system and device for identifying and constructing audio frequency by video object and readable storage medium |
| CN111681676B (en) * | 2020-06-09 | 2023-08-08 | 杭州星合尚世影视传媒有限公司 | Method, system, device and readable storage medium for constructing audio frequency by video object identification |
| CN111741326A (en) * | 2020-06-30 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Video synthesis method, device, equipment and storage medium |
| CN111741326B (en) * | 2020-06-30 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Video synthesis method, device, equipment and storage medium |
| US11817127B2 (en) | 2020-07-23 | 2023-11-14 | Beijing Bytedance Network Technology Co., Ltd. | Video dubbing method, apparatus, device, and storage medium |
| CN112040310A (en) * | 2020-09-03 | 2020-12-04 | 广州优谷信息技术有限公司 | Audio and video synthesis method and device, mobile terminal and storage medium |
| CN112562720A (en) * | 2020-11-30 | 2021-03-26 | 清华珠三角研究院 | Lip-synchronization video generation method, device, equipment and storage medium |
| CN112766215A (en) * | 2021-01-29 | 2021-05-07 | 北京字跳网络技术有限公司 | Face fusion method and device, electronic equipment and storage medium |
| CN112929746A (en) * | 2021-02-07 | 2021-06-08 | 北京有竹居网络技术有限公司 | Video generation method and device, storage medium and electronic equipment |
| CN112929746B (en) * | 2021-02-07 | 2023-06-16 | 北京有竹居网络技术有限公司 | Video generation method and device, storage medium and electronic equipment |
| EP4284005A4 (en) * | 2021-02-24 | 2024-07-17 | Petal Cloud Technology Co., Ltd. | Video dubbing method, related device and computer-readable storage medium |
| CN113238698A (en) * | 2021-05-11 | 2021-08-10 | 北京达佳互联信息技术有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
| CN113395569A (en) * | 2021-05-29 | 2021-09-14 | 北京优幕科技有限责任公司 | Video generation method and device |
| CN114286171A (en) * | 2021-08-19 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Video processing method, device, equipment and storage medium |
| CN113727187A (en) * | 2021-08-31 | 2021-11-30 | 平安科技(深圳)有限公司 | Animation video processing method and device based on skeleton migration and related equipment |
| CN113727187B (en) * | 2021-08-31 | 2022-10-11 | 平安科技(深圳)有限公司 | Animation video processing method and device based on skeleton migration and related equipment |
| CN113923515A (en) * | 2021-09-29 | 2022-01-11 | 马上消费金融股份有限公司 | Video production method and device, electronic equipment and storage medium |
| CN113965802A (en) * | 2021-10-22 | 2022-01-21 | 深圳市兆驰股份有限公司 | Immersive video interaction method, device, equipment and storage medium |
| CN114040248A (en) * | 2021-11-23 | 2022-02-11 | 维沃移动通信有限公司 | Video processing method and device and electronic equipment |
| CN114398517A (en) * | 2021-12-31 | 2022-04-26 | 北京达佳互联信息技术有限公司 | Video data acquisition method and device |
| CN114821717A (en) * | 2022-04-20 | 2022-07-29 | 北京百度网讯科技有限公司 | Target object fusion method and device, electronic equipment and storage medium |
| CN114821717B (en) * | 2022-04-20 | 2024-03-12 | 北京百度网讯科技有限公司 | Target object fusion method, device, electronic equipment and storage medium |
| CN116132711A (en) * | 2023-02-03 | 2023-05-16 | 北京字跳网络技术有限公司 | Method, device and electronic device for generating video template |
| CN117082188A (en) * | 2023-10-12 | 2023-11-17 | 广东工业大学 | Consistent video generation method and related devices based on Plucker analysis |
| CN117082188B (en) * | 2023-10-12 | 2024-01-30 | Consistent video generation method and related device based on Plücker analysis |
Also Published As
| Publication number | Publication date |
|---|---|
| CN109819313B (en) | 2021-01-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109819313A (en) | Video processing method, device and storage medium | |
| US12046037B2 (en) | Adding beauty products to augmented reality tutorials | |
| US10200634B2 (en) | Video generation method, apparatus and terminal | |
| KR101990536B1 (en) | Method for providing information and Electronic apparatus thereof | |
| US12488551B2 (en) | Augmented reality beauty product tutorials | |
| CN107944397A (en) | Video recording method, device and computer-readable recording medium | |
| AU2014200042B2 (en) | Method and apparatus for controlling contents in electronic device | |
| US12155961B2 (en) | Video special effect generation method and terminal | |
| CN105302315A (en) | Image processing method and device | |
| CN104461348B (en) | Information choosing method and device | |
| CN109064387A (en) | Image special effect generation method, device and electronic equipment | |
| US20180190325A1 (en) | Image processing method, image processing apparatus, and program | |
| CN110209879A (en) | Video playing method, device, equipment and storage medium | |
| CN115225756B (en) | Method for determining target object, shooting method and device | |
| CN114339076B (en) | Video shooting method, device, electronic device and storage medium | |
| CN109215655A (en) | Method and mobile terminal for adding text to video | |
| CN110502117B (en) | Screenshot method in electronic terminal and electronic terminal | |
| CN112118397B (en) | Video synthesis method, related device, equipment and storage medium | |
| CN111491124A (en) | Video processing method, device and electronic device | |
| CN109257649A (en) | Multimedia file generation method and terminal device | |
| CN113014801A (en) | Video recording method, video recording device, electronic equipment and medium | |
| WO2023226699A1 (en) | Video recording method and apparatus, and storage medium | |
| CN113095163A (en) | Video processing method and device, electronic equipment and storage medium | |
| CN115209206B (en) | Video editing method, device, equipment, storage medium and computer program product | |
| WO2023226695A1 (en) | Video recording method and apparatus, and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| TR01 | Transfer of patent right | ||
Effective date of registration: 2022-11-15
Patentee after: Shenzhen Yayue Technology Co., Ltd. — Room 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6 Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518101
Patentee before: Tencent Technology (Shenzhen) Co., Ltd. — Floor 35, Tencent Building, No. 1 High-tech Zone, Nanshan District, Shenzhen, Guangdong 518057