
CN113269862B - Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment - Google Patents


Info

Publication number
CN113269862B
Authority
CN
China
Prior art keywords: dimensional, dimensional face, image, shape, face
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110601213.XA
Other languages
Chinese (zh)
Other versions
CN113269862A (en)
Inventor
雷震 (Zhen Lei)
朱翔昱 (Xiangyu Zhu)
于畅 (Chang Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation, Chinese Academy of Sciences
Original Assignee
Institute of Automation, Chinese Academy of Sciences
Application filed by Institute of Automation, Chinese Academy of Sciences
Priority to CN202110601213.XA
Publication of CN113269862A
Application granted
Publication of CN113269862B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 15/00: 3D [Three Dimensional] image rendering
        • G06T 15/005: General purpose rendering architectures
        • G06T 15/04: Texture mapping
        • G06T 15/50: Lighting effects
            • G06T 15/60: Shadow generation
    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
        • G06T 17/20: Finite element generation, e.g. wire-frame surface description, tessellation
    • Y02T 10/00: Road transport of goods or passengers
        • Y02T 10/10: Internal combustion engine [ICE] based vehicles
            • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention belongs to the technical field of image processing and pattern recognition, and in particular relates to a scene self-adaptive fine three-dimensional face reconstruction method, system, and electronic device, aiming to solve the strong model-like appearance and poor generalization of the reconstruction results of existing three-dimensional face reconstruction methods. The method augments the face shapes of a training set based on a 3DMM and a graphical imaging model to obtain a large number of three-dimensional face samples and their corresponding images; a three-dimensional morphable model is fitted to each image as an initial shape, and virtual multi-view generation is performed from the image and the initial shape to obtain multi-view images; the multi-view images are input into a many-to-one funnel network and optimized through a vision-consistent loss function to obtain a refined three-dimensional face shape. The invention improves three aspects, namely training-data construction, model design, and scene self-adaptation, realizing fine shape reconstruction and improving the robustness of the model in unconstrained scenes.

Description

Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to a scene self-adaptive fine three-dimensional face reconstruction method, system, and electronic device.
Background
At present, three-dimensional face reconstruction algorithms mainly perform shape reconstruction based on a three-dimensional morphable model (3DMM). However, owing to the cost of acquiring three-dimensional data, most 3DMM models are built from only a few hundred scanned point clouds with very small spans of age and other attributes; the scans are usually captured in a controlled environment with frontal faces and neutral expressions. The resulting models therefore have limited expressive power, and it is difficult for them to describe all the variations that real faces may exhibit.
Existing mainstream algorithms reconstruct three-dimensional faces with convolutional neural networks, whose training generally requires a large number of dense three-dimensional face point clouds and corresponding face images as supervision. However, manual annotation of such data is expensive and difficult, so most existing three-dimensional datasets use 3DMM fitting results as labels for network training; this causes the three-dimensional shapes to lose many details, and the reconstruction results tend to have a strong model-like appearance. Algorithms have therefore been proposed that project the 3DMM fitting result onto the original image for self-supervised learning, but since pose is the main factor affecting face position and shape only slightly affects the positions of points, such techniques focus mainly on the accuracy of pose estimation rather than on shape reconstruction.
In addition, three-dimensional face training data are mostly collected indoors. Owing to the data-driven nature of deep learning, when a model is applied to outdoor unconstrained scenes, existing algorithms show poor robustness because of differences in illumination, resolution, pose, occlusion, and other factors.
Disclosure of Invention
In order to solve the above problems, namely the strong model-like appearance and poor generalization of the reconstruction results of existing three-dimensional face reconstruction methods, the invention provides a scene self-adaptive fine three-dimensional face reconstruction method, system, and electronic device.
The first aspect of the invention provides a scene self-adaptive fine three-dimensional face reconstruction method, comprising the following steps: step S100, augmenting the face shapes of the training set based on a 3DMM and a graphical imaging model to obtain a large amount of high-fidelity training data; the high-fidelity training data comprise three-dimensional face data and their corresponding images;
step S200, based on the acquired high-fidelity training data, fitting a three-dimensional morphable model to the image corresponding to the three-dimensional face data as an initial shape, and performing virtual multi-view generation based on the image and the initial shape to obtain multi-view images;
and inputting the multi-view images into a many-to-one funnel network and optimizing through a vision-consistent loss function to obtain a refined three-dimensional face shape.
In some preferred embodiments, the augmentation in step S100 specifically comprises the following steps: step S110, completing the incomplete depth channel of the input sample image by gridding, and converting the image into a first three-dimensional grid;
the depth channel comprises a depth channel of the face area and a depth channel of the background area; the values of the depth channel of the face area are acquired as follows: the input sample image is fitted based on the 3DMM to obtain an initial three-dimensional face shape, and a non-rigid registration algorithm is then applied to the initial shape to obtain the accurate face depth;
the values of the depth channel of the background area are obtained based on the original depth channel values and a smoothness constraint; the sample image is an RGB-D image containing a human face;
step S120, randomly selecting eyes, a nose, a mouth, and cheeks from the training set and combining them to obtain a first three-dimensional face shape;
Step S130, replacing the face area in the first three-dimensional grid in step S110 with the first three-dimensional face shape, and adjusting the position of a background anchor point through the smoothness constraint of the background area to obtain a second three-dimensional grid as a three-dimensional structure after shape migration;
step S140, performing shadow migration on the second three-dimensional grid based on a graphical imaging model to obtain a third three-dimensional grid;
and step S150, rendering the third three-dimensional grid to obtain the shape-migrated image, thereby completing the high-fidelity augmentation of the three-dimensional face shape.
In some preferred embodiments, the method for acquiring the values of the depth channels of the background area in step S110 is as follows:
uniformly paving a plurality of anchor points in a background area, constructing a triangular network based on the paved anchor points through a Delaunay triangulation algorithm, and calculating the depth of each anchor point through a preset first method; completing the depth channel of the background area based on the depth of each anchor point;
the preset first method solves the following least-squares problem:

$$\min_{\{d_i\}}\ \sum_{i}\mathrm{Mask}(x_i,y_i)\,\bigl(d_i-\mathrm{Depth}(x_i,y_i)\bigr)^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl(d_i-d_j\bigr)^2$$

where d_i represents the depth of the i-th anchor point, Mask(x_i, y_i) indicates whether the depth channel of the i-th anchor point has a value, Depth(x_i, y_i) is the value of the depth channel of the RGB-D image at the position of the i-th anchor point, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected by an edge of the triangular network.
In some preferred embodiments, the adjustment of the positions of the background anchor points in step S130 solves:

$$\min_{\{p_i\}}\ \sum_{i}\mathrm{FaceContour}(i)\,\bigl\|p_i-p_i^{t}\bigr\|^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl\|(p_i-p_j)-(p_i^{s}-p_j^{s})\bigr\|^2$$

where p_i^s is the location of the i-th anchor point on the source image, p_i^t is the target location of the i-th anchor point, FaceContour(i) indicates whether the i-th anchor point is located on the facial contour, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected in the background grid.
In some preferred embodiments, the shadow migration in step S140 is specifically:
based on a preset imaging formula, the normal vectors and specular reflection values are replaced by the values after shape migration, while the remaining imaging parameters keep the values of the source image, yielding the target face after shadow migration;
the preset imaging formula is:

$$C_i=\mathrm{Amb}\cdot T_i+\mathrm{Dir}\cdot T_i\,\langle \mathbf{n}_i,\mathbf{l}\rangle+k_s\,\langle \mathbf{ve},\mathbf{r}_i\rangle^{\nu},\qquad \mathbf{r}_i=2\,\langle \mathbf{n}_i,\mathbf{l}\rangle\,\mathbf{n}_i-\mathbf{l}$$

where C_i is the color value of the i-th point of the three-dimensional shape, T_i is the texture value of the i-th point, Amb is the ambient light, the diagonal matrix Dir represents parallel light in the l direction, n_i is the normal direction of the i-th point, k_s is the specular reflection coefficient, ve is the viewing direction, ν is the angle-distribution parameter controlling the specular reflection, and r_i is the direction of the maximum specular reflection value.
In some preferred embodiments, step S200 specifically comprises the following steps: step S210, fitting the input sample image based on the 3DMM to obtain a roughly reconstructed three-dimensional face; gridding the input sample image based on the roughly reconstructed three-dimensional face to obtain a fourth three-dimensional grid; the depth channel values of the face region are taken from the roughly reconstructed three-dimensional face, and the depth channel values of the background region are set to the mean depth of the roughly reconstructed three-dimensional face;
step S220, mirror-flipping the input sample image and repeating step S210 to obtain the three-dimensional grid of the mirrored image as a fifth three-dimensional grid, and combining the fifth three-dimensional grid with the fourth three-dimensional grid to obtain a sixth three-dimensional grid with complete texture;
step S230, rendering the sixth three-dimensional grid shape according to a first preset viewing angle, and calculating the visibility of each triangular mesh to obtain a visibility map;
step S240, obtaining a virtual multi-view image based on the visibility map and the rendering result;
step S250, respectively inputting a preset number of virtual views into corresponding encoders to extract image features, and expanding the image features to a UV space to obtain a UV feature group of the multi-view face;
the UV feature sets are connected in series and then input into a decoder; the offset between the real three-dimensional face and the roughly reconstructed three-dimensional face is obtained by deconvolution, and the offset is added to the roughly reconstructed three-dimensional face to obtain the reconstructed three-dimensional shape;
step S260, setting the texture value of each point of the reconstructed three-dimensional face to (255, 255, 255), placing it under direct frontal light, and calculating the RGB value of each point under this illumination; rendering it according to a second preset viewing angle with a differentiable renderer; repeating the rendering with the real three-dimensional face S* as the target face, and obtaining the visual-similarity evaluation index L_psd from the error between the rendered images:

$$L_{psd}=\sum_{v}\Bigl\|R\bigl(S_{init}+\Delta S,\;T_w,\;I_{orth},\;v\bigr)-R\bigl(S^{*},\;T_w,\;I_{orth},\;v\bigr)\Bigr\|$$

where v denotes the corresponding view-angle index; R denotes the rendering function, whose inputs are a three-dimensional shape, a texture, and an illumination, rendered at view v; S* is the real three-dimensional face; S_init is the roughly reconstructed three-dimensional face; ΔS is the offset; T_w is the all-white texture; and I_orth represents parallel light in the direction (0, 0, 1).
In some preferred embodiments, the method further comprises step S300: fine-tuning the many-to-one funnel network.
The fine-tuning process is specifically as follows: step S310, grouping the data in the training set by attribute; based on the distributions of the two attributes of age and gender, the data are divided into several parts and trained group by group to obtain attribute-specific three-dimensional shape models;
wherein the age-associated model group comprises five parts: infants, teenagers, young adults, the middle-aged, and the elderly; the gender-associated model group comprises two parts: male θ_male and female θ_female;
step S320, inputting an unlabeled image of the target scene, estimating its age and gender, and matching the corresponding models from the attribute-specific model groups to obtain two model parameters θ_age and θ_gender;
step S330, extracting the key points and segmentation masks of the unlabeled images in the target scene;
step S340, performing targeted augmentation on the input sample image to obtain augmented images;
step S350, based on the results of step S330 and step S340, fine-tuning the two model parameters θ_age and θ_gender using weak-supervision and self-supervision constraints to obtain a pseudo-label model;
step S360, generating a plurality of pseudo three-dimensional labels for the target-scene data based on the pseudo-label model, obtaining updated pseudo three-dimensional labels by a mean-fusion strategy, and combining the updated pseudo three-dimensional labels with the source data to retrain the many-to-one funnel network.
A second aspect of the present invention provides a scene-adaptive fine three-dimensional face reconstruction system, comprising: the device comprises an acquisition module, an augmentation module, a fitting module, a generation module and an optimization module;
The acquisition module is configured to acquire a two-dimensional face image to be reconstructed as an input sample image;
the augmentation module is configured to augment an input sample image based on the 3DMM and the graphical imaging model to obtain a plurality of high-fidelity training data;
The fitting module is configured to fit, based on the acquired high-fidelity training data, a three-dimensional morphable model to the image corresponding to the three-dimensional face data to serve as the initial shape;
the generating module is configured to perform virtual multi-view generation based on the image corresponding to the three-dimensional face data and the initial shape to obtain multi-view images;
the optimizing module is configured to input the multi-view images into a many-to-one funnel network and optimize through a vision-consistent loss function to obtain a refined three-dimensional face shape.
A third aspect of the present invention provides an electronic device comprising: at least one processor; and a memory communicatively coupled to at least one of the processors; wherein the memory stores instructions executable by the processor for execution by the processor to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of the above.
A fourth aspect of the present invention proposes a computer readable storage medium storing computer instructions for execution by the computer to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of the above.
1) Aiming at the defects of strong model-like appearance and limited application scenes in the reconstruction results of existing three-dimensional face reconstruction methods, the invention makes improvements in three aspects: training-data construction, model design, and scene self-adaptation. It provides a network structure and a corresponding optimization method for fine shape reconstruction, and on this basis designs a scene self-adaptation method for three-dimensional face reconstruction, realizing fine shape reconstruction and improving the robustness of the model in unconstrained scenes.
2) In the construction of the three-dimensional data, the data are augmented through shape migration, enriching the shape variation of the dataset. On this basis, the single-view fine three-dimensional face reconstruction method based on a deep neural network improves the accuracy and visual consistency of three-dimensional face reconstruction. In addition, the scene self-adaptation method for three-dimensional reconstruction improves the adaptability of the network to the target scene, so that the model can reconstruct stably in any target scene.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of one embodiment of a scene adaptive fine three-dimensional face reconstruction method of the present invention;
FIG. 2 is a flow chart of a method of generating high fidelity virtual three-dimensional face data in the invention;
FIG. 3 is a flow chart of a single view fine three-dimensional face reconstruction method in the present invention;
FIG. 4 is a flow chart of a scene adaptation method of three-dimensional reconstruction in the present invention;
FIG. 5 is a schematic diagram of a frame of one embodiment of a scene adaptive fine three-dimensional face reconstruction system of the present invention;
FIG. 6 is a schematic diagram of a computer system of a server for implementing embodiments of the method, system, and apparatus of the present application.
Detailed Description
In order to make the embodiments, technical solutions and advantages of the present invention more obvious, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the embodiments are some, but not all embodiments of the present invention. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
The first aspect of the invention provides a scene self-adaptive fine three-dimensional face reconstruction method, comprising the following steps: step S100, augmenting the face shapes of the training set based on a 3DMM and a graphical imaging model to obtain a large amount of high-fidelity training data, where the high-fidelity training data comprise three-dimensional face data and their corresponding images; step S200, based on the acquired high-fidelity training data, fitting a three-dimensional morphable model to the image corresponding to the three-dimensional face data as an initial shape, and performing virtual multi-view generation based on the image and the initial shape to obtain multi-view images; and inputting the multi-view images into a many-to-one funnel network and optimizing through a vision-consistent loss function to obtain a refined three-dimensional face shape.
Aiming at the defects of strong model-like appearance and poor generalization in the reconstruction results of existing three-dimensional face reconstruction methods, the invention provides a scene self-adaptive fine three-dimensional face reconstruction method, so that the robustness of the model in unconstrained scenes can be improved while the shape is finely reconstructed. The method comprises a high-fidelity virtual three-dimensional face data generation method, a single-view fine three-dimensional face reconstruction method, and a scene self-adaptation method for three-dimensional reconstruction. In the high-fidelity virtual three-dimensional face data generation method, during training-data construction, the face shapes of the training set are augmented based on a three-dimensional prior and a graphical imaging model to obtain large-scale high-fidelity three-dimensional face data and their corresponding images.
The single-view fine three-dimensional face reconstruction method relies on a deep neural network structure: a three-dimensional morphable model is first fitted to the image as an initial shape; virtual multi-view generation is performed on this basis; the multi-view images are input into a many-to-one funnel network that learns the residual between the initial shape and the target shape; and finally a vision-consistent loss function lets the network concentrate on optimizing the fine structures of the three-dimensional face, improving the reconstructed visual effect.
The invention is further described below in connection with specific embodiments with reference to the accompanying drawings.
Referring to fig. 1 to 4, the first aspect of the present invention provides a scene self-adaptive fine three-dimensional face reconstruction method comprising the following steps: step S100, during training-data construction, augmenting the face shapes of the training set based on a 3DMM (i.e., a statistical 3D morphable model of the face) and a graphical imaging model to obtain a large amount of high-fidelity training data, the high-fidelity training data comprising three-dimensional face data and their corresponding images; step S200, based on the acquired high-fidelity training data, fitting a three-dimensional morphable model to the image corresponding to the three-dimensional face data as an initial shape, and performing virtual multi-view generation based on the image and the initial shape to obtain multi-view images; and inputting the multi-view images into a many-to-one funnel network and optimizing through a vision-consistent loss function to obtain a refined three-dimensional face shape. Based on the obtained high-fidelity training data, the invention provides a single-view fine three-dimensional face reconstruction method in which, through the virtual multi-view image generation method, the many-to-one funnel network structure, and the vision-consistent loss function, the network can concentrate on optimizing the fine structures of the three-dimensional face, effectively improving the visual effect of three-dimensional face reconstruction.
The augmentation in step S100 specifically comprises the following steps: step S110, completing the incomplete depth channel of the input sample image by gridding, and converting the image into a first three-dimensional grid. The depth channel comprises a depth channel of the face region and a depth channel of the background region. The values of the depth channel of the face region are acquired as follows: the input sample image is fitted based on the 3DMM to obtain an initial three-dimensional face shape, and a non-rigid registration algorithm is then applied to the initial shape to obtain the accurate face depth. The sample image is an RGB-D image containing a human face.
The values of the depth channel of the background region are obtained based on the original depth channel values and a smoothness constraint. The acquisition method is specifically: uniformly placing a number of anchor points in the background area, constructing a triangular network over the placed anchor points by the Delaunay triangulation algorithm, calculating the depth of each anchor point by a preset first method, and completing the depth channel of the background area based on the depths of the anchor points.
The preset first method solves the following least-squares problem:

$$\min_{\{d_i\}}\ \sum_{i}\mathrm{Mask}(x_i,y_i)\,\bigl(d_i-\mathrm{Depth}(x_i,y_i)\bigr)^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl(d_i-d_j\bigr)^2$$

where d_i represents the depth of the i-th anchor point, Mask(x_i, y_i) indicates whether the depth channel of the i-th anchor point has a value, Depth(x_i, y_i) is the value of the depth channel of the RGB-D image at the position of the i-th anchor point, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected by an edge of the triangular network.
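This least-squares problem maps directly onto a sparse linear system. Below is a minimal Python sketch of the background depth completion, under stated assumptions; the function name, the weight lam, and the input layout (per-anchor arrays plus a Delaunay edge list) are illustrative, not taken from the patent.

```python
# Hedged sketch of step S110's background depth completion: a data term keeps
# observed anchor depths, a smoothness term makes triangulation-connected
# anchors agree. The smoothness weight lam is an assumption.
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def complete_anchor_depths(depth, mask, edges, lam=1.0):
    """depth: (N,) observed depth per anchor; mask: (N,) bool validity flags;
    edges: list of (i, j) anchor pairs from the Delaunay triangulation."""
    n = len(depth)
    observed = np.flatnonzero(mask)
    A = lil_matrix((len(observed) + len(edges), n))
    b = np.zeros(A.shape[0])
    for r, i in enumerate(observed):              # data term: d_i ≈ Depth(x_i, y_i)
        A[r, i] = 1.0
        b[r] = depth[i]
    w = np.sqrt(lam)
    for r, (i, j) in enumerate(edges, start=len(observed)):
        A[r, i], A[r, j] = w, -w                  # smoothness term: d_i ≈ d_j
    return lsqr(A.tocsr(), b)[0]                  # least-squares anchor depths
```

The same pattern, a data term on constrained anchors plus a smoothness term on triangulation edges, also covers the anchor-position adjustment of step S130 below.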
In step S120, eyes, a nose, a mouth, and cheeks are randomly selected from the training set (i.e., from the three-dimensional face samples in the dataset) and combined to obtain a first three-dimensional face shape, which serves as the target of the subsequent shape migration; a minimal sketch of this recombination is given below.
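The patent specifies only which regions are recombined, so the index bookkeeping in this sketch is an assumption; it relies on all shapes sharing the 3DMM vertex ordering, with region_idx (region name mapped to a vertex index array) precomputed.

```python
# Hedged sketch of step S120: graft eye/nose/mouth regions from random donor
# faces onto a random base shape that contributes the cheeks and contour.
import numpy as np

def recombine_face(shapes, region_idx, rng=None):
    """shapes: (K, N, 3) array of K registered three-dimensional face shapes."""
    rng = rng or np.random.default_rng()
    new_shape = shapes[rng.integers(len(shapes))].copy()   # base: cheeks/contour
    for region in ("eyes", "nose", "mouth"):
        donor = shapes[rng.integers(len(shapes))]          # random donor face
        idx = region_idx[region]
        new_shape[idx] = donor[idx]                        # copy region vertices
    return new_shape
```

In practice the region boundaries would need blending, e.g., by the non-rigid registration already used in step S110, so that the grafted parts join smoothly.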
Step S130, replacing the face area in the first three-dimensional grid of step S110 with the first three-dimensional face shape, and adjusting the positions of the background anchor points through the smoothness constraint of the background area to obtain a second three-dimensional grid as the three-dimensional structure after shape migration. The position adjustment of the background anchor points solves:

$$\min_{\{p_i\}}\ \sum_{i}\mathrm{FaceContour}(i)\,\bigl\|p_i-p_i^{t}\bigr\|^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl\|(p_i-p_j)-(p_i^{s}-p_j^{s})\bigr\|^2$$

where p_i^s is the location of the i-th anchor point on the source image, p_i^t is the target location of the i-th anchor point, FaceContour(i) indicates whether the i-th anchor point is located on the facial contour, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected in the background grid.
Step S140, performing shadow migration on the second three-dimensional grid obtained in step S130 based on the graphical imaging model to obtain a third three-dimensional grid. The shadow migration is specifically: based on a preset imaging formula, the normal vectors and specular reflection values are replaced by the values after shape migration, while the remaining imaging parameters keep the values of the source image, yielding the target face after shadow migration.
The preset imaging formula is:

$$C_i=\mathrm{Amb}\cdot T_i+\mathrm{Dir}\cdot T_i\,\langle \mathbf{n}_i,\mathbf{l}\rangle+k_s\,\langle \mathbf{ve},\mathbf{r}_i\rangle^{\nu},\qquad \mathbf{r}_i=2\,\langle \mathbf{n}_i,\mathbf{l}\rangle\,\mathbf{n}_i-\mathbf{l}$$

where C_i is the color value of the i-th point of the three-dimensional shape, T_i is the texture value of the i-th point, Amb is the ambient light, the diagonal matrix Dir represents parallel light in the l direction, n_i is the normal direction of the i-th point, k_s is the specular reflection coefficient, ve is the viewing direction, ν is the angle-distribution parameter controlling the specular reflection, and r_i is the direction of the maximum specular reflection value.
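The formula is a standard Phong-style model, so its evaluation can be sketched in a few lines; the array shapes and parameter names below are assumptions for illustration. For shadow migration, it would be called with the target face's normals (and, if desired, specular parameters) while keeping the source texture and lighting.

```python
# Hedged sketch of the imaging formula of step S140.
import numpy as np

def shade(T, normals, amb, dir_rgb, l, ks, ve, nu):
    """T: (N,3) per-point texture; normals: (N,3) unit normals; amb, dir_rgb:
    (3,) gains of the diagonal matrices Amb and Dir; l, ve: (3,) unit light and
    view directions; ks, nu: specular strength and exponent."""
    ndotl = np.clip(normals @ l, 0.0, None)            # diffuse term <n_i, l>
    r = 2.0 * ndotl[:, None] * normals - l             # max-specular direction r_i
    spec = ks * np.clip(r @ ve, 0.0, None) ** nu       # specular term <ve, r_i>^nu
    return T * amb + T * dir_rgb * ndotl[:, None] + spec[:, None]
```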
And step S150, rendering the obtained third three-dimensional grid to obtain the shape-migrated image, thereby completing the high-fidelity augmentation of the three-dimensional face shape.
Further, step S200 specifically comprises the following steps: step S210, fitting the input sample image based on the 3DMM to obtain a roughly reconstructed three-dimensional face; gridding the input sample image based on the roughly reconstructed three-dimensional face to obtain a fourth three-dimensional grid; the depth channel values of the face region are taken from the roughly reconstructed three-dimensional face, and the depth channel values of the background region are set to the mean depth of the roughly reconstructed three-dimensional face.
Step S220, mirror-flipping the input sample image and repeating step S210 to obtain the three-dimensional grid of the mirrored image as a fifth three-dimensional grid, and combining the fifth three-dimensional grid with the fourth three-dimensional grid to obtain a sixth three-dimensional grid with complete texture;
Step S230, rendering the sixth three-dimensional grid shape according to the first preset viewing angle, where the first preset viewing angle comprises five fixed viewing angles (pitch angle, yaw angle): (0°, 0°), (0°, 25°), (0°, 50°), (15°, 0°), and (-25°, 0°). During rendering, the visibility of each triangular mesh is calculated and rendered to the target view as the color of the face patch (i.e., the face mesh), giving the visibility map vis(m):

$$\mathrm{vis}(m)=\langle \mathbf{l},\,\mathbf{n}(m)\rangle$$

where l = (0, 0, 1)ᵀ is the viewing direction, n(·) is the normal vector of the patch, and m is the patch position at the original viewing angle.
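A direct per-face computation of this visibility, a sketch assuming the mesh has already been transformed into the camera frame of the original view:

```python
# Hedged sketch of the visibility of step S230: vis(m) = <l, n(m)> with
# l = (0, 0, 1), clamped at zero so back-facing triangles get no weight.
import numpy as np

def face_visibility(vertices, faces):
    """vertices: (N,3) mesh vertices in camera coordinates;
    faces: (M,3) integer vertex indices; returns (M,) visibility per face."""
    v0, v1, v2 = vertices[faces[:, 0]], vertices[faces[:, 1]], vertices[faces[:, 2]]
    n = np.cross(v1 - v0, v2 - v0)                         # face normals
    n /= np.linalg.norm(n, axis=1, keepdims=True) + 1e-8   # normalize
    return np.clip(n @ np.array([0.0, 0.0, 1.0]), 0.0, None)
```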
Step S240, obtaining the virtual multi-view image I based on the visibility map and the rendering results:

$$I=\lambda\odot I_{origin}+\lambda_{flip}\odot I_{flip}$$

where I_origin and I_flip are the images rendered from the original three-dimensional grid and the mirrored three-dimensional grid at the target viewing angle, ⊙ denotes the element-wise product, and λ and λ_flip are the weights of the corresponding images.
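The exact form of λ and λ_flip is not preserved in this text, so the sketch below makes one natural assumption: the weights are the rendered visibility maps, normalized per pixel.

```python
# Hedged sketch of step S240's fusion I = λ ⊙ I_origin + λ_flip ⊙ I_flip.
# Using normalized visibility maps as the per-pixel weights is an assumption.
import numpy as np

def fuse_views(I_origin, I_flip, vis_origin, vis_flip, eps=1e-6):
    """I_*: (H,W,3) renderings of the original/mirrored mesh at the target view;
    vis_*: (H,W) visibility maps rendered from the same meshes."""
    total = vis_origin + vis_flip + eps
    lam = (vis_origin / total)[..., None]          # λ: weight of the original view
    lam_flip = (vis_flip / total)[..., None]       # λ_flip: weight of the mirror view
    return lam * I_origin + lam_flip * I_flip
```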
Step S250, inputting a preset number of virtual views into corresponding encoders to extract image features. In this embodiment, the preset number is five, i.e., the five virtual views are input into five encoders respectively, and the extracted image features are unfolded into UV space to obtain the UV feature group of the multi-view face. The UV feature group is concatenated and input into a decoder; the offset ΔS between the real three-dimensional face S* and the roughly reconstructed three-dimensional face S_init is obtained by deconvolution, and the offset ΔS is added to S_init to obtain the reconstructed three-dimensional shape.
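A hedged PyTorch sketch of such a many-to-one funnel network follows; the channel widths, the two-layer encoders, and the choice to feed views already unfolded into UV space are illustrative assumptions, not the patent's exact architecture.

```python
# Minimal many-to-one funnel network (step S250): one encoder per view,
# concatenated UV features, and a deconvolution decoder regressing the
# UV-space offset ΔS that is added to the coarse shape S_init.
import torch
import torch.nn as nn

class FunnelNet(nn.Module):
    def __init__(self, n_views=5, feat=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(3, feat, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat, feat, 3, stride=2, padding=1), nn.ReLU())
            for _ in range(n_views))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(n_views * feat, feat, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(feat, 3, 4, stride=2, padding=1))  # ΔS map

    def forward(self, uv_views, uv_init):
        # uv_views: list of n_views (B,3,H,W) view images unfolded to UV space;
        # uv_init: (B,3,H,W) UV position map of the coarse shape S_init.
        feats = [enc(v) for enc, v in zip(self.encoders, uv_views)]
        delta = self.decoder(torch.cat(feats, dim=1))   # offset ΔS
        return uv_init + delta                          # S_init + ΔS
```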
Step S260, setting the texture value of each point of the reconstructed three-dimensional face shape to (255, 255, 255), placing it under direct frontal light, and calculating the RGB value of each point under this illumination; rendering it with a differentiable renderer according to the second preset viewing angles (0°, 0°), (0°, 90°), (0°, -90°), (30°, 0°), and (-30°, 0°); and repeating this rendering with the real three-dimensional face S* as the target face, then calculating the error between the rendered images to obtain the visual-similarity evaluation index L_psd, where psd stands for plaster sculpture descriptor (Plaster Sculpture Descriptor):

$$L_{psd}=\sum_{v}\Bigl\|R\bigl(S_{init}+\Delta S,\;T_w,\;I_{orth},\;v\bigr)-R\bigl(S^{*},\;T_w,\;I_{orth},\;v\bigr)\Bigr\|$$

where v denotes the corresponding view-angle index; R denotes the rendering function, whose inputs are a three-dimensional shape, a texture, and an illumination, rendered at view v; S* is the real three-dimensional face; S_init is the roughly reconstructed three-dimensional face; ΔS is the offset; T_w is the all-white texture; and I_orth represents parallel light in the direction (0, 0, 1).
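Given any differentiable renderer, L_psd reduces to a per-view image difference. The sketch below assumes a callable render(shape, view) that already bakes in the all-white texture T_w and the frontal parallel light I_orth; both the interface and the L1 image error are assumptions, not a real library API or the patent's exact norm.

```python
# Hedged sketch of the vision-consistent loss L_psd of step S260.
import torch

def psd_loss(render, S_init, delta_S, S_star, views):
    """render: callable (shape, view) -> (B,3,H,W) image, assumed to render the
    shape with an all-white texture under frontal parallel light."""
    loss = torch.zeros(())
    for v in views:                         # the five fixed viewing angles
        pred = render(S_init + delta_S, v)  # plaster rendering of refined shape
        target = render(S_star, v)          # plaster rendering of real shape
        loss = loss + (pred - target).abs().mean()
    return loss
```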
Further, the scene self-adaptive fine three-dimensional face reconstruction method also comprises step S300, the scene self-adaptation method for three-dimensional reconstruction: pseudo labels are generated for unlabeled images in the target scene to fine-tune the many-to-one funnel network, improving the adaptability of the network to the target scene.
The fine-tuning process (i.e., the self-adaptation method) is specifically: step S310, grouping the data in the training set by attribute; based on the distributions of the two attributes of age and gender, the data are divided into several parts and trained group by group to obtain attribute-specific three-dimensional shape models. The age-associated model group comprises five parts: infants, teenagers, young adults, the middle-aged, and the elderly; the gender-associated model group comprises two parts: male θ_male and female θ_female.
Step S320, inputting an unlabeled image of the target scene, estimating its age and gender, and matching the corresponding models from the attribute-specific model groups to obtain two model parameters θ_age and θ_gender;
step S330, extracting the key points F and segmentation masks M of the unlabeled images in the target scene;
step S340, performing targeted augmentation on the input sample image, including color perturbation, Gaussian blur, pose perturbation, random occlusion, and the like, to obtain a series of augmented images;
step S350, based on the results of steps S330 and S340, fine-tuning the two model parameters θ_age and θ_gender with weak-supervision and self-supervision constraints, and generating pseudo labels for the unlabeled images of the target scene; the total loss L_self is:

$$L_{self}=\alpha\cdot L_{weak}+\beta\cdot L_{self\_shp}+\gamma\cdot L_{self\_pose}$$

$$L_{weak}=\frac{1}{N_{2d}}\sum_{i=1}^{N_{2d}}\bigl\|F_i-V_{2d}^{\,i}\bigr\|+\sum_{s\in V_{seg}}D(s,M)$$

$$L_{self\_shp}=\bigl\|\mathrm{Net}_{shp}(x)-\mathrm{Net}_{shp}(\mathrm{Aug}(x))\bigr\|$$

$$L_{self\_pose}=\bigl\|\mathrm{MLP}\bigl(f(x),\,f(\mathrm{Aug}(x))\bigr)-\Delta p_x\bigr\|$$

where α, β, and γ are the weighting coefficients of the three losses; L_weak is the weak-supervision constraint, L_self_shp is the self-supervised shape-information constraint, and L_self_pose is the self-supervised pose constraint; F denotes the two-dimensional face key points, i is the key-point index, and V_2d denotes the points estimated by the model that are semantically consistent with F after projection into image space; N_2d is the number of two-dimensional face key points; M is the two-dimensional face segmentation mask, V_seg denotes the boundary points of the pseudo three-dimensional face label after projection into two dimensions, semantically consistent with the segmentation, and D is the Euclidean distance from a point to the segmentation-mask region; Net_shp(·) is the shape reconstruction network, x is the input image, and Aug is a random image augmentation operation; the inputs of MLP(·,·) are the top-level features extracted by the three-dimensional reconstruction network from the original image and the augmented image, and Δp_x is the pose offset from the original image to the augmented image. Finally, the pseudo-label model group {θ_age, θ_gender} is obtained.
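Assembling these terms, the fine-tuning reduces to a weighted sum of the three constraints. In the Python sketch below, net, mlp, aug, and weak_fn are stand-ins for the patent's Net_shp, MLP, Aug, and L_weak; only the combination itself is taken from the text.

```python
# Hedged sketch of the step S350 objective
# L_self = α·L_weak + β·L_self_shp + γ·L_self_pose.
import torch

def adaptation_loss(net, mlp, aug, weak_fn, x, F, M,
                    alpha=1.0, beta=1.0, gamma=1.0):
    x_aug, dp = aug(x)                      # augmented image and pose offset Δp_x
    s, feat = net(x)                        # shape prediction + top-level feature
    s_aug, feat_aug = net(x_aug)
    L_weak = weak_fn(s, F, M)               # keypoint/segmentation consistency
    L_shp = torch.norm(s - s_aug)           # shape invariant to augmentation
    L_pose = torch.norm(mlp(feat, feat_aug) - dp)  # predicted vs. applied pose shift
    return alpha * L_weak + beta * L_shp + gamma * L_pose
```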
Step S360, generating a plurality of pseudo three-dimensional labels for the target-scene data based on the generated pseudo-label models, obtaining updated pseudo three-dimensional labels by a mean-fusion strategy (sketched below), and combining the updated pseudo three-dimensional labels with the source data to retrain the many-to-one funnel network, so that the model maintains its performance on the source scene while improving generalization to the target scene.
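The mean-fusion step itself is simple; in this sketch the per-vertex average follows the text, while the stacking convention is an assumption.

```python
# Hedged sketch of step S360's mean fusion of pseudo labels: average the
# per-vertex predictions of the attribute-specific models θ_age, θ_gender, ...
import numpy as np

def fuse_pseudo_labels(predictions):
    """predictions: list of (N,3) three-dimensional shapes from the pseudo-label
    model group; returns the fused (N,3) pseudo three-dimensional label."""
    return np.stack(predictions, axis=0).mean(axis=0)
```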
The first aspect of the invention thus discloses a scene self-adaptive fine three-dimensional face reconstruction method comprising a high-fidelity virtual three-dimensional face data generation method, a single-view fine three-dimensional face reconstruction method, and a scene self-adaptation method for three-dimensional reconstruction, all of which are required for fine three-dimensional reconstruction in unconstrained scenes. In the high-fidelity virtual three-dimensional face data generation method, during training-data construction, the face shapes of the training set are augmented based on a three-dimensional morphable model and a graphical imaging model, obtaining large-scale three-dimensional face data and their corresponding images. Based on the generated high-fidelity training set, the invention designs a single-view fine three-dimensional face reconstruction method based on a deep neural network: a three-dimensional morphable model is first fitted to the image as the initial shape; virtual multi-view generation is performed on this basis; the views are input into a many-to-one funnel network that learns the residual between the initial shape and the target shape; and finally optimization through a vision-consistent loss function lets the network concentrate on the fine structures of the three-dimensional face, improving the reconstructed visual effect. On this basis, the scene self-adaptation method generates pseudo labels for unlabeled images in the target scene to fine-tune the many-to-one funnel network, improving the adaptability of the network to the target scene. Aiming at the strong model-like appearance and poor generalization of the reconstruction results of existing methods, the invention improves three aspects, namely training-data construction, model design, and scene self-adaptation, provides a network structure and a corresponding optimization method for fine shape reconstruction, and designs a scene self-adaptation method for three-dimensional face reconstruction, realizing fine shape reconstruction and improving the robustness of the model in unconstrained scenes.
Referring to fig. 5, a second aspect of the present invention provides a scene-adaptive fine three-dimensional face reconstruction system, comprising: the system comprises an acquisition module 100, an augmentation module 200, a fitting module 300, a generation module 400 and an optimization module 500;
The acquisition module is configured to acquire a two-dimensional face image to be reconstructed as an input sample image;
the augmentation module is configured to augment an input sample image based on the 3DMM and the graphical imaging model to obtain a plurality of high-fidelity training data;
The fitting module is configured to fit, based on the acquired high-fidelity training data, a three-dimensional morphable model to the image corresponding to the three-dimensional face data to serve as the initial shape;
the generating module is configured to perform virtual multi-view generation based on the image corresponding to the three-dimensional face data and the initial shape to obtain multi-view images;
the optimizing module is configured to input the multi-view images into a many-to-one funnel network, and optimize the multi-view images through a vision-consistent loss function so as to obtain a refined three-dimensional face shape.
An electronic device of a third embodiment of the present invention includes: at least one processor; and a memory communicatively coupled to at least one of the processors; the memory stores instructions executable by the processor for execution by the processor to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of the above.
A computer-readable storage medium of a fourth embodiment of the present invention stores computer instructions for execution by the computer to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of the above.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the storage device and the processing device described above and the related description may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Referring now to FIG. 6, there is shown a schematic diagram of a computer system of a server for implementing embodiments of the methods, systems, and apparatus of the present application. The server illustrated in fig. 6 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 6, the computer system includes a central processing unit (CPU, Central Processing Unit) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM, Read-Only Memory) 602 or a program loaded from a storage portion 608 into a random access memory (RAM, Random Access Memory) 603. In the RAM 603, various programs and data required for system operation are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609 and/or installed from the removable medium 611. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 601. The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
It should be noted that, in the description of the present invention, terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like, which indicate directions or positional relationships, are based on the directions or positional relationships shown in the drawings, are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, article, or apparatus/means that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, article, or apparatus/means.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (9)

1. A scene self-adaptive fine three-dimensional face reconstruction method is characterized by comprising the following steps:
step S100, augmenting the face shapes of the training set based on a 3DMM and a graphical imaging model to obtain a large amount of high-fidelity training data; the high-fidelity training data comprise three-dimensional face data and their corresponding images;
step S200, based on the acquired high-fidelity training data, fitting a three-dimensional morphable model to the image corresponding to the three-dimensional face data as an initial shape, and performing virtual multi-view generation based on the image and the initial shape to obtain multi-view images;
inputting the multi-view images into a many-to-one funnel network, and optimizing the multi-view images through a vision-consistent loss function to obtain a refined three-dimensional face shape; the augmentation in step S100 specifically includes the steps of:
step S110, completing the incomplete depth channel of the input sample image by gridding, and converting the image into a first three-dimensional grid;
the depth channel comprises a depth channel of the face area and a depth channel of the background area; the values of the depth channel of the face area are acquired as follows: the input sample image is fitted based on the 3DMM to obtain an initial three-dimensional face shape, and a non-rigid registration algorithm is then applied to the initial shape to obtain the accurate face depth;
the values of the depth channel of the background area are obtained based on the original depth channel values and a smoothness constraint; the sample image is an RGB-D image containing a human face;
step S120, randomly selecting eyes, a nose, a mouth, and cheeks from the training set and combining them to obtain a first three-dimensional face shape;
Step S130, replacing the face area in the first three-dimensional grid in step S110 with the first three-dimensional face shape, and adjusting the position of a background anchor point through the smoothness constraint of the background area to obtain a second three-dimensional grid as a three-dimensional structure after shape migration;
step S140, performing shadow migration on the second three-dimensional grid based on a graphical imaging model to obtain a third three-dimensional grid;
and step S150, rendering the third three-dimensional grid to obtain the shape-migrated image, thereby completing the high-fidelity augmentation of the three-dimensional face shape.
2. The scene adaptive fine three-dimensional face reconstruction method according to claim 1, wherein the method for acquiring the values of the depth channels of the background area in step S110 is as follows:
uniformly paving a plurality of anchor points in a background area, constructing a triangular network based on the paved anchor points through a Delaunay triangulation algorithm, and calculating the depth of each anchor point through a preset first method; completing the depth channel of the background area based on the depth of each anchor point;
the preset first method solves the following least-squares problem:

$$\min_{\{d_i\}}\ \sum_{i}\mathrm{Mask}(x_i,y_i)\,\bigl(d_i-\mathrm{Depth}(x_i,y_i)\bigr)^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl(d_i-d_j\bigr)^2$$

where d_i represents the depth of the i-th anchor point, Mask(x_i, y_i) indicates whether the depth channel of the i-th anchor point has a value, Depth(x_i, y_i) is the value of the depth channel of the RGB-D image at the position of the i-th anchor point, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected by an edge of the triangular network.
3. The scene adaptive fine three-dimensional face reconstruction method according to claim 1, wherein the adjustment of the positions of the background anchor points in step S130 solves:

$$\min_{\{p_i\}}\ \sum_{i}\mathrm{FaceContour}(i)\,\bigl\|p_i-p_i^{t}\bigr\|^2\;+\;\sum_{i,j}\mathrm{Connect}(i,j)\,\bigl\|(p_i-p_j)-(p_i^{s}-p_j^{s})\bigr\|^2$$

where p_i^s is the location of the i-th anchor point on the source image, p_i^t is the target location of the i-th anchor point, FaceContour(i) indicates whether the i-th anchor point is located on the facial contour, and Connect(i, j) indicates whether the i-th and j-th anchor points are connected in the background grid.
4. The scene adaptive fine three-dimensional face reconstruction method according to claim 1, wherein the shadow migration in step S140 is specifically:
based on a preset imaging formula, the normal vectors and specular reflection values are replaced by the values after shape migration, while the remaining imaging parameters keep the values of the source image, to obtain the target face after shadow migration;
the preset imaging formula is:

$$C_i=\mathrm{Amb}\cdot T_i+\mathrm{Dir}\cdot T_i\,\langle \mathbf{n}_i,\mathbf{l}\rangle+k_s\,\langle \mathbf{ve},\mathbf{r}_i\rangle^{\nu},\qquad \mathbf{r}_i=2\,\langle \mathbf{n}_i,\mathbf{l}\rangle\,\mathbf{n}_i-\mathbf{l}$$

where C_i is the color value of the i-th point of the three-dimensional shape, T_i is the texture value of the i-th point, Amb is the ambient light, the diagonal matrix Dir represents parallel light in the l direction, n_i is the normal direction of the i-th point, k_s is the specular reflection coefficient, ve is the viewing direction, ν is the angle-distribution parameter controlling the specular reflection, and r_i is the direction of the maximum specular reflection value.
5. The scene adaptive fine three-dimensional face reconstruction method according to claim 1, wherein the step S200 specifically comprises the steps of:
step S210, fitting the input sample image based on the 3DMM to obtain a roughly reconstructed three-dimensional face; gridding the input sample image based on the roughly reconstructed three-dimensional face to obtain a fourth three-dimensional grid; the depth channel values of the face region are taken from the roughly reconstructed three-dimensional face, and the depth channel values of the background region are set to the mean depth of the roughly reconstructed three-dimensional face;
step S220, mirror-image overturning is carried out on the input sample image, the step S210 is repeated to obtain a three-dimensional grid of the image after mirror image, the three-dimensional grid is used as a fifth three-dimensional grid, and the fifth three-dimensional grid is combined with the fourth three-dimensional grid to obtain a sixth three-dimensional grid with complete textures;
Step S230, rendering the sixth three-dimensional mesh shape according to the first preset viewing angle, and calculating the visibility of each triangle mesh to obtain a visibility graph:
step S240, a virtual multi-view image is obtained based on the visibility graph and the rendering result;
step S250, respectively inputting a preset number of virtual views into corresponding encoders to extract image features, and expanding the image features to a UV space to obtain a UV feature group of the multi-view face;
the UV feature groups are concatenated and input into a decoder; the offset between the real three-dimensional face and the roughly reconstructed three-dimensional face is obtained by deconvolution, and the offset is added to the roughly reconstructed three-dimensional face to obtain the reconstructed three-dimensional shape;
step S260, setting the texture value of each point of the reconstructed three-dimensional face to (255, 255, 255), placing it under frontal parallel light, and calculating the RGB value of each point under this illumination; rendering the image from a second preset viewing angle with a differentiable renderer; repeating the rendering with the real three-dimensional face S* as the target face, and obtaining the visual-similarity evaluation index L_psd based on the error between the rendered images;
$$L_{psd} = \sum_{v}\big\|\,R(S_{init}+\Delta S,\ T_w,\ I_{orth};\ v)-R(S^{*},\ T_w,\ I_{orth};\ v)\,\big\|$$

wherein $v$ denotes the corresponding viewing-angle index; $R(\cdot,\cdot,\cdot)$ denotes the rendering function, whose inputs are the three-dimensional shape, the texture and the illumination; $S^{*}$ is the real three-dimensional face; $S_{init}$ is the roughly reconstructed three-dimensional face; $\Delta S$ is the offset; $T_w$ is the all-white texture; and $I_{orth}$ denotes the parallel light in the direction $(0, 0, 1)$.
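Two illustrative sketches for this claim. First, the per-facet visibility of step S230: as a simplification we only test whether each triangle faces the camera (a full implementation would also resolve occlusions, e.g. with a z-buffer); the helper name is ours.

```python
import numpy as np

def triangle_visibility(vertices, faces, view_dir=np.array([0.0, 0.0, 1.0])):
    """vertices: (N, 3); faces: (M, 3) vertex indices. Returns (M,) bool,
    True where a facet faces the viewing direction (back-face test only)."""
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    n = np.cross(v1 - v0, v2 - v0)                        # facet normals
    n /= np.linalg.norm(n, axis=1, keepdims=True) + 1e-12
    return (n @ view_dir) > 0.0
```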
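Second, the L_psd index itself, assuming a renderer callable `render(shape, texture, light, view)` stands in for $R(\cdot,\cdot,\cdot;v)$; this mirrors the sum-over-views rendering error defined above.

```python
import numpy as np

def l_psd(render, s_init, delta_s, s_star, t_white, i_orth, views):
    """Sum over views of the image error between the refined shape
    (S_init + dS) and the real face S*, both rendered with the all-white
    texture T_w under the frontal parallel light I_orth."""
    total = 0.0
    for view in views:
        pred = render(s_init + delta_s, t_white, i_orth, view)
        true = render(s_star, t_white, i_orth, view)
        total += np.abs(pred - true).sum()   # per-view rendering error
    return total
```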
6. The scene adaptive fine three-dimensional face reconstruction method according to claim 1, further comprising a step S300 of fine-tuning the many-to-one funnel network;
The fine-tuning process is specifically as follows:
step S310, grouping the data in the training set by attribute: dividing the data into a plurality of parts based on the distributions of the two attributes of age and gender, and training on each group to obtain dedicated three-dimensional shape models for the two attributes;
wherein the age-associated model group comprises five models, for infants, teenagers, young adults, middle-aged people and elderly people; the gender-associated model group comprises two models, θmale for males and θfemale for females;
step S320, inputting an unlabeled image of the target scene, estimating its age and gender, and matching the corresponding models from the model groups of the two attributes to obtain two model parameters θage and θgender;
step S330, extracting the key points and segmentation masks of the unlabeled images in the target scene;
step S340, performing targeted augmentation on the input sample image to obtain augmented images;
step S350, based on the results of step S330 and step S340, fine-tuning the two model parameters θage and θgender using weakly-supervised constraints and self-supervised constraints to obtain a pseudo-label model;
step S360, generating a plurality of pseudo three-dimensional labels for the target-scene data based on the pseudo-label model, obtaining updated pseudo three-dimensional labels by a mean-fusion strategy, and combining the updated pseudo three-dimensional labels with the source data to retrain the many-to-one funnel network.
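As a concrete rendering of steps S320 and S360 (entirely hypothetical; all helper names are ours), the adaptation loop picks the attribute-matched models for each unlabeled image and mean-fuses their predictions into one pseudo three-dimensional label:

```python
import numpy as np

def make_pseudo_label(image, age_models, gender_models,
                      estimate_age_group, estimate_gender, predict_shape):
    """age_models / gender_models: dicts of fine-tuned model parameters
    (five age groups, two genders); the estimators and predict_shape are
    stand-ins for the attribute classifiers and the shape regressor."""
    theta_age = age_models[estimate_age_group(image)]     # e.g. "elderly"
    theta_gender = gender_models[estimate_gender(image)]  # "male" / "female"
    shapes = [predict_shape(theta, image)
              for theta in (theta_age, theta_gender)]
    return np.mean(shapes, axis=0)                        # mean-fusion strategy
```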
7. A scene-adaptive fine three-dimensional face reconstruction system, comprising: the device comprises an acquisition module, an augmentation module, a fitting module, a generation module and an optimization module;
The acquisition module is configured to acquire a two-dimensional face image to be reconstructed as an input sample image;
the augmentation module is configured to augment an input sample image based on the 3DMM and the graphical imaging model to obtain a plurality of high-fidelity training data;
The fitting module is configured to fit a three-dimensional morphable model to the image corresponding to the three-dimensional face data, based on the acquired high-fidelity training data, to obtain an initial shape;
the generating module is configured to generate virtual multiple views based on the image corresponding to the three-dimensional face data and the initial shape, so as to obtain multi-view images;
the optimizing module is configured to input the multi-view images into a many-to-one funnel network and optimize them through a visual-consistency loss function, so as to obtain a refined three-dimensional face shape;
The augmentation module comprises:
completing the incomplete depth channel in the input sample image and converting the image into a first three-dimensional mesh by meshing;
wherein the depth channel comprises the depth channel of the face region and the depth channel of the background region, and the values of the depth channel of the face region are acquired by: fitting the input sample image based on the 3DMM to obtain an initial three-dimensional face shape, and deriving the face-region depth from the initial three-dimensional face shape by a non-rigid registration algorithm;
the values of the depth channel of the background region are acquired based on the original depth channel values and a smoothness constraint; the sample image is an RGB-D image containing a human face;
randomly selecting eyes, a nose, a mouth and cheeks from the training set and merging them to obtain a first three-dimensional face shape;
replacing the face region in the first three-dimensional mesh with the first three-dimensional face shape, and adjusting the positions of the background anchor points through the smoothness constraint of the background region to obtain a second three-dimensional mesh as the three-dimensional structure after shape migration;
performing shadow migration on the second three-dimensional mesh based on the graphical imaging model to obtain a third three-dimensional mesh;
and rendering the third three-dimensional mesh to obtain a shape-migrated image, so as to complete the high-fidelity three-dimensional face shape augmentation.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the processor to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of claims 1-6.
9. A computer readable storage medium storing computer instructions for execution by the computer to implement the scene-adaptive fine three-dimensional face reconstruction method of any one of claims 1-6.
CN202110601213.XA 2021-05-31 2021-05-31 Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment Active CN113269862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110601213.XA CN113269862B (en) 2021-05-31 2021-05-31 Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110601213.XA CN113269862B (en) 2021-05-31 2021-05-31 Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment

Publications (2)

Publication Number Publication Date
CN113269862A CN113269862A (en) 2021-08-17
CN113269862B true CN113269862B (en) 2024-06-21

Family

ID=77233658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110601213.XA Active CN113269862B (en) 2021-05-31 2021-05-31 Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment

Country Status (1)

Country Link
CN (1) CN113269862B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049442B (en) * 2021-11-19 2024-07-23 北京航空航天大学 Calculation method of line of sight of 3D face
CN114238904B (en) * 2021-12-08 2023-02-07 马上消费金融股份有限公司 Identity recognition method, and training method and device of dual-channel hyper-resolution model
CN114399424B (en) * 2021-12-23 2025-01-07 北京达佳互联信息技术有限公司 Model training methods and related equipment
CN114332330A (en) * 2021-12-24 2022-04-12 珠海豹趣科技有限公司 A flow special effect production method, device, electronic device and medium
CN114119849B (en) * 2022-01-24 2022-06-24 阿里巴巴(中国)有限公司 Three-dimensional scene rendering method, device and storage medium
CN114549728B (en) * 2022-03-25 2025-01-07 北京百度网讯科技有限公司 Image processing model training method, image processing method, device and medium
CN114529679B (en) * 2022-04-19 2022-09-16 清华大学 Method and device for generating computed holographic field based on nerve radiation field
CN114972584B (en) * 2022-06-22 2023-03-24 猫小兜动漫影视(深圳)有限公司 Method, system, equipment and product for constructing three-dimensional cartoon multi-dimensional model
CN115410032B (en) * 2022-07-26 2025-05-13 北京工业大学 OCTA image classification structure training method based on self-supervised learning
CN115457097A (en) * 2022-08-22 2022-12-09 杭州欣禾圣世科技有限公司 Face reconstruction method, system, device and storage medium based on generated images
CN115393514A (en) * 2022-08-26 2022-11-25 北京百度网讯科技有限公司 Training method of three-dimensional reconstruction model, three-dimensional reconstruction method, device and equipment
CN115393486B (en) * 2022-10-27 2023-03-24 科大讯飞股份有限公司 Method, device and equipment for generating virtual image and storage medium
CN115564642B (en) * 2022-12-05 2023-03-21 腾讯科技(深圳)有限公司 Image conversion method, image conversion device, electronic apparatus, storage medium, and program product
CN116109778B (en) * 2023-03-02 2025-07-22 南京大学 Face three-dimensional reconstruction method based on deep learning, computer equipment and medium
CN116433812B (en) * 2023-06-08 2023-08-25 海马云(天津)信息技术有限公司 Method and device for generating virtual characters using 2D face pictures
CN119180744A (en) * 2023-06-21 2024-12-24 中兴通讯股份有限公司 Multi-view image data generation method and device, terminal equipment and storage medium
CN117953224B (en) * 2024-03-27 2024-07-05 暗物智能科技(广州)有限公司 Open vocabulary 3D panorama segmentation method and system
CN119850410B (en) * 2024-12-03 2025-10-03 同济大学 A 3D reconstruction method for objects in agricultural machinery scenes based on self-evolution mechanism
CN120236005A (en) * 2025-02-18 2025-07-01 华南师范大学 Three-dimensional head portrait model construction method, system, device and storage medium
CN119672276B (en) * 2025-02-21 2025-07-08 苏州元脑智能科技有限公司 Three-dimensional model editing method, device, storage medium and program product
CN119723251B (en) * 2025-02-26 2025-07-01 天津大学 Antagonistic sample generation method based on automatic driving multi-sensor fusion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619676A (en) * 2019-09-18 2019-12-27 东北大学 End-to-end three-dimensional face reconstruction method based on neural network
CN111832517A (en) * 2020-07-22 2020-10-27 福建帝视信息科技有限公司 Low-resolution face keypoint detection method based on gated convolution

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10335045B2 (en) * 2016-06-24 2019-07-02 Universita Degli Studi Di Trento Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions
CN107122705B (en) * 2017-03-17 2020-05-19 中国科学院自动化研究所 Face key point detection method based on three-dimensional face model
CN108510573B (en) * 2018-04-03 2021-07-30 南京大学 A method for reconstruction of multi-view face 3D model based on deep learning
CN110516642A (en) * 2019-08-30 2019-11-29 电子科技大学 A lightweight face 3D key point detection method and system
CN111882643A (en) * 2020-08-10 2020-11-03 网易(杭州)网络有限公司 Three-dimensional face construction method and device and electronic equipment
CN112002014B (en) * 2020-08-31 2023-12-15 中国科学院自动化研究所 Fine structure-oriented three-dimensional face reconstruction method, system and device
CN112819947B (en) * 2021-02-03 2025-02-11 Oppo广东移动通信有限公司 Three-dimensional face reconstruction method, device, electronic device and storage medium
CN112767468B (en) * 2021-02-05 2023-11-03 中国科学院深圳先进技术研究院 Self-supervised 3D reconstruction method and system based on collaborative segmentation and data enhancement
CN114332415B (en) * 2022-03-09 2022-07-29 南方电网数字电网研究院有限公司 Three-dimensional reconstruction method and device of power transmission line corridor based on multi-view technology

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619676A (en) * 2019-09-18 2019-12-27 东北大学 End-to-end three-dimensional face reconstruction method based on neural network
CN111832517A (en) * 2020-07-22 2020-10-27 福建帝视信息科技有限公司 Low-resolution face keypoint detection method based on gated convolution

Also Published As

Publication number Publication date
CN113269862A (en) 2021-08-17

Similar Documents

Publication Publication Date Title
CN113269862B (en) Scene self-adaptive fine three-dimensional face reconstruction method, system and electronic equipment
US11538229B2 (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN111598998B (en) Three-dimensional virtual model reconstruction method, three-dimensional virtual model reconstruction device, computer equipment and storage medium
US11823327B2 (en) Method for rendering relighted 3D portrait of person and computing device for the same
US9317970B2 (en) Coupled reconstruction of hair and skin
WO2021174939A1 (en) Facial image acquisition method and system
US9792725B2 (en) Method for image and video virtual hairstyle modeling
US9679192B2 (en) 3-dimensional portrait reconstruction from a single photo
WO2022001236A1 (en) Three-dimensional model generation method and apparatus, and computer device and storage medium
CN111445582A (en) A 3D reconstruction method of single image face based on illumination prior
CN113111861A (en) Face texture feature extraction method, 3D face reconstruction method, device and storage medium
US8670606B2 (en) System and method for calculating an optimization for a facial reconstruction based on photometric and surface consistency
US10169891B2 (en) Producing three-dimensional representation based on images of a person
CN118071968B (en) Intelligent interaction deep display method and system based on AR technology
CN111462030A (en) Multi-image fused stereoscopic set vision new angle construction drawing method
CN111275824A (en) Surface reconstruction for interactive augmented reality
CN118736092A (en) A method and system for rendering virtual human at any viewing angle based on three-dimensional Gaussian splashing
US20250157114A1 (en) Animatable character generation using 3d representations
US10803677B2 (en) Method and system of automated facial morphing for eyebrow hair and face color detection
CN113989434A (en) Human body three-dimensional reconstruction method and device
JP2025107617A (en) Image processing device, image processing method and program
KR102559691B1 (en) Method and device for reconstructing neural rendering-based geometric color integrated 3D mesh
CN119991937A (en) A single-view 3D human body reconstruction method based on Gaussian surface elements
CN114913284A (en) Three-dimensional face reconstruction model training method, device and computer equipment
CN110751026B (en) Video processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant