Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the present invention.
In the field of parametric human body representation, traditional methods based on skeletal skinning are fast and express motion well, but their reconstruction accuracy is limited because they apply only a simple linear dimensionality reduction to the identity deformation space. In addition, they represent motion directly through the relative motion of joints and do not constrain the plausibility of human motion. To this end, an embodiment of the present invention provides a method for representing a three-dimensional human body shape. As shown in Fig. 1, the method mainly includes:
Step 11: preprocess the collected human body mesh data set, deform each mesh to a standard posture, and compute the ACAP deformation representation and the human body deformation features describing rigid-block deformation, to form a training data set.
The preferred embodiment of this step is as follows:
1) Standardize the collected human body mesh data set to obtain human body mesh data with a unified topology, and deform each mesh to the defined standard posture to obtain the corresponding neutral human body mesh.
In the embodiment of the invention, the original human body mesh data can be obtained from the Internet, including SCAPE, FAUST, Dyna, MANO and the like. Mesh data sets collected from different sources have inconsistent mesh representations and need to be converted to a standard topology G = {V, E}, where V is the vertex set and E is the edge set. In this embodiment, the topology of one source data set (e.g., SCAPE) is used as the standard topology. The corresponding ACAP (as-consistent-as-possible) deformation representations are computed from the action meshes in the standard topology (for example, the seventy action meshes in SCAPE) to obtain a set of prior deformation representation bases C of human actions, so that the deformation representation of a human body can be recovered from a set of parameters w. The linear space spanned by Cw is used as the prior space of human body deformation; a set of vertex coordinates p in the standard topology and the rigid transformation parameters, namely a rotation parameter R and a translation parameter t, are then optimized to standardize the human body mesh data sets from different sources.
The normalization process is to solve the following optimization problem:
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are preset weights, and $\|w\|_1$ is a sparse regularization constraint on the parameter $w$;
$E_{prior}$ is a human body deformation prior term determined by the prior deformation representation bases C, which keeps the optimized mesh vertices as consistent with a human body shape as possible; it is expressed as follows:
where $T_i(w)$ is the prior deformation of the neighborhood of the i-th vertex in the standard topology: multiplying the prior deformation representation bases C by w gives an ACAP feature, and converting the i-th vertex component of this feature gives $T_i(w)$; $q_i$ is the position of the i-th vertex in the deformation reference mesh under the standard topology, and $p_i$ is the position of the i-th vertex of the mesh to be optimized under the standard topology; N(i) is the index set of the neighboring vertices of the i-th vertex under the standard topology, j denotes the j-th vertex in N(i), and the positions of that vertex in the deformation reference mesh and in the mesh to be optimized are denoted $q_j$ and $p_j$, respectively; $c_{ij}$ is the edge weight computed on the deformation reference mesh, known as the cotangent weight, specifically the weight of the edge between $q_i$ and $q_j$. In the embodiment of the invention, the deformation reference mesh is the human body mesh required for computing the ACAP deformation representation, and can be selected according to actual conditions or experience; the pre-selected deformation reference mesh is shown on the left of Fig. 2. Those skilled in the art will appreciate that the vertices of the deformation reference mesh correspond one-to-one to the vertices of the standard topology and differ only in position. In the embodiment of the invention, the deformation reference mesh serves as the reference and the mesh under the standard topology is optimized; that is, $q_i$ is fixed, while $p_i$ is optimized and its value changes.
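The prior term can be illustrated numerically. The sketch below assumes the usual form for this kind of energy, $\sum_i \sum_{j \in N(i)} c_{ij} \|(p_i - p_j) - T_i (q_i - q_j)\|^2$, and takes the matrices $T_i$ as given instead of deriving them from Cw. The function name `prior_energy`, the toy tetrahedron and the unit weights standing in for cotangent weights are hypothetical.

```python
import numpy as np

def prior_energy(p, q, neighbors, weights, T):
    """Sum over i and j in N(i) of c_ij * ||(p_i - p_j) - T_i (q_i - q_j)||^2."""
    energy = 0.0
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            residual = (p[i] - p[j]) - T[i] @ (q[i] - q[j])
            energy += weights[(i, j)] * float(residual @ residual)
    return energy

# Toy reference mesh: a tetrahedron in which every vertex neighbors the other three.
q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
neighbors = [[j for j in range(4) if j != i] for i in range(4)]
# Assumption: unit weights stand in for the cotangent weights c_ij.
weights = {(i, j): 1.0 for i in range(4) for j in neighbors[i]}

# Mesh to be optimized: the reference rotated by 90 degrees about z, then translated.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
p = q @ Rz.T + np.array([0.5, -0.2, 1.0])

e_match = prior_energy(p, q, neighbors, weights, [Rz] * 4)            # prior agrees with p
e_mismatch = prior_energy(p, q, neighbors, weights, [np.eye(3)] * 4)  # prior disagrees
```

When the prior deformations explain the edge vectors of the optimized mesh, the energy vanishes regardless of translation; otherwise it is strictly positive, which is what pulls $p$ toward the prior space.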
$E_{icp}$ is a point-to-plane registration energy term between each point and its nearest neighbor on the target mesh, which pulls the optimization result toward the target; it is expressed as:
where D is the index set of the corresponding point pairs, selected by dynamic computation, under the standard topology; $v_{l(i)}$ is the point on the target mesh corresponding to $p_i$, and the normal vector appearing in the formula is the normal at the point $v_{l(i)}$; the target mesh is a human body mesh, taken from the data sets of different sources, that is to be standardized;
$E_{lan}$ is the L2 loss energy term over a set of manually marked sparse corresponding points between the standard topology and the target mesh, which reduces sliding errors and avoids local minima; it is expressed as follows:
where L is the pre-selected set of corresponding points in the standard topology; the above term keeps the spatial distance between each point in L and its counterpart on the target mesh as small as possible.
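Collecting the terms defined in this step, the standardization objective can be sketched in the following form. This is a reconstruction under stated assumptions, not a formula taken from the text: the pairing of the weights $\lambda_1$, $\lambda_2$, $\lambda_3$ with the terms, the symbol $n_{l(i)}$ for the normal at $v_{l(i)}$, and the application of $R$ and $t$ to $p_i$ in the two registration terms are all assumed.

$$
\min_{p,\,w,\,R,\,t}\; E_{prior} + \lambda_1 E_{icp} + \lambda_2 E_{lan} + \lambda_3 \|w\|_1
$$

$$
E_{prior} = \sum_{i} \sum_{j \in N(i)} c_{ij}\, \bigl\| (p_i - p_j) - T_i(w)\,(q_i - q_j) \bigr\|_2^2
$$

$$
E_{icp} = \sum_{i \in D} \bigl( n_{l(i)}^{\top} (R\,p_i + t - v_{l(i)}) \bigr)^2, \qquad
E_{lan} = \sum_{i \in L} \bigl\| R\,p_i + t - v_{l(i)} \bigr\|_2^2
$$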
In the embodiment of the invention, the human body mesh data fitted in this way share a consistent standard topology and have a tolerable registration error relative to the original data.
The human body model data constructed on this basis also requires a corresponding neutral human body mesh to be defined. Illustratively, the consistent A-pose in the SPRING data set may be used as the neutral posture. For all the human body data of each identity, the mesh with the minimum error relative to the SPRING average mesh is selected as the candidate human body mesh, which is then converted to the A-pose using an ARAP (as-rigid-as-possible) deformation method to serve as the neutral human body mesh of that identity.
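The candidate selection can be sketched as below. The error measure is not specified in the text; the sketch assumes the mean per-vertex Euclidean distance between meshes that share the standard topology and are already aligned. The function name `select_candidate` and the toy meshes are hypothetical.

```python
import numpy as np

def select_candidate(meshes, average_mesh):
    """Return the index of the mesh closest to the average mesh, and all errors."""
    errors = [float(np.mean(np.linalg.norm(m - average_mesh, axis=1))) for m in meshes]
    return int(np.argmin(errors)), errors

# Toy data: three 5-vertex meshes of one identity at different offsets from the average.
average_mesh = np.zeros((5, 3))
meshes = [np.full((5, 3), 0.3), np.full((5, 3), 0.1), np.full((5, 3), 0.2)]

best, errors = select_candidate(meshes, average_mesh)
```

The selected mesh would then be deformed to the A-pose by an ARAP method, which is outside the scope of this sketch.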
2) Compute the ACAP deformation representations of each human body mesh and of its corresponding neutral human body mesh, denoted $f$ and $f_s$. According to the skeleton-based articulated deformation characteristics of the human body, also compute the human body deformation features describing rigid-block deformation for the human body mesh and for the corresponding neutral human body mesh, denoted $g$ and $g_s$. This yields one group of training data $\{f, f_s, g, g_s\}$; processing all collected data in the same way yields the training data set.
In this way, human body mesh data with a unified topology are obtained, and each identity has a neutral posture mesh. In the embodiment of the invention, the ACAP deformation representation of geometric shape replaces the original Euclidean-coordinate shape representation, which improves the modeling accuracy for large-scale deformation and gives better performance than a general linear model.
The formula for calculating the ACAP deformation representation of the human body mesh data and of the corresponding neutral human body mesh is as follows:
As before, i indexes a vertex in the standard topology. Since all data have been standardized as described above, the standard topology here may be that of any standardized mesh or the previously selected standard topology. The other symbols have the same meanings as above and are not described again.
$T_i$ in the above formula is the affine transformation matrix of the i-th vertex in the standard topology; it captures all the deformation information that maps the local umbrella-shaped neighborhood structure in the deformation reference mesh to the corresponding structure in the computation mesh. By polar decomposition, $T_i$ is decomposed into a rigid part $R_i$ and a non-rigid part $S_i$ (determined by 3 and 6 degrees of freedom, respectively). After the ambiguity of the rigid deformation part at each vertex is resolved, the ACAP deformation representation of the computation mesh is obtained. Its dimension is 9 times the number of vertices: each vertex uses 3 and 6 parameters to record the rigid and non-rigid deformation, respectively.
Each human body mesh and its corresponding neutral human body mesh are substituted into the above formula as the computation mesh to obtain the corresponding ACAP deformation representations $f$ and $f_s$.
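In line with the symbol definitions above, the per-vertex fit and its polar decomposition can be written as below. This is a sketch: the weighted least-squares form is the usual one for this kind of deformation representation and is assumed here, as is the notation for the constraints on $R_i$ and $S_i$.

$$
T_i = \arg\min_{T} \sum_{j \in N(i)} c_{ij}\, \bigl\| (p_i - p_j) - T\,(q_i - q_j) \bigr\|_2^2, \qquad
T_i = R_i S_i, \quad R_i \in SO(3), \quad S_i = S_i^{\top}
$$

With 3 parameters for the rotation $R_i$ and 6 for the symmetric matrix $S_i$, a mesh with $|V|$ vertices yields a feature of dimension $9|V|$, which matches the dimension stated above.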
In addition, in view of the skeleton-based articulated deformation characteristics of the human body, some approximately rigid parts, such as the forearm and the head, are defined on the human body. On this basis, a large-scale human body deformation feature g describing the deformation of the rigid blocks is defined, and is computed as follows:
where $v_k$ is the vertex set of the k-th rigid block; $q_{i'}$ and $p_{i'}$ are the positions of the i'-th vertex of the k-th rigid block in the deformation reference mesh and in the computation mesh, respectively; and the two mean terms in the formula are the mean vertex positions of the k-th rigid block on the deformation reference mesh and on the computation mesh, respectively. The affine deformation of each rigid block is polar-decomposed and parameterized to obtain the human body deformation feature, whose dimension is 9 times the number of approximately rigid blocks of the human body.
Similarly, each human body mesh and its corresponding neutral human body mesh are substituted into the above formula as the computation mesh to obtain the deformation features $g$ and $g_s$.
Fig. 2 shows the deformation reference mesh used for computing the ACAP deformation representation (left) and the rigid blocks defined for computing the large-scale human body features (right).
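The rigid-block feature can be sketched as follows: fit an affine matrix to the centered block vertices, polar-decompose it, and keep 3 rotation parameters plus 6 entries of the symmetric part. The text fixes only the parameter counts; the least-squares fit, the axis-angle encoding of the rotation and the upper-triangle encoding of the symmetric part are assumptions, and all function names are hypothetical.

```python
import numpy as np

def fit_affine(q_block, p_block):
    """Least-squares affine matrix T mapping centered q vertices onto centered p vertices."""
    qc = q_block - q_block.mean(axis=0)
    pc = p_block - p_block.mean(axis=0)
    # Solve qc @ T.T ~= pc in the least-squares sense.
    T_transposed, *_ = np.linalg.lstsq(qc, pc, rcond=None)
    return T_transposed.T

def polar_decompose(T):
    """Split T into a rotation R and a symmetric matrix S with T = R @ S."""
    U, sigma, Vt = np.linalg.svd(T)
    # Flip the last singular direction if needed so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    S = Vt.T @ D @ np.diag(sigma) @ Vt
    return R, S

def rotation_to_axis_angle(R):
    """Axis-angle vector (3 numbers) of a rotation matrix; valid for angles below pi."""
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-12:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return angle * axis / (2.0 * np.sin(angle))

def block_feature(q_block, p_block):
    """9-D feature of one rigid block: 3 rotation numbers plus 6 entries of S."""
    R, S = polar_decompose(fit_affine(q_block, p_block))
    return np.concatenate([rotation_to_axis_angle(R), S[np.triu_indices(3)]])

# Toy block: five reference vertices, rotated by 90 degrees about z and translated.
q_block = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
theta = np.pi / 2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
p_block = q_block @ Rz.T + np.array([0.2, 0.0, -0.1])

feature = block_feature(q_block, p_block)
```

For a purely rigid motion the symmetric part is the identity, so the last six entries are (1, 0, 0, 1, 0, 1) and the first three carry the rotation. Concatenating the features of all blocks gives a vector of 9 times the number of blocks, as stated above.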
Step 12: construct an encoder network and a hierarchical reconstruction network to form an end-to-end network structure, and train the network structure with the training data set. During training, the encoder network encodes the ACAP deformation representation into an identity attribute and an action attribute, the reconstruction network reconstructs a three-dimensional human body model from these attributes, and the error between the reconstruction result and the input training data is used to train the encoder network and the hierarchical reconstruction network.
In the embodiment of the invention, the encoder network may adopt a standard variational autoencoder structure to learn the identity attribute $e_s$ and the posture attribute $e_p$ from the ACAP deformation representation $f$ of the human body mesh data; it is then combined with the reconstruction network into an end-to-end network structure for training. Of the training data $\{f, f_s, g, g_s\}$, only $f$ is used as the network input, and the rest is used to compute the reconstruction error.
In the embodiment of the invention, the hierarchical reconstruction network is based on human body geometric priors and comprises two parts: the first part hierarchically reconstructs the main part of the three-dimensional human body model, and the second part reconstructs the difference part; the reconstructed main part and difference part are added to obtain the reconstruction result. The reconstruction network is described as:
where the two left-hand sides are, respectively, the reconstruction result obtained jointly from the identity attribute $e_s$ and the posture attribute $e_p$, and the reconstruction result obtained from the identity attribute $e_s$ alone. In the first reconstruction result, one term is the main part $b$, in which $W$ denotes a skin layer, and the other term is the difference part $d$. In the second reconstruction result, one term is the main part $b_s$, likewise obtained through a skin layer, and the other term is the difference part $d_s$.
To reconstruct the main parts $b$ and $b_s$, the human body deformation features of the human body mesh and of the neutral human body mesh are first reconstructed by the following equation:
where the mappings in the equation are independent mapping transformations, which may be modeled, for example, using multilayer perceptrons.
Then, the skin layer $W$ is used to reconstruct the main parts $b$ and $b_s$:
where the skin layer $W$ takes the form of a matrix, and $W_{xy}$ denotes its entry in the x-th row and y-th column; Y is the number of rigid blocks, e.g., Y = 16. The underlying principle is that the deformation of each vertex is obtained as a linear convex combination of the deformations of the rigid blocks associated with that vertex, and the coefficients of the convex combination are determined by training and are not set manually.
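The convex-combination principle can be sketched as below. To keep each row of the skin matrix non-negative and summing to one, the sketch passes unconstrained logits through a row-wise softmax; this parameterization, the function names and the sizes are assumptions, since the text states only that the coefficients form a convex combination and are learned.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax, so each row is non-negative and sums to one."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def skin_layer(logits, block_features):
    """Blend per-block features (Y, 9) into per-vertex features (X, 9)."""
    W = softmax_rows(logits)
    return W @ block_features, W

num_vertices, num_blocks = 4, 16   # X and Y; 16 blocks as in the example above
rng = np.random.default_rng(0)
block_features = rng.normal(size=(num_blocks, 9))

logits = np.zeros((num_vertices, num_blocks))   # would be learned during training
logits[0, 3] = 50.0                             # vertex 0 bound almost entirely to block 3

b, W = skin_layer(logits, block_features)
```

A vertex bound to a single block simply inherits that block's deformation, while a vertex with uniform weights receives the average of all block deformations.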
The difference parts $d$ and $d_s$ are expressed as:
where the mappings in the equation are independent mapping transformations; illustratively, they may be modeled using multilayer perceptrons.
Fig. 3 gives a visualization of the parts computed by the reconstruction network on one reconstructed instance. $B(e_s, e_p)$ and $B(e_s, 0)$ in Fig. 3 are the main parts $b$ and $b_s$ mentioned herein.
In the embodiment of the invention, the network can be trained end to end on the training data set. After training, the decoupled low-dimensional latent representations $e_s$ and $e_p$ can be used to reconstruct a human body, and can also be applied to human body editing, motion transfer and the like, with broad application prospects in fields such as live video streaming, virtual fitting and motion-sensing games.
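The overall data flow of the hierarchical reconstruction can be sketched as a forward pass with untrained random weights. Several choices are assumptions made for illustration only: the latent sizes, the hidden width, the use of one shared pair of perceptrons for both branches, the fixed uniform skin matrix, and obtaining the identity-only branch by setting the posture attribute to zero, as suggested by the notation $B(e_s, 0)$. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a small multilayer perceptron (untrained, illustration only)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the perceptron: tanh on hidden layers, linear output layer."""
    for k, (A, c) in enumerate(params):
        x = x @ A + c
        if k < len(params) - 1:
            x = np.tanh(x)
    return x

X, Y, DIM_S, DIM_P = 10, 16, 4, 6                 # vertices, rigid blocks, latent sizes
block_net = make_mlp([DIM_S + DIM_P, 32, Y * 9])  # attributes -> rigid-block features
diff_net = make_mlp([DIM_S + DIM_P, 32, X * 9])   # attributes -> difference part
W = np.full((X, Y), 1.0 / Y)                      # skin layer; uniform convex weights here

def reconstruct(e_s, e_p):
    """Return the reconstructed ACAP feature and the reconstructed block feature."""
    z = np.concatenate([e_s, e_p])
    g_hat = mlp(block_net, z).reshape(Y, 9)
    b = W @ g_hat                                 # main part from the skin layer
    d = mlp(diff_net, z).reshape(X, 9)            # difference part
    return (b + d).reshape(-1), g_hat.reshape(-1)

e_s, e_p = rng.normal(size=DIM_S), rng.normal(size=DIM_P)
f_hat, g_hat = reconstruct(e_s, e_p)                   # posed body
f_hat_s, g_hat_s = reconstruct(e_s, np.zeros(DIM_P))   # neutral body of the same identity
```

The coarse block features drive most of the deformation through the skin layer, and the difference part only has to model what the blended rigid blocks cannot express.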
During training, the L1-norm loss between the reconstructed and input features and a distribution regularization loss on the latent variables may be used as the loss functions.
The L1-norm loss can be expressed as:
In the above formula, the reconstructed quantities are the reconstruction results of the corresponding items in the training data; the specific values 9 (the number of deformation feature parameters) and 16 (the number of rigid blocks) appearing in the formula are examples only and are not limiting.
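An L1 reconstruction loss over the four training items can be sketched as below. Equal weighting of the four items and an unnormalized sum are assumptions, since the text does not state how the items are weighted; the function name and the toy values are hypothetical.

```python
import numpy as np

def l1_reconstruction_loss(recon, target):
    """Sum of L1 norms of the residuals over the items f, f_s, g and g_s."""
    return sum(float(np.abs(recon[k] - target[k]).sum()) for k in ("f", "f_s", "g", "g_s"))

# Toy items: two vertices (2 * 9 entries) and one rigid block (9 entries).
target = {"f": np.zeros(18), "f_s": np.zeros(18), "g": np.zeros(9), "g_s": np.zeros(9)}
recon = {"f": np.full(18, 0.5), "f_s": np.zeros(18), "g": np.full(9, -1.0), "g_s": np.zeros(9)}

loss = l1_reconstruction_loss(recon, target)
```

Here the residual of f contributes 18 times 0.5 and the residual of g contributes 9 times 1, so the loss is 18.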
The distribution regularization losses on the latent variables are:
$E_{sKL} = D_{KL}\bigl(q(e_s \mid f) \,\|\, p(e_s)\bigr)$
$E_{pKL} = D_{KL}\bigl(q(e_p \mid f) \,\|\, p(e_p)\bigr)$
These two losses are the standard KL-divergence losses of a variational autoencoder, which constrain the latent distributions to match the prior distributions.
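When the encoder outputs a diagonal Gaussian and the prior is a standard normal distribution, each KL term has a closed form. Both conditions are the common variational autoencoder setup and are assumed here, as the text does not name the prior; the function name is hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var))

# A latent distribution equal to the prior incurs no penalty.
kl_zero = kl_to_standard_normal(np.zeros(4), np.zeros(4))
# Shifting two unit-variance dimensions by one unit each costs 0.5 per dimension.
kl_shifted = kl_to_standard_normal(np.array([1.0, -1.0]), np.zeros(2))
```

The same function would be applied separately to the encoder outputs for $e_s$ and for $e_p$ to obtain $E_{sKL}$ and $E_{pKL}$.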
Step 13: after training, input an identity attribute and an action attribute into the trained hierarchical reconstruction network to obtain a reconstructed three-dimensional human body model.
After the network has been trained in step 12, the encoder network can be discarded, and the reconstruction network is used directly as a decoupled low-dimensional parametric human body model. The model reconstructs a complete human body mesh from two groups of decoupled parameters representing identity and posture, respectively. Specifically, from the input identity attribute and posture attribute, the trained hierarchical reconstruction network reconstructs, in the manner introduced for the training phase, the reconstruction result for the posed human body and/or the reconstruction result for the neutral human body. Both are ACAP deformation representations, from which the corresponding human body mesh and neutral human body mesh can be obtained by a simple conversion. The conversion method can be found in the prior art and is not described in detail.
Compared with the traditional human body parameterized model representation method, the scheme of the embodiment of the invention mainly has the following advantages:
1) The input and output features use a nonlinear deformation representation in place of traditional Euclidean coordinates, which gives higher accuracy and greater robustness to large-scale deformation.
2) The strong fitting capability of neural networks, combined with a network architecture designed around human body deformation priors, further improves the reconstruction accuracy of the model.
3) By learning from a large number of human body models in varied postures, the obtained latent posture representation carries certain semantics; that is, plausible human actions are embedded in a low-dimensional space. By contrast, the posture parameters of conventional models often carry no semantics and may produce implausible human actions.
From the description of the above embodiments, those skilled in the art will clearly understand that the embodiments may be implemented in software, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments can be embodied as a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) and includes several instructions that cause a computer device (such as a personal computer, a server or a network device) to execute the methods of the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.