
HK1202690B - Method and device for virtual wearing

Method and device for virtual wearing

Info

Publication number
HK1202690B
HK1202690B (application HK15103138.0A)
Authority
HK
Hong Kong
Prior art keywords
face
current frame
frame
image
initial
Prior art date
Application number
HK15103138.0A
Other languages
Chinese (zh)
Other versions
HK1202690A1 (en)
Inventor
张斯聪
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Filing date
Publication date
Priority claimed from CN201410270449.XA (CN104217350B)
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of HK1202690A1
Publication of HK1202690B

Description

Method and device for realizing virtual try-on
Technical Field
The present invention relates to computer technologies, and in particular, to a method and an apparatus for implementing virtual try-on.
Background
With the development of e-commerce, online shopping has become an increasingly common choice for users. Apparel is a major category of consumer goods and a frequent target of online shopping. Because buyers normally want to try clothes on before purchasing, virtual fitting technologies have emerged.
Current virtual try-on technology is mainly realized in two ways:
1. Try-on with an artificially synthesized model
This approach renders the virtual commodity on a pre-generated model of a human body or a body part to provide virtual try-on for the user. Because it uses none of the user's actual body information, the try-on effect is poor.
2. Try-on based on real body information collected by special equipment
This approach uses special equipment, such as a depth sensor, to collect the user's actual body information and build a model of the body or a body part for try-on. Although it captures the user's real physical information, it requires special equipment, usually available only at a venue provided by a merchant, whereas the typical user has only an ordinary image acquisition device such as the camera on a mobile phone or computer.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for implementing virtual try-on that enable a user to complete virtual try-on with an ordinary image capturing device, such as the camera on a mobile phone or a computer.
To achieve the above object, according to one aspect of the present invention, there is provided a method of implementing virtual try-on.
The method for implementing virtual try-on comprises: performing face detection on an acquired initial frame and, when a face is detected, generating an article image at an initial position and outputting the article image superimposed on the initial frame, wherein the initial position coincides with a specified position of the face range in the initial frame; performing face pose detection on the face in the current frame to obtain the face pose of the current frame; and regenerating the article image according to its current position and the face pose so that the article pose in the article image is consistent with the face pose, and then outputting the article image superimposed on the current frame.
Optionally, performing face pose detection on the face in the current frame to obtain the face pose of the current frame includes: determining a plurality of feature points on the face image in the initial frame; performing the following processing for each feature point: tracking the feature point to determine its position in the current frame, applying an affine transformation to the feature point's neighborhood in the initial frame according to the face pose of the previous frame to obtain the neighborhood's projection area in the current frame, and computing the color offset between the neighborhood in the initial frame and the projection area in the current frame as the feature point's tracking deviation; selecting, from the determined feature points, those with smaller tracking deviation; and determining the face pose of the current frame from the positions of the selected feature points in the initial frame and in the current frame.
Optionally, selecting the feature points with smaller tracking deviation includes: clustering the determined tracking deviations of the feature points into two classes, taking the maximum and minimum values as initial centers; and selecting the feature points belonging to the class with the smaller tracking deviation.
Optionally, after determining the face pose of the current frame, the method further includes: projecting the feature points of the larger-deviation class onto the image plane of the current frame according to the face pose of the current frame, and replacing those points' positions in the current frame with the projected positions.
Optionally, before performing face detection on the acquired initial frame, the method further includes: taking the acquired current frame as the initial frame when a reset instruction is received. After clustering the tracking deviations into two classes, the method further includes: outputting prompt information, and subsequently receiving a reset instruction, when the ratio of the number of feature points in the smaller-deviation class to the total number of feature points is smaller than a first preset value, or the ratio of the number of feature points acquired in the current frame to the number acquired in the previous frame is smaller than a second preset value.
Optionally, the article image is a glasses image, a headwear image, or a neckwear image.
According to another aspect of the invention, an apparatus for implementing virtual try-on is provided.
The apparatus for implementing virtual try-on comprises: a face detection module for performing face detection on the acquired initial frame; a first output module for generating an article image at an initial position when the face detection module detects a face, and outputting the article image superimposed on the initial frame, wherein the initial position coincides with the specified position of the face range in the initial frame; a face pose detection module for performing face pose detection on the face in the current frame to obtain the face pose of the current frame; and a second output module for regenerating the article image according to its current position and the face pose so that the article pose in the article image is consistent with the face pose, and then outputting the article image superimposed on the current frame.
Optionally, the face pose detection module is further configured to: determine a plurality of feature points on the face image in the initial frame; perform the following processing for each feature point: track the feature point to determine its position in the current frame, apply an affine transformation to the feature point's neighborhood in the initial frame according to the face pose of the previous frame to obtain the neighborhood's projection area in the current frame, and compute the color offset between the neighborhood in the initial frame and the projection area in the current frame as the feature point's tracking deviation; select, from the determined feature points, those with smaller tracking deviation; and determine the face pose of the current frame from the positions of the selected feature points in the initial frame and in the current frame.
Optionally, the face pose detection module is further configured to: cluster the determined tracking deviations of the feature points into two classes, taking the maximum and minimum values as initial centers; and select the feature points belonging to the class with the smaller tracking deviation.
Optionally, the apparatus further comprises a modification module for, after the face pose detection module determines the face pose of the current frame, projecting the feature points of the larger-deviation class onto the image plane of the current frame according to that pose and replacing those points' positions in the current frame with the projected positions.
Optionally, the apparatus further comprises a reset module and a prompt module, wherein: the reset module is used for receiving a reset instruction and, when one is received, taking the acquired current frame as the initial frame; and the prompt module is used for outputting prompt information when, after the face pose detection module clusters the tracking deviations into two classes, the ratio of the number of feature points in the smaller-deviation class to the total number of feature points is smaller than a first preset value, or the ratio of the number of feature points acquired in the current frame to the number acquired in the previous frame is smaller than a second preset value.
Optionally, the article image is a glasses image, a headwear image, or a neckwear image.
According to the technical scheme of the invention, the face pose of each frame is detected and the glasses pose is adjusted accordingly, so a user can complete virtual try-on with an ordinary image acquisition device and can turn the head to view the wearing effect from multiple angles, giving a more realistic result.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting it. In the drawings:
FIG. 1 is a schematic diagram of the basic steps of a method of implementing virtual try-on according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the main steps of face pose detection according to an embodiment of the invention;
FIG. 3 is a schematic illustration of collected feature points according to an embodiment of the invention;
FIGS. 4A and 4B are schematic diagrams of texture region fetching in an initial frame and in a current frame, respectively, according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the basic structure of an apparatus for implementing virtual try-on according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings; various details of the embodiments are included to assist understanding and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Descriptions of well-known functions and constructions are likewise omitted for clarity and conciseness.
The virtual try-on technology of this embodiment can be applied to a mobile phone with a camera, or to a computer (including a tablet computer) connected to or equipped with a camera, and can realize try-on of articles such as glasses and ornaments. In this embodiment, trying on glasses is described as an example. In use, the user selects the glasses to try on, aims the camera at his or her face, and taps the screen or a designated key; the camera then captures the user's head image, and the glasses are displayed over the eyes of that image. The user can tap the glasses on the screen and drag them to further adjust their position relative to the eyes, and can turn the head up and down or left and right to view the wearing effect from various angles. Throughout this process the technology of this embodiment keeps the pose of the glasses in the on-screen glasses image consistent with the pose of the face, so that the glasses track the face's movement and appear fixed on it. The technical solution of an embodiment of the present invention is described below.
Fig. 1 is a schematic diagram of the basic steps of a method of implementing virtual try-on according to an embodiment of the present invention. As shown in fig. 1, the method mainly includes steps S11 to S17 as follows.
Step S11: collect an initial frame. Acquisition may start automatically when the camera is started, or in response to a user operation instruction, for example tapping the touch screen or pressing any key (or a designated key) on the keyboard.
Step S12: perform face detection on the initial frame. Various existing face detection methods can be used to confirm that the initial frame contains a face and to determine the approximate range of the face, which can be represented by the face's bounding rectangle.
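As an illustration only: a minimal face-detection sketch in Python with OpenCV's bundled Haar cascade. The cascade file and the detection parameters are illustrative choices; the embodiment only requires some existing detector that yields a bounding rectangle.

```python
import cv2

# Illustrative detector choice; the embodiment only requires an existing
# face detection method that returns the face's bounding rectangle.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return the bounding rectangle (x, y, w, h) of the largest detected
    face, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda r: r[2] * r[3])  # largest face by area
```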
Step S13: generate a glasses image and superimpose it on the initial frame. Which glasses image is generated is selected by the user, for example by tapping one of several glasses icons shown on the screen. In this embodiment, the division point located at 0.30 to 0.35 of the total height of the face range, measured from the upper end of that range, is preset as the eye position. When the glasses image is superimposed on the initial frame, its initial position is placed at this preset eye position; the user can then fine-tune the glasses shown on the face by dragging the glasses image.
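A minimal sketch of the eye-position rule just described, assuming the face range is the rectangle returned by a detector such as the hypothetical detect_face above; the default ratio 0.32 is an illustrative midpoint of the 0.30 to 0.35 range.

```python
def initial_glasses_anchor(face_rect, ratio=0.32):
    """Initial placement of the glasses image: horizontally centered on the
    face rectangle, at `ratio` of the rectangle's height below its top edge
    (the embodiment specifies 0.30 to 0.35)."""
    x, y, w, h = face_rect
    return (x + w // 2, y + int(h * ratio))
```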
Step S14: the current frame is collected.
Step S15: perform face pose detection on the face in the current frame. Various existing face pose (head pose) detection techniques can be used. The face pose can be described by rotation parameters R (R0, R1, R2) together with translation parameters T (T0, T1, T2), which represent, in a spatial rectangular coordinate system, the rotation angles on the three coordinate planes and the translation lengths along the three coordinate axes relative to the initial position. In this embodiment, the initial position of the face image is its position in the initial frame, so for each current frame the face pose, i.e. the above rotation and translation parameters, is obtained by comparing the current frame with the initial frame. That is, the face pose of each frame after the initial frame is expressed relative to the face pose in the initial frame.
Step S16: regenerate the glasses image from its current position and the face pose detected in step S15. In this step the pose of the glasses in the glasses image must match the face pose, so the current position of the glasses image is used as the starting position, the final rotation and translation of the glasses are determined from the rotation and translation parameters of the face pose, and the glasses image is then generated accordingly.
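One possible realization of this step, sketched under stated assumptions: the glasses overlay is treated as a textured plane whose four 3D corners (GLASSES_3D, hypothetical coordinates) are projected with the detected pose, and the flat overlay image is warped to the projected quadrilateral. The camera matrix K and the crude mask-based blending are assumptions of this sketch, not the embodiment's prescribed method.

```python
import cv2
import numpy as np

# Hypothetical 3D corners of the glasses plane in the initial face
# coordinate system (arbitrary units), ordered TL, TR, BR, BL.
GLASSES_3D = np.float32([[-60, -15, 0], [60, -15, 0],
                         [60, 15, 0], [-60, 15, 0]])

def render_glasses(frame, overlay, rvec, tvec, K):
    """Warp the flat glasses overlay so its pose matches the detected face
    pose (rvec, tvec) and blend it onto the current frame."""
    pts, _ = cv2.projectPoints(GLASSES_3D, rvec, tvec, K, None)
    dst = np.float32(pts.reshape(4, 2))
    h, w = overlay.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(overlay, H, (frame.shape[1], frame.shape[0]))
    mask = warped.sum(axis=2) > 0  # crude mask; real code would use PNG alpha
    frame[mask] = warped[mask]
    return frame
```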
Step S17: superimpose the glasses image generated in step S16 on the current frame and output the result. Having undergone the processing of step S16, the output glasses image is already located near the eyes of the face in the current frame, and at this point the glasses image has been superimposed on the current frame. Each frame collected thereafter is processed by the same flow, i.e. the process returns to step S14.
With the glasses image superimposed on the current frame, the user sees the state shown in fig. 3. For clarity, the figure shows a simple black-and-white line portrait 30 instead of an image captured by an actual camera; the figure wears glasses 32. Besides glasses, the scheme can realize try-on of ornaments such as earrings and necklaces; for trying on a necklace, the captured image needs to include the neck.
The manner of face pose detection employed in the present embodiment is described below with reference to fig. 2. Fig. 2 is a schematic diagram of the main steps of face pose detection according to an embodiment of the present invention. As shown in fig. 2, the method mainly includes steps S20 to S29 as follows.
Step S20: determine a plurality of feature points on the face image in the initial frame. Because the feature points are tracked in subsequent steps, they are selected here with ease of tracking in mind: points surrounded by rich texture, or points with a large color gradient, remain easy to recognize when the position of the face changes. Reference may be made to the following documents:
Jean-Yves Bouguet, "Pyramidal Implementation of the Lucas Kanade Feature Tracker: Description of the algorithm", Technical Report, Microprocessor Research Labs, Intel Corporation (1999);
Jianbo Shi and Carlo Tomasi, "Good features to track", Proc. IEEE Comput. Soc. Conf. Comput. Vision and Pattern Recogn., pages 593-600, 1994.
The collected feature points are shown in fig. 3, where the small circles, such as circle 31, represent the acquired feature points. Next, for each feature point, the deviation of its texture region is determined; this deviation is in effect the tracking error of the feature point.
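A minimal sketch of such feature selection, using the Shi-Tomasi criterion from the "Good features to track" paper cited above, restricted to the detected face rectangle; maxCorners, qualityLevel, and minDistance are illustrative values.

```python
import cv2
import numpy as np

def pick_feature_points(init_gray, face_rect, max_pts=60):
    """Select trackable corner points inside the face region of the
    initial frame (Shi-Tomasi criterion)."""
    x, y, w, h = face_rect
    roi_mask = np.zeros_like(init_gray)
    roi_mask[y:y + h, x:x + w] = 255
    return cv2.goodFeaturesToTrack(init_gray, maxCorners=max_pts,
                                   qualityLevel=0.01, minDistance=7,
                                   mask=roi_mask)  # float32, shape (N, 1, 2)
```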
Step S21: take one feature point as the current feature point. The feature points can be numbered and processed in numerical order. Steps S22 to S24 process a single feature point.
Step S22: track the current feature point to determine its position in the current frame. Various conventional feature-point tracking methods can be used, such as optical flow tracking (for example the Lucas & Kanade method), template matching, particle filtering, or feature point detection. Every tracking algorithm carries some error in practice, and it is difficult to guarantee that every feature point is accurately located in a new frame. This embodiment therefore augments the tracking: for each feature point, the difference between its neighborhood within a certain range in the initial frame (called the texture region in the following steps) and the corresponding range in the current frame is compared to judge whether the point was tracked accurately, as described in the next step.
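A minimal sketch of the optical flow option, using OpenCV's pyramidal Lucas & Kanade implementation (the method of the Bouguet report cited earlier); the window size and pyramid depth are illustrative.

```python
import cv2

def track_points(prev_gray, cur_gray, prev_pts):
    """Track the feature points into the current frame; returns the new
    positions and a boolean array marking points the tracker itself
    considers found (the texture-region check below refines this)."""
    cur_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)
    return cur_pts, status.ravel() == 1
```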
Step S23: apply an affine transformation to the feature point's neighborhood in the initial frame, according to the face pose of the previous frame, to obtain the neighborhood's projection area in the current frame. Since the face inevitably rotates to some degree between the two frames, an affine transformation is preferably applied to make the local areas of the two frames comparable. Referring to fig. 4A and 4B, which respectively show how texture regions are taken in the initial frame and in the current frame, a rectangular region centered on a feature point is generally used as that point's texture region. As shown in fig. 4A and 4B, the texture region of feature point 45 (the white point) in initial frame 41 (partially shown) is rectangle 42, whereas its texture region in current frame 43 is trapezoid 44. By the current frame the face has rotated to the left by some angle; if a texture region the size of rectangle 42 were still taken around feature point 45 in fig. 4B, too large a range of pixels would be acquired, possibly even including background. The texture region of the feature point in the initial frame is therefore preferably projected onto the plane of the current frame by affine transformation, so that the point's texture regions in different frames are comparable. The feature point's texture region in the current frame is thus the projection area mentioned above.
Step S24: compute the color offset between the current feature point's texture region in the initial frame and that region's projection area in the current frame; this color offset is the feature point's tracking deviation. In the computation, the gray values of all pixels in the feature point's initial-frame texture region are concatenated, row by row or column by column, into a vector whose length is the total number of pixels in the texture region. The pixels of the projection area are likewise concatenated by rows or columns and divided into that same number of equal bins; from each bin the gray value of the pixel occupying the largest share is taken, and these values are concatenated into a second vector of equal length. The distance between the two vectors yields a value whose magnitude reflects the feature point's tracking deviation. Since only the tracking offset is needed, using gray values rather than RGB values gives shorter vectors and reduces computation. The vector distance can be the Euclidean distance, Mahalanobis distance, cosine distance, correlation coefficient, etc. The flow then proceeds to step S25.
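A simplified sketch of this computation, assuming the Euclidean distance option. Instead of the equal-division resampling described above, it warps the projected quadrilateral back onto the rectangle with a perspective transform, which serves the same purpose of producing two gray-value vectors of equal length; rect corresponds to rectangle 42 and quad to trapezoid 44 in fig. 4.

```python
import cv2
import numpy as np

def tracking_deviation(init_gray, cur_gray, rect, quad):
    """rect: (x0, y0, x1, y1) texture region in the initial frame.
    quad: 4x2 corners (TL, TR, BR, BL) of its projection in the current
    frame. Returns the gray-value vector distance used as the tracking
    deviation."""
    x0, y0, x1, y1 = rect
    patch = init_gray[y0:y1, x0:x1].astype(np.float32)
    h, w = patch.shape
    # Map the quadrilateral in the current frame back onto a w x h patch.
    H = cv2.getPerspectiveTransform(
        np.float32(quad), np.float32([[0, 0], [w, 0], [w, h], [0, h]]))
    resampled = cv2.warpPerspective(cur_gray, H, (w, h)).astype(np.float32)
    return float(np.linalg.norm(patch.ravel() - resampled.ravel()))
```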
Step S25: judge whether all feature points have been processed; if yes, go to step S26, otherwise return to step S21.
Step S26: cluster the tracking deviations of all feature points into two classes by magnitude. Any clustering method can be used, for example K-means. The maximum and minimum tracking deviations over all feature points are used as the initial centers, so that the deviations cluster into a larger-deviation class and a smaller-deviation class.
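A minimal sketch of the seeded clustering: plain one-dimensional K-means with two classes, initialized from the minimum and maximum deviations as the text specifies.

```python
import numpy as np

def split_valid(deviations, iters=10):
    """Cluster tracking deviations into two classes; returns a boolean
    mask that is True for the smaller-deviation (valid) class."""
    d = np.asarray(deviations, dtype=np.float64)
    lo, hi = d.min(), d.max()  # initial centers per the embodiment
    if lo == hi:
        return np.ones(d.shape, dtype=bool)
    valid = np.abs(d - lo) <= np.abs(d - hi)  # assign to the nearer center
    for _ in range(iters):
        if valid.all() or not valid.any():
            break
        lo, hi = d[valid].mean(), d[~valid].mean()
        valid = np.abs(d - lo) <= np.abs(d - hi)
    return valid
```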
Step S27: according to the clustering result of step S26, take the feature points with the smaller tracking deviation as valid feature points; the remaining feature points are treated as invalid feature points.
Step S28: compute the coordinate transformation relation of the valid feature points from the initial frame to the current frame. The coordinate transformation relation is represented by a matrix P. Various algorithms can be used, such as the Levenberg-Marquardt algorithm; see Z. Zhang, "A flexible new technique for camera calibration", IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330-1334, 2000. Other applicable algorithms include:
F. Moreno-Noguer, V. Lepetit and P. Fua, "EPnP: Efficient Perspective-n-Point Camera Pose Estimation";
X.S. Gao, X.-R. Hou, J. Tang and H.-F. Chang, "Complete Solution Classification for the Perspective-Three-Point Problem".
Step S29: obtain the face pose of the current frame from the coordinate transformation relation of step S28 and the face pose in the initial frame; that is, the rotation parameters Rn and translation parameters Tn of the current frame (the nth frame) are computed from the matrix P and the initial rotation and translation parameters R and T.
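As an illustration, the sketch below combines steps S28 and S29, recovering the current pose directly from the valid point correspondences with OpenCV's EPnP solver (one of the algorithms cited above). Treating the valid points' initial-frame positions as a planar 3D model and assuming a known camera matrix K are simplifications of this sketch.

```python
import cv2
import numpy as np

def estimate_pose(init_pts, cur_pts, K):
    """init_pts: (N, 2) valid feature point positions in the initial frame,
    lifted to z = 0 as a planar model (a simplification for this sketch).
    cur_pts: (N, 2) tracked positions in the current frame.
    K: 3x3 camera intrinsic matrix. Requires at least four points."""
    obj = np.hstack([np.float32(init_pts),
                     np.zeros((len(init_pts), 1), np.float32)])
    ok, rvec, tvec = cv2.solvePnP(obj, np.float32(cur_pts), K, None,
                                  flags=cv2.SOLVEPNP_EPNP)
    return (rvec, tvec) if ok else (None, None)
```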
The above describes one way of computing the face pose in the current frame; other face pose detection algorithms can also be used to obtain it. The invalid feature points can then be corrected using the face pose of the current frame: new coordinates for each invalid feature point are computed from the rotation parameters Rn, the translation parameters Tn, and the point's coordinates in the initial frame, and these new coordinates replace the point's coordinates in the current frame. The coordinates of all feature points in the current frame, after this replacement, are used for the data processing of the next frame, which helps improve its accuracy. Alternatively, only the valid feature points of the current frame can be used for processing the next frame, but this reduces the amount of available data.
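A minimal sketch of this correction, reusing the planar-model assumption of the previous sketch: the invalid points' initial-frame coordinates are re-projected with the current pose and the projections replace their tracked positions.

```python
import cv2
import numpy as np

def correct_invalid(init_pts_invalid, rvec, tvec, K):
    """Project invalid feature points into the current frame using the
    estimated pose (Rn, Tn); the results replace their unreliable tracked
    coordinates for use in the next frame."""
    obj = np.hstack([np.float32(init_pts_invalid),
                     np.zeros((len(init_pts_invalid), 1), np.float32)])
    proj, _ = cv2.projectPoints(obj, rvec, tvec, K, None)
    return proj.reshape(-1, 2)
```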
In this manner, the glasses image is superimposed on each frame, so the user sees the glasses "worn" on the face while rotating the head. If the user moves the head violently, causing an excessive pose change, especially in low light, the feature points are hard to track accurately and the glasses on the screen drift out of position relative to the eyes. In this case the user can be prompted to perform a reset operation, for example tapping the screen or a designated key again, whereupon the camera captures the user's head image and presents the glasses at its eyes. The user's operation sends a reset instruction; after the mobile phone or computer receives it, the current frame acquired by the camera is taken as the initial frame and processed as described above. During processing, if the proportion of valid feature points falls below a set value, for example 60%, or the ratio of feature points acquired in the current frame to those acquired in the previous frame falls below a set value, for example 30%, prompt information such as the text "tap the screen to reset" is output, prompting the user to "try on" the glasses again.
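A minimal sketch of the prompt condition, using the 60% and 30% example thresholds from the text above:

```python
def needs_reset(n_valid, n_total, n_tracked, n_prev,
                min_valid_ratio=0.60, min_tracked_ratio=0.30):
    """True when the user should be prompted to tap the screen and reset,
    i.e. when too few feature points remain reliable."""
    return (n_valid / n_total < min_valid_ratio
            or n_tracked / n_prev < min_tracked_ratio)
```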
Fig. 5 is a schematic diagram of the basic structure of an apparatus for implementing virtual try-on according to an embodiment of the present invention. The apparatus can be installed as software in a mobile phone or a computer. As shown in fig. 5, the apparatus 50 for implementing virtual try-on mainly includes a face detection module 51, a first output module 52, a face pose detection module 53, and a second output module 54.
The face detection module 51 is configured to perform face detection on the acquired initial frame. The first output module 52 is configured to generate an article image at an initial position when the face detection module 51 detects a face, and to output the article image superimposed on the initial frame, where the initial position coincides with the specified position of the face range in the initial frame. The face pose detection module 53 is configured to perform face pose detection on the face in the current frame to obtain the face pose of the current frame. The second output module 54 is configured to regenerate the article image according to its current position and the face pose so that the article pose in the article image is consistent with the face pose, and then to output the article image superimposed on the current frame.
The face pose detection module 53 may also be configured to: determine a plurality of feature points on the face image in the initial frame; perform the following processing for each feature point: track the feature point to determine its position in the current frame, apply an affine transformation to the feature point's neighborhood in the initial frame according to the face pose of the previous frame to obtain the neighborhood's projection area in the current frame, and compute the color offset between the neighborhood in the initial frame and the projection area in the current frame as the feature point's tracking deviation; select, from the determined feature points, those with smaller tracking deviation; and determine the face pose of the current frame from the positions of the selected feature points in the initial frame and in the current frame.
The face pose detection module 53 may also be configured to: cluster the determined tracking deviations of the feature points into two classes, taking the maximum and minimum values as initial centers; and select the feature points belonging to the class with the smaller tracking deviation.
The apparatus 50 for implementing virtual try-on may further include a modification module (not shown in the figure) for, after the face pose detection module determines the face pose of the current frame, projecting the feature points of the larger-deviation class onto the image plane of the current frame according to that pose and replacing those points' positions in the current frame with the projected positions.
The apparatus 50 for implementing virtual try-on may further include a reset module and a prompt module (not shown in the figure), wherein: the reset module is used for receiving a reset instruction and, when one is received, taking the acquired current frame as the initial frame; and the prompt module is used for outputting prompt information when, after the face pose detection module clusters the tracking deviations into two classes, the ratio of the number of feature points in the smaller-deviation class to the total number of feature points is smaller than a first preset value, or the ratio of the number of feature points acquired in the current frame to the number acquired in the previous frame is smaller than a second preset value.
According to the technical scheme of the embodiment of the invention, the face pose of each frame is detected and the glasses pose is adjusted accordingly, so a user can complete virtual try-on with an ordinary image acquisition device and can turn the head to view the wearing effect from multiple angles, giving a more realistic result.
While the principles of the invention have been described in connection with specific embodiments, it should be noted that those skilled in the art will understand, after reading this description and using basic programming skills, that all or any of the steps or components of the method and apparatus of the invention may be implemented in hardware, firmware, software, or any combination thereof, on any computing device (including processors, storage media, etc.) or network of computing devices.
Thus, the objects of the invention may also be achieved by running a program or a set of programs on any computing device, which may be a well-known general-purpose device. The object of the invention may therefore also be achieved merely by providing a program product containing program code that implements the method or apparatus. That is, such a program product constitutes the present invention, as does a storage medium storing it; the storage medium may be any known storage medium or any storage medium developed in the future.
It is further noted that, in the apparatus and method of the present invention, each component or step can evidently be decomposed and/or recombined, and such decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of the series of processes described above may naturally be executed chronologically in the order described, but need not be; some steps may be performed in parallel or independently of one another.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for implementing virtual try-on, comprising:
performing face detection on an acquired initial frame and, when a face is detected, generating an article image at an initial position and outputting the article image superimposed on the initial frame, wherein the initial position coincides with a specified position of the face range in the initial frame;
performing face pose detection on the face in the current frame to obtain the face pose of the current frame;
regenerating the article image according to the current position of the article image and the face pose so that the article pose in the article image is consistent with the face pose, and then superimposing the article image on the current frame and outputting the result;
wherein the step of performing face pose detection on the face in the current frame to obtain the face pose of the current frame comprises:
determining a plurality of feature points on the face image in the initial frame;
performing the following processing for each feature point:
tracking the feature point to determine its position in the current frame,
performing affine transformation on the neighborhood of the feature point in the initial frame according to the face pose of the previous frame to obtain the projection area of the neighborhood in the current frame,
calculating the color offset between the neighborhood in the initial frame and the projection area in the current frame as the tracking deviation of the feature point;
selecting, from the plurality of determined feature points, a plurality of feature points with smaller tracking deviation;
and determining the face pose of the current frame according to the positions of the selected feature points in the initial frame and their positions in the current frame.
2. The method according to claim 1, wherein the step of selecting, for the plurality of feature points, a plurality of feature points with a smaller tracking deviation comprises:
clustering the determined tracking deviations of the plurality of feature points into two classes, taking the maximum value and the minimum value as initial centers;
and selecting the feature points belonging to the class with the smaller tracking deviation.
3. The method of claim 2, wherein the step of determining the face pose of the current frame is followed by the step of:
projecting the feature points of the class with the larger tracking deviation onto the image plane of the current frame according to the face pose of the current frame, and replacing those feature points' positions in the current frame with the projected positions.
4. The method of claim 2,
before the step of performing face detection on the acquired initial frame, the method further comprises: taking the acquired current frame as the initial frame when a reset instruction is received;
after the step of clustering the tracking deviations into two classes, the method further comprises:
outputting prompt information, and subsequently receiving a reset instruction, when the ratio of the number of feature points in the smaller-deviation class to the total number of feature points is smaller than a first preset value, or the ratio of the number of feature points acquired in the current frame to the number acquired in the previous frame is smaller than a second preset value.
5. The method of any one of claims 1 to 4, wherein the article image is a glasses image, a headwear image, or a neckwear image.
6. An apparatus for implementing virtual try-on, comprising:
a face detection module for performing face detection on an acquired initial frame;
a first output module for generating an article image at an initial position when the face detection module detects a face, and outputting the article image superimposed on the initial frame, wherein the initial position coincides with a specified position of the face range in the initial frame;
a face pose detection module for performing face pose detection on the face in the current frame to obtain the face pose of the current frame;
a second output module for regenerating the article image according to the current position of the article image and the face pose so that the article pose in the article image is consistent with the face pose, and then superimposing the article image on the current frame for output;
wherein the face pose detection module is further configured to:
determining a plurality of feature points on the face image in the initial frame;
performing the following processing for each feature point:
tracking the feature point to determine its position in the current frame,
performing affine transformation on the neighborhood of the feature point in the initial frame according to the face pose of the previous frame to obtain the projection area of the neighborhood in the current frame,
calculating the color offset between the neighborhood in the initial frame and the projection area in the current frame as the tracking deviation of the feature point;
selecting, from the plurality of determined feature points, a plurality of feature points with smaller tracking deviation;
and determining the face pose of the current frame according to the positions of the selected feature points in the initial frame and their positions in the current frame.
7. The apparatus of claim 6, wherein the face pose detection module is further configured to:
clustering the determined tracking deviations of the plurality of feature points into two classes, taking the maximum value and the minimum value as initial centers;
and selecting the feature points belonging to the class with the smaller tracking deviation.
8. The apparatus according to claim 7, further comprising a modification module configured to, after the face pose detection module determines the face pose of the current frame, project the feature points of the class with the larger tracking deviation onto the image plane of the current frame according to that pose, replacing those feature points' positions in the current frame with the projected positions.
9. The apparatus of claim 7, further comprising a reset module and a prompt module, wherein:
the reset module is used for receiving a reset instruction and, when one is received, taking the acquired current frame as the initial frame;
the prompt module is used for outputting prompt information when, after the face pose detection module clusters the tracking deviations into two classes, the ratio of the number of feature points in the smaller-deviation class to the total number of feature points is smaller than a first preset value, or the ratio of the number of feature points acquired in the current frame to the number acquired in the previous frame is smaller than a second preset value.
10. The apparatus of any one of claims 6 to 9, wherein the article image is a glasses image, a headwear image, or a neckwear image.
HK15103138.0A 2015-03-27 Method and device for virtual wearing HK1202690B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410270449.XA CN104217350B (en) 2014-06-17 2014-06-17 Virtual try-on realization method and device

Publications (2)

Publication Number Publication Date
HK1202690A1 (en) 2015-10-02
HK1202690B (en) 2018-01-19


Similar Documents

Publication Publication Date Title
TWI554951B (en) Apparatus and method for rendering virtual try-on
US11861070B2 (en) Hand gestures for animating and controlling virtual and graphical elements
EP3028177B1 (en) Devices, systems and methods of virtualizing a mirror
US8976160B2 (en) User interface and authentication for a virtual mirror
US8982110B2 (en) Method for image transformation, augmented reality, and teleperence
US8036416B2 (en) Method and apparatus for augmenting a mirror with information related to the mirrored contents and motion
US10740918B2 (en) Adaptive simultaneous localization and mapping (SLAM) using world-facing cameras in virtual, augmented, and mixed reality (xR) applications
US20160080662A1 (en) Methods for extracting objects from digital images and for performing color change on the object
US11047691B2 (en) Simultaneous localization and mapping (SLAM) compensation for gesture recognition in virtual, augmented, and mixed reality (xR) applications
US20140225977A1 (en) Devices, systems and methods of virtualizing a mirror
Wang et al. Real time eye gaze tracking with kinect
US10146306B2 (en) Gaze position detection apparatus and gaze position detection method
Sun et al. Real-time gaze estimation with online calibration
CN102867321A (en) Glasses virtual try-on interactive service system and method
US10789778B1 (en) Systems and methods for displaying augmented-reality objects
WO2017084319A1 (en) Gesture recognition method and virtual reality display output device
JP6656572B1 (en) Information processing apparatus, display control method, and display control program
CN117372475A (en) Eye tracking methods and electronic devices
Arar et al. Towards convenient calibration for cross-ratio based gaze estimation
Arar et al. Robust gaze estimation based on adaptive fusion of multiple cameras
Huang et al. 3D virtual-reality interaction system
HK1202690B (en) Method and device for virtual wearing
Brito et al. Recycling a landmark dataset for real-time facial capture and animation with low cost hmd integrated cameras
CN113011932A (en) Fitting mirror system, image processing method, device and equipment
Ferhat et al. Eye-tracking with webcam-based setups: Implementation of a real-time system and an analysis of factors affecting performance