CN113362263B - Method, apparatus, medium and program product for transforming an image of a virtual idol - Google Patents
- Publication number: CN113362263B (application CN202110585489.3A)
- Authority
- CN
- China
- Prior art keywords
- attribute information
- sample
- standard object
- virtual
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present disclosure provides a method, apparatus, medium and program product for transforming the avatar of a virtual idol, relating to artificial intelligence fields such as deep learning and computer vision. One embodiment of the method comprises the following steps: acquiring attribute information of a virtual idol and attribute information of a standard object; determining a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and fusing the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
Description
Technical Field
The embodiments of the present disclosure relate to the field of computers, in particular to artificial intelligence fields such as deep learning and computer vision, and specifically to a method, device, medium and program product for transforming the avatar of a virtual idol.
Background
At present, fusion technology is widely applied in scenes such as virtual avatars, entertainment effects in long- and short-form video, and photo-album effects. Fusion techniques typically require retaining the attribute information of one persona while fusing in the attribute information of another persona.
Disclosure of Invention
The embodiments of the present disclosure provide a method, device, medium and program product for transforming the avatar of a virtual idol.
In a first aspect, an embodiment of the present disclosure provides a method for transforming the avatar of a virtual idol, including: acquiring attribute information of a virtual idol and attribute information of a standard object; determining a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and fusing the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode to obtain a fusion result.
In a second aspect, an embodiment of the present disclosure provides an apparatus for transforming the avatar of a virtual idol, including: an information acquisition unit configured to acquire attribute information of a virtual idol and attribute information of a standard object; a mode determining unit configured to determine a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and an information fusion unit configured to fuse the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
In a third aspect, an embodiment of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described in the first aspect.
In a fifth aspect, embodiments of the present disclosure propose a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the first aspect.
The embodiments of the present disclosure provide a method, device, medium and program product for transforming the avatar of a virtual idol, which first acquire attribute information of the virtual idol and attribute information of a standard object; then determine a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and finally fuse the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result. Because the fusion model corresponding to the target avatar transformation mode can be determined from the attribute information of the virtual idol and/or the attribute information of the standard object, fusing the two sets of attribute information yields a fusion result that realizes the transformation of the virtual idol's avatar.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a method of transforming an avatar in accordance with the present disclosure;
FIG. 3 is a flow chart of another embodiment of a method of transforming an avatar in accordance with the present disclosure;
FIG. 4 is a flow chart of one embodiment of a method of generating a fusion model according to the present disclosure;
FIG. 5 is a schematic illustration of one application scenario of a method of transforming an avatar in accordance with the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of an apparatus for transforming an avatar in accordance with the present disclosure;
FIG. 7 is a block diagram of an electronic device used to implement an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the method of transforming the avatar of a virtual idol, or the apparatus of transforming the avatar of a virtual idol, of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user can interact with the server 105 through the network 104 using the terminal devices 101, 102, 103 to acquire attribute information of the virtual idol, attribute information of the standard object, and the like. Various client applications and intelligent interactive applications may be installed on the terminal devices 101, 102, 103, such as video-related software, livestreaming-related software, image processing applications, etc.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be electronic products that interact with a user through one or more of a keyboard, a touch pad, a display screen, a touch screen, a remote controller, voice interaction or a handwriting device, for example a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a PPC (Pocket PC), a tablet computer, a smart in-car device, a smart television, a smart speaker, a laptop portable computer, a desktop computer, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as a plurality of software or software modules, or as a single software or software module. This is not specifically limited herein.
The server 105 may provide various services. For example, the server 105 may acquire the attribute information of the virtual idol on the terminal devices 101, 102, 103 and the attribute information of the standard object; determine a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and fuse the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. This is not specifically limited herein.
It should be noted that the method for transforming the avatar of a virtual idol provided by the embodiments of the present disclosure is generally performed by the server 105; accordingly, the apparatus for transforming the avatar of a virtual idol is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of transforming the avatar of a virtual idol in accordance with the present disclosure is shown. The method of transforming the avatar of a virtual idol may include the following steps:
Step 201, acquiring attribute information of the virtual idol and attribute information of the standard object.
In the present embodiment, the execution subject of the method of transforming the avatar of a virtual idol (e.g., the terminal devices 101, 102, 103 or the server 105 shown in fig. 1) may acquire attribute information of the virtual idol as well as attribute information of the standard object. The virtual idol may be produced in forms such as drawing and animation, and performs in a virtual scene or a real scene such as the Internet. The standard object may be an object in a template, and the virtual idol may be transformed based on the standard object.
In this embodiment, the virtual idol may be an avatar corresponding to an entity object (e.g., a live host) in a live scene.
It should be noted that the avatar of the virtual idol is not limited to one form. The virtual idol may have different avatars, typically 3D avatars, with different appearances and decorations. Each avatar also corresponds to a plurality of different types of dressing, which can be classified by season and scene.
Further, when recording the virtual idol data of each round of interaction, the clothing, makeup, ornaments, accessories, hairstyle, body movements and expressions are recorded, along with the dialogue text, speech rate and tone of the corresponding voice data. From the virtual idol image data and voice data, four vectors may be generated: a 300-dimensional word vector related to the dialogue text; a 0/1-encoded vector related to features such as the clothing, makeup, ornaments, accessories, hairstyle, voice and tone of the virtual idol's avatar; a 38-point skeleton keypoint vector related to the body movements; and a 29-point expression keypoint vector related to the expressions. The four vectors are concatenated in sequence to generate a high-dimensional vector used as the virtual idol avatar information for that round of interaction.
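The concatenation above can be sketched as follows. The 300-dimensional word vector and the 38-point and 29-point keypoint counts come from the description; the function name, the (x, y) layout of each keypoint, and the 16-element width of the 0/1 feature vector are illustrative assumptions.

```python
import numpy as np

def build_idol_feature(word_vec, binary_feats, skeleton_pts, expression_pts):
    """Concatenate the four per-round feature vectors in the order given in
    the description: dialogue word vector, 0/1-encoded appearance features,
    skeleton keypoints, expression keypoints."""
    assert word_vec.shape == (300,)         # word vector for the dialogue text
    assert skeleton_pts.shape == (38, 2)    # 38 skeleton keypoints, (x, y) each
    assert expression_pts.shape == (29, 2)  # 29 expression keypoints, (x, y) each
    return np.concatenate([
        word_vec,
        binary_feats.astype(np.float64),    # 0/1 flags for clothing, makeup, ...
        skeleton_pts.ravel(),
        expression_pts.ravel(),
    ])

# 300 + 16 + 38*2 + 29*2 = 450 dimensions in total
feat = build_idol_feature(np.zeros(300), np.ones(16),
                          np.zeros((38, 2)), np.zeros((29, 2)))
```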
In the technical scheme of the present disclosure, the acquisition, storage and application of the attribute information involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good morals.
In this embodiment, before acquiring the attribute information of the virtual idol and the attribute information of the standard object, the method for transforming the avatar of a virtual idol may further include: (1) the entity object (i.e. the object corresponding to the virtual idol) inputs a specific voice command through a microphone, such as "I want to change into the face of a certain star" or "I want to perform Sichuan-opera face changing"; (2) after receiving the instruction, the camera of the execution subject is started to recognize the gesture of the entity object, and when a hand sweeps across the face in front of the camera, the face-changing technique is invoked to automatically change into the star's face; (3) the face-changing technique performs real-time face changing after video generation and then synthesizes the video stream. After the face change, the entity object can still drive the virtual idol to perform through facial-capture and motion-capture equipment; meanwhile, the virtual idol can also be switched based on clothing and scenes. For example, according to a film role, when the face is changed, the scene changes and the clothing of the virtual idol changes together. The scenes may be preset and an association with the virtual idol established in advance.
Step 202, determining a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object.
In this embodiment, the execution subject may determine the target avatar transformation mode according to the attribute information of the virtual idol; or according to the attribute information of the standard object; or according to both. The target avatar transformation mode may be a pre-selected pattern for transforming the avatar of the virtual idol, covering, for example, facial transformation, hair accessories, scenes (or backgrounds), apparel, decorations, etc.
The target avatar transformation mode may be generated from various visual scenes such as games, movies and dramas. For example, a mode may be generated from the faces, clothing, hairstyles, decorations and scenes (backgrounds) in "game A"; the target avatar transformation mode may then consist of the face, clothing, hairstyle, decoration and scene (background) of a character in "game A".
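A transformation mode bundling these elements might be represented as a simple record; the class and field names below are illustrative assumptions, since the description only says a mode is composed of face, clothing, hairstyle, decoration and scene elements.

```python
from dataclasses import dataclass

@dataclass
class AvatarTransformationMode:
    """One avatar transformation mode, e.g. derived from a game character.
    Each field identifies one asset of the mode; identifiers are hypothetical."""
    name: str
    face: str
    clothing: str
    hairstyle: str
    decoration: str
    scene: str

# A mode built from a (hypothetical) "game A" character
mode = AvatarTransformationMode(
    name="game_A_role_1", face="face_01", clothing="outfit_01",
    hairstyle="hair_01", decoration="deco_01", scene="bg_01")
```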
Step 203, fusing the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
In this embodiment, the execution subject may fuse the attribute information of the virtual idol and the attribute information of the standard object based on the fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
Specifically, the execution subject may first search based on the target avatar transformation mode to obtain the fusion model corresponding to it, and then input the attribute information of the virtual idol and the attribute information of the standard object into the fusion model to obtain the fusion result. The fusion model may be used to fuse the attribute information of the virtual idol and the attribute information of the standard object so as to transform the avatar of the virtual idol. The fusion result may be the output of the fusion model after fusing the two sets of attribute information.
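The lookup-then-fuse flow can be sketched as below. The registry, mode key and the trivial merge function are all illustrative stand-ins; in the described system the value looked up would be a trained fusion network, not a lambda.

```python
# Hypothetical registry mapping each transformation mode to its fusion model.
# The stand-in "model" keeps the idol's attributes and fills in the rest
# from the standard object, mimicking identity-preserving fusion.
FUSION_MODELS = {
    "mode_1": lambda idol_attrs, std_attrs: {**std_attrs, **idol_attrs},
}

def transform_avatar(target_mode, idol_attrs, standard_attrs):
    """Search for the fusion model of the chosen mode, then fuse the two
    attribute sets to produce the fusion result."""
    model = FUSION_MODELS[target_mode]
    return model(idol_attrs, standard_attrs)

result = transform_avatar("mode_1",
                          {"identity": "idol_A"},
                          {"identity": "std", "hairstyle": "long"})
```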
Take face fusion as an example. Face fusion techniques typically require retaining the identity information of one face image while fusing in the attribute information of another face image. In a typical face-fusion method, an auto-encoder attribute reconstruction network exists for the target face, and features of the attribute reconstruction network at various scales are fused into the identity information of the template face.
In this embodiment, the fusion is not limited to only facial fusion (e.g., face fusion), but also includes hairstyle fusion, scene (or background) fusion, ornament fusion, clothing fusion, and the like.
Hairstyle fusion may splice a hairstyle image with the face image of the virtual idol. Scene fusion may process the scene of the virtual idol image into a transparent image and then splice the scene of the standard object with the virtual idol. Decoration fusion may determine the area of the virtual idol where a decoration needs to be placed, and then superimpose the decoration layer on that area. Clothing fusion may splice the clothing with the region where the virtual idol's clothing is located.
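The scene-fusion step (idol layer made transparent, then composited over the standard object's scene) amounts to per-pixel alpha blending, sketched below with NumPy; the helper name and the tiny 2x2 example are illustrative.

```python
import numpy as np

def composite_scene(idol_rgba, scene_rgb):
    """Overlay the idol layer (whose own scene has been made transparent)
    onto the standard object's scene via per-pixel alpha blending."""
    alpha = idol_rgba[..., 3:4] / 255.0                     # opacity in [0, 1]
    out = idol_rgba[..., :3] * alpha + scene_rgb * (1.0 - alpha)
    return out.astype(np.uint8)

idol = np.zeros((2, 2, 4), dtype=np.uint8)
idol[0, 0] = [255, 0, 0, 255]                 # one opaque idol pixel
scene = np.full((2, 2, 3), 50, dtype=np.uint8)  # uniform gray scene
blended = composite_scene(idol, scene)
```

Opaque idol pixels keep their color; fully transparent pixels show the standard object's scene through.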
The method for transforming the avatar of a virtual idol provided by the embodiments of the present disclosure comprises first acquiring attribute information of the virtual idol and attribute information of a standard object; then determining a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and finally fusing the attribute information of the virtual idol and the attribute information of the standard object based on a fusion model corresponding to the target avatar transformation mode, to obtain a fusion result. Because the fusion model corresponding to the target avatar transformation mode can be determined from the attribute information of the virtual idol and/or the attribute information of the standard object, fusing the two sets of attribute information yields a fusion result that realizes the transformation of the virtual idol's avatar.
In some optional implementations of this embodiment, determining the target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object may include: acquiring a preset attribute information set; matching the attribute information of the virtual idol and/or the attribute information of the standard object with the preset attribute information set to obtain a matching result; and determining the target avatar transformation mode according to the matching result.
In this implementation, the execution subject may first acquire a preset attribute information set; then match the attribute information of the virtual idol and the attribute information of the standard object respectively with the attribute information in the preset attribute information set, to obtain a matching result; and then determine the target avatar transformation mode according to the matching result. The preset attribute information set may be a set of attribute information collected in advance; after it is acquired, a mapping relationship between the attribute information in the set and the attribute information of the virtual idol and of the standard object needs to be established in advance.
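The matching step might look like the following sketch. The patent leaves the matching criterion open (identity, or a similarity meeting a preset threshold); the key-value overlap ratio, threshold value, and attribute keys used here are illustrative assumptions.

```python
def match_preset(attrs, preset_attrs_set, threshold=0.8):
    """Return the entries of the preset attribute information set whose
    similarity to `attrs` meets the threshold. Similarity here is a simple
    fraction of matching key-value pairs."""
    matches = []
    for preset in preset_attrs_set:
        shared = sum(1 for k, v in attrs.items() if preset.get(k) == v)
        if shared / max(len(attrs), 1) >= threshold:
            matches.append(preset)
    return matches

presets = [{"type": "cartoon", "season": "summer"},
           {"type": "drama", "season": "summer"}]
hits = match_preset({"type": "cartoon", "season": "summer"}, presets)
```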
In this implementation, the target avatar transformation mode may be determined from a preset avatar transformation mode set based on the type information characterized by the attribute information of the standard object. The type information characterized by the attribute information of the standard object may be the part of that attribute information which can be used to distinguish the type of the standard object.
In one example, the type information may include: a target cartoon character, a target movie character, and a target dramatic character; if the type information is the target cartoon character, the target avatar transformation mode corresponding to the target cartoon character is determined from the preset avatar transformation mode set.
It should be noted that the preset avatar transformation mode set may be a preset set including a plurality of avatar transformation modes. After the preset avatar transformation mode set is acquired, a mapping relationship between the type information characterized by the attribute information of the standard object and the avatar transformation modes can be established.
In one example, a mapping relationship between the type information and the avatar transformation modes may be established in advance. For example, the type information of the virtual idol may include a target cartoon character and a target dramatic character; a mapping may then be established in advance between the target cartoon character and avatar transformation mode 1, and between the target dramatic character and avatar transformation mode 2. If the type information characterized by the attribute information of the virtual idol is the target cartoon character, avatar transformation mode 1 is determined as the avatar transformation mode.
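The mapping in this example can be sketched as a plain lookup table; the dictionary name and the mode labels are illustrative.

```python
# Pre-established mapping from type information to avatar transformation mode,
# following the example: cartoon -> mode 1, drama -> mode 2.
TYPE_TO_MODE = {
    "target_cartoon_character": "avatar_transformation_mode_1",
    "target_dramatic_character": "avatar_transformation_mode_2",
}

def mode_for_type(type_info):
    """Resolve the avatar transformation mode from the type information
    characterized by the attribute information; None if no mapping exists."""
    return TYPE_TO_MODE.get(type_info)

chosen = mode_for_type("target_cartoon_character")
```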
In this implementation, the execution subject may match the attribute information of the virtual idol with the preset attribute information set to obtain an initial matching result. If there is exactly one initial matching result, the target avatar transformation mode is determined from the avatar transformation mode corresponding to it; if there are multiple initial matching results, matching may be further performed against them based on the attribute information of the standard object to obtain a final matching result, and the avatar transformation mode corresponding to the final matching result is determined as the target avatar transformation mode.
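This two-stage narrowing can be sketched as follows; the `mode_of` helper (mapping a matched entry to its transformation mode) and the attribute keys are assumptions for illustration.

```python
def pick_mode(idol_matches, standard_attrs, mode_of):
    """Two-stage selection: if the idol's attributes matched exactly one
    preset entry, use its mode; otherwise narrow the candidates with the
    standard object's attributes and use the first survivor's mode."""
    if len(idol_matches) == 1:
        return mode_of(idol_matches[0])
    narrowed = [m for m in idol_matches
                if all(m.get(k) == v for k, v in standard_attrs.items())]
    return mode_of(narrowed[0]) if narrowed else None

# Two initial matches from the idol's attributes; the standard object's
# attributes decide between them.
candidates = [{"type": "cartoon", "era": "modern"},
              {"type": "cartoon", "era": "ancient"}]
picked = pick_mode(candidates, {"era": "ancient"},
                   lambda m: m["type"] + "_" + m["era"])
```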
In this implementation, the execution subject may determine an initial avatar transformation mode from the preset avatar transformation mode set using the type information characterized by the attribute information of the virtual idol. If there are multiple initial avatar transformation modes, the final avatar transformation mode is determined from them based on the type information characterized by the attribute information of the standard object; the avatar of the virtual idol is then transformed according to the final avatar transformation mode.
It should be noted that the attribute information of the virtual idol is matched first; then, according to the matching result, it is determined whether matching of the attribute information of the standard object still needs to be performed.
In practical application, the attribute information of the virtual idol may be preset by the entity object corresponding to the virtual idol, and after finishing setting the attribute information of the virtual idol, the entity object may also select the attribute information of the standard object.
In this implementation, the target avatar transformation mode can be determined through the matching result between the preset attribute information set and the attribute information of the virtual idol and/or the attribute information of the standard object.
In some optional implementations of this embodiment, determining the target avatar transformation mode based on the matching result includes: if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determining a third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
In this implementation, the execution subject may pre-establish the correspondence between the third attribute information and both the attribute information of the virtual idol and the attribute information of the standard object. Matching may mean that the similarity meets a preset similarity threshold, or that the two are identical. The preset similarity threshold can be determined by the user's setting or by the required accuracy of the avatar transformation.
In this implementation, accurate determination of the target avatar transformation mode can be realized through a matching result in which attribute information in the preset attribute information set simultaneously matches the attribute information of the virtual idol and the attribute information of the standard object.
In some optional implementations of this embodiment, the matching result includes first attribute information in the preset attribute information set matched with the attribute information of the virtual idol, or second attribute information matched with the attribute information of the standard object; and
the method for transforming the avatar of a virtual idol further includes: acquiring user preference information for avatar transformation modes;
determining the target avatar transformation mode according to the matching result includes: if there are multiple first avatar transformation modes corresponding to the first attribute information, or multiple second avatar transformation modes corresponding to the second attribute information, determining the target avatar transformation mode from the first avatar transformation modes or the second avatar transformation modes according to the user preference information.
In this implementation, the execution subject may first match the attribute information of the virtual idol or the attribute information of the standard object with the attribute information in the preset attribute information set. If the matching result includes multiple first avatar transformation modes corresponding to the first attribute information (i.e., the attribute information of the virtual idol matches attribute information in the preset set), or multiple second avatar transformation modes corresponding to the second attribute information (i.e., the attribute information of the standard object matches attribute information in the preset set), the user preference information of the entity object corresponding to the virtual idol is then taken into account, and the target avatar transformation mode is determined from the multiple first or second avatar transformation modes, so that accurate determination of the transformation mode is realized based on the user preference information. The user preference information may characterize the degree of attention the entity object pays to an avatar transformation mode, which may be measured by how often the entity object operates (or uses) the mode; for example, if the entity object recently used the target avatar transformation mode corresponding to a target movie character, the target avatar transformation mode may be determined from the multiple first or second avatar transformation modes based on this degree of attention.
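Preference-based tie-breaking among several matched modes can be sketched as below; the usage-count dictionary stands in for whatever attention/usage signal the system actually records, and the mode names are hypothetical.

```python
def choose_by_preference(candidate_modes, usage_counts):
    """When several avatar transformation modes match, pick the one the
    entity object has used (paid attention to) most often; modes absent
    from the usage record count as zero."""
    return max(candidate_modes, key=lambda m: usage_counts.get(m, 0))

# The entity object recently used the movie-character mode most often.
picked = choose_by_preference(["movie_mode", "game_mode", "drama_mode"],
                              {"movie_mode": 7, "game_mode": 2})
```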
In this implementation manner, the executing body may further filter based on the user preference information when the number of the first image transformation modes corresponding to the first attribute information or the second image transformation modes corresponding to the second attribute information is plural, so as to accurately determine the target image transformation mode.
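The preference-based filtering described above can be sketched as follows. This is a minimal illustration, not part of the disclosure: the function name, the candidate-mode strings, and the representation of the usage history as a list of recently used mode names are all hypothetical.

```python
from collections import Counter

def pick_mode_by_preference(candidate_modes, usage_history):
    """When several transformation modes match, pick the one the entity
    object has operated (used) most often; with no usable history, fall
    back to the first candidate."""
    counts = Counter(m for m in usage_history if m in candidate_modes)
    if not counts:
        return candidate_modes[0]
    return counts.most_common(1)[0][0]

# The entity object recently used the movie-character mode twice:
modes = ["movie_character_A", "drama_character_B", "cartoon_character_C"]
history = ["movie_character_A", "cartoon_character_C", "movie_character_A"]
picked = pick_mode_by_preference(modes, history)
```

Any other measure of "degree of attention" (e.g. recency-weighted counts) could replace the raw frequency used here.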
In some optional implementations of this embodiment, determining the target avatar transformation mode according to the matching result includes: if the matching result includes first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, determining the first avatar transformation mode corresponding to the first attribute information as the target avatar transformation mode; and if the matching result includes second attribute information in the preset attribute information set that matches the attribute information of the standard object, determining the second avatar transformation mode corresponding to the second attribute information as the target avatar transformation mode.
In this implementation, the execution body may pre-establish a correspondence between the first attribute information and the attribute information of the virtual idol, or a correspondence between the second attribute information and the attribute information of the standard object. "Matching" may mean that the similarity meets a preset similarity threshold, or that the two pieces of attribute information are identical. The preset similarity threshold may be set by the user or determined by the required accuracy of the transformed avatar.
In this implementation, the target avatar transformation mode can be determined through the matching result between the attribute information in the preset attribute information set and the attribute information of the virtual idol and/or the attribute information of the standard object.
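Threshold-based matching against the preset attribute information set can be sketched as follows. The choice of cosine similarity, the 0.9 threshold, and the toy two-dimensional attribute vectors are all illustrative assumptions; the patent does not fix a similarity measure.

```python
import math

def match_attribute(query_vec, preset_set, threshold=0.9):
    """Return the name of the first preset entry whose cosine similarity
    to the query attribute vector meets the threshold, or None."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    for name, vec in preset_set.items():
        if cos(query_vec, vec) >= threshold:
            return name
    return None

# Preset attribute information set (toy embeddings, purely illustrative):
preset = {"first_attribute": [1.0, 0.0], "second_attribute": [0.0, 1.0]}
matched = match_attribute([0.95, 0.05], preset)
```

If nothing in the preset set clears the threshold, the function returns `None`, which corresponds to the case where no first or second attribute information is matched.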
In some optional implementations of this embodiment, obtaining the attribute information of the virtual idol and the attribute information of the standard object includes: acquiring three-dimensional morphable model parameters of the virtual idol and extracting the attribute information of the virtual idol from them; and acquiring three-dimensional morphable model parameters of the standard object and extracting the attribute information of the standard object from them.
In this implementation, the execution body may extract the attribute information of the face of the virtual idol from the acquired three-dimensional morphable model (3D Morphable Model, 3DMM) parameters of that face, and extract the attribute information of the standard object from the acquired 3DMM parameters of the standard object.
In this implementation, the attribute information of the virtual idol and the attribute information of the standard object can be obtained on the basis of the acquired three-dimensional morphable model parameters.
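Extracting attribute information from 3DMM parameters amounts to slicing the flat parameter vector into named segments. The dimension layout below is hypothetical (real layouts differ between 3DMM implementations); it only illustrates the extraction step.

```python
# Hypothetical dimension layout of a flat 3DMM parameter vector; the
# ranges are illustrative only and vary between concrete models.
SEGMENTS = {
    "identity":   (0, 80),
    "expression": (80, 144),
    "pose":       (144, 150),
    "lighting":   (150, 177),
}

def extract_attributes(params):
    """Slice a flat 3DMM parameter vector into named attribute segments."""
    return {name: params[lo:hi] for name, (lo, hi) in SEGMENTS.items()}

# A stand-in parameter vector of 177 values:
attrs = extract_attributes(list(range(177)))
```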
In some optional implementations of this embodiment, obtaining the attribute information of the virtual idol and the attribute information of the standard object includes: acquiring the attribute information of the virtual idol by using a three-dimensional reconstruction method; and acquiring the attribute information of the standard object by using a three-dimensional reconstruction method.
In this implementation, the attribute information may include at least one of: face attribute information, hairstyle attribute information, scene (or background) attribute information, ornament attribute information, apparel attribute information, person-number attribute information, and physiological attribute information.
In one example, take face attribute information as an example. If the attribute information includes face attribute information, obtaining the attribute information of the virtual idol may include: acquiring three-dimensional morphable model parameters of the face of the virtual idol by using a three-dimensional face reconstruction method, and extracting the attribute information of the face of the virtual idol from those parameters.
In this implementation, the attribute information of the virtual idol and the attribute information of the standard object can be acquired on the basis of a three-dimensional reconstruction method.
In some optional implementations of this embodiment, the attribute information of the standard object or the attribute information of the virtual idol may include at least one of: face attribute information, hairstyle attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
In this embodiment, an object may be represented by its face attribute information, hairstyle attribute information, scene (or background) attribute information, ornament attribute information, and apparel attribute information. The hairstyle attribute information may characterize the hairstyle, such as its style, color, and length. The scene attribute information may characterize scene-related information. The ornament attribute information may characterize ornament-related information, such as style and quantity. The apparel attribute information may characterize apparel-related information, such as size and color.
In this implementation, a multidimensional transformation of the avatar of the virtual idol can be realized on the basis of multidimensional attribute information.
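A simple container for the multidimensional attribute information could look like the following sketch. The class and field names are hypothetical; each dimension is optional because an object need not define all of them.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AvatarAttributes:
    """Multidimensional attribute information of a virtual idol or a
    standard object; every dimension is optional."""
    face: Optional[dict] = None
    hairstyle: Optional[dict] = None  # e.g. {"style": ..., "color": ..., "length": ...}
    scene: Optional[dict] = None
    ornament: Optional[dict] = None   # e.g. {"style": ..., "quantity": ...}
    apparel: Optional[dict] = None    # e.g. {"size": ..., "color": ...}

idol = AvatarAttributes(
    hairstyle={"style": "bob", "color": "black", "length": "short"})
```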
In some optional implementations of this embodiment, the standard object includes at least one of: target movie characters, target drama characters, and target cartoon characters.
In this implementation, the execution body may determine the type of the standard object through the type information characterized by the attribute information of the standard object. A target movie character may be a character appearing in a movie resource. A target drama character may be a character appearing in a stage drama, and may further include the corresponding facial makeup. A target cartoon character may be a character appearing in an animation.
In one example, take a target movie character as an example.
According to the attribute information of the virtual idol and/or the attribute information of the standard object, the target avatar transformation mode is determined to be the one corresponding to the target movie character; then, based on the fusion model corresponding to that mode, the attribute information of the virtual idol and the attribute information of the standard object are fused to obtain a fusion result.
For instance, let the first image be the face of the entity object corresponding to the virtual idol, and the second image be movie character "A". A third image is obtained through the fusion model corresponding to the target avatar transformation mode for the target movie character: on the basis of the first image, the four regions of eyebrows, eyes, nose, and mouth are replaced, while the facial shape keeps the outline of the first image.
In actual fusion, regions other than the eyebrows, eyes, nose, and mouth, such as the hairstyle, ears, and ornaments, may also be fused.
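The selective replacement described above, where only certain facial regions come from the donor while the outline stays with the base face, can be sketched as follows. Representing a face as a dict of named regions is a deliberate simplification for illustration; real fusion works on image or parameter data.

```python
def fuse_regions(base_face, donor_face,
                 regions=("eyebrows", "eyes", "nose", "mouth")):
    """Replace only the listed regions of the base face with the donor's
    versions; everything else, including the facial outline, is kept
    from the base face."""
    fused = dict(base_face)
    for region in regions:
        if region in donor_face:
            fused[region] = donor_face[region]
    return fused

# First image: the entity object's face; second image: movie character "A".
first = {"outline": "oval", "eyebrows": "idol_brows", "eyes": "idol_eyes",
         "nose": "idol_nose", "mouth": "idol_mouth"}
second = {"outline": "square", "eyebrows": "A_brows", "eyes": "A_eyes",
          "nose": "A_nose", "mouth": "A_mouth"}
third = fuse_regions(first, second)
```

Extending the `regions` tuple with `"hairstyle"`, `"ears"`, or `"ornaments"` corresponds to the broader fusion mentioned above.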
In one example, take a target drama character as an example.
When fusing with the fusion model, the attribute information of the standard object may be overlaid onto the corresponding attribute information of the virtual idol. For example, the makeup at the eyes of the standard object is directly overlaid at the eyes of the virtual idol, or the apparel of the standard object is directly overlaid on the body of the virtual idol.
If the pose angle of the standard object differs from that of the virtual idol, the angle of the standard object may be adjusted so that, after adjustment, it is the same as the angle of the virtual idol. For example, when the face of the standard object is tilted 30 degrees upward, the attribute information of the standard object can be rotated into alignment with the angle of the virtual idol before fusion is performed.
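The angle adjustment before overlay reduces to computing the rotation that brings the standard object's pose in line with the idol's. The sketch below treats only a single pitch angle in degrees; the function name and one-axis simplification are illustrative, not part of the disclosure.

```python
def align_pitch(standard_pitch_deg, idol_pitch_deg):
    """Rotation (in degrees) to apply to the standard object so that its
    pitch matches the virtual idol's before attributes are overlaid."""
    return idol_pitch_deg - standard_pitch_deg

# Standard object's face tilted 30 degrees upward, idol's face level:
correction = align_pitch(30.0, 0.0)  # rotate the standard object by -30 degrees
```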
In one example, take a target cartoon character as an example.
The cartoon character may be an animation character or a character in a game. The method of transforming the avatar of the virtual idol into the cartoon character during fusion may further include: binding the virtual idol in advance to the muscles and bones of the entity object.
In this implementation, the target avatar transformation mode can be determined through the type of the standard object.
With further reference to fig. 3, fig. 3 illustrates a flow 300 of one embodiment of a method of transforming an avatar in accordance with the present disclosure. The method of transforming the avatar of the virtual idol may include the steps of:
In step 301, the attribute information of the virtual idol and the attribute information of the standard object are acquired.
Step 302, a preset attribute information set is acquired.
In this embodiment, the execution body of the method of transforming the avatar (e.g., the terminal devices 101, 102, 103 or the server 105 shown in fig. 1) may acquire the preset attribute information set. Step 301 and step 302 may be performed simultaneously or separately.
Step 303, the attribute information of the virtual idol and/or the attribute information of the standard object are matched against the preset attribute information set to obtain a matching result.
In this embodiment, the execution body may match the attribute information of the virtual idol and/or the attribute information of the standard object against the preset attribute information set to obtain a matching result.
Step 304, a target avatar transformation mode is determined according to the matching result.
In this embodiment, the execution body may determine the target avatar transformation mode according to the matching result.
Step 305, based on the fusion model corresponding to the target avatar transformation mode, the attribute information of the virtual idol and the attribute information of the standard object are fused to obtain a fusion result.
In this embodiment, the specific operations of step 301 and step 305 are described in detail as step 201 and step 203, respectively, in the embodiment shown in fig. 2, and are not repeated here.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, this embodiment highlights the step of determining the target avatar transformation mode for transforming the avatar of the virtual idol. The scheme described in this embodiment therefore matches the attribute information of the virtual idol and the attribute information of the standard object against the attribute information in the preset attribute information set to obtain a matching result, and then obtains the target avatar transformation mode according to that result. Since different matching results yield different target avatar transformation modes, the avatar transformation of the virtual idol can be realized on the basis of the fusion models corresponding to those different modes.
In some optional implementations of this embodiment, determining the target avatar transformation mode according to the matching result includes: if the matching result includes first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, determining the first avatar transformation mode corresponding to the first attribute information as the target avatar transformation mode; if the matching result includes second attribute information in the preset attribute information set that matches the attribute information of the standard object, determining the second avatar transformation mode corresponding to the second attribute information as the target avatar transformation mode; and if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determining the third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
In this implementation, "matching" may mean that the similarity with the attribute information meets a preset similarity threshold; or that the attribute information of the virtual idol is identical to the first attribute information, or the attribute information of the standard object is identical to the second attribute information.
In this implementation, the avatar transformation of the virtual idol can be realized on the basis of the fusion models corresponding to different target avatar transformation modes.
With further reference to fig. 4, fig. 4 illustrates a flow 400 of one embodiment of a method of generating a fusion model according to the present disclosure. The method for generating the fusion model can comprise the following steps:
Step 401, a training sample is obtained, the training sample including: sample attribute information of the virtual idol and sample attribute information of the standard object.
In this embodiment, the execution body of the method of generating the fusion model (e.g., the terminal devices 101, 102, 103 shown in fig. 1) may collect training samples generated on it; alternatively, the execution body (e.g., the server 105 shown in fig. 1) may acquire training samples from the terminal devices (e.g., the terminal devices 101, 102, 103 shown in fig. 1).
Step 402, the sample attribute information of the virtual idol and the sample attribute information of the standard object are fused to obtain sample fusion attribute information.
In this embodiment, the execution body may first fuse the virtual idol and the standard object to obtain a fused object, and then acquire the attribute information of the fused object and take it as the sample fusion attribute information.
Taking a face as an example, for the face image of the virtual idol in a training sample, a three-dimensional face reconstruction method can be used to acquire the face attribute information of the virtual idol. Preferably, the three-dimensional morphable model (3D Morphable Model, 3DMM) parameters of the virtual idol may first be acquired by the three-dimensional face reconstruction method, and the face attribute information of the virtual idol then extracted from the 3DMM parameters. Different dimensions of the 3DMM parameters respectively correspond to the identity, expression, pose, illumination, hairstyle, apparel, ornaments, and other information of the virtual idol.
In one example, take face fusion as an example.
Fusing the sample attribute information of the virtual idol with the sample attribute information of the standard object to obtain sample fusion attribute information may include: fusing the face image of the virtual idol with the face image of the standard object to obtain a fused face image; then acquiring the attribute information of the fused face image and taking it as the sample fusion attribute information. Alternatively, a fusion method based on a generative adversarial network (Generative Adversarial Networks, GAN) may be used to fuse the face image of the virtual idol with the face image of the standard object to obtain the fused face image. In practical applications, any GAN-based fusion method may be used, such as the face-swapping (FaceShifter) method.
In this embodiment, the fusion technique generally needs to retain the attribute information of one character while fusing in the attribute information of another character. In the FaceShifter method, an autoencoder-based attribute reconstruction network exists for the face image of the virtual idol, and features of the attribute reconstruction network at multiple scales are fused into the face image of the standard object.
Step 403, an attribute consistency loss function is constructed according to the sample attribute information and the sample fusion attribute information of the virtual idol, and self-supervised learning of the fusion model is performed using the attribute consistency loss function.
In this implementation, the sample attribute information and the sample fusion attribute information of the virtual idol can be obtained separately; since the two are expected to be consistent after fusion, an attribute consistency loss function can be constructed from them, and self-supervised learning of the fusion model can be performed using this loss function.
In one example, the L2 norm (L2-norm) of the difference between the sample attribute information and the sample fusion attribute information of the virtual idol can be calculated as the attribute consistency loss function. Its specific form may be ‖A − B‖₂, where A and B respectively denote the sample attribute information and the sample fusion attribute information of the virtual idol.
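The loss term ‖A − B‖₂ can be computed directly on the two attribute vectors; the sketch below assumes they are plain lists of floats of equal length (the function name is illustrative).

```python
import math

def attribute_consistency_loss(a, b):
    """||A - B||_2: the L2 norm of the difference between the idol's
    sample attribute vector A and the sample fusion attribute vector B."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

loss = attribute_consistency_loss([1.0, 2.0, 2.0], [1.0, 0.0, 0.0])  # sqrt(8)
```

A zero loss means the fusion preserved the idol's attributes exactly, which is the consistency the training objective rewards.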
In addition, self-supervised learning of the fusion model can be performed by combining the attribute consistency loss function with the identity consistency loss function of the GAN-based fusion method, for example with the identity consistency loss function in the FaceShifter method.
It should be noted that, in the above manner, the consistency between the sample fusion attribute information and the sample attribute information of the virtual idol is ensured.
In this embodiment, the method of generating the fusion model may further include: for different avatar transformation modes, training the corresponding fusion models, so that the avatar transformation of the virtual idol is realized on the basis of the fusion model corresponding to each mode.
According to the method of generating the fusion model provided by this embodiment, the sample attribute information and the sample fusion attribute information of the virtual idol can be obtained separately, the attribute consistency loss function can be constructed from the obtained attribute information, and the training of the model can be guided by that loss function. This improves the training effect of the model and, in turn, the fusion effect of the trained model, so that a more realistic fused image can be obtained. Moreover, given any virtual idol and target object, the trained model can produce the corresponding fusion result, giving the method wide applicability, low implementation cost, and other advantages.
In some optional implementations of this embodiment, the fusion model may be a generative adversarial network model.
Specifically, fusing the sample attribute information of the virtual idol with the sample attribute information of the standard object to obtain the sample fusion attribute information includes: inputting the sample attribute information of the virtual idol and the sample attribute information of the standard object into the generator of the generative adversarial network model to obtain the sample fusion attribute information.
In this implementation, the execution body may input the sample attribute information of the standard object and the sample attribute information of the virtual idol into the generator of the generative adversarial network model, thereby fusing the two to obtain the sample fusion attribute information.
Constructing the attribute consistency loss function according to the sample attribute information and the sample fusion attribute information of the virtual idol, and performing self-supervised learning of the fusion model using that loss function, includes: inputting the sample fusion attribute information and the sample attribute information of the standard object separately into the discriminator of the generative adversarial network model to obtain a first discrimination result for the sample fusion attribute information and a second discrimination result for the sample attribute information of the standard object; determining the attribute consistency loss function according to the first discrimination result, the second discrimination result, and the sample fusion attribute information; and adjusting the network parameters of the generator according to the attribute consistency loss function.
In this implementation, the execution body may use the discriminator of the generative adversarial network model to obtain the first discrimination result for the sample fusion attribute information and the second discrimination result for the sample attribute information of the standard object, determine the attribute consistency loss function from these results together with the sample fusion attribute information, and thereby adjust the network parameters of the generator.
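One plausible way to combine the two discrimination results with the attribute consistency term into a single generator loss is sketched below. The exact form is not fixed by the text, so the relativistic-style pairing of the two discriminator scores, the weight `lam`, and all names are assumptions for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def generator_loss(d_fused, d_standard, fused_attrs, idol_attrs, lam=1.0):
    """Combine an adversarial term built from the first discrimination
    result (fused sample) and the second (standard object's sample) with
    the ||A - B||_2 attribute consistency term against the idol's
    attributes. Illustrative only; the disclosure does not fix this form."""
    adversarial = -math.log(max(sigmoid(d_fused - d_standard), 1e-12))
    consistency = math.sqrt(sum((x - y) ** 2
                                for x, y in zip(idol_attrs, fused_attrs)))
    return adversarial + lam * consistency

# Equal discriminator scores and perfectly preserved attributes leave
# only the baseline adversarial term, -log(sigmoid(0)) = ln 2:
loss = generator_loss(0.0, 0.0, [1.0, 2.0], [1.0, 2.0])
```

The generator's network parameters would then be adjusted to reduce this loss, while the discriminator is trained with its own objective.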
In this implementation, the execution body may perform face fusion of the sample attribute information of the virtual idol and the sample attribute information of the standard object by a fusion method based on a generative adversarial network (Generative Adversarial Networks, GAN) model to obtain the fusion result. In practical applications, any GAN-based fusion method may be used, such as the face-swapping (FaceShifter) method.
Fusion techniques typically need to retain the attribute information of one persona while fusing in the attribute information of another. In the FaceShifter method, an autoencoder (auto-encoder) attribute reconstruction network exists for the virtual idol, and features of the attribute reconstruction network at multiple scales are fused into the sample attribute information of the standard object.
In this implementation, adopting a GAN-based fusion method yields a good fusion effect, which in turn facilitates subsequent processing.
With further reference to fig. 5, fig. 5 is a schematic diagram 500 of one application scenario of the method of transforming an avatar according to the present disclosure. In this application scenario, taking a face as an example, the terminal device 501 may be configured to obtain the virtual idol of an entity object; then acquire the face attribute information of the virtual idol and the face attribute information of the standard object; then determine the target avatar transformation mode according to the face attribute information of the virtual idol and/or the face attribute information of the standard object; and then, based on the fusion model corresponding to the target avatar transformation mode, fuse the attribute information of the virtual idol and the attribute information of the standard object to obtain a fusion result. The result is then transmitted through the network 502 to the terminal device 503, whose user watches the live broadcast of the entity object (i.e., the user) of the terminal device 501.
With further reference to fig. 6, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an apparatus for transforming an avatar, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 6, the apparatus 600 for transforming the avatar of this embodiment may include: an information acquisition unit 601, a mode determination unit 602, and an information fusion unit 603. The information acquisition unit 601 is configured to acquire the attribute information of the virtual idol and the attribute information of the standard object; the mode determination unit 602 is configured to determine a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object; and the information fusion unit 603 is configured to fuse the attribute information of the virtual idol and the attribute information of the standard object based on the fusion model corresponding to the target avatar transformation mode, to obtain a fusion result.
In this embodiment, in the apparatus 600 for transforming the avatar of the virtual idol, the specific processing of the information acquisition unit 601, the mode determination unit 602, and the information fusion unit 603, and the technical effects thereof, may refer to the descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, and are not repeated here.
In some optional implementations of this embodiment, the mode determination unit 602 includes: an information acquisition subunit configured to acquire a preset attribute information set; a result obtaining subunit configured to match the attribute information of the virtual idol and/or the attribute information of the standard object against the preset attribute information set to obtain a matching result; and a mode determination subunit configured to determine the target avatar transformation mode according to the matching result.
In some optional implementations of this embodiment, the mode determination subunit is further configured to: if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determine the third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
In some optional implementations of this embodiment, if the matching result includes first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, or second attribute information that matches the attribute information of the standard object: the information acquisition unit 601 is further configured to acquire user preference information for avatar transformation modes; and the mode determination subunit is further configured to: if there are multiple first avatar transformation modes corresponding to the first attribute information or multiple second avatar transformation modes corresponding to the second attribute information, determine the target avatar transformation mode from the first or second avatar transformation modes according to the user preference information.
In some optional implementations of this embodiment, the mode determination subunit is further configured to: if the matching result includes first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, determine the first avatar transformation mode corresponding to the first attribute information as the target avatar transformation mode; if the matching result includes second attribute information in the preset attribute information set that matches the attribute information of the standard object, determine the second avatar transformation mode corresponding to the second attribute information as the target avatar transformation mode; and if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determine the third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
In some optional implementations of this embodiment, the information acquisition unit 601 is further configured to: acquire three-dimensional morphable model parameters of the virtual idol and extract the attribute information of the virtual idol from them; and acquire three-dimensional morphable model parameters of the standard object and extract the attribute information of the standard object from them.
In some optional implementations of this embodiment, the information acquisition unit 601 is further configured to: acquire the attribute information of the virtual idol by using a three-dimensional reconstruction device; and acquire the attribute information of the standard object by using the three-dimensional reconstruction device.
In some optional implementations of this embodiment, the standard object includes at least one of: target movie characters, target drama characters, and target cartoon characters.
In some optional implementations of this embodiment, the apparatus for transforming the avatar of the virtual idol further includes: a sample acquisition unit configured to acquire a training sample including: sample attribute information of the virtual idol and sample attribute information of the standard object; an information obtaining unit configured to fuse the sample attribute information of the virtual idol and the sample attribute information of the standard object to obtain sample fusion attribute information; and a model training unit configured to construct an attribute consistency loss function according to the sample attribute information and the sample fusion attribute information of the virtual idol, and to perform self-supervised learning of the fusion model using the attribute consistency loss function.
In some optional implementations of this embodiment, the fusion model is a generative adversarial network model;
the information obtaining unit is further configured to: input the sample attribute information of the virtual idol and the sample attribute information of the standard object into the generator of the generative adversarial network model to obtain the sample fusion attribute information;
the model training unit is further configured to: input the sample fusion attribute information and the sample attribute information of the standard object separately into the discriminator of the generative adversarial network model to obtain a first discrimination result for the sample fusion attribute information and a second discrimination result for the sample attribute information of the standard object; determine the attribute consistency loss function according to the first discrimination result, the second discrimination result, and the sample fusion attribute information; and adjust the network parameters of the generator according to the attribute consistency loss function.
In some optional implementations of this embodiment, the attribute information of the standard object or the attribute information of the virtual idol includes at least one of: face attribute information, hair style attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 illustrates a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a method of transforming the avatar of the virtual idol. For example, in some embodiments, the method of transforming the avatar of the virtual idol may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the above-described method of transforming the avatar of the virtual idol may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the method of transforming the avatar by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, and planning), and it encompasses both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (16)
1. A method of transforming an avatar of a virtual idol, comprising:
acquiring attribute information of a virtual idol and attribute information of a standard object;
determining a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object, comprising: acquiring a preset attribute information set; matching the attribute information of the virtual idol and/or the attribute information of the standard object against the preset attribute information set to obtain a matching result; and determining the target avatar transformation mode according to the matching result, comprising: if there are multiple first avatar transformation modes corresponding to first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, or multiple second avatar transformation modes corresponding to second attribute information in the preset attribute information set that matches the attribute information of the standard object, determining the target avatar transformation mode from the first avatar transformation modes or the second avatar transformation modes according to acquired user preference information for avatar transformation modes;
and fusing, based on a fusion model corresponding to the target avatar transformation mode, the attribute information of the virtual idol and the attribute information of the standard object to obtain a fusion result.
2. The method of claim 1, wherein the determining the target avatar transformation mode according to the matching result further comprises:
if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determining a third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
3. The method according to claim 1 or 2, wherein the acquiring attribute information of the virtual idol and attribute information of the standard object comprises:
acquiring three-dimensional morphable model parameters of the virtual idol, and extracting the attribute information of the virtual idol from the three-dimensional morphable model parameters; and
acquiring three-dimensional morphable model parameters of the standard object, and extracting the attribute information of the standard object from the three-dimensional morphable model parameters.
4. The method of claim 1, wherein the fusion model is determined based on:
obtaining a training sample, wherein the training sample comprises: sample attribute information of the virtual idol and sample attribute information of the standard object;
fusing the sample attribute information of the virtual idol and the sample attribute information of the standard object to obtain sample fusion attribute information;
and constructing an attribute consistency loss function according to the sample attribute information of the virtual idol and the sample fusion attribute information, and performing self-supervision learning of the fusion model by utilizing the attribute consistency loss function.
5. The method of claim 4, wherein the fusion model is a generative adversarial network model; and
the fusing the sample attribute information of the virtual idol and the sample attribute information of the standard object to obtain sample fusion attribute information comprises:
inputting the sample attribute information of the virtual idol and the sample attribute information of the standard object into a generator of the generative adversarial network model to obtain the sample fusion attribute information;
the constructing an attribute consistency loss function according to the sample attribute information of the virtual idol and the sample fusion attribute information, and performing self-supervision learning of the fusion model by using the attribute consistency loss function, comprising:
respectively inputting the sample fusion attribute information and the sample attribute information of the standard object into a discriminator of the generative adversarial network model to obtain a first discrimination result for the sample fusion attribute information and a second discrimination result for the sample attribute information of the standard object;
determining the attribute consistency loss function according to the first discrimination result, the second discrimination result and the sample fusion attribute information;
and adjusting network parameters of the generator according to the attribute consistency loss function.
6. The method of any of claims 1, 2, 4-5, wherein the attribute information includes at least one of: face attribute information, hair style attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
7. A method according to claim 3, wherein the attribute information comprises at least one of: face attribute information, hair style attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
8. An apparatus for transforming an avatar of a virtual idol, comprising:
an information acquisition unit configured to acquire attribute information of a virtual idol and attribute information of a standard object;
a mode determining unit configured to determine a target avatar transformation mode according to the attribute information of the virtual idol and/or the attribute information of the standard object, comprising: an information acquisition subunit configured to acquire a preset attribute information set; a result obtaining subunit configured to match the attribute information of the virtual idol and/or the attribute information of the standard object against the preset attribute information set to obtain a matching result; and a mode determination subunit configured to determine the target avatar transformation mode according to the matching result, and further configured to: if there are multiple first avatar transformation modes corresponding to first attribute information in the preset attribute information set that matches the attribute information of the virtual idol, or multiple second avatar transformation modes corresponding to second attribute information in the preset attribute information set that matches the attribute information of the standard object, determine the target avatar transformation mode from the first avatar transformation modes or the second avatar transformation modes according to acquired user preference information for avatar transformation modes;
and an information fusion unit configured to fuse, based on a fusion model corresponding to the target avatar transformation mode, the attribute information of the virtual idol and the attribute information of the standard object to obtain a fusion result.
9. The apparatus of claim 8, wherein the mode determination subunit is further configured to:
if the matching result includes third attribute information in the preset attribute information set that matches both the attribute information of the virtual idol and the attribute information of the standard object, determine a third avatar transformation mode corresponding to the third attribute information as the target avatar transformation mode.
10. The apparatus according to claim 8 or 9, wherein the information acquisition unit is further configured to:
acquire three-dimensional morphable model parameters of the virtual idol, and extract the attribute information of the virtual idol from the three-dimensional morphable model parameters; and
acquire three-dimensional morphable model parameters of the standard object, and extract the attribute information of the standard object from the three-dimensional morphable model parameters.
11. The apparatus of claim 8, the apparatus further comprising:
a sample acquisition unit configured to acquire a training sample including: sample attribute information of the virtual idol and sample attribute information of the standard object;
an information obtaining unit configured to fuse the sample attribute information of the virtual idol and the sample attribute information of the standard object to obtain sample fusion attribute information;
and a model training unit configured to construct an attribute consistency loss function according to the sample attribute information of the virtual idol and the sample fusion attribute information, and to perform self-supervised learning of the fusion model using the attribute consistency loss function.
12. The apparatus of claim 11, wherein the fusion model is a generative adversarial network model; and
the information obtaining unit is further configured to: input the sample attribute information of the virtual idol and the sample attribute information of the standard object into a generator of the generative adversarial network model to obtain the sample fusion attribute information;
the model training unit is further configured to: respectively input the sample fusion attribute information and the sample attribute information of the standard object into a discriminator of the generative adversarial network model to obtain a first discrimination result for the sample fusion attribute information and a second discrimination result for the sample attribute information of the standard object; determine the attribute consistency loss function according to the first discrimination result, the second discrimination result, and the sample fusion attribute information; and adjust network parameters of the generator according to the attribute consistency loss function.
13. The apparatus of any of claims 8, 9, 11-12, wherein the attribute information comprises at least one of: face attribute information, hair style attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
14. The apparatus of claim 10, wherein the attribute information comprises at least one of: face attribute information, hair style attribute information, scene attribute information, ornament attribute information, and apparel attribute information.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110585489.3A CN113362263B (en) | 2021-05-27 | 2021-05-27 | Method, apparatus, medium and program product for transforming an image of a virtual idol |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110585489.3A CN113362263B (en) | 2021-05-27 | 2021-05-27 | Method, apparatus, medium and program product for transforming an image of a virtual idol |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113362263A CN113362263A (en) | 2021-09-07 |
CN113362263B true CN113362263B (en) | 2023-09-15 |
Family
ID=77527906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110585489.3A Active CN113362263B (en) | 2021-05-27 | 2021-05-27 | Method, apparatus, medium and program product for transforming an image of a virtual idol |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113362263B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113769393B (en) * | 2021-09-27 | 2024-06-28 | 上海完美时空软件有限公司 | Character image generation method and device, storage medium and electronic device |
CN113850890B (en) * | 2021-09-29 | 2025-05-16 | 北京字跳网络技术有限公司 | Animal image generation method, device, equipment and storage medium |
CN114187392B (en) * | 2021-10-29 | 2024-04-19 | 北京百度网讯科技有限公司 | Method, device and electronic device for generating virtual idol |
TWI821876B (en) * | 2022-01-21 | 2023-11-11 | 在地實驗文化事業有限公司 | Mobile smart augmented reality live broadcast device |
CN114579724A (en) * | 2022-03-03 | 2022-06-03 | 小哆智能科技(北京)有限公司 | A seamless connection method and system for virtual humans in various scenarios |
CN115222895B (en) * | 2022-08-30 | 2023-06-27 | 北京百度网讯科技有限公司 | Image generation method, device, equipment and storage medium |
CN116030150B (en) * | 2023-01-03 | 2023-11-28 | 北京百度网讯科技有限公司 | Virtual image generation method, device, electronic device and medium |
CN116600173A (en) * | 2023-04-28 | 2023-08-15 | 广州方硅信息技术有限公司 | Editing method, device, equipment and medium for virtual resources of meta-universe living broadcast room |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101382931A (en) * | 2008-10-17 | 2009-03-11 | 劳英杰 | Interchange internal code for electronic, information and communication system and use thereof |
CN105556508A (en) * | 2013-08-04 | 2016-05-04 | 艾斯适配有限公司 | Device, system and method for virtual mirror |
CN108388399A (en) * | 2018-01-12 | 2018-08-10 | 北京光年无限科技有限公司 | The method of state management and system of virtual idol |
CN108833810A (en) * | 2018-06-21 | 2018-11-16 | 珠海金山网络游戏科技有限公司 | The method and device of subtitle is generated in a kind of live streaming of three-dimensional idol in real time |
CN108961367A (en) * | 2018-06-21 | 2018-12-07 | 珠海金山网络游戏科技有限公司 | The method, system and device of role image deformation in the live streaming of three-dimensional idol |
CN111860167A (en) * | 2020-06-18 | 2020-10-30 | 北京百度网讯科技有限公司 | Face fusion model acquisition and face fusion method, device and storage medium |
CN112489174A (en) * | 2020-12-25 | 2021-03-12 | 游艺星际(北京)科技有限公司 | Action display method, device, electronic equipment and storage medium of virtual image model
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002298142A (en) * | 2001-03-29 | 2002-10-11 | Minolta Co Ltd | Person image detecting method, storage medium recording program for executing the method, person image detecting device, and image pick-up device having this device |
- 2021-05-27: Application CN202110585489.3A filed; granted and published as CN113362263B (status: Active)
Non-Patent Citations (1)
Title |
---|
Research on the problem of 3D virtual fitting in online clothing shopping; Zhou Chao et al.; Fashion Guide (《服饰导刊》), Issue 1, pp. 41-45 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113362263B (en) | Method, apparatus, medium and program product for transforming an image of a virtual idol | |
US12277665B2 (en) | Electronic device for generating image including 3D avatar reflecting face motion through 3D avatar corresponding to face and method of operating same | |
CN113287118B (en) | System and method for facial reproduction | |
KR102503413B1 (en) | Animation interaction method, device, equipment and storage medium | |
US12002160B2 (en) | Avatar generation method, apparatus and device, and medium | |
KR102491140B1 (en) | Method and apparatus for generating virtual avatar | |
CN111432267B (en) | Video adjusting method and device, electronic equipment and storage medium | |
CN111047509B (en) | Image special effect processing method, device and terminal | |
CN110148191A (en) | The virtual expression generation method of video, device and computer readable storage medium | |
WO2022252866A1 (en) | Interaction processing method and apparatus, terminal and medium | |
WO2021155666A1 (en) | Method and apparatus for generating image | |
CN116977486A (en) | Image generation method, device, equipment and storage medium | |
CN111597926A (en) | Image processing method and device, electronic device and storage medium | |
CN115690281B (en) | Character expression driving method and device, storage medium, electronic device | |
CN118469798A (en) | Photo generation method and device, electronic equipment, storage medium and program product | |
CN114332365B (en) | Virtual character image generation method, device, electronic equipment and storage medium | |
CN113327311B (en) | Virtual character-based display method, device, equipment and storage medium | |
WO2024066549A1 (en) | Data processing method and related device | |
CN113176827A (en) | AR interaction method and system based on expressions, electronic device and storage medium | |
KR20240030109A (en) | An electronic apparatus for providing avatar based on an user's face and a method for operating the same | |
CN117523064A (en) | Comic image generation method, device, computer equipment and storage medium | |
CN119624755A (en) | Image style transfer method, device, electronic device and storage medium | |
CN116612257A (en) | Personal archive generation method, device, electronic equipment and storage medium | |
CN116109741A (en) | Method, device, equipment and storage medium for constructing three-dimensional character model | |
CN119359897A (en) | Training of skeleton parameter generation model and generation method of three-dimensional face model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||