WO2025194356A1 - Method for quickly generating multiple groups of customized user avatars - Google Patents
Method for quickly generating multiple groups of customized user avatars
- Publication number
- WO2025194356A1 (PCT/CN2024/082555)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- tags
- model
- avatar
- processing unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
Definitions
- the present invention relates to a method for generating avatar images, and in particular to a method for quickly generating multiple user avatars.
- Some apps also generate user avatars based on user photos using drawing software, similar to the functionality of beauty cameras.
- this real-time image generation method can take 5 to 10 minutes per avatar, requires the user's participation, and is very time-consuming; furthermore, it cannot meet the needs of users with specific requirements.
- the present invention provides a method for quickly generating multiple sets of customized user avatars, which solves the shortcomings of user avatars in the existing technology, such as lack of personalization and low recognition. It can also quickly generate multiple sets of user avatars for users to choose from based on their preferences, reducing production time and increasing the specificity of the avatars.
- a method for rapidly generating multiple sets of customized user avatars according to the present invention includes:
- the administrator selects several classification tags including character tags, action/background tags, and object/accessory tags;
- the processing unit obtains corresponding label parameters from the label database according to the classification labels, combines the label parameters into a plurality of label parameter groups according to a label combination method, and stores the combined labels into a label parameter group list;
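The combination step above can be sketched in Python. This is an illustrative sketch only: the function name, the identifier-style tag parameters, and the use of a plain Cartesian product as the "label combination method" are assumptions, not details from the patent.

```python
from itertools import product

def combine_tag_parameters(tag_groups):
    """Combine one tag parameter from each category into tag parameter groups.

    `tag_groups` maps a category (e.g. 'character') to its tag parameters;
    the Cartesian product yields every tag parameter group, standing in for
    the label combination performed by the processing unit.
    """
    categories = sorted(tag_groups)  # deterministic category order
    return [dict(zip(categories, combo))
            for combo in product(*(tag_groups[c] for c in categories))]

# Example: 2 character tags x 2 action tags -> 4 tag parameter groups
groups = combine_tag_parameters({
    "character": ["a001", "a002"],   # e.g. dog, cat
    "action":    ["e001", "e002"],   # e.g. shooting, drumming
})
```

The resulting list of dicts plays the role of the tag parameter group list stored by the processing unit.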
- the processing unit extracts a corresponding image file list from the multimedia database according to the classification tags involved in the tag parameter group list, and compiles a plurality of model parameters according to the tag parameter group list and the image file list and stores them into a model parameter list;
- the user executes an application on an electronic device to open a category tag selection interface.
- the user selects several category tags including character tags, action/background tags, and object/accessory tags according to his or her preferences.
- the application transmits these category tags to the processing unit; the processing unit combines these category tags into a tag parameter group, and extracts a model parameter list from a model database, and filters out several model parameters corresponding to the same or similar tag parameter groups from the model parameter list according to the tag parameter group; further, the processing unit extracts several corresponding avatar models from the model database according to these model parameters, and then packages these avatar models and transmits them to the application; the application receives these avatar models and unpacks and displays them for the user to select; if the user selects one of these avatar models, the application binds the selected avatar model to the user.
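The "same or similar tag parameter group" filtering described above could be implemented as a simple overlap score. A hedged sketch — the data shape, the `min_overlap` heuristic, and the function name are assumptions, not from the patent:

```python
def filter_model_parameters(model_params, selected_tags, min_overlap=2):
    """Keep model parameters whose tag parameter group shares at least
    `min_overlap` tags with the user's selection; exact matches rank first."""
    query = set(selected_tags)
    scored = [(len(set(mp["tags"]) & query), mp) for mp in model_params]
    matches = [(score, mp) for score, mp in scored if score >= min_overlap]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [mp for _, mp in matches]

model_params = [
    {"model_id": "m1", "tags": ["dog", "listening to music", "park"]},
    {"model_id": "m2", "tags": ["cat", "listening to music", "park"]},
    {"model_id": "m3", "tags": ["rabbit", "playing guitar", "beach"]},
]
# m1 matches all three selected tags, m2 matches two, m3 matches none
result = filter_model_parameters(model_params, ["dog", "listening to music", "park"])
```

An exact match (m1) is returned first, followed by the similar group (m2), which mirrors how the processing unit offers both identical and near matches for selection.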
- Classification labels can specify three ways of expressing user preferences so that AI can create images that meet user expectations.
- ChatGPT generates highly relevant action descriptions based on the text selected or entered by the user in the category label selection interface. To ensure diversity in the images generated from a category label, one category label will generate multiple action descriptions to help generate images. For example, if the action category label is dessert, the generated images include eating pie, baking cookies, enjoying chocolate cake, etc.
- based on the user's preferences and the generated action descriptions, a highly relevant image background is generated. For example, if the action classification label is baking cookies, the generated background images include a restaurant, a kitchen, etc.
- Each avatar model image is generated based on multiple preferences and user selections. In addition to generating images that match the three preferences as much as possible, these generated tags will be recorded in the database. When the user selects three category tags to be generated on the application, the system will search the database for the most similar avatar model and display it on the selection page.
- FIG2 is a schematic diagram of a method for generating image files corresponding to classification labels
- FIG3 is a flow chart of an avatar model training method
- FIG4 is a flow chart of a method for a user to generate a user avatar
- FIG5 is a schematic diagram of an embodiment of generating a user avatar according to a selected category tag
- FIG6 is a schematic diagram of an embodiment of generating a user avatar according to a selected category tag
- FIG7 is a schematic diagram of an embodiment of generating a user avatar based on input category labels
- FIG8 is a schematic diagram of an embodiment of generating a user avatar based on a text string input in a category tag.
- content refers to information, or individual information elements, that can be implemented and displayed as text, images, video, audio, or a combination thereof.
- the hardware may be a data processing device, such as a mobile device or personal computer with a built-in central processing unit or other processor.
- the software executed by the hardware may refer to a running process, object, executable file, thread, or program.
- the database server and model training server of the present invention can be composed of a plurality of computers, as long as they provide corresponding functions.
- the computer includes a processor, a memory connected to the processor, and a server network interface.
- the processor can be used to execute operating systems and applications stored in the memory, including a database management system (DBMS), a web server, and/or a web application server, to implement the multiple steps mentioned in the embodiments of the present invention.
- FIG1 shows a system 100 for rapidly generating multiple customized user avatars according to the present invention.
- the system 100 includes:
- a database server 10 comprising:
- the tag database 11 includes a plurality of category tags and a category tag list 60. Each of the category tags corresponds to a tag parameter.
- the category tags include a clothing tag, an action tag, an object tag, and other tags.
- the multimedia database 12 includes a plurality of images and corresponding image codes and an image list, wherein the images are associated with classification tags;
- a model database 13 including a plurality of avatar models and corresponding plurality of model parameters, a label parameter group list, and a model parameter list;
- User database 14 including user information and user behavior
- the processing unit 21 is used to combine the label parameters corresponding to the classification labels into a plurality of label parameter groups according to a label combination method, and store the label parameter group list in the model database 13;
- the avatar training unit 22 is configured to extract the associated images from the multimedia database 12 based on the label parameters corresponding to the label parameter groups, generate the avatar models using a deep learning text-to-image model 70 (Stable Diffusion), and store the avatar models in the model database 13, where the model parameters correspond to the label parameter groups.
- the user connects to the model training server 20 via the Internet 50 on the electronic device 30.
- the electronic device 30 includes:
- Application 31 is used to provide a category tag selection interface, allowing the user to select a category tag for which the user avatar is desired to be generated on the category tag selection interface;
- a display screen 32 is used to display these avatar models for a user to select and/or register a user avatar.
- Stable Diffusion is a deep learning text-to-image generation model. It is primarily used to generate detailed images from text descriptions and to generate image-to-image transformations guided by prompts. Stable Diffusion is a variant of the diffusion model, called the latent diffusion model (LDM).
- the number of times each set of model parameters is generated by the deep learning text-to-image generation model 70 (Stable Diffusion) to generate these avatar models is set by an administrator.
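Before the text-to-image model can generate an avatar, the tag parameter group must be turned into a text prompt. Since running Stable Diffusion itself requires model weights, only the prompt-assembly step is sketched here; the comma-separated prompt style is a common Stable Diffusion convention assumed for illustration, and the function name and category ordering are not from the patent.

```python
def build_prompt(tag_group, quality_tags=("best quality", "high resolution")):
    """Assemble a text prompt for the text-to-image model from one tag
    parameter group, appending the 'other' quality tags at the end."""
    ordered = ["character", "action", "background", "accessories"]
    parts = [tag_group[key] for key in ordered if key in tag_group]
    return ", ".join(parts + list(quality_tags))

prompt = build_prompt({
    "character": "dog",
    "action": "listening to music",
    "background": "park",
    "accessories": "headphones",
})
```

The administrator-set generation count would then simply call the generation pipeline this many times with the same prompt and different random seeds.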
- the application 31 further includes a registration unit, through which the user registers the selected avatar model as the user's avatar.
- the registration unit sends a message to the model database 13 to lock the avatar model and marks a corresponding status code as registered.
- the processing unit 21 further includes collecting statistics on the user's behavior based on the classification tags selected by the user and/or the model parameters corresponding to the user avatar registered by the user.
- the status code includes 0: delisted, 1: listed, and 2: registered.
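The status codes and the registration lock can be modeled directly. A minimal sketch with assumed names (`AvatarStatus`, `register_avatar`) and an in-memory dict standing in for the model database 13:

```python
from enum import IntEnum

class AvatarStatus(IntEnum):
    DELISTED = 0
    LISTED = 1
    REGISTERED = 2

def register_avatar(db, model_id, user_id):
    """Lock an avatar model for a user: only a LISTED model may be
    registered, mirroring the status-code transition on registration."""
    record = db[model_id]
    if record["status"] != AvatarStatus.LISTED:
        return False  # already registered or delisted
    record["status"] = AvatarStatus.REGISTERED
    record["owner"] = user_id
    return True

db = {"m1": {"status": AvatarStatus.LISTED, "owner": None}}
```

A second registration attempt on the same model fails, which is the practical effect of the "lock" message the registration unit sends to the model database.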
- the application 31 further includes an editing unit, which allows the user to edit the user avatar.
- the application 31 further includes a regeneration indicator. If the user is not satisfied with the batch of avatar models, the user can click the regeneration indicator to allow the avatar training unit 22 to generate the avatar models in real time according to the classification labels selected by the user.
- the label combination method further includes randomly selecting from the primary label and the secondary label as a model training condition.
- the action tag is combined with the background tag, and the object tag is combined with the accessories tag.
- the classification tag selection interface provides users with selections based on clothing tags, action/background tags, and object/accessories tags.
- the category tag selection interface randomly displays the primary tags and/or the secondary tags for the user to select.
- the label combination method further includes the administrator setting these primary labels and/or these secondary labels as model training conditions; the administrator can also set the classification label selection interface to display these primary labels and/or these secondary labels for user selection.
- the classification labels corresponding to model parameters whose avatar generation success rate is higher than a threshold are set as the training conditions of the avatar training unit 22 and/or as the classification labels displayed on the classification label selection interface; the avatar generation success rate is calculated as the rate at which the first generation satisfies the user's needs.
- Style tags include Chinese retro style, Japanese style, American style, Disney style, personification, cartoon style, etc., but are not limited to these.
- the category tag selection interface further includes a category tag input field.
- the category tags are input by the user in the category tag input field and are classified into similar category tags after natural language analysis.
- the image files of an avatar model include a full-size image (for example, 512×512 resolution) and a thumbnail (for example, 128×128 resolution).
- User information includes name, account number, password, interests, gender, age, blood type, etc.
- User behavior is further collected through in-app tracking points, which gather user interaction data on platforms associated with the application 31 server, including clicks on content, participation in competitions, joining teams, etc. For example, a record of {'basketball': {'cnt': 100, 'pref': 0.8}} in the user's past interaction statistics means that the user has interacted with basketball-related content 100 times, accounting for 80% of the user's preferences.
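The per-category record shape quoted above can be produced with a small aggregation. A sketch under assumptions — the event shape and function name are illustrative, and `pref` is computed here as the category's share of all tracked events:

```python
from collections import Counter

def summarize_interactions(events):
    """Aggregate tracked interaction events into records of the form
    {'cnt': interaction count, 'pref': share of all interactions}."""
    counts = Counter(event["category"] for event in events)
    total = sum(counts.values())
    return {category: {"cnt": n, "pref": round(n / total, 2)}
            for category, n in counts.items()}

# 80 basketball events out of 100 total -> cnt 80, pref 0.8
events = [{"category": "basketball"}] * 80 + [{"category": "music"}] * 20
stats = summarize_interactions(events)
```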
- the model training server 20 further includes a similarity calculation unit for calculating a model similarity between the avatar models in the same group.
- the system 100 of the present invention for quickly generating multiple sets of customized user avatars further includes an administrator unit.
- the administrator can set an avatar model similarity percentage and extract avatar models with a similarity percentage higher than or equal to the avatar model similarity percentage from the model database 13 in the form of groups.
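The extraction of near-duplicate avatar models "in the form of groups" can be sketched as a greedy clustering over pairwise similarity. The patent leaves the similarity calculation unspecified, so a Jaccard overlap over tags stands in for it here; all names are illustrative.

```python
def jaccard(a, b):
    """Assumed similarity metric: tag-set overlap in [0, 1]."""
    sa, sb = set(a["tags"]), set(b["tags"])
    return len(sa & sb) / len(sa | sb)

def group_similar_models(models, similarity, threshold=0.9):
    """Greedily cluster avatar models whose pairwise similarity is at or
    above the administrator-set percentage, for review as groups."""
    groups, assigned = [], set()
    for i, anchor in enumerate(models):
        if i in assigned:
            continue
        group = [anchor]
        assigned.add(i)
        for j in range(i + 1, len(models)):
            if j not in assigned and similarity(anchor, models[j]) >= threshold:
                group.append(models[j])
                assigned.add(j)
        if len(group) > 1:  # only surface actual near-duplicate groups
            groups.append(group)
    return groups

models = [
    {"model_id": "m1", "tags": ["dog", "park", "headphones"]},
    {"model_id": "m2", "tags": ["dog", "park", "headphones"]},
    {"model_id": "m3", "tags": ["cat", "beach"]},
]
dupes = group_similar_models(models, jaccard, threshold=0.9)
```

The administrator would then review each returned group and choose which members to retain or delete.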
- the administrator can further select which avatar models to retain or delete.
- the administrator unit also includes an avatar model optimization record.
- the administrator screens the avatar models stored in the model database 13.
- the administrator unit stores the screening record as an avatar model optimization record and transmits it to the avatar training unit 22 for learning and training.
- the electronic device 30 includes a computer, a tablet, a smart watch, a personal computer (PC), a mobile terminal, or the like.
- FIG2 shows, in the present invention's method for rapidly generating multiple sets of customized user avatars, a method for generating images corresponding to classification labels: an avatar training unit 22 extracts a classification label list 60 from a label database 11, generates a plurality of images based on the textual content of the plurality of classification labels using a deep learning text-to-image generation model 70 (Stable Diffusion), and maps these images to these classification labels and stores them in a multimedia database 12.
- FIG3 shows a method for quickly generating multiple sets of customized user avatars according to the present invention, including: step A10, the processing unit 21 extracts a plurality of classification tags and corresponding plurality of tag parameters stored in a tag database 11, combines the tag parameters corresponding to these classification tags into a plurality of tag parameter groups according to a tag combination method, and stores them as a tag parameter group list in a model database 13.
- An example using 4 tags to form groups of 3 is described below, but this does not limit the present invention.
- step A20 the processing unit 21 extracts a corresponding image list from a multimedia database 12 according to the classification tags involved in the tag parameter group list, as illustrated below but not limiting the present invention.
- step A30 the processing unit 21 compiles a plurality of model parameters according to the tag parameter group list and the image file list and stores the model parameters into a model parameter list in the model database 13.
- the following is an example but does not limit the present invention.
- step A40 these model parameters are analyzed by natural language to exclude unreasonable groups.
- the group basketball uniform-shooting-bat (a001-e001-k001) will be determined to be unreasonable after natural language analysis, and the group will be deleted from the model parameter list. Alternatively, this step can be omitted to increase interest and creative space.
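Step A40 can be sketched with a rule-based filter. A hypothetical incompatibility table stands in for the natural-language analysis here — the pairs, tag names, and function names are illustrative assumptions, not the patent's method:

```python
# Hypothetical incompatibility rules: pairs of tags judged unable to
# appear together in one avatar (stand-in for natural-language analysis).
INCOMPATIBLE = {
    frozenset({"basketball uniform", "bat"}),
    frozenset({"shooting", "bat"}),
}

def is_reasonable(group):
    """A group is reasonable if no pair of its tags is incompatible."""
    tags = list(group.values())
    return not any(frozenset({a, b}) in INCOMPATIBLE
                   for i, a in enumerate(tags) for b in tags[i + 1:])

def prune_model_parameters(groups):
    """Drop tag parameter groups flagged as unreasonable, as in step A40."""
    return [g for g in groups if is_reasonable(g)]

candidates = [
    {"clothing": "basketball uniform", "action": "shooting", "object": "basketball"},
    {"clothing": "basketball uniform", "action": "shooting", "object": "bat"},
]
kept = prune_model_parameters(candidates)
```

Skipping the pruning call entirely corresponds to the optional omission of this step to preserve creative combinations.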
- the avatar training unit 22 extracts a list of model parameters from the model database 13, extracts several corresponding images from the multimedia database 12 according to each set of model parameters, and uses a deep learning text-to-image model 70 (Stable Diffusion) to generate several avatar models. These avatar models are respectively assigned to these model parameters and stored in the model database 13.
- the generated avatar models, together with user registration statistics and/or behavior statistics selected by an administrator, are fed back to the avatar training unit 22 for training and learning.
- these images include multimedia images collected from other platforms, which are stored in the multimedia database 12 according to classification tags after being reviewed and approved by the administrator.
- FIG4 shows a method for rapidly generating multiple sets of customized user avatars according to the present invention, comprising:
- A100 - electronic device 30 transmits to processing unit 21 via Internet 50 a plurality of category tags selected by the user on the category tag selection interface of application 31 displayed on display screen 32 of electronic device 30;
- the processing unit 21 receives the classification labels and combines them into a label parameter group;
- the processing unit 21 extracts a model parameter list from the model database 13, and selects a plurality of model parameters corresponding to the same or similar tag parameter groups from the model parameter list according to the tag parameter groups;
- the processing unit 21 extracts corresponding avatar models from the model database 13 according to the model parameters;
- the processing unit 21 packages the avatar models and transmits them to the application 31;
- the application 31 receives and unpacks the avatar models and displays them on the display screen 32 for the user to select.
- the display screen 32 also displays a regeneration indicator;
- the application 31 binds the selected avatar model to the user and sends a registration notification to the model database 13.
- the model database 13 changes the status code corresponding to the avatar model to registered according to the registration notification, and the process ends;
- A170 if the user clicks the regeneration indicator, the application 31 sends a regeneration notification to the processing unit 21;
- the processing unit 21 receives the regeneration notification, extracts, for each group of model parameters corresponding to the label parameter group, the corresponding images from the multimedia database 12, and transmits them to an avatar training unit 22.
- the avatar training unit 22 uses a deep learning text-to-image generation model 70 (Stable Diffusion) to generate these avatar models in real time, corresponds these avatar models to these model parameters respectively and stores them in the model database 13; repeat A140.
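The packaging and unpacking of avatar models in the flow above can be sketched as a serialization round trip. The wire format is an assumption — the patent only says the models are "packaged" — and JSON with base64-encoded image bytes is one transport-safe choice:

```python
import base64
import json

def package_avatars(avatars):
    """Pack avatar models (id + raw image bytes) into one JSON payload;
    base64 makes the binary image data safe to transmit as text."""
    return json.dumps([
        {"model_id": a["model_id"],
         "image": base64.b64encode(a["image"]).decode("ascii")}
        for a in avatars
    ])

def unpack_avatars(payload):
    """Inverse operation, as performed by the application before display."""
    return [{"model_id": a["model_id"],
             "image": base64.b64decode(a["image"])}
            for a in json.loads(payload)]

avatars = [{"model_id": "m1", "image": b"\x89PNG\r\n"}]
payload = package_avatars(avatars)
```

Unpacking the payload recovers the original avatar records exactly, which is all the application needs before rendering them on the display screen.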
- the method for rapidly extracting avatar models from the model database 13 based on user-selected category tags includes: the user launching an application 31 on an electronic device 30, displaying a category tag selection interface on the display screen 32; the user selecting "dog" in the character tag, "listening to music" in the action tag, "park" in the background tag, "headphones" in the accessories tag, and "best quality," "high resolution," "simple background," and "ultra-detailed eyes" in the other tags; the application 31 transmitting these category tags to the processing unit 21, which combines the category tags into a tag parameter group and extracts a model parameter list from the model database 13.
- the model parameters corresponding to the same or similar tag parameter groups are filtered from the model parameter list, and the corresponding avatar models are then extracted from the model database 13 based on these model parameters, as shown in FIG5 .
- This method can rapidly provide user avatars that meet user needs, resolving the drawback of prior art techniques that require 5 to 10 minutes to generate user avatars.
- the method for real-time generation of user avatars in which the user selects these category tags includes: the user opens the application 31 on the electronic device 30, and the display screen 32 displays the category tag selection interface; the user selects "cat" in the character tag, "drums" in the object tag, "performance" in the action tag, "park" in the background tag, "headphones" in the accessories tag, and "best quality", "high resolution", "simple background", "ultra-detailed eyes", and "no humans" in the other tags; the application 31 transmits these category tags to the processing unit 21.
- the processing unit 21 combines these classification labels into label parameter groups, and extracts the corresponding model parameters of the label parameter groups from the multimedia database 12 on a per-group basis, and transmits them to the avatar training unit 22.
- the avatar training unit 22 uses a deep learning text-to-image generative model 70 (Stable Diffusion) to generate these avatar models in real time, as shown in FIG6 .
- the user selects or enters "rabbit" in the character tag, "playing guitar" in the action tag, "in the park" in the background tag, "wearing headphones" in the accessories tag, and "best quality", "high resolution", "simple background", "ultra-detailed eyes", and "no humans" in the category tag input fields of the category tag selection interface;
- the application 31 transmits the category tags to the processing unit 21, and the processing unit 21 combines the category tags into a tag parameter group and extracts a model parameter list from the model database 13, and filters out the model parameters corresponding to the same or similar tag parameter groups from the model parameter list according to the tag parameter group, and then extracts the corresponding avatar models from the model database 13 according to these model parameters, as shown in Figure 7.
- a user enters a string of text in the category label input field of the category label selection interface, such as: dog's personality, anime style, 2D image, wearing clothes, playing basketball, park background, animal city character, ink painting, full-body photo, low contrast image; the application 31 transmits the input text string to the avatar training unit 22 of the processing unit 21, and the avatar training unit 22 uses the deep learning text-to-image generation model 70 (Stable Diffusion) to generate these avatar models in real time, as shown in FIG8.
- a method of quickly generating multiple sets of customized user avatars of the present invention includes: an electronic device transmits several category tags selected by a user in a category tag selection interface displayed on the display screen of the electronic device to a processing unit via the Internet; the processing unit receives these category tags, combines these category tags into a tag parameter group and extracts a model parameter list from a database server, further filters out several model parameters corresponding to the same or similar tag parameter groups, and then extracts several corresponding avatar models from the database server based on these model parameters, packages these avatar models and transmits them to the electronic device, which receives these avatar models, unpacks them, and displays them on the display screen for the user to select.
- the display screen displays these avatar models and also displays a regeneration indicator. If the user clicks on the regeneration indicator, the electronic device transmits a regeneration notification and these classification labels to the processing unit.
- the processing unit uses a deep learning text-to-image model to generate these avatar models in real time, packages these avatar models and transmits them to the electronic device. The electronic device receives these avatar models, unpacks them and displays them on the display screen for the user to select.
- the processing unit receives a regeneration notification, extracts several corresponding images from the database server according to each group of model parameters corresponding to the label parameter group, and uses a deep learning text-to-image generation model to generate these avatar models in real time, packages these avatar models and transmits them to the electronic device, which receives these avatar models and unpacks them and displays them on the display screen for the user to select;
- the method for generating these images includes the processing unit extracting a classification label list from the database server, generating these images according to the text content of these classification labels using the deep learning text-to-image generation model, and storing these images corresponding to these classification labels in the database server.
- the mechanism for changing these classification labels includes: 1. when the number of times the avatar models corresponding to a classification label have been registered exceeds a limit, the avatar training unit no longer generates avatar models for that classification label; 2. the more times the avatar models corresponding to a classification label have been registered, the lower the probability of generating avatar models with that classification label; 3. when the avatar models corresponding to a classification label are more often left unregistered, the avatar training unit will periodically generate avatar models with that classification label.
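The three rules above describe a trend rather than a formula. One possible weighting heuristic, sketched under assumptions (the decay and boost factors, the limit default, and the function name are all illustrative):

```python
def generation_weight(registered, rejected, limit=50):
    """Heuristic generation weight for a classification label.

    Rule 1: stop generating once the registration count reaches `limit`.
    Rule 2: the weight decays as registrations accumulate.
    Rule 3: labels whose avatars are repeatedly passed over get a boost,
            so fresh avatars for them are generated periodically.
    """
    if registered >= limit:
        return 0.0                          # rule 1: hard cutoff
    weight = 1.0 / (1.0 + registered)       # rule 2: decay with registrations
    weight *= 1.0 + rejected / 10.0         # rule 3: boost often-unselected labels
    return weight
```

A scheduler could sample labels in proportion to these weights when deciding which avatar models the training unit generates next.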
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
Existing social media apps often allow users to change their profile pictures. However, current methods for changing profile pictures are relatively fixed: for example, they provide a few default images for users to choose from, or use user-uploaded pictures as profile pictures. Such methods lack personalization and distinctiveness, and fail to meet users' experience requirements.
Summary of the Invention
If there is no avatar model that the user is satisfied with, the user can click a regeneration indicator, causing the application to notify the processing unit to extract, for each group of model parameters corresponding to the tag parameter group, the corresponding images from the multimedia database and send them to the avatar training unit, which then uses a deep learning text-to-image generation model (Stable Diffusion) to generate these avatar models in real time. These avatar models are then packaged and sent to the application for the user to select.
A category tag can specify three modes of expression to represent the user's preferences, so that the AI produces images that match the user's expectations.
By default, the deep-learning text-to-image generation model (Stable Diffusion) generates avatar models in a less realistic direction, such as anthropomorphic or cartoon styles, avoiding overly photorealistic results.
Text selected or entered by the user in the category tag selection interface is passed to ChatGPT to produce highly relevant action descriptions. To give the images generated from one category tag variety, each category tag yields multiple action descriptions that guide image generation; for example, if the action category tag is "dessert", the generated images include eating pie, baking cookies, enjoying chocolate cake, and so on.
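The one-tag-to-many-descriptions expansion can be sketched as below. This is a minimal illustration, not the claimed implementation: the `ask_llm` callable stands in for an actual ChatGPT request (the real API call is not shown), and the prompt wording is hypothetical.

```python
def expand_action_tag(tag, ask_llm):
    """Ask a language model for several distinct action descriptions for
    one category tag, so images generated from the same tag stay varied."""
    prompt = (f"List three short, distinct action descriptions for the "
              f"category tag '{tag}', one per line.")
    return [line.strip() for line in ask_llm(prompt).splitlines() if line.strip()]

def fake_llm(prompt):
    """Stand-in for a real ChatGPT call, returning canned text for the demo."""
    return "eating pie\nbaking cookies\nenjoying chocolate cake"

print(expand_action_tag("dessert", fake_llm))
```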
Based on the user's preferences and the generated action descriptions, highly relevant image backgrounds are produced; for example, if the action category tag is "baking cookies", the generated backgrounds include a restaurant, a kitchen, and so on.
Highly relevant accessories are likewise generated from the user's preferences and the generated action descriptions.
Each avatar model image is generated from multiple preferences according to the user's selections. Besides generating images that match the three chosen preferences as closely as possible, the system records the generation tags in the database; when the user selects three category tags to generate on the application, the system searches the database for the most similar avatar models and displays them on the selection page.
When the user registers an avatar model, the application records which models were selected and which were not, accumulating this into the user behavior record.
FIG. 1 is a schematic diagram of a system for rapidly generating multiple sets of customized user avatars;
FIG. 2 is a schematic diagram of a method for generating image files corresponding to category tags;
FIG. 3 is a flowchart of an avatar model training method;
FIG. 4 is a flowchart of a method by which a user generates a user avatar;
FIG. 5 is a schematic diagram of an embodiment of generating a user avatar from selected category tags;
FIG. 6 is a schematic diagram of another embodiment of generating a user avatar from selected category tags;
FIG. 7 is a schematic diagram of an embodiment of generating a user avatar from input category tags;
FIG. 8 is a schematic diagram of an embodiment of generating a user avatar from a text string entered in a category tag field.
Description of reference numerals: 10 - database server; 11 - tag database; 12 - multimedia database; 13 - model database; 14 - user database; 20 - model training server; 21 - processing unit; 22 - avatar training unit; 30 - electronic device; 31 - application; 32 - display screen; 50 - Internet; 60 - category tag list; 70 - deep-learning text-to-image generation model; 100 - system for rapidly generating multiple sets of customized user avatars; A10-A50 - avatar model training method flow; A100-A180 - user avatar generation method flow.
The present invention is described in detail below with reference to the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the following description to refer to the same or similar parts. Although this specification describes several illustrative embodiments, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the components and steps shown in the drawings, and the illustrative methods described herein may be modified by substituting, reordering, removing, or adding steps. The present invention is therefore not limited to the disclosed embodiments; its scope of protection is defined by the claims.
In this specification, "content" denotes information, or an individual information element, realized as text, images, video, audio, or a combination thereof, that can be presented.
In this specification, terms such as unit, device, terminal, server, or system generally refer to a combination of hardware and the software executed by that hardware. The hardware may be a data processing device, for example a mobile device or personal computer with a built-in central processing unit or other processor. The software executed by the hardware may refer to a running program, object, executable file, thread of execution, and the like.
Those of ordinary skill in the art, having common knowledge of computer architecture and organization, will understand that the present invention is not limited to any particular form of computer or network architecture. The database server and model training server of the present invention may each be composed of multiple computers, as long as they provide the corresponding functions. Each computer includes a processor, memory connected to the processor, and a network interface. The processor executes the operating system and applications stored in the memory, including a database management system (DBMS), a web server, and/or a web application server, to implement the steps described in the embodiments of the present invention. The invention as a whole may be implemented through a website platform, an application (APP), or other means.
FIG. 1 shows a system 100 for rapidly generating multiple sets of customized user avatars according to the present invention. The system 100 includes:
a database server 10, which includes:
a tag database 11 containing several category tags and a category tag list 60, each category tag corresponding to a tag parameter; the category tags include a clothing tag, an action tag, an object tag, and other tags;
a multimedia database 12 containing several image files, their corresponding image file codes, and an image file list, the image files being associated with the category tags;
a model database 13 containing several avatar models, their corresponding model parameters, a tag parameter group list, and a model parameter list;
a user database 14 containing user data and user behavior;
a model training server 20 connected to the database server 10, the model training server 20 including:
a processing unit 21 for combining the tag parameters corresponding to the category tags into a plurality of tag parameter groups according to a tag combination scheme and storing them as the tag parameter group list in the model database 13;
an avatar training unit 22 for extracting, according to the tag parameters of each tag parameter group, the associated image files from the multimedia database 12, generating the avatar models with a deep-learning text-to-image generation model 70 (Stable Diffusion), and storing the avatar models in the model database 13, each avatar model corresponding to a model parameter and each model parameter corresponding to a tag parameter group; and
an electronic device 30 through which the user connects to the model training server 20 via the Internet 50, the electronic device 30 including:
an application 31 providing a category tag selection interface on which the user selects the category tags from which a user avatar is to be generated; and
a display screen 32 for displaying the avatar models for a user to select and/or register as a user avatar.
Stable Diffusion is a deep-learning text-to-image generation model, used mainly to produce detailed images from text descriptions and to perform prompt-guided image-to-image transformations. It is a variant of the diffusion model called the latent diffusion model (LDM).
The number of avatar models generated by the deep-learning text-to-image generation model 70 (Stable Diffusion) from each set of model parameters is set by an administrator.
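One way to drive the generation step is to assemble a text prompt from a tag parameter group and hand it to a text-to-image pipeline. The sketch below only shows the (testable) prompt-assembly part; the function name, the tag strings, and the style hint are illustrative and not taken from the specification, though the cartoon bias reflects the preset described earlier.

```python
def build_prompt(tag_group, style_hint="cartoon style, not photorealistic"):
    """Assemble a text-to-image prompt from one tag parameter group.
    The style hint reflects the preset bias toward less realistic avatars."""
    return ", ".join(tag_group) + ", " + style_hint

group = ["dog", "basketball uniform", "shooting a basketball", "basketball court"]
print(build_prompt(group))
```

With a library such as Hugging Face `diffusers`, the resulting prompt could then feed `StableDiffusionPipeline`, passing `num_images_per_prompt` equal to the administrator-set count of avatars per group; that call is omitted here because it requires model weights and a GPU.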
The application 31 further includes a registration unit through which the user registers the selected avatar model as the user avatar; the registration unit sends a message notifying the model database 13 to lock the avatar model and mark its status code as registered.
The processing unit 21 further compiles statistics on the user behavior according to the category tags selected by the user and/or the model parameters corresponding to the registered user avatar.
In one embodiment, the status codes are 0: delisted, 1: listed, and 2: registered.
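The status codes and the lock-on-registration behavior can be sketched as a small enum plus a helper. This is an illustrative sketch of the embodiment's codes; the `register` helper and the `locked` field are hypothetical names, not part of the specification.

```python
from enum import IntEnum

class AvatarStatus(IntEnum):
    """Status codes from the embodiment: 0 delisted, 1 listed, 2 registered."""
    DELISTED = 0
    LISTED = 1
    REGISTERED = 2

def register(model):
    """Lock a selected avatar model and mark it registered,
    mimicking the registration unit's notification to the model database."""
    model["locked"] = True
    model["status"] = AvatarStatus.REGISTERED
    return model

print(register({"model_id": "M001", "status": AvatarStatus.LISTED}))
```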
The application 31 further includes an editing unit with which the user can edit the user avatar.
The application 31 further includes a regenerate indicator; if the user is not satisfied with the current batch of avatar models, the user can click the regenerate indicator, causing the avatar training unit 22 to generate new avatar models in real time according to the category tags the user selected.
Preferably, the tag combination scheme uses three category tags, consisting of a clothing tag, an action tag, and an object tag.
In one embodiment, the category tags further include several primary tags and several secondary tags. The primary tags include, but are not limited to, the clothing tag, the action tag, the object tag, and other primary tags; the secondary tags include, but are not limited to, a character tag, a background tag, an accessory tag, a style tag, and other secondary tags.
In one embodiment, secondary tags are bound to primary tags, e.g. an action tag binds a background tag and an object tag binds an accessory tag. For example, if the action tag is "playing basketball", the bound background tag is "basketball court"; if the object tag is "motorcycle", the bound accessory tag is "helmet".
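The primary-to-secondary bindings can be sketched as simple lookup tables. The two examples from the text (basketball court, helmet) are kept; every other entry and name here is hypothetical and for illustration only.

```python
# Illustrative bindings of secondary tags to primary tags.
ACTION_TO_BACKGROUND = {
    "playing basketball": "basketball court",  # example from the specification
    "swimming": "beach",                       # hypothetical extra entry
}
OBJECT_TO_ACCESSORY = {
    "motorcycle": "helmet",                    # example from the specification
    "basketball": "headband",                  # hypothetical extra entry
}

def bind_secondary(action_tag, object_tag):
    """Resolve the bound background and accessory for the chosen primary tags;
    returns None where no binding exists."""
    return ACTION_TO_BACKGROUND.get(action_tag), OBJECT_TO_ACCESSORY.get(object_tag)

print(bind_secondary("playing basketball", "motorcycle"))
```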
In one embodiment, the tag combination scheme further includes randomly selecting from the primary and secondary tags as model training conditions.
In one embodiment, the action tag is merged with the background tag and the object tag with the accessory tag, and the category tag selection interface offers clothing tags, action/background tags, and object/accessory tags for the user to choose from.
In another embodiment, the category tag selection interface randomly displays the primary tags and/or the secondary tags for the user to select.
In another embodiment, the tag combination scheme further allows the administrator to set the primary tags and/or secondary tags used as model training conditions; the administrator may also configure which primary and/or secondary tags the category tag selection interface displays for user selection.
In one embodiment, the category tags whose corresponding model parameters have an avatar generation success rate above a threshold are used as training conditions for the avatar training unit 22 and/or as the category tags displayed on the category tag selection interface; the avatar generation success rate includes the rate at which the first generation already satisfies the user's needs.
In one embodiment, the category tags displayed on the category tag selection interface are derived from all category tags the user has selected in the application 31, tallied by category into counts and proportions, and shown either sorted by frequency or limited to the three most frequent items.
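The count-and-proportion tally with a top-3 cutoff can be sketched with `collections.Counter`. Function and variable names are illustrative; the specification does not prescribe this exact computation.

```python
from collections import Counter

def top_tags(selection_history, k=3):
    """Tally every tag the user has ever selected and return the k most
    frequent, each with its count and share of all selections."""
    counts = Counter(selection_history)
    total = sum(counts.values())
    return [(tag, n, round(n / total, 2)) for tag, n in counts.most_common(k)]

history = ["basketball", "basketball", "swimming", "basketball", "dancing", "swimming"]
print(top_tags(history))
# -> [('basketball', 3, 0.5), ('swimming', 2, 0.33), ('dancing', 1, 0.17)]
```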
Clothing tags include, but are not limited to, basketball uniforms, swimsuits, hip-hop outfits, and so on.
Action tags include, but are not limited to, shooting a basketball, swimming, dancing, and so on.
Object tags include, but are not limited to, basketballs, swimming goggles, drum kits, and so on.
Background tags include, but are not limited to, basketball courts, the seaside, snow scenes, and so on.
Accessory tags include, but are not limited to, badges, headwear, swim rings, tattoos, and so on.
Character tags include, but are not limited to, humans, anime characters, virtual characters, animals, and so on.
Style tags include, but are not limited to, Chinese retro, Japanese, American, Disney, anthropomorphic, cartoon, and so on.
In one embodiment, the category tag selection interface further includes a category tag input field; tags entered there by the user are classified, after natural language analysis, under the closest existing category tag.
The image files and their image file codes are all associated with the relevant category tags. For example, for the category tag "clothing" (A) and the image "basketball uniform" (a), the image file codes are basketball uniform image 1 (code Aa001), basketball uniform image 2 (code Aa002), and basketball uniform image 3 (code Aa003).
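The coding scheme above (category letter + item letter + running index) is easy to reproduce; the helper below is an illustrative sketch of that pattern, not the specification's actual encoder.

```python
def image_code(category_letter, item_letter, index):
    """Build an image file code like 'Aa001' from the category letter,
    the item letter, and a 3-digit running index."""
    return f"{category_letter}{item_letter}{index:03d}"

# Basketball-uniform images 1-3 under the clothing tag (A) / item (a).
codes = [image_code("A", "a", i) for i in range(1, 4)]
print(codes)  # -> ['Aa001', 'Aa002', 'Aa003']
```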
Avatar model files are stored in two sizes: a full-size image (e.g. 512×512 resolution) and a thumbnail (e.g. 128×128 resolution).
The user data includes name, account, password, interests, gender, age, blood type, and so on.
The user behavior is further collected through in-app event tracking of the user's interactions on platforms associated with the application 31 server, including clicked content, contest entries, teams joined, and so on. For example, a record of {'basketball': {'cnt': 100, 'pref': 0.8}} in the user's past interaction statistics means the user has interacted with 100 basketball-related items, accounting for 80% of the user's preferences.
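Maintaining that `{'cnt', 'pref'}` record can be sketched as below. The function name is hypothetical, and the preference share is computed here simply as each category's fraction of all counted interactions, which is one plausible reading of `pref` rather than the specification's definition.

```python
def record_interaction(stats, category):
    """Increment one category's interaction count, then refresh every
    category's preference share, matching the {'cnt', 'pref'} layout."""
    entry = stats.setdefault(category, {"cnt": 0, "pref": 0.0})
    entry["cnt"] += 1
    total = sum(e["cnt"] for e in stats.values())
    for e in stats.values():
        e["pref"] = e["cnt"] / total
    return stats

stats = {"basketball": {"cnt": 99, "pref": 1.0}}
record_interaction(stats, "basketball")   # basketball reaches 100 interactions
record_interaction(stats, "swimming")     # a second category appears
print(stats)
```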
The model training server 20 further includes a similarity calculation unit that computes a model similarity between the avatar models of the same group.
The system 100 for rapidly generating multiple sets of customized user avatars further includes an administrator unit. The administrator can set an avatar model similarity percentage; avatar models whose similarity is greater than or equal to that percentage are extracted from the model database 13 group by group, and the administrator then chooses which avatar models to keep or delete.
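The similarity screening can be sketched as a pairwise comparison within one group. The specification does not say how similarity is computed; cosine similarity over illustrative feature vectors is assumed here, and all names (`similar_groups`, `vec`) are hypothetical.

```python
def similarity(a, b):
    """Cosine similarity, assumed here as one plausible model-similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def similar_groups(models, threshold):
    """Return the pairs of avatar models in one group whose similarity
    is at or above the administrator-set percentage."""
    flagged = []
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            s = similarity(models[i]["vec"], models[j]["vec"])
            if s >= threshold:
                flagged.append((models[i]["model_id"], models[j]["model_id"], round(s, 2)))
    return flagged

group = [{"model_id": "M001", "vec": [1.0, 0.0]},
         {"model_id": "M002", "vec": [0.9, 0.1]},
         {"model_id": "M003", "vec": [0.0, 1.0]}]
print(similar_groups(group, 0.95))
```

Flagged pairs would then be shown to the administrator, who decides which near-duplicates to keep or delete.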
The administrator unit further includes an avatar model optimization record: the administrator screens the avatar models stored in the model database 13, and the administrator unit stores the screening results as the avatar model optimization record and sends it to the avatar training unit 22 for learning and training.
The electronic device 30 includes a computer, tablet, smart watch, personal computer (PC), mobile terminal, or the like.
In the method for rapidly generating multiple sets of customized user avatars according to the present invention, as shown in FIG. 2, the method for generating the image files corresponding to the category tags includes: an avatar training unit 22 extracts a category tag list 60 from a tag database 11, generates several image files from the text content of the category tags using a deep-learning text-to-image generation model 70 (Stable Diffusion), associates the image files with the category tags, and stores them in a multimedia database 12.
FIG. 3 shows a method for rapidly generating multiple sets of customized user avatars according to the present invention, including: in step A10, the processing unit 21 extracts the several category tags and their corresponding tag parameters stored in a tag database 11, combines the tag parameters into several tag parameter groups according to a tag combination scheme, and stores them as a tag parameter group list in a model database 13. An example using four tags combined into groups of three is described below, but this does not limit the present invention.
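The four-tags-into-groups-of-three example of step A10 maps directly onto `itertools.combinations`; the tag parameter codes below are illustrative placeholders in the style of the codes used later in the specification.

```python
from itertools import combinations

# Step A10 sketch: four tag parameters combined into every group of three.
tag_params = ["a001", "e001", "g001", "k001"]  # codes are illustrative
tag_param_groups = [tuple(g) for g in combinations(tag_params, 3)]
print(tag_param_groups)  # 4 choose 3 = 4 groups
```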
In step A20, the processing unit 21 extracts a corresponding image file list from a multimedia database 12 according to the category tags involved in the tag parameter group list; an example follows, without limiting the present invention.
In step A30, the processing unit 21 compiles several model parameters from the tag parameter group list and the image file list and stores them as a model parameter list in the model database 13; an example follows, without limiting the present invention.
In step A40, the model parameters are analyzed with natural language processing to exclude unreasonable groups. For example, the group basketball uniform - shooting a basketball - baseball bat (a001-e001-k001) would be judged unreasonable by the natural language analysis and deleted from the model parameter list. As an exception, this step may be omitted to increase playfulness and creative latitude.
In step A50, the avatar training unit 22 extracts the model parameter list from the model database 13, extracts the corresponding image files from the multimedia database 12 for each group of model parameters, generates several avatar models with a deep-learning text-to-image generation model 70 (Stable Diffusion), and stores the avatar models in the model database 13, each associated with its model parameters.
In one embodiment, behavior statistics on the generated avatar models, from user registrations and/or an administrator's selections, are fed back to the avatar training unit 22 for training and learning.
In one embodiment, the image files include multimedia images collected from other platforms, which are stored in the multimedia database 12 under their category tags after being reviewed and approved by the administrator.
FIG. 4 shows a method for rapidly generating multiple sets of customized user avatars according to the present invention, including:
A100: an electronic device 30 transmits, via the Internet 50 to the processing unit 21, the several category tags the user selected on the category tag selection interface of the application 31 shown on the display screen 32 of the electronic device 30;
A110: the processing unit 21 receives the category tags and combines them into a tag parameter group;
A120: the processing unit 21 extracts the model parameter list from the model database 13 and, according to the tag parameter group, filters out the model parameters corresponding to identical or similar tag parameter groups;
A130: the processing unit 21 extracts the corresponding avatar models from the model database 13 according to these model parameters;
A140: the processing unit 21 packages the avatar models and transmits them to the application 31;
A150: the application 31 receives, unpacks, and displays the avatar models on the display screen 32 for the user to select, while the display screen 32 also shows a regenerate indicator;
A160: if the user selects one of the avatar models, the application 31 binds the selected avatar model to the user and sends a registration notification to the model database 13, which changes the status code of that avatar model to registered, ending the flow;
A170: if the user clicks the regenerate indicator, the application 31 sends a regeneration notification to the processing unit 21;
A180: on receiving the regeneration notification, the processing unit 21 extracts, group by group, the image files corresponding to the model parameters of the tag parameter group from the multimedia database 12 and sends them to an avatar training unit 22; the avatar training unit 22 generates new avatar models in real time with the deep-learning text-to-image generation model 70 (Stable Diffusion), associates them with their model parameters, and stores them in the model database 13; the flow then repeats from A140.
In one embodiment of the method for rapidly generating multiple sets of customized user avatars, the method of rapidly extracting avatar models from the model database 13 according to the category tags selected by the user includes: the user opens the application 31 on the electronic device 30, and the display screen 32 shows the category tag selection interface; the user selects "dog" for the character tag, "listening to music" for the action tag, "park" for the background tag, "headphones" for the accessory tag, and "best quality, high resolution, simple background, ultra-detailed eyes" for the other tag; the application 31 transmits these category tags to the processing unit 21, which combines them into a tag parameter group, extracts the model parameter list from the model database 13, filters out the model parameters corresponding to identical or similar tag parameter groups according to the tag parameter group, and extracts the corresponding avatar models from the model database 13 according to these model parameters, as shown in FIG. 5. This method quickly provides user avatars that meet the user's needs, overcoming the prior-art drawback that generating a user avatar takes 5 to 10 minutes.
In one embodiment of the method for rapidly generating multiple sets of customized user avatars, the method of generating user avatars in real time from the category tags selected by the user includes: the user opens the application 31 on the electronic device 30, and the display screen 32 shows the category tag selection interface; the user selects "cat" for the character tag, "drum kit" for the object tag, "performing" for the action tag, "park" for the background tag, "headphones" for the accessory tag, and "best quality, high resolution, simple background, ultra-detailed eyes, no humans" for the other tag; the application 31 transmits these category tags to the processing unit 21, which combines them into a tag parameter group and extracts, group by group, the image files corresponding to the model parameters of the tag parameter group from the multimedia database 12, sending them to the avatar training unit 22; the avatar training unit 22 generates the avatar models in real time with the deep-learning text-to-image generation model 70 (Stable Diffusion), as shown in FIG. 6.
In one embodiment, the user selects or enters in the category tag input fields of the category tag selection interface: "a rabbit" for the character tag, "playing guitar" for the action tag, "in a park" for the background tag, "wearing headphones" for the accessory tag, and "best quality, high resolution, simple background, ultra-detailed eyes, no humans" for the other tag; the application 31 transmits the category tags to the processing unit 21, which combines them into a tag parameter group, extracts the model parameter list from the model database 13, filters out the model parameters corresponding to identical or similar tag parameter groups according to the tag parameter group, and extracts the corresponding avatar models from the model database 13 according to these model parameters, as shown in FIG. 7.
In one embodiment, the user enters a text string in the category tag input field of the category tag selection interface: dog personality, anime style, 2D image, wearing clothes, playing basketball, park background, Zootopia-style character, ink painting, full-body shot, low-contrast image; the application 31 transmits the entered text string to the avatar training unit 22 of the processing unit 21, which generates the avatar models in real time with the deep-learning text-to-image generation model 70 (Stable Diffusion), as shown in FIG. 8.
在一实施例中,本发明的一种快速生成多组客制化用户头像方法包含:一电子设备通过网际网络传向运算处理单元送用户在该电子设备的显示屏幕显示的一分类标签选择界面所选择的数个分类标签;运算处理单元接收这些分类标签,将这些分类标签组合成一标签参数组并从一数据库服务器提取一模型参数列表,进一步筛选出相同或相近的该标签参数组所对应的数个模型参数,再依这些模型参数从数据库服务器提取对应的数个头像模型,将这些头像模型封包后传送到电子设备,电子设备接收这些头像模型并进行解包后显示在显示屏幕供该用户选择。In one embodiment, a method of quickly generating multiple sets of customized user avatars of the present invention includes: an electronic device transmits several category tags selected by a user in a category tag selection interface displayed on the display screen of the electronic device to a processing unit via the Internet; the processing unit receives these category tags, combines these category tags into a tag parameter group and extracts a model parameter list from a database server, further filters out several model parameters corresponding to the same or similar tag parameter groups, and then extracts several corresponding avatar models from the database server based on these model parameters, packages these avatar models and transmits them to the electronic device, which receives these avatar models, unpacks them, and displays them on the display screen for the user to select.
显示屏幕显示这些头像模型同时还显示一重新生成标示符,若用户点选该重新生成标示符,电子设备传送一重新生成通知到运算处理单元;运算处理单元接收重新生成通知,将这些分类标签传送运算处理单元,运算处理单元利用深度学习文字到生成图像模型实时生成这些头像模型,将这些头像模型封包后传送到电子设备,电子设备接收这些头像模型并进行解包后并显示在显示屏幕供该用户选择。The display screen displays these avatar models and also displays a regeneration identifier. If the user clicks on the regeneration identifier, the electronic device transmits a regeneration notification to the processing unit; the processing unit receives the regeneration notification and transmits these classification labels to the processing unit. The processing unit uses a deep learning text-to-image model to generate these avatar models in real time, packages these avatar models and transmits them to the electronic device. The electronic device receives these avatar models, unpacks them and displays them on the display screen for the user to select.
In one embodiment, upon receiving the regeneration notification, the processing unit retrieves from the database server, group by group, the several image files corresponding to the model parameters of the tag parameter group, and uses a deep-learning text-to-image model to generate the avatar models in real time; the avatar models are packaged and transmitted to the electronic device, which receives them, unpacks them, and displays them on the display screen for the user to select. The image files are produced as follows: the processing unit retrieves a category tag list from the database server, uses the deep-learning text-to-image model to generate the image files from the text content of those category tags, and stores the image files in the database server in association with their category tags.
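The two phases described here, generating image files per category tag ahead of time and reusing them when a regeneration request arrives, might look like this in outline. The stand-in generator function and the in-memory dictionary playing the role of the database server are placeholders for illustration only:

```python
import random

def pregenerate(tag_list, generate_image, db):
    """Offline step: produce an image file for each category tag with the
    text-to-image model and store it in the database keyed by that tag."""
    for tag in tag_list:
        db.setdefault(tag, []).append(generate_image(tag))

def regenerate_avatars(tag_group, db):
    """On a regeneration notification, pull one stored image per tag so
    avatar models can be assembled without a fresh full model run."""
    return {tag: random.choice(db[tag]) for tag in tag_group if tag in db}

# Demo with a stand-in generator and an in-memory "database"
db = {}
pregenerate(["anime style", "park background"], lambda t: f"image<{t}>", db)
```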
The mechanism for adjusting these category tags includes: 1. when the avatar model corresponding to a category tag has been registered more than a limit number of times, the avatar training unit no longer generates the avatar model for that tag; 2. the more times the avatar model corresponding to a category tag has been registered, the lower the probability that an avatar model for that tag is generated; 3. the more times the avatar model corresponding to a category tag is presented but not registered, the more the avatar training unit will periodically generate avatar models for that tag.
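The three rules above can be combined into a single generation-probability function. The linear decay and the 0.25 floor for frequently skipped tags are illustrative choices; the patent fixes only the qualitative behavior, not the exact curve:

```python
def generation_probability(registered, skipped, limit=100):
    """Probability that the avatar training unit generates a model for a
    category tag, following the three rules: a hard cut-off past the
    registration limit, decay as registrations grow, and continued
    periodic generation for tags that are shown but not registered."""
    if registered >= limit:
        return 0.0                      # rule 1: stop past the limit
    p = 1.0 - registered / limit        # rule 2: more registrations, lower odds
    if skipped > registered:
        p = max(p, 0.25)                # rule 3: keep often-skipped tags alive
    return p
```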
While certain exemplary embodiments and implementations have been described above, other embodiments and modifications will be apparent from this description. Accordingly, the present invention is not limited to such exemplary embodiments, but rather extends to the broader scope of the appended claims and to various obvious modifications and equivalent arrangements.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2024/082555 WO2025194356A1 (en) | 2024-03-20 | 2024-03-20 | Method for quickly generating multiple groups of customized user avatars |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2024/082555 WO2025194356A1 (en) | 2024-03-20 | 2024-03-20 | Method for quickly generating multiple groups of customized user avatars |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025194356A1 (en) | 2025-09-25 |
Family
ID=97138085
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2024/082555 (WO2025194356A1, pending) | Method for quickly generating multiple groups of customized user avatars | 2024-03-20 | 2024-03-20 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025194356A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150312523A1 (en) * | 2012-04-09 | 2015-10-29 | Wenlong Li | System and method for avatar management and selection |
| CN109388456A (en) * | 2018-09-20 | 2019-02-26 | 维沃移动通信有限公司 | A kind of head portrait selection method and mobile terminal |
| CN110472090A (en) * | 2019-08-20 | 2019-11-19 | 腾讯科技(深圳)有限公司 | Image search method and relevant apparatus, storage medium based on semantic label |
| CN116977486A (en) * | 2023-04-28 | 2023-10-31 | 北京搜狗科技发展有限公司 | Image generation method, device, equipment and storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11663827B2 (en) | Generating a video segment of an action from a video | |
| Escalera et al. | Chalearn looking at people and faces of the world: Face analysis workshop and challenge 2016 | |
| EP3473016B1 (en) | Method and system for automatically producing video highlights | |
| US9183557B2 (en) | Advertising targeting based on image-derived metrics | |
| US20220358405A1 (en) | System and Method for Generating Artificial Intelligence Driven Insights | |
| CN113630630B (en) | Method, device and equipment for processing video commentary dubbing information | |
| Irwin et al. | Consuming esports and trash talking: how do social norms and moderating attributes influence behaviour? | |
| Akranglyte et al. | Formation of character and image of sportsman as f competitive advantage in mass media | |
| Marr | Big Data for small business for dummies | |
| CN118276923A (en) | Mini-program dynamic configuration method, device, computer equipment and storage medium | |
| KR20150086289A (en) | Automated thumbnail selection for online video | |
| CN117915992A (en) | Data sticker generation for sports | |
| US20250254402A1 (en) | Systems and methods for generating sports media content for an interactive display | |
| WO2025194356A1 (en) | Method for quickly generating multiple groups of customized user avatars | |
| TWI891289B (en) | Method for rapidly generating multiple customized user avatars | |
| US20200342547A1 (en) | Systems and methods for facilitating engagement between entities and individuals | |
| TWM661875U (en) | A system that quickly generates multiple sets of customized user avatars | |
| CN113434779B (en) | Interactive reading method and device capable of intelligent recommendation, computing equipment and storage medium | |
| KR20110016728A (en) | Online game based automatic newspaper generation system and method | |
| CN114140604A (en) | Method and device for processing props of virtual scene and electronic equipment | |
| CA3125164A1 (en) | Technology configured to enable monitoring of user engagement with physical printed materials via augmented reality delivery system | |
| Eichelbaum et al. | Classification of icon type and cooldown state in video game replays | |
| Smith et al. | Dynamic data: branding the digital drive | |
| CN118612499A (en) | Image generation method, device, equipment, storage medium and computer program product | |
| JP2025070647A (en) | Information processing device, information processing method, program, and information processing system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24930042; Country of ref document: EP; Kind code of ref document: A1 |