CN116910198A - Digital human control method and device, electronic device and storage medium
- Publication number: CN116910198A (application CN202310572396.6A)
- Authority: CN (China)
- Prior art keywords: target, animation, preset, clause, matching
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/3329—Natural language query formulation
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F16/48—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
Abstract
The invention discloses a digital human control method and device, an electronic device, and a storage medium. The method includes: obtaining target text to be broadcast and converting the target text into at least one clause according to preset sentence-splitting rules; inputting each clause into a preset emotion classification model to determine the emotion label corresponding to the clause; determining a target body animation and a target expression animation from a preset animation library according to the emotion label; and inputting the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation. If there are multiple clauses, a preset digital human is controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it. The digital human thus broadcasts the text with the emotion corresponding to the text, which improves the interaction efficiency of the digital human and the user experience.
Description
Technical Field
The present application relates to the field of computer technology and, more specifically, to a digital human control method and device, an electronic device, and a storage medium.
Background Art
With the continuous development of artificial intelligence, digital human interaction has begun to be applied in various fields to achieve intelligent human-computer interaction.
In the prior art, when a digital human is controlled to interact, its spoken interaction and body movements often suffer from transition delays, inconsistency between movements and speech, and monotonous movements, resulting in low interaction efficiency and a degraded user experience.
Therefore, how to improve the interaction efficiency of digital humans is a technical problem that remains to be solved.
It should be noted that the information disclosed in this Background section is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.
Summary of the Invention
Embodiments of the present application provide a digital human control method and device, an electronic device, and a storage medium, which drive a digital human to broadcast text with the emotion corresponding to that text, so as to improve the interaction efficiency of the digital human.
In a first aspect, a digital human control method is provided. The method includes: obtaining target text to be broadcast, and converting the target text into at least one clause according to preset sentence-splitting rules; inputting the clause into a preset emotion classification model to determine the emotion label corresponding to the clause; determining a target body animation and a target expression animation from a preset animation library according to the emotion label; inputting the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation; and, if there are multiple clauses, controlling a preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it.
In a second aspect, a digital human control device is provided. The device includes: an acquisition module, configured to obtain target text to be broadcast and convert the target text into at least one clause according to preset sentence-splitting rules; a first determination module, configured to input the clause into a preset emotion classification model and determine the emotion label corresponding to the clause; a second determination module, configured to determine a target body animation and a target expression animation from a preset animation library according to the emotion label; a generation module, configured to input the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation; and a control module, configured to, if there are multiple clauses, control a preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it.
In a third aspect, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute, via the executable instructions, the digital human control method described in the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the digital human control method described in the first aspect is implemented.
By applying the above technical solution, target text to be broadcast is obtained and converted into at least one clause according to preset sentence-splitting rules; each clause is input into a preset emotion classification model to determine the emotion label corresponding to the clause; a target body animation and a target expression animation are determined from a preset animation library according to the emotion label; and the clause, the emotion label, and the target timbre corresponding to the clause are input into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation. If there are multiple clauses, a preset digital human is controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it. The digital human thus broadcasts the text with the emotion corresponding to the text, which improves the interaction efficiency of the digital human and the user experience.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art may obtain other drawings from them without creative effort.
Figure 1 is a schematic flow chart of a digital human control method according to an embodiment of the present invention;
Figure 2 is a schematic flow chart of a digital human control method according to another embodiment of the present invention;
Figure 3 is a schematic flow chart of setting the clothing of a preset digital human according to an embodiment of the present invention;
Figure 4 is a schematic structural diagram of a digital human control device according to an embodiment of the present invention;
Figure 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
It should be noted that those skilled in the art will readily conceive of other embodiments of the present application after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application indicated by the claims.
It should be understood that the present application is not limited to the precise structures described below and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the application is limited only by the appended claims.
The present application may be used in numerous general-purpose or special-purpose computing device environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor devices, and distributed computing environments including any of the above devices.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
An embodiment of the present application provides a digital human control method. As shown in Figure 1, the method includes the following steps:
Step S101: obtain target text to be broadcast, and convert the target text into at least one clause according to preset sentence-splitting rules.
In this embodiment, the target text to be broadcast may be input by a user at a terminal or received from another server; it may be response text corresponding to an interactive instruction issued by the user (such as a voice instruction or a text instruction); or it may be text broadcast autonomously by the preset digital human when a preset broadcast condition is met. The target text consists of one or more clauses. Since different clauses may express different emotions, the target text is converted into at least one clause according to preset sentence-splitting rules to facilitate subsequent emotion recognition. For example, a preset sentence-splitting rule may split the target text on its punctuation marks (such as periods, question marks, exclamation marks, semicolons, and commas).
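For illustration only, the sketch below shows one possible implementation of such a punctuation-based splitting rule in Python; the exact punctuation set and the function name are assumptions rather than details taken from this application.

```python
import re

# Illustrative clause boundary: split after common Chinese and Western
# sentence-ending punctuation. The exact rule set is an assumption.
_CLAUSE_BOUNDARY = re.compile(r"(?<=[。！？；.!?;])\s*")

def split_into_clauses(target_text: str) -> list[str]:
    """Convert the target text into at least one clause."""
    clauses = [part.strip() for part in _CLAUSE_BOUNDARY.split(target_text)]
    return [clause for clause in clauses if clause]

# Example: split_into_clauses("The weather is nice today. Don't stay at home!")
# returns ["The weather is nice today.", "Don't stay at home!"]
```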
Step S102: input the clause into a preset emotion classification model to determine the emotion label corresponding to the clause.
In this embodiment, a large corpus of text is annotated with emotions in advance, and a preset emotion classification model is trained on it using a machine learning algorithm. After a clause is obtained, it is input into the preset emotion classification model, and the emotion label corresponding to the clause is obtained from the model's output. For example, if the clause is "Today has been really awful", the corresponding emotion label is Frustration; if the clause is "This is a truly brilliant idea", the corresponding emotion label is Excitement.
Those skilled in the art may train the preset emotion classification model with different machine learning algorithms according to actual needs, for example an RNN (Recurrent Neural Network) with a many-to-one network structure trained on a large annotated corpus; the choice of machine learning algorithm does not affect the scope of protection of this application.
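As an illustration of such a many-to-one structure, a minimal PyTorch sketch is given below; the GRU choice, the layer sizes, and the class name are assumptions and not details of this application.

```python
import torch
import torch.nn as nn

class ClauseEmotionClassifier(nn.Module):
    """Many-to-one recurrent classifier: a clause goes in as a token sequence,
    a distribution over emotion labels comes out."""

    def __init__(self, vocab_size: int, num_labels: int,
                 embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, sequence_length)
        embedded = self.embedding(token_ids)
        _, final_hidden = self.rnn(embedded)       # many-to-one: keep the final state
        return self.classifier(final_hidden[-1])   # logits over emotion labels
```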
Step S103: determine a target body animation and a target expression animation from a preset animation library according to the emotion label.
In this embodiment, the preset animation library includes multiple preset body animations and multiple preset expression animations, each associated with a corresponding emotion. After the emotion label is obtained, the preset animation library is queried with it, and the target body animation and target expression animation are determined from the preset body animations and preset expression animations that match the emotion label.
Step S104: input the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation.
In order for the preset digital human to perform and broadcast the target text, the corresponding target audio and mouth-shape animation must also be obtained once the target body animation and target expression animation have been determined. In this embodiment, a preset speech and mouth-shape generation model that can generate audio and a mouth-shape animation from text, a timbre, and an emotion label is built in advance; it may take the form of an SDK (Software Development Kit). The clause, the emotion label, and the target timbre corresponding to the clause are input into this model, which produces target audio and a mouth-shape animation that match the emotion label and the target timbre.
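The model is treated as a black box here; the sketch below only illustrates the assumed input/output contract of such an SDK call, and the wrapper and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SpeechAndMouthShapes:
    target_audio: bytes                                    # synthesized speech for one clause
    mouth_animation: list = field(default_factory=list)    # per-frame mouth-shape (viseme) weights

def generate_speech_and_mouth_shapes(clause: str, emotion_label: str,
                                      target_timbre: str) -> SpeechAndMouthShapes:
    """Hypothetical wrapper around the preset speech and mouth-shape generation
    model: the clause, its emotion label and the target timbre go in; target
    audio and a mouth-shape animation come out."""
    # A real implementation would call the underlying model or SDK here; this
    # placeholder only documents the data flow described in step S104.
    return SpeechAndMouthShapes(target_audio=b"", mouth_animation=[])
```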
Step S105: if there are multiple clauses, control the preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it.
In this embodiment, the preset digital human may be selected by the user from multiple pre-built digital humans; for example, the user may choose a digital human of a matching type according to personal preferences (such as clothing, appearance, voice, personality, or occupation) as the preset digital human. The preset digital human may also be created from a face photo provided by the user.
Each clause corresponds to target audio, a mouth-shape animation, a target body animation, and a target expression animation. If there are multiple clauses, the preset digital human is controlled to execute these artifacts in the order in which the clauses appear in the target text, specifically by executing the corresponding driving parameters, so that the digital human performs and broadcasts the target text with the target timbre and the corresponding emotion; the user perceives the preset digital human as speaking with a distinct emotion, which improves the user experience. It will be understood that, if there is only one clause, the preset digital human is directly controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation corresponding to that clause.
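A minimal sketch of this per-clause driving loop follows; the controller methods and dictionary keys are assumed names, since the application only states that the four artifacts are executed in clause order via driving parameters.

```python
def perform_broadcast(digital_human, clause_results):
    """Drive the preset digital human clause by clause, keeping the audio,
    mouth shapes, body motion and facial expression of each clause together."""
    for result in clause_results:                      # preserve clause order
        digital_human.play_audio(result["target_audio"])
        digital_human.play_animation(result["mouth_animation"])
        digital_human.play_animation(result["body_animation"])
        digital_human.play_animation(result["expression_animation"])
        digital_human.wait_until_clause_finished()     # move to the next clause only when done
```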
For example, suppose the target text is: "The weather is nice today, Master, you could go out and clear your head. Don't just stay at home, or I'll be unhappy! I hope you have a happy day!"
After the target text is split according to the preset sentence-splitting rules, the following clauses are obtained:
1) The weather is nice today, Master, you could go out and clear your head.
2) Don't just stay at home, or I'll be unhappy!
3) I hope you have a happy day!
These three clauses are input into the preset emotion classification model, yielding three emotion labels:
1) Relief;
2) Worry;
3) Hope.
The target body animation and target expression animation are determined from the preset animation library according to each emotion label, and each clause, its emotion label, and the target timbre are input into the preset speech and mouth-shape generation model to obtain the target audio and mouth-shape animation. Finally, the preset digital human is controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation in the order of the clauses, so that the preset digital human gives the following performance:
The preset digital human first says "The weather is nice today, Master, you could go out and clear your head" in a relaxed tone with matching body language (including facial expressions), then says "Don't just stay at home, or I'll be unhappy" in a somewhat worried tone with matching body language, and finally says "I hope you have a happy day" in an expectant tone with matching body language. The preset digital human acts out the entire target text, giving the user a vivid and engaging experience.
By applying the above technical solution, target text to be broadcast is obtained and converted into at least one clause according to preset sentence-splitting rules; each clause is input into a preset emotion classification model to determine the emotion label corresponding to the clause; a target body animation and a target expression animation are determined from a preset animation library according to the emotion label; and the clause, the emotion label, and the target timbre corresponding to the clause are input into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation. If there are multiple clauses, a preset digital human is controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it. The digital human thus broadcasts the text with the emotion corresponding to the text, which improves the interaction efficiency of the digital human and the user experience.
An embodiment of the present application further provides a digital human control method. As shown in Figure 2, the method includes the following steps:
Step S201: obtain target text to be broadcast, and convert the target text into at least one clause according to preset sentence-splitting rules.
In this embodiment, the target text to be broadcast may be input by a user at a terminal or received from another server; it may be response text corresponding to an interactive instruction issued by the user (such as a voice instruction or a text instruction); or it may be text broadcast autonomously by the preset digital human when a preset broadcast condition is met. The target text consists of one or more clauses. Since different clauses may express different emotions, the target text is converted into at least one clause according to preset sentence-splitting rules to facilitate subsequent emotion recognition.
Step S202: input the clause into a preset emotion classification model to determine the emotion label corresponding to the clause.
In this embodiment, a large corpus of text is annotated with emotions in advance, and a preset emotion classification model is trained on it using a machine learning algorithm. After a clause is obtained, it is input into the preset emotion classification model, and the emotion label corresponding to the clause is obtained from the model's output. For example, if the clause is "Today has been really awful", the corresponding emotion label is Frustration; if the clause is "This is a truly brilliant idea", the corresponding emotion label is Excitement.
Step S203: determine, from the preset animation library, an animation combination corresponding to the emotion label, the animation combination including a preset body animation and a preset expression animation.
The preset animation library includes multiple animation combinations, each of which includes a preset body animation and a preset expression animation that suit the corresponding emotion; for example, when the mood is low, body movements slow down and the facial expression becomes dull, whereas when the mood is high, body movements are brisk and facial expressions are exaggerated. The emotion label is compared with the emotion corresponding to each animation combination to determine the animation combination corresponding to the emotion label.
Step S204: if there are multiple such animation combinations, randomly select from them a target animation combination that is consistent with the gender of the preset digital human, and determine the target body animation and the target expression animation according to the target animation combination.
Digital humans of different genders exhibit different body movements and expressions, and the same emotion label may correspond to one or more animation combinations. Therefore, if multiple animation combinations correspond to the emotion label, a target animation combination consistent with the gender of the preset digital human is randomly selected from them, and the preset body animation and preset expression animation in the target animation combination are used as the target body animation and target expression animation, so that the movements and expressions of the preset digital human conform to its gender and the preset digital human is controlled more accurately.
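A minimal sketch of this selection step, under assumed data shapes (a dictionary keyed by emotion label, each combination carrying a gender tag), is shown below.

```python
import random

def select_animation_combination(animation_library, emotion_label, avatar_gender):
    """Steps S203-S204 sketch: look up the combinations for an emotion label
    and, when several exist, randomly pick one matching the avatar's gender."""
    combinations = animation_library.get(emotion_label, [])
    if not combinations:
        return None
    if len(combinations) > 1:
        same_gender = [c for c in combinations if c.get("gender") == avatar_gender]
        chosen = random.choice(same_gender or combinations)
    else:
        chosen = combinations[0]        # a single combination (step S205)
    return chosen["body_animation"], chosen["expression_animation"]
```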
Optionally, the target animation combination may also be determined according to multiple preset attributes of the preset digital human. For example, if multiple animation combinations correspond to the emotion label, a target animation combination consistent with at least one of the preset attributes of the preset digital human is randomly selected from them, and its preset body animation and preset expression animation are used as the target body animation and target expression animation; the preset attributes include gender, occupation, age, personality, and so on, so that the target body animation and target expression animation better match the characteristics of the preset digital human.
Step S205: if there is only one such animation combination, determine the target body animation and the target expression animation according to that animation combination.
If only one animation combination corresponds to the emotion label, the preset body animation and preset expression animation in that animation combination are used as the target body animation and target expression animation.
Step S206: input the clause, the emotion label, and the target timbre corresponding to the clause into the preset speech and mouth-shape generation model to obtain the target audio and mouth-shape animation.
In order for the preset digital human to perform and broadcast the target text, the corresponding target audio and mouth-shape animation must also be obtained once the target body animation and target expression animation have been determined. In this embodiment, a preset speech and mouth-shape generation model that can generate audio and a mouth-shape animation from text, a timbre, and an emotion label is built in advance. The clause, the emotion label, and the target timbre corresponding to the clause are input into this model, which produces target audio and a mouth-shape animation that match the emotion label and the target timbre.
Step S207: if there are multiple clauses, control the preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it.
In this embodiment, the preset digital human may be selected by the user from multiple pre-built digital humans; for example, the user may choose a digital human of a matching type according to personal preferences (such as clothing, appearance, voice, personality, or occupation) as the preset digital human. The preset digital human may also be created from a face photo provided by the user.
Each clause corresponds to target audio, a mouth-shape animation, a target body animation, and a target expression animation. If there are multiple clauses, the preset digital human is controlled to execute these artifacts in the order in which the clauses appear in the target text, specifically by executing the corresponding driving parameters, so that the digital human performs and broadcasts the target text with the target timbre and the corresponding emotion; the user perceives the preset digital human as speaking with a distinct emotion, which improves the user experience. It will be understood that, if there is only one clause, the preset digital human is directly controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation corresponding to that clause.
In some embodiments of the present application, before the preset digital human is controlled to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, the method further includes the following steps, as shown in Figure 3:
Step S301: determine, according to the current temperature interval corresponding to the current clothing of the preset digital human, whether the current clothing matches the current air temperature.
In this embodiment, multiple optional outfits are set up in advance for the preset digital human, each corresponding to a suitable preset temperature interval. First, the current temperature interval corresponding to the current clothing of the preset digital human is determined, and whether the current clothing matches the current air temperature is judged from this interval. The current air temperature may be obtained from the Internet in real time or determined from the most recently obtained temperature value. For example, if the current air temperature falls within the current temperature interval, the two are considered to match; otherwise they do not.
Step S302: if they do not match, determine, according to the preset temperature intervals corresponding to the currently optional outfits, whether any currently optional outfit matches the current air temperature.
If the current clothing does not match the current air temperature, it needs to be replaced. The preset temperature interval corresponding to each currently optional outfit is determined and compared with the current air temperature to judge whether any of the currently optional outfits matches the current air temperature.
Step S303: if matching outfits exist and there are several of them, determine the best outfit from the matching outfits according to their matching temperature intervals, and replace the current clothing with the best outfit.
If matching outfits exist and there are several of them, the best outfit must be further determined from among them; specifically, the best outfit is determined from the matching outfits according to the matching temperature interval corresponding to each, and the current clothing is then replaced with the best outfit, so that the clothing of the preset digital human better matches the current air temperature, which improves both the fun and the user experience.
It will be understood that, if the current clothing matches the current air temperature, it is kept, and if there is exactly one matching outfit, the current clothing is replaced with that matching outfit.
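Read together, steps S301 to S303 amount to the flow sketched below; the data shapes and the helper select_best_outfit (sketched later with the α criterion) are assumptions.

```python
def update_outfit(avatar, wardrobe, current_temperature):
    """Keep the current outfit while it matches the current air temperature;
    otherwise switch to a matching outfit, preferring the best one when there
    are several matches."""
    def matches(outfit):
        low, high = outfit["temperature_interval"]
        return low <= current_temperature <= high

    if matches(avatar.current_outfit):                  # S301: still suitable
        return
    candidates = [o for o in wardrobe if matches(o)]    # S302: find matches
    if len(candidates) == 1:
        avatar.current_outfit = candidates[0]
    elif candidates:                                    # S303: several matches
        avatar.current_outfit = select_best_outfit(candidates, current_temperature)
    # With no candidate at all, the complaint handling described below applies.
```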
In some embodiments of the present application, after judging whether any currently optional outfit matches the current air temperature, the method further includes:
if no matching outfit exists, determining a first difference and a second difference obtained by subtracting the current air temperature from the maximum value and the minimum value of the current temperature interval, respectively;
taking whichever of the first difference and the second difference has the smaller absolute value as the target difference; and
comparing the target difference with multiple preset complaint intervals and, if there is a target complaint interval corresponding to the target difference, controlling the preset digital human to execute a complaint animation corresponding to the target complaint interval.
In this embodiment, if none of the currently optional outfits matches, the preset digital human is made to execute a corresponding complaint animation, for example complaining to the user about being too cold or too hot. Specifically, the first difference and the second difference are first obtained by subtracting the current air temperature from the maximum value and the minimum value of the current temperature interval, respectively; whichever of them has the smaller absolute value is then taken as the target difference; finally, the target difference is compared with the preset complaint intervals, and if there is a target complaint interval corresponding to the target difference, the preset digital human is controlled to execute the complaint animation corresponding to that interval, so that the preset digital human acts out a complaint. This makes the preset digital human react to temperature more like a real person and further improves the fun and the user experience.
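The sketch below illustrates this computation; the interval representation (a pair whose None bound stands for an open end) and the mapping from intervals to animations are assumed data shapes, and the boundary conventions are simplified.

```python
def trigger_complaint(avatar, current_interval, current_temperature, complaint_rules):
    """With no matching outfit available, derive the target difference from the
    current outfit's temperature interval and play the complaint animation of
    the preset complaint interval that contains it, if any."""
    interval_min, interval_max = current_interval
    first_difference = interval_max - current_temperature    # maximum minus current temperature
    second_difference = interval_min - current_temperature   # minimum minus current temperature
    target_difference = min(first_difference, second_difference, key=abs)

    for (low, high), complaint_animation in complaint_rules:
        above_low = low is None or target_difference > low      # None stands for -infinity
        below_high = high is None or target_difference <= high  # None stands for +infinity
        if above_low and below_high:
            avatar.play_animation(complaint_animation)
            return
```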
In addition, if no matching outfit exists, the current temperature interval of the current clothing may also be compared with the preset temperature intervals of the currently optional outfits, and the best temperature interval, namely the one closest to the current air temperature, may be selected; the outfit corresponding to that best temperature interval is then treated as the matching outfit. If the current clothing is already that matching outfit, it is kept; otherwise the current clothing is replaced with the matching outfit.
For example, suppose the preset complaint intervals include (-1, 0), (-∞, -10], (0, 1], and (10, +∞).
If the target complaint interval is (-1, 0), the preset digital human is made to execute a complaint animation with text such as:
1) I'm a little cold!
2) The weather is getting cooler, Master, please dress me in something warmer!
If the target complaint interval is (-∞, -10], the preset digital human is made to execute a complaint animation with text such as:
1) Master, I'm freezing to death!!!
2) I'm freezing to death, somebody save me!!!
3) Master, if you don't put warmer clothes on me, I'll stop talking to you!!!
If the target complaint interval is (0, 1], the preset digital human is made to execute a complaint animation with text such as:
1) Hmm, I seem to be sweating a little.
2) Wouldn't it be more comfortable to take off a coat?
If the target complaint interval is (10, +∞), the preset digital human is made to execute a complaint animation with text such as:
1) If it stays this stuffy, I'm going to get dehydrated!!!
2) Master, are you trying to roast me to death?
In some embodiments of the present application, after the preset digital human is controlled to execute the complaint animation corresponding to the preset complaint interval, the method further includes:
displaying prompt information for increasing the number of outfits for the preset digital human.
In this embodiment, the appearance of a complaint animation indicates that the number of currently optional outfits is insufficient, so prompt information for increasing the number of outfits for the preset digital human is displayed, allowing the user to add outfits by purchasing them or by completing tasks, which improves the user experience.
In some embodiments of the present application, determining the best outfit from the matching outfits according to their matching temperature intervals includes:
determining the α value corresponding to the matching temperature interval according to Formula 1, Formula 1 being:
determining the minimum α value among the α values and taking the matching outfit corresponding to the minimum α value as the best outfit; or,
determining, from the α values, multiple target α values that fall within a preset value range, and randomly selecting one of the matching outfits corresponding to the target α values as the best outfit;
where L1 is the difference obtained by subtracting the minimum value of the matching temperature interval from the current air temperature, and L2 is the difference obtained by subtracting the current air temperature from the maximum value of the matching temperature interval.
In this embodiment, in order to avoid frequent clothing changes, the clothing of the preset digital human should be chosen with the strategy that best withstands temperature changes over a long period. Specifically, the α value corresponding to each matching temperature interval is first determined according to Formula 1; as Formula 1 shows, values closer to the centre of the matching temperature interval are selected preferentially. The best outfit may be determined from the minimum α value, or multiple target α values falling within a preset value range may be determined from the α values and one of the corresponding outfits selected at random as the best outfit, thereby avoiding frequent clothing changes.
For example, if the current air temperature is 18 °C and the matching temperature intervals include [8, 18], [12, 22], and [19, 30], the temperature interval corresponding to the best outfit should be [12, 22], for the following reason: the current air temperature is 18 °C; for clothing in the interval [8, 18], a slight rise in temperature at the next moment would already make that clothing unsuitable; the interval [19, 30] is entirely unsuitable; only the interval [12, 22] best withstands changes in the current temperature.
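Formula 1 itself is not reproduced in the text available here. The sketch below therefore assumes one form consistent with the stated behaviour (the interval whose centre is closest to the current temperature is preferred and the smallest α wins), namely α = |L1 - L2|; with the worked example above this yields α values of 10, 2, and 13, so the [12, 22] outfit is selected.

```python
def select_best_outfit(candidates, current_temperature):
    """Pick the matching outfit whose temperature interval is centred closest
    to the current air temperature. alpha = |L1 - L2| is an assumed stand-in
    for the patent's Formula 1, with L1 and L2 defined as in the text above."""
    def alpha(outfit):
        low, high = outfit["temperature_interval"]
        l1 = current_temperature - low      # current temperature minus interval minimum
        l2 = high - current_temperature     # interval maximum minus current temperature
        return abs(l1 - l2)

    return min(candidates, key=alpha)
```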
As an alternative, determining the best outfit from the matching outfits according to their matching temperature intervals includes:
determining the day's maximum and minimum air temperatures and generating a target temperature interval from them; taking, among the matching temperature intervals, the one that overlaps the target temperature interval most as the best temperature interval; and determining the best outfit according to the best temperature interval, so that the clothing of the preset digital human better matches the current air temperature and frequent clothing changes are avoided.
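A sketch of this alternative, with the day's forecast passed in explicitly and assumed data shapes, could look like this:

```python
def select_best_outfit_by_overlap(candidates, day_min_temperature, day_max_temperature):
    """Pick the outfit whose preset temperature interval overlaps the day's
    target interval [day_min_temperature, day_max_temperature] the most."""
    def overlap(outfit):
        low, high = outfit["temperature_interval"]
        return max(0.0, min(high, day_max_temperature) - max(low, day_min_temperature))

    return max(candidates, key=overlap)
```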
In some embodiments of the present application, after the current clothing is replaced with the best outfit, the method further includes:
saving the preset digital human as a static file in a preset file format.
In this embodiment, saving the preset digital human as a static file in a preset file format avoids dynamic computation during loading and enables the digital human to be loaded quickly. The preset file formats include, but are not limited to, glTF and FBX.
By applying the above technical solution, target text to be broadcast is obtained and converted into at least one clause according to preset sentence-splitting rules; each clause is input into a preset emotion classification model to determine the emotion label corresponding to the clause; and an animation combination corresponding to the emotion label, including a preset body animation and a preset expression animation, is determined from the preset animation library. If there are multiple such animation combinations, a target animation combination consistent with the gender of the preset digital human is randomly selected from them and the target body animation and target expression animation are determined according to it; if there is only one, the target body animation and target expression animation are determined according to that combination. The clause, the emotion label, and the target timbre corresponding to the clause are input into the preset speech and mouth-shape generation model to obtain the target audio and mouth-shape animation. If there are multiple clauses, the preset digital human is controlled to execute the target audio, mouth-shape animation, target body animation, and target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it. The digital human thus broadcasts the text with the emotion corresponding to the text, which improves the interaction efficiency of the digital human and the user experience.
An embodiment of the present application further provides a digital human control device, as shown in Figure 4. The device includes: an acquisition module 401, configured to obtain target text to be broadcast and convert the target text into at least one clause according to preset sentence-splitting rules; a first determination module 402, configured to input the clause into a preset emotion classification model and determine the emotion label corresponding to the clause; a second determination module 403, configured to determine a target body animation and a target expression animation from a preset animation library according to the emotion label; a generation module 404, configured to input the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation; and a control module 405, configured to, if there are multiple clauses, control a preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it.
In a specific application scenario, the second determination module 403 is specifically configured to: determine, from the preset animation library, an animation combination corresponding to the emotion label, the animation combination including a preset body animation and a preset expression animation; if there are multiple such animation combinations, randomly select from them a target animation combination consistent with the gender of the preset digital human and determine the target body animation and the target expression animation according to the target animation combination; and, if there is only one such animation combination, determine the target body animation and the target expression animation according to that animation combination.
In a specific application scenario, the device further includes a clothing module, configured to: determine, according to the current temperature interval corresponding to the current clothing of the preset digital human, whether the current clothing matches the current air temperature; if not, determine, according to the preset temperature intervals corresponding to the currently optional outfits, whether any currently optional outfit matches the current air temperature; and, if matching outfits exist and there are several of them, determine the best outfit from the matching outfits according to their matching temperature intervals and replace the current clothing with the best outfit.
In a specific application scenario, the device further includes a complaint module, configured to: if no matching outfit exists, determine a first difference and a second difference obtained by subtracting the current air temperature from the maximum value and the minimum value of the current temperature interval, respectively; take whichever of the first difference and the second difference has the smaller absolute value as the target difference; and compare the target difference with multiple preset complaint intervals and, if there is a target complaint interval corresponding to the target difference, control the preset digital human to execute a complaint animation corresponding to the target complaint interval.
In a specific application scenario, the device further includes a prompt module, configured to display prompt information for increasing the number of outfits for the preset digital human.
In a specific application scenario, the clothing module is specifically configured to: determine the α value corresponding to the matching temperature interval according to Formula 1; determine the minimum α value among the α values and take the matching outfit corresponding to the minimum α value as the best outfit, or determine, from the α values, multiple target α values that fall within a preset value range and randomly select one of the matching outfits corresponding to the target α values as the best outfit; where L1 is the difference obtained by subtracting the minimum value of the matching temperature interval from the current air temperature, and L2 is the difference obtained by subtracting the current air temperature from the maximum value of the matching temperature interval.
In a specific application scenario, the device further includes a saving module, configured to save the preset digital human as a static file in a preset file format.
By applying the above technical solution, the digital human control device includes: an acquisition module, configured to obtain target text to be broadcast and convert the target text into at least one clause according to preset sentence-splitting rules; a first determination module, configured to input the clause into a preset emotion classification model and determine the emotion label corresponding to the clause; a second determination module, configured to determine a target body animation and a target expression animation from a preset animation library according to the emotion label; a generation module, configured to input the clause, the emotion label, and the target timbre corresponding to the clause into a preset speech and mouth-shape generation model to obtain target audio and a mouth-shape animation; and a control module, configured to, if there are multiple clauses, control a preset digital human to execute the target audio, the mouth-shape animation, the target body animation, and the target expression animation in the order of the clauses, so that the preset digital human performs the target text as it broadcasts it. The digital human thus broadcasts the text with the emotion corresponding to the text, which improves the interaction efficiency of the digital human and the user experience.
An embodiment of the present invention further provides an electronic device, as shown in Figure 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with one another through the communication bus 504;
the memory 503 is configured to store executable instructions for the processor;
the processor 501 is configured to perform the following by executing the executable instructions:
obtaining the target text to be broadcast, and converting the target text into at least one clause according to preset sentence splitting rules; inputting the clause into a preset emotion classification model, and determining the emotion label corresponding to the clause; determining a target body animation and a target expression animation from a preset animation library according to the emotion label; inputting the clause, the emotion label and the target timbre corresponding to the clause into a preset speech and mouth shape generation model to obtain target audio and a mouth shape animation; and, when there are multiple clauses, controlling the preset digital person to perform the target audio, the mouth shape animation, the target body animation and the target expression animation in the order of the clauses, so that the preset digital person performs and broadcasts the target text.
The above communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is drawn in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above terminal and other devices.
The memory may include RAM (Random Access Memory) and may also include non-volatile memory, for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided; the computer-readable storage medium stores a computer program which, when executed by a processor, implements the digital person control method described above.
In yet another embodiment of the present invention, a computer program product containing instructions is further provided which, when run on a computer, causes the computer to execute the digital person control method described above.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state drive).
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes that element.
Each embodiment in this specification is described in a related manner; for the same or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310572396.6A CN116910198A (en) | 2023-05-19 | 2023-05-19 | Digital person control method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310572396.6A CN116910198A (en) | 2023-05-19 | 2023-05-19 | Digital person control method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116910198A true CN116910198A (en) | 2023-10-20 |
Family
ID=88363554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310572396.6A Pending CN116910198A (en) | 2023-05-19 | 2023-05-19 | Digital person control method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116910198A (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080201638A1 (en) * | 2007-02-15 | 2008-08-21 | Yahoo! Inc. | Context avatar |
US20080297515A1 (en) * | 2007-05-30 | 2008-12-04 | Motorola, Inc. | Method and apparatus for determining the appearance of a character display by an electronic device |
CN112434139A (en) * | 2020-10-23 | 2021-03-02 | 北京百度网讯科技有限公司 | Information interaction method and device, electronic equipment and storage medium |
CN113538641A (en) * | 2021-07-14 | 2021-10-22 | 北京沃东天骏信息技术有限公司 | Animation generation method and device, storage medium, electronic device |
CN114513678A (en) * | 2020-11-16 | 2022-05-17 | 阿里巴巴集团控股有限公司 | Face information generation method and device |
WO2022121592A1 (en) * | 2020-12-11 | 2022-06-16 | 北京字跳网络技术有限公司 | Livestreaming interaction method and apparatus |
CN115147521A (en) * | 2022-06-17 | 2022-10-04 | 北京中科视维文化科技有限公司 | Method for generating character expression animation based on artificial intelligence semantic analysis |
CN115285046A (en) * | 2022-08-15 | 2022-11-04 | 东风汽车集团股份有限公司 | A method and system for reminding driver to operate based on weather |
CN115602157A (en) * | 2022-09-30 | 2023-01-13 | 网易(杭州)网络有限公司(Cn) | Prediction model training method, mouth shape animation generation device, mouth shape animation generation equipment and mouth shape animation generation medium |
CN115920402A (en) * | 2023-01-04 | 2023-04-07 | 赤子城网络技术(北京)有限公司 | Action control method and device for virtual character, electronic equipment and storage medium |
CN116048258A (en) * | 2022-12-30 | 2023-05-02 | 北京字跳网络技术有限公司 | Method, apparatus, device and storage medium for virtual object control |
- 2023-05-19: CN application CN202310572396.6A filed; published as CN116910198A (en), status: Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117292030A (en) * | 2023-10-27 | 2023-12-26 | 海看网络科技(山东)股份有限公司 | Method and system for generating three-dimensional digital human animation |
CN117292030B (en) * | 2023-10-27 | 2024-10-29 | 海看网络科技(山东)股份有限公司 | Method and system for generating three-dimensional digital human animation |
CN117523043A (en) * | 2023-11-08 | 2024-02-06 | 小哆智能科技(北京)有限公司 | Method for driving 3D mouth shape through audio frequency based on transducer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020147428A1 (en) | Interactive content generation method and apparatus, computer device, and storage medium | |
CN110647636A (en) | Interaction method, interaction device, terminal equipment and storage medium | |
CN108804698A (en) | Man-machine interaction method, system, medium based on personage IP and equipment | |
CN116910198A (en) | Digital person control method and device, electronic equipment and storage medium | |
US20200193962A1 (en) | Voice synthesis method, device and apparatus, as well as non-volatile storage medium | |
CN114495927A (en) | Multimodal interactive virtual digital human generation method and device, storage medium and terminal | |
CN114187394B (en) | Avatar generation method, apparatus, electronic device, and storage medium | |
JP7488871B2 (en) | Dialogue recommendation method, device, electronic device, storage medium, and computer program | |
CN110795913A (en) | Text encoding method and device, storage medium and terminal | |
CN109086860B (en) | Interaction method and system based on virtual human | |
CN107368572A (en) | Multifunctional intellectual man-machine interaction method and system | |
CN114880441A (en) | Visual content generation method, apparatus, system, device and medium | |
US20190371319A1 (en) | Method for human-machine interaction, electronic device, and computer-readable storage medium | |
CN110851650B (en) | Comment output method and device and computer storage medium | |
CN118916461A (en) | Digital person generating method, device and storage medium | |
CN115440223A (en) | Intelligent interaction method, device, robot and computer readable storage medium | |
WO2022141142A1 (en) | Method and system for determining target audio and video | |
CN108874789B (en) | Statement generation method, device, storage medium and electronic device | |
US20250181622A1 (en) | Method and system for dialogue data generation and processing | |
CN114925206B (en) | Artificial intelligent agent, voice information recognition method, storage medium, and program product | |
CN119127648A (en) | Intelligent agent evaluation method, device, electronic device and storage medium | |
CN118798134A (en) | Content generation method, device, electronic device and readable storage medium | |
KR20220069403A (en) | Method and apparatus for sentiment analysis service including highlighting function | |
CN117349417A (en) | Information query method, device, electronic equipment and storage medium | |
CN119031201A (en) | Video generation method, device, equipment, storage medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||