CN114328815A

CN114328815A - Text mapping model processing method and device, computer equipment and storage medium

Info

Publication number: CN114328815A
Application number: CN202111376101.5A
Authority: CN
Inventors: 周辉阳; 闫昭
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-04-12
Anticipated expiration: 2041-11-19
Also published as: CN114328815B

Abstract

The embodiments of the present application disclose a method and apparatus for processing a text mapping model, a computer device, and a storage medium, and belong to the field of computer technologies. The method comprises: acquiring sample text information and first label information, and mapping the sample text information based on a text mapping model to obtain predicted text information; determining second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information; and training the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information. In the embodiments of the present application, whichever of the predicted text information and the first label information is most similar to the sample text information is used as the second label information, and the text mapping model is trained based on the predicted text information, the second label information, and the first similarity between the predicted text information and the sample text information, which improves the mapping effect of the text mapping model.

Description

Method and apparatus for processing a text mapping model, computer device, and storage medium

Technical Field

The embodiments of the present application relate to the field of computer technologies, and in particular, to a method and apparatus for processing a text mapping model, a computer device, and a storage medium.

Background

With the development of computer technology, artificial intelligence models are widely used in a variety of scenarios. In natural language processing scenarios, text information is typically mapped based on a text mapping model to obtain text information similar to it, so as to expand the text information. However, the mapping effect of current text mapping models is still poor.

Summary of the Invention

Embodiments of the present application provide a method and apparatus for processing a text mapping model, a computer device, and a storage medium, which can improve the mapping effect of the text mapping model. The technical solutions are as follows:

In one aspect, a method for processing a text mapping model is provided, the method comprising:

acquiring sample text information and first label information, where the first label information is text information whose similarity with the sample text information is not less than a similarity threshold;

mapping the sample text information based on the text mapping model to obtain predicted text information;

determining second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, where the second label information is whichever of the predicted text information and the first label information has the greater similarity with the sample text information; and

training the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, where the text mapping model is used to map out similar text information of any text information.

In another aspect, an apparatus for processing a text mapping model is provided, the apparatus comprising:

an acquisition module, configured to acquire sample text information and first label information, where the first label information is text information whose similarity with the sample text information is not less than a similarity threshold;

a mapping module, configured to map the sample text information based on the text mapping model to obtain predicted text information;

a determining module, configured to determine second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, where the second label information is whichever of the predicted text information and the first label information has the greater similarity with the sample text information; and

a training module, configured to train the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, where the text mapping model is used to map out similar text information of any text information.

In a possible implementation, the apparatus further includes:

the acquisition module, further configured to acquire a third similarity and a fourth similarity between the predicted text information and the sample text information, where the third similarity indicates how the words contained in the predicted text information differ from those contained in the sample text information, and the fourth similarity indicates the semantic similarity between the predicted text information and the sample text information; and

a fusion module, configured to perform weighted fusion on the third similarity and the fourth similarity to obtain the first similarity.
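As an illustration only, the weighted fusion performed by the fusion module can be sketched in Python as follows. The function name `fuse_similarity`, the parameter `word_weight`, and the choice of two weights that sum to 1 are assumptions made for this sketch; this passage does not specify how the weights are chosen.

```python
def fuse_similarity(third_similarity: float,
                    fourth_similarity: float,
                    word_weight: float = 0.5) -> float:
    """Fuse the word-level (third) and semantic (fourth) similarities.

    Assumption: the two weights are non-negative and sum to 1.
    """
    if not 0.0 <= word_weight <= 1.0:
        raise ValueError("word_weight must lie in [0, 1]")
    semantic_weight = 1.0 - word_weight
    # Weighted sum of the two scores gives the first similarity.
    return word_weight * third_similarity + semantic_weight * fourth_similarity
```

With `word_weight=0.25`, a third similarity of 0.4 and a fourth similarity of 0.8 fuse to approximately 0.7.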

In another possible implementation, the acquisition module is configured to: divide the predicted text information based on each of at least one character count to obtain at least one first word set, where words belonging to the same first word set contain the same number of characters; divide the sample text information based on each of the at least one character count to obtain at least one second word set, where words belonging to the same second word set contain the same number of characters; determine a first number and a second number, where the first number indicates the sum, over the character counts, of the number of words that differ between the first word set and the second word set corresponding to each character count, and the second number indicates the total number of words in the at least one first word set and the at least one second word set; and determine the ratio of the first number to the second number as the third similarity between the predicted text information and the sample text information.
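The computation of the third similarity can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: "words" of a given character count are taken to be all runs of that many consecutive characters (a sliding window), each word set discards duplicates, and "words that differ" is read as the symmetric difference of the two sets. The names `char_ngrams`, `third_similarity`, and `sizes` are hypothetical.

```python
def char_ngrams(text: str, n: int) -> set:
    """Return the set of all runs of n consecutive characters in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def third_similarity(predicted: str, sample: str, sizes=(1, 2)) -> float:
    """Ratio of differing words to all words, summed over the character counts."""
    differing = 0  # the "first number"
    total = 0      # the "second number"
    for n in sizes:
        first_set = char_ngrams(predicted, n)   # first word set for this count
        second_set = char_ngrams(sample, n)     # second word set for this count
        # Words that appear in exactly one of the two sets (assumed reading).
        differing += len(first_set ^ second_set)
        total += len(first_set) + len(second_set)
    return differing / total if total else 0.0
```

Under this reading, identical texts score 0 and texts with no shared words score 1, so the value grows with the word-level difference between the two texts.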

In another possible implementation, the acquisition module is configured to: perform semantic extraction on the predicted text information and the sample text information respectively to obtain a first semantic feature of the predicted text information and a second semantic feature of the sample text information; and determine the similarity between the first semantic feature and the second semantic feature as the fourth similarity.
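As a hedged illustration, once the two semantic features are available as numeric vectors, their similarity could be computed as shown below. Cosine similarity is an assumption here, since this passage only requires "the similarity" between the two features, and the semantic extraction itself (for example, a sentence encoder) is outside the sketch. The function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(first_feature, second_feature) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(first_feature, second_feature))
    first_norm = math.sqrt(sum(a * a for a in first_feature))
    second_norm = math.sqrt(sum(b * b for b in second_feature))
    if first_norm == 0.0 or second_norm == 0.0:
        return 0.0  # convention chosen for this sketch: a zero vector matches nothing
    return dot / (first_norm * second_norm)
```

Parallel feature vectors score approximately 1 and orthogonal ones score 0.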

In another possible implementation, the acquisition module is configured to: concatenate the predicted text information and the sample text information to obtain concatenated text information; perform semantic extraction on the concatenated text information to obtain a third semantic feature corresponding to the concatenated text information; classify the third semantic feature to obtain a classification result; and determine the classification result as the fourth similarity.

In another possible implementation, the training module is configured to: obtain a first loss value corresponding to the predicted text information based on the second label information and the predicted text information; and train the text mapping model based on the first similarity and the first loss value corresponding to the predicted text information.

In another possible implementation, the training module is configured to: obtain a weight parameter corresponding to each piece of predicted text information based on the difference between a target similarity and the first similarity corresponding to that piece of predicted text information; perform a weighted average on the first loss values corresponding to a plurality of pieces of predicted text information, based on the weight parameter corresponding to each piece of predicted text information, to obtain a second loss value; determine the average of the first loss values corresponding to the plurality of pieces of predicted text information as a third loss value; and train the text mapping model based on the second loss value and the third loss value.
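The second and third loss values can be sketched as follows. Several details are assumptions made for illustration and are not fixed by this passage: each weight parameter is taken to be the absolute difference between the target similarity and the first similarity, the weighted average is normalized by the sum of the weights, and the target similarity defaults to 1. The name `second_and_third_loss` is hypothetical.

```python
def second_and_third_loss(first_losses, first_similarities, target_similarity=1.0):
    """Return (second_loss, third_loss) for a batch of predictions."""
    if not first_losses or len(first_losses) != len(first_similarities):
        raise ValueError("need one first similarity per first loss value")
    # Weight parameter per prediction: distance from the target similarity (assumed form).
    weights = [abs(target_similarity - s) for s in first_similarities]
    # Third loss value: plain average of the first loss values.
    third_loss = sum(first_losses) / len(first_losses)
    weight_sum = sum(weights)
    if weight_sum == 0.0:
        # Every prediction already hits the target; fall back to the plain average.
        return third_loss, third_loss
    # Second loss value: weighted average of the first loss values.
    second_loss = sum(w * l for w, l in zip(weights, first_losses)) / weight_sum
    return second_loss, third_loss
```

How the two values are then combined for the parameter update (for example, by summing them) is likewise left open in this passage.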

In another possible implementation, the acquisition module includes:

an acquisition unit, configured to acquire a first text information set, where the first text information set includes a plurality of pieces of first text information and label information corresponding to each piece of first text information;

a determining unit, configured to determine the similarity between each piece of first text information and the corresponding label information; and

a screening unit, configured to screen out, from the first text information set and based on the similarity corresponding to each piece of first text information, the sample text information and the first label information whose similarity is greater than the similarity threshold.

In another possible implementation, the screening unit is configured to: screen out, from the first text information set and based on the similarity corresponding to each piece of first text information, at least one piece of second text information whose similarity is greater than the similarity threshold, and form a second text information set from the screened-out second text information and the corresponding label information; and acquire the sample text information and the first label information from the second text information set, where the sample text information is any piece of the second text information.
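The screening performed by the screening unit can be illustrated with the short sketch below. The similarity function is passed in as a parameter because this passage does not fix how the similarity between a text and its label is computed; `char_jaccard` is only a toy stand-in, and all names are hypothetical.

```python
def char_jaccard(text: str, label: str) -> float:
    """Toy stand-in similarity: overlap of the character sets of two texts."""
    first, second = set(text), set(label)
    union = first | second
    return len(first & second) / len(union) if union else 0.0


def screen_pairs(first_text_set, similarity_fn, threshold):
    """Keep only the (text, label) pairs whose similarity exceeds the threshold."""
    return [
        (text, label)
        for text, label in first_text_set
        if similarity_fn(text, label) > threshold
    ]
```

For example, `screen_pairs([("abcd", "abce"), ("abcd", "wxyz")], char_jaccard, 0.5)` keeps only the first pair, whose toy similarity is 0.6. Any pair in the returned list can then serve as the sample text information and its first label information.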

In another possible implementation, the acquisition module is further configured to train the text mapping model based on the first text information in the first text information set and the corresponding label information.

In another possible implementation, the determining module is configured to: in response to the largest first similarity among a plurality of first similarities being greater than the second similarity, determine the predicted text information corresponding to the largest first similarity as the second label information, where the plurality of first similarities are the similarities between a plurality of pieces of predicted text information and the sample text information; or, in response to none of the plurality of first similarities being greater than the second similarity, determine the first label information as the second label information.
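The selection rule of the determining module can be sketched as follows for the case of several predictions. The function and parameter names are hypothetical; the comparison itself follows the rule stated above.

```python
def select_second_label(predictions, first_similarities, first_label, second_similarity):
    """Pick the second label from several predictions and the first label."""
    if not predictions or len(predictions) != len(first_similarities):
        raise ValueError("need one first similarity per prediction")
    # Index of the prediction most similar to the sample text.
    best_index = max(range(len(predictions)), key=lambda i: first_similarities[i])
    if first_similarities[best_index] > second_similarity:
        return predictions[best_index]
    # No prediction beats the first label, so the first label is kept.
    return first_label
```

A tie between the best prediction and the first label keeps the first label, since the prediction is chosen only when its similarity is strictly greater.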

In another possible implementation, the mapping module is further configured to map target text information based on the text mapping model to obtain similar text information of the target text information.

In another aspect, a computer device is provided. The computer device includes a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the method for processing a text mapping model according to the above aspect.

In another aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores at least one computer program, and the at least one computer program is loaded and executed by a processor to implement the operations performed in the method for processing a text mapping model according to the above aspect.

In yet another aspect, a computer program product is provided, including a computer program that, when executed by a processor, implements the operations performed in the method for processing a text mapping model according to the above aspect.

The beneficial effects brought by the technical solutions provided in the embodiments of the present application include at least the following:

According to the method, apparatus, computer device, and storage medium provided in the embodiments of the present application, based on the respective similarities of the predicted text information and the first label information with the sample text information, whichever of the two is most similar to the sample text information is used as the second label information. Training takes into account not only the difference between the predicted text information and the second label information but also the first similarity between the predicted text information and the sample text information: the text mapping model is trained based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information. This ensures that the trained text mapping model can subsequently map out similar text information of any text information, which improves the mapping effect of the text mapping model.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of a method for processing a text mapping model provided by an embodiment of the present application;

FIG. 3 is a flowchart of another method for processing a text mapping model provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a text mapping model provided by an embodiment of the present application;

FIG. 5 is a flowchart of yet another method for processing a text mapping model provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of training a text mapping model provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a data comparison provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of an editing interface of a question-and-answer knowledge base provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an apparatus for processing a text mapping model provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of another apparatus for processing a text mapping model provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a terminal provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.

The terms "first", "second", "third", "fourth", and the like used in this application may be used herein to describe various concepts, but unless otherwise specified, these concepts are not limited by these terms. The terms are only used to distinguish one concept from another. For example, without departing from the scope of this application, a first similarity could be referred to as a second similarity, and similarly, a second similarity could be referred to as a first similarity.

As used in this application, for the terms "at least one", "a plurality of", "each", and "any": "at least one" includes one, two, or more than two; "a plurality of" includes two or more than two; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality. For example, if a plurality of pieces of predicted text information includes three pieces of predicted text information, "each" refers to every one of the three pieces, and "any" refers to any one of the three, which can be the first, the second, or the third piece of predicted text information.

Artificial Intelligence (AI) is the theory, methods, technology, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing technologies usually include text processing, semantic understanding, machine translation, question-answering robots, and knowledge graphs.

Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

The solution provided in the embodiments of the present application is based on the machine learning technology of artificial intelligence. It can train a text mapping model, and the trained text mapping model can be used to map out similar text information of any text information, thereby implementing the method for processing a text mapping model.

The method for processing a text mapping model provided in the embodiments of the present application is executed by a computer device. Optionally, the computer device is a terminal or a server. Optionally, the server is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. Optionally, the terminal is a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, or the like, but is not limited thereto.

In some embodiments, the computer program involved in the embodiments of the present application may be deployed and executed on one computer device, on multiple computer devices located at one site, or on multiple computer devices that are distributed across multiple sites and interconnected through a communication network. Multiple computer devices distributed across multiple sites and interconnected through a communication network can form a blockchain system.

In some embodiments, the computer device is provided as a server. FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to FIG. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 are connected through a wireless or wired network.

The server 102 is configured to train the text mapping model and to store the text mapping model after training is completed. The terminal 101 sends target text information to the server 102 over the network connection with the server. The server 102 receives the target text information sent by the terminal 101, maps out similar text information of the target text information based on the text mapping model, and sends the similar text information to the terminal 101, and the terminal 101 receives the similar text information.

In a possible implementation, a target application served by the server 102 is installed on the terminal 101, and the terminal 101 can implement functions such as text mapping through the target application. Optionally, the target application is a target application in the operating system of the terminal 101 or a target application provided by a third party. For example, the target application is a text mapping application that has a text mapping function. The text mapping application can also have other functions, for example, a review function, a question-and-answer function, a navigation function, and a game function.

The terminal 101 is configured to log in to the target application based on a user identifier and to send a text expansion request carrying target text information to the server 102 through the target application. The server 102 is configured to receive the text expansion request sent by the terminal 101, map out similar text information of the target text information based on the text mapping model, and send the similar text information to the terminal 101, and the terminal 101 receives the similar text information.

FIG. 2 is a flowchart of a method for processing a text mapping model provided by an embodiment of the present application. The method is executed by a computer device, and the computer device is a terminal or a server. As shown in FIG. 2, the method includes:

201. The computer device acquires sample text information and first label information, where the first label information is text information whose similarity with the sample text information is not less than a similarity threshold.

The sample text information is text information of any type. For example, the sample text information includes a question sentence, an answer sentence, or a disease description sentence. The similarity threshold is an arbitrary value, for example, 0.8 or 3. A similarity between the first label information and the sample text information that is not less than the similarity threshold indicates that the first label information and the sample text information have the same or a similar meaning. For example, the sample text information is "a simple little question", and the first label information is "a super simple question".

202. The computer device maps the sample text information based on the text mapping model to obtain predicted text information.

The text mapping model is used to map out similar text information of any text information. Because the text mapping model is still to be trained, the current model may be inaccurate, and the predicted text information it maps out may or may not be similar to the sample text information.

203. The computer device determines second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, where the second label information is whichever of the predicted text information and the first label information has the greater similarity to the sample text information.

The first similarity represents how similar the predicted text information is to the sample text information, and the second similarity represents how similar the first label information is to the sample text information. In this embodiment, after the predicted text information is obtained based on the text mapping model, the text information most similar to the sample text information is selected from the predicted text information and the first label information, according to their respective similarities to the sample text information, and is used as the second label information. That is, the second label information is whichever of the two is most similar to the sample text information, which safeguards the subsequent training of the text mapping model. For example, if the first similarity is greater than the second similarity, the second label information is the predicted text information; if the first similarity is not greater than the second similarity, the second label information is the first label information.

204. The computer device trains the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, where the text mapping model is used to map any text information to similar text information.

In the method provided by this embodiment, the text information most similar to the sample text information, chosen from the predicted text information and the first label information according to their respective similarities to the sample text information, is used as the second label information. Training takes into account not only the difference between the predicted text information and the second label information but also the first similarity between the predicted text information and the sample text information: the text mapping model is trained based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information. This ensures that the trained text mapping model can map any text information to similar text information, thereby improving its mapping effect.

On the basis of the embodiment shown in FIG. 2 above, the first similarity and a first loss value corresponding to the predicted text information can also be obtained, and the text mapping model is trained based on them. The training process is detailed in the following embodiment.

301. The computer device acquires sample text information and first label information, where the first label information is text information whose similarity to the sample text information is not less than a similarity threshold.

302. The computer device maps the sample text information based on the text mapping model to obtain predicted text information.

The predicted text information is text information, mapped out by the text mapping model, that may be similar to the sample text information.

In a possible implementation, step 302 includes: the computer device maps the sample text information based on the text mapping model to obtain at least one piece of predicted text information.

In this embodiment, the text mapping model is used to map any text information to at least one piece of similar text information.

In a possible implementation, step 302 includes: based on the text mapping model, the computer device encodes the sample text information to obtain an encoded feature and decodes the encoded feature to obtain the first word; fuses the word feature of the first word with the encoded feature to obtain the first fusion feature and decodes it to obtain the second word; fuses the first fusion feature with the word feature of the second word to obtain the second fusion feature and decodes it to obtain the third word; and so on, until the n-th word is obtained. The obtained words constitute the predicted text information.

In this embodiment, while the text mapping model generates the predicted text information corresponding to the sample text information, it outputs the words of the predicted text information one at a time, and each word depends on the words output before it. This keeps the predicted text information as similar as possible to the sample text information and thereby safeguards the mapping effect of the text mapping model.

As shown in FIG. 4, the text mapping model includes an encoding layer and a decoding layer. The encoding layer encodes the sample text information to obtain the encoded feature. The decoding layer decodes the encoded feature to obtain the first word; fuses the word feature of the first word with the encoded feature to obtain the first fusion feature and decodes it to obtain the second word; fuses the first fusion feature with the word feature of the second word to obtain the second fusion feature and decodes it to obtain the third word; and so on, until the n-th word is obtained. The obtained words constitute the predicted text information.
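The word-by-word generation described above can be sketched as a simple loop. This is only an illustration: `encode`, `decode`, `embed`, and `fuse` are hypothetical callables standing in for the encoding layer, the decoding layer, the word-feature lookup, and the fusion operation, none of which the text specifies in detail.

```python
from typing import Callable, List, Sequence

Feature = Sequence[float]


def map_text(
    sample_text: str,
    encode: Callable[[str], Feature],             # hypothetical encoding layer
    decode: Callable[[Feature], str],             # hypothetical decoding layer
    embed: Callable[[str], Feature],              # hypothetical word-feature lookup
    fuse: Callable[[Feature, Feature], Feature],  # hypothetical fusion operation
    n: int,
) -> List[str]:
    """Emit n words (n >= 1); each word is decoded from a feature that folds in all earlier words."""
    feature = encode(sample_text)   # encoded feature of the sample text
    words = [decode(feature)]       # the first word comes from the encoded feature alone
    while len(words) < n:
        # k-th fusion feature: previous feature fused with the k-th word's feature
        feature = fuse(feature, embed(words[-1]))
        words.append(decode(feature))   # (k+1)-th word
    return words
```

Because each fusion feature is built from the previous one, every decoded word depends on all words that precede it, which is the dependency the paragraph above describes.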

303. The computer device acquires a third similarity and a fourth similarity between the predicted text information and the sample text information.

The third similarity indicates how the words contained in the predicted text information differ from those contained in the sample text information, and represents how similar the two are grammatically. The fourth similarity indicates the semantic similarity between the predicted text information and the sample text information, and represents how similar the two are semantically.

In a possible implementation, multiple pieces of predicted text information corresponding to the sample text information are obtained based on the text mapping model, and step 303 includes: the computer device acquires the third similarity and the fourth similarity between each piece of predicted text information and the sample text information.

In a possible implementation, acquiring the third similarity between the predicted text information and the sample text information includes the following steps 3031-3034:

3031. Divide the predicted text information based on each of at least one character count to obtain at least one first word set, where the words belonging to the same first word set contain the same number of characters.

The character count is an arbitrary number, for example 1, 2, or 3. Dividing the predicted text information according to at least one character count yields at least one first word set. Each first word set corresponds to one character count, and every word in it contains exactly that number of characters. For example, if the character counts are 1, 2, and 3 and the predicted text information is "我肚子饿" ("I am hungry"), dividing it by the three character counts yields three first word sets. The first includes "我", "肚", "子", and "饿" (1 character each); the second includes "我肚", "肚子", and "子饿" (2 characters each); the third includes "我肚子" and "肚子饿" (3 characters each).

In a possible implementation, step 3031 includes: for each character count, dividing the predicted text information based on that character count and a target step size to obtain the first word set corresponding to that character count.

The target step size is the number of characters by which the division position moves each time a word is extracted. It is an arbitrary value, for example 1. For example, the predicted text information is divided based on N-Gram (a statistical language model) to obtain at least one first word set: starting from the first character of the predicted text information, a selection window whose length equals the character count is moved step by step according to the target step size; after each move, the characters inside the window are taken as one word, and the words obtained form the first word set.
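The sliding-window division of step 3031 can be sketched as follows. The function names and default arguments are illustrative choices, not taken from the text.

```python
from typing import Dict, List, Sequence


def split_into_words(text: str, num_chars: int, step: int = 1) -> List[str]:
    """Slide a window of num_chars characters over text, moving step characters each time."""
    return [text[i:i + num_chars] for i in range(0, len(text) - num_chars + 1, step)]


def word_sets(text: str, char_counts: Sequence[int] = (1, 2, 3), step: int = 1) -> Dict[int, List[str]]:
    """Build one word set per character count."""
    return {n: split_into_words(text, n, step) for n in char_counts}


# The example from the text: "我肚子饿" with character counts 1, 2, and 3.
print(word_sets("我肚子饿"))
```

With a step size of 1 this is ordinary character n-gram extraction; a text shorter than the character count yields an empty word set.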

3032. Divide the sample text information based on each of at least one character count to obtain at least one second word set, where the words belonging to the same second word set contain the same number of characters.

Each second word set corresponds to one character count, and every word in it contains exactly that number of characters. Step 3032 works in the same way as step 3031 above and is not described again here.

3033. Determine a first number and a second number, where the first number indicates the sum, over the character counts, of the number of different words in the first word set and the second word set corresponding to each character count, and the second number indicates the total number of words in the at least one first word set and the at least one second word set.

For the first word set and the second word set corresponding to any character count, the number of different words they contain is determined, and the sum of these numbers over the at least one character count is determined as the first number.

3034. Determine the ratio of the first number to the second number as the third similarity between the predicted text information and the sample text information.

The predicted text information is divided into at least one first word set, and the sample text information into at least one second word set, according to at least one character count. Because the words in the first word sets for different character counts contain different numbers of characters, the third similarity is determined from word sets at several levels of granularity. This yields the grammatical similarity between the predicted text information and the sample text information and thereby safeguards the accuracy of the third similarity.
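Steps 3031-3034 can be put together in a short sketch. One point is an assumption: "different words" is read here as words that occur in exactly one of the two word sets (the symmetric difference), which the text does not state explicitly. The helper names are illustrative.

```python
from typing import List, Sequence


def split_into_words(text: str, num_chars: int, step: int = 1) -> List[str]:
    """Character n-grams of length num_chars, taken every step characters."""
    return [text[i:i + num_chars] for i in range(0, len(text) - num_chars + 1, step)]


def third_similarity(predicted: str, sample: str, char_counts: Sequence[int] = (1, 2, 3)) -> float:
    first_number = 0   # summed count of different words, per character count
    second_number = 0  # total number of words in all first and second word sets
    for n in char_counts:
        first_set = split_into_words(predicted, n)
        second_set = split_into_words(sample, n)
        # ASSUMPTION: "different words" = words found in only one of the two sets.
        first_number += len(set(first_set) ^ set(second_set))
        second_number += len(first_set) + len(second_set)
    return first_number / second_number if second_number else 0.0
```

Under this reading, identical texts score 0 and the value grows as the word sets diverge, which would be consistent with the name Score(diversity) used for the third similarity later in the description.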

In a possible implementation, the fourth similarity between the predicted text information and the sample text information is acquired in either of the following two manners:

First manner: perform semantic extraction on the predicted text information and on the sample text information to obtain a first semantic feature of the predicted text information and a second semantic feature of the sample text information, and determine the similarity between the first semantic feature and the second semantic feature as the fourth similarity.

The first semantic feature represents the predicted text information, and the second semantic feature represents the sample text information. The two features can be expressed in any form; for example, both are expressed as feature vectors.

Optionally, when the first semantic feature and the second semantic feature are both expressed as feature vectors, determining the fourth similarity includes: determining the product of the first semantic feature vector and the second semantic feature vector as the fourth similarity.
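As a minimal sketch of the vector case, the "product" of the two feature vectors is taken below to be their inner product. That reading is an assumption; the text only says "product". The semantic extraction that produces the vectors is outside the sketch.

```python
from typing import Sequence


def fourth_similarity_from_features(first_feature: Sequence[float], second_feature: Sequence[float]) -> float:
    """Inner product of two semantic feature vectors of equal length."""
    if len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    # ASSUMPTION: "product" means the inner (dot) product.
    return sum(a * b for a, b in zip(first_feature, second_feature))
```

If the feature vectors are normalized to unit length beforehand, this inner product coincides with cosine similarity.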

Second manner: splice the predicted text information with the sample text information to obtain spliced text information; perform semantic extraction on the spliced text information to obtain a third semantic feature corresponding to it; classify the third semantic feature to obtain a classification result; and determine the classification result as the fourth similarity.

The third semantic feature represents the spliced text information and incorporates the features of both the predicted text information and the sample text information. That is, the third semantic feature can represent both pieces of text information.

Processing the predicted text information and the sample text information by splicing first and classifying afterwards captures changes in the core words between them. This makes it possible to determine semantically whether the two are the same, which safeguards the accuracy of the fourth similarity.

For example, the predicted text information is "标签为什么显示灰色" ("why is the label displayed in gray") and the sample text information is "标签为什么显示棕色" ("why is the label displayed in brown"). The core words "灰色" (gray) and "棕色" (brown) differ, so the fourth similarity obtained by splicing first and classifying afterwards is small, and the two pieces of text information can be determined to have different meanings.

Optionally, semantic extraction is performed on the spliced text information using self-attention to obtain the third semantic feature, the third semantic feature is normalized to obtain the classification result, and the classification result is determined as the fourth similarity.

During self-attention-based semantic extraction on the spliced text information, the predicted text information and the sample text information are cross-encoded, which ensures that the third semantic feature is a fusion of the features of both. The third semantic feature is then normalized to determine whether the core words of the predicted text information and the sample text information are the same. The classification result represents how similar the core words of the two are, and the fourth similarity is determined from it.
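The splice-then-classify manner can be sketched as below. Several things here are assumptions for illustration only: `extract` is a hypothetical callable standing in for the self-attention encoder plus classification head and is assumed to return two scores ordered as (different, same); the separator string is arbitrary; and softmax is used as the normalization.

```python
import math
from typing import Callable, Sequence


def fourth_similarity_by_splicing(
    predicted: str,
    sample: str,
    extract: Callable[[str], Sequence[float]],  # hypothetical encoder + classification head
    separator: str = "[SEP]",                   # illustrative separator, not from the text
) -> float:
    spliced = predicted + separator + sample    # spliced text information
    scores = extract(spliced)                   # assumed order: (different, same)
    # ASSUMPTION: the normalization is a softmax over the two scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    return exps[1] / sum(exps)                  # probability of "same" as the similarity
```

Because both texts pass through the extractor together, a single changed core word (such as "灰色" versus "棕色") can lower the score even when most characters match.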

It should be noted that this embodiment acquires the fourth similarity in either of the two manners above separately. In another embodiment, the two manners can be combined, and the similarities obtained by the two manners are weighted and fused to obtain the fourth similarity. In a possible implementation, the similarity obtained in the first manner is taken as a fifth similarity, the similarity obtained in the second manner is taken as a sixth similarity, and the fifth similarity and the sixth similarity are weighted and fused to obtain the fourth similarity.

304. The computer device performs weighted fusion on the third similarity and the fourth similarity to obtain the first similarity between the predicted text information and the sample text information.

The first similarity represents how similar the predicted text information is to the sample text information. The similarity between the predicted text information and the sample text information is considered in several ways to obtain the third similarity and the fourth similarity, and the two are then weighted and fused, which safeguards the accuracy of the resulting similarity. The present application provides a scoring mechanism that combines semantics and grammar; the similarity between the predicted text information and the sample text information is determined based on this scoring mechanism, which safeguards the accuracy of the first similarity.

In a possible implementation, the fourth similarity is obtained by weighting the fifth similarity and the sixth similarity, and step 304 includes: performing weighted fusion on the third similarity, the fifth similarity, and the sixth similarity to obtain the first similarity.

Optionally, the third similarity, the fifth similarity, the sixth similarity, and the first similarity satisfy the following relationship:

Score(all) = 2*Score(sbert) + 2*Score(common_bert) + 1*Score(diversity)

where Score(all) denotes the first similarity, Score(sbert) denotes the fifth similarity, Score(common_bert) denotes the sixth similarity, and Score(diversity) denotes the third similarity.
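The weighted fusion above is a direct computation; the sketch below simply encodes the stated weights 2, 2, and 1. The function and parameter names are illustrative.

```python
def first_similarity(score_sbert: float, score_common_bert: float, score_diversity: float) -> float:
    """Score(all) = 2*Score(sbert) + 2*Score(common_bert) + 1*Score(diversity)."""
    return 2 * score_sbert + 2 * score_common_bert + 1 * score_diversity


# Example with made-up component scores.
print(first_similarity(0.9, 0.8, 0.5))
```

If each component score lies in the range 0 to 1, which the text does not state and is assumed here, Score(all) is at most 5.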

It should be noted that this embodiment first acquires the third similarity and the fourth similarity between the predicted text information and the sample text information and then performs weighted fusion on them to obtain the first similarity between the predicted text information and the sample text information. In another embodiment, steps 303-304 need not be performed, and the first similarity between the predicted text information and the sample text information can be acquired in other ways.

305. The computer device determines second label information based on the first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, where the second label information is whichever of the predicted text information and the first label information has the greater similarity to the sample text information.

The first similarity represents how similar the predicted text information is to the sample text information, and the second similarity represents how similar the first label information is to the sample text information. By comparing the first similarity with the second similarity, the text information most similar to the sample text information is determined from the predicted text information and the first label information and is used as the second label information. The second similarity between the first label information and the sample text information is acquired in the same way as the first similarity between the predicted text information and the sample text information, and the details are not repeated here.

In a possible implementation, step 305 covers the following two cases:

First case: in response to the largest of multiple first similarities being greater than the second similarity, the predicted text information corresponding to the largest first similarity is determined as the second label information, where the multiple first similarities are the similarities between multiple pieces of predicted text information and the sample text information.

Second case: in response to none of the multiple first similarities being greater than the second similarity, the first label information is determined as the second label information.

In this embodiment, the sample text information is mapped based on the text mapping model to obtain multiple pieces of predicted text information. The first similarity between each piece of predicted text information and the sample text information and the second similarity between the first label information and the sample text information are determined, the multiple first similarities are compared with the second similarity to find the largest similarity, and the information corresponding to the largest similarity is determined as the second label information. For example, if the largest similarity is the second similarity, the first label information corresponding to it is determined as the second label information; if the largest similarity is one of the first similarities, the predicted text information corresponding to that first similarity is determined as the second label information.
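The two cases of step 305 can be sketched as a single selection function. The names are illustrative; the similarities are assumed to have been computed already.

```python
from typing import Sequence


def select_second_label(
    predicted_texts: Sequence[str],
    first_similarities: Sequence[float],  # one first similarity per predicted text
    first_label: str,
    second_similarity: float,
) -> str:
    """Return the candidate most similar to the sample text; ties go to the first label."""
    if not predicted_texts:
        return first_label
    best = max(range(len(predicted_texts)), key=lambda i: first_similarities[i])
    if first_similarities[best] > second_similarity:
        return predicted_texts[best]   # first case: a prediction beats the existing label
    return first_label                 # second case: no prediction is strictly better
```

The strict comparison mirrors the wording of the two cases: the first label information is kept unless some first similarity is greater than the second similarity.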

In a possible implementation, the method further includes: when the second label information is the predicted text information, updating the first label information corresponding to the sample text information in the second text information set to the second label information.

If the first similarity between the predicted text information and the sample text information is greater than the second similarity between the first label information and the sample text information, the predicted text information is more similar to the sample text information and is therefore better suited to serve as its label information. The first label information corresponding to the sample text information in the second text information set is therefore updated, so that the second text information and the corresponding label information in the set are more accurate and the set is of higher quality. When the text mapping model is subsequently trained based on the second text information and the corresponding label information in the set, the mapping effect of the text mapping model is safeguarded.

306. The computer device acquires a first loss value corresponding to the predicted text information based on the second label information and the predicted text information.

The difference between the predicted text information and the second label information reflects the accuracy of the text mapping model. The first loss value is therefore determined based on the predicted text information and the second label information, so that the text mapping model can subsequently be trained based on the first loss value.

In a possible implementation, step 306 includes: acquiring the first loss value corresponding to each piece of predicted text information based on the second label information and that piece of predicted text information.

In this embodiment, multiple pieces of predicted text information corresponding to the sample text information can be mapped out based on the text mapping model. The first loss value corresponding to each piece of predicted text information is determined based on that piece and the second label information, so that the text mapping model can subsequently be trained based on the multiple first loss values.

307. The computer device trains the text mapping model based on the first similarity and the first loss value corresponding to the predicted text information.

Both the first similarity and the first loss value corresponding to the predicted text information reflect the accuracy of the text mapping model. The text mapping model is trained based on them to improve its mapping effect.

In a possible implementation, step 307 includes the following steps 3071-3074:

3071. The computer device acquires a weight parameter corresponding to each piece of predicted text information based on the difference between a target similarity and the first similarity corresponding to that piece of predicted text information.

The target similarity is an arbitrary value that represents the maximum similarity expected between predicted text information and the sample text information. For example, the target similarity is 5, and the first similarity corresponding to each piece of predicted text information is not greater than 5. The weight parameter indicates whether each piece of predicted text information is accurate and also reflects the mapping effect of the text mapping model. The loss value can be scaled based on the weight parameter: the larger the weight parameter, the less accurate the predicted text information and the larger the subsequently calculated loss value; the smaller the weight parameter, the more accurate the predicted text information and the smaller the subsequently calculated loss value.

在一种可能实现方式中，目标相似度、任一预测文本信息对应的第一相似度和权重参数，满足以下关系：In a possible implementation manner, the target similarity, the first similarity corresponding to any predicted text information, and the weight parameter satisfy the following relationship:

Here, rewards denotes the weight parameter corresponding to any piece of predicted text information, μ denotes a hyperparameter that is a constant, score_max denotes the target similarity, and score_real denotes the first similarity corresponding to that predicted text information.
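As an illustration of step 3071, the sketch below computes a weight from the gap between the target similarity and the first similarity. The linear form `mu * (score_max - score_real)` is an assumption made only for this example; the text states that the weight depends on this difference and on the constant μ, but the exact relationship is not reproduced here. The function name `weight_parameter` and the sample scores are hypothetical.

```python
def weight_parameter(score_real: float, score_max: float = 5.0, mu: float = 1.0) -> float:
    """Hypothetical weight that grows as the prediction falls short of the target similarity."""
    if score_real > score_max:
        raise ValueError("score_real must not exceed score_max")
    # ASSUMED linear form: the source only says the weight is based on this
    # difference and a constant hyperparameter mu.
    return mu * (score_max - score_real)


# Three predicted texts with first similarities 4.5, 3.0 and 1.0 (target similarity 5).
rewards = [weight_parameter(score) for score in (4.5, 3.0, 1.0)]
```

Under this assumed form, a prediction close to the target similarity receives a small weight and an inaccurate prediction receives a large one, which matches the behavior described above.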

3072、计算机设备基于每个预测文本信息对应的权重参数，对多个预测文本信息对应的第一损失值进行加权平均，得到第二损失值。3072. Based on the weight parameter corresponding to each predicted text information, the computer device performs a weighted average of the first loss values corresponding to the multiple predicted text information to obtain a second loss value.

考虑到了每个预测文本信息的准确性，通过每个预测文本信息对应的权重参数，对多个预测文本信息对应的第一损失值进行加权平均，以保证得到的第二损失值准确。Taking into account the accuracy of each predicted text information, a weighted average of the first loss values corresponding to the multiple predicted text information is performed through the weight parameter corresponding to each predicted text information to ensure that the obtained second loss value is accurate.

在一种可能实现方式中，每个预测文本信息对应的权重参数、多个预测文本信息对应的第一损失值及第二损失值，满足以下关系：In a possible implementation manner, the weight parameter corresponding to each predicted text information, the first loss value and the second loss value corresponding to multiple predicted text information satisfy the following relationship:

$$\mathrm{loss}_{rl} = \mathrm{mean}\left(\mathrm{loss}_{mle\text{-}batch} \cdot \mathrm{rewards}\right)$$

Here, loss_rl denotes the second loss value, loss_mle-batch denotes the first loss values corresponding to the predicted text information, rewards denotes the weight parameters corresponding to the predicted text information, and mean(·) denotes the averaging function.
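A minimal sketch of step 3072 follows, assuming the per-prediction first loss values and weight parameters are already available as plain Python lists. It follows the relationship as written, that is, the mean of the element-wise products rather than a sum normalized by the weights. The function name `weighted_loss` and the sample numbers are hypothetical.

```python
def weighted_loss(first_losses: list[float], rewards: list[float]) -> float:
    """Second loss value: mean of each first loss value scaled by its weight parameter."""
    if not first_losses or len(first_losses) != len(rewards):
        raise ValueError("first_losses and rewards must be non-empty and equally long")
    products = [loss * reward for loss, reward in zip(first_losses, rewards)]
    return sum(products) / len(products)


# Illustrative values only: three predictions, the last one least accurate.
loss_rl = weighted_loss([0.2, 0.8, 1.5], [0.5, 2.0, 4.0])
```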

3073、计算机设备将多个预测文本信息对应的第一损失值的平均值，确定为第三损失值。3073. The computer device determines the average value of the first loss values corresponding to the plurality of predicted text information as the third loss value.

The first loss value corresponding to each piece of predicted text information reflects the accuracy of the text mapping model. The first loss values corresponding to the multiple pieces of predicted text information are considered together, and their average is taken as the third loss value. The text mapping model is subsequently trained with the third loss value to ensure the accuracy of training, thereby improving the mapping effect of the text mapping model.

在一种可能实现方式中，该第三损失值及多个预测文本信息对应的第一损失值，满足以下关系：In a possible implementation manner, the third loss value and the first loss values corresponding to multiple predicted text information satisfy the following relationship:

$$\mathrm{loss}_{mle} = \mathrm{mean}\left(\mathrm{loss}_{mle\text{-}batch}\right)$$

Here, loss_mle denotes the third loss value, loss_mle-batch denotes the first loss values corresponding to the multiple pieces of predicted text information, and mean(·) denotes the averaging function.

3074、计算机设备基于第二损失值及第三损失值，对文本映射模型进行训练。3074. The computer device trains the text mapping model based on the second loss value and the third loss value.

由于第二损失值及第三损失值均能反映出文本映射模型的准确性，基于第二损失值及第三损失值来对文本映射模型进行训练，以提升文本映射模型的映射效果。Since both the second loss value and the third loss value can reflect the accuracy of the text mapping model, the text mapping model is trained based on the second loss value and the third loss value, so as to improve the mapping effect of the text mapping model.

在一种可能实现方式中，该步骤3074包括：对第二损失值及第三损失值进行加权融合，得到第四损失值，基于第四损失值，对文本映射模型进行训练。In a possible implementation manner, step 3074 includes: performing weighted fusion on the second loss value and the third loss value to obtain a fourth loss value, and training the text mapping model based on the fourth loss value.

可选地，第二损失值、第三损失值及第四损失值，满足以下关系：Optionally, the second loss value, the third loss value and the fourth loss value satisfy the following relationship:

$$\mathrm{loss} = \lambda\,\mathrm{loss}_{rl} + (1-\lambda)\,\mathrm{loss}_{mle}$$

Here, loss denotes the fourth loss value, λ denotes a hyperparameter that is a constant, loss_rl denotes the second loss value, and loss_mle denotes the third loss value.
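The weighted fusion in step 3074 can be sketched as below, together with the plain average used for the third loss value. Restricting λ to the interval [0, 1] is an assumption of this example; the text only says λ is a constant hyperparameter. The names `mean_loss` and `fused_loss` and the sample numbers are hypothetical.

```python
def mean_loss(first_losses: list[float]) -> float:
    """Third loss value: plain average of the first loss values."""
    if not first_losses:
        raise ValueError("first_losses must be non-empty")
    return sum(first_losses) / len(first_losses)


def fused_loss(loss_rl: float, loss_mle: float, lam: float = 0.5) -> float:
    """Fourth loss value: lam * loss_rl + (1 - lam) * loss_mle."""
    if not 0.0 <= lam <= 1.0:  # assumed range, not stated in the source
        raise ValueError("lam is assumed to lie in [0, 1]")
    return lam * loss_rl + (1.0 - lam) * loss_mle


loss_mle = mean_loss([0.2, 0.8, 1.5])
loss = fused_loss(loss_rl=2.0, loss_mle=loss_mle, lam=0.3)
```

With λ close to 1 the similarity-weighted term dominates; with λ close to 0 training reduces to the unweighted average loss.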

Training the text mapping model with the second loss value and the third loss value takes into account the first similarity between the predicted text information and the sample text information. When the predicted text information is sufficiently similar to the sample text information, the loss value is reduced, which reduces the magnitude of the adjustment made to the text mapping model. When the predicted text information is not similar to the sample text information, the loss value is increased, which speeds up the convergence and optimization of the text mapping model. This ensures the mapping effect of the finally trained text mapping model.

It should be noted that, in this embodiment of the present application, the text mapping model is trained using the first loss value and the first similarity corresponding to the predicted text information. In another embodiment, steps 306-307 need not be performed, and the text mapping model can be trained in other ways based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information.

It should be noted that this embodiment of the present application is described with only one training iteration performed on the text mapping model based on the sample text information and the corresponding first label information. In another embodiment, the text mapping model can be trained for multiple iterations according to steps 301-307 above. In a possible implementation, while the text mapping model is iteratively trained according to steps 301-307, training is stopped in response to the number of iterations reaching a first value; or, training is stopped in response to the fourth loss value of the current iteration being smaller than a second value.

其中，第一数值用于表示迭代次数的最大值，第一数值及第二数值均为任意的数值，例如，第一数值为100，第二数值为0.3。The first value is used to represent the maximum number of iterations, and both the first value and the second value are arbitrary values, for example, the first value is 100, and the second value is 0.3.
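The two stopping conditions can be sketched as a small loop. The callable `step_fn` is a hypothetical stand-in for one pass through steps 301-307 that returns the fourth loss value of that iteration; the defaults 100 and 0.3 are the example values given above.

```python
from typing import Callable


def train_until_stopped(
    step_fn: Callable[[int], float],
    max_iterations: int = 100,    # the "first value"
    loss_threshold: float = 0.3,  # the "second value"
) -> tuple[int, float]:
    """Run training iterations until the loss drops below the threshold or the cap is hit."""
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = step_fn(iteration)  # hypothetical: one iteration of steps 301-307
        if loss < loss_threshold:
            return iteration, loss
    return max_iterations, loss


# Toy loss curve 1/i, used only to exercise the loop.
iterations, final_loss = train_until_stopped(lambda i: 1.0 / i)
```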

It should be noted that this embodiment of the present application is described with one training iteration performed on the text mapping model based on a single piece of sample text information and the corresponding first label information. In another embodiment, a single training iteration can be based on multiple pieces of sample text information and the first label information corresponding to each of them. For example, multiple pieces of sample text information and the first label information corresponding to each are obtained from the second text information set. Then, according to steps 301-306 above, the predicted text information corresponding to each piece of sample text information is obtained, together with the first similarity and the first loss value corresponding to each piece of predicted text information. The text mapping model is then trained according to step 307 above, based on the first similarities and first loss values corresponding to the multiple pieces of predicted text information.

在一种可能实现方式中，在步骤307之后，该方法还包括：基于文本映射模型，对目标文本信息进行映射，得到目标文本信息的相似文本信息。In a possible implementation manner, after step 307, the method further includes: mapping the target text information based on the text mapping model to obtain similar text information of the target text information.

其中，目标文本信息为任意的文本信息。在训练得到文本映射模型后，基于该文本映射模型能够映射出该目标文本信息的相似文本信息。The target text information is any text information. After training the text mapping model, similar text information of the target text information can be mapped based on the text mapping model.

可选地，基于该文本映射模型，对目标文本信息进行映射，得到目标文本信息的多个相似文本信息。Optionally, based on the text mapping model, the target text information is mapped to obtain a plurality of similar text information of the target text information.

并且，本申请提供了一种语义与语法相结合的打分机制，以此来确定出第一文本信息与对应的标签信息之间的相似度，从而保证确定出的相似度的准确性。Moreover, the present application provides a scoring mechanism combining semantics and grammar, so as to determine the similarity between the first text information and the corresponding label information, thereby ensuring the accuracy of the determined similarity.

Moreover, while the text mapping model is being trained, the label information is updated so that the updated label information is whichever of the predicted text information and the label text information is most similar to the sample text information, which ensures the accuracy of the label information. Training the text mapping model based on the updated label information, the predicted text information, and the first similarity corresponding to the predicted text information improves the mapping effect of the text mapping model and avoids training the model with fixed labels, so that the text mapping model generalizes better.

On the basis of the embodiment shown in FIG. 2, the text mapping model can also be trained in multiple stages. That is, the text mapping model first undergoes a first stage of training based on the first text information in the first text information set and the corresponding label information. The first text information set is then screened, and the text mapping model undergoes a second stage of training based on the second text information set obtained by the screening. The training process is detailed in the following embodiment.

FIG. 5 is a flowchart of a method for processing a text mapping model provided by an embodiment of the present application. The method is executed by a computer device and, as shown in FIG. 5, includes:

501、计算机设备获取第一文本信息集合。501. The computer device acquires a first text information set.

The first text information set includes multiple pieces of first text information and the label information corresponding to each piece of first text information. The first text information is text information of any type; for example, the first text information includes a query sentence, an answer sentence, or a disease description sentence. The label information corresponding to each piece of first text information is text information that may be similar to that first text information.

502、计算机设备基于第一文本信息集合中的第一文本信息及对应的标签信息，对文本映射模型进行训练。502. The computer device trains the text mapping model based on the first text information and the corresponding label information in the first text information set.

其中，该文本映射模型为任意的网络模型，例如，该文本映射模型为Seq2Seq(Sequence to Sequence，一种神经网络模型)，或者为Transformer(一种注意力网络模型)。The text mapping model is an arbitrary network model, for example, the text mapping model is Seq2Seq (Sequence to Sequence, a neural network model), or a Transformer (an attention network model).

Training the text mapping model with the first text information in the first text information set and the corresponding label information gives the trained text mapping model a preliminary text mapping capability, so that it can map out similar text information for any text information.

在一种可能实现方式中，该步骤502包括：基于该文本映射模型，对第一文本信息进行映射，得到第一文本信息对应的预测文本信息，基于该第一文本信息对应的预测文本信息和标签信息，对该文本映射模型进行训练。In a possible implementation manner, step 502 includes: mapping the first text information based on the text mapping model to obtain predicted text information corresponding to the first text information, and based on the predicted text information corresponding to the first text information and label information, and train the text mapping model.

The predicted text information corresponding to the first text information is mapped out based on the text mapping model. The difference between this predicted text information and the label information corresponding to the first text information reflects how good the mapping effect of the text mapping model is. The text mapping model is therefore trained based on the predicted text information and the label information corresponding to the first text information, so as to improve its mapping effect.

可选地，对文本映射模型进行训练的过程包括：基于第一文本信息对应的预测文本信息和标签信息，确定第五损失值，基于该第五损失值，对文本映射模型进行训练。Optionally, the process of training the text mapping model includes: determining a fifth loss value based on the predicted text information and label information corresponding to the first text information, and training the text mapping model based on the fifth loss value.

其中，第五损失值指示第一文本信息对应的预测文本信息与标签信息之间的差异程度，能够反映出文本映射模型的映射效果的好坏。例如，采用交叉熵损失函数，基于第一文本信息对应的预测文本信息和标签信息，确定第五损失值。通过第五损失值对文本映射模型进行训练，以提升文本映射模型的映射效果。The fifth loss value indicates the degree of difference between the predicted text information corresponding to the first text information and the label information, which can reflect the quality of the mapping effect of the text mapping model. For example, using a cross-entropy loss function, the fifth loss value is determined based on the predicted text information and label information corresponding to the first text information. The text mapping model is trained through the fifth loss value to improve the mapping effect of the text mapping model.

在一种可能实现方式中，该步骤502包括：基于多个第一文本信息及每个第一文本信息对应的标签信息，对文本映射模型进行迭代训练。In a possible implementation manner, the step 502 includes: performing iterative training on the text mapping model based on a plurality of first text information and label information corresponding to each first text information.

The text mapping model is trained over multiple iterations to improve its mapping effect as much as possible.

可选地，在对文本映射模型进行迭代训练的过程中，响应于迭代次数达到第三数值，停止对文本映射模型进行训练；或者，响应于当前迭代次数的第五损失值小于第四数值，停止对文本映射模型进行训练。Optionally, in the process of iteratively training the text mapping model, in response to the number of iterations reaching the third numerical value, the training of the text mapping model is stopped; or, in response to the fifth loss value of the current number of iterations being less than the fourth numerical value, Stop training the text mapping model.

其中，第三数值用于表示迭代次数的最大值，第三数值及第四数值均为任意的数值，例如，第三数值为100，第四数值为0.3。The third value is used to represent the maximum number of iterations, and both the third value and the fourth value are arbitrary values, for example, the third value is 100, and the fourth value is 0.3.

503、计算机设备确定每个第一文本信息与对应的标签信息之间的相似度。503. The computer device determines the similarity between each first text information and the corresponding tag information.

其中，每个第一文本信息与对应的标签信息之间的相似度，能够体现出第一文本信息与对应的标签信息之间的相似程度。The similarity between each first text information and the corresponding label information can reflect the similarity between the first text information and the corresponding label information.

在一种可能实现方式中，该步骤503包括：对于任一第一文本信息及对应的标签信息，获取该第一文本信息与该标签信息之间的第七相似度和第八相似度，对第七相似度及第八相似度进行加权融合，得到该第一文本信息与该标签信息之间的相似度。In a possible implementation manner, the step 503 includes: for any first text information and corresponding label information, obtaining the seventh similarity and the eighth similarity between the first text information and the label information, The seventh similarity and the eighth similarity are weighted and fused to obtain the similarity between the first text information and the label information.

Here, the seventh similarity indicates how the words contained in the first text information differ from those contained in the label information, and expresses how similar the two are grammatically. The eighth similarity indicates the semantic similarity between the first text information and the label information, and expresses how similar the two are semantically. The similarity between the first text information and the label information is thus considered in more than one way to obtain the seventh similarity and the eighth similarity, and the two are fused by weighting to ensure the accuracy of the resulting similarity between the first text information and the label information. The present application provides a scoring mechanism that combines semantics and grammar, and the similarity between the first text information and the corresponding label information is determined based on this scoring mechanism, which ensures the accuracy of the determined similarity.
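The weighted fusion of the two scores can be sketched as a convex combination. The weight `alpha` and its range [0, 1] are assumptions of this example; the text does not specify the fusion weights. The function name `fuse_similarity` is hypothetical.

```python
def fuse_similarity(grammar_score: float, semantic_score: float, alpha: float = 0.5) -> float:
    """Combine the seventh (grammar) and eighth (semantic) scores into one similarity."""
    if not 0.0 <= alpha <= 1.0:  # assumed range for the illustrative weight
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * grammar_score + (1.0 - alpha) * semantic_score


# Illustrative scores only.
similarity = fuse_similarity(grammar_score=0.4, semantic_score=0.9, alpha=0.25)
```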

可选地，获取该第一文本信息与该标签信息之间的第七相似度的过程包括：Optionally, the process of acquiring the seventh degree of similarity between the first text information and the label information includes:

5031. Divide the first text information according to each of at least one character count to obtain at least one third word set, where the words belonging to the same third word set contain the same number of characters.

5032. Divide the label information according to each of the at least one character count to obtain at least one fourth word set, where the words belonging to the same fourth word set contain the same number of characters.

5033. Determine a third number and a fourth number, where the third number indicates the sum, over the character counts, of the number of different words between the third word set and the fourth word set corresponding to each character count, and the fourth number indicates the total number of words in the at least one third word set and the at least one fourth word set.

5034、将第三数目与第四数目的比值，确定为该第一文本信息与该标签信息之间的第七相似度。5034. Determine the ratio of the third number to the fourth number as the seventh degree of similarity between the first text information and the tag information.

该步骤5031-5034与上述步骤3031-3034同理，在此不再赘述。The steps 5031-5034 are the same as the above-mentioned steps 3031-3034, and are not repeated here.
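Steps 5031-5034 can be sketched with character n-grams. Two readings in this example are assumptions: "different words" is taken to mean words that appear in only one of the two sets (the symmetric difference), and each word set is treated as a set of distinct n-grams. The function names and the character counts `(1, 2)` are hypothetical.

```python
def char_ngrams(text: str, n: int) -> set[str]:
    """Split text into the set of its n-character words (character n-grams)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def word_difference_ratio(text_a: str, text_b: str, sizes: tuple[int, ...] = (1, 2)) -> float:
    """Ratio of differing words to all words, summed over every character count."""
    differing = 0  # the "third number"
    total = 0      # the "fourth number"
    for n in sizes:
        words_a = char_ngrams(text_a, n)
        words_b = char_ngrams(text_b, n)
        differing += len(words_a ^ words_b)  # ASSUMED: words found in only one set
        total += len(words_a) + len(words_b)
    return differing / total if total else 0.0


score = word_difference_ratio("abc", "abd")
```

Under this reading, identical texts score 0 and texts that share no n-grams score 1.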

可选地，获取该第一文本信息与该标签信息之间的第八相似度的过程，包括以下两种方式：Optionally, the process of acquiring the eighth degree of similarity between the first text information and the label information includes the following two ways:

First way: perform semantic extraction on the first text information and on the label information, respectively, to obtain a fourth semantic feature of the first text information and a fifth semantic feature of the label information; determine the similarity between the fourth semantic feature and the fifth semantic feature as the eighth similarity.

获取第八相似度的第一种方式，与上述获取第四相似度的第一种方式同理，在此不再赘述。The first manner of acquiring the eighth degree of similarity is the same as the above-mentioned first manner of acquiring the fourth degree of similarity, and details are not described herein again.
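For the first way, the similarity between two semantic feature vectors could, for instance, be measured with cosine similarity. The choice of cosine similarity is an assumption of this sketch, since the text does not name the measure, and the semantic extraction model is outside its scope, so toy vectors stand in for the fourth and fifth semantic features.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for the fourth and fifth semantic features.
eighth_similarity = cosine_similarity([1.0, 2.0], [2.0, 4.0])
```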

Second way: splice the first text information with the label information to obtain first spliced text information; perform semantic extraction on the first spliced text information to obtain a sixth semantic feature corresponding to the first spliced text information; classify the sixth semantic feature to obtain a classification result; determine the classification result as the eighth similarity.

获取第八相似度的第二种方式，与上述获取第四相似度的第二种方式同理，在此不再赘述。The second manner of acquiring the eighth degree of similarity is the same as the second manner of acquiring the fourth degree of similarity described above, and details are not described herein again.

It should be noted that, in this embodiment of the present application, the eighth similarity is obtained by either of the above two ways separately. In another embodiment, the two ways can be combined, and the similarities obtained by the two ways are fused by weighting to obtain the eighth similarity. In a possible implementation, the similarity obtained in the first way is taken as a ninth similarity, the similarity obtained in the second way is taken as a tenth similarity, and the ninth similarity and the tenth similarity are fused by weighting to obtain the eighth similarity.

504、计算机设备基于每个第一文本信息对应的相似度，从第一文本信息集合中筛选出相似度大于相似度阈值的至少一个第二文本信息，将筛选出的第二文本信息及对应的标签信息构成第二文本信息集合。504. Based on the similarity corresponding to each first text information, the computer device filters out at least one second text information whose similarity is greater than the similarity threshold from the first text information set, and selects the filtered second text information and the corresponding The tag information constitutes a second set of text information.

其中，相似度阈值为任意的阈值，例如，相似度阈值为0.8，或者3等。The similarity threshold is an arbitrary threshold, for example, the similarity threshold is 0.8, or 3, and so on.

In this embodiment of the present application, the first text information set includes multiple pieces of first text information and the label information corresponding to each piece of first text information, and each piece of first text information and its corresponding label information form an information combination. The information combinations in the first text information set vary in quality, and poor combinations may exist, that is, combinations in which the first text information is not similar to the label information. Training the text mapping model on poor information combinations would result in a poor mapping effect. Therefore, the text information and the corresponding label information in the first text information set are screened to generate a second text information set, which ensures that the similarity between each piece of second text information in the second text information set and its corresponding label information is greater than the similarity threshold. In other words, the second text information set consists of high-quality information combinations, and subsequently training the text mapping model based on the second text information and the corresponding label information in the second text information set ensures the mapping effect of the text mapping model.
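The screening in step 504 amounts to keeping only the information combinations whose score exceeds the threshold. In the sketch below, `score_fn` is a hypothetical stand-in for the scoring mechanism of step 503, and the toy scores are invented for illustration.

```python
from typing import Callable


def filter_pairs(
    pairs: list[tuple[str, str]],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.8,
) -> list[tuple[str, str]]:
    """Keep (text, label) pairs whose similarity is strictly greater than the threshold."""
    return [(text, label) for text, label in pairs if score_fn(text, label) > threshold]


# Toy first text information set with invented similarity scores.
first_set = [("q1", "l1"), ("q2", "l2"), ("q3", "l3")]
toy_scores = {("q1", "l1"): 0.95, ("q2", "l2"): 0.40, ("q3", "l3"): 0.80}
second_set = filter_pairs(first_set, lambda text, label: toy_scores[(text, label)])
```

A pair scoring exactly at the threshold is dropped here, because the text requires the similarity to be greater than the threshold.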

需要说明的是，本申请实施例是先基于第一文本信息集合中的第一文本信息及对应的标签信息对文本映射模型进行训练，之后再基于第一文本信息集合筛选出第二文本信息集合的，而在另一实施例中，能够先执行上述步骤503-504，筛选出第二文本信息集合，之后，再执行上述步骤502，本申请对执行步骤的顺序不作限定。It should be noted that, in this embodiment of the present application, the text mapping model is trained based on the first text information in the first text information set and the corresponding label information, and then the second text information set is filtered based on the first text information set. However, in another embodiment, the above steps 503-504 can be performed first to filter out the second text information set, and then the above step 502 can be performed. The present application does not limit the order of the execution steps.

505、计算机设备从第二文本信息集合中获取样本文本信息及第一标签信息，样本文本信息为任一第二文本信息。505. The computer device acquires sample text information and first label information from the second text information set, where the sample text information is any second text information.

在本申请实施例中，第二文本信息集合包括至少一个第二文本信息及每个第二文本信息对应的标签信息，从该第二文本信息集合中获取任一第二文本信息作为样本文本信息，并获取该样本文本信息对应的第一标签信息。In this embodiment of the present application, the second text information set includes at least one second text information and label information corresponding to each second text information, and any second text information is obtained from the second text information set as sample text information , and obtain the first label information corresponding to the sample text information.

It should be noted that, in this embodiment of the present application, the first text information set is first screened to obtain the second text information set, and the sample text information and the first label information are then obtained from the second text information set. In another embodiment, steps 504-505 need not be performed, and other ways can be adopted to screen out, from the first text information set and based on the similarity corresponding to each piece of first text information, sample text information and first label information whose similarity is greater than the similarity threshold.

It should be noted that, in this embodiment of the present application, the text mapping model is first preliminarily trained based on the first text information set; the first text information in the first text information set is then screened to construct the second text information set, and the sample text information and the first label information are obtained from the second text information set. In another embodiment, the text mapping model need not first be trained based on the first text information set, and steps 501-504 need not be performed; other ways can be adopted to obtain the sample text information and the first label information, where the first label information is text information whose similarity with the sample text information is not less than the similarity threshold.

506、计算机设备基于文本映射模型，对样本文本信息进行映射，得到预测文本信息。506. The computer device maps the sample text information based on the text mapping model to obtain predicted text information.

507、计算机设备基于预测文本信息与样本文本信息之间的第一相似度及第一标签信息与样本文本信息之间的第二相似度，确定第二标签信息，第二标签信息为预测文本信息和第一标签信息中与样本文本信息之间的相似度较大的文本信息。507. The computer device determines the second label information based on the first similarity between the predicted text information and the sample text information and the second similarity between the first label information and the sample text information, and the second label information is the predicted text information and the text information with a greater similarity between the first label information and the sample text information.
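The label update in step 507 can be sketched as a comparison of the two similarities. Keeping the first label information when the two scores are equal is an assumption of this example; the text does not address ties. The function name `select_second_label` and the sample values are hypothetical.

```python
def select_second_label(
    predicted: str,
    first_label: str,
    first_similarity: float,
    second_similarity: float,
) -> str:
    """Return whichever text is more similar to the sample text information."""
    # ASSUMED tie-break: keep the original label when the scores are equal.
    return predicted if first_similarity > second_similarity else first_label


second_label = select_second_label("predicted text", "label text", 4.2, 3.9)
```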

508、计算机设备基于第二标签信息、预测文本信息及预测文本信息对应的第一相似度，对文本映射模型进行训练，该文本映射模型用于映射出任一文本信息的相似文本信息。508. The computer device trains a text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, where the text mapping model is used to map similar text information of any text information.

该步骤506-508与上述步骤302-307同理，在此不再赘述。The steps 506-508 are the same as the above-mentioned steps 302-307, and are not repeated here.

Moreover, while the text mapping model is being trained, the label information corresponding to the sample text information in the second text information set is updated so that the updated label information is whichever of the predicted text information and the label text information is most similar to the sample text information, which ensures the accuracy of the label information. Training the text mapping model in this way improves its mapping effect and avoids training the model with fixed labels, so that the text mapping model generalizes better.

Moreover, screening the initial first text information set removes its low-quality information combinations and ensures that the similarity between each piece of second text information in the screened second text information set and its corresponding label information is greater than the similarity threshold, which improves the quality of the information combinations in the second text information set. Training the text mapping model with the second text information and the corresponding label information in the second text information set ensures the accuracy of the sample information used for training, and thereby ensures the effect of the subsequent training of the text mapping model.

并且，本申请采用多阶段对文本映射模型进行训练，先基于第一文本信息集合对文本映射模型进行初步的训练，使得文本映射模型具备初步的文本映射能力，之后再以筛选出的第二文本信息集合对文本映射模型进行训练，进一步提升文本映射模型的映射效果。In addition, the present application adopts multi-stage training for the text mapping model. First, the text mapping model is initially trained based on the first text information set, so that the text mapping model has preliminary text mapping capabilities, and then the screened second text is used. The information set trains the text mapping model to further improve the mapping effect of the text mapping model.

As shown in FIG. 6, the text mapping model is trained in two stages. In the first training stage, the text mapping model is trained based on the first text information set. In the second training stage, a scoring mechanism is used to screen the second text information set out of the first text information set, and the text mapping model is trained based on the second text information set. During this training, the scoring mechanism is also used to determine the first similarity between the sample text information and the predicted text information and the second similarity between the first label information and the sample text information; the second label information is determined based on the first similarity and the second similarity; and the text mapping model undergoes reinforcement training based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information.
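The two-stage flow described for FIG. 6 can be sketched as follows. This is only an illustrative outline, not the patented implementation: the callables `pretrain_step`, `reinforce_step`, `predict` and `score`, the `jaccard` toy scorer, and the toy data are all hypothetical stand-ins for the model and scoring mechanism, and a single predicted text per sample is assumed.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (text information, label information)


def two_stage_training(
    first_set: List[Pair],
    pretrain_step: Callable[[str, str], None],
    reinforce_step: Callable[[str, str, str, float], None],
    predict: Callable[[str], str],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[Pair]:
    """Run both stages once and return the pairs with updated labels."""
    # Stage 1: preliminary training on the unfiltered first set.
    for text, label in first_set:
        pretrain_step(text, label)

    # Stage 2a: keep only pairs whose text/label similarity exceeds the threshold.
    second_set = [(t, l) for t, l in first_set if score(t, l) > threshold]

    # Stage 2b: score the prediction, update the label, train with the first similarity.
    updated: List[Pair] = []
    for sample, first_label in second_set:
        predicted = predict(sample)
        first_sim = score(predicted, sample)
        second_sim = score(first_label, sample)
        second_label = predicted if first_sim > second_sim else first_label
        reinforce_step(sample, second_label, predicted, first_sim)
        updated.append((sample, second_label))
    return updated


def jaccard(a: str, b: str) -> float:
    """Toy scorer (an assumption): Jaccard overlap of the character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# Toy stand-ins so the sketch runs end to end.
calls: List[str] = []
result = two_stage_training(
    first_set=[("abcd", "abce"), ("abcd", "wxyz")],
    pretrain_step=lambda text, label: calls.append("pretrain"),
    reinforce_step=lambda text, label, pred, sim: calls.append("reinforce"),
    predict=lambda text: text[:3],  # pretend model output
    score=jaccard,
    threshold=0.5,
)
```

In this toy run the pair with label "wxyz" is dropped by the screening step, and the remaining pair has its label replaced by the prediction because the prediction scores higher against the sample than the original label does.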

As shown in FIG. 7, the processing method of the text mapping model provided by the embodiments of the present application is compared with text mapping model processing methods in the related art. Related technology 1 trains the text mapping model directly on the original training data set. Related technology 2 filters the original training data set and trains the text mapping model on the filtered training data set with a cross-entropy loss function. The comparison shows that the method provided by the embodiments of the present application uses the smallest amount of training data, yet the text mapping model it produces maps a larger number of similar text information of higher quality. That is, the text mapping model provided by the embodiments of the present application performs better and has a better mapping effect.

The processing method of the text mapping model provided by the embodiments of the present application can be applied in various scenarios. For example, in an intelligent question answering scenario, the text mapping model can be used to expand a question-and-answer knowledge base. As shown in FIG. 8, the question-and-answer knowledge base of an intelligent dialogue robot is configured with multiple questions and an answer to each question. In the editing interface of the knowledge base, a question-and-answer editing interface is displayed in response to a trigger operation on the editing option corresponding to any question, and the user can edit the question and its answer in that interface. The question entered in the editing interface is mapped based on the text mapping model to obtain multiple similar questions, and a prompt identifier 801 is displayed to inform the user that similar questions have been mapped for the currently entered question. When the user clicks the similar-question option, the similar questions are displayed; when the user clicks the save option, the entered question is associated with its similar questions, which enriches the set of questions corresponding to each answer in the knowledge base. Later, when a user talks with the intelligent dialogue robot and enters any of these questions, the robot can look up its answer in the knowledge base. This improves the user experience and avoids the situation in which the robot cannot answer a question because too few questions are configured in the knowledge base.
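The knowledge-base expansion in this scenario can be illustrated with a small sketch. It is an assumption-laden illustration only: the knowledge base is modelled as a plain dictionary from question to answer, and `save_question`, `answer_for` and the stub `map_similar` (standing in for the trained text mapping model) are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


def save_question(
    kb: Dict[str, str],
    question: str,
    answer: str,
    map_similar: Callable[[str], List[str]],
) -> List[str]:
    """Store the question and every mapped similar question under one answer."""
    similar = map_similar(question)
    for q in [question, *similar]:
        kb[q] = answer
    return similar


def answer_for(kb: Dict[str, str], question: str) -> Optional[str]:
    """Exact-match lookup; returns None when the question is not configured."""
    return kb.get(question)


# Stub mapping model (hypothetical output, for illustration only).
def map_similar(question: str) -> List[str]:
    return ["How can I reset my password?", "I forgot my password"]


kb: Dict[str, str] = {}
added = save_question(kb, "How do I reset my password?", "Use the reset link.", map_similar)
reply = answer_for(kb, "I forgot my password")
missing = answer_for(kb, "What is the weather today?")
```

With only the original question stored, the second phrasing would not be found; after expansion, both phrasings resolve to the same answer.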

FIG. 9 is a schematic structural diagram of an apparatus for processing a text mapping model provided by an embodiment of the present application. As shown in FIG. 9, the apparatus includes:

an acquisition module 901, configured to acquire sample text information and first label information, where the first label information is text information whose similarity with the sample text information is not less than a similarity threshold;

a mapping module 902, configured to map the sample text information based on the text mapping model to obtain predicted text information;

a determining module 903, configured to determine second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, where the second label information is whichever of the predicted text information and the first label information has the greater similarity to the sample text information;

a training module 904, configured to train the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, where the text mapping model is used to map any text information to similar text information.

In a possible implementation, as shown in FIG. 10, the apparatus further includes the following:

the acquisition module 901 is further configured to acquire a third similarity and a fourth similarity between the predicted text information and the sample text information, where the third similarity indicates how the words contained in the predicted text information differ from those contained in the sample text information, and the fourth similarity indicates the semantic similarity between the predicted text information and the sample text information;

a fusion module 905, configured to perform weighted fusion of the third similarity and the fourth similarity to obtain the first similarity.
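The weighted fusion performed by the fusion module can be sketched as a weighted sum. The function name `fuse_similarities` and the default weights are hypothetical; the text above does not specify the weight values or whether they are normalized.

```python
def fuse_similarities(
    third_similarity: float,
    fourth_similarity: float,
    lexical_weight: float = 0.5,   # assumed value, not given in the source
    semantic_weight: float = 0.5,  # assumed value, not given in the source
) -> float:
    """Weighted fusion of the word-level and semantic scores into the first similarity."""
    return lexical_weight * third_similarity + semantic_weight * fourth_similarity
```

For example, `fuse_similarities(0.2, 0.8)` weights both scores equally, while raising `semantic_weight` would let the semantic score dominate the resulting first similarity.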

In another possible implementation, the acquisition module 901 is configured to: divide the predicted text information according to each of at least one character count to obtain at least one first word set, where the words in the same first word set contain the same number of characters; divide the sample text information according to each of the at least one character count to obtain at least one second word set, where the words in the same second word set contain the same number of characters; determine a first number and a second number, where the first number is the sum, over the character counts, of the number of words that differ between the corresponding first word set and second word set, and the second number is the total number of words in the at least one first word set and the at least one second word set; and determine the ratio of the first number to the second number as the third similarity between the predicted text information and the sample text information.
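A minimal sketch of this ratio follows, under stated assumptions: "words" are taken to be character n-grams, each word set is a Python `set` (so duplicates are collapsed), "different words" is read as the symmetric difference of the two sets, and the character counts default to 1 and 2. The names `char_ngrams` and `third_similarity` are hypothetical.

```python
from typing import Iterable, Set


def char_ngrams(text: str, n: int) -> Set[str]:
    """All substrings of exactly n characters (empty when the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def third_similarity(predicted: str, sample: str, sizes: Iterable[int] = (1, 2)) -> float:
    """Ratio of differing words (first number) to total words (second number)."""
    first_number = 0   # words present in only one of the two sets, summed over sizes
    second_number = 0  # total words in all first and second word sets
    for n in sizes:
        first_set = char_ngrams(predicted, n)
        second_set = char_ngrams(sample, n)
        first_number += len(first_set ^ second_set)
        second_number += len(first_set) + len(second_set)
    return first_number / second_number if second_number else 0.0
```

Under this reading the value is 0 for identical texts and 1 for texts that share no n-grams, so it grows with word-level difference, consistent with the statement that the third similarity indicates how the contained words differ.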

In another possible implementation, the acquisition module 901 is configured to perform semantic extraction on the predicted text information and the sample text information separately to obtain a first semantic feature of the predicted text information and a second semantic feature of the sample text information, and to determine the similarity between the first semantic feature and the second semantic feature as the fourth similarity.
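One common way to compare two semantic features is cosine similarity, sketched below. The choice of cosine is an assumption (the text only says "similarity"), the features are assumed to be equal-length numeric vectors already produced by some semantic extraction model, and `fourth_similarity` is a hypothetical name.

```python
import math
from typing import Sequence


def fourth_similarity(first_feature: Sequence[float], second_feature: Sequence[float]) -> float:
    """Cosine similarity between two semantic feature vectors (assumed measure)."""
    if len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(first_feature, second_feature))
    norm = math.sqrt(sum(a * a for a in first_feature)) * math.sqrt(
        sum(b * b for b in second_feature)
    )
    # A zero vector carries no direction, so treat the similarity as 0.
    return dot / norm if norm else 0.0
```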

In another possible implementation, the acquisition module 901 is configured to: concatenate the predicted text information and the sample text information to obtain concatenated text information; perform semantic extraction on the concatenated text information to obtain a third semantic feature corresponding to the concatenated text information; classify the third semantic feature to obtain a classification result; and determine the classification result as the fourth similarity.

In another possible implementation, the training module 904 is configured to obtain a first loss value corresponding to the predicted text information based on the second label information and the predicted text information, and to train the text mapping model based on the first similarity and the first loss value corresponding to the predicted text information.

In another possible implementation, the training module 904 is configured to: obtain a weight parameter for each predicted text information based on the difference between a target similarity and the first similarity corresponding to that predicted text information; compute a weighted average of the first loss values corresponding to the multiple predicted text information, using the weight parameter of each predicted text information, to obtain a second loss value; determine the average of the first loss values corresponding to the multiple predicted text information as a third loss value; and train the text mapping model based on the second loss value and the third loss value.
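The second and third loss values can be sketched as below. Several details are assumptions rather than statements of the source: the weight is taken to be exactly the difference `target_similarity - first_similarity`, the weighted average is normalized by the sum of the weights, the target similarity defaults to 1.0, and the name `second_and_third_loss` is hypothetical. How the two loss values are combined for the parameter update is not specified in the text, so the sketch simply returns both.

```python
from typing import List, Tuple


def second_and_third_loss(
    first_losses: List[float],
    first_similarities: List[float],
    target_similarity: float = 1.0,  # assumed target value
) -> Tuple[float, float]:
    """Return (second loss, third loss) for a batch of predicted texts."""
    if not first_losses or len(first_losses) != len(first_similarities):
        raise ValueError("need one first similarity per first loss value")

    # Weight parameter per prediction: gap between the target and its first similarity.
    weights = [target_similarity - s for s in first_similarities]
    weight_sum = sum(weights)

    # Second loss: weighted average of the first loss values.
    if weight_sum:
        second_loss = sum(w * l for w, l in zip(weights, first_losses)) / weight_sum
    else:
        second_loss = 0.0  # every prediction already sits at the target similarity

    # Third loss: plain average of the first loss values.
    third_loss = sum(first_losses) / len(first_losses)
    return second_loss, third_loss
```

Under these assumptions a prediction that already reaches the target similarity contributes nothing to the second loss, while predictions further from the target are weighted more heavily; the unweighted third loss keeps every prediction in play.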

In another possible implementation, as shown in FIG. 10, the acquisition module 901 includes:

an acquisition unit 9011, configured to acquire a first text information set, where the first text information set includes a plurality of first text information and label information corresponding to each first text information;

a determining unit 9012, configured to determine the similarity between each first text information and its corresponding label information;

a screening unit 9013, configured to screen out, from the first text information set and based on the similarity corresponding to each first text information, the sample text information and the first label information whose similarity is greater than the similarity threshold.

In another possible implementation, the screening unit 9013 is configured to: screen out, from the first text information set and based on the similarity corresponding to each first text information, at least one second text information whose similarity is greater than the similarity threshold; form a second text information set from the screened second text information and the corresponding label information; and acquire the sample text information and the first label information from the second text information set, where the sample text information is any second text information.
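The screening step can be sketched as a simple threshold filter. The function name `build_second_set` is hypothetical, and the similarities are assumed to have been computed beforehand (for example by the determining unit 9012), one per pair and in the same order.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (first text information, label information)


def build_second_set(
    first_set: List[Pair],
    similarities: List[float],
    threshold: float,
) -> List[Pair]:
    """Keep only the pairs whose text/label similarity is greater than the threshold."""
    if len(first_set) != len(similarities):
        raise ValueError("need one similarity per pair")
    return [pair for pair, sim in zip(first_set, similarities) if sim > threshold]


# Toy data: the second pair is a low-quality combination and is screened out.
first_set = [("how to pay", "how do I pay"), ("how to pay", "opening hours")]
second_set = build_second_set(first_set, similarities=[0.9, 0.1], threshold=0.5)
```

Any pair in `second_set` can then serve as the sample text information and its first label information.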

In another possible implementation, the acquisition module 901 is further configured to train the text mapping model based on the first text information in the first text information set and the corresponding label information.

In another possible implementation, the determining module 903 is configured to: in response to the largest of multiple first similarities being greater than the second similarity, determine the predicted text information corresponding to the largest first similarity as the second label information, where the multiple first similarities are the similarities between multiple predicted text information and the sample text information; or, in response to none of the multiple first similarities being greater than the second similarity, determine the first label information as the second label information.
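The selection rule above can be sketched in a few lines. The function name `select_second_label` and the representation of the predictions as `(text, first similarity)` tuples are assumptions made for illustration.

```python
from typing import List, Tuple


def select_second_label(
    first_label: str,
    second_similarity: float,
    predictions: List[Tuple[str, float]],  # (predicted text, first similarity)
) -> str:
    """Pick the best prediction if it beats the first label, else keep the first label."""
    if predictions:
        best_text, best_similarity = max(predictions, key=lambda p: p[1])
        if best_similarity > second_similarity:
            return best_text
    # No first similarity is greater than the second similarity.
    return first_label
```

A tie between the best prediction and the first label keeps the first label, matching the "none ... greater than" wording of the second branch.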

In another possible implementation, the mapping module 902 is further configured to map target text information based on the text mapping model to obtain similar text information of the target text information.

It should be noted that the apparatus for processing a text mapping model provided in the above embodiments is illustrated only by the division into the functional modules described above. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for processing a text mapping model provided in the above embodiments and the embodiments of the method for processing a text mapping model belong to the same concept. For the specific implementation process, refer to the method embodiments; details are not repeated here.

An embodiment of the present application further provides a computer device. The computer device includes a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the operations performed in the method for processing a text mapping model of the above embodiments.

Optionally, the computer device is provided as a terminal. FIG. 11 shows a structural block diagram of a terminal 1100 provided by an exemplary embodiment of the present application. The terminal 1100 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The terminal 1100 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.

The terminal 1100 includes a processor 1101 and a memory 1102.

The processor 1101 may include one or more processing cores, for example a 4-core processor or an 8-core processor. The processor 1101 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor. The main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1101 may further include an AI (Artificial Intelligence) processor, which handles computing operations related to machine learning.

The memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1102 stores at least one computer program, and the at least one computer program is executed by the processor 1101 to implement the method for processing a text mapping model provided by the method embodiments of the present application.

In some embodiments, the terminal 1100 optionally further includes a peripheral device interface 1103 and at least one peripheral device. The processor 1101, the memory 1102, and the peripheral device interface 1103 may be connected through a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 1103 through a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1104, a display screen 1105, a camera assembly 1106, an audio circuit 1107, a positioning component 1108, and a power supply 1109.

The peripheral device interface 1103 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 1101 and the memory 1102. In some embodiments, the processor 1101, the memory 1102, and the peripheral device interface 1103 are integrated on the same chip or circuit board. In some other embodiments, any one or two of the processor 1101, the memory 1102, and the peripheral device interface 1103 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The radio frequency circuit 1104 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1104 communicates with communication networks and other communication devices through electromagnetic signals. It converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 1104 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 1104 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, the World Wide Web, metropolitan area networks, intranets, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1104 may further include circuits related to NFC (Near Field Communication), which is not limited in the present application.

The display screen 1105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1105 is a touch display screen, it is also capable of collecting touch signals on or above its surface. The touch signal may be input to the processor 1101 as a control signal for processing. In this case, the display screen 1105 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1105, arranged on the front panel of the terminal 1100. In other embodiments, there may be at least two display screens 1105, arranged on different surfaces of the terminal 1100 or in a folding design. In still other embodiments, the display screen 1105 may be a flexible display screen arranged on a curved surface or a folding surface of the terminal 1100. The display screen 1105 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen. The display screen 1105 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).

The camera assembly 1106 is used to capture images or video. Optionally, the camera assembly 1106 includes a front camera and a rear camera. The front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fused shooting functions. In some embodiments, the camera assembly 1106 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 1107 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 1101 for processing or to the radio frequency circuit 1104 for voice communication. For stereo collection or noise reduction, there may be multiple microphones arranged at different parts of the terminal 1100. The microphone may also be an array microphone or an omnidirectional microphone. The speaker converts electrical signals from the processor 1101 or the radio frequency circuit 1104 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as distance measurement. In some embodiments, the audio circuit 1107 may also include a headphone jack.

The positioning component 1108 is used to determine the current geographic location of the terminal 1100 to implement navigation or LBS (Location Based Service). The positioning component 1108 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

The power supply 1109 is used to supply power to the components in the terminal 1100. The power supply 1109 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1109 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast charging technology.

In some embodiments, the terminal 1100 further includes one or more sensors 1110. The one or more sensors 1110 include, but are not limited to, an acceleration sensor 1111, a gyroscope sensor 1112, a pressure sensor 1113, a fingerprint sensor 1114, an optical sensor 1115, and a proximity sensor 1116.

The acceleration sensor 1111 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 1100. For example, the acceleration sensor 1111 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 1101 can control the display screen 1105 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1111. The acceleration sensor 1111 can also be used to collect motion data of a game or of the user.

The gyroscope sensor 1112 can detect the body orientation and rotation angle of the terminal 1100, and can cooperate with the acceleration sensor 1111 to collect the user's 3D actions on the terminal 1100. Based on the data collected by the gyroscope sensor 1112, the processor 1101 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 1113 may be arranged on the side frame of the terminal 1100 and/or beneath the display screen 1105. When the pressure sensor 1113 is arranged on the side frame of the terminal 1100, it can detect the user's grip signal on the terminal 1100, and the processor 1101 performs left/right hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1113. When the pressure sensor 1113 is arranged beneath the display screen 1105, the processor 1101 controls the operable controls on the UI according to the user's pressure operation on the display screen 1105. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 1114 is used to collect the user's fingerprint. Either the processor 1101 identifies the user according to the fingerprint collected by the fingerprint sensor 1114, or the fingerprint sensor 1114 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 1101 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 1114 may be arranged on the front, back, or side of the terminal 1100. When the terminal 1100 is provided with a physical button or a manufacturer logo, the fingerprint sensor 1114 may be integrated with the physical button or the manufacturer logo.

The optical sensor 1115 is used to collect ambient light intensity. In one embodiment, the processor 1101 can control the display brightness of the display screen 1105 according to the ambient light intensity collected by the optical sensor 1115. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1105 is increased; when the ambient light intensity is low, the display brightness is decreased. In another embodiment, the processor 1101 can also dynamically adjust the shooting parameters of the camera assembly 1106 according to the ambient light intensity collected by the optical sensor 1115.

The proximity sensor 1116, also called a distance sensor, is disposed on the front panel of the terminal 1100. The proximity sensor 1116 is used to collect the distance between the user and the front of the terminal 1100. In one embodiment, when the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 is gradually decreasing, the processor 1101 controls the display screen 1105 to switch from the screen-on state to the screen-off state; when the proximity sensor 1116 detects that the distance between the user and the front of the terminal 1100 is gradually increasing, the processor 1101 controls the display screen 1105 to switch from the screen-off state to the screen-on state.

Those skilled in the art can understand that the structure shown in FIG. 11 does not constitute a limitation on the terminal 1100, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.

Optionally, the computer device is provided as a server. FIG. 12 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 1200 may vary greatly depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 1201 and one or more memories 1202, where at least one computer program is stored in the memory 1202, and the at least one computer program is loaded and executed by the processor 1201 to implement the methods provided by the foregoing method embodiments. The server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.

Embodiments of the present application further provide a computer-readable storage medium in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed by the processing method of the text mapping model of the foregoing embodiments.

Embodiments of the present application further provide a computer program product or computer program. The computer program product includes a computer program which, when executed by a processor, implements the operations performed by the processing method of the text mapping model of the foregoing embodiments.

Those of ordinary skill in the art can understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above descriptions are merely optional embodiments of the present application and are not intended to limit the embodiments of the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present application shall fall within the scope of protection of the present application.

Claims

1. A processing method for a text mapping model, wherein the method comprises:

obtaining sample text information and first label information, wherein the first label information is text information whose similarity with the sample text information is not less than a similarity threshold;

mapping the sample text information based on the text mapping model to obtain predicted text information;

determining second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, wherein the second label information is whichever of the predicted text information and the first label information has the greater similarity with the sample text information; and

training the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, wherein the text mapping model is used to map any text information to similar text information.

2. The method according to claim 1, wherein before the determining second label information based on the first similarity between the predicted text information and the sample text information and the second similarity between the first label information and the sample text information, the method further comprises:

obtaining a third similarity and a fourth similarity between the predicted text information and the sample text information, wherein the third similarity indicates the difference between the words contained in the predicted text information and the words contained in the sample text information, and the fourth similarity indicates the semantic similarity between the predicted text information and the sample text information; and

performing weighted fusion on the third similarity and the fourth similarity to obtain the first similarity.
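
As a non-limiting illustration of the weighted fusion recited in claim 2, the following Python sketch combines the two scores with a single convex weight. The function name `fuse_similarity`, the parameter `word_weight`, and the choice of a convex combination are assumptions made for illustration; the claim does not fix how the weights are chosen.

```python
def fuse_similarity(third_similarity: float,
                    fourth_similarity: float,
                    word_weight: float = 0.5) -> float:
    """Weighted fusion of a word-level score and a semantic score.

    word_weight is a hypothetical hyperparameter: the share given to the
    word-level (third) similarity; the remainder goes to the semantic
    (fourth) similarity.
    """
    if not 0.0 <= word_weight <= 1.0:
        raise ValueError("word_weight must lie in [0, 1]")
    return word_weight * third_similarity + (1.0 - word_weight) * fourth_similarity


# Example: a quarter of the weight on the word-level score.
first_similarity = fuse_similarity(0.2, 0.8, word_weight=0.25)
```

With equal weights the fused value is simply the mean of the two scores; other weightings shift the emphasis toward wording or toward meaning.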

3. The method according to claim 2, wherein the obtaining the third similarity between the predicted text information and the sample text information comprises:

dividing the predicted text information according to each of at least one character count, to obtain at least one first word set, wherein the words belonging to the same first word set contain the same number of characters;

dividing the sample text information according to each of the at least one character count, to obtain at least one second word set, wherein the words belonging to the same second word set contain the same number of characters;

determining a first number and a second number, wherein the first number indicates the sum, over the character counts, of the number of different words in the first word set and the second word set corresponding to each character count, and the second number indicates the total number of words in the at least one first word set and the at least one second word set; and

determining the ratio of the first number to the second number as the third similarity between the predicted text information and the sample text information.
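
One possible reading of the computation in claim 3 can be sketched in Python as follows. The sketch assumes that "words" are character n-grams, that "different words" means words appearing in exactly one of the two word sets, and that the character counts are 1 and 2; these choices, and the names `char_ngrams` and `third_similarity`, are illustrative assumptions rather than details stated in the claim.

```python
def char_ngrams(text: str, n: int) -> set:
    """Split text into the set of its contiguous n-character words."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def third_similarity(predicted: str, sample: str, sizes=(1, 2)) -> float:
    """Ratio of differing words to total words, summed over word lengths."""
    first_number = 0   # words found in only one of the two word sets
    second_number = 0  # total words across both word sets
    for n in sizes:
        first_set = char_ngrams(predicted, n)
        second_set = char_ngrams(sample, n)
        first_number += len(first_set ^ second_set)  # symmetric difference
        second_number += len(first_set) + len(second_set)
    return first_number / second_number if second_number else 0.0


score = third_similarity("abc", "abd")
```

Under this reading, identical texts score 0 and texts that share no words score 1, so the value grows with lexical difference, consistent with claim 2 describing the third similarity as indicating a difference between words.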

4. The method according to claim 2, wherein obtaining the fourth similarity between the predicted text information and the sample text information comprises:

performing semantic extraction on the predicted text information and the sample text information respectively, to obtain a first semantic feature of the predicted text information and a second semantic feature of the sample text information; and

determining the similarity between the first semantic feature and the second semantic feature as the fourth similarity.
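
As a hedged illustration of the last step of claim 4, the sketch below compares two semantic feature vectors with cosine similarity. The claim only requires "the similarity" between the features; cosine similarity and the function name `cosine_similarity` are assumptions, and the feature vectors are taken as already extracted by some encoder that is not shown.

```python
import math


def cosine_similarity(first_feature, second_feature) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(first_feature) != len(second_feature):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(first_feature, second_feature))
    norm_first = math.sqrt(sum(a * a for a in first_feature))
    norm_second = math.sqrt(sum(b * b for b in second_feature))
    if norm_first == 0.0 or norm_second == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_first * norm_second)


# Placeholder vectors standing in for encoder outputs.
fourth_similarity = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
```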

5. The method according to claim 2, wherein acquiring the fourth similarity between the predicted text information and the sample text information comprises:

splicing the predicted text information and the sample text information to obtain spliced text information;

performing semantic extraction on the spliced text information to obtain a third semantic feature corresponding to the spliced text information;

classifying the third semantic feature to obtain a classification result;

determining the classification result as the fourth similarity.

6. The method according to claim 1, wherein the training the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information comprises:

obtaining a first loss value corresponding to the predicted text information based on the second label information and the predicted text information; and

training the text mapping model based on the first similarity and the first loss value corresponding to the predicted text information.

7. The method according to claim 6, wherein the training of the text mapping model based on the first similarity and the first loss value corresponding to the predicted text information comprises:

obtaining a weight parameter corresponding to each piece of predicted text information based on the difference between a target similarity and the first similarity corresponding to that piece of predicted text information;

performing, based on the weight parameter corresponding to each piece of predicted text information, a weighted average of the first loss values corresponding to a plurality of pieces of predicted text information, to obtain a second loss value;

determining the average of the first loss values corresponding to the plurality of pieces of predicted text information as a third loss value; and

training the text mapping model based on the second loss value and the third loss value.
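
The loss combination in claim 7 can be sketched as follows, under stated assumptions. The claim says the weight depends on the difference between the target similarity and the first similarity but does not give the mapping; here the weight is assumed to be `1 - |target - similarity|` clipped at zero, and the second and third loss values are assumed to be summed. The function name `combined_loss` is hypothetical.

```python
def combined_loss(first_losses, first_similarities, target_similarity=1.0):
    """Return (second_loss, third_loss, total) for a batch of predictions."""
    if not first_losses or len(first_losses) != len(first_similarities):
        raise ValueError("need equally sized, non-empty inputs")
    # Assumed weighting: predictions closer to the target similarity count more.
    weights = [max(0.0, 1.0 - abs(target_similarity - s))
               for s in first_similarities]
    weight_sum = sum(weights)
    # Second loss: weighted average of the per-prediction losses.
    second_loss = (sum(w * l for w, l in zip(weights, first_losses)) / weight_sum
                   if weight_sum > 0 else 0.0)
    # Third loss: plain average of the per-prediction losses.
    third_loss = sum(first_losses) / len(first_losses)
    # Assumed combination: simple sum of the two loss values.
    return second_loss, third_loss, second_loss + third_loss


second, third, total = combined_loss([2.0, 4.0], [1.0, 0.5])
```

In a training loop the per-prediction losses would be tensors from a deep learning framework rather than Python floats; plain floats are used here only to keep the arithmetic visible.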

8. The method according to claim 1, wherein the acquiring the sample text information and the first label information comprises:

obtaining a first text information set, where the first text information set includes a plurality of first text information and label information corresponding to each of the first text information;

determining the similarity between each of the first text information and the corresponding label information;

selecting, from the first text information set and based on the similarity corresponding to each first text information, the sample text information and the first label information whose similarity is greater than the similarity threshold.
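
The selection step of claim 8 amounts to a threshold filter over (text, label) pairs. The Python sketch below is illustrative only: `char_overlap` is a toy stand-in for whatever similarity measure is actually used, and the names `select_samples` and `char_overlap` are assumptions.

```python
def char_overlap(text: str, label: str) -> float:
    """Toy similarity: Jaccard overlap of the characters in two strings."""
    a, b = set(text), set(label)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_samples(first_text_set, similarity_fn, threshold: float):
    """Keep the (text, label) pairs whose similarity exceeds the threshold."""
    return [(text, label) for text, label in first_text_set
            if similarity_fn(text, label) > threshold]


pairs = [("abcd", "abce"), ("abcd", "wxyz")]
selected = select_samples(pairs, char_overlap, threshold=0.5)
```

Only the first pair survives the filter, since its characters overlap at 3 out of 5 while the second pair shares none.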

9. The method according to claim 8, wherein the selecting, from the first text information set and based on the similarity corresponding to each first text information, the sample text information and the first label information whose similarity is greater than the similarity threshold comprises:

selecting, from the first text information set and based on the similarity corresponding to each first text information, at least one second text information whose similarity is greater than the similarity threshold, and forming a second text information set from the selected second text information and the corresponding label information; and

acquiring the sample text information and the first label information from the second text information set, wherein the sample text information is any one of the second text information.

10. The method according to claim 8, wherein, before the sample text information is mapped based on the text mapping model to obtain predicted text information, the method further comprises:

training the text mapping model based on the first text information and the corresponding label information in the first text information set.

11. The method according to any one of claims 1-10, wherein the determining second label information based on the first similarity between the predicted text information and the sample text information and the second similarity between the first label information and the sample text information comprises:

in response to the largest first similarity among a plurality of first similarities being greater than the second similarity, determining the predicted text information corresponding to the largest first similarity as the second label information, wherein the plurality of first similarities are the similarities between a plurality of pieces of predicted text information and the sample text information; or

in response to none of the plurality of first similarities being greater than the second similarity, determining the first label information as the second label information.
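
The two branches of claim 11 can be illustrated with a short Python sketch. The function name `choose_second_label` and its argument names are assumptions; the similarities are taken as precomputed numbers.

```python
def choose_second_label(predictions, first_similarities,
                        first_label, second_similarity):
    """Pick the best prediction if it beats the first label, else the label."""
    if not predictions:
        return first_label
    # Index of the prediction with the largest first similarity.
    best_index = max(range(len(predictions)),
                     key=lambda i: first_similarities[i])
    if first_similarities[best_index] > second_similarity:
        return predictions[best_index]
    return first_label


label = choose_second_label(["p1", "p2"], [0.6, 0.9], "gold", 0.8)
```

Here `label` is `"p2"` because its first similarity (0.9) exceeds the second similarity (0.8); raising the second similarity to 0.95 would leave the first label in place.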

12. The method according to any one of claims 1-10, wherein after the training the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, the method further comprises:

mapping target text information based on the text mapping model to obtain similar text information of the target text information.

13. An apparatus for processing a text mapping model, wherein the apparatus comprises:

an acquisition module, configured to acquire sample text information and first label information, where the first label information is text information whose similarity with the sample text information is not less than a similarity threshold;

a mapping module, configured to map the sample text information based on the text mapping model to obtain predicted text information;

a determination module, configured to determine second label information based on a first similarity between the predicted text information and the sample text information and a second similarity between the first label information and the sample text information, wherein the second label information is whichever of the predicted text information and the first label information has the greater similarity with the sample text information; and

a training module, configured to train the text mapping model based on the second label information, the predicted text information, and the first similarity corresponding to the predicted text information, wherein the text mapping model is used to map any text information to similar text information.

14. A computer device, wherein the computer device comprises a processor and a memory, the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor to implement the operations performed by the processing method of the text mapping model according to any one of claims 1 to 12.

15. A computer-readable storage medium, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the operations performed by the processing method of the text mapping model according to any one of claims 1 to 12.

16. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the operations performed by the method for processing a text mapping model according to any one of claims 1 to 12 are implemented.