CN114579799A

CN114579799A - Method, apparatus, device and medium for generating a recording manuscript

Info

Publication number: CN114579799A
Application number: CN202210129457.7A
Authority: CN
Inventors: 徐波
Original assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Duoyi Network Co ltd
Current assignee: GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD; Duoyi Network Co ltd
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2022-06-03

Abstract

The invention provides a method, apparatus, device and medium for generating a recording manuscript. The method comprises: obtaining a custom pinyin combination sequence; converting each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on a preset mapping relationship between pinyin sequences and Chinese character sequences, to obtain a Chinese character combination sequence; inputting the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript; and performing error correction on the initial recording manuscript to obtain a target recording manuscript. With embodiments of the invention, the input pinyin sequence can be customized according to actual requirements to generate a recording manuscript that has few characters yet contains all pinyin types, thereby reducing the recording corpus required to build a speech synthesis model and lowering the threshold for customizing a sound library.

Description

Method, apparatus, device and medium for generating a recording manuscript

Technical Field

The present invention relates to the technical field of audio recording, and in particular to a method, apparatus, device and medium for generating a recording manuscript.

Background

Building a speech synthesis model requires an aligned text-speech corpus, which is generally obtained by having voice actors record from a recording manuscript. The content quality of the recording manuscript therefore directly determines the content quality of the aligned text-speech corpus. A recording manuscript is a text file that guides the voice actors during recording and contains the text to be recorded. In studying the prior art, however, the inventor found that existing methods for constructing recording manuscripts either require a large number of manuscripts or require professional writers to draft them, which is costly. Neither meets the need to customize a sound library from a small amount of corpus.

Summary of the Invention

The present invention provides a method, apparatus, device and medium for generating a recording manuscript, which can reduce the recording corpus required to build a speech synthesis model and lower the threshold for customizing a sound library.

To achieve the above purpose, an embodiment of the present invention provides a method for generating a recording manuscript, comprising the following steps:

obtaining a custom pinyin combination sequence, wherein the pinyin combination sequence includes a plurality of pinyin and the number of occurrences of each pinyin;

converting, based on a preset mapping relationship between pinyin sequences and Chinese character sequences, each pinyin in the pinyin combination sequence into a corresponding Chinese character to obtain a Chinese character combination sequence, wherein the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each Chinese character;

inputting the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript; and

performing error correction on the initial recording manuscript to obtain a target recording manuscript.

As an optional embodiment, converting each pinyin in the pinyin combination sequence into a corresponding Chinese character based on the preset mapping relationship between pinyin sequences and Chinese character sequences, to obtain the Chinese character combination sequence, includes:

sorting the pinyin in the pinyin combination sequence into a preset order, based on the pinyin arrangement order in a preset mapping table, to obtain a sorted pinyin sequence; and

converting the sorted pinyin sequence into a corresponding Chinese character sequence based on the preset mapping table, and appending after each Chinese character in the Chinese character sequence the number of occurrences of that character's pinyin, to obtain the Chinese character combination sequence.

As an optional embodiment, the text generation model is trained as follows:

obtaining a question-answer pair corpus, wherein the question-answer pair corpus includes original sentences and question sentences; and

inputting the question-answer pair corpus into a preset text generation model and training the text generation model to obtain the trained text generation model.

As an optional embodiment, the question-answer pair corpus is obtained as follows:

obtaining a text corpus and dividing it into a plurality of original sentences according to a preset sentence segmentation method;

converting each original sentence in the text corpus into a corresponding pinyin sequence based on the preset mapping table;

counting the number of occurrences of each pinyin in each pinyin sequence and, according to those counts, arranging the pinyin in each pinyin sequence in a preset order to obtain a sorted pinyin sequence;

converting the sorted pinyin sequence into a corresponding Chinese character sequence based on the preset mapping table, and appending after each Chinese character in the Chinese character sequence the number of occurrences of that character's pinyin, to obtain the question sentence corresponding to each original sentence; and

pairing each original sentence with its corresponding question sentence to form a question-answer pair, thereby obtaining a corpus of multiple question-answer pairs.

As an optional embodiment, the pinyin combination sequence further includes the tone of each pinyin;

in that case, converting each pinyin in the pinyin combination sequence into a corresponding Chinese character based on the preset mapping relationship between pinyin sequences and Chinese character sequences, to obtain the Chinese character combination sequence, includes:

converting the sorted pinyin sequence into a corresponding Chinese character sequence based on the preset mapping table and according to the tone of each pinyin, and appending after each Chinese character in the Chinese character sequence the number of occurrences of that character's pinyin, to obtain the Chinese character combination sequence.

As an optional embodiment, the preset mapping table includes the General Standard Chinese Character Table (《通用规范汉字表》).

As an optional embodiment, the text generation model includes GPT, GPT2, GPT3, LaserTagger and LSTM.

An embodiment of the present invention provides an apparatus for generating a recording manuscript, including:

a pinyin combination sequence acquisition module, configured to obtain a custom pinyin combination sequence, wherein the pinyin combination sequence includes a plurality of pinyin and the number of occurrences of each pinyin;

a Chinese character combination sequence acquisition module, configured to convert each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on a preset mapping relationship between pinyin sequences and Chinese character sequences, to obtain a Chinese character combination sequence, wherein the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each Chinese character;

an initial recording manuscript acquisition module, configured to input the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript; and

a target recording manuscript acquisition module, configured to perform error correction on the initial recording manuscript to obtain a target recording manuscript.

An embodiment of the present invention provides a terminal device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for generating a recording manuscript described in the above embodiments.

An embodiment of the present invention provides a computer-readable storage medium that includes a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for generating a recording manuscript described in the above embodiments.

Compared with the prior art, the method, apparatus, device and medium for generating a recording manuscript provided by embodiments of the present invention can customize the input pinyin sequence according to actual requirements and generate a recording manuscript that has few characters yet contains all pinyin types, thereby reducing the recording corpus required to build a speech synthesis model and lowering the threshold for customizing a sound library.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a method for generating a recording manuscript provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an apparatus for generating a recording manuscript provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a method for generating a recording manuscript. Referring to FIG. 1, a schematic flowchart of the method, the method includes steps S11 to S14:

S11. Obtain a custom pinyin combination sequence, wherein the pinyin combination sequence includes a plurality of pinyin and the number of occurrences of each pinyin.

It should be noted that the custom pinyin combination sequence is specified sentence by sentence.

S12. Based on a preset mapping relationship between pinyin sequences and Chinese character sequences, convert each pinyin in the pinyin combination sequence into a corresponding Chinese character to obtain a Chinese character combination sequence, wherein the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each Chinese character.

S13. Input the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript.

S14. Perform error correction on the initial recording manuscript to obtain a target recording manuscript.

It should be noted that text error correction methods include but are not limited to statistics-based methods, dictionary-based methods and neural-network-based methods. In practice, the method used to correct the recording manuscript can be chosen according to actual requirements or experiments and is not limited here. It will be understood that if the text generated by the model contains no errors, the corrected text is identical to the generated text.
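Steps S11 to S14 can be sketched as a small pipeline. The following is a minimal illustration under stated assumptions, not the patented implementation: `generate_manuscript`, its parameter names, and the idea of passing the model and the corrector as plain callables are all hypothetical choices made for this sketch, and the pinyin are assumed to be already sorted.

```python
from typing import Callable, Dict, List, Tuple

# One sentence-level request: (toned pinyin, requested number of occurrences).
PinyinCounts = List[Tuple[str, int]]


def generate_manuscript(
    pinyin_counts: PinyinCounts,          # S11: the custom pinyin combination sequence
    mapping: Dict[str, str],              # hypothetical pinyin -> dedicated character table
    generate: Callable[[str], str],       # stand-in for the trained text generation model
    correct: Callable[[str], str],        # stand-in for the error-correction step
) -> str:
    """Run S12 to S14 for one pinyin combination sequence (assumed pre-sorted)."""
    # S12: map each pinyin to its character and append the requested count.
    prompt = "".join(f"{mapping[p]}{n}" for p, n in pinyin_counts)
    # S13: the text generation model produces the initial recording manuscript.
    draft = generate(prompt)
    # S14: error correction yields the target recording manuscript.
    return correct(draft)
```

Keeping the model and the corrector as injected callables reflects the text's point that both the model type and the correction method are left open.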

It should also be noted that text generation is also known as natural language generation; a natural language generation system can be defined as one that accepts information in non-linguistic form as input and generates readable text. In the present invention, the user specifies pinyin text, from which Chinese character text is generated. For example, the user specifies requirements such as the pinyin qi3 occurring once and lai2 occurring once; with an embodiment of the present invention, a Chinese character text such as "三月五栋加油笑起来" is obtained which, after error correction, meets the user's preset conditions.

Further, in the prior art, methods for constructing a recording manuscript generally include: (1) selecting several human-written works as the recording manuscript, for example choosing articles from primary and secondary school textbooks, news, essays, novels and similar works; (2) screening a subset of human-written sentences and combining them into the recording manuscript, for example splitting news, essays, novels and other documents into sentences, then applying rules to select a number of those sentences and building the manuscript from them; (3) having professionals write the recording manuscript to preset requirements, for example specifying subject matter, word count, genre and other conditions, having a professional writer produce articles to those conditions, and using the articles, suitably processed, as the recording manuscript.

However, these methods have the following problems: (1) Selecting human-written works: manuscripts built this way are well written, logically rigorous and highly readable, but many pinyin are repeated and the information redundancy is high, so a large volume of manuscript is needed to cover all pinyin and a large volume of such corpus is needed to build a speech synthesis model, which is unfavorable for customizing a sound library with few resources. (2) Screening human-written sentences: every sentence in a manuscript built this way is coherent and readable, but high-frequency pinyin are repeated at a high rate, so a large number of manuscripts is needed to cover all pinyin, which is unfavorable for customizing a sound library. (3) Having professionals write to preset requirements: manuscripts built this way are well written, logical and readable, but the demands on the writer and the cost are high; if word-count and pinyin-coverage requirements are also specified, the writer needs a deep understanding of pinyin, which raises the demands further and makes the approach hard to carry out successfully.

Compared with the prior art, the method for generating a recording manuscript provided by an embodiment of the present invention can customize the input pinyin sequence according to actual requirements and generate a recording manuscript that has few characters yet contains all pinyin types, thereby reducing the recording corpus required to build a speech synthesis model and lowering the threshold for customizing a sound library. In addition, the present invention constructs the recording manuscript automatically, without consuming human resources, which further reduces the cost of writing.

It should be noted that, with an embodiment of the present invention, the recording manuscript as a whole can have fewer characters while still containing all pinyin types with each pinyin occurring more than a preset threshold number of times; the aim is not to reduce the number of characters corresponding to the pinyin of any single sentence. For example, a recording manuscript assembled from sentences extracted from news articles may require 100,000 characters, whereas this scheme requires only 10,000.

As an optional embodiment, step S12 includes:

S121. Based on the pinyin arrangement order in the preset mapping table, sort the pinyin in the pinyin combination sequence into a preset order to obtain a sorted pinyin sequence.

S122. Based on the preset mapping table, convert the sorted pinyin sequence into a corresponding Chinese character sequence, and append after each Chinese character in the Chinese character sequence the number of occurrences of that character's pinyin, to obtain the Chinese character combination sequence.
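Steps S121 and S122 can be sketched as follows. This is an illustrative sketch only: the nine-entry `PINYIN_TO_CHAR` dictionary is a hypothetical excerpt standing in for the preset mapping table, its insertion order is assumed to represent the table's pinyin order, and the sort key (count descending, then table order) borrows the rule this document gives for building question sentences.

```python
from typing import Dict, List, Tuple

# Hypothetical excerpt of a preset mapping table: each toned pinyin maps to one
# dedicated character; the dict order stands in for the table's pinyin order.
PINYIN_TO_CHAR: Dict[str, str] = {
    "qi3": "企", "lai2": "来", "yue4": "月", "jia1": "家", "san1": "三",
    "you2": "游", "dong4": "动", "wu3": "武", "xiao4": "效",
}
TABLE_RANK = {p: i for i, p in enumerate(PINYIN_TO_CHAR)}


def sort_pinyin(items: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """S121: order by count (descending), then by position in the table."""
    return sorted(items, key=lambda pc: (-pc[1], TABLE_RANK[pc[0]]))


def to_character_sequence(items: List[Tuple[str, int]]) -> str:
    """S122: replace each pinyin by its dedicated character followed by its count."""
    return "".join(f"{PINYIN_TO_CHAR[p]}{n}" for p, n in sort_pinyin(items))


if __name__ == "__main__":
    request = [("dong4", 1), ("wu3", 1), ("jia1", 1), ("qi3", 2), ("yue4", 2),
               ("san1", 1), ("you2", 1), ("lai2", 2), ("xiao4", 1)]
    print(to_character_sequence(request))
```

Because the key includes the tone digit, the same lookup also covers the tone-aware variant of the conversion; a toneless variant would simply use toneless keys.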

In practice, the custom pinyin combination sequence may already be sorted in the preset order, in which case it is converted directly using the preset mapping table to obtain the Chinese character combination sequence; alternatively, it may be unsorted, in which case it is sorted and then converted after input. The choice can be made according to actual requirements or experiments and is not limited here.

It should be noted that a custom input pinyin combination sequence generally has the structure: pinyin + tone digit + number of occurrences. In practice, the structure pinyin + number of occurrences can also be used. The choice depends on the effect the user wants to achieve and is not limited here.
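The two input structures described above can be parsed with a small routine. The sketch below is an assumption-laden illustration: the whitespace-separated textual format, the `PinyinSpec` record and the tone range 1 to 5 (5 for a neutral tone) are choices made here, not details given in the source.

```python
import re
from typing import List, NamedTuple, Optional


class PinyinSpec(NamedTuple):
    syllable: str          # e.g. "qi"
    tone: Optional[int]    # 1-5, or None when no tone digit was given
    count: int             # requested number of occurrences


# A syllable, an optional tone digit glued to it, whitespace, then the count.
_TOKEN = re.compile(r"([a-zü]+)([1-5])?\s+(\d+)")


def parse_pinyin_spec(text: str) -> List[PinyinSpec]:
    """Parse 'qi3 2 lai2 2' (toned) or 'qi 2 lai 2' (toneless) into records."""
    specs = []
    for syllable, tone, count in _TOKEN.findall(text.lower()):
        specs.append(PinyinSpec(syllable, int(tone) if tone else None, int(count)))
    return specs
```

For instance, `parse_pinyin_spec("qi3 2 lai2 2")` yields two records with tones 3 and 2, while `parse_pinyin_spec("qi 4")` yields one record whose `tone` is `None`.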

By way of example, several cases of generating a recording manuscript from custom pinyin input are as follows:

(1) Several pinyin appear multiple times:

Suppose the custom input pinyin and their numbers of occurrences are: dong 1 wu 1 jia 1 qi 2 yue 2 san 1 you 1 lai 2 xiao 1. Sorting this pinyin combination sequence gives qi 2 lai 2 yue 2 jia 1 san 1 you 1 dong 1 wu 1 xiao 1. Then, for each pinyin, the first character in order among the characters with that pinyin in the General Standard Chinese Character Table is taken as that pinyin's dedicated character, and the sequence is converted into the Chinese character combination sequence: 企2来2月2家1三1游1动1武1效1;

(2) Each pinyin appears only once:

The pinyin and their numbers of occurrences are: qi3 1 lai2 1 yue4 1 jia1 1 san1 1 you2 1 dong4 1 wu3 1 xiao4 1. The converted text input to the model is: 企1来1月1家1三1游1动1武1效1. The text output by the text generation model is: 三月五栋加油笑起来. The text of the recording manuscript after error correction is: 三月五栋加油笑起来.

(3) A single pinyin appears multiple times:

The pinyin and its number of occurrences are: qi3 4. The converted text input to the model is: 企4. The text output by the text generation model is: 起起起起. The text of the recording manuscript after error correction is: 起起起起.

(4) Every pinyin appears multiple times:

The custom input pinyin and their numbers of occurrences are: qi3 4 lai2 4 yue4 4. The converted text input to the model is: 企4来4月4. The text output by the text generation model is: 月月跃起来，起来起来跃起来. The text of the recording manuscript after error correction is: 月月跃起来，起来起来跃起来.
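Whether a generated text satisfies the requested counts, as in the cases above, can be checked mechanically. The sketch below is only an illustration: `CHAR_TO_PINYIN` is a hypothetical lookup covering just the characters of case (4), and a real check would need a full character-to-pinyin conversion that also handles characters with several readings.

```python
from collections import Counter
from typing import Dict

# Hypothetical character-to-pinyin lookup covering only the example text.
CHAR_TO_PINYIN: Dict[str, str] = {"月": "yue4", "跃": "yue4", "起": "qi3", "来": "lai2"}


def pinyin_counts(text: str, lookup: Dict[str, str]) -> Counter:
    """Count toned pinyin in text; characters absent from the lookup
    (punctuation, or anything the toy table lacks) are skipped."""
    return Counter(lookup[ch] for ch in text if ch in lookup)


def meets_request(text: str, requested: Dict[str, int], lookup: Dict[str, str]) -> bool:
    """True when the text contains exactly the requested pinyin,
    each exactly the requested number of times."""
    return pinyin_counts(text, lookup) == Counter(requested)
```

Applied to case (4), `meets_request("月月跃起来，起来起来跃起来", {"qi3": 4, "lai2": 4, "yue4": 4}, CHAR_TO_PINYIN)` holds, since 月 and 跃 both read yue4.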

It should be noted that, in practice, the preset mapping table can be chosen according to actual requirements or experiments; the General Standard Chinese Character Table is cited here only as an example and not as a limitation.

It should be noted that sentence segmentation methods include but are not limited to: segmentation based on punctuation marks and segmentation using a pre-trained NLP model.
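Punctuation-based segmentation, the simpler of the two options, might look like the sketch below. The set of sentence-final marks is an assumption made for this illustration; commas are deliberately excluded so that a sentence such as "加油，三月动起来，五月笑起来。" stays in one piece.

```python
import re
from typing import List

# Split right after a sentence-final mark; the exact set of marks is an assumption.
_SENTENCE_END = re.compile(r"(?<=[。！？!?；;])")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the closing punctuation with each one."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]
```

The zero-width lookbehind keeps each punctuation mark attached to its sentence; splitting on empty matches requires Python 3.7 or later.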

By way of example, the main steps for obtaining or constructing the question-answer pair corpus include:

1. Collect a large amount of text corpus across many subjects. Sources for the text corpus include but are not limited to: news, online literature, textbooks, essays and novels, and classical literature. Collection methods include but are not limited to: web crawling, open-source downloads, OCR and speech recognition.

2. Convert the text corpus into a question-answer pair corpus.

(1) Use a sentence segmentation method to divide the corpus into sentence-level text, forming a sentence list.

(2) Represent each pinyin with one dedicated Chinese character, forming a pinyin-to-character mapping table.

3. Convert each sentence in the sentence list from Chinese characters to pinyin to generate a pinyin sequence. Conversion methods include but are not limited to: conversion based on a pinyin dictionary, conversion based on statistical learning, and conversion based on a pre-trained natural language processing model.

4. Generate the question sentences of the question-answer pairs.

(1) Count the occurrences of each pinyin in the pinyin sequence produced from each sentence.

(2) Sort the pinyin in descending order of occurrence count; pinyin with equal counts are sorted in ascending order of their position in the General Standard Chinese Character Table.

(3) Following the sorted pinyin sequence, convert the pinyin into Chinese characters using the pinyin-to-character mapping table, and append after each character the number of occurrences of its pinyin, forming the question sentence of the question-answer pair corpus.

For example, the original sentence in the sentence list is: 加油，三月动起来，五月笑起来。

The generated question sentence is: 企2来2月2家1三1游1动1武1效1。

5. Generate the question-answer pair corpus.

(1) Use the sentences in the sentence list as the answer sentences.

(2) Pair the question sentence generated in the previous step with its answer sentence to form a question-answer pair.

(3) Repeat this for all sentences in the sentence list to generate the question-answer pair corpus.
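Steps 3 to 5 above can be sketched end to end for the example sentence. This is a hedged illustration: both lookup tables are hypothetical excerpts limited to the characters in the example, the order of `PINYIN_TO_CHAR` is assumed to stand in for the table order used to break ties, and a real system would use one of the character-to-pinyin conversion methods listed in step 3.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical lookups limited to the example sentence.
CHAR_TO_PINYIN: Dict[str, str] = {
    "加": "jia1", "油": "you2", "三": "san1", "月": "yue4", "动": "dong4",
    "起": "qi3", "来": "lai2", "五": "wu3", "笑": "xiao4",
}
PINYIN_TO_CHAR: Dict[str, str] = {
    "qi3": "企", "lai2": "来", "yue4": "月", "jia1": "家", "san1": "三",
    "you2": "游", "dong4": "动", "wu3": "武", "xiao4": "效",
}
TABLE_RANK = {p: i for i, p in enumerate(PINYIN_TO_CHAR)}


def question_for(sentence: str) -> str:
    """Steps 3-4: count pinyin, sort (count descending, then table order),
    and emit each dedicated character followed by its count."""
    counts = Counter(CHAR_TO_PINYIN[ch] for ch in sentence if ch in CHAR_TO_PINYIN)
    ordered = sorted(counts.items(), key=lambda pc: (-pc[1], TABLE_RANK[pc[0]]))
    return "".join(f"{PINYIN_TO_CHAR[p]}{n}" for p, n in ordered)


def build_qa_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Step 5: pair each generated question with its original (answer) sentence."""
    return [(question_for(s), s) for s in sentences]
```

Each pair holds the question first and the original sentence second, matching the roles described in step 5.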

By way of example, suppose the custom input pinyin and their numbers of occurrences are: dong4 1 wu3 1 jia1 1 qi3 2 yue4 2 san1 1 you2 1 lai2 2 xiao4 1. Sorting this pinyin combination sequence gives qi3 2 lai2 2 yue4 2 jia1 1 san1 1 you2 1 dong4 1 wu3 1 xiao4 1. Then, for each pinyin, the first character in order among the characters with that pinyin in the General Standard Chinese Character Table is taken as that pinyin's dedicated character, and the sequence is converted into the Chinese character combination sequence: 企2来2月2家1三1游1动1武1效1.

It should be noted that the text generation model can use an open-source pre-trained model as its base model, and the model parameters can be freely customized, for example set according to the amount of training corpus data or according to experience; this is not limited here.

An embodiment of the present invention provides an apparatus for generating a recording manuscript. Referring to FIG. 2, a schematic structural diagram of an apparatus 20 for generating a recording manuscript provided by an embodiment of the present invention, the apparatus includes:

a pinyin combination sequence acquisition module 21, configured to obtain a custom pinyin combination sequence, wherein the pinyin combination sequence includes a plurality of pinyin and the number of occurrences of each pinyin;

a Chinese character combination sequence acquisition module 22, configured to convert each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on a preset mapping relationship between pinyin sequences and Chinese character sequences, to obtain a Chinese character combination sequence, wherein the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each Chinese character;

an initial recording manuscript acquisition module 23, configured to input the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript; and

a target recording manuscript acquisition module 24, configured to perform error correction on the initial recording manuscript to obtain a target recording manuscript.

Compared with the prior art, the apparatus for generating a recording manuscript provided by an embodiment of the present invention can customize the input pinyin sequence according to actual requirements and generate a recording manuscript that has few characters yet contains all pinyin types, thereby reducing the recording corpus required to build a speech synthesis model and lowering the threshold for customizing a sound library.

作为其中一种可选的实施例，所述汉字组合序列获取模块22具体用于：As one of the optional embodiments, the Chinese character combination sequence acquisition module 22 is specifically used for:

基于所述预设映射表，将排序后的拼音序列转换为对应的汉字序列，并在所述汉字序列中的每一汉字后增加该汉字对应拼音的出现次数，得到汉字组合序列。Based on the preset mapping table, the sorted pinyin sequence is converted into a corresponding Chinese character sequence, and the number of occurrences of the corresponding pinyin of the Chinese character is added after each Chinese character in the Chinese character sequence to obtain a Chinese character combination sequence.

Obtain a question-answer pair corpus; wherein the question-answer pair corpus includes original sentences and question sentences;

Based on a preset mapping table, convert each original sentence in the text corpus into a corresponding pinyin sequence;
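To make the construction of question-answer pairs more concrete, the following Python sketch follows the steps recited in claim 4: split a text corpus into original sentences, convert each sentence to pinyin, count and order the pinyin, and map them back to characters with counts to form the question sentence. The toy tables, the punctuation-based sentence splitter, and the ordering rule (most frequent first, ties alphabetical) are assumptions for illustration only; the patent does not specify them here.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy tables; a real table would cover the full character set.
CHAR_TO_PINYIN: Dict[str, str] = {"你": "ni", "好": "hao", "吗": "ma", "妈": "ma"}
PINYIN_TO_CHAR: Dict[str, str] = {"ni": "你", "hao": "好", "ma": "吗"}


def split_sentences(text: str) -> List[str]:
    """Split on common Chinese sentence-ending punctuation (an assumed rule)."""
    return [s for s in re.split(r"[。！？]", text) if s]


def question_for(sentence: str) -> str:
    """Build the character-plus-count question sentence for one original sentence."""
    counts = Counter(CHAR_TO_PINYIN[ch] for ch in sentence if ch in CHAR_TO_PINYIN)
    # Assumed preset order: most frequent first, ties broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(f"{PINYIN_TO_CHAR[p]}{n}" for p, n in ordered)


def build_pairs(text: str) -> List[Tuple[str, str]]:
    """Return (question sentence, original sentence) pairs for training."""
    return [(question_for(s), s) for s in split_sentences(text)]


pairs = build_pairs("你好吗。妈妈好！")
```

Each pair places the derived question sentence first and the original sentence second, so a model trained on such pairs learns to produce a sentence from a character-and-count prompt.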

In addition, it should be noted that the specific implementations and beneficial effects of the embodiments of the recording manuscript generation apparatus provided by the embodiment of the present invention correspond to, and are the same as, those of the embodiments of the recording manuscript generation method provided by the embodiment of the present invention, and are not repeated here.

An embodiment of the present invention provides a terminal device. FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention. The terminal device 3 of this embodiment includes a processor 30, a memory 31, and a computer program stored in the memory 31 and executable on the processor 30. When the processor 30 executes the computer program, the recording manuscript generation method described in any of the foregoing embodiments is implemented. Alternatively, when the processor 30 executes the computer program, the functions of the modules in the foregoing apparatus embodiments are implemented.

Exemplarily, the computer program may be divided into one or more modules, which are stored in the memory 31 and executed by the processor 30 to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program in the terminal device 3.

The terminal device 3 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device 3 may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will understand that the schematic diagram is only an example of a terminal device and does not limit the terminal device, which may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device 3 may further include input and output devices, a network access device, a bus, and the like.

The processor 30 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 30 is the control center of the terminal device 3 and connects the various parts of the entire terminal device 3 through various interfaces and lines.

The memory 31 may be used to store the computer program and/or modules. The processor 30 implements the various functions of the terminal device 3 by running or executing the computer program and/or modules stored in the memory 31 and by calling data stored in the memory 31. The memory 31 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory 31 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.

If the modules integrated in the terminal device 3 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by the processor 30, may implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be appropriately expanded or reduced according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

It should be noted that the apparatus embodiments described above are only schematic. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present invention, a connection between modules indicates that they are communicatively connected, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.

An embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium includes a stored computer program, wherein, when the computer program runs, the device in which the computer-readable storage medium is located is controlled to execute the recording manuscript generation method described above.

Those skilled in the art will understand that the modules of the apparatus in an embodiment can be adaptively changed and arranged in one or more apparatuses different from that embodiment. The modules or units of an embodiment may be combined into one module or unit, and may furthermore be divided into multiple sub-modules or sub-units. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may devise alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims of the present invention, any of the claimed embodiments may be used in any combination.

The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims

1. A recording manuscript generation method, characterized by comprising:

Obtain a custom pinyin combination sequence; wherein, the pinyin combination sequence includes a plurality of pinyin and the number of times each of the pinyin appears;

Based on the preset mapping relationship between the pinyin sequence and the Chinese character sequence, each of the pinyin in the pinyin combination sequence is converted into a corresponding Chinese character to obtain a Chinese character combination sequence; wherein, the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each said Chinese character;

The Chinese character combination sequence is input into a trained text generation model to obtain an initial recording manuscript;

Error correction is performed on the initial recording manuscript to obtain a target recording manuscript.

2. The recording manuscript generation method according to claim 1, characterized in that converting each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on the preset mapping relationship between the pinyin sequence and the Chinese character sequence, to obtain the Chinese character combination sequence comprises:

Based on the pinyin arrangement order in the preset mapping table, sorting each pinyin in the pinyin combination sequence according to the preset order to obtain a sorted pinyin sequence;

Based on the preset mapping table, the sorted pinyin sequence is converted into a corresponding Chinese character sequence, and the number of occurrences of the pinyin corresponding to each Chinese character is appended after that character in the Chinese character sequence, to obtain the Chinese character combination sequence.

3. The recording manuscript generation method according to claim 1, characterized in that the text generation model is trained in the following manner:

Obtaining a question-and-answer pair corpus; wherein, the question-answer pair corpus includes an original sentence and a question sentence;

The question-and-answer pair corpus is input into a preset text generation model, and the text generation model is trained to obtain a trained text generation model.

4. The recording manuscript generation method according to claim 3, characterized in that the question-answer pair corpus is obtained in the following manner:

Obtaining a text corpus, and dividing the text corpus into a plurality of original sentences according to a preset sentence segmentation method;

Based on the preset mapping table, each of the original sentences in the text corpus is converted into a corresponding pinyin sequence;

Counting the number of occurrences of each pinyin in each pinyin sequence, and arranging the pinyin in each pinyin sequence in a preset order according to the number of occurrences, to obtain a sorted pinyin sequence;

Based on the preset mapping table, the sorted pinyin sequence is converted into a corresponding Chinese character sequence, and the number of occurrences of the pinyin corresponding to each Chinese character is appended after that character in the Chinese character sequence, to obtain the question sentence corresponding to each original sentence;

Each original sentence and its corresponding question sentence are formed into a set of question-answer pairs, and multiple sets of question-answer pairs are obtained.

5. The recording manuscript generation method according to claim 1, characterized in that the pinyin combination sequence further comprises the pitch of each pinyin;

Then, converting each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on the preset mapping relationship between the pinyin sequence and the Chinese character sequence, to obtain the Chinese character combination sequence comprises:

Based on the preset mapping table and according to the pitch of each pinyin, the sorted pinyin sequence is converted into a corresponding Chinese character sequence, and the number of occurrences of the pinyin corresponding to each Chinese character is appended after that character in the Chinese character sequence, to obtain the Chinese character combination sequence.

6. The recording manuscript generation method according to claim 1, wherein the preset mapping table comprises a general standard Chinese character table.

7. The recording manuscript generation method according to claim 1, wherein the text generation model comprises GPT, GPT2, GPT3, LaserTagger, and LSTM.

8. A recording manuscript generation apparatus, characterized by comprising:

A pinyin combination sequence acquisition module, configured to acquire a custom pinyin combination sequence; wherein the pinyin combination sequence includes a plurality of pinyin and the number of occurrences of each pinyin;

A Chinese character combination sequence acquisition module, configured to convert each pinyin in the pinyin combination sequence into a corresponding Chinese character, based on the preset mapping relationship between the pinyin sequence and the Chinese character sequence, to obtain a Chinese character combination sequence; wherein the Chinese character combination sequence includes a plurality of Chinese characters and the number of occurrences of each Chinese character;

An initial recording manuscript acquisition module, configured to input the Chinese character combination sequence into a trained text generation model to obtain an initial recording manuscript;

A target recording manuscript acquisition module, configured to perform error correction on the initial recording manuscript to obtain a target recording manuscript.

9. A terminal device, characterized by comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the recording manuscript generation method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein, when the computer program is run, the device in which the computer-readable storage medium is located is controlled to perform the recording manuscript generation method according to any one of claims 1 to 7.