Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. It will be apparent that the described embodiments are only some, rather than all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of protection of the application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "plurality" generally means at least two, but does not exclude the case of at least one.
The words "if", as used herein, may be interpreted as "at" or "when" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of additional identical elements in the product or system comprising that element.
Although some technical solutions take the context into account, the text obtained by simply recognizing the context may deviate considerably from what was actually said. If characters obtained with such deviations are used for correction, the correction is affected by the sound of the wrong character, the meaning of the wrong character, and the meaning of the words the wrong character forms, which is not conducive to correction. Therefore, the solution of the present application combines pinyin with the context, so as to avoid the adverse effect of incorrect characters, and of the words formed by incorrect characters, on the correction, and the result obtained by correcting on the basis of pinyin is closer to the essence of the original audio.
In addition, existing correction schemes generally adopt an encoding-decoding network technology, in which only one corrected character can be output in each round of correction calculation, and the previous correction result is used as the input of the next round, so that the correction results for all characters are obtained only after repeated iterations. Take the sentence "I want to manage my finances", whose pinyin sequence is "wo yao li cai", as an example: the character for "wo" (I) is generated from the encoded information of "wo yao li cai"; the character for "yao" (want) is generated from the encoded information of "wo yao li cai" together with the character already generated for "wo"; the character for "li" is generated from the encoded information of "wo yao li cai" together with the characters generated for "wo yao"; the character for "cai" is generated from the encoded information of "wo yao li cai" together with the characters generated for "wo yao li"; and finally the whole corrected sentence is output. The correction result is thus obtained only after four rounds of iterative correction calculation. Such a correction method has a low processing speed and low efficiency, and requires a large amount of calculation to obtain the correction result. In order to solve the above problems, the inventors have proposed the present application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application. The execution subject of the method may be a server device, or a server device cooperating with a client device. The method specifically comprises the following steps:
101, converting at least part of the words in a text into pinyin to obtain a converted character sequence.
102, processing the character sequence by using a text correction model to obtain corrected text, wherein the text correction model is obtained by training with training samples, the training samples comprise a plurality of character sequence samples, and at least part of the character sequence samples are pinyin.
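As a minimal sketch of this two-step flow (all names below are illustrative placeholders, and the stand-in "model" simply returns a fixed string in place of a trained text correction model):

```python
# Minimal sketch of steps 101 and 102 (hypothetical helper names).
def correct_text(tokens, convert_to_pinyin, model):
    # Step 101: obtain the converted character sequence (characters + pinyin).
    char_sequence = convert_to_pinyin(tokens)
    # Step 102: a single pass of the text correction model returns the
    # corrected text; no round-by-round decoding is required.
    return model(char_sequence)

# Stand-ins, for illustration only:
convert = lambda toks: [t if t != "一套" else "yi tao" for t in toks]
model = lambda seq: "请帮我推荐一套理财产品"
print(correct_text(["请", "帮我", "推荐", "一套", "理财产品"], convert, model))
```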
The character sequence here may be a character sequence composed entirely of pinyin, or a character sequence composed of characters, words, sentences and pinyin. In the converted character sequence, the order of the characters, words, sentences and pinyin corresponds exactly to their order in the text. If the text is obtained by speech recognition, the text should stay as close as possible to the speech content, without filtering out or deleting words that might be considered meaningless, because the recognition result cannot guarantee that a filtered or deleted word really is meaningless; a word might be deleted by mistake due to a recognition error, so that it can no longer be corrected, which would affect the true meaning the text is intended to express.
For example, a user speaks the sentence "Hello, please recommend a set of financial products for me" to a terminal device with a speech recognition function (such as a counter robot). The recognition capability of the terminal device may be limited, so the terminal device sends the recognized text to the server, where the server converts the undetermined words into pinyin; for example, "a set" is converted into the pinyin "yi tao". The resulting character sequence is then "Hello, please recommend yi tao financial products for me". Alternatively, all the words can be converted into pinyin, and the corresponding character sequence is "nin hao, qing bang wo tui jian yi tao li cai chan pin ba". The character sequence may also consist of sentences and pinyin, such as "nin hao, please recommend a set of financial products for me".
After the character sequence is obtained, it is used as the input of the text correction model, so that the character sequence is corrected by the text correction model and the corrected text is obtained. In practical applications, the whole character sequence corresponding to the text can be used as input, for example "Hello, please recommend yi tao financial products for me". The character sequence may also be decomposed, for example by combining the pinyin with one or more adjacent words on either side to obtain a sub-sequence, such as "recommend yi tao financial" or "recommend yi tao".
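A minimal sketch of such a decomposition (the window size and helper names are illustrative assumptions, not values fixed by the application):

```python
def sub_sequences(tokens, is_pinyin, window=1):
    """For each pinyin token, keep it together with up to `window`
    neighbouring tokens on each side, forming a shorter sub-sequence."""
    subs = []
    for i, tok in enumerate(tokens):
        if is_pinyin(tok):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            subs.append(tokens[lo:hi])
    return subs

print(sub_sequences(["please", "recommend", "yi tao", "financial", "products"],
                    is_pinyin=lambda t: t == "yi tao"))
# -> [['recommend', 'yi tao', 'financial']]
```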
In practical applications, in the process of converting speech into text, wrong characters recognized with the same pinyin and the same tone, wrong characters with the same pinyin but different tones, and wrong characters with confusable sounds may occur. To avoid the adverse effect of these problems on the correction, a variety of pinyin representations may be employed, including, for example, at least one of a toned pinyin representation, a toneless pinyin representation, and a first-initial pinyin representation.
For example, Fig. 2 is a table of pinyin representations provided in an embodiment of the present application. As can be seen from Fig. 2, the actual content of the speech to be recognized is "I want to withdraw money", and converting the recognized text into pinyin may give the toned pinyin representation "wo3, yao4, qu3, kuan3", the toneless pinyin representation "wo, yao, qu, kuan", and the first-initial pinyin representation "w, y, q, k". It should be noted that the pinyin representation should be kept consistent between training and actual use: if only training samples in the toned pinyin representation are used to train the text correction model, then at inference time the pinyin in the input character sequence should also use the toned representation; if training samples in the toned, toneless and first-initial representations are used to train the text correction model, then the pinyin in the input character sequence should likewise use the toned, toneless and first-initial representations.
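The three representations can be derived from a single toned-pinyin table, as in the sketch below (the tiny table and helper names are illustrative assumptions; in practice a complete pinyin dictionary or conversion library would be used):

```python
# Toy toned-pinyin table for the characters of "I want to withdraw money".
TONED = {"我": "wo3", "要": "yao4", "取": "qu3", "款": "kuan3"}

def toned(ch):
    return TONED[ch]                        # e.g. "wo3"

def toneless(ch):
    return TONED[ch].rstrip("12345")        # strip the tone digit -> "wo"

def first_initial(ch):
    return TONED[ch][0]                     # first letter -> "w"

sentence = "我要取款"
print([toned(c) for c in sentence])         # ['wo3', 'yao4', 'qu3', 'kuan3']
print([toneless(c) for c in sentence])      # ['wo', 'yao', 'qu', 'kuan']
print([first_initial(c) for c in sentence]) # ['w', 'y', 'q', 'k']
```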
Fig. 3 is a flowchart of a pinyin conversion method according to an embodiment of the present application. As shown in Fig. 3, converting at least part of the words in a text into pinyin to obtain a converted character sequence includes the following steps:
301, acquiring a word stock.
302, determining non-word-stock words in the text based on the word stock.
303, converting the non-word-stock words in the text into pinyin to obtain the character sequence.
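A minimal sketch of steps 301 to 303 (the word-stock contents and the toy pinyin table are illustrative assumptions):

```python
WORD_STOCK = {"请", "帮我", "推荐", "理财产品"}   # 301: acquire a word stock
PINYIN = {"一套": "yi tao"}                       # toy pinyin table, for illustration

def to_character_sequence(tokens):
    sequence = []
    for tok in tokens:
        if tok in WORD_STOCK:                     # word-stock word: keep as characters
            sequence.append(tok)
        else:                                     # 302: non-word-stock word
            sequence.append(PINYIN.get(tok, tok)) # 303: convert it to pinyin
    return sequence

print(to_character_sequence(["请", "帮我", "推荐", "一套", "理财产品"]))
# -> ['请', '帮我', '推荐', 'yi tao', '理财产品']
```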
The term "as used herein refers to a term" used to help correct recognition results, and the terms included in the term "may be set according to actual application requirements. For example, if the application scene is a hospital, the corresponding word stock contains related words (such as gastroscope) of medical professional terms, and if the application scene is a bank, the corresponding word stock contains related words (such as deposit and withdrawal, financial management and the like, for example, account opening). The word stock may be updated over time, and the updated word stock may contain some of the latest popular words. Of course, the word stock can also be opened for the user, the user can optimize the word stock (for example, delete or add some words) according to the use habit of the user, so that the word stock is better suitable for the correction requirement of the user, for example, the user pronounces nonstandard, dialect tones or words exist, and after the user optimizes the word stock according to the own requirement, the word stock can be utilized to correct the words corresponding to the voice quickly and accurately during error correction.
As an alternative, which word stock is selected in the text correction process may be customized according to the actual application scenario or specified by the user. In addition, after receiving the text, the server may infer, from the words already determined in the text, the application scenario or the word stock that the current sentence to be corrected may relate to, and then recommend the word stock of at least one scenario according to the inference result. In this way, the applicable word stock can be located quickly, which improves the processing speed and also reduces the workload of personnel.
In practical applications, after a text is obtained through speech recognition, the words or characters in the text are checked one by one against the word-stock words in the word stock. If a word is in the word stock, the characters corresponding to that word are determined; if a word is not in the word stock, i.e. it cannot be found in the word stock, it is determined to be a non-word-stock word. The non-word-stock words are then converted into pinyin; the pinyin can have various representations, which have been illustrated in the foregoing embodiments, so the details are not repeated here and reference may be made to the foregoing embodiments. With the help of the word stock, the words needing correction can be determined more accurately.
In the above embodiment, the character sequence is generated by first performing speech recognition to obtain a text consisting entirely of characters, and then converting part of the words in the text (non-word-stock words that do not belong to the word stock) into pinyin according to the word stock. Alternatively, during the speech recognition process, words that cannot be recognized can be converted into pinyin directly from the audio.
In one or more embodiments of the present application, the text correction model is the coding network model of a sequence-to-sequence neural network model. Specifically, the coding network model may be a convolutional neural network, a recurrent neural network, a long short-term memory network, or a Transformer network.
Because the text correction model is a coding network model, only an encoding process is used and no decoding process is needed, which greatly reduces the amount of calculation, effectively improves the calculation efficiency, allows the correction to be completed more quickly, and makes server downtime less likely. When input is fed to the coding network model, the pinyin and the related (i.e. adjacent) words are input together, and the calculation fully takes into account the feature vectors of the pinyin context. The character corresponding to each pinyin is output independently, i.e. the output of a later character does not need to iterate on an earlier character, so the processing speed of the coding network model is faster; in particular, when the input character sequence is long, the speed advantage over processing schemes with output iteration or a decoding process is even more obvious.
In addition, the character sequence input to the text correction model may contain certain determined characters, which are input into the model without needing correction. For example, the characters in the text to be corrected are looked up in the word stock; if a word is found in the word stock, it is determined to be a word-stock word, and the word-stock words and the pinyin are assembled into a character sequence according to the context of the text to be corrected. The pinyin positions are output according to the correction result, while the word-stock words adjacent to or associated with the pinyin are output directly, giving the corrected text. For example, if the pinyin is located at a first position in the text to be corrected and a word-stock word is located at a second position adjacent to the first position, then after the character sequence is corrected by the text correction model, the characters corresponding to the pinyin are output at the first position and the word-stock word is output at the second position. Because the word-stock words are already determined, the text correction model does not need to perform the related calculation or processing on them; they are only used to help determine the text corresponding to the pinyin. This improves the correction effect without increasing the calculation load of the text correction model or reducing the processing efficiency.
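A minimal sketch of this output rule, in which word-stock positions are copied through and pinyin positions take the model's prediction (all names are illustrative):

```python
def assemble_output(char_sequence, is_pinyin, predictions):
    """Word-stock words pass through unchanged; pinyin positions are
    replaced by the characters predicted by the text correction model."""
    return [predictions[i] if is_pinyin(tok) else tok
            for i, tok in enumerate(char_sequence)]

print(assemble_output(["推荐", "yi tao", "理财产品"],
                      is_pinyin=lambda t: t.islower(),
                      predictions={1: "一套"}))
# -> ['推荐', '一套', '理财产品']
```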
Based on the above embodiments, in a speech recognition application, the text to be corrected is obtained from the speech recognition result, and this text may contain characters that cannot be accurately determined, so that part of the words in the text can be converted into pinyin to obtain a character sequence containing pinyin. The character sequence is then corrected by a pre-trained text correction model to obtain the corrected characters corresponding to each pinyin. With this technical solution, the words that cannot be accurately recognized are converted into pinyin and corrected by using the pinyin-containing sequence and the pre-trained text correction model. The text correction model can output the correction results corresponding to all the pinyin simultaneously, so that higher correction efficiency and a faster correction processing speed are obtained, and the computational burden on the device during correction is effectively reduced.
The embodiment of the application also provides a text correction model. Fig. 4 is a schematic structural diagram of a text correction model according to an embodiment of the present application. As can be seen from Fig. 4, the model includes an input layer for receiving a character sequence, containing pinyin, that corresponds to a text, the character sequence including at least one character string representing a single character or word;
at least one intermediate layer for calculating, according to the order of the character strings in the character sequence, the character corresponding to a first position based on the feature vector corresponding to the character string at the first position and the feature vectors corresponding to the character strings at at least one adjacent position adjacent to the first position;
and the output layer is used for outputting the characters at the first position.
The text needs to be converted into a character sequence before the character sequence is input to the input layer of the model. The related technical solutions for obtaining the text from speech and converting the text into a character sequence have been explained in the foregoing embodiments; reference may be made thereto, and the details are not repeated here. The present embodiment focuses on explaining the operation of the text correction model.
After the input layer receives the input character sequence, the first position corresponding to a pinyin character string in the character sequence and the character strings adjacent to the first position are determined. The pinyin character strings in the character sequence are passed to the intermediate layer and converted into feature vectors, and the character strings at the adjacent positions are converted into feature vectors at the same time. The character at the first position is then calculated on the basis of these feature vectors. Once the characters are determined, they are output directly by the output layer, and the characters output by the output layer are independent of one another. The characters referred to here may be Chinese characters, Arabic numerals, Greek letters (e.g. α, β), and so on. When outputting, the positions of the characters are consistent with their positions at input, i.e. the character obtained by correcting the pinyin at the first position is still placed at the first position, and the characters adjacent to it before and after are unchanged. For example, as shown in Fig. 4, assuming the input character sequence is (x1, x2, x3, x4), it is input into the coding network model and, after processing by the intermediate layer, the mutually independent (y1, y2, y3, y4) are obtained directly. Since there is no iterative relationship between the outputs, simultaneous output is guaranteed, the processing efficiency is improved, and the calculation burden of the computer is reduced, avoiding downtime problems. The intermediate layer referred to here may be implemented based on a self-attention mechanism.
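A minimal sketch of such an encoder-only correction model, written in PyTorch purely as an illustration (the vocabulary size, layer sizes and layer counts are assumptions, not values given by the application):

```python
import torch
import torch.nn as nn

class TextCorrectionEncoder(nn.Module):
    """Encoder-only model: every position is predicted in parallel,
    with no decoding iterations between output characters."""
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2, max_len=128):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)        # character/pinyin-string embedding
        self.pos_emb = nn.Embedding(max_len, d_model)              # position vector
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)    # self-attention intermediate layers
        self.out = nn.Linear(d_model, vocab_size)                  # per-position output layer

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) ids of the character strings x1..xn
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.token_emb(token_ids) + self.pos_emb(positions)    # sum of token and position vectors
        h = self.encoder(h)
        return self.out(h)   # (batch, seq_len, vocab): logits for every position at once

logits = TextCorrectionEncoder(vocab_size=5000)(torch.randint(0, 5000, (1, 4)))
print(logits.argmax(dim=-1))  # y1..y4 obtained simultaneously, no output iteration
```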
In one or more embodiments of the present application, the character sequence includes word-stock words determined based on a word stock, and the output layer is further configured to, when the character string at a second position is a word-stock word, output that word-stock word as the text at the second position.
In practical applications, after the text is obtained by speech recognition, the words in the text can be checked against a word stock: words in the word stock are called word-stock words, and words not in the word stock are called non-word-stock words. Because the non-word-stock words are not in the word stock, their real content cannot be determined, so the characters corresponding to the non-word-stock words, expressed in pinyin, need to be determined with the help of the determined word-stock words. Since the determined word-stock words in the character sequence no longer need to be corrected, the word-stock words and the pinyin character strings are input into the text correction model together so that the pinyin character strings can be corrected more accurately. Accordingly, at output time, the character string at the second position in the character sequence is output directly. With this scheme, the word-stock words are output as the original words, which avoids excessive occupation of computing resources and improves the working efficiency of the text correction model.
In one or more embodiments of the present application, when calculating the character corresponding to the first position according to the feature vector corresponding to the character string at the first position and the feature vectors corresponding to the character strings at the at least one adjacent position adjacent to the first position, the at least one intermediate layer is specifically configured to calculate a plurality of candidate characters corresponding to the first position and the probability of each candidate, and to select the candidate character with the highest probability as the character corresponding to the first position.
Specifically, the intermediate layer is used to calculate a first feature vector for the character string at the first position and a second feature vector for the character string at the second position. The first position and the second position are adjacent positions; in practice, the pinyin character string at the first position may also be corrected in combination with the character string at a third position for a better correction effect. Because the positional relationship needs to be considered in the calculation, the position vector corresponding to each character string is determined by the intermediate layer: for example, the first feature vector corresponding to the pinyin character string is summed with its position vector to obtain the feature vector of the character string at the first position, and the second feature vector corresponding to the word-stock word string is summed with its position vector to obtain the feature vector of the character string at the second position. The degree of association between the two feature vectors is then calculated, and the probability that the pinyin at the first position corresponds to each candidate non-word-stock word is determined.
The probability is compared with a threshold value. If the probability is greater than the threshold value, the word with the highest probability is selected as the corrected non-word-stock word. If the probability is smaller than the threshold value, the correction result of the intermediate layer is discarded and the pinyin at the first position is not corrected; for example, the characters in the text generated directly from the speech recognition result can be filled in at the first position, and the corrected text is then generated.
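A minimal sketch of this candidate selection with a threshold fallback (the threshold value and the names are illustrative assumptions):

```python
def pick_character(candidates, fallback, threshold=0.5):
    """candidates maps each candidate word to its probability. If the best
    probability clears the threshold, use that word; otherwise keep the
    characters taken directly from the raw speech recognition text."""
    best_word, best_prob = max(candidates.items(), key=lambda kv: kv[1])
    return best_word if best_prob > threshold else fallback

print(pick_character({"一套": 0.83, "衣涛": 0.04}, fallback="衣涛"))  # -> "一套"
print(pick_character({"一套": 0.21, "衣涛": 0.18}, fallback="衣涛"))  # -> "衣涛"
```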
Based on the above embodiment, when a character sequence is input to the text correction model for correction, the contextual associations of the pinyin within the character sequence are fully considered, and at output time the corrected characters are output independently of one another. That is, simultaneous input and simultaneous output can be achieved, and the output efficiency is higher.
Fig. 5 is a schematic flow chart of a text correction model training method according to an embodiment of the present application. As can be seen from fig. 5, the method specifically comprises the following steps:
501, obtaining a character sequence sample containing pinyin.
502, training a text correction model by using the character sequence samples to obtain an output result.
503, optimizing the text correction model based on the output result.
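A minimal sketch of steps 501 to 503 as a standard supervised training loop, written in PyTorch purely as an illustration (the data pipeline and hyperparameters are assumptions):

```python
import torch
import torch.nn as nn

def train(model, data_loader, epochs=3, lr=1e-4):
    """Each batch pairs a character sequence sample containing pinyin
    (input ids) with the correct character sequence (target ids)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, target_ids in data_loader:      # 501: character sequence samples
            logits = model(input_ids)                   # 502: output result for every position
            loss = loss_fn(logits.transpose(1, 2), target_ids)
            optimizer.zero_grad()
            loss.backward()                             # 503: optimize the model
            optimizer.step()
    return model
```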
In order for the text correction model to have a good correction capability, the actual situation is fully considered when the training samples are produced. The approaches include: obtaining character sequence samples containing only pinyin; obtaining character sequence samples containing both pinyin and Chinese characters; modifying the tones of the pinyin in an existing character sequence sample to obtain a new character sequence sample; modifying the initials of the pinyin in an existing character sequence sample to obtain a new character sequence sample; modifying the finals of the pinyin in an existing character sequence sample to obtain a new character sequence sample; and modifying one of two adjacent pinyin in an existing character sequence sample to obtain a new character sequence sample.
When the training sample data is generated, some of the original words in the sentence are randomly selected to be kept and not replaced by pinyin, and a context sentence is added with a certain probability, forming a combined sequence of Chinese characters and pinyin. For example, Fig. 6 is a table of training samples provided by an embodiment of the present application. The data-generation example shown in Fig. 6 is based on the sentence "which financial product has a high return": the character sequence sample containing only pinyin is "li3 cai2 chan3 pin3 na3 ge4 shou1 yi4 gao1"; the character sequence sample containing pinyin and Chinese characters is "financial product na3 ge4 shou1 yi4 gao1"; a new sample obtained by modifying a tone of the pinyin in an existing sample changes the tone of one syllable; a new sample obtained by modifying an initial is, for example, "li3 cai2 chan3 pin3 la3 ge4 shou1 yi4 gao1"; and a new sample obtained by modifying a final changes the final of one syllable.
In addition, the pinyin in the generated corpus needs to be modified to cover special situations: (1) pinyin input that is exactly the same as the pinyin of the original sentence, to address recognition errors between characters with the same pinyin and the same tone, e.g. "wu3 zhong3"; (2) pinyin input that differs from the original sentence pinyin only in tone, to address recognition errors between characters with the same pinyin but different tones; (3) pinyin input that is a confusable sound of the original sentence pinyin, to address recognition errors caused by confusable sounds, including confusion of initials (e.g. "li3 cai2" for financial management confused with "ni3 cai1") and confusion of finals (e.g. "wai4 hui4" for foreign exchange confused with "wan4 hui"); and (4) arbitrary input, to improve the robustness of the model, e.g. "xiao4 hua4". In actual training, the above four modes of replacement are applied in the ratio 5:2:2:1. A sentence can be processed repeatedly, selecting different words for replacement each time, so that multiple different pieces of data are obtained.
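A minimal sketch of applying these four replacement modes in the 5:2:2:1 ratio (the confusion table and the arbitrary-input pool below are toy examples, not data from the application):

```python
import random

MODES = ["same", "tone", "confusable", "random"]   # the four replacement modes
WEIGHTS = [5, 2, 2, 1]                             # applied in the ratio 5:2:2:1
CONFUSABLE = {"li3": "ni3", "wai4": "wan4"}        # toy initial/final confusion table

def replace_pinyin(pinyin):
    mode = random.choices(MODES, weights=WEIGHTS, k=1)[0]
    if mode == "same":                             # identical pinyin input
        return pinyin
    if mode == "tone":                             # change only the tone digit
        return pinyin[:-1] + str(int(pinyin[-1]) % 4 + 1)
    if mode == "confusable":                       # swap in a confusable sound
        return CONFUSABLE.get(pinyin, pinyin)
    return random.choice(["xiao4", "hua4"])        # arbitrary input for robustness

print([replace_pinyin(p) for p in ["li3", "cai2", "chan3", "pin3"]])
```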
It should be noted that, when the pinyin is constructed, a plurality of pinyin representations may be constructed, including at least one of the toned pinyin representation, the toneless pinyin representation and the first-initial pinyin representation. Specifically:
When the data is constructed, each sentence in the corpus is first converted from a text sequence into a corresponding character sequence containing pinyin character strings, which serves as the input of the text correction model. In speech recognition text, wrong characters with the same pinyin and the same tone, wrong characters with the same pinyin but different tones, and wrong characters with confusable sounds may occur. Therefore, according to the different characteristics of pinyin, three different pinyin sequence representations can be designed, namely the toneless pinyin, the toned pinyin and the first-initial pinyin. The three representations can be used as pinyin feature input either individually or in combination. Reference may be made to the embodiment corresponding to Fig. 2; the details are not repeated here.
The embodiment of the application also provides a robot. The robot includes:
a machine body;
an advancing assembly arranged on the machine body and used for providing power for the machine body so that the machine body can move autonomously in the place where it is located;
a storage component in which a voice recognition module and a text correction module are provided;
the voice recognition module is used for converting received voice into text; and
the text correction module is used for converting at least part of the words in the text into pinyin to obtain a converted character sequence, and processing the character sequence by using a text correction model to obtain corrected text, wherein the text correction model is obtained by training with training samples, the training samples comprise a plurality of character sequence samples, and at least part of the character sequence samples are pinyin.
In addition, the robot can be further provided with a communication module, and the communication module is used for sending the text obtained by voice recognition to the server, and the server corrects the text by using a text correction model.
For example, Fig. 7a is a schematic diagram of an application scenario of a robot according to an embodiment of the present application. In public places, more and more scenarios are beginning to introduce robots to provide relevant services to customers: in a bank business hall, to guide customers in completing business such as opening accounts and making deposits; at a hospital registration desk or in a hospital hall, to guide diagnosis and treatment work such as patient appointments and consultations; in a shopping mall, to guide users in completing shopping activities such as selecting goods and settlement; and there are also goods-handling robots in warehouses, large-scale floor-sweeping robots in public places, and the like. Robots in these application scenarios need to face different customers, who may come from various regions of the country as well as from abroad, and there are significant differences in the customers' pronunciation during voice interaction. Nonstandard pronunciation directly affects the speech recognition result, and recognition errors inevitably occur. Therefore, for the robot to complete the voice interaction well, the text corresponding to the speech recognition result needs to be further corrected. For ease of understanding, a banking application scenario is taken as a specific example. Suppose a customer wants to select a financial product; the customer can actively look for the teller robot after arriving in the hall, or the teller robot can actively approach the customer after sensing that the customer has entered the hall. The customer speaks the sentence "which of the financial products has a high return" to the teller robot in the bank business hall. The teller robot recognizes a text after receiving the speech and sends the text to the server, and the server corrects the text (e.g. "financial product na ge shou yi gao") by using the text correction model to obtain the corrected text information (e.g. "which of the financial products has a high return"). The corrected text information is returned to the teller robot, which can then correctly understand what the customer said and recommend at least one financial product to the customer in order of return. In a specific application, at least one financial product can also be recommended to the customer according to the corrected speech and the customer's purchase history of financial products. In these application scenarios, not only does the robot face a more complex communication environment, but the volume of communication is also larger and more text content needs to be corrected. Therefore, the technical solution of the application can not only improve the recognition result but also effectively improve the correction processing speed and efficiency.
For another example, Fig. 7b is a schematic diagram of another robot application scenario provided in an embodiment of the present application. As can be seen from Fig. 7b, the robot is communicatively connected to the server. Suppose the robot is a floor-sweeping robot, and a user who is lying in bed speaks a cleaning instruction to it. After receiving the voice, the robot recognizes it to obtain a text and sends the text to the server; the server corrects the text by using the text correction model to obtain the correct text information and returns it to the robot, so that the robot correctly understands what the user said and executes the task required by the user.
Of course, in some smart home scenarios, the voice command may not be sent directly to the robot. For example, the user may speak the voice command to a mobile phone client, the client sends the speech recognition result to the server, the server corrects the text by using the text correction model to obtain the correct text information and returns it to the robot, and the robot then correctly understands what the user said and executes the task required by the user.
In addition, the text correction scheme in this embodiment can be widely applied to various voice interaction scenarios. For example, intelligent electrical appliances such as smart speakers can correct the text of the speech recognition result during voice interaction, and unmanned-driving applications can correct the text generated during voice interaction, and so on. Other application software can likewise further correct the text corresponding to the speech recognition result after performing speech recognition. The text correction scheme in this embodiment can also be applied to correcting a text file directly. For example, after a meeting recording is completed with a recording pen, the recording pen can directly export a text file; since the user cannot be sure whether the export result is accurate after obtaining the text file, the text file can be corrected using this scheme.
Fig. 8 is a schematic structural diagram of a text correction apparatus according to an embodiment of the present application. As can be seen from Fig. 8, the apparatus includes the following modules:
The conversion module 81 is configured to convert at least part of the words in the text into pinyin, and obtain a converted character sequence.
The processing module 82 is configured to process the character sequence by using a text correction model to obtain corrected text, where the text correction model is obtained by training with training samples, the training samples include a plurality of character sequence samples, and at least part of the character sequence samples are pinyin.
Further, the conversion module 81 is further configured to obtain a word stock, determine non-word stock words in the text based on the word stock, and convert the non-word stock words in the text into pinyin to obtain the character sequence.
The pinyin representation includes at least one of a toned pinyin representation, a toneless pinyin representation and a first-initial pinyin representation.
Optionally, the text correction model is a coding network model in a sequence-to-sequence neural network model.
Optionally, the coding network model is a convolutional neural network, a recurrent neural network, a long short-term memory network, or a Transformer network.
The embodiment of the application also provides an electronic device, and fig. 9 is a schematic structural diagram of the electronic device provided by the embodiment of the application. The electronic device comprises a power supply assembly 901, a memory 902 and a processor 903, wherein,
The memory 902 is configured to store a program;
the processor 903 is coupled to the memory 902, and is configured to execute the program stored in the memory 902, for:
converting at least part of words in the text into pinyin to obtain a converted character sequence;
processing the character sequence by using a text correction model to obtain corrected text;
The text correction model is obtained through training of training samples, the training samples comprise a plurality of character sequence samples, and at least part of the character sequence samples are pinyin.
In addition to the above functions, the processor 903 may realize other functions when executing the program in the memory 902, and in particular, reference may be made to the foregoing descriptions of embodiments.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, causes the one or more processors to perform the steps in the corresponding method embodiments shown in Figs. 1 to 5.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by the computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.