Disclosure of Invention
In view of the above problems, the present invention provides a vehicle control voice recognition method and device, which improve the reliability of vehicle control voice recognition and avoid misrecognition and the execution of erroneous instructions.
In a first aspect, the present application provides the following technical solutions through an embodiment:
a vehicle control voice recognition method comprises the following steps:
acquiring a first voice recognition result of a cloud; matching the first voice recognition result in a preset vehicle control word bank to obtain a matching result; the vehicle control word bank is matched with a vehicle subjected to voice control; judging whether the first voice recognition result comprises a first vehicle control type semantic or not based on the matching result; the first vehicle control type semantic is a vehicle control semantic supported by the vehicle; if yes, obtaining a second voice recognition result based on the matching result; wherein the second voice recognition result is used for voice control of the vehicle; and if not, generating prompt information based on the matching result.
Optionally, the vehicle control word bank includes a vehicle control action word bank and a vehicle control object word bank, the matching result includes an action matching result and an object matching result, and the matching of the first speech recognition result in a preset vehicle control word bank to obtain a matching result includes:
matching the first voice recognition result in the vehicle control action word bank to obtain an action matching result; and matching the first voice recognition result in the vehicle control object word bank to obtain an object matching result.
Optionally, the determining, based on the matching result, whether the first speech recognition result includes a first vehicle control type semantic includes:
if the action matching result contains the vehicle control action words in the vehicle control action word bank and the object matching result contains the vehicle control object words in the vehicle control object word bank, determining that the first voice recognition result comprises a first vehicle control type semantic; otherwise, determining that the first voice recognition result does not comprise the first vehicle control type semantic.
Optionally, the obtaining a second speech recognition result based on the matching result includes:
and modifying the first voice recognition result based on the matching result to obtain the second voice recognition result.
Optionally, the generating of the prompt information based on the matching result includes:
obtaining a semantic recognition result corresponding to the first voice recognition result; judging whether the semantic recognition result is a second vehicle control type semantic; if yes, generating, based on the semantic recognition result, first prompt information indicating that the phrasing is not supported; and if not, generating second prompt information based on the semantic recognition result.
Optionally, the generating of the first prompt information based on the semantic recognition result includes:
acquiring corresponding phrasing information based on the semantic recognition result; and generating the first prompt information based on the phrasing information.
In a second aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment:
a vehicle control voice recognition device comprising:
the acquisition module is used for acquiring a first voice recognition result of the cloud; the matching module is used for matching the first voice recognition result in a preset vehicle control word bank to obtain a matching result; the vehicle control word bank is matched with a vehicle subjected to voice control; the judging module is used for judging whether the first voice recognition result comprises a first vehicle control type semantic or not based on the matching result; the first vehicle control type semantic is a vehicle control semantic supported by the vehicle; the first processing module is used for obtaining a second voice recognition result based on the matching result if the first voice recognition result comprises the first vehicle control type semantic; wherein the second voice recognition result is used for voice control of the vehicle; and the second processing module is used for generating prompt information based on the matching result if the first voice recognition result does not comprise the first vehicle control type semantic.
Optionally, the vehicle control word bank includes a vehicle control action word bank and a vehicle control object word bank, the matching result includes an action matching result and an object matching result, and the matching module is specifically configured to:
matching the first voice recognition result in the vehicle control action word bank to obtain an action matching result; and matching the first voice recognition result in the vehicle control object word bank to obtain an object matching result.
In a third aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment:
a vehicle control voice recognition device comprising a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the vehicle control voice recognition device to perform the steps of the method of any one of the first aspects above.
In a fourth aspect, based on the same inventive concept, the present application provides the following technical solutions through an embodiment:
a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of any of the first aspects.
The vehicle control voice recognition method and device provided in the embodiment can be used for recognizing vehicle control voice, and a proprietary vehicle control word bank is adopted for semantic matching in the whole recognition process instead of directly using cloud semantic recognition. After semantic matching is carried out using the vehicle control word bank, whether the first voice recognition result contains the first vehicle control type semantic can be judged, so the real intention of the user can be recognized. Finally, a second voice recognition result for performing voice control on the vehicle is generated according to the matching result, so that reliability is improved and misrecognition and the execution of erroneous instructions are avoided.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, the vehicle-mounted speech system generally includes a microphone, a recording and denoising module, a wake-up module, automatic speech recognition (ASR), semantic understanding (NLU), a knowledge graph, speech synthesis (Text-To-Speech, TTS), and a speaker. The recording and denoising module and the wake-up module are generally integrated in the vehicle head unit. With technical progress in related fields, the ASR, NLU, and TTS may be placed locally in the head unit or in the cloud; since the local side and the cloud each have their own limitations, vehicle-mounted speech systems on the market are often cloud-local fusion schemes.
Therefore, as shown in fig. 2, in the conventional vehicle-mounted speech system, after the wake-up engine is started, the voice signal is collected and processed to form an audio file, which is divided into two paths: one path is transmitted to the cloud speech semantic recognition module (online ASR & NLU) for speech and semantic recognition, and the other path is transmitted to the local speech semantic recognition module (offline ASR & NLU) for speech and semantic recognition. Result arbitration is then carried out based on the cloud and local recognition results, the instruction is executed, and finally voice feedback is given. Of the result arbitration module and the instruction execution module, the former mainly arbitrates between the cloud and local semantic results; generally, the cloud result takes priority, the local result is used when no cloud semantic result is available, and arbitration criteria such as a timeout period are set. The latter is the key interface through which the voice system connects to the vehicle control module. As shown in fig. 3, in the voice vehicle control scheme, the "instruction execution" module of the voice system interacts with the CAN (Controller Area Network) unit of the vehicle system: the "instruction execution" module converts a semantic result into a vehicle control instruction and transmits it to the CAN unit, which converts it into a bus instruction meeting the vehicle communication specifications and safety requirements, performs logic processing and communication interaction with the other ECU (Electronic Control Unit) nodes of the vehicle, and finally feeds the state and result of executing the vehicle instruction back to the "instruction execution" module.
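The result-arbitration policy described above (cloud first, local fallback, with a timeout criterion) can be sketched as follows; the function name and timeout handling are illustrative assumptions, not the actual arbitration module's interface.

```python
def arbitrate(cloud_result, local_result, cloud_timed_out=False):
    """Result-arbitration sketch: the cloud result takes priority; the
    local result is used when the cloud result is absent or has timed
    out. The timeout flag and policy here are illustrative assumptions."""
    if cloud_result is not None and not cloud_timed_out:
        return cloud_result
    return local_result
```

A real arbiter would also weigh confidence scores and domain hints, but the cloud-first fallback shown here matches the policy described in the text.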
In the embodiment of the present invention, the architecture of the vehicle-mounted speech system may be improved by adding a new semantic recognition module (onboard VC_NLU) between the cloud speech semantic recognition module (online ASR & NLU) and the result arbitration module (Result arbiter), as shown in fig. 4. The method or device in the embodiment of the invention may be applied in this newly added semantic recognition module. It should be noted that the method or apparatus of the present invention is not limited to the above-mentioned scenario, nor only to the above-mentioned improved vehicle-mounted speech system. For example, in the example of fig. 4 the newly added semantic recognition module is located locally in the vehicle, while in other examples it may be located in the cloud; in addition, the method or apparatus of the present invention can also be integrated into other processing modules or apparatuses with processing capability, without limitation. Of course, the method and apparatus disclosed in the present invention may also be applied to other devices with control scenarios, such as intelligent robots, vessels, and various intelligent home appliances, without limitation. The method and apparatus of the present invention are described in detail below by way of example.
First embodiment
Referring to fig. 5, fig. 5 is a flowchart illustrating a vehicle control voice recognition method according to a first embodiment of the present invention, where the method includes:
step S10: and acquiring a first voice recognition result of the cloud.
In step S10, after the user wakes up the voice capture module on the vehicle, the vehicle can capture the user's voice and upload the captured voice data to the cloud. The cloud then performs voice recognition to obtain the first voice recognition result. In addition, semantic recognition can be performed on the first voice recognition result at the cloud to obtain a semantic recognition result.
Step S20: matching the first voice recognition result in a preset vehicle control word bank to obtain a matching result; and the vehicle control word bank is matched with the vehicle subjected to voice control.
In step S20, the preset vehicle control word bank is a word bank in which vehicle control actions and vehicle control objects are stored in advance. This word bank can be independent of the cloud word bank, which makes it convenient for automobile manufacturers to maintain. In practice, the cloud speech and semantic recognition engines used for speech recognition are usually provided by a third party other than the automobile manufacturer, so cloud recognition is often difficult to adapt to all vehicle types and cannot meet customization requirements for individual phrasings. In this embodiment, the automobile manufacturer can keep the latest vehicle control actions, vehicle control objects, phrasings, and the like updated in the preset vehicle control word bank, thereby ensuring the accuracy of subsequent semantic recognition. For example, each vehicle type may correspond to a specific vehicle control word bank, or each vehicle family may correspond to a specific vehicle control word bank, so that different vehicle types or families use different word banks, ensuring the accuracy of semantic recognition.
Further, the vehicle control word bank of the embodiment may include a vehicle control action word bank and a vehicle control object word bank. Correspondingly, the matching result may include an action matching result and an object matching result. At this time, step S20 may include: matching the first voice recognition result in the vehicle control action word bank to obtain an action matching result; and matching the first voice recognition result in the vehicle control object word bank to obtain an object matching result. The action matching result has two cases: 1. no corresponding vehicle control action is matched; 2. a corresponding vehicle control action is matched, such as "open", "close", "enlarge", "fast forward", and the like. The object matching result likewise has two cases: 1. no corresponding vehicle control object is matched; 2. a corresponding vehicle control object is matched, such as "air conditioner", "windshield wiper", "atmosphere lamp", "fog lamp", "360-degree look around", and the like.
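A minimal sketch of the two-word-bank matching in step S20 follows; the word-bank contents and function names are illustrative placeholders, not the manufacturer's actual lexicons.

```python
# Illustrative word banks; a real system would load the manufacturer's
# vehicle-specific lexicons rather than these placeholder entries.
ACTION_BANK = {"open", "close", "turn off", "enlarge", "fast forward"}
OBJECT_BANK = {"air conditioner", "windshield wiper", "fog lamp",
               "ambience lamp", "360-degree look around"}

def match_word_bank(text, bank):
    """Return the first word-bank entry found in the text, else None."""
    for entry in bank:
        if entry in text:
            return entry
    return None

def match_vehicle_control(first_result):
    """Match the first speech recognition result in both word banks,
    yielding the action matching result and the object matching result."""
    return {
        "action": match_word_bank(first_result, ACTION_BANK),
        "object": match_word_bank(first_result, OBJECT_BANK),
    }
```

For the utterance "open the fog lamp", this sketch yields `"open"` as the action matching result and `"fog lamp"` as the object matching result; an utterance outside the banks yields `None` for both.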
Step S30: judging whether the first voice recognition result comprises a first vehicle control type semantic or not based on the matching result; and the first vehicle control type semantic is a vehicle control semantic supported by the vehicle.
In step S30, the first vehicle control type semantic is the specific vehicle control type semantic corresponding to the vehicle, and it corresponds to the vehicle control word bank. Therefore, if the action matching result includes a vehicle control action word from the vehicle control action word bank and the object matching result includes a vehicle control object word from the vehicle control object word bank, it may be determined that the first speech recognition result includes the first vehicle control class semantic; otherwise, it is determined that the first voice recognition result does not include the first vehicle control type semantic. In the matching process, the matching result can be determined by existing natural language processing algorithms, for example semantic similarity calculation; for frequently mis-recognized words and fuzzy words, engineers can train the matching algorithm in a targeted way and tailor the vehicle control word bank to improve matching accuracy. For example, suppose the intention the user wants to express is "open the vehicle condition display interface", the cloud voice recognition result is "open the vehicle condition", and the cloud semantic recognition result is "open the vehicle window"; the cloud semantic recognition result is obviously erroneous, while recognition with the vehicle control word bank can obtain the correct result "open the vehicle condition". Even if the target result cannot be obtained, a maintenance engineer can quickly update the vehicle control word bank or the semantic analysis algorithm to obtain the correct result, without being limited by the cloud.
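The determination in step S30, together with a simple fuzzy lookup for frequently mis-recognized words, might look like the following sketch. Standard-library `difflib` string similarity stands in here for the trained semantic-similarity matching mentioned above; the names and cutoff are assumptions.

```python
import difflib

def contains_vehicle_control_semantic(match_result):
    """The first recognition result carries the first vehicle-control-class
    semantic only when BOTH an action word AND an object word matched."""
    return match_result["action"] is not None and match_result["object"] is not None

def fuzzy_lookup(phrase, bank, cutoff=0.7):
    """Tolerate near-miss recognitions with a plain string-similarity
    ratio; a production system would use a trained similarity model."""
    hits = difflib.get_close_matches(phrase, list(bank), n=1, cutoff=cutoff)
    return hits[0] if hits else None
```

For instance, a slightly garbled "fog lamps" can still be resolved to the word-bank entry "fog lamp" by the fuzzy lookup.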
Through the above recognition process, the voice recognized by the cloud undergoes semantic recognition against the vehicle-specific control word bank, so correct semantics can be obtained quickly.
Step S40: if yes, obtaining a second voice recognition result based on the matching result; wherein the second voice recognition result is used for voice control of the vehicle.
In step S40, when the first speech recognition result includes the first vehicle control class semantic, a second speech recognition result is obtained based on the matching result. The second speech recognition result determined in this embodiment has two cases: 1. when the recognized first vehicle control type semantic is the same as the semantic contained in the first voice recognition result, the first voice recognition result is taken as the second voice recognition result; 2. when the recognized first vehicle control class semantic differs from the semantic contained in the first voice recognition result, the first voice recognition result is corrected based on the matching result to obtain the second voice recognition result. For example, when the intention expressed by the user is "turn off the ambience lamp" but the cloud speech recognition result is "turn off the phoenix tail lamp", the matching result obtained from the vehicle control word bank is "turn off the ambience lamp", and the first speech recognition result "turn off the phoenix tail lamp" is corrected to obtain the second speech recognition result "turn off the ambience lamp". Finally, a corresponding vehicle control instruction is generated according to the second voice recognition result to perform voice control on the vehicle.
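The two cases of step S40 can be sketched as follows, assuming the matching step returns the matched action and object words; the function name and the rewrite format are illustrative, not the patented implementation.

```python
def build_second_result(first_result, match_result):
    """Step S40 sketch: if the matched action and object already appear
    verbatim in the first recognition result, keep it unchanged (case 1);
    otherwise rewrite the result from the matched word-bank entries,
    i.e. the correction case (case 2)."""
    if (match_result["action"] in first_result
            and match_result["object"] in first_result):
        return first_result
    return f"{match_result['action']} {match_result['object']}"
```

Applied to the example in the text, "turn off the phoenix tail lamp" with the match `{"action": "turn off", "object": "ambience lamp"}` is corrected to "turn off ambience lamp".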
Step S50: and if not, generating prompt information based on the matching result.
In step S50, when the first speech recognition result does not include the first vehicle control type semantic, that is, when no corresponding control semantic can be found in the vehicle control word bank, the prompt information is generated based on the matching result. Specifically, the process of generating the prompt information may include:
firstly, a semantic recognition result corresponding to the first voice recognition result is obtained; this semantic recognition result is the recognition result of the cloud. Then, it is judged whether the semantic recognition result is a second vehicle control type semantic; the second vehicle control type semantic is a semantic that cannot be identified through the vehicle control word bank.
1. If the semantic recognition result is the second vehicle control type semantic, the first voice recognition result lies beyond the range of phrasings covered by the current vehicle's control word bank. At this time, two cases may arise:
1) the vehicle control word bank contains the semantic the user wants to express, but no corresponding matching result can be obtained from the first voice recognition result in the vehicle control word bank.
For this case, first prompt information for prompting the user to change the phrasing may be generated based on the semantic recognition result. Specifically, corresponding phrasing information may be obtained based on the semantic recognition result; the phrasing information is the standard phrasing that matches the semantic recognition result. For example, the standard phrasing corresponding to the vehicle control semantic "open window" is "open the window". The phrasing information may be obtained from a phrasing library pre-stored on the vehicle. Further, the first prompt information is generated based on the phrasing information and finally output as speech by the TTS module on the vehicle. For example, the first prompt message output may be "Sorry, the system does not support this phrasing; if you want to open the window, you can say 'open the window'."
2) The vehicle control word bank does not contain the semantic the user wants to express.
In this case, the current vehicle does not support the vehicle control action corresponding to the semantic recognition result, and first prompt information indicating that the vehicle control function is not supported may be generated. For example, if the semantic recognition result is "open trunk" and the vehicle control word bank does not contain this semantic, that is, the vehicle does not support the function, a voice prompt such as "Sorry, the system does not support this function yet; we are working hard to learn this skill" may be output.
When the phrasing corresponding to the cloud semantic recognition result falls outside the keyword vocabulary defined for the vehicle's control semantics, an erroneous vehicle control instruction might be executed, causing a safety risk. The judging and prompting steps above can effectively prevent the execution of erroneous instructions.
2. If the semantic recognition result is not the second vehicle control type semantic, the voice uttered by the user cannot be used for vehicle control, and second prompt information can be generated based on the semantic recognition result. The second prompt information may be the semantic recognition result recognized by the cloud, output to the vehicle-mounted human-computer interface for display, or a voice prompt indicating that the request is not supported.
In addition, this embodiment provides misrecognition cases of the cloud speech recognition result (cloud ASR result), the cloud semantic recognition result (cloud NLU result), and the vehicle semantic recognition result (vehicle control word bank recognition / local VC_NLU result), as shown in tables 1 and 2 below:
TABLE 1
TABLE 2
As can be seen from tables 1 and 2, after recognition by the method of this embodiment, the real intention of the user can be effectively recognized and the cloud speech recognition result corrected before executing the vehicle control action, and even when the vehicle control action cannot be executed, a corresponding guidance prompt can be provided to the user.
It should be noted that, as shown in table 1 and table 2, the method of this embodiment can solve at least the following problems:
1. the result of the cloud ASR is not consistent with the user intention, and the cloud NLU has no semantic result, so that failure is caused, and the user intention is not executed.
2. And the cloud ASR result and the NLU result are not in accordance with the user intention, failure is caused, and an action which is not in accordance with the user intention is executed.
3. The cloud ASR result accords with the user intention, but the cloud NLU semantic result does not accord with the vehicle control intention of the user, so that failure is caused, and the action which does not accord with the user intention is executed.
4. The result of the cloud ASR accords with the user intention, but the cloud NLU has no semantic result, so that failure is caused, and the user intention is not executed;
5. the cloud ASR result accords with the user intention, but the cloud NLU is an incorrect vehicle control semantic result, so that failure is caused, and an action which does not accord with the user intention is executed.
In summary, the vehicle control voice recognition method provided in this embodiment includes: acquiring a first voice recognition result of a cloud; matching the first voice recognition result in a preset vehicle control word bank to obtain a matching result, the vehicle control word bank being matched with the vehicle subjected to voice control; and finally, judging whether the first voice recognition result comprises a first vehicle control semantic based on the matching result, the first vehicle control semantic being a vehicle control semantic supported by the vehicle; if so, obtaining a second voice recognition result based on the matching result, the second voice recognition result being used for voice control of the vehicle; and if not, generating prompt information based on the matching result. In the whole recognition process, the dedicated vehicle control word bank is adopted for semantic matching instead of directly using cloud semantic recognition. After semantic matching with the vehicle control word bank, whether the first voice recognition result contains the first vehicle control type semantic can be judged, so the real intention of the user can be recognized. Finally, a second voice recognition result for performing voice control on the vehicle is generated according to the matching result, so that reliability is improved and misrecognition and the execution of erroneous instructions are avoided.
Second embodiment
Referring to fig. 6, a second embodiment of the present invention provides a vehicle control speech recognition apparatus 300 based on the same inventive concept. Fig. 6 is a schematic structural diagram illustrating the functional modules of the vehicle control speech recognition apparatus according to the second embodiment of the present invention. The vehicle control speech recognition apparatus 300 includes:
the obtaining module 301 is configured to obtain a first voice recognition result of a cloud;
the matching module 302 is configured to match the first voice recognition result in a preset vehicle control word bank to obtain a matching result; the vehicle control word bank is matched with a vehicle subjected to voice control;
a judging module 303, configured to judge whether the first speech recognition result includes a first vehicle control type semantic based on the matching result; the first vehicle control type semantic is a vehicle control semantic supported by the vehicle;
a first processing module 304, configured to obtain a second speech recognition result based on the matching result if the first speech recognition result includes the first vehicle control type semantic; wherein the second voice recognition result is used for voice control of the vehicle;
and a second processing module 305, configured to generate prompt information based on the matching result if the first speech recognition result does not include the first vehicle control type semantic.
As an optional implementation manner, the vehicle control word bank includes a vehicle control action word bank and a vehicle control object word bank, the matching result includes an action matching result and an object matching result, and the matching module 302 is specifically configured to:
matching the first voice recognition result in the vehicle control action word bank to obtain an action matching result; and matching the first voice recognition result in the vehicle control object word bank to obtain an object matching result.
As an optional implementation manner, the determining module 303 is specifically configured to:
if the action matching result contains the vehicle control action words in the vehicle control action word bank and the object matching result contains the vehicle control object words in the vehicle control object word bank, determining that the first voice recognition result comprises a first vehicle control type semantic; otherwise, determining that the first voice recognition result does not comprise the first vehicle control type semantic.
As an optional implementation manner, the first processing module 304 is specifically configured to:
and modifying the first voice recognition result based on the matching result to obtain the second voice recognition result.
As an optional implementation manner, the second processing module 305 is specifically configured to:
obtaining a semantic recognition result corresponding to the first voice recognition result; judging whether the semantic recognition result is a second vehicle control type semantic; if yes, generating, based on the semantic recognition result, first prompt information indicating that the phrasing is not supported; and if not, generating second prompt information based on the semantic recognition result.
As an optional implementation manner, the second processing module 305 is further specifically configured to:
acquiring corresponding phrasing information based on the semantic recognition result; and generating the first prompt information based on the phrasing information.
It should be noted that the implementation and technical effects of the vehicle control speech recognition apparatus 300 according to the embodiment of the present invention are the same as those of the foregoing method embodiment; for brevity, reference may be made to the corresponding contents of the foregoing method embodiment for anything not mentioned in this apparatus embodiment.
Third embodiment
Based on the same inventive concept, a third embodiment of the present invention further provides a vehicle control voice recognition device, which includes a processor and a memory, wherein the memory is coupled to the processor, and the memory stores instructions that, when executed by the processor, cause the vehicle control voice recognition device to perform any one of the steps of the above method embodiments.
It should be noted that, in the vehicle control voice recognition apparatus provided in the embodiment of the present invention, the specific implementation of each step and the technical effects produced are the same as those of the foregoing method embodiment; for brevity, reference may be made to the corresponding contents of the foregoing method embodiment for anything not mentioned in this embodiment.
Fourth embodiment
Based on the same inventive concept, a fourth embodiment of the present invention also provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, performs any one of the steps in the above-mentioned method embodiments.
It should be noted that, in the computer-readable storage medium provided by the embodiment of the present invention, the specific implementation of each step and the technical effects produced when the program is executed by the processor are the same as those of the foregoing method embodiment; for brevity, reference may be made to the corresponding contents of the foregoing method embodiment for anything not mentioned in this embodiment.
The term "and/or" appearing herein merely describes an association between related objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.