WO2017071453A1 - Method and device for voice recognition

Info

Publication number: WO2017071453A1
Application number: PCT/CN2016/100864
Authority: WO
Inventors: 田孝辉
Original assignee: Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Current assignee: Yulong Computer Telecommunication Scientific Shenzhen Co Ltd
Priority date: 2015-10-28
Filing date: 2016-09-29
Publication date: 2017-05-04
Anticipated expiration: 2018-04-28
Also published as: CN105551498A

Abstract

A method and device for voice recognition. The method comprises: acquiring a state of an audio player of an electronic device (101); when the audio player is in a playback state, utilizing a first microphone of the electronic device to capture live sounds at the scene, and utilizing a second microphone of the electronic device to capture sounds being played back by the audio player (102), where the live sounds at the scene comprise a voice command of a user and the sounds played back by the audio player; recognizing the voice command of the user from the live sounds at the scene and the sound played back by the audio player (103); and operating the electronic device on the basis of the recognized voice command of the user (104). The technical solution provided in embodiments effectively solves the obstacle to voice recognition technology in a complex scenario, allows the user to use voice commands to effectively operate the electronic device while the audio player is in the playback state, thus greatly enhancing user experience.

Description

Method and device for speech recognition

The present application claims priority to Chinese Patent Application No. 201510716257.1, entitled "A Method and Apparatus for Speech Recognition", filed with the Chinese Patent Office on October 28, 2015, the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of speech recognition technology, and in particular to a method and a device for speech recognition.

Background art

At present, with the development of science and technology, people can control electronic devices by voice. In a quiet environment, an electronic device can effectively capture the user's voice, analyze and process it to obtain a valid instruction, and act on that instruction, thereby responding effectively to the user's voice.

However, when the electronic device is playing music and the user gives it a voice instruction, the device captures the user's voice and the played music at the same time. The mixture of the two makes it much harder to recognize the user's voice correctly.

Summary of the invention

The present invention provides a method and a device for speech recognition that effectively overcome the obstacles to speech recognition in complex scenarios, so that the user can operate the electronic device effectively by voice command while the audio player is in the playback state, which greatly improves the user experience.

A first aspect of the embodiments of the present invention discloses a method for speech recognition, including:

acquiring the state of an audio player of an electronic device;

when the audio player is in a playback state, using a first microphone of the electronic device to capture the live sound, and using a second microphone of the electronic device to capture the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player;

recognizing the user's voice command from the live sound and the sound played by the audio player;

operating the electronic device according to the recognized voice command of the user.

With reference to the first aspect, in a first possible implementation of the first aspect, before recognizing the user's voice command from the live sound and the sound played by the audio player, the method further includes:

filling, by the electronic device, the live sound into a first channel to obtain first audio data;

filling, by the electronic device, the sound played by the audio player into a second channel to obtain second audio data.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the electronic device obtaining the user's voice command from the live sound and the sound played by the audio player according to a preset method specifically includes:

the electronic device acquires the data of the first channel and the data of the second channel; the electronic device uses a frequency conversion method to obtain a valid audio data stream from the data of the first channel and the data of the second channel;

the electronic device performs noise cancellation on the valid audio data stream using an automatic gain control (AGC) algorithm in order to obtain the user's voice command.

With reference to the first possible implementation of the first aspect, in a third possible implementation of the first aspect, the method further includes:

the electronic device coordinates the first microphone and the second microphone so that the delay between the data of the first channel and the data of the second channel is less than a threshold.

With reference to the first aspect, in a fourth possible implementation of the first aspect, the method further includes:

when the audio player of the electronic device is in a non-playback state, the electronic device captures the user's voice using the first microphone and the second microphone;

the electronic device performs noise cancellation on the user's voice using the AGC algorithm in order to obtain the user's voice command;

the electronic device operates according to the user's voice command.

With reference to the first aspect, in a fifth possible implementation of the first aspect, the first microphone is a primary microphone; the second microphone is a secondary microphone; the first channel is a left channel; and the second channel is a right channel.

A second aspect of the embodiments of the present invention discloses a device for speech recognition, including:

a first acquiring unit, configured to acquire the state of an audio player of an electronic device;

a collecting unit, configured to, when the audio player is in a playback state, use a first microphone of the electronic device to capture the live sound and use a second microphone of the electronic device to capture the sound played by the audio player;

where the live sound includes the user's voice command and the sound played by the audio player;

a second acquiring unit, configured to obtain the user's voice command from the live sound and the sound played by the audio player;

an operating unit, configured to operate the electronic device according to the recognized voice command of the user.

With reference to the second aspect, in a first possible implementation of the second aspect, the device further includes:

a processing unit, configured to fill the live sound into a first channel to obtain first audio data, and to fill the sound played by the audio player into a second channel to obtain second audio data.

With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second acquiring unit is specifically configured to:

acquire the data of the first channel and the data of the second channel;

obtain a valid audio data stream from the data of the first channel and the data of the second channel using a frequency conversion method;

perform noise cancellation on the valid audio data stream using an automatic gain control (AGC) algorithm in order to obtain the user's voice command.

With reference to the second aspect or the first possible implementation of the second aspect, in a third possible implementation of the second aspect, the device further includes:

a control unit, configured to coordinate the first microphone and the second microphone so that the delay between the data of the first channel and the data of the second channel is less than a threshold.

With reference to the second aspect, in a fourth possible implementation of the second aspect,

the first acquiring unit is further configured to, when the audio player is in a non-playback state, capture the user's voice using the first microphone and the second microphone, and

to perform noise cancellation on the voice using the AGC algorithm in order to obtain the user's voice command.

It can be seen that, in the solution of the embodiments of the present invention, when the audio player of the electronic device is in the playback state, the electronic device can use the first microphone to capture the live sound and the second microphone to capture the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player. According to this technical solution, the electronic device can use the playback sound captured by the second microphone to remove the playback sound from the live sound captured by the first microphone, thereby obtaining the user's voice command. The user can therefore operate the electronic device effectively by voice command while the audio player is in the playback state, which greatly improves the user experience.
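The reference-based removal described above can be sketched as follows. The source does not name a cancellation algorithm; this example swaps in a normalized least-mean-squares (NLMS) adaptive filter, a common technique for cancelling a known reference signal, purely as an assumption. The function name `nlms_cancel` and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
from typing import List, Sequence


def nlms_cancel(
    live: Sequence[float],
    reference: Sequence[float],
    taps: int = 8,
    mu: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Subtract an adaptively filtered copy of `reference` from `live`.

    `live` stands for the first-microphone signal (voice plus playback) and
    `reference` for the second-microphone signal (playback only).
    NLMS is an assumed technique, not one named in the source.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(live, reference):
        history = [x] + history[:-1]
        # Estimate the playback component present in the live signal.
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate  # what remains after removing the estimate
        norm = sum(h * h for h in history) + eps
        # Normalized LMS weight update.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual
```

The returned residual is the live signal with the estimated playback component removed; when a voice is present it is what remains for the recognizer.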

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for speech recognition according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another method for speech recognition according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a speech recognition device according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another speech recognition device according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of another speech recognition device according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

The present invention provides a method and a device for speech recognition that effectively overcome the obstacles to speech recognition in complex scenarios, so that the user can operate the electronic device effectively by voice command while the audio player is in the playback state, which greatly improves the user experience.

To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments are described clearly below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

The terms used in the embodiments of the present invention are for the purpose of describing particular embodiments only and are not intended to limit the invention. The singular forms "a", "said", and "the" used in the embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should further be understood that the term "comprising" as used herein specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The method described in the embodiments of the present invention can be applied to all kinds of smart terminals with a speech recognition function, such as tablet computers, smartphones, e-readers, remote controls, personal computers (PCs), notebook computers, in-vehicle devices, network televisions, and wearable devices.

In one embodiment of the method for speech recognition of the present invention, the method includes: acquiring the state of an audio player of an electronic device; when the audio player is in a playback state, using a first microphone of the electronic device to capture the live sound, and using a second microphone of the electronic device to capture the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player; recognizing the user's voice command from the live sound and the sound played by the audio player; and operating the electronic device according to the recognized voice command of the user.

Please refer to FIG. 1, which is a schematic flowchart of a method for speech recognition according to an embodiment of the present invention. As shown in FIG. 1, the method may include the following:

101. Acquire the state of the audio player of the electronic device.

The electronic device is a smart device with a playback function, for example a tablet computer, smartphone, e-reader, notebook computer, in-vehicle device, network television, wearable device, or other smart device with a playback function.

The state of the audio player includes its on/off state, that is, whether the audio player is in the playback state.

102. When the audio player is in the playback state, use the first microphone of the electronic device to capture the live sound, and use the second microphone of the electronic device to capture the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player.

The first microphone is a primary microphone, and the second microphone is a secondary microphone.

The user's voice command may instruct the smart device to perform an operation, for example instructing a mobile terminal to enter text automatically from the user's speech, instructing an in-vehicle device to navigate according to the user's voice, instructing a mobile phone or wearable device to answer a question posed by the user, or another operation instruction.

103. Recognize the user's voice command from the live sound and the sound played by the audio player.

Preferably, the electronic device acquires the data of the first channel and the data of the second channel; the electronic device uses a frequency conversion method to obtain a valid audio data stream from the data of the first channel and the data of the second channel; the electronic device then performs noise cancellation on the valid audio data stream using an automatic gain control (AGC) algorithm in order to obtain the user's voice command.

The first channel is the left channel, and the second channel is the right channel.
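The channel assignment can be illustrated with a minimal sketch, assuming the two microphone signals are stored as interleaved left/right sample pairs (a common PCM stereo layout; the source does not specify the storage format). The names `pack_stereo` and `unpack_stereo` are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_stereo(live: Sequence[float], playback: Sequence[float]) -> List[float]:
    """Interleave live sound (left channel) with playback sound (right channel)."""
    if len(live) != len(playback):
        raise ValueError("both channels must have the same number of samples")
    frames: List[float] = []
    for left, right in zip(live, playback):
        frames.extend((left, right))  # one stereo frame: left sample, right sample
    return frames


def unpack_stereo(frames: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split interleaved frames back into (left, right) channel data."""
    return list(frames[0::2]), list(frames[1::2])
```

Keeping both signals in one stereo stream means each frame pairs a live sample with the playback sample captured at the same instant, which is what later processing relies on.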

The automatic gain control (AGC) algorithm is an automatic control algorithm that keeps the output signal within a small range even when the input signal varies widely.
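The AGC behavior defined above can be illustrated with a minimal sketch. It shows only the gain-leveling part (a peak-following envelope with a capped gain); the source does not describe its AGC design or how it is combined with noise cancellation. The names `agc`, `target`, `release`, and `max_gain` are assumptions.

```python
from typing import List, Sequence


def agc(
    samples: Sequence[float],
    target: float = 0.5,
    release: float = 0.999,
    max_gain: float = 100.0,
) -> List[float]:
    """Scale samples so their peak level stays near `target`."""
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        # Follow rising peaks immediately, let the envelope decay slowly otherwise.
        envelope = level if level >= envelope else envelope * release
        # Cap the gain so silence or very quiet input is not amplified without bound.
        gain = min(max_gain, target / envelope) if envelope > 0.0 else max_gain
        out.append(s * gain)
    return out
```

With these settings a steady input peaking at 0.01 and one peaking at 1.0 both come out peaking near 0.5, which is the "small output range for a widely varying input" property.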

104. Operate the electronic device according to the recognized voice command of the user.

The operation performed on the electronic device according to the user's voice command is determined by performing speech recognition and semantic analysis on the voice information. Operating the electronic device includes invoking service information in various forms. The service information may take various media forms, such as text, sound, images, or animation; it may be retrieved by the mobile terminal from local storage or obtained from the network; and it may be media information presented to the user when a program is first invoked or while the program is running. Specifically, speech recognition yields the text corresponding to the voice information. In some applications, such as a voice input method, the text itself can serve as the service information. In general, the text can also undergo semantic analysis, which yields the operation instruction corresponding to the meaning of the text, that is, the operation instruction corresponding to the voice information. The service information corresponding to the voice information is then obtained by executing that instruction.
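The step from recognized text to an operation instruction can be sketched as a simple lookup. This is only an illustration: the phrase table, the instruction names, and the function `to_instruction` are hypothetical, and substring matching stands in for the semantic analysis that the source does not detail.

```python
from typing import Optional

# Hypothetical phrase-to-instruction table used only for illustration.
INSTRUCTIONS = {
    "take a photo": "camera.capture",
    "navigate to": "navigation.start",
    "play music": "player.play",
}


def to_instruction(recognized_text: str) -> Optional[str]:
    """Map text produced by speech recognition to an operation instruction."""
    text = recognized_text.lower().strip()
    for phrase, instruction in INSTRUCTIONS.items():
        if phrase in text:
            return instruction
    return None  # no known instruction; the text may still be used as input text
```

Returning `None` leaves room for the voice-input case, where the recognized text itself is the result.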

Optionally, before operating the electronic device according to the user's voice command, the electronic device also outputs the recognized voice command so that the user can confirm it. One output form is text (Chinese characters): the user's speech is converted to text and displayed, and the next operation is performed after the user confirms by tapping. Optionally, the voice command may also be output as speech: after recognizing the voice command, the electronic device repeats it aloud, and the user confirms it by speaking a confirmation command or tapping the screen.

It can be seen that, in the solution of this embodiment, when the audio player of the electronic device is found to be in the playback state, the digital dual-microphone recognition technique of the electronic device is used: the first microphone captures the live sound and the second microphone captures the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player. The playback sound captured by the second microphone is used to remove the playback sound from the live sound captured by the first microphone, so that the electronic device can be operated effectively by voice command. This effectively overcomes the obstacles to speech recognition in complex scenarios and greatly improves the user experience.

When the electronic device is operated according to the user's voice command, the voice command may optionally be recognized by a local speech recognition module, such as locally installed speech recognition software, or by a speech recognition module on a remote device.

Specifically, because some users have inaccurate pronunciation or an accent, after the terminal receives a voice command it can determine a recognition algorithm according to a preset rule, and that algorithm can then recognize the voice information of a particular user. For example, if the voice test information the user supplies when the recognition algorithm is determined is dialect speech, the determined algorithm is one suited to the user's dialect speech; thereafter, when the voice information to be recognized is dialect speech, the algorithm determined by the preset rule can recognize it and the corresponding operation is carried out. When the user wants the terminal to perform an action in response to voice information, the user inputs the voice information to be recognized, and the terminal receives it and recognizes it using the recognition algorithm. Specifically, the network or local voice model library contains the voice information of at least one user, and each item of voice information has a corresponding action. When the terminal receives the voice information to be recognized, it recognizes it using the algorithm determined above; if the terminal recognizes it as target voice information in the network or local voice model library, the terminal identifies the action corresponding to that target voice information as the action corresponding to the input, and performs that action.
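The lookup against a voice model library can be sketched as follows. This is a rough illustration under stated assumptions: a real system would compare acoustic features, whereas this sketch compares text transcripts with the standard-library `difflib.get_close_matches`. The class `VoiceModelLibrary`, its methods, and the `cutoff` value are hypothetical.

```python
import difflib
from typing import Dict, Optional


class VoiceModelLibrary:
    """Stores enrolled utterances (as transcripts) and their associated actions."""

    def __init__(self) -> None:
        self._actions: Dict[str, str] = {}  # enrolled transcript -> action name

    def enroll(self, transcript: str, action: str) -> None:
        """Register target voice information and the action it should trigger."""
        self._actions[transcript.lower()] = action

    def match(self, transcript: str, cutoff: float = 0.6) -> Optional[str]:
        """Return the action of the closest enrolled transcript, if close enough."""
        candidates = difflib.get_close_matches(
            transcript.lower(), list(self._actions), n=1, cutoff=cutoff
        )
        return self._actions[candidates[0]] if candidates else None
```

Enrolling a user's own variants of a command is one way to read the per-user calibration described above: an imperfectly pronounced input can still resolve to the intended action.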

For example, terminals today commonly offer voice-triggered photography: when the user says "take a photo" or "eggplant" (the Chinese equivalent of saying "cheese") to the terminal, the terminal performs speech recognition, checks whether the result matches the corresponding characters, and if so takes the photo. However, if the user's pronunciation is inaccurate or the user stutters, the photo function may still fail to trigger after the user says "take a photo" or "eggplant", which is embarrassing. In that case the user can turn on the voice calibration mode described above: using target voice information supplied by the user, such as "take a photo", the terminal calibrates against the user's spoken "take a photo" or "eggplant" input and recognizes the photo function the user wants to perform. This gives the user a "tailor-made" speech recognition system, and the voice-triggered photo function works.

With the above technical solution, speech recognition can be "tailor-made" for particular users. It has the advantage of treating special individuals specially, avoids one-size-fits-all processing, and has a degree of re-learning capability, which greatly increases the speech recognition rate and improves the user experience.

Please refer to FIG. 2, which is a schematic flowchart of a method for speech recognition according to another embodiment of the present invention. As shown in FIG. 2, the method may include the following:

201. Acquire the state of the audio player of the electronic device.

202. Determine whether the audio player is in the playback state.

If it is in the playback state, perform step 203;

if it is in the non-playback state, perform step 208.

203. When the audio player is in the playback state, use the first microphone of the electronic device to capture the live sound, and use the second microphone of the electronic device to capture the sound played by the audio player, where the live sound includes the user's voice command and the sound played by the audio player.

204. The electronic device fills the live sound into the first channel to obtain first audio data.

205. The electronic device coordinates the first microphone and the second microphone so that the delay between the data of the first channel and the data of the second channel is less than a threshold.

For example, the threshold may be 1 ms, 2 ms, 3 ms, 4 ms, 5 ms, 6 ms, 7 ms, or another value.

Preferably, the threshold is any positive value less than or equal to 2 ms.

It can be understood that, to achieve good speech recognition, the audio data stream of the second channel must be received continuously at the same time as the audio data stream of the first channel, and the two data streams must be kept synchronized.
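One way to check the synchronization requirement is sketched below. It estimates the inter-channel lag by brute-force cross-correlation and compares it with the threshold; the source does not say how the delay is measured, so this approach, the function names `estimate_lag` and `within_threshold`, and the 16 kHz sample rate in the usage note are all assumptions.

```python
from typing import Sequence


def estimate_lag(first: Sequence[float], second: Sequence[float], max_lag: int) -> int:
    """Return the lag in samples of `second` relative to `first`.

    The lag is the shift in [-max_lag, max_lag] that maximizes the
    cross-correlation between the two channels.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, a in enumerate(first):
            m = n + lag
            if 0 <= m < len(second):
                score += a * second[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def within_threshold(lag_samples: int, sample_rate_hz: int, threshold_ms: float = 2.0) -> bool:
    """Check whether a lag in samples stays within the delay threshold."""
    return abs(lag_samples) * 1000.0 / sample_rate_hz <= threshold_ms
```

At an assumed 16 kHz sample rate, the preferred 2 ms threshold corresponds to 32 samples, so a measured lag of 5 samples passes and a lag of 48 samples (3 ms) does not.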

206. The electronic device obtains first valid audio data from the first audio data; the electronic device uses a frequency conversion algorithm to obtain second valid audio data from the second audio data; the electronic device performs noise cancellation on the first valid audio data and the second valid audio data using the automatic gain control (AGC) algorithm to recognize the user's voice command.

207. Operate the electronic device according to the recognized voice command of the user.

208. When the audio player of the electronic device is in the non-playback state, the electronic device captures the user's voice using the first microphone and the second microphone; the electronic device performs noise cancellation on the user's voice using the AGC algorithm in order to obtain the user's voice command; and the electronic device operates according to the user's voice command.

It can be seen that, in the solution of this embodiment, when the audio player is in the non-playback state, speech recognition switches to the digital dual-microphone noise reduction technique, which distinguishes the voice command from noise so that the command can be carried out.
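The branching between the playback and non-playback paths in this embodiment can be sketched as a small dispatcher. The processing stages are passed in as callables because the source does not specify them; the function name `recognize_command`, its parameters, and the averaging of the two microphones in the non-playback branch are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Signal = Sequence[float]


def recognize_command(
    is_playing: bool,
    mic1: Signal,
    mic2: Signal,
    cancel_playback: Callable[[Signal, Signal], Signal],
    denoise: Callable[[Signal], Signal],
    recognize: Callable[[Signal], str],
) -> str:
    """Pick the processing path from the audio player state (step 202)."""
    if is_playing:
        # Steps 203-206: mic1 holds the live sound, mic2 the playback reference.
        cleaned = denoise(cancel_playback(mic1, mic2))
    else:
        # Step 208: both microphones capture the user's voice.
        # Averaging them is an assumption, not something the source states.
        mixed: List[float] = [(a + b) / 2.0 for a, b in zip(mic1, mic2)]
        cleaned = denoise(mixed)
    # The recognized command is then used to operate the device (step 207).
    return recognize(cleaned)
```

Passing the stages in as functions keeps the control flow visible without committing to any particular cancellation or recognition method.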

请参阅图3，图3是本发明的一个实施例提供的一种语音识别的装置的示意图。其中，如图3所示，本发明的一个实施例提供的一种语音识别装置可以包括以下内容：Please refer to FIG. 3. FIG. 3 is a schematic diagram of an apparatus for voice recognition according to an embodiment of the present invention. As shown in FIG. 3, a voice recognition apparatus provided by an embodiment of the present invention may include the following contents:

第一获取单元301，设置为获取所述电子设备的音响的状态；The first obtaining unit 301 is configured to acquire a state of the sound of the electronic device;

其中，所述第一获取单元还设置为当所述音响处于非播放状态时，所述电子设备利用所述第一麦克风和所述第二麦克风获取用户语音；The first acquiring unit is further configured to: when the sound is in a non-playing state, the electronic device acquires a user voice by using the first microphone and the second microphone;

采集单元302，设置为当所述音响处于播放状态时，利用所述电子设备的第一麦克风采集现场的声音，利用所述电子设备的第二麦克风获取音响播放的声音，其中所述现场的声音包括用户的语音指令和所述音响播放的声音。The collecting unit 302 is configured to collect the sound of the scene by using the first microphone of the electronic device when the sound is in the playing state, and acquire the sound of the sound playing by using the second microphone of the electronic device, wherein the sound of the scene is The voice command of the user and the sound of the audio play are included.

The sound collection device may include a microphone array in the mobile terminal or a wearable sound collection device. The wearable sound collection device may be an electronic skin tattoo worn on the user's throat, a bone-sensing microphone worn in the user's cochlea, or the like. The mobile terminal may select which sound collection device to use according to its own orientation and motion state.

Preferably, the sound collection device is a dual-microphone sound collection device.

A second acquiring unit 303, configured to recognize the user's voice command from the live sound and the sound played by the audio player.

Preferably, the second acquiring unit is specifically configured to: acquire the data of the first channel and the data of the second channel; obtain a valid audio data stream from the data of the first channel and the data of the second channel using a frequency conversion method; and apply an automatic gain control (AGC) algorithm to the valid audio data stream to cancel noise and obtain the user's voice command.

An operating unit 304, configured to operate the electronic device according to the recognized voice command of the user.

The first acquiring unit 301, the collecting unit 302, the second acquiring unit 303, and the operating unit 304 may be configured to perform the methods of steps 101, 102, 103, and 104 in Embodiment 1. For details, see the description of the method in Embodiment 1, which is not repeated here.
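To illustrate why a separate capture of the played-back sound helps recognition, the sketch below removes a scaled copy of the reference channel from the live-sound channel. This is not the frequency conversion and AGC processing the text describes: it swaps in a single-gain least-squares subtraction as a simpler stand-in, and the function name and signal values are assumptions.

```python
def subtract_reference(scene, reference):
    """Remove the best single-gain fit of `reference` from `scene`.

    Illustrative stand-in, not the patent's method: the gain is the
    least-squares scale that makes the residual orthogonal to the reference.
    """
    if len(scene) != len(reference):
        raise ValueError("channels must have the same length")
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        return list(scene)  # nothing is playing, so nothing to remove
    gain = sum(s * r for s, r in zip(scene, reference)) / energy
    return [s - gain * r for s, r in zip(scene, reference)]

# Example: the live sound is the voice plus half the playback level.
playback = [1.0, -1.0, 1.0, -1.0]
voice = [1.0, 1.0, 2.0, 2.0]
scene = [v + 0.5 * p for v, p in zip(voice, playback)]
residual = subtract_reference(scene, playback)
```

In this toy case the voice is uncorrelated with the playback, so the residual equals the voice. Real rooms add delay and reverberation, which is why practical systems typically use adaptive filters rather than a single gain.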

Please refer to FIG. 4, which is a schematic diagram of another speech recognition device according to an embodiment of the present invention. As shown in FIG. 4, the speech recognition device may include the following:

A first acquiring unit 401, configured to acquire the state of the audio player of the electronic device;

A collecting unit 402, configured to, when the audio player is playing, capture the live sound at the scene using the first microphone of the electronic device and capture the sound played by the audio player using the second microphone of the electronic device, wherein the live sound comprises the user's voice command and the sound played by the audio player.

A processing unit 403, configured to fill the live sound into a first channel to obtain first audio data, and to fill the sound played by the audio player into a second channel to obtain second audio data.

A control unit 404, configured to keep the data delay between the first channel and the second channel below a threshold by coordinating the first microphone and the second microphone.

A second acquiring unit 405, configured to recognize the user's voice command from the live sound and the sound played by the audio player.

Optionally, the second acquiring unit is further configured to output the voice command after the electronic device obtains it, so that the user can confirm it.

Optionally, the second acquiring unit is further configured to obtain, from the collected confirmation information, the user's degree of approval of the speech recognition service information, and to receive the user's approval information on the speech recognition service information.

An operating unit 406, configured to operate the electronic device according to the recognized voice command of the user.

The first acquiring unit 401, the collecting unit 402, the processing unit 403, the control unit 404, the second acquiring unit 405, and the operating unit 406 may be configured to perform the methods of steps 201, 202, 203, 204, 205, and 206 in Embodiment 2. For details, see the description of the method in Embodiment 2, which is not repeated here.

It can be seen that, in the solution of this embodiment, the device additionally includes the control unit 404 and the processing unit 403, and also adds confirmation of the recognized voice command, which greatly improves the user experience.
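The channel filling and delay control performed by the processing unit and the control unit can be pictured with the sketch below. It pairs the two captures as two-channel frames and estimates the inter-channel lag by cross-correlation, then compares it with a threshold. The text does not say how the delay is measured or controlled, so the function names, the cross-correlation approach, and all parameter values are assumptions.

```python
def fill_channels(scene, playback):
    """Pair the two mono captures as (channel 1, channel 2) frames."""
    if len(scene) != len(playback):
        raise ValueError("captures must have the same length")
    return list(zip(scene, playback))

def estimate_lag(first, second, max_lag):
    """Lag in samples with the highest correlation.

    A positive result means `second` trails `first`.
    """
    def correlation(lag):
        return sum(
            first[n] * second[n + lag]
            for n in range(len(first))
            if 0 <= n + lag < len(second)
        )
    return max(range(-max_lag, max_lag + 1), key=correlation)

def delay_within_threshold(first, second, sample_rate_hz, threshold_s, max_lag):
    """True when the estimated inter-channel delay is below the threshold."""
    lag = estimate_lag(first, second, max_lag)
    return abs(lag) / sample_rate_hz < threshold_s
```

This sketch only measures the delay. How the device would coordinate the two microphones to reduce it is left open by the text.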

Please refer to FIG. 5, which is a schematic diagram of another speech recognition device according to an embodiment of the present invention. As shown in FIG. 5, the speech recognition device may include the following:

A first acquiring unit 501, configured to acquire the state of the audio player of the electronic device;

A collecting unit 502, configured to, when the audio player is playing, capture the live sound at the scene using the first microphone of the electronic device and capture the sound played by the audio player using the second microphone of the electronic device, wherein the live sound comprises the user's voice command and the sound played by the audio player.

A processing unit 503, configured to fill the live sound into a first channel to obtain first audio data, and to fill the sound played by the audio player into a second channel to obtain second audio data.

A control unit 504, configured to keep the data delay between the first channel and the second channel below a threshold by coordinating the first microphone and the second microphone.

A second acquiring unit 505, configured to recognize the user's voice command from the live sound and the sound played by the audio player.

Optionally, the second acquiring unit is further configured to obtain, from the collected confirmation information and the information collection template, the user's degree of approval of the speech recognition service information, and to receive the user's approval information on the speech recognition service information.

An operating unit 506, configured to operate the electronic device according to the recognized voice command of the user.

An optimization unit 507, configured to optimize the speech recognition algorithm according to the user's degree of approval of the speech recognition service information obtained by the second acquiring unit 505.
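The text does not define how the degree of approval is computed or how it drives optimization. One possible reading, sketched below purely as an assumption, treats each user confirmation as a yes/no vote, computes the fraction of confirmed commands, and flags the recognizer for tuning when that fraction falls below a chosen level. The function names and the 0.8 default are hypothetical.

```python
def approval_rate(confirmations):
    """Fraction of recognized commands the user confirmed as correct."""
    if not confirmations:
        return None  # no feedback collected yet
    return sum(1 for ok in confirmations if ok) / len(confirmations)

def needs_tuning(confirmations, min_rate=0.8):
    """Flag the recognizer for tuning when approval drops below min_rate.

    min_rate is an illustrative assumption, not a value from the patent.
    """
    rate = approval_rate(confirmations)
    return rate is not None and rate < min_rate
```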

Please refer to FIG. 6, which is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 600 in this embodiment may be any of various types of electronic device, for example a smartphone, tablet computer, handheld computer, mobile Internet device, personal digital assistant, media player, smart TV, smart watch, smart glasses, or smart bracelet.

As shown in FIG. 6, the electronic device 600 in this embodiment includes at least one processing device 610 (for example, a CPU), at least one receiving device 613, at least one storage device 614, at least one transmitting device 615, and at least one communication bus 612. The communication bus 612 provides the connections and communication among these components. The receiving device 613 and the transmitting device 615 may be wired transmission ports or wireless devices, for example including an antenna device, used for data communication with other devices. The storage device 614 may be high-speed RAM or non-volatile memory, such as at least one disk storage device.

The processing device 610 can execute the operating system of the electronic device 600 and the various installed applications, program code, and the like, for example the units described above: the first acquiring unit 301 (401, 501), the collecting unit 302 (402, 502), the second acquiring unit 303 (405, 505), the operating unit 304 (406, 506), the processing unit 403 (503), the control unit 404 (504), the optimization unit 507, and so on.

Program code is stored in the storage device 614, and the processing device 610 can call that program code over the communication bus 612 to perform the related functions. For example, the units described in FIG. 3, FIG. 4, and FIG. 5 (such as the first acquiring unit 301, the collecting unit 302, the second acquiring unit 303, and the operating unit 304) are program code stored in the storage device 614 and executed by the processing device 610, which implements the functions of those units and thereby the speech recognition processing.

In one embodiment of the present invention, the storage device 614 stores a plurality of instructions that are executed by the processing device 610 to implement the speech recognition method. Specifically, the processing device 610 acquires the state of the audio player of the electronic device; when the audio player is playing, the processing device 610 captures the live sound at the scene using the first microphone of the electronic device and captures the sound played by the audio player using the second microphone of the electronic device, wherein the live sound comprises the user's voice command and the sound played by the audio player; the processing device 610 recognizes the user's voice command from the live sound and the sound played by the audio player; and the processing device 610 operates the electronic device according to the recognized voice command of the user.

In a further embodiment, before the processing device 610 recognizes the user's voice command from the live sound and the sound played by the audio player, the processing device 610 fills the live sound into the first channel to obtain first audio data, and fills the sound played by the audio player into the second channel to obtain second audio data.

In a further embodiment, the processing device 610 recognizing the user's voice command from the live sound and the sound played by the audio player specifically includes:

The processing device 610 obtains first valid audio data from the first audio data using a frequency conversion algorithm;

The processing device 610 obtains second valid audio data from the second audio data using a frequency conversion algorithm;

The processing device 610 applies an automatic gain control (AGC) algorithm to the first valid audio data and the second valid audio data to cancel noise and recognize the user's voice command.

In a further embodiment, the processing device 610 keeps the data delay between the first channel and the second channel below a threshold by coordinating the first microphone and the second microphone.

In a further embodiment, when the audio player of the electronic device is not playing, the processing device 610 captures the user's voice using the first microphone and the second microphone; the processing device 610 applies an AGC algorithm to the user's voice to cancel noise and obtain the user's voice command; and the processing device 610 operates according to the user's voice command.

For how the processing device 610 specifically implements the above instructions, refer to the description of the related steps in the embodiments corresponding to FIG. 1 and FIG. 2, which is not repeated here.
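The overall control flow executed by the processing device 610 can be summarized in a short sketch: check the audio player state, capture from both microphones, choose the recognition path, and then operate the device. The function and parameter names below are hypothetical, and the capture, recognition, and operation steps are passed in as callables because the text does not define their interfaces.

```python
def handle_voice(speaker_playing, capture_mic1, capture_mic2,
                 recognize_with_reference, recognize_plain, operate):
    """Pick the recognition path from the audio player state, then act.

    All callables are placeholders supplied by the caller.
    """
    first = capture_mic1()
    second = capture_mic2()
    if speaker_playing:
        # Playing: mic 1 holds the live sound, mic 2 the played-back sound.
        command = recognize_with_reference(first, second)
    else:
        # Not playing: both microphones capture the user's voice.
        command = recognize_plain(first, second)
    operate(command)
    return command

# Usage with stub callables standing in for real audio and recognition code.
log = []
playing_cmd = handle_voice(
    True, lambda: "live", lambda: "ref",
    lambda a, b: f"ref:{a}|{b}", lambda a, b: f"plain:{a}|{b}", log.append,
)
idle_cmd = handle_voice(
    False, lambda: "live", lambda: "ref",
    lambda a, b: f"ref:{a}|{b}", lambda a, b: f"plain:{a}|{b}", log.append,
)
```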

It can be seen that the solution of this embodiment effectively overcomes the obstacles to speech recognition in complex scenarios. This embodiment also adds acquisition of the user's approval of speech recognition results and an optimization module that optimizes the speech recognition device based on the approval data, which greatly improves the user experience.

In the above embodiments, each description has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions in the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and an actual implementation may divide them differently: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Claims

1. A method for speech recognition, characterized in that the method comprises:

acquiring a state of an audio player of an electronic device;

when the audio player is in a playing state, capturing live sound at the scene using a first microphone of the electronic device, and capturing the sound played by the audio player using a second microphone of the electronic device, wherein the live sound comprises a voice command of a user and the sound played by the audio player;

recognizing the user's voice command from the live sound and the sound played by the audio player; and

operating the electronic device according to the recognized voice command of the user.

2. The method according to claim 1, characterized in that, before the user's voice command is recognized from the live sound and the sound played by the audio player, the method further comprises:

the electronic device filling the live sound into a first channel to obtain first audio data; and

the electronic device filling the sound played by the audio player into a second channel to obtain second audio data.

3. The method according to claim 2, characterized in that the electronic device recognizing the user's voice command from the live sound and the sound played by the audio player specifically comprises:

the electronic device obtaining first valid audio data from the first audio data using a frequency conversion algorithm;

the electronic device obtaining second valid audio data from the second audio data using a frequency conversion algorithm; and

the electronic device applying an automatic gain control (AGC) algorithm to the first valid audio data and the second valid audio data to cancel noise and recognize the user's voice command.

4. The method according to claim 2, characterized in that the method further comprises:

the electronic device keeping the data delay between the first channel and the second channel below a threshold by coordinating the first microphone and the second microphone.

5. The method according to claim 1, characterized in that the method further comprises:

when the audio player of the electronic device is in a non-playing state, the electronic device capturing the user's voice using the first microphone and the second microphone;

the electronic device applying an AGC algorithm to the user's voice to cancel noise and obtain the user's voice command; and

the electronic device operating according to the user's voice command.

6. A device for speech recognition, characterized in that the device comprises:

a first acquiring unit, configured to acquire a state of an audio player of an electronic device;

a collecting unit, configured to, when the audio player is in a playing state, capture live sound at the scene using a first microphone of the electronic device and capture the sound played by the audio player using a second microphone of the electronic device,

wherein the live sound comprises a voice command of a user and the sound played by the audio player;

a second acquiring unit, configured to recognize the user's voice command from the live sound and the sound played by the audio player; and

an operating unit, configured to operate the electronic device according to the recognized voice command of the user.

7. The device according to claim 6, characterized in that the device further comprises:

a processing unit, configured to fill the live sound into a first channel to obtain first audio data, and to fill the sound played by the audio player into a second channel to obtain second audio data.

8. The device according to claim 7, characterized in that the second acquiring unit is specifically configured to:

acquire the data of the first channel and the data of the second channel;

obtain a valid audio data stream from the data of the first channel and the data of the second channel using a frequency conversion method; and

apply an automatic gain control (AGC) algorithm to the valid audio data stream to cancel noise and obtain the user's voice command.

9. The device according to claim 6 or 7, characterized in that the device further comprises:

a control unit, configured to keep the data delay between the first channel and the second channel below a threshold by coordinating the first microphone and the second microphone.

10. The device according to claim 6, characterized in that:

the first acquiring unit is further configured so that, when the audio player is in a non-playing state, the electronic device captures the user's voice using the first microphone and the second microphone; and

an AGC algorithm is applied to the user's voice to cancel noise and obtain the user's voice command.