CN109036406A - Voice information processing method, apparatus, device and storage medium

Info

Publication number: CN109036406A
Application number: CN201810864520.5A
Authority: CN
Inventors: 干晓萍; 范思越
Original assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Current assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Priority date: 2018-08-01
Filing date: 2018-08-01
Publication date: 2018-12-18
Also published as: WO2020024620A1

Abstract

The embodiments of the invention disclose a voice information processing method, apparatus, device and storage medium. The method includes: after the voice function is turned on, receiving current voice information input by the user; if the current voice information does not match the reference voice information stored in the voice library, converting the current voice information into text information for display; obtaining an editing instruction for the text information, performing an editing operation on the text information according to the editing instruction, and using the new text information after the editing operation as the target text information; and storing the target text information in the voice library in correspondence with the current voice information. This technical solution addresses the limited recognition of different users' accents when voice is used to control electrical equipment. It improves user experience and also helps popularize voice input methods for electrical equipment.

Description

Voice information processing method, apparatus, device and storage medium

Technical field

Embodiments of the present invention relate to the field of speech recognition, and in particular to a voice information processing method, apparatus, device and storage medium.

Background

With the development of science and technology, making electrical equipment intelligent and user-friendly has become a matter of general concern, and such equipment makes operation far more convenient for users.

For the various electrical devices currently on the market, such as televisions and set-top boxes, human-computer interaction is generally limited to button presses; that is, the device is controlled through conventional soft keyboard input. Soft keyboard input is at present a common and popular input method. It is, however, relatively cumbersome to use. For example, to enter Chinese characters the user must type the pinyin of each character one by one. Users who do not know the Pinyin or Wubi input methods cannot use soft keyboard input at all.

Voice input is another popular input method. Although voice input is very convenient for users, accents differ between users in different regions, and electrical devices have difficulty recognizing different accents, which in turn makes it hard for voice input methods to become widespread.

Summary of the invention

Embodiments of the present invention provide a voice information processing method, apparatus, device and storage medium, to solve the problem that recognition of different users' accents is limited when voice is used to control electrical equipment.

In a first aspect, an embodiment of the present invention provides a voice information processing method, the method comprising:

after the voice function is turned on, receiving current voice information input by the user;

if the current voice information does not match the reference voice information stored in the voice library, converting the current voice information into text information for display;

obtaining an editing instruction for the text information, performing an editing operation on the text information according to the editing instruction, and using the new text information after the editing operation as the target text information;

storing the target text information in the voice library in correspondence with the current voice information.

In a second aspect, an embodiment of the present invention further provides a voice information processing apparatus, the apparatus comprising:

a current voice information acquisition module, configured to receive current voice information input by the user after the voice function is turned on;

a first display module, configured to convert the current voice information into text information for display if the current voice information does not match the reference voice information stored in the voice library;

a text information editing module, configured to obtain an editing instruction for the text information, perform an editing operation on the text information according to the editing instruction, and use the new text information after the editing operation as the target text information;

a storage module, configured to store the target text information in the voice library in correspondence with the current voice information.

In a third aspect, an embodiment of the present invention further provides a device, the device comprising:

one or more processors;

a storage apparatus, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice information processing method provided by any embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice information processing method provided by any embodiment of the present invention.

In the embodiments of the present invention, after the voice function is turned on, the current voice information input by the user is received; if it is determined that the current voice information does not match the reference voice information stored in the voice library, the current voice information is converted into text information for display. If the user then sees that the displayed text information does not correspond to the voice information the user uttered, the user can edit the text information so that it matches the uttered voice information. After obtaining the user's editing instruction for the text information, the electrical device performs the editing operation on the text information according to the editing instruction and uses the new text information after the editing operation as the target text information. Because the target text information is stored in the voice library in correspondence with the current voice information, when the user utters the same voice information again, the corresponding text information can be found in the preset voice library; if that text information corresponds to a control instruction of the electrical device, the device can be controlled to perform the control operation corresponding to the text information. With this technical solution, the electrical device can recognize the different accents of different users and perform the corresponding actions according to the recognition results, so that users with different accents in different regions can all control the device by voice. This improves user experience and also helps popularize voice input methods for electrical equipment.

Description of the drawings

FIG. 1 is a flowchart of a voice information processing method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a voice information processing method provided by Embodiment 2 of the present invention;

FIG. 3 is a structural block diagram of a voice information processing apparatus provided by Embodiment 3 of the present invention;

FIG. 4 is a schematic structural diagram of a device provided by Embodiment 4 of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment 1

FIG. 1 is a flowchart of a voice information processing method provided by Embodiment 1 of the present invention. The method may be executed by a voice information processing apparatus, which may be implemented in software and/or hardware. The apparatus may be integrated into electrical equipment such as televisions and air conditioners, or into mobile terminals such as smartphones and tablet computers. Referring to FIG. 1, the method of this embodiment specifically includes:

S110. After the voice function is turned on, receive current voice information input by the user.

The voice input function has two states: active and inactive. When the user wants to use the voice input method to communicate, the user can turn on the voice input function by pressing the voice button on the remote control of the electrical device.

Exemplarily, an identifier may be set for the state of the voice input function. In the active state, the flag bit of the identifier may be set to 1; in the inactive state, it may be set to 0. In this embodiment, whether the voice function is activated can be determined by reading the value of this flag bit.
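The flag-bit check described above can be sketched as follows. This is a minimal illustration only: the names `VoiceInputState`, `VoiceFunction`, `on_voice_button` and `is_active` are hypothetical and do not appear in the patent.

```python
from enum import IntEnum


class VoiceInputState(IntEnum):
    """Flag-bit values for the voice input function (names are hypothetical)."""
    INACTIVE = 0  # flag bit 0: voice input is off
    ACTIVE = 1    # flag bit 1: voice input is on


class VoiceFunction:
    def __init__(self) -> None:
        # The function starts in the inactive state.
        self.flag = VoiceInputState.INACTIVE

    def on_voice_button(self) -> None:
        """Assumed hook: called when the remote's voice button is pressed."""
        self.flag = VoiceInputState.ACTIVE

    def is_active(self) -> bool:
        """Detect activation by reading the flag bit."""
        return self.flag == VoiceInputState.ACTIVE
```

A caller would poll `is_active()` before opening the voice input panel.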

Exemplarily, detecting that the voice function has been turned on indicates that the user wishes to control the electrical device by voice. The device then activates the voice input method panel to receive the current voice information input by the user.

S120. If the current voice information does not match the reference voice information stored in the voice library, convert the current voice information into text information for display.

Exemplarily, the user inputs voice information mainly to make the electrical device perform an action, such as switching channels or adjusting the volume, so that voice replaces manual operation and improves user experience. When voice information is used to control an electrical device, different control instructions are generally set for different voice information, and the device performs the action corresponding to each control instruction.

Generally, an electrical device recognizes the user's voice information as Mandarin, and the voice control instructions built into the device generally correspond to Mandarin. If the received voice information is not Mandarin but a dialect characteristic of the user's region, the device cannot perform the corresponding control based on that voice information. Therefore, to ensure that different voice information is recognized correctly, the device displays the recognition result as text for the user to confirm. In this embodiment, a preset correspondence also exists between text information and the control instructions of the device: once the user has confirmed that the text information is correct and sent a confirmation instruction, the device, on receiving that instruction, can perform the control action corresponding to the text information.

Exemplarily, in this embodiment, technologies such as speech recognition, semantic parsing and speech synthesis may be used to convert the voice information input by the user into text information automatically. The text conversion serves the following purpose: from the converted text, the user can determine whether the electrical device has recognized the current voice information correctly, that is, whether what the device recognized is consistent with what the user originally intended to express. In addition, if the user wants to add or change something after inputting the voice information, the user can correct it in time, which avoids sending out voice information that is not yet complete.

Exemplarily, if the user finds that the displayed text information does not match what the user intended to say, the user can modify the text information so that it corresponds to the uttered voice information.

It should be noted that the voice library in this embodiment is mainly used to store the user's voice information and the corresponding text information, where the text information is text that matches the voice information uttered by the user. For example, if the user finds that the text displayed on the current interface of the electrical device does not match what the user intended to say, the user needs to modify the text; the text information stored in the voice library is the text that the user has modified so that it matches the voice information.

Exemplarily, in this embodiment, each time the electrical device receives current voice information, it matches that information against the voice information stored in the preset voice library. If the current voice information has been input by the current user before, that is, the preset voice library stores the voice information together with its corresponding text information, then even if the voice information is not in Mandarin, which the device is preset to recognize, the device can find the voice information in the preset voice library and display the corresponding text information. After the user confirms that the text is correct, that is, once the user's confirmation instruction is received, the control operation corresponding to the text information can be performed.

Preferably, before matching the current voice information against the reference voice information stored in the voice library, the electrical device preprocesses the voice information, for example with VAD (Voice Activity Detection) and echo cancellation. VAD removes the silence at the beginning and end of the speech signal to reduce interference with subsequent speech recognition. After preprocessing, features can be extracted from the speech signal using, for example, the linear prediction cepstral coefficient (LPCC) algorithm or the Mel-frequency cepstral coefficient (MFCC) algorithm; acoustic model and speech model techniques are then used to match the sound segments against the voice information stored in the voice library.
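The silence-trimming role of VAD described above can be illustrated with a very simple energy-based sketch. This is not the patent's VAD algorithm, which is unspecified; the function name `trim_silence`, the frame length and the energy threshold are all assumptions chosen for illustration, and a production VAD would be considerably more robust.

```python
def trim_silence(samples, frame_len=160, threshold=0.01):
    """Drop leading and trailing frames whose mean energy is below threshold.

    samples: list of floats in [-1.0, 1.0] (assumed normalized PCM).
    frame_len and threshold are illustrative values, not from the patent.
    """
    # Split the signal into fixed-size frames (the last one may be shorter).
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    # Indices of frames whose mean squared amplitude reaches the threshold.
    voiced = [
        idx for idx, frame in enumerate(frames)
        if sum(x * x for x in frame) / len(frame) >= threshold
    ]
    if not voiced:
        return []  # nothing but silence
    first, last = voiced[0], voiced[-1]
    # Keep everything from the first voiced frame through the last voiced frame.
    return samples[first * frame_len:(last + 1) * frame_len]
```

Only the head and tail are cut; quiet frames between two voiced frames are kept, which matches the description of removing silence at the beginning and end of the signal.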

S130. Obtain an editing instruction for the text information, perform an editing operation on the text information according to the editing instruction, and use the new text information after the editing operation as the target text information.

Exemplarily, the preset voice library of this embodiment may store voice information that the user has already input together with the corresponding text information. Whenever the electrical device detects voice information, it recognizes the received voice information and converts it into text information for display, so that the user can confirm and correct it. If an editing instruction is received from the user, the device's recognition result does not match what the user intended to say. In that case, the editing operation is performed on the text according to the editing instruction, the new text information after the editing operation is used as the target text information, and the user's current voice information and the target text information are stored in the preset voice library in correspondence with each other.

S140. Store the target text information in the voice library in correspondence with the current voice information.

Exemplarily, in this embodiment the target text information and the current voice information are stored in the voice library in correspondence with each other. If the electrical device later receives voice information identical to the current voice information, it can recognize that voice information accurately on the basis of the voice library, find the corresponding text information in the voice library, and display it. This solves the problem that electrical devices have difficulty recognizing different accents and helps promote the voice input function of electrical equipment.

Exemplarily, if the voice information received by the electrical device is being input by the user for the first time, it does not match the voice information stored in the voice library. It can then be converted into text in the manner provided by the embodiments of the present invention; if the user edits the converted text information, the edited target text information and the corresponding voice information are stored in the voice library in correspondence with each other.

In the technical solution of this embodiment, the electrical device matches the received current voice information against the reference voice information stored in the voice library. If the two do not match, the current voice information is converted into text information for display, and the target text information obtained by editing that text is stored in the voice library in correspondence with the current voice information. When the electrical device receives the same voice information again, it can recognize it and perform the corresponding action even if the voice information carries the user's accent. With this technical solution, users with different accents in different regions can all control electrical equipment by voice, which improves user experience and also helps popularize voice input methods for electrical equipment.
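The lookup, edit and store steps of this embodiment can be sketched as below. The sketch rests on several assumptions that are not in the patent: voice information is represented by an opaque hashable key (`voice_key`) standing in for real acoustic matching, and `recognize` and `edit` are hypothetical callbacks for the speech-to-text step and the user's editing step.

```python
class VoiceLibrary:
    """Toy voice library: maps a voice key to its user-corrected text."""

    def __init__(self):
        self._entries = {}  # voice key -> target text

    def lookup(self, voice_key):
        # Returns None when the voice information has not been stored.
        return self._entries.get(voice_key)

    def store(self, voice_key, target_text):
        self._entries[voice_key] = target_text


def process_voice(library, voice_key, recognize, edit):
    """Return the text to display for one utterance.

    recognize(voice_key) -> str: default speech-to-text result (S120).
    edit(text) -> str or None: user's edited text, or None if not edited (S130).
    """
    stored = library.lookup(voice_key)
    if stored is not None:
        return stored                 # matched: reuse the stored text
    text = recognize(voice_key)       # S120: convert to text for display
    target = edit(text)               # S130: apply the user's edit, if any
    if target is None:
        return text                   # no edit: nothing is stored
    library.store(voice_key, target)  # S140: store text with the voice info
    return target
```

On the second occurrence of the same key, the corrected text is returned directly without calling `recognize` again.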

Embodiment 2

FIG. 2 is a flowchart of a voice information processing method provided by Embodiment 2 of the present invention. This embodiment is an optimization of the above embodiment; terms that are the same as or correspond to those in the above embodiment are not explained again here. Referring to FIG. 2, the method provided by this embodiment includes:

S210. After the voice function is turned on, receive current voice information input by the user.

Exemplarily, since voice input offers a better user experience than manual operation, the electrical device recommends voice input first in order to improve user experience. However, after the voice function is turned on, if no current voice information input by the user is received within a set time, the current voice input interface is switched to a text input interface so that the user can input text.

The set time may be a time set before the electrical device leaves the factory, for example 30 seconds, or a time set by the user according to the user's own needs.
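The timeout fallback described above can be sketched with the standard-library `queue` module. The function name `wait_for_voice`, the tuple return values and the use of a queue to deliver captured audio are assumptions for illustration; only the 30-second example value comes from the text.

```python
import queue

DEFAULT_TIMEOUT_S = 30.0  # example factory value mentioned in the text


def wait_for_voice(voice_queue, timeout_s=DEFAULT_TIMEOUT_S):
    """Wait for voice input; fall back to text input on timeout.

    Returns ("voice", data) if audio arrives within timeout_s seconds,
    otherwise ("text_input", None) to signal the interface switch.
    """
    try:
        data = voice_queue.get(timeout=timeout_s)
    except queue.Empty:
        # No voice information within the set time: switch interfaces.
        return ("text_input", None)
    return ("voice", data)
```

A user-configured time would simply be passed as `timeout_s` in place of the factory default.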

S220. Determine whether the current voice information matches the reference voice information stored in the voice library; if so, perform step S230; otherwise, perform step S250.

Exemplarily, the operation of determining that the current voice information matches the reference voice information stored in the voice library may be:

preprocessing the voice information based on a preset speech recognition algorithm to obtain multiple voice segments; comparing, based on a preset acoustic model, the similarity between the multiple voice segments and the voice information stored in the voice library; and, if the similarity reaches a set threshold, determining that the current voice information matches the voice information stored in the voice library.

The preset speech recognition algorithm comprises algorithms such as VAD, echo cancellation and speech splitting, through which multiple voice segments are obtained. The preset acoustic model may be a hidden Markov model, through which acoustic features can be extracted from the voice segments and compared for similarity with the voice information stored in the voice library. The set threshold is an empirical value, preferably 95%.
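The threshold test can be illustrated as follows. Note the substitution: the patent names a hidden Markov model for the acoustic comparison, whereas this sketch uses plain cosine similarity between fixed-length feature vectors purely to show how a 95% threshold gates a match. The feature vectors, `cosine_similarity` and `find_match` are hypothetical.

```python
import math

MATCH_THRESHOLD = 0.95  # preferred empirical value from the text


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def find_match(features, library, threshold=MATCH_THRESHOLD):
    """Return the stored text of the best entry at or above threshold, else None.

    library: iterable of (reference_features, text) pairs.
    """
    best_text, best_score = None, threshold
    for ref_features, text in library:
        score = cosine_similarity(features, ref_features)
        if score >= best_score:
            best_text, best_score = text, score
    return best_text
```

A `None` result corresponds to the "no match" branch (step S250); any other result corresponds to the "match" branch (step S230).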

S230. Display the text information corresponding to the current voice information, and proceed to step S240.

S240. If a confirmation instruction is received from the user, control the current device to perform the control operation corresponding to the text information.

Optionally, the text information corresponding to the current voice information may be displayed as follows: query the voice library for the text information corresponding to the reference voice information that matches the current voice information, and display that text information. Alternatively, if the current voice information is Mandarin speech, it can be converted directly into text information for display. In this embodiment, the text information is displayed so that the user can check whether the electrical device has recognized the voice information accurately; the device performs the control operation corresponding to the text information only after receiving the user's confirmation instruction.

Exemplarily, the confirmation instruction may be issued by the user through the remote control, for example by pressing the confirm button on the remote control. Alternatively, it may be voice information that the electrical device recognizes as containing a confirmation marker; that is, the user utters voice information containing a confirmation marker such as "OK" or "确认" ("confirm").
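The two confirmation paths can be sketched as a single predicate. The event names `"remote_key"` and `"voice_text"`, the key code `"confirm"` and the function `is_confirmation` are hypothetical; only the markers "OK" and "确认" come from the text, and the English word "confirm" is added here as an assumed equivalent.

```python
import re

# Whole-word match for the English markers; plain substring match for "确认".
CONFIRM_PATTERN = re.compile(r"\b(ok|confirm)\b|确认", re.IGNORECASE)


def is_confirmation(event_type, payload):
    """Return True if the event counts as the user's confirmation instruction.

    event_type "remote_key": payload is a key name from the remote control.
    event_type "voice_text": payload is the recognized text of an utterance.
    """
    if event_type == "remote_key":
        return payload == "confirm"
    if event_type == "voice_text":
        return CONFIRM_PATTERN.search(payload) is not None
    return False
```

The word-boundary anchors keep words that merely contain "ok", such as "look", from being treated as a confirmation.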

S250. Convert the current voice information into text information for display, and proceed to step S260.

S260. Obtain an editing instruction for the text information, perform an editing operation on the text information according to the editing instruction, and use the new text information after the editing operation as the target text information.

Exemplarily, the text input method provided in this embodiment has an intelligent memory function; that is, the electrical device stores words according to how frequently the user uses them. When entering text, the user only needs to input the first character. If the device detects that the first character matches the first character of multiple locally stored target words, it displays those target words in order of decreasing usage frequency; each of the target words is a word whose usage frequency has reached a preset frequency.
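The first-character suggestion step can be sketched as below. The function name `suggest`, the dictionary of usage counts and the default `min_count` are assumptions; the patent does not specify how usage frequency is stored or what the preset frequency is.

```python
def suggest(first_char, word_counts, min_count=2):
    """Return stored words starting with first_char, most frequently used first.

    word_counts: dict mapping each stored word to its usage count.
    min_count: stand-in for the preset frequency a word must have reached.
    """
    candidates = [
        (word, count) for word, count in word_counts.items()
        if word.startswith(first_char) and count >= min_count
    ]
    # Sort by decreasing count; break ties alphabetically for a stable order.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates]
```

Words below `min_count` are stored but not offered, which mirrors the requirement that displayed target words have reached the preset frequency.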

Further, since local storage space is limited, the electrical device automatically clears less frequently used words or phrases before the used storage reaches the preset maximum storage capacity. The clearing rule is preferably as follows: the device sorts the locally stored words and phrases by usage frequency and clears those ranked last first; each clearing removes words or phrases amounting to twenty percent of the total vocabulary capacity.
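The clearing rule can be sketched as follows, under two simplifying assumptions that are not in the patent: capacity is measured as a number of entries rather than bytes, and clearing is triggered when the entry count reaches a hypothetical `max_entries` limit set below the true maximum. The function name `evict_least_used` is also hypothetical.

```python
import math


def evict_least_used(word_counts, max_entries, fraction=0.2):
    """Remove the least-used words once the entry limit is reached.

    word_counts: dict mapping word -> usage count (modified in place).
    max_entries: assumed trigger point, chosen below the real capacity.
    fraction: share of entries cleared each time (twenty percent in the text).
    Returns the list of removed words.
    """
    if len(word_counts) < max_entries:
        return []  # still below the trigger point
    n_remove = max(1, math.floor(len(word_counts) * fraction))
    # Rank by ascending usage count; ties are broken alphabetically.
    ranked = sorted(word_counts, key=lambda w: (word_counts[w], w))
    removed = ranked[:n_remove]
    for word in removed:
        del word_counts[word]
    return removed
```

With ten stored words and `max_entries=10`, one call removes the two least-used words and a second call is a no-op until the store fills up again.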

S270. Store the target text information in the voice library in correspondence with the current voice information.

Building on the above embodiment, this embodiment combines voice input with an input method that has an intelligent memory function. It records the voice information that the electrical device recognized incorrectly together with the correct target text information as modified by the user, and stores the correct text information and the voice information in the voice library in correspondence with each other. When the user inputs the same voice information again, the device can automatically recognize it on the basis of the content stored in the voice library and, after receiving the user's confirmation instruction, control the current device to perform the control operation corresponding to the target text information.

Embodiment 3

FIG. 3 is a structural block diagram of a voice information processing apparatus provided by Embodiment 3 of the present invention. As shown in FIG. 3, the apparatus includes: a current voice information acquisition module 310, a first display module 320, a text information editing module 330 and a storage module 340.

The current voice information acquisition module 310 is configured to receive current voice information input by the user after the voice function is turned on;

the first display module 320 is configured to convert the current voice information into text information for display if the current voice information does not match the reference voice information stored in the voice library;

the text information editing module 330 is configured to obtain an editing instruction for the text information, perform an editing operation on the text information according to the editing instruction, and use the new text information after the editing operation as the target text information;

the storage module 340 is configured to store the target text information in the voice library in correspondence with the current voice information.

On the basis of the foregoing embodiments, the device further includes:

A second display module, configured to display the text information corresponding to the current voice information if the current voice information matches the reference voice information stored in the voice library;

A control module, configured to control the current device to perform a control operation corresponding to the text information if a confirmation instruction from the user is received;

An interface switching module, configured to switch the current voice input interface to a text input interface, so that the user can input text, if no current voice information input by the user is received within a set time after the voice function is turned on.
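One way to picture the interface switching module is a blocking wait with a timeout. The sketch below assumes, purely for illustration, that captured voice input arrives on a `queue.Queue`; the function name and the `"voice"`/`"text"` labels are hypothetical.

```python
import queue


def await_voice_or_switch(voice_queue, timeout_s):
    """Wait up to timeout_s seconds for voice input.

    Returns ("voice", data) if input arrived within the set time, otherwise
    ("text", None) to signal that the text input interface should be shown.
    """
    try:
        data = voice_queue.get(timeout=timeout_s)
        return "voice", data
    except queue.Empty:
        # No voice input within the set time: fall back to text input.
        return "text", None
```

The set time is left as a parameter because the patent does not specify a value.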

On the basis of the foregoing embodiments, the second display module is specifically configured to:

Preprocess the current voice information based on a preset voice recognition algorithm to obtain multiple voice segments;

Compare, based on a preset acoustic model, the similarity between the multiple voice segments and the reference voice information stored in the voice library;

If the similarity reaches a set threshold, determine that the current voice information matches the reference voice information stored in the voice library;

Display the text information corresponding to the current voice information.
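The threshold comparison in the steps above can be illustrated with a small sketch. The patent does not name the acoustic model or the similarity measure, so everything here is an assumption: each voice segment is taken to be a numeric feature vector, similarity is the mean cosine similarity across aligned segments, and `match_reference`, `references`, and the default threshold of 0.9 are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_reference(
    segments: List[Vector],
    references: Dict[str, List[Vector]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the stored text whose reference segments best match, or None."""
    if not segments:
        return None
    best_text, best_score = None, 0.0
    for text, ref_segments in references.items():
        if len(ref_segments) != len(segments):
            continue  # this sketch only compares equally segmented utterances
        score = sum(
            cosine_similarity(s, r) for s, r in zip(segments, ref_segments)
        ) / len(segments)
        if score > best_score:
            best_text, best_score = text, score
    # The utterance matches only if the best similarity reaches the threshold.
    return best_text if best_score >= threshold else None
```

A return value of `None` corresponds to the unmatched branch, in which the current voice information is converted to text for the user to edit.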

On the basis of the foregoing embodiments, the confirmation instruction is:

A confirmation instruction issued by the user through a remote control; or,

Voice information containing a confirmation identifier.

On the basis of the foregoing embodiments, the device further includes:

A first character recognition module, configured to recognize the first character entered by the user when the user enters text;

A vocabulary display module, configured to display multiple target words in descending order of usage frequency if the first character matches the first characters of multiple locally stored target words, wherein each of the multiple target words is a word whose usage frequency reaches a preset frequency.
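The vocabulary display behaviour can be sketched as a filter followed by a sort. The function name `suggest`, the `usage_counts` dictionary, and the `min_frequency` parameter standing in for the "preset frequency" are illustrative assumptions.

```python
from typing import Dict, List


def suggest(first_char: str, usage_counts: Dict[str, int], min_frequency: int) -> List[str]:
    """List stored words starting with first_char, most frequently used first.

    Only words whose usage frequency reaches min_frequency are candidates.
    """
    if not first_char:
        return []
    candidates = [
        (word, count)
        for word, count in usage_counts.items()
        if count >= min_frequency and word.startswith(first_char)
    ]
    # Descending usage frequency, as described for the display order.
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in candidates]
```

For example, with stored counts for "volume" (12), "video" (30), "voice" (5), and "channel" (40) and a preset frequency of 10, typing "v" yields "video" followed by "volume".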

The voice information processing device provided by this embodiment of the present invention can execute the voice information processing method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method. For technical details not described exhaustively in this embodiment, refer to the voice information processing method provided by any embodiment of the present invention.

Embodiment 4

FIG. 4 is a schematic structural diagram of a device provided by Embodiment 4 of the present invention. FIG. 4 shows a block diagram of an exemplary device 12 suitable for implementing embodiments of the present invention. The device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 4, device 12 takes the form of a general-purpose computing device. Components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Device 12 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by device 12, including volatile and non-volatile media, and removable and non-removable media.

System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 4, commonly called a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical medium) may be provided. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set of (e.g., at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.

Device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24), with one or more devices that enable a user to interact with device 12, and/or with any device (e.g., a network card, a modem) that enables device 12 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interface 22. Device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of device 12 via bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

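Processing unit 16 executes various functional applications and data processing by running programs stored in system memory 28, for example implementing the voice information processing method provided by any embodiment of the present invention. The method includes: after the voice function is turned on, receiving the current voice information input by the user; if the current voice information does not match the reference voice information stored in the voice library, converting the current voice information into text information for display; acquiring an editing instruction for the text information, performing an editing operation on the text information according to the editing instruction, and using the new text information after the editing operation as the target text information; and storing the target text information and the current voice information in the voice library in correspondence with each other.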

Embodiment 5

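Embodiment 5 of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the voice information processing method provided by any embodiment of the present invention. The method includes: after the voice function is turned on, receiving the current voice information input by the user; if the current voice information does not match the reference voice information stored in the voice library, converting the current voice information into text information for display; acquiring an editing instruction for the text information, performing an editing operation on the text information according to the editing instruction, and using the new text information after the editing operation as the target text information; and storing the target text information and the current voice information in the voice library in correspondence with each other.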

The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from its concept; the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A processing method for voice information, comprising:

After the voice function is turned on, receiving the current voice information input by the user;

If the current voice information does not match the reference voice information stored in the voice library, converting the current voice information into text information for display;

Acquiring an editing instruction for the text information, performing an editing operation on the text information according to the editing instruction, and using the new text information after the editing operation as the target text information;

Storing the target text information and the current voice information in the voice library in correspondence with each other.

2. The method according to claim 1, further comprising:

If the current voice information matches the reference voice information stored in the voice library, displaying the text information corresponding to the current voice information;

If a confirmation instruction from the user is received, controlling the current device to perform a control operation corresponding to the text information.

3. The method according to claim 1, further comprising:

After the voice function is turned on, if no current voice information input by the user is received within a set time, switching the current voice input interface to a text input interface for the user to input text.

4. The method according to claim 2, wherein the current voice information matching the reference voice information stored in the voice library comprises:

Preprocessing the current voice information based on a preset voice recognition algorithm to obtain multiple voice segments;

Comparing, based on a preset acoustic model, the similarity between the multiple voice segments and the reference voice information stored in the voice library;

If the similarity reaches a set threshold, determining that the current voice information matches the reference voice information stored in the voice library.

5. The method according to claim 2, wherein the confirmation instruction is:

A confirmation instruction issued by the user through a remote control; or,

Voice information containing a confirmation identifier.

6. The method according to claim 3, further comprising:

When the user enters text, recognizing the first character entered by the user;

If the first character matches the first characters of a plurality of locally stored target words, displaying the plurality of target words in descending order of usage frequency, wherein each of the plurality of target words is a word whose usage frequency reaches a preset frequency.

7. A processing device for voice information, comprising:

A current voice information acquisition module, configured to receive the current voice information input by the user after the voice function is turned on;

A first display module, configured to convert the current voice information into text information for display if the current voice information does not match the reference voice information stored in the voice library;

A text information editing module, configured to acquire an editing instruction for the text information, perform an editing operation on the text information according to the editing instruction, and use the new text information after the editing operation as the target text information;

A storage module, configured to store the target text information and the current voice information in the voice library in correspondence with each other.

8. The device according to claim 7, further comprising:

A second display module, configured to display the text information corresponding to the current voice information if the current voice information matches the reference voice information stored in the voice library;

A control module, configured to control the current device to perform a control operation corresponding to the text information if a confirmation instruction from the user is received.

9. A device, characterized in that the device comprises:

One or more processors;

A storage device, configured to store one or more programs,

Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice information processing method according to any one of claims 1-6.

10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the voice information processing method according to any one of claims 1-6.