CN113808588B - Intelligent voice navigation method, device and system, and computer-readable storage medium - Google Patents

Info

Publication number
CN113808588B
Authority
CN
China
Prior art keywords
voice
intelligent
semantic
short code
interactive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010534986.6A
Other languages
Chinese (zh)
Other versions
CN113808588A (en)
Inventor
董斌
朱云峰
彭倩
张小凡
林玮玮
张�杰
陆东明
严秋红
蔡林俊
李贵阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202010534986.6A
Publication of CN113808588A
Application granted
Publication of CN113808588B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/50: Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/51: Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5166: Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing in combination with interactive voice response systems or voice portals, e.g. as front-ends

Landscapes

  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Navigation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present disclosure relates to an intelligent voice navigation method, device and system, and a computer-readable storage medium. The intelligent voice navigation method comprises: a queuing machine sends a user voice stream to a voice recognition device and a semantic recognition device; the semantic recognition device sends the voice semantic recognition content of the user voice stream to an intelligent voice navigation device; the intelligent voice navigation device generates a short code identifier according to the voice semantic recognition content and establishes a correspondence between the voice semantic recognition content and the short code identifier; the intelligent voice navigation device returns the short code identifier to the queuing machine; the queuing machine sends the short code identifier to an interactive voice response server; and the interactive voice response server obtains the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier. Based on a custom XID short code mechanism, the present disclosure solves the problem that mainstream queuing machines cannot carry the recognized text of intelligent navigation because it exceeds their length limit.

Description

Intelligent voice navigation method, device and system and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to an intelligent voice navigation method, apparatus and system, and a computer readable storage medium.
Background
In the related art, many call centers implement their flows with IVR (Interactive Voice Response): the user's key-press information (a code) is passed through the voice queuing machine to the IVR server, and the IVR server drives the next action. The data field transferred between the queuing machine and the IVR server therefore did not need to be long; for example, a Huazhi call center supports only 23 bytes, i.e. at most 11 Chinese characters.
Under this constraint, intelligent voice navigation must transmit the interaction semantic information obtained by analyzing the user's speech, and this recognition information often exceeds the queuing machine's byte limit, so it cannot be delivered to the IVR server for flow node execution.
This limitation exists in call centers built before 2016 on models such as the UAP8100, the UAP3100 and the Zhongxing Communication MS10. It makes intelligent voice navigation difficult for traditional call centers, and the related technical solutions require substantial changes to the IVR flow Code mechanism.
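The byte budget described above can be illustrated with a minimal sketch. It assumes the queuing machine counts bytes in a two-byte-per-character Chinese encoding such as GBK, which would be consistent with "23 bytes, at most 11 Chinese characters"; the source does not name the encoding, and the function name and sample utterance are hypothetical.

```python
# Assumption: bytes are counted in a 2-byte-per-character Chinese encoding
# such as GBK; the source states only the 23-byte limit.
MAX_PAYLOAD_BYTES = 23  # limit quoted in the Background


def fits_queuing_machine(text: str, encoding: str = "gbk") -> bool:
    """Return True if the encoded text fits within the byte limit."""
    return len(text.encode(encoding)) <= MAX_PAYLOAD_BYTES


key_code = "01002"  # a short key-press style code
# Hypothetical recognized sentence:
# "I want to check this phone's bill for February of this year"
utterance = "我想查询本机今年二月份的话费账单"

print(len(key_code.encode("gbk")), fits_queuing_machine(key_code))
print(len(utterance.encode("gbk")), fits_queuing_machine(utterance))
```

Under this assumption a key code passes easily, while a single recognized sentence of 16 characters already needs 32 bytes and is rejected.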
Disclosure of Invention
In view of at least one of the above technical problems, the present disclosure provides an intelligent voice navigation method, apparatus and system, and a computer-readable storage medium, which use a custom XID short code mechanism to solve the problem that mainstream queuing machines cannot carry the recognized text of intelligent navigation because it exceeds their length limit.
According to one aspect of the present disclosure, there is provided an intelligent voice navigation method, including:
the queuing machine sends the user voice stream to a voice recognition device and a semantic recognition device;
the semantic recognition device sends the voice semantic recognition content of the user voice stream to the intelligent voice navigation device;
the intelligent voice navigation device generates a short code identifier according to the voice semantic recognition content, and establishes a correspondence between the voice semantic recognition content and the short code identifier;
the intelligent voice navigation device returns the short code identifier to the queuing machine;
the queuing machine sends the short code identifier to the interactive voice response server; and
the interactive voice response server acquires the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
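The steps above can be sketched as a small simulation. This is an illustration under stated assumptions, not the patented implementation: the class and function names, the XID format (a letter plus 12 hexadecimal characters) and the sample semantic XML are all hypothetical, and a plain dictionary stands in for the storage.

```python
import secrets

MAX_PAYLOAD_BYTES = 23  # queuing-machine limit quoted in the Background


class NavigationDevice:
    """Hypothetical stand-in for the intelligent voice navigation device."""

    def __init__(self):
        self._store = {}  # XID -> voice semantic recognition content

    def register(self, content: str) -> str:
        """Generate a short code identifier and remember what it stands for."""
        xid = "X" + secrets.token_hex(6)  # 13 ASCII characters; format is assumed
        self._store[xid] = content
        return xid

    def resolve(self, xid: str) -> str:
        """Return the content that was registered under the XID."""
        return self._store[xid]


def queuing_machine_forward(payload: str) -> str:
    """Forward a payload toward the IVR server, enforcing the byte limit."""
    if len(payload.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds queuing machine limit")
    return payload


device = NavigationDevice()
semantic_xml = "<semantic><intent>query_bill</intent><month>2</month></semantic>"

xid = device.register(semantic_xml)       # generate XID and store the mapping
received = queuing_machine_forward(xid)   # queuing machine -> IVR server
print(device.resolve(received) == semantic_xml)  # IVR server fetches content by XID
```

Passing `semantic_xml` itself to `queuing_machine_forward` raises `ValueError`, which is the situation the short code is meant to avoid.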
In some embodiments of the present disclosure, the interactive voice response server acquiring the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier includes:
the intelligent voice navigation device acquires the corresponding voice semantic recognition content according to the short code identifier;
the intelligent voice navigation device generates the corresponding interactive voice flow execution node according to the voice semantic recognition content;
the intelligent voice navigation device encapsulates the interactive voice flow execution node into flow node Voice eXtensible Markup Language (VXML) that the interactive voice response server can execute; and
the intelligent voice navigation device returns the flow node VXML to the interactive voice response server.
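As a rough illustration of the encapsulation step, the sketch below wraps one flow execution node in a minimal VoiceXML document using Python's standard `xml.etree.ElementTree`. The source does not specify the VXML layout, so the choice of `form`, `block`, `prompt` and `goto` elements, the node identifier, the prompt text and the target URI are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_flow_node_vxml(node_id: str, prompt_text: str, next_uri: str) -> str:
    """Wrap one IVR flow execution node in a minimal VoiceXML document."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": node_id})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text                       # text the IVR should speak
    ET.SubElement(block, "goto", {"next": next_uri})  # hand over to the next node
    return ET.tostring(vxml, encoding="unicode")


# Hypothetical node: identifiers and URI are illustrative only.
document = build_flow_node_vxml(
    node_id="node_01002",
    prompt_text="Which month of this year would you like to query?",
    next_uri="http://ivr.example/flow/next",
)
print(document)
```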
In some embodiments of the present disclosure, the intelligent voice navigation method further comprises:
for a scenario of multi-round interaction in intelligent voice navigation, the intelligent voice navigation device cooperates with the interactive voice response server and, when converting voice semantic recognition content into a short code identifier, establishes association logic for the semantic context, so as to achieve completion recognition based on multiple semantic slots.
In some embodiments of the present disclosure, the queuing machine sending the user voice stream to the voice recognition device and the semantic recognition device includes:
the queuing machine sends the user voice stream to the interactive voice response server to request voice allocation;
the interactive voice response server instructs the queuing machine to send the user voice stream to the voice recognition device; and
the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the semantic recognition device.
According to another aspect of the present disclosure, there is provided an intelligent voice navigation apparatus, comprising:
a short code conversion module, configured to receive the voice semantic recognition content of a user voice stream, wherein the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device, and the semantic recognition device outputs the voice semantic recognition content of the user voice stream to the intelligent voice navigation device; to generate a short code identifier according to the voice semantic recognition content; to establish a correspondence between the voice semantic recognition content and the short code identifier; and to return the short code identifier to the queuing machine so as to instruct the queuing machine to send the short code identifier to the interactive voice response server; and
a short code decoding module, configured to parse the short code identifier received from the interactive voice response server, acquire the voice semantic recognition content corresponding to the short code identifier, and return that content to the interactive voice response server.
In some embodiments of the present disclosure, the short code decoding module is configured to acquire the corresponding voice semantic recognition content according to the short code identifier, generate the corresponding interactive voice flow execution node according to the voice semantic recognition content, encapsulate the interactive voice flow execution node into flow node VXML executable by the interactive voice response server, and return the flow node VXML to the interactive voice response server.
In some embodiments of the present disclosure, the short code conversion module and the short code decoding module are configured to, for a scenario of multi-round interaction in intelligent voice navigation, cooperate with the interactive voice response server and, when converting voice semantic recognition content into a short code identifier, establish association logic for the semantic context, so as to achieve completion recognition based on multiple semantic slots.
In some embodiments of the present disclosure, the intelligent voice navigation apparatus further comprises:
a message middleware module, configured to store the correspondence between the voice semantic recognition content and the short code identifier.
According to another aspect of the present disclosure, there is provided an intelligent voice navigation system, comprising:
a queuing machine, configured to send the user voice stream to the voice recognition device and the semantic recognition device;
a semantic recognition device, configured to send the voice semantic recognition content of the user voice stream to the intelligent voice navigation device;
an intelligent voice navigation device, configured to generate a short code identifier according to the voice semantic recognition content, establish a correspondence between the voice semantic recognition content and the short code identifier, and return the short code identifier to the queuing machine so as to instruct the queuing machine to send the short code identifier to the interactive voice response server; and
an interactive voice response server, configured to acquire the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
In some embodiments of the present disclosure, the intelligent voice navigation system further comprises:
the queuing machine is further configured to send the user voice stream to the interactive voice response server to request voice allocation;
the interactive voice response server is further configured to instruct the queuing machine to send the user voice stream to the voice recognition device; and
a voice recognition device, configured to perform voice recognition on the user voice stream and send the recognized text to the semantic recognition device.
In some embodiments of the disclosure, the intelligent voice navigation apparatus is an intelligent voice navigation apparatus as described in any of the embodiments above.
According to another aspect of the present disclosure, there is provided a computer readable storage medium storing computer instructions that when executed by a processor implement the intelligent voice navigation method according to any one of the embodiments above.
Based on a custom XID short code mechanism, the present disclosure solves the problem that mainstream queuing machines cannot carry the recognized text of intelligent navigation because it exceeds their length limit.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
Fig. 1 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation method.
Fig. 2 is a schematic diagram of other embodiments of the intelligent voice navigation method of the present disclosure.
Fig. 3 is a schematic diagram of a short code identification multi-round interaction mechanism in some embodiments of the present disclosure.
Fig. 4 is a schematic diagram of some embodiments of an intelligent voice navigation device of the present disclosure.
Fig. 5 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation system.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation method. Preferably, the present embodiment may be performed by the disclosed intelligent voice navigation apparatus or the disclosed intelligent voice navigation system. The method may comprise steps 11-16, wherein:
Step 11, the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device.
In some embodiments of the present disclosure, step 11 may include steps 111-113, wherein:
Step 111, the queuing machine sends the user voice stream to the interactive voice response server, requesting voice allocation.
Step 112, the interactive voice response server instructs the queuing machine to send the user voice stream to the voice recognition device.
Step 113, the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the semantic recognition device.
Step 12, the semantic recognition device sends the voice semantic recognition content of the user voice stream to the intelligent voice navigation device.
Step 13, the intelligent voice navigation device generates a short code identifier XID according to the voice semantic recognition content, and uses a Redis (Remote Dictionary Server) in-memory server to establish the correspondence between the voice semantic recognition content and the short code identifier.
In some embodiments of the present disclosure, Redis is a mature, open-source data caching service widely used for fast in-memory reads and writes, and Redis clusters enable efficient real-time message stack management. The embodiments of the present disclosure build the transmission scheduling of high-concurrency intelligent navigation interaction semantic XML data messages on Redis.
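The temporary XID-to-content mapping can be sketched with a dictionary-backed stand-in so that the example runs without a Redis server. The class below is hypothetical; it only mirrors the `set(name, value, ex=seconds)` and `get(name)` call shape of the redis-py client. The key prefix and the five-minute lifetime are assumptions, since the source says only that the relationship is temporary.

```python
import time


class InMemoryCache:
    """Dict-backed stand-in for a Redis client (set with expiry, get)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be exercised in tests

    def set(self, name, value, ex=None):
        """Store a value, optionally expiring after `ex` seconds."""
        expires_at = None if ex is None else self._clock() + ex
        self._data[name] = (value, expires_at)
        return True

    def get(self, name):
        """Return the stored value, or None if missing or expired."""
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[name]
            return None
        return value


XID_TTL_SECONDS = 300  # assumed lifetime of one call's mapping

cache = InMemoryCache()
cache.set(
    "xid:X7f3a9c",  # hypothetical key layout
    "<semantic><intent>query_bill</intent></semantic>",
    ex=XID_TTL_SECONDS,
)
print(cache.get("xid:X7f3a9c"))
print(cache.get("xid:unknown"))
```

An expiry keeps mappings from accumulating after calls end; whether the patented system expires entries this way is not stated in the source.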
Step 14, the intelligent voice navigation device returns the short code identifier to the queuing machine.
Step 15, the queuing machine sends the short code identifier to the interactive voice response server.
Step 16, the interactive voice response server acquires the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
In some embodiments of the present disclosure, step 16 may include steps 161-164, wherein:
step 161, the intelligent voice navigation device obtains the corresponding voice semantic recognition content according to the short code identification.
Step 162, the intelligent voice navigation device generates corresponding interactive voice flow executing nodes according to the voice semantic recognition content.
Step 163, the intelligent voice navigation device encapsulates the interactive voice flow execution node into flow node Voice eXtensible Markup Language (VXML) executable by the interactive voice response server.
Step 164, the intelligent voice navigation device returns the flow node VXML to the interactive voice response server.
In some embodiments of the present disclosure, the intelligent voice navigation method may further include: for a scenario of multi-round interaction in intelligent voice navigation, the intelligent voice navigation device cooperates with the interactive voice response server and, when converting voice semantic recognition content into a short code identifier, establishes association logic for the semantic context, so as to achieve completion recognition based on multiple semantic slots.
The intelligent voice navigation method provided by the above embodiments does not change the original Code mechanism or the core architecture. Using the Redis in-memory server, the embodiments define custom XID short codes and establish a temporary relationship between them and the user's voice recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic analysis and recognition data is limited, avoids the large investment of upgrading a call center, and speeds up bringing intelligent voice navigation online.
The embodiments of the present disclosure can provide an intelligent IVR navigation service on top of a traditional call center and support multi-round interaction between the customer service robot and the user, thereby replacing the traditional key-press IVR service and achieving a flattened service structure.
Fig. 2 is a schematic diagram of other embodiments of the intelligent voice navigation method of the present disclosure. Preferably, the present embodiment may be performed by the disclosed intelligent voice navigation apparatus or the disclosed intelligent voice navigation system. The method may include steps 201-216, wherein:
Step 201, when the user calls the queuing machine from a user terminal, the queuing machine sends the user voice stream to the interactive voice response (IVR) server, requesting voice allocation.
Step 202, the interactive voice response server returns a guidance prompt to the queuing machine, instructing the queuing machine to send the user voice stream to the ASR (Automatic Speech Recognition) device, also called the voice recognition device.
Step 203, the queuing machine sends the user voice stream to the voice recognition device.
Step 204, the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the NLP (Natural Language Processing) device, also called the semantic recognition device.
Step 205, the semantic recognition device sends the voice semantic recognition content of the user voice stream to the intelligent voice navigation device.
In some embodiments of the present disclosure, the speech semantic recognition content of the user speech stream may be a semantic recognition result of a semantic recognition device, and then step 206 and step 207 are performed.
In some embodiments of the present disclosure, the voice semantic recognition content of the user voice stream may be interactive semantic XML (eXtensible Markup Language).
Step 206, the short code conversion module of the intelligent voice navigation device generates a short code identifier XID according to the voice semantic recognition content and returns the short code identifier to the queuing machine, after which step 208 is executed.
Step 207, the intelligent voice navigation device caches the voice semantic recognition content of the user voice stream using the Redis service, and establishes the correspondence between the voice semantic recognition content and the short code identifier.
Step 208, the queuing machine sends the short code identifier XID to the interactive voice response server.
Step 209, the interactive voice response server requests node execution and sends the short code identifier XID to the short code decoding module of the intelligent voice navigation device.
Step 210, the short code decoding module of the intelligent voice navigation device parses the short code identifier XID and requests the corresponding voice semantic recognition content (e.g., interactive semantic XML) from the Redis message middleware module.
Step 211, the Redis message middleware module of the intelligent voice navigation device returns the voice semantic recognition content (e.g., interactive semantic XML) corresponding to the short code identifier to the short code decoding module.
Step 212, the short code decoding module of the intelligent voice navigation device generates the corresponding interactive voice flow execution node according to the voice semantic recognition content, encapsulates the interactive voice flow execution node into flow node VXML (Voice eXtensible Markup Language) executable by the interactive voice response server, and returns the flow node VXML to the interactive voice response server.
In some embodiments of the present disclosure, step 212 may include returning full node VXML in response to the IVR flow request.
In some embodiments of the present disclosure, the IVR server uses the XID to obtain the actual user voice and semantic recognition content from the intelligent voice navigation device in the manner of steps 209-212, thereby driving the subsequent actions of the queuing machine (steps 213-216).
Step 213, the interactive voice response server starts the IVR flow node and returns the voice allocation result to the queuing machine.
Step 214, the queuing machine requests a TTS (Text To Speech) broadcast from the TTS speech synthesis device.
Step 215, the TTS speech synthesis device synthesizes the broadcast audio and returns it to the queuing machine.
Step 216, the queuing machine plays the broadcast audio to the user.
The embodiments of the present disclosure provide an intelligent voice navigation integration method based on dynamically generated XID short code data, by building an adaptation device with a Redis cache service. This solves the problem that complex recognized text whose byte count exceeds the queuing machine's limit cannot be transmitted, and ultimately points the user's voice interaction recognition information unambiguously to the IVR flow execution node of voice navigation.
In the embodiments of the present disclosure, the voice stream output by the queuing machine is converted through an interface protocol and sent to the voice recognition and semantic recognition engines; the recognition result is sent to the intelligent voice navigation device and cached through the Redis service, and an XID short code is dynamically generated to establish the association.
The intelligent voice navigation device of the embodiments of the present disclosure sends the XID to the queuing machine, and the queuing machine sends the XID to the IVR server, thereby bypassing the limitations that traditional queuing machines do not support voice recognition content and restrict the amount of transmitted data.
Based on a custom XID short code mechanism, the embodiments of the present disclosure solve the problem that mainstream queuing machines cannot carry the recognized text of intelligent navigation because it exceeds their length limit.
The embodiments of the present disclosure can implement the MRCP (Media Resource Control Protocol) architecture by packet capture, removing the limitation that traditional voice AI engines require private deployment. Based on the new architecture, the above embodiments can adapt to the interface protocols of mainstream voice AI engines; the engine only needs to expose its capabilities through an open, standardized HTTP (HyperText Transfer Protocol) or WebSocket (a protocol for full-duplex communication over a single TCP connection) interface.
Fig. 3 is a schematic diagram of a short code identifier multi-round interaction mechanism in some embodiments of the present disclosure. For multi-round interaction in intelligent voice navigation, the intelligent voice navigation device of the present disclosure cooperates with the IVR server to provide a multi-semantic-slot completion recognition mechanism, so that context association logic is carried along when voice recognition data undergoes XID code conversion, which better supports multi-round human-machine dialogue within a single service.
For example, in the embodiment of Fig. 3, the bill query code is determined from the user's voice stream "I want to check my bill", and the intermediate node is determined to be 01002, which serves as the starting slot among the multiple semantic slots. The queuing machine plays the prompt "Please confirm whether you are querying the local machine or another machine" and, according to the voice stream "I" returned by the user (i.e. the local machine), looks up the local machine code and the intermediate node +01 as the intermediate slot. The queuing machine then plays the prompt "Which month of this year would you like to query?" and, according to the voice stream "February" returned by the user, looks up the February code and the intermediate node +02 as the ending slot. The semantics of the three slots (February, local machine, bill) are then integrated for context association, and the user is finally given the total bill amount for February and reminded to pay.
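The three-round example can be sketched as a session object that accumulates slots. This is one possible reading, not the patented mechanism: the sketch assumes that "intermediate node +01" means appending the suffix to the running node code, and the slot names and the `Session` class are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Accumulates semantic slots across the rounds of one call."""

    node_code: str = ""
    slots: dict = field(default_factory=dict)

    def fill(self, slot: str, value: str, code: str) -> str:
        """Record one slot and extend the node code with its suffix."""
        self.slots[slot] = value
        self.node_code += code  # assumed reading of "intermediate node +01"
        return self.node_code

    def is_complete(self, required=("service", "target", "month")) -> bool:
        """True once every required slot has been filled."""
        return all(name in self.slots for name in required)


session = Session()
session.fill("service", "bill", "01002")  # round 1: "I want to check my bill"
session.fill("target", "local", "01")     # round 2: the local machine
session.fill("month", "2", "02")          # round 3: "February"

print(session.node_code)      # 010020102
print(session.is_complete())  # True
```

Keeping the slots in one session lets the final round resolve the complete request (bill, local machine, February) even though each user reply on its own is ambiguous.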
The embodiments of the present disclosure can implement a session-associated, multi-semantic-slot completion recognition mechanism for the XID short code, enabling multi-round human-machine dialogue within a single service.
Compared with the conventional approach in the related art of custom-developing traffic logic at the application level, the embodiments of the present disclosure can manage and integrate dual-channel traffic information in real time within the intelligent voice navigation device, and can output the traffic channel and the voice-stream AI processing results in association, which gives the approach high reuse value.
Fig. 4 is a schematic diagram of some embodiments of an intelligent voice navigation device of the present disclosure. As shown in fig. 4, the intelligent voice navigation apparatus of the present disclosure may include a short transcoding module 41 and a short codec module 42, wherein:
the short code conversion module 41 is configured to receive speech semantic recognition content of a user speech stream, wherein the queuing machine sends the user speech stream to the speech recognition device and the semantic recognition device, the semantic recognition device outputs the speech semantic recognition content of the user speech stream to the intelligent speech navigation device, generate a short code identifier according to the speech semantic recognition content, establish a corresponding relationship between the speech semantic recognition content and the short code identifier, and return the short code identifier to the queuing machine to instruct the queuing machine to send the short code identifier to the interactive speech response server.
In some embodiments of the present disclosure, short transcoding module 41 may be a semantic XID short transcoding module.
In some embodiments of the present disclosure, the short transcoding module 41 may be configured to define XID short codes based on rules executed by nodes, and to output results in place of the original too many words of semantic recognition corresponding to the codes, thereby bypassing queuing machine limitations.
The short code decoding module 42 is configured to parse the short code identifier sent by the interactive voice response server, obtain the voice semantic recognition content corresponding to the short code identifier, and return that content to the interactive voice response server.
In some embodiments of the present disclosure, the short code decoding module 42 may be configured to obtain the corresponding speech semantic recognition content according to the short code identifier, generate a corresponding interactive voice flow execution node according to the speech semantic recognition content, package the interactive voice flow execution node into flow node Voice Extensible Markup Language (VXML) executable by the interactive voice response server, and return the flow node VXML to the interactive voice response server.
In some embodiments of the present disclosure, the short code decoding module 42 may be an XID short code node parsing module.
In some embodiments of the present disclosure, the short code decoding module 42 may be configured to parse the generated XID short code, obtain the corresponding multiple IVR flow execution nodes, and package them into flow node VXML executable by the IVR.
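As a non-limiting illustration, the parsing and packaging step might be sketched as follows. The layout of the semantic XML (a `<semantic>` root holding `<node id="...">` elements with a `<prompt>` child) and the helper names are assumptions made for this sketch; only the VoiceXML element names (`vxml`, `form`, `block`, `prompt`, `goto`) come from the VoiceXML standard.

```python
import xml.etree.ElementTree as ET


def parse_nodes(semantic_xml: str) -> list:
    """Extract flow execution nodes from the (assumed) semantic XML layout."""
    root = ET.fromstring(semantic_xml)
    return [
        {"id": n.get("id"), "prompt": n.findtext("prompt", default="")}
        for n in root.iter("node")
    ]


def to_vxml(nodes: list) -> str:
    """Package execution nodes as one VoiceXML form per node, chained by goto."""
    vxml = ET.Element("vxml", {"version": "2.1"})
    for i, node in enumerate(nodes):
        form = ET.SubElement(vxml, "form", {"id": node["id"]})
        block = ET.SubElement(form, "block")
        prompt = ET.SubElement(block, "prompt")
        prompt.text = node["prompt"]
        if i + 1 < len(nodes):
            # Hand over to the next execution node in the flow.
            ET.SubElement(block, "goto", {"next": "#" + nodes[i + 1]["id"]})
    return ET.tostring(vxml, encoding="unicode")


def decode(xid: str, store: dict) -> str:
    """Resolve an XID to its stored content and return executable VXML."""
    return to_vxml(parse_nodes(store[xid]))


# Hypothetical stored correspondence between an XID and its semantic XML.
store = {
    "XID1a2b": (
        "<semantic>"
        "<node id='greet'><prompt>Hello</prompt></node>"
        "<node id='bill'><prompt>Your bill is ready</prompt></node>"
        "</semantic>"
    )
}
vxml_text = decode("XID1a2b", store)
```

Chaining the forms with `goto` is one possible way to express multiple execution nodes; the disclosure does not specify how the nodes are linked.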
In some embodiments of the present disclosure, the short code conversion module 41 and the short code decoding module 42 may be configured to, for a scenario involving multiple rounds of interaction in intelligent voice navigation, cooperate with the interactive voice response server to establish semantic context association logic when converting voice semantic recognition content into short code identifiers, so as to implement completion recognition based on multiple semantic slots.
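As a non-limiting illustration, completion recognition across rounds might be sketched as follows. The intent name, the slot names, and the per-call session key are hypothetical; the disclosure does not define how the context association logic is stored.

```python
# Hypothetical schema: which semantic slots each intent needs before it can run.
REQUIRED_SLOTS = {"recharge": ["phone_number", "amount"]}


class DialogueContext:
    """Keeps per-call semantic context so later rounds can fill missing slots."""

    def __init__(self):
        self._sessions = {}  # call_id -> {"intent": ..., "slots": {...}}

    def update(self, call_id, intent, slots):
        state = self._sessions.setdefault(call_id, {"intent": intent, "slots": {}})
        if intent:
            # A follow-up round may carry no intent; keep the earlier one then.
            state["intent"] = intent
        # Merge newly recognized slot values into the accumulated context.
        state["slots"].update({k: v for k, v in slots.items() if v is not None})
        required = REQUIRED_SLOTS.get(state["intent"], [])
        missing = [s for s in required if s not in state["slots"]]
        return state, missing


ctx = DialogueContext()
# Round 1: the user states the intent and the amount, but no phone number.
state1, missing1 = ctx.update("call-1", "recharge", {"amount": "50"})
# Round 2: the user supplies only the phone number.
state2, missing2 = ctx.update("call-1", None, {"phone_number": "13800000000"})
```

After the first round `missing1` names the slot the IVR still has to ask for; after the second round the accumulated context is complete.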
In some embodiments of the present disclosure, as shown in fig. 4, the intelligent voice navigation apparatus may further include a message middleware module 43, wherein:
The message middleware module 43 is configured to store a correspondence between the speech semantic recognition content and the short code identifier.
In some embodiments of the present disclosure, message middleware module 43 may be a Redis message middleware module.
In some embodiments of the present disclosure, the message middleware module 43 may be used to handle the transmission and scheduling of highly concurrent intelligent navigation interactive semantic XML data messages.
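As a non-limiting illustration, a correspondence store with expiry might be sketched as follows, assuming that each correspondence only needs to live for roughly the duration of one call. The class below is an in-memory stand-in written for this sketch; in a Redis deployment, per-key expiry (for example the `EX` option of the `SET` command) could play the same role.

```python
import time


class ExpiringStore:
    """In-memory stand-in for a key-value store with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._data = {}      # xid -> (expires_at, semantic_xml)

    def put(self, xid, semantic_xml):
        self._data[xid] = (self._clock() + self._ttl, semantic_xml)

    def get(self, xid):
        entry = self._data.get(xid)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The correspondence has outlived its time-to-live; drop it.
            del self._data[xid]
            return None
        return value


now = [0.0]  # fake clock for the usage example
store = ExpiringStore(ttl_seconds=60, clock=lambda: now[0])
store.put("XID1a2b", "<semantic><intent>query_bill</intent></semantic>")
fresh = store.get("XID1a2b")
now[0] = 61.0
stale = store.get("XID1a2b")
```

The 60-second time-to-live is an arbitrary value chosen for the example.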
The intelligent voice navigation device provided by the embodiments of the present disclosure does not change the original Code mechanism or the core architecture. The embodiments use a Redis in-memory server and custom XID short codes to establish a temporary association with the user's speech recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic recognition data is limited, avoids the large investment that upgrading a call center would require, and accelerates the launch of intelligent voice navigation.
The embodiments of the present disclosure can implement an intelligent IVR navigation service on top of a traditional call center and support multi-round interaction between the customer service robot and the user, thereby replacing the traditional key-press IVR service and achieving a flattened IVR.
Fig. 5 is a schematic diagram of some embodiments of the intelligent voice navigation system of the present disclosure. As shown in fig. 5, the intelligent voice navigation system of the present disclosure may include a queuing machine 1, a semantic recognition device 2, an interactive voice response server 3, and an intelligent voice navigation device 4, wherein:
The queuing machine 1 is configured to send a user voice stream to a voice recognition device and the semantic recognition device.
The semantic recognition device 2 is configured to send the speech semantic recognition content of the user voice stream to the intelligent voice navigation device.
In some embodiments of the present disclosure, the speech semantic recognition content of the user speech stream may be interactive semantic XML.
The intelligent voice navigation device 4 is configured to generate a short code identifier (XID) according to the voice semantic recognition content, establish a correspondence between the voice semantic recognition content and the short code identifier, return the short code identifier to the queuing machine, and instruct the queuing machine to send the short code identifier to the interactive voice response server.
The interactive voice response server 3 is configured to acquire the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
In some embodiments of the present disclosure, as shown in fig. 5, the intelligent voice navigation system may further include a voice recognition device 5, wherein:
The queuing machine 1 may also be configured to send the user voice stream to the interactive voice response (IVR) server to request voice allocation when a user places a telephone call to the queuing machine through a user terminal.
The interactive voice response server 3 may also be adapted to return a pilot call to the queuing machine instructing the queuing machine to send the user's voice stream to the voice recognition device.
The voice recognition device 5 is configured to perform voice recognition on the user voice stream and send the recognized text to the semantic recognition device 2.
The semantic recognition device 2 is configured to send the speech semantic recognition content of the user voice stream to the intelligent voice navigation device 4.
In some embodiments of the present disclosure, the speech semantic recognition content may be a semantic result packaged in XML format, as shown in fig. 5.
In some embodiments of the present disclosure, as shown in fig. 5, the intelligent voice navigation device 4 may include a short code conversion module 41, a short code decoding module 42, and a message middleware module 43, wherein:
the short code conversion module 41 of the intelligent voice navigation apparatus 4 is configured to generate a short code identifier XID according to the voice semantic recognition content (semantic XML), establish a correspondence between the voice semantic recognition content and the short code identifier in the short code decoding module 42, return the short code identifier to the queuing machine 1, and instruct the queuing machine 1 to send the short code identifier to the interactive voice response server.
The interactive voice response server 3 is configured to request node execution and send the short code identifier XID to the short code decoding module 42 of the intelligent voice navigation device.
The short code decoding module 42 of the intelligent voice navigation device is configured to parse the short code identifier XID and request the voice semantic recognition content corresponding to the short code identifier (for example, interactive semantic XML) from the Redis message middleware module 43.
The Redis message middleware module 43 of the intelligent voice navigation apparatus is configured to return voice semantic recognition content (for example, interactive semantic XML) corresponding to the short code identifier to the short code decoding module 42.
The short code decoding module 42 of the intelligent voice navigation device is configured to generate a corresponding interactive voice flow execution node (e.g., an XML package node) according to the voice semantic recognition content, package the interactive voice flow execution node into flow node VXML executable by the interactive voice response server, and return the flow node VXML to the interactive voice response server.
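As a non-limiting illustration, the message sequence among the components of fig. 5 might be sketched as follows. Every class and function here is a hypothetical stand-in: the recognition steps are stubs, the 32-character field limit is an assumed value, and the decoding step returns the stored semantic XML directly rather than packaging it into VXML.

```python
import uuid
from xml.sax.saxutils import escape


def recognize_speech(voice_stream: bytes) -> str:
    # Stand-in for the voice recognition device: the "audio" is already text.
    return voice_stream.decode("utf-8")


def recognize_semantics(text: str) -> str:
    # Stand-in for the semantic recognition device: wrap the text as semantic XML.
    return "<semantic><text>" + escape(text) + "</text></semantic>"


class QueuingMachine:
    MAX_FIELD = 32  # hypothetical limit on the data field it can relay

    def relay(self, field: str) -> str:
        if len(field) > self.MAX_FIELD:
            raise ValueError("field too long for the queuing machine")
        return field


class NavigationDevice:
    def __init__(self):
        self.store = {}  # stand-in for the message middleware

    def convert(self, semantic_xml: str) -> str:
        xid = uuid.uuid4().hex[:12]
        self.store[xid] = semantic_xml
        return xid

    def decode(self, xid: str) -> str:
        return self.store[xid]


def handle_call(voice_stream: bytes, queue: QueuingMachine, nav: NavigationDevice) -> str:
    text = recognize_speech(voice_stream)
    semantic_xml = recognize_semantics(text)
    xid = nav.convert(semantic_xml)   # navigation device issues the short code
    relayed = queue.relay(xid)        # queuing machine forwards it to the IVR server
    return nav.decode(relayed)        # IVR server resolves it via the navigation device


queue = QueuingMachine()
nav = NavigationDevice()
utterance = "I would like to check my broadband bill for last month"
result = handle_call(utterance.encode("utf-8"), queue, nav)
```

Passing `result` itself through `queue.relay` would raise `ValueError` under the assumed limit, which is the situation the short code is meant to avoid.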
FIG. 2 also provides a schematic diagram of other embodiments of the disclosed intelligent voice navigation system. In comparison with the embodiment of fig. 5, the intelligent speech navigation system of the embodiment of fig. 2 may further comprise a TTS speech synthesis device 6 in addition to the queuing machine 1, the speech recognition device 5, the semantic recognition device 2, the interactive speech response server 3 and the intelligent speech navigation device 4, wherein:
The interactive voice response server 3 may also be configured to initiate an IVR process node and return the voice allocation result to the queuing machine.
The queuing machine 1 may also be used to request TTS broadcasts from TTS speech synthesis means.
The TTS voice synthesis device 6 is configured to synthesize a broadcast signal and return the broadcast signal to the queuing machine 1, and instruct the queuing machine 1 to broadcast the broadcast signal to a user.
In some embodiments of the present disclosure, the intelligent voice navigation apparatus of the embodiments of fig. 2 and 3 may be an intelligent voice navigation apparatus as described in any of the embodiments described above (e.g., the embodiment of fig. 4).
The intelligent voice navigation system provided by the embodiments of the present disclosure does not change the original Code mechanism or the core architecture. The embodiments use a Redis in-memory server and custom XID short codes to establish a temporary association with the user's speech recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic recognition data is limited, avoids the large investment that upgrading a call center would require, and accelerates the launch of intelligent voice navigation.
The embodiments of the present disclosure can implement an intelligent IVR navigation service on top of a traditional call center and support multi-round interaction between the customer service robot and the user, thereby replacing the traditional key-press IVR service and achieving a flattened IVR.
The intelligent voice navigation method, device and system of the embodiments of the present disclosure have been applied in practice in an intelligent IVR flattening project of an operator's multimedia customer service cooperation group. The embodiments were verified in that project: the city operator's intelligent voice navigation service was brought online quickly, and an upgrade of the call center was avoided.
According to another aspect of the disclosure, there is provided a computer readable storage medium storing computer instructions that when executed by a processor implement an intelligent voice navigation method as described in any of the embodiments above (e.g., the embodiments of fig. 1 or 2).
The computer readable storage medium provided by the above embodiments of the present disclosure does not change the original Code mechanism or the core architecture. The embodiments use a Redis in-memory server and custom XID short codes to establish a temporary association with the user's speech recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic recognition data is limited, avoids the large investment that upgrading a call center would require, and accelerates the launch of intelligent voice navigation.
The embodiments of the present disclosure can implement an intelligent IVR navigation service on top of a traditional call center and support multi-round interaction between the customer service robot and the user, thereby replacing the traditional key-press IVR service and achieving a flattened IVR.
In the embodiments of the present disclosure, the voice stream output by the queuing machine is converted through an interface protocol and sent to the voice recognition and semantic recognition engines. The recognition result is sent to the intelligent voice navigation device and cached through a Redis service, and an XID short code is dynamically generated to establish the association.
The intelligent voice navigation device of the embodiments of the present disclosure sends the XID to the queuing machine, and the queuing machine sends the XID to the IVR server, thereby bypassing the limitations that a traditional queuing machine does not support voice recognition content and restricts the amount of data transmitted.
Based on the custom XID short code mechanism, the embodiments of the present disclosure solve the problem that mainstream queuing machines restrict the length of intelligent navigation recognition text.
The embodiments of the present disclosure can implement the MRCP protocol architecture by means of packet capture, overcoming the limitation that a traditional voice AI engine requires private deployment. Based on the new architecture, the above embodiments can adapt to the interface protocols of mainstream voice AI engines: the engine only needs to expose an open, standardized HTTP (HyperText Transfer Protocol) or WebSocket (a protocol for full-duplex communication over a single TCP connection) interface.
The intelligent voice navigation apparatus described above may be implemented as a general purpose processor, a Programmable Logic Controller (PLC), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described herein.
Thus far, the present disclosure has been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored on a computer readable storage medium such as a read-only memory, a magnetic disk, or an optical disk.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (10)

1. An intelligent voice navigation method, comprising:
the queuing machine sends the user voice stream to a voice recognition device and a semantic recognition device;
the semantic recognition device sends voice semantic recognition content of the user voice stream to the intelligent voice navigation device;
the intelligent voice navigation device generates a short code identifier according to the voice semantic recognition content, and establishes a corresponding relation between the voice semantic recognition content and the short code identifier;
the intelligent voice navigation device returns the short code identifier to the queuing machine;
the queuing machine sends the short code identifier to the interactive voice response server;
the interactive voice response server acquires corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier;
wherein the interactive voice response server acquiring corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier comprises:
the intelligent voice navigation device acquires corresponding voice semantic recognition content according to the short code identifier;
the intelligent voice navigation device generates a corresponding interactive voice flow execution node according to the voice semantic recognition content;
the intelligent voice navigation device packages the interactive voice flow execution node into a flow node voice extensible markup language which can be executed by the interactive voice response server;
and the intelligent voice navigation device returns the flow node voice extensible markup language to the interactive voice response server.
2. The intelligent voice navigation method of claim 1, further comprising:
for a scenario of multi-round interaction in intelligent voice navigation, the intelligent voice navigation device cooperates with the interactive voice response server and, when converting voice semantic recognition content into short code identifiers, establishes semantic context association logic so as to implement completion recognition based on multiple semantic slots.
3. The intelligent voice navigation method according to claim 1 or 2, wherein the queuing machine transmits the user voice stream to the voice recognition apparatus and the semantic recognition apparatus comprises:
the queuing machine sends the user voice stream to the interactive voice response server to request voice allocation;
the interactive voice response server instructs the queuing machine to send the user voice stream to the voice recognition device;
the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the semantic recognition device.
4. An intelligent voice navigation apparatus, comprising:
a short code conversion module, used for receiving voice semantic recognition content of a user voice stream, wherein the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device, and the semantic recognition device outputs the voice semantic recognition content of the user voice stream to the intelligent voice navigation device; and for generating a short code identifier according to the voice semantic recognition content, establishing a corresponding relation between the voice semantic recognition content and the short code identifier, and returning the short code identifier to the queuing machine to instruct the queuing machine to send the short code identifier to the interactive voice response server;
a short code decoding module, used for parsing the short code identifier sent by the interactive voice response server to obtain voice semantic recognition content corresponding to the short code identifier, and returning the voice semantic recognition content corresponding to the short code identifier to the interactive voice response server;
wherein the short code decoding module is used for acquiring corresponding voice semantic recognition content according to the short code identifier, generating a corresponding interactive voice flow execution node according to the voice semantic recognition content, packaging the interactive voice flow execution node into a flow node voice extensible markup language which can be executed by the interactive voice response server, and returning the flow node voice extensible markup language to the interactive voice response server.
5. The intelligent voice navigation apparatus of claim 4, wherein,
the short code conversion module and the short code decoding module are used for, in a scenario of multi-round interaction in intelligent voice navigation, cooperating with the interactive voice response server to establish semantic context association logic when converting voice semantic recognition content into short code identifiers, so as to implement completion recognition based on multiple semantic slots.
6. The intelligent voice navigation apparatus of claim 4 or 5, further comprising:
a message middleware module, used for storing the corresponding relation between the voice semantic recognition content and the short code identifier.
7. An intelligent voice navigation system, comprising:
a queuing machine, used for sending a user voice stream to a voice recognition device and a semantic recognition device;
the semantic recognition device, used for sending voice semantic recognition content of the user voice stream to an intelligent voice navigation device;
the intelligent voice navigation device, used for generating a short code identifier according to the voice semantic recognition content, establishing a corresponding relation between the voice semantic recognition content and the short code identifier, returning the short code identifier to the queuing machine, and instructing the queuing machine to send the short code identifier to an interactive voice response server;
the interactive voice response server, used for acquiring corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier;
wherein the intelligent voice navigation device is used for acquiring corresponding voice semantic recognition content according to the short code identifier, generating a corresponding interactive voice flow execution node according to the voice semantic recognition content, packaging the interactive voice flow execution node into a flow node voice extensible markup language which can be executed by the interactive voice response server, and returning the flow node voice extensible markup language to the interactive voice response server.
8. The intelligent voice navigation system of claim 7, further comprising a voice recognition device, wherein:
the queuing machine is also used for sending the user voice stream to the interactive voice response server to request voice allocation;
the interactive voice response server is also used for instructing the queuing machine to send the user voice stream to the voice recognition device;
and the voice recognition device is used for performing voice recognition on the user voice stream and sending the recognized text to the semantic recognition device.
9. The intelligent voice navigation system of claim 7 or 8,
the intelligent voice navigation device is used for, in a scenario of multi-round interaction in intelligent voice navigation, cooperating with the interactive voice response server to establish semantic context association logic when converting voice semantic recognition content into short code identifiers, so as to implement completion recognition based on multiple semantic slots.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the intelligent voice navigation method of any of claims 1-3.
CN202010534986.6A 2020-06-12 2020-06-12 Intelligent voice navigation method, device and system, and computer-readable storage medium Active CN113808588B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010534986.6A CN113808588B (en) 2020-06-12 2020-06-12 Intelligent voice navigation method, device and system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010534986.6A CN113808588B (en) 2020-06-12 2020-06-12 Intelligent voice navigation method, device and system, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113808588A CN113808588A (en) 2021-12-17
CN113808588B CN113808588B (en) 2024-12-13

Family

ID=78944061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010534986.6A Active CN113808588B (en) 2020-06-12 2020-06-12 Intelligent voice navigation method, device and system, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113808588B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant