CN113808588B

CN113808588B - Intelligent voice navigation method, device and system, and computer-readable storage medium

Info

Publication number: CN113808588B
Application number: CN202010534986.6A
Authority: CN
Inventors: 董斌; 朱云峰; 彭倩; 张小凡; 林玮玮; 张�杰; 陆东明; 严秋红; 蔡林俊; 李贵阳
Original assignee: China Telecom Corp Ltd
Current assignee: China Telecom Corp Ltd
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2024-12-13
Anticipated expiration: 2040-06-12
Also published as: CN113808588A

Abstract

The present disclosure relates to an intelligent voice navigation method, device and system, and a computer-readable storage medium. The intelligent voice navigation method comprises: the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device; the semantic recognition device sends the voice semantic recognition content of the user voice stream to the intelligent voice navigation device; the intelligent voice navigation device generates a short code identifier according to the voice semantic recognition content, and establishes a correspondence between the voice semantic recognition content and the short code identifier; the intelligent voice navigation device returns the short code identifier to the queuing machine; the queuing machine sends the short code identifier to the interactive voice response server; the interactive voice response server obtains the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier. Based on a custom XID short code mechanism, the present disclosure solves the problem that mainstream queuing machines cannot carry the overly long recognized text required for intelligent navigation.

Description

Intelligent voice navigation method, device and system and computer readable storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular, to an intelligent voice navigation method, apparatus and system, and a computer readable storage medium.

Background

In the related art, many call centers implement their flows with IVR (Interactive Voice Response): user key-press information (a code) is passed through a voice queuing machine to an IVR server, and the IVR server drives the next action. Because only short codes are exchanged, the data channel between the queuing machine and the IVR server was never defined to carry long payloads; for example, a Huazhi call center supports only 23 bytes, that is, at most 11 Chinese characters.

Under these conditions, implementing intelligent voice navigation requires transmitting the interaction semantic information obtained by analyzing the user's speech. This recognition information often exceeds the byte limit of the queuing machine, so it cannot be delivered to the IVR server for flow node execution.
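The size constraint can be illustrated with a short calculation. This is a minimal sketch that assumes the payload is GBK-encoded (2 bytes per Chinese character); the source states only the 23-byte limit, so the encoding, the helper `fits_queue_limit` and the sample XML are hypothetical.

```python
# Assumption: GBK encoding (2 bytes per Chinese character).
# The source gives only the 23-byte limit, not the encoding.
QUEUE_LIMIT_BYTES = 23


def fits_queue_limit(payload: str, encoding: str = "gbk") -> bool:
    """Return True if the encoded payload fits the queuing machine limit."""
    return len(payload.encode(encoding)) <= QUEUE_LIMIT_BYTES


# Hypothetical short code and hypothetical semantic XML payload.
short_code = "01002"
semantic_xml = (
    "<semantic><intent>查询账单</intent>"
    "<target>本机</target><month>2</month></semantic>"
)

print(QUEUE_LIMIT_BYTES // 2)          # whole 2-byte characters that fit
print(fits_queue_limit(short_code))    # a short code fits
print(fits_queue_limit(semantic_xml))  # the semantic XML does not
```

Under this assumption, 23 bytes hold 11 two-byte characters with one byte left over, which matches the figure quoted above.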

This situation exists in models such as the UAP8100, the UAP3100 and the Zhongxing Communication (ZTE) MS10 used in call centers built before 2016. It makes intelligent voice navigation difficult for traditional call centers to implement, and the related technical solutions require substantial changes to the IVR flow Code mechanism.

Disclosure of Invention

In view of at least one of the above technical problems, the present disclosure provides an intelligent voice navigation method, apparatus and system, and a computer-readable storage medium which, based on a custom XID short code mechanism, solve the problem that mainstream queuing machines cannot carry the overly long recognized text required for intelligent navigation.

According to one aspect of the present disclosure, there is provided an intelligent voice navigation method, including:

the queuing machine sends the user voice stream to a voice recognition device and a semantic recognition device;

the semantic recognition device sends the voice semantic recognition content of the user voice stream to the intelligent voice navigation device;

the intelligent voice navigation device generates a short code identifier according to the voice semantic recognition content, and establishes a correspondence between the voice semantic recognition content and the short code identifier;

the intelligent voice navigation device returns the short code identifier to the queuing machine;

the queuing machine sends the short code identifier to the interactive voice response server; and

the interactive voice response server obtains the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.

In some embodiments of the present disclosure, the interactive voice response server obtaining the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier includes:

the intelligent voice navigation device obtains the corresponding voice semantic recognition content according to the short code identifier;

the intelligent voice navigation device generates the corresponding interactive voice flow execution node according to the voice semantic recognition content;

the intelligent voice navigation device encapsulates the interactive voice flow execution node into flow node Voice Extensible Markup Language (VXML) executable by the interactive voice response server; and

the intelligent voice navigation device returns the flow node VXML to the interactive voice response server.

In some embodiments of the present disclosure, the intelligent voice navigation method further comprises:

For multi-round interaction scenarios in intelligent voice navigation, the intelligent voice navigation device cooperates with the interactive voice response server and, when converting voice semantic recognition content into short code identifiers, establishes semantic context association logic so as to achieve completion recognition based on multiple semantic slots.

In some embodiments of the present disclosure, the queuing machine sending the user voice stream to the voice recognition device and the semantic recognition device includes:

the queuing machine sends the user voice stream to the interactive voice response server to request voice allocation;

the interactive voice response server instructs the queuing machine to send the user voice stream to the voice recognition device; and

the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the semantic recognition device.

According to another aspect of the present disclosure, there is provided an intelligent voice navigation apparatus, comprising:

a short code conversion module, configured to receive voice semantic recognition content of a user voice stream, wherein the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device, and the semantic recognition device outputs the voice semantic recognition content of the user voice stream to the intelligent voice navigation device; and further configured to generate a short code identifier according to the voice semantic recognition content, establish a correspondence between the voice semantic recognition content and the short code identifier, and return the short code identifier to the queuing machine so as to instruct the queuing machine to send the short code identifier to the interactive voice response server;

a short code decoding module, configured to parse the short code identifier received from the interactive voice response server, obtain the voice semantic recognition content corresponding to the short code identifier, and return that content to the interactive voice response server.

In some embodiments of the present disclosure, a short code decoding module is configured to obtain corresponding speech semantic recognition content according to a short code identifier, generate a corresponding interactive speech process execution node according to the speech semantic recognition content, package the interactive speech process execution node into a process node speech extensible markup language executable by an interactive speech response server, and return the process node speech extensible markup language to the interactive speech response server.

In some embodiments of the present disclosure, a short code conversion module and a short code decoding module are configured to, for a scenario of multiple interactions in intelligent voice navigation, cooperate with an interactive voice response server, and under a condition of performing short code identification conversion on voice semantic recognition content, establish association logic of semantic contexts to implement multi-semantic slot-based complement recognition.

In some embodiments of the present disclosure, the intelligent voice navigation apparatus further comprises:

a message middleware module, configured to store the correspondence between the voice semantic recognition content and the short code identifier.

According to another aspect of the present disclosure, there is provided an intelligent voice navigation system, comprising:

a queuing machine, configured to send the user voice stream to the voice recognition device and the semantic recognition device;

a semantic recognition device, configured to send the voice semantic recognition content of the user voice stream to the intelligent voice navigation device;

an intelligent voice navigation device, configured to generate a short code identifier according to the voice semantic recognition content, establish a correspondence between the voice semantic recognition content and the short code identifier, and return the short code identifier to the queuing machine so as to instruct the queuing machine to send the short code identifier to the interactive voice response server; and

an interactive voice response server, configured to obtain the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.

In some embodiments of the present disclosure, the intelligent voice navigation system further comprises:

the queuing machine is further configured to send the user voice stream to the interactive voice response server to request voice allocation;

the interactive voice response server is further configured to instruct the queuing machine to send the user voice stream to the voice recognition device; and

a voice recognition device, configured to perform voice recognition on the user voice stream and send the recognized text to the semantic recognition device.

In some embodiments of the disclosure, the intelligent voice navigation apparatus is an intelligent voice navigation apparatus as described in any of the embodiments above.

According to another aspect of the present disclosure, there is provided a computer readable storage medium storing computer instructions that when executed by a processor implement the intelligent voice navigation method according to any one of the embodiments above.

Based on a custom XID short code mechanism, the present disclosure solves the problem that mainstream queuing machines restrict payload length and therefore cannot carry the overly long recognized text required for intelligent navigation.

Drawings

To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person of ordinary skill in the art can derive other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation method.

Fig. 2 is a schematic diagram of other embodiments of the intelligent voice navigation method of the present disclosure.

Fig. 3 is a schematic diagram of a short code identification multi-round interaction mechanism in some embodiments of the present disclosure.

Fig. 4 is a schematic diagram of some embodiments of an intelligent voice navigation device of the present disclosure.

Fig. 5 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation system.

Detailed Description

The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. Based on the embodiments in this disclosure, all other embodiments that a person of ordinary skill in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.

The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.

In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.

It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Fig. 1 is a schematic diagram of some embodiments of the disclosed intelligent voice navigation method. Preferably, the present embodiment may be performed by the disclosed intelligent voice navigation apparatus or the disclosed intelligent voice navigation system. The method may comprise steps 11-16, wherein:

Step 11, the queuing machine sends the user voice stream to the voice recognition device and the semantic recognition device.

In some embodiments of the present disclosure, step 11 may include steps 111-113, wherein:

In step 111, the queuing machine sends the user voice stream to the interactive voice response server, requesting voice allocation.

At step 112, the interactive voice response server instructs the queuing machine to send the user voice stream to the voice recognition device.

Step 113, the voice recognition device performs voice recognition on the user voice stream and sends the words after voice recognition to the semantic recognition device.

In step 12, the semantic recognition device sends the speech semantic recognition content of the user speech stream to the intelligent speech navigation device.

Step 13, the intelligent voice navigation device generates a short code identifier XID according to the voice semantic recognition content, and uses a Redis (Remote Dictionary Server) in-memory server to establish a correspondence between the voice semantic recognition content and the short code identifier.

In some embodiments of the present disclosure, Redis is a mature, open-source data caching service that is widely used for fast in-memory reads and writes, and efficient real-time message stack management can be achieved with Redis clusters. The embodiments of the present disclosure build the transmission scheduling of high-concurrency intelligent navigation interaction semantic XML data messages on Redis.

Step 14, the intelligent voice navigation device returns the short code identifier to the queuing machine.

Step 15, the queuing machine sends the short code identifier to the interactive voice response server.

Step 16, the interactive voice response server obtains the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
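The short code round trip in steps 13 to 16 can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify the XID format, so the `"X"` plus random hex layout is hypothetical, and a plain dictionary stands in for the Redis server; `XidStore`, `encode` and `decode` are hypothetical names.

```python
import secrets
from typing import Dict, Optional

QUEUE_LIMIT_BYTES = 23  # limit quoted in the Background section


class XidStore:
    """In-memory stand-in for the Redis mapping (hypothetical class)."""

    def __init__(self) -> None:
        self._content_by_xid: Dict[str, str] = {}

    def encode(self, semantic_content: str) -> str:
        """Step 13: create an XID and record XID -> semantic content."""
        # Assumed format: "X" + 16 hex characters = 17 ASCII bytes.
        xid = "X" + secrets.token_hex(8)
        while xid in self._content_by_xid:  # regenerate on collision
            xid = "X" + secrets.token_hex(8)
        self._content_by_xid[xid] = semantic_content
        return xid

    def decode(self, xid: str) -> Optional[str]:
        """Step 16: look up the semantic content for an XID."""
        return self._content_by_xid.get(xid)


store = XidStore()
content = "<semantic><intent>bill_query</intent></semantic>"
xid = store.encode(content)                            # step 13
assert len(xid.encode("ascii")) <= QUEUE_LIMIT_BYTES   # safe for steps 14-15
assert store.decode(xid) == content                    # step 16
```

Only the XID crosses the queuing machine, so the size of the semantic content no longer matters on that hop.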

In some embodiments of the present disclosure, step 16 may include steps 161-164, wherein:

step 161, the intelligent voice navigation device obtains the corresponding voice semantic recognition content according to the short code identification.

Step 162, the intelligent voice navigation device generates corresponding interactive voice flow executing nodes according to the voice semantic recognition content.

In step 163, the intelligent voice navigation device encapsulates the interactive voice process execution node into a process node voice extensible markup language executable by the interactive voice response server.

In step 164, the intelligent voice navigation device returns the process node voice extensible markup language to the interactive voice response server.
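Steps 162 to 164 can be pictured with a small VXML builder. This is a simplified sketch, not the patented encapsulation: the source does not give the document layout, so the `node_` id scheme and the function name `build_flow_node_vxml` are hypothetical, and a production VoiceXML document would also declare the VoiceXML namespace.

```python
import xml.etree.ElementTree as ET


def build_flow_node_vxml(node_id: str, prompt_text: str) -> str:
    """Wrap one flow execution node in a minimal VXML document."""
    vxml = ET.Element("vxml", {"version": "2.1"})
    # Hypothetical convention: the form id carries the node code.
    form = ET.SubElement(vxml, "form", {"id": f"node_{node_id}"})
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


document = build_flow_node_vxml("01002", "Bill query")
print(document)
```

The IVR server would receive the returned string as the response to its node execution request.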

In some embodiments of the present disclosure, the intelligent voice navigation method may further include: for multi-round interaction scenarios in intelligent voice navigation, the intelligent voice navigation device cooperates with the interactive voice response server and, when converting voice semantic recognition content into short code identifiers, establishes semantic context association logic so as to achieve completion recognition based on multiple semantic slots.

The intelligent voice navigation method provided by the embodiments of the present disclosure does not change the original Code mechanism or the core architecture. The embodiments use a Redis in-memory server and custom XID short codes to establish a temporary association with the text recognized from the user's speech. This solves the problem that the transmission length of the semantic analysis and recognition data for intelligent navigation is limited, avoids the large investment that upgrading a call center would require, and speeds up bringing intelligent voice navigation online.

The embodiments of the present disclosure can provide intelligent IVR navigation service on top of a traditional call center and support multi-round interaction between the customer service robot and the user, thereby replacing the traditional key-press IVR service and achieving a flattened service.

Fig. 2 is a schematic diagram of other embodiments of the intelligent voice navigation method of the present disclosure. Preferably, the present embodiment may be performed by the disclosed intelligent voice navigation apparatus or the disclosed intelligent voice navigation system. The method may include steps 201-216, wherein:

In step 201, when a user calls into the queuing machine from a user terminal, the queuing machine sends the user voice stream to the interactive voice response (IVR) server to request voice distribution.

In step 202, the interactive voice response server returns a guidance prompt to the queuing machine, instructing the queuing machine to send the user voice stream to the ASR (Automatic Speech Recognition) device, also called the voice recognition device.

In step 203, the queuing machine sends the user voice stream to the voice recognition device.

In step 204, the voice recognition device performs voice recognition on the user voice stream and sends the recognized text to the NLP (Natural Language Processing) device, also called the semantic recognition device.

In step 205, the semantic recognition device sends the speech semantic recognition content of the user speech stream to the intelligent speech navigation device.

In some embodiments of the present disclosure, the speech semantic recognition content of the user speech stream may be a semantic recognition result of a semantic recognition device, and then step 206 and step 207 are performed.

In some embodiments of the present disclosure, the speech semantic recognition content of the user speech stream may be interactive semantic XML (eXtensible Markup Language).

In step 206, the short code conversion module of the intelligent voice navigation apparatus generates a short code identifier XID according to the voice semantic recognition content, and returns the short code identifier to the queuing machine, and then step 208 is executed.

Step 207, the intelligent voice navigation device caches the voice semantic recognition content of the user voice stream by using the Redis service, and establishes a corresponding relationship between the voice semantic recognition content and the short code identifier.

In step 208, the queuing machine transmits the short code identification XID to the interactive voice response server.

In step 209, the interactive voice response server requests the node to execute, and transmits the short code identifier XID to the short code decoding module of the intelligent voice navigation device.

In step 210, the short code decoding module of the intelligent voice navigation device parses the short code identifier XID and requests the corresponding voice semantic recognition content (for example, interactive semantic XML) from the Redis message middleware module.

Step 211, the Redis message middleware module of the intelligent voice navigation apparatus returns the voice semantic recognition content (for example, interactive semantic XML) corresponding to the short code identifier to the short code decoding module.

Step 212, the short code decoding module of the intelligent voice navigation device generates the corresponding interactive voice flow execution node according to the voice semantic recognition content, encapsulates the node into flow node VXML (Voice eXtensible Markup Language) executable by the interactive voice response server, and returns the flow node VXML to the interactive voice response server.

In some embodiments of the present disclosure, step 212 may include returning full node VXML in response to the IVR flow request.

In some embodiments of the present disclosure, the IVR server uses the XID to obtain the actual user voice semantic recognition content from the intelligent voice navigation device in the manner of steps 209-212, thereby driving the subsequent actions of the queuing machine (steps 213-216).

Step 213, the interactive voice response server initiates the IVR flow node and returns the voice distribution result to the queuing machine.

In step 214, the queuing machine requests a TTS (Text To Speech) announcement from the TTS speech synthesis device.

In step 215, the TTS speech synthesis device synthesizes the announcement audio and returns it to the queuing machine.

In step 216, the queuing machine plays the announcement audio to the user.

According to the embodiments of the present disclosure, an adapter device with a Redis cache service is established to provide an intelligent voice navigation integration method based on dynamically generated XID short code data. This solves the problem that complex recognition text whose byte count exceeds the queuing machine limit cannot be transmitted, so that the user's voice interaction recognition information ultimately points explicitly to the IVR flow execution node for voice navigation.

In the embodiments of the present disclosure, the voice stream output by the queuing machine is converted through an interface protocol and sent to the voice recognition and semantic recognition engines; the recognition result is sent to the intelligent voice navigation device, cached through the Redis service, and associated with a dynamically generated XID short code.

The intelligent voice navigation device of the embodiment of the disclosure sends the XID to the queuing machine, and the queuing machine sends the XID to the IVR server, thereby bypassing the limitation that the traditional queuing machine does not support voice recognition content and transmission data is limited.

Based on a custom XID short code mechanism, the embodiments of the present disclosure solve the problem that mainstream queuing machines restrict payload length and therefore cannot carry the overly long recognized text required for intelligent navigation.

The embodiments of the present disclosure can implement the MRCP (Media Resource Control Protocol) architecture by means of packet capture, removing the limitation that a traditional voice AI engine must be privately deployed. Based on this architecture, the above embodiments can adapt to the interface protocols of mainstream voice AI engines; the engine only needs to expose an open, standardized HTTP (HyperText Transfer Protocol) or WebSocket (a protocol for full-duplex communication over a single TCP connection) interface.

Fig. 3 is a schematic diagram of a short code identification multi-round interaction mechanism in some embodiments of the present disclosure. For multi-round interaction in intelligent voice navigation, the intelligent voice navigation device of the present disclosure cooperates with the IVR server to provide a multi-semantic-slot completion recognition mechanism, so that context association logic is carried along when voice recognition data undergoes XID code conversion, which better supports scenarios where a single service requires a multi-round human-machine dialogue.

For example, in the embodiment of Fig. 3, the user's voice stream 'I want to check my bill' is matched to the bill query code, and the intermediate node is determined to be 01002, which serves as the starting slot of the multi-semantic slots. The queuing machine plays the prompt 'Please confirm whether you are querying this phone or another phone' to the user; from the user's reply, which indicates this phone, the local code is looked up and the intermediate node +01 serves as the middle slot. The queuing machine then plays the prompt 'Which month's bill of this year would you like to query?'; from the user's reply 'February', the February code is looked up and the intermediate node +02 serves as the ending slot. The semantics of the three slots (February, this phone, bill) are combined through context association, and the user is finally given the total amount of the February bill and reminded to pay.

The embodiment of the disclosure can realize a multi-semantic slot completion recognition mechanism of the XID short code based on session association and realize one-time business multi-round man-machine conversation.
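The three-slot dialogue of Fig. 3 can be sketched as a small session object that accumulates slot codes across rounds. This is an illustration under assumptions: the slot names, the `SlotSession` class and the reading of "+01" and "+02" as suffixes appended to the starting node 01002 are interpretations, not details confirmed by the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SlotSession:
    """Accumulates slot codes for one call session (hypothetical)."""

    # Assumed slot order: business, then target phone, then month.
    required: tuple = ("business", "target", "month")
    filled: dict = field(default_factory=dict)

    def fill(self, slot: str, code: str) -> None:
        """Record the code recognized for one slot in this round."""
        if slot not in self.required:
            raise KeyError(slot)
        self.filled[slot] = code

    def missing(self) -> List[str]:
        """Slots the next prompt still has to ask for."""
        return [s for s in self.required if s not in self.filled]

    def node_code(self) -> Optional[str]:
        """Joined node code once every slot is filled, else None."""
        if self.missing():
            return None
        return "".join(self.filled[s] for s in self.required)


session = SlotSession()
session.fill("business", "01002")  # round 1: bill query
print(session.missing())           # slots still to be completed
session.fill("target", "01")       # round 2: this phone
session.fill("month", "02")        # round 3: February
print(session.node_code())
```

Keeping the partially filled slots in the session is what lets each later round contribute only the missing piece of context.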

Compared with the conventional approach in the related art, which custom-develops call traffic logic at the application level, the embodiments of the present disclosure can manage and integrate dual-channel call traffic information in real time within the intelligent voice navigation device, and can output the call traffic channel together with the AI processing results of the voice stream, which gives the approach high reuse value.

Fig. 4 is a schematic diagram of some embodiments of an intelligent voice navigation device of the present disclosure. As shown in Fig. 4, the intelligent voice navigation apparatus of the present disclosure may include a short code conversion module 41 and a short code decoding module 42, wherein:

the short code conversion module 41 is configured to receive speech semantic recognition content of a user speech stream, wherein the queuing machine sends the user speech stream to the speech recognition device and the semantic recognition device, the semantic recognition device outputs the speech semantic recognition content of the user speech stream to the intelligent speech navigation device, generate a short code identifier according to the speech semantic recognition content, establish a corresponding relationship between the speech semantic recognition content and the short code identifier, and return the short code identifier to the queuing machine to instruct the queuing machine to send the short code identifier to the interactive speech response server.

In some embodiments of the present disclosure, the short code conversion module 41 may be a semantic XID short code conversion module.

In some embodiments of the present disclosure, the short code conversion module 41 may be configured to define XID short codes based on node execution rules and to output the corresponding code in place of the original, overly long semantic recognition text, thereby bypassing the queuing machine limitation.

The short code decoding module 42 is configured to parse the short code identifier of the interactive voice response server, obtain voice semantic recognition content corresponding to the short code identifier, and return the voice semantic recognition content corresponding to the short code identifier to the interactive voice response server.

In some embodiments of the present disclosure, the short codec module 42 may be configured to obtain corresponding speech semantic recognition content according to the short code identifier, generate a corresponding interactive speech process execution node according to the speech semantic recognition content, package the interactive speech process execution node into a process node speech extensible markup language executable by the interactive speech response server, and return the process node speech extensible markup language to the interactive speech response server.

In some embodiments of the present disclosure, the short code decoding module 42 may be an XID short code node parsing module.

In some embodiments of the present disclosure, the short code decoding module 42 may be configured to parse the generated XID short code, obtain the corresponding IVR flow execution nodes, and encapsulate them into flow node VXML executable by the IVR.

In some embodiments of the present disclosure, the short code conversion module 41 and the short code decoding module 42 may be configured to, for a scenario of multiple interactions in intelligent voice navigation, cooperate with an interactive voice response server to establish a semantic context association logic when performing short code identifier conversion on voice semantic recognition content, so as to implement multi-semantic slot-based complement recognition.

In some embodiments of the present disclosure, as shown in fig. 4, the intelligent voice navigation apparatus may further include a message middleware module 43, wherein:

The message middleware module 43 is configured to store a correspondence between the speech semantic recognition content and the short code identifier.

In some embodiments of the present disclosure, message middleware module 43 may be a Redis message middleware module.

In some embodiments of the present disclosure, message middleware module 43 may be used to carry a transmission schedule for highly concurrent smart navigation interactive semantic XML data messages.

The intelligent voice navigation device provided by the embodiments of the present disclosure does not change the original Code mechanism or the core architecture. The embodiments use a Redis in-memory server and custom XID short codes to establish a temporary association with the text recognized from the user's speech. This solves the problem that the transmission length of the semantic analysis and recognition data for intelligent navigation is limited, avoids the large investment that upgrading a call center would require, and speeds up bringing intelligent voice navigation online.
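One way to picture the temporary nature of the XID association is a map whose entries expire. This is a sketch under assumptions: the source says only that the relation is temporary and does not state an expiry policy, so the time-to-live value, the `ExpiringXidMap` class and the injected clock are hypothetical; a dictionary again stands in for Redis.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ExpiringXidMap:
    """XID -> content map whose entries lapse after a time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def put(self, xid: str, content: str) -> None:
        """Store content together with its expiry timestamp."""
        self._entries[xid] = (self._clock() + self._ttl, content)

    def get(self, xid: str) -> Optional[str]:
        """Return content, or None if unknown or already expired."""
        entry = self._entries.get(xid)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:
            del self._entries[xid]  # drop the stale association
            return None
        return content


cache = ExpiringXidMap(ttl_seconds=60.0)  # 60 s is an arbitrary example
cache.put("X0001", "<semantic><intent>bill_query</intent></semantic>")
print(cache.get("X0001") is not None)
```

Expiring entries keeps the cache from growing with every call, since an XID is only needed for the duration of one interaction.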

Fig. 5 is a schematic diagram of some embodiments of the intelligent voice navigation system of the present disclosure. As shown in fig. 5, the intelligent voice navigation system of the present disclosure may include a queuing machine 1, a semantic recognition device 2, an interactive voice response server 3, and an intelligent voice navigation device 4, wherein:

The queuing machine 1 is configured to send a user voice stream to a voice recognition device and the semantic recognition device.

The semantic recognition device 2 is configured to send the voice semantic recognition content of the user voice stream to the intelligent voice navigation device.

In some embodiments of the present disclosure, the speech semantic recognition content of the user speech stream may be interactive semantic XML.

The intelligent voice navigation device 4 is configured to generate a short code identifier XID according to the voice semantic recognition content, establish a correspondence between the voice semantic recognition content and the short code identifier, return the short code identifier to the queuing machine, and instruct the queuing machine to send the short code identifier to the interactive voice response server.

The interactive voice response server 3 is configured to obtain the corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identifier.
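The division of roles above can be illustrated with the following sketch, in which each component is reduced to a single function. The field-length limit of 32 characters, the 12-character XID, and all function names are hypothetical; the disclosure states only that the queuing machine is limited in the length of text it can pass on.

```python
import uuid

# Hypothetical limit on the data field the queuing machine can carry;
# the disclosure states that a limit exists but gives no number.
MAX_FIELD_LEN = 32

# Stand-in for the correspondence kept by the intelligent voice navigation device.
xid_table = {}


def navigation_device_register(semantic_xml: str) -> str:
    """Generate an XID for the semantic content and record the correspondence."""
    xid = uuid.uuid4().hex[:12]  # hypothetical XID format
    xid_table[xid] = semantic_xml
    return xid


def queuing_machine_forward(payload: str) -> str:
    """Forward a payload toward the IVR server, rejecting over-long payloads."""
    if len(payload) > MAX_FIELD_LEN:
        raise ValueError("payload exceeds the queuing machine field length")
    return payload


def ivr_fetch(xid: str) -> str:
    """IVR server side: resolve the XID back to the semantic content."""
    return xid_table[xid]


semantic_xml = "<semantic><intent>query_bill</intent><slot name='month'>2020-05</slot></semantic>"
xid = navigation_device_register(semantic_xml)
received = queuing_machine_forward(xid)  # the short code fits the field
recovered = ivr_fetch(received)          # the full semantic XML is recovered
```

Passing `semantic_xml` itself to `queuing_machine_forward` would raise `ValueError` in this sketch, which is the situation the short code identifier is meant to avoid.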

In some embodiments of the present disclosure, as shown in fig. 5, the intelligent voice navigation system may further include a voice recognition device 5, wherein:

The queuing machine 1 may also be configured to send the user voice stream to the interactive voice response (IVR) server to request voice distribution when a user places a telephone call to the queuing machine through a user terminal.

The interactive voice response server 3 may also be adapted to return a pilot call to the queuing machine instructing the queuing machine to send the user's voice stream to the voice recognition device.

The voice recognition device 5 is configured to perform voice recognition on the user voice stream and send the recognized text to the semantic recognition device 2.

The semantic recognition device 2 is configured to send the voice semantic recognition content of the user voice stream to the intelligent voice navigation device 4.

In some embodiments of the present disclosure, the speech semantic recognition content may be a semantic result packaged in XML format, as shown in fig. 5.

In some embodiments of the present disclosure, as shown in fig. 5, the intelligent voice navigation apparatus 4 may include a short code conversion module 41, a short code decoding module 42, and a message middleware module 43, wherein:

The short code conversion module 41 of the intelligent voice navigation apparatus 4 is configured to generate a short code identifier XID according to the voice semantic recognition content (semantic XML), establish a correspondence between the voice semantic recognition content and the short code identifier in the short code decoding module 42, return the short code identifier to the queuing machine 1, and instruct the queuing machine 1 to send the short code identifier to the interactive voice response server.

The interactive voice response server 3 is configured to request node execution and send the short code identifier XID to the short code decoding module 42 of the intelligent voice navigation device.

The short code decoding module 42 of the intelligent voice navigation apparatus is configured to parse the short code identifier XID and request the voice semantic recognition content corresponding to the short code identifier (for example, interactive semantic XML) from the Redis message middleware module.

The Redis message middleware module 43 of the intelligent voice navigation apparatus is configured to return voice semantic recognition content (for example, interactive semantic XML) corresponding to the short code identifier to the short code decoding module 42.

The short code decoding module 42 of the intelligent voice navigation device is configured to generate a corresponding interactive voice flow execution node (e.g., an XML package node) according to the voice semantic recognition content, package the interactive voice flow execution node into flow-node VXML (Voice Extensible Markup Language) executable by the interactive voice response server, and return the flow-node VXML to the interactive voice response server.
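The packaging step performed by the short code decoding module may be sketched, under stated assumptions, as a conversion from semantic XML to a small VXML document. The layout of the semantic XML (`<intent>` and `<slot name="...">` elements), the intent-to-node table, and the target file names are hypothetical; the disclosure does not define these formats.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping from a recognized intent to an IVR flow node.
INTENT_TO_NODE = {"query_bill": "bill_menu.vxml", "report_fault": "fault_menu.vxml"}


def semantic_xml_to_vxml(semantic_xml: str) -> str:
    """Package semantic recognition content as a VXML flow node."""
    root = ET.fromstring(semantic_xml)
    intent = root.findtext("intent", default="")
    # Unknown intents fall back to a hypothetical main menu node.
    target = INTENT_TO_NODE.get(intent, "main_menu.vxml")

    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": "http://www.w3.org/2001/vxml"})
    form = ET.SubElement(vxml, "form", {"id": "nav"})
    for slot in root.findall("slot"):
        name = slot.get("name")
        if not name:
            continue  # a variable needs a name; unnamed slots are skipped
        # expr holds an ECMAScript expression, so the value is emitted as a string literal.
        ET.SubElement(form, "var", {"name": name, "expr": json.dumps(slot.text or "")})
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "goto", {"next": target})
    return ET.tostring(vxml, encoding="unicode")


vxml_text = semantic_xml_to_vxml(
    "<semantic><intent>query_bill</intent><slot name='month'>2020-05</slot></semantic>"
)
```

The resulting document declares the recognized slots as variables and transfers control to the flow node selected for the intent, which the interactive voice response server can then execute.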

Fig. 2 also provides a schematic diagram of other embodiments of the intelligent voice navigation system of the present disclosure. Compared with the embodiment of fig. 5, the intelligent voice navigation system of the embodiment of fig. 2 may further comprise a TTS speech synthesis device 6 in addition to the queuing machine 1, the voice recognition device 5, the semantic recognition device 2, the interactive voice response server 3 and the intelligent voice navigation device 4, wherein:

The interactive voice response server 3 may also be configured to initiate an IVR process node and return the voice allocation result to the queuing machine.

The queuing machine 1 may also be used to request a TTS broadcast from the TTS speech synthesis device 6.

The TTS voice synthesis device 6 is configured to synthesize a broadcast signal and return the broadcast signal to the queuing machine 1, and instruct the queuing machine 1 to broadcast the broadcast signal to a user.

In some embodiments of the present disclosure, the intelligent voice navigation apparatus of the embodiments of fig. 2 and 3 may be an intelligent voice navigation apparatus as described in any of the embodiments described above (e.g., the embodiment of fig. 4).

Based on the intelligent voice navigation system provided by the above embodiments of the present disclosure, the original Code mechanism and core architecture remain unchanged. The embodiments of the present disclosure use a Redis in-memory server and customized XID short codes to establish a temporary correspondence with the user's speech recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic recognition data is limited, avoids the large investment that upgrading a call center would require, and speeds up the launch of intelligent voice navigation.

The intelligent voice navigation method, device and system of the embodiments of the present disclosure have been applied in practice in an intelligent IVR flattening construction project of an operator's multimedia customer service cooperation group. The embodiments of the present disclosure were verified in that project: the operator's intelligent voice navigation service was brought online quickly, and modification and upgrading of the call center were avoided.

According to another aspect of the disclosure, there is provided a computer readable storage medium storing computer instructions that when executed by a processor implement an intelligent voice navigation method as described in any of the embodiments above (e.g., the embodiments of fig. 1 or 2).

Based on the computer readable storage medium provided by the above embodiments of the present disclosure, the original Code mechanism and core architecture remain unchanged. The embodiments of the present disclosure use a Redis in-memory server and customized XID short codes to establish a temporary correspondence with the user's speech recognition text. This solves the problem that the transmission length of intelligent navigation voice semantic recognition data is limited, avoids the large investment that upgrading a call center would require, and speeds up the launch of intelligent voice navigation.

The embodiments of the present disclosure can implement the MRCP protocol architecture by means of packet capture, which removes the limitation that a traditional voice AI engine requires private deployment. Based on the new architecture, the above embodiments of the present disclosure can adapt to mainstream voice AI engine interface protocols; the engine only needs to provide an HTTP (HyperText Transfer Protocol) or WebSocket (a protocol for full-duplex communication over a single TCP connection) interface with open, standardized capabilities.

The intelligent voice navigation apparatus described above may be implemented as a general purpose processor, a Programmable Logic Controller (PLC), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof for performing the functions described herein.

Thus far, the present disclosure has been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.

Those of ordinary skill in the art will appreciate that all or a portion of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware, where the program may be stored on a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, and to enable others of ordinary skill in the art to understand the disclosure and to design various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. An intelligent voice navigation method, comprising:

The interactive voice response server acquires corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identification;

the interactive voice response server obtaining corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identification comprises the following steps:

2. The intelligent voice navigation method of claim 1, further comprising:

3. The intelligent voice navigation method according to claim 1 or 2, wherein the queuing machine transmits the user voice stream to the voice recognition apparatus and the semantic recognition apparatus comprises:

4. An intelligent voice navigation apparatus, comprising:

The short code decoding module is used for parsing the short code identification from the interactive voice response server to obtain voice semantic recognition content corresponding to the short code identification, and returning the voice semantic recognition content corresponding to the short code identification to the interactive voice response server;

The short code decoding module is used for acquiring corresponding voice semantic recognition content according to the short code identification, generating a corresponding interactive voice flow execution node according to the voice semantic recognition content, packaging the interactive voice flow execution node into a flow node voice extensible markup language which can be executed by the interactive voice response server, and returning the flow node voice extensible markup language to the interactive voice response server.

5. The intelligent voice navigation apparatus of claim 4, wherein,

The short code conversion module and the short code decoding module are used for, in a scenario of multi-round interaction in intelligent voice navigation, cooperating with the interactive voice response server to establish semantic context association logic when performing short code identification conversion on the voice semantic recognition content, so as to realize completion recognition based on multiple semantic slots.

6. The intelligent voice navigation apparatus of claim 4 or 5, further comprising:

7. An intelligent voice navigation system, comprising:

The interactive voice response server is used for acquiring corresponding voice semantic recognition content from the intelligent voice navigation device according to the short code identification;

The intelligent voice navigation device is used for acquiring corresponding voice semantic recognition content according to the short code identification, generating a corresponding interactive voice flow execution node according to the voice semantic recognition content, packaging the interactive voice flow execution node into a flow node voice extensible markup language which can be executed by the interactive voice response server, and returning the flow node voice extensible markup language to the interactive voice response server.

8. The intelligent voice navigation system of claim 7, further comprising:

9. The intelligent voice navigation system of claim 7 or 8,

The intelligent voice navigation device is used for, in a scenario of multi-round interaction in intelligent voice navigation, cooperating with the interactive voice response server to establish semantic context association logic when performing short code identification conversion on the voice semantic recognition content, so as to realize completion recognition based on multiple semantic slots.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the intelligent voice navigation method of any of claims 1-3.