WO2019103569A1 - Method for improving speech recognition performance based on context, computing device, and computer-readable recording medium - Google Patents
Method for improving speech recognition performance based on context, computing device, and computer-readable recording medium
- Publication number
- WO2019103569A1 (application PCT/KR2018/014680)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- present
- user
- stt
- text
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- the present invention relates to an interactive AI agent system, and more particularly to a method for improving the performance of speech recognition based on a context.
- Korean Patent Laid-Open Publication No. 10-2013-0031231 discloses a technology for presenting a user with a plurality of text conversion results for a voice input so that the user can directly select the accurate text conversion result.
- Korean Patent Laid-Open Publication No. 10-2017-0099917 discloses a technology for proposing a plurality of responses based on context information for each of a plurality of text conversion results for a speech input.
- a single service provider may offer an interactive AI agent system while delegating some functions to an external, optimized server.
- a function of converting user's voice into text can be provided in the form of an API, and a representative example is the Google Speech API.
- STT (Speech-To-Text)
- when a service is received from an external STT server, the system transmits a voice input, or transmits a file format and a syntax hint together with the voice file, and receives at least one text conversion result associated with the transmitted voice input.
- a syntax hint is information that aids in the processing of a given audio, and may be a specific word or phrase.
- the external STT server can improve the accuracy of voice recognition of the transmitted voice file by using the syntax hint.
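As a sketch of the request described above, the snippet below assembles a hypothetical STT request payload carrying the audio, its file format, and a list of syntax hints. The field names are illustrative and do not correspond to any specific provider's API.

```python
def build_stt_request(audio_bytes, encoding="LINEAR16",
                      sample_rate_hz=16000, syntax_hints=None):
    """Assemble a hypothetical STT request: the audio content, its file
    format, and optional syntax hints (words or phrases expected in the
    current context, used to bias recognition)."""
    return {
        "config": {
            "encoding": encoding,
            "sample_rate_hertz": sample_rate_hz,
            # Syntax hints: context-specific words/phrases the server can
            # use to improve recognition accuracy.
            "speech_contexts": [{"phrases": list(syntax_hints or [])}],
        },
        "audio": {"content": audio_bytes},
    }

req = build_stt_request(b"\x00\x01",
                        syntax_hints=["brand inquiry", "return inquiry"])
```

A real client (e.g., the Google Speech API mentioned above) would carry the same three kinds of information, though under its own field names.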
- an interactive AI agent system receives free-form speech and provides services based on the contexts of various domains.
- the interactive AI agent system builds a hierarchical conversation flow management model containing sufficient dialog management knowledge, for example sequential conversation flow patterns for providing the corresponding service, and uses it to manage and provide appropriate information when converting recognized speech to text.
- an interactive AI agent system that can more easily grasp the user's intention based on accurate user speech recognition and provide an appropriate response can be provided.
- FIG. 1 is a schematic diagram of a system environment in which an interactive AI agent system may be implemented, according to one embodiment of the present invention.
- FIG. 2 is a functional block diagram that schematically illustrates the functional configuration of the user terminal 102 of FIG. 1, in accordance with one embodiment of the present invention.
- FIG. 3 is a functional block diagram that schematically illustrates the functional configuration of the interactive AI agent server 106 of FIG. 1, according to one embodiment of the present invention.
- FIG. 4 is an exemplary operational flow diagram performed by the STT auxiliary module of FIG. 3, in accordance with an embodiment of the present invention.
- " module " or " module " means a functional part that performs at least one function or operation, and may be implemented by hardware or software or a combination of hardware and software. Also, a plurality of "modules” or “sub-modules” may be integrated into at least one software module and implemented by at least one processor, except for "module” or “sub-module” have.
- the 'interactive AI agent system' is an information processing system that, through interactive interaction via natural language in the form of voice and/or text, receives a natural language input from a user (e.g., commands, statements, requests, questions, etc.), determines the intent of the user, and performs the necessary operations based on that intent, i.e., provides an appropriate conversational response and/or performs a task; it is not limited to any particular form of information processing system.
- the interactive AI agent system may be for providing a predetermined service, and the service may comprise a plurality of sub-task categories (e.g., product inquiries, brand inquiries, design inquiries, price inquiries, return inquiries, etc.).
- the operations performed by the 'interactive AI agent system' include, for example, providing an interactive response and/or performing a task, each of which is carried out according to the intention of the user within a sequential flow of sub-task categories.
- the interactive response provided by the 'interactive AI agent system' may take various visual, auditory and/or tactile forms (e.g., voice, sound, text, video, image, symbol, emoticon, hyperlink, animation, various notices, motion, haptic feedback, and the like).
- the task performed by the 'interactive AI agent system' may include various types of tasks such as, for example, searching for information, purchasing goods, composing a message, writing an email, dialing, playing music, photographing, and map/navigation services (the examples being non-limiting).
- the 'interactive AI agent system' includes a chatbot system based on a messenger platform, for example a chatbot system that exchanges messages with a user on a messenger and provides various information desired by the user, but it should be understood that the present invention is not limited thereto.
- FIG. 1 is a schematic diagram of a system environment 100 in which an interactive AI agent system may be implemented, in accordance with one embodiment of the present invention.
- the system environment 100 includes a plurality of user terminals 102a-102n, a communication network 104, an interactive AI agent server 106, an external service server 108, and an external STT service server 110.
- each of the plurality of user terminals 102a-102n may be any user electronic device having wired or wireless communication capability.
- Each of the user terminals 102a-102n may be any of a variety of wired or wireless communication terminals, including, for example, a smart speaker, a music player, a game console, a digital TV, a set-top box, a smart phone, a tablet PC, a desktop, and a laptop; the invention is not limited to any particular form.
- each of the user terminals 102a-102n can communicate with the interactive AI agent server 106 via the communication network 104, i.e., send and receive necessary information.
- each of the user terminals 102a-102n can communicate with the external service server 108 through the communication network 104, that is, send and receive necessary information.
- each of the user terminals 102a-102n may receive a user input in the form of voice and/or text from the outside, and may provide the user with an operation result corresponding to that input (e.g., a specific conversation response and/or performance of a specific task) obtained through communication with the interactive AI agent server 106 and/or the external service server 108 via the communication network 104 (and/or through processing within the user terminal itself).
- an interactive response provided by the user terminals 102a-102n as a result of an operation corresponding to a user input may be provided, for example, in accordance with the conversation flow pattern of the sub-task category corresponding to the user input at that point in a sequential flow of sub-task categories.
- each of the user terminals 102a-102n may provide a dialog response as a result of an operation corresponding to a user input in various visual, audible and/or tactile forms (e.g., voice, sound, text, video, images, symbols, emoticons, hyperlinks, animations, various notices, motion, haptic feedback, and the like).
- the task performed as an operation corresponding to a user input may include various types of tasks such as, for example, searching for information, purchasing goods, composing a message, creating an email, dialing, music playback, photographing, and map/navigation services.
- the communication network 104 may include any wired or wireless communication network, e.g., a TCP/IP communication network.
- the communication network 104 may include, for example, a Wi-Fi network, a LAN network, a WAN network, an Internet network, and the like, and the present invention is not limited thereto.
- the communication network 104 may be implemented using any of a variety of wired or wireless communication protocols, such as Ethernet, GSM, EDGE, CDMA, TDMA, OFDM, Bluetooth, VoIP, and Wi-Fi.
- the interactive AI agent server 106 may communicate with the user terminals 102a-102n via the communication network 104.
- the interactive AI agent server 106 sends and receives necessary information to and from the user terminals 102a-102n via the communication network 104, and can provide the user with an operation result corresponding to the user input, i.e., matching the user's intention.
- the interactive AI agent server 106 receives a user natural language input in the form of voice from the user terminals 102a-102n via the communication network 104, and can convert it into a user natural language input in character form. According to an embodiment of the present invention, the interactive AI agent server 106 transmits the user voice input received from the user terminals 102a-102n to the external STT server 110, and can receive at least one text data item corresponding to the voice-form user input. According to one embodiment of the present invention, the interactive AI agent server 106 receives the at least one text data item from the external STT server 110, performs an evaluation of each item based on the STT conversion assist database described below, and outputs the at least one text data item together with the evaluation result.
- the interactive AI agent server 106 receives user natural language input in the form of speech and/or text from the user terminals 102a-102n via the communication network 104, and can process the received natural language input based on predetermined models to determine the intent of the user.
- the interactive AI agent server 106 may communicate with the external service server 108 via the communication network 104, as described above.
- the external service server 108 may be, for example, a messaging service server, an online consultation center server, an online shopping mall server, an information search server, a map service server, a navigation service server, and the like.
- it should be noted that the interactive response based on the user's intent, transmitted from the interactive AI agent server 106 to the user terminals 102a-102n, may include content obtained from the external service server 108.
- the interactive AI agent server 106 is shown as being a separate physical server configured to communicate with the external service server 108 via the communication network 104, the present disclosure is not limited thereto. According to another embodiment of the present invention, the interactive AI agent server 106 may be included as part of various service servers such as an online consultation center server or an online shopping mall server.
- the interactive AI agent server 106 collects conversation logs (which may include, for example, a plurality of user and/or system utterance records) through various paths, automatically analyzes the collected conversation logs, and creates and/or updates a conversation flow management model based on the analysis results.
- the interactive AI agent server 106 classifies each utterance record into one of predetermined task categories, for example through keyword analysis on the collected conversation logs, and can stochastically analyze the sequential flow between the categories.
- the external STT server 110 receives a user's voice input through a communication module, converts the received voice input into at least one text data item in character form, and transmits it back.
- the external STT server 110 may receive the user's speech input and related syntax hints and convert the user's speech input into text data in at least one character form based thereon.
- the user terminal 102 includes a user input receiving module 202, a sensor module 204, a program memory module 206, a processing module 208, a communication module 210, and a response output module 212.
- the user input receiving module 202 may receive various types of input from a user, for example a natural language input such as a voice input and/or a text input (and, additionally, other forms of input such as a touch input).
- the user input receiving module 202 includes, for example, a microphone and an audio circuit, and can acquire a user audio input signal through a microphone and convert the obtained signal into audio data.
- the user input receiving module 202 may include various types of input devices, such as pointing devices (e.g., a mouse, a joystick, a trackball), a keyboard, a touch panel, and a touch screen, and can acquire text input and/or touch input signals entered by the user through these devices.
- the user input received at the user input receiving module 202 may be associated with performing a predetermined task, such as executing a predetermined application or searching for certain information, but the present invention is not limited thereto.
- the user input received at the user input receiving module 202 may require only a simple conversation response, regardless of the execution of a predetermined application or retrieval of information.
- the user input received at the user input receiving module 202 may relate to a simple statement for unilateral communication.
- the sensor module 204 includes one or more sensors of different types, through which status information of the user terminal 102 can be obtained, such as the physical state of the user terminal 102, its software and/or hardware status, or information regarding its environmental conditions.
- the sensor module 204 may include an optical sensor, for example, and may sense the ambient light condition of the user terminal 102 through the optical sensor.
- the sensor module 204 includes, for example, a movement sensor and can detect whether the corresponding user terminal 102 is moving through the movement sensor.
- the sensor module 204 includes, for example, a velocity sensor and a GPS sensor, and through these sensors, the position and / or orientation of the corresponding user terminal 102 can be detected.
- the sensor module 204 may include other various types of sensors, including temperature sensors, image sensors, pressure sensors, touch sensors, and the like.
- the program memory module 206 may be any storage medium that stores various programs that may be executed on the user terminal 102, such as various application programs and related data.
- the program memory module 206 may store, for example, a telephone dialer application, an email application, an instant messaging application, a camera application, a music playback application, a video playback application, an image management application, and the like, together with data related to the execution of these programs.
- the program memory module 206 may be configured to include various types of volatile or non-volatile memory, such as DRAM, SRAM, DDR RAM, ROM, magnetic disk, optical disk, and the like.
- the processing module 208 may communicate with each component module of the user terminal 102 and perform various operations on the user terminal 102. According to one embodiment of the present invention, the processing module 208 can drive and execute various application programs in the program memory module 206. According to one embodiment of the present invention, the processing module 208 may receive signals from the user input receiving module 202 and the sensor module 204 and, if necessary, perform appropriate processing on these signals. According to one embodiment of the present invention, the processing module 208 may, if necessary, perform appropriate processing on signals received from the outside via the communication module 210.
- the communication module 210 is configured to allow the user terminal 102 to communicate with the interactive AI agent server 106 and/or the external service server 108 via the communication network 104 of FIG. 1.
- the communication module 210 may transmit signals from, for example, the user input receiving module 202 and the sensor module 204 to the interactive AI agent server 106 and/or the external service server 108 via the communication network 104 in accordance with a predetermined protocol.
- the communication module 210 may receive various signals from the interactive AI agent server 106 and/or the external service server 108 via the communication network 104, e.g., a response signal including a natural language response in the form of voice and/or text, or various control signals, and perform appropriate processing according to a predetermined protocol.
- the response output module 212 may output a response corresponding to a user input in various forms, such as visual, auditory, and/or tactile.
- the response output module 212 may include various display devices, such as a touch screen based on technologies like LCD, LED, OLED, or QLED, and may present visual responses corresponding to a user input, such as text, symbols, video, images, hyperlinks, animations, and various notices, to the user through these display devices.
- the response output module 212 may include, for example, a speaker or a headset, and may provide an audible response, e.g., a voice and/or acoustic response corresponding to a user input, to the user.
- the response output module 212 includes a motion / haptic feedback generator, through which a tactile response, e.g., motion / haptic feedback, can be provided to the user.
- the response output module 212 may simultaneously provide any combination of two or more of a text response, a voice response, and a motion / haptic feedback corresponding to a user input.
- FIG. 3 is a functional block diagram that schematically illustrates the functional configuration of the interactive AI agent server 106 of FIG. 1, according to one embodiment of the present invention.
- the interactive AI agent server 106 includes a communication module 310, a Speech-To-Text (STT) auxiliary module 320, a Natural Language Understanding (NLU) module 330, a Text-To-Speech (TTS) module 340, a storage module 350, and a conversation flow management model building/updating module 360.
- STT (Speech-To-Text)
- NLU (Natural Language Understanding)
- TTS (Text-To-Speech)
- the communication module 310 is configured to allow the interactive AI agent server 106 to communicate with the user terminal 102, the external service server 108, and/or the external STT server 110 via the communication network 104, in accordance with any wired or wireless communication protocol.
- the communication module 310 can receive voice input and / or text input from the user, transmitted from the user terminal 102 via the communication network 104.
- the communication module 310 can receive, together with or separately from the user's voice and/or text input, status information of the user terminal 102 transmitted from the user terminal 102 via the communication network 104.
- the status information may include various status information associated with the user terminal 102 at the time of the user's speech and/or text input (e.g., the physical state of the user terminal 102, its software and/or hardware status, environmental status information around the user terminal 102, etc.).
- the communication module 310 may also transmit an interactive response generated in accordance with the user's intent (e.g., a natural language response in voice and/or text form) and/or control signals to the user terminal 102 via the communication network 104.
- the STT auxiliary module 320 can receive the voice input from the user input received through the communication module 310 and transmit the received voice input to the external STT server 110. According to one embodiment of the present invention, the STT auxiliary module 320 can transmit the voice input received through the communication module 310 together with information related to that voice input to the external STT server 110. According to one embodiment of the present invention, the STT auxiliary module 320 receives at least one text data item converted from the transmitted voice input, and can evaluate the conversion accuracy of each text data item on the basis of the STT conversion assist database of the storage module 350.
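One minimal way to picture the evaluation step above is a sketch that scores each candidate transcription by its overlap with context-relevant vocabulary. The function and vocabulary source are hypothetical stand-ins for the STT conversion assist database, whose internals the document does not specify.

```python
def evaluate_candidates(candidates, context_vocab):
    """Score each candidate transcription by the fraction of its words that
    appear in the context vocabulary (a stand-in for entries drawn from the
    STT conversion assist database); return candidates best-first."""
    scored = []
    for text in candidates:
        words = text.lower().split()
        hits = sum(1 for w in words if w in context_vocab)
        scored.append((text, hits / max(len(words), 1)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = evaluate_candidates(
    ["weather in seoul", "whether in soul"],
    {"weather", "in", "seoul"},
)
```

The real evaluation would likely weight words by frequency or conversation-flow context rather than simple set membership; this only illustrates ranking multiple STT results.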
- the NLU module 330 may receive text input from the communication module 310 or the STT auxiliary module 320.
- the text input received at the NLU module 330 may be, for example, a user text input received from the user terminal 102 via the communication network 104 and the communication module 310, or a text conversion result (e.g., a sequence of words) received from the external STT server via the STT auxiliary module 320.
- the NLU module 330 may receive, together with the text input or thereafter, status information associated with the corresponding user input, e.g., the status information of the user terminal 102 at the time of the user input.
- the status information may include various status information associated with the user terminal 102 at the time of the user input (e.g., the physical state of the user terminal 102, its software and/or hardware state, environmental state information around the user terminal 102, etc.).
- the NLU module 330 may map the received text input to one or more user intents, where a user intent can be associated with a series of operations that can be understood and performed by the interactive AI agent server 106 according to that intent. According to one embodiment of the present invention, the NLU module 330 may refer to the status information described above when associating the received text input with one or more user intents.
- the TTS module 340 may receive an interactive response that is generated to be transmitted to the user terminal 102.
- the interactive response received at the TTS module 340 may be a natural language sentence or a sequence of words in textual form.
- the TTS module 340 may convert the input of the above received text form into speech form according to various types of algorithms.
- the storage module 350 may include various databases. According to one embodiment of the present invention, the storage module 350 may include a user database 352, a conversation understanding knowledge base 354, a conversation log database 356, and a conversation flow management model 358.
- the user database 352 may be a database for storing and managing characteristic data for each user.
- the user database 352 may include various user-specific information, such as, for example, the user's previous conversation history, pronunciation feature information, lexical preferences, and location.
- the conversation understanding knowledge base 354 may include, for example, a predefined ontology model.
- an ontology model can be represented, for example, as a hierarchical structure of nodes, where each node is either an "intent" node corresponding to a user intent or an "attribute" node linked to an "intent" node (a sub-attribute node directly linked to an "intent" node, or linked in turn to another "attribute" node of an "intent" node).
- an "intent" node and the "attribute" nodes directly or indirectly linked to it may constitute one domain, and the ontology may be composed of a set of such domains.
- the conversation understanding knowledge base 354 may be configured to include domains corresponding to all of the intents that the interactive AI agent system can understand and act upon.
- the ontology model can be dynamically changed by addition or deletion of nodes or modification of relations between nodes.
- the intention nodes and attribute nodes of each domain in the ontology model may be associated with words and / or phrases associated with corresponding user intents or attributes, respectively.
- the conversation understanding knowledge base 354 may include an ontology model comprising a hierarchy of nodes and the set of words and/or phrases associated with each node, implemented, for example, in the form of a lexical dictionary, and the STT auxiliary module 320 can determine the user's intent based on the ontology model implemented in this lexical dictionary form.
- upon receipt of a text input or a sequence of words, the STT auxiliary module 320 can determine which domain in the ontology model each word in the sequence is associated with, and can determine the corresponding domain, i.e., the user intent, based on such a determination.
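The domain determination described above can be sketched, under the assumption that the lexical dictionary reduces to a simple word-to-domain mapping, as a majority vote over the words of the input sequence. The domain names and lexicon entries are illustrative, not taken from the patent.

```python
from collections import Counter

def determine_domain(word_sequence, lexicon):
    """lexicon maps a word to the ontology domain (intent) whose nodes it is
    associated with; the domain collecting the most word votes wins.
    Returns None when no word matches any domain."""
    votes = Counter(lexicon[w] for w in word_sequence if w in lexicon)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical lexical dictionary for a goods-purchase service domain.
lexicon = {"price": "price_inquiry", "cost": "price_inquiry",
           "refund": "return_inquiry", "return": "return_inquiry"}
```

An actual ontology would also exploit the node hierarchy and attribute relations; the vote over a flat dictionary only illustrates the word-to-domain association step.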
- the conversation log database 356 may be a database that classifies, stores, and manages conversation logs collected in any of various ways according to predetermined criteria. According to an embodiment of the present invention, the conversation log database 356 may store, for example, words, phrases, sentences, and various other types of user input frequently used by users of the service domain, in association with their frequency of use.
- the dialogue flow management model 358 may include a probabilistic distribution model for the sequential flow between the plurality of sub-task categories needed to provide a service in a given service domain.
- the dialogue flow management model 358 may include, for example, a sequential flow between each sub-task category belonging to the service domain in the form of a probability graph.
- the dialogue flow management model 358 may include, for example, a probabilistic distribution of each task classification obtained on various sequential flows that may occur between each of the sub-task classes.
- the dialogue flow management model 358 may also include a library of dialog patterns belonging to each task category.
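A possible in-memory shape for such a model, shown only as a hedged sketch, is a transition-probability table between sub-task categories plus a library of dialog patterns per category. The structure and helper function are illustrative; the starting-category probabilities reuse the figures given later in this description.

```python
# Hypothetical shape of a dialogue flow management model: a probability
# graph over sub-task categories and a dialog-pattern library.
flow_model = {
    "transitions": {
        "<start>": {"product_inquiry": 0.70, "brand_inquiry": 0.20,
                    "design_inquiry": 0.05, "price_inquiry": 0.03,
                    "return_inquiry": 0.02},
        "product_inquiry": {"price_inquiry": 0.13, "return_inquiry": 0.01},
    },
    "patterns": {
        "price_inquiry": ["How much is it?", "Is there a discount?"],
    },
}

def next_category(model, current):
    """Return the most probable next sub-task category after `current`,
    or None when the model records no outgoing flow."""
    options = model["transitions"].get(current, {})
    return max(options, key=options.get) if options else None
```

In use, the most likely continuation of a fresh conversation under this sketch is a product inquiry, which is what the agent would prepare for when assisting STT conversion.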
- each database contained in the storage module 350 may, for example, reside at the user terminal 102, or be distributed across the user terminal 102 and the interactive AI agent server 106.
- the conversation flow management model building/updating module 360 automatically analyzes each conversation log stored in the conversation log database 356, collected by any of a variety of methods, and builds and/or updates the conversation flow management model.
- the dialogue flow management model building/updating unit 360 can, for example through keyword analysis on the conversation logs stored in the conversation log database 356, classify each utterance record into one of the sub-task categories and group the utterance records of the same sub-task category.
- the dialogue flow management model building/updating unit 360 can determine, for example, the sequential flow between the groups, i.e., the sub-task categories, as a probabilistic distribution.
- the dialogue flow management model construction / update unit 360 can construct a sequential flow between the sub-task categories on the service domain, for example, in the form of a probability graph.
- the dialogue flow management model building/updating unit 360 may, for example, examine all sequential flows that may occur between the sub-task categories, determine the probability of occurrence of each flow between categories, and thereby obtain a probabilistic distribution of each sequential flow between the sub-task categories.
- the conversation flow management model building/updating unit 360 performs keyword analysis on the conversation logs collected in any of various ways, and can classify and tag each utterance record in the conversation logs as one of predetermined task categories.
- the predetermined task categories may be, for example, the sub-categories belonging to one service domain.
- the conversation flow management model building/updating unit 360 can, for example, classify and tag each utterance record as one of the sub-task categories of product inquiry, brand inquiry, design inquiry, price inquiry, and return inquiry.
- the dialogue flow management model construction / update unit 360 may previously select relevant keywords for each of the lower task categories, and, based on the selected keywords, Can be classified into classification.
- the conversation flow management model construction / update unit 360 can group speech data classified and tagged into any one of a plurality of job data categories among speech data of the same classification.
- the speech history groups grouped into the same category may be included in the dialogue flow management model as the conversation patterns of the category.
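The keyword-based classification and grouping of utterance records described above can be sketched as follows. This is a minimal illustration only: the category names, keyword lists, and function names are hypothetical and not taken from the disclosed embodiment, which would use curated keyword sets per service domain.

```python
from collections import defaultdict

# Hypothetical keyword lists per sub-task category of a "product purchase" domain.
CATEGORY_KEYWORDS = {
    "product_inquiry": ["product", "item", "stock"],
    "brand_inquiry": ["brand", "maker"],
    "design_inquiry": ["design", "color", "style"],
    "price_inquiry": ["price", "cost", "discount"],
    "return_inquiry": ["return", "refund", "exchange"],
}

def classify_utterance(utterance):
    """Tag an utterance with the sub-task category whose keywords match most."""
    words = [w.strip("?!.,") for w in utterance.lower().split()]
    best_category, best_hits = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for w in words if w in keywords)
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category  # None if no keyword matched

def group_by_category(conversation_log):
    """Group utterance records that were tagged with the same sub-task category."""
    groups = defaultdict(list)
    for utterance in conversation_log:
        category = classify_utterance(utterance)
        if category is not None:
            groups[category].append(utterance)
    return dict(groups)
```

The resulting per-category groups correspond to the "conversation patterns" that the dialogue flow management model stores for each category.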
- the dialogue flow management model construction / update unit 360 can analyze, from the conversation logs, the probabilistic distribution of the time-series sequence between the respective sub-task categories.
- for example, where the sub-task categories of a "product purchase" service domain are product inquiry, brand inquiry, design inquiry, price inquiry, and return inquiry,
- the first utterance of a conversation may be a product inquiry with 70% probability, a brand inquiry with 20% probability, a design inquiry with 5% probability, a price inquiry with 3% probability, and a return inquiry with 2% probability,
- while, following a product inquiry, the probability of a price inquiry may be 13%
- and the probability of a return inquiry may be 1%;
- in this way, the sub-task categories can be layered according to the probability distribution of their sequential flow.
- the dialogue flow management model construction / update unit 360 may construct the sequential flow between the sub-task categories of a service domain, for example, in the form of a probability graph. According to an embodiment of the present invention, the dialogue flow management model construction / update unit 360 can recursively grasp the probabilistic relation of the sequential flows between the respective sub-task categories, and can thereby configure a layered sequential flow.
- the dialogue flow management model construction / update unit 360 can delete, from the analysis result of the probabilistic distribution of the time-series sequences between the sub-task categories, any flow whose probability is less than a threshold. For example, if the threshold is 2% and, in the "product purchase" service domain, the probability that a return inquiry appears after a product inquiry is 1%, the flow from product inquiry to return inquiry is deleted from the conversation flow management model.
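The construction of the probability graph and the threshold-based deletion of low-probability flows can be sketched as follows, assuming each conversation session has already been reduced to an ordered list of sub-task category tags. The function and variable names are illustrative only.

```python
from collections import Counter, defaultdict

def build_flow_graph(tagged_sessions, threshold=0.02):
    """Estimate the probability of each sequential flow between sub-task
    categories and drop flows below the threshold (e.g. 2%)."""
    counts = defaultdict(Counter)
    for session in tagged_sessions:  # session: ordered list of category tags
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, dst_counts in counts.items():
        total = sum(dst_counts.values())
        # keep only flows whose estimated probability meets the threshold
        graph[src] = {dst: n / total for dst, n in dst_counts.items()
                      if n / total >= threshold}
    return graph
```

With a higher threshold, rarely observed flows (such as a return inquiry directly after a product inquiry in the example above) are pruned from the model.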
- the interactive AI agent system is based on a client-server model between the user terminal 102 and the interactive AI agent server 106, in particular on a so-called "thin client-server model" in which the user terminal may provide only the input and output of the user and delegate all other functions of the interactive AI agent system to the server, but the present invention is not limited thereto.
- the interactive AI agent system may be implemented as a distributed application between the user terminal and the server, or as a stand-alone application installed on the user terminal.
- even where the functions of the interactive AI agent system are implemented distributed between the user terminal and the server according to an embodiment of the present invention,
- it should be understood that the distribution of each function of the interactive AI agent system between the client and the server may be implemented differently in other embodiments.
- although, for convenience, a specific module has been described above as performing certain operations, the present invention is not limited thereto. According to another embodiment of the present invention, it is to be understood that the operations described as being performed by any particular module in the above description may each be performed by separate and distinct modules.
- FIG. 4 is an exemplary flow diagram of operations performed by the STT assistance module of FIG. 3, in accordance with an embodiment of the present invention.
- the STT assistance module 320 may receive a user's speech input comprising a natural language input composed of one or more words.
- the natural language input may be a voice input received, e.g., via the microphone of one of the user terminals 102a-102n and transmitted via the communication module 310.
- the STT assistance module 320 transmits the user's voice input received in step 402 to the external STT server 110.
- the voice input may be in a voice file format (e.g., a WAV file) or in a streaming format.
- the STT assistance module 320 may transmit, together with the user's voice input, information about that input (e.g., its file format, encoding format, and the like) and a phrase hint.
- the phrase hint may be a specific word or phrase provided as information that aids the processing of the given audio.
- the STT assistance module 320 may receive, from the external STT server 110, at least one piece of text data associated with the transmitted voice file.
- each piece of the at least one text data may include a score (probability) assigned by the external STT server.
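The at least one text data with per-candidate scores amounts to an N-best list. A minimal sketch of receiving and ordering such a list is shown below; the response layout and field names are assumptions for illustration, not the API of any particular STT server.

```python
from dataclasses import dataclass

@dataclass
class SttCandidate:
    """One candidate transcription returned by the external STT server."""
    text: str
    score: float  # confidence (probability) assigned by the STT server

def parse_nbest(response):
    """Turn a hypothetical JSON-like STT response into candidates, best first."""
    candidates = [SttCandidate(r["text"], r["score"]) for r in response["results"]]
    return sorted(candidates, key=lambda c: c.score, reverse=True)
```

The scores carried by these candidates are one of the inputs to the conversion-accuracy evaluation described next.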
- the STT assistance module 320 may evaluate the conversion accuracy of each of the at least one text data.
- the conversion accuracy may be a probability for each of the at least one text data, or a relative rank among the at least one text data.
- the STT assistance module 320 may evaluate the conversion accuracy of each of the at least one text data according to a predetermined criterion. In one embodiment of the present invention, the STT assistance module 320 may evaluate the conversion accuracy of each of the at least one text data in consideration of the score assigned by the external STT server to each of the at least one text data.
- the STT assistance module 320 may evaluate the conversion accuracy of each of the at least one text data based on the STT conversion assistance database.
- the STT conversion assistance database includes a user database 352 that stores and manages user-specific feature data, a conversation log database 356 in which users' existing conversation logs are analyzed and stored, a dialogue understanding knowledge base in which the attributes associated with each intent are stored, and a dialogue flow management model 358, which is a probabilistic distribution model of the sequential flow between the plurality of sub-task categories necessary for providing a service in the associated service domain.
- the STT assistance module 320 may evaluate the conversion accuracy based on the number of occurrences of the words contained in each of the at least one text conversion result.
- the number of occurrences of words can be calculated based on the conversation log database, in which the number of occurrences of each word per domain is stored. For example, when the current domain is "finance" and the received text data are "one time" and "Japan", if the stored number of occurrences for that domain is 7,200 for "one time" and 10 for "Japan", the conversion accuracy of "one time" can be determined to be higher than that of "Japan".
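The occurrence-count comparison in the "finance" example can be sketched as follows. The per-domain counts are stubbed out as an in-memory dictionary; in the described system they would come from the conversation log database, and all names here are illustrative.

```python
# Hypothetical per-domain occurrence counts mirroring the "finance" example.
DOMAIN_WORD_COUNTS = {
    "finance": {"one time": 7200, "Japan": 10},
}

def occurrence_score(candidate_text, domain):
    """Number of times the candidate text has appeared in past conversations
    of the given domain (0 if never seen)."""
    return DOMAIN_WORD_COUNTS.get(domain, {}).get(candidate_text, 0)

def pick_by_occurrence(candidates, domain):
    """Choose the candidate transcription with the highest occurrence count."""
    return max(candidates, key=lambda text: occurrence_score(text, domain))
```

Under these counts, "one time" is preferred over the acoustically similar "Japan" when the active domain is "finance".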
- the STT assistance module 320 may evaluate the conversion accuracy based on the similarity between the sentences included in each of the at least one text conversion result and the sentences stored in the STT conversion assistance database.
- methods for calculating the similarity between sentences include, for example, a statistical method that constructs a vector from the frequency of each word in a sentence and computes the cosine similarity between the vectors, and various semantic methods such as semantic similarity based on WordNet distance.
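The statistical method mentioned above (a word-frequency vector per sentence, then the cosine of the angle between the two vectors) can be sketched as a minimal, whitespace-tokenized version:

```python
import math
from collections import Counter

def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between the word-frequency vectors of two sentences."""
    vec_a = Counter(sentence_a.lower().split())
    vec_b = Counter(sentence_b.lower().split())
    dot = sum(vec_a[w] * vec_b[w] for w in set(vec_a) & set(vec_b))
    norm_a = math.sqrt(sum(n * n for n in vec_a.values()))
    norm_b = math.sqrt(sum(n * n for n in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty sentence has no direction
    return dot / (norm_a * norm_b)
```

A candidate transcription whose sentence vector lies closer to sentences already stored in the assistance database would receive a higher conversion-accuracy estimate; a WordNet-based semantic measure could be substituted for this purely statistical one.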
- the STT assistance module 320 may receive the at least one piece of converted text data from the external STT server 110 via the communication module 310, determine the user intent corresponding to the user's natural language input based on a predetermined knowledge model prepared in advance, and evaluate the conversion accuracy based on the determined intent.
- when determining the user intent, the STT assistance module 320 may map the received text input to one or more user intents.
- the STT assistance module 320 may receive the at least one piece of converted text data from the external STT server 110 via the communication module 310, and may evaluate the conversion accuracy based on the hierarchical position of the corresponding speech input.
- the STT assistance module 320 may receive the hierarchical position information of the corresponding speech input from the conversation flow management model building / updating module 360, which configures the sequential flow of the service domain in the form of a probability graph.
- the STT assistance module 320 outputs the at least one text conversion result.
- the STT assistance module 320 may output the at least one text conversion result together with the evaluation result.
- a computer program according to an embodiment of the present invention may be stored in a storage medium readable by a computer processor or the like, such as a nonvolatile memory (e.g., EPROM, EEPROM, or a flash memory device), a magnetic disk such as an internal hard disk or a removable disk, a CD-ROM disk, and the like. The program code(s) may also be implemented in assembly language or machine language. All changes and modifications that fall within the true spirit and scope of the present invention are intended to be embraced by the following claims.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020170159269A KR101970899B1 (ko) | 2017-11-27 | 2017-11-27 | 문맥 기반으로 음성 인식의 성능을 향상하기 위한 방법, 컴퓨터 장치 및 컴퓨터 판독가능 기록 매체 |
| KR10-2017-0159269 | 2017-11-27 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019103569A1 true WO2019103569A1 (fr) | 2019-05-31 |
Family
ID=66282142
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2018/014680 Ceased WO2019103569A1 (fr) | 2017-11-27 | 2018-11-27 | Procédé d'amélioration de la performance de reconnaissance vocale sur la base d'un contexte, appareil informatique et support d'enregistrement lisible par ordinateur |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR101970899B1 (fr) |
| WO (1) | WO2019103569A1 (fr) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112819664A (zh) * | 2019-10-31 | 2021-05-18 | 乐金信世股份有限公司 | 用于学习外语的设备及使用其提供外语学习服务的方法 |
| CN114860896A (zh) * | 2021-02-03 | 2022-08-05 | 卢文祥 | 基于复杂任务分析的对话方法及系统 |
| US12260856B2 (en) | 2021-12-23 | 2025-03-25 | Y.E. Hub Armenia LLC | Method and system for recognizing a user utterance |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2020218659A1 (fr) | 2019-04-26 | 2020-10-29 | (주)아크릴 | Dispositif de réponse à une interrogation automatisée pour ventes de produits d'assurance au moyen d'un réseau neuronal artificiel |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20140062656A (ko) * | 2012-11-14 | 2014-05-26 | 한국전자통신연구원 | 계층적 대화 태스크 라이브러리를 이용한 이중 대화관리 기반 음성대화시스템 |
| KR20140111538A (ko) * | 2013-03-11 | 2014-09-19 | 삼성전자주식회사 | 대화형 서버, 디스플레이 장치 및 제어 방법 |
| KR20160060335A (ko) * | 2014-11-20 | 2016-05-30 | 에스케이텔레콤 주식회사 | 대화 분리 장치 및 이에서의 대화 분리 방법 |
| WO2016151698A1 (fr) * | 2015-03-20 | 2016-09-29 | 株式会社 東芝 | Dispositif, procédé et programme de dialogue |
| KR20170088164A (ko) * | 2016-01-22 | 2017-08-01 | 한국전자통신연구원 | 점증적 대화지식 자가학습 기반 대화장치 및 그 방법 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20160031231A (ko) | 2014-09-12 | 2016-03-22 | 엘지전자 주식회사 | 공기 조화기의 실외기 |
| US9836452B2 (en) | 2014-12-30 | 2017-12-05 | Microsoft Technology Licensing, Llc | Discriminating ambiguous expressions to enhance user experience |
-
2017
- 2017-11-27 KR KR1020170159269A patent/KR101970899B1/ko active Active
-
2018
- 2018-11-27 WO PCT/KR2018/014680 patent/WO2019103569A1/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| KR101970899B1 (ko) | 2019-04-24 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019124647A1 (fr) | Procédé et appareil informatique permettant de construire ou de mettre à jour automatiquement un modèle hiérarchique de gestion de flux de conversations destiné à un système d'agent ai interactif et support d'enregistrement lisible par ordinateur | |
| KR102120751B1 (ko) | 대화 이해 ai 시스템에 의하여, 머신러닝을 대화 관리 기술에 적용한 하이브리드 계층적 대화 흐름 모델을 기초로 답변을 제공하는 방법 및 컴퓨터 판독가능 기록 매체 | |
| KR101959292B1 (ko) | 문맥 기반으로 음성 인식의 성능을 향상하기 위한 방법, 컴퓨터 장치 및 컴퓨터 판독가능 기록 매체 | |
| KR101891492B1 (ko) | 답변을 변형하여 상황에 맞는 자연어 대화를 제공하는 방법, 컴퓨터 장치 및 컴퓨터 판독가능 기록 매체 | |
| WO2019147039A1 (fr) | Procédé de détermination d'un motif optimal de conversation pour la réalisation d'un objectif à un instant particulier pendant une session de conversation associée à un système de service d'ia de compréhension de conversation, procédé de détermination de probabilité de prédiction d'accomplissement d'objectif et support d'enregistrement lisible par ordinateur | |
| KR101950387B1 (ko) | 학습 데이터 중 식별 가능하지만 학습 가능성이 없는 데이터의 레이블화를 통한, 대화형 ai 에이전트 시스템을 위한 지식베이스 모델의 구축 또는 갱신 방법, 컴퓨터 장치, 및 컴퓨터 판독 가능 기록 매체 | |
| CN111026840B (zh) | 文本处理方法、装置、服务器和存储介质 | |
| KR101932263B1 (ko) | 적시에 실질적 답변을 제공함으로써 자연어 대화를 제공하는 방법, 컴퓨터 장치 및 컴퓨터 판독가능 기록 매체 | |
| WO2019103569A1 (fr) | Procédé d'amélioration de la performance de reconnaissance vocale sur la base d'un contexte, appareil informatique et support d'enregistrement lisible par ordinateur | |
| CN117215647A (zh) | 基于语言模型的指令执行方法、装置及存储介质 | |
| WO2019088383A1 (fr) | Procédé et dispositif informatique de fourniture de conversation en langage naturel en fournissant une réponse d'interjection en temps opportun, et support d'enregistrement lisible par ordinateur | |
| KR20190094087A (ko) | 머신러닝 기반의 대화형 ai 에이전트 시스템과 연관된, 사용자 맞춤형 학습 모델을 포함하는 사용자 단말 및 사용자 맞춤형 학습 모델이 기록된 컴퓨터 판독가능 기록 매체 | |
| KR101932264B1 (ko) | 복수 개의 같은 유형의 엔티티 정보의 분석에 기초한 인텐트 결정을 제공하는 방법 및 대화형 ai 에이전트 시스템, 및 컴퓨터 판독가능 기록 매체 | |
| KR20190103951A (ko) | 학습 데이터 중 식별 가능하지만 학습 가능성이 없는 데이터의 레이블화를 통한, 대화형 ai 에이전트 시스템을 위한 지식베이스 모델의 구축 또는 갱신 방법, 컴퓨터 장치, 및 컴퓨터 판독 가능 기록 매체 | |
| WO2019143170A1 (fr) | Procédé de génération de modèle de conversation pour système de service ai de compréhension de conversation ayant un but prédéterminé, et support d'enregistrement lisible par ordinateur | |
| WO2019142976A1 (fr) | Procédé de commande d'affichage, support d'enregistrement lisible par ordinateur, et dispositif informatique pour afficher une réponse de conversation candidate pour une entrée de parole d'utilisateur | |
| WO2019156537A1 (fr) | Système d'agent ai interactif et procédé pour fournir activement un service lié à la sécurité et similaire par l'intermédiaire d'une session de dialogue ou d'une session séparée sur la base d'une surveillance de session de dialogue entre des utilisateurs, et support d'enregistrement lisible par ordinateur | |
| Gonge et al. | Voice recognition system for desktop assistant | |
| KR20210045702A (ko) | 키워드 기반 북마크 검색 서비스 제공을 위하여 북마크 정보를 저장하는 방법 및 컴퓨터 판독가능 기록 매체 | |
| KR102120748B1 (ko) | 대화 이해 ai 시스템에 의하여, 계층적으로 저장되어 있는 북마크에 대한 문맥기반 검색 서비스를 제공하는 방법 및 컴퓨터 판독가능 기록 매체 | |
| KR20190094081A (ko) | 대화형 ai 에이전트 시스템을 위한 지식베이스의 시각화 방법 및 컴퓨터 판독가능 기록 매체 | |
| KR20210045699A (ko) | 계층적으로 저장되어 있는 북마크에 대한 문맥기반 검색 서비스를 제공하는 방법 및 컴퓨터 판독가능 기록 매체 | |
| KR102120749B1 (ko) | 대화 이해 ai 시스템에 의하여, 키워드 기반 북마크 검색 서비스 제공을 위하여 북마크 정보를 저장하는 방법 및 컴퓨터 판독가능 기록 매체 | |
| WO2019066132A1 (fr) | Procédé d'authentification basée sur un contexte d'utilisateur ayant une sécurité améliorée, système d'agent ai interactif et support d'enregistrement lisible par ordinateur | |
| JP2022003494A (ja) | 対話中の文脈の因果関係に応じた応答文を推定するプログラム、装置及び方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18881839 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 18881839 Country of ref document: EP Kind code of ref document: A1 |
|
| 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.01.2021) |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 18881839 Country of ref document: EP Kind code of ref document: A1 |