
US20220130545A1 - Task-oriented Dialogue System for Automatic Disease Diagnosis - Google Patents


Info

Publication number
US20220130545A1
Authority
US
United States
Prior art keywords
user
dialogue
symptoms
simulator
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US17/508,655
Inventor
Zhongyu Wei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011136008.2A (publication CN112420189A)
Priority claimed from CN202011135075.2A (publication CN112349409A)
Application filed by Individual
Publication of US20220130545A1

Classifications

    • G16H 50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7275: Determining trends in physiological measurement data; predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • G06N 20/00: Machine learning
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/0499: Feedforward networks
    • G06N 3/08: Learning methods
    • G06N 3/092: Reinforcement learning
    • G16H 10/20: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for electronic clinical trials or questionnaires
    • G16H 10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for patient-specific data, e.g. for electronic patient records
    • G16H 40/20: ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • This invention is related to a system for automatic disease diagnosis, and more particularly, to a task-oriented dialogue system for automatic disease diagnosis. This invention is also related to a method thereof.
  • each EHR contains multiple types of data, including personal information, admission notes, diagnostic tests, vital signs and medical images. It is collected cumulatively following a clinical diagnostic procedure, which involves interactions between patients and doctors and some complicated medical tests. Therefore, it is very expensive to collect EHRs for different diseases, and how to collect information from patients automatically remains a challenge for automatic diagnosis.
  • a task-oriented dialogue system including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal
  • an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient
  • a dialogue policy learning module for training the dialogue policy via reinforcement learning
  • the user simulator module samples the user goal.
  • the user goal includes,
  • the user simulator module includes a dialogue state tracker for tracking the dialogue state.
  • the user simulator module iteratively takes a user action according to the current user state and the previous agent action, and transits into the next user state; wherein the agent actions include "inform" and "request" actions, while the user actions include "deny", "confirm" and "not-sure" actions.
  • the user state includes an agenda and a goal, wherein the agenda contains a list of symptoms and symptoms status, and the agenda tracks the progress of the dialogue, and the user goal ensures that the user simulator module behaves in a consistent, goal-oriented manner.
  • the symptoms status indicates whether or not a symptom has been requested.
  • every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
  • the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
  • the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
  • the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the dialogue reaches the maximum dialogue turn.
  • the dialogue session will be terminated by the user simulator when recognized as successful.
  • the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
  • the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
  • the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
  • the actions each including a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., normalized symptoms or a special “disease” slot).
  • the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
  • the reward is an immediate reward at step time t after taking the current agent action.
  • the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
  • the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q for all agent actions.
  • the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
  • the Bellman equation is parameterized as y_i = r + γ max_{a′} Q(s′, a′; θ_i⁻), where Q(s′, a′; θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
  • the current DQN network is updated multiple times with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
  • the number of times the current DQN network is updated depends on the batch size and the current size of the replay buffer.
  • the target network is then replaced by the current network and the current network is evaluated on the training set.
  • the buffer will be flushed if the current network performs better than all previous versions of the network.
  • a task-oriented dialogue method including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal
  • the user simulator module samples the user goal.
  • FIG. 1 illustratively shows an example utterance with annotations of symptoms in BIO format used in one embodiment of this application;
  • FIG. 2 illustratively shows an example of user goal used in one embodiment of this application.
  • FIG. 3 illustratively shows the learning curve of all the three dialogue systems used in the experiment of this application.
  • Embodiments of the subject matter and the functional operations described in this specification optionally can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can, for example, be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine readable tangible storage device, a machine readable tangible storage substrate, a tangible memory device, or a combination of one or more of them.
  • data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program (also known as a program, software, software application, script, or code), can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client server relationship to each other.
  • the dataset is collected from the pediatric department in a Chinese online healthcare community (http://muzhi.baidu.com). It is a popular website for users to inquire with doctors online. Usually, a patient would provide a piece of self-report presenting his/her basic conditions. Then a doctor will initialize a conversation to collect more information and make a diagnosis based on both the self-report and the conversational data.
  • An example is shown in Table 1. Please note that in this application some data was collected in Chinese, so in the tables and figures English translations have been added to the Chinese data.
  • the doctor can obtain additional symptoms during conversation beyond the self-report.
  • the final diagnosis from doctors can also be obtained as the label.
  • symptoms from self-reports are termed as explicit symptoms while those from conversational data as implicit symptoms.
  • FIG. 1 shows an example utterance with annotations of symptoms in BIO (begin-in-out) format.
  • Each Chinese character is assigned a label of "B", "I" or "O".
  • each extracted symptom expression is tagged with “True” or “False” indicating whether the patient suffers from this symptom or not.
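The per-character BIO labeling described above can be decoded back into symptom spans. The following is an illustrative sketch (function name and the toy English utterance are hypothetical; the dataset itself uses Chinese characters):

```python
def bio_to_spans(chars, tags):
    """Decode per-character BIO tags into (start, end, text) symptom spans.

    "B" opens a new span, "I" extends the current one, "O" closes it.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i, "".join(chars[start:i])))
            start = i
        elif tag == "I":
            if start is None:  # tolerate a stray "I" by opening a span
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i, "".join(chars[start:i])))
                start = None
    if start is not None:
        spans.append((start, len(tags), "".join(chars[start:])))
    return spans

# Toy per-character example: the span "cough" is tagged B I I I I.
chars = list("has cough")
tags = ["O", "O", "O", "O", "B", "I", "I", "I", "I"]
print(bio_to_spans(chars, tags))  # [(4, 9, 'cough')]
```

Each decoded span would then be paired with its "True"/"False" tag indicating whether the patient suffers from that symptom.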
  • Each record is annotated by at least two annotators. Any inconsistency would be further judged by the third one.
  • the Cohen's kappa coefficients between the two annotators are 71% and 67% for self-reports and conversations respectively.
  • Each user goal (see an example in FIG. 2 below) is derived from one real-world patient record (www.sdspeople.fudan.edu.cn/zywei/data/acl2018-mds.zip).
  • a task-oriented DS typically contains three components, namely Natural Language Understanding (NLU), Dialogue Manager (DM) and Natural Language Generation (NLG).
  • NLU detects the user intent and slots with values from utterances; DM tracks the dialogue states and takes system actions; NLG generates natural language given the system actions.
  • a user simulator is designed to interact with the dialogue system.
  • This application follows the same setting as Li et al. ("End-to-end task completion neural dialogue systems", 2017, Proceedings of the Eighth International Joint Conference on Natural Language Processing) to design its medical DS.
  • FIG. 2 shows an example of user goal.
  • Each user goal consists of four parts: the disease tag is the disease that the user suffers from; explicit symptoms are symptoms extracted from the user self-report; implicit symptoms are symptoms extracted from the conversational data between the patient and the doctor; the request slot is the disease slot that the user would request.
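As an illustrative sketch, the four-part user goal could be represented as a plain dictionary. The field names and symptom values below are hypothetical examples, not taken from the actual dataset:

```python
# Illustrative user goal mirroring the four parts described above.
# Disease name, symptom names and field names are hypothetical.
user_goal = {
    "disease_tag": "children's bronchitis",
    "explicit_inform_slots": {"cough": True, "runny nose": True},  # from self-report
    "implicit_inform_slots": {"fever": True, "vomiting": False},   # from conversation
    "request_slots": {"disease": "UNK"},  # the slot the user asks the agent to fill
}

assert set(user_goal) == {
    "disease_tag", "explicit_inform_slots", "implicit_inform_slots", "request_slots"
}
```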
  • the user simulator samples a user goal, while the agent attempts to make a diagnosis for the user.
  • the system will learn to select the best response action at each time step by maximizing a long term reward.
  • a user simulator samples a user goal from the experiment dataset.
  • the user takes an action a_u,t according to the current user state s_u,t and the previous agent action a_t−1, and transits into the next user state s_u,t+1.
  • the goal G ensures that the user behaves in a consistent, goal-oriented manner.
  • the agenda contains a list of symptoms and their status (whether or not they are requested) to track the progress of the conversation.
  • Every dialogue session is initiated by the user via the user action au,1 which consists of the requested disease slot and all explicit symptoms.
  • the user will take one of the three actions including True (if the symptom is positive), False (if the symptom is negative), and not_sure (if the symptom is not mentioned in the user goal).
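The three-way answering rule above can be sketched as a small function; the symptom names in the example goal are hypothetical:

```python
def answer(symptom, goal_symptoms):
    """User simulator's reply to a requested symptom.

    goal_symptoms maps symptom name -> True/False as given in the user goal;
    symptoms absent from the goal get a "not_sure" reply.
    """
    if symptom not in goal_symptoms:
        return "not_sure"
    return "True" if goal_symptoms[symptom] else "False"

goal_symptoms = {"cough": True, "vomiting": False}  # hypothetical goal
print(answer("cough", goal_symptoms))     # True
print(answer("vomiting", goal_symptoms))  # False
print(answer("headache", goal_symptoms))  # not_sure
```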
  • If the agent informs the correct disease, the dialogue session will be terminated as successful by the user. Otherwise, the dialogue session will be recognized as failed if the agent makes an incorrect diagnosis or the dialogue reaches the maximum dialogue turn T.
  • In this framework, the diagnosis process is formulated as a Markov Decision Process (MDP).
  • a dialogue state s includes the symptoms requested by the agent and informed by the user up to the current time t, the previous action of the user, the previous action of the agent and the turn information.
  • its dimension is equal to the number of all symptoms; the elements are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms and 0 for not-mentioned symptoms.
  • Each state s ⁇ S is the concatenation of these four vectors.
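The symptoms-vector part of the state can be encoded as follows. This is a minimal sketch: the symptom inventory is hypothetical, and in the full system this vector would be concatenated with the previous user action, previous agent action and turn information:

```python
# Encode informed symptoms into the fixed-length vector described above:
# 1 = positive, -1 = negative, -2 = not-sure, 0 = not mentioned.
ALL_SYMPTOMS = ["cough", "fever", "runny nose", "vomiting"]  # hypothetical inventory

def encode_symptoms(informed):
    """informed maps symptom -> True / False / "not_sure"."""
    code = {True: 1, False: -1, "not_sure": -2}
    return [code[informed[s]] if s in informed else 0 for s in ALL_SYMPTOMS]

vec = encode_symptoms({"cough": True, "fever": False, "vomiting": "not_sure"})
print(vec)  # [1, -1, 0, -2]
```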
  • An action a ⁇ A is composed of a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., normalized symptoms or a special slot “disease”).
  • “thanks” and “close dialogue” are also two actions.
  • the transition from s_t to s_t+1 is the updating of state s_t based on the agent action a_t, the previous user action a_u,t−1 and the step time t.
  • the policy π describes the behaviors of an agent; it takes the state s_t as input and outputs the probability distribution π(a_t|s_t) over all possible actions.
  • the policy is parameterized with a deep Q-network (DQN) (Mnih et al., "Human-level control through deep reinforcement learning", Nature, 2015), which takes the state s_t as input and outputs Q(s_t, a; θ) for all actions a.
  • a Q-network can be trained by updating the parameters θ_i at iteration i to reduce the mean squared error between the Q-value computed from the current network Q(s, a; θ_i) and the Q-value obtained from the Bellman equation y_i = r + γ max_{a′} Q(s′, a′; θ_i⁻), where Q(s′, a′; θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
  • the current DQN network is updated multiple times (depending on the batch size and the current size of replay buffer) with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
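The update just described can be sketched with a tiny linear Q-function standing in for the single-layer network. This is a hypothetical illustration: dimensions, data and function names are invented, and only the structure of the Bellman-target update matches the text (γ = 0.9 and learning rate 0.001 as stated below):

```python
import numpy as np

STATE_DIM, N_ACTIONS = 6, 4      # hypothetical dimensions
GAMMA, LR = 0.9, 0.001           # discount factor and learning rate from the text

rng = np.random.default_rng(0)
theta = rng.normal(size=(STATE_DIM, N_ACTIONS))  # current network parameters
theta_target = theta.copy()                      # frozen target network

def q_values(params, s):
    """Q(s, .; params) for all actions, shape (N_ACTIONS,)."""
    return s @ params

def dqn_step(theta, batch):
    """One MSE gradient step toward the Bellman targets
    y = r + gamma * max_a' Q(s', a'; theta_target)."""
    for s, a, r, s_next, done in batch:
        y = r if done else r + GAMMA * q_values(theta_target, s_next).max()
        td = q_values(theta, s)[a] - y
        theta[:, a] -= LR * 2.0 * td * s  # gradient of (Q(s, a) - y)^2 w.r.t. theta[:, a]
    return theta

# A batch of (s, a, r, s', done) tuples as drawn randomly from the replay buffer:
s = rng.normal(size=STATE_DIM)
theta = dqn_step(theta, [(s, 2, 1.0, s, False)])
```

In the described procedure this step is repeated over many random batches while theta_target stays fixed, and theta_target is periodically replaced by a copy of theta.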
  • the target network is replaced by the current network and the current network is evaluated on the training set. The buffer will be flushed if the current network performs better than all previous versions.
  • the max dialogue turn T is 22.
  • a positive reward of +44 is given to the agent at the end of a successful dialogue, and a reward of −22 is given to a failed one.
  • the dataset is divided into two parts: 80% for training with 568 user goals and 20% for testing with 142 user goals.
  • the ε of the ε-greedy strategy is set to 0.1 for effective exploration of the action space, and the discount factor γ in the Bellman equation is 0.9.
  • the size of buffer D is 10000 and the batch size is 30.
  • the neural network of DQN is a single layer network.
  • the learning rate is 0.001.
  • Each simulation epoch consists of 100 dialogue sessions and the current network is evaluated on 500 dialogue sessions at the end of each epoch. Before training, the buffer is pre-filled with the experiences of the rule-based agent (see below) to warm start our dialogue system.
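The training settings above can be collected into a configuration, together with a sketch of the ε-greedy action selection. This is an illustrative summary, not the actual implementation; the dictionary keys are invented names:

```python
import random

# Hyper-parameters as stated in the text (key names are hypothetical).
CONFIG = {
    "max_turn": 22,
    "reward_success": +44,
    "reward_failure": -22,
    "epsilon": 0.1,
    "gamma": 0.9,
    "buffer_size": 10000,
    "batch_size": 30,
    "learning_rate": 0.001,
    "sessions_per_epoch": 100,
    "eval_sessions": 500,
}

def epsilon_greedy(q_values, epsilon=CONFIG["epsilon"], rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
print(epsilon_greedy([0.1, 0.9, 0.3]))  # 1 (the greedy action, with this seed)
```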
  • the baselines include:
  • SVM This model treats the automatic diagnosis as a multi-class classification problem. It takes one-hot representation of symptoms in the user goal as input, and predicts the disease. There are two configurations: one takes both explicit and implicit symptoms as input (denoted as SVM-ex&im), and the other takes only explicit symptoms to predict the disease (denoted as SVM-ex).
  • Random Agent At each turn, the random agent takes an action randomly from the action space as the response to the user's action.
  • Rule-based Agent takes an action based on handcrafted rules. Conditioned on the current dialogue state s_t, the agent will inform a disease if all the known related symptoms are detected. If no disease can be identified, the agent will select one of the remaining symptoms at random to request. The relations between diseases and symptoms are extracted from the annotated corpus in advance. In this work, only the first T/2.5 (2.5 is a hyper-parameter) symptoms with the highest frequency are kept for each disease so that the rule-based agent can inform a disease within the maximum dialogue turn T.
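A minimal sketch of such a rule-based agent is shown below. The disease-to-symptom table is a made-up stand-in for the relations extracted from the annotated corpus:

```python
import random

# Hypothetical disease -> frequent-symptom table (stands in for the corpus-derived one).
DISEASE_SYMPTOMS = {
    "bronchitis": {"cough", "fever"},
    "diarrhea": {"vomiting", "loose stool"},
}

def rule_based_action(known_positive, asked):
    """Inform a disease once all of its kept symptoms are known positive;
    otherwise request one of the remaining symptoms at random."""
    for disease, symptoms in DISEASE_SYMPTOMS.items():
        if symptoms <= known_positive:
            return ("inform", disease)
    remaining = [s for d in DISEASE_SYMPTOMS.values() for s in d
                 if s not in known_positive and s not in asked]
    if remaining:
        return ("request", random.choice(remaining))
    return ("inform", "unknown")  # fall back when nothing is left to ask

print(rule_based_action({"cough", "fever"}, set()))  # ('inform', 'bronchitis')
```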
  • Table 4 shows the accuracy of two SVM-based models.
  • FIG. 3 shows the learning curve of all the three dialogue systems and Table 5 shows the performance of these agents on testing set, wherein performance of the three dialogue systems on 5K simulated dialogues is shown.
  • The DQN agent outperforms SVM-ex by collecting additional implicit symptoms via conversing with patients.
  • However, there is still a gap between the performance of the DQN agent and SVM-ex&im in terms of accuracy, which indicates that there is still room for improvement of the dialogue system.
  • The experiment results show that the dialogue system of this invention is able to collect additional symptoms via conversation with patients and improve the accuracy of automatic diagnosis. Hence, it fills the gap of applying DS in disease identification.

Abstract

A task-oriented dialogue system, and methods therefor, including a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data; a symptoms normalization module for normalizing the extracted symptoms and generating a user goal; an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and a dialogue policy learning module for training the dialogue policy via reinforcement learning; wherein the user simulator module samples the user goal.

Description

  • The present patent application claims priority to China application No. 202011135075.2, filed on Oct. 22, 2020, and to China application No. 202011136008.2, filed on Oct. 22, 2020. The entire content of each application is incorporated herein by reference.
  • TECHNICAL FIELD
  • This invention is related to a system for automatic disease diagnosis, and more particularly, to a task-oriented dialogue system for automatic disease diagnosis. This invention is also related to a method thereof.
  • BACKGROUND
  • Automatic phenotype identification using electronic health records (EHRs) has been a rising topic in recent years ("A review of approaches to identifying patient phenotype cohorts using electronic health records", Shivade et al., 2013, Journal of the American Medical Informatics Association). Researchers have explored various machine learning approaches to identify symptoms and diseases for patients given multiple types of information (both numerical data and pure text). Experimental results prove the effectiveness of identification for heart failure, type 2 diabetes, autism spectrum disorders, infection detection, etc. Currently, most attempts focus on specific types of diseases, and it is difficult to transfer models from one disease to another.
  • In general, each EHR contains multiple types of data, including personal information, admission notes, diagnostic tests, vital signs and medical images. It is collected cumulatively following a clinical diagnostic procedure, which involves interactions between patients and doctors and some complicated medical tests. Therefore, it is very expensive to collect EHRs for different diseases, and how to collect information from patients automatically remains a challenge for automatic diagnosis.
  • In 2003, Milward et al. proposed an ontology-based dialogue system that supports electronic referrals for breast cancer ("Ontology-based dialogue systems", in Proc. 3rd Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI 03)), which can deal with the informative responses of users based on medical domain ontologies. In addition, Tang et al. and Kao et al. proposed two works in which deep reinforcement learning is applied for automatic diagnosis ("Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning", 2016, in Proceedings of NIPS Workshop on Deep Reinforcement Learning; and "Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning", 2018). However, their models need extra human resources to categorize the diseases into different groups, and the data used is simulated and cannot reflect the situation of real patients.
  • Recently, due to its promising potential and alluring commercial value, research on task-oriented dialogue systems (DS) has attracted increasing attention in different domains, including ticket booking, online shopping and restaurant searching. It is believed that applying DS in the medical domain has great potential to reduce the cost of collecting data from patients.
  • However, a gap remains in applying DS to disease identification, with two major challenges. First, there is a lack of annotated medical dialogue datasets. Second, no DS framework is available for disease identification.
  • Therefore, there is a need to provide a novel task-oriented dialogue system addressing the above problems.
  • SUMMARY
  • In this application, to address the above problems, we make the first move to build a dialogue system facilitating automatic information collection and diagnosis making for the medical domain. A reinforcement learning based framework for the medical dialogue system is proposed. To accompany this framework, a first medical dataset for dialogue systems, which includes both patient self-report data and patient-doctor conversational data, has been built. The experiment results on this dataset show that the dialogue system of this application is able to collect symptoms from patients via conversation and improve the accuracy of automatic diagnosis.
  • In one aspect of this invention, it is provided a task-oriented dialogue system, including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
  • an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
  • a dialogue policy learning module for training the dialogue policy via reinforcement learning;
  • wherein the user simulator module samples the user goal.
  • Preferably, the user goal includes,
  • a disease tag tagging the disease that the user suffers from;
  • explicit symptoms extracted from the user self-report data;
  • implicit symptoms extracted from the user-doctor conversational data; and
  • disease slots that the user requests.
  • Preferably, the user simulator module includes a dialogue state tracker for tracking the dialogue state.
  • Preferably, the user simulator module iteratively takes a user action according to a current user state and a previous agent action, and transits into the next user state; wherein the agent action includes “inform” action and “request” action, while the user action includes “deny” action, “confirm” action and “not-sure” action.
  • Preferably, the user state includes an agenda and a goal, wherein the agenda contains a list of symptoms and symptoms status, and the agenda tracks the progress of the dialogue, and the user goal ensures that the user simulator module behaves in a consistent, goal-oriented manner.
  • Preferably, the symptom status indicates whether or not the symptom has been requested.
  • Preferably, every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
  • Preferably, during the course of the dialogue session, in terms of the symptom requested by the agent simulator, the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
  • Preferably, the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
  • Preferably, the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the dialogue turn reaches the maximum dialogue turn.
  • Preferably, the dialogue session will be terminated by the user simulator when recognized as successful.
  • Preferably, the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
  • Preferably, the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
  • Preferably, the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
  • Preferably, each of the actions includes a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., a normalized symptom or a special “disease” slot).
  • Preferably, the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
  • Preferably, the reward is an immediate reward at step time t after taking the current agent action.
  • Preferably, the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
  • Preferably, the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q for all agent actions.
  • Preferably, the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
  • Preferably, the Bellman equation is parameterized as

  • yi = r + γ maxa′ Q(s′, a′ | θi⁻)

  • wherein Q(s′, a′ | θi⁻) is the target network with parameters θi⁻ from some previous iteration.
  • Preferably, the current DQN network is updated multiple times with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
  • Preferably, the number of times that the current DQN network is updated depends on the batch size and the current size of the replay buffer.
  • Preferably, at the end of each epoch, the target network is replaced by the current network and the current network is evaluated on training set.
  • Preferably, the buffer will be flushed if the current network performs better than all previous versions of the network.
  • In another aspect of this invention, it is provided a task-oriented dialogue method, including
  • providing a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
  • providing a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
  • providing an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
  • providing a dialogue policy learning module for training the dialogue policy via reinforcement learning;
  • wherein the user simulator module samples the user goal.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The foregoing summary, as well as the following detailed description, will be better understood when read in conjunction with the appended drawings. For the purpose of illustration, there is shown in the drawings certain embodiments of the present disclosure. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of systems and apparatuses consistent with the present invention and, together with the description, serve to explain advantages and principles consistent with the invention.
  • Wherein:
  • FIG. 1 illustratively shows an example utterance with annotations of symptoms in BIO format used in one embodiment of this application;
  • FIG. 2 illustratively shows an example of user goal used in one embodiment of this application; and
  • FIG. 3 illustratively shows the learning curve of all the three dialogue systems used in the experiment of this application.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The Figures and written description are provided to teach any person skilled in the art to make and use the inventions for which patent protection is sought. The invention is capable of other embodiments and of being practiced and carried out in various ways. Those skilled in the art will appreciate that not all features of a commercial embodiment are shown for the sake of clarity and understanding. Persons of skill in the art will also appreciate that the development of an actual commercial embodiment incorporating aspects of the present inventions will require numerous implementation-specific decisions to achieve the developer's ultimate goal for the commercial embodiment. While these efforts may be complex and time-consuming, these efforts nevertheless would be a routine undertaking for those of skill in the art having the benefit of this disclosure.
  • In addition, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. For example, the use of a singular term, such as, “a” is not intended as limiting of the number of items. Also the use of relational terms, such as but not limited to, “top,” “bottom,” “left,” “right,” “upper,” “lower,” “down,” “up,” “side,” are used in the description for clarity in specific reference to the Figures and are not intended to limit the scope of the invention or the appended claims. Further, it should be understood that any one of the features of the invention may be used separately or in combination with other features. Other systems, methods, features, and advantages of the invention will be or become apparent to one with skill in the art upon examination of the Figures and the detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
  • Embodiments of the subject matter and the functional operations described in this specification optionally can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can, for example, be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • The computer readable medium can be a machine readable tangible storage device, a machine readable tangible storage substrate, a tangible memory device, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (also known as a program, software, software application, script, or code), can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client server relationship to each other.
  • In this application, the dataset is collected from the pediatric department in a Chinese online healthcare community (http://muzhi.baidu.com). It is a popular website for users to inquire with doctors online. Usually, a patient provides a piece of self-report presenting his/her basic conditions. Then a doctor initializes a conversation to collect more information and makes a diagnosis based on both the self-report and the conversational data. An example is shown in Table 1. Please note that in this application some data is collected in Chinese, so English translations of the Chinese data have been added in the tables and figures.
  • TABLE 1
    Self-report
    The little baby has sputum in the throat and watery diarrhea.
    What kind of medicine should be taken?
    Conversation
    . . .
    Doctor: Does the baby have a cough or diarrhea now?
    Patient: No cough, but diarrhea.
    Doctor: Does the baby choke on milk?
    Patient: He vomits milk sometimes.
    . . .
    (Chinese originals appear as image placeholders in the source; English translations shown.)
  • As can be seen, the doctor can obtain additional symptoms during conversation beyond the self-report. For each patient, the final diagnosis from doctors can also be obtained as the label. For clarity, symptoms from self-reports are termed explicit symptoms, while those from conversational data are termed implicit symptoms.
  • Four types of diseases are chosen for annotation, including upper respiratory infection, children functional dyspepsia, infantile diarrhea and children's bronchitis. Three annotators (one with medical background) are invited to label all the symptom phrases in both self-reports and conversational data. The annotation is performed in two steps, namely symptom extraction and symptom normalization.
  • The BIO (begin-in-out) schema is followed for symptom identification.
  • FIG. 1 shows an example utterance with annotations of symptoms in BIO format. Each Chinese character is assigned a label of “B”, “I” or “O”. Also, each extracted symptom expression is tagged with “True” or “False”, indicating whether or not the patient suffers from this symptom. In order to improve the annotation agreement between annotators, two guidelines are created for the self-report and the conversational data respectively. Each record is annotated by at least two annotators, and any inconsistency is further judged by the third one. The Cohen's kappa coefficients between two annotators are 71% and 67% for self-reports and conversations respectively.
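The character-level BIO labeling described above can be sketched as follows. The utterance, the symptom span and the helper function are illustrative assumptions, not an actual record from the annotated dataset.

```python
# Sketch of BIO labeling over an utterance; each Chinese character is one
# token. Tokens inside [span_start, span_end) form one symptom expression.

def bio_tags(tokens, span_start, span_end):
    """Label the span's first token "B", its rest "I", everything else "O"."""
    tags = []
    for i in range(len(tokens)):
        if i == span_start:
            tags.append("B")
        elif span_start < i < span_end:
            tags.append("I")
        else:
            tags.append("O")
    return tags

tokens = list("宝宝咳嗽厉害")   # "the baby coughs badly" (illustrative)
tags = bio_tags(tokens, 2, 4)   # "咳嗽" (cough) is the symptom span
# tags == ["O", "O", "B", "I", "O", "O"]
```

In the dataset, the per-character tags would additionally carry the “True”/“False” suffering flag attached to each extracted expression.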
  • After symptom expression identification, medical experts manually link each symptom expression to the most relevant concept on SNOMED CT (https://www.snomed.org/snomed-ct) for normalization. Table 2 shows some phrases that describe symptoms in the example and some related concepts in SNOMED CT.
  • TABLE 2
    Extracted symptom expression             Related concept in SNOMED CT
    (cough)                                  (cough)
    (sneeze)                                 (sneezing)
    (snot)                                   (snot)
    (have loose bowels)                      (diarrhea)
    (body temperature between 37.5-37.7)     (low-grade fever)
    (Chinese expressions appear as image placeholders in the source; English glosses shown.)
  • An overview of the dataset is presented in Table 3, wherein # of user goal is the number of dialogue sessions for each disease, and Ave # of explicit symptoms and Ave # of implicit symptoms are the average numbers of explicit and implicit symptoms among user goals, respectively.
  • TABLE 3
                                      # of       Ave # of      Ave # of
    Disease                           user goal  explicit      implicit
                                                 symptoms      symptoms
    infantile diarrhea                200        1.15          2.71
    children functional dyspepsia     150        1.70          3.20
    upper respiratory infection       160        2.56          3.55
    children's bronchitis             200        2.87          3.64
  • After symptom extraction and normalization, there are 144 unique symptoms identified. In order to reduce the size of the action space of the DS, only the 67 symptoms with a frequency greater than or equal to 10 are kept. Samples, called “user goals”, are then generated. Each user goal (see an example in FIG. 2 below) is derived from one real-world patient record (www.sdspeople.fudan.edu.cn/zywei/data/ac12018-mds.zip).
  • A task-oriented DS typically contains three components, namely Natural Language Understanding (NLU), Dialogue Manager (DM) and Natural Language Generation (NLG). NLU detects the user intent and slots with values from utterances; DM tracks the dialogue states and takes system actions; NLG generates natural language given the system actions.
  • In this application, the focus is on the DM for automatic diagnosis, which consists of two sub-modules, namely a dialogue state tracker (DST) and policy learning. Both NLU and NLG are implemented with template-based models.
  • Typically, a user simulator is designed to interact with the dialogue system. In this application, the same setting as Li et al. (“End-to-end task completion neural dialogue systems”, 2017, Proceedings of the Eighth International Joint Conference on Natural Language Processing) is followed to design the medical DS of this application.
  • FIG. 2 shows an example of user goal. Each user goal consists of four parts, disease tag is the disease that the user suffers; explicit symptoms are symptoms extracted from the user self-report; implicit symptoms are symptoms extracted from the conversational data between the patient and the doctor; request slots is the disease slot that the user would request.
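A user goal of this four-part form can be sketched as a small data structure. The field names and the particular symptom values below are assumptions for illustration, not taken from the actual dataset.

```python
# Illustrative user goal in the four-part structure described above:
# disease tag, explicit symptoms (self-report), implicit symptoms
# (conversation), and the requested disease slot. All names are assumptions.

user_goal = {
    "disease_tag": "infantile diarrhea",
    "explicit_inform_slots": {"diarrhea": True, "sputum": True},  # from self-report
    "implicit_inform_slots": {"cough": False, "vomiting": True},  # from conversation
    "request_slots": {"disease": "UNK"},                          # what the user asks for
}
```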
  • At the beginning of a dialogue session, the user simulator samples a user goal, while the agent attempts to make a diagnosis for the user. The system will learn to select the best response action at each time step by maximizing a long term reward.
  • At the beginning of each dialogue session, a user simulator samples a user goal from the experiment dataset. At each turn t, the user takes an action au,t according to the current user state su,t and the previous agent action at-1, and transits into the next user state su,t+1. In practice, the user state su is factored into an agenda A (“Agenda-based user simulation for bootstrapping a POMDP dialogue system”, Schatzmann et al., 2007, The Conference of the North American Chapter of the Association for Computational Linguistics) and a goal G, noted as su=(A,G). During the course of the dialogue, the goal G ensures that the user behaves in a consistent, goal-oriented manner. The agenda contains a list of symptoms and their status (whether or not they are requested) to track the progress of the conversation.
  • Every dialogue session is initiated by the user via the user action au,1, which consists of the requested disease slot and all explicit symptoms. In terms of the symptom requested by the agent during the course of the dialogue, the user will take one of three actions: True (if the symptom is positive), False (if the symptom is negative), or not_sure (if the symptom is not mentioned in the user goal). If the agent informs the correct disease, the dialogue session will be terminated as successful by the user. Otherwise, the dialogue session will be recognized as failed if the agent makes an incorrect diagnosis or the dialogue turn reaches the maximum dialogue turn T.
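The user simulator's three-way answer rule above can be sketched directly, assuming a user-goal layout like the one in FIG. 2 (the field names are assumptions):

```python
# Sketch of the user simulator's answer rule for a symptom requested by the
# agent: True if positive, False if negative, not_sure if the symptom does
# not appear in the user goal at all.

def answer_symptom_request(user_goal, symptom):
    known = {**user_goal["explicit_inform_slots"],
             **user_goal["implicit_inform_slots"]}
    if symptom not in known:
        return "not_sure"
    return "True" if known[symptom] else "False"

goal = {"explicit_inform_slots": {"diarrhea": True},
        "implicit_inform_slots": {"cough": False}}
# answer_symptom_request(goal, "diarrhea") == "True"
# answer_symptom_request(goal, "fever")    == "not_sure"
```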
  • In this application, we cast DS as Markov Decision Process (MDP) (“Pomdp-based statistical spoken dialog systems: A review”. Young et al., 2013, Proceedings of the IEEE) and train the dialogue policy via reinforcement learning (“Strategic dialogue management via deep reinforcement learning”, Cuayahuitl et al., 2015, CoRR). An MDP is composed of states, actions, rewards, policy, and transitions.
  • A dialogue state s includes the symptoms requested by the agent and informed by the user till the current time t, the previous action of the user, the previous action of the agent and the turn information. In terms of the representation vector of symptoms, its dimension is equal to the number of all symptoms, and its elements are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms and 0 for not-mentioned symptoms. Each state s ∈ S is the concatenation of these four vectors.
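The symptom representation vector can be sketched as follows; the symptom inventory and status labels are illustrative assumptions (the real system uses the 67 kept symptoms):

```python
# Sketch of the symptom representation vector: 1 for positive, -1 for
# negative, -2 for not-sure, and 0 for not-mentioned symptoms.

ALL_SYMPTOMS = ["cough", "diarrhea", "fever", "vomiting"]  # illustrative subset

def symptom_vector(all_symptoms, status):
    """status maps a symptom to 'positive' / 'negative' / 'not_sure'."""
    code = {"positive": 1, "negative": -1, "not_sure": -2}
    return [code.get(status.get(s), 0) for s in all_symptoms]

vec = symptom_vector(ALL_SYMPTOMS, {"cough": "negative", "diarrhea": "positive"})
# vec == [-1, 1, 0, 0]
```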
  • An action a ∈ A is composed of a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., a normalized symptom or a special slot “disease”). “Thanks” and “close dialogue” are two further actions.
  • The transition from st to st+1 is the updating of state st based on the agent action at, the previous user action au,t−1 and the step time t.
  • The reward rt+1=R(st, at) is the immediate reward at step time t after taking the action at, also known as reinforcement.
  • The policy π describes the behaviors of an agent, which takes the state st as input and outputs the probability distribution over all possible actions π(at|st).
  • In this application, the policy is parameterized with a deep Q-network (DQN) (Mnih et al., “Human-level control through deep reinforcement learning”, Nature, 2015), which takes the state st as input and outputs Q(st, a; θ) for all actions a. A Q-network can be trained by updating the parameters θi at iteration i to reduce the mean squared error between the Q-value computed from the current network Q(s, a|θi) and the Q-value obtained from the Bellman equation

  • yi = r + γ maxa′ Q(s′, a′ | θi⁻)
  • where Q(s′, a′|θi⁻) is the target network with parameters θi⁻ from some previous iteration. In practice, the behavior distribution is often selected by an ε-greedy policy that takes an action a = argmaxa′ Q(st, a′; θ) with probability 1−ε and selects a random action with probability ε, which can improve the efficiency of exploration. When training the policy, experience replay is used. We store the agent's experiences at each time-step, et=(st, at, rt, st+1), in a fixed-size, queue-like buffer D.
  • In a simulation epoch, the current DQN network is updated multiple times (depending on the batch size and the current size of the replay buffer) with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of the current DQN network. At the end of each epoch, the target network is replaced by the current network, and the current network is evaluated on the training set. The buffer will be flushed if the current network performs better than all previous versions.
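The update procedure above can be sketched with a minimal linear Q-network standing in for the “single layer network” used in the experiments. All dimensions, hyper-parameters and the gradient step below are illustrative assumptions, not the exact implementation.

```python
import numpy as np

# Sketch of one DQN update round with a frozen target network, following the
# Bellman target y = r + gamma * max_a' Q(s', a'; theta_minus).

state_dim, n_actions = 8, 4
gamma, lr = 0.9, 0.01

def q_values(theta, s):
    """Linear Q-network: one Q-value per action."""
    return s @ theta

def dqn_step(theta, theta_target, batch):
    """One pass over a batch of (s, a, r, s', done) replay transitions."""
    for s, a, r, s_next, done in batch:
        # Bellman target; the target network theta_target stays fixed here
        y = r if done else r + gamma * q_values(theta_target, s_next).max()
        td_error = y - q_values(theta, s)[a]
        # gradient step on 0.5 * td_error**2 w.r.t. the chosen action's weights
        theta[:, a] += lr * td_error * s
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(state_dim, n_actions))  # current network
theta_target = theta.copy()                                 # frozen target network

# one batch drawn from a (simulated) replay buffer
batch = [(rng.normal(size=state_dim), int(rng.integers(n_actions)),
          1.0, rng.normal(size=state_dim), False) for _ in range(30)]
theta = dqn_step(theta, theta_target, batch)
```

At the end of an epoch the target network would be replaced by the current one (theta_target = theta.copy()), mirroring the procedure described above.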
  • Experiments
  • The max dialogue turn T is 22. A positive reward of +44 is given to the agent at the end of a successful dialogue, and a −22 reward is given to a failed one. We apply a step penalty of −1 for each turn to encourage shorter dialogues. The dataset is divided into two parts: 80% for training with 568 user goals and 20% for testing with 142 user goals. The ε of the ε-greedy strategy is set to 0.1 for effective action space exploration, and the γ in the Bellman equation is 0.9. The size of the buffer D is 10000 and the batch size is 30. The neural network of the DQN is a single-layer network. The learning rate is 0.001. Each simulation epoch consists of 100 dialogue sessions, and the current network is evaluated on 500 dialogue sessions at the end of each epoch. Before training, the buffer is pre-filled with the experiences of the rule-based agent (see below) to warm-start our dialogue system.
  • To evaluate the performance of the proposed framework, our model is compared with baselines in terms of three evaluation metrics, following Li et al. (“End-to-end task completion neural dialogue systems”, 2017, In Proceedings of the Eighth International Joint Conference on Natural Language Processing) and Peng et al. (“Adversarial advantage actor-critic model for task-completion dialogue policy learning”, 2017, https://arxiv.org/abs/1710.11277; “Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning”, 2017, Conference on Empirical Methods in Natural Language Processing), namely, success rate, average reward and the average number of turns per dialogue session. As for the classification models, accuracy is used as the metric.
  • The baselines include:
  • (1) SVM: This model treats the automatic diagnosis as a multi-class classification problem. It takes one-hot representation of symptoms in the user goal as input, and predicts the disease. There are two configurations: one takes both explicit and implicit symptoms as input (denoted as SVM-ex&im), and the other takes only explicit symptoms to predict the disease (denoted as SVM-ex).
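The one-hot symptom input described for the SVM baselines can be sketched as follows. The symptom inventory is an illustrative assumption, and a classifier such as scikit-learn's sklearn.svm.SVC would then be trained on these vectors.

```python
# Sketch of the one-hot symptom representation fed to the SVM baselines:
# SVM-ex encodes only explicit symptoms, SVM-ex&im both explicit and implicit.

ALL_SYMPTOMS = ["cough", "diarrhea", "fever", "vomiting"]  # illustrative subset

def one_hot(symptoms):
    """1.0 where a symptom is present in the user goal, 0.0 elsewhere."""
    return [1.0 if s in symptoms else 0.0 for s in ALL_SYMPTOMS]

x_ex = one_hot({"diarrhea"})                   # explicit only (SVM-ex)
x_ex_im = one_hot({"diarrhea", "vomiting"})    # explicit + implicit (SVM-ex&im)
# x_ex == [0.0, 1.0, 0.0, 0.0]; x_ex_im == [0.0, 1.0, 0.0, 1.0]
```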
  • (2) Random Agent: At each turn, the random agent takes an action randomly from the action space as the response to the user's action.
  • (3) Rule-based Agent: The rule-based agent takes an action based on handcrafted rules. Conditioned on the current dialogue state st, the agent will inform the disease if all the known related symptoms are detected. If no disease can be identified, the agent will randomly select one of the remaining symptoms to request. The relations between diseases and symptoms are extracted from the annotated corpus in advance. In this work, only the first T/2.5 (2.5 is a hyper-parameter) symptoms with high frequency are kept for each disease, so that the rule-based agent can inform a disease within the max dialogue turn T.
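The rule-based baseline can be sketched as below: inform a disease once all of its kept high-frequency symptoms are confirmed positive, otherwise ask about one of the remaining symptoms at random. The disease-symptom map here is an illustrative assumption, not the one extracted from the annotated corpus.

```python
import random

# Sketch of the handcrafted rule-based agent described above.

DISEASE_SYMPTOMS = {
    "infantile diarrhea": {"diarrhea", "vomiting"},
    "children's bronchitis": {"cough", "sputum"},
}

def rule_agent_action(positive_symptoms, requested):
    for disease, symptoms in DISEASE_SYMPTOMS.items():
        if symptoms <= positive_symptoms:   # all related symptoms detected
            return ("inform", disease)
    remaining = [s for symptoms in DISEASE_SYMPTOMS.values() for s in symptoms
                 if s not in positive_symptoms and s not in requested]
    return ("request", random.choice(remaining))

# rule_agent_action({"diarrhea", "vomiting"}, set())
# == ("inform", "infantile diarrhea")
```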
  • Table 4 shows the accuracy of two SVM-based models.
  • TABLE 4
    Disease SVM-ex&im SVM-ex
    Infantile diarrhea 0.91 0.89
    Children functional dyspepsia 0.34 0.28
    Upper respiratory infection 0.52 0.44
    Children's bronchitis 0.93 0.71
    Overall 0.71 0.59
  • The result shows that the implicit symptoms can greatly improve the accuracy of disease identification for all the four diseases, which demonstrates the contribution of implicit symptoms when making diagnosis for patients.
  • FIG. 3 shows the learning curve of all the three dialogue systems and Table 5 shows the performance of these agents on testing set, wherein performance of the three dialogue systems on 5K simulated dialogues is shown.
  • TABLE 5
    Model Success Reward Turn
    Random Agent 0.06 −24.36 17.51
    Rule Agent 0.23 −13.78 17.00
    DQN Agent 0.65 20.51 5.11
  • Due to the large action space, the random agent performs badly. The rule-based agent outperforms the random agent by a large margin, which indicates that the rule-based agent is well designed. It can also be seen that the RL-based DQN agent significantly outperforms the rule-based agent. Moreover, the DQN agent outperforms SVM-ex by collecting additional implicit symptoms via conversing with patients. However, there is still a gap between the performance of the DQN agent and SVM-ex&im in terms of accuracy, which indicates that there is still room for improvement of the dialogue system.
  • The experimental results show that the dialogue system of this invention is able to collect additional symptoms via conversation with patients and to improve the accuracy of automatic diagnosis. Hence, it fills the gap of applying DS in disease identification.
  • It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that the invention disclosed herein is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims (20)

1. A task-oriented dialogue system, comprising:
a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user goal includes
a disease tag tagging the disease from which the user suffers;
explicit symptoms extracted from the user self-report data;
implicit symptoms extracted from the user-doctor conversational data; and
disease slots that the user requests; and
wherein the user simulator module samples the user goal.
2. The system of claim 1, wherein the user simulator module includes a dialogue state tracker for tracking the dialogue state.
3. The system of claim 2, wherein the user simulator module iteratively takes a user action according to a current user state and a previous agent action, and transits into the next user state; wherein the agent action includes “inform” action and “request” action, while the user action includes “deny” action, “confirm” action and “not-sure” action.
4. The system of claim 1, wherein every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
5. The system of claim 4, wherein during the course of the dialogue session, in terms of the symptom requested by the agent simulator, the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
6. The system of claim 4, wherein the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
7. The system of claim 4, wherein the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the number of dialogue turns reaches the maximum dialogue turn.
8. The system of claim 6, wherein the dialogue session will be terminated by the user simulator when recognized as successful.
9. The system of claim 1, wherein the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
10. The system of claim 9, wherein the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
11. The system of claim 1, wherein the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
12. The system of claim 9, wherein the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
13. The system of claim 9, wherein the reward is an immediate reward at step time t after taking the current agent action.
14. The system of claim 9, wherein the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
15. The system of claim 14, wherein the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q-values for all agent actions.
16. The system of claim 15, wherein the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
17. The system of claim 16, wherein the Bellman equation is parameterized as

y_i = r + γ max_a′ Q(s′, a′ | θ_i⁻)

and wherein Q(s′, a′ | θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
18. The system of claim 1, wherein the current DQN network is updated multiple times with different batches drawn randomly from the replay buffer, while the target DQN network is fixed during the updating of the current DQN network; and the number of times the current DQN network is updated depends on the batch size and the current size of the replay buffer.
19. A task-oriented dialogue system, comprising:
a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user simulator module samples the user goal.
20. A task-oriented dialogue method, comprising:
providing a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
providing a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
providing an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
providing a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user simulator module samples the user goal.
US17/508,655 2020-10-22 2021-10-22 Task-oriented Dialogue System for Automatic Disease Diagnosis Abandoned US20220130545A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011136008.2A CN112420189A (en) 2020-10-22 2020-10-22 Hierarchical disease diagnosis system, disease diagnosis method, device and apparatus
CN202011135075.2 2020-10-22
CN202011136008.2 2020-10-22
CN202011135075.2A CN112349409A (en) 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system

Publications (1)

Publication Number Publication Date
US20220130545A1 true US20220130545A1 (en) 2022-04-28

Family

ID=81257482

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/508,655 Abandoned US20220130545A1 (en) 2020-10-22 2021-10-22 Task-oriented Dialogue System for Automatic Disease Diagnosis
US17/508,675 Active US11562829B2 (en) 2020-10-22 2021-10-22 Task-oriented dialogue system with hierarchical reinforcement learning

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/508,675 Active US11562829B2 (en) 2020-10-22 2021-10-22 Task-oriented dialogue system with hierarchical reinforcement learning

Country Status (1)

Country Link
US (2) US20220130545A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12423383B2 (en) * 2021-01-06 2025-09-23 Electronics And Telecommunications Research Institute Method for exploration based on curiosity and prioritization of experience data in multi-agent reinforcement learning
JP7647359B2 (en) * 2021-06-08 2025-03-18 トヨタ自動車株式会社 Multi-agent simulation system and multi-agent simulation method
JP7491268B2 (en) * 2021-06-08 2024-05-28 トヨタ自動車株式会社 Multi-agent Simulation System

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729710B (en) * 2016-08-11 2021-04-13 宏达国际电子股份有限公司 Medical systems and non-transitory computer readable media
US20180165602A1 (en) * 2016-12-14 2018-06-14 Microsoft Technology Licensing, Llc Scalability of reinforcement learning by separation of concerns
EP3543914A1 (en) 2018-03-22 2019-09-25 Koninklijke Philips N.V. Techniques for improving turn-based automated counseling to alter behavior
CN110504026B (en) * 2018-05-18 2022-07-26 宏达国际电子股份有限公司 Control method and medical system
CN111951943B (en) * 2020-09-27 2021-01-05 平安科技(深圳)有限公司 Intelligent triage method and device, electronic equipment and storage medium
US20220107628A1 (en) * 2020-10-07 2022-04-07 The Boeing Company Systems and methods for distributed hierarchical control in multi-agent adversarial environments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lin Xu, et al., "End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis", The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), pg. 7346-53 ©2019 (Year: 2019) *

Also Published As

Publication number Publication date
US20220130546A1 (en) 2022-04-28
US11562829B2 (en) 2023-01-24


Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION