
US20220130545A1 - Task-oriented Dialogue System for Automatic Disease Diagnosis - Google Patents


Info

Publication number
US20220130545A1
Authority
US
United States
Prior art keywords
user
dialogue
symptoms
simulator
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US17/508,655
Inventor
Zhongyu Wei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011136008.2A (publication CN112420189A)
Priority claimed from CN202011135075.2A (publication CN112349409A)
Application filed by Individual
Publication of US20220130545A1

Classifications

    • G16H 50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining, for computer-aided diagnosis, e.g. based on medical expert systems
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B 5/7275: Determining trends in physiological measurement data; predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • G06N 20/00: Machine learning
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/0499: Feedforward networks
    • G06N 3/08: Learning methods
    • G06N 3/092: Reinforcement learning
    • G16H 10/20: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for electronic clinical trials or questionnaires
    • G16H 10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data, for patient-specific data, e.g. for electronic patient records
    • G16H 40/20: ICT specially adapted for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • This invention is related to a system for automatic disease diagnosis, and more particularly, to a task-oriented dialogue system for automatic disease diagnosis. This invention is also related to a method thereof.
  • each EHR contains multiple types of data, including personal information, admission notes, diagnostic tests, vital signs and medical images. It is collected cumulatively following a clinical diagnostic procedure, which involves interactions between patients and doctors and some complicated medical tests. Therefore, it is very expensive to collect EHRs for different diseases, and how to collect information from patients automatically remains a challenge for automatic diagnosis.
  • a task-oriented dialogue system including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal
  • an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient
  • a dialogue policy learning module for training the dialogue policy via reinforcement learning
  • the user simulator module samples the user goal.
  • the user goal includes,
  • the user simulator module includes a dialogue state tracker for tracking the dialogue state.
  • the user simulator module iteratively takes a user action according to the current user state and the previous agent action, and transits into the next user state; wherein the agent actions include "inform" and "request" actions, while the user actions include "deny", "confirm" and "not-sure" actions.
  • the user state includes an agenda and a goal, wherein the agenda contains a list of symptoms and symptoms status, and the agenda tracks the progress of the dialogue, and the user goal ensures that the user simulator module behaves in a consistent, goal-oriented manner.
  • the symptoms status indicates whether or not a symptom has been requested.
  • every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
  • the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
  • the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
  • the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the dialogue reaches the maximum dialogue turn.
  • the dialogue session will be terminated by the user simulator when recognized as successful.
  • the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
  • the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
  • the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
  • the actions each including a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., normalized symptoms or a special “disease” slot).
  • the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
  • the reward is an immediate reward at step time t after taking the current agent action.
  • the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
  • the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q for all agent actions.
  • the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
  • the Bellman equation is parameterized as y_i = r + γ max_{a′} Q(s′, a′; θ_i⁻), where Q(s′, a′; θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
  • the current DQN network is updated multiple times with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
  • the number of times the current DQN network is updated depends on the batch size and the current size of the replay buffer.
  • the target network is then replaced by the current network and the current network is evaluated on the training set.
  • the buffer will be flushed if the current network performs better than all previous versions of the network.
  • a task-oriented dialogue method including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal
  • the user simulator module samples the user goal.
  • FIG. 1 illustratively shows an example utterance with annotations of symptoms in BIO format used in one embodiment of this application;
  • FIG. 2 illustratively shows an example of user goal used in one embodiment of this application.
  • FIG. 3 illustratively shows the learning curve of all the three dialogue systems used in the experiment of this application.
  • Embodiments of the subject matter and the functional operations described in this specification optionally can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this specification can, for example, be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • the computer readable medium can be a machine readable tangible storage device, a machine readable tangible storage substrate, a tangible memory device, or a combination of one or more of them.
  • data processing apparatus encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
  • the apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
  • a computer program (also known as a program, software, software application, script, or code), can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • a computer program does not necessarily correspond to a file in a file system.
  • a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
  • a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • the processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read only memory or a random access memory or both.
  • the essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
  • a computer need not have such devices.
  • a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few.
  • Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD ROM disks.
  • the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
  • Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • the computing system can include clients and servers.
  • a client and server are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by virtue of computer programs running on the respective computers and having a client server relationship to each other.
  • the dataset is collected from the pediatric department in a Chinese online healthcare community (http://muzhi.baidu.com). It is a popular website for users to inquire with doctors online. Usually, a patient would provide a piece of self-report presenting his/her basic conditions. Then a doctor will initialize a conversation to collect more information and make a diagnosis based on both the self-report and the conversational data.
  • An example is shown in Table 1. Please note that in this application some data was collected in Chinese, so in the tables and figures English translations have been added to the Chinese data.
  • the doctor can obtain additional symptoms during conversation beyond the self-report.
  • the final diagnosis from doctors can also be obtained as the label.
  • symptoms from self-reports are termed as explicit symptoms while those from conversational data as implicit symptoms.
  • FIG. 1 shows an example utterance with annotations of symptoms in BIO (begin-in-out) format.
  • Each Chinese character is assigned a label of "B", "I" or "O".
  • each extracted symptom expression is tagged with “True” or “False” indicating whether the patient suffers from this symptom or not.
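The per-character BIO labeling described above can be decoded back into symptom spans. The following is an illustrative sketch (function name and the toy English utterance are hypothetical; the dataset itself uses Chinese characters):

```python
def bio_to_spans(chars, tags):
    """Decode per-character BIO tags into (start, end, text) symptom spans.

    "B" opens a new span, "I" extends the current one, "O" closes it.
    """
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i, "".join(chars[start:i])))
            start = i
        elif tag == "I":
            if start is None:  # tolerate a stray "I" by opening a span
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i, "".join(chars[start:i])))
                start = None
    if start is not None:
        spans.append((start, len(tags), "".join(chars[start:])))
    return spans

# Toy per-character example: the span "cough" is tagged B I I I I.
chars = list("has cough")
tags = ["O", "O", "O", "O", "B", "I", "I", "I", "I"]
print(bio_to_spans(chars, tags))  # [(4, 9, 'cough')]
```

Each decoded span would then be paired with its "True"/"False" tag indicating whether the patient suffers from that symptom.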
  • Each record is annotated by at least two annotators. Any inconsistency would be further judged by the third one.
  • the Cohen's kappa coefficients between the two annotators are 71% and 67% for self-reports and conversations respectively.
  • Each user goal (see an example in FIG. 2 below) is derived from one real-world patient record (www.sdspeople.fudan.edu.cn/zywei/data/acl2018-mds.zip).
  • a task-oriented DS typically contains three components, namely Natural Language Understanding (NLU), Dialogue Manager (DM) and Natural Language Generation (NLG).
  • NLU detects the user intent and slots with values from utterances; DM tracks the dialogue states and takes system actions; NLG generates natural language given the system actions.
  • a user simulator is designed to interact with the dialogue system.
  • This application follows the same setting as Li et al. ("End-to-end task completion neural dialogue systems", 2017, Proceedings of the Eighth International Joint Conference on Natural Language Processing) to design its medical DS.
  • FIG. 2 shows an example of user goal.
  • Each user goal consists of four parts: the disease tag is the disease that the user suffers from; explicit symptoms are symptoms extracted from the user self-report; implicit symptoms are symptoms extracted from the conversational data between the patient and the doctor; the request slot is the disease slot that the user would request.
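As an illustrative sketch, the four-part user goal could be represented as a plain dictionary. The field names and symptom values below are hypothetical examples, not taken from the actual dataset:

```python
# Illustrative user goal mirroring the four parts described above.
# Disease name, symptom names and field names are hypothetical.
user_goal = {
    "disease_tag": "children's bronchitis",
    "explicit_inform_slots": {"cough": True, "runny nose": True},  # from self-report
    "implicit_inform_slots": {"fever": True, "vomiting": False},   # from conversation
    "request_slots": {"disease": "UNK"},  # the slot the user asks the agent to fill
}

assert set(user_goal) == {
    "disease_tag", "explicit_inform_slots", "implicit_inform_slots", "request_slots"
}
```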
  • the user simulator samples a user goal, while the agent attempts to make a diagnosis for the user.
  • the system will learn to select the best response action at each time step by maximizing a long term reward.
  • a user simulator samples a user goal from the experiment dataset.
  • the user takes an action a_u,t according to the current user state s_u,t and the previous agent action a_t−1, and transits into the next user state s_u,t+1.
  • the goal G ensures that the user behaves in a consistent, goal-oriented manner.
  • the agenda contains a list of symptoms and their status (whether or not they are requested) to track the progress of the conversation.
  • Every dialogue session is initiated by the user via the user action au,1 which consists of the requested disease slot and all explicit symptoms.
  • the user will take one of the three actions including True (if the symptom is positive), False (if the symptom is negative), and not_sure (if the symptom is not mentioned in the user goal).
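The three-way answering rule above can be sketched as a small function; the symptom names in the example goal are hypothetical:

```python
def answer(symptom, goal_symptoms):
    """User simulator's reply to a requested symptom.

    goal_symptoms maps symptom name -> True/False as given in the user goal;
    symptoms absent from the goal get a "not_sure" reply.
    """
    if symptom not in goal_symptoms:
        return "not_sure"
    return "True" if goal_symptoms[symptom] else "False"

goal_symptoms = {"cough": True, "vomiting": False}  # hypothetical goal
print(answer("cough", goal_symptoms))     # True
print(answer("vomiting", goal_symptoms))  # False
print(answer("headache", goal_symptoms))  # not_sure
```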
  • If the agent informs the correct disease, the dialogue session will be terminated as successful by the user. Otherwise, the dialogue session will be recognized as failed if the agent makes an incorrect diagnosis or the dialogue reaches the maximum dialogue turn T.
  • In this framework, the diagnosis process is formulated as a Markov Decision Process (MDP).
  • a dialogue state s includes the symptoms requested by the agent and informed by the user up to the current time t, the previous action of the user, the previous action of the agent and the turn information.
  • its dimension is equal to the number of all symptoms; the elements are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms and 0 for not-mentioned symptoms.
  • Each state s ⁇ S is the concatenation of these four vectors.
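The symptoms-vector part of the state can be encoded as follows. This is a minimal sketch: the symptom inventory is hypothetical, and in the full system this vector would be concatenated with the previous user action, previous agent action and turn information:

```python
# Encode informed symptoms into the fixed-length vector described above:
# 1 = positive, -1 = negative, -2 = not-sure, 0 = not mentioned.
ALL_SYMPTOMS = ["cough", "fever", "runny nose", "vomiting"]  # hypothetical inventory

def encode_symptoms(informed):
    """informed maps symptom -> True / False / "not_sure"."""
    code = {True: 1, False: -1, "not_sure": -2}
    return [code[informed[s]] if s in informed else 0 for s in ALL_SYMPTOMS]

vec = encode_symptoms({"cough": True, "fever": False, "vomiting": "not_sure"})
print(vec)  # [1, -1, 0, -2]
```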
  • An action a ⁇ A is composed of a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., normalized symptoms or a special slot “disease”).
  • “thanks” and “close dialogue” are also two actions.
  • the transition from s_t to s_t+1 is the updating of state s_t based on the agent action a_t, the previous user action a_u,t−1 and the step time t.
  • the policy π describes the behaviors of an agent; it takes the state s_t as input and outputs the probability distribution π(a_t|s_t) over all possible actions.
  • the policy is parameterized with a deep Q-network (DQN) (Mnih et al., "Human-level control through deep reinforcement learning", Nature, 2015), which takes the state s_t as input and outputs Q(s_t, a; θ) for all actions a.
  • a Q-network can be trained by updating the parameters θ_i at iteration i to reduce the mean squared error between the Q-value computed from the current network Q(s, a; θ_i) and the Q-value obtained from the Bellman equation y_i = r + γ max_{a′} Q(s′, a′; θ_i⁻), where Q(s′, a′; θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
  • the current DQN network is updated multiple times (depending on the batch size and the current size of replay buffer) with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
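The update just described can be sketched with a tiny linear Q-function standing in for the single-layer network. This is a hypothetical illustration: dimensions, data and function names are invented, and only the structure of the Bellman-target update matches the text (γ = 0.9 and learning rate 0.001 as stated below):

```python
import numpy as np

STATE_DIM, N_ACTIONS = 6, 4      # hypothetical dimensions
GAMMA, LR = 0.9, 0.001           # discount factor and learning rate from the text

rng = np.random.default_rng(0)
theta = rng.normal(size=(STATE_DIM, N_ACTIONS))  # current network parameters
theta_target = theta.copy()                      # frozen target network

def q_values(params, s):
    """Q(s, .; params) for all actions, shape (N_ACTIONS,)."""
    return s @ params

def dqn_step(theta, batch):
    """One MSE gradient step toward the Bellman targets
    y = r + gamma * max_a' Q(s', a'; theta_target)."""
    for s, a, r, s_next, done in batch:
        y = r if done else r + GAMMA * q_values(theta_target, s_next).max()
        td = q_values(theta, s)[a] - y
        theta[:, a] -= LR * 2.0 * td * s  # gradient of (Q(s, a) - y)^2 w.r.t. theta[:, a]
    return theta

# A batch of (s, a, r, s', done) tuples as drawn randomly from the replay buffer:
s = rng.normal(size=STATE_DIM)
theta = dqn_step(theta, [(s, 2, 1.0, s, False)])
```

In the described procedure this step is repeated over many random batches while theta_target stays fixed, and theta_target is periodically replaced by a copy of theta.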
  • the target network is replaced by the current network and the current network is evaluated on the training set. The buffer will be flushed if the current network performs better than all previous versions.
  • the max dialogue turn T is 22.
  • a positive reward of +44 is given to the agent at the end of a successful dialogue, and a reward of −22 is given to a failed one.
  • the dataset is divided into two parts: 80% for training with 568 user goals and 20% for testing with 142 user goals.
  • the ε of the ε-greedy strategy is set to 0.1 for effective exploration of the action space, and the discount factor γ in the Bellman equation is 0.9.
  • the size of buffer D is 10000 and the batch size is 30.
  • the neural network of DQN is a single layer network.
  • the learning rate is 0.001.
  • Each simulation epoch consists of 100 dialogue sessions and the current network is evaluated on 500 dialogue sessions at the end of each epoch. Before training, the buffer is pre-filled with the experiences of the rule-based agent (see below) to warm start our dialogue system.
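The training settings above can be collected into a configuration, together with a sketch of the ε-greedy action selection. This is an illustrative summary, not the actual implementation; the dictionary keys are invented names:

```python
import random

# Hyper-parameters as stated in the text (key names are hypothetical).
CONFIG = {
    "max_turn": 22,
    "reward_success": +44,
    "reward_failure": -22,
    "epsilon": 0.1,
    "gamma": 0.9,
    "buffer_size": 10000,
    "batch_size": 30,
    "learning_rate": 0.001,
    "sessions_per_epoch": 100,
    "eval_sessions": 500,
}

def epsilon_greedy(q_values, epsilon=CONFIG["epsilon"], rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

random.seed(0)
print(epsilon_greedy([0.1, 0.9, 0.3]))  # 1 (the greedy action, with this seed)
```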
  • the baselines include:
  • SVM This model treats the automatic diagnosis as a multi-class classification problem. It takes one-hot representation of symptoms in the user goal as input, and predicts the disease. There are two configurations: one takes both explicit and implicit symptoms as input (denoted as SVM-ex&im), and the other takes only explicit symptoms to predict the disease (denoted as SVM-ex).
  • Random Agent At each turn, the random agent takes an action randomly from the action space as the response to the user's action.
  • Rule-based Agent takes an action based on handcrafted rules. Conditioned on the current dialogue state s_t, the agent will inform a disease if all the known related symptoms are detected. If no disease can be identified, the agent will select one of the remaining symptoms at random to request. The relations between diseases and symptoms are extracted from the annotated corpus in advance. In this work, only the first T/2.5 (2.5 is a hyper-parameter) symptoms with the highest frequency are kept for each disease so that the rule-based agent can inform a disease within the maximum dialogue turn T.
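A minimal sketch of such a rule-based agent is shown below. The disease-to-symptom table is a made-up stand-in for the relations extracted from the annotated corpus:

```python
import random

# Hypothetical disease -> frequent-symptom table (stands in for the corpus-derived one).
DISEASE_SYMPTOMS = {
    "bronchitis": {"cough", "fever"},
    "diarrhea": {"vomiting", "loose stool"},
}

def rule_based_action(known_positive, asked):
    """Inform a disease once all of its kept symptoms are known positive;
    otherwise request one of the remaining symptoms at random."""
    for disease, symptoms in DISEASE_SYMPTOMS.items():
        if symptoms <= known_positive:
            return ("inform", disease)
    remaining = [s for d in DISEASE_SYMPTOMS.values() for s in d
                 if s not in known_positive and s not in asked]
    if remaining:
        return ("request", random.choice(remaining))
    return ("inform", "unknown")  # fall back when nothing is left to ask

print(rule_based_action({"cough", "fever"}, set()))  # ('inform', 'bronchitis')
```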
  • Table 4 shows the accuracy of two SVM-based models.
  • FIG. 3 shows the learning curve of all the three dialogue systems and Table 5 shows the performance of these agents on testing set, wherein performance of the three dialogue systems on 5K simulated dialogues is shown.
  • The DQN agent outperforms SVM-ex by collecting additional implicit symptoms via conversing with patients.
  • However, there is still a gap between the performance of the DQN agent and SVM-ex&im in terms of accuracy, which indicates that there is still room for improvement of the dialogue system.
  • The experiment results show that the dialogue system of this invention is able to collect additional symptoms via conversation with patients and improve the accuracy of automatic diagnosis. Hence, it fills the gap of applying DS in disease identification.

Abstract

A task-oriented dialogue system, and methods therefor, including a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data; a symptoms normalization module for normalizing the extracted symptoms and generating a user goal; an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and a dialogue policy learning module for training the dialogue policy via reinforcement learning; wherein the user simulator module samples the user goal.

Description

  • The present patent application claims priority to China application No. 202011135075.2, filed on Oct. 22, 2020, and to China application No. 202011136008.2, filed on Oct. 22, 2020. The entire content of each application is incorporated herein by reference.
  • TECHNICAL FIELD
  • This invention is related to a system for automatic disease diagnosis, and more particularly, to a task-oriented dialogue system for automatic disease diagnosis. This invention is also related to a method thereof.
  • BACKGROUND
  • Automatic phenotype identification using electronic health records (EHRs) has been a rising topic in recent years ("A review of approaches to identifying patient phenotype cohorts using electronic health records", Shivade et al., 2013, Journal of the American Medical Informatics Association). Researchers have explored various machine learning approaches to identify symptoms and diseases for patients given multiple types of information (both numerical data and pure text). Experimental results prove the effectiveness of identification for heart failure, type 2 diabetes, autism spectrum disorders, infection detection, etc. Currently, most attempts focus on specific types of diseases, and it is difficult to transfer models from one disease to another.
  • In general, each EHR contains multiple types of data, including personal information, admission notes, diagnostic tests, vital signs and medical images. It is collected cumulatively following a clinical diagnostic procedure, which involves interactions between patients and doctors and some complicated medical tests. Therefore, it is very expensive to collect EHRs for different diseases, and how to collect information from patients automatically remains a challenge for automatic diagnosis.
  • In 2003, Milward et al. proposed an ontology-based dialogue system that supports electronic referrals for breast cancer ("Ontology-based dialogue systems", in Proc. 3rd Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI 03)), which can deal with the informative responses of users based on medical domain ontologies. In addition, Tang et al. and Kao et al. proposed two works in which deep reinforcement learning is applied for automatic diagnosis ("Inquire and diagnose: Neural symptom checking ensemble using deep reinforcement learning", 2016, in Proceedings of NIPS Workshop on Deep Reinforcement Learning; and "Context-aware symptom checking for disease diagnosis using hierarchical reinforcement learning", 2018). However, their models need extra human resources to categorize the diseases into different groups, and the data used is simulated and cannot reflect the situation of real patients.
  • Recently, due to its promising potential and alluring commercial value, research on task-oriented dialogue systems (DS) has attracted increasing attention in different domains, including ticket booking, online shopping and restaurant searching. It is believed that applying DS in the medical domain has great potential to reduce the cost of collecting data from patients.
  • However, a gap remains in applying DS to disease identification, with two major challenges. First, there is a lack of annotated medical dialogue datasets. Second, no DS framework is available for disease identification.
  • Therefore, there is a need to provide a novel task-oriented dialogue system addressing the above problems.
  • SUMMARY
  • In this application, to address the above problems, we make the first move to build a dialogue system facilitating automatic information collection and diagnosis making for the medical domain. A reinforcement learning based framework for the medical dialogue system is proposed. To accompany this framework, a first medical dataset for dialogue systems, which includes both patient self-report data and patient-doctor conversational data, has been built. The experiment results on this dataset show that the dialogue system of this application is able to collect symptoms from patients via conversation and improve the accuracy of automatic diagnosis.
  • In one aspect of this invention, it is provided a task-oriented dialogue system, including
  • a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
  • a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
  • an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
  • a dialogue policy learning module for training the dialogue policy via reinforcement learning;
  • wherein the user simulator module samples the user goal.
  • Preferably, the user goal includes,
  • a disease tag tagging the disease that the user suffers from;
  • explicit symptoms extracted from the user self-report data;
  • implicit symptoms extracted from the user-doctor conversational data; and
  • disease slots that the user requests.
  • Preferably, the user simulator module includes a dialogue state tracker for tracking the dialogue state.
  • Preferably, the user simulator module iteratively takes a user action according to a current user state and a previous agent action, and transits into the next user state; wherein the agent action includes “inform” action and “request” action, while the user action includes “deny” action, “confirm” action and “not-sure” action.
  • Preferably, the user state includes an agenda and a goal, wherein the agenda contains a list of symptoms and symptoms status, and the agenda tracks the progress of the dialogue, and the user goal ensures that the user simulator module behaves in a consistent, goal-oriented manner.
  • Preferably, the symptom status indicates whether or not the symptom has been requested.
  • Preferably, every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
  • Preferably, during the course of the dialogue session, in terms of the symptom requested by the agent simulator, the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
  • Preferably, the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
  • Preferably, the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the dialogue turn reaches the maximum dialogue turn.
  • Preferably, the dialogue session will be terminated by the user simulator when recognized as successful.
  • Preferably, the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
  • Preferably, the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
  • Preferably, the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
  • Preferably, each of the actions includes a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., a normalized symptom or a special “disease” slot).
  • Preferably, the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
  • Preferably, the reward is an immediate reward at step time t after taking the current agent action.
  • Preferably, the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
  • Preferably, the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q for all agent actions.
  • Preferably, the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
  • Preferably, the Bellman equation is parameterized as

  • yi = r + γ maxa′ Q(s′, a′ | θi⁻)

  • wherein Q(s′, a′ | θi⁻) is the target network with parameters θi⁻ from some previous iteration.
  • Preferably, the current DQN network is updated multiple times with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of current DQN network.
  • Preferably, the number of times that the current DQN network is updated depends on the batch size and the current size of the replay buffer.
  • Preferably, at the end of each epoch, the target network is replaced by the current network and the current network is evaluated on training set.
  • Preferably, the buffer will be flushed if the current network performs better than all previous versions of the network.
  • In another aspect of this invention, it is provided a task-oriented dialogue method, including
  • providing a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
  • providing a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
  • providing an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
  • providing a dialogue policy learning module for training the dialogue policy via reinforcement learning;
  • wherein the user simulator module samples the user goal.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The foregoing summary, as well as the following detailed description, will be better understood when read in conjunction with the appended drawings. For the purpose of illustration, there is shown in the drawings certain embodiments of the present disclosure. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown. The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an implementation of systems and apparatuses consistent with the present invention and, together with the description, serve to explain advantages and principles consistent with the invention.
  • Wherein:
  • FIG. 1 illustratively shows an example utterance with annotations of symptoms in BIO format used in one embodiment of this application;
  • FIG. 2 illustratively shows an example of user goal used in one embodiment of this application; and
  • FIG. 3 illustratively shows the learning curve of all the three dialogue systems used in the experiment of this application.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. The Figures and written description are provided to teach any person skilled in the art to make and use the inventions for which patent protection is sought. The invention is capable of other embodiments and of being practiced and carried out in various ways. Those skilled in the art will appreciate that not all features of a commercial embodiment are shown for the sake of clarity and understanding. Persons of skill in the art will also appreciate that the development of an actual commercial embodiment incorporating aspects of the present inventions will require numerous implementation-specific decisions to achieve the developer's ultimate goal for the commercial embodiment. While these efforts may be complex and time-consuming, these efforts nevertheless would be a routine undertaking for those of skill in the art having the benefit of this disclosure.
  • In addition, it is to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. For example, the use of a singular term, such as, “a” is not intended as limiting of the number of items. Also the use of relational terms, such as but not limited to, “top,” “bottom,” “left,” “right,” “upper,” “lower,” “down,” “up,” “side,” are used in the description for clarity in specific reference to the Figures and are not intended to limit the scope of the invention or the appended claims. Further, it should be understood that any one of the features of the invention may be used separately or in combination with other features. Other systems, methods, features, and advantages of the invention will be or become apparent to one with skill in the art upon examination of the Figures and the detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
  • Embodiments of the subject matter and the functional operations described in this specification optionally can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can, for example, be implemented as one or more computer program products, e.g., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus.
  • The computer readable medium can be a machine readable tangible storage device, a machine readable tangible storage substrate, a tangible memory device, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program (also known as a program, software, software application, script, or code), can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
  • The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio player, a Global Positioning System (GPS) receiver, to name just a few. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media, and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
  • To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
  • Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
  • The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client server relationship to each other.
  • In this application, the dataset is collected from the pediatric department in a Chinese online healthcare community (http://muzhi.baidu.com). It is a popular website for users to inquire with doctors online. Usually, a patient provides a piece of self-report presenting his/her basic conditions. Then a doctor initializes a conversation to collect more information and makes a diagnosis based on both the self-report and the conversational data. An example is shown in Table 1. Please note that in this application some data is collected in Chinese, so English translations of the Chinese data have been added in the tables and figures.
  • TABLE 1
    Self-report
    The little baby has sputum in the throat and watery diarrhea.
    What kind of medicine should be taken?
    Conversation
    . . .
    Doctor: Does the baby have a cough or diarrhea now?
    Patient: No cough, but diarrhea.
    Doctor: Does the baby choke on milk?
    Patient: He vomits milk sometimes.
    . . .
    (Chinese originals appear as image placeholders in the source; English translations shown.)
  • As can be seen, the doctor can obtain additional symptoms during conversation beyond the self-report. For each patient, the final diagnosis from doctors can also be obtained as the label. For clarity, symptoms from self-reports are termed explicit symptoms, while those from conversational data are termed implicit symptoms.
  • Four types of diseases are chosen for annotation, including upper respiratory infection, children functional dyspepsia, infantile diarrhea and children's bronchitis. Three annotators (one with medical background) are invited to label all the symptom phrases in both self-reports and conversational data. The annotation is performed in two steps, namely symptom extraction and symptom normalization.
  • The BIO (begin-in-out) schema is followed for symptom identification.
  • FIG. 1 shows an example utterance with annotations of symptoms in BIO format. Each Chinese character is assigned a label of “B”, “I” or “O”. Also, each extracted symptom expression is tagged with “True” or “False”, indicating whether or not the patient suffers from this symptom. In order to improve the annotation agreement between annotators, two guidelines are created for the self-report and the conversational data respectively. Each record is annotated by at least two annotators, and any inconsistency is further judged by the third one. The Cohen's kappa coefficients between two annotators are 71% and 67% for self-reports and conversations respectively.
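The character-level BIO labeling described above can be sketched as follows. The utterance, the symptom span and the helper function are illustrative assumptions, not an actual record from the annotated dataset.

```python
# Sketch of BIO labeling over an utterance; each Chinese character is one
# token. Tokens inside [span_start, span_end) form one symptom expression.

def bio_tags(tokens, span_start, span_end):
    """Label the span's first token "B", its rest "I", everything else "O"."""
    tags = []
    for i in range(len(tokens)):
        if i == span_start:
            tags.append("B")
        elif span_start < i < span_end:
            tags.append("I")
        else:
            tags.append("O")
    return tags

tokens = list("宝宝咳嗽厉害")   # "the baby coughs badly" (illustrative)
tags = bio_tags(tokens, 2, 4)   # "咳嗽" (cough) is the symptom span
# tags == ["O", "O", "B", "I", "O", "O"]
```

In the dataset, the per-character tags would additionally carry the “True”/“False” suffering flag attached to each extracted expression.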
  • After symptom expression identification, medical experts manually link each symptom expression to the most relevant concept on SNOMED CT (https://www.snomed.org/snomed-ct) for normalization. Table 2 shows some phrases that describe symptoms in the example and some related concepts in SNOMED CT.
  • TABLE 2
    Extracted symptom expression             Related concept in SNOMED CT
    (cough)                                  (cough)
    (sneeze)                                 (sneezing)
    (snot)                                   (snot)
    (have loose bowels)                      (diarrhea)
    (body temperature between 37.5-37.7)     (low-grade fever)
    (Chinese expressions appear as image placeholders in the source; English glosses shown.)
  • An overview of the dataset is presented in Table 3, wherein # of user goal is the number of dialogue sessions for each disease, and Ave # of explicit symptoms and Ave # of implicit symptoms are the average numbers of explicit and implicit symptoms among user goals, respectively.
  • TABLE 3
                                      # of       Ave # of      Ave # of
    Disease                           user goal  explicit      implicit
                                                 symptoms      symptoms
    infantile diarrhea                200        1.15          2.71
    children functional dyspepsia     150        1.70          3.20
    upper respiratory infection       160        2.56          3.55
    children's bronchitis             200        2.87          3.64
  • After symptom extraction and normalization, there are 144 unique symptoms identified. In order to reduce the size of the action space of the DS, only the 67 symptoms with a frequency greater than or equal to 10 are kept. Samples, called “user goals”, are then generated. Each user goal (see an example in FIG. 2 below) is derived from one real-world patient record (www.sdspeople.fudan.edu.cn/zywei/data/ac12018-mds.zip).
  • A task-oriented DS typically contains three components, namely Natural Language Understanding (NLU), Dialogue Manager (DM) and Natural Language Generation (NLG). NLU detects the user intent and slots with values from utterances; DM tracks the dialogue states and takes system actions; NLG generates natural language given the system actions.
  • In this application, the focus is on the DM for automatic diagnosis, which consists of two sub-modules, namely a dialogue state tracker (DST) and policy learning. Both NLU and NLG are implemented with template-based models.
  • Typically, a user simulator is designed to interact with the dialogue system. In this application, the same setting as Li et al. (“End-to-end task completion neural dialogue systems”, 2017, Proceedings of the Eighth International Joint Conference on Natural Language Processing) is followed to design the medical DS of this application.
  • FIG. 2 shows an example of user goal. Each user goal consists of four parts, disease tag is the disease that the user suffers; explicit symptoms are symptoms extracted from the user self-report; implicit symptoms are symptoms extracted from the conversational data between the patient and the doctor; request slots is the disease slot that the user would request.
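A user goal of this four-part form can be sketched as a small data structure. The field names and the particular symptom values below are assumptions for illustration, not taken from the actual dataset.

```python
# Illustrative user goal in the four-part structure described above:
# disease tag, explicit symptoms (self-report), implicit symptoms
# (conversation), and the requested disease slot. All names are assumptions.

user_goal = {
    "disease_tag": "infantile diarrhea",
    "explicit_inform_slots": {"diarrhea": True, "sputum": True},  # from self-report
    "implicit_inform_slots": {"cough": False, "vomiting": True},  # from conversation
    "request_slots": {"disease": "UNK"},                          # what the user asks for
}
```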
  • At the beginning of a dialogue session, the user simulator samples a user goal, while the agent attempts to make a diagnosis for the user. The system will learn to select the best response action at each time step by maximizing a long term reward.
  • At the beginning of each dialogue session, a user simulator samples a user goal from the experiment dataset. At each turn t, the user takes an action au,t according to the current user state su,t and the previous agent action at-1, and transits into the next user state su,t+1. In practice, the user state su is factored into an agenda A (“Agenda-based user simulation for bootstrapping a POMDP dialogue system”, Schatzmann et al., 2007, The Conference of the North American Chapter of the Association for Computational Linguistics) and a goal G, noted as su=(A,G). During the course of the dialogue, the goal G ensures that the user behaves in a consistent, goal-oriented manner. The agenda contains a list of symptoms and their status (whether or not they are requested) to track the progress of the conversation.
  • Every dialogue session is initiated by the user via the user action au,1, which consists of the requested disease slot and all explicit symptoms. In terms of the symptom requested by the agent during the course of the dialogue, the user will take one of three actions: True (if the symptom is positive), False (if the symptom is negative), or not_sure (if the symptom is not mentioned in the user goal). If the agent informs the correct disease, the dialogue session will be terminated as successful by the user. Otherwise, the dialogue session will be recognized as failed if the agent makes an incorrect diagnosis or the dialogue turn reaches the maximum dialogue turn T.
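The user simulator's three-way answer rule above can be sketched directly, assuming a user-goal layout like the one in FIG. 2 (the field names are assumptions):

```python
# Sketch of the user simulator's answer rule for a symptom requested by the
# agent: True if positive, False if negative, not_sure if the symptom does
# not appear in the user goal at all.

def answer_symptom_request(user_goal, symptom):
    known = {**user_goal["explicit_inform_slots"],
             **user_goal["implicit_inform_slots"]}
    if symptom not in known:
        return "not_sure"
    return "True" if known[symptom] else "False"

goal = {"explicit_inform_slots": {"diarrhea": True},
        "implicit_inform_slots": {"cough": False}}
# answer_symptom_request(goal, "diarrhea") == "True"
# answer_symptom_request(goal, "fever")    == "not_sure"
```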
  • In this application, we cast DS as Markov Decision Process (MDP) (“Pomdp-based statistical spoken dialog systems: A review”. Young et al., 2013, Proceedings of the IEEE) and train the dialogue policy via reinforcement learning (“Strategic dialogue management via deep reinforcement learning”, Cuayahuitl et al., 2015, CoRR). An MDP is composed of states, actions, rewards, policy, and transitions.
  • A dialogue state s includes the symptoms requested by the agent and informed by the user till the current time t, the previous action of the user, the previous action of the agent and the turn information. In terms of the representation vector of symptoms, its dimension is equal to the number of all symptoms, and its elements are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms and 0 for not-mentioned symptoms. Each state s ∈ S is the concatenation of these four vectors.
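The symptom representation vector can be sketched as follows; the symptom inventory and status labels are illustrative assumptions (the real system uses the 67 kept symptoms):

```python
# Sketch of the symptom representation vector: 1 for positive, -1 for
# negative, -2 for not-sure, and 0 for not-mentioned symptoms.

ALL_SYMPTOMS = ["cough", "diarrhea", "fever", "vomiting"]  # illustrative subset

def symptom_vector(all_symptoms, status):
    """status maps a symptom to 'positive' / 'negative' / 'not_sure'."""
    code = {"positive": 1, "negative": -1, "not_sure": -2}
    return [code.get(status.get(s), 0) for s in all_symptoms]

vec = symptom_vector(ALL_SYMPTOMS, {"cough": "negative", "diarrhea": "positive"})
# vec == [-1, 1, 0, 0]
```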
  • An action a ∈ A is composed of a dialogue act (e.g., “inform”, “request”, “deny” and “confirm”) and a slot (i.e., a normalized symptom or a special slot “disease”). “Thanks” and “close dialogue” are two further actions.
  • The transition from st to st+1 is the updating of state st based on the agent action at, the previous user action au,t−1 and the step time t.
  • The reward rt+1=R(st, at) is the immediate reward at step time t after taking the action at, also known as reinforcement.
  • The policy π describes the behaviors of an agent, which takes the state st as input and outputs the probability distribution over all possible actions π(at|st).
  • In this application, the policy is parameterized with a deep Q-network (DQN) (Mnih et al., “Human-level control through deep reinforcement learning”, Nature, 2015), which takes the state st as input and outputs Q(st, a; θ) for all actions a. A Q-network can be trained by updating the parameters θi at iteration i to reduce the mean squared error between the Q-value computed from the current network Q(s, a|θi) and the Q-value obtained from the Bellman equation

  • yi = r + γ maxa′ Q(s′, a′ | θi⁻)
  • where Q(s′, a′|θi⁻) is the target network with parameters θi⁻ from some previous iteration. In practice, the behavior distribution is often selected by an ε-greedy policy that takes an action a = argmaxa′ Q(st, a′; θ) with probability 1−ε and selects a random action with probability ε, which can improve the efficiency of exploration. When training the policy, experience replay is used. We store the agent's experiences at each time-step, et=(st, at, rt, st+1), in a fixed-size, queue-like buffer D.
  • In a simulation epoch, the current DQN network is updated multiple times (depending on the batch size and the current size of the replay buffer) with different batches drawn randomly from the buffer, while the target DQN network is fixed during the updating of the current DQN network. At the end of each epoch, the target network is replaced by the current network, and the current network is evaluated on the training set. The buffer will be flushed if the current network performs better than all previous versions.
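The update procedure above can be sketched with a minimal linear Q-network standing in for the “single layer network” used in the experiments. All dimensions, hyper-parameters and the gradient step below are illustrative assumptions, not the exact implementation.

```python
import numpy as np

# Sketch of one DQN update round with a frozen target network, following the
# Bellman target y = r + gamma * max_a' Q(s', a'; theta_minus).

state_dim, n_actions = 8, 4
gamma, lr = 0.9, 0.01

def q_values(theta, s):
    """Linear Q-network: one Q-value per action."""
    return s @ theta

def dqn_step(theta, theta_target, batch):
    """One pass over a batch of (s, a, r, s', done) replay transitions."""
    for s, a, r, s_next, done in batch:
        # Bellman target; the target network theta_target stays fixed here
        y = r if done else r + gamma * q_values(theta_target, s_next).max()
        td_error = y - q_values(theta, s)[a]
        # gradient step on 0.5 * td_error**2 w.r.t. the chosen action's weights
        theta[:, a] += lr * td_error * s
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(state_dim, n_actions))  # current network
theta_target = theta.copy()                                 # frozen target network

# one batch drawn from a (simulated) replay buffer
batch = [(rng.normal(size=state_dim), int(rng.integers(n_actions)),
          1.0, rng.normal(size=state_dim), False) for _ in range(30)]
theta = dqn_step(theta, theta_target, batch)
```

At the end of an epoch the target network would be replaced by the current one (theta_target = theta.copy()), mirroring the procedure described above.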
  • Experiments
  • The max dialogue turn T is 22. A positive reward of +44 is given to the agent at the end of a successful dialogue, and a −22 reward is given to a failed one. We apply a step penalty of −1 for each turn to encourage shorter dialogues. The dataset is divided into two parts: 80% for training with 568 user goals and 20% for testing with 142 user goals. The ε of the ε-greedy strategy is set to 0.1 for effective action space exploration, and the γ in the Bellman equation is 0.9. The size of the buffer D is 10000 and the batch size is 30. The neural network of the DQN is a single-layer network. The learning rate is 0.001. Each simulation epoch consists of 100 dialogue sessions, and the current network is evaluated on 500 dialogue sessions at the end of each epoch. Before training, the buffer is pre-filled with the experiences of the rule-based agent (see below) to warm-start our dialogue system.
  • To evaluate the performance of the proposed framework, our model is compared with baselines in terms of three evaluation metrics, following Li et al. (“End-to-end task completion neural dialogue systems”, 2017, In Proceedings of the Eighth International Joint Conference on Natural Language Processing) and Peng et al. (“Adversarial advantage actor-critic model for task-completion dialogue policy learning”, 2017, https://arxiv.org/abs/1710.11277; “Composite task-completion dialogue policy learning via hierarchical deep reinforcement learning”, 2017, Conference on Empirical Methods in Natural Language Processing), namely, success rate, average reward and the average number of turns per dialogue session. As for the classification models, accuracy is used as the metric.
  • The baselines include:
  • (1) SVM: This model treats the automatic diagnosis as a multi-class classification problem. It takes one-hot representation of symptoms in the user goal as input, and predicts the disease. There are two configurations: one takes both explicit and implicit symptoms as input (denoted as SVM-ex&im), and the other takes only explicit symptoms to predict the disease (denoted as SVM-ex).
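The one-hot symptom input described for the SVM baselines can be sketched as follows. The symptom inventory is an illustrative assumption, and a classifier such as scikit-learn's sklearn.svm.SVC would then be trained on these vectors.

```python
# Sketch of the one-hot symptom representation fed to the SVM baselines:
# SVM-ex encodes only explicit symptoms, SVM-ex&im both explicit and implicit.

ALL_SYMPTOMS = ["cough", "diarrhea", "fever", "vomiting"]  # illustrative subset

def one_hot(symptoms):
    """1.0 where a symptom is present in the user goal, 0.0 elsewhere."""
    return [1.0 if s in symptoms else 0.0 for s in ALL_SYMPTOMS]

x_ex = one_hot({"diarrhea"})                   # explicit only (SVM-ex)
x_ex_im = one_hot({"diarrhea", "vomiting"})    # explicit + implicit (SVM-ex&im)
# x_ex == [0.0, 1.0, 0.0, 0.0]; x_ex_im == [0.0, 1.0, 0.0, 1.0]
```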
  • (2) Random Agent: At each turn, the random agent takes an action randomly from the action space as the response to the user's action.
  • (3) Rule-based Agent: The rule-based agent takes an action based on handcrafted rules. Conditioned on the current dialogue state st, the agent will inform the disease if all the known related symptoms are detected. If no disease can be identified, the agent will randomly select one of the remaining symptoms to request. The relations between diseases and symptoms are extracted from the annotated corpus in advance. In this work, only the first T/2.5 (2.5 is a hyper-parameter) symptoms with high frequency are kept for each disease, so that the rule-based agent can inform a disease within the max dialogue turn T.
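The rule-based baseline can be sketched as below: inform a disease once all of its kept high-frequency symptoms are confirmed positive, otherwise ask about one of the remaining symptoms at random. The disease-symptom map here is an illustrative assumption, not the one extracted from the annotated corpus.

```python
import random

# Sketch of the handcrafted rule-based agent described above.

DISEASE_SYMPTOMS = {
    "infantile diarrhea": {"diarrhea", "vomiting"},
    "children's bronchitis": {"cough", "sputum"},
}

def rule_agent_action(positive_symptoms, requested):
    for disease, symptoms in DISEASE_SYMPTOMS.items():
        if symptoms <= positive_symptoms:   # all related symptoms detected
            return ("inform", disease)
    remaining = [s for symptoms in DISEASE_SYMPTOMS.values() for s in symptoms
                 if s not in positive_symptoms and s not in requested]
    return ("request", random.choice(remaining))

# rule_agent_action({"diarrhea", "vomiting"}, set())
# == ("inform", "infantile diarrhea")
```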
  • Table 4 shows the accuracy of two SVM-based models.
  • TABLE 4
    Disease SVM-ex&im SVM-ex
    Infantile diarrhea 0.91 0.89
    Children functional dyspepsia 0.34 0.28
    Upper respiratory infection 0.52 0.44
    Children's bronchitis 0.93 0.71
    Overall 0.71 0.59
  • The result shows that the implicit symptoms can greatly improve the accuracy of disease identification for all the four diseases, which demonstrates the contribution of implicit symptoms when making diagnosis for patients.
  • FIG. 3 shows the learning curve of all the three dialogue systems and Table 5 shows the performance of these agents on testing set, wherein performance of the three dialogue systems on 5K simulated dialogues is shown.
  • TABLE 5
    Model Success Reward Turn
    Random Agent 0.06 −24.36 17.51
    Rule Agent 0.23 −13.78 17.00
    DQN Agent 0.65 20.51 5.11
  • Due to the large action space, the random agent performs badly. The rule-based agent outperforms the random agent by a large margin, which indicates that the rule-based agent is well designed. It can also be seen that the RL-based DQN agent significantly outperforms the rule-based agent. Moreover, the DQN agent outperforms SVM-ex by collecting additional implicit symptoms via conversing with patients. However, there is still a gap between the performance of the DQN agent and SVM-ex&im in terms of accuracy, which indicates that there is still room for improvement of the dialogue system.
  • The experimental results show that the dialogue system of this invention is able to collect additional symptoms via conversation with patients and to improve the accuracy of automatic diagnosis. Hence, it fills the gap of applying DS in disease identification.
  • It will be appreciated by those skilled in the art that changes could be made to the embodiments described above without departing from the broad inventive concept thereof. It is understood, therefore, that the invention disclosed herein is not limited to the particular embodiments disclosed, but it is intended to cover modifications within the spirit and scope of the present invention as defined by the appended claims.

Claims (20)

1. A task-oriented dialogue system, comprising:
a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user goal includes
a disease tag tagging the disease from which the user suffers;
explicit symptoms extracted from the user self-report data;
implicit symptoms extracted from the user-doctor conversational data; and
disease slots that the user requests; and
wherein the user simulator module samples the user goal.
2. The system of claim 1, wherein the user simulator module includes a dialogue state tracker for tracking the dialogue state.
3. The system of claim 2, wherein the user simulator module iteratively takes a user action according to a current user state and a previous agent action, and transits into the next user state; wherein the agent action includes “inform” action and “request” action, while the user action includes “deny” action, “confirm” action and “not-sure” action.
4. The system of claim 1, wherein every dialogue session is initiated by the user simulator via a user action which includes the requested disease slot and all explicit symptoms.
5. The system of claim 4, wherein during the course of the dialogue session, in terms of the symptom requested by the agent simulator, the user simulator returns a positive answer when the symptom is positive, a negative answer when the symptom is negative, and a not-sure answer when the symptom is not mentioned in the user goal.
6. The system of claim 4, wherein the dialogue session will be recognized as successful when the agent simulator informs the correct disease.
7. The system of claim 4, wherein the dialogue session will be recognized as failed when the agent simulator makes an incorrect diagnosis or the number of dialogue turns reaches the maximum dialogue turn.
8. The system of claim 6, wherein the dialogue session will be terminated by the user simulator when recognized as successful.
9. The system of claim 1, wherein the dialogue policy learning module trains the dialogue policy by using parameters of dialogue states, actions, rewards, policy, and transitions.
10. The system of claim 9, wherein the dialogue state includes symptoms requested by the agent simulator and informed by the user simulator till the current time, the previous action of the user simulator, the previous action of the agent simulator and the turn information.
11. The system of claim 1, wherein the dialogue state further comprises a symptoms vector, the dimension of which is equal to the number of all symptoms; wherein elements of the symptoms vector are 1 for positive symptoms, −1 for negative symptoms, −2 for not-sure symptoms, and 0 for not-mentioned symptoms.
12. The system of claim 9, wherein the transition is the updating of dialogue state based on the current agent action, the previous user action and the step time.
13. The system of claim 9, wherein the reward is an immediate reward at step time t after taking the current agent action.
14. The system of claim 9, wherein the policy describes the behaviors of the agent simulator, takes the dialogue state as input and outputs the probability distribution over all agent actions.
15. The system of claim 14, wherein the policy is parameterized with a deep Q-network which takes the dialogue state as input and outputs Q-values for all agent actions.
16. The system of claim 15, wherein the Q-network is trained by updating the parameters iteratively to reduce the mean squared error between the Q-value computed from the current network Q and the Q-value obtained from the Bellman equation.
17. The system of claim 16, wherein the Bellman equation is parameterized as

y_i = r + γ max_a′ Q(s′, a′ | θ_i⁻)

and wherein Q(s′, a′ | θ_i⁻) is the target network with parameters θ_i⁻ from some previous iteration.
18. The system of claim 1, wherein the current DQN network is updated multiple times with different batches drawn randomly from the replay buffer, while the target DQN network is fixed during the updating of the current DQN network; and the number of times the current DQN network is updated depends on the batch size and the current size of the replay buffer.
19. A task-oriented dialogue system, comprising:
a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user simulator module samples the user goal.
20. A task-oriented dialogue method, comprising:
providing a symptom extraction module for extracting the symptoms from a dataset including user self-report data and user-doctor conversational data;
providing a symptoms normalization module for normalizing the extracted symptoms and generating a user goal;
providing an agent simulator module simulating the behavior of a doctor, and a user simulator module simulating the behavior of a patient; and
providing a dialogue policy learning module for training the dialogue policy via reinforcement learning;
wherein the user simulator module samples the user goal.
US17/508,655 2020-10-22 2021-10-22 Task-oriented Dialogue System for Automatic Disease Diagnosis Abandoned US20220130545A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202011136008.2A CN112420189A (en) 2020-10-22 2020-10-22 Hierarchical disease diagnosis system, disease diagnosis method, device and apparatus
CN202011135075.2 2020-10-22
CN202011136008.2 2020-10-22
CN202011135075.2A CN112349409A (en) 2020-10-22 2020-10-22 Disease type prediction method, device, equipment and system

Publications (1)

Publication Number Publication Date
US20220130545A1 true US20220130545A1 (en) 2022-04-28

Family

ID=81257482

Family Applications (2)

Application Number Title Priority Date Filing Date
US17/508,655 Abandoned US20220130545A1 (en) 2020-10-22 2021-10-22 Task-oriented Dialogue System for Automatic Disease Diagnosis
US17/508,675 Active US11562829B2 (en) 2020-10-22 2021-10-22 Task-oriented dialogue system with hierarchical reinforcement learning

Family Applications After (1)

Application Number Title Priority Date Filing Date
US17/508,675 Active US11562829B2 (en) 2020-10-22 2021-10-22 Task-oriented dialogue system with hierarchical reinforcement learning

Country Status (1)

Country Link
US (2) US20220130545A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12423383B2 (en) * 2021-01-06 2025-09-23 Electronics And Telecommunications Research Institute Method for exploration based on curiosity and prioritization of experience data in multi-agent reinforcement learning
JP7647359B2 (en) * 2021-06-08 2025-03-18 トヨタ自動車株式会社 Multi-agent simulation system and multi-agent simulation method
JP7491268B2 (en) * 2021-06-08 2024-05-28 トヨタ自動車株式会社 Multi-agent Simulation System

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729710B (en) * 2016-08-11 2021-04-13 宏达国际电子股份有限公司 Medical systems and non-transitory computer readable media
US20180165602A1 (en) * 2016-12-14 2018-06-14 Microsoft Technology Licensing, Llc Scalability of reinforcement learning by separation of concerns
EP3543914A1 (en) 2018-03-22 2019-09-25 Koninklijke Philips N.V. Techniques for improving turn-based automated counseling to alter behavior
CN110504026B (en) * 2018-05-18 2022-07-26 宏达国际电子股份有限公司 Control method and medical system
CN111951943B (en) * 2020-09-27 2021-01-05 平安科技(深圳)有限公司 Intelligent triage method and device, electronic equipment and storage medium
US20220107628A1 (en) * 2020-10-07 2022-04-07 The Boeing Company Systems and methods for distributed hierarchical control in multi-agent adversarial environments

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lin Xu, et al., "End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis", The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), pg. 7346-53 ©2019 (Year: 2019) *

Also Published As

Publication number Publication date
US20220130546A1 (en) 2022-04-28
US11562829B2 (en) 2023-01-24


Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION