CN110188170B - Multi-entry medical question template device and method thereof - Google Patents
- Publication number
- CN110188170B (application CN201910450711.1A)
- Authority
- CN
- China
- Prior art keywords
- template
- question
- medical
- user
- reasoning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention belongs to the field of artificial intelligence in computing and, in particular, relates to a multi-entry medical question template device and a method thereof, which can be widely applied to intelligent information retrieval and automatic question-answering systems in the medical industry.
Background Art
The rapid development of Internet technology has caused online information to grow exponentially. Traditional search engines can no longer meet users' need to obtain the required information quickly and accurately from this mass of data. Because intelligent information retrieval and automatic question-answering systems can answer user questions accurately and directly, they are becoming a focus of research. However, user questions are diverse and informal, so the semantics extracted from them often differ considerably from what the user intended, and the answer accuracy of current automatic question-answering and intelligent information retrieval systems remains low.
Question processing and understanding is the first execution stage of an intelligent information retrieval or automatic question-answering system. Its goal is to let the computer understand the user's question and obtain the user's intent, providing the basis for the subsequent retrieval. Question understanding generally comprises lexical analysis, syntactic analysis, and semantic analysis, and semantic analysis is currently the bottleneck of natural language processing. In domain-oriented information retrieval and question-answering systems, user queries share many similarities. In an ontology-based knowledge base, for example, many queries ask about the attributes of concepts or entities, or about the relations between concepts and between attributes. Such questions can be abstracted into representative question templates that are based on the domain ontology and encapsulate semantic information, which effectively avoids complex lexical, syntactic, and semantic analysis. Research on question understanding methods based on semantic relations and question templates is therefore of great significance.
Current question understanding methods based on semantic relations and question templates generally map one template to one semantic relation in the domain ontology or domain knowledge base. In a medical information system, however, knowledge and relations are complex, and the semantics of a single question may need to be interpreted through several semantic relations. For example, a question about the symptoms of a disease requires several semantic relations to express; a single semantic relation cannot describe the symptoms of the disease comprehensively. To make template semantics more comprehensive, explicit, and clear, to improve template matching accuracy and template design efficiency, and to reduce the size of the template library, a multi-entry medical question template device and application method based on medical concepts and medical relations is needed.
Summary of the Invention
The present invention addresses the problem that, in intelligent information retrieval and automatic question-answering systems for the medical industry, knowledge and relations are complex and the semantics of a single question must be interpreted through several semantic relations. To improve template design efficiency, reduce the size of the template library, and satisfy the user's question intent as fully as possible, the invention provides a multi-entry medical question template device and a method thereof.
To achieve the above object, the technical solution of the present invention is as follows:
A multi-entry medical question template device, wherein the template device is a medical question conversion mechanism based on medical concepts and medical relations. It binds together the main template structure, the near-synonym template structures, and the inference rules and inference functions in the multi-entry joint structure, so that multiple near-synonymous user questions are converted into one multi-entry medical question template, and the corresponding answer is extracted from the UMLS medical knowledge base according to that template.
Further, the question template binds together the main template structure, the near-synonym template structures, and the inference rules and inference functions in the multi-entry joint structure. Its Backus-Naur form is defined as:
<multi-entry medical question template> ::= (<main template structure>, {<near-synonym template structure>}, <multi-entry joint structure>) (1)
<multi-entry joint structure> ::= ({<entry joint structure>}) (2)
<entry joint structure> ::= ({<synonym template structure>}, <preferred binding structure>, {<secondary binding structure>}) (3)
<preferred binding structure> ::= (<preferred inference rule>, <preferred inference function>) (4)
<secondary binding structure> ::= (<secondary inference rule>, <secondary inference function>) (5)
The main template structure is the most representative sentence structure of the question template, expressed with variables and labels; it reflects the shallow question semantics of the question template;
A near-synonym template structure is a template structure whose semantics are close to those of the main template structure; a question template contains one or more near-synonym template structures;
The multi-entry joint structure is a mechanism that converts the various sub-semantics of a question template into different semantic relations in the medical knowledge system. A question template contains one or more entry joint structures. Each entry joint structure represents the inference method for one sub-semantic of its question template and consists of a group of synonym template structures, a preferred inference rule with its inference function, and several ordered secondary inference rules with their inference functions. The synonym template structures are a group of synonymous question sentence structures that reflect the sub-semantic of the entry joint structure. Each inference rule is associated with a corresponding inference function. The preferred inference rule is the answer inference method that best matches the template structure semantics of the entry joint structure, and the secondary inference rules are ordered by how closely they match those semantics;
An inference rule represents the deep question semantics of the question template. It uses a predicate formula based on medical concepts and medical relations to express precisely the inference process for the expected answer and the intent of the user's question;
An inference function is an answer inference program bound to an inference rule. It performs the inference specified by the rule and extracts the corresponding answer from the medical knowledge base through the semantic relation that the rule specifies. Its driving semantics come from the medical knowledge elements in the user question that it matches;
The union of the main template structure and the set of near-synonym template structures in formula (1) equals the union of the sets of synonym template structures in formula (3).
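The nesting described by formulas (1) to (5) can be pictured as a small data model. The following Python sketch is only an illustration: the class names, field names, and the sample template strings are assumptions made here, since the text defines the abstract structure but no concrete representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# All names below are illustrative assumptions, not part of the patent text.
Answer = Optional[List[str]]                 # None means "no answer found"
InferenceFn = Callable[[Dict[str, str]], Answer]


@dataclass
class Binding:
    """Formulas (4)/(5): an inference rule bound to its inference function."""
    rule: str                                # predicate formula kept as text
    function: InferenceFn


@dataclass
class EntryJointStructure:
    """Formula (3): synonym template structures plus ordered bindings."""
    synonym_structures: List[str]
    preferred: Binding
    secondary: List[Binding] = field(default_factory=list)


@dataclass
class MultiEntryTemplate:
    """Formulas (1)-(2): main structure, near-synonym structures, entries."""
    main_structure: str
    near_synonym_structures: List[str]
    entries: List[EntryJointStructure]

    def all_structures(self) -> List[str]:
        """Template structures used for matching (main first)."""
        return [self.main_structure, *self.near_synonym_structures]


def no_answer(_bindings: Dict[str, str]) -> Answer:
    """Placeholder inference function that never finds an answer."""
    return None


# Hypothetical template with two entries, one per sub-semantic.
template = MultiEntryTemplate(
    main_structure="<what> [be] <symptom> <of> <c:Concept>",
    near_synonym_structures=["<how> <c:Concept> <manifest>"],
    entries=[
        EntryJointStructure(
            ["<what> [be] <symptom> <of> <c:Concept>"],
            Binding("hypothetical rule A", no_answer),
        ),
        EntryJointStructure(
            ["<how> <c:Concept> <manifest>"],
            Binding("hypothetical rule B", no_answer),
            [Binding("hypothetical rule C", no_answer)],
        ),
    ],
)
```

Keeping the inference function as a plain callable mirrors the idea that a rule and its program are bound together, while leaving the rule language itself open.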
Further, among the main template structure and the near-synonym template structures, the normalized semantic similarity of any two template structures, computed from their core elements, variable types, and element order, must be less than 1.
Further, a template structure is defined as a group of template elements, marked with template markers and variable type symbols and arranged in a definite order. Template structures include the main template structure and the near-synonym template structures. The question semantics of a question template are characterized by the variables and core elements in its template structures, which are defined in Backus-Naur form as:
<template structure> ::= (<template element 1>, <template element 2>, …, <template element n>) (6)
<template element> ::= (<core element>, <optional element>, <variable>) (7)
<core element> ::= an element marked in the template with the markers "<" and ">" (8)
<optional element> ::= an element marked in the template with the markers "[" and "]" (9)
<variable> ::= <variable name> + ":" + <variable type symbol> (10)
The template markers are: <> delimits a mandatory core element in the template; [] delimits an optional element that may be omitted; {} denotes a set of elements in the template; | separates synonyms in the template;
The variable type symbols are: ① Concept: <c:Concept> declares that the template variable c is a medical concept in the UMLS knowledge base; ② Relation: <r:Relation> declares that the template variable r is a medical relation in the UMLS knowledge base; ③ ConceptSet: <s:ConceptSet> declares that the template variable s is a set of medical concepts in the UMLS knowledge base; ④ Type: <t:Type> declares that the template variable t belongs to the Type category in the UMLS knowledge base.
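To make the marker and variable conventions concrete, the sketch below tokenizes a template structure string into core elements, optional elements, and typed variables. The function name `parse_template`, the tuple layout of the result, and the sample template are assumptions for illustration; the `{}` set marker is not handled.

```python
import re
from typing import List, Tuple

# Variable type symbols listed in the text.
VARIABLE_TYPES = {"Concept", "Relation", "ConceptSet", "Type"}

# <...> is a core element or a variable; [...] is an optional element.
TOKEN_RE = re.compile(r"<([^<>]+)>|\[([^\[\]]+)\]")


def parse_template(structure: str) -> List[Tuple[str, object]]:
    """Split a template structure into (kind, value) pairs, in order."""
    elements: List[Tuple[str, object]] = []
    for match in TOKEN_RE.finditer(structure):
        core, optional = match.group(1), match.group(2)
        if optional is not None:
            # "|" separates synonyms inside one element.
            elements.append(("optional", optional.split("|")))
            continue
        name, sep, type_name = core.partition(":")
        if sep and type_name in VARIABLE_TYPES:
            elements.append(("variable", (name, type_name)))
        else:
            elements.append(("core", core.split("|")))
    return elements


# Hypothetical template structure written with the markers described above.
parsed = parse_template("<what> [be|are] <symptom|sign> <of> <c:Concept>")
```

Here `parsed` holds five elements: three core elements, one optional element with two synonyms, and the variable `c` of type `Concept`.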
Further, an inference rule is a logical implication defined in a template description logic system, abbreviated TDLS. TDLS is the following pair:
TDLS ::= (<predicate set>, <operator>)
Predicates are used to declare, identify, and determine the medical concepts and medical relations in a question template. TDLS has three types of predicate: unary, binary, and ternary. A unary predicate declares the category of medical knowledge to which a template variable belongs; a binary predicate declares the semantic relation between two template variables; a ternary predicate declares the domain and range of a medical relation;
The operators are: ① the symbol "∧" denotes conjunction (logical "and"); its operands are predicates or predicate logic expressions; ② the symbol "∨" denotes disjunction (logical "or"); its operands are predicates or predicate logic expressions; ③ the symbol "∀" denotes the universal quantifier, standing for any individual; its operand is a medical concept; ④ the symbol "∃" denotes the existential quantifier, standing for the existence of some individual; its operand is a medical concept; ⑤ the symbol ":" is the type definition operator for medical knowledge variables; its left operand is a template variable and its right operand is a first-order template predicate; ⑥ the symbol "." denotes a reference to a relation of a medical concept or instance; its operand is a medical concept.
A method for answer inference using the multi-entry medical question template device described above comprises the following steps:
S1. Build a multi-entry medical question template library:
S11. Collect the set of user questions from a UMLS-based medical question-answering system and reduce every word in each user question to its root form;
S12. Templatize the user questions with the template markers and variable type symbols: mark the core elements, optional elements, and UMLS concept names and relation names, and replace each UMLS concept name, concept name set, and relation name with a variable name and a variable type symbol;
S13. Classify the templatized user questions, gathering those with similar semantics to form the template structure set of one multi-entry medical question template;
S14. Repeat step S13 until every user question in the set has been classified, yielding a multi-entry medical question template library that contains only template structure sets;
S15. According to the different sub-semantics of the template structures, and with reference to the medical concepts and medical relations in the UMLS knowledge base, divide the template structure set of each multi-entry medical question template into several groups, forming several entry joint structures;
S16. Using the template description logic system, and with reference to the medical concepts and medical relations in the UMLS knowledge base, design for each entry joint structure in the template library the inference rules and inference functions that fit that entry joint structure.
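Step S12 can be illustrated with a toy templatizer. The sketch below assumes the question has already been reduced to root forms and split into tokens, and it replaces a real UMLS lookup with a tiny hand-written lexicon; the lexicon entries, the function name `templatize`, and the variable naming scheme are all assumptions. Deciding which words are optional elements is left out, since the text does not say how that choice is made.

```python
from typing import Dict, List

# Toy stand-in for a UMLS lookup; these entries are illustrative assumptions.
TOY_LEXICON: Dict[str, str] = {
    "diabetes": "Concept",
    "treat": "Relation",
}

# Variable name prefixes follow the examples in the text (c, r, s, t).
PREFIX = {"Concept": "c", "Relation": "r", "ConceptSet": "s", "Type": "t"}


def templatize(tokens: List[str], lexicon: Dict[str, str]) -> str:
    """Replace known UMLS names with typed variables; mark the rest as core."""
    counters: Dict[str, int] = {}
    parts: List[str] = []
    for token in tokens:
        kind = lexicon.get(token)
        if kind is None:
            parts.append(f"<{token}>")           # non-UMLS word: core element
        else:
            counters[kind] = counters.get(kind, 0) + 1
            parts.append(f"<{PREFIX[kind]}{counters[kind]}:{kind}>")
    return " ".join(parts)


templated = templatize(["what", "drug", "treat", "diabetes"], TOY_LEXICON)
```

With this lexicon, `templated` is `<what> <drug> <r1:Relation> <c1:Concept>`, which could then be grouped with other templatized questions in step S13.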
S2. Match the user question against the question templates in the multi-entry medical question template library:
S21. Preprocess the user question to be matched as follows. First, reduce the words of the user question to their root forms and, using the medical knowledge elements of the UMLS (Unified Medical Language System) as a dictionary, mark the medical concepts and relations in the question in the forms <medical concept:Concept>, <medical concept set:ConceptSet>, and <medical relation:Relation>. Then mark the nouns that are not medical concepts and the verbs that are not medical relations as core elements, and mark the interrogative words and prepositions as core elements. The result is a user question annotated with core elements and UMLS medical knowledge elements;
S22. Compute the sentence-template similarity between the preprocessed user question and each question template in the multi-entry medical question template library in turn, and take the question template with the highest similarity as the template that matches the user question.
S3. Take the entry joint structure that contains the template structure best matching the user question as the execution entry of the user question in the matched multi-entry joint structure, and execute the inference functions of that entry joint structure to complete the inference and extraction of the answer.
Further, in step S22, the sentence-template similarity between the user question and each multi-entry medical question template is computed by formula (11):
[Formula (11) is not reproduced in this text.]
where User denotes the user question; MUTP denotes a multi-entry medical question template in the template library; TSS denotes the set of template structures in MUTP, comprising its main template structure and near-synonym template structures; TS is any template structure in TSS; and StruSim(User,TS) denotes the structural similarity between the user question User and the template structure TS, computed as:
StruSim(User,TS)=VarSim(User,TS)×KeySim(User,TS) (12)
where VarSim(User,TS) denotes the similarity between the variables in the template structure TS and the UMLS medical knowledge elements in the user question User, computed by formulas (13) and (14), and KeySim(User,TS) denotes the similarity between the core elements in TS and the core elements in User, computed by formulas (15) and (16):
[Formulas (13) to (16) are not reproduced in this text.]
where i is any variable in Var, and Var is the set of variables in the template structure TS; j is any UMLS medical knowledge element in UE, and UE is the set of UMLS medical knowledge elements in the user question User; Type(i) and Type(j) denote the types of UMLS medical knowledge to which i and j belong, and the type condition between them holds when the type of j is the same as the type of i or is contained in the type of i; m is any core element in Key, and Key is the set of core elements in TS; n is any core element in KE, and KE is the set of core elements in User; VS(i) and KS(m) denote the similarity between the variable i or the core element m of TS, respectively, and the user question User; sim(m,n) denotes the normalized word semantic similarity based on a general semantic dictionary; and STH is a similarity threshold. sim(m,n) is computed by formula (17):
[Formula (17) is not reproduced in this text.]
where LCS(m,n) denotes the nearest common parent node of the core elements m and n in the general semantic dictionary, depth(LCS(m,n)) denotes the depth of LCS(m,n) in the general semantic dictionary, and pathLen(m,n) denotes the length of the shortest path between m and n in the general semantic dictionary;
The general semantic dictionary is a cross-domain, machine-computable dictionary based on a classification (taxonomy) structure.
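Only formula (12) is given explicitly in the text, so the following Python sketch fills the remaining pieces with clearly labelled assumptions: the template-level score is taken as the maximum of StruSim over the template structures, VarSim is taken as the fraction of template variables whose type also occurs in the user question (plain equality, ignoring type containment), KeySim is taken as the thresholded average of the best word similarity per template core element, and `taxonomy_sim` uses a Wu-Palmer-style ratio of depth(LCS) and pathLen. None of these choices is confirmed by the text; they only show how the named quantities could fit together.

```python
from typing import Callable, List, Sequence, Tuple

WordSim = Callable[[str, str], float]
# A structure is represented as (core elements, variable types): an assumption.
Structure = Tuple[Sequence[str], Sequence[str]]


def taxonomy_sim(lcs_depth: int, path_len: int) -> float:
    """Assumed Wu-Palmer-style score built from depth(LCS) and pathLen."""
    denominator = path_len + 2 * lcs_depth
    return 0.0 if denominator == 0 else 2 * lcs_depth / denominator


def var_sim(template_types: Sequence[str], user_types: Sequence[str]) -> float:
    """Assumed VarSim: share of template variables with a matching user type."""
    if not template_types:
        return 1.0
    matched = sum(1 for t in template_types if t in user_types)
    return matched / len(template_types)


def key_sim(template_keys: Sequence[str], user_keys: Sequence[str],
            word_sim: WordSim, sth: float = 0.8) -> float:
    """Assumed KeySim: average best word similarity, zeroed below STH."""
    if not template_keys:
        return 1.0
    total = 0.0
    for m in template_keys:
        best = max((word_sim(m, n) for n in user_keys), default=0.0)
        total += best if best >= sth else 0.0
    return total / len(template_keys)


def stru_sim(user: Structure, ts: Structure, word_sim: WordSim) -> float:
    """Formula (12): StruSim = VarSim x KeySim."""
    return var_sim(ts[1], user[1]) * key_sim(ts[0], user[0], word_sim)


def template_sim(user: Structure, tss: List[Structure],
                 word_sim: WordSim) -> float:
    """Assumed template score: best StruSim over all template structures."""
    return max((stru_sim(user, ts, word_sim) for ts in tss), default=0.0)


def exact(a: str, b: str) -> float:
    """Trivial word similarity used only for this example."""
    return 1.0 if a == b else 0.0


user_q: Structure = (["what", "symptom", "of"], ["Concept"])
ts_full: Structure = (["what", "symptom", "of"], ["Concept"])
ts_half: Structure = (["what", "cause"], ["Concept"])
best_score = template_sim(user_q, [ts_half, ts_full], exact)
```

In a fuller version, `word_sim` would look up both words in the general semantic dictionary and pass the resulting depth and path length to `taxonomy_sim`.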
Further, in step S3, executing the inference functions of the entry joint structure comprises the following sub-steps:
S31. First execute the preferred inference function bound to the preferred inference rule. Only if the preferred inference function fails, begin executing the secondary inference functions and go to step S32; otherwise return the answer extracted by the preferred inference function and stop;
S32. Execute the first secondary inference function. Only if it fails, execute the second secondary inference function, and continue in this way through all secondary inference functions bound to the template, in order. If every inference function fails, return a failure message; otherwise return the answer extracted by the inference function that succeeded and stop. An inference function fails when it finds no record in the UMLS knowledge base showing that the queried concept has the specified semantic relation.
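The fallback order of steps S31 and S32 amounts to trying the preferred inference function first and then each secondary one in sequence until one yields an answer. The sketch below assumes that an inference function signals failure by returning `None` or an empty list; the toy knowledge base, the relation names, and the helper `by_relation` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Answer = Optional[List[str]]
InferenceFn = Callable[[Dict[str, str]], Answer]

# Toy knowledge base keyed by (concept, relation); contents are hypothetical.
TOY_KB: Dict[Tuple[str, str], List[str]] = {
    ("diabetes", "has_symptom"): [],
    ("diabetes", "has_finding"): ["thirst"],
}


def by_relation(relation: str) -> InferenceFn:
    """Build an inference function that looks up one semantic relation."""
    def infer(bindings: Dict[str, str]) -> Answer:
        return TOY_KB.get((bindings["c"], relation))
    return infer


def run_entry(preferred: InferenceFn, secondary: Sequence[InferenceFn],
              bindings: Dict[str, str]) -> Answer:
    """S31/S32: preferred first, then secondary functions in order."""
    for function in [preferred, *secondary]:
        answer = function(bindings)
        if answer:                 # None or an empty list counts as failure
            return answer
    return None                    # every inference function failed


found = run_entry(by_relation("has_symptom"), [by_relation("has_finding")],
                  {"c": "diabetes"})
missing = run_entry(by_relation("has_symptom"), [by_relation("has_finding")],
                    {"c": "influenza"})
```

Here the preferred function finds no record for the concept, so the first secondary function supplies the answer; for a concept absent from the toy knowledge base, the call returns `None` as the failure signal.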
In the multi-entry medical question template device and application method described above, the multi-entry joint template obtained by the device is a question formula based on medical concepts and medical relations. It binds together the main template structure, the near-synonym template structures, and the inference rules and inference functions in the multi-entry joint structure, forming a conversion from multiple near-synonymous user questions to multiple closely related answers. This improves template matching accuracy and template design efficiency, reduces the size of the template library, and satisfies the user's question intent as fully as possible.
The medical knowledge base used by the multi-entry medical question template of the present invention conforms to the Unified Medical Language System (UMLS) proposed by the U.S. National Library of Medicine; that is, the template is based on the medical concepts and medical relations of UMLS. UMLS is a semantic network of medical concepts. It contains more than one million medical concepts classified into 133 semantic types, with more than 76 relations established between these types, organized as a tree structure. Because the question template of the present invention conforms to UMLS, it can be used to extract the corresponding answers from the UMLS medical knowledge base, satisfying the user's question intent as fully as possible.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the application method of the present invention.
Detailed Description
The present invention is further described below with reference to specific embodiments, but its scope of protection is not limited to these embodiments.
(I) Definition and structure of the multi-entry medical question template
A multi-entry medical question template device, wherein the template device is a multi-entry medical question conversion mechanism based on medical concepts and medical relations. It binds together the main template structure, the near-synonym template structures, and the inference rules and inference functions in the multi-entry joint structure, so that multiple near-synonymous user questions are converted into one multi-entry medical question template, and the corresponding answer is extracted from the UMLS medical knowledge base according to that template. UMLS stands for Unified Medical Language System.
Further, the question template binds together the main template structure, the near-synonym template structures, and the inference rules and inference functions in the multi-entry joint structure. Its Backus-Naur form (BNF) is defined as:
<multi-entry medical question template> ::= (<main template structure>, {<near-synonym template structure>}, <multi-entry joint structure>) (1)
<multi-entry joint structure> ::= ({<entry joint structure>}) (2)
<entry joint structure> ::= ({<synonym template structure>}, <preferred binding structure>, {<secondary binding structure>}) (3)
<preferred binding structure> ::= (<preferred inference rule>, <preferred inference function>) (4)
<secondary binding structure> ::= (<secondary inference rule>, <secondary inference function>) (5)
Main template structure: the most representative sentence structure of the question template, expressed with variables and labels; it reflects the shallow question semantics of the question template.
Near-synonym template structure: a template structure whose semantics are close to those of the main template structure; a question template may contain several near-synonym template structures. The main template structure and the near-synonym template structures are used for template matching.
Multi-entry joint structure: a mechanism that converts the various sub-semantics of a question template into different semantic relations in the medical knowledge system. A question template contains one or more entry joint structures. Each entry joint structure represents the inference method for one sub-semantic of its question template and consists of a group of synonym template structures, a preferred inference rule with its inference function, and several ordered secondary inference rules with their inference functions. The synonym template structures are a group of synonymous question sentence structures that reflect the sub-semantic of the entry joint structure. Each inference rule is associated with a corresponding inference function. The preferred inference rule is the answer inference method that best matches the template structure semantics of the entry joint structure, and the secondary inference rules are ordered by how closely they match those semantics.
Inference rule: represents the deep question semantics of the question template, using a predicate formula based on medical concepts and medical relations to express precisely the inference process for the expected answer and the intent of the user's question.
Inference function: an answer inference program bound to an inference rule. It performs the inference specified by the rule and extracts the corresponding answer from the medical knowledge base through the semantic relation that the rule specifies. Its driving semantics come from the medical knowledge elements in the user question that it matches, and the driving semantics map the formal parameters of the inference function to its actual parameters.
(II) Relationships and constraints between template structures
The union of the "main template structure" and the set of "near-synonymous template structures" in formula (1) equals the union of the "synonymous template structures" of all instances of formula (3). That is, every main or near-synonymous template structure given in formula (1) appears among the synonymous template structures of at least one formula (3).
Among the main template structure and the near-synonymous template structures, no two template structures may have exactly the same core elements, variable types, and ordering; the normalized semantic similarity between any two template structures, computed from their core elements, variable types, and order, must be less than 1.
(III) Question template structure based on variables and core elements
(1) Markers
The present invention defines a set of template element markers for question template structures; they separate and delimit the different kinds of elements in a template, as shown in Table 1.
Table 1 Template markers
(2) Variable type symbols
Variable type symbols define the types of the variables in a template structure, as shown in Table 2. For example, <c1:Concept> states that the variable c1 is an ontology concept.
Table 2 Variable type symbols in question template structures
(3) Template structure
A template structure is defined as a sequence of template elements, marked with the template markers of Table 1 and the variable type symbols of Table 2 and arranged in a fixed order. Template structures comprise the main template structure and the near-synonymous template structures. The question semantics of a question template are characterized by the variables and core elements in its template structures. In Backus-Naur Form (BNF), a template structure is defined as:
<template structure> := (<template element 1>, <template element 2>, …, <template element n>) (6)
<template element> := (<core element>, <optional element>, <variable>) (7)
<core element> := an element marked in the template with the markers "<" and ">" (8)
<optional element> := an element marked in the template with the markers "[" and "]" (9)
<variable> := <variable name> + ":" + <variable type symbol> (10)
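Definitions (6) to (10) can be illustrated with a small parser that splits a template structure string into its elements. This is a sketch under assumptions: the names `Element` and `parse_template` are hypothetical, and, judging from the examples later in the description, a variable is taken to be an angle-bracketed element that contains a colon, while "|" inside a core element separates alternative words.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class Element(NamedTuple):
    kind: str                      # "core", "optional" or "variable"
    words: Tuple[str, ...]         # alternatives for a core element, or the variable name
    var_type: Optional[str] = None


# One token is either <...> (core element or variable) or [...] (optional element).
_TOKEN = re.compile(r"<([^<>]*)>|\[([^\[\]]*)\]")


def parse_template(structure: str) -> List[Element]:
    """Split a template structure string into typed elements, in order."""
    elements: List[Element] = []
    for token in _TOKEN.finditer(structure):
        angle, square = token.group(1), token.group(2)
        if square is not None:
            elements.append(Element("optional", (square.strip(),)))
        elif ":" in angle:
            # <name:type> is read as a variable with its type symbol.
            name, var_type = angle.split(":", 1)
            elements.append(Element("variable", (name.strip(),), var_type.strip()))
        else:
            # <a|b> is read as a core element with alternative words.
            words = tuple(w.strip().lower() for w in angle.split("|"))
            elements.append(Element("core", words))
    return elements
```

For instance, `parse_template("<what><do>[the]<c:body structure><include|contain>[?]")` yields six elements, of which the fourth is the variable `c` with type `body structure`.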
(IV) Inference rules based on predicate logic
To express the semantics of multi-entry medical question templates precisely and to define their inference rules, the present invention designs a template logic system. It is a special first-order description logic that takes medical concepts and medical relations as its operands and performs semantic operations on, and semantic interpretation of, multi-entry medical question templates. An inference rule is defined as a logical implication expressed in this template description logic system. The Template Description Logic System (TDLS) applied to multi-entry medical question templates is defined as the following two-tuple:
TDLS ::= (<predicate set>, <operators>)
(1) Predicates declare, identify, and determine the medical concepts and medical relations in a question template. TDLS has three types of predicates: unary, binary, and ternary. A unary predicate declares the category of medical knowledge to which a template variable belongs. A binary predicate declares the semantic relation between two template variables. A ternary predicate declares the domain and range of a medical relation. Unary predicates can also serve as variable type symbols in template structure annotation. Tables 3, 4, and 5 list and explain the unary, binary, and ternary predicates of the template description logic.
Table 3 Unary predicates in the template logic
Table 4 Binary predicates in the template logic
Table 5 Ternary predicates in the template logic
(2) Operators: the present invention extends the conventional operators of first-order predicate logic to increase the expressive power of the template logic. The operators fall into three types: unary, binary, and ternary. Table 6 lists the operators extended for the template logic.
Table 6 Template operators
(V) Specification followed by the medical knowledge base (medical concepts and medical relations) used in the present invention
The medical knowledge base used by the multi-entry medical question templates conforms to the Unified Medical Language System (UMLS) proposed by the U.S. National Library of Medicine; that is, the multi-entry medical question templates are built on UMLS medical concepts and medical relations. UMLS is a semantic network of medical concepts. It contains more than one million medical concepts classified into 133 semantic types, with more than 76 relations established between these types, and is organized as a tree structure.
(VI) Application method of the multi-entry medical question template device
The application method comprises the following steps:
S1. Build the multi-entry medical question template library. The library is the collection of multi-entry medical question templates established in a UMLS-based medical question answering system; it reflects users' interests in and needs for UMLS-based medical knowledge. The library is built as follows:
S11. Collect the set of user questions from the UMLS-based medical question answering system and lemmatize every user question;
S12. Templatize the user questions with the template markers and variable type symbols: mark the core elements, optional elements, and UMLS concept names and relation names, and replace each UMLS concept name, concept name set, and relation name with a variable name and a variable type symbol;
S13. Classify the templatized user questions, gathering questions with similar semantics into the template structure set of one multi-entry medical question template;
S14. Repeat step S13 until every user question in the set has been classified, yielding a multi-entry medical question template library that contains only template structure sets;
S15. According to the different sub-semantics of the template structures, and with reference to the medical concepts and medical relations in the UMLS knowledge base, divide the template structure set of each multi-entry medical question template into groups, forming multiple entry joint structures;
S16. Using the template description logic system (TDLS), and with reference to the medical concepts and medical relations in the UMLS knowledge base, design for each entry joint structure in the library the inference rules and reasoning functions that fit that entry joint structure.
S2. Match the user question against the question templates in the multi-entry medical question template library:
S21. Preprocess the user question to be matched as follows. First, lemmatize the user question and, using the medical knowledge elements of UMLS as the dictionary, mark the medical concepts and relations in it. The marking forms are <medical concept:Concept>, <medical concept set:ConceptSet>, and <medical relation:Relation>, for example <chronic podopompholyx:Concept>. Then mark as core elements the nouns that are not medical concepts and the verbs that are not medical relations, as well as the interrogative words and prepositions. The result is a user question annotated with core elements and UMLS medical knowledge elements, where UMLS medical knowledge elements are the medical concepts, medical concept sets, and relations of UMLS;
S22. Compute the sentence-pattern similarity between the preprocessed user question and each question template in the library in turn, and take the question template with the highest sentence-pattern similarity as the template that matches the user question.
S3. Take the entry joint structure that contains the template structure best matching the user question as the execution entry of the user question in the matched multi-entry question template, and execute the reasoning functions of that entry joint structure to infer and extract the answer.
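The annotation in step S21 can be sketched as a longest-match dictionary lookup over already lemmatized tokens. This is a simplified illustration, not the patented procedure: the function name `annotate`, the toy dictionary, and the `skip` set are hypothetical, and instead of part-of-speech tagging the sketch marks every remaining token as a core element unless it is in `skip`.

```python
from typing import Dict, List, Set


def annotate(tokens: List[str], concepts: Dict[str, str], skip: Set[str]) -> str:
    """Mark dictionary concepts as <name:Type> and other kept tokens as <token>.

    tokens   -- lemmatized words of the user question
    concepts -- lower-cased concept name -> type symbol (toy stand-in for UMLS)
    skip     -- tokens left unmarked in this sketch (articles, punctuation)
    """
    longest = max((len(name.split()) for name in concepts), default=1)
    marked: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase first so multi-word concepts win.
        for span in range(min(longest, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + span])
            if phrase.lower() in concepts:
                marked.append(f"<{phrase}:{concepts[phrase.lower()]}>")
                i += span
                break
        else:
            if tokens[i].lower() not in skip:
                marked.append(f"<{tokens[i].lower()}>")
            i += 1
    return "".join(marked)
```

With the toy dictionary `{"chronic podopompholyx": "Concept"}`, the tokens `what be the symptom of chronic podopompholyx ?` become `<what><be><symptom><of><chronic podopompholyx:Concept>`.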
Further, in step S22, the sentence-pattern similarity match(User, MUTP) between the user question and each multi-entry medical question template is computed by formula (11),
where User denotes the user question; MUTP denotes one multi-entry medical question template in the template library; TSS denotes the set of template structures in MUTP, comprising its main template structure and near-synonymous template structures; TS is any template structure in TSS; and StruSim(User, TS) denotes the structural similarity between the user question User and the template structure TS. The question template with the largest match(User, MUTP) is the one that best matches the user question.
StruSim(User, TS) is computed as follows:
StruSim(User, TS) = VarSim(User, TS) × KeySim(User, TS) (12)
where VarSim(User, TS) denotes the similarity between the variables in the template structure TS and the UMLS medical knowledge elements in the user question User, computed by formulas (13) and (14), and KeySim(User, TS) denotes the similarity between the core elements in TS and the core elements in User, computed by formulas (15) and (16),
where i is any variable in Var, the set of variables in the template structure TS; j is any UMLS medical knowledge element in UE, the set of UMLS medical knowledge elements in the user question User; Type(i) and Type(j) denote the types of UMLS medical knowledge to which i and j belong, and the type condition between them holds when the type of j is the same as the type of i or is contained in the type of i;
m is any core element in Key, the set of core elements in the template structure TS; n is any core element in KE, the set of core elements in the user question User; VS(i) and KS(m) denote the similarity between the user question User and, respectively, the variable i and the core element m of TS; sim(m, n) denotes the normalized word semantic similarity computed from a general-purpose semantic dictionary, meaning a computable, cross-domain dictionary based on a classification structure, such as Princeton University's WordNet or HowNet from the Chinese Academy of Sciences; STH is a similarity threshold, set to 0.85 in this embodiment. sim(m, n) is computed by formula (17),
where LCS(m, n) denotes the nearest common parent node of the core elements m and n in the general-purpose semantic dictionary, depth(LCS(m, n)) denotes the depth of LCS(m, n) in that dictionary, and pathLen(m, n) denotes the length of the shortest path between m and n in that dictionary.
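The matching computation can be sketched as follows. Only formula (12) appears in this text, so everything else in the sketch is an assumption made for illustration: VarSim is taken as the fraction of template variables that have a type-compatible user element, KeySim as the mean over template core elements of the best word similarity (counted as 0 below the threshold STH), and match as the maximum StruSim over the template structures. The type check and the word similarity are passed in as functions, since formulas (13) to (17) are not reproduced here. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

STH = 0.85  # similarity threshold stated for the embodiment


@dataclass
class TemplateStructure:
    var_types: List[str]  # types of the variables in the template structure
    keys: List[str]       # core elements of the template structure


@dataclass
class UserQuestion:
    element_types: List[str]  # types of the annotated medical knowledge elements
    keys: List[str]           # core elements of the user question


TypeCheck = Callable[[str, str], bool]   # compatible(type_j, type_i)
WordSim = Callable[[str, str], float]    # normalized similarity in [0, 1]


def var_sim(user: UserQuestion, ts: TemplateStructure, compatible: TypeCheck) -> float:
    """Assumed form: share of template variables matched by a compatible user element."""
    if not ts.var_types:
        return 1.0  # design choice of this sketch: nothing to match counts as full match
    hits = sum(1 for i in ts.var_types
               if any(compatible(j, i) for j in user.element_types))
    return hits / len(ts.var_types)


def key_sim(user: UserQuestion, ts: TemplateStructure, word_sim: WordSim) -> float:
    """Assumed form: mean best word similarity, zeroed when below STH."""
    if not ts.keys:
        return 1.0
    total = 0.0
    for m in ts.keys:
        best = max((word_sim(m, n) for n in user.keys), default=0.0)
        total += best if best >= STH else 0.0
    return total / len(ts.keys)


def stru_sim(user: UserQuestion, ts: TemplateStructure,
             compatible: TypeCheck, word_sim: WordSim) -> float:
    """Formula (12): product of variable similarity and core-element similarity."""
    return var_sim(user, ts, compatible) * key_sim(user, ts, word_sim)


def match(user: UserQuestion, tss: List[TemplateStructure],
          compatible: TypeCheck, word_sim: WordSim) -> float:
    """Assumed form of match(User, MUTP): best StruSim over the template's structures."""
    return max((stru_sim(user, ts, compatible, word_sim) for ts in tss), default=0.0)
```

Because StruSim is a product, a template structure whose variable types do not fit the question scores 0 regardless of how well its core elements match.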
Further, in step S3, executing the reasoning functions of the entry joint structure comprises the following sub-steps:
S31. First execute the preferred reasoning function bound to the preferred inference rule. If it succeeds, return its answer extraction result and end execution; only if it fails, proceed to step S32;
S32. Execute the first secondary reasoning function; only if it fails, execute the second secondary reasoning function, and so on through all the secondary reasoning functions bound to the template, in order. If every reasoning function fails, return a failure message; otherwise return the answer extraction result of the reasoning function that succeeded and end execution;
A reasoning function fails when it finds no record in the UMLS knowledge base in which the queried concept has the specified semantic relation.
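Sub-steps S31 and S32 amount to an ordered fallback chain. The sketch below assumes a toy knowledge base stored as a dictionary keyed by (concept, relation); the names `make_reasoning_function` and `run_entry` and the dictionary layout are hypothetical, and failure is modelled as an empty answer list.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

ReasoningFunction = Callable[[str], List[str]]
KnowledgeBase = Dict[Tuple[str, str], List[str]]  # (concept, relation) -> answers


def make_reasoning_function(kb: KnowledgeBase, relation: str) -> ReasoningFunction:
    """Bind a reasoning function to one semantic relation of the toy knowledge base."""
    def reason(concept: str) -> List[str]:
        # An empty list means no record with this relation exists for the concept.
        return kb.get((concept, relation), [])
    return reason


def run_entry(concept: str,
              preferred: ReasoningFunction,
              secondary: Sequence[ReasoningFunction]) -> Optional[List[str]]:
    """Try the preferred function, then each secondary one in order (S31, S32)."""
    for function in (preferred, *secondary):
        answers = function(concept)
        if answers:
            return answers  # first success ends execution
    return None  # every reasoning function failed
```

Returning `None` stands in for the failure message of step S32; a caller would translate it into whatever the system reports to the user.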
(VII) Specific examples of multi-entry medical question templates
In this example, the definitions of two actual multi-entry medical question templates illustrate the structure of the proposed template and how it is defined.
Example 1.
Question template 1: asking what parts a concept includes
Relations involved: part of (id=123005000)
is-a (id=116680003)
<main template structure> ::= <what><do>[the]<c:body structure><include|contain>[?]
<near-synonymous template structures> ::= {<what><category|type><of><c:Concept><be><include|contain>[?],
<what><be>[the]<category><of><c:Concept>[?],
<what><be>[the]<part><of><c:body structure>[?]}
<entry joint structure 1> ::= (
<synonymous template structures> ::= {<what><do>[the]<c:body structure><include|contain>[?],
<what><be>[the]<part><of><c:body structure>[?]}
<preferred reasoning function> ::= Reasoning_function1(c,pa)
<secondary reasoning function> ::= Reasoning_function1(c,ia)
)
<entry joint structure 2> ::= (
<synonymous template structures> ::= {<what><category|type><of><c:Concept><be><include|contain>[?],
<what><be>[the]<category|type><of><c:Concept>[?]}
<preferred function call> ::= Reasoning_function1(c,ia)
<secondary function call> ::= Reasoning_function1(c,pa)
)
<example sentences>: What does the Entire facial bone include?
What are the types of Colds?
Example 2.
Question template 2: asking about the symptoms of a disease
Relations involved: Associated morphology (id=116676008)
Associated with (id=47429007)
Due to (id=42752001)
<main template structure> ::= <what><be>[the]<obvious><indication><for|of><c:disease>[?]
<near-synonymous template structures> ::= {<what><do><c:disease><relate|connect|associate><to>[?],
<what><be>[the]<obvious><indication><of|for><c:disease>[?],
<what><be>[the]<obvious><indication><for|of><patients><with><c:disease>[?],
<what><be>[the]<obvious><symptom><for|of><c:disease>[?],
<what><cause>[the]<disease><of><c:disease>[?],
<What><do><c:Concept><relate|connect|associate>to[?]}
<entry joint structure 1> ::= (
<synonymous template structures> ::= {<what><be>[the]<obvious><indication><for|of><c:Concept>[?],
<what><be>[the]<obvious><indication><for|of><patient><with><c:Concept>[?],
<what><be>[the]<obvious><indication><of|for><c:Concept>[?],
<what><be>[the]<obvious><symptom><for|of><c:Concept>[?]}
<preferred reasoning function> ::= Reasoning_function2(c,am)
<secondary reasoning function 1> ::= Reasoning_function2(c,aw)
<secondary reasoning function 2> ::= Reasoning_function2(c,dt)
)
<entry joint structure 2> ::= (
<synonymous template structures> ::= <what><cause>[the]<disease><of><c:disease>[?]
<preferred reasoning function> ::= Reasoning_function2(c,dt)
<secondary reasoning function 1> ::= Reasoning_function2(c,aw)
<secondary reasoning function 2> ::= Reasoning_function2(c,am)
)
<entry joint structure 3> ::= (
<synonymous template structures> ::= <what><do><c:disease><relate|connect|associate><to>[?]
<preferred reasoning function> ::= Reasoning_function2(c,aw)
<secondary reasoning function 1> ::= Reasoning_function2(c,am)
<secondary reasoning function 2> ::= Reasoning_function2(c,dt)
)
<example sentences>: What are the symptoms for Staphylococcus epidermidis meningitis?
What causes the disease of Diabetic maculopathy?
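The three entry joint structures of Example 2 differ only in the order in which the same three relations are tried. The sketch below records that ordering as data. It assumes that the arguments `am`, `aw`, and `dt` abbreviate Associated morphology, Associated with, and Due to; the description does not spell this out, and the names `RELATION_IDS`, `ENTRY_RELATION_ORDER`, and `relation_ids_for` are hypothetical. One representative template structure stands in for each entry joint structure.

```python
from typing import Dict, List

# Assumed reading of the abbreviations used in Reasoning_function2(c, ...).
RELATION_IDS: Dict[str, int] = {
    "am": 116676008,  # Associated morphology
    "aw": 47429007,   # Associated with
    "dt": 42752001,   # Due to
}

# Preferred relation first, then the secondary relations in order.
ENTRY_RELATION_ORDER: Dict[str, List[str]] = {
    "<what><be>[the]<obvious><symptom><for|of><c:Concept>[?]": ["am", "aw", "dt"],
    "<what><cause>[the]<disease><of><c:disease>[?]": ["dt", "aw", "am"],
    "<what><do><c:disease><relate|connect|associate><to>[?]": ["aw", "am", "dt"],
}


def relation_ids_for(template_structure: str) -> List[int]:
    """Return the relation ids to try, in order, for a matched template structure."""
    return [RELATION_IDS[code] for code in ENTRY_RELATION_ORDER[template_structure]]
```

A question matched to the "what causes" structure would therefore try Due to first and fall back to Associated with and then Associated morphology.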
(VIII) Application example of the multi-entry medical question template
In this example, two SNOMED-CT-based medical question answering systems demonstrate the effect of applying the proposed multi-entry medical question template. SNOMED-CT (Systematized Nomenclature of Medicine Clinical Terms) is a concept-based, structured, comprehensive clinical terminology developed and maintained by the International Health Terminology Standards Development Organization (IHTSDO). UMLS concepts come from more than one hundred source vocabularies, of which SNOMED-CT is the most important; the SNOMED-CT ontology is therefore the most important subset of UMLS.
The SNOMED-CT medical ontology is a concept hierarchy with one special root concept that subsumes all concept hierarchies in SNOMED-CT. Concepts connected directly to the root concept through the "is-a" relation are called top-level concepts, and every other concept is connected to at least one top-level concept through "is-a" relations. The top-level concepts have different meanings in SNOMED-CT; their IDs and meanings are listed in Table 7:
Table 7 Top-level concepts in SNOMED-CT
The medical semantic relations that SNOMED-CT concepts can form are listed in Table 8.
Table 8 Medical semantic relations in SNOMED-CT
SNOMED-CT is a very large medical knowledge base; it currently contains more than 440,000 active medical concepts and more than 1 million active medical relation records. Using SNOMED-CT as the knowledge base, this example implements two medical question answering systems: one with conventional single-entry, single-execution-route medical question templates and one with the multi-entry, multi-execution-route medical question templates proposed in the present invention. Comparing the two verifies the advantage of the proposed templates in a medical question answering system. The experimental results are shown in Table 9:
Table 9 Performance comparison of two medical question answering systems based on the SNOMED-CT ontology
where user satisfaction is computed as follows:
Table 9 shows that question answering system 2, based on the templates of the present invention, reduces the number of question templates from the 150 of question answering system 1, based on conventional templates, to 60, which improves the efficiency of template design. This is mainly because a multi-entry medical template gathers question structures with synonymous or near-synonymous question semantics into one template and handles them separately through multiple execution routes. Table 9 also shows that system 2 raises user satisfaction from the 78% of system 1 to 85%, indicating that the multi-entry medical template can improve the performance of a medical question answering system. This is mainly because a multi-entry medical template binds several reasoning functions to one matched question structure, and each reasoning function tries to infer the answer through one particular semantic relation. When the queried concept lacks the semantic relation required by the preferred reasoning function, the system tries the remaining reasoning functions, which widens the range over which an answer can be inferred.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910450711.1A CN110188170B (en) | 2019-05-28 | 2019-05-28 | Multi-entry medical question template device and method thereof |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910450711.1A CN110188170B (en) | 2019-05-28 | 2019-05-28 | Multi-entry medical question template device and method thereof |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110188170A CN110188170A (en) | 2019-08-30 |
| CN110188170B true CN110188170B (en) | 2023-05-09 |
Family
ID=67718315
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910450711.1A Active CN110188170B (en) | 2019-05-28 | 2019-05-28 | Multi-entry medical question template device and method thereof |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110188170B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110888969A (en) * | 2019-11-27 | 2020-03-17 | 华为技术有限公司 | Dialog response method and device |
| CN113689176B (en) * | 2021-07-15 | 2024-08-02 | 东风汽车集团股份有限公司 | Method and system for establishing vehicle function safety management flow |
| CN115034204B (en) * | 2022-05-12 | 2023-05-23 | 浙江大学 | Method for generating structured medical text, computer device and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6584464B1 (en) * | 1999-03-19 | 2003-06-24 | Ask Jeeves, Inc. | Grammar template query system |
| CN104361127A (en) * | 2014-12-05 | 2015-02-18 | 广西师范大学 | Multilanguage question and answer interface fast constituting method based on domain ontology and template logics |
| CN105868313A (en) * | 2016-03-25 | 2016-08-17 | 浙江大学 | Mapping knowledge domain questioning and answering system and method based on template matching technique |
| CN105912846A (en) * | 2016-04-07 | 2016-08-31 | 南京小网科技有限责任公司 | Intelligent medical aid decision making system on basis of cloud computing technique and medical knowledge base technique |
| CN108021703A (en) * | 2017-12-26 | 2018-05-11 | 广西师范大学 | A kind of talk formula intelligent tutoring system |
| CN109065139A (en) * | 2018-09-10 | 2018-12-21 | 平安科技(深圳)有限公司 | Medical follow up method, apparatus, computer equipment and storage medium |
| CN109492077A (en) * | 2018-09-29 | 2019-03-19 | 北明智通(北京)科技有限公司 | The petrochemical field answering method and system of knowledge based map |
Non-Patent Citations (3)
| Title |
|---|
| A Biomedical Question Answering System Based on SNOMED-CT;Xinhua Zhu 等;《International Conference on Knowledge Science, Engineering and Management(KSEM 2018)》;20180812;第16-21页 * |
| Keyword question answering system with report generation for linked data;Sangdo Han 等;《2015 International Conference on Big Data and Smart Computing (BIGCOMP)》;20150402;第23-24页 * |
| 基于本体的自动答疑系统的研究与实现;刘汉兴 等;《计算机应用》;20100201;第30卷(第2期);第415-416页 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110188170A (en) | 2019-08-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Saha et al. | ATHENA: an ontology-driven system for natural language querying over relational data stores | |
| Lee et al. | XClust: clustering XML schemas for effective integration | |
| US9715493B2 (en) | Method and system for monitoring social media and analyzing text to automate classification of user posts using a facet based relevance assessment model | |
| Alicante et al. | Unsupervised entity and relation extraction from clinical records in Italian | |
| Sawant et al. | Neural architecture for question answering using a knowledge graph and web corpus | |
| CN104361127A (en) | Multilanguage question and answer interface fast constituting method based on domain ontology and template logics | |
| CN115114420A (en) | A knowledge graph question answering method, terminal device and storage medium | |
| CN110188170B (en) | Multi-entry medical question template device and method thereof | |
| CN106649597A (en) | Method for automatically establishing back-of-book indexes of book based on book contents | |
| CN101446942A (en) | Semantic role labeling method for natural language sentences | |
| CN105335487A (en) | Agricultural specialist information retrieval system and method on basis of agricultural technology information ontology library | |
| CN104750819A (en) | A Biomedical Literature Retrieval Method and System Based on Word Grouping Algorithm | |
| CN105550189A (en) | Ontology-based intelligent retrieval system for information security event | |
| CN113076411A (en) | Medical query expansion method based on knowledge graph | |
| CN112651234B (en) | A method and device for semi-open information extraction | |
| CN101398858A (en) | Web service semantic extraction method based on ontology learning | |
| CN107818081A (en) | Sentence similarity evaluation method based on deep semantic model and semantic role labeling | |
| Assi et al. | Data linking over RDF knowledge graphs: A survey | |
| CN104699695A (en) | Relation extraction method based on multi-feature semantic tree kernel and information retrieving method | |
| Krishnamurthy et al. | Which noun phrases denote which concepts? | |
| Menad et al. | BioSTransformers for Biomedical Ontologies Alignment. | |
| CN112883172B (en) | Biomedical question-answering method based on dual knowledge selection | |
| Dai et al. | Entity disambiguation using a markov-logic network | |
| CN110188169A (en) | Knowledge matching method, system and device based on simplified labels | |
| Ivanova | Cross-lingual and multilingual ontology mapping-survey | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20230331; Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province; Applicant after: Yami Technology (Guangzhou) Co.,Ltd.; Address before: 541004 No. 15 Yucai Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region; Applicant before: Guangxi Normal University |
| | TA01 | Transfer of patent application right | Effective date of registration: 20230331; Address after: 156 Shop, Fukang Road, Xintiandi Commercial Plaza, Sucheng District, Suqian City, Jiangsu Province, 223800; Applicant after: Ding Yuehui; Address before: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province; Applicant before: Yami Technology (Guangzhou) Co.,Ltd. |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 20241126; Address after: Innovation Space N-1, N-3, 5th Floor, Suning Square, Sucheng District, Suqian City, Jiangsu Province, 223800; Patentee after: Jiangsu Weiyao Information Technology Co.,Ltd.; Country or region after: China; Address before: 156 Shop, Fukang Road, Xintiandi Commercial Plaza, Sucheng District, Suqian City, Jiangsu Province, 223800; Patentee before: Ding Yuehui; Country or region before: China |