CN117373592A - Medical record diagnosis write omission detection method, device, equipment and readable storage medium - Google Patents
- Publication number
- CN117373592A (application number CN202311322233.9A)
- Authority
- CN
- China
- Prior art keywords
- diagnosis
- medical record
- candidate
- record text
- missed
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Primary Health Care (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Public Health (AREA)
- Computational Linguistics (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Description
Technical Field
The present application relates to the technical field of natural language processing, and more specifically to a method, device, equipment and readable storage medium for detecting omitted diagnoses in medical records.
Background Art
Recording the diagnoses in a medical record is a crucial step in medical practice. Complete and accurate diagnoses help reduce the risk of medical disputes. They are also of great significance for the later analysis of diagnosis and treatment behavior, for optimizing resource allocation and for improving the quality of medical services. It is therefore particularly necessary to detect diagnoses that have been omitted from medical records.
At present, omitted diagnoses are generally detected manually: doctors or hospital medical record quality inspectors review the records and mark diagnoses that may have been omitted. However, real-world medical records are written in highly varied ways. Manual detection therefore involves a huge workload and low efficiency, and the accuracy of the results depends heavily on the expertise of the inspectors. To address the low efficiency and accuracy of manual detection, the prior art also includes omitted-diagnosis detection techniques based on deep learning models, rule bases or knowledge bases. These techniques, however, decide whether a diagnosis has been omitted only from the semantic similarity between diagnoses. In practice, two diagnoses that are not semantically similar may still amount to the same diagnosis, so the detection accuracy is low.
Therefore, providing a method for detecting omitted diagnoses in medical records that improves the accuracy of the detection results has become a technical problem that those skilled in the art urgently need to solve.
Summary of the Invention
In view of the above problems, the present application proposes a method, device, equipment and readable storage medium for detecting omitted diagnoses in medical records. The specific solution is as follows:
A method for detecting omitted diagnoses in a medical record, the method comprising:
determining a medical record text to be checked;
determining candidate omitted diagnoses corresponding to the medical record text;
determining target candidate omitted diagnoses among the candidate omitted diagnoses, where a target candidate omitted diagnosis is either a candidate that is semantically consistent with a written diagnosis in the medical record text or a candidate that is determined to be a symptom of a written diagnosis; and
determining the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
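As a rough illustration of the claimed filtering logic, the following Python sketch removes every candidate that is either semantically consistent with, or a symptom of, some written diagnosis. The function name and the two predicate callables (`is_semantically_consistent`, `is_symptom_of`) are hypothetical placeholders. The application does not specify how either judgment is implemented.

```python
from typing import Callable, Iterable


def detect_omitted_diagnoses(
    candidates: Iterable[str],
    written: Iterable[str],
    is_semantically_consistent: Callable[[str, str], bool],
    is_symptom_of: Callable[[str, str], bool],
) -> list[str]:
    """Return the candidates left after removing the target candidates."""
    written = list(written)
    omitted = []
    for cand in candidates:
        # A candidate is a target (not an omission) if it matches any written
        # diagnosis semantically or is judged to be a symptom of one.
        is_target = any(
            is_semantically_consistent(cand, w) or is_symptom_of(cand, w)
            for w in written
        )
        if not is_target:
            omitted.append(cand)
    return omitted
```

Passing the two judgments in as callables keeps the set logic separate from whatever semantic model or symptom table is actually used.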
Optionally, determining the candidate omitted diagnoses corresponding to the medical record text comprises:
generating prompt information corresponding to the medical record text, the prompt information instructing that the candidate omitted diagnoses corresponding to the medical record text be determined;
calling a generative model and inputting the prompt information corresponding to the medical record text into the generative model; and
determining the candidate omitted diagnoses corresponding to the medical record text based on the reply of the generative model.
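The following is a minimal sketch of calling a generative model and turning its reply into candidate diagnoses. It assumes, hypothetically, that the model answers with one diagnosis per line, optionally numbered or bulleted. `generate` stands in for whatever model client is used; no particular model API is implied.

```python
import re
from typing import Callable

# Leading list markers such as "1.", "2)", "-" or "*" (assumed reply format).
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*])\s*")


def candidates_from_model(prompt: str, generate: Callable[[str], str]) -> list[str]:
    """Send the prompt to a generative model and parse its reply into diagnoses."""
    reply = generate(prompt)  # hypothetical model call: prompt text in, reply text out
    candidates: list[str] = []
    for line in reply.splitlines():
        item = _MARKER.sub("", line).strip()
        if item and item not in candidates:  # skip blanks and repeats, keep order
            candidates.append(item)
    return candidates
```

If the model is instructed to answer in a structured format such as JSON, the parsing step would change accordingly.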
Optionally, generating the prompt information corresponding to the medical record text comprises:
segmenting the medical record text to obtain a plurality of medical record text segments; and
for each medical record text segment, generating prompt information corresponding to that segment, the prompt information instructing that the candidate omitted diagnoses corresponding to the segment be determined. The prompt information for all segments, taken together, constitutes the prompt information corresponding to the medical record text.
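The segmentation rule is not specified here. The sketch below assumes a simple line-based packing: non-empty lines are accumulated into segments of at most `max_chars` characters, and one prompt is built per segment. `PROMPT_TEMPLATE`, `split_record` and `build_prompts` are hypothetical names, and the prompt wording is only an example.

```python
# Hypothetical prompt wording; the application does not publish its template.
PROMPT_TEMPLATE = (
    "Read the following excerpt of a medical record and list every diagnosis "
    "the patient may have, one per line.\n\nExcerpt:\n{segment}"
)


def split_record(text: str, max_chars: int = 500) -> list[str]:
    """Pack non-empty lines into segments of at most max_chars characters."""
    segments: list[str] = []
    current = ""
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        if current and len(current) + 1 + len(line) > max_chars:
            segments.append(current)  # current segment is full; start a new one
            current = line
        else:
            current = f"{current}\n{line}" if current else line
    if current:
        segments.append(current)
    return segments


def build_prompts(text: str, max_chars: int = 500) -> list[str]:
    """One prompt per segment; together they form the prompt for the record."""
    return [PROMPT_TEMPLATE.format(segment=seg) for seg in split_record(text, max_chars)]
```

A single line longer than `max_chars` is kept as its own segment rather than being cut.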
Optionally, the generative model is obtained by fine-tuning an existing medical-domain generative model, and the fine-tuning comprises:
obtaining a training medical record text, the diagnoses corresponding to the training medical record text, and labels for each diagnosis in the training medical record text. The labels of each diagnosis comprise a first label, a second label and a third label: the first label indicates whether the diagnosis is confirmed, the second label indicates the diagnosis type, and the third label indicates the recall support segment of the diagnosis, i.e. the passage of the record that supports it;
generating a plurality of training corpora based on the training medical record text, its corresponding diagnoses and the labels of each diagnosis, each training corpus comprising an input text and an expected output text; and
inputting the input text into the existing medical-domain generative model and fine-tuning that model so that its output approaches the expected output text corresponding to the input text.
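One possible way to turn a labelled training record into an input/expected-output pair is sketched below. The output layout and the instruction wording are assumptions made for illustration. Each diagnosis takes one line, with name, confirmation status, type and support segment separated by ` | `. The fine-tuning loop itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class DiagnosisLabel:
    name: str
    confirmed: bool        # first label: whether the diagnosis is confirmed
    diagnosis_type: str    # second label: diagnosis type
    support_segment: str   # third label: recall support segment in the record


def build_training_sample(record_text: str, labels: list[DiagnosisLabel]) -> dict[str, str]:
    """Build one (input text, expected output text) training corpus entry."""
    input_text = (
        "List the diagnoses supported by the following medical record, with "
        "confirmation status, type and supporting text.\n\n" + record_text
    )
    expected_lines = [
        f"{d.name} | {'confirmed' if d.confirmed else 'unconfirmed'} | "
        f"{d.diagnosis_type} | {d.support_segment}"
        for d in labels
    ]
    return {"input": input_text, "expected_output": "\n".join(expected_lines)}
```

A real pipeline might derive several corpora per record, for example one per text segment. This sketch builds one.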
Optionally, determining the target candidate omitted diagnoses among the candidate omitted diagnoses comprises:
determining a plurality of diagnosis pairs based on the candidate omitted diagnoses and the written diagnoses, each diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
for each diagnosis pair, judging whether the two diagnoses in the pair are semantically consistent;
if the two diagnoses in the pair are semantically consistent, determining the candidate omitted diagnosis in the pair as a target candidate omitted diagnosis;
if the two diagnoses in the pair are not semantically consistent, comparing the pair with each diagnosis-symptom pair in a preset diagnosis-symptom pair set; and
if the diagnosis pair matches a diagnosis-symptom pair in the set, determining the candidate omitted diagnosis in the pair as a target candidate omitted diagnosis.
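The pair-based judgment can be sketched as follows. The sketch assumes the diagnosis-symptom pair set is stored as `(candidate, written)` tuples, meaning "the candidate is a symptom of the written diagnosis". The `is_consistent` callable is a hypothetical stand-in for the semantic-consistency judgment.

```python
from itertools import product
from typing import Callable, Iterable


def find_target_candidates(
    candidates: Iterable[str],
    written: Iterable[str],
    is_consistent: Callable[[str, str], bool],
    symptom_pairs: set[tuple[str, str]],
) -> set[str]:
    """Return candidates excluded by semantic consistency or a symptom pair.

    symptom_pairs holds (symptom, diagnosis) tuples, compared against
    (candidate, written) pairs.
    """
    targets: set[str] = set()
    for cand, wrote in product(candidates, list(written)):
        if is_consistent(cand, wrote):
            targets.add(cand)           # semantically the same diagnosis
        elif (cand, wrote) in symptom_pairs:
            targets.add(cand)           # candidate is a symptom of the written one
    return targets
```

For instance, suppose the set contains ("fever", "community-acquired pneumonia") and "community-acquired pneumonia" is a written diagnosis. The candidate "fever" is then treated as a target candidate and is not reported as omitted, even though the two terms are not semantically consistent.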
Optionally, determining the plurality of diagnosis pairs based on the candidate omitted diagnoses and the written diagnoses comprises:
normalizing the candidate omitted diagnoses to obtain normalized candidate omitted diagnoses;
normalizing the written diagnoses to obtain normalized written diagnoses; and
determining the plurality of diagnosis pairs based on the normalized candidate omitted diagnoses and the normalized written diagnoses.
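Normalization is not defined in this section. A common approach is to map surface forms to standard terms. The sketch below assumes a small hypothetical synonym table (`SYNONYMS`) plus case and whitespace folding. It normalizes each side before forming every candidate-written combination.

```python
from itertools import product

# Hypothetical synonym table mapping surface forms to a standard term.
SYNONYMS = {
    "htn": "hypertension",
    "high blood pressure": "hypertension",
    "t2dm": "type 2 diabetes mellitus",
}


def normalize(diagnosis: str) -> str:
    """Fold case and whitespace, then map known synonyms to a standard term."""
    key = " ".join(diagnosis.lower().split())
    return SYNONYMS.get(key, key)


def pairs_normalize_first(candidates, written) -> list[tuple[str, str]]:
    """Normalize each side first, then pair every candidate with every written diagnosis."""
    norm_candidates = sorted({normalize(c) for c in candidates})
    norm_written = sorted({normalize(w) for w in written})
    return list(product(norm_candidates, norm_written))
```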
Optionally, determining the plurality of diagnosis pairs based on the candidate omitted diagnoses and the written diagnoses comprises:
combining the candidate omitted diagnoses and the written diagnoses pairwise to form candidate diagnosis pairs, each candidate diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
normalizing each candidate diagnosis pair to obtain normalized candidate diagnosis pairs; and
de-duplicating the normalized candidate diagnosis pairs to determine the plurality of diagnosis pairs.
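The alternative ordering pairs first, then normalizes and de-duplicates, and can be sketched as below. The `normalize` callable is a hypothetical placeholder for any normalizer. `dict.fromkeys` drops duplicate pairs while keeping their first-seen order.

```python
from typing import Callable, Iterable


def pairs_pair_first(
    candidates: Iterable[str],
    written: Iterable[str],
    normalize: Callable[[str], str],
) -> list[tuple[str, str]]:
    written = list(written)
    # Step 1: combine every candidate with every written diagnosis.
    raw_pairs = [(c, w) for c in candidates for w in written]
    # Step 2: normalize both members of each pair.
    normalized = [(normalize(c), normalize(w)) for c, w in raw_pairs]
    # Step 3: de-duplicate, keeping the first occurrence of each pair.
    return list(dict.fromkeys(normalized))
```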
Optionally, the preset diagnosis-symptom pair set is determined by:
obtaining reference medical record texts, the candidate omitted diagnoses corresponding to the reference medical record texts, and the written diagnoses corresponding to the reference medical record texts;
combining the written diagnoses of the reference medical record texts pairwise to obtain a first diagnosis pair set; and combining the candidate omitted diagnoses with the written diagnoses pairwise, and the candidate omitted diagnoses with one another pairwise, to obtain a second diagnosis pair set;
calculating the co-occurrence probability of each first diagnosis pair in the first diagnosis pair set; and
determining the diagnosis-symptom pair set from the first diagnosis pair set and the second diagnosis pair set based on the co-occurrence probabilities of the first diagnosis pairs.
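This section does not give the exact co-occurrence formula or the rule for choosing diagnosis-symptom pairs, so the sketch below is purely hypothetical. It assumes that the co-occurrence probability of a first-set pair is the share of reference records in which both diagnoses were written together. It keeps second-set pairs that appear in at least `min_support` records but are almost never written together. The reasoning is that a symptom is usually dropped from the written list once the diagnosis explaining it is made. The thresholds are arbitrary.

```python
from collections import Counter
from itertools import combinations, product


def build_symptom_pair_set(references, max_cowritten_prob=0.01, min_support=2):
    """references: list of (written_diagnoses, candidate_diagnoses), one per record.

    Returns (selected_pairs, cooccurrence_prob). Pairs are frozensets because this
    step alone does not say which member is the symptom.
    """
    n_records = len(references)
    first_counts = Counter()   # first set: written x written pairs
    second_counts = Counter()  # second set: candidate x written and candidate x candidate
    for written, candidates in references:
        written, candidates = set(written), set(candidates)
        first_counts.update(frozenset(p) for p in combinations(sorted(written), 2))
        record_pairs = {frozenset((c, w)) for c, w in product(candidates, written) if c != w}
        record_pairs |= {frozenset(p) for p in combinations(sorted(candidates), 2)}
        second_counts.update(record_pairs)  # each pair counted at most once per record
    # Assumed definition: share of reference records in which both members of a
    # first-set pair were written together.
    cooccurrence_prob = {p: n / n_records for p, n in first_counts.items()}
    # Hypothetical selection rule: frequent in the second set, rarely co-written.
    selected = {
        p for p, support in second_counts.items()
        if support >= min_support and cooccurrence_prob.get(p, 0.0) <= max_cowritten_prob
    }
    return selected, cooccurrence_prob
```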
A device for detecting omitted diagnoses in a medical record, the device comprising:
a medical record text determination unit, configured to determine a medical record text to be checked;
a candidate omitted diagnosis determination unit, configured to determine candidate omitted diagnoses corresponding to the medical record text;
a target candidate omitted diagnosis determination unit, configured to determine target candidate omitted diagnoses among the candidate omitted diagnoses, where a target candidate omitted diagnosis is either a candidate that is semantically consistent with a written diagnosis in the medical record text or a candidate that is determined to be a symptom of a written diagnosis; and
an omitted diagnosis determination unit, configured to determine the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
Optionally, the candidate omitted diagnosis determination unit comprises:
a prompt information generation subunit, configured to generate prompt information corresponding to the medical record text, the prompt information instructing that the candidate omitted diagnoses corresponding to the medical record text be determined;
a generative model calling subunit, configured to call a generative model and input the prompt information corresponding to the medical record text into the generative model; and
a candidate omitted diagnosis determination subunit, configured to determine the candidate omitted diagnoses corresponding to the medical record text based on the reply of the generative model.
Optionally, the prompt information generation subunit is specifically configured to:
segment the medical record text to obtain a plurality of medical record text segments; and
for each medical record text segment, generate prompt information corresponding to that segment, the prompt information instructing that the candidate omitted diagnoses corresponding to the segment be determined. The prompt information for all segments, taken together, constitutes the prompt information corresponding to the medical record text.
Optionally, the generative model is obtained by fine-tuning an existing medical-domain generative model, and the device comprises a fine-tuning unit specifically configured to:
obtain a training medical record text, the diagnoses corresponding to it, and labels for each diagnosis in it. The labels of each diagnosis comprise a first label indicating whether the diagnosis is confirmed, a second label indicating the diagnosis type, and a third label indicating the recall support segment of the diagnosis;
generate a plurality of training corpora based on the training medical record text, its corresponding diagnoses and the labels of each diagnosis, each training corpus comprising an input text and an expected output text; and
input the input text into the existing medical-domain generative model and fine-tune that model so that its output approaches the expected output text corresponding to the input text.
Optionally, the target candidate omitted diagnosis determination unit comprises:
a diagnosis pair determination unit, configured to determine a plurality of diagnosis pairs based on the candidate omitted diagnoses and the written diagnoses, each diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
a judging unit, configured to judge, for each diagnosis pair, whether the two diagnoses in the pair are semantically consistent;
a first processing unit, configured to determine the candidate omitted diagnosis in the pair as a target candidate omitted diagnosis if the two diagnoses in the pair are semantically consistent;
a comparison unit, configured to compare the diagnosis pair with each diagnosis-symptom pair in a preset diagnosis-symptom pair set if the two diagnoses in the pair are not semantically consistent; and
a second processing unit, configured to determine the candidate omitted diagnosis in the pair as a target candidate omitted diagnosis if the diagnosis pair matches a diagnosis-symptom pair in the set.
Optionally, the diagnosis pair determination unit is specifically configured to:
normalize the candidate omitted diagnoses to obtain normalized candidate omitted diagnoses;
normalize the written diagnoses to obtain normalized written diagnoses; and
determine the plurality of diagnosis pairs based on the normalized candidate omitted diagnoses and the normalized written diagnoses.
Optionally, the diagnosis pair determination unit is specifically configured to:
combine the candidate omitted diagnoses and the written diagnoses pairwise to form candidate diagnosis pairs, each candidate diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
normalize each candidate diagnosis pair to obtain normalized candidate diagnosis pairs; and
de-duplicate the normalized candidate diagnosis pairs to determine the plurality of diagnosis pairs.
Optionally, the device further comprises a diagnosis-symptom pair set determination unit, specifically configured to:
obtain reference medical record texts, the candidate omitted diagnoses corresponding to them, and the written diagnoses corresponding to them;
combine the written diagnoses of the reference medical record texts pairwise to obtain a first diagnosis pair set; and combine the candidate omitted diagnoses with the written diagnoses pairwise, and the candidate omitted diagnoses with one another pairwise, to obtain a second diagnosis pair set;
calculate the co-occurrence probability of each first diagnosis pair in the first diagnosis pair set; and
determine the diagnosis-symptom pair set from the first diagnosis pair set and the second diagnosis pair set based on the co-occurrence probabilities of the first diagnosis pairs.
Equipment for detecting omitted diagnoses in medical records, comprising a memory and a processor, wherein:
the memory is configured to store a program; and
the processor is configured to execute the program to implement the steps of the method for detecting omitted diagnoses in medical records described above.
A readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for detecting omitted diagnoses in medical records described above.
Through the above technical solution, the present application discloses a method, device, equipment and readable storage medium for detecting omitted diagnoses in medical records. The solution works in three steps. After the medical record text to be checked is determined, the candidate omitted diagnoses corresponding to the text are determined first. Target candidate omitted diagnoses are then identified among them. These are the candidates that are semantically consistent with a written diagnosis in the record, and the candidates determined to be symptoms of a written diagnosis. Finally, the candidates other than the target candidates are determined as omitted diagnoses. The solution therefore detects omitted diagnoses from two aspects: semantic similarity between diagnoses and symptom determination. A candidate that is a symptom of a written diagnosis is not reported as omitted even when the two are not semantically consistent, which improves the accuracy of omitted-diagnosis detection.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be considered as limiting the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 is a flow chart of a method for detecting omitted diagnoses in medical records disclosed in an embodiment of the present application;
FIG. 2 is a flow chart of a method for determining candidate omitted diagnoses corresponding to a medical record text using a generative large model, disclosed in an embodiment of the present application;
FIG. 3 is a flow chart of a method for fine-tuning an existing medical-domain generative model disclosed in an embodiment of the present application;
FIG. 4 is a flow chart of a method for determining target candidate omitted diagnoses among the candidate omitted diagnoses disclosed in an embodiment of the present application;
FIG. 5 is a flow chart of a method for determining a preset diagnosis-symptom pair set disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a device for detecting omitted diagnoses in medical records disclosed in an embodiment of the present application;
FIG. 7 is a hardware structure block diagram of equipment for detecting omitted diagnoses in medical records disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
The method for detecting omitted diagnoses in medical records provided by the present application is introduced through the following embodiments.
Referring to FIG. 1, which is a flow chart of a method for detecting omitted diagnoses in medical records disclosed in an embodiment of the present application, the method may comprise the following steps:
Step S101: Determine the medical record text to be checked.
In the present application, the medical record text is the text of a patient's medical record. It may be an electronic medical record entered from the patient's own account and the doctor's questioning. It may also be text obtained by optical character recognition (OCR) of an image produced by scanning or photographing a handwritten paper record. The present application does not limit this.
The medical record text to be checked is the text on which omitted-diagnosis detection is to be performed. It may be the original record text, or text obtained by preprocessing the original record. Preprocessing includes desensitization and information filtering. Specifically, these may consist of deleting fields such as the names of the patient or physician, address, admission time, bed number and differential diagnosis from the original text. The present application does not limit this.
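The following is a minimal preprocessing sketch, assuming the original record is available as a mapping from field names to text. The keys in `SENSITIVE_FIELDS` are hypothetical English names for the fields listed above (patient or physician name, address, admission time, bed number, differential diagnosis). A real record schema will differ.

```python
# Hypothetical field keys for the fields removed during desensitization/filtering.
SENSITIVE_FIELDS = frozenset({
    "patient_name", "physician_name", "address",
    "admission_time", "bed_number", "differential_diagnosis",
})


def preprocess_record(record: dict[str, str], drop=SENSITIVE_FIELDS) -> str:
    """Drop sensitive/filtered fields and render the rest as 'field: value' lines."""
    kept = {key: value for key, value in record.items() if key not in drop}
    return "\n".join(f"{key}: {value}" for key, value in kept.items())
```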
Step S102: Determine the candidate omitted diagnoses corresponding to the medical record text.
A medical record text contains extensive descriptions of the patient's condition and of the examinations and treatments the patient has received. It also contains the conclusions reached by the doctor's diagnosis. The descriptions of the condition, examinations and treatments can indirectly indicate diseases the patient may have, while the doctor's diagnostic conclusions directly state the diseases the patient has. In the present application, the candidate omitted diagnoses corresponding to a medical record text are diseases that the patient described by the record may have. A candidate omitted diagnosis may or may not be stated explicitly in the record text; the present application does not limit this.
In the present application, the candidate omitted diagnoses corresponding to the medical record text can be determined in several ways. In one implementation, disease names are treated as named entities and extracted from the record text by named entity recognition (NER). Alternatively, information on the diagnostic criteria for diseases or on disease-specific drugs can be collected in advance. Examination items or drug names are then treated as named entities and recognized in the record text. The recognized items are mapped to diseases through the collected diagnostic-criteria or specific-drug information, which yields the candidate omitted diagnoses. A further option is to annotate in advance the diseases contained in sample medical record texts and use them to train a deep learning model. The model then learns the mapping between record text and diseases, and outputs the diseases contained in an input record text as candidate omitted diagnoses. In addition, generative models such as ChatGPT, PaLM (Pathways Language Model), Huawei's Pangu large model and iFLYTEK's Spark cognitive large model have strong natural language understanding capabilities. A generative large model can therefore also be used to determine the candidate omitted diagnoses. The present application does not limit this.
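To illustrate the dictionary-based route only, the sketch below matches disease names directly and maps drug names to diseases through a lookup table. Both tables are hypothetical toy entries, not clinical rules. Plain substring matching is a simplification of real named entity recognition. The examination-item and diagnostic-criteria route is not shown.

```python
# Hypothetical toy lookup tables; real systems would use curated terminologies.
DISEASE_LEXICON = {"pneumonia", "hypertension", "type 2 diabetes"}
DRUG_TO_DISEASE = {"metformin": "type 2 diabetes", "amlodipine": "hypertension"}


def lexicon_candidates(
    text: str,
    disease_lexicon=DISEASE_LEXICON,
    drug_to_disease=DRUG_TO_DISEASE,
) -> set[str]:
    lowered = text.lower()
    # Disease names mentioned directly in the text.
    found = {name for name in disease_lexicon if name in lowered}
    # Diseases implied by a disease-specific drug mentioned in the text.
    found |= {disease for drug, disease in drug_to_disease.items() if drug in lowered}
    return found
```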
Note that, to make the set of candidate omitted diagnoses more complete, several of the above methods can be applied to the medical record text separately, producing multiple candidate omitted diagnosis results. These results are then fused to obtain the final candidate omitted diagnoses. This application imposes no limitation in this respect.
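The text does not specify the fusion rule. A minimal sketch, assuming the results are plain lists of diagnosis strings, is an order-preserving union. The hypothetical `min_votes` parameter shows how the same code could instead keep only diagnoses proposed by several methods.

```python
from collections import Counter


def fuse_candidates(results: list[list[str]], min_votes: int = 1) -> list[str]:
    """Fuse candidate lists produced by several recall methods.

    min_votes=1 is a plain union (most complete); a larger value keeps only the
    diagnoses proposed by at least that many methods.
    """
    votes: Counter = Counter()
    order: list[str] = []  # first-seen order, so the output is deterministic
    for result in results:
        # Count each method at most once per diagnosis; skip blank entries.
        for diagnosis in dict.fromkeys(d.strip() for d in result if d.strip()):
            if diagnosis not in votes:
                order.append(diagnosis)
            votes[diagnosis] += 1
    return [d for d in order if votes[d] >= min_votes]
```

A union matches the stated goal of completeness. Raising `min_votes` trades completeness for precision.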
Step S103: determine target candidate omitted diagnoses from the candidate omitted diagnoses. A target candidate omitted diagnosis is either a candidate omitted diagnosis that is semantically consistent with a written diagnosis in the medical record text, or a candidate omitted diagnosis that is determined to be a symptom of a written diagnosis.
The written diagnoses in the medical record text can be determined by obtaining a diagnosis list field of the text, for example the discharge diagnosis list on the front page of the medical record or the discharge diagnosis list in the discharge record. The written diagnoses are then determined from that field.
In this application, each candidate omitted diagnosis must be checked to determine whether it is actually an omitted diagnosis. Generally, if two diagnoses are semantically consistent, they can be regarded as the same diagnosis. Therefore, a candidate omitted diagnosis that is semantically consistent with a written diagnosis can be excluded. However, actual medical record writing and quality evaluation follow textbooks such as the eighth edition of "Clinical Diagnostics" and the medical record writing specifications. Under these, a diagnosis made early in hospitalization on the basis of non-specific clinical symptoms or signs should be dropped once a definite diagnosis has been established through further examination and comprehensive analysis. Such an early symptom or sign diagnosis is quite likely not semantically consistent with the later definite diagnosis, so a comparison based on semantics alone would wrongly flag it as an omitted diagnosis. Therefore, this application excludes candidate omitted diagnoses that are not truly omitted by considering two aspects: diagnostic semantic similarity and symptom determination.
In this application, the candidate omitted diagnoses can be compared with the written diagnoses in the medical record text on the basis of diagnostic semantic similarity and symptom determination, so as to determine the target candidate omitted diagnoses. The specific implementation is described in detail in subsequent embodiments.
Step S104: determine the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
This embodiment discloses a method for detecting omitted diagnoses in medical records. After the medical record text to be checked is determined, its candidate omitted diagnoses are determined first. Target candidate omitted diagnoses are then determined from them: those semantically consistent with a written diagnosis in the text, and those determined to be a symptom of a written diagnosis. Finally, the remaining candidate omitted diagnoses are determined as omitted diagnoses. Detection considers two aspects, diagnostic semantic similarity and symptom determination. As a result, a candidate omitted diagnosis that is a symptom of a written diagnosis is not reported as omitted even when the two are not semantically consistent, which improves the accuracy of omitted-diagnosis detection.
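Steps S103 and S104 together amount to a filter. The sketch below assumes the two checks are supplied as predicates: `same_meaning` stands for the semantic-consistency test and `is_symptom_of` for the symptom determination. Both names are hypothetical placeholders, not functions defined by the application.

```python
from typing import Callable, Iterable


def detect_omitted(
    candidates: Iterable[str],
    written: Iterable[str],
    same_meaning: Callable[[str, str], bool],
    is_symptom_of: Callable[[str, str], bool],
) -> list[str]:
    """S103 + S104: drop target candidates and report the rest as omitted diagnoses.

    A candidate is a target (i.e. not omitted) if, for some written diagnosis,
    it is semantically consistent with it or is a symptom of it.
    """
    written = list(written)  # iterate it once per candidate
    omitted = []
    for cand in candidates:
        is_target = any(same_meaning(cand, w) or is_symptom_of(cand, w) for w in written)
        if not is_target:
            omitted.append(cand)
    return omitted
```

With a toy symptom table, "abdominal pain" is suppressed when "acute appendicitis" is written, even though the two strings share no meaning. This is the case that semantic comparison alone would get wrong.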
As mentioned in the above embodiment, generative models such as ChatGPT, PaLM (Pathways Language Model), the Pangu large model, and the Spark cognitive large model have strong natural language understanding capabilities, so this application can also use a large generative model to determine the candidate omitted diagnoses of a medical record text. Another embodiment of this application describes a specific implementation of this approach. Referring to FIG. 2, which is a flowchart of a method for determining candidate omitted diagnoses of a medical record text using a large generative model according to an embodiment of this application, the method may include the following steps:
Step S201: generate prompt information corresponding to the medical record text. The prompt information instructs the model to determine the candidate omitted diagnoses of the medical record text.
In this application, specific questions and a prompt template can be preset. A preset question may be, for example, "Which diagnoses are present in this medical record text?" or "What diagnoses are included in the above medical record text?". The preset prompt template may contain one slot for the preset question and one slot for the medical record text.
In one implementation, the medical record text and one preset question are filled into the corresponding slots of a prompt template to generate the prompt information for that text.
For ease of understanding, assume the preset prompt template is "Given medical record text: {input}\n{prompt}". The slot {input} is filled with the medical record text, and the slot {prompt} is filled with the preset question, here "What diagnoses are included in the above medical record text?". Suppose the medical record text is "During ward rounds, Chief Physician Teng** considered that the patient had intestinal adhesions, gallbladder stones with atrophic cholecystitis, and common bile duct stones, and the patient underwent laparoscopic lysis of intestinal adhesions + cholecystectomy + choledochotomy with stone extraction + T-tube drainage. The operation went smoothly. The patient's vital signs are now stable, there is no bile-like or turbid fluid in the abdominal drainage tube, and there are no signs of peritonitis, indicating that no bile leakage or intestinal fistula has occurred." The prompt information for this text can then be:
"Given medical record text: During ward rounds, Chief Physician Teng** considered that the patient had intestinal adhesions, gallbladder stones with atrophic cholecystitis, and common bile duct stones, and the patient underwent laparoscopic lysis of intestinal adhesions + cholecystectomy + choledochotomy with stone extraction + T-tube drainage. The operation went smoothly. The patient's vital signs are now stable, there is no bile-like or turbid fluid in the abdominal drainage tube, and there are no signs of peritonitis, indicating that no bile leakage or intestinal fistula has occurred.
What diagnoses are included in the above medical record text?"
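Filling the template is a simple string substitution. The sketch reads the template's separator as a newline, as the example prompt shows; the constant and function names are illustrative only.

```python
# Template and question taken from the example above; "\n" separates the two slots.
PROMPT_TEMPLATE = "Given medical record text: {input}\n{prompt}"
DEFAULT_QUESTION = "What diagnoses are included in the above medical record text?"


def build_prompt(record_text: str, question: str = DEFAULT_QUESTION) -> str:
    """Fill the record text and the preset question into the template's slots."""
    # str.format does not re-parse substituted values, so braces that appear
    # inside record_text are inserted verbatim.
    return PROMPT_TEMPLATE.format(input=record_text, prompt=question)


def build_prompts(segments: list[str], question: str = DEFAULT_QUESTION) -> list[str]:
    """One prompt per segment, for records that had to be split."""
    return [build_prompt(seg, question) for seg in segments]
```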
A medical record text may exceed the maximum input length that the generative model can process. In another implementation, therefore, the medical record text is split into multiple segments, and for each segment, prompt information is generated that instructs the model to determine the candidate omitted diagnoses of that segment. The combined prompts for all segments constitute the prompt information for the medical record text.
In this application, the medical record text can be split in several ways: by length (for example, a split every 256 characters), by field (for example, extracting fields such as the diagnosis and treatment course and the ward round records separately), by specific punctuation (for example, splitting the text at the Chinese full stop "。"), or by any combination of these. This application imposes no limitation in this respect.
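One way to combine two of the splitting strategies is to cut at sentence-final punctuation and then pack whole sentences into chunks no longer than a length limit. The sketch below takes that approach and falls back to a plain length cut for any single sentence that is too long. The 256 default mirrors the example above; the function name is hypothetical.

```python
import re

# Sentence-final punctuation (full-width and ASCII). The ASCII "." is left out on
# purpose so that values such as "37.5" are not split.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_record(text: str, max_len: int = 256) -> list[str]:
    """Split at sentence ends, then pack whole sentences into chunks of <= max_len chars.

    A sentence longer than max_len is cut by length, so every chunk respects the limit.
    """
    sentences = [s for s in _SENTENCE_END.split(text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        for start in range(0, len(sentence), max_len):
            piece = sentence[start:start + max_len]
            # Flush the current chunk if appending this piece would exceed the limit.
            if current and len(current) + len(piece) > max_len:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks
```

Packing whole sentences keeps a diagnosis and its negation ("no ... has occurred") in the same segment more often than a fixed-width cut would.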
In this application, for each medical record text segment, the segment and one preset question can be filled into the corresponding slots of a prompt template to generate the prompt information for that segment.
For ease of understanding, suppose the medical record text is text and splitting it yields the segments text_1, text_2, …, text_n, where n is an integer greater than or equal to 1. The preset prompt template is "Given medical record text: {input}\n{prompt}". The slot {input} is filled with a medical record text segment, and the slot {prompt} is filled with the preset question "What diagnoses are included in the above medical record text?". The prompts for the segments are then as follows:
"Given medical record text: text_1
What diagnoses are included in the above medical record text?"
"Given medical record text: text_2
What diagnoses are included in the above medical record text?"
…
"Given medical record text: text_n
What diagnoses are included in the above medical record text?"
Step S202: call the generative model and input the prompt information of the medical record text into it.
In this application, the prompt information of the medical record text may include prompts for multiple segments. In that case, the segment prompts can be input into the generative model one at a time or several at a time. This application imposes no limitation in this respect.
Note that the generative model of this application can be an existing medical-domain generative model. However, existing medical-domain large generative models have limited ability to understand medical record text. This application can therefore also fine-tune an existing medical-domain generative model with training medical record texts and call the fine-tuned model in this step. The fine-tuning method is described in detail in a subsequent embodiment.
Step S203: determine the candidate omitted diagnoses of the medical record text based on the reply of the generative model.
In this application, the candidate omitted diagnoses of the medical record text are structured information, for example a list in which each element represents one candidate omitted diagnosis. The reply of the generative model is generally a natural-language description, from which the candidate omitted diagnoses can be extracted by regular-expression matching or similar methods.
For example, if the generative model replies "The medical record text contains diagnosis_1, diagnosis_2, …, diagnosis_n", the candidate omitted diagnoses of the text can be [diagnosis_1, diagnosis_2, …, diagnosis_n].
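The regular-expression extraction can be sketched as follows. It assumes the reply lists diagnoses after an optional "contains" or "包含" cue, separated by commas, enumeration commas, or semicolons. The cue words and delimiters are assumptions about the reply format, not something the application fixes.

```python
import re

# List delimiters seen in replies: ASCII/full-width comma, enumeration comma, semicolons.
_DELIMS = re.compile(r"[,，、;；]")
# Optional cue before the list ("contains" / "包含"), followed by an optional colon.
_CUE = re.compile(r"(?:contains?|包含)[:：]?\s*(.*)", re.IGNORECASE | re.DOTALL)


def parse_diagnoses(reply: str) -> list[str]:
    """Turn a natural-language reply into a deduplicated list of diagnoses."""
    m = _CUE.search(reply)
    body = m.group(1) if m else reply
    diagnoses: list[str] = []
    for raw in _DELIMS.split(body):
        # Drop a leading conjunction and trailing sentence punctuation.
        item = re.sub(r"^(?:and\b|以及|和)\s*", "", raw.strip(), flags=re.IGNORECASE)
        item = item.strip(" .。")
        if item and item not in diagnoses:
            diagnoses.append(item)
    return diagnoses
```

Asking the model in the prompt to answer in a fixed list format makes this step more reliable than parsing free prose.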
In the prior art, omitted-diagnosis detection based on deep learning models requires several models to recall candidate omitted diagnoses in stages. This approach is prone to error propagation, and it requires the same medical record text to be analyzed repeatedly, which makes detection inefficient. This embodiment instead analyzes the medical record text with a large generative model that has strong natural language understanding, which avoids the error propagation of staged recall. The generative model can take into account the semantic relations between different diagnoses in the text, which makes the recalled candidate omitted diagnoses more accurate and complete. It also avoids repeated analysis of the same text, which improves detection efficiency.
Another embodiment of this application describes in detail how to fine-tune an existing medical-domain generative model. Referring to FIG. 3, which is a flowchart of a method for fine-tuning an existing medical-domain generative model according to an embodiment of this application, the method may include the following steps:
Step S301: obtain a training medical record text, the diagnoses corresponding to it, and a label for each diagnosis in it. The label of each diagnosis includes a first label, a second label, and a third label. The first label indicates whether the diagnosis is confirmed, the second label indicates the diagnosis type, and the third label indicates the supporting text fragment from which the diagnosis was recalled.
In this application, the training medical record texts can be obtained as follows. Original medical record texts are first collected from different sources and departments and then preprocessed. The preprocessed long texts are then split into training medical record texts. Preprocessing includes de-identification and information filtering, which may specifically delete fields such as patient or physician names, addresses, admission time, bed number, and differential diagnosis from the original text. The texts can be split in several ways: by length (for example, a split every 256 characters), by field (for example, extracting fields such as the diagnosis and treatment course and the ward round records separately), by specific punctuation (for example, splitting the text at the Chinese full stop "。"), or by any combination of these. This application imposes no limitation in this respect.
In one implementation, the diagnoses corresponding to a training medical record text can be determined manually or by using a deep learning model, a rule base, or a knowledge base.
For example, suppose the training medical record text is "During ward rounds, Chief Physician Teng** considered that the patient had intestinal adhesions, gallbladder stones with atrophic cholecystitis, and common bile duct stones, and the patient underwent laparoscopic lysis of intestinal adhesions + cholecystectomy + choledochotomy with stone extraction + T-tube drainage. The operation went smoothly. The patient's vital signs are now stable, there is no bile-like or turbid fluid in the abdominal drainage tube, and there are no signs of peritonitis, indicating that no bile leakage or intestinal fistula has occurred." Diagnosis entities can be extracted from it by entity extraction, yielding the diagnoses "intestinal adhesions, intestinal fistula, gallbladder stones with atrophic cholecystitis".
The labels of the diagnoses of a training medical record text can be assigned by manual annotation or by rules. For example, take the diagnosis "intestinal fistula" (肠瘘). The regular expression "(无|否认)[^。,]*发生" (roughly "no / denies … occurred") matches "无胆漏、肠瘘发生" ("no bile leakage or intestinal fistula occurred"), so "intestinal fistula" can be judged not to be a confirmed diagnosis. The rules are also classified, and this rule belongs to the "denial" class, so it can further be judged that "intestinal fistula" is unconfirmed because it was denied. The supporting fragment is annotated manually as "The patient's vital signs are now stable, there is no bile-like or turbid fluid in the abdominal drainage tube, and there are no signs of peritonitis, indicating that no bile leakage or intestinal fistula has occurred." The diagnosis "intestinal fistula" therefore has the following labels: Confirmed: no; Diagnosis type: denial-type diagnosis; Supporting fragment: "The patient's vital signs are now stable, there is no bile-like or turbid fluid in the abdominal drainage tube, and there are no signs of peritonitis, indicating that no bile leakage or intestinal fistula has occurred."
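The rule-based part of the labelling can be sketched as a table of (pattern, rule class) entries. The denial pattern is the one quoted above. Adding the full-width comma "，" to its excluded set is an assumption for raw records, and returning "undecided" when no rule fires (leaving the case to manual annotation) is a design choice of this sketch, not something the text prescribes.

```python
import re

# The denial rule quoted in the text: "(无|否认)[^。,]*发生", i.e. "no / denies ... occurred".
# The full-width comma "，" is added to the excluded set as an assumption.
RULES = [
    (re.compile(r"(无|否认)[^。,，]*发生"), "denial"),
]


def label_by_rules(diagnosis: str, text: str) -> dict:
    """Return the first two labels for `diagnosis` if a rule fires, else leave them undecided."""
    for pattern, rule_class in RULES:
        for m in pattern.finditer(text):
            if diagnosis in m.group():
                # A matching negation rule means the diagnosis is not confirmed,
                # and the rule's class becomes the diagnosis type.
                return {"confirmed": False, "type": rule_class, "evidence": m.group()}
    # No rule fired: leave the decision to manual annotation.
    return {"confirmed": None, "type": None, "evidence": None}
```

Because the scan starts at the first "无" that can reach "发生" without crossing a comma or full stop, the evidence span for the sample sentence is "无腹膜炎体征说明无胆漏、肠瘘发生". This span is wider than the quoted "无胆漏、肠瘘发生" but still contains the target diagnosis.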
Step S302: generate multiple training samples based on the training medical record text, its diagnoses, and the labels of each diagnosis. Each training sample includes an input text and an expected output text.
In this application, the training samples can be generated from the training medical record text, its diagnoses, and the labels of each diagnosis, combined with preset templates. The training corpus can cover different task types, such as yes/no tasks, explanation tasks, and extraction tasks. The phrasing of the training corpus should be as varied as possible; for example, manually annotated chains of thought can be used.
Examples of preset templates are "{input}\nWhich diagnoses are present in this medical record text?", "Which diagnoses are present in the following medical record text?\n{input}", "{input}\nIs {diagnosis} a confirmed diagnosis in this text?", and "{input}\nWhy is {diagnosis} not a confirmed diagnosis in the above medical record text?". Here {input} is the medical record text and {diagnosis} is a specific diagnosis.
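A minimal sketch of step S302 fills three of the templates above from the labels of step S301. The wording of the expected outputs ("Yes"/"No", the explanation sentence) and the label dictionary layout are hypothetical choices made for illustration.

```python
# Templates from the text; {input} is the record text, {diagnosis} a specific diagnosis.
TEMPLATES = {
    "extract": "{input}\nWhich diagnoses are present in this medical record text?",
    "yes_no": "{input}\nIs {diagnosis} a confirmed diagnosis in this text?",
    "explain": "{input}\nWhy is {diagnosis} not a confirmed diagnosis in the above medical record text?",
}


def build_corpus(text: str, diagnoses: list[str], labels: dict[str, dict]) -> list[dict]:
    """Generate (input, expected output) samples for one training medical record text.

    labels maps each diagnosis to {"confirmed": bool, "type": str, "support": str},
    i.e. the first, second, and third labels of step S301.
    """
    samples = [{
        "input": TEMPLATES["extract"].format(input=text),
        "output": "; ".join(diagnoses),  # extraction task
    }]
    for diagnosis in diagnoses:
        label = labels[diagnosis]
        samples.append({  # yes/no task
            "input": TEMPLATES["yes_no"].format(input=text, diagnosis=diagnosis),
            "output": "Yes" if label["confirmed"] else "No",
        })
        if not label["confirmed"]:
            samples.append({  # explanation task, grounded in the supporting fragment
                "input": TEMPLATES["explain"].format(input=text, diagnosis=diagnosis),
                "output": f'It is a {label["type"]} diagnosis. Supporting fragment: "{label["support"]}"',
            })
    return samples
```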
For ease of understanding, the following examples of training samples are given:
Example 1:
Example 2:
Example 3:
Step S303: input the input text into the existing medical-domain generative model and fine-tune the model, with the objective of making its output approach the expected output text corresponding to the input text.
In this application, the expected output text serves as the reference answer. The output of the existing medical-domain generative model is compared with the expected output text, and a loss function is designed to measure the difference between the two. The model parameters are then updated iteratively through a feedback mechanism until the model converges, which completes the fine-tuning.
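The text leaves the loss function open. One common choice for this kind of supervised fine-tuning, offered here only as an assumption, is the token-level negative log-likelihood of the expected output y given the input x, averaged over the training corpus D, where p_θ is the generative model with parameters θ:

```latex
\mathcal{L}(\theta) = -\frac{1}{|\mathcal{D}|} \sum_{(x,\,y)\in\mathcal{D}} \sum_{t=1}^{|y|} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right)
```

Minimizing this quantity pushes the model's output distribution toward the expected output text, which is the objective stated in step S303.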
After an existing medical-domain generative model is fine-tuned using this embodiment, it has good diagnosis recall capability, which lays the foundation for the accuracy of the subsequent detection results.
Another embodiment of this application describes a specific implementation of step S103, determining target candidate omitted diagnoses from the candidate omitted diagnoses. Referring to FIG. 4, which is a flowchart of a method for determining target candidate omitted diagnoses from the candidate omitted diagnoses according to an embodiment of this application, the method may include the following steps:
Step S401: determine multiple diagnosis pairs based on the candidate omitted diagnoses and the written diagnoses. Each diagnosis pair includes one candidate omitted diagnosis and one written diagnosis.
In one implementation, each candidate omitted diagnosis can be paired with each written diagnosis to form the diagnosis pairs.
The same diagnosis may be described in different ways, so several diagnosis strings may actually denote the same diagnosis. To avoid judging the same diagnosis repeatedly, in another implementation, the candidate omitted diagnoses are first normalized, and the written diagnoses are normalized in the same way. The diagnosis pairs are then formed from the normalized candidate omitted diagnoses and the normalized written diagnoses.
In yet another implementation, each candidate omitted diagnosis can be paired with each written diagnosis to form candidate diagnosis pairs, each including one candidate omitted diagnosis and one written diagnosis. Each candidate diagnosis pair is then normalized, and finally the normalized candidate diagnosis pairs are deduplicated to obtain the diagnosis pairs.
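The pair-normalize-deduplicate variant can be sketched as below. A small synonym table stands in for the vector-based or knowledge-graph-based normalization described next. `SYNONYMS` and the function names are hypothetical.

```python
from itertools import product

# Hypothetical synonym table standing in for vector- or knowledge-graph-based
# normalization (head term, anatomical site, etiology, nature).
SYNONYMS = {
    "gallstones": "gallbladder stones",
    "cholecystolithiasis": "gallbladder stones",
}


def normalize(diagnosis: str) -> str:
    """Lower-case, collapse whitespace, then map known synonyms to one canonical name."""
    key = " ".join(diagnosis.lower().split())
    return SYNONYMS.get(key, key)


def diagnosis_pairs(candidates: list[str], written: list[str]) -> list[tuple[str, str]]:
    """Pair every candidate with every written diagnosis, normalize, and deduplicate."""
    seen: set[tuple[str, str]] = set()
    pairs: list[tuple[str, str]] = []
    for cand, wr in product(candidates, written):
        pair = (normalize(cand), normalize(wr))
        if pair not in seen:  # keep the first occurrence, preserving order
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

Two differently worded candidates that normalize to the same name collapse into one pair, so each pair is judged only once in step S402.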
Note that the normalization may vectorize the diagnosis text. Alternatively, it may use a knowledge graph to obtain key elements of the diagnosis, such as its head term, anatomical site, etiology, and nature, and normalize on that basis.
For ease of understanding, assume the candidate omitted diagnoses are [diagnosis_1, diagnosis_2, …, diagnosis_n] and the written diagnoses are [diagnosisOrigin_1, diagnosisOrigin_2, …, diagnosisOrigin_m]. One example of a diagnosis pair is then [diagnosis_1, diagnosisOrigin_1].
Step S402: for each diagnosis pair, determine whether the two diagnoses in it are semantically consistent. If they are, perform step S403; if they are not, perform step S404.
In this application, the semantic similarity between the candidate omitted diagnosis and the written diagnosis in the pair can be computed to determine whether the two diagnoses are semantically consistent.
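Semantic similarity is typically computed as the cosine similarity of text embeddings, compared against a threshold. The self-contained sketch below uses bags of character bigrams as a crude stand-in for a learned embedding, and the 0.8 threshold is hypothetical.

```python
import math
from collections import Counter


def char_ngrams(s: str, n: int = 2) -> Counter:
    """Bag of character n-grams; a crude stand-in for a learned text embedding."""
    s = s.lower()
    if len(s) < n:
        return Counter([s]) if s else Counter()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantically_consistent(candidate: str, written: str, threshold: float = 0.8) -> bool:
    """Step S402: treat the pair as consistent when similarity reaches the threshold."""
    return cosine(char_ngrams(candidate), char_ngrams(written)) >= threshold
```

Character n-grams score "gallstones" and "cholelithiasis" as completely dissimilar even though both name the same condition. This is one reason to normalize diagnoses before pairing, and a reason a production system would use a learned embedding model instead.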
Step S403: determine the candidate omitted diagnosis in the diagnosis pair as a target candidate omitted diagnosis.
Step S404: compare the diagnosis pair with each diagnosis-symptom pair in a preset diagnosis-symptom pair set. If the diagnosis pair matches any diagnosis-symptom pair in the set, determine the candidate omitted diagnosis as a target candidate omitted diagnosis. If it matches none of them, determine the candidate omitted diagnosis as a non-target candidate omitted diagnosis.
In this application, the preset diagnosis-symptom pair set includes multiple diagnosis-symptom pairs. Each pair contains two diagnoses, one of which is a symptom of the other. For ease of understanding, take a diagnosis-symptom pair [diagnosis_a, diagnosis_b]: either diagnosis_a is a symptom of diagnosis_b, or diagnosis_b is a symptom of diagnosis_a.
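Because a diagnosis-symptom pair [diagnosis_a, diagnosis_b] holds in either direction, storing each pair as a `frozenset` gives an order-independent lookup for step S404. The sketch below combines steps S402 to S404 for one diagnosis pair. The pair set contents are hypothetical, and `same_meaning` stands for whichever semantic-consistency check is used.

```python
from typing import Callable

# Hypothetical diagnosis-symptom pairs for illustration. Each pair is a frozenset,
# so the lookup ignores order: in [a, b], a may be a symptom of b or vice versa.
SYMPTOM_PAIRS = frozenset({
    frozenset({"abdominal pain", "acute appendicitis"}),
    frozenset({"jaundice", "common bile duct stones"}),
})


def is_target_candidate(
    candidate: str,
    written: str,
    same_meaning: Callable[[str, str], bool],
    symptom_pairs: frozenset = SYMPTOM_PAIRS,
) -> bool:
    """Steps S402-S404 for one (candidate, written) diagnosis pair."""
    if same_meaning(candidate, written):  # S402 consistent -> S403
        return True
    return frozenset({candidate, written}) in symptom_pairs  # S404
```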
其中,预设的诊断症状对集合是基于大量的病历数据确定得到的,具体确定方式将通过后面的实施例详细说明。Among them, the preset diagnostic symptom pair set is determined based on a large amount of medical record data, and the specific determination method will be described in detail through the following embodiments.
在本申请的另一个实施例中,对预设的诊断症状对集合的确定方式进行详细介绍。参照图5,图5为本申请实施例公开的一种确定预设的诊断症状对集合的方法的流程示意图,该方法可以包括如下步骤:In another embodiment of the present application, a method for determining a preset diagnostic symptom pair set is described in detail. Referring to FIG. 5 , FIG. 5 is a flow chart of a method for determining a preset diagnostic symptom pair set disclosed in an embodiment of the present application, and the method may include the following steps:
步骤S501:获取参考病历文本、所述参考病历文本对应的候选漏写诊断以及所述参考病历文本对应的已写诊断。Step S501: Obtain a reference medical record text, a candidate omitted diagnosis corresponding to the reference medical record text, and a written diagnosis corresponding to the reference medical record text.
参考病历文本的确定方式可以是先收集不同来源,不同科室的原始病历文本,对收集的原始病历文本进行预处理,得到预处理后的病历文本,作为参考病历文本。预处理的方式可以参见前述实施例中的相关描述,此处不再赘述。参考病历文本对应的候选漏写诊断可以是人工确定的,也可以是采用现有技术确定的,对此,本申请不进行任何限定。所述参考病历文本对应的已写诊断的确定方式可以参考前述实施例的相关描述,此处不再赘述。The reference medical record text may be determined by first collecting original medical record texts from different sources and different departments, preprocessing the collected original medical record texts, and obtaining the preprocessed medical record texts as reference medical record texts. The preprocessing method may refer to the relevant description in the aforementioned embodiment, which will not be repeated here. The candidate omitted diagnosis corresponding to the reference medical record text may be determined manually or by using the prior art, and this application does not impose any limitation on this. The method for determining the written diagnosis corresponding to the reference medical record text may refer to the relevant description in the aforementioned embodiment, which will not be repeated here.
Step S502: Combine the written diagnoses corresponding to the reference medical record text in pairs to obtain a first diagnosis pair set. Then form a second diagnosis pair set from two kinds of pairs: each candidate omitted diagnosis combined with each written diagnosis, and the candidate omitted diagnoses combined with one another in pairs.
Note that the first diagnosis pair set contains multiple first diagnosis pairs, and the second diagnosis pair set contains multiple second diagnosis pairs. Each first diagnosis pair contains two written diagnoses. Each second diagnosis pair contains either one candidate omitted diagnosis and one written diagnosis, or two candidate omitted diagnoses.
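Step S502 can be sketched for a single reference medical record text as follows. This is a minimal sketch that makes two assumptions: pairs are unordered, and duplicate diagnosis names are collapsed with `set()` before pairing. The diagnosis names are illustrative.

```python
from itertools import combinations, product
from typing import FrozenSet, Iterable, Set

Pair = FrozenSet[str]


def first_pair_set(written: Iterable[str]) -> Set[Pair]:
    """Step S502: every unordered pair of two written diagnoses."""
    return {frozenset(p) for p in combinations(sorted(set(written)), 2)}


def second_pair_set(written: Iterable[str], candidates: Iterable[str]) -> Set[Pair]:
    """Step S502: (candidate, written) pairs plus (candidate, candidate) pairs."""
    written_set, candidate_set = set(written), set(candidates)
    mixed = {frozenset((c, w)) for c, w in product(candidate_set, written_set) if c != w}
    among_candidates = {frozenset(p) for p in combinations(sorted(candidate_set), 2)}
    return mixed | among_candidates


if __name__ == "__main__":
    written = ["hypertension", "diabetes", "pneumonia"]
    candidates = ["fever", "hypokalemia"]
    print(len(first_pair_set(written)))               # 3
    print(len(second_pair_set(written, candidates)))  # 7
```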
Step S503: Calculate the co-occurrence probability of each first diagnosis pair in the first diagnosis pair set.
In this application, the co-occurrence probability of a first diagnosis pair may be calculated as a ratio. The numerator is the number of times that pair occurs across all reference medical record texts. The denominator is the total number of occurrences, across all reference medical record texts, of all diagnosis pairs in the first diagnosis pair set.
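The ratio in step S503 is P(pair) = count(pair) / (total count of all first diagnosis pairs). The minimal sketch below computes it under one assumption: a pair is counted at most once per reference medical record text, which the application does not state. The labels A, B and C are placeholders for diagnoses.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Pair = FrozenSet[str]


def co_occurrence_probabilities(written_per_record: Iterable[List[str]]) -> Dict[Pair, float]:
    """Step S503: co-occurrence probability of each first diagnosis pair.

    written_per_record holds, for each reference medical record text, its
    written diagnoses. Counting a pair at most once per record is an assumption.
    """
    counts: Counter = Counter()
    for written in written_per_record:
        counts.update({frozenset(p) for p in combinations(sorted(set(written)), 2)})
    total = sum(counts.values())  # occurrences of all first diagnosis pairs
    return {pair: n / total for pair, n in counts.items()} if total else {}


if __name__ == "__main__":
    records = [["A", "B"], ["A", "B", "C"]]
    probs = co_occurrence_probabilities(records)
    print(probs[frozenset(("A", "B"))])  # 0.5
```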
Step S504: Based on the co-occurrence probabilities of the first diagnosis pairs in the first diagnosis pair set, determine the diagnosis-symptom pair set from the first diagnosis pair set and the second diagnosis pair set.
A higher co-occurrence ratio means a higher co-occurrence probability, so the two diagnoses in the pair are written together more often. Such a pair is common. The two diagnoses are likely two diseases that often occur in the same patient, or a common disease and its complication, so one is unlikely to be a symptom of the other. Conversely, a low co-occurrence ratio means a low co-occurrence probability. In that case doctors rarely write the two diagnoses together in practice, which makes it more likely that one of them is a symptom of the other. Therefore, this application excludes two kinds of pairs: first diagnosis pairs with a high co-occurrence probability, and second diagnosis pairs that are semantically consistent with such high-probability first diagnosis pairs.
In one implementation, a first threshold and a second threshold may be preset, with the first threshold greater than the second. Their specific values may be set according to the requirements of the scenario, and this application imposes no limitation on them. Target second diagnosis pairs are then determined from the second diagnosis pair set. A target second diagnosis pair is a second diagnosis pair that is semantically consistent with a first diagnosis pair whose co-occurrence probability exceeds the first threshold. Two groups are then combined to obtain a candidate diagnosis-symptom pair set: the first diagnosis pairs whose co-occurrence probability is below the second threshold, and the second diagnosis pairs other than the target second diagnosis pairs.
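The threshold filtering above can be sketched as follows. The application does not say how semantic consistency between two pairs is judged, so `same_meaning` is a hypothetical hook supplied by the caller. The usage example stands in plain equality for it, and the pair labels and probabilities are made up. First diagnosis pairs whose probability lies between the two thresholds end up in neither group, which matches the text as written.

```python
from typing import Callable, Dict, FrozenSet, Set

Pair = FrozenSet[str]


def candidate_symptom_pairs(
    first_probs: Dict[Pair, float],
    second_pairs: Set[Pair],
    first_threshold: float,
    second_threshold: float,
    same_meaning: Callable[[Pair, Pair], bool],
) -> Set[Pair]:
    """Step S504: build the candidate diagnosis-symptom pair set.

    same_meaning is a hypothetical hook for the semantic-consistency check;
    the application does not specify how it is computed.
    """
    if not first_threshold > second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    frequent = [p for p, prob in first_probs.items() if prob > first_threshold]
    # Target second pairs: semantically consistent with some frequent first pair.
    targets = {s for s in second_pairs if any(same_meaning(s, f) for f in frequent)}
    rare_first = {p for p, prob in first_probs.items() if prob < second_threshold}
    return rare_first | (second_pairs - targets)


if __name__ == "__main__":
    ab, ac, bc, ad = (frozenset(x) for x in [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")])
    probs = {ab: 0.5, ac: 0.05, bc: 0.45}
    result = candidate_symptom_pairs(probs, {ab, ad}, 0.4, 0.1, lambda x, y: x == y)
    print(sorted(sorted(p) for p in result))  # [['A', 'C'], ['A', 'D']]
```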
In one implementation, the candidate diagnosis-symptom pair set may be used directly as the diagnosis-symptom pair set. However, the diagnoses in the candidate set may be described in non-standard wording. In another implementation, therefore, the candidate diagnosis-symptom pair set may be annotated to determine the diagnosis-symptom pair set.
In this application, every pair may be annotated manually. Alternatively, a sample of pairs may be annotated manually first. Diagnosis pairs of similar type are then grouped by diagnosis-pair similarity or another clustering method, so that pairs can be annotated in batches and the annotation workload is reduced. For example, the pair formed by renal function injury and hypocalcemia and the pair formed by renal function injury and hypochloremia can be clustered into one group, and only one of the two needs to be annotated.
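The batch-annotation idea can be sketched as below. The application leaves the similarity measure open ("diagnosis pair similarity or other clustering methods"), so this sketch substitutes a simple, hypothetical one. Two pairs count as similar when they share one diagnosis and their other diagnoses are close as strings, measured with the ratio from difflib's `SequenceMatcher`. The pairs are then grouped greedily, and a manual label on each cluster's first member is copied to the rest. A real system might instead compare standardized terms or embeddings. The threshold and the label string are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, FrozenSet, List

Pair = FrozenSet[str]


def pair_similarity(p: Pair, q: Pair) -> float:
    """Illustrative score: if two pairs share exactly one diagnosis, return the
    character-level similarity of their other diagnoses; otherwise 0."""
    if p == q:
        return 1.0
    common = p & q
    if len(p) != 2 or len(q) != 2 or len(common) != 1:
        return 0.0
    (x,) = p - common
    (y,) = q - common
    return SequenceMatcher(None, x, y).ratio()


def cluster_pairs(pairs: List[Pair], threshold: float = 0.7) -> List[List[Pair]]:
    """Greedy clustering: each pair joins the first cluster whose first member
    (its representative) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[Pair]] = []
    for pair in pairs:
        for cluster in clusters:
            if pair_similarity(pair, cluster[0]) >= threshold:
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


def propagate_labels(clusters: List[List[Pair]], labeled: Dict[Pair, str]) -> Dict[Pair, str]:
    """Copy the manual label of each cluster's representative to every member."""
    labels: Dict[Pair, str] = {}
    for cluster in clusters:
        label = labeled.get(cluster[0])
        if label is not None:
            for pair in cluster:
                labels[pair] = label
    return labels
```

With these settings, the two renal-function-injury pairs from the example above fall into one cluster, so annotating one of them labels both.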
The following describes the medical record diagnosis omission detection device disclosed in an embodiment of this application. The device described below and the method described above may be read with reference to each other.
FIG. 6 is a schematic structural diagram of a medical record diagnosis omission detection device disclosed in an embodiment of this application. As shown in FIG. 6, the device may include:
a medical record text determining unit 11, configured to determine the medical record text to be checked;
a candidate omitted diagnosis determining unit 12, configured to determine the candidate omitted diagnoses corresponding to the medical record text;
a target candidate omitted diagnosis determining unit 13, configured to determine target candidate omitted diagnoses from the candidate omitted diagnoses. The target candidate omitted diagnoses are the candidates that are semantically consistent with a written diagnosis in the medical record text, together with the candidates determined to be symptoms of a written diagnosis; and
an omitted diagnosis determining unit 14, configured to determine the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
In one implementation, the candidate omitted diagnosis determining unit includes:
a prompt generating subunit, configured to generate a prompt corresponding to the medical record text, the prompt instructing the model to determine the candidate omitted diagnoses corresponding to the medical record text;
a generative model calling subunit, configured to call a generative model and input the prompt corresponding to the medical record text into the generative model; and
a candidate omitted diagnosis determining subunit, configured to determine the candidate omitted diagnoses corresponding to the medical record text based on the reply of the generative model.
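The calling and reply-handling subunits can be sketched as below. This is a minimal sketch under two assumptions. First, `generate` is a hypothetical stand-in for whatever client actually calls the generative model. Second, the model is assumed to reply with one diagnosis per line, optionally bulleted or numbered. The application discloses neither the client nor the reply format, and this sketch simply treats every diagnosis in the reply as a candidate.

```python
import re
from typing import Callable, List

# Strips a leading bullet ("-", "*", "•") or list number ("1.", "2)", "3、").
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)、])\s*")


def parse_reply(reply: str) -> List[str]:
    """Assumes the model replies with one diagnosis per line (hypothetical format)."""
    diagnoses: List[str] = []
    for line in reply.splitlines():
        item = _LIST_MARKER.sub("", line).strip()
        if item and item not in diagnoses:
            diagnoses.append(item)
    return diagnoses


def candidate_omitted_diagnoses(prompts: List[str], generate: Callable[[str], str]) -> List[str]:
    """Send each prompt to the model and merge the parsed replies in order.

    `generate` is a hypothetical stand-in for the client that calls the
    generative model: it takes a prompt and returns the reply text.
    """
    merged: List[str] = []
    for prompt in prompts:
        for diagnosis in parse_reply(generate(prompt)):
            if diagnosis not in merged:
                merged.append(diagnosis)
    return merged


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:  # stub in place of a real model call
        return "1. pneumonia\n- hypokalemia\n"

    print(candidate_omitted_diagnoses(["fragment 1", "fragment 2"], fake_model))
    # ['pneumonia', 'hypokalemia']
```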
In one implementation, the prompt generating subunit is specifically configured to:
segment the medical record text into multiple medical record text fragments; and
for each medical record text fragment, generate a prompt instructing the model to determine the candidate omitted diagnoses corresponding to that fragment. The prompts of all fragments together constitute the prompt corresponding to the medical record text.
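Segmentation and per-fragment prompting can be sketched as follows. The application does not say how the text is segmented, so this sketch makes its own assumptions. It splits after sentence-ending punctuation (Chinese or ASCII) or a newline, then packs consecutive sentences into fragments of at most `max_chars` characters. The prompt wording is also hypothetical. Because ASCII "." counts as a sentence end here, a decimal such as "37.5" may be split, which a real segmenter would need to handle.

```python
import re
from typing import List

# Hypothetical prompt wording; the application does not disclose its prompt.
PROMPT_TEMPLATE = (
    "List every diagnosis supported by the following medical record fragment, "
    "one per line.\n\nFragment:\n{fragment}"
)

# Zero-width split point after sentence-ending punctuation or a newline.
_SENTENCE_END = re.compile(r"(?<=[。！？.!?\n])")


def segment_record(text: str, max_chars: int = 500) -> List[str]:
    """Split into sentences, then pack consecutive sentences into fragments of
    at most max_chars characters (a single longer sentence stays whole)."""
    sentences = [s for s in _SENTENCE_END.split(text) if s.strip()]
    fragments: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > max_chars:
            fragments.append(current)
            current = ""
        current += sentence
    if current:
        fragments.append(current)
    return fragments


def build_prompts(text: str, max_chars: int = 500) -> List[str]:
    """One prompt per fragment; together they form the prompt for the whole record."""
    return [PROMPT_TEMPLATE.format(fragment=f) for f in segment_record(text, max_chars)]


if __name__ == "__main__":
    record = "Fever for 3 days. Cough. Chest CT shows infiltrate."
    for fragment in segment_record(record, max_chars=20):
        print(repr(fragment))
```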
In one implementation, the generative model is obtained by fine-tuning an existing medical-domain generative model, and the device includes a fine-tuning unit specifically configured to:
obtain training medical record texts, the diagnoses corresponding to the training medical record texts, and the labels of each diagnosis. Each diagnosis carries three labels: a first label indicating whether the diagnosis is confirmed, a second label indicating its diagnosis type, and a third label indicating its recall support segment (the text segment that supports recalling the diagnosis);
generate multiple training corpora from the training medical record texts, their corresponding diagnoses, and the labels of each diagnosis, each training corpus comprising an input text and an expected output text; and
input each input text into the existing medical-domain generative model, and fine-tune that model so that its output approaches the expected output text corresponding to the input text.
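Building one training corpus from a labeled record might look like the sketch below. The instruction wording, the JSON layout of the expected output, and the label values (such as "primary") are all hypothetical, because the application does not disclose its corpus format. The sketch produces one example per record. The fine-tuning run itself depends on the chosen model and training framework and is not shown.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LabeledDiagnosis:
    name: str
    confirmed: bool        # first label: whether the diagnosis is confirmed
    diagnosis_type: str    # second label: diagnosis type (label values are hypothetical)
    support_segment: str   # third label: the record segment that supports recalling it


# Hypothetical instruction text; the application does not disclose its corpus format.
INSTRUCTION = (
    "List the diagnoses supported by this medical record. For each, give whether it "
    "is confirmed, its type, and the supporting text.\n\n"
)


def build_training_example(record_text: str, diagnoses: List[LabeledDiagnosis]) -> Dict[str, str]:
    """Turn one labeled training record into an (input, expected output) pair."""
    expected = [
        {
            "diagnosis": d.name,
            "confirmed": d.confirmed,
            "type": d.diagnosis_type,
            "support": d.support_segment,
        }
        for d in diagnoses
    ]
    return {
        "input": INSTRUCTION + record_text,
        "output": json.dumps(expected, ensure_ascii=False),
    }


if __name__ == "__main__":
    record = "Cough with fever for 3 days; chest CT shows patchy shadows."
    example = build_training_example(
        record,
        [LabeledDiagnosis("pneumonia", True, "primary", "chest CT shows patchy shadows")],
    )
    print(example["output"])
```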
In one implementation, the target candidate omitted diagnosis determining unit includes:
a diagnosis pair determining unit, configured to determine multiple diagnosis pairs from the candidate omitted diagnoses and the written diagnoses, each diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
a judging unit, configured to judge, for each diagnosis pair, whether the two diagnoses in the pair are semantically consistent;
a first processing unit, configured to determine the candidate omitted diagnosis in the pair as a target candidate omitted diagnosis if the two diagnoses in the pair are semantically consistent;
a comparing unit, configured to compare the diagnosis pair with each diagnosis-symptom pair in the preset diagnosis-symptom pair set if the two diagnoses in the pair are not semantically consistent; and
a second processing unit, configured to determine the candidate omitted diagnosis as a target candidate omitted diagnosis if the diagnosis pair matches a diagnosis-symptom pair in the diagnosis-symptom pair set.
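The per-pair decision above, combined with the final step of unit 14, can be sketched as follows. The semantic-consistency check is not specified, so `same_meaning` is a hypothetical callable. The usage example substitutes case-insensitive equality for it. Diagnosis-symptom pairs are assumed to be stored as unordered frozensets, and the diagnosis names are illustrative.

```python
from typing import Callable, FrozenSet, Iterable, List, Set


def find_target_candidates(
    candidates: Iterable[str],
    written: Iterable[str],
    same_meaning: Callable[[str, str], bool],
    symptom_pairs: Set[FrozenSet[str]],
) -> Set[str]:
    """Candidates that are semantically consistent with, or form a
    diagnosis-symptom pair with, at least one written diagnosis."""
    written = list(written)
    targets: Set[str] = set()
    for cand in candidates:
        for w in written:  # (cand, w) is one diagnosis pair
            if same_meaning(cand, w):                  # judging + first processing unit
                targets.add(cand)
                break
            if frozenset((cand, w)) in symptom_pairs:  # comparing + second processing unit
                targets.add(cand)
                break
    return targets


def omitted_diagnoses(
    candidates: Iterable[str],
    written: Iterable[str],
    same_meaning: Callable[[str, str], bool],
    symptom_pairs: Set[FrozenSet[str]],
) -> List[str]:
    """Candidates that are not targets are reported as omitted diagnoses (unit 14)."""
    candidates = list(candidates)
    targets = find_target_candidates(candidates, written, same_meaning, symptom_pairs)
    return [c for c in candidates if c not in targets]


if __name__ == "__main__":
    def same(a: str, b: str) -> bool:  # stand-in for a real semantic check
        return a.lower() == b.lower()

    pairs = {frozenset(("pneumonia", "fever"))}
    print(omitted_diagnoses(["Pneumonia", "fever", "hypokalemia"], ["pneumonia"], same, pairs))
    # ['hypokalemia']
```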
In one implementation, the diagnosis pair determining unit is specifically configured to:
normalize the candidate omitted diagnoses to obtain normalized candidate omitted diagnoses;
normalize the written diagnoses to obtain normalized written diagnoses; and
determine the multiple diagnosis pairs from the normalized candidate omitted diagnoses and the normalized written diagnoses.
In another implementation, the diagnosis pair determining unit is specifically configured to:
combine each candidate omitted diagnosis with each written diagnosis to form candidate diagnosis pairs, each candidate diagnosis pair comprising one candidate omitted diagnosis and one written diagnosis;
normalize each candidate diagnosis pair to obtain normalized candidate diagnosis pairs; and
deduplicate the normalized candidate diagnosis pairs to determine the multiple diagnosis pairs.
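The second implementation (pair first, then normalize, then deduplicate) can be sketched as below. The normalization is hypothetical: it collapses whitespace, lower-cases the name, and then maps known synonyms through a caller-supplied dictionary. A real system would more likely map to a standard terminology. Pairs are kept as ordered (candidate, written) tuples so that the first processing unit can still tell which element is the candidate.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def normalize(name: str, synonyms: Dict[str, str]) -> str:
    """Hypothetical normalization: collapse whitespace, lower-case, then map
    known synonyms to a standard term."""
    key = " ".join(name.split()).lower()
    return synonyms.get(key, key)


def diagnosis_pairs(
    candidates: Iterable[str], written: Iterable[str], synonyms: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Pair every candidate with every written diagnosis, normalize each pair,
    then deduplicate while keeping first-seen order."""
    seen: Set[Tuple[str, str]] = set()
    result: List[Tuple[str, str]] = []
    for cand, w in product(candidates, list(written)):
        pair = (normalize(cand, synonyms), normalize(w, synonyms))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


if __name__ == "__main__":
    synonyms = {"lung infection": "pneumonia"}  # illustrative entry
    print(diagnosis_pairs(["Pneumonia ", "lung  infection"], ["Hypertension"], synonyms))
    # [('pneumonia', 'hypertension')]
```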
In one implementation, the device further includes a diagnosis-symptom pair set determining unit, specifically configured to:
obtain reference medical record texts, together with the candidate omitted diagnoses and the written diagnoses corresponding to each reference medical record text;
combine the written diagnoses corresponding to the reference medical record text in pairs to obtain a first diagnosis pair set, and form a second diagnosis pair set by combining each candidate omitted diagnosis with each written diagnosis and by combining the candidate omitted diagnoses with one another in pairs;
calculate the co-occurrence probability of each first diagnosis pair in the first diagnosis pair set; and
determine the diagnosis-symptom pair set from the first diagnosis pair set and the second diagnosis pair set, based on the co-occurrence probabilities of the first diagnosis pairs.
FIG. 7 is a block diagram of the hardware structure of a medical record diagnosis omission detection equipment provided in an embodiment of this application. Referring to FIG. 7, the hardware of the equipment may include at least one processor 1, at least one communication interface 2, at least one memory 3, and at least one communication bus 4.
In this embodiment, there is at least one each of processor 1, communication interface 2, memory 3, and communication bus 4. Processor 1, communication interface 2, and memory 3 communicate with one another through communication bus 4.
Processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
Memory 3 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
The memory stores a program that the processor can call, the program being configured to:
determine the medical record text to be checked;
determine the candidate omitted diagnoses corresponding to the medical record text;
determine target candidate omitted diagnoses from the candidate omitted diagnoses, the target candidate omitted diagnoses being the candidates that are semantically consistent with a written diagnosis in the medical record text, together with the candidates determined to be symptoms of a written diagnosis; and
determine the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
Optionally, for refined and extended functions of the program, see the description above.
An embodiment of this application further provides a readable storage medium that may store a program suitable for execution by a processor, the program being configured to:
determine the medical record text to be checked;
determine the candidate omitted diagnoses corresponding to the medical record text;
determine target candidate omitted diagnoses from the candidate omitted diagnoses, the target candidate omitted diagnoses being the candidates that are semantically consistent with a written diagnosis in the medical record text, together with the candidates determined to be symptoms of a written diagnosis; and
determine the candidate omitted diagnoses other than the target candidate omitted diagnoses as omitted diagnoses.
Optionally, for refined and extended functions of the program, see the description above.
Finally, it should be noted that relational terms used herein, such as "first" and "second", serve only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described progressively, and each embodiment focuses on its differences from the others. For parts that are the same or similar, the embodiments may be read with reference to each other.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311322233.9A CN117373592A (en) | 2023-10-12 | 2023-10-12 | Medical record diagnosis write omission detection method, device, equipment and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311322233.9A CN117373592A (en) | 2023-10-12 | 2023-10-12 | Medical record diagnosis write omission detection method, device, equipment and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117373592A true CN117373592A (en) | 2024-01-09 |
Family ID: 89397658
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311322233.9A Pending CN117373592A (en) | 2023-10-12 | 2023-10-12 | Medical record diagnosis write omission detection method, device, equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117373592A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7624261B1 (en) * | 2024-05-14 | 2025-01-30 | フィッティングクラウド株式会社 | Medical document creation support system, medical document creation support method, and program |
- 2023-10-12: CN202311322233.9A, patent CN117373592A, active (pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107562732B (en) | Method and system for processing electronic medical record | |
US12170133B2 (en) | Automated information extraction and enrichment in pathology report using natural language processing | |
US8793199B2 (en) | Extraction of information from clinical reports | |
CN109522551B (en) | Entity linking method and device, storage medium and electronic equipment | |
López-Úbeda et al. | COVID-19 detection in radiological text reports integrating entity recognition | |
CN110335653B (en) | Non-standard medical record analysis method based on openEHR medical record format | |
CN107578798B (en) | Method and system for processing electronic medical record | |
EP3928322A1 (en) | Automated generation of structured patient data record | |
CN112541066B (en) | Text-based structured medical report detection method and related equipment | |
US12100517B2 (en) | Generalized biomarker model | |
US11709877B2 (en) | Systems and methods for targeted annotation of data | |
CN110609910A (en) | Medical knowledge graph construction method and device, storage medium and electronic device | |
CN106844351A (en) | A kind of medical institutions towards multi-data source organize class entity recognition method and device | |
CN111477320B (en) | Construction system of treatment effect prediction model, treatment effect prediction system and terminal | |
Martinez et al. | Cross-hospital portability of information extraction of cancer staging information | |
US20170220743A1 (en) | Tracking real-time assessment of quality monitoring in endoscopy | |
CN117633209A (en) | Method and system for patient information summary | |
CN117373592A (en) | Medical record diagnosis write omission detection method, device, equipment and readable storage medium | |
KR101607672B1 (en) | Apparatus and method for permutation based pattern discovery technique in unstructured clinical documents | |
Benson et al. | Leveraging natural language processing to extract features of colorectal polyps from pathology reports for epidemiologic study | |
WO2021008601A1 (en) | Method for testing medical data | |
Yogarajan et al. | Seeing the whole patient: using multi-label medical text classification techniques to enhance predictions of medical codes | |
CN113111660A (en) | Data processing method, device, equipment and storage medium | |
CN111079420B (en) | Text recognition method and device, computer readable medium and electronic equipment | |
CN117493642B (en) | Similar electronic medical record retrieval method, device, terminal and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |