CN107680676A - A kind of gestational diabetes Forecasting Methodology based on electronic health record data-driven - Google Patents
- Publication number
- CN107680676A (application number CN201710877528.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- prediction
- gdm
- model
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
Description
Technical field
The invention relates to the field of diabetes prediction, and in particular to a method for predicting gestational diabetes mellitus driven by electronic medical record data.
Background
In disease prediction, take gestational diabetes mellitus (GDM) as an example. According to a survey by the International Diabetes Federation, GDM remains the most common complication of pregnancy even though more and more women receive prenatal examinations. It is defined as diabetes that first appears or is first diagnosed during pregnancy in a woman whose glucose metabolism before pregnancy was normal or showed latent impaired glucose tolerance. Because of the serious consequences of GDM, the medical community attaches great importance to its early diagnosis and prevention. The risks associated with GDM include type 2 diabetes in mother and child, fetal overgrowth and related short-term adverse outcomes, and long-term obesity in the offspring. Predictive diagnosis and prevention of GDM are of great concern to the maternal and child health care community, which makes GDM an important application area for health care big data. Both clinicians and pregnant women hope to recognize GDM risk at an earlier stage of pregnancy so that prevention and intervention can begin as early as possible. With the periodic collection of electronic medical records (EMR) and related clinical data, and the accumulation of cross-domain data, EMR reuse and big data analysis provide forward-looking tools for the early diagnosis and prevention of GDM.
Under traditional diagnostic practice, for example the national health industry standard Diagnosis Guidelines for Gestational Diabetes (2011), screening and diagnosis of GDM are usually performed at 24-28 weeks of gestation by an oral glucose tolerance test (OGTT). Before that point, pregnant women also undergo a large number of systematic prenatal examinations. The comprehensive, dynamic collection, use and transmission of pregnant women's health data has accumulated a large amount of EMR data in multi-source information systems. In addition, maternal and child health applications and tools based on clinical decision support systems collect large amounts of fragmented information and tracking data on maternal and child health care services, which provides a large volume of cross-domain data for GDM prediction. These applications can collect data dynamically in real time and provide personalized, precise health management services, which has become a new trend in the industry. Using the EMR for GDM risk prediction and pattern identification, and thereby reducing the high risk that GDM poses to mothers and infants, is gradually becoming an important way to improve maternal and child health.
Summary of the invention
In view of the shortcomings of existing research, the present invention provides a method for predicting gestational diabetes driven by electronic medical record data, intended to play an increasingly important role in regional medical services. Using clinical data combined with artificial intelligence and machine learning methods, it provides an intelligent decision support system for disease. This helps to reduce repeated tests, examinations, diagnosis and treatment, improves doctors' work efficiency and reduces their workload, strengthens the control of medical errors, enriches the services of regional medical information sharing platforms and increases their application value.
The invention is implemented as a method for predicting gestational diabetes driven by electronic medical record data, characterized in that it comprises the following steps:
(1) Input and ETL data cleaning module. Obtain the historical records of registered pregnant women from the EMR, complete preliminary data cleaning through extraction, transformation and loading, and complete de-identification and data quality management;
(2) Medical record coding and feature data association module. Link the spatio-temporal de-identified data through the patient identifier (ID) of the EMR system, screen feature data using clinical knowledge and experience, and generate the GDM data warehouse;
(3) EMR data preprocessing module. Apply missing-value handling, discretization and normalization to the input data;
(4) Secondary data processing module. Calibrate the classification labels and check the inclusion and exclusion criteria;
(5) Feature engineering module. Divide the data into GDM and non-GDM classes, take the disease-related clinical data as condition attributes and the class label as the decision attribute, and perform embedded feature selection;
(6) Machine learning module. Based on the selected input features, divide the full data set into training samples and test samples, select the time window and the machine learning model, and train with cross-validation to obtain the prediction algorithm;
(7) Prediction application module. Feed the electronic medical record data of undiagnosed pregnant women into the machine learning model of step (6) and infer the GDM occurrence value (or risk rate) of these women.
Specifically, the processing flow of the EMR data-driven gestational diabetes prediction method is started first: the EMR data store is accessed, the data are fed into the flow, and data cleaning is completed by the ETL module. Next, following the EMR data collection process, the prediction problem is identified and the medical record coding is associated with the feature data. De-identification methods remove private information from the patient data, data quality is checked, the de-identified data are linked by query, and the GDM data warehouse is built. In the EMR data preprocessing stage, missing values are handled and the data are discretized and normalized. The classification labels are then analyzed and calibrated and the inclusion and exclusion criteria are checked, which completes the secondary data processing. The experimental data set is divided into two sample sets, diagnosed GDM and non-GDM, and the feature engineering module is applied. Then, through the time window division and model selection module, the flow enters the global data prediction model (GDPM), the staged data prediction model (SDPM) or the weekly data prediction model (WDPM), followed by the machine-learning-based GDM prediction model and the prediction application. The application results are used as feedback to optimize the model, and the data are reused in feature engineering in subsequent rounds. Finally, the data processing flow ends.
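The preprocessing stage described above (missing-value handling, normalization and discretization) might be sketched as follows. This is a minimal illustration and not the patented implementation: median imputation, min-max scaling, the function names and the BMI cut points are all assumptions chosen for the example, and the cut points are not clinical thresholds.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, cut_points):
    """Return the index of the bin that the value falls into."""
    return sum(value >= c for c in cut_points)


# Hypothetical BMI readings with one missing entry.
bmi = [18.0, None, 22.0, 30.0]
filled = impute_median(bmi)
scaled = min_max_normalize(filled)
# Illustrative cut points only (assumption, not taken from the patent).
bins = [discretize(v, [18.5, 24.0, 28.0]) for v in filled]
```

The three helpers are kept separate so that each one can be swapped for a different strategy (for example mean imputation or equal-frequency binning) without touching the others.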
The method for predicting gestational diabetes driven by electronic medical record data according to the present invention is characterized in that prediction models are built using different ways of dividing the collected data into time windows:
(1) Global data prediction model, which infers the classification value from all EMR data earlier than the OGTT;
(2) Staged data prediction model, which infers the classification value from the EMR of the first trimester or of gestational weeks 13 to 23;
(3) Weekly data prediction model, which infers the classification value from the weekly EMR starting at gestational week 12.
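As an illustration of the three time-window schemes listed above, the following sketch filters records by gestational week. The field name `gest_week`, the model keys and the default `ogtt_week=24` are assumptions made for the example (the text only states that the OGTT is usually performed at 24-28 weeks).

```python
def select_window(records, model, week=None, ogtt_week=24):
    """Return the records inside the time window of the chosen model.

    records: list of dicts with a 'gest_week' key (hypothetical field name).
    model: 'GDPM', 'SDPM_early', 'SDPM_mid' or 'WDPM' (hypothetical keys).
    """
    # All three schemes only use data recorded before the OGTT.
    pre_ogtt = [r for r in records if r["gest_week"] < ogtt_week]
    if model == "GDPM":
        return pre_ogtt
    if model == "SDPM_early":
        return [r for r in pre_ogtt if r["gest_week"] < 13]
    if model == "SDPM_mid":
        return [r for r in pre_ogtt if r["gest_week"] >= 13]
    if model == "WDPM":
        if week is None:
            raise ValueError("WDPM needs a gestational week")
        return [r for r in pre_ogtt if r["gest_week"] == week]
    raise ValueError(f"unknown model: {model}")


# Hypothetical visit records.
visits = [{"gest_week": w} for w in (8, 12, 13, 20, 23, 25)]
global_set = select_window(visits, "GDPM")
early_set = select_window(visits, "SDPM_early")
mid_set = select_window(visits, "SDPM_mid")
week12_set = select_window(visits, "WDPM", week=12)
```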
The method for predicting gestational diabetes driven by electronic medical record data according to the present invention is characterized in that the GDM data warehouse is built by associating medical record coding with feature data:
(1) Registration data set, including registration ID, registering doctor, registration time, husband's age, husband's physical condition, husband's habits, BMI, etc.;
(2) Prenatal check-up data set, including check-up ID, weight, check-up time, gestational week, diastolic blood pressure, etc.;
(3) LIS data, including routine blood test, liver function, kidney function, electrolytes, glycated hemoglobin, blood lipids, ferritin, etc.;
(4) Medical record front page, including visit number, medical record number, diagnosis number, diagnosis type, diagnosis ICD code (doctor), diagnosis ICD code (medical record), etc.
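The association of the four data sources through a shared patient identifier could look roughly like the sketch below. The key name `patient_id`, the record fields and the nested-dictionary layout are assumptions for illustration; a real data warehouse would normally be built in a database rather than in memory.

```python
from collections import defaultdict


def build_warehouse(registration, prenatal, lis, front_page):
    """Group the four record sources under each patient ID."""
    warehouse = defaultdict(
        lambda: {"registration": None, "prenatal": [], "lis": [], "front_page": []}
    )
    # One registration record per patient is assumed.
    for row in registration:
        warehouse[row["patient_id"]]["registration"] = row
    # The other sources may hold many rows per patient.
    sources = (("prenatal", prenatal), ("lis", lis), ("front_page", front_page))
    for name, rows in sources:
        for row in rows:
            warehouse[row["patient_id"]][name].append(row)
    return dict(warehouse)


# Hypothetical rows from each source.
wh = build_warehouse(
    registration=[{"patient_id": "p1", "bmi": 21.5}],
    prenatal=[
        {"patient_id": "p1", "gest_week": 12, "weight_kg": 55.0},
        {"patient_id": "p1", "gest_week": 16, "weight_kg": 57.0},
    ],
    lis=[{"patient_id": "p1", "hba1c": 5.2}],
    front_page=[{"patient_id": "p1", "icd_code": "O24.4"}],
)
```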
Because prediction systems based on intelligent inference models play an increasingly important role in smart medical services, the present invention provides a machine-learning-based prediction framework for gestational diabetes. According to the different ways of dividing the collected data into time windows, three groups of prediction frameworks are built: the global data prediction model, the staged data prediction model and the weekly data prediction model. After the prediction problem is identified, data mining of high-dimensional electronic medical records is carried out in seven steps: input and ETL data cleaning, association of medical record coding with feature data, EMR data preprocessing, secondary data processing, feature engineering, machine learning, and prediction application. A labeled data set of confirmed diagnoses is built from clinical data and divided into two subsets, one for model training and one for testing. Prediction is performed with support vector machines, Bayesian networks, CHAID decision trees and ensemble-based hybrid models to classify GDM patterns. A regional medical application system was developed on the basis of the prediction model, and the results indicate that the machine-learning-based GDM prediction model is an effective tool for predicting gestational diabetes in advance for pregnant women in regional medical care.
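The division of the labeled data into training and testing subsets, combined with the cross-validation training mentioned in step (6), might be sketched as below. The number of folds and the random seed are assumptions, since the text does not state them, and the classifier itself (support vector machine, Bayesian network, CHAID tree or ensemble) is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


# Hypothetical example: 10 labeled samples, 5 folds.
splits = list(k_fold_indices(10, 5))
```

Each sample appears in exactly one test fold, so averaging a metric over the folds uses every labeled record once for evaluation.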
The invention has the following advantages. A set of machine-learning-based gestational diabetes mellitus (GDM) prediction models is built; through the different ways of dividing the collected data into time windows, it covers the main needs for predicting gestational diabetes in advance for pregnant women, raises the level of intelligence of clinical decision support systems, and plays an increasingly important role in regional medical services. By associating medical record coding with clinical feature data, a GDM data warehouse based on core data is built, which provides technical support for knowledge mining and data management of high-dimensional EMR data. This can improve doctors' work efficiency and reduce their workload, strengthen the control of medical errors, enrich the services of regional medical information sharing platforms and increase their application value.
Description of drawings
Figure 1: prediction of gestational diabetes driven by electronic medical record data;
Figure 2: GDM prediction framework based on secondary cleaning and machine learning;
Figure 3: global data prediction model;
Figure 4: staged data prediction model;
Figure 5: weekly data prediction model;
Figure 6: data collection process before registration in the first trimester;
Figure 7: machine-learning-based GDM prediction module;
Figure 8: schematic flowchart of a method for predicting gestational diabetes driven by electronic medical record data.
Detailed description
The present invention is described in detail below with reference to the accompanying drawings, and the technical solutions in the embodiments of the invention are described clearly and completely. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
The aim is to explore whether electronic medical record data that incorporate clinical experience help to predict GDM status in women in early pregnancy. Combining the structured and unstructured data in electronic medical records with other external network data makes it possible not only to analyze clinical data associations for the complications of gestational diabetes, but also to predict in advance a pregnant woman's risk of developing gestational diabetes mellitus (GDM). Using health care big data analysis, the invention provides a GDM prediction framework that acts earlier than OGTT diagnosis, as shown in Figure 1 (prediction of gestational diabetes driven by electronic medical record data).
The method for predicting gestational diabetes driven by electronic medical record data according to the present invention comprises the following steps:
(1) Obtain the historical records of registered pregnant women from the electronic medical record data;
(2) Obtain the GDM information determined by the OGTT and divide the pregnant women's data into confirmed GDM cases and non-GDM cases used as the reference group;
(3) Link the spatio-temporal data of the samples obtained in step (2) through the patient identifier (ID) of the electronic medical record system, and use feature selection to screen the labeled electronic medical record feature data associated with GDM;
(4) Obtain the clinical data of each sample (registration data, prenatal check-up data, LIS data, medical record front page) and their feature attributes. The disease-related clinical data serve as condition attributes and the class label serves as the classification decision attribute. Build a machine learning model with the condition attributes of each sample's electronic medical record as the input vector and the GDM occurrence value as the output;
(5) Obtain the electronic medical record feature data of undiagnosed pregnant women, feed them into the machine learning model of step (4), and infer the GDM occurrence value (or risk rate) of the women under examination.
The above EMR data-driven prediction process is refined into a GDM prediction framework based on secondary cleaning and machine learning, as shown in Figure 2:
In the model design stage, the first task is to identify the prediction problem. On this basis, the various medical data of patients with gestational diabetes are collected through multiple channels and stored in computer information systems. System hypotheses are analyzed according to the time window division, and the invention mainly comprises three groups of schemes. The laboratory information system (LIS) is used to analyze the test data sent by the laboratory instruments, generate test reports, and store them in the database over the network. Clinical data extraction is based mainly on the following four points: ① regularization of qualitative and quantitative data; ② diabetes that already existed before pregnancy, that is, pre-gestational diabetes mellitus (PGDM); ③ judgment based on clinical knowledge, such as the activated partial thromboplastin time (APTT) interval; ④ other data calibration and preprocessing.
In the primary processing stage, data cleaning tools such as Extract-Transform-Load (ETL) are mainly used. After the data are processed, iterative inference over time windows is performed. Once the data warehouse is determined, clinical data extraction and secondary processing are carried out. In ICD-10, the diagnosis code for gestational diabetes is O24.4. ICD-10 is used to divide the data into medical record front page, test items, examination results and so on, and the electronic medical record is established. The data from secondary processing then undergo feature engineering, which involves two tasks: automatic selection by algorithm, and selection of a common feature set with expert participation. The last stage is model mining: through knowledge discovery, optimized prediction and data analysis, an appropriate model is built and the collected information is stored.
In the prediction framework construction stage, three groups of prediction models are formed on the basis of a multi-scale time window division, because the choice of time scale determines the granularity of the data used for machine learning. Following the multi-scale time window division, with time scales ranked as large, medium and small, the data mining scheme of the invention mainly comprises three groups of models: the global data prediction model, the staged data prediction model and the weekly data prediction model.
Table 1: three groups of prediction models based on the multi-scale time window division
Note: training data labels (positive and negative values) are derived from a discrete-value mapping of the ICD diagnosis codes based on the OGTT.
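The mapping from ICD diagnosis codes to positive and negative training labels mentioned in the note might be sketched as follows. The code O24.4 comes from the text; treating O24.0 to O24.3 as pre-gestational diabetes (PGDM) to be excluded follows the WHO ICD-10 layout and is an assumption about how the exclusion criterion would be applied, as are the function and constant names.

```python
GDM_PREFIX = "O24.4"
# Assumption: WHO ICD-10 codes for diabetes that pre-dates the pregnancy.
PGDM_PREFIXES = ("O24.0", "O24.1", "O24.2", "O24.3")


def label_from_icd(codes):
    """Map diagnosis codes to 1 (GDM), 0 (non-GDM) or None (excluded)."""
    normalized = [c.strip().upper() for c in codes]
    # Pre-gestational diabetes fails the inclusion criteria.
    if any(c.startswith(PGDM_PREFIXES) for c in normalized):
        return None
    if any(c.startswith(GDM_PREFIX) for c in normalized):
        return 1
    return 0


# Hypothetical diagnosis lists taken from medical record front pages.
positive = label_from_icd(["O24.4"])
negative = label_from_icd(["O80"])
excluded = label_from_icd([" o24.1 ", "O24.4"])
```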
In addition, time windows at other scales, such as days, can be chosen; for reasons of space they are not described further here.
The three groups of models, namely the global data prediction model, the staged data prediction model and the weekly data prediction model, are introduced in turn below.
Global data prediction model (GDPM): the Global Data Prediction Model extracts all maternal electronic medical record data earlier than the OGTT as a single data set for machine learning and prediction. The global data prediction model is shown in Figure 3. In this approach the data are the electronic medical record data collected across the whole pregnancy period prior to the OGTT, which form a sparse data set. The prediction model identifies GDM in early pregnancy and predicts whether GDM will occur later. The output is the classification value (and its probability) for the global data, that is, the gestational diabetes prediction for the whole pregnancy period.
Staged data prediction model (SDPM): the Staged Data Prediction Model divides the maternal electronic medical record data earlier than the OGTT into a first-trimester data set (before gestational week 13) and a second-trimester data set (from week 13 until the OGTT), and predicts on each of the two data sets. The staged data prediction model is shown in Figure 4. It has two sub-modes. In the first, the input is the first-trimester data set (before gestational week 13), including the first-trimester registration data, prenatal check-up data, LIS data and medical record front page, and the output is the first-trimester predicted classification value (and its probability). In the second, the input is the second-trimester data set (from week 13 until the OGTT), including the second-trimester registration data, prenatal check-up data, LIS data, medical record front page and so on, and the output is the second-trimester predicted classification value (and its probability).
Weekly data prediction model (WDPM): the Weekly Data Prediction Model divides the maternal electronic medical record data earlier than the OGTT into multiple data sets according to a fixed time window (such as the gestational week) and predicts on each of them. The weekly data prediction model is shown in Figure 5. In this group of models the inputs are the data sets for gestational weeks 12, 13, 14, ..., 23, each including that week's prenatal check-up data, LIS data, medical record front page and so on. Although the gestational weeks are equally spaced, the amount of data per week is unevenly distributed because of pregnant women's actual needs and behavior. Some weeks have a large amount of data, such as week 12, when registration takes place, while other weeks have little data because few women attend prenatal check-ups in those weeks.
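The weekly partition and the uneven per-week data volume described above can be illustrated with a short sketch. The field name `gest_week` and the default week range 12 to 23 (taken from the weeks listed in the text) are the only inputs; the sample records are hypothetical.

```python
from collections import defaultdict


def split_by_week(records, first_week=12, last_week=23):
    """Partition records into one data set per gestational week."""
    weekly = defaultdict(list)
    for r in records:
        if first_week <= r["gest_week"] <= last_week:
            weekly[r["gest_week"]].append(r)
    # Return every week in the range, including weeks with no records.
    return {w: weekly.get(w, []) for w in range(first_week, last_week + 1)}


# Hypothetical check-up records: many at week 12 (registration), few later.
checkups = [{"gest_week": w} for w in (12, 12, 12, 13, 16, 30)]
weekly_sets = split_by_week(checkups)
counts = {w: len(rows) for w, rows in weekly_sets.items()}
```

Inspecting `counts` before training shows which weekly data sets are large enough to support a model and which are too sparse.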
Taking clinical knowledge and the opinions of domain experts into account, the statistical distribution of the data in the GDPM scheme is uneven, which makes it unsuitable for subsequent data processing. In the SDPM scheme the time span of the data is too long and the physical signs of pregnant women vary widely, so it is also unsuitable for further processing. Only relatively concentrated data collected in early pregnancy meet the requirements and can be used for further analysis, which corresponds to the WDPM scheme.
When using EMR data or carrying out clinical data mining, in addition to mastering clinical guidelines and clinical pathways, the following important sub-processes must be completed:
Input and ETL data cleaning: the input data (or data collection process) follow the traditional outpatient service flow, which is used to analyze how data are collected and information is assembled during the registration period. The data collection process before registration in the first trimester is shown in Figure 6. In the first trimester, that is, weeks 0 to 12 of pregnancy, no registration data exist yet. After a woman confirms her pregnancy at 0 to 5 weeks, she may or may not choose to have a gynecological examination. For women who do not have a gynecological examination, the obstetrician-gynecologist registers them at 12 weeks of pregnancy. For women who do, the steps are as follows. First, the obstetrician-gynecologist performs a B-mode ultrasound examination. If the result shows an ectopic pregnancy, the pregnancy must be terminated; if not, a gynecological rectovaginal (triple) examination is performed. Second, the examination checks whether uterine fibroids affect fetal development. If they do, a surgeon operates to remove the fibroids; if not, routine gynecological examinations are performed, including a vaginal discharge test, a cervical smear and a speculum examination. Next, it is determined whether the woman has a bacterial infection. If she does, treatment by a physician is arranged immediately; if not, chorionic villus sampling can be performed, although this test carries risk. Then the fetus is checked for serious abnormalities. If the results show a serious abnormality, the pregnancy must be terminated; if not, a physician examines the woman's heart, liver and kidney function to determine whether any other function is abnormal. If there is an abnormality, the woman should seek treatment; if there is none, the obstetrician-gynecologist registers her at 12 weeks of pregnancy.
In the data flow of the glucose tolerance test at gestational weeks 24-28, the pregnant woman first collects the test order, pays the fee and goes to the laboratory for a routine blood test, blood typing and antibody tests. She then provides a urine sample for routine urinalysis, followed by a vaginal discharge test. Finally the 24-28 week glucose tolerance test is performed: endocrinology staff first draw fasting blood; the woman then drinks the glucose solution provided by the hospital, which takes about 5 minutes; one hour later staff draw blood again; and after another hour they draw blood for the third time. The blood glucose concentrations are then measured, which takes one day, after which the woman receives the results.
In the input and ETL data cleaning phase, the main tools are extract-transform-load (ETL) data cleaning tools. After the data are processed, iterative time-window reasoning is performed. The data types are determined first, followed by clinical data extraction. The data cleaning methods used during processing mainly include: data standardization, lookup-table processing (removing invalid data), fuzzy dynamic matching, basic common-sense logic checks, domain common-sense checks, and data correlation verification. These methods handle missing data, detect similar or duplicate records, handle abnormal data, detect logical errors, and resolve inconsistent data.
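A few of the cleaning methods listed above (standardization, lookup-table removal of invalid values, a domain range check, and duplicate detection) can be illustrated with a minimal Python sketch. The field names (`patient_id`, `gest_week`, `weight_kg`, `dbp`), the placeholder tokens, and the plausibility ranges are illustrative assumptions; the patent does not specify them.

```python
# Hypothetical lookup table of placeholder strings treated as missing (assumption).
INVALID_TOKENS = {"", "-", "n/a", "null", "unknown", "9999"}

# Hypothetical plausibility ranges for the domain check (assumption, not from the patent).
RANGES = {"gest_week": (0, 45), "weight_kg": (30, 200), "dbp": (30, 150)}


def standardize(key, value):
    """Trim a raw field, map placeholder tokens to None, and parse numeric fields."""
    text = str(value).strip()
    if text.lower() in INVALID_TOKENS:
        return None
    if key in RANGES:
        try:
            return float(text)
        except ValueError:
            return None  # an unparseable number is treated as missing
    return text


def clean_records(raw_records):
    """Return (clean, rejected) after standardization, range checks, and de-duplication."""
    clean, rejected, seen = [], [], set()
    for raw in raw_records:
        rec = {key: standardize(key, val) for key, val in raw.items()}
        # Domain check: present numeric fields must fall inside their plausible range.
        out_of_range = [
            key for key, (low, high) in RANGES.items()
            if rec.get(key) is not None and not low <= rec[key] <= high
        ]
        # Duplicate check: the same patient at the same gestational week is a repeat.
        dedup_key = (rec.get("patient_id"), rec.get("gest_week"))
        if out_of_range or dedup_key in seen:
            rejected.append(rec)
            continue
        seen.add(dedup_key)
        clean.append(rec)
    return clean, rejected
```

Rejected rows are returned rather than silently dropped so that they can be reviewed; fuzzy matching and correlation checks from the list above are not covered by this sketch.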
Medical record coding and feature data association: De-identified data are linked across time and location through the patient identification code (ID) of the EMR system, and feature data are screened using clinical knowledge and experience to generate the GDM data warehouse. The data warehouse used for GDM machine learning draws mainly on the filing records of inpatients (such as pregnant women) and outpatients, including: filing data, prenatal examination details, LIS (laboratory information system) data, and the medical record front page. From the filing data, the selected fields include filing ID, filing doctor, filing time, husband's age, husband's health condition, husband's habits, BMI, and so on. From the prenatal examination details: registration number, prenatal examination ID, weight, examination time, gestational week, diastolic blood pressure, and so on. From the LIS data: complete blood count, liver function, kidney function, electrolytes, glycated hemoglobin, blood lipids, ferritin, and so on. From the medical record front page: visit number, medical record number, diagnosis number, diagnosis type, diagnosis ICD code (physician), and diagnosis ICD code (medical records department).
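The ID-based linkage described above can be sketched as follows. This is a minimal illustration under stated assumptions: the salted SHA-256 pseudonym, the `SALT` value, and the row layouts are hypothetical, and hashing an ID is only one possible de-identification step, not necessarily the one used in the patent.

```python
import hashlib
from collections import defaultdict

SALT = b"example-salt"  # hypothetical secret; a real system would manage it securely


def pseudonym(patient_id):
    """Replace the EMR patient ID with a salted SHA-256 digest (illustrative only)."""
    return hashlib.sha256(SALT + patient_id.encode("utf-8")).hexdigest()[:16]


def strip_id(row):
    """Copy a row without its raw patient ID."""
    return {k: v for k, v in row.items() if k != "patient_id"}


def build_warehouse(filing, exams, lis):
    """Link filing, prenatal-exam, and LIS rows that share a patient ID."""
    warehouse = defaultdict(lambda: {"filing": None, "exams": [], "lis": []})
    for row in filing:
        warehouse[pseudonym(row["patient_id"])]["filing"] = strip_id(row)
    for source, rows in (("exams", exams), ("lis", lis)):
        for row in rows:
            warehouse[pseudonym(row["patient_id"])][source].append(strip_id(row))
    # Keep only patients with a filing record; order visits by gestational week.
    linked = {}
    for key, rec in warehouse.items():
        if rec["filing"] is None:
            continue
        rec["exams"].sort(key=lambda r: r["gest_week"])
        linked[key] = rec
    return linked
```

Each linked record then holds one filing row plus that patient's time-ordered examinations and laboratory results, which is the shape the later time-window steps need.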
EMR prediction data preprocessing: The input data undergo missing-value handling, discretization, and normalization. The data quality problems encountered during data cleaning, with examples, are shown in Table 2.
Table 2. Data quality problems and examples in data cleaning
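The three preprocessing operations named above can be sketched in a few lines of Python. Median imputation, min-max scaling, and the BMI cut points are assumptions chosen for illustration; the patent names the operations but not the specific techniques or bin edges.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    span = high - low
    return [0.0 if span == 0 else (v - low) / span for v in values]


# Hypothetical BMI cut points in kg/m^2 (assumption; not given in the patent).
BMI_BINS = [(18.5, "underweight"), (24.0, "normal"), (28.0, "overweight")]


def discretize_bmi(bmi):
    """Map a continuous BMI value to a category using the cut points above."""
    for upper, label in BMI_BINS:
        if bmi < upper:
            return label
    return "obese"
```

In practice the imputation and scaling parameters would be estimated on the training samples only and then reused on the test samples, so that no information leaks from the test set.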
Secondary data processing: In ICD-10, the diagnosis code for gestational diabetes mellitus is O24.4. Classification labels are calibrated and the inclusion and exclusion criteria are checked. Using ICD-10, the data are divided into the medical record front page, laboratory test items, examination results, and so on, and the electronic medical record is established.
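Label calibration from ICD-10 codes might look like the sketch below. The code O24.4 comes from the text; the exclusion rule (dropping patients with the pre-existing diabetes codes O24.0 to O24.3) and the function names are assumptions for illustration, since the patent does not list its inclusion and exclusion criteria.

```python
GDM_CODE = "O24.4"  # ICD-10 code for gestational diabetes, as stated in the text

# Hypothetical exclusion rule: pre-existing diabetes in pregnancy (assumption).
PREEXISTING_PREFIXES = ("O24.0", "O24.1", "O24.2", "O24.3")


def normalize_code(code):
    """Upper-case and trim an ICD-10 code; restore the dot if it was stored without one."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def label_patient(diagnosis_codes):
    """Return 1 for GDM, 0 for non-GDM, or None if the patient is excluded."""
    codes = [normalize_code(c) for c in diagnosis_codes]
    if any(c.startswith(PREEXISTING_PREFIXES) for c in codes):
        return None
    return 1 if any(c.startswith(GDM_CODE) for c in codes) else 0
```

Prefix matching is used so that locally extended codes beginning with O24.4 are still recognized as GDM.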
Feature engineering: In the clinical data set statistics table produced by data cleaning and secondary processing, the data sources are divided into filing data, prenatal examination details, LIS data, the medical record front page, and so on. The secondarily processed data then undergo feature engineering. The data are divided into two classes, GDM and non-GDM; the clinical data associated with the disease serve as condition attributes and the class label as the decision attribute, and embedded feature selection is performed. This involves two tasks: automatic selection by the algorithm, and expert-assisted selection of the shared feature set, which stores the extracted feature information.
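Embedded feature selection means that features are selected as a side effect of training the model itself. One common instance is L1-penalized logistic regression, sketched below with plain proximal gradient descent. The choice of L1 regularization, the hyperparameters (`lam`, `lr`, `epochs`), and the feature names are assumptions; the patent does not say which embedded method it uses.

```python
import math


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; values within [-amount, amount] become 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def l1_logistic_select(rows, labels, names, lam=0.05, lr=0.5, epochs=200):
    """Fit L1-penalized logistic regression and return (selected names, weights).

    Features whose weight is driven to exactly zero are dropped, so the
    selection happens inside model training (embedded selection).
    """
    n, d = len(rows), len(names)
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += error
            for j in range(d):
                grad_w[j] += error * x[j]
        bias -= lr * grad_b / n
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        weights = [
            soft_threshold(w - lr * g / n, lr * lam) for w, g in zip(weights, grad_w)
        ]
    selected = [name for name, w in zip(names, weights) if w != 0.0]
    return selected, weights
```

The algorithm's selection could then be reviewed by clinical experts, matching the two tasks (automatic and expert-assisted selection) described above.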
Machine learning (prediction model selection): Drawing on clinical knowledge and domain expert opinion, and taking into account the main characteristics of the GDPM, SDPM, and WDPM schemes, such as the time span of the data and differences in maternal signs, a machine-learning-based GDM prediction model is selected. An appropriate data mining model is built through knowledge discovery, optimized prediction, and data analysis. Based on the selected input features, the full data set is divided into training and test samples, a time window and a machine learning model are selected, and cross-validation training is carried out to obtain the prediction algorithm. GDM prediction experiments use a Bayesian network, a support vector machine, a CHAID decision tree, and an ensemble hybrid model, as shown in Figure 7 (the machine-learning-based GDM prediction module). The trained learning algorithm provides feedback that improves parameter learning and model tuning. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) serve as evaluation metrics for a comprehensive assessment of model performance, and the results are fed back to control the prediction model and algorithm.
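The evaluation loop described above (split, train, score with AUC) can be sketched independently of any particular classifier. In this sketch `fit` and `predict` are hypothetical callables standing in for whichever model is chosen (Bayesian network, SVM, CHAID tree, or an ensemble); the stratified fold assignment and the number of folds are also assumptions.

```python
def auc(labels, scores):
    """Area under the ROC curve: the probability that a random positive sample
    scores higher than a random negative one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def cross_validate(rows, labels, k, fit, predict):
    """Train on k-1 folds, score the held-out fold, and return the per-fold AUCs."""
    results = []
    for held_out in stratified_folds(labels, k):
        test_idx = set(held_out)
        train_x = [x for i, x in enumerate(rows) if i not in test_idx]
        train_y = [y for i, y in enumerate(labels) if i not in test_idx]
        model = fit(train_x, train_y)
        scores = [predict(model, rows[i]) for i in held_out]
        results.append(auc([labels[i] for i in held_out], scores))
    return results
```

Stratifying the folds matters here because GDM cases are the minority class; without it, a fold could contain no positive samples and its AUC would be undefined.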
Prediction application: The electronic medical record data of pregnant women who have not yet been diagnosed are fed into the machine learning model to infer their GDM occurrence value (or risk rate). The method performs GDM data mining on high-dimensional EHRs with imbalanced positive and negative samples, builds a core data set from the feature dimensions of the filing data and the ICD labels of OGTT diagnoses, and identifies GDM in early pregnancy through the prediction model. The filing sample data set is divided into GDM and non-GDM (nGDM) subsets for model training and testing. Prenatal examination data recorded at filing in early pregnancy are used to predict later GDM occurrence, earlier than the usual OGTT diagnosis. Specifically, the global data prediction model yields a result slightly earlier than the OGTT diagnosis; the staged data prediction model moves the prediction forward to the first trimester or to the second trimester before the OGTT; and the weekly data prediction model moves the GDM risk prediction forward to each gestational week.
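The three schemes differ mainly in which visits are allowed as model input. The sketch below shows one way to express that as a time-window filter. The week boundaries (first trimester ending at week 12, OGTT screening starting at week 24) are taken from the surrounding text, but the exact window definitions, the scheme labels `SDPM-early` and `SDPM-mid`, and the cumulative reading of the weekly scheme are assumptions.

```python
# Week boundaries taken from the text: first trimester = weeks 0-12,
# OGTT screening = weeks 24-28. The window definitions below are assumptions.
FIRST_TRIMESTER_END = 12
OGTT_START = 24


def select_window(visits, scheme, week=None):
    """Return the visits that a given prediction scheme may use as input.

    scheme: "GDPM" (all visits before the OGTT), "SDPM-early" (first trimester),
    "SDPM-mid" (second trimester before the OGTT), or "WDPM" (visits up to `week`).
    """
    if scheme == "GDPM":
        low, high = 0, OGTT_START - 1
    elif scheme == "SDPM-early":
        low, high = 0, FIRST_TRIMESTER_END
    elif scheme == "SDPM-mid":
        low, high = FIRST_TRIMESTER_END + 1, OGTT_START - 1
    elif scheme == "WDPM":
        if week is None:
            raise ValueError("WDPM needs a gestational week")
        low, high = 0, min(week, OGTT_START - 1)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return [v for v in visits if low <= v["gest_week"] <= high]
```

Every window stops before week 24 so that the OGTT result itself, which defines the label, never appears among the model inputs.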
This method provides an important prospective tool for the early diagnosis and prevention of GDM. The process modules of the GDM prediction method driven by electronic medical record data are shown in Figure 8. GDM risk prediction and pattern recognition are realized through EHR data. Predicting with the most accurate model gives a result substantially earlier than GDM screening and diagnosis at 24 to 28 gestational weeks, so that pregnant women can be prompted in time to take up exercise, a healthy diet plan, and so on.
Application of GDM prediction in maternal and child medical services:
The GDM prediction system provides an important platform for storing and analyzing regional maternal and child health big data. It offers highly integrated system functions with an open architecture, meets the needs of different levels through flexibly configured supervision topics, and can provide strong regulatory support at the national, provincial, and municipal levels. An intelligent maternal and child health application built on machine-learning-based GDM prediction helps pregnant and postpartum women understand the dangers of high-risk pregnancy, so that they take part in systematic pregnancy and perinatal care of their own accord. It gives targeted prompts to engage in physical exercise, healthy diet plans, and so on, reducing the incidence of abnormal perinatal pregnancy and perinatal morbidity and mortality, and protecting the lives of women and children. By providing health information services, the application improves individual health management for women and children.
Summary: Predicting gestational diabetes is key to the early diagnosis and prevention of related diseases in pregnant women. The data-driven GDM prediction model of this invention provides an effective method for early prediction. Three prediction frameworks are constructed according to different time-window divisions of the collected data. Machine learning on prenatal examination data predicts later GDM occurrence and, compared with the usual timing of OGTT diagnosis, brings the prediction of GDM risk forward to varying degrees. The invention provides an important prospective tool for the early diagnosis and prevention of GDM; follow-up work will improve the accuracy of the prediction method and provide individualized medical services.
The above description of the disclosed embodiments enables those skilled in the art to make or use the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (1)
1. A method for predicting gestational diabetes mellitus driven by electronic medical record data, characterized in that it is implemented by constructing the following modules:
(1) Input and ETL data cleaning module: obtains the historical filing data of pregnant women from the EMR, completes preliminary data cleaning through extraction, transformation, and loading, and completes de-identification and data quality management;
(2) Medical record coding and feature data association module: links de-identified data across time and location through the patient identification code (ID) of the EMR system, screens feature data using clinical knowledge and experience, and generates the GDM data warehouse;
(3) EMR data preprocessing module: performs missing-value handling, discretization, and normalization on the input data;
(4) Secondary data processing module: calibrates the classification labels and completes the inclusion and exclusion criteria check;
(5) Feature engineering module: divides the data into two classes, GDM and non-GDM, takes the clinical data associated with the disease as condition attributes and the class label as the decision attribute, and performs embedded feature selection;
(6) Machine learning module: based on the selected input features, divides the full data set into training samples and test samples, selects a time window and a machine learning model, and carries out cross-validation training to obtain the prediction algorithm;
(7) Prediction application module: feeds the electronic medical record data of undiagnosed pregnant women into the machine learning model of step (6) and infers their GDM occurrence value (or risk rate).
In implementation, the process of the electronic-medical-record-driven gestational diabetes prediction method is first started, the electronic medical record data storage is accessed, and the data are fed into the workflow; the input and ETL data cleaning module completes the data cleaning work. Then, in combination with the electronic medical record data collection process, the prediction problem is identified and the medical record coding and feature data association is completed; de-identification techniques remove the private information in the patient data, data quality is checked, the de-identified data module for query and linkage is realized, and the GDM data warehouse is built. In the EMR data preprocessing stage, the EMR data preprocessing module handles missing values and performs discretization and normalization. Next, the secondary data processing module analyzes and calibrates the classification labels and completes the inclusion and exclusion criteria check of the electronic medical record data, finishing the secondary data processing. The experimental data set is divided into samples, comprising two data sets according to whether GDM was diagnosed, thereby realizing the feature engineering module. Then, through the time window division and model selection module, the process enters the global data prediction model (GDPM), the staged data prediction model (SDPM), or the weekly data prediction model (WDPM), then the machine-learning-based GDM prediction model, and thus the prediction application. The application results are used as feedback control to optimize the model, and data reuse is realized in the feature engineering of subsequent stages. Finally, the data processing flow ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710877528.0A CN107680676B (en) | 2017-09-26 | 2017-09-26 | A data-driven prediction method for gestational diabetes mellitus based on electronic medical records |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107680676A (en) | 2018-02-09 |
CN107680676B (en) | 2021-04-27 |
Family
ID=61138163
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201710877528.0A | Active | CN107680676B (en) | 2017-09-26 | 2017-09-26 | A data-driven prediction method for gestational diabetes mellitus based on electronic medical records |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107680676B (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777909A (en) * | 2016-11-28 | 2017-05-31 | 深圳市人民医院 | Gestational period health risk assessment system |
CN106778042A (en) * | 2017-01-26 | 2017-05-31 | 中电科软件信息服务有限公司 | Cardio-cerebral vascular disease patient similarity analysis method and system |
CN106874663A (en) * | 2017-01-26 | 2017-06-20 | 中电科软件信息服务有限公司 | Cardiovascular and cerebrovascular disease Risk Forecast Method and system |
Also Published As
Publication number | Publication date |
---|---|
CN107680676B (en) | 2021-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||