CN111899888A - Gynecological tumor disease risk prediction visualization system - Google Patents

Gynecological tumor disease risk prediction visualization system

Info

Publication number
CN111899888A
Authority
CN
China
Prior art keywords
gynecological
disease
risk
prediction
risk factors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010687315.3A
Other languages
Chinese (zh)
Inventor
薛付忠
季晓康
丁荔洁
王永超
杨帆
刘聪聪
陈晓璐
冯一平
王博洁
王睿
朱俊奉
刘真
肖鹏
马官慧
韩君铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangping Medical Health Co ltd
Shandong University
Sunshine Insurance Group Co Ltd
Original Assignee
Kangping Medical Health Co ltd
Shandong University
Sunshine Insurance Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangping Medical Health Co ltd, Shandong University, Sunshine Insurance Group Co Ltd
Priority to CN202010687315.3A
Publication of CN111899888A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Accounting & Taxation (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Finance (AREA)
  • Epidemiology (AREA)
  • Development Economics (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a visualization system for gynecological tumor disease risk prediction. The system comprises: a risk prediction model construction module, which obtains gynecological tumor related disease variables from gynecological cases, performs correlation analysis between those variables and gynecological tumor events, and constructs a gynecological tumor risk prediction model from the screened risk factors; a gynecological tumor probability prediction module, which receives a disease risk prediction request, retrieves the relevant historical disease data cohort, and obtains a predicted probability of onset from the prediction model; and a health report generation module, which generates a visual report from the predicted probability of onset, the user's acquired physiological indicator information, the risk factors related to the user, and the contribution rate of each risk factor. The system predicts the probability of developing a gynecological tumor from the subject's physiological indicators, shows which indicators determine that probability, and, by visualizing the diagnostic report, displays the data relationships among the indicators intuitively.

Description

A visualization system for gynecological tumor disease risk prediction

Technical field

The invention belongs to the technical field of medical big data processing, and in particular relates to a visualization system for gynecological tumor disease risk prediction.

Background

The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

Gynecological tumor diseases include ovarian cancer, cervical cancer, breast cancer, endometrial cancer and others. Several indicators are currently used clinically for gynecological tumor risk prediction. For ovarian cancer, for example, cancer antigen CA125 is the most commonly used tumor marker for monitoring. CA125 is present in epithelial gynecological tissues and in patient serum; it is mainly used to assist in diagnosing malignant serous ovarian cancer and epithelial ovarian cancer, and it also serves as an indicator for observing treatment efficacy after ovarian cancer surgery and chemotherapy. However, the inventors found that CA125 is not sensitive for detecting early-stage ovarian cancer. Another indicator, human epididymis protein 4 (HE4), is highly expressed in ovarian cancer tissue and patient serum and is expressed at low levels or not at all in normal ovarian tissue and benign tumors. The inventors found, however, that applying this method to large-scale testing of serum samples runs into problems of quality control and sample preparation, and that joint testing with CA125 raises the problem of how to define the cutoff values, both for HE4 itself and for the two markers used together.

Furthermore, the Gail model and the Claus model are commonly used when constructing prediction models. For breast cancer prediction, for example, the Gail model is the world's first breast cancer risk prediction model and incorporates four risk factors: age at menarche, number of previous breast biopsies, age at first live birth, and number of first-degree relatives (mother or sisters) with breast cancer. Because the model was built on data from white American women, it may have ethnic limitations. In addition, the Gail model ignores the age at which first-degree relatives were diagnosed with breast cancer, breast cancer in second-degree relatives, and the family history of breast cancer.

The Claus model is mainly used to evaluate women with a family history of breast cancer. The risk factors it includes are age, the number of relatives with breast cancer, and their age at onset. The model is not applicable to women without a family history of breast cancer, and its use in other ethnic groups still needs further validation.

In summary, very few gynecological tumor risk prediction models have been established so far, and the indicators used to build them are mainly obtained from clinical experience and published literature. The selection of these indicators is highly subjective, and existing models built for specific populations are not generally applicable. Moreover, the methods described above, whether diagnosis based on markers or image recognition on extracted pathological images, all diagnose whether a gynecological tumor is already present. In the pre-onset stage, when a healthy person undergoes a routine physical examination, these methods alone cannot reveal a latent probability of onset, and an abnormality in a single examination indicator cannot effectively determine whether such a probability exists.

Second, diagnostic results are currently mostly delivered to users as paper physical examination reports, or as spreadsheets that list the indicators of the diagnostic process in tables. Presenting data as static tables degrades the user experience and lacks intuitiveness, whereas data visualization can provide dynamic display. Tables also cannot show the data relationships among the indicators intuitively, and visualization of prediction results has not been widely adopted.

In addition, in the insurance field, premiums are set for customers according to age and sex, so pricing is uniform and not tailored to the individual. In the traditional insurance industry, the risk models used to assess users' health are built on conventional industry experience; they can neither accurately obtain and assess customer health information nor update it in real time, and they cannot rule out false information. As a result, customer-facing insurance products suffer from problems such as uniform pricing and simplified underwriting.

Summary of the invention

To overcome the above deficiencies of the prior art, the present invention provides a visualization system for gynecological tumor disease risk prediction. The system predicts the probability of developing a gynecological tumor from the subject's physiological indicators, shows which indicators determine that probability, and, by visualizing the diagnostic report, displays the data relationships among the indicators intuitively.

To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:

A visualization system for gynecological tumor disease risk prediction, comprising:

a risk prediction model construction module, which obtains relevant disease variables from cases, performs correlation analysis between them and the occurrence of gynecological tumor events, screens out risk factors, and constructs a gynecological tumor disease risk prediction model from the screened risk factors;

a gynecological tumor disease probability prediction module, which receives an onset risk prediction request, retrieves the relevant historical disease data cohort, and obtains a predicted probability of onset from the prediction model;

a health report generation module, which generates a visual report from the predicted probability of onset, the user's acquired physiological indicator information, the risk factors related to the user, and the contribution rate of each risk factor.

In further embodiments, an electronic device is provided, comprising a memory, a processor, and computer instructions stored in the memory and run on the processor. When run by the processor, the computer instructions perform the following steps:

obtaining relevant disease variables from gynecological tumor cases, performing correlation analysis between them and tumor events, and screening out risk factors; constructing a gynecological tumor disease risk prediction model from the screened risk factors;

receiving an onset risk prediction request, retrieving the relevant historical disease data cohort, and obtaining a predicted probability of gynecological tumor onset from the prediction model;

generating a visual report from the predicted probability of onset, the user's acquired physiological indicator information, the risk factors related to the user, and the contribution rate of each risk factor.

In further embodiments, a computer-readable storage medium is provided for storing computer instructions. When executed by a processor, the computer instructions perform the following steps:

obtaining gynecological tumor related variables from gynecological tumor cases, performing correlation analysis between them and gynecological tumor events, and screening out risk factors; constructing a gynecological tumor disease risk prediction model from the screened risk factors;

receiving an onset risk prediction request, retrieving the relevant historical disease data cohort, and obtaining a predicted probability of gynecological tumor onset from the prediction model;

generating a visual report from the predicted probability of onset, the user's acquired physiological indicator information, the risk factors related to the user, and the contribution rate of each risk factor.

One or more of the above technical solutions have the following beneficial effects:

The present invention uses data visualization, summarizing complex data in charts so that users can understand the data relationships clearly and intuitively. With reports produced by big-data visualization tools, complex information can be conveyed in compact graphics, sometimes in a single graphic, and interactive elements together with newer visualization tools such as heat maps and fever charts can explain many different data relationships.

In insurance applications, an insurer can use the insured person's disease risk output by the system to supplement and support traditional underwriting, accurately screening out high-risk customers while improving the efficiency of manual underwriting. Population-level disease risk predictions can also give insurers more accurate disease incidence data for designing future products. This makes it possible to design differentiated-rate insurance products for individuals with different health levels, so that customers receive more reasonable rates and more comprehensive coverage, while insurers are supported in innovating health insurance products and attracting more high-quality customers.

The gynecological tumor disease prediction model of the present invention is based on disease-related risk factors rather than on markers or image processing. Existing models do not take the influence of disease risk factors into account; they cannot reveal a latent probability of onset from normal physiological indicators, and an abnormality in a single examination indicator cannot effectively determine whether such a probability exists. By modeling on disease risk factors, the present invention makes it easier to identify high-risk individuals, to intervene or treat early, and to improve quality of life. The predictive ability and the validity of the prediction model are both high, which gives it practical guiding value.

The present invention is based on a disease big-data cohort and uses data mining methods such as correlation analysis to mine the risk factors related to gynecological tumors thoroughly, which largely offsets the subjectivity of purely manual screening. With the support of disease big data, risk factors are not overlooked and the generality of the subsequent prediction model is ensured.

Brief description of the drawings

The accompanying drawings, which form a part of the present invention, are provided for further understanding of the invention. The exemplary embodiments of the invention and their descriptions are used to explain the invention and do not constitute an improper limitation of it.

FIG. 1 is a functional architecture diagram of the gynecological tumor risk prediction visualization system provided in Embodiment 1 of the present invention;

FIG. 2 is a flowchart of the data standardization method provided in Embodiment 1 of the present invention.

Detailed description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should be noted that the terminology used herein is for describing specific embodiments only and is not intended to limit the exemplary embodiments of the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.

Where no conflict arises, the embodiments of the present invention and the features of the embodiments may be combined with one another.

Embodiment 1

As shown in FIG. 1, this embodiment discloses a visualization system for gynecological tumor disease risk prediction that includes a cloud platform. The cloud platform comprises:

a risk prediction model construction module, which obtains gynecological tumor related disease variables from gynecological cases, performs correlation analysis between them and gynecological tumor events, screens out risk factors, and constructs a gynecological tumor risk prediction model from the screened risk factors;

a gynecological tumor probability prediction module, which receives an onset risk prediction request, retrieves the relevant historical disease data cohort, and obtains a predicted probability of onset from the prediction model;

a health report generation module, which generates a visual report from the predicted probability of onset, the user's acquired physiological indicator information, the risk factors related to the user, and the contribution rate of each risk factor.

In the risk prediction model construction module:

The disease big-data cohort is retrieved from the distributed database system, specifically:

Step 1.1: According to preset disease-related fields, find the data tables in the distributed database system that contain these fields;

Step 1.2: From the data tables found, extract fields such as ID number, disease, disease code, and onset time, and record the data source of each disease record, for example the source city, the source data table, and the record ID within that table, to generate the disease big-data cohort.
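Steps 1.1 and 1.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the column names in `REQUIRED_FIELDS`, the in-memory `catalog` structure, and the sample rows are all hypothetical, since the source only names the kinds of fields to extract.

```python
from dataclasses import dataclass

# Hypothetical column names: the source lists ID number, disease, disease code
# and onset time, but does not say how the columns are actually named.
REQUIRED_FIELDS = ("id_number", "disease", "disease_code", "onset_time")


@dataclass(frozen=True)
class CohortRecord:
    id_number: str
    disease: str
    disease_code: str
    onset_time: str
    source_city: str      # provenance: city the record came from
    source_table: str     # provenance: table the record came from
    source_row_id: int    # provenance: record ID inside that table


def find_tables(catalog):
    """Step 1.1: keep only tables whose columns include every preset field."""
    return [t for t in catalog if set(REQUIRED_FIELDS) <= set(t["columns"])]


def build_cohort(catalog):
    """Step 1.2: extract the fields and record where each row came from."""
    cohort = []
    for table in find_tables(catalog):
        for row in table["rows"]:
            cohort.append(CohortRecord(
                id_number=row["id_number"],
                disease=row["disease"],
                disease_code=row["disease_code"],
                onset_time=row["onset_time"],
                source_city=table["city"],
                source_table=table["name"],
                source_row_id=row["id"],
            ))
    return cohort


# Made-up example data standing in for the distributed databases.
catalog = [
    {"city": "city_a", "name": "insurance_claims",
     "columns": ["id", "id_number", "disease", "disease_code", "onset_time"],
     "rows": [{"id": 1, "id_number": "A001", "disease": "ovarian cancer",
               "disease_code": "X00001", "onset_time": "2014-03-02"}]},
    {"city": "city_a", "name": "housing",
     "columns": ["id", "id_number", "address"],
     "rows": [{"id": 7, "id_number": "A001", "address": "unknown"}]},
]
cohort = build_cohort(catalog)
```

Keeping the provenance fields on every record is what later allows a cohort entry to be traced back to its original table for review.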

The distributed database system comprises medical information databases deployed in individual cities. In this embodiment, the medical information databases include whole-population information databases, public health databases, electronic medical record databases, medical insurance databases, health examination databases, cause-of-death databases and others, distributed across the cities of Shandong Province.

The whole-population information database includes: residents' basic personal information, social security information, housing information, and residents' credit and dishonesty records.

The public health database includes: basic personal health information, disabled persons table, health examination form, student physical examination form, medical certificate of birth, newborn home visit information, child health examination information, prenatal follow-up service information, delivery record form, postpartum visit service information, pregnancy examination record form, vaccination card information, infectious disease report card, occupational disease report card, foodborne disease card, medical certificate of death, hypertension patient follow-up form, type 2 diabetes patient follow-up form, severe mental illness patient management form, severe mental illness onset information, severe mental illness discharge information, coronary heart disease patient information, stroke patient information, tumor patient information, tuberculosis patient follow-up form, information on couples of childbearing age, marriage information, birth registration and birth approval information, pregnancy information, women's reproductive history information, contraception information, family planning surgery information, out-migrant population information, in-migrant population information, outpatient summary information, and inpatient summary information.

The electronic medical record database includes: outpatient/emergency registration, outpatient/emergency visit records, outpatient/emergency/inpatient Western medicine prescriptions, outpatient/emergency/inpatient traditional Chinese medicine prescriptions, laboratory test records, laboratory result details, examination records, admission records, inpatient medical record front page, traditional Chinese medicine inpatient medical record front page, and discharge records.

The health examination database includes: examination reports, examination item table, and examination item detail table.

The medical insurance database includes: basic information, diagnosis and treatment information, medical insurance total expense information, and medical insurance expense detail information.

As shown in FIG. 2, data standardization is performed on the disease big-data cohort:

Step 2.1: Select a sample data set from the disease big-data cohort, compare the disease names in the sample data with the disease names in the disease classification standard, and standardize the disease names in the sample data;

Standardizing the disease names in the sample data includes creating a standardized name field and performing the standardization in the following order:

(1) Exact-name matching: obtain the sample data whose disease name is identical to a disease name in the disease classification standard, and write the original disease name into the standardized name field.

(2) Similar-name matching: obtain the sample data whose disease name has a similarity to a disease name in the disease classification standard that exceeds a set threshold, and write the original disease name into the standardized name field. The similarity measure may be any existing text similarity method, such as cosine similarity or Euclidean distance, and is not limited here.

(3) Containment matching: obtain the sample data whose disease name has a containment relationship with a disease name in the disease classification standard, for example "prostatitis (non-surgical treatment)" and "prostatitis", and write the original disease name into the standardized name field.

(4) The user manually reviews the standardized names of the sample data via the client. Specifically, during manual review the disease names can be sorted by frequency, and the most frequent disease names are reviewed first.

During standardization, the system automatically records which matching method was applied to each disease name in the disease big-data cohort. In this embodiment, medical insurance data of relatively good quality (about 60,000 records) are selected as the sample data set, and ICD-10 coding is used as the disease classification standard.
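The automatic part of the cascade in step 2.1 (exact, then similar, then containment) can be sketched as below. This is only an illustration under assumptions: the source leaves the similarity method open, so cosine similarity over character bigrams is one possible choice; the threshold of 0.8 and all function names are hypothetical. The sketch returns the matched standard entry together with the method used, so that the matching method can be recorded as the text describes.

```python
import math
from collections import Counter


def bigrams(text):
    """Character-bigram counts, used as a simple vector representation."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a, b):
    """Cosine similarity between the bigram count vectors of two strings."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_name(name, standard_names, threshold=0.8):
    """Return (matched standard name, method), or (None, None) if unmatched.

    The threshold value is an assumption; the source only says "set threshold".
    """
    # (1) exact-name matching
    if name in standard_names:
        return name, "exact"
    # (2) similar-name matching
    best = max(standard_names, key=lambda s: cosine_similarity(name, s),
               default=None)
    if best is not None and cosine_similarity(name, best) >= threshold:
        return best, "similar"
    # (3) containment matching, in either direction
    for s in standard_names:
        if s in name or name in s:
            return s, "contains"
    return None, None


# Made-up stand-in for the disease classification standard.
standard = ["prostatitis", "ovarian cancer"]
exact = match_name("prostatitis", standard)
similar = match_name("ovarian cancers", standard)
contained = match_name("prostatitis (non-surgical treatment)", standard)
unmatched = match_name("influenza", standard)
```

Names that come back as `(None, None)` are the ones left for the later stages (sample-based matching, code matching, and manual review).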

Step 2.2: For the data in the disease big-data cohort that have not yet been standardized, compare the disease name with the original disease names in the sample data to standardize part of the disease names;

The matching methods are the same as in step 2.1. Specifically, for data whose disease name is identical to an original disease name in the sample data, has a similarity to it above the set threshold, or has a containment relationship with it, the standardized name corresponding to that original disease name in the sample data is written into the standardized field.

Step 2.3: For the data in the disease big-data cohort that remain unstandardized, compare the disease code with the codes in the disease classification standard; for data whose code is matched successfully, write the disease name corresponding to the matched code in the disease classification standard into the standardized field.

Specifically, the comparison of disease codes with the codes in the disease classification standard is carried out in stages: first against all 6 digits of the code in the disease classification standard, then against the first 4 digits, and finally against the first 2 digits.
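The staged code comparison of step 2.3 can be sketched as a prefix match that tries 6, then 4, then 2 leading characters. This is a sketch under assumptions: the codes below are made-up placeholders rather than real ICD-10 codes, and the tie-breaking rule (take the first candidate in sorted order when several standard codes share a prefix) is not specified in the source.

```python
def match_code(code, standard, prefix_lengths=(6, 4, 2)):
    """Return (standard disease name, prefix length used), or (None, None).

    `standard` maps a standard code to its disease name. Stages are tried
    from the most specific prefix to the least specific one.
    """
    for n in prefix_lengths:
        key = code[:n]
        # Assumption: when several standard codes share the prefix, the first
        # one in sorted order is used; the source does not define this case.
        candidates = sorted(c for c in standard if c[:n] == key)
        if candidates:
            return standard[candidates[0]], n
    return None, None


# Placeholder codes and names, invented for illustration only.
standard = {
    "AB1234": "disease alpha",
    "AB1299": "disease beta",
    "CD0000": "disease gamma",
}
full = match_code("AB1234", standard)
four = match_code("AB1250", standard)
two = match_code("CD9999", standard)
none = match_code("ZZ0000", standard)
```

Returning the prefix length alongside the name keeps a record of how coarse each match was, which can help prioritize the manual review that follows.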

Step 2.4: The user manually reviews the standardized names in the disease big-data cohort via the client. Because the volume of data to be standardized is large (about 7 million records), the disease names can be sorted by frequency and only the more frequent disease names reviewed;

Step 2.5: Compute the match rate; if the match rate exceeds the set threshold, standardization ends.
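The frequency ordering of step 2.4 and the stopping check of step 2.5 can be sketched as follows. The record layout, the `top_n` cut-off, and the 0.8 threshold are assumptions made for illustration; the source does not give concrete values.

```python
from collections import Counter


def review_order(records, top_n=None):
    """Step 2.4: original disease names sorted by descending frequency.

    `top_n` limits review to the most frequent names (None means all).
    """
    counts = Counter(r["disease"] for r in records)
    return [name for name, _ in counts.most_common(top_n)]


def match_rate(records):
    """Step 2.5: share of records whose standardized name has been filled in."""
    if not records:
        return 0.0
    matched = sum(1 for r in records if r.get("standard_name"))
    return matched / len(records)


# Made-up records; "standard_name" is None where no match was found.
records = [
    {"disease": "ovarian ca", "standard_name": "ovarian cancer"},
    {"disease": "ovarian ca", "standard_name": "ovarian cancer"},
    {"disease": "ovarian ca", "standard_name": "ovarian cancer"},
    {"disease": "cervix tumour", "standard_name": "cervical cancer"},
    {"disease": "cervix tumour", "standard_name": "cervical cancer"},
    {"disease": "unknown mass", "standard_name": None},
]

RATE_THRESHOLD = 0.8  # hypothetical value for the "set threshold"
to_review = review_order(records, top_n=2)
finished = match_rate(records) >= RATE_THRESHOLD
```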

For medical big data from complex sources, this embodiment first obtains standardized sample data through multi-level text matching, and then, based on the standardized sample data, completes the standardization of the massive data set by name matching followed by code matching. Compared with matching all of the medical big data directly against the standard data, this achieves a higher standardization rate and accuracy while preserving the efficiency of standardization.

Based on the standardized disease big-data cohort, a gynecological tumor disease cohort is established, specifically:

Step 3.1: Retrieve the disease names related to gynecological tumors from the disease big-data cohort. Because gynecological tumors are expressed in many different ways, synonym expansion is required here; those skilled in the art will understand that the retrieval can also be performed by constructing logical expressions;

Step 3.2: The user reviews the retrieved gynecological tumor related disease names via the client. Those skilled in the art will understand that the review can remove individual data records one at a time or remove records in batches by constructing logical expressions;

Step 3.3: According to the gynecological tumor related disease names, match data such as ID number, sex, and region from the disease big-data cohort to obtain the gynecological tumor disease cohort.
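Steps 3.1 to 3.3 can be sketched as a synonym-expanded search, a review step, and a final projection onto the cohort fields. Everything named here is hypothetical: the synonym table, the record layout, and the `review` function, which merely stands in for the manual review performed through the client.

```python
# Hypothetical synonym table; the source only says synonym expansion is needed.
SYNONYMS = {
    "ovarian cancer": ["ovarian cancer", "ovarian carcinoma",
                       "malignant neoplasm of ovary"],
    "cervical cancer": ["cervical cancer", "cervical carcinoma"],
}


def retrieve_names(records, synonyms=SYNONYMS):
    """Step 3.1: standardized names that contain any expanded search term."""
    terms = [t for group in synonyms.values() for t in group]
    return {r["standard_name"] for r in records
            if any(t in r["standard_name"] for t in terms)}


def review(names, rejected):
    """Step 3.2: stand-in for manual review; drops names the reviewer rejects."""
    return names - set(rejected)


def build_tumor_cohort(records, approved):
    """Step 3.3: keep ID number, sex and region for records with approved names."""
    return [{"id_number": r["id_number"], "sex": r["sex"],
             "region": r["region"], "disease": r["standard_name"]}
            for r in records if r["standard_name"] in approved]


# Made-up records from an already standardized disease cohort.
records = [
    {"id_number": "A001", "sex": "F", "region": "city_a",
     "standard_name": "ovarian carcinoma"},
    {"id_number": "A002", "sex": "F", "region": "city_b",
     "standard_name": "ovarian cancer screening"},
    {"id_number": "A003", "sex": "F", "region": "city_a",
     "standard_name": "hypertension"},
]
found = retrieve_names(records)
approved = review(found, rejected=["ovarian cancer screening"])
tumor_cohort = build_tumor_cohort(records, approved)
```

The example shows why the review step matters: a plain substring search also picks up names that mention a tumor without being a diagnosis, and the reviewer removes them before the cohort is built.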

Each data item in the gynecological tumor disease cohort can serve as an index for further retrieval. For example, for a given ID number in the cohort, all related medical data records for that ID number can be obtained from the distributed database.

In this embodiment, the cohort inclusion and exclusion criteria are received and the final cohort is constructed, with one part used for modeling and the other for model validation. It is configured to perform the following steps:

In this embodiment, the case inclusion criteria are: a first gynecological diagnosis between January 1, 2012 and December 31, 2016, with definite clinical imaging and/or pathological verification. The cohort exclusion criteria are: individuals who died or developed any type of cancer before January 1, 2012.
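The inclusion and exclusion criteria above amount to a date filter, sketched below. The field names (`first_gyn_diagnosis`, `verified`, `death_date`, `first_cancer_date`) are hypothetical, and `verified` is assumed to be a flag meaning that imaging and/or pathological confirmation exists.

```python
from datetime import date

WINDOW_START = date(2012, 1, 1)
WINDOW_END = date(2016, 12, 31)

def is_cohort_case(person):
    """Apply the exclusion criteria first, then the inclusion criteria."""
    death = person.get("death_date")
    if death is not None and death < WINDOW_START:
        return False  # died before the observation window
    prior_cancer = person.get("first_cancer_date")
    if prior_cancer is not None and prior_cancer < WINDOW_START:
        return False  # any cancer before the observation window
    diagnosis = person.get("first_gyn_diagnosis")
    if diagnosis is None or not (WINDOW_START <= diagnosis <= WINDOW_END):
        return False  # first diagnosis must fall inside the window
    return bool(person.get("verified", False))

# Invented sample data.
people = [
    {"id": "P1", "first_gyn_diagnosis": date(2013, 5, 1), "verified": True},
    {"id": "P2", "first_gyn_diagnosis": date(2011, 3, 1), "verified": True},
    {"id": "P3", "first_gyn_diagnosis": date(2014, 1, 1), "verified": True,
     "first_cancer_date": date(2010, 1, 1)},
    {"id": "P4", "first_gyn_diagnosis": date(2016, 12, 31), "verified": False},
]

cases = [p["id"] for p in people if is_cohort_case(p)]
```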

In the risk prediction model building module, relevant risk factors are tallied and screened according to the gynecology-related disease variables:

Step 4.1: Perform correlation analysis between each disease variable and the occurrence of a gynecological event, and take the disease variables whose correlation exceeds a set threshold as candidate risk factors; this embodiment uses a multivariate Cox proportional hazards model.

(1) Construct a binary risk factor matrix X according to whether each disease variable is present, where each row corresponds to one person and each column to one class of risk factor. The element X(m,n) in row m, column n indicates whether the m-th person has the n-th class of disease variable: 1 if yes, 0 if no;

(2) Construct a binary gynecological event matrix Y according to whether a gynecological event occurred, where Y has a single column and each row indicates whether the corresponding person experienced a gynecological event;

(3) Perform correlation analysis between each column of the binary risk factor matrix X and the matrix Y to obtain a correlation matrix R, whose elements give the correlation between each disease variable and the gynecological event; the disease variables whose correlation exceeds the set threshold are taken as candidate risk factors.
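Sub-steps (1) to (3) can be sketched as below. The source does not name the correlation measure, so this sketch assumes the Pearson correlation (which for two binary variables is the phi coefficient); the function names, sample data and threshold are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / sqrt(vx * vy)

def candidate_factors(X, Y, names, threshold):
    """Correlate each column of X with Y; keep names whose correlation
    exceeds the threshold. Returns (candidates, R)."""
    R = [pearson([row[j] for row in X], Y) for j in range(len(names))]
    return [n for n, r in zip(names, R) if r > threshold], R

# X: one row per person, one column per disease variable (1 = present).
X = [
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 0],
]
Y = [1, 1, 0, 0]  # 1 = gynecological event occurred
names = ["factor_a", "factor_b"]  # hypothetical variable names

candidates, R = candidate_factors(X, Y, names, threshold=0.3)
```

In this toy data `factor_a` tracks the outcome exactly and `factor_b` is uncorrelated with it, so only `factor_a` survives the threshold.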

Step 4.2: Screen the final risk factors from the candidate risk factors using a Bayesian network.

A Bayesian network is a graphical model that represents probabilistic connections between variables and can be used to discover latent relationships in data. The result of Bayesian learning is expressed as a probability distribution over random variables, which can be interpreted as the degree of belief in different possibilities. In this embodiment, the candidate risk factors obtained in Step 4.1 and the gynecological events are input into a Bayesian network, and the candidate risk factors associated with the gynecological events are taken as the final risk factors.

Those skilled in the art will understand that indicator screening may also be assisted manually on the basis of literature, clinical data and national standards; using several screening methods helps prevent important indicators from being omitted.

In the gynecological probability prediction module, the gynecological incidence probability is calculated as follows:

Receive the prediction request sent by the user terminal and retrieve the user's records from the historical disease data cohort. For each risk factor variable in the prediction model, assign the variable 1 if the user has the disease corresponding to that risk factor and 0 otherwise, then calculate the user's gynecological incidence probability.

Obtain the user's gynecology-related risk factors and the contribution rate of each risk factor;

Specifically, the contribution rate of each risk factor is calculated as follows:

For each risk factor variable assigned 1 above, set it to 0 in turn and recalculate the gynecological incidence probability, which gives the incidence probability the user would have without the disease corresponding to that risk factor. The difference between this value and the incidence probability obtained by the gynecological probability prediction module is the contribution rate of that disease to the user's gynecological risk.
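The prediction and contribution-rate calculation can be sketched as follows. A logistic model is assumed purely for illustration, the coefficients and intercept are invented, and the sign convention (full probability minus counterfactual probability) is an assumption, since the source only says the two values are subtracted.

```python
from math import exp

# Hypothetical coefficients; a fitted model would supply real values.
COEFFS = {"diabetes": 0.8, "hypertension": 0.5, "anemia": 0.3}
INTERCEPT = -3.0

def encode(user_diseases):
    """Assign 1 to each risk factor the user has, 0 otherwise."""
    return {k: 1 if k in user_diseases else 0 for k in COEFFS}

def incidence_probability(x):
    """Logistic probability for a binary risk factor assignment x."""
    z = INTERCEPT + sum(COEFFS[k] * x[k] for k in COEFFS)
    return 1.0 / (1.0 + exp(-z))

def contribution_rates(x):
    """For each factor set to 1, reset it to 0, recompute the probability,
    and report how much the predicted probability drops."""
    base = incidence_probability(x)
    rates = {}
    for factor, value in x.items():
        if value == 1:
            counterfactual = dict(x)
            counterfactual[factor] = 0
            rates[factor] = base - incidence_probability(counterfactual)
    return rates

x = encode({"diabetes", "anemia"})
probability = incidence_probability(x)
rates = contribution_rates(x)
```

Because each factor is toggled independently, the contribution rates need not sum to the total excess risk; they are per-factor "what if" differences.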

In further embodiments, an ovarian cancer risk prediction system is provided, which includes constructing a risk prediction model from the screened risk factors;

The ovarian cancer-related disease variables include endometriosis, polycystic ovary syndrome, pelvic inflammatory disease, ovarian cysts, uterine fibroids, benign ovarian tumors, diabetes, and the like.

The screened risk factors include: age, endometriosis, pelvic inflammatory disease, ovarian cysts, uterine fibroids, benign ovarian tumors, and diabetes.

Constructing the risk prediction model includes:

Step 5.1: Perform univariate analysis on the screened risk factors using a Cox regression model, select independent predictors by stepwise selection, and establish a multivariate Cox proportional hazards model. The significance level is α = 0.05.

Step 5.2: Input the risk factors age, endometriosis, pelvic inflammatory disease, ovarian cysts, uterine fibroids, benign ovarian tumors and diabetes into the multivariate Cox proportional hazards model to establish the disease prediction model.

Specifically, a single-factor model is first built for each risk factor, and the one with the best predictive performance becomes the initial prediction model; its risk factor is the most important factor. Then one of the remaining risk factors is added to the initial model to build two-factor models, and the best-performing two-factor model is kept; the newly added risk factor is the second most important factor. New risk factors continue to be introduced one at a time in this way until the performance of the prediction model no longer improves.

Each time a prediction model is built, the ROC, sensitivity and specificity are calculated, and then NRI = (sensitivity_test2 + specificity_test2) - (sensitivity_test1 + specificity_test1) is calculated as the measure of model performance, where test1 and test2 refer to the model before and after the new predictor is added. If NRI > 0, the predictive ability of the new model has improved after the new predictor was added, and the proportion of correct classifications has increased by NRI percentage points. The larger the increase in NRI, the better the variable predicts and the more important it is.
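The forward selection loop and the NRI stopping rule can be sketched as below. The model fitting itself is abstracted behind a hypothetical `evaluate` callback that returns (sensitivity, specificity) for a given set of factors; the toy performance table is invented so the example runs without a statistics library.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def nri(old, new):
    """NRI = (sens_new + spec_new) - (sens_old + spec_old)."""
    return (new[0] + new[1]) - (old[0] + old[1])

def forward_select(factors, evaluate):
    """Add one factor at a time, keeping the best candidate each round,
    and stop when the best candidate no longer gives NRI > 0.
    The returned list is ordered from most to least important."""
    selected, remaining, best = [], list(factors), None
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        perf, factor = max(scored, key=lambda s: s[0][0] + s[0][1])
        if best is not None and nri(best, perf) <= 0:
            break  # performance no longer improves
        selected.append(factor)
        remaining.remove(factor)
        best = perf
    return selected

# Invented (sensitivity, specificity) values standing in for fitted models.
TOY_PERFORMANCE = {
    ("age",): (0.60, 0.70),
    ("cyst",): (0.55, 0.65),
    ("diabetes",): (0.50, 0.60),
    ("age", "cyst"): (0.70, 0.72),
    ("age", "diabetes"): (0.62, 0.70),
    ("age", "cyst", "diabetes"): (0.70, 0.71),
}

def toy_evaluate(factor_list):
    return TOY_PERFORMANCE[tuple(sorted(factor_list))]

ranking = forward_select(["age", "cyst", "diabetes"], toy_evaluate)
```

With the toy table, `age` is chosen first, `cyst` second, and `diabetes` is rejected because adding it lowers the sum of sensitivity and specificity.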

The model in this embodiment is built by introducing one risk factor at a time, which progressively identifies the risk factors most relevant to the gynecological disease while ensuring prediction accuracy.

This yields the best-performing prediction model and, at the same time, a ranking of the screened risk factors by importance.

In further embodiments, a cervical cancer risk prediction system is provided, which uses the Gail model to establish a cervical cancer prediction model;

The cervical cancer-related disease variables include human papillomavirus infection, pelvic inflammatory disease, cervicitis, intrauterine device (IUD) placement, uterine leiomyoma, pregnancy complicated by macrosomia, and ectopic pregnancy;

The screened risk factors include: human papillomavirus infection, pelvic inflammatory disease, cervicitis, uterine leiomyoma, and IUD placement.

In further embodiments, a breast cancer risk prediction system is provided, which uses a logistic regression model and the Gail model to construct a breast cancer risk prediction model;

The breast cancer-related disease variables include age, mastitis, benign breast tumors, breast hyperplasia, uterine fibroids, breast cysts, breast masses, mammary duct fistula, breast dysplasia, and premature breast development;

The screened risk factors include age, mastitis, benign breast tumors, breast hyperplasia, uterine fibroids, and breast cysts.

In further embodiments, an endometrial cancer risk prediction system is provided, which uses a logistic regression model to construct an endometrial cancer risk prediction model from the screened risk factors;

The endometrial cancer-related disease variables include hypertension, diabetes, obesity, coronary heart disease, infertility, anemia, ovarian disease, and uterine fibroids;

The screened risk factors include diabetes, hypertension, ovarian disease, infertility, uterine fibroids, and anemia.

In this embodiment, the system further includes:

A user management module, used to manage the identity information of registered users;

A disease coping strategy management module, used to store precautions and coping suggestions for various diseases;

A gynecological risk factor guidance module, which obtains the corresponding coping strategies for the diseases the user has that affect gynecological risk;

A health report generation module, used to generate a visual report from the health information, the gynecological incidence probability prediction result and the gynecological risk factor guidance result. The visual report includes graphical elements such as dot plots, heat maps and bubble charts.

The relevant data processing methods are pre-packaged in the cloud platform. All of the data processing described above is executed on the cloud platform and the data is not transmitted to other terminals, which keeps the data secure and protects user privacy.

This embodiment uses the cloud platform as the core of data aggregation and data processing and interfaces with the databases of medical institutions at all municipal levels, ensuring the authenticity, integrity and security of the data.

This embodiment provides a health assessment system for users that can predict a user's gynecological incidence probability and the contribution rate of each gynecology-related disease the user has, and that gives coping strategies for those diseases, thereby guiding the user in preventing gynecological disease.

In this embodiment, the system includes a working terminal, and the working terminal includes:

A data standardization module, used to review the sample data standardization results and the full data standardization results in the cloud platform;

A gynecology-related disease name acquisition module, used to receive gynecology-related disease names entered by the user, or logical expressions for retrieving disease names, and to review the retrieved disease names;

A risk factor determination module, used to obtain the candidate risk factors and their Bayesian network structure diagram from the cloud platform, and to receive the user's confirmation and correction of the risk factors and send them to the cloud platform;

A model building module, used to receive the case inclusion criteria and the model to be used;

A model correction module, used to correct the model used and its parameters.

The user terminal includes:

A login authentication module, used to authenticate the user's identity;

A health report viewing module, used to obtain the user's health information from the cloud platform, including historical physical examination information, case information, and the like;

A gynecological probability prediction module, used to obtain the gynecological incidence probability prediction result from the cloud platform;

A gynecological risk factor guidance module, used to obtain the user's gynecology-related risk factors and the contribution rate of each risk factor from the cloud platform;

A health report generation module, used to generate a visual report from the health information, the gynecological incidence probability prediction result and the gynecological risk factor guidance result.

In further embodiments, the following are also provided:

An electronic device, including a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the following steps:

Obtain gynecology-related disease variables from gynecological cases, perform correlation analysis between them and the occurrence of gynecological events, and screen out risk factors; construct a gynecological risk prediction model based on the screened risk factors;

Receive a disease risk prediction request, retrieve the relevant historical disease data cohort, and obtain a gynecological incidence probability prediction result from the gynecological prediction model;

Generate a visual report from the gynecological incidence probability prediction result, the acquired physiological indicator information of the user, the risk factors related to the user, and the contribution rates of those risk factors.

A computer-readable storage medium for storing computer instructions, wherein the computer instructions, when executed by a processor, perform the following steps:

Obtain gynecology-related disease variables from gynecological cases, perform correlation analysis between them and the occurrence of gynecological events, and screen out risk factors; construct a gynecological risk prediction model based on the screened risk factors;

Receive a disease risk prediction request, retrieve the relevant historical disease data cohort, and obtain a gynecological incidence probability prediction result from the gynecological prediction model;

Generate a visual report from the gynecological incidence probability prediction result, the acquired physiological indicator information of the user, the risk factors related to the user, and the contribution rates of those risk factors.

Those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit its scope of protection. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solutions of the present invention without creative effort still fall within its scope of protection.

Claims (10)

1. A gynecological tumor disease risk prediction visualization system, comprising:
a risk prediction model building module, configured to obtain gynecology-related disease variables from gynecological cases, perform correlation analysis between the gynecology-related disease variables and the occurrence of gynecological events, and screen out risk factors; and to construct a gynecological risk prediction model based on the screened risk factors;
a gynecological probability prediction module, configured to receive a disease risk prediction request, retrieve the relevant historical disease data cohort, and obtain a gynecological incidence probability prediction result from the gynecological prediction model;
and a health report generation module, configured to generate a visual report from the gynecological incidence probability prediction result, the acquired physiological indicator information of the user, the risk factors related to the user, and the contribution rates of the risk factors.
2. The gynecological tumor disease risk prediction visualization system of claim 1, wherein
the gynecological tumor diseases comprise: endometrial cancer, breast cancer, cervical cancer and ovarian cancer.
3. The gynecological tumor disease risk prediction visualization system of claim 1, wherein
the risk prediction model building module is further configured to:
perform univariate analysis on the screened risk factors, and select independent predictors of the gynecological tumor disease by stepwise selection to establish a multivariate proportional hazards model;
and input the risk factors into the multivariate proportional hazards model to establish a gynecological tumor disease prediction model.
4. The gynecological tumor disease risk prediction visualization system of claim 1, wherein the gynecological incidence probability prediction result is calculated by:
for each risk factor variable in the prediction model, assigning the variable 1 if the disease corresponding to the risk factor is present and 0 otherwise, and calculating the gynecological incidence probability.
5. The gynecological tumor disease risk prediction visualization system of claim 1, wherein the contribution rate of a risk factor is calculated by:
assigning each risk factor 0 in turn and calculating the gynecological incidence probability, to obtain the incidence probability without that risk factor; and taking the difference between this probability and the incidence probability obtained by the gynecological probability prediction module, to obtain the contribution rate of each risk factor to the gynecological event.
6. The gynecological tumor disease risk prediction visualization system of claim 1, wherein obtaining gynecology-related disease variables from gynecological cases comprises:
retrieving gynecology-related disease names from the disease big-data cohort, and reviewing the gynecology-related disease names;
matching, according to the gynecology-related disease names, the ID number, sex and region data from the disease big-data cohort to obtain a gynecological tumor disease cohort;
and receiving case inclusion criteria, and obtaining gynecological cases from the gynecological tumor disease cohort.
7. The gynecological tumor disease risk prediction visualization system of claim 6, wherein
the disease big-data cohort is generated by searching a database system, according to preset disease-related fields, for data tables containing those fields, and, based on the data tables found, extracting the ID number and the disease-related fields.
8. The gynecological tumor disease risk prediction visualization system of claim 6, wherein
disease data standardization is performed on the disease big-data cohort, the disease data standardization comprising:
selecting a sample data set from the disease big-data cohort, comparing the disease names in the sample data with the disease names in the disease classification standard, and standardizing the disease names in the sample data;
for data in the disease big-data cohort that has not been standardized, comparing its disease names with the original disease names in the sample data, and, for data that matches, writing the corresponding standardized disease name from the sample data into the standardized field;
and for data in the disease big-data cohort that still has not been standardized, comparing its disease codes with the codes in the disease classification standard, and, for data whose code matches, writing the disease name corresponding to that code in the disease classification standard into the standardized field.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and run on the processor, wherein the computer instructions, when run by the processor, perform the steps of:
obtaining gynecology-related disease variables from gynecological cases, performing correlation analysis between the gynecology-related disease variables and the occurrence of gynecological events, and screening out risk factors; constructing a gynecological risk prediction model based on the screened risk factors;
receiving a disease risk prediction request, retrieving the relevant historical disease data cohort, and obtaining a gynecological incidence probability prediction result from the gynecological prediction model;
and generating a visual report from the gynecological incidence probability prediction result, the acquired physiological indicator information of the user, the risk factors related to the user, and the contribution rates of the risk factors.
10. A computer-readable storage medium storing computer instructions that, when executed by a processor, perform the steps of:
obtaining gynecology-related disease variables from gynecological cases, performing correlation analysis between the gynecology-related disease variables and the occurrence of gynecological events, and screening out risk factors; constructing a gynecological risk prediction model based on the screened risk factors;
receiving a disease risk prediction request, retrieving the relevant historical disease data cohort, and obtaining a gynecological incidence probability prediction result from the gynecological prediction model;
and generating a visual report from the gynecological incidence probability prediction result, the acquired physiological indicator information of the user, the risk factors related to the user, and the contribution rates of the risk factors.
CN202010687315.3A 2020-07-16 2020-07-16 Gynecological tumor disease risk prediction visualization system Pending CN111899888A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010687315.3A CN111899888A (en) 2020-07-16 2020-07-16 Gynecological tumor disease risk prediction visualization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010687315.3A CN111899888A (en) 2020-07-16 2020-07-16 Gynecological tumor disease risk prediction visualization system

Publications (1)

Publication Number Publication Date
CN111899888A true CN111899888A (en) 2020-11-06

Family

ID=73190521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010687315.3A Pending CN111899888A (en) 2020-07-16 2020-07-16 Gynecological tumor disease risk prediction visualization system

Country Status (1)

Country Link
CN (1) CN111899888A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085666A (en) * 2017-05-24 2017-08-22 山东大学 System and method for disease risk assessment and personalized health report generation
CN107153774A (en) * 2017-05-24 2017-09-12 山东大学 The disease forecasting system of the structure and application of chronic disease risk assessment the hyperbolic model model
CN107201401A (en) * 2017-05-23 2017-09-26 深圳市第二人民医院 A kind of Multiple-Factor Model and its method for building up for pathogenesis of breast carcinoma risk profile
CN107358047A (en) * 2017-07-13 2017-11-17 刘峰 Diabetic assesses and management system
CN109841281A (en) * 2017-11-29 2019-06-04 郑州大学第一附属医院 Construction method based on coexpression similitude identification adenocarcinoma of lung early diagnosis mark and risk forecast model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
薛付忠: "大数据背景下整合健康保险&健康维护的理论方法体系", 《山东大学学报(医学版)》, vol. 9, no. 57, pages 1 - 19 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614595A (en) * 2020-12-25 2021-04-06 联仁健康医疗大数据科技股份有限公司 Survival analysis model construction method and device, electronic terminal and storage medium
CN112763727A (en) * 2021-04-08 2021-05-07 北京大学第三医院(北京大学第三临床医学院) Biomarkers and kits for diagnosis of polycystic ovarian syndrome and methods of use
CN114628033A (en) * 2022-02-22 2022-06-14 湖南新云网科技有限公司 Disease risk prediction method, device, equipment and storage medium
CN117995412A (en) * 2024-04-07 2024-05-07 粤港澳大湾区数字经济研究院(福田) A method, device, terminal and storage medium for predicting future disease probability
CN117995412B (en) * 2024-04-07 2025-11-18 粤港澳大湾区数字经济研究院(福田) A method, device, terminal, and storage medium for predicting the probability of future disease incidence.
CN118969310A (en) * 2024-07-31 2024-11-15 山东大学齐鲁医院 An interactive CSEP intraoperative massive bleeding prediction method and system

Similar Documents

Publication Publication Date Title
CN111899888A (en) Gynecological tumor disease risk prediction visualization system
Dol et al. Timing of neonatal mortality and severe morbidity during the postnatal period: a systematic review
Chen et al. Understanding the relationship between cesarean birth and stress, anxiety, and depression after childbirth: A nationwide cohort study
Souza et al. The development of a simplified, effective, labour monitoring-to-action (SELMA) tool for better outcomes in labour difficulty (BOLD): study protocol
Weldegebreal et al. Precancerous cervical lesion among HIV-positive women in Sub-Saharan Africa: a systematic review and meta-analysis
CN111883253A (en) Disease data analysis method and lung cancer risk prediction system based on medical knowledge base
Abebe et al. Determinants of early initiation of first antenatal care visit in Ethiopia based on the 2019 Ethiopia mini-demographic and health survey: A multilevel analysis
CN111816319A (en) A step-by-step screening method for the determination of critical disease indicators of the urinary system and a risk prediction system
Orazulike et al. A 3-year retrospective review of mortality in women of reproductive age in a tertiary health facility in Port Harcourt, Nigeria
Escobar et al. Maternal and perinatal outcomes in mixed antenatal care modality implementing telemedicine in the southwestern region of Colombia during the COVID-19 pandemic
Yimer et al. Adverse obstetric outcomes in public hospitals of southern Ethiopia: the role of parity
Wong et al. Evaluating bias-mitigated predictive models of perinatal mood and anxiety disorders
Li et al. Multimodal learning system integrating electronic medical records and hysteroscopic images for reproductive outcome prediction and risk stratification of endometrial injury: a multicenter diagnostic study
Arusi et al. Predictors of uterine rupture after one previous cesarean section: An unmatched case–control study
Karkee et al. Obstetric complications and cesarean delivery in Nepal
Mesay et al. A prognostic study for the development of risk prediction model for the success of vaginal birth following a cesarean surgery at Felege Hiwot Comprehensive Specialized Hospital, Northwest Ethiopia
CN111816318A (en) A cardiac disease data cohort generation method and risk prediction system
CN111816316A (en) A disease data scheduling management method and bone cancer risk prediction system
Battarbee et al. Cost‐effectiveness of ultrasound before non‐invasive prenatal screening for fetal aneuploidy
Baranov et al. Validation of the prediction model for success of vaginal birth after cesarean delivery at the university hospital in Barcelona
Nethery et al. Validation of insurance billing codes for monitoring antenatal screening
CN111814169A (en) A kind of gastrointestinal disease data encryption acquisition method and risk prediction system
Abebe et al. Spatial distribution, and predictors of late initiation of first antenatal care visit in Ethiopia: Spatial and multilevel analysis
CN118016288A (en) Method and system for predicting prognosis dynamic risk of senile primary colorectal lymphoma
Tisler et al. Nationwide study on development and validation of a risk prediction model for CIN3+ and cervical cancer in Estonia

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination