
CN109978701A - Personal hospitalization probability prediction method and system - Google Patents

Info

Publication number
CN109978701A
Authority
CN
China
Prior art keywords
data
medical insurance
factor
basic medical
predictive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910258525.8A
Other languages
Chinese (zh)
Inventor
万湘琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pacific Health Management Co Ltd
Original Assignee
Pacific Health Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pacific Health Management Co Ltd
Priority to CN201910258525.8A
Publication of CN109978701A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Mathematical Optimization (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Operations Research (AREA)
  • Computational Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Algebra (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention provides a personal hospitalization probability prediction method and system. The method comprises: collecting the current year's basic medical insurance reimbursement data and the corresponding insured person information data; standardizing the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured person information data to obtain standard basic medical insurance settlement data, standard basic medical insurance settlement detail data, and standard insured person information data; generating four major categories of predictive factors from the standardized data, namely personal information, medical expenses, medical behavior, and disease type, and generating multiple sub-predictive factors from these four categories; performing feature transformation on the sub-predictive factors; performing feature dimensionality reduction on the transformed sub-predictive factors to reduce their number; and building a logistic regression model on the sub-predictive factors retained after feature selection to predict the hospitalization rate for the next year.

Description

Personal hospitalization probability prediction method and system
Technical field
The present invention relates to the technical field of personal hospitalization probability prediction, and in particular to a personal hospitalization probability prediction method and a personal hospitalization probability prediction system.
Background art
Basic medical insurance reimbursement data covers many areas, including the insured person's personal information, disease information, medical visit behavior, medical expenses, and social security type. Medical visits occur far more often than the events covered by other types of insurance, and the data is fine-grained, so it can be used to build a profile of the insured person's medical and health status. A hospitalization rate prediction model built on this data can support:
Pricing and underwriting of hospitalization benefit insurance;
High-risk group identification: medical expenses arise mainly from hospitalization payments, so intervening with and managing in advance the high-risk groups with a high probability of hospitalization can effectively curb the rapid growth of medical expenses;
Assessment of hospital treatment capability: predicting the probability that patients hospitalized in the current year will be hospitalized again in a following period can, to some extent, assess the level of hospital treatment.
At present, major insurance companies and research institutions study personal-level hospitalization probability prediction using their own claims data or data obtained from open resources. However, because of limits on data volume and data granularity, the prediction accuracy is insufficient.
Summary of the invention
In view of the problems and deficiencies of the prior art, the present invention provides a personal hospitalization probability prediction method and system.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a personal hospitalization probability prediction method, characterized in that it comprises the following steps:
Step 1, sample collection: collect the current year's basic medical insurance reimbursement data and the corresponding insured person information data, where the basic medical insurance reimbursement data comprises basic medical insurance settlement data and basic medical insurance settlement detail data;
Step 2, data standardization: standardize the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured person information data to obtain standard basic medical insurance settlement data, standard basic medical insurance settlement detail data, and standard insured person information data;
Step 3, feature engineering: generate four major categories of predictive factors from the standard basic medical insurance settlement data, the standard basic medical insurance settlement detail data, and the standard insured person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and generate multiple sub-predictive factors from the four categories;
Step 4, feature transformation: perform feature transformation on the sub-predictive factors;
Step 5, feature selection: perform feature dimensionality reduction on the transformed sub-predictive factors to reduce the number of sub-predictive factors;
Step 6, model building: build a logistic regression model on the sub-predictive factors retained after feature selection to predict the hospitalization rate for the next year:
$$Y = \frac{1}{1 + e^{-\left(\theta_0 + \sum_{j=1}^{n} \theta_j X_j\right)}}$$
where $Y$ denotes the hospitalization rate for the next year, $\theta_i$ denotes the coefficients of the independent variables (with $\theta_0$ the intercept), $0 \le i \le n$, $X_j$ denotes the $j$-th sub-predictive factor retained after feature selection, $1 \le j \le n$, and $n$ denotes the number of sub-predictive factors after feature selection.
Preferably, in step 4, impact encoding is used for the feature transformation of numeric predictive factors, and one-hot encoding is used for the feature transformation of character-type predictive factors.
Preferably, in step 5, factor correlation analysis is used to identify mutually correlated factors among the transformed sub-predictive factors, only one factor from each group of mutually correlated factors is retained, and the XGBoost algorithm is used to remove factors with weak predictive power.
Preferably, the fields of the basic medical insurance settlement data mainly include: personal number, visit number, visit time, visit category, admission/discharge time, diagnosis code, diagnosis name, department name, total medical amount, medical insurance reimbursement amount, self-paid amount, critical illness reimbursement amount, other reimbursement amounts, and so on;
The fields of the basic medical insurance settlement detail data mainly include: personal number, visit number, settlement bill number, medical insurance catalog code, medical insurance catalog name, unit price, quantity, amount, self-payment ratio, out-of-pocket amount, and so on;
The fields of the insured person information data mainly include: age, gender, insurance type, retirement status, household registration type, education level, political affiliation, job category, and so on.
The present invention also provides a personal hospitalization probability prediction system, characterized in that it comprises a data acquisition module, a data processing module, a data generation module, a feature transformation module, a feature selection module, and a model building module;
The data acquisition module is used to collect the current year's basic medical insurance reimbursement data and the corresponding insured person information data, where the basic medical insurance reimbursement data comprises basic medical insurance settlement data and basic medical insurance settlement detail data;
The data processing module is used to standardize the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured person information data to obtain standard basic medical insurance settlement data, standard basic medical insurance settlement detail data, and standard insured person information data;
The data generation module is used to generate four major categories of predictive factors from the standard basic medical insurance settlement data, the standard basic medical insurance settlement detail data, and the standard insured person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and to generate multiple sub-predictive factors from the four categories;
The feature transformation module is used to perform feature transformation on the sub-predictive factors;
The feature selection module is used to perform feature dimensionality reduction on the transformed sub-predictive factors to reduce the number of sub-predictive factors;
The model building module is used to build a logistic regression model on the sub-predictive factors retained after feature selection to predict the hospitalization rate for the next year:
$$Y = \frac{1}{1 + e^{-\left(\theta_0 + \sum_{j=1}^{n} \theta_j X_j\right)}}$$
where $Y$ denotes the hospitalization rate for the next year, $\theta_i$ denotes the coefficients of the independent variables (with $\theta_0$ the intercept), $0 \le i \le n$, $X_j$ denotes the $j$-th sub-predictive factor retained after feature selection, $1 \le j \le n$, and $n$ denotes the number of sub-predictive factors after feature selection.
Preferably, the feature transformation module is used to transform numeric predictive factors with impact encoding and to transform character-type predictive factors with one-hot encoding.
Preferably, the feature selection module is used to identify mutually correlated factors among the transformed sub-predictive factors through factor correlation analysis, to retain only one factor from each group of mutually correlated factors, and to remove factors with weak predictive power using the XGBoost algorithm.
Preferably, the fields of the basic medical insurance settlement data mainly include: personal number, visit number, visit time, visit category, admission/discharge time, diagnosis code, diagnosis name, department name, total medical amount, medical insurance reimbursement amount, self-paid amount, critical illness reimbursement amount, other reimbursement amounts, and so on;
The fields of the basic medical insurance settlement detail data mainly include: personal number, visit number, settlement bill number, medical insurance catalog code, medical insurance catalog name, unit price, quantity, amount, self-payment ratio, out-of-pocket amount, and so on;
The fields of the insured person information data mainly include: age, gender, insurance type, retirement status, household registration type, education level, political affiliation, job category, and so on.
On the basis of common knowledge in the art, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the present invention.
The positive effects of the present invention are as follows:
Basic medical insurance reimbursement data spans a long period and covers a wide area. At the personal level, the historical visit information, medication information, and information on the examinations, treatments, and operations used are relatively comprehensive, which greatly improves the accuracy of the hospitalization rate prediction model.
Description of the drawings
Fig. 1 is a flowchart of the personal hospitalization probability prediction method according to a preferred embodiment of the present invention.
Fig. 2 is a structural block diagram of the personal hospitalization probability prediction system according to a preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.
The hospitalization rate prediction model uses the current year's basic medical insurance reimbursement data and insured person information data to predict the probability of hospitalization in the next year. The variable pool of the model contains about 1,500 predictive factors in four major categories: personal information, medical expenses, medical behavior, and disease and diagnosis-and-treatment information. A logistic regression model then predicts the personal-level probability of hospitalization in the next year. The expression of the model is:
$$Y = \frac{1}{1 + e^{-\left(\theta_0 + \sum_{j=1}^{n} \theta_j X_j\right)}}$$
The hospitalization rate prediction model is described in detail below.
As shown in Fig. 1, this embodiment provides a personal hospitalization probability prediction method comprising the following steps:
Step 1, sample collection: collect the current year's basic medical insurance reimbursement data and the corresponding insured person information data, where the basic medical insurance reimbursement data comprises basic medical insurance settlement data and basic medical insurance settlement detail data.
The fields of the basic medical insurance settlement data mainly include: personal number, visit number, visit time, visit category, admission/discharge time, diagnosis code, diagnosis name, department name, total medical amount, medical insurance reimbursement amount, self-paid amount, critical illness reimbursement amount, other reimbursement amounts, and so on.
The fields of the basic medical insurance settlement detail data mainly include: personal number, visit number, settlement bill number, medical insurance catalog code, medical insurance catalog name, unit price, quantity, amount, self-payment ratio, out-of-pocket amount, and so on.
The fields of the insured person information data mainly include: age, gender, insurance type, retirement status, household registration type, education level, political affiliation, job category, and so on.
Step 2, data standardization: standardize the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured person information data to obtain standard basic medical insurance settlement data, standard basic medical insurance settlement detail data, and standard insured person information data.
Data standardization and the construction of standard tables are important prerequisites for using this model. The tables in the database of every new city must be standardized. The field formats of medical insurance data differ from city to city; after standardization, the standard tables have the same structure, which makes it easy to reuse the downstream code.
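The standardization step can be illustrated with a minimal Python sketch. The city names, raw field names, and standard field names below are all hypothetical; the patent does not disclose the actual schemas, so this only shows the idea of mapping city-specific fields onto one shared standard table.

```python
# Hypothetical per-city field mappings (raw name -> standard name).
# None of these names come from the patent.
CITY_FIELD_MAPS = {
    "city_a": {"grbh": "person_id", "jzbh": "visit_id", "zje": "total_amount"},
    "city_b": {"PersonNo": "person_id", "VisitNo": "visit_id", "TotalFee": "total_amount"},
}

def standardize_record(city, record):
    """Rename one raw settlement record to the shared standard schema."""
    mapping = CITY_FIELD_MAPS[city]
    standard = {std: record[raw] for raw, std in mapping.items()}
    # Normalize the amount to a float so downstream code sees one type.
    standard["total_amount"] = float(standard["total_amount"])
    return standard

rec_a = standardize_record("city_a", {"grbh": "P001", "jzbh": "V9", "zje": "120.50"})
rec_b = standardize_record("city_b", {"PersonNo": "P002", "VisitNo": "V3", "TotalFee": 80})
```

Once both cities share the same keys and types, the feature engineering code only needs to be written once against the standard schema.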
Step 3, feature engineering: generate four major categories of predictive factors from the standard basic medical insurance settlement data, the standard basic medical insurance settlement detail data, and the standard insured person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and generate more than 1,500 sub-predictive factors from the four categories, laying the foundation for building the prediction model.
(a) Personal information
In the Tonglu data, personal information includes the insured person's age, gender, insurance type, occupation, household registration, and employment status.
(b) Medical expenses
Medical expenses are the most important feature category. They are stratified by hospital grade, hospital category, visit type, quarterly expense, and payment account type. Within each category, each person's expenditures are summarized statistically and used as features, including total expenditure, average expenditure, and maximum expenditure; for example, the maximum, average, and sum of pooled-fund payments for outpatient visits at general hospitals.
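The per-category expense statistics can be sketched as follows. This is an illustrative Python example only: the record layout and the field names `person_id`, `category`, and `pooled_payment` are assumptions, not fields named in the patent.

```python
from collections import defaultdict

def expense_features(visits):
    """Return {person_id: {"<category>_sum" / "_mean" / "_max": value}}."""
    grouped = defaultdict(list)
    for v in visits:
        grouped[(v["person_id"], v["category"])].append(v["pooled_payment"])
    features = defaultdict(dict)
    for (person, category), amounts in grouped.items():
        features[person][f"{category}_sum"] = sum(amounts)
        features[person][f"{category}_mean"] = sum(amounts) / len(amounts)
        features[person][f"{category}_max"] = max(amounts)
    return dict(features)

# Hypothetical visits: pooled-fund payments for general-hospital outpatient care.
visits = [
    {"person_id": "P001", "category": "general_outpatient", "pooled_payment": 100.0},
    {"person_id": "P001", "category": "general_outpatient", "pooled_payment": 300.0},
    {"person_id": "P002", "category": "general_outpatient", "pooled_payment": 50.0},
]
feats = expense_features(visits)
```

Repeating the same aggregation over every stratum (hospital grade, visit type, quarter, and so on) is one plausible way a feature pool could grow to well over a thousand factors.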
(c) Medical behavior
Medical behavior covers predictive factors related to the number of visits, length of stay, and so on. The granularity and stratification of medical behavior are consistent with those of the medical expense predictive factors.
(d) Disease information
Disease information is labeled with an independently developed disease grouping method, which divides the more than 20,000 ICD-10 codes used in China into more than 360 groups according to medical relatedness, cost relatedness, and relatedness of diagnosis and treatment pathways. Each insured person is tagged with these more than 360 disease labels according to his or her visit history: a label is 1 if there is diagnosis information for that disease group, and 0 otherwise.
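The 0/1 disease labels can be sketched as below. The grouping itself is proprietary and not disclosed, so the three-entry prefix table here is a purely hypothetical stand-in for the 360+ groups; only the labeling logic (1 if any diagnosis falls in the group, 0 otherwise) follows the text.

```python
# Hypothetical mapping from ICD-10 category (first three characters) to a disease group.
ICD10_PREFIX_TO_GROUP = {"E11": "diabetes", "I10": "hypertension", "J45": "asthma"}
GROUPS = sorted(set(ICD10_PREFIX_TO_GROUP.values()))

def disease_labels(diagnosis_codes):
    """Return one 0/1 flag per disease group for a person's diagnosis history."""
    labels = {group: 0 for group in GROUPS}
    for code in diagnosis_codes:
        group = ICD10_PREFIX_TO_GROUP.get(code[:3])
        if group is not None:
            labels[group] = 1
    return labels

# Codes outside the table (here Z00.0) simply leave every flag unchanged.
labels = disease_labels(["E11.9", "I10", "Z00.0"])
```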
(e) Diagnosis and treatment information
Through data analysis, the 84 diagnosis and treatment groups most relevant to medical expenses were selected. Each insured person was then tagged with 84 diagnosis and treatment labels according to his or her visit information.
(f) Medication information
On the basis of PCG drug grouping, this method divides 889 drugs into 27 disease categories. For example, an insured person who has used pravastatin is tagged with the hyperlipidemia label.
The disease type factors are obtained from (d), (e), and (f).
Step 4, feature transformation: perform feature transformation on the sub-predictive factors, using impact encoding for numeric predictive factors and one-hot encoding for character-type predictive factors.
Feature transformation converts the values of the original predictive factors into values that are more relevant to the prediction target. Different feature transformation methods can be used for different types of predictive factors.
Impact encoding is used to transform numeric variables, including expense variables, because it establishes a better linear relationship with the prediction target. Impact encoding first requires dividing the field into 100 buckets by some method; within each bucket, a conversion method maps the original field value to a new value:
$$\text{transformed value}_{\text{bucket}_i} = f(\text{original value})$$
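A minimal sketch of this bucketed transformation follows. The patent specifies neither the bucketing method nor the function f, so both choices here are assumptions: equal-frequency buckets, and f returning the observed hospitalization rate of the bucket the value falls into. The function names are hypothetical.

```python
def fit_impact_encoder(values, targets, n_buckets=100):
    """Learn bucket upper bounds and per-bucket target rates (assumed form of f)."""
    pairs = sorted(zip(values, targets))
    n_buckets = min(n_buckets, len(pairs))  # never create empty buckets
    bounds, rates = [], []
    for b in range(n_buckets):
        lo = b * len(pairs) // n_buckets
        hi = (b + 1) * len(pairs) // n_buckets
        chunk = pairs[lo:hi]
        bounds.append(chunk[-1][0])  # largest original value in this bucket
        rates.append(sum(t for _, t in chunk) / len(chunk))
    return bounds, rates

def impact_encode(value, bounds, rates):
    """Replace an original value with the target rate of its bucket."""
    for bound, rate in zip(bounds, rates):
        if value <= bound:
            return rate
    return rates[-1]  # values above every bound fall into the last bucket

# Toy data with 2 buckets instead of 100, to keep the example readable.
spend = [10, 20, 30, 40, 500, 600, 700, 800]
hospitalized = [0, 0, 0, 1, 0, 1, 1, 1]
bounds, rates = fit_impact_encoder(spend, hospitalized, n_buckets=2)
low, high = impact_encode(25, bounds, rates), impact_encode(650, bounds, rates)
```

In this toy data the low-spend bucket maps to 0.25 and the high-spend bucket to 0.75, so the encoded value rises with the observed risk even if the raw spend is heavily skewed.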
Step 5, feature selection: perform feature dimensionality reduction on the transformed sub-predictive factors to reduce the number of sub-predictive factors. Factor correlation analysis is used to identify mutually correlated factors among the transformed sub-predictive factors, only one factor from each group of mutually correlated factors is retained, and the XGBoost algorithm is used to remove factors with weak predictive power.
Feature selection screens out a subset of features using statistical or modeling methods; this process is also called dimensionality reduction. Because a large number of predictive factors are generated, two major classes of methods are used to perform feature selection systematically and efficiently.
First, mutually correlated factors are removed automatically through factor correlation analysis.
Then, factors with weak predictive power are removed automatically using a model. The advantage of the correlation-analysis-based method is its fast computation. The advantage of the model-based method is that it improves the model's prediction accuracy more efficiently, but its disadvantage is slower computation. The XGBoost algorithm is used for this feature dimensionality reduction.
Finally, factors are removed manually. Domain knowledge and the opinions of industry experts are very important: some predictive factors need to be added or removed manually based on expert opinion.
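The first, correlation-based stage can be sketched in plain Python. The text does not state which correlation measure or cut-off is used, so the Pearson coefficient and the 0.9 threshold below are assumptions; the model-based XGBoost stage and the manual review are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Keep a factor only if it is not strongly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical factors: pooled_spend is exactly 0.8 * total_spend, so it is redundant.
columns = {
    "total_spend": [100.0, 200.0, 300.0, 400.0],
    "pooled_spend": [80.0, 160.0, 240.0, 320.0],
    "visit_count": [3.0, 1.0, 4.0, 2.0],
}
kept = drop_correlated(columns)
```

This greedy pass keeps whichever factor of a correlated pair appears first, so the column order acts as an implicit priority; a production pipeline would likely rank factors by predictive power before filtering.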
Step 6, model building: build a logistic regression model on the sub-predictive factors retained after feature selection to predict the hospitalization rate for the next year.
The logistic regression model is a generalized linear model (Generalized Linear Model) commonly used to predict the probability of a disease or of some event occurring. Its dependent variable can be binary or multi-class, but the binary case is more common. Logistic regression assumes that the dependent variable and the residuals follow a binomial distribution, that the independent variables are linearly related to the log-odds of the event, and that the independent variables are mutually independent. Logistic regression applies the logit transformation to the dependent variable; the model expression is:
$$\operatorname{logit}(Y) = \ln\frac{Y}{1 - Y} = \theta_0 + \sum_{j=1}^{n} \theta_j X_j$$
The probability predicted by the model is:
$$Y = \frac{1}{1 + e^{-\left(\theta_0 + \sum_{j=1}^{n} \theta_j X_j\right)}}$$
where $Y$ denotes the hospitalization rate for the next year, $\theta_i$ denotes the coefficients of the independent variables (with $\theta_0$ the intercept), $0 \le i \le n$, $X_j$ denotes the $j$-th sub-predictive factor retained after feature selection, $1 \le j \le n$, and $n$ denotes the number of sub-predictive factors after feature selection.
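For illustration, the prediction step of a fitted logistic regression model can be written in a few lines of Python. The coefficient values in the example are made up and the function name is hypothetical; the patent does not publish the fitted coefficients.

```python
import math

def predict_hospitalization(theta, x):
    """theta = [theta_0, ..., theta_n]; x = [X_1, ..., X_n]; returns Y in (0, 1)."""
    if len(theta) != len(x) + 1:
        raise ValueError("theta needs one intercept plus one coefficient per factor")
    # Linear predictor: the log-odds of hospitalization.
    z = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
    # Inverse logit maps the log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-z))

# Made-up coefficients and factor values, for illustration only.
p = predict_hospitalization([-2.0, 0.8, 1.5], [1.0, 0.4])
```

With these numbers the log-odds are -2.0 + 0.8 + 0.6 = -0.6, which corresponds to a predicted probability of about 0.354.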
The model uses 41 independent variables: personal information, 2; medical expenses, 18; medical behavior, 1; disease type, 16; diagnosis and treatment type, 4.
Among them, the medical expense feature most strongly positively correlated with the probability of hospitalization in the next year is medical spending in the fourth quarter; the personal information feature with the strongest positive correlation is the retirement flag; the disease type feature with the strongest positive correlation is pregnancy status; the diagnosis and treatment type feature is childbirth-related and is negatively correlated with the probability of hospitalization in the next year.
R², AUROC, Gini, and the KS index are used to measure the model's prediction results, each with a different emphasis.
R² is the ratio of the regression sum of squares to the total sum of squares. It reflects how well the regression equation explains the prediction target, and it is more sensitive to data with large absolute values.
AUROC is the area under the ROC curve (Area Under ROC Curve). The ROC curve (receiver operating characteristic curve), also called the sensitivity curve, is drawn over a series of binary classification cut-offs (cut-off values or thresholds), with the true positive rate on the vertical axis and the false positive rate on the horizontal axis. The ROC curve makes it easy to see the model's ability to identify the target event (a disease, hospitalization, and so on) at any cut-off value. The closer the ROC curve is to the upper-left corner, the more accurate the test. AUROC can be interpreted intuitively as the probability that a positive sample drawn uniformly at random is ranked ahead of a negative sample drawn uniformly at random. For a model no worse than random guessing, its value lies between 0.5 and 1; the higher the value, the better the model's predictive ability.
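The ranking interpretation of AUROC translates directly into code. The following Python sketch compares every positive/negative pair, counting ties as one half; this is a simple quadratic-time illustration, not necessarily how the authors computed the metric.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 1 = hospitalized next year, scores are predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auroc(labels, scores)
```

Here three of the four positive/negative pairs are ordered correctly, so the AUROC is 0.75.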
The Gini coefficient measures how well the model discriminates between positive and negative customers (Gini = AUROC × 2 − 1). Its value lies between 0 and 1; the higher the value, the better the model's discrimination. When assessing model capability, a Gini coefficient between 0.3 and 0.39 indicates medium discrimination, a Gini coefficient between 0.4 and 0.59 indicates high discrimination, and a Gini coefficient greater than 0.6 indicates excellent discrimination.
The KS (Kolmogorov-Smirnov) index is the maximum, over the different binary classification cut-offs (cut-off values or thresholds), of the difference between the model's true positive rate and false positive rate. It indicates the model's ability to separate positive from negative customers. The KS value lies between 0 and 1; the larger the value, the better the model's discrimination. Generally speaking, KS > 0.2 indicates that the model has good prediction accuracy.
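Both definitions are short enough to sketch in Python. The function names are hypothetical, and the KS loop below scans every distinct score as a threshold, which is a straightforward but not optimized reading of the definition.

```python
def ks_statistic(labels, scores):
    """Maximum of (true positive rate - false positive rate) over all thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        best = max(best, tp / n_pos - fp / n_neg)
    return best

def gini_from_auroc(auroc_value):
    """Gini = AUROC * 2 - 1."""
    return 2.0 * auroc_value - 1.0

ks = ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# AUROC of 67.66% is the logistic regression figure reported in this document.
gini = gini_from_auroc(0.6766)
```

Applying the Gini formula to the reported logistic regression AUROC of 67.66% gives 35.32%, which agrees with the Gini value reported for that model.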
On the test set, the logistic regression model reaches an R-Square of 8.16% and a KS of 29.15%, which indicates good prediction accuracy.
Model type            R-Square   AUROC    Gini     KS
xgboost               10.75%     69.24%   38.49%   30.44%
Logistic Regression   8.16%      67.66%   35.32%   29.15%
This modeling method draws on basic medical insurance reimbursement data and insured person information data to predict, in a big-data setting, the probability that an insured person will be hospitalized in the next year. Its prediction accuracy is greatly improved compared with the prediction models that commercial insurance companies build on their own claims data. This represents a major improvement in the insurance industry's pricing, underwriting, and risk control capabilities and is of great significance.
As shown in Fig. 2, this embodiment also provides a personal hospitalization probability prediction system comprising a data acquisition module 1, a data processing module 2, a data generation module 3, a feature transformation module 4, a feature selection module 5, and a model building module 6.
The data acquisition module 1 is used to collect the current year's basic medical insurance reimbursement data and the corresponding insured person information data, where the basic medical insurance reimbursement data comprises basic medical insurance settlement data and basic medical insurance settlement detail data.
The data processing module 2 is used to standardize the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured person information data to obtain standard basic medical insurance settlement data, standard basic medical insurance settlement detail data, and standard insured person information data.
The data generation module 3 is used to generate four major categories of predictive factors from the standard basic medical insurance settlement data, the standard basic medical insurance settlement detail data, and the standard insured person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and to generate multiple sub-predictive factors from the four categories.
The feature transformation module 4 is used to perform feature transformation on the sub-predictive factors.
The feature selection module 5 is used to perform feature dimensionality reduction on the transformed sub-predictive factors to reduce the number of sub-predictive factors.
The model building module 6 is used to build a logistic regression model on the sub-predictive factors retained after feature selection to predict the hospitalization rate for the next year:
$$Y = \frac{1}{1 + e^{-\left(\theta_0 + \sum_{j=1}^{n} \theta_j X_j\right)}}$$
where $Y$ denotes the hospitalization rate for the next year, $\theta_i$ denotes the coefficients of the independent variables (with $\theta_0$ the intercept), $0 \le i \le n$, $X_j$ denotes the $j$-th sub-predictive factor retained after feature selection, $1 \le j \le n$, and $n$ denotes the number of sub-predictive factors after feature selection.
Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are merely illustrative, and the scope of protection of the present invention is defined by the appended claims. Those skilled in the art may make various changes or modifications to these embodiments without departing from the principle and substance of the present invention, and all such changes and modifications fall within the scope of protection of the present invention.

Claims (8)

1. A method for predicting an individual's probability of hospitalization, characterized in that it comprises the following steps:
Step 1, sample collection: collecting the basic medical insurance reimbursement data for the current year and the corresponding insured-person information data, the basic medical insurance reimbursement data comprising basic medical insurance settlement data and basic medical insurance settlement detail data;
Step 2, data standardization: standardizing the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured-person information data to obtain standardized basic medical insurance settlement data, standardized basic medical insurance settlement detail data, and standardized insured-person information data;
Step 3, feature engineering: generating four major categories of predictive factors from the standardized basic medical insurance settlement data, the standardized basic medical insurance settlement detail data, and the standardized insured-person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and generating multiple sub-predictive factors from the four categories;
Step 4, feature conversion: performing feature conversion on the sub-predictive factors;
Step 5, feature selection: performing feature dimensionality reduction on the converted sub-predictive factors to reduce their number;
Step 6, model building: building a logistic regression model from the sub-predictive factors retained after feature selection to predict the hospitalization probability for the following year;
wherein Y denotes the hospitalization probability for the following year, θ_i denotes the i-th model parameter, 0 ≤ i ≤ n, X_j denotes the j-th sub-predictive factor among those retained after feature selection, 1 ≤ j ≤ n, and n denotes the number of sub-predictive factors retained after feature selection.
2. The method for predicting an individual's probability of hospitalization according to claim 1, characterized in that, in Step 4, numeric predictive factors are converted using impact coding and character-type predictive factors are converted using one-hot encoding.
3. The method for predicting an individual's probability of hospitalization according to claim 1, characterized in that, in Step 5, factor correlation analysis is used to identify mutually correlated factors among the converted sub-predictive factors, only one factor from each set of mutually correlated factors is retained, and factors with weak predictive power are removed using the XGBoost algorithm.
4. The method for predicting an individual's probability of hospitalization according to claim 1, characterized in that the fields of the basic medical insurance settlement data mainly include personal number, visit number, visit time, visit category, admission/discharge time, diagnosis code, diagnosis name, department name, total medical amount, medical insurance reimbursed amount, self-borne amount, serious illness reimbursed amount, other reimbursed amounts, and the like;
the fields of the basic medical insurance settlement detail data mainly include personal number, visit number, settlement document number, medical insurance catalog code, medical insurance catalog name, unit price, quantity, amount, self-payment ratio, self-funded amount, and the like;
the fields of the insured-person information data mainly include age, gender, insurance type, retirement status, household registration type, education level, political affiliation, job category, and the like.
5. A system for predicting an individual's probability of hospitalization, characterized in that it comprises a data acquisition module, a data processing module, a data generation module, a feature conversion module, a feature selection module, and a model building module;
the data acquisition module is used to collect the basic medical insurance reimbursement data for the current year and the corresponding insured-person information data, the basic medical insurance reimbursement data comprising basic medical insurance settlement data and basic medical insurance settlement detail data;
the data processing module is used to standardize the basic medical insurance settlement data, the basic medical insurance settlement detail data, and the insured-person information data to obtain standardized basic medical insurance settlement data, standardized basic medical insurance settlement detail data, and standardized insured-person information data;
the data generation module is used to generate four major categories of predictive factors from the standardized basic medical insurance settlement data, the standardized basic medical insurance settlement detail data, and the standardized insured-person information data, the four categories being personal information, medical expenses, medical behavior, and disease type, and to generate multiple sub-predictive factors from the four categories;
the feature conversion module is used to perform feature conversion on the sub-predictive factors;
the feature selection module is used to perform feature dimensionality reduction on the converted sub-predictive factors to reduce their number;
the model building module is used to build a logistic regression model from the sub-predictive factors retained after feature selection to predict the hospitalization probability for the following year;
wherein Y denotes the hospitalization probability for the following year, θ_i denotes the i-th model parameter, 0 ≤ i ≤ n, X_j denotes the j-th sub-predictive factor among those retained after feature selection, 1 ≤ j ≤ n, and n denotes the number of sub-predictive factors retained after feature selection.
6. The system for predicting an individual's probability of hospitalization according to claim 5, characterized in that the feature conversion module is used to convert numeric predictive factors using impact coding and to convert character-type predictive factors using one-hot encoding.
7. The system for predicting an individual's probability of hospitalization according to claim 5, characterized in that the feature selection module is used to identify, by factor correlation analysis, mutually correlated factors among the converted sub-predictive factors, to retain only one factor from each set of mutually correlated factors, and to remove factors with weak predictive power using the XGBoost algorithm.
8. The system for predicting an individual's probability of hospitalization according to claim 5, characterized in that the fields of the basic medical insurance settlement data mainly include personal number, visit number, visit time, visit category, admission/discharge time, diagnosis code, diagnosis name, department name, total medical amount, medical insurance reimbursed amount, self-borne amount, serious illness reimbursed amount, other reimbursed amounts, and the like;
the fields of the basic medical insurance settlement detail data mainly include personal number, visit number, settlement document number, medical insurance catalog code, medical insurance catalog name, unit price, quantity, amount, self-payment ratio, self-funded amount, and the like;
the fields of the insured-person information data mainly include age, gender, insurance type, retirement status, household registration type, education level, political affiliation, job category, and the like.
CN201910258525.8A 2019-04-01 2019-04-01 Personal probability forecasting method and the system of being hospitalized Pending CN109978701A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910258525.8A CN109978701A (en) 2019-04-01 2019-04-01 Personal probability forecasting method and the system of being hospitalized

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910258525.8A CN109978701A (en) 2019-04-01 2019-04-01 Personal probability forecasting method and the system of being hospitalized

Publications (1)

Publication Number Publication Date
CN109978701A true CN109978701A (en) 2019-07-05

Family

ID=67082232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910258525.8A Pending CN109978701A (en) 2019-04-01 2019-04-01 Personal probability forecasting method and the system of being hospitalized

Country Status (1)

Country Link
CN (1) CN109978701A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179102A (en) * 2019-12-25 2020-05-19 北京亚信数据有限公司 Medical insurance underwriting and protecting wind control method and device and storage medium
CN111739600A (en) * 2020-06-22 2020-10-02 平安医疗健康管理股份有限公司 Information processing method and device, computer equipment and readable storage medium
CN113642669A (en) * 2021-08-30 2021-11-12 平安医疗健康管理股份有限公司 Fraud prevention detection method, device and equipment based on feature analysis and storage medium
CN114822857A (en) * 2021-01-18 2022-07-29 阿里巴巴集团控股有限公司 Prediction method of repeat admission, computing device and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2651452A1 (en) * 2008-05-02 2009-11-02 Accenture Global Services Gmbh System for predictive analytics using real-world pharmaceutical transactions
US20110313788A1 (en) * 2010-06-17 2011-12-22 Cerner Innovation, Inc. Readmission risk assesment
US20130103615A1 (en) * 2009-02-11 2013-04-25 Johnathan Mun Project economics analysis tool
US20140108044A1 (en) * 2012-10-12 2014-04-17 Jayaram Reddy Methods and systems for analyzing health risk score and managing healthcare cost
JP2015090689A (en) * 2013-11-07 2015-05-11 株式会社日立製作所 Medical data analysis system and medical data analysis method
CN105335618A (en) * 2015-11-10 2016-02-17 成都数联易康科技有限公司 Patient feature depiction method and false hospitalization behavior detection method based on the patient feature depiction method
WO2017117230A1 (en) * 2015-12-29 2017-07-06 24/7 Customer, Inc. Method and apparatus for facilitating on-demand building of predictive models
CN108109063A (en) * 2017-12-07 2018-06-01 上海点融信息科技有限责任公司 For the method, apparatus and computer readable storage medium of prediction label predicted value
CN108733631A (en) * 2018-04-09 2018-11-02 中国平安人寿保险股份有限公司 A kind of data assessment method, apparatus, terminal device and storage medium
CN108921710A (en) * 2018-06-08 2018-11-30 东莞迪赛软件技术有限公司 The method and system of medical insurance abnormality detection
CN109545317A (en) * 2018-10-30 2019-03-29 平安科技(深圳)有限公司 The method and Related product of behavior in hospital are determined based on prediction model in hospital

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
鱼在天上飞: "基于Xgboost算法的保险赔偿建模分析", 《HTTP://ZHUANLAN.ZHIHU.COM/P/59858487》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179102A (en) * 2019-12-25 2020-05-19 北京亚信数据有限公司 Medical insurance underwriting and protecting wind control method and device and storage medium
CN111179102B (en) * 2019-12-25 2023-10-03 北京亚信数据有限公司 Medical insurance verification wind control method, device and storage medium
CN111739600A (en) * 2020-06-22 2020-10-02 平安医疗健康管理股份有限公司 Information processing method and device, computer equipment and readable storage medium
CN114822857A (en) * 2021-01-18 2022-07-29 阿里巴巴集团控股有限公司 Prediction method of repeat admission, computing device and storage medium
CN113642669A (en) * 2021-08-30 2021-11-12 平安医疗健康管理股份有限公司 Fraud prevention detection method, device and equipment based on feature analysis and storage medium
CN113642669B (en) * 2021-08-30 2024-04-05 平安医疗健康管理股份有限公司 Feature analysis-based fraud prevention detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190705
