CN117237087A - Multi-target gain model construction method, device, equipment and storage medium - Google Patents
- Publication number
- CN117237087A (application number CN202311162917.7A)
- Authority
- CN
- China
- Prior art keywords
- model
- feature data
- target
- gain
- loan
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The application discloses a multi-target gain model construction method, device, equipment and storage medium, relating to the field of machine learning. The method comprises: performing feature screening, using target variables, on a plurality of feature data determined from service data to obtain a training set, where the target variables comprise drawdown target variables (whether the customer draws funds against the credit line) and early settlement target variables; training an initial model with the training set to obtain four target models; fusing the two models corresponding to the drawdown target variables to obtain a first gain model, and fusing the two models corresponding to the early settlement target variables to obtain a second gain model; and determining a loan yield based on the first gain model and the second gain model, and determining a positive-gain target interval from a plurality of intervals determined based on the loan yield, so as to issue loan benefits to the target customer group in the target interval. Because the loan yield is determined from two gain models obtained by fusing several classification models, the platform's outstanding loan balance can be maximized.
Description
Technical Field
The present invention relates to the field of machine learning, and in particular, to a method, an apparatus, a device, and a storage medium for constructing a multi-objective gain model.
Background
In financial business, users want a service that lets them borrow and repay at any time, whereas a financial platform wants users to borrow larger amounts and keep them for longer. The platform therefore uses preferential policies, such as issuing benefits together with a loan or issuing fee-reduction coupons, to stimulate users' application and drawdown behavior. Issuing loan benefits has to be balanced against the platform's outstanding loan balance, so how to identify the customers who can be positively motivated and who maximize the platform's outstanding loan balance is a problem to be solved.
In current machine learning modeling, a Y value is generally set, and feature screening finds the most stable features that correlate most strongly with the Y value. In the present scenario the business Y value should be defined as the outstanding loan balance. From a modeling point of view, however, the outstanding loan balance is a continuous value, and the features available in actual business are very difficult to fit with a linear model, so a classification model is generally used. Even if the outstanding balance is divided into three grades (high, medium and low), the task becomes a multi-class problem, for which good feature screening and discrimination are hard to achieve. Differences between a test group and a control group are typically handled with gain models, a common modeling approach for marketing scenarios. In the present business scenario, however, the customers motivated by benefits are in theory also the customers most likely to repay early, so the test group can perform worse than the control group and the goal of increasing the outstanding loan balance is not achieved.
Disclosure of Invention
In view of the above, the present application aims to provide a multi-target gain model construction method, device, equipment and storage medium, which determine the loan yield from two gain models obtained by fusing several classification models, thereby maximizing the platform's outstanding loan balance. The specific scheme is as follows:
In a first aspect, the present application provides a multi-target gain model construction method, including:
performing feature derivation on the collected customer service data to obtain a plurality of feature data;
performing feature screening on the plurality of feature data using target variables to obtain a training set, and constructing an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first drawdown target variable and a first early settlement target variable determined for the case where loan benefits are issued, and a second drawdown target variable and a second early settlement target variable determined for the case where loan benefits are not issued;
training the initial model with the training set to obtain target models; the target models comprise a first drawdown model, a first early settlement model, a second drawdown model and a second early settlement model;
fusing the first drawdown model and the second drawdown model to obtain a first gain model, and fusing the first early settlement model and the second early settlement model to obtain a second gain model;
and determining a loan yield based on the first gain model and the second gain model, and determining a positive-gain target interval from a plurality of binning intervals determined based on the loan yield, so as to issue loan benefits to the target customer group in the target interval.
Optionally, the feature screening is performed on the plurality of feature data by using a target variable to obtain a training set, including:
determining the correlation among the plurality of feature data, and grouping the feature data whose correlation is greater than a preset correlation threshold to obtain a plurality of groups of associated features;
determining information values between feature data contained in each group of associated features and the target variable respectively, and screening target feature data with the maximum information value from each group of associated features respectively;
a training set is determined based on the target feature data and the remaining feature data of the plurality of feature data other than the feature data contained by the plurality of sets of associated features.
Optionally, before determining the correlation between the several kinds of feature data, the method further includes:
determining the missing rate of each of the plurality of feature data, and deleting the feature data whose missing rate is greater than a preset missing threshold;
and filling the missing values in the feature data whose missing rate is not greater than the preset missing threshold.
Optionally, after the filling the missing values in the feature data with the missing rate not greater than the preset missing threshold value in the plurality of feature data, the method further includes:
determining the feature dimension (the number of distinct values) of each of the plurality of feature data, and deleting the feature data whose feature dimension is not greater than a preset dimension threshold;
determining, from the plurality of feature data, a plurality of initial feature data whose feature dimension is greater than the preset dimension threshold;
and determining the proportion of each feature dimension within each initial feature data, and applying a logarithmic transformation to the initial feature data in which a dimension proportion is greater than a preset proportion threshold.
Optionally, the determining the training set based on the target feature data and the remaining feature data of the plurality of feature data except the feature data included in the plurality of sets of associated features includes:
determining a plurality of types of feature data to be screened based on the target feature data and the rest feature data except the feature data contained in the plurality of groups of associated features in the plurality of types of feature data;
selecting, from the feature data to be screened, feature data to be checked whose trend is consistent with the trend of the target variable;
determining the stability index of each feature data to be checked, and selecting from them the screened feature data whose stability index is greater than a preset stability threshold;
constructing a tree model based on the screened feature data, and determining the feature importance of each screened feature data in the tree model;
and constructing a training set from the screened feature data whose feature importance is greater than a preset importance threshold.
Optionally, the training the initial model by using the training set to obtain a target model includes:
inputting the training set and the target variable into a hyperparameter optimizer, which iterates over model parameters according to a preset model evaluation index to obtain the hyperparameters;
and inputting the hyperparameters and the target variable into the initial model, and training the initial model with the training set to obtain a target model.
Optionally, the fusing the first drawdown model and the second drawdown model to obtain a first gain model, and fusing the first early settlement model and the second early settlement model to obtain a second gain model, includes:
determining a first drawdown probability and a first non-drawdown probability based on the first drawdown model for the case where loan benefits are issued, and determining a second drawdown probability and a second non-drawdown probability based on the second drawdown model for the case where loan benefits are not issued;
determining a first gain model based on the first drawdown probability, the first non-drawdown probability, the second drawdown probability and the second non-drawdown probability;
determining a first early settlement probability and a first non-early-settlement probability based on the first early settlement model for the case where loan benefits are issued, and determining a second early settlement probability and a second non-early-settlement probability based on the second early settlement model for the case where loan benefits are not issued;
and determining a second gain model based on the first early settlement probability, the first non-early-settlement probability, the second early settlement probability and the second non-early-settlement probability.
In a second aspect, the present application provides a multi-target gain model construction apparatus, including:
a feature derivation module, used for performing feature derivation on the collected customer service data to obtain a plurality of feature data;
a training set determining module, used for performing feature screening on the plurality of feature data using target variables to obtain a training set, and constructing an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first drawdown target variable and a first early settlement target variable determined for the case where loan benefits are issued, and a second drawdown target variable and a second early settlement target variable determined for the case where loan benefits are not issued;
a model training module, used for training the initial model with the training set to obtain target models; the target models comprise a first drawdown model, a first early settlement model, a second drawdown model and a second early settlement model;
a model fusion module, used for fusing the first drawdown model and the second drawdown model to obtain a first gain model, and fusing the first early settlement model and the second early settlement model to obtain a second gain model;
and a benefit issuing module, used for determining a loan yield based on the first gain model and the second gain model, and determining a positive-gain target interval from a plurality of binning intervals determined based on the loan yield, so as to issue loan benefits to the target customer group in the target interval.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and a processor for executing the computer program to implement the multi-objective gain model construction method.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the multi-objective gain model building method described above.
In the application, feature derivation is performed on the collected customer service data to obtain a plurality of feature data; feature screening is performed on the plurality of feature data using target variables to obtain a training set, and an initial model is constructed based on the extreme gradient boosting algorithm, where the target variables comprise a first drawdown target variable and a first early settlement target variable determined for the case where loan benefits are issued, and a second drawdown target variable and a second early settlement target variable determined for the case where loan benefits are not issued; the initial model is trained with the training set to obtain target models, which comprise a first drawdown model, a first early settlement model, a second drawdown model and a second early settlement model; the first and second drawdown models are fused to obtain a first gain model, and the first and second early settlement models are fused to obtain a second gain model; and a loan yield is determined based on the first gain model and the second gain model, and a positive-gain target interval is determined from a plurality of binning intervals determined based on the loan yield, so that loan benefits are issued to the target customer group in the target interval.
The application therefore screens the plurality of feature data using the target variables, so that the feature data strongly correlated with the target variables form the training set. Customers to whom loan benefits were issued serve as the test group, and customers to whom loan benefits were not issued serve as the control group. The initial model is trained with the training set to obtain several classification models. The two classification models corresponding to the drawdown target variables are fused into a first gain model, which identifies the customers who apply and draw down because loan benefits were issued; the two classification models corresponding to the early settlement target variables are fused into a second gain model, which identifies the customers who settle early because loan benefits were issued. The loan yield is then determined from the two gain models, so that customers who draw down because of the benefits but do not settle early because of them can be identified, which addresses both the continuous-value problem and the multi-class problem of the outstanding loan balance. In addition, a positive-gain target interval is determined from the binning intervals determined by the loan yield, and loan benefits are issued to the target customer group in that interval. This selects the customers who can actually be motivated by loan benefits, improves the quality of the customer group that receives benefits, avoids indiscriminate issuing that disturbs customers without reaching those with real intent, improves the overall return on investment, and maximizes the platform's outstanding loan balance.
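The idea of comparing a model trained on the test group with a model trained on the control group can be illustrated with a minimal sketch. The text here lists four probabilities per gain model but does not give the fusion formula, so the sketch swaps in the simplest two-model uplift estimate (predicted probability with benefits minus predicted probability without benefits). The function names `uplift` and `target_customers`, and the rule "positive drawdown gain and non-positive early settlement gain", are assumptions made for illustration and are not the patent's loan yield computation.

```python
def uplift(p_with_benefit, p_without_benefit):
    """Per-customer gain: predicted probability when benefits are issued
    minus predicted probability when they are not (two-model uplift)."""
    return [t - c for t, c in zip(p_with_benefit, p_without_benefit)]


def target_customers(drawdown_gain, settlement_gain):
    """Indices of customers whose drawdown gain is positive and whose
    early settlement gain is not positive (illustrative rule only)."""
    return [
        i
        for i, (d, s) in enumerate(zip(drawdown_gain, settlement_gain))
        if d > 0 and s <= 0
    ]


# Hypothetical predicted probabilities for three customers.
draw_gain = uplift([0.6, 0.3, 0.5], [0.4, 0.5, 0.4])
settle_gain = uplift([0.2, 0.2, 0.6], [0.3, 0.2, 0.3])
selected = target_customers(draw_gain, settle_gain)
```

In this toy data only the first customer is more likely to draw down and less likely to settle early when benefits are issued.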
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for constructing a multi-objective gain model according to the present application;
FIG. 2 is a diagram showing the distribution of the logarithmic transformed characteristic data according to the present application;
FIG. 3 is a schematic diagram of a multi-objective gain model construction device according to the present application;
FIG. 4 is a block diagram of an electronic device according to the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
At present, a platform uses preferential policies, such as issuing benefits together with a loan or issuing fee-reduction coupons, to stimulate users' application and drawdown behavior. Issuing such benefits has to be balanced against the platform's outstanding loan balance, so how to identify the customers who can be positively motivated and who maximize the platform's outstanding loan balance is a problem to be solved. The application therefore provides a multi-target gain model construction method, which determines the loan yield from two gain models obtained by fusing several classification models and thereby maximizes the platform's outstanding loan balance.
Referring to fig. 1, the embodiment of the application discloses a multi-target gain model construction method, which comprises the following steps:
Step S11, performing feature derivation on the collected customer service data to obtain a plurality of feature data.
In this embodiment, a plurality of feature data are derived from the collected daily customer service data. The feature data include, but are not limited to, customer financial information, daily activity information and credit data. Customer financial information includes, but is not limited to, available credit limit, credit limit utilization, credit amount and approved limit. Daily activity information includes, but is not limited to: the traces, number of uses and stickiness of the customer's daily App (application program) operation; App activity features derived from the user's usage habits, such as daytime activity and nighttime activity; and App usage features derived from the duration of App use, such as heavy, medium and light use. Credit data include, but are not limited to, housing loans, car loans and liabilities.
Step S12, performing feature screening on the plurality of feature data using target variables to obtain a training set, and constructing an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first drawdown target variable and a first early settlement target variable determined for the case where loan benefits are issued, and a second drawdown target variable and a second early settlement target variable determined for the case where loan benefits are not issued.
In this embodiment, performing feature screening on the plurality of feature data using the target variables to obtain a training set may include: determining the correlation among the plurality of feature data, and grouping the feature data whose correlation is greater than a preset correlation threshold to obtain several groups of associated features; determining the information value between each feature data contained in each group of associated features and the target variable, and selecting from each group the target feature data with the largest information value; and determining the training set based on the target feature data and the remaining feature data that are not contained in any group of associated features. The correlation among the feature data is determined first, using the following formula:
$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E\big[(X - E(X))(Y - E(Y))\big]}{\sigma_X \sigma_Y}$$
where X and Y each represent feature data, cov represents the covariance, σ represents the standard deviation, E() represents the mean, and ρ represents the correlation. For example, suppose the plurality of feature data are A, B, C and D. Feature data whose correlation is greater than the preset correlation threshold, for example 0.8, are grouped to obtain the groups of associated features, for example [A, B]. The information value (IV, Information Value) between feature data A and the target variable, and between feature data B and the target variable, is calculated, and the target feature data with the largest information value, for example feature data A, is retained. The training set is then constructed from the target feature data A and the remaining feature data C and D, which are not contained in any group of associated features.
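The correlation grouping and information value selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names, the equal-width two-bin IV calculation, the smoothing constant `eps`, and the use of connected components to form groups are all choices made for the example.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """rho = cov(X, Y) / (sigma_X * sigma_Y), using population statistics."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx, sy = pstdev(x), pstdev(y)
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def information_value(x, y, bins=2, eps=0.5):
    """IV of a numeric feature against a binary target, equal-width bins.
    eps smooths empty bins (an assumption of this sketch)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0
    events, non_events = [0] * bins, [0] * bins
    for v, t in zip(x, y):
        i = min(int((v - lo) / width), bins - 1)
        if t == 1:
            events[i] += 1
        else:
            non_events[i] += 1
    n_event, n_non = sum(events), sum(non_events)
    iv = 0.0
    for e, n in zip(events, non_events):
        pe = (e + eps) / (n_event + eps * bins)
        pn = (n + eps) / (n_non + eps * bins)
        iv += (pe - pn) * math.log(pe / pn)
    return iv


def select_features(features, target, threshold=0.8):
    """Group features whose correlation exceeds the threshold, keep the
    highest-IV member of each group, and keep ungrouped features as they are."""
    names = list(features)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(features[a], features[b]) > threshold:
                parent[find(b)] = find(a)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    kept = [
        max(members, key=lambda n: information_value(features[n], target))
        for members in groups.values()
    ]
    return sorted(kept)


# Toy data: A and B are strongly correlated; C and D are not.
features = {
    "A": [1, 2, 3, 4, 5, 6],
    "B": [1, 2, 3, 4, 5, 12],
    "C": [5, 1, 4, 2, 6, 3],
    "D": [3, 3, 1, 1, 2, 2],
}
target = [0, 0, 0, 1, 1, 1]
kept = select_features(features, target)
```

With this toy data A and B fall into one group, A has the larger information value, and the retained set is A, C and D, mirroring the example in the text.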
Before the correlation among the feature data is determined, the missing rate of each feature data is determined, and the feature data whose missing rate is greater than a preset missing threshold are deleted; the missing values in the feature data whose missing rate is not greater than the preset missing threshold are filled. Specifically, each feature data is traversed: if its missing rate is greater than 0.3, the feature data is deleted; if its missing rate is less than or equal to 0.3, its missing values are filled using the spatial similarity average method. The spatial similarity average method maps all samples of the feature data into a vector space, uses the Euclidean distance to measure how far apart the samples are, groups samples that are close to one another into clusters, and replaces each missing value with the mean of the samples in the same cluster, thereby minimizing the differences between samples caused by missing features.
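A minimal sketch of the missing-value handling follows. The 0.3 threshold comes from the text; everything else is an assumption for illustration. In particular, the sketch does not build explicit clusters: it swaps in a nearest-neighbour average (the mean of the `k` closest samples by Euclidean distance) as a simplified stand-in for the cluster mean, and the names `drop_sparse`, `distance` and `fill_missing` are hypothetical.

```python
import math


def drop_sparse(columns, max_missing=0.3):
    """Keep only features whose missing rate is <= max_missing."""
    kept = {}
    for name, values in columns.items():
        rate = sum(v is None for v in values) / len(values)
        if rate <= max_missing:
            kept[name] = list(values)
    return kept


def distance(columns, i, j):
    """Euclidean distance between samples i and j over the features
    that are present in both samples."""
    squares = [
        (col[i] - col[j]) ** 2
        for col in columns.values()
        if col[i] is not None and col[j] is not None
    ]
    return math.sqrt(sum(squares)) if squares else math.inf


def fill_missing(columns, k=2):
    """Replace each missing value with the mean of the k nearest samples
    that have the feature (stand-in for the mean of the sample's cluster)."""
    filled = {name: list(values) for name, values in columns.items()}
    for name, values in columns.items():
        for i, v in enumerate(values):
            if v is not None:
                continue
            donors = sorted(
                (distance(columns, i, j), values[j])
                for j in range(len(values))
                if values[j] is not None
            )
            nearest = [value for _, value in donors[:k]]
            filled[name][i] = sum(nearest) / len(nearest)
    return filled


raw = {
    "limit": [1.0, 1.1, 9.0, 9.2, 1.05],
    "usage": [10.0, None, 90.0, 92.0, 11.0],
    "sparse": [None, None, 1.0, None, 2.0],
}
kept = drop_sparse(raw)      # "sparse" has a missing rate of 0.6 and is dropped
filled = fill_missing(kept)  # the missing "usage" value is filled from its neighbours
```

The missing `usage` value belongs to a sample whose `limit` is close to the low-limit samples, so it is filled from their `usage` values rather than from the global mean.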
Further, after the missing values in the feature data whose missing rate is not greater than the preset missing threshold have been filled, the feature dimension of each feature data is determined, and the feature data whose feature dimension is not greater than a preset dimension threshold are deleted; a plurality of initial feature data whose feature dimension is greater than the preset dimension threshold are determined from the plurality of feature data; and the proportion of each feature dimension within each initial feature data is determined, and a logarithmic transformation is applied to the initial feature data in which a dimension proportion is greater than a preset proportion threshold. Specifically, each feature data is traversed. If the feature dimension of a feature data is less than or equal to 1, that is, the feature takes only one value, the feature carries no distinguishing information and must be deleted. For example, if the business operates only in China, every customer's nationality is China, the nationality field does not distinguish between customers, its feature dimension is 1, and the customer nationality feature must be deleted. If the feature dimension is greater than 1, the proportion of each feature dimension within the feature data is determined next. If every dimension proportion is less than or equal to 0.7, no logarithmic transformation is needed; if the proportion of any one feature dimension is greater than 0.7, the feature data is logarithmically transformed using the following formula:
$$X = \log_e\left(X_{\text{original}}\right)$$
where X_original represents the original feature data and X represents the logarithmically transformed feature data. As shown in FIG. 2, the logarithmic transformation stabilizes the variance of the feature data, reduces polarization, and makes the distribution of the transformed feature data approximately normal.
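The single-value check and the conditional logarithmic transformation can be sketched as below. The thresholds 1 and 0.7 come from the text; the function name `transform_feature` is hypothetical, and the sketch assumes strictly positive numeric values whenever the logarithm is applied, since the natural logarithm is undefined otherwise.

```python
import math
from collections import Counter


def transform_feature(values, max_share=0.7):
    """Return None for a single-valued feature, the log-transformed values
    when one value's share exceeds max_share, and the values unchanged
    otherwise. Assumes positive numbers when the log is applied."""
    counts = Counter(values)
    if len(counts) <= 1:
        return None  # only one distinct value: no distinguishing information
    top_share = max(counts.values()) / len(values)
    if top_share > max_share:
        return [math.log(v) for v in values]
    return list(values)


constant = transform_feature(["CN", "CN", "CN", "CN"])  # single value, dropped
skewed = transform_feature([1, 1, 1, 1, 100])           # share 0.8, log applied
balanced = transform_feature([1, 2, 3, 4])              # left unchanged
```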
In this embodiment, determining the training set based on the target feature data and the remaining feature data that are not contained in any group of associated features may include: determining the feature data to be screened from the target feature data and the remaining feature data; selecting, from the feature data to be screened, the feature data to be checked whose trend is consistent with the trend of the target variable; determining the stability index of each feature data to be checked, and selecting from them the screened feature data whose stability index is greater than a preset stability threshold; constructing a tree model based on the screened feature data, and determining the feature importance of each screened feature data in the tree model; and constructing the training set from the screened feature data whose feature importance is greater than a preset importance threshold.
It can be understood that any feature data to be screened should keep a consistent trend under different binning schemes, such as chi-square binning and equal-frequency binning; that is, as the number of customers corresponding to the feature data changes across the binning intervals, the number of customers corresponding to the target variable in each interval changes in the same direction, which makes the relationship with the target variable more stable. The feature data to be checked, whose trend is consistent with the trend of the target variable, are therefore selected from the feature data to be screened, and the feature data that satisfy business interpretability are in turn selected from the feature data to be checked. For example, for the number of App browsing sessions in the last half year, more browsing should in theory indicate stronger App stickiness and higher customer demand.
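One way to read the trend check is as a monotonicity test on the target rate across bins. The sketch below uses equal-frequency binning and accepts a feature when the per-bin target rate is non-decreasing or non-increasing. This interpretation, the bin count, and the names `bin_rates` and `is_monotonic` are assumptions made for the example; the text does not fix the exact criterion. The sketch also assumes there are at least as many samples as bins.

```python
def bin_rates(x, y, bins=3):
    """Target rate in each equal-frequency bin of feature x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(order) / bins
    rates = []
    for b in range(bins):
        idx = order[round(b * size): round((b + 1) * size)]
        rates.append(sum(y[i] for i in idx) / len(idx))
    return rates


def is_monotonic(rates):
    """True when the rates never decrease or never increase."""
    pairs = list(zip(rates, rates[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


browse_count = [1, 2, 3, 4, 5, 6, 7, 8, 9]
consistent = bin_rates(browse_count, [0, 0, 0, 0, 1, 0, 1, 1, 1])
inconsistent = bin_rates(browse_count, [1, 1, 1, 0, 0, 0, 1, 1, 1])
```

Here the first target gives rates that rise from bin to bin, so the feature would pass; the second gives a U-shaped pattern and would be rejected.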
Further, for the feature data to be checked that satisfy business interpretability, the population stability index (PSI, Population Stability Index) of each feature data to be checked is determined, and the screened feature data whose population stability index is greater than a preset stability threshold are selected from them. For example, for the number of App browsing sessions in the last half year, a higher count normally indicates higher customer demand; if by August a higher count instead corresponds to lower customer demand, the feature data does not satisfy cross-time consistency, and the feature data "number of App browsing sessions in the last half year" must be deleted.
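For reference, the population stability index is commonly computed by binning a baseline sample and a comparison sample with the same bin edges and summing (a_i - e_i) * ln(a_i / e_i) over the bins, where e_i and a_i are the bin shares in the two samples. The sketch below follows that common definition; the equal-width bins, the floor `eps` for empty bins and the function name `psi` are assumptions. It only computes the index and leaves the selection rule and threshold as stated in the text; in common usage a smaller PSI indicates a more stable distribution.

```python
import math


def psi(expected, actual, bins=4, eps=1e-6):
    """Population stability index between a baseline sample (expected)
    and a later sample (actual), using equal-width bins on the baseline."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[max(0, min(i, bins - 1))] += 1  # clamp out-of-range values
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [1, 2, 3, 4, 5, 6, 7, 8]
same_period = psi(baseline, baseline)                   # identical distributions
later_period = psi(baseline, [5, 6, 7, 8, 7, 8, 8, 8])  # shifted distribution
```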
Further, a plurality of tree models are constructed based on the screened feature data. For any one type of screened feature data, its feature importance in each single tree model is calculated first, the average of these feature importances over all tree models is then calculated, and the total feature importance of the screened feature data is determined based on the average. The feature data whose total feature importance is greater than the preset importance threshold is then selected from the screened feature data to construct the training set.
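The averaging step can be sketched as follows. This is a minimal illustration that assumes each tree model has already produced a mapping from feature name to importance; how the importances are obtained from the tree models is not shown, and the names `mean_importance` and `select_features` are hypothetical.

```python
def mean_importance(per_model: list[dict[str, float]]) -> dict[str, float]:
    """Average each feature's importance over all tree models.

    A feature absent from a model's mapping is treated as importance 0.0
    in that model (an assumption).
    """
    names: list[str] = []
    for importance in per_model:
        for name in importance:
            if name not in names:
                names.append(name)
    return {
        name: sum(model.get(name, 0.0) for model in per_model) / len(per_model)
        for name in names
    }


def select_features(per_model: list[dict[str, float]], threshold: float) -> list[str]:
    """Keep the features whose averaged importance exceeds the threshold."""
    means = mean_importance(per_model)
    return [name for name, value in means.items() if value > threshold]


# Hypothetical importances from two tree models.
importances = [
    {"browse_count": 0.6, "login_days": 0.4},
    {"browse_count": 0.8, "login_days": 0.2},
]
selected = select_features(importances, threshold=0.5)
```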
In this embodiment, the initial model is built based on the extreme gradient boosting algorithm (XGBoost, eXtreme Gradient Boosting), which uses second-order derivative information so that the loss function is approximated more accurately, uses a regularization term to avoid overfitting of the trees, and uses Block storage to enable parallel computation. Further, the target variables mentioned in the application comprise: a first movable support target variable and a first advanced clearing target variable determined based on the issuing of the loan rights, namely whether a customer makes a drawdown (movable support) after the loan rights are issued and whether the customer settles the loan in advance (advanced clearing) after the loan rights are issued; and a second movable support target variable and a second advanced clearing target variable determined based on the non-issuing of the loan rights, namely whether the customer makes a drawdown when the loan rights are not issued and whether the customer settles the loan in advance when the loan rights are not issued.
S13, training the initial model by using the training set to obtain a target model; the target model comprises a first movable support model, a first advanced settlement model, a second movable support model and a second advanced settlement model.
In this embodiment, training the initial model by using the training set to obtain the target model may include: inputting the training set and the target variable into a hyperparameter optimizer, so that the hyperparameter optimizer iterates over model parameters based on a preset model evaluation index to obtain hyperparameters; and inputting the hyperparameters and the target variable into the initial model, so as to train the initial model by using the training set to obtain the target model. It can be understood that the training set constructed from the processed and screened feature data, together with the target variable corresponding to that training set, is input into the hyperparameter optimizer; the hyperparameter optimizer iterates over the model parameters step by step within a given range and selects the optimal hyperparameter combination using the preset model evaluation index as the criterion. For example, the preset model evaluation index used in the model training process may be the F1 score. After the hyperparameters are obtained, the target variable, the corresponding training set, and the optimal hyperparameters are all input into the initial model for model training, so as to obtain a trained target model. The target model includes a first movable branch model M1 corresponding to the first movable branch target variable, a first advanced clearing model M2 corresponding to the first advanced clearing target variable, a second movable branch model M3 corresponding to the second movable branch target variable, and a second advanced clearing model M4 corresponding to the second advanced clearing target variable. Further, the four trained target models are serialized to disk for subsequent use.
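The hyperparameter selection can be sketched as an exhaustive search scored by F1. This is a minimal illustration, assuming a plain grid search stands in for the optimizer (the application does not name the optimizer) and assuming a caller-supplied `fit_predict` callable that trains a model with the given parameters and returns validation predictions; both `grid_search` and `fit_predict` are hypothetical names.

```python
from itertools import product
from typing import Callable, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for binary labels, returning 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def grid_search(param_grid: dict[str, list],
                fit_predict: Callable[[dict], Sequence[int]],
                y_valid: Sequence[int]) -> tuple[dict, float]:
    """Try every parameter combination in the given ranges and keep the best F1."""
    names = sorted(param_grid)
    best_params: dict = {}
    best_score = -1.0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = f1_score(y_valid, fit_predict(params))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for model training: a score cutoff plays the role of a hyperparameter.
scores = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
best, best_f1 = grid_search(
    {"cutoff": [0.2, 0.5, 0.8]},
    lambda params: [1 if s >= params["cutoff"] else 0 for s in scores],
    labels,
)
```

In practice the same search would be run once per target variable, giving one hyperparameter combination for each of M1 to M4.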
S14, fusing the first movable branch model and the second movable branch model to obtain a first gain model, and fusing the first advanced clearing model and the second advanced clearing model to obtain a second gain model.
In this embodiment, the customers to whom the loan rights are issued are equivalent to the test group, that is, the test group corresponds to the first movable branch model and the first advanced clearing model; the customers to whom the loan rights are not issued are equivalent to the control group, that is, the control group corresponds to the second movable branch model and the second advanced clearing model. A first movable branch probability M1_p1 and a first non-movable branch probability M1_p0 are determined based on the first movable branch model and the issuing of the loan rights, and a second movable branch probability M3_p1 and a second non-movable branch probability M3_p0 are determined based on the second movable branch model and the non-issuing of the loan rights. The first gain model U1 can then be determined based on the first movable branch probability, the first non-movable branch probability, the second movable branch probability, and the second non-movable branch probability; that is, the probability that a willingness to draw down is generated by the issuing of the loan rights can be calculated through the first gain model. The specific calculation formula is as follows:
Further, a first advanced clearing probability M2_p1 and a first non-advanced clearing probability M2_p0 are determined based on the first advanced clearing model and the issuing of the loan rights, and a second advanced clearing probability M4_p1 and a second non-advanced clearing probability M4_p0 are determined based on the second advanced clearing model and the non-issuing of the loan rights. The second gain model U2 is determined based on the first advanced clearing probability, the first non-advanced clearing probability, the second advanced clearing probability, and the second non-advanced clearing probability; that is, the probability that a willingness to settle in advance is generated by the issuing of the loan rights can be calculated through the second gain model. The specific calculation formula is as follows:
S15, determining the loan yield based on the first gain model and the second gain model, and determining a target interval with positive gain from a plurality of binning intervals determined based on the loan yield, so as to issue loan rights to the target customer group in the target interval.
In this embodiment, the first gain model calculates the probability U1 that a willingness to draw down is generated by the issuing of the loan rights, and the second gain model calculates the probability U2 that a willingness to settle in advance is generated by the issuing of the loan rights, so the loan yield P_gain can be determined based on the first gain model and the second gain model. The specific calculation formula is P_gain = U1 - U2; that is, the loan yield reflects the probability that, because the loan rights are issued, a customer draws down but does not settle in advance, i.e., the proportion of outstanding loan balance.
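The relation between the four models and the loan yield can be sketched as follows. Only the last step, the loan yield (written `p_gain` here) equal to U1 minus U2, is taken from the text. The formulas for U1 and U2 are not reproduced in this text, so the sketch assumes the common two-model uplift form, in which a gain is the treated-model probability minus the control-model probability for the same customer; that form, and the function names, are assumptions and not the application's stated formulas.

```python
def uplift(p_treated: float, p_control: float) -> float:
    """ASSUMED fusion rule: probability predicted by the model trained on customers
    who received the loan rights minus the probability predicted by the model
    trained on customers who did not."""
    return p_treated - p_control


def loan_yield(u1: float, u2: float) -> float:
    """Loan yield as stated in the text: drawdown gain U1 minus early-settlement gain U2."""
    return u1 - u2


# Hypothetical scores for one customer from the four models M1..M4.
m1_p1 = 0.30  # drawdown probability, rights issued (M1)
m3_p1 = 0.10  # drawdown probability, rights not issued (M3)
m2_p1 = 0.15  # early-settlement probability, rights issued (M2)
m4_p1 = 0.10  # early-settlement probability, rights not issued (M4)

u1 = uplift(m1_p1, m3_p1)
u2 = uplift(m2_p1, m4_p1)
p_gain = loan_yield(u1, u2)
```

Under these assumptions, a customer with a positive `p_gain` is one for whom issuing the loan rights is expected to raise drawdown more than it raises early settlement.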
In this embodiment, when the marketing strategy is formulated, the costs of different marketing modes differ; therefore, in order to maximize the ROI (Return On Investment) index, different marketing modes are formulated for different marketing gain score intervals. For example, customers are sorted by the loan yield P_gain from large to small and divided into 3 binning intervals, which are labeled high, medium, and low respectively. In view of data sensitivity, the data is desensitized by conversion with the control group taken as unit 1 (100%); the final results are shown in Table 1 and Table 2, where Table 1 is the comparison result without using the multi-target gain model and Table 2 is the comparison result using the multi-target gain model.
Table 1
Bin | Control group average balance per person | Experimental group average balance per person | Balance gain |
All | 100% | 101.4698% | 1.4698% |
Table 2
Bin | Control group average balance per person | Experimental group average balance per person | Balance gain |
High | 100% | 104.9356% | 4.9356% |
Medium | 100% | 98.7114% | -1.2886% |
Low | 100% | 96.4085% | -3.5915% |
From the results in Table 1 and Table 2, it can be seen that the multi-target gain model can be used to determine which binning interval has a higher balance gain, that is, a positive gain. Here, the balance gain is the value obtained by subtracting the average balance per person of the control group from the average balance per person of the experimental group, and the average balance per person is the ratio of the outstanding loan balance to the number of borrowers, converted with the control group as unit 1. Further, the target interval with positive gain is determined from the three binning intervals (high, medium, and low) determined based on the loan yield, and the loan rights are issued to the target customer group in the target interval, so that customers who can bring gain are screened, that is, customers who generate negative gain are excluded from the issuing of the loan rights. In this way, the selection of customers for loan rights issuing changes from blind selection to selection with a basis to rely on, and the quality of the customer group receiving the loan rights is improved.
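The binning and target-interval selection can be sketched as follows. This is a minimal illustration, assuming the three bins are of near-equal size (the text only says customers are sorted by loan yield and divided into three intervals); the function names are hypothetical. The balance figures in the example are the high, medium, and low values reported above.

```python
def assign_bins(p_gain: list[float], labels: tuple[str, ...] = ("High", "Medium", "Low")) -> list[str]:
    """Rank customers by loan yield from large to small and split the ranking
    into len(labels) near-equal bins (equal size is an assumption)."""
    order = sorted(range(len(p_gain)), key=lambda i: p_gain[i], reverse=True)
    result = [""] * len(p_gain)
    for rank, idx in enumerate(order):
        result[idx] = labels[rank * len(labels) // len(p_gain)]
    return result


def balance_gain(experimental_avg: float, control_avg: float) -> float:
    """Balance gain with the control group normalized to unit 1."""
    return experimental_avg / control_avg - 1.0


def positive_gain_bins(gain_by_bin: dict[str, float]) -> list[str]:
    """Bins whose balance gain is positive become the target intervals."""
    return [name for name, gain in gain_by_bin.items() if gain > 0]


bins = assign_bins([0.1, 0.5, 0.3, -0.2, 0.0, 0.4])

# Average balances per person, in percent of the control group, from the text above.
gains = {
    "High": balance_gain(104.9356, 100.0),
    "Medium": balance_gain(98.7114, 100.0),
    "Low": balance_gain(96.4085, 100.0),
}
target_bins = positive_gain_bins(gains)
```

With the reported figures only the high bin has a positive gain, so loan rights would be issued to that bin alone.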
Therefore, the application performs feature screening on the plurality of types of feature data by using the target variable, so that the feature data strongly correlated with the target variable is used as the training set. The customers to whom the loan rights are issued are equivalent to the test group, and the customers to whom the loan rights are not issued are equivalent to the control group. The initial model is trained by using the training set to obtain a plurality of classification models; the two classification models corresponding to the movable support target variables are fused to obtain the first gain model, so as to identify the customers who draw down because the loan rights are issued, and the two classification models corresponding to the advanced clearing target variables are fused to obtain the second gain model, so as to identify the customers who settle in advance because the loan rights are issued. The loan yield is then determined based on the two gain models, so that the customers who draw down but do not settle in advance because the loan rights are issued can be identified, which addresses the continuous-value problem and the multi-classification problem of the outstanding loan balance. In addition, the target interval with positive gain is determined from the plurality of binning intervals determined by the loan yield, and the loan rights are issued to the target customer group in that target interval, so that the target customer group that can be activated by the loan rights is screened out. This improves the quality of the customer group receiving the loan rights, alleviates the problem that blind issuing tends to disturb customers and fails to target customers with intent, improves the overall return on investment, and achieves the goal of maximizing the platform's outstanding loan balance.
Referring to fig. 3, an embodiment of the present invention discloses a multi-objective gain model construction device, including:
the feature deriving module 11 is configured to perform feature derivation on the collected customer service data to obtain a plurality of feature data;
the training set determining module 12 is configured to perform feature screening on the plurality of feature data by using a target variable to obtain a training set, and construct an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first movable support target variable and a first advanced clearing target variable which are determined based on loan rights issuing, and a second movable support target variable and a second advanced clearing target variable which are determined based on loan rights not issuing;
the model training module 13 is configured to train the initial model by using the training set to obtain a target model; the target model comprises a first movable support model, a first advanced settlement model, a second movable support model and a second advanced settlement model;
the model fusion module 14 is configured to fuse the first movable branch model and the second movable branch model to obtain a first gain model, and fuse the first advanced clearing model and the second advanced clearing model to obtain a second gain model;
and the rights issuing module 15 is configured to determine a loan yield based on the first gain model and the second gain model, and determine a target interval with positive gain from a plurality of binning intervals determined based on the loan yield, so as to issue loan rights to a target customer group in the target interval.
In some embodiments, the training set determination module 12 includes:
the characteristic data grouping unit is used for determining the correlation among the plurality of types of characteristic data and grouping the characteristic data with the correlation larger than a preset correlation threshold value in the plurality of types of characteristic data into a group so as to obtain a plurality of groups of correlation characteristics;
the information value determining unit is used for determining information values between the characteristic data contained in each group of the associated characteristics and the target variables respectively, and screening target characteristic data with the maximum information value from each group of the associated characteristics respectively;
and the training set determining submodule is used for determining a training set based on the target feature data and the rest feature data except the feature data contained in the plurality of groups of associated features in the plurality of types of feature data.
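The selection performed by these three units can be sketched as follows. The information value is not defined in this passage, so the sketch assumes the common weight-of-evidence form computed over pre-binned positive and negative counts, and it assumes the groups of correlated features have already been formed; the function names and the `eps` floor are hypothetical.

```python
import math
from typing import Sequence


def information_value(pos_counts: Sequence[int], neg_counts: Sequence[int],
                      eps: float = 1e-6) -> float:
    """Information value of one binned feature against a binary target.

    pos_counts / neg_counts are per-bin counts of positive and negative samples.
    Shares are floored at eps (an assumption) to avoid log(0) on empty bins.
    """
    pos_total, neg_total = sum(pos_counts), sum(neg_counts)
    iv = 0.0
    for p, n in zip(pos_counts, neg_counts):
        p_pct = max(p / pos_total, eps)
        n_pct = max(n / neg_total, eps)
        iv += (p_pct - n_pct) * math.log(p_pct / n_pct)
    return iv


def candidate_features(all_features: list[str], groups: list[list[str]],
                       iv_by_feature: dict[str, float]) -> list[str]:
    """Keep the highest-IV feature of each correlated group, plus every feature
    that belongs to no group."""
    grouped = {name for group in groups for name in group}
    best = [max(group, key=lambda name: iv_by_feature[name]) for group in groups]
    remaining = [name for name in all_features if name not in grouped]
    return best + remaining


# Hypothetical features: "a" and "b" are highly correlated, "c" and "d" are not grouped.
features = ["a", "b", "c", "d"]
kept = candidate_features(features, [["a", "b"]], {"a": 0.1, "b": 0.3})
```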
In some specific embodiments, the multi-target gain model building apparatus further includes:
the missing rate determining unit is used for determining the missing rate corresponding to the plurality of types of characteristic data respectively and deleting the characteristic data with the missing rate larger than a preset missing threshold value in the plurality of types of characteristic data;
and the missing value filling unit is used for filling missing values in the characteristic data with the missing rate not larger than the preset missing threshold value in the plurality of characteristic data.
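The two units above can be sketched as a single pass over the feature columns. This is a minimal illustration, assuming missing entries are represented as `None` and assuming a constant fill value; the application does not specify the fill strategy, and `clean_missing` is a hypothetical name.

```python
from typing import Optional


def clean_missing(table: dict[str, list[Optional[float]]],
                  max_missing_rate: float,
                  fill_value: float = 0.0) -> dict[str, list[float]]:
    """Drop columns whose missing rate exceeds the threshold and fill the
    missing entries of the remaining columns with fill_value (an assumption)."""
    cleaned: dict[str, list[float]] = {}
    for name, column in table.items():
        missing_rate = sum(value is None for value in column) / len(column)
        if missing_rate > max_missing_rate:
            continue  # too sparse: delete this feature
        cleaned[name] = [fill_value if value is None else value for value in column]
    return cleaned


# Hypothetical feature columns with missing entries.
raw = {
    "browse_count": [1, None, 3, 4],      # missing rate 0.25, kept and filled
    "last_overdue": [None, None, None, 1],  # missing rate 0.75, deleted
}
cleaned = clean_missing(raw, max_missing_rate=0.5)
```

A median or mode fill could be substituted for the constant without changing the structure of the step.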
In some specific embodiments, the multi-target gain model building apparatus further includes:
the characteristic dimension determining unit is used for determining characteristic dimensions corresponding to the plurality of types of characteristic data respectively and deleting the characteristic data with the characteristic dimensions not larger than a preset dimension threshold value in the plurality of types of characteristic data;
the logarithmic transformation processing unit is used for determining a plurality of initial characteristic data with characteristic dimension larger than the preset dimension threshold value from the plurality of characteristic data; and determining the dimension proportion corresponding to each feature dimension in the various initial feature data, and carrying out logarithmic transformation processing on the initial feature data whose dimension proportion is larger than a preset proportion threshold value in the plurality of initial feature data.
In some embodiments, the training set determination submodule includes:
the to-be-screened data determining unit is used for determining a plurality of types of to-be-screened characteristic data based on the target characteristic data and the residual characteristic data except the characteristic data contained in the plurality of groups of associated characteristics in the plurality of types of characteristic data;
the change trend screening unit is used for screening feature data to be checked, wherein the change trend of the feature data is consistent with the change trend of the target variable, from the feature data to be screened;
The stability screening unit is used for determining stability indexes corresponding to the various feature data to be checked respectively, and screening screened feature data with the stability indexes larger than a preset stability threshold value from the various feature data to be checked;
an importance determining unit for constructing a tree model based on the various screened feature data and determining feature importance of the various screened feature data in the tree model, respectively;
and the importance screening unit is used for constructing a training set based on the feature data with the importance of the features screened from the various screened feature data being greater than a preset importance threshold.
In some embodiments, the model training module 13 includes:
the hyperparameter determining unit is used for inputting the training set and the target variable into a hyperparameter optimizer, so as to iterate model parameters by using the hyperparameter optimizer and based on a preset model evaluation index to obtain hyperparameters;
and the model training unit is used for inputting the hyperparameters and the target variables into the initial model so as to train the initial model by utilizing the training set to obtain a target model.
In some embodiments, the model fusion module 14 includes:
the movable support probability determining unit is used for determining a first movable support probability and a first non-movable support probability based on the first movable support model and the loan equity issuing, and determining a second movable support probability and a second non-movable support probability based on the second movable support model and the loan equity non-issuing;
a first model determining unit configured to determine a first gain model based on the first movable support probability, the first non-movable support probability, the second movable support probability, and the second non-movable support probability;
a clearing probability determination unit configured to determine a first advanced clearing probability and a first non-advanced clearing probability based on the first advanced clearing model and the loan interest issuance, and determine a second advanced clearing probability and a second non-advanced clearing probability based on the second advanced clearing model and the loan interest non-issuance;
and a second model determining unit configured to determine a second gain model based on the first advanced clearing probability, the first non-advanced clearing probability, the second advanced clearing probability, and the second non-advanced clearing probability.
Further, the embodiment of the present application further discloses an electronic device, and fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the diagram is not to be considered as any limitation on the scope of use of the present application.
Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. Wherein the memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement the relevant steps in the multi-objective gain model building method disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221, which may be Windows Server, NetWare, Unix, Linux, or the like, is used for managing and controlling the hardware devices on the electronic device 20 and the computer program 222. In addition to the computer program that can be used to perform the multi-target gain model construction method executed by the electronic device 20 as disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the disclosed multi-objective gain model building method. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points may be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The principles and embodiments of the present application have been described above with specific examples, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea; meanwhile, those skilled in the art may make changes to the specific embodiments and the application scope in accordance with the ideas of the present application. In view of the above, the content of this description should not be construed as limiting the present application.
Claims (10)
1. A method for constructing a multi-target gain model, characterized by comprising the following steps:
performing feature derivation on the collected customer service data to obtain a plurality of feature data;
performing feature screening on the plurality of feature data by utilizing a target variable to obtain a training set, and constructing an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first movable support target variable and a first advanced clearing target variable which are determined based on loan rights issuing, and a second movable support target variable and a second advanced clearing target variable which are determined based on loan rights not issuing;
training the initial model by using the training set to obtain a target model; the target model comprises a first movable support model, a first advanced settlement model, a second movable support model and a second advanced settlement model;
Fusing the first movable branch model and the second movable branch model to obtain a first gain model, and fusing the first advanced clearing model and the second advanced clearing model to obtain a second gain model;
and determining a loan yield based on the first gain model and the second gain model, and determining a target interval with positive gain from a plurality of binning intervals determined based on the loan yield, so as to issue loan rights to a target customer group in the target interval.
2. The method for constructing a multi-objective gain model according to claim 1, wherein the feature screening of the plurality of feature data by using the objective variable to obtain a training set comprises:
determining the correlation among the plurality of feature data, and dividing the feature data with the correlation larger than a preset correlation threshold value in the plurality of feature data into a group to obtain a plurality of groups of correlation features;
determining information values between feature data contained in each group of associated features and the target variable respectively, and screening target feature data with the maximum information value from each group of associated features respectively;
A training set is determined based on the target feature data and the remaining feature data of the plurality of feature data other than the feature data contained by the plurality of sets of associated features.
3. The method for constructing a multi-target gain model according to claim 2, further comprising, before determining the correlation between the plurality of types of feature data:
determining the missing rate corresponding to the plurality of characteristic data respectively, and deleting the characteristic data with the missing rate larger than a preset missing threshold value in the plurality of characteristic data;
and filling missing values in the characteristic data with the missing rate not larger than the preset missing threshold value in the plurality of characteristic data.
4. The method for constructing a multi-objective gain model according to claim 3, wherein after the filling of the missing values in the feature data having the missing rate not greater than the preset missing threshold value in the plurality of feature data, the method further comprises:
determining feature dimensions corresponding to the feature data respectively, and deleting the feature data of which the feature dimensions are not more than a preset dimension threshold value in the feature data;
determining a plurality of initial feature data with feature dimensions larger than the preset dimension threshold value from the plurality of feature data;
and determining the dimension proportion corresponding to each feature dimension in the various initial feature data, and carrying out logarithmic transformation processing on the initial feature data whose dimension proportion is larger than a preset proportion threshold value in the plurality of initial feature data.
5. The method according to claim 2, wherein determining the training set based on the target feature data and the remaining feature data of the plurality of feature data excluding the feature data included in the plurality of sets of associated features comprises:
determining a plurality of types of feature data to be screened based on the target feature data and the rest feature data except the feature data contained in the plurality of groups of associated features in the plurality of types of feature data;
screening feature data to be checked, wherein the change trend of the feature data is consistent with the change trend of the target variable, from the feature data to be screened;
determining stability indexes corresponding to the various feature data to be checked respectively, and screening screened feature data with stability indexes larger than a preset stability threshold value from the various feature data to be checked;
constructing a tree model based on the various screened characteristic data, and determining the characteristic importance of the various screened characteristic data in the tree model respectively;
And constructing a training set based on feature data with feature importance greater than a preset importance threshold value, wherein the feature importance is screened from various screened feature data.
6. The method for constructing a multi-target gain model according to claim 1, wherein training the initial model by using the training set to obtain a target model comprises:
inputting the training set and the target variable into a hyperparameter optimizer, so as to iterate over model parameters by using the hyperparameter optimizer based on a preset model evaluation index, thereby obtaining hyperparameters;
and inputting the hyperparameters and the target variable into the initial model, and training the initial model by using the training set to obtain the target model.
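The parameter search of this claim can be illustrated with a minimal sketch. The claim does not name the optimizer, so plain random search is swapped in here as an assumption. `random_search`, `toy_score`, and the search space are hypothetical; in practice `train_and_score` would train the initial model on the training set and return the preset evaluation index.

```python
import random

def random_search(train_and_score, space, n_trials=50, seed=0):
    """Try n_trials random parameter sets and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in space.items()}
        score = train_and_score(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Stand-in objective so the sketch runs without a real model.
    return -abs(params["max_depth"] - 4) - abs(params["learning_rate"] - 0.1)

space = {
    "max_depth": [2, 3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
}
best_params, best_score = random_search(toy_score, space)
```

The returned `best_params` would then be passed to the initial model for the final training run described in the last step of the claim.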
7. The method according to any one of claims 1 to 6, wherein the fusing the first movable branch model and the second movable branch model to obtain a first gain model, and fusing the first advanced clearing model and the second advanced clearing model to obtain a second gain model, includes:
determining a first movable branch probability and a first non-movable branch probability based on the first movable branch model and the loan equity issuing, and determining a second movable branch probability and a second non-movable branch probability based on the second movable branch model and the loan equity non-issuing;
determining the first gain model based on the first movable branch probability, the first non-movable branch probability, the second movable branch probability, and the second non-movable branch probability;
determining a first advanced clearing probability and a first non-advanced clearing probability based on the first advanced clearing model and the loan equity issuing, and determining a second advanced clearing probability and a second non-advanced clearing probability based on the second advanced clearing model and the loan equity non-issuing;
and determining the second gain model based on the first advanced clearing probability, the first non-advanced clearing probability, the second advanced clearing probability, and the second non-advanced clearing probability.
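The fusion of an issued model and a non-issued model can be sketched as below. The exact fusion formula is not given in this claim, so the sketch assumes the common two-model uplift form, in which the gain is the event probability with loan equity issued minus the event probability without it. The function `fuse` and the stand-in models are hypothetical.

```python
def fuse(model_issued, model_not_issued):
    """Build a gain model from two probability models (two-model uplift)."""
    def gain(customer):
        p1 = model_issued(customer)      # P(event | equity issued)
        p0 = model_not_issued(customer)  # P(event | equity not issued)
        # The non-event probabilities are 1 - p1 and 1 - p0, so the
        # difference below equals (1 - p0) - (1 - p1) as well.
        return p1 - p0
    return gain

# Stand-in models returning fixed probabilities per customer segment.
issued = lambda c: 0.6 if c["segment"] == "a" else 0.2
not_issued = lambda c: 0.4 if c["segment"] == "a" else 0.3

first_gain = fuse(issued, not_issued)
gain_a = first_gain({"segment": "a"})  # positive: issuing helps
gain_b = first_gain({"segment": "b"})  # negative: issuing hurts
```

The same `fuse` call would apply unchanged to the two advanced clearing models to obtain the second gain model.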
8. A multi-target gain model construction apparatus, comprising:
the feature deriving module is used for performing feature derivation on the collected customer service data to obtain a plurality of feature data;
the training set determining module is used for performing feature screening on the plurality of feature data by using target variables to obtain a training set, and constructing an initial model based on an extreme gradient boosting algorithm; the target variables comprise a first movable branch target variable and a first advanced clearing target variable determined based on loan equity issuing, and a second movable branch target variable and a second advanced clearing target variable determined based on loan equity non-issuing;
the model training module is used for training the initial model by using the training set to obtain a target model; the target model comprises a first movable branch model, a first advanced clearing model, a second movable branch model and a second advanced clearing model;
the model fusion module is used for fusing the first movable branch model and the second movable branch model to obtain a first gain model, and fusing the first advanced clearing model and the second advanced clearing model to obtain a second gain model;
and the equity issuing module is used for determining the loan income ratio based on the first gain model and the second gain model, and determining a target interval of positive gain from a plurality of binning intervals determined based on the loan income ratio, so as to issue loan equity to target customer groups in the target interval.
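The interval selection performed by the last module can be sketched as follows, assuming the loan income ratio has already been computed per customer. How that ratio is derived from the two gain models is not restated here, so it is taken as an input. Equal-frequency binning and "mean ratio above zero" as the test for positive gain are both assumptions, and `positive_gain_bins` is a hypothetical name.

```python
def positive_gain_bins(ratios, n_bins=5):
    """Split sorted ratios into equal-frequency bins; keep bins with mean > 0."""
    ordered = sorted(ratios)
    if not ordered:
        return []
    size = -(-len(ordered) // n_bins)  # ceiling division
    bins = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Each selected bin is reported as its (lowest, highest) ratio.
    return [(b[0], b[-1]) for b in bins if sum(b) / len(b) > 0]

ratios = [-0.3, -0.1, 0.0, 0.2, 0.4, 0.6]
targets = positive_gain_bins(ratios, n_bins=3)
```

Customers whose ratio falls inside one of the returned intervals would form the target group for issuing.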
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the multi-target gain model construction method according to any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which, when executed by a processor, implements the multi-target gain model construction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311162917.7A CN117237087A (en) | 2023-09-07 | 2023-09-07 | Multi-target gain model construction method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311162917.7A CN117237087A (en) | 2023-09-07 | 2023-09-07 | Multi-target gain model construction method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117237087A (en) | 2023-12-15 |
Family
ID=89095968
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311162917.7A Pending CN117237087A (en) | 2023-09-07 | 2023-09-07 | Multi-target gain model construction method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117237087A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119475192A (en) * | 2025-01-09 | 2025-02-18 | 北京淇瑀信息科技有限公司 | Model building system, method and computer program product based on relative distribution |
- 2023-09-07: CN application CN202311162917.7A, published as CN117237087A (en), Active, Pending
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109636482B (en) | Data processing method and system based on similarity model | |
CN109300039A (en) | The method and system of intellectual product recommendation are carried out based on artificial intelligence and big data | |
Foroghi et al. | Applying decision tree to predict bankruptcy | |
CN114078050A (en) | Loan overdue prediction method and device, electronic equipment and computer readable medium | |
US20130103617A1 (en) | Computer-Implemented Systems And Methods For Forecasting And Estimation Using Grid Regression | |
CN116596659A (en) | Enterprise intelligent credit approval method, system and medium based on big data wind control | |
Makoni et al. | Modelling and forecasting Zimbabwe’s tourist arrivals using time series method: a case study of Victoria Falls rainforest | |
CN113344438A (en) | Loan system, loan monitoring method, loan monitoring apparatus, and loan medium for monitoring loan behavior | |
Statistics | Socio-economic indexes for areas (SEIFA) | |
CN114219630A (en) | Service risk prediction method, device, equipment and medium | |
CN117237087A (en) | Multi-target gain model construction method, device, equipment and storage medium | |
CN110738565A (en) | Real estate finance artificial intelligence composite risk control model based on data collection | |
Nelson et al. | Housing inequalities: The space-time geography of housing policies | |
Wang et al. | Tourism demand with subtle seasonality: Recognition and forecasting | |
Çılgın et al. | The effect of outlier detection methods in real estate valuation with machine learning | |
CN119172476A (en) | Method, system and device for intelligent sorting of outbound calls for non-performing assets based on machine learning | |
Senousy et al. | A Smart social insurance big data analytics framework based on machine learning algorithms | |
CN113222767A (en) | Data processing method and device for indexing securities combination | |
CN117952740A (en) | Method, device and server for determining resource amount processing strategy of client | |
Vaidya et al. | Decision support system for the stock market using data analytics and artificial intelligence | |
CN117764692A (en) | Method for predicting credit risk default probability | |
CN115796389A (en) | Tax prediction method, apparatus, device and computer readable storage medium | |
Beusch et al. | Labour Market Trajectories of the Self-employed in the Netherlands | |
CN113159966A (en) | Capital planning recommendation method, device, equipment and storage medium based on big data | |
Adamiec et al. | Understanding the indicative factors of university/college closings | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||