CN112232950A - Loan risk assessment method and device, equipment and computer-readable storage medium - Google Patents
- Publication number
- CN112232950A (CN202011431333.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- loan
- user
- result
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
According to the technical scheme of the application, loan data of a first user is obtained, first target data in the loan data is input into a preset prediction model, and the result output by the prediction model is taken as a first result. Second target data in the loan data of the first user is input into a clustering model to obtain the type of the first user output by the clustering model, and the risk level associated in advance with that type is taken as a second result. An evaluation result of the loan risk of the first user is then determined according to the first result and the second result. Because the loan data comprises the multi-head loan behavior data, the asset data and the consumption data of the user, the loan risk of the user is evaluated from multiple angles, which makes the evaluation basis more comprehensive and improves the accuracy of the evaluation result. Because the evaluation result is obtained from both the first result and the second result, its accuracy is improved further.
Description
Technical Field
The present application relates to the field of electronic information, and in particular, to a method and an apparatus for assessing loan risk, a device, and a computer-readable storage medium.
Background
In recent years, a large number of credit institutions have emerged in the market, giving users more loan channels and financial services to choose from, and multi-head loans have appeared as a result. A multi-head loan refers to a single borrower submitting credit requests to two or more financial institutions at the same time. While such lending to some extent meets the financial needs of the user, it also carries higher risk:
Generally, a credit institution grants credit based on a combination of the user's creditworthiness, sources of income and so on, and disburses loans within the user's repayment capacity. However, because information is asymmetric among lenders, a user who obtains credit from multiple financial institutions may take on loans beyond the user's repayment capacity. Once the borrower's capital chain breaks, the borrower is likely to default repeatedly, creating bad-debt risk.
There is therefore a need to quantitatively assess the default risk of an individual's multi-head credit activity. The accuracy of existing risk assessment methods for multi-head credit needs to be improved.
Disclosure of Invention
In the course of research, the applicant found that different borrowers differ in economic capacity, whereas existing risk assessment methods for multi-head credit consider only the multi-head credit information itself. Lacking a comprehensive assessment of the individual borrower, their accuracy needs to be improved.
The application provides a method, an apparatus, a device and a computer-readable storage medium for assessing loan risk, with the aim of improving the accuracy of risk assessment for multi-head loans.
In order to achieve the above object, the present application provides the following technical solutions:
a method for assessing loan risk, comprising:
acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user and consumption data of the first user, and each kind of loan data comprises at least one data item;
acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value is positively correlated with the degree of influence of the data item on the evaluation result;
inputting the first target data into a preset prediction model, and taking the result output by the prediction model as a first result;
acquiring second target data in the loan data, wherein the second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold;
inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level associated in advance with the type of the first user as a second result, wherein the clustering model clusters loan data of a plurality of users to obtain a plurality of types;
and obtaining an evaluation result of the loan risk of the first user, wherein the evaluation result is determined according to the first result and the second result.
Optionally, the multi-head loan behavior data of the first user at least includes:
repayment data items of the first user with respect to a plurality of lenders.
Optionally, the multi-head loan behavior data of the first user further comprises at least one of:
loan application data items of the first user with respect to the plurality of lenders;
disbursement data items of the plurality of lenders with respect to the first user;
query data items of the plurality of lenders with respect to the first user.
Optionally, the first target data further includes:
data items specified according to expert experience;
the second target data further includes:
data items specified according to expert experience.
Optionally, the training process of the prediction model includes:
acquiring sample target data and a label of the sample target data, wherein the sample target data comprises a data item of which the predictive capability value is greater than the preset threshold value in sample loan data and a data item specified according to expert experience;
training the predictive model using a portion of the sample target data and a label for the portion of the sample target data;
validating the trained predictive model using another portion of the sample target data and a label of the another portion of the sample target data;
and if the verification fails, reselecting sample target data to train the prediction model until the verification passes.
Optionally, the method further comprises:
acquiring a plurality of types obtained by clustering loan data of a plurality of users by the clustering model;
and obtaining other attributes of different types by comparing the data items in different types, wherein the other attributes are the attributes except the risk level.
Optionally, the prediction model is a supervised model and the clustering model is an unsupervised model.
An assessment apparatus for loan risk, comprising:
a loan data acquisition module, configured to acquire loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user and consumption data of the first user, and each kind of loan data comprises at least one data item;
a first data acquisition module, configured to acquire first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value is positively correlated with the degree of influence of the data item on the evaluation result;
a first result acquisition module, configured to input the first target data into a preset prediction model and take the result output by the prediction model as a first result;
a second data acquisition module, configured to acquire second target data in the loan data, wherein the second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold;
a second result acquisition module, configured to input the second target data into a clustering model to obtain the type of the first user output by the clustering model, and take the risk level associated in advance with the type of the first user as a second result, wherein the clustering model clusters loan data of a plurality of users to obtain a plurality of types;
and an evaluation result acquisition module, configured to acquire an evaluation result of the loan risk of the first user, wherein the evaluation result is determined according to the first result and the second result.
An assessment device for loan risk comprising a memory and a processor;
the memory is configured to store a program, and the processor is configured to run the program so as to implement the above method for assessing loan risk.
A computer-readable storage medium, on which a computer program is stored which, when run on a computer, carries out the above-mentioned assessment method for loan risk.
According to the technical scheme of the application, loan data of the first user is obtained, first target data in the loan data is input into a preset prediction model, and the result output by the prediction model is taken as the first result. Second target data in the loan data of the first user is input into the clustering model to obtain the type of the first user output by the clustering model, and the risk level associated in advance with that type is taken as the second result. An evaluation result of the loan risk of the first user is then determined according to the first result and the second result. Because the loan data comprises the multi-head loan behavior data, the asset data and the consumption data of the user, the loan risk of the user is evaluated from multiple angles, which makes the evaluation basis more comprehensive and improves the accuracy of the evaluation result. Moreover, because the first target data comprises the data items in the loan data whose predictive capability values are greater than a preset threshold, the prediction model can output an accurate result; and because the second target data comprises the data items in the loan data whose data saturation is greater than the preset saturation threshold, the clustering model can output an accurate result. Finally, because the evaluation result is obtained from both the first result and the second result, its accuracy is improved further.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a method for assessing loan risk disclosed in an embodiment of the present application;
Fig. 2 is a flowchart of another method for assessing loan risk disclosed in an embodiment of the present application;
Fig. 3 is a flowchart of a method for training a model disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an apparatus for assessing loan risk disclosed in an embodiment of the present application.
Detailed Description
The method and the device for assessing the loan risk disclosed in the embodiments of the present application can be applied to, but are not limited to, a loan approval platform of a bank, and are used for assessing the loan risk of a user, specifically, assessing the loan risk of the user from multiple angles, and particularly, assessing the loan risk in a multi-head loan situation, thereby providing a reference for approval of loans.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 shows a method for assessing loan risk disclosed in an embodiment of the present application, including the following steps:
S101: Loan data of a first user is obtained.
In this embodiment, the loan data of the first user includes multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user. Any type of lending data includes at least one data item.
The multi-head loan behavior data is the first user's loan behavior data with respect to a plurality of lending institutions, and can be acquired from the platforms of those lending institutions.
Specifically, the multi-head loan behavior data of any user may include a repayment data item, and the repayment data item may be obtained from the collection transaction records of a plurality of collection channels.
For example, the repayment data item preset for any user includes, but is not limited to, the following fields: the identifier of the payee, the number of failed repayments in the last 6 months, and the number of repayments in the last 1 year that failed because of insufficient balance. The collection transaction records of the plurality of collection channels are merged and collated (for the specific implementation, refer to the prior art) to obtain the values of the preset repayment data item fields.
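The merging step described above can be sketched roughly as follows. This is only an illustration under assumptions: the record layout `(payee_id, transaction_date, succeeded, failure_reason)`, the function name `repayment_fields`, the reason code `"insufficient_balance"` and the 183-day approximation of six months are all hypothetical and do not come from the application.

```python
from datetime import date, timedelta


def repayment_fields(channels, today):
    """Merge collection records from several channels into repayment fields.

    Each record is a hypothetical tuple:
    (payee_id, transaction_date, succeeded, failure_reason).
    """
    six_months_ago = today - timedelta(days=183)  # rough 6-month window (assumption)
    one_year_ago = today - timedelta(days=365)
    payees = set()
    failed_6m = 0
    failed_1y_insufficient = 0
    for records in channels:          # one list of records per collection channel
        for payee_id, tx_date, succeeded, reason in records:
            payees.add(payee_id)
            if succeeded:
                continue
            if tx_date >= six_months_ago:
                failed_6m += 1
            if tx_date >= one_year_ago and reason == "insufficient_balance":
                failed_1y_insufficient += 1
    return {
        "payee_ids": sorted(payees),
        "failed_repayments_6m": failed_6m,
        "failed_insufficient_balance_1y": failed_1y_insufficient,
    }


# Illustrative records from two hypothetical collection channels.
channel_a = [
    ("lender_1", date(2020, 11, 1), False, "insufficient_balance"),
    ("lender_2", date(2020, 3, 1), False, "insufficient_balance"),
]
channel_b = [
    ("lender_1", date(2020, 10, 15), True, None),
    ("lender_3", date(2020, 9, 1), False, "account_closed"),
]
fields = repayment_fields([channel_a, channel_b], today=date(2020, 12, 1))
```

In a real deployment the windows and failure codes would follow whatever the collection channels actually report.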
Optionally, in order to make the multi-head loan behavior data fit the user's loan behavior more closely and thereby further improve the accuracy of the evaluation result, the multi-head loan behavior data may further include at least one of the following: loan application data items of the first user with respect to the plurality of lenders, disbursement data items of the plurality of lenders with respect to the first user, and query data items of the plurality of lenders with respect to the first user.
The loan application data items of the first user with respect to the plurality of lenders can be obtained from the plurality of lending platforms.
For example, the loan application data item preset for any user includes, but is not limited to, the following fields: the lender identifier, the number of lending institutions the user applied to in the last 1 month, and the number of the user's successful applications. The user authentication information of the plurality of lending platforms is merged and collated (a user must complete real-name authentication before applying for a loan, so authentication on different lending platforms can be regarded as loan applications at a plurality of institutions) to obtain the values of the preset loan application data item fields.
The disbursement data items of the plurality of lenders with respect to the first user can be obtained from a plurality of payment channels.
For example, the disbursement data item preset for any user includes, but is not limited to, the following fields: the lender identifier, the total amount disbursed to the user by all lenders in the last 3 months, and the number of institutions that disbursed loans to the user.
Because a payment-on-behalf transaction can be regarded as a lending platform disbursing a loan to the user, the payment-on-behalf transaction records of the plurality of payment channels are merged and collated to obtain the values of the disbursement data item fields.
The query data items of the plurality of lenders with respect to the first user may be obtained from the plurality of lending platforms.
For example, the query data items of the plurality of lenders with respect to the first user may include, but are not limited to, the following fields: the identifier of each lender that queried the first user's loan behavior data within a month, the number of times that data was queried within a month, and the total number of queries of that data within a month.
Asset data of any user includes, but is not limited to, real estate asset data of the user. Asset data of a user may be obtained from a real estate registration authority.
Consumption data of any user includes, but is not limited to, the total amount consumed by the user within a preset time range. The consumption data may be obtained from various payment collection terminals, such as POS machines.
It should be noted that the values of the data items may be obtained by preprocessing (for example, cleaning, integrating and processing) the raw data obtained from each channel or platform. For the specific preprocessing manner, refer to the prior art; it is not described here again.
S102: first target data in the loan data is obtained.
The first target data comprises data items of the loan data, wherein the prediction capability value is greater than a preset threshold value. The predictive ability value is positively correlated with the degree of influence of the data item on the evaluation result.
In this embodiment, the predictive capability value may be the Information Value (IV) of the data item. The specific process of deriving the IV from the loan data is described in the embodiments below.
S103: Input the first target data into a preset prediction model, and take the result output by the prediction model as a first result.
In this embodiment, the prediction model may be a scorecard model based on logistic regression. For the training process of the model, see the flow shown in fig. 3.
Specifically, the result output by the prediction model may be the probability of the first user's overdue repayment behavior, so as to quantify the risk.
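As a rough illustration of how a logistic-regression scorecard turns input data items into an overdue probability, the sketch below applies the logistic function to a weighted sum of feature values. The field names, coefficients and intercept are hypothetical placeholders; in practice they would come from the training flow, which the application does not specify in numeric detail.

```python
import math


def overdue_probability(features, coefficients, intercept):
    """Logistic function applied to a linear combination of the input data items."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted parameters and one user's (already transformed) data items.
coefficients = {"failed_repayments_6m": 0.8, "lender_count_1m": 0.5}
intercept = -1.0
user = {"failed_repayments_6m": 1.2, "lender_count_1m": -0.4}

first_result = overdue_probability(user, coefficients, intercept)
```

A larger weighted sum yields a probability closer to 1, so the output can serve directly as a quantified risk.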
S104: Obtain second target data in the loan data.
The second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold. Specifically, which data items have data saturation greater than the preset saturation threshold is determined during generation of the clustering model; see the following embodiments for details.
S105: Input the second target data into the clustering model to obtain the type of the first user output by the clustering model, and take the risk level associated in advance with the type of the first user as a second result.
In this embodiment, the clustering model clusters the loan data of a plurality of users to obtain a plurality of types.
The risk level associated in advance with each type is determined according to the data items of the at least one user included in that type, as output during training of the clustering model; see the following embodiments for details.
S106: Obtain the evaluation result of the loan risk of the first user.
In this embodiment, the evaluation result is determined according to the first result and the second result. Two optional methods for determining the evaluation result are:
1. Obtain the evaluation result from the first result and the second result using expert experience.
2. Convert the second result into a score according to a preset correspondence, and take a weighted average of the first result and the converted score to obtain the evaluation result.
It should be noted that the specific method for obtaining the evaluation result may also include other various methods, which are not described herein again.
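The second of the two methods above (score conversion followed by a weighted average) might look like the following minimal sketch. The level-to-score table `RISK_LEVEL_SCORE` and the weight of 0.6 are assumptions chosen purely for illustration; the application leaves both to a preset correspondence.

```python
# Hypothetical preset correspondence between risk levels and scores in [0, 1].
RISK_LEVEL_SCORE = {"low": 0.1, "medium": 0.5, "high": 0.9}


def evaluate(first_result, risk_level, weight_first=0.6):
    """Weighted average of the prediction-model output and the converted risk level."""
    second_score = RISK_LEVEL_SCORE[risk_level]
    return weight_first * first_result + (1.0 - weight_first) * second_score


# Example: predicted overdue probability 0.2, cluster risk level "high".
evaluation = evaluate(0.2, "high")
```

Here the first result is assumed to be a probability, so both inputs share the same 0 to 1 scale before averaging.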
The flow shown in fig. 1 has the following beneficial effects:
1. The multi-head loan behavior data of the first user reflects the loan relationships between the first user and a plurality of lending platforms, so the risk assessment of multi-head loans is more accurate. More importantly, by combining the multi-head loan behavior data with the asset data and the consumption data of the first user, the multi-head loans are weighed against the user's repayment capacity and consumption capacity, which makes the evaluation basis more comprehensive and improves the accuracy of the evaluation result.
2. The multi-head loan behavior data of the first user is generated from the real loan behavior of the first user and is not influenced by subjective factors, so the accuracy of the evaluation result is higher.
Similarly, the consumption data of the first user is generated from the actual consumption behavior of the first user, so the accuracy of the evaluation result can be further improved.
3. The evaluation result is obtained according to the first result and the second result. The first result is output by the prediction model and quantifies the loan risk of the first user according to the loan data of the first user. The second result is based on the type of the first user output by the clustering model, and the clustering model obtains its types by clustering the loan data of a plurality of users. The risk level indicated by the second result is therefore determined not only from the loan data of the first user but also from the loan data of the plurality of users, so the evaluation result obtained by combining the first result and the second result is highly accurate.
Fig. 2 is a schematic diagram of another loan risk assessment method disclosed in the embodiment of the present application, in which detailed steps of data processing are added as compared with the flowchart shown in fig. 1. The process shown in fig. 2 comprises the following steps:
s201: raw loan data for a first user is obtained.
The original loan data comprises original data of the multi-head loan behavior of the first user, original data of the assets of the first user and original data of consumption of the first user. Any of the raw loan data includes at least one data item.
The manner in which the raw loan data is obtained can be seen in the above examples. That is, the loan data in the above embodiment may be regarded as data obtained by performing data processing on the original loan data, but the data processing does not affect the type of the data and the channel of obtaining the data.
S202: and performing data cleaning on the original loan data of the first user.
Cleaning includes, but is not limited to, the following operations: the numerical value of the numerical anomaly is deleted, or the average value of the homogeneous data items is used instead of the numerical value of the numerical anomaly. And taking the preset value as the numerical value of the data item without numerical value.
The data is more standard through cleaning processing of the data, and the model is favorable for outputting an evaluation result with higher accuracy.
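A minimal sketch of the cleaning operations named in S202 is shown below. It assumes that "abnormal" means outside a caller-supplied `[lower, upper]` range and that missing values are represented as `None`; both conventions, and the function name `clean`, are illustrative assumptions rather than details from the application.

```python
def clean(values, lower, upper, default):
    """Replace out-of-range values with the mean of valid ones; fill missing with a preset."""
    valid = [v for v in values if v is not None and lower <= v <= upper]
    # Fall back to the preset value if no valid observation exists.
    mean = sum(valid) / len(valid) if valid else default
    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(default)      # data item without a value
        elif lower <= v <= upper:
            cleaned.append(v)            # normal value, kept as is
        else:
            cleaned.append(mean)         # abnormal value, replaced by the mean
    return cleaned


# Example: monthly failed-repayment counts for four users, one missing, one abnormal.
cleaned = clean([10, None, 1000, 20], lower=0, upper=100, default=0)
```

Deleting abnormal records instead of replacing them, the other option S202 mentions, would simply filter on the same range check.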
S203: the raw loan data is processed according to the type of data items in the cleaning results.
In this embodiment, the processing includes, but is not limited to, the following:
1. and carrying out box separation processing on the data items belonging to the continuous variable.
2. And carrying out base reduction processing on the data items belonging to the nominal variable. For example, multiple nominal data items are merged into one nominal data item.
The definitions and specific implementation manners of the binning processing and the base dropping processing may refer to the prior art, and are not described herein again.
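The two processing operations in S203 can be illustrated with the sketch below. It assumes fixed bin edges for the continuous variable and interprets cardinality reduction as merging rare categories into a single `"other"` category; the edges, the `min_count` cutoff and both function names are hypothetical, since the application defers the specifics to the prior art.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(value, edges):
    """Return the index of the bin that a continuous value falls into."""
    return bisect_right(edges, value)


def merge_rare_categories(values, min_count, other="other"):
    """Reduce cardinality by merging categories seen fewer than min_count times."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Continuous example: total consumption binned with hypothetical edges.
edges = [1000, 5000, 20000]
bins = [bin_index(v, edges) for v in [300, 1000, 7500, 50000]]

# Nominal example: lender identifiers with two rarely seen values.
merged = merge_rare_categories(["a", "a", "b", "c", "a"], min_count=2)
```

With three edges there are four bins, indexed 0 to 3, and a value equal to an edge falls into the bin above it.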
S204: the processed data item is subjected to Evidence Weight conversion (WOE).
The purpose of WOE is to enable the values of the data items to meet the requirements of the logistic regression model for the data format.
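For readers unfamiliar with WOE and IV, the sketch below computes both for one binned data item using a common textbook definition: WOE of a bin is the log of the ratio between the bin's share of good samples and its share of bad samples, and IV sums the share difference times the WOE over all bins. The sign convention (good over bad), the requirement that every bin has non-zero counts, and the function name are assumptions; the application does not spell out its exact formula.

```python
import math


def woe_and_iv(bins):
    """bins maps a bin label to (good_count, bad_count); all counts must be > 0."""
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        woe[label] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[label]
    return woe, iv


# Example: a data item split into two bins with (on-time, overdue) user counts.
woe, iv = woe_and_iv({"few_failures": (80, 20), "many_failures": (20, 80)})
```

Each user's binned value is then replaced by the WOE of its bin before being fed to the logistic regression, and the IV of the data item can be compared with the preset threshold.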
S205: Data equalization processing is performed on the WOE transformation result.
In this embodiment, the data equalization processing includes, but is not limited to, upsampling and downsampling.
It should be noted that the data equalization processing is an optional step that aims to balance the data distribution and thereby further improve the accuracy of the first result output by the prediction model. If S205 is not executed, the WOE transformation result is used in place of the data after the data equalization processing.
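One simple form of upsampling is random oversampling of the minority class, sketched below. This is an assumed stand-in for whatever equalization the applicant actually uses; the function name and the fixed seed are illustrative.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are equally frequent."""
    rng = random.Random(seed)
    by_label = {}
    for row, label in zip(rows, labels):
        by_label.setdefault(label, []).append(row)
    target = max(len(group) for group in by_label.values())
    out_rows, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra rows with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Example: four on-time users (label 0) and one overdue user (label 1).
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

Downsampling would instead discard rows from the larger class until the counts match.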
S206: Among the data after the data equalization processing, take the data items whose IV is greater than a preset threshold as first prediction data.
S207: Take the data items specified according to expert experience as second prediction data.
In this embodiment, the first prediction data and the second prediction data are collectively referred to as the first target data.
It can be understood that the IV of the second prediction data is not greater than the preset threshold. In practice, however, some data items have a large influence on the evaluation result even though their IV is not greater than the preset threshold, and these data items are therefore also used as modeling data items to improve the accuracy of the evaluation result.
Specifically, a selection interface for the second prediction data may be displayed, in which the expert selects the second prediction data.
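Steps S206 and S207 amount to taking the union of the IV-selected data items and the expert-selected ones. A minimal sketch, with hypothetical item names, IV values and threshold:

```python
def select_first_target(iv_by_item, iv_threshold, expert_items):
    """Union of data items whose IV exceeds the threshold and expert-specified items."""
    first_prediction = {name for name, iv in iv_by_item.items() if iv > iv_threshold}
    second_prediction = set(expert_items)
    return sorted(first_prediction | second_prediction)


# Hypothetical IV values; 0.02 is an assumed threshold, not one from the application.
iv_by_item = {
    "failed_repayments_6m": 0.35,
    "total_consumption": 0.08,
    "lender_count_1m": 0.01,
}
first_target = select_first_target(iv_by_item, 0.02, expert_items=["lender_count_1m"])
```

The item `lender_count_1m` falls below the threshold but is still kept because the expert selected it.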
S208: Input the first target data into the scorecard model based on logistic regression to obtain the evaluation value of the loan risk of the first user output by that model.
In this embodiment, the logistic regression scorecard model is the prediction model, and the evaluation value of the loan risk of the first user that it outputs is the first result.
S201 to S208 shown in fig. 2 focus on a process of generating data (i.e., data items after data equalization processing) for inputting the prediction model from the original loan data, and the process makes the input data of the prediction model more suitable for the prediction model and more normative and balanced, so that the accuracy of the first result output by the prediction model can be further improved. Moreover, target data can be manually specified, and the accuracy of an evaluation result is further improved.
S209: From the processed data items, select the data items whose data saturation is greater than a preset saturation threshold as first clustering data.
In this embodiment, the processed data items refer to the data obtained in S203. Which data items have data saturation greater than the preset saturation threshold is determined during training of the clustering model; see S211 for details.
S210: Take the data items specified according to expert experience as second clustering data.
In this embodiment, the first clustering data and the second clustering data are collectively referred to as the second target data.
S211: Input the second target data into a preset clustering model to obtain the type of the first user output by the clustering model.
In this embodiment, the clustering model is an unsupervised model. Many kinds of clustering model may be used; for the specific structure, refer to the prior art.
The clustering model is trained on the clustering data of a plurality of users, and the training process of the clustering model comprises the following steps:
1. Clustering data of a plurality of users is obtained.
The clustering data of the plurality of users includes: data items in the loan data (raw loan data or processed data) of the plurality of users whose data saturation is greater than a preset saturation threshold, and data items specified by expert experience.
The data saturation of any data item (denoted as a first data item) is calculated as the proportion of the plurality of users' first data items that have a value among all of the first data items. For example, if the first data item is the repayment data item and 75% of the repayment data items in the raw loan data of the plurality of users have a value, which exceeds a preset saturation threshold of 70%, then the repayment data item is taken as clustering data.
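The saturation calculation and the 75% versus 70% example above can be reproduced with the short sketch below. Representing a missing value as `None` and the data item names are assumptions made for illustration.

```python
def data_saturation(values):
    """Share of entries that actually carry a value (None marks a missing value)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def select_cluster_items(table, threshold):
    """Keep the data items whose saturation is strictly greater than the threshold."""
    return [name for name, values in table.items() if data_saturation(values) > threshold]


# Four users; the repayment item is filled for three of them (75%).
table = {
    "repayment": [3, None, 1, 0],
    "real_estate": [None, None, 1, None],
}
cluster_items = select_cluster_items(table, threshold=0.70)
```

Note that a value of 0 counts as present; only entries with no value lower the saturation.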
2. The clustering data of the plurality of users is input into a preset clustering model.
3. A plurality of types, obtained by the clustering model clustering the clustering data of the plurality of users, is acquired.
Following the clustering principle of unsupervised models, the clustering model uses a preset unsupervised clustering algorithm and iteratively fits its model parameters to cluster the plurality of users into a preset number of types (denoted as N types). The preset number is greater than 1 and may be a value specified by expert experience.
Each type includes at least one user. The clustering data of users belonging to the same type is similar (i.e., the difference is small), and the clustering data of users belonging to different types is dissimilar (i.e., the difference is large). The difference is computed by the clustering algorithm; for example, it is characterized by the Euclidean distance between clustering samples.
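As an illustration of clustering users into N types by Euclidean distance, the following sketch uses plain k-means. The embodiment does not name a specific algorithm, so k-means, the deterministic initialization, the function names and the two-dimensional toy data are assumptions chosen only to make the idea concrete.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], n_types: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Cluster users into n_types groups by Euclidean distance (plain k-means)."""
    if not 1 < n_types <= len(points):
        raise ValueError("n_types must be greater than 1 and at most the number of users")
    # Deterministic start (an assumption): the first n_types points are the initial centers.
    centers = [list(p) for p in points[:n_types]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each user joins the type with the nearest center.
        new_labels = [
            min(range(n_types), key=lambda k: math.dist(p, centers[k])) for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the iterative fit has converged
        labels = new_labels
        # Update step: move each center to the mean of its current members.
        for k in range(n_types):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centers


def assign_type(point: Point, centers: List[List[float]]) -> int:
    """Type of a new user: the index of the nearest center."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


# Toy clustering data with two well-separated groups of users.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1)]
labels, centers = kmeans(points, n_types=2)
print(labels)                             # [0, 1, 0, 1, 0]
print(assign_type((8.5, 9.5), centers))   # 1
```

The `assign_type` helper corresponds to S211: once the types exist, the second target data of the first user is mapped to the nearest type.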
It can be understood that the data items included in the second target data are determined according to the clustering data of the multiple users in the training process of the clustering model, and the type to which the first user belongs is any one of multiple types obtained by clustering the clustering data of the multiple users by the clustering model.
It should be noted that the clustering model is an unsupervised model and can be updated in real time according to real-time loan data. For example, by continuously inputting the clustering data of different users into the clustering model, the clustering model continues its iterative tuning process.
S212: acquiring the attributes pre-assigned to the type of the first user as a second result.
In this embodiment, the attributes pre-assigned to any type include at least a risk level, and may also include other attributes.
The process of obtaining the attributes pre-assigned to each type comprises the following steps:
1. Data items of the plurality of users included in each type are acquired.
Optionally, the users included in any one type (denoted as a target type) may be the users of the target type obtained by clustering a plurality of users in the process of training the clustering model.
2. The attributes of the different types are obtained by comparing the data items across types.
Based on the characteristics of clustering, the data items of users belonging to the same type are similar, and the data items of users belonging to different types are different. The data items of the different types are therefore compared according to a preset rule, and the risk level and other attributes of each type are obtained from the data items that are similar among users of the same type and different between users of different types. For example, the other attributes include loan willingness.
Take the case where the clustering model outputs 4 types as an example. The attributes of the types obtained according to expert experience are, respectively: high-risk group, suspicious group, concerned group, and normal group. The high-risk group has high loan risk and low loan willingness. The suspicious group and the concerned group are groups whose loan risk and loan willingness, respectively, need to be further checked and verified. The normal group has strong loan demand and low loan risk. The attribute of each type is taken as the attribute pre-assigned to that type, and when the first user belongs to a target type, the risk level and other attributes of the target type are taken as the second result. For example, if the attribute of the type of the first user output by the clustering model is "normal group", the first user has strong loan demand and low loan risk, and loan products can therefore be recommended to the first user in a targeted manner.
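The lookup from a type to its pre-assigned attributes can be sketched as a simple table. The type indices, field names and string values below are hypothetical; in particular, the text leaves one attribute of the suspicious group and of the concerned group open, which the sketch marks as "unspecified" rather than guessing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeAttributes:
    name: str
    risk_level: str        # at least a risk level is assigned to every type
    loan_willingness: str  # example of an "other attribute"


# Hypothetical mapping from cluster index to expert-assigned attributes.
TYPE_ATTRIBUTES = {
    0: TypeAttributes("high-risk group", "high", "low"),
    1: TypeAttributes("suspicious group", "needs verification", "unspecified"),
    2: TypeAttributes("concerned group", "unspecified", "needs verification"),
    3: TypeAttributes("normal group", "low", "strong"),
}


def second_result(user_type: int) -> TypeAttributes:
    """Attributes pre-assigned to the type output by the clustering model."""
    return TYPE_ATTRIBUTES[user_type]


def should_recommend_loan(attrs: TypeAttributes) -> bool:
    """Targeted recommendation only for low-risk users with strong loan demand."""
    return attrs.risk_level == "low" and attrs.loan_willingness == "strong"


print(should_recommend_loan(second_result(3)))  # True
print(should_recommend_loan(second_result(0)))  # False
```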
In the process shown in fig. 2, steps S201 to S203 and steps S209 to S212 focus on generating second target data from the original loan data, and obtaining an attribute corresponding to the type of the first user output by the clustering model according to the second target data as a second result.
The attributes include a risk level. The types are obtained by the clustering model from the clustering data of the plurality of users; that is, the type of the first user depends on both the loan data of the first user and the loan data of other users. In addition, the second target data can be manually specified. Therefore, the risk level included in the second result is a highly accurate risk assessment result for multi-head lending.
Furthermore, because the attributes corresponding to the types are obtained by analyzing similar clustering data according to expert experience, not only the risk level of the user but also other attributes of the user, such as whether the user is willing to borrow, can be obtained. That is, the second result can be used both as a risk assessment result for multi-head lending and as reference data on loan willingness, which can be used to identify well-qualified customers with loan demand, improve the customer conversion rate, and help financial institutions reduce marketing and customer acquisition costs.
S213: an evaluation result is determined from the first result and the second result.
In this embodiment, the assessment result may be determined from the first result and the second result in various ways. For example, the assessment result is determined according to the first result and the risk level, and the other attributes may be used as another reference dimension of the loan risk.
The process shown in fig. 2 obtains an evaluation value of the user's loan risk (i.e., the first result) from the user's loan data, and also determines the type of the user from the loan data of the user and of other users, taking the attribute corresponding to that type as the second result. The evaluation value quantifies the user's loan risk, while the behavior clustering result is used both to assess the user's loan risk and to determine other attributes of the user, providing a basis for analyzing loan behavior. By combining the evaluation value and the attributes, the method therefore improves the accuracy of loan risk assessment.
For example, assume that a user group includes a plurality of users whose first results all indicate a high loan risk. Based on the first results alone, the loan requests of all users in the group would be rejected. In this method, however, the type output by the clustering model shows that all users in the group belong to the same type, whose attribute is "concerned group". The overall risk assessment of these users is therefore more favorable, and the lender may consider lending to some of the users in the group according to the ranking of their first result values.
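One possible way to combine the two results in the scenario just described is sketched below. The embodiment does not fix a combination rule, so everything here is an assumption: that a larger first result means higher risk, that a high-risk type leads to outright rejection, and that a fixed number of users (`keep`) is shortlisted.

```python
from typing import Dict, List


def shortlist_for_review(first_results: Dict[str, float], group_type: str, keep: int) -> List[str]:
    """Users a lender might still consider, ordered from lowest to highest first result."""
    if group_type == "high-risk group":
        return []  # both results point to high risk, so nobody is shortlisted
    ranked = sorted(first_results, key=lambda user: first_results[user])
    return ranked[:keep]


# All three users look risky by the first result alone (assumed scale: 0 = safe, 1 = risky).
scores = {"u1": 0.91, "u2": 0.83, "u3": 0.88}
print(shortlist_for_review(scores, "concerned group", keep=2))  # ['u2', 'u3']
print(shortlist_for_review(scores, "high-risk group", keep=2))  # []
```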
Fig. 3 shows a model training process provided in an embodiment of the present application, which includes the following steps:
S301: sample target data and a label for the sample target data are obtained.
The sample target data includes data items in the sample loan data whose predictive power value is greater than a preset threshold value, and data items specified according to expert experience. The channels and manners for acquiring the sample loan data can be referred to in the above embodiments.
S302: the sample target data is divided into two parts, one used as a training data set and the other as a verification data set.
S303: the model is trained using the training data set and the labels.
S304: the trained model is validated using a validation dataset.
S305: if the verification is not passed, return to S301. It can be understood that the sample target data is re-acquired, and the sample target data used in the previous training process is not reused.
S306: if the verification is passed, the trained model is obtained.
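Steps S301 to S306 can be sketched as a retry loop. The sketch is deliberately model-agnostic: `fetch_samples` and `fit` are hypothetical callables supplied by the caller, and the accuracy criterion, the 70/30 split and the round limit are assumptions, since the embodiment does not state how verification is scored.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (data item values, label)
Model = Callable[[Sequence[float]], int]  # maps data item values to a predicted label


def train_until_valid(
    fetch_samples: Callable[[], List[Sample]],
    fit: Callable[[List[Sample]], Model],
    min_accuracy: float = 0.8,
    train_fraction: float = 0.7,
    max_rounds: int = 10,
    seed: int = 0,
) -> Model:
    rng = random.Random(seed)
    for _ in range(max_rounds):
        samples = list(fetch_samples())              # S301: obtain sample target data and labels
        rng.shuffle(samples)
        cut = int(len(samples) * train_fraction)     # S302: split into training and verification sets
        train, valid = samples[:cut], samples[cut:]
        model = fit(train)                           # S303: train on the training set
        hits = sum(1 for x, y in valid if model(x) == y)  # S304: verify on held-out data
        if valid and hits / len(valid) >= min_accuracy:
            return model                             # S306: verification passed
        # S305: verification failed, so loop back and fetch new sample target data
    raise RuntimeError("verification did not pass within max_rounds")


def fit_threshold(train: List[Sample]) -> Model:
    """Toy model: threshold halfway between the class means of the first data item."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x[0] > cut)


toy = [([0.1 * i], 0) for i in range(1, 6)] + [([1.0 + 0.1 * i], 1) for i in range(1, 6)]
model = train_until_valid(lambda: toy, fit_threshold)
print(model([0.2]), model([1.3]))  # 0 1
```

In practice `fetch_samples` would return different sample target data on each call, which is what S305 requires; the toy lambda returns the same data only to keep the example short.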
It should be noted that the model is delivered to licensed financial institutions for use, and its effect is evaluated according to customer feedback. When the effect of the model declines, the model is optimized, that is, its parameters are adjusted dynamically in real time, so that the model maintains optimal precision. The effect of the model depends on three factors: the independent variables (training data set), the dependent variables (labels), and the algorithm. The algorithm is usually kept unchanged, and the measures for optimizing the model include, but are not limited to, replacing labels or adding labels. For details, see the prior art.
Fig. 4 shows a loan risk assessment apparatus according to an embodiment of the present application, including: a loan data acquisition module, a first data acquisition module, a first result acquisition module, a second data acquisition module, a second result acquisition module, and an evaluation result acquisition module.
The loan data acquisition module is used for acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user, and each kind of loan data comprises at least one data item;
the first data acquisition module is used for acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value of a data item is positively correlated with the degree of influence of that data item on the evaluation result;
the first result obtaining module is used for inputting the first target data into a preset prediction model to obtain a result output by the prediction model as a first result;
the second data acquisition module is used for acquiring second target data in the loan data, wherein the second target data comprises data items of which the data saturation is greater than a preset saturation threshold in the loan data;
the second result acquisition module is used for inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level corresponding in advance to the type of the first user as a second result, wherein the clustering model obtains a plurality of types by clustering loan data of a plurality of users;
and the evaluation result acquisition module is used for acquiring an evaluation result of the loan risk of the first user, and the evaluation result is determined according to the first result and the second result.
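The way the modules hand data to one another can be sketched as a small pipeline object. This is only an illustration of the data flow: the class name, the field names and the toy stand-ins for the prediction model, the clustering model and the combination rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LoanData = Dict[str, float]  # hypothetical layout: data item name -> value


@dataclass
class LoanRiskAssessor:
    first_items: List[str]                   # items selected by predictive capability value
    second_items: List[str]                  # items selected by data saturation
    predict: Callable[[List[float]], float]  # prediction model, returns the first result
    cluster: Callable[[List[float]], int]    # clustering model, returns the user's type
    risk_level_by_type: Dict[int, str]       # risk level assigned to each type in advance
    combine: Callable[[float, str], str]     # rule that merges the first and second results

    def assess(self, loan_data: LoanData) -> str:
        first_target = [loan_data[item] for item in self.first_items]
        first_result = self.predict(first_target)
        second_target = [loan_data[item] for item in self.second_items]
        second_result = self.risk_level_by_type[self.cluster(second_target)]
        return self.combine(first_result, second_result)


# Toy stand-ins for the two models and the combination rule.
assessor = LoanRiskAssessor(
    first_items=["repayment_count"],
    second_items=["repayment_count", "asset_total"],
    predict=lambda x: 0.2 if x[0] >= 3 else 0.9,
    cluster=lambda x: 1 if x[1] > 1000 else 0,
    risk_level_by_type={0: "high", 1: "low"},
    combine=lambda score, level: "approve" if score < 0.5 and level == "low" else "review",
)
print(assessor.assess({"repayment_count": 5, "asset_total": 5000.0}))  # approve
print(assessor.assess({"repayment_count": 1, "asset_total": 200.0}))   # review
```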
Optionally, the multi-head loan behavior data includes at least: repayment data items of the user for a plurality of borrowers. It may further include at least one of: loan application data items of the user for the plurality of borrowers, placement data items of the plurality of borrowers for the user, and query data items of the plurality of borrowers for the first user.
Optionally, the first target data further includes: data items specified according to expert experience;
the second target data further includes: data items specified according to expert experience.
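Selecting target data items by a threshold and then adding expert-specified items can be sketched as follows. The predictive capability values are taken as given inputs, and the item names, the threshold of 0.1 and the function name are assumptions for illustration only.

```python
from typing import Dict, List, Sequence


def select_target_items(
    predictive_values: Dict[str, float],
    threshold: float,
    expert_items: Sequence[str] = (),
) -> List[str]:
    """Items whose value exceeds the threshold, followed by expert-specified items."""
    selected = [item for item, value in predictive_values.items() if value > threshold]
    for item in expert_items:
        if item not in selected:  # avoid listing an item twice
            selected.append(item)
    return selected


# Hypothetical predictive capability values for three data items.
values = {"repayment_count": 0.42, "query_count": 0.05, "asset_total": 0.31}
print(select_target_items(values, threshold=0.1, expert_items=["monthly_consumption"]))
# ['repayment_count', 'asset_total', 'monthly_consumption']
```

The same pattern applies to the second target data, with data saturation in place of the predictive capability value.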
Optionally, the training process of the prediction model includes: acquiring sample target data and a label of the sample target data, wherein the sample target data comprises a data item of which the predictive capability value is greater than the preset threshold value in sample loan data and a data item specified according to expert experience; training the predictive model using a portion of the sample target data and a label for the portion of the sample target data; validating the trained predictive model using another portion of the sample target data and a label of the another portion of the sample target data; and if the verification fails, reselecting sample target data to train the prediction model until the verification passes.
Optionally, the apparatus further comprises an attribute determination unit configured to:
acquiring a plurality of types obtained by clustering loan data of a plurality of users by the clustering model;
and obtaining other attributes of different types by comparing the data items in different types, wherein the other attributes are the attributes except the risk level.
Optionally, the prediction model is a supervised model and the clustering model is an unsupervised model.
The loan risk assessment apparatus shown in fig. 4 has high assessment accuracy because the multi-head loan behavior data, the asset data of the user, and the consumption data of the user are used as assessment bases.
The embodiment of the application also discloses loan risk assessment equipment, which comprises a processor and a memory. Wherein the processor and the memory communicate over a bus.
The memory is used for storing programs, and the processor is used for running the programs to realize the assessment method for the loan risk:
a method for assessing loan risk, comprising:
acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user, and each kind of loan data comprises at least one data item;
acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value of a data item is positively correlated with the degree of influence of that data item on the evaluation result;
inputting the first target data into a preset prediction model to obtain a result output by the prediction model as a first result;
acquiring second target data in the loan data, wherein the second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold;
inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level corresponding in advance to the type of the first user as a second result, wherein the clustering model obtains a plurality of types by clustering loan data of a plurality of users;
and obtaining an evaluation result of the loan risk of the first user, wherein the evaluation result is determined according to the first result and the second result.
Optionally, the multi-head loan behavior data of the first user at least includes:
repayment data items of the first user for a plurality of borrowers.
Optionally, the first user's multi-head loan behavior data further comprises at least one of:
loan application data items of the first user for the plurality of borrowers;
placement data items of the plurality of borrowers for the first user;
query data items of the plurality of borrowers for the first user.
Optionally, the first target data further includes:
data items specified according to expert experience;
the second target data further includes:
data items specified according to expert experience.
Optionally, the training process of the prediction model includes:
acquiring sample target data and a label of the sample target data, wherein the sample target data comprises a data item of which the predictive capability value is greater than the preset threshold value in sample loan data and a data item specified according to expert experience;
training the predictive model using a portion of the sample target data and a label for the portion of the sample target data;
validating the trained predictive model using another portion of the sample target data and a label of the another portion of the sample target data;
and if the verification fails, reselecting sample target data to train the prediction model until the verification passes.
Optionally, the method further comprises:
acquiring a plurality of types obtained by clustering loan data of a plurality of users by the clustering model;
and obtaining other attributes of different types by comparing the data items in different types, wherein the other attributes are the attributes except the risk level.
Optionally, the prediction model is a supervised model and the clustering model is an unsupervised model.
The embodiment of the present application further discloses a computer-readable storage medium, on which a computer program is stored, and when the computer program runs on a computer, the method for assessing loan risk according to the above method embodiment is performed:
a method for assessing loan risk, comprising:
acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user, and each kind of loan data comprises at least one data item;
acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value of a data item is positively correlated with the degree of influence of that data item on the evaluation result;
inputting the first target data into a preset prediction model to obtain a result output by the prediction model as a first result;
acquiring second target data in the loan data, wherein the second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold;
inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level corresponding in advance to the type of the first user as a second result, wherein the clustering model obtains a plurality of types by clustering loan data of a plurality of users;
and obtaining an evaluation result of the loan risk of the first user, wherein the evaluation result is determined according to the first result and the second result.
Optionally, the multi-head loan behavior data of the first user at least includes:
repayment data items of the first user for a plurality of borrowers.
Optionally, the first user's multi-head loan behavior data further comprises at least one of:
loan application data items of the first user for the plurality of borrowers;
placement data items of the plurality of borrowers for the first user;
query data items of the plurality of borrowers for the first user.
Optionally, the first target data further includes:
data items specified according to expert experience;
the second target data further includes:
data items specified according to expert experience.
Optionally, the training process of the prediction model includes:
acquiring sample target data and a label of the sample target data, wherein the sample target data comprises a data item of which the predictive capability value is greater than the preset threshold value in sample loan data and a data item specified according to expert experience;
training the predictive model using a portion of the sample target data and a label for the portion of the sample target data;
validating the trained predictive model using another portion of the sample target data and a label of the another portion of the sample target data;
and if the verification fails, reselecting sample target data to train the prediction model until the verification passes.
Optionally, the method further comprises:
acquiring a plurality of types obtained by clustering loan data of a plurality of users by the clustering model;
and obtaining other attributes of different types by comparing the data items in different types, wherein the other attributes are the attributes except the risk level.
Optionally, the prediction model is a supervised model and the clustering model is an unsupervised model.
The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for assessing loan risk, comprising:
acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user, and each kind of loan data comprises at least one data item;
acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value of a data item is positively correlated with the degree of influence of that data item on the evaluation result;
inputting the first target data into a preset prediction model to obtain a result output by the prediction model as a first result;
acquiring second target data in the loan data, wherein the second target data comprises data items in the loan data whose data saturation is greater than a preset saturation threshold;
inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level corresponding in advance to the type of the first user as a second result, wherein the clustering model obtains a plurality of types by clustering loan data of a plurality of users;
and obtaining an evaluation result of the loan risk of the first user, wherein the evaluation result is determined according to the first result and the second result.
2. The method of claim 1, wherein the first user's multi-head loan behavior data comprises at least:
repayment data items of the first user for a plurality of borrowers.
3. The method of claim 2, wherein the first user's multi-head loan behavior data further comprises at least one of:
loan application data items of the first user for the plurality of borrowers;
placement data items of the plurality of borrowers for the first user;
query data items of the plurality of borrowers for the first user.
4. The method of any of claims 1-3, wherein the first target data further comprises:
data items specified according to expert experience;
the second target data further includes:
data items specified according to expert experience.
5. The method according to any one of claims 1-3, wherein the training process of the predictive model comprises:
acquiring sample target data and a label of the sample target data, wherein the sample target data comprises a data item of which the predictive capability value is greater than the preset threshold value in sample loan data and a data item specified according to expert experience;
training the predictive model using a portion of the sample target data and a label for the portion of the sample target data;
validating the trained predictive model using another portion of the sample target data and a label of the another portion of the sample target data;
and if the verification fails, reselecting sample target data to train the prediction model until the verification passes.
6. The method of claim 1, further comprising:
acquiring a plurality of types obtained by clustering loan data of a plurality of users by the clustering model;
and obtaining other attributes of different types by comparing the data items in different types, wherein the other attributes are the attributes except the risk level.
7. The method of claim 1, wherein the predictive model is a supervised model and the clustering model is an unsupervised model.
8. An assessment apparatus for loan risk, comprising:
a loan data acquisition module, used for acquiring loan data of a first user, wherein the loan data of the first user comprises multi-head loan behavior data of the first user, asset data of the first user, and consumption data of the first user, and each kind of loan data comprises at least one data item;
a first data acquisition module, used for acquiring first target data in the loan data, wherein the first target data comprises data items in the loan data whose predictive capability values are greater than a preset threshold, and the predictive capability value of a data item is positively correlated with the degree of influence of that data item on the evaluation result;
the first result obtaining module is used for inputting the first target data into a preset prediction model to obtain a result output by the prediction model as a first result;
the second data acquisition module is used for acquiring second target data in the loan data, wherein the second target data comprises data items of which the data saturation is greater than a preset saturation threshold in the loan data;
the second result acquisition module is used for inputting the second target data into a clustering model to obtain the type of the first user output by the clustering model, and taking the risk level corresponding in advance to the type of the first user as a second result, wherein the clustering model obtains a plurality of types by clustering loan data of a plurality of users;
and the evaluation result acquisition module is used for acquiring an evaluation result of the loan risk of the first user, and the evaluation result is determined according to the first result and the second result.
9. An assessment device for loan risk comprising a memory and a processor;
the memory is used for storing a program, and the processor is used for operating the program to realize the assessment method for loan risk according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when the computer program runs on a computer, executes the method for assessing a loan risk according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011431333.1A CN112232950A (en) | 2020-12-10 | 2020-12-10 | Loan risk assessment method and device, equipment and computer-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112232950A true CN112232950A (en) | 2021-01-15 |
Family
ID=74124718
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011431333.1A Pending CN112232950A (en) | 2020-12-10 | 2020-12-10 | Loan risk assessment method and device, equipment and computer-readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112232950A (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113516548A (en) * | 2021-05-14 | 2021-10-19 | 牛少侠科技(山西)有限公司 | Financial borrowing and lending method and system based on block chain |
| CN113592623A (en) * | 2021-07-20 | 2021-11-02 | 浙江惠瀜网络科技有限公司 | Construction method of risk assessment system before vehicle loan and credit and risk assessment method |
| CN113657724A (en) * | 2021-07-29 | 2021-11-16 | 上海淇玥信息技术有限公司 | Resource allocation method and device based on multi-source heterogeneous data and electronic equipment |
| CN118333633A (en) * | 2024-06-13 | 2024-07-12 | 江西达途数字技术有限公司 | Credit and debt standing book management system and method based on big data |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108182634A (en) * | 2018-01-31 | 2018-06-19 | 国信优易数据有限公司 | A kind of training method for borrowing or lending money prediction model, debt-credit Forecasting Methodology and device |
| CN108961032A (en) * | 2017-05-25 | 2018-12-07 | 腾讯科技(深圳)有限公司 | Borrow or lend money processing method, device and server |
| CN110349009A (en) * | 2019-07-02 | 2019-10-18 | 北京淇瑀信息科技有限公司 | A kind of bull debt-credit violation correction method, apparatus and electronic equipment |
| CN111311402A (en) * | 2020-03-30 | 2020-06-19 | 百维金科(上海)信息科技有限公司 | XGboost-based internet financial wind control model |
| CN111325248A (en) * | 2020-02-10 | 2020-06-23 | 深圳华策辉弘科技有限公司 | Method and system for reducing pre-loan business risk |
| CN111507831A (en) * | 2020-05-29 | 2020-08-07 | 长安汽车金融有限公司 | Credit risk automatic assessment method and device |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113516548A (en) * | 2021-05-14 | 2021-10-19 | 牛少侠科技(山西)有限公司 | Financial borrowing and lending method and system based on block chain |
| CN113592623A (en) * | 2021-07-20 | 2021-11-02 | 浙江惠瀜网络科技有限公司 | Construction method of risk assessment system before vehicle loan and credit and risk assessment method |
| CN113657724A (en) * | 2021-07-29 | 2021-11-16 | 上海淇玥信息技术有限公司 | Resource allocation method and device based on multi-source heterogeneous data and electronic equipment |
| CN118333633A (en) * | 2024-06-13 | 2024-07-12 | 江西达途数字技术有限公司 | Credit and debt ledger management system and method based on big data |
| CN118333633B (en) * | 2024-06-13 | 2024-08-13 | 江西达途数字技术有限公司 | Credit and debt ledger management system and method based on big data |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Paleologo et al. | Subagging for credit scoring models | |
| CN111861174B (en) | Credit assessment method for user portrait | |
| Van Thiel et al. | Artificial intelligence credit risk prediction: An empirical study of analytical artificial intelligence tools for credit risk prediction in a digital era | |
| CN112561685B (en) | Customer classification method and device | |
| CN112232950A (en) | Loan risk assessment method and device, equipment and computer-readable storage medium | |
| Jiang et al. | Deciphering big data in consumer credit evaluation | |
| CN111476660A (en) | Intelligent risk control system and method based on data analysis | |
| US20150363875A1 (en) | System and Method for Filtering and Analyzing Transaction Information | |
| CN111708883A (en) | Credit limit determination method and device based on machine learning and device fingerprint | |
| CN114240617A (en) | Service request processing method and device, computer equipment and storage medium | |
| CN116361542A (en) | Product recommendation method, device, computer equipment and storage medium | |
| Li et al. | Credit Risk management of P2P network Lending | |
| Meursault et al. | One threshold doesn't fit all: Tailoring machine learning predictions of consumer default for lower‐income areas | |
| CN115564591A (en) | A method for determining a financing product and related equipment | |
| CN114626940A (en) | Data analysis method and device and electronic equipment | |
| CN117575773A (en) | Method, device, computer equipment and storage medium for determining service data | |
| Hou et al. | A Trial of Student Self‐Sponsored Peer‐to‐Peer Lending Based on Credit Evaluation Using Big Data Analysis | |
| Huang et al. | Attention discrimination under time constraints: Evidence from retail lending | |
| CN117764692A (en) | Method for predicting credit risk default probability | |
| CN113537666B (en) | Evaluation model training method, evaluation and business auditing method, device and equipment | |
| Wang | Default Risks in Marketplace Lending | |
| CN119887364A (en) | Method for constructing retail credit risk prediction model and credit card and special stage service Scorealpha d model | |
| CN119477509A (en) | Methods for building retail credit risk prediction models and Scorealpha2 models for credit cards and special installment businesses | |
| Garzozi-Pincay et al. | The Sociodemographic Characteristics Use for Credit Scoring and Financial Inclusion in Credits Granting, Case: Ecuador 2024 | |
| Hara Khanam | Credit scoring using Logistic regression | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210115 |