CN119761901A

CN119761901A - A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry

Info

Publication number: CN119761901A
Application number: CN202411842297.6A
Authority: CN
Inventors: 崔涛; 王光; 张旿; 张奇宝; 赵坤; 赵俊; 钱嘉楠; 李嘉和; 徐国栋
Original assignee: China Power Investment Ronghe Financial Leasing Co ltd; Shanghai Credit Reporting Co ltd
Current assignee: China Power Investment Ronghe Financial Leasing Co ltd; Shanghai Credit Reporting Co ltd
Priority date: 2024-12-13
Filing date: 2024-12-13
Publication date: 2025-04-04

Abstract

The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. The method comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring. An enterprise public information model and an enterprise credit information model are established separately to obtain an enterprise public information score and an enterprise credit information score. An enterprise credit data fusion score is then computed according to an agreed weight rule, and the final fusion score and key feature variables are displayed by a business application system.

Description

Enterprise credit data fusion scoring modeling method for the financial leasing industry

Technical Field

The invention relates to the field of computer technology, and in particular to an enterprise credit data fusion scoring modeling method for the financial leasing industry.

Background

In recent years, credit data has become an important factor of production and plays an important role in scenarios such as enterprise credit evaluation and government supervision. Non-bank financial institutions such as financial leasing, business insurance and small loan companies commonly face pain points in acquiring, processing and applying credit data, and urgently need an enterprise credit data fusion scoring modeling method that meets their requirements for data security, compliance and timeliness. On the one hand, public government information on enterprises (industrial and commercial, judicial, tax, intellectual property and the like) is offered by many different data service providers in the market, the data formats are not uniform, and part of the information is free text with varied content that requires professional interpretation and analysis. On the other hand, the use of enterprise credit information is subject to strict authorization requirements; complicated approval processes reduce the timeliness of data use, make it difficult to realize the full value of the data, and pose challenges for analyzing the credit behavior of enterprise information subjects.

Disclosure of Invention

The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. Big data software is used to read massive amounts of information quickly and to mine the underlying associations in the data; an objective statistical model is constructed, enterprise risk is predicted scientifically through model scores, and the application value and use efficiency of the data are improved across the board. The invention provides a solution covering data preprocessing, model design and evaluation, multi-model fusion, and system integration and application for enterprise credit data modeling.

The enterprise public information in the method refers to public government information from industrial and commercial, judicial, tax, intellectual property and similar sources. Enterprise public information has many data items and is scattered rather than centralized; users need to log in to several data platforms to obtain the data they need, which leads to low efficiency, long data service chains and complex operation.

The enterprise credit information in the method is information derived from enterprise credit reports, including basic enterprise information, repayment performance information, guarantee information and the like. The use of enterprise credit information data is strictly regulated: enterprise credit report information may be used within the company only with the authorization of the information subject and under strict, standardized management, so the approval process for data use is complex and timeliness needs improvement. Enterprise credit reports are generally in PDF format and the credit information is highly specialized, so the data structure must first be parsed and then interpreted and analyzed professionally.

An enterprise credit data fusion scoring modeling method for the financial leasing industry is applied to the modeling of enterprise public information and enterprise credit information data for enterprise credit risk assessment, and covers data preprocessing, model design and evaluation, multi-model fusion, system integration and application, and the like.

The object of the invention is achieved by the following technical solution:

An enterprise credit data fusion scoring modeling method for the financial leasing industry comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring.

(1) Enterprise public information modeling:

Based on the identification information of the enterprise information subject, basic enterprise information is retrieved through an enterprise public information query API and stored in the memory of the business system, wherein the identification information of the enterprise information subject includes, but is not limited to, the enterprise name and/or the unified social credit code;

A modeling target is defined according to the project's application requirements; with a logistic regression model as the core modeling technique and the retrieved basic enterprise information as sample data, the enterprise public information is modeled to obtain an enterprise public information model, wherein the model result of the enterprise public information model comprises variable names, variable meanings, variable values and 100-point-scale scores;

The overall score of the enterprise public information model is obtained from the value of each variable of the model and its corresponding 100-point-scale score.

Wherein the values of each variable in the enterprise public information model and the corresponding 100-point-scale scores are defined according to the following table:

Specifically, in the enterprise public information modeling, the enterprise public information is modeled to obtain an enterprise public information model, and the overall score of the model is finally obtained from the value and score of each of its variables, which specifically comprises the following steps:

(1.1) Data cleaning analysis:

Data cleaning and calculation are performed on the retrieved basic enterprise information data to obtain the feature variables of the enterprise public information model, wherein the data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values and outliers, and handling missing values.

(1.2) Feature variable analysis:

The statistical characteristics and distributions of the feature variables of the enterprise public information model are analyzed, and extreme values are checked and handled;

The results of the feature variable analysis are compiled into a feature variable table that records each feature variable's name, calculation logic, data coverage and basic data distribution.

(1.3) Weight of Evidence (WOE) analysis:

The logistic regression model is converted into a standard scorecard format through WOE transformation to obtain the variable values of the feature variables;

All feature variables are first binned automatically; the automatic binning results are then checked manually for reliability, conformity with business requirements and interpretability, in order to decide whether manual binning is needed;

The WOE of each category is defined as follows:

Wherein the columns Bad Distribution and Good Distribution represent the distribution of "bad clients" and "good clients" across the categories, respectively, each obtained by dividing the count in a category by the total number of "bad clients" or "good clients";

If the ratio in brackets is less than 1, the WOE is negative; if it is greater than 1, the WOE is positive.
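The WOE calculation described above can be sketched in Python as follows. This is a minimal illustration, not the patent's own implementation: the function name `woe_by_bin` and the example counts are hypothetical, and the sketch assumes the ratio in brackets is Good Distribution divided by Bad Distribution under a natural logarithm, which is one common convention. If the opposite orientation is intended, the signs simply flip.

```python
import math


def woe_by_bin(good_counts, bad_counts):
    """Return the WOE of each bin from per-bin good and bad client counts.

    Assumption: WOE = ln(Good Distribution / Bad Distribution).
    The orientation of the ratio is assumed, not taken from the source.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woes = []
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            # ln(0) and division by zero are undefined; such bins are
            # usually merged with a neighbour before computing WOE.
            raise ValueError("each bin needs at least one good and one bad client")
        good_dist = good / total_good  # share of all good clients in this bin
        bad_dist = bad / total_bad     # share of all bad clients in this bin
        woes.append(math.log(good_dist / bad_dist))
    return woes


# Hypothetical three-bin variable: bad clients dominate the first bin.
example = woe_by_bin(good_counts=[10, 30, 60], bad_counts=[50, 30, 20])
```

Under this assumed orientation, the first bin (ratio below 1) gets a negative WOE, the second (ratio equal to 1) gets zero, and the third (ratio above 1) gets a positive WOE, matching the sign rule stated above.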

(1.4) Modeling and debugging:

A set of model variables is initialized and a model is fitted on the current set of variables, wherein the model result of the fitted model comprises feature variable names, variable meanings, variable values and 100-point-scale scores; whether the fitted model is the optimal model is then judged;

If the model is judged to be the optimal model, the final model and variables of the enterprise public information model are obtained;

If the model is judged not to be the optimal model, some variables are added to or deleted from the model, a model is refitted on the current set of variables, and the refitted model is judged again; this is repeated until the optimal model is found, yielding the final model and variables of the enterprise public information model.
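The add-or-delete loop of step (1.4) can be illustrated with a small greedy search. This is only a sketch under stated assumptions: the function `search_variables` and the `evaluate` callback are hypothetical, and a simple greedy rule (try every single addition or removal, keep the best one, stop when nothing improves) stands in for the judgment of the modeler, which the source does not formalize. In practice `evaluate` would fit a logistic regression on the given variables and return a quality measure such as the validation KS.

```python
def search_variables(candidates, initial, evaluate, max_rounds=50):
    """Greedy add/delete search over variable sets.

    candidates: all variable names that may enter the model.
    initial:    the initial list of model variables.
    evaluate:   hypothetical callback; takes a list of variable names,
                fits a model on them and returns a score (higher is better).
    """
    current = list(initial)
    best = evaluate(current)
    for _ in range(max_rounds):
        # Every variable set reachable by adding one variable ...
        trials = [current + [v] for v in candidates if v not in current]
        # ... or by deleting one variable (never leaving the model empty).
        if len(current) > 1:
            trials += [[u for u in current if u != v] for v in current]
        if not trials:
            break
        scored = [(evaluate(t), t) for t in trials]
        top_score, top_set = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no single change improves the model: stop searching
        best, current = top_score, top_set
    return current, best


# Toy evaluation: pretend the ideal model uses exactly {"a", "b"}.
target = {"a", "b"}
chosen, quality = search_variables(
    candidates=["a", "b", "c", "d"],
    initial=["a", "d"],
    evaluate=lambda names: -len(set(names) ^ target),
)
```

The toy `evaluate` only counts how far a variable set is from a fixed target, which keeps the example self-contained; it says nothing about how real model quality behaves.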

(1.5) Linear score conversion:

The scorecard score is linearly converted to a 0-100 scale, leaving the distribution characteristics unchanged; the conversion formula is as follows:

(1.6) Model result presentation:

The final modeling variables of the enterprise public information model, together with the value and 100-point-scale score of each variable, are displayed on the business system.
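One linear conversion that maps scores onto 0-100 while preserving the shape of the distribution is min-max rescaling. The sketch below assumes that form; the function name `to_percent_scale` and the optional `low` and `high` bounds (for example the theoretical minimum and maximum of the scorecard) are assumptions, not details given in the source.

```python
def to_percent_scale(raw_scores, low=None, high=None):
    """Linearly rescale raw scorecard scores to the range 0-100.

    Assumed formula: 100 * (score - low) / (high - low).
    low/high default to the observed minimum and maximum.
    """
    low = min(raw_scores) if low is None else low
    high = max(raw_scores) if high is None else high
    if high == low:
        raise ValueError("cannot rescale when all scores are identical")
    return [100.0 * (score - low) / (high - low) for score in raw_scores]


# Hypothetical raw scorecard scores.
converted = to_percent_scale([300, 450, 600])  # [0.0, 50.0, 100.0]
```

Because the transformation is linear, the ordering of samples and the relative spacing between their scores are unchanged.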

The data cleaning analysis of step (1.1) comprises cleaning rules for general data and cleaning rules for specific data.

(A) The general data cleaning rules are as follows:

(A1) Date fields: displayed uniformly in YYYY-MM-DD format;

(A2) Amount fields: all amounts are unified into a numeric format and expressed in units of RMB 10,000 yuan;

(A3) Proportion fields: all proportions are unified into a numeric format, the percent sign is removed, and a missing 0 before the decimal point is added;

(A4) Duplicate data: the same event may have multiple duplicate records in a data table; deduplication uses the key "company name + unique event identifier" as the main means of identification.
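The general rules (A1) to (A4) can be sketched as small helper functions. Everything specific here is an assumption made for illustration: the accepted input date formats, the `unit` parameter, and the record keys `company_name` and `event_id` are hypothetical and would depend on the actual data sources.

```python
from datetime import datetime

# Assumed input formats; real sources may use others.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%Y年%m月%d日")


def normalize_date(text):
    """(A1) Return the date as YYYY-MM-DD, or None if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def normalize_amount_wan(value, unit="yuan"):
    """(A2) Return the amount as a number in units of RMB 10,000 yuan."""
    number = float(str(value).replace(",", ""))
    return number / 10000.0 if unit == "yuan" else number


def normalize_ratio(text):
    """(A3) Strip the percent sign and return a plain number.

    Parsing as float also restores a missing leading zero (".5" -> 0.5).
    """
    return float(text.strip().rstrip("%"))


def deduplicate(records):
    """(A4) Keep the first record for each (company name, event id) key."""
    seen = set()
    unique = []
    for record in records:
        key = (record["company_name"], record["event_id"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


rows = [
    {"company_name": "Company A", "event_id": "E1"},
    {"company_name": "Company A", "event_id": "E1"},  # duplicate of the first
    {"company_name": "Company A", "event_id": "E2"},
]
unique_rows = deduplicate(rows)
```

Keeping the first occurrence of a duplicate is a design choice of this sketch; the source only specifies the key used to identify duplicates.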

(B) The specific data cleaning rules are as follows:

(B1) Enterprise registration: if the enterprise registration date is empty, fill it with the enterprise operation start date; delete observations whose enterprise operation start date is empty; delete observations whose operating status is deregistered or revoked but whose deregistration date or revocation date is not empty; delete observations whose enterprise operation expiration date is not empty but is earlier than the enterprise operation start date; delete observations whose deregistration date is not empty but is earlier than the enterprise registration date;

(B2) Key personnel: for the same company and the same person's name, merge the job titles; where one person holds several positions recorded on separate rows, the job titles are merged into one row;

(B3) Judgment debtor information: delete observations whose case filing date is not empty but is earlier than the enterprise registration date;

(B4) Court hearing announcements: delete observations whose case filing time is not empty but is earlier than the enterprise registration date;

(B5) Delete observations whose release date is not empty but is earlier than the enterprise registration date;

(B6) Abnormal operation: delete observations whose listing date is not empty but is earlier than the enterprise registration date;

(B7) Administrative penalties: delete observations whose penalty decision date is not empty but is earlier than the enterprise registration date; if the penalty decision date is empty, substitute the publication date;

(B8) Administrative licenses: delete observations whose license start date is not empty but is earlier than the enterprise registration date;

(B9) Spot checks: delete observations whose spot check date is not empty but is earlier than the enterprise registration date;

(B10) Business registration changes: delete observations whose change date is not empty but is earlier than or equal to the enterprise registration date;

(B11) External investment: delete observations whose operation date is not empty but is earlier than the enterprise registration date;

(B12) Branch offices: delete observations whose establishment date is not empty but is earlier than the enterprise registration date;

(B13) Equity pledges: delete observations whose equity pledge establishment registration date is not empty but is earlier than the enterprise registration date;

(B14) Real estate mortgages: delete observations whose mortgage registration date is not empty but is earlier than the enterprise registration date.
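Most of the specific rules above share one pattern: an event record is dropped when its date is present but precedes the enterprise registration date. A minimal sketch of that shared check follows; the function name `drop_predating_events`, the record layout and the `inclusive` flag (for the "earlier than or equal to" comparison in rule (B10)) are assumptions made for illustration.

```python
from datetime import date


def drop_predating_events(records, date_field, registration_date, inclusive=False):
    """Drop records whose event date is set but precedes the registration date.

    Records with an empty (None) event date are kept, because the rules
    only delete observations whose date is not empty.
    inclusive=True also drops records dated exactly on the registration date.
    """
    kept = []
    for record in records:
        event_date = record.get(date_field)
        if event_date is not None:
            if event_date < registration_date:
                continue
            if inclusive and event_date == registration_date:
                continue
        kept.append(record)
    return kept


registered = date(2020, 1, 1)
penalties = [
    {"id": 1, "decision_date": date(2019, 5, 1)},  # before registration: dropped
    {"id": 2, "decision_date": None},              # empty date: kept
    {"id": 3, "decision_date": date(2020, 1, 1)},  # same day: kept unless inclusive
    {"id": 4, "decision_date": date(2021, 3, 1)},  # after registration: kept
]
strict = drop_predating_events(penalties, "decision_date", registered)
with_equal = drop_predating_events(penalties, "decision_date", registered, inclusive=True)
```

Rules with extra handling, such as the date substitution in (B7) or the row merging in (B2), would need their own steps before or after this filter.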

In the modeling and debugging of step (1.4), judging whether the fitted model is the optimal model specifically comprises: splitting the sample data into a training set and a validation set at a ratio of 7:3, ensuring during the split that the proportion of bad samples in the training set and in the validation set is consistent with that of the whole data set; obtaining the KS curves and ROC curves of the training set and the validation set, the model with the best performance on these curves being the optimal statistical model of the quantitative analysis; calculating the model score of each sample from the model result of the optimal statistical model and the feature variable values of each sample, thereby obtaining the sample score distribution of the optimal statistical model, which reflects the model's discriminating power, stability and possible bias across different samples; and judging, according to the actual application scenario of the model, whether the sample model scores best separate good samples from bad samples (for example, bad samples concentrated in the low score band and good samples concentrated in the high score band). If they do, the model is judged to be the optimal model; otherwise it is not.
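The 7:3 split with a preserved bad-sample proportion and the KS check can be sketched with the standard library alone. The function names, the label convention (1 for a bad sample, 0 for a good sample) and the fixed random seed are assumptions of this sketch; the ROC curve is omitted for brevity.

```python
import random


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class separately so
    that both parts keep the bad-sample proportion of the whole data."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (0, 1):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


def ks_statistic(scores, labels):
    """Largest gap between the cumulative shares of bad and good samples
    as the score threshold sweeps from low to high."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before comparing.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


# Hypothetical data: 20 bad and 80 good samples.
labels = [1] * 20 + [0] * 80
train_idx, valid_idx = stratified_split(labels)
```

With these labels the training part holds 70 samples with 14 bad ones and the validation part holds 30 samples with 6 bad ones, so both keep the overall 20% bad rate. A KS value close to 1 indicates strong separation of good and bad samples, and a value close to 0 indicates almost none.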

In the modeling and debugging of step (1.4), initializing a set of model variables comprises: removing variables that clearly have no effect on modeling or have no business meaning; removing variables with an excessively low information value or an excessively high proportion of repeated values; and, among two or more variables with a high Pearson correlation, keeping one variable and removing the others.
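The information value and Pearson correlation screens can be sketched as follows. The function names, the 0.8 correlation threshold and the rule of keeping the variable with the highest information value within a correlated group are assumptions of this sketch; the source states only that one variable is kept.

```python
import math


def information_value(good_counts, bad_counts):
    """IV = sum over bins of (good_dist - bad_dist) * ln(good_dist / bad_dist).

    Every bin must contain at least one good and one bad sample.
    """
    total_good, total_bad = sum(good_counts), sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        good_dist, bad_dist = good / total_good, bad / total_bad
        iv += (good_dist - bad_dist) * math.log(good_dist / bad_dist)
    return iv


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def drop_correlated(columns, iv_by_name, threshold=0.8):
    """Visit variables from highest to lowest IV and keep a variable only
    if it is not highly correlated with one that was already kept."""
    kept = []
    for name in sorted(columns, key=lambda n: iv_by_name[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical variables: "b" is almost a multiple of "a".
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
ivs = {"a": 0.3, "b": 0.2, "c": 0.1}
selected = drop_correlated(columns, ivs)  # ["a", "c"]
```

A separate cutoff on `information_value` would implement the "excessively low information value" rule; its threshold is likewise a modeling choice.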

(2) Enterprise credit information modeling

The enterprise credit information is derived from enterprise credit reports and comprises credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information and the like;

The model is structured in layers: key variables are preliminarily screened by the analytic hierarchy process, the business attributes, correlation, data coverage and other properties of the variables are considered together, and scores are assigned by the expert scoring method. The enterprise credit information is thereby modeled to obtain an enterprise credit information model, wherein the model result of the enterprise credit information model comprises variable names, variable meanings, variable values and 100-point-scale scores;

Based on the identification information of the enterprise information subject, it is queried whether the enterprise information subject has a credit report. If a credit report exists, the data with the latest report date in the database is taken, and it is judged whether the field "year of first credit transaction" in the "credit prompt information unit" section of the credit report is empty: if the field is not empty, the enterprise credit information score is calculated according to the enterprise credit information model; if the field is empty, calculation of the enterprise credit information score is not supported and the score is set to empty. If no credit report exists, calculation of the enterprise credit information score is likewise not supported and the score is set to empty.

Wherein the values of each variable in the enterprise credit information model and the corresponding 100-point-scale scores are defined according to the following table:

The enterprise credit information is derived from enterprise credit reports; the data source is single and the structured data is standardized, comprising credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information and the like. In the enterprise credit information modeling of step (2), modeling the enterprise credit information to obtain the enterprise credit information model specifically comprises:
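The decision logic above can be sketched as a small function. The record keys `report_date` and `first_credit_transaction_year`, the function names and the `score_model` callback are hypothetical; `None` stands for an empty score, and report dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
def latest_report(reports):
    """Return the report with the latest report date, or None if there is none."""
    if not reports:
        return None
    return max(reports, key=lambda report: report["report_date"])


def credit_information_score(reports, score_model):
    """Return the enterprise credit information score, or None (empty) when
    no report exists or the 'year of first credit transaction' field is empty.

    score_model: hypothetical callable that scores a single report.
    """
    report = latest_report(reports)
    if report is None:
        return None  # no credit report: scoring is not supported
    if not report.get("first_credit_transaction_year"):
        return None  # field empty: scoring is not supported
    return score_model(report)


# Hypothetical reports for one enterprise; the newer one lacks the field.
reports = [
    {"report_date": "2024-01-10", "first_credit_transaction_year": "2015"},
    {"report_date": "2024-06-01", "first_credit_transaction_year": ""},
]
score_newest = credit_information_score(reports, lambda report: 72.0)
score_older_only = credit_information_score(reports[:1], lambda report: 72.0)
```

Only the latest report is examined, so an older report with a filled field does not rescue a newer one whose field is empty.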

(2.1) Obtaining the feature variables of the enterprise credit information model through data cleaning and calculation

The enterprise credit information data is cleaned, important information dimensions are selected, and variables are processed and calculated; if the credit information data table contains records of credit reports queried multiple times, the reports are sorted in descending order of generation time and the data of the most recent credit report is taken as the modeling sample data;

Through data cleaning, the retrieved enterprise credit information data is re-examined and verified so as to discover and correct errors in the data file and reduce the influence of erroneous data on model performance, wherein the data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data and outliers, and handling missing values.

(2.2) Feature analysis

The statistical characteristics and distributions of the feature variables of the enterprise credit information model are analyzed, and extreme values are checked and handled;

The results of the feature analysis are compiled into a feature variable table that records each feature variable's name, calculation logic, data coverage and basic data distribution.

(2.3) Weight of Evidence (WOE) analysis

The variable values of the feature variables are obtained through WOE transformation;

The WOE of each category is defined as follows:

(2.4) Scoring design

(2.4.1) Overall sample score distribution

The enterprise credit information model is built according to expert scoring rules, and the score of each sample is obtained from the scoring interval and scoring rule of each variable. Analysis of the overall sample score distribution helps identify whether sample scores are concentrated, dispersed or contain abnormal values, and the model's variables, variable values and scores are readjusted according to the score results and the application scenarios of the financial leasing industry;

(2.4.2) Score distribution of good and bad samples

For the modeling target, clients are defined as good clients or bad clients according to the client management classification of the party actually applying the project;

Good and bad samples are distinguished according to the bad-definition label;

Because every sample has a model score, the score distributions of good samples and bad samples are computed separately to check whether the scores can separate them, that is, whether good samples are concentrated in the high score band and bad samples in the low score band. The model's variables, variable values and scores are then readjusted according to the score results and the application scenarios of the financial leasing industry;
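The good and bad score distribution check can be sketched as a count of samples per score band. The function name, the 10-point band width and the label convention (1 for a bad sample, 0 for a good sample) are assumptions of this sketch; scores are assumed to lie on the 0-100 scale.

```python
def score_band_distribution(scores, labels, band_width=10):
    """Count good and bad samples in each score band of a 0-100 scale.

    Bands are keyed by their lower bound; a score of exactly 100 is placed
    in the top band so that no extra band is created for it.
    """
    bands = {}
    for score, label in zip(scores, labels):
        lower = min(int(score // band_width) * band_width, 100 - band_width)
        counts = bands.setdefault(lower, {"good": 0, "bad": 0})
        counts["bad" if label == 1 else "good"] += 1
    return dict(sorted(bands.items()))


# Hypothetical scores: the two bad samples score low, the two good ones high.
distribution = score_band_distribution([15, 22, 85, 100], [1, 1, 0, 0])
```

In this toy example the bad samples fall in the 10 and 20 bands and the good samples in the 80 and 90 bands, which is the separation pattern the check looks for.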

The WOE analysis of step (2.3) and the scoring design of step (2.4) are optimized over multiple rounds until the sample scores best separate good samples from bad samples, whereupon the final model and variables of the enterprise credit information model are obtained;

(2.5) Model result presentation

The final modeling variables of the enterprise credit information model, together with the value and 100-point-scale score of each variable, are displayed on the business system.

(3) Credit data fusion scoring

The overall score of the enterprise public information model obtained in step (1) and the enterprise credit information score obtained in step (2) are combined according to an agreed weight rule and output to the business application system through a credit data fusion scoring interface; the business application system displays the final fusion score and the key feature variables;

The output of the credit data fusion scoring interface includes, but is not limited to: the enterprise name, the unified social credit code, the credit data fusion score, the enterprise public information score, the enterprise credit information score, the value and score of each variable of the enterprise public information model, and the value and score of each variable of the enterprise credit information model.

In the credit data fusion scoring of step (3), the agreed weight rule may be as follows: when the enterprise credit information score is not empty, the credit data fusion score is calculated with the enterprise public information score and the enterprise credit information score weighted 4:6; when the enterprise credit information score is empty, the credit data fusion score is the same as the enterprise public information score. The weight rule may, of course, be modified and adjusted according to business requirements.
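The agreed weight rule can be sketched in a few lines. The function name `fusion_score` and its parameters are hypothetical, `None` stands for an empty enterprise credit information score, and the sketch assumes that the weight of 4 belongs to the public information score and the weight of 6 to the credit information score.

```python
def fusion_score(public_score, credit_score, public_weight=0.4, credit_weight=0.6):
    """Combine the two scores under the example 4:6 weight rule.

    Assumption: public information score weighs 0.4, credit information
    score weighs 0.6. When the credit information score is empty (None),
    the fusion score falls back to the public information score alone.
    """
    if credit_score is None:
        return public_score
    return public_weight * public_score + credit_weight * credit_score


# Hypothetical scores on the 0-100 scale.
with_credit = fusion_score(80, 70)       # 0.4 * 80 + 0.6 * 70 = 74
without_credit = fusion_score(80, None)  # falls back to 80
```

Exposing the weights as parameters reflects the statement that the rule may be adjusted to business requirements.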

The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. First, enterprise public information and enterprise credit information are studied in depth, and the data analysis process and results are described in terms of the data sample overview, data preprocessing rules, feature variable analysis and WOE analysis. Second, suitable modeling methods are selected according to the sample size, feature variables, modeling target and the like of the enterprise public information and the enterprise credit information; an enterprise public information model and an enterprise credit information model are established separately, and model parameters and score distributions are then tuned. Finally, the fusion scoring of the two models is completed and an interface output service for the overall score is created.

Compared with the prior art, the enterprise credit data fusion scoring modeling method for the financial leasing industry provided by the invention has the following main beneficial effects:

1. An enterprise public information modeling method built from the perspective of a credit reporting agency

The method of this patent application establishes an enterprise public information modeling method. From the perspective of a third-party credit reporting agency and based on analysis of macro-level public information data on millions of enterprises, it sorts out the credit risk status of enterprise information subjects at the public level, analyzes the business associations among items of public government information, extracts key feature variables and, with logistic regression as the chosen model, establishes the whole process of enterprise public information modeling: data scope, data preprocessing, and model design and evaluation.

2. An enterprise credit information modeling method built from the perspective of a financial leasing company

The method of this patent application establishes an enterprise credit information modeling method. Starting from the actual business needs of a financial leasing company and combining its post-lending classified and tiered client management, it sets the bad-definition labels of the modeling data, sorts out the basic information, loan and guarantee transaction information and repayment performance information of enterprise information subjects in their credit reports, refines key feature variables, and builds the enterprise credit information model through the analytic hierarchy process and the expert scoring method.

3. A credit data fusion scoring method built for the financial leasing industry

Enterprise credit information reflects only one aspect of enterprise credit risk, and the value of enterprise public information as alternative data is increasingly important. Modeling on credit information alone therefore cannot fully meet the needs of enterprise credit risk assessment, and the modeling of enterprise public information urgently needs to be merged in. The method of this patent application establishes a way of fusing the enterprise public information model with the enterprise credit information model, which can meet the scoring and modeling needs of different future scenarios and is highly extensible.

The above analysis shows that the enterprise credit data fusion scoring modeling method for the financial leasing industry of this patent is more novel, practical and extensible than other scoring modeling methods on the market.

The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. It combines the data advantages of a third-party credit reporting agency with the application scenarios of a financial leasing company, constructs an objective statistical model through modeling techniques, predicts enterprise risk scientifically through model scores, and improves the application value and use efficiency of the data across the board.

Social benefits

The proposed enterprise credit data fusion scoring modeling method for the financial leasing industry comprehensively improves the application value of alternative data. The modeling method combines an enterprise's behavior records at the public level (industrial and commercial, judicial, tax, intellectual property and the like) with its behavior in enterprise credit information, designs a reasonable credit risk assessment model, and characterizes the credit profile of the information subject comprehensively and accurately. Through the score overview it presents the information subject's business operations, dishonesty records, credit and guarantee status and the like, raises the public credit awareness of enterprises as a group, and strengthens the application value of alternative data.

The proposed method effectively improves the standardization and efficiency of credit data modeling. The invention examines data preprocessing, model design and evaluation, multi-model fusion, and system integration and application in depth, offers a reference for other non-bank financial institutions (financial leasing, business insurance, small loans, consumer finance and the like), and to some extent also provides an effective modeling method for data modeling in the enterprise credit reporting industry. In preventing credit risk, improving contract performance and similar respects, the invention provides a reliable credit data fusion scoring modeling method and helps other non-bank financial institutions quickly build scoring models suited to their own business needs.

The proposed method strengthens the construction of a social credit system based on integrity. The invention responds to the state's requirements for standardized management of credit information, explores a path toward standardizing the credit information of industrial and commercial entities (enterprises), and improves cross-domain mechanisms of joint incentives for trustworthiness and of joint punishment and exit for dishonesty. It constrains enterprise information subjects to operate lawfully and honestly, maintains normal market order, fosters an honest social environment, promotes the development of inclusive finance, gradually raises the risk control level of inclusive financial institutions, and contributes to an industry risk control system that combines business self-discipline with social supervision. It can generate substantial social benefits and play an important role in creating a good credit environment and building an integrity system.

Economic benefit

The method for solving enterprise credit information data fusion score modeling based on the financing and renting industry improves the management efficiency of wind control of the financing and renting industry. With the continuous development of various credit businesses and the vigorous competition of industries, more and more financing and leasing institutions realize that gradually increasing the risk control level of enterprises is a necessary basis for the steady promotion of the businesses. The risk data is deeply mined and analyzed by utilizing a data analysis tool to identify potential risk modes and trends, mass data is integrated into simple scoring data by a modeling method, and risks are identified and predicted by objective statistics. By establishing a scoring monitoring system, ongoing risk is tracked and assessed in real time. The reference application of scoring in wind control can effectively simplify and optimize the wind control management flow, reduce unnecessary steps and links and improve the working efficiency.

The enterprise credit data fusion scoring modeling method for the financial leasing industry reduces transaction costs and improves the operating efficiency of society. By applying the credit data fusion score, dishonest parties can be effectively tracked and monitored and restricted in various respects, for example by limiting their applications for financial products such as loans or their participation in commercial activities such as bidding. This makes it difficult for dishonest parties to operate and greatly increases the cost of dishonesty. Trustworthy parties obtain more trust and opportunities, such as lower interest rates and faster approval, which increases the value of being trustworthy and may encourage more enterprises to become trustworthy. Information asymmetry is reduced, transaction costs fall, the operating efficiency of society improves, and the economic benefit to society increases in turn.

The method addresses the following problem with enterprise public information. Enterprise public information is government public information from industry and commerce, judicial, tax, intellectual property and similar sources. It comprises many data items that are scattered rather than centralized, so a user must log in to several data platforms to obtain the required data, which leads to low efficiency, a long data service chain and complex operation.

The method also addresses the following problem with enterprise credit information. The use of enterprise credit information is strictly regulated: on the premise that authorization has been obtained from the information subject, credit reports are subject to strict, standardized management when used within a company, the approval process for using the data is complex, and timeliness needs improvement. Credit reports are generally in PDF format and the credit information is highly specialized, so the data must be parsed into a structured form before professional interpretation and analysis.

Detailed Description

Other advantages and features of the invention are shown by the following description of embodiments of the invention, given by way of example and not by way of limitation, with reference to the accompanying drawings.

(1) Modeling enterprise public information:

A modeling target is defined according to the project application requirements. With a logistic regression model as the core modeling technique and the retrieved enterprise basic information as sample data, the enterprise public information is modeled to obtain an enterprise public information model;

The overall score of the enterprise public information model is obtained from the corresponding value and score of each variable of the model.

The enterprise public information data in this method refers to government public information such as industry and commerce, judicial, tax and intellectual property information. It includes, but is not limited to, enterprise registration information, shareholder information, main personnel, industrial and commercial changes, enterprise annual reports, administrative licenses, administrative penalties, equity pledges and enterprise outward investment.

The specific information types and main field items of the enterprise public information data are shown in the following table 1:

TABLE 1 Enterprise public information types and field items

In data modeling, the modeling target is the goal that the model is built to achieve. Modeling targets can be diverse and typically serve prediction and decision support; a common modeling target is to estimate the probability that an event matching some definition of "bad" occurs within a period of time.

For enterprise public information modeling, the definition of a bad sample may differ from case to case. Common definitions of "bad" and their advantages are shown in Table 2 below:

TABLE 2 Enterprise public information-modeling goal general purpose scenario comparison

The modeling target is selected by jointly considering the project application scenario and the distribution characteristics of the sample data. To make the modeling target more flexible to design, it may be defined according to the project application requirements.

For example:

The modeling target variable ("good" or "bad") can be defined by classifying customers as "good" or "bad" according to the actual application requirements of the project and the post-lending management classification of the enterprise customer group, for example whether the customer is overdue, the overdue status, or the five-level risk classification.

In this method, following the customer management classification used by the actual user in the financial leasing industry, customers classified as N1/N2/N3 are defined as "good" customers and customers classified as A/B/C are defined as "bad" customers.

N1 is a normal project;

N2 is a project in the construction period, for which it is not yet certain whether risk will arise;

N3 is a project with minor operational flaws, for example electricity fee income for the previous month that was not reported on time;

Class C is a project where risk has occurred and intervention by the customer manager is required;

Class B is a project where risk has occurred and intervention by the company is required, such as debt collection;

Class A is a project where risk has occurred and the problem must be handled through litigation or other legal means;

The degree of deterioration of a project increases in the order N1, N2, N3, C, B, A.
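The good/bad definition above can be sketched as a small labeling function. This is an illustrative sketch only: the names `label_customer`, `is_more_deteriorated` and the 0/1 encoding (1 for "bad") are assumptions introduced here, not part of the method as filed.

```python
# Customer management classes, ordered from least to most deteriorated.
DETERIORATION_ORDER = ["N1", "N2", "N3", "C", "B", "A"]
GOOD_CLASSES = {"N1", "N2", "N3"}
BAD_CLASSES = {"A", "B", "C"}


def label_customer(management_class: str) -> int:
    """Return 1 for a "bad" customer and 0 for a "good" customer."""
    cls = management_class.strip().upper()
    if cls in BAD_CLASSES:
        return 1
    if cls in GOOD_CLASSES:
        return 0
    raise ValueError(f"unknown customer management class: {management_class!r}")


def is_more_deteriorated(a: str, b: str) -> bool:
    """True if class a lies further along the deterioration order than class b."""
    return DETERIORATION_ORDER.index(a) > DETERIORATION_ORDER.index(b)
```

Raising an error on an unknown class, rather than silently defaulting to "good", keeps mislabeled samples out of the training data.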

(1.1) Data cleaning analysis:

Data cleaning analysis includes cleaning rules for general data and cleaning rules for specific data.

(A) General data cleaning rules, processed as follows:

(A1) Date fields are uniformly displayed in the YYYY-MM-DD format;

(A2) Amount fields are converted into numerical format and expressed in units of RMB 10,000;

(A3) Ratio fields are converted into numerical format; percentage signs are removed, and a leading zero is added if none precedes the decimal point;

(A4) Duplicate data: for the same event, multiple duplicate records may exist in a data table. Deduplication uses the key "company name + unique event identifier" as the main means of identification. The deduplication keys for each table are shown in Table 3 below.

TABLE 3 Enterprise public information record uniqueness judgment identification

(B) Cleaning rules for specific data, processed as follows:

(1.2) Feature variable analysis:

(1.3) Evidence Weight (WOE) analysis:

WOE for each category is defined as follows:

(1.4) Modeling and debugging:

A series of model variables is initialized and a model is fitted on the current variables. The model result of the fitted model comprises the feature variable name, variable meaning, variable value and percentage score. It is then judged whether the fitted model is the optimal model. If it is, the final model and variables of the enterprise public information model are obtained. If it is not, some variables are added to or deleted from the model, a model is re-fitted on the resulting variables, and the judgment is repeated until the optimal model is found and the final model and variables of the enterprise public information model are obtained.

Initializing model variables, including:

① Removing variables that are obviously useless for modeling

Variables in the original enterprise public information data that obviously contribute nothing to modeling or have no business meaning are removed manually, for example the unified social credit code, registration authority, legal representative, licensed business items and business scope.

② Removing variables whose information value is too low or whose repetition value ratio is too high

For example, when modeling and debugging a certain model, Table 4 below shows that the information value (info_value) of the feature variable "business name" is less than 0.02, so the variable is removed because its information value is too low. The repetition value ratio (identification_rate) of the number of work copyrights, the number of judicial auctions, the number of dishonest debtor records, the number of tax arrears and intellectual property records, the number of administrative penalties, and the number of judgment documents in the last 2 years is greater than 0.95, so these variables are removed because their repetition value ratio is too high.

TABLE 4 data modeling characteristic variable information value IV, repeat value ratio IR

③ Removing variables with high Pearson correlation

For example, the Pearson correlation coefficient between the variables "enterprise annual report" and "years since establishment" is 0.813. The "enterprise annual report" variable is removed during modeling, and only the "years since establishment" variable is retained.
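The screening in ② and ③ can be sketched in plain Python. This is an illustration under stated assumptions: the function names are hypothetical, the repetition value ratio is taken to be the share of the single most frequent value, the correlation cutoff `corr_max=0.8` is an assumed threshold (the text gives only the example value 0.813), and keeping the higher-IV variable of a correlated pair is an assumed tie-break, whereas the text describes a manual choice.

```python
from collections import Counter
from math import sqrt


def identical_rate(values):
    """Share of observations taken by the single most frequent value."""
    return Counter(values).most_common(1)[0][1] / len(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def screen_variables(columns, iv, iv_min=0.02, identical_max=0.95, corr_max=0.8):
    """Return the names of variables that survive the screening steps.

    columns maps a variable name to its list of values;
    iv maps a variable name to its precomputed information value.
    """
    # Step 2: drop variables with too little information or too many repeats.
    kept = [
        name for name, values in columns.items()
        if iv[name] >= iv_min and identical_rate(values) <= identical_max
    ]
    # Step 3: visit variables from highest to lowest IV and keep a variable
    # only if it is not highly correlated with one already selected.
    kept.sort(key=lambda name: iv[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) <= corr_max for s in selected):
            selected.append(name)
    return selected
```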

The sample data is split into a training set and a validation set at a ratio of 7:3; when splitting, the proportion of bad samples in the training set and in the validation set is kept consistent with the proportion of bad samples in the full data. The KS curves and ROC curves of the training set and the validation set are obtained, and the model that performs best on them is taken as the optimal statistical model for quantitative analysis.

The model score of each sample is calculated from the model result of the optimal statistical model and the feature variable values of each sample, which yields the sample score distribution of the optimal statistical model. The sample score distribution reflects the model's discriminating ability, its stability and its possible bias on different samples. Based on the actual application scenario of the model, it is judged whether the model scores can best distinguish good samples from bad samples, for example whether bad samples are concentrated in the low score segment. If good and bad samples are best distinguished, the model is judged to be the optimal model, and the final model and variables of the enterprise public information model are obtained; otherwise the model is judged not to be the optimal model.

When the model is judged to be not the optimal model, adding some variables into the model or deleting some variables, then re-fitting a model based on a current series of variables, and continuously judging whether the re-fitted model is the optimal model according to the method until the optimal model is found, and obtaining a final model and variables of the enterprise public information model.

(1.5) Linear score conversion

(1.6) Model results presentation

TABLE 5 Enterprise public information model

(2) Enterprise credit information modeling

The model is layered and key variables are preliminarily screened by the analytic hierarchy process. The business attributes, correlation, data coverage and other properties of the variables are considered together, and scores are assigned by the expert scoring method. The enterprise credit information is modeled in this way to obtain an enterprise credit information model;

(2.2) Feature analysis

(2.3) Evidence Weight (WOE) analysis

WOE for each category is defined as follows:

(2.4) Scoring design

(2.4.1) Overall sample score distribution

(2.4.2) Score distribution of good and bad samples

(2.5) Model results presentation

TABLE 6 Enterprise credit information model

(3) Credit data fusion scoring

The obtained overall score of the enterprise public information model and the enterprise credit information score are combined according to the agreed weight rule and output to the business application system through an interface; the business application system displays the final fusion scoring result and the key feature variables.

In an actual business scenario, an enterprise public information score can be calculated for every enterprise information subject, but an enterprise credit information score may not be calculable because credit information is missing. Once the enterprise public information score and the enterprise credit information score have been processed and calculated in the system, the fusion score is calculated at a weight ratio of 4:6 (public information score to credit information score).

For example, the enterprise public information model score is 80, the credit information model score is 60, and the credit information data fusion score is 80×0.4+60×0.6=68.

For example, the enterprise public information model score is 70, and the credit information data fusion score is 70 when no credit information data exists.
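The two examples above can be reproduced with a short sketch of the weight rule. The function name `fusion_score` and the use of `None` for a missing credit information score are assumptions made for illustration.

```python
from typing import Optional


def fusion_score(public_score: float, credit_score: Optional[float]) -> float:
    """Fuse the two scores at a 4:6 weight ratio.

    When no credit information score is available (None), the fusion
    score falls back to the public information score.
    """
    if credit_score is None:
        return float(public_score)
    # Integer weights avoid floating-point noise from 0.4 and 0.6.
    return (4 * public_score + 6 * credit_score) / 10
```

With these definitions, `fusion_score(80, 60)` gives 68.0 and `fusion_score(70, None)` gives 70.0, matching the two worked examples.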

The credit information data fusion scoring interface output content comprises an enterprise name, a unified social credit code, a credit information fusion score, an enterprise public information score, an enterprise credit information score, a value and a score of each variable of the enterprise public information model, and a value and a score of each variable of the enterprise credit information model.

Although the invention has been described in terms of the preferred embodiment, it is not intended to limit the scope of the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry, characterized by comprising the following steps:

(1) Enterprise public information modeling:

Based on the identification information of the enterprise information subject, retrieve the enterprise basic information through the enterprise public information query API, and store the retrieved enterprise basic information in the business system memory, wherein the identification information of the enterprise information subject includes but is not limited to the enterprise name and/or unified social credit code;

Define the modeling target according to the project application requirements, use the logistic regression model as the core modeling technology, use the retrieved enterprise basic information data as sample data, model the enterprise public information, and obtain the enterprise public information model. The model results of the enterprise public information model include variable name, variable meaning, variable value, and percentage score;

Through the corresponding values and percentage scores of each variable in the enterprise public information model, the overall score of the enterprise public information model is finally obtained;

The corresponding values and percentage scores of each variable in the enterprise public information model are defined in the following table:

(2) Enterprise credit information modeling

The enterprise credit information is derived from the enterprise credit report, including: credit reminder information, loan transaction summary information, guarantee transaction summary information, loan account information, etc.;

The model is layered and key variables are preliminarily screened by the analytic hierarchy process. The business attributes, correlation, data coverage and other properties of the variables are considered together, and scores are assigned by the expert scoring method. The enterprise credit information is modeled to obtain the enterprise credit information model. The model results of the enterprise credit information model include variable name, variable meaning, variable value, and percentage score;

Based on the identification information of the enterprise information subject, check whether the enterprise information subject has a credit report. If there is a credit report, take the record with the latest report date in the database and determine whether the "Year of the First Credit Transaction" field in the "Credit Prompt Information Unit" section of the credit report is empty. If the field is not empty, calculate the enterprise credit information score according to the enterprise credit information model; if the field is empty, calculation of the enterprise credit information score is not supported and the score is treated as empty. If there is no credit report, calculation of the enterprise credit information score is not supported and the score is treated as empty;
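The eligibility check in this step might look like the following sketch. The record layout is hypothetical: each credit report is assumed to be a dict with a `report_date` string in YYYY-MM-DD form and a `first_credit_transaction_year` field, and `score_fn` stands in for the enterprise credit information model.

```python
from typing import Callable, Optional


def enterprise_credit_score(
    reports: list, score_fn: Callable[[dict], float]
) -> Optional[float]:
    """Return the credit information score, or None when it cannot be calculated."""
    if not reports:
        # No credit report: the score is treated as empty.
        return None
    # ISO-formatted dates sort correctly as strings, so max() finds the latest.
    latest = max(reports, key=lambda r: r["report_date"])
    first_year = latest.get("first_credit_transaction_year")
    if first_year is None or str(first_year).strip() == "":
        # Field is empty: the score is treated as empty.
        return None
    return score_fn(latest)
```

Returning `None` for both failure cases lets the fusion step apply a single fallback rule.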

The corresponding values and percentage scores of each variable in the enterprise credit information model are defined in the following table:

(3) Credit data integration scoring

The overall score of the enterprise public information model obtained in step (1) and the enterprise credit information score obtained in step (2) are output to the business application system through the credit data fusion scoring interface according to the agreed weight rules, and the business application system displays the final fusion scoring results and key feature variables;

The output content of the credit data fusion scoring interface includes but is not limited to: enterprise name, unified social credit code, credit fusion score, enterprise public information score, enterprise credit information score, value and score of each variable of the enterprise public information model, and value and score of each variable of the enterprise credit information model.

2. The scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry as claimed in claim 1, characterized in that modeling the enterprise public information to obtain the enterprise public information model, and obtaining the overall score of the enterprise public information model from the corresponding value and score of each variable of the model, specifically includes:

(1.1) Data cleaning and analysis:

Perform data cleaning and calculation on the retrieved enterprise basic information data to obtain characteristic variables in the enterprise public information model. The data cleaning mainly includes: removing duplicate data, removing logical conflict data, completing some single variable calculations, processing noise data, abnormal values and outliers, and processing missing values;

(1.2) Characteristic variable analysis:

Conduct statistical characteristics and distribution analysis on the characteristic variables in the obtained enterprise public information model, check extreme values and process them;

Organize the results of feature variable analysis into a feature variable table, recording the feature variable name, calculation logic, data coverage, and basic data distribution information;

(1.3) Weight of Evidence (WOE) Analysis:

The logistic regression model is transformed into a standard scorecard format through WOE transformation to obtain the variable values of the feature variables;

First, all feature variables are automatically binned. Then, the reliability of the automatic binning results is manually checked to see whether they meet business requirements and are interpretable. Then, it is determined whether manual binning is needed.

The WOE for each category is defined as follows:

The columns Bad Distribution and Good Distribution represent the distribution of "bad customers" and "good customers" in each category, respectively. They are obtained by dividing the frequency count in each category by the total number of "bad customers" or "good customers".

If the ratio in the brackets is less than 1, the WOE is negative, otherwise the WOE is positive;
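The definition formula is not reproduced in this text. The sketch below assumes one widely used convention, WOE = ln(Good Distribution / Bad Distribution), which is consistent with the sign rule stated above; the direction of the ratio and the function names are assumptions. It also computes the information value used for variable screening as the sum of (Good Distribution - Bad Distribution) × WOE over categories.

```python
from math import log


def woe_table(bins):
    """Compute WOE per category.

    bins maps a category name to (good_count, bad_count). Every category
    is assumed to contain at least one good and one bad customer.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    table = {}
    for name, (good, bad) in bins.items():
        good_dist = good / total_good  # Good Distribution column
        bad_dist = bad / total_bad     # Bad Distribution column
        woe = log(good_dist / bad_dist)
        table[name] = {
            "good_dist": good_dist,
            "bad_dist": bad_dist,
            "woe": woe,
            "iv": (good_dist - bad_dist) * woe,
        }
    return table


def information_value(bins):
    """Sum the per-category IV contributions of one variable."""
    return sum(row["iv"] for row in woe_table(bins).values())
```

Under this convention a category with relatively more good customers than bad customers gets a positive WOE.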

(1.4) Modeling and debugging:

Initialize a series of model variables and fit a model based on the current series of variables, the model results of the fitted model including the feature variable name, variable meaning, variable value, and percentage score; then determine whether the fitted model is the optimal model;

If it is judged to be the optimal model, the final model and variables of the enterprise public information model are obtained;

If it is judged not to be the optimal model, some variables are added to the model or some variables are deleted, a new model is fitted based on the current series of variables, and it is determined whether the refitted model is the optimal model; this is repeated until the optimal model is found and the final model and variables of the enterprise public information model are obtained;

(1.5) Linear score conversion:

The score is linearly converted to 0-100 points, and the distribution characteristics remain unchanged. The conversion formula is:
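The conversion formula is not reproduced in this text. As a hedged illustration only, one linear mapping that brings scores onto a 0-100 scale while preserving the shape of the distribution is min-max scaling; the function below is an assumption and may differ from the formula actually used.

```python
def to_percentage_scores(raw_scores):
    """Linearly rescale raw model scores to the range 0-100 (min-max scaling)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        raise ValueError("all raw scores are identical; cannot rescale")
    # A linear map keeps the ordering and relative spacing of the scores.
    return [100 * (s - lo) / (hi - lo) for s in raw_scores]
```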

(1.6) Model Results Presentation

The final model input variables, variable values and percentage scores of each variable of the enterprise public information model are displayed on the business system.

3. The method for fusion scoring modeling based on credit data of financial leasing industry enterprises as claimed in claim 2 is characterized in that: the data cleaning analysis (1.1) specifically includes:

(A) General data cleaning rules, the specific processing methods are as follows:

(A1) Date fields: displayed in the format of YYYY-MM-DD;

(A2) Amount fields: All amounts are converted into numerical format and expressed in units of RMB 10,000;

(A3) Ratio fields: All ratios are converted into numerical format, percentage signs are removed, and a leading zero is added if none precedes the decimal point;

(A4) Data duplication: For the same event, there may be multiple duplicate records in the data table. Data deduplication mainly uses the key "company name + unique event identifier" for identification;
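The general rules (A1) to (A4) can be sketched as follows. The input formats are assumptions: the accepted date layouts, amounts arriving as yuan strings with thousands separators, and the hypothetical record fields `company_name` and `event_id` are all chosen for illustration.

```python
from datetime import datetime


def normalize_date(text: str) -> str:
    """(A1) Return the date in YYYY-MM-DD form; accepted input layouts are assumed."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d"):
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_amount(yuan_text: str) -> float:
    """(A2) Convert an amount in yuan to a number in units of RMB 10,000."""
    return float(yuan_text.replace(",", "").strip()) / 10000


def normalize_ratio(text: str) -> float:
    """(A3) Drop the percent sign; float() supplies the leading zero for '.5'."""
    return float(text.strip().rstrip("%"))


def deduplicate(records, key_fields=("company_name", "event_id")):
    """(A4) Keep the first record seen for each company + event key."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

Note that `normalize_ratio` only removes the percent sign, as the rule states, and does not divide by 100.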

(B) Cleaning rules for specific data. The specific processing methods are as follows:

(B1) Enterprise registration: If the enterprise registration date is empty, the date is corrected with the enterprise operation start date; delete the observations with empty enterprise registration date; delete the observations with empty enterprise operation start date; delete the observations with business status of cancellation or revocation, but the cancellation date or revocation date is not empty; delete the observations with business end date not empty, but business end date < business start date; delete the observations with cancellation date not empty, but cancellation date < enterprise registration date; delete the observations with revocation date not empty, but revocation date < enterprise registration date;

(B2) Main personnel: merge the job titles of the same company and the same person; if a person has multiple job titles and they are written on two lines, merge them into one line;

(B3) Information on the person subject to enforcement: Delete the observations where the filing date is not empty but the filing date is less than the enterprise registration date;

(B4) Court Announcement: Delete the observations where the filing date is not empty but the filing date is less than the enterprise registration date;

(B5) Dishonest debtor: Delete the observations where the publication date is not empty but the publication date is less than the enterprise registration date;

(B6) Operational anomalies: Delete observations where the inclusion date is not empty but the inclusion date is less than the enterprise registration date; delete observations where the removal date is less than the inclusion date;

(B7) Administrative penalties: Delete observations where the penalty decision date is not empty, but the penalty decision date is less than the enterprise registration date; if the penalty decision date is empty, replace it with the public announcement date;

(B8) Administrative license: Delete the observations where the license start date is not empty but the license start date is less than the enterprise registration date; delete the observations where the license end date is not empty but the license end date is less than the license start date;

(B9) Spot check: delete the observations where the spot check date is not empty but the spot check date is less than the enterprise registration date;

(B10) Industrial and Commercial Change: Delete the observations where the Industrial and Commercial Change-Change Date is not empty, but the Industrial and Commercial Change-Change Date is <= Enterprise Registration Date;

(B11) Outward investment: Delete the observations where the opening date is not empty but the opening date is less than the enterprise registration date;

(B12) Branches: Delete observations where the establishment date is not empty but the establishment date is less than the enterprise registration date;

(B13) Equity pledge: Delete the observations where the equity pledge registration date is not empty, but the equity pledge registration date is less than the enterprise registration date;

(B14) Movable Property Mortgage: Delete the observations where the registration date is not empty but the registration date is less than the enterprise registration date.
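Most of the rules (B3) to (B14) share one pattern: drop a record when its event date is present but earlier than the enterprise registration date. A minimal sketch of that shared check follows; the function name, the dict-based records and the ISO date strings are assumptions. Rule (B10) uses "<=" rather than "<", which the `allow_equal` flag covers.

```python
def drop_events_before_registration(
    records, registration_date, date_field, allow_equal=True
):
    """Keep records whose event date is empty or not earlier than registration.

    Dates are assumed to be YYYY-MM-DD strings, which compare correctly
    as plain strings. Set allow_equal=False for rules that also drop
    records dated exactly on the registration date.
    """
    kept = []
    for rec in records:
        event_date = rec.get(date_field)
        if event_date:
            too_early = event_date < registration_date or (
                not allow_equal and event_date == registration_date
            )
            if too_early:
                continue
        kept.append(rec)
    return kept
```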

4. The scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry as claimed in claim 2 is characterized in that: in the modeling and debugging of the step (1.4), the judgment of whether the fitted model is the optimal model specifically includes:

The sample data is split into a training set and a validation set with a split ratio of 7:3. When splitting, the proportion of bad samples in the training set and validation set must be consistent with the proportion of bad samples in the full data.
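A stratified 7:3 split of this kind can be sketched without external libraries. The sample layout (dicts with a 0/1 `bad` field), the function name and the fixed random seed are assumptions for illustration.

```python
import random


def stratified_split(samples, label_key="bad", train_ratio=0.7, seed=42):
    """Split samples into training and validation sets, stratified by label.

    Good and bad samples are shuffled and cut separately, so both sets
    keep approximately the bad-sample proportion of the full data.
    """
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [s for s in samples if s[label_key] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid
```

Splitting each class separately is what keeps the bad-sample proportion aligned; a single shuffle over all samples would only match it on average.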

Obtain the KS curves and ROC curves of the training set and the validation set, and take the model that performs best on them as the optimal statistical model for quantitative analysis;
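The KS curve is commonly summarized by its maximum, the KS statistic: the largest gap between the cumulative distributions of bad and good samples over the score range. The sketch below computes that statistic in plain Python; the function name and the 0/1 label encoding (1 for "bad") are assumptions.

```python
def ks_statistic(scores, labels):
    """Return the maximum gap between cumulative bad and good distributions.

    labels holds 1 for a bad sample and 0 for a good sample; both classes
    must be present.
    """
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume all samples sharing one score before measuring the gap.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            cum_bad += pairs[j][1]
            cum_good += 1 - pairs[j][1]
            j += 1
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
        i = j
    return ks
```

Comparing the value on the training set with the value on the validation set gives a quick check on stability: a large drop on the validation set suggests overfitting.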

According to the model results of the optimal statistical model obtained and the characteristic variable values of each sample data, the model score of each sample is calculated to obtain the sample score distribution of the optimal statistical model;

The sample score distribution reflects the optimal statistical model's ability to distinguish different samples, stability, and possible deviations. Based on the actual application scenario of the model, it is determined whether the sample model score can best distinguish between good and bad samples.

If the state of good and bad samples can be best distinguished, it is judged as the optimal model. If the state of good and bad samples cannot be best distinguished, it is judged as not the optimal model.

5. The method for fusion scoring modeling based on credit data of enterprises in the financial leasing industry as claimed in claim 2 is characterized in that: in the modeling and debugging of the step (1.4), the initialization of a series of model variables includes:

Remove variables that are obviously not useful for modeling or have no business significance;

Remove variables with low information values and high proportion of duplicate values; and

Remove all but one of the two or more variables with high Pearson correlations.

6. The scoring modeling method based on the fusion credit data of financial leasing industry enterprises according to claim 1 is characterized by:

In the aforementioned (2) enterprise credit information modeling, the aforementioned modeling of enterprise credit information to obtain an enterprise credit information model specifically includes:

(2.1) Obtain characteristic variables in the enterprise credit information model through data cleaning and calculation

Clean the enterprise credit information data, select important information dimensions, and then process and calculate the variables. If there are multiple records of credit reports being queried for the same enterprise in the credit information data table, sort them in reverse order by the time the credit reports were generated, and take the data of the latest credit report as the modeling sample data;

Re-examine and verify the retrieved enterprise credit information data through data cleaning to find and correct errors in the data files and reduce the impact of erroneous data on model performance. The data cleaning mainly includes: removing duplicate data, removing logical conflict data, completing some single variable calculations, processing noise data, abnormal values and outliers, and processing missing values;

(2.2) Feature analysis

Conduct statistical characteristic analysis and distribution analysis on characteristic variables in the enterprise credit information model, check extreme values and process them;

The results of feature analysis are compiled into a feature variable table, which records the feature variable name, calculation logic, data coverage, and basic data distribution information;

(2.3) Weight of Evidence (WOE) Analysis

The variable values of the characteristic variables are obtained through WOE transformation;

The WOE for each category is defined as follows:

(2.4) Scoring design

(2.4.1) Overall sample score distribution

The enterprise credit information model is built according to the expert scoring rules. The score of each sample is obtained according to the scoring range and scoring rules of each variable. The overall sample score distribution analysis helps to identify whether the sample scores are concentrated, dispersed or have outliers. According to the score results, the model variables, variable values, and scores are adjusted in combination with the application scenarios of the financial leasing industry.

(2.4.2) Distribution of good and bad sample scores

The modeling goal is to define customers as "good" customers or "bad" customers based on the customer management classification of the actual project user;

Good and bad samples are distinguished based on the “bad definition” label;

Because each sample has a model score, score distributions are produced separately for good and bad samples to check whether the score can distinguish them, that is, whether good samples are concentrated in the high score segments and bad samples in the low score segments. According to the score results, the model variables, variable values, and scores are adjusted in combination with the application scenarios of the financial leasing industry.

Step (2.3) WOE analysis and step (2.4) score design are tuned multiple times to achieve a state where the sample score can best distinguish good and bad samples, and the final model and variables of the enterprise credit information model are obtained;

(2.5) Model Results Presentation

The final model input variables, variable values and percentage scores of each variable in the enterprise credit information model are displayed on the business system.

7. The credit data fusion scoring modeling method based on the financial leasing industry as claimed in claim 1 is characterized in that: in the (3) credit data fusion scoring, the agreed weight rule is:

When the enterprise credit information score is not empty, the credit data fusion score is calculated based on the weight ratio of the enterprise public information model score to the enterprise credit information score of 4:6;

When the enterprise credit information score is empty, the enterprise credit data fusion score is consistent with its enterprise public information model score.