
CN119761901A - A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry - Google Patents


Info

Publication number
CN119761901A
Authority
CN
China
Prior art keywords
enterprise
model
information
data
date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411842297.6A
Other languages
Chinese (zh)
Inventor
崔涛
王光
张旿
张奇宝
赵坤
赵俊
钱嘉楠
李嘉和
徐国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Power Investment Ronghe Financial Leasing Co ltd
Shanghai Credit Reporting Co ltd
Original Assignee
China Power Investment Ronghe Financial Leasing Co ltd
Shanghai Credit Reporting Co ltd
Application filed by China Power Investment Ronghe Financial Leasing Co ltd and Shanghai Credit Reporting Co ltd


Landscapes

  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention provides a method for modeling an enterprise credit data fusion score for the financial leasing industry. The method comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring. An enterprise public information model and an enterprise credit information model are built separately to obtain an enterprise public information score and an enterprise credit information score; these are combined into an enterprise credit data fusion score according to an agreed weighting rule, and a business application system displays the final fusion score together with the key feature variables.

Description

Enterprise credit data fusion scoring modeling method for the financial leasing industry
Technical Field
The invention relates to the field of computer technology, and in particular to an enterprise credit data fusion scoring modeling method for the financial leasing industry.
Background
In recent years, credit reporting data has become an important factor of production and plays an important role in scenarios such as enterprise credit evaluation and government supervision. Non-bank financial institutions such as financial leasing companies, commercial factoring companies, and microloan companies commonly face pain points in acquiring, processing, and applying credit reporting data, and their requirements for data security, compliance, and timeliness urgently need to be met by an enterprise credit data fusion scoring modeling method. On the one hand, public government information on enterprises, such as business registration, judicial, tax, and intellectual property information, is supplied by many different data service providers in non-uniform formats, and much of its text content is varied and requires professional interpretation and analysis. On the other hand, the use of enterprise credit information is subject to strict authorization requirements, and the complicated approval process reduces the timeliness of the data, making it difficult to realize the full value of the data and complicating the analysis of the credit behavior of the enterprise information subject.
Disclosure of Invention
The invention provides a method for enterprise credit data fusion score modeling in the financial leasing industry. Big data software is used to read large volumes of information quickly and to mine the underlying relationships in the data; an objective statistical model is then built, and the model score is used to predict enterprise risk scientifically, comprehensively improving the application value and usage efficiency of the data. The invention provides a solution covering data preprocessing, model design and evaluation, multi-model fusion, and system integration and application for enterprise credit data modeling.
In this method, enterprise public information refers to public government information from business registration, judicial, tax, intellectual property, and similar sources. This information has many data items and is scattered across sources, so users must log in to several data platforms to obtain the data they need; as a result, efficiency is low, the data service chain is long, and operation is complex.
In this method, enterprise credit information refers to information derived from enterprise credit reports, including basic enterprise information, repayment performance information, guarantee information, and so on. The use of enterprise credit information is strictly regulated: within a company, enterprise credit report information may be used only after authorization has been obtained from the information subject and under strict standardized management, the data-use approval process is complex, and timeliness needs to be improved. Enterprise credit reports are generally provided in PDF format and their content is highly specialized, so the data structure must first be parsed and the content must then be interpreted and analyzed professionally.
The enterprise credit data fusion scoring modeling method for the financial leasing industry is applied to the modeling of enterprise public information and enterprise credit information for enterprise credit risk assessment, and covers data preprocessing, model design and evaluation, multi-model fusion, system integration and application, and related technical areas.
The aim of the invention is realized by the following technical scheme:
The enterprise credit data fusion scoring modeling method for the financial leasing industry comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring.
(1) Modeling enterprise public information:
Based on the identification information of the enterprise information subject, which includes but is not limited to the enterprise name and/or the unified social credit code, basic enterprise information is retrieved through an enterprise public information query API and stored in the memory of the business system;
A modeling target is defined according to the project's application requirements. Using logistic regression as the core modeling technique and the retrieved basic enterprise information as sample data, the enterprise public information is modeled to obtain an enterprise public information model. The model result of the enterprise public information model comprises the variable name, variable meaning, variable value, and a score on a 100-point scale;
The overall score of the enterprise public information model is obtained from the value and 100-point score of each variable in the model.
The value and 100-point score of each variable in the enterprise public information model are defined in the following table:
Specifically, modeling the enterprise public information to obtain the enterprise public information model, and finally obtaining the overall score of the model from the value and score of each variable, comprises the following steps:
(1.1) Data cleaning and analysis:
The retrieved basic enterprise information is cleaned and processed to obtain the characteristic variables of the enterprise public information model. Data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values.
(1.2) Feature variable analysis:
Statistical characteristics and distributions of the characteristic variables in the enterprise public information model are analyzed, and extreme values are identified and handled;
The results of the feature variable analysis are compiled into a feature variable table recording, for each variable, its name, calculation logic, data coverage, and basic distribution.
(1.3) Weight of evidence (WOE) analysis:
The WOE transformation converts the logistic regression model into the standard scorecard format and yields the value of each characteristic variable;
All characteristic variables are first binned automatically. The automatic binning results are then checked manually for reliability, conformity with business requirements, and interpretability, and it is decided whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution are the distributions of "bad clients" and "good clients" across the categories: the count in each category divided by the total number of "bad clients" or "good clients", respectively;
If the ratio in parentheses is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
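The WOE formula itself is not reproduced above. The following Python sketch assumes the common form WOE = ln(Bad Distribution / Good Distribution), which matches the column order named in the text and is consistent with the sign rule just stated; some references use the inverse ratio, which only flips the sign. The function name, bin labels, and counts are illustrative toy data, not data from this application.

```python
import math


def woe_table(bins):
    """Compute WOE per bin.

    bins maps a bin label to (bad_count, good_count). Assumes
    WOE = ln(Bad Distribution / Good Distribution).
    """
    total_bad = sum(bad for bad, _ in bins.values())
    total_good = sum(good for _, good in bins.values())
    woe = {}
    for label, (bad, good) in bins.items():
        bad_dist = bad / total_bad      # share of all "bad clients" in this bin
        good_dist = good / total_good   # share of all "good clients" in this bin
        if bad_dist == 0 or good_dist == 0:
            # WOE is undefined here; such bins are normally merged with a
            # neighbour during the manual binning review.
            raise ValueError(f"bin {label!r} has no bad or no good clients")
        woe[label] = math.log(bad_dist / good_dist)
    return woe


# Toy example: enterprise age bands (illustrative counts only).
example = woe_table({"<3y": (30, 70), "3-10y": (15, 135), ">10y": (5, 245)})
```

In the toy data the youngest band holds a larger share of bad clients than of good clients, so its WOE is positive; the middle band has equal shares and a WOE of zero.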
(1.4) Modeling and debugging:
An initial set of model variables is selected and a model is fitted on the current set of variables. The result of the fitted model comprises the characteristic variable name, variable meaning, variable value, and 100-point score. It is then judged whether the fitted model is the optimal model;
If it is, the final model and variables of the enterprise public information model are obtained;
If it is not, variables are added to or removed from the model, a model is refitted on the current set of variables, and the refitted model is judged again. This is repeated until the optimal model is found, yielding the final model and variables of the enterprise public information model.
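The add-or-remove loop above can be sketched as a greedy stepwise search. This is only one possible reading of the procedure: the function `stepwise_select` and the `evaluate` callback are hypothetical. In practice `evaluate` would fit the logistic regression on the given variables and return a validation metric such as KS or AUC.

```python
from typing import Callable, FrozenSet, Iterable


def stepwise_select(candidates: Iterable[str],
                    initial: Iterable[str],
                    evaluate: Callable[[FrozenSet[str]], float],
                    max_rounds: int = 50) -> FrozenSet[str]:
    """Greedy add/remove search; stops when no single change improves the metric."""
    pool = set(candidates)
    current = frozenset(initial)
    best = evaluate(current)
    for _ in range(max_rounds):
        # Candidate moves: add one unused variable, or drop one current variable.
        moves = [current | {v} for v in pool - current]
        moves += [current - {v} for v in current if len(current) > 1]
        scored = [(evaluate(m), m) for m in moves]
        if not scored:
            break
        score, subset = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no single step helps: treat the current set as "optimal"
        best, current = score, subset
    return current
```

A greedy search can stop at a local optimum, which is one reason the text pairs the quantitative criteria with a manual business review.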
(1.5) Linear score conversion:
The scorecard score is linearly converted to a 0-100 scale, which leaves the shape of the score distribution unchanged. The conversion formula is as follows:
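The conversion formula is not given in this text. Below is a minimal sketch that assumes plain min-max scaling onto 0-100. This is an affine map, so the order and relative spacing of the scores are preserved. The function name and the `lo` and `hi` parameters for fixing the bounds are hypothetical.

```python
def to_hundred_scale(scores, lo=None, hi=None):
    """Map raw scorecard scores linearly onto 0-100 (assumed min-max form)."""
    lo = min(scores) if lo is None else lo
    hi = max(scores) if hi is None else hi
    if hi == lo:
        raise ValueError("hi and lo are equal; the linear map is undefined")
    scaled = []
    for s in scores:
        # score_100 = 100 * (s - lo) / (hi - lo); clip scores outside [lo, hi]
        value = 100 * (s - lo) / (hi - lo)
        scaled.append(min(100.0, max(0.0, value)))
    return scaled
```

Fixing `lo` and `hi` on the development sample keeps scores comparable across batches. Values outside those bounds are clipped, and clipping is the one place where the map stops being strictly linear.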
(1.6) Presentation of model results
The final modeling variables of the enterprise public information model, together with each variable's value and 100-point score, are displayed in the business system.
The data cleaning of step (1.1) comprises general data cleaning rules and specific data cleaning rules.
(A) General data cleaning rules:
(A1) Date fields: displayed uniformly in YYYY-MM-DD format;
(A2) Amount fields: all amounts are converted to numeric format, expressed in units of 10,000 yuan (RMB);
(A3) Ratio fields: all ratios are converted to numeric format, percent signs are removed, and a 0 is added before the decimal point where missing;
(A4) Duplicate data: the same event may appear as several duplicate records in a data table; deduplication uses the key "company name + unique event identifier" as the main means of identification.
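The general rules (A1) to (A4) can be sketched in plain Python as follows. Several details are illustrative assumptions: the accepted input date formats, that raw amounts are in yuan, that (A3) converts "12.5%" to 0.125, and the field names `company_name` and `event_id`.

```python
from datetime import datetime

# Assumed input formats; extend to whatever the data providers actually send.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%Y年%m月%d日")


def clean_date(raw):
    """(A1) Normalise a date string to YYYY-MM-DD; None if empty or unparseable."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_amount(raw):
    """(A2) Convert an amount assumed to be in yuan to units of 10,000 yuan."""
    text = "" if raw is None else str(raw).replace(",", "").strip()
    return float(text) / 10_000 if text else None


def clean_ratio(raw):
    """(A3) '12.5%' -> 0.125 and '.5' -> 0.5 (assumed reading of the rule)."""
    text = "" if raw is None else str(raw).strip()
    if not text:
        return None
    if text.endswith("%"):
        return float(text[:-1]) / 100
    return float(text)  # float() already accepts '.5' and yields 0.5


def dedupe(records):
    """(A4) Keep the first record per (company name, unique event identifier)."""
    seen, kept = set(), []
    for record in records:
        key = (record["company_name"], record["event_id"])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept
```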
(B) Specific data cleaning rules:
(B1) Enterprise registration: if the enterprise registration date is empty, it is filled with the business start date; observations whose business start date is empty are deleted; observations whose operating status is deregistered or revoked but whose deregistration or revocation date is not empty are deleted; observations whose business expiry date is not empty but earlier than the business start date are deleted; observations whose deregistration date is not empty but earlier than the enterprise registration date are deleted;
(B2) Key personnel: where one person at the same company holds several positions recorded on separate rows, the position titles are merged into a single row;
(B3) Judgment debtor information: observations whose case filing date is not empty but earlier than the enterprise registration date are deleted;
(B4) Court hearing announcements: observations whose case filing time is not empty but earlier than the enterprise registration date are deleted;
(B5) Observations whose release date is not empty but earlier than the enterprise registration date are deleted;
(B6) Abnormal operation: observations whose listing date is not empty but earlier than the enterprise registration date are deleted;
(B7) Administrative penalties: observations whose penalty decision date is not empty but earlier than the enterprise registration date are deleted; if the penalty decision date is empty, the publication date is used instead;
(B8) Administrative licenses: observations whose license start date is not empty but earlier than the enterprise registration date are deleted;
(B9) Spot checks: observations whose spot check date is not empty but earlier than the enterprise registration date are deleted;
(B10) Business changes: observations whose change date is not empty but on or before (<=) the enterprise registration date are deleted;
(B11) Outbound investment: observations whose operation date is not empty but earlier than the enterprise registration date are deleted;
(B12) Branches: observations whose establishment date is not empty but earlier than the enterprise registration date are deleted;
(B13) Equity pledges: observations whose pledge registration date is not empty but earlier than the enterprise registration date are deleted;
(B14) Real estate mortgages: observations whose registration date is not empty but earlier than the enterprise registration date are deleted.
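A sketch of the date-consistency rules follows. The table keys and field names are hypothetical. Rules (B2), (B5), (B11), (B13), and (B14) are omitted for brevity, since (B5), (B11), (B13), and (B14) follow the same pattern as the others. The (B1) clause on deregistered or revoked status is left out because its condition is ambiguous in this text. Checking the substituted publication date under (B7) is also an assumption.

```python
from datetime import date

# Hypothetical mapping: table -> (date field, also drop when equal?).
# Only (B10) business changes uses "<=" per the text; the rest use "<".
DATE_RULES = {
    "judgment_debtor": ("case_filing_date", False),           # (B3)
    "court_announcement": ("case_filing_date", False),        # (B4)
    "abnormal_operation": ("listing_date", False),            # (B6)
    "administrative_penalty": ("decision_date", False),       # (B7)
    "administrative_license": ("license_start_date", False),  # (B8)
    "spot_check": ("check_date", False),                      # (B9)
    "business_change": ("change_date", True),                 # (B10)
    "branch": ("established_date", False),                    # (B12)
}


def filter_by_registration(table, rows, reg_date):
    """Drop rows whose event date is not empty but predates registration."""
    field, drop_equal = DATE_RULES[table]
    kept = []
    for row in rows:
        if table == "administrative_penalty" and row.get(field) is None:
            # (B7): fall back to the publication date (assumed to be checked too)
            row = {**row, field: row.get("publication_date")}
        d = row.get(field)
        conflict = d is not None and (d <= reg_date if drop_equal else d < reg_date)
        if not conflict:
            kept.append(row)
    return kept


def clean_registration(rec):
    """(B1) Return a corrected registration record, or None to delete it."""
    rec = dict(rec)
    if rec.get("registration_date") is None:
        rec["registration_date"] = rec.get("start_date")
    start, expiry = rec.get("start_date"), rec.get("expiry_date")
    dereg = rec.get("deregistration_date")
    if start is None:
        return None
    if expiry is not None and expiry < start:
        return None
    if dereg is not None and dereg < rec["registration_date"]:
        return None
    return rec


example = filter_by_registration(
    "business_change",
    [{"change_date": date(2015, 6, 1)}, {"change_date": date(2016, 1, 1)}],
    date(2015, 6, 1),
)  # the same-day change is dropped because (B10) uses "<="
```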
In the modeling and debugging step (1.4), judging whether the fitted model is the optimal model specifically comprises the following. The sample data is split into a training set and a validation set at a ratio of 7:3, with the split ensuring that the proportion of bad samples in each set matches that of the whole data set. The KS curves and ROC curves of the training and validation sets are obtained, and the model with the best results is taken as the optimal statistical model from the quantitative analysis. The model score of each sample is then calculated from the result of this optimal statistical model and the values of the sample's characteristic variables, giving the sample score distribution of the model. This distribution reflects the model's discriminating power, stability, and possible bias across different samples. Finally, in light of the model's actual application scenario (for example, bad samples concentrated in the low score band and good samples concentrated in the high score band), it is judged whether the sample scores best distinguish good samples from bad samples; the model that does so is the optimal model.
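The stratified 7:3 split and the KS and AUC (area under the ROC curve) measures described above can be sketched as follows. The sketch assumes labels are coded 1 for bad and 0 for good, and that higher scores mean lower risk. The function names and the fixed random seed are illustrative. The pairwise AUC takes O(n^2) time and is meant only for small samples.

```python
import random


def stratified_split(labels, train_ratio=0.7, seed=42):
    """Split indices 7:3 while keeping the bad rate of each part equal to the whole."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (1, 0):  # 1 = bad, 0 = good; split each class separately
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)


def ks_statistic(scores, labels):
    """KS = max gap between cumulative bad and good shares over score cut-offs."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all tied scores before measuring the gap.
        while i < len(pairs) and pairs[i][0] == current:
            cum_bad += pairs[i][1]
            cum_good += 1 - pairs[i][1]
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks


def auc(scores, labels):
    """Area under the ROC curve: P(random good scores above random bad); ties count 1/2."""
    goods = [s for s, y in zip(scores, labels) if y == 0]
    bads = [s for s, y in zip(scores, labels) if y == 1]
    wins = sum(1.0 if g > b else 0.5 if g == b else 0.0 for g in goods for b in bads)
    return wins / (len(goods) * len(bads))
```

Computing both metrics on the training and validation sets and comparing them is a simple check of stability. A large drop on the validation set suggests overfitting.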
In step (1.4), selecting the initial set of model variables comprises removing variables that clearly contribute nothing to the model or have no business meaning, removing variables whose information value is too low or whose proportion of a single repeated value is too high, and, among two or more variables with high Pearson correlation, keeping only one and removing the others.
(2) Enterprise credit information modeling
The enterprise credit information is derived from enterprise credit reports and includes credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and so on;
The model is built in layers: key variables are first screened by the analytic hierarchy process (AHP), taking into account the business attributes, correlation, data coverage, and other properties of each variable, and scores are then assigned by expert scoring. Modeling the enterprise credit information in this way yields an enterprise credit information model whose result comprises the variable name, variable meaning, variable value, and 100-point score;
Based on the identification information of the enterprise information subject, it is queried whether the subject has a credit report. If a credit report exists, the data with the latest report date in the database is used, and it is checked whether the field "year of first credit transaction" in the credit prompt information section of the report is empty. If the field is not empty, the enterprise credit information score is calculated with the enterprise credit information model; if it is empty, calculation of the score is not supported and the score is left empty. If no credit report exists, calculation of the score is likewise not supported and the score is left empty.
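The availability rule above can be combined with the latest-report selection of step (2.1), as in the sketch below. The field names `report_date` and `first_credit_year` are hypothetical, as is the `score_fn` callback that stands in for the enterprise credit information model. Dates are assumed to be YYYY-MM-DD strings, which sort chronologically.

```python
def credit_info_score(reports, score_fn):
    """Return the enterprise credit information score, or None (left empty).

    reports: all credit reports found for the information subject (may be empty);
    each is a dict with hypothetical keys "report_date" (YYYY-MM-DD string) and
    "first_credit_year". score_fn stands in for the credit information model.
    """
    if not reports:
        return None  # no credit report: score not supported, left empty
    # Most recent report only; ISO date strings sort chronologically.
    latest = max(reports, key=lambda r: r["report_date"])
    if latest.get("first_credit_year") is None:
        # "Year of first credit transaction" in the credit prompt section is empty.
        return None
    return score_fn(latest)
```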
The value and 100-point score of each variable in the enterprise credit information model are defined in the following table:
The enterprise credit information is derived from enterprise credit reports; it comes from a single data source and its structured data is standardized, including credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and so on. In step (2), modeling the enterprise credit information to obtain the enterprise credit information model specifically comprises:
(2.1) Obtaining the characteristic variables of the enterprise credit information model through data cleaning and calculation
The enterprise credit information data is cleaned, the important information dimensions are selected, and variables are derived and calculated. If the credit information data table contains records of several credit report queries, the credit reports are sorted in descending order of generation time and the data of the most recent report is used as the modeling sample;
Data cleaning re-examines and checks the retrieved enterprise credit information data in order to find and correct errors in the data file and reduce the influence of erroneous data on model performance. It mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data and outliers, and handling missing values.
(2.2) Feature analysis
Statistical characteristics and distributions of the characteristic variables in the enterprise credit information model are analyzed, and extreme values are identified and handled;
The results of the feature analysis are compiled into a feature variable table recording, for each variable, its name, calculation logic, data coverage, and basic distribution.
(2.3) Weight of evidence (WOE) analysis
The values of the characteristic variables are obtained through the WOE transformation;
All characteristic variables are first binned automatically. The automatic binning results are then checked manually for reliability, conformity with business requirements, and interpretability, and it is decided whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution are the distributions of "bad clients" and "good clients" across the categories: the count in each category divided by the total number of "bad clients" or "good clients", respectively;
If the ratio in parentheses is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
(2.4) Score design
(2.4.1) Whole-sample score distribution
The enterprise credit information model is built according to expert scoring rules, and the score of each sample is obtained from the scoring intervals and scoring rules of each variable. Analyzing the score distribution of the whole sample helps identify whether sample scores are concentrated or dispersed and whether there are abnormal values; based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scores are readjusted;
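Expert scoring by variable intervals can be sketched with a lookup table of cut-points and points. The variables, cut-points, and point values below are invented for illustration and are not the rules of this application. `score_distribution` is a hypothetical helper that groups scores into fixed-width bands for the whole-sample distribution check.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical expert rules: variable -> (ascending cut-points, points per
# interval), with len(points) == len(cuts) + 1.
RULES = {
    "overdue_accounts": ([1, 3], [40, 20, 0]),           # 0 | 1-2 | 3+
    "years_since_first_credit": ([3, 10], [5, 15, 30]),  # <3 | 3-9 | 10+
}


def expert_score(sample, rules=RULES):
    """Sum the points of the interval each variable's value falls into."""
    total = 0
    for var, (cuts, points) in rules.items():
        # bisect_right returns the index of the interval containing the value
        total += points[bisect_right(cuts, sample[var])]
    return total


def score_distribution(scores, width=10):
    """Count scores per fixed-width band (band start -> count)."""
    return dict(sorted(Counter(width * (s // width) for s in scores).items()))
```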
(2.4.2) Score distribution of good and bad samples
The modeling target defines clients as good or bad according to the client management classification used by the project's actual application party;
Good and bad samples are distinguished by this bad-definition label;
Since every sample has a model score, the score distribution is computed separately for good and bad samples to check whether the two can be distinguished, that is, whether good samples are concentrated in the high score band and bad samples in the low score band. Based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scores are readjusted;
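The good/bad separation check can be sketched as follows, assuming labels coded 1 for bad and 0 for good. The function names, the cut-off, and the band edges are hypothetical. One simple way to confirm that good samples sit in the high band and bad samples in the low band is to check that the bad rate falls as the score band rises.

```python
def band_shares(scores, labels, cut):
    """Share of good samples at or above `cut` and of bad samples below it."""
    goods = [s for s, y in zip(scores, labels) if y == 0]
    bads = [s for s, y in zip(scores, labels) if y == 1]
    good_high = sum(s >= cut for s in goods) / len(goods)
    bad_low = sum(s < cut for s in bads) / len(bads)
    return good_high, bad_low


def bad_rate_by_band(scores, labels, edges):
    """Bad rate per band [edges[i], edges[i+1]); None for an empty band."""
    rates = []
    for lo, hi in zip(edges, edges[1:]):
        ys = [y for s, y in zip(scores, labels) if lo <= s < hi]
        rates.append(sum(ys) / len(ys) if ys else None)
    return rates


def is_monotone_decreasing(rates):
    """True if the non-empty band bad rates never rise as scores rise."""
    values = [r for r in rates if r is not None]
    return all(a >= b for a, b in zip(values, values[1:]))
```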
The WOE analysis of step (2.3) and the score design of step (2.4) are optimized over several rounds until the sample scores best distinguish good samples from bad samples, yielding the final model and variables of the enterprise credit information model;
(2.5) Presentation of model results
The final modeling variables of the enterprise credit information model, together with each variable's value and 100-point score, are displayed in the business system.
(3) Credit data fusion scoring
The overall score of the enterprise public information model from step (1) and the enterprise credit information score from step (2) are combined according to an agreed weighting rule and output to the business application system through a credit data fusion score interface; the business application system displays the final fusion score and the key feature variables;
The output of the credit data fusion score interface includes, but is not limited to, the enterprise name, unified social credit code, credit fusion score, enterprise public information score, enterprise credit information score, and the value and score of each variable of the enterprise public information model and of the enterprise credit information model.
In step (3), the agreed weighting rule may be as follows: when the enterprise credit information score is not empty, the credit data fusion score is calculated with the enterprise public information score and the enterprise credit information score weighted 4:6; when the enterprise credit information score is empty, the credit data fusion score equals the enterprise public information score. The weighting rule may also be modified and adjusted according to business requirements.
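Below is a sketch of the example weighting rule and of the interface output listed above. It assumes the 4:6 ratio assigns 0.4 to the enterprise public information score and 0.6 to the enterprise credit information score. The function names and output key names are hypothetical.

```python
# Assumed reading of the 4:6 rule: public information 0.4, credit information 0.6.
PUBLIC_WEIGHT, CREDIT_WEIGHT = 0.4, 0.6


def fusion_score(public_score, credit_score):
    """Weighted fusion; falls back to the public score when credit is empty (None)."""
    if credit_score is None:
        return public_score
    return PUBLIC_WEIGHT * public_score + CREDIT_WEIGHT * credit_score


def fusion_payload(name, credit_code, public_score, credit_score,
                   public_vars, credit_vars):
    """Interface output; key names are hypothetical.

    public_vars / credit_vars: variable name -> (value, score) for each model.
    """
    return {
        "enterprise_name": name,
        "unified_social_credit_code": credit_code,
        "credit_fusion_score": fusion_score(public_score, credit_score),
        "public_info_score": public_score,
        "credit_info_score": credit_score,
        "public_info_variables": public_vars,
        "credit_info_variables": credit_vars,
    }
```

Keeping the weights as named constants makes it straightforward to adjust the rule to business requirements, as the text allows.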
The invention provides an enterprise credit data fusion score modeling method for the financial leasing industry. First, enterprise public information and enterprise credit information are studied in depth, and the data analysis process and results are described in terms of the data sample summary, data preprocessing rules, characteristic variable analysis, and WOE analysis. Second, a suitable modeling method is selected according to the sample size, characteristic variables, modeling target, and other properties of each data type, and an enterprise public information model and an enterprise credit information model are built, followed by model parameter tuning and score distribution tuning. Finally, the two models are fused into a single score, and an interface service that outputs the overall score is created.
Compared with the prior art, the enterprise credit data fusion scoring modeling method for the financial leasing industry provided by the invention has the following main beneficial effects:
1. A method for modeling enterprise public information from the perspective of a credit reporting agency
This application builds an enterprise public information modeling method from the perspective of a third-party credit reporting agency. Based on analysis of macro-level public information on millions of enterprises, it reviews the credit risk of the enterprise information subject at the level of public society, analyzes the business associations among items of public government information, extracts key feature variables, and uses a logistic regression model to establish the full process for enterprise public information, covering data scope, data preprocessing, and model design and evaluation.
2. A method for modeling enterprise credit information from the perspective of a financial leasing company
This application builds an enterprise credit information modeling method starting from the actual business needs of financial leasing companies. Combined with the post-loan classification and tiered management of clients, it sets the bad-definition label of the modeling data, reviews the basic information, loan and guarantee transaction information, and repayment performance information of the enterprise information subject in the credit report, refines the key feature variables, and builds the enterprise credit information model by the analytic hierarchy process and expert scoring.
3. A credit data fusion scoring method for the financial leasing industry
Enterprise credit information reflects only one aspect of enterprise credit risk, while the value of enterprise public information as alternative data is growing. Modeling credit information alone therefore cannot fully meet the needs of enterprise credit risk assessment, and there is a pressing need to integrate a model of enterprise public information. The method of this application fuses the enterprise public information model with the enterprise credit information model, can meet future scoring modeling needs in different scenarios, and is highly extensible.
Based on the above analysis, the credit data fusion score modeling method for financial leasing enterprises is more novel, practical, and extensible than other score modeling methods on the market.
The invention provides a method for enterprise credit data fusion score modeling in the financial leasing industry. It combines the data advantages of a third-party credit reporting agency with the application scenarios of a financial leasing company, builds an objective statistical model through modeling techniques, predicts enterprise risk scientifically through model scores, and comprehensively improves the application value and usage efficiency of the data.
Social benefits
The enterprise credit data fusion score modeling method for the financial leasing industry comprehensively improves the application value of alternative data. The method combines the behavior records of an enterprise on public dimensions such as business registration, judicial, tax, and intellectual property with its behavior as recorded in its credit information, and designs a reasonable credit risk assessment model that characterizes the credit profile of the information subject comprehensively and accurately. The score profile shows the subject's business operations, dishonesty records, credit and guarantee status, and so on, which raises public credit awareness among enterprises and strengthens the application value of alternative data.
The method effectively improves the standardization and efficiency of credit data modeling. The invention studies data preprocessing, model design and evaluation, multi-model fusion, and system integration and application in depth, offers a reference for other non-bank financial institutions (financial leasing, commercial factoring, microloan, and consumer finance companies, and so on), and to some extent provides an effective data modeling method for the enterprise credit reporting industry. It provides a reliable credit data fusion scoring modeling method for preventing credit risk and improving performance levels, and helps other non-bank financial institutions quickly build scoring models suited to their own business needs.
The method strengthens the construction of a social credit system based on integrity. The invention responds to the national call for standardized credit information management, explores a path toward standardizing commercial (enterprise) credit information, and improves the cross-domain joint incentive mechanism for trustworthiness and the joint punishment mechanism for dishonesty. It constrains enterprise information subjects to operate lawfully and honestly, maintains normal market order, builds an environment of integrity, promotes inclusive finance, gradually raises the risk control level of inclusive financial institutions, and contributes to building an industry risk control system that combines industry self-discipline with social supervision. It can therefore generate significant social benefits and play an important role in creating a good credit environment and establishing a system of integrity.
Economic benefit
The method improves the efficiency of risk-control management in the financial leasing industry. As credit businesses develop and industry competition intensifies, more and more financial leasing institutions recognize that steadily raising their risk-control level is a prerequisite for sound business growth. Data analysis tools are used to mine risk data in depth and identify potential risk patterns and trends; modeling condenses massive data into a simple score, so that risks are identified and predicted through objective statistics. A score monitoring system tracks and assesses ongoing risks in real time. Using scores as a reference in risk control simplifies and optimizes the risk-control workflow, removes unnecessary steps, and improves working efficiency.
The method also reduces transaction costs and improves the operating efficiency of society. With the fused credit score, untrustworthy entities can be tracked, monitored, and restricted in various respects, for example in applying for loans and other financial products or in taking part in commercial activities such as bidding. Untrustworthy entities are thus restricted at every turn, which greatly raises the cost of breaking trust. Trustworthy entities, by contrast, gain more trust and opportunities, such as lower interest rates and faster approvals, which increases the value of keeping trust and may encourage more enterprises to do so. Information asymmetry decreases, transaction costs fall, and the operating efficiency and economic benefit of society improve.
In the method, enterprise public information is government public information from sources such as industry and commerce (business registration), the judiciary, taxation, and intellectual property. This information comprises many data items and is scattered rather than centralized: users must log in to several data platforms to obtain the data they need, which is inefficient, lengthens the data service chain, and complicates operation.
Enterprise credit information, by contrast, is subject to strict use controls: a credit report may be used within the company only with the authorization of the information subject and under strict, standardized management, so the data-use approval process is complex and its timeliness needs improvement. Credit reports are generally in PDF format and the credit information is specialized, so its data structure must be parsed before professional interpretation and analysis.
Detailed Description
Other advantages and features of the invention are shown by the following description of embodiments of the invention, given by way of example and not by way of limitation, with reference to the accompanying drawings.
The enterprise credit data fusion scoring modeling method for the financial leasing industry comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring.
(1) Modeling enterprise public information:
Based on the identification information of the enterprise information subject, basic enterprise information is retrieved through an enterprise public information query API and stored in the business system's memory; the identification information of the enterprise information subject includes, but is not limited to, the enterprise name and/or the unified social credit code;
A modeling target is defined according to the project's application requirements; using logistic regression as the core modeling technique and the retrieved basic enterprise information as sample data, the enterprise public information is modeled to obtain an enterprise public information model;
and the overall score of the enterprise public information model is obtained from the value and score of each of its variables.
In the method, enterprise public information refers to government public information on industry and commerce, judicial matters, taxation, intellectual property, and the like, including but not limited to enterprise registration information, shareholder information, key personnel, business registration changes, enterprise annual reports, administrative licenses, administrative penalties, equity pledges, and outbound investments.
The specific information types and main field items of the enterprise public information data are shown in the following table 1:
TABLE 1 Enterprise public information types and field items
In data modeling, the modeling target is the goal the model is built to achieve. Targets vary but usually serve prediction and decision support; a common target is estimating the probability that a defined "bad" event occurs within a given period.
For enterprise public information modeling, the definition of a bad sample may vary with the situation; common bad-definition schemes and their advantages are compared in Table 2:
TABLE 2 Enterprise public information-modeling goal general purpose scenario comparison
The modeling target is chosen by weighing the project's application scenario against the distribution of the sample data. For greater flexibility, the target may be defined according to the project's application requirements.
For example:
The target variable ("good" vs. "bad") can be defined from the project's actual application requirements and the post-lending management classification of the enterprise client group, for example whether a client is overdue, how severe the overdue status is, or the five-category risk classification.
In this method, clients in the financial leasing practitioner's management classes "N1/N2/N3" are defined as good clients, and clients in classes "A/B/C" are defined as bad clients:
N1 is a normal project;
N2 is a project under construction whose risk status is not yet certain;
N3 is a project with minor operational flaws, for example electricity revenue for the previous month not reported on time;
C is a project where a risk event has occurred that requires intervention by the account manager;
B is a project where a risk event has occurred that requires intervention by the company, for example collection;
A is a project where a risk event has occurred that must be handled through legal action or similar means;
project deterioration increases in the order N1 < N2 < N3 < C < B < A.
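The good/bad definition above can be encoded as a simple lookup. This is a minimal sketch, assuming the management class arrives as a short string such as "N2" and using the common scorecard convention 0 = good, 1 = bad; the function and constant names are hypothetical.

```python
# Hypothetical encoding of the management classes described above.
# 0 = good client, 1 = bad client (an assumed, conventional coding).
GOOD_CLASSES = {"N1", "N2", "N3"}
BAD_CLASSES = {"C", "B", "A"}

# Deterioration order from least to most severe, as listed in the text.
SEVERITY_ORDER = ["N1", "N2", "N3", "C", "B", "A"]


def to_label(management_class: str) -> int:
    """Map a post-lending management class to a binary target label."""
    cls = management_class.strip().upper()
    if cls in GOOD_CLASSES:
        return 0
    if cls in BAD_CLASSES:
        return 1
    raise ValueError(f"unknown management class: {management_class!r}")


def severity(management_class: str) -> int:
    """Position of a class in the deterioration order (0 = best)."""
    return SEVERITY_ORDER.index(management_class.strip().upper())


if __name__ == "__main__":
    print([to_label(c) for c in ["N1", "N3", "C", "A"]])
```

Keeping the severity order alongside the binary label makes it easy to test alternative cut points (for example treating N3 as bad) without changing the rest of the pipeline.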
Specifically, modeling the enterprise public information to obtain the enterprise public information model, and deriving the model's overall score from the value and score of each of its variables, comprises the following steps:
(1.1) Data cleaning and analysis:
The retrieved basic enterprise information is cleaned and processed to obtain the feature variables of the enterprise public information model. Data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values.
Data cleaning applies both general rules and data-specific rules.
(A) General data cleaning rules:
(A1) Date fields are uniformly displayed in YYYY-MM-DD format;
(A2) Amount fields: all amounts are converted to numeric format and expressed in units of 10,000 RMB;
(A3) Ratio fields: all ratios are converted to numeric format with the percent sign removed, and a 0 is added before a leading decimal point;
(A4) Duplicate data: the same event may be recorded several times in a data table, so deduplication mainly uses "company name + event unique identifier" as the key. The deduplication keys for each table are shown in Table 3 below.
TABLE 3 Uniqueness identifiers for enterprise public information records
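General rules (A1) to (A4) can be sketched as small normalization helpers. This is an illustration under stated assumptions: the accepted raw date formats, the treatment of bare numbers as yuan, and the field names `company_name` and the event-identifier field are all hypothetical. For (A3) the sketch only strips the percent sign and pads the leading zero; it does not rescale, because the text does not say whether "35%" should become 35 or 0.35.

```python
from datetime import datetime

# Raw date formats accepted here are assumptions about the source data.
_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%Y年%m月%d日")


def normalize_date(raw):
    """(A1) Render a date as YYYY-MM-DD; empty input stays None."""
    if raw is None or str(raw).strip() == "":
        return None
    text = str(raw).strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def amount_to_wan_yuan(raw):
    """(A2) Convert an RMB amount to units of 10,000 yuan."""
    text = str(raw).replace(",", "").strip()
    if text.endswith("万元"):           # already in units of 10,000 yuan
        return float(text[:-2])
    if text.endswith("元"):
        text = text[:-1]
    return float(text) / 10_000          # assumption: bare numbers are in yuan


def normalize_ratio(raw):
    """(A3) Strip the percent sign and add a 0 before a bare leading decimal point."""
    text = str(raw).strip().rstrip("%").strip()
    if text.startswith("."):
        text = "0" + text
    return float(text)


def deduplicate(records, event_id_field):
    """(A4) Keep the first record per (company name, event identifier) key.

    "company_name" and event_id_field are hypothetical column names.
    """
    seen, unique = set(), []
    for rec in records:
        key = (rec["company_name"], rec[event_id_field])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

The per-table keys from Table 3 would be passed as `event_id_field`, one call per table.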
(B) Data-specific cleaning rules:
(B1) Enterprise registration: if the enterprise registration date is empty, fill it with the operation start date; delete observations whose operation start date is empty; delete observations whose operating status is "deregistered" or "revoked" but whose deregistration or revocation date is not empty; delete observations whose operation expiry date is not empty but earlier than the operation start date; delete observations whose deregistration date is not empty but earlier than the enterprise registration date;
(B2) Key personnel: when the same person at the same company holds several positions recorded in separate rows, merge the position titles into one row;
(B3) Persons subject to enforcement: delete observations whose case filing date is not empty but earlier than the enterprise registration date;
(B4) Court hearing announcements: delete observations whose case filing time is not empty but earlier than the enterprise registration date;
(B5) Delete observations whose release date is not empty but earlier than the enterprise registration date;
(B6) Abnormal operations: delete observations whose listing date is not empty but earlier than the enterprise registration date;
(B7) Administrative penalties: delete observations whose penalty decision date is not empty but earlier than the enterprise registration date; if the penalty decision date is empty, substitute the publication date;
(B8) Administrative licenses: delete observations whose license start date is not empty but earlier than the enterprise registration date;
(B9) Spot checks: delete observations whose spot-check date is not empty but earlier than the enterprise registration date;
(B10) Business registration changes: delete observations whose change date is not empty but on or before the enterprise registration date (<=);
(B11) Outbound investments: delete observations whose operation date is not empty but earlier than the enterprise registration date;
(B12) Branches: delete observations whose establishment date is not empty but earlier than the enterprise registration date;
(B13) Equity pledges: delete observations whose equity pledge registration date is not empty but earlier than the enterprise registration date;
(B14) Real estate mortgages: delete observations whose registration date is not empty but earlier than the enterprise registration date.
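Most of rules (B3) to (B14) share one pattern: keep records with an empty event date, and drop records whose event date precedes the enterprise registration date (or, for B10, falls on or before it). A minimal sketch of that pattern, plus the B7 fallback, assuming dates are already parsed to `datetime.date`; the field names `decision_date` and `publication_date` are hypothetical, and filling before filtering in B7 is an assumed ordering the text does not fix.

```python
from datetime import date


def drop_before_registration(records, date_field, registration_date, inclusive=False):
    """Drop records whose event date is earlier than the enterprise registration date.

    Records with an empty date are kept, matching the "not empty, but ..." wording.
    inclusive=True also drops events on the registration date itself (rule B10 uses <=).
    """
    kept = []
    for rec in records:
        event = rec.get(date_field)
        if event is not None:
            too_early = event <= registration_date if inclusive else event < registration_date
            if too_early:
                continue
        kept.append(rec)
    return kept


def clean_penalties(records, registration_date):
    """(B7) Fill an empty decision date with the publication date, then apply the date check.

    Field names are hypothetical; filling first is an assumed ordering.
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is not mutated
        if rec.get("decision_date") is None:
            rec["decision_date"] = rec.get("publication_date")
        filled.append(rec)
    return drop_before_registration(filled, "decision_date", registration_date)


if __name__ == "__main__":
    reg = date(2015, 6, 1)
    changes = [{"change_date": date(2015, 6, 1)}, {"change_date": date(2016, 1, 1)}]
    print(len(drop_before_registration(changes, "change_date", reg, inclusive=True)))
```

Each table would be cleaned with one call, passing its own date column and, for business changes only, `inclusive=True`.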
(1.2) Feature variable analysis:
The statistical characteristics and distributions of the feature variables in the enterprise public information model are analyzed, and extreme values are checked and handled;
the results of the feature variable analysis are compiled into a feature variable table recording each feature variable's name, calculation logic, data coverage, and basic distribution.
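One row of the feature variable table, and one way of handling extreme values, can be sketched with the standard library. The summary statistics chosen and the 1st/99th-percentile capping rule are assumptions; the text only says extreme values are "checked and handled". `None` marks a missing value.

```python
import statistics


def summarize(name, values, logic=""):
    """One feature-table row: name, calculation logic, coverage, basic distribution."""
    present = [v for v in values if v is not None]
    row = {"name": name, "logic": logic,
           "coverage": len(present) / len(values) if values else 0.0}
    if present:
        row.update(min=min(present), max=max(present),
                   mean=statistics.fmean(present), median=statistics.median(present))
    return row


def cap_extremes(values, low_pct=1, high_pct=99):
    """Clip values to the given percentiles (an assumed rule); missing values stay None."""
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return list(values)
    cuts = statistics.quantiles(present, n=100, method="inclusive")  # 99 cut points
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return [None if v is None else min(max(v, low), high) for v in values]
```

Capping rather than deleting extreme values keeps every sample in the modeling set, which matters when bad samples are scarce.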
(1.3) Weight of Evidence (WOE) analysis:
WOE transformation converts the logistic regression model into a standard scorecard format and yields the values of the feature variables;
all feature variables are first binned automatically; the automatic binning results are then reviewed manually for reliability, fit with business requirements, and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
where Bad Distribution and Good Distribution denote the distribution of "bad clients" and "good clients" in each category, respectively, i.e. the count in each category divided by the total number of "bad clients" or "good clients";
if the ratio in parentheses is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
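The WOE formula itself did not survive in this text, and the text does not say which distribution is the numerator. The sketch below assumes WOE = ln(Bad Distribution / Good Distribution); the other common convention, ln(Good / Bad), only flips the sign. It also computes the information value (IV) used for variable screening in step (1.4). Replacing an empty bin's count with 0.5 is an assumed smoothing choice to avoid log(0); the bin labels are hypothetical.

```python
import math


def _distributions(bins, smoothing=0.5):
    """Yield (label, good_dist, bad_dist); empty cells get a small count to avoid log(0)."""
    total_good = sum(g for _, g, _ in bins)
    total_bad = sum(b for _, _, b in bins)
    for label, n_good, n_bad in bins:
        yield (label,
               max(n_good, smoothing) / total_good,
               max(n_bad, smoothing) / total_bad)


def woe_table(bins):
    """bins: list of (label, n_good, n_bad). Returns {label: WOE}.

    Assumed convention: WOE = ln(Bad Distribution / Good Distribution).
    """
    return {label: math.log(bad / good) for label, good, bad in _distributions(bins)}


def information_value(bins):
    """IV = sum over bins of (Bad Distribution - Good Distribution) * WOE."""
    return sum((bad - good) * math.log(bad / good) for _, good, bad in _distributions(bins))


if __name__ == "__main__":
    bins = [("<3 years", 100, 50), ("3-10 years", 200, 8), (">10 years", 100, 2)]
    print(woe_table(bins), information_value(bins))
```

Under this convention a bin where bad clients are over-represented gets a positive WOE, which matches the sign rule stated above for a ratio greater than 1.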
(1.4) Modeling and debugging:
A set of model variables is initialized and a model is fitted on the current variable set; the fitted model's results include each feature variable's name, meaning, value, and score on a 100-point scale. The model is then checked for optimality. If it is optimal, the final model and variables of the enterprise public information model are obtained; otherwise, variables are added to or removed from the model, a new model is fitted on the updated variable set and checked again, and this repeats until the optimal model is found, yielding the final model and variables of the enterprise public information model.
Initializing model variables, including:
① Removing variables that clearly contribute nothing to modeling
Variables in the raw enterprise public information that clearly contribute nothing to modeling or carry no business meaning, such as the unified social credit code, registration authority, legal representative, licensed business items, and business scope, are removed manually.
② Removing variables with too low an information value or too high a repetition rate
For example, when debugging one model, Table 4 shows that the information value (info_value) of the feature variable "business name" is below 0.02, which is too low, so the variable is removed. The repetition rates (identification_rate) of the number of copyrighted works, number of judicial auctions, number of dishonest judgment-debtor records, number of tax-arrears and intellectual property records, number of administrative penalties, and number of judgment documents in the last 2 years exceed 0.95, which is too high, so these variables are removed.
TABLE 4 Information value (IV) and repetition rate (IR) of modeling feature variables
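The two screening thresholds above (IV below 0.02, repetition rate above 0.95) can be applied mechanically. This sketch assumes the repetition rate means the share of the most frequent value in a column, which the text does not define explicitly; IV values are taken as given inputs, and the variable names in the usage are hypothetical.

```python
from collections import Counter


def identical_rate(values):
    """Share of the most frequent value (None counts as a value); assumed definition of IR."""
    if not values:
        return 0.0
    return Counter(values).most_common(1)[0][1] / len(values)


def screen_variables(iv_by_var, values_by_var, min_iv=0.02, max_identical=0.95):
    """Return (kept, dropped), where dropped maps a variable to the reason it was removed."""
    kept, dropped = [], {}
    for var, iv in iv_by_var.items():
        rate = identical_rate(values_by_var[var])
        if iv < min_iv:
            dropped[var] = f"IV {iv:.3f} < {min_iv}"
        elif rate > max_identical:
            dropped[var] = f"identical rate {rate:.2f} > {max_identical}"
        else:
            kept.append(var)
    return kept, dropped
```

Count variables such as "number of judicial auctions" are mostly zero for typical enterprises, which is why they fail the repetition-rate check even when a non-zero value is informative.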
③ Removing variables with high Pearson correlation
For example, the Pearson correlation coefficient between "enterprise annual reports" and "years since establishment" is 0.813. "Enterprise annual reports" was therefore removed during modeling, and only the "years since establishment" variable was retained.
The sample data are split into a training set and a validation set at a 7:3 ratio, stratified so that the bad-sample proportion of each set matches that of the full data. KS curves and ROC curves are obtained for both sets, and the model that performs best is taken as the optimal statistical model of the quantitative analysis. Each sample's model score is then calculated from this model's results and the sample's feature values, giving the sample score distribution, which reflects the model's discriminating power, stability, and possible bias across samples. Whether the scores can separate good samples from bad ones is judged against the model's actual application scenario, for example whether bad samples concentrate in the low score bands. If the scores separate good and bad samples as well as possible, the model is judged optimal and the final model and variables of the enterprise public information model are obtained; otherwise, the model is judged not optimal.
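The stratified 7:3 split and the KS statistic described above can be sketched with the standard library. The fixed random seed is an assumption for reproducibility; labels use 1 = bad. Only KS is shown here; ROC/AUC would be computed separately.

```python
import random
from itertools import groupby


def stratified_split(labels, train_ratio=0.7, seed=42):
    """Split sample indices 7:3 so both parts keep the overall bad rate (labels: 1 = bad)."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_ratio)
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)


def ks_statistic(scores, labels):
    """Maximum gap between cumulative bad and good shares, scanning scores upward."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    pairs = sorted(zip(scores, labels))
    for _, group in groupby(pairs, key=lambda p: p[0]):  # tied scores form one step
        for _, y in group:
            if y == 1:
                cum_bad += 1
            else:
                cum_good += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks
```

Comparing the KS of the training and validation sets gives a quick check of stability: a large drop on the validation set suggests overfitting.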
When the model is judged not optimal, variables are added to or removed from the model, a new model is fitted on the updated variable set, and the refitted model is again checked as described above; this repeats until the optimal model is found, yielding the final model and variables of the enterprise public information model.
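The refit-until-optimal loop can be sketched as greedy backward elimination. This is one possible reading, not the method's prescribed procedure: only the deletion step is shown (adding variables would be a symmetric forward step), "optimal" is approximated as "no single removal improves a validation metric by more than `min_gain`", and `evaluate` is a hypothetical caller-supplied function that fits a model and returns, for example, validation KS.

```python
from typing import Callable, List, Sequence, Tuple


def backward_select(variables: Sequence[str],
                    evaluate: Callable[[List[str]], float],
                    min_gain: float = 1e-4) -> Tuple[List[str], float]:
    """Greedy backward elimination.

    evaluate(vars) must fit a model on `vars` and return a validation metric where
    higher is better. Each round removes the single variable whose removal helps
    most; the loop stops when no removal improves the metric by more than min_gain.
    """
    current = list(variables)
    best = evaluate(current)
    while len(current) > 1:
        score, drop = max((evaluate([v for v in current if v != d]), d) for d in current)
        if score <= best + min_gain:
            break
        current.remove(drop)
        best = score
    return current, best
```

Passing the fitting step in as a function keeps the loop independent of any particular logistic regression library.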
(1.5) Linear score conversion
The score is linearly converted to a 0-100 scale, leaving the shape of the distribution unchanged; the conversion formula is:
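The conversion formula is missing from this text. One linear map that fits the description is min-max rescaling; the sketch below assumes that choice, with optional fixed bounds (for example the theoretical minimum and maximum of the scorecard) in place of the observed ones. The function name is hypothetical.

```python
def to_hundred_scale(raw_scores, low=None, high=None):
    """Linearly map raw scores onto 0-100 (min-max rescaling, an assumed choice).

    low/high default to the observed min/max. When fixed bounds are supplied,
    scores outside them are clipped to 0 or 100. Inside the bounds the map is
    linear, so ordering and distribution shape are preserved.
    """
    low = min(raw_scores) if low is None else low
    high = max(raw_scores) if high is None else high
    if high == low:
        raise ValueError("cannot rescale: lower and upper bounds are equal")
    return [min(100.0, max(0.0, 100.0 * (s - low) / (high - low))) for s in raw_scores]
```

Using fixed bounds rather than the observed range keeps scores comparable across scoring batches, since the observed minimum and maximum change with every new sample.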
(1.6) Model results presentation
The final model variables of the enterprise public information model, their values, and each variable's score on a 100-point scale are displayed in the business system.
TABLE 5 Enterprise public information model
(2) Enterprise credit information modeling
The enterprise credit information is derived from an enterprise credit report and comprises credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information and the like;
the model is stratified and key variables are preliminarily screened with the analytic hierarchy process; then, taking into account each variable's business attributes, correlation, data coverage, and other characteristics, scores are assigned by expert scoring, and the enterprise credit information is modeled to obtain an enterprise credit information model;
Based on the identification information of the enterprise information subject, the system checks whether the subject has a credit report. If it does, the data with the latest report date in the database are used, and the system checks whether the "year of first credit transaction" field in the "credit prompt information" section of the report is empty. If the field is not empty, the enterprise credit information score is calculated with the enterprise credit information model; if it is empty, the score cannot be calculated and is left empty. If there is no credit report, the score likewise cannot be calculated and is left empty.
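The availability rule above can be sketched as a small guard around the scoring step. The `CreditReport` structure and its field names are hypothetical, and `score_fn` stands in for the enterprise credit information model, whose variables and scores are defined in Table 6 rather than here. `None` represents an empty score.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List, Optional


@dataclass
class CreditReport:                        # hypothetical structure of a parsed report
    report_date: date
    first_credit_year: Optional[int]       # "year of first credit transaction" field
    data: dict = field(default_factory=dict)


def credit_info_score(reports: List[CreditReport],
                      score_fn: Callable[[CreditReport], float]) -> Optional[float]:
    """Return the credit information score, or None when it cannot be calculated."""
    if not reports:                        # no credit report: score left empty
        return None
    latest = max(reports, key=lambda r: r.report_date)   # latest report date wins
    if latest.first_credit_year is None:   # field empty: score left empty
        return None
    return score_fn(latest)
```

Note that the check is applied to the latest report only: an older report with the field populated does not rescue a newer one where it is empty.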
The enterprise credit information comes from a single source, the enterprise credit report, and its structured data are standardized; it includes credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and the like. In step (2), modeling the enterprise credit information to obtain the enterprise credit information model specifically comprises:
(2.1) obtaining the characteristic variables in the enterprise credit information model through data cleaning and calculation
The enterprise credit data are cleaned, key information dimensions are selected, and variables are derived and calculated. If the credit data table contains records of several queried credit reports, the reports are sorted in reverse order of generation time and the data of the latest report are used as modeling sample data;
data cleaning re-examines and checks the retrieved enterprise credit data to find and correct errors in the data files and reduce the impact of erroneous data on model performance. It mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data and outliers, and handling missing values.
(2.2) Feature analysis
The statistical characteristics and distributions of the feature variables in the enterprise credit information model are analyzed, and extreme values are checked and handled;
the results of the feature analysis are compiled into a feature variable table recording each feature variable's name, calculation logic, data coverage, and basic distribution.
(2.3) Weight of Evidence (WOE) analysis
The values of the feature variables are obtained through WOE transformation;
all feature variables are first binned automatically; the automatic binning results are then reviewed manually for reliability, fit with business requirements, and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
where Bad Distribution and Good Distribution denote the distribution of "bad clients" and "good clients" in each category, respectively, i.e. the count in each category divided by the total number of "bad clients" or "good clients";
if the ratio in parentheses is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
(2.4) Scoring design
(2.4.1) Score distribution of the full sample
The enterprise credit information model is built with expert scoring rules: each sample's score is obtained from each variable's scoring intervals and rules. Analyzing the score distribution of the full sample helps identify whether scores are concentrated or dispersed and whether there are abnormal values; based on the results, and in light of financial leasing application scenarios, the model's variables, variable values, and scores are readjusted;
(2.4.2) Score distribution of good and bad samples
the modeling target defines clients as good or bad according to the client management classification of the project's actual user;
good and bad samples are distinguished by the bad-definition label;
since each sample has a model score, scores are tabulated separately for good and bad samples to check whether the two can be distinguished, i.e. whether good samples concentrate in the high score bands and bad samples in the low bands; based on the results, and in light of financial leasing application scenarios, the model's variables, variable values, and scores are readjusted;
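The good/bad score-distribution check can be sketched as a banded tabulation. The 20-point band edges are an assumption, and so is the concentration test: it treats "bad samples concentrate in the low bands" as "at every band boundary, the cumulative share of bad samples is at least the cumulative share of good samples", which is one way to make the criterion concrete. Labels use 1 = bad.

```python
def band_distribution(scores, labels, edges=(0, 20, 40, 60, 80, 100)):
    """Share of good and bad samples (labels: 1 = bad) in each score band.

    Bands are [edges[k], edges[k+1]) except the last, which also includes its upper edge.
    Returns (low, high, good_share, bad_share) per band.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    rows = []
    for k in range(len(edges) - 1):
        low, high = edges[k], edges[k + 1]
        last = k == len(edges) - 2
        in_band = [y for s, y in zip(scores, labels)
                   if low <= s < high or (last and s == high)]
        bad = sum(in_band)
        rows.append((low, high, (len(in_band) - bad) / n_good, bad / n_bad))
    return rows


def bad_concentrated_low(rows):
    """True if, at every band boundary, the cumulative bad share is at least the good share."""
    cum_good = cum_bad = 0.0
    for _, _, good_share, bad_share in rows:
        cum_good += good_share
        cum_bad += bad_share
        if cum_bad + 1e-12 < cum_good:
            return False
    return True
```

In the multi-round optimization of steps (2.3) and (2.4), this check would be rerun after each adjustment of variables, values, or scores.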
the WOE analysis of step (2.3) and the scoring design of step (2.4) are optimized over several rounds until the sample scores best distinguish good samples from bad ones, yielding the final model and variables of the enterprise credit information model;
(2.5) Model results presentation
The final model variables of the enterprise credit information model, their values, and each variable's score on a 100-point scale are displayed in the business system.
TABLE 6 Enterprise credit information model
(3) Credit data fusion scoring
The overall score of the enterprise public information model and the enterprise credit information score are combined according to the agreed weighting rule and output to the business application system through an interface; the business system displays the final fused score and key feature variables.
In actual business scenarios, an enterprise public information score can be calculated for every enterprise information subject, but an enterprise credit information score may be unavailable because credit information is missing. Both scores are processed and calculated within the system, and the fusion score is calculated with a 4:6 weighting of the public information score to the credit information score.
For example, if the enterprise public information model score is 80 and the credit information model score is 60, the credit data fusion score is 80 × 0.4 + 60 × 0.6 = 68.
As another example, if the enterprise public information model score is 70 and no credit data exist, the credit data fusion score is 70.
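The fusion rule and both worked examples can be reproduced in a few lines. The 4:6 weights and the fallback to the public information score come from the text; representing a missing credit score as `None` is an assumption. Integer weights divided by their sum keep the arithmetic exact for integer inputs.

```python
from typing import Optional

# Weights from the text: public information : credit information = 4 : 6.
PUBLIC_WEIGHT, CREDIT_WEIGHT = 4, 6


def fusion_score(public_score: float, credit_score: Optional[float]) -> float:
    """Fuse the two model scores; fall back to the public score when credit data are missing."""
    if credit_score is None:
        return float(public_score)
    total = PUBLIC_WEIGHT + CREDIT_WEIGHT
    return (PUBLIC_WEIGHT * public_score + CREDIT_WEIGHT * credit_score) / total


if __name__ == "__main__":
    print(fusion_score(80, 60))    # 68.0
    print(fusion_score(70, None))  # 70.0
```

The fallback means an enterprise without a credit report is scored on public information alone rather than being penalized with an implicit zero.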
The output of the credit data fusion scoring interface includes the enterprise name, unified social credit code, credit fusion score, enterprise public information score, enterprise credit information score, the value and score of each variable of the enterprise public information model, and the value and score of each variable of the enterprise credit information model.
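The interface output listed above maps naturally onto a record type. The class and field names, the JSON serialization, and the placeholder credit code in the usage are all hypothetical; the text specifies only which items the output contains, not its wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class VariableScore:                       # hypothetical per-variable entry
    name: str
    value: object
    score: float


@dataclass
class FusionScoreResponse:                 # hypothetical interface payload
    enterprise_name: str
    unified_social_credit_code: str
    fusion_score: float
    public_info_score: float
    credit_info_score: Optional[float]     # None when no usable credit report exists
    public_info_variables: List[VariableScore] = field(default_factory=list)
    credit_info_variables: List[VariableScore] = field(default_factory=list)


def to_json(resp: FusionScoreResponse) -> str:
    """Serialize the response; ensure_ascii=False keeps Chinese enterprise names readable."""
    return json.dumps(asdict(resp), ensure_ascii=False)


if __name__ == "__main__":
    resp = FusionScoreResponse("Example Co.", "91310000XXXXXXXXXX", 70.0, 70.0, None,
                               [VariableScore("years_established", 12, 15.0)])
    print(to_json(resp))
```

Returning the per-variable values and scores alongside the totals lets the business system show which variables drove a given score.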
Although the invention has been described in terms of the preferred embodiment, it is not intended to limit the scope of the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A scoring modeling method based on fused enterprise credit data for the financial leasing industry, characterized by comprising the following steps:
(1) Enterprise public information modeling:
based on the identification information of the enterprise information subject, retrieving basic enterprise information through an enterprise public information query API and storing the retrieved basic enterprise information in the business system memory, wherein the identification information of the enterprise information subject includes but is not limited to the enterprise name and/or the unified social credit code;
defining a modeling target according to the project application requirements, using a logistic regression model as the core modeling technique, using the retrieved basic enterprise information as sample data, and modeling the enterprise public information to obtain an enterprise public information model, wherein the model results of the enterprise public information model include the variable name, variable meaning, variable value, and score on a 100-point scale;
obtaining the overall score of the enterprise public information model from the value and 100-point score of each variable of the enterprise public information model;
wherein the value and 100-point score of each variable of the enterprise public information model are defined in the following table:
(2) Enterprise credit information modeling:
the enterprise credit information is derived from the enterprise credit report and includes credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and the like;
stratifying the model and preliminarily screening key variables with the analytic hierarchy process, then, taking into account the business attributes, correlation, data coverage, and other characteristics of the variables, assigning scores by the expert scoring method and modeling the enterprise credit information to obtain an enterprise credit information model, wherein the model results of the enterprise credit information model include the variable name, variable meaning, variable value, and score on a 100-point scale;
based on the identification information of the enterprise information subject, checking whether the enterprise information subject has a credit report; if it does, using the data with the latest report date in the database and determining whether the "Year of First Credit Transaction" field in the "Credit Prompt Information" section of the credit report is empty; if the field is not empty, calculating the enterprise credit information score according to the enterprise credit information model; if the field is empty, the enterprise credit information score is not calculated and is treated as empty; if there is no credit report, the enterprise credit information score is not calculated and is treated as empty;
wherein the value and 100-point score of each variable of the enterprise credit information model are defined in the following table:
(3) Credit data fusion scoring:
combining the overall score of the enterprise public information model obtained in step (1) and the enterprise credit information score obtained in step (2) according to the agreed weighting rule, and outputting the result to the business application system through the credit data fusion scoring interface, the business application system displaying the final fused score and key feature variables;
the output of the credit data fusion scoring interface includes but is not limited to: the enterprise name, unified social credit code, credit fusion score, enterprise public information score, enterprise credit information score, the value and score of each variable of the enterprise public information model, and the value and score of each variable of the enterprise credit information model.
2. The scoring modeling method based on fused enterprise credit data for the financial leasing industry according to claim 1, characterized in that modeling the enterprise public information to obtain the enterprise public information model, and obtaining the overall score of the enterprise public information model from the value and score of each of its variables, specifically comprises:
(1.1) Data cleaning and analysis:
performing data cleaning and calculation on the retrieved basic enterprise information to obtain the feature variables of the enterprise public information model, the data cleaning mainly comprising: removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values;
(1.2) Feature variable analysis:
analyzing the statistical characteristics and distributions of the feature variables in the enterprise public information model, and checking and handling extreme values;
compiling the results of the feature variable analysis into a feature variable table recording each feature variable's name, calculation logic, data coverage, and basic distribution;
(1.3) Weight of Evidence (WOE) analysis:
converting the logistic regression model into a standard scorecard format through WOE transformation to obtain the values of the feature variables;
first binning all feature variables automatically, then manually reviewing the automatic binning results for reliability, fit with business requirements, and interpretability, and deciding whether manual binning is needed;
the WOE of each category is defined as follows:
where Bad Distribution and Good Distribution denote the distribution of "bad customers" and "good customers" in each category, respectively, obtained by dividing the count in each category by the total number of "bad customers" or "good customers";
if the ratio in parentheses is less than 1, WOE is negative; if it is greater than 1, WOE is positive;
(1.4) Modeling and debugging:
initializing a set of model variables and fitting a model on the current variable set, the model results of the fitted model including the feature variable name, variable meaning, variable value, and score on a 100-point scale, and then determining whether the fitted model is the optimal model;
if it is the optimal model, obtaining the final model and variables of the enterprise public information model;
if it is not the optimal model, adding variables to or removing variables from the model, fitting a new model on the updated variable set, and determining whether the refitted model is the optimal model, until the optimal model is found and the final model and variables of the enterprise public information model are obtained;
(1.5) Linear score conversion:
linearly converting the score to a 0-100 scale while keeping the distribution shape unchanged, the conversion formula being:
(1.6) Model results presentation:
displaying the final model variables of the enterprise public information model, their values, and each variable's 100-point score in the business system.
3.
The data cleaning mainly includes: removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values.
(1.2) Feature variable analysis:
Analyze the statistical properties and distributions of the feature variables of the enterprise public information model, and check for and handle extreme values.
Compile the results of the feature variable analysis into a feature variable table that records each variable's name, calculation logic, data coverage, and basic distribution.
(1.3) Weight of Evidence (WOE) analysis:
The WOE transformation converts the logistic regression model into a standard scorecard format and yields the values of the feature variables.
First, all feature variables are binned automatically; the automatic binning results are then reviewed manually for reliability, fit with business requirements, and interpretability, to decide whether manual binning is needed.
The WOE of each category is defined as follows:
where the columns Bad Distribution and Good Distribution give the distribution of "bad customers" and of "good customers" across the categories; each is the frequency count in a category divided by the total number of "bad customers" or "good customers", respectively.
If the ratio in the brackets is less than 1, the WOE is negative; otherwise it is positive.
(1.4) Modeling and tuning:
Initialize a set of model variables and fit a model on the current set. The fitted model's results include each feature variable's name, meaning, value, and 100-point score. Then determine whether the fitted model is the optimal model.
If it is, the final model and variables of the enterprise public information model are obtained.
If it is not, add variables to or remove variables from the model, refit a model on the current set of variables, and determine whether the refitted model is optimal; repeat until the optimal model is found, which gives the final model and variables of the enterprise public information model.
(1.5) Linear score transformation:
The scores are linearly rescaled to the range 0 to 100, which leaves the shape of their distribution unchanged. The conversion formula is:
(1.6) Presentation of model results:
The final model input variables, their values, and the 100-point score of each variable of the enterprise public information model are displayed in the business system.
3. The fusion scoring modeling method based on enterprise credit data in the financial leasing industry according to claim 2, characterized in that the data cleaning and analysis of step (1.1) specifically comprises:
(A) General data cleaning rules, handled as follows:
(A1) Date fields: uniformly presented in YYYY-MM-DD format;
(A2) Amount fields: all amounts converted to numeric format, expressed in units of RMB 10,000;
(A3) Ratio fields: all ratios converted to numeric format with the percent sign removed, and a 0 added where no digit precedes the decimal point;
(A4) Duplicate data: the same event may appear in several duplicate records of a data table; deduplication mainly uses the key "company name + unique event identifier".
(B) Cleaning rules for specific data.
The specific rules are as follows:
(B1) Enterprise registration: if the enterprise registration date is empty, fill it with the business operation start date; delete observations whose enterprise registration date is empty; delete observations whose business operation start date is empty; delete observations whose business status is deregistered or revoked but whose deregistration date or revocation date is not empty; delete observations whose business operation end date is not empty but earlier than the business operation start date; delete observations whose deregistration date is not empty but earlier than the enterprise registration date; delete observations whose revocation date is not empty but earlier than the enterprise registration date;
(B2) Key personnel: merge the job titles of the same person in the same company; if one person holds several positions recorded on separate rows, merge them into one row;
(B3) Persons subject to enforcement: delete observations whose case filing date is not empty but earlier than the enterprise registration date;
(B4) Court hearing announcements: delete observations whose case filing time is not empty but earlier than the enterprise registration date;
(B5) Dishonest persons subject to enforcement: delete observations whose publication date is not empty but earlier than the enterprise registration date;
(B6) Abnormal business operations: delete observations whose listing date is not empty but earlier than the enterprise registration date; delete observations whose removal date is earlier than the listing date;
(B7) Administrative penalties: delete observations whose penalty decision date is not empty but earlier than the enterprise registration date; if the penalty decision date is empty, replace it with the public announcement date;
(B8) Administrative licenses: delete observations whose license start date is not empty but earlier than the enterprise registration date; delete observations whose license end date is not empty but earlier than the license start date;
(B9) Spot checks: delete observations whose spot-check date is not empty but earlier than the enterprise registration date;
(B10) Business registration changes: delete observations whose change date is not empty but on or before the enterprise registration date;
(B11) Outward investment: delete observations whose opening date is not empty but earlier than the enterprise registration date;
(B12) Branches: delete observations whose establishment date is not empty but earlier than the enterprise registration date;
(B13) Equity pledges: delete observations whose equity pledge registration date is not empty but earlier than the enterprise registration date;
(B14) Chattel mortgages: delete observations whose registration date is not empty but earlier than the enterprise registration date.
4. The fusion scoring modeling method based on enterprise credit data in the financial leasing industry according to claim 2, characterized in that, in the modeling and tuning of step (1.4), determining whether the fitted model is the optimal model specifically comprises:
Splitting the sample data into a training set and a validation set at a ratio of 7:3, where the split must keep the proportion of bad samples in each set equal to the proportion in the full data;
Obtaining the KS curves and ROC curves of the training set and the validation set, the model with the best performance being the optimal statistical model for quantitative analysis;
Calculating the model score of each sample from the results of the optimal statistical model and the sample's feature variable values, which gives the sample score distribution of the optimal statistical model;
Using the sample score distribution to assess the model's ability to separate samples, its stability, and possible bias, and, based on the model's actual application scenario, judging whether the model scores best separate good and bad samples;
If they do, the model is judged optimal; otherwise it is judged not optimal.
5. The fusion scoring modeling method based on enterprise credit data in the financial leasing industry according to claim 2, characterized in that, in the modeling and tuning of step (1.4), initializing a set of model variables comprises:
Removing variables that are clearly useless for modeling or have no business meaning;
Removing variables whose information value is too low or whose proportion of duplicate values is too high; and
Among two or more variables with high Pearson correlation, removing all but one retained variable.
6. The fusion scoring modeling method based on enterprise credit data in the financial leasing industry according to claim 1, characterized in that, in the (2) enterprise credit information modeling, modeling the enterprise credit information to obtain the enterprise credit information model specifically comprises:
(2.1) Obtaining the feature variables of the enterprise credit information model through data cleaning and calculation:
Clean the enterprise credit information data, select the important information dimensions, and then derive and calculate the variables.
If the credit information data table contains several records of credit report queries for the same enterprise, sort them by report generation time in descending order and take the data of the most recent credit report as the modeling sample data.
Data cleaning re-examines and verifies the retrieved enterprise credit information data to find and correct errors in the data files and to reduce the impact of erroneous data on model performance. It mainly includes: removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values.
(2.2) Feature analysis:
Perform statistical and distribution analysis of the feature variables of the enterprise credit information model, and check for and handle extreme values.
Compile the results of the feature analysis into a feature variable table that records each variable's name, calculation logic, data coverage, and basic distribution.
(2.3) Weight of Evidence (WOE) analysis:
Obtain the values of the feature variables through the WOE transformation.
First, all feature variables are binned automatically; the automatic binning results are then reviewed manually for reliability, fit with business requirements, and interpretability, to decide whether manual binning is needed.
The WOE of each category is defined as follows:
where the columns Bad Distribution and Good Distribution give the distribution of "bad customers" and of "good customers" across the categories; each is the frequency count in a category divided by the total number of "bad customers" or "good customers", respectively.
If the ratio in the brackets is less than 1, the WOE is negative; otherwise it is positive.
(2.4) Score design
(2.4.1) Overall sample score distribution
The enterprise credit information model is built from expert scoring rules: the score of each sample is obtained from the score ranges and scoring rules of each variable. Analyzing the score distribution of all samples helps reveal whether the scores are concentrated or dispersed, or contain outliers; based on the results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scores are then adjusted.
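The WOE formula itself did not survive extraction in steps (1.3) and (2.3). The sketch below assumes the common scorecard convention WOE = ln(Good Distribution / Bad Distribution); the orientation of the ratio is an assumption, and the stated sign rule holds for either orientation. It computes per-bin WOE, plus the information value (IV) that claim 5 filters on, from bins that have already been assigned. The function name `woe_table` and the label encoding (1 = bad, 0 = good) are hypothetical.

```python
import math
from collections import Counter


def woe_table(bins, labels):
    """Per-bin WOE and total information value (IV) for one binned variable.

    bins:   the bin (category) assigned to each sample
    labels: 1 for a "bad" customer, 0 for a "good" customer
    Assumed convention: WOE = ln(Good Distribution / Bad Distribution).
    """
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    total_bad, total_good = sum(bad.values()), sum(good.values())
    woe, iv = {}, 0.0
    for b in sorted(set(bins)):
        dist_bad = bad[b] / total_bad     # share of all bad customers in this bin
        dist_good = good[b] / total_good  # share of all good customers in this bin
        if dist_bad == 0 or dist_good == 0:
            # ln(0) is undefined; such bins are usually merged with a neighbour
            raise ValueError(f"bin {b!r} has no good or no bad samples; merge it")
        woe[b] = math.log(dist_good / dist_bad)  # negative when the ratio < 1
        iv += (dist_good - dist_bad) * woe[b]
    return woe, iv
```

For example, `woe_table(["A"] * 4 + ["B"] * 4, [1, 0, 0, 0, 1, 1, 1, 0])` gives WOE(A) = ln 3, WOE(B) = -ln 3 and IV = ln 3: bin B holds three quarters of the bad customers, so its ratio is below 1 and its WOE is negative.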
(2.4.2) Score distributions of good and bad samples
The modeling target defines customers as "good" or "bad" according to the customer management classification of the project's actual user;
Good and bad samples are distinguished by the "bad definition" label;
Since every sample has a model score, score distributions are computed separately for good and bad samples to check whether the scores separate them, that is, whether good samples concentrate in the high score bands and bad samples in the low score bands; based on the results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scores are adjusted.
Step (2.3) WOE analysis and step (2.4) score design are tuned over multiple rounds until the sample scores best separate good and bad samples, which gives the final model and variables of the enterprise credit information model.
(2.5) Presentation of model results
The final model input variables, their values, and the 100-point score of each variable in the enterprise credit information model are displayed in the business system.
7. The fusion scoring modeling method based on enterprise credit data in the financial leasing industry according to claim 1, characterized in that, in the (3) credit data fusion scoring, the agreed weighting rule is:
When the enterprise credit information score is not empty, the fused credit score is calculated with a 4:6 weight ratio between the enterprise public information model score and the enterprise credit information score;
When the enterprise credit information score is empty, the enterprise's fused credit score equals its enterprise public information model score.
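Claim 1, step (2), decides when an enterprise credit information score exists, and claim 7 fixes how it is combined with the public information score. A minimal Python sketch, assuming both scores are already on the 0 to 100 scale and that the 4:6 ratio means weights 0.4 and 0.6; the `CreditReport` record and both function names are hypothetical. As in the claim, only the latest report is inspected, with no fallback to older reports.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CreditReport:
    """Hypothetical shape of one stored credit report."""
    report_date: date
    first_credit_year: Optional[int]  # "Year of First Credit Transaction"; None if empty


def latest_usable_report(reports: List[CreditReport]) -> Optional[CreditReport]:
    """Claim 1, step (2): the report the credit model may use, or None."""
    if not reports:
        return None  # no credit report: credit score is treated as empty
    latest = max(reports, key=lambda r: r.report_date)
    if latest.first_credit_year is None:
        return None  # field empty: credit score is treated as empty
    return latest


def fused_score(public_score: float, credit_score: Optional[float]) -> float:
    """Claim 7: weights 4:6 (public:credit), or the public score alone."""
    if credit_score is None:
        return public_score
    return 0.4 * public_score + 0.6 * credit_score
```

For example, a public information score of 80 and a credit information score of 60 fuse to 0.4 × 80 + 0.6 × 60 = 68, while an enterprise without a usable report keeps its public information score unchanged.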
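Most of the cleaning rules in claim 3 reduce to format normalization, (A1) to (A4), plus one recurring date-consistency filter used throughout (B3) to (B14). A Python sketch, assuming dates are held as ISO `YYYY-MM-DD` strings after (A1) so that they compare correctly as text and amounts arrive in yuan; the accepted input date formats, the field names `company_name` and `event_id`, and all function names are hypothetical. Rules (B1), (B2) and the second halves of (B6) and (B8) need their own logic and are not shown.

```python
from datetime import datetime
from typing import Optional

# Assumed raw input formats; the source only fixes the output format.
_DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d", "%Y年%m月%d日")


def normalize_date(raw) -> Optional[str]:
    """(A1) Render a date as YYYY-MM-DD; empty values become None."""
    if raw is None or str(raw).strip() == "":
        return None
    text = str(raw).strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def to_ten_thousand_yuan(amount_yuan) -> float:
    """(A2) Numeric amount in units of RMB 10,000 (input assumed to be in yuan)."""
    return float(str(amount_yuan).replace(",", "")) / 10_000


def to_ratio(raw) -> float:
    """(A3) Drop the percent sign; float() already reads '.5' as 0.5."""
    return float(str(raw).strip().rstrip("%"))


def dedupe(records, key=("company_name", "event_id")):
    """(A4) Keep the first record for each company name + event identifier."""
    seen, kept = set(), []
    for rec in records:
        k = tuple(rec[f] for f in key)
        if k not in seen:
            seen.add(k)
            kept.append(rec)
    return kept


def drop_before_registration(records, date_field, reg_date, inclusive=False):
    """Drop rows whose date is set but precedes the enterprise registration date.

    inclusive=True also drops dates equal to the registration date, as rule (B10) (<=) requires.
    """
    kept = []
    for rec in records:
        d = rec.get(date_field)
        too_early = d is not None and (d <= reg_date if inclusive else d < reg_date)
        if not too_early:
            kept.append(rec)
    return kept
```

Keeping one generic filter with an `inclusive` switch mirrors the way the claim repeats the same "not empty but earlier than the registration date" test across many tables, with (B10) as the only non-strict case.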
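Claim 4's 7:3 split with matching bad-sample proportions is a stratified split, and the KS statistic behind a KS curve is the largest gap between the cumulative score distributions of bad and good samples. A minimal Python sketch with labels 1 = bad and 0 = good; the function names and the fixed seed are assumptions, and the ROC curve is not computed here.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices 7:3, splitting goods and bads separately so each
    part keeps (up to rounding) the bad-sample proportion of the full data."""
    rng = random.Random(seed)
    train, valid = [], []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train += idx[:cut]
        valid += idx[cut:]
    return sorted(train), sorted(valid)


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest gap between the cumulative score distributions of bad (1) and good (0) samples."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    ks, i = 0.0, 0
    while i < len(pairs):
        score = pairs[i][0]
        # Move past every sample sharing this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            i += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
    return ks
```

Computing `ks_statistic` separately on the training and validation indices gives one number per set; a higher value indicates better separation, and a large gap between the two values is a common warning sign of overfitting.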
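Claim 5 names three automatic variable filters but gives no thresholds. The sketch below uses hypothetical cut-offs (IV below 0.02, most frequent value covering more than 95% of samples, absolute Pearson correlation above 0.7) and reads "proportion of duplicate values" as the share of the most frequent value; both readings are assumptions. Among correlated variables it keeps the one with the higher IV, which is one possible way to choose the retained variable. Removing variables without business meaning remains a manual step.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical thresholds; claim 5 does not state any.
IV_MIN = 0.02
TOP_VALUE_SHARE_MAX = 0.95
ABS_CORR_MAX = 0.7


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_variables(columns: Dict[str, Sequence[float]],
                    iv: Dict[str, float]) -> List[str]:
    """Apply the IV, repeated-value and Pearson-correlation filters of claim 5."""
    candidates = []
    for name, values in columns.items():
        top_share = Counter(values).most_common(1)[0][1] / len(values)
        if iv.get(name, 0.0) >= IV_MIN and top_share <= TOP_VALUE_SHARE_MAX:
            candidates.append(name)
    # Visit higher-IV variables first, so of two highly correlated variables
    # the one with the higher IV is the one retained.
    candidates.sort(key=lambda n: iv.get(n, 0.0), reverse=True)
    selected: List[str] = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[s])) <= ABS_CORR_MAX
               for s in selected):
            selected.append(name)
    return selected
```

Constant columns are dropped by the repeated-value filter before any correlation is computed, so `pearson` never divides by zero here.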
CN202411842297.6A 2024-12-13 2024-12-13 A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry Pending CN119761901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411842297.6A CN119761901A (en) 2024-12-13 2024-12-13 A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry

Publications (1)

Publication Number Publication Date
CN119761901A (en) 2025-04-04

Family

ID=95187045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411842297.6A Pending CN119761901A (en) 2024-12-13 2024-12-13 A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry

Country Status (1)

Country Link
CN (1) CN119761901A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination