CN119761901A - A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry
- Publication number
- CN119761901A (CN202411842297.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. It comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring. An enterprise public information model and an enterprise credit information model are built separately to obtain an enterprise public information score and an enterprise credit information score. The two scores are then combined under an agreed weight rule into an enterprise credit data fusion score, and a business application system displays the final fusion score and the key feature variables.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to an enterprise credit data fusion scoring modeling method for the financial leasing industry.
Background
In recent years, credit data has become an important factor of production and plays an important role in scenarios such as enterprise credit evaluation and government supervision. Non-bank financial institutions such as financial leasing, business insurance and small-loan companies commonly face pain points when acquiring, processing and applying credit data, and an enterprise credit data fusion scoring modeling method is urgently needed to meet their requirements for data security, compliance and timeliness. On the one hand, public government information on enterprises, such as business registration, judicial, tax and intellectual property records, is offered by many different data service providers in inconsistent formats, and part of this information is free text that varies widely and needs professional interpretation. On the other hand, the use of enterprise credit information is subject to strict authorization requirements; cumbersome approval processes reduce the timeliness of the data, make it hard to realize its full value, and complicate credit behavior analysis of the enterprise information subject.
Disclosure of Invention
The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. Big data software is used to read massive amounts of information quickly and to mine the underlying relationships in the data; an objective statistical model is built, enterprise risk is predicted through model scores, and the application value and usage efficiency of the data are improved. The invention provides a solution covering data preprocessing, model design and evaluation, multi-model fusion, and system integration for modeling enterprise credit data.
Enterprise public information in this method refers to public government information from business registration, judicial, tax, intellectual property and similar sources. It comprises many data items that are scattered rather than centralized; users must log in to several data platforms to obtain the data they need, which is inefficient, lengthens the data service chain, and complicates operation.
Enterprise credit information in this method is information derived from enterprise credit reports, including basic enterprise information, repayment performance information, guarantee information and the like. Its use is strictly regulated: credit report information may be used within the company only with the authorization of the information subject and under strict, standardized management, so the approval process for using the data is complex and timeliness needs improvement. Enterprise credit reports are generally in PDF format and highly specialized, so the data structure must be parsed first and the content then interpreted professionally.
The enterprise credit data fusion scoring modeling method for the financial leasing industry is applied to modeling enterprise public information and enterprise credit information for enterprise credit risk assessment, and covers data preprocessing, model design and evaluation, multi-model fusion, and system integration.
The aim of the invention is realized by the following technical scheme:
A method for modeling enterprise credit data fusion scoring based on financing leasing industry comprises three steps of enterprise public information modeling, enterprise credit information modeling and credit data fusion scoring.
(1) Modeling enterprise public information:
Based on the identification information of the enterprise information subject, basic enterprise information is retrieved through an enterprise public information query API and stored in the business system, where the identification information includes, but is not limited to, the enterprise name and/or the unified social credit code;
A modeling target is defined according to the project's application requirements. With logistic regression as the core modeling technique and the retrieved basic enterprise information as sample data, the enterprise public information is modeled to obtain an enterprise public information model, whose model result comprises the variable name, variable meaning, variable value and 100-point-scale score;
The overall score of the enterprise public information model is obtained from the value and 100-point-scale score of each of its variables.
The value and 100-point-scale score of each variable in the enterprise public information model are defined according to the following table:
Specifically, in the enterprise public information modeling, obtaining the enterprise public information model and its overall score comprises the following steps:
(1.1) data cleaning analysis:
Data cleaning and calculation are performed on the retrieved basic enterprise information to obtain the feature variables of the enterprise public information model. Data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values and outliers, and handling missing values.
(1.2) Feature variable analysis:
Statistical and distribution analysis is performed on the feature variables of the enterprise public information model, and extreme values are checked and handled;
The results of the feature variable analysis are compiled into a feature variable table recording each variable's name, calculation logic, data coverage and basic data distribution.
(1.3) Weight of Evidence (WOE) analysis:
The logistic regression model is converted into a standard scorecard format through the WOE transformation, which yields the value bins of each feature variable;
All feature variables are first binned automatically. The automatic binning result is then checked manually for reliability, conformity with business requirements and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution give the distribution of "bad clients" and "good clients" across the categories, obtained by dividing the count in each category by the total number of "bad clients" or "good clients" respectively;
If the ratio in brackets is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
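The WOE formula itself is not reproduced in this text. The sketch below assumes the common scorecard convention, WOE = ln(Good Distribution / Bad Distribution), which matches the sign rule stated above; the orientation of the ratio, the function name and the bin counts are all assumptions for illustration.

```python
import math


def woe_by_bin(good_counts, bad_counts):
    """Return one WOE value per bin.

    Assumption: WOE = ln(good distribution / bad distribution).
    Bins with a zero count are not handled here and would need
    merging or smoothing before this calculation.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woe = []
    for good, bad in zip(good_counts, bad_counts):
        good_dist = good / total_good  # share of all good clients in this bin
        bad_dist = bad / total_bad     # share of all bad clients in this bin
        woe.append(math.log(good_dist / bad_dist))
    return woe


# Hypothetical counts for three bins of a single feature variable.
values = woe_by_bin(good_counts=[100, 300, 600], bad_counts=[50, 30, 20])
```

With these counts the first bin holds a larger share of bad clients than of good clients, so its ratio is below 1 and its WOE is negative; the third bin is the reverse.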
(1.4) Modeling and debugging:
A set of candidate model variables is initialized and a model is fitted on the current set. The result of the fitted model comprises the feature variable name, variable meaning, variable value and 100-point-scale score. The fitted model is then judged to be optimal or not;
If the model is judged optimal, the final model and variables of the enterprise public information model are obtained;
If the model is judged not optimal, variables are added to or removed from the model, a model is refitted on the updated set of variables, and the refitted model is judged again. This repeats until an optimal model is found, which gives the final model and variables of the enterprise public information model.
(1.5) Linear score conversion:
The scorecard score is converted linearly to a 0-100 scale, leaving the shape of the distribution unchanged. The conversion formula is as follows:
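The conversion formula is not reproduced in this text. One linear mapping that fits the description is min-max scaling, sketched here as an assumption; the function name and the sample scores are hypothetical, and the patent may use different anchor points.

```python
def to_hundred_scale(raw_scores):
    """Map raw scorecard scores linearly onto 0-100.

    Assumption: min-max scaling, so the lowest raw score maps to 0 and
    the highest to 100. A linear map keeps the ordering and the shape
    of the score distribution.
    """
    lowest, highest = min(raw_scores), max(raw_scores)
    if highest == lowest:
        raise ValueError("all raw scores are identical; cannot rescale")
    return [(score - lowest) / (highest - lowest) * 100 for score in raw_scores]


# Hypothetical raw scorecard scores.
scaled = to_hundred_scale([200, 350, 500])
```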
(1.6) Model result presentation:
The final modeling variables of the enterprise public information model, together with the value and 100-point-scale score of each variable, are displayed on the business system.
The data cleaning analysis (1.1) comprises a cleaning rule for general data and a cleaning rule for specific data.
(A) The general data cleaning rules are as follows:
(A1) Date fields: all dates are displayed uniformly in YYYY-MM-DD format;
(A2) Amount fields: all amounts are unified into a numeric format and expressed in units of ten thousand RMB yuan;
(A3) Proportion fields: all proportions are unified into a numeric format, the percent sign is removed, and a leading 0 is added before the decimal point where it is missing;
(A4) Duplicate data: the same event may have several duplicate records in a data table; deduplication uses "company name + unique event identifier" as the primary key.
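The general rules (A1) to (A4) can be sketched as small helper functions. This is a minimal illustration only: the accepted input date formats, the assumption that amounts arrive in yuan, and the field names `company_name` and `event_id` are hypothetical, not taken from the patent.

```python
from datetime import datetime


def normalize_date(text):
    """Rule A1: return the date as YYYY-MM-DD, or None if unparseable."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d"):  # assumed input formats
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def amount_in_ten_thousand_yuan(amount_yuan):
    """Rule A2: numeric amount in units of ten thousand RMB yuan.

    Assumption: the input is already expressed in yuan.
    """
    return float(amount_yuan) / 10000


def normalize_ratio(text):
    """Rule A3: strip the percent sign and return a number ('.5' -> 0.5)."""
    return float(text.strip().rstrip("%"))


def deduplicate(records):
    """Rule A4: keep the first record per (company name, event id) key."""
    seen, unique = set(), []
    for record in records:
        key = (record["company_name"], record["event_id"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

For example, `normalize_date("2020/1/5")` gives `"2020-01-05"`, and two records sharing the same company name and event identifier collapse to one.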
(B) The specific data cleaning rules are as follows:
(B1) Enterprise registration: if the enterprise registration date is empty, fill it with the enterprise operation start date; delete observations whose operation start date is empty; delete observations whose operating status is cancelled or revoked but whose cancellation date or revocation date is not empty; delete observations whose operation expiration date is not empty but earlier than the operation start date; delete observations whose cancellation date is not empty but earlier than the enterprise registration date;
(B2) Key personnel: merge the job titles of the same person at the same company; where one person holds several positions recorded on separate rows, the titles are merged into one row;
(B3) Persons subject to enforcement: delete observations whose case filing date is not empty but earlier than the enterprise registration date;
(B4) Court hearing announcements: delete observations whose case filing time is not empty but earlier than the enterprise registration date;
(B5) Delete observations whose release date is not empty but earlier than the enterprise registration date;
(B6) Abnormal operation: delete observations whose listing date is not empty but earlier than the enterprise registration date;
(B7) Administrative penalties: delete observations whose penalty decision date is not empty but earlier than the enterprise registration date; if the penalty decision date is empty, substitute the publication date;
(B8) Administrative licenses: delete observations whose license start date is not empty but earlier than the enterprise registration date;
(B9) Spot checks: delete observations whose spot check date is not empty but earlier than the enterprise registration date;
(B10) Business registration changes: delete observations whose change date is not empty but on or before (<=) the enterprise registration date;
(B11) External investment: delete observations whose operation date is not empty but earlier than the enterprise registration date;
(B12) Branches: delete observations whose establishment date is not empty but earlier than the enterprise registration date;
(B13) Equity pledges: delete observations whose equity pledge establishment registration date is not empty but earlier than the enterprise registration date;
(B14) Real estate mortgages: delete observations whose registration date is not empty but earlier than the enterprise registration date.
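Most of the rules (B3) to (B14) share one pattern: drop a record when its event date is present but falls before the enterprise registration date. A minimal sketch of that shared check follows; the function name, the record layout and the field names are hypothetical.

```python
from datetime import date


def drop_events_before_registration(events, registration_dates, date_field,
                                    inclusive=False):
    """Keep events whose date is empty or not before the registration date.

    events: list of dicts with 'company_name' and a date under date_field.
    registration_dates: dict mapping company name to registration date.
    inclusive=True also drops events dated exactly on the registration
    date, which mirrors the '<=' comparison used in rule B10.
    """
    kept = []
    for event in events:
        event_date = event.get(date_field)
        reg_date = registration_dates.get(event["company_name"])
        if event_date is not None and reg_date is not None:
            too_early = event_date <= reg_date if inclusive else event_date < reg_date
            if too_early:
                continue  # logically conflicting record, removed
        kept.append(event)
    return kept


# Hypothetical administrative penalty records for one enterprise.
registration = {"A": date(2020, 1, 1)}
penalties = [
    {"company_name": "A", "penalty_date": date(2019, 6, 1)},  # before registration
    {"company_name": "A", "penalty_date": date(2021, 3, 1)},
    {"company_name": "A", "penalty_date": None},              # empty date is kept
]
cleaned = drop_events_before_registration(penalties, registration, "penalty_date")
```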
In the modeling and debugging of step (1.4), judging whether the fitted model is optimal specifically comprises the following. The sample data is split into a training set and a validation set at a ratio of 7:3, and the split ensures that the proportion of bad samples in the training set and in the validation set matches that of the whole data set. KS curves and ROC curves are obtained for the training set and the validation set; the model that performs best on them is the optimal statistical model of the quantitative analysis. The model score of each sample is calculated from the model result of the optimal statistical model and the feature variable values of that sample, which gives the sample score distribution of the optimal statistical model. The sample score distribution reflects the model's discriminating power, stability and possible bias across different samples. Finally, according to the actual application scenario of the model, it is judged whether the sample scores reach the state that best separates good samples from bad samples (for example, bad samples concentrated in the low score range and good samples concentrated in the high score range); if so, the model is judged optimal, and otherwise it is not.
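The 7:3 split that preserves the bad-sample proportion, and the KS statistic read off the KS curve, can be sketched in plain Python as follows. The label coding (1 = bad, 0 = good), the function names and the fixed random seed are assumptions for illustration.

```python
import random


def stratified_split(samples, train_ratio=0.7, seed=0):
    """Split (features, label) pairs 7:3, separately within each label,
    so both parts keep roughly the bad-sample proportion of the whole."""
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


def ks_statistic(scores, labels):
    """Largest gap between the cumulative bad and cumulative good
    distributions when samples are ordered by score."""
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Compare only after the last sample sharing this score.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            best = max(best, abs(cum_bad / total_bad - cum_good / total_good))
    return best


# Hypothetical data: 100 good and 20 bad samples.
data = [((i,), 0) for i in range(100)] + [((i,), 1) for i in range(20)]
train_set, valid_set = stratified_split(data)
```

A KS value close to 1 means the scores separate bad from good samples almost completely; a value near 0 means the two score distributions overlap.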
In the modeling and debugging of step (1.4), initializing the set of model variables comprises: removing variables that clearly contribute nothing to modeling or have no business meaning; removing variables whose information value is too low or whose proportion of repeated values is too high; and, among two or more variables with high Pearson correlation, keeping one and removing the others.
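The correlation-based screening can be sketched as below. The text does not say which variable of a correlated group is kept or what counts as "high" correlation; this sketch assumes the variable with the higher information value is kept and uses a hypothetical threshold of 0.8.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.
    Constant columns (zero variance) are not handled here."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def drop_correlated(columns, information_value, threshold=0.8):
    """Return the variable names kept after correlation screening.

    Variables are visited from highest to lowest information value
    (an assumed tie-break); a variable is kept only if its absolute
    correlation with every already kept variable is below the threshold.
    """
    kept = []
    ranked = sorted(columns, key=lambda name: information_value[name], reverse=True)
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical variables: b is almost a multiple of a, c is unrelated.
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
iv = {"a": 0.3, "b": 0.2, "c": 0.1}
selected = drop_correlated(columns, iv)
```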
(2) Enterprise credit information modeling
The enterprise credit information is derived from an enterprise credit report and comprises credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information and the like;
The model is structured in layers: key variables are first screened by the analytic hierarchy process, taking into account each variable's business attributes, correlation, data coverage and other conditions, and scores are then assigned by expert scoring. The enterprise credit information is modeled in this way to obtain an enterprise credit information model, whose model result comprises the variable name, variable meaning, variable value and 100-point-scale score;
Based on the identification information of the enterprise information subject, the method queries whether the subject has a credit report. If a credit report exists, the data with the latest report date in the database is taken, and the field "year of first credit transaction" in the "credit prompt information unit" section of the credit report is checked. If the field is not empty, the enterprise credit information score is calculated according to the enterprise credit information model; if the field is empty, the score is not calculated and is set to empty. If no credit report exists, the score is likewise not calculated and is set to empty.
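The decision flow above can be sketched as a small function. The record layout, the field names `report_date` and `first_credit_year`, and the `score_model` callable are hypothetical stand-ins for the patent's credit report data and enterprise credit information model.

```python
def credit_information_score(reports, score_model):
    """Return the enterprise credit information score, or None (empty).

    reports: list of dicts, one per stored credit report, each with a
             'report_date' and a 'first_credit_year' field (assumed names).
    score_model: callable that scores a single report (assumed interface).
    """
    if not reports:
        return None  # no credit report: score is empty
    latest = max(reports, key=lambda report: report["report_date"])
    if latest.get("first_credit_year") in (None, ""):
        return None  # no credit history on the latest report: score is empty
    return score_model(latest)


# Hypothetical reports; ISO date strings compare correctly as text.
reports = [
    {"report_date": "2023-05-01", "first_credit_year": "2015", "base": 60},
    {"report_date": "2024-02-10", "first_credit_year": "2015", "base": 72},
]
score = credit_information_score(reports, score_model=lambda r: r["base"])
```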
The value and 100-point-scale score of each variable in the enterprise credit information model are defined by the following table:
Because enterprise credit information is derived from enterprise credit reports, it has a single data source and well-standardized structured data, including credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information and the like. In the enterprise credit information modeling (2), obtaining the enterprise credit information model specifically comprises:
(2.1) Data cleaning and calculation to obtain the feature variables of the enterprise credit information model
The enterprise credit information data is cleaned, important information dimensions are selected, and variables are processed and calculated. If the credit information data table contains records from several credit report queries, the credit reports are sorted in descending order of generation time and the data of the most recent credit report is taken as the modeling sample data;
Through data cleaning, the retrieved enterprise credit information data is re-examined and verified in order to find and correct errors in the data file and reduce the impact of erroneous data on model performance. Data cleaning mainly comprises removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data and outliers, and handling missing values.
(2.2) Feature analysis
Statistical and distribution analysis is performed on the feature variables of the enterprise credit information model, and extreme values are checked and handled;
The results of the feature analysis are compiled into a feature variable table recording each variable's name, calculation logic, data coverage and basic data distribution.
(2.3) Weight of Evidence (WOE) analysis
The value bins of each feature variable are obtained through the WOE transformation;
All feature variables are first binned automatically. The automatic binning result is then checked manually for reliability, conformity with business requirements and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution give the distribution of "bad clients" and "good clients" across the categories, obtained by dividing the count in each category by the total number of "bad clients" or "good clients" respectively;
If the ratio in brackets is less than 1, WOE is negative; if it is greater than 1, WOE is positive.
(2.4) Scoring design
(2.4.1) Overall sample score distribution
The enterprise credit information model is built according to expert scoring rules, and the score of each sample is obtained from the scoring intervals and scoring rules of each variable. Analysis of the overall sample score distribution helps identify whether the sample scores are concentrated, dispersed or contain abnormal values; based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values and scoring are readjusted;
(2.4.2) Score distribution of good and bad samples
The modeling target defines clients as good clients or bad clients according to the client management classification of the party actually applying the project;
Good and bad samples are distinguished by the bad-definition label;
Because each sample has a model score, score distributions are produced separately for good and bad samples to check whether the two can be distinguished, that is, whether good samples are concentrated in the high score range and bad samples in the low score range. Based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values and scoring are readjusted;
The WOE analysis of step (2.3) and the scoring design of step (2.4) are optimized over multiple rounds until the sample scores best separate good samples from bad samples, which gives the final model and variables of the enterprise credit information model;
(2.5) Model result presentation
The final modeling variables of the enterprise credit information model, together with the value and 100-point-scale score of each variable, are displayed on the business system.
(3) Credit data fusion scoring
The overall score of the enterprise public information model obtained in step (1) and the enterprise credit information score obtained in step (2) are combined according to an agreed weight rule and output to a business application system through a credit data fusion scoring interface; the business application system displays the final fusion score and the key feature variables;
The output of the credit data fusion scoring interface includes, but is not limited to, the enterprise name, unified social credit code, credit data fusion score, enterprise public information score, enterprise credit information score, the value and score of each variable of the enterprise public information model, and the value and score of each variable of the enterprise credit information model.
In the credit data fusion scoring (3), the agreed weight rule may be as follows: when the enterprise credit information score is not empty, the credit data fusion score is calculated with a weight ratio of 4:6 between the enterprise public information score and the enterprise credit information score; when the enterprise credit information score is empty, the credit data fusion score equals the enterprise public information score. The weight rule may also be modified and adjusted according to business requirements.
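The weight rule can be sketched as a single function. The machine-translated source is ambiguous about which score takes which weight; this sketch assumes 0.4 for the enterprise public information score and 0.6 for the enterprise credit information score, and exposes both weights as parameters because the rule is stated to be adjustable.

```python
def fusion_score(public_score, credit_score, public_weight=0.4, credit_weight=0.6):
    """Combine the two model scores into a credit data fusion score.

    credit_score is None when the enterprise credit information score
    is empty; the fusion score then falls back to the public score.
    The 0.4 / 0.6 assignment is an assumption about the 4:6 ratio.
    """
    if credit_score is None:
        return public_score
    return public_weight * public_score + credit_weight * credit_score


# Hypothetical scores on the 0-100 scale.
with_report = fusion_score(public_score=80, credit_score=70)
without_report = fusion_score(public_score=80, credit_score=None)
```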
The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. First, enterprise public information and enterprise credit information are studied in depth, and the data analysis process and results are described in terms of the data sample summary, data preprocessing rules, feature variable analysis and WOE analysis. Second, a suitable modeling method is selected according to the sample size, feature variables, modeling target and other conditions of the enterprise public information and enterprise credit information; an enterprise public information model and an enterprise credit information model are built separately, followed by model parameter tuning and score distribution tuning. Finally, the fusion scoring of the two models is completed and a total score interface output service is created.
Compared with the prior art, the enterprise credit data fusion scoring modeling method for the financial leasing industry provided by the invention has the following main beneficial effects:
1. An enterprise public information modeling method built from the perspective of a credit agency
The patent application constructs an enterprise public information modeling method. From the perspective of a third-party credit agency, and based on analysis of macroscopic public information data on millions of enterprises, it sorts out the credit risk of the enterprise information subject at the public level, analyzes the business associations among items of public government information, extracts key feature variables, and selects logistic regression as the model, covering the whole process of data scoping, data preprocessing, model design and evaluation for enterprise public information.
2. An enterprise credit information modeling method built from the perspective of a financial leasing company
The patent application constructs an enterprise credit information modeling method. Starting from the actual business needs of financial leasing companies and combining the post-loan classification and tiered management of clients, it sets the bad-definition labels of the modeling data, sorts out the basic information, loan and guarantee transaction information and repayment performance information of the enterprise information subject in the credit report, refines key feature variables, and builds the enterprise credit information model through the analytic hierarchy process and expert scoring.
3. A credit data fusion scoring method built for the financial leasing industry
Enterprise credit information reflects only one aspect of enterprise credit risk, and the value of enterprise public information as alternative data is increasingly important. Modeling credit information alone therefore cannot fully meet the needs of enterprise credit risk assessment, and modeling that incorporates enterprise public information is urgently needed. The patent application constructs a method for fusing the enterprise public information model with the enterprise credit information model, which can accommodate scoring models for different scenarios in the future and is highly extensible.
From the above analysis, this enterprise credit data fusion scoring modeling method for the financial leasing industry offers greater novelty, practicability and extensibility than other scoring modeling methods on the market.
The invention provides an enterprise credit data fusion scoring modeling method for the financial leasing industry. It combines the data advantages of a third-party credit agency with the application scenarios of financial leasing companies, builds an objective statistical model, predicts enterprise risk through model scores, and improves the application value and usage efficiency of the data.
Social benefits
The enterprise credit data fusion score modeling method for the financial leasing industry improves the application value of alternative data. The modeling method combines the enterprise subject's behavior records at the public level (industry and commerce, judicial, tax, intellectual property, and so on) with the behavior shown in the enterprise's credit information to design a reasonable credit risk assessment model. It characterizes the credit profile of the information subject comprehensively and accurately, presents the subject's business operations, breach-of-trust records, credit guarantees, and similar conditions through the score profile, raises public credit awareness among enterprises, and strengthens the application value of alternative data.
The method also improves the standardization and efficiency of credit data modeling. The invention examines data preprocessing, model design and evaluation, multi-model fusion, and system integration in depth. It offers a reference for other non-bank financial institutions (financial leasing, business insurance, small loans, consumer finance, and so on) and, to a certain extent, an effective modeling method for data modeling in the enterprise credit reporting industry. For preventing credit risk and improving contract performance, it provides a reliable credit data fusion score modeling method that helps other non-bank financial institutions quickly build scoring models suited to their own business needs.
The method further strengthens the construction of a social credit system built on integrity. The invention responds to the state's requirements for standardized management of credit information, explores a path toward standardizing business (enterprise) credit information, and improves the cross-domain mechanisms of joint incentives for trustworthiness and joint punishment and exit for breach of trust. It constrains enterprise information subjects to operate lawfully and honestly, maintains normal market order, fosters an honest social environment, promotes the development of inclusive finance, and gradually raises the risk control level of inclusive financial institutions. It contributes to an industry risk control system that combines business self-discipline with social supervision, and can generate substantial social benefit in creating a sound credit environment and an integrity system.
Economic benefit
The enterprise credit data fusion score modeling method for the financial leasing industry improves the efficiency of risk control management in the industry. As credit businesses develop and industry competition intensifies, more and more financial leasing institutions recognize that steadily raising their risk control level is a necessary foundation for stable business growth. Data analysis tools are used to mine and analyze risk data in depth and to identify potential risk patterns and trends; the modeling method condenses massive data into simple score data, and risks are identified and predicted with objective statistics. A score monitoring system tracks and assesses ongoing risk in real time. Using scores as a reference in risk control can simplify and optimize the risk control management process, remove unnecessary steps, and improve working efficiency.
The method also reduces transaction costs and improves the efficiency of social operation. With the credit data fusion score, parties that breach trust can be tracked and monitored effectively and restricted in various ways, for example by limiting their applications for financial products such as loans or their participation in commercial activities such as bidding. This leaves them constrained at every turn and greatly raises the cost of breaching trust. Trustworthy parties gain more trust and opportunities, such as lower interest rates and faster approval, which increases the value of keeping faith and may encourage more enterprises to do so. Information asymmetry is reduced, transaction costs fall, the operating efficiency of society improves, and its economic benefit improves in turn.
In the method, enterprise public information is government public information from industry and commerce, judicial, tax, intellectual property, and similar sources. It has many data items and is scattered rather than centralized, so a user must log in to several data platforms to obtain the required data; efficiency is low, the data service chain is long, and operation is complex.
In the method, enterprise credit information is used under strict controls: a credit report may be used only with the authorization of the information subject and is strictly managed when applied within the company, so the approval process for using the data is complex and its timeliness needs improvement. Credit reports are generally in PDF format and credit information is specialized, so the data structure must be parsed before professional interpretation and analysis.
Detailed Description
Other advantages and features of the invention are shown by the following description of embodiments of the invention, given by way of example and not by way of limitation, with reference to the accompanying drawings.
An enterprise credit data fusion score modeling method for the financial leasing industry comprises three steps: enterprise public information modeling, enterprise credit information modeling, and credit data fusion scoring.
(1) Modeling enterprise public information:
Based on the identification information of the enterprise information subject, the enterprise basic information is retrieved through an enterprise public information query API and stored in the memory of the business system, wherein the identification information of the enterprise information subject includes, but is not limited to, the enterprise name and/or the unified social credit code;
A modeling target is defined according to project application requirements; with a logistic regression model as the core modeling technique and the retrieved enterprise basic information as sample data, the enterprise public information is modeled to obtain an enterprise public information model;
The overall score of the enterprise public information model is obtained from the value and score of each variable of the model.
The enterprise public information data in the method refers to government public information such as industry and commerce, judicial, tax, and intellectual property information, and includes, but is not limited to, enterprise registration information, shareholder information, key personnel, industrial and commercial changes, enterprise annual reports, administrative licenses, administrative penalties, equity pledges, and enterprise outbound investments.
The specific information types and main field items of the enterprise public information data are shown in the following table 1:
TABLE 1 Enterprise public information types and field items
In data modeling, the modeling target is the objective the model is built to achieve. Modeling targets vary but typically serve prediction and decision support; a common target is to predict the likelihood that a defined bad event occurs within a given period.
For enterprise public information modeling, the definition of a bad sample may differ from case to case; the general bad-definition schemes and their advantages are compared in Table 2 below:
TABLE 2 Enterprise public information-modeling goal general purpose scenario comparison
The modeling target is selected by weighing the project application scenario against the distribution characteristics of the sample data. To make modeling targets more flexible to design, they may be defined according to project application requirements.
For example:
The modeling target variable ("good" or "bad") can be defined by labeling clients as "good" clients or "bad" clients according to the actual application requirements of the project and the post-loan management classification of the enterprise client group, for example whether the client is overdue, the overdue status, or the five-level risk classification.
In this method, following the client management classification of the actual application party in the financial leasing industry, clients classified as N1/N2/N3 are defined as good clients and clients classified as A/B/C are defined as bad clients.
N1 is a normal project;
N2 is a project in the construction period, for which it is not yet certain whether a risk exists;
N3 is a project with a few minor operational flaws, for example the previous month's electricity fee income was not reported on time;
Class C means a risk has occurred and the customer manager needs to intervene;
Class B means a risk has occurred and the company needs to intervene, for example through collection;
Class A means a risk has occurred and the problem must be handled by legal proceedings or similar means;
Project deterioration increases in the order N1, N2, N3, C, B, A.
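The good/bad labeling described above can be sketched as a small lookup. This is only an illustration: the function names `bad_flag` and `is_worse` are hypothetical, and the deterioration order is assumed to run from least to most severe.

```python
# Hypothetical helper names; the class codes come from the text above.
GOOD_CLASSES = {"N1", "N2", "N3"}
BAD_CLASSES = {"A", "B", "C"}
# Order taken from the text, assumed to run from least to most deteriorated.
DETERIORATION_ORDER = ["N1", "N2", "N3", "C", "B", "A"]


def bad_flag(management_class: str) -> int:
    """Return 1 for a "bad" client and 0 for a "good" client."""
    cls = management_class.strip().upper()
    if cls in BAD_CLASSES:
        return 1
    if cls in GOOD_CLASSES:
        return 0
    raise ValueError(f"unknown management class: {management_class!r}")


def is_worse(a: str, b: str) -> bool:
    """True if class a sits later in the deterioration order than class b."""
    return DETERIORATION_ORDER.index(a) > DETERIORATION_ORDER.index(b)
```

Keeping the mapping in one place makes it easy to swap in a different bad definition (for example an overdue flag) when the project requirements change.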
Specifically, in the enterprise public information modeling, modeling the enterprise public information to obtain the enterprise public information model, and obtaining the overall score of the model from the value and score of each of its variables, comprises the following steps:
(1.1) data cleaning analysis:
Data cleaning and calculation are performed on the retrieved enterprise basic information data to obtain the feature variables of the enterprise public information model. Data cleaning mainly includes removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data, abnormal values, and outliers, and handling missing values.
Data cleaning analysis includes cleaning rules for general data and cleaning rules for specific data.
(A) The general data cleaning rules are as follows:
(A1) Date fields: displayed uniformly in YYYY-MM-DD format;
(A2) Amount fields: all amounts are converted to a numeric format and expressed in units of RMB 10,000;
(A3) Ratio fields: all ratios are converted to a numeric format, the percent sign is removed, and a leading 0 is added before the decimal point where it is missing;
(A4) Duplicate data: for the same event, the data table may contain several repeated records; deduplication uses the key "company name + unique event identifier" as the main means of identification. The deduplication keys for each table are shown in Table 3 below.
TABLE 3 Enterprise public information record uniqueness judgment identification
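The general rules (A1) to (A4) can be illustrated with a few small helpers. This is a minimal sketch: the accepted input date formats, the assumption that raw amounts arrive in yuan, and all function and field names are illustrative and not taken from the patent.

```python
from datetime import datetime


def normalize_date(raw: str) -> str:
    """Rule A1: render a date as YYYY-MM-DD (accepted input formats are assumptions)."""
    for fmt in ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def amount_in_wan(amount_yuan: float) -> float:
    """Rule A2: express an RMB amount in units of 10,000 yuan (input assumed in yuan)."""
    return amount_yuan / 10_000


def normalize_ratio(raw: str) -> float:
    """Rule A3: drop the percent sign and add a missing leading zero.

    The text does not say whether the value is rescaled, so it is kept as-is.
    """
    text = raw.strip().rstrip("%")
    if text.startswith("."):
        text = "0" + text
    return float(text)


def deduplicate(records, key_fields):
    """Rule A4: keep the first record for each (company name + event key) tuple."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

For instance, `deduplicate(rows, ["company_name", "case_number"])` would keep one row per company and case, where `case_number` stands in for whichever unique event identifier a given table uses.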
(B) The specific data cleaning rules are as follows:
(B1) Enterprise registration: if the enterprise registration date is empty, fill it with the enterprise operation start date; delete observations whose operation start date is empty; delete observations whose operating status is cancelled or revoked but whose cancellation date or revocation date is not empty; delete observations whose operation expiration date is not empty but is earlier than the operation start date; delete observations whose cancellation date is not empty but is earlier than the enterprise registration date;
(B2) Key personnel: merge the job titles recorded for the same person in the same company; where one person holds several positions written on separate rows, combine the titles into one row;
(B3) Persons subject to enforcement: delete observations whose case filing date is not empty but is earlier than the enterprise registration date;
(B4) Court hearing announcements: delete observations whose case filing time is not empty but is earlier than the enterprise registration date;
(B5) Delete observations whose release date is not empty but is earlier than the enterprise registration date;
(B6) Abnormal operations: delete observations whose listing date is not empty but is earlier than the enterprise registration date;
(B7) Administrative penalties: delete observations whose penalty decision date is not empty but is earlier than the enterprise registration date; if the penalty decision date is empty, substitute the publication date;
(B8) Administrative licenses: delete observations whose license start date is not empty but is earlier than the enterprise registration date;
(B9) Spot checks: delete observations whose spot check date is not empty but is earlier than the enterprise registration date;
(B10) Industrial and commercial changes: delete observations whose change date is not empty but is on or before the enterprise registration date;
(B11) Outbound investment: delete observations whose operation date is not empty but is earlier than the enterprise registration date;
(B12) Branch offices: delete observations whose establishment date is not empty but is earlier than the enterprise registration date;
(B13) Equity pledges: delete observations whose equity pledge establishment registration date is not empty but is earlier than the enterprise registration date;
(B14) Real estate mortgage: delete observations whose registration date is not empty but is earlier than the enterprise registration date.
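Most of the specific rules share one pattern: an event record is dropped when its date is present but precedes the enterprise registration date. The sketch below shows that pattern under assumed record and field names; the `inclusive` switch covers the "on or before" variant used for industrial and commercial changes.

```python
from datetime import date


def drop_pre_registration(records, date_field, registration_date, inclusive=False):
    """Drop records whose event date is present but precedes registration.

    Records with an empty (None) event date are kept, since the rules only
    delete observations whose date is not empty.
    """
    kept = []
    for rec in records:
        event_date = rec.get(date_field)
        if event_date is not None:
            if inclusive:
                conflict = event_date <= registration_date
            else:
                conflict = event_date < registration_date
            if conflict:
                continue
        kept.append(rec)
    return kept


# Hypothetical usage: penalties recorded before the company was registered.
registered = date(2018, 5, 1)
penalties = [
    {"id": 1, "decision_date": date(2017, 1, 1)},  # logical conflict, dropped
    {"id": 2, "decision_date": date(2019, 3, 2)},  # kept
    {"id": 3, "decision_date": None},              # empty date, kept
]
clean = drop_pre_registration(penalties, "decision_date", registered)
```

A single parameterized helper keeps the thirteen date checks consistent and makes the one `<=` exception explicit.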
(1.2) Feature variable analysis:
The statistical characteristics and distributions of the feature variables obtained for the enterprise public information model are analyzed, and extreme values are checked and handled;
The results of the feature variable analysis are compiled into a feature variable table that records each feature variable's name, calculation logic, data coverage, and basic data distribution.
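One row of such a feature variable table might be produced as follows. This is a sketch under assumptions: missing values are represented as `None`, coverage is taken to mean the share of non-missing values, and the chosen summary statistics are illustrative rather than prescribed by the text.

```python
import statistics


def summarize_feature(name, values):
    """Build one row of a feature variable table for a numeric feature."""
    present = [v for v in values if v is not None]
    coverage = len(present) / len(values) if values else 0.0
    row = {"name": name, "coverage": coverage}
    if present:
        row.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
        )
    return row
```

Comparing `min` and `max` against the median is a quick first check for the extreme values mentioned above.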
(1.3) Weight of Evidence (WOE) analysis:
The logistic regression model is converted into a standard scorecard format through WOE transformation, which yields the variable values of the feature variables;
All feature variables are first binned automatically; the automatic binning result is then checked manually for reliability, consistency with business requirements, and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution denote the distribution of "bad clients" and "good clients" across the categories, obtained by dividing the count in each category by the total number of "bad clients" or "good clients" respectively;
If the ratio in brackets is less than 1 then WOE is negative and vice versa WOE is positive.
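The WOE formula referred to above is not reproduced in this text. A form consistent with the surrounding description, for category i, is sketched below. The orientation of the bracketed ratio (good over bad) is an assumption; some scorecard conventions use the inverse, which only flips the sign of every WOE value.

```latex
\mathrm{WOE}_i = \ln\!\left(\frac{\text{Good Distribution}_i}{\text{Bad Distribution}_i}\right),
\qquad
\text{Good Distribution}_i = \frac{\text{Good}_i}{\text{Good}_{\text{total}}},
\qquad
\text{Bad Distribution}_i = \frac{\text{Bad}_i}{\text{Bad}_{\text{total}}}
```

Under this assumed orientation, a category holding a larger share of bad clients than of good clients has a ratio below 1 and therefore a negative WOE.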
(1.4) Modeling and debugging:
A set of model variables is initialized and a model is fitted on the current set of variables; the result of the fitted model includes the feature variable name, variable meaning, variable value, and percentage score. The fitted model is then judged to be optimal or not. If it is optimal, the final model and variables of the enterprise public information model are obtained. If it is not, some variables are added or removed, a model is re-fitted on the current set of variables, and the judgment is repeated until the optimal model is found, which gives the final model and variables of the enterprise public information model.
Initializing model variables, including:
① Removing variables that are significantly ineffective for modeling
Variables in the raw enterprise public information that clearly contribute nothing to modeling or have no business meaning are removed manually, such as the unified social credit code, registration authority, legal representative, licensed business items, and business scope.
② Removing variables whose information value is too low or whose repetition value ratio is too high
For example, when modeling and debugging a certain model, Table 4 below shows that the information value (info_value) of the feature variable "business name" is below 0.02; its information value is too low, so it is removed. The repetition value ratio (identification_rate) of the number of work copyrights, the number of judicial auctions, the number of trusted-executive information records, the number of intellectual property tax-arrears information records, the number of administrative penalties, and the number of judgment documents in the last 2 years exceeds 0.95; their repetition value ratio is too high, so they are removed.
TABLE 4 data modeling characteristic variable information value IV, repeat value ratio IR
③ Removing variables with higher pearson correlation
For example, the Pearson correlation coefficient between "enterprise annual report" and "established years" is 0.813. "Enterprise annual report" is removed during modeling and only the "established years" variable is retained.
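The three screening rules above can be sketched with plain Python. The 0.02 and 0.95 cut-offs come from the example in the text; the 0.8 correlation threshold, the greedy keep-first strategy, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def information_value(bins):
    """IV from (good_count, bad_count) pairs per bin; every count must be > 0."""
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        good_dist = good / total_good
        bad_dist = bad / total_bad
        # Each term is unchanged if the ratio is inverted, so IV does not
        # depend on which WOE orientation is used.
        iv += (good_dist - bad_dist) * math.log(good_dist / bad_dist)
    return iv


def identical_rate(values):
    """Share of samples taken by the single most frequent value."""
    return Counter(values).most_common(1)[0][1] / len(values)


def keep_variable(iv, ir, iv_min=0.02, ir_max=0.95):
    """Keep a variable unless its IV is too low or its repetition ratio too high."""
    return iv >= iv_min and ir <= ir_max


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.8):
    """Greedy filter: keep a column only if it is not highly correlated with
    any column kept before it. `columns` maps name -> list of values, and the
    dict order decides which member of a correlated pair survives."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Listing the variable with the clearer business meaning first (here "established years" before "enterprise annual report") reproduces the choice made in the example.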
The sample data are split into a training set and a validation set at a ratio of 7:3, keeping the proportion of bad samples in both sets consistent with that of the whole data set. KS curves and ROC curves are obtained for the training set and the validation set, and the model that performs best is taken as the optimal statistical model of the quantitative analysis. From the result of the optimal statistical model and the feature variable values of each sample, the model score of each sample is calculated to obtain the sample score distribution of the optimal statistical model. The sample score distribution reflects the model's discriminating power, its stability, and any bias across different samples. Whether the model scores can separate good samples from bad samples is then judged against the model's actual application scenario, for example whether bad samples concentrate in the low score range. If the scores separate good samples from bad samples as well as possible, the model is judged to be the optimal model and the final model and variables of the enterprise public information model are obtained; otherwise it is judged not to be the optimal model.
When the model is judged not to be optimal, some variables are added to or removed from the model, a model is re-fitted on the current set of variables, and the judgment above is repeated until the optimal model is found, which gives the final model and variables of the enterprise public information model.
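The 7:3 split with a preserved bad rate and the KS check can be sketched as follows. This is an illustration only: samples are assumed to be `(features, label)` pairs with label 1 for bad and 0 for good, the random seed is arbitrary, and the KS statistic is computed as the largest gap between the cumulative bad and good score distributions.

```python
import random


def stratified_split(samples, train_ratio=0.7, seed=0):
    """Split (features, label) pairs so each set keeps the overall bad rate."""
    rng = random.Random(seed)
    train, valid = [], []
    for label in (0, 1):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


def ks_statistic(scores, labels):
    """Largest gap between cumulative bad and good distributions by score.

    Both classes must be present, otherwise a division by zero occurs.
    """
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    pairs = sorted(zip(scores, labels))
    cum_good = cum_bad = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples tied at the same score before measuring the gap.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
        i = j
    return ks
```

Comparing `ks_statistic` on the training and validation sets gives a rough stability check: a large drop on the validation set suggests the current variable set is overfitted.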
(1.5) Linear score conversion
The scorecard score is converted linearly to a 0-100 scale, leaving the distribution characteristics unchanged; the conversion formula is as follows:
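The conversion formula itself is not reproduced in this text. One common linear map onto 0-100 is min-max scaling, sketched below as an assumption rather than as the patent's formula; the function name is hypothetical.

```python
def to_percent_scale(raw_scores):
    """Map raw scorecard scores linearly onto 0-100 (min-max scaling, assumed)."""
    lo, hi = min(raw_scores), max(raw_scores)
    if hi == lo:
        raise ValueError("all scores are identical; the scale is undefined")
    return [(score - lo) / (hi - lo) * 100 for score in raw_scores]
```

Because the map is linear and increasing, the rank order and the shape of the score distribution are preserved, which matches the requirement that the distribution characteristics stay unchanged. In production the bounds would normally be fixed from the model's theoretical minimum and maximum score rather than taken from each batch.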
(1.6) Model result presentation
The final modeling variables of the enterprise public information model, together with each variable's value and percentage score, are displayed in the business system.
TABLE 5 Enterprise public information model
(2) Enterprise credit information modeling
Enterprise credit information is derived from the enterprise credit report and includes credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and so on;
The model is built in layers: key variables are first screened with the analytic hierarchy process, then scores are assigned by expert scoring with the variables' business attributes, correlation, data coverage, and other factors taken into account, and the enterprise credit information is modeled to obtain an enterprise credit information model;
Based on the identification information of the enterprise information subject, the system queries whether the subject has a credit report. If one exists, the data with the latest report date in the database is used, and the field "year of first credit transaction" in the "credit prompt information unit" section of the report is checked. If the field is not empty, the enterprise credit information score is calculated with the enterprise credit information model. If the field is empty, or if no credit report exists, calculating the enterprise credit information score is not supported and the score is left empty.
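The decision flow above can be sketched as a single function. The dictionary keys `report_date` and `first_credit_year` and the callable `score_fn` are hypothetical stand-ins for the parsed credit report and the expert-scoring model.

```python
def credit_info_score(reports, score_fn):
    """Return the enterprise credit information score, or None when it cannot be computed.

    reports: list of dicts, each with a sortable "report_date" and a
             "first_credit_year" field (hypothetical key names).
    score_fn: callable that scores one report (stands in for the model).
    """
    if not reports:
        return None  # no credit report exists
    latest = max(reports, key=lambda r: r["report_date"])
    if latest.get("first_credit_year") in (None, ""):
        return None  # no credit transaction history to score
    return score_fn(latest)
```

Returning `None` instead of 0 keeps "no score available" distinct from a genuinely low score, which matters when the two model scores are later fused.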
Because enterprise credit information comes from enterprise credit reports, the data source is single and the structured data is standardized; it includes credit prompt information, loan transaction summary information, guarantee transaction summary information, loan account information, and so on. In the enterprise credit information modeling of step (2), modeling the enterprise credit information to obtain the enterprise credit information model specifically includes:
(2.1) Obtaining the feature variables of the enterprise credit information model through data cleaning and calculation
The enterprise credit information data are cleaned, important information dimensions are selected, and variables are processed and calculated. If the credit data table holds records from credit reports queried several times, the reports are sorted in descending order of generation time and the data of the most recent report is used as the modeling sample data;
Data cleaning re-examines and verifies the retrieved enterprise credit information data to find and correct errors in the data file and reduce the effect of erroneous data on model performance. Data cleaning mainly includes removing duplicate data, removing logically conflicting data, completing some single-variable calculations, handling noisy data and outliers, and handling missing values.
(2.2) Feature analysis
The statistical characteristics and distributions of the feature variables in the enterprise credit information model are analyzed, and extreme values are checked and handled;
The results of the feature analysis are compiled into a feature variable table that records each feature variable's name, calculation logic, data coverage, and basic data distribution.
(2.3) Weight of Evidence (WOE) analysis
The variable values of the feature variables are obtained through WOE transformation;
All feature variables are first binned automatically; the automatic binning result is then checked manually for reliability, consistency with business requirements, and interpretability, to decide whether manual binning is needed;
WOE for each category is defined as follows:
Here, the columns Bad Distribution and Good Distribution denote the distribution of "bad clients" and "good clients" across the categories, obtained by dividing the count in each category by the total number of "bad clients" or "good clients" respectively;
If the ratio in brackets is less than 1 then WOE is negative and vice versa WOE is positive.
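Computing WOE per category from binned samples can be sketched as below. The orientation ln(good distribution / bad distribution) is an assumption, since the formula is not reproduced in this text, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def woe_by_category(categories, labels):
    """WOE per category; labels use 1 for bad clients and 0 for good clients.

    Assumed orientation: ln(good distribution / bad distribution).
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for cat, label in zip(categories, labels):
        if label == 1:
            bad[cat] += 1
        else:
            good[cat] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())
    woe = {}
    for cat in set(good) | set(bad):
        if good[cat] == 0 or bad[cat] == 0:
            # A bin with no good or no bad clients has an undefined WOE;
            # such bins are usually merged during manual binning.
            raise ValueError(f"category {cat!r} needs merging with a neighbor")
        good_dist = good[cat] / total_good
        bad_dist = bad[cat] / total_bad
        woe[cat] = math.log(good_dist / bad_dist)
    return woe
```

Raising on empty bins, rather than silently smoothing them, surfaces exactly the cases where the manual review of the automatic binning described above is needed.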
(2.4) Scoring design
(2.4.1) Overall sample score distribution
The enterprise credit information model is built from expert scoring rules, and the score of each sample is obtained from the scoring interval and scoring rule of each variable. Analyzing the overall sample score distribution helps identify whether sample scores are concentrated, dispersed, or contain abnormal values; based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scoring are readjusted;
(2.4.2) Good and bad sample score distribution
The modeling target defines clients as good clients or bad clients according to the client management classification of the project's actual application party;
Good and bad samples are distinguished by the bad-definition label;
Because every sample has a model score, the score distribution is computed separately for good and bad samples to check whether the scores can tell them apart, that is, whether good samples concentrate in the high score range and bad samples in the low score range; based on the score results and the application scenarios of the financial leasing industry, the model's variables, variable values, and scoring are readjusted;
Step (2.3) WOE analysis and step (2.4) scoring design are optimized over multiple rounds until the sample scores separate good samples from bad samples as well as possible, which gives the final model and variables of the enterprise credit information model;
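The good/bad score distribution check can be sketched as a banded count. The 20-point band width, the assumption that scores lie on a 0-100 scale, and the function name are illustrative choices, not taken from the patent.

```python
def score_distribution(scores, labels, band_width=20):
    """Count good (0) and bad (1) samples per score band on a 0-100 scale.

    band_width is assumed to divide 100 evenly; a score of exactly 100
    is placed in the top band.
    """
    n_bands = 100 // band_width
    bands = [
        {"band": f"{i * band_width}-{(i + 1) * band_width}", "good": 0, "bad": 0}
        for i in range(n_bands)
    ]
    for score, label in zip(scores, labels):
        idx = min(int(score // band_width), n_bands - 1)
        bands[idx]["bad" if label == 1 else "good"] += 1
    return bands
```

If the scoring design works, the `bad` counts should pile up in the lower bands and the `good` counts in the upper bands; a flat or overlapping pattern signals that another optimization round is needed.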
(2.5) Model result presentation
The final modeling variables of the enterprise credit information model, together with each variable's value and percentage score, are displayed in the business system.
TABLE 6 Enterprise credit information model
(3) Credit data fusion scoring
The overall score of the enterprise public information model and the enterprise credit information score are combined according to the agreed weight rule and output to the business application system through an interface; the business system displays the final fusion score and the key feature variables.
In an actual business scenario, an enterprise public information score can be calculated for every enterprise information subject, but the enterprise credit information score may be unavailable because credit information is missing. Once the enterprise public information score and the enterprise credit information score have been processed and calculated in the system, the fusion score is calculated with a weight ratio of 4:6.
For example, if the enterprise public information model score is 80 and the enterprise credit information model score is 60, the credit data fusion score is 80×0.4+60×0.6=68.
For example, if the enterprise public information model score is 70 and there is no credit information data, the credit data fusion score is 70.
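The two examples above correspond to the following weighting rule. The function and parameter names are hypothetical; the 0.4 and 0.6 weights and the fallback to the public information score follow the examples.

```python
def fusion_score(public_score, credit_score=None, public_weight=0.4, credit_weight=0.6):
    """Fuse the two model scores; fall back to the public score alone
    when no credit information score is available."""
    if credit_score is None:
        return public_score
    return public_score * public_weight + credit_score * credit_weight
```

Exposing the weights as parameters leaves room for a different agreed weight rule in other application scenarios.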
The output of the credit data fusion scoring interface includes the enterprise name, unified social credit code, credit data fusion score, enterprise public information score, enterprise credit information score, the value and score of each variable of the enterprise public information model, and the value and score of each variable of the enterprise credit information model.
Although the invention has been described in terms of the preferred embodiment, it is not intended to limit the scope of the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411842297.6A CN119761901A (en) | 2024-12-13 | 2024-12-13 | A scoring modeling method based on the fusion credit data of enterprises in the financial leasing industry |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119761901A true CN119761901A (en) | 2025-04-04 |
Family
ID=95187045
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||