CN108876076A

CN108876076A - Personal credit scoring method and device based on instruction data

Info

Publication number: CN108876076A
Application number: CN201710322533.5A
Authority: CN
Inventors: 张湛梅; 张晓川; 徐睿; 崔志顺
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Guangdong Co Ltd
Priority date: 2017-05-09
Filing date: 2017-05-09
Publication date: 2018-11-23

Abstract

An embodiment of the invention discloses a personal credit scoring method and device based on instruction data. The method includes: obtaining a sample account group and selecting positive samples and negative samples from it according to preset rules; binning a first preset indicator group and obtaining the WOE value of each bin from the proportion of negative samples in that bin; obtaining an estimator of the parameters of a pre-built logistic regression model from the WOE values of the bins; configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model; and scoring the user's personal credit with the logistic regression model to obtain the user's personal credit score. The embodiment improves the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.

Description

Personal credit scoring method and device based on instruction data

Technical field

The embodiments of the present invention relate to the technical field of credit risk management, and in particular to a personal credit scoring method and device based on instruction data.

Background

As of the end of September 2015, the central bank's credit reporting system covered 870 million natural persons and 21.02 million enterprises and other organizations. The information it collects centers on bank credit information and also includes social security, housing provident fund, civil judgments and their enforcement, and public utility and communication payment records. In fact, mobile operators hold detailed user behavior data and user background information, as well as signaling data that captures user location, call and communication records, and payment and consumption records, and these data have long been incorporated into the national credit reporting system. Credit reporting is the field in which big data applications have the most practical value, and mobile operators have been exploring it continuously.

In the process of implementing the embodiments of the present invention, the inventors found that the personal credit scoring methods established by communication operators mainly consider factors such as the user's basic information, service subscription information, spending power, communication behavior, history of arrears and service suspension, and social circle. Because no factor is given particular emphasis, the resulting scores are not accurate.

Summary of the invention

An object of the embodiments of the present invention is to solve the problem in the prior art that scoring results are inaccurate because the factors considered in scoring have no emphasis.

An embodiment of the present invention proposes a personal credit scoring method based on instruction data, including:

obtaining a sample account group, and selecting positive samples and negative samples from the sample account group according to preset rules;

binning a first preset indicator group, and obtaining the WOE value corresponding to each bin according to the proportion of negative samples in each bin;

obtaining an estimator of the parameters of a pre-built logistic regression model according to the WOE value corresponding to each bin;

configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;

scoring the user's personal credit according to the logistic regression model to obtain the user's personal credit score.

Optionally, selecting positive samples and negative samples from the sample account group according to preset rules includes:

using the entropy method to determine the degree of dispersion of a second preset indicator group for each sample account;

selecting positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset indicator group of each sample account.

Optionally, configuring a penalty term for the estimator of the logistic regression model parameters includes:

analyzing the first preset indicator group to obtain a first indicator group related to signaling data and a second indicator group unrelated to signaling data;

constructing a penalty term between the coefficients of the indicators in the second indicator group and the coefficients of the indicators in the first indicator group;

configuring the penalty term into the estimator of the logistic regression model parameters.

Optionally, the penalty term is

where ψ₁ is the penalty coefficient, β_j is the coefficient of the j-th indicator in the second indicator group, and β_{k_n} is the coefficient of the k_n-th indicator in the first indicator group.

Optionally, scoring the user's personal credit according to the logistic regression model includes:

converting the logistic regression model into a scoring model under preset constraints;

using the data corresponding to the user's second preset indicator group as the input of the scoring model, and obtaining, for each indicator in the second preset indicator group, the score value of each of its bins;

obtaining the user's personal credit score according to the score values of the bins of each indicator.

An embodiment of the present invention proposes a personal credit scoring device based on instruction data, including:

an acquisition module, configured to obtain a sample account group and select positive samples and negative samples from the sample account group according to preset rules;

a binning module, configured to bin a first preset indicator group and obtain the WOE value corresponding to each bin according to the proportion of negative samples in each bin;

a modeling module, configured to obtain an estimator of the parameters of a pre-built logistic regression model according to the WOE value corresponding to each bin;

a configuration module, configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;

a scoring module, configured to score the user's personal credit according to the logistic regression model and obtain the user's personal credit score.

Optionally, the acquisition module is configured to use the entropy method to determine the degree of dispersion of the second preset indicator group of each sample account, and to select positive samples and negative samples from the sample account group according to that degree of dispersion.

Optionally, the configuration module is configured to analyze the first preset indicator group to obtain a first indicator group related to signaling data and a second indicator group unrelated to signaling data; to construct a penalty term between the coefficients of the indicators in the second indicator group and the coefficients of the indicators in the first indicator group; and to configure the penalty term into the estimator of the logistic regression model parameters.

Optionally, the penalty term is

Optionally, the scoring module is configured to convert the logistic regression model into a scoring model under preset constraints; to use the data corresponding to the user's second preset indicator group as the input of the scoring model and obtain, for each indicator in the second preset indicator group, the score value of each of its bins; and to obtain the user's personal credit score according to the score values of the bins of each indicator.

It can be seen from the above technical solution that the personal credit scoring method and device based on instruction data proposed in the embodiments of the present invention improve the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.

Description of the drawings

The features and advantages of the present invention will be understood more clearly with reference to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:

Fig. 1 shows a schematic flow chart of a personal credit scoring method based on instruction data provided by an embodiment of the present invention;

Fig. 2 shows a schematic flow chart of a personal credit scoring method based on instruction data provided by another embodiment of the present invention;

Fig. 3 shows a schematic structural diagram of a personal credit scoring device based on instruction data provided by an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 shows a schematic flow chart of a personal credit scoring method based on instruction data provided by an embodiment of the present invention. Referring to Fig. 1, the method can be implemented by a processor and specifically includes the following distinguishing technical features:

110. Obtain a sample account group, and select positive samples and negative samples from the sample account group according to preset rules;

It should be noted that a sample account here can be any information that uniquely identifies a user of an enterprise, such as the user's mobile phone number or user number. Positive samples and negative samples are then selected on the basis of division rules, which determine how many good users and bad users there are, or what percentage of the total sample each makes up.

There are many possible division rules, for example the golden section method.

120. Bin the first preset indicator group, and obtain the WOE value corresponding to each bin according to the proportion of negative samples in each bin;

It should be noted that binning is a relatively mature technique and is not described further here.

130. Obtain an estimator of the parameters of the pre-built logistic regression model according to the WOE value corresponding to each bin;

140. Configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and of other, non-signaling data to the logistic regression model;

It should be noted that, to focus the model's scoring factors on signaling data, the relationship between signaling data and other, non-signaling data needs to be constrained so that signaling data carries a larger weight in the logistic regression model than non-signaling data.

150. Score the user's personal credit according to the logistic regression model, and obtain the user's personal credit score.

It should be noted that, once the model has been built, the information on the user's first preset indicator group is used as the model input to obtain the user's personal credit score.

It can be seen that this embodiment improves the logistic regression model on the basis of signaling data so that signaling data carries a larger weight in the model, which makes the scoring more accurate than in the prior art.

The above steps are described in detail below:

First, the method for selecting positive and negative samples in step 110 may include the following steps:

Second, step 140 specifically includes:

The form of the penalty term is

Step 150 specifically includes:

The constraint involved in the conversion step may be the range of the score, for example 1-100.

It can be seen that this embodiment uses an adaptive logistic regression model based on signaling data for personal credit scoring and adaptively selects the indicators and coefficients that are effective for credit scoring. This keeps the personal credit scoring model stable when screening indicators, reflects the important role of signaling data, reduces the error of the model coefficients, and makes the scoring model more reasonable.

Fig. 2 shows a schematic flow chart of a personal credit scoring method based on instruction data provided by another embodiment of the present invention. The design principle of the present invention is described in detail below with reference to Fig. 2:

1. Design idea

This solution mainly optimizes the traditional logistic regression personal credit scoring model by using an adaptive logistic regression model based on signaling data. The main flow is as follows. First, the entropy method is used to extract positive and negative samples as standard sample data for subsequent scoring modeling, and indicators related to measuring personal credit are selected as the input variables for modeling, covering basic information, spending power, credit record, social connections, and behavioral preferences, as well as signaling data. Next, the indicators of the standard sample data are preprocessed by binning and computing WOE values. An adaptive logistic regression model based on signaling data is then built and trained adaptively with the signaling data, automatically selecting the indicators and coefficients that are effective for credit scoring. Finally, the regression model is converted into a personal credit scorecard used for personal credit scoring.

210. Extract standard sample data for scoring modeling;

To build a personal credit evaluation system, a set of standard samples must first be selected as a reference that distinguishes good users from bad users; the subsequent scoring model is analyzed on the basis of these data.

This technical solution scores users with the entropy method combined with arrears-related indicators and sorts the scores from high to low. The higher the score, the more severe the user's arrears and the higher the probability of default. The top 1% of users by score are therefore taken as bad users, i.e., positive samples; from the remaining users, a number equal to 10% of the total number of users is drawn at random as good users, i.e., negative samples. The specific steps are as follows:

1. Select the total number of service suspensions in the past three months, the total amount of arrears in the past three months, and the customer billing-period type as indicators; all of these measure the user's arrears and default behavior. Because the indicators have different value ranges, they need to be standardized to avoid overweighting any single indicator. The standardization formula is as follows:

where U_ij, i = 1,2,...,m, j = 1,2,3, is the i-th record of the j-th indicator in the original data, m is the total number of users, and V_ij is the standardized data.

2. Entropy values are computed to judge the degree of dispersion of the three indicators (total number of service suspensions in the past three months, total amount of arrears in the past three months, and customer billing-period type). The greater the dispersion, the greater the indicator's influence on the overall evaluation.

First, compute the entropy value of each indicator, which measures its degree of dispersion. The calculation formula is as follows:

where r_ij denotes the proportion of the j-th indicator in the i-th record

Then, compute the weight of each indicator, i.e., the coefficient by which each of the three indicators should be multiplied when the total score is calculated. The calculation formula is as follows:

where h_j is the difference coefficient of the j-th indicator, h_j = 1 - e_j, j = 1,2,3.

Finally, compute each user's entropy-method score from the indicator weights and indicator values

3. Sort the scores S_i from high to low; the higher the score, the more serious the arrears and default. The top 1% of users by score are taken as bad users, i.e., positive samples; from the remaining users, a number equal to 10% of the total number of users is drawn at random as good users, i.e., negative samples. The union of the positive and negative samples is the standard sample data used to build the credit scoring model.
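The three steps above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the min-max standardization, the normalizing constant 1/ln(m) in the entropy, and the weight w_j = h_j / Σ h_j follow the usual textbook entropy method and are assumptions here, and the function names `entropy_scores` and `select_samples` are hypothetical.

```python
import math
import random

def entropy_scores(rows):
    """Entropy-method score S_i per user; rows[i][j] is indicator j of user i (needs m > 1)."""
    m, k = len(rows), len(rows[0])
    cols = list(zip(*rows))
    # ASSUMPTION: min-max standardization to [0, 1].
    V = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0
        V.append([(u - lo) / span for u in col])
    h = []
    for col in V:
        total = sum(col) or 1.0
        r = [v / total for v in col]  # r_ij: share of record i within indicator j
        # e_j = -(1 / ln m) * sum_i r_ij * ln r_ij, with 0 * ln 0 treated as 0
        e = -sum(p * math.log(p) for p in r if p > 0) / math.log(m)
        h.append(1.0 - e)             # difference coefficient h_j = 1 - e_j
    h_sum = sum(h)
    w = [x / h_sum for x in h]        # ASSUMPTION: w_j = h_j / sum_j h_j
    return [sum(w[j] * V[j][i] for j in range(k)) for i in range(m)]

def select_samples(scores, bad_frac=0.01, good_frac=0.10, seed=0):
    """Top bad_frac by score -> bad users (positive samples);
    a random draw of good_frac of ALL users from the rest -> good users (negative samples)."""
    m = len(scores)
    order = sorted(range(m), key=lambda i: scores[i], reverse=True)
    n_bad = max(1, round(m * bad_frac))
    bad, rest = order[:n_bad], order[n_bad:]
    n_good = min(len(rest), round(m * good_frac))
    good = random.Random(seed).sample(rest, n_good)
    return bad, good
```

With 100 users, `select_samples` returns 1 bad user and 10 good users. The fixed seed only makes the random draw repeatable.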

220. Select indicators related to measuring personal credit and perform preprocessing such as binning;

Indicators that can comprehensively assess the user's credit status are selected. So that the subsequent scoring can be expressed as a scorecard that makes credit scores easy to evaluate, the indicators need to be binned to obtain WOE values.

To assess the user's credit status comprehensively, the user's signaling data is added to the five groups of indicators extracted from the traditional scoring perspective: basic information, spending power, credit record, social connections, and behavioral preferences. The signaling data considered here is mainly location information, namely the user's resident locations during the day and at night: users whose daytime resident location is a high-end office building or CBD and whose nighttime resident location is a high-end residential community tend to have better credit.

The user's basic information mainly includes brand, time on network, and identity. Spending power measures the level, grade, and activity of the user's communication spending and mainly includes the account balance, the fee included in the main plan, the total number of calls in the previous month, the average top-up amount over the previous three calendar months, and so on. The credit record measures the user's ability to meet obligations and includes the total arrears over the previous three calendar months, the number of days of one-way suspension in the previous calendar month, the number of days of two-way suspension in the previous calendar month, and so on. Social connections measure the strength of the user's social relationships, assessed from social influence and the credit scores of the people around the user, and include the number of high-frequency contact numbers, the average duration for high-frequency contact numbers, the number of close contacts, the average spending level of close contacts, and so on. Behavioral preferences measure how actively the user uses apps and which applications the user prefers, and include the top-1 preferred app type, the number of uses and the data traffic of social networking apps, the number of uses of e-commerce shopping apps, the number of uses of stock apps, and so on. For the user's signaling data, the main indicators are the number of times the resident location between 10:00 and 17:00 on working days is a high-end office building or CBD, and the number of times the resident location between 22:00 and 6:00 the next day is a high-end residential community.
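As a rough illustration of the two signaling indicators, the sketch below classifies location records into the two time windows and counts matches. Several things are assumptions: the record layout `(timestamp, location_type)`, the location labels `"office"`, `"cbd"`, and `"residential"`, the half-open window boundaries, treating Monday to Friday as working days, and counting individual records rather than per-day resident locations. The text does not say how a resident location is derived from raw signaling.

```python
from datetime import datetime

def signaling_window(ts):
    """Return 'day' (working day 10:00-17:00), 'night' (22:00-06:00), or None."""
    hour = ts.hour + ts.minute / 60
    if ts.weekday() < 5 and 10 <= hour < 17:   # Monday=0 ... Friday=4
        return "day"
    if hour >= 22 or hour < 6:
        return "night"
    return None

def count_premium_stays(records):
    """records: iterable of (timestamp, location_type); labels are hypothetical."""
    counts = {"day": 0, "night": 0}
    for ts, loc in records:
        window = signaling_window(ts)
        if window == "day" and loc in {"office", "cbd"}:
            counts["day"] += 1
        elif window == "night" and loc == "residential":
            counts["night"] += 1
    return counts
```

The two counts would then be binned and converted to WOE values like any other indicator.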

So that the subsequent scoring can be expressed as a scorecard, the indicators need to be binned. For a continuous indicator, a reasonable binning keeps the amount of data in each bin fairly balanced, neither too much nor too little, and the proportion of negative samples across the bins should rise or fall monotonically. The WOE value is used here: it measures the trend across the bins and also serves as the input variable of the subsequent regression model. Its calculation formula is as follows:

For a discrete indicator with few distinct values, each value can be used directly as a bin and its WOE value computed; when there are many values, some values can be merged before the corresponding WOE values are computed.
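A per-bin WOE computation might look like the following sketch. The sign convention (log of the bin's share of bad users over its share of good users) and the 0.5 smoothing for empty cells are assumptions made for illustration; the text only states that WOE is derived from the proportion of negative samples in each bin. The function name `woe_by_bin` is hypothetical.

```python
import math
from collections import Counter

def woe_by_bin(bins, labels):
    """Return {bin: WOE}; labels use 1 = bad user (positive sample), 0 = good user."""
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad_total, good_total = sum(bad.values()), sum(good.values())
    woe = {}
    for b in set(bins):
        # ASSUMPTION: WOE = ln(share of all bad users in bin / share of all good users in bin).
        # Empty cells are replaced by 0.5 so the logarithm stays finite.
        bad_share = (bad[b] or 0.5) / bad_total
        good_share = (good[b] or 0.5) / good_total
        woe[b] = math.log(bad_share / good_share)
    return woe
```

Checking that the WOE values rise or fall monotonically across ordered bins is one way to test the binning criterion described above.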

230. Use the signaling data to train the scoring model adaptively;

240. Automatically select the indicators and coefficients that are effective for credit scoring

First, an adaptive logistic regression model based on signaling data is built for personal credit scoring.

Logistic regression is widely used in credit scoring models; its structure is simple, and the effect of its coefficients is easy to explain in business terms.

Let P denote the probability that a user is a bad user; the logistic regression model can then be expressed as

where x_i (i = 1,2,...,s) are the indicators. P takes values between 0 and 1, and the logit transformation maps it onto the whole real line. The quantity to be solved for is β = (β₀,β₁,...,β_s)^T.
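For reference, the standard logit form that matches this description (P between 0 and 1, a linear predictor over x_1, ..., x_s, coefficient vector β) is given below. It is the textbook logistic regression model and is assumed, not confirmed, to be the formula intended at this point.

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P} = \beta_0 + \beta_1 x_1 + \cdots + \beta_s x_s ,
\qquad
P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_s x_s)}} .
```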

When predicting with logistic regression, all indicators can be entered into the model, but indicators that contribute little to the prediction then enter as well, which increases the bias of the model's predictions. The usual remedy is variable screening, such as forward selection, backward elimination, or stepwise regression, to remove indicators with no significant effect.

However, in stepwise regression with these traditional models, variable selection and parameter estimation are two separate stages, which makes model selection unstable. The adaptive logistic regression model based on signaling data uses the signaling data to perform variable selection and coefficient estimation adaptively and simultaneously, which effectively reduces the bias of the coefficient estimates.

The Adaptive-Lasso method is used first to solve the logistic regression model. The data are (X⁽ⁱ⁾,y⁽ⁱ⁾), i = 1,2,...,n, where X⁽ⁱ⁾ = (x_i1,...,x_is) is the WOE value vector of the i-th of the n records in the sample data, x_i1 is the WOE value corresponding to the first indicator of the i-th record, and y⁽ⁱ⁾ is the target variable: y⁽ⁱ⁾ = 1 if the i-th record is a positive sample and y⁽ⁱ⁾ = 0 if it is a negative sample. Under the Adaptive-Lasso method, the estimator of β = (β₀,β₁,...,β_s)^T is defined as

The first part of formula (2) represents the goodness of fit of the model and is the part solved in an ordinary logistic regression; the second part is the penalty on the coefficients, where λ_n is the penalty parameter. The penalty weights are built from the estimates of β_j obtained by least-squares estimation of formula (1). When |β_j| is large, the weight imposes a small penalty, which yields a small bias; when |β_j| is small, the weight imposes a large penalty and the coefficient shrinks to approximately 0, which achieves variable selection.
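A common way to write an Adaptive-Lasso estimator for logistic regression that matches this description (a fit term plus a weighted L1 penalty with parameter λ_n) is sketched below. The symbol \tilde{\beta}_j for the initial estimate and the weight exponent of 1 are assumptions, and the exact form in the source may differ.

```latex
\hat{\beta} = \arg\min_{\beta}\left\{
  -\sum_{i=1}^{n}\Big[\, y^{(i)}\eta_i - \ln\!\big(1 + e^{\eta_i}\big) \Big]
  + \lambda_n \sum_{j=1}^{s} \frac{|\beta_j|}{|\tilde{\beta}_j|}
\right\},
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{s} \beta_j x_{ij} .
```

The first sum is the negative log-likelihood of the logistic model. Dividing by the magnitude of the initial estimate makes the penalty on β_j small when that estimate is large and large when it is small, which is the behavior described above.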

At the same time, the solution process needs to use the signaling-data indicators to control the coefficients of the other indicators adaptively, so that the signaling-data indicators contribute a higher weight. A further penalty term is therefore added on top of the Adaptive-Lasso method.

Let k₁ and k₂ be the subscripts, among all indicators x_i (i = 1,2,...,s), of the two signaling indicators: the number of times the resident location between 10:00 and 17:00 on working days is a high-end office building or CBD, and the number of times the resident location between 22:00 and 6:00 the next day is a high-end residential community. That is, x_{k₁} denotes the number of times the resident location between 10:00 and 17:00 on working days is a high-end office building or CBD, and β_{k₁} denotes the coefficient corresponding to indicator x_{k₁}.

To ensure that the signaling-data indicators x_{k₁} and x_{k₂} contribute a higher weight, the differences between the β_j and the coefficients of these two indicators need to be controlled. A penalty term is added to control the coefficient values of indicators x_{k₁} and x_{k₂}; limiting the size of this term ensures that the coefficients of x_{k₁} and x_{k₂} must be larger than the coefficients of the other indicators, i.e., that the signaling-data indicators contribute a higher weight in the model. ψ₁ is the penalty coefficient.

In summary, the estimator of β = (β₀,β₁,...,β_s)^T for the adaptive logistic regression model based on signaling data is defined as
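To show how such an objective could be assembled, the sketch below adds a signaling penalty to the Adaptive-Lasso objective. The hinge-style penalty, which charges any non-signaling coefficient whose magnitude exceeds that of a signaling coefficient, is purely a hypothetical stand-in chosen for illustration; the actual penalty form is not recoverable from the text. The function name and all parameter names are likewise assumptions.

```python
import math

def penalized_objective(beta0, beta, X, y, beta_init, signal_idx, lam, psi):
    """Negative log-likelihood + adaptive-lasso penalty + HYPOTHETICAL signaling penalty.

    beta0: intercept; beta: coefficients; X: rows of WOE values; y: 0/1 labels;
    beta_init: initial coefficient estimates; signal_idx: positions of signaling indicators.
    """
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, xi))
        # ln(1 + e^eta), written in a numerically stable form
        nll += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - yi * eta
    # Adaptive-lasso term: small initial estimates receive large penalty weights.
    lasso = lam * sum(abs(b) / abs(b_init) for b, b_init in zip(beta, beta_init))
    # HYPOTHETICAL: charge each non-signaling coefficient for the amount by which
    # its magnitude exceeds the magnitude of each signaling coefficient.
    signal = set(signal_idx)
    gap = sum(max(abs(beta[j]) - abs(beta[k]), 0.0)
              for j in range(len(beta)) if j not in signal
              for k in signal)
    return nll + lasso + psi * gap
```

Minimizing this function over `beta0` and `beta` with a general-purpose optimizer would give coefficient estimates. A larger `psi` pushes the signaling coefficients to dominate the others.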

250、将回归模型转化为评分模型250. Transform the regression model into a scoring model

将回归系数转换为信用评分的形式是一个量表编制的过程，为了方便业务人员使用以及评分之间的差异具有业务意义，通常需要满足一下三点要求：Converting regression coefficients into credit scores is a process of scale compilation. In order to facilitate the use of business personnel and the differences between scores have business significance, the following three requirements are usually required:

1. The score stays within a fixed range, for example 0 to 900 points.

2. At a specific score, good users and bad users stand in a fixed ratio, measured here by the odds (the ratio of good users to bad users). For example, at a score of 600 the ratio of good users to bad users should be 50:1.

3. An increase in the score should reflect a change in the ratio of good users to bad users. For example, each 50-point increase in the score should double the odds.

The credit scoring equation commonly used in the industry is:

score = offset + factor × ln(odds)

To meet the three requirements above, the equation must satisfy the following two equalities:

a. score = offset + factor × ln(odds)

b. score + pdo = offset + factor × ln(2 × odds)

where pdo is the number of points by which the score must rise for the odds to double. Subtracting a from b gives pdo = factor × ln(2), hence

factor = pdo / ln(2), offset = score - factor × ln(odds).

The final scoring equation is therefore:

score = offset + factor × ln(odds)

Suppose the ratio of good users to bad users is 50:1 at a score of 600, and the score rises by 50 points each time the odds double. Then:

factor = 50 / ln(2) = 72.13, offset = 600 - 72.13 × ln(50) = 317.83

This gives the final scoring equation: score = 317.83 + 72.13 × ln(odds).
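The calibration above can be checked numerically. The following is a minimal sketch rather than the implementation of this embodiment; the function name `scorecard_scale` and its parameter names are illustrative assumptions. It solves equalities a and b for factor and offset, given a base score, the odds at that score, and pdo.

```python
import math


def scorecard_scale(base_score: float, base_odds: float, pdo: float) -> tuple[float, float]:
    """Solve score = offset + factor * ln(odds) for factor and offset.

    Uses the two conditions from the text: the base score is reached at
    base_odds, and doubling the odds adds pdo points.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


# Values from the worked example: 600 points at odds of 50:1, pdo = 50.
factor, offset = scorecard_scale(base_score=600, base_odds=50, pdo=50)

# Doubling the odds from 50 to 100 should raise the score from 600 to 650.
score_at_100 = offset + factor * math.log(100)
```

Carried at full precision, the offset comes out near 317.81; the value 317.83 in the text results from rounding factor to 72.13 before multiplying. The difference is well below one point.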

From the left-hand side of the logistic regression equation, -logit(P) = ln(odds). Substituting the estimator β̂ of β obtained in step 4 into the scoring equation gives:

Here x_i denotes the WOE value of the bin into which the value of the i-th variable (indicator) falls, and β̂_i is the regression model coefficient obtained from formula (3).
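As a hedged reconstruction of the substitution described above, assume the logistic model has the usual form logit(P) = β₀ + Σ β_i x_i, with P the probability of being a bad user. This form is an assumption, since the model definition lies outside this passage. Under it, -logit(P) = ln(odds) leads to:

```latex
\begin{aligned}
\text{score} &= \text{offset} + \text{factor}\times\ln(\text{odds}) \\
             &= \text{offset} - \text{factor}\times\Bigl(\hat\beta_0 + \sum_{i=1}^{s}\hat\beta_i x_i\Bigr) \\
             &= \bigl(\text{offset} - \text{factor}\times\hat\beta_0\bigr)
                - \sum_{i=1}^{s}\text{factor}\times\hat\beta_i\, x_i
\end{aligned}
```

Under this assumption the first bracket acts as a base score and each term of the sum is the contribution of one variable, which is what allows a score to be assigned to each bin.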

Therefore, the score of each bin of each variable can be obtained from the scoring formula.

Here WOE denotes the WOE value of the variable's bin.
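The per-bin scoring can be sketched as follows. This is an illustration under stated assumptions, not the formula of this embodiment: it assumes logit(P) = β₀ + Σ β_i × WOE_i with P the bad-user probability, so that each bin of variable i contributes -factor × β_i × WOE points and the intercept is folded into a base score. The function names are hypothetical, and all coefficient and WOE values below are made-up placeholders.

```python
import math

# Scale constants from the worked example (600 points at odds 50:1, pdo = 50).
FACTOR = 50 / math.log(2)
OFFSET = 600 - FACTOR * math.log(50)


def bin_points(beta: float, woe: float, factor: float = FACTOR) -> float:
    """Points contributed by one bin of one variable (assumed form: -factor * beta * WOE)."""
    return -factor * beta * woe


def total_score(beta0: float, betas: list[float], woes: list[float]) -> float:
    """Base score plus the points of the bin that each variable falls into."""
    base = OFFSET - FACTOR * beta0
    return base + sum(bin_points(b, w) for b, w in zip(betas, woes))


# Hypothetical intercept, coefficients and WOE values, for illustration only.
beta0, betas, woes = -3.0, [0.8, 1.2], [-0.5, 0.3]
score = total_score(beta0, betas, woes)

# Cross-check against score = offset + factor * ln(odds), with ln(odds) = -logit(P).
logit_p = beta0 + sum(b * w for b, w in zip(betas, woes))
assert math.isclose(score, OFFSET + FACTOR * (-logit_p))
```

Splitting the score into a base value plus per-bin points means a user's score can be read off a lookup table, one entry per variable, without evaluating the regression directly.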

For simplicity of description, the method embodiments are presented as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Fig. 3 shows a schematic structural diagram of a personal credit scoring device based on instruction data provided by an embodiment of the present invention. Referring to Fig. 3, the device includes an acquisition module 310, a binning module 320, a modeling module 330, a configuration module 340 and a scoring module 350, wherein:

the acquisition module 310 is configured to obtain a sample account group and select positive samples and negative samples from the sample account group according to preset rules;

the binning module 320 is configured to bin the first preset indicator group and obtain the WOE value of each bin according to the proportion of negative samples in that bin;

the modeling module 330 is configured to obtain the estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin;

the configuration module 340 is configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other, non-signaling data to the logistic regression model;

the scoring module 350 is configured to score the user's personal credit according to the logistic regression model and obtain the user's personal credit score.

After receiving an instruction to start scoring, the acquisition module 310 obtains the sample account group from a pre-established database, divides it into positive and negative samples, and sends the result to the binning module 320. The binning module 320 bins the received positive and negative samples using the first preset indicator group, obtains the WOE value of each bin, and sends these values to the modeling module 330. The modeling module 330 uses the received data to solve the pre-built model, obtains estimators of the model's unknown parameters, and sends the model to the configuration module 340. The configuration module 340 configures a penalty term for the model to constrain the contributions of signaling indicators and non-signaling indicators to the model, and sends the completed model to the scoring module 350. The scoring module 350 scores the user based on the completed model.
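The WOE step performed by the binning module can be sketched as follows. The WOE definition is not restated in this passage, so the sketch assumes one common convention, WOE = ln((negative share of the bin) / (positive share of the bin)), where each share is the bin's count divided by the class total. The function name and the example counts are hypothetical.

```python
import math


def woe_per_bin(neg_counts: list[int], pos_counts: list[int]) -> list[float]:
    """WOE of each bin under the assumed convention ln(negative share / positive share).

    neg_counts[i] and pos_counts[i] are the numbers of negative and positive
    samples in bin i. Every bin must contain at least one sample of each
    class, otherwise the logarithm is undefined.
    """
    neg_total = sum(neg_counts)
    pos_total = sum(pos_counts)
    woes = []
    for neg, pos in zip(neg_counts, pos_counts):
        if neg == 0 or pos == 0:
            raise ValueError("each bin needs at least one positive and one negative sample")
        woes.append(math.log((neg / neg_total) / (pos / pos_total)))
    return woes


# Hypothetical counts for three bins of one indicator.
woes = woe_per_bin(neg_counts=[30, 15, 5], pos_counts=[200, 300, 450])
```

With this convention, a bin holding a larger share of negative samples than of positive samples gets a positive WOE. If the opposite sign convention is used, the signs of the fitted coefficients flip accordingly.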

The functional modules of the device are described in detail below:

The acquisition module 310 is configured to judge, using the entropy method, the degree of dispersion of the second preset indicator group of each sample account, and to select positive samples and negative samples from the sample account group according to that degree of dispersion.

The configuration module 340 is configured to analyze the first preset indicator group to obtain a first indicator group related to signaling data and a second indicator group unrelated to signaling data; to construct a penalty term on the coefficients of the indicators in the second indicator group and the coefficients of the indicators in the first indicator group; and to configure the penalty term for the estimator of the logistic regression model parameters.

The scoring module 350 is configured to convert the logistic regression model into a scoring model under preset constraints; to take the data corresponding to the user's second preset indicator group as the input of the scoring model and obtain the score of each bin for each indicator in the second preset indicator group; and to obtain the user's personal credit score from the scores of the bins for each indicator.

As can be seen, compared with existing logistic regression personal credit scoring techniques, this technical solution improves the traditional logistic regression personal credit scoring model with an adaptive logistic regression model based on signaling data. This keeps the model stable during indicator selection, reflects the important role of signaling data, and reduces the error of the model coefficients, making the scoring model more reasonable.

In summary, the benefits of this solution compared with traditional personal credit scoring methods are as follows:

Because the device embodiments are substantially similar to the method embodiments, they are described only briefly; for related details, refer to the corresponding description of the method embodiments.

It should be noted that the components of the device of the present invention are divided logically according to the functions they implement. The present invention is not limited to this division, and the components may be re-divided or combined as needed.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. In this device, a PC remotely controls the equipment or device over the Internet and precisely controls each of its operating steps. The present invention may also be implemented as an equipment or device program (for example, a computer program and a computer program product) for performing part or all of the methods described herein. A program implementing the present invention in this way may be stored on a computer-readable medium, and the files or documents it produces can be aggregated statistically to generate data reports, cpk reports and the like, allowing batch testing and statistics on power amplifiers. It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A personal credit scoring method based on instruction data, characterized by comprising:

obtaining a sample account group, and selecting positive samples and negative samples from the sample account group according to preset rules;

binning a first preset indicator group, and obtaining the WOE value of each bin according to the proportion of negative samples in that bin;

obtaining an estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin;

configuring a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other, non-signaling data to the logistic regression model;

scoring the personal credit of a user according to the logistic regression model to obtain the user's personal credit score.

2. The method according to claim 1, characterized in that selecting positive samples and negative samples from the sample account group according to preset rules comprises:

judging the degree of dispersion of a second preset indicator group of each sample account using the entropy method;

selecting positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset indicator group of each sample account.

3. The method according to claim 1, characterized in that configuring a penalty term for the estimator of the logistic regression model parameters comprises:

analyzing the first preset indicator group to obtain a first indicator group related to signaling data and a second indicator group unrelated to signaling data;

constructing a penalty term on the coefficients of the indicators in the second indicator group and the coefficients of the indicators in the first indicator group;

configuring the penalty term for the estimator of the logistic regression model parameters.

4. The method according to claim 3, characterized in that the penalty term is

where ψ₁ is the penalty coefficient, β_j is the coefficient of the j-th indicator in the second indicator group, and β_{k_n} is the coefficient of the k_n-th indicator in the first indicator group.

5. The method according to any one of claims 1-4, characterized in that scoring the personal credit of the user according to the logistic regression model comprises:

converting the logistic regression model into a scoring model under preset constraints;

taking the data corresponding to the user's second preset indicator group as the input of the scoring model, and obtaining the score of each bin for each indicator in the second preset indicator group;

obtaining the user's personal credit score from the scores of the bins for each indicator.

6. A personal credit scoring device based on instruction data, characterized by comprising:

an acquisition module, configured to obtain a sample account group and select positive samples and negative samples from the sample account group according to preset rules;

a binning module, configured to bin a first preset indicator group and obtain the WOE value of each bin according to the proportion of negative samples in that bin;

a modeling module, configured to obtain an estimator of the parameters of a pre-built logistic regression model according to the WOE value of each bin;

a configuration module, configured to configure a penalty term for the estimator of the logistic regression model parameters, the penalty term being used to configure the contributions of signaling data and other, non-signaling data to the logistic regression model;

a scoring module, configured to score the personal credit of a user according to the logistic regression model and obtain the user's personal credit score.

7. The device according to claim 6, characterized in that the acquisition module is configured to judge the degree of dispersion of a second preset indicator group of each sample account using the entropy method, and to select positive samples and negative samples from the sample account group according to the degree of dispersion of the second preset indicator group of each sample account.

8. The device according to claim 6, characterized in that the configuration module is configured to analyze the first preset indicator group to obtain a first indicator group related to signaling data and a second indicator group unrelated to signaling data; to construct a penalty term on the coefficients of the indicators in the second indicator group and the coefficients of the indicators in the first indicator group; and to configure the penalty term for the estimator of the logistic regression model parameters.

9. The device according to claim 8, characterized in that the penalty term is

where ψ₁ is the penalty coefficient, β_j is the coefficient of the j-th indicator in the second indicator group, and β_{k_n} is the coefficient of the k_n-th indicator in the first indicator group.

10. The device according to any one of claims 6-9, characterized in that the scoring module is configured to convert the logistic regression model into a scoring model under preset constraints; to take the data corresponding to the user's second preset indicator group as the input of the scoring model and obtain the score of each bin for each indicator in the second preset indicator group; and to obtain the user's personal credit score from the scores of the bins for each indicator.