CN107578270A - Method, device and computing device for constructing a financial label - Google Patents
- Publication number: CN107578270A (application CN201710655552.XA)
- Authority: CN (China)
- Prior art keywords: user, consumption, transaction, matrix, dimension
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the present invention relate to the field of data processing, and in particular to a method, device and computing device for constructing a financial label. They address the problem that prior-art user-labeling methods cannot capture a user's consumption characteristics and therefore have low accuracy. In the embodiments, the transaction data and behavior log of a reference user are obtained; a consumption score matrix is constructed from the reference user's transaction data, each element of which is the reference user's consumption score in one transaction dimension; a vector space model of the reference user is established from the reference user's behavior log; for a behavior datum of the reference user, the word corresponding to that datum is mapped to one of the reference user's transaction dimensions in the consumption score matrix, and the reference user's composite score is determined from the behavior datum and the consumption score of the mapped transaction dimension; finally, the financial label of the reference user is determined from the reference user's composite score.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a method, device and computing device for constructing a financial label.
Background
A user label abstracts a labeled user-profile model from information such as a user's social attributes, living habits and consumption behavior. Such models are widely used in precision marketing, user and industry analysis, and personalized services. Published label-construction methods mainly filter a user's web browsing logs, extract key fields as label identifiers, and tag the user according to the label category to which each identifier belongs. They also compute the user's interest in each label by jointly evaluating when and how often the label appears, and use that interest as the weight of the user label.
Prior-art labeling methods use only the user's behavior log as the basis for labels, that is, they label users according to their browsing behavior on the Internet. First, label construction based on browsing behavior produces a large number of semantically redundant text labels, which makes it hard to pick out targeted labels for analysis from a massive, redundant label system. Second, owing to the particular nature of the financial and payment industries, many commonly used labels cannot be obtained directly from a user's behavior log. Labeling based only on behavior logs therefore often fails to capture a user's consumption characteristics, and its accuracy is low.
Summary of the invention
The present application provides a method, device and computing device for constructing a financial label, to solve the problem that prior-art user-labeling methods cannot capture a user's consumption characteristics and have low accuracy.
A method for constructing a financial label provided by an embodiment of the present invention includes:
obtaining transaction data and a behavior log of a reference user;
constructing a consumption score matrix from the reference user's transaction data, where each element of the consumption score matrix is the reference user's consumption score in one transaction dimension;
establishing a vector space model of the reference user from the reference user's behavior log, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior log;
for a behavior datum of the reference user, mapping the word corresponding to the behavior datum to one transaction dimension of the reference user in the consumption score matrix, and determining a composite score of the reference user from the behavior datum and the consumption score of the mapped transaction dimension;
determining the financial label of the reference user from the reference user's composite score.
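The text does not say how the words of a behavior log are weighted in the vector space model. As a minimal sketch, assuming the log has already been tokenized into words and that TF-IDF weighting is used (a common choice, not confirmed by the source; the smoothed IDF below is the variant scikit-learn applies with `smooth_idf=True`), each reference user's model can be built as a sparse word-to-weight map. The function name `build_vsm` is hypothetical.

```python
import math
from collections import Counter


def build_vsm(user_logs):
    """Build a TF-IDF vector space model, one sparse vector per reference user.

    user_logs maps a user id to the list of words extracted from that user's
    behavior log (tokenization is assumed to happen upstream).
    Returns a dict: user id -> {word: weight}.
    """
    n_users = len(user_logs)
    # Document frequency: number of users whose log contains each word.
    df = Counter()
    for words in user_logs.values():
        df.update(set(words))

    vsm = {}
    for user, words in user_logs.items():
        tf = Counter(words)
        total = len(words)
        vsm[user] = {
            # Smoothed IDF keeps words shared by every user at a non-zero weight.
            word: (count / total) * (math.log((1 + n_users) / (1 + df[word])) + 1)
            for word, count in tf.items()
        }
    return vsm


vsm = build_vsm({"a": ["loan", "loan", "travel"], "b": ["travel"]})
print(vsm["b"])  # {'travel': 1.0}
```

Each key of a user's map plays the role of one behavior datum; its weight is what later steps can combine with the consumption score of the dimension the word is mapped to.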
Optionally, constructing the consumption score matrix from the transaction data of all reference users includes:
for each reference user, using that user's transaction data to calculate the user's consumption status in different transaction dimensions, and calculating the user's consumption score in each transaction dimension from that consumption status;
constructing the consumption score matrix from the consumption scores of all reference users in every transaction dimension.
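Assembling the per-user, per-dimension scores into the matrix can be sketched as follows: rows are reference users, columns are transaction dimensions, and NaN marks a dimension in which a user has no transactions (the missing values that are completed later). The helper name and the dict-of-pairs input format are assumptions for illustration.

```python
import math


def build_score_matrix(scores, users, dimensions):
    """Arrange consumption scores into a users x dimensions matrix.

    scores maps (user, dimension) -> consumption score. Pairs with no
    transactions are absent from the dict and become NaN (missing values).
    """
    row = {u: i for i, u in enumerate(users)}
    col = {d: j for j, d in enumerate(dimensions)}
    matrix = [[math.nan] * len(dimensions) for _ in users]
    for (user, dim), value in scores.items():
        matrix[row[user]][col[dim]] = value
    return matrix


m = build_score_matrix(
    {("a", "cosmetics"): 0.8, ("b", "travel"): 0.3},
    users=["a", "b"],
    dimensions=["cosmetics", "travel"],
)
print(m)  # [[0.8, nan], [nan, 0.3]]
```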
Optionally, the consumption score of the reference user in one transaction dimension is calculated with the following formula:
where Score is the user's consumption score in the transaction dimension; θ is the weight of the transaction dimension; ω is the weighted average of the reference user's number of transactions and transaction amount in the dimension; υ is the mean consumption of all reference users in the dimension; σ is the variance of all reference users' consumption in the dimension; and the remaining variable is the ratio of the reference user's spending in the dimension to the reference user's total spending.
Optionally, after the consumption score matrix is constructed from the transaction data of all reference users, the method further includes:
completing the missing values in the consumption score matrix by matrix factorization.
Optionally, completing the missing values in the consumption score matrix by matrix factorization includes:
randomly generating a first parameter row vector and a second parameter row vector, where the number of elements of the first parameter row vector equals the number of rows of the consumption score matrix and the number of elements of the second parameter row vector equals the number of columns;
calculating the error of the consumption score matrix from the first and second parameter row vectors;
updating the first and second parameter row vectors according to the error, and repeating the error calculation from the updated vectors until the error converges;
determining the completed consumption score matrix from the first and second parameter row vectors.
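Because the first parameter vector has one entry per row and the second one entry per column, the steps above describe a rank-1 factorization in which each entry is approximated by p[i] * q[j]. The sketch below assumes squared error over the observed (non-NaN) entries, full-batch gradient descent, and convergence declared when the error changes by less than a small tolerance. The learning rate, tolerance, iteration cap, and the choice to overwrite only the missing entries are assumptions, not values from the source.

```python
import math
import random


def complete_rank1(matrix, lr=0.01, tol=1e-9, max_iter=20000, seed=0):
    """Fill NaN entries of matrix with p[i] * q[j] from a rank-1 factorization."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    # Randomly initialized first (per-row) and second (per-column) parameter vectors.
    p = [rng.uniform(0.1, 1.0) for _ in range(n_rows)]
    q = [rng.uniform(0.1, 1.0) for _ in range(n_cols)]
    observed = [(i, j) for i in range(n_rows) for j in range(n_cols)
                if not math.isnan(matrix[i][j])]

    prev_err = math.inf
    for _ in range(max_iter):
        err = 0.0
        grad_p = [0.0] * n_rows
        grad_q = [0.0] * n_cols
        for i, j in observed:
            e = matrix[i][j] - p[i] * q[j]
            err += e * e
            grad_p[i] -= 2 * e * q[j]
            grad_q[j] -= 2 * e * p[i]
        if abs(prev_err - err) < tol:  # error has converged
            break
        prev_err = err
        p = [pi - lr * g for pi, g in zip(p, grad_p)]
        q = [qj - lr * g for qj, g in zip(q, grad_q)]

    # Keep observed scores; fill only the missing ones from the factors.
    return [[matrix[i][j] if not math.isnan(matrix[i][j]) else p[i] * q[j]
             for j in range(n_cols)] for i in range(n_rows)]


m = [[1.0, 2.0, 3.0],
     [2.0, 4.0, math.nan]]
completed = complete_rank1(m)
print(round(completed[1][2], 2))  # approximately 6.0: row 2 is twice row 1
```

Returning the full product p[i] * q[j] for every cell would be an equally valid reading of the last step; keeping the observed scores avoids perturbing real data.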
Optionally, mapping the word corresponding to the behavior datum to one transaction dimension of the reference user in the consumption score matrix includes:
for each transaction dimension of the reference user, calculating the similarity between that transaction dimension and the word corresponding to the behavior datum;
determining, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to that word;
mapping the word corresponding to the behavior datum to the transaction dimension with the highest similarity.
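The similarity measure is not specified. A minimal sketch, assuming each word and each transaction dimension has already been embedded as a numeric vector (how those vectors are obtained is outside this text) and using cosine similarity, picks the highest-similarity dimension for each word. The `composite_scores` function is a hypothetical aggregation: the source says only that the composite score is determined from the behavior datum and the mapped dimension's consumption score, so multiplying the two and summing per dimension is an illustrative assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_word(word_vec, dimension_vecs):
    """Return the transaction dimension whose vector is most similar to the word."""
    return max(dimension_vecs, key=lambda d: cosine(word_vec, dimension_vecs[d]))


def composite_scores(behavior, word_vecs, dimension_vecs, dim_scores):
    """Hypothetical aggregation: behavior weight x mapped dimension score, summed per dimension.

    behavior: word -> weight from the user's vector space model.
    dim_scores: dimension -> this user's consumption score.
    """
    composite = {}
    for word, weight in behavior.items():
        dim = map_word(word_vecs[word], dimension_vecs)
        composite[dim] = composite.get(dim, 0.0) + weight * dim_scores[dim]
    return composite


dims = {"cosmetics": [1.0, 0.0], "travel": [0.0, 1.0]}
words = {"lipstick": [0.9, 0.1], "flight": [0.2, 0.8]}
print(map_word(words["lipstick"], dims))  # cosmetics
print(composite_scores({"lipstick": 0.5, "flight": 0.2}, words, dims,
                       {"cosmetics": 0.8, "travel": 0.4}))
```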
Optionally, there are multiple reference users;
determining the financial label of the reference user from the reference user's composite score includes:
determining, from all reference users and according to business rules and the reference users' background information, the reference users that belong to the same label class;
determining a prediction model for that label class from the composite scores of the reference users belonging to it;
obtaining a comprehensive label classification model from the prediction models of all label classes;
determining the financial label of the reference user from the reference user's composite score and the comprehensive label classification model.
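The form of the per-class prediction model is left open. As an illustrative stand-in (a swapped-in technique, not the model stated in the patent), the sketch treats each label class's model as the centroid of its members' composite-score vectors and combines the class models into a nearest-centroid classifier. The label names and vectors are invented for the example; `math.dist` requires Python 3.8 or later.

```python
import math


def fit_label_models(scores_by_label):
    """One 'prediction model' per label class: the mean composite-score vector.

    scores_by_label: label -> list of equal-length composite-score vectors of
    the reference users assigned to that class by business rules.
    """
    models = {}
    for label, vectors in scores_by_label.items():
        n = len(vectors)
        models[label] = [sum(column) / n for column in zip(*vectors)]
    return models


def classify(vector, models):
    """Combined classifier: assign the label whose centroid is nearest."""
    return min(models, key=lambda label: math.dist(vector, models[label]))


models = fit_label_models({
    "traveler": [[0.1, 0.9], [0.2, 0.8]],
    "beauty": [[0.9, 0.1], [0.8, 0.3]],
})
print(classify([0.85, 0.2], models))  # beauty
```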
Optionally, after the composite score of the reference user is determined from the behavior datum and the consumption score of the mapped transaction dimension, the method further includes:
determining the historical time at which a composite score matrix was formed, where the composite score matrix consists of the composite scores of all reference users;
calculating, from the historical time and the current time, the decayed composite score matrix at the current time;
the decayed composite score matrix being calculated with the following formula:
where α is the decay factor, t is the current time, T is the historical time, M(T) is the composite score matrix at the historical time, M(t) is the composite score matrix at the current time, and M'(t) is the decayed composite score matrix.
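The decay formula itself is not reproduced in this text. Purely as an assumption for illustration, the sketch uses an exponential form, M'(t) = M(t) + exp(-α(t - T))·M(T), in which the historical matrix is down-weighted according to how long ago it was formed and then added to the current matrix; the formula in the patent may differ.

```python
import math


def decayed_matrix(m_hist, m_now, alpha, t_hist, t_now):
    """ASSUMED form (not confirmed by the source):
    M'(t) = M(t) + exp(-alpha * (t - T)) * M(T).

    m_hist is M(T) and m_now is M(t); both are lists of equal-length rows.
    """
    weight = math.exp(-alpha * (t_now - t_hist))  # 1.0 when t == T, tends to 0 as t grows
    return [
        [now + weight * hist for now, hist in zip(row_now, row_hist)]
        for row_now, row_hist in zip(m_now, m_hist)
    ]


# With alpha = ln 2, a matrix formed one time unit ago counts half as much.
m = decayed_matrix([[2.0, 4.0]], [[1.0, 1.0]], alpha=math.log(2), t_hist=0, t_now=1)
print(m)
```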
A device for constructing a financial label, including:
an acquisition unit, configured to obtain transaction data and a behavior log of a reference user;
a transaction processing unit, configured to construct a consumption score matrix from the reference user's transaction data, where each element of the consumption score matrix is the reference user's consumption score in one transaction dimension;
a text processing unit, configured to establish a vector space model of the reference user from the reference user's behavior log, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior log;
a combined computing unit, configured to, for a behavior datum of the reference user, map the word corresponding to the behavior datum to one transaction dimension of the reference user in the consumption score matrix, and determine the reference user's composite score from the behavior datum and the consumption score of the mapped transaction dimension;
a labeling unit, configured to determine the financial label of the reference user from the reference user's composite score.
Optionally, the transaction processing unit is specifically configured to:
for each reference user, use that user's transaction data to calculate the user's consumption status in different transaction dimensions, and calculate the user's consumption score in each transaction dimension from that consumption status;
construct the consumption score matrix from the consumption scores of all reference users in every transaction dimension.
Optionally, the transaction processing unit is specifically configured to calculate the reference user's consumption score in one transaction dimension with the following formula:
where Score is the user's consumption score in the transaction dimension; θ is the weight of the transaction dimension; ω is the weighted average of the reference user's number of transactions and transaction amount in the dimension; υ is the mean consumption of all reference users in the dimension; σ is the variance of all reference users' consumption in the dimension; and the remaining variable is the ratio of the reference user's spending in the dimension to the reference user's total spending.
Optionally, the transaction processing unit is further configured to:
complete the missing values in the consumption score matrix by matrix factorization.
Optionally, the transaction processing unit is specifically configured to:
randomly generate a first parameter row vector and a second parameter row vector, where the number of elements of the first parameter row vector equals the number of rows of the consumption score matrix and the number of elements of the second parameter row vector equals the number of columns;
calculate the error of the consumption score matrix from the first and second parameter row vectors;
update the first and second parameter row vectors according to the error, and repeat the error calculation from the updated vectors until the error converges;
determine the completed consumption score matrix from the first and second parameter row vectors.
Optionally, the combined computing unit is specifically configured to:
for each transaction dimension of the reference user, calculate the similarity between that transaction dimension and the word corresponding to the behavior datum;
determine, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to that word;
map the word corresponding to the behavior datum to the transaction dimension with the highest similarity.
Optionally, there are multiple reference users;
the labeling unit is specifically configured to:
determine, from all reference users and according to business rules and the reference users' background information, the reference users that belong to the same label class;
determine a prediction model for that label class from the composite scores of the reference users belonging to it;
obtain a comprehensive label classification model from the prediction models of all label classes;
determine the financial label of the reference user from the reference user's composite score and the comprehensive label classification model.
Optionally, the combined computing unit is further configured to:
determine the historical time at which a composite score matrix was formed, where the composite score matrix consists of the composite scores of all reference users;
calculate, from the historical time and the current time, the decayed composite score matrix at the current time;
the decayed composite score matrix being calculated with the following formula:
where α is the decay factor, t is the current time, T is the historical time, M(T) is the composite score matrix at the historical time, M(t) is the composite score matrix at the current time, and M'(t) is the decayed composite score matrix.
A computing device, including:
a memory, configured to store program instructions;
a processor, configured to call the program instructions stored in the memory and, according to the obtained program, perform: obtaining transaction data and a behavior log of a reference user; constructing a consumption score matrix from the reference user's transaction data, where each element of the consumption score matrix is the reference user's consumption score in one transaction dimension; establishing a vector space model of the reference user from the reference user's behavior log, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior log; for a behavior datum of the reference user, mapping the word corresponding to the behavior datum to one transaction dimension of the reference user in the consumption score matrix, and determining the reference user's composite score from the behavior datum and the consumption score of the mapped transaction dimension; and determining the financial label of the reference user from the reference user's composite score.
In the embodiments of the present invention, some users are selected from all users as reference users, and the transaction data and behavior logs of the reference users are obtained. A consumption score matrix is constructed from the reference users' transaction data, each element of which is a reference user's consumption score in one transaction dimension. A vector space model of each reference user is also established from the reference user's behavior log; it includes multiple behavior data of the reference user, each corresponding to one word in the behavior log. The embodiments thus use not only the user's behavior log but also the user's transaction data as the basis for label construction, and fuse the two. Specifically, for a behavior datum of a reference user, the corresponding word is mapped to one of the reference user's transaction dimensions in the consumption score matrix, and the reference user's composite score is determined from the behavior datum and the consumption score of the mapped transaction dimension. Finally, the reference user's financial label is determined from the composite score.
By jointly evaluating users' behavioral preferences and transactions, the embodiments build labels that accurately describe users' financial preferences and consumption characteristics, solving the problem that labeling based on behavior logs alone often fails to capture users' consumption characteristics. Compared with the prior art, the embodiments construct user features by fusing behavior logs with transaction data. Because every user's consumption score data is determined jointly by the user's transaction details and behavior log, this approach reduces the cumulative error that traditional methods incur by computing the similarity between behavior-log texts, and increases the accuracy of label construction.
Description of drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for constructing a financial label according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a method for building a label in a specific embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for constructing a financial label according to an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 shows, by way of example, a schematic flowchart of a method for constructing a financial label according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step 101: obtain the transaction data and behavior log of a reference user.
Step 102: construct a consumption score matrix from the reference user's transaction data, where each element of the consumption score matrix is the reference user's consumption score in one transaction dimension.
Step 103: establish a vector space model of the reference user from the reference user's behavior log, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior log.
Step 104: for a behavior datum of the reference user, map the word corresponding to the behavior datum to one transaction dimension of the reference user in the consumption score matrix, and determine the reference user's composite score from the behavior datum and the consumption score of the mapped transaction dimension.
Step 105: determine the financial label of the reference user from the reference user's composite score.
In the embodiments of the present invention, a reference user's label is built from the reference user's transaction data and behavior log, so the reference user's transaction data must first be obtained and processed.
In step 102, constructing the consumption score matrix from the transaction data of all reference users includes:
for each reference user, using that user's transaction data to calculate the user's consumption status in different transaction dimensions, and calculating the user's consumption score in each transaction dimension from that consumption status;
constructing the consumption score matrix from the consumption scores of all reference users in every transaction dimension.
Specifically, the reference users' transaction data are obtained and each reference user's consumption status is calculated from them; the consumption status may be, for example, the reference user's number of transactions or transaction amount. To make the calculation more targeted, the embodiment aggregates each reference user's consumption status by transaction dimension. There are five transaction dimensions: transaction geography, transaction time period, transaction amount band, transaction channel, and merchant type. The reference user's consumption score in each transaction dimension is then calculated from the consumption status with the following formula:
其中,Score为所述用户在一个交易维度的消费评分,θ为所述交易维度的权重;ω为所述参考用户在所述交易维度的消费笔数和消费金额的加权平均值;υ为所有参考用户在所述交易维度的消费均值,σ为所有参考用户在所述交易维度的消费方差;为所述参考用户在所述交易维度的消费金额与所述参考用户的所有消费金额之和的比值。Among them, Score is the consumption score of the user in a transaction dimension, θ is the weight of the transaction dimension; ω is the weighted average of the number of transactions and the consumption amount of the reference user in the transaction dimension; υ is all The average consumption value of the reference user in the transaction dimension, σ is the consumption variance of all reference users in the transaction dimension; is the ratio of the consumption amount of the reference user in the transaction dimension to the sum of all consumption amounts of the reference user.
例如,计算用户a在交易时间段为交易商户类型为化妆品的消费评分。由公式1,θ为化妆品的权重,该值可根据相应的交易维度的业务情况来确定。ω为用户a在化妆品上的消费笔数和消费金额的加权平均值。υ为所有参考用户在化妆品上的消费均值。σ为所有参考用户在化妆品上的消费方差。为用户a在化妆品上的消费金额与用户a的所有消费金额之和的比值。For example, calculate user a's consumption score for the transaction merchant type of cosmetics during the transaction time period. According to formula 1, θ is the weight of cosmetics, and this value can be determined according to the business situation of the corresponding transaction dimension. ω is the weighted average of user a's consumption on cosmetics and the consumption amount. υ is the average consumption of all reference users on cosmetics. σ is the consumption variance of all reference users on cosmetics. is the ratio of user a's consumption amount on cosmetics to the sum of all user a's consumption amounts.
利用公式1,可以计算出一个参考用户在每个交易维度的消费评分,并根据所有参考用户的消费评分构建消费评分矩阵。Using Formula 1, the consumption score of a reference user in each transaction dimension can be calculated, and a consumption score matrix can be constructed based on the consumption scores of all reference users.
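The closed form of Formula 1 is not given in the text above, so the following Python sketch only assembles the ingredients it defines (ω, υ, σ and the spending share) into a user-by-dimension matrix and delegates the final combination to a caller-supplied `combine` function. The function and parameter names, the 0.5 count/amount weighting for ω, and computing υ and σ over all users' ω values (with 0 for untraded dimensions) are assumptions for illustration, not the patent's definitions.

```python
from statistics import mean, pvariance


def build_score_matrix(consumption, dims, theta, combine, count_weight=0.5):
    """Assemble a consumption score matrix (rows = users, columns = dims).

    consumption: {user: {dim: (n_transactions, amount)}}
    theta: {dim: weight of the transaction dimension}
    combine(theta, omega, upsilon, sigma, share) is a hypothetical hook that
    stands in for Formula 1, whose exact form is not reproduced here.
    """
    users = sorted(consumption)
    # omega: weighted average of transaction count and amount (weight is assumed)
    omega = {u: {d: count_weight * c + (1 - count_weight) * a
                 for d, (c, a) in consumption[u].items()}
             for u in users}
    matrix = []
    for u in users:
        total = sum(a for _, a in consumption[u].values())
        row = []
        for d in dims:
            if d not in consumption[u]:
                row.append(0.0)  # untraded dimension: missing value stored as 0
                continue
            column = [omega[v].get(d, 0.0) for v in users]
            upsilon, sigma = mean(column), pvariance(column)  # mean and variance over users
            share = consumption[u][d][1] / total if total else 0.0
            row.append(combine(theta[d], omega[u][d], upsilon, sigma, share))
        matrix.append(row)
    return users, matrix
```

Zero entries produced here are exactly the missing values that the matrix-completion step below is meant to fill.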
Because some reference users transact infrequently, they have no transactions in many dimensions, which leaves a large number of missing values in the consumption score matrix. Therefore, after the consumption score matrix is constructed from the transaction data of all reference users in step 102, the method further includes:
completing the missing values in the consumption score matrix by matrix factorization.
Specifically, completing the missing values in the consumption score matrix by matrix factorization includes:
randomly generating a first parameter row vector and a second parameter row vector, where the number of elements of the first parameter row vector equals the number of rows of the consumption score matrix and the number of elements of the second parameter row vector equals the number of columns of the consumption score matrix;
calculating the error of the consumption score matrix from the first and second parameter row vectors;
updating the first and second parameter row vectors according to the error, and repeating the step of calculating the error of the consumption score matrix from the first and second parameter row vectors until the error converges; and
determining the completed consumption score matrix from the first and second parameter row vectors.
The consumption scores of all reference users in each dimension are represented by the score matrix $M_{R\times N}$, in which missing values are replaced by 0. Here $1 \le r \le R$ is the user index (there are R reference users in total), and $1 \le n \le N$ is the transaction-dimension index (each user has N transaction dimensions). ψ(r) denotes the user parameter vector and ψ(n) the dimension parameter vector. The goal is to find parameter vectors ψ(r) and ψ(n) such that $M_{R\times N} = \psi(r)^{\mathsf T}\psi(n)$. The procedure is as follows:
Step 1: input the score matrix $M_{R\times N}$ and randomly initialize the user parameter vector ψ(r) and the dimension parameter vector ψ(n). The user parameter vector is the first parameter vector and the dimension parameter vector is the second. The user parameter vector has one element per reference user, so its length equals the number of rows of $M_{R\times N}$; correspondingly, the dimension parameter vector has one element per transaction dimension, so its length equals the number of columns of $M_{R\times N}$.
Step 2: calculate the error between the non-zero elements of $M_{R\times N}$ and $\psi(r)^{\mathsf T}\psi(n)$, that is, $\varepsilon_{R\times N} = M_{R\times N} - \psi(r)^{\mathsf T}\psi(n)$ (Formula 2), evaluated only at the non-zero (observed) entries.
Step 3: update the parameter vectors ψ(r) and ψ(n) according to the error:
$\psi(r) \leftarrow \psi(r) + \alpha\,[\varepsilon_{R\times N}\,\psi(n) - \lambda_r\,\psi(r)]$ (Formula 3)
$\psi(n) \leftarrow \psi(n) + \alpha\,[\varepsilon_{R\times N}^{\mathsf T}\,\psi(r) - \lambda_n\,\psi(n)]$ (Formula 4)
Here α and λ are learning (update) rates. α is the overall rate, typically 0.05. Because the user parameter vector ψ(r) is usually long and the dimension parameter vector ψ(n) usually short, separate coefficients λ are introduced to distinguish how the two vectors are updated: $\lambda_r$ applies to the user vector ψ(r) and is usually smaller than α, while $\lambda_n$ applies to the dimension vector ψ(n) and is usually larger than α.
Step 4: repeat Steps 2 and 3 until the error matrix $\varepsilon_{R\times N}$ is stable, meaning that the mean absolute difference between corresponding elements of the previous and the current error matrix is smaller than a fixed threshold.
Step 5: output the completed score matrix.
In this embodiment, in addition to processing the reference user's transaction data as a basis for building labels, the reference user's behavior data are also used. Further, in step 104, mapping the word corresponding to the behavior data to a transaction dimension of the reference user in the consumption score matrix includes:
for each transaction dimension of the reference user, calculating the similarity between that transaction dimension and the word corresponding to the behavior data;
determining, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to the word corresponding to the behavior data; and
mapping the word corresponding to the behavior data to the transaction dimension with the highest similarity.
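The three mapping steps above amount to an argmax over transaction dimensions. In this sketch, `similarity` is a hypothetical hook (for example, the HowNet/corpus similarity described next), and the similarity table is invented toy data for illustration only.

```python
def map_to_dimension(word, dimensions, similarity):
    """Return (best_dimension, similarity) for one behavior word.

    similarity(word, dimension) -> float is a hypothetical scoring hook.
    """
    best = max(dimensions, key=lambda dim: similarity(word, dim))
    return best, similarity(word, best)


# Toy similarity table, purely illustrative
SIMS = {("grilled fish", "dining"): 0.8, ("grilled fish", "cosmetics"): 0.05}
```

Returning the similarity together with the chosen dimension is a design choice: that value is the weight s used when the word's score is fused into the dimension's score.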
Specifically, the reference user's behavior data must be processed, in a way similar to existing methods for processing users' text data. The same reference user's log information, such as browsing records, comments, and behavior logs, is collected, cleaned, and filtered, and all of it is aggregated into a single document for that user. A text vector space model of the reference user is then built with TF-IDF (term frequency–inverse document frequency, a weighting technique commonly used in information retrieval and data mining); the model contains each of the reference user's behavior words and the corresponding behavior data.
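A minimal TF-IDF sketch of the per-user vector space model, assuming the aggregated logs have already been segmented into words (Chinese text would need a word segmenter, not shown). It uses tf = count / document length and idf = ln(n_docs / document frequency), which is one common variant among several; the patent does not specify which variant it uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: {user_id: [token, ...]}, one aggregated document per reference user.

    Returns {user_id: {token: tf-idf weight}}.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each token once per document
    vectors = {}
    for uid, tokens in docs.items():
        tf = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors[uid] = {w: (c / length) * math.log(n_docs / df[w])
                        for w, c in tf.items()}
    return vectors
```

A word that appears in every user's document (such as a generic page name) gets weight 0, which is why TF-IDF is preferred over raw counts for characterizing individual users.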
Next, the reference user's behavior data are fused with the transaction dimensions. The behavior data are mapped onto the transaction dimensions by similarity: the relatedness between the reference user's behavior words and each transaction dimension can be calculated with HowNet and a news-classification corpus, and specifically the similarity learned from the corpus can be combined with the HowNet-based similarity to give the final similarity. News categories can be mapped to transaction dimensions by a weighted operation in which the weight is the text similarity between the transaction dimension and the behavior word.
For example, if the reference user's initial consumption score in the transaction dimension "dining", calculated from the transaction data, is $\text{Score}_{\text{dining}} = 0.1$, then the behavior data for all of the user's behavior words similar to "dining" (for example, "grilled fish") are weighted onto the dining score, for example as follows:
$\text{Score}_{\text{total}} = \text{Score}_{\text{dining}} + s \times \text{Score}_{\text{grilled fish}}$ (Formula 6)
where $\text{Score}_{\text{grilled fish}}$ is the score of "grilled fish" computed by TF-IDF, and s is the text similarity between "grilled fish" and "dining".
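A small sketch of Formula 6, extended to sum over every behavior word mapped to the same dimension (the text speaks of "all" similar words, so the sum is a natural reading but still an assumption). The linear blend in `combined_similarity` and its `beta=0.5` default are hypothetical; the text only says the corpus and HowNet similarities are combined.

```python
def combined_similarity(corpus_sim, hownet_sim, beta=0.5):
    """Blend corpus-learned and HowNet-based similarity (linear form assumed)."""
    return beta * corpus_sim + (1 - beta) * hownet_sim


def fused_score(dimension_score, mapped_words):
    """Formula 6 over several words: Score_dim + sum(s * Score_word).

    mapped_words: [(similarity s, TF-IDF score of the word), ...] for the
    words already mapped to this dimension.
    """
    return dimension_score + sum(s * w for s, w in mapped_words)
```

With the example above, `fused_score(0.1, [(0.8, 0.5)])` gives 0.1 + 0.8 × 0.5 = 0.5, assuming s = 0.8 and a TF-IDF score of 0.5 for "grilled fish".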
Because a reference user's consumption status is time-series data that changes over time, the reference user's comprehensive score needs to be decayed. After the reference user's comprehensive score is determined from the behavior data and the consumption score of the mapped transaction dimension, the method further includes:
determining the historical time at which the comprehensive score matrix was formed, the comprehensive score matrix consisting of the comprehensive scores of all reference users;
calculating, from the historical time and the current time, the decayed comprehensive score matrix at the current time;
the decayed comprehensive score matrix being calculated by the following formula:
where α is the decay factor, t is the current time, T is the historical time, M(T) is the comprehensive score matrix at the historical time, M(t) is the comprehensive score matrix at the current time, and M'(t) is the decayed comprehensive score matrix.
Finally, labels are built for users from the comprehensive score matrix; the main question is how to build the classifier. Because label building is a statistical process, the classifier is built from many reference users, and the more reference users there are, the more accurate the resulting classifier.
Determining the reference user's financial label according to the reference user's comprehensive score includes:
determining, according to business rules and the reference users' background information, the reference users that belong to the same label category;
determining the prediction model of that label category from the comprehensive scores of the reference users belonging to it;
obtaining a comprehensive label classification model from the prediction models of the label categories; and
determining the reference user's financial label from the reference user's comprehensive score and the comprehensive label classification model.
Specifically, all reference users are labeled according to business rules and the reference users' background information. For example, the label system built for reference users can fall into four main categories: (1) demographic attributes, such as gender, age, and consumption level; (2) status attributes, such as whether the card limit should be raised and whether the user owns a house, owns a car, or has children; (3) transaction attributes, such as whether the user often enters a wrong password or often has an insufficient balance; and (4) financial and consumption preferences, such as wealth management, insurance, digital products, cash withdrawal, and heavy use of Cloud QuickPass (云闪付). Labels for demographic and status attributes are built jointly from business knowledge and the reference users' objective background information. Taking the gender label as an example, 50% of the training data are determined by business rules: if a reference user consumes frequently in dimensions such as "tobacco and alcohol" and "menswear", the user is considered male, and if the user consumes frequently in dimensions such as "cosmetics" and "beauty", the user is considered female. The other 50% of the training data are obtained from the reference users' background information.
For consumption-preference labels, the validation set is built by checking whether each reference user actually consumed in the corresponding dimension within one month.
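A minimal sketch of the business rule for the gender label described above, assuming consumption frequency is measured as a transaction count per period and that "frequently" means at least `min_count` transactions. The dimension keys and the threshold are hypothetical; users for which the rule returns `None` would be labeled from background information instead.

```python
def rule_based_gender(freq, min_count=5):
    """freq: {dimension: number of transactions in the period}.

    Returns "male", "female", or None when the rule is inconclusive.
    """
    male = sum(freq.get(d, 0) for d in ("tobacco_alcohol", "menswear"))
    female = sum(freq.get(d, 0) for d in ("cosmetics", "beauty"))
    if male >= min_count and male > female:
        return "male"
    if female >= min_count and female > male:
        return "female"
    return None
```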
According to the reference users' labels, a number of reference users are selected from all reference users in the comprehensive score matrix and added to the training set as training samples, and the weight of each training sample is determined from the number of training samples. All training samples are randomly divided into k groups, and the training sample with the largest weight in each group is taken as a test sample; a classifier is determined from the test samples. The classifier is used to classify all training samples, and the results are compared with the training samples' labels to determine the classifier's error rate and classifier weight. The weights of all training samples are updated according to the classifier weight, and the steps from the random division into k groups onward are repeated until the number of classifiers exceeds a threshold. All classifiers are then linearly weighted to obtain a comprehensive classifier, which is used to classify all reference users; according to the classification results, some reference users are added to the training set as training samples, and the steps from determining the training-sample weights onward are repeated until the training samples in the training set no longer change. The comprehensive classifier at that point is taken as the financial classifier and used to build financial labels for the users to be classified.
The label-building process above is described in detail below with a specific embodiment. As shown in Fig. 2, the steps are:
Step 201: initialization. Select n of the reference users as training samples and add them to the training set $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $x_n$ is the vector of the n-th reference user in the comprehensive score matrix and $y_n$ is the label of the n-th reference user. Initialize the weight of every training sample in the training set to the same value 1/n, i.e. $W = (w_1, \dots, w_n)$ with $w_j = 1/n$.
Step 202: build a classifier by random sampling. Randomly divide all training samples into k groups, take the training sample with the largest weight W in each of the k groups as a test sample, and train the classifier $G_i$ on these test samples.
Step 203: weight update. Classify each training sample in T with classifier $G_i$, compare each sample's predicted class with its label to obtain the per-sample error, and from these compute the error rate $e_i$ of $G_i$; the weight of $G_i$ is then computed by the following formula:
When the error rate $e_i \le 0.5$, the smaller the classifier's error rate, the larger its weight.
Step 204: determine whether the number of classifiers built so far exceeds a threshold; if so, go to Step 206, otherwise go to Step 205.
Step 205: update the training-sample weights with the following formula, then return to Step 202:
where $\delta_j$ is a binary indicator: $\delta_j = 1$ when $G_i$ classifies training sample $x_j$ correctly, and $\delta_j = 0$ otherwise.
Step 206: linearly weight all classifiers to obtain the comprehensive classifier.
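The inner loop of Steps 201 to 206 resembles AdaBoost. Because the classifier-weight and sample-weight formulas are not reproduced in the text above, this sketch substitutes the standard AdaBoost choices, $a_i = \tfrac12\ln\frac{1-e_i}{e_i}$ and $w_j \leftarrow w_j e^{\mp a_i}$ with renormalization, and a one-feature threshold rule (decision stump) as the base learner. Both substitutions, the weighted error rate, and all names are assumptions, not the patent's stated method. Labels are +1/-1.

```python
import math
import random


def train_stump(X, y):
    """Hypothetical base learner: best single-feature threshold rule on (X, y)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                errs = sum(1 for x, t in zip(X, y)
                           if (sign if x[f] >= thr else -sign) != t)
                if best is None or errs < best[0]:
                    best = (errs, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: sign if x[f] >= thr else -sign


def boost(X, y, k=3, n_classifiers=5, seed=0):
    rng = random.Random(seed)
    n = len(X)
    w = [1.0 / n] * n                      # Step 201: uniform initial weights
    ensemble = []
    while len(ensemble) < n_classifiers:   # Step 204: stop at the threshold count
        # Step 202: random split into k groups, keep the heaviest sample of each
        idx = list(range(n))
        rng.shuffle(idx)
        groups = [idx[g::k] for g in range(k)]
        picked = [max(g, key=lambda j: w[j]) for g in groups if g]
        G = train_stump([X[j] for j in picked], [y[j] for j in picked])
        # Step 203: weighted error rate and AdaBoost-style classifier weight (assumed)
        correct = [G(X[j]) == y[j] for j in range(n)]
        e = sum(w[j] for j in range(n) if not correct[j])
        e = min(max(e, 1e-10), 1 - 1e-10)  # avoid log(0) for perfect or useless rules
        a = 0.5 * math.log((1 - e) / e)
        ensemble.append((a, G))
        # Step 205: shrink weights of correct samples, grow the others, renormalize
        w = [w[j] * math.exp(-a if correct[j] else a) for j in range(n)]
        z = sum(w)
        w = [wj / z for wj in w]
    # Step 206: linear weighting of all classifiers
    return lambda x: 1 if sum(a * G(x) for a, G in ensemble) >= 0 else -1
```

Under this formula the weight is positive exactly when $e_i < 0.5$ and grows as $e_i$ shrinks, which matches the behavior stated for Step 203.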
Step 207: use the comprehensive classifier obtained in Step 206 to classify all reference users and, according to the classification results, add the reference users with higher scores to the training set.
Step 208: determine whether the training samples in the training set have changed; if so, return to Step 201, otherwise go to Step 209.
Step 209: take the comprehensive classifier whose training samples no longer change as the financial classifier, use it to build financial labels for the users to be classified, and use the financial classifier's accuracy as the label weight.
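The outer loop of Steps 207 and 208 is a self-training scheme: retrain, add confidently high-scoring users, and stop when the training set stops growing. In this sketch `fit` is a hypothetical hook that performs Steps 201 to 206 and returns a real-valued scoring function; adding high scorers as positives and the 0.8 threshold are assumptions, and Step 209's accuracy-as-weight computation is not shown.

```python
def self_train(users, seed_labels, fit, threshold=0.8, max_rounds=20):
    """users: {user_id: feature vector}; seed_labels: {user_id: +1 or -1}.

    fit(X, y) -> score(x) is a hypothetical hook wrapping Steps 201-206.
    Returns the final scoring function and the final training labels.
    """
    labels = dict(seed_labels)
    score = None
    for _ in range(max_rounds):
        ids = sorted(labels)
        score = fit([users[i] for i in ids], [labels[i] for i in ids])
        # Step 207: add high-scoring reference users to the training set
        grown = dict(labels)
        for uid, x in users.items():
            if uid not in grown and score(x) >= threshold:
                grown[uid] = 1
        # Step 208: stop once the training set no longer changes
        if grown == labels:
            break
        labels = grown
    return score, labels
```

The `max_rounds` cap is a safeguard added here, since the text itself only stops when the training set no longer changes.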
By fusing users' behavior data with their transaction data, this embodiment constructs intermediate feature data, learns user labels with a machine learning method, and obtains a confidence for each user label. Compared with the prior art, the transaction-based data of all reference users are computed directly from their transaction details, which reduces the cumulative error that traditional methods incur by computing similarity between texts, so the final labels are more accurate. In addition, whereas traditional methods generate labels from frequent fields, this embodiment builds the label system directly from business rules, so the correctness of the labels can be verified during generation and the resulting labels are better targeted. Finally, instead of using a label's frequency of occurrence as its weight, as traditional methods do, this embodiment constructs training data, computes label confidence with a machine learning algorithm, and uses that confidence as the label weight. This makes the labels much easier to use downstream: label users can obtain a target population by choosing the corresponding confidence level.
Fig. 3 shows a schematic structural diagram of a financial label construction device provided by an embodiment of the present invention.
As shown in Fig. 3, the financial label construction device provided by the embodiment includes:
an acquisition unit 301, configured to acquire the transaction data and behavior logs of the reference user;
a transaction processing unit 302, configured to construct a consumption score matrix from the reference user's transaction data, where an element of the consumption score matrix is the reference user's consumption score in one transaction dimension;
a text processing unit 303, configured to build a vector space model of the reference user from the reference user's behavior logs, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior logs;
a combination calculation unit 304, configured to, for one behavior datum of the reference user, map the word corresponding to the behavior datum to a transaction dimension of the reference user in the consumption score matrix, and determine the reference user's comprehensive score from the behavior datum and the consumption score of the mapped transaction dimension; and
a labeling unit 305, configured to determine the reference user's financial label from the reference user's comprehensive score.
Optionally, the transaction processing unit 302 is specifically configured to:
for each reference user, calculate the reference user's consumption status in different transaction dimensions from the reference user's transaction data, and calculate the reference user's consumption score in each transaction dimension from that consumption status; and
construct the consumption score matrix from the consumption scores of all reference users in each transaction dimension.
Optionally, the transaction processing unit 302 is specifically configured to calculate the reference user's consumption score in one transaction dimension with the following formula:
where Score is the reference user's consumption score in one transaction dimension; θ is the weight of that transaction dimension; ω is the weighted average of the reference user's number of transactions and amount spent in that dimension; υ is the mean consumption of all reference users in that dimension; σ is the variance of the consumption of all reference users in that dimension; and the formula also contains a term equal to the ratio of the reference user's amount spent in that dimension to the reference user's total amount spent.
Optionally, the transaction processing unit 302 is further configured to:
complete the missing values in the consumption score matrix by matrix factorization.
Optionally, the transaction processing unit 302 is specifically configured to:
randomly generate a first parameter row vector and a second parameter row vector, where the number of elements of the first parameter row vector equals the number of rows of the consumption score matrix and the number of elements of the second parameter row vector equals the number of columns of the consumption score matrix;
calculate the error of the consumption score matrix from the first and second parameter row vectors;
update the first and second parameter row vectors according to the error, and repeat the step of calculating the error of the consumption score matrix from the first and second parameter row vectors until the error converges; and
determine the completed consumption score matrix from the first and second parameter row vectors.
Optionally, the combination calculation unit 304 is specifically configured to:
for each transaction dimension of the reference user, calculate the similarity between that transaction dimension and the word corresponding to the behavior data;
determine, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to the word corresponding to the behavior data; and
map the word corresponding to the behavior data to the transaction dimension with the highest similarity.
Optionally, there are multiple reference users, and
the labeling unit 305 is specifically configured to:
determine, according to business rules and the reference users' background information, the reference users that belong to the same label category;
determine the prediction model of that label category from the comprehensive scores of the reference users belonging to it;
obtain a comprehensive label classification model from the prediction models of the label categories; and
determine the reference user's financial label from the reference user's comprehensive score and the comprehensive label classification model.
Optionally, the combination calculation unit 304 is further configured to:
determine the historical time at which the comprehensive score matrix was formed, the comprehensive score matrix consisting of the comprehensive scores of all reference users;
calculate, from the historical time and the current time, the decayed comprehensive score matrix at the current time;
the decayed comprehensive score matrix being calculated by the following formula:
where α is the decay factor, t is the current time, T is the historical time, M(T) is the comprehensive score matrix at the historical time, M(t) is the comprehensive score matrix at the current time, and M'(t) is the decayed comprehensive score matrix.
An embodiment of the present invention provides a computing device, which may specifically be a desktop computer, a portable computer, a smartphone, a tablet computer, a personal digital assistant (PDA), or the like. The computing device may include a central processing unit (CPU), memory, and input/output devices; the input devices may include a keyboard, a mouse, a touch screen, and the like, and the output devices may include a display device such as a liquid crystal display (LCD) or a cathode ray tube (CRT).
The memory may include read-only memory (ROM) and random access memory (RAM) and provides the processor with the program instructions and data stored in it. In this embodiment, the memory may be used to store the program of the financial label construction method.
The processor calls the program instructions stored in the memory and executes, according to the obtained program instructions: acquiring the transaction data and behavior logs of the reference user; constructing a consumption score matrix from the reference user's transaction data, where an element of the consumption score matrix is the reference user's consumption score in one transaction dimension; building a vector space model of the reference user from the reference user's behavior logs, where the vector space model includes multiple behavior data of the reference user and each behavior datum corresponds to one word in the reference user's behavior logs; for one behavior datum of the reference user, mapping the word corresponding to the behavior datum to a transaction dimension of the reference user in the consumption score matrix, and determining the reference user's comprehensive score from the behavior datum and the consumption score of the mapped transaction dimension; and determining the reference user's financial label from the reference user's comprehensive score.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (17)
- 1. A method for constructing a financial label, characterized by comprising: obtaining transaction data and a behavior log of a reference user; constructing a consumption scoring matrix according to the transaction data of the reference user, wherein one element of the consumption scoring matrix is the consumption score of the reference user in one transaction dimension; establishing a vector space model of the reference user according to the behavior log of the reference user, wherein the vector space model includes a plurality of behavior data items of the reference user, each corresponding to one word in the behavior log of the reference user; for one behavior data item of the reference user, mapping the word corresponding to the behavior data item onto one transaction dimension of the reference user in the consumption scoring matrix, and determining a comprehensive score of the reference user according to the behavior data item and the consumption score of the mapped transaction dimension; and determining the financial label of the reference user according to the comprehensive score of the reference user.
- 2. The method according to claim 1, characterized in that constructing the consumption scoring matrix according to the transaction data of all reference users comprises: for each reference user, calculating, from the transaction data of the reference user, the consumption condition of the reference user in different transaction dimensions; calculating, according to the consumption condition, the consumption score of the reference user in each transaction dimension; and constructing the consumption scoring matrix from the consumption scores of all reference users in each transaction dimension.
- 3. The method according to claim 2, characterized in that the consumption score of the reference user in one transaction dimension is calculated using the following formula: … where Score is the consumption score of the user in the transaction dimension; θ is the weight of the transaction dimension; ω is the weighted average of the reference user's number of transactions and spending amount in the transaction dimension; υ is the average consumption of all reference users in the transaction dimension; σ is the variance of all reference users in the transaction dimension; and … is the ratio of the reference user's spending amount in the transaction dimension to the sum of the spending amounts of all reference users.
- 4. The method according to claim 1, characterized in that, after constructing the consumption scoring matrix according to the transaction data of all reference users, the method further comprises: completing the missing values in the consumption scoring matrix using matrix factorization.
- 5. The method according to claim 4, characterized in that completing the missing values in the consumption scoring matrix using matrix factorization comprises: randomly generating a first parameter row vector and a second parameter row vector, wherein the number of elements of the first parameter row vector equals the number of rows of the consumption scoring matrix and the number of elements of the second parameter row vector equals the number of columns of the consumption scoring matrix; calculating the error of the consumption scoring matrix according to the first and second parameter row vectors; updating the first and second parameter row vectors according to the error, and repeating the step of calculating the error of the consumption scoring matrix according to the first and second parameter row vectors until the error converges; and determining the completed consumption scoring matrix according to the first and second parameter row vectors.
- 6. The method according to claim 1, characterized in that mapping the word corresponding to the behavior data item onto one transaction dimension of the reference user in the consumption scoring matrix comprises: for each transaction dimension of the reference user, calculating the similarity between the transaction dimension and the word corresponding to the behavior data item; determining, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to that word; and mapping the word corresponding to the behavior data item onto the transaction dimension with the highest similarity.
- 7. The method according to claim 1, characterized in that there are a plurality of reference users, and determining the financial label of the reference user according to the comprehensive score of the reference user comprises: determining, according to business rules and background information of the reference users, the reference users that belong to the same class of label among all reference users; determining a prediction model for each class of label according to the comprehensive scores of the reference users belonging to that class; obtaining a comprehensive labeling model from the prediction models of all label classes; and determining the financial label of the reference user according to the comprehensive score of the reference user and the comprehensive labeling model.
- 8. The method according to any one of claims 1 to 7, characterized in that, after determining the comprehensive score of the reference user according to the behavior data item and the consumption score of the mapped transaction dimension, the method further comprises: determining the historical time at which a comprehensive score matrix was formed, the comprehensive score matrix being composed of the comprehensive scores of all reference users; and calculating, according to the historical time and the current time, the decayed comprehensive score matrix at the current time using the following formula: $M'(t) = \frac{\alpha}{\alpha + (t - T)}\, M(T) + M(t)$, where α is the decay factor, t is the current time, T is the historical time, M(T) is the comprehensive score matrix at the historical time, M(t) is the comprehensive score matrix at the current time, and M'(t) is the decayed comprehensive score matrix.
- 9. A device for constructing a financial label, characterized by comprising: an acquiring unit, configured to obtain transaction data and a behavior log of a reference user; a transaction processing unit, configured to construct a consumption scoring matrix according to the transaction data of the reference user, wherein one element of the consumption scoring matrix is the consumption score of the reference user in one transaction dimension; a text processing unit, configured to establish a vector space model of the reference user according to the behavior log of the reference user, wherein the vector space model includes a plurality of behavior data items of the reference user, each corresponding to one word in the behavior log of the reference user; a combination computing unit, configured to, for one behavior data item of the reference user, map the word corresponding to the behavior data item onto one transaction dimension of the reference user in the consumption scoring matrix, and determine a comprehensive score of the reference user according to the behavior data item and the consumption score of the mapped transaction dimension; and a labeling unit, configured to determine the financial label of the reference user according to the comprehensive score of the reference user.
- 10. The device according to claim 9, characterized in that the transaction processing unit is specifically configured to: for each reference user, calculate, from the transaction data of the reference user, the consumption condition of the reference user in different transaction dimensions; calculate, according to the consumption condition, the consumption score of the reference user in each transaction dimension; and construct the consumption scoring matrix from the consumption scores of all reference users in each transaction dimension.
- 11. The device according to claim 10, characterized in that the transaction processing unit is specifically configured to calculate the consumption score of the reference user in one transaction dimension using the following formula: … where Score is the consumption score of the user in the transaction dimension; θ is the weight of the transaction dimension; ω is the weighted average of the reference user's number of transactions and spending amount in the transaction dimension; υ is the average consumption of all reference users in the transaction dimension; σ is the variance of all reference users in the transaction dimension; and … is the ratio of the reference user's spending amount in the transaction dimension to the sum of the spending amounts of all reference users.
- 12. The device according to claim 9, characterized in that the transaction processing unit is further configured to complete the missing values in the consumption scoring matrix using matrix factorization.
- 13. The device according to claim 12, characterized in that the transaction processing unit is specifically configured to: randomly generate a first parameter row vector and a second parameter row vector, wherein the number of elements of the first parameter row vector equals the number of rows of the consumption scoring matrix and the number of elements of the second parameter row vector equals the number of columns of the consumption scoring matrix; calculate the error of the consumption scoring matrix according to the first and second parameter row vectors; update the first and second parameter row vectors according to the error, and repeat the step of calculating the error of the consumption scoring matrix according to the first and second parameter row vectors until the error converges; and determine the completed consumption scoring matrix according to the first and second parameter row vectors.
- 14. The device according to claim 9, characterized in that the combination computing unit is specifically configured to: for each transaction dimension of the reference user, calculate the similarity between the transaction dimension and the word corresponding to the behavior data item; determine, among all transaction dimensions of the reference user, the transaction dimension with the highest similarity to that word; and map the word corresponding to the behavior data item onto the transaction dimension with the highest similarity.
- 15. The device according to claim 9, characterized in that there are a plurality of reference users, and the labeling unit is specifically configured to: determine, according to business rules and background information of the reference users, the reference users that belong to the same class of label among all reference users; determine a prediction model for each class of label according to the comprehensive scores of the reference users belonging to that class; obtain a comprehensive labeling model from the prediction models of all label classes; and determine the financial label of the reference user according to the comprehensive score of the reference user and the comprehensive labeling model.
- 16. The device according to any one of claims 9 to 15, characterized in that the combination computing unit is further configured to: determine the historical time at which a comprehensive score matrix was formed, the comprehensive score matrix being composed of the comprehensive scores of all reference users; and calculate, according to the historical time and the current time, the decayed comprehensive score matrix at the current time using the following formula: $M'(t) = \frac{\alpha}{\alpha + (t - T)}\, M(T) + M(t)$, where α is the decay factor, t is the current time, T is the historical time, M(T) is the comprehensive score matrix at the historical time, M(t) is the comprehensive score matrix at the current time, and M'(t) is the decayed comprehensive score matrix.
- 17. A computing device, characterized by comprising: a memory, configured to store program instructions; and a processor, configured to call the program instructions stored in the memory and, according to the obtained program, perform: obtaining transaction data and a behavior log of a reference user; constructing a consumption scoring matrix according to the transaction data of the reference user, wherein one element of the consumption scoring matrix is the consumption score of the reference user in one transaction dimension; establishing a vector space model of the reference user according to the behavior log of the reference user, wherein the vector space model includes a plurality of behavior data items of the reference user, each corresponding to one word in the behavior log of the reference user; for one behavior data item of the reference user, mapping the word corresponding to the behavior data item onto one transaction dimension of the reference user in the consumption scoring matrix, and determining a comprehensive score of the reference user according to the behavior data item and the consumption score of the mapped transaction dimension; and determining the financial label of the reference user according to the comprehensive score of the reference user.
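Claims 2 and 3 (and 10 and 11) derive each element of the consumption scoring matrix from a user's consumption condition in a transaction dimension. Because the scoring formula itself is not reproduced in this text, the sketch below stops at the inputs the claims name: ω, υ, σ and the spending share. It assumes transactions arrive as hypothetical `(user_id, dimension, amount)` tuples, that ω uses equal weights for count and amount, that υ and σ are the mean and population variance of ω over the users active in that dimension, and that the share is taken within the dimension; the dimension weights θ are omitted. Missing user/dimension pairs are left as `None`, which is the gap the completion step in claims 4 and 5 fills in.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Optional, Tuple

Txn = Tuple[str, str, float]  # hypothetical record layout: (user_id, transaction_dimension, amount)


def score_inputs(txns: List[Txn], w_count: float = 0.5, w_amount: float = 0.5) -> Dict[str, object]:
    """Aggregate transactions into the per-dimension quantities named in claim 3 (assumed readings)."""
    count: Dict[Tuple[str, str], int] = defaultdict(int)
    amount: Dict[Tuple[str, str], float] = defaultdict(float)
    for user, dim, amt in txns:
        count[(user, dim)] += 1
        amount[(user, dim)] += amt
    users = sorted({u for u, _, _ in txns})
    dims = sorted({d for _, d, _ in txns})

    # omega: weighted average of transaction count and spending amount (None = no transactions)
    omega: List[List[Optional[float]]] = [
        [w_count * count[(u, d)] + w_amount * amount[(u, d)] if (u, d) in count else None for d in dims]
        for u in users
    ]
    # share: user's spending in a dimension over all users' spending in that dimension (assumption)
    dim_total = {d: sum(amount[(u, d)] for u in users if (u, d) in amount) for d in dims}
    share = [
        [amount[(u, d)] / dim_total[d] if (u, d) in amount and dim_total[d] else None for d in dims]
        for u in users
    ]
    # upsilon / sigma: mean and population variance of the observed omegas per dimension (assumption)
    upsilon: Dict[str, float] = {}
    sigma: Dict[str, float] = {}
    for j, d in enumerate(dims):
        col = [row[j] for row in omega if row[j] is not None]
        upsilon[d] = mean(col)
        sigma[d] = pvariance(col)
    return {"users": users, "dims": dims, "omega": omega, "share": share, "upsilon": upsilon, "sigma": sigma}
```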
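Claims 5 and 13 describe the completion as a loop: draw two random parameter vectors sized to the matrix's rows and columns, measure the error, update both vectors from the error, and repeat until the error converges. Read literally, this is a rank-1 factorization, and the sketch below fits it with per-entry gradient steps on the squared error of the observed entries only. The function name, learning rate, convergence tolerance, iteration cap, seed, and the use of `None` for missing entries are assumptions, not values taken from the patent.

```python
import random
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def complete_rank1(
    m: Matrix, lr: float = 0.01, tol: float = 1e-9, max_iter: int = 20000, seed: int = 0
) -> List[List[float]]:
    """Fill None entries with p[i] * q[j], where p and q are fitted to the observed entries."""
    rng = random.Random(seed)
    n_rows, n_cols = len(m), len(m[0])
    p = [rng.random() for _ in range(n_rows)]  # first parameter vector: one element per row
    q = [rng.random() for _ in range(n_cols)]  # second parameter vector: one element per column
    observed = [(i, j, m[i][j]) for i in range(n_rows) for j in range(n_cols) if m[i][j] is not None]

    prev_err = float("inf")
    for _ in range(max_iter):
        err = 0.0
        for i, j, v in observed:
            e = v - p[i] * q[j]  # residual of this observed entry
            err += e * e
            p_i = p[i]
            p[i] += lr * e * q[j]  # gradient step on e^2 with respect to p[i]
            q[j] += lr * e * p_i   # and with respect to q[j], using the pre-update p[i]
        if abs(prev_err - err) < tol:  # treat a negligible change in error as convergence
            break
        prev_err = err

    # Keep observed values; predict only the missing ones.
    return [[m[i][j] if m[i][j] is not None else p[i] * q[j] for j in range(n_cols)] for i in range(n_rows)]
```

For the exactly rank-1 matrix `[[1, 2], [2, 4], [3, None]]`, the missing entry should come out close to 6, while the observed entries are returned unchanged.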
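Claims 7 and 15 leave both the per-label prediction model and the comprehensive labeling model unspecified. As a stand-in, the sketch below uses a nearest-centroid classifier: each label's "prediction model" is the mean comprehensive-score vector of the reference users that the business rules assign to that label, and the combined model returns the label whose centroid is closest in Euclidean distance. The function names, the vector form of the comprehensive score, and the nearest-centroid choice are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def fit_label_models(scores: Dict[str, Vector], labels: Dict[str, str]) -> Dict[str, List[float]]:
    """One 'prediction model' per label class: the centroid of its users' comprehensive scores."""
    groups: Dict[str, List[Vector]] = {}
    for user, label in labels.items():  # labels: user -> class, e.g. from business rules
        groups.setdefault(label, []).append(scores[user])
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)] for label, vecs in groups.items()}


def predict_label(models: Dict[str, List[float]], score: Vector) -> str:
    """Combined labeling model: choose the label whose centroid is nearest to the score vector."""
    return min(models, key=lambda label: math.dist(models[label], score))
```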
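The decay rule in claims 8 and 16, $M'(t) = \frac{\alpha}{\alpha + (t - T)}\, M(T) + M(t)$, weights the historical matrix by a factor that equals 1 when t = T and shrinks toward 0 as the gap t − T grows, then adds the current matrix. A direct element-wise transcription follows; the function name, the list-of-rows matrix representation, time as a plain number (for example, days), and the input checks are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def decayed_scores(m_hist: Matrix, m_now: Matrix, alpha: float, t_hist: float, t_now: float) -> Matrix:
    """M'(t) = alpha / (alpha + (t - T)) * M(T) + M(t), applied element by element."""
    if alpha <= 0:
        raise ValueError("decay factor alpha must be positive")
    if t_now < t_hist:
        raise ValueError("current time must not precede the historical time")
    weight = alpha / (alpha + (t_now - t_hist))  # 1 at t == T, tends to 0 as t - T grows
    return [
        [weight * old + new for old, new in zip(row_hist, row_now)]
        for row_hist, row_now in zip(m_hist, m_now)
    ]
```

With α = 10 and a gap of 10 time units the historical scores are halved before the current ones are added, so `decayed_scores([[2.0, 4.0]], [[1.0, 1.0]], 10.0, 0.0, 10.0)` returns `[[2.0, 3.0]]`.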
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710655552.XA CN107578270A (en) | 2017-08-03 | 2017-08-03 | Method, device and computing device for constructing a financial label |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710655552.XA CN107578270A (en) | 2017-08-03 | 2017-08-03 | Method, device and computing device for constructing a financial label |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107578270A | 2018-01-12 |
Family
ID=61035073
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710655552.XA (CN107578270A, Pending) | Method, device and computing device for constructing a financial label | 2017-08-03 | 2017-08-03 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107578270A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034941A (en) * | 2018-06-13 | 2018-12-18 | 平安科技(深圳)有限公司 | Product recommendation method, apparatus, computer equipment and storage medium |
CN109035028A (en) * | 2018-06-29 | 2018-12-18 | 平安科技(深圳)有限公司 | Robo-advisor strategy generation method and device, electronic equipment, storage medium |
CN109543668A (en) * | 2018-11-29 | 2019-03-29 | 税友软件集团股份有限公司 | A salary bill item identification method, device, equipment and readable storage medium |
CN109614606A (en) * | 2018-10-23 | 2019-04-12 | 中山大学 | Classification and prediction method and device for fine range of long text cases based on document embedding |
CN109635990A (en) * | 2018-10-12 | 2019-04-16 | 阿里巴巴集团控股有限公司 | A training method, prediction method, device and electronic equipment |
CN110019563A (en) * | 2018-08-09 | 2019-07-16 | 北京首钢自动化信息技术有限公司 | A portrait modeling method and device based on multidimensional data |
CN110210692A (en) * | 2018-05-24 | 2019-09-06 | 腾讯科技(深圳)有限公司 | A merchant pattern recognition method, device, storage medium and terminal device |
WO2019223379A1 (en) * | 2018-05-22 | 2019-11-28 | 阿里巴巴集团控股有限公司 | Product recommendation method and device |
WO2019232891A1 (en) * | 2018-06-06 | 2019-12-12 | 平安科技(深圳)有限公司 | Method and device for acquiring user portrait, computer apparatus and storage medium |
CN111414122A (en) * | 2019-12-26 | 2020-07-14 | 腾讯科技(深圳)有限公司 | Intelligent text processing method and device, electronic equipment and storage medium |
CN112766293A (en) * | 2019-11-05 | 2021-05-07 | 腾讯科技(深圳)有限公司 | Data feature extraction method and business object classification method and device |
CN113506138A (en) * | 2021-07-16 | 2021-10-15 | 瑞幸咖啡信息技术(厦门)有限公司 | Data estimation method, device, equipment and storage medium of business object |
CN113963367A (en) * | 2021-10-22 | 2022-01-21 | 深圳前海环融联易信息科技服务有限公司 | Model-based financial transaction file and money extraction method |
CN114357020A (en) * | 2021-12-17 | 2022-04-15 | 杭州摸象大数据科技有限公司 | Business scenario data extraction method, device, computer equipment and storage medium |
TWI779387B (en) * | 2020-11-06 | 2022-10-01 | 台北富邦商業銀行股份有限公司 | Smart customer tagging device and method thereof |
CN117708183A (en) * | 2023-11-08 | 2024-03-15 | 广州西米科技有限公司 | Potential user mining method and system based on user consumption habit |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130073336A1 (en) * | 2011-09-15 | 2013-03-21 | Stephan HEATH | System and method for using global location information, 2d and 3d mapping, social media, and user behavior and information for a consumer feedback social media analytics platform for providing analytic measfurements data of online consumer feedback for global brand products or services of past, present, or future customers, users or target markets |
CN103295145A (en) * | 2012-02-28 | 2013-09-11 | 北京星源无限传媒科技有限公司 | Mobile phone advertising method based on user consumption feature vector |
CN105959745A (en) * | 2016-05-25 | 2016-09-21 | 北京铭嘉实咨询有限公司 | Advertising method and system |
CN106022869A (en) * | 2016-05-12 | 2016-10-12 | 北京邮电大学 | Consumption object recommending method and consumption object recommending device |
2017-08-03: CN201710655552.XA filed in China (CN107578270A), status: Pending
Non-Patent Citations (1)
Title |
---|
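Gao Yan et al., "Construction and inference of latent variable models for user preference discovery" (面向用户偏好发现的隐变量模型构建与推理), Journal of Computer Applications (《计算机应用》) * |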
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019223379A1 (en) * | 2018-05-22 | 2019-11-28 | 阿里巴巴集团控股有限公司 | Product recommendation method and device |
CN110210692A (en) * | 2018-05-24 | 2019-09-06 | 腾讯科技(深圳)有限公司 | A merchant pattern recognition method, device, storage medium and terminal device |
WO2019232891A1 (en) * | 2018-06-06 | 2019-12-12 | 平安科技(深圳)有限公司 | Method and device for acquiring user portrait, computer apparatus and storage medium |
CN109034941B (en) * | 2018-06-13 | 2023-03-31 | 平安科技(深圳)有限公司 | Product recommendation method and device, computer equipment and storage medium |
CN109034941A (en) * | 2018-06-13 | 2018-12-18 | 平安科技(深圳)有限公司 | Product recommendation method, apparatus, computer equipment and storage medium |
CN109035028A (en) * | 2018-06-29 | 2018-12-18 | 平安科技(深圳)有限公司 | Robo-advisor strategy generation method and device, electronic equipment, storage medium |
CN109035028B (en) * | 2018-06-29 | 2023-08-22 | 平安科技(深圳)有限公司 | Intelligent consultation strategy generation method and device, electronic equipment and storage medium |
WO2020000689A1 (en) * | 2018-06-29 | 2020-01-02 | 平安科技(深圳)有限公司 | Transfer-learning-based robo-advisor strategy generation method and apparatus, and electronic device and storage medium |
CN110019563A (en) * | 2018-08-09 | 2019-07-16 | 北京首钢自动化信息技术有限公司 | A portrait modeling method and device based on multidimensional data |
CN109635990B (en) * | 2018-10-12 | 2022-09-16 | 创新先进技术有限公司 | A training method, prediction method, device, electronic device and storage medium |
CN109635990A (en) * | 2018-10-12 | 2019-04-16 | 阿里巴巴集团控股有限公司 | A training method, prediction method, device and electronic equipment |
CN109614606B (en) * | 2018-10-23 | 2023-02-03 | 中山大学 | Method and device for classification and prediction of fine range in long text cases based on document embedding |
CN109614606A (en) * | 2018-10-23 | 2019-04-12 | 中山大学 | Classification and prediction method and device for fine range of long text cases based on document embedding |
CN109543668A (en) * | 2018-11-29 | 2019-03-29 | 税友软件集团股份有限公司 | A salary bill item identification method, device, equipment and readable storage medium |
CN112766293A (en) * | 2019-11-05 | 2021-05-07 | 腾讯科技(深圳)有限公司 | Data feature extraction method and business object classification method and device |
CN112766293B (en) * | 2019-11-05 | 2024-08-09 | 腾讯科技(深圳)有限公司 | Data feature extraction method, business object classification method and device |
CN111414122B (en) * | 2019-12-26 | 2021-06-11 | 腾讯科技(深圳)有限公司 | Intelligent text processing method and device, electronic equipment and storage medium |
CN111414122A (en) * | 2019-12-26 | 2020-07-14 | 腾讯科技(深圳)有限公司 | Intelligent text processing method and device, electronic equipment and storage medium |
TWI779387B (en) * | 2020-11-06 | 2022-10-01 | 台北富邦商業銀行股份有限公司 | Smart customer tagging device and method thereof |
CN113506138B (en) * | 2021-07-16 | 2024-06-07 | 瑞幸咖啡信息技术(厦门)有限公司 | Data prediction method, device and equipment of business object and storage medium |
CN113506138A (en) * | 2021-07-16 | 2021-10-15 | 瑞幸咖啡信息技术(厦门)有限公司 | Data estimation method, device, equipment and storage medium of business object |
CN113963367A (en) * | 2021-10-22 | 2022-01-21 | 深圳前海环融联易信息科技服务有限公司 | Model-based financial transaction file and money extraction method |
CN113963367B (en) * | 2021-10-22 | 2024-05-28 | 深圳前海环融联易信息科技服务有限公司 | Model-based financial transaction file and money extraction method |
CN114357020A (en) * | 2021-12-17 | 2022-04-15 | 杭州摸象大数据科技有限公司 | Business scenario data extraction method, device, computer equipment and storage medium |
CN117708183B (en) * | 2023-11-08 | 2024-06-11 | 广州西米科技有限公司 | Potential user mining method and system based on user consumption habit |
CN117708183A (en) * | 2023-11-08 | 2024-03-15 | 广州西米科技有限公司 | Potential user mining method and system based on user consumption habit |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107578270A (en) | Method, device and computing device for constructing a financial label | |
US20230102337A1 (en) | Method and apparatus for training recommendation model, computer device, and storage medium | |
Kosinski et al. | Mining big data to extract patterns and predict real-life outcomes. | |
Tsiotas | Detecting different topologies immanent in scale-free networks with the same degree distribution | |
Wu et al. | The influence of international tourism receipts on economic development: Evidence from China’s 31 major regions | |
CN106844407B (en) | Tag network generation method and system based on data set correlation | |
Sun et al. | Online ensemble using adaptive windowing for data streams with concept drift | |
US12020267B2 (en) | Method, apparatus, storage medium, and device for generating user profile | |
CN115002200A (en) | User portrait based message pushing method, device, equipment and storage medium | |
CN106611375A (en) | Text analysis-based credit risk assessment method and apparatus | |
CN111274330A (en) | Target object determination method and device, computer equipment and storage medium | |
CN111966886B (en) | Object recommendation method, object recommendation device, electronic equipment and storage medium | |
CN109740642A (en) | Invoice category identification method, device, electronic device and readable storage medium | |
CN110110225A (en) | Online education recommended models and construction method based on user behavior data analysis | |
CN111639485B (en) | Course recommendation method and related equipment based on text similarity | |
CN117421491A (en) | Method and device for quantifying social media account running data and electronic equipment | |
Hamad et al. | Sentiment analysis of restaurant reviews in social media using naïve bayes | |
CN112434126A (en) | Information processing method, device, equipment and storage medium | |
CN113204662B (en) | Method, device and computer equipment for predicting user group based on photo-search behavior | |
CN112632275B (en) | Crowd clustering data processing method, device and equipment based on personal text information | |
CN114357184A (en) | Item recommendation method and related device, electronic equipment and storage medium | |
Fan et al. | An improved quantum clustering algorithm with weighted distance based on PSO and research on the prediction of electrical power demand | |
CN113032671A (en) | Content processing method, content processing device, electronic equipment and storage medium | |
Altmann | Statistical Laws in Complex Systems: Combining Mechanistic Models and Data Analysis | |
Seidpisheh et al. | Hierarchical clustering of heavy-tailed data using a new similarity measure |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
|  | PB01 | Publication |  |
|  | PB01 | Publication |  |
|  | SE01 | Entry into force of request for substantive examination |  |
|  | SE01 | Entry into force of request for substantive examination |  |
|  | RJ01 | Rejection of invention patent application after publication |  |
|  | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180112 |