CN111754253A - User authentication method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN111754253A, CN201910535299.3A
- Authority
- CN
- China
- Prior art keywords
- feature
- user
- authenticated
- data
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
- G06Q30/0271—Personalized advertisement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Recommending goods or services
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Strategic Management (AREA)
- Development Economics (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Game Theory and Decision Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present application relates to the field of Internet technologies, and in particular to a user authentication method, device, computer equipment, and storage medium.
Background
As e-commerce continues to develop, expanding into new categories has become a trend among e-commerce platforms, for example from shopping categories to rental housing services. The rental business is a new business direction for e-commerce platforms, and launching a new business requires mining users with demand for precision marketing in order to raise the conversion rate of potential users.
In the prior art, a prediction model is usually applied to users' online rental behavior data to predict the potential conversion rate of a user group for marketing purposes. For a new business, however, the number of training samples available for training the prediction model is limited, so few potential users are mined and marketing for the new business performs poorly.
Summary of the Invention
The present application provides a user authentication method, device, computer equipment, and storage medium to address defects of the prior art such as the small number of potential users mined and their inaccurate identification.
A first aspect of the present application provides a user authentication method, including:
determining, based on first preset sample data, target feature classes related to a target service, where the first preset sample data includes first feature data of users' historical behavior;
receiving an authentication request, where the authentication request includes an identifier of a user to be authenticated;
obtaining relevant data of the user to be authenticated according to the identifier, where the relevant data includes second feature data of each target feature class for the user to be authenticated;
authenticating the user to be authenticated based on the relevant data;
if the authentication passes, pushing content related to the target service to the user to be authenticated.
Optionally, determining the target feature classes related to the target service based on the first preset sample data includes:
determining, according to the first feature data and a preset random forest (RF) model, a target feature importance score in the RF model for each feature class in the first feature data;
determining the target feature classes related to the target service according to the target feature importance scores of the feature classes in the RF model.
Optionally, determining, according to the first feature data, the target feature importance score in the RF model for each feature class in the first feature data includes:
obtaining, according to the first feature data, a first sample for each of at least two decision trees;
for each decision tree, determining, based on its first sample, first feature importance scores in that decision tree for m feature classes randomly selected from M feature classes, where M is the total number of feature classes in the first feature data and m is an integer less than M;
for each feature class, determining a second feature importance score based on its first feature importance scores in the decision trees, where the second feature importance score is the sum of that feature class's first feature importance scores across the decision trees;
normalizing the second feature importance scores of the feature classes to obtain the target feature importance score of each feature class.
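The sampling, summation, and normalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the text does not say how each tree's first sample is drawn or which normalization is used, so bootstrap sampling and division by the grand total are assumed here, and the function names and feature class names are hypothetical.

```python
import random
from collections import defaultdict


def draw_tree_inputs(samples, feature_classes, m, rng):
    """Draw one tree's first sample and its m random feature classes.

    Assumption: the first sample is a bootstrap sample (with replacement).
    """
    if not 0 < m < len(feature_classes):
        raise ValueError("m must be a positive integer smaller than M")
    bootstrap = [rng.choice(samples) for _ in samples]  # sampling with replacement
    subset = rng.sample(feature_classes, m)             # m distinct feature classes
    return bootstrap, subset


def aggregate_importance(per_tree_scores):
    """Sum each feature class's per-tree scores, then normalize.

    Assumption: normalization divides by the grand total so scores sum to 1.
    """
    totals = defaultdict(float)
    for tree_scores in per_tree_scores:
        for feature, score in tree_scores.items():
            totals[feature] += score  # second feature importance score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {feature: 0.0 for feature in totals}
    return {feature: score / grand_total for feature, score in totals.items()}


rng = random.Random(0)
feature_classes = ["rental_page_views", "age", "home_decor_orders"]
bootstrap, subset = draw_tree_inputs(list(range(10)), feature_classes, m=2, rng=rng)

# Hypothetical first feature importance scores from three trees.
per_tree_scores = [
    {"rental_page_views": 0.30, "age": 0.10},
    {"rental_page_views": 0.20, "home_decor_orders": 0.15},
    {"age": 0.05, "home_decor_orders": 0.20},
]
target_scores = aggregate_importance(per_tree_scores)
```

A feature class that was not among a tree's m selected classes simply contributes nothing for that tree.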
Optionally, for each decision tree, determining, based on its first sample, the first feature importance scores in that decision tree for m feature classes randomly selected from the M feature classes includes:
for each node in the decision tree, calculating the Gini index at that node for each of the m feature classes;
for each of the m feature classes, calculating the change in the Gini index of that feature class at each node, based on its Gini index at the nodes of the decision tree;
calculating the first feature importance scores of the m feature classes in the decision tree based on the change in the Gini index of each of the m feature classes at each node.
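The Gini-based scoring above can be illustrated with a short sketch. The text does not give the exact formula for the change in the Gini index, so this sketch assumes the common CART form: the parent node's Gini impurity minus the size-weighted impurities of its two children, summed per feature class over the nodes that split on it. The function names and the split representation are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Change in Gini index at a node (assumed: size-weighted child impurities)."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


def tree_importance(splits):
    """Sum the Gini decrease per feature class over one tree's splits.

    splits: iterable of (feature_class, parent_labels, left_labels, right_labels).
    """
    scores = {}
    for feature, parent, left, right in splits:
        scores[feature] = scores.get(feature, 0.0) + gini_decrease(parent, left, right)
    return scores


# One perfectly separating split on a hypothetical feature class.
scores = tree_importance([("rental_page_views", [1, 1, 0, 0], [1, 1], [0, 0])])
```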
Optionally, determining the target feature classes related to the target service according to the target feature importance scores of the feature classes in the RF model includes:
obtaining, according to the target feature importance scores, the feature classes whose scores are greater than a preset threshold as the target feature classes.
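The threshold rule can be sketched in a few lines. The scores, the threshold value, and the function name below are illustrative assumptions; the text only requires that a feature class's score be greater than a preset threshold.

```python
def select_target_features(scores, threshold):
    """Keep feature classes scoring strictly above the threshold, highest first."""
    selected = [feature for feature, score in scores.items() if score > threshold]
    return sorted(selected, key=lambda feature: scores[feature], reverse=True)


# Hypothetical normalized importance scores.
scores = {"rental_page_views": 0.50, "age": 0.15, "home_decor_orders": 0.35}
target_feature_classes = select_target_features(scores, threshold=0.2)
```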
Optionally, the target service is a rental business, and the first feature data includes rental behavior feature data and non-rental behavior feature data mined on the basis of the rental behavior feature data.
Optionally, authenticating the user to be authenticated based on the relevant data of the user to be authenticated includes:
authenticating the user to be authenticated based on the relevant data and a pre-trained classification model.
Optionally, the classification model is an Xgboost model;
before authenticating the user to be authenticated based on the relevant data and the pre-trained classification model, the method further includes:
obtaining training sample data;
training a pre-established Xgboost network with the training sample data to obtain the Xgboost model.
A second aspect of the present application provides a user authentication device, including:
a determination module, configured to determine, based on first preset sample data, target feature classes related to a target service, where the first preset sample data includes first feature data of users' historical behavior;
a receiving module, configured to receive an authentication request, where the authentication request includes an identifier of a user to be authenticated;
an acquisition module, configured to obtain relevant data of the user to be authenticated according to the identifier, where the relevant data includes second feature data of each target feature class for the user to be authenticated;
an authentication module, configured to authenticate the user to be authenticated based on the relevant data;
a push module, configured to push content related to the target service to the user to be authenticated if the authentication passes.
Optionally, the determination module is specifically configured to:
determine, according to the first feature data and a preset random forest (RF) model, a target feature importance score in the RF model for each feature class in the first feature data;
determine the target feature classes related to the target service according to the target feature importance scores of the feature classes in the RF model.
Optionally, the determination module is specifically configured to:
obtain, according to the first feature data, a first sample for each of at least two decision trees;
for each decision tree, determine, based on its first sample, first feature importance scores in that decision tree for m feature classes randomly selected from M feature classes, where M is the total number of feature classes in the first feature data and m is an integer less than M;
for each feature class, determine a second feature importance score based on its first feature importance scores in the decision trees, where the second feature importance score is the sum of that feature class's first feature importance scores across the decision trees;
normalize the second feature importance scores of the feature classes to obtain the target feature importance score of each feature class.
Optionally, the determination module is specifically configured to:
for each node in the decision tree, calculate the Gini index at that node for each of the m feature classes;
for each of the m feature classes, calculate the change in the Gini index of that feature class at each node, based on its Gini index at the nodes of the decision tree;
calculate the first feature importance scores of the m feature classes in the decision tree based on the change in the Gini index of each of the m feature classes at each node.
Optionally, the determination module is specifically configured to:
obtain, according to the target feature importance scores, the feature classes whose scores are greater than a preset threshold as the target feature classes.
Optionally, the target service is a rental business, and the first feature data includes rental behavior feature data and non-rental behavior feature data mined on the basis of the rental behavior feature data.
Optionally, the authentication module is specifically configured to:
authenticate the user to be authenticated based on the relevant data and a pre-trained classification model.
Optionally, the classification model is an Xgboost model, and the acquisition module is further configured to:
obtain training sample data;
train a pre-established Xgboost network with the training sample data to obtain the Xgboost model.
A third aspect of the present application provides computer equipment, including at least one processor and a memory;
the memory stores a computer program, and the at least one processor executes the computer program stored in the memory to implement the method provided by the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed, implements the method provided by the first aspect.
The user authentication method, device, computer equipment, and storage medium provided by the present application use a preset model to determine, from the many feature classes of a user group, the target feature classes related to the target service, thereby expanding the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is obtained according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to users who pass authentication. This increases the number of potential users mined and the accuracy of their identification, and can effectively improve the accuracy of business information pushes and therefore the marketing effect.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a user authentication method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a user authentication method provided by another embodiment of the present application;
FIG. 3 is a schematic diagram of an overall authentication process provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a user authentication device provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of computer equipment provided by an embodiment of the present application.
The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to explain the concept of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
First, the terms used in this application are explained:
Random forest (RF): a classifier that uses multiple decision trees to train on samples and make predictions.
Xgboost: an efficient implementation of the gradient boosting (GB) algorithm. Its base learner can be either a CART tree (gbtree) or a linear classifier (gblinear).
The user authentication method provided by the embodiments of the present application is suitable for scenarios in which an e-commerce platform mines potential users for a new or existing business, such as a rental business. It combines the user group's direct behavior feature data in the target service with other behavior feature data mined from that direct behavior data to determine the important feature classes related to the target service. For example, useful features of potential users can be identified from how often and on how many days users browse the rental category, and from what they follow and search for there. One can also check which other category pages users with rental behavior have browsed: if a certain proportion of them also browse the maternal-and-infant and home-improvement categories, their fine-grained browsing, searching, following, and ordering behaviors in those categories can serve as potentially useful features. As another example, a user's address can be looked up from the location information (such as latitude and longitude coordinates) recorded while browsing; combined with the user's daily schedule, age, and occupation, this can indicate whether the user is a college student or a white-collar worker whose work address has changed, and such users can also be included in the potential population.
In addition, the terms "first", "second", and so on are used for descriptive purposes only and should not be understood as indicating or implying relative importance or the number of the technical features referred to. In the description of the following embodiments, "multiple" means two or more, unless expressly and specifically defined otherwise.
The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the drawings.
Embodiment 1
This embodiment provides a user authentication method for identifying potential users of a target service. The executing entity of this embodiment is a user authentication device, which may be deployed in computer equipment such as a server, desktop computer, or notebook computer, or in an e-commerce platform.
FIG. 1 is a schematic flowchart of the user authentication method provided in this embodiment. The method includes:
Step 101: determine, based on first preset sample data, target feature classes related to the target service, where the first preset sample data includes first feature data of users' historical behavior.
Specifically, the first preset sample data may include direct feature data of the target service and indirect feature data mined from the direct feature data. The target service may be a newly launched category on the e-commerce platform, such as a rental business, or an existing business.
Taking the rental business as an example, the first preset sample data may include behavior feature data such as how often and on how many days the user group browses the rental category, and what they follow and search for there; feature data on their browsing, searching, following, and ordering behavior in other categories related to renting; the users' location information while browsing, daily schedule, age, and occupation; and features derived from analyzing these, such as whether a college student's or white-collar worker's work address has changed and the distance of the change. Such users are also included in the potential population.
For example, from users' historical behavior feature data in the rental category, the users who have already converted (completed a rental) in that category are identified, and their behavior feature data in other categories of the e-commerce platform is obtained, such as browsing, searching, following, and ordering other goods. Statistical analysis then yields useful features of potential users. For instance, if a considerable share of users who have rented are found to browse, search, follow, or place orders in the maternal-and-infant and home-improvement categories, these features can be added to the useful features of potential users. As another example, a user's address can be looked up from the location information recorded while browsing; combined with the user's daily schedule, age, and occupation, this can indicate whether the user is a college student or a white-collar worker whose work address has changed, and how far the address has moved. In graduation season, for instance, graduating college students need to rent housing: the location information may show that the user's address is a school, and the user's age may suggest graduation this year. These features are also added to the useful features of potential users. In this way, a batch of useful features of potential users is mined from existing data and experience.
That is, the first preset sample data includes the collected feature data for these useful features of potential users (the first feature data). The useful features may be collected by people based on experience or obtained by statistical analysis of existing data; the specific manner is not limited.
After these useful features of potential users are collected, the historical user behavior data related to them can be obtained from the e-commerce platform's historical user data to form the first preset sample data, which is used to determine, from among these useful features, the important feature classes related to the target service (the target feature classes). A feature class denotes one kind of feature: for example, user age is one feature class and user gender is another. Feature data is user data containing the feature value of each feature class, such as "Zhang San - male - 23 years old - student".
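The distinction between feature classes and feature data can be made concrete with a small sketch. The three feature classes and the helper name are illustrative assumptions taken from the "Zhang San" example, not a schema defined by the application.

```python
# Feature classes: the kinds of feature (assumed here for illustration).
FEATURE_CLASSES = ("gender", "age", "occupation")


def make_feature_data(user_name, values):
    """Build one piece of feature data: one value per feature class."""
    if len(values) != len(FEATURE_CLASSES):
        raise ValueError("one value is required per feature class")
    return {"user": user_name, **dict(zip(FEATURE_CLASSES, values))}


record = make_feature_data("Zhang San", ("male", 23, "student"))
```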
For example, a total of 800 feature classes are obtained, covering both rental and non-rental behavior feature data. A batch of target feature classes related to the rental business must then be determined from these 800; for instance, 200 of them may be found to be relatively relevant to the rental business.
Optionally, a preset random forest (RF) model may be used to determine the target feature classes related to the target service from the feature classes included in the first feature data. Other algorithm models may also be used to determine the target feature classes; the specific manner is not limited in this embodiment.
Step 102: receive an authentication request, where the authentication request includes the identifier of the user to be authenticated.
Specifically, when a merchant needs to determine which users are potential users, it can send an authentication request through a terminal to the user authentication device, and the request may include the identifier of the user to be authenticated. The identifier may be the account the user registered on the e-commerce platform, such as a user ID assigned to the user, or another form of identifier, as long as it uniquely identifies the user within the e-commerce platform. The authentication request may include the identifiers of one or more users to be authenticated, and the users are usually submitted in batches. For example, the merchant's management personnel collect all or some of the user IDs registered on the e-commerce platform and need to identify the potential users of the target service among them; they can trigger an authentication request through the terminal interface, with these user IDs carried in the request as the identifiers of the users to be authenticated.
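A batch authentication request of the kind described above might be modelled as follows. The `AuthRequest` type, its field name, and the de-duplication behavior are assumptions for illustration; the text only requires that the request carry one or more identifiers that are unique within the platform.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AuthRequest:
    """Hypothetical request payload: identifiers of the users to be authenticated."""
    user_ids: Tuple[str, ...]


def build_auth_request(user_ids: Iterable[str]) -> AuthRequest:
    """Collect user IDs into one request, dropping duplicates but keeping order."""
    unique_ids = tuple(dict.fromkeys(user_ids))
    if not unique_ids:
        raise ValueError("an authentication request needs at least one user ID")
    return AuthRequest(user_ids=unique_ids)


request = build_auth_request(["u1", "u2", "u1"])
```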
Step 103: obtain the relevant data of the user to be authenticated according to the user's identifier, where the relevant data includes second feature data of each target feature class for the user to be authenticated.
Specifically, after the authentication request is received, the identifier of the user to be authenticated can be read from the request and then used to obtain the user's relevant data, which includes second feature data of each target feature class for that user. The relevant data can be extracted from the user historical data of the relevant categories. For example, second feature data is extracted from October's user historical data to form the relevant data, which is then used to authenticate target potential users for November. Which data to obtain depends on actual needs and is not limited in this embodiment.
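Step 103 amounts to projecting a user's historical record onto the target feature classes. The sketch below assumes an in-memory dictionary as the history store and a default value for missing features; both, along with the function and feature names, are hypothetical.

```python
def get_relevant_data(user_id, history, target_feature_classes, default=0.0):
    """Return the user's second feature data, ordered by target feature class.

    Features absent from the history fall back to `default` (an assumption).
    """
    record = history.get(user_id, {})
    return [record.get(name, default) for name in target_feature_classes]


# Hypothetical history for one period, keyed by user ID.
history = {"u1": {"rental_page_views": 12, "age": 23, "favorite_color": "blue"}}
targets = ["rental_page_views", "home_decor_orders", "age"]
row = get_relevant_data("u1", history, targets)
```

Fixing the order of `targets` keeps every user's feature vector aligned with the columns the classification model was trained on.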
Step 104: authenticate the user to be authenticated based on the relevant data of the user to be authenticated.
Specifically, after the relevant data of the user to be authenticated is obtained, the user can be authenticated based on it, that is, it is determined whether the user to be authenticated is a potential user of the target service. A pre-trained classification model may be used for this purpose.
For example, suppose there are 100 target feature classes, so that each piece of second feature data is 100-dimensional, and there are 10,000 pieces of second feature data. The second feature data is input into the pre-trained classification model, which identifies a batch of target potential users, for instance 2,000 of the 10,000 users. In other words, the authentication result for these 2,000 users is that authentication passed.
Step 105: if the authentication passes, push content related to the target service to the user to be authenticated.
Specifically, after the target potential users are identified, content related to the target service can be pushed to the users who passed authentication. For example, when a user logs in to the e-commerce platform page through a mobile phone APP, a house-rental link is pushed on the page, or relevant information is pushed to the user's mobile phone. The specific push mode can be set according to actual needs, which is not limited in this embodiment.
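Steps 102 to 105 can be pictured as one batch routine. The following is only a minimal sketch: the names `authenticate_batch`, `feature_store` and `toy_classifier` are hypothetical, the classifier is a stand-in for the pre-trained classification model, and skipping users without relevant data is an assumption that the embodiment does not state.

```python
from typing import Callable, Dict, List, Sequence


def authenticate_batch(
    user_ids: Sequence[str],
    feature_store: Dict[str, List[float]],
    classify: Callable[[List[float]], int],
) -> List[str]:
    """Return the IDs whose second feature data is classified as 1 (passed)."""
    passed = []
    for uid in user_ids:
        features = feature_store.get(uid)  # step 103: look up relevant data by ID
        if features is None:
            continue  # assumption: users without relevant data are skipped
        if classify(features) == 1:  # step 104: classification model decides
            passed.append(uid)  # step 105 would push content to this user
    return passed


def toy_classifier(features: List[float]) -> int:
    """Hypothetical stand-in for the pre-trained classification model."""
    return 1 if sum(features) > 1.0 else 0


store = {"u1": [0.9, 0.4], "u2": [0.1, 0.2], "u3": [0.7, 0.6]}
passed_ids = authenticate_batch(["u1", "u2", "u3", "u4"], store, toy_classifier)
```

Here `u4` has no stored data and `u2` falls below the toy threshold, so only the remaining users would receive pushed content.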
In the user authentication method provided by this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
Embodiment 2
This embodiment further supplements the description of the method provided in Embodiment 1.
FIG. 2 is a schematic flowchart of the user authentication method provided in this embodiment.
As an implementable manner, on the basis of Embodiment 1, optionally, step 101 specifically includes:
Step 1011: according to the first feature data and a preset random forest (RF) model, determine the target feature importance score, in the RF model, of each feature class in the first feature data.
Step 1012: according to the target feature importance score of each feature class in the RF model, determine the target feature classes related to the target service.
Optionally, step 1011 may specifically include:
Step 10111: according to the first feature data, acquire the first samples respectively corresponding to at least two decision trees.
Step 10112: for each decision tree, based on its corresponding first sample, determine the first feature importance scores, in that decision tree, of m feature classes randomly selected from the M feature classes, where M is the total number of feature classes included in the first feature data and m is an integer smaller than M.
Step 10113: for each feature class, based on its first feature importance scores in the decision trees, determine the second feature importance score of the feature class, where the second feature importance score is the sum of the first feature importance scores of the feature class over the decision trees.
Step 10114: normalize the second feature importance scores of the feature classes to obtain the target feature importance score of each feature class.
Optionally, determining, for each decision tree and based on its corresponding first sample, the first feature importance scores in that decision tree of m feature classes randomly selected from the M feature classes includes:
for each node in the decision tree, calculating the Gini index at that node of each of the m feature classes;
for each of the m feature classes, calculating the change in the Gini index of the feature class at each node, based on the Gini index of the feature class at each node of the decision tree;
calculating the first feature importance scores of the m feature classes in the decision tree, based on the change in the Gini index of each of the m feature classes at each node.
Specifically, a random forest algorithm usually uses multiple decision trees, and each decision tree may include multiple nodes.
For each decision tree, its corresponding first sample is randomly drawn from the first feature data. The first sample includes M feature classes, the same feature classes as the first feature data.
For example, suppose the first feature data contains 100,000 records, each with 800-dimensional features (that is, M = 800 feature classes). A sample of 60,000 records by 800 dimensions can be drawn at random with replacement as the first sample of a decision tree. The same draw is repeated for every decision tree, giving the first sample corresponding to each tree.
After the first sample of each decision tree is obtained, for each decision tree, the first feature importance scores in that tree of m feature classes randomly selected from the M feature classes are determined based on its first sample, where M is the total number of feature classes included in the first feature data and m is an integer smaller than M.
For example, 500 feature classes are randomly selected from the 800 feature classes. Every node decision in the decision tree is made on the basis of these 500 feature classes, and the best way to split a node is determined from them.
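The two random draws described above (records with replacement for each tree, and m of the M feature classes without repetition) can be sketched as follows. This is an illustration on toy sizes, not the embodiment's implementation; the function names are hypothetical, and drawing one feature subset per tree follows the example in the text.

```python
import random


def bootstrap_rows(rows, sample_size, rng):
    """Draw sample_size records with replacement; each record keeps all M columns."""
    return [rng.choice(rows) for _ in range(sample_size)]


def random_feature_subset(total_features, subset_size, rng):
    """Pick m distinct feature-class indices out of M, with 0 < m < M."""
    if not 0 < subset_size < total_features:
        raise ValueError("subset_size must satisfy 0 < m < M")
    return sorted(rng.sample(range(total_features), subset_size))


rng = random.Random(0)
M = 8                                    # toy stand-in for M = 800
rows = [[r * M + c for c in range(M)] for r in range(10)]  # 10 toy records
n_trees = 3

# One first sample and one feature subset per decision tree.
first_samples = [bootstrap_rows(rows, 6, rng) for _ in range(n_trees)]
feature_subsets = [random_feature_subset(M, 5, rng) for _ in range(n_trees)]
```

With the figures from the text, the calls would be `bootstrap_rows(rows, 60000, rng)` on 100,000 records and `random_feature_subset(800, 500, rng)`.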
Let VIM denote the feature importance score and GI the Gini index. The Gini index $GI_s$ of a feature class at node s of a decision tree is computed as:
$$GI_s = 1 - \sum_{k=1}^{K} p_{sk}^{2}$$
where K indicates that the users' feature values under this feature class fall into K categories, and $p_{sk}$ is the proportion of category k among the user feature values at node s.
For example, for feature class A, the current node s has 60 samples: some users browsed for 1 day, some for 2 days and some for 3 days. Under feature class A the samples are divided into K = 2 categories, namely users who browsed for 1 or 2 days and users who browsed for 3 days. If 10 samples browsed for 1 day and 20 for 2 days, the first category contains 30 samples, and the second category (3 days) contains the other 30. The Gini index of feature class A at node s is then:
$$GI_s = 1 - \left[\left(\tfrac{30}{60}\right)^{2} + \left(\tfrac{30}{60}\right)^{2}\right] = 0.5$$
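The worked example for feature class A can be checked numerically. The sketch below assumes the standard Gini impurity form, one minus the sum of squared category proportions; the function name and category labels are illustrative only.

```python
from collections import Counter


def gini_index(categories):
    """Gini index of a node: 1 minus the sum of squared category proportions."""
    total = len(categories)
    counts = Counter(categories)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Node s: 30 users who browsed 1 or 2 days, 30 users who browsed 3 days (K = 2).
node_s = ["1-2 days"] * 30 + ["3 days"] * 30
gi_s = gini_index(node_s)
```

A node whose samples all fall into one category gives 0, which is the lowest possible value.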
After the Gini index of each of the m feature classes at each node of the decision tree is obtained, the change in the Gini index of each of the m feature classes at each node is computed from these values. Specifically, the change in the Gini index before and after node s is split is:
$$VIM_{js} = GI_s - GI_l - GI_r$$
where j denotes the j-th feature class, and $GI_l$ and $GI_r$ denote the Gini indices of the j-th feature class at the left and right branch nodes of node s, respectively.
The change in the Gini index of each feature class at each node is computed in this way. Then, based on the change in the Gini index of each of the m feature classes at each node, the first feature importance scores of the m feature classes in the decision tree are computed. The first feature importance score of the j-th feature class in the decision tree is:
$$VIM_{ij} = \sum_{s \in S} VIM_{js}$$
where i denotes the i-th decision tree, S denotes the set of non-leaf nodes of the i-th decision tree, and s denotes a non-leaf node of the i-th decision tree.
If the RF model contains n trees in total, the second feature importance score of the j-th feature class is:
$$VIM_j = \sum_{i=1}^{n} VIM_{ij}$$
The second feature importance scores are normalized to obtain the target feature importance score of each feature class. The target feature importance score of the j-th feature class is:
$$\overline{VIM}_j = \frac{VIM_j}{\sum_{t=1}^{M} VIM_t}$$
where M denotes the total number of feature classes and t denotes the t-th feature class.
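The aggregation from node level to forest level can be sketched as below. The sketch rests on stated assumptions: the Gini change at a node is taken as the node's Gini index minus those of its left and right branches, each split is attributed to a single feature class, and a tree is represented simply as a list of `(feature_index, gi_node, gi_left, gi_right)` tuples for its non-leaf nodes. That representation and the function names are hypothetical.

```python
def gini_change(gi_node, gi_left, gi_right):
    """Change in the Gini index before and after a node is split (assumed form)."""
    return gi_node - gi_left - gi_right


def forest_importance(trees, n_features):
    """Sum Gini changes per feature class over all trees, then normalize to sum 1."""
    totals = [0.0] * n_features
    for splits in trees:                      # one list of non-leaf nodes per tree
        for j, gi_s, gi_l, gi_r in splits:
            totals[j] += gini_change(gi_s, gi_l, gi_r)
    grand_total = sum(totals)
    if grand_total == 0:
        return totals                         # nothing to normalize
    return [value / grand_total for value in totals]


trees = [
    [(0, 0.50, 0.10, 0.20), (1, 0.20, 0.05, 0.05)],   # tree 1
    [(0, 0.48, 0.16, 0.12), (2, 0.16, 0.03, 0.03)],   # tree 2
]
scores = forest_importance(trees, n_features=3)
```

In this toy forest feature class 0 accumulates 0.4 of the total 0.6, so its normalized score is about two thirds, and the other two classes share the rest equally.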
After the target feature importance score of each feature class is obtained, the target feature classes related to the target service are determined according to these scores in the RF model.
Optionally, the feature classes whose target feature importance score is greater than a preset threshold are taken as the target feature classes. Alternatively, a preset proportion or a preset number of feature classes is selected according to the target feature importance scores as the target feature classes. The specific selection manner can be set according to actual needs, which is not limited in this embodiment.
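Both selection options can be written in a few lines. This is a sketch with hypothetical function names; reading "a preset number" as the highest-scoring classes is an assumption, since the text does not state the ranking order.

```python
def select_by_threshold(scores, threshold):
    """Indices of feature classes whose score is strictly greater than threshold."""
    return [j for j, score in enumerate(scores) if score > threshold]


def select_top_k(scores, k):
    """Indices of the k highest-scoring feature classes (assumed ranking)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


scores = [0.40, 0.05, 0.30, 0.25]
by_threshold = select_by_threshold(scores, 0.2)
top_two = select_top_k(scores, 2)
```

A preset proportion reduces to the second function by setting `k` to that proportion of `len(scores)`, rounded as the deployment prefers.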
As another implementable manner, on the basis of Embodiment 1, optionally, the target service is a house-rental service, and the first feature data includes rental behavior feature data as well as non-rental behavior feature data mined on the basis of the rental behavior feature data.
As another implementable manner, on the basis of Embodiment 1, optionally, authenticating the user to be authenticated based on the relevant data of the user to be authenticated includes:
authenticating the user to be authenticated based on the relevant data of the user to be authenticated and a pre-trained classification model.
Optionally, the classification model is an Xgboost model. Before the user to be authenticated is authenticated based on the relevant data and the pre-trained classification model, the method further includes:
Step 2011: acquire training sample data.
Step 2012: train a pre-established Xgboost network with the training sample data to obtain the Xgboost model.
Specifically, the training sample data includes training feature data and label data. The training feature data includes user data of the target feature classes, and the label data indicates, for each piece of user data, whether conversion has occurred, that is, whether the user has rented a house. The training feature data for which conversion has actually occurred forms the positive sample set Y. A negative sample set, that is, training feature data for which no conversion has occurred, is drawn at a certain ratio to the size of Y, such as 1:1 or 1:2; stratified random sampling may be used to draw it. The positive and negative sample sets are combined into the training sample set (that is, the training sample data), which is used to train the pre-established Xgboost network to obtain the Xgboost model. The specific training process is the same as existing training processes and is not repeated here.
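The construction of the negative sample set can be sketched as a stratified draw sized from the positive set. Everything specific here is an assumption for illustration: the stratum key (a city label), proportional allocation across strata, and the function name are not given in the text.

```python
import random
from collections import defaultdict


def sample_negatives(negatives, n_positive, ratio, stratum_of, rng):
    """Draw about n_positive * ratio negatives, proportionally from each stratum."""
    target = min(len(negatives), n_positive * ratio)
    strata = defaultdict(list)
    for row in negatives:
        strata[stratum_of(row)].append(row)
    picked = []
    for rows in strata.values():
        # Proportional quota; rounding can shift the total by a few records.
        quota = round(target * len(rows) / len(negatives))
        picked.extend(rng.sample(rows, min(quota, len(rows))))
    return picked


# 100 unconverted users in two hypothetical strata, 10 converted users, ratio 1:2.
negatives = [("city_a", i) for i in range(60)] + [("city_b", i) for i in range(40)]
picked = sample_negatives(
    negatives, n_positive=10, ratio=2, stratum_of=lambda row: row[0], rng=random.Random(1)
)
```

With a 60/40 split between strata, the 20 negatives drawn keep the same 12/8 proportion.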
Optionally, the training sample data can be randomly split at a certain ratio, for example 3:7, with one part used as the training sample set and the other as the test set. The accuracy, AUC, F1 value and other metrics of the Xgboost model are then checked, and once they have been optimized to a reasonable range the trained Xgboost model is obtained. The model is used to authenticate the users to be authenticated, and the data classified as 1 are taken as the target potential users, that is, the users who pass authentication.
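A random split of this kind can be sketched as follows. The text does not say which side of the 3:7 ratio is the test set; the sketch assumes 30% test and 70% training, and the function name is hypothetical.

```python
import random


def split_dataset(rows, test_fraction, rng):
    """Shuffle a copy of rows and return (train_rows, test_rows)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(10))
train_rows, test_rows = split_dataset(samples, test_fraction=0.3, rng=random.Random(7))
```

The split is disjoint and covers every record, so metrics computed on `test_rows` are not influenced by records the model was trained on.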
AUC: the area under the curve, where the curve is the ROC (Receiver Operating Characteristic) curve. The ROC curve was first used in electrical signal analysis in the 1950s and became popular after the 1978 publication "Basic Principles of ROC Analysis". It describes how the classifier's True Positive Rate (TPR, the number of positive samples the classifier classifies correctly as a proportion of all positive samples) varies with its False Positive Rate (FPR, the number of negative samples the classifier classifies incorrectly as a proportion of all negative samples).
F1 value: the harmonic mean of precision and recall; its value lies closer to the smaller of the two.
Precision: the number of positive samples the classifier classifies correctly as a proportion of all samples the classifier classifies as positive.
Recall: the number of positive samples the classifier classifies correctly as a proportion of all positive samples.
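These definitions translate directly into code. The sketch below computes precision, recall and F1 from hard labels, and computes AUC through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half), which equals the area under the ROC curve. The 0.5 decision threshold and the toy data are assumptions.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and their harmonic mean for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc_value = auc(y_true, scores)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a check like this; a sorting-based version would be preferable on large test sets.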
FIG. 3 is a schematic diagram of the overall authentication flow provided in this embodiment. The feature extraction part is the process of obtaining features that are useful for identifying potential users. The feature engineering part is the process of using the RF model to determine the target feature classes. Feature importance means determining the importance of each feature class according to its target feature importance score and selecting a batch of the more important feature classes as target feature classes. The core feature set is the training sample data used to train the classification model, the optimized model is the trained classification model, and the prediction set is the set of potential users identified by the optimized model.
It should be noted that the implementable manners in this embodiment may be implemented separately, or combined in any manner where they do not conflict, which is not limited in this application.
In the user authentication method provided by this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
Embodiment 3
This embodiment provides a user authentication apparatus for executing the method of Embodiment 1.
FIG. 4 is a schematic structural diagram of the user authentication apparatus provided in this embodiment. The user authentication apparatus 30 includes a determination module 31, a receiving module 32, an acquisition module 33, an authentication module 34 and a push module 35.
The determination module is configured to determine the target feature classes related to the target service based on first preset sample data, where the first preset sample data includes first feature data of users' historical behavior. The receiving module is configured to receive an authentication request, where the authentication request includes the identifier of the user to be authenticated. The acquisition module is configured to acquire the relevant data of the user to be authenticated according to the identifier, where the relevant data includes the second feature data of each target feature class corresponding to the user to be authenticated. The authentication module is configured to authenticate the user to be authenticated based on the relevant data. The push module is configured to push content related to the target service to the user to be authenticated if the authentication passes.
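One way to picture the division into modules is a small class in which each module is a pluggable callable. This is only a structural sketch under assumptions: the class name, field names, feature names and toy lambdas are hypothetical, and the receiving module is represented by the `handle_request` entry point.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class UserAuthenticationApparatus:
    determine_target_features: Callable[[], List[str]]            # determination module
    fetch_user_features: Callable[[str, List[str]], List[float]]  # acquisition module
    classify: Callable[[List[float]], bool]                       # authentication module
    push_content: Callable[[str], None]                           # push module

    def handle_request(self, user_ids: Sequence[str]) -> List[str]:
        """Receiving module: accept user IDs and run the remaining modules in order."""
        target_features = self.determine_target_features()
        passed = []
        for uid in user_ids:
            features = self.fetch_user_features(uid, target_features)
            if self.classify(features):
                self.push_content(uid)
                passed.append(uid)
        return passed


store = {"u1": [0.0, 1.0], "u2": [4.0, 3.0]}
pushed: List[str] = []
apparatus = UserAuthenticationApparatus(
    determine_target_features=lambda: ["rental_page_views", "map_searches"],
    fetch_user_features=lambda uid, names: store[uid],
    classify=lambda features: sum(features) > 5.0,
    push_content=pushed.append,
)
passed = apparatus.handle_request(["u1", "u2"])
```

Keeping the modules as injected callables mirrors the point made later in the description that the module split is a logical one and may be regrouped in an actual implementation.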
For the apparatus in this embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
In the user authentication apparatus provided by this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
Embodiment 4
This embodiment further supplements the description of the apparatus provided in the foregoing embodiment.
As an implementable manner, on the basis of the foregoing embodiment, optionally, the determination module is specifically configured to:
determine, according to the first feature data and a preset random forest (RF) model, the target feature importance score in the RF model of each feature class in the first feature data;
determine, according to the target feature importance score of each feature class in the RF model, the target feature classes related to the target service.
Optionally, the determination module is specifically configured to:
acquire, according to the first feature data, the first samples respectively corresponding to at least two decision trees;
for each decision tree, based on its corresponding first sample, determine the first feature importance scores in that decision tree of m feature classes randomly selected from the M feature classes, where M is the total number of feature classes included in the first feature data and m is an integer smaller than M;
for each feature class, based on its first feature importance scores in the decision trees, determine the second feature importance score of the feature class, where the second feature importance score is the sum of the first feature importance scores of the feature class over the decision trees;
normalize the second feature importance scores of the feature classes to obtain the target feature importance score of each feature class.
Optionally, the determination module is specifically configured to:
for each node in the decision tree, calculate the Gini index at that node of each of the m feature classes;
for each of the m feature classes, calculate the change in the Gini index of the feature class at each node, based on the Gini index of the feature class at each node of the decision tree;
calculate the first feature importance scores of the m feature classes in the decision tree, based on the change in the Gini index of each of the m feature classes at each node.
In some implementations, optionally, the determination module is specifically configured to:
take the feature classes whose target feature importance score is greater than a preset threshold as the target feature classes.
As another implementable manner, on the basis of the foregoing embodiment, optionally, the target service is a house-rental service, and the first feature data includes rental behavior feature data as well as non-rental behavior feature data mined on the basis of the rental behavior feature data.
As another implementable manner, on the basis of the foregoing embodiment, optionally, the authentication module is specifically configured to:
authenticate the user to be authenticated based on the relevant data of the user to be authenticated and a pre-trained classification model.
Optionally, the classification model is an Xgboost model, and the acquisition module is further configured to:
acquire training sample data;
train a pre-established Xgboost network with the training sample data to obtain the Xgboost model.
For the apparatus in this embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.
It should be noted that the implementable manners in this embodiment may be implemented separately, or combined in any manner where they do not conflict, which is not limited in this application.
In the user authentication apparatus of this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
Embodiment 5
This embodiment provides a computer device for executing the methods provided by the foregoing embodiments.
FIG. 5 is a schematic structural diagram of the computer device provided in this embodiment. The computer device 50 includes at least one processor 51 and a memory 52.
The memory stores a computer program, and the at least one processor executes the computer program stored in the memory to implement the methods provided by the foregoing embodiments.
In the computer device of this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
Embodiment 6
This embodiment provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed, the method provided by any of the foregoing embodiments is implemented.
With the computer-readable storage medium of this embodiment, a preset model is used to determine, from the many feature classes of the user group, the target feature classes related to the target service, which expands the features of the target service. Based on the expanded target feature classes, the relevant data of the user to be authenticated is acquired according to the user's identifier, the user is authenticated based on that data, and content related to the target service is pushed to the users who pass authentication. This increases the number of potential users mined and the accuracy of mining, and can effectively improve the accuracy of service information pushing, thereby improving the marketing effect.
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software functional units.
上述以软件功能单元的形式实现的集成的单元,可以存储在一个计算机可读取存储介质中。上述软件功能单元存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施例所述方法的部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。The above-mentioned integrated units implemented in the form of software functional units can be stored in a computer-readable storage medium. The above-mentioned software function unit is stored in a storage medium, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor (processor) to execute the methods described in the various embodiments of the present application. some steps. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes .
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional modules is used as an example. In practice, the functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. For the specific working process of the apparatus described above, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910535299.3A (published as CN111754253A) | 2019-06-20 | 2019-06-20 | User authentication method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111754253A (en) | 2020-10-09 |
Family
ID=72672935
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910535299.3A (status: Pending; published as CN111754253A) | 2019-06-20 | 2019-06-20 | User authentication method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111754253A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113554460A (en) * | 2021-07-19 | 2021-10-26 | 北京沃东天骏信息技术有限公司 | Method and device for identifying potential user |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030139963A1 (en) * | 2000-12-08 | 2003-07-24 | Chickering D. Maxwell | Decision theoretic approach to targeted solicitation by maximizing expected profit increases |
CN101589405A (en) * | 2006-02-02 | 2009-11-25 | 微软公司 | Ad targeting and/or pricing based on customer behavior |
CN105488697A (en) * | 2015-12-09 | 2016-04-13 | 焦点科技股份有限公司 | Potential customer mining method based on customer behavior characteristics |
CN105915956A (en) * | 2015-12-15 | 2016-08-31 | 乐视网信息技术(北京)股份有限公司 | Video content recommendation method, device, server and system |
CN107590102A (en) * | 2016-07-06 | 2018-01-16 | 阿里巴巴集团控股有限公司 | Random Forest model generation method and device |
CN107730311A (en) * | 2017-09-29 | 2018-02-23 | 北京小度信息科技有限公司 | A kind of method for pushing of recommendation information, device and server |
CN107918922A (en) * | 2017-11-15 | 2018-04-17 | 中国联合网络通信集团有限公司 | Business recommended method and business recommended device |
CN108076157A (en) * | 2017-12-29 | 2018-05-25 | 北京奇虎科技有限公司 | Message content push control method, system and computer equipment |
CN108932625A (en) * | 2017-05-23 | 2018-12-04 | 北京京东尚科信息技术有限公司 | Analysis method, device, medium and the electronic equipment of user behavior data |
CN109522876A (en) * | 2018-12-13 | 2019-03-26 | 北京交通大学 | Subway station building staircase selection prediction technique and system based on BP neural network |
CN109636482A (en) * | 2018-12-21 | 2019-04-16 | 苏宁易购集团股份有限公司 | Data processing method and system based on similarity model |
CN109767255A (en) * | 2018-12-06 | 2019-05-17 | 东莞团贷网互联网科技服务有限公司 | A method of it is modeled by big data and realizes intelligence operation and precision marketing |
Non-Patent Citations (1)
Title |
---|
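Deng Shangkun (邓尚坤) et al.: "Identification of insider trading behavior in the securities market based on the random forest method" (基于随机森林方法的证券市场内幕交易行为识别), Journal of China Three Gorges University (Humanities & Social Sciences) (三峡大学学报(人文社会科学版)), vol. 41, no. 3, 31 May 2019 (2019-05-31), pages 70-75 * |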
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107424043B (en) | Product recommendation method and device and electronic equipment | |
CN106528693B (en) | Educational resource recommended method and system towards individualized learning | |
CN112528025A (en) | Text clustering method, device and equipment based on density and storage medium | |
CN107423613B (en) | Method and device for determining device fingerprint according to similarity and server | |
CN103455411B (en) | The foundation of daily record disaggregated model, user behaviors log sorting technique and device | |
CN107590232B (en) | A resource recommendation system and method based on network learning environment | |
US20170235726A1 (en) | Information identification and extraction | |
CN108269122B (en) | Advertisement similarity processing method and device | |
CN103248658A (en) | Service recommendation device, service recommendation method and mobile device | |
KR20190128246A (en) | Searching methods and apparatus and non-transitory computer-readable storage media | |
CN108182253A (en) | For generating the method and apparatus of information | |
CN113918806B (en) | Method and related device for automatically recommending training courses | |
CN106095738A (en) | Recommendation tables single slice | |
CN114648010A (en) | Data table standardization method, apparatus, equipment and computer storage medium | |
CN113656699B (en) | User feature vector determining method, related equipment and medium | |
CN111626767A (en) | Resource data distribution method, device and equipment | |
CN113704373A (en) | User identification method and device based on movement track data and storage medium | |
JP7092194B2 (en) | Information processing equipment, judgment method, and program | |
CN113379004A (en) | Data table classification method and device, electronic equipment and storage medium | |
CN105512122B (en) | The sort method and device of information retrieval system | |
CN105653548A (en) | Method and system for identifying page type of electronic document | |
CN113706173B (en) | Information management method and device, electronic equipment and storage medium | |
CN113204662B (en) | Method, device and computer equipment for predicting user group based on photo-search behavior | |
CN108959289B (en) | Website category acquisition method and device | |
CN111754253A (en) | User authentication method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201009 |