CN117708222A

CN117708222A - Association rule mining method for customer segmentation

Info

Publication number: CN117708222A
Application number: CN202311430166.2A
Authority: CN
Inventors: 韩彤童; 于钰娜; 刘文博
Original assignee: Qilu Institute of Technology
Current assignee: Qilu Institute of Technology
Priority date: 2023-10-31
Filing date: 2023-10-31
Publication date: 2024-03-15

Abstract

The invention provides an association rule mining method for customer segmentation, comprising the following steps: performing customer segmentation profiling on the collected customer data set and dividing the customers into different value categories; counting the commodity data purchased by the customers of each value category and generating a commodity transaction data set for each value category; and screening frequent itemsets out of each commodity transaction data set to obtain target association rules for the commodities purchased by customers of each value category. The method improves the accuracy of customer segmentation, raises the profit margin of the results, and maximizes enterprise profit.

Description

Association rule mining method for customer segmentation

Technical field

The invention belongs to the technical field of data mining, and in particular relates to an association rule mining method for customer segmentation.

Background art

Association rule mining for customer segmentation can discover relationships between products from consumer transaction records, and then increase sales through product bundling or related recommendations. Existing methods discover only associations between commodities: they do not consider the importance of each attribute, they ignore the differences and diversity among customers, and they do not analyze the correlation between commodities and the characteristics of customer groups. As a result, the mined association rules lack pertinence and effectiveness.

Summary of the invention

In view of the shortcomings of the prior art, the present invention proposes an association rule mining method for customer segmentation. By segmenting customers and, under that segmentation, analyzing the correlation between commodities and customer group characteristics, the method improves the efficiency of association rule mining and raises the profit margin of the results.

To achieve the above object, in one aspect the present invention provides an association rule mining method for customer segmentation, comprising:

performing customer segmentation profiling on the collected customer data set, and dividing the customers into different value categories;

counting the commodity data purchased by the customers of each value category, and generating a commodity transaction data set for the customers of each value category;

screening frequent itemsets out of each commodity transaction data set, and obtaining target association rules for the commodities purchased by customers of each value category.

In some embodiments, a customer segmentation model is constructed using the following as evaluation indicators: the time interval between the customer's last purchase date and the end date of the statistical period; the number of purchases made by the customer during the statistical period; and the total amount spent by the customer during the statistical period together with the total profit generated by each customer's consumption. The model determines the value score of each customer in the customer data set:

$\mathrm{score} = w_R \times R + w_F \times F + w_M \times M_p$

where score denotes the customer value score; $R$, the first evaluation indicator, denotes the time interval between the customer's last purchase date and the end date of the statistical period; $F$, the second evaluation indicator, denotes the number of purchases made by the customer during the statistical period; $M_p$, the third evaluation indicator, is formed from $M$, the total amount spent by the customer on purchases during the statistical period, and $P$, the total profit generated by each customer's consumption, with $q_1$ and $q_2$ as weights; and $w_R$, $w_F$, $w_M$ denote the weights of the three evaluation indicators $R$, $F$ and $M_p$, respectively.
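The weighted score can be sketched in Python as follows. The exact expression for M_p is not legible in this text, so the sketch assumes the form q1 × P + q2 × (M / number of purchases), which follows the fourth and fifth indicators named in the detailed description. The `Customer` class, its field names, and all weight values are hypothetical, and R and F are assumed to be already normalized.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    recency: float    # R: normalized recency indicator
    frequency: float  # F: normalized purchase-count indicator
    amount: float     # M: total amount spent in the statistical period
    profit: float     # P: total profit generated by the customer
    purchases: int    # raw number of purchases in the period


def profit_adjusted_monetary(c: Customer, q1: float, q2: float) -> float:
    """M_p under the ASSUMED form q1 * P + q2 * (M / purchases)."""
    return q1 * c.profit + q2 * (c.amount / c.purchases)


def value_score(c: Customer, w_r: float, w_f: float, w_m: float,
                q1: float, q2: float) -> float:
    """score = w_R * R + w_F * F + w_M * M_p."""
    m_p = profit_adjusted_monetary(c, q1, q2)
    return w_r * c.recency + w_f * c.frequency + w_m * m_p
```

In practice M_p would presumably be normalized to the same scale as R and F before weighting; the sketch leaves that step out.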

In some embodiments, the K-means clustering algorithm is used to cluster the customer value scores and divide the customers into different value categories, comprising:

computing the value score of each customer in the customer data set to generate a value score data set;

randomly selecting K value scores from the value score data set as the centers of K clusters, where K denotes the number of cluster groups;

assigning each remaining value score in the data set to the cluster whose center is nearest to it;

recomputing the center of each cluster, and, when the new cluster centers are the same as those obtained in the previous iteration, outputting the number of cluster groups.
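The clustering steps above can be sketched for scalar value scores as follows. This is a minimal illustration, not the patent's implementation: the function name, the fixed random seed, the iteration cap, and the handling of empty clusters are all assumptions.

```python
import random


def kmeans_1d(scores, k, seed=0, max_iter=100):
    """Cluster scalar value scores; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(scores, k)      # K random scores as initial centers
    labels = [0] * len(scores)
    for _ in range(max_iter):
        # Assign every score to the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: abs(s - centers[c]))
                  for s in scores]
        new_centers = []
        for c in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            # ASSUMPTION: an empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members)
                               if members else centers[c])
        if new_centers == centers:       # centers unchanged: converged
            break
        centers = new_centers
    return centers, labels
```

Calling `kmeans_1d(scores, 3)` returns one center per value category and, for each customer, the index of the category it was assigned to.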

In some embodiments, the optimal value of K is determined by computing the silhouette coefficient for different numbers of cluster groups K.

In some embodiments, the entropy weight method is combined with the analytic hierarchy process (AHP) to determine the weight of each evaluation indicator in the customer segmentation model, comprising:

constructing a hierarchical model for the first, second and third evaluation indicators;

selecting one indicator from the first, second and third evaluation indicators, and determining a first weight of that indicator using the entropy weight method;

inputting the first weight into the hierarchical model, and obtaining second weights of the first, second and third evaluation indicators using the analytic hierarchy process.

In some embodiments, determining the first weight of an evaluation indicator using the entropy weight method comprises:

each evaluation indicator comprises a sub-indicator corresponding to each customer;

normalizing each sub-indicator, and calculating the information entropy value $H_j$ of each sub-indicator from the proportion accounted for by each sub-indicator, where $X_{ij}$ denotes the j-th sub-indicator of the i-th customer sample; i = 1, 2, …, n; j = 1, 2, …, m;

calculating the information entropy difference coefficient as:

$P_j = 1 - H_j \quad (j = 1, 2, \ldots, m)$

calculating the weight of each sub-indicator from the information entropy difference coefficients, thereby obtaining the first weight.
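The entropy and weight formulas are not legible in this text, so the sketch below uses the standard entropy weight formulation as an assumption: the proportion p_ij = z_ij / Σ_i z_ij, the entropy H_j = −(1 / ln n) Σ_i p_ij ln p_ij, the difference coefficient 1 − H_j as stated above, and the weight of each indicator as its difference coefficient divided by the sum of all difference coefficients. The function name is hypothetical.

```python
import math


def entropy_weights(z):
    """z: n rows (customers) x m columns (normalized indicators, all >= 0).

    Assumes every column has a positive sum and that at least one column
    is not perfectly uniform (otherwise the weights are undefined).
    """
    n, m = len(z), len(z[0])
    diffs = []
    for j in range(m):
        column = [row[j] for row in z]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total
            if p > 0:                    # treat 0 * ln(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)           # scale H_j into [0, 1]
        diffs.append(1.0 - entropy)      # difference coefficient 1 - H_j
    s = sum(diffs)
    return [d / s for d in diffs]
```

A column in which all customers have the same value has entropy 1 and therefore receives a weight of about zero, which matches the remark that such an indicator carries no useful information.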

In some embodiments, screening frequent itemsets out of each commodity transaction data set and obtaining the association rules for the commodities purchased by customers of each value category comprises:

scanning each commodity transaction data set and finding the frequent itemsets according to support;

generating candidate association rules from the obtained frequent itemsets, and discarding the candidate association rules whose confidence is lower than the minimum confidence, to obtain the target association rules for the commodities purchased by customers of each value category.
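Rule generation and confidence filtering can be sketched as follows. The sketch assumes the frequent itemsets are already available as a dictionary from `frozenset` to support that contains every frequent itemset, so all subsets of each entry are present. The function name and data layout are hypothetical.

```python
from itertools import combinations


def generate_rules(support, min_confidence):
    """Return (antecedent, consequent, confidence) for rules that pass."""
    rules = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue                     # a rule needs both sides non-empty
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # conf(A -> B) = sup(A and B together) / sup(A)
                confidence = sup / support[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules
```

With supports A = 0.6, B = 0.8 and {A, B} = 0.6 and a minimum confidence of 0.8, the rule A → B (confidence 1.0) is kept and B → A (confidence 0.75) is discarded.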

In some embodiments, finding the frequent itemsets according to support comprises:

counting the number of occurrences of the frequently occurring itemsets in each commodity transaction data set, deleting the items whose support is lower than a first threshold, sorting the remaining items in descending order of support, and storing them in an item header table;

reading the item header table, deleting the items whose support is lower than the first support threshold, sorting the remaining items in descending order of support, and determining the frequent itemsets;

inserting the sorted frequent itemsets of each commodity transaction data set into a frequent pattern tree in turn, to construct an FP-tree.

In some embodiments, a hash table and an ordered linked list are added in front of the item header table of the FP-tree to record the current last node of each data item.
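A minimal sketch of the header table and tree construction is given below. The `tail` dictionary illustrates the idea of recording the current last node of each item, so that a new node is appended to the item's node chain without walking the chain. The class and function names are hypothetical, and this sketch still makes two passes over the transactions; it does not reproduce the single-scan construction.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> FPNode
        self.next = None     # next node holding the same item


def build_fp_tree(transactions, min_support):
    """Return (root, header, order); header maps item -> first chain node."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    # Keep frequent items, ordered by descending support (ties by name).
    order = sorted((i for i, c in counts.items() if c / n >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = FPNode(None, None)
    header = {}              # item -> first node in the item's chain
    tail = {}                # item -> current last node in the item's chain
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                if item in tail:
                    tail[item].next = child   # O(1) append via the tail map
                else:
                    header[item] = child
                tail[item] = child
            child.count += 1
            node = child
    return root, header, order
```

Following `header[item]` and then each `.next` pointer visits every node of that item, which is what the mining phase uses to collect conditional pattern bases.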

In some embodiments, before performing customer segmentation profiling on the customer data set, the method further comprises:

preprocessing the customer data set, comprising:

discarding or deleting data that meet preset cleaning conditions;

performing neighborhood attribute reduction on the cleaned customer data set: computing the neighborhood of each sample, analyzing the consistency between the neighborhood and the sample, and generating positive-region samples among the remaining samples.
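The neighborhood-based reduction can be illustrated with the following sketch. It assumes numeric attributes on a comparable scale, a decision label for each sample, Euclidean distance, and a fixed neighborhood radius; none of these choices is specified in the text, and the function names are hypothetical. A sample belongs to the positive region when every sample in its neighborhood carries the same label.

```python
import math


def positive_region(samples, labels, attrs, radius):
    """Indices of samples whose whole neighborhood shares their label."""
    pos = set()
    for i, x in enumerate(samples):
        consistent = True
        for k, y in enumerate(samples):
            dist = math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))
            if dist <= radius and labels[k] != labels[i]:
                consistent = False
                break
        if consistent:
            pos.add(i)
    return pos


def greedy_reduct(samples, labels, radius):
    """Forward selection: add the attribute that most enlarges the positive region."""
    remaining = list(range(len(samples[0])))
    reduct, best_size = [], 0
    while remaining:
        gains = [(len(positive_region(samples, labels, reduct + [a], radius)), a)
                 for a in remaining]
        size, attr = max(gains, key=lambda g: (g[0], -g[1]))
        if size <= best_size:            # positive region stopped growing
            break
        reduct.append(attr)
        remaining.remove(attr)
        best_size = size
    return reduct
```

The loop stops as soon as no attribute enlarges the positive region, which corresponds to the stopping condition described for the reduction algorithm.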

In another aspect, the present invention further provides an association rule mining device for customer segmentation, which adopts the above association rule mining method for customer segmentation. The device comprises at least:

a customer segmentation module, configured to perform customer segmentation profiling on the collected customer data set and divide the customers into different value categories;

an association rule mining module, configured to count the commodity data purchased by the customers of each value category and generate a commodity transaction data set for the customers of each value category.

In a further aspect, the present invention provides a readable storage medium on which a program or instructions are stored. When executed by a processor, the program or instructions implement the steps of the above association rule mining method for customer segmentation and achieve the same technical effect.

As can be seen from the above solutions, the advantages of the present invention are as follows:

The association rule mining method for customer segmentation provided by the present invention first performs customer segmentation profiling on the collected customer data set and divides the customers into different value categories; it then counts the commodity data purchased by the customers of each value category and generates a commodity transaction data set for each value category; finally, it screens frequent itemsets out of each commodity transaction data set to obtain the target association rules for the commodities purchased by customers of each value category. The method introduces a profit attribute to improve the RFM model, raising the profit margin of the results. It combines the entropy weight method and the analytic hierarchy process to determine the weight of each indicator of the improved RFM model, uses the silhouette coefficient to determine the optimal K, and then clusters with the K-means algorithm to establish the customer segmentation model, which improves the accuracy of customer segmentation and maximizes enterprise profit. In addition, frequent itemsets are screened using an FP-tree to determine the target association rules for each value category, and a hash table and an ordered linked list are added in front of the item header table, which improves mining efficiency by avoiding repeated scans of the data set and greatly reduces search time.

Description of the drawings

Figure 1 is a schematic diagram of the overall flow of the association rule mining method for customer segmentation;

Figure 2 shows the structure of the FP-tree (frequent pattern tree);

Figure 3 shows the architecture of the association rule mining device for customer segmentation;

Wherein:

300-Association rule mining device for customer segmentation;

301-Data preprocessing module;

302-Customer segmentation module;

303-Association rule mining module;

S1-S4: steps.

Detailed description

To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

The association rule mining method for customer segmentation provided by the present invention is described in detail below.

Figure 1 shows the overall flow chart of the association rule mining method for customer segmentation.

An association rule mining method for customer segmentation comprises:

S1. Preprocess the collected customer data set.

In this embodiment, an attribute reduction algorithm based on neighborhood rough sets is used to preprocess the collected customer data, which ensures that the most important features are selected while also taking the correlation between attributes into account. The algorithm iterates: each iteration adds to the reduced attribute set the attribute that most enlarges the positive region, until the positive region stops changing, at which point a reduct of the data set has been obtained. Note that the optimal neighborhood radius in this algorithm must be determined by the specifics of the data set.

In the data preprocessing stage, the collected initial customer data are first cleaned in order to analyze patterns and outliers in the data; in a specific implementation, Pandas can be used to discard or delete data that meet the cleaning conditions. Then, attribute reduction is performed on the cleaned customer data set using the attribute reduction algorithm based on neighborhood rough sets, which can be implemented in Python. Specifically, the neighborhood of each sample is computed, the consistency between the neighborhood and the sample is analyzed, and positive-region samples are generated among the remaining samples. This avoids a large number of repeated operations and ensures that the most important features are selected while taking the correlation between attributes into account.

S2. Perform customer segmentation profiling on the preprocessed customer data set, and divide the customers into different value categories.

The traditional RFM model does not classify customer value very accurately, and the revenue obtained from each value class lacks differentiation. Moreover, after grouping, the enterprise revenue and per-capita profit of important-value customers are sometimes lower than those of general customers, so that high-revenue customers are poorly distinguished and the grouping trend is not obvious. Therefore, in this embodiment, a profit attribute is introduced to improve the RFM model, and customer segmentation profiling is performed based on the improved RFM model and the K-means clustering algorithm to divide customers into different value categories. The details are as follows:

In this embodiment, on the basis of the traditional RFM model, a profit attribute is introduced and the calculation of the indicator M is modified to construct an improved customer segmentation model. Specifically, the model uses the following as evaluation indicators: the time interval between the customer's last purchase date and the end date of the statistical period; the number of purchases made by the customer during the statistical period; and the total amount spent by the customer during the statistical period together with the total profit generated by each customer's consumption. It determines the value score of each customer in the customer data set, namely:

$\mathrm{score} = w_R \times R + w_F \times F + w_M \times M_p$

where score denotes the customer value score; $R$, the first evaluation indicator, denotes the time interval between the customer's last purchase date and the end date of the statistical period; $F$, the second evaluation indicator, denotes the number of purchases made by the customer during the statistical period; $M_p$, the third evaluation indicator, is formed from $M$, the total amount spent by the customer on purchases during the statistical period, and $P$, the total profit generated by each customer's consumption, with $q_1$ and $q_2$ as weights; and $w_R$, $w_F$, $w_M$ denote the weights of the three evaluation indicators $R$, $F$ and $M_p$, respectively.

Further, for the weights of the evaluation indicators in the customer segmentation model, this embodiment combines the entropy weight method with the analytic hierarchy process. Specifically:

First, for the weights $q_1$ and $q_2$ of the third evaluation indicator $M_p$, this embodiment determines the weights by principal component analysis within the analytic hierarchy process. Specifically, the total profit P generated by each customer's consumption is taken as a fourth evaluation indicator, and the customer's average payment amount over the period, i.e. the ratio of the total amount spent by the customer during the statistical period to the number of purchases made by the customer during the statistical period, is taken as a fifth evaluation indicator. A hierarchical model is constructed for the fourth and fifth evaluation indicators, and the analytic hierarchy process is used to obtain their weights $q_1$ and $q_2$. Specifically, the principal components of the fourth and fifth evaluation indicators are constructed, the variance contribution rate of each principal component is taken as its weight, the coefficients of each indicator in the linear combinations of the principal components are averaged with these weights, and the result is normalized. The normalized indicator weights are the weights $q_1$ and $q_2$ of the fourth and fifth evaluation indicators.

Further, after the weights $q_1$ and $q_2$ of the fourth and fifth evaluation indicators have been determined, this embodiment combines the entropy weight method with the analytic hierarchy process to determine the weights of the first, second and third evaluation indicators. In a specific implementation, the following approach can be used:

First, through the above principal component analysis method, a hierarchical model is constructed for the first, second and third evaluation indicators.

Then, at least one indicator is selected from the first, second and third evaluation indicators, and the first weight of that indicator is determined using the entropy weight method.

Finally, the first weight is input into the hierarchical model, and the second weights of the first, second and third evaluation indicators are obtained through the analytic hierarchy process.

In a specific implementation, this embodiment uses the entropy weight method to determine the first weight of the evaluation indicator. The entropy weight method is an objective weighting method that determines the weight of each indicator by calculating its entropy and entropy weight. The greater the uncertainty, the greater the entropy value and the smaller the corresponding entropy weight. If the entropy weight is zero, the indicator provides no useful information to the decision maker and is deleted. The steps are as follows:

At least one indicator is selected from the first, second and third evaluation indicators, and each evaluation indicator comprises a sub-indicator corresponding to each customer. Each sub-indicator is normalized. Since the indicators in the customer segmentation model include only positive and negative indicators, if the selected indicator is a positive indicator, it is normalized as:

$z_{ij} = \dfrac{x_{ij} - \min(x_{1j}, x_{2j}, \ldots, x_{nj})}{\max(x_{1j}, x_{2j}, \ldots, x_{nj}) - \min(x_{1j}, x_{2j}, \ldots, x_{nj})}$

If the selected indicator is a negative indicator, it is normalized as:

$z_{ij} = \dfrac{\max(x_{1j}, x_{2j}, \ldots, x_{nj}) - x_{ij}}{\max(x_{1j}, x_{2j}, \ldots, x_{nj}) - \min(x_{1j}, x_{2j}, \ldots, x_{nj})}$

where $x_{ij}$ denotes the j-th sub-indicator of the i-th customer sample; i = 1, 2, …, n; j = 1, 2, …, m.
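The two normalization formulas can be sketched for a single indicator column as follows. The function name and the treatment of a constant column are assumptions.

```python
def normalize(values, positive=True):
    """Min-max normalize one indicator column.

    positive=True  -> (x - min) / (max - min)
    positive=False -> (max - x) / (max - min)
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # ASSUMPTION: a constant column carries no information; map it to 0.
        return [0.0 for _ in values]
    if positive:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]
```

For example, recency R would typically be treated as a negative indicator (a shorter interval is better), while F and M_p would be treated as positive indicators.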

Further, after the sub-indicators are normalized, the information entropy value of each sub-indicator is calculated, the information entropy difference coefficient is determined from the entropy value, and the weight of each sub-indicator is then calculated from the difference coefficient. Specifically, the information entropy value $H_j$ of each sub-indicator is computed from the proportion accounted for by each sub-indicator; the information entropy difference coefficient is expressed as $P_j = 1 - H_j$ (j = 1, 2, …, m); and the weight of each sub-indicator is calculated from the difference coefficients, thereby obtaining the first weight.

In this embodiment, after the first weight has been determined using the entropy weight method, it is input into the hierarchical model and converted into pairwise comparison data in the AHP judgment matrix. The AHP total ranking weights are then calculated through the analytic hierarchy process, which gives the second weights of the first, second and third evaluation indicators. The customer segmentation model is thereby constructed, and the value score of each customer in the customer data set is determined, namely:

$\mathrm{score} = w_R \times R + w_F \times F + w_M \times M_p$
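How the entropy weights are converted into pairwise comparisons is not specified, so the sketch below assumes the simplest conversion, a[i][j] = w_i / w_j, and approximates the AHP priority vector by normalized row geometric means. Both function names are hypothetical. With this conversion the judgment matrix is perfectly consistent, so the AHP weights coincide with the input weights; they differ only once the matrix is adjusted, for example by expert judgment.

```python
import math


def judgment_matrix(first_weights):
    """ASSUMED conversion of weights to pairwise comparisons: a[i][j] = w_i / w_j."""
    return [[wi / wj for wj in first_weights] for wi in first_weights]


def ahp_weights(matrix):
    """Approximate the AHP priority vector by normalized row geometric means."""
    n = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]
```

For a 2 × 2 matrix stating that the first indicator is twice as important as the second, `ahp_weights([[1, 2], [0.5, 1]])` gives weights of about 2/3 and 1/3.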

Further, in this embodiment, after the customer segmentation model has been constructed and the value score of each customer in the customer data set has been determined, the K-means clustering algorithm is used to cluster the customer value scores and divide the customers into different value categories. Specifically:

K value scores are randomly selected from the value score data set as the centers of K clusters, where K denotes the number of cluster groups.

Each remaining value score in the data set is assigned to the cluster whose center is nearest to it. In a specific implementation, the Euclidean distance is usually used, that is: $d(X, C_i) = \sqrt{\sum_{j=1}^{m} (X_j - C_{ij})^2}$, where m is the dimension of the data object, and $X_j$ and $C_{ij}$ are the j-th attribute values of $X$ and $C_i$.

Then, the center of each cluster is recomputed; specifically, the mean of all data in the group can be taken as the new cluster center, and the squared error is evaluated. When the new cluster centers are the same as those obtained in the previous iteration, the number of cluster groups K is output.

To select the optimal number of cluster groups K, this embodiment introduces the silhouette coefficient: the silhouette coefficient is computed for different values of K, and the best K is chosen, which improves the effectiveness of K-means clustering. The silhouette coefficient S of a single data point i is expressed as:

$S(i) = \dfrac{b(i) - a(i)}{\max\{a(i), b(i)\}}$

where a(i) denotes the average dissimilarity between data point i and the other data points in the same cluster, and b(i) denotes the minimum, over the other clusters, of the average dissimilarity between data point i and the points of that cluster; −1 ≤ S(i) ≤ 1. S(i) reflects how tightly data point i is grouped within its cluster: the closer S(i) is to 1, the better the clustering. When S(i) is approximately 0, a(i) and b(i) are approximately equal.
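The mean silhouette coefficient for a given labeling of scalar value scores can be sketched as follows. It assumes the standard definition S(i) = (b(i) − a(i)) / max{a(i), b(i)}, which matches the descriptions of a(i) and b(i) above, uses absolute difference as the dissimilarity, and follows the common convention that a point alone in its cluster contributes 0. The function name is hypothetical, and at least two clusters with distinct points are required.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient for scalar points with cluster labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue                 # convention: S(i) = 0 for a singleton
        # a(i): mean distance to the other members of the same cluster
        # (the point itself contributes distance 0 to the sum).
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(abs(p - q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        total += (b - a) / max(a, b)
    return total / len(points)
```

To choose K, the clustering would be run for each candidate K and the value giving the highest mean silhouette coefficient retained.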

S3. Count the commodity data purchased by the customers of each value category, and generate a commodity transaction data set for the customers of each value category.

In this embodiment, after customer segmentation profiling has been completed in step S2, the commodity data purchased by the customers of each value category are counted and a commodity transaction data set is generated for each value category, for use in the subsequent association rule mining.
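Step S3 can be sketched as a grouping operation. The record layout `(customer_id, order_id, item)` and the `category_of` mapping from customer to value category are hypothetical; the text does not describe the data schema.

```python
from collections import defaultdict


def transactions_by_category(orders, category_of):
    """Group purchased items into one transaction list per value category.

    orders: iterable of (customer_id, order_id, item) records.
    category_of: dict mapping customer_id to its value category.
    """
    baskets = defaultdict(set)
    for customer_id, order_id, item in orders:
        baskets[(customer_id, order_id)].add(item)   # one basket per order
    datasets = defaultdict(list)
    for (customer_id, _order_id), items in baskets.items():
        datasets[category_of[customer_id]].append(sorted(items))
    return dict(datasets)
```

Each value in the returned dictionary is a transaction data set that can be mined independently in step S4.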

S4. Screen frequent itemsets out of each commodity transaction data set, and obtain the target association rules for the commodities purchased by customers of each value category.

In a specific implementation, support and confidence are the main indicators for evaluating the validity of the generated rules. An itemset whose support is greater than or equal to the user-defined minimum support is called a frequent itemset. Therefore, in this embodiment, the commodity transaction data set is scanned twice, frequent itemsets are screened according to support, association rules are then extracted, and the rules whose confidence is lower than the minimum confidence are discarded. Specifically, each commodity transaction data set is scanned and the frequent itemsets are found according to support; candidate association rules are then generated from the obtained frequent itemsets, and the candidates whose confidence is lower than the minimum confidence are discarded, giving the target association rules for the commodities purchased by customers of each value category.

In this embodiment, an FP-tree is generated for the data set, and frequent itemsets are then mined from the generated FP-tree. First, the number of occurrences of the frequently occurring itemsets in each commodity transaction data set is counted, the items whose support is lower than the first threshold are deleted, and the remaining items are sorted in descending order of support and stored in the item header table. The item header table is then read, the items whose support is lower than the first support threshold are deleted again, and the remaining items are sorted in descending order of support to determine the frequent itemsets. Next, the sorted frequent itemsets of each commodity transaction data set are inserted into the frequent pattern tree in turn to construct the FP-tree. Once the FP-tree has been built, frequent itemsets can be mined from the previously processed data set: the conditional pattern base is obtained from the FP-tree, a conditional FP-tree is constructed from it, and these two steps are repeated iteratively until the tree contains only a single element item. Candidate association rules are then generated from the obtained frequent itemsets, a minimum confidence is set, and the candidates whose confidence is lower than the minimum confidence are discarded. This further filtering yields the target association rules for the commodities purchased by customers of each value category.
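The recursion over conditional pattern bases can be sketched as follows. For brevity the sketch keeps each conditional pattern base as a plain list of (sorted items, count) pairs instead of building a compressed conditional FP-tree, so it illustrates the pattern-growth logic rather than the tree data structure. The function name and the use of an absolute count threshold are assumptions.

```python
from collections import defaultdict


def fp_growth(weighted_paths, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every frequent itemset.

    weighted_paths: list of (items, count) pairs; for the initial call,
    pass each transaction with count 1.
    """
    counts = defaultdict(int)
    for items, count in weighted_paths:
        for item in set(items):
            counts[item] += count
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Order frequent items by descending count (ties by name).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Drop infrequent items and sort every path by that order.
    paths = [(sorted((i for i in set(items) if i in rank), key=rank.get), count)
             for items, count in weighted_paths]
    # Mine from the least frequent item upward.
    for item in reversed(order):
        new_suffix = suffix | {item}
        yield new_suffix, frequent[item]
        # Conditional pattern base: the prefix that precedes `item` in each path.
        base = []
        for items, count in paths:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    base.append((prefix, count))
        yield from fp_growth(base, min_count, new_suffix)
```

A call such as `dict(fp_growth([(t, 1) for t in transactions], min_count=2))` returns every frequent itemset with its count; dividing the counts by the number of transactions gives the supports needed for rule generation.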

Furthermore, constructing the FP-tree as described above requires scanning the data set twice, which increases search time and reduces the efficiency of the algorithm. To address this, this embodiment adds a hash table and an ordered linked list in front of the item header table of the FP-tree to record the current last node of each data item, making full use of the transaction item information obtained during the first scan of the database. The FP-tree can then be constructed with a single scan of the data set, which reduces search time and avoids rescanning the original data.
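One possible reading of "recording the current last node of each data item" is sketched below, as an assumption rather than the embodiment's actual data structure: a hash table keeps, for every item, the first and the last node of that item's linked list, so that a newly created node is linked in constant time without walking the list. The names `HeaderTable`, `append`, `chain` and `insert` are hypothetical, and the sketch covers only this bookkeeping, not the single-scan construction as a whole.

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}
        self.next_link = None      # next node in the tree carrying the same item


class HeaderTable:
    """Hash table from each item to the first and last node of its linked list."""

    def __init__(self):
        self.first = {}
        self.last = {}

    def append(self, node):
        tail = self.last.get(node.item)
        if tail is None:
            self.first[node.item] = node   # first node seen for this item
        else:
            tail.next_link = node          # O(1): no traversal of the list
        self.last[node.item] = node

    def chain(self, item):
        node = self.first.get(item)
        while node is not None:
            yield node
            node = node.next_link


def insert(root, header, ordered_items):
    """Insert one support-ordered transaction into the tree."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            child = Node(item, node)
            node.children[item] = child
            header.append(child)
        child.count += 1
        node = child


# Hypothetical, already sorted transactions.
root, header = Node(None, None), HeaderTable()
for ordered in (["B", "A", "D"], ["B", "A"], ["B", "C"], ["B", "A", "D"], ["D", "C"]):
    insert(root, header, ordered)
d_counts = [n.count for n in header.chain("D")]
```

Following `header.chain("D")` visits every node of item D in creation order, which is the traversal needed later to collect the conditional pattern base of D.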

This step is illustrated below with a specific commodity transaction data set, shown in Table 1-1.

Table 1-1 Commodity transaction data

Using the data in Table 1-1 with the minimum support set to 0.4, association rule mining proceeds as follows:

(1) Scan the commodity transaction data set in Table 1-1 for the first time and build the item header table, as shown in Table 1-2.

Table 1-2 Item header table

(2) Using the sorted item header table, scan the data set a second time and delete the infrequent 1-itemsets. The sorted data set is shown in Table 1-3.

Table 1-3 Sorted data set

(3) Build the FP-tree and mine the association rules.

The FP-tree is built from the sorted data set; the result is shown in Figure 2.

First, find the lowest child nodes in the tree and mine upward from the bottom, which gives the subtrees {B, A, D}, {B, A, C, D}, {B, C}, and {C, D}.

For the lowest child node D, the FP subtrees are {B:8, A:6, C:3, D:2}, {B:8, A:6, D:2}, and {C:5, D:3}; filtering by support gives the conditional pattern base {B:6, A:6, D:4}. For the lowest child node C, the FP subtree is {B:7, C:1}; filtering by support shows that {B, C} is an infrequent itemset.

In summary, the frequent 3-itemset is {B, A, D}, the frequent 2-itemsets are {{B, A}, {B, D}, {A, D}}, and the frequent 1-itemsets are {{B}, {A}, {C}, {D}}.

Finally, candidate association rules are generated from the frequent itemsets, a minimum confidence is set, and the candidate rules whose confidence is below the minimum confidence are discarded.
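The final step can be sketched as follows. This is an illustrative sketch only: the function name `generate_rules` is hypothetical, the support counts are hypothetical values chosen to match the itemsets listed above rather than counts read from Table 1-1, and the minimum confidence of 0.7 is an assumed value, since the example does not state one.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule that passes.

    `frequent` maps each frequent itemset (frozenset) to its support count and
    must contain every subset of every itemset it lists, which holds for the
    complete output of a frequent itemset miner.
    """
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), size):
                antecedent = frozenset(combo)
                conf = count / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical support counts for the frequent itemsets named in the example.
frequent = {
    frozenset("B"): 4, frozenset("A"): 3, frozenset("C"): 2, frozenset("D"): 3,
    frozenset("BA"): 3, frozenset("BD"): 2, frozenset("AD"): 2,
    frozenset("BAD"): 2,
}
rules = generate_rules(frequent, min_confidence=0.7)
```

Each candidate rule splits one frequent itemset into a non-empty antecedent and a non-empty consequent; its confidence is the count of the whole itemset divided by the count of the antecedent.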

Furthermore, when the volume of data is large, the FP-tree itself becomes very large, which severely degrades the performance of the algorithm and can even make it impossible to build the conditional pattern trees in memory during iteration, so that FP-Growth mining cannot be carried out. To address this, the association rule algorithm can be parallelized on the Hadoop platform, with load balancing achieved through dynamic grouping: multiple local FP-trees replace the global FP-tree, several partitions perform FP mining simultaneously, and the master node finally aggregates the statistics and keeps the data that are not below the threshold. This solves the problem that a huge FP-tree cannot reside in memory when the data are massive, and enables efficient distributed FP mining under the Hadoop framework.
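The aggregate-and-filter step performed by the master node can be illustrated with the single-process sketch below. It makes several simplifying assumptions: it does not use Hadoop, the partitions are plain Python lists, brute-force subset counting stands in for the local FP-tree of each partition, and dynamic grouping for load balancing is not modeled. The names `local_counts` and `aggregate` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, max_size=3):
    """Count every itemset of up to `max_size` items within one partition."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(frozenset(c) for c in combinations(items, k))
    return counts


def aggregate(partials, min_count):
    """Merge the per-partition counts and keep itemsets not below the threshold."""
    total = Counter()
    for partial in partials:
        total.update(partial)          # adds counts when given a Counter
    return {itemset: n for itemset, n in total.items() if n >= min_count}


# Hypothetical transactions split into two partitions.
transactions = [["A", "B", "D"], ["A", "B"], ["B", "C"], ["A", "B", "D"], ["C", "D"]]
partitions = [transactions[:3], transactions[3:]]
result = aggregate((local_counts(p) for p in partitions), min_count=2)
```

The local counts are deliberately not thresholded before merging: an itemset that is infrequent in every partition taken alone can still reach the global threshold once the partial counts are summed.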

In summary, the association rule mining method for customer segmentation provided by the present invention performs customer segmentation profiling on the collected customer data set and divides the customers into different value categories; it then counts the commodity data purchased by the customers of each value category to generate a commodity transaction data set for each value category; finally, it selects frequent itemsets from each commodity transaction data set to obtain the target association rules for the commodities purchased by customers of each value category. During data preprocessing, the method uses neighborhood rough sets for attribute reduction, which improves the efficiency of data processing. During customer segmentation, because the RFM customer segmentation model suffers from low segmentation accuracy, a clustering algorithm is used to improve the quality of customer grouping. To maximize enterprise profit, the RFM model is improved by introducing a profit attribute; the entropy weight method and the analytic hierarchy process are then combined to determine the weight of each indicator of the improved RFM model; finally, after the optimal K value has been determined using the silhouette coefficient, the K-means algorithm is used for clustering, a customer segmentation model is established, and the effectiveness of the algorithm is verified. During association rule mining, because the FP-Growth algorithm must scan the data set twice, which lowers efficiency, a hash table and an ordered linked list are added in front of the item header table of the FP-tree; avoiding repeated scans of the data set improves mining efficiency and greatly reduces search time. The method is suitable for retail, e-commerce, and financial enterprises that profile and segment their member users and identify high-value customers; mining association rules over the commodities those customers purchase can guide precise product recommendation, product placement, and the like.
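For reference, the entropy weight method mentioned in the summary above is commonly computed as in the sketch below. This is the textbook form and is offered as an assumption: the min-max normalization, the handling of constant columns, the function name `entropy_weights` and the sample matrix are all illustrative, and the combination with the analytic hierarchy process is not shown.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of `matrix` (rows are customer samples).

    Requires at least two rows. Columns that vary more across the samples
    carry more information and therefore receive a larger weight.
    """
    n, m = len(matrix), len(matrix[0])
    diffs = []
    for j in range(m):
        column = [row[j] for row in matrix]
        lo, hi = min(column), max(column)
        # Min-max normalization; a constant column is treated as uniform.
        norm = [(x - lo) / (hi - lo) if hi > lo else 1.0 for x in column]
        total = sum(norm)
        p = [x / total for x in norm]
        # Information entropy of the column, with 0 * ln(0) taken as 0.
        h = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        diffs.append(1.0 - h)          # difference coefficient 1 - H_j
    s = sum(diffs)
    if s == 0:
        return [1.0 / m] * m           # no column is informative
    return [d / s for d in diffs]


# Hypothetical indicator values for three customers; the second column is constant.
sample = [[1.0, 10.0, 5.0],
          [2.0, 10.0, 5.1],
          [9.0, 10.0, 5.2]]
weights = entropy_weights(sample)
```

In this hypothetical sample the constant second column receives a weight of approximately zero, and the weights sum to one.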

In addition, the above embodiments of the present invention can be applied to a terminal device having the function of the association rule mining method for customer segmentation. The terminal device may include a personal terminal, a host computer terminal, and the like, which is not limited by the embodiments of the present invention.

Referring to Figure 3, Figure 3 shows an association rule mining device 300 for customer segmentation, which can carry out the association rule mining for customer segmentation shown in Figure 1. The association rule mining device for customer segmentation provided by the embodiments of the present application can implement each process of the above association rule mining method for customer segmentation. It includes at least:

a data preprocessing module 301, configured to preprocess the collected customer data set;

a customer segmentation module 302, configured to perform customer segmentation profiling on the preprocessed customer data set and divide the customers into different value categories;

an association rule mining module 303, configured to count the commodity data purchased by the customers of each value category and generate a commodity transaction data set for the customers of each value category;

and to select frequent itemsets from each commodity transaction data set to obtain the target association rules for the commodities purchased by customers of each value category.

In addition, it should be understood that the division into the above functional modules in the association rule mining device 300 for customer segmentation according to the embodiments of the present application is given only as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the association rule mining device 300 for customer segmentation may be divided into functional modules different from those illustrated above in order to perform all or part of the functions described above.

In addition, the embodiments of the present application further provide an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor. When executed by the processor, the program or instructions implement the steps of the above association rule mining method for customer segmentation and achieve the same technical effect.

In addition, the embodiments of the present application further provide a readable storage medium on which a program or instructions are stored. When executed by a processor, the program or instructions implement the steps of the above association rule mining method for customer segmentation and achieve the same technical effect.

It should be noted that, in this document, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element. It should also be noted that the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; depending on the functions involved, functions may also be performed substantially simultaneously or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific embodiments described, which are illustrative rather than restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims

1. An association rule mining method for customer segmentation, characterized by comprising the following steps:

performing customer segmentation profiling on the collected customer data set, and dividing the customers into different value categories;

counting the commodity data purchased by the customers of each value category, and generating a commodity transaction data set for the customers of each value category;

and selecting frequent itemsets from each commodity transaction data set to obtain target association rules for the commodities purchased by customers of each value category.

2. The method according to claim 1, characterized by comprising:

constructing a customer segmentation model by taking as evaluation indexes the time interval between the customer's last purchase date and the end date of the statistical period, the number of purchases made by the customer during the statistical period, the total amount spent by the customer on purchases during the statistical period, and the total profit generated by each customer's consumption, and determining the value score of each customer in the customer data set,

$\mathrm{score} = w_R \times R + w_F \times F + w_M \times M_p$

wherein score represents the customer value score; $R$ represents the time interval between the customer's last purchase date and the end date of the statistical period, and is the first evaluation index; $F$ represents the number of purchases made by the customer during the statistical period, and is the second evaluation index; $M_p$ is the third evaluation index, in which $M$ represents the total amount spent by the customer on purchases during the statistical period, $P$ represents the total profit generated by each customer's consumption, and $q_1$ and $q_2$ are weights; and $w_R$, $w_F$ and $w_M$ respectively represent the weights of the three evaluation indexes $R$, $F$ and $M_p$.

3. The method according to claim 2, characterized by comprising:

clustering the customer value scores with the K-means clustering algorithm to divide the customers into different value categories, comprising:

collecting the value score of each customer in the customer data set to generate a value score data set;

randomly selecting K value scores from the value score data set as the cluster centers of K clusters, wherein K represents the number of cluster groups;

assigning each remaining data point in the value score data set to the cluster whose center is nearest;

and recalculating a new cluster center for each cluster, and outputting the cluster grouping number when the new cluster centers are the same as the cluster centers obtained in the previous iteration.

4. The method according to claim 3, characterized by comprising:

determining the optimal K value by computing the silhouette coefficients for different numbers of cluster groups K.

5. The method according to claim 2, characterized in that

the weight of each evaluation index in the customer segmentation model is determined by using the entropy weight method and the analytic hierarchy process, comprising:

constructing a hierarchical model for the first, second and third evaluation indexes;

selecting an index from the first evaluation index, the second evaluation index and the third evaluation index, and determining a first weight of the index by using the entropy weight method;

and inputting the first weight into the hierarchical model, and obtaining the second weights of the first evaluation index, the second evaluation index and the third evaluation index by using the analytic hierarchy process.

6. The method according to claim 5, characterized in that

determining the first weight of the evaluation index by using the entropy weight method comprises:

each evaluation index comprises a sub-index corresponding to each customer;

normalizing each sub-index, and calculating the information entropy value of each sub-index as follows:

wherein the formula uses the ratio of each index value to the corresponding total; $X_{ij}$ represents the j-th sub-index of the i-th customer sample; i = 1, 2, …, n; j = 1, 2, …, m;

the information entropy difference coefficient is calculated as follows:

$P_j = 1 - H_j \quad (j = 1, 2, \dots, m)$

and calculating the weight of each sub-index from the information entropy difference coefficient, thereby obtaining the first weight.

7. The method according to claim 1, characterized in that selecting frequent itemsets from each commodity transaction data set to obtain the target association rules for the commodities purchased by customers of each value category comprises:

scanning each commodity transaction data set and finding the frequent itemsets according to support;

and generating candidate association rules from the frequent itemsets, discarding the candidate association rules whose confidence is below the minimum confidence, and obtaining the target association rules for the commodities purchased by customers of each value category.

8. The method according to claim 7, characterized in that

finding the frequent itemsets according to support comprises:

counting the occurrences of the items in each commodity transaction data set, deleting the items whose support is below a first support threshold, sorting the remaining items in descending order of support, and storing them in an item header table;

reading the item header table, deleting the items whose support is below the first support threshold, sorting the remaining items in descending order of support, and determining the frequent itemsets;

and inserting the sorted frequent itemsets of each commodity transaction data set one by one into a frequent pattern tree to construct an FP-tree.

9. The method according to claim 8, further comprising:

adding a hash table and an ordered linked list in front of the item header table of the FP-tree to record the current last node of each data item.

10. The method according to claim 1, further comprising, before performing customer segmentation profiling on the customer data set:

preprocessing the customer data set, comprising:

discarding the data that meet preset cleaning conditions;

and performing neighborhood attribute reduction on the cleaned customer data set, calculating the neighborhood of each sample, analyzing the consistency between the neighborhood and the sample, and generating positive-region samples from the remaining samples.