CN104732419A - Application of positive and negative sequence mode screening method in customer purchasing behavior analysis - Google Patents
Application of positive and negative sequence mode screening method in customer purchasing behavior analysis
- Publication number
- CN104732419A (application CN201510025586.1A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- feasible
- customer
- pattern
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to the application of a positive and negative sequential pattern screening method to customer purchase behavior analysis. It proposes an efficient algorithm named SAP for selecting feasible positive and negative sequential patterns. The main idea is to mine all positive and negative sequential patterns with the e-NSP method and then screen each pattern, keeping only the feasible ones. When applied to customer purchase behavior analysis, the invention uses the minimum support to identify the products that customers purchased frequently within a given period, and uses the correlation coefficient to identify the products that are strongly correlated with those purchases in that period. When a customer buys a product, the invention can therefore recommend products that other customers also buy and that are strongly correlated with it, which increases transaction opportunities, converts website visitors into buyers, improves cross-selling, strengthens customer loyalty, and improves the website's economic returns.
Description
Technical field
The invention relates to the application of a positive and negative sequential pattern screening method to customer purchase behavior analysis, and belongs to the technical field of sequential pattern mining applications.
Background
With the spread of the Internet and the growth of e-commerce, online shopping has become one of the main ways people shop. Shoppers can obtain satisfactory products online without leaving home. However, the rapid growth in the amount of information has made online shopping complicated and time-consuming. At the same time, many large e-commerce websites, such as Amazon, Alibaba's Taobao and Tmall, and JD.com, have accumulated large volumes of customer transaction data. How to make full use of this data to learn customers' shopping patterns and recommend personalized products, and thereby improve the website's service quality and economic returns, is a problem e-commerce urgently needs to solve.
Compared with traditional retail, e-commerce merchants cannot observe their customers directly, and the data available to them is limited (for example, registration information and purchase records). By analyzing and mining large numbers of purchase records, a merchant can discover customers' frequent access sequence patterns, choose different forms of recommendation for different customer attributes and shopping steps, recommend appropriate products at the right time, and optimize where products are placed on the site. This increases transaction opportunities, converts website visitors into buyers, improves cross-selling, strengthens customer loyalty, and improves the website's service quality and economic returns.
Personalized product recommendation in e-commerce does not require great cost. The website only needs to adjust its content to each customer's characteristics and generate personalized recommendation pages according to each customer's preferences, giving customers more choices. This is equivalent to setting up a dedicated online store for every customer: each customer receives targeted recommendations that help them pick the products that really suit their needs out of a huge catalog.
At present, most personalized product recommendation relies on association rule analysis, and sequential pattern analysis is rarely used for this problem. Association rule analysis finds which products customers like to buy together, that is, which other products a customer buys in the same transaction after choosing certain products. It discovers regularities within a transaction, so that analysts can arrange product placement according to customers' purchasing interests and increase sales volume. Sequential pattern analysis instead asks what a customer will buy within a given period after completing a transaction. It discovers regularities between transactions, so that the seller can predict future sales from current sales and arrange product placement accordingly. Its main purpose is to study the order in which products are purchased: it is necessary to know not only whether a product was purchased but also its order relative to other purchases. For example, 40% of customers who ordered product A online order product B within 2 months. A sequential pattern is a sequence that occurs frequently in the database within a given period, that is, it shows which products many customers buy during that period; whether a sequence counts as frequent is determined by the minimum support. Each sequence is a list of itemsets ordered by transaction time, and the minimum support can be set to mine sequences at different levels of frequency. However, existing work that applies sequential patterns to customer purchase behavior and personalized recommendation considers only events that have occurred, which is known as positive sequential pattern (PSP) mining.
Compared with positive sequential pattern mining, negative sequential pattern (NSP) mining also considers events that did not occur. This offers a new perspective for data analysis and allows a deeper understanding of what the data implies. For example, let a stand for bread, b for coffee, c for tea, and d for sugar. The customer purchase sequence pattern <a b ¬c d> then states that, within a given period, after buying products a and b the customer bought product d without buying product c. Negative sequential patterns of this kind help reveal more of the rules and patterns hidden in the data, and play an irreplaceable role in applications such as customer behavior analysis, medical insurance fraud detection, and the study of relationships between missing genes and diseases. After mining negative sequential patterns, however, it becomes hard to select the patterns that are feasible for decision-making. Not every sequential pattern is useful for customer purchase behavior analysis, and the positive sequential patterns mined earlier may turn out to mislead decisions without the decision-maker being aware of it. For example, when only positive sequential patterns are mined and <abc> is found, the decision-maker can use it directly. Once negative sequential patterns are mined as well, further negative patterns based on <abc> are obtained, and clearly not all of these patterns can be used for decision-making. If the patterns finally selected do not include <abc>, then the earlier decisions based on <abc> were misleading, and the decision-maker did not know it. Selecting the positive and negative sequential patterns that are feasible for decision-making is therefore the most urgent problem to solve after negative sequential patterns have been mined.
Many papers discuss actionable knowledge discovery in association rule mining and methods for selecting feasible or interesting patterns and rules, but no paper has yet been found on how to select feasible positive and negative sequential patterns. This may be because the problem is hard to notice before negative sequential patterns are mined, and because there are very few papers on negative sequential pattern mining, most of which focus on designing a mining algorithm and improving its efficiency.
The data source for mining is the purchase order data of website users on an e-commerce platform.
Take the transactions of 5 customers over 3 months as an example. Table 1 shows the prepared customer purchase sequence database; the letters are product IDs.
Table 1. Prepared customer purchase sequence database
All of a customer's transaction records within a given period form an ordered sequence, written with <>. The items and itemsets in a sequence are ordered. Each item represents one product in a transaction, and an element is the set of all products the customer bought at one point in time, written with {} or (). A customer may buy the same product at different times, so an item may occur in several elements of a sequence. For example, the purchase sequence of the customer with ID 20 in Table 1 is <(ad)c(bc)(ae)>; this customer bought product a on the first and fourth shopping occasions. The four itemsets (ad), c, (bc), and (ae) are the elements of the sequence, and the products a, b, c, d, and e are its items. If an element contains only one item, the brackets can be omitted, as with element c in this sequence.
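The sequence notation and the notion of support can be made concrete with a minimal Python sketch. Only the sequence of customer 20 comes from the text; the other three sequences, and the names `parse`, `contains`, and `support`, are hypothetical and chosen purely for illustration. The sketch covers positive patterns only.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]      # all products bought at one point in time
PurchaseSeq = List[Element]   # one customer's elements, ordered by time


def parse(elements: Sequence[str]) -> PurchaseSeq:
    """Turn ['ad', 'c', 'bc', 'ae'] into a list of sets of one-letter items."""
    return [frozenset(e) for e in elements]


def contains(data: PurchaseSeq, pattern: PurchaseSeq) -> bool:
    """True if every pattern element is a subset of a distinct data element, in order."""
    pos = 0
    for wanted in pattern:
        # Greedily advance to the earliest data element that covers `wanted`.
        while pos < len(data) and not wanted <= data[pos]:
            pos += 1
        if pos == len(data):
            return False
        pos += 1
    return True


def support(db: List[PurchaseSeq], pattern: PurchaseSeq) -> float:
    """Fraction of customers whose purchase sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in db) / len(db)


customer_20 = parse(["ad", "c", "bc", "ae"])   # <(ad)c(bc)(ae)> from the text
db = [
    customer_20,
    parse(["a", "b", "c"]),   # hypothetical customer
    parse(["b", "d"]),        # hypothetical customer
    parse(["ab", "e"]),       # hypothetical customer
]

print(contains(customer_20, parse(["a", "a"])))  # a bought on two occasions
print(support(db, parse(["a", "c"])))            # share of customers with <a c>
```

A pattern is frequent when its support reaches the minimum support chosen by the user.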
Summary of the invention
To address the deficiencies of the prior art, the invention provides the application of a positive and negative sequential pattern screening method to customer purchase behavior analysis. The invention proposes an efficient algorithm named SAP for selecting feasible positive and negative sequential patterns. The main idea is to mine all positive and negative sequential patterns with the e-NSP method and then screen each pattern, keeping only the feasible ones. The screened patterns are used to analyze customers' purchase behavior, so that the seller can predict future sales from current sales, arrange product placement accordingly, and increase sales volume.
The technical solution of the invention is as follows:
The application of the positive and negative sequential pattern screening method to customer purchase behavior analysis comprises the following steps:
(1) Use a correlation coefficient function to measure the relationship between products:
Each product a customer buys is a single item, all the products the customer buys at one point in time form an element, and a sequence consists of all the elements corresponding to one customer's purchases within a given period. ρ denotes the correlation coefficient between any two elements of a sequence:
If ρ > 0, the two elements are positively correlated: the more products bought in one purchase, the more are bought in the other;
If ρ = 0, the two elements are uncorrelated: the two purchases are independent and do not affect each other;
If ρ < 0, the two elements are negatively correlated: the more products bought in one purchase, the fewer are bought in the other;
ρ ranges from -1 to 1, and the smaller its absolute value, the weaker the correlation between the two elements. A threshold $\rho_{min}$ is set, and only sequences with $\rho \ge \rho_{min}$ are selected:
a sequence is selected when the correlation coefficient of every two of its elements is at least $\rho_{min}$, and is excluded otherwise;
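The text does not spell out the formula for ρ. As a hedged illustration only, the sketch below assumes the phi coefficient that is commonly used in positive and negative association rule mining; it lies in [-1, 1] and is undefined when a support equals 0 or 1. The function name `correlation` and the numbers in the usage lines are hypothetical, not taken from the patent.

```python
import math


def correlation(s_ab: float, s_a: float, s_b: float) -> float:
    """Phi coefficient of two elements A and B (an assumed definition of rho).

    s_ab is the support of the size-2 sequence <A B>; s_a and s_b are the
    supports of <A> and <B>. All three are fractions of the database.
    """
    if not (0.0 < s_a < 1.0 and 0.0 < s_b < 1.0):
        raise ValueError("supports of A and B must be strictly between 0 and 1")
    denominator = math.sqrt(s_a * (1.0 - s_a) * s_b * (1.0 - s_b))
    return (s_ab - s_a * s_b) / denominator


rho_min = 0.3                                   # hypothetical threshold
rho = correlation(s_ab=0.4, s_a=0.5, s_b=0.5)   # hypothetical supports
print(rho >= rho_min)                           # keep the pair only if True
```

Under this assumed definition, ρ is 0 exactly when the support of the pair equals the product of the individual supports, which matches the independence case described in step (1).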
(2) Select feasible sequential patterns:
Whether a sequence is feasible is decided by testing whether each of its size-2 subsequences is feasible, where a size-2 subsequence consists of two adjacent elements of the sequence. That is, if a positive or negative sequential pattern $P = \langle e_1 e_2 \dots e_k \rangle$ of size k (k > 1) is feasible, then $\langle e_1 e_2 \rangle, \langle e_2 e_3 \rangle, \dots, \langle e_{k-1} e_k \rangle$ must also be feasible. The definition is as follows:
Definition 1. Feasible sequential pattern
A positive or negative sequential pattern $P = \langle e_1 e_2 \dots e_k \rangle$ of size k (k > 1) is feasible if, for every i with 1 < i ≤ k,
\[
\text{(i)}\ \ \mathrm{asp}(e_{i-1}, e_i) = s(\langle e_{i-1} e_i \rangle) \ge ms \ \wedge\ \text{(ii)}\ \ f(e_{i-1}, e_i, ms, \rho_{min}) = 1,
\]
where
f() is a constraint function on support and correlation; ms is the user-specified minimum support threshold, used to prune infrequent sequences; s() denotes the support of a sequence, with s() ≠ 0 and s() ≠ 1;
By Definition 1, a positive or negative sequential pattern $P = \langle e_1 e_2 \dots e_k \rangle$ of size k (k > 1) is not feasible if any $\langle e_{i-1} e_i \rangle$ is not a feasible sequential pattern;
Equivalently, if $P = \langle e_1 e_2 \dots e_k \rangle$ is feasible, every $\langle e_{i-1} e_i \rangle$ must be a feasible sequential pattern;
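Definition 1 reduces the test of a size-k pattern to tests on its adjacent pairs, which can be sketched in a few lines of Python. The sketch makes two assumptions that are not stated in the text: the constraint function f is taken to equal 1 exactly when the pair's correlation reaches ρ_min, and negative elements are written with a leading "¬". The names `adjacent_pairs`, `pair_is_feasible`, and `is_feasible` and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]   # e.g. ("a", "b", "¬c"); "¬" marks a negative element
Pair = Tuple[str, str]


def adjacent_pairs(pattern: Pattern) -> List[Pair]:
    """<e1 e2 ... ek> -> [(e1, e2), (e2, e3), ..., (e_{k-1}, e_k)]."""
    return list(zip(pattern, pattern[1:]))


def pair_is_feasible(pair: Pair, s: Dict[Pair, float], rho: Dict[Pair, float],
                     ms: float, rho_min: float) -> bool:
    """Condition (i): support >= ms. Condition (ii), assumed form: rho >= rho_min."""
    if pair not in s or pair not in rho:
        return False   # the pair was never mined, so it cannot be feasible
    return s[pair] >= ms and rho[pair] >= rho_min


def is_feasible(pattern: Pattern, s: Dict[Pair, float], rho: Dict[Pair, float],
                ms: float, rho_min: float) -> bool:
    """A pattern of size k > 1 is feasible iff every adjacent pair is feasible."""
    if len(pattern) < 2:
        raise ValueError("Definition 1 applies to patterns of size k > 1")
    return all(pair_is_feasible(p, s, rho, ms, rho_min)
               for p in adjacent_pairs(pattern))


# Hypothetical pair statistics, as they might come out of a mining step.
s = {("a", "b"): 0.6, ("b", "¬c"): 0.5, ("b", "c"): 0.2}
rho = {("a", "b"): 0.7, ("b", "¬c"): 0.4, ("b", "c"): 0.5}

print(is_feasible(("a", "b", "¬c"), s, rho, ms=0.3, rho_min=0.3))
print(is_feasible(("a", "b", "c"), s, rho, ms=0.3, rho_min=0.3))
```

Because the test only looks at adjacent pairs, the pair statistics can be computed once and reused for every longer pattern.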
(3) The SAP algorithm proceeds as follows:
Step 1: mine all positive and negative sequential patterns with the e-NSP method;
Step 2: test every sequential pattern of size 2 against Definition 1; if a size-2 pattern is not feasible, delete it together with every sequential pattern that contains it;
Step 3: split every sequential pattern of size greater than 2 into size-2 sequences, each formed by two adjacent elements, and test each of them with the method of Step 2; if every size-2 sequence is feasible, the larger pattern is a feasible sequential pattern;
Step 4: collect all feasible sequential patterns obtained in Step 3, and use them to analyze the customer's purchase behavior;
(4) The pseudocode of the SAP algorithm is as follows:
Algorithm SAP
Input: D: customer purchase sequence database; ms: minimum support; ρ_min: correlation threshold;
Output: ASP: the set of sequential patterns that can be used to analyze customer purchase behavior;
(1) ASP = ∅;
(2) mine all positive sequential patterns (PSPs) and negative sequential patterns (NSPs) with the e-NSP method and store them in the set {PNSP};
(3) for (k = 2; k ≤ maximum length of the PSPs in {PNSP}; k++) {
(4) for each pattern P = <e_1 e_2 … e_k> of size k in {PNSP} {
(5) test P with Definition 1;
(6) if P is a feasible sequential pattern then
(7) add P to ASP;
(8) else
(9) delete P and every pattern containing P from {PNSP};
(10) }
(11) }
(12) return ASP; analyze the customer's purchase behavior from the returned result.
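The screening loop of the pseudocode can be sketched in Python as follows. This is not the patent's implementation: the e-NSP mining step is assumed to have run already, the pair test is passed in as a callable, and "containing P" is interpreted as contiguous containment so that the result agrees with Definition 1. The names `sap_screen` and `contains_contiguous` and the example patterns are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Pattern = Tuple[str, ...]   # "¬" prefix marks a negative element
Pair = Tuple[str, str]


def contains_contiguous(big: Pattern, small: Pattern) -> bool:
    """True if `small` occurs in `big` as a run of adjacent elements."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def sap_screen(pnsp: Iterable[Pattern],
               pair_ok: Callable[[Pair], bool]) -> Set[Pattern]:
    """Keep the patterns of size >= 2 whose adjacent pairs all pass `pair_ok`."""
    remaining = set(pnsp)
    asp: Set[Pattern] = set()
    max_size = max((len(p) for p in remaining), default=0)
    for k in range(2, max_size + 1):
        for p in sorted(q for q in remaining if len(q) == k):
            if all(pair_ok(pair) for pair in zip(p, p[1:])):
                asp.add(p)
            else:
                # Drop p and every longer pattern built on top of it.
                remaining = {q for q in remaining
                             if not contains_contiguous(q, p)}
    return asp


# Hypothetical mined patterns and a hypothetical set of feasible pairs.
mined = {("a", "b"), ("b", "c"), ("a", "¬c"),
         ("a", "b", "c"), ("a", "b", "¬c")}
feasible_pairs = {("a", "b"), ("b", "c")}

result = sap_screen(mined, lambda pair: pair in feasible_pairs)
print(sorted(result))
```

Processing sizes in increasing order means an infeasible size-2 pattern removes its larger extensions before they are tested, which is where the pruning saves work.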
The advantages of the invention are:
When applied to customer purchase behavior analysis, the invention uses the minimum support to identify the products that customers purchased frequently within a given period, and uses the correlation coefficient to identify the products that are strongly correlated with those purchases in that period. When a customer buys a product, the invention can therefore recommend products that other customers also buy and that are strongly correlated with it, which increases transaction opportunities, converts website visitors into buyers, improves cross-selling, strengthens customer loyalty, and improves the website's economic returns.
Specific embodiments
The invention is described in detail below with reference to an embodiment, but is not limited to it.
Embodiment 1
The application of the positive and negative sequential pattern screening method to customer purchase behavior analysis comprises steps (1) to (4) exactly as set out in the technical solution above.
In this customer purchase behavior analysis, sequential pattern analysis focuses on the order of, or causal relationships between, data items: in a time-ordered set of transactions, it finds the patterns in which some items follow other items. For example, a customer who bought a Pentium PC 9 months ago is likely to order a new CPU chip within a month. Likewise, a customer who bought a PC may go on to buy memory chips and then a CD-ROM.
Mining purchase records for products that many customers buy within a period and that are strongly correlated helps e-commerce operators predict customer behavior and provide personalized service. Knowing which products are bought after which others, the operator can recommend products that other customers also buy and that are strongly correlated with the current product, and place those products in the most prominent positions. For example, when a customer buys a personal computer online, the system can use previously mined sequential patterns to suggest related purchases, such as "people who buy this personal computer are likely to buy a particular printer or CD-ROM within three months", and can give the customer a short-term coupon to promote sales. Negative items in a negative sequential pattern, that is, products the customer does not buy, need not be recommended. For example, given the sequential pattern <smartphone, ¬camera, memory card>, when a customer buys a smartphone the system recommends a memory card but not a camera, because people who buy a smartphone are likely to buy a memory card, and not a camera, within three months.
Analyzing customer purchase behavior and discovering the regularities between transactions makes it possible both to predict future sales from current sales and to arrange product placement better, thereby increasing sales volume.
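The smartphone example can be turned into a small recommendation sketch. It assumes that the screened patterns are available as tuples of single-product elements, with negative elements marked by a leading "¬"; the function name `recommend` and this encoding are illustrative assumptions rather than part of the patent.

```python
from typing import List, Tuple

NEG = "¬"   # marks a product that customers tend NOT to buy
Pattern = Tuple[str, ...]


def recommend(purchased: str, patterns: List[Pattern]) -> List[str]:
    """Products that follow `purchased` in some pattern, skipping negative ones."""
    suggestions: List[str] = []
    for pattern in patterns:
        if purchased not in pattern:
            continue
        start = pattern.index(purchased) + 1
        for element in pattern[start:]:
            if element.startswith(NEG):
                continue   # negative item: do not recommend it
            if element not in suggestions:
                suggestions.append(element)
    return suggestions


patterns = [("smartphone", "¬camera", "memory card")]
print(recommend("smartphone", patterns))
```

A production recommender would also have to rank suggestions, for instance by the support or correlation of the underlying pattern; that ranking is left out here.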
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510025586.1A CN104732419B (en) | 2015-01-19 | 2015-01-19 | Application of the positive and negative sequence pattern screening technique in customers buying behavior analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510025586.1A CN104732419B (en) | 2015-01-19 | 2015-01-19 | Application of the positive and negative sequence pattern screening technique in customers buying behavior analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104732419A (en) | 2015-06-24 |
CN104732419B (en) | 2018-04-27 |
Family
ID=53456290
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510025586.1A Active CN104732419B (en) | 2015-01-19 | 2015-01-19 | Application of the positive and negative sequence pattern screening technique in customers buying behavior analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104732419B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001282985A (en) * | 2000-03-31 | 2001-10-12 | Hitachi Ltd | Sales information analysis method |
CN101206751A (en) * | 2007-12-25 | 2008-06-25 | 北京科文书业信息技术有限公司 | Customer recommendation system based on data digging and method thereof |
CN101493925A (en) * | 2009-03-09 | 2009-07-29 | 浙江工商大学 | Retail industry dime ticket generating method by employing increment type excavation |
US20100131602A1 (en) * | 2008-11-21 | 2010-05-27 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Correlating data indicating at least one subjective user state with data indicating at least one objective occurrence associated with a user |
CN102629360A (en) * | 2012-03-13 | 2012-08-08 | 浙江大学 | Effective dynamic commodity recommendation method and commodity recommendation system |
2015-01-19: CN application CN201510025586.1A, patent CN104732419B (en), status Active
Non-Patent Citations (1)
Title |
---|
缪裕青 等: "一种基于序列末项位置信息的序列模式挖掘算法", 《计算机应用研究》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105354728A (en) * | 2015-12-11 | 2016-02-24 | 北京京东尚科信息技术有限公司 | Discount coupon pushing method and device |
CN106384253A (en) * | 2016-09-30 | 2017-02-08 | 中国银联股份有限公司 | Consumption behavior analysis method in bankcard transaction and consumption behavior analysis device thereof |
CN106910132A (en) * | 2017-01-11 | 2017-06-30 | 齐鲁工业大学 | Top k can decision-making application of the negative sequence pattern in client insures behavioural analysis |
CN107515942A (en) * | 2017-08-31 | 2017-12-26 | 齐鲁工业大学 | Purchasing behavior analysis method for mining decision-making negative sequence patterns in infrequent sequences |
CN107563857A (en) * | 2017-08-31 | 2018-01-09 | 齐鲁工业大学 | The customers buying behavior analysis method of logic-based reasoning negative customers rule trimming technology |
CN107563857B (en) * | 2017-08-31 | 2020-10-09 | 齐鲁工业大学 | Analysis method of customer purchasing behavior based on logical reasoning negative association rule pruning technology |
CN109146542A (en) * | 2018-07-10 | 2019-01-04 | 齐鲁工业大学 | A method of excavating positive and negative sequence rules |
CN110880136A (en) * | 2018-09-06 | 2020-03-13 | 北京京东尚科信息技术有限公司 | Recommendation method, system, equipment and storage medium for matched product |
CN110111184A (en) * | 2019-05-08 | 2019-08-09 | 齐鲁工业大学 | A kind of negative sequence recommended method and system based on weighting Bayesian inference |
WO2020258483A1 (en) * | 2019-06-27 | 2020-12-30 | 齐鲁工业大学 | Clinical medication behavior analysis system based on highly effective negative sequential mining pattern, and working method therefor |
Also Published As
Publication number | Publication date |
---|---|
CN104732419B (en) | 2018-04-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20230224. Patentee after: Shandong Yuanjing Information Technology Co.,Ltd. (address: Room 1799, 17/F, No. A7-4, Hanyu Financial and Business Center, No. 7000 Jingshi Road, Jinan Area, China (Shandong) Free Trade Pilot Zone, Jinan, Shandong Province, 250000). Patentee before: Qilu University of Technology (address: No. 3501, Daxue Road, University Science Park, Changqing West New Town, Jinan, Shandong 250353). |