CN117010986A

CN117010986A - Trial-package matching method based on user behavior data

Info

Publication number: CN117010986A
Application number: CN202310858399.6A
Authority: CN
Inventors: 李文奇; 刘峰
Original assignee: Beijing Xiaoxiang Technology Co ltd
Current assignee: Beijing Xiaoxiang Technology Co ltd
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2023-11-07

Abstract

The invention relates to a trial-package matching method based on user behavior data, belonging to the field of personalized recommendation algorithms. The method comprises the following steps: S1, constructing a product parameter matrix; S2, generating a user scoring matrix; S3, hierarchical clustering of users; and S4, optimizing and solving a mixed-integer program. The advantages of the invention are that the method considers a large number of platform users at the same time and combines a data mining algorithm with a mathematical programming algorithm to maximize the overall matching degree across all users, recommending a limited stock of trial-package products to users so as to improve the promotional effect and reduce the waste of trial packages.

Description

Trial-package matching method based on user behavior data

Technical Field

The invention relates to a trial-package matching method based on user behavior data and belongs to the field of personalized recommendation algorithms. In particular, it relates to an optimal trial-package matching scheme for a large number of users and a limited trial-package inventory, obtained by applying data mining algorithms and a mixed-integer programming optimization algorithm to user behavior data such as searching, browsing and purchasing.

Background

Against the background of the vigorous development of e-commerce and the Internet industry, a wide variety of new products are coming into consumers' view. Although many products achieve a certain promotional effect through traditional advertising or current social media, consumers are increasingly hard to persuade amid the large volume of Internet information. Many platforms therefore add trial packages of promoted products to users' orders, in the hope that users will go on to purchase the products after trying them directly, especially products such as food, beverages and cosmetics.

At present, many trial packages are distributed simply according to the promotional requirements of brands, or by simple keyword matching between the products a user has purchased and the trial-package database, without considering the users' actual needs and preferences in depth at the distribution stage. Moreover, unlike advertisements, trial-package products are limited in quantity. For a single brand, whose product types are limited, distributing trial packages is relatively easy. Large comprehensive platforms, especially directly operated platforms, instead want to use trial packages for cross-brand and cross-category recommendation. To raise platform sales, it is therefore necessary to form a trial-package matching scheme covering the large user base of the whole platform and to achieve the best trial-package promotional effect.

When selling products to users, a platform promotes products by adding trial packages to user orders. Which trial packages are matched to a user usually depends on the platform's current promotional strategy. For example, the platform may directly select trial packages that are in stock or trial packages of new products, or may match similar products by keyword according to what the user is currently purchasing and put them into the user's parcel. Such placement is blind and does not consider the user's preferences. In other words, trial packages are matched to unsuitable users, which essentially wastes additional cost. A user's activity on the platform, from searching and browsing through to ordering, generates a large amount of behavior data that reflects the user's preferences, so these preference results can be fully incorporated into the issuing of trial packages.

Another problem in distributing trial packages is that the number of packages that can be issued is often limited by the realities of trial-package production, the attributes of the packages themselves, and actual operating costs. Because the user base is very large, even if a general recommendation algorithm is used to produce recommendations for users, these quantity limits may make the recommendation scheme impossible to execute.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a trial-package matching method based on user behavior data, which comprises the following steps:

S1, constructing a product parameter matrix, specifically: constructing a parameter sequence suited to a given class of products; marking each product in that class against the parameter sequence, that is, marking whether the product conforms to each parameter in the sequence, which yields a parameter marking vector for each product; splicing the marking vectors of all products in the whole library together as row vectors to form the product parameter matrix of the whole library; and then extracting the corresponding product parameter matrix for whichever products need to be analyzed or recommended;

S2, generating a user scoring matrix, specifically: for a given class of products from step S1, generating the preference scores of the user group for the parameter sequence formed in step S1; analyzing the users' search keyword data, purchase keyword data and evaluation keyword data, and summing the analysis results for the related keywords, either directly or with weights, to form each user's scoring vector over the product parameters; and splicing these vectors together as row vectors to form the overall user scoring matrix;

S3, hierarchical clustering of users, specifically: after the user scoring matrix is obtained, performing unsupervised cluster analysis on all users on the basis of their scores, and subsequently recommending and combining trial packages for each clustered user class; before the user scoring matrix is clustered, first dividing the users into several layers according to the trial-package issuing strategy, and then performing unsupervised clustering within each layer using an unsupervised clustering algorithm, so that users with similar preferences are merged directly; after clustering is completed, splitting the user scoring matrix according to the clustering result, calculating the average parameter score of each class, and recording the number of users in each cluster;

S4, optimizing and solving the mixed-integer program, specifically: the clustered average user scoring matrix, the vector of cluster sizes, the quantity matrix of products matched to each class of users, and the product parameter matrix are multiplied to obtain a vector that represents the total matching preference score of all users; the computation is carried out by a mixed-integer programming algorithm to obtain the optimal scheme.

In step S1, each product in the class is marked against the parameter sequence, that is, whether the product conforms to each parameter in the sequence is marked, usually with 0 or 1: a product that conforms to the description of a parameter is marked 1, and one that does not is marked 0. The parameter sequence contains $T$ parameters in total, and a corresponding marking row vector $f_p$ is generated for each product $p$. All row vectors are spliced together to form the parameter matrix of the $P^*$ products in the whole library:

$$F_{P^* \times T} = \begin{pmatrix} f_1 \\ \vdots \\ f_{P^*} \end{pmatrix}$$

Likewise, for the $P$ trial-package products that need to be combined into trial packages, a corresponding parameter matrix is generated:

$$F_{P \times T} = \begin{pmatrix} f_1 \\ \vdots \\ f_P \end{pmatrix}$$

In step S2, in order to eliminate the bias that users' differing activity levels introduce into the calculation, the scoring vector needs to be normalized, that is, each element of the vector is converted into a decimal between 0 and 1. Different behaviors influence the calculated user preference to different degrees: evaluation data is the most important, followed by purchasing, browsing and searching in that order, so weights need to be introduced in the final summation;

by analyzing the behavior data of a given user $n$, it is found that the user has searched for $a$ products, browsed $b$ products, purchased $c$ products, and evaluated $d$ products and $e$ products in two separate groups of evaluations; meanwhile, the user's total search count, total browse count, total purchase count and total evaluation count on the whole platform are $A$, $B$, $C$ and $D$ respectively; after the parameter vectors of all products involved in these behaviors are extracted, the following summary calculation is performed:

where the coefficients 1, 2, 3, 4 and -4 in front of the terms are weight coefficients, introduced because different behaviors influence the preference calculation differently;

when trial packages need to be allocated to $N$ users, a preference vector $u_n$ of length $T$ is generated for each user by the above calculation, and these vectors are spliced together as row vectors to form the scoring matrix $U$ of all users:

In step S3, the average parameter score of each class of users is calculated. For the k-th class of users, the user scoring matrix is the sub-matrix of $U$ formed by the rows of the users in that class.

The scores in each column of this matrix are averaged, giving the mean score vector of the class.

After the mean vector of each class has been calculated, the vectors are spliced together as row vectors to obtain the scoring mean matrix of all $K$ classes.

The indexes of the clustering are recorded at the same time. They include the vector of user counts per class, of length $K$:

$M = (m_1, \ldots, m_K)$;

the upper and lower value-limit vectors of the trial-package combination required by each class of users, each of length $K$;

and the marking matrix of products that must be issued, determined by the characteristics of each class of users:

each element $g_{pk}$ takes the value 0 or 1, where 1 indicates that the trial-package combination for the k-th class of users must contain product $p$.

In the step S4, specifically:

The clustered mean user scoring matrix, the cluster-size vector $M$, the quantity matrix $X_{P \times K}$ of products matched to each class of users, and the product parameter matrix $F_{P \times T}$ are multiplied to obtain a vector that represents the total matching preference score of all users. The final purpose of generating a matching scheme is to find the quantity matrix of matched products that maximizes this score. The computation is carried out by a mixed-integer programming algorithm to obtain the optimal scheme. The specific calculation steps and methods are as follows:

(1) Setting the unknowns: an unknown matrix is set up according to the trial-package scheme for each class of users that is to be output in the end:

each element $x_{pk}$ of this matrix is the number of units of product $p$ in the trial-package combination of the k-th class of users, so the whole matrix $X_{P \times K}$ records the trial-package combination result for each of the $K$ classes of users;

(2) Determining an optimization objective function:

the objective of the integer program is to maximize the total product-preference score of all users after a trial-package combination has been planned for each class of users, so the objective function of the optimization problem is as follows:

where the last term, $L$, is a vector;

(3) Setting the constraints of the algorithm: when trial-package combinations are matched to the classes of users, limits on inventory and issue quantity must be satisfied, so constraint functions are set in the integer program. The specific constraints and their corresponding constraint functions are as follows:

(1) the number of each product in a trial package cannot be negative:

(2) trial-package value limits:

where $v_p$ is the market value of the trial package of product $p$;

(3) trial packages that must be issued:

(4) products that cannot be combined, where $B_{p_1}$ is the set of products that cannot appear in the same package as product $p_1$; this constraint prevents trial-package products that are too similar or contradictory from appearing in the packages of the same class;

(5) minimum trial-package issue, where $R_p$ is the minimum requirement on the total number of trial packages of product $p$ issued, which ensures that every product is promoted through trial-package distribution;

(6) maximum trial-package inventory, where $Q_p$ is the stock of trial packages of product $p$; the quantity issued must not exceed the existing stock;

(7) limit on the total number of trial packages, where $S_k$ is the maximum total number of trial-package products allowed in the trial-package combination of the k-th class of users;

(8) limit on the number of a single trial package, where $S_p$ is the limit on the number of trial packages of product $p$ issued to a single user;

With the unknowns, the objective function and the constraint functions set up, the integer programming problem is fully constructed. It is then solved by a mathematical optimizer, which outputs the optimal solution $X_{P \times K}$; each column of the matrix is the trial-package combination result for one class of users.
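The formulas for the objective and the constraints are not reproduced in this text. As a hedged reading of the verbal definitions above, the program could be written as below. Several things here are assumptions made for illustration: the symbols $\bar{u}_k$ (mean score vector of class $k$), $V_k^{\min}$ and $V_k^{\max}$ (value limits), the scalar form of the objective, and the use of the class sizes $m_k$ when totalling issued quantities. The role of the vector $L$ cannot be recovered from the text and is left out.

```latex
\begin{aligned}
\max_{X}\quad & \sum_{k=1}^{K} m_k \sum_{p=1}^{P} x_{pk}\,\langle \bar{u}_k, f_p \rangle \\
\text{s.t.}\quad
 & x_{pk} \in \mathbb{Z},\quad x_{pk} \ge 0 && (1)\\
 & V_k^{\min} \le \sum_{p=1}^{P} v_p\, x_{pk} \le V_k^{\max} && (2)\\
 & x_{pk} \ge g_{pk} && (3)\\
 & x_{p_2 k} = 0 \ \text{when}\ x_{p_1 k} > 0,\quad p_2 \in B_{p_1} && (4)\\
 & \sum_{k=1}^{K} m_k\, x_{pk} \ge R_p && (5)\\
 & \sum_{k=1}^{K} m_k\, x_{pk} \le Q_p && (6)\\
 & \sum_{p=1}^{P} x_{pk} \le S_k && (7)\\
 & x_{pk} \le S_p && (8)
\end{aligned}
```

Constraint (4) is a logical condition; a mixed-integer solver would typically need it linearized with auxiliary binary variables, which is not shown here.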

The invention has the advantages that:

The method considers a large number of platform users at the same time and combines a data mining algorithm with a mathematical programming algorithm to maximize the overall matching degree across all users, recommending a limited stock of trial-package products to users so as to improve the promotional effect and reduce the waste of trial packages.

Drawings

Fig. 1 is a schematic flow chart of the present invention.

Detailed Description

The invention will be further described with reference to specific embodiments, and advantages and features of the invention will become apparent from the description. These examples are merely exemplary and do not limit the scope of the invention in any way. It will be understood by those skilled in the art that various changes and substitutions of details and forms of the technical solution of the present invention may be made without departing from the spirit and scope of the present invention, but these changes and substitutions fall within the scope of the present invention.

Referring to Fig. 1, the invention relates to a trial-package matching method based on user behavior data, which comprises the following steps:

S1, constructing a product parameter matrix, specifically: constructing a parameter sequence suited to a given class of products. The parameters are usually keywords that describe, for example, product efficacy, color and specification. They can be obtained by extracting keywords from the classification and description of the products, or by collecting product information from manufacturers or suppliers and analyzing it to summarize finer-grained parameters, so that a parameter sequence that specifically describes the class of products is finally formed. Each product in the class is then marked against the parameter sequence, that is, whether the product conforms to each parameter in the sequence is marked, usually with 0 or 1. Once marking is complete, each product has a parameter marking vector. The marking vectors of all products in the whole library are spliced together as row vectors to form the product parameter matrix of the whole library, and the corresponding product parameter matrix is then extracted for whichever products need to be analyzed or recommended;

Taking cosmetic products as an example, the parameters or keywords that the sequence may include are shown in Table 1.

TABLE 1

Product function, texture, color and makeup-effect data need to be collected and summarized from the various suppliers, while the brand classification and the price interval index in the basic-parameter part are derived by processing the parameters provided by the suppliers. Brands are classified by analyzing their performance along several dimensions; the specific classification method is shown in Table 2.

TABLE 2

The price interval index is obtained by taking all prices of a given class of products, computing the 25%, 50% and 75% quantile prices, and comparing the price of each product with these quantile points to mark its price grade. Using this index rather than the price itself avoids unfairness in later distribution caused by the different price bases of different product categories.
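The quantile comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `price_grades`, the four grade labels 1 to 4, the inclusive quantile method and the handling of prices that fall exactly on a quantile point are all assumptions.

```python
from statistics import quantiles


def price_grades(prices):
    """Grade each price 1-4 by comparing it with the 25%, 50% and 75% quantile prices."""
    # quantiles() sorts a copy of the data; n=4 returns the three quartile cut points.
    q25, q50, q75 = quantiles(prices, n=4, method="inclusive")
    grades = []
    for price in prices:
        if price <= q25:
            grades.append(1)
        elif price <= q50:
            grades.append(2)
        elif price <= q75:
            grades.append(3)
        else:
            grades.append(4)
    return grades


# Two hypothetical categories with very different price bases get the same grades.
lipstick_grades = price_grades([10, 20, 30, 40, 50])
serum_grades = price_grades([100, 200, 300, 400, 500])
```

Because the grade depends only on a product's position within its own category, a mid-priced item in an expensive category and a mid-priced item in a cheap category receive the same grade.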

Note that the parameter sequence as a whole should cover all parameters of the various products on the market, so that every product can be marked against the sequence to form its own parameter profile.

Whether a product conforms to each parameter in the sequence is usually marked with 0 or 1: a product that conforms to the description of a parameter is marked 1, and one that does not is marked 0. The parameter sequence contains $T$ parameters in total, and a corresponding marking row vector $f_p$ is generated for each product $p$. All row vectors are spliced together to form the parameter matrix of the $P^*$ products in the whole library:

$$F_{P^* \times T} = \begin{pmatrix} f_1 \\ \vdots \\ f_{P^*} \end{pmatrix}$$
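The 0/1 marking and row stacking can be illustrated with a short Python sketch. The parameter names, product identifiers and the function `build_parameter_matrix` are hypothetical and chosen only for the example.

```python
def build_parameter_matrix(parameter_sequence, product_tags):
    """Return (product_ids, matrix) where matrix[i][t] is 1 if product i matches parameter t."""
    product_ids = sorted(product_tags)  # fixed row order
    matrix = [
        [1 if param in product_tags[pid] else 0 for param in parameter_sequence]
        for pid in product_ids
    ]
    return product_ids, matrix


# Hypothetical parameter sequence (T = 4) and two products described by keyword sets.
PARAMETERS = ["moisturizing", "matte", "red", "liquid"]
PRODUCTS = {
    "lipstick_01": {"matte", "red"},
    "serum_02": {"moisturizing", "liquid"},
}

ids, F = build_parameter_matrix(PARAMETERS, PRODUCTS)
```

Each row of `F` is the marking vector of one product, so extracting the matrix for a subset of products amounts to selecting the corresponding rows.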

S2, generating a user scoring matrix, specifically: for a given class of products from step S1, generating the preference scores of the user group for the parameter sequence formed in step S1; analyzing the users' search keyword data, purchase keyword data and evaluation keyword data, and summing the analysis results for the related keywords, either directly or with weights (a weight is usually an integer between 1 and 10, determined by the importance of the behavior), to form each user's scoring vector over the product parameters; and splicing these vectors together as row vectors to form the overall user scoring matrix;

When the keyword data is analyzed, the user's search, browse, purchase and evaluation records on the platform over the past month are first extracted. Each record corresponds to a specific product, and every product in the library already has a parameter vector generated in the first step, so these vectors are added together to obtain an overall scoring vector. In order to eliminate the bias that users' differing activity levels introduce into the calculation, the scoring vector needs to be normalized, that is, each element of the vector is converted into a decimal between 0 and 1. Because different behaviors influence the calculated user preference to different degrees, with evaluation data being the most important, followed by purchasing, browsing and searching in that order, weights need to be introduced in the final summation.

By analyzing the behavior data of a given user $n$, it is found that the user has searched for $a$ products, browsed $b$ products, purchased $c$ products, and evaluated $d$ products and $e$ products in two separate groups of evaluations; meanwhile, the user's total search count, total browse count, total purchase count and total evaluation count on the whole platform are $A$, $B$, $C$ and $D$ respectively.

After the parameter vectors of all products involved in these behaviors are extracted, the following summary calculation is performed:

where the coefficients 1, 2, 3, 4 and -4 in front of the terms are weight coefficients, introduced because different behaviors influence the preference calculation differently;
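The summary formula itself is not reproduced in this text, so the following Python sketch shows only one plausible reading of it. It assumes that the parameter vectors of the products touched by each behavior are summed, divided by the user's platform-wide total for that behavior (A, B, C or D), and then combined with the weights 1, 2, 3, 4 and -4. Treating the two evaluation groups as favorable and unfavorable reviews is also an assumption, as are all names in the code.

```python
# Assumed weights per behavior; the +4 / -4 pair is read here as favorable / unfavorable reviews.
WEIGHTS = {
    "search": 1,
    "browse": 2,
    "purchase": 3,
    "positive_review": 4,
    "negative_review": -4,
}


def user_score_vector(behaviors, totals, product_vectors, n_params):
    """Weighted, activity-normalized sum of the parameter vectors a user interacted with."""
    score = [0.0] * n_params
    for behavior, product_ids in behaviors.items():
        total = totals[behavior]
        if total == 0:
            continue  # no activity of this kind, nothing to add
        for pid in product_ids:
            for t, flag in enumerate(product_vectors[pid]):
                score[t] += WEIGHTS[behavior] * flag / total
    return score


# Hypothetical data: two products described by T = 2 parameters.
product_vectors = {"p1": [1, 0], "p2": [0, 1]}
behaviors = {
    "search": ["p1", "p2"],
    "browse": ["p1"],
    "purchase": ["p1"],
    "positive_review": ["p1"],
    "negative_review": ["p2"],
}
# Platform-wide totals A, B, C and D (both review groups share the evaluation total D).
totals = {"search": 4, "browse": 2, "purchase": 1, "positive_review": 2, "negative_review": 2}

u_n = user_score_vector(behaviors, totals, product_vectors, n_params=2)
```

Under this reading a parameter that the user mostly meets through unfavorable reviews ends up with a negative score, which pushes products carrying that parameter down in the later optimization.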

S3, hierarchical clustering of users, specifically: after the user scoring matrix is obtained, performing unsupervised cluster analysis on all users on the basis of their scores, and subsequently recommending and combining trial packages for each clustered user class; before the user scoring matrix is clustered, first dividing the users into several layers according to the trial-package issuing strategy, and then performing unsupervised clustering within each layer using an unsupervised clustering algorithm, so that users with similar preferences are merged directly; after clustering is completed, splitting the user scoring matrix according to the clustering result, calculating the average parameter score of each class, and recording the number of users in each cluster;

In step S3, after the user scoring matrix is obtained, users with similar preferences can be grouped into a single class for trial-package distribution in order to reduce the computational cost and the cost of assembling trial packages. In practice, unsupervised cluster analysis can be performed on all users on the basis of their scores, and trial packages are subsequently recommended and combined for each clustered user class.

Before the user scoring matrix is clustered, the users can be divided into several layers according to the trial-package issuing strategy. For example, different order amounts are matched with trial packages of different value, and users who purchase specific products need to be matched with specific trial packages. Users can be divided into different layers according to such conditions, and unsupervised clustering is then performed within each layer so that users with similar preferences are merged directly. Any suitable unsupervised clustering algorithm, such as K-means or DBSCAN, can be used to cluster the users of each layer.

Continuing the above example, orders of different amounts require matched trial packages of different value. For example, orders of 0-100 yuan are matched with trial-package combinations worth up to 20 yuan, the next two order tiers with combinations worth 20-50 yuan and 50-100 yuan respectively, and orders of 500-1000 yuan with combinations worth 100-150 yuan. Users are therefore first divided into 4 classes by order amount, the user scoring matrices corresponding to the 4 classes are extracted, and unsupervised clustering is performed on each class of users. The value of K can be determined comprehensively from the actual matching cost, the number of trial-package combinations that can be assembled, and the number of users in each class, for example by fixing in advance the total number of classes into which each layer is to be clustered.
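The layer-then-cluster procedure can be sketched as follows. This is a minimal pure-Python illustration using a plain K-means (Lloyd's algorithm) with a deterministic start; the layer names, user identifiers and the choice of K per layer are hypothetical, and a production system would more likely use a library implementation with better initialization.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points serve as initial centers (assumes k <= len(points))."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_layer(user_scores, user_layer, k_per_layer):
    """Split users into layers first, then cluster each layer separately."""
    result = {}
    for layer in sorted(set(user_layer.values())):
        users = sorted(u for u, lay in user_layer.items() if lay == layer)
        labels = kmeans([user_scores[u] for u in users], k_per_layer[layer])
        for user, label in zip(users, labels):
            result[user] = (layer, label)
    return result


# Hypothetical scoring vectors (T = 2) and order-amount layers.
user_scores = {
    "u1": [1.0, 0.0], "u2": [0.9, 0.1],
    "u3": [0.0, 1.0], "u4": [0.1, 0.9],
    "u5": [0.5, 0.5],
}
user_layer = {"u1": "tier_1", "u2": "tier_1", "u3": "tier_1", "u4": "tier_1", "u5": "tier_2"}
assignments = cluster_by_layer(user_scores, user_layer, {"tier_1": 2, "tier_2": 1})
```

Clustering inside each layer keeps users with different package-value requirements apart even when their preference vectors are similar.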

To continue with the subsequent calculation, the relevant parameters of each class of users need to be extracted. The first is the average parameter score of the class: for the k-th class of users, the user scoring matrix is the sub-matrix of $U$ formed by the rows of the users in that class.

The scores in each column of this matrix are averaged, and the number of users in each class is recorded as

$M = (m_1, \ldots, m_K)$;
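The class-level quantities can be computed directly from the scoring matrix and the cluster labels. The sketch below is illustrative only; the function name and the sample data are hypothetical.

```python
def class_means_and_sizes(score_matrix, labels, n_classes):
    """Column-wise mean score per class, plus the number of users in each class."""
    n_params = len(score_matrix[0])
    sums = [[0.0] * n_params for _ in range(n_classes)]
    sizes = [0] * n_classes
    for row, label in zip(score_matrix, labels):
        sizes[label] += 1
        for t, value in enumerate(row):
            sums[label][t] += value
    # An empty class keeps an all-zero mean vector instead of dividing by zero.
    means = [
        [s / sizes[k] if sizes[k] else 0.0 for s in sums[k]]
        for k in range(n_classes)
    ]
    return means, sizes


# Three users, T = 2 parameters, K = 2 classes.
U = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mean_matrix, M = class_means_and_sizes(U, labels=[0, 0, 1], n_classes=2)
```

Here `mean_matrix` plays the role of the K-row scoring mean matrix and `M` the role of the user-count vector.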

In the step S4, specifically:

The clustered mean user scoring matrix, the cluster-size vector $M$, the quantity matrix $X_{P \times K}$ of products matched to each class of users, and the product parameter matrix $F_{P \times T}$ are multiplied to obtain a vector that represents the total matching preference score of all users. The final purpose of generating a matching scheme is to find the quantity matrix of matched products that maximizes this score. The computation is carried out by a mixed-integer programming algorithm to obtain the optimal scheme. The specific calculation steps and methods are as follows:

each element $x_{pk}$ of this matrix is the number of units of product $p$ in the trial-package combination of the k-th class of users, so the whole matrix $X_{P \times K}$ records the trial-package combination result for each of the $K$ classes of users;

(2) Determining an optimization objective function:

the last L is a vector;

(1) the number of each product in a trial package cannot be negative:

(2) trial-package value limits:

(3) trial packages that must be issued:

(4) products that cannot be combined, where $B_{p_1}$ is the set of products that cannot appear in the same package as product $p_1$; this constraint prevents trial-package products that are too similar or contradictory from appearing in the same package;

With the unknowns, the objective function and the constraint functions set up, the integer programming problem is fully constructed. It is then solved by a mathematical optimizer, which outputs the optimal solution $X_{P \times K}$; each column of the matrix is the trial-package combination result for one class of users.
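To make the structure of the problem concrete, the sketch below solves a tiny instance by exhaustive enumeration instead of a mixed-integer solver, which is only feasible at toy scale. The exact formulas are not reproduced in this text, so the objective (class size times quantity times the dot product of mean score and product parameters) and the use of class sizes when totalling issued quantities are assumptions, as are all function names, argument names and data.

```python
from itertools import product


def preference_table(mean_scores, param_matrix):
    """pref[k][p]: dot product of class k's mean score vector and product p's parameter vector."""
    return [[sum(u * f for u, f in zip(row, fp)) for fp in param_matrix] for row in mean_scores]


def solve_by_enumeration(pref, sizes, value, v_min, v_max, must, conflicts,
                         min_issue, stock, max_items, per_user_cap):
    """Return (best_score, x) where x[p][k] is the number of units of product p for class k."""
    n_products, n_classes = len(value), len(sizes)
    # Constraints (1), (3) and (8) are built into the ranges: g_pk <= x_pk <= S_p.
    ranges = [range(must[p][k], per_user_cap[p] + 1)
              for p in range(n_products) for k in range(n_classes)]
    best_score, best_x = None, None
    for flat in product(*ranges):
        x = [flat[p * n_classes:(p + 1) * n_classes] for p in range(n_products)]
        feasible = True
        for k in range(n_classes):
            package_value = sum(value[p] * x[p][k] for p in range(n_products))   # (2)
            package_items = sum(x[p][k] for p in range(n_products))              # (7)
            clash = any(x[p][k] > 0 and x[q][k] > 0
                        for p, qs in conflicts.items() for q in qs)              # (4)
            if clash or package_items > max_items[k] or not v_min[k] <= package_value <= v_max[k]:
                feasible = False
                break
        if feasible:
            for p in range(n_products):
                issued = sum(sizes[k] * x[p][k] for k in range(n_classes))       # (5), (6)
                if not min_issue[p] <= issued <= stock[p]:
                    feasible = False
                    break
        if not feasible:
            continue
        score = sum(sizes[k] * x[p][k] * pref[k][p]
                    for p in range(n_products) for k in range(n_classes))
        if best_score is None or score > best_score:
            best_score, best_x = score, x
    return best_score, best_x


# Toy instance: 3 products, 2 user classes, T = 3 parameters (one per product).
pref = preference_table([[3, 1, 2], [1, 3, 2]], [[1, 0, 0], [0, 1, 0], [0, 0, 1]])
best_score, best_x = solve_by_enumeration(
    pref, sizes=[10, 5], value=[10, 10, 5], v_min=[0, 0], v_max=[15, 15],
    must=[[0, 0], [0, 0], [0, 0]], conflicts={0: {1}},
    min_issue=[0, 0, 0], stock=[10, 10, 100], max_items=[2, 2], per_user_cap=[1, 1, 2],
)
```

In this instance the stock of product 0 covers only the first class, so the second class receives product 1 instead. For realistic sizes the same model would be handed to a mixed-integer programming solver rather than enumerated.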

The invention also relates to a system for implementing the method, comprising: an input database for receiving the user behavior data and product data collected at the front end; a feature extractor for extracting the parameter features of products and user preferences; a scorer for quantifying the parameters into parameter matrices according to the given method; and a mathematical programming optimizer for optimizing and solving the mathematical programming problem constructed according to the trial-package matching method based on user behavior data. The trial-package allocation result for each user is finally output and returned to the front-end application, where it is presented to the user and the actual delivery is executed.

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the scope of protection of the present invention.

Claims

1. A trial-package matching method based on user behavior data, characterized by comprising the following steps:

2. The trial-package matching method based on user behavior data according to claim 1, wherein in step S1, each product in the class is marked against the parameter sequence, that is, whether the product conforms to each parameter in the sequence is marked, usually with 0 or 1: a product that conforms to the description of a parameter is marked 1, and one that does not is marked 0; the parameter sequence contains $T$ parameters in total, and a corresponding marking row vector $f_p$ is generated for each product $p$; all row vectors are spliced together to form the parameter matrix of the $P^*$ products in the whole library:

3. The trial-package matching method based on user behavior data according to claim 1 or 2, wherein in step S2, in order to eliminate the bias that users' differing activity levels introduce into the calculation, the scoring vector needs to be normalized, that is, each element of the vector is converted into a decimal between 0 and 1; different behaviors influence the calculated user preference to different degrees, with evaluation data being the most important, followed by purchasing, browsing and searching in that order, so weights need to be introduced in the final summation;

when trial packages need to be allocated to $N$ users, a preference vector $u_n$ of length $T$ is generated for each user by the above calculation, and these vectors are spliced together as row vectors to form the scoring matrix $U$ of all users:

4. The trial-package matching method based on user behavior data according to claim 3, wherein in step S3, the average parameter score of each class of users is calculated, that is, for the k-th class of users, the user scoring matrix is:

the scores in each column of this matrix are averaged, and the number of users in each class is recorded as

$M = (m_1, \ldots, m_K)$;

5. The trial-package matching method based on user behavior data according to claim 3, wherein in step S4, specifically:

the clustered mean user scoring matrix, the cluster-size vector $M$, the quantity matrix $X_{P \times K}$ of products matched to each class of users, and the product parameter matrix $F_{P \times T}$ are multiplied to obtain a vector that represents the total matching preference score of all users; the final purpose of generating a matching scheme is to find the quantity matrix of matched products that maximizes this score; the computation is carried out by a mixed-integer programming algorithm to obtain the optimal scheme; the specific calculation steps and methods are as follows:

(2) Determining an optimization objective function:

the last L is a vector;

(1) the number of each product in a trial package cannot be negative:

(2) trial-package value limits:

(3) trial packages that must be issued: