CN111695005A

CN111695005A - Application method of internet user access track behavior big data analysis algorithm

Info

Publication number: CN111695005A
Application number: CN202010488643.0A
Authority: CN
Inventors: 徐建民; 余成勇
Original assignee: Wuhai Dashi Intelligence Technology Co ltd
Current assignee: Wuhai Dashi Intelligence Technology Co ltd
Priority date: 2020-06-02
Filing date: 2020-06-02
Publication date: 2020-09-22

Abstract

The invention discloses an application method of an internet user access track behavior big data analysis algorithm. The method calculates the similarity between behavior events, determines the user's degree of preference for related content according to the similarity of the behavior events and the user's historical behavior, recommends content according to a quantitative index of that preference, and groups and clusters users with the same interests and preferences. The beneficial effect of the invention is that the method extracts basic data from the user's access track, access duration and access frequency and uses it to analyze the user's behavior events.

Description

Application method of internet user access track behavior big data analysis algorithm

Technical Field

The invention relates to the technical field of internet user access tracks, in particular to an application method of an internet user access track behavior big data analysis algorithm.

Background

A person's access behavior track on the Internet comprises active user behavior and non-active user behavior. Active behavior is the user clicking (Click) a page; non-active behavior is the auxiliary pages generated at the same time as the click. Typically, one active click is accompanied by many auxiliary page requests (Hits). Within a single access, the number of pages generated by non-active behavior is several times, dozens of times or even hundreds of times the number generated by active behavior, so one access produces a large number of "junk" pages, which seriously interferes with accurately characterizing the user's interests. The current solution is to put the "junk" pages (i.e., non-active behavior) on a blacklist and filter them out, forming PageViews (usually abbreviated PV) that approximate the active behavior.
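The blacklist filtering described above can be sketched as follows. This is a minimal illustration only: the blacklist patterns, the function names `is_page_view` and `page_views`, and the sample URLs are all assumptions, since the text does not say how the blacklist is defined.

```python
import re

# Hypothetical blacklist of URL patterns for auxiliary (non-active) requests.
# The patent does not list the actual patterns; these are illustrative only.
BLACKLIST = [
    re.compile(r"\.(?:css|js|png|jpg|gif|ico)(?:\?.*)?$"),  # static resources
    re.compile(r"/ads?/"),                                   # ad requests
]

def is_page_view(url: str) -> bool:
    """Return True if the URL matches no blacklist pattern."""
    return not any(pattern.search(url) for pattern in BLACKLIST)

def page_views(hits):
    """Keep only the hits that survive blacklist filtering (the PageViews)."""
    return [url for url in hits if is_page_view(url)]

# One active click on a book page plus the auxiliary hits it triggered.
hits = [
    "https://example.com/book/123",
    "https://example.com/static/site.css",
    "https://example.com/static/app.js?v=2",
    "https://example.com/ad/banner",
    "https://example.com/img/cover.png",
]
print(page_views(hits))
```

With these sample hits, only the book page remains, so five Hits reduce to one PageView.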

Similarity formula in the prior art:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}$$

The denominator $|N(i)|$ is the number of users who like behavior event i, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both behavior event i and behavior event j. The formula can therefore be understood as the proportion of the users who like behavior event i that also like behavior event j.

Although the above formula seems reasonable, it has a problem: if behavior event j is popular and liked by many people, $w_{ij}$ becomes large and close to 1. The formula therefore gives almost any behavior event a high similarity with popular behavior events, which is clearly not a good property for a recommendation system that aims to mine long-tail information.
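The popularity problem can be reproduced with a few lines of code. The sketch below implements the prior-art ratio exactly as described in the text; the user sets are hypothetical toy data.

```python
def prior_art_similarity(users_i: set, users_j: set) -> float:
    """Prior-art ratio: |N(i) ∩ N(j)| / |N(i)|."""
    if not users_i:
        return 0.0
    return len(users_i & users_j) / len(users_i)

# Hypothetical toy data: each set holds the users who like one behavior event.
niche = {"u1", "u2", "u3"}
popular = {f"u{k}" for k in range(1, 101)}  # liked by 100 users
other_niche = {"u3", "u50"}

# Every user who likes the niche event also likes the popular one,
# so the ratio reaches its maximum regardless of any real relationship.
print(prior_art_similarity(niche, popular))      # 1.0
print(prior_art_similarity(niche, other_niche))  # one shared user out of three
```

Because nearly everyone likes the popular event, the intersection covers almost all of N(i) for any i, which is why the ratio is driven toward 1.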

Therefore, it is necessary to provide an application method of an internet user access trajectory behavior big data analysis algorithm for the above problems.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide an application method of an internet user access track behavior big data analysis algorithm to solve the problems.

An application method of an internet user access track behavior big data analysis algorithm, wherein the application method comprises the following steps:

(1) Calculating similarity between the behavior events;

(2) generating a recommendation list for the user according to the similarity of the behavior events and the historical behavior of the user;

(3) calculating the similarity between behavior events using the following formula:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

wherein the denominator is formed from $|N(i)|$ and $|N(j)|$, the numbers of users who like behavior event i and behavior event j respectively, and the numerator $|N(i) \cap N(j)|$ is the number of users who like behavior event i and behavior event j at the same time;

(4) when the ItemCF algorithm is used to calculate the similarity of behavior events, first establishing a user-to-behavior-event inverted list, and then, for each user, adding 1 in the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;

(5) for each behavior event set, adding one for every pair of behavior events in the set to obtain a matrix, summing these matrices to obtain the matrix C above, and normalizing the matrix C to obtain the cosine similarity matrix W between behavior events;

(6) after obtaining the similarity between the behavior events, ItemCF calculates the interest of user u in a behavior event j by the following formula:

$$p_{uj} = \sum_{i \in N(u) \cap S(j, K)} w_{ji}\, r_{ui}$$

where $p_{uj}$ is the interest of user u in behavior event j, $N(u)$ is the set of behavior events liked by user u, $S(j, K)$ is the set of the K behavior events most similar to behavior event j, $w_{ji}$ is the similarity between behavior events j and i, and $r_{ui}$ is the interest of user u in behavior event i.
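Steps (4) to (6) can be sketched end to end as below. This is an illustrative sketch, not the patented implementation. It assumes the cosine normalization is w_ij = C[i][j] / sqrt(|N(i)| |N(j)|) and that interest values r_ui are 1 for implicit feedback. The names `build_similarity` and `interest` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_similarity(user_events):
    """Build the co-occurrence matrix C and the cosine similarity matrix W.

    user_events maps each user to the set of behavior events the user likes
    (the per-user list of step (4)).
    """
    n = defaultdict(int)                        # n[i] = |N(i)|
    c = defaultdict(lambda: defaultdict(int))   # c[i][j] = co-occurrence count
    for events in user_events.values():
        for i in events:
            n[i] += 1
        # Add 1 for every pair of events liked by the same user.
        for i, j in combinations(sorted(events), 2):
            c[i][j] += 1
            c[j][i] += 1
    # Assumed normalization: C[i][j] / sqrt(|N(i)| * |N(j)|).
    w = {
        i: {j: cij / math.sqrt(n[i] * n[j]) for j, cij in row.items()}
        for i, row in c.items()
    }
    return c, w

def interest(liked, j, w, k=3):
    """Sum w_ji * r_ui over the events i in N(u) ∩ S(j, K).

    liked maps each event i in N(u) to the interest value r_ui.
    """
    # S(j, K): the K events most similar to j (ties broken by name).
    top_k = sorted(w.get(j, {}).items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return sum(w_ji * liked[i] for i, w_ji in top_k if i in liked)

# Hypothetical toy data: five users and five behavior events.
user_events = {
    "A": {"a", "b", "d"},
    "B": {"b", "c", "e"},
    "C": {"c", "d"},
    "D": {"b", "c", "d"},
    "E": {"a", "d"},
}
C, W = build_similarity(user_events)
liked_by_E = {event: 1.0 for event in user_events["E"]}  # implicit feedback
print(C["a"]["d"], W["a"]["d"], interest(liked_by_E, "b", W))
```

In this toy data, events a and d are both liked by users A and E, so C["a"]["d"] is 2. User E's interest in event b comes only from d, the one liked event that is among b's three nearest neighbours.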

Preferably, step (4) establishes, for each user, a list containing the behavior events that user likes.

Preferably, C [ i ] [ j ] records the number of users who like the behavior event i and the behavior event j at the same time.

Preferably, step (2) is realized by steps (3) to (6).

Preferably, the behavior events can be interests, hobbies, habits, commodities and the like.

Compared with the prior art, the invention has the following beneficial effect: the method extracts basic data from the user's access track, access duration and access frequency and uses it to analyze the user's behavior events.

Drawings

FIG. 1 is a user behavior event interest matrix diagram of an application method of an internet user access track behavior big data analysis algorithm provided by the invention;

FIG. 2 is a diagram of an example behavior event recommendation matrix of the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.

As shown in FIG. 1 in combination with FIG. 2, an application method of an internet user access track behavior big data analysis algorithm is provided, wherein the application method comprises the following steps:

(1) Calculating similarity between the behavior events;

Further, step (4) establishes a list containing favorite behavior events for each user.

Further, C [ i ] [ j ] records the number of users who like the behavior event i and the behavior event j at the same time.

Further, the step (2) is realized by the steps (3) to (6).

Further, the behavior events may be interests, hobbies, habits, commodities, and the like.

As shown in FIG. 2, the user likes both "C++ Primer (Chinese edition)" and "The Beauty of Programming". ItemCF then finds the 3 books most similar to these two books and calculates the user's interest in each candidate book according to the formula. For example, ItemCF recommends "Introduction to Algorithms" to the user because this book is similar to "C++ Primer (Chinese edition)" with a similarity of 0.4 and is also similar to "The Beauty of Programming" with a similarity of 0.5. Given that the user's interest in "C++ Primer (Chinese edition)" is 1.3 and the interest in "The Beauty of Programming" is 0.9, the user's interest in "Introduction to Algorithms" is 1.3 × 0.4 + 0.9 × 0.5 = 0.97.
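The arithmetic of this example can be checked with a short sketch. The numbers are those quoted in the text; the dictionary keys are shorthand labels for the two books the user already likes, and the helper name `explain` is hypothetical.

```python
# r_ui: the user's interest in the two liked books (values quoted in the text).
liked = {"cpp_primer": 1.3, "beauty_of_programming": 0.9}
# w_ji: similarity between the candidate book and each liked book (from the text).
similarity_to_candidate = {"cpp_primer": 0.4, "beauty_of_programming": 0.5}

def explain(liked, similarity):
    """Return the predicted interest and the per-book contributions behind it."""
    contributions = {
        i: similarity[i] * r_ui for i, r_ui in liked.items() if i in similarity
    }
    return sum(contributions.values()), contributions

score, contributions = explain(liked, similarity_to_candidate)
print(round(score, 2))  # 0.97
print(contributions)
```

The per-book contributions show which previously liked books account for the recommendation, which is the kind of information a recommendation explanation can be built from.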

As can be seen from this example, one advantage of ItemCF is that it can provide a recommendation explanation: the behavior events the user liked in the past are used to explain the current recommendation. ItemCF recommendations are more personalized and reflect the continuity of the user's own interests. ItemCF is particularly advantageous on book, e-commerce and movie websites such as Amazon, Douban and Netflix. First, on these websites the user's interests are relatively fixed and persistent. A technician, for example, mostly buys technical books and is not very sensitive to how popular a book is; in fact, the more senior the technician, the less likely the books they read are to be popular. In addition, most users of these systems do not need popularity to help them judge whether a behavior event is good; they can judge its quality themselves from their knowledge of the field. The task of personalized recommendation on these websites is therefore to help users find behavior events relevant to their field of study, and the ItemCF algorithm becomes the preferred algorithm for these websites.

In addition, behavior events on these websites are not updated particularly fast, so updating the behavior event similarity matrix once a day is acceptable. ItemCF needs to maintain a behavior event similarity matrix. From the storage perspective, if there are many users, maintaining a user interest similarity matrix requires a large amount of space; likewise, if there are many behavior events, maintaining the behavior event similarity matrix is costly. On the actual Internet the number of users is often very large, whereas on book and e-commerce websites the number of behavior events is relatively small. Furthermore, the similarity between behavior events is generally more stable than users' interests, so ItemCF is the better choice.

Behavior event cold start is a serious problem for the ItemCF algorithm, because the principle of ItemCF is to recommend to the user behavior events similar to those the user previously liked. ItemCF recomputes the behavior event similarity table from user behavior at intervals (generally once a day), and during online service it keeps the previously computed behavior event correlation matrix in memory. When a new behavior event is added, it is therefore absent from the in-memory correlation table, and ItemCF cannot recommend it. One remedy is to update the behavior event similarity table frequently, but computing behavior event similarity from user behavior is very time-consuming, mainly because the user behavior log is huge. Moreover, if the new behavior event is never shown to users, users cannot act on it, and a correlation matrix containing it cannot be computed from the behavior log. For this reason, the behavior event correlation table can only be computed from the content information of the behavior event, and it is updated frequently (e.g., once every half hour).
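A content-based correlation table for a newly added behavior event might be sketched as follows. The text does not specify the similarity measure, so the use of Jaccard similarity over attribute sets is an assumption made for illustration. The catalog, the attribute tags and the function names `jaccard` and `content_neighbours` are likewise hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two attribute sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def content_neighbours(new_tags, catalog, k=2):
    """Return the k existing behavior events whose content is closest to new_tags."""
    scored = [(event, jaccard(new_tags, tags)) for event, tags in catalog.items()]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))  # highest similarity first
    return scored[:k]

# Hypothetical catalog: content information encoded as attribute tags.
catalog = {
    "movie_1": {"director:A", "genre:sci-fi", "country:US"},
    "movie_2": {"director:B", "genre:drama", "country:FR"},
    "movie_3": {"director:A", "genre:drama", "country:US"},
}

# A new movie with no behavior log yet; only its content is known.
new_movie = {"director:A", "genre:sci-fi", "country:UK"}
print(content_neighbours(new_movie, catalog))
```

Because the computation touches only content attributes and not the behavior log, it is cheap enough to rerun at short intervals, which is what makes a half-hourly update plausible.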

The content information of behavior events is varied, and different types of behavior events have different content information. For a movie, the content information generally includes the title, director, actors, plot, genre, country, era and the like. For a book, the content information typically includes the title, author, publisher, text, category and the like. For user cold start, another approach is not to present recommendation results immediately when a new user first accesses the recommendation system, but to show the user some behavior events, let the user give feedback on their interest in these behavior events, and then provide personalized recommendations according to that feedback.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. An application method of an internet user access track behavior big data analysis algorithm, characterized in that the application method comprises the following steps:

(1) Calculating similarity between the behavior events;

2. The method for applying the big data analysis algorithm for the internet user access track behavior as claimed in claim 1, wherein step (4) creates, for each user, a list containing the behavior events that user likes.

3. The method for applying the big data analysis algorithm for the internet user access track behavior as claimed in claim 1, wherein C[i][j] records the number of users who like the behavior event i and the behavior event j at the same time.

4. The method for applying the big data analysis algorithm for the internet user access track behavior as claimed in claim 1, wherein: the step (2) is realized by the steps (3) to (6).

5. The method for applying the big data analysis algorithm for the internet user access track behavior as claimed in claim 1, wherein the behavior events can be interests, hobbies, habits and commodities.