CN111695005A - Application method of internet user access track behavior big data analysis algorithm - Google Patents
- Publication number
- CN111695005A
- Authority
- CN
- China
- Prior art keywords
- behavior
- user
- event
- events
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9035—Filtering based on additional data, e.g. user or group profiles
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an application method of a big data analysis algorithm for internet user access track behavior. The method calculates the similarity between behavior events, determines the user's degree of preference for related content from that similarity and the user's historical behavior, recommends content according to this quantitative preference index, and groups and clusters users who share the same interests and preferences. The beneficial effect of the invention is that user behavior events are analyzed from basic access-track data extracted from the user's access track, access duration and access frequency.
Description
Technical Field
The invention relates to the technical field of internet user access tracks, in particular to an application method of an internet user access track behavior big data analysis algorithm.
Background
A person's access behavior track on the Internet comprises active and non-active user behavior. Active behavior is the user clicking a page (a Click); non-active behavior is the generation of auxiliary pages that accompanies a Click. Typically, one active Click brings with it multiple attached pages (Hits). Within one access, the number of pages generated by non-active behavior is several times, dozens of times or even hundreds of times the number generated by active behavior, so a large number of "junk" pages is produced, which seriously interferes with accurately depicting the user's interest characteristics. At present, the solution is to put the "junk" pages (i.e. non-active behavior) on a blacklist and filter them out, forming PageViews (usually abbreviated PV) that approximate the active behavior.
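The blacklist filtering described above can be sketched in Python as follows. This is only an illustration: the suffix blacklist, the function names and the example URLs are assumptions made for the sketch and are not given in the patent.

```python
from urllib.parse import urlparse

# Hypothetical blacklist: file suffixes typical of auxiliary ("junk") hits.
BLACKLIST_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

def is_page_view(url: str) -> bool:
    """Return True when the hit is not matched by the blacklist."""
    path = urlparse(url).path.lower()  # the query string is ignored
    return not path.endswith(BLACKLIST_SUFFIXES)

def page_views(hits):
    """Keep only the hits that approximate active clicks (PageViews)."""
    return [hit for hit in hits if is_page_view(hit)]

# Made-up access log: two active clicks and three auxiliary hits.
hits = [
    "http://example.com/books/algorithms.html",
    "http://example.com/static/site.css",
    "http://example.com/static/app.js?v=3",
    "http://example.com/img/logo.png",
    "http://example.com/books/cpp-primer.html",
]
pv = page_views(hits)
print(f"{len(hits)} hits -> {len(pv)} page views")
```

A suffix blacklist is the simplest possible rule; a real deployment would probably also need rules for content types or URL patterns.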
The formula used in the prior art is

$$w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|}$$

where the denominator $|N(i)|$ is the number of users who like behavior event i, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both behavior event i and behavior event j. The formula can therefore be read as the proportion of the users who like behavior event i who also like behavior event j.
Although the above formula seems reasonable, it has a problem: if behavior event j is popular and liked by many people, $w_{ij}$ becomes large and close to 1. The formula therefore gives every behavior event a high similarity to popular behavior events, which is clearly undesirable for a recommendation system that aims to mine long-tail information.
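The popularity problem can be seen on a small made-up data set. The sketch below compares the prior-art ratio with a variant that also divides by the popularity of event j. The data, the function names and the square-root form of the denominator are assumptions made for illustration, in line with the cosine normalization the disclosure goes on to describe.

```python
import math

# Toy data (assumed): the set of users who like each behavior event.
likes = {
    "niche_a": {"u1", "u2"},
    "niche_b": {"u2", "u3"},
    "popular": {"u1", "u2", "u3", "u4", "u5", "u6", "u7", "u8"},
}

def prior_art_similarity(i, j):
    """|N(i) & N(j)| / |N(i)|: share of the users liking i who also like j."""
    return len(likes[i] & likes[j]) / len(likes[i])

def penalized_similarity(i, j):
    """|N(i) & N(j)| / sqrt(|N(i)| * |N(j)|): also penalizes a popular j."""
    return len(likes[i] & likes[j]) / math.sqrt(len(likes[i]) * len(likes[j]))

for j in ("niche_b", "popular"):
    print(j, prior_art_similarity("niche_a", j), penalized_similarity("niche_a", j))
```

With the prior-art ratio the popular event looks twice as similar to `niche_a` as `niche_b` does (1.0 against 0.5). Once the popularity of j enters the denominator, both score 0.5.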
Therefore, it is necessary to provide an application method of an internet user access trajectory behavior big data analysis algorithm for the above problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an application method of an internet user access track behavior big data analysis algorithm to solve the problems.
The application method of the internet user access track behavior big data analysis algorithm comprises the following steps:
(1) calculating the similarity between behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the user's historical behavior;
(3) calculating the similarity between behavior events i and j using the following formula:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

where the denominator is formed from $|N(i)|$ and $|N(j)|$, the numbers of users who like behavior event i and behavior event j respectively, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first building a user-to-behavior-event inverted list and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, obtaining a matrix in which the entry for every pair of behavior events in the set is 1, summing these matrices to obtain the matrix C, and normalizing C to obtain the cosine similarity matrix W between behavior events;
(6) after the similarity between behavior events is obtained, ItemCF calculates the interest of user u in a behavior event j by the following formula:

$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$

where $N(u)$ is the set of behavior events liked by user u, $S(j,K)$ is the set of the K behavior events most similar to behavior event j, $w_{ji}$ is the similarity of behavior events j and i, and $r_{ui}$ is the interest of user u in behavior event i.
Preferably, step (4) builds, for each user, a list of the behavior events that user likes.
Preferably, C[i][j] records the number of users who like both behavior event i and behavior event j.
Preferably, step (2) is realized by steps (3) to (6).
Preferably, the behavior events can be interests, hobbies, habits, commodities and the like.
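Steps (4) and (5) can be sketched in Python roughly as follows. The sketch rests on assumptions: the toy user data and the function name are invented, and the normalization is taken to be division by the square root of the product of the two event popularities, which is one common way to obtain a cosine-style similarity from co-occurrence counts.

```python
import math
from collections import defaultdict
from itertools import permutations

def item_similarity(user_events):
    """user_events maps each user to the set of behavior events they like."""
    co = defaultdict(lambda: defaultdict(int))  # co-occurrence matrix C
    count = defaultdict(int)                    # |N(i)| for every event i
    for events in user_events.values():
        for i in events:
            count[i] += 1
        # Every ordered pair of distinct events liked by this user adds 1 to C.
        for i, j in permutations(events, 2):
            co[i][j] += 1
    # Normalize C into the similarity matrix W.
    return {
        i: {j: c / math.sqrt(count[i] * count[j]) for j, c in row.items()}
        for i, row in co.items()
    }

# Toy user-to-event lists (assumed data).
user_events = {
    "A": {"a", "b", "d"},
    "B": {"b", "c", "e"},
    "C": {"c", "d"},
    "D": {"b", "c", "d"},
    "E": {"a", "d"},
}
W = item_similarity(user_events)
print(round(W["a"]["d"], 4), round(W["b"]["c"], 4))
```

Only pairs that co-occur for at least one user get an entry, so W stays sparse. Keeping both `W[i][j]` and `W[j][i]` costs memory but makes the later neighbor lookup simple.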
Compared with the prior art, the invention has the following beneficial effect: user behavior events are analyzed from basic access-track data extracted from the user's access track, access duration and access frequency.
Drawings
FIG. 1 is a user behavior event interest matrix diagram of an application method of an internet user access track behavior big data analysis algorithm provided by the invention;
FIG. 2 is a diagram of an example behavior event recommendation matrix of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
As shown in FIG. 1 in combination with FIG. 2, an application method of an internet user access track behavior big data analysis algorithm is provided, wherein the application method comprises the following steps:
(1) calculating the similarity between behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the user's historical behavior;
(3) calculating the similarity between behavior events i and j using the following formula:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

where the denominator is formed from $|N(i)|$ and $|N(j)|$, the numbers of users who like behavior event i and behavior event j respectively, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first building a user-to-behavior-event inverted list and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, obtaining a matrix in which the entry for every pair of behavior events in the set is 1, summing these matrices to obtain the matrix C, and normalizing C to obtain the cosine similarity matrix W between behavior events;
(6) after the similarity between behavior events is obtained, ItemCF calculates the interest of user u in a behavior event j by the following formula:

$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$

where $N(u)$ is the set of behavior events liked by user u, $S(j,K)$ is the set of the K behavior events most similar to behavior event j, $w_{ji}$ is the similarity of behavior events j and i, and $r_{ui}$ is the interest of user u in behavior event i.
Further, step (4) builds, for each user, a list of the behavior events that user likes.
Further, C[i][j] records the number of users who like both behavior event i and behavior event j.
Further, step (2) is realized by steps (3) to (6).
Further, the behavior events may be interests, hobbies, habits, commodities and the like.
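Step (6) can be illustrated with a short Python sketch. The similarity values, the interest values and all function names below are invented for the example. The score of a candidate event j sums the similarity w_ji times the interest r_ui over those events i that the user likes and that are also among the K events most similar to j.

```python
def top_k_similar(W, j, K):
    """S(j, K): the K behavior events most similar to j, with similarities."""
    ranked = sorted(W.get(j, {}).items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:K])

def interest(user_interest, W, j, K):
    """Sum of w_ji * r_ui over i in N(u) intersected with S(j, K)."""
    neighbors = top_k_similar(W, j, K)
    return sum(w_ji * user_interest[i]
               for i, w_ji in neighbors.items() if i in user_interest)

def recommend(user_interest, W, K=3, top_n=5):
    """Rank the events the user has not yet liked by predicted interest."""
    scored = [(j, interest(user_interest, W, j, K))
              for j in W if j not in user_interest]
    scored = [(j, p) for j, p in scored if p > 0]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]

# Assumed similarity matrix W and user interests r_ui.
W = {
    "a": {"b": 0.5, "c": 0.2},
    "b": {"a": 0.5, "c": 0.4, "d": 0.1},
    "c": {"a": 0.2, "b": 0.4, "d": 0.6},
    "d": {"b": 0.1, "c": 0.6},
}
user_interest = {"a": 1.0, "b": 2.0}
recommendations = recommend(user_interest, W, K=2)
print(recommendations)
```

With K = 2, event `c` is scored only through `b`, because `a` is not among the two nearest neighbors of `c`. Raising K lets more of the user's history contribute to each score.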
Compared with the prior art, the invention has the following beneficial effect: user behavior events are analyzed from basic access-track data extracted from the user's access track, access duration and access frequency.
As shown in FIG. 2, the user likes both the Chinese edition of C++ Primer and The Beauty of Programming. ItemCF finds the 3 books most similar to each of these two books and then calculates the user's interest in each candidate book according to the formula. For example, ItemCF recommends Introduction to Algorithms to the user because it is similar to the Chinese edition of C++ Primer, with a similarity of 0.4, and also similar to The Beauty of Programming, with a similarity of 0.5. Given that the user's interest in the Chinese edition of C++ Primer is 1.3 and in The Beauty of Programming is 0.9, the user's interest in Introduction to Algorithms is 1.3 × 0.4 + 0.9 × 0.5 = 0.97.
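The arithmetic of this example can be checked with a few lines of Python. The numbers are the ones given in the text. The dictionary keys are placeholder labels for the two liked books and the variable names are chosen only for this sketch.

```python
# Interest of the user in the two books already liked (values from the text).
user_interest = {"cpp_primer": 1.3, "second_liked_book": 0.9}

# Similarity of the candidate book to each liked book (values from the text).
similarity_to_candidate = {"cpp_primer": 0.4, "second_liked_book": 0.5}

# Each liked book contributes interest * similarity to the candidate's score.
contributions = {
    book: user_interest[book] * similarity_to_candidate[book]
    for book in user_interest
}
score = sum(contributions.values())
strongest_reason = max(contributions, key=contributions.get)

print(round(score, 2))
print(strongest_reason)
```

The per-book contributions are what make the result explainable: the liked book with the largest contribution can be shown to the user as the reason for the recommendation.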
As this example shows, one advantage of ItemCF is that it can explain its recommendations: the behavior events the user liked in the past serve as the explanation for the current recommendation. ItemCF recommendations are more personalized and reflect the continuity of the user's own interests. ItemCF is well suited to book, e-commerce and movie websites such as Amazon, Douban and Netflix. First, on these websites the interests of users are relatively fixed and persistent. A technician buying technical books is not very sensitive to how popular a book is; in fact, the more senior the technician, the less likely the books they read are to be popular ones. In addition, most users of these systems do not need popularity to help them judge the quality of a behavior event; they can judge it themselves from their knowledge of the field. The task of personalized recommendation on these websites is therefore to help users find behavior events related to their field of study, which makes ItemCF the preferred algorithm for them. Moreover, behavior events on these websites are not updated particularly quickly, so updating the behavior event similarity matrix once a day is acceptable.
ItemCF needs to maintain a behavior event similarity matrix. From a storage perspective, maintaining a user interest similarity matrix takes a large amount of space when there are many users, and likewise maintaining the behavior event similarity matrix is costly when there are many behavior events. On the real internet the number of users is usually very large, whereas on book and e-commerce websites the number of behavior events is comparatively small. Furthermore, the similarity between behavior events is generally more stable than users' interests, so ItemCF is the better choice.
Behavior event cold start is a serious problem for the ItemCF algorithm, because its principle is to recommend to the user behavior events similar to those he liked before. The ItemCF algorithm recomputes the behavior event similarity table from user behavior at intervals (generally once a day) and, during online service, keeps the previously computed behavior event correlation matrix in memory. When a new behavior event is added, it is therefore absent from the in-memory correlation table, and ItemCF cannot recommend it. One remedy is to update the behavior event similarity table frequently, but computing similarity from user behavior is very time-consuming, mainly because the user behavior log is huge. Moreover, if a new behavior event is never shown to users, users cannot generate any behavior for it, so a correlation matrix containing the new behavior event cannot be computed from the behavior log. For this reason, the behavior event correlation table can only be computed from the content information of the behavior events and updated frequently (e.g., once every half hour).
Behavior events carry many kinds of content information, and different types of behavior events have different content information. For a movie, the content information generally includes the title, director, actors, plot, genre, country, era and the like. For a book, it typically includes the title, author, publisher, text, category and so on. Another approach, which addresses the user cold start problem, is not to show recommendation results immediately when a new user first accesses the recommendation system, but to present some behavior events, let the user give feedback on their interest in them, and then provide personalized recommendations based on that feedback.
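A content-based correlation for a new behavior event could look like the following Python sketch. The patent does not say how content similarity is computed. Jaccard similarity over (field, value) pairs is used here purely as an assumed stand-in, and the catalog entries and names are invented.

```python
def content_features(item):
    """Flatten content information into a set of (field, value) pairs."""
    return {(field, value) for field, values in item.items() for value in values}

def jaccard(a, b):
    """Jaccard similarity of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Existing behavior events with hypothetical content information.
catalog = {
    "movie_old_1": {"director": ["D1"], "actors": ["A1", "A2"], "genre": ["drama"]},
    "movie_old_2": {"director": ["D2"], "actors": ["A3"], "genre": ["comedy"]},
}

# A new behavior event for which no user behavior exists yet.
new_item = {"director": ["D1"], "actors": ["A2", "A4"], "genre": ["drama"]}

new_features = content_features(new_item)
similarities = {
    name: jaccard(new_features, content_features(item))
    for name, item in catalog.items()
}
print(similarities)
```

Because this needs only the item catalog and not the behavior log, it is cheap enough to rerun at short intervals such as the half-hourly update mentioned above.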
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (5)
1. An application method of an internet user access track behavior big data analysis algorithm, characterized in that the application method comprises the following steps:
(1) calculating the similarity between behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the user's historical behavior;
(3) calculating the similarity between behavior events i and j using the following formula:

$$w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}$$

where the denominator is formed from $|N(i)|$ and $|N(j)|$, the numbers of users who like behavior event i and behavior event j respectively, and the numerator $|N(i) \cap N(j)|$ is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first building a user-to-behavior-event inverted list and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, obtaining a matrix in which the entry for every pair of behavior events in the set is 1, summing these matrices to obtain the matrix C, and normalizing C to obtain the cosine similarity matrix W between behavior events;
(6) after the similarity between behavior events is obtained, ItemCF calculates the interest of user u in a behavior event j by the following formula:

$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$

where $N(u)$ is the set of behavior events liked by user u, $S(j,K)$ is the set of the K behavior events most similar to behavior event j, $w_{ji}$ is the similarity of behavior events j and i, and $r_{ui}$ is the interest of user u in behavior event i.
2. The method for applying the big data analysis algorithm for internet user access track behavior as claimed in claim 1, wherein step (4) creates, for each user, a list of the behavior events that user likes.
3. The method for applying the big data analysis algorithm for internet user access track behavior as claimed in claim 1, wherein C[i][j] records the number of users who like both behavior event i and behavior event j.
4. The method for applying the big data analysis algorithm for internet user access track behavior as claimed in claim 1, wherein step (2) is realized by steps (3) to (6).
5. The method for applying the big data analysis algorithm for internet user access track behavior as claimed in claim 1, wherein the behavior events can be interests, hobbies, habits and commodities.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010488643.0A CN111695005A (en) | 2020-06-02 | 2020-06-02 | Application method of internet user access track behavior big data analysis algorithm |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN111695005A (en) | 2020-09-22 |
Family
ID=72479190
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010488643.0A (Pending), CN111695005A (en) | 2020-06-02 | 2020-06-02 | Application method of internet user access track behavior big data analysis algorithm |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111695005A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114461899A (en) * | 2021-12-24 | 2022-05-10 | 新奥新智科技有限公司 | Collaborative filtering recommendation method and device for user, electronic equipment and storage medium |
| CN118332216A (en) * | 2024-04-28 | 2024-07-12 | 北京才多对信息技术有限公司 | Browsing behavior information identification method, device, equipment and medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103886047A (en) * | 2014-03-12 | 2014-06-25 | 浙江大学 | Distributed on-line recommending method orientated to stream data |
| CN104598643A (en) * | 2015-02-13 | 2015-05-06 | 成都品果科技有限公司 | Article similarity contribution factor, similarity acquiring method, as well as article recommendation method and system thereof |
| CN106095841A (en) * | 2016-06-05 | 2016-11-09 | 西华大学 | Method is recommended in a kind of mobile Internet advertisement based on collaborative filtering |
| CN108664564A (en) * | 2018-04-13 | 2018-10-16 | 东华大学 | A kind of improvement collaborative filtering recommending method based on item contents feature |
Non-Patent Citations (1)
| Title |
|---|
| Xiang Liang (项亮), Posts & Telecom Press (人民邮电出版社) * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20250371577A1 (en) | Computer system including a processor configured to evaluate browse history data and social media data | |
| Salehi et al. | Personalized recommendation of learning material using sequential pattern mining and attribute based collaborative filtering | |
| US10102307B2 (en) | Method and system for multi-phase ranking for content personalization | |
| US8001008B2 (en) | System and method of collaborative filtering based on attribute profiling | |
| RU2725659C2 (en) | Method and system for evaluating data on user-element interactions | |
| US9251516B2 (en) | Systems and methods for electronic distribution of job listings | |
| Ali et al. | Movie recommendation system using genome tags and content-based filtering | |
| US20150242750A1 (en) | Asymmetric Rankers for Vector-Based Recommendation | |
| US20080140641A1 (en) | Knowledge and interests based search term ranking for search results validation | |
| US9430572B2 (en) | Method and system for user profiling via mapping third party interests to a universal interest space | |
| Pyo et al. | Automatic and personalized recommendation of TV program contents using sequential pattern mining for smart TV user interaction | |
| De Maio et al. | Social media marketing through time‐aware collaborative filtering | |
| Rajabi Kouchi et al. | A movie recommender system based on user profile and artificial bee colony optimization | |
| Wu et al. | Div-clustering: Exploring active users for social collaborative recommendation | |
| CN111695005A (en) | Application method of internet user access track behavior big data analysis algorithm | |
| Košir et al. | Web user profiles with time-decay and prototyping | |
| Madadipouya | A location-based movie recommender system using collaborative filtering | |
| Sharma et al. | An efficient semantic clustering of URLs for web page recommendation | |
| Su et al. | An item-based music recommender system using music content similarity | |
| Liu | Personalized recommendation algorithm for movie data combining rating matrix and user subjective preference | |
| Putta et al. | Analytical study of content-based and collaborative filtering methods for recommender systems | |
| Wadpelli et al. | Manifesta: An Event Management Platform Using Recommendation System | |
| Prasanth et al. | Semantic chameleon clustering analysis algorithm with recommendation rules for efficient web usage mining | |
| Wen | Development of personalized online systems for web search, recommendations, and e-commerce | |
| Singh et al. | Personalized Web Search: A Survey |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200922 |