
CN111695005A - Application method of internet user access track behavior big data analysis algorithm - Google Patents

Application method of internet user access track behavior big data analysis algorithm

Info

Publication number
CN111695005A
Authority
CN
China
Prior art keywords
behavior
user
event
events
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010488643.0A
Other languages
Chinese (zh)
Inventor
徐建民
余成勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhai Dashi Intelligence Technology Co ltd
Original Assignee
Wuhai Dashi Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhai Dashi Intelligence Technology Co ltd
Priority to CN202010488643.0A
Publication of CN111695005A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9035Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an application method of an internet user access track behavior big data analysis algorithm. The method calculates the similarity between behavior events, determines the user's degree of preference for related content according to that similarity and the user's historical behavior, recommends content according to the quantified degree of preference, and groups and clusters users who share the same interests and preferences. The beneficial effects of the invention are as follows: basic data are extracted from the user's access track, access duration and access frequency, and the user's behavior events are analyzed on the basis of these data.

Description

Application method of internet user access track behavior big data analysis algorithm
Technical Field
The invention relates to the technical field of internet user access tracks, in particular to an application method of an internet user access track behavior big data analysis algorithm.
Background
The access behavior track of people on the Internet comprises active user behavior and non-active user behavior. Active behavior is the user clicking (Click) a page; non-active behavior is the generation of auxiliary pages that accompanies such a click. Typically, one active click brings multiple attached pages with it, and these are recorded as Hits. Within one user access, the number of pages generated by non-active behavior is several times, dozens of times or even hundreds of times the number of pages generated by active behavior, so a large number of "junk" pages are produced, which seriously interferes with accurately depicting the user's interest characteristics. At present, the solution is to put the "junk" pages (i.e. the non-active behavior) on a blacklist and filter them out, forming PageViews (usually abbreviated as PV) that approximate the active behavior.
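The blacklist filtering described above can be illustrated with a minimal Python sketch. The suffix list, the function name `page_views` and the example URLs are assumptions made for illustration; the source does not specify what the blacklist contains.

```python
from urllib.parse import urlparse

# Hypothetical blacklist: path suffixes that mark auxiliary (non-active) requests.
BLACKLIST_SUFFIXES = (".css", ".js", ".png", ".gif", ".jpg", ".ico")


def page_views(hits):
    """Return the hits that survive the blacklist, in their original order."""
    views = []
    for url in hits:
        path = urlparse(url).path.lower()  # ignore the query string
        if not path.endswith(BLACKLIST_SUFFIXES):
            views.append(url)
    return views


hits = [
    "http://example.com/book/42",
    "http://example.com/static/site.css",
    "http://example.com/static/app.js",
    "http://example.com/img/cover.png?size=large",
    "http://example.com/cart",
]
pv = page_views(hits)
print(len(hits), "hits ->", len(pv), "page views")
```

In this toy log, five Hits reduce to two PageViews, which is the approximation of active behavior that the paragraph refers to.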
The formula used in the prior art is:
$$ w_{ij} = \frac{|N(i) \cap N(j)|}{|N(i)|} $$
Here the denominator |N(i)| is the number of users who like behavior event i, and the numerator |N(i) ∩ N(j)| is the number of users who like both behavior event i and behavior event j. The formula can therefore be read as the proportion of users who like behavior event i that also like behavior event j.
Although the above formula seems reasonable, it has a problem: if behavior event j is popular and liked by many people, w_{ij} becomes large and approaches 1. The formula therefore gives every behavior event a high similarity to popular behavior events, which is clearly undesirable for a recommendation system that aims to mine long-tail information.
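The popularity problem can be seen in a small Python sketch. The user sets below are made-up toy data and the function name is hypothetical; only the ratio itself comes from the text.

```python
def prior_art_similarity(users_i, users_j):
    """Share of the users who like event i that also like event j."""
    return len(users_i & users_j) / len(users_i)


# Toy data (assumed): a popular event liked by 95 of 100 users,
# and two niche events with no users in common.
hot = set(range(95))
niche_a = {0, 1, 2, 3, 4}
niche_b = {10, 11, 12, 13, 14}

print(prior_art_similarity(niche_a, hot))      # both niche events look
print(prior_art_similarity(niche_b, hot))      # identical to the hot one
print(prior_art_similarity(niche_a, niche_b))  # and unrelated to each other
```

Both niche events get the maximum similarity to the popular event even though they share no users with each other, which is the behavior the paragraph criticizes.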
Therefore, it is necessary to provide an application method of an internet user access trajectory behavior big data analysis algorithm for the above problems.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide an application method of an internet user access track behavior big data analysis algorithm that solves the above problems.
An application method of an internet user access track behavior big data analysis algorithm, wherein the application method comprises the following steps:
(1) Calculating similarity between the behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the historical behavior of the user;
(3) calculating the interestingness for the behavioral event using the following formula;
$$ w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}} $$
wherein, in the denominator, |N(i)| and |N(j)| are the numbers of users who like behavior event i and behavior event j respectively, and the numerator |N(i) ∩ N(j)| is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first establishing a user-to-behavior-event inverted list, and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, adding one to the entry of every pair of behavior events in the set to obtain a matrix, adding these matrices together to obtain the matrix C above, and normalizing the matrix C to obtain the cosine similarity matrix W between the behavior events;
(6) after obtaining the similarity between the behavior events, ItemCF calculates the interest of the user u in a behavior event j by the following formula:
$$ p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui} $$
where p_{uj} is the interest of user u in behavior event j, N(u) is the set of behavior events liked by user u, S(j, K) is the set of the K behavior events most similar to behavior event j, w_{ji} is the similarity between behavior events j and i, and r_{ui} is the interest of user u in behavior event i.
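Steps (4) and (5) can be sketched in Python as follows. This is a minimal illustration under assumptions: the normalization divides each co-occurrence count by the square root of the product of the two like-counts (the usual cosine form, consistent with the "cosine similarity matrix W" named in step (5)), and the function name and toy users are hypothetical.

```python
import math
from collections import defaultdict
from itertools import permutations


def item_similarity(user_events):
    """Build co-occurrence counts C and normalize them into W.

    user_events maps each user to the set of behavior events the user likes.
    """
    count = defaultdict(int)                      # number of users who like i
    cooc = defaultdict(lambda: defaultdict(int))  # C[i][j]
    for events in user_events.values():
        for i in events:
            count[i] += 1
        # every ordered pair of distinct events in this user's list
        for i, j in permutations(events, 2):
            cooc[i][j] += 1
    sim = {
        i: {j: c / math.sqrt(count[i] * count[j]) for j, c in row.items()}
        for i, row in cooc.items()
    }
    return cooc, sim


# Toy data (assumed): five users and the events each one likes.
user_events = {
    "A": {"a", "b", "d"},
    "B": {"b", "c", "e"},
    "C": {"c", "d"},
    "D": {"b", "c", "d"},
    "E": {"a", "d"},
}
C, W = item_similarity(user_events)
print(C["a"]["d"], round(W["a"]["d"], 4))
```

Iterating over users rather than over all event pairs means only pairs that actually co-occur are touched, which keeps C sparse when most events are never liked together.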
Preferably, step (4) establishes a list containing his favorite behavior events for each user.
Preferably, C [ i ] [ j ] records the number of users who like the behavior event i and the behavior event j at the same time.
Preferably, step (2) is realized by steps (3) to (6).
Preferably, the behavior events can be interests, hobbies, habits, commodities and the like.
Compared with the prior art, the invention has the following beneficial effects: basic data are extracted from the user's access track, access duration and access frequency, and the user's behavior events are analyzed on the basis of these data.
Drawings
FIG. 1 is a user behavior event interest matrix diagram of an application method of an internet user access track behavior big data analysis algorithm provided by the invention;
FIG. 2 is a diagram of an example behavior event recommendation matrix of the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.
As shown in FIG. 1 in combination with FIG. 2, an application method of an internet user access track behavior big data analysis algorithm is provided, wherein the application method comprises the following steps:
(1) Calculating similarity between the behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the historical behavior of the user;
(3) calculating the interestingness for the behavioral event using the following formula;
$$ w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}} $$
wherein, in the denominator, |N(i)| and |N(j)| are the numbers of users who like behavior event i and behavior event j respectively, and the numerator |N(i) ∩ N(j)| is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first establishing a user-to-behavior-event inverted list, and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, adding one to the entry of every pair of behavior events in the set to obtain a matrix, adding these matrices together to obtain the matrix C above, and normalizing the matrix C to obtain the cosine similarity matrix W between the behavior events;
(6) after obtaining the similarity between the behavior events, ItemCF calculates the interest of the user u in a behavior event j by the following formula:
$$ p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui} $$
where p_{uj} is the interest of user u in behavior event j, N(u) is the set of behavior events liked by user u, S(j, K) is the set of the K behavior events most similar to behavior event j, w_{ji} is the similarity between behavior events j and i, and r_{ui} is the interest of user u in behavior event i.
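Step (6) can be sketched in Python as a direct reading of the sum over the events that the user likes and that are also among the K nearest neighbours of j. The similarity values, the interest values and the function names below are assumptions for illustration only.

```python
def top_k_similar(sim, j, k):
    """S(j, K): the k events most similar to event j."""
    row = sim.get(j, {})
    return sorted(row, key=row.get, reverse=True)[:k]


def interest(liked, sim, j, k):
    """Sum w_ji * r_ui over events i that are liked and in S(j, K)."""
    return sum(sim[j][i] * liked[i] for i in top_k_similar(sim, j, k) if i in liked)


def recommend(liked, sim, k, n):
    """Rank events the user has not liked yet by predicted interest."""
    scored = [(j, interest(liked, sim, j, k)) for j in sim if j not in liked]
    scored = [(j, p) for j, p in scored if p > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy data (assumed): a symmetric similarity table and one user's interests r_ui.
sim = {
    "a": {"b": 0.5, "c": 0.2},
    "b": {"a": 0.5, "c": 0.4},
    "c": {"a": 0.2, "b": 0.4, "d": 0.1},
    "d": {"c": 0.1},
}
liked = {"a": 1.0, "b": 2.0}
print(recommend(liked, sim, k=2, n=5))
```

Event "c" is the only recommendation here: its two nearest neighbours are both liked by the user, whereas the only neighbour of "d" is not.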
Further, step (4) establishes a list containing favorite behavior events for each user.
Further, C [ i ] [ j ] records the number of users who like the behavior event i and the behavior event j at the same time.
Further, the step (2) is realized by the steps (3) to (6).
Further, the behavior events may be interests, hobbies, habits, commodities, and the like.
Compared with the prior art, the invention has the following beneficial effects: basic data are extracted from the user's access track, access duration and access frequency, and the user's behavior events are analyzed on the basis of these data.
As shown in FIG. 2, the user likes both C++ Primer (Chinese edition) and Beauty of Programming. ItemCF then finds the 3 books most similar to each of these two books and calculates the user's interest level in each of them according to the formula. For example, ItemCF recommends Introduction to Algorithms to the user because this book is similar to C++ Primer (Chinese edition), with a similarity of 0.4, and is also similar to Beauty of Programming, with a similarity of 0.5. Given that the user's interest level in C++ Primer (Chinese edition) is 1.3 and the interest level in Beauty of Programming is 0.9, the user's interest level in Introduction to Algorithms is 1.3 × 0.4 + 0.9 × 0.5 = 0.97.
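The arithmetic of this example can be checked with a few lines of Python. The four numbers are the ones given in the text; the dictionary layout and the wording of the printed explanation are illustrative assumptions.

```python
# Interest of the user in the two liked books (values from the example).
interest_in_liked = {"C++ Primer": 1.3, "Beauty of Programming": 0.9}
# Similarity of each liked book to the candidate book (values from the example).
similarity_to_candidate = {"C++ Primer": 0.4, "Beauty of Programming": 0.5}

# Each liked book contributes interest * similarity to the candidate's score.
contributions = {
    book: interest_in_liked[book] * similarity_to_candidate[book]
    for book in interest_in_liked
}
score = sum(contributions.values())
print(f"predicted interest: {score:.2f}")

# The per-book contributions double as a recommendation explanation.
for book, part in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  because you liked {book}: {part:.2f}")
```

The per-book terms (0.52 and 0.45) show which historical preference drives the recommendation most strongly.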
As can be seen from this example, one advantage of ItemCF is that it can provide a recommendation explanation: the behavior events the user liked in the past are used to explain the current recommendation. The recommendations of ItemCF are more personalized and reflect the continuity of the user's own interests.
ItemCF is highly advantageous on book, e-commerce and movie sites such as Amazon, Douban and Netflix. First, on these websites the interests of the user are relatively fixed and persistent. A technician is likely to be buying technical books and is not very sensitive to how popular a book is; in fact, the more senior the technician, the less likely the books they read are to be popular ones. In addition, most users of these systems do not need popularity to help them judge whether a behavior event is good; they can judge its quality themselves from their knowledge of the field. The task of personalized recommendation on these websites is therefore to help the user find behavior events relevant to the user's own field of study, and the ItemCF algorithm becomes the preferred algorithm for these websites.
In addition, the behavior events on these websites are not updated particularly fast, so updating the behavior event similarity matrix once a day is acceptable for them. ItemCF needs to maintain a behavior event similarity matrix. From the storage perspective, if there are many users, maintaining a user interest similarity matrix requires a large amount of space; similarly, if there are many behavior events, maintaining the behavior event similarity matrix is costly. On the actual Internet the number of users is often very large, whereas on book and e-commerce websites the number of behavior events is relatively small. Furthermore, the similarity between behavior events is generally more stable than the interests of users, so ItemCF is the better choice.
Behavior event cold start is a serious problem for the ItemCF algorithm, because the principle of ItemCF is to recommend to the user behavior events similar to those the user previously liked. The ItemCF algorithm recomputes the behavior event similarity table from user behavior at intervals (generally once a day), and during online service it keeps the previously computed behavior event correlation matrix in memory. Therefore, when a new behavior event is added, it does not exist in the in-memory correlation table, and ItemCF cannot recommend it. One way to solve this problem is to update the behavior event similarity table frequently, but calculating behavior event similarity from user behavior is very time-consuming, mainly because the user behavior log is huge. Moreover, if the new behavior event is never shown to users, users cannot generate any behavior toward it, so a correlation matrix containing the new behavior event cannot be computed from the behavior log at all. For this reason, the correlation table for new behavior events can only be calculated from the content information of the behavior events, and it should be updated frequently (e.g., once every half hour).
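One way to picture a content-based correlation table for a new behavior event is the following Python sketch. Jaccard overlap of keyword sets is an assumed choice of measure (the source does not name one), and the catalogue entries and function names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two keyword sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_neighbours(new_item, catalogue, k):
    """Return the k catalogue events whose content is closest to new_item."""
    scored = [(name, jaccard(new_item, feats)) for name, feats in catalogue.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (assumed): content keywords of existing events and of a new one.
catalogue = {
    "book_1": {"programming", "c++", "textbook"},
    "book_2": {"algorithms", "textbook", "programming"},
    "book_3": {"cooking", "recipes"},
}
new_book = {"programming", "algorithms", "interview"}
print(content_neighbours(new_book, catalogue, 2))
```

Because this table depends only on content, it can be rebuilt for a new event without waiting for any user behavior to accumulate.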
The content information of behavior events is varied, and different types of behavior events have different content information. For a movie, the content information generally includes the title, director, actors, plot, genre, country, era, and the like. For a book, the content information typically includes the title, author, publisher, text, category, and so on.
For user cold start, another approach is not to present recommendation results immediately when a new user accesses the recommendation system for the first time, but to present some behavior events to the user, let the user give feedback on their interest in these behavior events, and then provide personalized recommendations according to that feedback.
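The feedback-based approach to user cold start can be sketched as follows. Everything here is an illustrative assumption: the rating scale, the event names, the similarity values, and the simplification that every similar event counts (no top-K cut-off).

```python
def onboarding_profile(answers):
    """Keep only the probe events the new user rated positively."""
    return {event: float(r) for event, r in answers.items() if r > 0}


def first_recommendations(profile, sim, shown):
    """Score events not yet shown by summing similarity times stated interest."""
    scores = {}
    for i, r_ui in profile.items():
        for j, w in sim.get(i, {}).items():
            if j not in shown:
                scores[j] = scores.get(j, 0.0) + w * r_ui
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (assumed): the new user's answers to three probe events.
answers = {"event_a": 1.0, "event_b": 0.0, "event_c": 0.5}
sim = {
    "event_a": {"event_d": 0.6, "event_b": 0.3},
    "event_c": {"event_d": 0.2, "event_e": 0.4},
}
profile = onboarding_profile(answers)
ranked = first_recommendations(profile, sim, shown=set(answers))
print(ranked)
```

Events already shown during onboarding are excluded so that an event the user just rated poorly is not recommended straight back.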
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (5)

1. An application method of an internet user access track behavior big data analysis algorithm, characterized in that the application method comprises the following steps:
(1) Calculating similarity between the behavior events;
(2) generating a recommendation list for the user according to the similarity of the behavior events and the historical behavior of the user;
(3) calculating the interestingness for the behavioral event using the following formula;
$$ w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}} $$
wherein, in the denominator, |N(i)| and |N(j)| are the numbers of users who like behavior event i and behavior event j respectively, and the numerator |N(i) ∩ N(j)| is the number of users who like both behavior event i and behavior event j;
(4) when the ItemCF algorithm is used to calculate the similarity of the behavior events, first establishing a user-to-behavior-event inverted list, and then, for each user, adding 1 to the entry of the co-occurrence matrix C for every pair of behavior events in that user's behavior event list;
(5) for each user's behavior event set, adding one to the entry of every pair of behavior events in the set to obtain a matrix, adding these matrices together to obtain the matrix C above, and normalizing the matrix C to obtain the cosine similarity matrix W between the behavior events;
(6) after obtaining the similarity between the behavior events, ItemCF calculates the interest of the user u in a behavior event j by the following formula:
$$ p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui} $$
where p_{uj} is the interest of user u in behavior event j, N(u) is the set of behavior events liked by user u, S(j, K) is the set of the K behavior events most similar to behavior event j, w_{ji} is the similarity between behavior events j and i, and r_{ui} is the interest of user u in behavior event i.
2. The application method of the internet user access track behavior big data analysis algorithm as claimed in claim 1, wherein step (4) creates, for each user, a list containing the behavior events that the user likes.
3. The application method of the internet user access track behavior big data analysis algorithm as claimed in claim 1, wherein C[i][j] records the number of users who like both behavior event i and behavior event j.
4. The application method of the internet user access track behavior big data analysis algorithm as claimed in claim 1, wherein step (2) is realized by steps (3) to (6).
5. The application method of the internet user access track behavior big data analysis algorithm as claimed in claim 1, wherein the behavior events can be interests, hobbies, habits and commodities.
CN202010488643.0A 2020-06-02 2020-06-02 Application method of internet user access track behavior big data analysis algorithm Pending CN111695005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010488643.0A CN111695005A (en) 2020-06-02 2020-06-02 Application method of internet user access track behavior big data analysis algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010488643.0A CN111695005A (en) 2020-06-02 2020-06-02 Application method of internet user access track behavior big data analysis algorithm

Publications (1)

Publication Number Publication Date
CN111695005A true CN111695005A (en) 2020-09-22

Family

ID=72479190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010488643.0A Pending CN111695005A (en) 2020-06-02 2020-06-02 Application method of internet user access track behavior big data analysis algorithm

Country Status (1)

Country Link
CN (1) CN111695005A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114461899A (en) * 2021-12-24 2022-05-10 新奥新智科技有限公司 Collaborative filtering recommendation method and device for user, electronic equipment and storage medium
CN118332216A (en) * 2024-04-28 2024-07-12 北京才多对信息技术有限公司 Browsing behavior information identification method, device, equipment and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103886047A (en) * 2014-03-12 2014-06-25 浙江大学 Distributed on-line recommending method orientated to stream data
CN104598643A (en) * 2015-02-13 2015-05-06 成都品果科技有限公司 Article similarity contribution factor, similarity acquiring method, as well as article recommendation method and system thereof
CN106095841A (en) * 2016-06-05 2016-11-09 西华大学 Method is recommended in a kind of mobile Internet advertisement based on collaborative filtering
CN108664564A (en) * 2018-04-13 2018-10-16 东华大学 A kind of improvement collaborative filtering recommending method based on item contents feature

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiang Liang (项亮), Posts & Telecom Press (人民邮电出版社) *

Similar Documents

Publication Publication Date Title
US20250371577A1 (en) Computer system including a processor configured to evaluate browse history data and social media data
Salehi et al. Personalized recommendation of learning material using sequential pattern mining and attribute based collaborative filtering
US10102307B2 (en) Method and system for multi-phase ranking for content personalization
US8001008B2 (en) System and method of collaborative filtering based on attribute profiling
RU2725659C2 (en) Method and system for evaluating data on user-element interactions
US9251516B2 (en) Systems and methods for electronic distribution of job listings
Ali et al. Movie recommendation system using genome tags and content-based filtering
US20150242750A1 (en) Asymmetric Rankers for Vector-Based Recommendation
US20080140641A1 (en) Knowledge and interests based search term ranking for search results validation
US9430572B2 (en) Method and system for user profiling via mapping third party interests to a universal interest space
Pyo et al. Automatic and personalized recommendation of TV program contents using sequential pattern mining for smart TV user interaction
De Maio et al. Social media marketing through time‐aware collaborative filtering
Rajabi Kouchi et al. A movie recommender system based on user profile and artificial bee colony optimization
Wu et al. Div-clustering: Exploring active users for social collaborative recommendation
CN111695005A (en) Application method of internet user access track behavior big data analysis algorithm
Košir et al. Web user profiles with time-decay and prototyping
Madadipouya A location-based movie recommender system using collaborative filtering
Sharma et al. An efficient semantic clustering of URLs for web page recommendation
Su et al. An item-based music recommender system using music content similarity
Liu Personalized recommendation algorithm for movie data combining rating matrix and user subjective preference
Putta et al. Analytical study of content-based and collaborative filtering methods for recommender systems
Wadpelli et al. Manifesta: An Event Management Platform Using Recommendation System
Prasanth et al. Semantic chameleon clustering analysis algorithm with recommendation rules for efficient web usage mining
Wen Development of personalized online systems for web search, recommendations, and e-commerce
Singh et al. Personalized Web Search: A Survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200922