CN107895303B

CN107895303B - A method of personalized recommendation based on OCEAN model

Info

Publication number: CN107895303B
Application number: CN201711131237.3A
Authority: CN
Inventors: 刘珊; 杨波; 郑文锋; 刘雨薇
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2017-11-15
Filing date: 2017-11-15
Publication date: 2022-03-25
Anticipated expiration: 2037-11-15
Also published as: CN107895303A

Abstract

The invention discloses a personalized recommendation method based on the OCEAN model. An OCEAN model is established for each microblog user, and personalized recommendation is carried out on the basis of that model. When the user's OCEAN model is built, the user's microblog text is fed into an LDA model, which discovers latent meaning in the text in an unsupervised manner and thereby improves prediction accuracy. The recommendation itself is built on user clustering, which narrows the range of users to be searched and reduces the computation required for real-time recommendation. Because the user's OCEAN model is combined with personalized recommendation and the user's personality traits are taken into account, the recommendations better match the user's psychology and are more accurate.

Description

Personalized recommendation method based on OCEAN model

Technical Field

The invention belongs to the technical field of personality prediction and personalized recommendation, and particularly relates to a personalized recommendation method based on an OCEAN model.

Background

In psychology, the OCEAN model refers to the five broad dimensions used to describe human personality; the theory is based on the five-factor model of personality (the Big Five). The five factors are openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism: O stands for Openness to experience, C for Conscientiousness, E for Extraversion, A for Agreeableness, and N for Neuroticism. These five factors provide a rich conceptual framework. In addition, previous research has found that the five-factor model is strongly related to people's behavior on social network sites.

Current personalized recommendation algorithms can be roughly classified into four categories:

(1) Demographic-based recommendation is easy to implement: it simply determines how closely users are related from the basic profile information of the system's users, and then recommends to the current user other items liked by similar users.

(2) Content-based recommendation was the most widely applied mechanism when recommendation engines first emerged. Its core idea is to find the relevance between items or content from their metadata, and then to recommend similar items to a user based on the user's past preference record. This kind of system is mostly used in news and information applications: tags are extracted from the articles as their keywords, and the similarity of two articles can then be evaluated through these tags.

(3) Association-rule-based recommendation is more common in e-commerce systems and has proven to work well. Its practical meaning is that users who have purchased certain items are more inclined to purchase certain other items. The primary goal of such a system is to mine association rules, i.e., sets of items that many users purchase together; the items in such a set can be recommended for one another.

(4) Collaborative filtering is a recommendation method widely used in recommendation systems. The algorithm rests on the assumption that "birds of a feather flock together": users who like the same items are more likely to share the same interests. Collaborative filtering systems are generally applied where users give ratings, and a user's preference for items is described through those ratings. Collaborative filtering is regarded as an example of collective intelligence: it requires no special processing of the items, but establishes associations between items through the users. Currently, collaborative filtering recommendation systems fall into two types: user-based recommendation and item-based recommendation.

However, current personalized recommendation methods are all based on the four categories above and do not make good use of the user's personality traits for marketing. User behavior is not random but implies many specific patterns. A user's social behavior online reflects the user's personality, and the user's personality in turn influences the user's behavior. Taking the user's personality into account in online precision marketing, online product recommendation, social recommendation, and assisted product design can therefore yield better results.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a personalized recommendation method based on an OCEAN model, which is used for performing personalized recommendation based on the personality of a user.

In order to achieve the above object, the invention provides a method for personalized recommendation based on an OCEAN model, which is characterized by comprising the following steps:

(1) establishing OCEAN model of social network site user

(1.1) selecting a plurality of microblog accounts, administering a Big Five personality test to these users to obtain scores on the five personality dimensions, and taking the scores on the five personality dimensions as the OCEAN model of each tested user;

(1.2) acquiring page content in a browser simulation mode, capturing microblog data of tested users, and respectively assembling the microblog data of each user into a text document;

(1.3) preprocessing the text document: filtering the text document, performing word segmentation, removing stop words, and storing the result in a specified database;

(1.4) importing the text documents of all tested users in the database into an LDA topic model, the LDA topic model outputting the topic probability distribution of each tested user's text document;

(1.5) taking the document topic probability distribution of each tested user as the sample input and the OCEAN model of that user as the sample output, training a BP neural network to establish a mapping model between a user's document topic distribution and the user's OCEAN model, and using the mapping model to predict the OCEAN model of social network site users;

(2) personalized recommendation is carried out on users based on OCEAN model of social network site users

(2.1) user clustering

Based on an OCEAN model of the social network site users, dividing the users into user groups with K different characters by using a K-means clustering algorithm;

(2.2) carrying out personalized recommendation on the target users according to the categories to which the target users belong

When a target user appears, first determining the cluster category to which the target user belongs, then taking all microblogs sent by each user in that category as a candidate set item, performing random text feature extraction on each candidate set item using term frequency-inverse document frequency (tf-idf), and constructing an n-dimensional vector as the attribute data of each candidate set item, wherein each microblog is extracted as a one-dimensional vector;

assembling the microblog data of the target user into a text document, and likewise performing random text feature extraction on that text document using tf-idf to construct an m-dimensional vector as the preference data of the target user;

and calculating, according to the cosine similarity formula, the similarity between the preference data of the target user and the attribute data of each candidate set item, and recommending the candidate set items with the highest similarity to the target user as the recommendation set.

The object of the invention is achieved as follows:

The invention discloses a personalized recommendation method based on the OCEAN model, realized by establishing an OCEAN model for each microblog user. When the user's OCEAN model is established, the user's microblog text is fed into the LDA model, which discovers latent meaning in the text in an unsupervised manner and improves prediction accuracy. The personalized recommendation is built on user clustering, which narrows the range of users to be searched and reduces the computation required for real-time recommendation. Because the user's OCEAN model is combined with personalized recommendation and the user's personality traits are studied in depth, the recommendation process better matches the user's psychology and is more accurate.

Meanwhile, the personalized recommendation method based on the OCEAN model further has the following beneficial effects:

(1) An OCEAN model of the microblog user is established, so that the user's personality is taken into account before conventional personalized recommendation is applied. The user's personality and the user's preferences are combined, which makes the recommendation more accurate and better suited to the user's psychology.

(2) When the users are clustered, the initial cluster centers are not selected at random; instead, users whose microblog pages have high visit counts are manually selected as the cluster centers, which better reduces the number of isolated points.

Drawings

FIG. 1 is a flow chart of a method for personalized recommendation based on an OCEAN model according to the present invention;

FIG. 2 is a diagram of an LDA topic model.

Detailed Description

Embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. It should be expressly noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.

Examples


In this embodiment, as shown in fig. 1, a method for personalized recommendation based on an OCEAN model in the present invention includes the following steps:

S1, selecting a plurality of microblog accounts, administering a Big Five personality test to these users to obtain scores on the five personality dimensions, and taking the scores on the five personality dimensions as the OCEAN model of each tested user;

In this embodiment, the Big Five Inventory (BFI) is used. It was compiled in 1991 by Oliver P. John, a psychologist at the University of California, Berkeley, on the basis of OCEAN model theory. It is a widely recognized personality test scale whose reliability and validity have been verified in many psychological experiments, and it is used in this application to obtain the user OCEAN models required for training.

S2, acquiring page content by simulating a browser and capturing the microblog data of the tested users, wherein a user's microblog data is divided into two parts: the text document and the basic user information. The text document is the collection of all microblog texts sent by the user; the basic user information includes the user registration time, the number of accounts the user follows, the number of the user's microblogs, whether a personalized signature exists, and the like. The microblog data of each user is then assembled into one text document;

S3, preprocessing the text document: filtering the text document, performing word segmentation, removing stop words, and storing the result in a specified database;
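The preprocessing in step S3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not name the filtering rules, the segmenter, or the stop-word list, so the regular expressions, the tiny `STOP_WORDS` set, and the function name `preprocess` are hypothetical, and whitespace splitting stands in for a real word segmenter (Chinese microblog text would need a dedicated segmentation tool). Storing the result in a database is left out.

```python
import re

# Hypothetical stop-word list for illustration; a real system would load a full list.
STOP_WORDS = {"the", "a", "is", "to", "of", "and"}

URL_RE = re.compile(r"https?://\S+")      # links embedded in posts
MENTION_RE = re.compile(r"@\w+")          # @user mentions
NON_WORD_RE = re.compile(r"[^\w\s]")      # punctuation and symbols


def preprocess(text, stop_words=STOP_WORDS):
    """Filter noise, split the text into tokens, and drop stop words."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    # Assumption: whitespace splitting stands in for a real word segmenter.
    tokens = text.lower().split()
    return [t for t in tokens if t not in stop_words]


tokens = preprocess("Check http://t.cn/abc @bob the weather is great!")
```

The token list returned for each user is what a later step would treat as that user's text document.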

S4, importing the text documents of all tested users in the database into an LDA topic model, the LDA topic model outputting the topic probability distribution of each tested user's text document;

In this embodiment, the LDA topic model is shown in FIG. 2, and the parameter definitions of the LDA topic model are shown in Table 1;

symbol interpretation:

TABLE 1

Input of the LDA topic model: the set of all user text documents, the number of topics K, and the hyperparameters α and β, which are set according to the usual empirical values: K = 10, β = 0.01, γ = 20 (the value assigned to α is missing from this text).

Output of the LDA topic model: the topic probability distribution of each user text document.
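As a rough illustration of how per-document topic distributions could be obtained from the inputs listed above, the sketch below runs collapsed Gibbs sampling for LDA in plain Python. The patent does not say which inference procedure or library is used, so the sampler itself, the function name `lda_gibbs`, the iteration count, and the α value in the example call are assumptions; only β = 0.01 is taken from the text, and the toy corpus uses 2 topics rather than K = 10 to stay readable.

```python
import random


def lda_gibbs(docs, num_topics, alpha, beta, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (one per user text document).
    Returns one topic probability distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(num_topics)]     # count of word v in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic distribution of each document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


toy_docs = [
    ["travel", "mountain", "hiking", "mountain"],
    ["stock", "market", "fund", "market"],
    ["hiking", "travel", "photo"],
]
# alpha=0.5 is an illustrative assumption, not a value from the patent.
theta = lda_gibbs(toy_docs, num_topics=2, alpha=0.5, beta=0.01)
```

Each row of `theta` sums to 1 and plays the role of the topic probability distribution of one user's text document.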

S5, taking the document topic probability distribution of each tested user as the sample input and the OCEAN model of that user as the sample output, training a BP neural network to establish a mapping model between a user's document topic distribution and the user's OCEAN model, and using the mapping model to predict the OCEAN model of social network site users;
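The mapping in step S5 can be illustrated with a small back-propagation (BP) network written in plain Python. This is a sketch under assumptions, not the network described by the patent: the source gives no layer sizes, activation functions, or learning rate, so the single sigmoid hidden layer, the sigmoid outputs, the rescaling of the five personality scores to the range 0 to 1, the class name `BPNetwork`, and the toy samples are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BPNetwork:
    """One-hidden-layer network trained with backpropagation (sizes are illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        h, y = self.forward(x)
        # Output-layer error terms for squared error with sigmoid units.
        d_out = [(yo - t) * yo * (1 - yo) for yo, t in zip(y, target)]
        # Hidden-layer error terms, propagated back through the output weights.
        d_hid = [hj * (1 - hj) * sum(d_out[o] * self.w2[o][j] for o in range(len(y)))
                 for j, hj in enumerate(h)]
        for o, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[o] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v


def mse(net, samples):
    """Mean over samples of the summed squared error across the five outputs."""
    return sum(sum((p - t) ** 2 for p, t in zip(net.predict(x), target))
               for x, target in samples) / len(samples)


def train(net, samples, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, target in samples:
            net.train_step(x, target, lr)


# Toy data: (topic distribution, OCEAN scores rescaled to 0..1). Values are made up.
SAMPLES = [
    ([0.7, 0.2, 0.1], [0.8, 0.6, 0.4, 0.7, 0.3]),
    ([0.1, 0.8, 0.1], [0.3, 0.5, 0.9, 0.4, 0.6]),
    ([0.2, 0.2, 0.6], [0.5, 0.9, 0.2, 0.6, 0.8]),
]
```

After `train(net, SAMPLES)`, calling `net.predict(topic_distribution)` returns five values that stand for a predicted O, C, E, A, N profile of a user who did not take the questionnaire.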

S6, clustering the social network users

In this embodiment, the k-means clustering algorithm is selected because it is efficient, is widely used for clustering large-scale data, and works well on low-dimensional data sets.

Let k be the input parameter of the k-means algorithm, denoting the number of clusters the algorithm outputs after partitioning the data set. The data set consists of n data points, one per user. The inputs are the number of clusters k and the users' OCEAN model data. The specific algorithm is as follows:

1) Define the five-dimensional data set of a user's OCEAN model as I = {i₁, i₂, ..., i₅};

2) Retrieve all m users and record them as the set U = {u₁, u₂, ..., u_m};

3) Manually select, from the m users, users with high visit counts and different labels as the initial cluster centers, denoted {W₁, W₂, ..., W_K};

4) Loop over the input vectors, calculate the mean of the objects in each cluster, and update the cluster centers until they no longer change.
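The four steps above amount to standard k-means with hand-picked initial centers, which can be sketched as follows. The function names `kmeans` and `nearest_center`, the use of squared Euclidean distance, and the toy OCEAN vectors are assumptions for illustration; the patent does not specify the distance measure.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(point, centers):
    """Index of the cluster center closest to the given five-dimensional point."""
    return min(range(len(centers)), key=lambda k: squared_distance(point, centers[k]))


def kmeans(points, initial_centers, max_iter=100):
    """k-means with manually chosen initial centers instead of random ones."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assign every user vector to its nearest center.
        labels = [nearest_center(pt, centers) for pt in points]
        # Recompute each center as the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:       # stop once the centers no longer change
            break
        centers = new_centers
    return labels, centers


# Toy OCEAN vectors (O, C, E, A, N), made up for illustration.
users = [
    [0.9, 0.8, 0.2, 0.7, 0.1],
    [0.8, 0.9, 0.3, 0.6, 0.2],
    [0.1, 0.2, 0.9, 0.3, 0.8],
    [0.2, 0.1, 0.8, 0.2, 0.9],
]
# Assumption: users 0 and 2 are the manually selected high-traffic accounts.
labels, centers = kmeans(users, [users[0], users[2]])
```

When a target user appears, `nearest_center(target_ocean_vector, centers)` gives the cluster whose members supply the candidate set.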

S7, carrying out personalized recommendation on the target users according to the categories to which the target users belong

For example, denote the set of all collected microblog candidate items as D = {d₁, d₂, ..., d_N}, and the set of words appearing in all microblogs as T = {t₁, t₂, ..., t_n}. That is, there are N candidate set items to be processed, and these items contain n different words. Each item is ultimately represented by a vector: the j-th item is denoted d_j = {w_1j, w_2j, ..., w_nj}, where w_1j denotes the weight of the first word t₁ in item j, and a larger value indicates greater importance. To represent the j-th item, the value of each component of d_j must therefore be computed. This is done with term frequency-inverse document frequency (tf-idf for short), which is commonly used in information retrieval. The tf-idf of the k-th word of the dictionary in the j-th microblog is computed from TF(t_k, d_j), the number of times the k-th word appears in candidate set item j, and n_k, the number of microblogs that contain the k-th word.

The final weight of the k-th word in microblog j is then derived from this tf-idf value.

The cosine similarity is computed as follows: the scores of a user U and a candidate item I on the n-dimensional item space are expressed as vectors with components U_a and I_a respectively, and the similarity cos(U, I) is the cosine of the angle between these two vectors.

U_a denotes the preference value of the target user U for the a-th item, i.e., the value corresponding to the a-th item in the preference data. I_a denotes the value corresponding to the a-th item in the candidate set item.
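The weighting and matching described here can be sketched in a few lines of Python. The formulas themselves are not reproduced in this text, so the sketch assumes the standard forms that fit the symbols defined above: tf-idf(t_k, d_j) = TF(t_k, d_j) · log(N / n_k), a final weight obtained by dividing by the Euclidean norm of the document's tf-idf vector, and cos(U, I) = Σ U_a I_a / (√Σ U_a² · √Σ I_a²). The function names, the shared vocabulary between the target user and the candidates, and the toy token lists are likewise assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocab, L2-normalized tf-idf vectors)."""
    N = len(docs)
    vocab = sorted({t for d in docs for t in d})
    # n_k: number of documents that contain word k.
    n_k = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        raw = [tf[t] * math.log(N / n_k[t]) for t in vocab]
        norm = math.sqrt(sum(v * v for v in raw))
        vectors.append([v / norm if norm else 0.0 for v in raw])
    return vocab, vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(target_tokens, candidate_docs):
    """Rank candidate set items by cosine similarity to the target user's document.

    Assumption: the target document is weighted together with the candidates so
    that all vectors share one vocabulary and have the same dimension.
    """
    _, vectors = tfidf_vectors(candidate_docs + [target_tokens])
    target_vec = vectors[-1]
    scores = [(i, cosine(target_vec, vec)) for i, vec in enumerate(vectors[:-1])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy data: one token list per candidate set item (a cluster member's microblogs).
candidates = [
    ["hiking", "mountain", "photo"],
    ["stock", "market", "fund"],
    ["mountain", "camping", "trail"],
]
target = ["mountain", "trail", "hiking", "photo"]
ranking = recommend(target, candidates)
```

The first entries of `ranking` are the candidate set items that would form the recommendation set; how many to keep is a design choice the text leaves open.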

Although illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions that make use of the inventive concept are protected.

Claims

1. A method for personalized recommendation based on an OCEAN model, characterized by comprising the following steps:

(1) Establish the OCEAN model of social networking site users

(1.1) Select a number of microblog accounts, administer a Big Five personality test to these users to obtain scores on the five personality dimensions, and then use the scores on these five personality dimensions as the OCEAN model of each tested user;

(1.2) Obtain the page content by simulating a browser, and capture the microblog data of the tested users. A user's microblog data is divided into two parts: the text document and the basic user information. The text document is the collection of all microblog texts sent by the user; the basic user information includes the user registration time, the number of accounts the user follows, the number of the user's microblog posts, and whether there is a personalized signature. Each user's microblog data is then assembled into one text document;

(1.3) Preprocess the text document: the text document is filtered, segmented into words, and stored in the specified database after the stop words are removed;

(1.4) Import the text documents of all the tested users in the database into the LDA topic model, and the LDA topic model outputs the topic probability distribution of each tested user's text document;

wherein the input of the LDA topic model includes: the set of all user text documents, the number of topics K, and the hyperparameters α and β;

(1.5) Use the document topic probability distribution of each tested user as the sample input and the OCEAN model of that user as the sample output, train a BP neural network to establish a mapping model between a user's document topic distribution and the user's OCEAN model, and use the mapping model to predict the OCEAN model of social networking site users;

(2) Personalized recommendation for users based on the OCEAN model of users of social networking sites

(2.1) User clustering

Based on the OCEAN model of social networking site users, the K-means clustering algorithm is used to divide users into K user groups with different personalities;

Let k be the input parameter of the k-means algorithm, denoting the number of clusters the algorithm outputs after partitioning the data set. The data set consists of n data points, one per user, and the inputs are the number of clusters k and the users' OCEAN model data; the specific algorithm is as follows:

1) Define the five-dimensional data set of a user's OCEAN model as I = {i₁, i₂, ..., i₅};

2) Retrieve all m users, denoted as the set U = {u₁, u₂, ..., u_m};

3) From the m users, manually select users with high visit counts and different labels as the initial cluster centers, denoted {W₁, W₂, ..., W_K};

4) Loop over the input vectors, calculate the mean of the objects in each cluster, and update the cluster centers until they no longer change;

(2.2) Make personalized recommendations according to the category of the target user

When a target user appears, first determine the cluster category to which the target user belongs, then take all the microblogs sent by each user in that category as a candidate set item, use term frequency-inverse document frequency (tf-idf) to randomly extract text features from each candidate set item, and construct an n-dimensional vector as the attribute data of each candidate set item, where each microblog is extracted as a one-dimensional vector;

The microblog data of the target user is assembled into a text document, and tf-idf is likewise used to randomly extract text features from that text document to construct an m-dimensional vector, which is used as the target user's preference data;

According to the cosine similarity formula, the similarity between the target user's preference data and the attribute data of each candidate set item is calculated, and the candidate set items with the highest similarity are used as the recommendation set and recommended to the target user.