The application is a divisional application of a patent with an application date of 2012, 5 and 8, and an application number of 201210152084.1, and is named as a method and a system for acquiring personalized features of a user.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings.
The description of specific embodiments of the process of this patent includes the following sections. Firstly, a numbering method of users, documents and features and a parameter vector representation method of the users and the documents are explained; then, a parameter vector updating algorithm based on a user access user signal and a parameter vector updating algorithm based on a user access document signal are explained; then, a method of searching for a user or a user group on a social network according to a given feature is described; finally, a system for obtaining personalized features of a user is described.
The numbering method of the user, document and feature is explained first.
The method comprises the steps that a plurality of users are obtained on the Internet, each user has at least one user identification, and the user identification comprises one of a user account, a mobile phone number, a Cookie identification code, an IP address, an Email address and an instant communication number. The obtained multiple users are numbered uniformly, and the user numbers are collected together to form a user set U, wherein M is the number of the users, and each user has a unique user code. Also, a plurality of documents are acquired on the internet, and a plurality of Web pages are acquired by, for example, a spider program. Documents on the internet have a unique identification, such as the URL address of a Web page. The obtained multiple documents are numbered uniformly, and the document numbers are collected together to form a document set D {1, 2.
And uniformly numbering the features of each element in the user set U and the document set D to form a feature set K, wherein the feature set K is {1, 2. The features represent attributes of users and documents such as news, finance, science, music, military, and sports, among others.
The following describes a method of representing a parameter vector of a user and a document. The parameter vector representation method is similar to the vector representation method of the vector space model VSM, namely, the feature item is used as a basic unit of the user feature or the document feature. The parameter vector of the user is represented by a set of the relevance of the user to each feature, and the parameter vector of the document is represented by a set of the relevance of the document to each feature. If a user or document does not have a feature, the relevance of the user or document to the feature is zero.
Fig. 1 shows a method for representing a parameter vector of each user in a user set U. The parameter vector of any user i (i e U) in the user set U is set to be ku (i) ═ uwi1, uwi2,.., uwik.., uwiL), wherein uwik represents the degree of correlation of the user i with the feature K (K e K), uwik e [ a, b ], a and b are nonnegative constants. In addition, the relevancy of each user in the user set U and the kth feature of the feature set K is collected together to form a vector, which is called the kth user column vector of the user set U (uw1K, uw 2K.., uwMk).
FIG. 2 is a method for representing a parameter vector for each document in the document set D. The parameter vector of any one document n (n e D) in the document set D is set to kd (n) ═ (dwn1, dwn2,. rightwords, dwnk,. and dwnL), where dwnk represents the degree of correlation of the document n with the feature K (K e K), and dwnk e [ a, b ], a and b are nonnegative constants. In addition, the relevance of each document in the document set D and the kth feature of the feature set K are collected together to form a vector, which is called a kth document column vector (dw1K, dw2K,..., dwNk) of the document set D.
The degree of relevance is a real number that represents how closely a document or user is to a feature in the feature set K. If a document or user is associated with music features more and less with sports features, we say that the document or user has a high correlation with music features and a low correlation with sports features. In addition, some features have correlation, so that the dimension of the feature set K can be reduced by reducing the correlation among the features during feature selection, the requirement on a server storage space is reduced, and the algorithm efficiency is improved. Some features need not be listed directly in the feature set because the relevance of these features can be calculated from the relevance of one or several other features in the feature set K.
The following explains a setting method of an initial value of a parameter vector of a user or a document. The following three examples are given for illustration. If the parameter vector of the user or document is not set to an initial value, its initial value of the parameter vector is set to a zero vector by default.
Example 1 is a method of manually setting the initial value of a parameter vector for a user i (i ∈ U) or a document n (n ∈ D). For example, set total number of features L to 5, set feature set K to (science, finance, education, music, sports), and set ku (i) to (uwi1, uwi2, uwi3, uwi4, uwi5) to (0, 0.00032, 0, 0.00059, 0). Namely, the correlation between the user i and the "finance" feature is 0.00032, the correlation between the user i and the "music" feature is 0.00059, and the correlation between the user i and other features is zero. Using a similar method, an initial value of a parameter vector kd (n) ═ dwn1, dwn2,. and dwnk,. and dwnL) of any document n can be set.
Example 2 is a method of setting the initial value of the parameter vector of the user i (i ∈ U). By the user iSubmit a set of document sets H {
The parameter vector of the document r (r e H) is kd (r) (dwr1, dwr 2.., dwrL), so for each K e K, uwik (σ 1/s) · (r e H) [ dwrk/(Σ (K e K) dwrk)]Wherein s is the number of elements of the set H, and σ 1 is a set constant. Using a similar method, the user i may also select a group of users in the user set U to calculate the initial value of the parameter vector of the user i.
Example 3 is a method of setting initial values of parameter vectors of a document. A catalog is a special document with a corresponding document number. For example, web portals typically include catalogs of news, music, sports, finance, and technology. We assume that documents under the same category have some of the same characteristics, e.g., documents under sports category are all related to sports. If document n (n e D) is a document under directory h (h e D), the initial value of the parameter vector of document n is determined by the parameter vector of directory h. For example, for each K ∈ K, dwnk ═ σ 2 · dwhk is set, where σ 2 is a set constant.
Fig. 3 is a method for obtaining a user personalized feature based on a user access user signal. The method specifically comprises the following steps:
s11, acquiring a plurality of users on the Internet, and storing a user set U (1, 2.., M) consisting of the users; setting a plurality of features, and storing a feature set K ═ {1, 2.., L } consisting of the plurality of features;
s12, setting initial values of parameter vectors for a plurality of users in the user set U;
s13, receiving a signal of any user i (i belongs to U) for accessing any user j (j belongs to U);
s14. reading a parameter vector ku (i) ═ uwi1, uwi 2., uwik., uwiL of the user i, wherein uwik represents a degree of correlation of the user i with a feature K (K ∈ K);
s15. read a parameter vector ku (j) ═ uwj1, uwj 2.,. uwjk.,. uwjL for the user j, where uwjk represents a degree of correlation of the user j with a feature K (K e K);
s16, updating the parameter vectors of the user i and the user j by using a parameter vector updating algorithm,
Ku*(i)=function1[Ku(i),Ku(j)],
Ku*(j)=function2[Ku(i),Ku(j)];
return is made to step S13.
Wherein the Ku (i) and the Ku (i) represent parameter vectors of the user i before and after updating, respectively, and the Ku (j) represent parameter vectors of the user j before and after updating, respectively; the kux (i) ═ uwi1, (uwi 2., uwik.,. uwiL.), and the kux (j) ═ uwj 1.,. uwj 2.,. uwjk.,. uwjL.; said function1 indicates that Ku (i) is a function of Ku (i) and Ku (j), and said function2 indicates that Ku (j) is a function of Ku (i) and Ku (j); the user i and the user j represent any two users in the user set U without specific reference to any two users, for example, when the nth time step S13 is executed, i is 1023, j is 29328 in the signal, and when the n +1 th time step S13 is executed, i is 737443, j is 837487 in the signal.
In the method shown in fig. 3, after the step S16 is performed, the method further includes the step of updating the Ku (i) and the Ku (j), that is, assigning values Ku (i) and Ku (j).
In an example of application of the method of fig. 3, the type of the signal is one of the following types: t-1 indicates that the user i pays attention to (follow) the user j, T-2 indicates that the user i adds the user j as a friend, T-3 indicates that the user i transfers information of the user j, T-4 indicates that the user i reviews information issued by the user j, T-5 indicates that the user i collects information of the user j, T-6 indicates that the user i sends a private message to the user j, T-7 indicates that the user i tags the user j, and T-8 indicates that the user i likes information issued by the user j. In one example of an application of the method of fig. 3, the signal is collected from a system log.
In one example of an application of the method depicted in FIG. 3, the method satisfies Ku (i) ≧ Ku (i) and Ku (j) ≧ Ku (j). Wherein the inequality Ku (i) ≧ Ku (i) means that for each K ∈ K, uwik ≧ uwik; the inequality Ku (j) ≧ Ku (j) means that for each K ∈ K, uwjk ≧ uwjk.
In an example of application of the method of FIG. 3, for each K ∈ K, uwik is an increasing function of uwjk; for each K ∈ K, uwjk is an increasing function of uwik.
In an example of application of the method of fig. 3, for each K e K, uwiK is a decreasing function of Σ (K e K) uwjk; for each K e K, uwjK is a decreasing function of Σ (K e K) uwik.
In an example application of the method of fig. 3, the signal comprises user identifications of the user i and the user j, each user identification corresponding to a unique user code. The application example reads the user code of the user i according to the user identification of the user i and reads the parameter vector of the user i according to the user code of the user i; and reading the user code of the user j according to the user identifier of the user j, and reading the parameter vector of the user j according to the user code of the user j.
In the method shown in fig. 3, after the parameter vector updating algorithm is executed for a set number of times, for each feature K ∈ K, normalization processing (normalization) is required on the kth user column vector (uw1K, uw 2K.., uwMk) in the user set U. The meaning of performing the parameter vector update algorithm once is to bring the Ku (i) and Ku (j) into the function1 and function2, resulting in the Ku (j) and Ku (i) processes. The specific application example of the normalization method is as follows:
example 1: a method of normalizing a k-th user column vector (uw1k, uw2 k.., uwMk) in a user set U includes setting temp ∑ (t ∈ U) uwtk, and setting uwik ═ uwik/temp for each i ∈ U.
Example 2: the method for normalizing the kth user column vector (uw1k, uw2 k.., uwMk) in the user set U is as follows. Firstly, calculating temp ═ Σ (t ∈ U) uwtk, and calculating uwik ═ uwik/temp for each i ∈ U; then sequencing uw1k, uw2 k.,. uwMk and equally dividing uw1k, uw2 k.,. uwMk into r groups according to the sequencing result, and taking out the minimum data composition set { s1, s 2.,. sr } in each group, wherein s1 < s 2. < sr; finally, uw1k, uw2 k.., uwMk was treated as follows: for each i ∈ U, if uwik < s1, set uwik ∈ a; if sm is less than or equal to uwik less than or equal to sm +1, setting uwik equal to g (sm); if uwik > sr, then uwik ═ b is set. Wherein g (sm) is an increasing function, g (sm) epsilon (a, b), m is more than or equal to 1 and less than r, a and b are non-negative constants, and r is a set parameter.
Application example 1
This is an example of an application of the method described in fig. 3. In the method shown in fig. 3, the parameter vector updating algorithm updates the parameter vectors of the user i and the user j by the following specific application example:
uwik*=uwik+λ1(j,i,T)·f1[Ku(j)](for each k e UKi
K)
uwjk*=uwjk+λ2(i,j,T)·f2[Ku(i)](for each k e UKj
K)
Wherein uwik and uwik represent the kth component of the parameter vector of user i before and after updating, respectively, and uwjk represent the kth component of the parameter vector of user j before and after updating, respectively; the λ 1(j, i, T) is an influence coefficient of the user j on the user i under the type T of the signal, and the λ 2(i, j, T) is an influence coefficient of the user i on the user j under the type T of the signal. The UKi is a set composed of features corresponding to Pi components with the largest values in the parameter vectors ku (i) of the user i (uwi1, uwi2,.., uwik,.., uwiL), the UKj is a set composed of features corresponding to Pj components with the largest values in the parameter vectors ku (j) (uwj1, uwj2,.., uwjk,.., uwjL) of the user j, Pi and Pj are set parameters, Pi is less than or equal to L, and Pj is less than or equal to L. For example, i-30, P30-3, UK30 { literature, computer, biology }; j 265, P265 2, and UK265 { music, history }. After the execution of the specific algorithm, a setting uwjk is made for each k e UKi; for each k e UKj, uwik is set to uwik.
In the application example 1, the specific algorithm may be further defined as satisfying uwjk ≧ uwjk for each k ∈ UKi; for each k ∈ UKj, uwik ≧ uwik is satisfied.
In the application example 1, f1[ ku (j) ] is a function of the parameter vector ku (j) of the user j, and f2Ku (i) ] is a function of the parameter vector ku (i) of the user i. Specific implementation methods of the f1[ ku (j)) ] and the f2[ ku (i)) ] include the following examples:
example 1: said f1[ Ku (j) ] is an increasing function of said uwjk, and a decreasing function of Σ (K ∈ K) uwjk; f2[ Ku (i) ], is an increasing function of uwik and a decreasing function of Σ (K ∈ K) uwik.
Example 2: f1[ ku (j)) ] ═ σ 3 · uwjk/(∑ (K ∈ K) uwjk), f2[ ku (i)) ] ═ σ 4 · uwik/(Σ (K ∈ K) uwik), where σ 3 and σ 4 are set constants.
Example 3: f1[ ku (j)) ] ═ σ 5 · uwjk, f2[ ku (i)) ] ═ σ 6 · uwik, where σ 5 and σ 6 are set constants.
Example 4: f1[ ku (j)) ] ═ σ 7 · {1/[1+ exp (-uwjk) ] }, f2[ ku (i)) ] ═ σ 8 · {1/[1+ exp (-uwik) ] }, where σ 7 and σ 8 are set constants.
In the application example 1, a specific implementation method of the λ 1(j, i, T) and the λ 2(i, j, T) includes:
example 1: the λ 1(j, i, T) and the λ 2(i, j, T) are functions of the similarity sim (i, j) between the parameter vectors of the user i and the user j, respectively. For example, λ 1(j, i, T) ═ c1 · sim (i, j), λ 2(i, j, T) ═ c2 · sim (i, j), and sim (i, j) | | | ku (i), ku (j) | | [ ∑ Σ (K ∈ K) (uwik · uwjk) ]/{ [ Σ (K ∈ K) (uwik)2]1/2 [ Σ (K ∈ K) (uwjk)2]1/2}, and c1 and c2 are set constants. The implication of this example is that the higher the similarity between the parameter vectors of user i and user j, the larger the scaling factor they "vote" for each other.
Example 2: the λ 1(j, i, T) ═ U1(j) · U2(i), and the λ 2(i, j, T) ═ U1(i) · U2(j), where U1(j) indicates whether the parameter vector of user j can be used for updating the parameter vectors of other users in the user set U, and U1(i) indicates whether the parameter vector of user i can be used for updating the parameter vectors of other users in the user set U; u2(i) indicates whether the parameter vector of user i can be updated by the parameter vectors of other users in the user set U; u2(j) indicates whether the parameter vector of user j can be updated by the parameter vectors of other users in the user set U. u1(j), u2(j), u1(i) and u2(i) are setting parameters, and their values are 0 or 1. 1 represents yes and 0 represents no. The meaning of this example is that in order to prevent malicious attacks, the parameter vector of a user who has not been authenticated by reliability cannot be updated to the parameter vectors of other users; some special users have their parameter vectors that cannot be updated by the parameter vectors of other users.
Example 3: the λ 1(j, i, T) ═ s1(T), and the λ 2(i, j, T) ═ s2 (T). Wherein the T is a type of user signal accessed by a user, the s1(T) and the s2(T) are functions of the T.
Example 4: the λ 1(j, i, T) and λ 2(i, j, T) were generated using a combination of the methods of examples 1 to 3 above. For example
λ1(j,i,T)={c1·sim(i,j)}·{u1(j)·u2(i)}·s1(T)
λ2(i,j,T)={c2·sim(i,j)}·{u1(i)·u2(j)}·s2(T)。
Example 5: the λ 1(j, i, T) is a function of the number of users in the relational network of the user j, and the λ 2(i, j, T) is a function of the number of users in the relational network of the user i.
Example 6: the λ 1(j, i, T) and the λ 2(i, j, T) are set constants.
In the application example 1, after the specific algorithm is executed for a set number of times, the K-th user column vector (uw1K, uw 2K.., uwMk) needs to be normalized for each feature K ∈ K.
Application example 2
This is an example of the method of application example 1. For the sake of illustration, we assume that there are three users on the internet, each having two features, i.e., user set U ═ {1, 2, 3}, and feature set K ═ 1, 2 }. The parameter vectors for user 1, user 2 and user 3 are (uw11, uw12), (uw21, uw22) and (uw31, uw32), respectively. Where uwik (i ∈ U, K ∈ K) represents the degree of correlation of the user i with the feature K.
If a signal that the user 2 accesses the user 3 is received and the signal type T is 1, updating the parameter vectors of the user 2 and the user 3 according to the following parameter vector updating algorithm:
uw21*=uw21+λ1(3,2,1){uw31/(uw31+uw32)}
uw22*=uw22+λ1(3,2,1){uw32/(uw31+uw32)}
uw31*=uw31+λ2(2,3,1){uw21/(uw21+uw22)}
uw32*=uw32+λ2(2,3,1){uw22/(uw21+uw22)}
where λ 1(3, 2, 1) represents an influence coefficient of the user 3 on the user 2 under the signal type T ═ 1; λ 2(2, 3, 1) represents the influence coefficient of the user 2 on the user 3 with the signal type T equal to 1. Let λ 1(3, 2, 1) ═ c1 · sim (2, 3) · u1(3) · u2(2) · s1 (1); λ 2(2, 3, 1) ═ c2 · sim (2, 3) · u1(2) · u2(3) · s2(1), where s1(1) ═ 3 and s2(1) · 1.5; c1 and c2 are set constants; u1(3) indicates whether the parameter vector of user 3 can be used to update the parameter vectors of other users in the user set U, U1(2) indicates whether the parameter vector of user 2 can be used to update the parameter vectors of other users in the user set U, U2(2) indicates whether the parameter vector of user 2 can be updated by the parameter vectors of other users in the user set U, U2(3) indicates whether the parameter vector of user 3 can be updated by the parameter vectors of other users in the user set U, and U1(2) ═ U2(2) ═ U1(3) ═ U2(3) ═ 1; the sim (2, 3) represents the similarity between the parameter vectors of the user 2 and the user 3, i.e. sim (2, 3) ═ w21 · uw31+ uw22 · uw32)/{ [ (uw21)2+ (uw22)2]1/2 · [ (uw31)2+ (uw32)2] 1/2.
After the above algorithm is executed, the parameter vectors of the user 2 and the user 3 are updated, i.e., uw31 ═ uw31, uw32 ═ uw32, uw21 ═ uw21, and uw22 ═ uw22 are set.
After the above algorithm is executed, the user column vectors (uw11, uw21, uw31) and (uw12, uw22, uw32) are normalized. The algorithm is as follows: if temp1 ═ uw11+ uw21+ uw31, uw11 ═ uw11/temp1, uw21 ═ uw21/temp1, and uw31 ═ uw31/temp1 are set for the characteristic k ═ 1; if temp2 is uw12+ uw22+ uw32, uw12 is uw12/temp2, uw22 is uw22/temp2, and uw32 is uw32/temp2 for the feature k 2.
Fig. 4 is a method for obtaining personalized features of a user based on a signal of the user accessing a document. The method specifically comprises the following steps:
s21, acquiring a plurality of users on the Internet, and storing a user set U (1, 2.., M) consisting of the users; acquiring a plurality of documents on the internet, and storing a document set D ═ 1, 2. Setting a plurality of features, and storing a feature set K ═ {1, 2.., L } consisting of the plurality of features;
s22, setting parameter vector initial values for a plurality of users in the user set U, and setting parameter vector initial values for a plurality of documents in the document set D;
s23, receiving a signal that any user m (m belongs to U) accesses any document n (n belongs to D);
s24, reading a parameter vector ku (m) ═ uwm1, uwm 2.,. uwmk.,. uwmml) of the user m, wherein uwmk represents a degree of correlation of the user m with a feature K (K ∈ K);
s25. reading a parameter vector kd (n) of the document n, (dwn1, dwn 2.., dwnk.,. d wnl), wherein dwnk represents a degree of correlation of the document n with a feature K (K e K);
s26, updating the parameter vectors of the user m and the document n by using a parameter vector updating algorithm 2,
Ku*(m)=function3[Ku(m),Kd(n)],
Kd*(n)=function4[Ku(m),Kd(n)];
after the step S26 is completed, the process returns to the step S23.
Wherein the Ku (m) and the Ku x (m) represent parameter vectors of the user m before and after updating, respectively, and the Kd (n) and the Kd x (n) represent parameter vectors of the document n before and after updating, respectively; the Kd (m) is (uwm1, uwm2, hw, uwmk, hw ml), and the Kd (n) is (dwn1, dwn2, dwnk, dwnL); said function3 indicates that Ku (m) is a function of Ku (m) and Kd (n), and said function4 indicates that Kd (n) is a function of Ku (m) and Kd (n); the user m represents any user in the user set U without referring to a user, and the document n represents any document in the document set D without referring to a document, for example, when the nth execution of step S23 is executed, m 1023 and n 3428 are included in the signal, and when the nth +1 execution of step S23 is executed, m 33456 and n 28477 are included in the signal.
In the method shown in fig. 4, after the step S26 is performed, the method further includes the step of updating the Ku (m) and the Kd (n), i.e., assigning Kd (n) ═ Kd (n) and Ku (m) ═ Ku (m).
In the method of fig. 4, the type of the signal is at least one of the following types: t-9 represents the user m clicks the link of the document n, T-10 represents the address of the user m typing the document n, T-11 represents the user m sets the label for the document n, T-12 represents the user m sets the document n as a bookmark, T-13 represents the user m sets the document n as a favorite (such as +1 of Like and google of facial makeup), T-14 represents the user m forwards the document n, T-15 represents the user m reviews the document n, and T-16 represents the user m collects the document n. In one example of the method of fig. 4, the signal is collected from a Web log. The Web logs comprise a server log (server log), an error log (error log), a Cookie log and the like.
In one example of use of the method described in FIG. 4, the method satisfies Ku (m) ≧ Ku (m) and Kd (n) ≧ Kd (n). Wherein the inequality Ku (m) ≧ Ku (m) means uwmk ≧ uwmk for each K ∈ K; the inequality Kd (n) ≧ Kd (n) means that for each K ∈ K, there is dwnk ≧ dwnk.
In an example of application of the method of fig. 4, for each K e K, uwmk is an increasing function of dwnk, and a decreasing function of Σ (K e K) dwnk; for each K e K, the dwnk is an increasing function of the uwmk and a decreasing function of Σ (K e K) uwmk.
In an example application of the method of fig. 4, the signal includes a user identifier of the user m and a document identifier of the document n, the user identifier corresponds to a unique user code, and the document identifier corresponds to a unique document code. The application example reads the user code of the user m through the user identifier of the user m and reads the parameter vector of the user m according to the user code of the user m; reading the document code of the document n through the document identification of the document n, and reading the parameter vector of the document n according to the document code of the document n.
In the method shown in fig. 4, after the parameter vector update algorithm 2 is executed for a set number of times t1, for each feature K e K, normalizing the kth user column vector (uw1K, uw2K,. once, uwMk); after the parameter vector updating algorithm 2 is executed for a set time t2, for each feature K ∈ K, normalizing the kth document column vector (dw1K, dw 2K., dwNk); wherein t1 and t2 are positive integers. Performing the parameter vector update algorithm 2 once means a process of substituting the Ku (m) and the Kd (n) into the function3 and the function4, resulting in the Ku (m) and the Kd (n). The specific application example of the normalization method is as follows:
example 1: a method of normalizing a kth user column vector (uw1k, uw2 k.., uwMk) in a user set U includes setting temp ∑ (t ∈ U) uwtk, and for each m ∈ U, setting uwMk ═ uwMk/temp. A method of normalizing the kth document column vector (dw1k, dw2 k.., dwNk) in the document set D includes setting temp ∑ (te e D) dwtk, and for each n e D, setting dwNk ═ dwNk/temp.
Example 2: the method for normalizing the kth document column vector (dw1k, dw2 k.., dwNk) in the document set D is as follows. First calculating temp ═ Σ (t ∈ D) dwtk, and dwnk ═ dwnk/temp for each n ∈ D; then sequencing dw1k, dw2 k.,. dwNk and equally dividing dw1k, dw2 k.., dwNk into r groups according to the sequencing result, and taking out the minimum data composition set { s1, s 2.., sr } in each group, wherein s1 < s 2. < sr; dw1k, dw2 k.., dwNk is finally treated as follows: for each n ∈ D, if dwnk < s1, then dwnk ═ a is set; if sm is less than or equal to dwnk is less than or equal to sm +1, then dwnk is set to g (sm); if dwnk > sr, dwnk is set to b. Wherein g (sm) is an increasing function, g (sm) epsilon (a, b), m is more than or equal to 1 and less than r, a and b are non-negative constants, and r is a set parameter. In the same way, the k-th user column vector in the user set U can be normalized.
Application example 3
This is an example of an application of the method described in fig. 4. The parameter vector updating algorithm 2 updates the parameter vectors of the user m and the document n by the following specific application examples:
uwmk*=uwmk+λ3(n,m,T)·f3[Kd(n)](for each k e DKn
K)
dwnk*=dwnk+λ4(m,n,T)·f4[Ku(m)](for each k ∈ DKm
K)
Wherein uwmk and uwmk represent the kth component of the parameter vector of the user m before and after updating, respectively, and dwnk represent the kth component of the parameter vector of the document n before and after updating, respectively; the λ 3(n, m, T) is the influence coefficient of the document n on the user m under the type T of the signal, and the λ 4(m, n, T) is the influence coefficient of the user m on the document n under the type T of the signal. The DKm is a set of features corresponding to Pm components with the largest values in parameter vectors ku (m) (uwm1, uwm 2.,. so, uwmk.,. so, uwmL) of the user m, the DKn is a set of features corresponding to Qn components with the largest values in parameter vectors kd (n) (dwn1, dwn 2.,. dwnk.,. so, dwnL), Pm and Qn are set parameters, Pm is equal to or less than L, and Qn is equal to or less than L. For example, m 30, P30 3, UK30 { music, sports, finance }; n is 265, Q265 is 2, and DK265 is { music, building }. In addition, after the above-described specific algorithm is executed, settings are made such that dwnk ═ dwnk ∈ DKm is set for each k ∈ DKm, and uwmk ═ uwmk is set for each k ∈ DKn.
In the application instance 3, the specific algorithm may be further defined as satisfying uwmk ≧ uwmk for each k ∈ DKn; for each k ∈ DKm, dwnk ≧ dwnk is satisfied.
In the application example 3, f3[ kd (n)) ] is a function of the parameter vector kd (n) of the document n, and f4[ ku (m)) ] is a function of the parameter vector ku (m) of the user m. The specific implementation method of the f3[ Kd (n)) ] and the f4[ Ku (m)) ] comprises the following steps:
example 1: said f3[ Kd (n) ] is an increasing function of said dwnk and a decreasing function of Σ (K ∈ K) dwnk; f4[ Ku (m) ] is an increasing function of uwmk and a decreasing function of ∑ (K ∈ K) uwmk.
Example 2: f3[ kd (n)) ] ═ σ 3 · dwnk/(∑ (K ∈ K) dwnk), f4[ ku (m)) ] ═ σ 4 · uwmk/(Σ (K ∈ K) uwmk), where σ 3 and σ 4 are set constants.
Example 3: f3[ kd (n)) ] ═ σ 5 · dwnk, f4[ ku (m)) ] ═ σ 6 · uwmk, where σ 5 and σ 6 are set constants.
Example 4: f3[ kd (n)) ] ═ σ 7 · {1/[1+ exp (-dwnk) ] }, f4[ ku (m)) ] ═ σ 8 · {1/[1+ exp (-uwmk) ] }, where σ 7 and σ 8 are set constants.
In the application example 3, specific implementation methods of the λ 3(n, m, T) and the λ 4(m, n, T) include the following examples:
example 1: the λ 3(n, m, T) and the λ 4(m, n, T) are functions of the similarity sim (m, n) between the parameter vectors of the user m and the document n, respectively. For example, λ 3(n, m, T) ═ c1 · sim (m, n), λ 4(m, n, T) ═ c2 · sim (m, n), and sim (m, n) | | ku (m), kd (n) | | [ ∑ Σ (K ∈ K) (uwmk · dwnk) ]/{ [ Σ (K ∈ K) (uwmk)2]1/2 [ Σ (K ∈ K) (dwnk)2]1/2}, and c1 and c2 are set constants. The implication of this example is that the higher the similarity between the parameter vectors of the user and the document, the larger the scaling factor they "vote" for each other.
Example 2: the λ 3(n, m, T) ═ U2(m) · D1(n), and the λ 4(n, m, T) · U1(m) · D2(n), where D1(n) indicates whether the parameter vector of document n can be used to update the parameter vectors of users in user set U, U2(m) indicates whether the parameter vector of user m can be updated by the parameter vector of documents in document set D, U1(m) indicates whether the parameter vector of user m can be used to update the parameter vector of documents in document set D, and D2(n) indicates whether the parameter vector of document n can be updated by the parameter vectors of users in user set U. u1(m), u2(m), d1(n) and d2(n) are setting parameters, and the values of the setting parameters are 0 or 1. 1 represents yes and 0 represents no. The meaning of this example is that in order to prevent malicious attacks, some documents (or users) cannot update the parameter vectors of other users (or documents) because the documents (or users) are not authenticated by reliability; some special documents (or users) have their parameter vectors that cannot be updated by the parameter vectors of other users (or documents).
Example 3: the λ 3(n, m, T) ═ s1(T), and the λ 4(m, n, T) ═ s2 (T). Wherein the T is a type of user access document signal, the s1(T) and the s2(T) are functions of the T.
Example 4: the λ 3(n, m, T) and λ 4(m, n, T) were generated using a combination of the methods of examples 1-3 above. Namely, it is
λ3(n,m,T)={c1·sim(m,n)}·{u2(m)·d1(n)}·s1(T)
λ4(m,n,T)={c2·sim(m,n)}·{u1(m)·d2(n)}·s2(T)。
Example 5: the λ 3(n, m, T) is a function of the number of times the document n is accessed or the PageRank value, and the λ 4(m, n, T) is a function of the number of users in the user's m relational network.
Example 6: the λ 3(n, m, T) and the λ 4(m, n, T) are set constants.
In the application example 3, after the specific parameter vector update algorithm is executed for the set number of times, the K-th document column vector (dw1K, dw 2K.., dwNk) and the K-th user column vector (uw1K, uw 2K.., uwMk) need to be normalized for each feature K ∈ K.
Application example 4
This is an application example of the method described in application example 3. For convenience of explanation, we assume that there are two users and three documents on the internet, each having two features, i.e., user set U ═ {1, 2}, document set D ═ 1, 2, 3}, and feature set K ═ 1, 2 }. The parameter vectors of user 1 and user 2 are (uw11, uw12) and (uw21, uw22), respectively, and the parameter vectors of document 1, document 2, and document 3 are (dw11, dw12), (dw21, dw22), and (dw31, dw32), respectively. Wherein uwmk (m is equal to U, K is equal to K) represents the correlation degree of the user m and the characteristic K; dwnk (n ∈ D, K ∈ K) represents the relevance of the document n to the feature K.
Assuming that a signal is received in the server that the user 2 accesses the document 3 and the signal type T is 9, the parameter vectors of the user 2 and the document 3 are updated according to the following algorithm:
uw21*=uw21+λ3(3,2,9){dw31/(dw31+dw32)}
uw22*=uw22+λ3(3,2,9){dw32/(dw31+dw32)}
dw31*=dw31+λ4(2,3,9){uw21/(uw21+uw22)}
dw32*=dw32+λ4(2,3,9){uw22/(uw21+uw22)}
wherein λ 3(3, 2, 9) represents an influence coefficient of the document 3 on the user 2 under the signal type T ═ 9; λ 4(2, 3, 9) represents an influence coefficient of the user 2 on the document 3 under the signal type T of 9. For example, λ 3(3, 2, 9) ═ c1 · sim (2, 3) · s1 (9); λ 4(2, 3, 9) ═ c2 · sim (2, 3) · s2(9), where s1(9) ═ 3 and s2(9) ═ 1.5; c1 and c2 are set constants; the sim (2, 3) represents the similarity between the parameter vectors of the user 2 and the document 3, namely: sim (2, 3) ═ w21 · dw31+ uw22 · dw32)/{ [ (uw21)2+ (uw22)2]1/2 · [ (dw31)2+ (dw32)2]1/2 }.
After the above algorithm is executed, the parameter vectors of the user 2 and the document 3 are updated, i.e., uw21 ═ uw21, uw22 ═ uw22, dw31 ═ dw31, and dw32 ═ dw32 are set.
After the above algorithm is executed, the user column vectors (uw11, uw21) and (uw12, uw22) are normalized, and the document column vectors (dw11, dw21, dw31) and (dw12, dw22, dw32) are normalized.
The algorithm for the normalization process of the user column vectors is as follows: if temp1 ═ uw11+ uw21, uw11 ═ uw11/temp1 and uw21 ═ uw21/temp1 are set for characteristic k ═ 1; if temp2 is uw12+ uw22, uw12 is uw12/temp2 and uw22 is uw22/temp2 for characteristic k 2.
The algorithm for the normalization process of the document column vector is as follows: if temp1 ═ dw11+ dw21+ dw31, dw11 ═ dw11/temp1, dw21 ═ dw21/temp1, dw31 ═ dw31/temp1 are set for the characteristic k ═ 1; if temp2 is dw12+ dw22+ dw32, dw12 is dw12/temp2, dw22 is dw22/temp2, and dw32 is dw32/temp2 for the feature k 2.
FIG. 5 is a flow chart of a method of querying a user for a particular characteristic. The method comprises the following steps executed in the server:
A11. receiving a query vector set by any query user e (e belongs to U);
A12. the query user e selects a group of users Q (Q e.g. U) in the user set U, for example, selects a group of users with ages in a given interval or all users with locations in a set geographical area in a social network; if the user does not select the data, the default value is Q ═ U;
A13. calculating a personalized ranking value UR (e, m) of each user m (m E Q) in the group of users Q according to the query vector and the parameter vector of each user in the group of users Q; the UR (e, m) represents a personalized rank value for the user m based on the user e's query vector;
A14. and sending the identifications of some users with the maximum personalized ranking value to the inquiring user e in the group of users Q.
In the method shown in fig. 5, the query vector set by the user e is ks (e) ═ ks (swe1, swe 2.., swek.,. swek.., sweL), where swek represents the degree of correlation between the user that the user e desires to query and the feature K (K ∈ K), and swek ∈ [ a, b ], a and b are preset nonnegative constants. The query vector ks (e) has the following setting methods.
The first is that the user e selects a feature from the feature set K and sets a feature correlation, for example, set swe 2-0.00023 and swe 6-0.00061, to the feature set K, and the correlation of the user e with other features is 0.
The second is to assign the parameter vector ku (e) of the user e to the query vector ks (e).
The third is that the user e submits a set of users or documents Se. When in use
Then, the parameter vector of the user r (r ∈ Se) is (uwr1, uwr 2.., uwrL), so the query vector of the user e is set to: for each feature K e K, swek ∈ K · (σ 9/s) · (r e Se) [ uwrk/(∑ (K e K) uwrk)](ii) a When in use
Then, the parameter vector of the document r (r ∈ Se) is (dwr1, dwr 2.,. dwrL), so the query vector of the user e is set to: for each feature K e K, swek ∈ K · (σ 10/s) · ∑ (r e Se) [ dwrk/(∑ (K e K) dwrk)]。
In an application example of the method shown in fig. 5, the personalized ranking value UR (e, m) is calculated by a query vector ks (e) (swe1, swe 2.,. swek.,. sweL.) of the user e and a parameter vector ku (m) of the user m (m e Q) (uwm1, uwm 2.,. uwmk.,. u, u ml), for example
FIG. 6 is a flow chart of a method for querying a user group having a particular characteristic. A plurality of user groups are obtained, and a user cluster G ═ 1, 2. A parameter vector of a group of users i (i e G) is set to (gwi1, gwi 2.,. gwik.,. gwiL), wherein the gwik represents a degree of correlation of the group of users i with a feature K (K e K). Therefore, the method for querying the user group with the specific characteristics is as follows:
A21. calculating a parameter vector of each user group in the user group G; the parameter vector of a user group is calculated by the parameter vector of each user contained in the user group; for example, assuming that all users in the user group i form the user set Bi, the parameter vector calculation method of the user group i is to set gwik ═ (σ 11/s) ·Σ (t ∈ Bi) [ uwtk/(Σ (K ∈ K) uwtk) ] for each feature K ∈ K, where s is the number of elements in the user set Bi, and σ 11 is a set constant.
A22. Receiving a query vector set by any query user e (e belongs to U);
A23. calculating a personalized ranking value GR (e, i) of each user group i (i belongs to G) in the user cluster G according to the query vector and the parameter vector of each user group in the user cluster G; the GR (e, i) represents a personalized ranking value for the group of users i based on the user e's query vector;
A24. and in the user cluster G, sending the identifications of some user groups with the maximum personalized ranking values to the inquiring user e.
In the method shown in fig. 6, the query vector set by the user e is ks (e) ═ ks (swe1, swe 2., swek., sweL), where swek represents the correlation between the user group that the user e desires to query and the feature K (K ∈ K), and swek ∈ [ a, b ], a and b are preset nonnegative constants. The query vector Ks (e) has four setting methods, the first three methods are the same as the three methods described in FIG. 5. And fourthly, the user e submits a user group, and the parameter vector of the user group is assigned to the Ks (e).
In the method for searching for the user group, the personalized ranking value GR (e, i) is calculated by a query vector ks (e) (swe1, swe 2.., swek.., sweL) of the user e and a parameter vector ku (i) (i ∈ G) (gwi1, gwi 2.., gwi.., ghil) of the user group i (i ∈ G), for example, the personalized ranking value GR (e, i) is calculated by the user e
Fig. 7 is a system structure diagram for acquiring the personalized features of the user. The system 200 comprises the following functional modules:
user, document and feature setting module 211: acquiring a plurality of users on the internet to form a user set U ═ 1, 2.., M }, and storing the user set U in a user database 220; acquiring a plurality of documents on the internet to form a document set D ═ 1, 2.. multidata, N }, and storing the document set D in a document database 230; setting a plurality of features, forming a feature set K ═ {1, 2.., L }, and storing the feature set in a feature database 240;
user and document parameter vector initial value setting module 212: setting initial values of parameter vectors for a plurality of users in the user set U, and storing the initial values in the user database 220; setting initial values of parameter vectors for a plurality of documents in the document set D and storing the initial values in the document database 230; the initial value of the parameter vector of the user and the document which are not set with the initial value of the parameter vector is set as a zero vector by default;
the user access user signal acquisition module 213: the system comprises a Web log database 250, a signal 1 and a signal processing module, wherein the signal 1 is used for collecting a signal 1 of any user i (i belongs to U) accessing any user j (j belongs to U), and the signal 1 is stored in the Web log database 250; a signal for the user i (101) to access the user j (102) is sent to a social network server 302;
user access document signal collection module 214: the method comprises the steps of collecting a signal 2 of any user m (m belongs to U) (103) accessing any document n (n belongs to D), wherein the signal 2 is stored in a Web log database 250; a signal for said user m (103) to access said document n, to be sent to at least one application server, said application server comprising a web portal server 301, a social network server 302, a search engine server 303 and an instant messaging server 304;
user and document parameter vector update module 215: reading the parameter vectors of the user i (101) and the user j (102) in the user database 220 according to the signal 1, then updating the parameter vectors of the user i (101) and the user j (102) through a parameter vector updating algorithm, and storing the updated parameter vectors of the user i and the user j in the user database 220; reading the parameter vector of the user m (103) in the user database 220 and the parameter vector of the document n in the document database 230 according to the signal 2, then updating the parameter vectors of the user m (103) and the document n by a parameter vector updating algorithm 2, and storing the updated parameter vector of the user m in the user database 220 and the updated parameter vector of the document n in the document database 230;
the user query module 216: the module has the inquiry function of users and user groups; the user query function includes: receiving a query vector set by a querying user and obtaining a set of users
Then, according to the query vector and the parameter vector of each user in the group of users Q, calculating a personalized ranking value of each user in the group of users Q, and according to the magnitude of the personalized ranking value, sending the identification of at least one user in the group of users Q to the query user, see steps A11-A14; the user group query function includes: receiving a query vector set by a query user, then calculating a personalized ranking value of each user group in the user group G according to the query vector and the parameter vector of each user group in the user group G, and sending an identifier of at least one user group in the user group G to the query user according to the size of the personalized ranking value, see steps A21-A24.
The above-mentioned application examples are only preferred application examples of the present invention, and are not intended to limit the scope of the present invention.