
HK1182802B - Method and system for determining user group, information query and recommendation - Google Patents


Info

Publication number
HK1182802B
Authority
HK
Hong Kong
Prior art keywords
user
information
preference
value
demand
Prior art date
Application number
HK13110008.4A
Other languages
Chinese (zh)
Other versions
HK1182802A1 (en)
Inventor
苏宁军
顾海杰
Original Assignee
阿里巴巴集团控股有限公司
Filing date
Publication date
Priority claimed from CN201110445052.6A (CN103186539B)
Application filed by 阿里巴巴集团控股有限公司
Publication of HK1182802A1
Publication of HK1182802B

Description

Method and system for determining user group, information query and recommendation
Technical Field
The invention relates to the field of internet information query, in particular to a method and a system for determining user groups, querying information and recommending information.
Background
E-commerce websites serve many types of users. There are enterprise users and individual users, and enterprise users can be further classified into raw material suppliers, manufacturers, wholesale retailers, merchants, and the like. Generally, the demands of enterprise users are more stable and focused than those of individual users, and the demands of raw material suppliers and manufacturers are more concentrated than those of wholesale retailers and merchants. Identifying how divergent a user's demands are, determining which user group the user belongs to, and providing commodity/information query and recommendation according to the demand type of that user group are therefore of great significance for improving the accuracy of a recommendation engine and the user experience.
Existing recommendation systems generally recommend commodities/information based on the interests and preferences of the user, or on the correlation between the user and the commodities/information. Existing recommendation algorithms generally do not distinguish between types of users: the same algorithm is applied to individual users and enterprise users. For example, a recommendation algorithm based on user interest preference determines the categories or leaf categories in which a user is interested according to the amount of the user's access behavior on those categories or leaf categories, and then recommends good or new commodities/information under them. A recommendation algorithm based on user and commodity/information correlation first finds related users or commodities; then, when the user browses a certain commodity, it recommends other commodities with a high degree of correlation, or commodities followed by other users with a high degree of correlation.
In the course of making the invention, the inventors found that existing recommendation systems use the same recommendation algorithm for different user groups, and that these algorithms query across many categories, so that a large number of commodities is queried and the query speed is low.
When a conventional commodity recommendation method is applied in the internet industry to analyze massive data (such as millions of users and billions of commodity records), the data volume is large, the computation is complex, the demand on system resources is high and the calculation time is long, so that it is difficult to meet the internet industry's service requirement of quick response to information queries.
Disclosure of Invention
The embodiments of the present application provide a method and a system for determining user groups and for information query and recommendation, to solve the following problem in the prior art: when a conventional commodity recommendation method is applied in the internet industry to analyze massive data, the data volume is large, the computation is complex, the demand on system resources is high and the calculation time is long, so that it is difficult to meet the internet industry's service requirement of quick response.
An embodiment of the present application provides a method for determining a user group, which specifically includes:
acquiring behavior record information of a user on M leaf categories;
counting the behavior record information, and analyzing to obtain M preference values $p_i$ of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values, and M is an integer greater than or equal to 1;
based on the M preference values $p_i$, calculating a demand preference divergence value H of the user;
and comparing the preference divergence value H with a first threshold value G, and listing the user into a demand focus user group when the preference divergence value H is less than or equal to the first threshold value G.
Wherein the preference value $p_i$ is specifically: the proportion of the user's behavior records on each leaf category to the user's behavior records on all M leaf categories.
Preferably, in order to determine the user group more accurately, after the M preference values $p_i$ of the M leaf categories are obtained by the analysis, the method further comprises the following steps:
filtering out the leaf categories whose preference value $p_i$ is less than a second threshold value, to obtain the remaining N leaf categories whose preference values $p_i$ are greater than or equal to the second threshold value, where N is less than or equal to M;
based on the N preference values $p_i$, calculating the demand preference divergence value H of the user.
In the above method, calculating the demand preference divergence value H of the user based on the N preference values $p_i$ is specifically:
calculating the demand preference divergence value H of the user by the information entropy formula $H = -\sum_{i=1}^{N} p_i \log_2 p_i$.
The second embodiment of the present application provides an information query and recommendation method, which specifically includes:
determining that the user belongs to a demand focus user group, based on an information entropy formula and the preference value $p_i$ of each leaf category obtained from the behavior record information of the user on M leaf categories;
inquiring the behavior record information of the demand focusing user group on a clustering commodity unit, and analyzing to obtain clustering commodity information preferred by the user demand; the clustering commodity unit is specifically used for storing the clustering commodity information obtained by clustering the commodity information under S leaf categories; s is an integer greater than or equal to 1;
and inquiring and recommending the clustered commodity information with the demand preference to the user.
Wherein obtaining the preference value $p_i$ of each leaf category based on the behavior record information of the user on the M leaf categories specifically comprises:
acquiring behavior record information of a user on M leaf categories;
counting the behavior record information, and analyzing to obtain M preference values $p_i$ of the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values.
Wherein M is an integer greater than or equal to 1; the preference value $p_i$ is specifically: the proportion of the user's behavior records on each leaf category to the user's behavior records on all M leaf categories.
Determining that the user belongs to a demand focus user group based on the preference value $p_i$ and the information entropy formula specifically comprises the following steps:
calculating to obtain a demand preference divergence value H of the user through an information entropy formula;
and comparing the preference divergence value H with a first threshold value G, and listing the user into a demand focus user group when the preference divergence value H is less than or equal to the first threshold value G.
Clustering the commodity information under the S leaf categories specifically comprises the following steps:
performing word segmentation on the commodity title and the information description under each leaf category in the S leaf categories;
extracting keywords of commodity information under each leaf category;
and clustering the commodity information containing the keywords under each leaf category to obtain L pieces of clustered commodity information.
The identifying of the clustered commodity information preferred by the user demand based on the behavior record information of the demand focus type user group in the clustered commodity information specifically comprises the following steps:
acquiring behavior record information of the demand focus user group in the L pieces of clustered commodity information;
counting the behavior record information to obtain an access quantity value Pa of each piece of clustered commodity information in the L pieces of clustered commodity information and a frequency Fa of accessing each piece of clustered commodity information in a specific time period;
and filtering R pieces of clustered commodity information corresponding to the frequency value Fa smaller than the first preset threshold value K and the access magnitude value Pa smaller than the second threshold value X to obtain L minus R pieces of clustered commodity information which serve as the clustered commodity information preferred by the user requirements.
A third embodiment of the present application provides a system for determining a user group, which specifically includes:
the acquiring unit is used for acquiring behavior record information of a user on M leaf categories;
a statistical analysis unit, configured to count the behavior record information and analyze it to obtain M preference values $p_i$ of the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values, and M is an integer greater than or equal to 1;
a first calculating unit, configured to calculate a demand preference divergence value H of the user based on the M preference values $p_i$;
and the user group determining unit is used for comparing the preference divergence value H with a first threshold value G, and listing the user into a demand focus user group when the preference divergence value H is smaller than or equal to the first threshold value G.
In order to determine the user group more accurately, the system as described above further includes:
a filtering unit, configured to filter out the leaf categories whose preference value $p_i$ is less than a second threshold value, to obtain the remaining N leaf categories whose preference values $p_i$ are greater than or equal to the second threshold value, where N is less than or equal to M;
a second calculating unit, configured to calculate the demand preference divergence value H of the user based on the N preference values $p_i$.
The fourth embodiment of the present application provides an information query and recommendation system, which specifically includes:
a user group determination unit, configured to determine that the user belongs to a demand focus user group, based on an information entropy formula and the preference value $p_i$ of each leaf category obtained from the behavior record information of the user on M leaf categories;
the query analysis unit is used for analyzing and obtaining the clustered commodity information preferred by the user demand based on the behavior record information of the demand focus type user group on the clustered commodity unit;
and the recommending unit is used for recommending the clustered commodity information with the demand preference to the user.
Wherein, the clustering unit specifically comprises:
the word segmentation unit is used for segmenting the commodity title and the information description under each leaf category in the S leaf categories;
the keyword extraction unit is used for extracting keywords of the commodity information under each leaf category;
and the commodity information clustering unit is used for clustering the commodity information containing the key words under each leaf category to obtain L pieces of clustered commodity information.
The identification unit in the system provided in the fourth embodiment of the present application specifically includes:
a second obtaining unit, configured to obtain behavior record information of the demand focus user group in the L pieces of clustered commodity information;
a second statistical analysis unit, configured to perform statistics on the behavior record information to obtain an access quantity Pa for each piece of clustered commodity information in the L pieces of clustered commodity information and a frequency Fa for accessing each piece of clustered commodity information in a specific time period;
and the filtering and identifying unit is used for filtering R pieces of clustered commodity information corresponding to the frequency value Fa smaller than the first preset threshold value K and the access magnitude value Pa smaller than the second threshold value X to obtain L minus R pieces of clustered commodity information which serve as the clustered commodity information preferred by the user demand.
The technical scheme provided by one or more embodiments of the application has at least one of the following beneficial technical effects or advantages:
the method and the device have the advantages that the algorithm for accurately identifying the divergence degree of the demand preference of the user is obtained by virtue of the concept of 'information entropy', so that whether the user belongs to a demand focusing group or a demand diverging group is accurately determined;
when a conventional commodity recommendation method is applied in the internet industry to analyze massive data (such as millions of users and billions of commodity records), the large data volume and complex computation place high demands on system resources and lead to long calculation times, making it difficult to meet the internet industry's service requirement of quick response. The technical solution of the invention overcomes these defects and addresses the huge data volume, low calculation speed and high server pressure of conventional commodity recommendation methods;
for a focused user group, since the commodities recommended to the user are focused on a set smaller than the leaf categories, namely the clustered commodity unit, only the preferred categories of the user need to be queried, and the query speed is high due to the fact that the query categories are few; the quantity of commodities needing to be retrieved is greatly reduced by one-time recommendation, and the response time and the efficiency of the server are improved; and simultaneously, the recommendation accuracy and effect are better.
Drawings
FIG. 1 is a flowchart of a method for determining a user group according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for querying and recommending information according to a second embodiment of the present application;
FIG. 3 is a block diagram of a system for determining a user group according to a third embodiment of the present disclosure;
FIG. 4 is a block diagram of a system for querying and recommending information provided in the fourth embodiment of the present application;
FIG. 5 is a detailed block diagram of a user group determination unit in the information query and recommendation system according to the fourth embodiment of the present application;
FIG. 6 is a detailed block diagram of an identification unit in the information query and recommendation system according to the fourth embodiment of the present application.
Detailed Description
The embodiments of the present application provide a method and a system for determining user groups and for information query and recommendation, to solve the following problem in the prior art: when a conventional commodity recommendation method is applied in the internet industry to analyze massive data, the data volume is large, the computation is complex, the demand on system resources is high and the calculation time is long, so that it is difficult to meet the internet industry's service requirement of quick response.
The technical solution will be described in detail with reference to the drawings and the embodiments of the specification.
We borrow the concept of 'information entropy' to obtain an algorithm for accurately identifying the divergence of user demand preferences. The information entropy is a concept for measuring information quantity in the information theory, and the more ordered a system is, the lower the information entropy is; conversely, the larger the uncertainty of a system variable, the higher the information entropy.
As shown in fig. 1, based on the concept of "information entropy", an embodiment of the present application provides a method for determining a user group, which specifically includes:
step 101, acquiring behavior record information of a user on M leaf categories;
The categories form a tree-shaped hierarchy built by sorting commodities into classes; the leaf categories are the categories at the finest level and have no sub-categories. The behavior record information of the user can be stored in a database of the background server or in a database of a server used by the user. Of course, the categories may be commodity categories on an e-commerce website, or categories of videos on a video website, such as movies, TV shows, and the like; any categories formed from classifiable information fall within the scope of this embodiment.
step 102, counting the behavior record information, and analyzing to obtain M preference values $p_i$ of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values, and M is an integer greater than or equal to 1;
the preference value $p_i$ is specifically: the proportion of the user's behavior records on each leaf category to the user's behavior records on all M leaf categories;
step 103, based on the M preference values $p_i$, calculating a demand preference divergence value H of the user;
step 104, comparing the preference divergence value H with a first threshold value G, and listing the user into the demand focus user group when the preference divergence value H is less than or equal to the first threshold value G.
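Steps 101 to 104 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the list-of-category-labels log format and the sample data are assumptions made for the example, and the divergence value is computed with the base-2 information entropy that this description adopts.

```python
from collections import Counter
from math import log2


def preference_values(category_records):
    """Step 102: share of the user's behavior records per leaf category."""
    counts = Counter(category_records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def divergence(values):
    """Step 103: information entropy (base 2) of the preference values."""
    return sum(-p * log2(p) for p in values if p > 0)


def is_demand_focused(category_records, threshold_g):
    """Step 104: the user is demand-focused when H <= G."""
    prefs = preference_values(category_records)
    return divergence(prefs.values()) <= threshold_g


# Hypothetical log: one entry per recorded behavior, labelled by leaf category.
records = ["shirt"] * 17 + ["trousers"] * 2 + ["hat"]
prefs = preference_values(records)
h = divergence(prefs.values())
focused = is_demand_focused(records, threshold_g=1.0)
print(prefs, h, focused)
```

The threshold of 1.0 used here is only a placeholder; a deployed system would choose G from its own data.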
A website has many types of users. There are enterprise users and individual users, and enterprise users can be further classified into raw material suppliers, manufacturers, wholesale retailers, merchants and the like. Generally, the demands of enterprise users are more stable and focused than those of individual users, and the demands of raw material suppliers and manufacturers are more concentrated than those of wholesale retailers and merchants. Users whose demanded commodity types and demanded quantities on the website are relatively stable are therefore determined to be demand focus users; the group formed by the demand focus users is called the demand focus user group.
Wherein, in order to determine the user group more accurately, after the M preference values $p_i$ of the M leaf categories are obtained by the analysis, the method further comprises the following steps:
filtering out the leaf categories whose preference value $p_i$ is less than a second threshold value, to obtain the remaining N leaf categories whose preference values $p_i$ are greater than or equal to the second threshold value, where N is less than or equal to M. Since $p_i \le 1$, the second threshold value can be set to, for example, 0.3, in which case the leaf categories with $p_i$ less than 0.3 are filtered out. In practical applications, the second threshold value can be set according to the specific situation.
Based on the N preference values $p_i$, the demand preference divergence value H of the user is calculated.
In the above method, calculating the demand preference divergence value H of the user based on the N preference values $p_i$ is specifically:
calculating the demand preference divergence value H of the user by the information entropy formula $H = -\sum_{i=1}^{N} p_i \log_2 p_i$.
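The filtering step can be sketched as follows. The names are hypothetical, and one point is an explicit assumption: the text does not say whether the N remaining preference values are rescaled to sum to 1 before the entropy is computed, so the sketch exposes this as an optional `renormalize` flag rather than choosing for the reader.

```python
from math import log2


def filter_preferences(prefs, second_threshold):
    """Keep only leaf categories whose preference value is >= the threshold."""
    return {c: p for c, p in prefs.items() if p >= second_threshold}


def entropy(values):
    """Base-2 information entropy; zero terms are skipped."""
    return sum(-p * log2(p) for p in values if p > 0)


def filtered_divergence(prefs, second_threshold, renormalize=False):
    """Divergence H over the N categories that survive filtering.

    renormalize=True rescales the surviving values to sum to 1; this is an
    assumption of the sketch and is not stated in the text.
    """
    values = list(filter_preferences(prefs, second_threshold).values())
    if renormalize and values:
        total = sum(values)
        values = [p / total for p in values]
    return entropy(values)


prefs = {"shirt": 0.85, "trousers": 0.10, "hat": 0.05}
kept = filter_preferences(prefs, second_threshold=0.3)
print(kept, filtered_divergence(prefs, 0.3), filtered_divergence(prefs, 0.3, True))
```

With a threshold of 0.3 only one category survives in this sample, so the renormalized divergence is zero, which matches the intuition that the user is focused on a single leaf category.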
For example, suppose the user's preference is evenly distributed over the preferred categories and only the number of preferred categories differs:
when the user prefers only one category, the information entropy is $-1 \times \log_2(1) = 0$;
when the user prefers two categories and the preference is evenly distributed, the information entropy is $-0.5 \log_2(0.5) - 0.5 \log_2(0.5) = 1$;
when the user prefers four categories and the preference is evenly distributed, the information entropy is $4 \times (-0.25 \log_2(0.25)) = 2$.
Therefore, the larger the number of the preference categories, the larger the entropy value, which reflects the larger the divergence of the user demand, and vice versa.
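As a short check on the evenly distributed cases (a derivation added for illustration, with n denoting the number of preferred categories): when each preference value equals 1/n, the entropy reduces to the base-2 logarithm of n, so it grows with the number of preferred categories.

```latex
H = -\sum_{i=1}^{n} \frac{1}{n}\log_2\frac{1}{n} = \log_2 n,
\qquad n = 1,\ 2,\ 4 \;\Rightarrow\; H = 0,\ 1,\ 2 .
```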
Of course, in practice a user's preference is usually not evenly distributed over the leaf categories. For example:
assume that each user prefers three leaf categories; if the preference values of user A for the three leaf categories are 0.3, 0.3 and 0.4, the entropy is 1.57;
if the preference values of user B for the three leaf categories are 0.2, 0.2 and 0.6, the entropy is 1.37;
if the preference values of user C for the three leaf categories are 0.05, 0.1 and 0.85, the entropy is 0.75.
Therefore, the more the access behavior is concentrated on a few preferred categories, the smaller the entropy value, reflecting a higher degree of focus in the user's demand; conversely, the larger the entropy value, the lower the degree of focus.
Therefore, in the technical solution provided in the first embodiment of the present application, in order to classify users by their demand preference divergence, a threshold G is set: users with H less than or equal to G are listed in the demand focus user group, and users with H greater than G are listed in the demand divergence user group. For example, if G is set to 1, user C is listed in the demand focus user group, and users A and B are listed in the demand divergence user group. In an actual system, the threshold G can be set according to the actual situation, and once the users have been distinguished, a different information query and recommendation method suited to each user group can be adopted.
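The grouping of users A, B and C can be reproduced with a short script. This is a sketch under stated assumptions: the function and group names are hypothetical, the preference values are the ones quoted in the example, and G = 1 is the illustrative threshold from the text.

```python
from math import log2


def entropy(values):
    """Base-2 information entropy of a list of preference values."""
    return sum(-p * log2(p) for p in values if p > 0)


def classify(preferences_by_user, threshold_g):
    """Split users into the demand focus and demand divergence groups."""
    groups = {"focus": [], "divergence": []}
    for user, values in preferences_by_user.items():
        key = "focus" if entropy(values) <= threshold_g else "divergence"
        groups[key].append(user)
    return groups


users = {
    "A": [0.3, 0.3, 0.4],
    "B": [0.2, 0.2, 0.6],
    "C": [0.05, 0.1, 0.85],
}
groups = classify(users, threshold_g=1.0)
for user, values in users.items():
    print(user, round(entropy(values), 2))
print(groups)
```

Running the script prints each user's entropy rounded to two decimals, which can be compared with the values quoted in the example, followed by the resulting groups.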
As shown in fig. 2, a second embodiment of the present application provides an information query and recommendation method, which specifically includes:
step 201, determining that the user belongs to a demand focus user group, based on the preference value $p_i$ of each leaf category obtained from the behavior record information of the user on M leaf categories and on the information entropy formula $H = -\sum_{i=1}^{M} p_i \log_2 p_i$;
step 202, inquiring behavior record information of a demand focus user group on a clustered commodity unit, and analyzing to obtain clustered commodity information preferred by a user; the clustering commodity unit is specifically used for storing the clustering commodity information obtained by clustering the commodity information under S leaf categories; s is an integer greater than or equal to 1;
and step 203, recommending the clustered commodity information preferred by the user to the user.
Wherein, for step 201, obtaining the preference value $p_i$ of each leaf category based on the behavior record information of the user on the M leaf categories specifically comprises:
acquiring behavior record information of a user on M leaf categories;
counting the behavior record information, and analyzing to obtain M preference values $p_i$ of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values; wherein M is an integer greater than or equal to 1; the preference value $p_i$ is specifically: the proportion of the user's behavior records on each leaf category to the user's behavior records on all M leaf categories;
calculating the demand preference divergence value H of the user by the information entropy formula $H = -\sum_{i=1}^{M} p_i \log_2 p_i$;
comparing the preference divergence value H with a first threshold value G, and listing the users into a demand focusing user group when the preference divergence value H is less than or equal to the first threshold value G; and if the preference divergence value H is larger than the first threshold value G, listing the user into the demand divergence user group. As already explained above, it is not described in detail here.
After determining that a certain user belongs to a user group with focused requirements, the information required by the user needs to be further accurately identified, and then the behavior record information of the user needs to be inquired, so that the commodity liked by the user is determined, and further the commodity is recommended to the user.
For a user group with more divergent demands, commodities/information can be recommended well by an existing category preference algorithm or correlation algorithm. For a user group with focused demands, however, existing recommendation algorithms locate the preference for commodities/information only down to the leaf category, and a single leaf category may still contain tens of thousands or even hundreds of thousands of commodities/information items. Because of this huge data volume, the computation is complex, the demand on system resources is high and the calculation time is long, so that it is difficult to meet the internet industry's service requirement of quick response.
For a user group whose demand is relatively stable and focused, it is first necessary to locate the categories of the preferred commodities/information more accurately. Therefore, in the embodiment of the present application, the commodities/information under the leaf categories are further clustered and subdivided, as follows: the commodity information under the S leaf categories is clustered to obtain clustered commodity information, where S is an integer greater than or equal to 1.
The method specifically comprises the following steps:
performing word segmentation on the commodity title and the information description under each leaf category in the S leaf categories;
extracting keywords of commodity information under each leaf category;
and clustering the commodity information containing the keywords under each leaf category to obtain L pieces of clustered commodity information.
For example, under the leaf category "shirt" there is the title: Mjx2011 Men's shirt made of casual frosted latticed shirt with long sleeves and thickened in New autumn and winter
1. Segment the title into words: winter style, leisure, frosted, plaid, shirt, men, long sleeves, thickened.
2. Extract the words and phrases with high occurrence frequency and a large degree of influence by frequent pattern mining. Suppose the most frequent words are found to be: leisure, plaid, shirt, winter style.
3. Identify the attribute words with a large degree of influence from behavior data such as users' searches and clicks on attribute keywords. Suppose the words most searched by users are found to be: men, long sleeves.
4. Combine the keywords to obtain: leisure, plaid, shirt, winter style, men, long sleeves.
5. Under the leaf category "shirt", classify all commodities whose titles contain these words into one class to obtain the clustered commodity information.
Obviously, the clustered commodity information is information finer than the leaf category information, and the requirements of the user can be positioned more accurately.
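The clustering steps above can be illustrated with a deliberately simplified sketch. Several things here are assumptions made for the example and are not specified in the text: titles are split on whitespace (real titles would need a proper word segmenter), a minimum title count stands in for frequent pattern mining, the searched words are supplied by hand, and a commodity joins the cluster only when its title contains every keyword.

```python
from collections import Counter


def tokenize(title):
    """Stand-in for word segmentation: lowercase and split on whitespace."""
    return title.lower().split()


def frequent_words(titles, min_count):
    """Words that appear in at least min_count titles."""
    counts = Counter(word for title in titles for word in set(tokenize(title)))
    return {word for word, n in counts.items() if n >= min_count}


def cluster_by_keywords(titles, keywords):
    """Titles that contain every keyword (an assumed matching rule)."""
    wanted = set(keywords)
    return [title for title in titles if wanted <= set(tokenize(title))]


# Hypothetical titles under one leaf category.
titles = [
    "casual plaid shirt men long sleeves",
    "casual plaid shirt women short sleeves",
    "formal white shirt women short sleeves",
    "casual plaid shirt men long sleeves thick",
]
searched_words = {"men", "long"}  # hypothetical most-searched attribute words
keywords = frequent_words(titles, min_count=3) | searched_words
cluster = cluster_by_keywords(titles, keywords)
print(sorted(keywords))
print(cluster)
```

In this sample the frequent words are shared by most titles, and it is the searched attribute words that narrow the cluster to the two men's long-sleeved shirts.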
Further, identifying the clustered commodity information preferred by the user based on the behavior record information of the demand focus type user group in the clustered commodity information specifically comprises:
acquiring behavior record information of the demand focus user group in the L pieces of clustered commodity information;
counting the behavior record information to obtain an access quantity value Pa of each piece of clustered commodity information in the L pieces of clustered commodity information and a frequency Fa of accessing each piece of clustered commodity information in a specific time period;
and filtering R pieces of clustered commodity information corresponding to the frequency value Fa smaller than the first preset threshold value K and the access magnitude value Pa smaller than the second threshold value X to obtain L minus R pieces of clustered commodity information which serve as the clustered commodity information preferred by the user requirements.
According to the behavior log information of the users accessing the electronic commerce website collected in the server, the access behavior quantity of each user to the clustered commodity information under each accessed commodity/information leaf category is obtained through data statistical analysis and is used as the preference value of each user to each clustered commodity information.
In order to identify the stability of the user access demand, in addition to the access amount to the clustered commodity information, the frequency of access to the clustered commodity set information needs to be considered. In this embodiment, the total number of visits to the clustered commodity information a in one month is used as the visit volume:
$Pa = Pa_1 + Pa_2 + \dots + Pa_{30}$
and taking the proportion of days in one month on which the clustered commodity information A is accessed as the access frequency:
$Fa = n/30$ (n is the number of days in the month on which the clustered commodity information A is accessed)
In order to identify the clustered commodity information preferred by the user more accurately, the clustered commodity information with a low access frequency and the clustered commodity information with a low access quantity can be filtered out: if $Fa \le K$ (a set threshold value), the clustered commodity information corresponding to Fa is filtered out;
the access quantities Pa of the remaining clustered commodity information are then sorted, the top N items are taken as the user's preference information, and this information is finally queried and recommended to the user.
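The access statistics and the final selection can be sketched as below. The function names, the 30-entry daily-visit lists and the threshold values are assumptions made for the example; the sketch drops a cluster when its frequency is at or below the frequency threshold or its volume is below the volume threshold, which is one reading of the filtering rule described above.

```python
def access_stats(daily_visits):
    """Return (Pa, Fa) for one cluster from its per-day visit counts."""
    pa = sum(daily_visits)  # total visits in the period
    days_visited = sum(1 for v in daily_visits if v > 0)
    fa = days_visited / len(daily_visits)  # share of days with a visit
    return pa, fa


def preferred_clusters(visits_by_cluster, min_frequency, min_volume, top_n):
    """Filter out rarely or lightly visited clusters, then rank by Pa."""
    kept = []
    for cluster, daily_visits in visits_by_cluster.items():
        pa, fa = access_stats(daily_visits)
        if fa > min_frequency and pa >= min_volume:
            kept.append((cluster, pa))
    kept.sort(key=lambda item: item[1], reverse=True)
    return [cluster for cluster, _ in kept[:top_n]]


# Hypothetical 30-day visit logs for three clusters.
visits = {
    "A": [2] * 20 + [0] * 10,  # steady: Pa = 40, Fa = 20/30
    "B": [50] + [0] * 29,      # one burst: Pa = 50, Fa = 1/30
    "C": [1, 0] * 15,          # light but regular: Pa = 15, Fa = 15/30
}
top = preferred_clusters(visits, min_frequency=0.2, min_volume=10, top_n=2)
print(top)
```

Cluster B has the highest visit count but is dropped because it was visited on a single day, which shows why the frequency is considered alongside the volume when judging how stable a demand is.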
As shown in fig. 3, a third embodiment of the present application provides a system for determining a user group, which specifically includes:
an obtaining unit 301, configured to obtain behavior record information of a user on M leaf categories;
a statistical analysis unit 302, configured to count the behavior record information and analyze it to obtain M preference values $p_i$ of the M leaf categories, wherein each of the M leaf categories corresponds to one preference value $p_i$ of the M preference values, and M is an integer greater than or equal to 1;
a first calculating unit 303, configured to calculate a demand preference divergence value H of the user based on the M preference values $p_i$;
a user group determining unit 304, configured to compare the preference divergence value H with a first threshold G, and when the preference divergence value H is smaller than or equal to the first threshold G, list the user in a demand focus type user group.
In order to determine the user group more accurately, the system as described above further includes:
a filtering unit, configured to filter out the leaf categories whose preference value p_i is smaller than a second threshold, leaving the N leaf categories whose preference value p_i is greater than or equal to the second threshold, where N is less than or equal to M;
a second calculating unit, configured to calculate the demand preference divergence value H of the user based on the N preference values p_i.
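The units described above can be illustrated with a short sketch. It assumes that the divergence value H is the standard Shannon entropy of the preference values, H = -sum(p_i * log2(p_i)); the base-2 logarithm, the function names, the renormalization of the remaining preference values after filtering and the sample thresholds are all assumptions of this example rather than details fixed by this embodiment.

```python
import math
from typing import Dict, Tuple


def preference_values(counts: Dict[str, int]) -> Dict[str, float]:
    """p_i: share of the user's behavior records that falls in leaf category i."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no behavior records")
    return {cat: c / total for cat, c in counts.items()}


def divergence(prefs: Dict[str, float]) -> float:
    """Shannon entropy in bits (assumed form of the information entropy formula)."""
    return sum(-p * math.log2(p) for p in prefs.values() if p > 0)


def is_demand_focused(
    counts: Dict[str, int],
    first_threshold_g: float,
    second_threshold: float = 0.0,
) -> Tuple[bool, float]:
    """Return (focused?, H) for one user's per-category record counts."""
    prefs = preference_values(counts)
    # Filtering unit: drop leaf categories whose p_i is below the second threshold.
    kept = {cat: p for cat, p in prefs.items() if p >= second_threshold}
    if not kept:
        raise ValueError("second threshold removed every leaf category")
    # Assumption: renormalize the remaining N values so they sum to 1 again.
    norm = sum(kept.values())
    kept = {cat: p / norm for cat, p in kept.items()}
    h = divergence(kept)
    # User group determining unit: focused when H <= G.
    return h <= first_threshold_g, h


supplier = {"steel": 90, "bolts": 8, "toys": 2}          # records per leaf category
retailer = {"toys": 25, "shoes": 25, "phones": 25, "food": 25}

focused_flag, h_focused = is_demand_focused(supplier, 1.0, second_threshold=0.05)
diverse_flag, h_diverse = is_demand_focused(retailer, 1.0, second_threshold=0.05)
print(focused_flag, round(h_focused, 3), diverse_flag, round(h_diverse, 3))
```

A user whose records concentrate in one leaf category yields a small H and falls under the threshold, while a user spread evenly over four categories reaches the maximum of 2 bits for four categories and does not.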
As shown in fig. 4, a fourth embodiment of the present application provides an information query and recommendation system, which specifically includes:
a user group determining unit 401, configured to determine, based on the preference value p_i for each leaf category obtained from the behavior record information of the user on the M leaf categories and on an information entropy formula, that the user belongs to a demand focus type user group;
a query analysis unit 402, configured to query the behavior record information of the demand focus type user group on a clustered commodity unit and analyze it to obtain the clustered commodity information preferred by the user's demand, wherein the clustered commodity unit is specifically used for storing the clustered commodity information obtained by clustering the commodity information under S leaf categories, and S is an integer greater than or equal to 1;
a recommending unit 403, configured to recommend the clustered commodity information of the demand preference to the user.
As shown in fig. 5, the user group determining unit 401 specifically includes:
a first obtaining unit 401a, configured to obtain behavior record information of a user on M leaf categories;
a first statistical analysis unit 401b, configured to perform statistics on the behavior record information and obtain, through analysis, M preference values p_i of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one of the M preference values p_i, M is an integer greater than or equal to 1, and the preference value p_i is specifically the proportion of the user's behavior records on each accessed leaf category among the user's behavior records on all M accessed leaf categories;
a demand preference divergence unit 401c, configured to calculate the demand preference divergence value H of the user through the information entropy formula;
a determining unit 401d, configured to compare the preference divergence value H with a first threshold G, and when the preference divergence value H is smaller than or equal to the first threshold G, list the user into a demand focus type user group.
Wherein the system as described above further comprises:
a clustering unit, configured to cluster the commodity information under the S leaf categories to obtain the clustered commodity information, where S is an integer greater than or equal to 1.
Wherein, the clustering unit specifically comprises:
a word segmentation unit, configured to perform word segmentation on the commodity titles and information descriptions under each of the S leaf categories;
a keyword extraction unit, configured to extract keywords of the commodity information under each leaf category;
and a commodity information clustering unit, configured to cluster the commodity information containing the keywords under each leaf category to obtain S pieces of clustered commodity information.
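As an illustration of the three sub-units above, the following sketch segments titles, extracts keywords and groups commodity information by shared keyword. Everything in it is an assumption made for the example: whitespace splitting stands in for a real word segmenter, the keyword rule (words that appear in at least `min_count` titles, excluding a small stop-word list) is a deliberately crude placeholder, and the names `segment`, `extract_keywords` and `cluster_by_keyword` are hypothetical. The embodiment does not prescribe a particular segmentation or clustering algorithm.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

STOP_WORDS = {"the", "a", "of", "for", "and", "with"}  # illustrative only


def segment(text: str) -> List[str]:
    """Word segmentation stand-in: lowercase and split on whitespace."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def extract_keywords(titles: List[str], min_count: int = 2) -> Set[str]:
    """Keep words that occur in at least min_count different titles."""
    doc_freq: Counter = Counter()
    for title in titles:
        doc_freq.update(set(segment(title)))
    return {word for word, count in doc_freq.items() if count >= min_count}


def cluster_by_keyword(titles: List[str], min_count: int = 2) -> Dict[str, List[str]]:
    """Group the titles of one leaf category by the keywords they contain."""
    keywords = extract_keywords(titles, min_count)
    clusters: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        for word in set(segment(title)):
            if word in keywords:
                clusters[word].append(title)
    return dict(clusters)


titles = [
    "stainless steel bolt M8",
    "steel bolt zinc plated",
    "cotton t-shirt for men",
    "cotton towel",
]
clusters = cluster_by_keyword(titles)
for keyword in sorted(clusters):
    print(keyword, clusters[keyword])
```

In this simplified rule a title may fall into more than one cluster (here the "steel" and "bolt" clusters coincide); a real system would typically merge or deduplicate such clusters.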
As shown in fig. 6, the query analysis unit 402 in the system provided in the fourth embodiment of the present application specifically includes:
a second obtaining unit 402a, configured to obtain behavior record information of the demand focus type user group on the S pieces of clustered commodity information;
a second statistical analysis unit 402b, configured to perform statistics on the behavior record information to obtain an access quantity Pa for each of the S pieces of clustered commodity information and a frequency Fa of accessing each piece of clustered commodity information within a specific time period;
a filtering and identifying unit 402c, configured to filter out the L pieces of clustered commodity information whose frequency Fa is smaller than a first predetermined threshold K and whose access quantity Pa is smaller than a second threshold X, so that the remaining S minus L pieces serve as the clustered commodity information preferred by the user's demand.
The technical scheme provided by one or more embodiments of the application has at least one of the following beneficial technical effects or advantages:
by drawing on the concept of 'information entropy', an algorithm is obtained that accurately identifies the degree of divergence of a user's demand preference, so that whether the user belongs to a demand focus type user group or a demand divergence type user group can be accurately determined;
when a conventional commodity recommendation method is applied to the algorithmic analysis of massive data in the internet industry (for example, millions of users and billions of items of commodity data), the large data volume and the complex operation process lead to high system resource requirements and long calculation times, making it difficult to meet the internet industry's need for quick response. The technical scheme of the invention overcomes these defects and solves the problems of huge calculation data volume, low calculation speed and high server load in the original commodity recommendation method;
for a demand focus type user group, the commodities recommended to the user are concentrated in a set smaller than a leaf category, namely the clustered commodity unit, so only the categories the user prefers need to be queried, and the query is fast because few categories are queried; the quantity of commodities that must be retrieved for a single recommendation is greatly reduced, which improves the server's response time and efficiency; and at the same time the recommendation accuracy and effect are better.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. An information query and recommendation method is characterized by comprising the following steps:
determining, based on a preference value p_i for each leaf category obtained from behavior record information of a user on M leaf categories and on an information entropy formula, that the user belongs to a demand focus type user group;
querying the behavior record information of the demand focus type user group on a clustered commodity unit, and analyzing it to obtain clustered commodity information preferred by the user's demand, wherein the clustered commodity unit is specifically used for storing the clustered commodity information obtained by clustering the commodity information under S leaf categories, and S is an integer greater than or equal to 1;
recommending the clustered commodity information of the demand preference to the user;
wherein determining, based on the preference value p_i and the information entropy formula, that the user belongs to a demand focus type user group specifically comprises:
calculating a demand preference divergence value H of the user through the information entropy formula;
and comparing the preference divergence value H with a first threshold G, and classifying the user into the demand focus type user group when the preference divergence value H is smaller than or equal to the first threshold G.
2. The method of claim 1, wherein the information entropy formula is:
3. The method of claim 1, wherein obtaining the preference value p_i for each leaf category based on the user's behavior record information on the M leaf categories specifically comprises:
acquiring behavior record information of the user on the M leaf categories;
counting the behavior record information, and analyzing to obtain M preference values p_i of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one of the M preference values p_i;
wherein M is an integer greater than or equal to 1, and the preference value p_i is specifically the proportion of the user's behavior records on each accessed leaf category among the user's behavior records on all M accessed leaf categories.
4. The method according to claim 1, wherein the clustering the commodity information under the S leaf categories specifically comprises:
performing word segmentation on the commodity title and the information description under each leaf category in the S leaf categories;
extracting keywords of commodity information under each leaf category;
clustering commodity information containing the keywords under each leaf category to obtain L pieces of clustered commodity information; wherein L and S are the same or different integers.
5. The method according to claim 1, wherein querying the behavior record information of the demand focus type user group on a clustered commodity unit, and analyzing it to obtain the clustered commodity information preferred by the user's demand, specifically comprises:
acquiring behavior record information of the demand focus user group in the L pieces of clustered commodity information;
counting the behavior record information to obtain an access quantity value Pa of each piece of clustered commodity information in the L pieces of clustered commodity information and a frequency Fa of accessing each piece of clustered commodity information in a specific time period;
and filtering out the R pieces of clustered commodity information whose frequency Fa is smaller than a first predetermined threshold K and whose access quantity Pa is smaller than a second threshold X, so that the remaining L minus R pieces serve as the clustered commodity information preferred by the user's demand.
6. A system for querying and recommending information is characterized by specifically comprising:
a user group determining unit, configured to determine, based on a preference value p_i for each leaf category obtained from behavior record information of a user on M leaf categories and on an information entropy formula, that the user belongs to a demand focus type user group;
a query analysis unit, configured to query the behavior record information of the demand focus type user group on a clustered commodity unit and analyze it to obtain the clustered commodity information preferred by the user's demand;
a recommending unit, configured to recommend the clustered commodity information of the demand preference to the user;
wherein the user group determining unit specifically comprises:
a demand preference divergence unit, configured to calculate a demand preference divergence value H of the user through the information entropy formula;
and a determining unit, configured to compare the preference divergence value H with a first threshold G, and to classify the user into the demand focus type user group when the preference divergence value H is smaller than or equal to the first threshold G.
7. The system of claim 6, wherein the user group determination unit specifically comprises:
a first obtaining unit, configured to obtain behavior record information of the user on the M leaf categories;
a first statistical analysis unit, configured to perform statistics on the behavior record information and obtain, through analysis, M preference values p_i of the user for the M leaf categories, wherein each of the M leaf categories corresponds to one of the M preference values p_i, M is an integer greater than or equal to 1, and the preference value p_i is specifically the proportion of the user's behavior records on each accessed leaf category among the user's behavior records on all M accessed leaf categories.
8. The system of claim 6, wherein the clustered commodity unit specifically comprises:
a word segmentation unit, configured to perform word segmentation on the commodity titles and information descriptions under each of the S leaf categories;
a keyword extraction unit, configured to extract keywords of the commodity information under each leaf category;
and a commodity information clustering unit, configured to cluster the commodity information containing the keywords under each leaf category to obtain L pieces of clustered commodity information.
9. The system of claim 6, wherein the query analysis unit specifically comprises:
a second obtaining unit, configured to obtain behavior record information of the demand focus type user group on the L pieces of clustered commodity information;
a second statistical analysis unit, configured to perform statistics on the behavior record information to obtain an access quantity Pa for each of the L pieces of clustered commodity information and a frequency Fa of accessing each piece of clustered commodity information within a specific time period;
and a filtering and identifying unit, configured to filter out the R pieces of clustered commodity information whose frequency Fa is smaller than a first predetermined threshold K and whose access quantity Pa is smaller than a second threshold X, so that the remaining L minus R pieces serve as the clustered commodity information preferred by the user's demand.
HK13110008.4A 2013-08-27 Method and system for determining user group, information query and recommendation HK1182802B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110445052.6A CN103186539B (en) 2011-12-27 2011-12-27 A kind of method and system determining user group, information inquiry and recommendation

Publications (2)

Publication Number Publication Date
HK1182802A1 (en) 2013-12-06
HK1182802B (en) 2017-06-02
