CN111651670B - Content retrieval method, device terminal and storage medium based on user behavior patterns - Google Patents
- Publication number
- CN111651670B (application number CN202010454350.0A)
- Authority
- CN
- China
- Prior art keywords
- content
- recall
- user
- click
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention provides a content retrieval method based on a user behavior pattern, comprising: acquiring an input search keyword; determining, in a preset content database, at least one piece of recall content matching the search keyword; calculating an association score for each piece of recall content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed from the content in the preset content database and the historical click data corresponding to each piece of content; and sorting the at least one piece of recall content by association score and outputting the sorted result as the target retrieval result. Embodiments of the invention also provide a content retrieval device, a terminal, and a computer-readable storage medium based on the user behavior pattern. The invention can improve the effectiveness of the ranking and display of search results and the subsequent conversion rate. In addition, the invention relates to blockchain technology: the sorted result can be stored in a blockchain node.
Description
Technical Field
The present invention relates to the field of data retrieval technologies, and in particular, to a content retrieval method, a device terminal, and a computer readable storage medium based on a user behavior pattern.
Background
With the popularization of the internet and the development of search technology, various websites or APPs (applications) are provided with a search function, and a user can search for desired contents by inputting keywords. For example, the keyword "car wash" is input in a search interface provided by a search website or APP to find out contents such as functions, articles, questions and answers related to the "car wash".
In a specific implementation, keyword search can be built on the Elasticsearch (ES) full-text search engine: content such as articles, functions, and questions and answers is segmented into words by ES, and an inverted index is built and stored in a database. When a user searches, the input keyword undergoes the same word segmentation, the resulting terms are matched against the inverted index to determine the related content, and that content is recalled from the database. The recalled content is then sorted by service weight to determine the display order and returned to the front end as the search result.
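The keyword-only recall described above can be illustrated with a minimal sketch. It is not the Elasticsearch implementation: the whitespace tokenizer stands in for the engine's word segmentation, and the names `tokenize`, `build_inverted_index` and `recall`, as well as the sample documents, are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Whitespace tokenizer standing in for real word segmentation (assumption).
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of content ids whose text contains it.
    index = defaultdict(set)
    for content_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(content_id)
    return index


def recall(index, query):
    # Segment the query the same way as the content, then union the postings.
    hits = set()
    for term in tokenize(query):
        hits |= index.get(term, set())
    return hits


docs = {
    "a1": "car wash coupons near you",
    "a2": "how to wash a car at home",
    "a3": "renew your insurance policy",
}
index = build_inverted_index(docs)
recalled = recall(index, "car wash")
```

Here `recalled` contains only the two items that mention the query terms; nothing about who clicked what enters the result.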
However, the above search scheme considers only the degree of matching between the input keywords and the content in the database. It considers neither the current user's behavior nor the behavior of other users, and it does not rank results according to the user's actual needs, so user satisfaction with the search results is insufficient and the corresponding click rate is relatively low.
Disclosure of Invention
Based on this, it is necessary to address the above-mentioned problems, and a content retrieval method, an apparatus terminal, and a computer-readable storage medium based on a user behavior pattern are proposed.
A content retrieval method based on a user behavior pattern, comprising:
Acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
Calculating the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to content included in a preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting the sequencing result as a target retrieval result.
Wherein the method further comprises: and for each piece of content contained in the preset content database, determining a content node and a user node corresponding to the content according to the click users and the click times in the historical click data, and constructing the user behavior map according to the content node and the user node.
Wherein calculating, based on the user behavior map, the association score corresponding to each recall content according to a preset association score calculation method further comprises: for each recall content, determining the click users and click times associated with the recall content according to the user behavior map; and calculating the association score corresponding to the recall content according to the associated click users and click times.
Wherein the historical click data further comprises click time; the calculating the association score corresponding to the recall content according to the associated click user and the click times further comprises: calculating a time penalty score corresponding to the click time according to a preset time penalty function; and calculating the association score corresponding to the recall content according to the associated click times of the click users and the time penalty score corresponding to the click time.
Wherein the user behavior map further comprises popular content labels set on one or more pieces of content; and determining, according to the search keyword, at least one recall content matched with the search keyword in a preset content database further comprises: determining at least one piece of content carrying a popular content label in the preset content database as the recall content.
Wherein calculating, based on the user behavior map, the association score corresponding to each recall content according to a preset association score calculation method further comprises: in the case that the recall content carries a popular content label, calculating the association score corresponding to the recall content according to a preset penalty weight coefficient and the preset association score calculation method.
Wherein determining, according to the search keyword, at least one recall content matched with the search keyword in a preset content database further comprises: matching the search keyword against the inverted index corresponding to the preset content database, and determining at least one recall content according to the matching result. Matching the search keyword against the inverted index and determining at least one recall content according to the matching result further comprises: calculating, according to the inverted index, a matching score between each piece of content contained in the preset content database and the search keyword; and either taking the content whose matching score exceeds a preset matching threshold as the recall content, or sorting each piece of content contained in the preset content database by matching score and determining the recall content in the preset content database according to the sorting result, wherein the sorting result is stored in a blockchain.
A content retrieval device based on a user behavior profile, comprising:
The keyword acquisition module is used for acquiring the input search keywords;
The content recall module is used for determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
the relevance score calculation module is used for respectively calculating relevance scores corresponding to each recall content according to a preset relevance score calculation method based on a user behavior map, wherein the user behavior map is constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and the sorting module is used for sorting at least one piece of recall content according to the association score and outputting a sorting result as a target retrieval result.
A terminal comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
Acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
Calculating the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to content included in a preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting the sequencing result as a target retrieval result.
A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
Acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
Calculating the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior map, wherein the user behavior map is constructed according to content included in a preset content database and historical click data corresponding to each piece of content, and the historical click data comprises click users and click times;
and sequencing at least one piece of recall content according to the association score, and outputting the sequencing result as a target retrieval result.
The invention has the following beneficial effects:
With the above content retrieval method, device, terminal and computer-readable storage medium based on the user behavior pattern, during content retrieval at least one piece of recall content matching the search keyword entered by the user is retrieved from the preset content database; the association score of each piece of recall content is then calculated based on the constructed user behavior map and a preset association score calculation method, and the recall content is ranked by association score, so that the ranked recall content is output to the user as the final target retrieval result. In other words, the search results obtained from the input search keyword are further ranked based on the user behavior map, which improves the effectiveness of the ranking and display of search results and the subsequent conversion rate of content retrieval.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is a flowchart of a content retrieval method based on a user behavior pattern according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a user behavior pattern according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a user behavior pattern according to an embodiment of the present invention;
FIG. 4 is a flowchart of a content retrieval method based on a user behavior pattern according to an embodiment of the present invention;
FIG. 5 is a flowchart of a content retrieval method based on a user behavior pattern according to an embodiment of the present invention;
FIG. 6 is a flowchart of a content retrieval method based on a user behavior pattern according to an embodiment of the present invention;
FIG. 7 is a flow chart of determining recall content based on search keywords in an embodiment of the invention;
FIG. 8 is a schematic diagram of a content retrieval device based on a user behavior pattern according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a content retrieval device based on a user behavior pattern according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the association score computation module 106 according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of the content recall module 104 according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of a computer device running the content retrieval method based on user behavior patterns according to one embodiment of the application;
fig. 13 is a schematic structural diagram of an embodiment of a readable storage medium according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In this embodiment, in order to solve the problems of insufficient user satisfaction and low click rate that arise because existing content search schemes ignore user behavior, a content retrieval method based on a user behavior map is provided.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of a content retrieval method based on a user behavior pattern according to the present invention.
Specifically, as shown in fig. 1, the content retrieval method based on the user behavior pattern provided by the invention comprises steps S102-S108:
step S102: and acquiring the input search keywords.
In this implementation scenario, the user may input the search keyword through an application or a preset search interface. The search keywords (there may be one or more) are chosen by the user according to what they want to find.
In a specific embodiment, the user may input a search keyword through an input function in the application program for searching for related content in the application program. For example, the search keyword input by the user may be "car wash" for searching for content related to "car wash".
Step S104: and determining at least one recall content matched with the search keyword in a preset content database according to the search keyword.
In this embodiment, the content retrieval is implemented based on a preset content database, where the preset content database includes a plurality of pieces of content, for example, a piece of content may be an article, a function, or the like. The user can search the content database for the content to be checked through a preset search function.
Specifically, the at least one recall content may be obtained by performing content recall with an open-source search engine according to the search keyword, for example the Elasticsearch (ES) full-text search engine.
Step S106: based on the user behavior patterns, calculating the association score corresponding to each recall content according to a preset association score calculation method.
In this embodiment, after determining a plurality of recall contents according to the search keywords, further sorting the recall contents is required, and then returning the recall contents to display according to the sorting result, so that the recall contents displayed in the search result of the search interface are more consistent with the content that the user wants to search.
In this embodiment, the ordering of the recall content is based on the association score for each piece of recall content. In this step, the calculation of the association score will be described. The association score describes the association degree of each piece of recall content and the search keywords input by the user, and the higher the association score is, the higher the association degree is, the higher the possibility that the user clicks the piece of recall content is, and the higher the satisfaction degree of the user on the search is.
Specifically, in this step, the calculation of the association score is based on the user behavior pattern. The user behavior patterns are data patterns constructed according to all contents included in a preset content database and historical click data corresponding to each piece of content.
Specifically, the historical click data covers every piece of content included in the content database and records, for each click, the clicked content (content label), the click user (the label corresponding to the user), and the click times; in other embodiments, the click time of each click is also included.
When constructing the user behavior map, for each piece of content in the preset content database, the click users, click times and so on in the historical click data for that content are determined, the corresponding content nodes and user nodes in the user behavior map are determined, and the user behavior map is built from those content nodes and user nodes. Specifically, historical or newly generated click data is written quickly into a Neo4j database through a stream processing framework such as Storm or Spark Streaming, so that the corresponding user behavior map is constructed and the content database and its user behavior map are kept up to date.
Specifically, fig. 2 shows a schematic diagram of a user behavior map constructed from content nodes (labeled Content in fig. 2) and user nodes (labeled User in fig. 2).
With continued reference to fig. 2, the nodes included in fig. 2 are specifically as follows:
The attributes of a user node (User) include user_id (the user identifier) and read_num (the out-degree of the user node), which gives the number of pieces of content the user has clicked. The attributes of a content node (Content) include content_id (the content identifier) and content_type (the content type).
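As a rough illustration of the node attributes listed above, the following sketch builds an in-memory stand-in for the graph from a click log. It does not use Neo4j or a streaming framework; the function name `build_behavior_graph`, the tuple layout of the log and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def build_behavior_graph(click_log, contents):
    # click_log: iterable of (user_id, content_id) click events.
    # contents: dict content_id -> content_type.
    # clicks[user][content] is the number of times that user clicked that content.
    clicks = defaultdict(lambda: defaultdict(int))
    for user_id, content_id in click_log:
        clicks[user_id][content_id] += 1

    # User node: user_id plus read_num, the out-degree of the node
    # (number of distinct pieces of content the user has clicked).
    users = {
        user_id: {"user_id": user_id, "read_num": len(per_content)}
        for user_id, per_content in clicks.items()
    }
    # Content node: content_id plus content_type.
    content_nodes = {
        content_id: {"content_id": content_id, "content_type": content_type}
        for content_id, content_type in contents.items()
    }
    return users, content_nodes, clicks


click_log = [("u1", "c1"), ("u1", "c1"), ("u1", "c2"), ("u2", "c1")]
contents = {"c1": "article", "c2": "function"}
users, content_nodes, clicks = build_behavior_graph(click_log, contents)
```

In this sample, user `u1` clicked `c1` twice and `c2` once, so the edge weight for `c1` is 2 while `read_num` is 2 (two distinct pieces of content).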
With continued reference to FIG. 2, the relationships contained in the user behavior profile shown in FIG. 2 are as follows:
Based on the user behavior map, a corresponding path pattern can be constructed. Specifically, for any piece of recall content c, with u being the user currently entering the search keyword, the users au who have clicked recall content c can be determined from the user behavior map, and the corresponding path is:
Path=u→(ac:Content)←(au:User)→c,
where ac is any piece of content.
In this embodiment, the association score F(c, u) between recall content c and the user u who entered the search keyword is determined by counting the paths between c and u, and F(c, u) is taken as the association score F(c) of recall content c. The number of paths between recall content c and user u is the number of times recall content c has been clicked by the users au.
Specifically, the association score F(c) of each recall content c is calculated as follows: the click users associated with recall content c (that is, the users au who have clicked recall content c) and their click times are determined from the user behavior map; the association score F(c) of recall content c is then calculated from all users au and their click times.
Specifically, the association score F (c) may be calculated according to the following formula:
$$F(c) = F(c, u) = \sum_{au \in Path} \left(1 + au.read\_num\right),$$
Where au.read_num represents the number of clicks of the user au on the recall content c.
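One possible reading of this sum is sketched below. It assumes that each shared content ac between u and au yields one path, that u itself and the recall content c are excluded from the path, and that read_num is the number of distinct pieces of content a user has clicked; the function name `association_score` and the sample data are hypothetical.

```python
def association_score(clicked, u, c):
    # clicked: dict user_id -> set of content ids that user has clicked.
    # Walks the pattern u -> ac <- au -> c and adds (1 + au.read_num) per path.
    seen_by_u = clicked.get(u, set())
    score = 0
    for au, contents in clicked.items():
        if au == u or c not in contents:
            continue  # au must be another user who clicked c
        read_num = len(contents)  # out-degree of au
        shared = (seen_by_u & contents) - {c}  # each shared ac is one path
        score += len(shared) * (1 + read_num)
    return score


clicked = {
    "u": {"x", "y"},
    "v": {"x", "c1"},
    "w": {"x", "y", "c1", "c2"},
    "z": {"c2"},
}
score_c1 = association_score(clicked, "u", "c1")
score_c2 = association_score(clicked, "u", "c2")
```

With this data, c1 is reached through v (one path) and w (two paths), while c2 is reached only through w, so c1 receives the higher score.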
In this embodiment, the association score F(c) may also be calculated not by directly summing click counts but by summing the reciprocal of the logarithm of each user's click count, so that abnormally active users do not distort the result. For example, suppose one of the users au is a crawler that has fetched most (e.g., 80%) of the content in the preset content database. That user is "active", but its clicks do not reflect interest in the content, so its influence on Path should be much smaller than that of a user who has clicked only a few dozen pieces of content. To reduce the influence of such abnormally active users on Path, an IUF (Inverse User Frequency) factor, the reciprocal of the logarithm of user activity, is introduced, following the idea of TF-IDF (term frequency-inverse document frequency, a common weighting technique in information retrieval and data mining, where TF is the term frequency and IDF is the inverse document frequency). That is, the reciprocal of the logarithm of the user's click count replaces the simple addition of the user's click count, so that the calculated association score is closer to actual needs and more accurate.
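Specifically, the association score F(c) is calculated as the sum, over all users au in Path, of the reciprocal of the logarithm of the user's click count:
$$F(c) = F(c, u) = \sum_{au \in Path} \frac{1}{\log\left(1 + au.read\_num\right)}$$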
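A minimal sketch of the IUF weighting follows, assuming the weight of each path is 1 / log(1 + read_num) with a natural logarithm; the base of the logarithm, the function names and the sample data are assumptions, not taken from the source.

```python
import math


def iuf_weight(read_num):
    # Reciprocal of the logarithm of user activity; read_num >= 1 keeps log > 0.
    return 1.0 / math.log(1 + read_num)


def association_score_iuf(clicked, u, c):
    # clicked: dict user_id -> set of content ids that user has clicked.
    seen_by_u = clicked.get(u, set())
    score = 0.0
    for au, contents in clicked.items():
        if au == u or c not in contents:
            continue
        n_paths = len((seen_by_u & contents) - {c})
        score += n_paths * iuf_weight(len(contents))
    return score


# A crawler that has clicked 100 items and a regular user with 2 items
# both clicked content "c" and share item "x" with user "u".
clicked = {
    "u": {"x"},
    "fan": {"x", "c"},
    "crawler": {"x", "c"} | {f"p{i}" for i in range(98)},
}
score = association_score_iuf(clicked, "u", "c")
```

The crawler's path still counts, but its weight is roughly a quarter of the regular user's, which is the damping effect the IUF factor is meant to provide.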
Further, in other embodiments, when calculating the association score, all paths belonging to users who are excessively active may be deleted from Path directly, so that those paths have no influence on the calculation.
Specifically, the click count of user au is determined (either over all content or for the current recall content c). If the click count exceeds a preset click count threshold, user au is judged to be an abnormally active user, and the paths belonging to au are deleted from Path, which improves the soundness and accuracy of the association score calculation.
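The threshold filter can be sketched as follows, here using the click count over all content; the function name `drop_abnormal_users` and the threshold value are illustrative assumptions.

```python
def drop_abnormal_users(clicks, max_clicks):
    # clicks: dict user_id -> dict content_id -> click count.
    # Users whose total click count exceeds max_clicks are treated as
    # abnormally active and removed, so none of their paths are counted.
    kept = {}
    for user_id, per_content in clicks.items():
        total = sum(per_content.values())
        if total <= max_clicks:
            kept[user_id] = per_content
    return kept


clicks = {
    "regular": {"c1": 2, "c2": 1},
    "crawler": {f"c{i}": 1 for i in range(500)},
}
filtered = drop_abnormal_users(clicks, max_clicks=100)
```

The filtered mapping can then be passed to whichever scoring routine is in use, so that the removed users contribute no paths.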
Step S108: and sequencing at least one piece of recall content according to the association score, and outputting the sequencing result as a target retrieval result.
After the association score of each piece of recall content has been calculated, the recall content is sorted in descending order of association score. The higher the association score, the earlier the recall content appears in the sorted result and the better it matches what the user is searching for. Therefore, in this embodiment, the recall content is ranked by association score and the ranked recall content is output as the target retrieval result, so that the user sees and can click the recall content with higher association scores first.
As can be seen from the above description, during content retrieval in this embodiment, at least one piece of matching recall content is retrieved from the preset content database according to the search keyword entered by the user; the association score of each piece of recall content is then calculated based on the constructed user behavior map and a preset association score calculation method, and the recall content is ranked by association score, so that the ranked recall content is output to the user as the final target retrieval result. With this scheme, the search results obtained from the input search keyword are further ranked based on the user behavior map, which effectively improves search efficiency.
Further, in another embodiment, when determining recall content in the preset content database, it is necessary to consider not only whether each piece of content matches the input search keyword but also, in order to widen the coverage of the search result, whether the content is popular content.
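The ranking step itself is a descending sort by score. In the sketch below the function name `rank_recalled` and the scores are made up for illustration; because Python's `sorted` is stable, items with equal scores keep their original recall order.

```python
def rank_recalled(scores):
    # scores: dict content_id -> association score, in recall order.
    # Returns content ids ordered from highest to lowest score.
    return sorted(scores, key=lambda content_id: scores[content_id], reverse=True)


scores = {"c1": 13, "c2": 10, "c3": 21}
ranked = rank_recalled(scores)
```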
Specifically, in one embodiment, the nodes included in the user behavior map further include popular content nodes.
As shown in fig. 3, a schematic diagram of a user behavior pattern including a popular content node (HotContent) is provided.
With continued reference to fig. 3, the hot content node (HotContent) included in fig. 3 is specifically as follows:
With continued reference to fig. 3, the user behavior diagram shown in fig. 3 further includes the following table relationships:
That is, the content contained in the preset content database is further provided with a corresponding popular content tag, and the content marked with the popular content tag is the popular content. In the corresponding user behavior map, for the case of the popular Content, a label corresponding to HotContent (popular Content label) is further attached to the corresponding Content node (Content) for identifying that the Content node attached with the label corresponds to the popular Content node.
In the above user behavior map, a single node (e.g., a content node, Content) may carry several labels. So in actual storage, for a content node that is popular content, both the Content label and the HotContent label are attached to the same node.
In the case of considering whether the content is the popular content, as shown in fig. 4, step S104 in the above embodiment: according to the search keyword, determining at least one recall content matched with the search keyword in a preset content database, and further comprising the following step S1042:
step S1042: and determining at least one content with a hot content label as recall content in a preset content database.
Popular content is content that other users click on frequently. Generally, popular content has more recorded clicks than other content. If some popular content is included in the recall content, both the user's click rate on the retrieval result and the coverage of the recall content can be improved.
Specifically, in this embodiment, when determining the recall content, not only the content matching the search keyword but also part of the popular content in the preset content database is taken as recall content. The proportion of popular content in the recall content may be set to a preset ratio, for example 20%. That is, the final recall content consists of the matching content recalled by the search keyword plus popular content. Using popular content as a supplement ensures that recall content can always be determined.
Further, if the input search keyword is uncommon, little content matches it and the required number of recalled items cannot be met; in this case, popular content may also be used as a supplement so that a sufficient amount of recall content is determined.
In another embodiment, if the search keyword entered by the user is misspelled or has no matching content, the number of pieces of content matching the search keyword determined in step S104 may be zero or very small, causing the search to fail. In this case, popular content can serve as a fallback: the recall content is filled with popular content so that a sufficient amount of recall content is still determined.
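The supplement and fallback behavior can be sketched as one function. The 20% share comes from the example in the text; the minimum list size `min_total`, the function name `recall_with_hot` and the sample ids are assumptions for illustration.

```python
def recall_with_hot(matched, hot, hot_percent=20, min_total=5):
    # matched: content ids recalled by keyword matching, best first.
    # hot: content ids carrying the popular content label, hottest first.
    extra = [content_id for content_id in hot if content_id not in matched]
    # Number of popular items needed so that they make up about hot_percent
    # of the final list (integer ceiling division avoids float rounding).
    n_hot = -(-(len(matched) * hot_percent) // (100 - hot_percent))
    # Fallback: with few or no matches, fill up to min_total items.
    n_hot = max(n_hot, min_total - len(matched))
    return matched + extra[:n_hot]


matched = [f"m{i}" for i in range(1, 9)]
hot = ["h1", "m2", "h2", "h3"]
normal_case = recall_with_hot(matched, hot)
fallback_case = recall_with_hot([], hot)
```

With eight keyword matches, two popular items are appended (20% of the final ten). With no matches at all, the result consists entirely of popular content.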
Generally, if a piece of content is popular content, its click count is significantly higher than that of ordinary content, and its association score is correspondingly higher. To prevent the scores of popular content from being so high that they distort the association score calculation, or from letting popular content overshadow the content that actually matches the query and so defeat the intended retrieval effect, in this embodiment the association score of recall content that is popular content is additionally multiplied by a penalty factor α smaller than 1, and the product is used as the association score of that popular content.
Specifically, as shown in fig. 5, step S106 described above (calculating, based on the user behavior pattern, the association score corresponding to each recall content according to a preset association score calculation method) further includes:
Step S106A1: in the case where the recall content is not provided with a popular content label, calculating the association score corresponding to the recall content according to the preset association score calculation method;
Step S106A2: in the case where the recall content is provided with a popular content label, calculating the association score corresponding to the recall content according to a preset penalty weight coefficient and the preset association score calculation method.
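That is, the association score of the recall content c is calculated as follows: when c is not provided with a popular content label, its association score is F(c); when c is provided with a popular content label, its association score is α·F(c), where α is the penalty weight coefficient smaller than 1.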
That is, the influence of the score of popular content is reasonably taken into account in the calculation of the association score. Applying a penalty factor (the penalty weight coefficient α) to the association score of popular content improves the accuracy and fidelity of the association score, which in turn improves the effectiveness of the final ranking and display of search results and the subsequent conversion rate.
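As a minimal sketch of steps S106A1 and S106A2, assuming the base association score has already been computed by the preset method, the penalty may be applied as follows. The function name and the default value 0.8 for α are hypothetical; the description only requires α to be smaller than 1.

```python
def penalized_score(base_score: float, is_popular: bool, alpha: float = 0.8) -> float:
    """Return the association score after the popular-content penalty.

    base_score: score from the preset association score calculation method.
    is_popular: True when the recall content carries a popular content label.
    alpha: penalty weight coefficient, assumed here to lie strictly in (0, 1).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1 (exclusive)")
    return alpha * base_score if is_popular else base_score
```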
In the present embodiment, it is possible to determine whether or not a piece of content is popular content by calculating a popularity value of the content.
Specifically, for each piece of content, the number of clicks and the number of recalls corresponding to the piece of content are determined according to the historical click data; the contents are sorted by number of clicks and by number of recalls respectively, and the popularity value of the piece of content is then calculated from the first sorting result (by number of clicks) and the second sorting result (by number of recalls). The number of recalls refers to the number of times the piece of content was recalled as recall content during retrieval. The number of clicks and the number of recalls can each indicate whether a piece of content is popular.
In a specific embodiment, the popularity value of a piece of content may be calculated from the sequence number of the piece of content in the first sorting result and its sequence number in the second sorting result. For example, in one embodiment, the popularity value of content a = the sequence number of content a in the first sorting result + the sequence number of content a in the second sorting result. In other embodiments, the popularity value may be calculated by other methods based on the sorting results by number of clicks and by number of recalls.
In the actual search process, whether a user wants to find a certain piece of content is also related to the click time of that content in the historical click data, and click behaviors at different times have different influences on the actual interest of the user currently searching. For example, other users' recent clicks on a piece of content should have a significantly greater influence than other users' clicks on that content 1 year or 10 years ago; the user is more concerned with currently popular content than with related content from 1 year or 10 years ago. Therefore, in the present embodiment, when calculating the association score of the recall content, the influence of the corresponding click time also needs to be considered in addition to the number of clicks.
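The rank-sum example above may be illustrated by the following sketch. It assumes that both sortings are in descending order, so that a smaller popularity value indicates more popular content, and that ties are broken by content identifier; neither detail is specified in the description, and all names are hypothetical.

```python
from typing import Dict


def popularity_values(clicks: Dict[str, int], recalls: Dict[str, int]) -> Dict[str, int]:
    """Popularity value = rank by number of clicks + rank by number of recalls.

    Both dictionaries are assumed to contain the same content identifiers.
    Ranks start at 1 for the highest count.
    """
    def ranks(counts: Dict[str, int]) -> Dict[str, int]:
        ordered = sorted(counts, key=lambda cid: (-counts[cid], cid))
        return {cid: pos for pos, cid in enumerate(ordered, start=1)}

    click_rank = ranks(clicks)
    recall_rank = ranks(recalls)
    return {cid: click_rank[cid] + recall_rank[cid] for cid in clicks}
```

Content could then be given a popular content label when its value falls below a chosen cut-off; the cut-off itself would be a further assumption.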
Specifically, the historical click data also comprises the corresponding click time of each click; as shown in fig. 6, the step of calculating the association score corresponding to the recall content according to the associated click user and the number of clicks further includes:
Step S106B1: calculating a time penalty score corresponding to the click time according to a preset time penalty function;
Step S106B2: calculating the association score corresponding to the recall content according to the number of clicks of the associated click users and the time penalty score corresponding to the click time.
In the process of calculating the association score F(c), au.read_num is originally used as the number of clicks; in this embodiment, in order to take the influence of the click time into account, each click is multiplied by a penalty weight.
Specifically, in one embodiment, the penalty weight corresponding to the click time is a time penalty score corresponding to the click time calculated according to a preset time penalty function.
Specifically, for each click p in au.read_num, au.read_num in the association score F(c) is replaced with the following quantity:
$$\sum_{p \in \mathrm{au.read}} \mathrm{score\_p\_time} \cdot \mathrm{p\_num}$$
which represents the sum, over the clicks of user au on the corresponding content, of the product of each click count p_num and its corresponding time penalty score score_p_time. Relative to au.read_num, this identifies more accurately the influence of the click behavior of user au on the interest of the user currently searching. The time penalty function used to calculate the time penalty score score_p_time may be a function negatively correlated with time.
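As one possible illustration of steps S106B1 and S106B2, the sketch below uses an exponential decay as the time penalty function. The description only requires a function negatively correlated with time, so the decay form, the half-life of 180 days, and all names are assumptions of this sketch.

```python
from typing import Iterable, Tuple


def time_penalty(age_days: float, half_life_days: float = 180.0) -> float:
    """Time penalty score for a click made age_days ago.

    The score is 1.0 for a click made now and halves every half_life_days.
    """
    return 0.5 ** (age_days / half_life_days)


def weighted_click_count(clicks: Iterable[Tuple[float, int]],
                         half_life_days: float = 180.0) -> float:
    """Sum of score_p_time * p_num over the click records of one user.

    Each record is (age_days, p_num): how long ago the clicks happened
    and how many clicks were made at that time.
    """
    return sum(time_penalty(age, half_life_days) * p_num for age, p_num in clicks)
```

The returned value would take the place of au.read_num in the association score, so that old clicks contribute less than recent ones.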
In this embodiment, the process of determining at least one recall content in step S104 is further described in detail.
As described above, the process of determining at least one recall content in the preset content database according to the search keyword may be implemented by recalling content with the Elasticsearch (ES) full-text search engine. Elasticsearch is a Lucene-based search server that provides a distributed, multi-user-capable full-text search engine with a RESTful web interface; Elasticsearch is developed in Java and was released as open source under the terms of the Apache License, and is a popular enterprise-class search engine.
In a specific embodiment, after a user inputs a search keyword, word segmentation is performed on the input search keyword to obtain the word segmentation result corresponding to the search keyword. For the preset content database, word segmentation is performed on each piece of content by the same word segmentation method, an inverted index is then constructed from the word segmentation results, and the preset content database is searched through the inverted index.
The word segmentation result corresponding to the search keyword is matched against the inverted index corresponding to the preset content database, and the recall content is determined according to the matching result. The process of determining the recall content according to the matching result may follow the flow chart for determining recall content according to a search keyword given in fig. 7.
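The segmentation, indexing and matching flow described above may be sketched in plain Python as follows. This is not the Elasticsearch implementation: whitespace splitting stands in for a real word segmenter, and the matching score is simply the number of distinct query terms found in a piece of content. Both choices, and all names, are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Stand-in word segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each term to the set of content identifiers that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for content_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(content_id)
    return index


def match_scores(query: str, index: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each content by how many distinct query terms it contains."""
    scores: Dict[str, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for content_id in index.get(term, ()):
            scores[content_id] += 1
    return dict(scores)
```

The same `tokenize` function is applied to the query and to the stored content, mirroring the requirement that both sides use the same word segmentation method.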
Specifically, the step S104 further includes steps S401 to S403:
Step S401: calculating, according to the inverted index corresponding to the preset content database, the matching score between each piece of content contained in the preset content database and the search keyword;
Step S402: taking the content whose matching score exceeds a preset matching threshold as recall content;
Or, step S403: sorting each piece of content contained in the preset content database according to the matching score, and determining the recall content in the preset content database according to the sorting result, wherein the sorting result is stored in a blockchain.
It should be emphasized that, to further ensure the privacy and security of the above-mentioned ordering result, the above-mentioned ordering result may also be stored in a node of a blockchain.
According to the inverted index corresponding to the preset content database, the matching score between each piece of content and the search keyword is calculated from the high-frequency words and low-frequency words in the inverted index. The matching score represents the degree of matching between each piece of content and the search keyword, and whether a piece of content should be recalled is determined according to that degree of matching.
After the matching score calculation between each piece of content and the search keyword is completed, recall content can be determined according to the matching score. In a particular embodiment, whether a piece of content is recalled is determined based on whether the match score exceeds a preset match threshold; for example, in the case where the match score exceeds a preset match threshold, the piece of content is taken as recall content, and in the case where the match score does not exceed the preset match threshold, the piece of content is not considered in determining recall content.
In another embodiment, whether a piece of content is recalled is determined based on the position of its matching score among the matching scores of all the contents. Specifically, all the contents are arranged in descending order of matching score, and then, according to the number N of recall contents required, the top N contents in the sorting result are taken as the recall content.
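The two recall strategies of steps S402 and S403 may be compared with the following sketch, which assumes the matching scores are already available as a dictionary. The function names and the tie-breaking by content identifier are hypothetical.

```python
from typing import Dict, List


def recall_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Step S402: keep content whose matching score exceeds the threshold."""
    return [cid for cid, score in scores.items() if score > threshold]


def recall_top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Step S403: sort by matching score (descending) and keep the top N."""
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:n]
```

The threshold variant returns a variable number of contents, whereas the top-N variant always returns at most N, which makes it easier to raise the recall count from, say, 200 to 240 or 300.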
In this embodiment, in order to ensure the coverage of the search result in determining the recall content, the number of recall contents needs to be increased in determining the recall content in step S104. For example, in general, the number of recalled contents is 200, and in this embodiment, the number of recalled contents can be increased to 240, 300, or even higher. That is, by increasing the number of recalled content, the coverage of the search results can be increased, thereby increasing the effectiveness of the final search result ordering and presentation and increasing the subsequent conversion rate.
In one embodiment, as shown in fig. 8, a content retrieval device based on a user behavior pattern is also proposed. Specifically, as shown in fig. 8, the content retrieval device based on the user behavior pattern includes:
a keyword obtaining module 102, configured to obtain an input search keyword;
a content recall module 104, configured to determine, according to the search keyword, at least one recall content that matches the search keyword in a preset content database;
an association score calculation module 106, configured to calculate the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior pattern, where the user behavior pattern is constructed according to the content included in the preset content database and the historical click data corresponding to each piece of content, and the historical click data includes click users and numbers of clicks;
And the sorting module 108 is configured to sort at least one piece of recall content according to the association score, and output the sorted result as a target retrieval result.
In one embodiment, as shown in fig. 9, the content retrieval device based on a user behavior pattern further includes a user behavior pattern construction module 110, configured to determine, for each piece of content included in the preset content database, a content node and a user node corresponding to the content according to the click user and the number of clicks in the historical click data, and construct the user behavior pattern according to the content node and the user node.
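The construction performed by module 110 may be sketched as follows. The node attributes (user identifier and number of contents clicked for user nodes; content identifier and content type for content nodes) follow claim 1; the dictionary layout, the click-log tuple format and the function name are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_behavior_graph(content_types: Dict[str, str],
                         click_log: Iterable[Tuple[str, str, int]]):
    """Build content nodes, user nodes and click edges from historical clicks.

    content_types: content_id -> content type, for the preset content database.
    click_log: records of (user_id, content_id, number_of_clicks).
    """
    content_nodes = {cid: {"content_id": cid, "content_type": ctype}
                     for cid, ctype in content_types.items()}
    edges: Dict[str, Dict[str, int]] = defaultdict(dict)  # content -> user -> clicks
    clicked_by_user = defaultdict(set)
    for user_id, content_id, clicks in click_log:
        if content_id not in content_nodes:
            continue  # ignore clicks on content outside the database
        edges[content_id][user_id] = edges[content_id].get(user_id, 0) + clicks
        clicked_by_user[user_id].add(content_id)
    user_nodes = {uid: {"user_id": uid, "clicked_content_count": len(cids)}
                  for uid, cids in clicked_by_user.items()}
    return content_nodes, user_nodes, dict(edges)
```

For a given recall content, `edges[content_id]` then yields the associated click users and their numbers of clicks, which are the inputs required by the association score calculation.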
In one embodiment, the association score calculation module 106 is further configured to determine, for each recall content, the click users and numbers of clicks associated with the recall content according to the user behavior pattern, and to calculate the association score corresponding to the recall content according to the associated click users and numbers of clicks.
In one embodiment, the historical click data further includes the click time; as shown in fig. 10, the association score calculation module 106 includes a time penalty score calculation unit 1062 and an association score calculation unit 1064, where the time penalty score calculation unit 1062 is configured to calculate a time penalty score corresponding to the click time according to a preset time penalty function, and the association score calculation unit 1064 is configured to calculate the association score corresponding to the recall content according to the number of clicks of the associated click users and the time penalty score corresponding to the click time.
In one embodiment, the user behavior pattern further includes popular content tags set on one or more pieces of content; the content recall module 104 is further configured to determine, in the preset content database, at least one piece of content provided with a popular content tag as the recall content.
In one embodiment, the association score calculating module 106 is further configured to calculate, when the recall content is provided with a popular content tag, an association score corresponding to the recall content according to a preset penalty weight coefficient and a preset association score calculating method.
In one embodiment, the content recall module 104 is further configured to match the search keyword with the inverted index corresponding to the preset content database, and determine at least one recall content according to the matching result.
As shown in fig. 11, the content recall module 104 further includes a matching score calculating subunit 1042 and a content recall subunit 1044, where the matching score calculating subunit 1042 is configured to calculate, according to the inverted index, a matching score between each piece of content included in the preset content database and the search keyword; the content recall subunit 1044 is configured to take, as the recall content, content whose matching score exceeds a preset matching threshold, or to sort each piece of content contained in the preset content database according to the matching score and determine the recall content in the preset content database according to the sorting result, wherein the sorting result is stored in a blockchain.
As can be seen from the foregoing description, in this embodiment, the content retrieval device based on the user behavior pattern determines at least one matching recall content from a preset content database according to a search keyword input by a user, then calculates the association score corresponding to each recall content based on the constructed user behavior pattern and a preset association score calculation method, and sorts the recall content according to the association scores, so that the sorted recall content is output to the user as the final target retrieval result. That is, with the content retrieval method, device, terminal and computer readable storage medium based on the user behavior pattern, the retrieval results obtained from the input search keyword can be further ranked based on the user behavior pattern, which improves the effectiveness of ranking and displaying the retrieval results and the subsequent conversion rate of content retrieval.
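The cooperation of modules 102 to 108 may be summarized by the following sketch. Because the recall method and the association score calculation method are both presets, they are passed in as callables here; the function name `retrieve` and its signature are hypothetical.

```python
from typing import Callable, Iterable, List


def retrieve(keyword: str,
             recall: Callable[[str], Iterable[str]],
             association_score: Callable[[str], float]) -> List[str]:
    """Recall content for a keyword, score it, and return it sorted.

    keyword: the search keyword obtained by the keyword obtaining module 102.
    recall: stands in for the content recall module 104.
    association_score: stands in for the association score calculation module 106.
    """
    recalled = list(recall(keyword))
    scored = [(association_score(content_id), content_id) for content_id in recalled]
    # Sorting module 108: highest association score first, ties by identifier.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [content_id for _, content_id in scored]
```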
FIG. 12 illustrates an internal block diagram of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in fig. 12, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program that, when executed by the processor, causes the processor to implement the content retrieval method based on a user behavior pattern. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the content retrieval method based on a user behavior pattern. It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, may combine some of the components, or may have a different arrangement of components.
In one embodiment, a smart terminal is provided that includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
calculating the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior pattern, wherein the user behavior pattern is constructed according to the content included in the preset content database and the historical click data corresponding to each piece of content, and the historical click data comprises click users and numbers of clicks;
and sorting the at least one recall content according to the association score, and outputting the sorting result as a target retrieval result.
As can be seen from the above description, in this embodiment, the terminal determines at least one matching recall content from a preset content database according to a search keyword input by a user, then calculates the association score corresponding to each recall content based on the constructed user behavior pattern and a preset association score calculation method, and sorts the recall content according to the association scores, so that the sorted recall content is output to the user as the final target retrieval result. That is, with the content retrieval method, device, terminal and computer readable storage medium based on the user behavior pattern, the retrieval results obtained from the input search keyword can be further ranked based on the user behavior pattern, which improves the effectiveness of ranking and displaying the retrieval results and the subsequent conversion rate of content retrieval.
In one embodiment, referring to fig. 13, a schematic structural diagram of an embodiment of a readable storage medium according to the present invention is provided. The readable storage medium 10 stores at least one computer program 20, and the computer program 20 is executed by a processor to implement the following method:
acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
calculating the association score corresponding to each recall content according to a preset association score calculation method based on a user behavior pattern, wherein the user behavior pattern is constructed according to the content included in the preset content database and the historical click data corresponding to each piece of content, and the historical click data comprises click users and numbers of clicks;
and sorting the at least one recall content according to the association score, and outputting the sorting result as a target retrieval result.
In one embodiment, the readable storage medium 10 may be a memory chip in a terminal, a hard disk, a removable hard disk, or another readable and writable storage device such as a USB flash drive or an optical disc, and may also be a server or the like.
As can be seen from the above description, in this embodiment, the computer program in the readable storage medium determines at least one matching recall content from a preset content database according to a search keyword input by a user, then calculates the association score corresponding to each recall content based on the constructed user behavior pattern and a preset association score calculation method, and sorts the recall content according to the association scores, so that the sorted recall content is output to the user as the final target retrieval result. That is, with the content retrieval method, device, terminal and computer readable storage medium based on the user behavior pattern, the retrieval results obtained from the input search keyword can be further ranked based on the user behavior pattern, which improves the effectiveness of ranking and displaying the retrieval results and the subsequent conversion rate of content retrieval.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to fall within the scope of this description.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, each of which contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
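The idea of data blocks linked by cryptographic methods may be illustrated with the following single-process sketch, in which each block stores a sorting result together with the hash of the previous block. It omits distribution, point-to-point transmission and consensus entirely, and the block layout and function names are assumptions made for illustration only.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder previous hash for the first block


def make_block(payload: Any, prev_hash: str) -> Dict[str, Any]:
    """Create a block whose hash covers its payload and the previous hash."""
    body = json.dumps({"payload": payload, "prev_hash": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"payload": payload, "prev_hash": prev_hash, "hash": digest}


def verify_chain(blocks: List[Dict[str, Any]]) -> bool:
    """Check that every block links to its predecessor and is unmodified."""
    prev_hash = GENESIS_PREV_HASH
    for block in blocks:
        expected = make_block(block["payload"], block["prev_hash"])["hash"]
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Changing a stored sorting result after the fact changes the recomputed hash of its block, so `verify_chain` would report the chain as invalid.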
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (8)
1. A content retrieval method based on a user behavior pattern, comprising:
Acquiring an input search keyword;
determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
Based on the user behavior pattern, calculating the association score corresponding to each recall content according to a preset association score calculation method, wherein the method comprises the following steps: for each recall content, determining a click user and the click times associated with the recall content according to the user behavior patterns, and calculating an association score corresponding to the recall content according to the associated click user and the click times, wherein the user behavior patterns are constructed according to the content included in the preset content database and historical click data corresponding to each piece of content, the historical click data comprises the click user and the click times, the preset association score calculating method comprises the calculation of the reciprocal of the logarithm of the click times, and specifically the association score is calculated through the following formula:
wherein c is the recall content, F(c) is the association score corresponding to the recall content, u is the user currently inputting the search keyword, F(c, u) is the association score between the recall content c and the user u currently inputting the search keyword, au is a click user associated with the recall content, au.read_num represents the number of clicks of au on the recall content c, and Path is a path constructed according to the user behavior pattern; if the number of clicks of au is greater than a preset click-count threshold, the path corresponding to au is deleted from Path;
Sorting at least one piece of recall content according to the association score, and outputting a sorting result as a target retrieval result;
For each piece of content contained in the preset content database, determining content nodes and user nodes corresponding to the content according to click users and click times in the historical click data, and constructing the user behavior patterns according to the content nodes and the user nodes, wherein the attributes corresponding to the user nodes comprise user identifications and the number of the user click contents, and the attributes corresponding to the content nodes comprise content identifications and content types.
2. The method of claim 1, wherein the historical click data further comprises a click time;
the calculating the association score corresponding to the recall content according to the associated click user and the number of clicks further comprises:
calculating a time penalty score corresponding to the click time according to a preset time penalty function;
and calculating the association score corresponding to the recall content according to the number of clicks of the associated click user and the time penalty score corresponding to the click time.
3. The method of claim 1, wherein the user behavior pattern further comprises popular content tags set on one or more pieces of content;
the determining at least one recall content matched with the search keyword in a preset content database according to the search keyword further comprises:
determining, in the preset content database, at least one piece of content provided with a popular content tag as the recall content.
4. The method of claim 3, wherein the calculating, based on the user behavior pattern, the association score corresponding to each recall content according to the preset association score calculating method, further comprises:
And under the condition that the recall content is provided with a popular content label, calculating the association score corresponding to the recall content according to a preset punishment weight coefficient and a preset association score calculation method.
5. The method of claim 1, wherein the determining at least one recall content matching the search keyword in a preset content database according to the search keyword, further comprises:
matching the search keywords with inverted indexes corresponding to the preset content database, and determining at least one recall content according to a matching result;
Wherein, the matching the search keyword with the inverted index corresponding to the preset content database, and determining at least one recall content according to the matching result, further comprises:
According to the inverted index, calculating a matching score between each piece of content contained in the preset content database and the search keyword;
taking the content with the matching score exceeding a preset matching threshold value as the recall content;
Or sorting each piece of content contained in the preset content database according to the matching score, and determining the recall content in the preset content database according to the sorting result.
6. A content retrieval device based on a user behavior pattern, comprising:
The keyword acquisition module is used for acquiring the input search keywords;
The content recall module is used for determining at least one recall content matched with the search keyword in a preset content database according to the search keyword;
The association score calculation module is used for calculating the association score corresponding to each recall content according to a preset association score calculation method based on the user behavior pattern; the association score calculation module is specifically configured to determine, for each recall content, a click user and a number of clicks associated with the recall content according to the user behavior pattern, and calculate an association score corresponding to the recall content according to the associated click user and the number of clicks, where the user behavior pattern is constructed according to content included in the preset content database and historical click data corresponding to each piece of content, the historical click data includes the click user and the number of clicks, the preset association score calculation method includes calculating the reciprocal of the logarithm of the number of clicks, and specifically the association score is calculated according to the following formula:
wherein c is the recall content, F(c) is the association score corresponding to the recall content, u is the user currently inputting the search keyword, F(c, u) is the association score between the recall content c and the user u currently inputting the search keyword, au is a click user associated with the recall content, au.read_num represents the number of clicks of au on the recall content c, and Path is a path constructed according to the user behavior pattern; if the number of clicks of au is greater than a preset click-count threshold, the path corresponding to au is deleted from Path;
the sorting module is used for sorting at least one piece of recall content according to the association score and outputting a sorting result as a target retrieval result;
For each piece of content contained in the preset content database, determining content nodes and user nodes corresponding to the content according to click users and click times in the historical click data, and constructing the user behavior patterns according to the content nodes and the user nodes, wherein the attributes corresponding to the user nodes comprise user identifications and the number of the user click contents, and the attributes corresponding to the content nodes comprise content identifications and content types.
7. A terminal comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 5.
8. A computer readable storage medium, characterized in that a computer program is stored, which, when being executed by a processor, causes the processor to perform the steps of the method according to any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454350.0A CN111651670B (en) | 2020-05-26 | 2020-05-26 | Content retrieval method, device terminal and storage medium based on user behavior patterns |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454350.0A CN111651670B (en) | 2020-05-26 | 2020-05-26 | Content retrieval method, device terminal and storage medium based on user behavior patterns |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111651670A CN111651670A (en) | 2020-09-11 |
CN111651670B CN111651670B (en) | 2024-08-06 |
Family
ID=72343411
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010454350.0A Active CN111651670B (en) | 2020-05-26 | 2020-05-26 | Content retrieval method, device terminal and storage medium based on user behavior patterns |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111651670B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112559763B (en) * | 2020-12-09 | 2025-05-09 | 用友网络科技股份有限公司 | Search result recall method, device and readable storage medium |
CN112860989B (en) * | 2021-01-20 | 2022-02-01 | 平安科技(深圳)有限公司 | Course recommendation method and device, computer equipment and storage medium |
CN113609827B (en) * | 2021-08-09 | 2023-05-26 | 海南大学 | Content processing method and system based on intent-driven DIKW |
CN113988062A (en) * | 2021-10-22 | 2022-01-28 | 上海浦东发展银行股份有限公司 | Client unit information semi-automatic verification method based on short text matching |
CN114385830A (en) * | 2022-01-14 | 2022-04-22 | 中国建设银行股份有限公司 | Operation and maintenance knowledge online question and answer method, device, electronic equipment and storage medium |
CN114610793B (en) * | 2022-03-09 | 2022-10-04 | 东莞市创为新科技有限公司 | Interaction method, system and storage medium based on big data statistical analysis |
CN115794993A (en) * | 2022-11-17 | 2023-03-14 | 唯品会(广州)软件有限公司 | Search recall method and device, computer equipment and storage medium |
CN116450927B (en) * | 2023-02-28 | 2025-09-02 | 北京爱奇艺科技有限公司 | A search term ranking model training, search term ranking method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108710635A (en) * | 2018-04-08 | 2018-10-26 | 达而观信息科技(上海)有限公司 | A kind of content recommendation method and device |
CN109086394A (en) * | 2018-07-27 | 2018-12-25 | 天津字节跳动科技有限公司 | Search ordering method, device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100481141B1 (en) * | 2004-04-17 | 2005-04-07 | 엔에이치엔(주) | System and method for selecting search listings in an internet search engine and ordering the search listings |
US10268763B2 (en) * | 2014-07-25 | 2019-04-23 | Facebook, Inc. | Ranking external content on online social networks |
Also Published As
Publication number | Publication date |
---|---|
CN111651670A (en) | 2020-09-11 |
Similar Documents
Publication | Title |
---|---|
CN111651670B (en) | Content retrieval method, device terminal and storage medium based on user behavior patterns |
US11985037B2 (en) | Systems and methods for conducting more reliable assessments with connectivity statistics |
JP5575902B2 (en) | Information retrieval based on query semantic patterns |
CN107391687B (en) | A Hybrid Recommendation System for Local Chronicle Websites |
CN109063108B (en) | Search ranking method and device, computer equipment and storage medium |
CN104217031B (en) | A kind of method and apparatus that user's classification is carried out according to server search daily record data |
CN105765573B (en) | Improvements in website traffic optimization |
US9569499B2 (en) | Method and apparatus for recommending content on the internet by evaluating users having similar preference tendencies |
CN112231555B (en) | Recall method, device, equipment and storage medium based on user portrait label |
Kong et al. | Predicting search intent based on pre-search context |
CN102054003B (en) | Methods and systems for recommending network information and creating network resource index |
CN109245996B (en) | Mail pushing method and device, computer equipment and storage medium |
US20090228353A1 (en) | Query classification based on query click logs |
US20090287645A1 (en) | Search results with most clicked next objects |
US20150278345A1 (en) | Method, apparatus, and server for acquiring recommended topic |
CN109886772A (en) | Products Show method, apparatus, computer equipment and storage medium |
CN112784141A (en) | Search result quality determination method and device, storage medium and computer equipment |
WO2012097309A1 (en) | Providing search information |
CN111767445A (en) | Data search method, apparatus, computer equipment and storage medium |
CN113468441A (en) | Search sorting method, device, equipment and storage medium based on weight adjustment |
US20140156668A1 (en) | Apparatus and method for indexing electronic content |
CN114625973B (en) | Anonymous information cross-domain recommendation method and device, electronic equipment and storage medium |
CN118172138A (en) | Intelligent recommendation system for electronic commerce |
US9305088B1 (en) | Personalized search results |
CN116993512A (en) | Recommendation method, system, equipment and storage medium for financial products |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |