CN107357891A - A kind of homepage Link Recommendation method - Google Patents
- Publication number
- CN107357891A (application CN201710565551.6A)
- Authority
- CN
- China
- Prior art keywords
- homepage
- keyword
- similarity
- content
- characteristic information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9558—Details of hyperlinks; Management of linked annotations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a homepage link recommendation method comprising the following steps: (1) obtaining search results related to a keyword according to the keyword entered; (2) filtering the search results to extract a list of all homepage links related to the keyword; (3) obtaining the HTML source code of every homepage in the homepage link list from step (2); (4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code; (5) for each homepage, computing the similarity between the homepage and the keyword from the feature information; (6) sorting all homepages by similarity and recommending the homepage link with the highest similarity to the user. Compared with the prior art, the method is simple and easy to implement, and is intended to meet the user's needs to the greatest possible extent.
Description
Technical field
The present invention relates to a web search method, and more particularly to a homepage link recommendation method.
Background
In modern society, web search has become an indispensable source of information. When using a search engine, a person first enters one keyword or a group of keywords, the search engine returns a list of search results, and the person then looks through that list, often blindly, for the links they need. Each search engine has its own technology for supporting keyword retrieval, and both the technology and the ranking method differ from engine to engine, so a user obtains somewhat different results from different search engines. These search engines share a shortcoming: they do not know which field or content the user actually wants to retrieve or is interested in, and the search engine's understanding of a keyword may be far from what the user has in mind. The result list obtained for a search keyword will normally already contain the link the user cares about. However, because each engine's search algorithm is different, the ranking is not optimal for the user, and the link corresponding to the keyword may sit so far down the list that the user cannot find it immediately. For this reason, each search engine develops its own algorithm, gathers information from many sources to "guess" the true meaning of the keyword entered by the user, and returns the search results most likely to meet the user's requirements.
Summary of the invention
The object of the present invention is to overcome the above shortcomings of the prior art by providing a homepage link recommendation method.
The object of the present invention can be achieved through the following technical solution:
A homepage link recommendation method, comprising the following steps:
(1) obtaining search results related to a keyword according to the keyword entered;
(2) filtering the search results to extract a list of all homepage links related to the keyword;
(3) obtaining the HTML source code of every homepage in the homepage link list from step (2);
(4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code;
(5) for each homepage, computing the similarity between the homepage and the keyword from the feature information;
(6) sorting all homepages by similarity and recommending the homepage link with the highest similarity to the user.
Step (2) is specifically: extracting from the search results the links that contain a top-level domain, or a domain name with a country identifier, as the filter result, and forming the homepage link list from them.
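The filter in step (2) can be sketched roughly as follows. This is a minimal sketch, not the patent's implementation: the suffix list reuses the examples the embodiments name (".com", ".org", ".net", ".edu", ".gov", ".mil" and their ".cn" forms), and treating a link as a homepage only when it has no path beyond "/" is an assumption made here. The function names are hypothetical, and URLs are assumed to include a scheme such as "http://".

```python
from urllib.parse import urlsplit

# Suffixes taken from the examples in the text; the names below are illustrative.
GENERIC_TLDS = ("com", "org", "net", "edu", "gov", "mil")
HOMEPAGE_SUFFIXES = tuple(f".{t}" for t in GENERIC_TLDS) + tuple(
    f".{t}.cn" for t in GENERIC_TLDS
)


def is_homepage_link(url: str) -> bool:
    """True when the host ends in a listed suffix and the URL has no path or query.

    Assumption: "a link ending in .com" means the link is the bare site root.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    is_site_root = parts.path in ("", "/") and not parts.query
    return is_site_root and host.endswith(HOMEPAGE_SUFFIXES)


def filter_homepage_links(urls):
    """Keep only the links that look like homepages, preserving their order."""
    return [u for u in urls if is_homepage_link(u)]
```

For instance, `filter_homepage_links` keeps `http://www.example.com` and `https://www.example.com.cn/` but drops `http://www.example.com/products/cup.html`, because the latter points below the site root.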
The multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag.
Step (5) is specifically: determine the weight w_i of the i-th group of feature information, i = 1, 2, ..., n, where n is the total number of groups of feature information, and then determine the similarity between each homepage and the keyword using the following steps:
(a) determine the relevance X_i between the i-th group of feature information and the keyword, i = 1, 2, ..., n;
(c) the similarity F between the homepage being evaluated and the keyword is the weighted sum:
$$F = \sum_{i=1}^{n} w_i X_i$$
Before the similarity between each homepage and the keyword is determined, it must also be judged whether the keyword can be split into words. If it can, the keyword is split into multiple keyword tokens, which serve as the comparison keywords; otherwise the keyword itself serves directly as the comparison keyword. Step (b) is then: split the i-th group of feature information into multiple feature tokens, and take the frequency with which the comparison keywords occur among the feature tokens of the i-th group as the relevance X_i.
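The keyword-splitting check and the frequency count in step (b) can be illustrated as below. This is a sketch under stated assumptions: the text does not name a word segmenter, so `split_phrase` is a hypothetical caller-supplied function, and "frequency" is taken to mean the total number of exact token matches, which is how the worked embodiments count it.

```python
def comparison_keywords(keyword, split_phrase):
    """Return the comparison keywords for the raw keyword.

    split_phrase is a hypothetical segmenter returning a list of tokens.
    If it cannot split the keyword, the keyword itself is used.
    """
    tokens = split_phrase(keyword)
    return tokens if len(tokens) > 1 else [keyword]


def relevance(keywords, feature_tokens):
    """X_i: total occurrences of the comparison keywords among the feature tokens."""
    return sum(feature_tokens.count(k) for k in keywords)


# Whitespace splitting stands in for a real segmenter in this example.
keywords = comparison_keywords("gift cup", str.split)   # ['gift', 'cup']
title_tokens = ["XYZ", "water", "cup", "brand", "official website"]
print(relevance(keywords, title_tokens))                # 1
```

Counting exact token matches means that a compound token such as "teacup" does not count as an occurrence of "cup"; a different matching rule would give different relevance values.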
Compared with the prior art, the present invention has the following advantages:
(1) The invention provides a recommendation method aimed specifically at homepage links. By extracting multiple groups of feature information from the HTML source code and deriving the similarity between each homepage and the keyword from them, the recommendation results conform more closely to the user's needs.
(2) When computing the similarity between a homepage and the keyword, the invention determines the relevance of each group of feature information by a simple comparison that yields a frequency, assigns each group a weight that reflects its importance, and finally obtains the similarity by weighted summation. The method is simple and convenient, and because the keyword entered by the user remains the search target throughout, the search results come closer to what the user needs and satisfy the user's requirements to a high degree.
Brief description of the drawings
Fig. 1 is a flow block diagram of the homepage link recommendation method of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawing and specific embodiments.
Embodiment 1
As shown in Fig. 1, a homepage link recommendation method comprises the following steps:
(1) obtaining search results related to a keyword according to the keyword entered;
(2) filtering the search results to extract a list of all homepage links related to the keyword;
(3) obtaining the HTML source code of every homepage in the homepage link list from step (2);
(4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code;
(5) for each homepage, computing the similarity between the homepage and the keyword from the feature information;
(6) sorting all homepages by similarity and recommending the homepage link with the highest similarity to the user; alternatively, all homepage links may be presented to the user in descending order of similarity so that the user can choose among them.
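The six steps above can be wired together as in the following sketch. It is only an outline under assumptions: the text does not specify a search engine, an HTTP client, or a scoring function, so each stage is passed in as a callable, and all parameter names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def recommend_homepages(
    keyword: str,
    search: Callable[[str], Iterable[str]],
    is_homepage: Callable[[str], bool],
    fetch_html: Callable[[str], str],
    extract_features: Callable[[str], list],
    similarity: Callable[[str, list], float],
) -> List[Tuple[str, float]]:
    """Return (url, similarity) pairs sorted from highest to lowest similarity."""
    results = search(keyword)                                 # step (1)
    homepages = [u for u in results if is_homepage(u)]        # step (2)
    scored = []
    for url in homepages:
        html = fetch_html(url)                                # step (3)
        features = extract_features(html)                     # step (4)
        scored.append((url, similarity(keyword, features)))   # step (5)
    scored.sort(key=lambda pair: pair[1], reverse=True)       # step (6)
    return scored


# Toy stand-ins for the real stages, for illustration only.
pages = {"http://a.com": "cup cup", "http://b.com": "cup", "http://c.com/x": "cup cup cup"}
ranked = recommend_homepages(
    "cup",
    search=lambda kw: list(pages),
    is_homepage=lambda url: url.count("/") == 2,
    fetch_html=pages.get,
    extract_features=lambda html: [html.split()],
    similarity=lambda kw, groups: sum(g.count(kw) for g in groups),
)
print(ranked)  # [('http://a.com', 2), ('http://b.com', 1)]
```

The first element of the returned list corresponds to the single recommended homepage, and the full list corresponds to the alternative in step (6) of presenting every homepage in descending order.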
Step (2) is specifically: extracting from the search results the links that contain a top-level domain, or a domain name with a country identifier, as the filter result, and forming the homepage link list from them. The top-level domains and domain names with a country identifier referred to here are the top-level domains published by the international Internet Network Information Center, such as ".com", ".org", ".net", ".edu", ".gov" and ".mil", together with domain names carrying "cn" to indicate that they belong to China, such as ".com.cn", ".org.cn", ".net.cn", ".edu.cn", ".gov.cn" and ".mil.cn".
The multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag. Here the content of the title tag is the content of the <title> tag; the content of the keywords sub-tag is the content of the <keywords> sub-tag within the <metadata> tag; and the content of the description sub-tag is the content of the <description> sub-tag within the <metadata> tag.
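Extracting the four groups of feature information could look roughly like the sketch below, which uses Python's standard `html.parser` module. Two things here are assumptions rather than statements from the text: the keywords and description sub-tags are read from standard HTML `<meta name="keywords">` and `<meta name="description">` elements, and the public security network filing content is located by searching for the marker string "公网安备" followed by a number. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collects the <title> text and the keywords/description meta contents."""

    def __init__(self):
        super().__init__()
        self.features = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.features[name] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.features["title"] += data


# Assumed shape of a filing notice: optional province character, marker, digits.
FILING_PATTERN = re.compile(r"[\u4e00-\u9fff]?公网安备\s*\d+号")


def extract_features(html: str) -> dict:
    """Return title, filing, keywords and description strings for one page."""
    parser = FeatureParser()
    parser.feed(html)
    parser.close()
    features = dict(parser.features)
    match = FILING_PATTERN.search(html)
    features["filing"] = match.group(0) if match else ""
    return features
```

Each returned string would still need to be segmented into tokens before it can be compared with the keyword; pages without a filing notice simply yield an empty string for that group, as in the first embodiment below.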
Step (5) is specifically: determine the weight w_i of the i-th group of feature information, i = 1, 2, ..., n, where n is the total number of groups of feature information, and then determine the similarity between each homepage and the keyword using the following steps:
(a) determine the relevance X_i between the i-th group of feature information and the keyword, i = 1, 2, ..., n;
(c) the similarity F between the homepage being evaluated and the keyword is the weighted sum:
$$F = \sum_{i=1}^{n} w_i X_i$$
Before the similarity between each homepage and the keyword is determined, it must also be judged whether the keyword can be split into words. If it can, the keyword is split into multiple keyword tokens, which serve as the comparison keywords; otherwise the keyword itself serves directly as the comparison keyword. Step (b) is then: split the i-th group of feature information into multiple feature tokens, and take the frequency with which the comparison keywords occur among the feature tokens of the i-th group as the relevance X_i.
This embodiment illustrates a specific implementation of the invention, taking the user input "gift cup" as the keyword. The user enters "gift cup" (denoted A). The HTML source code of the search result list page is analyzed, and the links ending in ".com" and ".com.cn" are selected to obtain a list of company homepages. For each homepage, the HTML source code of the page is analyzed to obtain four groups of feature information: the content of the title tag (the content of the <title> tag), the public security network filing content, the content of the keywords sub-tag in the metadata tag (the content of the <keywords> sub-tag within the <metadata> tag), and the content of the description sub-tag in the metadata tag (the content of the <description> sub-tag within the <metadata> tag). These four contents are denoted B, C, D and E in that order. The weights corresponding to B, C, D and E are set to w1 = 1, w2 = 2, w3 = 1, w4 = 1. The contents of A, B, C, D and E are then segmented into words. Suppose that, for one of the homepages, the following segmented content is obtained:
A tokens: gift, cup
B tokens: XYZ, water, cup, brand, official website, Shenzhen, XX, daily necessities, Co., Ltd.
C tokens: (no content here)
D tokens: water, cup, sports, water, cup, insulated, cup, space, cup, car, cup, teacup, cup, cup, couple, cup, gift, cup, office, cup, anion, water, cup, XX, daily, necessities
E tokens: XX, daily necessities, Co., Ltd., wholeheartedly, builds, XYZ, brand, is, China, outdoor, water gear, top, one of, brand, sports, water, cup, thermos cup, outdoor, camping kettle, children, water, cup, best-selling, at home and abroad, well received by the public
The numbers of times "gift" from the A tokens occurs in the B, C, D and E tokens are 0, 0, 1 and 0 respectively; the numbers of times "cup" from the A tokens occurs in the B, C, D and E tokens are 1, 0, 9 and 2 respectively. Therefore X1 = 1, X2 = 0, X3 = 10, X4 = 2, and the similarity F for this homepage is:
F = 1 × 1 + 0 × 2 + 10 × 1 + 2 × 1 = 13.
The similarity of every other homepage is calculated in the same way, and the homepages are then sorted in descending order of similarity and recommended to the user.
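The arithmetic of this embodiment can be checked with a short sketch. The relevance values X1 to X4 and the weights are taken directly from the text above; the function name `weighted_similarity` is hypothetical.

```python
def weighted_similarity(weights, relevances):
    """F = sum of w_i * X_i over all groups of feature information."""
    if len(weights) != len(relevances):
        raise ValueError("weights and relevances must have the same length")
    return sum(w * x for w, x in zip(weights, relevances))


weights = [1, 2, 1, 1]       # w1..w4 for B, C, D, E
relevances = [1, 0, 10, 2]   # X1..X4 as stated in Embodiment 1
print(weighted_similarity(weights, relevances))  # 13
```

Because the filing content C is empty for this homepage, its higher weight w2 = 2 contributes nothing, and the keywords group D dominates the score.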
Embodiment 2
This embodiment illustrates a specific implementation of the invention, taking the user input "red point design" as the keyword. The user enters "red point design" (denoted A). The HTML source code of the search result list page is analyzed, and the links ending in ".com" and ".com.cn" are selected to obtain a list of company homepages. For each homepage, the HTML source code of the page is analyzed to obtain four groups of feature information: the content of the title tag (the content of the <title> tag), the public security network filing content, the content of the keywords sub-tag in the metadata tag (the content of the <keywords> sub-tag within the <metadata> tag), and the content of the description sub-tag in the metadata tag (the content of the <description> sub-tag within the <metadata> tag). These four contents are denoted B, C, D and E in that order. The weights corresponding to B, C, D and E are set to w1 = 1, w2 = 2, w3 = 1, w4 = 1. The contents of A, B, C, D and E are then segmented into words. Suppose that, for one of the homepages, the following segmented content is obtained:
A tokens: red point, design
B tokens: red point, design, Shanghai, Co., Ltd.
C tokens: Shanghai, red point, design, Co., Ltd.
D tokens: brand, design, design, company, VI, design, design, company
E tokens: red point, brand, design, brand, planning, brand, design, brand, management, overall, high-end, brand integration, planning, design, team
The numbers of times "red point" from the A tokens occurs in the B, C, D and E tokens are 1, 1, 0 and 1 respectively; the numbers of times "design" from the A tokens occurs in the B, C, D and E tokens are 1, 1, 4 and 3 respectively. Therefore X1 = 2, X2 = 2, X3 = 4, X4 = 4, and the similarity F for this homepage is:
F = 2 × 1 + 2 × 2 + 4 × 1 + 4 × 1 = 14.
The similarity of every other homepage is calculated in the same way, and the homepages are then sorted in descending order of similarity and recommended to the user.
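This embodiment can also be reproduced end to end from the token lists, as in the sketch below. The tokens are English renderings of the segmented content listed above, "frequency" is taken to mean exact token matches, and the function name `score_homepage` is hypothetical.

```python
def score_homepage(keyword_tokens, feature_groups, weights):
    """Return ([X_1..X_n], F) for one homepage.

    X_i counts how often the keyword tokens occur in group i,
    and F is the weighted sum of the X_i.
    """
    relevances = [
        sum(group.count(k) for k in keyword_tokens) for group in feature_groups
    ]
    similarity = sum(w * x for w, x in zip(weights, relevances))
    return relevances, similarity


A = ["red point", "design"]
B = ["red point", "design", "Shanghai", "Co., Ltd."]
C = ["Shanghai", "red point", "design", "Co., Ltd."]
D = ["brand", "design", "design", "company", "VI", "design", "design", "company"]
E = ["red point", "brand", "design", "brand", "planning", "brand", "design",
     "brand", "management", "overall", "high-end", "brand integration",
     "planning", "design", "team"]

X, F = score_homepage(A, [B, C, D, E], weights=[1, 2, 1, 1])
print(X, F)  # [2, 2, 4, 4] 14
```

Here the filing content C is present and matches both keyword tokens, so its weight of 2 contributes 4 of the 14 points.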
Claims (5)
- 1. A homepage link recommendation method, characterized in that the method comprises the following steps: (1) obtaining search results related to a keyword according to the keyword entered; (2) filtering the search results to extract a list of all homepage links related to the keyword; (3) obtaining the HTML source code of every homepage in the homepage link list from step (2); (4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code; (5) for each homepage, computing the similarity between the homepage and the keyword from the feature information; (6) sorting all homepages by similarity and recommending the homepage link with the highest similarity to the user.
- 2. The homepage link recommendation method according to claim 1, characterized in that step (2) is specifically: extracting from the search results the links that contain a top-level domain, or a domain name with a country identifier, as the filter result, and forming the homepage link list from them.
- 3. The homepage link recommendation method according to claim 1, characterized in that the multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag.
- 4. The homepage link recommendation method according to claim 1, characterized in that step (5) is specifically: determining the weight w_i of the i-th group of feature information, i = 1, 2, ..., n, where n is the total number of groups of feature information, and then determining the similarity between each homepage and the keyword using the following steps: (a) determining the relevance X_i between the i-th group of feature information and the keyword, i = 1, 2, ..., n; (c) the similarity F between the homepage being evaluated and the keyword is: $F = \sum_{i=1}^{n} w_i X_i$.
- 5. The homepage link recommendation method according to claim 4, characterized in that, before the similarity between each homepage and the keyword is determined, it is also judged whether the keyword can be split into words; if it can, the keyword is split into multiple keyword tokens, which serve as the comparison keywords; otherwise the keyword itself serves directly as the comparison keyword; and step (b) is then: splitting the i-th group of feature information into multiple feature tokens, and taking the frequency with which the comparison keywords occur among the feature tokens of the i-th group as the relevance X_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710565551.6A CN107357891A (en) | 2017-07-12 | 2017-07-12 | A kind of homepage Link Recommendation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107357891A true CN107357891A (en) | 2017-11-17 |
Family
ID=60291945
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710565551.6A Pending CN107357891A (en) | 2017-07-12 | 2017-07-12 | A kind of homepage Link Recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107357891A (en) |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101641697A (en) * | 2007-03-23 | 2010-02-03 | 微软公司 | Related search queries for a webpage and their applications |
CN101158971A (en) * | 2007-11-15 | 2008-04-09 | 深圳市迅雷网络技术有限公司 | Method and device for sorting search results based on search engine |
CN101546309A (en) * | 2008-03-26 | 2009-09-30 | 国际商业机器公司 | Method and equipment for constructing indexes to resource content in computer network |
CN101369276A (en) * | 2008-09-28 | 2009-02-18 | 杭州电子科技大学 | A Forensics Method of Web Browser Cache Data |
US20150006506A1 (en) * | 2009-03-04 | 2015-01-01 | Alibaba Group Holding Limited | Evaluation of web pages |
CN101630327A (en) * | 2009-08-14 | 2010-01-20 | 昆明理工大学 | Design method of theme network crawler system |
CN101853308A (en) * | 2010-06-11 | 2010-10-06 | 中兴通讯股份有限公司 | Method and application terminal for personalized meta-search |
CN101968819A (en) * | 2010-11-05 | 2011-02-09 | 中国传媒大学 | Audio/video intelligent catalog information acquisition method facing to wide area network |
US20140229601A1 (en) * | 2011-09-22 | 2014-08-14 | Beijing Qihoo Technology Company Limited | URL Navigation Page Generation Method, Device and Program |
CN104216931A (en) * | 2013-05-29 | 2014-12-17 | 酷盛(天津)科技有限公司 | Real-time recommending system and method |
CN103310014A (en) * | 2013-07-02 | 2013-09-18 | 北京航空航天大学 | Method for improving accuracy of search result |
CN106033428A (en) * | 2015-03-11 | 2016-10-19 | 北大方正集团有限公司 | Uniform resource locator selection method and uniform resource locator selection device |
CN105786951A (en) * | 2015-12-31 | 2016-07-20 | 北京金山安全软件有限公司 | Method and device for extracting content blocks in webpage and server |
Non-Patent Citations (1)
Title |
---|
任丽芸等: ""搜索引擎网页排序算法研究综述"", 《电脑与电信》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114615511A (en) * | 2020-12-09 | 2022-06-10 | 上海哔哩哔哩科技有限公司 | Bullet screen key content skipping method and bullet screen skipping method |
CN114630194A (en) * | 2020-12-09 | 2022-06-14 | 上海哔哩哔哩科技有限公司 | Method, system, equipment and computer readable storage medium for bullet screen jump link |
US11843843B2 (en) | 2020-12-09 | 2023-12-12 | Shanghai Bilibili Technology Co., Ltd. | Bullet screen key content jump method and bullet screen jump method |
CN114630194B (en) * | 2020-12-09 | 2023-12-19 | 上海哔哩哔哩科技有限公司 | Bullet screen jump linking method, system, equipment and computer readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171117 |