CN107357891A - A homepage link recommendation method - Google Patents

A homepage link recommendation method

Info

Publication number
CN107357891A
Authority
CN
China
Prior art keywords
homepage
keyword
similarity
content
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710565551.6A
Other languages
Chinese (zh)
Inventor
陈刚
何积丰
张新阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Open Source Cloud Data Technology (shanghai) Co Ltd
Original Assignee
Open Source Cloud Data Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Open Source Cloud Data Technology (shanghai) Co Ltd
Priority to CN201710565551.6A
Publication of CN107357891A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a homepage link recommendation method comprising the following steps: (1) obtaining search results related to a keyword according to the input keyword; (2) filtering the search results to extract a list of all homepage links related to the keyword; (3) obtaining the HTML source code of every homepage in the homepage link list from step (2); (4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code; (5) for each homepage, computing the similarity between the homepage and the keyword from the feature information; (6) ranking all homepages by similarity and recommending the link of the homepage with the highest similarity to the user. Compared with the prior art, the method is simple and easy to implement, and can meet user needs to the greatest extent.

Description

A homepage link recommendation method
Technical field
The present invention relates to a network search method, and more particularly to a homepage link recommendation method.
Background art
In modern society, web search has become an indispensable source of information. When using a search engine, a user first enters one keyword or a group of keywords, the search engine returns a list of search results, and the user then picks the links they need out of that list. Each search engine has its own technology for keyword retrieval, and the techniques and ranking methods differ between engines, so a user who tries different search engines gets somewhat different results. The shortcoming of these search engines is that they do not know which field or content the user actually wants to retrieve or is interested in, and the search engine's understanding of a keyword may differ greatly from what the user has in mind. The result list obtained for a search keyword should of course already contain the link the user cares about. However, because each engine's search algorithm is different, the ranking is not optimal for the user, and the link corresponding to the keyword the user is looking for may sit so far down the result list that the user cannot find it immediately. Each search engine therefore develops its own algorithm, collects information from many sources, "guesses" the true meaning of the keyword entered by the user, and returns the search results most likely to meet the user's requirements.
Summary of the invention
The object of the present invention is to overcome the above shortcomings of the prior art by providing a homepage link recommendation method.
The object of the invention is achieved by the following technical solution:
A homepage link recommendation method, comprising the following steps:
(1) obtaining search results related to a keyword according to the input keyword;
(2) filtering the search results to extract a list of all homepage links related to the keyword;
(3) obtaining the HTML source code of every homepage in the homepage link list from step (2);
(4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code;
(5) for each homepage, computing the similarity between the homepage and the keyword from the feature information;
(6) ranking all homepages by similarity and recommending the link of the homepage with the highest similarity to the user.
Step (2) is specifically: links containing a top-level domain, and links with a national domain name, are extracted from the search results as the filter result and form the homepage link list.
The multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag.
Step (5) is specifically: determine the weight w_i of the i-th group of feature information, i = 1, 2, …, n, where n is the total number of feature information groups; the similarity between each homepage and the keyword is then determined by the following steps:
(a) determine the relevance X_i of the i-th group of feature information to the keyword, i = 1, 2, …, n;
(c) the similarity F between the homepage under evaluation and the keyword is:
$$F = \sum_{i=1}^{n} w_i X_i$$
Before the similarity between each homepage and the keyword is determined, it must also be judged whether the keyword can be split into phrases. If it can, the keyword is split into multiple keyword segments, which serve as the comparison keywords; otherwise the keyword itself serves as the comparison keyword. Step (b) is then: split the i-th group of feature information into multiple feature segments, and take the number of times the comparison keywords occur among the feature segments of the i-th group as the relevance X_i.
Compared with the prior art, the invention has the following advantages:
(1) The invention provides a recommendation method aimed specifically at homepage links. By extracting multiple groups of feature information from the HTML source code and deriving from them the similarity between each homepage and the keyword, it produces recommendations that better match user needs.
(2) When computing the similarity between a homepage and the keyword, the invention determines the relevance of each group of feature information by a simple frequency comparison, assigns each group a weight according to its importance, and finally obtains the similarity by weighted summation. The method is simple and convenient, and it always takes the keyword entered by the user as the search target, so the search results are closer to what the user needs and meet user requirements to a high degree.
Brief description of the drawings
Fig. 1 is a flow chart of the homepage link recommendation method of the invention.
Detailed description
The invention is described in detail below with reference to the accompanying drawing and specific embodiments.
Embodiment 1
As shown in Fig. 1, a homepage link recommendation method comprises the following steps:
(1) obtaining search results related to a keyword according to the input keyword;
(2) filtering the search results to extract a list of all homepage links related to the keyword;
(3) obtaining the HTML source code of every homepage in the homepage link list from step (2);
(4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code;
(5) for each homepage, computing the similarity between the homepage and the keyword from the feature information;
(6) ranking all homepages by similarity and recommending the link of the homepage with the highest similarity to the user; alternatively, all homepage links may be recommended to the user in descending order of similarity for the user to choose from.
Step (2) is specifically: links containing a top-level domain, and links with a national domain name, are extracted from the search results as the filter result and form the homepage link list. The top-level domains and domain names with a country identifier referred to here are the top-level domains published by the international Internet Network Information Center, such as ".com", ".org", ".net", ".edu", ".gov" and ".mil", together with domain names carrying "cn", which indicates a Chinese domain name, such as ".com.cn", ".org.cn", ".net.cn", ".edu.cn", ".gov.cn" and ".mil.cn".
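The filtering in step (2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the suffix tuple and the rule that only root URLs (no path or query) count as homepage links are assumptions made for the example.

```python
from urllib.parse import urlparse

# Suffixes named in the text; collecting them in a tuple is an illustrative choice.
GENERIC_TLDS = (".com", ".org", ".net", ".edu", ".gov", ".mil")
HOMEPAGE_SUFFIXES = GENERIC_TLDS + tuple(s + ".cn" for s in GENERIC_TLDS)


def is_homepage_link(url: str) -> bool:
    """Return True if the URL is a bare domain ending in one of the listed suffixes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption: a homepage link has no path beyond "/" and no query string.
    at_root = parsed.path in ("", "/") and not parsed.query
    return at_root and host.endswith(HOMEPAGE_SUFFIXES)


def filter_homepage_links(urls):
    """Keep homepage links, preserving order and dropping duplicates."""
    seen, kept = set(), []
    for url in urls:
        if is_homepage_link(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

A search result such as "http://www.example.com/products/cup.html" would be dropped under these assumptions, while "http://www.example.com/" would be kept.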
The multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag. The content of the title tag is the content of the <title> tag; the content of the keywords sub-tag in the metadata tag is the content of the <keywords> sub-tag in the <meta data> tag; and the content of the description sub-tag in the metadata tag is the content of the <description> sub-tag in the <meta data> tag.
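A hedged sketch of the feature extraction follows, using Python's standard html.parser module. It assumes that the keywords and description sub-tags correspond to the usual HTML forms <meta name="keywords" content="…"> and <meta name="description" content="…">. The class and function names are hypothetical, and the public security network filing content is left out because the text does not say where it appears in the page.

```python
from html.parser import HTMLParser


class FeatureParser(HTMLParser):
    """Collect the <title> text and the keywords/description <meta> contents."""

    def __init__(self):
        super().__init__()
        self.features = {"title": "", "keywords": "", "description": ""}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.features[name] = attr.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text nodes are only kept while inside <title>.
        if self._in_title:
            self.features["title"] += data


def extract_features(html: str) -> dict:
    """Return the three feature strings found in one homepage's HTML source."""
    parser = FeatureParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.features.items()}
```

Each returned string would then be segmented and compared with the keyword as described for step (5).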
Step (5) is specifically: determine the weight w_i of the i-th group of feature information, i = 1, 2, …, n, where n is the total number of feature information groups; the similarity between each homepage and the keyword is then determined by the following steps:
(a) determine the relevance X_i of the i-th group of feature information to the keyword, i = 1, 2, …, n;
(c) the similarity F between the homepage under evaluation and the keyword is:
$$F = \sum_{i=1}^{n} w_i X_i$$
Before the similarity between each homepage and the keyword is determined, it must also be judged whether the keyword can be split into phrases. If it can, the keyword is split into multiple keyword segments, which serve as the comparison keywords; otherwise the keyword itself serves as the comparison keyword. Step (b) is then: split the i-th group of feature information into multiple feature segments, and take the number of times the comparison keywords occur among the feature segments of the i-th group as the relevance X_i.
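The keyword splitting and the frequency count for a single feature group might look like the sketch below. The whitespace segmenter is only a stand-in assumption: the text does not name a word segmentation tool, and Chinese text would need a proper segmenter. All function names are hypothetical.

```python
def whitespace_segment(text):
    """Stand-in segmenter (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def comparison_keywords(keyword, segment=whitespace_segment):
    """Split the keyword into segments; fall back to the keyword itself if nothing is produced."""
    parts = segment(keyword)
    return parts or [keyword]


def relevance(feature_text, keyword, segment=whitespace_segment):
    """X_i: total occurrences of the comparison keywords among the feature segments."""
    tokens = segment(feature_text)
    return sum(tokens.count(k) for k in comparison_keywords(keyword, segment))
```

For instance, relevance("water cup sports water cup gift cup", "gift cup") counts one occurrence of "gift" and three of "cup".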
This embodiment illustrates a specific implementation of the invention, taking the user input "gift cup" as the keyword. The user enters "gift cup" (denoted A). The HTML source code of the search result list page is analyzed and the links ending in ".com" and ".com.cn" are selected, giving a list of company homepages. For each homepage, the HTML source code of the page is analyzed to obtain 4 groups of feature information: the content of the title tag (the content of the <title> tag), the public security network filing content, the content of the keywords sub-tag in the metadata tag (the content of the <keywords> sub-tag in the <meta data> tag), and the content of the description sub-tag in the metadata tag (the content of the <description> sub-tag in the <meta data> tag). These four contents are denoted B, C, D and E in that order. The weights corresponding to B, C, D and E are set to w1=1, w2=2, w3=1, w4=1. The contents of A, B, C, D and E are then segmented. Suppose that for one of the homepages the following segments are obtained:
Segments of A: gift, cup
Segments of B: XYZ, water, cup, brand, official website, Shenzhen, XX, daily necessities, Co., Ltd.
Segments of C: (no content).
Segments of D: water, cup, sports, water, cup, insulation, cup, space, cup, automobile, cup, teacup, cup, cup, lovers, cup, gift, cup, office, cup, anion, water, cup, XX, life, articles for use
Segments of E: XX, daily necessities, Co., Ltd., wholeheartedly, build, XYZ, brand, is, China, outdoor, water gear, top, brand, one of, sports, water, cup, thermos cup, outdoor, camping kettle, children, water, cup, best-selling, at home and abroad, well received.
The numbers of times "gift" from the segments of A occurs in the segments of B, C, D and E are 0, 0, 1 and 0 respectively; the numbers of times "cup" occurs in the segments of B, C, D and E are 1, 0, 9 and 2 respectively. Therefore X1=1, X2=0, X3=10, X4=2, and the similarity F for this homepage is:
F=1 × 1+0 × 2+10 × 1+2 × 1=13.
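The arithmetic of this example can be checked with a few lines of Python. The sketch takes the occurrence counts exactly as stated in the text rather than recounting the segments; the variable names are illustrative.

```python
# Weights w1..w4 for the feature groups B, C, D, E, as given in the embodiment.
weights = [1, 2, 1, 1]

# Occurrence counts stated in the text for each keyword segment in B, C, D, E.
gift_counts = [0, 0, 1, 0]
cup_counts = [1, 0, 9, 2]

# X_i is the combined count of all keyword segments in group i.
relevances = [g + c for g, c in zip(gift_counts, cup_counts)]

# F is the weighted sum of the relevances.
similarity = sum(w * x for w, x in zip(weights, relevances))

print(relevances, similarity)  # [1, 0, 10, 2] 13
```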
The similarity of every homepage is calculated in the same way, and the homepages are then sorted in descending order of similarity and recommended to the user.
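The ranking in step (6) reduces to a descending sort on the similarity values. In the sketch below, the function name, the top_only flag and the example URLs and scores are hypothetical; the two return modes mirror the two options described in step (6).

```python
def rank_homepages(scores, top_only=False):
    """Sort homepage URLs by similarity, highest first.

    scores maps a homepage URL to its similarity F. With top_only=True only
    the best link is returned; otherwise the whole ranked list is returned.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    urls = [url for url, _ in ranked]
    return urls[:1] if top_only else urls


# Hypothetical scores for three homepages.
example_scores = {
    "http://a.example.com/": 13,
    "http://b.example.com/": 14,
    "http://c.example.com/": 5,
}
print(rank_homepages(example_scores))
```

Python's sort is stable, so homepages with equal similarity keep their original search result order.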
Embodiment 2
This embodiment illustrates a specific implementation of the invention, taking the user input "red dot design" as the keyword. The user enters "red dot design" (denoted A). The HTML source code of the search result list page is analyzed and the links ending in ".com" and ".com.cn" are selected, giving a list of company homepages. For each homepage, the HTML source code of the page is analyzed to obtain 4 groups of feature information: the content of the title tag (the content of the <title> tag), the public security network filing content, the content of the keywords sub-tag in the metadata tag (the content of the <keywords> sub-tag in the <meta data> tag), and the content of the description sub-tag in the metadata tag (the content of the <description> sub-tag in the <meta data> tag). These four contents are denoted B, C, D and E in that order. The weights corresponding to B, C, D and E are set to w1=1, w2=2, w3=1, w4=1. The contents of A, B, C, D and E are then segmented. Suppose that for one of the homepages the following segments are obtained:
Segments of A: red dot, design
Segments of B: red dot, design, Shanghai, Co., Ltd.
Segments of C: Shanghai, red dot, design, Co., Ltd.
Segments of D: brand, design, design, company, VI, design, design, company
Segments of E: red dot, brand, design, brand, planning, brand, design, brand, management, integrated, high-end, brand, integration, planning, design, team
The numbers of times "red dot" from the segments of A occurs in the segments of B, C, D and E are 1, 1, 0 and 1 respectively, and the numbers of times "design" occurs in the segments of B, C, D and E are 1, 1, 4 and 3 respectively. Therefore X1=2, X2=2, X3=4, X4=4, and the similarity F for this homepage is:
F=2 × 1+2 × 2+4 × 1+4 × 1=14.
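This example can also be reproduced from the segments themselves. The sketch below uses English renderings of the segment lists (an assumption, since the original segments are Chinese), counts the keyword segments in each group with collections.Counter, and applies the stated weights. The names are illustrative.

```python
from collections import Counter

# Segments of the keyword A and of the feature groups B, C, D, E (English renderings).
keyword_tokens = ["red dot", "design"]
feature_groups = {
    "B": ["red dot", "design", "Shanghai", "Co., Ltd."],
    "C": ["Shanghai", "red dot", "design", "Co., Ltd."],
    "D": ["brand", "design", "design", "company", "VI", "design", "design", "company"],
    "E": ["red dot", "brand", "design", "brand", "planning", "brand", "design", "brand",
          "management", "integrated", "high-end", "brand", "integration", "planning",
          "design", "team"],
}
weights = {"B": 1, "C": 2, "D": 1, "E": 1}  # w1..w4 from the embodiment


def relevance(tokens, keywords):
    """X_i: how often the keyword segments occur among one group's segments."""
    counts = Counter(tokens)
    return sum(counts[k] for k in keywords)


x = {name: relevance(tokens, keyword_tokens) for name, tokens in feature_groups.items()}
similarity = sum(weights[name] * x[name] for name in feature_groups)

print(x, similarity)  # {'B': 2, 'C': 2, 'D': 4, 'E': 4} 14
```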
The similarity of every homepage is calculated in the same way, and the homepages are then sorted in descending order of similarity and recommended to the user.

Claims (5)

  1. A homepage link recommendation method, characterized in that the method comprises the following steps:
    (1) obtaining search results related to a keyword according to the input keyword;
    (2) filtering the search results to extract a list of all homepage links related to the keyword;
    (3) obtaining the HTML source code of every homepage in the homepage link list from step (2);
    (4) for each homepage, extracting multiple groups of feature information from the corresponding HTML source code;
    (5) for each homepage, computing the similarity between the homepage and the keyword from the feature information;
    (6) ranking all homepages by similarity and recommending the link of the homepage with the highest similarity to the user.
  2. The homepage link recommendation method according to claim 1, characterized in that step (2) is specifically: links containing a top-level domain, and links with a national domain name, are extracted from the search results as the filter result and form the homepage link list.
  3. The homepage link recommendation method according to claim 1, characterized in that the multiple groups of feature information in step (3) include: the content of the title tag, the public security network filing content, the content of the keywords sub-tag in the metadata tag, and the content of the description sub-tag in the metadata tag.
  4. The homepage link recommendation method according to claim 1, characterized in that step (5) is specifically: determine the weight w_i of the i-th group of feature information, i = 1, 2, …, n, where n is the total number of feature information groups; the similarity between each homepage and the keyword is then determined by the following steps:
    (a) determine the relevance X_i of the i-th group of feature information to the keyword, i = 1, 2, …, n;
    (c) the similarity F between the homepage under evaluation and the keyword is:
    $$F = \sum_{i=1}^{n} w_i X_i$$
  5. The homepage link recommendation method according to claim 4, characterized in that before the similarity between each homepage and the keyword is determined, it must also be judged whether the keyword can be split into phrases; if it can, the keyword is split into multiple keyword segments, which serve as the comparison keywords; otherwise the keyword itself serves as the comparison keyword; step (b) is then: split the i-th group of feature information into multiple feature segments, and take the number of times the comparison keywords occur among the feature segments of the i-th group as the relevance X_i.
CN201710565551.6A 2017-07-12 2017-07-12 A kind of homepage Link Recommendation method Pending CN107357891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710565551.6A CN107357891A (en) 2017-07-12 2017-07-12 A kind of homepage Link Recommendation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710565551.6A CN107357891A (en) 2017-07-12 2017-07-12 A kind of homepage Link Recommendation method

Publications (1)

Publication Number Publication Date
CN107357891A (en) 2017-11-17

Family

ID=60291945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710565551.6A Pending CN107357891A (en) 2017-07-12 2017-07-12 A kind of homepage Link Recommendation method

Country Status (1)

Country Link
CN (1) CN107357891A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615511A (en) * 2020-12-09 2022-06-10 上海哔哩哔哩科技有限公司 Bullet screen key content skipping method and bullet screen skipping method
CN114630194A (en) * 2020-12-09 2022-06-14 上海哔哩哔哩科技有限公司 Method, system, equipment and computer readable storage medium for bullet screen jump link

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101158971A (en) * 2007-11-15 2008-04-09 深圳市迅雷网络技术有限公司 Method and device for sorting search results based on search engine
CN101369276A (en) * 2008-09-28 2009-02-18 杭州电子科技大学 A Forensics Method of Web Browser Cache Data
CN101546309A (en) * 2008-03-26 2009-09-30 国际商业机器公司 Method and equipment for constructing indexes to resource content in computer network
CN101630327A (en) * 2009-08-14 2010-01-20 昆明理工大学 Design method of theme network crawler system
CN101641697A (en) * 2007-03-23 2010-02-03 微软公司 Related search queries for a webpage and their applications
CN101853308A (en) * 2010-06-11 2010-10-06 中兴通讯股份有限公司 Method and application terminal for personalized meta-search
CN101968819A (en) * 2010-11-05 2011-02-09 中国传媒大学 Audio/video intelligent catalog information acquisition method facing to wide area network
CN103310014A (en) * 2013-07-02 2013-09-18 北京航空航天大学 Method for improving accuracy of search result
US20140229601A1 (en) * 2011-09-22 2014-08-14 Beijing Qihoo Technology Company Limited URL Navigation Page Generation Method, Device and Program
CN104216931A (en) * 2013-05-29 2014-12-17 酷盛(天津)科技有限公司 Real-time recommending system and method
US20150006506A1 (en) * 2009-03-04 2015-01-01 Alibaba Group Holding Limited Evaluation of web pages
CN105786951A (en) * 2015-12-31 2016-07-20 北京金山安全软件有限公司 Method and device for extracting content blocks in webpage and server
CN106033428A (en) * 2015-03-11 2016-10-19 北大方正集团有限公司 Uniform resource locator selection method and uniform resource locator selection device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任丽芸等: ""搜索引擎网页排序算法研究综述"", 《电脑与电信》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114615511A (en) * 2020-12-09 2022-06-10 上海哔哩哔哩科技有限公司 Bullet screen key content skipping method and bullet screen skipping method
CN114630194A (en) * 2020-12-09 2022-06-14 上海哔哩哔哩科技有限公司 Method, system, equipment and computer readable storage medium for bullet screen jump link
US11843843B2 (en) 2020-12-09 2023-12-12 Shanghai Bilibili Technology Co., Ltd. Bullet screen key content jump method and bullet screen jump method
CN114630194B (en) * 2020-12-09 2023-12-19 上海哔哩哔哩科技有限公司 Bullet screen jump linking method, system, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN104143005B (en) A kind of related search system and method
KR101450358B1 (en) Searching structured geographical data
CN101320375B (en) Digital Book Search Method Based on User Click Behavior
KR100814667B1 (en) Systems and methods for clustering search results
CN103177075B (en) The detection of Knowledge based engineering entity and disambiguation
US8682881B1 (en) System and method for extracting structured data from classified websites
JP5212610B2 (en) Representative image or representative image group display system, method and program thereof, and representative image or representative image group selection system, method and program thereof
CN105205689A (en) Method and system for recommending commercial tenant
US20080162514A1 (en) System and method for generating a relationship network
JP2005085285A5 (en)
CN102194006B (en) Search system and method capable of gathering personalized features of group
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
Hauff et al. Placing images on the world map: a microblog-based enrichment approach
CN107590232A (en) A kind of resource recommendation system and method based on Network Study Environment
CN104699838B (en) A kind of Webpage search method for pushing, and more site searches combined method
CN106599215A (en) Question generation method and question generation system based on deep learning
CN103294692A (en) Information recommendation method and system
CN102375813A (en) Duplicate detection system and method for search engines
CN113282834A (en) Web search intelligent ordering method, system and computer storage medium based on mobile internet data deep mining
CN110970112A (en) Method and system for constructing knowledge graph for nutrition and health
CN102467544B (en) Information smart searching method and system based on space fuzzy coding
CN107357891A (en) A kind of homepage Link Recommendation method
CN109241438B (en) Element-based cross-channel hot event discovery method and device and storage medium
JPH1021250A (en) Method for retrieving plural data bases and method for searching document between plural data bases
CN108984582A (en) A kind of inquiry request processing method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20171117