WO2008030568A3 - Feed crawling system and method and spam feed filter - Google Patents

Feed crawling system and method and spam feed filter

Info

Publication number
WO2008030568A3
WO2008030568A3 PCT/US2007/019558 US2007019558W WO2008030568A3 WO 2008030568 A3 WO2008030568 A3 WO 2008030568A3 US 2007019558 W US2007019558 W US 2007019558W WO 2008030568 A3 WO2008030568 A3 WO 2008030568A3
Authority
WO
WIPO (PCT)
Prior art keywords
feed
crawling
spam
urls
database
Prior art date
Application number
PCT/US2007/019558
Other languages
French (fr)
Other versions
WO2008030568A2 (en)
Inventor
James Ruga
Rebecca Berrigan
Original Assignee
Feedster Inc
James Ruga
Rebecca Berrigan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feedster Inc, James Ruga, Rebecca Berrigan
Publication of WO2008030568A2
Publication of WO2008030568A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

A feed crawling system, method, and computer program product. A spam filter and method for filtering. A system and method for feed crawling with spam filtering. A computer system and associated method and computer program product for crawling content feeds, the computer system comprising: at least one processor for executing at least one process; a database providing storage for location information or uniform resource locators (URLs); a first process for prioritizing a list of URLs to be crawled; a parallelized crawler process for crawling the URLs and storing the results in the database; and an indexing process for indexing the database so that a user can search it.
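
The pipeline named in the abstract (prioritize URLs, crawl them in parallel, store the results, index them for search) can be illustrated with a minimal Python sketch. This is not the patented implementation: the abstract gives no algorithmic detail, so the scoring scheme, the repetition-based spam heuristic, the in-memory `FAKE_FEEDS` stand-in for network fetches, and every function name below are assumptions made purely for illustration.

```python
import heapq
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for real network fetches so the sketch runs offline.
FAKE_FEEDS = {
    "http://example.com/a.xml": "fresh news about feed crawling",
    "http://example.com/b.xml": "buy cheap pills buy cheap pills",
    "http://example.com/c.xml": "notes on indexing a feed database",
}

def prioritize(urls_with_scores):
    """First process: order URLs so the highest-scored one is crawled first."""
    heap = [(-score, url) for url, score in urls_with_scores]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

def fetch(url):
    """Crawler worker: return (url, content); swap in a real HTTP fetch here."""
    return url, FAKE_FEEDS.get(url, "")

def looks_like_spam(text):
    """Toy heuristic (an assumption, not from the source): flag repetitive text."""
    words = text.split()
    return bool(words) and len(set(words)) / len(words) < 0.6

def crawl(db, urls, workers=4):
    """Parallelized crawler: fetch concurrently, then store non-spam results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch, urls))
    # Database writes stay on the calling thread; only fetching is parallel.
    for url, content in results:
        if not looks_like_spam(content):
            db.execute("INSERT OR REPLACE INTO feeds VALUES (?, ?)", (url, content))
    db.commit()

def build_index(db):
    """Indexing process: map each word to the set of URLs that contain it."""
    index = {}
    for url, content in db.execute("SELECT url, content FROM feeds"):
        for word in set(content.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

def run_demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE feeds (url TEXT PRIMARY KEY, content TEXT)")
    ordered = prioritize([
        ("http://example.com/a.xml", 0.9),
        ("http://example.com/b.xml", 0.5),
        ("http://example.com/c.xml", 0.7),
    ])
    crawl(db, ordered)
    return ordered, build_index(db)
```

In this sketch a user search is a dictionary lookup on the returned index, for example `index.get("feed", set())`. Keeping database writes on a single thread avoids sharing one SQLite connection across workers; a production crawler would likely make different choices.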

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US82490306P 2006-09-07 2006-09-07
US60/824,903 2006-09-07
US82511406P 2006-09-08 2006-09-08
US60/825,114 2006-09-08
US85057707A 2007-09-05 2007-09-05
US85059207A 2007-09-05 2007-09-05
US11/850,577 2007-09-05
US11/850,592 2007-09-05

Publications (2)

Publication Number Publication Date
WO2008030568A2 WO2008030568A2 (en) 2008-03-13
WO2008030568A3 WO2008030568A3 (en) 2008-10-16

Family

ID=39157869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/019558 WO2008030568A2 (en) 2006-09-07 2007-09-07 Feed crawling system and method and spam feed filter

Country Status (1)

Country Link
WO (1) WO2008030568A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710672A (en) * 2018-05-17 2018-10-26 南京大学 A topic crawler method based on an incremental Bayesian algorithm

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491438A (en) * 2018-02-12 2018-09-04 陆夏根 A technology policy retrieval and analysis method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6182085B1 (en) * 1998-05-28 2001-01-30 International Business Machines Corporation Collaborative team crawling: Large scale information gathering over the internet
US6266664B1 (en) * 1997-10-01 2001-07-24 Rulespace, Inc. Method for scanning, analyzing and rating digital information content
US6377984B1 (en) * 1999-11-02 2002-04-23 Alta Vista Company Web crawler system using parallel queues for queing data sets having common address and concurrently downloading data associated with data set in each queue
US20020188841A1 (en) * 1995-07-27 2002-12-12 Jones Kevin C. Digital asset management and linking media signals with related data using watermarks
US20020194161A1 (en) * 2001-04-12 2002-12-19 Mcnamee J. Paul Directed web crawler with machine learning
US6631369B1 (en) * 1999-06-30 2003-10-07 Microsoft Corporation Method and system for incremental web crawling
US6738767B1 (en) * 2000-03-20 2004-05-18 International Business Machines Corporation System and method for discovering schematic structure in hypertext documents
US20050086206A1 (en) * 2003-10-15 2005-04-21 International Business Machines Corporation System, Method, and service for collaborative focused crawling of documents on a network
US20050102259A1 (en) * 2003-11-12 2005-05-12 Yahoo! Inc. Systems and methods for search query processing using trend analysis
US20050192936A1 (en) * 2004-02-12 2005-09-01 Meek Christopher A. Decision-theoretic web-crawling and predicting web-page change
US20050262062A1 (en) * 2004-05-08 2005-11-24 Xiongwu Xia Methods and apparatus providing local search engine
US20060136420A1 (en) * 2004-12-20 2006-06-22 Yahoo!, Inc. System and method for providing improved access to a search tool in electronic mail-enabled applications

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710672A (en) * 2018-05-17 2018-10-26 南京大学 A topic crawler method based on an incremental Bayesian algorithm
CN108710672B (en) * 2018-05-17 2020-04-14 南京大学 A Topic Crawler Method Based on Incremental Bayesian Algorithm

Also Published As

Publication number Publication date
WO2008030568A2 (en) 2008-03-13

Similar Documents

Publication Publication Date Title
WO2008011029A3 (en) Method and system for creating a concept-object database
WO2007047252A3 (en) System, method & computer program product for concept based searching & analysis
WO2007081681A3 (en) Search system with query refinement and search method
WO2008088721A3 (en) Querying data and an associated ontology in a database management system
WO2007065947A3 (en) System and method of implementing an e-mail interface for a content management system
WO2005098591A3 (en) Methods and systems for structuring event data in a database for location and retrieval
WO2008021832A3 (en) Harvesting data from page
WO2008070866A3 (en) Interleaving search results
WO2007108788A3 (en) Method and system for answer extraction
WO2007103191A3 (en) Comparative web search
WO2009123866A3 (en) Method and system for organizing information
WO2006116196A3 (en) Media object metadata association and ranking
WO2007144853A3 (en) Method and apparatus for performing customized paring on a xml document based on application
WO2007059216A3 (en) Methods and apparatus for rank-based response set clustering
Sutherland et al. Equilibrium modeling of Cu (II) biosorption onto untreated and treated forest macro-fungus Fomes fasciatus.
WO2008030568A3 (en) Feed crawling system and method and spam feed filter
WO2007115219A3 (en) Item management systems and associated methods
WO2008009995A3 (en) System and method for indexing stored electronic data using a b-tree
ATE496474T1 (en) MULTI-LAYER ENVELOPE PROCESS AND CONTENT DELIVERY SYSTEM
WO2004107204A3 (en) Data processing method and system for combining database tables
Khosla et al. Efficacy of insecticidal dusts on natural infestation of Trogoderma granarium (Everts) on wheat seeds
WO2009120329A3 (en) Online analytic processing cube with time stamping
Wang JiaHong et al. Adsorption of Cr (VI) from aqueous solution onto short-chain polyaniline/palygorskite composites.
Lu HongTao et al. In situ oxidation and efficient simultaneous adsorption of arsenite and arsenate by Mg-Fe-LDH with persulfate intercalation.
Fazeli et al. Effect of Environmental Parameters on Economically Important Copepods in Chabahar Bay in 2007

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07811709

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07811709

Country of ref document: EP

Kind code of ref document: A2