|
US6594662B1
(en)
*
|
1998-07-01 |
2003-07-15 |
Netshadow, Inc. |
Method and system for gathering information resident on global computer networks
|
|
US6883135B1
(en)
*
|
2000-01-28 |
2005-04-19 |
Microsoft Corporation |
Proxy server using a statistical model
|
|
JP4283466B2
(ja)
*
|
2001-10-12 |
2009-06-24 |
富士通株式会社 |
Document organization method based on link relationships
|
|
US20040264677A1
(en)
*
|
2003-06-30 |
2004-12-30 |
Horvitz Eric J. |
Ideal transfer of call handling from automated systems to human operators based on forecasts of automation efficacy and operator load
|
|
US8707312B1
(en)
|
2003-07-03 |
2014-04-22 |
Google Inc. |
Document reuse in a search engine crawler
|
|
US7725452B1
(en)
*
|
2003-07-03 |
2010-05-25 |
Google Inc. |
Scheduler for search engine crawler
|
|
US7584221B2
(en)
*
|
2004-03-18 |
2009-09-01 |
Microsoft Corporation |
Field weighting in text searching
|
|
US7475067B2
(en)
*
|
2004-07-09 |
2009-01-06 |
Aol Llc |
Web page performance scoring
|
|
US7567959B2
(en)
|
2004-07-26 |
2009-07-28 |
Google Inc. |
Multiple index based information retrieval system
|
|
US7702618B1
(en)
|
2004-07-26 |
2010-04-20 |
Google Inc. |
Information retrieval system for archiving multiple document versions
|
|
US7711679B2
(en)
|
2004-07-26 |
2010-05-04 |
Google Inc. |
Phrase-based detection of duplicate documents in an information retrieval system
|
|
US7987172B1
(en)
*
|
2004-08-30 |
2011-07-26 |
Google Inc. |
Minimizing visibility of stale content in web searching including revising web crawl intervals of documents
|
|
WO2006027973A1
(ja)
*
|
2004-09-07 |
2006-03-16 |
Interman Corporation |
Information search providing apparatus and information search providing system
|
|
US7606793B2
(en)
|
2004-09-27 |
2009-10-20 |
Microsoft Corporation |
System and method for scoping searches using index keys
|
|
US8065296B1
(en)
*
|
2004-09-29 |
2011-11-22 |
Google Inc. |
Systems and methods for determining a quality of provided items
|
|
US7827181B2
(en)
|
2004-09-30 |
2010-11-02 |
Microsoft Corporation |
Click distance determination
|
|
US7739277B2
(en)
|
2004-09-30 |
2010-06-15 |
Microsoft Corporation |
System and method for incorporating anchor text into ranking search results
|
|
US7761448B2
(en)
|
2004-09-30 |
2010-07-20 |
Microsoft Corporation |
System and method for ranking search results using click distance
|
|
US7716198B2
(en)
|
2004-12-21 |
2010-05-11 |
Microsoft Corporation |
Ranking search results using feature extraction
|
|
US7536389B1
(en)
|
2005-02-22 |
2009-05-19 |
Yahoo! Inc. |
Techniques for crawling dynamic web content
|
|
US7792833B2
(en)
|
2005-03-03 |
2010-09-07 |
Microsoft Corporation |
Ranking search results using language types
|
|
US8666964B1
(en)
*
|
2005-04-25 |
2014-03-04 |
Google Inc. |
Managing items in crawl schedule
|
|
US8386459B1
(en)
*
|
2005-04-25 |
2013-02-26 |
Google Inc. |
Scheduling a recrawl
|
|
US7509315B1
(en)
|
2005-06-24 |
2009-03-24 |
Google Inc. |
Managing URLs
|
|
US7610267B2
(en)
*
|
2005-06-28 |
2009-10-27 |
Yahoo! Inc. |
Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages
|
|
US7599917B2
(en)
|
2005-08-15 |
2009-10-06 |
Microsoft Corporation |
Ranking search results using biased click distance
|
|
EP1938214A1
(en)
*
|
2005-10-11 |
2008-07-02 |
Taptu Ltd. |
Search using changes in prevalence of content items on the web
|
|
US8095565B2
(en)
*
|
2005-12-05 |
2012-01-10 |
Microsoft Corporation |
Metadata driven user interface
|
|
US20070143300A1
(en)
*
|
2005-12-20 |
2007-06-21 |
Ask Jeeves, Inc. |
System and method for monitoring evolution over time of temporal content
|
|
US7599931B2
(en)
*
|
2006-03-03 |
2009-10-06 |
Microsoft Corporation |
Web forum crawler
|
|
US7475069B2
(en)
*
|
2006-03-29 |
2009-01-06 |
International Business Machines Corporation |
System and method for prioritizing websites during a webcrawling process
|
|
US20070260586A1
(en)
*
|
2006-05-03 |
2007-11-08 |
Antonio Savona |
Systems and methods for selecting and organizing information using temporal clustering
|
|
WO2008030568A2
(en)
*
|
2006-09-07 |
2008-03-13 |
Feedster, Inc. |
Feed crawling system and method and spam feed filter
|
|
US20080104257A1
(en)
*
|
2006-10-26 |
2008-05-01 |
Yahoo! Inc. |
System and method using a refresh policy for incremental updating of web pages
|
|
US8745183B2
(en)
*
|
2006-10-26 |
2014-06-03 |
Yahoo! Inc. |
System and method for adaptively refreshing a web page
|
|
US7672943B2
(en)
*
|
2006-10-26 |
2010-03-02 |
Microsoft Corporation |
Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling
|
|
US20080104502A1
(en)
*
|
2006-10-26 |
2008-05-01 |
Yahoo! Inc. |
System and method for providing a change profile of a web page
|
|
WO2008070415A2
(en)
*
|
2006-11-14 |
2008-06-12 |
Deepdive Technologies Inc. |
Networked information collection apparatus and method
|
|
US7886042B2
(en)
*
|
2006-12-19 |
2011-02-08 |
Yahoo! Inc. |
Dynamically constrained, forward scheduling over uncertain workloads
|
|
US7979458B2
(en)
|
2007-01-16 |
2011-07-12 |
Microsoft Corporation |
Associating security trimmers with documents in an enterprise search system
|
|
US8725719B2
(en)
*
|
2007-02-13 |
2014-05-13 |
Microsoft Corporation |
Managing web page links using structured data
|
|
US20080215541A1
(en)
*
|
2007-03-01 |
2008-09-04 |
Microsoft Corporation |
Techniques for searching web forums
|
|
JP4668942B2
(ja)
*
|
2007-03-28 |
2011-04-13 |
日本電信電話株式会社 |
Code string generation device, code string input device, code string generation program, and code string input program
|
|
US20090013068A1
(en)
*
|
2007-07-02 |
2009-01-08 |
Eaglestone Robert J |
Systems and processes for evaluating webpages
|
|
US20090024583A1
(en)
*
|
2007-07-18 |
2009-01-22 |
Yahoo! Inc. |
Techniques in using feedback in crawling web content
|
|
US20090070346A1
(en)
*
|
2007-09-06 |
2009-03-12 |
Antonio Savona |
Systems and methods for clustering information
|
|
US8117223B2
(en)
|
2007-09-07 |
2012-02-14 |
Google Inc. |
Integrating external related phrase information into a phrase-based indexing information retrieval system
|
|
US8041704B2
(en)
*
|
2007-10-12 |
2011-10-18 |
The Regents Of The University Of California |
Searching for virtual world objects
|
|
US9348912B2
(en)
|
2007-10-18 |
2016-05-24 |
Microsoft Technology Licensing, Llc |
Document length as a static relevance feature for ranking search results
|
|
US7840569B2
(en)
|
2007-10-18 |
2010-11-23 |
Microsoft Corporation |
Enterprise relevancy ranking using a neural network
|
|
US7984000B2
(en)
|
2007-10-31 |
2011-07-19 |
Microsoft Corporation |
Predicting and using search engine switching behavior
|
|
WO2009059480A1
(en)
*
|
2007-11-08 |
2009-05-14 |
Shanghai Hewlett-Packard Co., Ltd |
URL and anchor text analysis for focused crawling
|
|
US8886660B2
(en)
*
|
2008-02-07 |
2014-11-11 |
Siemens Enterprise Communications Gmbh & Co. Kg |
Method and apparatus for tracking a change in a collection of web documents
|
|
US8812493B2
(en)
|
2008-04-11 |
2014-08-19 |
Microsoft Corporation |
Search results ranking using editing distance and document information
|
|
JP2009282738A
(ja)
|
2008-05-22 |
2009-12-03 |
Nec Electronics Corp |
Automatic update device, automatic update method, and program
|
|
US8321793B1
(en)
*
|
2008-07-02 |
2012-11-27 |
Amdocs Software Systems Limited |
System, method, and computer program for recommending web content to a user
|
|
KR100975510B1
(ko)
*
|
2008-07-17 |
2010-08-11 |
엔에이치엔(주) |
Method and system for updating a web page index
|
|
US8805861B2
(en)
*
|
2008-12-09 |
2014-08-12 |
Google Inc. |
Methods and systems to train models to extract and integrate information from data sources
|
|
JP5157865B2
(ja)
*
|
2008-12-09 |
2013-03-06 |
日本電気株式会社 |
Information collection device, information collection method, and program
|
|
US20100205168A1
(en)
*
|
2009-02-10 |
2010-08-12 |
Microsoft Corporation |
Thread-Based Incremental Web Forum Crawling
|
|
US20100211533A1
(en)
*
|
2009-02-18 |
2010-08-19 |
Microsoft Corporation |
Extracting structured data from web forums
|
|
US8712992B2
(en)
*
|
2009-03-28 |
2014-04-29 |
Microsoft Corporation |
Method and apparatus for web crawling
|
|
US20100287148A1
(en)
*
|
2009-05-08 |
2010-11-11 |
Cpa Global Patent Research Limited |
Method, System, and Apparatus for Targeted Searching of Multi-Sectional Documents within an Electronic Document Collection
|
|
US8484180B2
(en)
*
|
2009-06-03 |
2013-07-09 |
Yahoo! Inc. |
Graph-based seed selection algorithm for web crawlers
|
|
US9213780B2
(en)
*
|
2009-06-26 |
2015-12-15 |
Microsoft Technology Licensing Llc |
Cache and index refreshing strategies for variably dynamic items and accesses
|
|
US20110016471A1
(en)
*
|
2009-07-15 |
2011-01-20 |
Microsoft Corporation |
Balancing Resource Allocations Based on Priority
|
|
US8352852B2
(en)
*
|
2009-08-14 |
2013-01-08 |
Red Hat, Inc. |
Portal replay and foresee
|
|
US9135261B2
(en)
|
2009-12-15 |
2015-09-15 |
Emc Corporation |
Systems and methods for facilitating data discovery
|
|
US8156240B2
(en)
*
|
2010-03-01 |
2012-04-10 |
Yahoo! Inc. |
Mechanism for supporting user content feeds
|
|
US8738635B2
(en)
|
2010-06-01 |
2014-05-27 |
Microsoft Corporation |
Detection of junk in search result ranking
|
|
US8433700B2
(en)
*
|
2010-09-17 |
2013-04-30 |
Verisign, Inc. |
Method and system for triggering web crawling based on registry data
|
|
US8832065B2
(en)
*
|
2010-10-29 |
2014-09-09 |
Fujitsu Limited |
Technique for coordinating the distributed, parallel crawling of interactive client-server applications
|
|
CN102480524B
(zh)
*
|
2010-11-26 |
2014-09-10 |
中国科学院声学研究所 |
Web crawler cooperation method
|
|
US8793706B2
(en)
|
2010-12-16 |
2014-07-29 |
Microsoft Corporation |
Metadata-based eventing supporting operations on data
|
|
CN102567407B
(zh)
*
|
2010-12-22 |
2014-07-16 |
北大方正集团有限公司 |
Method and system for incremental collection of forum reply posts
|
|
US8255385B1
(en)
|
2011-03-22 |
2012-08-28 |
Microsoft Corporation |
Adaptive crawl rates based on publication frequency
|
|
US8600968B2
(en)
|
2011-04-19 |
2013-12-03 |
Microsoft Corporation |
Predictively suggesting websites
|
|
CN102890692A
(zh)
|
2011-07-22 |
2013-01-23 |
阿里巴巴集团控股有限公司 |
Web page information extraction method and extraction system
|
|
US8782031B2
(en)
|
2011-08-09 |
2014-07-15 |
Microsoft Corporation |
Optimizing web crawling with user history
|
|
SG2014012694A
(en)
*
|
2011-09-27 |
2014-04-28 |
Amazon Tech Inc |
Historical browsing session management
|
|
US9495462B2
(en)
|
2012-01-27 |
2016-11-15 |
Microsoft Technology Licensing, Llc |
Re-ranking search results
|
|
US9881101B2
(en)
|
2012-11-16 |
2018-01-30 |
International Business Machines Corporation |
Dynamic file retrieving for web page loading
|
|
US9122992B2
(en)
*
|
2012-12-12 |
2015-09-01 |
Lenovo (Singapore) Pte. Ltd. |
Predicting web page
|
|
US10114804B2
(en)
|
2013-01-18 |
2018-10-30 |
International Business Machines Corporation |
Representation of an element in a page via an identifier
|
|
RU2592390C2
(ru)
*
|
2013-07-15 |
2016-07-20 |
Общество С Ограниченной Ответственностью "Яндекс" |
System, method, and device for evaluating browsing sessions
|
|
CN104657391B
(zh)
*
|
2013-11-21 |
2018-08-03 |
阿里巴巴集团控股有限公司 |
Page processing method and device
|
|
CN105024870A
(zh)
*
|
2014-04-24 |
2015-11-04 |
中国移动通信集团公司 |
Method and system for implementing dial testing
|
|
RU2589310C2
(ru)
*
|
2014-09-30 |
2016-07-10 |
Закрытое акционерное общество "Лаборатория Касперского" |
System and method for calculating the interval for re-determining the categories of a network resource
|
|
US9160680B1
(en)
|
2014-11-18 |
2015-10-13 |
Kaspersky Lab Zao |
System and method for dynamic network resource categorization re-assignment
|
|
US10216694B2
(en)
|
2015-08-24 |
2019-02-26 |
Google Llc |
Generic scheduling
|
|
US11570209B2
(en)
|
2015-10-28 |
2023-01-31 |
Qomplx, Inc. |
Detecting and mitigating attacks using forged authentication objects within a domain
|
|
MX391269B
(es)
|
2015-10-28 |
2025-03-21 |
Viasat Inc |
Time-dependent machine-generated operational hinting
|
|
US20220014555A1
(en)
|
2015-10-28 |
2022-01-13 |
Qomplx, Inc. |
Distributed automated planning and execution platform for designing and running complex processes
|
|
US10742647B2
(en)
*
|
2015-10-28 |
2020-08-11 |
Qomplx, Inc. |
Contextual and risk-based multi-factor authentication
|
|
US12206708B2
(en)
|
2015-10-28 |
2025-01-21 |
Qomplx Llc |
Correlating network event anomalies using active and passive external reconnaissance to identify attack information
|
|
US10210255B2
(en)
*
|
2015-12-31 |
2019-02-19 |
Fractal Industries, Inc. |
Distributed system for large volume deep web data extraction
|
|
RU2632143C1
(ru)
*
|
2016-04-11 |
2017-10-02 |
Общество С Ограниченной Ответственностью "Яндекс" |
Method for training a ranking module using a training set with noisy labels
|
|
WO2018124757A1
(ko)
*
|
2016-12-30 |
2018-07-05 |
(주)엠더블유스토리 |
Crawler remote management system and management method thereof
|
|
US10491622B2
(en)
*
|
2017-01-04 |
2019-11-26 |
Synack, Inc. |
Automatic webpage change detection
|
|
CN108062368B
(zh)
*
|
2017-12-08 |
2021-05-07 |
北京百度网讯科技有限公司 |
Full data translation method, device, server, and storage medium
|
|
US10671371B2
(en)
|
2018-06-12 |
2020-06-02 |
International Business Machines Corporation |
Alerting an offline user of a predicted computer file update
|
|
EP3467740A1
(en)
*
|
2018-06-20 |
2019-04-10 |
DataCo GmbH |
Method and system for generating reports
|
|
US11379539B2
(en)
|
2019-05-22 |
2022-07-05 |
Microsoft Technology Licensing, Llc |
Efficient freshness crawl scheduling
|
|
US12141214B2
(en)
*
|
2020-03-30 |
2024-11-12 |
Google Llc |
Adversarial bandits policy for crawling highly dynamic content
|
|
CN111444412B
(zh)
*
|
2020-04-03 |
2023-06-16 |
北京明朝万达科技股份有限公司 |
Method and device for scheduling web crawler tasks
|
|
KR102563125B1
(ko)
*
|
2021-02-01 |
2023-08-03 |
(주)레몬클라우드 |
Lowest price providing apparatus and lowest price providing method
|
|
US12019691B2
(en)
|
2021-04-02 |
2024-06-25 |
Trackstreet, Inc. |
System and method for reducing crawl frequency and memory usage for an autonomous internet crawler
|
|
US12316698B2
(en)
*
|
2021-12-31 |
2025-05-27 |
Tangoe Us, Inc. |
Robotic process automation for telecom expense management information change detection and notification
|
|
KR20230134724A
(ko)
*
|
2022-03-15 |
2023-09-22 |
성균관대학교산학협력단 |
Method and apparatus for predicting time-varying data of a web page, web management system using the same, computer-readable recording medium, and computer program
|
|
WO2023211304A1
(ru)
*
|
2022-04-29 |
2023-11-02 |
Публичное Акционерное Общество "Сбербанк России" |
System and method for collecting and processing news on the Internet
|