CN101303700B - Method and system for collecting web page - Google Patents
- Publication number
- CN101303700B (application number CN2008101112988A)
- Authority
- CN
- China
- Prior art keywords
- dns
- url
- dns request
- obtains
- request
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Transfer Between Computers (AREA)
Claims (17)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008101112988A (CN101303700B) | 2008-06-13 | 2008-06-13 | Method and system for collecting web page |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2008101112988A (CN101303700B) | 2008-06-13 | 2008-06-13 | Method and system for collecting web page |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN101303700A | 2008-11-12 |
| CN101303700B | 2010-04-21 |
Family
ID=40113603
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN2008101112988A (CN101303700B), Expired - Fee Related | Method and system for collecting web page | 2008-06-13 | 2008-06-13 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101303700B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102316099B (en) * | 2011-07-28 | 2014-10-22 | Computer Network Information Center, Chinese Academy of Sciences (中国科学院计算机网络信息中心) | Phishing detection method and apparatus |
| CN109347996A (en) * | 2018-12-10 | 2019-02-15 | Electronic Science and Technology Institute of the General Office of the CPC Central Committee (中共中央办公厅电子科技学院) | DNS domain name acquisition system and method |
| CN110891090B (en) * | 2019-11-29 | 2023-01-31 | Beijing SoundAI Technology Co., Ltd. (北京声智科技有限公司) | Request method, device, server, system and storage medium |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101046820A (en) * | 2006-03-29 | 2007-10-03 | International Business Machines Corporation (国际商业机器公司) | System and method for prioritizing websites during a webcrawling process |
| CN101178736A (en) * | 2007-12-11 | 2008-05-14 | Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司) | Web page collecting method and web page collecting server |
Non-Patent Citations (2)
| Title |
|---|
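| Piao Xinghai (朴星海). Research on Technologies Related to Topic-Oriented Web Crawlers. Master of Engineering thesis, Harbin Institute of Technology, 2007, pp. 8-15, 38-45. * |
| Su Xuan (苏旋). Research and Implementation of Distributed Web Crawler Technology. Master of Engineering thesis, Harbin Institute of Technology, 2006, pp. 3-4, 16-19, 28-31. * |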
Also Published As
| Publication number | Publication date |
|---|---|
| CN101303700A (en) | 2008-11-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107895009B (en) | | Distributed internet data acquisition method and system |
| CN104735138B (en) | | Distributed acquisition method and system for user-generated content |
| CN107087001B (en) | | A Distributed Internet Important Address Space Retrieval System |
| CN107885777A (en) | | A control method and system for crawling web page data based on collaborative crawler |
| CN102710795B (en) | | Hot spot polymerization method and device |
| CN105243159A (en) | | Visual script editor-based distributed web crawler system |
| CN104038363A (en) | | Method for acquiring and counting CCDN provider information |
| CN105930502B (en) | | System, client and method for collecting data |
| CN103853743A (en) | | Distributed system and log query method thereof |
| CN103927370A (en) | | Network information batch acquisition method of combined text and picture information |
| CN105005600A (en) | | Preprocessing method of URL (Uniform Resource Locator) in access log |
| CN111859069B (en) | | Network malicious crawler identification method, system, terminal and storage medium |
| CN113656673A (en) | | Master-slave distributed content crawling robot for advertisement delivery |
| CN110808868B (en) | | Test data acquisition method and device, computer equipment and storage medium |
| CN106790593B (en) | | A page processing method and device |
| CN103778908A (en) | | Karaoke member VOD system and VOD method thereof |
| CN102761628B (en) | | Universal domain name identification, processing device and method |
| CN107911466A (en) | | Association method under multi-layer framework |
| CN107766234A (en) | | Assessment method, apparatus and system for webpage health degree based on mobile device |
| CN101303700B (en) | | Method and system for collecting web page |
| CN103513986B (en) | | Method utilizing CGI technology to realize a dynamic web server in equipment without an operating system |
| CN104253875A (en) | | DNS (domain name system) flow analysis method |
| CN103581349B (en) | | Domain name analytic method and device |
| CN102571780A (en) | | Control method, equipment and system for accessing network resource |
| CN111611508B (en) | | Identification method and device for actual website access of user |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | ASS | Succession or assignment of patent right | Owner name: CHENGDU CITY HUAWEI SAIMENTEKE SCIENCE CO., LTD. Free format text: FORMER OWNER: HUAWEI TECHNOLOGY CO., LTD. Effective date: 20090424 |
| | C41 | Transfer of patent application or patent right or utility model | |
| | TA01 | Transfer of patent application right | Effective date of registration: 20090424. Address after: Qingshui River District, Chengdu high tech Zone, Sichuan Province, China: 611731. Applicant after: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES Co.,Ltd. Address before: Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen Province, China: 518129. Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd. |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | C56 | Change in the name or address of the patentee | Owner name: HUAWEI DIGITAL TECHNOLOGY (CHENGDU) CO., LTD. Free format text: FORMER NAME: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES CO., LTD. |
| | CP01 | Change in the name or title of a patent holder | Address after: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River. Patentee after: HUAWEI DIGITAL TECHNOLOGIES (CHENG DU) Co.,Ltd. Address before: 611731 Chengdu high tech Zone, Sichuan, West Park, Qingshui River. Patentee before: CHENGDU HUAWEI SYMANTEC TECHNOLOGIES Co.,Ltd. |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20100421 |