CN103235785B - Method for batch extraction of web page resource material - Google Patents
- Publication number
- CN103235785B (application number CN201310105247.5A)
- Authority
- CN
- China
- Prior art keywords
- resource material
- processor
- resource
- file
- web page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Transfer Between Computers (AREA)
Abstract
The present invention relates to the browser field and discloses a method for batch extraction of web page resource material. The method monitors the communication between a browser and a Web server. A material processor receives from the Web server a request that carries an analysis result, locates the resource material corresponding to that request, and monitors it; the monitoring process comprises filtering and downloading. The user accesses web pages directly through the invention, monitoring completes while the page is being accessed, and the user need not perform any operation other than accessing the page. With the technology provided by the invention, the Web server automatically generates a storage path for the received resource material and produces the resource material whose file type and file content match the analysis result. This achieves batch downloading of resource material, improves the security of extraction, reduces manual work, and improves extraction efficiency.
Description
Technical field
The present invention relates to the browser field, and in particular to a method for batch extraction of files of user-configured types from a website.
Background
The prior art mainly comprises two tools. First, HttpWatch, a web data analysis plug-in integrated into Internet Explorer, provides web page summaries, cookie management, cache management, report output, and similar functions. It can also obtain resource material, but only one item at a time; it cannot batch-extract the resource material on a web page. Second, HttpFox offers functions similar to HttpWatch and is integrated into Firefox as a plug-in, but it has no file download function and therefore cannot extract the resource material on a web page.
The prior art currently offers no effective solution to the above technical problem.
Summary of the invention
The technical problem solved by the present invention is to provide a method for batch extraction of web page resource material that lets a user extract material from web pages more conveniently and with a security check. The invention can download resource material in batches, improves the security of extraction, reduces manual work, and improves extraction efficiency.
To solve the above technical problem, the technical solution adopted by the present invention monitors the communication between the browser and the Web server, intercepts the data transmitted during that communication, filters the information, and downloads the files. Specifically, it comprises:
Step 1: The client connects to the Web server and submits a request to the Web server.
Step 2: The Web server receives and responds to the request, analyzes the file type and file content corresponding to the request, generates an analysis result for that file type and file content, and then forwards the request together with the analysis result to the material processor.
Step 3: The material processor receives the request containing the analysis result, searches the web page for the resource material corresponding to the request, and monitors the resource material found; the monitoring process comprises filtering and downloading.
Step 4: After the monitoring process completes, the material processor transfers the downloaded resource material to a caching server for caching.
Step 5: The caching server transfers the downloaded resource material to the Web server.
Step 6: The Web server automatically generates a storage path for the downloaded resource material it has received, and produces the resource material that matches the analysis result.
Step 7: The client receives the feedback from the Web server and saves the resource material that matches the analysis result at the storage path.
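The seven steps above can be pictured as a small data flow. The following is a minimal sketch, not the patented implementation: the names Resource, analyze_request, monitor, and run_pipeline are hypothetical, the analysis result is reduced to a set of file suffixes, and the caching server is reduced to a list copy.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in for one piece of resource material on a page."""
    url: str
    data: bytes


def analyze_request(requested_types):
    # Step 2: the Web server turns the request into an analysis result
    # (assumed here to be a set of normalised file suffixes).
    return {t.lower().lstrip(".") for t in requested_types}


def monitor(page_resources, analysis):
    # Step 3: the material processor filters the page's resources and
    # "downloads" (keeps) those whose URL suffix matches the analysis result.
    return [r for r in page_resources
            if r.url.rsplit(".", 1)[-1].lower() in analysis]


def run_pipeline(requested_types, page_resources, save_dir="downloads"):
    analysis = analyze_request(requested_types)      # steps 1-2
    downloaded = monitor(page_resources, analysis)   # step 3
    cached = list(downloaded)                        # steps 4-5: cache, hand back
    # Step 6: generate a storage path for each resource; in step 7 the
    # client would save each item under its path.
    return {f"{save_dir}/{r.url.rsplit('/', 1)[-1]}": r.data for r in cached}
```

Calling run_pipeline([".PNG"], resources) on a list holding one .png and one .js resource would keep only the .png entry, keyed by its generated storage path.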
Preferably, the filtering in the monitoring process comprises:
S1: The material processor sets the .* filter option, analyzes the resource material found on the web page for the request, and determines whether the resource material carries the .* filter option.
S2: When the material processor receives resource material that carries the .* filter option, the resource material meets the filter condition and operation S3 below is performed.
When the material processor does not receive resource material that carries the .* filter option, it reads the type set configured for the resource material from the database and checks whether a next item of that type set can be found; process S2.A or S2.B below follows accordingly.
S2.A: When the material processor finds a next item of the type set read from the database, step S2.A.a below is performed, or steps S2.A.b1 to S2.A.b2 below are performed.
S2.B: When the material processor finds no next item of the type set read from the database, step S2.A.b2 below is performed.
S2.A.a: When the type set read for the resource material exceeds the bounds of the type set configured in the material processor, the material processor searches the web page again for the resource material corresponding to the request and returns to step S1.
S2.A.b1: When the type set read for the resource material does not exceed the bounds of the type set configured in the material processor, the material processor extracts the suffix of the URL in the resource material.
S2.A.b2: The analysis result generated by the Web server is matched against the type set read for the resource material, or against the suffix extracted from the URL in the resource material; if they match, the resource material meets the filter condition and operation S3 below is performed.
S3: The material processor scans the resource material that meets the filter condition for junk files and virus files and removes them. Depending on whether the resource material contains junk files and virus files, process S3.A or S3.B below follows.
S3.A: When the material processor detects no junk files or virus files in the resource material, the download operation proceeds.
S3.B: When the material processor detects junk files and virus files in the resource material, the client is prompted to choose between scanning for and removing the viruses and continuing with the download; process S3.B.a or S3.B.b below follows.
S3.B.a: When the client chooses to continue the download, the scan-and-remove step is skipped and the download operation proceeds.
S3.B.b: When the client chooses scan-and-remove, the junk files and virus files in the resource material are removed until the resource material is safe, after which the download operation proceeds.
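The filtering branch above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the patented logic: the type set is assumed to be a plain list of suffixes such as ".png", matching is reduced to comparing the URL suffix with each entry, and the names Choice, url_suffix, meets_filter_condition, and steps_after_scan are hypothetical. Only urlparse and posixpath.splitext from the Python standard library are relied on.

```python
import posixpath
from enum import Enum
from urllib.parse import urlparse

WILDCARD = ".*"  # the ".*" filter option: accept every file type


class Choice(Enum):
    """Hypothetical names for the two options offered in the S3.B prompt."""
    SCAN = "scan and remove"
    CONTINUE = "continue download"


def url_suffix(url):
    """Return the lower-case suffix of the URL path, e.g. '.png' ('' if none)."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def meets_filter_condition(url, type_set):
    """S1-S2: the wildcard accepts everything; otherwise walk the type set
    item by item and compare each entry with the URL suffix."""
    if WILDCARD in type_set:
        return True
    suffix = url_suffix(url)
    for entry in type_set:
        if entry.lower() == suffix:
            return True
    return False  # type set exhausted without a match


def steps_after_scan(threat_found, choice=None):
    """S3: list the actions that follow the junk/virus scan."""
    if not threat_found:
        return ["download"]                 # S3.A
    if choice is Choice.CONTINUE:
        return ["download"]                 # S3.B.a: skip scan-and-remove
    if choice is Choice.SCAN:
        return ["remove", "download"]       # S3.B.b
    return ["prompt"]                       # S3.B: ask the client first
```

Parsing the URL before taking the suffix keeps query strings such as "?x=1" out of the comparison, which a plain string split on "." would not.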
Preferably, the downloading in the monitoring process follows process NA or NB below, depending on whether the data length of the resource material exceeds a threshold:
NA: When the data length of the resource material exceeds the threshold, steps NA.a1 to NA.a4 below or step NA.b below are performed, depending on whether a file is created at the storage path that the Web server generated for the resource material to be downloaded.
NA.a1: When a file is created directly at the storage path that the Web server generated for the resource material to be downloaded, the created file is opened, the filtered data of the resource material is received, and the received data is written to the created file.
NA.a2: Reception of the resource material's data completes.
NA.a3: The created file is closed.
NA.a4: The download is complete.
NA.b: When the file is not created, the filtered data of the resource material is not received; the material processor searches the web page again for the resource material corresponding to the request, and filtering and downloading are performed again.
NB: When the data length of the resource material does not exceed the threshold, steps NB.a1 to NB.a3 below or step NB.b below are performed, depending on whether enough memory can be allocated.
NB.a1: When enough memory can be allocated, the filtered data of the resource material is received and written to memory, and the process returns to step NA.a2.
NB.a2: The memory is released.
NB.a3: The download is complete.
NB.b: When not enough memory can be allocated, step NA.b is performed.
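The size-based choice between writing to a file and buffering in memory can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the 1 MiB threshold is invented (the source gives no value), the network stream is modelled as an iterable of bytes chunks, and a None return stands in for "search again, then filter and download again" (step NA.b).

```python
import io

THRESHOLD = 1024 * 1024  # assumed cut-off; the source does not state a value


def download(chunks, data_length, dest_path, threshold=THRESHOLD):
    """Store one resource, choosing the branch by its data length.

    Returns dest_path for the file branch, the bytes for the memory branch,
    or None when the caller should retry (NA.b / NB.b).
    """
    if data_length > threshold:
        try:
            # NA.a1-NA.a3: create and open the file, write data as it
            # arrives, close the file when reception completes.
            with open(dest_path, "wb") as fh:
                for chunk in chunks:
                    fh.write(chunk)
        except OSError:
            return None                     # NA.b: file could not be created
        return dest_path                    # NA.a4: download complete
    try:
        # NB.a1: receive the data into memory.
        buf = io.BytesIO()
        for chunk in chunks:
            buf.write(chunk)
        data = buf.getvalue()
        buf.close()                         # NB.a2: release the buffer
        return data                         # NB.a3: download complete
    except MemoryError:
        return None                         # NB.b: not enough memory
```

Streaming large resources to disk keeps memory use bounded by the chunk size, while small resources avoid file-system overhead.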
Preferably, the request is transmitted in the form of a data stream.
Preferably, the external storage is one or more of a floppy disk, a hard disk, an optical disc, or a USB flash drive.
Preferably, the client is one or more of a mobile phone, a personal computer, or a tablet computer.
Preferably, the web page resource material comprises one or more of pictures, documents, forms, executable scripts, photos, audio, and video.
The technical principle of the present invention is as follows: the communication between the browser and the Web server is monitored, and the data transmitted during that communication is intercepted in order to filter the information and download the files. A browser control is embedded in the program; the user accesses web pages directly through the invention, the monitoring process completes while the page is being accessed, and the user need not perform any operation other than accessing the page.
Compared with the prior art, the present invention has the following beneficial effects:
The user only needs to access a page through the browser provided by the invention, without any other operation, to download resource material in batches. The resource material to be extracted is monitored, and filtering and downloading are carried out during monitoring, which improves the security of extraction, reduces manual work, and improves extraction efficiency. The invention is a new technology worth promoting.
Brief description of the drawings
Fig. 1 is a flowchart of the monitoring process of the method for batch extraction of web page resource material;
Fig. 2 is a flowchart of the filtering process of the method for batch extraction of web page resource material;
Fig. 3 is a flowchart of the downloading process of the method for batch extraction of web page resource material.
Embodiments
For a better understanding of the technical problem solved and the technical solution provided by the present invention, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain how the invention may be implemented and are not intended to limit it.
One embodiment of the present invention:
S1. The client enters the address of a web page and presses Enter to access the page.
S2. The client presses Shift and F2 simultaneously, and the download information panel pops up.
S3. The download information panel displays information such as the file type to be downloaded, the download progress, the file path, and the URL of the resource material.
S4. The resource material to be downloaded is scanned for junk files and virus files. If no junk files or virus files are detected, step S5 follows. If junk files and virus files are detected, a dialog box pops up with wording such as "Please choose: scan for viruses or continue downloading", prompting the client either to scan for and remove the viruses or to continue the download operation. If the client clicks wording such as "Continue downloading", the scan-and-remove step is skipped and step S5 follows. If the client clicks wording such as "Scan and filter", the junk files and virus files are scanned for and removed; once the file is safe, wording such as "The file is safe, please continue downloading" pops up, and step S5 follows.
S5. Right-clicking any download item opens a context menu. Clicking "Open folder" browses directly to the directory containing the file; clicking "Copy URL" copies the URL of the file.
S6. Double-clicking any download item with the left mouse button opens the file directly.
S7. Clicking the "Save" button opens a save dialog. By default, the system uses the file type and file content corresponding to the request submitted by the client, together with the automatically generated storage path for the file. In the save dialog, the client can choose a file type and a storage path different from the system defaults; the storage path chosen for the downloaded file can be on internal storage or external storage.
In a preferred embodiment, the request is transmitted in the form of a data stream.
In a preferred embodiment, the external storage is one or more of a floppy disk, a hard disk, an optical disc, or a USB flash drive.
In a preferred embodiment, the client is a mobile phone, a personal computer, a tablet computer, or another device that communicates with websites and is equipped with hardware (for example, a processor) and software (for example, Flash software or the Windows operating system) for presenting material.
In a preferred embodiment, the web page resource material comprises one or more of pictures, documents, forms, executable scripts, photos, audio, and video.
The present invention has been described in detail above by way of specific embodiments. Those skilled in the art should understand, however, that the invention is not limited to these embodiments; any modification, combination, or equivalent replacement made within the basic principle of the invention falls within its scope of protection.
Claims (6)
1. A method for batch extraction of web page resource material, characterized by comprising:
Step 1: The client connects to the Web server and submits a request to the Web server.
Step 2: The Web server receives and responds to the request, analyzes the file type and file content corresponding to the request, generates an analysis result for that file type and file content, and then forwards the request together with the analysis result to the material processor.
Step 3: The material processor receives the request containing the analysis result, searches the web page for the resource material corresponding to the request, and monitors the resource material found; the monitoring process comprises filtering and downloading.
Step 4: After the monitoring process completes, the material processor transfers the downloaded resource material to a caching server for caching.
Step 5: The caching server transfers the downloaded resource material to the Web server.
Step 6: The Web server automatically generates a storage path for the downloaded resource material it has received, and produces the resource material that matches the analysis result.
Step 7: The client receives the feedback from the Web server and saves the resource material that matches the analysis result at the storage path.
2. The method for batch extraction of web page resource material according to claim 1, characterized in that the filtering in the monitoring process comprises:
S1: The material processor sets the .* filter option, analyzes the resource material found on the web page for the request, and determines whether the resource material carries the .* filter option.
S2: When the material processor receives resource material that carries the .* filter option, the resource material meets the filter condition and operation S3 below is performed.
When the material processor does not receive resource material that carries the .* filter option, it reads the type set configured for the resource material from the database and checks whether a next item of that type set can be found; process S2.A or S2.B below follows accordingly.
S2.A: When the material processor finds a next item of the type set read from the database, step S2.A.a below is performed, or steps S2.A.b1 to S2.A.b2 below are performed.
S2.B: When the material processor finds no next item of the type set read from the database, step S2.A.b2 below is performed.
S2.A.a: When the type set read for the resource material exceeds the bounds of the type set configured in the material processor, the material processor searches the web page again for the resource material corresponding to the request and returns to step S1.
S2.A.b1: When the type set read for the resource material does not exceed the bounds of the type set configured in the material processor, the material processor extracts the suffix of the URL in the resource material.
S2.A.b2: The analysis result generated by the Web server is matched against the type set read for the resource material, or against the suffix extracted from the URL in the resource material; if they match, the resource material meets the filter condition and operation S3 below is performed.
S3: The material processor scans the resource material that meets the filter condition for junk files and virus files and removes them. Depending on whether the resource material contains junk files and virus files, process S3.A or S3.B below follows.
S3.A: When the material processor detects no junk files or virus files in the resource material, the download operation proceeds.
S3.B: When the material processor detects junk files and virus files in the resource material, the client is prompted to choose between scanning for and removing the viruses and continuing with the download; process S3.B.a or S3.B.b below follows.
S3.B.a: When the client chooses to continue the download, the scan-and-remove step is skipped and the download operation proceeds.
S3.B.b: When the client chooses scan-and-remove, the junk files and virus files in the resource material are removed until the resource material is safe, after which the download operation proceeds.
3. The method for batch extraction of web page resource material according to claim 1, characterized in that the downloading in the monitoring process follows process NA or NB below, depending on whether the data length of the resource material exceeds a threshold:
NA: When the data length of the resource material exceeds the threshold, steps NA.a1 to NA.a4 below or step NA.b below are performed, depending on whether a file is created at the storage path that the Web server generated for the resource material to be downloaded.
NA.a1: When a file is created directly at the storage path that the Web server generated for the resource material to be downloaded, the created file is opened, the filtered data of the resource material is received, and the received data is written to the created file.
NA.a2: Reception of the resource material's data completes.
NA.a3: The created file is closed.
NA.a4: The download is complete.
NA.b: When the file is not created, the filtered data of the resource material is not received; the material processor searches the web page again for the resource material corresponding to the request, and filtering and downloading are performed again.
NB: When the data length of the resource material does not exceed the threshold, steps NB.a1 to NB.a3 below or step NB.b below are performed, depending on whether enough memory can be allocated.
NB.a1: When enough memory can be allocated, the filtered data of the resource material is received and written to memory, and the process returns to step NA.a2.
NB.a2: The memory is released.
NB.a3: The download is complete.
NB.b: When not enough memory can be allocated, step NA.b is performed.
4. The method for batch extraction of web page resource material according to claim 1, characterized in that the request is transmitted in the form of a data stream.
5. The method for batch extraction of web page resource material according to claim 1, characterized in that the client is one or more of a mobile phone, a personal computer, or a tablet computer.
6. The method for batch extraction of web page resource material according to claim 1, characterized in that the web page resource material comprises one or more of pictures, documents, forms, executable scripts, photos, audio, and video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310105247.5A CN103235785B (en) | 2013-03-28 | 2013-03-28 | A kind of method of batch extracting web page resources material |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103235785A CN103235785A (en) | 2013-08-07 |
CN103235785B true CN103235785B (en) | 2016-02-24 |
Family
ID=48883827
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310105247.5A Active CN103235785B (en) | 2013-03-28 | 2013-03-28 | A kind of method of batch extracting web page resources material |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103235785B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110955852A (en) * | 2018-09-25 | 2020-04-03 | 北京国双科技有限公司 | Content import method and device |
CN111651418B (en) * | 2020-05-29 | 2022-03-08 | 腾讯科技(深圳)有限公司 | Document content downloading method and device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6718365B1 (en) * | 2000-04-13 | 2004-04-06 | International Business Machines Corporation | Method, system, and program for ordering search results using an importance weighting |
CN101477576A (en) * | 2009-01-20 | 2009-07-08 | 华为技术有限公司 | Method, equipment and system for providing network materials to search engine |
CN102254027A (en) * | 2011-07-29 | 2011-11-23 | 四川长虹电器股份有限公司 | Method for obtaining webpage contents in batch |
CN102646135A (en) * | 2012-03-31 | 2012-08-22 | 奇智软件(北京)有限公司 | Method, device and system for collecting web pages |
CN102955791A (en) * | 2011-08-23 | 2013-03-06 | 句容今太科技园有限公司 | Searching and classifying service system for network information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110119161A1 (en) * | 2009-11-18 | 2011-05-19 | Van Treeck George M | Automated ratings of new products and services |
Also Published As
Publication number | Publication date |
---|---|
CN103235785A (en) | 2013-08-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |