CN103235785B - A kind of method of batch extracting web page resources material - Google Patents

Info

Publication number
CN103235785B
Authority
CN
China
Prior art keywords
resource material
processor
resource
file
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310105247.5A
Other languages
Chinese (zh)
Other versions
CN103235785A (en)
Inventor
徐培镖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
4399 NETWORK Co Ltd
Original Assignee
4399 NETWORK Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 4399 NETWORK Co Ltd
Priority to CN201310105247.5A
Publication of CN103235785A
Application granted
Publication of CN103235785B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the browser field and discloses a method for batch extraction of web page resource material. The method monitors the communication between a browser and a web server. A material processor receives from the web server a request that carries an analysis result, configures the resource material corresponding to the request, and monitors that resource material; the monitoring process comprises filtering and downloading. The user accesses web pages directly through the invention, monitoring is completed while the page is being accessed, and the user needs to perform no operation other than accessing the page. With this technique, the web server automatically generates a storage path for the received resource material and produces the resource material whose file type and file content match the analysis result. This achieves batch downloading of resource material, improves the security of extraction, reduces manual workload, and improves extraction efficiency.

Description

A method for batch extraction of web page resource material
Technical Field
The present invention relates to the browser field, and in particular to a method for batch extraction of files of user-configured types from a website.
Background
The prior art mainly comprises two tools. The first, HttpWatch, is a web data analysis plug-in integrated into Internet Explorer. Its functions include web page summaries, cookie management, cache management, and report output. It can also obtain resource material, but only one file at a time; it cannot extract the resource material on a web page in batches. The second, HttpFox, offers functions similar to HttpWatch and is integrated into Firefox as a plug-in, but it has no file download function and therefore cannot extract the resource material on a web page.
The prior art currently offers no effective solution to these technical problems.
Summary of the Invention
The technical problem solved by the present invention is to provide a method for batch extraction of web page resource material. The invention lets a user extract material from a web page more conveniently and performs a safety check on it. It can download resource material in batches, improves the security of extraction, reduces manual workload, and improves extraction efficiency.
To solve the above technical problem, the invention monitors the communication between the browser and the web server, intercepts the data transmitted during that communication, filters the information, and downloads files. Specifically, the method comprises:
Step 1: a client connects to the web server and submits a request to the web server;
Step 2: the web server receives and responds to the request, analyzes the file type and file content corresponding to the request, generates an analysis result for that file type and file content, and then transmits the request carrying the analysis result to the material processor;
Step 3: the material processor receives the request carrying the analysis result, searches the web page for the resource material corresponding to the request, and monitors the resource material found; the monitoring process comprises filtering and downloading;
Step 4: after the monitoring process is complete, the material processor transmits the downloaded resource material to a cache server for caching;
Step 5: the cache server transmits the downloaded resource material to the web server;
Step 6: the web server automatically generates a storage path for the received downloaded resource material and produces the resource material that matches the analysis result;
Step 7: the client receives the feedback from the web server and saves the resource material matching the analysis result to the storage path.
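Step 6 states that the storage path is generated automatically but does not say how. The following Python sketch shows one possible approach, assuming the path is derived from the resource URL and grouped by file extension. The function name `storage_path`, the `downloads` root, and the `other` and `index` fallbacks are illustrative assumptions and do not come from the patent.

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def storage_path(url: str, root: str = "downloads") -> str:
    """Derive a storage path of the form <root>/<extension>/<file name>.

    Hypothetical helper: the patent only says a path is generated
    automatically, not how it is built.
    """
    path = unquote(urlparse(url).path)           # drop query string and fragment
    name = posixpath.basename(path) or "index"   # fallback for URLs ending in "/"
    ext = posixpath.splitext(name)[1].lstrip(".").lower() or "other"
    return str(PurePosixPath(root) / ext / name)


print(storage_path("http://example.com/img/logo.PNG?v=2"))
```

Grouping by extension is only one option; a real implementation might just as well group by site or by date.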
Preferably, the filtering in the monitoring process comprises:
S1: the material processor sets a ".*" filter option, analyzes the resource material found on the web page for the request, and determines whether the resource material carries the ".*" filter option;
S2: when the material processor receives resource material carrying the ".*" filter option, the resource material meets the filter condition and S3 below is performed;
when the material processor does not receive resource material carrying the ".*" filter option, the material processor reads from a database the type set configured for the resource material, checks whether a next item of that type set can be found in the database, and proceeds to S2.A or S2.B below;
S2.A: when the material processor finds a next item of the type set read from the database, step S2.A.a below, or steps S2.A.b1 to S2.A.b2 below, are performed;
S2.B: when the material processor does not find a next item of the type set read from the database, step S2.A.b2 below is performed;
S2.A.a: when the type set read for the resource material exceeds the bounds of the type set configured in the material processor, the material processor searches the web page again for the resource material corresponding to the request and returns to step S1;
S2.A.b1: when the type set read for the resource material does not exceed the bounds of the type set configured in the material processor, the material processor extracts the suffix of the URL data in the resource material;
S2.A.b2: when the analysis result generated by the web server matches the type set read for the resource material, or matches the suffix extracted from the URL data in the resource material, the resource material meets the filter condition and S3 below is performed;
S3: the material processor scans the resource material that meets the filter condition for junk files and virus files and, depending on whether the resource material contains junk files and virus files, proceeds to S3.A or S3.B below;
S3.A: when the material processor does not detect junk files and virus files in the resource material, the download operation continues;
S3.B: when the material processor detects junk files and virus files in the resource material, the client is prompted to choose between removing the virus and continuing with the download, and S3.B.a or S3.B.b below is performed;
S3.B.a: when the client chooses to continue downloading, the scan-and-remove step is skipped and the download operation continues;
S3.B.b: when the client chooses scan-and-remove filtering, the junk files and virus files in the resource material are removed until the resource material is safe, and the download operation then continues.
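The filtering branch above can be sketched in Python roughly as follows. This is a simplified illustration under several assumptions: the ".*" option is read as an accept-everything wildcard, the type set is modeled as a plain sequence of suffixes such as ".png" rather than a database table, and junk or virus detection is a caller-supplied stub. The names `url_suffix`, `passes_filter`, `select_resources`, and `is_unsafe` are hypothetical.

```python
import posixpath
from typing import Callable, Iterable, List, Sequence, Tuple
from urllib.parse import urlparse

WILDCARD = ".*"  # assumed to mean "accept every type"


def url_suffix(url: str) -> str:
    """Return the lower-cased extension of the URL path, e.g. '.png'."""
    return posixpath.splitext(urlparse(url).path)[1].lower()


def passes_filter(url: str, type_set: Sequence[str]) -> bool:
    """S1/S2: accept on wildcard, otherwise walk the type set item by item."""
    if WILDCARD in type_set:
        return True
    suffix = url_suffix(url)
    for item in type_set:            # "next item" loop over the type set
        if suffix == item.lower():   # S2.A.b2: suffix matches a configured type
            return True
    return False                     # type set exhausted without a match


def select_resources(
    urls: Iterable[str],
    type_set: Sequence[str],
    is_unsafe: Callable[[str], bool] = lambda url: False,
) -> Tuple[List[str], List[str]]:
    """S3: split matching URLs into safe ones and ones flagged for the user."""
    safe, flagged = [], []
    for url in urls:
        if passes_filter(url, type_set):
            (flagged if is_unsafe(url) else safe).append(url)
    return safe, flagged
```

The flagged list corresponds to the point where S3.B prompts the client; how the client's choice is collected is left out of the sketch.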
Preferably, the downloading in the monitoring process proceeds to NA or NB below, depending on whether the data length of the resource material exceeds a threshold:
NA: when the data length of the resource material exceeds the threshold, steps NA.a1 to NA.a4 below or step NA.b below are performed, depending on whether a file is created at the storage path that the web server generated for the resource material to be downloaded;
NA.a1: when the file is created directly at the storage path that the web server generated for the resource material to be downloaded, the created file is opened, the data of the filtered resource material is received, and the received data is written to the created file;
NA.a2: reception of the data of the resource material completes;
NA.a3: the created file is closed;
NA.a4: the download is complete;
NA.b: when the file is not created, the data of the filtered resource material is not received; the material processor searches the web page again for the resource material corresponding to the request and repeats filtering and downloading;
NB: when the data length of the resource material does not exceed the threshold, steps NB.a1 to NB.a3 below or step NB.b below are performed, depending on whether enough memory can be allocated;
NB.a1: when enough memory can be allocated, the data of the filtered resource material is received and written to memory, and the process returns to step NA.a2;
NB.a2: the memory is released;
NB.a3: the download is complete;
NB.b: when not enough memory can be allocated, step NA.b is performed.
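The NA/NB branching can be illustrated with a short Python sketch: data longer than a threshold is streamed to a file, shorter data is collected in memory, and a failure to prepare either destination is reported so the caller can search again. The function name `receive`, the 1 MiB threshold, and the return convention are assumptions made for illustration; the patent specifies no threshold value and no API.

```python
import io
from typing import Iterable, Optional, Tuple, Union

THRESHOLD = 1024 * 1024  # hypothetical 1 MiB cut-off; the patent gives no value


def receive(
    chunks: Iterable[bytes],
    data_length: int,
    save_path: str,
    threshold: int = THRESHOLD,
) -> Optional[Tuple[str, Union[str, bytes]]]:
    """Return ('file', path) or ('memory', data); None means retry (NA.b)."""
    if data_length > threshold:              # NA: large resource goes to a file
        try:
            out = open(save_path, "wb")      # NA.a1: create and open the file
        except OSError:
            return None                      # NA.b: file could not be created
        with out:                            # NA.a3: closed on leaving the block
            for chunk in chunks:
                out.write(chunk)
        return ("file", save_path)           # NA.a4: download complete

    try:                                     # NB: small resource stays in memory
        buffer = io.BytesIO()
        for chunk in chunks:                 # NB.a1: write received data to memory
            buffer.write(chunk)
    except MemoryError:
        return None                          # NB.b: not enough memory
    data = buffer.getvalue()
    buffer.close()                           # NB.a2: release the buffer
    return ("memory", data)                  # NB.a3: download complete
```

A caller would treat a `None` result as the signal to look up the resource again and repeat filtering and downloading.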
Preferably, the request is transmitted as a data stream.
Preferably, the external memory is one or more of a floppy disk, a hard disk, an optical disc, or a USB flash drive.
Preferably, the client is one or more of a mobile phone, a personal computer, and a tablet computer.
Preferably, the web page resource material comprises one or more of pictures, documents, tables, executable scripts, photos, audio, and video.
The technical principle of the invention is as follows: the communication between the browser and the web server is monitored, and the data transmitted during that communication is intercepted in order to filter information and download files. A browser control is embedded in the program. The user accesses web pages directly through the invention, monitoring is completed during that page access, and the user needs to perform no operation other than accessing the page.
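As a rough illustration of this principle, the sketch below models a monitor that an embedded browser control would notify for every response it receives; matching responses are kept without any extra user action. The class `TrafficMonitor`, its `on_response` callback, and the `wanted` predicate are hypothetical names, and the hook by which a real browser control would deliver responses is assumed rather than taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TrafficMonitor:
    """Collects response bodies whose URL passes a caller-supplied filter."""

    wanted: Callable[[str], bool]                      # filter predicate (assumed)
    captured: Dict[str, bytes] = field(default_factory=dict)

    def on_response(self, url: str, body: bytes) -> None:
        """Hypothetical callback invoked once per browser/server response."""
        if self.wanted(url):
            self.captured[url] = body


# Simulated page load: the browser control reports each response it sees.
monitor = TrafficMonitor(wanted=lambda url: url.endswith((".png", ".mp3")))
monitor.on_response("http://example.com/index.html", b"<html></html>")
monitor.on_response("http://example.com/logo.png", b"\x89PNG")
monitor.on_response("http://example.com/theme.mp3", b"ID3")
print(sorted(monitor.captured))
```

Because the monitor only observes traffic the page already generates, batch capture happens as a side effect of ordinary browsing.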
Compared with the prior art, the present invention has the following beneficial effects:
The user only needs to access a page through the browser provided by the invention, without any other operation, to download resource material in batches. The resource material to be extracted is monitored, and filtering and downloading are carried out during monitoring, which improves the security of extraction, reduces manual workload, and improves extraction efficiency. It is a new technology with promotional value.
Brief Description of the Drawings
Fig. 1 is a flowchart of the monitoring process of the method for batch extraction of web page resource material;
Fig. 2 is a flowchart of the filtering process of the method for batch extraction of web page resource material;
Fig. 3 is a flowchart of the download process of the method for batch extraction of web page resource material.
Detailed Description
To provide a better understanding of the technical problem solved and the technical solution provided, the invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain how the invention may be implemented and are not intended to limit it.
One embodiment of the invention:
S1: the client opens a web page, enters a URL, and presses Enter to access the page;
S2: the client presses Shift and F2 at the same time to bring up the download information panel;
S3: the download information panel displays information such as the file type to be downloaded, the download progress, the file path, and the URL data of the resource material;
S4: the resource material to be downloaded is scanned for junk files and virus files. If no junk files and virus files are detected, step S5 follows. If junk files and virus files are detected, a dialog box appears with wording such as "Please choose to remove the virus or continue downloading", prompting the client either to remove the virus or to continue the download. If the client clicks wording such as "Continue downloading", the scan-and-remove step is skipped and step S5 follows. If the client clicks wording such as "Scan and remove", the junk files and virus files are removed; once the file is safe, wording such as "The file is safe, please continue downloading" appears, and step S5 follows;
S5: right-clicking any download item opens a pop-up menu; clicking "Open folder" browses directly to the directory containing the file, and clicking "Copy URL" copies the URL data of the file;
S6: double-clicking any download item with the left mouse button opens the file directly;
S7: clicking the "Save" button opens the save interface. By default the system uses the file type and file content corresponding to the request submitted by the client, together with the automatically generated storage path of the file. In the save interface the client may select a file type and a storage path different from the system defaults; the storage path for the downloaded file may be on internal memory or external memory.
In a preferred embodiment, the request is transmitted as a data stream.
In a preferred embodiment, the external memory is one or more of a floppy disk, a hard disk, an optical disc, or a USB flash drive.
In a preferred embodiment, the client is a mobile phone, a personal computer, a tablet computer, or another device that communicates with the website and is equipped with hardware (for example, a processor) and software (for example, Flash software or the Windows operating system) for presenting material.
In a preferred embodiment, the web page resource material comprises one or more of pictures, documents, tables, executable scripts, photos, audio, and video.
The invention has been described in detail above through specific embodiments. Those skilled in the art should understand, however, that the invention is not limited to these embodiments; any modification, combination, or equivalent replacement made within the basic principle of the invention falls within its scope of protection.

Claims (6)

1. A method for batch extraction of web page resource material, characterized by comprising:
Step 1: a client connects to a web server and submits a request to the web server;
Step 2: the web server receives and responds to the request, analyzes the file type and file content corresponding to the request, generates an analysis result for that file type and file content, and then transmits the request carrying the analysis result to a material processor;
Step 3: the material processor receives the request carrying the analysis result, searches the web page for the resource material corresponding to the request, and monitors the resource material found; the monitoring process comprises filtering and downloading;
Step 4: after the monitoring process is complete, the material processor transmits the downloaded resource material to a cache server for caching;
Step 5: the cache server transmits the downloaded resource material to the web server;
Step 6: the web server automatically generates a storage path for the received downloaded resource material and produces the resource material that matches the analysis result;
Step 7: the client receives the feedback from the web server and saves the resource material matching the analysis result to the storage path.
2. The method for batch extraction of web page resource material according to claim 1, characterized in that the filtering in the monitoring process comprises:
S1: the material processor sets a ".*" filter option, analyzes the resource material found on the web page for the request, and determines whether the resource material carries the ".*" filter option;
S2: when the material processor receives resource material carrying the ".*" filter option, the resource material meets the filter condition and S3 below is performed;
when the material processor does not receive resource material carrying the ".*" filter option, the material processor reads from a database the type set configured for the resource material, checks whether a next item of that type set can be found in the database, and proceeds to S2.A or S2.B below;
S2.A: when the material processor finds a next item of the type set read from the database, step S2.A.a below, or steps S2.A.b1 to S2.A.b2 below, are performed;
S2.B: when the material processor does not find a next item of the type set read from the database, step S2.A.b2 below is performed;
S2.A.a: when the type set read for the resource material exceeds the bounds of the type set configured in the material processor, the material processor searches the web page again for the resource material corresponding to the request and returns to step S1;
S2.A.b1: when the type set read for the resource material does not exceed the bounds of the type set configured in the material processor, the material processor extracts the suffix of the URL data in the resource material;
S2.A.b2: when the analysis result generated by the web server matches the type set read for the resource material, or matches the suffix extracted from the URL data in the resource material, the resource material meets the filter condition and S3 below is performed;
S3: the material processor scans the resource material that meets the filter condition for junk files and virus files and, depending on whether the resource material contains junk files and virus files, proceeds to S3.A or S3.B below;
S3.A: when the material processor does not detect junk files and virus files in the resource material, the download operation continues;
S3.B: when the material processor detects junk files and virus files in the resource material, the client is prompted to choose between removing the virus and continuing with the download, and S3.B.a or S3.B.b below is performed;
S3.B.a: when the client chooses to continue downloading, the scan-and-remove step is skipped and the download operation continues;
S3.B.b: when the client chooses scan-and-remove filtering, the junk files and virus files in the resource material are removed until the resource material is safe, and the download operation then continues.
3. The method for batch extraction of web page resource material according to claim 1, characterized in that the downloading in the monitoring process proceeds to NA or NB below, depending on whether the data length of the resource material exceeds a threshold:
NA: when the data length of the resource material exceeds the threshold, steps NA.a1 to NA.a4 below or step NA.b below are performed, depending on whether a file is created at the storage path that the web server generated for the resource material to be downloaded;
NA.a1: when the file is created directly at the storage path that the web server generated for the resource material to be downloaded, the created file is opened, the data of the filtered resource material is received, and the received data is written to the created file;
NA.a2: reception of the data of the resource material completes;
NA.a3: the created file is closed;
NA.a4: the download is complete;
NA.b: when the file is not created, the data of the filtered resource material is not received; the material processor searches the web page again for the resource material corresponding to the request and repeats filtering and downloading;
NB: when the data length of the resource material does not exceed the threshold, steps NB.a1 to NB.a3 below or step NB.b below are performed, depending on whether enough memory can be allocated;
NB.a1: when enough memory can be allocated, the data of the filtered resource material is received and written to memory, and the process returns to step NA.a2;
NB.a2: the memory is released;
NB.a3: the download is complete;
NB.b: when not enough memory can be allocated, step NA.b is performed.
4. The method for batch extraction of web page resource material according to claim 1, characterized in that the request is transmitted as a data stream.
5. The method for batch extraction of web page resource material according to claim 1, characterized in that the client is one or more of a mobile phone, a personal computer, and a tablet computer.
6. The method for batch extraction of web page resource material according to claim 1, characterized in that the web page resource material comprises one or more of pictures, documents, tables, executable scripts, photos, audio, and video.
CN201310105247.5A 2013-03-28 2013-03-28 A kind of method of batch extracting web page resources material Active CN103235785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310105247.5A CN103235785B (en) 2013-03-28 2013-03-28 A kind of method of batch extracting web page resources material

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310105247.5A CN103235785B (en) 2013-03-28 2013-03-28 A kind of method of batch extracting web page resources material

Publications (2)

Publication Number Publication Date
CN103235785A CN103235785A (en) 2013-08-07
CN103235785B true CN103235785B (en) 2016-02-24

Family

ID=48883827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310105247.5A Active CN103235785B (en) 2013-03-28 2013-03-28 A kind of method of batch extracting web page resources material

Country Status (1)

Country Link
CN (1) CN103235785B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110955852A (en) * 2018-09-25 2020-04-03 北京国双科技有限公司 Content import method and device
CN111651418B (en) * 2020-05-29 2022-03-08 腾讯科技(深圳)有限公司 Document content downloading method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6718365B1 (en) * 2000-04-13 2004-04-06 International Business Machines Corporation Method, system, and program for ordering search results using an importance weighting
CN101477576A (en) * 2009-01-20 2009-07-08 华为技术有限公司 Method, equipment and system for providing network materials to search engine
CN102254027A (en) * 2011-07-29 2011-11-23 四川长虹电器股份有限公司 Method for obtaining webpage contents in batch
CN102646135A (en) * 2012-03-31 2012-08-22 奇智软件(北京)有限公司 Method, device and system for collecting web pages
CN102955791A (en) * 2011-08-23 2013-03-06 句容今太科技园有限公司 Searching and classifying service system for network information

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119161A1 (en) * 2009-11-18 2011-05-19 Van Treeck George M Automated ratings of new products and services

Also Published As

Publication number Publication date
CN103235785A (en) 2013-08-07

Similar Documents

Publication Publication Date Title
US9336202B2 (en) Method and system relating to salient content extraction for electronic content
CN102646135B (en) Method, device and system for collecting web pages
CN110245069B (en) Page version testing method and device and page display method and device
CN109981322B (en) Method and device for cloud resource management based on label
WO2014204877A1 (en) Capturing website content through capture services
CN106294648A (en) A kind of processing method and processing device for page access path
CN104580093A (en) Processing method, device and system for notification messages of websites
WO2020253366A1 (en) Webpage mailbox data crawling method and apparatus, terminal, and storage medium
CN105051685A (en) System and method to enable web property access to a native application
CN104899212B (en) Web page display method, server and system
CN105447201A (en) An optimization method and terminal for sharing information
TW201409273A (en) Method and device for responding to webpage access request
CN102314437A (en) Method for supporting user to browse multiple format resources and equipment
CN102682013A (en) Method for operating compressed file in network storage appliance
CN110442819A (en) Data processing method, device, storage medium and terminal
EP3594823B1 (en) Information display method, terminal and server
CN109471974A (en) Filter method, apparatus, electronic equipment and the storage medium of third party's web advertisement
CN111245880B (en) Behavior trajectory reconstruction-based user experience monitoring method and device
CN105550179A (en) Webpage collection method and browser plug-in
CN103235785B (en) A kind of method of batch extracting web page resources material
KR20130026558A (en) System and providing method for integration of reply comment
CN107562452A (en) Terminal preset application update method, intelligent terminal and the device with store function
KR102259595B1 (en) System for providing mobile based file sending service using short message service
CN108763930A (en) WEB page streaming analytic method based on minimal cache model
CN103577433A (en) Intelligent page browsing method, system and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant