
CN102799610B - Method and system for collecting network information - Google Patents

Info

Publication number
CN102799610B
Authority
CN
China
Prior art keywords
information
collected
classification
user
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210180521.0A
Other languages
Chinese (zh)
Other versions
CN102799610A (en
Inventor
赵勇
党书国
阎飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Shuqin Technology Co Ltd
Original Assignee
BEIJING QILEKE TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING QILEKE TECHNOLOGY CO LTD
Priority to CN201210180521.0A
Publication of CN102799610A
Application granted
Publication of CN102799610B
Legal status: Active

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for collecting network information. The method comprises the following steps of: acquiring information to be collected which is required to be collected by a user according to a collection instruction of the user; analyzing the information to be collected to determine a classification of the information to be collected; and storing the information to be collected and the determined classification of the information to be collected, wherein the information to be collected comprises website information which is required to be collected by the user and/or information relevant to webpage contents. Preferably, during determination, based on a preset keyword library, at least one of text analysis, semantic analysis and word frequency statistical analysis is executed on information of the webpage contents corresponding to the information to be collected, so that whether at least one keyword is comprised in the information of the webpage contents corresponding to the information to be collected and also comprised in the preset keyword library is judged; and the classification of the information to be collected is determined according to a judgment result. By the method and the system, the convenience in network collection for the user is enhanced.

Description

Network information collection method and system
Technical Field
The present invention relates to network technologies, and in particular to a method and a system for collecting (that is, bookmarking) network content such as network addresses and web page content.
Background
With the popularization and development of internet technology, the number and content of websites, blogs and microblogs are rapidly increasing. Some technical solutions for helping users to collect network contents are emerging.
In one collection mode, the user adds a website or web page address and its name to the browser's favorites as a bookmark (consisting of the web page name and the corresponding link). When the user wants to access a collected web page, the user clicks the corresponding bookmark in the favorites, and the browser switches to that page for reading. In this mode, unless the user manually files each newly added bookmark under a category in the favorites, the browser simply lists the bookmarks in the order in which they were added. After collecting links for some time, the user may therefore find it hard to locate the bookmark for a given page. Conversely, if the user wishes to store bookmarks in the favorites by category, manual setting is required, which causes unnecessary trouble for the user.
In another collection mode, the user accesses a web page that a web service provider offers for collecting website links, and enters there the name or link of the website to be collected together with its classification, so as to store the website links the user likes to access. In this mode, the user must manually set or select the category of the website to be collected, and may even have to type in the website name or link by hand. These tedious manual operations greatly reduce the user-friendliness of the collection system.
The present invention is proposed to better help users organize and manage the content they collect.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a network information collecting method and system capable of automatically classifying the network information to be collected.
In order to solve the technical problem, the invention provides a network information collection method. The method comprises the following steps:
an acquisition step, namely acquiring information to be collected, which is to be collected by a user, according to a collection instruction of the user;
a determining step, namely analyzing the information to be collected to determine the classification of the information to be collected;
a collection step of storing the information to be collected and the determined classification of the information to be collected,
the information to be collected comprises information related to website information and/or webpage content to be collected by the user.
According to another aspect of the present invention, in the determining step, at least one of text analysis, semantic analysis and word frequency statistical analysis is performed on the web page content information corresponding to the information to be collected based on a preset keyword library to determine whether there is at least one keyword which is included in the web page content information corresponding to the information to be collected and is included in the preset keyword library, and a classification of the information to be collected is determined according to a determination result, wherein the preset keyword library includes a plurality of keywords, and each keyword corresponds to one or more of the classifications; the webpage content information corresponding to the information to be collected comprises webpage content pointed by website information in the information to be collected, part or all of webpage content of a website corresponding to the website information in the information to be collected, and/or webpage content included in the information to be collected.
According to another aspect of the invention, when the judgment result is yes, the classification corresponding to the at least one keyword is determined as the classification of the information to be collected; or, determining the classification corresponding to one or more keywords which appear more frequently in the webpage content information corresponding to the information to be collected in the at least one keyword as the classification of the information to be collected.
According to another aspect of the present invention, in the determining step, when the determination result is negative, the classification of the information to be collected is determined as a preset default classification; alternatively, the classification designated by the user is determined as the classification of the information to be collected.
According to still another aspect of the present invention, the method further comprises a setting step of adding, deleting and modifying keywords of the preset keyword library according to an instruction of the user.
According to another aspect of the present invention, in the determining step, at least one of text analysis, semantic analysis and word frequency statistical analysis is performed on the web content information corresponding to the information to be collected to obtain keywords which are used for embodying features of the web content information corresponding to the information to be collected and are included in the web content information, and the classification of the information to be collected is determined according to the at least one keyword, where the web content information corresponding to the information to be collected includes web content pointed by website information in the information to be collected, part or all of the web content of a website corresponding to the website information in the information to be collected, and/or the web content included in the information to be collected.
According to another aspect of the present invention, in the determining step, a category corresponding to the one or more keywords is determined as a category of the information to be collected.
According to another aspect of the present invention, in the determining step, when none of the classifications corresponding to the one or more keywords is a classification used by the user in a previous collection process, the classification of the information to be collected is determined as a preset default classification; or determining the classification designated by the user as the classification of the information to be collected.
According to still another aspect of the present invention, the classification corresponding to each keyword is set in advance, or determined by performing text analysis and/or semantic analysis.
According to yet another aspect of the invention, the web page content includes all or part of text, images and/or movies in the web page.
According to another aspect of the present invention, when the information to be collected is website information to be collected by the user, the acquisition step further includes analyzing, according to a machine learning algorithm, the webpage content information corresponding to the website information to be collected by the user.
According to yet another aspect of the invention, the machine learning algorithm is naive Bayes, a support vector machine, a latent Dirichlet allocation model, and/or a neural network.
According to another aspect of the invention, a network information collection system is also provided. The system comprises: the acquisition unit is used for acquiring information to be collected, which is collected by the user, according to the collection instruction of the user; the determining unit is used for analyzing the information to be collected so as to determine the classification of the information to be collected; and the collection unit is used for storing the information to be collected and the determined classification of the information to be collected, wherein the information to be collected comprises information related to website information and/or webpage content to be collected by the user.
According to another aspect of the present invention, the determining unit further performs at least one of text analysis, semantic analysis and word frequency statistical analysis on the web content information corresponding to the information to be collected based on a preset keyword library to determine whether there is at least one keyword which is included in the web content information corresponding to the information to be collected and is included in the preset keyword library, and determines the classification of the information to be collected according to a determination result, wherein the preset keyword library includes a plurality of keywords, and each keyword corresponds to one or more of the classifications; the webpage content information corresponding to the information to be collected comprises webpage content pointed by website information in the information to be collected, part or all of webpage content of a website corresponding to the website information in the information to be collected, and/or webpage content included in the information to be collected.
According to another aspect of the present invention, when the determination result is yes, the determining unit determines the category corresponding to the at least one keyword as the category of the information to be collected; or, determining the classification corresponding to one or more keywords which appear more frequently in the webpage content information corresponding to the information to be collected in the at least one keyword as the classification of the information to be collected.
According to one or more embodiments of the invention, after the information to be collected, which is to be collected by the user, is acquired according to the collection indication of the user, the information to be collected is analyzed, and the classification of the information to be collected can be determined according to the analysis result without manually participating in the classification process. Therefore, the information to be collected can be automatically stored in the collection positions of the corresponding categories, and the convenience of using network collection by the user is enhanced.
In other words, one or more embodiments of the invention solve the problems of complicated content classification, lack of organization and the like in the favorites when the user does not perform manual classification, so that the collection operation is more convenient and faster.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram illustrating a network information collection system according to the present embodiment;
FIG. 2A and FIG. 2B illustrate flowcharts of a network information collection method according to first and second embodiments of the present invention, respectively;
FIG. 3 shows a flow diagram of an example of network favorites according to the present invention;
FIG. 4 illustrates a flow diagram of yet another example of network collections according to the present invention;
FIG. 5 illustrates a flow chart of yet another example of network collections according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
It should be noted that, where they do not conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other within the scope of protection of the present invention. Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.
First embodiment
Fig. 1 shows a schematic configuration diagram of a network information collection system according to the present embodiment. As shown in fig. 1, a server 10 is network-connected to a plurality of clients 20.
It should be noted that Fig. 1 shows only one server 10; however, the server 10 of the present invention may be multiple servers. For example, multiple computer devices in a cloud platform may jointly function as a server. The client 20 may be a computer, a Personal Digital Assistant (PDA), a tablet computer, a smart phone, or any of various other computing devices.
The connection between the server 10 and the client 20 may be a wired network or a wireless network. In the client 20, a browser or other network information processing software for accessing network information may be provided.
Fig. 2A illustrates steps of a network information collection method according to the present embodiment. The method according to the present embodiment is described below with reference to fig. 2A.
Step 210, acquiring information to be collected, which is to be collected by a user, according to a collection instruction of the user, wherein the information to be collected includes website information and/or webpage content related information, such as a webpage address, a webpage name, and all or part of webpage content, which are to be collected by the user;
step 220, analyzing the information to be collected to determine the classification of the information to be collected;
step 230, storing the information to be collected and the determined classification of the information to be collected.
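Steps 210 to 230 can be sketched as a small pipeline. The following is a minimal illustration only, not the patented implementation: the names `CollectionItem`, `CollectionStore` and `collect`, the dictionary-shaped collection indication, and the in-memory store are all assumptions introduced here, and the classifier is passed in as a plain function so that any of the analyses described for step 220 could be plugged in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CollectionItem:
    """Information to be collected: a URL, a page name and optional page content."""
    url: str
    name: str = ""
    content: str = ""


@dataclass
class CollectionStore:
    """Hypothetical in-memory store mapping each classification to its items."""
    by_category: Dict[str, List[CollectionItem]] = field(default_factory=dict)

    def save(self, item: CollectionItem, categories: List[str]) -> None:
        for category in categories:
            self.by_category.setdefault(category, []).append(item)


def collect(indication: dict,
            classify: Callable[[CollectionItem], List[str]],
            store: CollectionStore) -> List[str]:
    # Step 210: build the information to be collected from the user's indication.
    item = CollectionItem(url=indication["url"],
                          name=indication.get("name", ""),
                          content=indication.get("content", ""))
    # Step 220: analyze the item to determine its classification(s).
    categories = classify(item)
    # Step 230: store the item together with the determined classification(s).
    store.save(item, categories)
    return categories
```

Passing the classifier as a parameter keeps the acquisition and storage steps unchanged whichever analysis is used for the determining step.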
More specifically, in step 210, when the user clicks a collection control in the browser of the client 20, the browser receives the user's collection indication. Based on this indication, the browser can acquire the information that the user wants to collect.
In one example, the user clicks a customized collection button or a browser plug-in in the browser, and in response to this operation (corresponding to the collection indication of the user) the browser takes the website information of the currently accessed web page as the information to be collected. Of course, the information to be collected is not limited to website information; it may also be web page content that the user wants to collect, including pictures, text and even video in the web page.
Preferably, the browser may also send the received collection instruction of the user to the server 10, and the server 10 obtains the information to be collected, which is to be collected by the user, according to the instruction, so as to store the information to be collected in the server 10 in step 230.
For example, when the user clicks a custom button, browser plug-in, menu item or the like in the browser for local collection or network collection, the server 10 receives from the browser a message indicating that the click event has occurred (corresponding to the user's collection indication) and determines the information to be collected based on the message. For example, the server 10 takes the website address included in the message, together with all or part of the content of the corresponding web page, as the information to be collected. More specifically, the server 10 may use a machine learning algorithm such as a naive Bayes classifier (NBC), a support vector machine (SVM), a latent Dirichlet allocation (LDA) model or a neural network to analyze the downloaded web page content pointed to by the website information in the information to be collected, part or all of the web page content of the website corresponding to that website information, and/or the web page content included in the information to be collected.
In addition, other software or modules in the client 20 may also be used to perform the operation of acquiring the information to be collected, which is to be collected by the user, according to the collection instruction of the user.
The process of analyzing the information to be collected to determine the classification of the information to be collected in step 220 is described in detail below.
First, the web page content information corresponding to the information to be collected is analyzed against a preset keyword library to judge whether at least one keyword exists that is contained both in that web page content information and in the preset keyword library.
For example, the frequency of occurrence of each keyword in the keyword library in the web content information corresponding to the information to be collected is analyzed through the word frequency analysis, so that the above-mentioned determination operation can be completed, that is, it is determined whether at least one keyword which is included in the web content information corresponding to the information to be collected and is included in the preset keyword library exists.
For another example, keywords that can be used to reflect characteristics of the web content information corresponding to the information to be collected in the web content information corresponding to the information to be collected may also be analyzed through semantic analysis or text analysis, and then whether the keywords are in a preset keyword library is determined, thereby completing the above determination operation.
If the result of the determination is yes, that is, if it is determined that at least one keyword is included in the web content information corresponding to the information to be collected and included in the preset keyword library, the category corresponding to the keyword may be determined as the category of the information to be collected. It should be noted that the number of the keywords may be one or more, one keyword may correspond to multiple categories, and multiple keywords may also correspond to the same category, so that the categories of the information to be collected may be multiple.
Preferably, when multiple keywords are found that are contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, the category corresponding to the one or more keywords that appear most frequently in that web page content information may be determined as the category of the information to be collected, which further improves classification accuracy.
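The word frequency check described above can be illustrated as follows. This is a sketch under stated assumptions: the library contents, the function names and the regular-expression tokenizer are all hypothetical, and the tokenizer only handles single-word keywords in space-delimited text (Chinese text would first need a word segmentation step, which is not shown).

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical preset keyword library: keyword -> one or more classifications.
KEYWORD_LIBRARY: Dict[str, List[str]] = {
    "engine": ["autos"],
    "sedan": ["autos"],
    "recipe": ["food"],
    "goal": ["sports"],
}


def keyword_frequencies(text: str, library: Dict[str, List[str]]) -> Counter:
    """Count how often each library keyword occurs in the page text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Only words that are in the library are counted, so the rest of the
    # page vocabulary never needs its own frequency entry.
    return Counter(word for word in words if word in library)


def most_frequent_keywords(text: str,
                           library: Dict[str, List[str]],
                           top_n: int = 1) -> List[str]:
    """Return the top_n library keywords by frequency; empty if none matched."""
    counts = keyword_frequencies(text, library)
    return [keyword for keyword, _ in counts.most_common(top_n)]
```

An empty result from `most_frequent_keywords` corresponds to a negative judgment result, that is, no library keyword occurs in the page.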
In addition, during analysis, the three analysis modes of text analysis, semantic analysis and word frequency statistical analysis can be combined for use, so as to analyze and obtain the keywords which can most embody the characteristics of the webpage content information corresponding to the information to be collected.
Alternatively, among the classifications corresponding to the matched keywords, only the one or more classifications to which the most keywords correspond may be used as the classification of the information to be collected, which also improves classification accuracy.
When the judgment result is negative, that is, when no keyword is contained both in the web page content information corresponding to the information to be collected and in the preset keyword library, the classification of the information to be collected is determined as a preset default classification; alternatively, a classification designated by the user is determined as the classification of the information to be collected.
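The decision logic of step 220 (pick the classification backed by the most matched keywords, otherwise fall back) might look like the sketch below. The function name, the `"uncategorized"` default and the `top_n` parameter are assumptions made for illustration; the source does not fix how ties are broken or how many classifications are kept.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

DEFAULT_CATEGORY = "uncategorized"  # hypothetical name for the preset default


def decide_categories(matched_keywords: Iterable[str],
                      library: Dict[str, List[str]],
                      user_choice: Optional[str] = None,
                      top_n: int = 1) -> List[str]:
    """Pick the classification(s) backed by the most matched keywords.

    If no keyword matched, fall back to the user's choice when one is given,
    otherwise to the preset default classification.
    """
    votes: Counter = Counter()
    for keyword in matched_keywords:
        # One keyword may map to several classifications; each gets a vote.
        for category in library.get(keyword, []):
            votes[category] += 1
    if not votes:
        return [user_choice] if user_choice else [DEFAULT_CATEGORY]
    return [category for category, _ in votes.most_common(top_n)]
```

Raising `top_n` lets an item be filed under several classifications, which matches the statement that the categories of the information to be collected may be multiple.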
The process of step 220 may be performed by client 20, a browser or other software on client 20, or server 10.
Then, step 230 is entered, and the information to be collected and the determined classification of the information to be collected are stored.
It should be noted that the information to be collected and the determined classification of the information to be collected may be stored in the client 20, or may be stored in the server 10.
For example, when the user desires to locally store the website information as the collection information so as to conveniently access the web page by directly clicking a bookmark on the browser next time, the browser installed on the client 20 may store the website information and the name of the web page and the category determined according to this step as a bookmark of one browser. In this way, when the user wants to access the bookmark next time, the bookmark can be conveniently searched by category, so that the manual classification of the user is not needed, and the user friendliness is improved.
Similarly, when a user desires to store collection information on a network, the collection information and its categories may be stored by the server 10. In this way, the user can conveniently access the collected contents by category through the network.
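Whether the favorites live in the browser or on the server, the stored record is essentially a bookmark filed under its classification. A minimal sketch, assuming a plain dictionary of folders and JSON as the storage format (neither is specified in the source, and all function names are hypothetical):

```python
import json
from typing import Dict, List


def add_bookmark(favorites: Dict[str, List[dict]], url: str, name: str,
                 categories: List[str]) -> None:
    """File one bookmark under every classification determined for it."""
    for category in categories:
        folder = favorites.setdefault(category, [])
        # Skip the bookmark if the same URL is already in this folder.
        if not any(entry["url"] == url for entry in folder):
            folder.append({"name": name, "url": url})


def find_by_category(favorites: Dict[str, List[dict]], category: str) -> List[dict]:
    """Return the bookmarks stored under a classification (empty if none)."""
    return favorites.get(category, [])


def serialize(favorites: Dict[str, List[dict]]) -> str:
    """Serialize the favorites, e.g. for a local file or an upload to a server."""
    return json.dumps(favorites, ensure_ascii=False, sort_keys=True)
```

Lookup by category is then a single dictionary access, which is what lets the user find a bookmark without having classified it by hand.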
In the foregoing embodiments, the classification corresponding to each keyword may be set in advance, or the classification corresponding to each keyword may be determined by performing text analysis and/or semantic analysis.
In addition, the preset keyword library can be manually configured in advance by research personnel or determined by a program, and a setting interface can also be provided for a user, so that the addition, deletion and modification operations can be carried out on the keywords of the preset keyword library according to the indication of the user.
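The add, delete and modify operations on the keyword library could be exposed through an interface like the one sketched here. The class and method names are hypothetical, and the choice to lower-case keywords and to raise `KeyError` when modifying a missing keyword are design assumptions, not details from the source.

```python
from typing import Dict, List, Set


class KeywordLibrary:
    """Hypothetical editable keyword library: keyword -> set of classifications."""

    def __init__(self) -> None:
        self._entries: Dict[str, Set[str]] = {}

    def add(self, keyword: str, categories: List[str]) -> None:
        """Add a keyword, or add further classifications to an existing one."""
        self._entries.setdefault(keyword.lower(), set()).update(categories)

    def delete(self, keyword: str) -> bool:
        """Remove a keyword; return False if it was not present."""
        return self._entries.pop(keyword.lower(), None) is not None

    def modify(self, keyword: str, categories: List[str]) -> None:
        """Replace the classifications of an existing keyword."""
        key = keyword.lower()
        if key not in self._entries:
            raise KeyError(keyword)
        self._entries[key] = set(categories)

    def categories_of(self, keyword: str) -> Set[str]:
        """Return a copy of the classifications for a keyword (empty if unknown)."""
        return set(self._entries.get(keyword.lower(), set()))
```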
In this embodiment, the web page content is analyzed only for the keywords in the preset keyword library, so word frequency analysis need not be performed for every word of the web page content, which reduces the complexity of the analysis.
Second embodiment
Fig. 2B illustrates a flowchart of a network information collection method according to the present embodiment. In Fig. 2B, steps that are the same as or similar to those in Fig. 2A are denoted by the same reference numerals.
Step 210 and step 230 of the present embodiment are substantially the same as those of the first embodiment and are therefore not described again in detail. In this embodiment, step 221 replaces step 220, and the web page information to be collected can be classified automatically without presetting a keyword library.
More specifically, in step 221 of this embodiment, text analysis, semantic analysis, word frequency analysis, and/or the like are performed on the web content information corresponding to the information to be collected to obtain keywords which are used for embodying features of the web content information corresponding to the information to be collected and are included in the web content information corresponding to the information to be collected, and then the classification of the information to be collected is determined according to the keywords.
Compared with the previous embodiment, this embodiment does not necessarily require a preset keyword library. Instead, it directly performs text analysis, semantic analysis and/or word frequency statistical analysis on the web page content to determine the one or more keywords that best reflect the characteristics of that content. For example, the word with the highest frequency of occurrence is taken as a keyword. Words with specific meanings may also be taken as keywords: for example, if automobile brand names such as Volkswagen or Audi appear many times, this can indicate that the website belongs to the automobile category, and such names are determined to be keywords through semantic analysis.
Then, the classification corresponding to the analyzed keywords can be determined as the classification of the information to be collected.
Further, it can also be determined whether the classifications corresponding to the keywords are classifications that the user has used in a previous collection process. If not, the classification of the information to be collected is determined as a preset default classification; alternatively, the classification designated by the user is determined as the classification of the information to be collected.
In this embodiment, the information to be collected can be classified automatically without presetting a keyword library.
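The second embodiment can be illustrated with a frequency-based keyword extractor that needs no preset library, followed by the optional check against previously used classifications. This is a sketch under assumptions: the stop-word list, the minimum word length, the keyword-to-category mapping and all function names are invented here, and plain word frequency stands in for the text and semantic analysis that the source leaves unspecified.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Assumed stop-word list; a real system would use a fuller one.
STOP_WORDS: Set[str] = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def extract_keywords(text: str, top_n: int = 3) -> List[str]:
    """Return the most frequent non-stop-words of the page as its keywords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS and len(w) > 2]
    return [word for word, _ in Counter(words).most_common(top_n)]


def classify_without_library(text: str,
                             keyword_to_category: Dict[str, str],
                             used_categories: Set[str],
                             default: str = "uncategorized") -> str:
    """Return the first keyword category that the user has used before.

    Falls back to the default classification when none of the extracted
    keywords maps to a previously used category.
    """
    for keyword in extract_keywords(text):
        category = keyword_to_category.get(keyword)
        if category in used_categories:
            return category
    return default
```

Here `keyword_to_category` plays the role of the classification that corresponds to each keyword, which the source says may be set in advance or derived by text or semantic analysis.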
Example one
FIG. 3 shows a flow diagram of an example of network favorites according to the present invention.
After the user collects the favorite websites or links, the system automatically extracts keywords, classifies the contents to be collected and adds the contents to be collected into the favorites of the corresponding category.
The steps by which this example automatically extracts keywords and classifies content in a network collection are described in detail below:
step 310, a user logs in to a network favorites service;
step 320, the user directly adds the website or link to be collected in the system and clicks to collect it; alternatively, while browsing a web page, the user chooses what to collect through a browser plug-in customized for the system, and the system automatically collects the current page link or the pictures and content selected by the user (corresponding to the information to be collected);
step 330, the system analyzes, compares and automatically extracts the content of the corresponding website or link according to machine learning algorithms such as naive Bayes, support vector machines, LDA and neural networks, and extracts keywords describing the content through algorithms such as word segmentation and word frequency statistics;
step 340, the system performs text and semantic analysis on the extracted keywords to obtain the classification of the keywords;
step 350, according to the classification result, the system adds the link or collected content to the favorites of the corresponding category.
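The source names naive Bayes among the algorithms used in step 330 but does not say how it is applied. As one possible reading, the sketch below uses a multinomial naive Bayes text classifier with add-one smoothing to assign a page directly to a category; the class name, the tokenizer and the idea of training on previously classified pages are assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()  # training documents per category
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def train(self, samples: List[Tuple[str, str]]) -> None:
        """Train on (page text, category) pairs."""
        for text, category in samples:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocabulary.add(word)

    def predict(self, text: str) -> Optional[str]:
        """Return the most probable category, or None if untrained."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        # Words never seen in training carry no information and are ignored.
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on long pages.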
Example two
FIG. 4 illustrates a flow chart of yet another example of network collections according to the present invention.
In this example, when the user collects the website, link or page, blog, microblog content, the system automatically classifies it into the corresponding favorite.
The steps of the present example are described in detail below.
Step 410, the user clicks a website or link to be collected;
step 420, the system crawler program automatically captures the content of the corresponding website or link;
step 430, the system automatically performs text analysis, semantic analysis, word frequency statistics and the like on the captured content;
step 440, the system extracts one or more keywords from the captured content according to a predefined keyword library, so as to realize automatic keyword extraction of the system;
step 450, the system automatically classifies the corresponding content according to the category to which the keyword belongs;
step 460, the system adds the corresponding web address or link to the favorites of the corresponding category.
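Before the captured content of step 420 can be analyzed in step 430, the visible text has to be separated from the markup. A minimal sketch using Python's standard `html.parser` module follows; the class and function names are hypothetical, downloading the page is not shown, and skipping only `<script>` and `<style>` is a simplifying assumption.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as a single string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The resulting string is what the keyword extraction of step 440 would operate on.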
Example three
FIG. 5 illustrates a flow chart of yet another example of network collections according to the present invention.
In this example, what the user wants to collect is the content of a page, a blog, or a microblog. The specific steps of this example are as follows:
step 510, the user clicks a page, blog or microblog to be collected;
step 520, the system automatically performs text analysis, semantic analysis, word frequency statistics and other work on the corresponding content;
step 530, which is substantially the same as step 440 above and is not described again;
step 540, which is substantially the same as step 450 and is not described again;
step 550, the system adds the corresponding page, blog, microblog to the favorites of the corresponding category.
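The difference between this example and the previous one is where the text to be analyzed comes from: a collected website or link must first be captured by the crawler, whereas page, blog or microblog content is analyzed directly. A rough sketch of that branching, assuming a hypothetical item format (a dict with either a `url` or a `content` key) and a stand-in crawler function supplied by the caller:

```python
def resolve_content(item, fetch_page):
    """Return the text to analyze for a collected item.

    item: dict with a "content" key (content collected directly) or a
          "url" key (website or link that must be captured first).
    fetch_page: callable mapping a URL to page text; stands in for the
          system crawler and is an assumption of this sketch.
    """
    if "content" in item:
        return item["content"]
    return fetch_page(item["url"])

def fake_crawler(url):
    """Offline stand-in for a crawler, used only to make the sketch runnable."""
    pages = {"https://example.com/news": "football match report"}
    return pages.get(url, "")

print(resolve_content({"content": "my microblog post"}, fake_crawler))
print(resolve_content({"url": "https://example.com/news"}, fake_crawler))
```

Either way, the resulting text goes through the same analysis, keyword extraction and classification steps.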
In this way, the system automatically classifies the website, link, page, blog or microblog content that the user wants to collect into the favorites category to which that content belongs, so that the user can conveniently access it by category.
The system is intelligent: a single collection action by the user can reflect the user's collection habits and interests, and the website automatically classifies the user's collected content. Collection can be performed in several ways: through a website, through applications on various operating platforms (such as Android, iOS and Windows Phone), or through a customized browser plug-in.
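The classification decision, including the fallbacks recited in claims 1, 3 and 4, could be expressed along the following lines. This is a sketch, not the claimed implementation: the function name, its parameters, the `ask_user` callback and the choice of the first listed classification of the most frequent keyword are all assumptions made for illustration.

```python
def determine_classification(word_counts, keyword_library, used_before,
                             ask_user, default="Uncategorized"):
    """Decide the classification of an item to be collected.

    word_counts: {word: occurrences in the page content}
    keyword_library: {keyword: [classification, ...]} (hypothetical format)
    used_before: classifications the user has used in earlier collections
    ask_user: callable that receives candidate classifications and returns
              the classification specified by the user
    """
    matched = {w: n for w, n in word_counts.items() if w in keyword_library}
    if not matched:
        # No library keyword found: fall back to a preset default.
        return default
    candidates = {c for w in matched for c in keyword_library[w]}
    if not candidates & set(used_before):
        # None of the candidate classifications was used before: defer to the user.
        return ask_user(sorted(candidates))
    # Otherwise take the classification of the most frequent keyword.
    most_frequent = max(matched, key=matched.get)
    return keyword_library[most_frequent][0]

library = {"python": ["Technology"], "recipe": ["Food"]}
counts = {"python": 3, "recipe": 1, "the": 5}
print(determine_classification(counts, library, {"Technology"}, lambda c: c[0]))
```

In the no-match case, asking the user instead of returning the default would be an equally valid reading of the alternative in claim 4.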
The above embodiment has been described by taking a browser as an example of the network information processing software, and it should be noted that, alternatively, other network information processing software built in or installed on the client may be used.
As described in the above embodiments, the client and the server are generally two different devices connected to the network; however, as a special case, when both the web server and the browser are installed on the same computer, the client and the server may be the same device.
Those skilled in the art will appreciate that the modules or steps of the invention described above can be implemented in a general purpose computing device, centralized on a single computing device or distributed across a network of computing devices, and optionally implemented in program code that is executable by a computing device, such that the modules or steps are stored in a memory device and executed by a computing device, fabricated separately into integrated circuit modules, or fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (12)

1. A network information collection method is characterized by comprising the following steps:
an acquisition step, namely acquiring information to be collected, which is to be collected by a user, according to a collection instruction of the user;
a determining step, namely analyzing the information to be collected to determine the classification of the information to be collected;
a collection step of storing the information to be collected and the determined classification of the information to be collected,
wherein,
the information to be collected comprises information related to website information and/or webpage content to be collected by the user;
in the step of determining,
performing at least one of text analysis and semantic analysis on the webpage content information corresponding to the information to be collected to acquire keywords which are used for embodying the characteristics of the webpage content information corresponding to the information to be collected and are contained in the webpage content information, determining the classification of the information to be collected according to at least one keyword, wherein,
the webpage content information corresponding to the information to be collected comprises all webpage contents of websites corresponding to the website information in the information to be collected;
in the determining step, when all the classifications corresponding to one or more keywords are not the classifications used by the user in the previous collection process, the classification specified by the user is determined as the classification of the information to be collected.
2. The method of claim 1, wherein, in the determining step,
performing at least one of text analysis and semantic analysis on the web content information corresponding to the information to be collected based on a preset keyword library to judge whether at least one keyword which is contained in the web content information corresponding to the information to be collected and contained in the preset keyword library exists or not, and determining the classification of the information to be collected according to the judgment result, wherein,
the preset keyword library comprises a plurality of keywords, and each keyword corresponds to one or more classifications;
the webpage content information corresponding to the information to be collected comprises webpage content pointed by website information in the information to be collected, all webpage content of a website corresponding to the website information in the information to be collected, and/or webpage content included in the information to be collected.
3. The method according to claim 2, wherein in the determining step, when the judgment result is yes,
determining the classification corresponding to the at least one keyword as the classification of the information to be collected; or,
and determining the classification corresponding to one or more keywords which appear more frequently in the webpage content information corresponding to the information to be collected in the at least one keyword as the classification of the information to be collected.
4. The method according to claim 2, wherein in the determining step, when the judgment result is negative,
determining the classification of the information to be collected as a preset default classification; or
determining the classification designated by the user as the classification of the information to be collected.
5. The method of any of claims 2 to 4, further comprising:
and a setting step, namely adding, deleting and modifying the keywords of the preset keyword library according to the instruction of the user.
6. The method of claim 1, wherein, in the determining step,
and determining the classification corresponding to the one or more keywords as the classification of the information to be collected.
7. The method according to claim 3 or 6,
the classification corresponding to each keyword is preset, or determined by performing text analysis and/or semantic analysis.
8. The method according to any one of claims 1 to 4 or 6,
the web page content includes all or part of text, images and/or video in the web page.
9. The method according to any one of claims 2 to 4, wherein, when the information to be collected is website information to be collected by the user, the obtaining step further comprises,
and analyzing the webpage content information corresponding to the website information to be collected by the user according to a machine learning algorithm.
10. The method of claim 9, wherein the machine learning algorithm is naive bayes, a support vector machine, a latent dirichlet allocation model, and/or a neural network.
11. A network information collection system, comprising:
the acquisition unit is used for acquiring information to be collected, which is collected by the user, according to the collection instruction of the user;
the determining unit is used for analyzing the information to be collected so as to determine the classification of the information to be collected;
a collection unit for storing the information to be collected and the determined classification of the information to be collected,
wherein,
the information to be collected comprises information related to website information and/or webpage content to be collected by the user;
the determining unit further performs at least one of text analysis and semantic analysis on the web content information corresponding to the information to be collected based on a preset keyword library to determine whether at least one keyword included in the web content information corresponding to the information to be collected and included in the preset keyword library exists, and determines the classification of the information to be collected according to the determination result,
the preset keyword library comprises a plurality of keywords, and each keyword corresponds to one or more classifications;
the webpage content information corresponding to the information to be collected comprises all webpage contents of websites corresponding to the website information in the information to be collected;
the determining unit determines the classification specified by the user as the classification of the information to be collected when all the classifications corresponding to the one or more keywords are not the classifications used by the user in the previous collection process.
12. The system according to claim 11, wherein the determination unit, when the determination result is yes,
determining the classification corresponding to the at least one keyword as the classification of the information to be collected; or,
and determining the classification corresponding to one or more keywords which appear more frequently in the webpage content information corresponding to the information to be collected in the at least one keyword as the classification of the information to be collected.
CN201210180521.0A 2012-06-01 2012-06-01 Method and system for collecting network information Active CN102799610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210180521.0A CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210180521.0A CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Publications (2)

Publication Number Publication Date
CN102799610A CN102799610A (en) 2012-11-28
CN102799610B CN102799610B (en) 2017-04-12

Family

ID=47198720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210180521.0A Active CN102799610B (en) 2012-06-01 2012-06-01 Method and system for collecting network information

Country Status (1)

Country Link
CN (1) CN102799610B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1756160A (en) * 2004-09-27 2006-04-05 戴志军 Individualized website convenient for user accessing Internet
CN101382954A (en) * 2008-09-25 2009-03-11 北京搜狗科技发展有限公司 Method and system for providing web site collection name
CN102043805A (en) * 2009-10-19 2011-05-04 阿里巴巴集团控股有限公司 Method and device for generating Internet navigation page
CN102298614A (en) * 2011-07-29 2011-12-28 百度在线网络技术(北京)有限公司 Method for determining collection category of page collection information and device and equipment

Also Published As

Publication number Publication date
CN102799610A (en) 2012-11-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230206

Address after: Room 1102, 11 / F, building 2, dingchuang wealth center, 1166 liangmu Road, Cangqian street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Patentee after: ZHEJIANG SHUQIN TECHNOLOGY CO.,LTD.

Address before: 405C, Building 106, Lize Zhongyuan, Chaoyang District, Beijing, 100102

Patentee before: BEIJING QILEKE TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right