WO2018027928A1 - Method and system for capturing forum big data - Google Patents
Method and system for capturing forum big data
- Publication number
- WO2018027928A1 (application PCT/CN2016/094945)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- search
- keyword
- forum
- search results
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
Definitions
- The invention relates to the field of big data, and in particular to a method and a system for crawling massive forum data.
- Big data refers to a collection of data that cannot be captured, managed, and processed by conventional software tools within a given time frame. It is a massive, fast-growing, and diversified information asset that requires new processing models to deliver stronger decision-making power, insight, and process optimization. Big data takes many forms, web page data among them, and how to find the desired data within web page data is a problem worth studying. Existing technical solutions cannot search web page data effectively.
- The application provides a method for crawling massive forum data. It addresses the shortcoming of prior-art solutions, which cannot search web page data effectively.
- A method for crawling massive forum data, comprising the following steps:
- obtaining a keyword to be searched;
- opening a Baidu search and a Google search for the forum data, and searching in each according to the keyword;
- displaying the two search results on left and right pages.
- The method further includes:
- if the same search result appears in both sets of search results, displaying that result on only one of the pages.
- The method further includes:
- blocking promoted webpages.
- A crawling system for massive forum data, comprising:
- an obtaining unit for obtaining a keyword to be searched;
- a search unit for opening a Baidu search and a Google search for the forum data, and searching in each according to the keyword;
- a paging unit that displays the two search results on left and right pages.
- The system further includes:
- an allocating unit that displays a search result on only one of the pages if it appears in both sets of search results.
- The system further includes:
- a blocking unit that blocks promoted webpages.
- In the technical solution provided by the invention, a keyword to be searched is obtained, a Baidu search and a Google search are each run on that keyword, and the two sets of results are displayed on left and right pages. The application presents this as giving the advantage of effective search.
- FIG. 1 is a flowchart of a method for crawling massive forum data according to a first preferred embodiment of the present invention.
- FIG. 2 is a structural diagram of a crawling system for massive forum data according to a second preferred embodiment of the present invention.
- FIG. 1 shows a method for crawling massive forum data according to a first preferred embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
- Step S101: acquire a keyword to be searched.
- Step S102: open a Baidu search and a Google search for the forum data, and search in each according to the keyword.
- Step S103: display the two search results on left and right pages.
- In this embodiment, a keyword to be searched is obtained, a Baidu search and a Google search are each run on that keyword, and the two sets of results are displayed on left and right pages. The application presents this as giving the advantage of effective search.
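Steps S101 to S103 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names `build_search_urls` and `side_by_side` are hypothetical, the query URL patterns are the engines' public search endpoints, and fetching and parsing the result pages is left out because the source does not describe it.

```python
from itertools import zip_longest
from urllib.parse import urlencode


def build_search_urls(keyword: str) -> dict[str, str]:
    """Step S102 (sketch): build one query URL per engine for the same keyword."""
    return {
        "baidu": "https://www.baidu.com/s?" + urlencode({"wd": keyword}),
        "google": "https://www.google.com/search?" + urlencode({"q": keyword}),
    }


def side_by_side(left: list[str], right: list[str], width: int = 40) -> list[str]:
    """Step S103 (sketch): pair two result lists into left and right columns.

    Each entry is truncated to `width` characters; the shorter list is
    padded with empty cells so every row has both columns.
    """
    rows = []
    for left_item, right_item in zip_longest(left, right, fillvalue=""):
        rows.append(f"{left_item[:width]:<{width}} | {right_item[:width]}")
    return rows


if __name__ == "__main__":
    keyword = "forum data"  # Step S101: the keyword to be searched
    for engine, url in build_search_urls(keyword).items():
        print(engine, url)
    # Placeholder titles stand in for real, parsed search results.
    for row in side_by_side(["Baidu hit 1", "Baidu hit 2"], ["Google hit 1"]):
        print(row)
```

Keeping URL construction separate from layout means either engine can be swapped out without touching the side-by-side display.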
- The foregoing method may further include:
- if the same search result appears in both sets of search results, displaying that result on only one of the pages.
- The foregoing method may further include:
- blocking promoted webpages.
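The two optional refinements, showing a shared result on only one page and blocking promoted webpages (the job of the shielding unit in the system embodiment), can be sketched as below. The source does not say how two results are judged "the same" or how a promoted page is recognized, so this sketch assumes exact URL equality and a hypothetical `promoted` flag set by whatever parses the result page; the `Result` type and both function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    promoted: bool = False  # hypothetical flag; the source does not define it


def block_promoted(results: list[Result]) -> list[Result]:
    """Drop results marked as promoted (advertising) pages."""
    return [r for r in results if not r.promoted]


def allocate(left: list[Result], right: list[Result]) -> tuple[list[Result], list[Result]]:
    """Show a result found in both lists on only one page.

    Assumption: 'same result' means identical URL, and the shared result
    stays on the left page. The source only says 'on any one of the pages'.
    """
    left_urls = {r.url for r in left}
    return left, [r for r in right if r.url not in left_urls]


if __name__ == "__main__":
    baidu = [Result("Thread A", "http://forum.example/a"),
             Result("Ad", "http://ads.example/x", promoted=True)]
    google = [Result("Thread A", "http://forum.example/a"),
              Result("Thread B", "http://forum.example/b")]
    left, right = allocate(block_promoted(baidu), block_promoted(google))
    print([r.title for r in left], [r.title for r in right])
```

Filtering promoted pages before deduplication keeps an advertisement from "claiming" a URL that would otherwise be shown as an organic result.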
- FIG. 2 shows a crawling system for massive forum data according to a second preferred embodiment of the present invention.
- The system includes:
- an obtaining unit 201, configured to acquire a keyword to be searched;
- a searching unit 202, configured to open a Baidu search and a Google search for the forum data, and search in each according to the keyword;
- a paging unit 203, configured to display the two search results on left and right pages.
- In this embodiment, the system obtains a keyword to be searched, runs a Baidu search and a Google search on that keyword, and displays the two sets of results on left and right pages. The application presents this as giving the advantage of effective search.
- The above system may further include:
- an allocating unit 204, configured to display a search result on only one of the pages if it appears in both sets of search results.
- The above system may further include:
- a shielding unit 205, configured to block promoted webpages.
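One way to picture how units 201 to 205 fit together is a small class with one method per unit. This is only a sketch under stated assumptions: the class name `ForumCrawlSystem`, the injected `search_baidu` / `search_google` callables (keyword in, list of result URLs out), and the `is_promoted` predicate are all hypothetical, since the source names the units but not their interfaces.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a search callable maps a keyword to result URLs.
Fetch = Callable[[str], List[str]]


class ForumCrawlSystem:
    def __init__(self, search_baidu: Fetch, search_google: Fetch,
                 is_promoted: Callable[[str], bool]) -> None:
        self.search_baidu = search_baidu
        self.search_google = search_google
        self.is_promoted = is_promoted

    def obtain_keyword(self, raw: str) -> str:
        """Obtaining unit 201: accept and normalize the keyword."""
        keyword = raw.strip()
        if not keyword:
            raise ValueError("keyword must not be empty")
        return keyword

    def search(self, keyword: str) -> Tuple[List[str], List[str]]:
        """Searching unit 202: run both searches on the same keyword."""
        return self.search_baidu(keyword), self.search_google(keyword)

    def shield(self, urls: List[str]) -> List[str]:
        """Shielding unit 205: drop promoted webpages."""
        return [u for u in urls if not self.is_promoted(u)]

    def allocate(self, left: List[str], right: List[str]) -> Tuple[List[str], List[str]]:
        """Allocating unit 204: keep a shared result on one page only (the left)."""
        seen = set(left)
        return left, [u for u in right if u not in seen]

    def run(self, raw_keyword: str) -> Tuple[List[str], List[str]]:
        """Paging unit 203 (sketch): return the left-page and right-page lists."""
        keyword = self.obtain_keyword(raw_keyword)
        left, right = self.search(keyword)
        return self.allocate(self.shield(left), self.shield(right))
```

Injecting the search callables keeps the sketch testable with stubs and avoids committing to any particular way of retrieving results from either engine.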
- The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a method for capturing forum big data, which has the advantage of effective search. The method comprises the following steps: acquiring a keyword to be searched (101); opening a Baidu search and a Google search respectively, and searching according to the keyword (102); and displaying the two search results on left and right pages (103).
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/094945 WO2018027928A1 (fr) | 2016-08-12 | 2016-08-12 | Method and system for capturing forum big data |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/094945 WO2018027928A1 (fr) | 2016-08-12 | 2016-08-12 | Method and system for capturing forum big data |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018027928A1 (fr) | 2018-02-15 |
Family
ID=61161608
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2016/094945 (Ceased) WO2018027928A1 (fr) | 2016-08-12 | 2016-08-12 | Method and system for capturing forum big data |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018027928A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100211588A1 (en) * | 2009-02-13 | 2010-08-19 | Microsoft Corporation | Context-Aware Query Suggestion By Mining Log Data |
| CN105117476A (zh) * | 2015-09-08 | 2015-12-02 | 刘珉恺 | 一种基于网络平台的搜索方法 |
| CN105683966A (zh) * | 2016-01-30 | 2016-06-15 | 深圳市博信诺达经贸咨询有限公司 | 基于大数据的搜索方法及系统 |
| CN105849730A (zh) * | 2016-03-25 | 2016-08-10 | 马岩 | 数据抓取的方法及系统 |
Non-Patent Citations (1)
| Title |
|---|
| SHUANGMOCHE: "The Strongest Cheap Copy Search Engine Websites: You Have to Admit, So Genius", 7 December 2008 (2008-12-07), pages 1 - 7, Retrieved from the Internet <URL:http://bbs.tianya.cn/post-no04-751986-l.shtml> * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2017128362A1 (fr) | | Search method and system based on big data |
| WO2017117806A1 (fr) | | Term search method and system for web information |
| WO2018027928A1 (fr) | | Method and system for capturing forum big data |
| WO2018027927A1 (fr) | | Method and system for searching web page data |
| WO2018032246A1 (fr) | | Method and system for searching big data in a local area network |
| WO2018032245A1 (fr) | | Data search method and system for comment data of social networking software |
| WO2018032250A1 (fr) | | Text data search method and system for big data |
| WO2018032251A1 (fr) | | Method and system for applying a security level to big data retrieval |
| WO2018032252A1 (fr) | | Method and system for secure search of big data on discussion forums |
| WO2018032249A1 (fr) | | Method and system for retrieving audio data |
| WO2018032254A1 (fr) | | Method and system for trusted video retrieval in big data |
| WO2018032253A1 (fr) | | Secure search method and system for image big data |
| WO2017128357A1 (fr) | | Big data-based method and system for web page analysis |
| WO2018032248A1 (fr) | | Method and system for applying image search to searching big data |
| WO2017117783A1 (fr) | | System and method for searching network information |
| WO2018006254A1 (fr) | | Retrieval method and system based on local area network mail data |
| WO2017128440A1 (fr) | | Method and system for big data monitoring and reminding |
| WO2018006217A1 (fr) | | Retrieval method and system based on network mail data |
| WO2018006218A1 (fr) | | Retrieval method and system based on local mail data |
| WO2018006256A1 (fr) | | Method and system for collecting local mail data |
| WO2018006255A1 (fr) | | Method and system for collecting network mail data |
| WO2018027342A1 (fr) | | Method and system for applying synonyms in big data search |
| WO2018014316A1 (fr) | | Method and system for collecting email data of a local area network |
| WO2017128438A1 (fr) | | Method and system for applying big data |
| WO2018027470A1 (fr) | | Method and system for sharing big data in WeChat |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16912383; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 16912383; Country of ref document: EP; Kind code of ref document: A1 |