
WO2018027928A1 - Method and system for capturing forum big data - Google Patents

Method and system for capturing forum big data

Info

Publication number
WO2018027928A1
WO2018027928A1 PCT/CN2016/094945 CN2016094945W WO2018027928A1 WO 2018027928 A1 WO2018027928 A1 WO 2018027928A1 CN 2016094945 W CN2016094945 W CN 2016094945W WO 2018027928 A1 WO2018027928 A1 WO 2018027928A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
keyword
forum
search results
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/094945
Other languages
English (en)
Chinese (zh)
Inventor
马岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Boxinnuoda Economic Relations & Trade Consultants Co Ltd
Original Assignee
Shenzhen Boxinnuoda Economic Relations & Trade Consultants Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Boxinnuoda Economic Relations & Trade Consultants Co Ltd
Priority to PCT/CN2016/094945
Publication of WO2018027928A1
Anticipated expiration
Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The invention relates to the field of big data, and in particular to a method and a system for capturing massive forum data.
  • Big data refers to a collection of data that cannot be captured, managed, and processed by conventional software tools within a certain time frame. It is a massive, high-growth-rate, and diversified information asset that requires new processing models to deliver stronger decision-making power, insight, and process optimization capabilities. Existing big data is diverse and includes, for example, web page data. How to find the desired data in web page data is a problem worth studying, and existing technical solutions cannot search web page data effectively.
  • The application provides a method for capturing massive forum data. It addresses the shortcoming of prior-art solutions, which cannot search web page data effectively.
  • A method for capturing massive forum data, comprising the following steps:
  • opening a Baidu search and a Google search on the forum data, and searching with each according to the keyword;
  • the method further includes:
  • if a search result appears in both sets of search results, that result is displayed on only one of the two pages (either one).
  • the method further includes:
  • A capturing system for massive forum data, comprising:
  • an obtaining unit for obtaining a keyword to be searched;
  • a search unit for opening a Baidu search and a Google search on the forum data and searching with each according to the keyword;
  • a paging unit that displays the two sets of search results on left and right pages.
  • system further includes:
  • system further includes:
  • a blocking unit that blocks promoted web pages.
  • The technical solution provided by the invention obtains a keyword to be searched, runs a Baidu search and a Google search on that keyword, and displays the two sets of search results on left and right pages, which gives it the advantage of effective search.
  • FIG. 1 is a flowchart of a method for capturing massive forum data according to a first preferred embodiment of the present invention;
  • FIG. 2 is a structural diagram of a capturing system for massive forum data according to a second preferred embodiment of the present invention.
  • FIG. 1 shows a method for capturing massive forum data according to a first preferred embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • Step S101: acquire a keyword to be searched;
  • Step S102: open a Baidu search and a Google search on the forum data, and search with each according to the keyword;
  • Step S103: display the two sets of search results on left and right pages.
  • The technical solution provided by the invention obtains a keyword to be searched, runs a Baidu search and a Google search on that keyword, and displays the two sets of search results on left and right pages, which gives it the advantage of effective search.
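Steps S101 to S103 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names `build_search_urls` and `side_by_side` and the `forum_site` parameter are hypothetical, and restricting the query to a forum with the `site:` operator is only one possible reading of "in the forum data". The sketch builds the query URLs and lays out two result lists as left and right columns; it performs no network requests.

```python
from itertools import zip_longest
from urllib.parse import urlencode


def build_search_urls(keyword, forum_site=None):
    """Step S102: build one query URL per engine for the same keyword."""
    # Assumption: the forum restriction is expressed with the `site:` operator.
    query = f"site:{forum_site} {keyword}" if forum_site else keyword
    return {
        "baidu": "https://www.baidu.com/s?" + urlencode({"wd": query}),
        "google": "https://www.google.com/search?" + urlencode({"q": query}),
    }


def side_by_side(left, right, width=40):
    """Step S103: lay two result lists out as left and right columns."""
    rows = []
    for l_item, r_item in zip_longest(left, right, fillvalue=""):
        # Truncate each entry to the column width and pad the left column.
        rows.append(f"{l_item[:width]:<{width}} | {r_item[:width]}")
    return "\n".join(rows)
```

For example, `build_search_urls("big data", "bbs.example.com")` returns a dictionary with one URL per engine, and `side_by_side(baidu_titles, google_titles)` produces a two-column text view in which the shorter list is padded with empty cells.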
  • the foregoing method may further include:
  • if a search result appears in both sets of search results, that result is displayed on only one of the two pages (either one).
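The rule for shared results could look like the sketch below. It assumes that two results count as "the same" when their URLs match exactly and that the shared result stays on the left page; the source fixes neither choice, and the function name `assign_shared_results` is hypothetical.

```python
def assign_shared_results(left, right):
    """Show each shared URL on the left page only.

    `left` and `right` are lists of result URLs, one list per search engine.
    Assumption: exact URL equality defines "the same search result".
    """
    seen = set(left)
    # Keep the left page unchanged and drop shared URLs from the right page.
    return list(left), [url for url in right if url not in seen]
```

Keeping the shared result on the right page instead would be an equally valid reading, since the description only requires that it appear on one of the two pages.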
  • the foregoing method may further include:
  • FIG. 2 shows a system for capturing massive forum data according to a second preferred embodiment of the present invention.
  • The system includes:
  • an obtaining unit 201, configured to acquire a keyword to be searched;
  • a searching unit 202, configured to open a Baidu search and a Google search on the forum data and to search with each according to the keyword;
  • a paging unit 203, configured to display the two sets of search results on left and right pages.
  • The technical solution provided by the invention obtains a keyword to be searched, runs a Baidu search and a Google search on that keyword, and displays the two sets of search results on left and right pages, which gives it the advantage of effective search.
  • the above system may further include:
  • an allocating unit 204, configured to display a search result on only one of the pages (either one) if it appears in both sets of search results.
  • the above system may further include:
  • a shielding unit 205, configured to block promoted web pages.
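The blocking of promoted web pages might be sketched as a simple filter. The source does not say how a promoted page is recognized, so the `promoted` flag, the `Result` class, and the function `block_promoted` below are all hypothetical; in practice the flag would have to be set by whatever parses the search engine's result page.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    url: str
    promoted: bool = False  # hypothetical flag marking a paid or promoted entry


def block_promoted(results):
    """Return only the results that are not marked as promoted."""
    return [r for r in results if not r.promoted]
```

Applying the filter to each engine's results before they are paged keeps promoted entries out of both the left and the right page.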
  • The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a method for capturing forum big data, which has the advantage of effective search. The method comprises the following steps: acquiring a keyword to be searched (101); opening a Baidu search and a Google search respectively, and searching according to the keyword (102); and displaying the two search results on left and right pages (103).
PCT/CN2016/094945 2016-08-12 2016-08-12 Method and system for capturing forum big data Ceased WO2018027928A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/094945 WO2018027928A1 (fr) 2016-08-12 2016-08-12 Method and system for capturing forum big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/094945 WO2018027928A1 (fr) 2016-08-12 2016-08-12 Method and system for capturing forum big data

Publications (1)

Publication Number Publication Date
WO2018027928A1 (fr) 2018-02-15

Family

ID=61161608

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/094945 Ceased WO2018027928A1 (fr) Method and system for capturing forum big data 2016-08-12 2016-08-12

Country Status (1)

Country Link
WO (1) WO2018027928A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211588A1 (en) * 2009-02-13 2010-08-19 Microsoft Corporation Context-Aware Query Suggestion By Mining Log Data
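CN105117476A (zh) * 2015-09-08 2015-12-02 刘珉恺 Search method based on a network platform
CN105683966A (zh) * 2016-01-30 2016-06-15 Shenzhen Boxinnuoda Economic Relations & Trade Consultants Co Ltd Search method and system based on big data
CN105849730A (zh) * 2016-03-25 2016-08-10 马岩 Method and system for data capture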

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUANGMOCHE: "The Strongest Cheap Copy Search Engine Websites: You Have to Admit, So Genius", 7 December 2008 (2008-12-07), pages 1 - 7, Retrieved from the Internet <URL:http://bbs.tianya.cn/post-no04-751986-l.shtml> *

Similar Documents

Publication Publication Date Title
WO2017128362A1 (fr) Search method and system based on big data
WO2017117806A1 (fr) Term search method and system for web information
WO2018027928A1 (fr) Method and system for capturing forum big data
WO2018027927A1 (fr) Method and system for searching web page data
WO2018032246A1 (fr) Method and system for searching big data in a local area network
WO2018032245A1 (fr) Data search method and system for comment data of social networking software
WO2018032250A1 (fr) Text data search method and system for big data
WO2018032251A1 (fr) Method and system for applying a security level to big data retrieval
WO2018032252A1 (fr) Method and system for secure search of big data on discussion forums
WO2018032249A1 (fr) Method and system for retrieving audio data
WO2018032254A1 (fr) Method and system for trusted video retrieval in big data
WO2018032253A1 (fr) Secure search method and system for image big data
WO2017128357A1 (fr) Big-data-based method and system for web page analysis
WO2018032248A1 (fr) Method and system for applying image search to searching big data
WO2017117783A1 (fr) System and method for searching network information
WO2018006254A1 (fr) Retrieval method and system based on local area network mail data
WO2017128440A1 (fr) Method and system for monitoring and reminding of big data
WO2018006217A1 (fr) Retrieval method and system based on network mail data
WO2018006218A1 (fr) Retrieval method and system based on local mail data
WO2018006256A1 (fr) Method and system for collecting local mail data
WO2018006255A1 (fr) Method and system for collecting network mail data
WO2018027342A1 (fr) Method and system for applying synonyms in big data search
WO2018014316A1 (fr) Method and system for collecting e-mail data of a local area network
WO2017128438A1 (fr) Method and system for applying big data
WO2018027470A1 (fr) Method and system for sharing big data in WeChat

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
  • Ref document number: 16912383
  • Country of ref document: EP
  • Kind code of ref document: A1

NENP Non-entry into the national phase
  • Ref country code: DE

122 EP: PCT application non-entry in European phase
  • Ref document number: 16912383
  • Country of ref document: EP
  • Kind code of ref document: A1