
CN110134845A - Project public opinion monitoring method, device, computer equipment and storage medium - Google Patents

Project public opinion monitoring method, device, computer equipment and storage medium

Info

Publication number
CN110134845A
Authority
CN
China
Prior art keywords
public opinion
project
data source
target
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910270796.5A
Other languages
Chinese (zh)
Inventor
吴壮伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910270796.5A
Publication of CN110134845A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a project public opinion monitoring method and apparatus, a computer device, and a computer-readable storage medium. The embodiments belong to the technical field of data display. To monitor public opinion on a project, identification information of a target project is first obtained. A list of data source websites for the target project is then obtained by web search according to the identification information, and corpus about the target project is crawled, according to the identification information, from the data source websites in that list. The corpus is parsed by natural language processing to identify the subject names and public opinion features it contains. The subject names and public opinion features are imported into a graph database to construct a public opinion relationship graph of the target project, and the graph is displayed visually. Public opinion on a project can thus be monitored at a fine-grained level within the target organization, which improves the efficiency of fine-grained public opinion monitoring within that organization.

Description

Project public opinion monitoring method, device, computer equipment and storage medium

Technical field

The present application relates to the technical field of data display, and in particular to a project public opinion monitoring method and apparatus, a computer device, and a computer-readable storage medium.

Background

Corporate public opinion information is involved in virtually every current enterprise-relations project. Conventional techniques collect data in a targeted way from designated websites, such as financial websites. Information collected this way, however, reflects public opinion about the enterprise as a whole. If public opinion on one particular aspect of the enterprise is needed, for example on one of its products, an investment, or an advertising campaign, it has to be filtered out of a large volume of enterprise-wide public opinion data, and the filtered results are not accurate enough. Both factors make obtaining public opinion on that aspect inefficient.

Summary of the invention

The embodiments of the present application provide a project public opinion monitoring method and apparatus, a computer device, and a computer-readable storage medium, which can solve the problem in conventional techniques that public opinion monitoring of a target project is inefficient.

In a first aspect, an embodiment of the present application provides a project public opinion monitoring method, comprising: obtaining identification information of a target project in a preset manner; obtaining a list of data source websites of the target project by web search according to the identification information; crawling corpus of the target project, according to the identification information, from the data source websites included in the data source website list; parsing the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus; importing the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project; and displaying the public opinion relationship graph.

In a second aspect, an embodiment of the present application further provides a project public opinion monitoring apparatus, comprising: a first obtaining unit, configured to obtain identification information of a target project in a preset manner; a second obtaining unit, configured to obtain a list of data source websites of the target project by web search according to the identification information; a crawling unit, configured to crawl corpus of the target project, according to the identification information, from the data source websites included in the data source website list; an identification unit, configured to parse the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus; a construction unit, configured to import the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project; and a display unit, configured to display the public opinion relationship graph.

In a third aspect, an embodiment of the present application further provides a computer device comprising a memory and a processor, wherein the memory stores a computer program and the processor implements the project public opinion monitoring method when executing the computer program.

In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the project public opinion monitoring method.

The embodiments of the present application provide a project public opinion monitoring method and apparatus, a computer device, and a computer-readable storage medium. When project public opinion monitoring is performed, identification information of the target project is obtained; a list of data source websites of the target project is obtained by web search according to the identification information; and corpus of the target project is crawled, according to the identification information, from the data source websites in the list, so that a relatively comprehensive corpus about the target project is obtained. The corpus is then parsed by natural language processing to identify the subject names and public opinion features it contains; the subject names and public opinion features are imported into a graph database to construct a public opinion relationship graph of the target project; and the graph is displayed visually. Public opinion on the project is thereby monitored at a fine-grained level within the target organization, which improves the efficiency of fine-grained public opinion monitoring within that organization.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application scenario of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 2 is a schematic flowchart of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 3 is another schematic flowchart of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 4 is a schematic diagram of a sub-flow of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 5 is a schematic diagram of another sub-flow of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 6 is a schematic diagram of a third sub-flow of the project public opinion monitoring method provided by an embodiment of the present application;

Fig. 7 is a schematic block diagram of the project public opinion monitoring apparatus provided by an embodiment of the present application;

Fig. 8 is another schematic block diagram of the project public opinion monitoring apparatus provided by an embodiment of the present application; and

Fig. 9 is a schematic block diagram of the computer device provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.

It should be understood that, when used in this specification and the appended claims, the terms "comprise" and "include" indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should further be understood that the term "and/or" used in this specification and the appended claims refers to any combination, and all possible combinations, of one or more of the associated listed items, and includes these combinations.

Refer to Fig. 1, which is a schematic diagram of an application scenario of the project public opinion monitoring method provided by an embodiment of the present application. The method can be applied in the terminal shown in Fig. 1, with its steps implemented by software installed on the terminal. The terminal may be an electronic device such as a notebook computer, a tablet computer, or a desktop computer. The method is carried out as follows: the terminal obtains identification information of a target project in a preset manner; obtains a list of data source websites of the target project by web search according to the identification information; crawls corpus of the target project, according to the identification information, from the data source websites included in the list; parses the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus; imports the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project; and displays the public opinion relationship graph.

Note that Fig. 1 shows only a desktop computer as the terminal. In practice the type of terminal is not limited to the one shown; the terminal may also be an electronic device such as a mobile phone, a notebook computer, or a tablet computer. The application scenario above serves only to illustrate the technical solution of the present application and does not limit it.

Fig. 2 is a schematic flowchart of the project public opinion monitoring method provided by an embodiment of the present application. The method is applied in the terminal of Fig. 1 to perform all or part of the functions of the project public opinion monitoring method. As shown in Fig. 2, the method includes the following steps S210 to S260:

S210. Obtain identification information of the target project in a preset manner.

Here, the preset manner is either manual entry through an input device, or applying natural language processing to corpus of the target project to obtain words and then filtering the words obtained.

A target project is a project for which an enterprise or other organization has decided to monitor public opinion. For an enterprise, for example, a target project is one particular aspect of the enterprise, such as one of its products, a marketing campaign, an investment, or an event; it is a fine-grained facet of the enterprise.

The identification information of a target project is information that identifies the main content of the target project and describes its key content. For a product such as a mobile phone, for example, it includes the product's name and its various attributes.

Specifically, the identification information of the target project may be received through an input device, so that public opinion on the project can be monitored; for example, the brand name, model, processor, and battery-life description of a mobile phone are obtained. Alternatively, the identification information can be obtained by natural language processing: an initial corpus is crawled from the corpus sources of the target project and subjected to natural language processing such as word segmentation and filtering, which yields the identification information of the target project that is currently attracting attention. A more comprehensive corpus is then crawled for the target project according to that identification information, and the larger corpus supports more accurate public opinion monitoring. For example, the terminal crawls an enterprise's data for a predetermined period, uses natural language processing to filter the enterprise's current hot target project out of the corpus, and obtains the identification information of that target project; in other words, it identifies the project of the enterprise that is currently attracting attention. According to the identification information, the terminal then crawls data about the target project and analyzes the data further to obtain public opinion on the target project for the enterprise's reference.
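The natural-language route to identification information described above can be illustrated with a minimal sketch. It assumes whitespace-delimited text, a tiny hypothetical stop-word list, and plain term frequency as the filtering criterion; the application does not specify the segmentation tool or the filtering rule, and Chinese text would need a dedicated word segmenter. The function name `extract_identification_info` is invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on", "with", "has"}

def extract_identification_info(initial_corpus, top_k=3):
    """Return the top_k most frequent non-stop-word tokens in the corpus."""
    counts = Counter()
    for text in initial_corpus:
        # Naive tokenisation: lowercase alphanumeric runs (an assumption,
        # standing in for proper word segmentation).
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(top_k)]

corpus = [
    "The PhoneA camera is great",
    "PhoneA battery drains fast",
    "Reviewers praise the PhoneA camera",
]
print(extract_identification_info(corpus, top_k=2))  # ['phonea', 'camera']
```

The returned terms would then serve as the keywords for the search and crawling steps that follow.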

S220. Obtain a list of data source websites of the target project by web search according to the identification information.

Specifically, the terminal obtains the identification information of the target project and, by searching according to it, obtains the corpus sources related to the target project. A corpus source is a data source website of the target project; when there are several, they form the data source website list. For example, to understand public opinion on one of its products, that is, how the product is evaluated externally, an enterprise needs to monitor public opinion on that product. It can obtain the product's keywords, such as its brand name and model, and use them to obtain the data sources that concern the product. The data sources may also be manually entered URLs. From the entered URLs and the keywords, the corpus that shapes public opinion on the product is obtained. The corpus includes news, product introductions, product reviews, and comments, and may appear as articles or sentences on websites of various types. For example, to monitor public opinion on a mobile phone product so as to improve its later marketing and design, an enterprise obtains keywords such as the phone's name and model, uses them to obtain data sources that cover the phone, and obtains from those sources the corpus of public opinion on that phone.
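As a rough sketch of how a data source website list might be assembled from search hits and manually entered URLs, the following example deduplicates results by site origin. The search backend is deliberately left abstract: `search` is a hypothetical callable that returns result URLs for a keyword, and `fake_search` merely returns canned values for demonstration. Deduplicating by origin is an assumption, not something the application prescribes.

```python
from urllib.parse import urlsplit

def build_source_list(keywords, search, manual_urls=()):
    """Collect unique site origins from search hits and manually entered URLs."""
    seen, sources = set(), []
    hits = [url for kw in keywords for url in search(kw)]
    for url in list(manual_urls) + hits:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip anything that is not a web URL
        origin = f"{parts.scheme}://{parts.netloc.lower()}"
        if origin not in seen:
            seen.add(origin)
            sources.append(origin)
    return sources

def fake_search(keyword):
    # Stand-in for a real search backend; returns canned results only.
    return {
        "PhoneA": ["https://news.example.com/a1", "https://shop.example.org/p/9"],
        "PhoneA X1": ["https://news.example.com/a2", "mailto:x@example.com"],
    }.get(keyword, [])

sources = build_source_list(
    ["PhoneA", "PhoneA X1"], fake_search, ["https://forum.example.net/t/1"]
)
print(sources)
```

Manually entered URLs are placed first so that sources chosen by the operator keep priority over search results.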

S230. Crawl corpus of the target project, according to the identification information, from the data source websites included in the data source website list.

Here, crawling is performed by a web crawler. A web crawler, also called a web spider, web robot, or web chaser, is a program or script that automatically fetches information from the World Wide Web according to certain rules.

Specifically, to monitor public opinion on the target project, a crawler system is built to crawl corpus about the target project from the Internet; the corpus is parsed to construct the public opinion relationship graph of the target project; and public opinion on the target project is obtained from that graph. Because a web crawler is a program that extracts web pages automatically, it can be restricted to crawling only data related to the target project, namely the corpus on the data source websites that contains the target project. After the identification information has been obtained in the preset manner and the data source website list has been obtained from it, the crawler system crawls the listed websites to obtain a rich corpus about the target project.

Further, the data contained in the corpus sources can be filtered to obtain public opinion on one particular aspect of the target project. For a mobile phone product, for example, data on camera reviews, battery life, or the strengths and weaknesses of the processor or system can be filtered out to form the corresponding public opinion.
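The crawl-then-filter idea above can be sketched with the standard library alone. This is an illustrative outline rather than the crawler system of the application: `fetch` is a hypothetical callable that returns the HTML for a URL (a dictionary lookup is used here so the example runs offline), and keyword matching is a simple case-insensitive substring test.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def crawl_corpus(urls, fetch, keywords):
    """Fetch each URL and keep only pages whose text mentions a keyword."""
    corpus = {}
    for url in urls:
        text = page_text(fetch(url))
        if any(kw.lower() in text.lower() for kw in keywords):
            corpus[url] = text
    return corpus

def filter_by_aspect(corpus, aspect_terms):
    """Keep the texts that mention at least one aspect term, e.g. 'camera'."""
    return {u: t for u, t in corpus.items()
            if any(a.lower() in t.lower() for a in aspect_terms)}

# Canned pages stand in for real HTTP responses.
pages = {
    "https://a.example/1": "<html><body><p>PhoneA camera is sharp.</p>"
                           "<script>var x = 1;</script></body></html>",
    "https://a.example/2": "<p>Unrelated gardening tips.</p>",
    "https://a.example/3": "<p>PhoneA battery lasts two days.</p>",
}
project_corpus = crawl_corpus(pages, pages.get, ["PhoneA"])
camera_corpus = filter_by_aspect(project_corpus, ["camera"])
print(sorted(project_corpus), sorted(camera_corpus))
```

In a deployment, `fetch` would wrap an HTTP client and would need to respect rate limits and each site's crawling policy.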

S240. Parse the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus.

Here, the target project is the overall subject matter for which an enterprise or other organization has decided to monitor public opinion. For an enterprise, for example, the target project may be one of its products, a marketing campaign, an investment, or an event; it is one fine-grained facet of the enterprise taken as a whole. A subject name is an object contained in the corpus: the name of the target project itself or the name of one of its components. For example, if the target project is mobile phone A, the subject names include the name of phone A and the names of the parts or components that make up phone A. Suppose the corpus contains the name of phone A together with the names of display B and camera C, which are components of phone A. During recognition the computer device cannot distinguish the relationships between the subject names, for example that display B and camera C belong to phone A, but it can recognize that the name of phone A, the name of display B, and the name of camera C in the corpus are subject names.

A public opinion feature is a keyword of the public opinion on the target project. It is a characterization used to evaluate the target project, describing the attributes of the subjects named in the target project and the relationships between subjects. For example, if the target project is a mobile phone, the subject names it contains are the name of the phone and the names of its parts or components, such as its display and camera. The public opinion features are the evaluations and descriptions of those subjects, for example that the phone has a high configuration, the display is large, or the camera takes good photos. Note that a subject name is an identifier that distinguishes a subject, and it can be expressed in different forms. For a mobile phone, for example, the specific model or code name may serve as the phone's name in addition to the brand.

Specifically, parsing the corpus by natural language processing means: splitting the corpus at sentence delimiters to obtain a sentence data set; building a named entity model from the corpus; using the named entity model to identify the subject names contained in the sentence data set; and performing part-of-speech analysis and target-relationship retrieval on the corpus to obtain the public opinion features of the target project. For example, natural language processing is used to parse the acquired corpus and identify phone names and related feature descriptions, which provide an important data source for public opinion on the target project. Named entity recognition (NER), also called proper name recognition, identifies entities with specific meaning in text, mainly person names, place names, organization names, and proper nouns. In general, the NER task is to identify named entities of three major categories (entity, time, and number) and seven subcategories (person name, organization name, place name, time, date, currency, and percentage) in the text to be processed. Chinese named entity models include the CRF model and the character-based BiLSTM-CRF model. From the detailed and comprehensive data sources obtained, natural language processing extracts public opinion information about the target project, which is subsequently imported into the graph database to complete the data for nodes and node attributes. For example, the named entity model identifies the sentences in the corpus that concern the target project; the corpus is segmented into words; and part-of-speech analysis and feature-word analysis are performed on the words, covering nouns, verbs, adjectives, and the relationships between them, so as to extract the target project's public opinion information from the corpus.
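The sentence splitting and subject/feature identification described above can be sketched as follows. To stay self-contained, the sketch substitutes simple dictionary lookup for the named entity model (CRF or BiLSTM-CRF) and for part-of-speech analysis; this is a deliberate simplification, and the lists `subject_names` and `feature_words` are hypothetical inputs that a trained model would otherwise produce.

```python
import re

def split_sentences(text):
    """Split at Chinese or Western sentence delimiters and drop empty pieces."""
    return [s.strip() for s in re.split(r"[。！？.!?]+", text) if s.strip()]

def recognise(text, subject_names, feature_words):
    """Return (subject, feature) pairs that co-occur within one sentence."""
    pairs = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Dictionary lookup stands in for a trained named entity model.
        subjects = [s for s in subject_names if s.lower() in lowered]
        features = [f for f in feature_words if f.lower() in lowered]
        pairs.extend((s, f) for s in subjects for f in features)
    return pairs

text = ("PhoneA has a large display. The camera takes sharp photos! "
        "Shipping was slow.")
pairs = recognise(
    text,
    subject_names=["PhoneA", "display", "camera"],
    feature_words=["large", "sharp", "slow"],
)
print(pairs)
# [('PhoneA', 'large'), ('display', 'large'), ('camera', 'sharp')]
```

The third sentence yields no pair because it names no subject, which mirrors how off-topic sentences drop out. As the description notes, co-occurrence alone does not reveal that the display and camera belong to the phone.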

Further, artificial intelligence model training and automatic learning techniques can be used to improve the accuracy of subject name recognition and public opinion feature recognition for the target project. Specifically, the large amount of acquired corpus is segmented into words through natural language processing, and the resulting words are screened; at this stage, model training and automatic learning improve the accuracy of subject name recognition and public opinion feature recognition. For example, an artificial intelligence model and automatic learning are used to select the nouns and verbs among the segmented words, and then to select, from those nouns and verbs, the top preset number of words when ranked from high to low. The nouns are taken as the subjects of the target project, the verbs are taken as descriptions of the relationships between subjects, and an entity model is established.

S250. Import the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project.

A graph database is a type of NoSQL database that applies graph theory to store relationship information between entities. Common graph databases include Neo4j, FlockDB, and AllegroGraph. A graph database has two main components: node sets and the relationships that connect nodes. A node set is a collection of nodes in the graph. Each node carries a label indicating the entity type it belongs to, that is, the node set it belongs to, and records a series of properties describing the characteristics of the node. In addition, nodes can be connected to one another through relationships.

Specifically, the subject names and public opinion features of the target project, identified by parsing the corpus through natural language processing, are imported into the graph database to complete its node data and node-relationship data. The nodes correspond to the subject names and the public opinion features, and the relationships between nodes are described as well. When the graph database is designed, a node set is composed of multiple nodes and nodes are associated through relationships, so the node sets, the nodes, and the relationships between nodes must be clearly distinguished. When data is imported, the graph database automatically identifies the node data and the relationship data in the imported data and assigns each to its corresponding position in the graph database. In this example, after the subject names and the public opinion features are imported into the graph database, the public opinion relationship graph of the target project can be constructed automatically. For example, if the target project is a mobile phone, the subject names contained in the target project are the name of the mobile phone and the names of its parts or components, such as the name of the display screen and the name of the camera. A public opinion feature is an evaluation or description of the subject that a subject name refers to. For example, "the mobile phone's display screen is large" describes a feature of the mobile phone: three nodes correspond to "mobile phone", "display screen", and "large", and they are connected in turn by the "contains" relationship between "mobile phone" and "display screen" and the descriptive relationship between "display screen" and "large", forming a public opinion relationship graph. In the embodiments of the present application, the dynamic public opinion data of the target project is stored in the form of this public opinion relationship graph, which makes the public opinion of the target project easier to visualize and extract.
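The "mobile phone / display screen / large" example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OpinionGraph` class is a hypothetical in-memory stand-in for a graph database, and the label names (`Project`, `SubProject`, `Feature`) and relationship names (`CONTAINS`, `DESCRIBED_AS`) are chosen here for readability and do not come from the application.

```python
from collections import defaultdict


class OpinionGraph:
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self):
        self.labels = {}                # node name -> label (its node set)
        self.edges = defaultdict(list)  # source -> [(relationship, target)]

    def add_node(self, name, label):
        self.labels[name] = label

    def add_relationship(self, source, relationship, target):
        # Both endpoints must already exist as nodes.
        if source not in self.labels or target not in self.labels:
            raise KeyError("unknown node in relationship")
        self.edges[source].append((relationship, target))

    def triples(self):
        """Return every (source, relationship, target) in insertion order."""
        return [(s, r, t) for s, rels in self.edges.items() for r, t in rels]


graph = OpinionGraph()
graph.add_node("mobile phone", "Project")
graph.add_node("display screen", "SubProject")
graph.add_node("large", "Feature")
graph.add_relationship("mobile phone", "CONTAINS", "display screen")
graph.add_relationship("display screen", "DESCRIBED_AS", "large")
```

In a real deployment the same three nodes and two relationships would be written to the graph database itself; the in-memory class only shows the shape of the data.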

Further, the public opinion relationship graph of the target project includes the name of the target project, the names of the sub-projects of the target project, and the public opinion features corresponding to the sub-project names.

A sub-project of the target project is a component of the target project. For example, if the target project is a mobile phone, then components such as its display screen, camera, battery, and central processing unit are all sub-projects of the mobile phone, and the model of each sub-project serves as the name of that sub-project.

Specifically, the elements of the public opinion relationship graph of the target project include the subject names and the public opinion features in that graph. For example, the graph contains subject names such as nouns and feature descriptions such as adjectives. Public opinion features are the keywords of public opinion, for example "clear photos" or "long battery life". Storing the dynamic public opinion data of the target project in the form of a public opinion relationship graph makes the data easier to visualize and extract, and constructing the graph data of the target project also builds up a news corpus related to the target project. Before the graph data is visualized, the public opinion data of the target project must be sorted by time, and the most recent, highest-ranked news items about the target project are listed. For example, to look up components such as the central processing unit used by mobile phone product A, the objects related to node A are traversed, which yields the public opinion related to product A. In addition, specific areas related to the target project can be analyzed in depth. For example, if the target project is a product, the supplier relationships of that product can also be analyzed: public opinion data related to the target product is obtained from the supplier attributes, classified and deduplicated, and presented to the user for public opinion monitoring. For example, reviews of the central processing unit in a mobile phone can affect public opinion about the phone: the overheating problem of the Snapdragon 820 smartphone processor had a considerable impact on phones that used it, while praise for the performance of the Snapdragon 835 processor brought praise for the performance of phones that used it, producing public opinion features for those phones such as "fast" and "power-saving".
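The traversal, deduplication, and time ordering described above can be sketched as follows. The graph edges, the opinion records, and the function names are all hypothetical sample data and helpers invented for illustration; the application does not specify a storage layout or a ranking rule beyond ordering by time.

```python
from collections import deque
from datetime import datetime

# Hypothetical graph: node -> list of (relationship, neighbour).
EDGES = {
    "phone A": [("CONTAINS", "Snapdragon 820"), ("CONTAINS", "display screen")],
    "Snapdragon 820": [("DESCRIBED_AS", "overheating")],
    "display screen": [("DESCRIBED_AS", "large")],
}

# Hypothetical opinion records: (node, text, timestamp).
RECORDS = [
    ("Snapdragon 820", "runs hot under load", datetime(2019, 3, 1)),
    ("Snapdragon 820", "runs hot under load", datetime(2019, 3, 5)),
    ("display screen", "large and bright", datetime(2019, 3, 3)),
    ("phone B", "unrelated item", datetime(2019, 3, 9)),
]


def related_nodes(edges, start):
    """Breadth-first traversal of every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _relationship, neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def recent_opinions(edges, records, start, limit=10):
    """Keep records attached to related nodes, deduplicate, newest first."""
    nodes = set(related_nodes(edges, start))
    newest = {}
    for node, text, when in records:
        if node not in nodes:
            continue
        key = (node, text)
        # Duplicates collapse onto the most recent timestamp.
        if key not in newest or when > newest[key]:
            newest[key] = when
    ranked = sorted(newest.items(), key=lambda item: item[1], reverse=True)
    return [(node, text, when) for (node, text), when in ranked[:limit]]
```

Calling `recent_opinions(EDGES, RECORDS, "phone A")` returns the processor complaint once (with its latest date) followed by the screen comment, and ignores the record attached to the unrelated product.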

S260. Display the public opinion relationship graph.

Specifically, the terminal displays the constructed public opinion relationship graph of the target project and provides it to the user, so that the user can monitor the public opinion of the target project on the basis of the graph. The personnel monitoring the target project can draw conclusions about its public opinion from the graph and handle that public opinion accordingly. For example, they can obtain the positive and negative information in the public opinion of the target project, as well as the event evaluation information and channel evaluation information, and take the corresponding public relations measures.

Further, the positive and negative conclusions drawn from the public opinion of the target project can be sorted according to different mechanisms, so that positive public opinion is fully used to maximize benefit and corresponding measures are taken against negative public opinion, such as a screen problem or battery problem reported for a product, to eliminate its negative impact.

When the embodiments of the present application perform project public opinion monitoring, the identification information of the target project is obtained first; a list of data source websites for the target project is then obtained through web search according to the identification information, and the corpus of the target project is crawled from the data source websites in that list according to the identification information, so that a more comprehensive corpus about the target project is obtained. The corpus is then parsed through natural language processing to identify the subject names and public opinion features it contains, the subject names and public opinion features are imported into a graph database to construct the public opinion relationship graph of the target project, and the graph is displayed visually. Public opinion monitoring is thereby carried out for the target organization at the level of individual projects: the specific public opinion of the target project is added, monitoring can be targeted at one specific project within the target organization, and the strengths and weaknesses of that project can be judged. This improves the efficiency of fine-grained public opinion monitoring within the target organization.

Please refer to FIG. 3, which is another schematic flowchart of the project public opinion monitoring method provided by an embodiment of the present application. In this embodiment, after the step of obtaining the list of data source websites of the target project through web search according to the identification information, the method further includes:

S221. Update the data source website list by crawling.

Specifically, a crawler strategy that automatically adds data sources is built, and deep crawling is used to obtain a more comprehensive set of data sources for the target project from the Internet. A crawler strategy that automatically adds data sources means that, after receiving the initial data source websites, the crawler can automatically expand from them to further data source websites, increasing the sources of corpus and thereby obtaining a more comprehensive corpus of the target project. In this embodiment, this means that the crawler, based on the type and URL structure of the data source websites already obtained, crawls to discover new data source websites related to them, for example websites whose URLs share the same suffix, or websites of the same type (such as financial websites). The crawler can thus expand from one financial website to other financial websites; because they are all financial websites, they may contain corpus that interprets the same target project from different angles. Related websites, especially when covering a hot issue concerning the target project, interpret and report on the target project from different angles, so the set of data source websites is continuously improved and enriched, which increases the data sources and secures the volume of data. The relevant corpus of the target project is obtained from the data source websites, and rich data sources yield a comprehensive and rich corpus.

Further, the crawler strategy that automatically adds data sources can be implemented as a real-time distributed crawler system, which improves crawling efficiency. Specifically, the server obtains the initial data source website list of the target project, classifies the list according to preset conditions to obtain data source website lists of different types, packages the lists of different types into corresponding Docker containers, deploys the different Docker containers on different servers, starts the Docker containers so that they obtain new data source websites by crawling, and adds the new data source websites to the corresponding data source website lists to update the data source websites of the target project. For example, a real-time distributed crawler system can distinguish website types from the input list (for example from the identifiers in the website URLs), distribute the list to the servers according to website type, and perform distributed data crawling and storage, which improves crawling efficiency.

Please refer to FIG. 4, which is a schematic diagram of a sub-flow of the project public opinion monitoring method provided by an embodiment of the present application. As shown in FIG. 4, in this embodiment, the step of updating the data source website list by crawling includes:

S2210. Obtain an initial data source website list of the target project;

S2211. Classify the initial data source website list according to preset conditions to obtain data source website lists of different types;

S2212. Package the data source website lists of different types into corresponding Docker containers;

S2213. Start the Docker containers so that the Docker containers obtain new data source websites by crawling;

S2214. Add each new data source website, according to its type, to the corresponding classified data source website list to update the data source websites of the target project.
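Steps S2211 and S2214 can be sketched in Python as below. This is a minimal illustration, assuming the preset condition is a lookup from host suffix to site type; the `SITE_TYPES` table, the `example.com` hosts, and the function names are hypothetical. Packaging and starting the Docker containers (S2212 and S2213) is deployment work and is not shown.

```python
from urllib.parse import urlparse

# Hypothetical preset condition: host suffix -> site type.
SITE_TYPES = {
    "finance.example.com": "finance",
    "news.example.com": "news",
    "forum.example.com": "forum",
}


def site_type(url):
    """Return the type of a site based on its host name."""
    host = urlparse(url).netloc.lower()
    for suffix, kind in SITE_TYPES.items():
        if host == suffix or host.endswith("." + suffix):
            return kind
    return "other"


def classify(urls):
    """S2211: group the initial list into one list per site type."""
    groups = {}
    for url in urls:
        groups.setdefault(site_type(url), []).append(url)
    return groups


def merge_new_sites(groups, new_urls):
    """S2214: add each new site to the list of its own type, skipping duplicates."""
    for url in new_urls:
        bucket = groups.setdefault(site_type(url), [])
        if url not in bucket:
            bucket.append(url)
    return groups
```

Each per-type list produced by `classify` would be handed to the container responsible for that type, and whatever new sites that container reports back would be passed through `merge_new_sites`.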

The preset conditions include conditions such as website address or data source. Classification by website address means classifying according to the Uniform Resource Locator (URL) of the website. Because different websites use different anti-crawler strategies, the data structure of their web pages differs, and different websites require different crawling strategies. For example, news on Sina is relatively easy to crawl: the pages can be parsed directly with BeautifulSoup. On NetEase News, titles and content are loaded asynchronously with JS, so simply downloading the page source yields no titles or content; the required content can be found in the JS entries under Network, and regular expressions can be used to extract the required titles and their links. Toutiao news differs from both: its titles and links are encapsulated in a JSON file, but the URL parameters of that JSON file are changed by a randomized JS algorithm, so the parameters must be simulated, otherwise the specific URL of the JSON file cannot be found. Data sources include financial websites, news websites, forums, and the like.

Specifically, the configured initial data source website list of the target project is obtained, and the crawler system automatically classifies it according to the preset conditions to obtain data source website lists of different types, for example by dividing the data source websites into types according to website identifiers. The lists of different types are then packaged into corresponding Docker containers, which are deployed on different servers. The Docker containers are started so that they obtain additional new data source websites by crawling, and the new data source websites are added to the corresponding data source website lists to update, and so continuously improve, the data source websites of the target project. Specifically, this includes the following sub-steps:

First, an initial website list is obtained. The list can be configured manually, that is, the initial data source websites are provided by a person, or it can be the website list found by searching with the identification information.

Second, the prepared crawler code is packaged into a Docker container. The code includes a part that extracts website URLs and a part that matches each URL to its crawler program, so that URLs are automatically mapped to crawler programs and each website is crawled by the crawler program corresponding to its URL. This requires building an index from URLs to crawler programs and preparing crawlers for all URL types in advance, so that each URL type has its own crawler program.

Third, container Docker1 is started. The crawler code classifies and splits the overall input list, and data source lists of the same type are saved to form to-crawl lists awaiting crawling. The URL classification code first classifies the input website URL list by URL type; the list-splitting code then divides the different data source lists into several lists, which correspond to the Docker containers on different machines.

Fourth, container Docker2 is started. Using the data source list it receives, it matches each URL to its crawler program (for example, website X corresponds to the code that crawls and parses website X, so passing in a URL of website X is enough to crawl it), accesses the external network, fetches the corresponding data separately, and writes the data to the database.

Further, the crawler program mines new URLs from the URLs it has obtained, that is, starting from the seed URLs it discovers new URLs and stores them in the to-crawl URL list to complete that list. It can also check whether an error was reported while crawling data; if an error was reported, crawling of that website ends.

URLs can be classified with preset URL regular expressions. Each type of URL list has a corresponding regular expression, and whether a URL belongs to a type is determined by whether the match result is empty: if the result is non-empty, the URL is of that type; if the result is empty, it is not.

Fifth, the operation stops once all of Docker2's to-crawl website lists are empty. To keep improving the data source website list, the above steps can be repeated on the list already obtained, on a schedule or at irregular intervals, so that the data source website list is updated.
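The sub-steps above (regular-expression URL classification, dispatch to a matching crawler, discovery of new URLs, stopping on an empty to-crawl list) can be sketched as a single loop. Everything concrete here is an assumption made for illustration: the URL patterns, the `example.com` hosts, and the `PAGES` table that stands in for real network access and site-specific parsing. A site type is abandoned after its first error, which is one possible reading of "crawling of that website ends".

```python
import re
from collections import deque

# Hypothetical preset regular expressions, one per URL type.
URL_PATTERNS = {
    "site_x": re.compile(r"^https?://(?:www\.)?x\.example\.com/"),
    "site_y": re.compile(r"^https?://(?:www\.)?y\.example\.com/"),
}

# Stand-in for the network: url -> (page data, URLs found on the page).
PAGES = {
    "https://x.example.com/1": ("article one",
                                ["https://x.example.com/2", "https://y.example.com/1"]),
    "https://x.example.com/2": ("article two", []),
    "https://y.example.com/1": ("post one", ["https://x.example.com/1"]),
}


def url_type(url):
    """A non-empty match result means the URL belongs to that type."""
    for kind, pattern in URL_PATTERNS.items():
        if pattern.findall(url):
            return kind
    return None


def crawl_site_x(url):
    return PAGES[url]  # a real crawler would fetch and parse site X here


def crawl_site_y(url):
    return PAGES[url]  # a real crawler would fetch and parse site Y here


CRAWLERS = {"site_x": crawl_site_x, "site_y": crawl_site_y}


def run(seed_urls):
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    stored, failed_types = [], set()
    while frontier:  # stop once the to-crawl list is empty
        url = frontier.popleft()
        kind = url_type(url)
        if kind is None or kind in failed_types:
            continue  # no crawler for this type, or its site already failed
        try:
            data, new_urls = CRAWLERS[kind](url)
        except Exception:
            failed_types.add(kind)  # end crawling for this site on error
            continue
        stored.append((url, data))
        for new_url in new_urls:
            if new_url not in seen:  # only queue URLs not seen before
                seen.add(new_url)
                frontier.append(new_url)
    return stored, failed_types
```

The `seen` set is what guarantees termination: a page that links back to an already visited URL does not put it on the to-crawl list again.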

Please refer to FIG. 5, which is a schematic diagram of another sub-flow of the project public opinion monitoring method provided by an embodiment of the present application. As shown in FIG. 5, in this embodiment, the step of parsing the corpus through natural language processing to identify the subject names and public opinion features contained in the corpus includes:

S2400. Split the corpus at sentence separators to obtain a sentence data set.

The sentence separators include sentence punctuation marks and splitting words. The sentence punctuation marks include marks such as “。”, “?”, “;”, and “!”. The splitting words are preset characters or words that can serve as sentence breaks, such as “的” (of), “且” (and), “中” (in), “我们” (we), and “根据” (according to).

Specifically, the corpus crawled by the crawler system is split at the sentence separators to obtain a sentence data set, so that the sentences containing names can be selected from it.
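Splitting at sentence separators can be sketched with Python's `re.split`. The function name, the default punctuation set (which includes both full-width and ASCII forms), and the sample sentences are assumptions for illustration; which splitting words to enable is left to the caller.

```python
import re

# Sentence punctuation; full-width and ASCII forms are both included.
PUNCTUATION = "。??;;!!"


def split_sentences(text, punctuation=PUNCTUATION, split_words=()):
    """Split `text` at punctuation marks and at any preset splitting words."""
    parts = [re.escape(ch) for ch in punctuation]
    parts += [re.escape(word) for word in split_words]
    if not parts:
        stripped = text.strip()
        return [stripped] if stripped else []
    pattern = "|".join(parts)
    # Drop empty pieces left by adjacent or leading separators.
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]
```

For example, `split_sentences("屏幕很大。拍照清晰!电池耐用?")` yields three sentences, and passing `split_words=["我们", "根据"]` additionally breaks the text wherever those words occur.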

S2401. Construct a named entity model from the corpus.

A named entity is a person name, organization name, place name, or any other entity identified by a name; in a broader sense, entities also include numbers, dates, currencies, addresses, and so on.

Specifically, to construct the named entity model, named entities are annotated in the acquired corpus, and a named entity recognition model is built with a CRF model to identify the subject names of the target project. CRF stands for Conditional Random Field, a statistical model that has been one of the commonly used algorithms in natural language processing in recent years. In essence, a CRF combines a Markov chain over the hidden variables with the conditional probability of the hidden variables given the observable states.

S2402. Identify, through the named entity model, the subject names contained in the sentence data set.

Named entity recognition (NER), also called "proper name recognition", refers to identifying entities with specific meaning in text, mainly including person names, place names, organization names, and proper nouns.

Specifically, once the named entity model has been constructed, it is applied to the sentence data set and automatically identifies the subject names contained in it. For example, named entities are annotated in the corpus, a named entity recognition model is built with a CRF model, and the name of the company entity is identified. The named entity model identifies the sentences that carry information related to the target project, and part-of-speech analysis and retrieval of the key relationships of the project are performed on the words. If a core keyword appears, for example "long battery life" or "easy-to-use system" for a mobile phone product, the related information is saved as a specific attribute of the target project. That attribute can also carry the current date and time, enriching the public opinion data in the public opinion relationship graph of the target project.

S2403. Perform part-of-speech analysis and target-relationship retrieval on the corpus to obtain the public opinion features of the target project.

Part of speech refers to the classification of words according to their characteristics, for example verbs and nouns. The target relationship refers to the relationship between the subjects involved in the target project that appear in the corpus, for example the subordinate relationship between a mobile phone and its components, or the containment relationship between a mobile phone and its camera.

Specifically, part-of-speech analysis and subject-relationship identification on the corpus include the following process:

First, the corpus is segmented into words. Jieba can be used to segment sentence text. Jieba is one of many word segmentation tools available in Python; others include Pangu, Yaha, and Tsinghua THULAC.

Second, core relationships are extracted. Specifically, verbs are extracted and matched against a keyword list; a verb that appears in the keyword list is treated as a core relationship. The noun after the verb is taken as the object of the relationship, and the noun before the verb is taken as the subject of the relationship, which is the target. The relationship subject, the relationship object, and the relationship between them are taken as public opinion features, and the subject names involved in the extracted core relationships, together with the feature data describing their attributes, are stored in the graph database.
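The verb-keyword rule can be sketched as below, assuming the input has already been segmented and tagged as `(word, tag)` pairs with `n` for nouns and `v` for verbs (the tag convention used by taggers such as `jieba.posseg`). The keyword set, the English sample words, and the function name are hypothetical.

```python
# Hypothetical keyword list of verbs that count as core relationships.
RELATION_KEYWORDS = {"contains", "uses"}


def extract_relations(tagged_words, keywords=RELATION_KEYWORDS):
    """Return (subject, verb, object) triples from a list of (word, tag) pairs."""
    triples = []
    for i, (word, tag) in enumerate(tagged_words):
        if tag != "v" or word not in keywords:
            continue
        # Nearest noun before the verb is the relationship subject.
        subject = next((w for w, t in reversed(tagged_words[:i]) if t == "n"), None)
        # Nearest noun after the verb is the relationship object.
        obj = next((w for w, t in tagged_words[i + 1:] if t == "n"), None)
        if subject is not None and obj is not None:
            triples.append((subject, word, obj))
    return triples
```

Each resulting triple maps directly onto two nodes and one relationship in the graph database.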

Further, the step of constructing a named entity model from the corpus includes:

1) segmenting the corpus into words to obtain a word segmentation result;

2) extracting feature data from the word segmentation result using a preset feature template;

3) training a preset conditional random field model on the feature data to construct the named entity model.

Specifically, constructing the named entity model from the acquired corpus includes the following steps:

First, obtain the named entity training corpus, which comes mainly from the corpus of the target project collected by the crawler system.

Second, preprocess the corpus. Jieba word segmentation is mainly used, and stop words and meaningless words are removed to obtain the word segmentation result.

Third, extract features. Features are extracted using feature templates composed of regular expressions. The extracted features include words, parts of speech, boundary words, and named entity feature words.

Fourth, create and train a model based on a conditional random field (CRF). The CRF model is trained on the training data to obtain its parameters, and the trained model is saved.

Fifth, evaluate the model on test data and retain a model that meets the requirements, such as a high recognition rate, as the constructed named entity model.
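The feature extraction in the third step could be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation described in the application: the cue-word set `ENTITY_CUE_WORDS`, the pattern `MODEL_NAME_RE`, and the function names are hypothetical, and the input is assumed to be a sentence already segmented into (word, part-of-speech) pairs. The output is a list of per-token feature dictionaries, a shape that CRF toolkits such as sklearn-crfsuite accept as training input.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical cue words that tend to appear next to a named entity.
ENTITY_CUE_WORDS = {"company", "phone", "group"}

# Hypothetical regular-expression template: letters followed by digits,
# e.g. a product model name such as "P30".
MODEL_NAME_RE = re.compile(r"^[A-Za-z]+\d+$")


def token_features(sentence: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Build the feature dictionary for the token at position i.

    `sentence` is a list of (word, part_of_speech) pairs.
    """
    word, pos = sentence[i]
    features: Dict[str, object] = {
        "word": word,                                   # the word itself
        "pos": pos,                                     # its part of speech
        "is_cue_word": word.lower() in ENTITY_CUE_WORDS,
        "looks_like_model": bool(MODEL_NAME_RE.match(word)),
    }
    # Boundary words: the neighbouring tokens, or markers at sentence edges.
    if i > 0:
        features["prev_word"], features["prev_pos"] = sentence[i - 1]
    else:
        features["BOS"] = True
    if i < len(sentence) - 1:
        features["next_word"], features["next_pos"] = sentence[i + 1]
    else:
        features["EOS"] = True
    return features


def sentence_features(sentence: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Apply the template to every token of one sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]
```

Each dictionary carries the four feature kinds named above (word, part of speech, boundary words, and entity cue words). Training and evaluating the CRF itself is left to whichever toolkit is chosen.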

Please refer to FIG. 6, which is a schematic diagram of the third sub-flow of the project public opinion monitoring method provided by an embodiment of the present application. In this embodiment, the step of parsing the corpus through natural language processing to identify the subject names and public opinion features contained in the corpus includes:

S2500. Segment the corpus into words to obtain a vocabulary list of the corpus.

Specifically, the corpus is preprocessed: Jieba word segmentation is mainly used, and stop words and meaningless words are removed to obtain the vocabulary list.

S2501. Use a first regular expression to extract the core relationships from the vocabulary list to obtain the public opinion features.

S2502. Use a second regular expression to extract the named entities involved in the core relationships from the vocabulary list to obtain the subject names.

Specifically, regular expressions are used to extract the core relationships. Verbs are extracted with a regular expression and matched against a keyword list. If a verb appears in the keyword list, it is treated as a core relationship: the noun that follows the verb is taken as the object of the named relationship, and the noun that precedes the verb is taken as its subject, which is the target. The extracted subject, the object, and the relationship between them serve as the public opinion features. A regular expression (often abbreviated in code as regex, regexp, or RE) is typically used to search for and replace text that matches a given pattern (rule).
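The extraction described above could be sketched as follows. This is a minimal sketch, not the implementation of the application: the keyword list `RELATION_KEYWORDS`, the two tag patterns, and the function name are hypothetical. The part-of-speech tags are assumed to follow the Jieba-style convention in which noun tags start with `n` and verb tags start with `v`.

```python
import re
from typing import List, Tuple

# Hypothetical keyword list of verbs that signal a core relationship.
RELATION_KEYWORDS = {"uses", "contains", "overheats"}

# Regular expressions over part-of-speech tags (assumed tag convention).
VERB_TAG_RE = re.compile(r"^v")
NOUN_TAG_RE = re.compile(r"^n")


def extract_core_relations(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (subject, verb, object) triples from a tagged word list.

    A verb counts as a core relationship only if it is in the keyword list.
    The subject is the nearest noun before the verb, the object the nearest
    noun after it. Verbs lacking either neighbour are skipped.
    """
    triples = []
    for i, (word, tag) in enumerate(tagged):
        if not VERB_TAG_RE.match(tag) or word not in RELATION_KEYWORDS:
            continue
        subject = next(
            (w for w, t in reversed(tagged[:i]) if NOUN_TAG_RE.match(t)), None
        )
        obj = next((w for w, t in tagged[i + 1:] if NOUN_TAG_RE.match(t)), None)
        if subject is not None and obj is not None:
            triples.append((subject, word, obj))
    return triples
```

Taking the nearest noun on each side is a simplification chosen for brevity. The source only says that the noun before the verb is the subject and the noun after it is the object.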

Then, the core relationships are imported into the graph database as the public opinion features, and the named entities as the subject names, to construct the public opinion relationship graph of the target project. When the graph database is designed, the node sets, nodes, and relationships in the graph, and the connections among them, are defined. When data is imported, the graph database automatically identifies the node data and the relationship data in the imported data and assigns each to its corresponding place in the graph database. In this embodiment, once the core relationships and the named entities have been imported into the graph database, the public opinion relationship graph of the target project can be constructed automatically. For example, the core relationships are used to build the relationships between a mobile phone and its components, while the feature descriptions of the components are imported as attributes of the component nodes. A graph database is a type of NoSQL database that applies graph theory to store relationship information between entities. Common graph databases include Neo4j, FlockDB, and AllegroGraph.
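The separation of imported data into node data and relationship data could be sketched as follows. This is an in-memory sketch under stated assumptions, not the graph database itself: plain dictionaries and lists stand in for the store, and the function name and the relationship label `HAS_COMPONENT` are hypothetical. A real deployment would write the same nodes and edges to a graph database such as Neo4j.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def build_graph(
    triples: List[Triple], attributes: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[Triple]]:
    """Split imported data into node data and relationship data.

    Returns (nodes, edges): `nodes` maps each entity name to its list of
    feature descriptions, and `edges` holds the relationships without
    duplicates, in first-seen order.
    """
    nodes: Dict[str, List[str]] = {}
    edges: List[Triple] = []
    for subject, relation, obj in triples:
        nodes.setdefault(subject, [])
        nodes.setdefault(obj, [])
        if (subject, relation, obj) not in edges:
            edges.append((subject, relation, obj))
    # Feature descriptions become attributes of the matching node.
    for name, features in attributes.items():
        node_features = nodes.setdefault(name, [])
        for feature in features:
            if feature not in node_features:
                node_features.append(feature)
    return nodes, edges
```

For the phone example, the triple `("phone", "HAS_COMPONENT", "battery")` yields two nodes and one edge, and a description such as "long life" ends up as an attribute of the `battery` node.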

Please continue to refer to FIG. 3. As shown in FIG. 3, in this embodiment, after the step of displaying the public opinion relationship graph, the method further includes:

S270. Combine the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form.

Further, the step of combining the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form includes:

combining the elements of the public opinion relationship graph in a preset order to describe, in text form, the positive public opinion information, negative public opinion information, event evaluation information, and channel evaluation information of the target project.

Specifically, the public opinion of the target project is displayed not only as the public opinion relationship graph, which enables monitoring of the target project, but also in text form: a public opinion conclusion drawn from the graph is presented for reference by the personnel monitoring the target project. The public opinion conclusion includes positive public opinion information, negative public opinion information, event evaluation information, and channel evaluation information. Positive public opinion information refers to the positive impact of public opinion; for a mobile phone, examples are long battery life and an attractive appearance. Negative public opinion information refers to the negative impact of public opinion; for a mobile phone, examples are short battery life and overheating during charging. Event evaluation information is a prediction, evaluation, and estimate of the impact of a particular event in the public opinion, for example the impact of a product review report on the product. Channel evaluation information refers to the impact on the target of the channel from which the corpus originates. Different websites differ in audience, scale, and influence, so the impact on the target of the channel to which an event belongs needs to be estimated. For example, Weibo, WeChat Moments, and forums each affect the target differently.

When the elements of the target project's public opinion relationship graph are combined in a preset order to describe the public opinion in text form, the relationship information between entities stored in the graph database, together with the information characteristics established when the graph database was designed, can be used to distinguish the node sets, nodes, and relationships in the graph. The relationships between nodes are then expressed in words, so that the public opinion of the target project is described in text form and the project's monitoring personnel receive a textual prompt. For example, if nodes A and B in the graph are linked by a subordination relationship, the text description can read "node A is subordinate to node B". Further, if the graph indicates that node A influences node B, the information relevant to that influence can be filtered from the acquired corpus, and a summary of how node A influences node B can be generated using trained regular expressions or a language model and provided in text form for the monitoring personnel's reference, for example the impact of the phone's processor on the phone, or of the phone's battery on the phone. The language model may be, for example, an N-gram language model or a neural network language model.
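Turning graph relationships into text in a preset order could be sketched as follows. This is a minimal sketch, not the implementation of the application: the relationship labels, the sentence templates in `TEMPLATES`, and the function name are hypothetical. The "preset order" is modelled here as a ranking of relationship types, which is only one possible reading of the text.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (node A, relationship, node B)

# Hypothetical sentence templates, one per relationship type.
TEMPLATES: Dict[str, str] = {
    "SUBORDINATE_TO": "{a} is subordinate to {b}",
    "INFLUENCES": "{a} influences {b}",
}


def describe_graph(edges: List[Edge], relation_order: List[str]) -> List[str]:
    """Render each edge as a sentence, ordered by relationship type.

    Edges whose type is missing from `relation_order` come last. Edges of
    the same type keep their input order because the sort is stable.
    """
    rank = {relation: i for i, relation in enumerate(relation_order)}
    ordered = sorted(edges, key=lambda edge: rank.get(edge[1], len(rank)))
    sentences = []
    for a, relation, b in ordered:
        template = TEMPLATES.get(relation, "{a} is related to {b}")
        sentences.append(template.format(a=a, b=b))
    return sentences
```

Summarising how one node influences another from the corpus, as the paragraph above mentions, would need a language model or further rules and is outside this sketch.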

Further, in one embodiment, constructing the graph data of the target project also yields a news corpus related to the target project. Before visualization, the public opinion data of the target project is sorted by time, and the top-ranked news items are listed in chronological order, which further filters out the effective data and improves processing efficiency. For example, constructing the graph data for a mobile phone yields a news corpus related to that phone. Before visualization, the phone's public opinion data is sorted by time and its most recent top-ranked news items are listed, revealing which aspects of the phone have recently drawn the most attention, such as the camera or the battery.
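The time sorting could be sketched as follows. This is a small sketch under stated assumptions: the `NewsItem` record and the function name are hypothetical, and "top-ranked" is interpreted simply as "most recent", since the source does not define another ranking criterion.

```python
from datetime import datetime
from typing import List, NamedTuple


class NewsItem(NamedTuple):
    """Hypothetical record for one news item in the corpus."""
    title: str
    published: datetime


def latest_news(items: List[NewsItem], limit: int) -> List[NewsItem]:
    """Return the `limit` most recent items, newest first."""
    return sorted(items, key=lambda item: item.published, reverse=True)[:limit]
```

Calling `latest_news(items, 10)` before visualization keeps only the ten newest items, so later steps handle less data.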

In addition, the fields related to the target project can be analyzed in depth. For example, in a product competition relationship, public opinion data related to the target project is obtained from the attributes of competing products, then classified and deduplicated, and presented to the user.
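Classification and deduplication of such data could be sketched as follows. This is only a sketch: the source does not say how records are classified or when two records count as duplicates. Here each record is assumed to arrive with a category label already attached, and two texts in the same category are treated as duplicates when they match after case and whitespace normalization. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def classify_and_dedupe(records: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (category, text) records by category and drop duplicates.

    The first occurrence of each text is kept in its original form.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for category, text in records:
        # Normalize case and collapse whitespace for the duplicate check.
        key = (category, " ".join(text.split()).lower())
        if key in seen:
            continue
        seen.add(key)
        grouped[category].append(text)
    return dict(grouped)
```

Near-duplicate detection, for example reposted articles with small edits, would need a similarity measure and is not covered here.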

Further, once the public opinion of the target project has been obtained and the project is being monitored, responses can be made according to that public opinion in order to protect the target project, for example through public relations work on a company's product image that safeguards the company's reputation and interests. For example, if the target project is a mobile phone, the positive and negative information in the public opinion about the phone is obtained, along with the event evaluation information and channel evaluation information, so that corresponding public relations measures can be taken.

It should be noted that, for the project public opinion monitoring method described in the above embodiments, the technical features contained in different embodiments may be recombined as needed to obtain combined implementations, all of which fall within the scope of protection claimed by this application.

Please refer to FIG. 7, which is a schematic block diagram of a project public opinion monitoring device provided by an embodiment of the present application. Corresponding to the above project public opinion monitoring method, an embodiment of the present application further provides a project public opinion monitoring device. As shown in FIG. 7, the device includes units for executing the above project public opinion monitoring method, and the device can be configured in a computer device such as a terminal. Specifically, the project public opinion monitoring device 700 includes a first acquisition unit 701, a second acquisition unit 702, a crawling unit 703, an identification unit 704, a construction unit 705, and a display unit 706.

The first acquisition unit 701 is configured to acquire identification information of the target project in a preset manner;

the second acquisition unit 702 is configured to acquire a list of data source websites of the target project through a network search according to the identification information;

the crawling unit 703 is configured to crawl the corpus of the target project from the data source websites included in the data source website list according to the identification information;

the identification unit 704 is configured to parse the corpus through natural language processing to identify the subject names and public opinion features contained in the corpus;

the construction unit 705 is configured to import the subject names and the public opinion features into the graph database to construct the public opinion relationship graph of the target project;

the display unit 706 is configured to display the public opinion relationship graph.

Please refer to FIG. 8, which is another schematic block diagram of the project public opinion monitoring device provided by an embodiment of the present application. As shown in FIG. 8, in this embodiment, the project public opinion monitoring device 700 further includes:

an update unit 707, configured to update the data source website list by crawling.

Please continue to refer to FIG. 8. As shown in FIG. 8, the update unit 707 includes:

an acquisition subunit 7071, configured to acquire the initial data source website list of the target project;

a classification subunit 7072, configured to classify the initial data source website list according to preset conditions to obtain data source website lists of different types;

an encapsulation subunit 7073, configured to encapsulate the data source website lists of different types into corresponding Docker containers;

a crawling subunit 7074, configured to start the Docker containers so that the Docker containers obtain new data source websites by crawling;

an update subunit 7075, configured to add each new data source website, according to its type, to the corresponding classified data source website list to update the data source websites of the target project.
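The classification and update performed by subunits 7072 and 7075 could be sketched as follows. This is a sketch under stated assumptions: the source does not specify the preset conditions, so a hypothetical rule table `TYPE_RULES` that matches keywords in the hostname stands in for them, and the function names are also hypothetical. Packaging each list into a Docker container and running the crawler is deliberately left out.

```python
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical preset conditions: hostname keyword -> website type.
TYPE_RULES: Dict[str, str] = {"forum": "forum", "news": "news"}


def site_type(url: str) -> str:
    """Classify a website by the first rule keyword found in its hostname."""
    host = urlparse(url).netloc.lower()
    for keyword, kind in TYPE_RULES.items():
        if keyword in host:
            return kind
    return "other"


def merge_new_sites(
    classified: Dict[str, List[str]], new_urls: List[str]
) -> Dict[str, List[str]]:
    """Add each new website to the list of its type, skipping known ones."""
    for url in new_urls:
        bucket = classified.setdefault(site_type(url), [])
        if url not in bucket:
            bucket.append(url)
    return classified


def classify_sites(urls: List[str]) -> Dict[str, List[str]]:
    """Split an initial website list into per-type lists."""
    return merge_new_sites({}, urls)
```

Keeping one list per type is what allows each list to be handed to its own container, with the crawler results merged back by the same type key.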

In one embodiment, the public opinion relationship graph of the target project includes the name of the target project, the names of the target project's sub-projects, and the public opinion features corresponding to the sub-project names.

Please refer to FIG. 8. As shown in FIG. 8, in this embodiment, the identification unit 704 includes:

a segmentation subunit 7041, configured to split the corpus at sentence separators to obtain a sentence data set;

a construction subunit 7042, configured to construct a named entity model from the corpus;

an identification subunit 7043, configured to identify, using the named entity model, the subject names contained in the sentence data set;

a retrieval subunit 7044, configured to perform part-of-speech analysis and target relationship retrieval on the corpus to obtain the public opinion features of the target project.
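The splitting performed by the segmentation subunit 7041 could be sketched as follows. This is a minimal sketch: the set of sentence separators is an assumption, since the source does not list them. The ASCII full stop is deliberately excluded so that decimals such as "6.1" are not split, at the cost of leaving English sentences that end in "." unsplit.

```python
import re
from typing import List

# Assumed separators: Chinese and ASCII sentence-ending marks and newlines.
SENTENCE_END_RE = re.compile(r"[。！？!?；;\n]+")


def split_sentences(corpus: str) -> List[str]:
    """Split a corpus into a sentence data set, dropping empty pieces."""
    pieces = SENTENCE_END_RE.split(corpus)
    return [piece.strip() for piece in pieces if piece.strip()]
```

The resulting list is the sentence data set that the named entity model would then process sentence by sentence.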

Please refer to FIG. 8. As shown in FIG. 8, in this embodiment, the project public opinion monitoring device 700 further includes:

a description unit 708, configured to combine the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form.

In one embodiment, the description unit 708 is configured to combine the elements of the public opinion relationship graph in a preset order to describe, in text form, the positive public opinion information, negative public opinion information, event evaluation information, and channel evaluation information of the target project.

It should be noted that those skilled in the art will clearly understand that, for the specific implementation of the above project public opinion monitoring device and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments. For convenience and brevity, the details are not repeated here.

In addition, the division and connection of the units in the above project public opinion monitoring device are given only as an example. In other embodiments, the device may be divided into different units as needed, and its units may be connected in a different order and manner, to perform all or part of the functions of the device.

The above project public opinion monitoring device may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 9.

Please refer to FIG. 9, which is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 900 may be a computer device such as a desktop computer or a server, or may be a component or part of another device.

Referring to FIG. 9, the computer device 900 includes a processor 902, a memory, and a network interface 905 connected through a system bus 901, where the memory may include a non-volatile storage medium 903 and an internal memory 904.

The non-volatile storage medium 903 can store an operating system 9031 and a computer program 9032. When executed, the computer program 9032 causes the processor 902 to perform the above project public opinion monitoring method.

The processor 902 provides computing and control capabilities to support the operation of the entire computer device 900.

The internal memory 904 provides an environment for running the computer program 9032 stored in the non-volatile storage medium 903. When the computer program 9032 is executed by the processor 902, it causes the processor 902 to perform the above project public opinion monitoring method.

The network interface 905 is used for network communication with other devices. Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device 900 to which the solution is applied. A specific computer device 900 may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor. In such embodiments, the structures and functions of the memory and the processor are the same as in the embodiment shown in FIG. 9 and are not described again here.

The processor 902 is configured to run the computer program 9032 stored in the memory to implement the following steps: acquiring identification information of the target project in a preset manner; acquiring a list of data source websites of the target project through a network search according to the identification information; crawling the corpus of the target project from the data source websites included in the data source website list according to the identification information; parsing the corpus through natural language processing to identify the subject names and public opinion features contained in the corpus; importing the subject names and the public opinion features into a graph database to construct the public opinion relationship graph of the target project; and displaying the public opinion relationship graph.

In one embodiment, after implementing the step of acquiring the list of data source websites of the target project through a network search according to the identification information, the processor 902 further implements the following step:

updating the data source website list by crawling.

In one embodiment, when implementing the step of updating the data source website list by crawling, the processor 902 specifically implements the following steps:

acquiring the initial data source website list of the target project;

classifying the initial data source website list according to preset conditions to obtain data source website lists of different types;

encapsulating the data source website lists of different types into corresponding Docker containers;

starting the Docker containers so that the Docker containers obtain new data source websites by crawling;

adding each new data source website, according to its type, to the corresponding classified data source website list to update the data source websites of the target project.

In one embodiment, the public opinion relationship graph of the target project constructed by the processor 902 specifically includes: the name of the target project, the names of the target project's sub-projects, and the public opinion features corresponding to the sub-project names.

In one embodiment, when implementing the step of parsing the corpus through natural language processing to identify the subject names and public opinion features contained in the corpus, the processor 902 specifically implements the following steps:

splitting the corpus at sentence separators to obtain a sentence data set;

constructing a named entity model from the corpus;

identifying, using the named entity model, the subject names contained in the sentence data set;

performing part-of-speech analysis and target relationship retrieval on the corpus to obtain the public opinion features of the target project.

In one embodiment, after implementing the step of displaying the public opinion relationship graph, the processor 902 further implements the following step:

combining the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form.

In one embodiment, when implementing the step of combining the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form, the processor 902 specifically implements the following step:

combining the elements of the public opinion relationship graph in a preset order to describe, in text form, the positive public opinion information, negative public opinion information, event evaluation information, and channel evaluation information of the target project.

It should be understood that, in the embodiments of the present application, the processor 902 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program, and the computer program can be stored in a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the above method embodiments.

Therefore, the present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium that stores a computer program, and the computer program, when executed by a processor, causes the processor to perform the following steps:

A computer program product which, when run on a computer, causes the computer to perform the steps of the project public opinion monitoring method described in the above embodiments.

The computer-readable storage medium may be an internal storage unit of the aforementioned device, such as the device's hard disk or memory. It may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the device.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the equipment, devices, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.

The computer-readable storage medium may be any computer-readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be regarded as going beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.

The steps in the methods of the embodiments of the present application may be reordered, combined, or deleted according to actual needs. The units in the devices of the embodiments of the present application may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.

If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a storage medium. On this understanding, the technical solution of this application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause an electronic device (which may be a personal computer, a terminal, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application.

The above are only specific embodiments of this application, but the scope of protection of this application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in this application, and these modifications or substitutions shall fall within the scope of protection of this application. The scope of protection of this application shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A project public opinion monitoring method, characterized in that the method comprises:
obtaining identification information of a target project in a preset manner;
obtaining a list of data source websites for the target project by web search according to the identification information;
crawling a corpus of the target project from the data source websites included in the data source website list according to the identification information;
parsing the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus;
importing the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project;
displaying the public opinion relationship graph.

2. The project public opinion monitoring method according to claim 1, characterized in that, after the step of obtaining the list of data source websites for the target project by web search according to the identification information, the method further comprises:
updating the data source website list by crawling.

3. The project public opinion monitoring method according to claim 2, characterized in that the step of updating the data source website list by crawling comprises:
obtaining an initial data source website list of the target project;
classifying the initial data source website list according to preset conditions to obtain data source website lists of different types;
packaging the data source website lists of different types into corresponding Docker containers;
starting the Docker containers so that the Docker containers obtain new data source websites by crawling;
adding the new data source websites, by type, to the corresponding classified data source website lists to update the data source websites of the target project.

4. The project public opinion monitoring method according to claim 1, characterized in that the public opinion relationship graph of the target project includes the name of the target project, the names of sub-projects of the target project, and the public opinion features corresponding to the sub-project names.

5. The project public opinion monitoring method according to claim 1, characterized in that the step of parsing the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus comprises:
splitting the corpus at sentence separators to obtain a sentence data set;
constructing a named entity model from the corpus;
identifying, by the named entity model, the subject names contained in the sentence data set;
performing part-of-speech analysis and target relationship retrieval on the corpus to obtain the public opinion features of the target project.

6. The project public opinion monitoring method according to claim 1, characterized in that, after the step of displaying the public opinion relationship graph, the method further comprises:
combining the elements of the public opinion relationship graph in a preset order to describe the public opinion of the target project in text form.

7. The project public opinion monitoring method according to claim 6, characterized in that the public opinion of the target project includes positive public opinion information, negative public opinion information, event evaluation information, and channel evaluation information.

8. A project public opinion monitoring device, characterized in that it comprises:
a first acquisition unit, configured to obtain identification information of a target project in a preset manner;
a second acquisition unit, configured to obtain a list of data source websites for the target project by web search according to the identification information;
a crawling unit, configured to crawl a corpus of the target project from the data source websites included in the data source website list according to the identification information;
a recognition unit, configured to parse the corpus by natural language processing to identify the subject names and public opinion features contained in the corpus;
a construction unit, configured to import the subject names and the public opinion features into a graph database to construct a public opinion relationship graph of the target project;
a display unit, configured to display the public opinion relationship graph.

9. A computer device, characterized in that the computer device comprises a memory and a processor connected to the memory; the memory is configured to store a computer program; and the processor is configured to run the computer program stored in the memory so as to perform the steps of the project public opinion monitoring method according to any one of claims 1-7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the steps of the project public opinion monitoring method according to any one of claims 1-7.
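
As a rough illustration only, the parsing, graph-building, and text-description steps recited in claims 1, 5, and 6 might be sketched in Python as below. This is a toy sketch under stated assumptions, not the applicant's implementation: the subject list, the positive and negative word lists, and every function name are hypothetical; a dictionary lookup stands in for the named entity model, a word-list match stands in for part-of-speech analysis and relationship retrieval, and an in-memory mapping stands in for the graph database.

```python
import re
from collections import defaultdict

# Hypothetical inputs: none of these names or word lists come from the patent.
KNOWN_SUBJECTS = ["Project Alpha", "Subproject Beta"]
POSITIVE_WORDS = {"praised", "successful"}
NEGATIVE_WORDS = {"criticized", "delayed"}

# Sentence separators, covering both Latin and Chinese punctuation.
SENTENCE_SEPARATORS = re.compile(r"[.!?。！？]+")


def split_sentences(corpus: str) -> list[str]:
    """Split the corpus at sentence separators into a sentence data set."""
    return [s.strip() for s in SENTENCE_SEPARATORS.split(corpus) if s.strip()]


def find_subjects(sentence: str) -> list[str]:
    """Stand-in for the named entity model: plain dictionary lookup."""
    return [name for name in KNOWN_SUBJECTS if name in sentence]


def find_features(sentence: str) -> list[str]:
    """Stand-in for part-of-speech analysis: label by word-list hits."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    features = []
    if words & POSITIVE_WORDS:
        features.append("positive")
    if words & NEGATIVE_WORDS:
        features.append("negative")
    return features


def build_graph(corpus: str) -> dict[str, set[str]]:
    """Map each subject name to the features found in the same sentence.

    A real system would write these as nodes and edges to a graph database.
    """
    graph = defaultdict(set)
    for sentence in split_sentences(corpus):
        for subject in find_subjects(sentence):
            graph[subject].update(find_features(sentence))
    return dict(graph)


def describe(graph: dict[str, set[str]]) -> list[str]:
    """Combine graph elements in a fixed (alphabetical) order into text."""
    lines = []
    for subject in sorted(graph):
        features = ", ".join(sorted(graph[subject])) or "no features found"
        lines.append(f"{subject}: {features}")
    return lines


corpus = (
    "Project Alpha was praised by local media. "
    "Subproject Beta was delayed and criticized! "
    "Project Alpha was also criticized for its cost."
)
graph = build_graph(corpus)
for line in describe(graph):
    print(line)
```

Keeping each stage as a separate function mirrors the unit-per-step structure of claim 8, so a trained entity recognizer or a real graph store could replace a stand-in without changing the other stages.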
CN201910270796.5A 2019-04-04 2019-04-04 Project public opinion monitoring method, device, computer equipment and storage medium Pending CN110134845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910270796.5A CN110134845A (en) 2019-04-04 2019-04-04 Project public opinion monitoring method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110134845A true CN110134845A (en) 2019-08-16

Family

ID=67569394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910270796.5A Pending CN110134845A (en) 2019-04-04 2019-04-04 Project public opinion monitoring method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110134845A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708096A (en) * 2012-05-29 2012-10-03 代松 Network intelligence public sentiment monitoring system based on semantics and work method thereof
WO2018023981A1 (en) * 2016-08-03 2018-02-08 平安科技(深圳)有限公司 Public opinion analysis method, device, apparatus and computer readable storage medium
CN108874878A (en) * 2018-05-03 2018-11-23 众安信息技术服务有限公司 A kind of building system and method for knowledge mapping
CN109471937A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A text classification method and terminal device based on machine learning

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143336A (en) * 2019-11-27 2020-05-12 三盟科技股份有限公司 College scientific research data management-oriented web crawler management method and platform
CN111611408A (en) * 2020-05-27 2020-09-01 北京明略软件系统有限公司 Public opinion analysis method and device, computer equipment and storage medium
CN111666426A (en) * 2020-06-10 2020-09-15 北京海致星图科技有限公司 Method, system and equipment for acquiring knowledge graph multi-scene graph data
CN111858959A (en) * 2020-07-23 2020-10-30 平安付科技服务有限公司 Method, device, computer equipment and storage medium for generating component relationship graph
CN112069381A (en) * 2020-09-27 2020-12-11 中国科学院深圳先进技术研究院 A monitoring management method and system based on natural language processing technology
CN112328936A (en) * 2020-11-02 2021-02-05 杭州安恒信息安全技术有限公司 Website identification method, device and equipment and computer readable storage medium
CN114723509A (en) * 2021-01-06 2022-07-08 腾讯科技(深圳)有限公司 Method, device, electronic device and storage medium for identifying object state
CN113657547A (en) * 2021-08-31 2021-11-16 平安医疗健康管理股份有限公司 Public opinion monitoring method based on natural language processing model and related equipment thereof
CN113657547B (en) * 2021-08-31 2024-05-14 平安医疗健康管理股份有限公司 Public opinion monitoring method based on natural language processing model and related equipment thereof
CN114297990A (en) * 2021-12-22 2022-04-08 北京捷通华声科技股份有限公司 Public opinion monitoring method and device, computer readable storage medium and processor
CN115033771A (en) * 2022-06-07 2022-09-09 启明信息技术股份有限公司 Method for quickly generating online public opinion data crawler code

Similar Documents

Publication Publication Date Title
CN110134845A (en) Project public opinion monitoring method, device, computer equipment and storage medium
US8082264B2 (en) Automated scheme for identifying user intent in real-time
KR101741509B1 (en) Device and method for analyzing corporate reputation by data mining of news, recording medium for performing the method
US11609959B2 (en) System and methods for generating an enhanced output of relevant content to facilitate content analysis
CN110334202A (en) Method for constructing user interest tags based on news application software and related equipment
CN110110156A (en) Industry public sentiment monitoring method, device, computer equipment and storage medium
CN110263248A (en) A kind of information-pushing method, device, storage medium and server
CN102054015A (en) System and method for organizing community intelligence information using an organic object data model
CN110134844A (en) Public opinion monitoring method, device, computer equipment and storage medium in subdivided fields
CN108647225A (en) A kind of electric business grey black production public sentiment automatic mining method and system
CN112328857B (en) Product knowledge aggregation method and device, computer equipment and storage medium
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
CN111447575A (en) Short message pushing method, device, equipment and storage medium
CN112328856A (en) Common event tracking method and device, computer equipment and computer readable medium
CN102982025B (en) A kind of search need recognition methods and device
CN104881446A (en) Searching method and searching device
Arafat et al. Analyzing public emotion and predicting stock market using social media
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
CN114706948A (en) News processing method and device, storage medium and electronic equipment
CN113095078A (en) Associated asset determination method and device and electronic equipment
CN116226494B (en) Crawler system and method for information search
Thakkar Twitter sentiment analysis using hybrid naive Bayes
Yin et al. Research of integrated algorithm establishment of a spam detection system
CN112905790A (en) Method, device and system for extracting qualitative indexes of supervision events
JP5187187B2 (en) Experience information search system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190816