CN112581006B

CN112581006B - Public opinion information screening and enterprise subject risk level monitoring public opinion system and method

Info

Publication number: CN112581006B
Application number: CN202011562957.7A
Authority: CN
Inventors: 吴美娟
Original assignee: Hangzhou Hengtai Technology Co ltd
Current assignee: Hangzhou Hengtai Technology Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2024-08-09
Anticipated expiration: 2040-12-25
Also published as: CN112581006A

Abstract

The invention relates to a public opinion engine and a method for screening public opinion information and monitoring the risk levels of enterprise subjects. The public opinion engine comprises: a subject emotion classification module, which comprises a plurality of emotion classification models and is used for determining the emotion tendency of the acquired public opinion information; a topic classification module, used for single-topic or multi-topic classification of the acquired public opinion information; a named-body recognition module, used for recognizing named bodies and calculating the compactness between each named body and the public opinion information; a public opinion risk scoring module, used for obtaining the risk grade of public opinion information containing a named body; a similarity retrieval module, used for calculating the similarity between different pieces of acquired public opinion information and screening online public opinion information; and an enterprise subject risk level monitoring module, used for obtaining the current risk levels of different enterprise subjects and monitoring them in real time. The invention can rapidly screen specified relevant information from massive news data in real time and monitor the risk level of enterprise subjects in real time.

Description

Public opinion information screening and enterprise subject risk level monitoring public opinion system and method

Technical Field

The invention relates to the technical field of computers, in particular to a public opinion engine and method for screening public opinion information and monitoring risk levels of enterprise subjects.

Background

Public opinion information is intended to alert risk-control personnel to relevant public opinion; the displayed information includes the subject name, the public opinion content, the degree of verification, the time at which the message was exposed, and the like. Existing public opinion engines generally adopt NLP and ML techniques, combine them with financial domain knowledge, identify the pain points of various business scenarios, build algorithm models, and accurately analyze various kinds of news. At present, most public opinion engines on the market focus only on the quantity of news and neglect its quality, blindly pushing massive amounts of news. As a result, similar news is highly repetitive, efficiency is low, and false early warnings and false reports often occur. Furthermore, users find it difficult to grasp the key points of the news and are easily disturbed and misled by irrelevant news.

Disclosure of Invention

The invention aims to provide a public opinion engine and a method for screening public opinion information and monitoring risk levels of enterprise subjects.

In order to achieve the above object, the present invention provides a public opinion engine for screening public opinion information and monitoring risk level of enterprise subjects, comprising:

a subject emotion classification module, which comprises a plurality of emotion classification models and is used for determining the emotion tendency of the acquired public opinion information;

the topic classification module is used for carrying out single topic classification or multi-topic classification on the obtained public opinion information;

The named-body recognition module is used for recognizing the named-body and calculating the compactness of the named-body and the public opinion information;

the public opinion risk scoring module is used for acquiring the risk grade of the public opinion information containing the named bodies;

the similarity retrieval module is used for carrying out similarity calculation on the obtained different public opinion information and screening the public opinion information;

And the enterprise main body risk level monitoring module is used for acquiring the current risk levels of different enterprise main bodies and dynamically monitoring the risk level changes of the enterprise main bodies corresponding to the named bodies.

According to one aspect of the invention, the subject emotion classification module is obtained by:

Constructing a training sample set, and giving labeling of three categories, namely positive, neutral and negative, to samples in the sample set;

dividing the sample set, performing a grid search over the parameters of each emotion classification model by cross-validation, verifying each emotion classification model with a validation set, and taking the model with the best-performing parameters as the optimal model;

and the subject emotion classification module takes the result obtained by applying the majority voting rule to the prediction results of all the optimal emotion classification models as the final emotion tendency of the subject.

According to one aspect of the invention, the named-body recognition module extracts named bodies in the obtained key sentences and calculates the compactness between the named bodies and the public opinion information based on syntactic analysis of the obtained public opinion information.

According to one aspect of the invention, the public opinion risk scoring module comprises:

A keyword dictionary for extracting keywords and calculating word scores of the keywords in the public opinion information;

a negative event library for acquiring negative events related to the named entity over the years;

The public opinion risk scoring module scores key sentences in public opinion information based on the named-body recognition module, the keyword dictionary and the negative event library to obtain sentence scores, and obtains risk grades of the public opinion information containing the named-bodies based on the sentence scores.

According to one aspect of the present invention, the scoring module for public opinion risk obtains a sentence score by scoring the key sentences based on the named-body recognition module, the keyword dictionary and the negative event library, including:

Acquiring a named entity of a key sentence in the public opinion information based on the named entity identification module, and the compactness of the named entity and the public opinion information;

acquiring keywords, word scores and word frequencies of key sentences in the public opinion information based on the keyword dictionary;

Acquiring negative events of key sentences in the public opinion information based on the negative event library;

and scoring key sentences in the public opinion information based on the naming body, the compactness, the key words, the word scores, the word frequencies and the negative events to obtain the sentence scores.

According to one aspect of the invention, the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

the sentence scoring formula is:

$$\text{sentence score} = K \times \max\Big(\max\big(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\big) \times 0.8,\ \max(\text{scenescore})\Big)$$

where K represents the closeness of the sentence to the named body, keyscore represents the word score, and scenescore represents the negative event score.
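The sentence scoring formula can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function and parameter names are hypothetical, the formula is read as K multiplied by the larger of the best keyword term (times 0.8) and the best negative event score, and a sentence with no keywords or no events is assumed to contribute 0 for that term.

```python
from typing import Iterable, Tuple


def sentence_score(
    closeness: float,
    keywords: Iterable[Tuple[float, int]],
    event_scores: Iterable[float],
) -> float:
    """Score one key sentence.

    closeness    -- K, closeness of the sentence to the named body
    keywords     -- (keyscore, word_frequency) pairs found in the sentence
    event_scores -- scenescore values of negative events found in the sentence
    """
    # Best keyword term: each extra occurrence adds 10% to the word score.
    best_keyword = max(
        (score * (1 + (freq - 1) / 10) for score, freq in keywords),
        default=0.0,  # assumption: no keyword means a keyword term of 0
    )
    # Best negative event score; assumption: no event means 0.
    best_event = max(event_scores, default=0.0)
    return closeness * max(best_keyword * 0.8, best_event)
```

For example, a sentence with closeness 1, one keyword with word score 0.5 occurring three times, and one negative event scored 0.4 gives max(0.5 × 1.2 × 0.8, 0.4) = 0.48.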

According to one aspect of the invention, the public opinion risk scoring module integrates the named-body content of the key sentences with the completed scores, and scores the named-body content with the completed scores to obtain the risk level of the public opinion information containing the named-body.

According to one aspect of the present invention, the process of integrating the named-content of the scored key sentence by the public opinion risk scoring module includes:

the public opinion risk scoring module examines each key sentence that has been scored: judging whether the key sentence is a question sentence, if so, directly ignoring the key sentence, otherwise, retaining the key sentence;

judging whether the key sentence is a sample sentence or not, if so, ignoring the key sentence, otherwise, keeping the key sentence;

And merging sentences related to the same naming body in the reserved key sentences according to the public opinion information sequence based on the judging result.
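The integration steps above can be sketched as a filter followed by a grouping. The `KeySentence` type and its flags are hypothetical; the sketch assumes an upstream step has already decided whether each sentence is a question sentence or a sample sentence.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeySentence:
    position: int      # order of the sentence within the public opinion information
    named_body: str
    text: str
    score: float
    is_question: bool  # hypothetical flag set by an upstream step
    is_sample: bool    # hypothetical flag for sample sentences


def integrate(sentences: List[KeySentence]) -> Dict[str, List[KeySentence]]:
    """Drop question and sample sentences, then merge the rest per named body."""
    merged: Dict[str, List[KeySentence]] = defaultdict(list)
    # Walk the sentences in document order so each merged list keeps that order.
    for s in sorted(sentences, key=lambda s: s.position):
        if s.is_question or s.is_sample:
            continue
        merged[s.named_body].append(s)
    return dict(merged)
```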

According to one aspect of the invention, in the process of scoring the content of the named entities which are integrated to obtain the risk level of the public opinion information containing the named entities, the risk score of the named entities is obtained through a named entity risk score formula, and the corresponding risk level is obtained, wherein the named entity risk score formula is as follows:

$$\text{named entity risk score} = \min\Big(1,\ \max(\text{scores of all sentences under the same named entity}) \times \big(1 + \min(1,\ (n - 1)/10)\big) + \text{average of the additional scores of the sentences under the same named entity}\Big)$$

where n is the number of sentences under the same named entity.

The method for calculating the additional score of the public opinion information comprises the following steps:

$$\max\big(\text{word score} \times (1 + (\text{word frequency} - 1)/10)\big) \times \min\big(2,\ 1 + (\text{high-score word frequency} - 1)/10\big) \times 0.8$$

Wherein the word score and the word frequency are obtained based on the keywords appearing in the rest of sentences extracted from the keyword dictionary, and the word frequency of the high-score word is the word frequency of the keywords obtained in Max (word score (1+ (word frequency-1)/10)).
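A minimal Python sketch of the named entity risk score follows. It assumes that the outer min(1, ...) encloses the whole sum, so the final score never exceeds 1, and that the additional scores have already been computed elsewhere; the function name and the treatment of an empty additional-score list as 0 are assumptions.

```python
from typing import Sequence


def named_body_risk_score(
    sentence_scores: Sequence[float],
    additional_scores: Sequence[float],
) -> float:
    """Combine the sentence scores of one named body into a risk score in [0, 1]."""
    if not sentence_scores:
        raise ValueError("at least one scored sentence is required")
    n = len(sentence_scores)
    # More sentences about the same named body raise the score, by at most 2x.
    boost = 1 + min(1, (n - 1) / 10)
    # Assumption: with no additional scores, the average term is 0.
    extra = sum(additional_scores) / len(additional_scores) if additional_scores else 0.0
    return min(1.0, max(sentence_scores) * boost + extra)
```

With sentence scores 0.4 and 0.2 and a single additional score of 0.1, the result is 0.4 × 1.1 + 0.1 = 0.54.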

According to one aspect of the invention, in the process of calculating the word score of the keyword in the public opinion information, a word score formula is adopted to obtain the word score, wherein the word score formula is as follows:

word score = 1/word rank + 0.5 × emotion of word + topic risk
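As a small worked sketch of the word score formula, assuming the 0.5 is a multiplier on the word emotion and that word rank is a positive integer with 1 as the highest rank (both are assumptions, as is the function name):

```python
def word_score(word_rank: int, word_emotion: float, topic_risk: float) -> float:
    """Word score = 1/word rank + 0.5 * emotion of word + topic risk."""
    if word_rank <= 0:
        raise ValueError("word_rank must be a positive integer")
    return 1 / word_rank + 0.5 * word_emotion + topic_risk
```

For a keyword with rank 2, emotion 0.4 and topic risk 0.1, the score is 0.5 + 0.2 + 0.1 = 0.8.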

According to one aspect of the present invention, the process of calculating the compactness of the named object and the public opinion information includes:

Judging whether the sentence in which the named body is located expresses a viewpoint; if so, entering the next step, otherwise, outputting a preset first compactness value;

judging whether the sentences in which the named entities are located are question sentences, conditional sentences or sample sentences, if not, entering the next step, otherwise, outputting a preset first compactness value;

judging whether the named body carries a suffix word or not, if not, entering the next step, otherwise, outputting a preset first compactness value;

Judging whether the sentence in which the named body is located contains only one named body. If so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation; if it does, outputting a preset second compactness value, otherwise outputting the preset first compactness value. If the sentence contains a plurality of named bodies, judging whether the sentence has a parallel structure; if so, splitting the structure of the sentence and determining whether it has a main body, outputting the preset second compactness value if it does and a preset third compactness value otherwise; and if the sentence does not have a parallel structure, outputting the preset second compactness value.

According to one aspect of the invention, the similarity retrieval module is used for similarity calculation of public opinion information and real-time public opinion information screening;

the similarity retrieval module calculates the similarity of public opinion information, which comprises the following steps:

Calculating the similarity relation between any two pieces of public opinion information, wherein a similarity relation is defined between the two pieces if their title similarity or text similarity is greater than a preset threshold, and otherwise no similarity relation exists;

constructing the public opinion information with similarity relations into a public opinion similarity set;

Sorting the public opinion information in the public opinion similarity set by release time, retaining the earliest piece of public opinion information as a comparison sample, and deleting the rest of the public opinion information in the similarity set;

the process of the similarity retrieval module for screening the real-time public opinion information comprises the following steps:

and obtaining online public opinion information and carrying out similarity calculation based on the comparison sample to construct a real-time public opinion set.
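The similarity calculation and de-duplication can be sketched as below. Several choices here are assumptions for illustration only: `difflib.SequenceMatcher` stands in for whatever title and text similarity measure the engine actually uses, the threshold of 0.8 is arbitrary, and similarity sets are formed transitively with a union-find structure.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher
from typing import Dict, List


@dataclass
class Article:
    title: str
    body: str
    published: datetime


def similar(a: Article, b: Article, threshold: float = 0.8) -> bool:
    """Similarity relation: title OR text similarity above the threshold."""
    title_sim = SequenceMatcher(None, a.title, b.title).ratio()
    body_sim = SequenceMatcher(None, a.body, b.body).ratio()
    return title_sim > threshold or body_sim > threshold


def comparison_samples(articles: List[Article], threshold: float = 0.8) -> List[Article]:
    """Group articles into similarity sets and keep the earliest of each set."""
    parent = list(range(len(articles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Pairwise comparison; similar articles end up in the same set.
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if similar(articles[i], articles[j], threshold):
                parent[find(i)] = find(j)

    earliest: Dict[int, Article] = {}
    for idx, art in enumerate(articles):
        root = find(idx)
        if root not in earliest or art.published < earliest[root].published:
            earliest[root] = art
    return sorted(earliest.values(), key=lambda a: a.published)
```

The pairwise loop is quadratic in the number of articles, so a production system screening massive news streams would presumably need an indexed or approximate similarity search instead.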

According to one aspect of the invention, based on the main body emotion classification module, the topic classification module, the named-body recognition module and the public opinion risk scoring module, public opinion information in a comparison sample set is grouped according to enterprise main bodies;

And acquiring risk scores of corresponding enterprise subjects of the current node according to the named-body risk scores in the corresponding public opinion information, mapping the current risk grades of the enterprise subjects based on the enterprise subject risk scores, and outputting the risk grades for dynamically monitoring the risk grade changes of the enterprise subjects corresponding to the named-body.

In order to achieve the above-mentioned object, the present invention provides a method for monitoring risk level of enterprise main body using the above-mentioned public opinion engine, comprising:

S1, acquiring online public opinion information, calculating the label result of each dimension of the public opinion information, screening the public opinion information meeting the requirements according to preset dimension label values, and constructing an information set, wherein the dimension label results comprise emotion tendency, topic distribution, named body and risk score;

S2, carrying out similarity analysis on the information set, calculating the similarity between the pieces of public opinion information in the information set, removing similar public opinion information and constructing a comparison sample set;

S3, classifying the public opinion information in the comparison sample set according to enterprise subjects, calculating the risk score of each enterprise subject at the current node according to the named-body risk scores in the corresponding public opinion information, mapping out the risk grade of each enterprise subject based on its risk score, and dynamically monitoring the risk grade changes of the enterprise subjects corresponding to the named bodies.

According to one aspect of the present invention, in step S1, the step of obtaining online public opinion information and calculating each dimension label result of the public opinion information, the step of calculating the emotion tendencies includes:

respectively identifying the public opinion information through the emotion classification model, and obtaining a prediction result;

And taking the result obtained by the prediction results of all emotion classification models through the majority voting rule as the final emotion tendency.

According to one aspect of the present invention, in step S1, in the step of acquiring online public opinion information and calculating each dimension label result of the public opinion information, the step of calculating the risk score includes:

acquiring a named entity of a key sentence in the public opinion information based on a named entity identification module, and determining the compactness of the named entity and the public opinion information;

Obtaining the keywords, word scores and word frequencies in the key sentences based on a keyword dictionary in the public opinion risk scoring module;

Acquiring negative events in the key sentences based on a negative event library in the public opinion risk scoring module;

And scoring the key sentences based on the naming body, the compactness, the key words, the word scores, the word frequencies and the negative events to obtain sentence scores of the key sentences.

According to one aspect of the present invention, in step S1, the public opinion risk scoring module integrates named-body content of the key sentences with the scores, and scores the integrated named-body content to obtain a risk level of the public opinion information including the named-body.

According to one aspect of the present invention, the step of integrating the named-content of the scored key sentence by the public opinion risk scoring module includes:

According to one aspect of the present invention, in step S2, similarity analysis is performed on the information set, similarity between the public opinion information in the information set is calculated, and similar public opinion information is removed and a comparison sample set is constructed, where:

Calculating, based on the similarity retrieval module, the similarity relation between any two pieces of public opinion information, wherein a similarity relation is defined between the two pieces if their title similarity or text similarity is greater than a preset threshold, and otherwise no similarity relation exists;

and constructing a comparison sample set from the obtained comparison samples and the public opinion information that has no similarity relation with any other piece.

According to one aspect of the present invention, in step S3, based on the subject emotion classification module, the topic classification module, the named-body recognition module, and the public opinion risk scoring module, public opinion information in the comparison sample set is grouped according to an enterprise subject;

For each piece of public opinion information grouped under the enterprise subject, multiplying the named-body risk score of the public opinion information by an attenuation coefficient obtained from the time interval between the current public opinion information and the earliest published public opinion information, sorting the resulting values in ascending order, and taking the value at a certain quantile as alternative option I for the enterprise subject risk score;

Meanwhile, public opinion information in a certain latest preset time interval is acquired, and the maximum value obtained by multiplying the named body risk score by the corresponding attenuation coefficient is used as an alternative option II of the enterprise main body risk score;

obtaining the maximum value of the two alternative options as a risk score of the enterprise main body corresponding to the current node;

And mapping the current risk level of the enterprise entity based on the enterprise entity risk score and outputting the risk level to dynamically monitor the risk level change of the enterprise entity corresponding to each named body.
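The enterprise subject risk score can be sketched as follows. The text does not specify the attenuation function, the quantile, or the length of the latest time interval, so the exponential half-life decay, the default quantile of 0.9, the nearest-rank quantile rule and the `is_recent` flag are all hypothetical placeholders.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredItem:
    risk_score: float     # named-body risk score of one piece of information
    interval_days: float  # time interval the attenuation coefficient is derived from
    is_recent: bool       # True if published within the latest preset time interval


def exp_decay(interval_days: float, half_life_days: float = 30.0) -> float:
    """Hypothetical attenuation coefficient: halves every `half_life_days`."""
    return 0.5 ** (interval_days / half_life_days)


def enterprise_risk_score(
    items: List[ScoredItem],
    quantile: float = 0.9,
    decay: Callable[[float], float] = exp_decay,
) -> float:
    """Return the larger of alternative options I and II for one enterprise subject."""
    if not items:
        return 0.0  # assumption: no information means no measurable risk
    decayed = sorted(i.risk_score * decay(i.interval_days) for i in items)
    # Option I: value at the chosen quantile of the ascending sequence (nearest rank).
    rank = max(1, math.ceil(quantile * len(decayed)))
    option_one = decayed[rank - 1]
    # Option II: largest attenuated score among the recent items.
    option_two = max(
        (i.risk_score * decay(i.interval_days) for i in items if i.is_recent),
        default=0.0,
    )
    return max(option_one, option_two)
```

Taking the maximum of the two options means a single severe recent item can raise the score even when the quantile of the full history is low.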

According to the scheme of the invention, the public opinion engine can timely grasp, reasonably classify and analyze massive negative news, extract the negative news which is considered by investors to be related to main body default risks, and greatly improve the efficiency of information news review by users.

According to the scheme, the problem of rapidly screening appointed related information from the explosive type massive news information data in real time is solved, and the public opinion information meeting the requirements is rapidly and efficiently screened through the control of different dimension labels of the public opinion information.

According to the scheme of the invention, the public opinion engine can carry out fine processing on the acquired document and completely analyze the acquired document to obtain the named body and the viewpoint expressed by the whole document, and can obtain the accurate score of the whole document.

According to the scheme of the invention, the public opinion engine is more comprehensive in dividing the acquired documents, and meanwhile, the creation modes of combining a named body library, a white list, a keyword dictionary, a negative event library and the like with financial business scenes are realized, so that the semantic analysis result meets the requirements of clients.

According to the scheme of the invention, the public opinion analysis engine has the advantages of high efficiency and high accuracy, the processing process is carried out simultaneously, each piece of information can be completed in a short time, and the analysis efficiency is greatly improved.

Drawings

FIG. 1 is a block diagram schematically showing the construction of a public opinion engine of the present invention;

FIG. 2 is a flow chart schematically illustrating the processing of the subject emotion classification module in the public opinion engine according to the present invention;

FIG. 3 is a flow chart schematically illustrating the calculation of the compactness of a named entity and public opinion information in a public opinion engine according to the present invention;

FIG. 4 schematically shows a process flow diagram of a public opinion risk scoring module according to the present invention;

Fig. 5 is a block diagram schematically illustrating steps of a method for monitoring risk levels of an enterprise subject by a public opinion engine according to the present invention.

Detailed Description

The present invention will be described in detail below with reference to the drawings and specific embodiments. The embodiments cannot all be described exhaustively here, and the embodiments of the present invention are not limited to those described below.

The invention addresses the problem of rapidly screening specified relevant information in real time from explosively growing masses of news data, and of monitoring the risk status of the related enterprise subjects based on that information. The invention relates to a public opinion analysis engine that rapidly and efficiently screens out the public opinion information meeting the requirements by controlling labels of different dimensions. The dimension labels in the engine fall into two major categories. The first combines machine learning methods to construct emotion tendency, topic distribution and text similarity models, yielding the emotion tendency and topic labels of each news item and, after clustering all the information, the sets of similar information. The second combines financial domain knowledge with natural language processing to construct a named-body recognition model and a method for quantifying information risk scores, extracting the named bodies of each news item and the enterprise subject risk scores. First, information that does not meet the conditions is eliminated by controlling the label of each dimension; then, from each set of similar information, the earliest item is retained and the rest are eliminated; finally, all information meeting the conditions is retained. The engine thus provides an efficient information screening function and also dynamically observes changes in the risk level of each enterprise subject, providing a basis for risk-control management by financial institutions.

As shown in fig. 1, according to an embodiment of the present invention, a public opinion engine for screening public opinion information and monitoring risk level of enterprise subjects includes: the system comprises a main body emotion classification module, a theme classification module, a named body identification module, a public opinion risk scoring module, a similarity retrieval module and an enterprise main body risk level monitoring module.

In this embodiment, the information acquisition port is used to crawl the web page information on the internet, and the original content in the web page information is input to the structured extraction module for structured processing (for example, content filtering, automatic duplication elimination, etc.) to obtain public opinion information and store data.

In this embodiment, the public opinion engine receives and processes the acquired public opinion information. The main body emotion classification module performs emotion classification on the acquired public opinion information to acquire emotion tendencies of the public opinion information, and comprises a plurality of classified emotion classification models; the topic classification module performs single topic classification or multi-topic classification on the acquired public opinion information; the named-body recognition module is used for recognizing the named-body and calculating the compactness of the named-body and the public opinion information; the public opinion risk scoring module acquires the risk grade of the public opinion information containing the naming body; the similarity retrieval module is used for carrying out similarity calculation on the obtained different public opinion information and screening the online public opinion information; the enterprise subject risk level monitoring module is used for acquiring the current risk level of the enterprise subject and dynamically monitoring the risk level change of the enterprise subject corresponding to each named body.

Referring to fig. 2, according to an embodiment of the present invention, the subject emotion classification module is based on an ensemble machine learning method: it uses 9 machine learning algorithms with different characteristics as base learners to identify the emotion tendency of news information, and takes the majority vote of the 9 learners as the final emotion tendency result.

In this embodiment, the subject emotion classification module is obtained by:

First, experts and researchers analyzed a sample of nearly 30,000 news items and finally selected 10,000 news reports related to enterprise credit as the training sample set, labeling each with one of three categories: positive, neutral or negative;

Secondly, the sample set is divided, a grid search over the parameters of each emotion classification model is carried out by multi-fold cross-validation, each emotion classification model is verified with a validation set, and the model with the best-performing parameters is taken as the optimal model. In this embodiment, the base learners are selected from five classes of algorithms with different characteristics: linear classification algorithms, algorithms based on probability distributions, lazy learning algorithms, algorithms with decision trees at their core, and neural networks. For example, the emotion classification models are each trained independently by one of several machine learning methods such as LR, NB, decision tree, KNN and SVM, and the optimal version of each model is obtained by tuning its parameters with grid search;

and finally, taking the result obtained by the majority voting rule of all classifier prediction results as the final emotion tendency.
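The final voting step can be sketched in a few lines of Python. The label strings and the tie-breaking rule are assumptions: the text does not say how ties are resolved, so this sketch breaks them toward the more cautious label.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most base learners."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    winners = [label for label, c in counts.items() if c == top]
    # Assumption: ties are broken toward the more cautious label.
    for label in ("negative", "neutral", "positive"):
        if label in winners:
            return label
    return winners[0]  # labels outside the three expected categories
```

With nine base learners and three categories, ties such as 3-3-3 or 4-4-1 are possible, which is why an explicit tie rule is needed.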

Through practical application verification, the method can remarkably improve the recall rate and prediction accuracy of negative information, improve the prediction performance of the whole learner, and achieve the accuracy of more than 86%.

According to one embodiment of the invention, the topic classification module is obtained as follows:

In the embodiment, an LDA method is adopted to obtain a topic classification module;

First, a named-body dictionary and an institution dictionary are added to the original function words, prepositions, pronouns and the like to construct a stop-word dictionary;

Secondly, an LDA model is trained on all of the roughly 1.7 million news items from the 2019 finance channel of Eastmoney, and the model is optimized by adjusting the word-frequency thresholds for common words and specific words in the news;

Finally, the top 70 topics are selected and named according to the probability distribution of the words within each topic, and the topics are then mapped into 7 major classes: debt repayment capacity, debt repayment willingness, laws and regulations, credit compliance, market conditions, senior management developments, and other credit-related topics. The LDA topic merging and mapping relation is as follows:

According to the invention, the classification accuracy of the obtained topic classification module on single topics reaches more than 80%, and the classification accuracy of the topic classification module on multiple topics reaches more than 90%.

According to one embodiment of the invention, the named-body recognition module, based on syntactic analysis, extracts the list of named bodies involved in a text and calculates the degree of relation between each named body and the text. It is used for recognizing and extracting the named bodies of public opinion information and for calculating the compactness between each named body and the public opinion information (the compactness affects the weight in the subject scoring process). In this embodiment, the named-body recognition module is obtained as follows: first, the full names, short names and former names of the named bodies associated with enterprise subjects are obtained from industrial and commercial registration data, and the stock codes, stock names, bond codes and bond names issued by those named bodies are obtained from market data; secondly, with reference to rules for recognizing the same named body, and in combination with word-vector similarity results, fuzzy-matched expressions of the named bodies are collected from the information; finally, the list of named-body expressions is audited, and abnormal and ambiguous expressions are eliminated.

As shown in fig. 3, in the process of calculating the compactness of a named entity and public opinion information according to an embodiment of the present invention, the method includes:

Judging whether the sentence in which the named body is located expresses a viewpoint; if so, entering the next step, otherwise, outputting a preset first compactness value (for example, 0);

judging whether the sentence in which the named body is located is a question sentence, a conditional sentence or a sample sentence; if not, entering the next step, otherwise, outputting the preset first compactness value;

judging whether the named body carries a suffix word; if not, entering the next step, otherwise, outputting the preset first compactness value;

judging whether the sentence in which the named body is located contains only one named body. If so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation; if it does, outputting a preset second compactness value (for example, 1), otherwise outputting the preset first compactness value. If the sentence contains a plurality of named bodies, judging whether the sentence has a parallel structure; if so, splitting the structure of the sentence and determining whether it has a main body (having a main body means that the sentence satisfies the subject-predicate relation in syntactic analysis and the enterprise name is the subject), outputting the preset second compactness value if it does and a preset third compactness value (for example, 0.3) otherwise; if the sentence does not have a parallel structure, outputting the preset second compactness value.
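The decision procedure of fig. 3 can be sketched as a chain of early returns. The `SentenceFeatures` fields are hypothetical: they stand for the outcomes of the syntactic checks, which this sketch does not implement. The three constants use the example values given in this embodiment.

```python
from dataclasses import dataclass

# Example compactness values from the embodiment: first, second, third.
FIRST, SECOND, THIRD = 0.0, 1.0, 0.3


@dataclass
class SentenceFeatures:
    has_viewpoint: bool = True
    is_question_conditional_or_sample: bool = False
    has_suffix_word: bool = False          # the named body carries a suffix word
    named_body_count: int = 1
    subject_predicate: bool = True         # single named body: subject-predicate holds
    is_parallel: bool = False              # several named bodies: parallel structure
    has_subject_after_split: bool = False  # after splitting a parallel structure


def closeness(f: SentenceFeatures) -> float:
    """Compactness of a named body and the sentence it appears in."""
    if not f.has_viewpoint:
        return FIRST
    if f.is_question_conditional_or_sample:
        return FIRST
    if f.has_suffix_word:
        return FIRST
    if f.named_body_count == 1:
        return SECOND if f.subject_predicate else FIRST
    if f.is_parallel:
        return SECOND if f.has_subject_after_split else THIRD
    return SECOND
```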

According to one embodiment of the present invention, a public opinion risk scoring module is used to obtain a risk level of the public opinion information including the named bodies. In this embodiment, the public opinion risk scoring module includes: keyword dictionary, negative event library.

In this embodiment, the keyword dictionary is used for extracting the keywords of public opinion information and calculating the word scores of those keywords in the public opinion information. The keyword dictionary is created in a manner similar to the named-body recognition module: it is built by unsupervised training on news information from financial websites, expanded in combination with the points of concern for named bodies in the credit risk field, and finally audited by experts and fixed after cross-validation. Once the keyword dictionary is generated, the word rank, word emotion and topic risk of each keyword are labeled according to the business scenario. The word rank is predefined by experts; the word emotion is first trained on the corpus and then adjusted by experts; the topic to which a word belongs is obtained by corpus training, and the topic risk of that topic is predefined by experts.

In this embodiment, the negative event library is used to obtain the negative events related to a named body over the years. The negative event library is generated as follows: all public opinion information related to named bodies that have defaulted over the years is acquired, and the event types in that information are sorted out; the negative event library is then determined through statistical analysis and through correlation analysis between events and defaults, and attribute values of each event, such as emotion tendency, grade, risk and type, are labeled in combination with the credit risk scenario. In this embodiment, negative events in public opinion information are extracted by a supervised learning method, with the following specific steps: 1) the number of characters shared by the event and the sentence >= event length * 0.9; 2) the sentence is scrolled through with a window of event length * 1.2. Owing to the stability of the scenario, the events extracted in the two steps are combined in the final output, which ensures the accuracy of event extraction.
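The two extraction rules can be illustrated with a short sketch. It reads rule 1 as "shared characters >= 0.9 times the event length" and rule 2 as "slide a window of 1.2 times the event length over the sentence", and it assumes the two are combined by requiring the overlap threshold to hold inside at least one window. The text does not spell out how the two outputs are merged, so that combination, like the function names, is an assumption.

```python
import math
from collections import Counter


def char_overlap(event: str, text: str) -> int:
    """Number of characters shared by event and text (multiset intersection)."""
    return sum((Counter(event) & Counter(text)).values())


def matches_event(event: str, sentence: str) -> bool:
    """Assumed combination of the two rules described in the text."""
    need = 0.9 * len(event)
    # Rule 1: cheap check over the whole sentence.
    if char_overlap(event, sentence) < need:
        return False
    # Rule 2: the shared characters must also fall inside one rolling window.
    window = max(1, math.ceil(1.2 * len(event)))
    last_start = max(0, len(sentence) - window)
    return any(
        char_overlap(event, sentence[i:i + window]) >= need
        for i in range(last_start + 1)
    )
```

The window keeps a sentence from matching merely because the event's characters are scattered across it.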

Referring to fig. 4, according to an embodiment of the present invention, the public opinion engine acquires public opinion information and then divides it into sentences. For example, the public opinion information is divided into independent sentences at punctuation marks that typically mark the end of a sentence, such as periods, semicolons, question marks and exclamation marks. This division is effective for simple sentences, where there is no ambiguity. In the present embodiment, compound sentences such as continuous sentences, comparative sentences, adversative sentences and ordered sentences are split a second time, using semicolons, conjunctions and the like as separators.

After sentence splitting is completed, space processing is carried out on each sentence: if a sentence is longer than 300 words and contains more than 11 spaces, it is broken at the spaces and separator marks are added.
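A minimal sketch of the sentence division and space processing described above follows. It assumes that length is counted in characters, that the added separator mark is a Chinese full stop, and that only the listed end-of-sentence punctuation is used; the secondary split of compound sentences is not shown. All function names are hypothetical.

```python
import re
from typing import List

# Split after punctuation that typically ends a sentence (kept with the sentence).
_SENTENCE_END = re.compile(r"(?<=[。；？！;?!])")


def split_sentences(text: str) -> List[str]:
    """Divide text into sentences at end-of-sentence punctuation."""
    return [p.strip() for p in _SENTENCE_END.split(text) if p.strip()]


def break_long(sentence: str, max_len: int = 300, max_spaces: int = 11,
               sep: str = "。") -> List[str]:
    """Break an over-long sentence at spaces and append a separator mark."""
    if len(sentence) > max_len and sentence.count(" ") > max_spaces:
        return [chunk + sep for chunk in sentence.split()]
    return [sentence]


def split_public_opinion(text: str) -> List[str]:
    """Sentence division followed by space processing."""
    result: List[str] = []
    for sentence in split_sentences(text):
        result.extend(break_long(sentence))
    return result
```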

In this embodiment, after the public opinion information is split, keywords and keyword groups are extracted from the public opinion information according to the keyword dictionary and the negative event library, and an extractive automatic summarization method is used to extract the key sentences of the text.

Furthermore, the public opinion risk scoring module scores the key sentences based on the named-body recognition module, the key word dictionary and the negative event library to obtain sentence scores, and obtains the risk level of the public opinion information containing the named-body based on the sentence scores.

Referring to fig. 4, in the process of scoring key sentences based on a named-body recognition module, a keyword dictionary and a negative event library to obtain sentence scores, a public opinion risk scoring module according to an embodiment of the present invention includes:

acquiring a named entity in the key sentence based on the named entity identification module, and obtaining the compactness of the named entity and public opinion information;

acquiring keywords, word scores and word frequencies in the key sentences based on the keyword dictionary;

Acquiring negative events in the key sentences based on the negative event library;

and scoring the key sentences based on the naming body, the compactness, the key words, the word scores, the word frequencies and the negative events to obtain sentence scores.

According to one embodiment of the invention, a public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

The sentence scoring formula is:

$$\text{sentence score} = K \times \max\Bigl(\max\bigl(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\bigr) \times 0.8,\ \max(\text{scenescore})\Bigr)$$

Where K represents the closeness of the sentence to the named body, keyscore represents the word score, and scenescore represents the negative event score.
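As an illustration, the sentence score can be computed as below. The sketch reads the formula as K × max(0.8 × max(keyscore × (1 + (word frequency − 1)/10)), max(scenescore)), with multiplication signs restored where the extracted text appears to have dropped them; treating an empty keyword or event list as 0 is an additional assumption, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def sentence_score(k: float,
                   keywords: Iterable[Tuple[float, int]],
                   scene_scores: Iterable[float]) -> float:
    """k: closeness of the sentence to the named body.
    keywords: (keyscore, word frequency) pairs found in the sentence.
    scene_scores: negative event scores found in the sentence.
    """
    # Best keyword contribution, boosted by 10% per repeated occurrence.
    best_keyword = max(
        (score * (1 + (freq - 1) / 10) for score, freq in keywords),
        default=0.0,  # assumption: no keyword contributes 0
    )
    best_event = max(scene_scores, default=0.0)  # assumption: no event contributes 0
    return k * max(0.8 * best_keyword, best_event)
```

With K = 1, a keyword of score 0.5 appearing three times and an event score of 0.3, the keyword branch gives 0.8 × 0.5 × 1.2 = 0.48, which exceeds 0.3 and becomes the sentence score.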

According to one embodiment of the invention, the public opinion risk scoring module integrates the content of the named bodies of the key sentences which are scored, and scores the content of the named bodies which are scored to obtain the risk level of the public opinion information containing the named bodies.

According to one embodiment of the present invention, the process of integrating the named-body content of the scored key sentence by the public opinion risk scoring module includes:

the public opinion risk scoring module judges the key sentences which finish scoring; judging whether the key sentence is a question sentence, if so, directly ignoring the key sentence, otherwise, reserving the key sentence;

judging whether the key sentence is a sample sentence, if so, ignoring the key sentence, otherwise, keeping the key sentence;

Referring to fig. 4, in the process of scoring the content of the named entities after integration to obtain the risk level of the public opinion information including the named entities according to an embodiment of the present invention, a named entity risk score formula is used to obtain a named entity risk score and a corresponding risk level, where the named entity risk score formula is as follows:

The word score and the word frequency are obtained from the keywords, extracted with the keyword dictionary, that appear in the remaining sentences, and the word frequency of the high-score word is the word frequency of the keyword that attains Max(word score * (1 + (word frequency - 1)/10)). It should be noted that the remaining sentences involved in calculating the additional score are the sentences in the public opinion information that do not contain an enterprise subject (i.e., a named body).

According to one embodiment of the invention, in the process of calculating the word score of the keyword in the public opinion information, a word score formula is adopted to obtain the word score, wherein the word score formula is as follows:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{emotion of word} + \text{topic risk}$$

In this embodiment, the word score ranges between [0, 1].

In this embodiment, the risk level is output by constructing a mapping between the obtained named-body risk score and the risk level. Specifically, after statistical analysis of the risk scores of 3 million historical test records, 1% of the data is removed, and the maximum and minimum values are taken as the basis for normalizing the risk score. The normalized risk score of the named body in the public opinion information is then mapped to a risk level according to the following mapping table:

Scaler_score	Risk_level
[0,0.3)	No risk
[0.3,0.5)	Low risk
[0.5,0.8)	Medium risk
[0.8,1]	High risk
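The normalization and mapping can be sketched as follows. The minimum and maximum are passed in as parameters because the text gives no concrete values; clipping to [0, 1] is an assumption for scores outside the historical range, and the third level is written here as "Medium risk". The names are hypothetical.

```python
from bisect import bisect_right

# Left-closed interval boundaries from the mapping table.
BOUNDS = [0.3, 0.5, 0.8]
LEVELS = ["No risk", "Low risk", "Medium risk", "High risk"]


def normalize(score: float, lo: float, hi: float) -> float:
    """Min-max normalization against the historical minimum and maximum."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    # Assumption: scores outside the historical range are clipped to [0, 1].
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def risk_level(scaled_score: float) -> str:
    """Map a normalized named-body risk score to its risk level."""
    return LEVELS[bisect_right(BOUNDS, scaled_score)]
```

Because the intervals are closed on the left, a score of exactly 0.3 maps to "Low risk" and a score of exactly 0.8 maps to "High risk".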

According to one embodiment of the invention, the similarity retrieval module is used for similarity calculation of public opinion information and real-time public opinion information screening. In this embodiment, the similarity calculation process of the similarity search module for public opinion information includes:

constructing public opinion information with similarity relations into a public opinion similarity set;

Ranking the release time of public opinion information in the public opinion similarity set, reserving the earliest piece of public opinion information as a comparison sample, and deleting the rest public opinion information in the similarity set;

And obtaining online public opinion information and carrying out similarity calculation based on the comparison sample to construct a real-time public opinion collection.

According to one embodiment of the invention, the enterprise subject risk level monitoring module is used for acquiring the current risk levels of different enterprise subjects and dynamically monitoring the risk level changes of the enterprise subjects corresponding to each named body.

Specifically, based on a main body emotion classification module, a theme classification module, a named body recognition module and a public opinion risk scoring module, public opinion information in a comparison sample set is grouped according to an enterprise main body;

and acquiring risk scores of corresponding enterprise subjects of the current node according to the named-body risk scores in the corresponding public opinion information, mapping the risk grade of the current enterprise subject based on the enterprise subject risk scores, and outputting the risk grade, so as to dynamically monitor the risk grade change of the enterprise subject corresponding to each named-body.

Referring to fig. 5, according to an embodiment of the present invention, a method for performing risk level monitoring of an enterprise subject based on the public opinion engine of the present invention includes:

S1, acquiring online public opinion information, calculating the label result of each dimension of the public opinion information, screening the public opinion information that meets the requirements according to preset dimension label values, and constructing an information set, wherein the dimension label results comprise emotion tendency, topic distribution, named body and risk score, and the preset dimension label values comprise a preset emotion tendency, a preset topic distribution, a preset named body and a preset risk score;

S2, carrying out similarity analysis on the information set, calculating similarity between public opinion information in the information set, removing similar public opinion information and constructing a comparison sample set;

Referring to fig. 5, according to an embodiment of the present invention, in the step of obtaining online public opinion information in step S1, on the basis of the sentence division described above, word segmentation is performed with the HMM model in the Jieba Chinese word segmentation library, and a custom dictionary is used to force the terms in the dictionary to be kept intact. Keywords and keyword groups in the public opinion information are then obtained through a summary generation module, and the key sentences of the text are extracted by an extractive automatic summarization method.

In step S1, in the step of obtaining online public opinion information and calculating the label result of each dimension of the public opinion information, the step of calculating emotion tendency includes:

Respectively identifying the public opinion information through an emotion classification model, and obtaining a prediction result;

the predicted results of all emotion classification models are taken as the final emotion tendencies (e.g. negative) by the majority voting rule.
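The majority voting rule over the individual emotion classification models can be sketched in a few lines. The label strings and the tie-breaking behaviour (the label seen first wins) are assumptions; the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the largest number of models."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common keeps first-seen order among equal counts (an assumed tie-break).
    return Counter(predictions).most_common(1)[0][0]
```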

In step S1, according to an embodiment of the present invention, in the step of obtaining online public opinion information and calculating each dimension label result of the public opinion information, the step of calculating the risk score includes:

Acquiring the named bodies in the key sentence based on the named-body recognition module, and obtaining the compactness between each named body and the public opinion information. In this embodiment, it is first necessary to determine whether a sentence of the public opinion information contains a named body: the sentences segmented in the foregoing step are submitted to the named-body recognition module, and the list of named bodies appearing in each sentence is extracted. For each named body that appears, it is judged whether the named body carries a suffix word, and if so the named body is removed. If only one named body remains in the sentence and it is in the white list, subject-predicate judgment is carried out: if the named body is the subject it is retained, otherwise it is removed. If multiple subjects remain in a sentence with a complex structure, any financial institutions among them are removed and the rest are analyzed according to syntax; if the sentence is passive, the first named body after the passive marker word is scored at 100% and the remaining subjects are scored at 30%. For example, the stock of the newly controlled strand holding the racewheel is frozen and the newly controlled strand mortgages the racewheel stock to the China banking.

Acquiring keywords, word scores and word frequencies in the key sentences based on the keyword dictionary; in this embodiment, the hotspot/sensitive word analysis is implemented by a keyword dictionary. After the public opinion information is subjected to word segmentation, the public opinion information is subjected to intersection with a keyword dictionary, and information such as keywords, word frequency, word distance and the like is extracted.

According to one embodiment of the invention, the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula, wherein the sentence scoring formula is as follows:

$$\text{sentence score} = K \times \max\Bigl(\max\bigl(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\bigr) \times 0.8,\ \max(\text{scenescore})\Bigr)$$

According to one embodiment of the present invention, in step S1, the public opinion risk scoring module integrates named-body content of the scored key sentences, and performs risk scoring on the integrated named-body content to obtain a named-body risk score.

According to one embodiment of the present invention, the step of integrating named-body content of the scored key sentence by the public opinion risk scoring module includes:

According to one embodiment of the invention, in the process of scoring the content of the named entities which are integrated to obtain the risk level of the public opinion information containing the named entities, the named entity risk score formula is used for obtaining the named entity risk score of the named entities and obtaining the corresponding risk level, wherein the named entity risk score formula is as follows:

The word score and the word frequency are obtained from the keywords, extracted with the keyword dictionary, that appear in the remaining sentences, and the word frequency of the high-score word is the word frequency of the keyword that attains Max(word score * (1 + (word frequency - 1)/10)).

According to one embodiment of the present invention, in step S1, in the process of calculating the word score of the keyword in the public opinion information, a word score formula is adopted to obtain the word score, where the word score formula is:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{emotion of word} + \text{topic risk}.$$

Judging whether a viewpoint exists in a sentence in which the named body exists, if so, entering the next step, otherwise, outputting a preset first compactness value;

Judging whether the sentence contains only one named body; if so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation, and if so, outputting a preset second compactness value, otherwise outputting a preset first compactness value; if the sentence contains multiple named bodies, judging whether the sentence has a parallel structure; if so, splitting the sentence structure and determining whether the sentence has a main body, and if so, outputting the preset second compactness value, otherwise outputting a preset third compactness value; if the sentence does not have a parallel structure, outputting the preset second compactness value.

Referring to fig. 5, in step S2, according to an embodiment of the present invention, similarity analysis is performed on an information set, similarity between public opinion information in the information set is calculated, and similar public opinion information is removed and a comparison sample set is constructed:

Calculating a similarity relation between any two pieces of public opinion information based on a similarity retrieval module, wherein if the title similarity or the text similarity is larger than a preset threshold value, the similarity relation between the public opinion information is defined, otherwise, the similarity relation does not exist;

and ordering the release time of the public opinion information in the public opinion similarity set, reserving the earliest piece of public opinion information as a comparison sample, and deleting the rest public opinion information in the similarity set.

According to one embodiment of the invention, the similarity retrieval module consists of two parts. The first part is used for similarity retrieval of news titles, with similarity calculated separately by a character matching rule and by word embedding. The character matching rule is as follows: first, the title is cleaned and then segmented into words; second, the ratio of the word intersection to the word union of the two titles is calculated and recorded as sim_title; finally, the similarity simvalue of the two titles is determined as follows: when sim_title >= 0.8, simvalue = sim_title; otherwise, if the number of words in the intersection is greater than 0.9 times the word count of either title, simvalue = 0.9, and otherwise simvalue = sim_title. Word embedding is a technique for converting words expressed in natural language into a vector or matrix form that a computer can process: first, word embeddings are trained with the word2vec machine learning method on 3 million historical news items; then, each title is passed through the word embedding to obtain low-dimensional word vectors; finally, the cosine distance is taken as the similarity value. The final title similarity is the maximum of the two methods. The second part is used for similarity retrieval of news body text and is based on simhash, a purely word-frequency statistical method; through model validation, the rolling window length was finally set to 5 words.
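The title similarity rule can be sketched as below. Titles are assumed to be already cleaned and segmented into word lists, "either title" is read as the shorter of the two, and the title vectors are assumed to be precomputed from the word embeddings (the text does not say how word vectors are pooled into a title vector). The cosine value is used directly as the similarity. All function names are hypothetical.

```python
import math
from typing import Sequence


def title_rule_similarity(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Character matching rule: word intersection over word union, with a 0.9 override."""
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    inter = len(a & b)
    sim_title = inter / len(a | b)
    if sim_title >= 0.8:
        return sim_title
    # Assumption: "either title" means the title with fewer distinct words.
    if inter > 0.9 * min(len(a), len(b)):
        return 0.9
    return sim_title


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two title vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def title_similarity(words_a, words_b, vec_a, vec_b) -> float:
    """Final title similarity: the maximum of the two methods."""
    return max(title_rule_similarity(words_a, words_b), cosine(vec_a, vec_b))
```

The 0.9 override handles the case where one title is almost entirely contained in a longer one, which the intersection-over-union ratio alone would score low.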

In this embodiment, the workflow of the similarity search module is as follows:

firstly, calculating a similarity relation between any two pieces of news information, wherein if the title similarity or the text similarity is larger than 0.8, defining that the similarity relation exists between texts, otherwise, not exists;

Then, constructing news with similar relations into a similar set;

finally, sorting according to the news release time, reserving the earliest piece of news information (the earliest released public opinion information), and deleting the rest news information in the similar collection.
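The three workflow steps can be sketched as a deduplication routine. The sketch assumes that a similar set is the transitive closure of the pairwise similarity relation (built here with a small union-find), which the text does not state explicitly; the item layout and the `similar` callback are hypothetical.

```python
from datetime import datetime
from typing import Callable, List, Tuple

Item = Tuple[str, datetime, str]  # hypothetical layout: (id, release time, title)


def deduplicate(items: List[Item], similar: Callable[[Item, Item], bool]) -> List[Item]:
    """Group similar items and keep only the earliest item of each group."""
    parent = list(range(len(items)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Steps 1 and 2: pairwise similarity, merged into similar sets.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similar(items[i], items[j]):
                parent[find(i)] = find(j)

    # Step 3: keep the earliest released item of every similar set.
    earliest = {}
    for idx, item in enumerate(items):
        root = find(idx)
        if root not in earliest or item[1] < earliest[root][1]:
            earliest[root] = item
    return sorted(earliest.values(), key=lambda it: it[1])
```

In practice `similar` would return True when the title similarity or the text similarity exceeds 0.8, as described above.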

In this embodiment, the similarity retrieval module can also implement real-time public opinion information screening, that is, screening information that meets the requirements out of massive news information. According to the invention, the public opinion engine processes public opinion information based on the specified label of each dimension (emotion tendency, topic distribution, named-body recognition and risk score), and screens out all information that meets the conditions to form an information set. For example, to acquire negative, high-risk news about urban investment enterprises, the named-body label is set to all named bodies corresponding to urban investment enterprises, the compactness label to greater than 0, the emotion label to negative, the topic label to None, and the risk label to high risk.

Referring to fig. 5, in step S3, classifying public opinion information in a comparison sample set according to enterprise subjects, calculating risk scores of the enterprise subjects of a current node according to named-body risk scores in corresponding public opinion information, and mapping risk levels of each enterprise subject based on the risk scores of the enterprise subjects includes:

based on a main body emotion classification module, a theme classification module, a named body recognition module and a public opinion risk scoring module, grouping public opinion information in a comparison sample set according to an enterprise main body;

Then, for the public opinion information screened for each enterprise subject, the named-body risk score in each piece of public opinion information is multiplied by its attenuation coefficient (obtained from the time interval between the current public opinion information and the earliest published public opinion information), the products are sorted in ascending order, and the 95th percentile is taken as S_95; meanwhile, for the public opinion information of the last 3 days, the named-body risk score is multiplied by the corresponding attenuation coefficient and the maximum of the products is recorded as s3_max; finally, max(s3_max, S_95) is calculated and used as the risk score of the enterprise subject. The risk score of the enterprise subject is mapped to a risk level according to the mapping table below, and that level is output as the current risk level of the enterprise subject, so as to dynamically monitor the risk level change of the enterprise subject corresponding to each named body. The attenuation coefficient = 0.97^T, where T represents the time interval (e.g., in days) between the current time and the earliest publication time of the public opinion information.

Scaler_score	Risk_level
[0,0.8)	No risk
[0.8,0.9)	Low risk
[0.9,0.95)	Medium risk
[0.95,1]	High risk
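The enterprise-level aggregation can be sketched as follows. Each item is a (named-body risk score, T) pair with T in days; how T is measured for each item, the nearest-rank percentile method, and counting "last 3 days" as T <= 3 are all assumptions, since the text leaves them open. The third level is written here as "Medium risk", and the names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

BOUNDS = [0.8, 0.9, 0.95]  # left-closed boundaries from the enterprise mapping table
LEVELS = ["No risk", "Low risk", "Medium risk", "High risk"]


def enterprise_risk_score(items: List[Tuple[float, float]], decay: float = 0.97) -> float:
    """items: (named-body risk score, T in days) for one enterprise subject."""
    if not items:
        return 0.0
    decayed = sorted(score * decay ** t for score, t in items)
    # Assumption: nearest-rank 95th percentile of the ascending decayed scores.
    rank = max(1, math.ceil(0.95 * len(decayed)))
    s_95 = decayed[rank - 1]
    # Assumption: "last 3 days" means T <= 3.
    s3_max = max((score * decay ** t for score, t in items if t <= 3), default=0.0)
    return max(s3_max, s_95)


def enterprise_risk_level(score: float) -> str:
    """Map the enterprise risk score to its level using the table above."""
    return LEVELS[bisect_right(BOUNDS, score)]
```

Taking the maximum of s3_max and S_95 lets a single very recent high-risk item raise the enterprise level immediately, while older items fade at 3% per day.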

The foregoing is merely exemplary of embodiments of the invention and, as regards devices and arrangements not explicitly described in this disclosure, it should be understood that this can be done by general purpose devices and methods known in the art.

The above description is only one embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A public opinion system for screening public opinion information and monitoring enterprise subject risk level, characterized in that it includes:

The enterprise main body risk level monitoring module is used for acquiring the current risk levels of different enterprise main bodies and dynamically monitoring the risk level changes of the enterprise main bodies corresponding to each named body;

The named-body recognition module extracts named bodies in the obtained key sentences based on syntactic analysis of the obtained public opinion information, and calculates compactness between the named bodies and the public opinion information;

The process for calculating the compactness of the named entity and the public opinion information comprises the following steps:

Judging whether the sentence in which the named body is located contains only one named body; if so, judging whether the syntactic structure of the sentence satisfies the subject-predicate relation, and if so, outputting a preset second compactness value, otherwise outputting a preset first compactness value; if the sentence contains multiple named bodies, judging whether the sentence has a parallel structure; if so, splitting the sentence structure and determining whether the sentence has a main body, and if so, outputting the preset second compactness value, otherwise outputting a preset third compactness value; if the sentence does not have a parallel structure, outputting the preset second compactness value;

the public opinion risk scoring module includes:

the public opinion risk scoring module scores key sentences in public opinion information based on the named-body recognition module, the key word dictionary and the negative event library to obtain sentence scores, and obtains risk grades of the public opinion information containing the named-bodies based on the sentence scores;

The process by which the public opinion risk scoring module scores the key sentences based on the named-body recognition module, the keyword dictionary and the negative event library to obtain sentence scores comprises:

Scoring key sentences in the public opinion information based on the naming body, the compactness, the key words, the word scores, the word frequencies and the negative events to obtain sentence scores;

the public opinion risk scoring module scores key sentences in the public opinion information through a sentence scoring formula;

the sentence scoring formula is:

$$\text{sentence score} = K \times \max\Bigl(\max\bigl(\text{keyscore} \times (1 + (\text{word frequency} - 1)/10)\bigr) \times 0.8,\ \max(\text{scenescore})\Bigr)$$

In the process of scoring the integrated named-body content to obtain the risk grade of the public opinion information containing the named-body, obtaining the risk score of the named-body through a named-body risk score formula and obtaining a corresponding risk grade, wherein the named-body risk score formula is as follows:

2. The public opinion system of claim 1, wherein the subject emotion classification module is obtained by:

3. The public opinion system of claim 2, wherein the public opinion risk scoring module integrates named-body content of the key sentences that complete scoring and scores named-body content that completes integration to obtain a risk rating of the public opinion information that includes the named-body.

4. The public opinion system of claim 3, wherein the public opinion risk scoring module performs named-body content integration on the scored key sentences, comprising:

5. The public opinion system of claim 2, wherein in calculating a word score of the keyword in the public opinion information, a word score formula is used to obtain the word score, wherein the word score formula is:

$$\text{word score} = 1/\text{word rank} + 0.5 \times \text{emotion of word} + \text{topic risk}.$$

6. The public opinion system of claim 1, wherein the similarity retrieval module is configured to calculate similarity of public opinion information and perform real-time public opinion information screening;

7. The public opinion system of claim 6, wherein the public opinion information in the comparison sample set is grouped by enterprise subject based on the subject emotion classification module, the topic classification module, the named-body recognition module, the public opinion risk scoring module;

8. A method of monitoring risk level of an enterprise subject employing the public opinion system of any of claims 1-7, comprising:

9. The method of claim 8, wherein in step S1, the step of obtaining online public opinion information and calculating each dimension label result of the public opinion information, the step of calculating the emotion tendencies comprises:

10. The method according to claim 9, wherein in the step of obtaining online public opinion information and calculating each dimension label result of the public opinion information in step S1, the step of calculating the risk score includes:

11. The method of claim 10, wherein in step S1, the public opinion risk scoring module integrates named-body content of the key sentences that have been scored, and scores named-body content that has been integrated to obtain a risk rating of the public opinion information that includes the named-body.

12. The method of claim 11, wherein the public opinion risk scoring module performs named-body content integration on the scored key sentences, comprising:

13. The method of claim 12, wherein in step S2, similarity analysis is performed on the information set, similarity between the public opinion information in the information set is calculated, and similar public opinion information is removed and a comparison sample set is constructed:

14. The method of claim 12, wherein in step S3, public opinion information in the comparison sample set is grouped by enterprise subject based on the subject emotion classification module, the topic classification module, the named-body recognition module, the public opinion risk scoring module;