CN117370678B - Community public opinion monitoring method and related device based on big data - Google Patents
Community public opinion monitoring method and related device based on big data
- Publication number
- CN117370678B (application number CN202311431914.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- community
- target
- user
- public opinion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of big data, and discloses a community public opinion monitoring method and a related device based on big data, which are used for improving the accuracy of community public opinion monitoring. The method comprises the following steps: performing topic analysis on the target monitoring data to obtain a plurality of topic clusters, and performing topic identification and screening on the topic clusters to obtain target monitoring topics; carrying out emotion tendency analysis according to the target monitoring topics to obtain target emotion tendency classification; carrying out community relation analysis according to the target emotion tendency classification and the target monitoring topics, generating a first community user relation network, and carrying out key user identification to obtain at least one second user; performing relationship network weighted analysis on the first community user relationship network based on at least one second user to generate a second community user relationship network; and inputting the second community user relationship network into a community public opinion prediction model to predict community public opinion evolution, and obtaining a target public opinion prediction result.
Description
Technical Field
The invention relates to the field of big data, in particular to a community public opinion monitoring method and a related device based on big data.
Background
Community public opinion refers to the opinions and sentiment about community events, topics, or individuals that arise within a particular community. With the rapid development of the Internet, community platforms such as social media and forums have become the main channels through which people express opinions and emotions, and a large amount of community public opinion data has emerged. This mass of data contains rich community public opinion information, such as event comments and user relationships, and has an important influence on enterprise and public decision-making.
Existing schemes rely mainly on manual data collection and simple statistical analysis. They cannot meet the requirements of efficiently processing and deeply mining massive data, and their accuracy is therefore low.
Disclosure of Invention
The invention provides a community public opinion monitoring method and a related device based on big data, which are used for improving the accuracy of community public opinion monitoring.
The first aspect of the invention provides a community public opinion monitoring method based on big data, which comprises the following steps: acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data integrity check and data screening on the initial monitoring data to obtain standard monitoring data; performing text word segmentation and part-of-speech tagging on the standard monitoring data to obtain target word segmentation data, and performing named entity recognition on the target word segmentation data to obtain target monitoring data; performing topic analysis on the target monitoring data to obtain a plurality of topic clusters, and performing topic identification and screening on the topic clusters to obtain target monitoring topics; according to the target monitoring topics, carrying out emotion tendency analysis on the target monitoring data to obtain target emotion tendency classification; according to the target emotion tendency classification and the target monitoring topics, carrying out community relation analysis on the plurality of first users to generate a first community user relation network, and carrying out key user identification on the first community user relation network to obtain at least one second user; performing relationship network weighted analysis on the first community user relationship network based on the at least one second user to generate a corresponding second community user relationship network; and inputting the second community user relation network into a preset community public opinion prediction model to perform community public opinion evolution prediction to obtain a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result.
With reference to the first aspect, in a first implementation manner of the first aspect of the present invention, the acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data integrity check and data screening on the initial monitoring data to obtain standard monitoring data includes: acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data preprocessing on the initial monitoring data to obtain first monitoring data; calculating a covariance matrix of the first monitoring data, and carrying out eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and the eigenvector corresponding to each first eigenvalue; sorting the plurality of first eigenvalues to obtain an eigenvalue sorting result, and screening the plurality of first eigenvalues according to the eigenvalue sorting result to obtain a plurality of second eigenvalues; performing data projection on the first monitoring data according to the plurality of second eigenvalues to obtain reduced-dimension monitoring data; performing data reconstruction on the reduced-dimension monitoring data to obtain second monitoring data, and performing difference analysis on the second monitoring data and the first monitoring data to obtain a target difference degree; and carrying out data integrity verification according to the target difference degree to obtain a data integrity verification result, and carrying out data screening according to the data integrity verification result to obtain standard monitoring data.
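As an illustration only, the covariance, eigenvalue decomposition, projection, reconstruction and difference steps described above can be sketched as a principal component reconstruction-error check. The function names, the use of NumPy, the Euclidean norm as the difference degree, and the fixed threshold are all assumptions made for this sketch and are not specified by the invention.

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """Per-row difference between X and its rank-k PCA reconstruction."""
    mean = X.mean(axis=0)
    Xc = X - mean                                # centre each feature
    cov = np.cov(Xc, rowvar=False)               # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues in descending order
    W = eigvecs[:, order[:k]]                    # eigenvectors of the k largest eigenvalues
    Z = Xc @ W                                   # reduced-dimension data (projection)
    X_rec = Z @ W.T + mean                       # reconstructed ("second") data
    return np.linalg.norm(X - X_rec, axis=1)     # assumed difference degree per record

def screen_by_reconstruction(X, k, threshold):
    """Keep only the records whose difference degree does not exceed the threshold."""
    err = pca_reconstruction_error(X, k)
    return X[err <= threshold], err
```

In this sketch a record that deviates strongly from the dominant structure of the data receives a large difference degree and is screened out; how the threshold is chosen is left open here.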
With reference to the first aspect, in a second implementation manner of the first aspect of the present invention, the performing topic analysis on the target monitoring data to obtain a plurality of topic clusters, and performing topic identification and screening on the plurality of topic clusters to obtain a target monitoring topic includes: constructing a corresponding co-occurrence matrix according to the target monitoring data, and initializing parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word; iteratively updating the co-occurrence matrix through the topic analysis model, and decomposing the co-occurrence matrix into a distribution probability of each topic cluster and a target probability of the topic corresponding to each word; assigning each word to a topic cluster according to the target probability of the topic corresponding to the word to form a plurality of topic clusters; and performing topic relevance analysis on the topic clusters to obtain the topic relevance of each topic cluster, and performing topic screening on the topic clusters according to the topic relevance of each topic cluster to obtain a target monitoring topic.
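The topic analysis model is not named in the text. Purely as a hedged sketch, the decomposition of a co-occurrence matrix into topic-related factors can be illustrated with non-negative matrix factorization using multiplicative updates, which is swapped in here as a stand-in. The function names, the document-level definition of co-occurrence, and the normalization into per-word topic probabilities are assumptions of this sketch.

```python
import numpy as np

def cooccurrence_matrix(docs):
    """Count, for each word pair, the number of documents that contain both words."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        ids = [index[w] for w in set(doc)]
        for i in ids:
            for j in ids:
                V[i, j] += 1.0                   # the diagonal holds document frequency
    return vocab, V

def factorize(V, n_topics, n_iter=100, seed=0):
    """Approximate V by W @ H with non-negative factors (stand-in topic model)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_topics)) + 0.1 # random positive initialization
    H = rng.random((n_topics, V.shape[1])) + 0.1
    eps = 1e-9
    for _ in range(n_iter):                      # iterative multiplicative updates
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    topic_given_word = H / H.sum(axis=0, keepdims=True)  # each column sums to 1
    return W, topic_given_word

def topic_clusters(vocab, topic_given_word):
    """Assign every word to the topic with the highest probability for that word."""
    clusters = {}
    for j, word in enumerate(vocab):
        clusters.setdefault(int(topic_given_word[:, j].argmax()), []).append(word)
    return clusters
```

For example, `cooccurrence_matrix` can be called on a list of tokenized posts, and the resulting matrix passed to `factorize` with the desired number of topic clusters. Relevance scoring and screening of the resulting clusters are not covered by this sketch.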
With reference to the first aspect, in a third implementation manner of the first aspect of the present invention, the carrying out emotion tendency analysis on the target monitoring data according to the target monitoring topics to obtain target emotion tendency classification includes: grading the emotion tendency of the emotion classification attribute words corresponding to the target monitoring topics to obtain an emotion tendency grade for each emotion classification attribute word; extracting attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words; and according to the frequency data and the emotion tendency grades, performing emotion tendency classification on the target monitoring data to obtain target emotion tendency classification.
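A minimal sketch of how frequency data and emotion tendency grades might be combined is given below. The attribute words, their grades, the frequency-weighted average, and the cut-off values of 0.5 and -0.5 are hypothetical choices made for illustration; the text does not fix any of them.

```python
from collections import Counter

# Hypothetical emotion classification attribute words with assumed grades
# (negative values for negative tendency, positive values for positive tendency).
EMOTION_GRADES = {"broken": -2, "noisy": -1, "clean": 1, "excellent": 2}

def classify_emotion(tokens, grades=EMOTION_GRADES):
    """Classify a tokenized text from attribute-word frequencies and grades."""
    freq = Counter(t for t in tokens if t in grades)   # frequency data
    total = sum(freq.values())
    if total == 0:
        return "neutral", 0.0                          # no attribute word found
    # Frequency-weighted average grade of the attribute words that occur.
    score = sum(grades[w] * n for w, n in freq.items()) / total
    if score > 0.5:
        label = "positive"
    elif score < -0.5:
        label = "negative"
    else:
        label = "neutral"
    return label, score
```

With these assumed grades, a post tokenized as `["lift", "broken", "noisy", "broken"]` is classified as negative, because only negative attribute words occur in it.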
With reference to the first aspect, in a fourth implementation manner of the first aspect of the present invention, the performing, according to the target emotion tendency classification and the target monitoring topic, community relation analysis on the plurality of first users, generating a first community user relation network, and performing key user identification on the first community user relation network to obtain at least one second user includes: screening, according to the target emotion tendency classification and the target monitoring topic, the first user data of each first user that corresponds to the target emotion tendency classification and the target monitoring topic, and analyzing the first user data to obtain the relationships and interactions among users; creating a plurality of nodes according to the plurality of first users, and creating a plurality of directed edges according to the relationships and interactions among the users; generating a corresponding first community user relation network according to the plurality of nodes and the plurality of directed edges, carrying out clustering calculation on the first community user relation network to obtain a clustering result, and calculating the importance of each node in the first community user relation network; invoking a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups; and carrying out key user identification on the plurality of user groups, and determining at least one second user corresponding to each user group.
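The text does not specify the clustering calculation, the importance measure, or the graph calculation cluster analysis model. As an illustrative sketch under stated assumptions, the example below builds a directed network from interaction pairs, uses PageRank as an assumed importance measure, uses weakly connected components as a simple stand-in for user grouping, and picks the most important node of each group as its key user. All function names are hypothetical.

```python
from collections import defaultdict

def build_graph(interactions):
    """interactions: (source_user, target_user) pairs, e.g. replies or reposts."""
    out_edges, nodes = defaultdict(set), set()
    for src, dst in interactions:
        out_edges[src].add(dst)                  # directed edge src -> dst
        nodes.update((src, dst))
    return nodes, out_edges

def pagerank(nodes, out_edges, damping=0.85, n_iter=50):
    """Assumed importance measure: PageRank computed by power iteration."""
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(n_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_edges.get(u) or nodes  # a node without out-edges spreads evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank

def weak_components(nodes, out_edges):
    """Stand-in grouping: weakly connected components of the network."""
    neighbours = defaultdict(set)
    for u, targets in out_edges.items():
        for v in targets:
            neighbours[u].add(v)
            neighbours[v].add(u)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:                             # depth-first traversal
            u = stack.pop()
            if u not in group:
                group.add(u)
                stack.extend(neighbours[u] - group)
        seen |= group
        groups.append(group)
    return groups

def key_users(interactions):
    """Return the highest-ranked user of each group."""
    nodes, out_edges = build_graph(interactions)
    rank = pagerank(nodes, out_edges)
    return [max(sorted(g), key=rank.get) for g in weak_components(nodes, out_edges)]
```

A user who receives many interactions from the other members of a group ends up with the highest rank in that group and is returned as its key user.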
With reference to the first aspect, in a fifth implementation manner of the first aspect of the present invention, the performing relationship network weighted analysis on the first community user relationship network based on the at least one second user to generate a corresponding second community user relationship network includes: acquiring second user data corresponding to the at least one second user, and performing feature extraction on the second user data to obtain a plurality of user feature data; calculating weight data of each node in the first community user relation network based on the plurality of user feature data; and carrying out relation network weighted analysis on the first community user relation network according to the weight data of each node to generate a corresponding second community user relation network.
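One possible reading of the weighting step is sketched below. The user features (`followers`, `posts`), the equal-weight influence formula, the default weight for ordinary users, and the rule that an edge inherits the weight of its source node are all assumptions introduced for illustration; the text only states that node weights are computed from user feature data.

```python
def node_weights(nodes, key_user_features, default=1.0):
    """Weight each node; key users get a bonus derived from their features.

    key_user_features maps a key user to hypothetical features, for example
    {"followers": 100, "posts": 10}.
    """
    max_followers = max((f["followers"] for f in key_user_features.values()), default=0) or 1
    max_posts = max((f["posts"] for f in key_user_features.values()), default=0) or 1
    weights = {u: default for u in nodes}        # ordinary users keep the default weight
    for user, f in key_user_features.items():
        # Assumed influence score in [0, 1]: mean of two normalized features.
        influence = 0.5 * f["followers"] / max_followers + 0.5 * f["posts"] / max_posts
        weights[user] = default + influence
    return weights

def weighted_network(edges, weights):
    """Attach to every directed edge the weight of its source node."""
    return {(src, dst): weights[src] for src, dst in edges}
```

In this sketch, interactions that originate from key users carry more weight in the resulting second network than interactions that originate from ordinary users.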
With reference to the first aspect, in a sixth implementation manner of the first aspect of the present invention, the inputting the second community user relationship network into a preset community public opinion prediction model to perform community public opinion evolution prediction to obtain a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result includes: performing network membership analysis on the second community user relationship network to obtain a target network membership, and constructing a community user relationship matrix according to the target network membership; inputting the community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises a plurality of base prediction models and a meta prediction model, and the plurality of base prediction models include a long short-term memory (LSTM) network, a convolutional neural network, and a Transformer model; carrying out community public opinion evolution prediction on the community user relation matrix through the long short-term memory network to obtain a first public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the convolutional neural network to obtain a second public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the Transformer model to obtain a third public opinion prediction result; inputting the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into the meta prediction model for result integration to obtain a target public opinion prediction result; and constructing a mapping relation between public opinion prediction results and public opinion processing strategies, and matching a target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
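The result integration and strategy matching steps can be illustrated with a small stacking sketch. The three base models are not implemented here: their outputs are assumed to be already available as the columns of an array. The linear least-squares meta model, the score thresholds, and the strategy names are hypothetical choices for this example and are not taken from the text.

```python
import numpy as np

def fit_meta(base_preds, y):
    """Fit a linear meta model on base-model outputs.

    base_preds has shape (n_samples, n_base_models); y holds the observed outcomes.
    """
    A = np.column_stack([base_preds, np.ones(len(base_preds))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_meta(coef, base_preds):
    """Integrate the base-model outputs into one prediction per sample."""
    A = np.column_stack([base_preds, np.ones(len(base_preds))])
    return A @ coef

# Hypothetical mapping from predicted risk score to processing strategy,
# ordered from the highest threshold to the lowest.
STRATEGIES = [
    (0.7, "escalate and respond publicly"),
    (0.4, "monitor closely"),
    (0.0, "routine monitoring"),
]

def match_strategy(score):
    """Return the strategy of the first threshold that the score reaches."""
    for threshold, strategy in STRATEGIES:
        if score >= threshold:
            return strategy
    return STRATEGIES[-1][1]                     # scores below every threshold
```

The meta model is fitted on held-out predictions of the base models, then applied to new base-model outputs, and the integrated score is looked up in the mapping to select a processing strategy.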
The second aspect of the present invention provides a big data based community public opinion monitoring device, which includes: the acquisition module is used for acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and carrying out data integrity check and data screening on the initial monitoring data to obtain standard monitoring data; the word segmentation module is used for carrying out text word segmentation and part-of-speech tagging on the standard monitoring data to obtain target word segmentation data, and carrying out named entity recognition on the target word segmentation data to obtain target monitoring data; the screening module is used for carrying out topic analysis on the target monitoring data to obtain a plurality of topic clusters, and carrying out topic identification and screening on the topic clusters to obtain target monitoring topics; the classification module is used for analyzing emotion tendencies of the target monitoring data according to the target monitoring topics to obtain target emotion tendencies classification; the analysis module is used for carrying out community relation analysis on the plurality of first users according to the target emotion tendency classification and the target monitoring topics, generating a first community user relation network, and carrying out key user identification on the first community user relation network to obtain at least one second user; the processing module is used for carrying out relation network weighted analysis on the first community user relation network based on the at least one second user and generating a corresponding second community user relation network; and the prediction module is used for inputting the second community user relation network into a preset community public opinion prediction model to perform community public opinion evolution prediction to obtain a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result.
With reference to the second aspect, in a first implementation manner of the second aspect of the present invention, the acquiring module is specifically configured to: acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data preprocessing on the initial monitoring data to obtain first monitoring data; calculating a covariance matrix of the first monitoring data, and carrying out eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and the eigenvector corresponding to each first eigenvalue; sorting the plurality of first eigenvalues to obtain an eigenvalue sorting result, and screening the plurality of first eigenvalues according to the eigenvalue sorting result to obtain a plurality of second eigenvalues; performing data projection on the first monitoring data according to the plurality of second eigenvalues to obtain reduced-dimension monitoring data; performing data reconstruction on the reduced-dimension monitoring data to obtain second monitoring data, and performing difference analysis on the second monitoring data and the first monitoring data to obtain a target difference degree; and carrying out data integrity verification according to the target difference degree to obtain a data integrity verification result, and carrying out data screening according to the data integrity verification result to obtain standard monitoring data.
With reference to the second aspect, in a second implementation manner of the second aspect of the present invention, the screening module is specifically configured to: constructing a corresponding co-occurrence matrix according to the target monitoring data, and initializing parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word; iteratively updating the co-occurrence matrix through the topic analysis model, and decomposing the co-occurrence matrix into a distribution probability of each topic cluster and a target probability of the topic corresponding to each word; assigning each word to a topic cluster according to the target probability of the topic corresponding to the word to form a plurality of topic clusters; and performing topic relevance analysis on the topic clusters to obtain the topic relevance of each topic cluster, and performing topic screening on the topic clusters according to the topic relevance of each topic cluster to obtain a target monitoring topic.
With reference to the second aspect, in a third implementation manner of the second aspect of the present invention, the classification module is specifically configured to: grading the emotion tendency of the emotion classification attribute words corresponding to the target monitoring topics to obtain an emotion tendency grade for each emotion classification attribute word; extracting attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words; and according to the frequency data and the emotion tendency grades, performing emotion tendency classification on the target monitoring data to obtain target emotion tendency classification.
With reference to the second aspect, in a fourth implementation manner of the second aspect of the present invention, the analysis module is specifically configured to: screening, according to the target emotion tendency classification and the target monitoring topic, the first user data of each first user that corresponds to the target emotion tendency classification and the target monitoring topic, and analyzing the first user data to obtain the relationships and interactions among users; creating a plurality of nodes according to the plurality of first users, and creating a plurality of directed edges according to the relationships and interactions among the users; generating a corresponding first community user relation network according to the plurality of nodes and the plurality of directed edges, carrying out clustering calculation on the first community user relation network to obtain a clustering result, and calculating the importance of each node in the first community user relation network; invoking a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups; and carrying out key user identification on the plurality of user groups, and determining at least one second user corresponding to each user group.
With reference to the second aspect, in a fifth implementation manner of the second aspect of the present invention, the processing module is specifically configured to: acquiring second user data corresponding to at least one second user, and performing feature extraction on the second user data to obtain a plurality of user feature data; calculating weight data of each node in the first community user relation network based on the plurality of user characteristic data; and carrying out relation network weighted analysis on the first community user relation network according to the weight data of each node to generate a corresponding second community user relation network.
With reference to the second aspect, in a sixth implementation manner of the second aspect of the present invention, the prediction module is specifically configured to: performing network membership analysis on the second community user relationship network to obtain a target network membership, and constructing a community user relationship matrix according to the target network membership; inputting the community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises a plurality of base prediction models and a meta prediction model, and the plurality of base prediction models include a long short-term memory (LSTM) network, a convolutional neural network, and a Transformer model; carrying out community public opinion evolution prediction on the community user relation matrix through the long short-term memory network to obtain a first public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the convolutional neural network to obtain a second public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the Transformer model to obtain a third public opinion prediction result; inputting the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into the meta prediction model for result integration to obtain a target public opinion prediction result; and constructing a mapping relation between public opinion prediction results and public opinion processing strategies, and matching a target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
The third aspect of the present invention provides a community public opinion monitoring device based on big data, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the big data based community public opinion monitoring device to perform the big data based community public opinion monitoring method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the big data based community public opinion monitoring method described above.
In the technical scheme provided by the invention, topic analysis is carried out on target monitoring data to obtain a plurality of topic clusters, and topic identification and screening are carried out on the plurality of topic clusters to obtain target monitoring topics; emotion tendency analysis is carried out according to the target monitoring topics to obtain a target emotion tendency classification; community relation analysis is carried out according to the target emotion tendency classification and the target monitoring topics to generate a first community user relation network, and key user identification is carried out to obtain at least one second user; relationship network weighted analysis is performed on the first community user relationship network based on the at least one second user to generate a second community user relationship network; and the second community user relationship network is input into a community public opinion prediction model for community public opinion evolution prediction to obtain a target public opinion prediction result. Because the initial monitoring data of a plurality of first users in the target community is acquired through a big data platform and is processed and analyzed in real time, public opinion monitoring can track community public opinion dynamics and event development trends in a more timely manner. Through analysis of a large amount of community public opinion data, the views, emotion tendencies, and focus of attention of the plurality of first users in the community can be comprehensively understood. Predicting the evolution trend of target public opinion events with a machine learning algorithm helps decision makers take timely countermeasures against public opinion risks. Community relation analysis and key user identification find the users with great influence on public opinion in the community, so that interaction and communication with these users can be more targeted. Matching the prediction result with public opinion processing strategies yields a more reasonable and effective public opinion processing scheme, which improves the accuracy of community public opinion monitoring.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a community public opinion monitoring method based on big data according to an embodiment of the present invention;
FIG. 2 is a flow chart of data integrity check and data screening in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of community relationship analysis in an embodiment of the invention;
FIG. 4 is a flowchart of community public opinion evolution prediction according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a community public opinion monitoring device based on big data according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of a community public opinion monitoring device based on big data according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a community public opinion monitoring method and a related device based on big data, which are used for improving the accuracy of community public opinion monitoring. The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the community public opinion monitoring method based on big data in an embodiment of the present invention includes:
S101, acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing a data integrity check and data screening on the initial monitoring data to obtain standard monitoring data;
It can be understood that the execution subject of the present invention may be a community public opinion monitoring device based on big data, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
Specifically, the server acquires initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performs data preprocessing on the initial monitoring data to obtain first monitoring data. Data preprocessing may include operations such as data cleansing, deduplication, and denoising to ensure the quality and accuracy of the data. The first monitoring data is then analyzed further. A covariance matrix of the first monitoring data is calculated; this matrix measures the correlations between different features. Eigenvalue decomposition is performed on the covariance matrix to obtain a plurality of first eigenvalues and the eigenvector corresponding to each eigenvalue. The first eigenvalues are then sorted to obtain an eigenvalue ranking result, and the eigenvalues of higher importance are selected according to the ranking result to obtain a plurality of second eigenvalues. The first monitoring data is projected onto the eigenvectors corresponding to the second eigenvalues, which reduces the dimension of the data. The dimension-reduced monitoring data can then be used for data reconstruction: it is restored to second monitoring data through an inverse transformation. Difference analysis is performed on the second monitoring data and the first monitoring data to calculate the target difference degree between them. The target difference degree reflects how much information is lost in the dimension reduction process and is used to evaluate the integrity of the data and the effect of the dimension reduction. In the data integrity check phase, a difference threshold may be set.
Only data whose target difference degree is above the threshold is considered important and representative. After the data integrity check, screened standard monitoring data of higher quality and information content is obtained, which is suitable for subsequent public opinion analysis and prediction. For example, suppose the server is to monitor user comments about a certain mobile phone in a certain community. The server acquires 1000 comments from a social media platform, each containing a user's evaluation of and experience with the mobile phone. The server cleans the comment data to remove duplicate comments and noise. Text analysis is performed on the cleaned comment data to obtain emotional tendencies (such as positive, neutral, and negative), which serve as the first monitoring data. The covariance matrix of the comment data is calculated and decomposed to obtain a plurality of first eigenvalues and the corresponding eigenvectors. The first eigenvalues are sorted to obtain an eigenvalue ranking result, and the top-ranked first eigenvalues are selected as the second eigenvalues. The comment data is reduced in dimension using the second eigenvalues to obtain a dimension-reduced data representation, which is restored to second monitoring data through the inverse transformation. Difference analysis is performed on the second monitoring data and the first monitoring data to calculate the target difference degree. Assuming the server sets the difference threshold to 0.1, only comment data whose target difference degree is higher than 0.1 is considered important, representative, and consistent with the server's monitoring target.
The comment data that passes the data integrity check is the standard monitoring data, which can be used for subsequent public opinion analysis and prediction.
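As a non-limiting illustration, the check described above can be written compactly. The notation is an assumption introduced here and does not appear in the original text: X denotes the centered first monitoring data with n samples and d features, k is the number of retained second eigenvalues, and the target difference degree of a sample is taken to be its squared reconstruction error. Other difference measures would fit the description equally well.

```latex
% Assumed notation: X is the centered n-by-d first monitoring data,
% k is the number of second (retained) eigenvalues.
\begin{align*}
C &= \frac{1}{n-1} X^{\top} X = V \Lambda V^{\top},
  & \lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_d \ge 0 \\
W_k &= [\, v_1, \dots, v_k \,],
  & Z &= X W_k, \qquad \hat{X} = Z W_k^{\top} \\
\delta_i &= \lVert x_i - \hat{x}_i \rVert_2^2,
  & \sum_{i=1}^{n} \delta_i &= (n-1) \sum_{j=k+1}^{d} \lambda_j
\end{align*}
```

Under these assumptions, the total difference over all samples equals the sum of the discarded eigenvalues scaled by n-1, which is why the difference degree measures the information lost in the dimension reduction.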
S102, performing text word segmentation and part-of-speech tagging on standard monitoring data to obtain target word segmentation data, and performing named entity recognition on the target word segmentation data to obtain target monitoring data;
Specifically, the standard monitoring data is typically raw text data collected from social media, news websites, or other channels. Such data may include users' posts, comments, news articles, and the like, covering discussion of and feedback on a particular topic or event. The goal is to apply a series of processing steps to this text data to obtain structured target monitoring data for subsequent analysis and prediction. The first step is text word segmentation, which splits a continuous text sequence into words or phrases. Word segmentation converts the text into discrete vocabulary units and lays the foundation for subsequent processing. The second step is part-of-speech tagging, which labels each segmented word with its part of speech, such as noun, verb, or adjective. Part-of-speech tagging determines the grammatical role each word plays in a sentence, which helps in understanding the structure and meaning of the sentence. The final step is named entity recognition (NER), which identifies entities in the text that have a particular meaning, such as person names, place names, and organization names. NER helps extract meaningful information from the text and identify important entities related to a particular topic or event. Through text word segmentation, part-of-speech tagging, and named entity recognition, the server processes and analyzes the standard monitoring data at the language level to obtain the target monitoring data. After these steps, the target monitoring data carries better structural and semantic information and provides a more valuable data basis for subsequent public opinion monitoring tasks such as topic analysis and emotional tendency analysis. 
Such analysis and preprocessing processes have wide application in the fields of public opinion monitoring and information extraction, etc., helping servers to better understand and process large amounts of text data.
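As a non-limiting illustration, the three processing steps can be sketched as a small pipeline. This is a toy, rule-based stand-in and not the implementation of the embodiment: the lexicons `POS_LEXICON` and `ENTITY_LEXICON`, the tag names, and the regular-expression tokenizer are all hypothetical, and a practical system would use a trained segmenter, tagger, and NER model (word segmentation in particular is much harder for Chinese text than the whitespace-style splitting assumed here).

```python
import re

# Hypothetical toy lexicons; a real system would use trained models instead.
POS_LEXICON = {
    "the": "DET", "phone": "NOUN", "battery": "NOUN",
    "is": "VERB", "great": "ADJ", "in": "ADP",
}
ENTITY_LEXICON = {"acme": "ORG", "shenzhen": "LOC"}


def tokenize(text):
    """Split text into alphanumeric tokens (a stand-in for word segmentation)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def tag(tokens):
    """Look up each token's part of speech; unknown capitalised words become PROPN."""
    return [
        (t, POS_LEXICON.get(t.lower(), "PROPN" if t[0].isupper() else "X"))
        for t in tokens
    ]


def recognize_entities(tokens):
    """Return (token, entity type) pairs for tokens found in the entity lexicon."""
    return [(t, ENTITY_LEXICON[t.lower()]) for t in tokens if t.lower() in ENTITY_LEXICON]


def process(text):
    """Run segmentation, tagging, and entity recognition on one text."""
    tokens = tokenize(text)
    return {"tokens": tokens, "pos": tag(tokens), "entities": recognize_entities(tokens)}
```

The dictionary returned by `process` plays the role of one record of target monitoring data: the raw text is kept alongside its tokens, tags, and recognized entities so that later steps can work on structured fields.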
S103, performing topic analysis on the target monitoring data to obtain a plurality of topic clusters, and performing topic identification and screening on the topic clusters to obtain target monitoring topics;
It should be noted that a co-occurrence matrix is first constructed from the target monitoring data. The co-occurrence matrix represents the co-occurrence relations among different words in the target monitoring data. The original text data is preprocessed, for example by word segmentation, stop word removal, and punctuation removal, to obtain a word list for each text. The number of times each pair of words appears in the same text is counted, and the co-occurrence matrix is constructed from these co-occurrence frequencies. Next, topic analysis is performed, typically using a probabilistic topic model such as Latent Dirichlet Allocation (LDA). In topic analysis, the parameters of the topic model are first initialized to determine the number of topic clusters and the initial probability of the topic corresponding to each word. The co-occurrence matrix is then iteratively updated through the topic model and decomposed into the distribution probability of the topic clusters and the target probability of the topic corresponding to each word. Each word is assigned to a topic cluster according to the target probability of its corresponding topic. This yields a plurality of topic clusters, each consisting of a set of associated words and each representing a different topic. By analyzing the word content of each topic cluster, the topic it represents can be identified, such as "technological development", "environmental protection", or "healthy life". Topic relevance is then calculated: the topic relevance of each topic cluster is obtained by measuring the degree of association between words in different topic clusters. A highly relevant topic cluster is one whose words are closely related and form a more coherent topic. 
Based on the calculation result of topic relevance, screening a plurality of topic clusters, and reserving the topic cluster with higher topic relevance with the target monitoring as the final target monitoring topic. For example, assume that the server has a collection of social media comment data that contains discussions about the brand of cell phone. The server firstly performs word segmentation on the text, removes stop words, and then constructs a co-occurrence matrix to represent co-occurrence relations among different words. Decomposing the co-occurrence matrix by using an LDA topic model to obtain a plurality of topic clusters, wherein each cluster consists of a group of associated words. And distributing each word to different topic clusters according to the target probability of the topic corresponding to the word. By analyzing the word content in each topic cluster, the server may find that some of the clusters represent different topics such as "cell phone performance", "cell phone appearance", "user experience", etc. And calculating topic relevance and screening, and reserving topic clusters with higher topic relevance to the mobile phone brands as target monitoring topics, so that discussion and public opinion situations of users on the mobile phone brands are better known.
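As a non-limiting illustration, the co-occurrence counting and the relevance-based screening can be sketched as follows. The topic model itself (for example LDA) is deliberately omitted, so the clusters are taken as given. The stop word list, the function names, and the relevance measure (average document-level co-occurrence with a set of target terms) are assumptions made for this sketch, not details given in the text.

```python
import re
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "is", "a", "and", "of"}  # hypothetical minimal list


def word_list(text):
    """Lower-case the text, drop punctuation and stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def cooccurrence(docs):
    """Count, for each unordered word pair, the number of texts containing both."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(word_list(doc))), 2):
            counts[(a, b)] += 1
    return counts


def cluster_relevance(cluster, target_terms, counts):
    """Average co-occurrence count between cluster words and the target terms."""
    pairs = [tuple(sorted((w, t))) for w in cluster for t in target_terms if w != t]
    if not pairs:
        return 0.0
    return sum(counts.get(p, 0) for p in pairs) / len(pairs)


def screen_clusters(clusters, target_terms, counts, min_relevance):
    """Keep only the clusters whose relevance reaches the threshold."""
    return [c for c in clusters
            if cluster_relevance(c, target_terms, counts) >= min_relevance]
```

Storing the counts sparsely in a `Counter` keyed by sorted word pairs avoids materialising a full vocabulary-by-vocabulary matrix, which matters when the vocabulary is large.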
S104, carrying out emotion tendency analysis on the target monitoring data according to the target monitoring topics to obtain target emotion tendency classification;
Specifically, the corresponding emotion classification attribute words are matched according to the target monitoring topic. Emotion classification attribute words are words used to represent different emotional tendencies: for example, "like" and "happy" represent positive emotion, while "offensive" and "sad" represent negative emotion. An appropriate set of emotion classification attribute words is selected according to the characteristics of the topic. The emotion classification attribute words are then graded by emotional tendency level. The emotional tendency level represents the degree of emotion expressed by each emotion classification attribute word and can generally be divided into positive, neutral, and negative levels. For example, "like" may be assigned to the positive level, while "neutral" may be assigned to the neutral level. Emotion classification of the target monitoring data is performed through attribute word extraction, in which the emotion classification attribute words contained in the target monitoring data are extracted to form a word list. The frequency data of the emotion classification attribute words in the target monitoring data is then calculated. The frequency data gives the number of times each emotion classification attribute word appears in the data and measures the importance of each attribute word in the target data. The target monitoring data is classified by emotional tendency according to the frequency data and the emotional tendency levels of the attribute words: the overall emotional tendency of the target monitoring data is judged from how often each attribute word occurs and the emotional tendency level to which it belongs. 
For example, if the positive emotion word frequency is high and the emotion tendencies are positive, then the target monitoring data may be prone to positive emotion classification. For example, suppose that the server performs emotion trend analysis on user comments of a certain product. The server selects a set of emotion classification attribute words such as "good score", "satisfactory", "bad score", "dissatisfaction", and the like. Emotion tendencies are classified for these emotion classification attribute words, such as "good score" and "satisfactory" as positive emotion ranks, and "bad score" and "dissatisfaction" as negative emotion ranks. The server extracts the emotion classification attribute words contained in the user comments and obtains the frequency data of the emotion classification attribute words in the comment data. Based on these frequency data and emotion tendency levels, the server classifies emotion tendency of the user comments, and determines whether the user's evaluation of the product as a whole is positive or negative. Such emotional tendency analysis may help businesses understand feedback and feelings of users, providing valuable references for product improvement and marketing strategies.
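As a non-limiting illustration, the frequency-based classification can be sketched as follows. The attribute words, their numeric levels (+1, 0, -1), and the rule of summing level times frequency are assumptions chosen for this sketch; the text does not specify how frequency and level are combined.

```python
import re
from collections import Counter

# Hypothetical attribute words with tendency levels:
# +1 positive, 0 neutral, -1 negative.
EMOTION_LEVELS = {
    "good": 1, "satisfied": 1,
    "average": 0,
    "bad": -1, "dissatisfied": -1,
}


def attribute_word_frequencies(comments):
    """Count how often each emotion classification attribute word appears."""
    freq = Counter()
    for comment in comments:
        for word in re.findall(r"[a-z]+", comment.lower()):
            if word in EMOTION_LEVELS:
                freq[word] += 1
    return freq


def classify_tendency(comments):
    """Combine frequency and tendency level into an overall classification."""
    freq = attribute_word_frequencies(comments)
    score = sum(EMOTION_LEVELS[word] * n for word, n in freq.items())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify_tendency(["Good phone, very satisfied", "Bad battery"])` weighs two positive occurrences against one negative occurrence and returns `"positive"`.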
S105, carrying out community relation analysis on a plurality of first users according to target emotion tendency classification and target monitoring topics, generating a first community user relation network, and carrying out key user identification on the first community user relation network to obtain at least one second user;
Specifically, first user data of each first user related to the target emotional tendency classification and topics are screened out according to the target emotional tendency classification and the target monitoring topics. Such data may include information about the posting, commentary, interactions, etc. of the user. By parsing the data, relationships and interactions between users, such as reply, praise, share, etc. actions between users can be obtained. Based on the first user data screened, a plurality of nodes are created, each node representing a first user, and a plurality of directed edges are created according to the relationship and interaction between the users, representing the relationship of communication and interaction between the users. And generating a corresponding first community user relation network according to the nodes and the directed edges. In the network, nodes represent first users and directed edges represent relationships and interactions between users. And carrying out clustering calculation on the community network, and clustering users with similar relations and interaction modes together to obtain a clustering result. The importance of each node in the first community user relationship network is calculated, and a correlation algorithm in graph theory, such as a PageRank algorithm, can be used to measure the importance of each node in the community network. And calling a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network by combining the clustering result and the importance of the nodes, wherein the user group with similar relation is divided into a plurality of user groups. And carrying out key user identification on the user groups, and determining at least one second user corresponding to each user group. 
Key users may be identified by some criteria, such as users with higher importance, impact, or liveness in the community network are considered key users. For example, suppose a server analyzes user data of a certain social media platform, the target emotion tendencies are classified as positive emotions, and the target monitoring topics are new product releases of a certain brand. The server first obtains user data from the platform related to the release of the brand new product, including posts and comments of the user, etc. The data is then parsed to analyze relationships and interactions between users. Based on this data, the server creates a plurality of nodes, each node representing a first user, and constructs directed edges representing communications and interactions between users. And the server performs clustering calculation on the community network, and clusters users with similar relations and interaction modes together to obtain a clustering result. The server calculates importance for each node, finding out users with higher importance in the community network. And according to the clustering result and the importance of the nodes, carrying out user grouping calculation on the community network, dividing the user group with similar relation into a plurality of user groups, and identifying at least one key user from each user group, wherein the key user can be used as a second user and represents the users with important influence under the new product release topics of the brand.
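As a non-limiting illustration, building the directed relationship network, scoring node importance with PageRank, and picking one key user per group can be sketched as follows. The edge convention (an edge from A to B means that A replied to, liked, or shared B's content), the damping factor, and the pre-computed user groups passed to `key_users` are assumptions made for this sketch; the clustering step that produces the groups is not shown.

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Plain power-iteration PageRank on a directed graph.

    An edge (src, dst) is assumed to mean that src interacted with dst's
    content, so importance flows from src to dst.
    """
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping) / count + damping * dangling / count for n in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += damping * rank[src] / len(out[src])
        rank = new
    return rank


def key_users(groups, rank):
    """Pick the highest-ranked user of each (pre-computed) user group."""
    return [max(group, key=lambda user: rank[user]) for group in groups]
```

A usage example under these assumptions: with users `alice`, `bob`, `carol`, and `dave`, where most interactions point at `alice`, calling `key_users([["alice", "bob"], ["carol", "dave"]], rank)` returns one second user for each group.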
S106, carrying out relation network weighted analysis on the first community user relation network based on at least one second user to generate a corresponding second community user relation network;
Specifically, second user data corresponding to at least one second user is obtained. The second user data may include social networking behavior, interaction information, personal attributes, etc. of the user. By feature extraction of the data, a plurality of user feature data are obtained, which feature data can be used to describe the behavior and characteristics of the second user in the social network. Based on these user characteristic data, weight data for each node in the first community user relationship network is calculated. The weight data can be calculated according to indexes such as importance, influence, liveness and the like of the second user and used for measuring the importance degree of the second user in the first community user relation network. And carrying out relation network weighting analysis according to the weight data of each node. The weighted analysis is to apply the weight data to the first community user relationship network, and adjust the network structure and the connection strength by giving each node different weights. The weighted analysis may affect the degree of association between nodes by increasing or decreasing the weight of the edges. And generating a corresponding second community user relation network according to the result of the weighted analysis. The second community user relationship network will reflect the nodes and edges of the first community user relationship network that are relevant to the second user and the strength of the connection between the nodes will be affected by the weighted analysis. For example, assume that the server has a user relationship network of a social media platform, which includes associations between different users. The server performs feature extraction on the data of the two second users and calculates weight data of each node in the first community user relation network to weight and analyze the network. 
The feature data of the first of these second users may include the user's activity on social media, number of followers, and the like; the feature data of the other second user may include posting frequency, sharing behavior, and the like. Based on the feature data, the server calculates a weight for each node, representing the importance and influence of the node in the first community user relationship network. Weighted analysis is then performed on the first community user relationship network according to the weight data. The weighted analysis may strengthen the connections between the first of these second users and other nodes, because that user has higher influence and activity on social media; at the same time, it may weaken the connections between the other second user and other nodes, because that user is less active on social media. Based on the result of the weighted analysis, a corresponding second community user relationship network is generated. This network reflects the nodes and edges of the first community user relationship network that are associated with the two second users, with the connection strengths between nodes adjusted by the weighted analysis.
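As a non-limiting illustration, the weighted analysis can be sketched as a rescaling of edge strengths. The feature names, the max-normalised averaging in `user_scores`, and the multiplier `0.5 + score` (which strengthens edges of high-scoring key users and weakens edges of low-scoring ones) are assumptions made for this sketch; the text does not prescribe a particular weighting formula.

```python
def user_scores(features):
    """Average of max-normalised feature values per key user, in [0, 1].

    `features` maps a key user to hypothetical numeric features such as
    activity or follower count.
    """
    keys = sorted({k for f in features.values() for k in f})
    if not keys:
        return {user: 0.0 for user in features}
    maxima = {k: max(f.get(k, 0.0) for f in features.values()) or 1.0 for k in keys}
    return {
        user: sum(f.get(k, 0.0) / maxima[k] for k in keys) / len(keys)
        for user, f in features.items()
    }


def second_network(edges, scores):
    """Rescale edge strengths of the first network using key-user scores.

    Edges touching a key user are multiplied by 0.5 + score; edges between
    ordinary users keep their original strength.
    """
    def factor(user):
        return 0.5 + scores[user] if user in scores else 1.0

    return {(s, t): w * factor(s) * factor(t) for (s, t), w in edges.items()}
```

With this choice, a key user whose score is above 0.5 pulls neighbours closer and one below 0.5 pushes them away, which mirrors the strengthening and weakening described in the example.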
S107, inputting a second community user relation network into a preset community public opinion prediction model to conduct community public opinion evolution prediction, obtaining a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result.
Specifically, the server performs network membership analysis on the second community user relationship network to identify the membership of different users in the network, that is, their status and influence in the community. A community user relationship matrix is constructed from the analysis result; the matrix represents the relationship strength and connections between different users. The preset community public opinion prediction model comprises a plurality of base prediction models and a meta prediction model. The base prediction models include a long short-term memory network (LSTM), a convolutional neural network (CNN), and a Transformer model. LSTM is suited to predicting sequence data, CNN can be used for predicting image or matrix data, and the Transformer is a modern model for sequence modeling. The community user relationship matrix is input into the LSTM, CNN, and Transformer base prediction models respectively, and each model predicts the evolution of community public opinion, yielding three different public opinion prediction results: a first, a second, and a third public opinion prediction result. These three results are input into the meta prediction model, which integrates them to obtain the final target public opinion prediction result. The meta prediction model may be a simple linear regression model or a more complex ensemble learning method, and is used to fuse the results of the base prediction models into a comprehensive prediction. A mapping relation between public opinion prediction results and public opinion processing strategies is then constructed according to the target public opinion prediction result. 
The mapping may be a predefined rule table or a machine learning based model. And according to the mapping relation, matching corresponding public opinion processing strategies to give corresponding processing suggestions for the prediction result. For example, assume that a server performs public opinion analysis on a user relationship network of a social media platform. The server firstly performs network membership analysis to determine the status and influence of users in communities, and then constructs a community user relationship matrix to represent the relationship strength among users. The server uses a preset community public opinion prediction model comprising LSTM, CNN, transformer and the like to conduct evolution prediction on community public opinion, and three different public opinion prediction results are obtained. And integrating the results through a meta-prediction model to obtain a final target public opinion prediction result. The server constructs a mapping relation between the public opinion prediction result and the public opinion processing strategy according to the prediction result, and matches the corresponding public opinion processing strategy according to the mapping relation to provide targeted processing suggestions for community public opinion evolution.
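As a non-limiting illustration, the fusion of base predictions and the strategy lookup can be sketched as follows. The base models are represented by arbitrary callables standing in for the trained LSTM, CNN, and Transformer; the linear meta model with fixed weights, the interpretation of the prediction as a risk score between 0 and 1, and the entries of `STRATEGY_TABLE` are all hypothetical.

```python
# Hypothetical rule table: (upper bound of predicted risk score, strategy).
STRATEGY_TABLE = [
    (0.3, "routine monitoring"),
    (0.7, "targeted communication with key users"),
    (1.0, "escalate to emergency response"),
]


def meta_predict(base_outputs, weights, bias=0.0):
    """Linear meta model: weighted sum of the base prediction results."""
    return bias + sum(w * p for w, p in zip(weights, base_outputs))


def match_strategy(score):
    """Return the first strategy whose upper bound covers the score."""
    for upper, strategy in STRATEGY_TABLE:
        if score <= upper:
            return strategy
    return STRATEGY_TABLE[-1][1]


def predict_and_match(relation_matrix, base_models, weights):
    """Run every base model, fuse the results, and look up a strategy.

    `base_models` are callables standing in for trained LSTM, CNN, and
    Transformer predictors; each maps the relation matrix to a score.
    """
    outputs = [model(relation_matrix) for model in base_models]
    score = meta_predict(outputs, weights)
    return score, match_strategy(score)
```

In practice the weights of the meta model would be fitted on held-out predictions of the base models rather than fixed by hand, and the rule table could be replaced by a learned mapping as the text allows.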
In the embodiment of the invention, topic analysis is performed on the target monitoring data to obtain a plurality of topic clusters, and topic identification and screening are performed on the topic clusters to obtain target monitoring topics; emotional tendency analysis is performed according to the target monitoring topics to obtain a target emotional tendency classification; community relation analysis is performed according to the target emotional tendency classification and the target monitoring topics to generate a first community user relationship network, and key user identification is performed to obtain at least one second user; relationship network weighted analysis is performed on the first community user relationship network based on the at least one second user to generate a second community user relationship network; and the second community user relationship network is input into a community public opinion prediction model for community public opinion evolution prediction to obtain a target public opinion prediction result. Because the initial monitoring data of the plurality of first users in the target community is acquired through a big data platform and is processed and analyzed in real time, public opinion monitoring can track community public opinion dynamics and event development trends in a more timely manner. 
By analyzing a large amount of community public opinion data, the views, emotional tendencies, and focus of attention of the plurality of first users in the community can be comprehensively understood. The evolution trend of a target public opinion event is predicted through a machine learning algorithm, which helps decision makers take timely countermeasures against public opinion risks. Users with great influence on public opinion in the community are found through community relation analysis and key user identification, so that interaction and communication with these users can be more targeted. A more reasonable and effective public opinion processing scheme is formulated by matching the output of the prediction model with public opinion processing strategies, thereby improving the accuracy of community public opinion monitoring.
In a specific embodiment, as shown in fig. 2, the process of performing step S101 may specifically include the following steps:
S201, acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data preprocessing on the initial monitoring data to obtain first monitoring data;
S202, calculating a covariance matrix of the first monitoring data, and performing eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and an eigenvector of each first eigenvalue;
S203, sorting the plurality of first eigenvalues to obtain an eigenvalue ranking result, and screening the first eigenvalues according to the eigenvalue ranking result to obtain a plurality of second eigenvalues;
S204, performing data projection on the first monitoring data according to the plurality of second eigenvalues to obtain dimension-reduced monitoring data;
S205, performing data reconstruction on the dimension-reduced monitoring data to obtain second monitoring data, and performing difference analysis on the second monitoring data and the first monitoring data to obtain a target difference degree;
S206, performing a data integrity check according to the target difference degree to obtain a data integrity check result, and performing data screening according to the data integrity check result to obtain the standard monitoring data.
Specifically, the server uses a preset big data platform to connect to a data source of the target community and obtains initial monitoring data of a plurality of first users from the data source. Such data may include users' posts, comments, news articles, and the like, covering discussion of and feedback on a particular topic or event. Data preprocessing is performed on the initial monitoring data, including removing duplicate data, handling missing values, and cleaning out invalid information, to ensure the quality and accuracy of the data and obtain the first monitoring data. The server calculates a covariance matrix of the first monitoring data; the covariance matrix reflects the correlation between the features in the data. By performing eigenvalue decomposition on the covariance matrix, the server obtains a plurality of first eigenvalues and the eigenvector corresponding to each eigenvalue. The server sorts the first eigenvalues to obtain an eigenvalue ranking result and, according to the ranking result, screens out a plurality of second eigenvalues, which play an important role in the dimension reduction process. Based on the second eigenvalues, the server performs data projection on the first monitoring data to reduce the dimension of the data. The dimension-reduced monitoring data has a lower dimension but still retains the important information in the data. The server then reconstructs the dimension-reduced monitoring data, restoring it to the original dimension to obtain the second monitoring data. The server performs difference analysis on the second monitoring data and the first monitoring data and calculates the target difference degree. 
The difference reflects the degree of information loss in the dimension reduction process, and can also be used for measuring the similarity between the dimension reduced data and the original data. And carrying out data integrity verification according to the target difference degree, and ensuring that the data subjected to dimension reduction can still accurately reflect the characteristics of the original data. And according to the data integrity check result, the server performs data screening to obtain standard monitoring data. These standard monitoring data can be used for subsequent analysis and prediction, such as topic analysis, emotional tendency analysis, etc. For example, assume that a server connects to a social media platform using a preset big data platform, and obtains posting data for users in a collection of target communities. And the server performs data preprocessing on the data, removes repeated data and invalid information, and obtains first monitoring data. And calculating a covariance matrix of the first monitoring data, and decomposing the covariance matrix to obtain a plurality of first eigenvalues and eigenvectors. And screening out a plurality of second characteristic values by the server according to the characteristic value sequencing result, and carrying out data projection and dimension reduction to obtain the dimension reduced monitoring data. And reconstructing the dimension reduced data to obtain second monitoring data. And calculating the target difference degree by carrying out difference analysis on the second monitoring data and the first monitoring data, and carrying out data integrity verification. And according to the verification result, the server obtains the screened standard monitoring data for subsequent community public opinion analysis and prediction.
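As a non-limiting illustration, steps S202 to S206 can be sketched in plain Python. Several choices here are assumptions made for the sketch rather than details given in the text: the eigenvectors are found by power iteration with deflation (any eigen-decomposition routine would serve), the target difference degree of a sample is taken to be its squared reconstruction error, and, following the description above, samples whose difference degree exceeds the threshold are the ones retained.

```python
import math
import random


def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpairs(cov, k, iters=500):
    """k largest (eigenvalue, eigenvector) pairs via power iteration with deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflation removes the component that was just found.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def difference_degrees(rows, k):
    """Squared reconstruction error of each sample after keeping k components."""
    xc = center(rows)
    vecs = [v for _, v in top_eigenpairs(covariance(xc), k)]
    degrees = []
    for x in xc:
        z = [sum(xi * vi for xi, vi in zip(x, v)) for v in vecs]      # projection
        x_hat = [sum(z[m] * vecs[m][j] for m in range(len(vecs)))     # reconstruction
                 for j in range(len(x))]
        degrees.append(sum((p - q) ** 2 for p, q in zip(x, x_hat)))
    return degrees


def screen(rows, k, threshold):
    """Keep samples whose difference degree is above the threshold, as in the text."""
    return [r for r, deg in zip(rows, difference_degrees(rows, k)) if deg > threshold]
```

Keeping all components reconstructs the data exactly, so every difference degree is zero; the degrees become informative only when fewer components are kept than there are features.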
In a specific embodiment, the process of executing step S103 may specifically include the following steps:
(1) Constructing a corresponding co-occurrence matrix according to the target monitoring data, and initializing parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word;
(2) Iteratively updating the co-occurrence matrix through a topic analysis model, and decomposing the co-occurrence matrix into the distribution probability of topic clusters and the target probability of topics corresponding to words;
(3) Distributing each word to different topic clusters according to the target probability of the topic corresponding to the word to form a plurality of topic clusters;
(4) Performing topic relevance analysis on the plurality of topic clusters to obtain the topic relevance of each topic cluster, and performing topic screening on the plurality of topic clusters according to the topic relevance of each topic cluster to obtain the target monitoring topic.
Specifically, the server extracts words from the target monitoring data and constructs a co-occurrence matrix. A co-occurrence matrix represents how often words occur together: its rows and columns each correspond to words, and each element gives the number of times, or the probability, that the two words occur together within a certain context window. The server then initializes the parameters of a preset topic analysis model. A topic analysis model is a technique for mining topic information from text data; common models include latent semantic analysis (LSA) and latent Dirichlet allocation (LDA). In the parameter initialization stage, the server determines the number of topic clusters and the initial probability of the topic corresponding to each word. The co-occurrence matrix is then iteratively updated through the topic analysis model. During iteration, the model continuously adjusts the number of topic clusters and the probability of the topic corresponding to each word so as to explain the data in the co-occurrence matrix as fully as possible. After the iteration is complete, the server decomposes the co-occurrence matrix into the distribution probability of the topic clusters and the target probability of the topic corresponding to each word. This probability information describes the degree of association between each topic cluster and the words, as well as the distribution of each word across the different topic clusters. According to the target probability of the topic corresponding to each word, the server assigns each word to a topic cluster, forming a plurality of topic clusters. In this way, the server divides the target monitoring data into different topic clusters according to the co-occurrence relations among the words and the result of the topic analysis model, and each topic cluster represents a specific topic or theme.
The server performs topic relevance analysis on the plurality of topic clusters. Topic relevance measures both the relevance between the words within each topic cluster and the similarity between topic clusters. Through topic relevance analysis, the server obtains a topic relevance score for each topic cluster and, according to these scores, screens the plurality of topic clusters to obtain the target monitoring topic. For example, assume that the server uses the preset big data platform to collect a series of monitoring data about an event from social media. The server extracts the words in the data and constructs a co-occurrence matrix that records the co-occurrence relations among the words. The server selects a preset LDA topic analysis model and initializes its parameters to determine the number of topic clusters and the initial probability of the topic corresponding to each word. Through iterative updating, the co-occurrence matrix is decomposed into the distribution probability of the topic clusters and the target probability of the topic corresponding to each word. The words are assigned to different topic clusters according to the target probability of the topic corresponding to each word, forming a plurality of topic clusters. The server analyzes the topic relevance of the topic clusters to obtain a relevance score for each topic cluster and screens the topic clusters according to the scores to obtain the target monitoring topic.
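By way of a minimal sketch, the construction of the co-occurrence matrix and the assignment of words to topic clusters might look like the following. The iterative topic model itself (for example LDA) is not implemented here: the per-word topic probabilities are assumed to come from such a model, and the function names, the window size and the sparse pair-count representation of the matrix are hypothetical choices made for illustration.

```python
from collections import Counter

def cooccurrence(docs, window=2):
    """Count how often two different words appear within `window` positions
    of each other. Keys are alphabetically sorted word pairs."""
    counts = Counter()
    for tokens in docs:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts

def assign_topics(word_topic_prob):
    """Assign each word to the topic cluster with the highest target probability.
    word_topic_prob maps word -> list of probabilities, one per topic."""
    clusters = {}
    for word, probs in word_topic_prob.items():
        topic = max(range(len(probs)), key=probs.__getitem__)
        clusters.setdefault(topic, []).append(word)
    return clusters

# Example with made-up tokens and probabilities.
docs = [["parking", "fee", "rise"], ["parking", "fee", "notice"]]
pairs = cooccurrence(docs, window=1)
clusters = assign_topics({"parking": [0.9, 0.1],
                          "fee": [0.8, 0.2],
                          "noise": [0.3, 0.7]})
```

With these inputs, the pair ("fee", "parking") is counted twice, and "parking" and "fee" land in topic cluster 0 while "noise" lands in topic cluster 1.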
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Matching the corresponding emotion classification attribute words according to the target monitoring topic, and grading the emotion tendency of the emotion classification attribute words to obtain the emotion tendency level of each emotion classification attribute word;
(2) Extracting attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words;
(3) And carrying out emotion tendency classification on the target monitoring data according to the frequency data and the emotion tendency level to obtain target emotion tendency classification.
Specifically, the server establishes an emotion classification attribute word library. The emotion classification attribute word library is a database containing words of different emotion classifications, each word being associated with a particular emotion tendency. The word library can be obtained by manual labeling or by automatically mining emotion words. According to the target monitoring topic, the server matches the corresponding emotion classification attribute words from the emotion classification attribute word library. The matching can be implemented by a text matching algorithm or a vocabulary alignment technique: the vocabulary related to the topic is compared with the vocabulary in the emotion classification attribute word library, and the matching emotion attribute words are found. The server then grades the emotion tendency of the matched emotion classification attribute words. Emotion tendency levels typically include positive, negative and neutral, and describe the emotional attitude that an emotion word expresses. The emotion tendency level can be judged from information such as the frequency of the emotion word in the text and its context. After the emotion classification attribute words have been graded, the server extracts attribute words from the target monitoring data to obtain frequency data of the emotion classification attribute words. Attribute word extraction means extracting from the text data the words related to the emotion classification attribute words; these words reflect the emotion tendency of the text. The server then classifies the emotion tendency of the target monitoring data by combining the emotion tendency level of the emotion classification attribute words with the frequency data of the attribute words in the target monitoring data.
Machine learning algorithms such as naive Bayes or support vector machines can be used: the emotion classification attribute words and their corresponding emotion tendency levels serve as features, the attribute word frequencies in the target monitoring data serve as input, and an emotion classification model is trained. With this model, the target monitoring data can be divided into different emotion tendency categories, such as positive, negative or neutral. For example, assume that the target community monitored by the server is following an important sporting event. The server collects a large number of posts and comments from the community and extracts the text data related to the event. The server establishes an emotion classification attribute word library containing positive, negative and neutral emotion words. Based on the event topic, the server matches from the word library attribute words related to sentiment about the event, such as "win", "lose" and "won". The server grades the emotion tendency of the matched attribute words, for example "win" and "won" as positive and "lose" as negative. The server extracts attribute words from the target monitoring data and determines how frequently each emotion classification attribute word appears in it. The server combines the emotion tendency level of the emotion classification attribute words with the frequency data of the attribute words in the target monitoring data to train an emotion classification model. Through this model, the server classifies the text data in the community into different emotion tendency categories, such as positive, negative or neutral, to obtain the target emotion tendency classification. The server thus learns the overall emotional attitude of the community toward the event, which provides a valuable reference for subsequent public opinion analysis and processing.
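As a simplified sketch of how frequency data and emotion tendency levels can be combined, the following replaces the trained classifier mentioned above with a plain lexicon-weighted sum; it is an assumption made for illustration and not the classification model of the embodiment. The lexicon `LEVELS`, its numeric levels and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical lexicon: attribute word -> emotion tendency level.
LEVELS = {"win": 1, "won": 1, "lose": -1}

def classify_tendency(tokens, levels=LEVELS):
    """Return (frequency data, weighted score, tendency label) for one text."""
    # Frequency data of the emotion classification attribute words.
    freq = Counter(t for t in tokens if t in levels)
    # Weight each frequency by the word's emotion tendency level.
    score = sum(levels[word] * count for word, count in freq.items())
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return freq, score, label

freq, score, label = classify_tendency(
    ["we", "won", "the", "final", "a", "great", "win"])
# label == "positive" (score 2: one "won" and one "win")
```

A trained model such as naive Bayes would take the same frequency counts as input features; the weighted sum is only the simplest stand-in for that step.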
The emotion tendency classification may be optimized with a genetic algorithm. The target monitoring data is first encoded appropriately, with the frequency data and the emotion tendency levels expressed as the genes of a chromosome; for example, the frequency data and emotion tendency levels may be represented using binary codes. The algorithm then proceeds as follows:
Initializing the population: a number of chromosomes are randomly generated as the initial population.
Fitness function: a fitness function is defined to evaluate the fitness of each chromosome, that is, the accuracy of its emotion tendency classification. The fitness can be measured by the difference between the standard result of the target emotion tendency classification and the actual emotion tendency classification result.
Selection: a selection operator chooses a subset of chromosomes as parents of the next-generation population according to the fitness value of each chromosome.
Crossover: a crossover operator is applied to the selected parent chromosomes to generate new chromosomes as offspring for the next-generation population. The crossover may be single-point crossover, multi-point crossover, and so on.
Mutation: a mutation operator is applied to the offspring chromosomes to increase the diversity of the population. The mutation may be random-position mutation, that is, randomly changing some of the gene values in a chromosome.
Generating the new population: the offspring chromosomes obtained through selection, crossover and mutation, together with some of the parent chromosomes, form the new population.
Termination condition: whether to end the iteration of the genetic algorithm is determined by a preset termination condition (such as reaching the maximum number of iterations or meeting a certain fitness threshold).
Outputting the result: when the termination condition is met, the chromosome with the highest fitness is selected as the final emotion tendency classification result, and the target emotion tendency classification is output.
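The genetic algorithm steps above can be sketched roughly as follows. This is an illustrative sketch under several assumptions: chromosomes are plain lists of bits, selection is simple truncation (the better half become parents), crossover is single-point, one unchanged copy of the best chromosome is carried into each new population, and the only termination condition is a fixed number of generations. The function name `evolve` and all parameter defaults are hypothetical.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Return (best chromosome, best fitness seen at each generation).
    Requires n_genes >= 2 and pop_size >= 4."""
    rng = random.Random(seed)
    # Initialize the population with random binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]          # truncation selection
        children = [pop[0][:]]                 # keep the best parent unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)  # single-point crossover
            child = a[:cut] + b[cut:]
            # Random-position mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Example fitness: agreement with a made-up reference labelling.
reference = [1, 0, 1, 1, 0, 0, 1, 0]
agreement = lambda c: sum(1 for g, r in zip(c, reference) if g == r)
best, history = evolve(agreement, len(reference))
```

Because the best chromosome is always copied into the next population, the recorded best fitness never decreases from one generation to the next.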
In a specific embodiment, as shown in fig. 3, the process of executing step S105 may specifically include the following steps:
S301, screening out, for each first user, the first user data corresponding to the target emotion tendency classification and the target monitoring topic, and analyzing the first user data to obtain the relationships and interactions among users;
S302, creating a plurality of nodes according to a plurality of first users, and creating a plurality of directed edges according to the relationship and interaction among the users;
S303, generating a corresponding first community user relation network according to a plurality of nodes and a plurality of directed edges, carrying out clustering calculation on the first community user relation network to obtain a clustering result, and calculating the importance of each node in the first community user relation network;
S304, invoking a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups;
S305, carrying out key user identification on a plurality of user groups, and determining at least one second user corresponding to each user group.
Specifically, the server acquires initial monitoring data of a plurality of first users in the target community through the preset big data platform and preprocesses it to obtain the first monitoring data. Such data may include community interaction data such as users' posts and comments. According to the target emotion tendency classification and the target monitoring topic, the server screens out, for each first user, the first user data corresponding to the target emotion tendency classification and the topic. This can be done through a text classification algorithm and keyword matching, which find the user data related to the target emotion tendency and topic. The server analyzes the screened first user data to obtain the relationships and interactions among users. This can be done with social network analysis methods, which analyze the interaction behavior among users, such as likes, comments and replies, to construct a relationship network among the users. A plurality of nodes are created according to the plurality of first users, and a plurality of directed edges are created according to the relationships and interactions among users. Each user is thus represented as a node in the graph, and each interaction between users is represented as a directed edge. A corresponding first community user relationship network is generated from the plurality of nodes and the plurality of directed edges. This is a complex network in which nodes represent users and directed edges represent interactions between users. After the first community user relationship network is obtained, the server performs clustering calculation on it to obtain a clustering result. This can be done with algorithms such as spectral clustering or K-means, which divide the users into different groups, each group representing a user group.
After the importance of each node in the user relationship network has been calculated, the server can invoke a preset graph calculation cluster analysis model and perform user grouping calculation on the first community user relationship network by combining the clustering result with the node importance. In this way, the server divides the users into different groups, each representing a group of users who may share similar interests and points of interest in the community. The server performs key user identification on the plurality of user groups and determines at least one second user corresponding to each user group. Key user identification can be done with social network analysis methods, which find the users with greater influence and higher activity in each user group; these users may be representative of the group and may strongly influence the attitudes and behavior of the group. For example, suppose that the server monitors a community on a social media platform that focuses on environmental protection topics. The server collects the posting and comment data of a plurality of users from the community. The server first screens out the user data related to environmental protection topics and analyzes the data to obtain the interaction relationships among users. The server takes the users as nodes and constructs a user relationship network according to the interaction behavior among the users, where directed edges represent the interaction relationships among users. The server performs clustering calculation on the user relationship network and divides the users into several different groups. After calculating the importance of each node in the user relationship network, the server invokes the preset graph calculation cluster analysis model and performs user grouping calculation on the user relationship network to obtain a plurality of user groups.
The server performs key user identification on the plurality of user groups and finds the users with greater influence and higher activity in each group; these users may be representative of the group and may strongly influence the attitudes and behavior of the group. The server can thereby gain a deeper understanding of the attitudes and points of interest of the different user groups in the community with respect to environmental protection topics, which provides a valuable reference for subsequent public opinion analysis and processing.
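For illustration, the construction of the user relationship network, the node importance calculation, the user grouping and the key user identification might be sketched as below. The sketch makes simplifying assumptions that are not taken from the embodiment: importance is measured by in-degree (how many distinct users interact with a user), and grouping is done by weakly connected components rather than by spectral clustering or K-means. The function name `analyze_network` is hypothetical.

```python
from collections import defaultdict, deque

def analyze_network(interactions):
    """interactions: iterable of (source_user, target_user) directed edges.
    Returns (importance per node, user groups, key user of each group)."""
    nodes, neighbors = set(), defaultdict(set)
    importance = defaultdict(int)
    for src, dst in set(interactions):   # duplicate interactions collapse to one edge
        nodes.update((src, dst))
        importance[dst] += 1             # in-degree as a simple importance score
        neighbors[src].add(dst)          # undirected view, used only for grouping
        neighbors[dst].add(src)
    groups, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:                     # breadth-first search over one component
            user = queue.popleft()
            group.append(user)
            for other in sorted(neighbors[user] - seen):
                seen.add(other)
                queue.append(other)
        groups.append(sorted(group))
    # Key user: highest importance in the group; ties broken by user name.
    key_users = [max(g, key=lambda u: (importance[u], u)) for g in groups]
    return dict(importance), groups, key_users

importance, groups, key_users = analyze_network(
    [("a", "b"), ("c", "b"), ("b", "a"), ("x", "y"), ("z", "y")])
# groups == [["a", "b", "c"], ["x", "y", "z"]], key_users == ["b", "y"]
```

A measure such as PageRank, or a combination of in-degree with activity counts, could replace in-degree without changing the overall structure of the sketch.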
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Acquiring second user data corresponding to at least one second user, and performing feature extraction on the second user data to obtain a plurality of user feature data;
(2) Calculating weight data of each node in the first community user relation network based on the plurality of user characteristic data;
(3) And carrying out relation network weighted analysis on the first community user relation network according to the weight data of each node to generate a corresponding second community user relation network.
Specifically, the server obtains the data of the at least one second user through the preset big data platform or other data sources. The data may include the second user's social media posts, comments, profile information, and the like. The server preprocesses the second user data and extracts features from it. Preprocessing includes data cleaning, noise removal, missing value handling, and the like, to ensure the accuracy and integrity of the data. Feature extraction means extracting meaningful features from the second user's data; text processing techniques, image processing techniques, and the like may be used to convert the raw data into feature vectors that can be analyzed. Based on the plurality of user feature data, the server calculates weight data for each node in the first community user relationship network. The weight of a node may reflect the importance or influence of a user in the community, or other key indicators. The weight calculation may be based on the user's activity, the user's connections in the social network, the user's number of followers, and so on. The server performs relationship network weighted analysis on the first community user relationship network according to the weight data of each node to generate a corresponding second community user relationship network. In the weighted analysis, the weights may be used to adjust the strength of the edges or the weight of the connections between nodes, reflecting the strength of the relationships between users. In this way, the second community user relationship network obtained by the server reflects the relationships and interactions among users more accurately. For example, suppose that the server monitors a community on a social media platform that focuses on a technical topic. The server obtains the data of two second users through the preset big data platform and performs preprocessing and feature extraction on the data. The features include indicators such as the user's posting frequency, number of likes and number of comments.
The server calculates weight data for each node in the first community user relationship network; the weights represent the importance and influence of each user in the community. For example, user A posts frequently and has many followers, so user A has a higher weight; user B posts less frequently and has fewer followers, so user B has a lower weight. The server performs relationship network weighted analysis on the first community user relationship network according to the weight data of each node and generates a corresponding second community user relationship network. In the second community user relationship network, the strength of the edges and the connections between nodes are influenced by the weights and reflect the strength of the relationships and the interactions between users. In this way, the second community user relationship network obtained by the server reflects the relationships between users and the community structure more accurately. These data and analysis results can provide valuable information for subsequent public opinion prediction and processing.
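One possible way to turn user features into node weights and then into weighted edges is sketched below. The embodiment does not give a formula, so everything here is an assumption made for illustration: the node weight is a linear combination of features scaled by the largest value in the community, and each edge strength is multiplied by the average weight of its two endpoints. The function names and the coefficient values are hypothetical.

```python
def node_weights(features, coeffs):
    """features: user -> {feature name: value}; coeffs: feature name -> coefficient.
    Returns a weight per user, scaled so that the largest weight is 1.0."""
    raw = {user: sum(coeffs[name] * values[name] for name in coeffs)
           for user, values in features.items()}
    top = max(raw.values()) or 1.0   # avoid dividing by zero when all scores are 0
    return {user: score / top for user, score in raw.items()}

def weight_edges(edges, weights):
    """edges: (source, target) -> base interaction strength.
    Scales each edge by the average weight of its two endpoint users."""
    return {(u, v): strength * (weights[u] + weights[v]) / 2
            for (u, v), strength in edges.items()}

# Made-up example: user A posts often and has many followers, user B does not.
features = {"A": {"posts": 10, "followers": 100},
            "B": {"posts": 2, "followers": 20}}
weights = node_weights(features, {"posts": 1.0, "followers": 0.1})
second_network = weight_edges({("A", "B"): 3.0}, weights)
```

Here user A receives weight 1.0 and user B weight 0.2, so the edge from A to B with base strength 3.0 ends up with a weighted strength of about 1.8 in the second network.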
In a specific embodiment, as shown in fig. 4, the process of performing step S107 may specifically include the following steps:
S401, carrying out network membership analysis on a second community user relationship network to obtain a target network membership, and constructing a community user relationship matrix according to the target network membership;
S402, inputting a community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises: a plurality of base prediction models and a meta prediction model, wherein the plurality of base prediction models comprise a long short-term memory network, a convolutional neural network and a Transformer model;
S403, carrying out community public opinion evolution prediction on a community user relation matrix through a long short-term memory network to obtain a first public opinion prediction result;
S404, carrying out community public opinion evolution prediction on a community user relation matrix through a convolutional neural network to obtain a second public opinion prediction result;
S405, carrying out community public opinion evolution prediction on a community user relation matrix through a Transformer model to obtain a third public opinion prediction result;
S406, inputting the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into a meta prediction model for result integration to obtain a target public opinion prediction result;
S407, constructing a mapping relation between the public opinion prediction result and the public opinion processing strategy, and matching the target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
Specifically, the server performs network membership analysis on the second community user relationship network. The server uses complex network analysis methods and graph clustering algorithms to identify the nodes that play a central role in the community. By analyzing the relationships between community users, the server obtains the target network membership, that is, which nodes have important roles and influence in the community. The server constructs a community user relationship matrix according to the target network membership. The community user relationship matrix is a two-dimensional matrix whose rows and columns each represent user nodes in the community and whose elements represent the strength of the relationship between nodes. By converting the target network membership into matrix form, the server can better represent and analyze the relationships between community users. The constructed community user relationship matrix is input into a preset community public opinion prediction model. The prediction model is a composite model consisting of a plurality of base prediction models and a meta prediction model. The base prediction models include a long short-term memory network (LSTM), a convolutional neural network (CNN) and a Transformer model. The base prediction models perform differently in different scenarios, and by combining the base prediction models with the meta prediction model the server takes a plurality of prediction results into account. The server performs community public opinion evolution prediction on the community user relationship matrix through the long short-term memory network to obtain a first public opinion prediction result. LSTM is a recurrent neural network suited to sequence data; by running prediction on the sequence of community user relationship matrices, the server obtains the trend of community public opinion over a future period.
The server performs community public opinion evolution prediction on the community user relationship matrix through the convolutional neural network to obtain a second public opinion prediction result. CNN is a deep learning model suited to image and matrix data; the server treats the community user relationship matrix as a two-dimensional image for prediction, and the CNN can capture local features of community public opinion. The server performs community public opinion evolution prediction on the community user relationship matrix through the Transformer model to obtain a third public opinion prediction result. The Transformer model is a neural network based on a self-attention mechanism that processes sequence data effectively; by running prediction on the community user relationship matrix through the Transformer model, the server obtains global features of community public opinion evolution. The first, second and third public opinion prediction results are input into the meta prediction model for result integration to obtain a target public opinion prediction result. The meta prediction model can be a simple linear combination model: by performing weighted fusion of the plurality of prediction results, the server obtains a more accurate public opinion prediction result. The server constructs a mapping relation between public opinion prediction results and public opinion processing strategies and, according to the mapping relation, matches the target public opinion processing strategy corresponding to the target public opinion prediction result. By matching prediction results with corresponding processing strategies, the server can take timely measures to respond to different developments in community public opinion. For example, assume that the server performs public opinion analysis on a certain community of a social media platform.
The server obtains monitoring data of a plurality of first users from the community, including the relationships and interactions between the users. Through complex network analysis methods and graph clustering algorithms, the server obtains the target network membership and determines the nodes that play a core role in the community. The server constructs a community user relationship matrix according to the target network membership. The server inputs the community user relationship matrix into the preset community public opinion prediction model, which includes a plurality of base prediction models such as LSTM, CNN and Transformer. The community user relationship matrix is predicted through these base prediction models to obtain three different public opinion prediction results. The three prediction results are integrated through the meta prediction model to obtain the final target public opinion prediction result. According to the mapping relation between prediction results and public opinion processing strategies, the server takes corresponding measures to respond to the evolution of community public opinion.
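As a rough sketch of the result integration and strategy matching steps, the following treats each base prediction result as a single numeric score, which is an assumption; the embodiment does not specify the output format of the base models. The meta prediction model is shown as a weighted average, the base models themselves are not implemented, and the weights, thresholds and strategy texts in `STRATEGY_MAP` are all hypothetical.

```python
def integrate(predictions, weights):
    """Weighted average of the base-model outputs (a linear meta prediction model).
    predictions and weights are both keyed by base-model name."""
    total = sum(weights[name] for name in predictions)
    return sum(predictions[name] * weights[name] for name in predictions) / total

# Hypothetical mapping: (lower bound of the predicted score, processing strategy),
# ordered from the highest lower bound to the lowest.
STRATEGY_MAP = [
    (0.7, "issue an official response and escalate to staff"),
    (0.4, "increase monitoring frequency"),
    (0.0, "routine monitoring"),
]

def match_strategy(score):
    """Return the strategy of the first band whose lower bound the score reaches."""
    for lower_bound, strategy in STRATEGY_MAP:
        if score >= lower_bound:
            return strategy
    return STRATEGY_MAP[-1][1]   # fallback for scores below every lower bound

# Made-up outputs of the three base prediction models.
predictions = {"lstm": 0.8, "cnn": 0.6, "transformer": 0.7}
weights = {"lstm": 0.5, "cnn": 0.2, "transformer": 0.3}
target_score = integrate(predictions, weights)   # about 0.73
strategy = match_strategy(target_score)
```

In practice the weights of such a linear meta model would typically be fitted on held-out predictions rather than fixed by hand as they are here.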
The community public opinion monitoring method based on big data in the embodiment of the present invention is described above; the community public opinion monitoring device based on big data in the embodiment of the present invention is described below. Referring to fig. 5, one embodiment of the community public opinion monitoring device based on big data includes:
The acquiring module 501 is configured to acquire initial monitoring data of a plurality of first users in a target community through a preset big data platform, and perform data integrity check and data screening on the initial monitoring data to obtain standard monitoring data;
The word segmentation module 502 is configured to perform text word segmentation and part-of-speech tagging on the standard monitoring data to obtain target word segmentation data, and perform named entity recognition on the target word segmentation data to obtain target monitoring data;
The screening module 503 is configured to perform topic analysis on the target monitoring data to obtain a plurality of topic clusters, and perform topic identification and screening on the plurality of topic clusters to obtain a target monitoring topic;
The classification module 504 is configured to perform emotion tendency analysis on the target monitoring data according to the target monitoring topic, so as to obtain a target emotion tendency classification;
The analysis module 505 is configured to perform community relation analysis on the plurality of first users according to the target emotion tendency classification and the target monitoring topic, generate a first community user relation network, and perform key user identification on the first community user relation network to obtain at least one second user;
The processing module 506 is configured to perform a relationship network weighted analysis on the first community user relationship network based on the at least one second user, and generate a corresponding second community user relationship network;
The prediction module 507 is configured to input the second community user relationship network into a preset community public opinion prediction model to perform community public opinion evolution prediction, obtain a target public opinion prediction result, and match a corresponding target public opinion processing strategy according to the target public opinion prediction result.
Optionally, the acquiring module 501 is specifically configured to: acquire initial monitoring data of a plurality of first users in a target community through a preset big data platform, and perform data preprocessing on the initial monitoring data to obtain first monitoring data; calculate a covariance matrix of the first monitoring data, and perform eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and the eigenvector of each first eigenvalue; sort the plurality of first eigenvalues to obtain an eigenvalue sorting result, and screen the plurality of first eigenvalues according to the eigenvalue sorting result to obtain a plurality of second eigenvalues; perform data projection on the first monitoring data according to the plurality of second eigenvalues to obtain reduced-dimension monitoring data; perform data reconstruction on the reduced-dimension monitoring data to obtain second monitoring data, and perform difference analysis on the second monitoring data and the first monitoring data to obtain a target difference degree; and perform data integrity verification according to the target difference degree to obtain a data integrity verification result, and perform data screening according to the data integrity verification result to obtain standard monitoring data.
Optionally, the screening module 503 is specifically configured to: construct a corresponding co-occurrence matrix according to the target monitoring data, and initialize parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word; iteratively update the co-occurrence matrix through the topic analysis model, and decompose the co-occurrence matrix into a distribution probability of the topic clusters and a target probability of the topic corresponding to each word; assign each word to a topic cluster according to the target probability of the topic corresponding to the word to form a plurality of topic clusters; and perform topic relevance analysis on the plurality of topic clusters to obtain the topic relevance of each topic cluster, and perform topic screening on the plurality of topic clusters according to the topic relevance of each topic cluster to obtain a target monitoring topic.
Optionally, the classification module 504 is specifically configured to: grade the emotion tendency of the emotion classification attribute words corresponding to the target monitoring topic to obtain an emotion tendency grade for each emotion classification attribute word; extract attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words; and perform emotion tendency classification on the target monitoring data according to the frequency data and the emotion tendency grades to obtain a target emotion tendency classification.
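A lexicon-based reading of these steps can be sketched as follows. The attribute words, their grades on a scale from -2 to 2, and the 0.5 decision threshold are hypothetical values chosen for illustration; the text does not specify them.

```python
from collections import Counter

# Hypothetical emotion classification attribute words with tendency grades.
EMOTION_GRADES = {
    "angry": -2, "noisy": -1, "dirty": -1,
    "fine": 0,
    "clean": 1, "helpful": 1, "excellent": 2,
}

def classify_emotion(tokens, grades=EMOTION_GRADES, threshold=0.5):
    """Return (label, score) for one segmented text.

    The score is the frequency-weighted mean grade of the attribute
    words found in the text; the threshold is an assumed cutoff.
    """
    counts = Counter(t for t in tokens if t in grades)   # frequency data
    total = sum(counts.values())
    if total == 0:
        return "neutral", 0.0                            # no attribute word present
    score = sum(grades[w] * n for w, n in counts.items()) / total
    if score > threshold:
        label = "positive"
    elif score < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, score
```

For example, `classify_emotion(["the", "lift", "is", "noisy", "and", "dirty"])` averages the grades of "noisy" and "dirty" and returns `("negative", -1.0)`.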
Optionally, the analysis module 505 is specifically configured to: screen, according to the target emotion tendency classification and the target monitoring topic, the first user data of each first user that corresponds to the target emotion tendency classification and the target monitoring topic, and analyze the first user data to obtain the relationships and interactions among users; create a plurality of nodes according to the plurality of first users, and create a plurality of directed edges according to the relationships and interactions among the users; generate a corresponding first community user relation network according to the plurality of nodes and the plurality of directed edges, perform clustering calculation on the first community user relation network to obtain a clustering result, and calculate the importance of each node in the first community user relation network; invoke a preset graph-computing cluster analysis model, and perform user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups; and perform key user identification on the plurality of user groups to determine at least one second user corresponding to each user group.
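The node, edge, grouping and importance steps can be illustrated with a small directed graph. In the sketch below, PageRank stands in for the unspecified node importance measure and weakly connected components stand in for the preset graph-computing cluster analysis model; both choices, and all function names, are assumptions. An edge `(src, dst)` is assumed to mean that user `src` interacted with user `dst`, for example by replying.

```python
from collections import defaultdict

def pagerank(nodes, edges, damping=0.85, n_iter=50):
    """Power-iteration PageRank over a directed edge list."""
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(n_iter):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out[u] or nodes        # a node without out-edges spreads its rank evenly
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank

def weak_components(nodes, edges):
    """Group users that are connected when edge direction is ignored."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:                          # depth-first traversal of one component
            u = stack.pop()
            group.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(group))
    return groups

def key_users(nodes, edges):
    """Pick the highest-ranked user of each group as its key (second) user."""
    rank = pagerank(nodes, edges)
    return {tuple(g): max(g, key=rank.get) for g in weak_components(nodes, edges)}
```

With `nodes = ["a", "b", "c", "d", "e"]` and `edges = [("b", "a"), ("c", "a"), ("e", "d")]`, the two groups are `("a", "b", "c")` and `("d", "e")`, and the users receiving the interactions, "a" and "d", are selected as key users.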
Optionally, the processing module 506 is specifically configured to: acquire second user data corresponding to the at least one second user, and perform feature extraction on the second user data to obtain a plurality of user feature data; calculate weight data of each node in the first community user relation network based on the plurality of user feature data; and perform relation network weighted analysis on the first community user relation network according to the weight data of each node to generate a corresponding second community user relation network.
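One way to picture the weighting step is given below. The text does not define the features or the weighting formula, so this sketch assumes hypothetical numeric features per key user (such as post and follower counts), scales each feature by its maximum, averages the scaled values into a node weight, gives every other node a small default weight, and labels each directed edge with the mean weight of its two endpoints.

```python
def node_weights(nodes, key_features, default=0.1):
    """Weight per node; key_features maps key users to {feature_name: value}.

    The scaling, the averaging and the default weight are illustrative assumptions.
    """
    names = sorted({f for feats in key_features.values() for f in feats})
    # Largest value of each feature, falling back to 1.0 to avoid dividing by zero.
    maxima = {
        f: max(feats.get(f, 0.0) for feats in key_features.values()) or 1.0
        for f in names
    }
    weights = {}
    for u in nodes:
        if u in key_features and names:
            scaled = [key_features[u].get(f, 0.0) / maxima[f] for f in names]
            weights[u] = max(default, sum(scaled) / len(scaled))
        else:
            weights[u] = default              # users that are not key users
    return weights

def weighted_network(edges, weights):
    """Second network: each directed edge carries the mean weight of its endpoints."""
    return {(src, dst): (weights[src] + weights[dst]) / 2.0 for src, dst in edges}
```

Under these assumptions, interactions that involve a key user receive larger edge weights than interactions between ordinary users, which is the effect the weighted analysis is meant to capture.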
Optionally, the prediction module 507 is specifically configured to: perform network membership analysis on the second community user relation network to obtain a target network membership, and construct a community user relation matrix according to the target network membership; input the community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises a plurality of base prediction models and a meta prediction model, and the plurality of base prediction models include a long short-term memory network, a convolutional neural network, and a Transformer model; perform community public opinion evolution prediction on the community user relation matrix through the long short-term memory network to obtain a first public opinion prediction result; perform community public opinion evolution prediction on the community user relation matrix through the convolutional neural network to obtain a second public opinion prediction result; perform community public opinion evolution prediction on the community user relation matrix through the Transformer model to obtain a third public opinion prediction result; input the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into the meta prediction model for result integration to obtain a target public opinion prediction result; and construct a mapping relation between public opinion prediction results and public opinion processing strategies, and match a target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
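The integration and strategy-matching steps follow the usual stacking pattern: several base models each produce a prediction, and a meta model combines them. The sketch below shows only that pattern. The three base models are passed in as plain callables rather than trained networks, the meta model is assumed to be a fixed weighted average, and the prediction labels and strategy texts are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

LABELS: List[str] = ["subsiding", "stable", "escalating"]   # hypothetical prediction classes

# Hypothetical mapping relation between prediction results and processing strategies.
STRATEGIES: Dict[str, str] = {
    "subsiding": "continue routine monitoring",
    "stable": "publish clarifying information",
    "escalating": "contact key users and notify community management",
}

Matrix = Sequence[Sequence[float]]
BaseModel = Callable[[Matrix], Sequence[float]]

def meta_predict(base_outputs: Sequence[Sequence[float]],
                 meta_weights: Sequence[float]) -> List[float]:
    """Assumed meta model: weighted average of base probabilities, renormalized."""
    combined = [
        sum(w * probs[i] for w, probs in zip(meta_weights, base_outputs))
        for i in range(len(LABELS))
    ]
    total = sum(combined)
    return [c / total for c in combined]

def predict_and_match(relation_matrix: Matrix,
                      base_models: Sequence[BaseModel],
                      meta_weights: Sequence[float]) -> Tuple[str, str]:
    """Run every base model, integrate the results, and look up the strategy."""
    base_outputs = [model(relation_matrix) for model in base_models]
    probs = meta_predict(base_outputs, meta_weights)
    label = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
    return label, STRATEGIES[label]
```

In a real system each callable would wrap a trained sequence or convolutional model, and the meta model would itself be fitted on held-out base predictions rather than using fixed weights.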
Through the cooperation of the above components, topic analysis is performed on the target monitoring data to obtain a plurality of topic clusters, and topic identification and screening are performed on the topic clusters to obtain target monitoring topics; emotion tendency analysis is performed according to the target monitoring topics to obtain a target emotion tendency classification; community relation analysis is performed according to the target emotion tendency classification and the target monitoring topics to generate a first community user relation network, and key user identification is performed to obtain at least one second user; relation network weighted analysis is performed on the first community user relation network based on the at least one second user to generate a second community user relation network; and the second community user relation network is input into the community public opinion prediction model for community public opinion evolution prediction to obtain a target public opinion prediction result. Because the initial monitoring data of the plurality of first users in the target community is acquired through the big data platform and is processed and analyzed in real time, public opinion monitoring can follow community public opinion dynamics and event development trends in a more timely manner.
Analyzing a large amount of community public opinion data gives a comprehensive picture of the views, emotion tendencies and focus of attention of the plurality of first users in the community. Predicting the evolution trend of a target public opinion event with a machine learning algorithm helps decision makers take timely countermeasures against public opinion risks. Community relation analysis and key user identification find the users with great influence on public opinion in the community, so that interaction and communication with these users can be more targeted. Matching the output of the prediction model to public opinion processing strategies supports a more reasonable and effective public opinion processing scheme, which improves the accuracy of community public opinion monitoring.
The big data based community public opinion monitoring device in the embodiment of the present invention has been described in detail above from the perspective of modular functional entities with reference to Fig. 5; it is described in detail below from the perspective of hardware processing.
Fig. 6 is a schematic structural diagram of a big data-based community public opinion monitoring device 600 according to an embodiment of the present invention. The big data-based community public opinion monitoring device 600 may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 610 (e.g., one or more processors), a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may include a series of instruction operations on the big data-based community public opinion monitoring device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 and to execute, on the big data-based community public opinion monitoring device 600, the series of instruction operations in the storage medium 630.
The big data-based community public opinion monitoring device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the device structure shown in Fig. 6 does not limit the big data-based community public opinion monitoring device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The invention also provides a community public opinion monitoring device based on big data, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the community public opinion monitoring method based on big data in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile or a volatile computer readable storage medium. The computer readable storage medium stores instructions which, when executed on a computer, cause the computer to perform the steps of the big data based community public opinion monitoring method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (4)
1. A big data-based community public opinion monitoring method, characterized by comprising the following steps:
acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data integrity check and data screening on the initial monitoring data to obtain standard monitoring data; the method specifically comprises the following steps: acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data preprocessing on the initial monitoring data to obtain first monitoring data; calculating a covariance matrix of the first monitoring data, and carrying out eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and eigenvectors of each first eigenvalue; sorting analysis is carried out on the plurality of first eigenvalues to obtain an eigenvalue sorting result, and the plurality of first eigenvalues are screened according to the eigenvalue sorting result to obtain a plurality of second eigenvalues; performing data projection on the first monitoring data according to the plurality of second eigenvalues to obtain reduced-dimension monitoring data; performing data reconstruction on the reduced-dimension monitoring data to obtain second monitoring data, and performing difference analysis on the second monitoring data and the first monitoring data to obtain target difference degree; performing data integrity verification according to the target difference degree to obtain a data integrity verification result, and performing data screening according to the data integrity verification result to obtain standard monitoring data;
performing text word segmentation and part-of-speech tagging on the standard monitoring data to obtain target word segmentation data, and performing named entity recognition on the target word segmentation data to obtain target monitoring data;
performing topic analysis on the target monitoring data to obtain a plurality of topic clusters, and performing topic identification and screening on the topic clusters to obtain target monitoring topics; the method specifically comprises the following steps: constructing a corresponding co-occurrence matrix according to the target monitoring data, and initializing parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word; iteratively updating the co-occurrence matrix through the topic analysis model, and decomposing the co-occurrence matrix into a distribution probability of a topic cluster and a target probability of a topic corresponding to a word; distributing each word to different topic clusters according to the target probability of the topic corresponding to the word to form a plurality of topic clusters; performing topic relevance analysis on the plurality of topic clusters to obtain topic relevance of each topic cluster, and performing topic screening on the plurality of topic clusters according to the topic relevance of each topic cluster to obtain a target monitoring topic;
according to the target monitoring topics, carrying out emotion tendency analysis on the target monitoring data to obtain target emotion tendency classification; the method specifically comprises the following steps: according to the emotion classification attribute words corresponding to the target monitoring topics, emotion tendency grades of the emotion classification attribute words are classified, and emotion tendency grades of the emotion classification attribute words are obtained; extracting attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words; according to the frequency data and the emotion tendency level, performing emotion tendency classification on the target monitoring data to obtain target emotion tendency classification;
According to the target emotion tendency classification and the target monitoring topics, carrying out community relation analysis on the plurality of first users to generate a first community user relation network, and carrying out key user identification on the first community user relation network to obtain at least one second user; the method specifically comprises the following steps: screening first user data corresponding to each first user and the target emotion tendency classification and the target monitoring topic according to the target emotion tendency classification and the target monitoring topic, and analyzing the first user data to obtain the relationship and interaction among users; creating a plurality of nodes according to the plurality of first users, and creating a plurality of directed edges according to the relationship and interaction between the users; generating a corresponding first community user relation network according to the plurality of nodes and the plurality of directed edges, carrying out clustering calculation on the first community user relation network to obtain a clustering result, and calculating the importance of each node in the first community user relation network; invoking a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups; carrying out key user identification on the plurality of user groups, and determining at least one second user corresponding to each user group;
Performing relationship network weighted analysis on the first community user relationship network based on the at least one second user to generate a corresponding second community user relationship network; the method specifically comprises the following steps: acquiring second user data corresponding to at least one second user, and performing feature extraction on the second user data to obtain a plurality of user feature data; calculating weight data of each node in the first community user relation network based on the plurality of user characteristic data; according to the weight data of each node, carrying out relation network weight analysis on the first community user relation network to generate a corresponding second community user relation network;
inputting the second community user relation network into a preset community public opinion prediction model to conduct community public opinion evolution prediction to obtain a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result; the method specifically comprises the following steps: performing network membership analysis on the second community user relationship network to obtain a target network membership, and constructing a community user relationship matrix according to the target network membership; inputting the community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises: a plurality of base prediction models and a meta prediction model, wherein the plurality of base prediction models include a long short-term memory network, a convolutional neural network, and a Transformer model; carrying out community public opinion evolution prediction on the community user relation matrix through the long short-term memory network to obtain a first public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the convolutional neural network to obtain a second public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the Transformer model to obtain a third public opinion prediction result; inputting the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into the meta prediction model for result integration to obtain a target public opinion prediction result; and constructing a mapping relation between the public opinion prediction result and the public opinion processing strategy, and matching a target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
2. A big data-based community public opinion monitoring device, characterized in that the big data-based community public opinion monitoring device comprises:
The acquisition module is used for acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and carrying out data integrity check and data screening on the initial monitoring data to obtain standard monitoring data; the method specifically comprises the following steps: acquiring initial monitoring data of a plurality of first users in a target community through a preset big data platform, and performing data preprocessing on the initial monitoring data to obtain first monitoring data; calculating a covariance matrix of the first monitoring data, and carrying out eigenvalue decomposition on the covariance matrix to obtain a plurality of first eigenvalues and eigenvectors of each first eigenvalue; sorting analysis is carried out on the plurality of first eigenvalues to obtain an eigenvalue sorting result, and the plurality of first eigenvalues are screened according to the eigenvalue sorting result to obtain a plurality of second eigenvalues; performing data projection on the first monitoring data according to the plurality of second eigenvalues to obtain reduced-dimension monitoring data; performing data reconstruction on the reduced-dimension monitoring data to obtain second monitoring data, and performing difference analysis on the second monitoring data and the first monitoring data to obtain target difference degree; performing data integrity verification according to the target difference degree to obtain a data integrity verification result, and performing data screening according to the data integrity verification result to obtain standard monitoring data;
The word segmentation module is used for carrying out text word segmentation and part-of-speech tagging on the standard monitoring data to obtain target word segmentation data, and carrying out named entity recognition on the target word segmentation data to obtain target monitoring data;
The screening module is used for carrying out topic analysis on the target monitoring data to obtain a plurality of topic clusters, and carrying out topic identification and screening on the topic clusters to obtain target monitoring topics; the method specifically comprises the following steps: constructing a corresponding co-occurrence matrix according to the target monitoring data, and initializing parameters of a preset topic analysis model to obtain the number of topic clusters and the initial probability of the topic corresponding to each word; iteratively updating the co-occurrence matrix through the topic analysis model, and decomposing the co-occurrence matrix into a distribution probability of a topic cluster and a target probability of a topic corresponding to a word; distributing each word to different topic clusters according to the target probability of the topic corresponding to the word to form a plurality of topic clusters; performing topic relevance analysis on the plurality of topic clusters to obtain topic relevance of each topic cluster, and performing topic screening on the plurality of topic clusters according to the topic relevance of each topic cluster to obtain a target monitoring topic;
the classification module is used for analyzing emotion tendencies of the target monitoring data according to the target monitoring topics to obtain target emotion tendencies classification; the method specifically comprises the following steps: according to the emotion classification attribute words corresponding to the target monitoring topics, emotion tendency grades of the emotion classification attribute words are classified, and emotion tendency grades of the emotion classification attribute words are obtained; extracting attribute words from the target monitoring data according to the emotion classification attribute words to obtain frequency data of the emotion classification attribute words; according to the frequency data and the emotion tendency level, performing emotion tendency classification on the target monitoring data to obtain target emotion tendency classification;
The analysis module is used for carrying out community relation analysis on the plurality of first users according to the target emotion tendency classification and the target monitoring topics, generating a first community user relation network, and carrying out key user identification on the first community user relation network to obtain at least one second user; the method specifically comprises the following steps: screening first user data corresponding to each first user and the target emotion tendency classification and the target monitoring topic according to the target emotion tendency classification and the target monitoring topic, and analyzing the first user data to obtain the relationship and interaction among users; creating a plurality of nodes according to the plurality of first users, and creating a plurality of directed edges according to the relationship and interaction between the users; generating a corresponding first community user relation network according to the plurality of nodes and the plurality of directed edges, carrying out clustering calculation on the first community user relation network to obtain a clustering result, and calculating the importance of each node in the first community user relation network; invoking a preset graph calculation cluster analysis model, and carrying out user grouping calculation on the first community user relation network according to the clustering result and the importance of each node to obtain a plurality of user groups; carrying out key user identification on the plurality of user groups, and determining at least one second user corresponding to each user group;
The processing module is used for carrying out relation network weighted analysis on the first community user relation network based on the at least one second user and generating a corresponding second community user relation network; the method specifically comprises the following steps: acquiring second user data corresponding to at least one second user, and performing feature extraction on the second user data to obtain a plurality of user feature data; calculating weight data of each node in the first community user relation network based on the plurality of user characteristic data; according to the weight data of each node, carrying out relation network weight analysis on the first community user relation network to generate a corresponding second community user relation network;
The prediction module is used for inputting the second community user relation network into a preset community public opinion prediction model to conduct community public opinion evolution prediction, obtaining a target public opinion prediction result, and matching a corresponding target public opinion processing strategy according to the target public opinion prediction result; the method specifically comprises the following steps: performing network membership analysis on the second community user relationship network to obtain a target network membership, and constructing a community user relationship matrix according to the target network membership; inputting the community user relation matrix into a preset community public opinion prediction model, wherein the community public opinion prediction model comprises: a plurality of base prediction models and a meta prediction model, wherein the plurality of base prediction models include a long short-term memory network, a convolutional neural network, and a Transformer model; carrying out community public opinion evolution prediction on the community user relation matrix through the long short-term memory network to obtain a first public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the convolutional neural network to obtain a second public opinion prediction result; carrying out community public opinion evolution prediction on the community user relation matrix through the Transformer model to obtain a third public opinion prediction result; inputting the first public opinion prediction result, the second public opinion prediction result and the third public opinion prediction result into the meta prediction model for result integration to obtain a target public opinion prediction result; and constructing a mapping relation between the public opinion prediction result and the public opinion processing strategy, and matching a target public opinion processing strategy corresponding to the target public opinion prediction result according to the mapping relation.
3. A big data-based community public opinion monitoring device, characterized in that the big data-based community public opinion monitoring device includes: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the big data-based community public opinion monitoring device to perform the big data-based community public opinion monitoring method of claim 1.
4. A computer readable storage medium having instructions stored thereon, wherein the instructions when executed by a processor implement the big data based community public opinion monitoring method of claim 1.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311431914.9A CN117370678B (en) | 2023-10-31 | 2023-10-31 | Community public opinion monitoring method and related device based on big data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117370678A CN117370678A (en) | 2024-01-09 |
| CN117370678B (en) | 2024-07-16 |
Family
ID=89394423
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311431914.9A Active CN117370678B (en) | 2023-10-31 | 2023-10-31 | Community public opinion monitoring method and related device based on big data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117370678B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118821046B (en) * | 2024-07-04 | 2025-04-04 | 安徽三联学院 | Network public opinion management system and method based on deep learning |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107330557A (en) * | 2017-06-28 | 2017-11-07 | 中国石油大学(华东) | A method and device for tracking and predicting public opinion hotspots based on community division and entropy |
| CN110516067A (en) * | 2019-08-23 | 2019-11-29 | 北京工商大学 | Public opinion monitoring method, system and storage medium based on topic detection |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108959383A (en) * | 2018-05-31 | 2018-12-07 | 平安科技(深圳)有限公司 | Analysis method, device and the computer readable storage medium of network public-opinion |
| CN109684646A (en) * | 2019-01-15 | 2019-04-26 | 江苏大学 | A kind of microblog topic sentiment analysis method based on topic influence |
| CN110929145B (en) * | 2019-10-17 | 2023-07-21 | 平安科技(深圳)有限公司 | Public opinion analysis method, public opinion analysis device, computer device and storage medium |
| CN113569008B (en) * | 2021-07-20 | 2024-11-26 | 南京市栖霞区民政事务服务中心 | A big data analysis method and system based on community governance data |
Also Published As
| Publication number | Publication date |
|---|---|
| CN117370678A (en) | 2024-01-09 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN110909165A (en) | Data processing method, device, medium and electronic equipment | |
| JPWO2006087854A1 (en) | Information classification device, information classification method, information classification program, information classification system | |
| CN111694941B (en) | Reply information determining method and device, storage medium and electronic equipment | |
| WO2020135642A1 (en) | Model training method and apparatus employing generative adversarial network | |
| CN119106322A (en) | User grouping method, device, equipment, storage medium and program product | |
| CN117370678B (en) | Community public opinion monitoring method and related device based on big data | |
| CN112215629B (en) | Multi-target advertisement generating system and method based on construction countermeasure sample | |
| CN113407808A (en) | Method and device for judging applicability of graph neural network model and computer equipment | |
| CN114036293B (en) | Data processing method and device and electronic equipment | |
| CN118861438B (en) | Engineering information digital consultation management method and system based on Internet of things | |
| CN114218354A (en) | Text analysis method, device, computer equipment and storage medium | |
| KR102454261B1 (en) | Collaborative partner recommendation system and method based on user information | |
| KR20230049486A (en) | Political tendency analysis device and service providing method using the same | |
| CN113254788A (en) | Big data based recommendation method and system and readable storage medium | |
| KR102155692B1 (en) | Methods for performing sentiment analysis of messages in social network service based on part of speech feature and sentiment analysis apparatus for performing the same | |
| CN119807351A (en) | Information processing method, device, equipment and storage medium | |
| CN109254993B (en) | Text-based character data analysis method and system | |
| CN112115705B (en) | Screening method and device of electronic resume | |
| Shanthi et al. | A satin optimized dynamic learning model (SODLM) for sentiment analysis using opinion mining | |
| CN115660695A (en) | Customer service personnel label portrait construction method and device, electronic equipment and storage medium | |
| Pramono | Sentiment Analysis of Political Discourse on Platform X using Graph Neural Network (GNN) | |
| Sridhar et al. | Extending Deep Neural Categorisation Models for Recommendations by Applying Gradient Based Learning | |
| Sahu et al. | Combating Hate Speech on Q&A Forums with Machine Learning | |
| KR20210030210A (en) | Patent analysis apparatus for finding technology sustainability | |
| Alghalibi et al. | Deep Tweets Analyzer Model for Twitter Mood Visualization and Prediction Based Deep Learning Approach | |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |