CN106598944A - Civil aviation security public opinion emotion analysis method - Google Patents

Info

Publication number
CN106598944A
Authority
CN
China
Prior art keywords
word
microblogging
text
emotion
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611062208.1A
Other languages
Chinese (zh)
Other versions
CN106598944B (en)
Inventor
韩萍
李杉
贾云飞
牛勇钢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN201611062208.1A
Publication of CN106598944A
Application granted
Publication of CN106598944B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A sentiment analysis method for civil aviation public opinion. The method retrieves, preprocesses, and segments microblog texts on the Internet that contain civil aviation security public opinion keywords; builds dictionaries; scores each microblog to obtain its sentiment score; classifies the microblog as subjective or objective according to the sentiment score and obtains its threat score with respect to civil aviation safety; and determines, from the threat score, the threat level that the statements in the microblog text pose to civil aviation safety. The invention combines text semantics with microblog emoticons to determine the sentiment score of a microblog text, which overcomes the limitations of dictionaries and semantic rules and improves the accuracy of the sentiment score. It makes full use of the characteristics of microblog text, so that the threat level determination is more reasonable. Unlike machine learning methods, the invention does not require training on large-scale labeled data and is therefore better suited to real-time data stream processing.

Description

A sentiment analysis method for civil aviation security public opinion

Technical Field

The invention belongs to the technical field of text sentiment analysis in natural language processing, and in particular relates to a sentiment analysis method for civil aviation security public opinion.

Background Art

In the Internet era of rapidly expanding information, more and more users share their opinions and experiences online, so social networks contain a large number of short texts with subjective emotional coloring. Sina Weibo is an information sharing and communication platform that provides entertainment, leisure, and lifestyle services to the public, and its number of active users currently stays at around 200 million. It inherits the advantages of traditional forums and blogs and, combined with mobile terminals such as mobile phones, allows information to be published and obtained quickly in real time. Weibo integrates entertainment, social networking, and marketing. It has gradually evolved from a service that meets people's "weak tie" social needs into a mass public opinion platform, and has become a highly important real-time information source and an increasingly influential center for the spread of online public opinion. More and more institutions and public figures publish or disseminate information through Weibo.

Sentiment analysis is the process of processing, analyzing, and applying text that carries emotional coloring, and it is a relatively cutting-edge research area in natural language processing. It is a concrete application that draws on many existing research results, and it has significant practical value when combined with microblogs as a new form of online social media. The main purpose of microblog sentiment analysis is to identify subjective information in microblog posts and to mine the views and attitudes that users hold in their comments on products, news, trending events, and similar topics.

In civil aviation, the high degree of freedom of online public opinion has also brought negative effects, such as the posting of false threats, rumors, and extreme language. Analyzing the sentiment tendency of microblog texts related to civil aviation makes it possible to filter out microblogs that threaten civil aviation safety, identify key users with criminal tendencies, and forward them promptly to the relevant public security departments for handling. Text sentiment analysis also has other applications, such as predicting movie box office, stock trends, and market dynamics. Sentiment tendency analysis of microblog text is therefore of great significance.

At present, Chinese text sentiment analysis methods fall into two main classes: methods based on semantic understanding and methods based on machine learning. Both have problems when applied to microblog sentiment analysis: (i) Methods based on semantic understanding build a baseline lexicon of positive and negative words, define expression rules, and pattern-match the corpus against them, which is severely limited when handling microblog text with complex and irregular forms of expression. (ii) Methods based on machine learning are constrained by feature selection and corpus size, are prone to overfitting, and are not suited to real-time processing of large volumes of text.

Summary of the Invention

To solve the above problems, the object of the invention is to provide a sentiment analysis method for civil aviation security public opinion.

To achieve this object, the civil aviation security public opinion sentiment analysis method provided by the invention comprises the following steps, carried out in order:

(1) Retrieve, preprocess, and segment microblog texts on the Internet that contain civil aviation security public opinion keywords;

(2) Build the dictionaries required for semantic analysis of microblog text, either by selecting existing dictionaries or by constructing new ones;

(3) Using the dictionaries built in step (2), score the microblog segmented in step (1) to obtain its sentiment score;

(4) Classify the microblog as subjective or objective according to the sentiment score obtained in step (3), so as to filter out objective microblogs such as news reports and retain subjective microblogs, and finally obtain the microblog's threat score with respect to civil aviation safety;

(5) Determine, from the threat score obtained in step (4), the threat level that the statements in the microblog text pose to civil aviation safety, then select the key persons with a high threat level and report them to the relevant departments as early-warning information.

In step (1), the retrieval, preprocessing, and segmentation of microblog texts on the Internet that contain civil aviation security public opinion keywords proceeds as follows. Microblog texts on the Internet that contain civil aviation security public opinion keywords are crawled, and keywords related to civil aviation security public opinion are retrieved from them. The keywords fall into two classes, location words and behavior words, and there are two retrieval strategies, single-word retrieval and combined retrieval. The retrieval results are then preprocessed to remove noise such as web links, the user nicknames that appear when reposting or replying, topic hashtags, and special characters, and to extract emoticons. Finally, the preprocessed results are segmented with a word segmentation tool; the tool used is Ansj, an open-source Java word segmenter.

In step (2), the dictionaries comprise a sentiment dictionary, a negation dictionary, a modifier dictionary, a conjunction dictionary, an emoticon dictionary, an Internet buzzword dictionary, and a civil aviation security public opinion dictionary.

In step (3), scoring the microblog segmented in step (1) with the dictionaries built in step (2), to obtain the sentiment score of the microblog, comprises the following steps:

1) Extract or determine the sentiment words in the microblog text segmented in step (1):

Sentiment words are extracted by matching the words obtained from segmentation against the sentiment dictionary and the Internet buzzword dictionary; a word that appears in either dictionary is selected as a sentiment word;

Words that appear in neither the sentiment dictionary nor the Internet buzzword dictionary are assessed by semantic similarity. Specifically, for two words $w_1$ and $w_2$, if $w_1$ has $n$ senses or concepts $x_1, x_2, \ldots, x_n$ and $w_2$ has $m$ senses or concepts $y_1, y_2, \ldots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum similarity over all pairs of senses or concepts, that is:

$$\mathrm{Sim}(w_1, w_2) = \max_{i = 1, \ldots, n;\ j = 1, \ldots, m} \mathrm{Sim}(x_i, y_j) \tag{1}$$

The similarity of two sememes is computed as:

$$\mathrm{Sim}(x_1, y_2) = \frac{\lambda}{d(x_1, y_2) + \lambda} \tag{2}$$

where $\lambda$ is a positive adjustable parameter and $d(x_1, y_2)$ is the distance between sememe $x_1$ and sememe $y_2$ in the hierarchy tree;
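The two similarity definitions can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the sememe similarity takes the common HowNet-style form λ / (d + λ), treats each sense as a single node in the hierarchy tree, and uses a hypothetical default λ = 1.6 and a hypothetical toy distance table in place of a real HowNet lookup.

```python
from itertools import product
from typing import Callable, Sequence


def sememe_similarity(distance: float, lam: float = 1.6) -> float:
    """Assumed form of formula (2): lam / (distance + lam), with lam > 0."""
    if lam <= 0 or distance < 0:
        raise ValueError("lam must be positive and distance non-negative")
    return lam / (distance + lam)


def word_similarity(
    senses_1: Sequence[str],
    senses_2: Sequence[str],
    distance: Callable[[str, str], float],
    lam: float = 1.6,
) -> float:
    """Formula (1): the maximum similarity over all pairs of senses."""
    return max(
        sememe_similarity(distance(x, y), lam)
        for x, y in product(senses_1, senses_2)
    )


# Hypothetical tree distances between senses; a real system would derive
# them from the HowNet sememe hierarchy.
TOY_DISTANCE = {
    ("happy", "glad"): 0.0,
    ("content", "glad"): 1.6,
    ("happy", "sad"): 4.8,
    ("content", "sad"): 3.2,
}


def toy_distance(x: str, y: str) -> float:
    return TOY_DISTANCE[(x, y)]
```

Because the word-level score is a maximum, a single close pair of senses is enough to make two words similar, even if their other senses are far apart.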

The similarity between a word $w$ and each seed word in the positive sentiment dictionary is computed with formulas (1) and (2), giving the similarity of $w$ to the positive seed words; the similarity between $w$ and each seed word in the negative sentiment dictionary is computed in the same way, giving the similarity of $w$ to the negative seed words. The difference between the two mean similarities gives the sentiment tendency value of $w$:

$$S_w = \frac{1}{k} \sum_{i=1}^{k} \mathrm{Sim}(w, p_i) - \frac{1}{l} \sum_{j=1}^{l} \mathrm{Sim}(w, n_j) \tag{3}$$

where $p_i$ denotes a positive sentiment seed word, $n_j$ denotes a negative sentiment seed word, and $k$ and $l$ are the numbers of positive and negative seed words. The sentiment tendency value $S_w$ lies in $(-1, 1)$. A threshold $T$ is set, and $S_w$ is compared with $T$ to decide whether $w$ is a sentiment word: when $|S_w| > T$, $w$ is judged to be a sentiment word and its strength is set to $10 \cdot S_w$;
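The seed-word comparison can be sketched as below. This is a hedged illustration: it assumes the "mean difference" is the mean similarity to the positive seeds minus the mean similarity to the negative seeds, and it takes the similarity function as a parameter. The seed words, similarity values, and the threshold 0.3 used in the usage note are hypothetical; the source does not give a value for T.

```python
from typing import Callable, Optional, Sequence


def sentiment_tendency(
    word: str,
    positive_seeds: Sequence[str],
    negative_seeds: Sequence[str],
    sim: Callable[[str, str], float],
) -> float:
    """Assumed form of S_w: mean positive similarity minus mean negative similarity."""
    pos_mean = sum(sim(word, p) for p in positive_seeds) / len(positive_seeds)
    neg_mean = sum(sim(word, n) for n in negative_seeds) / len(negative_seeds)
    return pos_mean - neg_mean


def sentiment_strength(s_w: float, threshold: float) -> Optional[float]:
    """Return the strength 10 * S_w when |S_w| > T, else None (not a sentiment word)."""
    return 10 * s_w if abs(s_w) > threshold else None


# Hypothetical similarities of one candidate word to four seed words.
TOY_SIM = {"good": 0.8, "great": 0.6, "bad": 0.2, "awful": 0.2}


def toy_sim(word: str, seed: str) -> float:
    return TOY_SIM[seed]
```

With these toy values, `sentiment_tendency("nice", ["good", "great"], ["bad", "awful"], toy_sim)` is about 0.5, so with a threshold of 0.3 the word would be kept as a positive sentiment word of strength about 5.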

2) Determine the text sentiment score of each microblog clause that contains such a sentiment word;

2.1) If a microblog clause contains a sentiment word that is preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the text sentiment score $S_a$ of the clause is computed according to the following cases:

a) Degree adverb + sentiment word: the strength of the sentiment word changes with the strength of the adverb, and the text sentiment score is:

$$S_a = M_a \cdot p_s \cdot p_a \tag{4}$$

b) Negation word + sentiment word: the polarity of the sentiment word changes according to the number of negation words, and the text sentiment score is:

$$S_a = (-1)^n \cdot p_s \cdot p_a \tag{5}$$

c) Degree adverb + negation word + sentiment word: the polarity of the sentiment word is reversed and its strength changes with the strength of the adverb, and the text sentiment score is:

$$S_a = (-1) \cdot M_a \cdot p_s \cdot p_a \tag{6}$$

d) Negation word + degree adverb + sentiment word: because the negation precedes the degree adverb, the polarity of the sentiment word is reversed but its strength is weaker than under direct negation. A first weight factor $z_1 = 0.5$ is introduced, and the text sentiment score is:

$$S_a = (-1) \cdot M_a \cdot p_s \cdot p_a \cdot z_1 \tag{7}$$

where $p_s$ denotes the strength of the sentiment word, $p_a$ its polarity, $M_a$ the strength of the degree adverb, and $n$ the number of negation words.
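Formulas (4) to (7) can be sketched as a single function that inspects the modifiers preceding a sentiment word. The representation of modifiers as `("deg", strength)` and `("neg", None)` pairs, the treatment of polarity as +1 or -1, the fallback to p_s · p_a when there is no modifier, and the adverb strength 2.0 in the usage note are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Z1 = 0.5  # first weight factor from formula (7)

Modifier = Tuple[str, Optional[float]]  # ("deg", strength) or ("neg", None)


def clause_score(ps: float, pa: int, preceding: List[Modifier]) -> float:
    """Score one sentiment word given the modifiers that precede it, in text order.

    ps: strength of the sentiment word; pa: its polarity (+1 or -1, assumed).
    """
    kinds = [kind for kind, _ in preceding]
    degrees = [value for kind, value in preceding if kind == "deg"]
    n_neg = kinds.count("neg")
    base = ps * pa
    if not degrees:
        # Formula (5); with no negation this reduces to ps * pa (assumed default).
        return (-1) ** n_neg * base
    ma = degrees[0]  # assumption: only the first degree adverb is used
    if n_neg == 0:
        return ma * base                     # formula (4)
    if kinds.index("deg") < kinds.index("neg"):
        return -1 * ma * base                # formula (6)
    return -1 * ma * base * Z1               # formula (7)
```

For a positive word of strength 5 and a hypothetical adverb strength of 2.0, "adverb + negation" gives -10.0 while "negation + adverb" gives -5.0, which reflects the weaker reversal described for formula (7).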

2.2) If a microblog clause contains a conjunction from the conjunction dictionary, it belongs to a compound sentence. To account for the shift of sentiment polarity between clauses, the text sentiment score is computed according to the following cases:

a) Adversative relation: when words that reverse the meaning, such as "但是" (but) or "然而" (however), appear in a microblog clause, the polarity of the preceding clause changes and the overall polarity of the two clauses is the same as that of the later clause. A second weight factor $z_2 = -1$ is introduced, and the text sentiment score is:

$$Sen = z_2 \, Sen_1 + Sen_2 \tag{8}$$

b) Progressive relation: the two clauses have the same polarity and the strength is increased. A third weight factor $z_3 = 1.5$ is introduced, and the text sentiment score is:

$$Sen = z_3 \, (Sen_1 + Sen_2) \tag{9}$$

c) Concessive relation: the polarity of the later clause is reversed and the polarity of the whole sentence is the same as that of the preceding clause. A fourth weight factor $z_4 = -1$ is introduced, and the text sentiment score is:

$$Sen = Sen_1 + z_4 \, Sen_2 \tag{10}$$

where $Sen_1$ denotes the text sentiment score of the preceding microblog clause and $Sen_2$ that of the later microblog clause;
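The three compound-sentence rules, formulas (8) to (10), translate directly into a small function. The relation labels `"adversative"`, `"progressive"`, and `"concessive"` are names chosen here for illustration; detecting the relation from the conjunction dictionary is left out of this sketch.

```python
Z2 = -1.0  # second weight factor, adversative relation
Z3 = 1.5   # third weight factor, progressive relation
Z4 = -1.0  # fourth weight factor, concessive relation


def compound_score(sen1: float, sen2: float, relation: str) -> float:
    """Combine the scores of two adjacent clauses joined by a conjunction."""
    if relation == "adversative":
        return Z2 * sen1 + sen2        # formula (8)
    if relation == "progressive":
        return Z3 * (sen1 + sen2)      # formula (9)
    if relation == "concessive":
        return sen1 + Z4 * sen2        # formula (10)
    raise ValueError(f"unknown relation: {relation!r}")
```

For example, a preceding clause scored 3 and a later clause scored -5 combine to -8 under an adversative conjunction, so the whole sentence follows the later clause.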

3) Determine the emoticon score of the microblog;

Using the emoticon dictionary, look up the polarity and strength of every emoticon in the microblog and record how many times each emoticon occurs. Let $N_i$ be the number of occurrences of the $i$-th emoticon, $e_i$ its strength, and $p_i$ its polarity; the emoticon score $score_{emo}$ of the microblog is computed from these quantities by formula (11).
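Formula (11) itself is not reproduced in this text, so the sketch below makes an explicit assumption: it takes the emoticon score to be the count-weighted sum of polarity times strength over the emoticons found in the dictionary. A variant normalized by the total emoticon count would be equally consistent with the definitions of N_i, e_i, and p_i. The dictionary entries and their values are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical emoticon dictionary: name -> (polarity, strength).
EMOTICON_DICT: Dict[str, Tuple[int, float]] = {
    "强": (1, 3.0),
    "怒": (-1, 5.0),
}


def emoticon_score(
    emoticons: Iterable[str], lexicon: Dict[str, Tuple[int, float]]
) -> float:
    """Assumed form of formula (11): sum over emoticons of N_i * p_i * e_i."""
    counts = Counter(emoticons)
    return sum(
        n * lexicon[name][0] * lexicon[name][1]
        for name, n in counts.items()
        if name in lexicon  # emoticons missing from the dictionary are ignored
    )
```

With the toy dictionary, two "强" and one "怒" give 2 · 3.0 - 5.0 = 1.0.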

4) The text sentiment score and the emoticon score of the microblog are combined in a weighted sum to obtain the sentiment score of each microblog:

$$S_1 = \alpha \cdot score_{emo} + \beta \cdot score_{text} \tag{12}$$

where $\alpha$ and $\beta$ are adjustable weights in $(0, 1)$ with $\alpha + \beta = 1$; the values of $\alpha$ and $\beta$ that maximize the probability of correct classification can be selected by validation on a cross-test set; and $score_{text}$ is the text sentiment score of the microblog, namely the average of the text sentiment scores of its clauses.
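A minimal sketch of formula (12) follows. The default `alpha=0.5` is a hypothetical placeholder, since the source selects α and β by validation; treating a microblog with no scored clauses as having a text score of 0 is also an assumption.

```python
from typing import Sequence


def microblog_sentiment(
    clause_scores: Sequence[float], score_emo: float, alpha: float = 0.5
) -> float:
    """Formula (12): S1 = alpha * score_emo + beta * score_text, with beta = 1 - alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    beta = 1 - alpha
    # score_text is the average of the clause-level text sentiment scores.
    score_text = sum(clause_scores) / len(clause_scores) if clause_scores else 0.0
    return alpha * score_emo + beta * score_text
```

For clause scores of -4.0 and -6.0 and an emoticon score of 2.0, equal weights give 0.5 · 2.0 + 0.5 · (-5.0) = -1.5.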

In step (4), the microblog is classified as subjective or objective according to the sentiment score obtained in step (3), so as to filter out objective microblogs such as news reports and retain subjective microblogs, and the microblog's threat score with respect to civil aviation safety is finally obtained, as follows:

First, the microblog text is classified as subjective or objective:

1) A microblog with sentiment score $S_1 = 0$ is considered subjective if it contains a first-person noun or pronoun; otherwise it is objective;

2) A microblog with sentiment score $S_1 \neq 0$ is considered objective if it contains predicate words characteristic of news reports, or if the number of reposts in the microblog text is at least 2; otherwise it is subjective;
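The two rules can be sketched as one predicate. The word lists and the repost marker `//@` are hypothetical stand-ins: the source does not list the first-person words or news predicates it uses, nor how reposts are counted. The sketch also works on the raw text by substring matching rather than on segmented tokens, so repost markers must be counted before preprocessing strips them.

```python
# Hypothetical word lists; the source does not enumerate them.
FIRST_PERSON = ("我", "我们", "咱", "咱们", "本人")
NEWS_PREDICATES = ("报道", "据悉", "通报", "获悉")
REPOST_MARKER = "//@"  # assumed marker of one repost level in raw Weibo text


def is_subjective(raw_text: str, s1: float) -> bool:
    """Apply the subjective/objective rules to one microblog."""
    if s1 == 0:
        # Rule 1: neutral score, subjective only if first-person words occur.
        return any(word in raw_text for word in FIRST_PERSON)
    # Rule 2: non-zero score, objective if it reads like news or is a repost chain.
    looks_like_news = any(word in raw_text for word in NEWS_PREDICATES)
    repost_count = raw_text.count(REPOST_MARKER)
    return not (looks_like_news or repost_count >= 2)
```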

The threat score of an objective microblog text is set to 0 without further calculation; the threat score is computed only for subjective microblogs, using formula (13), in which $D$ denotes the threat score, with range $[-10, 10]$; $S_1$ denotes the sentiment score of the microblog text; and $S_2\langle w_1, w_2\rangle$ is the civil aviation security public opinion threat score, where $w_1$ denotes a location word and $w_2$ denotes a behavior word;

The civil aviation security public opinion threat score $S_2\langle w_1, w_2\rangle$ is computed as follows. The behavior word $w_2$ in the microblog text is found and its type is determined. If the behavior word is of the direct type, $S_2\langle w_1, w_2\rangle$ takes the strength of that behavior word. If the behavior word is of the indirect type, the microblog text is checked for a co-occurring location word: if one is present, $S_2\langle w_1, w_2\rangle$ takes the strength of the behavior word; if not, $S_2\langle w_1, w_2\rangle$ is 0.
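The lookup of the threat score S2 can be sketched as follows. The location and behavior words are taken from the examples in the description, but the direct or indirect type and the strength assigned to each behavior word are hypothetical, and taking the maximum when several behavior words occur is an assumption the source does not address.

```python
from typing import Dict, Sequence, Tuple

LOCATION_WORDS = {"机场", "跑道", "航站楼", "航班"}

# Behavior word -> (type, strength). Types and strengths are hypothetical.
BEHAVIOR_WORDS: Dict[str, Tuple[str, float]] = {
    "炸机": ("direct", 9.0),
    "劫机": ("direct", 9.0),
    "斗殴": ("indirect", 5.0),
    "抗议": ("indirect", 3.0),
}


def security_threat_score(tokens: Sequence[str]) -> float:
    """Return S2 for one segmented microblog."""
    has_location = any(token in LOCATION_WORDS for token in tokens)
    best = 0.0
    for token in tokens:
        if token in BEHAVIOR_WORDS:
            kind, strength = BEHAVIOR_WORDS[token]
            # Direct words always count; indirect words need a location word.
            if kind == "direct" or has_location:
                best = max(best, strength)
    return best
```

Under these toy values, "斗殴" alone scores 0, while "机场" together with "斗殴" scores 5.0.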

In step (5), the threat level that the statements in the microblog text pose to civil aviation safety is determined from the threat score obtained in step (4) as follows:

When the threat score $D > 0$, the microblog text expresses positive emotion and is safe speech, so no threat level is assigned. When $D \le 0$, the microblog text is judged to contain civil aviation security public opinion keywords and to express negative emotion, and it needs close attention; its threat level is then determined by the following criteria, which were obtained by testing on existing microblog texts:

1) $-4.5 \le D \le 0$: low threat;

2) $-7 \le D < -4.5$: medium threat;

3) $-10 \le D < -7$: high threat.
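The banding can be sketched as a simple threshold function. It takes the high band to be -10 ≤ D < -7, which is what the adjacent low and medium bands and the stated range of D imply; returning `None` for positive scores and the English level names are choices made for this illustration.

```python
from typing import Optional


def threat_level(d: float) -> Optional[str]:
    """Map a threat score D in [-10, 10] to a threat level."""
    if not -10 <= d <= 10:
        raise ValueError("threat score must lie in [-10, 10]")
    if d > 0:
        return None       # positive sentiment: safe speech, no level assigned
    if d >= -4.5:
        return "low"      # -4.5 <= D <= 0
    if d >= -7:
        return "medium"   # -7 <= D < -4.5
    return "high"         # -10 <= D < -7
```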

The civil aviation security public opinion sentiment analysis method provided by the invention has the following advantages: (1) The invention combines text semantics with microblog emoticons to determine the sentiment score of a microblog text, which overcomes the limitations of dictionaries and semantic rules and improves the accuracy of the sentiment score. (2) The threat score of a microblog text is computed on the basis of its sentiment score and a threat level is derived from it, which improves the early-warning capability of civil aviation public security departments. (3) The method makes full use of the characteristics of microblog text, so that the threat level determination is more reasonable. (4) Unlike machine learning methods, the invention does not require training on large-scale labeled data and is therefore better suited to real-time data stream processing.

Description of the Drawings

Fig. 1 is a flow chart of the civil aviation security public opinion sentiment analysis method provided by the invention.

Fig. 2 is a flow chart of the sentiment score calculation method of the invention.

Detailed Description

The civil aviation security public opinion sentiment analysis method provided by the invention is described in detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1, the civil aviation security public opinion sentiment analysis method provided by the invention comprises the following steps, carried out in order:

(1) Retrieve, preprocess, and segment microblog texts on the Internet that contain civil aviation security public opinion keywords;

Microblog texts on the Internet that contain civil aviation security public opinion keywords are crawled as the objects of analysis, and keywords related to civil aviation security public opinion are retrieved from them. The keywords fall into two classes, location words and behavior words. To make data acquisition comprehensive, two retrieval strategies are used: single-word retrieval and combined retrieval. Single-word retrieval searches each class of words separately; location words include 机场 (airport), 跑道 (runway), 航站楼 (terminal), and 航班 (flight), and behavior words include 炸机 (bombing an aircraft), 劫机 (hijacking), 空闹 (in-flight disturbance), 斗殴 (brawling), and 抗议 (protest). Combined retrieval uses a location + behavior pattern, for example "机场+炸弹" (airport + bomb), "航班+爆炸" (flight + explosion), and "机场+劫机" (airport + hijacking). The retrieval results are stored in a database;

In addition, to improve system efficiency, a "whitelist" of microblog users is set up, consisting of the official microblogs of airport public security bureaus and the microblogs of news portals. These users frequently post microblogs that contain civil aviation security public opinion keywords but are outside the scope of early-warning monitoring, so their posts are excluded during keyword retrieval.
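The two retrieval strategies and the whitelist filter can be sketched as below. The word lists are the examples given above; the post structure (a dict with `user` and `text` keys), the whitelist account names, and the use of plain substring matching in place of a real search backend are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple, Union

LOCATION_WORDS = ("机场", "跑道", "航站楼", "航班")
BEHAVIOR_WORDS = ("炸机", "劫机", "空闹", "斗殴", "抗议")

Query = Union[str, Tuple[str, str]]

# Hypothetical whitelist of official police and news-portal accounts.
WHITELIST: Set[str] = {"某机场公安", "某新闻门户"}


def single_word_queries() -> List[Query]:
    """Single-word retrieval: every location word and every behavior word."""
    return list(LOCATION_WORDS) + list(BEHAVIOR_WORDS)


def combined_queries() -> List[Query]:
    """Combined retrieval: every location + behavior pair."""
    return [(loc, act) for loc in LOCATION_WORDS for act in BEHAVIOR_WORDS]


def matches(text: str, query: Query) -> bool:
    """A post matches when it contains the word, or both words of a pair."""
    if isinstance(query, str):
        return query in text
    return all(term in text for term in query)


def retrieve(
    posts: List[Dict[str, str]], queries: List[Query], whitelist: Set[str]
) -> List[Dict[str, str]]:
    """Keep posts that match any query and whose author is not whitelisted."""
    return [
        post
        for post in posts
        if post["user"] not in whitelist
        and any(matches(post["text"], query) for query in queries)
    ]
```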

The retrieval results are then preprocessed to remove noise in the microblog text that is unrelated to emotional expression, for example: (1) Web links such as "http://t.cn/Rtj0WWN", which carry no useful information and are removed during preprocessing. (2) User nicknames, topic hashtags, and special characters that appear when reposting or replying, as in "@李琼子:回复@草图匠老王:这事儿归精神病院管警察叔叔不管", where the user names following the @ symbol must be removed. Emoticons are also extracted. In crawled microblog text, emoticons appear as text in square brackets, as in "这就是口碑和效果的见证[强]" ("this is proof of reputation and results [强]"); the text inside the square brackets is extracted and used to compute the emoticon's sentiment value.
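The preprocessing described above can be sketched with regular expressions. The patterns are assumptions about what the noise looks like, not the patent's own rules: every bracketed span is treated as an emoticon, a mention is assumed to end at a colon or whitespace, and hashtags are assumed to use the paired `#topic#` form. Word segmentation with Ansj is a separate Java step and is not shown.

```python
import re
from typing import List, Tuple

URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&%_\-]+")
REPLY_RE = re.compile(r"回复@[^::\s]+[::]?")   # "回复@user:" reply prefix
MENTION_RE = re.compile(r"@[^::\s@]+[::]?")    # "@user:" or "@user"
HASHTAG_RE = re.compile(r"#[^#]+#")             # paired "#topic#" tags
EMOTICON_RE = re.compile(r"\[([^\[\]]+)\]")     # "[强]" style emoticons


def preprocess(text: str) -> Tuple[str, List[str]]:
    """Return the cleaned text and the list of emoticon names found in it."""
    emoticons = EMOTICON_RE.findall(text)
    text = EMOTICON_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    text = REPLY_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return text.strip(), emoticons
```

Emoticons are extracted before anything is deleted so that their names remain available for the emoticon score.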

The preprocessed results are then segmented with a word segmentation tool; the tool used is Ansj, an open-source Java word segmenter.

(2) Build the dictionaries required for semantic analysis of microblog text, either by selecting existing dictionaries or by constructing new ones. The dictionaries are composed as follows:

1) Sentiment dictionary: the mainstream sentiment dictionaries at present include the National Taiwan University NTUSD dictionary, the HowNet Chinese and English sentiment dictionary, and the Dalian University of Technology affective lexicon ontology. Among these, the Dalian University of Technology affective lexicon ontology divides words into three classes (positive, negative, and neutral) and labels sentiment polarity with five strength levels (1, 3, 5, 7, 9), which is convenient for computing the sentiment value of microblog text. It is therefore chosen as the sentiment dictionary of the invention.

2) Negation dictionary: when a negation word appears before a sentiment word, the sentiment polarity is reversed. The invention collects 19 commonly used negation words to form the negation dictionary: 不、没、无、非、莫、弗、毋、未、否、别、无非、不够、不是、不曾、未必、没有、不要、难以、未曾.

3) Modifier dictionary: under the semantic rules of Chinese grammar, the strength of a sentiment value is directly related to the modifying effect of degree adverbs, so modifiers need to be one of the judgment rules. The modifier dictionary of the invention uses the Chinese degree-level words from HowNet, which are divided into 6 levels, as shown in Table 1.

Table 1. Modifier dictionary examples

4) Conjunction dictionary: In a compound sentence, conjunctions change sentiment polarity and therefore affect the polarity judgment. The present invention selects conjunctions that express adversative, progressive, and concessive relations to form the conjunction dictionary. The adversative conjunctions are: 但是, 但, 然而, 而, 可是, 不过, 就是, 只是, 可. The progressive conjunctions are: 而且, 甚至, 以至, 并且, 尤其. The concessive conjunctions are: 即使, 尽管, 哪怕, 就算, 纵然.

5) Emoticon dictionary: Sina Weibo provides a large number of emoticons. The present invention selects the built-in Sina Weibo emoticons, classifies their sentiment polarity as positive or negative, and manually labels each with an intensity, as shown in Table 2.

Table 2. Emoticon dictionary examples

6) Internet buzzword dictionary: As a social medium, Weibo uses informal, colloquial language, so internet slang appears frequently. These popular words are not included in traditional sentiment dictionaries, so they need to be added. Wangci.net (http://wangci.net) maintains a lexicon of popular internet terms with definitions. The present invention crawls 494 popular internet words from that site and labels their sentiment polarity and intensity by the same criteria as the sentiment dictionary, forming an internet buzzword dictionary that supplements the sentiment dictionary.

7) Civil aviation security public opinion dictionary: The location words and behavior words among the civil aviation security public opinion keywords form this dictionary. In addition to the base words, the present invention collects homophone misspellings of common characters to make the dictionary more complete. For example, 诈 is a homophone misspelling of 炸 (to bomb) and 截 is a homophone misspelling of 劫 (to hijack), so the dictionary also includes words such as 诈机, 诈弹, and 截机.
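The homophone expansion described above can be sketched as a simple character substitution. This is a minimal illustration, not the patent's implementation: the function name `expand_with_typos` and the base word list are hypothetical, and the typo map contains only the two pairs named in the text.

```python
# Homophone misspellings named in the text: 炸 -> 诈, 劫 -> 截.
HOMOPHONE_TYPOS = {"炸": "诈", "劫": "截"}


def expand_with_typos(base_words, typo_map=HOMOPHONE_TYPOS):
    """Return the base words followed by their homophone-misspelled variants."""
    expanded = list(base_words)
    for word in base_words:
        for correct, typo in typo_map.items():
            if correct in word:
                variant = word.replace(correct, typo)
                if variant not in expanded:
                    expanded.append(variant)
    return expanded


# Hypothetical base behavior words (bomb an aircraft, bomb, hijack).
behavior_words = ["炸机", "炸弹", "劫机"]
lexicon = expand_with_typos(behavior_words)
```

With these inputs the lexicon gains the three misspelled forms given as examples in the text (诈机, 诈弹, 截机).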

(3) Using the dictionaries built in step (2), score each microblog segmented in step (1) to obtain its sentiment score.

The specific steps are shown in Figure 2.

1) Extract or determine the sentiment words in the microblog text segmented in step (1):

Match each word obtained by segmenting the microblog text against the sentiment dictionary and the internet buzzword dictionary. A word found in either dictionary is selected as a sentiment word.

If a word appears in neither the sentiment dictionary nor the internet buzzword dictionary, semantic similarity is used to decide whether it is a sentiment word. To reduce computation, only nouns, verbs, and adjectives are kept as candidate sentiment words. The present invention uses the HowNet semantic similarity algorithm as the baseline, since it performs well at measuring the similarity of two words. Specifically, for two words $w_1$ and $w_2$, if $w_1$ has $n$ senses or concepts $x_1, x_2, \ldots, x_n$ and $w_2$ has $m$ senses or concepts $y_1, y_2, \ldots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum similarity over all pairs of senses or concepts:

$$\mathrm{Sim}(w_1, w_2) = \max_{i \in [1,n],\ j \in [1,m]} \mathrm{Sim}(x_i, y_j) \tag{1}$$

The similarity of two sememes is computed as:

$$\mathrm{Sim}(x_1, y_2) = \frac{\lambda}{\lambda + d(x_1, y_2)} \tag{2}$$

where $\lambda$ is a positive adjustable parameter and $d(x_1, y_2)$ is the distance between sememe $x_1$ and sememe $y_2$ in the hierarchy tree.

For any word, a sentiment tendency value can be obtained from its similarity to the seed words in the sentiment dictionary. The similarity between word $w$ and each seed word in the positive sentiment dictionary is computed with equations (1) and (2), giving the word's similarity to the positive seed words. The same is done for each seed word in the negative sentiment dictionary. The difference between the two mean similarities is the sentiment tendency value of $w$:

$$S_w = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Sim}(w, p_i) - \frac{1}{m} \sum_{j=1}^{m} \mathrm{Sim}(w, n_j) \tag{3}$$

where $p_i$ is a positive sentiment seed word and $n_j$ is a negative sentiment seed word. The sentiment tendency value $S_w$ lies in $(-1, 1)$. A threshold $T$ is set, and $S_w$ is compared with $T$ to decide whether $w$ is a sentiment word: when $|S_w| > T$, $w$ is judged to be a sentiment word. Its intensity is set to $10 \cdot S_w$ so that it is on the same scale as the intensities in the sentiment dictionary.
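The similarity-based decision can be sketched in a few lines. This is an illustrative sketch only: it treats each sense as a single sememe, takes the tree distance from a caller-supplied function, and uses a made-up three-node distance table. The value of λ and the threshold T are not given in the text, so the values below are assumptions.

```python
def sememe_similarity(distance, lam):
    """Equation (2): similarity of two sememes at a given tree distance."""
    return lam / (lam + distance)


def word_similarity(senses_a, senses_b, distance_fn, lam):
    """Equation (1): best similarity over all sense pairs of two words."""
    return max(
        sememe_similarity(distance_fn(x, y), lam)
        for x in senses_a
        for y in senses_b
    )


def tendency(word_senses, pos_seeds, neg_seeds, distance_fn, lam):
    """Equation (3): mean similarity to positive seeds minus mean to negative seeds."""
    pos = sum(word_similarity(word_senses, s, distance_fn, lam) for s in pos_seeds)
    neg = sum(word_similarity(word_senses, s, distance_fn, lam) for s in neg_seeds)
    return pos / len(pos_seeds) - neg / len(neg_seeds)


def sentiment_strength(s_w, threshold):
    """Return 10 * S_w when |S_w| exceeds the threshold T, otherwise None."""
    return 10 * s_w if abs(s_w) > threshold else None


# Toy data (hypothetical): tree distances between three made-up sememes.
TOY_DISTANCES = {
    frozenset(("good", "bad")): 3,
    frozenset(("good", "fine")): 1,
    frozenset(("fine", "bad")): 3,
}


def toy_distance(x, y):
    return 0 if x == y else TOY_DISTANCES[frozenset((x, y))]


# One positive seed ("good") and one negative seed ("bad"), each with one sense.
s_w = tendency(["fine"], [["good"]], [["bad"]], toy_distance, lam=1.0)
strength = sentiment_strength(s_w, threshold=0.2)
```

Here the candidate word is closer to the positive seed (similarity 0.5) than to the negative seed (similarity 0.25), so it receives a small positive tendency value and passes the assumed threshold.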

2) Determine the text sentiment score of each clause in the microblog that contains a sentiment word;

2.1) If a microblog clause contains a sentiment word preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the text sentiment score $S_a$ of the clause is computed according to the following cases:

i) Degree adverb + sentiment word: the intensity of the sentiment word changes with the intensity of the adverb, and the text sentiment score is:

$$S_a = M_a \cdot ps \cdot pa \tag{4}$$

j) Negation word + sentiment word: the polarity of the sentiment word changes according to the number $n$ of negation words, and the text sentiment score is:

$$S_a = (-1)^n \cdot ps \cdot pa \tag{5}$$

k) Degree adverb + negation word + sentiment word: the polarity of the sentiment word is reversed and its intensity changes with the intensity of the adverb, and the text sentiment score is:

$$S_a = (-1) \cdot M_a \cdot ps \cdot pa \tag{6}$$

l) Negation word + degree adverb + sentiment word: because the negation precedes the degree adverb, the polarity is reversed but the intensity is weaker than under direct negation. A first weight factor $z_1 = 0.5$ is introduced, and the text sentiment score is:

$$S_a = (-1) \cdot M_a \cdot ps \cdot pa \cdot z_1 \tag{7}$$

where $ps$ is the intensity of the sentiment word, $pa$ is the polarity of the sentiment word, and $M_a$ is the intensity of the degree adverb.
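The four cases can be sketched as one function over the modifiers that precede the sentiment word. This is a hedged sketch: the tag names `"deg"` and `"neg"`, the function name `clause_score`, and the degree intensity of 2.0 used in the example are assumptions, since the values of Table 1 are not reproduced here. A bare sentiment word with no modifiers is scored as ps · pa, which the text does not state explicitly.

```python
Z1 = 0.5  # first weight factor from case l)


def clause_score(pattern, ps, pa, ma=1.0):
    """Score one sentiment word given the tags of the modifiers before it.

    pattern: ordered list of "deg" (degree adverb) and "neg" (negation word)
    ps: intensity of the sentiment word; pa: its polarity (+1 or -1)
    ma: intensity of the degree adverb (assumed value supplied by the caller)
    """
    negations = pattern.count("neg")
    if "deg" not in pattern:
        # Equation (5); with zero negations this reduces to ps * pa.
        return (-1) ** negations * ps * pa
    if negations == 0:
        return ma * ps * pa  # equation (4)
    if pattern.index("deg") < pattern.index("neg"):
        return -ma * ps * pa  # equation (6): adverb, then negation
    return -ma * ps * pa * Z1  # equation (7): negation, then adverb


# Sentiment word of intensity 5 and positive polarity, adverb intensity 2.0.
very_not_happy = clause_score(["deg", "neg"], ps=5, pa=1, ma=2.0)
not_very_happy = clause_score(["neg", "deg"], ps=5, pa=1, ma=2.0)
```

The two example calls differ only in word order, and the second is half as strong as the first, which is the weakening that the factor z1 is meant to capture.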

2.2) If a microblog clause contains a conjunction from the conjunction dictionary, the clause belongs to a compound sentence. To account for the transfer of sentiment polarity between clauses, the text sentiment score is computed according to the following cases:

g) Adversative relation: when a semantic reversal word such as 但是 (but) or 然而 (however) appears in a clause, the polarity of the preceding clause is changed, and the overall polarity of the two clauses matches that of the later clause. A second weight factor $z_2 = -1$ is introduced, and the text sentiment score is:

$$Sen = z_2 \, Sen_1 + Sen_2 \tag{8}$$

h) Progressive relation: the two clauses have the same polarity and the intensity is strengthened. A third weight factor $z_3 = 1.5$ is introduced, and the text sentiment score is:

$$Sen = z_3 \, (Sen_1 + Sen_2) \tag{9}$$

i) Concessive relation: the polarity of the later clause is reversed, and the polarity of the whole sentence matches that of the preceding clause. A fourth weight factor $z_4 = -1$ is introduced, and the text sentiment score is:

$$Sen = Sen_1 + z_4 \, Sen_2 \tag{10}$$

where $Sen_1$ is the text sentiment score of the preceding clause and $Sen_2$ is the text sentiment score of the later clause;
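Equations (8) to (10) can be sketched as a lookup on the conjunction type. The word sets are the ones listed for the conjunction dictionary; the function name `combine_clauses` is hypothetical, and raising an error for an unlisted conjunction is a choice made for this sketch because the text does not define that case.

```python
ADVERSATIVE = {"但是", "但", "然而", "而", "可是", "不过", "就是", "只是", "可"}
PROGRESSIVE = {"而且", "甚至", "以至", "并且", "尤其"}
CONCESSIVE = {"即使", "尽管", "哪怕", "就算", "纵然"}

Z2, Z3, Z4 = -1, 1.5, -1  # second, third and fourth weight factors


def combine_clauses(sen1, sen2, conjunction):
    """Combine the scores of two adjacent clauses joined by a conjunction."""
    if conjunction in ADVERSATIVE:
        return Z2 * sen1 + sen2  # equation (8)
    if conjunction in PROGRESSIVE:
        return Z3 * (sen1 + sen2)  # equation (9)
    if conjunction in CONCESSIVE:
        return sen1 + Z4 * sen2  # equation (10)
    raise ValueError(f"conjunction not in dictionary: {conjunction}")


# A mildly positive clause followed by "但是" and a strongly negative clause.
adversative_total = combine_clauses(3, -5, "但是")
```

In the example the combined score is negative, so the sentence takes the polarity of the later clause, as case g) requires.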

3) Determine the emoticon score of the microblog;

Sina Weibo provides a large number of emoticons, and an emoticon in a microblog expresses its sentiment tendency clearly. Using the emoticon score as a weighted term of the sentiment score helps correct the sentiment judgment for the whole microblog text. The polarity and intensity of every emoticon in the microblog are looked up in the emoticon dictionary, and the number of occurrences of each emoticon is recorded. Let $N_i$ be the number of occurrences of the $i$-th emoticon, $e_i$ its intensity, and $p_i$ its polarity. The emoticon score of the microblog, summed over its $N$ distinct emoticons, is:

$$score_{emo} = \sum_{i=1}^{N} N_i \cdot e_i \cdot p_i \tag{11}$$

4) Take the weighted sum of the microblog's text sentiment score and emoticon score to obtain the sentiment score of each microblog:

$$S_1 = \alpha \cdot score_{emo} + \beta \cdot score_{text} \tag{12}$$

where $\alpha$ and $\beta$ are adjustable weights in $(0, 1)$ with $\alpha + \beta = 1$. Their values are chosen by cross-validation on a test set as the pair that maximizes the probability of correct classification. $score_{text}$ is the text sentiment score of the microblog, defined as the mean of the text sentiment scores of its clauses. When the sentiment score $S_1$ is positive, the microblog is judged to express positive sentiment; when $S_1$ is negative, it is judged to express negative sentiment.
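The emoticon score and the weighted sum can be sketched together. This is a minimal sketch assuming the emoticon score is the sum of count · intensity · polarity over the distinct emoticons in the microblog; the function names, the tuple layout, and the value α = 0.5 are illustrative assumptions, not values from the text.

```python
def emoticon_score(emoticons):
    """Sum count * intensity * polarity over the distinct emoticons.

    emoticons: iterable of (count, intensity, polarity) with polarity +1 or -1
    """
    return sum(count * intensity * polarity for count, intensity, polarity in emoticons)


def microblog_score(clause_scores, emoticons, alpha):
    """Equation (12): S1 = alpha * score_emo + beta * score_text, beta = 1 - alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    beta = 1 - alpha
    score_text = sum(clause_scores) / len(clause_scores)  # mean over clauses
    return alpha * emoticon_score(emoticons) + beta * score_text


# Two clauses scored -8 and -2; one positive emoticon used twice (intensity 3)
# and one negative emoticon used once (intensity 5).
s1 = microblog_score([-8, -2], [(2, 3, 1), (1, 5, -1)], alpha=0.5)
```

In this example the emoticons are slightly positive overall, but the text score dominates and the microblog is still judged negative.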

(4) Classify each microblog as subjective or objective according to the sentiment score obtained in step (3), so as to filter out objective microblogs such as news reports and retain subjective microblogs, and finally obtain the microblog's threat score with respect to civil aviation security;

The microblog text is first classified as subjective or objective as follows:

1) For a microblog with sentiment score $S_1 = 0$: if it contains a first-person noun or pronoun, it is considered subjective microblog text; otherwise it is objective microblog text.

2) For a microblog with sentiment score $S_1 \neq 0$: if it contains a predicate word characteristic of news reports, or the microblog text has been reposted at least 2 times, it is considered objective microblog text; otherwise it is subjective microblog text.
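The two rules can be sketched as a single predicate. The text does not list the first-person words or the news-report predicate words, so both word sets below are hypothetical placeholders, as is the function name `is_subjective`.

```python
# Hypothetical word lists; the text does not enumerate either set.
FIRST_PERSON = {"我", "我们", "咱", "咱们", "本人"}
NEWS_PREDICATES = {"报道", "据悉", "获悉", "通报"}


def is_subjective(tokens, s1, repost_count):
    """Return True for subjective microblog text, False for objective text."""
    if s1 == 0:
        # Rule 1: a neutral score is subjective only with a first-person word.
        return any(token in FIRST_PERSON for token in tokens)
    # Rule 2: a non-zero score is objective if it looks like news or was
    # reposted at least twice.
    looks_like_news = any(token in NEWS_PREDICATES for token in tokens)
    return not (looks_like_news or repost_count >= 2)
```

For example, a negatively scored post containing 据悉 would be filtered out as objective, while the same score on a personal complaint with no reposts would be kept.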

The threat score of objective microblog text is set to 0 and the calculation below is skipped for it. The threat score is computed only for subjective microblogs, using equation (13):

$$D = \frac{1}{2} \left( S_1 - S_2\langle w_1, w_2 \rangle \right) \tag{13}$$

where $D$ is the threat score, in the range $[-10, 10]$; $S_1$ is the sentiment score of the microblog text; $S_2\langle w_1, w_2 \rangle$ is the civil aviation security public opinion threat score; $w_1$ denotes a location word and $w_2$ denotes a behavior word;

In the civil aviation security public opinion dictionary, location words include airport (机场), runway (跑道), terminal (航站楼), and flight (航班); behavior words include bombing an aircraft (炸机), hijacking (劫机), causing an in-flight disturbance (空闹), fighting (斗殴), and protesting (抗议). Each behavior word has two attributes. The first is intensity, which measures how much the word threatens civil aviation security on the same five-level scale (1, 3, 5, 7, 9) used for sentiment words. The second is word type, of which there are two. A direct word is enough on its own to indicate a threat to civil aviation, for example bombing an aircraft (炸机), hijacking (劫机), or refusing to leave an aircraft (霸机). An indirect word indicates a threat to civil aviation security only when it appears together with a location word, for example fighting (斗殴), protesting (抗议), or smoking (抽烟). An indirect behavior word alone is not sufficient to conclude that there is a threat to civil aviation security.

The civil aviation security public opinion threat score $S_2\langle w_1, w_2 \rangle$ is computed as follows. Find the behavior word $w_2$ in the microblog text and determine its type. If the behavior word is direct, $S_2\langle w_1, w_2 \rangle$ takes the intensity of that behavior word. If the behavior word is indirect, check whether a location word also appears in the microblog text: if so, $S_2\langle w_1, w_2 \rangle$ takes the intensity of the behavior word; if not, $S_2\langle w_1, w_2 \rangle$ is 0.
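The threat score calculation can be sketched as follows, taking equation (13) to be D = (S1 − S2)/2 as given in claim 5. The intensities assigned to the behavior words are made up for illustration, and taking the maximum when several behavior words occur is an assumption, since the text describes only a single behavior word.

```python
LOCATION_WORDS = {"机场", "跑道", "航站楼", "航班"}

# word -> (intensity, type); intensities here are hypothetical examples.
BEHAVIOR_WORDS = {
    "炸机": (9, "direct"),
    "劫机": (9, "direct"),
    "斗殴": (5, "indirect"),
    "抗议": (3, "indirect"),
}


def keyword_threat_score(tokens):
    """S2: intensity of a behavior word that counts as a threat, else 0."""
    has_location = any(token in LOCATION_WORDS for token in tokens)
    score = 0
    for token in tokens:
        if token in BEHAVIOR_WORDS:
            intensity, kind = BEHAVIOR_WORDS[token]
            # Indirect words count only when a location word is also present.
            if kind == "direct" or has_location:
                score = max(score, intensity)
    return score


def threat_score(s1, s2):
    """Equation (13): D = (S1 - S2) / 2."""
    return 0.5 * (s1 - s2)


s2 = keyword_threat_score(["机场", "斗殴"])
d = threat_score(-7, s2)
```

A strongly negative sentiment score combined with a matched behavior word pushes D well below zero, which is the region the later threat-level rules examine.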

(5) From the threat score obtained in step (4), determine the threat level that the statements in the microblog text pose to civil aviation security, then screen out key persons with a high threat level and report them to the relevant departments as early-warning information.

The threat score obtained in step (4) is interpreted as follows. When $D > 0$, the microblog text expresses positive sentiment and is treated as safe speech, so no threat level is assigned. When $D \le 0$, the microblog text is judged to contain civil aviation security public opinion keywords and to express negative sentiment, so it requires close attention and is assigned a threat level according to the standard below. The threat level standard was obtained by testing on existing microblog text:

1) $-4.5 \le D \le 0$: low threat level.

2) $-7 \le D < -4.5$: medium threat level.

3) $-10 \le D < -7$: high threat level.
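The grading can be sketched as a chain of comparisons, assuming the high band is −10 ≤ D < −7 so that the three bands tile the interval [−10, 0] without overlap. The function name and the returned labels are illustrative.

```python
def threat_level(d):
    """Map a threat score D to a level, or None for safe speech (D > 0)."""
    if d > 0:
        return None  # positive sentiment: no level is assigned
    if d >= -4.5:
        return "low"
    if d >= -7:
        return "medium"
    if d >= -10:
        return "high"
    raise ValueError("D is expected to lie in [-10, 10]")
```

Boundary values follow the inequalities above: D = −4.5 is low and D = −7 is medium.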

Table 3 lists the threat scores and threat levels obtained by applying the method of the present invention to several microblog texts. The table shows that the method can determine fairly accurately whether a microblog text poses a threat to civil aviation security.

Table 3. Threat level results for microblog text

Claims (6)

1. A civil aviation security public opinion sentiment analysis method, characterized in that the method comprises the following steps performed in order:
(1) retrieving, preprocessing, and segmenting microblog text on the Internet that contains civil aviation security public opinion keywords;
(2) building the dictionaries needed for semantic analysis of microblog text, either by selecting existing dictionaries or by constructing them;
(3) scoring the microblogs segmented in step (1) using the dictionaries built in step (2) to obtain the sentiment score of each microblog;
(4) classifying each microblog as subjective or objective according to the sentiment score obtained in step (3), so as to filter out objective microblogs such as news reports and retain subjective microblogs, and finally obtaining the microblog's threat score with respect to civil aviation security;
(5) determining, from the threat score obtained in step (4), the threat level that the statements in the microblog text pose to civil aviation security, then screening out key persons with a high threat level and reporting them to the relevant departments as early-warning information.
2. The civil aviation security public opinion sentiment analysis method according to claim 1, characterized in that, in step (1), the retrieving, preprocessing, and segmenting of microblog text on the Internet that contains civil aviation security public opinion keywords is performed as follows: microblog text on the Internet containing civil aviation security public opinion keywords is crawled, and keywords related to civil aviation security public opinion are retrieved from that text, the keywords being divided into location words and behavior words and the retrieval strategy being either single-word retrieval or combined retrieval; the retrieval results are then preprocessed to remove noise, including web links, the user nicknames that appear when a microblog is reposted or replied to, topic tags, and special characters, and to extract emoticons; the preprocessed results are then segmented with a word segmentation tool, the tool being Ansj, an open-source Java word segmenter.
3. The civil aviation security public opinion sentiment analysis method according to claim 1, characterized in that, in step (2), the dictionaries comprise a sentiment dictionary, a negation dictionary, a modifier dictionary, a conjunction dictionary, an emoticon dictionary, an internet buzzword dictionary, and a civil aviation security public opinion dictionary.
4. The civil aviation security public opinion sentiment analysis method according to claim 1, characterized in that, in step (3), scoring the microblogs segmented in step (1) using the dictionaries built in step (2) to obtain the sentiment score of each microblog comprises the following steps:
1) extracting or determining sentiment words from the microblog text segmented in step (1):
sentiment words are extracted by matching the words obtained by segmenting the microblog text against the sentiment dictionary and the internet buzzword dictionary, a word found in these dictionaries being selected as a sentiment word;
sentiment words are determined, for words that do not appear in the sentiment dictionary or the internet buzzword dictionary, by a semantic similarity method; specifically, for two words $w_1$ and $w_2$, if $w_1$ has $n$ senses or concepts $x_1, x_2, \ldots, x_n$ and $w_2$ has $m$ senses or concepts $y_1, y_2, \ldots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum similarity over all pairs of senses or concepts, that is:
$$\mathrm{Sim}(w_1, w_2) = \max_{i \in [1,n],\ j \in [1,m]} \mathrm{Sim}(x_i, y_j) \tag{1}$$
the similarity of two sememes is computed as:
$$\mathrm{Sim}(x_1, y_2) = \frac{\lambda}{\lambda + d(x_1, y_2)} \tag{2}$$
where $\lambda$ is a positive adjustable parameter and $d(x_1, y_2)$ is the distance between sememe $x_1$ and sememe $y_2$ in the hierarchy tree;
the similarity between word $w$ and each seed word in the positive sentiment dictionary is computed with equations (1) and (2) to obtain the word's similarity to the positive seed words, and the similarity between $w$ and each seed word in the negative sentiment dictionary is computed to obtain the word's similarity to the negative seed words; the difference between the two mean similarities gives the sentiment tendency value of $w$:
$$S_w = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Sim}(w, p_i) - \frac{1}{m} \sum_{j=1}^{m} \mathrm{Sim}(w, n_j) \tag{3}$$
where $p_i$ is a positive sentiment seed word and $n_j$ is a negative sentiment seed word; the sentiment tendency value $S_w$ lies in $(-1, 1)$; a threshold $T$ is set and $S_w$ is compared with $T$ to decide whether $w$ is a sentiment word; when $|S_w| > T$, $w$ is judged to be a sentiment word, and its intensity is set to $10 \cdot S_w$;
2) determining the text sentiment score of each clause in the microblog that contains a sentiment word;
2.1) if a microblog clause contains a sentiment word preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the text sentiment score $S_a$ of the clause is computed according to the following cases:
a) degree adverb + sentiment word: the intensity of the sentiment word changes with the intensity of the adverb, and the text sentiment score is:
$$S_a = M_a \cdot ps \cdot pa \tag{4}$$
b) negation word + sentiment word: the polarity of the sentiment word changes according to the number $n$ of negation words, and the text sentiment score is:
$$S_a = (-1)^n \cdot ps \cdot pa \tag{5}$$
c) degree adverb + negation word + sentiment word: the polarity of the sentiment word is reversed and its intensity changes with the intensity of the adverb, and the text sentiment score is:
$$S_a = (-1) \cdot M_a \cdot ps \cdot pa \tag{6}$$
d) negation word + degree adverb + sentiment word: because the negation precedes the degree adverb, the polarity is reversed but the intensity is weaker than under direct negation; a first weight factor $z_1 = 0.5$ is introduced, and the text sentiment score is:
$$S_a = (-1) \cdot M_a \cdot ps \cdot pa \cdot z_1 \tag{7}$$
where $ps$ is the intensity of the sentiment word, $pa$ is the polarity of the sentiment word, and $M_a$ is the intensity of the degree adverb;
2.2) if a microblog clause contains a conjunction from the conjunction dictionary, the clause belongs to a compound sentence, and to account for the transfer of sentiment polarity between clauses the text sentiment score is computed according to the following cases:
a) adversative relation: when a semantic reversal word such as 但是 (but) or 然而 (however) appears in a clause, the polarity of the preceding clause is changed, and the overall polarity of the two clauses matches that of the later clause; a second weight factor $z_2 = -1$ is introduced, and the text sentiment score is:
$$Sen = z_2 \, Sen_1 + Sen_2 \tag{8}$$
b) progressive relation: the two clauses have the same polarity and the intensity is strengthened; a third weight factor $z_3 = 1.5$ is introduced, and the text sentiment score is:
$$Sen = z_3 \, (Sen_1 + Sen_2) \tag{9}$$
c) concessive relation: the polarity of the later clause is reversed, and the polarity of the whole sentence matches that of the preceding clause; a fourth weight factor $z_4 = -1$ is introduced, and the text sentiment score is:
$$Sen = Sen_1 + z_4 \, Sen_2 \tag{10}$$
where $Sen_1$ is the text sentiment score of the preceding clause and $Sen_2$ is the text sentiment score of the later clause;
3) determining the emoticon score of the microblog;
the polarity and intensity of every emoticon in the microblog are looked up in the emoticon dictionary, and the number of occurrences of each emoticon is recorded; letting $N_i$ be the number of occurrences of the $i$-th emoticon, $e_i$ its intensity, and $p_i$ its polarity, the emoticon score of the microblog is:
$$score_{emo} = \sum_{i=1}^{N} N_i \cdot e_i \cdot p_i \tag{11}$$
4) taking the weighted sum of the microblog's text sentiment score and emoticon score to obtain the sentiment score of each microblog:
$$S_1 = \alpha \cdot score_{emo} + \beta \cdot score_{text} \tag{12}$$
where $\alpha$ and $\beta$ are adjustable weights in $(0, 1)$ with $\alpha + \beta = 1$, chosen by cross-validation on a test set as the values that maximize the probability of correct classification; $score_{text}$ is the text sentiment score of the microblog, defined as the mean of the text sentiment scores of its clauses.
5. The civil aviation security public opinion sentiment analysis method according to claim 1, characterized in that, in step (4), classifying each microblog as subjective or objective according to the sentiment score obtained in step (3), so as to filter out objective microblogs such as news reports and retain subjective microblogs, and finally obtaining the microblog's threat score with respect to civil aviation security, is performed as follows:
the microblog text is first classified as subjective or objective:
1) for a microblog with sentiment score $S_1 = 0$, if it contains a first-person noun or pronoun it is considered subjective microblog text, otherwise it is objective microblog text;
2) for a microblog with sentiment score $S_1 \neq 0$, if it contains a predicate word characteristic of news reports, or the microblog text has been reposted at least 2 times, it is considered objective microblog text, otherwise it is subjective microblog text;
the threat score of objective microblog text is set to 0 and no threat score calculation is performed for it; the threat score is computed only for subjective microblogs, using equation (13):
$$D = \frac{1}{2} \left( S_1 - S_2\langle w_1, w_2 \rangle \right) \tag{13}$$
where $D$ is the threat score, in the range $[-10, 10]$; $S_1$ is the sentiment score of the microblog text; $S_2\langle w_1, w_2 \rangle$ is the civil aviation security public opinion threat score; $w_1$ denotes a location word and $w_2$ denotes a behavior word;
the civil aviation security public opinion threat score $S_2\langle w_1, w_2 \rangle$ is computed as follows: the behavior word $w_2$ in the microblog text is found and its type is determined; if the behavior word is direct, $S_2\langle w_1, w_2 \rangle$ takes the intensity of that behavior word; if the behavior word is indirect, it is checked whether a location word also appears in the microblog text; if so, $S_2\langle w_1, w_2 \rangle$ takes the intensity of the behavior word, and if not, $S_2\langle w_1, w_2 \rangle$ is 0.
6. The civil aviation security public opinion sentiment analysis method according to claim 1, characterized in that, in step (5), determining from the threat score obtained in step (4) the threat level that the statements in the microblog text pose to civil aviation security is performed as follows:
when the threat score $D > 0$, the microblog text expresses positive sentiment and is treated as safe speech, so no threat level is assigned; when $D \le 0$, the microblog text is judged to contain civil aviation security public opinion keywords and to express negative sentiment, so it requires close attention and is assigned a threat level according to the following standard, which was obtained by testing on existing microblog text:
1) $-4.5 \le D \le 0$: low threat level;
2) $-7 \le D < -4.5$: medium threat level;
3) $-10 \le D < -7$: high threat level.
CN201611062208.1A 2016-11-25 2016-11-25 Civil aviation security public opinion emotion analysis method Expired - Fee Related CN106598944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611062208.1A CN106598944B (en) 2016-11-25 2016-11-25 Civil aviation security public opinion emotion analysis method

Publications (2)

Publication Number Publication Date
CN106598944A true CN106598944A (en) 2017-04-26
CN106598944B CN106598944B (en) 2019-03-19

Family

ID=58594761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611062208.1A Expired - Fee Related CN106598944B (en) 2016-11-25 2016-11-25 Civil aviation security public opinion emotion analysis method

Country Status (1)

Country Link
CN (1) CN106598944B (en)

CN112417258A (en) * 2020-12-02 2021-02-26 深圳市罗湖医院集团 Method, platform and terminal for crushing rumor information in health knowledge search engine
CN113220962A (en) * 2020-09-10 2021-08-06 深圳信息职业技术学院 Public opinion analysis method based on internet big data
CN114238624A (en) * 2021-06-30 2022-03-25 武汉众智数字技术有限公司 Intelligent Internet public opinion early warning and handling method and system
CN114443841A (en) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 Netizen speech analysis method, device, server and storage medium
CN117010409A (en) * 2023-10-07 2023-11-07 成都中轨轨道设备有限公司 Text recognition method and system based on natural language semantic analysis
CN119336966A (en) * 2024-12-23 2025-01-21 山东理工职业学院 An Internet rumor identification system based on artificial intelligence

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 A method and device for analyzing subjective text sentiment tendency
US8165869B2 (en) * 2007-12-10 2012-04-24 International Business Machines Corporation Learning word segmentation from non-white space languages corpora
CN103207860A (en) * 2012-01-11 2013-07-17 北大方正集团有限公司 Method and device for extracting entity relationships of public sentiment events
CN103530360A (en) * 2013-10-12 2014-01-22 广西师范学院 Network Social Influence Maximization Algorithm Based on Microblog Text Emotional Computation
CN103559233A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Extraction method for network new words in microblogs and microblog emotion analysis method and system
CN104516962A (en) * 2014-12-18 2015-04-15 北京牡丹电子集团有限责任公司数字电视技术中心 Monitoring method and system for microblogging public opinion
CN104537097A (en) * 2015-01-09 2015-04-22 成都布林特信息技术有限公司 Microblog public opinion monitoring system
CN104809104A (en) * 2015-05-11 2015-07-29 苏州大学 Method and system for identifying micro-blog textual emotion
CN105389389A (en) * 2015-12-10 2016-03-09 安徽博约信息科技有限责任公司 Network public opinion transmission situation media linked analysis method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165869B2 (en) * 2007-12-10 2012-04-24 International Business Machines Corporation Learning word segmentation from non-white space languages corpora
CN101894102A (en) * 2010-07-16 2010-11-24 浙江工商大学 A method and device for analyzing subjective text sentiment tendency
CN103207860A (en) * 2012-01-11 2013-07-17 北大方正集团有限公司 Method and device for extracting entity relationships of public sentiment events
CN103559233A (en) * 2012-10-29 2014-02-05 中国人民解放军国防科学技术大学 Extraction method for network new words in microblogs and microblog emotion analysis method and system
CN103530360A (en) * 2013-10-12 2014-01-22 广西师范学院 Network Social Influence Maximization Algorithm Based on Microblog Text Emotional Computation
CN104516962A (en) * 2014-12-18 2015-04-15 北京牡丹电子集团有限责任公司数字电视技术中心 Monitoring method and system for microblogging public opinion
CN104537097A (en) * 2015-01-09 2015-04-22 成都布林特信息技术有限公司 Microblog public opinion monitoring system
CN104809104A (en) * 2015-05-11 2015-07-29 苏州大学 Method and system for identifying micro-blog textual emotion
CN105389389A (en) * 2015-12-10 2016-03-09 安徽博约信息科技有限责任公司 Network public opinion transmission situation media linked analysis method

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107273671B (en) * 2017-05-31 2018-03-30 江苏金琉璃科技有限公司 A kind of method and system realized medical performance and quantified
CN107273671A (en) * 2017-05-31 2017-10-20 江苏金琉璃科技有限公司 It is a kind of to realize the method and system that medical performance quantifies
CN107291899A (en) * 2017-06-22 2017-10-24 努比亚技术有限公司 A kind of recommendation method and terminal and computer-readable recording medium based on label
CN107908715A (en) * 2017-11-10 2018-04-13 中国民航大学 Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion
CN107945033A (en) * 2017-11-14 2018-04-20 李勇 A kind of analysis method of network public-opinion, system and relevant apparatus
CN107943789A (en) * 2017-11-17 2018-04-20 新华网股份有限公司 Mood analysis method, device and the server of topic information
CN108021651B (en) * 2017-11-30 2020-07-28 中科金联(北京)科技有限公司 Network public opinion risk assessment method and device
CN108021651A (en) * 2017-11-30 2018-05-11 中科金联(北京)科技有限公司 Network public opinion risk assessment method and device
CN108319587A (en) * 2018-02-05 2018-07-24 中译语通科技股份有限公司 A kind of public sentiment value calculation method and system of more weights, computer
CN108319587B (en) * 2018-02-05 2021-11-19 中译语通科技股份有限公司 Multi-weight public opinion value calculation method and system and computer
CN108536801A (en) * 2018-04-03 2018-09-14 中国民航大学 A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning
CN109145306A (en) * 2018-09-11 2019-01-04 刘瑞军 The three-dimensional expression generation method of text-driven
CN110084427A (en) * 2019-04-26 2019-08-02 飞叶科技股份有限公司 A kind of smart city public sentiment event prediction algorithm
CN110069786A (en) * 2019-05-06 2019-07-30 北京理琪教育科技有限公司 Analysis method, device and the equipment of language composition Sentiment orientation
CN110163688A (en) * 2019-05-30 2019-08-23 复旦大学 Commodity network public sentiment detection system
CN111104515A (en) * 2019-12-24 2020-05-05 山东众志电子有限公司 Emotional word text information classification method
CN111626050A (en) * 2020-05-25 2020-09-04 安徽理工大学 Microblog emotion analysis method based on expression dictionary and emotion common sense
CN111626050B (en) * 2020-05-25 2023-12-12 安徽理工大学 Microblog emotion analysis method based on expression dictionary and emotion general knowledge
CN111611385A (en) * 2020-05-27 2020-09-01 中航信移动科技有限公司 Flight monitoring and early warning system and method based on public opinion analysis
CN111950860B (en) * 2020-07-21 2024-04-16 中证征信(深圳)有限公司 Monitoring method and device for enterprise public opinion risk index
CN111950860A (en) * 2020-07-21 2020-11-17 中证征信(深圳)有限公司 Method and device for monitoring enterprise public opinion risk index
CN111881360A (en) * 2020-08-12 2020-11-03 杭州安恒信息技术股份有限公司 Public opinion data processing method, system, equipment and readable storage medium
CN113220962A (en) * 2020-09-10 2021-08-06 深圳信息职业技术学院 Public opinion analysis method based on internet big data
CN112016331A (en) * 2020-10-30 2020-12-01 成都智元汇信息技术股份有限公司 Passenger transport passenger emotion analysis method
CN112417258A (en) * 2020-12-02 2021-02-26 深圳市罗湖医院集团 Method, platform and terminal for crushing rumor information in health knowledge search engine
CN112364947B (en) * 2021-01-14 2021-06-29 北京育学园健康管理中心有限公司 Text similarity calculation method and device
CN112364947A (en) * 2021-01-14 2021-02-12 北京崔玉涛儿童健康管理中心有限公司 Text similarity calculation method and device
CN114238624A (en) * 2021-06-30 2022-03-25 武汉众智数字技术有限公司 Intelligent Internet public opinion early warning and handling method and system
CN114443841A (en) * 2021-12-31 2022-05-06 深圳云天励飞技术股份有限公司 Netizen speech analysis method, device, server and storage medium
CN117010409A (en) * 2023-10-07 2023-11-07 成都中轨轨道设备有限公司 Text recognition method and system based on natural language semantic analysis
CN117010409B (en) * 2023-10-07 2023-12-12 成都中轨轨道设备有限公司 Text recognition method and system based on natural language semantic analysis
CN119336966A (en) * 2024-12-23 2025-01-21 山东理工职业学院 An Internet rumor identification system based on artificial intelligence
CN119336966B (en) * 2024-12-23 2025-07-11 山东理工职业学院 Network rumor recognition system based on artificial intelligence

Also Published As

Publication number Publication date
CN106598944B (en) 2019-03-19

Similar Documents

Publication Publication Date Title
CN106598944A (en) Civil aviation security public opinion emotion analysis method
CN113378565B (en) Event analysis method, device, device and storage medium for multi-source data fusion
CN103793503B (en) Opinion mining and classification method based on web texts
US10437867B2 (en) Scenario generating apparatus and computer program therefor
CN105005553B (en) Short text Sentiment orientation analysis method based on sentiment dictionary
Vishwakarma et al. Recent state-of-the-art of fake news detection: A review
CN108536801A (en) A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning
CN106202372A (en) A kind of method of network text information emotional semantic classification
CN103559233A (en) Extraction method for network new words in microblogs and microblog emotion analysis method and system
CN104915443B (en) A kind of abstracting method of Chinese microblogging evaluation object
EP3086240A1 (en) Complex predicate template gathering device, and computer program therefor
Ding et al. Scoring tourist attractions based on sentiment lexicon
Tiwari et al. Comparative analysis of different machine learning methods for hate speech recognition in twitter text data
Su et al. Mining and comparing user reviews across similar mobile apps
Azizov et al. Frank at CheckThat!-2023: Detecting the Political Bias of News Articles and News Media.
Lin Using cross‐encoders to measure the similarity of short texts in political science
Campbell et al. Content+ context networks for user classification in twitter
El Barachi et al. Combining named entity recognition and emotion analysis of tweets for early warning of violent actions
Özel et al. Effects of feature extraction and classification methods on cyberbully detection
Bahrainian et al. Fuzzy subjective sentiment phrases: A context sensitive and self-maintaining sentiment lexicon
Yuan et al. A hybrid method for multi-class sentiment analysis of micro-blogs
Zhang et al. Spam comments detection with self-extensible dictionary and text-based features
Tavan et al. Identifying Ironic Content Spreaders on Twitter using Psychometrics, Contextual and Ironic Features with Gradient Boosting Classifier.
Shanthi et al. Suicidal Ideation Prediction Using Machine Learning
Nandan et al. Sentiment Analysis of Twitter Classification by Applying Hybrid-Based Techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20190319; termination date: 20191125)