
CN108170692B - Hotspot event information processing method and device - Google Patents

Hotspot event information processing method and device

Info

Publication number
CN108170692B
Authority
CN
China
Prior art keywords
hot
event
keyword
word
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611117512.1A
Other languages
Chinese (zh)
Other versions
CN108170692A (en
Inventor
林家欣
汤煌
张小鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201611117512.1A priority Critical patent/CN108170692B/en
Publication of CN108170692A publication Critical patent/CN108170692A/en
Application granted granted Critical
Publication of CN108170692B publication Critical patent/CN108170692B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method for processing hotspot event information, which comprises the following steps: performing text word segmentation on a plurality of text messages to obtain the word segments contained in each text message; extracting at least one keyword from the word segments contained in each text message according to the statistical data of each word segment obtained by the processing; acquiring a traffic heat value of each keyword according to the network traffic data of the text information corresponding to each keyword; determining keywords whose traffic heat value is higher than a first preset threshold as hot words; and performing hot word clustering on the determined hot words according to the correlation degree among the hot words to obtain at least one event hot word cluster. The embodiment of the invention also discloses a device for processing hotspot event information. By adopting the invention, hotspot events in the network can be grasped timely and accurately.

Description

Hotspot event information processing method and device
Technical Field
The invention relates to the technical field of internet, in particular to a method and a device for processing hotspot event information.
Background
A hotspot event is an event that generates a certain degree of social propagation and influence within a period of time. The subject of such an event can generally be represented by several strongly related words, which are called hotspot words, or hot words for short. In order to provide network services with better real-time performance and timeliness, grasping the hotspot events in the current network as early as possible is a very important issue.
In the prior art, members of a network analysis team generally review and analyze tens of thousands of network articles by manual browsing to identify hotspot events. This is very inefficient and easily influenced by subjective factors, so the timeliness and accuracy of grasping hotspot events are greatly affected.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for processing hotspot event information, which can automatically identify hotspot events from the statistical data of the word segments of network text information and the network traffic data of the text information containing those word segments, so as to grasp hotspot events in the network timely and accurately.
In order to solve the above technical problem, an embodiment of the present invention provides a method for processing hotspot event information, where the method includes:
performing text word segmentation on a plurality of text messages to obtain the word segments contained in each text message;
extracting at least one keyword from the word segments contained in each text message according to the statistical data of each word segment obtained by the processing;
acquiring a traffic heat value of each keyword according to the network traffic data of the text information corresponding to each keyword;
determining keywords whose traffic heat value is higher than a first preset threshold as hot words;
and performing hot word clustering on the determined hot words according to the correlation degree among the hot words to obtain at least one event hot word cluster.
Correspondingly, an embodiment of the present invention further provides a device for processing hotspot event information, including:
a text word segmentation module, used for performing text word segmentation on a plurality of text messages to obtain the word segments contained in each text message;
a keyword extraction module, used for extracting at least one keyword from the word segments contained in each text message according to the statistical data of each word segment obtained by the processing;
a heat value acquisition module, used for acquiring the traffic heat value of each keyword according to the network traffic data of the text information corresponding to each keyword;
a hot word determining module, used for determining keywords whose traffic heat value is higher than a first preset threshold as hot words;
and an event hot word clustering module, used for performing hot word clustering on the determined hot words according to the correlation degree among the hot words to obtain at least one event hot word cluster.
The hotspot event information processing device in the embodiment of the invention extracts keywords from text information in the network, gathers the network traffic data of the keywords to obtain the hot words appearing in the text information, and clusters the hot words according to the correlation degree between them to obtain event hot word clusters representing different hotspot events.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a hotspot event information processing method in an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a hotspot event information processing method in another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a hotspot event information processing device in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a heat value acquisition module in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an event hot word clustering module in an embodiment of the present invention;
Fig. 6 is a schematic diagram of a hardware structure of a hotspot event information processing device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and the device for processing the hotspot event information in the embodiment of the invention can be implemented in personal equipment such as a personal computer, a notebook computer, a tablet personal computer, a personal workstation and the like, and can also be implemented by a network background server, a server cluster and service site equipment. If not specifically stated, the hot spot event information processing method in the embodiment of the present invention may be implemented by the hot spot event information processing apparatus in the present invention.
Fig. 1 is a schematic flowchart of a method for processing hotspot event information in an embodiment of the present invention, where an implementation flow of the method for processing hotspot event information in an embodiment of the present invention includes:
S101, performing text word segmentation on a plurality of text messages to obtain the word segments contained in each text message.
Specifically, the hotspot event information processing device may collect text information that appeared within a period of time (for example, the current day, the last week, or the last 30 days). The text information may include network news articles, comment articles, SNS articles (Social Networking Services; here, articles published on a social networking service platform, such as a microblog, a friend circle, or a public account), and the like. The device may perform text word segmentation on each collected text message, for example using a segmentation method such as full-mode segmentation or search-mode segmentation, to obtain the word segments contained in each text message. In addition, the text content may be preprocessed in connection with word segmentation, for example by garbled-character filtering, punctuation filtering, conversion between traditional and simplified Chinese characters, and stop-word filtering.
S102, extracting at least one keyword from the word segments contained in each text message according to the statistical data of each word segment obtained by the processing.
Specifically, the statistical data of each word segment may include a term frequency, a number of texts, an inverse document frequency, or the like. From these statistics, the frequency with which each word segment occurs in the text information, or how meaningful it is, can be obtained (words such as 'the' or 'can' occur often but cannot be considered keywords), so that at least one keyword can be extracted from the word segments contained in each text message.
In an alternative embodiment, at least one keyword may be extracted from the word segments contained in each text message through the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm or the TextRank ranking algorithm.
Taking the TF-IDF algorithm as an example, the term frequency TF may be the number of times a given word segment appears in a given text message divided by the total number of word segments in that text message:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ is the number of times the word $t_i$ appears in document $d_j$, and the denominator is the total number of occurrences of all word segments in document $d_j$. The inverse document frequency IDF may be obtained by dividing the total number of the plurality of text messages by the number of text messages containing the word segment, and then taking the logarithm of the quotient, that is:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$
where $|D|$ is the total number of the plurality of text messages and $|\{j : t_i \in d_j\}|$ is the number of text messages containing the word $t_i$ (i.e., the number of text messages for which $n_{i,j} \neq 0$). TF-IDF assesses how important a word is to a document within a document set or corpus:
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$
A high term frequency within a particular document, combined with a low document frequency of that word across the entire document set, yields a high TF-IDF weight. Therefore, by filtering out words with a low TF-IDF, common words can be removed and important words retained. In the embodiment of the present invention, a preset number (e.g., 3, 5, or 10) of the word segments with the highest TF-IDF among the word segments of each text message may be determined as keywords.
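The TF-IDF keyword selection described above can be illustrated with a minimal Python sketch. It assumes each text message has already been segmented into a list of tokens; the function name `top_keywords`, the parameter `k`, and the alphabetical tie-breaking are hypothetical choices made for this illustration and are not taken from the patent.

```python
import math
from collections import Counter


def top_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF tokens for each document.

    docs: list of token lists, one list per text message (assumed to be
    already segmented). Ties are broken alphabetically (an assumption).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    result = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = count / total tokens in this document; idf = log(|D| / df).
        scores = {
            t: (c / total) * math.log(n_docs / df[t])
            for t, c in counts.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


docs = [
    ["the", "storm", "storm", "coast"],
    ["the", "election", "vote", "vote"],
    ["the", "storm", "flood"],
]
keywords = top_keywords(docs, k=2)
```

In this toy input the token "the" occurs in every document, so its IDF is zero and it is never selected, which mirrors the filtering of common words described in the text.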
Similarly, the importance of the participles in a certain text message can be ranked through a TextRank algorithm, and the participles with the highest importance in a preset number are determined as keywords.
S103, acquiring the traffic heat value of each keyword according to the network traffic data of the text information corresponding to each keyword.
The text information corresponding to a keyword is the text information containing that keyword. The larger the network traffic generated by the text information containing a keyword, the more attention that keyword is likely to be receiving, so the traffic heat value of each keyword can be obtained from the network traffic data of its corresponding text information. The network traffic data may include the click count, share count, comment count, number of text messages, number of text sources, and the like.
Because the absolute value of network traffic data fluctuates with holidays and similar factors, in an alternative embodiment the hotspot event information processing device may determine the traffic heat value of a target keyword according to the traffic ratio of the network traffic data of the text information containing the target keyword to the total network traffic data of the plurality of text messages. For example, the following normalized score calculation formula may be used:
$$Kr_i = \frac{Kc_i}{\sum_{k=1}^{N} Kc_k}$$
where $N$ is the size of the keyword vocabulary, $i$ denotes the keyword numbered $i$ in the keyword vocabulary, $Kc_i$ is the count of network traffic data of the text information containing the keyword numbered $i$, the denominator $\sum_{k=1}^{N} Kc_k$ is the sum of the network traffic data counts of all keywords, and $Kr_i$ is the normalized count score. For example, 1000 text messages are collected, and the network traffic data (for example, the click count) generated by these 1000 text messages is 100000. If 200 of the text messages contain the keyword "erlotinib" and the network traffic data generated by those 200 text messages is 20000, the traffic ratio is 20000/100000 = 0.2, and the traffic heat value of the keyword "erlotinib" may be determined to be 0.2. In an optional embodiment, to prevent the value from being too small, the traffic ratio may be multiplied by a set value and the result used as the traffic heat value.
In an optional embodiment, the hotspot event information processing device may further distinguish network traffic data of different traffic types, for example by separately counting the number of clicks, the number of shares, the number of text messages, and the number of text sources. It then obtains, for each traffic type, the traffic ratio of the network traffic data of the text information containing the target keyword to the total network traffic data of the same type for the plurality of text messages, and determines the traffic heat value of the target keyword according to the product of the target keyword's traffic ratios over the traffic types.
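The normalization and the per-type product described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the nested-dictionary input layout, the function names `type_ratios` and `heat_value`, and the `scale` parameter (standing in for the "set value" used to avoid very small numbers) are hypothetical and not specified by the patent.

```python
from math import prod


def type_ratios(counts_by_keyword):
    """Normalize raw traffic counts per traffic type.

    counts_by_keyword: {keyword: {traffic_type: count}} (assumed layout).
    Returns {keyword: {traffic_type: ratio}}, where each ratio is the
    keyword's count divided by the sum of that type over all keywords.
    """
    totals = {}
    for counts in counts_by_keyword.values():
        for traffic_type, count in counts.items():
            totals[traffic_type] = totals.get(traffic_type, 0) + count
    return {
        kw: {
            t: (c / totals[t] if totals[t] else 0.0)
            for t, c in counts.items()
        }
        for kw, counts in counts_by_keyword.items()
    }


def heat_value(ratios, scale=1.0):
    """Product of the per-type ratios, each multiplied by `scale`."""
    return prod(r * scale for r in ratios.values())


data = {
    "a": {"clicks": 20000, "shares": 10},
    "b": {"clicks": 80000, "shares": 30},
}
ratios = type_ratios(data)
heat_a = heat_value(ratios["a"])            # 0.2 * 0.25
heat_a_scaled = heat_value(ratios["a"], scale=100)
```

Because every ratio lies between 0 and 1, the unscaled product shrinks quickly as more traffic types are multiplied together, which is one reason a scaling constant may be useful.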
In an alternative embodiment, to smooth changes in heat, a Laplace smoothing factor smooth_ratio may further be introduced when calculating the traffic heat value, for example by calculating the traffic heat value of each keyword using the following formula (1):
[Formula (1): image GDA0001222416180000052, not reproduced in this text]
where $i$ is the number of the keyword, $M$ is the number of traffic types, $j$ is the number of the traffic type, $Kr_i(j)$ is the normalized score of the keyword numbered $i$ on traffic type $j$ for the current day, $Median(Kr_i(j))$ is the median of $Kr_i(j)$ over the last 30 days, and smooth_ratio is the Laplace smoothing factor, which may illustratively take the value 0.00005. The traffic heat value of a keyword calculated by this formula is therefore usually in the interval [0.05, 2000].
S104, determining keywords whose traffic heat value is higher than a first preset threshold as hot words.
After the traffic heat value of each keyword has been obtained through the above calculation, the hotspot event information processing device may determine the keywords whose traffic heat value is higher than a first preset threshold as hot words. The first preset threshold may be set according to the way the traffic heat value is calculated. For example, for traffic heat values calculated by the above formula (1), the first preset threshold may be 2 or 500; that is, all keywords with a traffic heat value higher than 2 are determined as hot words, or all keywords with a traffic heat value higher than 500 are determined as hot words. In addition, the first preset threshold may be determined from the distribution of the calculated traffic heat values of all keywords, for example by setting it so that 30% of the keywords are selected as hot words.
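As a minimal sketch of deriving the first preset threshold from the distribution of heat values, the following Python function picks a cut-off so that roughly a given fraction of keywords lies strictly above it. The function name `threshold_for_fraction`, the rounding rule, and the handling of ties are assumptions made for this illustration; the patent does not specify how the distribution-based threshold is computed.

```python
def threshold_for_fraction(heat_values, fraction=0.3):
    """Return a threshold with about `fraction` of values strictly above it.

    With tied heat values, fewer keywords than requested may pass,
    because the comparison used downstream is strict ("higher than").
    """
    ordered = sorted(heat_values, reverse=True)
    keep = round(len(ordered) * fraction)
    if keep >= len(ordered):
        # Every keyword should pass (also covers an empty input).
        return float("-inf")
    # The (keep+1)-th largest value: exactly `keep` values exceed it
    # when there are no ties.
    return ordered[keep]


values = list(range(1, 11))
threshold = threshold_for_fraction(values, fraction=0.3)
hot = [v for v in values if v > threshold]
```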
From the perspective of the distribution of keywords related to a hotspot event, a typical hotspot event generally consists of several central hot words and several event-related peripheral hot words, so the hot words may include central hot words and peripheral hot words. For example, for a certain marriage event that occurred in August 2016, the central hot words are a certain Wang, a certain Ma and the agent, and the peripheral hot words include green cap, rape, divorce and the like. Further, in an optional embodiment, the hotspot event information processing device may determine a keyword whose traffic heat value is not lower than a second preset threshold as a central hot word, and determine a keyword whose traffic heat value is lower than the second preset threshold and higher than the first preset threshold as a peripheral hot word. Still taking the traffic heat value calculated by the above formula (1) as an example, the first preset threshold may be set to 2 and the second preset threshold to 500.
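The two-threshold split into central and peripheral hot words can be sketched as below. The function name `classify_hotwords` and the dictionary input are hypothetical; the default thresholds 2 and 500 are the example values given in the text.

```python
def classify_hotwords(heat, first_threshold=2.0, second_threshold=500.0):
    """Split keywords into central and peripheral hot words.

    heat: {keyword: traffic heat value}.
    A keyword is central if heat >= second_threshold, peripheral if
    first_threshold < heat < second_threshold, and otherwise not hot.
    """
    central, peripheral = [], []
    for keyword, value in heat.items():
        if value >= second_threshold:
            central.append(keyword)
        elif value > first_threshold:
            peripheral.append(keyword)
    return central, peripheral


heat = {"a": 800, "b": 500, "c": 30, "d": 2, "e": 0.5}
central, peripheral = classify_hotwords(heat)
```

Note the asymmetry taken from the text: the second threshold is inclusive ("not lower than"), while the first is exclusive ("higher than"), so keyword "d" with a heat value of exactly 2 is not treated as a hot word.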
S105, performing hot word clustering on the determined hot words according to the correlation degree among the hot words to obtain at least one event hot word cluster.
More than one hot word often appears in a single hotspot event. For example, the event mentioned above involves hot words such as Wang, Ma and the agent, and each of these hot words actually reflects only one aspect of the same hotspot event. The correlation degree among the hot words therefore needs to be analyzed, and strongly correlated hot words are then clustered together, so that different hot words belonging to the same hotspot event are not attributed to multiple hotspot events, which would otherwise cause large deviations in the analysis and statistics of hotspot events.
In a specific implementation, the hotspot event information processing device can determine the correlation degree between hot words according to the mutual information, the heat vector cooperation degree, or the semantic similarity between the hot words, where:
Mutual information formula: $pmi(w_1, w_2) = \log \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)}$, where $p(w_1)$ is the probability that the word $w_1$ occurs, $p(w_2)$ is the probability that the word $w_2$ occurs, and $p(w_1, w_2)$ is the probability that the hot words $w_1$ and $w_2$ appear in the same text message.
Semantic similarity formula: $Sem(w_1, w_2) = \cos(V'_{w_1}, V'_{w_2})$, where $V'_{w_1}$ and $V'_{w_2}$ are the semantic vectors of the words $w_1$ and $w_2$ respectively. For example, each hot word can be represented as a semantic vector of the same dimension by the word2vec method, and the cosine between the semantic vectors representing two hot words is then calculated as their semantic similarity.
Heat vector cooperation degree formula: $Hsim(w_1, w_2) = \cos(V_{w_1}, V_{w_2})$, where $V_{w_1}$ and $V_{w_2}$ are the multidimensional heat vectors of the words $w_1$ and $w_2$ respectively. In an optional embodiment, the network traffic data of the text information containing a hot word in each of several preset time periods may be used as the multidimensional heat vector of that hot word, with the network traffic data for each preset time period serving as one component of the vector. For example, the network traffic data of the text information containing the hot word $w_1$ on each of the last 7 days gives a 7-dimensional vector representing the hot word. The cosine between the multidimensional heat vectors of two hot words is then calculated and used as the heat vector cooperation degree between the two hot words.
In an optional embodiment, the hotspot event information processing device may determine the correlation degree between hot words according to any one of the mutual information, the heat vector cooperation degree, or the semantic similarity between the hot words, and may also combine the results of all three, for example according to the following formula:
Similarity formula: $sim(w_1, w_2) = pmi(w_1, w_2) \cdot Hsim(w_1, w_2) \cdot Sem(w_1, w_2)$
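The three measures and their product can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: probabilities are estimated here as simple document-level relative frequencies, the heat and semantic vectors are assumed to be supplied by the caller as plain lists of numbers, and the function names and the handling of word pairs that never co-occur (returning negative infinity) are assumptions.

```python
import math


def pmi(docs, w1, w2):
    """Pointwise mutual information from document co-occurrence.

    docs: list of sets of tokens, one set per text message.
    Returns -inf when the two words never appear together (assumption).
    """
    n = len(docs)
    c1 = sum(1 for d in docs if w1 in d)
    c2 = sum(1 for d in docs if w2 in d)
    c12 = sum(1 for d in docs if w1 in d and w2 in d)
    if c12 == 0:
        return float("-inf")
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def combined_similarity(docs, w1, w2, heat_vec, sem_vec):
    """sim = pmi * Hsim * Sem, following the similarity formula above.

    heat_vec / sem_vec: {word: list of floats} (assumed to be given).
    """
    p = pmi(docs, w1, w2)
    if p == float("-inf"):
        return p  # avoid nan from (-inf * 0)
    h = cosine(heat_vec[w1], heat_vec[w2])
    s = cosine(sem_vec[w1], sem_vec[w2])
    return p * h * s


docs = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
heat_vec = {"a": [1, 2, 3], "b": [2, 4, 6]}
sem_vec = {"a": [1.0, 0.0], "b": [1.0, 0.0]}
score = combined_similarity(docs, "a", "b", heat_vec, sem_vec)
```

Since PMI can be negative while the cosines are usually non-negative, a plain product can produce negative scores for words that co-occur less often than chance; a practical system might clip or rescale the terms, which is a design choice the text leaves open.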
After the relevancy among all the hot words is obtained, the hot event information processing device can perform hot word clustering on the hot words with the relevancy reaching a preset relevancy threshold.
Further, in an optional embodiment in which the hot words include central hot words and peripheral hot words, the hotspot event information processing device may first cluster the peripheral hot words to their corresponding central hot words. That is, it calculates the correlation degree between each peripheral hot word and each central hot word, and if the correlation degree between a peripheral hot word and the central hot word most correlated with it reaches the preset correlation threshold, that peripheral hot word and that central hot word are clustered together. It then clusters the central hot words: the correlation degree between each pair of central hot words is calculated, and if the correlation degree between two central hot words reaches the preset correlation threshold, the two central hot words are merged into one cluster together with the peripheral hot words previously clustered with each of them. At least one event hot word cluster is finally obtained, and each event hot word cluster can represent one hotspot event.
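The two-stage clustering described above can be sketched with a small union-find structure. This is one possible reading under stated assumptions: `sim` is any caller-supplied symmetric similarity function, merging of central hot words is treated as transitive, and the names `cluster_hotwords`, `sim`, and `threshold` are hypothetical.

```python
def cluster_hotwords(central, peripheral, sim, threshold):
    """Cluster hot words in two stages and return a list of sets.

    Stage 1: attach each peripheral word to its most similar central
    word if that similarity reaches `threshold`.
    Stage 2: merge central words whose pairwise similarity reaches
    `threshold`, carrying their attached peripheral words along.
    """
    parent = {c: c for c in central}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    attached = {}
    if central:
        for p in peripheral:
            best = max(central, key=lambda c: sim(p, c))
            if sim(p, best) >= threshold:
                attached[p] = best

    for i, a in enumerate(central):
        for b in central[i + 1:]:
            if sim(a, b) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for c in central:
        clusters.setdefault(find(c), set()).add(c)
    for p, c in attached.items():
        clusters[find(c)].add(p)
    return list(clusters.values())


scores = {
    frozenset(("king", "broker")): 0.9,
    frozenset(("divorce", "king")): 0.8,
    frozenset(("rescue", "quake")): 0.7,
}


def toy_sim(a, b):
    return scores.get(frozenset((a, b)), 0.0)


result = cluster_hotwords(
    ["king", "broker", "quake"],
    ["divorce", "rescue", "weather"],
    toy_sim,
    threshold=0.5,
)
```

In the toy data, "weather" is not similar enough to any central word, so it is left out of every cluster rather than forced into one.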
Further, in an optional embodiment, the hot spot event information processing apparatus may output the obtained at least one event hotword cluster as a result, may further analyze each hot spot event according to the obtained at least one event hotword cluster, for example, count attention of a certain hot spot event or network traffic data, and may further classify text information in the network according to the obtained at least one event hotword cluster, that is, establish an association between the text information in the network and each hot spot event.
Therefore, the hotspot event information processing device in the embodiment of the invention extracts keywords from text information in the network, gathers the network traffic data of the keywords to obtain the hot words appearing in the text information, and clusters the hot words according to the correlation degree between them to obtain event hot word clusters representing different hotspot events.
Fig. 2 is a schematic flowchart of a hotspot event information processing method in another embodiment of the present invention. As shown in the figure, the hotspot event information processing method in this embodiment may include:
S201, performing text word segmentation on a plurality of text messages to obtain the word segments contained in each text message.
S202, extracting at least one keyword from the participles contained in each text message according to the statistical data of each participle obtained by the processing.
S201 and S202 may refer to S101 and S102 in the foregoing embodiments, and are not described in detail in this embodiment.
S203, acquiring network traffic data of the text information corresponding to each keyword, where the network traffic data comprises data of at least one traffic type for the text information containing the keyword.
Network traffic data of different traffic types may include: click quantity, share quantity, text information quantity, text source quantity and the like.
S204, acquiring, for each traffic type, the traffic ratio of the network traffic data of the text information containing the target keyword to the total network traffic data of the corresponding type for the plurality of text messages.
For example, using the following normalized score calculation formula:
$$Kr_i = \frac{Kc_i}{\sum_{k=1}^{N} Kc_k}$$
where $N$ is the size of the keyword vocabulary, $i$ denotes the keyword numbered $i$ in the keyword vocabulary, $Kc_i$ is the count of network traffic data of a given traffic type for the text information containing the keyword numbered $i$, the denominator $\sum_{k=1}^{N} Kc_k$ is the sum of the network traffic data counts of all keywords for the same traffic type, and $Kr_i$ is the normalized count score. For example, 1000 text messages are collected in total, and the click count generated by these 1000 text messages is 100000. If 200 of the text messages contain the keyword "erlotinib" and the click count generated by those 200 text messages is 20000, the traffic ratio is 20000/100000 = 0.2, so the traffic ratio of the keyword "erlotinib" for the click-count traffic type may be determined to be 0.2. Similarly, the traffic ratios of "erlotinib" for other traffic types, such as the share count, the number of text messages, and the number of text sources, can be calculated respectively. In an alternative embodiment, to prevent the value from being too small, the traffic ratio may be multiplied by a set value before being used in the subsequent calculation of the traffic heat value.
S205, determining the traffic heat value of the target keyword according to the product of the traffic ratios of the target keyword for the respective traffic types.
In an alternative embodiment, the traffic heat value of each keyword may be calculated using formula (1) of the first embodiment above. The calculated traffic heat value of a keyword is usually in the interval [0.05, 2000].
S206, determining keywords whose traffic heat value is not lower than a second preset threshold as central hot words, and determining keywords whose traffic heat value is lower than the second preset threshold and higher than the first preset threshold as peripheral hot words.
From the perspective of the distribution of keywords related to a hotspot event, a typical hotspot event generally consists of several central hot words and several event-related peripheral hot words, so the hot words may include central hot words and peripheral hot words. For example, for a certain marriage event that occurred in August 2016, the central hot words are a certain Wang, a certain Ma and the agent, and the peripheral hot words include green cap, rape, divorce and the like. Still taking the traffic heat value calculated by the above formula (1) as an example, the first preset threshold may be set to 2 and the second preset threshold to 500.
S207, determining the correlation degree between the hot words according to the mutual information, the heat vector cooperation degree, or the semantic similarity between the hot words.
S208, performing hot word clustering on the determined hot words according to the correlation degree among the hot words.
In this embodiment, when clustering, the hotspot event information processing device may first cluster the peripheral hot words to their corresponding central hot words. That is, it calculates the correlation degree between each peripheral hot word and each central hot word, and if the correlation degree between a peripheral hot word and the central hot word most correlated with it reaches the preset correlation threshold, that peripheral hot word and that central hot word are clustered together. It then clusters the central hot words: the correlation degree between each pair of central hot words is calculated, and if the correlation degree between two central hot words reaches the preset correlation threshold, the two central hot words are merged into one cluster together with the peripheral hot words previously clustered with each of them. At least one event hot word cluster is finally obtained, and each event hot word cluster can represent one hotspot event.
S209, comparing the hot words in the event hot word cluster with the participles contained in the target text information respectively;
S210, determining event hot word clusters associated with the target text information according to the hot words contained in the target text information.
In this embodiment, the at least one event hot word cluster obtained by the hot word clustering in S208 is used to perform hot event association division on text information; that is, the event hot word cluster associated with the target text information is determined according to the hot words contained in the participles of the target text information. If the target text information contains hot words of different event hot word clusters, the event hot word cluster associated with the target text information is determined according to the number of occurrences, in the target text information, of the hot words of each event hot word cluster. For example, if hot words of event hot word cluster A occur 20 times, hot words of event hot word cluster B occur 10 times, hot words of event hot word cluster C occur 5 times, and no hot words of other event hot word clusters occur, then event hot word cluster A is determined as the event hot word cluster associated with the target text information; in other words, the target text information is treated as associated text information of the hot event corresponding to event hot word cluster A.
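The count-based association described above might look like the following sketch. The function name `associate_by_count` and the assumption that each hot word belongs to exactly one cluster are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def associate_by_count(
    segments: List[str], clusters: Dict[str, Set[str]]
) -> Optional[str]:
    # Map each hot word to its cluster (assumes one cluster per hot word).
    word_to_cluster = {w: name for name, words in clusters.items() for w in words}
    # Count hot word occurrences per cluster among the text's participles.
    counts = Counter(word_to_cluster[s] for s in segments if s in word_to_cluster)
    if not counts:
        return None  # the text contains no hot word of any cluster
    return counts.most_common(1)[0][0]
```

With 20 occurrences for cluster A, 10 for B and 5 for C, the function returns the name of cluster A, matching the example in the text. How ties are broken is not specified in the source; here the cluster encountered first wins.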
If the hot words comprise central hot words and peripheral hot words, the participles of the target text information comprise at least one central hot word and at least one peripheral hot word of the event hot word cluster associated with the target text information. Further, in an optional embodiment, the occurrence counts of the central hot words and the peripheral hot words may be weighted. For example, if the weight for occurrences of central hot words is 2, the weight for occurrences of peripheral hot words is 1, central hot words of event hot word cluster A occur 10 times in the target text information and peripheral hot words of event hot word cluster A occur 5 times, then the weighted score is 10 × 2 + 5 × 1 = 25. After the weighted scores of the other event hot word clusters are calculated in the same manner, the event hot word cluster with the highest weighted score is determined as the event hot word cluster associated with the target text information; in other words, the target text information is associated with the hotspot event corresponding to that event hot word cluster.
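As a rough sketch of the weighted scoring, the function below computes the score of one event hot word cluster for a text. The name `weighted_cluster_score` and the `require_both` flag (one reading of the requirement that the text contain at least one central and one peripheral hot word) are assumptions for illustration.

```python
from typing import List, Set


def weighted_cluster_score(
    segments: List[str],
    central: Set[str],
    peripheral: Set[str],
    w_central: float = 2.0,     # example weight from the text
    w_peripheral: float = 1.0,  # example weight from the text
    require_both: bool = True,  # assumed interpretation, see lead-in
) -> float:
    n_central = sum(1 for s in segments if s in central)
    n_peripheral = sum(1 for s in segments if s in peripheral)
    if require_both and (n_central == 0 or n_peripheral == 0):
        return 0.0
    return w_central * n_central + w_peripheral * n_peripheral
```

The cluster with the highest score over all clusters would then be selected as the associated cluster.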
Therefore, the hot event information processing device in the embodiment of the invention extracts the keywords from the text information in the network, counts the network flow data of the keywords to obtain the hot words appearing in the text information, clusters according to the correlation degree between the hot words to obtain the event hot word clusters representing different hot events, and can also perform hot event correlation division on the target text information through the event hot word clusters.
Fig. 3 is a schematic structural diagram of a hotspot event information processing device in an embodiment of the present invention, where the hotspot event information processing device in the embodiment of the present invention includes:
the text word segmentation module 310 is configured to perform text word segmentation on the multiple pieces of text information to obtain words included in each piece of text information.
Specifically, the hotspot event information processing device may collect text information that appears within a period of time (for example, the current day, the last week, or the last 30 days), where the text information may include online news articles, comment articles, SNS articles (SNS stands for Social Networking Services; here this refers to articles published on a social networking service platform, such as microblog posts, friend-circle (Moments) posts, or articles published by public (official) accounts), and the like. The text segmentation module 310 may perform text segmentation on each piece of collected text information, for example by a text segmentation processing method such as full-mode segmentation or search segmentation, to obtain the participles contained in each piece of text information. In addition, the text content can be preprocessed before word segmentation, for example by garbled-character filtering, punctuation filtering, conversion between traditional and simplified Chinese characters, word segmentation, stop-word filtering and the like.
And a keyword extraction module 320, configured to extract at least one keyword from the segmented words included in each text message according to the statistical data of each segmented word obtained through the processing.
Specifically, the statistical data of each participle may include its word frequency, the number of texts containing it, its inverse text frequency, or the like. From the statistical data, the frequency and the significance of each participle in the text information can be obtained (for example, common function words such as "the" and "can" appear often but cannot be regarded as keywords), so that at least one keyword can be extracted from the participles contained in each piece of text information.
In an alternative embodiment, at least one keyword may be extracted from the segmented words included in each text message through a TF-IDF algorithm or a TextRank document ranking algorithm.
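For illustration, a bare-bones TF-IDF keyword extraction over already segmented texts could be written as below. The function name `tfidf_keywords`, the `top_k` parameter and the unsmoothed IDF variant are assumptions; a production system would more likely rely on an existing library.

```python
import math
from collections import Counter
from typing import List


def tfidf_keywords(docs: List[List[str]], top_k: int = 2) -> List[List[str]]:
    n_docs = len(docs)
    # Document frequency: number of texts that contain each participle.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))

    result = []
    for doc in docs:
        tf = Counter(doc)
        # TF-IDF = (term frequency in this text) * log(N / document frequency).
        scores = {
            w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result.append(ranked[:top_k])
    return result
```

A participle that appears in every text receives a score of zero, which is how frequent but uninformative words drop out of the keyword list.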
The heat value obtaining module 330 is configured to obtain a flow heat value of each keyword according to the network traffic data of the text information corresponding to each keyword.
The text information corresponding to a keyword is the text information containing that keyword. The larger the network traffic generated by the text information containing a keyword, the more attention that keyword is likely receiving, at least to a certain extent, so the traffic heat value of each keyword can be obtained from the network traffic data of the text information corresponding to that keyword. The network traffic data may include click quantity, sharing quantity, comment quantity, text information quantity, text source quantity, and the like.
Considering that the absolute value of network traffic data fluctuates because of holidays and the like, in an optional embodiment the heat value obtaining module 330 may determine the traffic heat value of a target keyword according to the traffic ratio of the network traffic data of the text information containing the target keyword to the total network traffic data of the plurality of pieces of text information. For example, the following normalized score calculation formula is used:
$$Kr_i = \frac{Kc_i}{\sum_{j=1}^{N} Kc_j}$$
where $N$ is the size of the keyword vocabulary, $i$ denotes the keyword numbered $i$ in the keyword vocabulary, $Kc_i$ is the count of network traffic data of the text information containing the keyword numbered $i$, $\sum_{j=1}^{N} Kc_j$ is the sum of the counts of network traffic data over all keywords, and $Kr_i$ is the normalized count score. For example, 1000 pieces of text information are collected, and the network traffic data (for example, the click quantity) generated by these 1000 pieces is 100000. If 200 of them contain the keyword "erlotinib" and the network traffic data generated by those 200 pieces is 20000, then the traffic ratio is 20000/100000 = 0.2, and the traffic heat value of the keyword "erlotinib" may be determined to be 0.2. In an optional embodiment, to prevent the value from being too small, the traffic ratio may be multiplied by a set value and the product used as the traffic heat value.
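The ratio-based heat value can be sketched in a few lines. Both function names and the `scale` parameter (standing in for the "set value" mentioned above) are hypothetical; the first function follows the worked example, the second normalizes over the keyword vocabulary as in the formula.

```python
from typing import Dict


def traffic_ratio(keyword_traffic: float, total_traffic: float,
                  scale: float = 1.0) -> float:
    # Traffic of texts containing the keyword divided by total traffic.
    if total_traffic == 0:
        return 0.0
    return scale * keyword_traffic / total_traffic


def normalized_scores(counts: Dict[str, float]) -> Dict[str, float]:
    # Kr_i = Kc_i / sum_j Kc_j over the whole keyword vocabulary.
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}
```

Calling `traffic_ratio(20000, 100000)` reproduces the ratio of 0.2 from the example in the text.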
In an alternative embodiment, the heat value obtaining module 330 may further include, as shown in fig. 4:
the classification heat obtaining unit 331 is configured to obtain traffic ratios of network traffic data of each traffic type of text information including the target keyword to total network traffic data of corresponding types of the plurality of text information.
That is, network traffic data of different traffic types is distinguished; for example, the click quantity, sharing quantity, text information quantity and text source quantity are counted separately, and the traffic ratio of the network traffic data of each traffic type of the text information containing the target keyword to the total network traffic data of the corresponding type of the plurality of pieces of text information is then obtained.
The heat value calculating unit 332 determines the flow heat value of the target keyword according to the product of the flow ratios of the target keyword corresponding to the flow types.
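A simple way to picture the product of per-type traffic ratios is the sketch below. The function name `heat_from_ratios`, the dictionary layout and the `scale` factor are assumptions; the smoothing operator of formula (1) is deliberately not modeled here.

```python
import math
from typing import Dict


def heat_from_ratios(
    keyword_by_type: Dict[str, float],  # traffic of texts containing the keyword
    total_by_type: Dict[str, float],    # total traffic of all texts, per type
    scale: float = 1.0,                 # hypothetical scaling factor
) -> float:
    # One ratio per traffic type; types with no total traffic are skipped.
    ratios = [
        keyword_by_type[t] / total_by_type[t]
        for t in keyword_by_type
        if total_by_type.get(t)
    ]
    if not ratios:
        return 0.0
    return scale * math.prod(ratios)
```

Because the ratios are multiplied, a keyword needs a noticeable share of every traffic type to obtain a high heat value.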
In an optional embodiment, to keep the heat change smooth, a Laplace smoothing operator smooth_ratio may further be introduced into the calculation of the traffic heat value. For example, the traffic heat value of each keyword is calculated by formula (1), and the traffic heat value so calculated usually lies in the interval [0.05, 2000].
And the hot word determining module 340 is configured to determine a keyword with a flow heat value higher than a first preset threshold as a hot word.
After the traffic heat value of each keyword is obtained through the above calculation, the hot word determining module 340 may determine the keywords whose traffic heat value is higher than a first preset threshold as hot words. The first preset threshold may be set according to the way the traffic heat value is calculated; for example, for traffic heat values calculated by the above formula (1), the first preset threshold may be 2 or 500, that is, all keywords whose traffic heat value is higher than 2, or all keywords whose traffic heat value is higher than 500, are determined as hot words. In addition, the first preset threshold may be determined from the distribution of the calculated traffic heat values of all keywords, for example set so that 30% of the keywords are screened out as hot words.
From the perspective of the distribution of keywords related to a hotspot event, a typical hotspot event generally consists of several central hot words and several event-related peripheral hot words, so the hot words may include central hot words and peripheral hot words. For example, for a certain marriage-related event that occurred in August 2016, the central hot words are "a certain Wang", "a certain Ma" and "broker", and the peripheral hot words include "green cap", "rape", "divorce" and the like. Further, in an optional embodiment, the hot word determining module 340 may determine a keyword whose traffic heat value is not lower than a second preset threshold as a central hot word, and determine a keyword whose traffic heat value is lower than the second preset threshold and higher than the first preset threshold as a peripheral hot word. Still taking the traffic heat value calculated by the above formula (1) as an example, the first preset threshold may be set to 2 and the second preset threshold may be set to 500.
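The two-threshold split into central and peripheral hot words can be illustrated as follows. The function name `classify_hotwords` is hypothetical, and the default thresholds are simply the example values 2 and 500 given above.

```python
from typing import Dict, List, Tuple


def classify_hotwords(
    heat: Dict[str, float],
    first_threshold: float = 2.0,
    second_threshold: float = 500.0,
) -> Tuple[List[str], List[str]]:
    central: List[str] = []
    peripheral: List[str] = []
    for word, value in heat.items():
        if value >= second_threshold:
            central.append(word)      # not lower than the second threshold
        elif value > first_threshold:
            peripheral.append(word)   # between the two thresholds
        # otherwise the keyword is not a hot word at all
    return central, peripheral
```

A keyword whose heat value equals the first threshold exactly is not treated as a hot word, since the text requires a value higher than that threshold.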
And the event hotword clustering module 350 is configured to perform hotword clustering on the determined hot words according to the correlation between the hot words to obtain at least one event hotword cluster.
More than one hot word often appears in a single hot event. For example, hot words such as "a certain Wang", "a certain Ma" and "broker" all appear in the marriage event mentioned above, and each of them actually reflects only one aspect of the same hot event. The correlation degree between the hot words therefore needs to be analyzed, and strongly correlated hot words are then clustered, so that different hot words belonging to the same hot event are not attributed to multiple hot events, which would cause large deviations in the analysis and statistics of hot events.
In a specific implementation, the event hotword clustering module 350 may determine the correlation degree between the hot words according to the mutual information, the heat vector cooperation degree, or the semantic similarity between the hot words, where:
Mutual information formula: $pmi(w_1, w_2) = \log\frac{p(w_1, w_2)}{p(w_1)\,p(w_2)}$, where $p(w_1)$ is the probability that hot word $w_1$ occurs, $p(w_2)$ is the probability that hot word $w_2$ occurs, and $p(w_1, w_2)$ is the probability that hot word $w_1$ and hot word $w_2$ appear in the same piece of text information.
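As a small sketch, the mutual information can be estimated from document counts. The function name `pmi_from_texts`, the representation of each text as a set of participles, and the choice to return negative infinity when the two words never co-occur are assumptions for this example.

```python
import math
from typing import List, Set


def pmi_from_texts(texts: List[Set[str]], w1: str, w2: str) -> float:
    n = len(texts)
    c1 = sum(1 for t in texts if w1 in t)               # texts containing w1
    c2 = sum(1 for t in texts if w2 in t)               # texts containing w2
    c12 = sum(1 for t in texts if w1 in t and w2 in t)  # texts containing both
    if n == 0 or c12 == 0:
        return float("-inf")  # no co-occurrence observed
    # log( p(w1, w2) / (p(w1) * p(w2)) ) with probabilities as text fractions.
    return math.log((c12 / n) / ((c1 / n) * (c2 / n)))
```

A positive value means the two hot words appear together more often than independent occurrence would predict.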
Semantic similarity formula: $Sem(w_1, w_2) = \cos(V'_{w_1}, V'_{w_2})$, where $V'_{w_1}$ and $V'_{w_2}$ are the semantic vectors of words $w_1$ and $w_2$ respectively. For example, each hot word can be represented as a semantic vector of the same dimension by the Word2Vector method, and the cosine value between the semantic vectors representing two hot words is then calculated as their semantic similarity.
Heat vector cooperation degree formula: $Hsim(w_1, w_2) = \cos(V_{w_1}, V_{w_2})$, where $V_{w_1}$ and $V_{w_2}$ are the multi-dimensional heat vectors of words $w_1$ and $w_2$ respectively. In an optional embodiment, the network traffic data of the text information containing a certain hot word in a plurality of preset time periods may be used as the multi-dimensional heat vector of the hot word, where the network traffic data containing the hot word in each preset time period is used as one component of the multi-dimensional heat vector. For example, for each of the last 7 days, the number of pieces of text information containing hot word $w_1$ on that day can be used as one component, so that a 7-dimensional vector representing the hot word is obtained. The cosine value between the multi-dimensional heat vectors corresponding to two hot words is then calculated and used as the heat vector cooperation degree between the two hot words.
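The cosine between two heat vectors can be computed as in the sketch below. The function name `heat_vector_cooperation` and the convention of returning 0.0 for an all-zero vector are assumptions; each vector is taken to hold one traffic count per preset time period, for example seven daily counts.

```python
import math
from typing import Sequence


def heat_vector_cooperation(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("heat vectors must cover the same time periods")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a word with no traffic at all
    return dot / (norm_u * norm_v)
```

Two hot words whose daily traffic rises and falls together yield a value close to 1, regardless of their absolute traffic levels.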
In an alternative embodiment, the event hotword clustering module 350 may determine the correlation degree between the hot words according to any one of the mutual information, the heat vector cooperation degree, or the semantic similarity between the hot words, or may determine the correlation degree by combining the calculation results of all three, for example according to the following formula:
Similarity formula: $sim(w_1, w_2) = pmi(w_1, w_2) \cdot Hsim(w_1, w_2) \cdot Sem(w_1, w_2)$
In another embodiment, the event hotword clustering module 350 further includes, as shown in fig. 5:
the heat vector obtaining unit 351 is configured to use network traffic data, which includes text information of a certain hot word in multiple preset time periods, as a multi-dimensional heat vector of the hot word, where the network traffic data, which includes the hot word in each preset time period, is respectively used as a one-dimensional vector value in the multi-dimensional heat vector of the hot word;
the synergy obtaining unit 353 is configured to calculate cosine values between the multidimensional heat vectors respectively corresponding to the two hot words, and use the cosine values as the degree of synergy of the heat vectors between the two hot words;
the relevancy obtaining unit 355 is configured to determine the correlation degree between the hot words according to the heat vector cooperation degree between the hot words.
Further, in an optional embodiment, the hot words include center hot words and peripheral hot words, so that when clustering is performed, the event hot word clustering module 350 may cluster the peripheral hot words to corresponding center hot words, that is, calculate the correlation between each peripheral hot word and each center hot word, respectively, and if the correlation between a center hot word with the highest correlation with a certain peripheral hot word and the peripheral hot word reaches the preset correlation threshold, cluster the peripheral hot words and the center hot words with the highest correlation into a cluster. And then clustering between the central hot words, namely calculating the correlation between each central hot word, if the correlation between two central hot words reaches the preset correlation threshold, clustering the two central hot words into one class, simultaneously clustering the peripheral hot words which are respectively clustered with the two central hot words into one class and the two central hot words into one class, and finally obtaining at least one event hot word cluster, wherein each event hot word cluster can represent one hot event.
Further, in an optional embodiment, the hot spot event information processing apparatus may output the obtained at least one event hotword cluster as a result, may further analyze each hot spot event according to the obtained at least one event hotword cluster, for example, count attention of a certain hot spot event or network traffic data, and may further classify text information in the network according to the obtained at least one event hotword cluster, that is, establish an association between the text information in the network and each hot spot event.
Thus, in an optional embodiment, the hotspot event information processing device may further include:
and the hot word matching module 360 is configured to compare hot words in the at least one event hot word cluster with the segments included in the target text information.
And the hot spot text partitioning module 370 is configured to determine event hot word clusters associated with the target text information according to the hot spot words included in the target text information.
In this embodiment, the at least one event hot word cluster obtained by the event hotword clustering module 350 is used to perform hot event association division on text information; that is, the event hot word cluster associated with the target text information is determined according to the hot words contained in the participles of the target text information. If the target text information contains hot words of different event hot word clusters, the event hot word cluster associated with the target text information is determined according to the number of occurrences, in the target text information, of the hot words of each event hot word cluster. For example, if hot words of event hot word cluster A occur 20 times, hot words of event hot word cluster B occur 10 times, hot words of event hot word cluster C occur 5 times, and no hot words of other event hot word clusters occur, then event hot word cluster A is determined as the event hot word cluster associated with the target text information; in other words, the target text information is treated as associated text information of the hot event corresponding to event hot word cluster A.
If the hot words include central hot words and peripheral hot words, the hot spot text partitioning module 370 determines that the participles of the target text information include at least one central hot word and at least one peripheral hot word of the event hot word cluster associated with it. Further, in an optional embodiment, the occurrence counts of the central hot words and the peripheral hot words may be weighted. For example, if the weight for occurrences of central hot words is 2, the weight for occurrences of peripheral hot words is 1, central hot words of event hot word cluster A occur 10 times in the target text information and peripheral hot words of event hot word cluster A occur 5 times, then the weighted score is 10 × 2 + 5 × 1 = 25. After the weighted scores of the other event hot word clusters are calculated in the same manner, the event hot word cluster with the highest weighted score is determined as the event hot word cluster associated with the target text information; in other words, the target text information is associated with the hotspot event corresponding to that event hot word cluster.
Therefore, the hot event information processing device in the embodiment of the invention extracts the keywords from the text information in the network, counts the network flow data of the keywords to obtain the hot words appearing in the text information, clusters according to the correlation degree between the hot words to obtain the event hot word clusters representing different hot events, and can also perform hot event correlation division on the target text information through the event hot word clusters.
It should be noted that the hotspot event information processing device may be an electronic device such as a PC, or may also be a portable electronic device such as a PAD, a tablet computer, or a laptop computer, and is not limited to the description herein; the hotspot event information processing device at least comprises a database for storing data and a processor for processing the data, and can comprise a built-in storage medium or an independently arranged storage medium.
As for the processor for data processing, the processing may be implemented by a microprocessor, a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or a Field-Programmable Gate Array (FPGA); as for the storage medium, it contains operation instructions, which may be computer-executable code, and the operation instructions implement the steps in the flow of the hotspot event information processing method according to the embodiment of the present invention, for example, as shown in fig. 1-2.
Fig. 6 shows an example of a hardware entity as a hotspot event information processing device. The apparatus comprises a processor 601, a storage medium 602, and at least one external communication interface 603; the processor 601, storage medium 602, and communication interface 603 are all connected by a bus 604.
The processor 601 in the hotspot event information processing device can call the operation instructions in the storage medium 602 to execute the following process:
performing text word segmentation processing on the plurality of text messages to obtain word segments contained in each text message;
extracting at least one keyword from the participles contained in each text message according to the statistical data of each participle obtained by processing;
acquiring a traffic heat value of each keyword according to the network traffic data of the text information corresponding to each keyword;
determining keywords with a traffic heat value higher than a first preset threshold value as hot words;
and according to the correlation degree among all hot words, performing hot word clustering on the determined hot words to obtain at least one event hot word cluster.
Here, it should be noted that: the above description related to the device for processing the hotspot event information is similar to the above description of the method for processing the hotspot event information, and the description of the beneficial effects of the same method is omitted for brevity. For technical details not disclosed in the embodiment of the hotspot event information processing device of the present invention, please refer to the description of the embodiment of the method of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A hotspot event information processing method is characterized by comprising the following steps:
performing text word segmentation processing on the plurality of text messages to obtain word segments contained in each text message;
extracting at least one keyword from the participles contained in each text message according to the statistical data of each participle obtained by processing;
obtaining network traffic data of text information corresponding to each keyword, wherein the network traffic data of the text information corresponding to the keyword comprises network traffic data of at least one traffic type generated by the text information containing the keyword, and the at least one traffic type comprises click rate, share rate or comment rate;
acquiring the flow ratio of the network flow data of each flow type of the text information containing the target keyword to the total network flow data of the corresponding type of the text information;
determining a flow heat value of the target keyword according to the product of flow ratios of the target keyword to the flow types;
determining keywords with the flow heat degree value higher than a first preset threshold value as hot words;
taking network traffic data of text information containing a certain hot word in a plurality of preset time periods as a multi-dimensional heat vector of the hot word, wherein the network traffic data containing the hot word in each preset time period are respectively taken as one-dimensional vector values in the multi-dimensional heat vector of the hot word;
calculating cosine values between the multidimensional heat vectors respectively corresponding to the two hot words to serve as the heat vector cooperation degree between the two hot words;
determining the correlation degree between the hot words according to the heat vector cooperation degree between the hot words;
and according to the correlation degree among all hot words, performing hot word clustering on the determined hot words to obtain at least one event hot word cluster.
2. The method for processing hotspot event information according to claim 1, wherein the extracting at least one keyword from the participles included in each text message according to the statistical data of each participle obtained by the processing comprises:
and extracting at least one keyword from the participles contained in each text message through a word frequency-inverse document frequency algorithm or a document ranking algorithm.
3. The method for processing hotspot event information according to claim 1, further comprising:
respectively comparing hot words in the at least one event hot word cluster with participles contained in target text information;
and determining event hot word clusters associated with the target text information according to the hot words contained in the target text information.
4. The method for processing the hot spot event information according to claim 3, wherein the determining the hot spot event associated with the target text message according to the hot spot words included in the target text message includes:
and determining event hot word clusters associated with the target text information according to the number of hot words contained in the target text information and respectively belonging to different event hot word clusters.
5. The hotspot event information processing method according to claim 3 or 4, wherein the hotspot words comprise a central hotword and peripheral hotwords;
the determining the keywords with the flow heat value higher than the first preset threshold value as the hot words comprises:
determining keywords with the flow heat value not lower than a second preset threshold as central hot words, and determining keywords with the flow heat value lower than the second preset threshold and higher than a first preset threshold as peripheral hot words;
the determining event hot word clusters associated with the target text information according to the hot words contained in the target text information includes:
the participles of the target text information comprise at least one central hot word and at least one peripheral hot word in the event hot word cluster associated with the participle.
6. A hotspot event information processing device, comprising:
the text word segmentation module is used for performing word segmentation on a plurality of pieces of text information to obtain the word segments contained in each piece of text information;
the keyword extraction module is used for extracting at least one keyword from the word segments contained in each piece of text information according to the statistical data of each word segment obtained by the word segmentation;
the heat value acquisition module is used for acquiring the flow heat value of each keyword according to the network flow data of the text information corresponding to each keyword; the heat value acquisition module comprises:
the classification heat obtaining unit is used for obtaining the network flow data of the text information corresponding to each keyword, wherein the network flow data of the text information corresponding to a keyword comprises network flow data of at least one flow type generated by the text information containing the keyword, and the at least one flow type comprises click quantity, sharing quantity or comment quantity; and for obtaining, for each flow type, the flow ratio of the network flow data of the text information containing a target keyword to the total network flow data of the corresponding flow type of the text information;
the heat value calculating unit is used for determining the flow heat value of the target keyword according to the product of the flow ratios of the target keyword corresponding to each flow type;
the hot word determining module is used for determining the keywords whose flow heat value is higher than a first preset threshold as hot words;
the event hot word clustering module is used for performing hot word clustering on the determined hot words according to the correlation degree among the hot words to obtain at least one event hot word cluster; the event hotword clustering module comprises:
the heat vector acquisition unit is used for taking the network flow data of the text information containing a given hot word in a plurality of preset time periods as the multidimensional heat vector of the hot word, wherein the network flow data of the text information containing the hot word in each preset time period serves as the value of one dimension of the multidimensional heat vector of the hot word;
the cooperation degree obtaining unit is used for calculating the cosine value between the multidimensional heat vectors corresponding to two hot words as the heat vector cooperation degree between the two hot words;
and the correlation degree obtaining unit is used for determining the correlation degree among the hot words according to the heat vector cooperation degree among the hot words.
7. The hotspot event information processing device of claim 6, wherein the keyword extraction module is configured to:
extract at least one keyword from the word segments contained in each piece of text information through a term frequency-inverse document frequency (TF-IDF) algorithm or a document ranking algorithm.
8. The hotspot event information processing device according to claim 6, further comprising:
the hot word matching module is used for comparing the hot words in the at least one event hot word cluster with the word segments contained in target text information;
and the hotspot text partitioning module is used for determining the event hot word cluster associated with the target text information according to the hot words contained in the target text information.
9. The hotspot event information processing device of claim 8, wherein the hotspot text partitioning module is configured to:
determine the event hot word cluster associated with the target text information according to the number of hot words contained in the target text information that belong to each of the different event hot word clusters.
10. The hotspot event information processing device according to claim 8 or 9, wherein the hot words comprise central hot words and peripheral hot words;
the hot word determining module is configured to:
determine keywords whose flow heat value is not lower than a second preset threshold as central hot words, and determine keywords whose flow heat value is lower than the second preset threshold and higher than the first preset threshold as peripheral hot words;
and for each event hot word cluster that the hotspot text partitioning module determines to be associated with the target text information, the word segments of the target text information comprise at least one central hot word and at least one peripheral hot word of that event hot word cluster.
11. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-5.
12. A hotspot event information processing device comprising a storage medium and a processor, the storage medium storing a computer program which, when executed by the processor, performs the method of any one of claims 1 to 5.
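The computations recited in the claims above (the flow heat value as a product of per-type flow ratios, the two-threshold split into central and peripheral hot words, the cosine-based cooperation degree, and the association of a target text with an event hot word cluster) can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name, the dictionary layout and the sample thresholds are assumptions introduced for illustration. The claims also do not say how ties between clusters are resolved or how the hot-word count interacts with the central/peripheral condition; the sketch simply applies both and keeps the first best match.

```python
import math

# Flow types named in claim 6; the dictionary keys are illustrative assumptions.
FLOW_TYPES = ("clicks", "shares", "comments")


def flow_heat_value(keyword_traffic, total_traffic):
    """Multiply, over all flow types, the keyword's share of the total traffic."""
    heat = 1.0
    for flow_type in FLOW_TYPES:
        total = total_traffic[flow_type]
        ratio = keyword_traffic[flow_type] / total if total else 0.0
        heat *= ratio
    return heat


def classify_hot_word(heat, first_threshold, second_threshold):
    """Return 'central', 'peripheral', or None when the keyword is not a hot word.

    Assumes second_threshold > first_threshold.
    """
    if heat >= second_threshold:
        return "central"
    if heat > first_threshold:
        return "peripheral"
    return None


def cooperation_degree(vec_a, vec_b):
    """Cosine between two heat vectors (one value per preset time period)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a hot word with no traffic correlates with nothing
    return dot / (norm_a * norm_b)


def associated_cluster(segments, clusters):
    """Pick the cluster sharing the most hot words with the text.

    clusters maps a cluster name to {"central": set, "peripheral": set}.
    A cluster qualifies only if the text contains at least one of its central
    hot words and at least one of its peripheral hot words.
    """
    words = set(segments)
    best_name, best_count = None, 0
    for name, cluster in clusters.items():
        central_hits = words & cluster["central"]
        peripheral_hits = words & cluster["peripheral"]
        if not central_hits or not peripheral_hits:
            continue
        count = len(central_hits) + len(peripheral_hits)
        if count > best_count:
            best_name, best_count = name, count
    return best_name
```

For example, a keyword whose texts account for 50 of 100 clicks, 10 of 40 shares and 5 of 20 comments gets a heat value of 0.5 × 0.25 × 0.25 = 0.03125. Multiplying the ratios means a keyword scores highly only when it is prominent in every flow type.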
CN201611117512.1A 2016-12-07 2016-12-07 Hotspot event information processing method and device Active CN108170692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611117512.1A CN108170692B (en) 2016-12-07 2016-12-07 Hotspot event information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611117512.1A CN108170692B (en) 2016-12-07 2016-12-07 Hotspot event information processing method and device

Publications (2)

Publication Number Publication Date
CN108170692A CN108170692A (en) 2018-06-15
CN108170692B CN108170692B (en) 2021-08-24

Family

ID=62526873

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611117512.1A Active CN108170692B (en) 2016-12-07 2016-12-07 Hotspot event information processing method and device

Country Status (1)

Country Link
CN (1) CN108170692B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750682B (en) * 2018-07-06 2022-08-16 武汉斗鱼网络科技有限公司 Title hot word automatic metering method, storage medium, electronic equipment and system
CN109032780A (en) * 2018-07-10 2018-12-18 广州极天信息技术股份有限公司 A kind of semantic web services interface arrangement
CN109635286B (en) * 2018-11-26 2022-04-12 平安科技(深圳)有限公司 Policy hotspot analysis method and device, computer equipment and storage medium
CN111368070B (en) * 2018-12-06 2024-06-21 北京国双科技有限公司 Method and device for determining hot event
CN109800431B (en) * 2019-01-23 2020-07-28 中国科学院自动化研究所 Event information keyword extraction, monitoring method and system and storage and processing device
CN110232149B (en) * 2019-05-09 2022-03-01 北京邮电大学 Hot event detection method and system
CN110351374B (en) * 2019-07-16 2022-04-01 深圳市网心科技有限公司 File deployment method, device and equipment
CN110472132B (en) * 2019-08-01 2024-07-16 腾讯科技(深圳)有限公司 Method, device and medium for acquiring safe public opinion information
CN110458296B (en) * 2019-08-02 2023-08-29 腾讯科技(深圳)有限公司 Method and device for marking target event, storage medium and electronic device
CN110990571B (en) * 2019-12-02 2024-04-02 北京秒针人工智能科技有限公司 Method and device for acquiring discussion duty ratio, storage medium and electronic equipment
CN110990708B (en) * 2019-12-11 2023-05-02 Oppo(重庆)智能科技有限公司 Hot event determination method and device, storage medium and electronic equipment
CN111309903B (en) * 2020-01-20 2023-06-16 北京大米未来科技有限公司 A data processing method, device, storage medium and electronic equipment
CN111324801B (en) * 2020-02-17 2022-06-21 昆明理工大学 Hot event discovery method in judicial field based on hot words
CN111538891B (en) * 2020-04-21 2023-04-07 招商局金融科技有限公司 Hot event monitoring method and device, computer device and readable storage medium
CN112560445A (en) * 2020-12-05 2021-03-26 上饶市中科院云计算中心大数据研究院 Method and device for detecting hot line hot spot appeal topics of captain
CN113076335B (en) * 2021-04-02 2024-05-24 西安交通大学 Network module factor detection method, system, equipment and storage medium
CN113420544B (en) * 2021-05-19 2025-03-21 北京沃东天骏信息技术有限公司 A method, device, electronic device and storage medium for determining hot words
CN113792210B (en) * 2021-08-19 2022-09-09 广州云硕科技发展有限公司 Thermal control method and system based on semantic real-time analysis
CN114186123A (en) * 2021-11-05 2022-03-15 北京百度网讯科技有限公司 Processing method, apparatus, device and storage medium for hot event
CN118101498B (en) * 2024-04-29 2024-06-25 深圳市海域达赫科技有限公司 Network traffic prediction method, device, system and storage medium based on big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823792A (en) * 2014-03-07 2014-05-28 网易(杭州)网络有限公司 Method and equipment for detecting hotspot events from text document
CN104598632A (en) * 2015-02-05 2015-05-06 北京航空航天大学 Hot event detection method and device
CN105488196A (en) * 2015-12-07 2016-04-13 中国人民大学 Automatic hot topic mining system based on internet corpora
CN105740466A (en) * 2016-03-04 2016-07-06 百度在线网络技术(北京)有限公司 Method and device for excavating incidence relation between hotspot concepts

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068991A (en) * 2015-07-30 2015-11-18 成都鼎智汇科技有限公司 Big data based public sentiment discovery method

Also Published As

Publication number Publication date
CN108170692A (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN108170692B (en) Hotspot event information processing method and device
US12288033B2 (en) Method and system for securely storing private data in a semantic analysis system
US11003726B2 (en) Method, apparatus, and system for recommending real-time information
US10423648B2 (en) Method, system, and computer readable medium for interest tag recommendation
US9367603B2 (en) Systems and methods for behavioral segmentation of users in a social data network
US9183293B2 (en) Systems and methods for scalable topic detection in social media
Perdana et al. Combining likes-retweet analysis and naive bayes classifier within twitter for sentiment analysis
US10002187B2 (en) Method and system for performing topic creation for social data
Bates et al. Counting clusters in twitter posts
CN107908616B (en) Method and device for predicting trend words
CN103218368B (en) A kind of method and apparatus excavating hot word
Bykau et al. Fine-grained controversy detection in Wikipedia
US9996529B2 (en) Method and system for generating dynamic themes for social data
CN109462635B (en) Information pushing method, computer readable storage medium and server
CN105512300B (en) information filtering method and system
CN108984514A (en) Acquisition methods and device, storage medium, the processor of word
CN106776542B (en) Keyword processing method and device for user feedback information and server
CN110443264A (en) A kind of method and apparatus of cluster
CN114238782B (en) Data processing method, device, server and computer readable storage medium
CN110019556B (en) Topic news acquisition method, device and equipment thereof
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
US11822609B2 (en) Prediction of future prominence attributes in data set
KR102078541B1 (en) Issue interest based news value evaluation apparatus and method, storage media storing the same
CN104484329B (en) Consumption hot spot method for tracing and device based on comment centre word timing variations analysis
US20170286531A1 (en) Scalable mining of trending insights from text

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant