
CN107862081B - Network information source searching method and device and server - Google Patents


Info

Publication number
CN107862081B
Authority
CN
China
Prior art keywords
node
phrase
public opinion
semantic
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711223777.4A
Other languages
Chinese (zh)
Other versions
CN107862081A (en
Inventor
肖仕刚
黄勇
陈航
宋国志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Silence Information Technology Co ltd
Original Assignee
Sichuan Silence Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Silence Information Technology Co ltd
Priority to CN201711223777.4A
Publication of CN107862081A
Application granted
Publication of CN107862081B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/3332 Query translation
                    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
                    • G06F16/3346 Query execution using probabilistic model
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
                  • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a network information source searching method, device, and server in the field of computer security. By combining semantic recognition of public opinion information with viscosity correlation analysis of social network nodes, the invention enables discovery of information source networks and identification of their nature. Unlike traditional keyword semantic analysis and fixed-point extraction of information source relation networks, the method builds a node semantic tree by combining a phrase probability space, semantic joint matrix partitioning, and naive Bayes classification; extracts an information source network through node depth detection and vector-conversion viscosity matching; and identifies the final information source using a viscosity clustering algorithm and cross-correlation. On the same public opinion feature data, it captures information sources more accurately and reasonably, supports more analysis dimensions, provides deeper analysis of social relations and public opinion features, and presents data more intuitively. The system is highly targeted at its detection objects, can analyze deep-level characteristics of the data, detects public opinion source networks, and readily discovers social network information sources.

Description

Network information source searching method and device and server
Technical Field
The invention relates to the field of computer security, in particular to a network information source searching method, a network information source searching device and a server.
Background
With the rapid development of the Internet, network ideological security has received unprecedented attention. As a gathering place for contemporary thought and culture and an amplifier of social public opinion, social networks on the Internet have reached an unprecedented level of activity, and their immediacy, burstiness, and bias make them a key object of attention and monitoring by society and government. Rapidly grasping public opinion information, accurately predicting public opinion trends, and quickly mining and identifying the sources of public opinion threats have become the focal points of public opinion security offense and defense. However, traditional public opinion supervision is wholly inadequate for today's network environment, which spans many fields, has an enormous user base, and changes rapidly. At present, most public opinion identification and analysis relies on traditional statistical analysis, generally built on a manually maintained library of threat keywords; it ignores the associations between phrases and does not analyze the transmissibility and timeliness of phrases in depth.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a server for searching a network information source, so as to solve the problem that a source of threatening public opinion information cannot be found quickly and accurately.
An embodiment of the invention provides a network information source searching method comprising the following steps: constructing a public opinion phrase probability space from a public opinion phrase database; extracting the phrase sequence of a single piece of public opinion information and constructing a semantic joint probability matrix in combination with the public opinion phrase probability space; obtaining the threat coefficient of the single piece of public opinion information using the semantic joint probability matrix and a naive Bayes classification algorithm, and constructing a node semantic tree in combination with the semantic joint probability matrix; obtaining the node interaction network topology distribution from a social node network through a depth detection algorithm, constructing a bidirectional node incidence matrix, and calculating viscosity matching coefficients from the bidirectional node incidence matrix and the node semantic tree; performing vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed, and obtaining an information source network from the initial matrix using a hierarchical extraction algorithm and the viscosity matching coefficients; constructing an information source semantic tree for the information source network, and drawing an information source word group viscosity distribution diagram using a viscosity extension algorithm in combination with the node interaction network topology distribution; and extracting information source semantic feature phrases from the information source word group viscosity distribution diagram using a viscosity clustering algorithm, and performing correlation analysis on the semantic tree of each node in the information source network to extract the information source.
Preferably, the step of constructing the public opinion phrase probability space from the public opinion phrase database further includes: calculating the citation probability of each phrase in the database, calculating a universality probability from the distribution of phrases in the database, and calculating an aging coefficient from the distribution of usage times of each phrase in the database; and constructing the public opinion phrase probability space from the citation probability, the universality probability, and the aging coefficient.
Preferably, the step of extracting the phrase sequence of a single piece of public opinion information and constructing the semantic joint probability matrix in combination with the public opinion phrase probability space further includes: extracting the phrase sequence of the single piece of public opinion information; constructing a frequency matrix from the co-occurrence frequency of every two phrases in the phrase sequence, a threat weight distribution matrix from the threat weight distribution, in the phrase probability space, of the public opinion information formed by every two phrases, an individual weight product matrix from the product of the threat weights of every two phrases, and an individual probability matrix from the probability space characteristics of every two phrases; and combining the frequency matrix, the threat weight distribution matrix, the individual weight product matrix, and the individual probability matrix to construct the semantic joint probability matrix.
Preferably, the step of obtaining the threat coefficient of the single piece of public opinion information using the semantic joint probability matrix and the naive Bayes classification algorithm, and constructing the node semantic tree in combination with the semantic joint probability matrix, further includes: evaluating the overall rationality of the single piece of public opinion information under a conditional independence assumption, evaluating its semantic rationality under a Markov random field chain joint probability assumption, obtaining the threat coefficient from the overall rationality and the semantic rationality, and constructing the node semantic tree in combination with the semantic joint probability matrix.
Preferably, the step of obtaining the node interaction network topology distribution from the social node network through the depth detection algorithm further includes: taking the threat coefficient of each user as the average of the threat coefficients of all public opinion information the user has operated on, with the threat coefficient of each piece of public opinion information changing cumulatively with the threat coefficients of the users who operated on it; setting users as first nodes and public opinion information as second nodes, and generating a connecting edge whenever a user operates on a piece of public opinion information; and diffusing cyclically in this way to obtain the final node network topology distribution.
An embodiment of the invention further provides a network information source searching device, including: a probability space construction module, configured to construct a public opinion phrase probability space from a public opinion phrase database; a probability matrix construction module, configured to extract the phrase sequence of a single piece of public opinion information and construct a semantic joint probability matrix in combination with the public opinion phrase probability space; a node semantic tree construction module, configured to obtain the threat coefficient of the single piece of public opinion information using the semantic joint probability matrix and a naive Bayes classification algorithm, and to construct a node semantic tree in combination with the semantic joint probability matrix; a calculation module, configured to obtain the node interaction network topology distribution from a social node network through a depth detection algorithm, construct a bidirectional node incidence matrix, and calculate viscosity matching coefficients from the bidirectional node incidence matrix and the node semantic tree; an acquisition module, configured to perform vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed, and to obtain an information source network using a hierarchical extraction algorithm and the viscosity matching coefficients; a drawing module, configured to construct an information source semantic tree for the information source network and draw an information source word group viscosity distribution diagram using a viscosity extension algorithm in combination with the node interaction network topology distribution; and an extraction module, configured to extract information source semantic feature phrases from the information source word group viscosity distribution diagram using a viscosity clustering algorithm, perform correlation analysis on the semantic tree of each node in the information source network, and extract the information source.
Preferably, the probability space construction module is further configured to: calculate the citation probability of each phrase in the public opinion phrase database, calculate a universality probability from the distribution of phrases in the database, and calculate an aging coefficient from the distribution of usage times of each phrase in the database; and construct the public opinion phrase probability space from the citation probability, the universality probability, and the aging coefficient.
Preferably, the probability matrix construction module is further configured to: extract the phrase sequence of a single piece of public opinion information; construct a frequency matrix from the co-occurrence frequency of every two phrases in the phrase sequence, a threat weight distribution matrix from the threat weight distribution, in the phrase probability space, of the public opinion information formed by every two phrases, an individual weight product matrix from the product of the threat weights of every two phrases, and an individual probability matrix from the probability space characteristics of every two phrases; and combine the frequency matrix, the threat weight distribution matrix, the individual weight product matrix, and the individual probability matrix to construct the semantic joint probability matrix.
Preferably, the node semantic tree construction module is further configured to: evaluate the overall rationality of the single piece of public opinion information under a conditional independence assumption, evaluate its semantic rationality under a Markov random field chain joint probability assumption, obtain the threat coefficient from the overall rationality and the semantic rationality, and construct the node semantic tree in combination with the semantic joint probability matrix.
An embodiment of the invention further provides a server, including: one or more processors; and a memory storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the network information source searching method described above.
Compared with the prior art, the network information source searching method, device, and server provided by the embodiments of the invention can discover information source networks and identify their nature through semantic recognition of public opinion information and viscosity correlation analysis of social network nodes. Unlike traditional keyword semantic analysis and fixed-point extraction of information source relation networks, the method builds a node semantic tree by combining a phrase probability space, semantic joint matrix partitioning, and naive Bayes classification; extracts an information source network through node depth detection and vector-conversion viscosity matching; and identifies the final information source using a viscosity clustering algorithm and cross-correlation, which yields more accurate and reasonable information source capture. Working from the same public opinion feature data, it offers more analysis dimensions, deeper analysis of social relations and public opinion features, and more intuitive data presentation. The system is highly targeted at its detection objects, can analyze deep-level characteristics of the data, detects public opinion source networks, and readily discovers social network information sources.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and should not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a network information source searching method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a network information source searching method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a public opinion phrase probability space according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a node network topology distribution provided in the embodiment of the present invention.
Fig. 5 is an information source word group viscosity distribution diagram according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of an information source obtained by the viscosity cluster analysis algorithm according to the embodiment of the present invention.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Fig. 8 is a schematic functional module diagram of a network information source searching apparatus according to an embodiment of the present invention.
Reference numerals: 10 - server; 101 - processor; 102 - memory; 103 - bus; 104 - communication interface; 200 - network information source searching device; 201 - probability space construction module; 202 - probability matrix construction module; 203 - node semantic tree construction module; 204 - calculation module; 205 - acquisition module; 206 - drawing module; 207 - extraction module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of a network information source searching method according to an embodiment of the present invention. The method is applied to a server and is used to find the information sources of threatening public opinion information. It builds a node semantic tree by combining a phrase probability space, semantic joint matrix partitioning, and naive Bayes classification; extracts an information source network through node depth detection and vector-conversion viscosity matching; and identifies the final information source with a viscosity clustering algorithm and cross-correlation, resulting in more accurate and reasonable information source capture.
Please refer to fig. 2, which is a flowchart illustrating a network information source searching method according to an embodiment of the present invention. It should be noted that the method of the present invention is not limited to the specific sequence shown in fig. 2 and described below. As will be described in detail below with respect to the specific process and steps shown in fig. 2, the network information source searching method includes:
Step S101: constructing a public opinion phrase probability space from a public opinion phrase database.
The public opinion phrase database is a preset database that stores threatening phrases in advance. The citation probability of each phrase in the database is calculated, a universality probability is calculated from the distribution of phrases in the database, and an aging coefficient is calculated from the distribution of usage times of each phrase in the database; the public opinion phrase probability space is then constructed from the citation probability, the universality probability, and the aging coefficient. Fig. 3 shows the distribution of public opinion phrases in the three-dimensional coordinate system spanned by the citation probability, the universality probability, and the aging coefficient. By extracting phrase features and probability coefficients and displaying them as a three-dimensional spatial distribution, the phrase probability space simplifies the later analysis of public opinion semantic features and can improve feature identification accuracy.
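The patent does not give formulas for the three coordinates, so the following Python sketch only illustrates one plausible way to place each phrase in the three-dimensional space. It assumes, hypothetically, that the citation probability is the share of posts containing the phrase, the universality probability is the share of channels in which the phrase appears, and the aging coefficient is a half-life decay of the time since the phrase was last used. The `posts` schema, `build_probability_space`, `PhrasePoint`, and `half_life_days` are illustrative names, not part of the source.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class PhrasePoint:
    citation: float      # share of posts that contain the phrase (assumed definition)
    universality: float  # share of channels in which the phrase appears (assumed)
    aging: float         # recency weight in (0, 1], halving every half_life_days (assumed)

def build_probability_space(posts, now, half_life_days=30.0):
    """posts: list of (channel, day, phrases) tuples - a hypothetical schema."""
    channels = {channel for channel, _, _ in posts}
    cited = Counter()
    seen_in = {}
    last_used = {}
    for channel, day, phrases in posts:
        for p in set(phrases):                       # count a phrase once per post
            cited[p] += 1
            seen_in.setdefault(p, set()).add(channel)
            last_used[p] = max(last_used.get(p, day), day)
    space = {}
    for p, count in cited.items():
        age = now - last_used[p]
        space[p] = PhrasePoint(
            citation=count / len(posts),
            universality=len(seen_in[p]) / len(channels),
            aging=0.5 ** (age / half_life_days),
        )
    return space
```

Each resulting `PhrasePoint` corresponds to one point in a three-dimensional space like that of Fig. 3; a different choice of decay or channel definition would move the points but not change the structure.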
Step S102: extracting the phrase sequence of a single piece of public opinion information, and constructing a semantic joint probability matrix in combination with the public opinion phrase probability space.
For each single piece of public opinion information among all the public opinion information to be analyzed, its phrase sequence is extracted.
A frequency matrix is constructed from the co-occurrence frequency of every two phrases in the phrase sequence; a threat weight distribution matrix is constructed from the threat weight distribution, in the phrase probability space, of the public opinion information formed by every two phrases; an individual weight product matrix is constructed from the product of the threat weights of every two phrases; and an individual probability matrix is constructed from the probability space characteristics of every two phrases.
Finally, the frequency matrix, the threat weight distribution matrix, the individual weight product matrix, and the individual probability matrix are combined to construct the semantic joint probability matrix.
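The source names the four matrices but not how they are combined. The following minimal sketch assumes element-wise (Hadamard) multiplication of the four pairwise terms, with co-occurrence counts normalised over the pairs in the sequence. The input dictionaries and the function name `semantic_joint_matrix` are hypothetical stand-ins for data the patent does not specify.

```python
from itertools import combinations

def semantic_joint_matrix(seq, cooccur, pair_threat, threat, prob):
    """
    seq:         phrase sequence of one piece of public opinion information
    cooccur:     frozenset({a, b}) -> co-occurrence count       (frequency matrix)
    pair_threat: frozenset({a, b}) -> threat weight of the pair (threat weight distribution matrix)
    threat:      phrase -> the phrase's own threat weight       (individual weight product matrix)
    prob:        phrase -> scalar probability-space score       (individual probability matrix)
    All inputs are hypothetical stand-ins for data the source does not specify.
    """
    n = len(seq)
    pairs = list(combinations(range(n), 2))
    total = sum(cooccur.get(frozenset((seq[i], seq[j])), 0) for i, j in pairs) or 1
    m = [[0.0] * n for _ in range(n)]
    for i, j in pairs:
        a, b = seq[i], seq[j]
        key = frozenset((a, b))
        freq = cooccur.get(key, 0) / total                 # normalised co-occurrence
        pair_w = pair_threat.get(key, 0.0)
        w_prod = threat.get(a, 0.0) * threat.get(b, 0.0)
        p_prod = prob.get(a, 0.0) * prob.get(b, 0.0)
        # assumed combination rule: element-wise product of the four matrices
        m[i][j] = m[j][i] = freq * pair_w * w_prod * p_prod
    return m
```

A multiplicative rule makes any pair with zero co-occurrence or zero threat weight vanish from the joint matrix; a weighted sum would be an equally plausible alternative if that behaviour is undesirable.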
Step S103: obtaining the threat coefficient of the single piece of public opinion information using the semantic joint probability matrix and a naive Bayes classification algorithm, and constructing a node semantic tree in combination with the semantic joint probability matrix.
The overall rationality of the single piece of public opinion information is evaluated under a conditional independence assumption, and its semantic rationality under a Markov Random Field (MRF) chain joint probability assumption; the threat coefficient is obtained from the resulting overall rationality and semantic rationality, and the node semantic tree is constructed in combination with the semantic joint probability matrix.
Specifically, in naive Bayes classification the posterior probability is proportional to the likelihood multiplied by the prior probability. Let D denote a piece of public opinion information, whose threat characteristics are determined by the N phrases that compose it, and let $H^+$ denote threat information. The naive Bayes classifier can then be written as $P(H^+ \mid D) \propto P(H^+)\,P(D \mid H^+)$. Because D consists of N phrases, its overall rationality and semantic rationality are evaluated in the following two ways.
Conditional independence assumption: the phrases composing a piece of public opinion information are assumed not to influence one another directly, and their overall rationality is judged from the joint probability:
$$P(H^+ \mid D) \propto P(H^+)\,P(N_1 \mid H^+)\,P(N_2 \mid H^+)\cdots P(N_n \mid H^+)$$
Substituting the threat probability of each phrase under the threat prior yields the overall rationality coefficient.
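The conditional-independence formula can be evaluated as a sum of logarithms, which avoids numerical underflow when a post contains many phrases. This sketch assumes that the per-phrase probabilities $P(N_i \mid H^+)$ are already available as a dictionary, and it substitutes a small floor probability for phrases not in that dictionary; the floor value and the name `overall_rationality` are illustrative assumptions.

```python
import math

def overall_rationality(phrases, prior_threat, p_phrase_given_threat, floor=1e-6):
    """Unnormalised log of P(H+) * P(N_1|H+) * ... * P(N_n|H+)."""
    log_score = math.log(prior_threat)
    for phrase in phrases:
        # unseen phrases get a small floor probability instead of zero
        log_score += math.log(p_phrase_given_threat.get(phrase, floor))
    return log_score
```

`math.exp` of the result recovers the unnormalised product; comparing log scores directly is usually enough for ranking posts.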
MRF chain joint probability assumption: under the MRF chain principle, the value of each state in a state sequence depends on the preceding k states. Applied to public opinion information, each phrase is matched for semantic features against the k phrases preceding it; taking k = 1, this can be expressed as:
$$P(H^+ \mid D) \propto P(H^+)\,P(N_1)\,P(N_2 \mid N_1)\,P(N_3 \mid N_2)\cdots P(N_n \mid N_{n-1})$$
The threat probability of each phrase is formed from the joint probability of the combined phrases, finally yielding the semantic rationality coefficient.
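The first-order chain formula differs from the conditional-independence score only in that each phrase after the first is scored conditionally on its predecessor. A sketch under similar assumptions: hypothetical dictionaries for $P(N_1)$ and for the transition probabilities $P(N_i \mid N_{i-1})$, log-space accumulation, and a floor for unseen phrases or pairs. The name `semantic_rationality` is illustrative.

```python
import math

def semantic_rationality(phrases, prior_threat, p_first, p_next, floor=1e-6):
    """Unnormalised log of P(H+) * P(N_1) * prod_{i>=2} P(N_i | N_{i-1})."""
    log_score = math.log(prior_threat)
    if not phrases:
        return log_score
    log_score += math.log(p_first.get(phrases[0], floor))
    # first-order chain: each phrase depends only on the one immediately before it
    for prev, cur in zip(phrases, phrases[1:]):
        log_score += math.log(p_next.get((prev, cur), floor))
    return log_score
```

Because the chain score depends on phrase order, the same phrases in a different order can receive a very different semantic rationality, which is what distinguishes it from the conditional-independence score.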
The threat coefficient is obtained from the overall rationality and the semantic rationality, and when it satisfies a preset threat condition, a node semantic tree is constructed from the semantic joint probability matrix.
Step S104: obtaining the node interaction network topology distribution from a social node network through a depth detection algorithm, constructing a bidirectional node incidence matrix, and calculating viscosity matching coefficients from the bidirectional node incidence matrix and the node semantic tree.
Step S103 yields a rationality coefficient (that is, a threat coefficient) for each piece of public opinion information. A "heat conduction" model is then applied: the threat coefficient of each user is the average of the threat coefficients of all public opinion information the user has operated on, and the threat coefficient of each piece of public opinion information changes cumulatively with the threat coefficients of the users who operated on it. Users are set as first nodes and public opinion information as second nodes; whenever a user operates on a piece of public opinion information, an edge is generated. Diffusing cyclically in this way finally yields the node network topology distribution. Fig. 4 is a schematic diagram of the node network topology distribution according to an embodiment of the present invention, in which X1, X2 and X3 are first nodes representing users; Y1, Y2, Y3 and Y4 are second nodes representing public opinion information; b11 to b34 are the connecting edges between users and public opinion information, indicating that each user can be related to each piece of public opinion information; and a12, a21 and a23 denote three users capable of transmitting public opinion information.
Nodes interact through public opinion information, and the threat characteristics of nodes and of public opinion information are determined by their degree of diffusion in the network, which pinpoints highly threatening, widely diffused public opinion information in the social network. The algorithm offers high accuracy, fast computation, and a high degree of parallelism.
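The heat-conduction update can be sketched as an alternating iteration over the bipartite user/information graph. The source does not define the accumulation rule, so this sketch assumes that each piece of information's threat equals its step S103 coefficient plus a damped sum of its operators' threat, recomputed for a fixed number of rounds. `damping`, `rounds`, and the name `heat_conduction` are hypothetical.

```python
from collections import defaultdict

def heat_conduction(operations, base_threat, rounds=5, damping=0.1):
    """
    operations:  list of (user, info) pairs; each pair is one user -> information edge
    base_threat: dict info -> threat coefficient from step S103
    Returns (user_threat, info_threat, edges).
    """
    edges = sorted(set(operations))          # connecting edges of the bipartite topology
    by_user, by_info = defaultdict(list), defaultdict(list)
    for user, info in edges:
        by_user[user].append(info)
        by_info[info].append(user)
    info_threat = dict(base_threat)
    user_threat = {}
    for _ in range(rounds):
        # user threat = mean threat of the information the user operated on
        user_threat = {u: sum(info_threat.get(i, 0.0) for i in items) / len(items)
                       for u, items in by_user.items()}
        # information threat accumulates a damped share of its operators' threat
        info_threat = {i: base_threat.get(i, 0.0)
                          + damping * sum(user_threat[u] for u in by_info[i])
                       for i in set(base_threat) | set(by_info)}
    return user_threat, info_threat, edges
```

Anchoring each round on the S103 coefficient keeps the values bounded for small `damping`; a purely cumulative rule without that anchor could grow without limit on dense graphs.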
Step S105: performing vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed, and obtaining an information source network using a hierarchical extraction algorithm and the viscosity matching coefficients.
First, vector conversion is performed on the bidirectional node incidence matrix to form the initial matrix to be analyzed, in which row i and column j each represent node users. Taking four users as an example, the initial matrix is as follows:
[Figure BDA0001486359270000101: initial matrix for four node users (image not reproduced)]
Each entry $\alpha_{ij}$ of the matrix is a viscosity matching coefficient. First, the k node users with the largest viscosity matching coefficients are selected as seed nodes, and the nodes each seed points to are collected to form a set of candidate parent nodes. From the candidate parents of the k seeds, the node N with the highest frequency and viscosity matching coefficient is extracted, and a node tree based on the seed nodes is created by following the association links of N. All child nodes associated with N are then searched, and, taking the features of the seed nodes into account, the child nodes similar to the seeds are extracted through semantic clustering analysis. For each seed node, its own child nodes are obtained, strongly associated child nodes are extracted by semantic clustering analysis of the seed and its children, and the node tree is redrawn. This repeats until weak associations cause the node tree to converge and close, generating the final information source network.
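The hierarchical extraction can be sketched as follows. Several points are assumptions not fixed by the source: seeds are ranked by the row sum of α, the semantic clustering analysis is replaced by a simple threshold `tau` on α, and `alpha[i][j]` is read as the coefficient of node i pointing to node j, so the children of a node p are the nodes that point to p with α at or above `tau`. `extract_source_network`, `k`, and `tau` are hypothetical names.

```python
def extract_source_network(alpha, k=2, tau=0.5):
    """
    alpha: square matrix; alpha[i][j] = viscosity matching coefficient of node i pointing to node j
    k:     number of seed nodes
    tau:   threshold below which an association counts as weak (stops expansion)
    Returns (root, tree) with tree mapping parent -> list of children.
    """
    n = len(alpha)
    # rank node users by total viscosity and keep the top k as seeds
    seeds = sorted(range(n), key=lambda i: sum(alpha[i]), reverse=True)[:k]
    # candidate parents: nodes that a seed points to with a non-weak association
    votes = {}
    for s in seeds:
        for j in range(n):
            if j != s and alpha[s][j] >= tau:
                count, total = votes.get(j, (0, 0.0))
                votes[j] = (count + 1, total + alpha[s][j])
    if not votes:
        return seeds[0], {}
    # N: the candidate chosen most often, ties broken by summed viscosity
    root = max(votes, key=lambda j: votes[j])
    tree, visited, frontier = {}, {root}, [root]
    while frontier:  # grow level by level until only weak associations remain
        next_frontier = []
        for parent in frontier:
            for child in range(n):
                if child not in visited and alpha[child][parent] >= tau:
                    tree.setdefault(parent, []).append(child)
                    visited.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return root, tree
```

With four users where users 1 and 2 point strongly to user 0 and user 3 points strongly to user 1, the sketch selects user 0 as N and returns the tree `{0: [1, 2], 1: [3]}`.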
The viscosity matching coefficient accurately represents the linkage between nodes, and, supported by this coefficient, the hierarchical extraction algorithm can extract the hierarchical structure of the network with low error. For a social network with a hierarchical structure, key nodes and key networks are extracted by analyzing the network topology and are simplified into a hierarchical classification tree, whose data structure can support a wider range of data analysis models.
Step S106: construct an information source semantic tree for the information source network, and draw an information source word group viscosity distribution diagram by using a viscosity extension algorithm in combination with the node interaction network topology distribution.
The node semantic trees of all nodes in the information source network are extracted and fused to construct the information source semantic tree. The information source word group viscosity distribution diagram is then drawn by using the viscosity extension algorithm in combination with the node network topology distribution.
Fig. 5 is an information source word group viscosity distribution diagram provided by an embodiment of the present invention. The diagram is based on a two-dimensional plane coordinate system. First, the root node of the information source network is placed at the origin O of the coordinate system, and deflection angles are calculated from the number of child nodes of the root node to construct reference vectors. The offset of each child node along its reference vector is determined from the viscosity matching coefficient between that child node and the root node; its offset along the perpendicular reference vector is then determined from the viscosity matching coefficients of the child nodes associated with the root node, which finally fixes the position of the child node in the coordinate system. By analogy, the algorithm is applied to extend all child node subtrees, forming the information source word group viscosity distribution diagram.
The viscosity extension algorithm converts the semantic tree into a scatter distribution of nodes in two-dimensional space: it constructs a multi-level coordinate system according to the levels of the node tree and determines each node's offset position from the base vectors, so that, combined with the viscosity matching coefficients, the nodes obtain an accurate and reasonable distribution.
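A minimal sketch of such a radial layout is shown below. The source does not give the distance formulas, so the following are assumptions: children are spread at equal deflection angles around their parent, a stronger viscosity coefficient places a child closer to its parent (distance proportional to 1 − α), the perpendicular offset is proportional to how far a child's coefficient deviates from its siblings' mean, and each level uses a coordinate frame half the size of its parent's. The names `viscosity_layout`, `scale` and `spread` are hypothetical.

```python
import math


def viscosity_layout(tree, alpha, root, scale=1.0, spread=0.2):
    """Hypothetical sketch of a viscosity-driven radial tree layout.

    tree: dict parent -> list of children.
    alpha: dict (parent, child) -> viscosity matching coefficient in [0, 1].
    Returns dict node -> (x, y), with the root at the origin O.
    """
    pos = {root: (0.0, 0.0)}

    def place(parent, level_scale):
        kids = tree.get(parent, [])
        if not kids:
            return
        px, py = pos[parent]
        mean_a = sum(alpha[(parent, c)] for c in kids) / len(kids)
        for idx, child in enumerate(kids):
            theta = 2 * math.pi * idx / len(kids)     # deflection angle of the reference vector
            ux, uy = math.cos(theta), math.sin(theta)  # unit reference vector
            vx, vy = -uy, ux                           # perpendicular reference vector
            a = alpha[(parent, child)]
            d = level_scale * (1.0 - a)                # stronger viscosity -> closer (assumed)
            e = spread * level_scale * (a - mean_a)    # perpendicular offset (assumed)
            pos[child] = (px + d * ux + e * vx, py + d * uy + e * vy)
            # The child becomes the origin of the next, smaller coordinate frame.
            place(child, level_scale / 2)

    place(root, scale)
    return pos
```

Halving the frame size per level is one simple way to keep deeper subtrees from overlapping their ancestors; the patent's actual multi-level coordinate system may differ.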
Step S107: extract information source semantic feature phrases from the information source word group viscosity distribution diagram by using a viscosity clustering algorithm, perform association analysis on the semantic tree of each node in the information source network, and extract the information source.
In the phrase viscosity distribution diagram, as shown in Fig. 6, a normal distribution circle containing as many surrounding nodes as possible is drawn for every phrase node, centred on that node, for example $g(m_{11})$, $g(m_{22})$ and $g(m_{33})$, while ensuring that the ratio of the number of enclosed nodes to the circle radius meets a threshold. If central nodes lie within each other's circles, their normal distribution circles are fused into a new normal distribution circle; for example, $g(m_{22})$ and $g(m_{33})$ are fused into a new circle whose new central node is labelled as the combination of the central-node phrases of the two original circles. The node phrases within the resulting normal distribution circles are then extracted to form the information source semantic feature phrases. Finally, association analysis is performed on the node semantic tree of each node in the information source network using these semantic feature phrases, nodes whose similarity does not meet the condition are deleted, and the final information source is extracted.
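The circle-drawing and fusion steps can be sketched as below. Assumptions not given in the source: each phrase node has a 2-D position from the distribution diagram, a circle is valid when it encloses at least `min_count` nodes and `count / radius >= density`, the largest valid radius is kept (it encloses the most nodes), and two circles are fused when each centre lies inside the other's circle. Fused circles are merged transitively with a small union-find, and the fused label joins the centre phrases with `+`. All names are hypothetical.

```python
import math


def viscosity_clusters(points, density=2.0, min_count=2):
    """Hypothetical sketch of circle-based viscosity clustering.

    points: dict phrase -> (x, y) position in the viscosity distribution diagram.
    Returns dict fused-centre label -> set of member phrases.
    """
    names = list(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Draw, for every phrase node, the largest circle meeting the density threshold.
    circles = {}  # centre phrase -> (radius, member set)
    for c in names:
        best = None
        for r in sorted({dist(c, o) for o in names if o != c}):
            if r <= 0:
                continue
            members = {o for o in names if dist(c, o) <= r}
            if len(members) >= min_count and len(members) / r >= density:
                best = (r, members)  # larger radius encloses at least as many nodes
        if best:
            circles[c] = best

    # Fuse circles whose centres lie within each other's circles (union-find).
    parent = {c: c for c in circles}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a in circles:
        for b in circles:
            if a < b and b in circles[a][1] and a in circles[b][1]:
                parent[find(a)] = find(b)

    groups = {}
    for c in circles:
        groups.setdefault(find(c), []).append(c)

    # Label each fused circle by its combined centre phrases; collect member phrases.
    return {
        "+".join(sorted(centres)): set().union(*(circles[c][1] for c in centres))
        for centres in groups.values()
    }
```

The member sets returned here would correspond to the candidate semantic feature phrases; the subsequent association analysis against each node's semantic tree is not shown.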
In other specific embodiments, the evolution characteristics of the information source network can be analyzed by combining a time-dependent diffusion model of the information source network with an interactive transformation model of public opinion phrases, so that information source phrases whose threat coefficients increase can be generalized and brought within the detection and analysis scope of the information source.
Fig. 7 is a schematic structural diagram of a server 10 according to an embodiment of the present invention. The server 10 may be a computer or any other computing device with data processing capability, and includes a processor 101, a memory 102, a bus 103 and a communication interface 104; the processor 101, the communication interface 104 and the memory 102 are connected through the bus 103. The processor 101 is configured to execute executable modules, such as computer programs, stored in the memory 102.
The memory 102 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. Communication between a network element of the system and at least one other network element is implemented through at least one communication interface 104 (wired or wireless).
The bus 103 may be an ISA bus, a PCI bus, an EISA bus, or the like. Only one bidirectional arrow is shown in Fig. 7, but this does not mean that there is only one bus or one type of bus.
The memory 102 is used to store programs, such as the network information source searching apparatus 200 shown in Fig. 8. The network information source searching apparatus 200 includes at least one software function module that can be stored in the memory 102 in the form of software or firmware, or solidified in the operating system (OS) of the server 10. After receiving an execution instruction, the processor 101 executes the program to implement the network information source searching method disclosed in the embodiments of the present invention.
The processor 101 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
Fig. 8 is a schematic functional module diagram of a network information source searching apparatus 200 according to an embodiment of the present invention. The network information source searching device 200 comprises a probability space constructing module 201, a probability matrix constructing module 202, a node semantic tree constructing module 203, a calculating module 204, an obtaining module 205, a drawing module 206 and an extracting module 207.
A probability space constructing module 201, configured to construct a public opinion phrase probability space according to the public opinion phrase database.
In this embodiment of the present invention, the probability space constructing module 201 may execute step S101.
The probability matrix construction module 202 is used for extracting a single public opinion information phrase sequence and constructing a semantic joint probability matrix by combining the public opinion phrase probability space.
In this embodiment of the present invention, the probability matrix building module 202 may execute step S102.
The node semantic tree construction module 203 is configured to obtain a threat coefficient of the single piece of public opinion information by using the semantic joint probability matrix and a naive Bayesian classification algorithm, and to construct a node semantic tree in combination with the semantic joint probability matrix.
In this embodiment of the present invention, the node semantic tree constructing module 203 may execute step S103.
The calculating module 204 is configured to obtain node interaction network topology distribution from a social node network through a depth detection algorithm, construct a bidirectional node incidence matrix, and calculate a viscosity matching coefficient according to the bidirectional node incidence matrix and the node semantic tree.
In this embodiment of the present invention, the calculating module 204 may execute step S104.
An obtaining module 205, configured to perform vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed, and obtain an information source network by using a hierarchical extraction algorithm and the viscosity matching coefficient.
In this embodiment of the present invention, the obtaining module 205 may execute step S105.
The drawing module 206 is configured to construct an information source semantic tree for the information source network, and to draw an information source word group viscosity distribution diagram by using a viscosity extension algorithm in combination with the node interaction network topology distribution.
In this embodiment of the present invention, the drawing module 206 may execute step S106.
The extracting module 207 is configured to extract information source semantic feature phrases from the information source word group viscosity distribution diagram by using a viscosity clustering algorithm, perform association analysis on the semantic tree of each node in the information source network, and extract the information source.
In this embodiment of the present invention, the extracting module 207 may execute step S107.
In summary, the network information source searching method, apparatus and server provided by the embodiments of the present invention discover information source networks and identify their properties through semantic identification of public opinion information and viscosity correlation analysis of social network nodes. In contrast to conventional keyword-based semantic analysis and fixed-point extraction of information source relation networks, the method constructs node semantic trees by combining a phrase probability space, semantic joint matrix partitioning and naive Bayes classification; extracts the information source network through node depth detection and vector-conversion viscosity matching; and identifies the final information source through a viscosity clustering algorithm and cross-correlation recognition. Based on the same public opinion feature data, it captures information sources more accurately and reasonably, offers multiple analysis dimensions, deeper analysis of social relationships and public opinion features, and a more intuitive presentation of data. The system is highly targeted in its detection objects, can analyze deep-level characteristics of the data, detects public opinion source networks, and readily discovers information sources in social networks.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A network information source searching method is characterized by comprising the following steps:
constructing a public opinion phrase probability space according to a public opinion phrase database;
extracting a phrase sequence of single public opinion information, and constructing a semantic joint probability matrix by combining the public opinion phrase probability space;
obtaining threat coefficients of the single public opinion information by using the semantic joint probability matrix and a naive Bayesian classification algorithm, and constructing a node semantic tree by combining the semantic joint probability matrix;
acquiring node interaction network topology distribution from a social node network through a depth detection algorithm, constructing a bidirectional node incidence matrix, and calculating a viscosity matching coefficient according to the bidirectional node incidence matrix and the node semantic tree;
performing vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed, and acquiring an information source network by utilizing a hierarchical extraction algorithm and the viscosity matching coefficient;
constructing an information source semantic tree aiming at the information source network, and drawing an information source word group viscosity distribution diagram by combining node interaction network topology distribution and utilizing a viscosity extension algorithm, wherein the viscosity extension algorithm is used for converting the information source semantic tree into scattered point distribution of nodes in a two-dimensional space, constructing a multi-level coordinate system according to a node tree level and determining a node deviation position based on a basic vector;
and extracting an information source semantic characteristic phrase from the information source word group viscosity distribution diagram by using a viscosity clustering algorithm, and performing correlation analysis on the semantic tree of each node in the information source network to extract an information source.
2. The method as claimed in claim 1, wherein the step of constructing the public opinion phrase probability space according to the public opinion phrase database further comprises:
calculating the citation probability of each phrase in the public opinion phrase database, calculating the universality probability according to the phrase distribution state in the public opinion phrase database, and calculating the aging coefficient according to the use time distribution of each phrase in the public opinion phrase database;
and constructing a public opinion phrase probability space according to the citation probability, the universality probability and the aging coefficient.
3. The method for searching network information source according to claim 1 or 2, wherein the step of extracting a phrase sequence of a single piece of public opinion information and constructing a semantic joint probability matrix in combination with the public opinion phrase probability space further comprises:
extracting a phrase sequence of single public opinion information;
constructing a frequency matrix according to the co-occurrence frequency of any two phrases in the phrase sequence, constructing a threat weight distribution matrix according to the threat weight distribution, in the phrase probability space, of public opinion information formed by any two phrases in the phrase sequence, constructing an individual weight product matrix according to the product of the individual threat weights of any two phrases in the phrase sequence, and constructing an individual probability matrix according to the individual probability space characteristics of any two phrases in the phrase sequence;
and combining the frequency matrix, the threat weight distribution matrix, the individual weight product matrix and the individual probability matrix to construct a semantic joint probability matrix.
4. The method for searching the network information source according to claim 3, wherein the step of obtaining the threat coefficients of the single public opinion information by using the semantic joint probability matrix and the naive Bayesian classification algorithm and constructing the node semantic tree by combining the semantic joint probability matrix further comprises:
adopting a conditional independence assumption to evaluate the overall rationality of the single piece of public opinion information, adopting a Markov random field chain joint probability assumption to evaluate its semantic rationality, obtaining a threat coefficient according to the overall rationality and the semantic rationality, and constructing a node semantic tree in combination with the semantic joint probability matrix.
5. The method according to claim 4, wherein the step of acquiring the node interaction network topology distribution from the social node network through the depth detection algorithm further comprises:
the threat coefficient of each user is the average of the threat coefficients of all public opinion information operated on by that user, and the threat coefficient of each piece of public opinion information is cumulatively updated with the threat coefficients of the users who operate on it; users are set as first nodes and public opinion information as second nodes, and a connecting edge is generated whenever a user operates on a piece of public opinion information; the diffusion is carried out cyclically in this way, finally yielding the node network topology distribution.
6. A network information source lookup apparatus, comprising:
the probability space construction module is used for constructing a public opinion phrase probability space according to the public opinion phrase database;
the probability matrix construction module is used for extracting a single public opinion information phrase sequence and constructing a semantic joint probability matrix by combining the public opinion phrase probability space;
the node semantic tree construction module is used for acquiring the threat coefficient of the single public opinion information by utilizing the semantic joint probability matrix and a naive Bayesian classification algorithm and constructing a node semantic tree by combining the semantic joint probability matrix;
the calculation module is used for acquiring node interaction network topological distribution from a social node network through a depth detection algorithm, constructing a bidirectional node incidence matrix and calculating a viscosity matching coefficient according to the bidirectional node incidence matrix and the node semantic tree;
the acquisition module is used for carrying out vector conversion on the bidirectional node incidence matrix to form an initial matrix to be analyzed and acquiring an information source network by utilizing a hierarchical extraction algorithm and the viscosity matching coefficient;
the drawing module is used for constructing an information source semantic tree aiming at the information source network, drawing an information source word group viscosity distribution diagram by combining node interaction network topology distribution and utilizing a viscosity extension algorithm, wherein the viscosity extension algorithm is used for converting the information source semantic tree into scattered point distribution of nodes in a two-dimensional space, constructing a multi-level coordinate system according to a node tree level and determining a node deviation position based on a basic vector;
and the extraction module is used for extracting an information source semantic characteristic phrase from the information source word group viscosity distribution diagram by using a viscosity clustering algorithm, performing correlation analysis on the semantic tree of each node in the information source network and extracting the information source.
7. The apparatus of claim 6, wherein the probability space construction module is further configured to: calculating the citation probability of each phrase in the public opinion phrase database, calculating the universality probability according to the phrase distribution state in the public opinion phrase database, and calculating the aging coefficient according to the use time distribution of each phrase in the public opinion phrase database;
and constructing a public opinion phrase probability space according to the citation probability, the universality probability and the aging coefficient.
8. The apparatus according to claim 6 or 7, wherein the probability matrix building module is further configured to: extracting a phrase sequence of single public opinion information;
constructing a frequency matrix according to the co-occurrence frequency of any two phrases in the phrase sequence, constructing a threat weight distribution matrix according to the threat weight distribution, in the phrase probability space, of public opinion information formed by any two phrases in the phrase sequence, constructing an individual weight product matrix according to the product of the individual threat weights of any two phrases in the phrase sequence, and constructing an individual probability matrix according to the individual probability space characteristics of any two phrases in the phrase sequence;
and combining the frequency matrix, the threat weight distribution matrix, the individual weight product matrix and the individual probability matrix to construct a semantic joint probability matrix.
9. The apparatus of claim 8, wherein the node semantic tree building module is further configured to perform: adopting a conditional independence assumption to evaluate the overall rationality of the single piece of public opinion information, adopting a Markov random field chain joint probability assumption to evaluate its semantic rationality, obtaining a threat coefficient according to the overall rationality and the semantic rationality, and constructing a node semantic tree in combination with the semantic joint probability matrix.
10. A server, comprising:
one or more processors;
a memory for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-5.
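Claims 4 and 9 combine an overall-rationality score under a conditional independence assumption with a semantic-rationality score under a chain-structured joint probability assumption. The sketch below is one possible reading, not the claimed method: it uses a naive Bayes posterior for the overall score, a first-order Markov chain (the simplest chain-structured model) scored by the geometric mean of its transition probabilities for the semantic score, and, as an assumed combination rule, multiplies the two. The probability tables, the `floor` smoothing value and the function name are hypothetical inputs.

```python
import math


def threat_coefficient(phrases, prior, likelihood, transition, floor=1e-6):
    """Hypothetical sketch of a threat coefficient for one piece of public opinion.

    phrases: phrase sequence of the message.
    prior: dict label -> prior probability; must contain "threat".
    likelihood[label][phrase]: P(phrase | label); unseen phrases get `floor`.
    transition[(a, b)]: P(b | a) for consecutive phrases; unseen pairs get `floor`.
    """
    # Overall rationality: naive Bayes posterior P(threat | phrases), assuming
    # phrases are conditionally independent given the label (log-space for stability).
    log_score = {
        label: math.log(p) + sum(math.log(likelihood[label].get(w, floor)) for w in phrases)
        for label, p in prior.items()
    }
    m = max(log_score.values())
    z = sum(math.exp(s - m) for s in log_score.values())
    posterior = math.exp(log_score["threat"] - m) / z

    # Semantic rationality: geometric mean of first-order chain transition probabilities.
    pairs = list(zip(phrases, phrases[1:]))
    if pairs:
        semantic = math.exp(sum(math.log(transition.get(p, floor)) for p in pairs) / len(pairs))
    else:
        semantic = 1.0

    # Combination rule (assumed): product of the two rationality scores.
    return posterior * semantic
```

Using the geometric mean keeps the semantic score comparable across messages of different lengths; any other normalisation would be an equally valid assumption.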
CN201711223777.4A 2017-11-29 2017-11-29 Network information source searching method and device and server Active CN107862081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711223777.4A CN107862081B (en) 2017-11-29 2017-11-29 Network information source searching method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711223777.4A CN107862081B (en) 2017-11-29 2017-11-29 Network information source searching method and device and server

Publications (2)

Publication Number Publication Date
CN107862081A CN107862081A (en) 2018-03-30
CN107862081B true CN107862081B (en) 2021-07-16

Family

ID=61704267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711223777.4A Active CN107862081B (en) 2017-11-29 2017-11-29 Network information source searching method and device and server

Country Status (1)

Country Link
CN (1) CN107862081B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508376A (en) * 2020-11-30 2021-03-16 中国科学院深圳先进技术研究院 Index system construction method
CN112861956A (en) * 2021-02-01 2021-05-28 浪潮云信息技术股份公司 Water pollution model construction method based on data analysis

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001080080A3 (en) * 2000-04-14 2003-02-13 Rightnow Tech Inc Usage based strength between related help topics and context based mapping thereof in a help information retrieval system
CN1766871A (en) * 2004-10-29 2006-05-03 中国科学院研究生院 A Processing Method for Semantic Extraction of Semi-structured Data Based on Context
CN1853180A (en) * 2003-02-14 2006-10-25 尼维纳公司 System and method for semantic knowledge retrieval, management, capture, sharing, discovery, delivery and presentation
US8131701B2 (en) * 2005-09-27 2012-03-06 Patentratings, Llc Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects
US8209331B1 (en) * 2008-04-02 2012-06-26 Google Inc. Context sensitive ranking
WO2014127673A1 (en) * 2013-02-25 2014-08-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for acquiring hot topics
CN105677873A (en) * 2016-01-11 2016-06-15 中国电子科技集团公司第十研究所 Text information associating and clustering collecting processing method based on domain knowledge model
CN107066256A (en) * 2017-02-24 2017-08-18 中国人民解放军海军大连舰艇学院 A kind of object based on tense changes the modeling method of model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7203909B1 (en) * 2002-04-04 2007-04-10 Microsoft Corporation System and methods for constructing personalized context-sensitive portal pages or views by analyzing patterns of users' information access activities
CN101122909B (en) * 2006-08-10 2010-06-16 株式会社日立制作所 Text information retrieval device and text information retrieval method
CN102012929A (en) * 2010-11-26 2011-04-13 北京交通大学 Network consensus prediction method and system
CN102411611B (en) * 2011-10-15 2013-01-02 西安交通大学 Instant interactive text oriented event identifying and tracking method
CN102521291B (en) * 2011-11-29 2014-02-19 浙江大学 A method of importing LIN network description file LDF based on ANTLR
CN102789498B (en) * 2012-07-16 2014-08-06 钱钢 Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning
CN103970805B (en) * 2013-02-05 2018-01-09 日电(中国)有限公司 Move Mode excavating equipment and method
US20170075877A1 (en) * 2015-09-16 2017-03-16 Marie-Therese LEPELTIER Methods and systems of handling patent claims
CN106980385B (en) * 2017-04-07 2018-07-10 吉林大学 Virtual assembly device, system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
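Title
"Enhancing Sensitivity Classification with Semantic"; Graham McDonald et al.; Computer Science; 2017-04-08; pp. 1-13 *
"Research on a Sensitive Topic Discovery Platform for Network Public Opinion"; Feng Ying; China Master's Theses Full-text Database, Information Science and Technology; 2009-11-15; I138-1520 *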

Also Published As

Publication number Publication date
CN107862081A (en) 2018-03-30

Similar Documents

Publication Publication Date Title
Zubiaga et al. Detection and resolution of rumours in social media: A survey
Unankard et al. Emerging event detection in social networks with location sensitivity
Nair et al. Usage and analysis of Twitter during 2015 Chennai flood towards disaster management
Othman et al. Youtube spam detection framework using naïve bayes and logistic regression
CN111611801B (en) Method, device, server and storage medium for identifying text region attribute
CN112559747B (en) Event classification processing method, device, electronic equipment and storage medium
Vo et al. Automatic data curation for self-supervised learning: A clustering-based approach
CN110457404A (en) Social media account classification method based on complex heterogeneous network
Díaz-Morales Cross-device tracking: Matching devices and cookies
CN104615715A (en) Social network event analyzing method and system based on geographic positions
CN109508385A (en) A Bayesian-network-based method for analyzing character relationships in web page news data
Zhao et al. Text sentiment analysis algorithm optimization and platform development in social network
CN107862081B (en) Network information source searching method and device and server
Irfan et al. Classifying botnet attack on internet of things device using random forest
CN110929683A (en) Video public opinion monitoring method and system based on artificial intelligence
Sun et al. Anomaly subgraph detection with feature transfer
Cao et al. Advances in Knowledge Discovery and Data Mining: 19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part I
Arsytania et al. Movie recommender system with cascade hybrid filtering using convolutional neural network
CN110598122A (en) Social group mining method, device, equipment and storage medium
Cahyaningtyas et al. Emotion detection of tweets in Indonesian language using LDA and expression symbol conversion
CN110019763B (en) Text filtering method, system, equipment and computer readable storage medium
Karim et al. Spatiotemporal Aspects of Big Data.
Neela et al. An Ensemble Learning Frame Work for Robust Fake News Detection
Chen et al. Scaling up Markov logic probabilistic inference for social graphs
Nisha et al. Deep KNN Based Text Classification for Cyberbullying Tweet Detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant