
CN109885693B - Rapid knowledge comparison method and system based on knowledge graph - Google Patents

Rapid knowledge comparison method and system based on knowledge graph

Info

Publication number
CN109885693B
Authority
CN
China
Prior art keywords
node
knowledge
domain
graph
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910025419.5A
Other languages
Chinese (zh)
Other versions
CN109885693A (en)
Inventor
李兵
熊燚铭
胡方家
陈健
赵玉琦
陈秀清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201910025419.5A
Publication of CN109885693A
Application granted
Publication of CN109885693B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a rapid knowledge comparison method and system based on a knowledge graph, comprising: constructing knowledge representation units, in which the entries of each domain are split and parsed into knowledge representation units; constructing a knowledge graph, in which the knowledge representation units are stored in a graph database to form the knowledge graph and many-to-many graph-structure relationships are formed among the domain entries; constructing the domain concepts to be compared, in which the domain concepts to be compared are determined, split and parsed into knowledge representation units, stored in the knowledge graph, and linked by temporary mention relationships that do not damage the structure of the original graph; extracting the multi-level topology of the domain concepts; and comparing the multi-level topologies, in which the topology node weights are calculated and the weighted similarity of the domain concepts is then calculated to obtain the knowledge comparison result. The invention can quickly and automatically perform knowledge comparison and classification of massive documents, supports complex comparison applications, offers high real-time performance and strong practicability, improves the precision of subsequent query fusion, and has important market value.

Figure 201910025419

Description

Method and system for rapid knowledge comparison based on knowledge graph
Technical Field
The invention belongs to the technical field of computer knowledge comparison, and particularly relates to a knowledge fusion method in the field of knowledge graphs.
Background
A knowledge graph describes knowledge by constructing entities and the relationships between entities, which makes knowledge easier to exchange, circulate and process among computers and between computers and people. At the application level, however, computers cannot effectively understand synonymous concepts that come from different sources, and an effective technical scheme for the knowledge comparison problem, aimed mainly at knowledge fusion, is urgently needed. This patent provides a topology extraction and comparison scheme that can rapidly compare the degree of synonymy between two domain concepts.
The comparison result is used to guide knowledge comparison and classification of massive documents, for example to determine whether two concepts are the same concept, or whether one concept largely includes the other. Fast comparison saves system resources and makes the technique more practical for real-time applications. In the medical field, for example, the comparison result can support fast, automatic determination of whether a case belongs to a particular field, helping patients find the relevant department quickly.
Disclosure of Invention
In view of the problems of existing knowledge comparison techniques, the invention provides a rapid comparison scheme based on topological structure.
The technical scheme provided by the invention is a fast knowledge comparison method based on a knowledge graph, which comprises the following steps:
step 1, constructing knowledge representation units, including splitting and parsing the entries of each domain into knowledge representation units; a knowledge representation unit comprises a domain node AreaNode, a classification node CategoryNode and description nodes TextNode, wherein the attributes of an entry are stored in the domain node AreaNode, the classification to which the entry belongs is stored in the classification node CategoryNode, and the detailed sub-entries describing the entry are stored in description nodes TextNode; after a word segmentation method is applied to the attribute text of the domain node and the description text of the description nodes, a MENTION relationship is established between each mentioned domain node and the node that mentions it, where MENTION denotes a mention;
step 2, constructing the knowledge graph, including storing all knowledge representation units obtained in step 1 in a graph database to form the knowledge graph, so that many-to-many graph-structure relationships are formed among the domain entries;
step 3, constructing the domain concepts to be compared, including determining the domain concepts A and B to be compared, splitting and parsing them into knowledge representation units, storing these units in the knowledge graph obtained in step 2, and establishing temporary mention relationships that do not damage the structure of the original graph;
step 4, extracting the multi-level topology of the domain concepts, including extracting the topological structures of the domain concepts A and B on the knowledge graph by subgraph matching, wherein the other domain nodes associated through MENTION relationships with the domain node and the description nodes of the knowledge representation unit form the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, either directly or indirectly through description nodes, form the second-level topology of the domain concept; similarly, the N-level topology consists of the other domain nodes mentioned directly by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again;
step 5, comparing the multi-level topologies, including obtaining data graphs from the topological structures of the domain concepts A and B extracted in step 4, calculating the weights of the topology nodes, and then calculating the weighted similarity α of the domain concepts A and B to obtain the knowledge comparison result.
Furthermore, in step 5, the topology node weights are calculated based on the following definitions:
in the knowledge graph, the sub-network centered on a node $v$ with depth $d$ is modeled as a data graph $G(v,d) = \{V(G(v,d)), E(G(v,d))\}$, where $V(G(v,d))$ is the set of all nodes in the data graph $G(v,d)$ and $E(G(v,d))$ is the set of edges formed by the links among all nodes in the data graph $G(v,d)$. In addition, the depth of a node $v_0$ in the data graph $G(v,d)$ is defined as $D(G(v,d), v_0)$, with $v_0 \in V(G(v,d))$, so that the weight $W(G(v,d), v_0)$ of any node $v_0$ is expressed as
[Formula image: Figure GDA0002938317030000021]
When the topology formed jointly by the two sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ is considered, the weight of a node $v_0$ in it is obtained as
$$W(G(v_1,d), G(v_2,d), v_0) = W(G(v_1,d), v_0) + W(G(v_2,d), v_0)$$
where $G(v_1,d)$ and $G(v_2,d)$ are the data graphs modeled from the sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ respectively, and $W(G(v_1,d), v_0)$ and $W(G(v_2,d), v_0)$ are the weights of node $v_0$ in the data graphs $G(v_1,d)$ and $G(v_2,d)$.
In step 5, the weighted similarity α of the domain concepts A and B is calculated as follows:
let the central nodes of domain concept A and domain concept B be nodes $v_1$ and $v_2$ respectively, with correspondingly expanded data graphs $G_1(d) = G(v_1,d)$ and $G_2(d) = G(v_2,d)$; let the node set of $G_1(d)$ be $V_1(d) = V(G_1(d))$ and the node set of $G_2(d)$ be $V_2(d) = V(G_2(d))$. The weighted similarity $\alpha$ of the domain concepts A and B is obtained as
[Formula image: Figure GDA0002938317030000022]
where the weight $W(G_1(d), G_2(d), v_0) = W(G(v_1,d), G(v_2,d), v_0)$.
The invention also provides a fast knowledge comparison system based on a knowledge graph, which comprises the following modules:
the first module is used for constructing knowledge representation units, splitting and parsing the entries of each domain into knowledge representation units; a knowledge representation unit comprises a domain node AreaNode, a classification node CategoryNode and description nodes TextNode, wherein the attributes of an entry are stored in the domain node AreaNode, the classification to which the entry belongs is stored in the classification node CategoryNode, and the detailed sub-entries describing the entry are stored in description nodes TextNode; after a word segmentation method is applied to the attribute text of the domain node and the description text of the description nodes, a MENTION relationship is established between each mentioned domain node and the node that mentions it, where MENTION denotes a mention;
the second module is used for constructing the knowledge graph, including storing all knowledge representation units obtained by the first module in a graph database to form the knowledge graph, so that many-to-many graph-structure relationships are formed among the domain entries;
the third module is used for constructing the domain concepts to be compared, including determining the domain concepts A and B to be compared, splitting and parsing them into knowledge representation units, storing these units in the knowledge graph obtained by the second module, and establishing temporary mention relationships that do not damage the structure of the original graph;
the fourth module is used for extracting the multi-level topology of the domain concepts, including extracting the topological structures of the domain concepts A and B on the knowledge graph by subgraph matching, wherein the other domain nodes associated through MENTION relationships with the domain node and the description nodes of the knowledge representation unit form the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, either directly or indirectly through description nodes, form the second-level topology of the domain concept; similarly, the N-level topology consists of the other domain nodes mentioned directly by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again;
the fifth module is used for comparing the multi-level topologies, including obtaining data graphs from the topological structures of the domain concepts A and B extracted by the fourth module, calculating the weights of the topology nodes, and then calculating the weighted similarity α of the domain concepts A and B to obtain the knowledge comparison result.
Furthermore, in the fifth module, the topology node weights are calculated based on the following definitions:
in the knowledge graph, the sub-network centered on a node $v$ with depth $d$ is modeled as a data graph $G(v,d) = \{V(G(v,d)), E(G(v,d))\}$, where $V(G(v,d))$ is the set of all nodes in the data graph $G(v,d)$ and $E(G(v,d))$ is the set of edges formed by the links among all nodes in the data graph $G(v,d)$. In addition, the depth of a node $v_0$ in the data graph $G(v,d)$ is defined as $D(G(v,d), v_0)$, with $v_0 \in V(G(v,d))$, so that the weight $W(G(v,d), v_0)$ of any node $v_0$ is expressed as
[Formula image: Figure GDA0002938317030000031]
When the topology formed jointly by the two sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ is considered, the weight of a node $v_0$ in it is obtained as
$$W(G(v_1,d), G(v_2,d), v_0) = W(G(v_1,d), v_0) + W(G(v_2,d), v_0)$$
where $G(v_1,d)$ and $G(v_2,d)$ are the data graphs modeled from the sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ respectively, and $W(G(v_1,d), v_0)$ and $W(G(v_2,d), v_0)$ are the weights of node $v_0$ in the data graphs $G(v_1,d)$ and $G(v_2,d)$.
Moreover, in the fifth module, the weighted similarity α of the domain concepts A and B is calculated as follows:
let the central nodes of domain concept A and domain concept B be nodes $v_1$ and $v_2$ respectively, with corresponding data graphs $G_1(d) = G(v_1,d)$ and $G_2(d) = G(v_2,d)$; let the node set of $G_1(d)$ be $V_1(d) = V(G_1(d))$ and the node set of $G_2(d)$ be $V_2(d) = V(G_2(d))$. The weighted similarity $\alpha$ of the domain concepts A and B is obtained as
[Formula image: Figure GDA0002938317030000032]
where the weight $W(G_1(d), G_2(d), v_0)$ is $W(G(v_1,d), G(v_2,d), v_0)$.
The invention has the following advantages:
The invention provides a topology extraction scheme that meets the following requirements:
1. The direct characteristics of a domain concept within its domain are effectively described.
2. The expansion shows the indirect characteristics of the domain concept within its domain.
3. The extraction speed and the extraction depth keep a basically linear relationship.
The invention provides a topology comparison scheme that meets the following requirements:
1. The difference between two domain concepts is effectively described.
2. It is sensitive to highly similar domain concepts.
3. Direct features are well distinguished from indirect features.
Therefore, the method can quickly and automatically perform knowledge comparison and classification of massive documents and can be used for tasks requiring complex comparison, such as document filing, rapid entry classification and knowledge fusion. It replaces a large amount of manual analysis with an automated scheme, offers high real-time performance and strong practicability, improves the precision of subsequent query fusion, and has important market value.
Drawings
Fig. 1 is a schematic diagram of a specific form of a knowledge representation unit according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a multi-level topology extraction method according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings and examples.
The fast knowledge comparison method based on the knowledge graph provided by the embodiment of the invention comprises the following steps:
Step 1: Construct knowledge representation units. The invention proposes splitting and parsing each domain entry into the knowledge representation unit shown in Fig. 1: the attributes of the entry are stored in a domain node (AreaNode), the classification to which the entry belongs is stored in a classification node (CategoryNode), and the detailed sub-entries describing the entry are stored in description nodes (TextNode). After a word segmentation method is applied to the attribute text of the domain node and the description text of the description nodes, a mention (MENTION) relationship is established between each mentioned domain node and the node that mentions it.
A knowledge representation unit is a form of knowledge representation in which the overall structure and the set of logical constraints are the same while the sub-structures differ; Fig. 1 shows the knowledge representation used in this patent. Knowledge representation units share a uniform overall structure and association logic, while the semantics described by their sub-structures vary. For example, the sub-structure for a medicine may include semantic descriptions of attributes such as chemical structure, physicochemical properties, adverse reactions and precautions for use.
Domain entries are the terms of a domain together with their definitions; for example, 200,000 entries of the medical domain may be stored in a document. When each domain entry is split and parsed into the knowledge representation unit shown in Fig. 1, the complete information of the entry, including the knowledge name, description and so on, is stored in the domain node (AreaNode), the classification to which the entry belongs is stored in the classification node (CategoryNode), and the detailed sub-entries describing the entry are stored in description nodes (TextNode). By splitting and parsing domain entries into knowledge representation units, this patent ensures that the complete information of the knowledge is not damaged and converts unstructured entries into structured knowledge information that a computer can understand.
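The knowledge representation unit described above can be sketched as a few plain data classes. This is only an illustration: the node names AreaNode, CategoryNode and TextNode come from the text, while the field names, the `KnowledgeUnit` wrapper, the `parse_entry` helper and the layout of the input dictionary are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TextNode:
    """Description node: one detailed sub-entry of a domain entry."""
    title: str
    text: str
    children: list["TextNode"] = field(default_factory=list)  # sub-descriptions


@dataclass
class CategoryNode:
    """Classification node: the classification an entry belongs to."""
    name: str


@dataclass
class AreaNode:
    """Domain node: the complete information (name, attributes) of an entry."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeUnit:
    """Hypothetical wrapper for one knowledge representation unit."""
    area: AreaNode
    category: CategoryNode
    texts: list[TextNode] = field(default_factory=list)


def parse_entry(entry: dict) -> KnowledgeUnit:
    """Split a raw entry (assumed dict layout) into a knowledge unit."""
    area = AreaNode(entry["name"], dict(entry.get("attributes", {})))
    category = CategoryNode(entry["category"])
    texts = [TextNode(title, body) for title, body in entry.get("sections", {}).items()]
    return KnowledgeUnit(area, category, texts)


# Made-up sample entry, used only to exercise the sketch.
unit = parse_entry({
    "name": "aspirin",
    "category": "analgesics",
    "attributes": {"chemical structure": "C9H8O4"},
    "sections": {"adverse reactions": "May irritate the stomach."},
})
```

Every entry yields the same overall shape, and only the content of the attributes and description nodes varies, which is the property the text asks of a knowledge representation unit.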
Classification nodes are connected to each other by the inclusion (EMBRACE) relationship, which represents the inclusion of one classification in another; classification nodes and domain nodes are connected by the inclusion (INCLUDE) relationship, which represents the inclusion of domain node entities in a classification; domain nodes and description nodes are connected by the inclusion (INVOLVE) relationship, which represents the inclusion of sub-entries in a domain node, and description nodes are likewise connected to each other by the INVOLVE relationship, which records the sub-descriptions of description information. In addition, domain nodes are connected to each other by the mention (MENTION) relationship, which records direct associations between domain nodes; description nodes and domain nodes are connected by the MENTION relationship, which records the mention of a domain node by a description node. These two kinds of mention relationship are the main source of connections between domain nodes.
Note: the three relationships EMBRACE, INCLUDE and INVOLVE do not differ semantically; all of them represent inclusion and are used for partial-order reasoning over knowledge. They are distinguished in storage by the node types at the two ends of the relationship, which makes it convenient for the computer to build an index for each relationship and speeds up the comparison process.
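The pairing of relationship names with endpoint node types can be written down as a small lookup table. The sketch below is an illustration under assumptions: the relationship names and pairings follow the text, while the function names and the edge-triple layout are invented for the example.

```python
from collections import defaultdict

# Inclusion relationship chosen from the (source type, target type) pair.
INCLUSION_BY_ENDPOINTS = {
    ("CategoryNode", "CategoryNode"): "EMBRACE",
    ("CategoryNode", "AreaNode"): "INCLUDE",
    ("AreaNode", "TextNode"): "INVOLVE",
    ("TextNode", "TextNode"): "INVOLVE",
}

# MENTION always points at a domain node.
MENTION_ENDPOINTS = {("AreaNode", "AreaNode"), ("TextNode", "AreaNode")}


def inclusion_label(source_type: str, target_type: str) -> str:
    """Return the inclusion relationship name for a pair of node types."""
    try:
        return INCLUSION_BY_ENDPOINTS[(source_type, target_type)]
    except KeyError:
        raise ValueError(f"no inclusion relationship from {source_type} to {target_type}")


def index_edges(edges):
    """Group (source, label, target) triples by label, one index per relationship."""
    index = defaultdict(list)
    for source, label, target in edges:
        index[label].append((source, target))
    return dict(index)


edges = [
    ("analgesics", inclusion_label("CategoryNode", "AreaNode"), "aspirin"),
    ("aspirin", inclusion_label("AreaNode", "TextNode"), "aspirin/adverse reactions"),
]
by_label = index_edges(edges)
```

Keeping one index per label means that a traversal interested only in MENTION edges never has to scan the inclusion edges, which is one plausible reading of the accelerating role mentioned in the note.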
The domain concept knowledge representation scheme provided by the invention can completely describe the knowledge structure of the domain concept, and does not damage the structure of other domain concepts when generating relationship with other domain concepts.
Step 2: Construct the knowledge graph. All knowledge representation units obtained in step 1 are stored in a graph database to form the knowledge graph, and many-to-many graph-structure relationships are formed among the domain entries.
The knowledge representation units formed in step 1 already produce a preliminary association graph through the classification nodes and domain nodes, but at this point the relationships are not rich enough to support comparison. After the description text of the domain nodes and the description text of the description nodes are segmented into words with a maximum entropy model, a mention (MENTION) relationship is established between each mentioned domain node and the node that mentions it (which may be a domain node or a description node). After this operation, each knowledge representation unit is stored in the graph database to form the knowledge graph, and the domain entries then form many-to-many graph-structure relationships, directly or indirectly, through the mention relationships.
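A minimal sketch of how MENTION edges could be derived from node text follows. It deliberately swaps the maximum entropy word segmenter for a plain regular-expression tokenizer, so it only handles single-token entry names in space-delimited text; the function names, node identifiers and sample texts are assumptions made for the example.

```python
import re


def find_mentions(text, known_names, own_name=None):
    """Return the known entry names that occur as whole tokens in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return sorted(name for name in known_names
                  if name.lower() in tokens and name != own_name)


def build_mention_edges(node_texts, known_names):
    """Build (mentioning node, "MENTION", mentioned entry) triples.

    node_texts maps a node id to (name of the entry that owns it, its text).
    An entry mentioning itself is skipped.
    """
    edges = []
    for node_id, (owner, text) in node_texts.items():
        for name in find_mentions(text, known_names, own_name=owner):
            edges.append((node_id, "MENTION", name))
    return edges


known = {"aspirin", "ibuprofen", "warfarin"}
node_texts = {
    "aspirin": ("aspirin", "Aspirin: see also ibuprofen."),        # domain node text
    "aspirin/notes": ("aspirin", "Interaction notes: warfarin."),  # description node text
}
mention_edges = build_mention_edges(node_texts, known)
```

Both a domain node and a description node can act as the mentioning side, while the mentioned side is always a domain entry, matching the two kinds of mention relationship described earlier.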
Step 3: Construct the domain concepts to be compared. A domain concept is a set of mutually associated knowledge representation units, obtained by processing a domain entry through the above steps, that a computer can express correctly without affecting human understanding. When domain entries A and B need to be compared, they are split and parsed into the knowledge representation units shown in Fig. 1 as in step 1, the units are stored in the knowledge graph obtained in step 2 as in step 2, and temporary mention relationships that do not damage the structure of the original graph are established; the knowledge representation units are then associated with other knowledge representation units and form domain concepts.
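The text does not say how the temporary mention relationships are kept from damaging the original graph. One possible approach, shown here purely as an assumption, is to hold them in a separate overlay that is consulted together with the stored graph and thrown away after the comparison. The class and method names are hypothetical.

```python
class OverlayGraph:
    """Read-only base graph plus a disposable layer of temporary MENTION edges."""

    def __init__(self, base):
        self._base = base   # node -> set of mentioned nodes; never modified here
        self._temp = {}     # temporary edges added for one comparison

    def add_temporary_mention(self, source, target):
        self._temp.setdefault(source, set()).add(target)

    def mentioned_by(self, node):
        """Nodes mentioned by `node`, from the base graph and the overlay."""
        return set(self._base.get(node, ())) | self._temp.get(node, set())

    def discard_temporary(self):
        """Drop every temporary edge; the base graph is left as it was."""
        self._temp.clear()


base = {"aspirin": {"ibuprofen"}}
graph = OverlayGraph(base)
graph.add_temporary_mention("concept A", "aspirin")
graph.add_temporary_mention("aspirin", "concept A")
during = graph.mentioned_by("aspirin")
graph.discard_temporary()
after = graph.mentioned_by("aspirin")
```

A graph database could achieve the same effect in other ways, for example by tagging the temporary edges and deleting them by tag afterwards.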
Step 4: Extract the multi-level topology of the domain concepts. The topological structures of the domain concepts A and B on the knowledge graph are extracted by subgraph matching.
Fig. 2 shows the multi-level topology representation of this patent. The leftmost part is the knowledge representation unit of a domain concept. The other domain nodes associated through MENTION relationships with the domain node and the description nodes of the knowledge representation unit form the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, directly or indirectly through description nodes, form the second-level topology. Similarly, the N-level topology consists of the other domain nodes mentioned directly by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again.
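The level-by-level extraction can be sketched as a breadth-first expansion over an in-memory graph. This replaces the subgraph matching on a graph database with plain dictionaries, follows MENTION edges only in the direction from mentioning node to mentioned node, and uses hypothetical names throughout (`extract_levels`, `mentions`, `involves`); all of these are assumptions of the example.

```python
def extract_levels(root, mentions, involves, depth):
    """Return [level 1, level 2, ...] as sets of domain nodes.

    mentions: node -> set of domain nodes it mentions (MENTION edges)
    involves: node -> set of description nodes it contains (INVOLVE edges)
    """
    def description_nodes(area):
        # All description nodes under a domain node, including nested ones.
        found, stack = set(), list(involves.get(area, ()))
        while stack:
            node = stack.pop()
            if node not in found:
                found.add(node)
                stack.extend(involves.get(node, ()))
        return found

    seen = {root}
    frontier = {root}
    levels = []
    for _ in range(depth):
        reached = set()
        for area in frontier:
            # Direct mentions plus mentions made through description nodes.
            for source in {area} | description_nodes(area):
                reached |= mentions.get(source, set())
        reached -= seen          # already extracted nodes are not extracted again
        if not reached:
            break
        levels.append(reached)
        seen |= reached
        frontier = reached
    return levels


mentions = {"A": {"B"}, "A/t1": {"C"}, "B": {"D", "A"}, "C": {"D"}, "D": {"E"}}
involves = {"A": {"A/t1"}}
levels = extract_levels("A", mentions, involves, depth=3)
```

Because each node enters `seen` once and is expanded once, the work grows with the number of nodes and edges reached rather than with the number of paths.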
And 5: multi-level topologies are compared. And obtaining a data graph from the topological structure of the domain concept A, B extracted in the step 4, calculating the weight of the topological node according to the definition 1, and then calculating the weighted similarity alpha of the domain concept A, B according to the definition 2. The alpha quantization represents how similar the domain concept a is to the domain concept B. This degree of similarity can be used in various domains, and alpha values can be used to guide concept fusion. When the similarity alpha is used for a comparison task, the similarity alpha has direct significance, and when the similarity alpha is used for a classification task, concepts of all fields of a certain class are regarded as a first-level topology, so that the problem of overlarge comparison calculation amount of every two concepts in the traditional classification task can be solved.
Definitions:
Definition 1 (node weight): in the graph, the sub-network centered on a node $v$ with depth $d$ can be modeled as a data graph $G(v,d) = \{V(G(v,d)), E(G(v,d))\}$, where $V(G(v,d))$ is the set of all nodes in the data graph $G(v,d)$ and $E(G(v,d))$ is the set of edges formed by the links among all nodes in the data graph $G(v,d)$. In addition, the depth of a node $v_0$ in the data graph $G(v,d)$ is defined as $D(G(v,d), v_0)$, with $v_0 \in V(G(v,d))$, so that the weight $W(G(v,d), v_0)$ of any node $v_0$ is expressed as:
[Formula image: Figure GDA0002938317030000061] (Formula I)
When the topology formed jointly by the two sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ is considered, the weight of a node $v_0$ in it is found as follows:
$$W(G(v_1,d), G(v_2,d), v_0) = W(G(v_1,d), v_0) + W(G(v_2,d), v_0)$$ (Formula II)
where $G(v_1,d)$ and $G(v_2,d)$ are the data graphs modeled from the sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ respectively, and the weights $W(G(v_1,d), v_0)$ and $W(G(v_2,d), v_0)$ of node $v_0$ in the data graphs $G(v_1,d)$ and $G(v_2,d)$ are obtained according to Formula I.
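Formula II can be sketched directly, but Formula I is an image that is not reproduced in the text, so the single-graph weight below is a hypothetical stand-in: a linear decay chosen only so that nodes closer to the center weigh more. The rule that a node absent from a data graph contributes 0 is also an assumption, as are all function names.

```python
def depth_map(levels):
    """Map each extracted node to its level (1 for the first-level topology)."""
    depths = {}
    for k, level in enumerate(levels, start=1):
        for node in level:
            depths.setdefault(node, k)
    return depths


def linear_decay(depth, d):
    """HYPOTHETICAL stand-in for Formula I: d at level 1 down to 1 at level d.

    A node that is absent from the data graph (depth is None) gets weight 0.
    """
    if depth is None or depth > d:
        return 0
    return d - depth + 1


def combined_weight(depths1, depths2, node, d, single_weight=linear_decay):
    """Formula II: add the node's weights in the two data graphs."""
    return single_weight(depths1.get(node), d) + single_weight(depths2.get(node), d)


d = 3
depths_a = depth_map([{"B", "C"}, {"D"}])      # topology of concept A
depths_b = depth_map([{"C"}, {"D", "F"}])      # topology of concept B
weights = {node: combined_weight(depths_a, depths_b, node, d)
           for node in sorted(set(depths_a) | set(depths_b))}
```

Passing the single-graph weight as a parameter keeps Formula II independent of whichever definition of Formula I is actually used.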
Definition 2 (similarity comparison of weighted topology nodes): let the central nodes of domain concept A and domain concept B be nodes $v_1$ and $v_2$ respectively. The expansions of domain concept A and domain concept B give two data graphs $G_1(d) = G(v_1,d)$ and $G_2(d) = G(v_2,d)$, with node sets $V_1(d) = V(G_1(d))$ and $V_2(d) = V(G_2(d))$. The weight in the graph satisfies $W(G_1(d), G_2(d), v_0) \in (0, 2d)$, and the closer any node $v_0$ is to the root nodes of the two domain concepts (the domain nodes that represent the knowledge units), the higher its weight. The weighted similarity $\alpha$ of $G_1(d)$ and $G_2(d)$ is found as follows:
[Formula image: Figure GDA0002938317030000062]
where the weight $W(G_1(d), G_2(d), v_0)$, i.e. $W(G(v_1,d), G(v_2,d), v_0)$, is obtained according to Formula II. The data graphs $G_1(d)$ and $G_2(d)$ are expansions of the domain concepts, and $d = 3$ is preferred for comparison.
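The similarity formula itself is an image that is not reproduced in the text, so the sketch below does not claim to implement it. As a stand-in it uses a weighted Jaccard ratio, the total weight of the shared nodes divided by the total weight of all nodes in either data graph, which is a common way to turn per-node weights into a similarity between 0 and 1. The function name and the sample weights are assumptions.

```python
def weighted_similarity(nodes1, nodes2, weight):
    """HYPOTHETICAL stand-in for the similarity formula: weighted Jaccard.

    nodes1, nodes2: node sets V1(d) and V2(d) of the two data graphs
    weight:         function giving the combined weight of a node (Formula II)
    """
    total = sum(weight(v) for v in nodes1 | nodes2)
    if total == 0:
        return 0.0
    shared = sum(weight(v) for v in nodes1 & nodes2)
    return shared / total


# Sample combined weights, made up for the example.
sample_weights = {"B": 3, "C": 6, "D": 4, "F": 2}
alpha = weighted_similarity({"B", "C", "D"}, {"C", "D", "F"}, sample_weights.get)
```

With these sample values the shared nodes C and D carry weight 10 out of a total of 15, so `alpha` is 2/3; two identical node sets would give 1.0 and two disjoint ones 0.0.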
In specific implementation, the technical scheme can be run as an automated process using computer software technology, or a corresponding system can be provided in modular form. The embodiment of the invention also provides a fast knowledge comparison system based on the knowledge graph, which comprises the following modules:
the first module is used for constructing knowledge representation units, splitting and parsing the entries of each domain into knowledge representation units; a knowledge representation unit comprises a domain node AreaNode, a classification node CategoryNode and description nodes TextNode, wherein the attributes of an entry are stored in the domain node AreaNode, the classification to which the entry belongs is stored in the classification node CategoryNode, and the detailed sub-entries describing the entry are stored in description nodes TextNode; after a word segmentation method is applied to the attribute text of the domain node and the description text of the description nodes, a MENTION relationship is established between each mentioned domain node and the node that mentions it, where MENTION denotes a mention;
the second module is used for constructing the knowledge graph, including storing all knowledge representation units obtained by the first module in a graph database to form the knowledge graph, so that many-to-many graph-structure relationships are formed among the domain entries;
the third module is used for constructing the domain concepts to be compared, including determining the domain concepts A and B to be compared, splitting and parsing them into knowledge representation units, storing these units in the knowledge graph obtained by the second module, and establishing temporary mention relationships that do not damage the structure of the original graph;
the fourth module is used for extracting the multi-level topology of the domain concepts, including extracting the topological structures of the domain concepts A and B on the knowledge graph by subgraph matching, wherein the other domain nodes associated through MENTION relationships with the domain node and the description nodes of the knowledge representation unit form the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, either directly or indirectly through description nodes, form the second-level topology of the domain concept; similarly, the N-level topology consists of the other domain nodes mentioned directly by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again;
the fifth module is used for comparing the multi-level topologies, including obtaining data graphs from the topological structures of the domain concepts A and B extracted by the fourth module, calculating the weights of the topology nodes, and then calculating the weighted similarity α of the domain concepts A and B, the weighted similarity α being used for concept fusion.
For the specific implementation of each module, refer to the corresponding steps; the details are not repeated here.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (6)

1. A fast knowledge comparison method based on a knowledge graph, characterized by comprising the following steps:
step 1, constructing knowledge representation units, including splitting and parsing the entries of each domain into knowledge representation units; a knowledge representation unit comprises a domain node AreaNode, a classification node CategoryNode and description nodes TextNode, wherein the attributes of an entry are stored in the domain node AreaNode, the classification to which the entry belongs is stored in the classification node CategoryNode, and the detailed sub-entries describing the entry are stored in description nodes TextNode; after a word segmentation method is applied to the attribute text of the domain node and the description text of the description nodes, a MENTION relationship is established between each mentioned domain node and the node that mentions it, where MENTION denotes a mention;
step 2, constructing the knowledge graph, including storing all knowledge representation units obtained in step 1 in a graph database to form the knowledge graph, so that many-to-many graph-structure relationships are formed among the domain entries;
step 3, constructing the domain concepts to be compared, including determining the domain concepts A and B to be compared, splitting and parsing them into knowledge representation units, storing these units in the knowledge graph obtained in step 2, and establishing temporary mention relationships that do not damage the structure of the original graph;
step 4, extracting the multi-level topology of the domain concepts, including extracting the topological structures of the domain concepts A and B on the knowledge graph by subgraph matching, wherein the other domain nodes associated through MENTION relationships with the domain node and the description nodes of the knowledge representation unit form the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, either directly or indirectly through description nodes, form the second-level topology of the domain concept; similarly, the N-level topology consists of the other domain nodes mentioned directly by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again;
step 5, comparing the multi-level topologies, including obtaining data graphs from the topological structures of the domain concepts A and B extracted in step 4, calculating the weights of the topology nodes, and then calculating the weighted similarity α of the domain concepts A and B to obtain the knowledge comparison result.
2. The rapid knowledge comparison method based on a knowledge graph according to claim 1, wherein: in step 5, the topology node weights are calculated based on the following definitions,
in the knowledge graph, the sub-network centered on a node $v$ with depth $d$ is modeled as a data graph $G(v,d)=\{V(G(v,d)),\,E(G(v,d))\}$, wherein $V(G(v,d))$ is the set of all nodes in the data graph $G(v,d)$ and $E(G(v,d))$ is the set of edges formed by the links between those nodes; in addition, the depth of a node $v_0$ in the data graph $G(v,d)$ is defined as $D(G(v,d),v_0)$, with $v_0\in V(G(v,d))$, so that the weight $W(G(v,d),v_0)$ of any node $v_0$ is expressed as follows,
[Formula image FDA0002938317020000011: definition of the weight $W(G(v,d),v_0)$; not reproduced in this text]
when considering the topology formed by the two sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ respectively, the weight of a node $v_0$ in that topology is found as follows,
$W(G(v_1,d),G(v_2,d),v_0)=W(G(v_1,d),v_0)+W(G(v_2,d),v_0)$
wherein $G(v_1,d)$ and $G(v_2,d)$ are the data graphs of depth $d$ modeled from the sub-networks centered on nodes $v_1$ and $v_2$ respectively, and $W(G(v_1,d),v_0)$ and $W(G(v_2,d),v_0)$ are the weights of node $v_0$ in the data graphs $G(v_1,d)$ and $G(v_2,d)$ respectively.
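The additive rule for the two-graph weight can be illustrated with a short Python sketch. The per-graph weight formula is available here only as an image reference, so the sketch takes it as a caller-supplied function of depth; `placeholder_weight` is a hypothetical stand-in and NOT the patent's formula. Treating a node that is absent from one data graph as having weight 0 in that graph is also an assumption.

```python
from typing import Callable, Dict, Hashable

# node -> depth D(G(v, d), v0) of that node in one data graph
Depths = Dict[Hashable, int]


def node_weight(depths: Depths, node: Hashable,
                per_graph_weight: Callable[[int], float]) -> float:
    """W(G(v, d), v0): weight of one node in one data graph.

    Assumption: a node that is not in the data graph contributes 0.0.
    """
    if node not in depths:
        return 0.0
    return per_graph_weight(depths[node])


def combined_weight(depths_1: Depths, depths_2: Depths, node: Hashable,
                    per_graph_weight: Callable[[int], float]) -> float:
    """W(G(v1, d), G(v2, d), v0) = W(G(v1, d), v0) + W(G(v2, d), v0)."""
    return (node_weight(depths_1, node, per_graph_weight)
            + node_weight(depths_2, node, per_graph_weight))


def placeholder_weight(depth: int) -> float:
    """Hypothetical depth-decaying weight, used only to make the sketch runnable."""
    return 1.0 / (1 + depth)
```

For example, `combined_weight({"a": 1, "x": 2}, {"b": 1, "x": 1}, "x", placeholder_weight)` adds the weight of `x` at depth 2 in the first graph to its weight at depth 1 in the second.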
3. The rapid knowledge comparison method based on a knowledge graph according to claim 2, wherein: in step 5, the weighted similarity α of the domain concepts A and B is calculated as follows,
let the central nodes of domain concept A and domain concept B be nodes $v_1$ and $v_2$ respectively, with the correspondingly expanded data graphs $G_1(d)=G(v_1,d)$ and $G_2(d)=G(v_2,d)$, the node set of data graph $G_1(d)$ being $V_1(d)=V(G_1(d))$ and the node set of data graph $G_2(d)$ being $V_2(d)=V(G_2(d))$; the weighted similarity α of the domain concepts A and B is found as follows:
[Formula image FDA0002938317020000021: definition of the weighted similarity α; not reproduced in this text]
wherein the weight $W(G_1(d),G_2(d),v_0)=W(G(v_1,d),G(v_2,d),v_0)$.
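To show how the node sets $V_1(d)$ and $V_2(d)$ and the combined weights could feed a single score, here is a hedged Python sketch. The similarity formula itself survives in this text only as an image reference, so the ratio below (combined weight of the shared nodes over combined weight of all nodes in either graph) is an assumed stand-in, and the patent's actual formula may differ. The function name and the dictionary inputs are illustrative.

```python
def weighted_similarity(weights_1, weights_2):
    """Assumed weighted-overlap ratio between two data graphs.

    weights_k maps each node v0 in V_k(d) to its per-graph weight W(G_k(d), v0).
    Returns a value in [0, 1] when all weights are non-negative.
    """
    shared = weights_1.keys() & weights_2.keys()      # V_1(d) intersect V_2(d)
    everything = weights_1.keys() | weights_2.keys()  # V_1(d) union V_2(d)

    def combined(node):
        # W(G_1(d), G_2(d), v0) as the sum of the two per-graph weights;
        # a node missing from one graph is assumed to weigh 0.0 there.
        return weights_1.get(node, 0.0) + weights_2.get(node, 0.0)

    total = sum(combined(node) for node in everything)
    if total == 0:
        return 0.0  # no nodes, or all weights zero
    return sum(combined(node) for node in shared) / total
```

Under this assumed form, two concepts with identical weighted topologies score 1.0 and two concepts with no node in common score 0.0.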
4. A rapid knowledge comparison system based on a knowledge graph, characterized by comprising the following modules,
a first module, used for constructing knowledge representation units, including splitting and parsing the entries of each domain into knowledge representation units; a knowledge representation unit comprises a domain node AreaNode, a classification node CategoryNode and a description node TextNode, wherein the attributes of each entry are stored in the domain node AreaNode, the classification of the entry is stored in the classification node CategoryNode, and the detailed sub-items describing the entry are stored in the description node TextNode; after the attribute text of the domain node and the description text of the description node are segmented by a word segmentation method, a MENTION relationship is established between each mentioned domain node and the node that mentions it, wherein MENTION denotes "mentions";
a second module, used for constructing a knowledge graph, including storing all the knowledge representation units obtained by the first module in a graph database to form the knowledge graph, with many-to-many graph structure relationships formed among the domain entries;
a third module, used for constructing the domain concepts to be compared, including determining the domain concepts A and B to be compared, splitting and parsing the domain concepts A and B into knowledge representation units, then storing these units in the knowledge graph obtained by the second module and establishing temporary mention relationships that do not damage the structure of the original graph;
a fourth module, used for extracting the multi-level topology of the domain concepts, including extracting the topological structure of the domain concepts A and B on the knowledge graph by subgraph matching, wherein the domain node in the knowledge representation unit, together with the other domain nodes associated with its description nodes through the MENTION relationship, forms the first-level topology of the domain concept; the domain nodes with which the domain nodes of the first-level topology have a MENTION relationship, either directly or indirectly through description nodes, form the second-level topology of the domain concept; by analogy, the N-level topology consists of the other domain nodes directly mentioned by the (N-1)-level topology and the other domain nodes with which it has a MENTION relationship indirectly through description nodes; nodes that have already been extracted are not extracted again;
a fifth module, used for comparing the multi-level topologies, including obtaining data graphs from the topological structures of the domain concepts A and B extracted by the fourth module, calculating the weights of the topology nodes, and then calculating the weighted similarity α of the domain concepts A and B to obtain the knowledge comparison result.
5. The rapid knowledge comparison system based on a knowledge graph according to claim 4, wherein: in the fifth module, the topology node weights are calculated based on the following definitions,
in the knowledge graph, the sub-network centered on a node $v$ with depth $d$ is modeled as a data graph $G(v,d)=\{V(G(v,d)),\,E(G(v,d))\}$, wherein $V(G(v,d))$ is the set of all nodes in the data graph $G(v,d)$ and $E(G(v,d))$ is the set of edges formed by the links between those nodes; in addition, the depth of a node $v_0$ in the data graph $G(v,d)$ is defined as $D(G(v,d),v_0)$, with $v_0\in V(G(v,d))$, so that the weight $W(G(v,d),v_0)$ of any node $v_0$ is expressed as follows,
[Formula image FDA0002938317020000031: definition of the weight $W(G(v,d),v_0)$; not reproduced in this text]
when considering the topology formed by the two sub-networks of depth $d$ centered on nodes $v_1$ and $v_2$ respectively, the weight of a node $v_0$ in that topology is found as follows,
$W(G(v_1,d),G(v_2,d),v_0)=W(G(v_1,d),v_0)+W(G(v_2,d),v_0)$
wherein $G(v_1,d)$ and $G(v_2,d)$ are the data graphs of depth $d$ modeled from the sub-networks centered on nodes $v_1$ and $v_2$ respectively, and $W(G(v_1,d),v_0)$ and $W(G(v_2,d),v_0)$ are the weights of node $v_0$ in the data graphs $G(v_1,d)$ and $G(v_2,d)$ respectively.
6. The rapid knowledge comparison system based on a knowledge graph according to claim 5, wherein: in the fifth module, the weighted similarity α of the domain concepts A and B is calculated as follows,
let the central nodes of domain concept A and domain concept B be nodes $v_1$ and $v_2$ respectively, with the corresponding data graphs $G_1(d)=G(v_1,d)$ and $G_2(d)=G(v_2,d)$, the node set of data graph $G_1(d)$ being $V_1(d)=V(G_1(d))$ and the node set of data graph $G_2(d)$ being $V_2(d)=V(G_2(d))$; the weighted similarity α of the domain concepts A and B is found as follows:
[Formula image FDA0002938317020000032: definition of the weighted similarity α; not reproduced in this text]
wherein the weight $W(G_1(d),G_2(d),v_0)$ is $W(G(v_1,d),G(v_2,d),v_0)$.
CN201910025419.5A 2019-01-11 2019-01-11 Rapid knowledge comparison method and system based on knowledge graph Active CN109885693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910025419.5A CN109885693B (en) 2019-01-11 2019-01-11 Rapid knowledge comparison method and system based on knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910025419.5A CN109885693B (en) 2019-01-11 2019-01-11 Rapid knowledge comparison method and system based on knowledge graph

Publications (2)

Publication Number Publication Date
CN109885693A CN109885693A (en) 2019-06-14
CN109885693B true CN109885693B (en) 2021-08-03

Family

ID=66925950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910025419.5A Active CN109885693B (en) 2019-01-11 2019-01-11 Rapid knowledge comparison method and system based on knowledge graph

Country Status (1)

Country Link
CN (1) CN109885693B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413794A (en) * 2019-06-19 2019-11-05 重庆市重报大数据研究院 A kind of map of culture generation method
CN112182234B (en) * 2020-07-29 2022-06-28 长江勘测规划设计研究有限责任公司 Construction method of knowledge map of watershed flood control planning data
CN112926319B (en) * 2021-02-26 2024-01-12 北京百度网讯科技有限公司 A method, device, equipment and storage medium for determining domain vocabulary
CN113220908B (en) * 2021-07-08 2021-11-05 杭州智会学科技有限公司 Knowledge graph matching method and device
CN117573803B (en) * 2023-11-14 2024-04-19 安徽省征信股份有限公司 Knowledge graph-based new customer identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8914366B1 (en) * 2007-06-29 2014-12-16 Google Inc. Evaluating clustering based on metrics
CN104462501A (en) * 2014-12-19 2015-03-25 北京奇虎科技有限公司 Knowledge graph construction method and device based on structural data
CN106326211A (en) * 2016-08-17 2017-01-11 海信集团有限公司 Determination method and device for distance between keywords in interactive statement
CN108304519A (en) * 2018-01-24 2018-07-20 西安交通大学 A kind of knowledge forest construction method based on chart database
CN108830216A (en) * 2018-06-11 2018-11-16 北京理工大学 A kind of adjustable continuous vari-focus target identification system of visual field and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9378065B2 (en) * 2013-03-15 2016-06-28 Advanced Elemental Technologies, Inc. Purposeful computing
US20150169758A1 (en) * 2013-12-17 2015-06-18 Luigi ASSOM Multi-partite graph database
US11023803B2 (en) * 2017-04-10 2021-06-01 Intel Corporation Abstraction library to enable scalable distributed machine learning

Non-Patent Citations (1)

Title
Entity Linking with a Knowledge Base: Issues; Wei Shen, Jianyong Wang, Senior Member; IEEE; 20150228; pp. 443-459 *

Also Published As

Publication number Publication date
CN109885693A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN109885693B (en) Rapid knowledge comparison method and system based on knowledge graph
Fernandez et al. Seeping semantics: Linking datasets using word embeddings for data discovery
CN114218472B (en) Intelligent search system based on knowledge graph
CN111581354A (en) A method and system for calculating similarity of FAQ questions
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
Ardjani et al. Ontology-alignment techniques: survey and analysis
CN108268600B (en) AI-based unstructured data management method and device
US20170161619A1 (en) Concept-Based Navigation
CN110275947A (en) Domain-specific knowledge graph natural language query method and device based on named entity recognition
Sleeman et al. Topic modeling for RDF graphs
CN106777218B (en) Ontology matching method based on attribute similarity
CN108038106B (en) A self-learning method for fine-grained domain terminology based on context semantics
CN115563313A (en) Semantic retrieval system for literature and books based on knowledge graph
CN106547877B (en) Data element Smart Logo analytic method based on 6W service logic model
Mohammadi et al. Simulated annealing-based ontology matching
Liu et al. Domain ontology graph model and its application in Chinese text classification
Ramar et al. Technical review on ontology mapping techniques
Cruz et al. Building linked ontologies with high precision using subclass mapping discovery
Pujara et al. Generic statistical relational entity resolution in knowledge graphs
Song et al. Multi-domain ontology mapping based on semantics
Duong et al. A hybrid method for integrating multiple ontologies
Zarembo et al. Assessment of name based algorithms for land administration ontology matching
Nguyen et al. CitationLDA++ an extension of LDA for discovering topics in document network
Kumar et al. Efficient structuring of data in big data
Idoudi et al. Ontology knowledge mining for ontology alignment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant