Disclosure of Invention
To overcome the shortcomings of existing knowledge base document retrieval, the invention provides a knowledge base document association search method, a search system, and a storage medium based on a knowledge graph, which establish semantic associations between documents and between documents and the chapters and paragraphs within them, thereby improving the accuracy and efficiency of queries.
The technical purpose of the invention is realized by the following technical scheme:
A knowledge base document association search method based on a knowledge graph comprises the following steps:
Step 1, constructing a knowledge base network: analyzing the chapter structure of each document and extracting its directory structure, wherein the directory structure is hierarchical and each level of the directory structure contains a document title and a directory;
marking the titles and directories of the documents with corresponding entity labels, the entity labels being extracted based on the knowledge graph, and constructing association relationships between documents and between semantic units within documents according to whether they contain the same entity label;
Step 2, performing an association search based on the constructed knowledge base network: searching for matching entity labels to identify key entities, extracting other entities associated with the key entities from the knowledge base network, and obtaining the relationships among all the entities;
Step 3, sorting the search results by matching degree, setting a matching-degree threshold, and screening out and outputting the search results with a higher matching degree.
Further, the knowledge base network comprises multiple levels of knowledge nodes. Each level of knowledge node has a label and a knowledge attachment, and each knowledge node takes a document as its knowledge attachment. Each knowledge attachment comprises a title and a directory; the title carries a label, the directory comprises multiple levels of directory nodes, and each level of directory node carries a label.
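The structure described above can be sketched with a few data classes. This is only an illustrative model: the class and field names (`KnowledgeNode`, `KnowledgeAttachment`, `DirectoryNode`, `iter_directory_nodes`) and the sample document are assumptions introduced here, not names used by the invention.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DirectoryNode:
    """One directory entry of a document; carries its own labels and sub-levels."""
    heading: str
    labels: List[str] = field(default_factory=list)
    children: List["DirectoryNode"] = field(default_factory=list)


@dataclass
class KnowledgeAttachment:
    """A document attached to a knowledge node: a labelled title plus a directory."""
    title: str
    title_labels: List[str] = field(default_factory=list)
    directory: List[DirectoryNode] = field(default_factory=list)


@dataclass
class KnowledgeNode:
    """One level of the knowledge base network, with labels and attachments."""
    name: str
    labels: List[str] = field(default_factory=list)
    attachments: List[KnowledgeAttachment] = field(default_factory=list)
    children: List["KnowledgeNode"] = field(default_factory=list)


def iter_directory_nodes(nodes: List[DirectoryNode]) -> Iterator[DirectoryNode]:
    """Yield every directory node depth-first, parents before children."""
    for node in nodes:
        yield node
        yield from iter_directory_nodes(node.children)


# Small made-up instance, used only to exercise the classes.
doc = KnowledgeAttachment(
    title="inspection rule.docx",
    title_labels=["inspection"],
    directory=[
        DirectoryNode("1. Method", ["method"], [
            DirectoryNode("1.1 Visual check", ["visual check"]),
        ]),
    ],
)
root = KnowledgeNode("station", ["station"], attachments=[doc])
headings = [n.heading for n in iter_directory_nodes(doc.directory)]
```

Keeping labels on every level (knowledge node, title, directory node) is what later allows a search to land on a chapter rather than only on a whole document.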
Further, the association search in step 2 comprises the following steps:
Step 2.1, a user inputs a query statement;
Step 2.2, performing entity matching in the knowledge base network according to the query statement, matching entity labels, and identifying the key entities in the knowledge base network;
Step 2.3, when no entity label can be matched, performing fuzzy matching between the user's query statement and the entity labels to form a candidate entity label set;
Step 2.4, extracting other entities associated with the key entities from the knowledge base network, and obtaining the relationships among the entities.
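The flow of steps 2.1 to 2.4 can be sketched as follows. The sketch makes several assumptions that the text does not state: exact matching is modelled as a case-insensitive substring test, fuzzy matching uses the standard library's `difflib.get_close_matches` as a simple stand-in for a semantic similarity model, and the network is reduced to a plain adjacency dictionary. The function names and the sample data are hypothetical.

```python
import difflib
from typing import Dict, List, Set, Tuple


def match_labels(query: str, labels: List[str], cutoff: float = 0.6) -> Tuple[List[str], bool]:
    """Return (matched labels, whether the match was exact) for steps 2.2 and 2.3."""
    q = query.lower()
    exact = [label for label in labels if label.lower() in q]
    if exact:
        return exact, True
    # Fuzzy fallback: compare each query word with every label (string similarity only).
    lowered = {label.lower(): label for label in labels}
    candidates: List[str] = []
    for word in q.split():
        for hit in difflib.get_close_matches(word, list(lowered), n=3, cutoff=cutoff):
            if lowered[hit] not in candidates:
                candidates.append(lowered[hit])
    return candidates, False


def association_search(
    query: str, labels: List[str], graph: Dict[str, Set[str]]
) -> Tuple[List[str], Set[Tuple[str, str]]]:
    """Steps 2.1 to 2.4: find key entities, then collect their associated entities."""
    key_entities, _ = match_labels(query, labels)
    relations = set()
    for entity in key_entities:
        for neighbour in graph.get(entity, ()):
            relations.add((entity, neighbour))
    return key_entities, relations


labels = ["bushing", "detection", "breather"]
graph = {
    "bushing": {"3.1 bushing detection"},
    "detection": {"3. Detection method"},
}
keys, relations = association_search("bushing detection method", labels, graph)
fuzzy_hits, was_exact = match_labels("bushng", labels)  # misspelled query word
```

In a real system the fuzzy branch would more likely rank candidates by semantic similarity; the string-distance fallback here only illustrates where that step sits in the flow.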
Further, the relationships between the entities acquired in step 2 include relationships between documents and association relationships between semantic units within the documents.
Further, step 3 comprises the following steps:
Step 3.1, the set of directory nodes of documents directly associated with the entity label is denoted V_1, and each directory node in V_1 is assigned a weight X, X > 0;
Step 3.2, for each directory node in V_1, traversing its sub-level directory nodes to obtain the set V_2 of associated sub-level directory nodes, wherein the weight of each directory node in V_2 is increased by Y, Y > 0;
Step 3.3, the set of documents indirectly associated with the entity label is denoted V_3;
Step 3.4, traversing the directory nodes of each document in V_3 to obtain the set V_4 of directory nodes associated with the entity label, wherein the weight of each directory node in V_4 is increased by Y;
Step 3.5, merging all directory nodes in V_1, V_2 and V_4 and combining their weights, setting a threshold on the weight, and screening out the directory nodes with the top weights.
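One plausible reading of steps 3.1 to 3.5 can be written as a single weight formula. It assumes that weights simply add when a directory node belongs to more than one set, which the text does not spell out; the indicator notation and the threshold symbol \theta are introduced here for illustration only.

```latex
w(d) = X \cdot \mathbf{1}[d \in V_1] + Y \cdot \mathbf{1}[d \in V_2] + Y \cdot \mathbf{1}[d \in V_4],
\qquad X > 0,\; Y > 0
\\[4pt]
\text{Result} = \{\, d \in V_1 \cup V_2 \cup V_4 \;:\; w(d) \ge \theta \,\}
```

Under this reading, a directory node that is both directly associated with the label and a sub-level node of another directly associated node receives X + Y and therefore ranks above nodes found by only one route.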
Further, the query statement includes at least one of a keyword, a phrase, and a question.
The invention also provides a knowledge base document association search system based on the knowledge graph, which comprises:
the knowledge base network construction module, used for constructing a knowledge base network containing the association relationships within and between documents;
the input module, used by a user to input a query statement;
the matching module, used for performing an association matching search in the knowledge base network according to the query statement entered through the input module;
and the output module, used for outputting the results of the association matching search performed by the matching module.
The invention also provides a storage medium storing computer software which, when run, executes the steps of the knowledge base document association search method based on the knowledge graph.
Compared with the prior art, the invention has the beneficial effects that:
1. The knowledge base network constructed in the invention uses the knowledge graph to capture the contextual relationships among entities, which helps a search engine better understand the background of a user query and return search results that better match the user's intent. As an explicit form of knowledge storage and association, the knowledge graph can readily establish semantic associations between documents and between documents and the chapters and paragraphs within them, and can therefore provide higher search efficiency than traditional keyword search.
2. The knowledge base network constructed in the invention contains fine-grained knowledge association relationships both between and within documents. Combined with the association search, it can better understand the user's query intent and thereby improve knowledge retrieval efficiency.
Detailed Description
The technical scheme of the invention is further described below with reference to the specific embodiments:
a knowledge base document association searching method based on a knowledge graph comprises the following steps:
Step 1, constructing a knowledge base network: the chapter structure of each document is analyzed, and the directory structure of the document is extracted either by preset rules or by algorithm-based directory extraction. The directory structure is hierarchical, and each level of the directory structure contains a document title and a directory. An extraction rule can specify a format template for each directory level, for example 1xxx for a first-level title, 1.1xxx for a second-level title, and 1.1.1xxx for a third-level title.
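A rule-based extraction of this kind can be sketched with a regular expression. The pattern below is one possible template for the 1xxx / 1.1xxx / 1.1.1xxx numbering, not the rule set of the invention; `HEADING_RE`, `extract_directory` and the sample lines are names and data assumed for illustration. The level is inferred from the number of dots in the section number.

```python
import re
from typing import List, Tuple

# Dotted section number, optional trailing dot, optional space,
# then a title that does not start with a digit, space or dot.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s*([^\d\s.].*)$")


def extract_directory(lines: List[str]) -> List[Tuple[int, str, str]]:
    """Return (level, number, title) for each line that looks like a heading."""
    entries = []
    for line in lines:
        match = HEADING_RE.match(line.strip())
        if match:
            number, title = match.group(1), match.group(2).strip()
            level = number.count(".") + 1  # "1" -> 1, "1.1" -> 2, "1.1.1" -> 3
            entries.append((level, number, title))
    return entries


# Made-up sample lines; the last heading has no space after the number.
sample = [
    "4. Detection method",
    "Body text that is not a heading.",
    "4.1 Short circuit detection",
    "4.2 Bushing detection",
    "4.2.1Insulation resistance",
]
directory = extract_directory(sample)
```

A pattern this loose will also accept body lines that happen to begin with a number, so a production rule would typically add checks on font, indentation, or line length.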
The titles and directories of the documents are then marked with corresponding entity labels, which are extracted based on the knowledge graph, and weak semantic association relationships between documents and between semantic units within documents are constructed according to whether they contain the same entity label.
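The "same entity label" rule amounts to an inverted index from labels to the units that carry them. A minimal sketch, assuming each semantic unit (document title or directory node) is identified by a string and already has its labels assigned; `build_associations` and the sample units are hypothetical, with the component name written as "bushing":

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def build_associations(unit_labels: Dict[str, List[str]]) -> Set[Tuple[str, str, str]]:
    """Link every pair of units that share a label; returns (unit_a, unit_b, label)."""
    index = defaultdict(set)  # label -> units carrying that label
    for unit, labels in unit_labels.items():
        for label in labels:
            index[label].add(unit)
    edges = set()
    for label, units in index.items():
        for a, b in combinations(sorted(units), 2):
            edges.add((a, b, label))
    return edges


units = {
    "3.1 bushing detection": ["bushing", "detection"],
    "5.1 bushing device maintenance": ["bushing"],
    "5.3 breather maintenance": ["breather"],
}
edges = build_associations(units)
```

Because the link exists only through a shared label, it says nothing about how strongly the two units are related, which matches the text's description of the association as weak.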
The knowledge base network is shown in Fig. 1. It comprises multiple levels of knowledge nodes; each level of knowledge node has a label and a knowledge attachment, and each knowledge node takes a document as its knowledge attachment. Each knowledge attachment comprises a title and a directory; the title carries a label, the directory comprises multiple levels of directory nodes, and each level of directory node carries a label.
Step 2, performing an association search based on the constructed knowledge base network: matching entity labels are searched to identify the key entities, other entities associated with the key entities are extracted from the knowledge base network, and the relationships among all the entities (the key entities and the other entities associated with them) are obtained. These relationships include the association relationships with each semantic unit in the documents.
Referring to Fig. 2, step 2 specifically comprises the following steps:
Step 2.1, a user inputs a query statement, which is at least one of a keyword, a phrase, a question, and the like.
Step 2.2, performing entity matching in the knowledge base network according to the query statement, matching entity labels, and identifying the key entities in the knowledge base network;
Step 2.3, when no entity label can be matched, fuzzy matching is performed between the user's query statement and the entity labels: candidate entity mentions are extracted from the query and searched for in the knowledge base network, the semantic similarity between each entity in the knowledge base network and the entity mentions is calculated, and the entities are sorted and output by semantic similarity to form a candidate entity label set. Common machine learning models for calculating semantic similarity include DSSM (Deep Structured Semantic Models), CNN-DSSM (Convolutional Latent Semantic Model), and LSTM-DSSM (Long Short-Term Memory Deep Structured Semantic Models); the similarity calculation can be implemented using cosine similarity.
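The ranking by cosine similarity can be illustrated without a trained model. The sketch below replaces the learned DSSM-style embeddings with character-bigram count vectors, which is purely an assumption made to keep the example self-contained; `char_bigrams`, `cosine` and `rank_candidates` are hypothetical helper names.

```python
import math
from collections import Counter
from typing import List, Tuple


def char_bigrams(text: str) -> Counter:
    """Count character bigrams; a crude stand-in for a learned embedding."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(mention: str, labels: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Sort labels by similarity to the mention and keep the top_k non-zero scores."""
    mention_vec = char_bigrams(mention)
    scored = [(cosine(mention_vec, char_bigrams(label)), label) for label in labels]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored[:top_k] if score > 0]


ranked = rank_candidates("converter", ["converter transformer", "bushing", "detection"])
```

With a semantic model, only `char_bigrams` would change: the mention and each label would be mapped to dense vectors, and the same cosine ranking would apply.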
Step 2.4, extracting other entities associated with the key entities from the knowledge base network, and obtaining the relationships among the entities.
Step 3, sorting the search results by matching degree, setting a matching-degree threshold, and screening out and outputting the search results with a higher matching degree. Referring to Fig. 2, this specifically comprises the following steps:
Step 3.1, the set of directory nodes of documents directly associated with the entity label (first-degree association) is denoted V_1, and each directory node in V_1 is assigned a weight X, X > 0;
Step 3.2, for each directory node in V_1, its sub-level directory nodes are traversed to obtain the set V_2 of associated sub-level directory nodes, wherein the weight of each directory node in V_2 is increased by Y, Y > 0; in this embodiment, Y = 1;
Step 3.3, the set of documents indirectly associated with the entity label (for example, via label-title-knowledge attachment or label-knowledge node-knowledge attachment) is denoted V_3;
Step 3.4, the directory nodes of each document in V_3 are traversed to obtain the set V_4 of directory nodes associated with the entity label, wherein the weight of each directory node in V_4 is increased by Y, Y = 1;
Step 3.5, all directory nodes in V_1, V_2 and V_4 are merged and their weights combined, a threshold is set on the weight, and the directory nodes with higher weights are screened out, for example the five directory nodes with the highest weights.
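The merge in step 3.5 can be sketched as a weighted tally. The sketch assumes that a node appearing in several sets accumulates the corresponding weights, and it uses X = 2 as a placeholder because the embodiment only fixes Y = 1; `rank_directory_nodes` and the sample sets are illustrative, not taken from the figures.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_directory_nodes(
    v1: Iterable[str],
    v2: Iterable[str],
    v4: Iterable[str],
    x: float = 2.0,   # assumed value; the text only requires X > 0
    y: float = 1.0,   # Y = 1 in the embodiment
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Merge V_1, V_2 and V_4, sum the weights per node, and keep the top_n nodes."""
    weights = defaultdict(float)
    for node in set(v1):
        weights[node] += x
    for node in set(v2):
        weights[node] += y
    for node in set(v4):
        weights[node] += y
    # Highest weight first; ties are broken by name so the order is deterministic.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


v1 = ["4.2 bushing detection", "4. Detection method"]
v2 = ["4.2 bushing detection", "4.1 short circuit detection"]
v4 = ["5.1 bushing device maintenance"]
top = rank_directory_nodes(v1, v2, v4, top_n=3)
```

Choosing X larger than Y keeps directly associated nodes ahead of nodes reached only through a sub-level or an indirectly associated document; whether the invention intends X > Y is not stated.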
For a clearer understanding of the present invention, reference is made to the following specific examples:
Step 1, constructing the knowledge base network shown in Fig. 3, which is a knowledge base network structure relating to a direct current (DC) converter station:
"DC converter station" is a first-level knowledge node, and "detection management" and "maintenance management" are second-level knowledge nodes;
"circuit breaker detection rule.docx" and "converter transformer detection rule.docx" are knowledge attachments under the "detection management" knowledge node, and "converter transformer maintenance rule.docx" is a knowledge attachment under the "maintenance management" knowledge node;
"circuit breaker detection rule.docx" comprises a first-level directory node (3. Detection method) and a second-level directory node (3.1 bushing detection);
"converter transformer detection rule.docx" comprises a first-level directory node (4. Detection method) and second-level directory nodes (4.1 short circuit detection, 4.2 bushing detection);
"converter transformer maintenance rule.docx" comprises a first-level directory node (5. Key process quality requirements for maintenance) and second-level directory nodes (5.1 bushing device maintenance, 5.2 oil storage cabinet maintenance, 5.3 breather maintenance);
"3. Detection method", "3.1 bushing detection", "4. Detection method", "4.1 short circuit detection" and "4.2 bushing detection" take "detection" as their label node;
"converter transformer detection rule.docx" and "converter transformer maintenance rule.docx" take "converter transformer" as their label node;
"3.1 bushing detection" and "5.1 bushing device maintenance" take "bushing" as their label node;
"5.2 oil storage cabinet maintenance" takes "oil storage cabinet" as its label node;
"5.3 breather maintenance" takes "breather" as its label node.
Step 2, performing an association search based on the constructed knowledge base network:
The user inputs "converter bushing detection method", and the following label entities are extracted from the knowledge base network: [converter transformer] (fuzzy match on "converter"), [4.2 bushing detection] (exact match on "bushing detection"), [bushing] (exact match on "bushing") and [detection] (exact match on "detection");
Step 3, sorting by matching degree, setting a matching-degree threshold, and screening out and outputting the search results with a higher matching degree, wherein the reachable paths are as follows:
Path 1: [4.2 bushing detection]
Path 2: [converter transformer] → [converter transformer detection rule.docx] → [4. Detection method] → [4.2 bushing detection]
Path 3: [detection] → [4. Detection method] → [4.2 bushing detection]
Path 4: [bushing] → [4.2 bushing detection]
The paths are ranked in order of preference, and path 1 is obtained as the highest-scoring result and output.
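The four reachable paths of this example can be reproduced with a small depth-first search. The edge list below is a simplified assumption derived from the listed paths rather than a full reconstruction of Fig. 3, node names write the component as "bushing" throughout, and ranking shorter paths first is only one possible scoring rule, since the text does not define how paths are scored.

```python
from typing import Dict, List, Optional

# Directed edges inferred from the four paths of the example (assumption).
GRAPH: Dict[str, List[str]] = {
    "converter transformer": ["converter transformer detection rule.docx"],
    "converter transformer detection rule.docx": ["4. Detection method"],
    "detection": ["4. Detection method"],
    "4. Detection method": ["4.2 bushing detection"],
    "bushing": ["4.2 bushing detection"],
}


def find_paths(
    graph: Dict[str, List[str]],
    start: str,
    target: str,
    prefix: Optional[List[str]] = None,
) -> List[List[str]]:
    """Return every simple path from start to target, found depth-first."""
    path = (prefix or []) + [start]
    if start == target:
        return [path]
    found = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid revisiting a node on the current path
            found.extend(find_paths(graph, nxt, target, path))
    return found


matched = ["converter transformer", "4.2 bushing detection", "bushing", "detection"]
target = "4.2 bushing detection"
paths = [p for entity in matched for p in find_paths(GRAPH, entity, target)]
paths.sort(key=len)  # assumption: fewer hops means a better match
best = paths[0]
```

Under the shortest-path assumption the exact match on the directory node itself ranks first, which agrees with path 1 being reported as the highest-scoring result.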
The embodiment also provides a knowledge base document association search system based on the knowledge graph, which comprises:
The knowledge base network construction module is used for constructing a knowledge base network containing the association relationships within and between documents, as shown in Fig. 1;
The input module is used by a user to input a query statement;
The matching module is used for performing an association matching search in the knowledge base network according to the query statement entered through the input module: it performs entity matching in the knowledge base network according to the query statement, matches entity labels, and identifies the key entities in the knowledge base network; when no entity label is matched, it performs fuzzy matching between the user's query statement and the entity labels to form a candidate entity label set; and it extracts other entities associated with the key entities from the knowledge base network and obtains the relationships among the entities.
The output module is used for outputting the results of the association matching search performed by the matching module: it sorts the results by matching degree, sets a matching-degree threshold, and screens out and outputs the search results with a higher matching degree.
The embodiment also provides a storage medium, such as a computer, a mobile phone, a tablet computer, a portable hard disk, a mobile terminal, vehicle-mounted computer equipment, or a network storage space, which stores computer software that, when run, executes the knowledge base document association search method based on the knowledge graph described in this embodiment.
This embodiment merely illustrates the invention and is not to be construed as limiting it. After reading this specification, those skilled in the art may modify this embodiment as needed without inventive contribution, and such modifications are covered as long as they fall within the scope of the claims of the invention.