CN119474261A

CN119474261A - Knowledge base document association search method, system and storage based on knowledge graph

Info

Publication number: CN119474261A
Application number: CN202411307781.9A
Authority: CN
Inventors: 文辉; 叶昌贵; 周明星
Original assignee: Daguan Data Co ltd
Current assignee: Daguan Data Co ltd
Priority date: 2024-09-19
Filing date: 2024-09-19
Publication date: 2025-02-18

Abstract

The present invention relates to a knowledge base document association search method, system and storage medium based on a knowledge graph. The method combines the directory tree of the knowledge base, the chapter structure of each document, the internal directory structure of each document, document labels and the like to establish association relationships between documents and between the chapters and paragraphs within a document, builds a knowledge base network on top of the knowledge base, and performs association search over that network. Compared with traditional search methods, the present invention can better understand the user's query intention and improve the efficiency of knowledge retrieval.

Description

Knowledge base document association searching method, system and storage based on knowledge graph

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a knowledge base document association search method, a knowledge base document association search system and a storage medium based on a knowledge graph.

Background

The knowledge base serves as a carrier for storing, organizing and utilizing knowledge, and systematically and organically stores and manages knowledge. Knowledge bases are widely used in such fields as enterprise knowledge management, academic research, research and development knowledge management, etc. The knowledge base provides a convenient way for users to acquire and share knowledge while organizing and storing a large amount of knowledge.

In terms of the kinds of data sources processed, most knowledge bases currently focus on unstructured documents, such as research and development materials produced during R&D, project process documents produced during project implementation, and the internal regulations and policies of enterprises. These documents come in complex formats, in many kinds and in large numbers, and the demand for retrieving them is strong.

At present, unstructured document retrieval using a knowledge base as a carrier is mainly based on keyword matching, so users obtain information inefficiently. Organizations and enterprises hold a large number of document types, such as regulations, standards, specifications and manuals, which are usually stored in various storage systems; as the number of documents grows, it becomes difficult for users to find the required information quickly and accurately. On the one hand, traditional document management often relies on the naming of folders and files, which easily leads to lost documents, duplicates and inconsistent naming; when a specific document is needed, the user must spend a great deal of time and effort searching manually. On the other hand, even if the required document is found, traditional search has difficulty providing accurate answers quickly and efficiently, because a document generally contains many semantic units. In general, current knowledge base document retrieval has the following problems:

1. Insufficient semantic understanding of search queries;

2. Searches lack associations between pieces of knowledge;

3. The search context is not considered comprehensively.

Disclosure of Invention

In order to overcome the defects of existing knowledge base document retrieval, the invention provides a knowledge base document association search method, system and storage medium based on a knowledge graph, which establish semantic associations between documents and between the chapters and paragraphs within documents, and improve the accuracy and efficiency of queries.

The technical purpose of the invention is realized by the following technical scheme:

A knowledge base document association search method based on a knowledge graph comprises the following steps:

Step 1, constructing a knowledge base network, namely analyzing the chapter structure of a document and extracting the directory system structure of the document, wherein the directory system structure is a hierarchical structure, and the directory system structure of each level contains a document title and a directory;

marking the titles and directories of the documents with corresponding entity labels, the entity labels being extracted based on the knowledge graph, and constructing association relationships between documents and between semantic units within documents according to whether they contain the same entity labels;

Step 2, carrying out association search based on the constructed knowledge base network, searching for matching entity labels to identify key entities, extracting other entities associated with the key entities from the knowledge base network, and obtaining the relationships among all the entities;

Step 3, sorting according to the matching degree, setting a matching degree threshold, and screening out and outputting the search results with a higher matching degree.

Further, the knowledge base network comprises a plurality of levels of knowledge nodes, each level of knowledge node is provided with a label and a knowledge attachment, each knowledge node takes a document as its knowledge attachment, each knowledge attachment comprises a title and a directory, the title carries a label, the directory comprises a plurality of levels of directory nodes, and each level of directory node is provided with a label.

Further, the association search in step 2 includes the following steps:

Step 2.1, a user inputs a query statement;

Step 2.2, performing entity matching in the knowledge base network according to the query statement, matching entity labels, and identifying key entities in the knowledge base network;

Step 2.3, when no entity label can be matched, performing fuzzy matching between the user query statement and the entity labels to form a candidate entity label set;

Step 2.4, extracting other entities associated with the key entities from the knowledge base network, and acquiring the relationships between the entities.
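The flow of steps 2.2 to 2.4 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names `match_entity_labels` and `associated_entities`, the substring test used for exact matching, the adjacency-dictionary graph, and the use of Python's `difflib` as the fuzzy matcher are all assumptions made for the example.

```python
import difflib


def match_entity_labels(query: str, labels: list[str], cutoff: float = 0.6) -> list[str]:
    """Steps 2.2 and 2.3: exact label matches, else a fuzzy candidate set."""
    # Step 2.2 (assumed rule): a label matches if it occurs verbatim in the query.
    exact = [label for label in labels if label in query]
    if exact:
        return exact
    # Step 2.3 (assumed stand-in): fuzzy-match each query word against the labels.
    candidates: list[str] = []
    for word in query.split():
        for hit in difflib.get_close_matches(word, labels, n=3, cutoff=cutoff):
            if hit not in candidates:
                candidates.append(hit)
    return candidates


def associated_entities(key_entities: list[str], graph: dict[str, set[str]]) -> dict[str, list[str]]:
    """Step 2.4: look up the entities linked to each key entity."""
    return {entity: sorted(graph.get(entity, ())) for entity in key_entities}


# Hypothetical labels and links, loosely modelled on the bushing example.
LABELS = ["bushing", "detection", "converter transformer"]
GRAPH = {"bushing": {"3.1 bushing detection", "5.1 bushing device maintenance"}}
```

Calling `match_entity_labels("bushing detection method", LABELS)` returns the two exactly matched labels, while a misspelled query such as `"bushng"` falls through to the fuzzy branch and yields `["bushing"]` as the candidate set.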

Further, the relationships between the entities acquired in step 2 include relationships between documents and association relationships between semantic units within the documents.

Further, when the step 3 is executed, the method comprises the following steps:

Step 3.1, the set of directory nodes of the documents directly associated with the entity label is recorded as V_1, and each directory node in V_1 is assigned a weight X, X > 0;

Step 3.2, for each directory node in V_1, continuing to traverse its sub-layer directory nodes to obtain a directory node set V_2 associated with the sub-layer directory nodes of the directory nodes in V_1, wherein the weight of each directory node in V_2 is increased by Y, Y > 0;

Step 3.3, the set of documents indirectly associated with the entity label is recorded as V_3;

Step 3.4, traversing the directory nodes of each document in V_3 to obtain a directory node set V_4 associated with the entity label, wherein the weight of each directory node in V_4 is increased by Y;

Step 3.5, merging all directory nodes in V_1, V_2 and V_4 and combining their weights, and screening out the top-weighted directory nodes according to a set weight threshold.

Further, the query statement includes at least one of a keyword, a phrase and a question.

The invention also provides a knowledge base document association search system based on the knowledge graph, which comprises:

a knowledge base network construction module, used for constructing a knowledge base network containing the association relationships within documents and between documents;

an input module, used by a user to input a query statement;

a matching module, used for carrying out an association matching search in the knowledge base network according to the query statement received by the input module;

and an output module, used for outputting the result of the association matching search of the matching module.

The invention also provides a storage medium which stores computer software, and the computer software, when run, executes the steps of the knowledge base document association search method based on a knowledge graph.

Compared with the prior art, the invention has the beneficial effects that:

1. By means of the knowledge graph, the knowledge base network constructed in the invention can capture the contextual relationships among entities, helping a search engine better understand the background of a user query and provide search results that better match the user's intention. As an explicit form of knowledge storage and association, the knowledge graph can establish semantic associations between documents and between the chapters and paragraphs within documents, and can therefore provide higher search efficiency than traditional keyword search.

2. The knowledge base network constructed in the invention contains fine-grained knowledge association relationships between documents and within documents; combined with association search, it can better understand the user's query intention and thereby improve the efficiency of knowledge retrieval.

Drawings

Fig. 1 is a design diagram of the knowledge base network based on a knowledge graph in the present invention.

Fig. 2 is a schematic diagram of the knowledge base document association search process based on a knowledge graph in the present invention.

Fig. 3 shows a knowledge base network structure related to a DC power conversion station.

Detailed Description

The technical scheme of the invention is further described below with reference to the specific embodiments:

Step 1, constructing a knowledge base network: the chapter structure of each document is analyzed, and the directory system structure of the document is extracted either by preset rules or by machine-algorithm-based directory extraction. The directory system structure is hierarchical, and the directory system structure of each level contains a document title and a directory. An extraction rule can specify a format template for each directory level, for example "1xxx" for a primary title, "1.1xxx" for a secondary title and "1.1.1xxx" for a tertiary title.
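As an illustration of the rule-based variant, the sketch below builds a directory tree from numbered heading lines. The regular expression, the `DirectoryNode` class and the `extract_directory` function are hypothetical names introduced here; the sketch also assumes that the number and the heading text are separated by whitespace, which the "1xxx" template above does not state.

```python
import re
from dataclasses import dataclass, field

# Assumed rule: dotted number ("1", "1.1", "1.1.1"), optional trailing dot,
# whitespace, then the heading text. Body lines that happen to start with a
# number would also match, so a real rule set would need further checks.
HEADING_RE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(\S.*)$")


@dataclass
class DirectoryNode:
    number: str                      # e.g. "4.2"
    title: str                       # e.g. "Bushing detection"
    level: int                       # 1 = primary, 2 = secondary, ...
    children: list["DirectoryNode"] = field(default_factory=list)


def extract_directory(lines: list[str]) -> list[DirectoryNode]:
    """Build a directory tree from heading lines; non-heading lines are skipped."""
    roots: list[DirectoryNode] = []
    stack: list[DirectoryNode] = []  # chain of currently open ancestor headings
    for line in lines:
        match = HEADING_RE.match(line.strip())
        if not match:
            continue
        number, title = match.groups()
        node = DirectoryNode(number, title, level=number.count(".") + 1)
        # Close any open heading at the same or a deeper level.
        while stack and stack[-1].level >= node.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots
```

For the lines `"4. Detection method"`, `"4.1 Short circuit detection"` and `"4.2 Bushing detection"`, this yields one primary node with two secondary children.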

The titles and directories of the documents are then marked with corresponding entity labels, which are extracted based on the knowledge graph, and weak semantic association relationships between documents and between semantic units within documents are constructed according to whether they contain the same entity labels.
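One simple way to derive such associations from shared labels is an inverted index from label to semantic units. The sketch below is illustrative only; `build_label_associations` and the sample label assignments are assumptions made for this example, not taken from the source.

```python
from collections import defaultdict
from itertools import combinations


def build_label_associations(
    unit_labels: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Link every pair of semantic units that share at least one entity label."""
    # Inverted index: entity label -> units (titles or directory nodes) carrying it.
    units_by_label: dict[str, set[str]] = defaultdict(set)
    for unit, labels in unit_labels.items():
        for label in labels:
            units_by_label[label].add(unit)

    # Each pair of units under the same label gets an association edge,
    # annotated with the labels that justify it.
    associations: dict[tuple[str, str], set[str]] = defaultdict(set)
    for label, units in units_by_label.items():
        for first, second in combinations(sorted(units), 2):
            associations[(first, second)].add(label)
    return dict(associations)


# Hypothetical label assignments for three directory nodes.
SAMPLE_UNITS = {
    "3.1 bushing detection": {"bushing", "detection"},
    "4.1 short circuit detection": {"detection"},
    "5.1 bushing device maintenance": {"bushing"},
}
```

With `SAMPLE_UNITS`, the two "bushing" nodes become associated through the label "bushing", and the two "detection" nodes through the label "detection"; nodes with no label in common stay unlinked.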

The knowledge base network is shown in Fig. 1. It comprises a plurality of levels of knowledge nodes, each level of knowledge node is provided with a label and a knowledge attachment, each knowledge node takes a document as its knowledge attachment, each knowledge attachment comprises a title and a directory, the title carries a label, the directory comprises a plurality of levels of directory nodes, and each level of directory node is provided with a label.
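The structure described above could be modelled with a few nested record types. The following is a minimal sketch under assumed names (`KnowledgeNode`, `KnowledgeAttachment`, `DirectoryNode`, `labels_of`); the source does not prescribe any particular representation.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class DirectoryNode:
    heading: str
    labels: set[str] = field(default_factory=set)
    children: list["DirectoryNode"] = field(default_factory=list)


@dataclass
class KnowledgeAttachment:
    title: str                                    # document title
    title_labels: set[str] = field(default_factory=set)
    directory: list[DirectoryNode] = field(default_factory=list)


@dataclass
class KnowledgeNode:
    name: str
    labels: set[str] = field(default_factory=set)
    attachments: list[KnowledgeAttachment] = field(default_factory=list)
    children: list["KnowledgeNode"] = field(default_factory=list)


def iter_directory(nodes: list[DirectoryNode]) -> Iterator[DirectoryNode]:
    """Yield directory nodes depth-first across all levels."""
    for node in nodes:
        yield node
        yield from iter_directory(node.children)


def labels_of(attachment: KnowledgeAttachment) -> set[str]:
    """Collect the labels on a document's title and on all its directory nodes."""
    labels = set(attachment.title_labels)
    for node in iter_directory(attachment.directory):
        labels |= node.labels
    return labels
```

Keeping labels on every level (knowledge node, title, directory node) mirrors the description and lets a search start from any label and walk down to the finest-grained directory node.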

Step 2, carrying out association search based on the constructed knowledge base network: matching entity labels are searched for in order to identify key entities, other entities associated with the key entities are extracted from the knowledge base network, and the relationships among all the entities (the key entities and the other entities associated with them) are obtained. These relationships include the association relationships between documents and between the semantic units within documents.

Referring to fig. 2, the method specifically comprises the following steps:

Step 2.1, a user inputs a query statement, wherein the query statement is at least one of a keyword, a phrase, a question and the like.

Step 2.3, when no entity label can be matched, fuzzy matching is carried out based on the user query statement and the entity labels: candidate entity mentions are extracted from the query and searched for in the knowledge base network, the semantic similarity between the entities in the knowledge base network and the entity mentions is calculated, and the results are sorted and output by semantic similarity to form a candidate entity label set. Common machine learning models for calculating semantic similarity include DSSM (Deep Structured Semantic Models), CNN-DSSM (Convolutional Latent Semantic Model) and LSTM-DSSM (Long Short-Term Memory Deep Structured Semantic Models), and the similarity can be computed using the cosine of the included angle (cosine similarity).
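The ranking by cosine similarity in step 2.3 can be illustrated with a small sketch. Note the substitution: instead of vectors produced by a DSSM-style model, the example uses character-bigram count vectors, purely so that it runs without a trained model. The function names and the threshold value are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Character n-gram counts; a crude stand-in for a learned embedding."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(
    mention: str, labels: list[str], threshold: float = 0.3
) -> list[tuple[str, float]]:
    """Score every entity label against the mention and keep the similar ones."""
    mention_vec = char_ngrams(mention)
    scored = [(label, cosine(mention_vec, char_ngrams(label))) for label in labels]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

With the mention "converter" and the labels "converter transformer", "bushing" and "oil storage cabinet", only "converter transformer" shares any bigrams with the mention, so it is the only candidate that passes the threshold.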

Step 3, sorting according to the matching degree, setting a matching degree threshold, and screening out and outputting the search results with a higher matching degree. Referring to Fig. 2, this specifically comprises the following steps:

Step 3.1, the set of directory nodes of the documents directly associated (at one hop) with the entity label is recorded as V_1, and each directory node in V_1 is assigned a weight X, X > 0;

Step 3.2, for each directory node in V_1, continuing to traverse its sub-layer directory nodes to obtain a directory node set V_2 associated with the sub-layer directory nodes of the directory nodes in V_1, wherein the weight of each directory node in V_2 is increased by Y, Y > 0; in this embodiment Y = 1;

Step 3.3, the set of documents indirectly associated with the entity label (for example label → title → knowledge attachment, or label → knowledge node → knowledge attachment) is recorded as V_3;

Step 3.4, traversing the directory nodes of each document in V_3 to obtain a directory node set V_4 associated with the entity label, wherein the weight of each directory node in V_4 is increased by Y, with Y = 1;

Step 3.5, merging all the directory nodes in V_1, V_2 and V_4 and combining their weights, and screening out the directory nodes with higher weights according to a set weight threshold, for example the directory nodes ranked in the top five by weight.
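Steps 3.1 to 3.5 amount to accumulating weights over several node sets and keeping the top entries. The sketch below shows one possible reading; it is not a definitive implementation. Assumptions: V_1 and V_3 are supplied by the caller, V_2 is taken to be all descendants of the V_1 nodes, V_4 is taken to be every directory node of the documents in V_3, X defaults to 2 (the source only requires X > 0), and ties are broken alphabetically.

```python
from collections import Counter


def score_directory_nodes(
    v1: list[str],
    children: dict[str, list[str]],
    v3: list[str],
    doc_nodes: dict[str, list[str]],
    x: float = 2.0,
    y: float = 1.0,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Accumulate weights over V_1, V_2 and V_4 and return the top_k nodes."""
    weights: Counter = Counter()

    # Step 3.1: directly associated directory nodes receive weight X.
    for node in v1:
        weights[node] += x

    # Step 3.2: sub-layer nodes reached from V_1 (assumed: all descendants) gain Y.
    for node in v1:
        stack = list(children.get(node, ()))
        while stack:
            child = stack.pop()
            weights[child] += y
            stack.extend(children.get(child, ()))

    # Steps 3.3 and 3.4: directory nodes of indirectly associated documents gain Y.
    for doc in v3:
        for node in doc_nodes.get(doc, ()):
            weights[node] += y

    # Step 3.5: the Counter has already merged the sets; rank and truncate.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical inputs modelled on the converter transformer detection document.
CHILDREN = {"4. Detection method": ["4.1 Short circuit detection", "4.2 Bushing detection"]}
DOC_NODES = {
    "Converter transformer detection rule.docx": [
        "4. Detection method",
        "4.1 Short circuit detection",
        "4.2 Bushing detection",
    ]
}
```

Because a node can belong to more than one set, its contributions add up: in the sample data "4.2 Bushing detection" collects X from V_1, Y as a child of "4. Detection method" and Y through the indirectly associated document, which places it first.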

For a clearer understanding of the present invention, reference is made to the following specific examples:

Step 1, constructing the knowledge base network shown in Fig. 3, which is a knowledge base network structure related to a DC power conversion station:

"DC power conversion station" is a first-level knowledge node, and "detection management" and "maintenance management" are second-level knowledge nodes;

"Circuit breaker detection rule.docx" and "Converter transformer detection rule.docx" are knowledge attachments under the "detection management" knowledge node, and "Converter transformer maintenance rule.docx" is a knowledge attachment under the "maintenance management" knowledge node;

"Circuit breaker detection rule.docx" comprises a primary directory node (3. Detection method) and a secondary directory node (3.1 bushing detection);

"Converter transformer detection rule.docx" comprises a primary directory node (4. Detection method) and secondary directory nodes (4.1 short circuit detection, 4.2 bushing detection);

"Converter transformer maintenance rule.docx" comprises a primary directory node (5. Overhauling key process quality space requirement) and secondary directory nodes (5.1 bushing device maintenance, 5.2 oil storage cabinet maintenance, 5.3 breather maintenance);

"3. Detection method", "3.1 bushing detection", "4. Detection method", "4.1 short circuit detection" and "4.2 bushing detection" take "detection" as a label node;

"Converter transformer detection rule.docx" and "Converter transformer maintenance rule.docx" take "converter transformer" as a label node;

"3.1 bushing detection" and "5.1 bushing device maintenance" take "bushing" as a label node;

"5.2 oil storage cabinet maintenance" takes "oil storage cabinet" as a label node;

"5.3 breather maintenance" takes "breather" as a label node.

Step 2, performing association search based on the constructed knowledge base network:

The user inputs "converter bushing detection method", and label entities are extracted from the knowledge base network, including [converter transformer] (fuzzy match on "converter"), [4.2 bushing detection] (exact match on "bushing detection"), [bushing] (exact match on "bushing") and [detection] (exact match on "detection");

Step 3, sorting according to the matching degree, setting a matching degree threshold, and screening out and outputting the search results with a higher matching degree, wherein the reachable paths are as follows:

Path 1: [4.2 bushing detection]

Path 2: [converter transformer] → [Converter transformer detection rule.docx] → [4. Detection method] → [4.2 bushing detection]

Path 3: [detection] → [4. Detection method] → [4.2 bushing detection]

Path 4: [bushing] → [4.2 bushing detection]

According to the optimal ordering of the paths, path 1 is obtained as the highest-scoring result and is output.
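The four paths above can be reproduced on a toy graph. The source does not define how paths are scored, so the sketch below makes two loud assumptions: the edge list is a hand-picked subset of Fig. 3 chosen to reproduce exactly these paths, and "optimal" is read as "fewest nodes on the path". The function name `shortest_path` and the constants are hypothetical.

```python
from collections import deque
from typing import Optional


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for a shortest directed path from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph.get(path[-1], ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Assumed edges: label -> document/directory node, document -> directory node.
GRAPH = {
    "converter transformer": ["Converter transformer detection rule.docx"],
    "Converter transformer detection rule.docx": ["4. Detection method"],
    "4. Detection method": ["4.1 short circuit detection", "4.2 bushing detection"],
    "detection": ["4. Detection method"],
    "bushing": ["4.2 bushing detection"],
}
MATCHED = ["4.2 bushing detection", "converter transformer", "detection", "bushing"]
TARGET = "4.2 bushing detection"

found = [shortest_path(GRAPH, entity, TARGET) for entity in MATCHED]
# Assumed scoring: a path with fewer nodes ranks higher.
PATHS = sorted((path for path in found if path is not None), key=len)
```

Under this reading, the single-node path that starts at the exactly matched label [4.2 bushing detection] comes first, which agrees with path 1 being output as the best result.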

The embodiment also provides a knowledge base document association search system based on the knowledge graph, which comprises:

The knowledge base network construction module is used for constructing a knowledge base network containing the association relationships within documents and between documents, as shown in Fig. 1;

The input module is used by a user to input a query statement;

The matching module is used for carrying out an association matching search in the knowledge base network according to the query statement received by the input module. It performs entity matching in the knowledge base network according to the query statement, matches entity labels and identifies key entities in the knowledge base network; when no entity label is matched, it performs fuzzy matching between the user query statement and the entity labels to form a candidate entity label set; and it extracts other entities associated with the key entities from the knowledge base network and obtains the relationships among the entities.

The output module is used for outputting the result of the association matching search of the matching module. It sorts the results according to the matching degree, sets a matching degree threshold, and screens out and outputs the search results with a higher matching degree.

This embodiment also provides a storage medium, such as a computer, a mobile phone, a tablet computer, a mobile hard disk, a mobile terminal, vehicle-mounted computer equipment or a network storage space, which stores computer software; when run, the computer software executes the steps of the knowledge base document association search method based on a knowledge graph described in this embodiment.

The present embodiment merely illustrates the present invention and is not to be construed as limiting it. After reading this specification, those skilled in the art may make modifications to the embodiment that involve no inventive contribution as required, and all such modifications are protected as long as they fall within the scope of the claims of the present invention.

Claims

1. A knowledge base document association search method based on a knowledge graph, characterized by comprising the following steps:

Step 2, carrying out association search based on the constructed knowledge base network, searching for matching entity labels to identify key entities, extracting other entities associated with the key entities from the knowledge base network, and obtaining the relationships among all the entities, wherein the relationships among the entities comprise relationships between documents and association relationships between semantic units within documents;

Step 3, sorting according to the matching degree, setting a matching degree threshold, and screening out and outputting the search results with a higher matching degree;

When the step 3 is executed, the method comprises the following steps:

2. The knowledge base document association search method based on a knowledge graph according to claim 1, wherein the knowledge base network comprises a plurality of levels of knowledge nodes, each level of knowledge node is provided with a label and a knowledge attachment, each knowledge node takes a document as its knowledge attachment, each knowledge attachment comprises a title and a directory, the title carries a label, the directory comprises a plurality of levels of directory nodes, and each level of directory node is provided with a label.

3. The knowledge-base document associative search method based on a knowledge graph according to claim 2, wherein the associative search in step 2 includes the steps of:

step 2.1, a user inputs a query sentence;

4. The knowledge base document association search method based on a knowledge graph according to claim 3, wherein the query statement includes at least one of a keyword, a phrase and a question.

5. A knowledge-base document associative search system based on a knowledge graph, comprising:

The knowledge base network construction module is used for constructing a knowledge base network containing the association relations in the documents and among the documents, extracting a directory system of a hierarchical structure, wherein the directory system of each hierarchical structure contains a document title and a directory;

The input module is used by a user to input a query statement;

The output module is used for outputting the result of the association matching search of the matching module, wherein: the set of directory nodes of the documents directly associated with the entity label is recorded as V_1, and a weight X is set for each directory node in V_1; the set of directory nodes associated with the sub-layer directory nodes of each directory node in V_1 is recorded as V_2, and the weight of each directory node in V_2 is increased by Y; the set of documents indirectly associated with the entity label is recorded as V_3; the set of directory nodes of each document in V_3 that are associated with the entity label is recorded as V_4, and the weight of each directory node in V_4 is increased by Y; all directory nodes in V_1, V_2 and V_4 are merged and their weights combined, and the top-weighted directory nodes in the merged result are screened out according to a set threshold.

6. A storage medium storing computer software which, when run, performs the method of any one of claims 1-4.