CN110569371A - Knowledge graph construction method and device and storage equipment - Google Patents
- Publication number
- CN110569371A (application CN201910875545.XA)
- Authority
- CN
- China
- Prior art keywords
- knowledge
- result
- triples
- specific entity
- triple
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a knowledge graph construction method, a knowledge graph construction device, and a computer storage device. Knowledge is first extracted from a data source associated with a specific entity word to obtain triples of the specific entity word. The obtained triples are then fused with the existing triples in a concept graph to obtain a fusion result. Hypernym-hyponym (upper-lower) reasoning is performed on the fusion result to obtain an inference result containing extended triples of the specific entity word. The extended triples in the inference result are then translated using a knowledge generation engine to obtain a translation result. Finally, the translation result is written to a storage service through a storage engine.
Description
Technical Field
The invention relates to the technical field of information processing, in particular to a large-scale knowledge graph construction method and device and computer storage equipment.
Background
A knowledge graph, also called knowledge domain visualization or a knowledge domain map, is a series of different graphs that display the relationship between the development process of knowledge and its structure. It describes knowledge resources and their carriers using visualization technology, and mines, analyzes, constructs, draws, and displays knowledge and the relations among knowledge resources and carriers. A knowledge graph can be applied in many scenarios, such as information recommendation in a recommendation system or classification in a text classification process. To support such wide application, many methods have been proposed for constructing knowledge graphs.
According to the knowledge graph construction method described in the patent document with publication number CN108563710A, the labels of published texts and the entity information in a basic graph are used as the information of graph nodes in the knowledge graph to be constructed, and the number of times the information of two graph nodes co-occurs in the same published text is used as node relation information to complete the construction of the knowledge graph. According to the knowledge graph construction method described in another patent document, with publication number CN108694177A, a knowledge graph is constructed mainly by building relationship data between the respective entities.
The concept knowledge graph constructed by the existing knowledge graph construction method has the problems of small scale, single language (Chinese or English) support, single knowledge extraction method, inaccurate isA concept relation, incapability of dynamic updating and the like.
Disclosure of Invention
In order to effectively overcome the defects of conventional knowledge graph construction methods, the embodiments of the invention provide a large-scale knowledge graph construction method, a large-scale knowledge graph construction device, and a computer storage device.
According to a first aspect of the embodiments of the present invention, there is provided a method for constructing a knowledge graph, the method including: extracting knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; fusing the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result; performing hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; translating the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and writing the translation result to a storage service through a storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; accordingly, knowledge extraction from data sources associated with particular entity words includes: and performing knowledge extraction from the data source associated with the specific entity word by adopting a knowledge extraction mode corresponding to the type of the data source, wherein different data sources correspond to different knowledge extraction methods.
According to one embodiment of the invention, the knowledge extraction from the data source associated with the specific entity word comprises the following steps: if the data source type is structured data, extracting knowledge from a relational database using a D2R method or extracting knowledge from linked data using a graph mapping method; and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data using a wrapper; and/or, if the data source type is unstructured data, extracting knowledge from free text using an information extraction method.
According to an embodiment of the present invention, fusing the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result includes: determining whether an obtained triple of the specific entity word is contained in the existing triples in the concept graph; and, if it is not, adding the obtained triple to the concept graph to obtain the fusion result.
According to an embodiment of the present invention, determining whether an obtained triple of the specific entity word is contained in the existing triples in the concept graph includes: performing word expansion on the entity word in the obtained triple to obtain a word-expanded triple; and determining whether the word-expanded triple is contained in the existing triples in the concept graph.
According to one embodiment of the invention, the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine and an online translation engine; correspondingly, the method for translating the triples expanded in the inference result by using the knowledge generation engine to obtain the translation result comprises the following steps: translating the extended triple in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results; performing fusion comparison on the obtained multiple processing results to obtain a fusion comparison result; and determining the processing result with the highest rank in the fusion comparison results as a translation result.
According to an embodiment of the present invention, after writing the obtained translation result into the storage service by the storage engine, the method further includes: and reading the written translation result from the storage service by adopting a query engine matched with the storage engine.
According to a second aspect of the embodiments of the present invention, there is also provided a knowledge graph construction apparatus, including: a knowledge extraction module, configured to extract knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; a fusion module, configured to fuse the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result; a reasoning module, configured to perform hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; a knowledge generation module, configured to translate the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and a storage module, configured to write the translation result to a storage service through a storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; correspondingly, the knowledge extraction module is specifically configured to extract knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, where different data source types correspond to different knowledge extraction methods.
According to an embodiment of the present invention, the knowledge extraction module is specifically configured to: if the data source type is structured data, extract knowledge from a relational database using a D2R method or extract knowledge from linked data using a graph mapping method; and/or, if the data source type is semi-structured data, extract knowledge from the semi-structured data using a wrapper; and/or, if the data source type is unstructured data, extract knowledge from free text using an information extraction method.
According to an embodiment of the invention, the fusion module comprises: a judging unit, configured to determine whether an obtained triple of the specific entity word is contained in the existing triples in the concept graph; and an adding unit, configured to add the obtained triple to the concept graph to obtain a fusion result if the obtained triple is not contained in the existing triples in the concept graph.
According to an embodiment of the present invention, the judging unit is specifically configured to perform word expansion on the entity word in the obtained triple of the specific entity word to obtain a word-expanded triple, and to determine whether the word-expanded triple is contained in the existing triples in the concept graph.
According to one embodiment of the invention, the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine, and an online translation engine; correspondingly, the knowledge generation module is specifically configured to translate the extended triples in the inference result using a plurality of different knowledge generation engines to obtain a plurality of processing results; perform fusion comparison on the plurality of processing results to obtain a fusion comparison result; and determine the highest-ranked processing result in the fusion comparison result as the translation result.
According to an embodiment of the present invention, the apparatus further includes a query module, configured to read the written translation result from the storage service using a query engine matching the storage engine.
According to a third aspect of embodiments of the present invention, there is provided a computer storage device comprising a set of computer-executable instructions which, when executed, perform any of the above-described methods of knowledge-graph construction.
The knowledge graph construction method, the knowledge graph construction device, and the computer storage device disclosed by the embodiments of the invention first extract knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; then fuse the obtained triples with the existing triples in the concept graph to obtain a fusion result; perform hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; translate the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and finally write the translation result to a storage service through a storage engine. The knowledge graph is therefore organized strictly around entities during construction, which facilitates accurate understanding of entities. Moreover, by fusing, reasoning over, and translating the triples obtained by knowledge extraction, a large-scale, high-quality concept knowledge graph can be constructed, which improves the accuracy and recall of natural language understanding.
Drawings
The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a diagram of a system architecture for implementing knowledge graph construction according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating an implementation of a knowledge graph construction method according to an embodiment of the present invention;
FIG. 3 shows an architecture diagram of a knowledge generation module of an embodiment of the invention;
FIG. 4 illustrates a conceptual knowledge graph effect diagram of an application example of the present invention;
FIG. 5 is a schematic diagram showing the composition structure of the knowledge graph constructing apparatus according to the embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
FIG. 1 is a diagram illustrating a system architecture for implementing knowledge graph construction according to an embodiment of the present invention. Referring to FIG. 1, the system architecture includes at least the following modules: knowledge extraction, knowledge fusion, knowledge reasoning, knowledge generation, a storage engine, and a query engine. The knowledge extraction stage can extract knowledge from different types of data sources, such as structured data, semi-structured data, and unstructured data. The extracted knowledge is then fused, reasoned over, and passed through knowledge generation (namely, translation processing), so that a large-scale, high-quality concept knowledge graph is constructed. The constructed concept knowledge graph is then written into a storage service through the storage engine, so that it can subsequently be queried through the query engine.
FIG. 2 is a schematic flow chart of a method for constructing a knowledge graph according to an embodiment of the present invention. Referring to FIG. 2, the method comprises the following steps: operation 201, extracting knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; operation 202, fusing the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result; operation 203, performing hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; operation 204, translating the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and operation 205, writing the translation result into a storage service through a storage engine.
A triple in the knowledge graph generally takes the form (entity, entity relationship, entity). If an entity is regarded as a node and an entity relationship (including attributes, categories, etc.) is regarded as an edge, then a knowledge base containing a large number of triples forms a huge knowledge graph. For example, the triples involving Liu Dehua may include (Liu Dehua, isA, actor), (Liu Dehua, isA, singer), (Liu Dehua, isA, lyricist), and (Liu Dehua, isA, producer), among others.
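As a minimal sketch of the node-and-edge view described above, the following Python snippet stores triples and indexes them as an adjacency map. The `Triple` type, its field names, and `build_graph` are illustrative names that do not appear in the specification; the example entity is the one from the preceding paragraph, romanized here as Liu Dehua.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (entity, relation, entity) fact; field names are illustrative."""
    head: str
    relation: str
    tail: str


def build_graph(triples):
    """Index triples as an adjacency map: node -> list of (relation, neighbor)."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.head].append((t.relation, t.tail))
    return graph


triples = [
    Triple("Liu Dehua", "isA", "actor"),
    Triple("Liu Dehua", "isA", "singer"),
    Triple("Liu Dehua", "isA", "lyricist"),
    Triple("Liu Dehua", "isA", "producer"),
]
graph = build_graph(triples)

# All concepts reachable from the entity over an isA edge.
concepts = [tail for rel, tail in graph["Liu Dehua"] if rel == "isA"]
```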
At operation 201, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data. Accordingly, operation 201 includes: extracting knowledge from the data source associated with the specific entity word using a knowledge extraction mode corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
In an example, if the data source type is structured data, then knowledge is extracted from a relational database using the D2R method or from linked data using the graph mapping method.
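The idea of mapping relational rows to triples can be sketched as follows. This is not the D2R tooling itself, only a simplified illustration using Python's built-in `sqlite3` module; the table `person_occupation`, its columns, and the function `rows_to_triples` are hypothetical.

```python
import sqlite3


def rows_to_triples(conn, table, subject_col, object_col, relation):
    """Map each row of a two-column relation to a (subject, relation, object) triple."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here;
    # in this sketch they are assumed to come from trusted mapping configuration.
    query = f'SELECT "{subject_col}", "{object_col}" FROM "{table}" ORDER BY rowid'
    return [(s, relation, o) for s, o in conn.execute(query)]


# Hypothetical in-memory table standing in for a structured data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_occupation (name TEXT, occupation TEXT)")
conn.executemany(
    "INSERT INTO person_occupation VALUES (?, ?)",
    [("Liu Dehua", "actor"), ("Liu Dehua", "singer")],
)

d2r_triples = rows_to_triples(conn, "person_occupation", "name", "occupation", "isA")
```

A real mapping would typically be declared per table and column rather than passed as arguments; the arguments here only make the row-to-triple correspondence explicit.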
In another example, if the data source type is semi-structured data, a wrapper is used to extract knowledge from the semi-structured data. A wrapper may also be referred to as an extractor. Specifically, taking a web page as an example of semi-structured data, the web page is parsed and knowledge in the form of triples is extracted from it. For example, from the Baidu Encyclopedia page for Liu Dehua, multiple triples such as (Liu Dehua, isA, actor), (Liu Dehua, isA, singer), (Liu Dehua, isA, lyricist), (Liu Dehua, isA, film producer), (Liu Dehua, isA, music figure), and (Liu Dehua, isA, entertainment figure) can be extracted from the occupation field at the top of the page (actor, singer, lyricist, and film producer) and from the entry tags (music figure, actor, singer, entertainment figure, producer, and film producer).
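A wrapper of this kind might look like the sketch below, which uses Python's standard `html.parser` module. The page markup, the class names `occupation` and `tag`, and the helper names are assumptions made for illustration; they are not the structure of any real encyclopedia page.

```python
from html.parser import HTMLParser


class FieldWrapper(HTMLParser):
    """Collects comma-separated text inside elements with a targeted class."""

    def __init__(self, target_classes):
        super().__init__()
        self.target_classes = set(target_classes)
        self._depth = 0  # greater than zero while inside a targeted element
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth or self.target_classes.intersection(classes):
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.values.extend(v.strip() for v in data.split(",") if v.strip())


def extract_isa_triples(entity, html):
    """Turn each distinct extracted value into an (entity, isA, value) triple."""
    wrapper = FieldWrapper({"occupation", "tag"})
    wrapper.feed(html)
    wrapper.close()
    seen, triples = set(), []
    for value in wrapper.values:
        if value not in seen:
            seen.add(value)
            triples.append((entity, "isA", value))
    return triples


# Hypothetical page fragment: an occupation field followed by entry tags.
page = (
    '<dl><dt>Occupation</dt><dd class="occupation">actor, singer</dd></dl>'
    '<span class="tag">music figure</span><span class="tag">actor</span>'
)
wrapper_triples = extract_isa_triples("Liu Dehua", page)
```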
In yet another example, if the data source type is unstructured data, knowledge is extracted from free text using an information extraction method. Specifically, the information extraction method may include: 1) regular expressions; 2) templates; 3) word segmentation / syntactic dependency parsing (subject and object, i.e., S & O); and 4) sequence labeling, namely a multi-label classification model based on BERT + BiLSTM + CRF.
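The first of these methods, regular expressions, can be illustrated with a small sketch. The pattern below targets simple English sentences of the form "X is a Y and a Z" and is an assumption made for illustration; the specification does not give its patterns, and real input would need language-specific rules.

```python
import re

# Subject (capitalized words), the verb "is", then a list of indefinite noun phrases.
ISA_PATTERN = re.compile(r"([A-Z][A-Za-z ]*?) is (an? [a-z ,]+?)[.;]")


def extract_isa(text):
    """Extract (subject, isA, concept) triples from 'X is a Y and a Z.' sentences."""
    triples = []
    for subject, concepts in ISA_PATTERN.findall(text):
        for part in re.split(r",| and ", concepts):
            # Drop the leading article ("a" / "an") from each listed concept.
            concept = re.sub(r"^\s*an?\s+", "", part).strip()
            if concept:
                triples.append((subject.strip(), "isA", concept))
    return triples


text = "Liu Dehua is an actor and a singer. He was born in Hong Kong."
regex_triples = extract_isa(text)
```

The second sentence yields nothing because it does not contain the "is a" pattern, which shows both the precision and the limited coverage of rule-based extraction.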
In operation 202, it is first determined whether an obtained triple of the specific entity word is contained in the existing triples in the concept graph. If the obtained triple is not contained in the existing triples, it is added to the concept graph to obtain the fusion result. Conversely, if the obtained triple is already contained in the existing triples, it may be ignored. For example, for the triple (Liu Dehua, isA, actor), it is first determined whether this isA triple exists in the concept graph; if so, the triple is ignored, and if not, it is newly added.
According to an embodiment of the present invention, before determining whether an obtained triple of the specific entity word is contained in the existing triples in the concept graph, word expansion may be performed on the entity word in the obtained triple to obtain a word-expanded triple; it is then determined whether the word-expanded triple is contained in the existing triples in the concept graph. For example, for the triple (Liu Dehua, isA, actor), Liu Dehua may first be word-expanded into Hua Zi to obtain an expanded triple such as (Hua Zi, isA, actor); it is then determined whether the expanded isA triple exists in the concept graph. If so, the triple is ignored, and if not, it is added.
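The fusion check with word expansion can be sketched as follows. The alias table, the function names, and the choice to store the original (unexpanded) triple when nothing matches are assumptions; the specification does not say how aliases are obtained or which variant is stored.

```python
# Hypothetical alias table used for word expansion.
ALIASES = {"Liu Dehua": {"Hua Zi"}}


def expand(triple, aliases):
    """Return the triple together with its alias variants of the head entity."""
    head, relation, tail = triple
    names = {head} | aliases.get(head, set())
    return {(name, relation, tail) for name in names}


def fuse(new_triples, concept_graph, aliases):
    """Add each new triple unless it, or an alias variant, is already present."""
    added = []
    for triple in new_triples:
        if expand(triple, aliases) & concept_graph:
            continue  # already known under the same name or an alias: ignore
        concept_graph.add(triple)
        added.append(triple)
    return added


# The existing concept graph already knows the entity under its alias.
concept_graph = {("Hua Zi", "isA", "actor")}
added = fuse(
    [("Liu Dehua", "isA", "actor"), ("Liu Dehua", "isA", "singer")],
    concept_graph,
    ALIASES,
)
```

Here the first new triple is ignored because its expanded form already exists, while the second is added.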
At operation 203, an inference engine may be used to perform hypernym-hyponym (upper-lower) inference on the fusion result in order to supplement more isA relations. For example, the triples (Liu Dehua, isA, entertainment figure) and (Liu Dehua, isA, person) may be inferred from the triples (Liu Dehua, isA, actor), (actor, isA, entertainment figure), and (entertainment figure, isA, person).
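One simple way to realize this kind of inference is to treat isA as transitive and compute the missing links, as in the sketch below. Treating the inference engine as a plain transitive closure is an assumption; the specification does not describe the engine's rules. The function name `infer_isa` is illustrative.

```python
def infer_isa(triples):
    """Return isA triples implied by transitivity but missing from `triples`."""
    parents = {}
    for head, relation, tail in triples:
        if relation == "isA":
            parents.setdefault(head, set()).add(tail)

    inferred = set()
    for start in parents:
        # Depth-first walk up the isA hierarchy from `start`.
        stack = list(parents[start])
        seen = set()
        while stack:
            concept = stack.pop()
            if concept in seen:
                continue
            seen.add(concept)
            stack.extend(parents.get(concept, ()))
        inferred |= {(start, "isA", c) for c in seen if c != start}
    return inferred - set(triples)


facts = [
    ("Liu Dehua", "isA", "actor"),
    ("actor", "isA", "entertainment figure"),
    ("entertainment figure", "isA", "person"),
]
new_facts = infer_isa(facts)
```

Besides the two triples for the entity named in the text, the closure also yields (actor, isA, person), since it expands every node and not only the specific entity word.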
FIG. 3 is an architecture diagram of a knowledge generation module according to an embodiment of the invention. Referring to FIG. 3, the knowledge generation engine includes a knowledge base query engine, a neural network translation engine, and an online translation engine. Correspondingly, in operation 204, a plurality of different knowledge generation engines are first used to translate the extended triples in the inference result to obtain a plurality of processing results; fusion comparison is then performed on the plurality of processing results to obtain a fusion comparison result; and finally the highest-ranked processing result in the fusion comparison result is determined as the translation result.
Different knowledge generation engines correspond to different database models. For example, the knowledge base query engine corresponds to DBPedia and Wikipedia; the neural network translation engine corresponds to a neural machine translation model; and the online translation engine corresponds to Baidu translation, Google translation, track translation, and the like. Specifically, a multi-strategy translation mode is used: the triples obtained after hypernym-hyponym (upper-lower) reasoning are translated by the knowledge base query engine, the neural network machine translation engine, and the online translation engine respectively, with Chinese entities translated into English entities and English entities translated into Chinese entities. After fusion comparison of the recalled results, the highest-ranked translation result (i.e., top1) in the fusion comparison result is returned.
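The multi-strategy translation with fusion comparison can be sketched as follows. The three engines are stand-in dictionary lookups, not calls to any real service, and ranking by majority vote with ties going to the earliest engine is an assumption; the specification does not state how the fusion comparison ranks candidates.

```python
from collections import Counter

# Stub engines (hypothetical): each maps a Chinese entity to an English
# candidate, or returns None when it has no answer.
knowledge_base_lookup = {"演员": "actor", "歌手": "singer"}.get
neural_translate = {"演员": "performer", "歌手": "singer"}.get
online_translate = {"演员": "actor"}.get

ENGINES = [knowledge_base_lookup, neural_translate, online_translate]


def translate_entity(entity, engines=ENGINES):
    """Query every engine and return the top-ranked candidate (top1)."""
    candidates = [c.strip() for c in (engine(entity) for engine in engines) if c]
    if not candidates:
        return None
    votes = Counter(candidates)
    # max() keeps the first maximal item, so ties go to the earliest engine.
    return max(candidates, key=votes.__getitem__)


def translate_triple(triple, engines=ENGINES):
    """Translate head and tail; keep the original text when no engine answers."""
    head, relation, tail = triple
    return (
        translate_entity(head, engines) or head,
        relation,
        translate_entity(tail, engines) or tail,
    )


translated = translate_triple(("演员", "isA", "娱乐人物"))
```

In this example two of the three engines agree on "actor", so it outranks "performer"; the tail entity is unknown to all engines and is left untranslated.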
At operation 205, the types of storage engine include gStore, Neo4j, and Dgraph, and different storage engines correspond to different data formats. Specifically, in operation 205, a corresponding storage service may be selected according to the type of the storage engine, and the translation result (i.e., the knowledge generation result) obtained by the knowledge generation module may be written into the storage service in the data format corresponding to the selected storage engine type.
Those skilled in the art will appreciate that the engine type is typically selected according to the storage volume; in order of increasing storage volume, the engines are gStore, Neo4j, and Dgraph. Of course, by default, the engine type given by the device's default parameter at startup may be used.
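The selection rule can be sketched as a simple threshold lookup. The numeric limits below are invented for illustration, since the specification gives only the ordering of the engines and no concrete volumes; the third engine is assumed to be the Dgraph graph database, and the default engine stands in for the device's startup parameter.

```python
# Hypothetical upper bounds on triple count, in order of increasing capacity.
ENGINE_LIMITS = [("gStore", 10_000_000), ("Neo4j", 1_000_000_000)]
LARGEST_ENGINE = "Dgraph"


def select_engine(estimated_triples=None, default="Neo4j"):
    """Pick a storage engine by expected volume, or fall back to the default."""
    if estimated_triples is None:
        return default  # no estimate: use the startup default parameter
    for name, limit in ENGINE_LIMITS:
        if estimated_triples <= limit:
            return name
    return LARGEST_ENGINE


small = select_engine(500_000)
medium = select_engine(250_000_000)
large = select_engine(5_000_000_000)
fallback = select_engine()
```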
According to an embodiment of the present invention, after operation 205, the method may further include: and reading the written translation result from the storage service by adopting a query engine matched with the storage engine.
The knowledge graph construction method first extracts knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; then fuses the obtained triples with the existing triples in the concept graph to obtain a fusion result; performs hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; translates the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and finally writes the translation result to a storage service through a storage engine. Taking the concept knowledge graph effect diagram for Liu Dehua shown in FIG. 4 as an example, the finally constructed concept knowledge graph contains more than 500,000 concepts, 50 million entities, and 250 million isA relations. The knowledge graph is therefore organized strictly around entities during construction, which facilitates accurate understanding of entities. Moreover, by fusing, reasoning over, and translating the triples obtained by knowledge extraction, a large-scale, high-quality concept knowledge graph can be constructed, which improves the accuracy and recall of natural language understanding. For example, given a text about the P30 from Huawei and the iPhone 10 from Apple, traditional natural language understanding extracts only "Huawei", "P30", "apple", and "iPhone 10". It is common knowledge, however, that Huawei is a company and that the P30 and the iPhone 10 are electronic products. Although "apple" may refer to either a fruit or Apple the company, the concept knowledge graph makes it possible to infer that "apple" here refers to the company, and hence that the topic of the text is the product launch events of Huawei and Apple.
Also, based on the knowledge graph construction method described above, an embodiment of the present invention further provides a computer-readable storage medium storing a program that, when executed by a processor, causes the processor to perform at least the following operations: operation 201, extracting knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; operation 202, fusing the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result; operation 203, performing hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; operation 204, translating the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and operation 205, writing the translation result into a storage service through a storage engine.
Further, based on the above method for constructing a knowledge graph, an embodiment of the present invention further provides an apparatus for constructing a knowledge graph. As shown in FIG. 5, the apparatus 50 includes: a knowledge extraction module 501, configured to extract knowledge from a data source associated with a specific entity word to obtain triples of the specific entity word; a fusion module 502, configured to fuse the obtained triples of the specific entity word with the existing triples in the concept graph to obtain a fusion result; an inference module 503, configured to perform hypernym-hyponym (upper-lower) reasoning on the fusion result to obtain an inference result containing extended triples of the specific entity word; a knowledge generation module 504, configured to translate the extended triples in the inference result using a knowledge generation engine to obtain a translation result; and a storage module 505, configured to write the translation result into the storage service through the storage engine.
According to an embodiment of the present invention, the data source associated with the specific entity word includes at least one of the following types: structured data, semi-structured data, and unstructured data; correspondingly, the knowledge extraction module 501 is specifically configured to extract knowledge from a data source associated with a specific entity word in a knowledge extraction manner corresponding to a data source type, where different data source types correspond to different knowledge extraction methods.
According to an embodiment of the present invention, the knowledge extraction module 501 is specifically configured to: if the data source type is structured data, extract knowledge from a relational database using a D2R method or extract knowledge from linked data using a graph mapping method; and/or, if the data source type is semi-structured data, extract knowledge from the semi-structured data using a wrapper; and/or, if the data source type is unstructured data, extract knowledge from free text using an information extraction method.
According to an embodiment of the present invention, the fusion module 502 includes: the judging unit is used for judging whether the obtained triple of the specific entity word is contained in the existing triple in the concept map; and the adding unit is used for adding the obtained triple of the specific entity word into the concept map to obtain a fusion result if the obtained triple of the specific entity word is not contained in the existing triple in the concept map.
according to an embodiment of the present invention, the determining unit is specifically configured to perform word expansion on an entity word in the obtained triple of the specific entity word to obtain an expanded triple; and judging whether the triples of the expanded words are contained in the existing triples in the concept map.
According to an embodiment of the present invention, the knowledge generation engine includes a knowledge base query engine, a neural network translation engine, and an online translation engine. Correspondingly, the knowledge generation module 504 is specifically configured to: translate the extended triples in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results; perform fusion comparison on the obtained processing results to obtain a fusion comparison result; and determine the highest-ranked processing result in the fusion comparison result as the translation result.
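The multi-engine translation step can be sketched as follows. The ranking rule is not defined in this section, so the sketch assumes a simple one: the candidate returned by the most engines wins, and ties go to the engine listed first. The three dictionary-backed engines are toy stand-ins for a knowledge base query engine, a neural network translation engine, and an online translation engine; every name here is hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]
Engine = Callable[[str], Optional[str]]

# Toy engines: each maps a source term to a translation, or None if unknown.
KB_ENGINE: Engine = {"苹果": "apple"}.get
NMT_ENGINE: Engine = {"苹果": "apple", "是一种": "is a kind of", "水果": "fruit"}.get
ONLINE_ENGINE: Engine = {"苹果": "Apple Inc.", "是一种": "is a kind of", "水果": "fruits"}.get


def translate_term(term: str, engines: Sequence[Engine]) -> str:
    """Collect one candidate per engine, rank them, and return the top one."""
    candidates = [c for c in (engine(term) for engine in engines) if c]
    if not candidates:
        return term  # assumption: keep the source term when no engine answers
    votes = Counter(candidates)
    # Rank by vote count, then by the position of the first engine that said it.
    ranked = sorted(votes, key=lambda c: (-votes[c], candidates.index(c)))
    return ranked[0]


def translate_triple(triple: Triple, engines: Sequence[Engine]) -> Triple:
    """Translate head, relation, and tail independently."""
    head, relation, tail = triple
    return (translate_term(head, engines),
            translate_term(relation, engines),
            translate_term(tail, engines))


ENGINES = [KB_ENGINE, NMT_ENGINE, ONLINE_ENGINE]
print(translate_triple(("苹果", "是一种", "水果"), ENGINES))
```

Listing the knowledge base engine first gives curated translations priority whenever the vote is tied, which is one plausible design choice among several.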
According to an embodiment of the present invention, as shown in fig. 5, the apparatus 50 further includes a query module 506, configured to read the written translation result from the storage service by using a query engine that matches the storage engine.
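To make the pairing of storage engine and query engine concrete, here is a minimal sketch that uses an in-memory SQLite database as the storage service. The choice of SQLite, the `triples` table layout, and both class names are assumptions for illustration; the embodiment does not name a specific storage service.

```python
import sqlite3
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


class SqliteStorageEngine:
    """Writes translated triples into a hypothetical 'triples' table."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS triples ("
            "head TEXT, relation TEXT, tail TEXT, "
            "UNIQUE(head, relation, tail))"
        )

    def write(self, triples: Iterable[Triple]) -> None:
        # INSERT OR IGNORE skips triples that are already stored.
        self.connection.executemany(
            "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", list(triples)
        )
        self.connection.commit()


class SqliteQueryEngine:
    """Matches SqliteStorageEngine: it reads the same table layout."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def read(self, head: str) -> List[Triple]:
        cursor = self.connection.execute(
            "SELECT head, relation, tail FROM triples "
            "WHERE head = ? ORDER BY relation, tail",
            (head,),
        )
        return [tuple(row) for row in cursor.fetchall()]


connection = sqlite3.connect(":memory:")
SqliteStorageEngine(connection).write(
    [("apple", "is_a", "fruit"), ("apple", "color", "red")]
)
print(SqliteQueryEngine(connection).read("apple"))
```

The query engine "matches" the storage engine in the sense that both agree on the table layout; a graph database or a document store would need its own matching pair.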
Here, it should be noted that the above description of the embodiment of the knowledge graph construction apparatus is similar to the description of the method embodiment shown in fig. 2 and has similar beneficial effects. For technical details not disclosed in the apparatus embodiment, refer to the description of the method embodiment shown in fig. 2; they are not repeated here for brevity.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative. For example, the division of the units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be completed by hardware related to program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disk, or other various media that can store program code.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can easily conceive of within the technical scope disclosed by the present invention shall be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A method of knowledge graph construction, the method comprising:
extracting knowledge from a data source associated with a specific entity word to obtain a triple of the specific entity word;
fusing the obtained triple of the specific entity word with existing triples in a concept graph to obtain a fusion result;
performing upper and lower inference on the obtained fusion result to obtain an inference result that extends the triples of the specific entity word;
translating the extended triples in the inference result by using a knowledge generation engine to obtain a translation result; and
writing the obtained translation result into a storage service through a storage engine.
2. The method of claim 1, wherein the data source associated with the specific entity word comprises at least one of the following types: structured data, semi-structured data, and unstructured data;
correspondingly, extracting knowledge from the data source associated with the specific entity word comprises:
extracting knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
3. The method of claim 2, wherein extracting knowledge from the data source associated with the specific entity word comprises:
if the data source type is structured data, extracting knowledge from a relational database by using a D2R method, or extracting knowledge from linked data by using a graph mapping method;
and/or, if the data source type is semi-structured data, extracting knowledge from the semi-structured data by using a wrapper;
and/or, if the data source type is unstructured data, extracting knowledge from free text by using an information extraction method.
4. The method according to claim 1, wherein fusing the obtained triple of the specific entity word with the existing triples in the concept graph to obtain a fusion result comprises:
judging whether the obtained triple of the specific entity word is contained in the existing triples in the concept graph; and
if the obtained triple of the specific entity word is not contained in the existing triples in the concept graph, adding the obtained triple of the specific entity word to the concept graph to obtain the fusion result.
5. The method of claim 4, wherein judging whether the obtained triple of the specific entity word is contained in the existing triples in the concept graph comprises:
performing word expansion on the entity words in the obtained triple of the specific entity word to obtain a word-expanded triple; and
judging whether the word-expanded triple is contained in the existing triples in the concept graph.
6. The method of claim 1, wherein the knowledge generation engine comprises a knowledge base query engine, a neural network translation engine, and an online translation engine;
correspondingly, translating the extended triples in the inference result by using the knowledge generation engine to obtain the translation result comprises:
translating the extended triples in the inference result by using a plurality of different knowledge generation engines to obtain a plurality of processing results;
performing fusion comparison on the obtained processing results to obtain a fusion comparison result; and
determining the highest-ranked processing result in the fusion comparison result as the translation result.
7. The method of any of claims 1 to 6, wherein after writing the obtained translation result into the storage service through the storage engine, the method further comprises:
reading the written translation result from the storage service by using a query engine that matches the storage engine.
8. An apparatus for knowledge graph construction, the apparatus comprising:
a knowledge extraction module, configured to extract knowledge from a data source associated with a specific entity word to obtain a triple of the specific entity word;
a fusion module, configured to fuse the obtained triple of the specific entity word with existing triples in a concept graph to obtain a fusion result;
an inference module, configured to perform upper and lower inference on the obtained fusion result to obtain an inference result that extends the triples of the specific entity word;
a knowledge generation module, configured to translate the extended triples in the inference result by using a knowledge generation engine to obtain a translation result; and
a storage module, configured to write the obtained translation result into a storage service through a storage engine.
9. The apparatus of claim 8, wherein the data source associated with the specific entity word comprises at least one of the following types: structured data, semi-structured data, and unstructured data;
correspondingly, the knowledge extraction module is specifically configured to extract knowledge from the data source associated with the specific entity word in a knowledge extraction manner corresponding to the type of the data source, wherein different data source types correspond to different knowledge extraction methods.
10. A computer-readable storage medium comprising a set of computer-executable instructions that, when executed, perform the method of knowledge graph construction of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910875545.XA CN110569371A (en) | 2019-09-17 | 2019-09-17 | Knowledge graph construction method and device and storage equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910875545.XA CN110569371A (en) | 2019-09-17 | 2019-09-17 | Knowledge graph construction method and device and storage equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110569371A (en) | 2019-12-13 |
Family ID: 68780587
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910875545.XA Pending CN110569371A (en) | 2019-09-17 | 2019-09-17 | Knowledge graph construction method and device and storage equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110569371A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222918A (en) * | 2020-01-04 | 2020-06-02 | 厦门二五八网络科技集团股份有限公司 | Keyword mining method, device, electronic device and storage medium |
CN111444181A (en) * | 2020-03-20 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Knowledge graph updating method and device and electronic equipment |
CN111767440A (en) * | 2020-09-03 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Vehicle portrayal method based on knowledge graph, computer equipment and storage medium |
CN111897972A (en) * | 2020-08-06 | 2020-11-06 | 南方电网科学研究院有限责任公司 | A method and device for visualizing data trajectory |
CN112380864A (en) * | 2020-11-03 | 2021-02-19 | 广西大学 | Text triple labeling sample enhancement method based on translation |
CN114386607A (en) * | 2020-10-16 | 2022-04-22 | 北京鸿享技术服务有限公司 | Knowledge representation method, system, storage medium, and computer device |
CN118153961A (en) * | 2024-04-02 | 2024-06-07 | 国网江苏省电力有限公司南通供电分公司 | A method, device and storage medium for constructing a knowledge graph of a measurement site |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070250493A1 (en) * | 2006-04-19 | 2007-10-25 | Peoples Bruce E | Multilingual data querying |
CN103678714A (en) * | 2013-12-31 | 2014-03-26 | 北京百度网讯科技有限公司 | Construction method and device for entity knowledge base |
CN107368468A (en) * | 2017-06-06 | 2017-11-21 | 广东广业开元科技有限公司 | A kind of generation method and system of O&M knowledge mapping |
CN109271529A (en) * | 2018-10-10 | 2019-01-25 | 内蒙古大学 | Cyrillic Mongolian and the double language knowledge mapping construction methods of traditional Mongolian |
CN109378053A (en) * | 2018-11-30 | 2019-02-22 | 安徽影联云享医疗科技有限公司 | A kind of knowledge mapping construction method for medical image |
CN110008355A (en) * | 2019-04-11 | 2019-07-12 | 华北科技学院 | Disaster scene information fusion method and device based on knowledge graph |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111222918A (en) * | 2020-01-04 | 2020-06-02 | 厦门二五八网络科技集团股份有限公司 | Keyword mining method, device, electronic device and storage medium |
CN111222918B (en) * | 2020-01-04 | 2023-06-30 | 厦门二五八网络科技集团股份有限公司 | Keyword mining method and device, electronic equipment and storage medium |
CN111444181A (en) * | 2020-03-20 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Knowledge graph updating method and device and electronic equipment |
CN111897972A (en) * | 2020-08-06 | 2020-11-06 | 南方电网科学研究院有限责任公司 | A method and device for visualizing data trajectory |
CN111897972B (en) * | 2020-08-06 | 2023-10-17 | 南方电网科学研究院有限责任公司 | A data trajectory visualization method and device |
CN111767440A (en) * | 2020-09-03 | 2020-10-13 | 平安国际智慧城市科技股份有限公司 | Vehicle portrayal method based on knowledge graph, computer equipment and storage medium |
CN114386607A (en) * | 2020-10-16 | 2022-04-22 | 北京鸿享技术服务有限公司 | Knowledge representation method, system, storage medium, and computer device |
CN112380864A (en) * | 2020-11-03 | 2021-02-19 | 广西大学 | Text triple labeling sample enhancement method based on translation |
CN118153961A (en) * | 2024-04-02 | 2024-06-07 | 国网江苏省电力有限公司南通供电分公司 | A method, device and storage medium for constructing a knowledge graph of a measurement site |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110569371A (en) | Knowledge graph construction method and device and storage equipment | |
US8868609B2 (en) | Tagging method and apparatus based on structured data set | |
US10180967B2 (en) | Performing application searches | |
JP5576003B1 (en) | Corpus generation device, corpus generation method, and corpus generation program | |
CN107562600B (en) | Page detection method and device, computing equipment and storage medium | |
CN112463991B (en) | Historical behavior data processing method and device, computer equipment and storage medium | |
CN111258577B (en) | Page rendering method, device, electronic equipment and storage medium | |
CN111198852A (en) | Knowledge graph driven metadata relation reasoning method under micro-service architecture | |
CN114021042A (en) | Web page content extraction method, device, computer equipment and storage medium | |
CN113157899B (en) | Big data portrait analysis method, server and readable storage medium | |
CN112463986A (en) | Information storage method and device | |
CN109598171A (en) | A kind of data processing method based on two dimensional code, apparatus and system | |
CN107273548A (en) | The implementation method and device of dynamic page | |
CN109191158A (en) | The processing method and processing equipment of user's portrait label data | |
US20190079649A1 (en) | Ui rendering based on adaptive label text infrastructure | |
US9674259B1 (en) | Semantic processing of content for product identification | |
Zhang et al. | Annotating needles in the haystack without looking: Product information extraction from emails | |
CN111652658A (en) | Portrait fusion method, apparatus, electronic device and computer readable storage medium | |
CN113505889B (en) | Processing method and device of mapping knowledge base, computer equipment and storage medium | |
Comas‐Forgas et al. | ‘AI‐navigating’or ‘AI‐sinking’? An analysis of verbs in research articles titles suspicious of containing AI‐generated/assisted content | |
CN112988986A (en) | Man-machine interaction method, device and equipment | |
CN114510563B (en) | A method and device for extracting abstract text | |
Marinho et al. | Labelled network subgraphs reveal stylistic subtleties in written texts | |
Avignone et al. | Generation of textual/video descriptions for technological products based on structured data | |
CN119336408B (en) | Interface configuration method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20191213 |