
CN111221979A - Medicine knowledge graph construction method and system - Google Patents

Publication number
CN111221979A
Authority
CN
China
Prior art keywords
entity
drug
knowledge
entities
medicine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911421839.1A
Other languages
Chinese (zh)
Other versions
CN111221979B (en)
Inventor
刘大海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zuoyi Health Technology Co Ltd
Original Assignee
Beijing Zuoyi Health Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zuoyi Health Technology Co Ltd
Priority to CN201911421839.1A
Publication of CN111221979A
Application granted
Publication of CN111221979B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for constructing a drug knowledge graph, comprising the following steps: determining several categories of entities related to drug knowledge, each entity comprising several knowledge elements; determining relationship/attribute labels that reflect the relationships between the knowledge elements of the entities; acquiring drug specification data; establishing a matching template and using it to match the drug specification data to obtain triples, and/or manually processing the drug specification data to obtain triples; and fusing and storing the triples to obtain the drug knowledge graph. In this method, triples are obtained by processing the drug specification data with the matching template and with manual labeling, used separately or in combination. The drug specification data is simplified and merged before manual labeling, which greatly reduces the amount of text and therefore the manual labeling workload. After matching/labeling, alignment processing, conflict processing and the like eliminate inconsistent expressions and conflicting content, which safeguards the accuracy of the extracted knowledge.

Description

Medicine knowledge graph construction method and system
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a drug knowledge graph construction method and system and a computer-readable storage medium.
Background
A knowledge graph is essentially a knowledge base in the form of a semantic network, and can be understood simply as a multi-relational graph. A knowledge base is a special database for knowledge management that facilitates the collection, organization and extraction of knowledge in a given field.
In a knowledge graph, the nodes are usually called "entities" and the edges are called "relationships/attributes". An entity is a real-world thing such as a place name, a concept, a drug, a component or a company; a relationship expresses a connection between different entities; an attribute describes an intrinsic feature of an entity. Attributes and relationships are sometimes interchangeable.
The prior art lacks a high-quality drug knowledge graph. Many existing construction schemes extract knowledge from drug specifications by string matching or with an entity recognition model, and both perform poorly: either approach yields a large amount of erroneous and missing data. Entity recognition also requires samples to be labeled manually before a model can be trained. Drug labeling is a very complex scenario that needs thousands to tens of thousands of labeled samples, and in practice the resulting extraction is still less accurate than string matching.
The pharmaceutical industry is a special one: erroneous or missing data can create serious health hazards. Statistics show that every year 2.5 million people in China suffer harm to their health from medication errors, of whom 200,000 die, twice the number killed in traffic accidents nationwide. The accuracy of drug knowledge is therefore particularly important.
Disclosure of Invention
In view of the above, the present invention is directed to a method and a system for constructing a drug knowledge graph, which can construct a drug knowledge graph conveniently and efficiently, reduce the amount of information to be processed by fusing/merging drug specification data during the construction process, and avoid inconsistent expressions and errors in the drug knowledge graph by fusing template matching and manual labeling results.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a method for constructing a knowledge graph of a drug, comprising:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements;
determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity;
acquiring drug specification data;
establishing a matching template and using it to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple consists of two knowledge elements and the relationship/attribute label between them;
and fusing and storing the triples to obtain the medicine knowledge graph.
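As a minimal sketch of what the steps above produce, the following Python snippet represents triples and fuses them into a graph. The `Triple` type, the `build_graph` helper and the adjacency-map storage are assumptions made for illustration; the source does not prescribe a storage format.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # first knowledge element
    relation: str  # relationship/attribute label
    tail: str      # second knowledge element


def build_graph(triples):
    """Fuse triples into an adjacency map: head -> relation -> set of tails."""
    graph = defaultdict(lambda: defaultdict(set))
    for t in triples:
        # Sets absorb exact duplicates, e.g. the same fact found twice.
        graph[t.head][t.relation].add(t.tail)
    return graph


drug = "ibuprofen sustained-release capsules"
triples = [
    Triple(drug, "drug indications", "headache"),
    Triple(drug, "drug indications", "toothache"),
    Triple(drug, "drug indications", "headache"),  # duplicate
]
graph = build_graph(triples)
```

A graph database could replace the in-memory map; the triple shape stays the same.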
Further, the determining several categories of entities about drug knowledge includes:
enumerating the knowledge elements related to the drug;
determining the entity by categorizing the knowledge elements.
Further, the entities include at least one or several of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
Further, the determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity includes:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
Further, the determining several categories of entities about drug knowledge includes:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data having the same generic name into one piece of drug specification data.
Further, the establishing and using of a matching template to match the drug specification data to obtain a triple comprises:
the matching template includes:
a character slot, which defines the number and type of characters used for matching the drug specification data;
a dictionary slot, which defines a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, which are combined with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
Further, the obtaining of the triple by manually processing the drug specification data includes:
manually labeling the knowledge elements of the entities in the drug specification data and storing them in a preset table;
and converting the table into triples.
Further, the fusing and saving of the triples to obtain the drug knowledge graph comprises:
in the case where the drug specification data is processed manually, or is processed both manually and with a matching template, performing at least one of the following fusion processes:
alignment processing, which merges similar knowledge elements;
and conflict processing, which constructs conflict rules based on objective facts, detects conflicting triples according to the conflict rules, and resolves the conflicts manually.
The invention also discloses a drug knowledge graph construction system, which comprises:
a drug knowledge system building module for:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements; and determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity;
a drug knowledge acquisition module to:
acquiring drug specification data; and
establishing a matching template and using it to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple consists of two knowledge elements and the relationship/attribute label between them;
a drug knowledge fusion module for fusing the triples;
and a drug knowledge storage module for storing the triples to obtain the drug knowledge graph.
Further, the determining several categories of entities about drug knowledge includes:
enumerating the knowledge elements related to the drug;
determining the entity by categorizing the knowledge elements.
Further, the entities include at least one or several of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
Further, the determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity includes:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
Further, the determining several categories of entities about drug knowledge includes:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data having the same generic name into one piece of drug specification data.
Further, the establishing and using of a matching template to match the drug specification data to obtain a triple comprises:
the matching template includes:
a character slot, which defines the number and type of characters used for matching the drug specification data;
a dictionary slot, which defines a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, which are combined with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
Further, the obtaining of the triple by manually processing the drug specification data includes:
manually labeling the knowledge elements of the entities in the drug specification data and storing them in a preset table;
and converting the table into triples.
Further, the fusing and saving of the triples to obtain the drug knowledge graph comprises:
in the case where the drug specification data is processed manually, or is processed both manually and with a matching template, performing at least one of the following fusion processes:
alignment processing, which merges similar knowledge elements;
and conflict processing, which constructs conflict rules based on objective facts, detects conflicting triples according to the conflict rules, and resolves the conflicts manually.
The invention also discloses a computer readable storage medium, which stores a computer program for executing the medicine knowledge graph construction method of the embodiment.
The invention has at least the following beneficial effects:
In this method, triples are obtained by processing the drug specification data with the matching template and with manual labeling, used separately or in combination. The drug specification data is simplified and merged before manual labeling, which greatly reduces the amount of text and therefore the manual labeling workload. After matching/labeling, alignment processing, conflict processing and the like eliminate inconsistent expressions and conflicting content, which safeguards the accuracy of the extracted knowledge.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention.
In the drawings:
FIG. 1 is a flow chart of a drug knowledge graph construction method according to an embodiment of the invention;
FIG. 2 is a flowchart of a method for determining the entity and the relationship/attribute tag according to an embodiment of the present invention;
FIG. 3 is a flowchart of obtaining the triples according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
As shown in FIG. 1, the invention discloses a method for constructing a knowledge graph of a medicine, which comprises the following steps:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements;
determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity;
acquiring drug specification data;
establishing a matching template and using it to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple consists of two knowledge elements and the relationship/attribute label between them;
and fusing and storing the triples to obtain the medicine knowledge graph.
As shown in FIG. 2, the determining several categories of entities about drug knowledge includes: enumerating the knowledge elements related to the drug; determining the entity by categorizing the knowledge elements.
Alternatively, when the drug knowledge system is constructed, the knowledge elements need not be enumerated; drug knowledge may instead be classified directly according to the experience of drug experts to obtain the various types of entities.
The entities include at least one or several of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
Further, the determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity includes:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
As shown in FIG. 3, the determining several categories of entities about drug knowledge includes:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data having the same generic name into one piece of drug specification data.
Further, the establishing and using of a matching template to match the drug specification data to obtain a triple comprises:
the matching template includes:
a character slot, which defines the number and type of characters used for matching the drug specification data;
a dictionary slot, which defines a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, which are combined with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
As shown in FIG. 3, the obtaining of the triple by manually processing the drug specification data includes:
manually labeling the knowledge elements of the entities in the drug specification data and storing them in a preset table;
and converting the table into triples.
As shown in FIG. 3, the fusing and saving of the triples to obtain the drug knowledge graph includes:
in the case where the drug specification data is processed manually, or is processed both manually and with a matching template, performing at least one of the following fusion processes:
alignment processing, which merges similar knowledge elements;
and conflict processing, which constructs conflict rules based on objective facts, detects conflicting triples according to the conflict rules, and resolves the conflicts manually.
The invention also discloses a drug knowledge graph construction system, which comprises:
a drug knowledge system building module for:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements; and determining relationship/attribute indicia for reflecting relationships between knowledge elements of the entity;
a drug knowledge acquisition module to:
acquiring drug specification data; and
establishing a matching template and using it to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple consists of two knowledge elements and the relationship/attribute label between them;
a drug knowledge fusion module for fusing the triples;
and a drug knowledge storage module for storing the triples to obtain the drug knowledge graph.
Further, the determining several categories of entities about drug knowledge includes:
enumerating the knowledge elements related to the drug;
determining the entity by categorizing the knowledge elements.
Further, the entities include at least one or several of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
Further, the determining a relationship/attribute flag for reflecting a relationship between knowledge elements of the entity includes:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
Further, the determining several categories of entities about drug knowledge includes:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data having the same generic name into one piece of drug specification data.
Existing drug specifications contain too much data for limited manpower to label. Drugs from different manufacturers that share a generic name have the same components, so their specifications are highly repetitive. In the invention, the drug specification data is merged by generic name before manual labeling, so that labeling is done once per generic name. After merging, the number of generic names of all drugs on the market is fewer than 15,000, a volume that manual labeling can handle.
Further, the establishing and using of a matching template to match the drug specification data to obtain a triple comprises:
the matching template includes:
a character slot, which defines the number and type of characters used for matching the drug specification data;
a dictionary slot, which defines a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, which are combined with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
Further, the obtaining of the triple by manually processing the drug specification data includes:
manually labeling the knowledge elements of the entities in the drug specification data and storing them in a preset table;
and converting the table into triples.
Further, the fusing and saving of the triples to obtain the drug knowledge graph comprises:
in the case where the drug specification data is processed manually, or is processed both manually and with a matching template, performing at least one of the following fusion processes:
alignment processing, which merges similar knowledge elements;
and conflict processing, which constructs conflict rules based on objective facts, detects conflicting triples according to the conflict rules, and resolves the conflicts manually.
The invention also discloses a computer readable storage medium, which stores a computer program for executing the medicine knowledge graph construction method of the embodiment.
As shown in fig. 1, fig. 2 and fig. 3, the following provides a preferred embodiment of the present invention for explaining the technical solution of the present invention in detail:
(1) A pharmacist lists, from experience and learned knowledge, all elements (knowledge elements) related to drug knowledge, such as: ibuprofen sustained-release capsules, ibuprofen, sustained-release capsule, headache, arthralgia, toothache, fever, 0.3 g, Fenbid, component, prohibited in pregnant women, etc.
(2) The listed elements are classified to determine the different types of entities. The elements are organized into a classification system whose concepts are represented as the following entities: generic name, trade name, chemical name, specification, approval number, dosage form, administration time, administration mode, precautions, disease, symptom, population, interaction, adverse reaction, drug category, component, solvent, compatibility and incompatibility.
(3) After the classification system is determined, attributes and relationships are defined for each category, i.e., the relationship/attribute labels (relationship names) between knowledge elements of the entities are determined:
defining "drug trade name" as the relationship name between a generic name entity and a trade name entity;
defining "drug chemical name" as the relationship name between a generic name entity and a chemical name entity;
defining "approval number" as the relationship name between a generic name entity and an approval number entity;
defining "pharmaceutical dosage form" as the relationship name between a generic name entity and a dosage form entity;
defining "administration time" as the relationship name between a generic name entity and an administration time entity;
defining "administration method" as the relationship name between a generic name entity and an administration mode entity;
defining "precautions for use" as the relationship name between a generic name entity and a precautions entity;
defining "drug indications" as the relationship name between a generic name entity and a symptom entity;
defining "drug prohibited symptom" as the relationship name between a generic name entity and a symptom entity;
defining "drug contraindicated symptom" as the relationship name between a generic name entity and a symptom entity;
defining "drug caution symptom" as the relationship name between a generic name entity and a symptom entity;
defining "adverse drug reactions" as the relationship name between a generic name entity and a symptom entity;
defining "drug indicated disease" as the relationship name between a generic name entity and a disease entity;
defining "drug prohibited disease" as the relationship name between a generic name entity and a disease entity;
defining "drug contraindicated disease" as the relationship name between a generic name entity and a disease entity;
defining "drug caution disease" as the relationship name between a generic name entity and a disease entity;
defining "adverse drug reactions" as the relationship name between a generic name entity and a disease entity;
defining "drug indicated population" as the relationship name between a generic name entity and a population entity;
defining "drug prohibited population" as the relationship name between a generic name entity and a population entity;
defining "drug contraindicated population" as the relationship name between a generic name entity and a population entity;
defining "drug caution population" as the relationship name between a generic name entity and a population entity;
defining "drug category" as the relationship name between a generic name entity and a drug category entity;
defining "drug components" as the relationship name between a generic name entity and a component entity;
defining "drug specification" as the relationship name between an approval number entity and a specification entity;
defining "use together with caution" as a relationship name between a component entity and a component entity;
defining "prohibited from use together" as a relationship name between a component entity and a component entity;
defining "may be used together" as a relationship name between a component entity and a component entity;
defining "precautions when used together" as a relationship name between a component entity and a component entity;
defining "use together with caution" as a relationship name between a component entity and a category entity;
defining "prohibited from use together" as a relationship name between a component entity and a category entity;
defining "may be used together" as a relationship name between a component entity and a category entity;
defining "precautions when used together" as a relationship name between a component entity and a category entity;
defining "solvent" as a relationship name between a generic name entity and a generic name entity;
defining "compatibility prohibited" as a relationship name between a generic name entity and a generic name entity;
defining "compatibility contraindicated" as a relationship name between a generic name entity and a generic name entity;
defining "compatibility with caution" as a relationship name between a generic name entity and a generic name entity;
defining "applicable gender" as the relationship name between a generic name entity and a gender entity.
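The relationship definitions above amount to a schema that says which entity types a relationship name may connect. A minimal sketch follows, assuming the schema is kept as a set of (head type, relation, tail type) entries; the `SCHEMA` and `is_valid` names and the entity type strings are hypothetical, and only a few of the listed relations are included.

```python
# (head entity type, relationship name, tail entity type)
# A small subset of the relationship names listed in the text.
SCHEMA = {
    ("generic name", "drug trade name", "trade name"),
    ("generic name", "pharmaceutical dosage form", "dosage form"),
    ("approval number", "drug specification", "specification"),
    ("generic name", "applicable gender", "gender"),
}


def is_valid(head_type, relation, tail_type):
    """Return True if the schema allows this relation between these types."""
    return (head_type, relation, tail_type) in SCHEMA


ok = is_valid("generic name", "drug trade name", "trade name")
bad = is_valid("trade name", "drug specification", "gender")
```

Checking candidate triples against such a schema is one way to reject extractions whose entity types do not fit the defined relationship.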
(4) Electronic specification texts, photographs of specifications, electronic books and the like are acquired as drug knowledge data.
(5) Specifications with the same generic name are merged. When several specifications are found for drugs with the same generic name, the specification with the latest revision date is selected as the specification for that generic name. If the revision dates are the same, the specification with the longest text is selected.
(6) A matching template is constructed, the entities of drug knowledge are acquired, and the slot dictionaries are built.
(7) The drug specifications and books are matched, and the matching results are structured into triples.
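The selection rule in step (5) can be sketched as follows. The record layout (`generic_name`, `revision_date`, `text`), the function name and the sample records are assumptions made for illustration, not data from the source.

```python
from datetime import date


def fuse_by_generic_name(specs):
    """Keep one specification per generic name.

    Preference order: latest revision date, then longest text.
    """
    best = {}
    for spec in specs:
        rank = (spec["revision_date"], len(spec["text"]))
        current = best.get(spec["generic_name"])
        if current is None or rank > current[0]:
            best[spec["generic_name"]] = (rank, spec)
    return {name: spec for name, (_, spec) in best.items()}


name = "ibuprofen sustained-release capsules"
# Made-up sample records for illustration only.
specs = [
    {"generic_name": name, "revision_date": date(2015, 3, 1), "text": "older leaflet text"},
    {"generic_name": name, "revision_date": date(2018, 6, 1), "text": "short"},
    {"generic_name": name, "revision_date": date(2018, 6, 1), "text": "longer leaflet text"},
]
fused = fuse_by_generic_name(specs)
```

Comparing the tuple `(revision_date, len(text))` applies both criteria in one step, because Python compares tuples element by element.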
Such as: when the forbidden data is extracted, the template is constructed as follows:
a { w:0,5} [ disease ] is disabled
B { w:0,5} [ symptom ] is disabled
C { w:0,5} [ scope ] is disabled
Here, { w:0,5} means 0 to 5 arbitrary characters, "w" means that the slot type is a character type, "0, 5" means that the slot length is 0 at minimum and 5 at maximum, [ disease ], [ symmetry ], [ scope ] is a slot dictionary, [ disease ] represents all diseases, [ symmetry ] represents all symptoms, and [ scope ] represents all people. The algorithm matches sentences meeting the template, such as the sentences in the specification of the ibuprofen sustained-release capsule: "forbidden to other persons allergic to non-steroidal anti-inflammatory drugs", "fit { w:0,5} to other" in the [ scope ] dictionary, the "persons allergic to non-steroidal anti-inflammatory drugs" is collected in advance, and thus the sentence hits template C above. The universal name of the medicine can be obtained in the universal name field of the medicine specification, and then the knowledge can be extracted:
(ibuprofen sustained-release capsules, medicine non-use population, persons allergic to non-steroidal anti-inflammatory drugs)
This knowledge is expressed as a triple.
The templates in this example are relatively simple; in practice, many more complex templates can be constructed according to the specifications and their regularities.
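The template matching described above can be sketched with regular expressions. The sketch rests on several assumptions: a character slot is compiled to a bounded wildcard, a dictionary slot to an alternation over its entries, and the whole sentence must match the template. The function names and the English example sentence (reordered so that the keyword comes last, as in the template) are hypothetical.

```python
import re

# Matches a character slot such as {w:0,5} or a dictionary slot such as [scope].
SLOT = re.compile(r"\{\s*w:(\d+),\s*(\d+)\s*\}|\[\s*(\w+)\s*\]")

def compile_template(template, dictionaries):
    """Translate a slot template into a compiled regular expression."""
    parts, pos = [], 0
    for slot in SLOT.finditer(template):
        # Literal text between slots (the keyword) is matched verbatim.
        parts.append(re.escape(template[pos:slot.start()]))
        if slot.group(3) is None:
            # Character slot: between min and max arbitrary characters.
            parts.append(".{%s,%s}" % (slot.group(1), slot.group(2)))
        else:
            # Dictionary slot: any entry of the named dictionary, longest first.
            name = slot.group(3)
            entries = sorted(dictionaries[name], key=len, reverse=True)
            parts.append("(?P<%s>%s)" % (name, "|".join(re.escape(e) for e in entries)))
        pos = slot.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))

def extract_triple(generic_name, relation, sentence, pattern, slot_name):
    """Return (generic name, relation, matched entry), or None on a miss."""
    match = pattern.fullmatch(sentence)
    if match is None:
        return None
    return (generic_name, relation, match.group(slot_name))

dictionaries = {
    "scope": [
        "persons allergic to non-steroidal anti-inflammatory drugs",
        "pregnant women",
    ]
}
template_c = compile_template("{w:0,5}[scope] is disabled", dictionaries)
hit = extract_triple(
    "ibuprofen sustained-release capsules",
    "medicine non-use population",
    "for persons allergic to non-steroidal anti-inflammatory drugs is disabled",
    template_c,
    "scope",
)
miss = extract_triple(
    "ibuprofen sustained-release capsules",
    "medicine non-use population",
    "strictly for pregnant women is disabled",  # prefix longer than 5 characters
    template_c,
    "scope",
)
```

Sorting the dictionary entries longest first lets a longer entry take precedence over a shorter entry that is its prefix.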
(8) Pharmacists read drug knowledge sources such as drug specifications, mark the knowledge elements of each entity category, and store them in a table.
(9) The pharmacist annotation data is converted into triples.
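The conversion in step (9) can be sketched as follows. The table layout is not given in the text, so the sketch assumes one row per drug, a "generic name" column, one column per relationship name, and semicolon-separated knowledge elements in each cell; all of these are illustrative assumptions.

```python
def table_to_triples(rows, head_column="generic name", separator=";"):
    """Convert annotated table rows into (head, relation, tail) triples."""
    triples = []
    for row in rows:
        head = row[head_column]
        for relation, cell in row.items():
            if relation == head_column:
                continue
            # One cell may hold several marked knowledge elements.
            for element in cell.split(separator):
                element = element.strip()
                if element:
                    triples.append((head, relation, element))
    return triples

# Hypothetical pharmacist annotation for one drug.
rows = [
    {
        "generic name": "ibuprofen sustained-release capsules",
        "medicine non-use population": "persons allergic to non-steroidal anti-inflammatory drugs; pregnant women",
        "applicable gender": "",
    }
]
annotated_triples = table_to_triples(rows)
```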
(10) For each category of entities, the similarity between each entity and all other entities in the same category is calculated. Surface-form (character-level) similarity is calculated with cosine similarity, and semantic similarity is calculated with word vectors trained on medical data; if either of the two similarities is high, the two entities are grouped together. A normalization dictionary, also called a synonym dictionary, is thus constructed.
The final dictionary format is shown in Table 2:
Standard expression | Other expression
0.9% sodium chloride injection | 0.9% sodium chloride needle
0.9% sodium chloride injection | Sodium chloride injection (0.9%)
0.9% sodium chloride injection | Physiological sodium chloride injection
TABLE 2
In Table 2, the first column is the standard expression after normalization, and the second column lists other expressions of the same entity.
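The construction of the normalization dictionary in step (10) can be sketched as follows. Several points are assumptions, not taken from the text: the surface-form similarity is computed over character bigram counts, the thresholds are 0.9, the earliest-listed entity of each group is used as the standard expression, and the word vectors are toy values standing in for vectors trained on medical data.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as mappings."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def char_similarity(a, b):
    """Surface-form similarity over character bigram counts."""
    grams = lambda s: Counter(s[i:i + 2] for i in range(len(s) - 1))
    return cosine(grams(a), grams(b))

def semantic_similarity(a, b, vectors):
    """Word-vector similarity; 0.0 when either entity has no vector."""
    if a not in vectors or b not in vectors:
        return 0.0
    return cosine(dict(enumerate(vectors[a])), dict(enumerate(vectors[b])))

def build_normalization_dictionary(entities, vectors, char_t=0.9, sem_t=0.9):
    """Map each non-standard expression to the standard one of its group."""
    order = {e: i for i, e in enumerate(entities)}
    parent = {e: e for e in entities}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path compression
            e = parent[e]
        return e

    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            # Two entities are grouped if EITHER similarity is high.
            if char_similarity(a, b) >= char_t or semantic_similarity(a, b, vectors) >= sem_t:
                ra, rb = find(a), find(b)
                if ra != rb:
                    keep, drop = sorted((ra, rb), key=order.get)
                    parent[drop] = keep  # earliest-listed entity stays the root
    return {e: find(e) for e in entities if find(e) != e}

# Hypothetical entities of one category and toy word vectors.
entities = ["aspirin tablet", "aspirin tablets", "ASA tab", "ibuprofen capsule"]
vectors = {
    "aspirin tablet": [1.0, 0.0],
    "ASA tab": [0.9, 0.1],
    "ibuprofen capsule": [0.0, 1.0],
}
normalization = build_normalization_dictionary(entities, vectors)
```

In this toy example, "aspirin tablets" joins the group through surface-form similarity, while "ASA tab" joins it only through the word vectors.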
(11) Use the normalization dictionary to align entities between the machine-mined knowledge and the pharmacist-labeled knowledge, resolving the problem of different expressions of the same entity.
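Entity alignment with the normalization dictionary can be sketched as a lookup applied to both ends of every triple. The relationship name "dosage form" and the tail "injection" below are hypothetical; the drug names follow Table 2.

```python
def align_triples(triples, normalization):
    """Replace every expression by its standard form and drop duplicates."""
    aligned = {
        (normalization.get(head, head), relation, normalization.get(tail, tail))
        for head, relation, tail in triples
    }
    return sorted(aligned)

# Other expression -> standard expression, as in Table 2.
table_2 = {
    "0.9% sodium chloride needle": "0.9% sodium chloride injection",
    "Sodium chloride injection (0.9%)": "0.9% sodium chloride injection",
    "Physiological sodium chloride injection": "0.9% sodium chloride injection",
}
machine_mined = [("0.9% sodium chloride needle", "dosage form", "injection")]
pharmacist_labeled = [("Physiological sodium chloride injection", "dosage form", "injection")]
aligned = align_triples(machine_mined + pharmacist_labeled, table_2)
```

After alignment, the two differently worded triples collapse into a single triple on the standard expression.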
(12) Construct conflict detection rules and extract the conflicting data. For example: a disease cannot be both an applicable disease and a contraindicated or use-with-caution disease for the same drug; drugs for women should not be indicated for male disorders; a drug's components should not include its contraindicated components; drugs contraindicated for pregnant women should not treat symptoms of pregnant women; and all data of two different drugs should not be completely identical.
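The first rule in step (12) can be sketched as a set intersection per drug. The relationship names "applicable disease" and "contraindicated disease" and the example data are assumptions; the other rules would need additional entity attributes and are not shown.

```python
from collections import defaultdict

def find_indication_conflicts(triples,
                              indicated="applicable disease",
                              contraindicated="contraindicated disease"):
    """Return (drug, disease) pairs that are both indicated and contraindicated."""
    by_drug = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        by_drug[head][relation].add(tail)
    conflicts = []
    for drug, relations in by_drug.items():
        # A disease present in both sets violates the rule.
        for disease in sorted(relations[indicated] & relations[contraindicated]):
            conflicts.append((drug, disease))
    return conflicts

# Hypothetical triples; "drug A" carries a conflict, "drug B" does not.
triples = [
    ("drug A", "applicable disease", "headache"),
    ("drug A", "applicable disease", "peptic ulcer"),
    ("drug A", "contraindicated disease", "peptic ulcer"),
    ("drug B", "applicable disease", "fever"),
    ("drug B", "contraindicated disease", "asthma"),
]
conflicts = find_indication_conflicts(triples)
```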
(13) In theory, the knowledge obtained by string matching and by manual labeling should be completely consistent, but because both methods make errors, the data on which the two methods disagree must be extracted. For example, if for a certain drug the matching module extracts "contraindicated for pregnant women" while the pharmacist marks "contraindicated for lactating women", the data should be extracted for review.
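The comparison in step (13) can be sketched as a set difference per (drug, relationship) pair. The function name, the report layout, and the drug name are hypothetical.

```python
def find_inconsistencies(machine, manual):
    """Report, per (drug, relation), the tails found by only one method."""
    report = {}
    for key in {(h, r) for h, r, _ in machine | manual}:
        machine_tails = {t for h, r, t in machine if (h, r) == key}
        manual_tails = {t for h, r, t in manual if (h, r) == key}
        if machine_tails != manual_tails:
            report[key] = {
                "machine_only": machine_tails - manual_tails,
                "manual_only": manual_tails - machine_tails,
            }
    return report

# The example from the text: the two methods disagree on the population.
machine = {("drug A", "medicine non-use population", "pregnant women")}
manual = {("drug A", "medicine non-use population", "lactating women")}
report = find_inconsistencies(machine, manual)
```

Each entry in the report is a candidate for the pharmacist review described in the next step.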
(14) Submit the conflict data extracted in steps (12) and (13) to a pharmacist for review; the pharmacist rechecks the specification and corrects the conflicting data.
(15) Import the conflict-corrected data into a Redis graph database in triple form.
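One possible way to prepare triples for import in step (15) is to turn each triple into a parameterized Cypher MERGE statement. The text does not state the query language or client library used, so the node label `Entity`, the relationship type `RELATION`, and the function name are assumptions, and the statement is only built here, not executed.

```python
def triple_to_cypher(head, relation, tail):
    """Build a parameterized Cypher statement that upserts one triple."""
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        "MERGE (h)-[:RELATION {name: $relation}]->(t)"
    )
    # Values are passed as parameters instead of being spliced into the text.
    return query, {"head": head, "relation": relation, "tail": tail}

query, params = triple_to_cypher(
    "ibuprofen sustained-release capsules",
    "medicine non-use population",
    "persons allergic to non-steroidal anti-inflammatory drugs",
)
```

Storing the relationship name as a property keeps the statement fixed for all triples, since a relationship type cannot be supplied as a query parameter in Cypher.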
Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.
Those skilled in the art will understand that all or part of the steps in the methods of the above embodiments may be implemented by a program, which is stored in a storage medium and includes several instructions that enable a single-chip microcomputer, a chip, or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In addition, any combination of various different implementation manners of the embodiments of the present invention is also possible, and the embodiments of the present invention should be considered as disclosed in the embodiments of the present invention as long as the combination does not depart from the spirit of the embodiments of the present invention.

Claims (16)

1. A drug knowledge graph construction method, characterized by comprising the following steps:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements;
determining a relationship/attribute label for reflecting a relationship between knowledge elements of the entities;
acquiring drug specification data;
establishing and using a matching template to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple reflects two knowledge elements and the relationship/attribute label between them;
and fusing and storing the triples to obtain the medicine knowledge graph.
2. The drug knowledge graph building method of claim 1, wherein the determining several categories of entities related to drug knowledge comprises:
enumerating the knowledge elements related to the drug;
determining the entity by categorizing the knowledge elements.
3. The drug knowledge graph building method of claim 2, wherein the entities comprise at least one or more of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
4. The method of claim 1, wherein the determining of the relationship/attribute labels reflecting the relationship between the knowledge elements of the entity comprises:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
The drug knowledge graph building method of claim 1, wherein the determining several categories of entities related to drug knowledge comprises:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data with the same generic name into one piece of drug specification data.
5. The drug knowledge graph construction method of claim 1, wherein the establishing and using a matching template to match the drug specification data to obtain triples comprises:
the matching template includes:
a character slot, used for defining the number and type of characters for matching the drug specification data;
a dictionary slot, used for defining a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, used for combining with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
6. The method of claim 1, wherein the obtaining triples by manually processing the drug specification data comprises:
manually marking knowledge elements of the entities in the drug specification data and storing the knowledge elements in a preset table;
and performing form conversion on the table to obtain the triple.
7. The method of claim 1, wherein the fusing and saving the triples to obtain the drug knowledge-graph comprises:
in the case where the drug specification data is processed manually, or in the case where the drug knowledge data is processed by manual processing and a matching template, at least one of the following fusion processes is performed:
alignment processing for fusing similar knowledge elements;
and conflict processing, which is used for constructing a conflict rule based on objective facts, detecting the conflict error of the triple according to the conflict rule, and eliminating the conflict error in a manual processing mode.
8. A drug knowledge graph construction system, comprising:
a drug knowledge system building module for:
determining a number of categories of entities related to the knowledge of the drug, the entities comprising a number of knowledge elements; and
determining a relationship/attribute label for reflecting a relationship between knowledge elements of the entities;
a drug knowledge acquisition module to:
acquiring drug specification data; and
establishing and using a matching template to match the drug specification data to obtain triples; and/or manually processing the drug specification data to obtain triples; wherein each triple reflects two knowledge elements and the relationship/attribute label between them;
the medicine knowledge fusion module is used for fusing the triples;
and the medicine knowledge storage module is used for storing the triple to obtain the medicine knowledge map.
9. The drug knowledge graph building system of claim 8 wherein the determining several categories of entities related to drug knowledge comprises:
enumerating the knowledge elements related to the drug;
determining the entity by categorizing the knowledge elements.
10. The drug knowledge graph building system of claim 9 wherein the entities comprise at least one or more of the following types: a generic name entity, a trade name entity, a chemical name entity, an approval document entity, a dosage form entity, a specification entity, a mode of administration entity, a time of administration entity, a notice entity, a symptom entity, a disease entity, a population entity, a drug category entity, a component entity, a gender entity, an interaction entity, a drug compatibility entity.
11. The drug knowledge graph building system of claim 8, wherein the determining of the relationship/attribute labels reflecting the relationships between the knowledge elements of the entity comprises:
determining a relationship/attribute label reflecting a relationship between two of said knowledge elements belonging to different or same types of entities.
12. The drug knowledge graph building system of claim 8, wherein the determining several categories of entities related to drug knowledge comprises:
the entities comprise generic name entities, and the generic name entities comprise a plurality of generic names;
the acquiring of the drug specification data comprises:
combining a plurality of pieces of drug specification data with the same generic name into one piece of drug specification data.
13. The drug knowledge-graph building system of claim 8, wherein the establishing and using a matching template to match the drug specification data results in a triplet comprising:
the matching template includes:
a character slot, used for defining the number and type of characters for matching the drug specification data;
a dictionary slot, used for defining a dictionary, the dictionary comprising knowledge elements of an entity;
and auxiliary words, used for combining with the knowledge elements in the dictionary to form keywords for matching the drug specification data.
14. The drug knowledge graph building system of claim 8, wherein the obtaining of triples by manually processing the drug specification data comprises:
manually marking knowledge elements of the entities in the drug specification data and storing the knowledge elements in a preset table;
and performing form conversion on the table to obtain the triple.
15. The drug knowledge graph building system of claim 8, wherein the fusing and saving the triples to obtain the drug knowledge graph comprises:
in the case where the drug specification data is processed manually, or in the case where the drug knowledge data is processed by manual processing and a matching template, at least one of the following fusion processes is performed:
alignment processing for fusing similar knowledge elements;
and conflict processing, which is used for constructing a conflict rule based on objective facts, detecting the conflict error of the triple according to the conflict rule, and eliminating the conflict error in a manual processing mode.
16. A computer-readable storage medium, the storage medium storing a computer program for performing the method of any of the preceding claims 1-7.
CN201911421839.1A 2019-12-31 2019-12-31 Medicine knowledge graph construction method and system Active CN111221979B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911421839.1A CN111221979B (en) 2019-12-31 2019-12-31 Medicine knowledge graph construction method and system

Publications (2)

Publication Number Publication Date
CN111221979A (en) 2020-06-02
CN111221979B (en) 2021-05-28

Family

ID=70826608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911421839.1A Active CN111221979B (en) 2019-12-31 2019-12-31 Medicine knowledge graph construction method and system

Country Status (1)

Country Link
CN (1) CN111221979B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914095A (en) * 2020-06-20 2020-11-10 武汉海云健康科技股份有限公司 Medicine interaction relation extraction method and system
CN111968756A (en) * 2020-07-24 2020-11-20 北京索飞麦迪科技有限公司 Knowledge graph construction method and device for medicine specification
CN111985224A (en) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Medication instruction text processing method, device, equipment and storage medium
CN112053762A (en) * 2020-08-12 2020-12-08 北京左医健康技术有限公司 Medication management method, medication management apparatus, and computer-readable storage medium
CN113076301A (en) * 2021-03-31 2021-07-06 北京搜狗科技发展有限公司 Knowledge base construction method, information query method, device and equipment
WO2021139101A1 (en) * 2020-06-09 2021-07-15 平安科技(深圳)有限公司 Method and apparatus for constructing drug knowledge graph, and computer device
CN114510577A (en) * 2021-12-31 2022-05-17 清华大学 Construction method and device of chemical safety knowledge graph
CN114691887A (en) * 2022-03-21 2022-07-01 浙江大华技术股份有限公司 Attribute triple combination method, apparatus, device and medium
CN114882985A (en) * 2022-07-11 2022-08-09 北京泽桥医疗科技股份有限公司 Medicine multimedia management system and method based on database and AI algorithm identification
CN115344715A (en) * 2022-09-16 2022-11-15 北京富通东方科技有限公司 Medical knowledge map conflict detection method
CN116486939A (en) * 2022-12-26 2023-07-25 北京左医健康技术有限公司 Data mining method and system for medicine knowledge graph and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121545A1 (en) * 2016-09-17 2018-05-03 Cogilex R&D inc. Methods and system for improving the relevance, usefulness, and efficiency of search engine technology
CN110377755A (en) * 2019-07-03 2019-10-25 江苏省人民医院(南京医科大学第一附属医院) Reasonable medication knowledge map construction method based on medicine specification
CN110390021A (en) * 2019-06-13 2019-10-29 平安科技(深圳)有限公司 Drug knowledge mapping construction method, device, computer equipment and storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139101A1 (en) * 2020-06-09 2021-07-15 平安科技(深圳)有限公司 Method and apparatus for constructing drug knowledge graph, and computer device
CN111914095B (en) * 2020-06-20 2024-04-19 武汉海云健康科技股份有限公司 Medicine interaction relation extraction method and system
CN111914095A (en) * 2020-06-20 2020-11-10 武汉海云健康科技股份有限公司 Medicine interaction relation extraction method and system
CN111968756A (en) * 2020-07-24 2020-11-20 北京索飞麦迪科技有限公司 Knowledge graph construction method and device for medicine specification
CN112053762A (en) * 2020-08-12 2020-12-08 北京左医健康技术有限公司 Medication management method, medication management apparatus, and computer-readable storage medium
CN111985224A (en) * 2020-08-31 2020-11-24 平安医疗健康管理股份有限公司 Medication instruction text processing method, device, equipment and storage medium
CN113076301A (en) * 2021-03-31 2021-07-06 北京搜狗科技发展有限公司 Knowledge base construction method, information query method, device and equipment
CN113076301B (en) * 2021-03-31 2025-02-07 北京搜狗科技发展有限公司 A method for building a knowledge base, an information query method, a device and an apparatus
CN114510577A (en) * 2021-12-31 2022-05-17 清华大学 Construction method and device of chemical safety knowledge graph
CN114691887A (en) * 2022-03-21 2022-07-01 浙江大华技术股份有限公司 Attribute triple combination method, apparatus, device and medium
CN114882985A (en) * 2022-07-11 2022-08-09 北京泽桥医疗科技股份有限公司 Medicine multimedia management system and method based on database and AI algorithm identification
CN115344715A (en) * 2022-09-16 2022-11-15 北京富通东方科技有限公司 Medical knowledge map conflict detection method
CN116486939A (en) * 2022-12-26 2023-07-25 北京左医健康技术有限公司 Data mining method and system for medicine knowledge graph and electronic equipment
CN116486939B (en) * 2022-12-26 2024-01-23 北京左医健康技术有限公司 Data mining method and system for medicine knowledge graph and electronic equipment

Also Published As

Publication number Publication date
CN111221979B (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN111221979B (en) Medicine knowledge graph construction method and system
CN111079377B (en) Method for recognizing named entities of Chinese medical texts
CN106919793B (en) Data standardization processing method and device for medical big data
CN107783950B (en) Method and device for processing drug instruction
Zhu et al. Extracting temporal information from online health communities
CN111723570A (en) Medicine knowledge graph construction method and device and computer equipment
CN115293161A (en) Reasonable medicine taking system and method based on natural language processing and medicine knowledge graph
CN106909783A (en) A kind of case history textual medical Methods of Knowledge Discovering Based based on timeline
CN113343680B (en) Structured information extraction method based on multi-type medical record text
CN112035757A (en) Medical waterfall flow pushing method, device, equipment and storage medium
CN113724830A (en) Medicine taking risk detection method based on artificial intelligence and related equipment
CN114420233A (en) Method for extracting post-structured information of Chinese electronic medical record
CN111104481A (en) Method, device and equipment for identifying matching field
WO2024109097A1 (en) Knowledge map creation method and apparatus for patent text, and storage medium and device
Gupta et al. Algorithms for rapid digitalization of prescriptions
CN116168844A (en) Medical data processing system based on big data analysis
CN112149411B (en) Method for constructing body in clinical application field of antibiotics
CN114913956A (en) Repeated medication reminding method and device based on knowledge graph and electronic equipment
CN116304114B (en) Intelligent data processing method and system based on surgical nursing
CN115527195A (en) Medical equipment nameplate information identification and extraction algorithm
CN111985224A (en) Medication instruction text processing method, device, equipment and storage medium
CN112053760B (en) Medication guide method, medication guide device, and computer-readable storage medium
CN113191141B (en) Query regular expression generation method, device, equipment and storage medium
Datta et al. Preserving medical information from doctor’s prescription ensuring relation among the terminology
CN109817300B (en) Medicine-taking rule generation method based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant