CN119785950A

CN119785950A - Medical knowledge base updating method, system, terminal and storage medium

Info

Publication number: CN119785950A
Application number: CN202510273990.4A
Authority: CN
Inventors: 李凤荣; 林文丛
Original assignee: North Health Medical Big Data Technology Co ltd
Current assignee: North Health Medical Big Data Technology Co ltd
Priority date: 2025-03-10
Filing date: 2025-03-10
Publication date: 2025-04-08
Anticipated expiration: 2045-03-10
Also published as: CN119785950B

Abstract

The invention relates to the technical field of knowledge graphs, and in particular to a medical knowledge base updating method, system, terminal and storage medium. The method comprises: obtaining case data, wherein the case data comprises symptoms and diagnosis and treatment schemes; extracting medicine information from the diagnosis and treatment scheme; retrieving target data from a medical knowledge base according to the symptoms, wherein the target data comprises medicine entities related to the symptoms; matching the medicine information with the target data, and determining differential identity codes and matching identity codes according to the matching result; constructing a plurality of triples according to the differential identity codes and the matching identity codes; and updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the triples. The invention can reduce redundant data in the knowledge base and improve the efficiency of discovering new treatment schemes.

Description

Medical knowledge base updating method, system, terminal and storage medium

Technical Field

The invention belongs to the technical field of knowledge graphs, and particularly relates to a medical knowledge base updating method, system, terminal and storage medium.

Background

In the medical field, knowledge graphs play an important auxiliary role when doctors formulate diagnosis and treatment schemes. At present, some medical knowledge bases construct their knowledge graphs with symptoms and diagnosis and treatment schemes as nodes, and with the association relationships between symptoms and diagnosis and treatment schemes as edges.

However, this mode has obvious defects. On the one hand, because diagnosis and treatment schemes are diverse and variable, one disease often corresponds to a large number of schemes, and these schemes partly repeat one another, which causes redundant data storage and increases the burden of data management. On the other hand, this construction mode cannot comprehensively reflect key characteristics of a diagnosis and treatment scheme, such as treatment cost and curative effect. Even when doctors obtain a large number of diagnosis and treatment schemes, it is difficult for them to screen out the optimal scheme directly; they still need to rely on rich clinical experience to choose among many schemes, which consumes a great deal of time and energy and affects diagnosis and treatment efficiency and quality to a certain extent.

Constructing the knowledge graph with the important elements of a diagnosis and treatment scheme as entities avoids redundant storage. However, this approach makes the structure of the knowledge graph more complex and makes later data updates difficult.

Disclosure of Invention

The invention provides a medical knowledge base updating method, a system, a terminal and a storage medium for solving the technical problems.

In a first aspect, the present invention provides a medical knowledge base updating method, including:

acquiring case data, wherein the case data comprises symptoms and diagnosis and treatment schemes;

extracting medicine information from the diagnosis and treatment scheme, wherein the medicine information comprises identity codes and cost information;

retrieving target data from a medical knowledge base according to the symptoms, wherein the target data comprises medicine entities related to the symptoms;

matching the medicine information with the target data, determining differential identity codes and matching identity codes according to the matching result, and constructing a plurality of triples according to the differential identity codes and the matching identity codes;

updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the triples;

wherein a differential identity code has no matching medicine entity in the target data, and a matching identity code has a matching medicine entity in the target data.

In an alternative embodiment, acquiring case data, wherein the case data comprises symptoms and diagnosis and treatment schemes, comprises:

acquiring the cure time and cure duration of each piece of case data;

screening out case data whose cure time is later than the previous update time of the medical knowledge base as sample data;

and generating a corresponding duration identifier for the sample data according to the cure duration of the sample data.

In an alternative embodiment, extracting medicine information from the diagnosis and treatment scheme, wherein the medicine information comprises identity codes and cost information, comprises:

extracting medicine name, manufacturer, batch and cost information from the medicine information;

and querying the identity code from a pre-constructed dictionary according to the medicine name, manufacturer and batch.

In an alternative embodiment, retrieving target data from a medical knowledge base according to the symptoms, wherein the target data comprises medicine entities related to the symptoms, comprises:

determining a retrieval range according to the department type corresponding to the symptoms;

constructing a query statement, wherein the query statement comprises the symptoms and the retrieval range;

and acquiring target data from the medical knowledge base according to the query statement.

In an alternative embodiment, matching the medicine information with the target data, determining differential identity codes and matching identity codes according to the matching result, and constructing a plurality of triples according to the differential identity codes and the matching identity codes, comprises:

decoding the medicine entities in the target data into corresponding medicine identity codes;

calculating the intersection of the medicine information and the decoded target data;

determining the identity codes in the medicine information that belong to the intersection to be matching identity codes;

determining the identity codes in the medicine information that do not belong to the intersection to be differential identity codes;

constructing corresponding triples from the pairwise combinations of the identity codes in the medicine information, and adding to each triple the duration identifier of the sample data it belongs to;

marking triples that contain a differential identity code as differential triples;

marking triples that do not contain a differential identity code as matching triples;

and de-duplicating all triples and recording the number of repetitions.
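The matching and triple-construction steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `build_triples`, the tuple layout `(code_a, code_b, duration_label, kind)`, and the sample codes `D1`, `D2`, `D3`, `D9` are all hypothetical, and the target set is assumed to be already decoded into identity codes.

```python
from collections import Counter
from itertools import combinations


def build_triples(samples, target_codes):
    """Build labelled, de-duplicated triples from sample data.

    samples: iterable of (identity_codes, duration_label), one per case.
    target_codes: identity codes decoded from the drug entities retrieved
        from the knowledge base (assumed already decoded).
    Returns a Counter mapping (code_a, code_b, duration_label, kind) to the
    number of times that triple occurred (the repetition count).
    """
    target = set(target_codes)
    counts = Counter()
    for codes, duration_label in samples:
        drug_codes = set(codes)
        matching = drug_codes & target          # codes in the intersection
        differential = drug_codes - matching    # codes outside the intersection
        # Sort so that each unordered pair is always emitted in the same order.
        for a, b in combinations(sorted(drug_codes), 2):
            kind = "differential" if {a, b} & differential else "matching"
            counts[(a, b, duration_label, kind)] += 1
    return counts


# Hypothetical sample data: two cases that both use D1 and D2.
samples = [(["D1", "D2", "D3"], "short"), (["D2", "D1"], "short")]
triples = build_triples(samples, target_codes=["D1", "D2", "D9"])
```

Using a `Counter` keyed by the whole triple performs de-duplication and repetition counting in one pass.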

In an alternative embodiment, updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the plurality of triples comprises:

if a differential triple comprises one differential identity code and one matching identity code, constructing in the knowledge graph a new node corresponding to the differential identity code contained in the differential triple, constructing an edge between the new node and the entity node corresponding to the matching identity code, and setting the duration identifier of the differential triple as the duration attribute of the edge;

if a differential triple comprises two differential identity codes, constructing in the knowledge graph two new nodes corresponding to the two differential identity codes respectively, constructing an edge between the two new nodes, and setting the duration identifier of the differential triple as the duration attribute of the edge;

querying the knowledge graph for the two nodes and the edge corresponding to a matching triple; upon confirming that the duration identifier of the matching triple is smaller than the duration attribute of the edge, updating the duration attribute of the edge to the cure duration indicated by the duration identifier of the matching triple; and updating the reference attribute of the edge to the sum of its original reference attribute and the repetition count of the matching triple.
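One possible reading of these update rules is sketched below on an in-memory graph. Everything here is an assumption made for illustration: the graph is a plain dictionary rather than a graph database, `RANK` is a hypothetical ordering of duration identifiers, a new edge's reference attribute is initialised to the triple's repetition count, and the reference attribute is increased whether or not the duration attribute changes.

```python
# Hypothetical ordering of duration identifiers, shortest first.
RANK = {"short": 0, "medium": 1, "long": 2}


def apply_triples(graph, triples):
    """Apply triples to a graph of the form
    {"nodes": set, "edges": {(a, b): {"duration": str, "references": int}}}.

    triples: iterable of (code_a, code_b, duration_label, kind, repetitions).
    """
    for a, b, label, kind, reps in triples:
        key = tuple(sorted((a, b)))
        edge = graph["edges"].get(key)
        if edge is None:
            if kind == "differential":
                # set.update adds only the nodes that are not present yet,
                # covering both the one-new-node and two-new-node cases.
                graph["nodes"].update(key)
                graph["edges"][key] = {"duration": label, "references": reps}
            # A matching triple without an existing edge is skipped here.
            continue
        if RANK[label] < RANK[edge["duration"]]:
            edge["duration"] = label      # keep the shorter cure duration
        edge["references"] += reps        # accumulate the repetition count
    return graph


graph = {
    "nodes": {"D1", "D2"},
    "edges": {("D1", "D2"): {"duration": "long", "references": 3}},
}
apply_triples(graph, [
    ("D1", "D2", "short", "matching", 2),
    ("D1", "D3", "medium", "differential", 1),
])
```

Because the triples are already labelled, the update never has to search the whole graph for a node that is known not to exist.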

In an alternative embodiment, the method further comprises:

periodically acquiring the reference attributes of the edges of the knowledge graph;

if the reference attribute of any edge is determined to have dropped to a preset reference count threshold, converting the edge and its two connected nodes into triple data, and generating an audit task according to the triple data;

distributing the audit task to an expert audit terminal;

receiving the audit result returned by the expert audit terminal, and if the audit result is "retain", leaving the edge and nodes corresponding to the triple data unprocessed;

screening out isolated nodes in the knowledge graph, wherein an isolated node is a node without any connected edge;

and deleting the isolated nodes from the knowledge graph.
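The pruning flow above might look like the following sketch. Two points are assumptions rather than statements of the text: the expert audit terminal is replaced by a hypothetical `review` callback, and an edge whose audit result is anything other than "retain" is deleted (the text only states what happens when the result is "retain").

```python
def prune(graph, threshold, review):
    """Audit rarely referenced edges, then delete isolated nodes.

    graph: {"nodes": set, "edges": {(a, b): {"references": int, ...}}}
    review: callable (a, b, attrs) -> "retain" or another verdict; a
        stand-in for the expert audit terminal.
    Returns the set of isolated nodes that were deleted.
    """
    # Iterate over a copy because edges may be deleted during the loop.
    for (a, b), attrs in list(graph["edges"].items()):
        if attrs["references"] <= threshold and review(a, b, attrs) != "retain":
            del graph["edges"][(a, b)]    # assumed handling of a non-retain result
    connected = {node for edge in graph["edges"] for node in edge}
    isolated = graph["nodes"] - connected  # nodes without any connected edge
    graph["nodes"] -= isolated
    return isolated


graph = {
    "nodes": {"D1", "D2", "D3"},
    "edges": {
        ("D1", "D2"): {"references": 5},
        ("D2", "D3"): {"references": 1},
    },
}
removed = prune(graph, threshold=1, review=lambda a, b, attrs: "delete")
```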

In a second aspect, the present invention provides a medical knowledge base updating system, comprising:

an acquisition module, used for acquiring case data, wherein the case data comprises symptoms and diagnosis and treatment schemes;

a first processing module, used for extracting medicine information from the diagnosis and treatment scheme, wherein the medicine information comprises identity codes and cost information;

a retrieval module, used for retrieving target data from a medical knowledge base according to the symptoms, wherein the target data comprises medicine entities related to the symptoms;

a second processing module, used for matching the medicine information with the target data, determining differential identity codes and matching identity codes according to the matching result, and constructing a plurality of triples according to the differential identity codes and the matching identity codes;

an updating module, used for updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the triples.

In a third aspect, a terminal is provided, including:

A memory for storing a medical knowledge base update program;

a processor for implementing the steps of the medical knowledge base updating method as provided in the first aspect when executing the medical knowledge base updating program.

In a fourth aspect, there is provided a computer readable storage medium having stored thereon a medical knowledge base update program which, when executed by a processor, implements the steps of the medical knowledge base update method as provided in the first aspect.

The medical knowledge base updating method, system, terminal and storage medium provided by the invention have the following advantages. Medicine information is extracted from newly generated case data and matched with the target data related to the symptoms retrieved from the medical knowledge base, so that the differential identity codes and matching identity codes in the medicine information are determined and built into triples. Because the triples are distinguished by the kind of identity code they contain, a corresponding updating means can be applied to each kind of triple when the knowledge graph is updated, and the knowledge graph does not need to be traversed each time to search for nodes matching the differential identity codes, which reduces the amount of calculation. In addition, the attribute values of the edges in the knowledge graph are updated based on the triples generated from the matching identity codes, so that the knowledge graph can represent the application trend of diagnosis and treatment schemes and the characteristics of different schemes, and a suitable diagnosis and treatment scheme can easily be found according to the attribute values in subsequent applications.

In addition, the invention has reliable design principle, simple structure and very wide application prospect.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.

FIG. 1 is a schematic flow chart of a method of one embodiment of the invention.

FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.

FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The following explains key terms appearing in the present invention.

A knowledge base is a system for storing and managing knowledge that may contain various types of knowledge, such as facts, rules, principles and experience. The knowledge is organized and stored for ease of retrieval, querying and use.

A knowledge graph is a semantic network that displays knowledge, and the relationships between pieces of knowledge, in graph form. The knowledge graph consists of nodes and edges, wherein the nodes represent entities (such as people, organizations, events and concepts), and the edges represent relationships among the entities (such as parent-child, causal and association relationships). By constructing a knowledge graph, dispersed knowledge can be connected into an organic whole, so that it can be better understood and used.

The data foundation of a knowledge base may be a knowledge graph, which in many cases provides high-quality data support for the knowledge base, mainly in the following aspects:

Rich data association: the knowledge graph displays the relationships between entities in graph form, and these relationships carry rich semantic information. When the knowledge graph is used as the data foundation of a knowledge base, the knowledge base inherits these detailed association relationships. For example, in the medical field, the complex relationships among diseases, symptoms, treatment methods, medicines and other entities in the knowledge graph enable the medical knowledge base to present medical knowledge more comprehensively and provide richer reference information for doctors' diagnosis and treatment decisions.

Support for knowledge reasoning: the semantic network structure of the knowledge graph is well suited to knowledge reasoning. A knowledge base constructed on a knowledge graph can mine implicit knowledge by means of this reasoning ability. For example, in the financial field, a knowledge base can discover potential financial risks and fraud patterns through inference over the relationships between entities such as businesses, personnel and transactions in a knowledge graph.

Improved data quality: constructing a knowledge graph requires data to be strictly extracted, cleaned and fused, which ensures the accuracy, consistency and integrity of the data. Using such high-quality data as the basis of the knowledge base improves its reliability and practicability. For example, in the field of electronic commerce, a commodity knowledge base based on a knowledge graph can accurately present information such as commodity attributes, specifications and usage methods, and provide better service for users.

Support for semantic retrieval: the semantic nature of the knowledge graph enables a knowledge base built on it to support semantic retrieval. When querying the knowledge base, a user can obtain information not only through keyword matching but also through semantic understanding. For example, in the academic field, a semantic retrieval function based on a knowledge graph can help researchers find knowledge related to a research topic, not just documents that match keywords.

The medical knowledge base updating method provided by the embodiment of the invention is executed by the computer terminal, and correspondingly, the medical knowledge base updating system is operated in the computer terminal.

FIG. 1 is a schematic flow chart of a method of one embodiment of the invention. The execution body of fig. 1 may be a medical knowledge base updating system. The order of the steps in the flow chart may be changed and some may be omitted according to different needs.

As shown in fig. 1, the method includes:

S1, acquiring case data, wherein the case data comprise symptoms and diagnosis and treatment schemes.

Case data is acquired from reliable channels such as a hospital information system. The data contains key information such as symptoms and the corresponding diagnosis and treatment schemes, and is the underlying data source for the subsequent steps.

S2, extracting medicine information from the diagnosis and treatment scheme, wherein the medicine information comprises identity codes and cost information.

The acquired diagnosis and treatment scheme is analyzed in depth, and the medicine information in it is extracted, including the identity code (used to uniquely identify the medicine) and the cost information of the medicine. This information provides an important basis for the subsequent comparison and updating.

S3, retrieving target data from a medical knowledge base according to the symptoms, wherein the target data comprises medicine entities with relations with the symptoms.

According to the symptoms in the case data, searching is carried out in the existing medical knowledge base, and target data such as medicine entities and the like related to the symptoms are obtained. These target data are knowledge representations already in the knowledge base.

S4, matching the medicine information with the target data, determining a difference identity code and a matching identity code according to a matching result, and constructing a plurality of triples according to the difference identity code and the matching identity code.

The extracted medicine information is matched in detail with the retrieved target data. In this process, differential identity codes (codes for which no matching drug entity exists in the target data) and matching identity codes (codes for which a matching drug entity exists in the target data) are determined. From these codes, a number of triples are constructed in the form (drug entity, relationship, attribute value), e.g. (drug, cost, specific amount).

S5, updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the triples.

The structure and attribute values of the knowledge graph of the medical knowledge base are updated using the constructed triples. For a drug entity corresponding to a matching identity code, its attribute values are updated according to the newly acquired information, so that the information in the knowledge graph becomes more accurate and complete.

In one embodiment of the present invention, based on step S1, a specific embodiment thereof will be described in a non-limiting manner by the following example.

S101, acquiring the cure time and cure duration of each piece of case data.

First, the acquisition channels of the case data are identified. Case data can generally be acquired from systems such as a hospital's electronic medical record (EMR) system or clinical data repository (CDR). These systems store detailed information about the patient from the first visit to the end of treatment, including the cure time and other time-point data related to the cure.

Because different systems store data in different formats, the acquired data needs to be parsed. For example, for case data stored as structured data (e.g., JSON or XML) in an electronic medical record system, a corresponding parsing tool (e.g., the json library or the xml.etree.ElementTree library in Python) is used to extract the cure time field (typically stored in a date-time format such as "YYYY-MM-DD HH:MM:SS"). For semi-structured or unstructured data (such as medical record text), Natural Language Processing (NLP) techniques such as Named Entity Recognition (NER) are required to identify the key information in the text that represents the cure time and convert it to a uniform date-time format.

After the cure time is obtained, the patient's onset time (also extracted from the medical record system) is also required. If the onset time is missing, it can be estimated from related information such as the first visit time or the time the symptoms appeared. The cure duration is then obtained by calculating the time difference between the cure time and the onset time. Depending on specific requirements, the time difference can be expressed in different units such as hours, days or weeks. For example, the datetime module in Python can be used to perform the time calculation and convert the time difference into a corresponding duration value.
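The parsing and duration calculation described above can be sketched with the standard `json` and `datetime` modules. The field names `onset_time`, `first_visit_time` and `cure_time` are assumptions chosen for illustration; a real record system will use its own schema.

```python
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def cure_duration_days(record_json):
    """Return the cure duration in days for one JSON-encoded case record."""
    record = json.loads(record_json)
    cure_time = record.get("cure_time")
    # Fall back to the first visit time when the onset time is missing.
    onset_time = record.get("onset_time") or record.get("first_visit_time")
    if not cure_time or not onset_time:
        raise ValueError("record lacks the time fields needed for a duration")
    cured = datetime.strptime(cure_time, TIME_FORMAT)
    onset = datetime.strptime(onset_time, TIME_FORMAT)
    if cured < onset:
        raise ValueError("cure time precedes onset time")
    return (cured - onset).total_seconds() / 86400  # seconds per day


# Hypothetical record without an onset time.
record = (
    '{"case_id": "c1", "first_visit_time": "2025-01-01 08:00:00", '
    '"cure_time": "2025-01-06 20:00:00"}'
)
days = cure_duration_days(record)  # 5.5
```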

S102, screening out case data with cure time after the previous medical knowledge base update time as sample data.

In medical knowledge base systems, there is typically a specialized record to keep a timestamp of each update. The accurate time of the previous medical knowledge base update is obtained from a metadata management module or a log file of the system, and the accuracy and the integrity of the time record are ensured.

The cure time of each piece of case data acquired in step S101 is compared with the previous update time. The screening can be implemented with a database query statement (such as an SQL statement), with a condition in the WHERE clause selecting case data whose cure time is later than the previous update time. For example, in a MySQL database, a statement such as "SELECT * FROM case_data WHERE cure_time > 'last update time'" may be used to obtain the sample data that meets the condition. If the data is stored in another data storage system (such as a NoSQL database), the filtering is performed using the query syntax of that system.

The screened sample data is then verified and cleaned to ensure data quality. The sample data is checked for missing values, abnormal values and the like; case data missing important information (such as cure time, symptoms and other key fields) is removed, and abnormal values (such as an obviously unreasonable cure time) are further verified or corrected, so as to ensure the accuracy of subsequent analysis.
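A minimal sketch of this screening query follows. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `case_data` table layout and the sample rows are assumptions. The update time is passed as a bound parameter rather than concatenated into the SQL text.

```python
import sqlite3


def select_new_cases(conn, last_update_time):
    """Return (case_id, cure_time) rows cured after the last update."""
    cursor = conn.execute(
        "SELECT case_id, cure_time FROM case_data "
        "WHERE cure_time > ? ORDER BY cure_time",
        (last_update_time,),
    )
    return cursor.fetchall()


# In-memory database with hypothetical rows. Timestamps in the
# "YYYY-MM-DD HH:MM:SS" format compare correctly as plain text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE case_data (case_id TEXT PRIMARY KEY, cure_time TEXT)")
conn.executemany(
    "INSERT INTO case_data VALUES (?, ?)",
    [("c1", "2025-01-10 09:00:00"), ("c2", "2025-03-02 14:30:00")],
)
new_cases = select_new_cases(conn, "2025-02-01 00:00:00")
```

MySQL client libraries for Python generally use a different placeholder style than `?`, so the parameter marker would need adjusting for the chosen driver.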

S103, generating a corresponding duration identifier for the sample data according to the healing duration of the sample data.

Reasonable cure duration intervals are determined according to medical expertise and actual requirements. For example, the cure duration may be divided into intervals such as "short (0-7 days)", "medium (8-30 days)" and "long (31 days and more)". Finer interval criteria can also be set according to the characteristics of a specific condition.

The cure duration of each piece of sample data is matched against the set duration intervals, and a corresponding duration identifier is generated for the sample data according to the matching result. This matching can be implemented with a conditional statement (e.g., an if-elif-else statement in Python). For example, if a piece of sample data has a cure duration of 5 days, a "short" duration identifier is generated for it according to the intervals described above.
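The interval matching can be written as a short if-elif-else chain. The boundaries follow the example intervals given above; treating fractional values between 7 and 8 days as "medium" is an assumption, since the example intervals are stated in whole days.

```python
def duration_label(days):
    """Map a cure duration in days to a duration identifier."""
    if days < 0:
        raise ValueError("cure duration cannot be negative")
    if days <= 7:
        return "short"      # 0-7 days
    elif days <= 30:
        return "medium"     # 8-30 days
    else:
        return "long"       # 31 days and more


labels = [duration_label(d) for d in (5, 8, 30, 31)]
```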

The generated duration identifier is stored in association with the corresponding sample data. A new field can be added to the original sample data record to hold the duration identifier, or a new data table can be created that records the unique identifier of the sample data (such as a case number) together with its duration identifier, so that the duration identifier can be conveniently retrieved and used in subsequent analysis and processing of the sample data.

In one embodiment of the present invention, based on step S2, a specific embodiment thereof will be described in a non-limiting manner by the following example.

S201, extracting medicine name, manufacturer, batch and cost information from the medicine information.

The medication records in the electronic medical record are semi-structured or unstructured text. The drug name is extracted therefrom using natural language processing techniques, and drug information is then retrieved from a local database based on the drug name.

The database query extracts the required fields using an SQL query statement. For example, if a MySQL database contains a table named medication_info with fields such as prescription_id (prescription number), drug_name (drug name), manufacturer (manufacturer), batch_number (batch) and cost (cost), the following query statement may be used:

SELECT drug_name, manufacturer, batch_number, cost FROM medication_info;

After the data is extracted, it needs to be verified and cleaned: check whether any field is null; for the cost field, ensure that it is a valid numeric value; and for the drug name, manufacturer and batch, remove leading and trailing spaces, special characters and the like.
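A possible cleaning routine for one extracted row is sketched below. Which characters count as "special" is not specified, so the regular expression (keep letters, digits, underscore, whitespace and hyphen) is an assumption, as is the choice to reject a row by returning `None`.

```python
import math
import re


def clean_drug_record(record):
    """Return a cleaned copy of one row, or None if the row is unusable."""
    cleaned = {}
    for field in ("drug_name", "manufacturer", "batch_number"):
        raw = record.get(field)
        # Drop special characters, then trim leading and trailing spaces.
        text = "" if raw is None else re.sub(r"[^\w\s-]", "", str(raw)).strip()
        if not text:
            return None                     # null or empty text field
        cleaned[field] = text
    try:
        cost = float(record.get("cost"))
    except (TypeError, ValueError):
        return None                         # cost missing or not numeric
    if not math.isfinite(cost) or cost < 0:
        return None                         # reject NaN, infinity, negatives
    cleaned["cost"] = cost
    return cleaned


row = {
    "drug_name": "  aspirin* ",
    "manufacturer": "XX pharmaceutical factory",
    "batch_number": " 20240101 ",
    "cost": "12.50",
}
cleaned = clean_drug_record(row)
```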

S202, querying the identity code from a pre-constructed dictionary according to the medicine name, manufacturer and batch.

1. Creating and storing a pre-built dictionary.

Relevant information about the medicines is collected, including medicine names, manufacturers, batches and the corresponding identity codes. It can be obtained from channels such as the databases of drug regulatory authorities and information provided by pharmaceutical manufacturers. The collected data is organized into a dictionary, with the combination of drug name, manufacturer and batch as the key and the identity code as the value. The dictionary may be kept in memory, or serialized to a file (e.g., a JSON file) or stored in a database for later use.

2. Querying.

If the dictionary is held in memory, the key can be used directly to query the identity code. If the dictionary is stored in a JSON file, the file needs to be read and loaded as a dictionary before querying. If the dictionary is stored in a database, an SQL query statement can be used to query the identity code by drug name, manufacturer and batch. For example, if a MySQL database contains a table named drug_identity with the fields drug_name, manufacturer, batch_number and identity_code, the following query statement may be used:

SELECT identity_code FROM drug_identity WHERE drug_name = 'aspirin' AND manufacturer = 'XX pharmaceutical factory' AND batch_number = '20240101';

3. Exception handling.

During the query, a situation may occur in which no matching item is found. At this time, it is necessary to perform corresponding exception handling such as logging, returning a default value or prompting the user for manual supplementary information, etc.
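For the in-memory case, the lookup and its exception handling can be sketched as follows. The tuple key, the function name `lookup_identity_code` and the identity code value `DRUG-0001` are hypothetical; the handling shown (log a warning and return a default) is one of the options listed above.

```python
import logging

logger = logging.getLogger(__name__)


def lookup_identity_code(dictionary, drug_name, manufacturer, batch_number,
                         default=None):
    """Return the identity code for a drug, or `default` if none is found."""
    key = (drug_name, manufacturer, batch_number)
    try:
        return dictionary[key]
    except KeyError:
        # Record the miss so the entry can be supplemented manually later.
        logger.warning("no identity code found for %s", key)
        return default


# Hypothetical dictionary entry.
drug_dictionary = {
    ("aspirin", "XX pharmaceutical factory", "20240101"): "DRUG-0001",
}
found = lookup_identity_code(
    drug_dictionary, "aspirin", "XX pharmaceutical factory", "20240101"
)
missing = lookup_identity_code(
    drug_dictionary, "aspirin", "XX pharmaceutical factory", "19990101"
)
```

JSON object keys must be strings, so a dictionary persisted as JSON would need the three fields joined into a single string key (or stored as a list of records) rather than a tuple.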

In one embodiment of the present invention, based on step S3, a specific embodiment thereof will be described in a non-limiting manner by the following example.

S301, determining a search range according to the department type corresponding to the symptom.

First, a mapping table of disorders and department types needs to be constructed, which can be created based on medical expertise, clinical practice experience, and department settings of hospitals. For example, upper respiratory tract diseases such as common cold and influenza generally correspond to respiratory medicine, and diseases such as fracture and joint injury correspond to orthopedics. These mappings may be stored in a database table, and the table structure may contain two fields, disease (name of the disorder) and department (department type).

In order to ensure the accuracy and the integrity of the mapping relationship, the data such as authoritative medical guidelines, clinical diagnosis and treatment specifications and the like can be referred to, and medical professionals can be invited to carry out auditing and correction.

When disease information is received, the disease must first be identified accurately. If the information is in text form, Natural Language Processing (NLP) techniques such as word segmentation, part-of-speech tagging and named entity recognition are required to extract the key disease name. For example, for the text "patient suffers from acute gastroenteritis with symptoms of abdominal pain and diarrhea", the condition can be identified as "acute gastroenteritis" by NLP techniques.

The identified disease name is then looked up in the disease-department mapping table to find the corresponding department type.

The retrieval range in the medical knowledge base is then determined according to the matched department type. Different departments correspond to different knowledge subsets; for example, the retrieval range of the respiratory department comprises knowledge on the diagnosis, treatment and medication of respiratory diseases, and the retrieval range of orthopedics covers knowledge on disorders of the musculoskeletal system, surgical methods, rehabilitation schemes and the like.

A corresponding index or tag may be set in the medical knowledge base for each department type to quickly locate the relevant knowledge area. For example, in the knowledge graph, a label of a department type is added for each node and each side, and after the department type is determined, the node and the side with the label of the department type can be screened out as a search range.
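The mapping lookup and tag-based filtering can be sketched as below. The mapping entries come from the examples in the text; representing nodes as dictionaries with a `department` tag, and the node identifiers used, are assumptions for illustration.

```python
# Disease-to-department mapping, using the examples given in the text.
DISORDER_TO_DEPARTMENT = {
    "common cold": "respiratory medicine",
    "influenza": "respiratory medicine",
    "fracture": "orthopedics",
    "joint injury": "orthopedics",
}


def retrieval_scope(disorder, nodes):
    """Return (department, nodes tagged with that department)."""
    department = DISORDER_TO_DEPARTMENT.get(disorder.strip().lower())
    if department is None:
        return None, []         # unknown disorder: no retrieval range
    scoped = [node for node in nodes if node.get("department") == department]
    return department, scoped


# Hypothetical tagged nodes.
nodes = [
    {"id": "drug-1", "department": "respiratory medicine"},
    {"id": "drug-2", "department": "orthopedics"},
]
department, scoped = retrieval_scope("Influenza", nodes)
```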

S302, constructing a query statement, wherein the query statement comprises the symptoms and the retrieval range.

Since the medical knowledge base is based on knowledge-graph data, a language, such as SPARQL (SPARQL Protocol and RDF Query Language), suitable for knowledge-graph query is typically selected. SPARQL is a standard language for querying RDF (Resource Description Framework) data that can efficiently query and manipulate nodes and relationships in a knowledge graph.

A SPARQL query statement is constructed by combining the determined symptoms and the retrieval range. The basic structure of a query statement typically includes a SELECT clause, a WHERE clause, and so on. For example, suppose that drugs associated with the condition "acute gastroenteritis" are to be queried and that the retrieval range is the knowledge area of the gastroenterology department; the SPARQL query statement is as follows:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX medical: <http://example.org/medical#>
SELECT ?drug WHERE {
  ?disease rdf:type medical:Disease ;
           medical:name "Acute gastroenteritis" ;
           medical:relatedToDepartment medical:GastroenterologyDepartment .
  ?treatment medical:treats ?disease ;
             medical:usesDrug ?drug .
  ?drug rdf:type medical:Drug .
}

In the query statement described above, PREFIX defines a namespace, and the SELECT clause specifies the results (here, drugs) to be queried, and the WHERE clause describes the conditions of the query, including the name of the disorder, the department to which it belongs, and the relationship between the disorder and the drug.
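As a non-limiting illustration, a query statement of this form may be assembled from the condition name and the department as sketched below. The function names build_query and escape_literal are hypothetical; the medical: vocabulary is the example vocabulary used in the query above, and the identifier check is a deliberately conservative assumption rather than a full SPARQL name validator.

```python
QUERY_TEMPLATE = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX medical: <http://example.org/medical#>
SELECT ?drug WHERE {{
  ?disease rdf:type medical:Disease ;
           medical:name "{name}" ;
           medical:relatedToDepartment medical:{department} .
  ?treatment medical:treats ?disease ;
             medical:usesDrug ?drug .
  ?drug rdf:type medical:Drug .
}}"""


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(disease_name, department_local_name):
    """Fill the template with the condition name and the department."""
    # Conservative check so that arbitrary text cannot be injected as a name.
    if not department_local_name.isidentifier():
        raise ValueError("unexpected department name: " + department_local_name)
    return QUERY_TEMPLATE.format(
        name=escape_literal(disease_name),
        department=department_local_name,
    )


query = build_query("Acute gastroenteritis", "GastroenterologyDepartment")
```

Escaping the condition name before it is placed in the literal keeps a name containing a quotation mark from breaking the statement.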

S303, acquiring target data from a medical knowledge base according to the query statement.

The constructed query statement is sent to a knowledge graph management system (such as Jena Fuseki or GraphDB) for execution. These systems provide storage, query, and management functions for knowledge graph data.

After receiving the query statement, the knowledge graph management system performs matching and searching in the knowledge graph according to the conditions in the query statement. For example, according to the SPARQL query statement, the system searches the knowledge graph for a disease node named "acute gastroenteritis" and belonging to the department of gastroenterology, and then finds the relevant drug node through the relationship between the disease and the treatment, and between the treatment and the drug.

After the knowledge graph management system executes the query, a query result is returned. The results are typically presented in the form of tables or RDF triples.

The returned results are then processed, for example by removing duplicate entries and converting the data format. The processed results are returned to the user so that the user can obtain the target data, such as the drugs related to the symptoms. For example, the query results are returned to the front-end application in JSON format for review by a doctor or patient.
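As a non-limiting illustration, the post-processing of the query result may be sketched as follows, assuming that the knowledge graph management system returns the result in the standard SPARQL 1.1 Query Results JSON layout. The function name extract_drugs and the sample response are hypothetical.

```python
import json


def extract_drugs(sparql_json_text, variable="drug"):
    """Return the distinct values bound to `variable`, in first-seen order."""
    payload = json.loads(sparql_json_text)
    seen = set()
    drugs = []
    for binding in payload["results"]["bindings"]:
        if variable not in binding:
            continue  # the variable is unbound in this result row
        value = binding[variable]["value"]
        if value not in seen:
            seen.add(value)
            drugs.append(value)
    return drugs


# Made-up response in the SPARQL 1.1 Query Results JSON layout.
sample_response = json.dumps({
    "head": {"vars": ["drug"]},
    "results": {"bindings": [
        {"drug": {"type": "uri", "value": "http://example.org/medical#DrugA"}},
        {"drug": {"type": "uri", "value": "http://example.org/medical#DrugB"}},
        {"drug": {"type": "uri", "value": "http://example.org/medical#DrugA"}},
    ]},
})

# JSON payload that could be handed to a front-end application.
front_end_payload = json.dumps({"drugs": extract_drugs(sample_response)})
```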

In this way, the retrieval range is determined according to the department type corresponding to the disease, an effective query statement is constructed, and the required target data is obtained from the medical knowledge base based on knowledge graph data. This local search method reduces the computational cost of retrieval by narrowing the search range.

In one embodiment of the present invention, based on step S4, a specific embodiment thereof will be described in a non-limiting manner by the following example.

S401, decoding the medicine entity in the target data into a corresponding medicine identity code.

First, the mapping rules between drug entities and identity codes are determined from a predefined dictionary or database. This mapping may be established during the preliminary data preparation phase, for example by collecting detailed information about each drug (e.g., name, manufacturer, specification) and associating it with a unique identity code. The mapping may be stored in a key-value data structure, such as a Python dictionary, in which the keys are feature combinations of drug entities and the values are the corresponding identity codes.

Each drug entity in the target data is traversed and converted into the corresponding identity code according to the mapping rules. During decoding, a drug entity in the target data may be absent from the mapping. In that case, exception handling is required, such as writing a log entry indicating possible data loss or inconsistency; the drug entity may then either be skipped or assigned a default code.
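As a non-limiting illustration, the decoding step with its exception handling may be sketched as follows. The dictionary contents, the entity field names, and the function name decode_entities are hypothetical.

```python
import logging

logger = logging.getLogger("drug_decoder")

# Hypothetical mapping: (name, manufacturer, specification) -> identity code.
DRUG_CODE_MAP = {
    ("amoxicillin", "Maker A", "0.25 g"): "CID001",
    ("ibuprofen", "Maker B", "0.2 g"): "CID002",
}


def decode_entities(entities, code_map, default_code=None):
    """Convert drug entities to identity codes.

    An unmapped entity is logged; it is skipped when default_code is None
    and otherwise replaced by default_code.
    """
    codes = []
    for entity in entities:
        key = (entity["name"], entity["manufacturer"], entity["specification"])
        code = code_map.get(key)
        if code is None:
            logger.warning("No identity code for drug entity %s", key)
            if default_code is None:
                continue
            code = default_code
        codes.append(code)
    return codes


entities = [
    {"name": "amoxicillin", "manufacturer": "Maker A", "specification": "0.25 g"},
    {"name": "unlisted drug", "manufacturer": "Maker C", "specification": "1 g"},
]
decoded = decode_entities(entities, DRUG_CODE_MAP)
```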

S402, calculating intersection of the medicine information and the decoded target data.

The decoded target data and the medicine information share a unified data format, generally a set or list type, which makes the intersection easy to compute.

The intersection is calculated using the set operations provided by the programming language. For example, in Python, the intersection method of the set type can be used.

S403, judging the identity codes belonging to the intersection in the medicine information as matching identity codes.

The identity codes in the drug information are traversed, and each code is checked for membership in the intersection. If a code is in the intersection, it is determined to be a matching identity code. This may be implemented with a loop and a conditional statement; the following is a Python example:

matching_codes = []
for code in drug_info_codes:
    if code in intersection:
        matching_codes.append(code)

S404, judging the identity codes which do not belong to the intersection in the medicine information as differential identity codes.

The identity codes in the drug information are traversed in the same way, and each code is checked for membership in the intersection. If a code is not in the intersection, it is determined to be a differential identity code.
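As a non-limiting illustration, steps S402 to S404 may be combined into one small function as sketched below. The function name split_codes and the sample codes are hypothetical.

```python
def split_codes(drug_info_codes, target_codes):
    """Split the drug information codes into matching and differential codes."""
    # S402: intersection of the drug information and the decoded target data.
    intersection = set(drug_info_codes).intersection(target_codes)
    # S403: codes of the drug information that belong to the intersection.
    matching = [code for code in drug_info_codes if code in intersection]
    # S404: codes of the drug information that do not belong to it.
    differential = [code for code in drug_info_codes if code not in intersection]
    return matching, differential


matching_codes, difference_codes = split_codes(
    ["CID001", "CID002", "CID003"],  # codes extracted from the treatment scheme
    ["CID002", "CID003", "CID009"],  # codes decoded from the target data
)
```

Codes that appear only in the target data (CID009 in this example) fall into neither list, since both lists are drawn from the drug information.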

S405, constructing corresponding triples according to the pairwise combination of the identity codes in the medicine information, and adding to each triplet the duration identifier of the sample data to which it belongs.

Nested loops or a combination-generating function are used to generate the pairwise combinations of identity codes in the drug information. For example, in Python, the itertools.combinations function can be used:

import itertools

# Assume the duration identifier of the sample data is "short"
duration_label = "short"
triples = []
for code_pair in itertools.combinations(drug_info_codes, 2):
    triple = (code_pair[0], "association", code_pair[1], duration_label)
    triples.append(triple)

A triplet is typically made up of two entities (here, drug identity codes) and the relationship between them; here the duration identifier of the sample data is appended as well. The relationship may be defined according to specific business requirements; for example, "association" means that two drugs are associated with each other in the same sample data.

S406, marking the triples containing the difference identity codes as difference triples.

All generated triples are traversed, and whether each triplet contains a difference identity code is checked. If so, the triplet is marked as a differential triplet. This may be accomplished by adding a tag field to the triplet data structure.

S407, marking the triples which do not contain the difference identity codes as matching triples.

All generated triples are traversed in the same way, and each triplet is checked for a difference identity code. If a triplet contains none, it is marked as a matching triplet.
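As a non-limiting illustration, the marking in steps S406 and S407 may be sketched as follows, with the mark stored as an extra tag field appended to each tuple. The function name tag_triples and the tag values "difference" and "matching" are hypothetical.

```python
def tag_triples(triples, difference_codes):
    """Append a tag field marking each triplet as difference or matching."""
    tagged = []
    for head, relation, tail, duration_label in triples:
        # A triplet is a difference triplet if either code is a difference code.
        is_difference = head in difference_codes or tail in difference_codes
        tag = "difference" if is_difference else "matching"
        tagged.append((head, relation, tail, duration_label, tag))
    return tagged


tagged_triples = tag_triples(
    [
        ("CID001", "association", "CID002", "short"),
        ("CID002", "association", "CID003", "short"),
    ],
    difference_codes={"CID001"},
)
```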

S408, performing deduplication processing on all triples, and recording the repetition times.

Deduplication is achieved using a set or a dictionary. If a dictionary is used, the triplet serves as the key and its number of repetitions as the value.
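As a non-limiting illustration, the dictionary-based deduplication may be sketched with collections.Counter, a dictionary subclass whose values are occurrence counts. The function name deduplicate and the sample tuples are hypothetical.

```python
from collections import Counter


def deduplicate(tagged_triples):
    """Return each distinct triplet once, with its repetition count appended."""
    counts = Counter(tagged_triples)  # key: triplet, value: number of repetitions
    return [triple + (count,) for triple, count in counts.items()]


first = ("CID002", "association", "CID003", "short", "matching")
second = ("CID001", "association", "CID002", "short", "difference")
unique_triples = deduplicate([first, second, first])
```

In this sketch (A, association, B) and (B, association, A) count as different keys; sorting the two codes of each pair before counting would merge them, if that is the desired behaviour.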

In one embodiment of the present invention, based on step S5, a specific embodiment thereof will be described in a non-limiting manner by the following example.

S501, if the difference triplet comprises a difference identity code and a matching identity code, a new node corresponding to the difference identity code contained in the difference triplet is constructed in a knowledge graph according to the difference triplet, an edge is constructed between the new node and an entity node corresponding to the matching identity code, a time length identifier of the difference triplet is set as a time length attribute of the edge, and the repetition times of the difference triplet is set as a reference attribute of the edge.

First, all the difference triples are traversed, and those containing one difference identity code and one matching identity code are selected. The selection may be done by parsing the data structure of the triples (e.g., the tuples constructed in the previous steps, which contain identity codes and tags) and checking whether one identity code belongs to the set of difference identity codes and the other to the set of matching identity codes.

For each selected triplet, the difference identity code in it is determined. The API or command provided by the knowledge graph management system is used to create a new node corresponding to the difference identity code. The attributes of the node can be set according to actual requirements; for example, the identity code is used as the unique identification attribute of the node.

The entity node corresponding to the matching identity code is found by querying the knowledge graph by the identity code attribute. An edge is then constructed between the new node and the entity node corresponding to the matching identity code. The type of the edge may be defined as "association".

The duration identifier and the repetition count are extracted from the difference triplet; the duration identifier is set as the duration attribute of the edge, and the repetition count is set as the reference attribute of the edge. For example:

# Assume the knowledge graph management system used provides the functions
# create_node, find_node_by_id, create_edge and set_edge_property
for triple in single_diff_triples:
    code1, _, code2, duration_label, _, repetition_count = triple
    diff_code = code1 if code1 in difference_codes else code2
    match_code = code2 if code1 in difference_codes else code1
    new_node = create_node({"id": diff_code})
    match_node = find_node_by_id(match_code)
    edge = create_edge(new_node, match_node, "association")
    set_edge_property(edge, "duration", duration_label)
    set_edge_property(edge, "reference_count", repetition_count)

S502, if the difference triplet comprises two difference identity codes, respectively constructing two new nodes corresponding to the two difference identity codes in a knowledge graph, constructing an edge between the two new nodes, setting a duration identifier of the difference triplet as a duration attribute of the edge, and setting the repetition times of the difference triplet as a reference attribute of the edge.

All the difference triples are traversed, and those containing two difference identity codes are selected. The selection is likewise done by parsing the data structure of the triples and checking whether both identity codes belong to the set of difference identity codes.

For each selected triplet, the two difference identity codes are determined, and a new node is created for each of them using the API or command of the knowledge graph management system.

An edge is built between the two new nodes, and the type of the edge is defined as "association".

The duration identifier and the repetition count are extracted from the difference triplet; the duration identifier is set as the duration attribute of the edge, and the repetition count is set as the reference attribute of the edge.
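As a non-limiting illustration, step S502 may be sketched against a small in-memory graph that stands in for the knowledge graph management system. The class Graph, the function apply_double_difference, and the attribute names are hypothetical; a real system would expose its own API for creating nodes and edges.

```python
class Graph:
    """Minimal in-memory stand-in for a knowledge graph store."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}  # frozenset({code_a, code_b}) -> attribute dict


def apply_double_difference(graph, triple):
    """Create both new nodes and the edge between them (step S502)."""
    code1, relation, code2, duration_label, _tag, repetition_count = triple
    graph.nodes.add(code1)
    graph.nodes.add(code2)
    graph.edges[frozenset((code1, code2))] = {
        "type": relation,
        "duration": duration_label,           # duration attribute of the edge
        "reference_count": repetition_count,  # reference attribute of the edge
    }


graph = Graph()
apply_double_difference(
    graph, ("CID008", "association", "CID009", "medium", "difference", 3)
)
```

Keying the edge by an unordered pair reflects that the "association" relationship has no direction in this sketch.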

S503, querying the two nodes and the edge corresponding to the matching triplet from the knowledge graph, determining whether the duration identifier of the matching triplet is smaller than the duration attribute of the edge, if so, updating the duration attribute of the edge to the healing duration indicated by the duration identifier, and updating the reference attribute of the edge to the sum of its original reference attribute and the repetition count of the matching triplet.

All the matching triples are traversed, and for each triplet the corresponding two nodes and the edge between them are queried from the knowledge graph according to the two identity codes in the triplet. The query function provided by the knowledge graph management system can be used to query by the identity code attribute of the nodes.

The duration identifier is extracted from the matching triplet, the original duration attribute is obtained from the edge, and the two are compared. If the duration identifier of the matching triplet is smaller than the duration attribute of the edge, the duration attribute of the edge is updated to the duration identifier of the matching triplet. Specifically, the durations may be compared using a function such as compare_duration, and the attribute of the edge may be set using a function such as set_edge_property.

The repetition count is extracted from the matching triplet, the original reference attribute is obtained from the edge, and the two are added to obtain the new reference attribute value, with which the reference attribute of the edge is updated.
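As a non-limiting illustration, the update of an existing edge in step S503 may be sketched as follows. It is assumed here that duration identifiers are the grade labels "short", "medium", and "long" and that they are ordered by a rank table; the names DURATION_RANK and update_matching_edge, and the representation of an edge as a dictionary, are hypothetical.

```python
# Assumed ordering of duration grades: a lower rank means faster healing.
DURATION_RANK = {"short": 0, "medium": 1, "long": 2}


def compare_duration(first, second):
    """Return a negative value if first is shorter, 0 if equal, positive if longer."""
    return DURATION_RANK[first] - DURATION_RANK[second]


def update_matching_edge(edge, duration_label, repetition_count):
    """Keep the shorter duration and accumulate the reference count."""
    if compare_duration(duration_label, edge["duration"]) < 0:
        edge["duration"] = duration_label
    edge["reference_count"] += repetition_count
    return edge


improved = update_matching_edge({"duration": "long", "reference_count": 3}, "short", 2)
unchanged = update_matching_edge({"duration": "short", "reference_count": 1}, "medium", 4)
```

The reference count is accumulated in both cases; only the duration attribute depends on the comparison.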

S504, similarly, constructing disease-drug triples from all the drug identity codes contained in the drug information and the corresponding symptoms, and using the disease-drug triples to update the edges of the corresponding nodes in the knowledge graph and the reference attributes of those edges. If there is no edge between the drug node and the disease node corresponding to a disease-drug triplet, an edge is generated and initial values are set for its time attribute and reference attribute, namely the latest time and the repetition count of the triplet; if such an edge already exists, only the attributes of the edge are updated, in the same way as in steps S501-S503.

Based on the above embodiment, in order to further reduce the data redundancy of the knowledge base, a step S6 is added to delete the invalid data. Specifically, the method comprises the following steps:

S601, regularly acquiring the reference attribute of the edge of the knowledge graph.

The task of periodically acquiring the reference attributes is built using a mature scheduled-task framework, such as the APScheduler library in Python or the Quartz framework in Java. The interval between task executions is set according to business requirements and the update frequency of the knowledge graph data; for example, the task may run once a month.

If the knowledge graph is stored based on RDF, the SPARQL language may be used for the query. The query results are parsed and stored for use in later steps. The results may be stored in a database table or in a data structure (e.g., a Python list or dictionary).

S602, when it is determined that the reference attribute of any edge has fallen to a preset reference count threshold, converting the edge and the two nodes it connects into triplet data, and generating an audit task according to the triplet data.

The reference count threshold is determined jointly by business experts and data analysts according to the usage scenario and data characteristics of the knowledge graph. For example, in a medical knowledge graph, a small reference count for an edge may mean that the knowledge is of low relevance; the threshold may be set to the minimum of the currently acquired reference attributes, or to the sum of that minimum and a fixed parameter.

The acquired edge reference attribute data is traversed to find the edges whose reference attribute value is lower than the preset threshold.

For the selected edges, the information of the two nodes connected by each edge is obtained from the knowledge graph, and the edge and node information is converted into triplet data. The triplet may take the form (node 1, edge type, node 2). Functions such as get_node_info and get_edge_type may be used to obtain the node information and the edge type.

An audit task is generated for each piece of triplet data; the audit task includes the triplet data, an audit task ID, a task description, and similar information.
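As a non-limiting illustration, the screening of low-reference edges and the generation of audit tasks may be sketched as follows. The edge records, the field names, and the function names low_reference_edges and build_audit_tasks are hypothetical. The sketch takes the threshold as the minimum reference count plus a fixed margin and, as an assumption, selects edges at or below it, so that a margin of zero still selects the least-referenced edges.

```python
import uuid


def low_reference_edges(edges, margin=0):
    """Select edges whose reference count is at or below min + margin."""
    if not edges:
        return []
    threshold = min(edge["reference_count"] for edge in edges) + margin
    return [edge for edge in edges if edge["reference_count"] <= threshold]


def build_audit_tasks(edges):
    """Convert each edge into triplet data wrapped in an audit task."""
    tasks = []
    for edge in edges:
        tasks.append({
            "task_id": str(uuid.uuid4()),
            "triple": (edge["source"], edge["type"], edge["target"]),
            "description": "Review an edge with a low reference count",
        })
    return tasks


edges = [
    {"source": "CID001", "type": "association", "target": "CID002", "reference_count": 1},
    {"source": "CID002", "type": "association", "target": "CID003", "reference_count": 9},
    {"source": "CID003", "type": "association", "target": "CID004", "reference_count": 2},
]
audit_tasks = build_audit_tasks(low_reference_edges(edges, margin=1))
```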

S603, distributing the auditing task to an expert auditing terminal.

An interface to the expert auditing terminal is developed to ensure that audit task data can be sent to the terminal accurately. A RESTful API may be used for data transfer. Audit tasks are distributed reasonably according to factors such as the experts' professional fields and workloads. Intelligent distribution may be implemented using a rules engine or a machine learning algorithm.

S604, receiving the audit result returned by the expert auditing terminal; if the audit result is to retain, leaving the edge and nodes corresponding to the triplet data unprocessed, and if the audit result is to delete, deleting the edge corresponding to the triplet data.

An interface is developed for receiving the result returned by the expert auditing terminal; the returned JSON data is parsed, and the audit task ID and audit result are extracted.

If the audit result is to retain, no operation is performed, and the edge and nodes corresponding to the triplet data remain in the knowledge graph.

If the audit result is to delete, the edge is deleted using the API provided by the knowledge graph management system. For example, in an RDF-based knowledge graph, the edge may be deleted using the DELETE statement of SPARQL.
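As a non-limiting illustration, the handling of a returned audit result may be sketched as follows. The JSON field names task_id and decision, the decision values "retain" and "delete", and the in-memory dictionaries standing in for the knowledge graph and the task store are all hypothetical.

```python
import json


def apply_audit_result(edges, pending_tasks, result_json):
    """Apply one audit result and return the decision that was applied."""
    result = json.loads(result_json)
    decision = result["decision"]
    if decision not in ("retain", "delete"):
        raise ValueError("unknown audit decision: " + decision)
    source, _relation, target = pending_tasks.pop(result["task_id"])
    if decision == "delete":
        # Only the edge is removed; the nodes are handled by later steps.
        edges.pop((source, target), None)
    return decision


edges = {
    ("CID001", "CID002"): {"reference_count": 1},
    ("CID003", "CID004"): {"reference_count": 2},
}
pending_tasks = {
    "task-1": ("CID001", "association", "CID002"),
    "task-2": ("CID003", "association", "CID004"),
}
apply_audit_result(edges, pending_tasks, '{"task_id": "task-1", "decision": "delete"}')
apply_audit_result(edges, pending_tasks, '{"task_id": "task-2", "decision": "retain"}')
```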

S605, screening isolated nodes in the knowledge graph, wherein the isolated nodes are nodes without connected edges.

All nodes in the knowledge graph are traversed using the traversal interface provided by the knowledge graph management system. Each node is checked for connected edges. For large-scale knowledge graphs, block-wise traversal or index optimization can be adopted to improve traversal efficiency.

S606, deleting the isolated nodes in the knowledge graph.

The node deletion API provided by the knowledge graph management system is called to delete the selected isolated nodes from the knowledge graph. For example, in the graph database Neo4j, nodes may be deleted using a Cypher statement.

After deleting the nodes, the data consistency of the knowledge graph needs to be ensured, and the situation of data residues or wrong references is avoided. Data verification and cleanup operations may be performed after the nodes are deleted.
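As a non-limiting illustration, the screening and deletion of isolated nodes in steps S605 and S606 may be sketched on an in-memory representation as follows. The function name remove_isolated_nodes and the representation of the graph as a node set and a list of edge pairs are hypothetical.

```python
def remove_isolated_nodes(nodes, edges):
    """Delete nodes without connected edges and return the deleted nodes."""
    connected = set()
    for source, target in edges:
        connected.add(source)
        connected.add(target)
    isolated = nodes - connected
    nodes.difference_update(isolated)  # modifies the caller's node set in place
    return isolated


nodes = {"CID001", "CID002", "CID003", "CID004"}
edges = [("CID001", "CID002"), ("CID002", "CID003")]
removed = remove_isolated_nodes(nodes, edges)
```

Collecting the connected nodes in a single pass over the edges avoids checking every node against every edge.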

Since the reference attribute of an edge indicates the number of times the corresponding data has been referenced, a very small reference count indicates that the data is rarely used. Such rarely used data is sent to the expert auditing terminal, where its value is audited manually; if it is determined to have no value, the corresponding edge is deleted. When a node has no connected edge, the node has not been referenced for a long time, which indicates that the corresponding drug has been abandoned, and the node can be deleted directly.

In a specific update scenario, the update method comprises the steps of:

1. Case data preprocessing

1.1 The data acquisition unit periodically acquires case data from the electronic medical record system, and extracts the cure timestamp and cure duration value of each record.

1.2 The time screening unit, based on the previous knowledge base update time T, selects cases whose cure time is after T as valid samples, and establishes a sample data set.

1.3 The identifier generating unit generates a duration identifier for each sample, the duration identifier containing healing duration grading information (e.g., short term: 7 days or less; medium term: 8-14 days; long term: 15 days or more).
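As a non-limiting illustration, the time screening and the duration grading of the preprocessing stage may be sketched as follows, using the example grade boundaries given above. The function names duration_label and select_samples and the case field names cured_at and cure_days are hypothetical.

```python
from datetime import datetime


def duration_label(cure_days):
    """Grade a cure duration: short (<= 7 days), medium (8-14), long (>= 15)."""
    if cure_days < 0:
        raise ValueError("cure duration cannot be negative")
    if cure_days <= 7:
        return "short"
    if cure_days <= 14:
        return "medium"
    return "long"


def select_samples(cases, last_update):
    """Keep cases cured after the last update and attach a duration label."""
    samples = []
    for case in cases:
        if case["cured_at"] > last_update:
            samples.append({**case, "duration_label": duration_label(case["cure_days"])})
    return samples


last_update = datetime(2025, 1, 1)
cases = [
    {"case_id": "A", "cured_at": datetime(2024, 12, 20), "cure_days": 5},
    {"case_id": "B", "cured_at": datetime(2025, 2, 3), "cure_days": 10},
]
samples = select_samples(cases, last_update)
```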

2. Drug information standardization

2.1 The information extraction engine parses the drug specifications and supply chain data, and extracts four-dimensional features: drug name, manufacturer, production batch, and cost parameters.

2.2 The code mapper calls a pre-constructed drug feature dictionary, and maps each (name + manufacturer + batch) combination to a unique identity code CID, forming a standard drug information set.

3. Target data retrieval

3.1 The department classifier determines the associated department (such as the respiratory department or the cardiovascular department) according to the ICD-10 code of the current condition, and defines the retrieval range of the knowledge base.

3.2 The query builder generates structured retrieval statements containing condition keywords, department constraints, and time-scale parameters.

3.3 The knowledge base interface executes the retrieval operation and returns the target data set meeting the conditions, which contains drug entities and their treatment association information.

4. Data matching and graph construction (corresponding to steps S4-S5)

4.1 The decoding and comparing unit decodes the drug entities in the target data back into CIDs, and calculates the intersection CID_common and the difference CID_diff between them and the current drug information set.

4.2 The triplet generator performs the following operations:

generating basic triples (CID_A, association relation, CID_B) for the drugs in CID_common by pairwise combination;

generating difference triples for the combinations involving CID_diff, and marking their type (single difference/double difference);

appending the duration identifier of the source sample and the repetition count to all triples.

4.3 The graph updating engine dynamically operates on the knowledge graph:

single-difference triplet: a new CID_diff node is created, an edge is established between it and the existing node, and the edge attributes record the shortest duration identifier and the reference count;

double-difference triplet: two new nodes and the edge associating them are created, with the attributes set as above;

basic triplet: if the duration of the existing edge is longer than the new identifier, it is updated to the shorter value, and the reference count is accumulated.

5. Knowledge graph optimization (corresponding to step S6)

5.1 The knowledge graph optimization module periodically scans the edge reference attributes, and generates <CID_X, relation, CID_Y> triples to be audited for edges whose reference count has been lower than a threshold K (e.g., 5) for N consecutive periods (e.g., 3 months).

5.2 The task distributor pushes the triples to be audited to the expert terminal, attaching reference material such as the relevant drug specifications and clinical guidelines.

5.3 The response processor performs the following operations according to expert feedback:

retain instruction: reset the edge reference count to an initial value;

remove instruction: remove the target edge and check the connectivity of the associated nodes.

5.4 The isolated node remover traverses the graph periodically, deletes free nodes without connected edges, and releases storage resources.

According to this embodiment, real-time evolution of the medical knowledge graph is achieved through a dynamic triplet generation mechanism; in particular, the competitive update mechanism for duration identifiers keeps treatment schemes optimized for timeliness. An expert-collaborative redundancy cleaning mechanism is also introduced, which reduces the risk of erroneous deletion while keeping the knowledge fresh. In testing, the system reduced the redundant data of the knowledge base by 62% and improved the discovery efficiency of new treatment schemes by 41%.

In some embodiments, the medical knowledge base update system may include a plurality of functional modules comprised of computer program segments. The computer program of each program segment in the medical knowledge base updating system may be stored in a memory of a computer terminal and executed by at least one processor to perform (see fig. 1 for details) the functions of medical knowledge base updating.

In this embodiment, the medical knowledge base updating system may be divided into a plurality of functional modules according to the functions performed by the medical knowledge base updating system, as shown in fig. 2. The functional modules of the system can comprise an acquisition module, a first processing module, a retrieval module, a second processing module and an updating module. The module referred to in the present invention refers to a series of computer program segments capable of being executed by at least one processor and of performing a fixed function, stored in a memory. In the present embodiment, the functions of the respective modules will be described in detail in the following embodiments.

Fig. 3 is a schematic diagram of a medical knowledge base updating method according to an embodiment of the present application, which may be applied to a terminal. It will be appreciated by those skilled in the art that the terminal structure referred to in the embodiments of the present application does not constitute a limitation on the terminal, and the terminal may include more or less components than illustrated, or may combine some components, or may have a different arrangement of components. In embodiments of the present application, terminals include, but are not limited to, laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Terminals may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable terminals, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the embodiments of the application described and/or claimed herein.

The terminal 300 may include a processor 310, a memory 320, and a communication unit 330. The components may communicate via one or more buses, and it will be appreciated by those skilled in the art that the configuration of the server as shown in the drawings is not limiting of the invention, as it may be a bus-like structure, a star-like structure, or include more or fewer components than shown, or may be a combination of certain components or a different arrangement of components.

The memory 320 may be used to store instructions for execution by the processor 310, and the memory 320 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The instructions in memory 320, when executed by processor 310, enable terminal 300 to perform some or all of the steps in the method embodiments.

The processor 310 is the control center of the terminal; it connects the various parts of the entire terminal using various interfaces and lines, and performs the various functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and invoking data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or multiple packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a central processing unit (CPU). In the embodiment of the invention, the CPU may have a single computing core or multiple computing cores.

The communication unit 330 is used to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.

The present invention also provides a computer storage medium in which a program may be stored; when executed, the program may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

It will be apparent to those skilled in the art that the techniques of embodiments of the present invention may be implemented in software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution in the embodiments of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, the software product including several instructions for causing a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, etc.) to execute all or part of the steps of the method described in the embodiments of the present invention.

The same or similar parts between the various embodiments in this specification are referred to each other. In particular, for the terminal embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference should be made to the description in the method embodiment for relevant points.

In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative, e.g., the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with respect to each other may be through some interface, indirect coupling or communication connection of systems or modules, electrical, mechanical, or other form.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Various equivalent modifications and substitutions may be made in the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and it is intended that all such modifications and substitutions fall within the scope of the present invention as defined by the appended claims.

Claims

1. A method for updating a medical knowledge base, comprising:

2. The method of claim 1, wherein obtaining case data, the case data including symptoms and a diagnosis and treatment scheme, comprises:

acquiring the cure time and cure duration of each piece of case data;

3. The method of claim 1, wherein extracting medicine information from the diagnosis and treatment scheme, the medicine information including identity codes and cost information, comprises:

4. The method of claim 1, wherein retrieving target data from a medical knowledge base according to the symptoms, the target data comprising medicine entities related to the symptoms, comprises:

5. The method of claim 2, wherein matching the medicine information with the target data, determining difference identity codes and matching identity codes according to the matching result, and constructing a plurality of triples according to the difference identity codes and the matching identity codes, comprises:

marking triples containing difference identity codes as difference triples;

6. The method of claim 5, wherein updating the structure and attribute values of the knowledge graph of the medical knowledge base according to the plurality of triples comprises:

constructing a disease-medicine triple from all medicine identity codes contained in the medicine information and the corresponding symptoms, and using the disease-medicine triple to update the edges of the corresponding nodes in the knowledge graph and the reference attributes of those edges;

7. The method of claim 6, wherein the method further comprises:

periodically acquiring the reference attributes of the edges of the knowledge graph;

distributing the auditing task to an expert auditing terminal;

and deleting the isolated nodes in the knowledge graph.

8. A medical knowledge base updating system, comprising:

9. A terminal, comprising:

a memory for storing a medical knowledge base update program;

a processor for implementing the steps of the medical knowledge base updating method according to any one of claims 1-7 when executing said medical knowledge base updating program.

10. A computer-readable storage medium storing a computer program, characterized in that the readable storage medium has stored thereon a medical knowledge base update program which, when executed by a processor, implements the steps of the medical knowledge base updating method according to any one of claims 1-7.
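For illustration only, the matching, triple construction, and graph update steps recited in claims 5 to 7 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `match_codes`, `build_triples`, `KnowledgeGraph`, the relation label `treated_by`, and the `min_refs` threshold are hypothetical. The reference attribute of an edge is assumed here to be a simple count of the cases that cite the edge, and the expert auditing step is not modeled.

```python
def match_codes(case_codes, kb_codes):
    """Split the medicine identity codes of a case into matching and difference codes.

    Assumption: matching is plain set membership against the codes of the
    medicine entities retrieved from the knowledge base for the same symptom.
    """
    case_codes, kb_codes = set(case_codes), set(kb_codes)
    matching = case_codes & kb_codes      # already linked to the symptom
    difference = case_codes - kb_codes    # seen in the case, absent from the graph
    return matching, difference


def build_triples(symptom, matching, difference, relation="treated_by"):
    """Build (symptom, relation, code) triples and mark the difference triples."""
    return [
        {"triple": (symptom, relation, code), "difference": code in difference}
        for code in sorted(matching | difference)
    ]


class KnowledgeGraph:
    """Toy graph: a node set plus edges mapped to an assumed reference count."""

    def __init__(self):
        self.nodes = set()
        self.edges = {}  # (head, relation, tail) -> reference count

    def apply(self, triples):
        """Add missing nodes and edges, and increment each edge's reference count."""
        for item in triples:
            head, _, tail = item["triple"]
            self.nodes.update((head, tail))
            self.edges[item["triple"]] = self.edges.get(item["triple"], 0) + 1

    def prune(self, min_refs):
        """Drop edges referenced fewer than min_refs times, then isolated nodes."""
        self.edges = {e: n for e, n in self.edges.items() if n >= min_refs}
        connected = {node for (h, _, t) in self.edges for node in (h, t)}
        self.nodes &= connected


# Usage with made-up codes: D1 is new to the graph, D2 is already known.
matching, difference = match_codes(["D1", "D2"], ["D2", "D3"])
triples = build_triples("fever", matching, difference)
graph = KnowledgeGraph()
graph.apply(triples)
graph.apply([t for t in triples if not t["difference"]])  # D2 cited a second time
graph.prune(min_refs=2)
```

In this sketch a difference triple creates a new edge in the graph, while a matching triple only raises the reference count of an existing edge. Removing rarely referenced edges before deleting isolated nodes is one possible ordering; in the claimed method the removal decision passes through the expert auditing terminal.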